Commit Graph

51676 Commits

Author SHA1 Message Date
Benjamin Trent f71c305090
[7.x] [Transform] add support for terms agg in transforms (#56696) (#56809)
* [Transform] add support for terms agg in transforms (#56696)

This adds support for `terms` and `rare_terms` aggs in transforms. 

The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`

The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
2020-05-15 08:08:43 -04:00
David Roberts 270a23e422 [TEST] Fix log tail mocking in native process unit tests (#56804)
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.

Fixes #56796
2020-05-15 12:46:37 +01:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
2020-05-15 12:14:49 +01:00
Francisco Fernández Castaño 1530bff0cb
Move azure client logic from AzureStorageService to AzureBlobStore (#56806)
Backport of #56782
2020-05-15 11:30:15 +02:00
David Turner 27a090232e Suppress Kerberos tests on JDK15 (#56767)
Somewhat convoluted AwaitsFix for #56507 that only applies on JDK15.
2020-05-15 07:41:04 +01:00
Yang Wang c66e7ecbfe
Fix test failure of file role store auto-reload (#56398) (#56802)
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
2020-05-15 15:10:45 +10:00
Ryan Ernst 9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Tal Levy 5e90ff32f7
Add Normalize Pipeline Aggregation (#56399) (#56792)
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregations supports the following normalizations

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```
2020-05-14 17:40:15 -07:00
Lee Hinman a73d7d9e2b
[7.x] Don't allow invalid template combinations (#56397) (#56795)
Backports the following commits to 7.x:

- Don't allow invalid template combinations (#56397)
2020-05-14 16:20:53 -06:00
Mark Vieira 0fd756d511
Enforce strict license distribution requirements (#56642) 2020-05-14 13:57:56 -07:00
Jake Landis a22aabcc15
[7.x] Reduce chance for test failure due to schedule (#56633) (#56695)
If CI is running tests at exactly 0 or 5 minutes past the hour
the ack-watch docs tests may fail with a 409 error if the ack
test happens to run at the exact time that the schedule watch
is running.

This commit changes the public documentation (and the test) for
the ack to a feb 29th at noon schedule. Test doc or tests do
not really care about the schedule date and this is chosen
since it is a valid date, but one that is extremely unlikely
to cause issues.
2020-05-14 15:52:04 -05:00
James Rodewig 2a943a58a4
[DOCS] EQL: Document `number` function (#56770)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
2020-05-14 15:44:04 -04:00
Costin Leau 6f4af43405 EQL: Skip execution for filters with empty results (#56718)
Optimize away events queries and joins/sequence that cannot match any
results without having to query the backend.

(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
2020-05-14 22:38:23 +03:00
Armin Braun 14a042fbe5
Make No. of Transport Threads == Available CPUs (#56488) (#56780)
We never do any file IO or other blocking work on the transport threads
so no tangible benefit can be derived from using more threads than CPUs
for IO.
There are however significant downsides to using more threads than necessary
with Netty in particular. Since we use the default setting for
`io.netty.allocator.useCacheForAllThreads` which is `true` we end up
using up to `16MB` of thread local buffer cache for each transport thread.
Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.
2020-05-14 21:33:46 +02:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Nhat Nguyen 044ee380e8 Use ConcurrentSet in testTrackingChannelTask (#56775)
We need to use a ConcurrentSet to track the canceled tasks
as cancelTaskAndDescendants can be called concurrently.

Closes #56746
2020-05-14 12:22:59 -04:00
Dimitris Athanasiou ac5902624c
[7.x][ML] Improve error upon DF analytics mappings conflict (#56700) (#56776)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.

Backport of #56700
2020-05-14 19:16:10 +03:00
James Rodewig 2921747b23
[7.x] [DOCS] EQL: Document sequences (#56721) (#56774)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
2020-05-14 11:51:40 -04:00
Lisa Cawley 6a8e10189f [DOCS] Add throttling based on configuration parameter (#56653) 2020-05-14 08:45:29 -07:00
Jim Ferenczi fb5e6329b7 Stop/Start async search maintenance service in tests(#56673)
This change ensures that the maintenance service that is responsible for deleting the expired response is stopped between each test. This is needed since we check that no search context are in-flight after each test method.

Fixes #55988
2020-05-14 15:13:01 +02:00
Francisco Fernández Castaño 97bf47f5b9
Track GET/LIST GoogleCloudStorage API calls (#56758)
Backporting #56585 to 7.x branch.

Adds tracking for the API calls performed by the GoogleCloudStorage
underlying SDK. It hooks an HttpResponseInterceptor to the SDK
transport layer and does http request filtering based on the URI
paths that we are interested to track. Unfortunately we cannot hook
a wrapper into the ServiceRPC interface since we're using different
levels of abstraction to implement retries during reads
(GoogleCloudStorageRetryingInputStream).
2020-05-14 14:03:21 +02:00
David Turner f0c2c25527 AwaitsFix for #56746 (and #56751) 2020-05-14 12:46:32 +01:00
David Turner 63cc53e512 AwaitsFix for #56757 2020-05-14 12:00:15 +01:00
David Turner bec6821fe6 AwaitsFix for #56755 2020-05-14 11:46:05 +01:00
Martijn van Groningen b87aeb09f7
Allow more apis to resolve data streams (#56743)
Backporting #56683 to 7.x branch.

Allow get settings, cluster state and field caps apis to resolve data streams.
2020-05-14 10:57:13 +02:00
Alexander Reelsen 3a263d91f6 Ensure watcher email action message ids are always unique (#56574)
If an email action is used in a foreach loop, message ids could have
been duplicated, which then get rejected by the mail server.

This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
2020-05-14 10:36:00 +02:00
David Roberts 4438115be0 [DOCS] Docs changes for overridden delimiter in find_file_structure (#56288)
Docs for #55735

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-14 09:25:21 +01:00
Przemysław Witek 98fbd85290
[7.x] Add scope-related fields to Annotation (#56417) (#56681) 2020-05-14 10:23:13 +02:00
Andrei Stefan ddf4e47e86
EQL: fix QueryFolderOkTests (#56714) (#56728)
(cherry picked from commit 8b21ccd0eac3b3d0fbd090152b3dff6ae5217b52)
2020-05-14 10:58:25 +03:00
David Roberts 3051c37f92
[ML] Tail the C++ logging pipe before connecting other pipes (#56701)
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.

This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block.  This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.

This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.

Backport of #56632
2020-05-14 07:10:30 +01:00
Nhat Nguyen ac432f6612 Reduce test load in TaskManagerTests 2020-05-13 23:52:48 -04:00
Aleksandr Maus 87a10806ab
EQL: Fix cidrMatch function fails to match when used in scripts (#56246) (#56735)
EQL: Fix cidrMatch function fails to match when used in scripts (#56246)

Addresses https://github.com/elastic/elasticsearch/issues/55709
2020-05-13 22:41:24 -04:00
debadair 83e9ff42da
[DOCS] Added info about automatic config for Beats & Logstash. (#56317) (#56729)
* [DOCS] Added info about automatic config for Beats & Logstash.

* Update docs/reference/ilm/set-up-lifecycle-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/ilm/set-up-lifecycle-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/ilm/index.asciidoc

* Updated note in GS tutorial

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-05-13 19:27:22 -07:00
Nhat Nguyen 566b23c42c
Cancel task and descendants on channel disconnects (#56620)
If a channel gets disconnected, then we should cancel the tasks
associated with that channel as their results won't be retrieved.

Closes #56327
Relates #56619

Backport of #56620
2020-05-13 22:09:58 -04:00
Jason Tedor 7c8860b7e6
Update number of replicas when removing setting (#56723)
We previously rejected removing the number of replicas setting, which
prevents users from reverting this setting to its default the natural
way. To fix this, we put back the setting with the default value in the
cases that the user is trying to remove it. Yet, we also need to do the
work of updating the routing table and so on appropriately. This case
was missed because when the setting is being removed, we were defaulting
to -1 in this code path, which is treated as not being updated. Instead,
we must treat the case when we are removing this setting as if the
setting is being updated, too. This commit does that.
2020-05-13 20:13:25 -04:00
debadair 60f8a32dba
[DOCS] Add info about ILM and unallocated shards. (#56655) (#56724)
* [DOCS] Add info about ILM and unallocated shards.

* Incorporated review feedback.

* Update docs/reference/ilm/actions/ilm-allocate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Apply suggestions from code review

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Fix xref

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-05-13 16:12:37 -07:00
David Roberts ab40466bfb
Prevent unexpected native controller output hanging the process (#56685)
In normal operation native controllers are not expected to write
anything to stdout or stderr.  However, if due to an error or
something unexpected with the environment a native controller
does write something to stdout or stderr then it will block if
nothing is reading that output.

This change makes the stdout and stderr of native controllers
reuse the same stdout and stderr as the Elasticsearch JVM (which
are by default redirected to es.stdout.log and es.stderr.log) so
that if something unexpected is written to native controller
output then:

1. The native controller process does not block, waiting for
   something to read the output
2. We can see what the output was, making it easier to debug
   obscure environmental problems

Backport of #56491

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-05-13 22:57:00 +01:00
Nik Everett b98b260048
Merge significant_terms into the terms package (backport of #56699) (#56715)
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
2020-05-13 17:36:21 -04:00
Luca Cavanna 34410814b9
Don't omit empty arrays when filtering _source (#56527)
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.

It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).

Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.

Closes #23796
2020-05-13 23:24:21 +02:00
Nik Everett 126619ae3c
Add list of defered aggregations to the profiler (backport of #56208) (#56682)
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
  count of buckets that they collected. This could be useful when
  debugging a histogram inside of another bucketing agg that is fairly
  selective.
* All bucketing aggs that can delay their sub-aggregations will now add
  a list of delayed sub-aggregations. This is useful because we
  sometimes have fairly involved logic around which sub-aggregations get
  delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
  accurately add anything to the breakdown. Instead they the wrapper
  adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
  can be quickly pick out such aggregations when debugging.

It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.

To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
2020-05-13 16:33:22 -04:00
Julie Tibshirani 1ad83c37c4
Use index sort range query when possible. (#56710)
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.

Relates to #48665.
2020-05-13 13:24:45 -07:00
Ross Wolf 61e2cf89b5
EQL: Add number function (#55084)
* EQL: Add number function
* EQL: Fix the locale used for number for deterministic functionality
* EQL: Add more ToNumber tests
* EQL: Add more number ToNumberProcessor unit tests
* EQL: Remove unnecessary overrides, fix processor methods
* EQL: Remove additional unnecessary overrides
* EQL: Lint fixes for ToNumber
* EQL: ToNumber renames from PR feedback
* EQL: Remove NumberFormat locale handling
* EQL: Removed NumberFormat from ToNumber
* EQL: Add number function tests
* EQL: ToNumberProcessorTests formatting
* EQL: Remove newline in ToNumberProcessorTests
* EQL: Add number(..., null) test
* EQL: Create expression.function.scalar.math package
* EQL: Remove painless whitespace for ToNumber.asScript
* EQL: Add Long support
2020-05-13 14:09:06 -06:00
Jason Tedor 5ca2ea2dde
Allow removing replicas setting on closed indices (#56680)
This is similar to a previous change that allowed removing the number of
replicas settings (so setting it to its default) on open indices. This
commit allows the same for closed indices.

It is unfortunate that we have separate branches for handling open and
closed indices here, but I do not see a clean way to merge these two
together without making a rather unnatural method (note that they invoke
different methods for doing the settings updates). For now, we leave
this as-is even though it led to the miss here.
2020-05-13 15:56:58 -04:00
Mark Vieira e3be18a443
Add version 6.8.10 2020-05-13 11:27:40 -07:00
Bogdan Pintea ee437bef27
Docs: forward port release docs of 7.7.0 (#56706)
Forward port the release docs of 7.7.0: breaking changes, release notes,
release highlights.
2020-05-13 20:08:14 +02:00
Julie Tibshirani a92d138c77 Correct the type of the 'analyzer' parameter in the _analyze docs. (#56650)
This optional parameter can only be a string. To test out a transient custom
analysis chain, users are expected to use the 'tokenizer', 'filter', and
'char_filter' parameters.
2020-05-13 11:05:06 -07:00
Bogdan Pintea 2f0663c490 Add the 7.7.1 Version
Add the bumped 7.7 branch new version, 7.7.1
2020-05-13 18:46:07 +02:00
David Turner 26382dff19 Clarify doc count stats (#56665)
Today we report some statistics in terms of Lucene-level documents, which
differ from Elasticsearch-level documents in a number of ways and include
things like document tombstones which users cannot directly observe. This
commit clarifies the internal nature of these statistics.

Closes #56497
2020-05-13 15:07:44 +01:00
Costin Leau 9f1ecd52eb EQL: Introduce support for sequences (#56300)
Initial support for EQL sequences
The current algorithm is focused on correctness and does not contain
any optimization which is left for the future.

The current implementation uses a state machine approach which moves
ascending and runs each query one after the other working on computing
sequences as the data comes in.
For each result, the key and its timestamp are being extracted which are
then used for matching/building a sequence.

(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
2020-05-13 15:42:31 +03:00
James Rodewig c859fafcbd [DOCS] Correct `query` datatype in enrich policy definition (#56224)
Corrects the datatype for the `query` property of an enrich policy
object. The `query` property is a query object, not a string.
2020-05-13 08:35:17 -04:00