4828 Commits

Author SHA1 Message Date
Benjamin Trent f71c305090
[7.x] [Transform] add support for terms agg in transforms (#56696) (#56809)
* [Transform] add support for terms agg in transforms (#56696)

This adds support for `terms` and `rare_terms` aggs in transforms. 

The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`

The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
2020-05-15 08:08:43 -04:00
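A minimal Python sketch of the collapsing rule described in the `terms` agg commit above. The function name `collapse_terms` and the input shape (bucket dicts with `key`, `doc_count`, and single-value sub-agg results) are assumptions for illustration. So is the reading of `<_doc_count>` as the bucket's document count stored directly under `<AGG_NAME>.<BUCKET_NAME>`. None of this is the transform implementation itself.

```python
def collapse_terms(agg_name, buckets):
    """Collapse terms-agg buckets into {dotted.path: value} pairs.

    Hypothetical helper: `buckets` mimics the shape of a terms agg response,
    where each bucket has `key`, `doc_count`, and optional sub-agg results.
    """
    flat = {}
    for bucket in buckets:
        # Everything other than the bookkeeping keys is treated as a sub-agg.
        sub_aggs = {k: v for k, v in bucket.items() if k not in ("key", "doc_count")}
        prefix = f"{agg_name}.{bucket['key']}"
        if sub_aggs:
            # <AGG_NAME>.<BUCKET_NAME>.<SUBAGG>
            for sub_name, sub_result in sub_aggs.items():
                flat[f"{prefix}.{sub_name}"] = sub_result["value"]
        else:
            # No sub aggs: fall back to the document count (assumed reading).
            flat[prefix] = bucket["doc_count"]
    return flat
```

Because every distinct term becomes its own path, the number of paths grows with the data, which is the field explosion the `flattened` mapping is meant to avoid.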
David Roberts 270a23e422 [TEST] Fix log tail mocking in native process unit tests (#56804)
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.

Fixes #56796
2020-05-15 12:46:37 +01:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
2020-05-15 12:14:49 +01:00
Yang Wang c66e7ecbfe
Fix test failure of file role store auto-reload (#56398) (#56802)
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
2020-05-15 15:10:45 +10:00
Ryan Ernst 9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Tal Levy 5e90ff32f7
Add Normalize Pipeline Aggregation (#56399) (#56792)
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregation supports the following normalizations:

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

The normalization to use is specified in the normalize agg's
`normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```
2020-05-14 17:40:15 -07:00
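The normalizations listed in the commit above can be sketched in Python using their textbook definitions. The function names are hypothetical, and both the exact formulas and the use of the population standard deviation for z-score are assumptions rather than something the commit message states.

```python
import math
import statistics


def rescale_0_1(values):
    # Maps min -> 0 and max -> 1; undefined when all values are equal.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rescale_0_100(values):
    return [100 * v for v in rescale_0_1(values)]


def percent_of_sum(values):
    total = sum(values)
    return [v / total for v in values]


def mean_normalization(values):
    mean = statistics.mean(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


def z_score(values):
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(v - mean) / stdev for v in values]


def softmax(values):
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    hi = max(values)
    exps = [math.exp(v - hi) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Each function takes the bucket values of one series and returns one normalized value per bucket, which mirrors how a pipeline aggregation consumes a `buckets_path`.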
Mark Vieira 0fd756d511
Enforce strict license distribution requirements (#56642) 2020-05-14 13:57:56 -07:00
Costin Leau 6f4af43405 EQL: Skip execution for filters with empty results (#56718)
Optimize away events queries and joins/sequence that cannot match any
results without having to query the backend.

(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
2020-05-14 22:38:23 +03:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Dimitris Athanasiou ac5902624c
[7.x][ML] Improve error upon DF analytics mappings conflict (#56700) (#56776)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.

Backport of #56700
2020-05-14 19:16:10 +03:00
Jim Ferenczi fb5e6329b7 Stop/Start async search maintenance service in tests (#56673)
This change ensures that the maintenance service that is responsible for deleting expired responses is stopped between each test. This is needed since we check that no search contexts are in-flight after each test method.

Fixes #55988
2020-05-14 15:13:01 +02:00
David Turner bec6821fe6 AwaitsFix for #56755 2020-05-14 11:46:05 +01:00
Alexander Reelsen 3a263d91f6 Ensure watcher email action message ids are always unique (#56574)
If an email action is used in a foreach loop, message ids could have
been duplicated, which then get rejected by the mail server.

This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
2020-05-14 10:36:00 +02:00
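A small Python sketch of the idea in the watcher email commit above: a process-wide counter appended to the id, so two messages generated for the same watch in the same millisecond still differ. The id layout and the name `next_message_id` are assumptions for illustration; the commit only says that a static counter was added.

```python
import itertools
import threading

# Module-level state plays the role of the "static counter" in the commit.
_counter = itertools.count()
_lock = threading.Lock()


def next_message_id(watch_id, timestamp_millis):
    """Return an id that stays unique even for identical watch/timestamp pairs."""
    with _lock:
        sequence = next(_counter)
    # Hypothetical layout: <watch id>_<timestamp>_<sequence number>.
    return f"{watch_id}_{timestamp_millis}_{sequence}"
```

Without the sequence number, every iteration of a foreach loop that runs within the same millisecond would produce the same id.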
Przemysław Witek 98fbd85290
[7.x] Add scope-related fields to Annotation (#56417) (#56681) 2020-05-14 10:23:13 +02:00
Andrei Stefan ddf4e47e86
EQL: fix QueryFolderOkTests (#56714) (#56728)
(cherry picked from commit 8b21ccd0eac3b3d0fbd090152b3dff6ae5217b52)
2020-05-14 10:58:25 +03:00
David Roberts 3051c37f92
[ML] Tail the C++ logging pipe before connecting other pipes (#56701)
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.

This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block.  This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.

This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.

Backport of #56632
2020-05-14 07:10:30 +01:00
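The deadlock described in the commit above can be reproduced in miniature with Python threads. This is only a sketch: a bounded `queue.Queue` stands in for the named pipe's buffer, a thread stands in for the C++ process, and all names (`connect`, `tail_first`) are hypothetical.

```python
import queue
import threading


def connect(tail_first, messages=10, buffer_size=4, timeout=0.5):
    """Return True if the 'last pipe' gets opened before the timeout."""
    log_pipe = queue.Queue(maxsize=buffer_size)  # bounded, like a pipe buffer
    last_pipe_opened = threading.Event()
    captured = []

    def native_process():
        # Logs while "opening pipes"; put() blocks once the buffer is full.
        for i in range(messages):
            log_pipe.put(f"log message {i}")
        last_pipe_opened.set()

    def tail_log():
        # Drains the log pipe so the writer never blocks for long.
        while True:
            captured.append(log_pipe.get())

    threading.Thread(target=native_process, daemon=True).start()
    if tail_first:
        # New order: start reading the log pipe before waiting on other pipes.
        threading.Thread(target=tail_log, daemon=True).start()
    # Old order: wait for the last pipe while nobody reads the log pipe.
    return last_pipe_opened.wait(timeout)
```

With `tail_first=False` the writer stalls after `buffer_size` messages and the wait times out; with `tail_first=True` the connection completes.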
Aleksandr Maus 87a10806ab
EQL: Fix cidrMatch function fails to match when used in scripts (#56246) (#56735)
Addresses https://github.com/elastic/elasticsearch/issues/55709
2020-05-13 22:41:24 -04:00
Nik Everett b98b260048
Merge significant_terms into the terms package (backport of #56699) (#56715)
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
2020-05-13 17:36:21 -04:00
Ross Wolf 61e2cf89b5
EQL: Add number function (#55084)
* EQL: Add number function
* EQL: Fix the locale used for number for deterministic functionality
* EQL: Add more ToNumber tests
* EQL: Add more number ToNumberProcessor unit tests
* EQL: Remove unnecessary overrides, fix processor methods
* EQL: Remove additional unnecessary overrides
* EQL: Lint fixes for ToNumber
* EQL: ToNumber renames from PR feedback
* EQL: Remove NumberFormat locale handling
* EQL: Removed NumberFormat from ToNumber
* EQL: Add number function tests
* EQL: ToNumberProcessorTests formatting
* EQL: Remove newline in ToNumberProcessorTests
* EQL: Add number(..., null) test
* EQL: Create expression.function.scalar.math package
* EQL: Remove painless whitespace for ToNumber.asScript
* EQL: Add Long support
2020-05-13 14:09:06 -06:00
Costin Leau 9f1ecd52eb EQL: Introduce support for sequences (#56300)
Initial support for EQL sequences
The current algorithm is focused on correctness and does not contain
any optimization which is left for the future.

The current implementation uses a state machine approach which moves
ascending and runs each query one after the other working on computing
sequences as the data comes in.
For each result, the key and its timestamp are being extracted which are
then used for matching/building a sequence.

(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
2020-05-13 15:42:31 +03:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Marios Trivyzas cbbbd499bf
SQL/EQL: Add support for scalars within LIKE/RLIKE (#56495) (#56674)
- Add support for scalar functions on the field of SQL's LIKE/RLIKE
- Add support for scalar functions on the field of EQL's match/matchLite

Closes: #55058
(cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)
2020-05-13 13:40:24 +02:00
Luca Cavanna 30e9a1b8c7 Improve error handling when decoding async execution ids (#56285)
When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]".

Added test coverage for a couple of these scenarios and also added tests for equals/hashcode methods.
2020-05-13 12:26:17 +02:00
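The error handling described in the commit above follows a common pattern: catch low-level decoding failures and re-raise them with a message that names the offending id. A Python sketch, assuming a hypothetical id format of URL-safe Base64 over `<doc id>:<task id>` (the real encoding is not described in the commit):

```python
import base64
import binascii


def decode_id(encoded):
    """Decode a hypothetical async execution id into (doc_id, task_id)."""
    try:
        raw = base64.urlsafe_b64decode(encoded.encode("ascii"))
        doc_id, task_id = raw.decode("utf-8").split(":", 1)
    except (binascii.Error, UnicodeError, ValueError) as e:
        # Hide the cryptic low-level error behind a message the user can act on.
        raise ValueError(f"invalid id: [{encoded}]") from e
    return doc_id, task_id
```

The original exception is kept as the cause, so the low-level detail remains available for debugging without being the first thing a user sees.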
Marios Trivyzas e781193cf9
SQL: Fix JDBC url pattern in docs and error message (#56612)
The docs pattern url was using `*` which means zero or many instead
of `?` which means zero or one. The pattern url returned in error
messages was not in sync with the one in the docs.

Fixes: #56476
(cherry picked from commit 1a5945c3962cdda21482f4b0b3e0ca508534c2c4)
2020-05-13 12:13:58 +02:00
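The difference between `*` and `?` mentioned in the commit above is easy to check with regular expressions. The patterns below are a deliberately simplified stand-in for a JDBC URL, not the pattern from the docs or the driver.

```python
import re

# Simplified host[:port][/] tail shared by both patterns (an assumption).
HOST = r"[\w.-]+(:\d+)?/?"

# `*` allows the scheme prefix zero or many times.
LOOSE = re.compile(r"jdbc:es://(https?://)*" + HOST)
# `?` allows the scheme prefix zero or one time.
STRICT = re.compile(r"jdbc:es://(https?://)?" + HOST)


def accepts(pattern, url):
    return pattern.fullmatch(url) is not None
```

With these patterns, `jdbc:es://http://http://localhost/` is accepted by `LOOSE` but rejected by `STRICT`, while a URL with a single scheme prefix is accepted by both.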
David Turner c10b4ae15a Support cloning of searchable snapshot indices (#56595)
Today you can convert a searchable snapshot index back into a regular index by
restoring the underlying snapshot, but this is somewhat wasteful if the shards
are already in cache since it copies the whole index from the repository again.

Instead, we can make use of the locally-cached data by using the clone API to
copy the contents of the cache into the layout expected by a regular shard.
This commit marks the searchable snapshot's private index settings as
`NotCopyableOnResize` so that they are removed by resize operations such as
cloning.

Cloning a regular index typically hard-links the underlying files rather than
copying them, but this is tricky to support in the case of a searchable
snapshot so this commit takes the simpler approach of always copying the
underlying files.
2020-05-13 11:05:14 +01:00
Ioannis Kakavas cc119c3853
Expose idp.metadata.http.refresh for SAML realm (#56354) (#56593)
This setting was not returned in the SamlRealmSettings#getSettings
so it was not possible for users to set this in the realm config
in our configuration.
2020-05-13 11:51:18 +03:00
Jake Landis a010f4f624
[7.x] Watcher dont add watches post index if stopped (#56556) (#56629)
Watcher adds watches to the trigger service on the postIndex action
for the .watches index. This has the (intentional) side effect of also
adding the watches to the stats. The tests rely on these stats for their
assertions. The tests also start and stop Watcher between each test for
a clean slate.

When Watcher executes it updates the .watches index and upon this update
it will go through the postIndex method and end up adding that watch to the
trigger service (and stats). Functionally this is not a problem if Watcher
is stopping or stopped, since Watcher is also paused and will not execute
the watch. However, specific timing combined with the expectation of a clean
slate can cause issues with the test assertions against the stats.

This commit ensures that the postIndex action only adds to the trigger service
if the Watcher state is not stopping or stopped. When started back up it will
re-read index .watches.

This commit also un-mutes the tests related to #53177 and #56534
2020-05-12 16:30:27 -05:00
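The guard described in the Watcher commit above amounts to a state check before registering the watch. A Python sketch with hypothetical names (`WatcherState`, `TriggerService`, `post_index`); the real classes and method signatures are not shown in the commit message.

```python
from enum import Enum


class WatcherState(Enum):
    STARTED = "started"
    STOPPING = "stopping"
    STOPPED = "stopped"


class TriggerService:
    """Hypothetical stand-in that just remembers registered watch ids."""

    def __init__(self):
        self.watches = set()

    def add(self, watch_id):
        self.watches.add(watch_id)


def post_index(state, trigger_service, watch_id):
    """Register the watch only if Watcher is not stopping or stopped."""
    if state in (WatcherState.STOPPING, WatcherState.STOPPED):
        return False  # skipped; the index is re-read on the next start
    trigger_service.add(watch_id)
    return True
```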
Jake Landis 9c76ee47c4
[7.x] json spec: allow null for documentation url (#55749) (#56625)
This commit allows the JSON schema's documentation.url property to have a null value.
This can be useful for cases where a feature is under development and does not have
documentation published yet.

This commit also adds a documentation.url for two ml resources.
2020-05-12 14:49:02 -05:00
Armin Braun 0a879b95d1
Save Bounds Checks in BytesReference (#56577) (#56621)
Two spots that allow for some optimization:

* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
   * Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
2020-05-12 20:33:45 +02:00
Armin Braun c104c9a11b
Fix Missing IgnoredUnavailable Flag in 7.x SLM Retention Task (#56616)
Without the flag we run into the situation where a broken repository (broken by some old 6.x
version of ES, so that it is missing some snap-${uuid}.dat blobs) fails to run the SLM retention task
since it always errors out.
2020-05-12 18:07:58 +02:00
Marios Trivyzas 4240b97d0e
SQL: [Test] Fix JdbcPreparedStatement date test
Use `ORDER BY` to ensure the order of the rows, since more
than one is returned in testDate().

Follows: #56492
(cherry picked from commit 0053a1cb515b4db160d7b0bed5cf3f13c1050687)
2020-05-12 17:08:16 +02:00
Martijn van Groningen 0c61bc63e4
Backport: auto create data streams using index templates v2 (#56596)
Backport: #55377

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_stream field that includes a timestamp field;
if it is provided and an index name matches that template, then a data stream
(plus first backing index) is auto created.

Relates to #53100
2020-05-12 17:01:15 +02:00
Andrei Stefan f0074e93a0
QL: case sensitive support in EQL (#56404) (#56597)
* QL: case sensitive support in EQL (#56404)
* adds a generic startsWith function to QL
* modifies the existing EQL startsWith function to be case sensitive
aware
* improves the existing EQL startsWith function to use a prefix query
when the function is used in a case sensitive context. Same improvement
is used in SQL's newly added STARTS_WITH function.
* adds case sensitivity to EQL configuration through a case_sensitive
parameter in the eql request, as established in #54411.
The case_sensitive parameter can be specified when running queries
(default is case insensitive)

(cherry picked from commit ee5a09ea840167566e34c28c8225dc38bc6a7ae8)
2020-05-12 16:56:18 +03:00
Hendrik Muhs a9425a0240
[7.x][Transform] fix count when matching exact ids (#56544) (#56582)
Fix the count in get and get stats if explicit ids are given and ids might be
duplicated when configurations are stored in different indices (versions).

fixes #56196
2020-05-12 14:23:13 +02:00
Marios Trivyzas 575cafb8da
SQL: Fix serialization of JDBC prep statement date/time params (#56492) (#56579)
The Date/Time related query params of a JDBC prepared statement
are serialized using java.util.Date. The rules for serializing
`java.util.Date` objects though reside in
`XContentElasticsearchExtension` which is not available in the
jdbc jar as this class is in `server` module. Therefore, a
custom extension of the `XContentBuilderExtension` iface has been
added to the jdbc module/jar.

Moreover the sql's `qa` project had as dependency the `sql-action`
module which depends on `server` so the `XContentBuilderExtension`
was available for the integ tests hiding the real problem.

Previously, when a user set a `java.sql.Time` on the prepStmt,
the DataType used was `DATETIME` instead of `TIME`, which
prevented filtering on a field cast to `TIME`:
```
SELECT * FROM test WHERE date::TIME = ?
```

Fixes: #56084
(cherry picked from commit f8d8e971bd2c85fa4aea44b5b3ba0cdcc950a4ed)
2020-05-12 13:25:02 +02:00
Martijn van Groningen 2e86801f61
Backport: enable searchable snapshots feature flag for xpack rest tests.
Backport of: #56569

A data stream test, which tests data stream resolvability in xpack apis, failed in release builds.
An invocation of a searchable snapshot api failed, because the corresponding feature flag
wasn't enabled for xpack rest tests.

Closes #56531
2020-05-12 12:18:24 +02:00
Ignacio Vera 222ee721ec
Add moving percentiles pipeline aggregation (#55441) (#56575)
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
2020-05-12 11:35:23 +02:00
Marios Trivyzas 5c0f26de1d
SQL: [Docs] Fix example for DATETIME_PARSE (#56409)
When no timezone is specified the session timezone is used without
conversion, fix the docs test accordingly.

Follows: #56158
(cherry picked from commit 4b79b19ea5c3d17e05cb8130f3c754ac9bfd2382)
2020-05-12 09:23:00 +02:00
Ryan Ernst 902fc546bd
Migrate remaining ESIntegTestCases to internalClusterTest (#56479) (#56563)
This commit migrates the ESIntegTestCase tests in x-pack to the
internalClusterTest source set.
2020-05-11 21:06:04 -07:00
Nick Knize 9b64149ad2
[Geo] Refactor Point Field Mappers (#56060) (#56540)
This commit refactors the following:
  * GeoPointFieldMapper and PointFieldMapper to
    AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
  * .setupFieldType moved up to AbstractGeometryFieldMapper
  * lucene indexing moved up to AbstractGeometryFieldMapper.parse
  * new addStoredFields, addDocValuesFields abstract methods for implementing
    stored field and doc values field indexing in the concrete field mappers

This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
2020-05-11 17:11:36 -05:00
Tim Brooks 760ab726c2
Share netty event loops between transports (#56553)
Currently Elasticsearch creates independent event loop groups for each
transport type (http and internal). This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
2020-05-11 15:43:43 -06:00
Benjamin Trent 1d6b2f074e
[Transform] adds geotile_grid support in group_by (#56514) (#56549)
This adds support for grouping by geo points. This uses the agg [geotile_grid](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geotilegrid-aggregation.html).

I am opting to store the tile results of group_by as a `geo_shape` so that users can query the results. Additionally, the shapes could be visualized and filtered in the kibana maps app.

relates to https://github.com/elastic/elasticsearch/issues/56121
2020-05-11 17:02:40 -04:00
Lee Hinman 1337b35572
Remove prefer_v2_templates query string parameter (#56545)
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.

Relates to #53101
Resolves #56528

This is not a breaking change because this flag was never in a released version.
2020-05-11 14:56:42 -06:00
zhenxianyimeng 8e96e5c936
Use CollectionUtils.isEmpty where appropriate (#55910)
This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.
2020-05-11 09:55:57 -07:00
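For readers unfamiliar with the pattern in the commit above, the utility folds the null check and the length check into one call. A Python analogue, where `None` plays the role of Java's `null`; the function name is illustrative and this is not the Java utility itself.

```python
def is_empty(array):
    """True when the array is missing (None) or has no elements."""
    return array is None or len(array) == 0
```

Call sites that previously read "not null and length greater than zero" then become `not is_empty(array)`.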
Armin Braun 3ab6eba6bc
Fix RollupJobTaskTests Leaking Threads on Slowness (#56438) (#56518)
We are ensuring order in the two tests changed by waiting on latches.
The problem is that 3s is a pretty short wait and on CI can randomly be exceeded
by pure chance. If that happened we wouldn't have visibility on it since we didn't
assert that the waits actually worked.
=> Fixed by asserting that the waits work and upping the timeout to our standard 10s
Also, moved to a per-test threadpool to make it simpler to identify which test failed,
should an unexpected task run on a closed client's pool after all.
2020-05-11 17:24:10 +02:00
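The fix in the commit above hinges on checking the result of a timed wait instead of ignoring it. In Python the same idea looks like this, with `threading.Event` standing in for a Java latch; the function name and the ordering scenario are made up for illustration.

```python
import threading


def run_in_order(timeout=10.0):
    """Run a worker, then the main step, and fail loudly if the wait times out."""
    started = threading.Event()
    order = []

    def worker():
        order.append("worker")
        started.set()

    thread = threading.Thread(target=worker)
    thread.start()
    # wait() returns False on timeout; asserting on it makes a slow run visible
    # instead of letting the test continue in the wrong order.
    assert started.wait(timeout), "worker did not signal within the timeout"
    order.append("main")
    thread.join()
    return order
```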
Jim Ferenczi 02ab9112a9 Fix spurious failures in AsyncSearchIntegTestCase (#56026)
Async search integration tests are subject to random failures when:
  * The test index has more than one replica.
  * The request cache is used.
  * Some shards are empty.
  * The maintenance service starts a garbage collection when node is closing.

They are also slow because the test index is created/populated on each
test method.

This change refactors these integration tests in order to:
  * Create the index once for the entire test suite.
  * Fix the usage of the request cache and replicas.
  * Ensure that all shards have at least one document.
  * Increase the delay of the maintenance service garbage collection.

Closes #55895
Closes #55988
2020-05-11 15:03:03 +02:00
Martijn van Groningen 9ae09570d8
Allow a number of broadcast transport actions to resolve data streams (#55726) (#56502)
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.

This change allows the following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats, 
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.

Relates to #53100
2020-05-11 12:48:35 +02:00
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of its
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator` which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
2020-05-08 20:38:53 -04:00
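The "fairly simplistic data structure" mentioned in the commit above is not spelled out, but the idea of tracking sub-ordinals for `long`-keyed buckets can be sketched as a map from (parent bucket ordinal, key) to a dense ordinal. The class and method names below are hypothetical, and a plain dict is used where a real implementation would likely use something more memory efficient.

```python
class SubOrdinals:
    """Assigns dense ordinals to (owning bucket ordinal, long key) pairs."""

    def __init__(self):
        self._ords = {}

    def add(self, owning_ord, key):
        # Returns the existing ordinal for a known pair, or the next free one.
        return self._ords.setdefault((owning_ord, key), len(self._ords))

    def size(self):
        return len(self._ords)
```

With such a structure a single aggregator can serve every parent bucket: the same key under two different parents gets two different ordinals, so their sub-aggregation state stays separate without allocating an aggregator per bucket.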
Armin Braun 0a254cf223
Serialize Monitoring Bulk Request Compressed (#56410) (#56442)
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
2020-05-08 23:16:07 +02:00
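The approach in the commit above (hold the bulk body compressed on heap, but still know its serialized length, then decompress while streaming) can be sketched with Python's `zlib`. The function names and the newline-delimited JSON body are assumptions for illustration; the commit does not say which compression scheme is used.

```python
import json
import zlib


def serialize_compressed(docs):
    """Return (uncompressed length, compressed bytes) for a bulk body."""
    compressor = zlib.compressobj()
    parts, length = [], 0
    for doc in docs:
        line = (json.dumps(doc) + "\n").encode("utf-8")
        length += len(line)  # the length the HTTP request must announce
        parts.append(compressor.compress(line))
    parts.append(compressor.flush())
    return length, b"".join(parts)


def stream_decompressed(compressed, chunk_size=8192):
    """Yield the original bytes piece by piece, e.g. to an HTTP connection."""
    decompressor = zlib.decompressobj()
    for start in range(0, len(compressed), chunk_size):
        out = decompressor.decompress(compressed[start:start + chunk_size])
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
```

Only one document line is held uncompressed at a time during serialization, and only one chunk during streaming, which is where the memory saving comes from.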
Dimitris Athanasiou 44ffa388ac
[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423) (#56428)
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.

This commit fixes this by using the default timeout instead.

Backport of #56423
2020-05-08 21:12:11 +03:00