OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Roberts	270a23e422	[TEST] Fix log tail mocking in native process unit tests (#56804 ) This is a followup to #56632. Tests that had to be changed to mock the C++ log handler more accurately need to be more careful about when that stream ends, as ending of that stream is used to detect crashes in the production system. Fixes #56796	2020-05-15 12:46:37 +01:00
Alan Woodward	d33d13f2be	Simplify generics on Mapper.Builder (#56747 ) Mapper.Builder currently has some complex generics on it to allow fluent builder construction. However, the second parameter, a return type from the build() method, is unnecessary, as we can use covariant return types. This commit removes this second generic parameter.	2020-05-15 12:14:49 +01:00
Yang Wang	c66e7ecbfe	Fix test failure of file role store auto-reload (#56398 ) (#56802 ) Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.	2020-05-15 15:10:45 +10:00
Ryan Ernst	9fb80d3827	Move publishing configuration to a separate plugin (#56727 ) This is another part of the breakup of the massive BuildPlugin. This PR moves the code for configuring publications to a separate plugin. Most of the time these publications are jar files, but this also supports the zip publication we have for integ tests.	2020-05-14 20:23:07 -07:00
Tal Levy	5e90ff32f7	Add Normalize Pipeline Aggregation (#56399 ) (#56792 ) This aggregation will perform normalizations of metrics for a given series of data in the form of bucket values. The aggregation supports the following normalizations: - rescale 0-1 - rescale 0-100 - percentage of sum - mean normalization - z-score normalization - softmax normalization. The normalization to use is specified in the normalize agg's `normalizer` field. For example: ``` { "normalize": { "buckets_path": <>, "normalizer": "percent" } } ```	2020-05-14 17:40:15 -07:00
Mark Vieira	0fd756d511	Enforce strict license distribution requirements (#56642 )	2020-05-14 13:57:56 -07:00
Costin Leau	6f4af43405	EQL: Skip execution for filters with empty results (#56718 ) Optimize away event queries and joins/sequences that cannot match any results, without having to query the backend. (cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)	2020-05-14 22:38:23 +03:00
Mark Tozzi	b718193a01	Clean up DocValuesIndexFieldData (#56372 ) (#56684 )	2020-05-14 12:42:37 -04:00
Dimitris Athanasiou	ac5902624c	[7.x][ML] Improve error upon DF analytics mappings conflict (#56700 ) (#56776 ) Adds the conflicting types and an example of an index which specifies them in order to make it easier for the user to understand the conflict. Backport of #56700	2020-05-14 19:16:10 +03:00
Jim Ferenczi	fb5e6329b7	Stop/Start async search maintenance service in tests (#56673 ) This change ensures that the maintenance service that is responsible for deleting expired responses is stopped between tests. This is needed since we check that no search contexts are in flight after each test method. Fixes #55988	2020-05-14 15:13:01 +02:00
David Turner	bec6821fe6	AwaitsFix for #56755	2020-05-14 11:46:05 +01:00
Alexander Reelsen	3a263d91f6	Ensure watcher email action message ids are always unique (#56574 ) If an email action is used in a foreach loop, message ids could be duplicated, and the duplicates are then rejected by the mail server. This commit introduces an additional static counter in the email action to ensure that every message id is unique.	2020-05-14 10:36:00 +02:00
Przemysław Witek	98fbd85290	[7.x] Add scope-related fields to Annotation (#56417 ) (#56681 )	2020-05-14 10:23:13 +02:00
Andrei Stefan	ddf4e47e86	EQL: fix QueryFolderOkTests (#56714 ) (#56728 ) (cherry picked from commit 8b21ccd0eac3b3d0fbd090152b3dff6ae5217b52)	2020-05-14 10:58:25 +03:00
David Roberts	3051c37f92	[ML] Tail the C++ logging pipe before connecting other pipes (#56701 ) Prior to this change the named pipes that connect the ML C++ processes to the Elasticsearch JVM were all opened before any of them were read from or written to. This created a problem: if the C++ process logged more messages between opening the log pipe and opening the last pipe to be connected than there was space for in the named pipe's buffer, then the C++ process would block. It would then never get as far as opening the last named pipe, so the JVM would never get as far as reading from the log pipe, resulting in a deadlock. This change alters the connection order so that the JVM starts reading from the logging pipe immediately after opening it. If the C++ process logs messages while opening the other named pipes, they are captured in a timely manner and there is no danger of a deadlock. Backport of #56632	2020-05-14 07:10:30 +01:00
Aleksandr Maus	87a10806ab	EQL: Fix cidrMatch function failing to match when used in scripts (#56246 ) (#56735 ) Addresses https://github.com/elastic/elasticsearch/issues/55709	2020-05-13 22:41:24 -04:00
Nik Everett	b98b260048	Merge significant_terms into the terms package (backport of #56699 ) (#56715 ) This merges the code for the `significant_terms` agg into the package for the `terms` agg. They are already heavily entangled; this mostly just admits that to ourselves. Precondition for the terms work in #56487	2020-05-13 17:36:21 -04:00
Ross Wolf	61e2cf89b5	EQL: Add number function (#55084 ) * EQL: Add number function * EQL: Fix the locale used for number for deterministic functionality * EQL: Add more ToNumber tests * EQL: Add more number ToNumberProcessor unit tests * EQL: Remove unnecessary overrides, fix processor methods * EQL: Remove additional unnecessary overrides * EQL: Lint fixes for ToNumber * EQL: ToNumber renames from PR feedback * EQL: Remove NumberFormat locale handling * EQL: Removed NumberFormat from ToNumber * EQL: Add number function tests * EQL: ToNumberProcessorTests formatting * EQL: Remove newline in ToNumberProcessorTests * EQL: Add number(..., null) test * EQL: Create expression.function.scalar.math package * EQL: Remove painless whitespace for ToNumber.asScript * EQL: Add Long support	2020-05-13 14:09:06 -06:00
Costin Leau	9f1ecd52eb	EQL: Introduce support for sequences (#56300 ) Initial support for EQL sequences. The current algorithm is focused on correctness and does not contain any optimizations; those are left for the future. The current implementation uses a state machine approach that moves in ascending order and runs each query one after the other, computing sequences as the data comes in. For each result, the key and its timestamp are extracted and then used for matching/building a sequence. (cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)	2020-05-13 15:42:31 +03:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Marios Trivyzas	cbbbd499bf	SQL/EQL: Add support for scalars within LIKE/RLIKE (#56495 ) (#56674 ) - Add support for scalar functions on the field of SQL's LIKE/RLIKE - Add support for scalar functions on the field of EQL's match/matchLite Closes: #55058 (cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)	2020-05-13 13:40:24 +02:00
Luca Cavanna	30e9a1b8c7	Improve error handling when decoding async execution ids (#56285 ) When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]" instead. Added test coverage for a couple of these scenarios and also added tests for equals/hashcode methods.	2020-05-13 12:26:17 +02:00
Marios Trivyzas	e781193cf9	SQL: Fix JDBC url pattern in docs and error message (#56612 ) The url pattern in the docs used `*`, which means zero or more, instead of `?`, which means zero or one. The url pattern returned in error messages was not in sync with the one in the docs. Fixes: #56476 (cherry picked from commit 1a5945c3962cdda21482f4b0b3e0ca508534c2c4)	2020-05-13 12:13:58 +02:00
David Turner	c10b4ae15a	Support cloning of searchable snapshot indices (#56595 ) Today you can convert a searchable snapshot index back into a regular index by restoring the underlying snapshot, but this is somewhat wasteful if the shards are already in cache since it copies the whole index from the repository again. Instead, we can make use of the locally-cached data by using the clone API to copy the contents of the cache into the layout expected by a regular shard. This commit marks the searchable snapshot's private index settings as `NotCopyableOnResize` so that they are removed by resize operations such as cloning. Cloning a regular index typically hard-links the underlying files rather than copying them, but this is tricky to support in the case of a searchable snapshot so this commit takes the simpler approach of always copying the underlying files.	2020-05-13 11:05:14 +01:00
Ioannis Kakavas	cc119c3853	Expose idp.metadata.http.refresh for SAML realm (#56354 ) (#56593 ) This setting was not returned in the SamlRealmSettings#getSettings so it was not possible for users to set this in the realm config in our configuration.	2020-05-13 11:51:18 +03:00
Jake Landis	a010f4f624	[7.x] Watcher don't add watches post index if stopped (#56556 ) (#56629 ) Watcher adds watches to the trigger service on the postIndex action for the .watches index. This has the (intentional) side effect of also adding the watches to the stats. The tests rely on these stats for their assertions. The tests also start and stop Watcher between tests for a clean slate. When Watcher executes, it updates the .watches index, and upon this update it goes through the postIndex method and ends up adding that watch to the trigger service (and stats). Functionally this is not a problem if Watcher is stopping or stopped, since Watcher is also paused and will not execute the watch. However, with specific timing, this breaks the tests' expectation of a clean slate and causes the test assertions against the stats to fail. This commit ensures that the postIndex action only adds to the trigger service if the Watcher state is not stopping or stopped. When started back up, Watcher re-reads the .watches index. This commit also un-mutes the tests related to #53177 and #56534	2020-05-12 16:30:27 -05:00
Jake Landis	9c76ee47c4	[7.x] json spec: allow null for documentation url (#55749 ) (#56625 ) This commit allows the JSON schema's documentation.url property to have a null value. This can be useful for cases where a feature is under development and does not have documentation published yet. This commit also adds a documentation.url for two ml resources.	2020-05-12 14:49:02 -05:00
Armin Braun	0a879b95d1	Save Bounds Checks in BytesReference (#56577 ) (#56621 ) Two spots that allow for some optimization: * We are often creating a composite reference of just a single item in the transport layer => special cased via static constructor to make sure we never do that * Also removed the pointless case of an empty composite bytes ref * `ByteBufferReference` is practically always created from a heap buffer these days, so there is no point in dealing with all the bounds checks and extra references to sliced buffers from that, and we can just use the underlying array directly	2020-05-12 20:33:45 +02:00
Armin Braun	c104c9a11b	Fix Missing IgnoredUnavailable Flag in 7.x SLM Retention Task (#56616 ) Without the flag, a repository broken by some old 6.x version of ES (one that is missing some snap-${uuid}.dat blobs) causes the SLM retention task to fail, since it always errors out.	2020-05-12 18:07:58 +02:00
Marios Trivyzas	4240b97d0e	SQL: [Test] Fix JdbcPreparedStatement date test Use `ORDER BY` to ensure the order of the rows, since more than one row is returned in testDate(). Follows: #56492 (cherry picked from commit 0053a1cb515b4db160d7b0bed5cf3f13c1050687)	2020-05-12 17:08:16 +02:00
Martijn van Groningen	0c61bc63e4	Backport: auto create data streams using index templates v2 (#56596 ) Backport: #55377 This commit adds the ability to auto create data streams using index templates v2. Index templates (v2) now have a data_stream field that includes a timestamp field. If it is provided and an index name matches that template, then a data stream (plus its first backing index) is auto created. Relates to #53100	2020-05-12 17:01:15 +02:00
Andrei Stefan	f0074e93a0	QL: case sensitive support in EQL (#56404 ) (#56597 ) * QL: case sensitive support in EQL (#56404) * adds a generic startsWith function to QL * modifies the existing EQL startsWith function to be case-sensitivity aware * improves the existing EQL startsWith function to use a prefix query when the function is used in a case sensitive context. The same improvement is used in SQL's newly added STARTS_WITH function. * adds case sensitivity to EQL configuration through a case_sensitive parameter in the eql request, as established in #54411. The case_sensitive parameter can be specified when running queries (default is case insensitive) (cherry picked from commit ee5a09ea840167566e34c28c8225dc38bc6a7ae8)	2020-05-12 16:56:18 +03:00
Hendrik Muhs	a9425a0240	[7.x][Transform] fix count when matching exact ids (#56544 ) (#56582 ) fix count in get and get stats if explicit ids are given and ids might be duplicated when configurations are stored in different index (versions). fixes #56196	2020-05-12 14:23:13 +02:00
Marios Trivyzas	575cafb8da	SQL: Fix serialization of JDBC prep statement date/time params (#56492 ) (#56579 ) The date/time-related query params of a JDBC prepared statement were serialized using java.util.Date. The rules for serializing `java.util.Date` objects, though, reside in `XContentElasticsearchExtension`, which is not available in the jdbc jar as this class is in the `server` module. Therefore, a custom extension of the `XContentBuilderExtension` iface has been added to the jdbc module/jar. Moreover, the SQL `qa` project depended on the `sql-action` module, which depends on `server`, so the `XContentBuilderExtension` was available for the integ tests, hiding the real problem. Previously, when a user set a `java.sql.Time` on the prepStmt, the DataType used was `DATETIME` instead of `TIME`, which prevented filtering on a `TIME`-cast field: ``` SELECT * FROM test WHERE date::TIME = ? ``` Fixes: #56084 (cherry picked from commit f8d8e971bd2c85fa4aea44b5b3ba0cdcc950a4ed)	2020-05-12 13:25:02 +02:00
Martijn van Groningen	2e86801f61	Backport: enable searchable snapshots feature flag for xpack rest tests. Backport of: #56569 A data stream test, which tests data stream resolvability in xpack apis, failed in release builds. An invocation of a searchable snapshot api failed because the corresponding feature flag wasn't enabled for xpack rest tests. Closes #56531	2020-05-12 12:18:24 +02:00
Ignacio Vera	222ee721ec	Add moving percentiles pipeline aggregation (#55441 ) (#56575 ) Similar to what the moving function aggregation does, except it merges windows of percentile sketches together instead of cumulatively merging final metrics	2020-05-12 11:35:23 +02:00
Marios Trivyzas	5c0f26de1d	SQL: [Docs] Fix example for DATETIME_PARSE (#56409 ) When no timezone is specified the session timezone is used without conversion, fix the docs test accordingly. Follows: #56158 (cherry picked from commit 4b79b19ea5c3d17e05cb8130f3c754ac9bfd2382)	2020-05-12 09:23:00 +02:00
Ryan Ernst	902fc546bd	Migrate remaining ESIntegTestCases to internalClusterTest (#56479 ) (#56563 ) This commit migrates the ESIntegTestCase tests in x-pack to the internalClusterTest source set.	2020-05-11 21:06:04 -07:00
Nick Knize	9b64149ad2	[Geo] Refactor Point Field Mappers (#56060 ) (#56540 ) This commit refactors the following: * GeoPointFieldMapper and PointFieldMapper to AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper. * .setupFieldType moved up to AbstractGeometryFieldMapper * lucene indexing moved up to AbstractGeometryFieldMapper.parse * new addStoredFields, addDocValuesFields abstract methods for implementing stored field and doc values field indexing in the concrete field mappers This refactor is the next phase for setting up a framework for extending spatial field mapper functionality in x-pack.	2020-05-11 17:11:36 -05:00
Tim Brooks	760ab726c2	Share netty event loops between transports (#56553 ) Currently Elasticsearch creates independent event loop groups for each transport type (http and internal). This is unnecessary and can lead to contention when different threads access shared resources (ex: allocators). This commit moves to a model where, by default, the event loops are shared between the transports. The previous behavior can be attained by specifically setting the http worker count.	2020-05-11 15:43:43 -06:00
Benjamin Trent	1d6b2f074e	[Transform] adds geotile_grid support in group_by (#56514 ) (#56549 ) This adds support for grouping by geo points. This uses the agg [geotile_grid](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geotilegrid-aggregation.html). I am opting to store the tile results of group_by as a `geo_shape` so that users can query the results. Additionally, the shapes could be visualized and filtered in the Kibana maps app. relates to https://github.com/elastic/elasticsearch/issues/56121	2020-05-11 17:02:40 -04:00
Lee Hinman	1337b35572	Remove prefer_v2_templates query string parameter (#56545 ) This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that allowed specifying whether V1 or V2 templates should be used when an index is created. It has been removed in favor of V2 templates always having priority. Relates to #53101 Resolves #56528 This is not a breaking change because this flag was never in a released version.	2020-05-11 14:56:42 -06:00
zhenxianyimeng	8e96e5c936	Use CollectionUtils.isEmpty where appropriate (#55910 ) This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.	2020-05-11 09:55:57 -07:00
Armin Braun	3ab6eba6bc	Fix RollupJobTaskTests Leaking Threads on Slowness (#56438 ) (#56518 ) We are ensuring order in the two tests changed by waiting on latches. The problem is that 3s is a pretty short wait and on CI can randomly be exceeded by pure chance. If that happened we wouldn't have visibility into it, since we didn't assert that the waits actually worked. => Fixed by asserting that the waits work and upping the timeout to our standard 10s. Also, moved to a per-test threadpool to make it simpler to identify which test failed, should an unexpected task run on a closed client's pool after all.	2020-05-11 17:24:10 +02:00
Jim Ferenczi	02ab9112a9	Fix spurious failures in AsyncSearchIntegTestCase (#56026 ) Async search integration tests are subject to random failures when: * The test index has more than one replica. * The request cache is used. * Some shards are empty. * The maintenance service starts a garbage collection when a node is closing. They are also slow because the test index is created/populated on each test method. This change refactors these integration tests in order to: * Create the index once for the entire test suite. * Fix the usage of the request cache and replicas. * Ensure that all shards have at least one document. * Increase the delay of the maintenance service garbage collection. Closes #55895 Closes #55988	2020-05-11 15:03:03 +02:00
Martijn van Groningen	9ae09570d8	Allow a number of broadcast transport actions to resolve data streams (#55726 ) (#56502 ) Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction to be able to resolve data streams by default. Implementations can change this ability. This change allows the following APIs to resolve data streams: flush, refresh (already supported data streams), force merge, clear indices cache, indices stats (already supported data streams), segments, upgrade stats, upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and reload analyzers APIs. Relates to #53100	2020-05-11 12:48:35 +02:00
Nik Everett	2f38aeb5e2	Save memory when numeric terms agg is not top (#55873 ) (#56454 ) Right now all implementations of the `terms` agg allocate a new `Aggregator` per bucket. This uses a bunch of memory. Exactly how much isn't clear, but each `Aggregator` ends up making its own objects to read doc values which have non-trivial buffers. And it forces all of its sub-aggregations to do the same. We allocate a new `Aggregator` per bucket for two reasons: 1. We didn't have an appropriate data structure to track the sub-ordinals of each parent bucket. 2. You can only make a single call to `runDeferredCollections(long...)` per `Aggregator` which was the only way to delay collection of sub-aggregations. This change switches the method that builds aggregation results from building them one at a time to building all of the results for the entire aggregator at the same time. It also adds a fairly simplistic data structure to track the sub-ordinals for `long`-keyed buckets. It uses both of those to power numeric `terms` aggregations and removes the per-bucket allocation of their `Aggregator`. This fairly substantially reduces memory consumption of numeric `terms` aggregations that are not the "top level", especially when those aggregations contain many sub-aggregations. It also is a pretty big speed up, especially when the aggregation is under a non-selective aggregation like the `date_histogram`. I picked numeric `terms` aggregations because those have the simplest implementation. At least, I could kind of fit it in my head. And I haven't fully understood the "bytes"-based terms aggregations, but I imagine I'll be able to make similar optimizations to them in follow up changes.	2020-05-08 20:38:53 -04:00
Armin Braun	0a254cf223	Serialize Monitoring Bulk Request Compressed (#56410 ) (#56442 ) Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB) buffer usage for bulk exports in some cases which destabilizes master nodes. Since we need to know the serialized length of the bulk body we can't do the serialization in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead but for large requests this reduces the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.	2020-05-08 23:16:07 +02:00
Dimitris Athanasiou	44ffa388ac	[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423 ) (#56428 ) We have been using a zero timeout in the case that DF analytics is stopped. This may cause a timeout when we cancel, for example, the reindex task. This commit fixes this by using the default timeout instead. Backport of #56423	2020-05-08 21:12:11 +03:00
David Roberts	9a3924a641	[ML] Adjust list of platforms that have ML native code (#56426 ) Native code is now available for linux-aarch64. Note that it is _not_ currently supported!	2020-05-08 16:22:45 +01:00
Dimitris Athanasiou	c117ae7a6e	[7.x][ML] Force stopping stopped DF analytics should succeed (#56421 ) (#56424 ) Force stopping a DF analytics job whose config exists and that is stopped should succeed. This was broken by #56360. Closes #56414 Backport of #56421	2020-05-08 18:04:24 +03:00
Tanguy Leroux	8e9b69bfd7	Use snapshot information to build searchable snapshot store MetadataSnapshot (#56289 ) (#56403 ) While investigating possible optimizations to speed up searchable snapshot shard restores, we noticed that Elasticsearch builds the list of shard files on local disk in order to compare it with the list of files contained in the snapshot to restore. This list of files is materialized with a MetadataSnapshot object whose construction involves reading the footer checksum of every file of the shard using the Store.checksumFromLuceneFile() method. Further investigation shows that a MetadataSnapshot object is also created for other types of operations, like building the list of files to recover in a peer recovery (and primary shard relocation) or assigning a shard to a node. These operations use the Store.getMetadata(IndexCommit) method to build the list of files and checksums. In the case of searchable snapshots, building the MetadataSnapshot object can potentially trigger cache misses, which in turn can cause the last range of the file to be downloaded and written to the cache just to check the 16-byte footer. This in turn can cause more evictions. Since searchable snapshots already contain the footer information of every file in BlobStoreIndexShardSnapshot, they can read the checksum directly from it and avoid using the cache at all when creating a MetadataSnapshot for the operations mentioned above. This commit adds a shortcut to the SearchableSnapshotDirectory.openInput() method, similar to what already exists for segment infos, so that it creates a specific IndexInput for checksum reading operations.	2020-05-08 14:16:19 +02:00
Dimitris Athanasiou	60b1c67409	[7.x][ML] Allow stopping DF analytics whose config is missing (#56360 ) (#56408 ) It is possible that the config document for a data frame analytics job is deleted from the config index. If that is the case the user is unable to stop a running job because we attempt to retrieve the config and that will throw. This commit changes that. When the request is forced, we do not expand the requested ids based on the existing configs but from the list of running tasks instead. Backport of #56360	2020-05-08 13:54:44 +03:00
Dimitris Athanasiou	d064eda2b0	[7.x][ML] Ensure phase progress may only increase (#56339 ) (#56357 ) Due to multi-threading it is possible that phase progress updates written from the c++ process arrive reordered. We can address this by ensuring that progress may only increase. Closes #56282 Backport of #56339	2020-05-07 19:46:58 +03:00
William Brafford	691044e67b	Add xpack setting deprecations to deprecation API (#56290 ) * Add xpack setting deprecations to deprecation API The deprecated settings showed up in the deprecation log file by default, but I did not add them to the deprecation API. This commit fixes that. Now if you use one of the deprecated basic feature enablement settings, calling _monitoring/deprecations will inform you of that fact. * Remove incorrectly backported settings documents It seems that I backported these docs to the wrong place in #56061, in #55980, and in #56167. I hope they're in the right place now. Co-authored-by: debadair <debadair@elastic.co>	2020-05-07 10:28:17 -04:00
Nik Everett	e35919d3b8	Optimize date_histograms across daylight savings time (backport of #55559 ) (#56334 ) Rounding dates on a shard that contains a daylight savings time transition is currently something like 1400% slower than when a shard contains dates only on one side of the DST transition. And it makes a ton of short-lived garbage. This replaces that implementation with one that benchmarks to having around 30% overhead instead of the 1400%. And it doesn't generate any garbage per search hit. Some background: There are two ways to round in ES: * Round to the nearest time unit (Day/Hour/Week/Month/etc) * Round to the nearest time interval (3 days/2 weeks/etc) I'm only optimizing the first one in this change and plan to do the second in a follow up. It turns out that rounding to the nearest unit really is two problems: when the unit rounds to midnight (day/week/month/year) and when it doesn't (hour/minute/second). Rounding to midnight is consistently about 25% faster than rounding to individual hours or minutes. This optimization relies on being able to usually figure out what the minimum and maximum dates are on the shard. This is similar to an existing optimization where we rewrite time zones that aren't fixed (think America/New_York and its daylight savings time transitions) into fixed time zones so long as there isn't a daylight savings time transition on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement time interval rounding, the time zone rewriting optimization should no longer be needed. This optimization doesn't come into play for `composite` or `auto_date_histogram` aggs because neither has been migrated to the new `DATE` `ValuesSourceType`, which is where that range lookup happens. When they are, they will be able to pick up the optimization without much work. I expect this to be substantial for `auto_date_histogram` but less so for `composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings time transitions. That overhead grows logarithmically with the number of transitions. When there are two thousand years' worth of transitions my algorithm ends up being 250% slower than rounding without a time zone, but java time is 47000% slower at that point, allocating memory as fast as it possibly can.	2020-05-07 09:10:51 -04:00
Tanguy Leroux	6233e32ab3	Fix SearchableSnapshotDirectoryTests.testIndexSearcher() (#56275 ) Closes #56233	2020-05-07 11:12:35 +02:00
Tanguy Leroux	65a061e33a	Fix SearchableSnapshotDirectoryTests.testClearCache (#56277 ) This test sometimes fails when prewarming is enabled because it's possible that some files are cached in the background while the test tries to clear the cache. This commit disables prewarming for this test.	2020-05-07 10:59:33 +02:00
Andrei Stefan	980f175222	EQL: simplify equals/not-equals TRUE/FALSE expressions (#56191 ) (#56306 ) * Simplify equals/not-equals TRUE/FALSE expressions, by returning them as is (TRUE variant) or negating them (FALSE variant) (cherry picked from commit 17858afbe6da5fa0b3ecfc537cabb337e4baaffe)	2020-05-07 03:02:04 +03:00
Jason Tedor	33669c0420	Upgrade to Jackson 2.10.4 (#56188 ) Another Jackson release is available. There are some CVEs addressed, none of which impact us, but since we can now bump Jackson easily, let us move along with the train to avoid the false positives from security scanners.	2020-05-06 17:20:23 -04:00
Przemysław Witek	0cd0ab276e	Introduce Annotation.Builder class and use it to create instances of Annotation class (#56276 ) (#56286 )	2020-05-06 20:47:03 +02:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Dimitris Athanasiou	011e995165	[7.x][ML] Unmute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange (#56268 ) (#56287 ) Closes #56240	2020-05-06 18:20:46 +03:00
Luca Cavanna	9a9cb68e83	Async Search: correct shards counting (#55758 ) Async search allows users to retrieve partial results for a running search. For partial results, the number of successful shards does not include the skipped shards, while the response returned to users should. Also, we recently had a bug where async search would miss tracking shard failures, which would have been caught if we had assertions in place that verified that whenever we get the last response, the number of failures included in it is the same as the failures that were tracked through the listener notifications.	2020-05-06 12:13:30 +02:00
Tanguy Leroux	07ad742b60	Enable prewarming by default for searchable snapshots (#56201 ) Now that searchable snapshot directories respect the repository rate limitations (#55952), we can enable prewarming by default for shards.	2020-05-06 10:18:34 +02:00
Tanguy Leroux	131a3911eb	Replace BlobContainerWrapper by FilterBlobContainer (#56200 ) A FilterBlobContainer class was introduced in #55952 and it delegates its behavior to a given BlobContainer while allowing to override only necessary methods. This commit replaces the existing BlobContainerWrapper class from the test framework with the new FilterBlobContainer from core.	2020-05-06 10:05:43 +02:00
Julie Tibshirani	bd7a2d2b01	Mute the geogrid agg circuit breaker tests.	2020-05-05 18:09:07 -07:00
Jake Landis	a22690c9ca	[7.x] Ensure that the monitoring export exceptions are logged. (#56237 ) (#56251 ) If an exception occurs while flushing a bulk request, the cause of the exception can be lost. This commit ensures that the cause of the exception is carried forward and gets logged.	2020-05-05 19:24:26 -05:00
Julie Tibshirani	49de092b38	Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet.	2020-05-05 16:25:36 -07:00
Bogdan Pintea	47250b14a4	SQL: Add BigDecimal support to JDBC (#56015 ) (#56220 ) * SQL: Add BigDecimal support to JDBC (#56015) * Introduce BigDecimal support to JDBC -- fetching This commit adds support for the getBigDecimal() methods. * Allow BigDecimal params in double range A prepared statement will now accept a BigDecimal parameter as a proxy for a double, if the conversion is lossless. (cherry picked from commit e9a873ad7f387682e3472110b1d7c0514bd347c9) * Fix compilation error: diamond notation with anonymous inner classes is not available in Java 8.	2020-05-05 23:19:36 +02:00
Bogdan Pintea	f159fd8a20	Fix test on incompatible client versions (#56234 ) (#56241 ) The incompatible client version test is changed to: - iterate over all versions prior to the allowed ones; - format the exception message just as the server does. The defect stemmed from the fact that clients do not send a version's qualifier, only major.minor.revision, so the raised error/exception message won't contain it, while the test expected it. (cherry picked from commit 4a81c8f7a1f4573e3be95f346d9fb18772b297ee)	2020-05-05 23:18:29 +02:00
Julie Tibshirani	63062ec7bd	Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange.	2020-05-05 13:48:35 -07:00
Dan Hermann	6674f14fb3	[7.x] Get index includes parent data stream for backing indices (#56238 )	2020-05-05 15:43:42 -05:00
Benjamin Trent	e1c5ca421e	[7.x] [ML] lay ground work for handling >1 result indices (#55892 ) (#56192 ) * [ML] lay ground work for handling >1 result indices (#55892) This commit removes all but one reference to `getInitialResultsIndexName`. This is to support more than one result index for a single job.	2020-05-05 15:54:08 -04:00
Julie Tibshirani	793f265451	Mute SearchableSnapshotDirectoryTests.testIndexSearcher.	2020-05-05 12:29:05 -07:00
Ross Wolf	389082033e	EQL: Add concat function (#55193 ) * EQL: Add concat function * EQL: for loop spacing for concat * EQL: return unresolved arguments to concat early * EQL: Add concat integration tests * EQL: Fix concat query fail test * EQL: Add class for concat function testing * EQL: Add concat integration tests * EQL: Update concat() null behavior	2020-05-05 12:53:34 -06:00
Bogdan Pintea	23c35e32f2	SQL: introduce a query builder for the Rest tests (#55094 ) (#56221 ) * Introduce a query builder for the rest tests The new BaseRestSqlTestCase.RequestObjectBuilder class is a helper class to build REST request objects for the tests. Consequently, "manual" string concatenation to form JSON is done away with. The class mimics the SqlQueryRequestBuilder API. (cherry picked from commit c8363f04c029542c233a758e9286d33c51d9c0c4)	2020-05-05 18:55:41 +02:00
Tal Levy	e4f2c3105d	Add geo_shape support for geotile_grid and geohash_grid (#55966 ) (#56228 ) This commit adds aggregation support for the geo_shape field type on geo*_grid aggregations. It introduces a Tiler for both tiles and hashes that enables a new type of ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible to re-use the existing Aggregator, so no new implementations of the grid aggregators are added.	2020-05-05 09:54:14 -07:00
Benjamin Trent	641f598364	[Transform] fixes http status code when bad scripts are provided (#56117 ) (#56219 ) Transforms should propagate the search execution exception, if one is returned, when they run the test query. This allows transforms to return a `4xx` when the aggs are malformed but parseable. closes https://github.com/elastic/elasticsearch/issues/55994	2020-05-05 12:36:22 -04:00
Bogdan Pintea	0e5632dc3a	SQL: relax version lock between server and clients (#56148 ) (#56223 ) * Relax version lock between ES/SQL and clients Allow clients older than the server to connect, provided they are at or past a certain minimum release. (cherry picked from commit 108f907297542ce649aa7304060aaf0a504eb699)	2020-05-05 18:27:06 +02:00
William Brafford	3499fa917c	Deprecated xpack "enable" settings should be no-ops (#55416 ) (#56167 ) The following settings are now no-ops: * xpack.flattened.enabled * xpack.logstash.enabled * xpack.rollup.enabled * xpack.slm.enabled * xpack.sql.enabled * xpack.transform.enabled * xpack.vectors.enabled Since these settings no longer need to be checked, we can remove settings parameters from a number of constructors and methods, and do so in this commit. We also update documentation to remove references to these settings.	2020-05-05 10:40:49 -04:00
Tanguy Leroux	b9636713b1	Searchable Snapshots should respect max_restore_bytes_per_sec (#55952 ) (#56199 ) This commit changes searchable snapshots so that it now respects the repository's max_restore_bytes_per_sec setting when it downloads blobs. Backport of #55952 for 7.x	2020-05-05 15:43:06 +02:00
David Roberts	7aa0daaabd	[7.x][ML] More advanced model snapshot retention options (#56194 ) This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Backport of #56125	2020-05-05 14:31:58 +01:00
David Turner	40ea0eabd9	Forbid snapshot access on applier thread (#56044 ) This commit strengthens the assertion about which threads may access a blob store to exclude the cluster applier thread, since we no longer need to do so. Relates #50999	2020-05-05 13:27:55 +01:00
Dimitris Athanasiou	2d7899c83c	[7.x][ML] Adjust DF Analytics process phases (#56107 ) (#56177 ) As of elastic/ml-cpp#1179, the analytics process reports phases depending on the analysis type. This commit adjusts the phases of current analyses from `analyzing` to the following: - outlier_detection: [`computing_outlier`] - regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`] Backport of #56107	2020-05-05 15:00:07 +03:00
Dimitris Athanasiou	75dadb7a6d	[7.x][ML] Add loss_function to regression (#56118 ) (#56187 ) Adds parameters `loss_function` and `loss_function_parameter` to regression. Backport of #56118	2020-05-05 14:59:51 +03:00
Hendrik Muhs	e177a38504	[7.x][Transform] add throttling (#56007 ) (#56184 ) Add throttling to transforms. Throttling slows down search requests by delaying their execution based on a documents-per-second metric. fixes #54862	2020-05-05 13:09:02 +02:00
Marios Trivyzas	363e994171	SQL: Fix DATETIME_PARSE behaviour regarding timezones (#56158 ) (#56182 ) Previously, when the timezone was missing from the datetime string and the pattern, UTC was used instead of the session-defined timezone. Moreover, if a timezone was included in the datetime string and the pattern, then that timezone was used. For consistent behaviour, the resulting datetime is now always converted to the session-defined timezone, e.g.: ``` SELECT DATETIME_PARSE('2020-05-04 10:20:30.123 +02:00', 'uuuu-MM-dd HH:mm:ss.SSS VV') AS datetime; ``` with `time_zone` set to `-03:00` will result in ``` 2020-05-04T05:20:30.123-03:00 ``` Follows: #54960 (cherry picked from commit 8810ed03a209cc8fe1bad309a81e85b56a39da27)	2020-05-05 12:08:39 +02:00
Tanguy Leroux	f717830563	Use workers to warm cache parts (#55793 ) (#56181 ) Today the cache prewarming introduced in #55322 works by enqueuing all the file parts to warm at once in the searchable_snapshots thread pool. To make this fairer among concurrent warmings, this commit starts workers that concurrently poll file parts to warm from a queue, warm each part, and then immediately schedule another warming execution. This should leave more room for concurrent shard warming to sneak in and be executed. Relates #55322	2020-05-05 11:48:06 +02:00
Tanguy Leroux	35622747fd	Add Minio tests for searchable snapshots (#56112 ) (#56179 ) This commit adds QA tests for searchable snapshots on MinIO, similar to those that already exist for S3, GCS and Azure.	2020-05-05 11:40:06 +02:00
Marios Trivyzas	cc21468559	SQL: Fix issue with date range queries and timezone (#56115 ) (#56174 ) Previously, the timezone parameter was not passed to the RangeQuery; as a result, queries that use the ES date math notation (now, now-1d, now/d, now/h, now+2h, etc.) used the UTC timezone instead of the one passed through the "timezone"/"time_zone" JDBC/REST params. As a consequence, the dates defined with date math were always interpreted in UTC, possibly leading to incorrect results for queries like: ``` SELECT * FROM t WHERE date BETWEEN now-1d/d AND now/d ``` Fixes: #56049 (cherry picked from commit 300f010c0b18ed0f10a41d5e1606466ba0a3088f)	2020-05-05 10:54:23 +02:00
Dimitris Athanasiou	6061aa3db4	[7.x][ML] Fix race condition updating reindexing progress (#56135 ) (#56146 ) In #55763 I thought I could remove the flag that marks reindexing as finished on a data frame analytics task. However, that exposed a race condition. Between updating reindexing progress to 100 because we have called `DataFrameAnalyticsManager.startAnalytics()` and a call to the _stats API, which updates reindexing progress via the method `DataFrameAnalyticsTask.updateReindexTaskProgress()`, we may end up overwriting the 100 with a lower progress value. This commit fixes this issue by bringing back the `isReindexingFinished` flag as it was prior to #55763. Closes #56128 Backport of #56135	2020-05-05 10:48:42 +03:00
Albert Zaharovits	e8763bad41	Let realms gracefully terminate the authN chain (#55623 ) AuthN realms are ordered as a chain so that the credentials of a given user are verified in succession. Upon the first successful verification, the user is authenticated. Realms do, however, have the option to cut this iterative process short when the credentials don't verify and the user cannot exist in any other realm. This mechanism is currently used by the Reserved and the Kerberos realm. This commit improves the early termination operation by allowing realms to gracefully terminate authentication, as if the chain had been tried out completely. Previously, early termination resulted in an authentication error whose response body differed from the failed authentication outcome where no realm could verify the credentials. Reserved users are hence denied authentication in exactly the same way as other users are when no realm can validate their credentials.	2020-05-05 10:11:49 +03:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151 ) Backport of #56034. Move the includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers of IndexNameExpressionResolver can set. Also alter the indices stats API to support data streams. The rollover API uses the indices stats API, so without this change rolling over a data stream would no longer work. Relates to #53100	2020-05-04 22:38:33 +02:00
Dan Hermann	9892813842	[7.x] Delay warning about missing x-pack (#56142 ) * Delay warning about missing x-pack (#54265) Currently, when monitoring is enabled in a freshly-installed cluster, the non-master nodes log a warning message indicating that master may not have x-pack installed. The message is often printed even when the master does have x-pack installed but takes some time to set up the local exporter for monitoring. This commit adds the local exporter setting `wait_master.timeout`, which defaults to 30 seconds. The setting configures the time that the non-master nodes should wait for master to set up monitoring. After the time elapses, they log a message to the user about a possibly missing x-pack installation on master. The logging of this warning was moved from `resolveBulk()` to `openBulk()` since `resolveBulk()` is called only on cluster updates and the message might not be logged until a new cluster update occurs. Closes #40898	2020-05-04 14:16:18 -05:00
Benjamin Trent	6c26de444d	[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020 ) (#56126 ) If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam. It might lead folks to think there is an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using it, leading to more confusion. Additionally, pipelines might not be parseable in this method, as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor). closes https://github.com/elastic/elasticsearch/issues/55985	2020-05-04 13:32:01 -04:00
Martijn van Groningen	6d03081560	Add auto create action (#56122 ) Backport of #55858 to 7.x branch. Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change: * Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2), either a regular index or a data stream should be created. However, if the coordinating node is slow in processing cluster state updates, then it may be unaware of the existence of certain index templates, which can then lead to the TransportBulkAction creating an index instead of a data stream. Therefore the coordination of creating an index or data stream should occur on the master node. See #55377 * From a security perspective it is useful to know whether index creation originates from the create index api or from auto creating a new index via the bulk or index api. For example, a user could be allowed to auto create an index, but not to use the create index api. The auto create action will allow security to distinguish these two different patterns of index creation. This change adds the following new transport action: AutoCreateAction. The TransportBulkAction redirects to this action, and this action will actually create the index (instead of the TransportCreateIndexAction). Later, via #55377, the AutoCreateAction can be improved to also determine whether an index or data stream should be created. The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege. Relates to #53100	2020-05-04 19:10:09 +02:00
Julie Tibshirani	6b5cf1b031	For constant_keyword, make sure exists query handles missing values. (#55757 ) It's possible for a constant_keyword to have a 'null' value before any documents are seen that contain a value for the field. In this case, no documents have a value for the field, and 'exists' queries should return no documents.	2020-05-04 09:41:52 -07:00
Ross Wolf	6da686c7e0	EQL: Add match function implementation (#55182 ) * EQL: Add Match function * EQL: Add note about character classes * EQL: QueryFolderFailTests.java * EQL: Add match() fail tests * EQL: Add match tests and fix alias * EQL: Add match verifier failure tests * EQL: Reorder query folder fail tests	2020-05-04 09:34:20 -06:00
Dimitris Athanasiou	76fa5a2397	[7.x][ML] Improve cleanup for DF Analytics HLRC tests (#56101 ) (#56109 ) Adds the step of stopping all data frame analytics before deleting them to the cleanup of the corresponding HLRC tests. Closes #56097 Backport of #56101	2020-05-04 16:08:08 +03:00

4877 Commits