OpenSearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	c7ba79bc19	[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537 ) (#58553 ) * [ML] make waiting for renormalization optional for internally flushing job (#58537) When flushing, datafeeds only need the guarantee that the latest bucket has been handled. But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data. This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization. closes #58395	2020-06-25 12:26:52 -04:00
Jim Ferenczi	6451187e84	Filter empty fields in SearchHit#toXContent (#58418 ) This commit restores the filtering of empty fields during the xcontent serialization of SearchHit. The filtering was removed unintentionally in #41656.	2020-06-25 17:49:03 +02:00
Nik Everett	03e6d1b535	Add Variable Width Histogram Aggregation (backport of #42035 ) (#58440 ) Implements a new histogram aggregation called `variable_width_histogram` which dynamically determines bucket intervals based on document groupings. These groups are determined by running a one-pass clustering algorithm on each shard and then reducing each shard's clusters using an agglomerative clustering algorithm. This PR addresses #9572. The shard-level clustering is done in one pass to minimize memory overhead. The algorithm was lightly inspired by [this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches a small number of documents to sample the data and determine initial clusters. Subsequent documents are then placed into one of these clusters, or a new one if they are an outlier. This algorithm is described in more detail in the aggregation's docs. At reduce time, a [hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering) algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304) continually merges the closest buckets from all shards (based on their centroids) until the target number of buckets is reached. The final values produced by this aggregation are approximate. Each bucket's min value is used as its key in the histogram. Furthermore, buckets are merged based on their centroids and not their bounds. So it is possible that adjacent buckets will overlap after reduction. Because each bucket's key is its min, this overlap is not shown in the final histogram. However, when such overlap occurs, we set the key of the bucket with the larger centroid to the midpoint between its minimum and the smaller bucket’s maximum: `min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to increase the accuracy of the clustering. Nodes are unable to share centroids during the shard-level clustering phase. In the future, resolving https://github.com/elastic/elasticsearch/issues/50863 would let us solve this issue. It doesn’t make sense for this aggregation to support the `min_doc_count` parameter, since clusters are determined dynamically. The `order` parameter is not supported here to keep this large PR from becoming too complex. Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>	2020-06-25 11:40:47 -04:00
Nik Everett	c7726cc93e	Fix janky test Fixes a test that incorrectly assumed that a list of random values less than or equal to `n` always contained `n`. Oops. Closes #58353	2020-06-25 11:13:29 -04:00
Nik Everett	71adade73a	Return clear error message if aggregation type is invalid (#58255 ) (#58365 ) The main changes are: 1. Catch the `NamedObjectNotFoundException` when parsing aggregation type, and then throw a `ParsingException` with a clear error message and a hint. 2. Add a unit test method: AggregatorFactoriesTests#testInvalidType(). Closes #58146. Co-authored-by: bellengao <gbl_long@163.com>	2020-06-25 11:08:25 -04:00
Dimitris Athanasiou	c3dfafe0b4	[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541 ) (#58550 ) It is possible for the source document to have an empty string value for a field that is mapped as numeric. We should treat those as missing values and avoid throwing an assertion error. Backport of #58541	2020-06-25 18:07:29 +03:00
Dimitris Athanasiou	5af7071db0	[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546 ) This changes the default value for the results field of inference applied on models that are trained via a data frame analytics job. Previously, the results field default was `predicted_value`. This commit makes it the same as in the training job itself. The new default field is `<dependent_variable>_prediction`. Apart from making inference consistent with the training job the model came from, it is helpful to preserve the dependent variable name by default as it provides some context to the user that may avoid confusion as to which model results came from. Backport of #58538	2020-06-25 18:03:43 +03:00
David Roberts	1742b1c39e	Cancel persistent task recheck when no longer master (#58539 ) If a persistent task cannot be assigned on the first attempt then the master node will schedule periodic rechecks to see if the assignment requirements have been met. These periodic rechecks should be cancelled if the node ceases to be master. Previously they weren't, leading to exceptions being logged repeatedly. This PR cancels the rechecks on learning that the node is no longer the master. Fixes #58531	2020-06-25 15:51:57 +01:00
William Brafford	958b21d727	Enable TTY password OS tests, plus refactoring (#57759 ) (#58200 ) * Enable TTY password OS tests, plus refactoring (#57759) Two keystore tests were unintentionally ignored when the password-protected keystore work was merged. I've reënabled those tests here. I've also refactored the test methods a little bit to reduce the API surface: instead of having a "startElasticsearchTtyPassword" method and a "startElasticsearchStandardInputPassword" method, I've made a single "startElasticsearch" method with a "useTty" boolean argument. * Separate daemonization and non-daemonization case for tty Centos 6 uses a version of expect that kills the elasticsearch process when it tries to daemonize. I will fix this in future work but for now I'm replacing it with a todo.	2020-06-25 10:49:17 -04:00
Nik Everett	335505c4e1	Drop deprecated aggregator wrapper (backport of #58367 ) (#58448 ) This drops the deprecated and now unused `asMultiBucketAggregator`. It was too easy to use it to make inefficient `Aggregators`. Relates to #56487	2020-06-25 09:31:19 -04:00
Rory Hunter	ebe1d9cdbe	Update rest-api-spec keyword list Follow-up to 35aecf4c9aa. Somehow I missed the fact that there's an ILM API named `retry`, which is a keyword in Ruby. I've removed it from the keywords list.	2020-06-25 09:55:13 +01:00
Rory Hunter	e413de4203	Validate that REST API names do not contain keywords (#58452 ) If an API name (or components of a name) overlaps with a reserved word in the programming language for an ES client, then it's possible that the code that is generated from the API will not compile. This PR adds validation to check for such overlaps.	2020-06-25 09:48:54 +01:00
James Rodewig	c3f4034199	[DOCS] Note that DS timestamp field mapping changes require reindex (#58444 ) (#58517 ) With #58096, data streams now track the timestamp field mapping outside of the template associated with the stream. This means you can no longer update the timestamp field mapping using template changes. This updates the associated data stream docs.	2020-06-24 17:21:26 -04:00
Julie Tibshirani	1f2e05c947	Simplify mapping validation for resizing indices. (#58514 ) When creating a target index from a source index, we don't allow for target mappings to be specified. This PR simplifies the check that the target mappings are empty. This refactor will help when implementing composable template merging, since we no longer need to resolve + check the target mappings when creating an index from a template.	2020-06-24 14:07:19 -07:00
Benjamin Trent	add8ff1ad3	[ML] assume data streams are enabled in data stream tests (#58502 ) (#58508 )	2020-06-24 14:14:48 -04:00
markharwood	837f2643eb	Docs - Added field capabilities breaking change (#58509 )	2020-06-24 18:39:01 +01:00
Chris Roberson	d5899d1765	[Monitoring] APM mapping update (#46244 ) (#58498 ) * Add acm mapping to APM for beats * Add root mapping for APM * Add sourcemap mapping to APM * Fix missing properties * Fix a second missing properties * Add request property to acm * Remove root and sourcemap per review Co-authored-by: Mike Place <mike.place@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-24 13:26:30 -04:00
Tim Brooks	5efec3a517	Add error logging when http test fails (#58505 ) Netty4HttpServerTransportTests has started to fail intermittently. It seems like unexpected successful responses are being received when the test is simulating errors. This commit adds logging to the test to provide additional information when there is an unexpected success. It also adds the logging to the nio http test.	2020-06-24 11:02:20 -06:00
Ryan Ernst	eb16ad4574	Add plugin installer test for plugin jar contents (#58287 ) The plugin installer currently checks the fake plugins installed contain a single jar file. This commit adds another test for a plugin which contains multiple jar files, ensuring all jars exist in the installed plugin.	2020-06-24 09:50:02 -07:00
Armin Braun	9e4c5d1dde	Cleaner Handling of Snapshot Related null Custom Values in CS (#58382 ) (#58501 ) Add the ability to get a custom value while specifying a default and use it throughout the codebase to get rid of the `null` edge case and shorten the code a little.	2020-06-24 17:24:44 +02:00
Martijn van Groningen	f4fad9c65a	Re-enable data streams yaml tests in bwc mode (#58500 ) Backport of #58403 to 7.x branch.	2020-06-24 16:59:51 +02:00
Hendrik Muhs	c1bbfeddc9	Improve rolling upgrade test setup assertions (#58313 ) wrap test setup and add proper assert messages relates #58282	2020-06-24 16:54:48 +02:00
Andrei Stefan	69f73d948b	EQL: code cleanup and further tests (#58458 ) (#58497 ) Add FunctionPipe tests to all functions. Cleanup functions code. (cherry picked from commit 0f83d5799841fe99d8aeaf46e50dd11aa6bf8a57)	2020-06-24 17:38:56 +03:00
Przemysław Witek	551b8bcd73	[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484 ) (#58490 )	2020-06-24 15:52:45 +02:00
Benjamin Trent	fa88e71532	[ML] unify usages of _all and wildcard <*> (#58460 ) (#58494 )	2020-06-24 09:47:57 -04:00
Russ Cam	441bc14d21	[DOCS] Update aliases to indicate array (#58469 ) Updates the aliases documentation to correct the parameter to an array.	2020-06-24 09:41:23 -04:00
Luca Cavanna	dbbf2772d8	Mute newly added ml data streams tests (#58492 ) Relates to #58491	2020-06-24 15:11:40 +02:00
Luca Cavanna	7e2bb8d6a2	Mute Netty4HttpServerTransportTests#testCorsRequest (#58480 ) Relates to #58433	2020-06-24 14:31:38 +02:00
Jim Ferenczi	f6d5f452cd	Fix MultiClusterSearchYamlTestSuiteIT test failures (#58359 ) Restore number of shards for the field_caps_empty_index	2020-06-24 13:39:30 +02:00
markharwood	d5ac3bb87f	Field capabilities - make `keyword` a family of field types (#58315 ) (#58483 ) Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type. Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities. Relates to #53175	2020-06-24 12:32:14 +01:00
Jim Ferenczi	ec8d5ec79c	Fix handling of terminate_after when size is 0 (#58212 ) `terminate_after` is ignored on search requests that don't return top hits (`size` set to 0) and do not track the number of hits accurately (`track_total_hits`). We use early termination when the number of hits to track is reached during collection but this breaks the hard termination of `terminate_after` if it happens before we reached the `terminate_after` value. This change ensures that we continue to check `terminate_after` even if the tracking of total hits has reached the provided value. Closes #57624	2020-06-24 13:16:11 +02:00
David Turner	796cb9e9ca	Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message (#58410 ) Users are perennially confused by the message they get when writing to an index is blocked due to excessive disk usage: TOO_MANY_REQUESTS/12/index read-only / allow delete (api) Of course this is technically accurate but it is hard to join the dots from this message to "your disk was too full" without some searching of forums and documentation. Additionally in #50166 we changed the status code to today's `429` from the previous `403` which changed the message from the one that's widely documented elsewhere: FORBIDDEN/12/index read-only / allow delete (api) Since #42559 we've considered this block to be under the sole control of the disk-based shard allocator, and we have seen no evidence to suggest that anyone is applying this block manually. Therefore this commit adjusts this block's message to indicate that it's caused by a lack of disk space.	2020-06-24 10:22:11 +01:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Jim Ferenczi	fcd8a432d9	Submit _async search task should cancel children on cancellation (#58332 ) This change allows the submit async search task to cancel children and removes the manual indirection that cancels the search task when the submit task is cancelled. This is now handled by the task cancellation, which can cancel grand-children since #54757.	2020-06-24 09:10:26 +02:00
Ryan Ernst	88f1dab8b5	Fix long/int precision for test baseport calculation	2020-06-23 16:02:13 -07:00
Ryan Ernst	6285b87b97	Adjust gradle base port by one (#58368 ) When assigning ports for internal cluster tests, we use the gradle worker id as an adjustment on the base port of 10300. In order to not go outside the max port range, we modulo the worker id by 223. Since gradle worker ids start at 1, we expect to never actually get the base port of 10300. However, as the gradle daemon lasts for longer, the modulo can result in a value of 0, which causes the test to fail. This commit adjusts the modulo to ensure the value is never 0. closes #58279	2020-06-23 15:42:26 -07:00
Ryan Ernst	89c03e593c	Create utility for custom config setup in packaging tests (#58352 ) This commit creates a shared withCustomConfig method that may be used by any packaging test. The method will copy the config directory and override the conf path appropriately depending on the distribution type.	2020-06-23 15:12:22 -07:00
Larry Gregory	2ca09cddaf	[DOCS] Rename kibana user to kibana_system (#58423 )	2020-06-23 14:25:09 -07:00
Przemysław Witek	4e4ca6ac25	Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447 ) (#58459 )	2020-06-23 22:18:39 +02:00
Dan Hermann	b40c27698f	Fix incorrect stats warning when swap is disabled	2020-06-23 14:34:27 -05:00
Benjamin Trent	a9b868b7a9	[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280 ) (#58455 ) This commit allows data streams to be a valid source for analytics and transforms. Data streams are fairly transparent and our `_search` and `_reindex` actions work without error. For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.	2020-06-23 14:40:35 -04:00
Benjamin Trent	0cc84d3caf	[ML] wait for yellow state for stats index in tests (#58436 ) (#58456 ) GET inference stats now reads from the .ml-stats index. Our tests should wait for yellow state before attempting to query the index for stat information.	2020-06-23 13:32:24 -04:00
James Rodewig	affc3954e6	[DOCS] Fix typo in RoutingNode comment (#58079 ) (#58454 ) Co-authored-by: Howard <danielhuang@tencent.com>	2020-06-23 13:07:08 -04:00
Dimitris Athanasiou	f67fee387b	[7.x][ML] Make regression training set predictable in size (#58331 ) (#58453 ) Unlike `classification`, which is using a cross validation splitter that produces training sets whose size is predictable and equal to `training_percent * class_cardinality`, for regression we have been using a random splitter that takes an independent decision for each document. This means we cannot predict the exact size of the training set. This poses a problem as we move towards performing test inference on the java side as we need to be able to provide an accurate upper bound of the training set size to the c++ process. This commit replaces the random splitter we use for regression with the same streaming-reservoir approach we do for `classification`. Backport of #58331	2020-06-23 19:49:03 +03:00
Marios Trivyzas	e7c40d973e	SQL: Relax parsing of date/time escaped literals (#58336 ) (#58450 ) Improve the usability of the MS-SQL server/ODBC escaped date/time/timestamp literals, by allowing timezone/offset ids in the parsed string, e.g.: ``` {ts '2000-01-01T11:11:11Z'} ``` Closes: #58262 (cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)	2020-06-23 18:05:54 +02:00
Christoph Büscher	642b05a511	Fix test failure in RangeQueryBuilderTests.testToQuery (#58449 ) Very rarely this test can fail if we draw a random TimeZone id that we cannot parse with the legacy joda DateMathParser and get an IllegalArgumentException. In addition to a "SystemV/*" time zone we also need an index "versionCreated" before V_7_0_0 and no "format" setting in the query builder. Given how unlikely this combination is, we should simply disallow those time zone ids when generating the random query builder for RangeQueryBuilderTests. Closes #58431	2020-06-23 17:44:18 +02:00
David Roberts	0d6bfd0ac3	[7.x][ML] Fix wire serialization for flush acknowledgements (#58443 ) There was a discrepancy in the implementation of flush acknowledgements: most of the class was designed on the basis that the "last finalized bucket time" could be null but the wire serialization assumed that it was never null. This works because the C++ sends zero "last finalized bucket time" when it is not known or not relevant. But then the Java code will print that to XContent as it is assuming null represents not known or not relevant. This change corrects the discrepancies. Internally within the class null represents not known or not relevant, but this is translated from/to 0 for communications from the C++ and old nodes that have the bug. Additionally I switched from Date to Instant for this class and made the member variables final to modernise it a bit. Backport of #58413	2020-06-23 16:42:06 +01:00
Mark Tozzi	52806a8f89	Small VS config cleanup (#58294 ) (#58442 )	2020-06-23 10:53:06 -04:00
Benjamin Trent	61142a3005	[ML] only log if forecasts are set to failed (#58421 ) (#58437 ) This adjusts the logging level for setting forecasts to failed to WARN. And it will only log if 1 or more forecasts were adjusted to failed.	2020-06-23 10:24:03 -04:00
James Rodewig	afbf3bd33b	[DOCS] Add data streams to bulk, delete, and index API docs (#58340 ) (#58434 ) Updates existing docs for the bulk, delete and index APIs to make them aware of data streams.	2020-06-23 09:40:25 -04:00

52281 Commits

All Branches