OpenSearch

Commit Graph

Author	SHA1	Message	Date
Dimitris Athanasiou	322f953060	[7.x][ML] Anomaly detection jobs should allow missing values for geo fields (#57300 ) (#57338 ) Allows geo fields (`geo_point`, `geo_shape`) to have missing values. Fixes a bug where such missing values would result in an error. Closes #57299 Backport of #57300	2020-05-29 13:06:16 +03:00
Benjamin Trent	24d605e41e	[ML] fixing GET _ml/inference so size param is respected (#57303 ) (#57308 ) `size` was previously ignored when grabbing full trained model configs. closes https://github.com/elastic/elasticsearch/issues/57298	2020-05-28 15:45:26 -04:00
David Roberts	d139a79ef6	[7.x][ML] Fix monitoring if orphaned anomaly detector persistent tasks exist (#57240 ) Since #51888 the ML job stats endpoint has returned entries for jobs that have a persistent task but not job config. Such orphaned tasks caused monitoring to fail. This change ignores any such corrupt jobs for monitoring purposes. Backport of #57235	2020-05-27 22:59:11 +01:00
Benjamin Trent	decc6277f9	[ML] allow unran/incomplete forecasts to be deleted for stopped/failed jobs (#57152 ) (#57172 ) If a job is NOT opened, forecasts should be able to be deleted, no matter their state. This also fixes a bug with expanding forecast IDs. We should check for wildcard `*` and `_all` when expanding the ids closes https://github.com/elastic/elasticsearch/issues/56419	2020-05-26 15:44:22 -04:00
David Kyle	571477d0ad	[7.x] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041 ) (#57136 ) The queries performed by the expired data removers pull back entire documents when only a few fields are required. For ModelSnapshots in particular this is a problem as they contain quantiles which may be 100s of KB and the search size is set to 10,000. This change makes the search more efficient by only requesting the fields needed to work out which expired data should be deleted.	2020-05-26 10:56:42 +01:00
Przemysław Witek	ea2012778e	Mute failing test (#57112 ) (#57113 )	2020-05-25 14:06:29 +02:00
Benjamin Trent	f00dfb2d5f	[ML] adds WKT support in filestructurefinder (#57014 ) (#57032 ) Field mapping detection is done via grok patterns. This commit adds well-known text (WKT) formatted geometry detection. If everything is a `POINT`, then a `geo_point` mapping is preferred. Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred. This does NOT detect other types of formatted geometries (geohash, comma delimited points, etc.) closes https://github.com/elastic/elasticsearch/issues/56967	2020-05-21 08:22:51 -04:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Benjamin Trent	297f864884	[ML] relax throttling on expired data cleanup (#56711 ) (#56895 ) Throttling nightly cleanup as much as we do has been overcautious. Nightly cleanup should be more lenient in its throttling. We still keep the same batch size, but now the requests per second scale with the number of data nodes. If we have more than 5 data nodes, we don't throttle at all. Additionally, the API now has `requests_per_second` and `timeout` set, so users calling the API directly can set the throttling. This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`. This will allow users to adjust throttling of the nightly maintenance.	2020-05-18 08:46:42 -04:00
Dimitris Athanasiou	54d3cc74ec	[7.x][ML] Ensure class is represented when its cardinality is low (#56783 ) (#56829 ) In DF analytics classification, it is possible to use no samples of a class if its cardinality is too low. This commit fixes this by ensuring the target sample count can never be zero. Backport of #56783	2020-05-15 20:52:06 +03:00
David Roberts	270a23e422	[TEST] Fix log tail mocking in native process unit tests (#56804 ) This is a followup to #56632. Tests that had to be changed to mock the C++ log handler more accurately need to be more careful about when that stream ends, as ending of that stream is used to detect crashes in the production system. Fixes #56796	2020-05-15 12:46:37 +01:00
Dimitris Athanasiou	ac5902624c	[7.x][ML] Improve error upon DF analytics mappings conflict (#56700 ) (#56776 ) Adds the conflicting types and an example of an index which specifies them in order to make it easier for the user to understand the conflict. Backport of #56700	2020-05-14 19:16:10 +03:00
David Roberts	3051c37f92	[ML] Tail the C++ logging pipe before connecting other pipes (#56701 ) Prior to this change the named pipes that connect the ML C++ processes to the Elasticsearch JVM were all opened before any of them were read from or written to. This created a problem, where if the C++ process logged more messages between opening the log pipe and opening the last pipe to be connected than there was space for in the named pipe's buffer then the C++ process would block. This would mean it never got as far as opening the last named pipe, so the JVM would never get as far as reading from the log pipe, hence a deadlock. This change alters the connection order so that the JVM starts reading from the logging pipe immediately after opening it so that if the C++ process logs messages while opening the other named pipes they are captured in a timely manner and there is no danger of a deadlock. Backport of #56632	2020-05-14 07:10:30 +01:00
Armin Braun	0a879b95d1	Save Bounds Checks in BytesReference (#56577 ) (#56621 ) Two spots that allow for some optimization: * We are often creating a composite reference of just a single item in the transport layer => special cased via static constructor to make sure we never do that * Also removed the pointless case of an empty composite bytes ref * `ByteBufferReference` is practically always created from a heap buffer these days so there is no point of dealing with all the bounds checks and extra references to sliced buffers from that and we can just use the underlying array directly	2020-05-12 20:33:45 +02:00
Ryan Ernst	902fc546bd	Migrate remaining ESIntegTestCases to internalClusterTest (#56479 ) (#56563 ) This commit migrates the ESIntegTestCase tests in x-pack to the internalClusterTest source set.	2020-05-11 21:06:04 -07:00
zhenxianyimeng	8e96e5c936	Use CollectionUtils.isEmpty where appropriate (#55910 ) This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.	2020-05-11 09:55:57 -07:00
Dimitris Athanasiou	44ffa388ac	[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423 ) (#56428 ) We have been using a zero timeout in the case that DF analytics is stopped. This may cause a timeout when we cancel, for example, the reindex task. This commit fixes this by using the default timeout instead. Backport of #56423	2020-05-08 21:12:11 +03:00
David Roberts	9a3924a641	[ML] Adjust list of platforms that have ML native code (#56426 ) Native code is now available for linux-aarch64. Note that it is _not_ currently supported!	2020-05-08 16:22:45 +01:00
Dimitris Athanasiou	c117ae7a6e	[7.x][ML] Force stopping stopped DF analytics should succeed (#56421 ) (#56424 ) Force stopping a DF analytics job whose config exists and that is stopped should succeed. This was broken by #56360. Closes #56414 Backport of #56421	2020-05-08 18:04:24 +03:00
Dimitris Athanasiou	60b1c67409	[7.x][ML] Allow stopping DF analytics whose config is missing (#56360 ) (#56408 ) It is possible that the config document for a data frame analytics job is deleted from the config index. If that is the case the user is unable to stop a running job because we attempt to retrieve the config and that will throw. This commit changes that. When the request is forced, we do not expand the requested ids based on the existing configs but from the list of running tasks instead. Backport of #56360	2020-05-08 13:54:44 +03:00
Dimitris Athanasiou	d064eda2b0	[7.x][ML] Ensure phase progress may only increase (#56339 ) (#56357 ) Due to multi-threading it is possible that phase progress updates written from the c++ process arrive reordered. We can address this by ensuring that progress may only increase. Closes #56282 Backport of #56339	2020-05-07 19:46:58 +03:00
Przemysław Witek	0cd0ab276e	Introduce Annotation.Builder class and use it to create instances of Annotation class (#56276 ) (#56286 )	2020-05-06 20:47:03 +02:00
Dimitris Athanasiou	011e995165	[7.x][ML] Unmute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange (#56268 ) (#56287 ) Closes #56240	2020-05-06 18:20:46 +03:00
Julie Tibshirani	49de092b38	Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet.	2020-05-05 16:25:36 -07:00
Julie Tibshirani	63062ec7bd	Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange.	2020-05-05 13:48:35 -07:00
Dan Hermann	6674f14fb3	[7.x] Get index includes parent data stream for backing indices (#56238 )	2020-05-05 15:43:42 -05:00
Benjamin Trent	e1c5ca421e	[7.x] [ML] lay ground work for handling >1 result indices (#55892 ) (#56192 ) This commit removes all but one reference to `getInitialResultsIndexName`. This is to support more than one result index for a single job.	2020-05-05 15:54:08 -04:00
William Brafford	3499fa917c	Deprecated xpack "enable" settings should be no-ops (#55416 ) (#56167 ) The following settings are now no-ops: * xpack.flattened.enabled * xpack.logstash.enabled * xpack.rollup.enabled * xpack.slm.enabled * xpack.sql.enabled * xpack.transform.enabled * xpack.vectors.enabled Since these settings no longer need to be checked, we can remove settings parameters from a number of constructors and methods, and do so in this commit. We also update documentation to remove references to these settings.	2020-05-05 10:40:49 -04:00
David Roberts	7aa0daaabd	[7.x][ML] More advanced model snapshot retention options (#56194 ) This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Backport of #56125	2020-05-05 14:31:58 +01:00
Dimitris Athanasiou	75dadb7a6d	[7.x][ML] Add loss_function to regression (#56118 ) (#56187 ) Adds parameters `loss_function` and `loss_function_parameter` to regression. Backport of #56118	2020-05-05 14:59:51 +03:00
Dimitris Athanasiou	6061aa3db4	[7.x][ML] Fix race condition updating reindexing progress (#56135 ) (#56146 ) In #55763 I thought I could remove the flag that marks reindexing was finished on a data frame analytics task. However, that exposed a race condition. It is possible that between updating reindexing progress to 100 because we have called `DataFrameAnalyticsManager.startAnalytics()` and a call to the _stats API which updates reindexing progress via the method `DataFrameAnalyticsTask.updateReindexTaskProgress()` we end up overwriting the 100 with a lower progress value. This commit fixes this issue by bringing back the help of a `isReindexingFinished` flag as it was prior to #55763. Closes #56128 Backport of #56135	2020-05-05 10:48:42 +03:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151 ) Backport of #56034. Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers to IndexNameExpressionResolver can set. Also alter indices stats api to support data streams. The rollover api uses this api; otherwise, rolling over a data stream would no longer work. Relates to #53100	2020-05-04 22:38:33 +02:00
Benjamin Trent	6c26de444d	[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020 ) (#56126 ) If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam. It might lead folks to think there is an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using it, leading to more confusion. Additionally, pipelines might not be parseable in this method as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor). closes https://github.com/elastic/elasticsearch/issues/55985	2020-05-04 13:32:01 -04:00
Martijn van Groningen	6d03081560	Add auto create action (#56122 ) Backport of #55858 to 7.x branch. Currently the TransportBulkAction detects whether an index is missing and then decides whether it should be auto created. The coordination of the index creation also happens in the TransportBulkAction on the coordinating node. This change adds a new transport action that the TransportBulkAction delegates to if missing indices need to be created. The reasons for this change: * Auto creation of data streams can't occur on the coordinating node. Based on the index template (v2) either a regular index or a data stream should be created. However if the coordinating node is slow in processing cluster state updates then it may be unaware of the existence of certain index templates, which then can lead to the TransportBulkAction creating an index instead of a data stream. Therefore the coordination of creating an index or data stream should occur on the master node. See #55377 * From a security perspective it is useful to know whether index creation originates from the create index api or from auto creating a new index via the bulk or index api. For example a user would be allowed to auto create an index, but not to use the create index api. The auto create action will allow security to distinguish these two different patterns of index creation. This change adds the following new transport actions: AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later, via #55377, we can improve the AutoCreateAction to also determine whether an index or data stream should be created. The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege. Relates to #53100	2020-05-04 19:10:09 +02:00
Przemysław Witek	44f5a8ccd3	Use snapshot's latest result time rather than snapshot's creation time when creating an annotation (#56093 ) (#56103 )	2020-05-04 12:36:12 +02:00
William Brafford	d53c941c41	Make xpack.monitoring.enabled setting a no-op (#55617 ) (#56061 ) * Make xpack.monitoring.enabled setting a no-op This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved removing the setting from the setup for integration tests. Monitoring may introduce some complexity for test setup and teardown, so we should keep an eye out for turbulence and failures * Docs for making deprecated setting a no-op	2020-05-01 16:42:11 -04:00
Ryan Ernst	52b9d8d15e	Convert remaining license methods to isAllowed (#55908 ) (#55991 ) This commit converts the remaining isXXXAllowed methods to instead use isAllowed with a Feature value. There are a couple other methods that are static, as well as some licensed features that check the license directly, but those will be dealt with in other followups.	2020-04-30 15:52:22 -07:00
Benjamin Trent	c36bcb4dd0	[ML] fixing file structure finder multiline merge max for delimited formats (#56023 ) (#56035 ) This commit correctly sets the maxLinesPerRow in the CsvPreference for delimited files given the file structure finder settings. Previously, it was silently ignored.	2020-04-30 10:51:32 -04:00
Benjamin Trent	04b1f6498b	[ML] using new fixed interval in ml tests (#56021 ) (#56031 ) This commit removes deprecated references to DateHistogram.interval from ml tests	2020-04-30 10:26:39 -04:00
Dimitris Athanasiou	17b904def5	[7.x][ML] Decouple DFA progress testing from analyses phases (#55925 ) (#56024 ) This refactors native integ tests to assert progress without expecting explicit phases for analyses. We can test those with yaml tests in a single place. Backport of #55925	2020-04-30 17:05:47 +03:00
William Brafford	273ff6a105	Make xpack.ilm.enabled setting a no-op (#55592 ) (#55980 ) * Make xpack.ilm.enabled setting a no-op * Add watcher setting to not use ILM * Update documentation for no-op setting * Remove NO_ILM ml index templates * Remove unneeded setting from test setup * Inline variable definitions for ML templates * Use identical parameter names in templates * New ILM/watcher setting falls back to old setting * Add fallback unit test for watcher/ilm setting	2020-04-30 09:50:18 -04:00
David Kyle	c204353249	[ML] Wait for model loaded and cached in ModelLoadingServiceTests (#56014 ) Fixes test by exposing the method ModelLoadingService::addModelLoadedListener() so that the test class can be notified when a model is loaded which happens in a background thread	2020-04-30 13:32:07 +01:00
Dimitris Athanasiou	c5aa281171	[7.x][ML] Remove error on parsing progress for unknown phase in DFA (#55926 ) (#55954 ) On second thought, this check does not seem to be adding value. We can test that the phases are as we expect them for each analysis by adding yaml tests. Those would fail if we introduce new phases from c++ accidentally or without coordination. This would achieve the same thing. At the same time we would not have to comment out this code each time a new phase is introduced. Instead we can just temporarily mute those yaml tests. Note I will add those tests right after the imminent new phases are added to the c++ side. Backport of #55926	2020-04-29 20:11:33 +03:00
Benjamin Trent	edd049f9cd	[ML] Allow a certain number of ill-formatted rows when delimited format is specified (#55735 ) (#55944 ) While it is good to not be lenient when attempting to guess the file format, it is frustrating to users when they KNOW it is CSV but there are a few ill-formatted rows in the file (via some entry error, etc.). This commit allows for up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format. This percentage of allowed "bad rows" is only applied when the user has specified delimited formatting options, as the structure finder needs some guidance on what a "bad row" actually means. related to https://github.com/elastic/elasticsearch/issues/38890	2020-04-29 11:15:21 -04:00
Dimitris Athanasiou	d9685a0f19	[7.x][ML] Validate at least one feature is available for DF analytics (#55876 ) (#55914 ) We were previously checking at least one supported field existed when the _explain API was called. However, in the case of analyses with required fields (e.g. regression) we were not accounting that the dependent variable is not a feature and thus if the source index only contains the dependent variable field there are no features to train a model on. This commit adds a validation that at least one feature is available for analysis. Note that we also move that validation away from `ExtractedFieldsDetector` and the _explain API and straight into the _start API. The reason for doing this is to allow the user to use the _explain API in order to understand why they would be seeing an error like this one. For example, the user might be using an index that has fields but they are of unsupported types. If they start the job and get an error that there are no features, they will wonder why that is. Calling the _explain API will show them that all their fields are unsupported. If the _explain API was failing instead, there would be no way for the user to understand why all those fields are ignored. Closes #55593 Backport of #55876	2020-04-29 11:39:58 +03:00
David Roberts	61ac09ae21	[ML] Add daily_model_snapshot_retention_after_days to job config (#55891 ) This change adds a new setting, daily_model_snapshot_retention_after_days, to the anomaly detection job config. Initially this has no effect, the effect will be added in a followup PR. This PR gets the complexities of making changes that interact with BWC over well before feature freeze. Backport of #55878	2020-04-29 09:12:53 +01:00
Dimitris Athanasiou	abab4c4d4f	[7.x][ML] Do not fail DFA task when it's stopped whilst reindexing (#55797 ) (#55800 ) Adding to #55659, we missed another way we could set the task to failed due to task cancellation. CI revealed that we might also get a `SearchPhaseExecutionException` whose cause is a `TaskCancelledException`. That exception is not wrapped so unwrapping it will not return the underlying `TaskCancelledException`. Thus to be complete in catching this, we also need to check the error's cause. Closes #55068 Backport of #55797	2020-04-27 16:03:57 +03:00
Dimitris Athanasiou	7f100c1196	[7.x][ML] Allow analytics process define its own progress phases (#55763 ) (#55791 ) This is a continuation from #55580. Now that we're parsing phase progresses from the analytics process we change `ProgressTracker` to allow for custom phases between the `loading_data` and `writing_results` phases. Each `DataFrameAnalysis` may declare its own phases. This commit sets things in place for the analytics process to start reporting different phases per analysis type. However, this is still preserving existing behaviour as all analyses currently declare a single `analyzing` phase. Backport of #55763	2020-04-27 13:30:05 +03:00
David Roberts	3ba44a5af8	[ML] Adding failed_category_count to model_size_stats (#55761 ) The failed_category_count statistic records the number of times categorization wanted to create a new category but couldn't because the job had reached its model_memory_limit. Backport of #55716	2020-04-25 10:36:49 +01:00
Dimitris Athanasiou	210b7f1b76	[7.x][ML] Remove parsing of old progress format in DF Analytics (#55711 ) (#55720 ) Since #55580 we've introduced a new format for parsing progress from the data frame analytics process. As the process is now writing out progress in this new way, we can remove the parsing of the old format. Backport of #55711	2020-04-24 16:50:56 +03:00

847 Commits