OpenSearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	34d61d3231	ML: ignore unknown fields for JobTaskState (#37982 )	2019-01-29 12:51:34 -06:00
David Kyle	6d1693ff49	[ML] Prevent submit after autodetect worker is stopped (#37700 ) Runnables can be submitted to AutodetectProcessManager.AutodetectWorkerExecutorService without error after it has been shut down, which can lead to requests timing out as their handlers are never called by the terminated executor. This change throws an EsRejectedExecutionException if a runnable is submitted after the shutdown and calls AbstractRunnable.onRejection on any tasks not run. Closes #37108	2019-01-29 15:09:40 +00:00
Henrique Gonçalves	eceb3185c7	[ML] Make GetJobStats work with arbitrary wildcards and groups (#36683 ) The /_ml/anomaly_detectors/{job}/_stats endpoint now works correctly when {job} is a wildcard or job group. Closes #34745	2019-01-29 09:06:50 +00:00
Dimitris Athanasiou	ebe9c95230	[ML] Audit all errors during job deletion (#37933 ) This commit moves the auditing of job deletion related errors to the final listener in the job delete action. This ensures any error that occurs during job deletion is audited.	2019-01-29 10:23:50 +02:00
Benjamin Trent	7e4c0e6991	ML: Adds set_upgrade_mode API endpoint (#37837 ) * ML: Add MlMetadata.upgrade_mode and API * Adding tests * Adding wait conditionals for the upgrade_mode call to return * Adding tests * adjusting format and tests * Adjusting wait conditions for api return and msgs * adjusting doc tests * adding upgrade mode tests to black list	2019-01-28 09:07:30 -06:00
David Kyle	c0409fb9f0	[ML] Marginal gains in slow multi node QA tests (#37825 ) Move 2 tests that are simple REST tests out of the QA suite and cut the number of post data calls in ForecastIT	2019-01-28 10:00:59 +00:00
David Roberts	57d321ed5f	[ML] Tighten up use of aliases rather than concrete indices (#37874 ) We have read and write aliases for the ML results indices. However, the job still had methods that purported to reliably return the name of the concrete results index being used by the job. After reindexing prior to upgrade to 7.x this will be wrong, so the method has been renamed and the comments made more explicit to say the returned index name may not be the actual concrete index name for the lifetime of the job. Additionally, the selection of indices when deleting the job has been changed so that it works regardless of concrete index names. All these changes are nice-to-have for 6.7 and 7.0, but will become critical if we add rolling results indices in the 7.x release stream as 6.7 and 7.0 nodes may have to operate in a mixed version cluster that includes a version that can roll results indices.	2019-01-28 09:38:46 +00:00
David Roberts	f2c0c26d15	[ML] Adjust structure finder for Joda to Java time migration (#37306 ) The ML file structure finder has always reported both Joda and Java time format strings. This change makes the Java time format strings the ones that are incorporated into mappings and ingest pipeline definitions. The BWC syntax of prepending "8" to these formats is used. This will need to be removed once Java time format strings become the default in Elasticsearch. This commit also removes direct imports of Joda classes in the structure finder unit tests. Instead the core Joda BWC class is used.	2019-01-26 20:19:57 +00:00
Benjamin Trent	9e932f4869	ML: removing unnecessary upgrade code (#37879 )	2019-01-25 13:57:41 -06:00
Christoph Büscher	b4b4cd6ebd	Clean codebase from empty statements (#37822 ) * Remove empty statements There are a couple of instances of undocumented empty statements all across the code base. While they are mostly harmless, they make the code hard to read and are potentially error-prone. Removing most of these instances and marking blocks that look empty by intention as such. * Change test, slightly more verbose but less confusing	2019-01-25 14:23:02 +01:00
David Roberts	deafce1acd	[ML] No need to add state doc mapping on job open in 7.x (#37759 ) When upgrading from 5.4 to 5.5 to 6.7 (inclusive) it was necessary to ensure there was a mapping for type "doc" on the ML state index before opening a job. This was because 5.4 created a multi-type ML state index. In version 7.x we can be sure that any such 5.4 index is no longer in use. It would have had to be reindexed into the 6.x index format prior to the upgrade to version 7.x.	2019-01-25 13:15:35 +00:00
Jim Ferenczi	787acb14b9	Track total hits up to 10,000 by default (#37466 ) This commit changes the default for the `track_total_hits` option of the search request to `10,000`. This means that by default search requests will accurately track the total hit count up to `10,000` documents; requests that match more than this value will set the `"total.relation"` to `"gte"` (i.e. greater than or equal to) and the `"total.value"` to `10,000` in the search response. Scroll queries are not impacted, they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes #33028	2019-01-25 13:45:39 +01:00
David Kyle	e1226f69b7	[ML] Increase close job timeout and lower the max number (#37770 )	2019-01-24 09:18:48 +00:00
Lee Hinman	427bc7f940	Use ILM for Watcher history deletion (#37443 ) * Use ILM for Watcher history deletion This commit adds an index lifecycle policy for the `.watch-history-*` indices. This policy is automatically used for all new watch history indices. This does not yet remove the automatic cleanup that the monitoring plugin does for the .watch-history indices, and it does not touch the `xpack.watcher.history.cleaner_service.enabled` setting. Relates to #32041	2019-01-23 10:18:08 -07:00
Alexander Reelsen	daa2ec8a60	Switch mapping/aggregations over to java time (#36363 ) This commit moves the aggregation and mapping code from joda time to java time. This includes field mappers, root object mappers, aggregations with date histograms, query builders and a lot of changes within tests. The cut-over to java time is a requirement so that we can support nanoseconds properly in a future field mapper. Relates #27330	2019-01-23 10:40:05 +01:00
David Roberts	7b3dd3022d	[ML] Update ML results mappings on process start (#37706 ) This change moves the update to the results index mappings from the open job action to the code that starts the autodetect process. When a rolling upgrade is performed we need to update the mappings for already-open jobs that are reassigned from an old version node to a new version node, but the open job action is not called in this case. Closes #37607	2019-01-23 09:37:37 +00:00
Ryan Ernst	fc99eb3e65	Add cache cleaning task for ML snapshot (#37505 ) The ML subproject of xpack has a cache for the cpp artifact snapshots which is checked on each build. The cache is outside of the build dir so that it is not wiped on a typical clean, as the artifacts can be large and do not change often. This commit adds a cleanCache task which will wipe the cache dir, as over time the size of the directory can become bloated.	2019-01-19 16:16:58 -08:00
Benjamin Trent	12cdf1cba4	ML: Add support for single bucket aggs in Datafeeds (#37544 ) Single bucket aggs are now supported in datafeed aggregation configurations.	2019-01-18 15:08:53 -06:00
Benjamin Trent	5384162a42	ML: creating ML State write alias and pointing writes there (#37483 ) * ML: creating ML State write alias and pointing writes there * Moving alias check to openJob method * adjusting concrete index lookup for ml-state	2019-01-18 14:32:34 -06:00
Yannick Welsch	6d64a2a901	Propagate Errors in executors to uncaught exception handler (#36137 ) This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This commit also makes it the property of the executor, not the runnable, to ensure this property. A big part of this commit also consists of vastly improving the test coverage in this area.	2019-01-17 17:46:35 +01:00
David Kyle	75410dc632	[ML] Prevent config snapshot failure blocking migration (#37493 )	2019-01-16 11:51:15 +00:00
Hendrik Muhs	15d1b904a1	[ML] log minimum diskspace setting if forecast fails due to insufficient d… (#37486 ) log minimum disk space setting if forecast fails due to insufficient disk space	2019-01-16 08:10:13 +01:00
David Kyle	bea46f7b52	[ML] Migrate unallocated jobs and datafeeds (#37430 ) Migrate ml job and datafeed config of open jobs and update the parameters of the persistent tasks as they become unallocated during a rolling upgrade. Block allocation of ml persistent tasks until the configs are migrated.	2019-01-15 18:21:39 +00:00
David Kyle	7c11b05c28	[ML] Remove unused code from the JIndex project (#37477 )	2019-01-15 17:19:58 +00:00
David Roberts	7cdf7f882b	[ML] Fix ML datafeed CCS with wildcarded cluster name (#37470 ) The test that remote clusters used by ML datafeeds have a license that allows ML was not accounting for the possibility that the remote cluster name could be wildcarded. This change fixes that omission. Fixes #36228	2019-01-15 14:19:05 +00:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
David Kyle	2ee55a50bf	[ML] Use String rep of Version in map for serialisation (#37416 )	2019-01-14 16:39:47 +00:00
Benjamin Trent	5101e51891	ML: Fix testMigrateConfigs (#37373 ) * ML: :s/execute/get * Fixing other broken tests * unmuting test	2019-01-11 13:29:30 -06:00
Gordon Brown	827ece73c8	Mute MlConfigMigratorIT.testMigrateConfigs (#37374 )	2019-01-11 11:11:58 -07:00
David Roberts	953fb9352f	[ML] Update error message for process update (#37363 ) When this message was first added the model debug config was the only thing that could be updated, but now more aspects of the config can be updated so the message needs to be more general.	2019-01-11 16:31:55 +00:00
Benjamin Trent	19a7e0f4eb	ML: update .ml-state actions to support > 1 index (#37307 ) * ML: Updating .ml-state calls to be able to support > 1 index * Matching bulk delete behavior with dbq * Adjusting state name * refreshing indices before search * fixing line length * adjusting index expansion options	2019-01-11 08:03:41 -06:00
David Roberts	1da59db3fb	[ML] Wait for autodetect to be ready in the datafeed (#37349 ) This is a reinforcement of #37227. It turns out that persistent tasks are not made stale if the node they were running on is restarted and the master node does not notice this. The main scenario where this happens is when minimum master nodes is the same as the number of nodes in the cluster, so the cluster cannot elect a master node when any node is restarted. When an ML node restarts we need the datafeeds for any jobs that were running on that node to not just wait until the jobs are allocated, but to wait for the autodetect process of the job to start up. In the case of reassignment of the job persistent task this was dealt with by the stale status test. But in the case where a node restarts but its persistent tasks are not reassigned we need a deeper test. Fixes #36810	2019-01-11 13:22:35 +00:00
markharwood	434430506b	Type removal - added deprecation warnings to _bulk apis (#36549 ) Added warnings checks to existing tests Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice. Related to #35190	2019-01-10 21:35:19 +00:00
David Roberts	b65006e8cd	[ML] Fix ML memory tracker for old jobs (#37311 ) Jobs created in version 6.1 or earlier can have a null model_memory_limit. If these are parsed from cluster state following a full cluster restart then we replace the null with 4096mb to make the meaning explicit. But if such jobs are streamed from an old node in a mixed version cluster this does not happen. Therefore we need to account for the possibility of a null model_memory_limit in the ML memory tracker.	2019-01-10 17:28:00 +00:00
Benjamin Trent	df3b58cb04	ML: add migrate anomalies assistant (#36643 ) * ML: add migrate anomalies assistant * adjusting failure handling for reindex * Fixing request and tests * Adding tests to blacklist * adjusting test * test fix: posting data directly to the job instead of relying on datafeed * adjusting API usage * adding Todos and adjusting endpoint * Adding types to reindexRequest * removing unreliable "live" data test * adding index refresh to test * adding index refresh to test * adding index refresh to yaml test * fixing bad exists call * removing todo * Addressing remove comments * Adjusting rest endpoint name * making service have its own logger * adjusting validity check for newindex names * fixing typos * fixing renaming	2019-01-09 14:25:35 -06:00
David Roberts	e0ce73713f	[ML] Stop datafeeds running when their jobs are stale (#37227 ) We already had logic to stop datafeeds running against jobs that were OPENING, but a job that relocates from one node to another while OPENED stays OPENED, and this could cause the datafeed to fail when it sent data to the OPENED job on its new node before it had a corresponding autodetect process. This change extends the check to stop datafeeds running when their job is OPENING _or_ stale (i.e. has not had its status reset since relocating to a different node). Relates #36810	2019-01-09 10:42:47 +00:00
David Roberts	f14cff2102	[TEST] Ensure interrupted flag reset after test that sets it (#37230 ) Test fix to stop a problem in one test leaking into a different test and causing that other test to spuriously fail.	2019-01-09 08:51:00 +00:00
Benjamin Trent	6b376a1ff4	ML: fix delayed data annotations on secured cluster (#37193 ) * changing executing context for writing annotation * adjusting user * removing unused import	2019-01-07 15:18:38 -06:00
Benjamin Trent	1780ced82d	ML: changing JobResultsProvider.getForecastRequestStats to support > 1 index (#37157 ) * ML: changing JobResultsProvider.getForecastRequestStats to support more than one index * moving to use idsQuery()	2019-01-07 10:58:55 -06:00
Armin Braun	31c33fdb9b	MINOR: Remove some Deadcode in Gradle (#37160 )	2019-01-07 09:21:25 +01:00
David Roberts	ff7df40b20	[ML] Uplift model memory limit on job migration (#37126 ) When a 6.1-6.3 job is opened in a later version we increase the model memory limit by 30% if it's below 0.5GB. The migration of jobs from cluster state to the config index changes the job version, so we need to also do this uplift as part of that config migration. Relates #36961	2019-01-04 12:21:28 +00:00
Dimitris Athanasiou	0fd27d4d6f	[ML] Unused state remover should also account for jobs in index (#37119 ) The unused state remover was never adjusted to account for jobs stored in the config index. The result was that when triggered it removed state for all jobs stored in the config index. This commit fixes the issue. Closes #37109	2019-01-04 12:43:44 +02:00
Dimitris Athanasiou	586453fef1	[ML] Remove types from datafeed (#36538 ) Closes #34265	2019-01-04 09:43:44 +02:00
David Roberts	13649aa70a	[TEST] Revert "Mute ForecastIT.testSingleSeries" (#37110 ) The problem that caused the test to be muted was fixed in https://github.com/elastic/ml-cpp/pull/332 Closes #36258	2019-01-03 16:23:18 +00:00
Benjamin Trent	cfc310748d	addressing (#36891 )(#36888 )(#36889 ) (#37080 )	2019-01-03 07:25:57 -06:00
David Kyle	42bb2bae21	[ML] Order GET job stats response by job id (#36841 )	2019-01-02 16:52:20 +00:00
Hendrik Muhs	632c7fbed2	[ML] fix x-pack usage regression caused by index migration (#36936 ) Changes the feature usage retrieval to use the job manager rather than directly talking to the cluster state, because jobs can now be either in cluster state or stored in an index This is a follow-up of #36702 / #36698	2018-12-31 08:30:08 +01:00
Dimitris Athanasiou	08bcd83757	[ML] Reduce persistent tasks periodic reassignment interval in ... (#36845 ) ... MlDistributedFailureIT.testLoseDedicatedMasterNode. An intermittent failure has been observed in `MlDistributedFailureIT.testLoseDedicatedMasterNode`. The test launches a cluster comprising a dedicated master node and a data and ML node. It creates a job and datafeed and starts them. It then shuts down and restarts the master node. Finally, the test asserts that the two tasks have been reassigned within 10s. The intermittent failure is due to the assertions that the tasks have been reassigned failing. Investigating the failure revealed that the `assertBusy` that performs that assertion times out. Furthermore, it appears that the job task is not reassigned because the memory tracking info is stale. Memory tracking info is refreshed asynchronously when a job is attempted to be reassigned. Tasks are attempted to be reassigned either due to a relevant cluster state change or periodically. The periodic interval is controlled by a cluster setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s. What seems to be happening in this test is that if all cluster state changes after the master node is restarted come through before the async memory info refresh completes, then the job might take up to 30s until it is attempted to be reassigned. Thus the `assertBusy` times out. This commit changes the test to reduce the periodic check that reassigns persistent tasks to `200ms`. If the above theory is correct, this should eradicate those failures. Closes #36760	2018-12-20 14:53:36 +02:00
David Roberts	0f2f00a20a	[ML] Resolve 7.0.0 TODOs in ML code (#36842 ) This change cleans up a number of ugly BWC workarounds in the ML code. 7.0 cannot run in a mixed version cluster with versions prior to 6.7, so code that deals with these old versions is no longer required. Closes #29963	2018-12-20 12:49:57 +00:00
David Kyle	d43cbdab97	[ML] ensure the ml-config index (#36792 ) (#36832 )	2018-12-19 13:43:43 +00:00

265 Commits