OpenSearch

Commit Graph

Author	SHA1	Message	Date
Przemysław Witek	e60837aa3b	[7.x] Log whole analytics stats when the state assertion fails (#49906 ) (#49911 )	2019-12-06 14:31:17 +01:00
Przemysław Witek	1d8e3d69d7	Make only a part of `stop()` method a critical section. (#49756 ) (#49788 )	2019-12-03 09:54:16 +01:00
Dimitris Athanasiou	4edb2e7bb6	[7.x][ML] Add optional source filtering during data frame reindexing (#49690 ) (#49718 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531 Backport of #49690	2019-11-29 16:10:44 +02:00
Dimitris Athanasiou	c23a2187da	[7.x][ML] Only report complete writing_results progress after completion (#49551 ) (#49577 ) We depend on the number of data frame rows in order to report progress for the writing of results, the last phase of a job run. However, results include other objects than just the data frame rows (e.g., progress, inference model, etc.). The problem this commit fixes is that if we receive the last data frame row results we'll report that progress is complete even though we potentially still have more results to process. If the job gets stopped for any reason at this point, we will not be able to restart the job properly as we'll think that the job was completed. This commit addresses this by limiting the max progress we can report for the writing_results phase before the results processor completes to 98. At the end, when the process is done we set the progress to 100. The commit also improves failure capturing and reporting in the results processor. Backport of #49551	2019-11-26 12:20:37 +02:00
Benjamin Trent	688c78c589	[ML] Stop timing stats failure propagation (#49495 ) (#49501 )	2019-11-25 10:09:30 -05:00
David Roberts	62811c2272	[ML] Add default categorization analyzer definition to ML info (#49545 ) The categorization job wizard in the ML UI will use this information when showing the effect of the chosen categorization analyzer on a sample of input.	2019-11-25 13:39:16 +00:00
Dimitris Athanasiou	aca38f6882	[7.x][ML] DFA jobs should accept excluding an unsupported field (#49535 ) (#49544 ) Before this change excluding an unsupported field resulted in an error message that explained the excluded field could not be detected as if it doesn't exist. This error message is confusing. This commit changes this so that there is no error in this scenario. When excluding a field that does exist but has been automatically excluded from the analysis there is no harm (unlike excluding a missing field which could be a typo). Backport of #49535	2019-11-25 15:13:00 +02:00
Dimitris Athanasiou	c149c64dc4	[7.x][ML] Apply source query on data frame analytics memory estimation (#49517 ) (#49532 ) Closes #49454 Backport of #49517	2019-11-25 12:51:57 +02:00
Dimitris Athanasiou	8eaee7cbdc	[7.x][ML] Explain data frame analytics API (#49455 ) (#49504 ) This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455	2019-11-22 22:06:10 +02:00
Benjamin Trent	a7477ad7c3	[7.x] [ML][Inference] compressing model definition and lazy parsing (#49269 ) (#49446 ) * [ML][Inference] compressing model definition and lazy parsing (#49269) * [ML][Inference] compressing model definition and lazy parsing * addressing PR comments * adding commons io * implementing simplified bounded stream * adjusting for type inclusion	2019-11-21 15:32:32 -05:00
Benjamin Trent	d41b2e3f38	[ML][Inference] allowing per-model licensing (#49398 ) (#49435 ) * [ML][Inference] allowing per-model licensing * changing to internal action + removing pre-mature opt	2019-11-21 09:46:34 -05:00
Przemysław Witek	c7ac2011eb	[7.x] Implement accuracy metric for multiclass classification (#47772 ) (#49430 )	2019-11-21 15:01:18 +01:00
David Roberts	20558cf61c	[ML] Fix simultaneous stop and force stop datafeed (#49367 ) If a datafeed is stopped normally and force stopped at the same time then it is possible that the force stop removes the persistent task while the normal stop is performing actions. Currently this causes the normal stop to error, but since stopping a stopped datafeed is not an error this doesn't make sense. Instead the force stop should just take precedence. This is a followup to #49191 and should really have been included in the changes in that PR.	2019-11-20 12:52:47 +00:00
Przemysław Witek	9c0ec7ce23	[7.x] Make AnalyticsProcessManager class more robust (#49282 ) (#49356 )	2019-11-20 10:08:16 +01:00
Dimitris Athanasiou	4d6e037e90	[7.x][ML] Extract creation of DFA field extractor into a factory (#49315 ) (#49329 ) This commit moves the async calls required to retrieve the components that make up `ExtractedFieldsExtractor` out of `DataFrameDataExtractorFactory` and into a dedicated `ExtractorFieldsExtractorFactory` class. A few more refactorings are performed: - The detector no longer needs the results field. Instead, it knows whether to use it or not based on whether the task is restarting. - We pass more accurately whether the task is restarting or not. - The validation of whether fields that have a cardinality limit are valid is now performed in the detector after retrieving the respective cardinalities. Backport of #49315	2019-11-20 10:02:42 +02:00
Przemysław Witek	42bb8ae525	[7.x] Extract indexData method out of RegressionIT tests (#49306 ) (#49313 )	2019-11-19 22:47:12 +01:00
Benjamin Trent	19602fd573	[ML][Inference] changing setting to be memorySizeSetting (#49259 ) (#49302 )	2019-11-19 07:56:40 -05:00
David Roberts	a5204c1c80	[ML] Fixes for stop datafeed edge cases (#49284 ) The following edge cases were fixed: 1. A request to force-stop a stopping datafeed is no longer ignored. Force-stop is an important recovery mechanism if normal stop doesn't work for some reason, and needs to operate on a datafeed in any state other than stopped. 2. If the node that a datafeed is running on is removed from the cluster during a normal stop then the stop request is retried (and will likely succeed on this retry by simply cancelling the persistent task for the affected datafeed). 3. If there are multiple simultaneous force-stop requests for the same datafeed we no longer fail the one that is processed second. The previous behaviour was wrong as stopping a stopped datafeed is not an error, so stopping a datafeed twice simultaneously should not be either. Backport of #49191	2019-11-19 10:51:46 +00:00
Benjamin Trent	eefe7688ce	[7.x][ML] ML Model Inference Ingest Processor (#49052 ) (#49257 ) * [ML] ML Model Inference Ingest Processor (#49052) * [ML][Inference] adds lazy model loader and inference (#47410) This adds a couple of things: - A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them - A Model class and its first sub-class LocalModel. Used to cache model information and run inference. - Transport action and handler for requests to infer against a local model Related Feature PRs: * [ML][Inference] Adjust inference configuration option API (#47812) * [ML][Inference] adds logistic_regression output aggregator (#48075) * [ML][Inference] Adding read/del trained models (#47882) * [ML][Inference] Adding inference ingest processor (#47859) * [ML][Inference] fixing classification inference for ensemble (#48463) * [ML][Inference] Adding model memory estimations (#48323) * [ML][Inference] adding more options to inference processor (#48545) * [ML][Inference] handle string values better in feature extraction (#48584) * [ML][Inference] Adding _stats endpoint for inference (#48492) * [ML][Inference] add inference processors and trained models to usage (#47869) * [ML][Inference] add new flag for optionally including model definition (#48718) * [ML][Inference] adding license checks (#49056) * [ML][Inference] Adding memory and compute estimates to inference (#48955) * fixing version of indexed docs for model inference	2019-11-18 13:19:17 -05:00
Przemysław Witek	5f9965e4b8	Lower minimum model memory limit value from 1MB to 1kB. (#49227 ) (#49242 )	2019-11-18 14:58:20 +01:00
Dimitris Athanasiou	805c31e19e	[7.x][ML] Avoid NPE when node load is calculated on job assignment (#49186 ) (#49214 ) This commit fixes a NPE problem as reported in #49150. But this problem uncovered that we never added proper handling of state for data frame analytics tasks. In this commit we improve the `MlTasks.getDataFrameAnalyticsState` method to handle null tasks and state tasks properly. Closes #49150 Backport of #49186	2019-11-18 10:33:07 +02:00
Przemysław Witek	150db2b544	Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143 ) (#49164 )	2019-11-18 07:52:57 +01:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Przemysław Witek	e6ad3c29fd	Do not throw exceptions resulting from persisting datafeed timing stats. (#49044 ) (#49050 )	2019-11-13 20:23:13 +01:00
Christoph Büscher	6119f0aaa2	Fix Eclipse compilation in DataFrameDataExtractorTests (#48942 )	2019-11-11 16:17:55 +01:00
Dimitris Athanasiou	dfc6a13b44	[7.x][ML] Handle nested arrays in source fields (#48885 ) (#48889 ) Backport of #48885	2019-11-07 07:30:50 +02:00
David Roberts	c03f7ba74c	[TEST] Mute TimeoutCheckerTests.testWatchdog Due to https://github.com/elastic/elasticsearch/issues/48861	2019-11-05 11:49:46 +00:00
Dimitris Athanasiou	f2d4c94a9c	[7.x][ML] Deduplicate multi-fields for data frame analytics (#48799 ) (#48806 ) In the case multi-fields exist in the source index, we pick all variants of them in our extracted fields detection for data frame analytics. This means we may have multiple instances of the same feature. The worst consequence of this is when the dependent variable (for regression or classification) is also duplicated which means we train a model on the dependent variable itself. Now that #48770 is merged, this commit is adding logic to only select one variant of multi-fields. Closes #48756 Backport of #48799	2019-11-01 16:53:05 +02:00
Dimitris Athanasiou	1f662e0b12	[7.x][ML] Prevent fetching multi-field from source (#48770 ) (#48797 ) Aggregatable multi-fields are at the moment wrongly mapped as normal doc_value fields and thus they support fetching from source. However, they do not exist in the source. This results in a failure to extract such fields. This commit fixes this bug. While a fix could be worked out on top of the existing code, it is evident the extraction logic has become difficult to understand and maintain. As we also want to deduplicate multi-fields for data frame analytics, it seemed appropriate to refactor the code to simplify and better handle the extraction of multi-fields. Relates #48756 Backport of #48770	2019-11-01 14:18:03 +02:00
David Roberts	c3063c4e1f	[ML] Make the URL of the ML C++ Ivy repo configurable (#48702 ) At present the ML C++ artifact is always downloaded from S3. This change adds an option to configure the location. (The intention is to use a file:/// URL to pick up the artifact built in a Docker container in ml-cpp PR builds so that C++ changes that will break Java integration tests can be detected before the ml-cpp PRs are merged.) Relates elastic/ml-cpp#766	2019-10-31 09:21:44 +00:00
Dimitris Athanasiou	919596b2e8	[7.x][ML] Move field extraction logic to its own package (#48709 ) (#48712 ) Moves common field extraction logic to its own package so that it can be used both for anomaly detection and data frame analytics. In preparation for refactoring extraction fields to be simpler and to support multi-fields properly. Backport of #48709	2019-10-31 02:41:00 +02:00
Benjamin Trent	c9ead80c31	[7.x] [ML][Inference] separating definition and config object storage (#48651 ) (#48695 ) * [ML][Inference] separating definition and config object storage (#48651) This separates out the `definition` object from being stored within the configuration object in the index. This allows us to gather the config object without decompressing a potentially large definition. Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.	2019-10-30 13:27:29 -04:00
Przemysław Witek	7c944d26c5	[7.x] Assert that the results of classification analysis can be evaluated using _evaluate API. (#48626 ) (#48634 )	2019-10-29 16:20:56 +01:00
Przemysław Witek	7e30277a37	Mute RegressionIT.testStopAndRestart (#48575 ) (#48576 )	2019-10-28 13:08:11 +01:00
Martijn van Groningen	b034153df7	Change grok watch dog to be Matcher based instead of thread based. (#48346 ) There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based: threads that run grok expressions are registered and unregister after completion. If these threads stay registered for too long then the watch dog interrupts these threads. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and if so aborts the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is a more efficient method for aborting long running matches. (joni checks each 30k iterations whether interrupted flag is set vs. just checking a volatile field) Recently we upgraded to a recent joni version (#47374), and this PR is a followup of that PR. This change should also fix #43673, since it appears that when unit tests are run a test runner thread's interrupted flag may already have been set, due to some thread reuse.	2019-10-24 15:34:01 +02:00
Przemysław Witek	149537a165	Assert that inference model has been persisted (#48332 ) (#48453 )	2019-10-24 14:18:43 +02:00
Przemyslaw Gomulka	aaa6209be6	[7.x] [Java.time] Calculate week of a year with ISO rules BACKPORT(#48209 ) (#48349 ) Reverting the change introducing IsoLocal.ROOT and introducing IsoCalendarDataProvider that defaults start of the week to Monday and requires minimum 4 days in first week of a year. This extension is using the java SPI mechanism and defaults for Locale.ROOT only. It requires the jvm property java.locale.providers to be set with SPI,COMPAT closes #41670 backport #48209	2019-10-23 17:39:38 +02:00
Przemysław Witek	60d8ecb2b7	Mute ClassificationIT tests (#48338 ) (#48339 )	2019-10-22 12:45:50 +02:00
Przemysław Witek	2db2b945ec	[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174 ) (#48294 )	2019-10-21 22:07:19 +02:00
Benjamin Trent	abd1b5118f	[ML] fixing tests (#48084 ) (#48253 ) * [ML] fixing tests * unmuting tests * reverting outlier detection job changes	2019-10-21 09:21:06 -04:00
rsarawgi	5e4dd0fd2e	[ML] Removing usages of ToXContentParams.INCLUDE_TYPE (#48165 ) Removing the option of ToXContentParams.INCLUDE_TYPE and replacing them with ToXContentParams.FOR_INTERNAL_STORAGE Closes #48057	2019-10-18 14:49:26 +01:00
Przemysław Witek	28f68fa221	Make num_top_classes parameter's default value equal to 2 (#48119 ) (#48201 )	2019-10-17 18:43:15 +02:00
Dimitris Athanasiou	e0489fc328	[7.x][ML] Always refresh dest index before starting analytics process (#48090 ) (#48196 ) If a job stops right after reindexing is finished but before we refresh the destination index, we don't refresh at all. If the job is started again right after, it jumps into the analyzing state. However, the data is still not searchable. This is why we were seeing test failures that we start the process expecting X rows (where X is lower than the expected number of docs) and we end up getting X+. We fix this by moving the refresh of the dest index right before we start the process so it always ensures the data is searchable. Closes #47612 Backport of #48090	2019-10-17 17:20:19 +01:00
Benjamin Trent	ee110c2d42	[ML] Muting tests due to #48085 (#48086 ) (#48154 )	2019-10-16 15:46:50 -04:00
Benjamin Trent	0dddbb5b42	[ML] Parse and index inference model (#48016 ) (#48152 ) This adds parsing an inference model as a possible result of the analytics process. When we do parse such a model we persist a `TrainedModelConfig` into the inference index that contains additional metadata derived from the running job.	2019-10-16 15:46:20 -04:00
Przemysław Witek	8f815240b3	[7.x] Allow integer types for classification's dependent variable (#47902 ) (#48080 )	2019-10-16 11:09:56 +02:00
David Roberts	d9c7e3847e	[TEST] Don't assert order of data frame analytics audit messages (#48065 ) Audit messages are stored with millisecond timestamps. If two messages have the same millisecond timestamp then asserting on their order is impossible given the information available. This PR changes the assertion on audit messages in the native data frame analytics tests to assert that the expected audit messages exist in any order. Fixes #48035	2019-10-15 19:59:52 +01:00
Przemysław Witek	eaa56344b5	Verify that the failure reason of analytics process is empty (#48042 ) (#48071 )	2019-10-15 18:33:20 +02:00
Przemysław Witek	620bd9d224	Enable test testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows_TopClassesRequested now that top classes are correctly reported by C++. (#48043 ) (#48053 )	2019-10-15 14:49:16 +02:00
David Roberts	984323783e	[ML][7.x] Add lazy assignment job config option (#47993 ) This change adds: - A new option, allow_lazy_open, to anomaly detection jobs - A new option, allow_lazy_start, to data frame analytics jobs Both work in the same way: they allow a job to be opened/started even if no ML node exists that can accommodate the job immediately. In this situation the job waits in the opening/starting state until ML node capacity is available. (The starting state for data frame analytics jobs is new in this change.) Additionally, the ML nightly maintenance task now creates audit warnings for ML jobs that are unassigned. This means that jobs that cannot be assigned to an ML node for a very long time will show a yellow warning triangle in the UI. A final change is that it is now possible to close a job that is not assigned to a node without using force. This is because previously jobs that were open but not assigned to a node were an aberration, whereas after this change they'll be relatively common.	2019-10-15 06:55:11 +01:00
David Roberts	1ca25bed38	[ML][7.x] Add option to stop datafeed that finds no data (#47995 ) Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data. Backport of #47922	2019-10-14 17:19:13 +01:00
David Roberts	46ae86ac31	[ML] Fix detection of syslog-like timestamp in find_file_structure (#47970 ) Usually syslog timestamps have two spaces before a single digit day-of-month. However, in some non-syslog cases where syslog-like timestamps are used there is only one space. The grok pattern supports this, so the timestamp parser should too. This change makes the find_file_structure endpoint do this. Also fixes another problem that the same test case exposed in the find_file_structure endpoint, which was that the exclude_lines_pattern for delimited files was always created on the assumption the delimiter was a comma. Now it is based on the actual delimiter.	2019-10-13 20:07:54 +01:00
Benjamin Trent	627faf1850	[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885 ) (#47914 ) * [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) * [ML][Analytics] fix bug where regression deleted early does not delete state * Fixing ml with security test failure * fixing for older java	2019-10-11 15:11:16 -04:00
Przemysław Witek	c62fe8c344	Require that the dependent variable column has at most 2 distinct values in classification analysis. (#47858 ) (#47906 )	2019-10-11 14:57:08 +02:00
Igor Motov	b5afa95fd8	Fix Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Tracked by #47612	2019-10-10 18:17:01 +04:00
Igor Motov	17433e79d8	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Tracked by #47612	2019-10-10 17:56:23 +04:00
Dimitris Athanasiou	c1b0bfd74a	[7.x][ML] Unwrap exception causes before calling instanceof (#47676 ) (#47724 ) When exceptions could be returned from another node, the exception might be wrapped in a `RemoteTransportException`. In places where we handled specific exceptions using `instanceof` we ought to unwrap the cause first. This commit attempts to fix this issue after searching code in the ML plugin. Backport of #47676	2019-10-08 16:02:47 +03:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
Dimitris Athanasiou	ffacfc642c	[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624 ) (#47625 ) Relates #47612	2019-10-05 23:58:32 +03:00
Przemysław Witek	ee952da2e2	[7.x] Implement evaluation API for multiclass classification problem (#47126 ) (#47343 )	2019-10-04 17:54:51 +02:00
Przemysław Witek	ec9b77deaa	[7.x] Implement new analysis type: classification (#46537 ) (#47559 )	2019-10-04 13:47:19 +02:00
David Roberts	31a5e1c7ee	[ML] More accurate job memory overhead (#47516 ) When an ML job runs the memory required can be broken down into: 1. Memory required to load the executable code 2. Instrumented model memory 3. Other memory used by the job's main process or ancillary processes that is not instrumented Previously we added a simple fixed overhead to account for 1 and 3. This was 100MB for anomaly detection jobs (large because of the completely uninstrumented categorization function and normalize process), and 20MB for data frame analytics jobs. However, this was an oversimplification because the executable code only needs to be loaded once per machine. Also the 100MB overhead for anomaly detection jobs was probably too high in most cases because categorization and normalization don't use _that_ much memory. This PR therefore changes the calculation of memory requirements as follows: 1. A per-node overhead of 30MB for _only_ the first job of any type to be run on a given node - this is to account for loading the executable code 2. The established model memory (if applicable) or model memory limit of the job 3. A per-job overhead of 10MB for anomaly detection jobs and 5MB for data frame analytics jobs, to account for the uninstrumented memory usage This change will enable more jobs to be run on the same node. It will be particularly beneficial when there are a large number of small jobs. It will have less of an effect when there are a small number of large jobs.	2019-10-04 09:57:31 +01:00
Dimitris Athanasiou	b9541eb3af	[7.x][ML] Make PUT data frame analytics action a master node action (… (#47433 ) While it seemed like the PUT data frame analytics action did not have to be a master node action as the config is stored in an index rather than the cluster state, there are other subtle nuances which make it worthwhile to convert it. In particular, it helps maintain order of execution for put actions which are anyhow user driven and are expected to have low volume. This commit converts `TransportPutDataFrameAnalyticsAction` from a handled transport action to a master node action. Note this means that the action might fail in a mixed cluster but as the API is still experimental and not widely used there will be few moments more suitable to make this change than now.	2019-10-02 16:24:21 +03:00
David Roberts	4379a3c52b	[ML] Throttle the delete-by-query of expired results (#47177 ) Due to #47003 many clusters will have built up a large backlog of expired results. On upgrading to a version where that bug is fixed users could find that the first ML daily maintenance task deletes a very large amount of documents. This change introduces throttling to the delete-by-query that the ML daily maintenance uses to delete expired results to limit it to deleting an average 200 documents per second. (There is no throttling for state/forecast documents as these are expected to be lower volume.) Additionally a rough time limit of 8 hours is applied to the whole delete expired data action. (This is only rough as it won't stop part way through a single operation - it only checks the timeout between operations.) Relates #47103	2019-10-02 11:16:34 +01:00
Dimitris Athanasiou	36884a3c32	[7.x][ML] Restore analytics state if available (#47128 ) (#47393 ) This commit restores the model state if available in data frame analytics jobs. In addition, this changes the start API so that a stopped job can be restarted. As we now store the progress in the state index when the task is stopped, we can use it to determine what state the job was in when it got stopped. Note that in order to be able to distinguish between a job that runs for the first time and another that is restarting, we ensure reindexing progress is reported to be at least 1 for a running task.	2019-10-02 10:24:05 +03:00
Benjamin Trent	f5fe5e7cd6	[7.x] [ML][Inference] Adding preprocessors to definition object (#47320 ) (#47370 ) * [ML][Inference] Adding preprocessors to definition object (#47320) * [ML][Inference] Adding preprocessors to definition object * Update TrainedModelConfig.java * adjusting for backport	2019-10-01 13:31:25 -04:00
Benjamin Trent	4335e07716	[7.x] [ML][Inference] adding .ml-inference* index and storage (#47267 ) (#47310 ) * [ML][Inference] adding .ml-inference* index and storage (#47267) * [ML][Inference] adding .ml-inference* index and storage * Addressing PR comments * Allowing null definition, adding validation tests for model config * fixing line length * adjusting for backport	2019-10-01 08:20:33 -04:00
David Roberts	0807d409bf	[ML] Reinstate ML daily maintenance actions (#47103 ) A refactoring in 6.6 meant that the ML daily maintenance actions have not been run at all since then. This change installs the local master listener that schedules the ML daily maintenance, and also defends against some subtle race conditions that could occur in the future if a node flipped very quickly between master and non-master. Fixes #47003	2019-09-30 13:12:32 +01:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expect templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Przemysław Witek	3fbd58d156	[7.x] Allow evaluation to consist of multiple steps. (#46653 ) (#47194 )	2019-09-27 13:01:51 +02:00
David Roberts	77cc6d5bad	[TEST] Work around _cat/indices bug with security enabled (#47160 ) When the ML native multi-node tests use _cat/indices/_all and the request goes to a non-master node, _all is translated to a list of concrete indices by the authz layer on the coordinating node before the request is forwarded to the master node. Then it is possible for the master node to return an index_not_found_exception if one of the concrete indices that was expanded on the coordinating node has been deleted in the meantime. (#47159 has been opened to track the underlying problem.) It has been observed that the index that gets deleted when the problem affects the ML native multi-node tests is always the ML notifications index. The tests that fail are only interested in the presence or absence of ML results indices. Therefore the workaround is to only _cat indices that match the ML results index pattern. Fixes #45652	2019-09-26 13:29:40 +01:00
Dimitris Athanasiou	0765bd4bf7	[7.x][ML] Ensure data frame analytics task is only marked completed once (#47119 ) (#47157 ) Closes #46907	2019-09-26 15:26:06 +03:00
Tanguy Leroux	95e2ca741e	Remove unused private methods and fields (#47154 ) This commit removes a bunch of unused private fields and unused private methods from the code base. Backport of (#47115)	2019-09-26 12:49:21 +02:00
Benjamin Trent	05fb7be571	[7.x] [ML][Inference] Feature pre-processing objects and functions (#46777 ) (#47040 ) * [ML][Inference] Feature pre-processing objects and functions (#46777) To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future. This PR lays down the ground work for 3 basic encodings: * HotOne * Target Mean * Frequency More feature encodings or pre-processings could be added in the future: * Handling missing columns * Standardization * Label encoding * etc.... * fixing compilation for namedxcontent tests	2019-09-25 08:16:24 -04:00
Yannick Welsch	eb86d71edd	Mute MlJobIT.testDeleteJob Relates #45652	2019-09-25 12:53:09 +02:00
Yannick Welsch	7a5b5af171	Mute MlJobIT.testDeleteJobAsync Relates #45652	2019-09-25 12:53:05 +02:00
Benjamin Trent	00c1c0132b	[ML] fix two datafeed flush lockup bugs (#46982 ) (#47024 ) * [ML] fix two flush lockup bugs * Addressing PR comments * moving debug logging line so it is only written on success	2019-09-24 13:03:20 -04:00
Yannick Welsch	9638ca20b0	Allow dropping documents with auto-generated ID (#46773 ) When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678	2019-09-19 16:46:33 +02:00
Dimitris Athanasiou	02a5e153dc	[7.x][ML] Parse and index data frame analytics state (#46804 ) (#46820 ) This commit reuses the same state processor that is used for autodetect to parse state output from data frame analytics jobs. We then index the state document into the state index. Backport of #46804	2019-09-18 20:37:40 +03:00
Dimitris Athanasiou	cebe8da617	[7.x][ML] MlMemoryTracker should ignore analytics tasks without config (#46789 ) (#46811 ) It is possible for the config of a running analytics job to be removed from the '.ml-config' index (perhaps the user deleted the entire index, etc.). In that case the task remains without a matching config. I have raised #46781 to discuss how to deal with this issue. This commit focuses on `MlMemoryTracker` and changes it so that when we get the configs for the running tasks we leniently ignore missing ones. This at least means memory tracking will keep working for other jobs if one or more are missing. In addition, this commit makes the cleanup code for native analytics tests more robust by explicitly stopping all jobs and force-stopping if an error occurs. This helps so that a single failing test does not cause other tests to fail due to pending tasks. Backport of #46789	2019-09-18 16:35:25 +03:00
Przemysław Witek	e49be611ad	[7.x] Add audit messages for Data Frame Analytics (#46521 ) (#46738 )	2019-09-16 21:21:38 +02:00
Dimitris Athanasiou	63eb0d9081	[7.x][ML] Avoid marking data frame analytics task completed twice (#46721 ) (#46724 ) When the stop API is called while the task is running there is a chance the task gets marked completed twice. This may cause undesired side effects, like indexing the progress document a second time after the stop API has returned (the cause for #46705). This commit adds a check that the task has not been completed before proceeding to mark it so. In addition, when we update the task's state we could get some warnings that the task was missing if the stop API has been called in the meantime. We now check the errors are `ResourceNotFoundException` and ignore them if so. Closes #46705 Backports #46721	2019-09-15 17:25:26 +03:00
Dimitris Athanasiou	0bc8acaf5b	[7.x][ML] Create state index and alias before starting an analytics job (#46602 ) (#46648 ) This is fixing a bug where if an analytics job is started before any anomaly detection job is opened, we create an index after the state write alias. Instead, we should create the state index and alias before starting an analytics job and this commit makes sure this is the case. Backport of #46602	2019-09-13 10:34:12 +03:00
David Roberts	461de5b58e	[TEST] Remove incorrect data frame analytics state assertion (#46597 ) After starting the analytics job and checking its state the state can be any of "started", "reindexing" or "analyzing" depending on how quickly the work is done.	2019-09-11 16:33:14 +01:00
Dimitris Athanasiou	579af626f5	[7.x][ML] No error when datafeed stops during updating to started (#46495 ) (#46542 ) Investigating the test failure reported in #45518 it appears that the datafeed task was not found during a task state update. There are only two places where such an update is performed: when we set the state to `started` and when we set it to `stopping`. We handle `ResourceNotFoundException` in the latter but not in the former. Thus the test reveals a rare race condition where the datafeed gets requested to stop before we managed to update its state to `started`. I could not reproduce this scenario but it would be my best guess. This commit catches `ResourceNotFoundException` while updating the state to `started` and lets the task terminate smoothly. Closes #45518 Backport of #46495	2019-09-11 13:18:42 +03:00
Przemysław Witek	e38e631dac	[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967 ) (#46519 )	2019-09-11 12:17:26 +02:00
Przemysław Witek	e21deae535	Disallow persisting any documents when datafeed is isolated (#46485 ) (#46490 )	2019-09-09 21:01:27 +02:00
David Roberts	7c7fb7e32d	[ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432 ) ML users who upgrade from versions prior to 7.4 to 7.4 or later will have ML results indices that do not have mappings for the total_search_time_ms field. Therefore, when searching these indices we must tolerate this field not having a mapping. Fixes #46437	2019-09-06 14:31:15 +01:00
Dimitris Athanasiou	a6834068e3	[7.x][ML] Extract DataFrameAnalyticsTask into its own class (#46402 ) (#46426 ) This refactors `DataFrameAnalyticsTask` into its own class. The task has quite a lot of functionality now and I believe it would make code more readable to have it live as its own class rather than an inner class of the start action class. Backport of #46402	2019-09-06 14:13:46 +03:00
Benjamin Trent	457ff3e2fb	7.x/ml fix instance serialization bwc (#46404 ) * [ML] Fixing instance serialization version for bwc * fixing CppLogMessage	2019-09-05 13:23:26 -05:00
Benjamin Trent	5201386232	[ML] testFullClusterRestart waiting for stable cluster (#46280 ) (#46335 ) * [ML] waiting for ml indices before waiting task assignment testFullClusterRestart * waiting for a stable cluster after fullrestart * removing unused imports	2019-09-05 06:57:33 -05:00
Dimitris Athanasiou	8fca5b5204	[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271 ) (#46282 ) The test seems to have been failing due to a race condition between stopping the task and refreshing the destination index. In particular, we were going forward with refreshing the destination index even though the task stopped in the meantime. This was fixed in request. Closes #43960 Backport of #46271	2019-09-04 10:57:01 +03:00
Benjamin Trent	d0c5573a51	[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044 ) (#46096 ) Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors that are difficult to troubleshoot. This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.	2019-08-30 09:27:07 -05:00
Dimitris Athanasiou	5921ae53d8	[7.x][ML] Regression dependent variable must be numeric (#46072 ) (#46136 ) * [ML] Regression dependent variable must be numeric This adds a validation that the dependent variable of a regression analysis must be numeric. * Address review comments and fix some problems In addition to addressing the review comments, this commit fixes a few issues I found during testing. In particular: - if there were mappings for required fields but they were not included we were not reporting the error - if explicitly included fields had unsupported types we were not reporting the error Unfortunately, I couldn't get those fixed without refactoring the code in `ExtractedFieldsDetector`.	2019-08-30 09:57:43 +03:00
Przemysław Witek	b8a0379057	Refactor auditor-related classes (#45893 ) (#46120 )	2019-08-29 14:21:03 +02:00
Przemysław Witek	fbe9e8a530	Do not throw an exception if the process finished quickly but without any error. (#46073 ) (#46113 )	2019-08-29 10:47:17 +02:00
Dimitris Athanasiou	25d64508f6	[7.x][ML] Support boolean fields for DF analytics (#46037 ) (#46054 ) This commit adds support for `boolean` fields in data frame analytics (and currently both outlier detection and regression). The analytics process expects `boolean` fields to be encoded as integers with 0 or 1 value.	2019-08-28 12:02:29 +03:00
Dimitris Athanasiou	873ad3f942	[7.x][ML] Add option to regression to randomize training set (#45969 ) (#46017 ) Adds a parameter `training_percent` to regression. The default value is `100`. When the parameter is set to a value less than `100`, from the rows that can be used for training (i.e., those that have a value for the dependent variable) we randomly choose whether to actually use each one for training. This enables splitting the data into a training set and the rest, usually called testing, validation or holdout set, which allows for validating the model on data that have not been used for training. Technically, the analytics process considers as training the data that have a value for the dependent variable. Thus, when we decide a training row is not going to be used for training, we simply clear the row's dependent variable.	2019-08-27 17:53:11 +03:00
Benjamin Trent	a3a4ae0ac2	[ML] fixing bug where analytics process starts with 0 rows (#45879 ) (#45988 ) The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start. When building the configuration for the process we should not start the native process if there are no rows. Adding some logging to indicate what is occurring.	2019-08-26 14:18:17 -05:00
Benjamin Trent	d64018f8e1	[ML] add supported types to no fields error message (#45926 ) (#45987 ) * [ML] add supported types to no fields error message * adding supported types to logger debug	2019-08-26 14:18:00 -05:00
Dimitris Athanasiou	be554fe5f0	[7.x][ML] Improve progress reporting for DF analytics (#45856 ) (#45910 ) Previously, the stats API reported a progress percentage for DF analytics tasks that are running and are in the `reindexing` or `analyzing` state. This means that when the task is `stopped` there is no progress reported. Thus, one cannot distinguish between a task that never ran and one that completed. In addition, there are blind spots in the progress reporting. In particular, we do not account for when data is loaded into the process. We also do not account for when results are written. This commit addresses the above issues. It changes progress to being a list of objects, each one describing the phase and its progress as a percentage. We currently have 4 phases: reindexing, loading_data, analyzing, writing_results. When the task stops, progress is persisted as a document in the state index. The stats API now reports progress from in-memory if the task is running, or returns the persisted document (if there is one).	2019-08-23 23:04:39 +03:00
Przemysław Witek	85d55e30d0	Add test that proves _timing_stats document is deleted when the job is deleted (#45840 ) (#45854 )	2019-08-23 07:03:09 +02:00
Przemysław Witek	2ed19b2c81	Put error message from inside the process into the exception that is thrown when the process doesn't start correctly. (#45846 ) (#45875 )	2019-08-23 07:02:50 +02:00
Benjamin Trent	8e3c54fff7	[7.x] [ML] Adding data frame analytics stats to _usage API (#45820 ) (#45872 ) * [ML] Adding data frame analytics stats to _usage API (#45820) * [ML] Adding data frame analytics stats to _usage API * making the size of analytics stats 10k * adjusting backport	2019-08-22 15:15:41 -05:00
Przemysław Witek	7512337922	[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775 ) (#45825 )	2019-08-22 11:14:26 +02:00
Przemysław Witek	bf701b83d2	Shorten field names in EstimateMemoryUsageResponse (#45719 ) (#45772 )	2019-08-21 12:45:09 +02:00
Przemysław Witek	c6709f0979	Mute tests affected by renaming fields in Estimate memory usage response (#45743 ) (#45766 )	2019-08-21 09:57:23 +02:00
Dimitris Athanasiou	d5c3d9b50f	[7.x][ML] Do not skip rows with missing values for regression (#45751 ) (#45754 ) Regression analysis supports missing fields. Even more, it is expected that the dependent variable has missing values for the part of the data frame that is not for training. This commit allows declaring that an analysis supports missing values. For such an analysis, rows with missing values are not skipped. Instead, they are written as normal with empty strings used for the missing values. This also contains a fix to the integration test. Closes #45425	2019-08-21 08:15:38 +03:00
Benjamin Trent	ba7b677618	[ML] better handle empty results when evaluating regression (#45745 ) (#45759 ) * [ML] better handle empty results when evaluating regression * adding new failure test to ml_security black list * fixing equality check for regression results	2019-08-20 17:37:04 -05:00
Dimitris Athanasiou	49edf9e5b5	[7.x][ML] Remove timeout on waiting for DF analytics result processor to complete (#45724 ) (#45733 ) We cannot know how long the analysis will take to complete thus we should not have a timeout. Note that if the process crashes, the result processor will pick the exception due to the stream closing. Closes #45723	2019-08-20 17:21:40 +03:00
Przemysław Witek	b37ebd1adf	Prepare the codebase for new Auditor subclasses (#45716 ) (#45731 )	2019-08-20 16:03:50 +02:00
Przemysław Witek	80dd0a0948	Get rid of EstimateMemoryUsageRequest and EstimateMemoryUsageAction.Request. (#45718 ) (#45725 )	2019-08-20 15:49:17 +02:00
Przemysław Witek	7bc8400222	Call the new _estimate_memory_usage API endpoint on df analytics _start (#45536 ) (#45701 )	2019-08-19 21:37:55 +02:00
Igor Motov	98c850c08b	Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618 ) Changes the order of parameters in Geometries from lat, lon to lon, lat and moves all Geometry classes to the org.elasticsearch.geometry package. Backport of #45332 Closes #45048	2019-08-16 14:42:02 -04:00
Przemysław Witek	df574e5168	[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188 ) (#45510 )	2019-08-14 08:26:03 +02:00
Armin Braun	90803a5caf	Reenable Integ Tests in native-multi-node-tests (#45482 ) (#45496 ) * Reenable Integ Tests in native-multi-node-tests * The tests broken here were likely fixed by #45463 => let's reenable them and see if things run fine again * Relates #45405, #45455	2019-08-13 15:55:54 +02:00
Przemysław Witek	1aed388a24	Add view_index_metadata to roles.yml and remove as many df analytics test cases from build.gradle blacklist as possible. (#45451 ) (#45465 )	2019-08-13 08:31:58 +02:00
Mark Vieira	7e3379444b	Fix build failure due to unknown task and disable test conventions (cherry picked from commit 8ed84bc5cef9bcfae6c817059f764d97e4451a4a)	2019-08-12 09:18:39 -07:00
Przemyslaw Gomulka	421e9b8e8b	Mute integ tests in native-multi-node-tests (#45457 ) Tracked at #45405	2019-08-12 17:42:24 +02:00
Przemyslaw Gomulka	d11ae08467	Muting ForecastIT.testOverflowToDisk (#45435 ) (#45438 ) awaits #45405	2019-08-12 11:01:32 +02:00
Dimitris Athanasiou	d02d6e40c2	[ML] Mute regression integ test Relates #45425	2019-08-12 10:59:24 +03:00
Armin Braun	a9e1402189	Remove Settings from BaseRestRequest Constructor (#45418 ) (#45429 ) * Resolving the todo, cleaning up the unused `settings` parameter * Cleaning up some other minor dead code in affected classes	2019-08-12 05:14:45 +02:00
Dimitris Athanasiou	27497ff75f	[7.x][ML] Add regression analysis to DF analytics (#45292 ) (#45388 ) This commit adds a first draft of a regression analysis to data frame analytics. There is high probability that the exact syntax might change. This commit adds the new analysis type and its parameters as well as appropriate validation. It also modifies the extractor and the fields detector to be able to handle categorical fields as regression analysis supports them.	2019-08-09 19:31:13 +03:00
Jason Tedor	9a142ff25c	Introduce formal node ML role (#45174 ) This commit builds on the ability for plugins to introduce new roles to add a formal node ML role.	2019-08-06 13:00:05 -04:00
David Roberts	a1f0285f0e	[TEST] Only test US locale in day/month order test in FIPS JVM (#45141 ) In the FIPS JVM the JVM default locale seems to leak into places where it should be overridden. This change skips assertions in TimestampFormatFinderTests.testGuessIsDayFirstFromLocale that may be impacted. Fixes #45140	2019-08-02 15:04:47 +01:00
David Roberts	f617585dbd	[ML] Improve CSV header row detection in find_file_structure (#45099 ) When doing a fieldwise Levenshtein distance comparison between CSV rows, this change ignores all fields that have long values, not just the longest field. This approach works better for CSV formats that have multiple freeform text fields rather than just a single "message" field. Fixes #45047	2019-08-02 09:08:21 +01:00
Dimitris Athanasiou	8a6675b994	[7.x][ML] Check dest index is empty when starting DF analytics (#45094 ) (#45112 ) If one tries to start a DF analytics job that has already run, the result will be that the task will fail after reindexing the dest index from the source index. The results of the prior run will be gone and the task state is not properly set to failed with the failure reason. This commit improves the behavior in this scenario. First, we set the task state to `failed` in a set of failures that were missed. Second, a validation is added that if the destination index exists, it must be empty.	2019-08-02 00:19:48 +03:00
Przemysław Witek	6c87845fc1	Persist DatafeedTimingStats with RefreshPolicy.NONE by default (#44940 ) (#45079 )	2019-08-01 14:36:59 +02:00
Dimitris Athanasiou	aef419c0b0	[7.x][ML] Catch any error thrown while closing data frame analytics process (#44958 ) (#44968 ) In case closing the process throws an exception we should be catching it no matter its type. The process may have terminated because of a fatal error in which case closing the process will throw a server error, not an `IOException`. If this happens we fail to mark the persistent task as failed and the task gets in limbo.	2019-07-29 21:59:10 +03:00
Benjamin Trent	3b514f0dae	[ML] update Instant serialization (#44765 ) (#44954 ) * [ML] update Instant serialization * addressing PR comments * removing unused import	2019-07-29 13:06:56 -05:00
Dimitris Athanasiou	9dd527328a	[ML] Outlier detection should only fetch docs that have the analyzed … (#44944 ) (#44959 ) As data frame rows with missing values for analyzed fields are skipped, we can be more efficient by including a query that only picks documents that have values for all analyzed fields. Besides improving the number of documents we go through, we also provide a more accurate measurement of how many rows we need which reduces the memory requirements. This also adds an integration test that runs outlier detection on data with missing fields.	2019-07-29 18:23:56 +03:00
Luca Cavanna	a3cc32da64	TaskListener#onFailure to accept Exception instead of Throwable (#44946 ) TaskListener accepts today Throwable in its onFailure method. Though looking at where it is called (TransportAction), it can never be notified of a Throwable. This commit changes the signature of TaskListener#onFailure so that it accepts an `Exception` rather than a `Throwable` as second argument.	2019-07-29 16:47:19 +02:00
David Kyle	d05f12dadb	[ML] Close any opened pipes if there is an error connecting to the process (#44869 )	2019-07-29 10:48:31 +01:00
Przemysław Witek	79121ea127	[7.x] Implement exponential average search time per hour statistics. (#44683 ) (#44897 )	2019-07-26 15:56:34 +02:00
Przemysław Witek	8bb8543fdf	Treat PostDataActionResponse.DataCounts.bucketCount as incremental rather than absolute (total). (#44803 ) (#44856 )	2019-07-25 20:46:56 +02:00
Przemysław Witek	53f409e5ae	Add result_type field to TimingStats and DatafeedTimingStats documents (#44812 ) (#44841 )	2019-07-25 10:11:55 +02:00
Andrei Stefan	2633d11eb7	Switch from using docvalue_fields to extracting values from _source (#44062 ) (#44804 ) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. * This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias. (cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)	2019-07-25 10:02:41 +03:00
Alpar Torok	b34ac66d96	Mute multiple tests on Windows (7.x) (#44676 ) * Mute failing test tracked in #44552 * mute EvilSecurityTests tracking in #44558 * Fix line endings in ESJsonLayoutTests * Mute failing ForecastIT test on windows Tracking in #44609 * mute BasicRenormalizationIT.testDefaultRenormalization tracked in #44613 * fix mute testDefaultRenormalization * Increase busyWait timeout windows is slow * Mute failure unconfigured node name * mute x-pack internal cluster test windows tracking #44610 * Mute JvmErgonomicsTests on windows Tracking #44669 * mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot Tracking #44671 * Mute NodeTests on Windows Tracking #44256	2019-07-22 11:32:29 +03:00
David Kyle	0fc091f166	Enable XLint warnings for ML (#44346 ) Removes the warning suppression -Xlint:-deprecation,-rawtypes,-serial,-try,-unchecked. Many warnings were unchecked warnings in the test code, often because of the use of mocks. These are suppressed with @SuppressWarnings	2019-07-18 09:33:37 +01:00
Tal Levy	a5ad59451c	migrate more ML actions off of using Request suppliers (#44462 ) (#44529 ) many classes still use the Streamable constructors of HandledTransportAction, this commit moves more of those classes to the new Writeable constructors. relates #34389.	2019-07-17 20:28:29 -07:00
Ryan Ernst	0755a13c9f	Convert AcknowledgedRequest to Writeable.Reader (#44412 ) (#44454 ) This commit adds constructors to AcknowledgedRequest subclasses to implement Writeable.Reader, and ensures all future subclasses implement the same. relates #34389	2019-07-17 11:17:36 -07:00
Tal Levy	901310a826	[7.x] Migrate ML Actions to use writeable ActionType (#44302 ) (#44391 ) * Migrate ML Actions to use writeable ActionType (#44302) This commit converts all the StreamableResponseActionType actions in the ML core module to be ActionType and leverage the Writeable infrastructure.	2019-07-16 12:41:10 -07:00
Przemysław Witek	34bf6bcec0	Treat big changes in searchCount as significant and persist the document after such changes (#44413 ) (#44424 )	2019-07-16 16:15:32 +02:00
Lee Hinman	fb0461ac76	[7.x] Add Snapshot Lifecycle Management (#44382 ) * Add Snapshot Lifecycle Management (#43934) * Add SnapshotLifecycleService and related CRUD APIs This commit adds `SnapshotLifecycleService` as a new service under the ilm plugin. This service handles snapshot lifecycle policies by scheduling based on the policies defined schedule. This also includes the get, put, and delete APIs for these policies Relates to #38461 * Make scheduledJobIds return an immutable set * Use Object.equals for SnapshotLifecyclePolicy * Remove unneeded TODO * Implement ToXContentFragment on SnapshotLifecyclePolicyItem * Copy contents of the scheduledJobIds * Handle snapshot lifecycle policy updates and deletions (#40062) (Note this is a PR against the `snapshot-lifecycle-management` feature branch) This adds logic to `SnapshotLifecycleService` to handle updates and deletes for snapshot policies. Policies with incremented versions have the old policy cancelled and the new one scheduled. Deleted policies have their schedules cancelled when they are no longer present in the cluster state metadata. Relates to #38461 * Take a snapshot for the policy when the SLM policy is triggered (#40383) (This is a PR for the `snapshot-lifecycle-management` branch) This commit fills in `SnapshotLifecycleTask` to actually perform the snapshotting when the policy is triggered. Currently there is no handling of the results (other than logging) as that will be added in subsequent work. This also adds unit tests and an integration test that schedules a policy and ensures that a snapshot is correctly taken. Relates to #38461 * Record most recent snapshot policy success/failure (#40619) Keeping a record of the results of the successes and failures will aid troubleshooting of policies and make users more confident that their snapshots are being taken as expected. This is the first step toward writing history in a more permanent fashion. 
* Validate snapshot lifecycle policies (#40654) (This is a PR against the `snapshot-lifecycle-management` branch) With the commit, we now validate the content of snapshot lifecycle policies when the policy is being created or updated. This checks for the validity of the id, name, schedule, and repository. Additionally, cluster state is checked to ensure that the repository exists prior to the lifecycle being added to the cluster state. Part of #38461 * Hook SLM into ILM's start and stop APIs (#40871) (This pull request is for the `snapshot-lifecycle-management` branch) This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are cancelled. Relates to #38461 * Add tests for SnapshotLifecyclePolicyItem (#40912) Adds serialization tests for SnapshotLifecyclePolicyItem. * Fix improper import in build.gradle after master merge * Add human readable version of modified date for snapshot lifecycle policy (#41035) * Add human readable version of modified date for snapshot lifecycle policy This small change changes it from: ``` ... "modified_date": 1554843903242, ... ``` To ``` ... "modified_date" : "2019-04-09T21:05:03.242Z", "modified_date_millis" : 1554843903242, ... ``` Including the `"modified_date"` field when the `?human` field is used. Relates to #38461 * Fix test * Add API to execute SLM policy on demand (#41038) This commit adds the ability to perform a snapshot on demand for a policy. This can be useful to take a snapshot immediately prior to performing some sort of maintenance. ```json PUT /_ilm/snapshot/<policy>/_execute ``` And it returns the response with the generated snapshot name: ```json { "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug" } ``` Note that this does not allow waiting for the snapshot, and the snapshot could still fail. It does record this information into the cluster state similar to a regularly triggered SLM job. 
Relates to #38461 * Add next_execution to SLM policy metadata (#41221) * Add next_execution to SLM policy metadata This adds the next time a snapshot lifecycle policy will be executed when retrieving a policy's metadata, for example: ```json GET /_ilm/snapshot?human { "production" : { "version" : 1, "modified_date" : "2019-04-15T21:16:21.865Z", "modified_date_millis" : 1555362981865, "policy" : { "name" : "<production-snap-{now/d}>", "schedule" : "/30 * * * ?", "repository" : "repo", "config" : { "indices" : [ "foo-", "important" ], "ignore_unavailable" : true, "include_global_state" : false } }, "next_execution" : "2019-04-15T21:16:30.000Z", "next_execution_millis" : 1555362990000 }, "other" : { "version" : 1, "modified_date" : "2019-04-15T21:12:19.959Z", "modified_date_millis" : 1555362739959, "policy" : { "name" : "<other-snap-{now/d}>", "schedule" : "0 30 2 * ?", "repository" : "repo", "config" : { "indices" : [ "other" ], "ignore_unavailable" : false, "include_global_state" : true } }, "next_execution" : "2019-04-16T02:30:00.000Z", "next_execution_millis" : 1555381800000 } } ``` Relates to #38461 * Fix and enhance tests * Figured out how to Cron * Change SLM endpoint from /_ilm/* to /_slm/* (#41320) This commit changes the endpoint for snapshot lifecycle management from: ``` GET /_ilm/snapshot/<policy> ``` to: ``` GET /_slm/policy/<policy> ``` It mimics the ILM path only using `slm` instead of `ilm`. Relates to #38461 * Add initial documentation for SLM (#41510) * Add initial documentation for SLM This adds the initial documentation for snapshot lifecycle management. It also includes the REST spec API json files since they're sort of documentation. Relates to #38461 * Add `manage_slm` and `read_slm` roles (#41607) * Add `manage_slm` and `read_slm` roles This adds two more built in roles - `manage_slm` which has permission to perform any of the SLM actions, as well as stopping, starting, and retrieving the operation status of ILM. 
`read_slm` which has permission to retrieve snapshot lifecycle policies as well as retrieving the operation status of ILM. Relates to #38461 * Add execute to the test * Fix ilm -> slm typo in test * Record SLM history into an index (#41707) It is useful to have a record of the actions that Snapshot Lifecycle Management takes, especially for the purposes of alerting when a snapshot fails or has not been taken successfully for a certain amount of time. This adds the infrastructure to record SLM actions into an index that can be queried at leisure, along with a lifecycle policy so that this history does not grow without bound. Additionally, SLM automatically setting up an index + lifecycle policy leads to `index_lifecycle` custom metadata in the cluster state, which some of the ML tests don't know how to deal with due to setting up custom `NamedXContentRegistry`s. Watcher would cause the same problem, but it is already disabled (for the same reason). * High Level Rest Client support for SLM (#41767) * High Level Rest Client support for SLM This commit add HLRC support for SLM. Relates to #38461 * Fill out documentation tests with tags * Add more callouts and asciidoc for HLRC * Update javadoc links to real locations * Add security test testing SLM cluster privileges (#42678) * Add security test testing SLM cluster privileges This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm` cluster privileges. Relates to #38461 * Don't redefine vars * Add Getting Started Guide for SLM (#42878) This commit adds a basic Getting Started Guide for SLM. * Include SLM policy name in Snapshot metadata (#43132) Keep track of which SLM policy in the metadata field of the Snapshots taken by SLM. This allows users to more easily understand where the snapshot came from, and will enable future SLM features such as retention policies. 
* Fix compilation after master merge * [TEST] Move exception wrapping for devious exception throwing Fixes an issue where an exception was created from one line and thrown in another. * Fix SLM for the change to AcknowledgedResponse * Add Snapshot Lifecycle Management Package Docs (#43535) * Fix compilation for transport actions now that task is required * Add a note mentioning the privileges needed for SLM (#43708) * Add a note mentioning the privileges needed for SLM This adds a note to the top of the "getting started with SLM" documentation mentioning that there are two built-in privileges to assist with creating roles for SLM users and administrators. Relates to #38461 * Mention that you can create snapshots for indices you can't read * Fix REST tests for new number of cluster privileges * Mute testThatNonExistingTemplatesAreAddedImmediately (#43951) * Fix SnapshotHistoryStoreTests after merge * Remove overridden newResponse functions that have been removed * Fix compilation for backport * Fix get snapshot output parsing in test * [DOCS] Add redirects for removed autogen anchors (#44380) * Switch <tt>...</tt> in javadocs for {@code ...}	2019-07-16 07:37:13 -06:00
Przemysław Witek	3f3a3d3f2b	[7.x] Add DatafeedTimingStats.average_search_time_per_bucket_ms and TimingStats.total_bucket_processing_time_ms stats (#44125 ) (#44404 )	2019-07-16 12:51:29 +02:00
Ryan Ernst	7e06888bae	Convert testclusters to use distro download plugin (#44253 ) (#44362 ) Test clusters currently has its own set of logic for dealing with finding different versions of Elasticsearch, downloading them, and extracting them. This commit converts testclusters to use the DistributionDownloadPlugin.	2019-07-15 17:53:05 -07:00
Ryan Ernst	59658daef9	Separate streamable based master node actions (#44313 ) This commit creates new base classes for master node actions whose response types still implement Streamable. This simplifies both finding remaining classes to convert, as well as creating new master node actions that use Writeable for their responses. relates #34389	2019-07-15 09:20:20 -07:00
Armin Braun	d73e2f9c56	HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164 ) (#44324 ) * HLRC: Fix '+' Not Correctly Encoded in GET Req. * Encode `+` correctly as `%2B` in URL paths * Keep encoding `+` as space in URL parameters * Closes #33077	2019-07-15 10:21:54 +02:00
Ryan Ernst	1dcf53465c	Reorder HandledTransportAction ctor args (#44291 ) This commit moves the Supplier variant of HandledTransportAction to have a different ordering than the Writeable.Reader variant. The Supplier version is used for the legacy Streamable, and currently having the location of the Writeable.Reader vs Supplier in the same place forces using casts of Writeable.Reader to select the correct super constructor. This change in ordering allows easier migration to Writeable.Reader. relates #34389	2019-07-12 13:45:09 -07:00
Przemysław Witek	dd5f4ae00e	Update .ml-config mappings before indexing job, datafeed or df analytics config (#44216 ) (#44273 )	2019-07-12 16:49:48 +02:00
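
Note on commit 873ad3f942 (`training_percent`): the message describes choosing at random which labelled rows are used for training and clearing the dependent variable on the rest. A minimal Python sketch of that idea follows. It is not the actual Java implementation; the function name `apply_training_percent`, the dict-based row representation, and the `seed` parameter are assumptions made purely for illustration.

```python
import random


def apply_training_percent(rows, dependent_variable, training_percent=100.0, seed=None):
    """Return copies of `rows` where rows not picked for training have
    their dependent variable cleared (set to None).

    Hypothetical helper: rows are plain dicts, and a row counts as a
    training candidate only if it has a value for the dependent variable.
    """
    rng = random.Random(seed)
    result = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        has_label = row.get(dependent_variable) is not None
        # Each candidate row is kept with probability training_percent / 100.
        # rng.random() is in [0, 1), so 100 keeps every row and 0 keeps none.
        if has_label and rng.random() * 100.0 >= training_percent:
            row[dependent_variable] = None
        result.append(row)
    return result
```

Rows whose dependent variable was cleared look the same as rows that never had one, which matches the commit's remark that the analytics process treats only rows with a dependent variable value as training data.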

651 Commits