OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jason Tedor	381d7586e4	Introduce formal role for remote cluster client (#54138 ) This commit introduces a formal role for identifying nodes that are capable of making connections to remote clusters. Relates #53924	2020-03-24 21:59:43 -04:00
David Roberts	7667004b20	[ML] Add a model memory estimation endpoint for anomaly detection (#54129 ) A new endpoint for estimating anomaly detection job model memory requirements: POST _ml/anomaly_detectors/estimate_model_memory Backport of #53507	2020-03-24 22:55:11 +00:00
Dimitris Athanasiou	c141c1dd89	[7.x][ML] Stratified cross validation split for classification (#54087 ) (#54104 ) As classification now works for multiple classes, randomly picking training/test data frame rows is not good enough. This commit introduces a stratified cross validation splitter that maintains the proportion of each class in the dataset in the sample that is used for training the model. Backport of #54087	2020-03-24 18:47:36 +02:00
David Roberts	1421471556	[ML] Introduce a "starting" datafeed state for lazy jobs (#54065 ) It is possible for ML jobs to open lazily if the "allow_lazy_open" option in the job config is set to true. Such jobs wait in the "opening" state until a node has sufficient capacity to run them. This commit fixes the bug that prevented datafeeds for jobs lazily waiting assignment from being started. The state of such datafeeds is "starting", and they can be stopped by the stop datafeed API while in this state with or without force. Backport of #53918	2020-03-24 13:00:04 +00:00
Dimitris Athanasiou	be20bb5755	[7.x][ML] No refresh on indexing DFA stats (#53977 ) (#54064 ) When we index data frame analytics stats docs we do not need to refresh immediately. Backport of #53977	2020-03-24 13:13:03 +02:00
Dimitris Athanasiou	5ce7c99e74	[7.x][ML] Data frame analytics data counts (#53998 ) (#54031 ) This commit instruments data frame analytics with stats for the data that are being analyzed. In particular, we count training docs, test docs, and skipped docs. In order to account docs with missing values as skipped docs for analyses that do not support missing values, this commit changes the extractor so that it only ignores docs with missing values when it collects the data summary, which is used to estimate memory usage. Backport of #53998	2020-03-24 11:30:43 +02:00
Benjamin Trent	19af869243	[ML] adds multi-class feature importance support (#53803 ) (#54024 ) Adds multi-class feature importance calculation. Feature importance objects are now mapped as follows (logistic) Regression: ``` { "feature_name": "feature_0", "importance": -1.3 } ``` Multi-class [class names are `foo`, `bar`, `baz`] ``` { "feature_name": "feature_0", "importance": 2.0, // sum(abs()) of class importances "foo": 1.0, "bar": 0.5, "baz": -0.5 }, ``` For users to get the full benefit of aggregating and searching for feature importance, they should update their index mapping as follows (before turning this option on in their pipelines) ``` "ml.inference.feature_importance": { "type": "nested", "dynamic": true, "properties": { "feature_name": { "type": "keyword" }, "importance": { "type": "double" } } } ``` The mapping field name is as follows `ml.<inference.target_field>.<inference.tag>.feature_importance` if `inference.tag` is not provided in the processor definition, it is not part of the field path. `inference.target_field` is defaulted to `ml.inference`. //cc @lcawl ^ Where should we document this? If this makes it in for 7.7, there shouldn't be any feature_importance at inference BWC worries as 7.7 is the first version to have it.	2020-03-23 18:49:07 -04:00
Benjamin Trent	d276058c6c	[ML] adjusting feature importance mapping for multi-class support (#53821 ) (#54013 ) Feature importance storage format is changing to encompass multi-class. Feature importance objects are now mapped as follows (logistic) Regression: ``` { "feature_name": "feature_0", "importance": -1.3 } ``` Multi-class [class names are `foo`, `bar`, `baz`] ``` { "feature_name": "feature_0", "importance": 2.0, // sum(abs()) of class importances "foo": 1.0, "bar": 0.5, "baz": -0.5 }, ``` This change adjusts the mapping creation for analytics so that the field is mapped as a `nested` type. Native side change: https://github.com/elastic/ml-cpp/pull/1071	2020-03-23 15:50:12 -04:00
Przemysław Witek	88c5d520b3	[7.x] Verify that the field is aggregatable before attempting cardinality aggregation (#53874 ) (#54004 )	2020-03-23 19:36:33 +01:00
Dimitris Athanasiou	965af3a68b	[7.x][ML] Delete DF analytics stats upon job deletion (#53933 ) (#53997 ) Since a data frame analytics job may have associated docs in the .ml-stats-* indices, when the job is deleted we should delete those docs too. Backport of #53933	2020-03-23 19:55:36 +02:00
Ryan Ernst	960d1fb578	Revert "Introduce system index APIs for Kibana (#53035 )" (#53992 ) This reverts commit `c610e0893d`. backport of #53912	2020-03-23 10:29:35 -07:00
Dimitris Athanasiou	3873510332	[7.x][ML] Refactor DFA custom processor to cross validation splitter (#53915 ) (#53956 ) While `CustomProcessor` is generic and allows for flexibility, there are new requirements that make cross validation a concept that is hard to abstract behind a custom processor. In particular, we would like to add data_counts to the DFA jobs stats. Counting training vs. test docs would be a useful statistic. We would also want to add a different cross validation strategy for multiclass classification. This commit renames custom processors to cross validation splitters, which allows for those enhancements without cryptically doing things as a side effect of the abstract custom processing. Backport of #53915	2020-03-23 17:15:14 +02:00
Przemysław Witek	a68071dbba	[7.x] Delete empty .ml-state* indices during nightly maintenance task. (#53587 ) (#53849 )	2020-03-20 13:08:36 +01:00
Alan Woodward	d23112f441	Report parser name and location in XContent deprecation warnings (#53805 ) It's simple to deprecate a field used in an ObjectParser just by adding deprecation markers to the relevant ParseField objects. The warnings themselves don't currently have any context - they simply say that a deprecated field has been used, but not where in the input xcontent it appears. This commit adds the parent object parser name and XContentLocation to these deprecation messages. Note that the context is automatically stripped from warning messages when they are asserted on by integration tests and REST tests, because randomization of xcontent type during these tests means that the XContentLocation is not constant	2020-03-20 11:52:55 +00:00
Dimitris Athanasiou	60153c5433	[7.x][ML] Data frame analytics analysis stats (#53788 ) (#53844 ) Adds parsing and indexing of analysis instrumentation stats. The latest one is also returned from the get-stats API. Note that we chose to duplicate objects even where they are currently similar. There are already ideas on how these will diverge in the future and while the duplication looks ugly at the moment, it is the option that offers the highest flexibility. Backport of #53788	2020-03-20 12:11:53 +02:00
Benjamin Trent	433952b595	[7.x] [ML] only retry persistence failures when the failure is intermittent and stop retrying when analytics job is stopping (#53725 ) (#53808 ) * [ML] only retry persistence failures when the failure is intermittent and stop retrying when analytics job is stopping (#53725) This fixes two issues: - Results persister would retry actions even if they are not intermittent. An example of a persistent failure is a doc mapping problem. - Data frame analytics would continue to retry to persist results even after the job is stopped. closes https://github.com/elastic/elasticsearch/issues/53687	2020-03-19 13:56:41 -04:00
Jake Landis	db3420d757	[7.x] Optimize which Rest resources are used by the Rest tests… (#53766 ) This should help with Gradle's incremental compile such that projects only depend upon the resources they use. related #52114	2020-03-19 12:28:59 -05:00
Benjamin Trent	2ccb963f1d	Create GET _cat/transforms API Issue (#53643 ) (#53726 ) Adds new `_cat/transform` and `_cat/transform/{transform_id}` endpoints.	2020-03-18 10:45:28 -04:00
Przemysław Witek	ec13c093df	Make ML index aliases hidden (#53160 ) (#53710 )	2020-03-18 10:28:45 +01:00
Przemysław Witek	376b2ae735	[7.x] Make classification evaluation metrics work when there is field mapping type mismatch (#53458 ) (#53601 )	2020-03-16 15:38:56 +01:00
Dimitris Athanasiou	94da4ca3fc	[7.x][ML] Extend classification to support multiple classes (#53539 ) (#53597 ) Prepares classification analysis to support more than just two classes. It introduces a new parameter to the process config which dictates the `num_classes` to the process. It also changes the max classes limit to `30` provisionally. Backport of #53539	2020-03-16 15:00:54 +02:00
Benjamin Trent	1262ab2762	[ML] [Inference] fix number inference models returned in x-pack info call (#53540 ) (#53560 ) the ML portion of the x-pack info API was erroneously counting configuration documents and definition documents. The underlying implementation of our storage separates the two out. This PR filters the query so that only trained model config documents are counted.	2020-03-13 16:53:34 -04:00
Benjamin Trent	4e43ede735	[ML] renaming inference processor field field_mappings to new name field_map (#53433 ) (#53502 ) This renames the `inference` processor configuration field `field_mappings` to `field_map`. `field_mappings` is now deprecated.	2020-03-13 15:40:57 -04:00
Tom Veasey	690099553c	[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552 ) Adds a new parameter for classification that enables choosing whether to assign labels to maximise accuracy or to maximise the minimum class recall. Fixes #52427.	2020-03-13 17:35:51 +00:00
Benjamin Trent	89668c5ea0	[ML][Inference] adds new default_field_map field to trained models (#53294 ) (#53419 ) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 13:49:39 -04:00
Przemysław Witek	8c4c19d310	Perform evaluation in multiple steps when necessary (#53295 ) (#53409 )	2020-03-11 15:36:38 +01:00
Przemysław Witek	063957b7d8	Simplify "refresh" calls. (#53385 ) (#53393 )	2020-03-11 12:26:11 +01:00
Dimitris Athanasiou	cc7751eb16	[7.x][ML] Add ILM policy to ml stats indices (#53349 ) (#53392 ) Adds a size based ILM policy to automatically rollover ml stats indices. Backport of #53349	2020-03-11 13:01:34 +02:00
Dimitris Athanasiou	0fd0516d0d	[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300 ) (#53390 ) Deprecates `maximum_number_trees` parameter of classification and regression and replaces it with `max_trees`. Backport of #53300	2020-03-11 12:45:27 +02:00
David Roberts	532a720e1b	[ML] Skeleton estimate_model_memory endpoint for anomaly detection (#53386 ) This is a partial implementation of an endpoint for anomaly detector model memory estimation. It is not complete, lacking docs, HLRC and sensible numbers for many anomaly detector configurations. These will be added in a followup PR in time for 7.7 feature freeze. A skeleton endpoint is useful now because it allows work on the UI side of the change to commence. The skeleton endpoint handles the same cases that the old UI code used to handle, and produces very similar estimates for these cases. Backport of #53333	2020-03-11 10:20:00 +00:00
Przemysław Witek	d54d7f2be0	[7.x] Implement ILM policy for .ml-state* indices (#52356 ) (#53327 )	2020-03-10 14:24:18 +01:00
Benjamin Trent	856d9bfbc1	[ML] fixing data frame analysis test when two jobs are started in succession quickly (#53192 ) (#53332 ) A previous change (#53029) is causing analysis jobs to wait for certain indices to be made available. While it is good for jobs to wait, they could fail early on _start. This change will cause the persistent task to continually retry node assignment when the failure is due to shards not being available. If the shards are not available by the time `timeout` is reached by the predicate, it is treated as a _start failure and the task is canceled. For tasks seeking a new assignment after a node failure, that behavior is unchanged. closes #53188	2020-03-10 08:30:47 -04:00
Mayya Sharipova	f96ad5c32d	Mute testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows	2020-03-06 12:48:05 -05:00
Mark Vieira	09a3f45880	Mute ClassificationIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet Signed-off-by: Mark Vieira <portugee@gmail.com>	2020-03-06 07:38:04 -08:00
James Baiera	01f00df5cd	Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet	2020-03-06 07:37:57 -08:00
Dimitris Athanasiou	9abf537527	[7.x][ML] Improve DF analytics audits and logging (#53179 ) (#53218 ) Adds audits for when the job starts reindexing, loading data, analyzing, writing results. Also adds some info logging. Backport of #53179	2020-03-06 13:47:27 +02:00
Benjamin Trent	af0b1c2860	[ML] Fix minor race condition in dataframe analytics _stop (#53029 ) (#53164 ) Tests have been periodically failing due to a race condition on checking a recently `STOPPED` task's state. The `.ml-state` index is not created until the task has already been transitioned to `STARTED`. This allows the `_start` API call to return. But, if a user (or test) immediately attempts to `_stop` that job, the job could stop and the task could be removed BEFORE the `.ml-state|stats` indices are created/updated. This change moves towards the task cleaning up itself in its main execution thread. `stop` flips the flag of the task to `isStopping` and now we check `isStopping` at every necessary method, allowing the task to stop gracefully. closes #53007	2020-03-05 09:59:18 -05:00
Benjamin Trent	181ee3ae0b	[ML] specifying missing_field_value value and using it instead of empty_string (#53108 ) (#53165 ) For analytics, we need a consistent way of indicating when a value is missing. Inheriting from anomaly detection, analysis sent `""` when a field is missing. This works fine with numbers, but the underlying analytics process actually treats `""` as a category in categorical values. Consequently, you end up with this situation in the resulting model ``` { "frequency_encoding" : { "field" : "RainToday", "feature_name" : "RainToday_frequency", "frequency_map" : { "" : 0.009844409027270245, "No" : 0.6472019970785184, "Yes" : 0.6472019970785184 } } } ``` For inference this is a problem, because inference will treat missing values as `null` and thus not include them in the infer call against the model. This PR takes advantage of our new `missing_field_value` option and supplies `\0` as the value.	2020-03-05 09:50:52 -05:00
David Roberts	01504df876	[TEST] Force close failed job before skipping test (#53128 ) The assumption added in #52631 skips a problematic test if it fails to create the required conditions for the scenario it is supposed to be testing. (This happens very rarely.) However, before skipping the test it needs to remove the failed job it has created because the standard test cleanup code treats failed jobs as fatal errors. Closes #52608	2020-03-05 10:52:41 +00:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Yang Wang	70814daa86	Allow _rollup_search with read privilege (#52043 ) (#53047 ) Currently _rollup_search requires manage privilege to access. It should really be a read only operation. This PR changes the requirement to be read indices privilege. Resolves: #50245	2020-03-03 22:29:54 +11:00
Mark Vieira	f8396e8d15	Mute RunDataFrameAnalyticsIT.testStopOutlierDetectionWithEnoughDocumentsToScroll Signed-off-by: Mark Vieira <portugee@gmail.com>	2020-03-02 09:21:55 -08:00
Lisa Cawley	4fbe1b0550	[DOCS] Adds cat anomaly detectors API (#52866 ) (#52970 )	2020-03-02 07:28:55 -08:00
Dimitris Athanasiou	85b4e45093	[7.x][ML] Parse and report memory usage for DF Analytics (#52778 ) (#52980 ) Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs. Backport of #52778 and #52958	2020-02-29 13:03:40 +02:00
Benjamin Trent	19a6c5d980	[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531 ) (#52901 ) * [ML][Inference] Add support for multi-value leaves to the tree model (#52531) This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.	2020-02-27 14:05:28 -05:00
Benjamin Trent	eac38e9847	[ML] Add indices_options to datafeed config and update (#52793 ) (#52905 ) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 13:43:25 -05:00
David Kyle	d8bdf31110	Revert "Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart" This reverts commit `ad3a3b1af9`.	2020-02-27 12:38:13 +00:00
David Kyle	6e5e64559a	Unwrap cause from remote ActionTransportExceptions (#52842 ) (#52878 ) And log the cause	2020-02-27 11:58:28 +00:00
Yang Wang	14c21aedd2	Simplify ml license checking with XpackLicenseState internals (#52684 ) (#52863 ) This change replaces the TrainedModelConfig#isAvailableWithLicense method with calls to XPackLicenseState#isAllowedByLicense. Please note there are subtle changes to the code logic. But they are the right changes: * Instead of Platinum license, Enterprise license now guarantees availability. * No explicit check when the license requirement is basic. Since basic license is always available, this check is unnecessary. * Trial license is always allowed.	2020-02-27 14:14:16 +11:00
Jake Landis	b4179a8814	[7.x] Refactor watcher tests (#52799 ) (#52844 ) This PR moves the majority of the Watcher REST tests under the Watcher x-pack plugin. Specifically, moves the Watcher tests from: x-pack/plugin/test x-pack/qa/smoke-test-watcher x-pack/qa/smoke-test-watcher-with-security x-pack/qa/smoke-test-monitoring-with-watcher to: x-pack/plugin/watcher/qa/rest (/test and /qa/smoke-test-watcher) x-pack/plugin/watcher/qa/with-security x-pack/plugin/watcher/qa/with-monitoring Additionally, this disables Watcher from the main x-pack test cluster and consolidates the stop/start logic for the tests listed. No changes to the tests (beyond moving them) are included. 3rd party tests and doc tests (which also touch Watcher) are not included in the changes here.	2020-02-26 15:57:10 -06:00
Lisa Cawley	b788ec7157	[DOCS] Adds cat datafeeds API (#52738 )	2020-02-26 09:28:57 -08:00
David Kyle	ad3a3b1af9	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart	2020-02-26 14:31:00 +00:00
Jake Landis	8d311297ca	[7.x] Smarter copying of the rest specs and tests (#52114 ) (#52798 ) * Smarter copying of the rest specs and tests (#52114) This PR addresses the unnecessary copying of the rest specs and allows for better semantics for which specs and tests are copied. By default the rest specs will get copied if the project applies `elasticsearch.standalone-rest-test` or `esplugin` and the project has rest tests or you configure the custom extension `restResources`. This PR also removes the need for dozens of places where the x-pack specs were copied by supporting copying of the x-pack rest specs too. The plugin/task introduced here can also copy the rest tests to the local project through a similar configuration. The new plugin/task allows a user to minimize the surface area of which rest specs are copied. Each project can be configured to include only a subset of the specs (or tests). Configuring a project to only copy the specs when actually needed should help with build cache hit rates since we can better define what is actually in use. However, project level optimizations for build cache hit rates are not included with this PR. Also, with this PR you can no longer use the includePackaged flag on integTest task. The following items are included in this PR: * new plugin: `elasticsearch.rest-resources` * new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy * new extension 'restResources' ``` restResources { restApi { includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar includeXpack 'baz' //will include x-pack specs that start with baz } restTests { includeCore 'foo', 'bar' //will include the core tests that start with foo and bar includeXpack 'baz' //will include the x-pack tests that start with baz } } ```	2020-02-26 08:13:41 -06:00
David Kyle	37be695d5c	[ML] Handle failed datafeed in MlDistributedFailureIT (#52631 ) (#52789 )	2020-02-26 08:18:37 +00:00
David Roberts	cf122d13b8	[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720 ) This is because beat.timezone was renamed to event.timezone in elastic/beats#9458	2020-02-25 12:33:53 +00:00
David Kyle	de3d674bb7	Revert "Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart" This reverts commit `c4d91143ac`.	2020-02-24 15:22:49 +00:00
David Kyle	044a4e127a	[ML] Add reason to DataFrameAnalyticsTask setFailed log message (#52659 ) (#52707 )	2020-02-24 15:21:51 +00:00
Benjamin Trent	afd90647c9	[ML] Adds feature importance option to inference processor (#52218 ) (#52666 ) This adds machine learning model feature importance calculations to the inference processor. The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values` Example: ``` "inference": { "field_mappings": {}, "model_id": "my_model", "inference_config": { "regression": { "num_top_feature_importance_values": 3 } } } ``` This will write to the document as follows: ``` "inference" : { "feature_importance" : { "FlightTimeMin" : -76.90955548511226, "FlightDelayType" : 114.13514762158526, "DistanceMiles" : 13.731580450792187 }, "predicted_value" : 108.33165831875137, "model_id" : "my_model" } ``` This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded. NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc usability blocked by: https://github.com/elastic/ml-cpp/pull/991	2020-02-21 18:42:31 -05:00
Jack Conradson	c4d91143ac	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Relates: #52654	2020-02-21 09:32:19 -08:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
Przemysław Witek	b84e8db7b5	[7.x] Rename .ml-state index to .ml-state-000001 to support rollover (#52510 ) (#52595 )	2020-02-21 08:55:59 +01:00
Yang Wang	4bc7545e43	Add enterprise mode and refactor license check (#51864 ) (#52115 ) Add enterprise operation mode to properly map enterprise license. Also refactor XPackLicenseState class to consolidate license status and mode checks. This class has many synchronized methods to check basically three things: * Minimum operation mode required * Whether security is enabled * Whether current license needs to be active Depending on the actual feature, either 1, 2 or all of above checks are performed. These are now consolidated into 3 helper methods (2 of them are new). The synchronization is pushed down to the helper methods so actual checking methods no longer need to worry about it. resolves: #51081	2020-02-21 14:18:18 +11:00
Benjamin Trent	2a5c181dda	[ML][Inference] don't return inflated definition when storing trained models (#52573 ) (#52580 ) When `PUT` is called to store a trained model, it is useful to return the newly created model config. But, it is NOT useful to return the inflated definition. These definitions can be large and returning the inflated definition causes undue work on the server and client side. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-20 19:47:29 -05:00
Benjamin Trent	013d5c2d24	[ML] Adds support for a global calendar via `_all` (#50372 ) (#52578 ) This adds `_all` to Calendar searches. This enables users to supply the `_all` string in the `job_ids` array when creating a Calendar. That calendar will now be applied to all jobs (existing and newly created). Closes #45013 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-20 17:22:59 -05:00
David Kyle	7bbe5c8464	[ML] Validate tree feature index is within range (#52514 ) This changes the tree validation code to ensure no node in the tree has a feature index that is beyond the bounds of the feature_names array. Specifically this handles the situation where the C++ emits a tree containing a single node and an empty feature_names list. This is a valid tree used to centre the data in the ensemble but the validation code would reject this as feature_names is empty. This meant a broken workflow as you cannot GET the model and PUT it back	2020-02-19 14:41:43 +00:00
Przemysław Witek	7cd997df84	[ML] Make ml internal indices hidden (#52423 ) (#52509 )	2020-02-19 14:02:32 +01:00
David Roberts	9c49868bc5	[TEST] Use busy asserts in ML distributed failure test (#52461 ) When changing a job state using a mechanism that doesn't wait for the desired state to be reached within the production code the test code needs to loop until the cluster state has been updated. Closes #52451	2020-02-18 11:17:37 +00:00
David Roberts	48ccf36db9	[ML] Increase assertBusy timeout in ML node failure tests (#52425 ) Following the change to store cluster state in Lucene indices (#50907) it can take longer for all the cluster state updates associated with node failure scenarios to be processed during internal cluster tests where several nodes all run in the same JVM.	2020-02-17 17:04:18 +00:00
Dimitris Athanasiou	ad56802ac6	[7.x][ML] Refactor ML mappings and templates into JSON resources (#51… (#52353 ) ML mappings and index templates have so far been created programmatically. While this had its merits due to static typing, there is consensus it would be clearer to maintain those in json files. In addition, we are going to add ILM policies to these indices and the component for a plugin to register ILM policies is `IndexTemplateRegistry`. It expects the templates to be in resource json files. For the above reasons this commit refactors ML mappings and index templates into json resource files that are registered via `MlIndexTemplateRegistry`. Backport of #51765	2020-02-14 17:16:06 +02:00
Przemysław Witek	0da3af7581	[7.x] [ML] Add _cat/ml/data_frame/analytics API (#52260 ) (#52312 )	2020-02-13 16:55:47 +01:00
Jay Modi	5bcc6fce5c	Remove DeprecationLogger from route objects (#52285 ) This commit removes the need for DeprecatedRoute and ReplacedRoute to have an instance of a DeprecationLogger. Instead the RestController now has a DeprecationLogger that will be used for all deprecated and replaced route messages. Relates #51950 Backport of #52278	2020-02-12 15:05:41 -07:00
Benjamin Trent	2a968f4f2b	[ML] job results provider refactoring (#52012 ) (#52238 ) During a bug hunt, I caught a handful of things (unrelated to the bug) that could be potential issues: 1. Needlessly wrapping in exception handling (minor cleanup) 2. Potential of notifying listeners of a failure multiple times + even trying to notify of a success after a failure notification	2020-02-11 17:54:44 -05:00
David Roberts	d1d9c40e71	[ML] Switch poor categorization audit warning to use status field (#52195 ) In #51146 a rudimentary check for poor categorization was added to 7.6. This change replaces that warning based on a Java-side check with a new one based on the categorization_status field that the ML C++ sets. categorization_status was added in 7.7 and above by #51879, so this new warning based on more advanced conditions will also be in 7.7 and above. Closes #50749	2020-02-11 15:33:27 +00:00
David Roberts	473468d763	[ML] Better error when persistent task assignment disabled (#52014 ) Changes the misleading error message when attempting to open a job while the "cluster.persistent_tasks.allocation.enable" setting is set to "none" to a clearer message that names the setting. Closes #51956	2020-02-11 15:23:21 +00:00
Dimitris Athanasiou	6086fadf00	[7.x][ML] Prepare to hold additional stats in DF Analytics task (#52134 ) (#52187 ) Refactors `DataFrameAnalyticsTask` to hold a `StatsHolder` object. That just has a `ProgressTracker` for now but this is paving the way to add additional stats like memory usage, analysis stats, etc. Backport #52134	2020-02-11 11:18:45 +02:00
Dimitris Athanasiou	cbebc26f50	[7.x][ML] Retry persisting DF Analytics results (#52048 ) (#52160 ) Employs `ResultsPersisterService` from `DataFrameRowsJoiner` in order to add retries when a data frame analytics job is persisting the results to the destination data frame. Backport of #52048	2020-02-11 09:55:00 +02:00
Przemysław Witek	c7cc383d33	[7.x] Update persistent state document in the index the document belongs to (#51751 ) (#52145 )	2020-02-10 16:32:34 +01:00
David Roberts	1cefafdd14	[ML] Add new categorization stats to model_size_stats (#52009 ) This change adds support for the following new model_size_stats fields: - categorized_doc_count - total_category_count - frequent_category_count - rare_category_count - dead_category_count - categorization_status Backport of #51879	2020-02-10 09:10:50 +00:00
Jay Modi	3edadfefd0	RestHandlers declare handled routes (#52123 ) This commit changes how RestHandlers are registered with the RestController so that a RestHandler no longer needs to register itself with the RestController. Instead the RestHandler interface has new methods which when called provide information about the routes (method and path combinations) that are handled by the handler including any deprecated and/or replaced combinations. This change also makes the publication of RestHandlers safe since they no longer publish a reference to themselves within their constructors. Closes #51622 Co-authored-by: Jason Tedor <jason@tedor.me> Backport of #51950	2020-02-09 22:48:32 -07:00
Benjamin Trent	dffcd021df	[7.x] [ML] Add bwc serialization unit test scaffold (#51889 ) (#52061 ) * [ML] Add bwc serialization unit test scaffold (#51889) Adds new `AbstractBWCSerializationTestCase` which provides easy scaffolding for BWC serialization unit tests. These are no replacement for true BWC tests (which execute actual old code). These tests do provide some good coverage for the current code when serializing to/from old versions. * removing unnecessary override for 7.series branch * adding necessary import Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-07 17:17:11 -05:00
Benjamin Trent	846f87a26e	[ML] allow close/stop for jobs/datafeeds with missing configs (#51888 ) (#51997 ) If the configs are removed (by some horrific means), we should still allow tasks to be cleaned up easily. Datafeeds and jobs with missing configs are now visible in their respective _stats calls and can be stopped/closed.	2020-02-06 12:10:18 -05:00
Benjamin Trent	79f143907a	[7.x] [ML] add _cat/ml/trained_models API (#51529 ) (#51936 ) * [ML] add _cat/ml/trained_models API (#51529) This adds _cat/ml/trained_models.	2020-02-05 08:26:44 -05:00
Julie Tibshirani	38ce428831	Create a class to hold field capabilities for one index. (#51844 ) Currently, the same class `FieldCapabilities` is used both to represent the capabilities for one index, and also the merged capabilities across indices. To help clarify the logic, this PR proposes to create a separate class `IndexFieldCapabilities` for the capabilities in one index. The refactor will also help when adding `source_path` information in #49264, since the merged source path field will have a different structure from the field for a single index. Individual changes: * Add a new class IndexFieldCapabilities. * Remove extra constructor from FieldCapabilities. * Combine the add and merge methods in FieldCapabilities.Builder.	2020-02-04 11:24:57 -08:00
David Roberts	9d55c45b5a	[ML] Improve multiline_start_pattern for CSV in find_file_structure (#51737 ) The work to switch file upload over to treating delimited files like semi-structured text and using the ingest pipeline for CSV parsing makes the multi-line start pattern used for delimited files much more critical than it used to be. Previously it was always based on the time field, even if that was towards the end of the columns, and no multi-line pattern was created if no timestamp was detected. This change improves the multi-line start pattern by: 1. Never creating a multi-line pattern if the sample contained only single line records. This improves the import efficiency in a common case. 2. Choosing the leftmost field that has a well-defined pattern, whether that be the time field or a boolean/numeric field. This reduces the risk of a field with newlines occurring earlier, and also means the algorithm doesn't automatically fail for data without a timestamp.	2020-02-04 12:37:48 +00:00
Benjamin Trent	d293980a09	[7.x] [ML] add GET _cat/ml/datafeeds (#51500 ) (#51829 ) * [ML] add GET _cat/ml/datafeeds (#51500) This adds GET _cat/ml/datafeeds && _cat/ml/datafeeds/{datafeed_id} * fixing for java8 compilation	2020-02-03 17:16:33 -05:00
David Roberts	d5d8fb26fa	[TEST] Remove obsolete test trace logging from NetworkDisruptionIT (#51746 ) The issue this logging was added to fix (#49908) was closed in December and the problem has not recurred so this logging is no longer needed.	2020-02-03 11:25:53 +00:00
Dimitris Athanasiou	55b5c8f703	[7.x][ML] Remove index.unassigned.node_left.delayed_timeout setting from M… (#51740 ) (#51764 ) This setting was introduced with the purpose of reducing the time taken by tests that shut nodes down, such as `MlDistributedFailureIT` and `NetworkDisruptionIT`. However, it is unfortunate to have to set the value to an explicit value in production. In addition, and most importantly, dynamically choosing the value for this setting makes it impossible to adopt static index template configs that we register via `IndexTemplateRegistry`, which we need to use in order to start registering ILM policies for the ML indices. This commit removes this setting from our templates. I ran the tests a few times and could not see execution time differing significantly. Backport of #51740	2020-01-31 20:28:29 +02:00
Benjamin Trent	e372854d43	[ML][Inference] Fix model pagination with models as resources (#51573 ) (#51736 ) This adds logic to handle paging problems when the ID pattern + tags reference models stored as resources. Most of the complexity comes from the issue where a model stored as a resource could be at the start, or the end of a page or when we are on the last page.	2020-01-31 07:52:19 -05:00
Gordon Brown	10c8179351	Use exclusions list instead of fake system indices (#51586 ) This commit switches the strategy for managing dot-prefixed indices that should be hidden indices from using "fake" system indices to an explicit exclusions list that must be updated when those indices are converted to hidden indices.	2020-01-30 16:31:27 -07:00
Benjamin Trent	1380dd439a	[7.x] [ML][Inference] Fix weighted mode definition (#51648 ) (#51695 ) * [ML][Inference] Fix weighted mode definition (#51648) Weighted mode inaccurately assumed that the "max value" of the input values would be the maximum class value. This does not make sense. Weighted Mode should know how many classes there are. Hence the new parameter `num_classes`. This indicates the maximum class value to be expected.	2020-01-30 15:33:25 -05:00
Henning Andersen	149b68d850	[ML] Fix possible race condition starting datafeed (#51646 ) Datafeeds being closed while starting could result in an NPE. This was handled as any other failure, masking out the NPE. However, this conflicts with the changes in #50886. Related to #50886 and #51302	2020-01-30 08:23:45 +01:00
Przemysław Witek	683170b007	Increase the number of indexed documents to increase a chance that there are at least 2 training rows. (#51607 ) (#51615 )	2020-01-29 17:17:19 +01:00
Gordon Brown	89c2834b24	Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959 ) This commit deprecates the creation of dot-prefixed index names (e.g. .watches) unless they are either 1) a hidden index, or 2) registered by a plugin that extends SystemIndexPlugin. This is the first step towards more thorough protections for system indices. This commit also modifies several plugins which use dot-prefixed indices to register indices they own as system indices, and adds a plugin to register .tasks as a system index.	2020-01-28 10:01:16 -07:00
David Roberts	550254ec7f	[ML] Use CSV ingest processor in find_file_structure ingest pipeline (#51492 ) Changes the find_file_structure response to include a CSV ingest processor in the ingest pipeline it suggests. Previously the Kibana file upload functionality parsed CSV in the browser, but by parsing CSV in the ingest pipeline it makes the Kibana file upload functionality more easily interchangeable with Filebeat such that the configurations it creates can more easily be used to import data with the same structure repeatedly in production.	2020-01-28 14:38:43 +00:00
David Roberts	3c223ceea1	[ML] Fix 2 digit year regex in find_file_structure (#51469 ) The DATE and DATESTAMP Grok patterns match 2 digit years as well as 4 digit years. The pattern determination in find_file_structure worked correctly in this case, but the regex used to create a multi-line start pattern was assuming a 4 digit year. Also, the quick rule-out patterns did not always correctly consider 2 digit years, meaning that detection was inconsistent. This change fixes both problems, and also extends the tests for DATE and DATESTAMP to check both 2 and 4 digit years.	2020-01-27 17:23:18 +00:00
Przemysław Witek	dd3e2f1e18	[7.x] Update quantiles document in the index the document belongs to (#51135 ) (#51415 )	2020-01-27 10:13:02 +01:00
Benjamin Trent	bf53ca3380	[7.x] [ML] Add _cat/ml/anomaly_detectors API (#51364 ) (#51408 ) [ML] Add _cat/ml/anomaly_detectors API (#51364)	2020-01-24 11:54:22 -05:00
Benjamin Trent	fc994d9ce1	[ML][Inference] Adds validations for model PUT (#51376 ) (#51409 ) Adds validations making sure that * `input.field_names` is not empty * `ensemble.trained_models` is not empty * `tree.feature_names` is not empty closes https://github.com/elastic/elasticsearch/issues/51354	2020-01-24 09:29:12 -05:00
Benjamin Trent	76660a5a4f	[7.x] [ML][Inference] add tags url param to GET (#51330 ) (#51404 ) * [ML][Inference] add tags url param to GET (#51330) Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint. This parameter allows the list of models to be further reduced to those that contain all the provided tags.	2020-01-24 08:26:58 -05:00
Dimitris Athanasiou	3443d69883	[7.x][ML] Rename DataFrameAnalyticsIndex to DestinationIndex (#51353 ) (#51356 ) As we prepare to introduce a new index for storing additional information about data frame analytics jobs (e.g. instrumentation), renaming this class to `DestinationIndex` better captures what it does and leaves its prior name available for a more suitable use. Backport of #51353 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-01-24 09:51:48 +02:00

766 Commits