OpenSearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	2ccb963f1d	Create GET _cat/transforms API Issue (#53643) (#53726) Adds new `_cat/transform` and `_cat/transform/{transform_id}` endpoints.	2020-03-18 10:45:28 -04:00
Przemysław Witek	ec13c093df	Make ML index aliases hidden (#53160) (#53710)	2020-03-18 10:28:45 +01:00
Przemysław Witek	376b2ae735	[7.x] Make classification evaluation metrics work when there is field mapping type mismatch (#53458) (#53601)	2020-03-16 15:38:56 +01:00
Dimitris Athanasiou	94da4ca3fc	[7.x][ML] Extend classification to support multiple classes (#53539) (#53597) Prepares classification analysis to support more than just two classes. It introduces a new parameter to the process config which dictates the `num_classes` to the process. It also changes the max classes limit to `30` provisionally. Backport of #53539	2020-03-16 15:00:54 +02:00
Benjamin Trent	1262ab2762	[ML] [Inference] fix number of inference models returned in x-pack info call (#53540) (#53560) The ML portion of the x-pack info API was erroneously counting both configuration documents and definition documents. The underlying storage implementation keeps the two separate. This PR filters the query so that only trained model config documents are counted.	2020-03-13 16:53:34 -04:00
Benjamin Trent	4e43ede735	[ML] renaming inference processor field field_mappings to new name field_map (#53433) (#53502) This renames the `inference` processor configuration field `field_mappings` to `field_map`. `field_mappings` is now deprecated.	2020-03-13 15:40:57 -04:00
Tom Veasey	690099553c	[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552) Adds a new parameter for classification that enables choosing whether to assign labels to maximise accuracy or to maximise the minimum class recall. Fixes #52427.	2020-03-13 17:35:51 +00:00
Benjamin Trent	89668c5ea0	[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 13:49:39 -04:00
Przemysław Witek	8c4c19d310	Perform evaluation in multiple steps when necessary (#53295) (#53409)	2020-03-11 15:36:38 +01:00
Przemysław Witek	063957b7d8	Simplify "refresh" calls. (#53385) (#53393)	2020-03-11 12:26:11 +01:00
Dimitris Athanasiou	cc7751eb16	[7.x][ML] Add ILM policy to ml stats indices (#53349) (#53392) Adds a size based ILM policy to automatically rollover ml stats indices. Backport of #53349	2020-03-11 13:01:34 +02:00
Dimitris Athanasiou	0fd0516d0d	[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390) Deprecates `maximum_number_trees` parameter of classification and regression and replaces it with `max_trees`. Backport of #53300	2020-03-11 12:45:27 +02:00
David Roberts	532a720e1b	[ML] Skeleton estimate_model_memory endpoint for anomaly detection (#53386) This is a partial implementation of an endpoint for anomaly detector model memory estimation. It is not complete, lacking docs, HLRC and sensible numbers for many anomaly detector configurations. These will be added in a followup PR in time for 7.7 feature freeze. A skeleton endpoint is useful now because it allows work on the UI side of the change to commence. The skeleton endpoint handles the same cases that the old UI code used to handle, and produces very similar estimates for these cases. Backport of #53333	2020-03-11 10:20:00 +00:00
Przemysław Witek	d54d7f2be0	[7.x] Implement ILM policy for .ml-state* indices (#52356) (#53327)	2020-03-10 14:24:18 +01:00
Benjamin Trent	856d9bfbc1	[ML] fixing data frame analysis test when two jobs are started in succession quickly (#53192) (#53332) A previous change (#53029) is causing analysis jobs to wait for certain indices to be made available. While it is good for jobs to wait, they could fail early on _start. This change will cause the persistent task to continually retry node assignment when the failure is due to shards not being available. If the shards are not available by the time `timeout` is reached by the predicate, it is treated as a _start failure and the task is canceled. For tasks seeking a new assignment after a node failure, that behavior is unchanged. closes #53188	2020-03-10 08:30:47 -04:00
Mayya Sharipova	f96ad5c32d	Mute testSingleNumericFeatureAndMixedTrainingAndNonTrainingRows	2020-03-06 12:48:05 -05:00
Mark Vieira	09a3f45880	Mute ClassificationIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet Signed-off-by: Mark Vieira <portugee@gmail.com>	2020-03-06 07:38:04 -08:00
James Baiera	01f00df5cd	Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet	2020-03-06 07:37:57 -08:00
Dimitris Athanasiou	9abf537527	[7.x][ML] Improve DF analytics audits and logging (#53179) (#53218) Adds audits for when the job starts reindexing, loading data, analyzing, writing results. Also adds some info logging. Backport of #53179	2020-03-06 13:47:27 +02:00
Benjamin Trent	af0b1c2860	[ML] Fix minor race condition in dataframe analytics _stop (#53029) (#53164) Tests have been periodically failing due to a race condition on checking a recently `STOPPED` task's state. The `.ml-state` index is not created until the task has already been transitioned to `STARTED`. This allows the `_start` API call to return. But, if a user (or test) immediately attempts to `_stop` that job, the job could stop and the task be removed BEFORE the `.ml-state|stats` indices are created/updated. This change moves towards the task cleaning up itself in its main execution thread. `stop` flips the flag of the task to `isStopping` and now we check `isStopping` at every necessary method, allowing the task to gracefully stop. closes #53007	2020-03-05 09:59:18 -05:00
Benjamin Trent	181ee3ae0b	[ML] specifying missing_field_value value and using it instead of empty_string (#53108) (#53165) For analytics, we need a consistent way of indicating when a value is missing. Inheriting from anomaly detection, analysis sent `""` when a field is missing. This works fine with numbers, but the underlying analytics process actually treats `""` as a category in categorical values. Consequently, you end up with this situation in the resulting model ``` { "frequency_encoding" : { "field" : "RainToday", "feature_name" : "RainToday_frequency", "frequency_map" : { "" : 0.009844409027270245, "No" : 0.6472019970785184, "Yes" : 0.6472019970785184 } } } ``` For inference this is a problem, because inference will treat missing values as `null` and thus not include them in the infer call against the model. This PR takes advantage of our new `missing_field_value` option and supplies `\0` as the value.	2020-03-05 09:50:52 -05:00
David Roberts	01504df876	[TEST] Force close failed job before skipping test (#53128) The assumption added in #52631 skips a problematic test if it fails to create the required conditions for the scenario it is supposed to be testing. (This happens very rarely.) However, before skipping the test it needs to remove the failed job it has created because the standard test cleanup code treats failed jobs as fatal errors. Closes #52608	2020-03-05 10:52:41 +00:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Yang Wang	70814daa86	Allow _rollup_search with read privilege (#52043) (#53047) Currently _rollup_search requires manage privilege to access. It should really be a read only operation. This PR changes the requirement to be read indices privilege. Resolves: #50245	2020-03-03 22:29:54 +11:00
Mark Vieira	f8396e8d15	Mute RunDataFrameAnalyticsIT.testStopOutlierDetectionWithEnoughDocumentsToScroll Signed-off-by: Mark Vieira <portugee@gmail.com>	2020-03-02 09:21:55 -08:00
Lisa Cawley	4fbe1b0550	[DOCS] Adds cat anomaly detectors API (#52866) (#52970)	2020-03-02 07:28:55 -08:00
Dimitris Athanasiou	85b4e45093	[7.x][ML] Parse and report memory usage for DF Analytics (#52778) (#52980) Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs. Backport of #52778 and #52958	2020-02-29 13:03:40 +02:00
Benjamin Trent	19a6c5d980	[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531) (#52901) * [ML][Inference] Add support for multi-value leaves to the tree model (#52531) This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.	2020-02-27 14:05:28 -05:00
Benjamin Trent	eac38e9847	[ML] Add indices_options to datafeed config and update (#52793) (#52905) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 13:43:25 -05:00
David Kyle	d8bdf31110	Revert "Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart" This reverts commit `ad3a3b1af9`.	2020-02-27 12:38:13 +00:00
David Kyle	6e5e64559a	Unwrap cause from remote ActionTransportExceptions (#52842) (#52878) And log the cause	2020-02-27 11:58:28 +00:00
Yang Wang	14c21aedd2	Simplify ml license checking with XpackLicenseState internals (#52684) (#52863) This change replaces the TrainedModelConfig#isAvailableWithLicense method with calls to XPackLicenseState#isAllowedByLicense. Please note there are subtle changes to the code logic. But they are the right changes: * Instead of Platinum license, Enterprise license now guarantees availability. * No explicit check when the license requirement is basic. Since basic license is always available, this check is unnecessary. * Trial license is always allowed.	2020-02-27 14:14:16 +11:00
Jake Landis	b4179a8814	[7.x] Refactor watcher tests (#52799) (#52844) This PR moves the majority of the Watcher REST tests under the Watcher x-pack plugin. Specifically, moves the Watcher tests from: x-pack/plugin/test x-pack/qa/smoke-test-watcher x-pack/qa/smoke-test-watcher-with-security x-pack/qa/smoke-test-monitoring-with-watcher to: x-pack/plugin/watcher/qa/rest (/test and /qa/smoke-test-watcher) x-pack/plugin/watcher/qa/with-security x-pack/plugin/watcher/qa/with-monitoring Additionally, this disables Watcher from the main x-pack test cluster and consolidates the stop/start logic for the tests listed. No changes to the tests (beyond moving them) are included. 3rd party tests and doc tests (which also touch Watcher) are not included in the changes here.	2020-02-26 15:57:10 -06:00
Lisa Cawley	b788ec7157	[DOCS] Adds cat datafeeds API (#52738)	2020-02-26 09:28:57 -08:00
David Kyle	ad3a3b1af9	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart	2020-02-26 14:31:00 +00:00
Jake Landis	8d311297ca	[7.x] Smarter copying of the rest specs and tests (#52114) (#52798) * Smarter copying of the rest specs and tests (#52114) This PR addresses the unnecessary copying of the rest specs and allows for better semantics for which specs and tests are copied. By default the rest specs will get copied if the project applies `elasticsearch.standalone-rest-test` or `esplugin` and the project has rest tests or you configure the custom extension `restResources`. This PR also removes the need for dozens of places where the x-pack specs were copied by supporting copying of the x-pack rest specs too. The plugin/task introduced here can also copy the rest tests to the local project through a similar configuration. The new plugin/task allows a user to minimize the surface area of which rest specs are copied. Each project can be configured to include only a subset of the specs (or tests). Configuring a project to only copy the specs when actually needed should help with build cache hit rates since we can better define what is actually in use. However, project level optimizations for build cache hit rates are not included with this PR. Also, with this PR you can no longer use the includePackaged flag on integTest task. The following items are included in this PR: * new plugin: `elasticsearch.rest-resources` * new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy * new extension 'restResources' ``` restResources { restApi { includeCore 'foo', 'bar' //will include the core specs that start with foo and bar includeXpack 'baz' //will include x-pack specs that start with baz } restTests { includeCore 'foo', 'bar' //will include the core tests that start with foo and bar includeXpack 'baz' //will include the x-pack tests that start with baz } } ```	2020-02-26 08:13:41 -06:00
David Kyle	37be695d5c	[ML] Handle failed datafeed in MlDistributedFailureIT (#52631) (#52789)	2020-02-26 08:18:37 +00:00
David Roberts	cf122d13b8	[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720) This is because beat.timezone was renamed to event.timezone in elastic/beats#9458	2020-02-25 12:33:53 +00:00
David Kyle	de3d674bb7	Revert "Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart" This reverts commit `c4d91143ac`.	2020-02-24 15:22:49 +00:00
David Kyle	044a4e127a	[ML] Add reason to DataFrameAnalyticsTask setFailed log message (#52659) (#52707)	2020-02-24 15:21:51 +00:00
Benjamin Trent	afd90647c9	[ML] Adds feature importance option to inference processor (#52218) (#52666) This adds machine learning model feature importance calculations to the inference processor. The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values` Example: ``` "inference": { "field_mappings": {}, "model_id": "my_model", "inference_config": { "regression": { "num_top_feature_importance_values": 3 } } } ``` This will write to the document as follows: ``` "inference" : { "feature_importance" : { "FlightTimeMin" : -76.90955548511226, "FlightDelayType" : 114.13514762158526, "DistanceMiles" : 13.731580450792187 }, "predicted_value" : 108.33165831875137, "model_id" : "my_model" } ``` This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safeguard in a mixed-version environment where only some ingest nodes have been upgraded. NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc usability blocked by: https://github.com/elastic/ml-cpp/pull/991	2020-02-21 18:42:31 -05:00
Jack Conradson	c4d91143ac	Mute RunDataFrameAnalyticsIT.testOutlierDetectionStopAndRestart Relates: #52654	2020-02-21 09:32:19 -08:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
Przemysław Witek	b84e8db7b5	[7.x] Rename .ml-state index to .ml-state-000001 to support rollover (#52510) (#52595)	2020-02-21 08:55:59 +01:00
Yang Wang	4bc7545e43	Add enterprise mode and refactor license check (#51864) (#52115) Add enterprise operation mode to properly map enterprise license. Also refactor XPackLicenseState class to consolidate license status and mode checks. This class has many synchronized methods to check basically three things: * Minimum operation mode required * Whether security is enabled * Whether current license needs to be active Depending on the actual feature, either 1, 2 or all of the above checks are performed. These are now consolidated into 3 helper methods (2 of them are new). The synchronization is pushed down to the helper methods so actual checking methods no longer need to worry about it. resolves: #51081	2020-02-21 14:18:18 +11:00
Benjamin Trent	2a5c181dda	[ML][Inference] don't return inflated definition when storing trained models (#52573) (#52580) When `PUT` is called to store a trained model, it is useful to return the newly created model config. But, it is NOT useful to return the inflated definition. These definitions can be large and returning the inflated definition causes undue work on the server and client side. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-20 19:47:29 -05:00
Benjamin Trent	013d5c2d24	[ML] Adds support for a global calendar via `_all` (#50372) (#52578) This adds `_all` to Calendar searches. This enables users to supply the `_all` string in the `job_ids` array when creating a Calendar. That calendar will now be applied to all jobs (existing and newly created). Closes #45013 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-20 17:22:59 -05:00
David Kyle	7bbe5c8464	[ML] Validate tree feature index is within range (#52514) This changes the tree validation code to ensure no node in the tree has a feature index that is beyond the bounds of the feature_names array. Specifically this handles the situation where the C++ emits a tree containing a single node and an empty feature_names list. This is a valid tree used to centre the data in the ensemble but the validation code would reject this as feature_names is empty. This meant a broken workflow as you cannot GET the model and PUT it back	2020-02-19 14:41:43 +00:00
Przemysław Witek	7cd997df84	[ML] Make ml internal indices hidden (#52423) (#52509)	2020-02-19 14:02:32 +01:00
David Roberts	9c49868bc5	[TEST] Use busy asserts in ML distributed failure test (#52461) When changing a job state using a mechanism that doesn't wait for the desired state to be reached within the production code the test code needs to loop until the cluster state has been updated. Closes #52451	2020-02-18 11:17:37 +00:00

699 Commits