OpenSearch

Commit Graph

Author	SHA1	Message	Date
Dimitris Athanasiou	4edb2e7bb6	[7.x][ML] Add optional source filtering during data frame reindexing (#49690 ) (#49718 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531 Backport of #49690	2019-11-29 16:10:44 +02:00
Dimitris Athanasiou	8eaee7cbdc	[7.x][ML] Explain data frame analytics API (#49455 ) (#49504 ) This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455	2019-11-22 22:06:10 +02:00
Benjamin Trent	eefe7688ce	[7.x][ML] ML Model Inference Ingest Processor (#49052 ) (#49257 ) * [ML] ML Model Inference Ingest Processor (#49052) * [ML][Inference] adds lazy model loader and inference (#47410) This adds a couple of things: - A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them - A Model class and its first sub-class LocalModel. Used to cache model information and run inference. - Transport action and handler for requests to infer against a local model Related Feature PRs: * [ML][Inference] Adjust inference configuration option API (#47812) * [ML][Inference] adds logistic_regression output aggregator (#48075) * [ML][Inference] Adding read/del trained models (#47882) * [ML][Inference] Adding inference ingest processor (#47859) * [ML][Inference] fixing classification inference for ensemble (#48463) * [ML][Inference] Adding model memory estimations (#48323) * [ML][Inference] adding more options to inference processor (#48545) * [ML][Inference] handle string values better in feature extraction (#48584) * [ML][Inference] Adding _stats endpoint for inference (#48492) * [ML][Inference] add inference processors and trained models to usage (#47869) * [ML][Inference] add new flag for optionally including model definition (#48718) * [ML][Inference] adding license checks (#49056) * [ML][Inference] Adding memory and compute estimates to inference (#48955) * fixing version of indexed docs for model inference	2019-11-18 13:19:17 -05:00
Przemysław Witek	150db2b544	Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143 ) (#49164 )	2019-11-18 07:52:57 +01:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
Przemysław Witek	ee952da2e2	[7.x] Implement evaluation API for multiclass classification problem (#47126 ) (#47343 )	2019-10-04 17:54:51 +02:00
Przemysław Witek	ec9b77deaa	[7.x] Implement new analysis type: classification (#46537 ) (#47559 )	2019-10-04 13:47:19 +02:00
Dimitris Athanasiou	873ad3f942	[7.x][ML] Add option to regression to randomize training set (#45969 ) (#46017 ) Adds a parameter `training_percent` to regression. The default value is `100`. When the parameter is set to a value less than `100`, from the rows that can be used for training (ie. those that have a value for the dependent variable) we randomly choose whether to actually use for training. This enables splitting the data into a training set and the rest, usually called testing, validation or holdout set, which allows for validating the model on data that have not been used for training. Technically, the analytics process considers as training the data that have a value for the dependent variable. Thus, when we decide a training row is not going to be used for training, we simply clear the row's dependent variable.	2019-08-27 17:53:11 +03:00
Benjamin Trent	a3a4ae0ac2	[ML] fixing bug where analytics process starts with 0 rows (#45879 ) (#45988 ) The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start. When building the configuration for the process we should not start the native process if there are no rows. Adding some logging to indicate what is occurring.	2019-08-26 14:18:17 -05:00
Benjamin Trent	ba7b677618	[ML] better handle empty results when evaluating regression (#45745 ) (#45759 ) * [ML] better handle empty results when evaluating regression * adding new failure test to ml_security black list * fixing equality check for regression results	2019-08-20 17:37:04 -05:00
Przemysław Witek	1aed388a24	Add view_index_metadata to roles.yml and remove as many df analytics test cases from build.gradle blacklist as possible. (#45451 ) (#45465 )	2019-08-13 08:31:58 +02:00
Dimitris Athanasiou	27497ff75f	[7.x][ML] Add regression analysis to DF analytics (#45292 ) (#45388 ) This commit adds a first draft of a regression analysis to data frame analytics. There is high probability that the exact syntax might change. This commit adds the new analysis type and its parameters as well as appropriate validation. It also modifies the extractor and the fields detector to be able to handle categorical fields as regression analysis supports them.	2019-08-09 19:31:13 +03:00
Dimitris Athanasiou	8a6675b994	[7.x][ML] Check dest index is empty when starting DF analytics (#45094 ) (#45112 ) If one tries to start a DF analytics job that has already run, the result will be that the task will fail after reindexing the dest index from the source index. The results of the prior run will be gone and the task state is not properly set to failed with the failure reason. This commit improves the behavior in this scenario. First, we set the task state to `failed` in a set of failures that were missed. Second, a validation is added that if the destination index exists, it must be empty.	2019-08-02 00:19:48 +03:00
Andrei Stefan	2633d11eb7	Switch from using docvalue_fields to extracting values from _source (#44062 ) (#44804 ) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling the numbers parsing just like Elasticsearch is doing it when it's indexing a document. * This also introduces a minor limitation: aliases type of fields that are NOT part of a tree of sub-fields will not be able to be retrieved anymore. field_caps API doesn't shed any light into a field being an alias or not and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated to being an alias. (cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)	2019-07-25 10:02:41 +03:00
Ryan Ernst	7e06888bae	Convert testclusters to use distro download plugin (#44253 ) (#44362 ) Test clusters currently has its own set of logic for dealing with finding different versions of Elasticsearch, downloading them, and extracting them. This commit converts testclusters to use the DistributionDownloadPlugin.	2019-07-15 17:53:05 -07:00
Benjamin Trent	c82d9c5b50	[ML] Adds support for regression.mean_squared_error to eval API (#44140 ) (#44218 ) * [ML] Adds support for regression.mean_squared_error to eval API * addressing PR comments * fixing tests	2019-07-11 09:22:52 -05:00
Dimitris Athanasiou	cab879118d	[7.x][ML] Support multiple source indices for df-analytics (#43702 ) (#43731 ) This commit adds support for multiple source indices. In order to deal with multiple indices having different mappings, it attempts a best-effort approach to merge the mappings assuming there are no conflicts. In case conflicts exists an error will be returned. To allow users creating custom mappings for special use cases, the destination index is now allowed to exist before the analytics job runs. In addition, settings are no longer copied except for the `index.number_of_shards` and `index.number_of_replicas`.	2019-06-28 13:28:03 +03:00
Przemysław Witek	94f18da5df	Add version and create_time to data frame analytics config (#43683 ) (#43712 )	2019-06-28 07:37:21 +02:00
Dimitris Athanasiou	126c2fd2d5	[7.x][ML] Machine learning data frame analytics (#43544 ) (#43592 ) This merges the initial work that adds a framework for performing machine learning analytics on data frames. The feature is currently experimental and requires a platinum license. Note that the original commits can be found in the `feature-ml-data-frame-analytics` branch. A new set of APIs is added which allows the creation of data frame analytics jobs. Configuration allows specifying different types of analysis to be performed on a data frame. At first there is support for outlier detection. The APIs are: - PUT _ml/data_frame/analysis/{id} - GET _ml/data_frame/analysis/{id} - GET _ml/data_frame/analysis/{id}/_stats - POST _ml/data_frame/analysis/{id}/_start - POST _ml/data_frame/analysis/{id}/_stop - DELETE _ml/data_frame/analysis/{id} When a data frame analytics job is started a persistent task is created and started. The main steps of the task are: 1. reindex the source index into the dest index 2. analyze the data through the data_frame_analyzer c++ process 3. merge the results of the process back into the destination index In addition, an evaluation API is added which packages commonly used metrics that provide evaluation of various analysis: - POST _ml/data_frame/_evaluate	2019-06-25 20:29:11 +03:00
Alpar Torok	94930d0e84	Testclusters: convert ml qa tests (#43229 ) * Testclusters: convert ml qa tests This PR converts the ML tests to use testclusters.	2019-06-18 11:55:11 +03:00
Dimitris Athanasiou	76a92b49a8	[ML] Get resources action should be lenient when sort field is unmapped (#42991 ) (#43046 ) Get resources action sorts on the resource id. When there are no resources at all, then it is possible the index does not contain a mapping for the resource id field. In that case, the search api fails by default. This commit adjusts the search request to ignore unmapped fields. Closes elastic/kibana#37870	2019-06-10 19:50:19 +03:00
Mark Vieira	e44b8b1e2e	[Backport] Remove dependency substitutions 7.x (#42866 ) * Remove unnecessary usage of Gradle dependency substitution rules (#42773) (cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)	2019-06-04 13:50:23 -07:00
Przemysław Witek	f5014ace64	[ML] Add validation that rejects duplicate detectors in PutJobAction (#40967 ) (#41072 ) * [ML] Add validation that rejects duplicate detectors in PutJobAction Closes #39704 * Add YML integration test for duplicate detectors fix. * Use "== false" comparison rather than "!" operator. * Refine error message to sound more natural. * Put job description in square brackets in the error message. * Use the new validation in ValidateJobConfigAction. * Exclude YML tests for new validation from permission tests.	2019-04-10 15:43:35 +02:00
Martijn Laarman	9b4d96534b	Fix #38623 remove xpack namespace REST API (#38625 ) (#39036 ) * Fix #38623 remove xpack namespace REST API Except for xpack.usage and xpack.info API's, this moves the last remaining API's out of the xpack namespace * rename xpack api's inside inside the files as well * updated yaml tests references to xpack namespaces api's * update callsApi calls in the IT subclasses * make sure docs testing does not use xpack namespaced api's * fix leftover xpack namespaced method names in docs/build.gradle * found another leftover reference (cherry picked from commit ccb5d934363c37506b76119ac050a254fa80b5e7)	2019-02-18 12:40:07 +01:00
Benjamin Trent	7e4c0e6991	ML: Adds set_upgrade_mode API endpoint (#37837 ) * ML: Add MlMetadata.upgrade_mode and API * Adding tests * Adding wait conditionals for the upgrade_mode call to return * Adding tests * adjusting format and tests * Adjusting wait conditions for api return and msgs * adjusting doc tests * adding upgrade mode tests to black list	2019-01-28 09:07:30 -06:00
Benjamin Trent	9e932f4869	ML: removing unnecessary upgrade code (#37879 )	2019-01-25 13:57:41 -06:00
Benjamin Trent	df3b58cb04	ML: add migrate anomalies assistant (#36643 ) * ML: add migrate anomalies assistant * adjusting failure handling for reindex * Fixing request and tests * Adding tests to blacklist * adjusting test * test fix: posting data directly to the job instead of relying on datafeed * adjusting API usage * adding Todos and adjusting endpoint * Adding types to reindexRequest * removing unreliable "live" data test * adding index refresh to test * adding index refresh to test * adding index refresh to yaml test * fixing bad exists call * removing todo * Addressing remove comments * Adjusting rest endpoint name * making service have its own logger * adjusting validity check for newindex names * fixing typos * fixing renaming	2019-01-09 14:25:35 -06:00
David Roberts	e42cc5cd8c	[ML] Add a file structure determination endpoint (#33471 ) This endpoint accepts an arbitrary file in the request body and attempts to determine the structure. If successful it also proposes mappings that could be used when indexing the file's contents, and calculates simple statistics for each of the fields that are useful in the data preparation step prior to configuring machine learning jobs.	2018-09-07 17:41:57 +01:00
Benjamin Trent	767d8e0801	[ML] Delete forecast API (#31134 ) (#33218 ) * Delete forecast API (#31134)	2018-09-03 19:06:18 -05:00
Nik Everett	2c81d7f77e	Build: Rework shadow plugin configuration (#32409 ) This reworks how we configure the `shadow` plugin in the build. The major change is that we no longer bundle dependencies in the `compile` configuration, instead we bundle dependencies in the new `bundle` configuration. This feels more right because it is a little more "opt in" rather than "opt out" and the name of the `bundle` configuration is a little more obvious. As an neat side effect of this, the `runtimeElements` configuration used when one project depends on another now contains exactly the dependencies needed to run the project so you no longer need to reference projects that use the shadow plugin like this: ``` testCompile project(path: ':client:rest-high-level', configuration: 'shadow') ``` You can instead use the much more normal: ``` testCompile "org.elasticsearch.client:elasticsearch-rest-high-level-client:${version}" ```	2018-08-21 20:03:28 -04:00
Jason Tedor	28d12b05b7	Move ML tests to be sub-projects of ML (#33026 ) This commit moves the ML QA tests to be a sub-project of ML. The purpose of this refactoring is to enable ML developers to run :x-pack:plugin:ml:check and run the vast majority of a ML tests with a single command (this still does not contain the ML REST tests, nor the upgrade tests). This simplifies local development for faster iteration.	2018-08-21 12:23:21 -04:00

32 Commits