OpenSearch

Commit Graph

Author	SHA1	Message	Date
Dimitris Athanasiou	60153c5433	[7.x][ML] Data frame analytics analysis stats (#53788 ) (#53844 ) Adds parsing and indexing of analysis instrumentation stats. The latest one is also returned from the get-stats API. Note that we chose to duplicate objects even where they are currently similar. There are already ideas on how these will diverge in the future and while the duplication looks ugly at the moment, it is the option that offers the highest flexibility. Backport of #53788	2020-03-20 12:11:53 +02:00
Christoph Büscher	d846ea43f4	Fix ReloadSynonymAnalyzerIT failure (#53663 ) (#53806 ) There is an assertion in ReloadAnalyzersResponse.merge that compares index names of merged responses that was falsely using object equality instead of String.equals(). In the past this didn't seem to matter but with changes in the test setup we started to see failures. Correcting this and also simplifying the test a bit to be able to run it repeatedly if needed. Backport of #53663	2020-03-19 19:00:14 +01:00
Benjamin Trent	2ccb963f1d	Create GET _cat/transforms API Issue (#53643 ) (#53726 ) Adds new `_cat/transform` and `_cat/transform/{transform_id}` endpoints.	2020-03-18 10:45:28 -04:00
Alan Woodward	580bc40c0c	Make it possible to deprecate all variants of a ParseField with no replacement (#53722 ) Sometimes we want to deprecate and remove a ParseField entirely, without replacement; for example, the various places where we specify a _type field in 7x. Currently we can tell users only that a particular field name should not be used, and that another name should be used in its place. This commit adds the ability to say that a field should not be used at all.	2020-03-18 14:16:19 +00:00
Christoph Büscher	2384c1359d	Revert "Fix ReloadSynonymAnalyzerIT failure (#53663 )" This reverts commit `2c32173fce`.	2020-03-18 12:44:23 +01:00
Christoph Büscher	2c32173fce	Fix ReloadSynonymAnalyzerIT failure (#53663 ) There is an assertion in ReloadAnalyzersResponse.merge that compares index names of merged responses that was falsely using object equality instead of String.equals(). In the past this didn't seem to matter but with changes in the test setup we started to see failures. Correcting this and also simplifying the test a bit to be able to run it repeatedly if needed. Closes #53443	2020-03-18 11:55:37 +01:00
Przemysław Witek	ec13c093df	Make ML index aliases hidden (#53160 ) (#53710 )	2020-03-18 10:28:45 +01:00
Hendrik Muhs	7a12300ce6	[7.x][Transform] enhance the output of preview to return full… (#53695 ) changes the output format of preview regarding deduced mappings and enhances it to return all the details about auto-index creation. This allows the user to customize the index creation. Using HLRC you can create an index request from the output of the response. backport #53572	2020-03-18 08:37:56 +01:00
David Kyle	2b635737e1	[ML] Parse single named object in config classes (#53472 ) (#53542 )	2020-03-17 13:59:52 +00:00
Yang Wang	7f21ade924	Explicitly require that derived API keys have no privileges (#53647 ) (#53648 ) The current implicit behaviour is that when an API key is used to create another API key, the child key is created without any privilege. This implicit behaviour is surprising and is a source of confusion for users. This change makes that behaviour explicit.	2020-03-17 17:56:37 +11:00
Ryan Ernst	e7f38674ed	Add internalClusterTest to check task (#53444 ) This commit adds internalClusterTest in xpack core to run as part of check. This was accidentally removed in a refactoring. Other xpack modules already do this, but core was left out. This commit also mutes 2 tests that currently fail. closes #53407	2020-03-16 18:55:01 -07:00
Gordon Brown	880cc3ca7e	Hide I/SLM history aliases (#53564 ) This commit adjusts the aliases used for the ILM and SLM history indices to be hidden aliases. Also tweaks the configuration of the `IndexTemplateRegistry`s used by these history systems to only upgrade the template from the master node, as documents are indexed from the master node, so the template version should only be upgraded from the master node.	2020-03-16 13:07:26 -06:00
markharwood	2c74f3e22c	Backport of new wildcard field type (#53590 ) * New wildcard field optimised for wildcard queries (#49993) Indexes values using size 3 ngrams and also stores the full original as a binary doc value. Wildcard queries operate by using a cheap approximation query on the ngram field followed up by a more expensive verification query using an automaton on the binary doc values. Also supports aggregations and sorting.	2020-03-16 15:07:13 +00:00
Przemysław Witek	376b2ae735	[7.x] Make classification evaluation metrics work when there is field mapping type mismatch (#53458 ) (#53601 )	2020-03-16 15:38:56 +01:00
Jim Ferenczi	e6680be0b1	Add new x-pack endpoints to track the progress of a search asynchronously (#49931 ) (#53591 ) This change introduces a new API in x-pack basic that allows tracking the progress of a search. Users can submit an asynchronous search through a new endpoint called `_async_search` that works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time. ```` GET my_index_pattern/_async_search?wait_for_completion=100ms { "aggs": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1h" } } } ```` If after 100ms the final response is not available, a `partial_response` is included in the body: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 1, "is_running": true, "is_partial": true, "response": { "_shards": { "total": 100, "successful": 5, "failed": 0 }, "total_hits": { "value": 1653433, "relation": "eq" }, "aggs": { ... } } } ```` The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed. It also contains the total hits as well as partial aggregations computed from the successful shards. To continue to monitor the progress of the search, users can call the get `_async_search` API like the following: ```` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms ```` That returns a new response that can contain the same partial response as the previous call if the search didn't progress, in which case the returned `version` should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress. 
Finally, if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response: ```` { "id": "9N3J1m4BgyzUDzqgC15b", "version": 10, "is_running": false, "response": { "is_partial": false, ... } } ```` Asynchronous searches are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time: ````` GET my_index_pattern/_async_search?wait_for_completion=100ms&keep_alive=10d ````` The default can be changed when submitting the search, the example above raises the default value for the search to `10d`. ````` GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d ````` The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days. A background service that runs only on the node that holds the first primary shard of the `.async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result. Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed. Asynchronous searches are not persistent: if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. 
However, final responses and failures are persisted in a system index, which makes it possible to retrieve a response even if the task finishes. ```` DELETE _async_search/9N3J1m4BgyzUDzqgC15b ```` The response is also not stored if the initial submit action returns a final response. This avoids adding any overhead to queries that complete within the initial `wait_for_completion`. The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other task. Relates #49091 Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>	2020-03-16 15:31:27 +01:00
Dimitris Athanasiou	94da4ca3fc	[7.x][ML] Extend classification to support multiple classes (#53539 ) (#53597 ) Prepares classification analysis to support more than just two classes. It introduces a new parameter to the process config which dictates the `num_classes` to the process. It also changes the max classes limit to `30` provisionally. Backport of #53539	2020-03-16 15:00:54 +02:00
Tom Veasey	690099553c	[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552 ) Adds a new parameter for classification that enables choosing whether to assign labels to maximise accuracy or to maximise the minimum class recall. Fixes #52427.	2020-03-13 17:35:51 +00:00
Tim Vernum	a8677499d7	[Backport] Add support for secondary authentication (#53530 ) This change makes it possible to send secondary authentication credentials to select endpoints that need to perform a single action in the context of two users. Typically this need arises when a server process needs to call an endpoint that users should not (or might not) have direct access to, but some part of that action must be performed using the logged-in user's identity. Backport of: #52093	2020-03-13 16:30:20 +11:00
Jay Modi	af36665b08	Deprecate the logstash enabled setting (#53487 ) The setting, `xpack.logstash.enabled`, exists to enable or disable the logstash extensions found within x-pack. In practice, this setting had no effect on the functionality of the extension. Given this, the setting is now deprecated in preparation for removal. Backport of #53367	2020-03-12 10:18:39 -06:00
Yannick Welsch	48124807d5	Fix SourceOnlySnapshotIT (#53462 ) The tests in this class had been failing for a while, but went unnoticed as not tested by CI (see #53442). The reason the tests fail is that the can-match phase is smarter now, and filters out access to a non-existing field. Closes #53442	2020-03-12 14:15:03 +01:00
Benjamin Trent	89668c5ea0	[ML][Inference] adds new default_field_map field to trained models (#53294 ) (#53419 ) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 13:49:39 -04:00
Przemysław Witek	8c4c19d310	Perform evaluation in multiple steps when necessary (#53295 ) (#53409 )	2020-03-11 15:36:38 +01:00
Dimitris Athanasiou	cc7751eb16	[7.x][ML] Add ILM policy to ml stats indices (#53349 ) (#53392 ) Adds a size based ILM policy to automatically rollover ml stats indices. Backport of #53349	2020-03-11 13:01:34 +02:00
Dimitris Athanasiou	0fd0516d0d	[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300 ) (#53390 ) Deprecates `maximum_number_trees` parameter of classification and regression and replaces it with `max_trees`. Backport of #53300	2020-03-11 12:45:27 +02:00
David Roberts	532a720e1b	[ML] Skeleton estimate_model_memory endpoint for anomaly detection (#53386 ) This is a partial implementation of an endpoint for anomaly detector model memory estimation. It is not complete, lacking docs, HLRC and sensible numbers for many anomaly detector configurations. These will be added in a followup PR in time for 7.7 feature freeze. A skeleton endpoint is useful now because it allows work on the UI side of the change to commence. The skeleton endpoint handles the same cases that the old UI code used to handle, and produces very similar estimates for these cases. Backport of #53333	2020-03-11 10:20:00 +00:00
Jake Landis	2ab502afc4	[7.x] Remove dead 'beats' code (#53312 ) (#53376 )	2020-03-10 20:57:29 -05:00
Przemko Robakowski	847ac9c7d7	Fix null config in SnapshotLifecyclePolicy.toRequest (#53328 ) (#53355 ) This avoids NPE when executing SLM policy when no config was provided. Related to #44465 Closes #53171 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-03-10 20:44:30 +01:00
Przemysław Witek	d54d7f2be0	[7.x] Implement ILM policy for .ml-state* indices (#52356 ) (#53327 )	2020-03-10 14:24:18 +01:00
Hendrik Muhs	696aa4ddaf	[7.x][Transform] add support for script in group_by (#53167 ) (#53324 ) add the possibility to base the group_by on the output of a script. closes #43152 backport #53167	2020-03-10 11:12:58 +01:00
Cauê Marcondes	b68d7b1c33	giving kibana user privileges to create custom link index (#53221 ) (#53278 )	2020-03-10 09:50:38 +01:00
Henning Andersen	a4d481f2bb	ILM Freeze step retry when not acknowledged (#53287 ) A freeze operation can partially fail in multiple places, including the close verification step. This left the index in an unfrozen but partially closed state. Now throw an exception to retry the freeze step instead.	2020-03-10 08:03:39 +01:00
Jay Modi	a81460dbf5	Make watch history indices hidden (#52974 ) This commit updates the template used for watch history indices with the hidden index setting so that new indices will be created as hidden. Relates #50251 Backport of #52962	2020-03-06 09:47:03 -07:00
Dimitris Athanasiou	9abf537527	[7.x][ML] Improve DF analytics audits and logging (#53179 ) (#53218 ) Adds audits for when the job starts reindexing, loading data, analyzing, writing results. Also adds some info logging. Backport of #53179	2020-03-06 13:47:27 +02:00
Nik Everett	609c61f75c	Formalize usage stats for analytics (backport of #52966 ) (#53077 ) This moves the usage statistics gathering from the `AnalyticsPlugin` into an `AnalyticsUsage`, removing the static state. It also checks the license level when parsing all analytics aggregations. This is how we were checking them before but we did it in an easy-to-forget way. This way is slightly simpler, I think.	2020-03-04 10:29:11 -05:00
Adrien Grand	cb868d2f5e	Introduce a `constant_keyword` field. (#49713 ) (#53024 ) This field is a specialization of the `keyword` field for the case when all documents have the same value. It typically performs more efficiently than keywords at query time by figuring out whether all or none of the documents match at rewrite time, like `term` queries on `_index`. The name is up for discussion. I liked including `keyword` in it, so that we still have room for a `singleton_numeric` in the future. However I'm unsure whether to call it `singleton`, `constant` or something else, any opinions? For this field there is a choice between 1. accepting values in `_source` when they are equal to the value configured in mappings, but rejecting mapping updates 2. rejecting values in `_source` but then allowing updates to the value that is configured in the mapping This commit implements option 1, so that it is possible to reindex from/to an index that has the field mapped as a keyword with no changes to the source. Backport of #49713	2020-03-03 16:01:47 +01:00
Yang Wang	70814daa86	Allow _rollup_search with read privilege (#52043 ) (#53047 ) Currently _rollup_search requires manage privilege to access. It should really be a read only operation. This PR changes the requirement to be read indices privilege. Resolves: #50245	2020-03-03 22:29:54 +11:00
Hendrik Muhs	a328a8eaf1	[7.x][Transform] implement node.transform to control where to… (#52998 ) implement transform node attributes to disable transform on certain nodes and test which nodes are allowed to do remote connections closes #52200 closes #50033 closes #48734 backport #52712	2020-03-02 16:10:57 +01:00
Martijn van Groningen	d102158e6f	Improve closing mock webserver when failed to start (#52943 ) Fix NPE when closing a webserver that hasn't started correctly. This can happen when ssl context isn't initialized. The server instance is then never set, which causes an NPE that masks the actual failure. Example stacktrace that would mask an actual failure: ``` java.lang.NullPointerException at org.elasticsearch.test.http.MockWebServer.close(MockWebServer.java:271) at org.elasticsearch.xpack.watcher.test.integration.HttpSecretsIntegrationTests.cleanup(HttpSecretsIntegrationTests.java:70) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) ```	2020-03-02 07:19:08 +01:00
Dimitris Athanasiou	85b4e45093	[7.x][ML] Parse and report memory usage for DF Analytics (#52778 ) (#52980 ) Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs. Backport of #52778 and #52958	2020-02-29 13:03:40 +02:00
Yang Wang	82553524af	Respect runas realm for ApiKey security operations (#52178 ) (#52932 ) When user A runs as user B and performs any API key related operations, user B's realm should always be used to associate with the API key. Currently user A's realm is used when getting or invalidating API keys and owner=true. The PR is to fix this bug. resolves: #51975	2020-02-28 10:53:52 +11:00
Benjamin Trent	19a6c5d980	[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531 ) (#52901 ) * [ML][Inference] Add support for multi-value leaves to the tree model (#52531) This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.	2020-02-27 14:05:28 -05:00
Benjamin Trent	eac38e9847	[ML] Add indices_options to datafeed config and update (#52793 ) (#52905 ) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 13:43:25 -05:00
Yang Wang	14c21aedd2	Simplify ml license checking with XpackLicenseState internals (#52684 ) (#52863 ) This change replaces the TrainedModelConfig#isAvailableWithLicense method with calls to XPackLicenseState#isAllowedByLicense. Please note there are subtle changes to the code logic. But they are the right changes: * Instead of Platinum license, Enterprise license now guarantees availability. * No explicit check when the license requirement is basic. Since basic license is always available, this check is unnecessary. * Trial license is always allowed.	2020-02-27 14:14:16 +11:00
Yang Wang	f5c4e92558	Refactor license checking (#52118 ) (#52859 ) Improve code reuse and readability. Add convenience checking method which covers most use cases without having to pass many boolean arguments.	2020-02-27 13:04:19 +11:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to address #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple of bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Tim Brooks	6669e53f08	Do not lock on reads of XPackLicenseState (#52492 ) XPackLicenseState reads are necessary to validate a number of cluster operations. These reads occasionally occur on transport threads, which should not be blocked. Currently we synchronize when reading. However, this is unnecessary as only a single piece of state is updateable. This commit makes this state volatile and removes the locking.	2020-02-25 15:38:35 -07:00
David Kyle	044a4e127a	[ML] Add reason to DataFrameAnalyticsTask setFailed log message (#52659 ) (#52707 )	2020-02-24 15:21:51 +00:00
Yang Wang	7cefba78c5	License removal leads back to a basic license (#52407 ) (#52683 ) A new basic license will be generated when an existing license is deleted. In addition, deleting an existing basic license is a no-op. Resolves: #45022	2020-02-24 11:02:40 +11:00
Jason Tedor	1685cbe504	Add messages for CCR on license state changes (#52470 ) When a license expires, or license state changes, functionality might be disabled. This commit adds messages for CCR to inform users that CCR functionality will be disabled when a license expires, or when license state changes to a license level lower than trial/platinum/enterprise.	2020-02-22 09:09:42 -05:00
Benjamin Trent	afd90647c9	[ML] Adds feature importance option to inference processor (#52218 ) (#52666 ) This adds machine learning model feature importance calculations to the inference processor. The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values` Example: ``` "inference": { "field_mappings": {}, "model_id": "my_model", "inference_config": { "regression": { "num_top_feature_importance_values": 3 } } } ``` This will write to the document as follows: ``` "inference" : { "feature_importance" : { "FlightTimeMin" : -76.90955548511226, "FlightDelayType" : 114.13514762158526, "DistanceMiles" : 13.731580450792187 }, "predicted_value" : 108.33165831875137, "model_id" : "my_model" } ``` This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safeguard in a mixed-version environment where only some ingest nodes have been upgraded. NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc usability blocked by: https://github.com/elastic/ml-cpp/pull/991	2020-02-21 18:42:31 -05:00


1665 Commits