OpenSearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	89668c5ea0	[ML][Inference] adds new default_field_map field to trained models (#53294 ) (#53419 ) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 13:49:39 -04:00
Dimitris Athanasiou	0fd0516d0d	[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300 ) (#53390 ) Deprecates `maximum_number_trees` parameter of classification and regression and replaces it with `max_trees`. Backport of #53300	2020-03-11 12:45:27 +02:00
Hendrik Muhs	696aa4ddaf	[7.x][Transform] add support for script in group_by (#53167 ) (#53324 ) add the possibility to base the group_by on the output of a script. closes #43152 backport #53167	2020-03-10 11:12:58 +01:00
Christoph Büscher	9e561c2921	Fix AbstractBulkByScrollRequest slices parameter via Rest (#53068 ) Currently the AbstractBulkByScrollRequest accepts slice values of 0 via its `setSlices` method, denoting the "auto" slicing behaviour that is usable by setting the "slices=auto" parameter on rest requests. When using the High Level Rest Client, however, we send the 0 value as an integer, which is then rejected as invalid by `AbstractBulkByScrollRequest#parseSlices`. Instead of making parsing of the rest request more lenient, this PR opts for changing the RequestConverter logic in the client to translate 0 values to "auto" on the rest requests. Closes #53044	2020-03-06 15:38:04 +01:00
Aleksandr Maus	2dc872f052	EQL: Add HLRC for EQL stats (#53043 ) (#53148 )	2020-03-05 09:20:38 -05:00
Nik Everett	28df7ae5ed	Support multiple metrics in `top_metrics` agg (backport of #52965 ) (#53163 ) This adds support for returning multiple metrics to the `top_metrics` agg. It looks like: ``` POST /test/_search?filter_path=aggregations { "aggs": { "tm": { "top_metrics": { "metrics": [ {"field": "v"}, {"field": "m"} ], "sort": {"s": "desc"} } } } } ```	2020-03-05 08:12:01 -05:00
Aleksandr Maus	b47bffba24	EQL: consistent naming for event type vs event category (#53073 ) (#53090 ) Related to https://github.com/elastic/elasticsearch/issues/52941	2020-03-04 08:02:38 -05:00
Costin Leau	712e0c05cd	EQL: Add implicit ordering on timestamp (#53004 ) QL: Move Sort base class from SQL to QL (cherry picked from commit 798015b7bbd565e9c4222724614baeb432c7c2b3)	2020-03-02 22:41:36 +02:00
Aleksandr Maus	89ed857c79	EQL: Change request parameter query to filter and rule to query (#52971 ) (#53006 ) Related to https://github.com/elastic/elasticsearch/issues/52911	2020-03-02 09:26:23 -05:00
Dimitris Athanasiou	85b4e45093	[7.x][ML] Parse and report memory usage for DF Analytics (#52778 ) (#52980 ) Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs. Backport of #52778 and #52958	2020-02-29 13:03:40 +02:00
Dan Hermann	dd44376d27	[7.x] Send the fields param in body instead of URL params (#52948 )	2020-02-28 08:57:35 -06:00
Costin Leau	a674085903	EQL: Disable field extraction for returned events (#52884 ) Return the whole source of matching events (cherry picked from commit 79ca586ab1d89d645fb58142b82202f14ce5d361)	2020-02-28 13:48:15 +02:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
Benjamin Trent	19a6c5d980	[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531 ) (#52901 ) * [ML][Inference] Add support for multi-value leaves to the tree model (#52531) This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.	2020-02-27 14:05:28 -05:00
Benjamin Trent	eac38e9847	[ML] Add indices_options to datafeed config and update (#52793 ) (#52905 ) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 13:43:25 -05:00
Josh Devins	68ba571f70	Adds recall@k metric to rank eval API (#52889 ) This change adds the recall@k metric and refactors precision@k to match the new metric. Recall@k is an important metric to use for learning to rank (LTR) use-cases. Candidate generation or first ranking phase ranking functions are often optimized for high recall, in order to generate as many relevant candidates in the top-k as possible for a second phase of ranking. Adding this metric allows tuning that base query for LTR. See: https://github.com/elastic/elasticsearch/issues/51676 Backports: https://github.com/elastic/elasticsearch/pull/52577	2020-02-27 16:04:24 +01:00
Costin Leau	40bc06f6ad	EQL: Hook engine to Elasticsearch (#52828 ) Add query execution and return actual results returned from Elasticsearch inside the tests (cherry picked from commit 3e039282bf991af87604a6d4f8eada19d5e33842)	2020-02-27 11:22:22 +02:00
Jake Landis	8d311297ca	[7.x] Smarter copying of the rest specs and tests (#52114 ) (#52798 ) * Smarter copying of the rest specs and tests (#52114) This PR addresses the unnecessary copying of the rest specs and allows for better semantics for which specs and tests are copied. By default the rest specs will get copied if the project applies `elasticsearch.standalone-rest-test` or `esplugin` and the project has rest tests or you configure the custom extension `restResources`. This PR also removes the need for dozens of places where the x-pack specs were copied by supporting copying of the x-pack rest specs too. The plugin/task introduced here can also copy the rest tests to the local project through a similar configuration. The new plugin/task allows a user to minimize the surface area of which rest specs are copied. Per project can be configured to include only a subset of the specs (or tests). Configuring a project to only copy the specs when actually needed should help with build cache hit rates since we can better define what is actually in use. However, project level optimizations for build cache hit rates are not included with this PR. Also, with this PR you can no longer use the includePackaged flag on integTest task. The following items are included in this PR: * new plugin: `elasticsearch.rest-resources` * new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy * new extension 'restResources' ``` restResources { restApi { includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar includeXpack 'baz' //will include x-pack specs that start with baz } restTests { includeCore 'foo', 'bar' //will include the core tests that start with foo and bar includeXpack 'baz' //will include the x-pack tests that start with baz } } ```	2020-02-26 08:13:41 -06:00
Sachin Frayne	d3c0a2f013	Improve the error message when loading text fielddata. (#52753 ) Emphasize keyword over fielddata as the preferred way to use String fields for aggregations or sorting.	2020-02-25 15:45:44 -08:00
Aleksandr Maus	a7bdb0b456	EQL: Add integration tests harness to test EQL feature parity with original implementation (#52248 ) (#52675 ) The tests use the original test queries from https://github.com/endgameinc/eql/blob/master/eql/etc/test_queries.toml for EQL implementation correctness validation. The file test_queries_unsupported.toml serves as a "blacklist" for the queries that we do not support. Currently all of the queries are blacklisted. Over the time the expectation is to eventually have an empty "blacklist" when all of the queries are fully supported. The tests use the original test vector from https://raw.githubusercontent.com/endgameinc/eql/master/eql/etc/test_data.json. Only one EQL and the response is stubbed for now to match the expected output from that query. This part would need some tweaking after EQL is fully wired. Related to https://github.com/elastic/elasticsearch/issues/49581	2020-02-24 12:46:59 -05:00
Przemko Robakowski	e72cb79476	Add docs for errors in GetAlias API (#51850 ) (#52716 ) Closes #31499 Co-authored-by: Maxim <timonin.maksim@mail.ru>	2020-02-24 18:22:09 +01:00
Igor Motov	e5b21a3fc6	Add HLRC for EQL search (#52550 ) Adds EQL HLRC client with the search method. Relates to #51961	2020-02-21 08:44:08 -05:00
David Kyle	7bbe5c8464	[Ml] Validate tree feature index is within range (#52514 ) This changes the tree validation code to ensure no node in the tree has a feature index that is beyond the bounds of the feature_names array. Specifically this handles the situation where the C++ emits a tree containing a single node and an empty feature_names list. This is a valid tree used to centre the data in the ensemble, but the validation code would reject it because feature_names is empty. This meant a broken workflow, as you could not GET the model and PUT it back.	2020-02-19 14:41:43 +00:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Nik Everett	2dac36de4d	HLRC support for string_stats (#52163 ) (#52297 ) This adds a builder and parsed results for the `string_stats` aggregation directly to the high level rest client. Without this the HLRC can't access the `string_stats` API without the elastic licensed `analytics` module. While I'm in there this adds a few of our usual unit tests and modernizes the parsing.	2020-02-12 19:25:05 -05:00
David Roberts	1cefafdd14	[ML] Add new categorization stats to model_size_stats (#52009 ) This change adds support for the following new model_size_stats fields: - categorized_doc_count - total_category_count - frequent_category_count - rare_category_count - dead_category_count - categorization_status Backport of #51879	2020-02-10 09:10:50 +00:00
Martijn van Groningen	44ea1efd26	Tidy up GetSourceRequest class: (#51916 ) * No need to implement ToXContentObject * Made index and id fields immutable.	2020-02-10 09:42:03 +01:00
Benjamin Trent	c6111eb90e	[ML][Inference] adding number_samples to TreeNode (#51937 ) (#52060 ) in preparation for feature importance and split information gain, adding `number_samples` field to `TreeNode` definition. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-07 17:04:58 -05:00
David Kyle	8f10a7c6ca	[ML] Make Ensemble feature names optional (#51996 ) The featureNames field is requisite in individual models but is not required by the Ensemble.	2020-02-07 10:08:37 +00:00
Jason Tedor	25daf5f1e1	Add autoscaling API skeleton (#51564 ) The main purpose of this commit is to add a single autoscaling REST endpoint skeleton, for the purpose of starting to build out the build and testing infrastructure that will surround it. For example, rather than committing a fully-functioning autoscaling API, we introduce here the skeleton so that we can start wiring up the build and testing infrastructure, establish security roles/permissions, and so on. This way, in a forthcoming PR that introduces actual functionality, that PR will be smaller and have fewer distractions around that sort of infrastructure.	2020-02-06 21:55:01 -05:00
Ioannis Kakavas	5092d3098d	Cleanup test user in HLRC test (#49477 ) (#51942 ) SecurityIT.testGetUser creates a user for testing purposes, but did not delete the user at the end of the test. This could leave the cluster in an unexpected state for other tests. This commit: - Deletes the user at the end of `testGetUser` - Adds the test-name as metadata to the users that are created in `SecurityIT` so that their origin is clear if they do interfere with other tests - Enables SecurityDocumentationIT.testGetUsers on the expectation that the new cleanup step will resolve the unreliability of that test. Relates: #48440 Co-authored-by: Tim Vernum <tim@adjective.org>	2020-02-06 13:05:09 +02:00
Martijn van Groningen	0610eb51ef	Change HLRC SourceExists to use GetSourceRequest instead of GetRequest (#51789 ) (#51913 ) Originates from #50885 Co-authored-by: Maxim <timonin.maksim@mail.ru>	2020-02-05 13:27:31 +01:00
Julie Tibshirani	38ce428831	Create a class to hold field capabilities for one index. (#51844 ) Currently, the same class `FieldCapabilities` is used both to represent the capabilities for one index, and also the merged capabilities across indices. To help clarify the logic, this PR proposes to create a separate class `IndexFieldCapabilities` for the capabilities in one index. The refactor will also help when adding `source_path` information in #49264, since the merged source path field will have a different structure from the field for a single index. Individual changes: * Add a new class IndexFieldCapabilities. * Remove extra constructor from FieldCapabilities. * Combine the add and merge methods in FieldCapabilities.Builder.	2020-02-04 11:24:57 -08:00
James Rodewig	4ea7297e1e	[DOCS] Change http://elastic.co -> https (#48479 ) (#51812 ) Co-authored-by: Jonathan Budzenski <jon@budzenski.me>	2020-02-03 09:50:11 -05:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723 ) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Lee Hinman	b9faa0733d	[7.x] Rename ILM history index enablement setting (#51698 ) (#51705 ) * Rename ILM history index enablement setting The previous setting was `index.lifecycle.history_index_enabled`, this commit changes it to `indices.lifecycle.history_index_enabled` to indicate this is not an index-level setting (it's node level).	2020-01-30 15:27:44 -07:00
Benjamin Trent	1380dd439a	[7.x] [ML][Inference] Fix weighted mode definition (#51648 ) (#51695 ) * [ML][Inference] Fix weighted mode definition (#51648) Weighted mode inaccurately assumed that the "max value" of the input values would be the maximum class value. This does not make sense. Weighted Mode should know how many classes there are. Hence the new parameter `num_classes`. This indicates the maximum class value to expect.	2020-01-30 15:33:25 -05:00
Hendrik Muhs	b5106aa59d	dump audit index to logs for better debugging (#51627 ) The audit index is re-created for every testrun and therefore potential useful debug information gets lost. This change reads out the audit index and logs the results, which makes them available for debugging CI issues. relates #51549	2020-01-30 11:14:56 +01:00
Albert Zaharovits	f25b6cc2eb	Add new 'maintenance' index privilege #50643 This commit creates a new index privilege named `maintenance`. The privilege grants the following actions: `refresh`, `flush` (also synced-`flush`), and `force-merge`. Previously the actions were only under the `manage` privilege which in some situations was too permissive. Co-authored-by: Amir H Movahed <arhd83@gmail.com>	2020-01-30 11:59:11 +02:00
Benjamin Trent	fc994d9ce1	[ML][Inference] Adds validations for model PUT (#51376 ) (#51409 ) Adds validations making sure that * `input.field_names` is not empty * `ensemble.trained_models` is not empty * `tree.feature_names` is not empty closes https://github.com/elastic/elasticsearch/issues/51354	2020-01-24 09:29:12 -05:00
Benjamin Trent	76660a5a4f	[7.x] [ML][Inference] add tags url param to GET (#51330 ) (#51404 ) * [ML][Inference] add tags url param to GET (#51330) Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint. This parameter allows the list of models to be further reduced to those who contain all the provided tags.	2020-01-24 08:26:58 -05:00
Nhat Nguyen	072203cba8	Clean soft-deletes setting in ccr tests (#51113 ) (#51372 ) We no longer need to explicitly enable soft-deletes in CCR tests. Relates #50775 Backport of #51113	2020-01-23 16:31:47 -05:00
Martijn van Groningen	0a8d8d7ae3	Add Get Source API to the HLRC (#51342 ) Backport to 7.x branch of #50885. Relates to #47678 Co-authored-by: Maxim <timonin.maksim@mail.ru>	2020-01-23 13:16:20 +01:00
Tim Vernum	e41c0b1224	Deprecating kibana_user and kibana_dashboard_only_user roles (#50963 ) This change adds a new `kibana_admin` role, and deprecates the old `kibana_user` and `kibana_dashboard_only_user` roles. The deprecation is implemented via a new reserved metadata attribute, which can be consumed from the API and also triggers deprecation logging when used (by a user authenticating to Elasticsearch). Some docs have been updated to avoid references to these deprecated roles. Backport of: #46456 Co-authored-by: Larry Gregory <lgregorydev@gmail.com>	2020-01-15 11:07:19 +11:00
Benjamin Trent	72c270946f	[ML][Inference] Adding classification_weights to ensemble models (#50874 ) (#50994 ) * [ML][Inference] Adding classification_weights to ensemble models classification_weights are a way to allow models to prefer specific classification results over others. This might be advantageous if classification value probabilities are a known quantity, and can improve model error rates.	2020-01-14 12:40:25 -05:00
Dimitris Athanasiou	1d8cb3c741	[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914 ) (#50976 ) Adds a new parameter to regression and classification that enables computation of importance for the top most important features. The computation of the importance is based on SHAP (SHapley Additive exPlanations) method. Backport of #50914	2020-01-14 16:46:09 +02:00
Nhat Nguyen	fb32a55dd5	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 19:54:38 -05:00
Nhat Nguyen	05f97d5e1b	Revert "Deprecate synced flush (#50835 )" This reverts commit `1a32d7142a`.	2020-01-13 11:41:03 -05:00
Nhat Nguyen	1a32d7142a	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 10:58:29 -05:00
Benjamin Trent	fa116a6d26	[7.x] [ML][Inference] PUT API (#50852 ) (#50887 ) * [ML][Inference] PUT API (#50852) This adds the `PUT` API for creating trained models that support our format. This includes * HLRC change for the API * API creation * Validations of model format and call * fixing backport	2020-01-12 10:59:11 -05:00
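
Note on commit 1380dd439a (weighted mode): the message says the aggregation must be told how many classes exist instead of inferring that from the largest input value. The snippet below is only a rough Python sketch of that idea, not the Elasticsearch implementation. The function name `weighted_mode` and the assumption that class values are integers in the range 0 to `num_classes - 1` are illustrative choices made here.

```python
from typing import Sequence


def weighted_mode(class_values: Sequence[int],
                  weights: Sequence[float],
                  num_classes: int) -> int:
    """Return the class whose votes carry the largest total weight.

    Hypothetical helper: the number of classes is passed in explicitly
    rather than being derived from max(class_values).
    """
    if len(class_values) != len(weights):
        raise ValueError("class_values and weights must have the same length")
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")

    # One accumulator per known class, including classes nobody voted for.
    totals = [0.0] * num_classes
    for value, weight in zip(class_values, weights):
        if not 0 <= value < num_classes:
            raise ValueError(f"class value {value} is outside 0..{num_classes - 1}")
        totals[value] += weight

    # Ties resolve to the lowest class index (an arbitrary choice for this sketch).
    return max(range(num_classes), key=totals.__getitem__)
```

With three classes declared, votes of `[0, 1, 1]` and weights `[0.9, 0.3, 0.3]` select class 0, even though class 2 never appears in the input.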


1088 Commits
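
Note on commit 68ba571f70 (recall@k): the message describes recall@k as a measure of how many relevant candidates reach the top k results. A minimal Python sketch of the usual definition follows, assuming binary relevance and a known set of relevant document ids. The name `recall_at_k` and its parameters are hypothetical and do not reflect the rank eval API.

```python
from typing import Hashable, Iterable, Sequence


def recall_at_k(ranked_doc_ids: Sequence[Hashable],
                relevant_doc_ids: Iterable[Hashable],
                k: int) -> float:
    """Fraction of all relevant documents that appear in the top k results."""
    if k <= 0:
        raise ValueError("k must be positive")

    relevant = set(relevant_doc_ids)
    if not relevant:
        # Convention chosen for this sketch: no relevant documents gives 0.0.
        return 0.0

    # Use a set so a document repeated in the ranking is counted once.
    retrieved = set(ranked_doc_ids[:k])
    return len(retrieved & relevant) / len(relevant)
```

Precision@k would divide the same hit count by the number of retrieved documents instead of the number of relevant ones, which is why the two metrics can share most of their logic.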