OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-03-09 14:34:43 +00:00

Author	SHA1	Message	Date
Armin Braun	cec02da0ac	Fix Source Only Snapshot REST Test Failure (#50456 ) (#50459 ) We are matching on the exact number of shards in this test, but may run into snapshotting more than the single index created in it due to auto-created indices like `.watcher`. Fixed by making the test only take a snapshot of the single index used by this test. Closes #50450	2019-12-23 12:24:08 +01:00
Benjamin Trent	71ff330c4e	[ML][Inference] updates specs with new params + docs (#50373 ) (#50441 )	2019-12-20 12:13:45 -05:00
Przemysław Witek	3e3a93002f	[7.x] Fix accuracy metric (#50310 ) (#50433 )	2019-12-20 15:34:38 +01:00
Hendrik Muhs	de14092ad2	[Transform] refactor source and dest validation to support CCS (#50018 ) refactors source and dest validation, adds support for CCS, makes resolve work like reindex/search, allow aliased dest index with a single write index. fixes #49988 fixes #49851 relates #43201	2019-12-20 10:49:53 +01:00
Przemysław Witek	cc4bc797f9	[7.x] Implement `precision` and `recall` metrics for classification evaluation (#49671 ) (#50378 )	2019-12-19 18:55:05 +01:00
Tim Vernum	47e5e34f42	Support "enterprise" license types (#49474 ) This adds "enterprise" as an acceptable type for a license loaded through the PUT _license API. Internally an enterprise license is treated as having a "platinum" operating mode. The handling of License types was refactored to have a new explicit "LicenseType" enum in addition to the existing "OperatingMode" enum. By default (in 7.x) the GET license API will return "platinum" when an enterprise license is active in order to be compatible with existing consumers of that API. A new "accept_enterprise" flag has been introduced to allow clients to opt-in to receive the correct "enterprise" type. Backport of: #49223	2019-12-12 14:37:44 +11:00
Julie Tibshirani	277880bb4f	In sparse vector REST tests, specify the index name in searches. (#50061 ) The `sparse_vector` REST tests occasionally fail on 7.x because we don't receive the expected response headers with deprecation warnings. One theory as to what is happening is that there is an extra empty index present in addition to the test index. Since the search doesn't specify an index name, it hits both the test index and this extra empty index and shard responses from the extra index don't produce deprecation warnings. If not all shard responses contain the warning headers, then certain deprecation warnings can be lost (due to the bug described in #33936). This PR tries to harden the `sparse_vector` tests by always specifying the index name during a search. This doesn't fix the root causes of the issue, but is good practice and can help avoid intermittent failures. Addresses #49383.	2019-12-11 10:33:47 -08:00
Stuart Cam	44cd2f444c	Add the REST API specifications for SLM Status / Start / Stop endpoints. (#49759 ) Was originally missed in PR #47710 (cherry picked from commit 133b34c8355639ae0f699a86ffd9f37d19f73bca)	2019-12-11 13:34:13 +11:00
Dimitris Athanasiou	8891f4db88	[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990 ) (#50023 ) This adds a new `randomize_seed` for regression and classification. When not explicitly set, the seed is randomly generated. One can reuse the seed in a similar job in order to ensure the same docs are picked for training. Backport of #49990	2019-12-10 15:29:19 +02:00
Dimitris Athanasiou	4edb2e7bb6	[7.x][ML] Add optional source filtering during data frame reindexing (#49690 ) (#49718 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531 Backport of #49690	2019-11-29 16:10:44 +02:00
Jim Ferenczi	496bb9e2ee	Add a listener to track the progress of a search request locally (#49471 ) (#49691 ) This commit adds a function in NodeClient that allows to track the progress of a search request locally. Progress is tracked through a SearchProgressListener that exposes query and fetch responses as well as partial and final reduces. This new method can be used by modules/plugins inside a node in order to track the progress of a local search request. Relates #49091	2019-11-28 18:23:09 +01:00
David Roberts	62811c2272	[ML] Add default categorization analyzer definition to ML info (#49545 ) The categorization job wizard in the ML UI will use this information when showing the effect of the chosen categorization analyzer on a sample of input.	2019-11-25 13:39:16 +00:00
Dimitris Athanasiou	8eaee7cbdc	[7.x][ML] Explain data frame analytics API (#49455 ) (#49504 ) This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455	2019-11-22 22:06:10 +02:00
Przemysław Witek	c7ac2011eb	[7.x] Implement accuracy metric for multiclass classification (#47772 ) (#49430 )	2019-11-21 15:01:18 +01:00
Przemysław Witek	38aec2e298	Relax assertions related to datafeed timing stats in .yml test (#49285 ) (#49291 )	2019-11-19 12:50:14 +01:00
Julie Tibshirani	a0ee6c8f7e	Add telemetry for flattened fields. (#48972 ) (#49125 ) Currently we just record the number of flattened fields defined in the mappings.	2019-11-18 12:29:42 -08:00
Benjamin Trent	eefe7688ce	[7.x][ML] ML Model Inference Ingest Processor (#49052 ) (#49257 ) * [ML] ML Model Inference Ingest Processor (#49052) * [ML][Inference] adds lazy model loader and inference (#47410) This adds a couple of things: - A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them - A Model class and its first sub-class LocalModel. Used to cache model information and run inference. - Transport action and handler for requests to infer against a local model Related Feature PRs: * [ML][Inference] Adjust inference configuration option API (#47812) * [ML][Inference] adds logistic_regression output aggregator (#48075) * [ML][Inference] Adding read/del trained models (#47882) * [ML][Inference] Adding inference ingest processor (#47859) * [ML][Inference] fixing classification inference for ensemble (#48463) * [ML][Inference] Adding model memory estimations (#48323) * [ML][Inference] adding more options to inference processor (#48545) * [ML][Inference] handle string values better in feature extraction (#48584) * [ML][Inference] Adding _stats endpoint for inference (#48492) * [ML][Inference] add inference processors and trained models to usage (#47869) * [ML][Inference] add new flag for optionally including model definition (#48718) * [ML][Inference] adding license checks (#49056) * [ML][Inference] Adding memory and compute estimates to inference (#48955) * fixing version of indexed docs for model inference	2019-11-18 13:19:17 -05:00
Przemysław Witek	150db2b544	Throw an exception when memory usage estimation endpoint encounters empty data frame. (#49143 ) (#49164 )	2019-11-18 07:52:57 +01:00
Mayya Sharipova	0e933a093d	Add index name to search requests (#49175 ) We can't guarantee the expected request failures if a search request spans many indexes: even if the expected shards fail, some indexes may return 200. closes #47743	2019-11-15 16:39:18 -05:00
Yogesh Gaikwad	1b64c1992a	Add owner flag parameter to the rest spec (#48500 ) This commit adds missing info about the newly added `owner` flag to the rest spec, and also adds a rest test for the same. Closes #48499	2019-10-30 13:07:01 +11:00
Julie Tibshirani	89c65752dc	Update the signature of vector script functions. (#48653 ) Previously the functions accepted a doc values reference, whereas they now accept the name of the vector field. Here's an example of how a vector function was called before and after the change. ``` Before: cosineSimilarity(params.query_vector, doc['field']) After: cosineSimilarity(params.query_vector, 'field') ``` This seems more intuitive, since we don't allow direct access to vector doc values and the meaning of `doc['field']` is unclear. The PR makes the following changes (broken into distinct commits): * Add new function signatures of the form `function(params.query_vector, 'field')` and deprecate the old ones. Because Painless doesn't allow two methods with the same name and number of arguments, we allow a generic `Object` to be passed in to the function and decide on the behavior through an `instanceof` check. * Refactor the class bindings so that the document field is passed to the constructor instead of the instance method. This allows us to avoid retrieving the vector doc values on every function invocation, which gives a tiny speed-up in benchmarks. Note that this PR adds new signatures for the sparse vector functions too, even though sparse vectors are deprecated. It seemed simplest to understand (for both us and users) to keep everything symmetric between dense and sparse vectors.	2019-10-29 15:46:05 -07:00
Benjamin Trent	6ea59dd428	[ML][Transforms] add wait_for_checkpoint flag to stop (#47935 ) (#48591 ) Adds `wait_for_checkpoint` for `_stop` API.	2019-10-28 13:02:57 -04:00
Russ Cam	b24bbd4296	Change policy_id to list type in slm.get_lifecycle (#47766 ) This commit changes the REST API spec slm.get_lifecycle's policy_id url part to be of type "list", in line with other REST API specs that accept a comma-separated list of values. Closes #47765	2019-10-25 09:04:25 +10:00
Julie Tibshirani	2664cbd20b	Deprecate the sparse_vector field type. (#48368 ) We have not seen much adoption of this experimental field type, and don't see a clear use case as it's currently designed. This PR deprecates the field type in 7.x. It will be removed from 8.0 in a follow-up PR.	2019-10-23 16:35:03 -07:00
Igor Motov	8163e0a9e5	Mute XPackRestIT security/authz/14_cat_indices Mutes "Test empty request while single authorized closed index" Tracked by #47875	2019-10-23 14:17:44 -04:00
Przemysław Witek	2db2b945ec	[7.x] Change format of MulticlassConfusionMatrix result to be more self-explanatory (#48174 ) (#48294 )	2019-10-21 22:07:19 +02:00
Przemysław Witek	1a42e37070	[7.x] Default "prediction_field_name" to (dependent_variable + "_prediction") (#48232 ) (#48279 )	2019-10-21 13:18:08 +02:00
Albert Zaharovits	69fc715bc3	Fix security origin for TokenService#findActiveTokensFor... (#47418 ) (#48280 ) All internal searches (triggered by APIs) across the .security index must be performed while "under the security origin". Otherwise, the search is performed in the context of the caller which most likely does not have privileges to search .security (hopefully). This commit fixes this in the case of two methods in the TokenService and corrects an excessive context switch of this kind in the ApiKeyService. In addition, this makes all tests from the client/rest-high-level module execute as an almighty administrator, but not a literal superuser. Closes #47151	2019-10-21 13:15:05 +03:00
Przemysław Witek	28f68fa221	Make num_top_classes parameter's default value equal to 2 (#48119 ) (#48201 )	2019-10-17 18:43:15 +02:00
Martijn van Groningen	a5fe69c344	Include enrich into the info api as feature (#48157 ) This commit also fixes a bug, the enrich enabled setting was not included in the list of settings. Backport of #48109	2019-10-17 09:51:32 +02:00
Martijn van Groningen	aff0c9babc	This commit merges (#48040 ) the enrich-7.x feature branch, which is a backport merge and adds a new ingest processor, named enrich processor, that allows documents being ingested to be enriched with data from other indices. Besides a new enrich processor, this PR adds several APIs to manage an enrich policy. An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner. Related to #32789	2019-10-15 17:31:45 +02:00
David Roberts	984323783e	[ML][7.x] Add lazy assignment job config option (#47993 ) This change adds: - A new option, allow_lazy_open, to anomaly detection jobs - A new option, allow_lazy_start, to data frame analytics jobs Both work in the same way: they allow a job to be opened/started even if no ML node exists that can accommodate the job immediately. In this situation the job waits in the opening/starting state until ML node capacity is available. (The starting state for data frame analytics jobs is new in this change.) Additionally, the ML nightly maintenance task now creates audit warnings for ML jobs that are unassigned. This means that jobs that cannot be assigned to an ML node for a very long time will show a yellow warning triangle in the UI. A final change is that it is now possible to close a job that is not assigned to a node without using force. This is because previously jobs that were open but not assigned to a node were an aberration, whereas after this change they'll be relatively common.	2019-10-15 06:55:11 +01:00
Martijn van Groningen	cc4b6c43b3	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-15 07:23:47 +02:00
James Baiera	18d7e32b7d	Add wait for completion for Enrich policy execution (#47886 ) This PR adds the ability to run the enrich policy execution task in the background, returning a task id instead of waiting for the completed operation.	2019-10-14 16:05:28 -04:00
David Roberts	1ca25bed38	[ML][7.x] Add option to stop datafeed that finds no data (#47995 ) Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data. Backport of #47922	2019-10-14 17:19:13 +01:00
Martijn van Groningen	d4901a71d7	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-14 10:27:17 +02:00
Tanguy Leroux	742fa818b8	Add Pause/Resume Auto Follower APIs (#47510 ) (#47904 ) This commit adds two APIs that make it possible to pause and resume CCR auto-follower patterns: // pause auto-follower POST /_ccr/auto_follow/my_pattern/pause // resume auto-follower POST /_ccr/auto_follow/my_pattern/resume The ability to pause and resume auto-follow patterns can be useful in some situations, including rolling upgrades of clusters using a bi-directional cross-cluster replication scheme (see #46665). This commit adds a new active flag to the AutoFollowPattern and adapts the AutoCoordinator and AutoFollower classes so that they stop fetching the remote's cluster state when all auto-follow patterns associated with the remote cluster are paused. When an auto-follower is paused, remote indices that match the pattern are just ignored: they are not added to the pattern's followed indices uids list that is maintained in the local cluster state. This way, when the auto-follow pattern is resumed the indices created in the remote cluster in the meantime will be picked up again and added as new following indices. Indices created and then deleted in the remote cluster will be ignored as they won't be seen at all by the auto-follower pattern at resume time. Backport of #47510 for 7.x	2019-10-13 09:22:51 +02:00
Benjamin Trent	627faf1850	[7.x] [ML][Analytics] fix bug where regression deleted early does not delete state (#47885 ) (#47914 ) * [ML][Analytics] fix bug where regression deleted early does not delete state (#47885) * [ML][Analytics] fix bug where regression deleted early does not delete state * Fixing ml with security test failure * fixing for older java	2019-10-11 15:11:16 -04:00
Przemysław Witek	c62fe8c344	Require that the dependent variable column has at most 2 distinct values in classification analysis. (#47858 ) (#47906 )	2019-10-11 14:57:08 +02:00
Hendrik Muhs	fd1c4c198a	[Transform] fixes tests which might fail due to auto-stop (#47867 ) Batch transforms automatically stop after all data has been processed, therefore tests cannot reliably test the state. This change rewrites tests to remove the unreliable tests or use continuous transforms instead as they do not auto-stop. fixes #47441	2019-10-11 11:10:38 +02:00
Martijn van Groningen	102016d571	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-10 14:44:05 +02:00
Hendrik Muhs	0e7869128a	[7.5][Transform] introduce new roles and deprecate old ones (#47780 ) (#47819 ) deprecate data_frame_transforms_{user,admin} roles and introduce transform_{user,admin} roles as replacement	2019-10-10 10:31:24 +02:00
Martijn van Groningen	da1e2ea461	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-09 09:06:13 +02:00
Hendrik Muhs	5e0e54f455	[Transform] move root endpoint to _transform with BWC layer (#47127 ) (#47682 ) move the main endpoint to /_transform/ from /_data_frame/transforms/ with providing backwards compatibility and deprecation warnings	2019-10-08 08:59:01 +02:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
Yogesh Gaikwad	b6d1d2e6ec	Add 'create_doc' index privilege (#45806 ) (#47645 ) Use case: User with `create_doc` index privilege will be allowed to only index new documents either via Index API or Bulk API. There are two cases that we need to think about: - User indexing a new document without specifying an Id. For this, ES auto-generates an Id, and since ES version 7.5.0 onwards defaults to `op_type` `create`, we just need to authorize on the `op_type`. - User indexing a new document with an Id. This is problematic as we do not know whether a document with that Id exists or not. If the `op_type` is `create` then we can assume the user is trying to add a document; if it exists, the index engine is going to throw an error. Given both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents. In the `AuthorizationService` when authorizing a bulk request, we check the implied action. This code changes that to append the `:op_type/index` or `:op_type/create` to indicate the implied index action.	2019-10-07 23:58:44 +11:00
Martijn van Groningen	f2f2304c75	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-07 10:07:56 +02:00
Przemysław Witek	ee952da2e2	[7.x] Implement evaluation API for multiclass classification problem (#47126 ) (#47343 )	2019-10-04 17:54:51 +02:00
Przemysław Witek	ec9b77deaa	[7.x] Implement new analysis type: classification (#46537 ) (#47559 )	2019-10-04 13:47:19 +02:00
Lee Hinman	2e3eb4b24e	Add API to execute SLM retention on-demand (#47405 ) (#47463 ) * Add API to execute SLM retention on-demand (#47405) This is a backport of #47405 This commit adds the `/_slm/_execute_retention` API endpoint. This endpoint kicks off SLM retention and then returns immediately. This in particular allows us to run retention without scheduling it (for entirely manual invocation) or perform a one-off cleanup. This commit also includes HLRC for the new API, and fixes an issue in SLMSnapshotBlockingIntegTests where retention invoked prior to the test completing could resurrect an index the internal test cluster cleanup had already deleted. Resolves #46508 Relates to #43663	2019-10-02 12:29:04 -06:00

591 Commits