OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-07 13:38:49 +00:00

Author	SHA1	Message	Date
Jay Modi	107989df3e	Introduce hidden indices (#51164 ) This change introduces a new feature for indices so that they can be hidden from wildcard expansion. The feature is referred to as hidden indices. An index can be marked hidden through the use of an index setting, `index.hidden`, at creation time. One primary use case for this feature is to have a construct that fits indices that are created by the stack that contain data used for display to the user and/or intended for querying by the user. The desire to keep them hidden is to avoid confusing users when searching all of the data they have indexed and getting results returned from indices created by the system. Hidden indices have the following properties: * API calls for all indices (empty indices array, _all, or ) will not return hidden indices by default. Wildcard expansion will not return hidden indices by default unless the wildcard pattern begins with a `.`. This behavior is similar to shell expansion of wildcards. * REST API calls can enable the expansion of wildcards to hidden indices with the `expand_wildcards` parameter. To expand wildcards to hidden indices, use the value `hidden` in conjunction with `open` and/or `closed`. * Creation of a hidden index will ignore global index templates. A global index template is one with a match-all pattern. * Index templates can make an index hidden, with the exception of a global index template. * Accessing a hidden index directly requires no additional parameters. Backport of #50452	2020-01-17 10:09:01 -07:00
David Roberts	295665b1ea	[ML] Add audit warning for 1000 categories found early in job (#51146 ) If 1000 different category definitions are created for a job in the first 100 buckets it processes then an audit warning will now be created. (This will cause a yellow warning triangle in the ML UI's jobs list.) Such a large number of categories suggests that the field that categorization is working on is not well suited to the ML categorization functionality.	2020-01-17 16:28:45 +00:00
Przemysław Witek	da73c9104e	[ML] Fix tests randomly failing on CI (#51142 ) (#51150 )	2020-01-17 14:58:58 +01:00
Dimitris Athanasiou	b70ebdeb96	[7.x][ML] DF Analytics _explain API should skip object fields (#51115 ) (#51147 ) Object fields cannot be used as features. At the moment _explain API includes them and even worse it allows it does not error when an object field is excluded. This creates the expectation to the user that all children fields will also be excluded while it's not the case. This commit omits object fields from the _explain API and also adds an error if an object field is included or excluded. Backport of #51115	2020-01-17 14:02:59 +02:00
Przemysław Witek	b1a526d5e9	[7.x] [ML] Update DFA progress document in the index the document belongs to (#51111 ) (#51117 )	2020-01-17 08:12:54 +01:00
Tom Veasey	32ec934b15	[7.x][ML] Assert top classes are ordered by score (#51028 ) Backport #51003.	2020-01-16 12:23:15 +00:00
David Roberts	1536c3e622	[TEST] Increase ML distributed test job open timeout (#50998 ) There have been occasional failures, presumably due to too many tests running in parallel, caused by jobs taking around 15 seconds to open. (You can see the job open successfully during the cleanup phase shortly after the failure of the test in these cases.) This change increases the wait time from 10 seconds to 20 seconds to reduce the risk of this happening.	2020-01-15 08:58:55 +00:00
Nik Everett	fc5fde7950	Add "did you mean" to ObjectParser (#50938 ) (#50985 ) Check it out: ``` $ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{ "dac": {} }' { "error" : { "root_cause" : [ { "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" } ], "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" }, "status" : 400 } ``` The tricky thing about implementing this is that x-content doesn't depend on Lucene. So this works by creating an extension point for the error message using SPI. Elasticsearch's server module provides the "spell checking" implementation. s	2020-01-14 17:53:41 -05:00
Benjamin Trent	72c270946f	[ML][Inference] Adding classification_weights to ensemble models (#50874 ) (#50994 ) * [ML][Inference] Adding classification_weights to ensemble models classification_weights are a way to allow models to prefer specific classification results over others this might be advantageous if classification value probabilities are a known quantity and can improve model error rates.	2020-01-14 12:40:25 -05:00
Tom Veasey	de5713fa4b	[ML] Disable invalid assertion (#50988 ) Backport #50986.	2020-01-14 17:35:00 +00:00
David Kyle	7f309a18f1	[7.x][ML] Explicitly require a OriginSettingClient in ML results iterators (#50981 ) In classes where the client is used directly rather than through a call to executeAsyncWithOrigin explicitly require the client to be OriginSettingClient rather than using the Client interface. Also remove calls to deprecated ClientHelper.clientWithOrigin() method.	2020-01-14 17:14:39 +00:00
Dimitris Athanasiou	1d8cb3c741	[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914 ) (#50976 ) Adds a new parameter to regression and classification that enables computation of importance for the top most important features. The computation of the importance is based on SHAP (SHapley Additive exPlanations) method. Backport of #50914	2020-01-14 16:46:09 +02:00
Przemysław Witek	9c6ffdc2be	[7.x] Handle nested and aliased fields correctly when copying mapping. (#50918 ) (#50968 )	2020-01-14 14:43:39 +01:00
Benjamin Trent	eb8fd44836	[ML][Inference] minor fixes for created_by, and action permission (#50890 ) (#50911 ) The system created and models we provide now use the `_xpack` user for uniformity with our other features The `PUT` action is now an admin cluster action And XPackClient class now references the action instance.	2020-01-13 07:59:31 -05:00
Benjamin Trent	fa116a6d26	[7.x] [ML][Inference] PUT API (#50852 ) (#50887 ) * [ML][Inference] PUT API (#50852) This adds the `PUT` API for creating trained models that support our format. This includes * HLRC change for the API * API creation * Validations of model format and call * fixing backport	2020-01-12 10:59:11 -05:00
Dimitris Athanasiou	422422a2bc	[7.x][ML] Reuse SourceDestValidator for data frame analytics (#50841 ) (#50850 ) This commit removes validation logic of source and dest indices for data frame analytics and replaces it with using the common `SourceDestValidator` class which is already used by transforms. This way the validations and their messages become consistent while we reduce code. This means that where these validations fail the error messages will be slightly different for data frame analytics. Backport of #50841	2020-01-10 14:24:13 +02:00
Jake Landis	de6f132887	[7.x] Foreach processor - fork recursive call (#50514 ) (#50773 ) A very large number of recursive calls can cause a stack overflow exception. This commit forks the recursive calls for non-async processors. Once forked, each thread will handle at most 10 recursive calls to help keep the stack size and thread count down to a reasonable size.	2020-01-09 13:21:18 -06:00
Benjamin Trent	cc0e64572a	[ML][Inference][HLRC] Add necessary lang ident classes (#50705 ) (#50794 ) This adds the necessary named XContent classes to the HLRC for the lang ident model. This is so the HLRC can call `GET _ml/inference/lang_ident_model_1?include_definition=true` without XContent parsing errors. The constructors are package private as since this classes are used exclusively within the pre-packaged model (and require the specific weights, etc. to be of any use).	2020-01-09 10:33:38 -05:00
Adrien Grand	31158ab3d5	Add per-field metadata. (#50333 ) This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field. In order to not bloat the cluster state, Elasticsearch requires that this metadata be small: - keys can't be longer than 20 chars, - values can only be numbers or strings of no more than 50 chars - no inner arrays or objects, - the metadata can't have more than 5 keys in total. Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices, the union of all field metadatas is returned. Here is how the meta might look like in mappings: ```json { "properties": { "latency": { "type": "long", "meta": { "unit": "ms" } } } } ``` And then in the field capabilities response: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms" ] } } } } ``` When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without giving ways to know which index has which metadata value: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms", "ns" ] } } } } ``` Closes #33267	2020-01-08 16:21:18 +01:00
Benjamin Trent	060e0a6277	[ML][Inference] Add support for models shipped as resources (#50680 ) (#50700 ) This adds support for models that are shipped as resources in the ML plugin. The first of which is the `lang_ident` model.	2020-01-07 09:21:59 -05:00
Przemysław Witek	4116452d90	Implement testStopAndRestart for ClassificationIT (#50585 ) (#50698 )	2020-01-07 13:41:37 +01:00
David Roberts	35453e2b0e	[ML] Improve uniqueness of result document IDs (#50644 ) Switch from a 32 bit Java hash to a 128 bit Murmur hash for creating document IDs from by/over/partition field values. The 32 bit Java hash was not sufficiently unique, and could produce identical numbers for relatively common combinations of by/partition field values such as L018/128 and L017/228. Fixes #50613	2020-01-07 10:24:45 +00:00
David Roberts	46d600c446	[ML] Fix off-by-one error in ml_classic tokenizer end offset (#50655 ) The end offset of a tokenizer is supposed to point one past the end of the input, not to the end character of the input. The ml_classic tokenizer was erroneously doing the latter.	2020-01-07 10:14:59 +00:00
Benjamin Trent	06cea5136e	[ML] construct new random generator on each persistence call (#50657 ) (#50684 ) Sharing a random generator may cause test failures as non-threadsafe random generators are periodically utilized in tests (see: https://github.com/elastic/elasticsearch/issues/50651) This change constructs a calls `Randomness.get()` within the `bulkIndexWithRetry` method so that the returned `Random` object is only used in a single thread. Before, the member variable could have been used between threads, which caused test failures.	2020-01-06 16:26:29 -05:00
Benjamin Trent	5ab9e75e28	[7.x] [ML][Inference] lang_ident model (#50292 ) (#50675 ) * [ML][Inference] lang_ident model (#50292) This PR contains a java port of Google's CLD3 compact NN model https://github.com/google/cld3 The ported model is formatted to fit within our inference model formatting and stored as a resource in the `:xpack:ml:` plugin and is under basic license. The model is broken up into two major parts: - Preprocessing through the custom embedding (based on CLD3's embedding layer) - Pushing the embedded text through the two layers of fully connected shallow NN. Main differences between this port and CLD3: - We take advantage of Java's internal Unicode handling where possible (i.e. codepoints, characters, decoders, etc.) - We do not trim down input text by removing duplicated tokens - We do not encode doubles/floats as longs/integers.	2020-01-06 16:24:03 -05:00
Benjamin Trent	f52af7977d	[ML][Inference] minor cleanup for inference (#50444 ) (#50676 )	2020-01-06 14:05:04 -05:00
Dimitris Athanasiou	ca0828ba07	[7.x][ML] Implement force deleting a data frame analytics job (#50553 ) (#50589 ) Adds a `force` parameter to the delete data frame analytics request. When `force` is `true`, the action force-stops the jobs and then proceeds to the deletion. This can be used in order to delete a non-stopped job with a single request. Closes #48124 Backport of #50553	2020-01-03 13:46:02 +02:00
Przemysław Witek	8917c05df8	[7.x] Synchronize processInStream.close() call (#50581 )	2020-01-03 10:23:51 +01:00
Przemysław Witek	4ecabe496f	Mute testStopAndRestart test case (#50551 )	2020-01-02 15:28:20 +01:00
Christoph Büscher	1599af8428	Fix type conversion problem in Eclipse (#50549 ) Eclipse 4.13 shows a type mismatch error in the affected line because it cannot correctly infer the boolean return type for the method call. Assigning return value to a local variable resolves this problem.	2020-01-02 14:29:20 +01:00
Przemysław Witek	3e3a93002f	[7.x] Fix accuracy metric (#50310 ) (#50433 )	2019-12-20 15:34:38 +01:00
Przemysław Witek	14d95aae46	[7.x] Make each analysis report desired field mappings to be copied (#50219 ) (#50428 )	2019-12-20 15:10:33 +01:00
Przemysław Witek	5bb668b866	[7.x] Get rid of maxClassesCardinality internal parameter (#50418 ) (#50423 )	2019-12-20 14:24:23 +01:00
Przemysław Witek	cc4bc797f9	[7.x] Implement `precision` and `recall` metrics for classification evaluation (#49671 ) (#50378 )	2019-12-19 18:55:05 +01:00
Dimitris Athanasiou	d3c83cd55a	[7.x][ML] Refresh state index before completing data frame analytics job (#50322 ) (#50324 ) In order to ensure any persisted model state is searchable by the moment the job reports itself as `stopped`, we need to refresh the state index before completing. This should fix the occasional failures we see in #50168 and #50313 where the model state appears missing. Closes #50168 Closes #50313 Backport of #50322	2019-12-18 22:19:59 +00:00
Benjamin Trent	4396a1f78b	[ML][Inference] fix support for nested fields (#50258 ) (#50335 ) This fixes support for nested fields We now support fully nested, fully collapsed, or a mix of both on inference docs. ES mappings allow the `_source` to be any combination of nested objects + dot delimited fields. So, we should do our best to find the best path down the Map for the desired field.	2019-12-18 15:47:06 -05:00
Dimitris Athanasiou	447bac27d2	[7.x][ML] Delete unused data frame analytics state (#50243 ) (#50280 ) This commit adds removal of unused data frame analytics state from the _delete_expired_data API (and in extend th ML daily maintenance task). At the moment the potential state docs include the progress document and state for regression and classification analyses. Backport of #50243	2019-12-18 12:30:11 +00:00
Przemysław Witek	ac974c35c0	Pass processConnectTimeout to the method that fetches C++ process' PID (#50276 ) (#50290 )	2019-12-17 21:32:37 +01:00
David Kyle	098f540f9d	[ML] Remove usage of base action logger in ml actions (#50074 ) (#50236 )	2019-12-17 13:03:27 +00:00
David Kyle	5542686283	[ML] Wait for green after opening job in NetworkDisruptionIT (#50232 ) Closes #49908	2019-12-16 14:55:58 +00:00
Dimitris Athanasiou	73add726d7	[7.x][ML] Fix exception when field is not included and excluded at the same time (#50192 ) (#50223 ) Executing the data frame analytics _explain API with a config that contains a field that is not in the includes list but at the same time is the excludes list results to trying to remove the field twice from the iterator. That causes an `IllegalStateException`. This commit fixes this issue and adds a test that captures the scenario. Backport of #50192	2019-12-16 11:30:06 +00:00
Benjamin Trent	4805d8ac7d	[ML][Inference] Adding a warning_field for warning msgs. (#49838 ) (#50183 ) This adds a new field for the inference processor. `warning_field` is a place for us to write warnings provided from the inference call. When there are warnings we are not going to write an inference result. The goal of this is to indicate that the data provided was too poor or too different for the model to make an accurate prediction. The user could optionally include the `warning_field`. When it is not provided, it is assumed no warnings were desired to be written. The first of these warnings is when ALL of the input fields are missing. If none of the trained fields are present, we don't bother inferencing against the model and instead provide a warning stating that the fields were missing. Also, this adds checks to not allow duplicated fields during processor creation.	2019-12-13 10:39:51 -05:00
Benjamin Trent	41736dd6c3	[ML] retry bulk indexing of state docs (#50149 ) (#50185 ) This exchanges the direct use of the `Client` for `ResultsPersisterService`. State doc persistence will now retry. Failures to persist state will still not throw, but will be audited and logged.	2019-12-13 10:39:34 -05:00
Dimitris Athanasiou	fe3c9e71d1	[7.x][ML] Fix DFA explain API timeout when source index is missing (#50176 ) (#50180 ) This commit fixes a bug that caused the data frame analytics _explain API to time out in a multi-node setup when the source index was missing. When we try to create the extracted fields detector, we check the index settings. If the index is missing that responds with a failure that could be wrapped as a remote exception. While we unwrapped correctly to check if the cause was an `IndexNotFoundException`, we then proceeded to cast the original exception instead of the cause. Backport of #50176	2019-12-13 17:00:55 +02:00
Dimitris Athanasiou	e6cbcf7f7c	[7.x] [ML] Persist/restore state for DFA classification (#50040 ) (#50147 ) This commit adds state persist/restore for data frame analytics classification jobs. Backport of #50040	2019-12-13 10:33:19 +02:00
Benjamin Trent	d7ffa7f8f7	[7.x][ML] Add graceful retry for anomaly detector result indexing failures(#49508 ) (#50145 ) * [ML] Add graceful retry for anomaly detector result indexing failures (#49508) All results indexing now retry the amount of times configured in `xpack.ml.persist_results_max_retries`. The retries are done in a semi-random, exponential backoff. * fixing test	2019-12-12 12:24:58 -05:00
Benjamin Trent	c043aa887f	[ML][Inference] Simplify inference processor options (#50105 ) (#50146 ) * [ML][Inference] Simplify inference processor options * addressing pr comments	2019-12-12 11:13:55 -05:00
David Roberts	13e47df97d	[TEST] Increase timeout for ML internal cluster cleanup (#50142 ) Closes #48511	2019-12-12 15:38:22 +00:00
David Kyle	7d4118dc4e	Enable trace logging in failing ml NetworkDisruptionIT https://github.com/elastic/elasticsearch/issues/49908	2019-12-12 11:16:01 +00:00
Dimitris Athanasiou	03ecaae221	[7.x][ML] Avoid classification integ test training on single class (#50072 ) (#50078 ) The `ClassificationIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet` test was previously set up to just have 10 rows. With `training_percent` of 50%, only 5 rows will be used for training. There is a good chance that all 5 rows will be of one class which results to failure. This commit increases the rows to 100. Now 50 rows should be used for training and the chance of failure should be very small. Backport of #50072	2019-12-11 18:50:26 +02:00

1 2 3 4 5 ...

608 Commits