OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Roberts	1cefafdd14	[ML] Add new categorization stats to model_size_stats (#52009 ) This change adds support for the following new model_size_stats fields: - categorized_doc_count - total_category_count - frequent_category_count - rare_category_count - dead_category_count - categorization_status Backport of #51879	2020-02-10 09:10:50 +00:00
David Kyle	8f10a7c6ca	[ML] Make Ensemble feature names optional (#51996 ) The featureNames field is requisite in individual models but is not required by the Ensemble.	2020-02-07 10:08:37 +00:00
Darren LaCasse	480e9238a4	[DOCS] Remove extra word (#51757 )	2020-01-31 10:30:06 -08:00
István Zoltán Szabó	dfc9f2330c	[DOCS] Adds PUT inference API docs (#51231 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-31 13:13:34 +01:00
Lisa Cawley	1a40ebfa67	[DOCS] Adds missing testenv attribute (#51719 )	2020-01-30 16:15:17 -08:00
David Roberts	550254ec7f	[ML] Use CSV ingest processor in find_file_structure ingest pipeline (#51492 ) Changes the find_file_structure response to include a CSV ingest processor in the ingest pipeline it suggests. Previously the Kibana file upload functionality parsed CSV in the browser, but parsing CSV in the ingest pipeline makes the Kibana file upload functionality more easily interchangeable with Filebeat, so that the configurations it creates can more easily be used to import data with the same structure repeatedly in production.	2020-01-28 14:38:43 +00:00
Benjamin Trent	76660a5a4f	[7.x] [ML][Inference] add tags url param to GET (#51330 ) (#51404 ) Adds a new URL parameter, `tags`, to the GET _ml/inference/<model_id> endpoint. This parameter allows the list of models to be further reduced to those that contain all the provided tags.	2020-01-24 08:26:58 -05:00
István Zoltán Szabó	8bdf654cc7	[DOCS] Refines description. (#51400 )	2020-01-24 13:34:25 +01:00
Lisa Cawley	ec47698f7c	[DOCS] Updates categorization examples with wizard screenshots (#51133 )	2020-01-22 11:28:17 -08:00
Lisa Cawley	4590d4156a	[DOCS] Clarify interval, frequency, and bucket span in ML APIs and example (#51280 )	2020-01-22 08:15:46 -08:00
David Kyle	ca4b90a001	[ML] Calculate results and snapshot retention using latest bucket timestamps (#51061 ) (#51301 ) The retention period is calculated relative to the last bucket result or snapshot time rather than wall clock	2020-01-22 14:52:33 +00:00
István Zoltán Szabó	83c92cf7eb	[DOCS] Adds text about data types to the categorization docs (#51145 )	2020-01-17 10:00:20 -08:00
Dimitris Athanasiou	b70ebdeb96	[7.x][ML] DF Analytics _explain API should skip object fields (#51115 ) (#51147 ) Object fields cannot be used as features. At the moment the _explain API includes them and, even worse, it does not error when an object field is excluded. This creates the expectation for the user that all child fields will also be excluded, which is not the case. This commit omits object fields from the _explain API and also adds an error if an object field is included or excluded. Backport of #51115	2020-01-17 14:02:59 +02:00
Christoph Büscher	d291f189a8	Fix hardcoded version replacement in put-dfanalytics.asciidoc #51053 The version replacement for the code snippet should replace 7.6 with the current version, but doesn't match because of a missing whitespace. Closes #51052	2020-01-15 18:09:37 +01:00
Przemysław Witek	b4a631277a	Add missing docs for new evaluation metrics (#50967 ) (#51041 )	2020-01-15 15:53:42 +01:00
István Zoltán Szabó	b570f417c2	[DOCS] Describes the relationship of the time-related settings in anomaly detection docs (#50959 ) Co-Authored-By: David Roberts <dave.roberts@elastic.co>	2020-01-15 08:46:04 +01:00
Dimitris Athanasiou	1d8cb3c741	[7.x][ML] Add num_top_feature_importance_values param to regression and classi… (#50914 ) (#50976 ) Adds a new parameter to regression and classification that enables computation of importance for the top most important features. The computation of the importance is based on SHAP (SHapley Additive exPlanations) method. Backport of #50914	2020-01-14 16:46:09 +02:00
lcawl	8a5de4f56f	[DOCS] Clarify detector_index property in ML APIs (#50723 )	2020-01-09 08:34:34 -08:00
István Zoltán Szabó	4f150e4961	[7.x][DOCS] Moves analysis resources to PUT DFA API docs (#50793 )	2020-01-09 16:21:35 +01:00
István Zoltán Szabó	71afeec7d0	Revert "[DOCS] Moves analysis resources to PUT DFA API docs (#50704 )" This reverts commit `4e1107d5d7`.	2020-01-09 14:31:35 +01:00
István Zoltán Szabó	4e1107d5d7	[DOCS] Moves analysis resources to PUT DFA API docs (#50704 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-09 14:13:37 +01:00
István Zoltán Szabó	acd73dda1c	[DOCS] Improves find_file_structure documentation (#50743 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-09 11:20:29 +01:00
István Zoltán Szabó	0ac6786f41	[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2020-01-09 10:45:18 +01:00
István Zoltán Szabó	8a1bb440e2	[DOCS] Clarifies model_size_stats.total_xxx_field_count objects and removes notes in GET job stats API docs. (#50728 )	2020-01-09 09:45:37 +01:00
István Zoltán Szabó	d7bb5d7531	[DOCS] Improves description for forecast_stats (#50729 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2020-01-09 09:35:47 +01:00
Lisa Cawley	62969c35cd	[DOCS] Adds missing timing_stats descriptions (#50574 )	2020-01-03 09:14:09 -08:00
István Zoltán Szabó	0bcbddecf8	[DOCS] Fine-tunes training_percent definition. (#50601 )	2020-01-03 14:51:03 +01:00
Dimitris Athanasiou	ca0828ba07	[7.x][ML] Implement force deleting a data frame analytics job (#50553 ) (#50589 ) Adds a `force` parameter to the delete data frame analytics request. When `force` is `true`, the action force-stops the jobs and then proceeds to the deletion. This can be used in order to delete a non-stopped job with a single request. Closes #48124 Backport of #50553	2020-01-03 13:46:02 +02:00
István Zoltán Szabó	a34b3f133c	[DOCS] Specifies the possible data types of classification dependent_variable (#50582 )	2020-01-03 10:42:56 +01:00
Lisa Cawley	81a9cff16f	[7.x][DOCS] Remove redundant results from ML APIs (#50565 )	2020-01-02 11:23:26 -08:00
Lisa Cawley	ab5a69d1e2	[7.x][DOCS] Move machine learning results definitions into APIs (#50543 )	2019-12-31 13:21:17 -08:00
Lisa Cawley	f8eef43fc6	[7.x][DOCS] Move model snapshot resource definitions into APIs (#50540 )	2019-12-31 10:53:05 -08:00
Lisa Cawley	3fb4f1b5bf	[DOCS] Moves job count resource definitions into API (#50529 )	2019-12-30 14:55:36 -08:00
Lisa Cawley	4b829db593	[7.x][DOCS] Move datafeed resource definitions into APIs (#50516 )	2019-12-30 09:35:16 -08:00
Lisa Cawley	72840c0cb2	[7.x][DOCS] Move anomaly detection job resource definitions into APIs (#50490 )	2019-12-27 13:30:26 -08:00
James Rodewig	ef467cc6f5	[DOCS] Remove unneeded redirects (#50476 ) The docs/reference/redirects.asciidoc file stores a list of relocated or deleted pages for the Elasticsearch Reference documentation. This prunes several older redirects that are no longer needed and don't require work to fix broken links in other repositories.	2019-12-26 08:29:28 -05:00
Lisa Cawley	d479e0563a	[7.x][DOCS] Augments ML shared definitions (#50487 )	2019-12-24 10:22:05 -08:00
Orhan Toy	6a3d1a077e	[DOCS] Fixes "enables you to" typos (#50225 )	2019-12-23 14:39:14 -05:00
Lisa Cawley	2106a7b02a	[7.x][DOCS] Updates ML links (#50387 ) (#50409 )	2019-12-20 10:01:19 -08:00
lcawl	246d926412	[DOCS] Fixes security links	2019-12-18 11:52:00 -08:00
István Zoltán Szabó	5759a263cb	[DOCS] Adds GET, GET stats and DELETE inference APIs (#50224 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2019-12-18 09:18:56 +01:00
István Zoltán Szabó	8f36bfa37f	[7.x][DOCS] Changes hyperparam optimization section ID (#50173 )	2019-12-13 12:22:50 +01:00
István Zoltán Szabó	7611b3c9be	[7.x][DOCS] Moves data frame analytics job resource definitions into APIs (#50165 ) * [7.x][DOCS] Moves data frame analytics job resource definitions into APIs.	2019-12-13 11:48:21 +01:00
Dimitris Athanasiou	8891f4db88	[7.x][ML] Introduce randomize_seed setting for regression and classification (#49990 ) (#50023 ) This adds a new `randomize_seed` for regression and classification. When not explicitly set, the seed is randomly generated. One can reuse the seed in a similar job in order to ensure the same docs are picked for training. Backport of #49990	2019-12-10 15:29:19 +02:00
István Zoltán Szabó	63d3933787	[DOCS] Fixes classification evaluation example response. (#49905 )	2019-12-06 13:25:40 +01:00
István Zoltán Szabó	f4b3bb7d6b	[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831 )	2019-12-05 14:16:38 +01:00
István Zoltán Szabó	04e99ff1ee	[DOCS] Fixes typo in the ML anomaly detection time functions docs. (#49834 )	2019-12-05 09:58:30 +01:00
Dimitris Athanasiou	4edb2e7bb6	[7.x][ML] Add optional source filtering during data frame reindexing (#49690 ) (#49718 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531 Backport of #49690	2019-11-29 16:10:44 +02:00
lcawl	777431265b	[DOCS] Fixes typo in ML resources	2019-11-26 10:28:59 -08:00
lcawl	a42003b95b	[DOCS] Fixes data type formatting	2019-11-26 08:22:50 -08:00
David Roberts	62811c2272	[ML] Add default categorization analyzer definition to ML info (#49545 ) The categorization job wizard in the ML UI will use this information when showing the effect of the chosen categorization analyzer on a sample of input.	2019-11-25 13:39:16 +00:00
Dimitris Athanasiou	d21df9eba9	[ML][DOCS] Anomaly detection job retention days settings do not require restart (#49546 )	2019-11-25 14:19:10 +01:00
Dimitris Athanasiou	8eaee7cbdc	[7.x][ML] Explain data frame analytics API (#49455 ) (#49504 ) This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455	2019-11-22 22:06:10 +02:00
Lisa Cawley	97cdfd2848	[DOCS] Clarify ML job closure prerequisites (#49265 )	2019-11-19 08:36:50 -08:00
David Roberts	698ebd3d0a	[TEST] Mute docs snippet test in close-job.asciidoc (#49000 ) Due to https://github.com/elastic/elasticsearch/pull/48583#issuecomment-552991325	2019-11-12 17:34:27 +00:00
Benjamin Trent	46ab1db54f	[7.x] [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050 ) (#48958 ) Related PR: https://github.com/elastic/ml-cpp/pull/809 * adjusting bwc version	2019-11-11 15:43:03 -05:00
István Zoltán Szabó	c2f52015d3	[DOCS] Removes best practice about fields that are highly correlated to the dependent variable. (#48935 )	2019-11-11 16:01:21 +01:00
István Zoltán Szabó	91888959e8	[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307 )	2019-11-11 15:55:12 +01:00
István Zoltán Szabó	3c9bd13dca	[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241 )	2019-11-06 07:41:38 -05:00
István Zoltán Szabó	70765dfb05	[DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657 )	2019-11-06 07:38:33 -05:00
Lisa Cawley	13ce179706	[DOCS] Re-enable code snippet testing in close anomaly detection job API (#48259 ) (#48585 )	2019-10-28 08:42:09 -07:00
David Roberts	984323783e	[ML][7.x] Add lazy assignment job config option (#47993 ) This change adds: - A new option, allow_lazy_open, to anomaly detection jobs - A new option, allow_lazy_start, to data frame analytics jobs Both work in the same way: they allow a job to be opened/started even if no ML node exists that can accommodate the job immediately. In this situation the job waits in the opening/starting state until ML node capacity is available. (The starting state for data frame analytics jobs is new in this change.) Additionally, the ML nightly maintenance task now creates audit warnings for ML jobs that are unassigned. This means that jobs that cannot be assigned to an ML node for a very long time will show a yellow warning triangle in the UI. A final change is that it is now possible to close a job that is not assigned to a node without using force. This is because previously jobs that were open but not assigned to a node were an aberration, whereas after this change they'll be relatively common.	2019-10-15 06:55:11 +01:00
David Roberts	1ca25bed38	[ML][7.x] Add option to stop datafeed that finds no data (#47995 ) Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data. Backport of #47922	2019-10-14 17:19:13 +01:00
István Zoltán Szabó	9eac8bf2a8	[DOCS] Adds supported fields section to the PUT DFA API description (#47842 )	2019-10-10 12:42:54 +02:00
István Zoltán Szabó	6f4b7e9a7f	[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791 )	2019-10-09 18:14:58 +02:00
Lisa Cawley	39ef795085	[DOCS] Cleans up links to security content (#47610 ) (#47703 )	2019-10-07 15:23:19 -07:00
Dimitris Athanasiou	7667ea5f6f	[7.x][ML] Additional outlier detection parameters (#47600 ) (#47669 ) Adds the following parameters to `outlier_detection`: - `compute_feature_influence` (boolean): whether to compute or not feature influence scores - `outlier_fraction` (double): the proportion of the data set assumed to be outlying prior to running outlier detection - `standardization_enabled` (boolean): whether to apply standardization to the feature values Backport of #47600	2019-10-07 18:21:33 +03:00
István Zoltán Szabó	a57cb5843f	[DOCS] Fixes an attribute in the update datafeed API docs. (#47551 )	2019-10-04 08:46:16 +02:00
István Zoltán Szabó	9dcdbea228	[DOCS] Amends update datafeed API docs (#47448 )	2019-10-03 13:13:18 +02:00
István Zoltán Szabó	033aa9cf9b	[DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs (#46966 ) * [DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs. * [DOCS] Removes extra lines from examples. * Update docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc Co-Authored-By: Lisa Cawley <lcawley@elastic.co> * Update docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc Co-Authored-By: Lisa Cawley <lcawley@elastic.co> * [DOCS] Explains examples.	2019-10-02 10:33:45 +02:00
István Zoltán Szabó	1cecbd0cd3	[DOCS] Fine tunes update anomaly detection job API documentation (#47280 ) * [DOCS] Fine tunes update anomaly detection job API documentation. * [DOCS] Removes delimiter to fix the table.	2019-10-02 10:06:49 +02:00
István Zoltán Szabó	6a9f04ee76	[DOCS] Fixes typos in the PUT dfa and the evaluate dfa documentation. (#47348 )	2019-10-02 09:52:29 +02:00
István Zoltán Szabó	170b102ab5	[DOCS] Changes wording to move away from data frame terminology in the ES repo (#47093 ) * [DOCS] Changes wording to move away from data frame terminology in the ES repo. Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2019-10-01 08:08:17 +02:00
István Zoltán Szabó	3be51fbdf7	[DOCS] Adds regression analytics resources and examples to the data frame analytics APIs and the evaluation API (#46176 ) * [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs. Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com> Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>	2019-09-19 09:23:18 +02:00
István Zoltán Szabó	fe8f33a8e1	[DOCS] Adds outlier detection params to the data frame analytics resources (#46323 ) * [DOCS] Adds outlier detection params to the data frame analytics resources. Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com> Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2019-09-16 14:23:23 +02:00
James Rodewig	e253ee6ba6	[DOCS] Change // CONSOLE comments to [source,console] (#46440 ) (#46494 )	2019-09-09 12:35:50 -04:00
James Rodewig	f04573f8e8	[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449 ) (#46459 )	2019-09-06 16:09:09 -04:00
James Rodewig	c46c57d439	[DOCS] Change // CONSOLE comments to [source,console] (#46441 ) (#46451 )	2019-09-06 11:31:13 -04:00
James Rodewig	bb7bff5e30	[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295 ) (#46418 )	2019-09-06 09:22:08 -04:00
István Zoltán Szabó	8208ffa666	[DOCS] Adds progress parameter description to the GET stats data frame analytics API doc. (#46434 )	2019-09-06 15:18:57 +02:00
Benjamin Trent	d0c5573a51	[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044 ) (#46096 ) Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors that are difficult to troubleshoot. This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.	2019-08-30 09:27:07 -05:00
István Zoltán Szabó	a75348d1fb	[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649 ) * [DOCS] [PUT DFA] Documents inline the child params of source and dest. * [DOCS] Fixes indentation issues and amends dfa definitions.	2019-08-29 15:09:02 +02:00
Dimitris Athanasiou	dd6c13fdf9	[ML] Add description to DF analytics (#45774 ) (#46019 )	2019-08-27 15:48:59 +03:00
Dimitris Athanasiou	be554fe5f0	[7.x][ML] Improve progress reportings for DF analytics (#45856 ) (#45910 ) Previously, the stats API reported a progress percentage for DF analytics tasks that are running and are in the `reindexing` or `analyzing` state. This means that when the task is `stopped` there is no progress reported. Thus, one cannot distinguish a task that never ran from one that completed. In addition, there are blind spots in the progress reporting. In particular, we do not account for when data is loaded into the process. We also do not account for when results are written. This commit addresses the above issues. It changes progress to being a list of objects, each one describing the phase and its progress as a percentage. We currently have 4 phases: reindexing, loading_data, analyzing, writing_results. When the task stops, progress is persisted as a document in the state index. The stats API now reports progress from in-memory if the task is running, or returns the persisted document (if there is one).	2019-08-23 23:04:39 +03:00
Przemysław Witek	7512337922	[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775 ) (#45825 )	2019-08-22 11:14:26 +02:00
Przemysław Witek	5faa012fd6	[7.x] Add docs for HLRC for Estimate memory usage API (#45538 ) (#45783 )	2019-08-21 14:27:36 +02:00
Przemysław Witek	df574e5168	[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188 ) (#45510 )	2019-08-14 08:26:03 +02:00
István Zoltán Szabó	276e9c6697	[DOCS] Adds supported time units ref to the ML and DF API params. (#45322 )	2019-08-08 14:26:19 +02:00
Lisa Cawley	7c9c9a9cc4	[DOCS] Reformats ML update APIs (#45253 )	2019-08-06 11:16:29 -07:00
István Zoltán Szabó	dae648eb32	[DOCS] Makes clearer the note under freq_rare. (#45193 )	2019-08-05 13:29:43 +02:00
James Rodewig	8dd74dfe0b	Rename "indices APIs" to "index APIs" (#44863 )	2019-08-02 14:10:09 -04:00
Lisa Cawley	09bd6c4692	[DOCS] Clarifies bucket span in overall buckets API (#45110 )	2019-08-02 08:42:39 -07:00
Lisa Cawley	e4b7ae211b	[DOCS] Updates terms in machine learning get APIs (#44986 )	2019-07-30 11:29:25 -07:00
István Zoltán Szabó	19426f9cdf	[DOCS] Adds allow no jobs param to the GET, GET stats and Close APIs (#44503 )	2019-07-30 14:27:27 +02:00
Lisa Cawley	a041d1eacf	[DOCS] Updates anomaly detection terminology (#44888 )	2019-07-26 11:10:49 -07:00
Lisa Cawley	cef375f883	[DOCS] Updates terms in machine learning datafeed APIs (#44883 )	2019-07-26 10:48:28 -07:00
István Zoltán Szabó	cd7ba9f302	[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806 ) This PR addresses the feedback in https://github.com/elastic/ml-team/issues/175#issuecomment-512215731. * Adds an example to `analyzed_fields` * Includes `source` and `dest` objects inline in the resource page * Lists `model_memory_limit` in the PUT API page * Amends the `analysis` section in the resource page * Removes Properties headings in subsections	2019-07-26 11:52:43 +02:00
Lisa Cawley	21971feae8	[DOCS] Updates terms in machine learning calendar APIs (#44866 )	2019-07-25 11:50:34 -07:00
Lisa Cawley	a79adca7e3	[DOCS] Updates terms in anomaly detection job APIs (#44839 )	2019-07-25 09:06:52 -07:00
István Zoltán Szabó	4a31c426e6	[DOCS] Adds allow no datafeeds query param to the GET, GET stats and STOP datafeed APIs (#44499 )	2019-07-25 17:08:01 +02:00
James Rodewig	d46545f729	[DOCS] Update anchors and links for Elasticsearch API relocation (#44500 )	2019-07-19 09:18:23 -04:00
Lisa Cawley	8445c41004	[DOCS] Moves content to ML anomaly-detection folder (#44520 ) (#44530 )	2019-07-18 08:44:52 -07:00
Lisa Cawley	213af8411f	[DOCS] Fixes query default value (#44572 )	2019-07-18 08:18:58 -07:00
Lisa Cawley	53514b0477	[DOCS] Separates data frame analytics APIs (#44451 )	2019-07-16 13:33:23 -07:00
James Rodewig	ac07eef86c	[DOCS] Remove :edit_url: overrides. (#44445 ) These overrides do not work in Asciidoctor and are no longer needed.	2019-07-16 15:04:44 -04:00
Lisa Cawley	e7ea49e32f	[DOCS] Removes unnecessary resource definition pages (#44289 )	2019-07-15 10:03:53 -07:00
David Kyle	2382701547	Wait for pending tasks in docs tests cleanup (#44123 ) ML and Data Frame tests should wait for pending tasks	2019-07-15 12:04:27 +01:00
Lisa Cawley	8fdcf28fac	[DOCS] Reformats API parameter details (#44194 )	2019-07-12 08:28:49 -07:00
Lisa Cawley	4d8bf1c3e3	[DOCS] Removes links to ML tutorial (#44251 )	2019-07-12 08:28:36 -07:00
István Zoltán Szabó	2171b6b47f	[DOCS] Adds data frame analytics API and evaluate API resource documentation (#43972 ) This PR adds the resource documentation of the data frame analytics APIs and the evaluate API to the ML API doc pool.	2019-07-11 18:12:48 +02:00
lcawl	4e6cbc2890	[DOCS] Fixes formatting in data frame analytics API	2019-07-10 18:01:47 -07:00
Przemysław Witek	44781e415e	[7.x] [ML] Add DatafeedTimingStats to datafeed GetDatafeedStatsAction.Response (#43045 ) (#44118 )	2019-07-10 11:51:44 +02:00
David Kyle	23d7e309da	Mute put job docs test Relates to #43271	2019-07-09 13:23:31 +01:00
lcawl	cd4021274a	[DOCS] Enables testing for create job ML API (#44022 )	2019-07-08 11:43:18 -07:00
Lisa Cawley	117f14e0ed	[DOCS] Updates 7.x version in data frame analytics API (#44026 )	2019-07-08 11:20:57 -07:00
Lisa Cawley	efddbcc1d1	[DOCS] Fixes earliest_record_timestamp data type (#44030 )	2019-07-08 10:16:07 -07:00
lcawl	a831d4707c	[DOCS] Temporarily disables data frame API testing	2019-07-05 10:56:09 -07:00
István Zoltán Szabó	7242267f5d	[DOCS] Adds data frame analytics APIs to the ML APIs (#43875 ) This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.	2019-07-05 14:25:54 +02:00
Lisa Cawley	1b7bcdc3a0	[DOCS] Adds data frame API response codes for allow_no_match (#43666 )	2019-06-27 15:17:58 -07:00
Lisa Cawley	42cb59f7b4	[DOCS] Updates ML APIs to use new API template (#43711 )	2019-06-27 15:05:51 -07:00
lcawl	d46e2bb26a	[DOCS] Adds anchors and attributes to ML APIs	2019-06-27 09:44:56 -07:00
Matthew Adams	0bcadbf846	Clarify storage location of ML Snapshots (#43437 ) The existing language was misleading about the model snapshots and where they are located. Saying "to disk" sounds like files external to Elasticsearch IMO. It raises the obvious question, where on disk? which node? Is it in the Elasticsearch snapshot repo? The model snapshots are held in an internal index.	2019-06-24 09:14:12 +01:00
Przemysław Witek	b2613a123d	[7.x] Report exponential_avg_bucket_processing_time which gives more weight to recent buckets (#43189 ) (#43263 )	2019-06-17 08:58:26 +02:00
lcawl	8a341a3ea5	[DOCS] Fix link to ML node description	2019-06-13 13:56:06 -07:00
Ryan Ernst	c3ce3f6891	Add native code info to ML info api (#43172 ) The machine learning feature of xpack has native binaries with a different commit id than the rest of the code. It is currently exposed in the xpack info api. This commit adds that commit information to the ML info api, so that it may be removed from the info api.	2019-06-13 11:38:58 -07:00
Benjamin Trent	79052050bf	[ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds (#42969 ) (#43069 ) * [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds * only supporting doc_values for geo_point fields * moving validation into GeoPointField ctor	2019-06-10 21:52:53 -05:00
David Roberts	b202a59f88	[ML] Add earliest and latest timestamps to field stats (#42890 ) This change adds the earliest and latest timestamps into the field stats for fields of type "date" in the output of the ML find_file_structure endpoint. This will enable the cards for date fields in the file data visualizer in the UI to be made to look more similar to the cards for date fields in the index data visualizer in the UI.	2019-06-06 08:58:35 +01:00
David Roberts	b61202b0a8	[ML] Add a limit on line merging in find_file_structure (#42501 ) When analysing a semi-structured text file the find_file_structure endpoint merges lines to form multi-line messages using the assumption that the first line in each message contains the timestamp. However, if the timestamp is misdetected then this can lead to excessive numbers of lines being merged to form massive messages. This commit adds a line_merge_size_limit setting (default 10000 characters) that halts the analysis if a message bigger than this is created. This prevents significant CPU time being spent subsequently trying to determine the internal structure of the huge bogus messages.	2019-06-03 13:45:51 +01:00
Benjamin Trent	d06618a70d	[ML] adding delayed_data_check_config to datafeed update docs (#42095 ) (#42626 ) * [ML] adding delayed_data_check_config to datafeed update docs * [DOCS] Edits delayed data configuration details	2019-05-28 11:36:30 -04:00
David Roberts	f472186b9f	[ML] Improve file structure finder timestamp format determination (#41948 ) This change contains a major refactoring of the timestamp format determination code used by the ML find file structure endpoint. Previously timestamp format determination was done separately for each piece of text supplied to the timestamp format finder. This had the drawback that it was not possible to distinguish dd/MM and MM/dd in the case where both numbers were 12 or less. In order to do this sensibly it is best to look across all the available timestamps and see if one of the numbers is greater than 12 in any of them. This necessitates making the timestamp format finder an instantiable class that can accumulate evidence over time. Another problem with the previous approach was that it was only possible to override the timestamp format to one of a limited set of timestamp formats. There was no way out if a file to be analysed had a timestamp that was sane yet not in the supported set. This is now changed to allow any timestamp format that can be parsed by a combination of these Java date/time formats: yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss, a, XX, XXX, zzz Additionally S letter groups (fractional seconds) are supported providing they occur after ss and separated from the ss by a dot, comma or colon. Spacing and punctuation is also permitted with the exception of the question mark, newline and carriage return characters, together with literal text enclosed in single quotes. The full list of changes/improvements in this refactor is: - Make TimestampFormatFinder an instantiable class - Overrides must be specified in Java date/time format - Joda format is no longer accepted - Joda timestamp formats in outputs are now derived from the determined or overridden Java timestamp formats, not stored separately - Functionality for determining the "best" timestamp format in a set of lines has been moved from TextLogFileStructureFinder to TimestampFormatFinder, taking advantage of the fact that TimestampFormatFinder is now an instantiable class with state - The functionality to quickly rule out some possible Grok patterns when looking for timestamp formats has been changed from using simple regular expressions to the much faster approach of using the Shift-And method of sub-string search, but using an "alphabet" consisting of just 1 (representing any digit) and 0 (representing non-digits) - Timestamp format overrides are now much more flexible - Timestamp format overrides that do not correspond to a built-in Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern whose definition is included within the date processor in the ingest pipeline - Grok patterns that correspond to multiple Java date/time patterns are now handled better - the Grok pattern is accepted as matching broadly, and the required set of Java date/time patterns is built up considering all observed samples - As a result of the more flexible acceptance of Grok patterns, when looking for the "best" timestamp in a set of lines timestamps are considered different if they are preceded by a different sequence of punctuation characters (to prevent timestamps far into some lines being considered similar to timestamps near the beginning of other lines) - Out-of-the-box Grok patterns that are considered now include %{DATE} and %{DATESTAMP}, which have indeterminate day/month ordering - The order of day/month in formats with indeterminate day/month order is determined by considering all observed samples (plus the server locale if the observed samples still do not suggest an ordering) Relates #38086 Closes #35137 Closes #35132	2019-05-24 09:10:08 +01:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906 ) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc) or fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to a confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (i.e. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will throw a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and java clients.	2019-05-20 12:07:29 -04:00
James Rodewig	005296dac6	[DOCS] Allow attribute substitution in titleabbrevs for Asciidoctor migration (#41574 ) * [DOCS] Replace attributes in titleabbrevs for Asciidoctor migration * [DOCS] Add [subs="attributes"] so attributes render in Asciidoctor * Revert "[DOCS] Replace attributes in titleabbrevs for Asciidoctor migration" This reverts commit 98f130257a7c71e9f6cddf5157af7886418338d8. * [DOCS] Fix merge conflict	2019-04-30 13:46:45 -04:00
David Roberts	cbe7d335ff	[DOCS] Use "source" instead of "inline" in ML docs (#40635 ) Specifying an inline script in an "inline" field was deprecated in 5.x. The new field name is "source". (Since 6.x still accepts "inline" I will only backport this docs change as far as 7.0.)	2019-03-29 17:30:28 +00:00
David Kyle	48788269b0	[ML] Correct small inconsistencies in ml APIs spec and docs (#39907 )	2019-03-11 14:02:50 +00:00
David Roberts	5f8f91c03b	[ML] Use scaling thread pool and xpack.ml.max_open_jobs cluster-wide dynamic (#39736 ) This change does the following: 1. Makes the per-node setting xpack.ml.max_open_jobs into a cluster-wide dynamic setting 2. Changes the job node selection to continue to use the per-node attributes storing the maximum number of open jobs if any node in the cluster is older than 7.1, and use the dynamic cluster-wide setting if all nodes are on 7.1 or later 3. Changes the docs to reflect this 4. Changes the thread pools for native process communication from fixed size to scaling, to support the dynamic nature of xpack.ml.max_open_jobs 5. Renames the autodetect thread pool to the job comms thread pool to make clear that it will be used for other types of ML jobs (data frame analytics in particular) Backport of #39320	2019-03-06 12:29:34 +00:00
Tal Levy	92756288b4	relax ML Info Docs expected response (#38993 ) The get-ml-info API documentation tested that the response showed ML's `upgrade_mode` as false. This may not be guaranteed, possibly because other tests run in parallel or do not clean up after themselves. Since the actual value here is not of importance, this commit relaxes the requirement that upgrade_mode be static.	2019-02-15 16:31:01 -08:00
Alexander Reelsen	8e5e48319e	Add documentation about breaking java time changes (#38886 ) In addition remove joda time mentions across the docs, make sure links are updated to java time javadocs. Forward port of #38720	2019-02-14 10:18:12 +01:00
David Roberts	02f57b1e29	[DOCS] Add warning about bypassing ML PUT APIs (#38605 ) Now that ML configurations are stored in the .ml-config index rather than in cluster state there is a possibility that some users may try to add configurations directly to the index. Allowing this creates a variety of problems including possible data exfiltration attacks (depending on how security is set up), so this commit adds warnings against allowing writes to the .ml-config index other than via the ML APIs. Backport of #38509	2019-02-08 11:35:37 +00:00
David Roberts	1fa413a16d	[ML] Remove "8" prefixes from file structure finder timestamp formats (#38016 ) In 7.x Java timestamp formats are the default timestamp format and there is no need to prefix them with "8". (The "8" prefix was used in 6.7 to distinguish Java timestamp formats from Joda timestamp formats.) This change removes the "8" prefixes from timestamp formats in the output of the ML file structure finder.	2019-02-01 15:36:04 +00:00
Benjamin Trent	8280a20664	ML: Add upgrade mode docs, hlrc, and fix bug (#37942 ) * ML: Add upgrade mode docs, hlrc, and fix bug * [DOCS] Fixes build error and edits text * adjusting docs * Update docs/reference/ml/apis/set-upgrade-mode.asciidoc Co-Authored-By: benwtrent <ben.w.trent@gmail.com> * Update set-upgrade-mode.asciidoc * Update set-upgrade-mode.asciidoc	2019-01-30 06:51:11 -06:00
Lisa Cawley	19529da2db	[DOCS] Delayed data annotations (#37939 )	2019-01-28 13:04:38 -08:00
Benjamin Trent	7e4c0e6991	ML: Adds set_upgrade_mode API endpoint (#37837 ) * ML: Add MlMetadata.upgrade_mode and API * Adding tests * Adding wait conditionals for the upgrade_mode call to return * Adding tests * adjusting format and tests * Adjusting wait conditions for api return and msgs * adjusting doc tests * adding upgrade mode tests to black list	2019-01-28 09:07:30 -06:00
David Roberts	f2c0c26d15	[ML] Adjust structure finder for Joda to Java time migration (#37306 ) The ML file structure finder has always reported both Joda and Java time format strings. This change makes the Java time format strings the ones that are incorporated into mappings and ingest pipeline definitions. The BWC syntax of prepending "8" to these formats is used. This will need to be removed once Java time format strings become the default in Elasticsearch. This commit also removes direct imports of Joda classes in the structure finder unit tests. Instead the core Joda BWC class is used.	2019-01-26 20:19:57 +00:00
Christoph Büscher	34f2d2ec91	Remove remaining occurrences of "include_type_name=true" in docs (#37646 )	2019-01-22 15:13:52 +01:00
David Kyle	0ae7f8630c	Document ml datafeed Id limitations (#37653 )	2019-01-21 14:12:20 +00:00
Benjamin Trent	12cdf1cba4	ML: Add support for single bucket aggs in Datafeeds (#37544 ) Single bucket aggs are now supported in datafeed aggregation configurations.	2019-01-18 15:08:53 -06:00
Lisa Cawley	6dcb3af4c8	[DOCS] Adds size limitation to the get datafeeds APIs (#37578 )	2019-01-17 10:47:15 -08:00
Lisa Cawley	a2d9c464b2	[DOCS] Adds limitation to the get jobs API (#37549 )	2019-01-17 08:21:37 -08:00
Julie Tibshirani	36a3b84fc9	Update the default for include_type_name to false. (#37285 ) * Default include_type_name to false for get and put mappings. * Default include_type_name to false for get field mappings. * Add a constant for the default include_type_name value. * Default include_type_name to false for get and put index templates. * Default include_type_name to false for create index. * Update create index calls in REST documentation to use include_type_name=true. * Some minor clean-ups around the get index API. * In REST tests, use include_type_name=true by default for index creation. * Make sure to use 'expression == false'. * Clarify the different IndexTemplateMetaData toXContent methods. * Fix FullClusterRestartIT#testSnapshotRestore. * Fix the ml_anomalies_default_mappings test. * Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests. We make sure to specify include_type_name=true during xContent parsing, so we continue to test the legacy typed responses. XContent generation for the typeless responses is currently only covered by REST tests, but we will be adding unit test coverage for these as we implement each typeless API in the Java HLRC. This commit also refactors GetMappingsResponse to follow the same approach as the other mappings-related responses, where we read include_type_name out of the xContent params, instead of creating a second toXContent method. This gives better consistency in the response parsing code. * Fix more REST tests. * Improve some wording in the create index documentation. * Add a note about types removal in the create index docs. * Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL. * Make sure to mention include_type_name in the REST docs for affected APIs. * Make sure to use 'expression == false' in FullClusterRestartIT. * Mention include_type_name in the REST templates docs.	2019-01-14 13:08:01 -08:00
lcawl	2d5a8ec59d	[DOCS] Remove unused screenshots	2019-01-10 11:10:25 -08:00
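
Note on the TimestampFormatFinder refactor listed above (2019-05-24): the commit message describes ruling out candidate Grok patterns with the Shift-And method of sub-string search over an "alphabet" of just 1 (any digit) and 0 (any non-digit). The following is a minimal Python sketch of that general idea only. It is not the Elasticsearch implementation; the function names `digit_shape` and `shift_and_find` are hypothetical, and the sketch assumes ASCII digits.

```python
def digit_shape(text: str) -> str:
    """Map each character to '1' (ASCII digit) or '0' (anything else)."""
    return "".join("1" if "0" <= ch <= "9" else "0" for ch in text)


def shift_and_find(shape_pattern: str, text: str) -> int:
    """Return the first index in `text` whose digit/non-digit shape matches
    `shape_pattern` (a string of '1' and '0'), or -1 if there is none.

    Uses the Shift-And (bitap) technique: bit i of `state` is set when the
    first i + 1 pattern symbols match the text ending at the current position.
    """
    m = len(shape_pattern)
    if m == 0:
        return 0
    # One bit mask per alphabet symbol: bit i is set if pattern[i] == symbol.
    masks = {"0": 0, "1": 0}
    for i, symbol in enumerate(shape_pattern):
        masks[symbol] |= 1 << i
    accept = 1 << (m - 1)  # set when the whole pattern has matched
    state = 0
    for pos, ch in enumerate(text):
        symbol = "1" if "0" <= ch <= "9" else "0"
        # Extend every partial match by one symbol and start a new one.
        state = ((state << 1) | 1) & masks[symbol]
        if state & accept:
            return pos - m + 1
    return -1


# Hypothetical usage: derive a shape from an example timestamp, then check
# quickly whether a log line could contain a timestamp of that shape.
shape = digit_shape("2019-05-24 09:10:08")
candidate_start = shift_and_find(shape, "INFO 2019-05-24 09:10:08 started")
```

A shape match only says that a line could contain a timestamp of that form. In this sketch a fuller check, such as a Grok match, would still have to confirm it, while a miss rules the candidate out cheaply.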

279 Commits