Commit Graph

271 Commits

James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Lisa Cawley fb212269ce
[DOCS] Changes level offset of anomaly detection pages (#59911) (#59940) 2020-07-20 17:04:59 -07:00
Lisa Cawley 9633d503d8
[DOCS] Changes level offset for anomaly detection APIs (#59920) (#59928) 2020-07-20 13:10:54 -07:00
Lisa Cawley 8f8d24b3c1
[DOCS] Changes level offset in data frame analytics APIs (#59919) (#59923) 2020-07-20 13:06:29 -07:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false`, the current behavior is unchanged (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_num_threads`. The setting expects a positive integer.
When set, it specifies the maximum number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. The process may also use a few additional
threads for operational work that is not part of the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
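A rough sketch of how the setting might be supplied when creating a job, assuming a local cluster and hypothetical index and job names; the commit only guarantees that the value is a positive integer:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Hypothetical outlier detection job that caps the analysis at 4 threads.
config = {
    "source": {"index": "my-source-index"},
    "dest": {"index": "my-dest-index"},
    "analysis": {"outlier_detection": {}},
    "max_num_threads": 4,  # positive integer; actual threads also bounded by node processors
}
resp = requests.put(f"{ES}/_ml/data_frame/analytics/my-analytics-job", json=config)
print(resp.json())
```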
James Rodewig 6ed356ffc3
[DOCS] Replace `datatype` with `data type` (#58972) (#59184) 2020-07-07 14:59:35 -04:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
István Zoltán Szabó 13aa8b8d9a [DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 13:15:15 +02:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
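A sketch of how a client might react to the new fields, assuming a local cluster, a hypothetical job name, and the usual stats response shape with a `data_frame_analytics` array:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

stats = requests.get(f"{ES}/_ml/data_frame/analytics/my-analytics-job/_stats").json()
memory = stats["data_frame_analytics"][0].get("memory_usage", {})

# On hard_limit, the re-estimate suggests a new model_memory_limit for restarting the job.
if memory.get("status") == "hard_limit":
    print("Increase model_memory_limit to about",
          memory.get("memory_reestimate_bytes"), "bytes and restart the job")
```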
István Zoltán Szabó 3169e4c70e [DOCS] Updates screenshots in ML population analysis (#58318) 2020-06-23 09:05:08 +02:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache miss count is incremented.

When a user calls _stats, we will include the cache miss count summed across ALL nodes. This statistic is important when comparing against the inference_count: if the cache miss count is near the inference_count, it indicates that the cache is overburdened or inappropriately configured.
2020-06-19 09:46:51 -04:00
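A sketch of that comparison, assuming a local cluster, a hypothetical model id, and field names `cache_miss_count` and `inference_count` inside the per-model `inference_stats`:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

stats = requests.get(f"{ES}/_ml/inference/my-model/_stats").json()
inference = stats["trained_model_stats"][0].get("inference_stats", {})

misses = inference.get("cache_miss_count", 0)
count = inference.get("inference_count", 0)

# A miss count close to the inference count suggests the cache is overburdened
# or inappropriately configured.
if count and misses / count > 0.9:
    print("Most inferences miss the cache; consider enlarging the inference cache")
```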
Przemysław Witek 7a1300a09e
[7.x] Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) (#57815) 2020-06-08 17:41:12 +02:00
David Kyle 08d1286de7
[7.x] Delete expired data by job (#57337) (#57796)
Deleting expired data can take a long time, leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-08 13:00:23 +01:00
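A minimal sketch, assuming a local cluster, a hypothetical job id, and that the new job_id parameter is given as part of the path:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Clean up expired results for one problematic job instead of every job at once.
resp = requests.delete(f"{ES}/_ml/_delete_expired_data/my-large-job")
print(resp.json())
```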
David Roberts 1d64d55a86
[7.x][ML] Add per-partition categorization option (#57723)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.

Backport of #57683
2020-06-06 08:15:17 +01:00
Dimitris Athanasiou f49a14ce6f
[7.x][ML] Fix race condition when force stopping DF analytics job (#57680) (#57717)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it makes more sense to
try to stop the job gracefully first, so we now do that.
If a normal stop fails, we then resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.

Backport of #57680
2020-06-05 17:50:01 +03:00
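A sketch of a force delete with the new timeout, assuming a local cluster, a hypothetical job id, and that both options are accepted as query parameters:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Delete the analytics job, bounding how long the stop/delete child requests may take.
resp = requests.delete(
    f"{ES}/_ml/data_frame/analytics/my-analytics-job",
    params={"force": "true", "timeout": "1m"},
)
print(resp.json())
```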
Przemysław Witek 6b5f49d097
[7.x] Introduce ModelPlotConfig. annotations_enabled setting (#57539) (#57641) 2020-06-04 15:15:35 +02:00
Lisa Cawley db5bf92acf
[7.x][DOCS] Replace docdir attribute with es-repo-dir (#57489) (#57494) 2020-06-01 16:42:53 -07:00
Lisa Cawley a1514c9ffe
[DOCS] Replaces docdir attributes in ML APIs (#57390) (#57467) 2020-06-01 13:46:15 -07:00
Benjamin Trent 35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export`, to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
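A minimal sketch of the flag in use, assuming a local cluster, a hypothetical model id, and that `for_export` is passed as a query parameter:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Fetch the model in a form suitable for putting into another cluster.
resp = requests.get(
    f"{ES}/_ml/inference/my-model",
    params={"for_export": "true"},
)
print(resp.json())
```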
Benjamin Trent c8374dc9f3
[ML] add max_model_memory parameter to forecast request (#57254) (#57355)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value formatted as a byte size (e.g. "50mb", "150mb").

The default value is `20mb`.

There is a hard limit of `500mb`, which causes an error if exceeded.

If the limit is larger than 40% of the anomaly job's configured model memory limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
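A sketch of a forecast request with the new parameter, assuming a local cluster, a hypothetical anomaly detection job, and that `max_model_memory` is passed alongside `duration` as a query parameter:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Forecast three days ahead, allowing more than the default 20mb of forecast model
# memory but staying well under the 500mb hard limit.
resp = requests.post(
    f"{ES}/_ml/anomaly_detectors/my-job/_forecast",
    params={"duration": "3d", "max_model_memory": "150mb"},
)
print(resp.json())
```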
István Zoltán Szabó e1cab4feb4 [DOCS] Puts a link into the loss_function variable description (#56678) 2020-05-28 09:46:11 +02:00
István Zoltán Szabó 27f258711a [DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196) 2020-05-27 13:43:55 +02:00
István Zoltán Szabó 47bf95cee3 [DOCS] Improves navigation between forecast APIs and adds short description. (#57035) 2020-05-25 09:11:00 +02:00
István Zoltán Szabó 9b7356d6af [DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031) 2020-05-21 17:32:07 +02:00
Benjamin Trent 297f864884
[ML] relax throttling on expired data cleanup (#56711) (#56895)
Throttling nightly cleanup as much as we do has been overcautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
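A sketch of both knobs, assuming a local cluster and that the delete expired data API accepts its parameters in the request body:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Persistently throttle the nightly maintenance task's deletion requests.
requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"xpack.ml.nightly_maintenance_requests_per_second": 20.0}},
)

# Callers of the delete expired data API can set their own throttle and timeout.
resp = requests.delete(
    f"{ES}/_ml/_delete_expired_data",
    json={"requests_per_second": 10.0, "timeout": "2h"},
)
print(resp.json())
```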
David Roberts 4438115be0 [DOCS] Docs changes for overridden delimiter in find_file_structure (#56288)
Docs for #55735

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-14 09:25:21 +01:00
Lisa Cawley 1474606b18 [DOCS] Clarify model snapshot retention properties (#56477) 2020-05-11 07:43:10 -07:00
István Zoltán Szabó ebe1e4c4c4 [DOCS] Expands GET DFA stats API docs with new phases (#56407)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-11 09:26:15 +02:00
David Roberts 7aa0daaabd
[7.x][ML] More advanced model snapshot retention options (#56194)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days`, all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of #56125
2020-05-05 14:31:58 +01:00
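A sketch of setting both retention properties on an existing job, assuming a local cluster, a hypothetical job id, and that both fields are accepted by the job update API:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Keep every snapshot from the last day, one snapshot per day up to 10 days,
# and nothing older (except snapshots explicitly marked with retain=true).
update = {
    "model_snapshot_retention_days": 10,
    "daily_model_snapshot_retention_after_days": 1,
}
resp = requests.post(f"{ES}/_ml/anomaly_detectors/my-job/_update", json=update)
print(resp.json())
```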
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
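A sketch of a regression analysis using the new parameters, assuming a local cluster, hypothetical index and job names, and an assumed loss function name:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

config = {
    "source": {"index": "house-prices"},
    "dest": {"index": "house-prices-predictions"},
    "analysis": {
        "regression": {
            "dependent_variable": "price",
            "loss_function": "huber",          # name assumed; check the regression docs
            "loss_function_parameter": 1.0,
        }
    },
}
resp = requests.put(f"{ES}/_ml/data_frame/analytics/price-regression", json=config)
print(resp.json())
```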
István Zoltán Szabó 9bcc975bd1 [DOCS] Simplifies footnote text in DFA APIs (#56105)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-05 09:05:08 +02:00
Lisa Cawley b816ab0c18
[DOCS] Synchs and links hyperparameter descriptions (#56131) 2020-05-04 10:37:26 -07:00
Lisa Cawley 006e00ed0a
[DOCS] Adds documentation for secondary authorization headers (#55365) (#55986) 2020-04-29 16:29:38 -07:00
István Zoltán Szabó e982cf4381 [DOCS] Makes the footnotes less verbose in configuring aggs page. (#55857) 2020-04-29 09:52:29 +02:00
István Zoltán Szabó a5cf4712e5 [DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:03:43 +02:00
David Roberts 3ba44a5af8
[ML] Adding failed_category_count to model_size_stats (#55761)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Backport of #55716
2020-04-25 10:36:49 +01:00
Lisa Cawley 314ca78e31
[7.x][DOCS] Update example and nesting in get data frame analytics job stats API (#55612) 2020-04-22 10:58:26 -07:00
David Roberts 2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
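A sketch of how a UI or script might read the new field, assuming a local cluster and the `limits` object described above:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

limits = requests.get(f"{ES}/_ml/info").json().get("limits", {})

# Warn before creating a job whose model_memory_limit no current node could host.
print("Configured max_model_memory_limit:", limits.get("max_model_memory_limit"))
print("Effective max in this cluster:", limits.get("effective_max_model_memory_limit"))
```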
David Roberts da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily,
the node field will contain an empty string. Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
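A sketch of checking for lazy assignment after opening a job, assuming a local cluster and a hypothetical job id:

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

resp = requests.post(f"{ES}/_ml/anomaly_detectors/my-job/_open").json()

# An empty "node" means the job was opened lazily and is not yet assigned to a node.
node = resp.get("node", "")
print(f"Assigned to node {node}" if node else "Opened lazily; waiting for capacity")
```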
István Zoltán Szabó 0ce3406033 [DOCS] Provides further details on aggregations in datafeeds (#55462)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-04-22 08:54:52 +02:00
Benjamin Trent 24d41eb695
[ML] partitions model definitions into chunks (#55260) (#55484)
This paves the way in the data layer for exceptionally large models to be partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With a chunk size of `16 * 1024 * 1024` bytes, its compressed string could be partitioned into 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
2020-04-20 16:08:54 -04:00
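A back-of-the-envelope check of the figures quoted above (16MB chunks, a 100-document limit, and a ~66MB compressed example model):

```python
import math

CHUNK_SIZE = 16 * 1024 * 1024            # bytes per definition document
compressed_model = 66 * 1024 * 1024      # the ~66MB compressed example model

print(math.ceil(compressed_model / CHUNK_SIZE))   # -> 5 documents

# The 100-document limit therefore covers roughly 1.6GB of compressed definition,
# more than 20x the example model.
print(100 * CHUNK_SIZE / (1024 ** 3))
```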
Lisa Cawley c7cf6e621d [DOCS] Remove text fields from classification dependent variables (#54849) 2020-04-16 13:40:28 -07:00