OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-18 19:05:06 +00:00

Author	SHA1	Message	Date
István Zoltán Szabó	3169e4c70e	[DOCS] Updates screenshots in ML population analysis (#58318 )	2020-06-23 09:05:08 +02:00
David Kyle	08d1286de7	[7.x] Delete expired data by job (#57337 ) (#57796 ) Deleting expired data can take a long time leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.	2020-06-08 13:00:23 +01:00
David Roberts	1d64d55a86	[7.x][ML] Add per-partition categorization option (#57723 ) This PR adds the initial Java side changes to enable use of the per-partition categorization functionality added in elastic/ml-cpp#1293. There will be a followup change to complete the work, as there cannot be any end-to-end integration tests until elastic/ml-cpp#1293 is merged, and also elastic/ml-cpp#1293 does not implement some of the more peripheral functionality, like stop_on_warn and per-partition stats documents. The changes so far cover REST APIs, results object formats, HLRC and docs. Backport of #57683	2020-06-06 08:15:17 +01:00
Przemysław Witek	6b5f49d097	[7.x] Introduce ModelPlotConfig. annotations_enabled setting (#57539 ) (#57641 )	2020-06-04 15:15:35 +02:00
Lisa Cawley	db5bf92acf	[7.x][DOCS] Replace docdir attribute with es-repo-dir (#57489 ) (#57494 )	2020-06-01 16:42:53 -07:00
Lisa Cawley	a1514c9ffe	[DOCS] Replaces docdir attributes in ML APIs (#57390 ) (#57467 )	2020-06-01 13:46:15 -07:00
Benjamin Trent	c8374dc9f3	[ML] add max_model_memory parameter to forecast request (#57254 ) (#57355 ) This adds a max_model_memory setting to forecast requests. This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb"). The default value is `20mb`. There is a HARD limit at `500mb` which will throw an error if used. If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited. related native change: https://github.com/elastic/ml-cpp/pull/1238 closes: https://github.com/elastic/elasticsearch/issues/56420	2020-05-29 11:16:08 -04:00
István Zoltán Szabó	47bf95cee3	[DOCS] Improves navigation between forecast APIs and adds short description. (#57035 )	2020-05-25 09:11:00 +02:00
István Zoltán Szabó	9b7356d6af	[DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031 )	2020-05-21 17:32:07 +02:00
Benjamin Trent	297f864884	[ML] relax throttling on expired data cleanup (#56711 ) (#56895 ) Throttling nightly cleanup as much as we do has been over cautious. Night cleanup should be more lenient in its throttling. We still keep the same batch size, but now the requests per second scale with the number of data nodes. If we have more than 5 data nodes, we don't throttle at all. Additionally, the API now has `requests_per_second` and `timeout` set. So users calling the API directly can set the throttling. This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`. This will allow users to adjust throttling of the nightly maintenance.	2020-05-18 08:46:42 -04:00
David Roberts	4438115be0	[DOCS] Docs changes for overridden delimiter in find_file_structure (#56288 ) Docs for #55735 Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-05-14 09:25:21 +01:00
David Roberts	7aa0daaabd	[7.x][ML] More advanced model snapshot retention options (#56194 ) This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Backport of #56125	2020-05-05 14:31:58 +01:00
Lisa Cawley	006e00ed0a	[DOCS] Adds documentation for secondary authorization headers (#55365 ) (#55986 )	2020-04-29 16:29:38 -07:00
István Zoltán Szabó	e982cf4381	[DOCS] Makes the footnotes less verbose in configuring aggs page. (#55857 )	2020-04-29 09:52:29 +02:00
David Roberts	3ba44a5af8	[ML] Adding failed_category_count to model_size_stats (#55761 ) The failed_category_count statistic records the number of times categorization wanted to create a new category but couldn't because the job had reached its model_memory_limit. Backport of #55716	2020-04-25 10:36:49 +01:00
Lisa Cawley	314ca78e31	[7.x][DOCS] Update example and nesting in get data frame analytics job stats API (#55612 )	2020-04-22 10:58:26 -07:00
David Roberts	2dc5586afe	[ML] Add effective max model memory limit to ML info (#55581 ) The ML info endpoint returns the max_model_memory_limit setting if one is configured. However, it is still possible to create a job that cannot run anywhere in the current cluster because no node in the cluster has enough memory to accommodate it. This change adds an extra piece of information, limits.effective_max_model_memory_limit, to the ML info response that returns the biggest model memory limit that could be run in the current cluster assuming no other jobs were running. The idea is that the ML UI will be able to warn users who try to create jobs with higher model memory limits that their jobs will not be able to start unless they add a bigger ML node to their cluster. Backport of #55529	2020-04-22 12:28:50 +01:00
David Roberts	da5aeb8be7	[ML] Return assigned node in start/open job/datafeed response (#55570 ) Adds a "node" field to the response from the following endpoints: 1. Open anomaly detection job 2. Start datafeed 3. Start data frame analytics job If the job or datafeed is assigned to a node immediately then this field will return the ID of that node. In the case where a job or datafeed is opened or started lazily the node field will contain an empty string. Clients that want to test whether a job or datafeed was opened or started lazily can therefore check for this. Backport of #55473	2020-04-22 12:06:53 +01:00
István Zoltán Szabó	0ce3406033	[DOCS] Provides further details on aggregations in datafeeds (#55462 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-04-22 08:54:52 +02:00
István Zoltán Szabó	3a3effedc2	[DOCS] Reworks some parts of EMM API docs (#54872 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-04-08 10:20:34 +02:00
Benjamin Trent	65233383f6	[7.x] [ML] prefer secondary authorization header for data[feed\|frame] authz (#54121 ) (#54645 ) * [ML] prefer secondary authorization header for data[feed\|frame] authz (#54121) Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds. Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided). closes https://github.com/elastic/elasticsearch/issues/53801 * fixing for backport	2020-04-02 11:20:25 -04:00
Benjamin Trent	eb31be0e71	[7.x] [ML] add num_matches and preferred_to_categories to category defintion objects (#54214 ) (#54639 ) * [ML] add num_matches and preferred_to_categories to category defintion objects (#54214) This adds two new fields to category definitions. - `num_matches` indicating how many documents have been seen by this category - `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized. These fields are only guaranteed to be up to date after a `_flush` or `_close` native change: https://github.com/elastic/ml-cpp/pull/1062 * adjusting for backport	2020-04-02 09:09:19 -04:00
István Zoltán Szabó	27f88fcdac	[DOCS] Updates estimate model memory docs (#54574 )	2020-04-01 15:55:25 +02:00
Jason Tedor	5fcda57b37	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 17:24:38 -04:00
Lisa Cawley	0fa1060ca4	[DOCS] Collapses content in machine learning APIs (#54234 ) (#54453 )	2020-03-30 11:06:33 -07:00
David Roberts	7667004b20	[ML] Add a model memory estimation endpoint for anomaly detection (#54129 ) A new endpoint for estimating anomaly detection job model memory requirements: POST _ml/anomaly_detectors/estimate_model_memory Backport of #53507	2020-03-24 22:55:11 +00:00
István Zoltán Szabó	53f7e31462	[DOCS] Fixes typo in start datafeed API docs. (#53811 )	2020-03-19 17:56:40 +01:00
István Zoltán Szabó	00203c35fe	[DOCS] Changes seconds to milliseconds since the Epoch in AD docs. (#53797 )	2020-03-19 15:42:58 +01:00
István Zoltán Szabó	aafc0409a9	[DOCS] Makes the description clearer on how to use aggregations in an anomaly detection job (#53103 ) Co-authored-by: lcawl <lcawley@elastic.co>	2020-03-09 09:49:59 +01:00
István Zoltán Szabó	bf3dcd4229	[DOCS] Adds deleting flag to the GET job stats API docs (#53223 )	2020-03-06 16:04:20 +01:00
Lisa Cawley	4fbe1b0550	[DOCS] Adds cat anomaly detectors API (#52866 ) (#52970 )	2020-03-02 07:28:55 -08:00
Benjamin Trent	eac38e9847	[ML] Add indices_options to datafeed config and update (#52793 ) (#52905 ) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 13:43:25 -05:00
Lisa Cawley	b788ec7157	[DOCS] Adds cat datafeeds API (#52738 )	2020-02-26 09:28:57 -08:00
Lisa Cawley	924f0bd243	[DOCS] Updates custom rules example (#52731 )	2020-02-25 09:32:52 -08:00
David Roberts	cf122d13b8	[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720 ) This is because beat.timezone was renamed to event.timezone in elastic/beats#9458	2020-02-25 12:33:53 +00:00
lcawl	c6e35b460e	[DOCS] Adds anchor for custom rules	2020-02-24 11:39:15 -08:00
David Roberts	1cefafdd14	[ML] Add new categorization stats to model_size_stats (#52009 ) This change adds support for the following new model_size_stats fields: - categorized_doc_count - total_category_count - frequent_category_count - rare_category_count - dead_category_count - categorization_status Backport of #51879	2020-02-10 09:10:50 +00:00
Darren LaCasse	480e9238a4	[DOCS] Remove extra word (#51757 )	2020-01-31 10:30:06 -08:00
Lisa Cawley	1a40ebfa67	[DOCS] Adds missing testenv attribute (#51719 )	2020-01-30 16:15:17 -08:00
David Roberts	550254ec7f	[ML] Use CSV ingest processor in find_file_structure ingest pipeline (#51492 ) Changes the find_file_structure response to include a CSV ingest processor in the ingest pipeline it suggests. Previously the Kibana file upload functionality parsed CSV in the browser, but by parsing CSV in the ingest pipeline it makes the Kibana file upload functionality more easily interchangable with Filebeat such that the configurations it creates can more easily be used to import data with the same structure repeatedly in production.	2020-01-28 14:38:43 +00:00
Lisa Cawley	ec47698f7c	[DOCS] Updates categorization examples with wizard screenshots (#51133 )	2020-01-22 11:28:17 -08:00
Lisa Cawley	4590d4156a	[DOCS] Clarify interval, frequency, and bucket span in ML APIs and example (#51280 )	2020-01-22 08:15:46 -08:00
István Zoltán Szabó	83c92cf7eb	[DOCS] Adds text about data types to the categorization docs (#51145 )	2020-01-17 10:00:20 -08:00
István Zoltán Szabó	b570f417c2	[DOCS] Describes the relationship of the time-related settings in anomaly detection docs (#50959 ) Co-Authored-By: David Roberts <dave.roberts@elastic.co>	2020-01-15 08:46:04 +01:00
lcawl	8a5de4f56f	[DOCS] Clarify detector_index property in ML APIs (#50723 )	2020-01-09 08:34:34 -08:00
István Zoltán Szabó	acd73dda1c	[DOCS] Improves find_file_structure documentation (#50743 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-09 11:20:29 +01:00
István Zoltán Szabó	8a1bb440e2	[DOCS] Clarifies model_size_stats.total_xxx_field_count objects and removes notes in GET job stats API docs. (#50728 )	2020-01-09 09:45:37 +01:00
István Zoltán Szabó	d7bb5d7531	[DOCS] Improves description for forecast_stats (#50729 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2020-01-09 09:35:47 +01:00
Lisa Cawley	62969c35cd	[DOCS] Adds missing timing_stats descriptions (#50574 )	2020-01-03 09:14:09 -08:00
Lisa Cawley	81a9cff16f	[7.x][DOCS] Remove redundant results from ML APIs (#50565 )	2020-01-02 11:23:26 -08:00

1 2

91 Commits