OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alpar Torok	7119e54be5	Mute data frame tests on 7.x Tracking in #45610 #45609	2019-08-15 17:07:53 +03:00
Benjamin Trent	0c343d8443	[7.x] [ML][Transforms] adjusting stats.progress for cont. transforms (#45361) (#45551) * [ML][Transforms] adjusting stats.progress for cont. transforms (#45361) * [ML][Transforms] adjusting stats.progress for cont. transforms * addressing PR comments * rename fix * Adjusting bwc serialization versions	2019-08-14 13:08:27 -05:00
David Roberts	14545f8958	[ML-DataFrame] Combine task_state and indexer_state in _stats (#45324) This commit replaces task_state and indexer_state in the data frame _stats output with a single top level state that combines the two. It is defined as: - failed if what's currently reported as task_state is failed - stopped if there is no persistent task - Otherwise what's currently reported as indexer_state Backport of #45276	2019-08-08 16:24:26 +01:00
Benjamin Trent	3a71b91dca	[ML][Data Frame] add support for geo_bounds aggregation (#44441) (#45281) This adds support for `geo_bounds` aggregation inside the `pivot.aggregations` configuration. The two points returned from the `geo_bounds` aggregation are transformed into a `geo_shape` whose type depends on how similar the two points are: * `point` if the two points are identical * `linestring` if the two points share either a latitude or longitude * `polygon` if the two points are completely different The automatically deduced mapping for the resulting field is a `geo_shape`.	2019-08-07 10:37:09 -05:00
Benjamin Trent	7bfaba98c2	[ML][Data Frame] cleaning up and adjusting failure tests (#45101) (#45144)	2019-08-05 09:12:11 -05:00
Benjamin Trent	22feedf289	[ML][Data Frame] add support for bucket_selector (#44718) (#45008)	2019-07-30 11:32:58 -05:00
David Roberts	caf9411a72	[ML] Improve response format of data frame stats endpoint (#44743) This change adjusts the data frame transforms stats endpoint to return a structure that is easier to understand. This is a breaking change for clients of the data frame transforms stats endpoint, but the feature is in beta so stability is not guaranteed. Backport of #44350	2019-07-23 18:00:50 +01:00
Benjamin Trent	6f53865fde	[ML][Data Frame] Fixes failure state tests and failure setting handling (#44645) (#44698) * [ML][Data Frame] fixing flaky test * adjusting frequency * fixing tests * addressing PR comments	2019-07-23 08:33:12 -05:00
Benjamin Trent	a948362d0a	[7.x] [ML][Data Frame] deregister scheduler on transform failure (#44569) (#44576) * [ML][Data Frame] deregister scheduler on transform failure (#44569) * fixing test * Update DataFrameRestTestCase.java * Update DataFrameTaskFailedStateIT.java * Update DataFramePivotRestIT.java	2019-07-22 09:06:48 -05:00
Benjamin Trent	3477f5ae04	muting test testBulkIndexFailuresCauseTaskToFail (#44594)	2019-07-18 15:03:50 -05:00
Benjamin Trent	858dbfc074	[ML][Data Frame] treat bulk index failures as an indexing failure (#44351) (#44427) * [ML][Data Frame] treat bulk index failures as an indexing failure * removing redundant public modifier * changing to an ElasticsearchException * fixing redundant public modifier	2019-07-16 10:04:28 -05:00
Ryan Ernst	7e06888bae	Convert testclusters to use distro download plugin (#44253) (#44362) Test clusters currently has its own set of logic for dealing with finding different versions of Elasticsearch, downloading them, and extracting them. This commit converts testclusters to use the DistributionDownloadPlugin.	2019-07-15 17:53:05 -07:00
Benjamin Trent	68cd675892	[ML][Data Frame] responding with 409 status code when failing _stop (#44231) (#44276) * [ML][Data Frame] responding with appropriate status code when failing _stop * adding null checks for persistent task data * addressing PR comments	2019-07-12 10:10:24 -05:00
David Roberts	cb62d4acdf	[ML-DataFrame] Add a frequency option to transform config, default 1m (#44120) Previously a data frame transform would check whether the source index was changed every 10 seconds. Sometimes it may be desirable for the check to be done less frequently. This commit increases the default to 60 seconds but also allows the frequency to be overridden by a setting in the data frame transform config.	2019-07-10 09:59:00 +01:00
Benjamin Trent	2c97e26ce8	[ML][Data Frame] fix progress measurement for continuous transforms (#43838) (#43887) * [ML][Data Frame] fix progress measurement for continuous transforms * Update DataFrameIndexer.java	2019-07-02 19:35:09 -05:00
Benjamin Trent	8108834534	[ML][Data Frame] account for delay in writing stats docs (#43703) (#43819)	2019-07-01 09:14:44 -05:00
Benjamin Trent	67a3c656c3	[7.x] [ML][Data Frame] removing format support (#43659) (#43747) * [ML][Data Frame] removing format support (#43659) * Fixing conflicts	2019-06-28 10:02:37 -05:00
Benjamin Trent	d05593c3ad	[ML][Data Frame] adds tests for continuous DF (#43601) (#43654)	2019-06-26 14:59:19 -05:00
Dimitris Athanasiou	126c2fd2d5	[7.x][ML] Machine learning data frame analytics (#43544) (#43592) This merges the initial work that adds a framework for performing machine learning analytics on data frames. The feature is currently experimental and requires a platinum license. Note that the original commits can be found in the `feature-ml-data-frame-analytics` branch. A new set of APIs is added which allows the creation of data frame analytics jobs. Configuration allows specifying different types of analysis to be performed on a data frame. At first there is support for outlier detection. The APIs are: - PUT _ml/data_frame/analysis/{id} - GET _ml/data_frame/analysis/{id} - GET _ml/data_frame/analysis/{id}/_stats - POST _ml/data_frame/analysis/{id}/_start - POST _ml/data_frame/analysis/{id}/_stop - DELETE _ml/data_frame/analysis/{id} When a data frame analytics job is started, a persistent task is created and started. The main steps of the task are: 1. reindex the source index into the dest index 2. analyze the data through the data_frame_analyzer C++ process 3. merge the results of the process back into the destination index In addition, an evaluation API is added which packages commonly used metrics that provide evaluation of various analyses: - POST _ml/data_frame/_evaluate	2019-06-25 20:29:11 +03:00
Benjamin Trent	b333ced5a7	[7.x] [ML][Data Frame] adds new pipeline field to dest config (#43124) (#43388) * [ML][Data Frame] adds new pipeline field to dest config (#43124) * [ML][Data Frame] adds new pipeline field to dest config * Adding pipeline support to _preview * removing unused import * moving towards extracting _source from pipeline simulation * fixing permission requirement, adding _index entry to doc * adjusting for java 8 compatibility * adjusting bwc serialization version to 7.3.0	2019-06-19 16:18:27 -05:00
Alpar Torok	cce5b0f018	Convert dataframes to use testclusters (#43032)	2019-06-14 11:02:39 +03:00
Benjamin Trent	aff4795441	[ML][Data Frame] cleaning up tests since tasks are cancelled on finish (#43136) (#43166) * [ML][Data Frame] cleaning up usage test since tasks are cancelled on finish * Update DataFrameUsageIT.java * Fixing additional test, waiting for task to complete * removing unused import * unmuting test	2019-06-12 14:39:38 -05:00
David Kyle	597ae5c7b8	[ML DataFrame] Reject Data Frame Ids containing upper case characters (#43145)	2019-06-12 18:13:18 +01:00
Benjamin Trent	e384bf0276	[ML-DataFrame] stop task at completion of data frame function (#42955) (#43114) * stop data frame task after it finishes * test auto stop * adapt tests * persist the state correctly and move stop into listener * Calling `onStop` even if persistence fails, changing `stop` to rely on doSaveState	2019-06-11 15:55:02 -05:00
Benjamin Trent	755ba72896	[ML][Data frame] make sure that fields exist when creating progress (#42943) (#42984)	2019-06-07 10:13:18 -05:00
Mark Vieira	e44b8b1e2e	[Backport] Remove dependency substitutions 7.x (#42866) * Remove unnecessary usage of Gradle dependency substitution rules (#42773) (cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)	2019-06-04 13:50:23 -07:00
Benjamin Trent	0253927ec4	[ML Data Frame] Refactor stop logic (#42644) (#42763) * Revert "invalid test" This reverts commit 9dd8b52c13c716918ff97e6527aaf43aefc4695d. * Testing * mend * Revert "[ML Data Frame] Mute Data Frame tests" This reverts commit 5d837fa312b0e41a77a65462667a2d92d1114567. * Call onStop and onAbort outside atomic update * Don’t update CS * Tidying up * Remove invalid test that asserted logic that has been removed * Add stopped event * Revert "Add stopped event" This reverts commit 02ba992f4818bebd838e1c7678bd2e1cc090bfab. * Adding check for STOPPED in saveState	2019-06-03 06:53:44 -05:00
Benjamin Trent	f22dcfb9da	[ML] [Data Frame] nesting group_by fields like other aggs (#42718) (#42760)	2019-05-31 10:55:35 -05:00
Benjamin Trent	b5527b3278	[ML] [Data Frame] add support for weighted_avg agg (#42646) (#42714)	2019-05-30 12:05:35 -05:00
Hendrik Muhs	345ff21ae5	[ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589) rewrite start and stop to answer with acknowledged fixes #42450	2019-05-29 11:14:32 +02:00
Hendrik Muhs	6d47ee9268	[ML-DataFrame] add support for fixed_interval, calendar_interval, remove interval (#42427) * add support for fixed_interval, calendar_interval, remove interval * adapt HLRC * checkstyle * add a hlrc to server test * adapt yml test * improve naming and doc * improve interface and add test code for hlrc to server * address review comments * repair merge conflict * fix date patterns * address review comments * remove assert for warning * improve exception message * use constants	2019-05-24 20:30:17 +02:00
Hendrik Muhs	7cee294acf	[ML-DataFrame] backport dataframe changes from #42202, using client instead of transport (#42468) backport dataframe changes from #42202, using client instead of transport	2019-05-24 11:05:30 +02:00
David Kyle	f696769a39	Mute Data Frame integration tests Relates to https://github.com/elastic/elasticsearch/issues/42344	2019-05-22 15:03:13 +01:00
David Kyle	7e4d3c695b	[ML Data Frame] Persist and restore checkpoint and position (#41942) Persist and restore the data frame's current checkpoint and position	2019-05-21 18:57:13 +01:00
David Kyle	24144aead2	[ML] Complete the Data Frame task on stop (#41752) (#42063) Wait for the indexer to stop, then complete the persistent task on stop. If wait_for_completion is true, the request will not return until the task has stopped.	2019-05-21 10:24:20 +01:00
Zachary Tong	6ae6f57d39	[7.x Backport] Force selection of calendar or fixed intervals (#41906) The date_histogram accepts an interval which can be either a calendar interval (DST-aware, leap seconds, arbitrary length of months, etc.) or a fixed interval (strict multiples of SI units). Unfortunately this is inferred by first trying to parse as a calendar interval, then falling back to fixed if that fails. This leads to a confusing arrangement where `1d` == calendar, but `2d` == fixed. And if you want a day of fixed time, you have to specify `24h` (i.e. the next smallest unit). This arrangement is very error-prone for users. This PR adds `calendar_interval` and `fixed_interval` parameters to any code that uses intervals (date_histogram, rollup, composite, datafeed, etc.). Calendar only accepts calendar intervals, fixed accepts any combination of units (meaning `1d` can be used to specify `24h` in fixed time), and both are mutually exclusive. The old interval behavior is deprecated and will emit a deprecation warning. It is also mutually exclusive with the two new parameters. In the future the old dual-purpose interval will be removed. The change applies to both REST and Java clients.	2019-05-20 12:07:29 -04:00
Benjamin Trent	f2447364fd	[ML] adds geo_centroid aggregation support to data frames (#42088) (#42094)	2019-05-17 16:51:05 -04:00
Benjamin Trent	febee07dcc	[ML] adding pivot.max_search_page_size option for setting paging size (#41920) (#42079) * [ML] adding pivot.size option for setting paging size * Changing field name to address PR comments * fixing ctor usage * adjust hlrc for field name change	2019-05-10 13:22:31 -05:00
Hendrik Muhs	0c03707704	[ML-DataFrame] reset/clear the position after indexer is done (#41736) reset/clear the position after indexer is done	2019-05-06 09:41:51 +02:00
Benjamin Trent	a70f796edd	[ML] fix array oob in IDGenerator and adjust format for mapping (#41703) (#41717) * [ML] fix array oob in IDGenerator and adjust format for mapping * Update DataFramePivotRestIT.java	2019-05-02 11:09:42 -05:00
Benjamin Trent	92a820bc1a	[ML] Add bucket_script agg support to data frames (#41594) (#41639)	2019-04-29 10:14:17 -05:00
Benjamin Trent	a0990ca239	[ML] cleanup + adding description field to transforms (#41554) (#41605) * [ML] cleanup + adding description field to transforms * making description length have a max of 1k	2019-04-26 16:50:59 -05:00
Benjamin Trent	08843ba62b	[ML] Adds progress reporting for transforms (#41278) (#41529) * [ML] Adds progress reporting for transforms * fixing after master merge * Addressing PR comments * removing unused imports * Adjusting afterKey handling and percentage to be 100* * Making sure it is a linked hashmap for serialization * removing unused import * addressing PR comments * removing unused import * simplifying code, only storing total docs and decrementing * adjusting for rewrite * removing initial progress gathering from executor	2019-04-25 11:23:12 -05:00
Benjamin Trent	e2f8ffdde8	[ML][Data Frame] Moving destination creation to _start (#41416) (#41433) * [ML][Data Frame] Moving destination creation to _start * slight refactor of DataFrameAuditor constructor	2019-04-23 09:32:57 -05:00
Hendrik Muhs	02247cc7df	[ML-DataFrame] adapt page size on circuit breaker responses (#41149) handle circuit breaker response and adapt page size to reduce memory pressure, reduce preview buckets to 100, initial page size to 500	2019-04-16 19:49:43 +02:00
Benjamin Trent	9e32e36799	[ML] fixing test related to #40963 (#41074) (#41116)	2019-04-11 11:19:56 -05:00
Hendrik Muhs	f9018ab11b	[ML-DataFrame] create checkpoints on every new run (#40725) Use the checkpoint service to create a checkpoint on every new run. Expose checkpoint stats on the _stats endpoint.	2019-04-10 09:14:11 +02:00
Julie Tibshirani	0702c72151	Mute DataFrameGetAndGetStatsIT#testGetPersistedStatsWithoutTask. Tracked in #40963.	2019-04-09 16:39:16 -07:00
Benjamin Trent	a8dbb07546	[ML] Changes default destination index field mapping and adds scripted_metric agg (#40750) (#40846) * [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg * Making dynamic|source mapping explicit	2019-04-05 11:34:20 -05:00
Benjamin Trent	945e7ca01e	[ML] Periodically persist data-frame running statistics to internal index (#40650) (#40729) * [ML] Add mappings, serialization, and hooks to persist stats * Adding tests for transforms without tasks having stats persisted * intermittent commit * Adjusting usage stats to account for stored stats docs * Adding tests for id expander * Addressing PR comments * removing unused import * adding shard failures to the task response	2019-04-02 14:16:55 -05:00

67 Commits