86 Commits

Author SHA1 Message Date
Benjamin Trent
f22dcfb9da
[ML] [Data Frame] nesting group_by fields like other aggs (#42718) (#42760) 2019-05-31 10:55:35 -05:00
Jason Tedor
371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Benjamin Trent
b5527b3278
[ML] [Data Frame] add support for weighted_avg agg (#42646) (#42714) 2019-05-30 12:05:35 -05:00
David Kyle
c5a410f68b [ML Data Frame] Set DF task state when stopping (#42516)
Set the state to stopped prior to persisting
2019-05-29 16:39:44 +01:00
Hendrik Muhs
345ff21ae5 [ML-DataFrame] rewrite start and stop to answer with acknowledged (#42589)
rewrite start and stop to answer with acknowledged

fixes #42450
2019-05-29 11:14:32 +02:00
David Kyle
aea600fe7d [Ml Data Frame] Return bad_request on preview when config is invalid (#42447) 2019-05-28 15:36:50 +01:00
Hendrik Muhs
6d47ee9268 [ML-DataFrame] add support for fixed_interval, calendar_interval, remove interval (#42427)
* add support for fixed_interval, calendar_interval, remove interval

* adapt HLRC

* checkstyle

* add a hlrc to server test

* adapt yml test

* improve naming and doc

* improve interface and add test code for hlrc to server

* address review comments

* repair merge conflict

* fix date patterns

* address review comments

* remove assert for warning

* improve exception message

* use constants
2019-05-24 20:30:17 +02:00
Hendrik Muhs
7cee294acf
[ML-DataFrame]backport dataframe changes from 42202, using client instead of transport (#42468)
backport dataframe changes from #42202, using client instead of transport
2019-05-24 11:05:30 +02:00
David Kyle
075cc7c5cf [ML Data Frame] Persist data frame after state changes (#42347) 2019-05-22 15:40:40 +01:00
David Kyle
f696769a39 Mute Data Frame integration tests
Relates to https://github.com/elastic/elasticsearch/issues/42344
2019-05-22 15:03:13 +01:00
David Kyle
7e4d3c695b [ML Data Frame] Persist and restore checkpoint and position (#41942)
Persist and restore Data frame's current checkpoint and position
2019-05-21 18:57:13 +01:00
Dimitris Athanasiou
a4e6fb4dd2
[ML] Fix logger declaration in ML plugins (#42222) (#42238)
This corrects what appears to have been a copy-paste error
where the logger for `MachineLearning` and `DataFrame` was wrongly
set to be that of `XPackPlugin`.
2019-05-21 18:03:24 +03:00
David Kyle
0fd42ce1f5
[ML Data Frame] Start directly data frame rather than via the scheduler (#42224)
Trigger indexer start directly to put the indexer in INDEXING state immediately
2019-05-21 15:48:45 +01:00
David Kyle
24144aead2
[ML] Complete the Data Frame task on stop (#41752) (#42063)
Wait for indexer to stop then complete the persistent task on stop.
If the wait_for_completion is true the request will not return until stopped.
2019-05-21 10:24:20 +01:00
Zachary Tong
6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
2019-05-20 12:07:29 -04:00
Benjamin Trent
f2447364fd
[ML] adds geo_centroid aggregation support to data frames (#42088) (#42094) 2019-05-17 16:51:05 -04:00
Benjamin Trent
febee07dcc
[ML] adding pivot.max_search_page_size option for setting paging size (#41920) (#42079)
* [ML] adding pivot.size option for setting paging size

* Changing field name to address PR comments

* fixing ctor usage

* adjust hlrc for field name change
2019-05-10 13:22:31 -05:00
Benjamin Trent
0931815355
[ML] properly nesting objects in document source (#41901) (#42077)
* [ML] properly nesting objects in document source

* Throw exception on agg extraction failure, cause it to fail df

* throwing error to stop df if unsupported agg is found
2019-05-10 13:22:12 -05:00
David Kyle
ba9d2ccc1f [ML Data Frame] Set executing nodes in task actions (#41798)
Direct the task request to the node executing the task and also refactor the task responses
so all errors are returned and set the HTTP status code based on presence of errors.
2019-05-08 12:25:36 +01:00
Ryan Ernst
6fd8924c5a Switch run task to use real distro (#41590)
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
2019-05-06 12:34:07 -07:00
Benjamin Trent
50fc27e9a0
[ML] addresses preview bug, and adds check to PUT (#41803) (#41850) 2019-05-06 10:56:26 -05:00
Hendrik Muhs
d54a921032 remove unused import 2019-05-06 10:14:35 +02:00
Hendrik Muhs
0c03707704 [ML-DataFrame] reset/clear the position after indexer is done (#41736)
reset/clear the position after indexer is done
2019-05-06 09:41:51 +02:00
Benjamin Trent
b69e28177b
[ML] rewriting stats gathering to use callbacks instead of a latch (#41793) (#41804) 2019-05-03 18:18:27 -05:00
Hendrik Muhs
00af42fefe move checkpoints into x-pack core and introduce base classes for data frame tests (#41783)
move checkpoints into x-pack core and introduce base classes for data frame tests
2019-05-03 14:16:25 +02:00
Hendrik Muhs
befe2a45b9 [ML-DataFrame] refactor pivot to only take the pivot config (#41763)
refactor pivot class to only take the config at construction, other parameters are passed in as part of
method that require them
2019-05-03 13:37:51 +02:00
Benjamin Trent
33b4032fab
[ML] Correct indexer state on task re-allocation (#41724) (#41751) 2019-05-02 12:01:59 -05:00
Benjamin Trent
a70f796edd
[ML] fix array oob in IDGenerator and adjust format for mapping (#41703) (#41717)
* [ML] fix array oob in IDGenerator and adjust format for mapping

* Update DataFramePivotRestIT.java
2019-05-02 11:09:42 -05:00
Hendrik Muhs
be7ec5a47a simplify indexer by moving members to base class (#41741)
simplify indexer by moving members to base class
2019-05-02 16:08:08 +02:00
Benjamin Trent
92a820bc1a
[ML] Add bucket_script agg support to data frames (#41594) (#41639) 2019-04-29 10:14:17 -05:00
Benjamin Trent
a0990ca239
[ML] cleanup + adding description field to transforms (#41554) (#41605)
* [ML] cleanup + adding description field to transforms

* making description length have a max of 1k
2019-04-26 16:50:59 -05:00
Benjamin Trent
3ccb48e516
[ML] data frame, verify primary shards are active for configs index before task start (#41551) (#41580) 2019-04-26 10:23:43 -05:00
Benjamin Trent
4836ff7bcd
[ML] add multi node integ tests for data frames (#41508) (#41552)
* [ML] adding native-multi-node-integTests for data frames'

* addressing streaming issues

* formatting fixes

* Addressing PR comments
2019-04-26 07:18:49 -05:00
Benjamin Trent
08843ba62b
[ML] Adds progress reporting for transforms (#41278) (#41529)
* [ML] Adds progress reporting for transforms

* fixing after master merge

* Addressing PR comments

* removing unused imports

* Adjusting afterKey handling and percentage to be 100*

* Making sure it is a linked hashmap for serialization

* removing unused import

* addressing PR comments

* removing unused import

* simplifying code, only storing total docs and decrementing

* adjusting for rewrite

* removing initial progress gathering from executor
2019-04-25 11:23:12 -05:00
Hendrik Muhs
13d720f4a7 [ML-DataFrame] Replace Streamable w/ Writeable (#41477)
replace usages of Streamable with Writeable

Relates to #36176
2019-04-24 17:17:34 +02:00
Benjamin Trent
e2f8ffdde8
[ML][Data Frame] Moving destination creation to _start (#41416) (#41433)
* [ML][Data Frame] Moving destination creation to _start

* slight refactor of DataFrameAuditor constructor
2019-04-23 09:32:57 -05:00
David Kyle
1d2365f5b6 [ML-DataFrame] Refactorings and tidying (#41248)
Remove unnecessary generic params from SingleGroupSource
and unused code from the HLRC
2019-04-17 14:58:26 +01:00
David Kyle
711d2545aa [ML-DataFrame] Resolve random test failure using deterministic name (#41262) 2019-04-17 09:04:20 +01:00
Hendrik Muhs
02247cc7df [ML-DataFrame] adapt page size on circuit breaker responses (#41149)
handle circuit breaker response and adapt page size to reduce memory pressure, reduce preview buckets to 100, initial page size to 500
2019-04-16 19:49:43 +02:00
David Kyle
2b539f8347 [ML DataFrame] Data Frame stop all (#41156)
Wild card support for the data frame stop API
2019-04-15 15:04:28 +01:00
Benjamin Trent
9e32e36799
[ML] fixing test related to #40963 (#41074) (#41116) 2019-04-11 11:19:56 -05:00
Christoph Büscher
d00d3f4afa Mute DataFrameTransformCheckpointTests#testGetBehind 2019-04-10 15:55:09 +02:00
Hendrik Muhs
f9018ab11b [ML-DataFrame] create checkpoints on every new run (#40725)
Use the checkpoint service to create a checkpoint on every new run. Expose checkpoints stats on _stats endpoint.
2019-04-10 09:14:11 +02:00
Julie Tibshirani
0702c72151 Mute DataFrameGetAndGetStatsIT#testGetPersistedStatsWithoutTask.
Tracked in #40963.
2019-04-09 16:39:16 -07:00
Hendrik Muhs
d5fcbf2f4a refactor onStart and onFinish to take runnables and executed them guarded by state (#40855)
refactor onStart and onFinish to take action listeners and execute them when indexer is in indexing state.
2019-04-07 21:46:37 +02:00
Benjamin Trent
a8dbb07546
[ML] Changes default destination index field mapping and adds scripted_metric agg (#40750) (#40846)
* [ML] Allowing destination index mappings to have dynamic types, adds script_metric agg

* Making dynamic|source mapping explicit
2019-04-05 11:34:20 -05:00
Benjamin Trent
665f0d81aa
[ML] refactoring start task a bit, removing unused code (#40798) (#40845) 2019-04-05 09:01:01 -05:00
Hendrik Muhs
31e79a73d7 add HLRC protocol tests for transform state and stats (#40766)
adds HLRC protocol tests for state and stats hrlc clients
2019-04-03 12:51:15 +02:00
Benjamin Trent
945e7ca01e
[ML] Periodically persist data-frame running statistics to internal index (#40650) (#40729)
* [ML] Add mappings, serialization, and hooks to persist stats

* Adding tests for transforms without tasks having stats persisted

* intermittent commit

* Adjusting usage stats to account for stored stats docs

* Adding tests for id expander

* Addressing PR comments

* removing unused import

* adding shard failures to the task response
2019-04-02 14:16:55 -05:00
Benjamin Trent
4842d7fb7d
[ML] addressing test failure (#40701) (#40728)
* [ML] Fixing test

* adjusting line lengths

* marking valid seqno as final
2019-04-02 12:33:51 -05:00