* stop data frame task after it finishes
* test auto stop
* adapt tests
* persist the state correctly and move stop into listener
* Calling `onStop` even if persistence fails, changing `stop` to rely on doSaveState
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
increases the scheduler interval to fire less frequently, namely changing it from 1s to 10s. The scheduler interval is used for retrying after an error condition.
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.
To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).
Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
* add support for fixed_interval, calendar_interval, remove interval
* adapt HLRC
* checkstyle
* add a hlrc to server test
* adapt yml test
* improve naming and doc
* improve interface and add test code for hlrc to server
* address review comments
* repair merge conflict
* fix date patterns
* address review comments
* remove assert for warning
* improve exception message
* use constants
This corrects what appears to have been a copy-paste error
where the logger for `MachineLearning` and `DataFrame` was wrongly
set to be that of `XPackPlugin`.
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
* [ML] adding pivot.size option for setting paging size
* Changing field name to address PR comments
* fixing ctor usage
* adjust hlrc for field name change
* [ML] properly nesting objects in document source
* Throw exception on agg extraction failure, cause it to fail df
* throwing error to stop df if unsupported agg is found
Direct the task request to the node executing the task and also refactor the task responses
so all errors are returned and set the HTTP status code based on presence of errors.
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
* [ML] Adds progress reporting for transforms
* fixing after master merge
* Addressing PR comments
* removing unused imports
* Adjusting afterKey handling and percentage to be 100*
* Making sure it is a linked hashmap for serialization
* removing unused import
* addressing PR comments
* removing unused import
* simplifying code, only storing total docs and decrementing
* adjusting for rewrite
* removing initial progress gathering from executor
* [ML] Add mappings, serialization, and hooks to persist stats
* Adding tests for transforms without tasks having stats persisted
* intermittent commit
* Adjusting usage stats to account for stored stats docs
* Adding tests for id expander
* Addressing PR comments
* removing unused import
* adding shard failures to the task response
* [ML] Add data frame task state object and field
* A new state item is added so that the overall task state can be
accoutned for
* A new FAILED state and reason have been added as well so that failures
can be shown to the user for optional correction
* Addressing PR comments
* adjusting after master merge
* addressing pr comment
* Adjusting auditor usage with failure state
* Refactor, renamed state items to task_state and indexer_state
* Adding todo and removing redundant auditor call
* Address HLRC changes and PR comment
* adjusting hlrc IT test
create and use unique, deterministic document ids based on the grouping values.
This is a pre-requisite for updating documents as well as preventing duplicates after a hard failure during indexing.
* [ML] make source and dest objects in the transform config
* addressing PR comments
* Fixing compilation post merge
* adding comment for Arrays.hashCode
* addressing changes for moving dest to object
* fixing data_frame yml tests
* fixing API test
Add a checkpoint service for data frame transforms, which allows to ask for a checkpoint of the
source. In future these checkpoints will be stored in the internal index to
- detect upstream changes
- updating the data frame without a full re-run
- allow data frame clients to checkpoint themselves
* [Data Frame] Refactor GET Transforms API:
* Add pagination
* comma delimited list expression support GET transforms
* Flag troublesome internal code for future refactor
* Removing `allow_no_transforms` param, ratcheting down pageparam option
* Changing DataFrameFeatureSet#usage to not get all configs
* Intermediate commit
* Writing test for batch data gatherer
* Removing unused import
* removing bad println used for debugging
* Updating BatchedDataIterator comments and query
* addressing pr comments
* disallow null scrollId to cause stackoverflow
* [Data Frame] Refactor PUT transform such that:
* POST _start creates the task and starts it
* GET transforms queries docs instead of tasks
* POST _stop verifies the stored config exists before trying to stop
the task
* Addressing PR comments
* Refactoring DataFrameFeatureSet#usage, decreasing size returned getTransformConfigurations
* fixing failing usage test
This change adds two new cluster privileges:
* manage_data_frame_transforms
* monitor_data_frame_transforms
And two new built-in roles:
* data_frame_transforms_admin
* data_frame_transforms_user
These permit access to the data frame transform endpoints.
(Index privileges are also required on the source and
destination indices for each data frame transform, but
since these indices are configurable they it is not
appropriate to grant them via built-in roles.)
fix a couple of odd behaviors of data frame transforms REST API's:
- check if id from body and id from URL match if both are specified
- do not allow a body for delete
- allow get and stats without specifying an id