Adds `value_count` as one of the accepted metrics. The caveat is that
it only accepts numeric values for two reasons:
- Job validation at creation makes sure all metrics are numeric fields.
Changing this would require new syntax (or disallowing anything but
value_count on mixed fields)
- when `toBuilders()` is called, we have to supply a ValueSource to
the ValueCountBuilder, and we don't know what the field type is at that
time.
These are both fixable, but relatively more involved. I think numeric-only
is a reasonable limitation to start with
Original commit: elastic/x-pack-elasticsearch@270f24c8bf
This PR adds logic to ensure that the fields (and field types) configured
in the Rollup Job are present in the index/indices specified by the job's
index pattern. If a field is missing, or is not aggregatable, it
will throw an exception before the job is created.
This is important for user-friendliness, because otherwise the user
only discovers an issue with mapping when the job is started and
fails to rollup correctly (and only really noticeable by looking at logs,
since it's a runtime failure).
Original commit: elastic/x-pack-elasticsearch@686cd03072
When a watch is acknowledged, while it is also being executed, the
acknowledgment information can get lost. The reason for this is the
fact, that the execution writes the watch status inside of the watch
regardless, if other writes happened inbetween to make sure the
execution state is caught.
This commit checks the current executions in the execution service and
aborts the API call, if the specified watch ID can be found in those.
Note, this does not prevent this issue fully, as a watch could be
triggered, while the acknowledgement update is running, but it does
reduce the surface area of this problem. In order to properly solve
this, indexing the watch status as part of a watch would need to be
changed.
relates elastic/x-pack-elasticsearch#4003
Original commit: elastic/x-pack-elasticsearch@d7e218b2ac
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039
This adds a minimum compatible version to the model snapshot.
Nodes with a version earlier than that version cannot read
that model snapshot. Thus, such jobs are not assigned to
incompatible nodes.
relates elastic/x-pack-elasticsearch#4077
Original commit: elastic/x-pack-elasticsearch@2ffa6adce0
This is related to elastic/x-pack-elasticsearch#3877. This commit adds a route /start_basic that
will self generate a basic license. The only validation that is
performed is to check that you do not already have a basic license
installed. Additionally, if you lose features from switching to a basic
license, you must acknowledge the changes.
Original commit: elastic/x-pack-elasticsearch@7b8eeb50b1
Up to now a job update that reduces the model memory limit
was not allowed. However, there could definitely be cases
where reducing the limit is necessary and reasonable.
This commit makes it possible to decrease the limit as long
as it does not go below the current memory usage. We obtain
the latter from the model size stats.
The conditions under which updating the model_memory_limit
is not allowed are now:
- when the job is open
- latest model_size_stats.model_bytes < new value
relates elastic/x-pack-elasticsearch#2461
Original commit: elastic/x-pack-elasticsearch@5b35923590
Analysis limits contain settings that affect the resources
used by ML jobs. Those limits always take place. However,
explictly setting them is not required as they have reasonable
defaults. For a long time those defaults lived on the c++ side.
The job could just not have any explicit limits and that meant
defaults would be used at the c++ side. This has the disadvantage
that it is not obvious to the users what these settings are set to.
Additionally, users might not be aware of the settings existence.
On top of that, since 6.1, the default model_memory_limit was lowered
from 4GB to 1GB. For BWC, this meant that jobs where model_memory_limit
is null, the default of 4GB applies. Jobs that were created from 6.1
onwards, contain an explicit setting for model_memory_limit, which is
1GB unless the user sets it differently. This adds additional confusion.
This commit makes analysis limits an always explicit setting on the job.
Regardless of whether the user sets custom limits or not, the job object
(and response) will contain the full analysis limits values.
The possibilities for interpretation of missing values are:
- the entire analysis_limits is null: this may only happen for jobs
created prior to 6.1. Thus we set the model_memory_limit to 4GB.
- analysis_limits are non-null but model_memory_limit is: this also
may only happen for jobs prior to 6.1. Again, we set memory limit to
4GB.
- model_memory_limit is non-null: this either means the user set an
explicit value or the job was created from 6.1 onwards and it has
the explicit default of 1GB. We simply keep the given value.
For categorization_examples_limit the default has always been 4, so
we fill that in when it's missing.
Finally, note that we still need to handle potential null values
for the situation of a mixed cluster.
Original commit: elastic/x-pack-elasticsearch@5b6994ef75
This adds a new Rollup module to XPack, which allows users to configure periodic "rollup jobs" to pre-aggregate data. That data is then available later for search through a special RollupSearch API, which mimics the DSL and functionality of regular search.
Rollups are used to drastically reduce the on-disk footprint of metric-based data (e.g. timestamped document with numeric and keyword fields). It can also be used to speed up aggregations over large datasets, since the rolled data will be considerably smaller and fewer documents to search.
The PR adds seven new endpoints to interact with Rollups; create/get/delete job, start/stop job, a capabilities API similar to field-caps, and a Rollup-enabled search.
Original commit: elastic/x-pack-elasticsearch@dcde91aacf
... yet support updates. This commit introduces a few changes of how
watches are put.
The GET Watch API will never return credentials like basic auth
passwords, but a placeholder instead now. If the watcher is enabled to
encrypt sensitive settings, then the original encrypted value is
returned otherwise a "::es_redacted::" place holder.
There have been several Put Watch API changes.
The API now internally uses the Update API and versioning. This has
several implications. First if no version is supplied, we assume an
initial creation. This will work as before, however if a credential is
marked as redacted we will reject storing the watch, so users do not
accidentally store the wrong watch.
The watch xcontent parser now has an additional methods to tell the
caller if redacted passwords have been found. Based on this information
an error can be thrown.
If the user now wants to store a watch that contains a password marked
as redacted, this password will not be part of the toXContent
representation of the watch and in combinatination with update request
the existing password will be merged in. If the encrypted password is
supplied this one will be stored.
The serialization for GetWatchResponse/PutWatchRequest has changed.
The version checks for this will be put into the 6.x branch.
The Watcher UI now needs specify the version, when it wants to store a
watch. This also prevents last-write-wins scenarios and is the reason
why the put/get watch response now contains the internal version.
relates elastic/x-pack-elasticsearch#3089
Original commit: elastic/x-pack-elasticsearch@bb63be9f79
This commit adds the ability to refresh tokens that have been obtained by the API using a refresh
token. Refresh tokens are one time use tokens that are valid for 24 hours. The tokens may be used
to get a new access and refresh token if the refresh token has not been invalidated or
already refreshed.
relates elastic/x-pack-elasticsearch#2595
Original commit: elastic/x-pack-elasticsearch@23435eb815