This PR adds logic to ensure that the fields (and field types) configured
in the Rollup Job are present in the index/indices specified by the job's
index pattern. If a field is missing, or is not aggregatable, it
will throw an exception before the job is created.
This is important for user-friendliness, because otherwise the user
only discovers an issue with mapping when the job is started and
fails to rollup correctly (and only really noticeable by looking at logs,
since it's a runtime failure).
Original commit: elastic/x-pack-elasticsearch@686cd03072
When a watch is acknowledged, while it is also being executed, the
acknowledgment information can get lost. The reason for this is the
fact, that the execution writes the watch status inside of the watch
regardless, if other writes happened inbetween to make sure the
execution state is caught.
This commit checks the current executions in the execution service and
aborts the API call, if the specified watch ID can be found in those.
Note, this does not prevent this issue fully, as a watch could be
triggered, while the acknowledgement update is running, but it does
reduce the surface area of this problem. In order to properly solve
this, indexing the watch status as part of a watch would need to be
changed.
relates elastic/x-pack-elasticsearch#4003
Original commit: elastic/x-pack-elasticsearch@d7e218b2ac
Adds a SecureSetting option for the "bind_password" in LDAP/AD realms
and deprecates the non-secure version.
LDAP bind passwords should now be configured with the setting
`xpack.security.authc.realms.REALM_NAME.secure_bind_password`
in the elasticsearch keystore.
Original commit: elastic/x-pack-elasticsearch@1a0cebd77e
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039
The credentials now get injected via environment variables, so that
external services can pull those.
As soon as the specified environment variables are set, the tests are run. No need to check for the @Network annotation
This also introduces new secret store settings for the secure settings in order to be sure to not leak them in the configuration files, that get dumped.
Relates elastic/x-pack-elasticsearch#3800
Original commit: elastic/x-pack-elasticsearch@a2cfb9cb86
The documentation mentions that the xpack.watcher.encrypt_sensitive_data
setting needs to be set in the keystore. This is wrong however, it needs
to be set in the standard elasticsearch yaml file.
relates elastic/x-pack-elasticsearch#4195
Original commit: elastic/x-pack-elasticsearch@613d63da85
This creates a new "beats_system" user and role with the same
privileges as the existing "logstash_system" user/role.
The "beat_system" user is also added as a managed user within
the "setup-passwords" command.
Users who upgrade from an earlier version of Elasticsearch/X-Pack
will need to manually set a password for the beats_system user via
the change password API (or Kibana UI)
Original commit: elastic/x-pack-elasticsearch@6087d3a18e
This adds a minimum compatible version to the model snapshot.
Nodes with a version earlier than that version cannot read
that model snapshot. Thus, such jobs are not assigned to
incompatible nodes.
relates elastic/x-pack-elasticsearch#4077
Original commit: elastic/x-pack-elasticsearch@2ffa6adce0
This is related to elastic/x-pack-elasticsearch#3877. This commit adds a route /start_basic that
will self generate a basic license. The only validation that is
performed is to check that you do not already have a basic license
installed. Additionally, if you lose features from switching to a basic
license, you must acknowledge the changes.
Original commit: elastic/x-pack-elasticsearch@7b8eeb50b1
Up to now a job update that reduces the model memory limit
was not allowed. However, there could definitely be cases
where reducing the limit is necessary and reasonable.
This commit makes it possible to decrease the limit as long
as it does not go below the current memory usage. We obtain
the latter from the model size stats.
The conditions under which updating the model_memory_limit
is not allowed are now:
- when the job is open
- latest model_size_stats.model_bytes < new value
relates elastic/x-pack-elasticsearch#2461
Original commit: elastic/x-pack-elasticsearch@5b35923590
Analysis limits contain settings that affect the resources
used by ML jobs. Those limits always take place. However,
explictly setting them is not required as they have reasonable
defaults. For a long time those defaults lived on the c++ side.
The job could just not have any explicit limits and that meant
defaults would be used at the c++ side. This has the disadvantage
that it is not obvious to the users what these settings are set to.
Additionally, users might not be aware of the settings existence.
On top of that, since 6.1, the default model_memory_limit was lowered
from 4GB to 1GB. For BWC, this meant that jobs where model_memory_limit
is null, the default of 4GB applies. Jobs that were created from 6.1
onwards, contain an explicit setting for model_memory_limit, which is
1GB unless the user sets it differently. This adds additional confusion.
This commit makes analysis limits an always explicit setting on the job.
Regardless of whether the user sets custom limits or not, the job object
(and response) will contain the full analysis limits values.
The possibilities for interpretation of missing values are:
- the entire analysis_limits is null: this may only happen for jobs
created prior to 6.1. Thus we set the model_memory_limit to 4GB.
- analysis_limits are non-null but model_memory_limit is: this also
may only happen for jobs prior to 6.1. Again, we set memory limit to
4GB.
- model_memory_limit is non-null: this either means the user set an
explicit value or the job was created from 6.1 onwards and it has
the explicit default of 1GB. We simply keep the given value.
For categorization_examples_limit the default has always been 4, so
we fill that in when it's missing.
Finally, note that we still need to handle potential null values
for the situation of a mixed cluster.
Original commit: elastic/x-pack-elasticsearch@5b6994ef75
This adds a new Rollup module to XPack, which allows users to configure periodic "rollup jobs" to pre-aggregate data. That data is then available later for search through a special RollupSearch API, which mimics the DSL and functionality of regular search.
Rollups are used to drastically reduce the on-disk footprint of metric-based data (e.g. timestamped document with numeric and keyword fields). It can also be used to speed up aggregations over large datasets, since the rolled data will be considerably smaller and fewer documents to search.
The PR adds seven new endpoints to interact with Rollups; create/get/delete job, start/stop job, a capabilities API similar to field-caps, and a Rollup-enabled search.
Original commit: elastic/x-pack-elasticsearch@dcde91aacf
This adds a new setting, `xpack.monitoring.collection.enabled`, and
disables it by default (`false`).
Original commit: elastic/x-pack-elasticsearch@4b3a5a1161