All ML objects stored in internal indices are currently parsed
strictly. This means unknown fields lead to parsing failures.
In turn, this means we cannot add new fields in any of those
objects (e.g. bucket, record, calendar, etc.) as it is not
backwards compatible.
This commit changes this by introducing lenient parsing when
it comes to reading those objects from the internal indices.
Note we still use strict parsing for the objects we read from
the c++ process, which is nice as it guarantees we would detect
if any of the fields were renamed on one side but not the other.
Also note that even though this is going in from 6.3, we cannot
introduce new fields until 7.0.
relates elastic/x-pack-elasticsearch#4232
Original commit: elastic/x-pack-elasticsearch@3f95d3c7b9
While it makes sense to apply auto-chunking in order to limit
the time range of the search for previewing datafeeds without aggs,
the same is not the case when aggs are used. In contrary, we should
do the preview the same way it would be if the datafeed run, as this
can reveal problems with regard to the datafeed configuration.
In addition, by default datafeeds with aggs have a manual chunking config
that limits the cost of each search. So, setting the chunking to auto
in those cases may lead to the datafeed preview failing even though
actually running the datafeed would work fine.
Original commit: elastic/x-pack-elasticsearch@79e317efb2
The ML open_job and start_datafeed endpoints start persistent tasks and
wait for these to be successfully assigned before returning. Since the
setup sequence is complex they do a "fast fail" validation step on the
coordinating node before the setup sequence. However, this leads to the
possibility of the "fast fail" validation succeeding and the eventual
persistent task assignment failing due to other changes during the setup
sequence. Previously when this happened the endpoints would time out,
which in the case of the open_job action takes 30 minutes by default.
The start_datafeed endpoint has a shorter default timeout of 20 seconds,
but in both cases the result of a timeout is an unfriendly HTTP 500
status.
This change adjusts the criteria used to wait for the persistent tasks to
be assigned to account for the possibility of assignment failure and, if
this happens, return an error identical to what the "fast fail"
validation would have returned. Additionally in this case the unassigned
persistent task is cancelled, leaving the system in the same state as if
the "fast fail" validation had failed.
Original commit: elastic/x-pack-elasticsearch@16916cbc13
In order to deal with the most anticipated scenario, when datafeed
frequency is greater than the query_delay, we add the query_delay
to the frequency in order to determine the next time we will trigger
a real-time run. For example, if frequency is 10s and query_delay 1s,
we make sure to trigger the real-time run at a 10s + 1s = 11s offset.
However, this is not correct in the case the frequency is less or
equal to the query_delay. For example, if frequency is 1s and
query_delay is 10s. we would also end up triggering at 11s offset.
But the right behaviour would be to trigger every second while
ensuring we are searching for up to 10seconds ago.
This commit fixes this issue.
relates elastic/x-pack-elasticsearch#4167
Original commit: elastic/x-pack-elasticsearch@f605885167
This change disables security for trial licenses unless security is
explicitly enabled in the settings. This is done to facilitate users
getting started and not having to deal with some of the complexities
involved in getting security configured. In order to do this and avoid
disabling security for existing users that have gold or platinum
licenses, we have to disable security after cluster formation so that
the license can be retrieved.
relates elastic/x-pack-elasticsearch#4078
Original commit: elastic/x-pack-elasticsearch@96bdb889fc
This adds a minimum compatible version to the model snapshot.
Nodes with a version earlier than that version cannot read
that model snapshot. Thus, such jobs are not assigned to
incompatible nodes.
relates elastic/x-pack-elasticsearch#4077
Original commit: elastic/x-pack-elasticsearch@2ffa6adce0
* Decouple XContentBuilder from BytesReference
This commit handles the removal of all mentions of BytesReference from
XContentBuilder. This is needed so that we can completely decouple the XContent
code and move it into its own dependency.
This is the x-pack side of https://github.com/elastic/elasticsearch/pull/28972
Original commit: elastic/x-pack-elasticsearch@8ba2e97b26
This commit replaces the usage of Lucene IOUtils with Elasticsearch
IOUtils, the former of which is now forbidden.
Original commit: elastic/x-pack-elasticsearch@8e0554001f
Up to now a job update that reduces the model memory limit
was not allowed. However, there could definitely be cases
where reducing the limit is necessary and reasonable.
This commit makes it possible to decrease the limit as long
as it does not go below the current memory usage. We obtain
the latter from the model size stats.
The conditions under which updating the model_memory_limit
is not allowed are now:
- when the job is open
- latest model_size_stats.model_bytes < new value
relates elastic/x-pack-elasticsearch#2461
Original commit: elastic/x-pack-elasticsearch@5b35923590
This wraps the stream (`.streamInput()`) that is passed to many of the
`createParser` instances in the enclosing (or a new) try-with-resources block.
This ensures the `BytesReference.streamInput()` is closed.
Relates to elastic/x-pack-elasticsearch#28504
Original commit: elastic/x-pack-elasticsearch@7546e3b4d4
Analysis limits contain settings that affect the resources
used by ML jobs. Those limits always take place. However,
explictly setting them is not required as they have reasonable
defaults. For a long time those defaults lived on the c++ side.
The job could just not have any explicit limits and that meant
defaults would be used at the c++ side. This has the disadvantage
that it is not obvious to the users what these settings are set to.
Additionally, users might not be aware of the settings existence.
On top of that, since 6.1, the default model_memory_limit was lowered
from 4GB to 1GB. For BWC, this meant that jobs where model_memory_limit
is null, the default of 4GB applies. Jobs that were created from 6.1
onwards, contain an explicit setting for model_memory_limit, which is
1GB unless the user sets it differently. This adds additional confusion.
This commit makes analysis limits an always explicit setting on the job.
Regardless of whether the user sets custom limits or not, the job object
(and response) will contain the full analysis limits values.
The possibilities for interpretation of missing values are:
- the entire analysis_limits is null: this may only happen for jobs
created prior to 6.1. Thus we set the model_memory_limit to 4GB.
- analysis_limits are non-null but model_memory_limit is: this also
may only happen for jobs prior to 6.1. Again, we set memory limit to
4GB.
- model_memory_limit is non-null: this either means the user set an
explicit value or the job was created from 6.1 onwards and it has
the explicit default of 1GB. We simply keep the given value.
For categorization_examples_limit the default has always been 4, so
we fill that in when it's missing.
Finally, note that we still need to handle potential null values
for the situation of a mixed cluster.
Original commit: elastic/x-pack-elasticsearch@5b6994ef75
* Pass InputStream when creating XContent parser
Rather than passing the raw `BytesReference` in when creating the xcontent
parser, this passes the StreamInput (which is an InputStream), this allows us to
decouple XContent from BytesReference.
This is the x-pack side of https://github.com/elastic/elasticsearch/pull/28754
* Use the streamInput variant, not sourceAsString
Original commit: elastic/x-pack-elasticsearch@dd5d8b1654
We were missing a notification for when a job is updated. This is
useful so users know that there's been changes which could justify
a change in the job behaviour.
In addition, having those notifications allows our integrations
tests to know when the update was processed which avoids having
to use `sleep()` with its instabilities.
Original commit: elastic/x-pack-elasticsearch@0b4eda2232
The notifier is scheduled to run once per second. Currently,
it simply polls for the next update in the queue. However,
when there are multiple updates queued up, there is no
reason to wait for subsequent runs in order to execute the
rest of the updates.
This commit changes the notifier to drain the queue each time
it runs. It then serially executes the updates.
relates elastic/x-pack-elasticsearch#3769
Original commit: elastic/x-pack-elasticsearch@7a433c17f2
This commit reenables running ITs in xpack by adding an internalClusterTest to xpack modules that contain ESIntegTestCase tests. The new task allows us to run these independently of rest integ tests, which are disabled for xpack modules because installing the bundled plugins directly is not quite the same as installing via the meta plugin. Some tests (ML) are moved to their own qa module to accommodate the need for a real cluster. A couple tests (monitoring and upgrade) have been marked as AwaitsFix.
Commits that have been folded into this commit:
* Move ML IT tests to qa/ml-native-tests
* Add internalClusterTest task and disable rest integ tests for xpack
modules. Also tweak ML tests and get upgrade tests working
* Adding the keystore and security back to the ml native tests
* Fixing native integ test
* Fix last ML test, add awaits fix to monitoring and upgrade tests
* cleanup PR
* fix checkstyle
Original commit: elastic/x-pack-elasticsearch@3c0ed6fd3b