Inference processors asynchronously write usage stats to the .ml-stats index after they are used.
In tests, the write can leak into the next test, causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test.
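As a rough illustration of that wait, a minimal polling sketch; the helper, timeout, and condition below are illustrative stand-ins, not the actual test utility:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public final class WaitForStatsDocs {

    /**
     * Polls the supplied condition until it becomes true or the timeout expires.
     * In the real tests this would wrap a search against the .ml-stats index.
     */
    static void waitUntil(BooleanSupplier condition, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return;
            }
            Thread.sleep(100); // back off briefly before checking again
        }
        throw new AssertionError("timed out waiting for usage stats docs to be written");
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in for "the expected number of .ml-stats docs are searchable".
        waitUntil(() -> System.currentTimeMillis() - start > 300, 5_000);
        System.out.println("stats docs observed, test can finish cleanly");
    }
}
```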
If a search failure occurs during data frame extraction we catch
the error and retry once. However, the retried search is
identical to the first one, which means we re-fetch any docs
that were already processed. This may result either in training
a model on duplicate data or, in the case of outlier detection,
in an error message that the process received more records than it
expected.
This commit fixes this issue by tracking the latest doc's sort key
and then using that in a range query in case we restart the search
due to a failure.
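A minimal, self-contained sketch of the idea; the doc set, page size, and injected failure are illustrative stand-ins for the real extractor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public final class ResumableExtraction {

    // Stand-in for the docs being extracted, already sorted on the sort key.
    private static final int[] DOCS = IntStream.rangeClosed(1, 20).toArray();

    /** Returns the next page of docs whose sort key is strictly greater than lastSortKey. */
    static int[] searchAfter(long lastSortKey, int pageSize) {
        return IntStream.of(DOCS)
            .filter(key -> key > lastSortKey)   // the "range query" applied on restart
            .limit(pageSize)
            .toArray();
    }

    public static void main(String[] args) {
        List<Integer> processed = new ArrayList<>();
        long lastSortKey = -1;              // highest sort key processed so far
        boolean failureInjected = false;

        while (true) {
            int[] page = searchAfter(lastSortKey, 5);
            if (page.length == 0) {
                break;
            }
            if (!failureInjected && lastSortKey >= 4) {
                failureInjected = true;     // simulate a search failure
                continue;                   // the retry resumes from lastSortKey, not from scratch
            }
            for (int key : page) {
                processed.add(key);
                lastSortKey = key;
            }
        }
        System.out.println("processed without duplicates: " + processed);
    }
}
```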
Backport of #61544
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backports the following commits to 7.x:
[ML] write warning if configured memory limit is too low for analytics job (#61505)
Having `_start` fail when the configured memory limit is too low can be frustrating.
We should instead warn the user that their job might not run properly if their configured limit is too low.
It might be that our estimate is too high, and their configured limit works just fine.
DeprecationLogger's constructor should not create two loggers. It was
taking a parent logger instance, changing its name with a .deprecation
prefix, and creating a new logger.
Most of the time the parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
Splitting DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing a ThrottlingAndHeaderWarningLogger, which is a base for other common logging usages where both a response warning header and log throttling are needed.
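For illustration only, a minimal sketch of the throttling idea, suppressing duplicate log entries for the same key; the real ThrottlingLogger is more involved:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class SimpleThrottlingLogger {

    // Remembers which keys have already been logged so duplicates are suppressed.
    private final Map<String, Boolean> seenKeys = new ConcurrentHashMap<>();

    /** Logs the message only the first time the given throttle key is seen. */
    void throttleLog(String key, String message) {
        if (seenKeys.putIfAbsent(key, Boolean.TRUE) == null) {
            System.out.println("WARN " + message);   // stand-in for the real logger call
        }
    }

    public static void main(String[] args) {
        SimpleThrottlingLogger logger = new SimpleThrottlingLogger();
        for (int i = 0; i < 3; i++) {
            logger.throttleLog("setting.foo.deprecated", "[foo] setting is deprecated");
        }
        // Only one WARN line is printed despite three calls with the same key.
    }
}
```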
relates #55699
relates #52369
backports #55941
This commit removes the log info message "Created ML annotations index and aliases".
The message duplicates Elasticsearch's own index creation logging and does
not add to it. In addition, since #61107 that message may be logged multiple times.
Backport of #61461
`feature_processors` allow users to create custom features from
individual document fields.
These `feature_processors` are the same object as the trained model's pre_processors.
They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.
closes https://github.com/elastic/elasticsearch/issues/59327
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind. Now the ML backend also creates
annotations, and those mappings can change between
versions.
In this change:
1. The code that runs on the master node to create the
annotations index (if it doesn't exist but another ML
index does) now also ensures the mappings are up-to-date.
This is good enough for the ML UI's use of the
annotations index, because the upgrade order rules say
that the whole Elasticsearch cluster must be upgraded
prior to Kibana, so the master node should be on the
newer version before Kibana tries to write an
annotation with the new fields.
2. We now also check whether the annotations index exists
with the correct mappings before starting an autodetect
process on a node. This is necessary because ML nodes
can be upgraded before the master node, so could write
an annotation with the new fields before the master node
knows about the new fields.
Backport of #61107
When a user upgrades between versions, they may stop their ML jobs.
Then when the upgrade is complete, they will want to open the jobs again.
But, when opening a job, we attempt to clear out the job's finished_time. If the job configuration has changed between versions (i.e. a new field was added), this dynamically updates the .ml-config index.
We should instead manually update the mapping to the new version.
`foreach` processors store information within the `_ingest` metadata object.
This commit adds the contents of the `_ingest` metadata (if it is not empty).
It will also append new inference results if the result field already exists.
This allows a `foreach` processor to execute and multiple inference results to be written to the same result field.
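A small sketch of the appending behaviour; the result field name, document map, and result values are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class AppendInferenceResults {

    /** Appends a new inference result under resultField, turning a single value into a list if needed. */
    @SuppressWarnings("unchecked")
    static void appendResult(Map<String, Object> document, String resultField, Object newResult) {
        Object existing = document.get(resultField);
        if (existing == null) {
            document.put(resultField, newResult);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(newResult);
        } else {
            List<Object> results = new ArrayList<>();
            results.add(existing);
            results.add(newResult);
            document.put(resultField, results);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> document = new HashMap<>();
        // Two iterations of a foreach-style loop writing to the same result field.
        appendResult(document, "ml.inference.results", Map.of("predicted_value", "cat"));
        appendResult(document, "ml.inference.results", Map.of("predicted_value", "dog"));
        System.out.println(document); // both results are kept under ml.inference.results
    }
}
```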
closes https://github.com/elastic/elasticsearch/issues/60867
Examines the reindex response in order to report potential
problems that occurred during the reindexing phase of
data frame analytics jobs.
Backport of #60911
If the search for get stats with multiple job IDs fails, the listener is called for each failure.
This change waits for all responses and then returns the first error if there was one.
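A plain-Java sketch of that "wait for everything, then respond once with the first error" pattern; the threads and latch are illustrative stand-ins for the async listeners:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public final class WaitForAllThenFail {

    public static void main(String[] args) throws InterruptedException {
        int responses = 3;
        CountDownLatch latch = new CountDownLatch(responses);
        AtomicReference<Exception> firstError = new AtomicReference<>();

        // Simulate per-job responses arriving on separate threads; only the first error is kept.
        for (int i = 0; i < responses; i++) {
            final int jobIndex = i;
            new Thread(() -> {
                try {
                    if (jobIndex == 1) {
                        throw new IllegalStateException("stats search failed for job-" + jobIndex);
                    }
                } catch (Exception e) {
                    firstError.compareAndSet(null, e);  // remember only the first failure
                } finally {
                    latch.countDown();                  // always count, success or failure
                }
            }).start();
        }

        latch.await();                                  // wait for every response
        if (firstError.get() != null) {
            System.out.println("responding with first error: " + firstError.get().getMessage());
        } else {
            System.out.println("all stats responses succeeded");
        }
    }
}
```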
* [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776)
When deleting an analytics configuration, the request MIGHT fail if
the .ml-stats index does not exist or is in a strange state (shards unallocated).
Instead of making the request fail, we should log that we were unable to delete the stats docs and then
have them cleaned up by the 'delete_expired_data' janitorial process.
When an exception is thrown during test inference we are
not including the cause message in our logging. This commit
addresses this issue.
Backport of #60749
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
This commit does three things:
* Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license)
* Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack
* Removes a placeholder test in favor of disabling the test task (in the async plugin)
- Replace immediate task creations by using the task avoidance API
- One step closer to #56610
- Many tasks are still created during the configuration phase. Tackled in separate steps
Prior to this change ML memory estimation processes for a
given job would always use the same named pipe names. This
would often cause one of the processes to fail.
This change avoids this risk by adding an incrementing counter
value into the named pipe names used for memory estimation
processes.
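A minimal sketch of the counter-based naming; the pipe name format shown is illustrative, not the actual one:

```java
import java.util.concurrent.atomic.AtomicLong;

public final class NamedPipeNames {

    // Incremented for every memory estimation process so concurrent runs never share pipe names.
    private static final AtomicLong COUNTER = new AtomicLong();

    /** Builds a pipe name that is unique per invocation for the same job. */
    static String pipeNameFor(String jobId) {
        return "memory_estimation_" + jobId + "_" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two estimations for the same job no longer collide on the pipe name.
        System.out.println(pipeNameFor("my-job"));  // memory_estimation_my-job_1
        System.out.println(pipeNameFor("my-job"));  // memory_estimation_my-job_2
    }
}
```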
Backport of #60395
In order to unify model inference and analytics results we
need to write the same fields.
prediction_probability and prediction_score are now written
for inference calls against classification models.
This sets up all indexing to one of our write aliases to require that it actually be an alias.
This allows failure scenarios to be captured quickly and loudly, and then potentially recovered from.
If a feature is created via a custom pre-processor,
we should return the importance for that feature.
This means we will not return the importance for the
original document field for custom processed features.
closes https://github.com/elastic/elasticsearch/issues/59330
Data frame analytics jobs that work with very large datasets
may produce bulk requests that are over the memory limit
for indexing. This commit adds a helper class that bundles
index requests in bulk requests that steer away from the
memory limit. We then use this class both from the results
joiner and the inference runner, ensuring data frame analytics
jobs do not generate bulk requests that are too large.
Note the limit was implemented in #58885.
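A rough sketch of such a size-capped bundler; the class name, byte accounting, and printed output are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public final class SizeLimitedBulk {

    private final long maxBytesPerBulk;
    private final List<String> currentBulk = new ArrayList<>();
    private long currentBytes = 0;

    SizeLimitedBulk(long maxBytesPerBulk) {
        this.maxBytesPerBulk = maxBytesPerBulk;
    }

    /** Adds a document; flushes the current bulk first if adding it would exceed the byte limit. */
    void add(String docJson) {
        long docBytes = docJson.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
        if (!currentBulk.isEmpty() && currentBytes + docBytes > maxBytesPerBulk) {
            flush();
        }
        currentBulk.add(docJson);
        currentBytes += docBytes;
    }

    /** Sends whatever is buffered as one bulk request (stand-in: just prints it). */
    void flush() {
        if (currentBulk.isEmpty()) {
            return;
        }
        System.out.println("bulk of " + currentBulk.size() + " docs, ~" + currentBytes + " bytes");
        currentBulk.clear();
        currentBytes = 0;
    }

    public static void main(String[] args) {
        SizeLimitedBulk bulk = new SizeLimitedBulk(64);   // tiny limit so the example splits into several bulks
        for (int i = 0; i < 6; i++) {
            bulk.add("{\"row\":" + i + ",\"prediction\":0.42}");
        }
        bulk.flush();                                     // flush the trailing partial bulk
    }
}
```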
Backport of #60219
Previously the test was asserting the prediction on each document
was within 10.0 of the expected value. It turned out that was not enough
as we occasionally saw the test fail by a small margin.
Instead of relaxing that assertion, this commit changes it to
assert the mean prediction error is less than 10.0. This should
significantly reduce the chances of the test failing.
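For illustration, a sketch of the mean-error assertion with made-up values; a single outlying prediction no longer fails the test as long as the mean stays under the bound:

```java
public final class MeanErrorAssertion {

    /** Asserts the mean absolute prediction error is below the given bound. */
    static void assertMeanErrorBelow(double[] predictions, double[] expected, double bound) {
        double totalError = 0;
        for (int i = 0; i < predictions.length; i++) {
            totalError += Math.abs(predictions[i] - expected[i]);
        }
        double meanError = totalError / predictions.length;
        if (meanError >= bound) {
            throw new AssertionError("mean prediction error " + meanError + " was not below " + bound);
        }
    }

    public static void main(String[] args) {
        // One prediction is off by 14.0, but the mean error (5.0) is still well under 10.0.
        double[] expected    = {100, 200, 300, 400};
        double[] predictions = {102, 214, 297, 401};
        assertMeanErrorBelow(predictions, expected, 10.0);
        System.out.println("assertion passed");
    }
}
```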
Fixes #60212
Backport of #60221
When the job is force-closed or shutting down due to a fatal error we clean
up all cancellable job operations. This includes cancelling the results processor.
However, this means that we might not persist objects that are written by the
process, like stats, memory usage, etc.
In hindsight, we do not gain from cancelling the results processor in its
entirety. It makes more sense to skip row results and model chunks but keep
stats and instrumentation about the job as the latter may contain useful information
to understand what happened to the job.
Backport of #60113
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest. This was due to an internal implementation detail
that was not visible to the end user.
This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info. The internal implementation detail now runs as
the internal _xpack user when security is enabled.
Backport of #60106
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.
In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.
Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overridden executorName method was
removed to use the default, which runs on the same thread.
Backport of #59932
In #58877, when we switched test inference to Java, we just
used the doc's `_source` as features. However, this could be
missing out on features that were used during training,
e.g. alias fields, etc.
This commit addresses this by extracting fields to use as
features during inference the same way they are extracted
in `DataFrameDataExtractor` when they are used for training.
Backport of #59963
* [ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.
`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.
Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).
This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
When an inference model is loaded it is accounted for in the circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
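A minimal sketch of the reference counting; the names and the stand-in release message are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class RefCountedModel {

    private final AtomicInteger refCount = new AtomicInteger(1);   // the loader holds the initial reference

    /** Called by each new user of the model. */
    void acquire() {
        refCount.incrementAndGet();
    }

    /** Called when a user is done; memory is released only when the last user lets go. */
    void release() {
        if (refCount.decrementAndGet() == 0) {
            System.out.println("last reference released, returning model bytes to the circuit breaker");
        }
    }

    public static void main(String[] args) {
        RefCountedModel model = new RefCountedModel();
        model.acquire();   // an inference request starts using the model
        model.release();   // the request finishes
        model.release();   // the cache/loader drops its reference; memory is accounted back
    }
}
```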
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that data stream APIs are in xpack and ESIntegTestCase
no longer deletes all data streams, do that in the MlNativeIntegTestCase
class for ML tests.
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
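A rough sketch of the tracking idea; the map-based tracker and feature names are illustrative (the actual API also reports the license level):

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class FeatureUsageTracker {

    // Feature name -> last time the feature passed a license check.
    private final Map<String, Instant> lastUsed = new ConcurrentHashMap<>();

    /** Called whenever a gold+ feature is checked against the current license and used. */
    void noteUsage(String featureName) {
        lastUsed.put(featureName, Instant.now());
    }

    /** Backs the usage API: every feature that has been used, with its last-used time. */
    Map<String, Instant> usage() {
        return Map.copyOf(lastUsed);
    }

    public static void main(String[] args) {
        FeatureUsageTracker tracker = new FeatureUsageTracker();
        tracker.noteUsage("machine_learning");
        tracker.noteUsage("security_dls_fls");
        tracker.usage().forEach((feature, time) -> System.out.println(feature + " last used at " + time));
    }
}
```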
There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.
In order to maintain BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.