This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.
Backport of #58877
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.
Relates to #58642
Relates to #53100Closes#58956Closes#58583
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices. Currently this can result in attempts
to search the newly created index before its shards have initialized.
This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.
Backport of #59027
Backport of #58582 to 7.x branch.
This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
Relates to #53100
* [ML] handles compressed model stream from native process (#58009)
This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.
1. ModelSizeInfo which contains model size information
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.
`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.
Native side change: https://github.com/elastic/ml-cpp/pull/1349
This PR implements recursive mapping merging for composable index templates.
When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).
Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
Relates to #53101.
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.
Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356
Backport of #58632
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
This commits allows data streams to be a valid source for analytics and transforms.
Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.
For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null. This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant. But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.
This change corrects the discrepancies. Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.
Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.
Backport of #58413
When a local model is constructed, the cache hit miss count is incremented.
When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
Since we change the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-or-equal-than assertion.
Backport or #57882
This is a major refactor of the underlying inference logic.
The main refactor is now we are separating the model configuration and
the inference interfaces.
This has the following benefits:
- we can store extra things with the model that are not
necessary for inference (i.e. treenode split information gain)
- we can optimize inference separate from model serialization and storage.
- The user is oblivious to the optimizations (other than seeing the benefits).
A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.
This new class satisfies a new interface that is ONLY for inference.
The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
(improves inference + feature importance through put)
- Only storing what we need for inference + feature importance on heap
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)
forecasts that are still running should be marked as failed/finished in the following scenarios:
- Job is force closed
- Job is re-assigned to another node.
Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.
Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.
relates to https://github.com/elastic/elasticsearch/issues/56419
* [ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.
This flag is useful for moving models between clusters.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Throttling nightly cleanup as much as we do has been over cautious.
Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.
Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.
This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
* [ML] lay ground work for handling >1 result indices (#55892)
This commit removes all but one reference to `getInitialResultsIndexName`.
This is to support more than one result index for a single job.
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.
- The default for `model_snapshot_retention_days` for new jobs is now
10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
that defaults to 1 for new jobs and `model_snapshot_retention_days`
for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
and `model_snapshot_retention_days` all but the first model snapshot
for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
selected model snapshots to be retained indefinitely
Backport of #56125
This refactors native integ tests to assert progress without
expecting explicit phases for analyses. We can test those with
yaml tests in a single place.
Backport of #55925
Also unmutes the integ test that stops and restarts
an outlier detection job with the hope of learning more
of the failure in #55068.
Backport of #55545
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Adds a "node" field to the response from the following endpoints:
1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job
If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.
In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string. Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.
Backport of #55473
`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.
To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allows fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.
When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.
This should prevent race conditions while allowing high (ish) throughput in the highly contention paths in inference.
I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).
closes https://github.com/elastic/elasticsearch/issues/54786
* [ML] fix native ML test log spam (#55459)
This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.
* removing unnecessary changes
* removing unused imports
* removing unnecessary java setting
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.
This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.