This commit adds a test that verifies that snapshot incrementality
is respected when a snapshot-backed index is snapshotted. This
test mounts a snapshot as a snapshot-backed index, creates a
new snapshot from it and then verifies that no new data blobs
were added to the repository.
The `standard` token filter was removed by #33310, and should have been
unusable in any indexes created since 7.0. However, a caching bug fixed
by #51092 meant that it was still possible in certain circumstances to create
indexes referencing the standard filter in versions up to 7.5.2. Our checks
in AnalysisModule still refer to 7.0.0, however, meaning that a cluster that
contains one of these rogue indexes cannot be upgraded.
This commit adjusts the AnalysisModule checks so that we only refuse to
build a mapping referring to the `standard` filter if the index was created
in version 7.6 or later.
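The adjusted check boils down to a version comparison. The following is a minimal sketch only: `StandardFilterCheck`, `onOrAfter` and `rejectsStandardFilter` are hypothetical names, and versions are modelled as plain (major, minor) pairs rather than with the Elasticsearch `Version` class.

```java
/** Hypothetical illustration of the version gate; not the actual AnalysisModule code. */
final class StandardFilterCheck {

    /** Assumption for illustration: versions are compared as (major, minor) pairs. */
    static boolean onOrAfter(int major, int minor, int otherMajor, int otherMinor) {
        return major > otherMajor || (major == otherMajor && minor >= otherMinor);
    }

    /** Returns true if an index created on the given version must not reference the standard filter. */
    static boolean rejectsStandardFilter(int createdMajor, int createdMinor) {
        // Indexes created before 7.6 may still reference the filter because of the
        // caching bug, so only 7.6 and later are refused.
        return onOrAfter(createdMajor, createdMinor, 7, 6);
    }
}
```

With this gate, an index created on 7.5 is still accepted, while one created on 7.6 or 8.0 is refused.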
Fixes #62644
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.
The decision API now returns both the required and the current capacity
at the tier and node level, as well as a decider-level breakdown of the
same, although current memory in particular is still not populated.
In the context of a recurring test failure tracked by #32827, we added trace logging and an extra cache key renderer argument to IndicesRequestCache#getOrCompute (see #39475 and #34180).
We addressed the issue with #54071, but the extra argument was left behind, with a NORELEASE comment saying it should be removed.
With this commit, we remove the extra cache key renderer argument and the corresponding log lines, which are not so useful without it.
Closes #55837
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, suppose the setting is
set to:
```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```
If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index's
shards to be allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are
`data_warm` and `data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.
This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
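The selection rule described above can be sketched as "take the first tier in the list that has at least one node". This is an illustration only: `TierPreference` and `preferredTier` are hypothetical names, and the real `DataTierAllocationDecider` works on cluster state rather than on a plain set of role names.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;

/** Hypothetical illustration of the tier preference rule; not the actual decider. */
final class TierPreference {

    /**
     * Returns the first tier in the comma-separated preference list that has at least
     * one node in the cluster, or empty if none of the preferred tiers is available.
     */
    static Optional<String> preferredTier(String preference, Set<String> tiersWithNodes) {
        return Arrays.stream(preference.split(","))
            .map(String::trim)
            .filter(tier -> !tier.isEmpty())
            .filter(tiersWithNodes::contains)
            .findFirst();
    }
}
```

An empty result corresponds to the case where no node of a preferred tier exists; per the description above, the index is then not left unassigned because of this setting.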
Subsequent work will change the ILM migration to make additional use of this setting.
Relates to #60848
This commit changes the yamlRestTest and javaRestTest tasks to be lazily created.
This change requires proactively creating the testClusters container so that the
configuration can be applied without any changes to the build.gradle files.
related: #60261
related: #47804
Backports #61590 to 7.x
So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set
This PR adds a method to the metadata field parsers that exposes whether the field can be included in the document source or not.
This way each metadata field can configure whether it can be included in the document _source.
This commit adjusts the following APIs so that they support not only the `_all` case but also wildcard-patterned IDs.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
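As a rough illustration of what matching an ID against `_all` or a wildcard pattern involves, here is a minimal sketch. `IdPatternMatcher` and `matches` are hypothetical names, and the sketch only assumes that `*` matches any run of characters; it does not reflect how the ML APIs resolve IDs internally.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of `_all` and `*` matching; not the actual ML code. */
final class IdPatternMatcher {

    /** Returns true if the id is selected by the expression (`_all`, a literal id, or a `*` pattern). */
    static boolean matches(String expression, String id) {
        if ("_all".equals(expression)) {
            return true;
        }
        // Quote every literal piece so that only '*' acts as a wildcard.
        String[] parts = expression.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), id);
    }
}
```

Quoting the literal parts keeps characters such as `.` from being interpreted as regular expression syntax.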
We removed index-time boosting back in 5.x, and we no longer document the 'boost'
parameter on any of our mapping types. However, it is still possible to define an
index-time boost on a field mapper for a surprisingly large number of field types, and
they even have an effect (sometimes, on some queries).
As a first step in finally removing all traces of index-time boosting, this commit emits
a deprecation warning whenever a boost parameter is found on a mapping definition.
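The idea of flagging every `boost` parameter can be sketched as a scan over field definitions. Everything here is hypothetical: the class name, the flat `Map` representation of a mapping, and the wording of the warning are assumptions for illustration, and nested fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of detecting a deprecated parameter; not the actual mapper code. */
final class BoostDeprecationCheck {

    /** Returns one warning per top-level field whose definition contains a "boost" parameter. */
    static List<String> warnings(Map<String, Map<String, Object>> properties) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> field : properties.entrySet()) {
            if (field.getValue().containsKey("boost")) {
                // Illustrative message text only.
                warnings.add("Parameter [boost] on field [" + field.getKey() + "] is deprecated");
            }
        }
        return warnings;
    }
}
```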
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
In #62357 we introduced an additional optimization that allows us to skip
most of the fetch phase early if no results are found. This change caused
some cancellation test failures that were relying on definitive cancellation
during the fetch phase. This commit adds an additional quick cancellation
check at the very beginning of the fetch phase to make the cancellation process
more deterministic.
Fixes #62530
Changes the way we collect ordinals in the Cardinality aggregation from Lucene FixedBitSet to BitArray. The benefit is that BitArray is tracked by our circuit breakers, so it is safer.
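To illustrate why a tracked bit set is safer, here is a minimal sketch of a growable bit array that reserves memory against a budget before it grows. `TrackedBitArray` and `ByteBudget` are hypothetical stand-ins, not the Elasticsearch `BitArray` or circuit breaker classes.

```java
import java.util.Arrays;

/** Hypothetical illustration of memory-accounted bit storage; not the actual BitArray. */
final class TrackedBitArray {

    /** Stand-in for a circuit breaker: refuses reservations beyond a byte limit. */
    static final class ByteBudget {
        private final long limit;
        private long used;

        ByteBudget(long limit) {
            this.limit = limit;
        }

        void reserve(long bytes) {
            if (used + bytes > limit) {
                throw new IllegalStateException("would exceed budget of " + limit + " bytes");
            }
            used += bytes;
        }

        long used() {
            return used;
        }
    }

    private final ByteBudget budget;
    private long[] words = new long[0];

    TrackedBitArray(ByteBudget budget) {
        this.budget = budget;
    }

    void set(int index) {
        int word = index >>> 6; // 64 bits per long
        if (word >= words.length) {
            int newLength = word + 1;
            // Account for the extra memory before allocating it.
            budget.reserve((long) (newLength - words.length) * Long.BYTES);
            words = Arrays.copyOf(words, newLength);
        }
        words[word] |= 1L << index; // a long shift uses only the low 6 bits of index
    }

    boolean get(int index) {
        int word = index >>> 6;
        return word < words.length && (words[word] & (1L << index)) != 0;
    }
}
```

An untracked bit set would simply allocate; here a request that would exceed the budget fails before any memory is allocated.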
Today when a snapshot restore is aborted (for example when the index is
explicitly deleted) while the restoration of the files from the repository has
already started, the file restores are not interrupted. This means that Elasticsearch
will continue to read the files from the repository and write
them to disk until all files are restored; the store will then be closed and
the files will be deleted from disk at some point, but this can take a while. The
ongoing restores also occupy slots in the SNAPSHOT thread pool. The Recovery
API won't show any files actively being recovered; the only notable
indicator would be the active threads in the SNAPSHOT thread pool.
This commit adds a check before reading a file to restore and before
writing bytes to disk so that a closing store can be detected more
quickly and the file recovery process aborted. This way the file
restores simply stop, and for most repository implementations
this means that no more bytes are read (see #62370 for S3), which also frees up
threads in the SNAPSHOT thread pool more quickly.
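The shape of such a check can be sketched as a copy loop that consults a "store is closing" flag before every read and every write. This is an illustration under assumptions: `AbortableCopy`, its `copy` method and the `BooleanSupplier` flag are hypothetical and are not taken from the actual restore code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of an abortable file copy; not the actual restore code. */
final class AbortableCopy {

    /** Copies in to out, failing fast as soon as storeClosing reports true. */
    static long copy(InputStream in, OutputStream out, BooleanSupplier storeClosing, int bufferSize)
        throws IOException {
        byte[] buffer = new byte[bufferSize];
        long copied = 0;
        while (true) {
            ensureOpen(storeClosing); // check before reading from the repository
            int read = in.read(buffer);
            if (read == -1) {
                return copied;
            }
            ensureOpen(storeClosing); // check before writing bytes to disk
            out.write(buffer, 0, read);
            copied += read;
        }
    }

    private static void ensureOpen(BooleanSupplier storeClosing) throws IOException {
        if (storeClosing.getAsBoolean()) {
            throw new IOException("store is closing, aborting file restore");
        }
    }
}
```

Because the flag is checked once per buffer, an abort is noticed after at most one more read or write instead of after the whole file.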
Async search tests can take more than one minute due to the excessive trace logs,
and the point in time used in the tests can expire midway.
Closes #62451
This test (in part) verifies that snapshot creation is not
retried on master fail-over once a snapshot has already been started.
Unless we wait for the snapshot creation to show up in the cluster
state before failing the master node, though, we could run into a
race where the snapshot wasn't yet in the cluster state and a retry goes through
successfully.
A date formatter can be created with a list of parsers, which are iterated
during parsing; the first one that succeeds returns the parsed date.
DateMathParser should do the same: when created from a list of
non-rounding parsers, it should also iterate over all of them. At
the moment it only uses the first element.
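The "try each parser in turn" behaviour can be sketched with plain `java.time` formatters. `MultiFormatParser` is a hypothetical name, and the sketch parses to `LocalDate` for brevity; it is not the Elasticsearch `DateFormatter` or `DateMathParser` implementation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Hypothetical illustration of iterating over a list of parsers; not the actual DateMathParser. */
final class MultiFormatParser {

    /** Returns the result of the first parser that accepts the text. */
    static LocalDate parse(String text, List<DateTimeFormatter> parsers) {
        DateTimeParseException failure = null;
        for (DateTimeFormatter parser : parsers) {
            try {
                return LocalDate.parse(text, parser);
            } catch (DateTimeParseException e) {
                // Remember the first failure and attach later ones to it.
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure == null) {
            throw new IllegalArgumentException("no parsers configured");
        }
        throw failure;
    }
}
```

Using only the first element of the list, as described above, would reject any input that matches a later format.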
Closes #62207
The underlying issue was fixed a while ago in Lucene:
https://issues.apache.org/jira/browse/LUCENE-9517
and went away when the Lucene snapshot version was upgraded.
Also the name of the index to rollover had to be slightly changed,
so that it doesn't collide with data stream template's namespace.
(a regular index can't be created in the namespace that is managed
by a template that creates data streams)
Closes #62043
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4, which can lead to confusing
results. Since such expressions don't make much sense for EQL
filters, we disallow them in the parser to prevent unexpected results
from their misuse.
Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax (a counterexample is MySQL, which interprets them as we did
before, with leftmost priority).
Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
The purpose of this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.
Fix #62494
(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
A recent AWS SDK upgrade has introduced a new source of spurious `WARN`
logs when the security manager prevents access to the user's home
directory and therefore to `$HOME/.aws/config`. This is the behaviour we
want, and it's harmless and handled by the SDK as if the config doesn't
exist, so this log message is unnecessary noise. This commit suppresses
this noisy logging by default.
Relates #20313, #56346, #53962
Closes #62493
Removes the unnecessary `synchronized` introduced in #62433 and adjusts
the others to return `this` not `null` as required by the parent
method's Javadocs.
This commit addresses some build failures from the perspective of IntelliJ.
These changes include:
* changing the order of a dependency definition that seems to cause the IntelliJ build to fail.
* introduction of an abstract class out of the test source set (there seems to be an issue sharing
classes across projects with non-standard source sets).
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
Removes methods relating to version 5.4 doc ids of ModelState that are no longer used.
Also adds cleanup of 5.4 model state and quantile docs to the daily maintenance.
Backport of #62434
Backport of #62527 to 7.x branch.
This commit adds validation that prohibits the creation of regular indices
in the namespace of templates with data streams enabled.
It shouldn't be possible to create ordinary indices when the name of the index
matches a composable index template that enables data streams. Auto creation
has logic that creates data streams instead of regular indices. However, validation
logic for the create index API was missing.
Faster sequential access for stored fields
Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
We make no effort to keep state that would avoid decompressing the same data
multiple times when two documents are in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be in the same block.
However, we have some use cases where the documents to retrieve might be
completely sequential:
* Scrolls or normal searches sorted by document id.
* Queries on runtime fields that extract from _source.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows us to leverage the merge instances of stored fields readers, which
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow-up.
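The "all documents to retrieve are adjacent" condition can be sketched as a constant-time check on sorted doc ids. `SequentialDocs` and `areAdjacent` are hypothetical names, and the sketch assumes the ids are sorted in ascending order without duplicates; the real fetch phase may apply further conditions.

```java
/** Hypothetical illustration of the adjacency check; not the actual fetch phase code. */
final class SequentialDocs {

    /**
     * Returns true if the given doc ids form one contiguous run.
     * Assumption: sortedDocIds is sorted ascending and contains no duplicates,
     * so the run is contiguous exactly when last - first == length - 1.
     */
    static boolean areAdjacent(int[] sortedDocIds) {
        int n = sortedDocIds.length;
        return n > 0 && sortedDocIds[n - 1] - sortedDocIds[0] == n - 1;
    }
}
```

In this sketch a single document counts as adjacent and an empty array does not; both are choices made for the illustration.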
The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:
| Metric | Task | Baseline | Contender | Diff | Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
| Total Young Gen GC | | 0.199 | 0.231 | 0.032 | s |
| Total Old Gen GC | | 0 | 0 | 0 | s |
| Store size | | 17.9704 | 17.9704 | 0 | GB |
| Translog size | | 2.04891e-06 | 2.04891e-06 | 0 | GB |
| Heap used for segments | | 0.820332 | 0.820332 | 0 | MB |
| Heap used for doc values | | 0.113979 | 0.113979 | 0 | MB |
| Heap used for terms | | 0.37973 | 0.37973 | 0 | MB |
| Heap used for norms | | 0.03302 | 0.03302 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.293602 | 0.293602 | 0 | MB |
| Segment count | | 541 | 541 | 0 | |
| Min Throughput | scroll | 12.7872 | 12.8747 | 0.08758 | pages/s |
| Median Throughput | scroll | 12.9679 | 13.0556 | 0.08776 | pages/s |
| Max Throughput | scroll | 13.4001 | 13.5705 | 0.17046 | pages/s |
| 50th percentile latency | scroll | 524.966 | 251.396 | -273.57 | ms |
| 90th percentile latency | scroll | 577.593 | 271.066 | -306.527 | ms |
| 100th percentile latency | scroll | 664.73 | 272.734 | -391.997 | ms |
| 50th percentile service time | scroll | 522.387 | 248.776 | -273.612 | ms |
| 90th percentile service time | scroll | 573.118 | 267.79 | -305.328 | ms |
| 100th percentile service time | scroll | 660.642 | 268.963 | -391.678 | ms |
| error rate | scroll | 0 | 0 | 0 | % |
Closes #62024