CCS with remote indices only does not require any privileges on the local cluster.
This PR ensures that searches with scroll follow the permission model.
This allows the `check-migration` step to move past the allocation check
if the tier routing settings are manually unset.
This helps a user unblock ILM in case a tier is removed (i.e. if the warm tier
is decommissioned, this allows users to resume the ILM policies stuck in
`check-migration` waiting for the warm nodes to become available and the managed
index to allocate; the index can then allocate on the other available tiers).
(cherry picked from commit d7a1eaa7f51d0972d10c0df1d3cd77d6b755dd41)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Implement FORMAT according to the SQL Server spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver15#ExampleD by translating to the java.time patterns used in DATETIME_FORMAT.
Closes: #54965
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit da511f4e033db6e8a6aa2a54b23e906b5e026845)
As part of the conversion, adds the ability to customize merge validation - in this case, we
allow an update to the constant value if it is currently set to null, but refuse further
updates once it has been set.
This commit also converts ParametrizedMapperTests to use MapperServiceTestCase.
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similarly to a
'keyword' field but allows efficient sorting and range queries that take into
account the special ordering needed for version strings. For example, the main
version parts are sorted numerically (i.e. 2.0.0 < 11.0.0), whereas this isn't
possible with 'keyword' fields today.
Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consisting of
major.minor.patch, we allow fewer or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.
Relates to #48878
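To make the ordering difference concrete, here is a minimal sketch that compares only the dot-separated numeric identifiers of a version, identifier by identifier. It is not the field type's actual encoding: `VersionOrdering` is a hypothetical name, semver pre-release and build metadata are ignored, and treating a shorter prefix ("1.2") as sorting before a longer one ("1.2.0") is an assumption of the sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not the field type's real implementation): orders dot-separated
 * numeric versions such as "1.2", "2.0.0" or "1.4.6.123.12" by their numeric identifiers.
 */
final class VersionOrdering {
    private VersionOrdering() {}

    /** Splits "1.4.6" into [1, 4, 6]; any number of numeric identifiers is accepted. */
    static long[] parse(String version) {
        return Arrays.stream(version.split("\\."))
            .mapToLong(Long::parseLong)
            .toArray();
    }

    /**
     * Compares identifier by identifier, numerically. If one version is a prefix of the
     * other, the shorter one sorts first (an assumption of this sketch).
     */
    static int compare(String a, String b) {
        long[] x = parse(a);
        long[] y = parse(b);
        int common = Math.min(x.length, y.length);
        for (int i = 0; i < common; i++) {
            int c = Long.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }
}
```

With this ordering `VersionOrdering.compare("2.0.0", "11.0.0")` is negative, whereas the plain string comparison `"2.0.0".compareTo("11.0.0")` is positive because `'2'` sorts after `'1'`, which is the 'keyword' behaviour described above.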
This commit adds a test that verifies that snapshot incrementality
is respected when a snapshot-backed index is snapshotted. This
test mounts a snapshot as a snapshot-backed index, creates a
new snapshot from it and then verifies that no new data blobs
were added to the repository.
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.
The decision API now returns both tier-level and node-level required
and current capacity, as well as a decider-level breakdown of the
same, although current memory is still not populated.
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:
```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```
If the cluster contains any nodes with the `data_hot` role, the decider will only allow the index to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.
This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
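The preference resolution described above can be sketched as "walk the list in order and take the first tier that has at least one node". This is a simplified illustration, not the `DataTierAllocationDecider` code: `TierPreference`, `resolve` and `canAllocate` are hypothetical names, and what happens when none of the listed tiers has nodes is deliberately left to the caller.

```java
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch: picks the tier that an "index.routing.allocation.prefer._tier" value
 * would resolve to, given the data tier roles present on the cluster's nodes.
 */
final class TierPreference {
    private TierPreference() {}

    /** Returns the first tier in the comma-separated list that has at least one node. */
    static Optional<String> resolve(String preference, Set<String> tierRolesInCluster) {
        for (String tier : preference.split(",")) {
            String candidate = tier.trim();
            if (tierRolesInCluster.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No preferred tier has nodes; how that case is handled is left to the caller here.
        return Optional.empty();
    }

    /** A node may hold the index only if it has the resolved tier's role. */
    static boolean canAllocate(String preference, Set<String> tierRolesInCluster, Set<String> nodeRoles) {
        return resolve(preference, tierRolesInCluster).map(nodeRoles::contains).orElse(false);
    }
}
```

For `"data_hot,data_warm,data_content"` on a cluster with hot, warm and content nodes, only `data_hot` nodes are eligible; remove the hot nodes and `data_warm` becomes the resolved tier.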
Subsequent work will change the ILM migration to make additional use of this setting.
Relates to #60848
Backports #61590 to 7.x
So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339), we want to be able to set it in the document _source.
This PR adds a method to the metadata field parsers that exposes whether the field can be included in the document source or not.
This way, each metadata field can configure whether it can be included in the document _source.
This commit adjusts the following APIs so that they support wildcard-patterned IDs in addition to the `_all` case.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds a new `include` flag to the get trained models API.
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
Async search tests can take more than one minute because of excessive trace logging,
so the point in time used in the tests can expire midway through.
Closes #62451
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4 which can lead to confusing
results. Since such expressions don't make much sense for EQL
filters, we disallow them in the parser to prevent unexpected results
from their misuse.
Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax (a counterexample is MySQL, which interprets them as we did
before, with leftmost priority).
Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.
Fix #62494
(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
This commit addresses some build failures from the perspective of IntelliJ.
These changes include:
* changing the order of a dependency definition that seems to cause the IntelliJ build to fail.
* introduction of an abstract class out of the test source set (there seems to be an issue sharing
classes across projects with non-standard source sets).
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
Removes methods related to the version 5.4 doc IDs of ModelState that were no longer used.
Also adds cleanup of 5.4 model state and quantile docs to the daily maintenance.
Backport of #62434
Backport of #62527 to 7.x branch.
This commit adds validation that prohibits the creation of regular indices
in the namespace of templates with data streams enabled.
It shouldn't be possible to create ordinary indices when the name of the index
matches a composable index template that enables data streams. Auto-creation
has logic that creates data streams instead of regular indices. However, the validation
logic for the create index API was missing.
Faster sequential access for stored fields
Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
We make no effort to keep state that would avoid decompressing the same data
multiple times when two documents are in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the documents to retrieve might be
completely sequential:
* Scrolls or normal searches sorted by document id.
* Queries on runtime fields that extract from _source.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
This allows us to leverage the merge instances of stored fields readers that
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.
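The "all documents to retrieve are adjacent" condition above boils down to checking that the requested doc IDs form one gap-free run. The sketch below shows only that check; `SequentialAccess` and `allAdjacent` are hypothetical names, and how the fetch phase then chooses between the sequential and the random-access reader is not modelled here.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: decides whether the doc IDs requested by the fetch phase form a
 * single contiguous run, in which case a sequential (merge-style) reader could be used.
 */
final class SequentialAccess {
    private SequentialAccess() {}

    /** True when the sorted doc IDs have no gaps, e.g. [10, 11, 12, 13]. */
    static boolean allAdjacent(int[] docIds) {
        int[] sorted = docIds.clone();
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1] + 1) {
                return false; // a gap or a duplicate: fall back to random access
            }
        }
        return true;
    }
}
```

A scroll page sorted by `_doc` typically requests IDs like `[10, 11, 12, 13]`, which passes the check, while a top-N-by-score page such as `[10, 12, 13]` does not.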
The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:
| Metric | Task | Baseline | Contender | Diff | Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
| Total Young Gen GC | | 0.199 | 0.231 | 0.032 | s |
| Total Old Gen GC | | 0 | 0 | 0 | s |
| Store size | | 17.9704 | 17.9704 | 0 | GB |
| Translog size | | 2.04891e-06 | 2.04891e-06 | 0 | GB |
| Heap used for segments | | 0.820332 | 0.820332 | 0 | MB |
| Heap used for doc values | | 0.113979 | 0.113979 | 0 | MB |
| Heap used for terms | | 0.37973 | 0.37973 | 0 | MB |
| Heap used for norms | | 0.03302 | 0.03302 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0.293602 | 0.293602 | 0 | MB |
| Segment count | | 541 | 541 | 0 | |
| Min Throughput | scroll | 12.7872 | 12.8747 | 0.08758 | pages/s |
| Median Throughput | scroll | 12.9679 | 13.0556 | 0.08776 | pages/s |
| Max Throughput | scroll | 13.4001 | 13.5705 | 0.17046 | pages/s |
| 50th percentile latency | scroll | 524.966 | 251.396 | -273.57 | ms |
| 90th percentile latency | scroll | 577.593 | 271.066 | -306.527 | ms |
| 100th percentile latency | scroll | 664.73 | 272.734 | -391.997 | ms |
| 50th percentile service time | scroll | 522.387 | 248.776 | -273.612 | ms |
| 90th percentile service time | scroll | 573.118 | 267.79 | -305.328 | ms |
| 100th percentile service time | scroll | 660.642 | 268.963 | -391.678 | ms |
| error rate | scroll | 0 | 0 | 0 | % |
Closes #62024
FetchSubPhase#getProcessor currently takes a SearchLookup parameter. This
however is only needed by a couple of subphases, and will almost certainly change in
the future as we want to simplify how fetch phases retrieve values for individual hits.
To future-proof against further signature changes, this commit moves the SearchLookup
reference into FetchContext instead.
The data frame structure in C++ has a limit of 2^32 documents. This commit
adds a check that the number of documents involved in the analysis is
less than that, and fails to start otherwise. This saves the cost of
reindexing when the analysis could not run anyway.
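A pre-flight check of this kind can be sketched as a comparison against 2^32 before any reindexing starts. `DataFrameSizeCheck`, `ensureWithinLimit` and the error message are hypothetical; only the 2^32 bound and the "strictly less than" condition come from the description above.

```java
/** Hypothetical pre-flight check mirroring the 2^32 document limit described above. */
final class DataFrameSizeCheck {
    /** 2^32; the number of documents in the analysis must be strictly below this. */
    static final long MAX_DOCS = 1L << 32;

    private DataFrameSizeCheck() {}

    /** Throws before the job starts (and before any reindexing) if the limit is reached. */
    static void ensureWithinLimit(long docCount) {
        if (docCount >= MAX_DOCS) {
            throw new IllegalArgumentException(
                "analysis involves [" + docCount + "] documents; must be less than [" + MAX_DOCS + "]");
        }
    }
}
```

Using a `long` for the count matters here: 2^32 does not fit in a Java `int`, so an `int`-based check could never detect the overflow case.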
Backport of #62547
This adds ILM support for automatically migrating the managed
indices between data tiers.
This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless of whether it's
enabled or disabled).
(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
ReplaceDataStreamBackingIndexStep#performAction seems to perform an equality
check on an original Index and the write indexes names, but because this
compares an Index instance to a String, the condition can never be met. This PR
fixes the comparison.
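The bug class is easy to reproduce in plain Java, because `Object#equals(Object)` happily accepts an argument of any type. The sketch below uses a hypothetical `IndexRef` stand-in rather than Elasticsearch's `Index` class, and comparing names is one plausible fix, not necessarily the exact change made in this PR.

```java
/** Hypothetical stand-in for an index handle that carries a name. */
final class IndexRef {
    private final String name;

    IndexRef(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
    // equals is inherited from Object here; even a real name/UUID-based equals
    // would never consider an IndexRef equal to a String.
}

final class EqualityBugDemo {
    private EqualityBugDemo() {}

    /** Buggy: compiles, but compares an IndexRef to a String, so it is always false. */
    static boolean isWriteIndexBuggy(IndexRef original, String writeIndexName) {
        return original.equals(writeIndexName);
    }

    /** Fixed: compares a String to a String. */
    static boolean isWriteIndexFixed(IndexRef original, String writeIndexName) {
        return original.getName().equals(writeIndexName);
    }
}
```

Static analysis tools commonly flag `equals` calls between unrelated types for exactly this reason.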
ShardClusterSnapshotRestoreIT is confusing as we already have a
very complete SharedClusterSnapshotRestoreIT test suite. This
commit removes ShardClusterSnapshotRestoreIT and folds its
unique test into DataStreamsSnapshotsIT.
Since #61857 we test using BCJSSE (Bouncy Castle SSL) when running on
Zulu8 because Azul have backported SSL changes from Java11 into their
Java8 JRE which prevents us from using Sun JSSE in FIPS mode.
BCJSSE uses different exception messages than Sun JSSE, so we needed
to update RestrictedTrustManagerTests.testThatDelegateTrustManagerIsRespected
to reflect the fact that we might sometimes receive BCJSSE error
messages on a Java8 JVM.
Resolves: #62281
The usage of single quotes to wrap a string literal is forbidden,
and an error encouraging the user to use double quotes is returned.
Tests are adjusted accordingly.
Relates to #61659
(cherry picked from commit 8be400b77370bf4cf68c89f492c2d235f3cce43c)
Current implementations of the indexer are using aggregations.
Thus each search step executes a search action. However,
we can generalize that to allow for any action that returns a `SearchResponse`.
This commit abstracts the search phase from the search action.
Backport of #61739
This new snapshot contains the following JIRAs that we're interested in:
- [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
- [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
With this commit we rename all of the fielddata, doc_values and mapped field type classes for runtime fields so that they no longer start with the Script prefix, but with their runtime type (e.g. Boolean) followed by Script.
To preserve the PIT semantics, the retrieval of results has moved from
using multi-get to using an idsQuery.
(cherry picked from commit 1c2362fcf2be62ce568b3772924abce7331ef23c)
With this commit we rename the script classes used for each mapped field type used for runtime fields. The new naming is a shorter version of the previous one: from e.g. BooleanScriptFieldScript to BooleanScript. We also move such classes to the existing mapper package.
Constructing the timeout checker FIRST and THEN registering the watcher introduces a race condition in the test.
The timeout value could be reached BEFORE the matcher is added. To ensure the matcher is always interrupted, a new timedOut value is added to the watcher thread entry. Then, when a new matcher is registered, if the thread has previously timed out, we interrupt the matcher immediately.
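The fix described above can be sketched as a per-thread entry that records both the registered matcher and whether the timeout already fired, so whichever happens second performs the interrupt. `WatchdogSketch`, `Interruptible` and the method names are hypothetical; the real watchdog's API and threading model may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for the matcher being watched. */
interface Interruptible {
    void interrupt();
}

/** Hypothetical watchdog: registration order no longer decides whether the matcher is interrupted. */
final class WatchdogSketch {
    private static final class Entry {
        Interruptible matcher; // null until the matcher is registered
        boolean timedOut;      // set once the timeout has fired
    }

    private final ConcurrentMap<Long, Entry> entries = new ConcurrentHashMap<>();

    /** Called when the timeout checker is constructed for a thread. */
    void start(long threadId) {
        entries.put(threadId, new Entry());
    }

    /** Called by the scheduler when the timeout elapses. */
    void onTimeout(long threadId) {
        Entry entry = entries.get(threadId);
        if (entry == null) {
            return;
        }
        synchronized (entry) {
            entry.timedOut = true;
            if (entry.matcher != null) {
                entry.matcher.interrupt();
            }
        }
    }

    /** Called when the matcher is registered; interrupts at once if the timeout already fired. */
    void register(long threadId, Interruptible matcher) {
        Entry entry = entries.get(threadId);
        if (entry == null) {
            return;
        }
        synchronized (entry) {
            entry.matcher = matcher;
            if (entry.timedOut) {
                matcher.interrupt();
            }
        }
    }
}
```

Synchronizing on the entry makes "set the flag, then check for a matcher" and "set the matcher, then check the flag" mutually exclusive, which is what closes the race.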
closes #48861
This commit unmutes the windows check for testTooManyPartitions test.
The assertion has since changed to include a soft_limit check.
This, coupled with changes over the past years, means the test should be enabled again.
related to: #32033
With the addition of sub aggregations like filter, the validation could fail if 2 sub aggs use the
same output name. This change makes validation sub-agg aware.
fixes #57814
The OpenID Connect specification defines a number of ways for a
client (RP) to authenticate itself to the OP when accessing the
Token Endpoint. We currently only support `client_secret_basic`.
This change introduces support for 2 additional authentication
methods, namely `client_secret_post` (where the client credentials
are passed in the body of the POST request to the OP) and
`client_secret_jwt` (where the client constructs a JWT and signs
it using the client secret as a key).
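For readers unfamiliar with `client_secret_jwt`, the assertion is an HS256-signed JWT keyed with the client secret, whose `iss` and `sub` are the client ID and whose `aud` is the OP's Token Endpoint. The sketch below builds one with only JDK classes. `ClientSecretJwt` is a hypothetical name, the 60-second lifetime is an arbitrary choice, and the JSON is assembled by hand, so it assumes inputs that need no JSON escaping; it is not the realm's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.UUID;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of a client_secret_jwt assertion: an HS256 JWT keyed with the client secret. */
final class ClientSecretJwt {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();

    private ClientSecretJwt() {}

    static String b64(String json) {
        return B64URL.encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }

    /** HMAC-SHA256 over the signing input, base64url-encoded without padding. */
    static String sign(String clientSecret, String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(clientSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return B64URL.encodeToString(mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** iss and sub are the client ID, aud is the Token Endpoint; valid for 60 seconds. */
    static String build(String clientId, String clientSecret, String tokenEndpoint, long nowEpochSeconds) {
        String header = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";
        String claims = "{\"iss\":\"" + clientId + "\",\"sub\":\"" + clientId + "\""
            + ",\"aud\":\"" + tokenEndpoint + "\",\"jti\":\"" + UUID.randomUUID() + "\""
            + ",\"iat\":" + nowEpochSeconds + ",\"exp\":" + (nowEpochSeconds + 60) + "}";
        String signingInput = b64(header) + "." + b64(claims);
        return signingInput + "." + sign(clientSecret, signingInput);
    }
}
```

The RP then sends the result in the POST body as `client_assertion`, together with `client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer`; the OP recomputes the HMAC with its copy of the secret, which is why it must be able to validate the `aud` value it receives.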
Support for the above, and especially `client_secret_jwt`, in our
integration tests meant that the OP we use (Connect2id server)
had to be able to validate the JWT that we send it from the RP.
Since we run the OP in docker and it listens on an ephemeral port,
we would have no way of knowing the port in order to configure
the ES running via the testcluster with the "correct" Token
Endpoint, and even if we did, this would not be the Token Endpoint
URL that the OP thinks it listens on. To alleviate this, we
run a single-node ES cluster in docker alongside the OP so that
we can configure it with the correct hostname and port within
the docker network.
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.
While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.
Relates to #59332
We were checking for loops in queries before, but we had an "off by one"
error where we wouldn't notice the "top level" runtime field when
detecting a loop. So the error message would be wrong.
I also caught a few bugs with query generation caused by missing
`@Override` annotations and fixed a few of them. There is a bug with
`regexp` queries with match options that I'm not fixing in this PR but
will get to later.
Relates to #59332
Prevent the analyzer from trying to resolve aliases on expressions that
reference themselves (or fields within themselves), as that causes
infinite recursion.
Fix #62296
(cherry picked from commit 021d27815b03e92e02859bc9c0c8eec78f30c72e)
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
out if we're using the optimization for segment ordinals. It adds a few
more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
non-trivial for some aggregations, like cardinality.
We want Logstash indices to be system indices, but the logstash
service will still need to be able to manage its indices. This PR
adds special system index APIs to the logstash plugin so that
logstash can manage its pipelines without direct access to the
underlying indices.
* Add logstash module with dedicated logstash APIs
* merge with x-pack plugin
* add system index access allowance
* Break out serialization tests into distinct classes
* Log failures for partial multiget failure
* Move LogstashSystemIndexIT to javaRestTest task
Co-authored-by: William Brafford <william.brafford@elastic.co>
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.
Co-authored-by: Tim Vernum <tim@adjective.org>
This was missing and caused nodes to drop out of the cluster on serialization failures
whenever one tried to get an enrich policy task by name.
The test in here is a little dirty but I figured it would be nice to have an actual reproducer
for the issue and I couldn't find any infrastructure to nicely time the tasks so I put this on
top of existing test infra.
Use the newly introduced PIT API to have a consistent view of the data
while doing sequence matching, which involves multiple calls (i.e.
repeatable reads), and thus avoid race conditions with in-flight updates
on the data.
(cherry picked from commit daa72fc3c71fd36afb55278021ff6bbc591ef148)
Backport of #62361 to 7.x branch.
This test was fine and shouldn't have been muted.
The test case class should have preserved data streams as part of #62205.
Closes #62210
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves #61665
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/child docs and field collapsing
* The `top_hits` aggregation
Addresses #61949.
The annotations index is not covered by the comparison between
mappings and templates, as it does not use an index template.
This commit adds an assertion on annotations index mappings
that will fail if the mappings are not upgraded as expected.
Backport of #62325
The job comms thread pool is intended for the long-running job
processes that do anomaly detection or data frame analytics and
count towards job count and memory limits.
This commit moves the short-lived memory estimation processes
to the ML utility thread pool.
Although this doesn't matter in most cases, at the limits of
scale it could mean that memory estimations would get in the way
of starting jobs, or would queue up for an excessive period of
time while waiting for jobs to finish.
Similar to the work in #60994, where we introduced the `data_hot`, `data_warm`, etc. node roles, this
commit introduces a new `data_content` node role to be used for the Content tier.
Currently this tier is not used anywhere, but subsequent work will use this tier.
Relates to #60848
When calling `_execute` there is a chance that there will be bulk indexing failures
or search failures.
These will result in the call failing overall, but no information is provided for troubleshooting the failure.
This commit adds logging to indicate the number of failures, and new debug level logging so that
failure details can be determined if necessary.
closes https://github.com/elastic/elasticsearch/issues/60491
This fixes a number of obvious spots where we were allocating
duplicate empty structures or were otherwise inefficient, which I
found while investigating snapshot cluster state update performance.
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.
This API is now superseded by the Repositories Metering API.
It has been observed that if the normalizer process fails
to connect to the JVM then this causes a null pointer
exception as the JVM tries to close the native process
object. The accessors and close methods of the native
process class that access the C++ log handler should not
assume that it connected correctly.
Backport of #62059 to 7.x branch.
Return a 404 HTTP status code when attempting to delete a non-existent data stream.
However, only return a 404 when targeting a data stream without any wildcards.
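The status rule can be sketched as follows. `DeleteDataStreamStatus` is hypothetical, the request is reduced to a single name expression, and returning 200 for a wildcard that matches nothing is an assumption about what "not a 404" means here.

```java
/**
 * Hypothetical sketch of the status rule for deleting a data stream: a concrete name that
 * matches nothing is a 404, while a wildcard expression that matches nothing is assumed
 * to succeed as a no-op.
 */
final class DeleteDataStreamStatus {
    private DeleteDataStreamStatus() {}

    static int statusFor(String requestedName, int matchedDataStreams) {
        boolean hasWildcard = requestedName.contains("*");
        if (matchedDataStreams == 0 && !hasWildcard) {
            return 404; // the caller named a specific data stream that does not exist
        }
        return 200;
    }
}
```

The distinction keeps wildcard deletes idempotent (e.g. cleanup of `logs-*` in tests), while a typo in a concrete name is still reported.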
Closes #62022
This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before point in time APIs were backported.
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.
Closes #61627
This change makes sure that the reader context is validated (`SearchOperationListener#validateReaderContext`)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.
Relates #61446
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes an `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` has
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
CCR shard follow task can hit CircuitBreakingException on the leader
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
We were missing a few `@Override` annotations in runtime fields which
let us drift from the methods we were supposed to override. Oops. This
adds them and links the methods.
For runtime fields we have written quite a few Lucene queries that work against runtime values that are the result of executing the different script contexts that runtime fields support.
All but one of them share the same main logic: use a two-phase iterator, iterate over all documents, and decide whether the current doc matches based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats
These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
Backporting #62205 to 7.x branch.
This is similar to what happens for indices. Initially we decided to let each test clean up the
data streams it created.
The reason behind this was that client yaml test runners would need to be modified to do this too, and
because data streams were new, we held off on that and let each test clean up the data streams it created.
However, we sometimes have very hard to debug test failures, because many tests fail when another test
failed midway and didn't clean up the data streams it created. Given that, and since data streams have existed in
the code base for a while now, we should automatically delete all data streams after each yaml test.
Relates to #62190
* preserve data streams for rolling upgrade yaml tests
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
This prevents `keyword` valued runtime scripts from emitting too many
values or values that take up too much space. Without this, you can
allocate a ton of memory with the script by sticking it into a tight
loop. Painless has some protections against this, but:
1. I don't want to rely on them out of sheer paranoia
2. They don't really kick in when the script uses callbacks like we do
anyway.
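A guard of this kind can be sketched as a collector that fails fast once either a value-count or a total-size budget is exceeded. `BoundedKeywordCollector` is hypothetical and its limits are illustrative constructor arguments, not the values Elasticsearch uses; size is approximated by `String#length()` (UTF-16 code units).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical collector for values emitted by a keyword runtime script. The limits are
 * illustrative constructor arguments, not the values Elasticsearch uses.
 */
final class BoundedKeywordCollector {
    private final int maxValues;
    private final long maxChars;
    private final List<String> values = new ArrayList<>();
    private long totalChars;

    BoundedKeywordCollector(int maxValues, long maxChars) {
        this.maxValues = maxValues;
        this.maxChars = maxChars;
    }

    /** Called for each value the script emits; fails fast instead of growing without bound. */
    void emit(String value) {
        if (values.size() >= maxValues) {
            throw new IllegalArgumentException("script emitted more than [" + maxValues + "] values");
        }
        totalChars += value.length();
        if (totalChars > maxChars) {
            throw new IllegalArgumentException("script emitted more than [" + maxChars + "] characters");
        }
        values.add(value);
    }

    List<String> values() {
        return values;
    }
}
```

Because the check sits in the emit callback itself, it applies even when the script's loop never returns control to Painless's own loop counters.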
Relates to #59332
* [ML] only persist progress if it has changed
We already search for the previously stored progress document.
For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and the current progress.
If the progress has changed, persistence continues as normal.
When a tree model is provided, it is possible that it is a stump,
meaning it only has one node with no splits.
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.
This commit adds validation that, if there is more than one node,
the feature_names in the model are non-empty.
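The rule reduces to one condition on the node count and the feature list. `TreeValidation` and its error message are hypothetical; the real check lives in the tree model's validation and may be phrased differently.

```java
import java.util.List;

/** Hypothetical validation mirroring the rule described above for tree models. */
final class TreeValidation {
    private TreeValidation() {}

    /**
     * A single-node tree (a stump, with no splits) may have no feature names;
     * any tree with more than one node must name at least one feature.
     */
    static void validate(int nodeCount, List<String> featureNames) {
        if (nodeCount > 1 && featureNames.isEmpty()) {
            throw new IllegalArgumentException(
                "[feature_names] must not be empty for a tree with [" + nodeCount + "] nodes");
        }
    }
}
```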
closes #60759
Fetch failures are currently tracked by AsyncSearchTask like ordinary shard failures. However, they should be treated differently, or they end up causing weird scenarios like total=num_shards and successful=num_shards (as the query phase ran fine), yet the failed count reflects the number of shards where fetch failed.
Given that partial results only include aggs for now and are complete even if fetch fails, we can ignore fetch failures in async search, as they will be included in the response anyway. They are in fact either received as a failure when all shards fail during fetch, or as part of the final response when only some shards fail during fetch.
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.
With this commit we address that, while also making sure that the description highlights that the task originates from an async search.
Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
Wildcard field bug fix for term and prefix queries.
We now escape any * or ? characters in the search string before delegating to the main wildcardQuery() method.
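The escaping step can be sketched as prefixing each wildcard metacharacter with a backslash, which is Lucene's `WildcardQuery` escape character. `WildcardEscaping` is hypothetical. Note that this sketch also escapes the backslash itself so that a literal `\` in the term cannot escape the following character; the description above only mentions `*` and `?`.

```java
/** Hypothetical helper: turns a literal term into a wildcard pattern that matches only itself. */
final class WildcardEscaping {
    private WildcardEscaping() {}

    /** Escapes '*', '?' and (an addition in this sketch) the '\' escape character itself. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** A prefix query becomes the escaped prefix followed by one unescaped '*'. */
    static String prefixPattern(String prefix) {
        return escape(prefix) + "*";
    }
}
```

With this, a `term` query for `a*b` only matches the literal string `a*b`, and a `prefix` query for `foo?` becomes the pattern `foo\?*`.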
Closes #62081
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams, so this commit adjusts the API
to use plain streams and adds the wrapping where necessary.
Since join keys are common across all queries in a Join/Sequence, any
constraint applied on one query needs to be obeyed by all the other
queries.
This PR enhances the optimizer to propagate such constraints across
all queries so they get pushed down to the actual generated ES queries.
Fix #58937
(cherry picked from commit 4afa5debc199c132c07015bfae17952c40a21e5d)
Previous work has been done to prevent automatically creating a concrete index when an alias is desired.
This commit addresses a path where this check was not being done.
relates: #62064
* Update create-api-keys.asciidoc
* Adding note to create API keys for https
* Adding note for enabling TLS
* Add specific setting for ssl.enabled
* Incorporating review feedback
- Adds missing mappings for `alpha`, `gamma`, and `lambda`.
- Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`.
- Removes unused `regularization_depth_penalty_multiplier`,
`regularization_leaf_weight_penalty_multiplier` and
`regularization_tree_size_penalty_multiplier`.
Backport of #61980
Now that #61324 is merged it is possible for the find_file_structure
endpoint to suggest using date_nanos fields for timestamps where
the timestamp format provides greater than millisecond accuracy.
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.
In order to avoid losing data, the repository statistics are archived for a configurable retention
period (`repositories.stats.archive.retention_period`) after the repository is closed. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.
Backport of #60371
At the end of the rolling upgrade tests check the mappings of the concrete
.ml and .transform-internal indices match the mappings in the templates.
When the templates change, the tests should prove that the mappings have
been updated in the new cluster.
This change moves watcher, ILM history and SLM history templates to composable templates.
Versions are updated to reflect the switch. The only change to the templates themselves is the addition of `_meta` to mark them as managed.
* The query client uses an array of indices instead of the comma separated
version of the indices names
(cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime field implementations (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.
This also translates to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper.
Relates to #59332
Today, the terms aggregation reduces multiple aggregations at once using a map
to group buckets with the same key together. This operation can be costly since it requires
looking up every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).
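As a rough illustration of the merge-sort strategy, here is a minimal Java sketch assuming each input list of buckets is already sorted by term and holds unique keys. `BucketMerge` and its `Bucket` class are hypothetical stand-ins, not the actual InternalTerms classes; because equal terms come out of the priority queue next to each other, their counts can be summed without a global map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-shard bucket lists that are sorted by key. */
final class BucketMerge {

    /** Illustrative bucket: a term and its document count (not the real InternalTerms.Bucket). */
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    /** Cursor into one shard's sorted list, ordered by the key it currently points at. */
    private static final class Cursor {
        final List<Bucket> buckets;
        int pos;

        Cursor(List<Bucket> buckets) {
            this.buckets = buckets;
        }

        Bucket current() {
            return buckets.get(pos);
        }
    }

    /** Merges lists that are each sorted by key, summing the doc counts of equal keys. */
    static List<Bucket> reduce(List<List<Bucket>> shardResults) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((Cursor a, Cursor b) -> a.current().key.compareTo(b.current().key));
        for (List<Bucket> shard : shardResults) {
            if (shard.isEmpty() == false) {
                queue.add(new Cursor(shard));
            }
        }
        List<Bucket> merged = new ArrayList<>();
        while (queue.isEmpty() == false) {
            Cursor top = queue.poll();
            Bucket bucket = top.current();
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).key.equals(bucket.key)) {
                // Same key as the previously emitted bucket: fold the count in, no map lookup needed.
                Bucket prev = merged.get(last);
                merged.set(last, new Bucket(prev.key, prev.docCount + bucket.docCount));
            } else {
                merged.add(bucket);
            }
            // Advance this shard's cursor (it was removed from the heap) and re-insert if not exhausted.
            top.pos++;
            if (top.pos < top.buckets.size()) {
                queue.add(top);
            }
        }
        return merged;
    }
}
```

The legacy map-based reduction would remain the fallback whenever an input is not sorted this way.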
Relates #51857
Backport of #61998 to 7.x branch.
Moving the data stream yaml tests to the xpack plugin module has the following benefits:
* The tests are run both with security enabled (as part of xpack/plugin integTest)
and disabled (as part of xpack/plugin/data-stream/qa/rest integTest).
* The tests can also be run in the mixed cluster qa environment.
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.
We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows defining runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated with runtime fields can be compiled and 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.
Co-authored-by: Nik Everett <nik9000@gmail.com>
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also lets the balancer pick the node
to allocate the snapshot-backed shard to (which takes the number of shards on each node into account, unlike the current
implementation, which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
During prewarming of a Lucene file a CacheFile is acquired and
then locked for the duration of the prewarming, i.e. locked until all
parts of the file have been downloaded and written to the cache on
disk. The locking (executed with CacheFile#fileLock()) is there to
prevent the cache file from being evicted while it is prewarming.
But holding the lock may take a while for large files, especially since
restoring snapshot files now respects the
indices.recovery.max_bytes_per_sec setting of 40mb (#58658),
and this can have bad consequences like preventing the CacheFile
from being evicted, opened or closed. In manual tests this bug slows
down various requests like mounting a new searchable snapshot
index or deleting an existing one that is still prewarming.
This commit reduces the time the lock is held during prewarming so
that the read lock is only required when actively writing to the CacheFile.
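A minimal sketch of the narrower locking, assuming a `ReentrantReadWriteLock` where eviction needs the write lock. `PrewarmSketch` and its methods are hypothetical and only model the pattern (download without holding the lock, hold the read lock only while writing each chunk), not the real CacheFile API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a cache file whose read lock is held only while a chunk is written. */
final class PrewarmSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<byte[]> writtenChunks = new ArrayList<>();
    private volatile boolean evicted;

    /** Eviction takes the write lock, so it only has to wait for an in-progress chunk write. */
    boolean tryEvict() {
        if (lock.writeLock().tryLock()) {
            try {
                evicted = true;
                return true;
            } finally {
                lock.writeLock().unlock();
            }
        }
        return false;
    }

    /** "Downloads" each chunk without the lock, then holds the read lock only to write it. */
    int prewarm(List<byte[]> remoteChunks) {
        int written = 0;
        for (byte[] remote : remoteChunks) {
            byte[] downloaded = remote.clone(); // stands in for the slow, throttled download
            lock.readLock().lock();
            try {
                if (evicted) {
                    break; // evicted between two chunks: stop prewarming
                }
                // The read lock is shared by concurrent writers, so the list still needs its own guard.
                synchronized (writtenChunks) {
                    writtenChunks.add(downloaded);
                }
                written++;
            } finally {
                lock.readLock().unlock();
            }
        }
        return written;
    }
}
```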
* [ML] adds new n_gram_encoding custom processor (#61578)
This adds a new `n_gram_encoding` feature processor for analytics and inference.
The focus of this processor is simple ngram encodings that allow:
- multiple ngrams [1..5]
- Prefix, infix, suffix
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.
This commit changes the approach in order to produce a destination
index that allows nested fields.
Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.
The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.
Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.
With these changes, data frame analytics jobs no longer use scroll anywhere.
Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.
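A simplified sketch of the incremental id approach, assuming unique `_seq_no` values (the real change relies on scroll's tie-breaking across indices and shards). `IncrementalIdSketch` is hypothetical: a `TreeMap` keyed by the incremental id stands in for the destination index, and `subMap` plays the role of a ranged search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: assign incremental ids in _seq_no order, then read them back by range. */
final class IncrementalIdSketch {

    /** Illustrative source document: its sequence number and a payload. */
    static final class SourceDoc {
        final long seqNo;
        final String value;

        SourceDoc(long seqNo, String value) {
            this.seqNo = seqNo;
            this.value = value;
        }
    }

    /** "Reindexes" docs sorted by _seq_no so the same doc always receives the same incremental id. */
    static NavigableMap<Long, String> reindex(List<SourceDoc> source) {
        List<SourceDoc> sorted = new ArrayList<>(source);
        sorted.sort((a, b) -> Long.compare(a.seqNo, b.seqNo));
        NavigableMap<Long, String> destination = new TreeMap<>();
        long nextId = 0;
        for (SourceDoc doc : sorted) {
            destination.put(nextId++, doc.value);
        }
        return destination;
    }

    /** Returns docs with incremental id in [from, from + size), like a range query on the id. */
    static List<String> rangeSearch(NavigableMap<Long, String> destination, long from, int size) {
        return new ArrayList<>(destination.subMap(from, true, from + size, false).values());
    }
}
```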
Backport of #61943
When a user authenticates via OpenID Connect we copy information from
the OIDC claims into the user's metadata in a particular format.
This commit adds a test that metadata in that format can be used in a
mustache template for Document Level Security.
Backport of: #60030
A role mapping with the following content:
"rules": { "field": { "userid" : "admin" } }
will never match because `userid` is not a valid field. The correct
field is `username`.
This change adds DEBUG logging when an undefined field is referenced.
The reason for using DEBUG rather than INFO/WARN is that the set of
fields is partially dynamic (e.g. the `metadata.*` fields), so
it may be perfectly reasonable to check a field that is not defined
for that user. For example this rule:
"rules": { "field": { "metadata.ranking" : "A" } }
would generate a log message for an unranked user, which would
erroneously suggest that such a rule is an error.
This DEBUG logging will assist in diagnosing problems, without
introducing that confusion.
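A minimal sketch of the logging behaviour, assuming the user's data is exposed as a flat map of field names to values. `FieldRuleSketch` is hypothetical and uses `java.util.logging` at `FINE` as a stand-in for the Log4j DEBUG level used in Elasticsearch.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a field rule that logs, at a debug-like level, references to undefined fields. */
final class FieldRuleSketch {
    private static final Logger LOGGER = Logger.getLogger(FieldRuleSketch.class.getName());

    /** Returns whether userData holds field with the expected value; undefined fields never match. */
    static boolean matches(Map<String, Object> userData, String field, Object expected) {
        if (userData.containsKey(field) == false) {
            // FINE rather than WARNING: e.g. metadata.ranking may legitimately be absent for a user.
            LOGGER.log(Level.FINE, "role mapping rule references undefined field [{0}]", field);
            return false;
        }
        return expected.equals(userData.get(field));
    }
}
```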
Backport of: #61246
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
There are currently half a dozen ways to add plugins and modules for
test clusters to use. All of them require the calling project to peek
into the plugin or module they want to use to grab its bundlePlugin
task, and then both depend on that task, as well as extract the archive
path the task will produce. This creates cross project dependencies that
are difficult to detect, and if the dependent plugin/module has not yet
been configured, the build will fail because the task does not yet
exist.
This commit makes the plugin and module methods for testclusters
symmetric, and simply adds either a file provider directly or a project
path that will produce the plugin/module zip. Internally this new
variant uses normal configuration/dependencies across projects to get
the zip artifact. It also has the added benefit of no longer needing the
caller to add to the test task a dependsOn for bundlePlugin task.
The main changes are:
* Fix custom params missing when using a template or script in watcher's
logging action or jira action.
* Add yaml tests to test passing params to template or script successfully.
Relates to #57625
Co-authored-by: bellengao <gbl_long@163.com>
The current implementation of the filter pipe is incomplete, which is why
it was reverted. Note this is not a complete revert as some of the
improvements of said commit (such as the PostAnalyzer) are useful in
general.
Relates #61805
(cherry picked from commit 7a7eb66f7d39586c3a3bc00dce49e6c47a23b46a)
Backport of #61904 to 7.x branch.
The eql search api redirects to the search api. For this reason the eql
search api could work with concrete data stream names. However, if security
is enabled and a data stream name pattern with a wildcard was used, then
it could not resolve these expressions. This is because the EqlSearchRequest
class didn't override the `includeDataStreams()` method. This PR fixes this,
so that the security layer can properly expand data stream name wildcard
expressions for the eql search api.
This commit also moves the eql data stream test to xpack rest tests,
so that the test runs with security enabled. This is required to reproduce
the bug.
Closes #60828
We frequently use `long`s with `BitArray` in aggs and right now we have
to assert that the `long` fits in an `int`. This adds support for `long`
to `BitArray` so we don't need those assertions.
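A hypothetical sketch of a bit array addressed by `long` indices. Unlike the real `BitArray`, which is backed by `BigArrays`, this one uses a plain `long[]`, so the word index (the bit index divided by 64) must still fit in an `int`; the point is only that callers can pass a `long` without asserting first.

```java
import java.util.Arrays;

/** Hypothetical sketch of a bit array addressed by long indices (not the real BitArray class). */
final class LongBitArraySketch {
    private long[] words = new long[1];

    /** Each long word stores 64 bits; this sketch requires the word index to fit in an int. */
    private static int wordIndex(long index) {
        return Math.toIntExact(index >>> 6);
    }

    void set(long index) {
        int word = wordIndex(index);
        if (word >= words.length) {
            words = Arrays.copyOf(words, Math.max(word + 1, words.length * 2));
        }
        words[word] |= 1L << (index & 63);
    }

    void clear(long index) {
        int word = wordIndex(index);
        if (word < words.length) {
            words[word] &= ~(1L << (index & 63));
        }
    }

    boolean get(long index) {
        int word = wordIndex(index);
        return word < words.length && (words[word] & (1L << (index & 63))) != 0;
    }
}
```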
This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.
In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.
Closes #61852
Backport of #61868
For half of the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via the test,
yamlRestTest, javaRestTest, or internalClusterTest tasks.
This includes the following projects:
security, spatial, stack, transform, vectors, voting-only-node, and watcher.
A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.
related: #60630
related: #56841
related: #59939
related: #55896
For half of the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via the test,
yamlRestTest, javaRestTest, or internalClusterTest tasks.
This includes the following projects:
async-search, autoscaling, ccr, enrich, eql, frozen-indices,
data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml
A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.
A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is).
related: #61802
related: #56841
related: #59939
related: #55896
While starting the data frame analytics process it is possible
to get an exception before the process crash handler is in place.
In addition, right after starting the process, we check the process
is alive to ensure we capture a failed process. However, those exceptions
are unhandled.
This commit catches any exception thrown while starting the process
and sets the task to failed with the root cause error message.
I have also taken the chance to remove some unused parameters
in `NativeAnalyticsProcessFactory`.
Relates #61704
Backport of #61838
Allow filtering through a pipe, across events and sequences.
Filter pipes are pushed down to base queries.
For now filtering after limit (head/tail) is forbidden as the
semantics are still up for debate.
Fix #59763
(cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)
The ML mappings upgrade test had become useless as it was
checking a field that has been the same since 6.5. This
commit switches to a field that was changed in 7.9.
Additionally, the test only used to check the results index
mappings. This commit also adds checking for the config
index.
Backport of #61340
During a rolling upgrade it is possible that a worker node will be upgraded before
the master in which case the DFA templates will not have been installed.
Before a DFA task starts, we check that the latest template is installed and install it if necessary.
When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly resume from where we left off but instead
we start the job from scratch and have to redo the reindexing
phase.
This commit solves this bug by persisting the progress before
setting the task to failed.
Backport of #61782
@ywangd made an awesome analysis of why this test is failing, over
at https://github.com/elastic/elasticsearch/issues/55816#issuecomment-620913282
This change makes it so that we use the same client to perform a
refresh of a token as we use to subsequently attempt to authenticate
with the refreshed token. This prevents the spurious test failures and is
a good approximation of real-life usage, where we expect the same client
that does the refresh to also perform the subsequent authentication.
The errors we were seeing from users disappeared after #55114,
so we deem our behavior safe.
System indices can be snapshotted and are therefore potential candidates
to be mounted as searchable snapshot indices. As of today nothing
prevents a snapshot from being mounted under an index name starting with `.`,
and this can lead to conflicting situations: searchable snapshot
indices are read-only while Elasticsearch expects some system indices
to be writable, and searchable snapshot indices will soon use an
internal system index (#60522) to speed up recoveries, so we should
prevent that system index from itself being a searchable snapshot index
(which could lead to a deadlock during recovery).
This commit introduces a change to prevent snapshots from being mounted
as a system index.
BlobStoreCacheService implements ClusterStateListener in order to
maintain a ready flag that can be used to know whether the snapshot
blob cache should be queried or not.
Now that the getAsync() method correctly handles the various exceptions
that can be thrown when the .snapshot-blob-cache index is not
available (in isExpectedCacheGetException()) and logs them at DEBUG level,
we can safely remove the ready flag.
This is a minor refactor where the job node load logic (node availability, etc.) is moved into its own class.
This will allow future features (e.g. autoscaling decisions) to use the same node load detection class.
backport of #61521
This commit addresses two issues:
- per class feature importance is now written out for binary classification (logistic regression)
- The `class_name` in per class feature importance now matches what is written in the `top_classes` array.
backport of https://github.com/elastic/elasticsearch/pull/61597
- don't encode the AsyncExecutionId if it is already provided in
the encoded form
- create a new instance of AsyncExecutionId after checks for
correctness are done
If the master node of the follower cluster is busy, then the
auto-follower will fail to initialize the following process. This also
occurs when an auto-follow pattern matches multiple indices. We should
set the timeout of put-follow requests issued by the auto-follower to
unbounded to avoid this problem.
Closes #56891
Backport of #61474.
Part of #46106. Simplify the implementation of deprecation logging by
relying on log4j more completely, and implementing additional behaviour
through custom appenders and filters.
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.
This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.
This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:
```
// Create an index
PUT /eggplant
// Get an index
GET /eggplant?flat_settings
```
Now returns the following default settings:
```json
{
"eggplant" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index.creation_date" : "1597855465598",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "1",
"index.provided_name" : "eggplant",
"index.routing.allocation.include._tier" : "data_hot",
"index.uuid" : "6ySG78s9RWGystRipoBFCA",
"index.version.created" : "8000099"
}
}
}
```
Once this setting has been applied at creation time, it can be treated like any other index-level setting.
This new setting is *not* set on a new index if any of the following is true:
- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)
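The conditions above can be sketched as follows, assuming the requested settings are a plain `Map<String, String>` rather than an Elasticsearch `Settings` object, and that a boolean flag says whether the index is created from an existing source (shrink, clone, split). `DefaultTierSketch` is hypothetical; the real change injects the setting through a plugin hook on the index settings builder.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of deciding whether to default a new index to the hot tier. */
final class DefaultTierSketch {
    static final String TIER_SETTING = "index.routing.allocation.include._tier";

    private static final String[] FILTER_PREFIXES = {
        "index.routing.allocation.include.",
        "index.routing.allocation.exclude.",
        "index.routing.allocation.require."
    };

    /** Returns the settings to create the index with, adding data_hot only when nothing opts out. */
    static Map<String, String> withDefaultTier(Map<String, String> requested, boolean fromResizeSource) {
        Map<String, String> result = new HashMap<>(requested);
        if (fromResizeSource) {
            return result; // shrink, clone, split etc. keep the source index's allocation
        }
        for (String key : requested.keySet()) {
            for (String prefix : FILTER_PREFIXES) {
                if (key.startsWith(prefix)) {
                    // Any explicit allocation filter, including an explicit null _tier, wins.
                    return result;
                }
            }
        }
        result.put(TIER_SETTING, "data_hot");
        return result;
    }
}
```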
Relates to #60848
The check introduced by #60640 for scroll searches, in which we log
if the index access control before the query and fetch phases differs
from when the scroll context is created, is too strict, leading to spurious
warning log messages.
The check verifies instance equality but this assumes that the fetch
phase is executed in the same thread context as the scroll context
validation. However, this is not true if the scroll search is executed
cross-cluster, and even for local scroll searches it is an unfounded assumption.
The check is hence reduced to a null check for the index access.
Verifying that the access control is suitable given the indices that
are actually accessed (by the scroll) will be done in a follow-up,
after we better regulate the creation of index access controls in general.
Runtime fields need to have a SearchLookup available when building their fielddata implementations, so that they can look up other fields, runtime or not.
To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.
As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires a protection mechanism that detects dependency cycles to prevent stack overflow errors.
With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycle detection at compile time, so the runtime cycle detection is a last resort to prevent stack overflow errors; we hope to reject runtime fields when they are registered in the mappings if their definition creates a cycle.
Note that this commit does not introduce any production implementation of runtime fields, but is rather a prerequisite to merge the runtime fields feature branch.
This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name) to also accepting a Supplier<SearchLookup>.
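A minimal sketch of runtime cycle and depth detection, assuming each field's script dependencies are known up front as a map. `FieldLookupSketch` and its error messages are hypothetical, not the actual SearchLookup implementation: fields currently being resolved are tracked on a stack, seeing one of them again means a cycle, and the stack size bounds the reference depth.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve a runtime field's dependencies with cycle and depth checks. */
final class FieldLookupSketch {
    private final Map<String, List<String>> dependencies; // field -> fields its script reads
    private final int maxDepth;
    private final Deque<String> resolving = new ArrayDeque<>(); // fields currently being resolved

    FieldLookupSketch(Map<String, List<String>> dependencies, int maxDepth) {
        this.dependencies = dependencies;
        this.maxDepth = maxDepth;
    }

    /** Visits field and, transitively, every field it reads; fails on a cycle or excessive depth. */
    void resolve(String field) {
        if (resolving.contains(field)) {
            throw new IllegalArgumentException(
                "cyclic field dependency: " + String.join(" -> ", resolving) + " -> " + field);
        }
        if (resolving.size() >= maxDepth) {
            throw new IllegalArgumentException("field references nested too deeply at [" + field + "]");
        }
        resolving.addLast(field);
        try {
            for (String dependency : dependencies.getOrDefault(field, List.of())) {
                resolve(dependency);
            }
        } finally {
            resolving.removeLast();
        }
    }
}
```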
Relates to #59332
Co-authored-by: Nik Everett <nik9000@gmail.com>
Replaces the superclass of `HistogramFieldMapperTests` with
one that doesn't extend `ESSingleNodeTestCase` so we don't depend on the
entire world to test the field mapper.
Continues #61301.
Today we sometimes notify a listener of completion while holding
`SparseFileTracker#mutex`. This commit moves all such calls out from
under the mutex and adds assertions that the mutex is not held in the
listener.
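A hypothetical sketch of the pattern: listeners are collected while holding the mutex and notified only after it has been released, with an assertion (`Thread.holdsLock`) that the mutex is not held during notification. `TrackerSketch` simplifies the range tracking of `SparseFileTracker` to a single position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: gather listeners under the mutex, notify them after releasing it. */
final class TrackerSketch {
    private final Object mutex = new Object();
    private final List<Consumer<Long>> pending = new ArrayList<>();
    private long completedUpTo;

    void addListener(Consumer<Long> listener) {
        synchronized (mutex) {
            pending.add(listener);
        }
    }

    void onProgress(long position) {
        List<Consumer<Long>> toNotify;
        synchronized (mutex) {
            completedUpTo = Math.max(completedUpTo, position);
            toNotify = new ArrayList<>(pending);
            pending.clear();
        }
        // Listeners run outside the mutex, so they can safely call back into the tracker.
        for (Consumer<Long> listener : toNotify) {
            assert Thread.holdsLock(mutex) == false : "listener must not be notified under the mutex";
            listener.accept(position);
        }
    }

    boolean holdsMutex() {
        return Thread.holdsLock(mutex);
    }
}
```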
Closes #61520
Inference processors asynchronously write usage stats to the .ml-stats index after they are used.
In tests the write can leak into the next test, causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test.
If a TLS-protected connection closes unexpectedly then today we often
emit a `WARN` log, typically one of the following:
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake
We typically only report unexpectedly-closed connections at `DEBUG`
level, but these two messages don't follow that rule and generate a lot
of noise as a result. This commit adjusts the logging to report these
two exceptions at `DEBUG` level only.
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:
for (int i = 0; i < fileInfo.numberOfParts(); i++) {
This commit changes the representation of the number of parts of a blob
to an `int`.
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
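A sketch of such a dedicated conversion; the `IntConversions` class and the `toIntBytes` name are assumptions for illustration. Unlike `Math.toIntExact`, which always throws `ArithmeticException` on overflow, the assertion checks the invariant only when assertions are enabled (as in tests), while the method name documents the intent at each call site.

```java
/** Hypothetical sketch of a dedicated long-to-int conversion for values known to fit. */
final class IntConversions {
    private IntConversions() {}

    /**
     * Narrows a long the caller knows is within int range, e.g. a length bounded by a
     * small buffer. The assertion states, and with -ea checks, that we never overflow.
     */
    static int toIntBytes(long value) {
        assert value == (int) value : "value [" + value + "] does not fit in an int";
        return (int) value;
    }
}
```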
If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.
This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today (*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.
(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.
This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.
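A simplified model of the read path, assuming in-memory maps for both cache levels and a function for the blob store. `TwoLevelCacheSketch` is hypothetical; it ignores the size heuristics above and, as an additional assumption, also fills the node-local level after a second-level hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: node-local cache, then a shared "system index" cache, then the blob store. */
final class TwoLevelCacheSketch {
    private final Map<String, byte[]> localDisk = new HashMap<>(); // first level, one per node
    private final Map<String, byte[]> sharedIndex; // second level, shared by all nodes
    private final Function<String, byte[]> blobStore; // slowest source of the data
    int blobStoreReads; // how often we had to go all the way to the blob store

    TwoLevelCacheSketch(Map<String, byte[]> sharedIndex, Function<String, byte[]> blobStore) {
        this.sharedIndex = sharedIndex;
        this.blobStore = blobStore;
    }

    byte[] read(String chunkKey) {
        byte[] data = localDisk.get(chunkKey);
        if (data != null) {
            return data; // first-level hit
        }
        data = sharedIndex.get(chunkKey);
        if (data == null) {
            // Missing from both levels: read from the blob store and populate the second level.
            blobStoreReads++;
            data = blobStore.apply(chunkKey);
            sharedIndex.put(chunkKey, data);
        }
        localDisk.put(chunkKey, data); // assumption: also fill the node-local level
        return data;
    }
}
```

Two instances sharing the same `sharedIndex` map model two nodes: the second node can start from the shared level without touching the blob store.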
We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.
Backport of #60522
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
If a search failure occurs during data frame extraction we catch
the error and retry once. However, we retry another search that is
identical to the first one. This means we will re-fetch any docs
that were already processed. This may result either in training
a model on duplicate data or, in the case of outlier detection, in
an error message that the process received more records than it
expected.
This commit fixes this issue by tracking the latest doc's sort key
and then using that in a range query in case we restart the search
due to a failure.
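A minimal sketch of resuming after the last processed sort key, assuming a `TreeMap` keyed by sort key stands in for the searched index and `tailMap(key, false)` plays the role of the range query. `ResumingExtractorSketch` and its simulated failure are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical sketch: after a failure, resume extraction strictly after the last processed sort key. */
final class ResumingExtractorSketch {
    private final NavigableMap<Long, String> docsBySortKey; // stands in for the searched index
    private Long lastSortKey; // null until the first doc has been processed

    ResumingExtractorSketch(NavigableMap<Long, String> docsBySortKey) {
        this.docsBySortKey = docsBySortKey;
    }

    /**
     * Appends docs to sink in sort-key order. failAfter simulates a search failure after that
     * many docs in this attempt; returns false if the simulated failure happened.
     */
    boolean extractInto(List<String> sink, int failAfter) {
        // On a retry, act like a range query: only sort keys greater than the last one processed.
        NavigableMap<Long, String> remaining =
            lastSortKey == null ? docsBySortKey : docsBySortKey.tailMap(lastSortKey, false);
        int processedThisAttempt = 0;
        for (Map.Entry<Long, String> doc : remaining.entrySet()) {
            if (processedThisAttempt == failAfter) {
                return false;
            }
            sink.add(doc.getValue());
            lastSortKey = doc.getKey(); // remember where a retry should resume
            processedThisAttempt++;
        }
        return true;
    }
}
```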
Backport of #61544
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backports the following commits to 7.x:
[ML] write warning if configured memory limit is too low for analytics job (#61505)
Having `_start` fail when the configured memory limit is too low can be frustrating.
We should instead warn the user that their job might not run properly if their configured limit is too low.
It might be that our estimate is too high, and their configured limit works just fine.
DeprecationLogger's constructor should not create two loggers. It was
taking the parent logger instance, prefixing its name with .deprecation
and creating a new logger.
Most of the time the parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
Refactor the tests to not require a mock HTTP Server. This has been
the cause of flakiness and removing it doesn't affect the logical
coverage of this suite. The "fake UI" is now simulated by an
http client that makes the necessary requests to Elasticsearch APIs.
Backport to add case insensitive support for regex queries.
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when Lucene 8.7 is released.
Closes #59235