At the end of the rolling upgrade tests check the mappings of the concrete
.ml and .transform-internal indices match the mappings in the templates.
When the templates change, the tests should prove that the mappings have
been updated in the new cluster.
This change moves watcher, ILM history and SLM history templates to composable templates.
Versions are updated to reflect the switch. Only change to the templates themselves is added `_meta` to mark them as managed
* The query client uses an array of indices instead of the comma separated
version of the indices names
(cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime fields implementation (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.
This translates also to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper .
Relates to #59332
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).
Relates #51857
Backport of #61998 to 7.x branch.
Moving the data stream yaml tests to xpack plugin module has the following benefits:
* The tests are ran both with security enabled (as part of xpack/plugin integTest)
and disabled (as part of xpack/plugin/data-stream/qa/rest integTest).
* and running the tests in mixed cluster qa environment.
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.
We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.
Co-authored-by: Nik Everett <nik9000@gmail.com>
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which
to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current
implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
During prewarming of a Lucene file a CacheFile is acquired and
then locked for the duration of the prewarming, ie locked until all
the part of the file has been downloaded and written to cache on
disk. The locking (executed with CacheFile#fileLock()) is here to
prevent the cache file to be evicted while it is prewarming.
But holding the lock may take a while for large files, specially since
restoring snapshot files now respects the
indices.recovery.max_bytes_per_sec setting of 40mb (#58658),
and this can have bad consequences like preventing the CacheFile
to be evicted, opened or closed. In manual tests this bug slow
downs various requests like mounting a new searchable snapshot
index or deleting an existing one that is still prewarming.
This commit reduces the time the lock is held during prewarming so
that the read lock is only required when actively writing to the CacheFile.
* [ML] adds new n_gram_encoding custom processor (#61578)
This adds a new `n_gram_encoding` feature processor for analytics and inference.
The focus of this processor is simple ngram encodings that allow:
- multiple ngrams [1..5]
- Prefix, infix, suffix
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.
This commit changes the approach in order to result to a destination
index that allows nested fields.
Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.
The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.
Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.
With these changes data frame analytics jobs do not use scroll at any part.
Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.
Backport of #61943
When a user authenticates via OpenID Connect we copy information from
the OIDC claims into the user's metadata in a particular format.
This commit adds a test that metadata in that format can be used in a
mustache template for Document Level Security.
Backport of: #60030
A role mapping with the following content:
"rules": { "field": { "userid" : "admin" } }
will never match because `userid` is not a valid field. The correct
field is `username`.
This change adds DEBUG logging when an undefined field is referenced.
The choice to use DEBUG rather than INFO/WARN is that the set of
fields is partially dynamic (e.g. the `metadata.*` fields), so
it may be perfectly reasonable to check a field that is not defined
for that user. For example this rule:
"rules": { "field": { "metadata.ranking" : "A" } }
would generate a log message for an unranked user, which would
erroneously suggest that such a rule is an error.
This DEBUG logging will assist in diagnosing problems, without
introducing that confusion.
Backport of: #61246
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
There are currently half a dozen ways to add plugins and modules for
test clusters to use. All of them require the calling project to peek
into the plugin or module they want to use to grab its bundlePlugin
task, and then both depend on that task, as well as extract the archive
path the task will produce. This creates cross project dependencies that
are difficult to detect, and if the dependent plugin/module has not yet
been configured, the build will fail because the task does not yet
exist.
This commit makes the plugin and module methods for testclusters
symmetetric, and simply adding a file provider directly, or a project
path that will produce the plugin/module zip. Internally this new
variant uses normal configuration/dependencies across projects to get
the zip artifact. It also has the added benefit of no longer needing the
caller to add to the test task a dependsOn for bundlePlugin task.
The main changes are:
* Fix custom params are missing when using template or script in watcher's
logging action or jira action.
* Add yaml tests to test passing params to template or script successfully.
Relates to #57625
Co-authored-by: bellengao <gbl_long@163.com>
The current implementation of the filter pipe is incomplete hence why
it got reverted. Note this is not a complete revert as some of the
improvements of said commit (such as the PostAnalyzer) are useful in
general.
Relates #61805
(cherry picked from commit 7a7eb66f7d39586c3a3bc00dce49e6c47a23b46a)
Backport of #61904 to 7.x branch.
The eql search api redirects to the search api. For this reason the eql
search api could work with concrete data stream names. However if security
is enabled and a data stream name snippet with a wildcard was used then
it could not resolve this expressions. This is because the EqlSearchRequest
class didn't overwrite the `includeDataStreams()` method. This pr fixes this,
so that the security layer can properly expand data stream name wildcard
expressions for the eql search api.
This commit also moves the eql data stream test to xpack rest tests,
so that the test runs with security enabled. This is required to reproduce
the bug.
Closes#60828
We frequently use `long`s with `BitArray` in aggs and right now we have
to assert that the `long` fits in an `int`. This adds support for `long`
to `BitArray` so we don't need those assertions.
This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.
In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.
Closes#61852
Backport of #61868
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.
This includes the following projects:
security, spatial, stack, transform, vecotrs, voting-only-node, and watcher.
A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.
related: #60630
related: #56841
related: #59939
related: #55896
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.
This includes the following projects:
async-search, autoscaling, ccr, enrich, eql, frozen-indicies,
data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml
A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.
A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is).
related: #61802
related: #56841
related: #59939
related: #55896
While starting the data frame analytics process it is possible
to get an exception before the process crash handler is in place.
In addition, right after starting the process, we check the process
is alive to ensure we capture a failed process. However, those exceptions
are unhandled.
This commit catches any exception thrown while starting the process
and sets the task to failed with the root cause error message.
I have also taken the chance to remove some unused parameters
in `NativeAnalyticsProcessFactory`.
Relates #61704
Backport of #61838
Allow filtering through a pipe, across events and sequences.
Filter pipes are pushed down to base queries.
For now filtering after limit (head/tail) is forbidden as the
semantics are still up for debate.
Fix#59763
(cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)
The ML mappings upgrade test had become useless as it was
checking a field that has been the same since 6.5. This
commit switches to a field that was changed in 7.9.
Additionally, the test only used to check the results index
mappings. This commit also adds checking for the config
index.
Backport of #61340
During a rolling upgrade it is possible that a worker node will be upgraded before
the master in which case the DFA templates will not have been installed.
Before a DFA task starts check that the latest template is installed and install it if necessary.
When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.
This commit solves this bug by persisting the progress before
setting the task to failed.
Backport of #61782
@ywangd made an awesome analysis on why this test is failing, over
at https://github.com/elastic/elasticsearch/issues/55816#issuecomment-620913282
This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.
The errors we were seeing from users have disappeared after #55114
so we deem our behavior safe.
System indices can be snapshotted and are therefore potential candidates
to be mounted as searchable snapshot indices. As of today nothing
prevents a snapshot to be mounted under an index name starting with .
and this can lead to conflicting situations because searchable snapshot
indices are read-only and Elasticsearch expects some system indices
to be writable; because searchable snapshot indices will soon use an
internal system index (#60522) to speed up recoveries and we should
prevent the system index to be itself a searchable snapshot index
(leading to some deadlock situation for recovery).
This commit introduces a changes to prevent snapshots to be mounted
as a system index.
BlobStoreCacheService implements ClusterStateListener in order to
maintain a ready flag that can be used to know when the snapshot
blob cache should be queries or not.
Now the getAsync() method correctly handles the various exceptions
that can be thrown when the .snapshot-blob-cache index is not
available(in isExpectedCacheGetException()) and logs as DEBUG
we can safely remove the ready flag.
This is a minor refactor where the job node load logic (node availability, etc.) is refactored into its own class.
This will allow future things (i.e. autoscaling decisions) to use the same node load detection class.
backport of #61521
This commit addresses two issues:
- per class feature importance is now written out for binary classification (logistic regression)
- The `class_name` in per class feature importance now matches what is written in the `top_classes` array.
backport of https://github.com/elastic/elasticsearch/pull/61597
- don't do encoding of asynchExecutionId if it is already provided in
the encoded form
- create a new instance of AsyncExecutionId after checks for
correctness are done
If the master node of the follower cluster is busy, then the
auto-follower will fail to initialize the following process. This also
occurs when an auto-follow pattern matches multiple indices. We should
set the timeout of put-follow requests issued by the auto-follower to
unbounded to avoid this problem.
Closes#56891
Backport of #61474.
Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.
This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.
This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:
```
// Create an index
PUT /eggplant
// Get an index
GET /eggplant?flat_settings
```
Returns the default settings now of:
```json
{
"eggplant" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index.creation_date" : "1597855465598",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "1",
"index.provided_name" : "eggplant",
"index.routing.allocation.include._tier" : "data_hot",
"index.uuid" : "6ySG78s9RWGystRipoBFCA",
"index.version.created" : "8000099"
}
}
}
```
After the initial setting of this setting, it can be treated like any other index level setting.
This new setting is *not* set on a new index if any of the following is true:
- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)
Relates to #60848
The check introduced by #60640 for scroll searches, in which we log
if the index access control before the query and fetch phases differs
from when the scroll context is created, is too strict, leading to spurious
warning log messages.
The check verifies instance equality but this assumes that the fetch
phase is executed in the same thread context as the scroll context
validation. However, this is not true if the scroll search is executed
cross-cluster, and even for local scroll searches it is an unfounded assumption.
The check is hence reduced to a null check for the index access.
The fact that the access control is suitable given the indices that
are actually accessed (by the scroll) will be done in a follow-up,
after we better regulate the creation of index access controls in general.
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.
To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.
As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.
With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.
Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.
This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.
Relates to #59332
Co-authored-by: Nik Everett <nik9000@gmail.com>
Replaces the superclass of the test for `HistogramFieldMapperTests` with
one that doesn't extend `ESSingleNodeTestCase` so we don't depend on the
entire world to test the field mapper.
Continues #61301.
Today we sometimes notify a listener of completion while holding
`SparseFileTracker#mutex`. This commit move all such calls out from
under the mutex and adds assertions that the mutex is not held in the
listener.
Closes#61520
Inference processors asynchronously usage write stats to the .ml-stats index after they used.
In tests the write can leak into the next test causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test
If a TLS-protected connection closes unexpectedly then today we often
emit a `WARN` log, typically one of the following:
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake
We typically only report unexpectedly-closed connections at `DEBUG`
level, but these two messages don't follow that rule and generate a lot
of noise as a result. This commit adjusts the logging to report these
two exceptions at `DEBUG` level only.
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:
for (int i = 0; i < fileInfo.numberOfParts(); i++) {
This commit changes the representation of the number of parts of a blob
to an `int`.
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.
This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today(*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.
(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.
This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.
We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.
Backport of #60522
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
If a search failure occurs during data frame extraction we catch
the error and retry once. However, we retry another search that is
identical to the first one. This means we will re-fetch any docs
that were already processed. This may result either to training
a model using duplicate data or in the case of outlier detection to
an error message that the process received more records than it
expected.
This commit fixes this issue by tracking the latest doc's sort key
and then using that in a range query in case we restart the search
due to a failure.
Backport of #61544
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backports the following commits to 7.x:
[ML] write warning if configured memory limit is too low for analytics job (#61505)
Having `_start` fail when the configured memory limit is too low can be frustrating.
We should instead warn the user that their job might not run properly if their configured limit is too low.
It might be that our estimate is too high, and their configured limit works just fine.
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
Refactor the tests to not require a mock HTTP Server. This has been
the cause of flakiness and removing it doesn't affect the logical
coverage of this suite. The "fake UI" is now simulated by an
http client that makes the necessary requests to Elasticsearch APIs.
Backport to add case insensitive support for regex queries.
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.
Closes#59235
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.
relates #55699
relates #52369
backports #55941
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.
As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).
Fix#59764Fix#59779
Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
* Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache
* Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference
* Lighter `writeTo` implementation
* Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory
This is mostly motivated by the performance issues we are seeing around the GET mappings
REST API which (in case of a large number of indices) will create decompressing streams in a hot loop
which takes a significant amount of time for the system calls involved in instantiating deflaters
and inflaters.
Also, this fixes a leaked deflater when deserializing cached repository data.
This commit removes the log info message "Created ML annotations index and aliases".
The message comes in addition to elasticsearch's index creation logging and it does
not add to it. In addition, since #61107 that message may be logged multiple times.
Backport of #61461
Report anonymous roles in response to "GET _security/_authenticate" API call when:
* Anonymous role is enabled
* User is not the anonymous user
* Credentials is not an API Key
There are warnings about unlicense realms when user lookup fails. This PR adds
similar warnings for when no authentication token can be extracted from the request.
The API key document currently doesn't include the user's full_name or email attributes,
and as a result, when those attributes return `null` when hitting `GET`ing `/_security/_authenticate`,
and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046).
This changeset adds those fields to the document and extracts them to fill in the User when
authenticating. They're effectively going to be a snapshot of the User from when the key was
created, but this is in line with roles and metadata as well.
Signed-off-by: lloydmeta <lloydmeta@gmail.com>
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.
To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.
The PR is fairly big but each commit can be reviewed individually.
Fixes#61033.
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
The test didn't take into account the case where 0 documents are
indexed into the shard, meaning that files aren't loaded during
the pre-warm phase. The test injects FileSystem failures, if
the snapshot doesn't contain any files, pre-warm doesn't read
any files and the recovery completes normally.
Closes#61295
Backport of #61317
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
feature_processors allow users to create custom features from
individual document fields.
These `feature_processors` are the same object as the trained model's pre_processors.
They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.
closes https://github.com/elastic/elasticsearch/issues/59327
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind. Now the ML backend also creates
annotations, and those mappings can change between
versions.
In this change:
1. The code that runs on the master node to create the
annotations index if it doesn't exist but another ML
index does also now ensures the mappings are up-to-date.
This is good enough for the ML UI's use of the
annotations index, because the upgrade order rules say
that the whole Elasticsearch cluster must be upgraded
prior to Kibana, so the master node should be on the
newer version before Kibana tries to write an
annotation with the new fields.
2. We now also check whether the annotations index exists
with the correct mappings before starting an autodetect
process on a node. This is necessary because ML nodes
can be upgraded before the master node, so could write
an annotation with the new fields before the master node
knows about the new fields.
Backport of #61107
When a user upgrades between versions, they may stop their ML jobs.
Then when the upgrade is complete, they will want to open the jobs again.
But, when opening a job, we attempt to clear out the jobs finished_time. If the job configuration has adjusted between the versions (i.e. added a new field), it will dynamically update the .ml-config index.
We should instead manually change the mapping to be the updated version.
Fixes a test failure in which we allocated some shards and then
relocated them elsewhere, invalidating an assertion about the recovery
statistics which assumed that the shards stayed where they were
originally allocated.
Closes#61067.
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.
These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).
This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`
And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`
Relates to #60848
This adds a frozen phase to ILM that will allow the execution of the
set_priority, unfollow, allocate, freeze and searchable_snapshot actions.
The frozen phase will be executed after the cold and before the delete phase.
(cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the
snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently
elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and
will never be the elected master so we should not require it to have write access to the repository.
Closes#59649
`foreach` processors store information within the `_ingest` metadata object.
This commit adds the contents of the `_ingest` metadata (if it is not empty).
And will append new inference results if the result field already exists.
This allows a `foreach` to execute and multiple inference results being written to the same result field.
closes https://github.com/elastic/elasticsearch/issues/60867
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.
Closes#60957
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.
Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
The test failed because the leader was taking a lot of CPUs to process
many mapping updates. This commit reduces the mapping updates, increases
timeout, and adds more debug info.
Closes#59832
Examines the reindex response in order to report potential
problems that occurred during the reindexing phase of
data frame analytics jobs.
Backport of #60911
If the search for get stats with multiple job Ids fails the listener is called for each failure.
This change waits for all responses then returns the first error if there was one.
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
Split the autoscaling decider into a service and configuration
in order to enable having additional context information available
in the service. Added AutoscalingDeciderContext holding generic
information all deciders are expected to need. Implemented GET
_autoscaling/decision
This adds a force-merge step to the searchable snapshot action, enabled by default,
but parameterizable using the `force_merge-index" optional boolean.
eg.
```
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"cold": {
"actions": {
"searchable_snapshot" : {
"snapshot_repository" : "backing_repo",
"force_merge_index": true
}
}
}
}
}
}
```
(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
"Transitive" is technically ok here but it's an overloaded word and it's
not immediately clear which meaning is intended so this log message
always makes me do a double-take. I think both "transient" and
"transitory" are clearer, with "transient" being the usual choice.
This suite is still occasionally failing with a timeout on macOS.
Suggest further increasing this timeout until this suite is broken up.
Relates #58071
Uses `my-data-stream` in place of `logs` for data stream examples.
This provides a more intuitive experience for users that copy/paste
their own values into snippets.
* [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776)
When deleting an analytics configuration, the request MIGHT fail if
the .ml-stats index does not exist or is in strange state (shards unallocated).
Instead of making the request fail, we should log that we were unable to delete the stats docs and then
have them cleaned up in the 'delete_expire_data' janitorial process
remove test, scripts are excluded in the change collector, the test is a leftover from a previous
solution of #57332, which has been discarded
relates #60724fixes#60794
When an exception is thrown during test inference we are
not including the cause message in our logging. This commit
addresses this issue.
Backport of #60749
This pull request adds recovery state tracking for Searchable Snapshots.
In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.
Those differences can be summarized as follows:
- The Directory implementation that's provided by SearchableSnapshots mark the
snapshot files as reused during recovery. In order to keep track of the
recovery process as the cache is pre-warmed, those files shouldn't be marked
as reused.
- Once the shard is created, the cache starts its pre-warming phase, meaning that
we should keep track of those downloads during that process and tie the recovery
to this pre-warming phase. The shard is considered recovered once this pre-warming
phase has finished.
Backport of #60505
disable optimizations when using scripts in group_by, when scripts using scripts we can not predict
the outcome and we have no query counterpart. Other optimizations for other group_by's are not
affected.
fixes#57332
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:
- continuous mode with several checkpoints created
- correctness of results
- optimizations (minimal necessary writes)
- permutations of features (index settings, aggs, data types, index or data stream)
When the RBACEngine authorizes scroll searches it sets the index access control
to the very limiting IndicesAccessControl.ALLOW_NO_INDICES value.
This change will set it to the value for the index access control that was produced
during the authorization of the initial search that created the scroll,
which is now stored in the scroll context.
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
This commit removes the body property from the
indices.create_data_stream.json REST API spec
as the API does not support sending a body.
Update the description of the API to remove
that a data stream can be updated with the
API - data streams can only be created with
this API and attempting to update yields a
`resource_already_exists_exception`.
Closes#60704
(cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.
This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
This commit changes TokenAuthIntegTests so all occurrences of
assertThat(x.size(), equalTo(0));
become
assertThat(x, empty());
This means that the assertion failure message will include the
contents of the list (`x`) instead of just its size, which
facilitates easier failure diagnosis.
Relates: #56903
Backport of: #60496
This commit does three things:
* Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license)
* Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack
* Removes a place holder test in favor of disabling the test task (in the async plugin)
Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index
folders of those preexisting shards.
This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will
make reuse of the existing index files to populate the cache.
When a new cluster starts, the HTTP layer becomes ready to accept incoming
requests while the basic license is still being populated in the background.
When a get license request comes in before the license is ready, it can get
404 error. This PR fixes it by either wrap the license check in assertBusy or
ensure the license is ready before perform the check.
This is a backport for both #60498 and #60573
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.
(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
* SQL: Add option to provide the delimiter for the CSV format (#59907)
* Add option to provide the delimiter to the CSV fmt
This adds the option to provide the desired character as the separator
for the CSV format (the default remains comma).
A set of characters are excluded though - like CR, LF, `"` - to avoid
slipping onto the CSV-dialects slope. The tab is also forbidden, the
user needs to choose the "tsv" format explicitely.
Update the doc to make it clear that the textual CSV, TSV and TXT
formats pass the cursor back to the user through the Cursor HTTP header.
(cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc)
* Java8 fixes
- replace Set#of();
- URLDecoder#decode() requires a string (vs a charset) as 2nd arg.
* Fix SYS COLUMNS schema in ODBC mode (#59513)
* Fix SYS COLUMNS schema in ODBC mode
This fixes a regression when certain ODBC-specific columns that need to
be of the short type were returned as the integer type.
This also fixes the stubbing for the *-indices SYS COLUMN commands.
(cherry picked from commit 96d89dc9b1fd731e736ef804a16bd05496c1dea6)
* Java8 fix: avoid diamond notation in test.
Qualify anonymous class in test.
* fix npe on ambiguous group by
* add tests for aggregates and group by, add quotes to error message
* add more cases for Group By ambiguity test
* change error messages for field ambiguity
* change collection aliases approach
* add locations of attributes for ambiguous grouping error
* Adress review comments
- remove Comparable implementations from Attribute and Location;
- add ad-hoc comparator for sorting locations in ambiguity message;
- remove added AttributeAlias class with Touple;
- add code comment to explain issue with Location overwriting.
* Fix c&p error in location ref generation comparator
Fix copy&paste error in dedicated comparator used for sorting ambiguity
location references.
Slightly increase its readability.
Co-authored-by: Nikita Verkhovin <verkhovin13@gmail.com>
(cherry picked from commit 9ba70a3483f0f4987229bec231cdc004f51b88a5)
This commit adds compatibility testing of our JDBC driver against
different Elasticsearch versions. Although we are really testing the
forwards compatibility nature of the JDBC driver we model the testing
the same as we do existing BWC tests, that is, with the current branch
fetching the earlier versions of the artifact that is to be tested. In
this case, that's the JDBC driver itself.
Because the tests include the JDBC driver jar on it's classpath we had
to change the packaging of the driver jar in order to avoid jarhell and
other conflicting dependency issues when using an old JDBC driver with
later branches. For this we simply relocate all driver dependencies in
the shadow jar under a "shadowed" package. This allows the JDBC driver
to use the correct version of Elasticsearch libs classes, while the
tests themselves use their versions. Since this required a change to the
driver jar compatibility testing can only go back as far as that version
which at the time of this commit is 7.8.1.
Prior to this change ML memory estimation processes for a
given job would always use the same named pipe names. This
would often cause one of the processes to fail.
This change avoids this risk by adding an incrementing counter
value into the named pipe names used for memory estimation
processes.
Backport of #60395
The retention run goes through a number of steps and can randomly take more than 10s.
=> increased timeout to 30s like we did in other spots in this test
Also, noticed that we had a hard wait of 10s in this test, removed it and adjusted following
busy assert in a way that can deal with a missing snapshot (from when the assert runs before
the snapshot was put into the CS).
Closes#60336
In order to unify model inference and analytics results we
need to write the same fields.
prediction_probability and prediction_score are now written
for inference calls against classification models.
CCR will stop functioning if the master node is on 7.8, but data nodes
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.
Relates #54146
Relates #59375
This sets up all indexing to one of our write aliases to require it actually be an alias.
This allows failures scenarios to be captured quickly, loudly, and then potentially recovered.
The JDK bug (https://bugs.openjdk.java.net/browse/JDK-8246193) is fixed since b26.
The tests can be unmuted since we are already using b33. However the same bug is now
affecting jdk 8u262, which is the base for current Zulu jdk 8.48. This PR mute the tests
for this specific jdk version.
Relates: #56507
If a feature is created via a custom pre-processor,
we should return the importance for that feature.
This means we will not return the importance for the
original document field for custom processed features.
closes https://github.com/elastic/elasticsearch/issues/59330
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.
Addresses #49028 and #55363.
If a primary shard of a follower index is being relocated, then we
will fail to create a follow-task. This validation is too restricted.
We should ensure that all primaries of the follower index are active
instead.
Closes#59625
Today, a follow task will fail if the master node of the follower
cluster is temporarily overloaded and unable to process master node
requests (such as update mapping, setting, or alias) from a follow-task
within the default timeout. This error is transient, and follow-tasks
should not abort. We can avoid this problem by setting the timeout of
master node requests on the follower cluster to unbounded.
Closes#56891
Data frame analytics jobs that work with very large datasets
may produce bulk requests that are over the memory limit
for indexing. This commit adds a helper class that bundles
index requests in bulk requests that steer away from the
memory limit. We then use this class both from the results
joiner and the inference runner ensuring data frame analytics
jobs do not generate bulk requests that are too large.
Note the limit was implemented in #58885.
Backport of #60219
Previously the test was asserting the prediction on each document
was close 10.0 from the expected. It turned out that was not enough
as we occasionally saw the test failing by little.
Instead of relaxing that assertion, this commit changes it to
assert the mean prediction error is less than 10.0. This should
reduce the chances of the test failing significantly.
Fixes#60212
Backport of #60221