The recursive data.path FilePermission check is an extremely hot
codepath in Elasticsearch. Unfortunately the FilePermission check in
Java is extremely allocation heavy. As it iterates through different
file permissions, it allocates byte arrays for each Path component that
must be compared. This PR improves the situation by adding the recursive
data.path FilePermission it its own PermissionsCollection object which
is checked first.
The change #57936 introduced a dedicated thread pool for reads in system indices.
It also introduced a potential NPE in the case the index to read in not yet present in
the cluster state. This commit fixes that bug by using the getIndexSafe() instead of
just index() method when retrieving the index's metadata so that an INFE is thrown
if the index does not exist.
When an error occurs and we set the task to failed via
the `DataFrameAnalyticsTask.setFailed` method we do not
persist progress. If the job is later restarted, this means
we do not correctly restore from where we can but instead
we start the job from scratch and have to redo the reindexing
phase.
This commit solves this bug by persisting the progress before
setting the task to failed.
Backport of #61782
We had a bug here were we put a `null` value into the shard
assignment mapping when reassigning work after a snapshot delete
had gone through. This only affects partial snaphots but essentially
dead-locks the snapshot process.
Closes#61762
@ywangd made an awesome analysis on why this test is failing, over
at https://github.com/elastic/elasticsearch/issues/55816#issuecomment-620913282
This change makes it so that we use the same client to perform a
refresh of a token, as we use to subsequently attempt to authenticate
with the refreshed token. This ensures the tests are failing and is
a good approximation of how we expect the same client doing the
refresh, to also perform the subsequent authentication in real life
uses.
The errors we were seeing from users have disappeared after #55114
so we deem our behavior safe.
System indices can be snapshotted and are therefore potential candidates
to be mounted as searchable snapshot indices. As of today nothing
prevents a snapshot to be mounted under an index name starting with .
and this can lead to conflicting situations because searchable snapshot
indices are read-only and Elasticsearch expects some system indices
to be writable; because searchable snapshot indices will soon use an
internal system index (#60522) to speed up recoveries and we should
prevent the system index to be itself a searchable snapshot index
(leading to some deadlock situation for recovery).
This commit introduces a changes to prevent snapshots to be mounted
as a system index.
BlobStoreCacheService implements ClusterStateListener in order to
maintain a ready flag that can be used to know when the snapshot
blob cache should be queries or not.
Now the getAsync() method correctly handles the various exceptions
that can be thrown when the .snapshot-blob-cache index is not
available(in isExpectedCacheGetException()) and logs as DEBUG
we can safely remove the ready flag.
This reworks `CardinalityUpperBound` to support precise estimates while
maintaining most of the public API. This will allow us to make more
informed choices about the data structures that we use in aggregations.
None of those interesting choices come as part of this change, but they
are more possible with it.
This is a minor refactor where the job node load logic (node availability, etc.) is refactored into its own class.
This will allow future things (i.e. autoscaling decisions) to use the same node load detection class.
backport of #61521
This commit addresses two issues:
- per class feature importance is now written out for binary classification (logistic regression)
- The `class_name` in per class feature importance now matches what is written in the `top_classes` array.
backport of https://github.com/elastic/elasticsearch/pull/61597
- don't do encoding of asynchExecutionId if it is already provided in
the encoded form
- create a new instance of AsyncExecutionId after checks for
correctness are done
If the master node of the follower cluster is busy, then the
auto-follower will fail to initialize the following process. This also
occurs when an auto-follow pattern matches multiple indices. We should
set the timeout of put-follow requests issued by the auto-follower to
unbounded to avoid this problem.
Closes#56891
We leverage artifact transforms now when downloading and unpacking elasticsearch distributions.
This has the benefit of
- handcrafted extract tasks on the root project are not required.
- The general tight coupling to the root project has been removed.
- The overall required configurations required to handle a distribution have been reduced
- ElasticsearchDistribution has been simplified by making Extracted an ordinary Configuration
downloaded and unpacked external distributions are reused in later builds by been cached
in the gradle user home.
DistributionDownloadPlugin functional tests have been extended and ported
to DistributionDownloadPluginFuncTest.
* Fix ElasticsearchNode#getDistributionFiles (#61219)
Fixes#61647
Backport of #61474.
Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
The fact that the data node is already blocked on writing
data files did not guarantee that the cluster state that made
the data node start snapshotting is already applied on master.
This could lead to races where the get snapshots action still
runs based on a state without the snapshot in it, tripping the assertion.
Much safer to handle this by waiting on the non-blocking snapshot create
to return, which guarantees that the CS has been applied on master.
Closes#61541
It looks like it is possible for a request to throw an exception early
before any API interaciton has happened. This can lead to the request count
map containing a `null` for the request count key.
The assertion is not correct and we should not NPE here
(as that might also hide the original exception since we are running this code in
a `finally` block from within the S3 SDK).
Closes#61670
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and result is now included in the output iff it was defined
* a status field is always displayed. the possible values of status are
* `success` - if the processor ran with out errors
* `error` - if the processor ran but threw an error that was not ingored
* `error_ignored` - if the processor ran but threw an error that was ingored
* `skipped` - if the process did not run (currently only possible if the if condition evaluates to false)
* `dropped` - if the the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist
closes#56004