The rollover action is now a retryable step (see #48256)
so ILM will keep retrying until it succeeds as opposed to stopping and
moving the execution in the ERROR step.
Fixes#49073
(cherry picked from commit 3ae90898121b43032ec8f3b50514d93a86e14d0f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
# Conflicts:
# x-pack/plugin/ilm/qa/multi-node/src/test/java/org/elasticsearch/xpack/ilm/TimeSeriesLifecycleActionsIT.java
Ensures that methods that are called from different threads ( i.e.
from the callbacks of org.apache.http.concurrent.FutureCallback )
catch `Exception` instead of only the expected checked exceptions.
This resolves a bug where OpenIdConnectAuthenticator#mergeObjects
would throw an IllegalStateException that was never caught causing
the thread to hang and the listener to never be called. This would
in turn cause Kibana requests to authenticate with OpenID Connect
to timeout and fail without even logging anything relevant.
This also guards against unexpected Exceptions that might be thrown
by invoked library methods while performing the necessary operations
in these callbacks.
This is intended as a stop-gap solution/improvement to #38941 that
prevents repo modifications without an intermittent master failover
from causing inconsistent (outdated due to inconsistent listing of index-N blobs)
`RepositoryData` to be written.
Tracking the latest repository generation will move to the cluster state in a
separate pull request. This is intended as a low-risk change to be backported as
far as possible and motived by the recently increased chance of #38941
causing trouble via SLM (see https://github.com/elastic/elasticsearch/issues/47520).
Closes#47834Closes#49048
This fixes a regression introduced in #42042. The logic here was
mistakenly inverted such that we only run these tests in a FIPS JVM
which is the opposite of what we intend.
The bats tests require several distributions to all be built into a
single directory. The addition of docker packaging tests now cause the
bats tests to depend on docker, even though docker is not used there.
This commit filters out docker distributions from those that bats
depends on.
This commit introduces a new class called ESSloppyMath
that is meant to reflect the purpose of Lucene's SloppyMath,
but add additional unimplemented faster alternatives to math functions.
The two that are used by geotile-grid a lot are sinh/atan.
In a quick elasticsearch rally benchmark for geotile-grid on Switzerland
data points, this shows a (1.22x) 22% speed-up over using Math's functions.
closes#41166.
The RunTask is responsible for logging output from nodes to the console
and also stays active since we want the cluster to keep running.
However, the implementation of the logging and waiting resulted in a
spin loop that continually polls for data to have been written to one
of the nodes' output files. On my laptop, this causes an idle
invocation of `gradle run` to consume an entire core.
The JDK provides a method to be notified of changes to files through
the use of a WatchService. While a WatchService based implementation
for logging and waiting works, a delay of up to ten seconds is
encountered when running on macOS. This is due to the lack of a native
WatchService implementation that uses kqueue or FSEvents; the current
WatchService implementation in the JDK uses polling with a default
interval of ten seconds. While the interval can be changed
programmatically it is not an acceptable solution due to the need to
access the com.sun.nio.file.SensitivityWatchEventModifier enum, which
is in an internal package.
The change in this commit instead introduces a check to see if any data
was available to read and log. If no data is available in any of the
node output files, the thread sleeps for 100ms. This is enough time to
prevent consuming large amounts of cpu while still providing output to
the console in a timely fashion.
The cluster state is obtained twice in the EnrichPolicyRunner when updating
the final alias. There is a possibility for the state to be slightly different
between those two calls. This PR just has the function get the cluster state
once and reuse it for the life of the function call.
Currently the `_analyze` endpoint doesn't correctly use normalizers specified
in the request. This change fixes that by returning the resolved normalizer from
TransportAnalyzeAction#getAnalyzer and updates test to be able to catch this
in the future.
Closes#48650
Today we wrap exceptions that occur while executing an ingest processor
in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we
only unwrap causes for exceptions that implement
ElasticsearchWrapperException, which the top-level
ElasticsearchException does not. Ultimately, this means that any
exception that occurs during processor execution does not have its cause
unwrapped, and so its status is blanket treated as a 500. This means
that while executing a bulk request with an ingest pipeline,
document-level failures that occur during a processor will cause the
status for that document to be treated as 500. Since that does not give
the client any indication that they made a mistake, it means some
clients will enter infinite retries, thinking that there is some
server-side problem that merely needs to clear. This commit addresses
this by introducing a dedicated ingest processor exception, so that its
causes can be unwrapped. While we could consider a broader change to
unwrap causes for more than just ElasticsearchWrapperExceptions, that is
a broad change with unclear implications. Since the problem of reporting
500s on client errors is a user-facing bug, we take the conservative
approach for now, and we can revisit the unwrapping in a future change.
Makes a few changes to better align the update license API docs with
the [API reference template][0].
Changes:
* Replaces POST with PUT in several snippet examples.
While both are valid, PUT is a bit more RESTful.
* Removes leading slashes (/) from all snippets.
* Relocates and retitles the 'Authorization' section to 'Prerequisites'.
* Replaces explicit titles with the appropriate API reference template
attributes.
* Replaces unneeded `[float]` tags with explicit anchors.
Closes#35341
[0]: https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
Temporarily "mute" the testReplaceChildren for Pivot since it leads to
failing tests for some seeds, since the new child doesn't respond to a
valid data type.
Relates to #48900
(cherry picked from commit 6200a2207b9a4264d2f3fc976577323c7e084317)
We can have a race here where `scheduleNextRun` executes concurrently to `stop`
and so we run into a `RejectedExecutionException` that we don't catch and thus it
fails tests.
=> Fixed by ignoring these so long as they coincide with a scheduler shutdown
The `edge_ngram` tokenizer limits tokens to the `max_gram` character
length. Autocomplete searches for terms longer than this limit return
no results.
To prevent this, you can use the `truncate` token filter to truncate
tokens to the `max_gram` character length. However, this could return irrelevant results.
This commit adds some advisory text to make users aware of this limitation and outline the tradeoffs for each approach.
Closes#48956.
When a node shuts down, `TransportService` moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped `NodeClosedException` exception instead, which is correctly
handled in replication action. Fixed other usages too.
Relates #42612
* Move periodic job to ES repo
This change kickstarts the process of moving CI job definitions to this
repo.
* Added a minimal readme to provide pointers to the documentation
* Update .ci/README.md
Co-Authored-By: Rory Hunter <pugnascotia@users.noreply.github.com>
* Update .ci/README.md
Co-Authored-By: Rory Hunter <pugnascotia@users.noreply.github.com>
* point to main repo
* PR review
* Add link to JJBB
The team sometimes get questions around the use of `!foo` vs. `foo == false` in
PRs and reviews (e.g. #48615). This change adds a bullet point to CONTRIBUTING.md
to make expectations here clearer and gives us something to point to in case of
discussion.
This commit fixes an off-by-one bug in the AutoFollowIT test that causes
failures because the leaderIndices counter is incremented during the evaluation
of the leaderIndices.incrementAndGet() < 20 condition but the 20th index is
not created, making the final assertion not verified.
It also gives a bit more time for cluster state updates to be processed on the
follower cluster.
Closes#48982
Ensures that we always use the primary term established by the primary to index docs on the
replica. Makes the logic around replication less brittle by always using the operation primary
term on the replica that is coming from the primary.
Our documentation regarding FIPS 140 claimed that when using SAML
in a JVM that is configured in FIPS approved only mode, one could
not use encrypted assertions. This stemmed from a wrong
understanding regarding the compliance of RSA-OAEP which is used
as the key wrapping algorithm for encrypting the key with which the
SAML Assertion is encrypted.
However, as stated for instance in
https://downloads.bouncycastle.org/fips-java/BC-FJA-SecurityPolicy-1.0.0.pdf
RSA-OAEP is approved for key transport, so this limitation is not
effective.
This change removes the limitation from our FIPS 140 related
documentation.
This PR makes the following two fixes around updating flattened fields:
* Make sure that the new value for ignore_above is immediately taken into
affect. Previously we recorded the new value but did not use it when parsing
documents.
* Allow depth_limit to be updated dynamically. It seems plausible that a user
might want to tweak this setting as they encounter more data.
When using the move-to-step API, we should reread the phase JSON from
the latest version of the ILM policy. This allows a user to move to the
same step while re-reading the policy's latest version. For example,
when changing rollover criteria.
While manually messing around with some other things I discovered that
we only reread the policy when using the retry API, not the move-to-step
API. This commit changes the move-to-step API to always read the latest
version of the policy.
Reverts #48947 and fixes the issue orginally addressed by removing the assertion.
It turns out we can't simply pass empty shard generations to the snapshot finalization in the
BwC case as that results in no indices being added to the meta for the given snapshot since
we take the indices from the shard generations (even in the BwC case the `null` generations work
fine for this).
Closes#48983
This commit changes the ESMockAPIBasedRepositoryIntegTestCase so
that HttpHandler are now wrapped in order to log any exceptions that
could be thrown when executing the server side logic in repository
integration tests.
Backport of #48908
The enrich project doesn't have much history as all the other gradle projects,
so it makes sense to enable spotless for this gradle project.
Today in 6.x it is possible to add an index tombstone to the graveyard without
deleting the corresponding index metadata, because the deletion is slightly
deferred. If you shut down the node and upgrade to 7.x when in this state then
the node will fail to apply any cluster states, reporting
java.lang.IllegalStateException: Cannot delete index [...], it is still part of the cluster state.
This commit addresses this situation by skipping over any index metadata with a
corresponding tombstone, allowing this metadata to be cleaned up by the 7.x
node.
The previous approach did not work because the system property is passed
to Gradle but not to the tests JVM.
We shouldn't really pass this to the tests as we wouldn't want to have
differences.
This timeout being different might not be bad, but having a way to
differentiate could lead to others and it's best avoided.