This commit allows for configuration of the amount of time we wait for a node to startup. This is needed as some QA tests with plugins and tribe timeout when starting the tribe
node. Even regular node startup time with plugins is starting to approach the limit of 30 seconds.
Additionally, this commit provides better feedback when a wait has failed. The code checks to
ensure all expected files exist for each node; if they do not then we consider the wait task as
having failed. Prior to this change, when there was one or more missing file the build would
continue and attempt to execute the wait condition that typically makes an HTTP request to the
cluster. The output of this type of failure does include which files exist and which do not but
this change makes it clearer that the actual HTTP call did not time out, but the failure was before
the call was even made.
Cluster settings shouldn't leak into the next test.
I played with failing the test if it left over any settings but that
felt like it added more ceremony then it was worth. The advantage is
that any test that intentionally wants to leave settings in place after
the test would fail and require looking at but, so far as I can tell, we
don't have any such tests.
This commit switches the internal format of the elasticsearch keystore
to no longer use java's KeyStore class, but instead encrypt the binary
data of the secrets using AES-GCM. The cipher key is generated using
PBKDF2WithHmacSHA512. Tests are also added for backcompat reading the v1
and v2 formats.
Currently meta plugins will ask for confirmation of security policy
exceptions for each bundled plugin. This commit collects the necessary
permissions of each bundled plugin, and asks for confirmation of all of
them at the same time.
In some cases testShrinkIndexPrimaryTerm creates then 'mutates' 210
shards. If each shard opens more than 10 files (translog, lucene index),
we exceeded the maximum allowed file handles. In our test, the number of
file handles is limited to 2048 by HandleLimitFS. This commit reduces
the number of shards in testShrinkIndexPrimaryTerm to avoid such errors.
Closes#28153
Clear the disk watermark after the snippet showing users how to set it.
Without this our tests will fail if the disks have less than 10GB free.
Closes#28325
Prints the warning level logs from the integration testing cluster if
there is a failure. We'd attempted to print those logs but I believe our
regex was missing a space.
In rare cases the total nanoseconds for an entire window of operations can be 0
nanoseconds, causing the assertion in
QueueResizingEsThreadPoolExecutor.calculateLambda to trip. This ensures that we
calculate the lambda value with at least 1 nanosecond.
Resolves#27607
This pull request replaces the jvm-example plugin (from the jvm/site plugins era) by two new plugins: a custom-settings that shows how to register and use custom settings (including secured settings) in a plugin, and rest-handler plugin that shows how to register a rest handler.
The two plugins now reside in the plugins/examples project. They can serve as sample plugins for users, a special attention has been put on documentation. The packaging tests have been adapted to use the custom-settings plugin.
Our rest client throws exceptions in funny ways that might cause
`suppressed` exceptions to be eaten. This works around that in the
reindex-from-old tests so we don't stomp on a real failure. It is fairly
diryt so we should work on fixing the high level rest client.
Relates to #25453
The implementation maintains the order of the original requests yet this
functionality is not documented. This commit adds a note to the docs
regarding the ordering of responses to an multi-get request.
Relates #28356
Currently this method parses the string as a double. This means that it
might lose accuracy if the value is a long that is greater than
2^52. This commit changes this method to try to detect whether the
string represents a long first.
This commit updates netty to 4.1.16.Final. This is the latest version that we can have work without
extra permissions. This updated version of netty fixes issues seen with Java 9 and some data
not being sent, which results in timeouts.
Today after writing an operation to an engine, we will call
`IndexShard#afterWriteOperation` to flush a new commit if needed. The
`shouldFlush` condition is purely based on the uncommitted translog size
and the translog flush threshold size setting. However this can cause a
replica execute an infinite loop of flushing in the following situation.
1. Primary has a fully baked index commit with its local checkpoint
equals to max_seqno
2. Primary sends that fully baked commit, then replays all retained
translog operations to the replica
3. No operations are added to Lucence on the replica as seqno of these
operations are at most the local checkpoint
4. Once translog operations are replayed, the target calls
`IndexShard#afterWriteOperation` to flush. If the total size of the
replaying operations exceeds the flush threshold size, this call will
`Engine#flush`. However the engine won't flush as its index writer does
not have any uncommitted operations. The method
`IndexShard#afterWriteOperation` will keep flushing as the condition
`shouldFlush` is still true.
This issue can be avoided if we always flush if the `shouldFlush`
condition is true.
It has been pointed out that GET with body may cause problems to some proxies. We are then switching to POST the API that retrieve info and support a request body.
Closes#28326
This PR removes previously deprecated `isShardsAcked()` method in
favour of `isShardsAcknowledged()` on `CreateIndexResponse`, `CreateIndexClusterStateUpdateResponse` and `RolloverResponse`
Related to #27784
Follow-up of #27819
This change adds the `after_key` of a composite aggregation directly in the response.
It is redundant when all buckets are not filtered/removed by a pipeline aggregation since in this case the `after_key` is always the last bucket
in the response. Though when using a pipeline aggregation to filter composite buckets, the `after_key` can be lost if the last bucket is filtered.
This commit fixes this situation by always returning the `after_key` in a dedicated section.
This change adds the test name to the exceptions thrown by the MockPageCacheRecycler and MockBigArrays. Also, if there is more than one page/array which are not released it will add the first one as the cause of the thrown exception and the others as suppressed exceptions.
Relates to #21315
This change adds a note in the `terms` aggregation that explains how to retrieve **all**
terms (or all combinations of terms in a nested agg) using the `composite` aggregation.
The DiscoveryNodes.Delta was changed in #28197. Previous/Master nodes
are now always set in the `Delta` (before the change they were set only
if the master changed) and the `masterChanged()` method is now based on
object equality and nodes ephemeral ids (before the change it was based
on nodes id).
This commit adapts the DiscoveryNodesTests.testDeltas() to reflect the
changes.
The current install_plugin() does not play well with meta plugins because
it always checks for the plugin's descriptor file.
This commit changes the install_plugin() so that it only runs the install plugin
command and lets the caller verify that the required files are correctly installed.
It also adds a install_meta_plugin() function to install meta plugins.
The rethrottle test fails from time to time because one of the child
task that want to be rethrottled hasn't properly started yet. We retry
in this case but it looks like the retry either isn't long enough or
something else strange is happening.
This change adds yet more logging so future failure of this kind will be
easier to track down and it adds an extra wait condition: this waits for
all child tasks to be running or completed before rethrottling. This
*might* avoid the failure because once a child task is properly started
it should be quite ok to rethrottle.
Relates to #26192
We introduced a single commit assertion when opening an index but create
a new translog. However, this assertion is not held in this situation.
1. A replica with two commits c1 and c2 starts peer-recovery with c1
2. The recovery is sequence-based recovery but the primary is before 6.2 so
it sent true for “createNewTranslog”
3. Replica opens engine and create translog. We expect "open index and
create translog" have 1 commit but we have c1 and c2.
This commit makes sure to assert this iff the index was created on 6.2+.
This introduces a settings updater that allows to specify a list of
settings. Whenever one of those settings changes, the whole block of
settings is passed to the consumer.
This also fixes an issue with affix settings, when used in combination
with group settings, which could result in no found settings when used
to get a setting for a namespace.
Lastly logging has been slightly changed, so that filtered settings now
only log the setting key.
Another bug has been fixed for the mock log appender, which did not
work, when checking for the exact message.
Closes#28047
Second part in a series of PR's to remove Painless Type in favor of Java Class. This completely removes the Painless Type dependency from AnalyzerCaster. Both casting and promotion are now based on Java Class exclusively. This also allows AnalyzerCaster to be decoupled from Definition and make cast checks be static calls again.
The test failure tracked by #28053 occurs because we fail to get the
failure response from the reindex on the first try and on our second try
the delete index API call that was supposed to trigger the failure
actually deletes the index during document creation. This causes the
test to fail catastrophically.
This PR attempts to wait for the failure to finish before the test moves
on to the second attempt. The failure doesn't reproduce locally for me
so I can't be sure that this helps at all with the failure, but it
certainly feels like it should help some. Here is hoping this prevents
similar failures in the future.
The test failure tracked by #26758 occurs when we cancel a running reindex
request that has been sliced into many children. The main reindex
response *looks* canceled but none of the children look canceled. This
is super strange because for the main request to look canceled for any
length of time one of the children has to be canceled.
This change adds additional logging to the test so we have more to go on
to debug this the next time it fails.
Self referencing maps can cause SOE if they are iterated ie. in their toString methods. This chance adds some protected to the usage of those collections.