The REST status 503 means "I can not handle the request that you sent
me." However today we respond to a main request with a 503 when there
are certain cluster blocks despite still responding with an actual main
response. This is broken, we should respond with a 200 status. This
commit removes this silliness.
When converting the source for an indexing request to JSON, the
conversion can throw an I/O exception which we swallow and proceed with
logging to the slow log. The cause of the I/O exception is lost. This
commit changes this behavior and chooses to drop the entry from the slow
logs and instead lets an exception percolate up to the indexing
operation listener loop. Here, the exception will be caught and logged
at the warn level.
Today we can end up in a situation where the cluster state contains
unknown or invalid settings. This can happen easily during a rolling
upgrade. For example, consider two nodes that are on a version that
considers the setting foo.bar to be known and valid. Assume one of these
nodes is restarted on a higher version that considers foo.bar to now be
either unknown or invalid, and then the second node is restarted
too. Now, both nodes will be on a version that consider foo.bar to be
unknown or invalid yet this setting will still be contained in the
cluster state. This means that if a cluster settings update is applied
and we validate the settings update with the existing settings then
validation will fail. In such a state, the offending setting can not
even be removed. This commit helps out with this situation by archiving
any settings that are unknown or invalid at the time that a settings
update is applied. This allows the setting update to go through, and the
archived settings can be removed at a later time.
These can be seen at the debug level via cluster state update logging
but really they should be more visible like index creation and
deletion. This commit adds info-level logging for template puts and
deletes.
This interning is completely unnecessary because we look up the marker
by the prefix (value, not identity) anyway. This means that regardless
of the identity of the prefix, we end up with the same marker. That is
all that we really care about here.
As we have factored Elasticsearch into smaller libraries, we have ended
up in a situation that some of the dependencies of Elasticsearch are not
available to code that depends on these smaller libraries but not server
Elasticsearch. This is a good thing, this was one of the goals of
separating Elasticsearch into smaller libraries, to shed some of the
dependencies from other components of the system. However, this now
means that simple utility methods from Lucene that we rely on are no
longer available everywhere. This commit copies IOUtils (with some small
formatting changes for our codebase) into the fold so that other
components of the system can rely on these methods where they no longer
depend on Lucene.
The requiresKeystore flag was removed from PluginInfo in 6.3.0. This
commit fixes a pair of code comments that incorrectly refer to this
version as 7.0.0.
This commit removes the ability to specify that a plugin requires the
keystore and instead creates the keystore on package installation or
when Elasticsearch is started for the first time. The reason that we opt
to create the keystore on package installation is to ensure that the
keystore has the correct permissions (the package installation scripts
run as root as opposed to Elasticsearch running as the elasticsearch
user) and to enable removing the keystore on package removal if the
keystore is not modified.
While trying to reroute a shard to or from a non-data node (a node with ``node.data=false``), I encountered a null pointer exception. Though an exception is to be expected, the NPE was occurring because ``allocation.routingNodes()`` would not contain any non-data nodes, so when you attempt to do ``allocation.routingNodes.node(non-data-node)`` it would not find it, and thus error. This occurred regardless of whether I was rerouting to or from a non-data node.
This PR adds a check (as well as a test for these use cases) to return a legible, useful exception if the discovery node you are rerouting to or from is not a data node.
When an index writer encounters a tragic exception, it could be a
Throwable and not an Exception. Yet we blindly cast the tragic exception
to an Exception which can encounter a ClassCastException. This commit
addresses this by checking if the tragic exception is an Exception and
otherwise wrapping the Throwable in a RuntimeException if it is not. We
choose to wrap the Throwable instead of passing it around because
passing it around leads to changing a lot of places where we handle
Exception to handle Throwable instead. In general, we have tried to
avoid handling Throwable and instead let those bubble up to the uncaught
exception handler.
Log4j2 provides a wide range of logging methods. Our code typically only uses a subset of them. In particular, uses of the methods trace|debug|info|warn|error|fatal(Object) or trace|debug|info|warn|error|fatal(Object, Throwable) have all been wrong, leading to not properly logging the provided message. To prevent these issues in the future, the corresponding Logger methods have been blacklisted.
This commit restores the handling of tiebreaker for multi_match
cross fields query. This functionality was lost during a refactoring
of the multi_match query (#25115).
Fixes#28933
This commit makes the controller spawner also look under modules. It
also fixes a bug in module security policy loading where the module is a
meta plugin.
Today we check for a few cases where we should maybe die before failing
the engine (e.g., when a merge fails). However, there are still other
cases where a fatal error can be hidden from us (for example, a failed
index writer commit). This commit modifies the mechanism for failing the
engine to always check for a fatal error before failing the engine.
Today when requesting _all we return all nodes regardless of what other
node qualifiers are in the request. This is contrary to how the
remainder of the API behaves which acts as additive and subtractive
based on the qualifiers and their ordering. It is also contrary to how
the wildcard * behaves. This commit removes the special handling for
_all so that it behaves identical to the wildcard *.
Relates #28971
* Remove Booleans use from XContent and ToXContent
This removes the use of the `common.Boolean` class from two of the XContent
classes, so they can be decoupled from the ES code as much as possible.
Related to #28754, #28504
Today, failures from the primary-replica resync are ignored as the best
effort to not mark shards as stale during the cluster restart. However
this can be problematic if replicas failed to execute resync operations
but just fine in the subsequent write operations. When this happens,
replica will miss some operations from the new primary. There are some
implications if the local checkpoint on replica can't advance because of
the missing operations.
1. The global checkpoint won't advance - this causes both primary and
replicas keep many index commits
2. Engine on replica won't flush periodically because uncommitted stats
is calculated based on the local checkpoint
3. Replica can use a large number of bitsets to keep track operations seqno
However we can prevent this issue but still reserve the best-effort by
failing replicas which fail to execute resync operations but not mark
them as stale. We have prepared to the required infrastructure in #28049
and #28054 for this change.
Relates #24841
This change replaces the use of string concatenation with a call to
String.join(). String concatenation might be quadratic, unless the compiler can
optimise it away, whereas String.join() is more reliably linear. There can
sometimes be a large number of pending ClusterState update tasks and #28920
includes a report that this operation sometimes takes a long time.
At one point, modules and plugins were very different. But effectively
now they are the same, just from different directories. This commit
unifies the loading methods so they are simply two different
directories. Note that the main codepath to load plugin bundles had
duplication (was not calling getPluginBundles) since previous
refactorings to add meta plugins. Note this change also rewords the
primary exception message when a plugin descriptor is missing, as the
wording asking if the plugin was built before 2.0 isn't really
applicable anymore (it is highly unlikely someone tries to install a 1.x
plugin on any modern version).
This allows us to remove another dependency in the decoupling of the XContent
code. Rather than move this class over or decouple it, it can simply be removed.
Relates tangentially to #28504
These classes are used only in two places, and can be replaced by the
`CharArrayReader` and `CharArrayWriter`. The JDK can also perform lock biasing
and elision as well as escape analysis to optimize away non-contended locks,
rendering their lock-free implementations unnecessary.
Ingest has been failing to apply existing pipelines from cluster-state
into the in-memory representation that are no longer valid. One example of
this is a pipeline with a script processor. If a cluster starts up with scripting
disabled, these pipelines will not be loaded. Even though GETing a pipeline worked,
indexing operations claimed that this pipeline did not exist. This is because one
gets information from cluster-state and the other is from an in-memory data-structure.
Now, two things happen
1. suppress the exceptions until after other successful pipelines are loaded
2. replace failed pipelines with a placeholder pipeline
If the pipeline execution service encounters the stubbed pipeline, it is known that
something went wrong at the time of pipeline creation and an exception was thrown to
the user at some point at start-up.
closes#28269.
This switches the underlying byte output representation used by default in
`XContentBuilder` from `BytesStreamOutput` to a `ByteArrayOutputStream` (an
`OutputStream` can still be specified manually)
This is groundwork to allow us to decouple `XContent*` from the rest of the ES
core code so that it may be factored into a separate jar.
Since `BytesStreamOutput` was not using the recycling instance of `BigArrays`,
this should not affect the circuit breaking capabilities elsewhere in the
system.
Relates to #28504
* Factor UnknownNamedObjectException into its own class
This moves the inner class `UnknownNamedObjectException` from
`NamedXContentRegistry` into a top-level class. This is so that
`NamedXContentRegistry` doesn't have to depend on StreamInput and StreamOutput.
Relates to #28504
This reverts commit f057fc294a.
The rescorer does not resort the collapsed values inside the top docs
during rescoring. For this reason the Lucene rescorer is not compatible
with collapsing.
Relates #27243
This removes the readFrom and writeTo methods from XContentType, instead using
the more generic `readEnum` and `writeEnum` methods. Luckily they are both
encoded exactly the same way, so there is no compatibility layer needed for
backwards compatibility.
Relates to #28504
* Remove BytesRef usage from XContentParser and its subclasses
This removes all the BytesRef usage from XContentParser in favor of directly
returning a CharBuffer (this was originally what was returned, it was just
immediately wraped in a BytesRef).
Relates to #28504
* Rename method after Ryan's feedback
* Wrap stream passed to createParser in try-with-resources
This wraps the stream (`.streamInput()`) that is passed to many of the
`createParser` instances in the enclosing (or a new) try-with-resources block.
This ensures the `BytesReference.streamInput()` is closed.
Relates to #28504
* Use try-with-resources instead of closing in a finally block
This change ensures that we ignore terms removed from the analysis rather than returning a match_no_docs query for the part
that contain the stop word. For instance a query like "the AND fox" should ignore "the" if it is considered as a stop word instead of
adding a match_no_docs query.
This change also fixes the analysis of prefix terms that start with a stop word (e.g. `the*`). In such case if `analyze_wildcard` is true and `the`
is considered as a stop word this part of the query is rewritten into a match_no_docs query. Since it's a prefix query this change forces the prefix query
on `the` even if it is removed from the analysis.
Fixes#28855Fixes#28856
Pruning tombstones is quite expensive since we have to walk though all
deletes in the live version map and acquire a lock on every value even though
it's impossible to prune it. This change does a pre-check if a delete is old enough
and if not it skips acquireing the lock.
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.
Relates to https://github.com/elastic/kibana/issues/16764
When virtual lock is not possible because JNA is unavailable, we log a
warning message. Yet, this log message refers to mlockall rather than
virtual lock, presumably because of a copy/paste error. This commit
fixes this issue.
This commit replaces `org.apache.logging.log4j.util.Supplier` by
`java.util.function.Supplier` in non-logging code. These usages are
neither incorrect nor wrong but rather than accidental. I think our
intention was to use the JDK's Supplier in these places.
* Reject regex search if regex string is too long (#28344)
* Add docs
* Introduce index level setting `index.max_regex_length`
to control the maximum length of the regular expression
Closes#28344
Today we have two test base classes that have a lot in common when it comes to testing wire and xcontent serialization: `AbstractSerializingTestCase` and `AbstractXContentStreamableTestCase`. There are subtle differences though between the two, in the way they work, what can be overridden and features that they support (e.g. insertion of random fields).
This commit introduces a new base class called `AbstractWireTestCase` which holds all of the serialization test code in common between `Streamable` and `Writeable`. It has two minimal subclasses called `AbstractWireSerializingTestCase` and `AbstractStreamableTestCase` which are specialized for `Writeable` and `Streamable`.
This commit also introduces a new test class called `AbstractXContentTestCase` for all of the xContent testing, which holds a testFromXContent method for parsing and rendering to xContent. This one can be delegated to from the existing `AbstractStreamableXContentTestCase` and `AbstractSerializingTestCase` so that we avoid code duplicate as much as possible and all these base classes offer the same functionalities in the same way. Having this last base class decoupled from the serialization testing may also help with the REST high-level client testing, as there are some classes where it's hard to implement equals/hashcode and this makes it possible to override `assertEqualInstances` for custom equality comparisons (also this base class doesn't require implementing equals/hashcode as it doesn't test such methods.
This commit fixes a bug that was introduced in PR #27415 for 6.1
and 7.0 where a change to support MULTIPOINT shapes mucked up
indexing of standalone points.
Previously the message reported when `node_concurrent_outgoing_recoveries`
resulted in a `THROTTLE` decision included the reporting node's ID rather than
that of the primary. This commit fixes that.
Fixes#28777.
This commit makes AcknowledgedResponse implement ToXContentObject, so that the response knows how to print its own content out to XContent, which allows us to remove AcknowledgedRestListener.
These tests need to be skipped. They cause plugins to be loaded which
causes a child classloader to be opened. We do not want to add the
permissions to be able to close a classloader solely for these tests,
and the JARs created in the test can not be deleted on Windows until the
classloader is closed. Since this will not happen before test teardown,
the test will fail on Windows. So, we skip these tests.
This allows us to save a bit of code, but also adds more coverage as it tests serialization which was missing in some of the existing tests. Also it requires implementing equals/hashcode and we get the corresponding tests for them for free from the base test class.
* Pass InputStream when creating XContent parser
Rather than passing the raw `BytesReference` in when creating the xcontent
parser, this passes the StreamInput (which is an InputStream), this allows us to
decouple XContent from BytesReference.
This also removes the use of `commons.Booleans` so it doesn't require more
external commons classes.
Related to #28504
* Undo boolean removal
* Enhance deprecation javadoc
Add support version and version_type in ingest pipelines
Add support for setting document version and version type in set
processor of an ingest pipeline.
* Remove log4j dependency from elasticsearch-core
This removes the log4j dependency from our elasticsearch-core project. It was
originally necessary only for our jar classpath checking. It is now replaced by
a `Consumer<String>` so that the es-core dependency doesn't have external
dependencies.
The parts of #28191 which were moved in conjunction (like `ESLoggerFactory` and
`Loggers`) have been moved back where appropriate, since they are not required
in the core jar.
This is tangentially related to #28504
* Add javadocs for `output` parameter
* Change @code to @link
Pruning tombstones is best effort and should not block if a key is currently
locked. This can cause a deadlock in rare situations if we switch of append
only optimization while heavily updating the same key in the engine
while the LiveVersionMap is locked. This is very rare since this code
patch only executed every 15 seconds by default since that is the interval
we try to prune the deletes in the version map.
Closes#28714
This commit fixes an issue with setting plugin.mandatory to include a
meta-plugin. The issue here is that the names that we collect are the
underlying plugins, not the meta-plugin. We should not use the
underlying plugins instead using the names of non-meta plugins and the
names of meta-plugins. This commit addresses this. The strategy here is
that when we look at the installed plugins on the filesystem, we keep
track of which ones are meta-plugins and carry this information up to
where check which plugins are installed against the mandatory plugins.
Relates #28710
Today we have several levels of indirection to acquire an Engine.Searcher.
We first acquire a the reference manager for the scope then acquire an
IndexSearcher and then create a searcher for the engine based on that.
This change simplifies the creation into a single method call instead of
3 different ones.
The node stats API enables filtlering the top-level stats for only
desired top-level stats. Yet, this was never enabled for adaptive
replica selection stats. This commit enables this. We also add setting
these stats on the request builder, and fix an inconsistent name in a
setter.
Relates #28721
We currently revisit the index deletion policy whenever the global
checkpoint has advanced enough. We should also revisit the deletion
policy after releasing the last snapshot of a snapshotting commit. With
this change, the old index commits will be cleaned up as soon as
possible.
Follow-up of #28140https://github.com/elastic/elasticsearch/pull/28140#discussion_r162458207
Today we maintain a copy of every delete in the live version maps. This is unnecessary
and might add quite some overhead if maps grow large. This change moves out the deletes
tracking into the tombstone map only and relies on the cleaning of tombstones when deletes
are collected.
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.
Closes#28713
The acquireIndexCommit was separated into acquireSafeIndexCommit and
acquireLastIndexCommit, however the test was not updated accordingly.
Relates #28271
Previously we introduced a new parameter to `acquireIndexCommit` to
allow acquire either a safe commit or a last commit. However with the
new parameters, callers can provide a nonsense combination - flush first
but acquire the safe commit. This commit separates acquireIndexCommit
method into two different methods to avoid that problem. Moreover, this
change should also improve the readability.
Relates #28038
* Remove deprecated createParser methods
This removes the final instances of the callers of `XContent.createParser` and
`XContentHelper.createParser` that did not pass in the `DeprecationHandler`. It
also removes the now-unused deprecated methods and fully removes any mention of
Log4j or LoggingDeprecationHandler from the XContent code.
Relates to #28504
* Add comments in JsonXContentGenerator
This test has a race condition. The action listener used to listen for
connections has a guard against being executed twice. However, this
listener can be executed twice. After on success is invoked the test
starts to tear down. At this point, the threads the test forked will
terminate and the remote cluster connection will be closed. However, a
thread forked to the management thread pool by the remote cluster
connection can still be executing and try to continue connecting. This
thread will be cancelled when the remote cluster connection is closed
and this leads to the action listener being invoked again. To address
this, we explicitly check that the reason that on failure was invoked
was cancellation, and we assert that the listener was already previously
invoked. Interestingly, this issue has always been present yet a recent
change (#28667) exposed errors that occur on tasks submitted to the
thread pool and were silently being lost.
Relates #28695
In order to allow us to gradually move to passing the deprecation handler is, we
need a shim that contains both the non-passed and passed version.
Relates to #28504
When we submit a task to a thread pool for asynchronous execution, we
are returned a future. Since we submitted to go asynchronous, these
futures are not inspected for failure (we would have to block a thread
to do that). While we have on failure handlers for exceptions that are
thrown during execution, we do not handle throwables that are not
exceptions and these end up silently lost. This commit adds a check
after the runnable returns that inspects the status of the future. If an
unhandled throwable occurred during execution, this throwable is
propogated out where it will land in the uncaught exception handler.
Relates #28667
We have code used in the networking layer to search for errors buried in
other exceptions. This code will be useful in other locations so with
this commit we move it to our exceptions helpers.
Relates #28691
Currently the Translog constructor is capable both of opening an existing translog and creating a
new one (deleting existing files). This PR separates these two into separate code paths. The
constructors opens files and a dedicated static methods creates an empty translog.
A recent change moved computing declared versions from using reflection
which occurred repeatedly to a lazily-initialized holder so that
declared versions are computed exactly once. This commit adds a comment
explaining the motivation for this change.
* Move more XContent.createParser calls to non-deprecated version
Part 2
This moves more of the callers to pass in the DeprecationHandler.
Relates to #28504
* Use parser's deprecation handler where appropriate
* Use logging handler in test that uses deprecated field on purpose
* Move more XContent.createParser calls to non-deprecated version
This moves more of the callers to pass in the DeprecationHandler.
Relates to #28504
* Use parser's deprecation handler where available
This method is called often enough (when computing minimum compatibility
versions) that the reflection and sort can be seen while profiling. This
commit addresses this issue by computing the declared versions exactly
once.
Relates #28661
If a tragic even happens while we are refreshing a searcher/reader the engine can open new files on a store that is already closed
For instance the following CI job failed because a merge was concurrently called on a failing shard:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+oracle-java10-periodic/84
This change increments the ref count of the store during a refresh in order to postpone the closing after a tragic event.
When installing a meta plugin we check the dependency of each sub plugin during the installation.
Though if the extended plugin is part of the meta plugin the installation fails because we only check for plugins that are
already installed. This change is a workaround that extracts all plugins (even those that are not fully installed yet) when the dependency check
is made during the installation. Note that this is how the plugin installation worked before https://github.com/elastic/elasticsearch/pull/28581.
This commit moves the semantic validation (like which version a plugin
was built for or which java version it is compatible with) from reading
a plugin descriptor, leaving the checks on the format of the descriptor
intact.
relates #28540
* NXYSignificanceHeuristic.java: implementation of equality would have
failed with a ClassCastException when comparing to another type.
Replaced with the Eclipse generated form.
Today we use the persisted global checkpoint to calculate the starting
seqno in peer-recovery. However we do not check whether the translog
actually belongs to the existing Lucene index when reading the global
checkpoint. In some rare cases if the translog does not match the Lucene
index, that recovering replica won't be able to complete its recovery.
This can happen as follows.
1. Replica executes a file-based recovery
2. Index files are copied to replica but crashed before finishing the recovery
3. Replica starts recovery again with seq-based as the copied commit is safe
4. Replica fails to open engine because translog and Lucene index are not matched
5. Replica won't be able to recover from primary
This commit enforces the translogUUID requirement when reading the
global checkpoint directly from the checkpoint file.
Relates #28435
This commit changes the state format that was previously passed in to
`MetaDataStateFormat` to always use Smile. This doesn't actually change the
format, since we have used Smile for writing the format since at least 5.0. This
removes the automatic detection of the state format when reading state, since
any state that could be processed in 6.x and 7.x would already have been written
in Smile format.
This is work towards removing the deprecated methods in the XContent code where
we do automatic content-type detection.
Relates to #28504
This commit forces the depth_first mode for `terms` aggregation that contain a sub-aggregation that need to access the score of the document
in a nested context (the `terms` aggregation is a child of a `nested` aggregation). The score of children documents is not accessible in
breadth_first mode because the `terms` aggregation cannot access the nested context.
Close#28394
* Search option terminate_after does not handle post_filters and aggregations correctly
This change fixes the handling of the `terminate_after` option when post_filters (or min_score) are used.
`post_filter` should be applied before `terminate_after` in order to terminate the query when enough document are accepted
by the post_filters.
This commit also changes the type of exception thrown by `terminate_after` in order to ensure that multi collectors (aggregations)
do not try to continue the collection when enough documents have been collected.
Closes#28411