`LobObtainFailedException` should be reserved for on-disk locks that
Lucene attempts (like `write.lock`). This switches our in-memory
semaphore locks for shards to use a different exception. Additionally,
ShardLockObtainFailedException no longer subclasses IOException, since
no IO is being done is this case.
Resolves#19978
Currently plugins can not inspect or upgrade custom
meta data on startup. This commit allow plugins
to check and/or upgrade global custom meta data on startup.
Plugins can stop a node if any custom meta data is not supported.
The big change here is cleaning up the `TaskListResponse` so it doesn't
have a breaky `toString` implementation. That was causing the reindex
tests to break.
Also removed `NetworkModule#registerTaskStatus` which is part of the
Plugin API. Use `Plugin#getNamedWriteables` instead.
Primary shard allocation observes limits in forcing allocation
Previously, during primary shards allocation of shards
with prior allocation IDs, if all nodes returned a
NO decision for allocation (e.g. the settings blocked
allocation on that node), we would chose one of those
nodes and force the primary shard to be allocated to it.
However, this meant that primary shard allocation
would not adhere to the decision of the MaxRetryAllocationDecider,
which would lead to attempting to allocate a shard
which has failed N number of times already (presumably
due to some configuration issue).
This commit solves this issue by introducing the
notion of force allocating a primary shard to a node
and each decider implementation must implement whether
this is allowed or not. In the case of MaxRetryAllocationDecider,
it just forwards the request to canAllocate.
Closes#19446
Parsing a search request is currently split up among a number of
classes, using multiple public static methods, which take multiple
regstries of elements that may appear in the search request like query
parsers and aggregations. This change begins consolidating all this code
by collapsing the registries normally used for parsing search requests
into a single SearchRequestParsers class. It is also made available to
plugin services to enable templating of search requests. Eventually all
of the actual parsing logic should move to the class, and the registries
should be hidden, but for now they are at least co-located to reduce the
number of objects that must be passed around.
Squashes all the subpackages of `org.elasticsearch.rest.action` down to
the following:
* `o.e.rest.action.admin` - Administrative actions
* `o.e.rest.action.cat` - Actions that make tables for `grep`ing
* `o.e.rest.action.document` - Actions that act on documents
* `o.e.rest.action.ingest` - Actions that act on ingest pipelines
* `o.e.rest.action.search` - Actions that search
I'm tempted to merge `search` into `document` but the `document`
package feels fairly complete as is and `Suggest` isn't actually always
about documents either....
I'm also tempted to merge `ingest` into `admin.cluster` because the
latter contains the actions for dealing with stored scripts.
I've moved the `o.e.rest.action.support` into `o.e.rest.action`.
I've also added `package-info.java`s to all packges in `o.e.rest`. I
figure if the package is too small to deserve a `package-info.java` file
then it is too small to deserve to be a package....
Also fixes checkstyle in all moved classes.
* Enable BoostingQuery with FVH highlighter
* apply boost with negativeBoost
* flatten boosting query with its own boost and update boost query to a single layer
* Assign scroll keepAlive when deserializing
The scroll time value was never assign when deserializing from the transport layer, meaning that it would always be null when received from another node, although the originating search request might have it set to some value.
* add tests for SearchRequest serialization and fail fast with illegal arguments
To ease testing, also introduced equals, hashcode and toString methods in SearchRequest and Scroll.
The serialization test brought up a few wrong assumptions about non null instance members, for which some null checks were needed to avoid NPEs when serializing.
* make Scroll implement Writeable rather than Streamable
* [TEST] add serialization test for ShardSearchTransportRequest
This also covers ShardSearchLocalRequest implicitly as most of the serialization code is in it.
that have analyzer aliases in their analysis settings will still work, but
any attempts to create an alias for analyzers in newly created indices
will result in an IllegalArgumentException.
As a result, the setting `index.analysis.analyzer.{analyzerName}.alias` is
no longer supported.
Closes#18244
The term persisted task was used to indicate that a task should store its results upon its completion. We would like to use this term to indicate that a task can survive restart of nodes instead. This commit removes usages of the term "persist" when it means store results.
This commit fixes the number of max local storage nodes setting used in
the discovery disruption tests. In some cases (randomly but rarely), the
acked indexing test can run with five nodes instead of three, breaching
the max local storage nodes configuration.
As the most complicated `FetchSubPhase` highlighting gets its own package
(`o.e.seach.fetch.subphase.highlight`. No other `FetchSubPhase`s get their
own package. Instead they all reside together in `o.e.search.fetch.subphase`.
Add package descriptions to `o.e.search.fetch` and subpackages.
This commit adds a function to shard-level query result to determine whether
there are any hits that needs fetching. Currently, a shard-level query result
can have hits when there are search hits and/or completion suggestion hits.
The newly added function encapsulates the checks to determine if a shard-level
query result has any fetchable hits, which is used in optimizing for sorting
documents and releasing search request contexts.
If a primary fails, an active replica is promoted to primary. Once we do the promotion, however, we are sure that the active replica is not relocating anymore. The reason is that when the primary fails, we first remove/cancel all initializing replicas (also if they are relocation targets). This is the only safe thing to do anyhow, because promoting relocating replica to primary would also mean that the replica recovery of the replica relocation target is suddenly promoted to primary relocation, which the recovery code treats in a different way.
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greather than one is dangerous
as users sometimes end up unknowingly starting a second node and start
thinking that they have encountered data loss.
Relates #19964
ContextIndexSearcher#explain ignores the dfs data to create the normalized weight.
This change fixes this discrepancy by using the dfs data to create the normalized weight when needed.
This commit separates the description of the links in the network that are to be disrupted from the failure that is to be applied to the links (disconnect/unresponsive/delay). Previously we had subclasses for the various kind of network disruption schemes combining on one hand failure mode (disconnect/unresponsive/delay) as well as the network links to cut (two partitions / bridge partitioning) into a single class.
Reducing the ping timeouts on a test that does not simulate network failures can cause node disconnects within the test on a slow CI machine.
The test testSearchWithRelocationAndSlowClusterStateProcessing does not expect such disconnects, leading to shard relocation in the test to abort prematurely.
Today in the uncaught exception handler, we attempt to halt the virtual
machine on fatal errors. Yet, halting the virtual machine requires
privileges which might not be granted to the caller when the exception
is thrown for example from a scripting engine. This means that if an
OutOfMemoryError or another fatal error is hit inside a script, the
virtual machine will not exit because the halt call will be denied for
securiry privileges. In this commit, we mark this halt call as trusted
so that the virtual machine can be halted if a fatal error is
encountered in a script.
Relates #19923
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
testUnknownObjectException used to generate malformed json objects in some cases, due to the existence of arrays as it was not closing the injected object correctly. That is why the test was catching JsonParseException among the exception that are expected to be thrown. That is fixed by tracking where the new object is placed and placing its end object marker to the right level rather than always at the end.
Also introduced a mechanism to explicitly declare objects that won't cause any exception when they get additional objects injected, so that there is no need to override the method anymore as that caused copy pasting of the whole test method. This also makes sure that changes are reflected in tests, as those inner objects are not skipped but we actually check that what is declared is true (no exceptions get thrown when an additional object is added within them.
This change adds support for treating dots in field names found in
mappings as path separators, like was previously done for dynamic
mappings and document parsing.
closes#19443
Currently, when attempting to delete a snapshot, we check
if a snapshot is in progress before proceeding with the
delete. However, we do not check if a restore is taking
place before deleting. This can lead to concurrency issues
where a restore is in progress but the snapshotted files
for the restore are being deleted underneath.
This commit first checks if a restore is in progress and
if so, it prevents the deletion of a snapshot with an
exception.
Note that this is not a complete solution because it is
still possible that a restore of the same snapshot is
started after the deletion commenced but before the
deletion finished. But there is a much smaller window
for this to occur and this commit is a quick way to
check for the common case.
When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.
If more dynamic scripts are sent, a user will get the following
exception:
```json
{
"error" : {
"root_cause" : [
{
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "i",
"node" : "a5V1eXcZRYiIk8lecjZ4Jw",
"reason" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
}
],
"caused_by" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
},
"status" : 500
}
```
This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.
Relates to #19396
Slims the public interface of RoutingNodes down to 4 methods to update routing entries:
- initializeShard() -> initializes an unassigned shard
- startShard() -> starts an initializing shard / completes relocation of a shard
- relocateShard() -> starts relocation of a started shard
- failShard() -> fails/cancels an assigned shard
In the spirit of PR #19743, where deassociateDeadNodes was moved to its own public method to be only called when nodes have actually left the cluster and not on every reroute step, this commit also removes electPrimariesAndUnassignedDanglingReplicas from AllocationService and folds it into the shard failure logic. This means that an active replica is promoted to primary in the same method where the primary was failed. Previously we would scan in each reroute iteration for active replicas to be promoted to primary.
If a `keyword` field is both indexed and doc-valued, then we will convert the
input string to utf8 bytes twice: once for indexing/storing, and once for doc
values. This commit changes `keyword` fields to compute the utf8 representation
up-front and then feed both the inverted index and doc values with it.
Rather than adding version-based bw compat logic, I broke the `keyword` field
(they are now indexed/stored as a binary field rather than string), which is
fine since we are still on alpha releases for 5.0.
Previously, the engine would catch an out of memory error and would try
to handle the error (it would try to fail the engine, and then it would
swallow the out of memory error). Catching the out of memory errors was
removed in 3343ceeae4 so this code path is
not effectively dead. This commit removes this dead code from the
engine.
Relates #19881
The payload option was introduced with the new completion
suggester implementation in v5, as a stop gap solution
to return additional metadata with suggestions.
Now we can return associated documents with suggestions
(#19536) through fetch phase using stored field (_source).
The additional fetch phase ensures that we only fetch
the _source for the global top-N suggestions instead of
fetching _source of top results for each shard.
The recent changes to the Histogram Aggregator introduced a bug where
an exception would not be thrown if the maxBound of the extended bounds
is less that the minBound. This change fixes that bug.
Closes#19833
PR #19715 made AllocationService less lenient, requiring ShardRouting instances that are passed to its applyStartedShards and
applyFailedShards methods to exist in the routing table. As primary shard failures also fail initializing replica shards,
concurrent replica shard failures that are treated in the same cluster state update might not reference existing replica entries
in the routing table anymore. To solve this, PR #19715 ordered the failures by first handling replica before
primary failures. There are other failures that influence more than one routing entry, however. When we have a failed shard entry
for both a relocation source and target, then, depending on the order, either one or the other might point to an out-dated shard
entry. As finding a good order is more difficult than applying the failures, this commit re-adds parts of the ShardRouting
re-resolve logic so that the applyFailedShards method can properly treat shard failure batches.
GeoDistance is implemented using a crazy enum that causes issues with the scripting modules. This commit moves all distance calculations to arcDistance and planeDistance static methods in GeoUtils. It also removes unnecessary distance helper methods from ScriptDocValues.GeoPoints.
This commit enables completion suggester to return documents
associated with suggestions. Now the document source is returned
with every suggestion, which respects source filtering options.
In case of suggest queries spanning more than one shard, the
suggest is executed in two phases, where the last phase fetches
the relevant documents from shards, implying executing suggest
requests against a single shard is more performant due to the
document fetch overhead when the suggest spans multiple shards.
The method requires pairs of fieldnames and property arguments and will fail if
the varargs input is an uneven number. We should check this and fail with an
appropriate IllegalArgumentException instead.
```
Elasticsearch doesn't have any automatic mechanism to share these
components between indexes. If any component is heavy enough to
warrant such sharing then it is the Pugin's responsibility to do
it in their {@link AnalysisProvider} implementation. We recommend
against doing this unless absolutely necessary because it can be
difficult to get the caching right given things like behavior
changes across versions.
```
Closes#19814
This commit cleans up indices in a snapshot repository when all
snapshots containing the index are all deleted. Previously, empty
indices folders would lay around after all snapshots containing
them were deleted.
Plugins provide NamedWriteables that are added to the
NamedWriteableRegistry. Those are added on Nodes already, the same mechanism is
added to the setup for TransportClient.
This commit updates Jackson to the 2.8.1 version, which is more strict when it comes to build objects. It also adds the snakeyaml dependency that was previously shaded in jackson libs.
It also closes#18076
Instead of being lenient in QueryParseContext#parseInnerQueryBuilder we check that the token where the parser stopped reading was END_OBJECT, and throw error otherwise. This is a best effort to verify that the parsers read a whole object rather than stepping out in the middle of it due to malformed queries.
Fuzzy Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Span term Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also modified the parsing code to consume the whole query object.
Common Terms Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Match phrase prefix Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Geo distance Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead.
Match phrase Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Wildcard Query, like many other queries, used to parse even when the query referred to multiple fields and the first one would win. We rather throw an exception now instead.
Also added test for short prefix query variant and modified the parsing code to consume the whole query object.
Regexp Query, like many other queries, used to parse even when the query referred to multiple fields and the last one would win. We rather throw an exception now instead.
Also added test for short prefix query variant.
Prefix Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead.
Also added tests for short prefix quer variant.
Range Query, like many other queries, used to parse when the query refers to multiple fields and the last one would win. We rather throw an exception now instead.
Closes#19547
When we introduces [persistent node ids](https://github.com/elastic/elasticsearch/pull/19140) we were concerned that people may copy data folders from one to another resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different with same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationeel there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying).
Sadly there were two problems with this:
1) One minor and easy to fix - we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in `AllocationService`, in this rare cases. The cluster is good enough to detect this and recover later on, but it's not clean.
2) The assumption that Node Fault Detection will clean up is *wrong* when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume it's mastership. In those cases, the Node Fault Detection isn't active.
This PR fixes these two and prefers incoming nodes to existing node when finishing an election.
On top of the, on request by @ywelsch , `AllocationService` synchronization between the nodes of the cluster and it's routing table is now explicit rather than something we do all the time. The same goes for promotion of replicas to primaries.
AbstractQueryTestCase parses the main version of the query in strict mode, meaning that it will fail if any deprecated syntax is used. It should do the same for alternate versions (e.g. short versions). This is the way it is because the two alternate versions for ids query are both deprecated. Moved testing for those to a specific test method that isolates the deprecations and actually tests that the two are deprecated.
Factor rounding and Interval rounding (the non-date based rounding)
was no longer used so it has been removed. Offset rounding has been
retained for no since both date based rounding classes rely on it
This change does three things:
1. Makes PreBuiltTransportClientTests run since it was silently
failing on a missing dependency
2. Makes PreBuiltTransportClientTests pass
3. Removes the http.type and transport.type from being set in the
transport clients additional settings since these are set to `netty4` by
default anyway.
Fixes an issue where a node that receives a cluster state
update with a brand new cluster UUID but without an
initial persistence block could cause indices to be wiped out,
preventing them from being reimported as dangling indices.
This commit only removes the in-memory data structures and
thus, are subsequently reimported as dangling indices.
A primary shard currently instructs the master to fail a replica shard that it fails to replicate writes to before acknowledging the writes to the client. To ensure that the primary instructing the master to fail the replica is still the current primary in the cluster state on the master, it submits not only the identity of the replica shard to fail to the master but also its own shard identity. This can be problematic however when the primary is relocating. After primary relocation handoff but before the primary relocation target is activated, the primary relocation target is replicating writes through the authority of the primary relocation source. This means that the primary relocation target should probably send the identity of the primary relocation source as authority. However, this is not good enough either, as primary shard activation and shard failure instructions can arrive out-of-order. This means that the relocation target would have to send both relocation source and target identity as authority. Fortunately, there is another concept in the cluster state that represents this joint authority, namely primary terms. The primary term is only increased on initial assignment or when a replica is promoted. It stays the same however when a primary relocates.
This commit changes ShardStateAction to rely on primary terms for shard authority. It also changes the wire format to only transmit ShardId and allocation id of the shard to fail (instead of the full ShardRouting), so that the same action can be used in a subsequent PR to remove allocation ids from the active allocation set for which there exist no ShardRouting in the cluster anymore. Last but not least, this commit also makes AllocationService less lenient, requiring ShardRouting instances that are passed to its applyStartedShards and applyFailedShards methods to exist in the routing table. ShardStateAction, which is calling these methods, now has the responsibility to resolve the ShardRouting objects that are to be started / failed, and remove duplicates.
Today, when listing thread pools via the cat thread pool API, thread
pools are listed in a column-delimited format. This is unfriendly to
command-line tools, and inconsistent with other cat APIs. Instead,
thread pools should be listed in a row-delimited format.
Additionally, the cat thread pool API is limited to a fixed list of
thread pools that excludes certain built-in thread pools as well as all
custom thread pools. These thread pools should be available via the cat
thread pool API.
This commit improves the cat thread pool API by listing all thread pools
(built-in or custom), and by listing them in a row-delimited
format. Finally, for each node, the output thread pools are sorted by
thread pool name.
Relates #19721