Due to a misordering of the HTTP handlers, the Netty 4 HTTP server
mishandles Expect: 100-continue headers from clients. This commit fixes
this issue by ordering the handlers correctly.
Relates #19904
testUnknownObjectException used to generate malformed json objects in some cases, due to the existence of arrays as it was not closing the injected object correctly. That is why the test was catching JsonParseException among the exception that are expected to be thrown. That is fixed by tracking where the new object is placed and placing its end object marker to the right level rather than always at the end.
Also introduced a mechanism to explicitly declare objects that won't cause any exception when they get additional objects injected, so that there is no need to override the method anymore as that caused copy pasting of the whole test method. This also makes sure that changes are reflected in tests, as those inner objects are not skipped but we actually check that what is declared is true (no exceptions get thrown when an additional object is added within them.
>However, the version of the new cluster should be the same or newer than the cluster that was
Afaik, you can't restore a snapshot to a newer cluster that is not consecutively newer (i.e. can't restore 1.x snapshot to a 5.x cluster). This is to clarify the statement above moving forward.
Currently, when attempting to delete a snapshot, we check
if a snapshot is in progress before proceeding with the
delete. However, we do not check if a restore is taking
place before deleting. This can lead to concurrency issues
where a restore is in progress but the snapshotted files
for the restore are being deleted underneath.
This commit first checks if a restore is in progress and
if so, it prevents the deletion of a snapshot with an
exception.
Note that this is not a complete solution because it is
still possible that a restore of the same snapshot is
started after the deletion commenced but before the
deletion finished. But there is a much smaller window
for this to occur and this commit is a quick way to
check for the common case.
When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.
If more dynamic scripts are sent, a user will get the following
exception:
```json
{
"error" : {
"root_cause" : [
{
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "i",
"node" : "a5V1eXcZRYiIk8lecjZ4Jw",
"reason" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
}
],
"caused_by" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
},
"status" : 500
}
```
This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.
Relates to #19396
Slims the public interface of RoutingNodes down to 4 methods to update routing entries:
- initializeShard() -> initializes an unassigned shard
- startShard() -> starts an initializing shard / completes relocation of a shard
- relocateShard() -> starts relocation of a started shard
- failShard() -> fails/cancels an assigned shard
In the spirit of PR #19743, where deassociateDeadNodes was moved to its own public method to be only called when nodes have actually left the cluster and not on every reroute step, this commit also removes electPrimariesAndUnassignedDanglingReplicas from AllocationService and folds it into the shard failure logic. This means that an active replica is promoted to primary in the same method where the primary was failed. Previously we would scan in each reroute iteration for active replicas to be promoted to primary.
If a `keyword` field is both indexed and doc-valued, then we will convert the
input string to utf8 bytes twice: once for indexing/storing, and once for doc
values. This commit changes `keyword` fields to compute the utf8 representation
up-front and then feed both the inverted index and doc values with it.
Rather than adding version-based bw compat logic, I broke the `keyword` field
(they are now indexed/stored as a binary field rather than string), which is
fine since we are still on alpha releases for 5.0.
Previously, the engine would catch an out of memory error and would try
to handle the error (it would try to fail the engine, and then it would
swallow the out of memory error). Catching the out of memory errors was
removed in 3343ceeae4 so this code path is
not effectively dead. This commit removes this dead code from the
engine.
Relates #19881