This contains several cleanups to the indexed scripts.
Remove the unused FetchSourceContext from the Get request..
Add lang,_version,_id to the REST GET API.
Removes the routing from GetIndexedScriptRequest since the script index is a single shard that is replicated across all nodes.
Fix backward compatible template file reference
Before 1.3.0 on disk scripts could be referenced by requesting
````
_search/template
{
"template" : "ondiskscript"
}
````
This was broken in 1.3.0 by requiring
````
{
"template" :
{
"file" : "ondiskscript"
}
}
````
This commit restores the previous behavior.
Remove support for preference, realtime and refresh
These parameters don't make sense anymore for indexed scripts as we always force the preference to _local and
always refresh after a Put to the indexed scripts index.
Closes#7568Closes#7559Closes#7647Closes#7567
When using the DiskThresholdDecider, it's possible that shards could
already be marked as relocating to the node being evaluated. This commit
adds a new setting `cluster.routing.allocation.disk.include_relocations`
which adds the size of the shards currently being relocated to this node
to the node's used disk space.
This new option defaults to `true`, however it's possible to
over-estimate the usage for a node if the relocation is already
partially complete, for instance:
A node with a 10gb shard that's 45% of the way through a relocation
would add 10gb + (.45 * 10) = 14.5gb to the node's disk usage before
examining the watermarks to see if a new shard can be allocated.
Fixes#7753
Relates to #6168
The bulk API request was marked as completely failed,
in case a request with a closed index was referred in
any of the requests inside of a bulk one.
Implementation Note: Currently the implementation is a bit more verbose in order to prevent an instanceof check and another cast - if that is fast enough, we could execute that logic only once at the
beginning of the loop (thinking this might be a bit overoptimization here).
Closes#6410
The bulk API request was marked as completely failed,
in case a request with a closed index was referred in
any of the requests inside of a bulk one.
Implementation Note: Currently the implementation is a bit more verbose in order to prevent an instanceof check and another cast - if that is fast enough, we could execute that logic only once at the beginning of the loop (thinking this might be a bit overoptimization here).
Closes#6410
InternalEngine#refreshNeeded must increment the ref count on the
store used before it's checking if the searcher is current since
internally a searcher ref is acquired and if that happens concurrently
to a engine close it might violate the assumption that all files
are closed when the store is closed.
This commit also converts some try / finally into try / with.
We currently look for REST tests on file system although they are disabled. We should not do that and move the check earlier on. This way third parties using our test infra, which don't have REST tests on file system, can effectively disable the REST tests, otherwise they would get initialization error despite having disabled them. The downside is that the number of tests visualized is going to be zero instead of the real number of parsed REST tests, but there is nothing we can do about this. Tests get ignored anyways.
This change removes the backwards compatibility workaround that checks that a compoundOrder originated from a single user defined criteria for the purposes of serialising to older versioned nodes.
today we have to catch rejected operation exceptions in various places
and notify an ActionListener. This pattern is error prone and adds a lot
of boilerplait code. It's also easy to miss catching this exception
which only is relevant if nodes are under high load. This commit adds
infrastructure that makes ActionListener first class citizen on async
actions.
Closes#7765
We currently expose generic getters for indices and indicesOptions on the IndicesRequest interface. This commit adds a generic setter as well, which can be used to set the indices to a request. The setter impl throws `UnsupportedOperationException` if called on internal requests. Also throws exception if called on single index operations, since it accepts an array as argument.
Closes#7734
Update request internally executes index and delete operations. We need to make sure that those internal operations hold the same headers and context as the original update request. Achieved via copy constructors that accept the current request and the original request.
Closes#7766
Delete mapping executes flush, delete by query and refresh operations internally. Those internal requests are now initialized by passing in the original delete mapping request so that its headers and request context are kept around.
Closes#7736
Previously we resetted the test cluster for all subsequent tests
even though they didn't fail. This make suites like REST tests faster
and prevents crazy timeouts.
Closes#7775
Changes the name of the field in the scripted metrics aggregation from 'aggregation' to 'value' to be more in line with the other metrics aggregations like 'avg'
Master node related operations were missing proper handling of cluster blocks, allowing for example to perform cluster level update settings even before the state was fully restored on initial cluster startup
Note, the change allows to change read only related settings without checking for blocks on update settings, as without it, it means one can't re-enable metadata/write. Also, it doesn't check for blocks on cluster state and health API, as those are allowed to be used even when blocked to figure out what causes the block.
closes#7763closes#7740
upgradeOneNode() only checked if the new node is in the nodes info.
However, this does not guarantee that all nodes have joined the cluster
already. For example the new node could be the master and might not yet
know about all nodes and the other nodes might not know about the new
master yet.
Depending on which client is picked later, the client might then try to
send request to the old node that was shut down instead of the new one.
Instead of just checking if the new node is in the nodes info we should
therefore also check if the all nodes are in the nodes info.
#7719 introduced temporary node ids for nodes that can't be resolved via their address. The change is overly aggressive and creates temporary nodes also for the configure target hosts.
Closes#7747
In #7723 we removed the `updateAddresses` method from `RestClient` under the assumption that the addresses never change during the suite execution, as REST tests rely on the global cluster. Due to #6734 we restart the global cluster though before each test if there was a failure in the suite. If that happens we do need to make sure that the REST client points to the proper nodes. What was missing before was the http call to verify the es version every time the addresses change, which we do now since we effectively re-initialize the REST client when needed (if the http addresses have changed).
Closes#7737
This commit disalbes HTTP for all the suite and test scope tests
since it's an unused / unneeded module which takes time to startup.
This also uses a JVM private port range for HTTP ports to ensure
there are no cross JVM conflicts.
We rely on retry logic when reading a snapshot since it's concurrently
serialized. We should move to a better logic here but the refactoring
of the blobstore change the semantics and this now throws Json
exceptions rather than returning an unexpected Token
When executing a bulk request, with create index operation and auto generate id, if while the primary is relocating the bulk is executed, and the relocation is done while N items from the bulk have executed, the full shard bulk request will be retried on the new primary. This can create duplicates because the request is not makred as potentially holding conflicts.
This change carries over the response for each item on the request level, and if a conflict is detected on the primary shard, and the response is there (indicating that the request was executed once already), use the mentioned response as the actual response for that bulk shard item.
On top of that, when a primary fails and is retried, the change now marks the request as potentially causing duplicates, so the actual impl will do the extra lookup needed.
This change also fixes a bug in our exception handling on the replica, where if a specific item failed, and its not an exception we can ignore, we should actually cause the shard to fail.
closes#7729
When a node is elected as master or receives a join request, we submit a cluster state update task. We should give the node join update task a lower priority than the elect as master to increase the chance it will not be rejected. During master election there is a big chance that these will happen concurrently.
This commit lowers the priority of node joins from IMMEDIATE to URGENT
Closes#7733
Some tests like CorruptedTranslogTests rely on the fact that we
are recovering from translog. In those cases we need to prevent
flushes from happening during indexing. This change adds an optional
flag on the #indexRandom utility to disable flushes.
ActionNamesTests#testIncomingAction rarely uses a random action name to make sure that actions registered via plugins work properly. In some cases the random action would conflict with existing one (e.g. tv) and make the test fail. Fixed also testOutgoingAction although the probability of conflict there is way lower due to longer action names used from 1.4 on.
Most notably the elected_as_master task should run as soon as possible. This is an issue as node join request do use `Priority.IMMEDIATE` and can be unjustly rejected.
Closes#7718
It uses a cluster state update task and it gets rejected if not run on a master node. We should enable running on non-masters if the local flag is set.
Also, report any unexpected error that may happen during this cluster state update task
Closes#7731
Similar to NettyTransport.doStop() all actions which disconnect
from a node (and thus call awaitUnterruptibly) should not be executed
on the I/O thread.
This patch ensures that all disconnects happen in the generic threadpool, trying to avoid unnecessary `disconnectFromNode` calls.
Also added a missing return statement in case the component was not yet
started when catching an exception on the netty layer.
Closes#7726
The Unicast Zen Ping mechanism is configured to ping certain host:port combinations in order to discover other node. Since this is only a ping, we do not setup a full connection but rather do a light connect with one channel. This light connection is closed at the end of the pinging.
During pinging, we may discover disco nodes which are not yet connected (via temporalResponses). UnicastZenPing will setup the same light connection for those node. However, during pinging a cluster state may arrive with those nodes in it. In that case , we will mistakenly believe those nodes are connected and at the end of pinging we will mistakenly disconnect those valid node.
This commit makes sure that all nodes UnicastZenPing connects to have a unique id and can be safely disconnected.
Closes#7719
When cancelling recoveries, we wait for up to 10s for the source node to be notified before continuing. This is not needed in two cases:
1) The source node has been disconnected due to node shutdown (recovery is canceled as a response to cluster state processing)
2) The current thread is the one that will be notifying the source node (happens when one of the calls from the source nodes discoveres the local index is closed)
The first one is especially important as it may delay cluster state update processing with 10s.
Closes#7717
Make the http addresses within the REST client final. It makes no sense to update them before each test if we don't check the version of the nodes again, which would mean adding too much overhead (an additional http call before each test) for no reason. We just reuse the same nodes for the whole suite and check the version once while initializing the client. Would be nice to make the REST client within the execution context final but its initialization still needs to happen after the `ElasticsearchIntegrationTest#beforeInternal` that assigns `GLOBAL_CLUSTER` to `currentCluster`.
Closes#7723
When communicating with 1.3 and earlier nodes, it's possible that the
field data breaker info is not sent at all. When this happens, we should
leave the `breaker` variable as-is (unset) instead of creating an
AllCircuitBreakerStats object with a null fd breaker and fake request &
parent breakers.
The terms aggregation can now support sorting on multiple criteria by replacing the sort object with an array or sort object whose order signifies the priority of the sort. The existing syntax for sorting on a single criteria also still works.
Contributes to #6917
Replaces #7588
Also fix the test
FunctionScoreTests#simpleWeightedFunctionsTestWithRandomWeightsAndRandomCombineMode
which sometimes failed due to rounding issues. Make sure
only floats are returned as scores to assure ratio of
expected and returned score is 1.0f.
The test starts a cluster with random nodes as unicast hosts but *doesn't* use min_master_nodes. If the unicast hosts are started last, nodes may elect themselves as master as they do not have mechanism yet to share information.
ElasticsearchRestTests has now a `restClientSettings` method that can be overriden to provide headers as settings (similarly to what we do with transport client). Those headers will be sent together with every REST requests within the tests.
Closes#7710
Adds a special case to the GeoBoundingBoxFilterParser so that the left of the box is not normalised in the case where left and right are 360 apart. Before this change the left would be normalised to 180 in this case and the filter would only match points with a longitude of 180 (or -180).
Closes#5128
GroupShardsIterator is used in many places like the search execution
to determin which shards to query. This can hold shards of one index
as well as shards of multiple indices. The iteration order is used
to assigne a per-request shard ID for each shard that is used as a
tie-breaker when scores are the same. Today the iteration order is
soely depending on the HashMap iteration order which is undefined or
rather implementation dependent. This causes search requests to return
inconsistent results across requests if for instance different nodes
are coordinating the requests.
Simple queries like `match_all` may return results in arbitrary order
if pagination is used or may even return different results for the same
request even though there hasn't been a refresh call and preferences are
used.
Today if we run into exception like NumberFormatException or IAE
when we try to open a commit point to retrieve checksums and calculate
store metadata we just bubble them up. Yet, those are very likely index
corruptions. In such a case we should really mark the shard as
corrupted.
In the reduce logic of the top_hits aggregation if the first shard result to process contained has no results then the merging of all the shard results can go wrong resulting in an incorrect sorted hits.
This bug can only manifest with a sort other than score.
Closes#7697
If you have previously corrupted files, this method currently builds an
exception like:
```
failed engine [corrupted preexisting index]
failed to start shard
```
Followed by a CorruptIndexException. This commit writes the entire
stacktrace to provide additional information. It also changes the
failure message from `corrupted preexisting index` to `preexisting
corrupted index` to prevent confusion.
Closes#7596
With #7594 we replaced the static `BaseRestHandler#addUsefulHeaders` by introducing the `RestClientFactory` that can be injected and used to register the relevant headers. To simplify things, we can now register relevant headers through the `RestController` and remove the `RestClientFactory` that was just introduced.
Closes#7675
Returns information about settings, aliases, warmers, and mappings. Basically returns the IndexMetadata. This new endpoint replaces the /{index}/_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers and /_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers endpoints whilst maintaining the same response formats. The only exception to this is on the /_alias|_aliases|_warmer|_warmers endpoint which will now return a section for 'aliases' or 'warmers' even if no aliases or warmers exist. This backwards compatibility change is documented in the reference docs.
Closes#4069
With the change in #7493, we introduced a pinging round when a master nodes goes down. That pinging round helps validating the current state of the cluster and takes, by default, 3 seconds. It may be that during that window, a new node tries to join the cluster and starts pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master it will force recovery from the gateway (it has no in memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on it's own, it's a shame. This commit demotes "new" nodes during master election so the will only be elected if really needed.
Closes#7558
By default the reroute API should return the new cluster state, excluding the metadata. It was however it was wrongly using an old parameter (filter_metadata) and thus failed to do so. This commits restores but wiring it to the correct `metric` parameter. We also add an enum representing the possible metrics, to avoid similar future mistakes.
Closes#7520Closes#7523