If no remote clusters are registered, we shouldn't even try resolving remote clusters as part of indices names. Just go ahead and treat the index as a local one, which may or may not exist. In fact, the character that we use a separator may be part of an alias name, or part of a date math expression. We will go ahead with the remote search only if the prefix before the first occurrence of the separator is an actual registered cluster. Otherwise we will treat the index as a local one.
Previously we would go remotely anytime we'd find the separator in an index name, which caused false positives with aliases and date math expressions. It is also safer to not go remotely unless there is some remote cluster registered. We'd also throw exception whenever an unknown remote cluster was referred to, but this could again conflict with date math expressions or aliases: say we have some remote cluster configured, and we are using the separator in a date math expression, we only want to go remotely when the prefix matches a configured cluster, stay local otherwise, as the index name may still be valid locally.
This commit adds `qa/multi-cluster-search` which currently does a
simple search across 2 clusters. This commit also adds support for IPv6
addresses and fixes an issue where all shards of the local cluster are searched
when only a remote index was given.
Search api looks at indices that the request relates to, if they contain a certain separator ("|" at the moment, to be better defined in the future) the index name is split into two part where the first portion is the name of a remote cluster and the second part is the name of the index expression. Remote clusters are defined as dynamic cluster settings. There are some TODOs and open question but the main functionality works.
* ClusterSearchShardsGroup to return ShardId rather than the int shard id
This allows more info to be retrieved, like the index uuid which is exposed through the ShardId object but was not available before
* Make ClusterSearchShardsResponse empty constructor public
This allows to receive such responses when sending ClusterSearchShardsRequests directly through TransportService (not using ClusterSearchShardsAction via Client), otherwise an empty response cannot be created unless the class that does it is in org.elasticsearch.action, admin.cluster.shards package
* adjust visibility of ClusterSearchShards members
The index uuid is unique across multiple clusters, while the index name is not. Using the index uuid to look up filters in the alias filters map is better and will be needed for multi cluster search.
If a bug occurs in painless compilation (not from a user, but from the
painless infrastructure), a VerifyError may be thrown when compiling the
broken generated class. This commit wraps VerifyErrors in
ScriptException so that useful information is returned to the user,
which can be passed on to the ES team for analysis.
This bug would cause a VerifyError when scripts using the === operator
were comparing a def type against a primitive type since the primitive
type wasn't being appropriately boxed.
This commit makes sure that there is only one instance of the two services rather than one per transport action that uses it.
Also, we take their initialization out of guice's hands by binding it to a specific instance. Otherwise those two objects would get created within a constructor that is called by guice. That may cause problem for instance when throwing an exception from such constructors as guice tries all over again to re-initialize objects and fills up logs with stacktraces.
* replace ShardRouting argument in AbstractSearchAsyncAction#onFirstPhaseResult with more contained String nodeId
There is no need to pass in ShardRouting if the only info read from it is the current node id, the shard id can be read directly from the ShardIterator that's already provided as an argument.
* avoid creating a new ShardId when creating a SearchShardTarget in SnapshotsService
Today when handling unreleased versions for backwards compatilibity
support, we scatted version constants across the code base and add some
asserts to support removing these constants when the version in question
is actually released. This commit improves this situation, enabling us
to just add a single unreleased version constant that can be renamed
when the version is actually released. This should make maintenance of
these versions simpler.
Relates #21760
NOTE: The result of `?.` and `?:` can't be assigned to primitives. So
`int[] someArray = null; int l = someArray?.length` and
`int s = params.size ?: 100` don't work. Do
`def someArray = null; def l = someArray?.length` and
`def s = params.size ?: 100` instead.
Relates to #21748
ShardSearchRequest was previously taking in the whole ShardRouting as a constructor argument while it only needs the ShardsId, changed that to carry over only the needed bits.
* Transport client: Fix remove address to actually work
The removeTransportAddress method of TransportClient removes the address
from the list of nodes that the client pings to sniff for nodes.
However, it does not remove it from the list of existing connected
nodes. This means removing a node is not possible, as long as that node
is still up.
This change removes the node from the connected nodes list before
triggering sampling (ie sniffing). While the fix is simple, testing was
not because there were no existing tests for sniffing. This change also
modifies the mocks used by transport client unit tests in order to allow
mocking sniffing.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
The `error_trace` parameter turns on the `stack_trace` field
in errors which returns stack traces.
Removes documentation for `camelCase` because it hasn't worked
in a while....
Documents the internal parameters used to render stack traces as
internal only.
Closes#21708
This commit refactors the handling of bind permissions, which is in need
of a little cleanup. For example, in its current state, the code for
handling permissions for transport profiles is split across two
methods. This commit refactors this code hopefully making it easier to
work with in future changes. This change is mostly mechanical, no
functionality is changed.
Relates #21742
The test UnicastZenPing#testResolveTimeout chooses a random resolve
timeout between 1ms and 100ms. Close to the lower bound, this is far too
short and the test races against the concurrent resolves executing
before the timeout elapses. This commit increases the timeout to
something that is far less likely to race, yet will not slow the test
down since we are not doing resolves against a real DNS service anyway.
Note that we still want a short resolve timeout since we are testing
whether or not timeouts really work here (by latching one of the
resolves to respond slowly).
Add indices and filter information to search shards api output
The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias.
This is relevant since we have moved parsing of alias filters to the coordinating node.
Relates to #20916
Today there is no way to get notified if a node is disconnected. Client code
must poll the TransportClient constantly to detect that a node is not connected
anymore in order to react and add new nodes or notify altering etc. For instance
if a hostname gets resolved to an IP but that host is disconnected clients want
to reconnect by resolving the hostname again which is a common situation in cloud
environments.
Closes#21424
Today we eagerly resolve unicast hosts. This means that if DNS changes,
we will never find the host at the new address. Moreover, a single host
failng to resolve causes startup to abort. This commit introduces lazy
resolution of unicast hosts. If a DNS entry changes, there is an
opportunity for the host to be discovered. Note that under the Java
security manager, there is a default positive cache of infinity for
resolved hosts; this means that if a user does want to operate in an
environment where DNS can change, they must adjust
networkaddress.cache.ttl in their security policy. And if a host fails
to resolve, we warn log the hostname but continue pinging other
configured hosts.
When doing DNS resolutions for unicast hostnames, we wait until the DNS
lookups timeout. This appears to be forty-five seconds on modern JVMs,
and it is not configurable. If we do these serially, the cluster can be
blocked during ping for a lengthy period of time. This commit introduces
doing the DNS lookups in parallel, and adds a user-configurable timeout
for these lookups.
Relates #21630
PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.
This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
This commit clarifies the contract of Cache#computeIfAbsent so that an exception that occurs during the execution
of the loader is thrown to all callers. Prior to this commit, the first caller would get the ExecutionException
and other callers that called during the load execution would get null, which is confusing.
You can use `Debug.explain(someObject)` in painless to throw an
`Error` that can't be caught by painless code and contains an
object's class. This is useful because painless's sandbox doesn't
allow you to call `someObject.getClass()`.
Closes#20263
This commit adds the ability to support running with plugins in tests that make use of
backwards compatibility nodes. This can be used to test rolling upgrades with plugins
to ensure they do not cause issues during a rolling upgrade of elasticsearch.
The `type` parameter has always been accepted by the search_shards api, probably to make the api and its urls the same as search. Truth is that the type never had any effect, it's been ignored from day one while accepting it may make users think that we actually do something with it.
This commit removes support for the type parameter from the REST layer and the Java API. Backwards compatibility is maintained on the transport layer though.
The new added serialization test also uncovered a bug in the java API where the `ClusterSearchShardsRequest` could be created with no arguments, but the indices were required to be not null otherwise the request couldn't be serialized as `writeTo` would throw NPE. Fixed by setting a default value (empty array) for indices.
When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.
Relates #21721