Currently an NioChannel is created and it is UNREGISTERED. At some point
it is registered with a selector. From that point on, the channel can
only be closed by the selector. The fact that a channel might not be
associated with a selector has significant implications for concurrency
and the channel shutdown process. The only thing that is simplified by
allowing channels to be in a state independent of a selector is some
testing scenarios.
This PR modifies channels so that they are given a selector at creation
time and are always associated with that selector. Only that selector
can close that channel. This simplifies the channel lifecycle and
closing intricacies.
Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries).
When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validation that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that:
1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining)
2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join).
3) The node's major version is at least as higher as the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major to join (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, the may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way. This restriction only holds if the cluster state has been recovered (i.e., the cluster has properly formed).
Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.
The `QueryRewriteContext` used to provide a client object that can
be used to fetch geo-shapes, terms or documents for percolation. Unfortunately
all client calls used to be blocking calls which can have significant impact on the
rewrite phase since it occupies an entire search thread until the resource is
received. In the case that the index the resource is fetched from isn't on the local
node this can have significant impact on query throughput.
Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction
Today we have duplicated code that is quite complicated to iterate
over rewriteable (`QueryBuilders` mainly) This change introduces a
`Rewriteable` interface that allow to share code to do the rewriting as
well as encapsulation and composition of queries.
Setting a timeout or enforcing low-level search cancellation used to make us
wrap the collector and check either the current time or whether the search
task was cancelled for every collected document. This can be significant
overhead on cheap queries that match many documents.
This commit changes the approach to wrap the bulk scorer rather than the
collector and exponentially increase the interval between two consecutive
checks in order to reduce the overhead of those checks.
When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validation that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that:
1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining)
2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join).
3) The node's major version is at least as higher as the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major to join (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, the may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way.
Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.
* Register data node stats from info carried back in search responses
This is part of #24915, where we now calculate the EWMA of service time for
tasks in the search threadpool, and send that as well as the current queue size
back to the coordinating node. The coordinating node now tracks this information
for each node in the cluster.
This information will be used in the future the determining the best replica a
search request should be routed to. This change has no user-visible difference.
* Move response time timing into ResponseListenerWrapper
* Move ResponseListenerWrapper to ActionListener instead of SearchActionListener
Also removes the logger
* Move `requestIndex` back to private
* De-guice-ify ResponseCollectorService \o/
* Undo all changes to SearchQueryThenFetchAsyncAction
* Remove unneeded response collector from TransportSearchAction
* Undo all changes to SearchDfsQueryThenFetchAsyncAction
* Completely rewrite the inside of ResponseCollectorService's record keeping
* Documentation and cleanups for ResponseCollectorService
* Add unit test for collection of queue size and service time
* Fix Guice construction error
* Add basic unit tests for ResponseCollectorService
* Fix version constant for the master merge
* Fix test compilation after master merge
* Add a test for node removal on cluster changed event
* Remove integration test as there are now unit tests
* Rename ResponseListenerWrapper -> SearchExecutionStatsCollector
* Fix line-length
* Make classes private and final where appropriate
* Pass nodeId into SearchExecutionStatsCollector and use only ActionListener
* Get nodeId from connection so searchShardTarget can be private
* Remove threadpool from SearchContext, get it from IndexShard instead
* Add missing import
* Use BiFunction for responseWrapper rather than passing in collector service
The following token filters were moved: arabic_normalization, german_normalization, hindi_normalization, indic_normalization, persian_normalization, scandinavian_normalization, serbian_normalization, sorani_normalization, cjk_width and cjk_width
Relates to #23658
#25521 changed channel closing to be handled async on anything but transport stop. This means it may take a while before
calling `connection.close()` and the node being removed from the `connectedNodes` list (but the connection is immediately unusuable).
Fixes#25686
Currently replication and recovery are both coordinated through the latest cluster state available on the ClusterService as well as through the GlobalCheckpointTracker (to have consistent local/global checkpoint information), making it difficult to understand the relation between recovery and replication, and requiring some tricky checks in the recovery code to coordinate between the two. This commit makes the primary the single owner of its replication group, which simplifies the replication model and allows to clean up corner cases we have in our recovery code. It also reduces the dependencies in the code, so that neither RecoverySourceXXX nor ReplicationOperation need access to the latest state on ClusterService anymore. Finally, it gives us the property that in-sync shard copies won't receive global checkpoint updates which are above their local checkpoint (relates #25485).
It was brought up that our current client artifacts have generic names like 'rest' that may cause conflicts with other artifacts.
This commit renames:
- rest -> elasticsearch-rest-client
- sniffer -> elasticsearch-rest-client-sniffer
- rest-high-level -> elasticsearch-rest-high-level-client
A couple of small changes are also preparing the high level client for its first release.
Closes#20248
We already use a per JVM port range in MockTransportService. Yet,
it's possible that if we are executing in the JVM with ordinal 0 that
other clusters reuse ports from the mock transport service and some tests
try to simulate disconnects etc. By using a non-defautl port range (starting at 10300)
we prevent internal test clusters from reusing any of the mock impls ports
Relates to #25301
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.
This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
There is a bug when a call to `BytesReferenceStreamInput` skip is made
on a `BytesReference` that has an initial offset. The offset for the
current slice is added to the current index and then subtracted from the
length. This introduces the possibility of a negative number of bytes to
skip. This happens inside a loop, which leads to an infinte loop.
This commit correctly subtracts the current slice index from the
slice.length. Additionally, the `BytesArrayTests` are modified to test
instances that include an offset.
This is a protection mechanism to prevent a single search request from
hitting a large number of shards in the cluster concurrently. If a search is
executed against all indices in the cluster this can easily overload the cluster
causing rejections etc. which is not necessarily desirable. Instead this PR adds
a per request limit of `max_concurrent_shard_requests` that throttles the number of
concurrent initial phase requests to `256` by default. This limit can be increased per request
and protects single search requests from overloading the cluster. Subsequent PRs can introduces
addiontional improvemetns ie. limiting this on a `_msearch` level, making defaults a factor of
the number of nodes or sort shards iters such that we gain the best concurrency across nodes.
We lost the cluster alias due to some special caseing in inner hits
and due to the fact that we didn't pass on the alias to the shard request.
This change ensures that we have the cluster alias present on the shard to
ensure all SearchShardTarget reads preserve the alias.
Relates to #25606
Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.
For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.
This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.
This change collapses some of the packages for the bucket aggregations into their parent packages. This was done for the following aggregations:
* The variants of the range aggregation (geo_distance, date and ip) were moved into the `o.e.s.a.bucket.range` package
* The `o.e.s.a.bucket.terms.support` package was removed and the classes were moved to `o.e.s.a.bucket.terms`
* The filter aggregation was moved to `o.e.s.a.bucket.filter`
Since this PR is already relatively large with only the above changes subsequent PRs will do similar operations on relevant metric and pipeline aggregations
Relates to #22868
We currently check whether translog files can be trimmed whenever we create a new translog generation or close a view. However #25294 added a long translog retention period (12h, max 512MB by default), which means translog files should potentially be cleaned up long after there isn't any indexing activity to trigger flushes/the creation of new translog files. We therefore need a scheduled background check to clean up those files once they are no longer needed.
Relates to #10708
This commit does two things:
- bumps the version from 6.0.0-alpha3 to 6.0.0-beta1
- renames the 6.0.0-alpha3 version constant to 6.0.0-beta1
Relates #25621
This commit adds cross-settings validation for the low/high/flood stage
disk watermark settings. This validation was enabled by the introduction
of multiple settings validation.
Relates #25600
This commit refactors the global checkpont tracker to make it more
resilient. The main idea is to make it more explicit what state is
actually captured and how that state is updated through
replication/cluster state updates etc. It also fixes the issue where the
local checkpoint information is not being updated when a shard becomes
primary. The primary relocation handoff becomes very simple too, we can
just verbatim copy over the internal state.
Relates #25468
* Improved REST endpoint exception handling, see #15335
Also improved OPTIONS http method handling to better conform with the
http spec.
* Tidied up formatting and comments
See #15335
* Tests for #15335
* Cleaned up comments, added section number
* Swapped out tab indents for space indents
* Test class now extends ESSingleNodeTestCase
* Capture RestResponse so it can be examined in test cases
Simple addition to surface the RestResponse object so we can run tests
against it (see issue #15335).
* Refactored class name, included feedback
See #15335.
* Unit test for REST error handling enhancements
Randomizing unit test for enhanced REST response error handling. See
issue #15335 for more details.
* Cleaned up formatting
* New constructor to set HTTP method
Constructor added to support RestController test cases.
* Refactored FakeRestRequest, streamlined test case.
* Cleaned up conflicts
* Tests for #15335
* Added functionality to ignore or include path wildcards
See #15335
* Further enhancements to request handling
Refactored executeHandler to prioritize explicit path matches. See
#15335 for more information.
* Cosmetic fixes
* Refactored method handlers
* Removed redundant import
* Updated integration tests
* Refactoring to address issue #17853
* Cleaned up test assertions
* Fixed edge case if OPTIONS method randomly selected as invalid method
In this test, an OPTIONS method request is valid, and should not return
a 405 error.
* Remove redundant static modifier
* Hook the multiple PathTrie attempts into RestHandler.dispatchRequest
* Add missing space
* Correctly retrieve new handler for each Trie strategy
* Only copy headers to threadcontext once
* Fix test after REST header copying moved higher up
* Restore original params when trying the next trie candidate
* Remove OPTIONS for invalidHttpMethodArray so a 405 is guaranteed in tests
* Re-add the fix I already added and got removed during merge :-/
* Add missing GET method to test
* Add documentation to migration guide about breaking 404 -> 405 changes
* Explain boolean response, pull into local var
* fixup! Explain boolean response, pull into local var
* Encapsulate multiple HTTP methods into PathTrie<MethodHandlers>
* Add PathTrie.retrieveAll where all matching modes can be retrieved
Then TrieMatchingMode can be package private and not leak into RestController
* Include body of error with 405 responses to give hint about valid methods
* Fix missing usageService handler addition
I accidentally removed this :X
* Initialize PathTrieIterator modes with Arrays.asList
* Use "== false" instead of !
* Missing paren :-/
Indexing ids in binary form should help with indexing speed since we would
have to compare fewer bytes upon sorting, should help with memory usage of
the live version map since keys will be shorter, and might help with disk
usage depending on how efficient the terms dictionary is at compressing
terms.
Since we can only expect base64 ids in the auto-generated case, this PR tries
to use an encoding that makes the binary id equal to the base64-decoded id in
the majority of cases (253 out of 256). It also specializes numeric ids, since
this seems to be common when content that is stored in Elasticsearch comes
from another database that uses eg. auto-increment ids.
Another option could be to require base64 ids all the time. It would make things
simpler but I'm not sure users would welcome this requirement.
This PR should bring some benefits, but I expect it to be mostly useful when
coupled with something like #24615.
Closes#18154
Transport profiles unfortunately have never been validated. Yet, it's very
easy to make a mistake when configuring profiles which will most likely stay
undetected since we don't validate the settings but allow almost everything
based on the wildcard in `transport.profiles.*`. This change removes the
settings subset based parsing of profiles but rather uses concrete affix settings
for the profiles which makes it easier to fall back to higher level settings since
the fallback settings are present when the profile setting is parsed. Previously, it was
unclear in the code which setting is used ie. if the profiles settings (with removed
prefixes) or the global node setting. There is no distinction anymore since we don't pull
prefix based settings.
Some tests use MockTransportService to do network based testing.
Yet, we run tests in multiple JVMs that means
concurrent tests could claim port that another JVM just released
and if that test tries to simulate a disconnect it might be smart
enough to re-connect depending on what is tested. To reduce the risk,
since this is very hard to debug we use a different default
port range per JVM unless the incoming settings overriding it.
Closes#25301
Today when we run out of disk all kinds of crazy things can happen
and nodes are becoming hard to maintain once out of disk is hit.
While we try to move shards away if we hit watermarks this might not
be possible in many situations. Based on the discussion in #24299
this change monitors disk utilization and adds a flood-stage watermark
that causes all indices that are allocated on a node hitting the flood-stage
mark to be switched read-only (with the option to be deleted). This allows users to react on the low disk
situation while subsequent write requests will be rejected. Users can switch
individual indices read-write once the situation is sorted out. There is no
automatic read-write switch once the node has enough space. This requires
user interaction.
The flood-stage watermark is set to `95%` utilization by default.
Closes#24299
All query builders written as self contained xContent objects, to we should mark
them accordingly using ToXContentObject. This also makes it possible to use
things like XContentHelper#toXContent to render query builders in tests.
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
This commit makes the use of the global network settings explicit instead
of implicit within NetworkService. It cleans up several places where we fall
back to the global settings while we should have used tcp or http ones.
In addition this change also removes unnecessary settings classes
Hadoop 2.7.x libraries fail when running on JDK9 due to the version string changing to a single
character. On Hadoop 2.8, this is no longer a problem, and it is unclear on whether the fix will be
backported to the 2.7 branch. This commit upgrades our dependency of Hadoop for the HDFS
Repository to 2.8.1.
We have various assertions that check we never block on transport
threads. This commit adds the thread names for the NioTransport to
these assertions.
With this change I had to fix two places where we were calling blocking
methods from the transport threads.
This commit adds additional protection to `ESSelector` and its
implementations to ensure that channels are not enqueued after the
selector is closed.
After a channel has been added to the queue, we check that the selector
is open. If it is not, then we remove the channel from the queue. If the
channel is removed successfully, we throw an `IllegalStateException`.
Our current TCPTransport logic assumes that we do not pass pings to
the TCPTransport level.
This commit fixes an issue where NioTransport was passing pings to
TCPTransport and leading to exceptions.
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
In SimpleNioTransportTests we assert that an IOException has a certain
message. This message appears that it is not dependible (and might
change based on platform).
Our other transport tests (mock and netty) do not make this assertion.
Instead they only assert on our application exception message. This
commit removes the IOException message assertion. And retains the
ConnectTransportException message assertion.
This commit introduces a nio based tcp transport into framework for
testing.
Currently Elasticsearch uses a simple blocking tcp transport for
testing purposes (MockTcpTransport). This diverges from production
where our current transport (netty) is non-blocking.
The point of this commit is to introduce a testing variant that more
closely matches the behavior of production instances.
This commit removes path.conf as a valid setting and replaces it with a
command-line flag for specifying a non-default path for configuration.
Relates #25392
The following token filters were moved: stemmer, stemmer_override, kstem, dictionary_decompounder, hyphenation_decompounder, reverse, elision and truncate.
Relates to #23658
While real secure settings (ie an ES keystore) cannot be merged
together, mocked secure settings can and need to be sometimes merged.
This commit adds a merge method to allow tests to merge together
multiple instances of secure settings.
OldIndexBackwardsCompatibilityIT#testOldClusterStates tested whether global and index metadata could be read from data directory,
this can also be tested in full cluster qa test that checks cluster state via api.
Relates to #24939
#25147 added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours.
In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations.
Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number.
Relates to #10708
Most notable changes:
- better update concurrency: LUCENE-7868
- TopDocs.totalHits is now a long: LUCENE-7872
- QueryBuilder does not remove the boolean query around multi-term synonyms:
LUCENE-7878
- removal of Fields: LUCENE-7500
For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
MockTransportServices allows us to simulate network disruptions in our testing infra. Sadly it wasn't updated to the state of the art in Transport land. This PR brings it up to speed. Specifically:
1) Opening a connection is now also blocked (before only node connections were blocked)
2) Simplifies things using the latest connection based notification between TcpTransport and TransportService for when a disconnect happens.
3) By 2, it fixes a race condition where we may fail to respond to a sent request when it is sent concurrently with the closing of a connection. The old code relied on a node based bridge between tcp transport and transport service. Sadly, the following doesn't work any more:
```
if (transport.nodeConnected(node)) {
// this a connected node, disconnecting from it will be up the exception
transport.disconnectFromNode(node); <-- this may now be a noop and it doesn't mean that the transport service was notified of the disconnect between the nodeConnected check and here.
} else {
throw new ConnectTransportException(node, reason, e);
}
```
If secure settings are closed after the node has been constructed
no key-store access is permitted. We should also try to be as close as possible
to the real behavior if we mock secure settings. This change also adds
the same behavior as bootstrap has to InternalTestCluster to ensure we fail
if we try to read from secure settings after the node has been constructed.
In MockFSDirectory we should use the actual indexes settings to build
a new IndexMetaData settings object instead of the node settings.
Relates to #25297
I'm still trying to hunt down rare failures in the cancelation tests
for reindex and friends. Here is the latest:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.x+multijob-unix-compatibility/os=ubuntu/876/console
It doesn't show much, other than that one of the tasks didn't kill
itself when asked to cancel.
So I'm going a bit crazy with debug logging so that the next time this
comes up I can trace exactly what happened.
Additionally, this tweaks the logic around how rethrottles were
performed around cancel. Previously we set the `requestsPerSecond`
to `0` when we cancelled the task. That was the "old way" to set them
to inifity which was the intent. This switches that from `0` to
`Float.MAX_VALUE` which is the "new way" to set the `requestsPerSecond`
to infinity. I don't know that this is much better, but it feels better.
In tests, we sometimes create a random directory service and as part of that the IndexSettings get
built again. When we build them again, we need to make sure we do not set the secure settings on
the new IndexMetaData object that gets created as the node settings already have the secure
settings and the index settings and node settings will be combined. If both have secure settings,
the settings builder will throw an AlreadySetException.
Indexing or deleting documents through the IndexShard interface is quite complex and error-prone. It requires multiple calls, e.g. first prepareIndexOnPrimary, then do some checks if mapping updates have occurred, then do the actual indexing using index(...) etc. Currently each consumer of the interface (local recovery, peer recovery, replication) has additional custom checks built around it to deal with mapping updates, some of which are even inconsistent. This commit aims at reducing the complexity by exposing a simpler interface on IndexShard. There are no more prepare*** methods and the mapping complexity is also hidden, but still giving callers a possibility to implement custom logic to deal with mapping updates.
Today we maintain a map of open connections in order to close them when
a low level channel gets closed or handles a failure. We also spawn a thread due to some
tricky concurrency issues especially with respect to netty since they listener might
be called on a transport / boss thread. Executions on those threads must not be blocking
since otherwise we will likely deadlock the event processing which adds to the
complexity of the concurrency model in this class.
This change associates the connection with the close callback that every channel invokes
once it's closed which allows us to remove the connections map. A relaxed non-blocking
concurrency model in the connection close listener allows cleaning up connected nodes without
blocking on any lock.
This change adds tests for the aggregation parsing that try to simulate that we
can parse existing aggregations in a forward compatible way in the future,
ignoring potential newly added fields or substructures to the xContent response.
Removes the `assemble` task from the `build` task when we have
removed `assemble` from the project. We removed `assemble` from
projects that aren't published so our releases will be faster. But
That broke CI because CI builds with `gradle precommit build` and,
it turns out, that `build` includes `check` and `assemble`. With
this change CI will only run `check` for projects without an
`assemble`.
Today TcpTransport is the de-facto base-class for transport implementations.
The need for all the callbacks we have in TransportServiceAdaptor are not necessary
anymore since we can simply have the logic inside the base class itself. This change
moves the stats metrics directly into TcpTransport removing the need for low level
bytes send / received callbacks.
Removes the `assemble` task from projects that are not published.
This should speed up `gradle assemble` by skipping projects that
don't need to be built. Which is useful because `gradle assemble`
is how we cut releases.
We use assertBusy in many places where the underlying code throw exceptions. Currently we need to wrap those exceptions in a RuntimeException which is ugly.
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
This commit renames the needsScores method so as to make it
automatically generatable, based on the name of the `_score` variable
which is available in search scripts. It also adds documentation to
ScriptContext to explain the naming and signature of such methods.
In #25201, a setting was added to allow setting the retry timeout for the rest client under the
impression that this would allow requests to go longer than 30s. However, there is also a socket
timeout that needs to be set to greater than 30s, which this change adds a setting for.
Expose the experimental simplepattern and
simplepatternsplit tokenizers in the common
analysis plugin. They provide tokenization based
on regular expressions, using Lucene's
deterministic regex implementation that is usually
faster than Java's and has protections against
creating too-deep stacks during matching.
Both have a not-very-useful default pattern of the
empty string because all tokenizer factories must
be able to be instantiated at index creation time.
They should always be configured by the user
in practice.
This commit adds a setting to change the request timeout for the rest client. This is useful as the
default timeout is 30s, which is also the same default for calls like cluster health. If both are
the same then the response from the cluster health api will not be received as the client usually
times out first making test failures harder to debug.
Relates #25185
Today if a channel gets closed due to a disconnect we notify the response
handler that the connection is closed and the node is disconnected. Unfortunately
this is not a complete solution since it only works for published connections.
Connections that are unpublished ie. for discovery can indefinitely hang since we
never invoke their handers when we get a failure while a user is waiting for
the response. This change adds connection tracking to TcpTransport that ensures
we are notifying the corresponding connection if there is a failure on a channel.
When we disabled `_all` by default for indices created in 6.0, we missed adding
a layer that would handle the situation where `_all` was not enabled in 5.x and
then the cluster was updated to 6.0, this means that when the cluster was
updated the `_all` field would be disabled for 5.x indices and field values
would not be added to the `_all` field.
This adds a compatibility layer for 5.x indices where we treat the default
enabled value for the `_all` field to be `true` if unset on 5.x indices.
Resolves#25068
Test: randomVersionBetween works with unreleased
Modifies randomVersionBetween so that it works with unreleased
versions. This should make switching a version from unreleased
to released much simpler.
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information and
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.
Relates #6720
This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring.
The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match.
Closes#23966
This commit moves the assumeFalse() calls that implement test skipping
and blacklisting out of the @Before method of ESClientYamlSuiteTestCase.
The problem with having them in the @Before method is that if an
assumption triggers then the @Before methods of classes that extend
ESClientYamlSuiteTestCase will not run, but their @After methods will.
This can lead to inconsistencies that cause assertions in the @After
methods and fail the test even though it was skipped/blacklisted.
Instead the assumeFalse() calls are now at the beginning of the test()
method, which runs after all @Before methods (including those in classes
that extend ESClientYamlSuiteTestCase) have completed. The only side
effect is that overridden test() methods in classes that extend
ESClientYamlSuiteTestCase which call super.test() and also do other things
must now be designed not to consume any InternalAssumptionViolatedException
that may be thrown by the super.test() call.
Relates elastic/x-pack-elasticsearch#1650
For the response parsing we want to be lenient when it comes to parsing
new xContent fields. In order to ensure this in our testing, this change
adds a utility method to XContentTestUtils that takes xContent bytes
representation as input and recursively a random field on each object
level.
Sometimes we also want to exclude a whole subtree from this treatment
(e.g. skipping "_source"), other times an element (e.g. "fields", "highlight"
in SearchHit) can have arbitraryly named objects. Those cases can be
specified as exceptions.
Splits TranslogRecoveryPerformer into three parts:
- the translog operation to engine operation converter
- the operation perfomer (that indexes the operation into the engine)
- the translog statistics (for which there is already RecoveryState.Translog)
This makes it possible for peer recovery to use the same IndexShard interface as bulk shard requests (i.e. Engine operations instead of Translog operations). It also pushes the "fail on bad mapping" logic outside of IndexShard. Future pull requests could unify the BulkShard and peer recovery path even more.
This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on).
If single type is off it uses the legacy parent join field mapper and switch to the new one otherwise (default in 6).
Relates #20257
Both gradle and java code attempt to infer the type of a each
Version constant in Version.java. It is super important that
they infer that each constant has the same type. If they disagree
we might accidentally not be testing backwards compatibility for
some version.
This adds a test to make sure that they agree, modulo known and
accepted differences (mostly around alphas). It also changes the
minimum wire compatible version from the released 5.4.0 to the
unreleased 5.5.0 as that lines up with the gradle logic.
Relates to #24798
Note that the gradle and java version logic doesn't actually match so
this contains a hack to make it *look* like it matches. Since this is a
start, I'm merging it and going to work on some followups to make the
logic actually match.....
This commit creates TemplateScript and associated classes so that
templates no longer need a special ScriptService.compileTemplate method.
The execute() method is equivalent to the old run() method.
relates #20426
Previously, when allocating bytes for a BigArray, the array was created
(or attempted to be created) and only then would the array be checked
for the amount of RAM used to see if the circuit breaker should trip.
This is problematic because for very large arrays, if creating or
resizing the array, it is possible to attempt to create/resize and get
an OOM error before the circuit breaker trips, because the allocation
happens before checking with the circuit breaker.
This commit ensures that the circuit breaker is checked before all big
array allocations (note, this does not effect the array allocations that
are less than 16kb which use the [Type]ArrayWrapper classes found in
BigArrays.java). If such an allocation or resizing would cause the
circuit breaker to trip, then the breaker trips before attempting to
allocate and potentially running into an OOM error from the JVM.
Closes#24790
After every REST test we wait for the list of pending cluster tasks
to empty before moving on to the next task. If the list doesn't
empty in 10 second we fail the test. This improves the error message
when we fail the test to include the list of running tasks.
For comparing actual and parsed object equality for the response parsing we
currently rely on comparing the original xContent and the output of the parsed
object. Currently we only have cryptic error messages if this comparison fails
which are hard to read also because we recursively compare lists and maps of
the xContent structures we compare.
This commits leverages the existing NotEqualMessageBuilder for providing error
messages that are more detailed and useful for debugging if an error occurs.
ScriptContexts currently understand a FactoryType that can produce
instances of the script InstanceType. However, for search scripts, this
does not work as we have the concept of LeafSearchScript that is created
per lucene segment. This commit effectively renames the existing
SearchScript class into SearchScript.LeafFactory, which is a new,
optional, class that can be defined within a ScriptContext.
LeafSearchScript is effectively renamed back into SearchScript. This
change allows the model of stateless factory -> stateful factory ->
script instance to continue, but in a generic way that any script
context may take advantage of.
relates #20426
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
This commit renames the concept of the "compiled type" to a "factory
type", along with all implementations of this class to be named Factory.
This brings it inline with the classes purpose.
This commit adds collection of all contexts to the parameters of
getScriptEngine. This will allow script engines like painless to
precache extra information about the contexts.
This commit changes the compile method of ScriptEngine to be generic in
the same way it is on ScriptService. This moves the shim of handling the
two existing context classes into each script engine, so that each
engine can be worked on independently to convert to real handling of
contexts.
This commit modifies the compile method of ScriptService to be context
aware. The ScriptContext is now a generic class which contains both the
instance type and compiled type for a script. Instance type may be
stateful (for example, pre loading field information for the index a
script will execute on, like in expressions), while the compiled type is
stateless and used to construct instance type instances. This change is
only a first step to cutover ScriptService to the new paradigm. It only
converts callers to the script service, and has a small shim to wrap
compilation from the script engines to support the current two fixed
instance types, SearchScript and ExecutableScript.
Since groovy was removed, we no longer have any ScriptEngines with
resources to release. We may want to keep the option open for a script
engine to close resources, but this would not be common. This commit
adds a default implementation to ScriptEngine for `close()` to reduce
the boiler plate that must be added for a ScriptEngine implementation.
With #24779 in place, we can now guaranteed that a single translog generation file will never have a sequence number conflict that needs to be resolved by looking at primary terms. These conflicts can a occur when a replica contains an operation which isn't part of the history of a newly promoted primary. That primary can then assign a different operation to the same slot and replicate it to the replica.
PS. Knowing that each generation file is conflict free will simplifying repairing these conflicts when we read from the translog.
PPS. This PR also fixes some bugs in the piping of primary terms in the bulk shard action. These bugs are a result of the legacy of IndexRequest/DeleteRequest being a ReplicationRequest. We need to change that as a follow up.
Relates to #10708
This commit cleans up tests which currently use custom script engine
implementations, converting them to use a MockScriptEngine with script
functions provided by the tests. It also creates a common set of metric
scripts which were copied across a couple metric agg tests.
Large test suites with unfortunate seed choices can easily exceed the
1000 script compilations per minute limit. This commit increases the
limit in integration tests to 2048.
Adds a "magic" key to the yaml testing stash mostly for use with
documentation tests. When unstashing an object, `$_path` is the
path into the current position in the object you are unstashing.
This means that in docs tests you can use
`// TESTRESPONSEs/somevalue/$body.${_path}/` to mean "replace
`somevalue` with whatever is the response in the same position."
Compare how you must carefully mock out all the numbers in the profile
response without this change:
```
// TESTRESPONSE[s/"id": "\[2aE02wS1R8q_QFnYu6vDVQ\]\[twitter\]\[1\]"/"id": $body.profile.shards.0.id/]
// TESTRESPONSE[s/"rewrite_time": 51443/"rewrite_time": $body.profile.shards.0.searches.0.rewrite_time/]
// TESTRESPONSE[s/"score": 51306/"score": $body.profile.shards.0.searches.0.query.0.breakdown.score/]
// TESTRESPONSE[s/"time_in_nanos": "1873811"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.time_in_nanos/]
// TESTRESPONSE[s/"build_scorer": 2935582/"build_scorer": $body.profile.shards.0.searches.0.query.0.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 919297/"create_weight": $body.profile.shards.0.searches.0.query.0.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 53876/"next_doc": $body.profile.shards.0.searches.0.query.0.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "391943"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.children.0.time_in_nanos/]
// TESTRESPONSE[s/"score": 28776/"score": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.score/]
// TESTRESPONSE[s/"build_scorer": 784451/"build_scorer": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 1669564/"create_weight": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 10111/"next_doc": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "210682"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.children.1.time_in_nanos/]
// TESTRESPONSE[s/"score": 4552/"score": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.score/]
// TESTRESPONSE[s/"build_scorer": 42602/"build_scorer": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 89323/"create_weight": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 2852/"next_doc": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "304311"/"time_in_nanos": $body.profile.shards.0.searches.0.collector.0.time_in_nanos/]
// TESTRESPONSE[s/"time_in_nanos": "32273"/"time_in_nanos": $body.profile.shards.0.searches.0.collector.0.children.0.time_in_nanos/]
```
To how you can cavalierly mock all the numbers at once with this change:
```
// TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
```
This commit removes an unused assertions enabled method in
ESTestCase. For future uses of such a method, use the field ENABLED in
org.elasticsearch.Assertions.
This commit moves the handling of nested and parent/child inner hits to specialized classes that can be defined outside of ES core.
InnerHitBuilderContext is now used by the parent query (nested or hasChild, ...) to build the sub context from the InnerHitBuilder definition.
BWC is also ensured so that nodes in previous versions can still send/receive inner hits to/from this version.
Relates #20257
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
ScriptEngine implementations have an overridable method to indicate they
are safe to use as inline scripts. Since groovy was removed fro 6.0,
there are no longer any implementations which used the default false
value. Furthermore, the value was not actually read anywhere. This
commit removes the method. The ScriptEngineRegistry was also no longer
necessary as it only was used to build a map from language to engine.
Allows plugins to register pre-configured tokenizers. Much
of the decisions are the same as those in #24223, #24572,
and #24223. This only migrates the lowercase tokenizer but
I figure that is a good start because it proves out the features.
This PR revolves around places in the code where introducing a StringBuilder might make the construction
of a String easier to follow and also, maybe avoid a case where the compiler's very safe way of introducing
StringBuilder instead of String might not always be optimal for performance.
Native scripts have been replaced in documentation by implementing
a ScriptEngine and they were deprecated in 5.5.0. This commit
removes the native script infrastructure for 6.0.
closes#19966
Approaching the release of 6.0 we need to sort out the usage of
`Version#minimumCompatibilityVersion` which was still set to 5.0.0.
Now this change moves it to the latest released version of 5.x (5.4 at this point)
to ensure we are compatible with the latest minor of the previous major. This change
also removes all the `_UNRELEASED` from the versions that where released and drops versions
that were never released and are not expected to be released (bugfixes in minors that are not
the latest in the previous major).
We've switched to supporting only `yml` files but anyone who didn't
notice will commit a `yaml` file which won't be executed
which is bad because it is easy not to notice. The test to catch this is
simple enough that I think it is worth adding just to warn folks about
their mistake.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
Moves the remaining preconfigured token figured into the analysis-common module. There were a couple of tests in core that depended on the pre-configured token filters so I had to touch them:
* `GetTermVectorsCheckDocFreqIT` depended on `type_as_payload` but didn't do anything important with it. I dropped the dependency. Then I moved the test to a single node test case because we're trying to cut down on the number of `ESIntegTestCase` subclasses.
* `AbstractTermVectorsTestCase` and its subclasses depended on `type_as_payload`. I dropped their usage of the token filter and added an integration test for the termvectors API that uses `type_as_payload` to the `analysis-common` module.
* `AnalysisModuleTests` expected a few pre-configured token filtes be registered by default. They aren't any more so I dropped this assertion. We assert that the `CommonAnalysisPlugin` registers these pre-built token filters in `CommonAnalysisFactoryTests`
* `SearchQueryIT` and `SuggestSearchIT` had tests that depended on the specific behavior of the token filters so I moved the tests to integration tests in `analysis-common`.
Today when an index is `read-only` the index is also blocked from
being deleted which sometimes is undesired since in-order to make
changes to a cluster indices must be deleted to free up space. This is
a likely scenario in a hosted environment when disk-space is limited to switch
indices read-only but allow deletions to free up space.
This is almost exclusively for docs test which frequently match the
entire response. This allow something like:
```
- set: {nodes.$master.http.publish_address: host}
- match:
$body:
{
"nodes": {
$host: {
... stuff in here ...
}
}
}
```
This should make it possible for the docs tests to work with
unpredictable keys.
This moves the releasing logic to the base test, so that individual test cases don't need
to worry about releasing the aggregators. It's not a big deal for individual aggs,
but once tests start using sub-aggs, it can become tricky to free (without double-freeing)
all the aggregators.
Template script engines (mustache, the only one) currently return a
BytesReference that users must know is utf8 encoded. This commit
modifies all callers and mustache to have the template engine return
String. This is much simpler, and does not require decoding in order to
use (for example, in ingest).
We had a hack in setting up permissions for tests to support testing
the lang-python plugin. We also had a hack to prevent Log4j from
loading a shaded version of Jansi provided by Jython. This plugin has
been removed so these hacks are no longer necessary.
Relates #24681
The disruption tests sit in a single test suite which causes these tests
to be single-threaded. We can split this test suite into multiple suites
(logically, of course) enabling them to be run in parallel reducing the
total run time of all integration tests in core. This commit splits the
discovery with service disruptions test suite into three suites
- master disruptions
- discovery disruptions
- cluster disruptions
The last one could probably be better named, it is meant to represent
performing actions in the cluster (indexing, failing a shard, etc.)
while a disruption is taking place.
Relates #24662
When constructing an array list, if we know the size of the list in
advance (because we are adding objects to it derived from another list),
we should size the array list to the appropriate capacity in advance (to
avoid resizing allocations). This commit does this in various places.
Relates #24439
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
Relates #20257
Today we prune transport handlers in TransportService when a node is disconnected.
This can cause connections to starve in the TransportService if the connection is
opened as a short living connection ie. without sharing the connection to a node
via registering in the transport itself. This change now moves to pruning based
on the connections cache key to ensure we notify handlers as soon as the connection
is closed for all connections not just for registered connections.
Relates to #24632
Relates to #24575
Relates to #24557
- Removes clusterState, getInitialClusterState and getMinimumMasterNodes methods from Discovery interface.
- Sets PingContextProvider in ZenPing constructor
- Renames state in ZenDiscovery to committedState
This allows other plugins to use a client to call the functionality
that is in the core modules without duplicating the logic.
Plugins can now safely send the request and response classes via the
client even if the requests are executed locally. All relevant classes
are loaded by the core classloader such that plugins can share them.
This is re-adds this commit that was revered in 952feb58e4
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.
Relates to #20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).
Relates to #14771: histogram and date_histogram aggregation order will now be validated at reduce time.
Closes#23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
This allows other plugins to use a client to call the functionality
that is in the core modules without duplicating the logic.
Plugins can now safely send the request and response classes via the
client even if the requests are executed locally. All relevant classes
are loaded by the core classloader such that plugins can share them.
Adds tests for reindex-from-remote for the latest 2.4, 1.7, and
0.90 releases. 2.4 and 1.7 are fairly popular versions but 0.90
is a point of pride.
This fixes any issues those tests revealed.
Closes#23828Closes#24520
This adds parsing to all implementations of SingleBucketAggregations. They are mostly similar, so they share the common
base class `ParsedSingleBucketAggregation` and the shared base test `InternalSingleBucketAggregationTestCase`.
There are now three public static method to build instances of
PreConfiguredTokenFilter and the ctor is private. I chose static
methods instead of constructors because those allow us to change
out the implementation returned if we so desire.
Relates to #23658
This PR introduces a subproject in test/fixtures that contains a Vagrantfile used for standing up a
KRB5 KDC (Kerberos). The PR also includes helper scripts for provisioning principals, a few
changes to the HDFS Fixture to allow it to interface with the KDC, as well as a new suite of
integration tests for the HDFS Repository plugin.
The HDFS Repository plugin senses if the local environment can support the HDFS Fixture
(Windows is generally a restricted environment). If it can use the regular fixture, it then tests if
Vagrant is installed with a compatible version to determine if the secure test fixtures should be
enabled. If the secure tests are enabled, then we create a Kerberos KDC fixture, tasks for adding
the required principals, and an HDFS fixture configured for security. A new integration test task is
also configured to use the KDC and secure HDFS fixture and to run a testing suite that uses
authentication. At the end of the secure integration test the fixtures are torn down.
Adds a new "icu_collation" field type that exposes lucene's
ICUCollationDocValuesField. ICUCollationDocValuesField is the replacement
for ICUCollationKeyFilter which has been deprecated since Lucene 5.
This commit renames ScriptEngineService to ScriptEngine. It is often
confusing because we have the ScriptService, and then
ScriptEngineService implementations, but the latter are not services as
we see in other places in elasticsearch.
File scripts have 2 related settings: the path of file scripts, and
whether they can be dynamically reloaded. This commit deprecates those
settings.
relates #21798
Today we rely on background syncs to relay the global checkpoint under
the mandate of the primary to its replicas. This means that the global
checkpoint on a replica can lag far behind the primary. The commit moves
to inlining global checkpoints with replication requests. When a
replication operation is performed, the primary will send the latest
global checkpoint inline with the replica requests. This keeps the
replicas closer in-sync with the primary.
However, consider a replication request that is not followed by another
replication request for an indefinite period of time. When the replicas
respond to the primary with their local checkpoint, the primary will
advance its global checkpoint. During this indefinite period of time,
the replicas will not be notified of the advanced global
checkpoint. This necessitates a need for another sync. To achieve this,
we perform a global checkpoint sync when a shard falls idle.
Relates #24513
This changes the way we register pre-configured token filters so that
plugins can declare them and starts to move all of the pre-configured
token filters out of core. It doesn't finish the job because doing
so would make the change unreviewably large. So this PR includes
a shim that keeps the "old" way of registering pre-configured token
filters around.
The Lowercase token filter is special because there is a "special"
interaction between it and the lowercase tokenizer. I'm not sure
exactly what to do about it so for now I'm leaving it alone with
the intent of figuring out what to do with it in a followup.
This also renames these pre-configured token filters from
"pre-built" to "pre-configured" because that seemed like a more
descriptive name.
This is a part of #23658
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.
One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
In order to make MockLogAppender (utility to test logging) available outside
of es-core move MockLogAppender from test core-tests to test framework. As
package names do not change, no need to change clients.
Changes the scope of the AllocationService dependency injection hack so that it is at least contained to the AllocationService and does not leak into the Discovery world.
Async shard fetching only uses the node id to correlate responses to requests. This can lead to a situation where a response from an earlier request is mistaken as response from a new request when a node is restarted. This commit adds unique round ids to correlate responses to requests.
TransportService and RemoteClusterService are closely coupled already today
and to simplify remote cluster integration down the road it can be a direct
dependency of TransportService. This change moves RemoteClusterService into
TransportService with the goal to make it a hidden implementation detail
of TransportService in followup changes.
This commit cleans up some cases where a list or map was being
constructed, and then an existing collection was copied into the new
collection. The clean is to instead use an appropriate constructor to
directly copy the existing collection in during collection
construction. The advantage of this is that the new collection is sized
appropriately.
Relates #24409
Separates cluster state publishing from applying cluster states:
- ClusterService is split into two classes MasterService and ClusterApplierService. MasterService has the responsibility to calculate cluster state updates for actions that want to change the cluster state (create index, update shard routing table, etc.). ClusterApplierService has the responsibility to apply cluster states that have been successfully published and invokes the cluster state appliers and listeners.
- ClusterApplierService keeps track of the last applied state, but MasterService is stateless and uses the last cluster state that is provided by the discovery module to calculate the next prospective state. The ClusterService class is still kept around, which now just delegates actions to ClusterApplierService and MasterService.
- The discovery implementation is now responsible for managing the last cluster state that is used by the consensus layer and the master service. It also exposes the initial cluster state which is used by the ClusterApplierService. The discovery implementation is also responsible for adding the right cluster-level blocks to the initial state.
- NoneDiscovery has been renamed to TribeDiscovery as it is exclusively used by TribeService. It adds the tribe blocks to the initial state.
- ZenDiscovery is synchronized on state changes to the last cluster state that is used by the consensus layer and the master service, and does not submit cluster state update tasks anymore to make changes to the disco state (except when becoming master).
Control flow for cluster state updates is now as follows:
- State updates are sent to MasterService
- MasterService gets the latest committed cluster state from the discovery implementation and calculates the next cluster state to publish
- MasterService submits the new prospective cluster state to the discovery implementation for publishing
- Discovery implementation publishes cluster states to all nodes and, once the state is committed, asks the ClusterApplierService to apply the newly committed state.
- ClusterApplierService applies state to local node.
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
This commit fixes some inconsistencies in long GC disruption where we
mixed stopping and suspending when the action we are performing on
threads is suspending which is distinct from stopping a thread.
The one argument ctor for `Script` creates a script with the
default language but most usages of are for testing and either
don't care about the language or are for use with
`MockScriptEngine`. This replaces most usages of the one argument
ctor on `Script` with calls to `ESTestCase#mockScript` to make
it clear that the tests don't need the default scripting language.
I've also factored out some copy and pasted script generation
code into a single place. I would have had to change that code
to use `mockScript` anyway, so it was easier to perform the
refactor.
Relates to #16314
We can leak disrupted threads here since we never wait for them to
complete after freeing them from their loops. This commit addresses this
by joining on disrupted threads, and addresses fallout from trying to
join here.
Relates #24338
Another step down the road to dropping the
lucene-analyzers-common dependency from core.
Note that this removes some tests that no longer compile from
core. I played around with adding them to the analysis-common
module where they would compile but we already test these in
the tests generated from the example usage in the documentation.
I'm not super happy with the way that `requriesAnalysisSettings`
works with regards to plugins. I think it'd be fairly bug-prone
for plugin authors to use. But I'm making it visible as is for
now and I'll rethink later.
A part of #23658
Most of these settings should always be pulled from the repository
settings. A couple were leftover that should be moved to client
settings. The path style access setting should be removed altogether.
This commit adds deprecations for all of these existing settings, as
well as adding new client specific settings for max retries and
throttling.
relates #24143
Creates a new task `namingConventionsMain`, that runs on the
`buildSrc` and `test:framework` projects and fails the build if
any of the classes in the main artifacts are named like tests or
are non-abstract subclasses of ESTestCase.
It also fixes the three tests that would cause it to fail.
`tests.enable_mock_modules` is a documented but unrespected / unused
option to disable all mock modules / pluings during test runs. This
will basically site-step mock assertions like check-index on shard closing.
This can speed up test-execution dramatically on nodes with slow disks etc.
Relates to #24304
If the user explicitly configured path.data to include
default.path.data, then we should not fail the node if we find indices
in default.path.data. This commit addresses this.
Relates #24285
The unwrap method was leftover from support javascript and python. Since
those languages are removed in 6.0, this commit removes the unwrap
feature from scripts.
Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.
Relates to #10708
Start moving built in analysis components into the new analysis-common
module. The goal of this project is:
1. Remove core's dependency on lucene-analyzers-common.jar which should
shrink the dependencies for transport client and high level rest client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk.
It is not allowed to use a `nested` fields inside an index that defines an index sorting since `nested` fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.
Relates #6720
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
In Elasticsearch 5.3.0 a bug was introduced in the merging of default
settings when the target setting existed as an array. When this bug
concerns path.data and default.path.data, we ended up in a situation
where the paths specified in both settings would be used to write index
data. Since our packaging sets default.path.data, users that configure
multiple data paths via an array and use the packaging are subject to
having shards land in paths in default.path.data when that is very
likely not what they intended.
This commit is an attempt to rectify this situation. If path.data and
default.path.data are configured, we check for the presence of indices
there. If we find any, we log messages explaining the situation and fail
the node.
Relates #24099
Some systems like GCE rely on a plaintext file containing credentials.
Rather than extract the information out of that credentials file and
store each peace individually in the keystore, it is cleaner to just
store the entire file.
This commit adds support to the keystore wrapper for secure file
settings. These are settings that contain an entire file that would
normally be stored on the local filesystem. Retrieving the file returns
an input stream to the file contents. This also adds a `add-file`
command to the keystore cli.
In order to support both strings and files as values for settings, the
metadata format of the keystore has also been updated (with backcompat)
to keep a map of setting name to type.
This commit adds support for replacing a stashed value within a header of a REST test. This is
useful for requests that may want to use a value previously obtained within a header.
We had a couple of unfortunate field name collisions in our CI, where the json duplicate check tripped. Increasing the minimum length of randomly generated field names should decrease the chance of this issue happening again.
This change adds secure settings for access/secret keys and proxy
username/password to ec2 discovery. It adds the new settings with the
prefix `discovery.ec2`, copies other relevant ec2 client settings to the
same prefix, and deprecates all other settings (`cloud.aws.*` and
`cloud.aws.ec2.*`). Note that this is simpler than the client configs
in repository-s3 because discovery is only initialized once for the
entire node, so there is no reason to complicate the configuration with
the ability to have multiple sets of client settings.
relates #22475
This test was sporadically failing for the following reason:
- 4 nodes (nodes 0, 1, 2, and 3) running with `minimum_master_nodes` set to 3
- we stop 2 nodes (node 0 and 3)
- wait for cluster block to be in place on all nodes
- start 2 nodes (node 4 and node 5) and do a `prepareHealth().setWaitForNodes("4")`
- then do a search request
The search request runs into the `ClusterBlockException` as the `prepareHealth().setWaitForNodes("4")` check succeeds on a cluster state that has
nodes 1, 2, 3, and 4, i.e., only one of the two new nodes has joined the cluster and only one of the two dead nodes was removed by the master
(removing the dead nodes only happens after there are again `minimum_master_nodes` nodes in the cluster).
This commit fixes the issue by reusing a method from InternalTestCluster that checks that the right nodes have rejoined the cluster.
ESTestCase has methods to shuffle xContent keys given a builder or a parser. Shuffling wasn't actually doing what was expected but rather reordering the keys in their natural ordering, hence the output was always the same at every run. Corrected that and added tests, also fixed a couple of tests that were affected by this fix.
When executing an index operation on the primary shard,
`TransportShardBulkAction` first parses the document, sees if there are any
mapping updates that needs to be applied, and then updates the mapping on the
master node. It then re-parses the document to make sure that the mappings have
been applied and propagated.
This adds a check that skips the second parsing of the document in the event
there was not a mapping update applied in the first case.
Fixes a performance regression introduced in #23665
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
the test reduce the wait for initial cluster state to 0, causing multiple nodes to be start while elections are going on. This means there is a chance of a split election which shouldn't cause the test to time out.
This commit adds a single node discovery type. With this discovery type,
a node will elect itself as master and never form a cluster with another
node.
Relates #23595
It starts nodes in any order and thus it disabled the wait for first cluster state at node start up time
the later is required for the auto management logic.
Closes#23728
The method Boolean#getBoolean is dangerous. It is too easy to mistakenly
invoke this method thinking that it is parsing a string as a
boolean. However, what it actually does is get a system property with
the specified string, and then attempts to use usual crappy boolean
parsing in the JDK to parse that system property as boolean with
complete leniency (it parses every input value into either true or
false); that is, this method amounts to invoking
Boolean#parseBoolean(String) on the result of
System#getProperty(String). Boo. This commit bans usage of this method.
Relates #23864
This commit changes the listener passed to sendMessage from a Runnable
to a ActionListener.
This change also removes IOException from the sendMessage signature.
That signature is misleading as it allows implementers to assume an
exception will be thrown in case of failure. That does not happen due
to Netty's async nature.
When executing an update request, the request timeout is not transferred
to the index/delete request executed on behalf of the update
request. This leads to update requests not timing out when they should
(e.g., if not all shards are available when the request specifies
wait_for_shards=all with a small timeout). This commit causes the
index/delete requests to honor the update request timeout.
Relates #23825
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.
closes#18959
Search took time uses an absolute clock to measure elapsed time, and
then tries to deal with the complexities of using an absolute clock for
this purpose. Instead, we should use a high-precision monotonic relative
clock that is designed exactly for measuring elapsed time. This commit
modifies the search infrastructure to use a relative clock for measuring
took time, but still provides an absolute clock for the components of
search that require a real clock (e.g., index name expression
resolution, etc.).
Relates #23662
Currently the task manager is tied to the transport and can only create tasks based on TransportRequests. This commit enables task manager to support tasks created by non-transport services such as the persistent tasks service.
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
Today when resetting the deprecation logger after a test is torn down,
we attach a new thread context to the deprecation logger. This thread
context is never cleared and we are left with a thread context attached
to the deprecation logger for every test method that ran in the same
JVM. This commit adds a flag when resetting the deprecation logger to
not attach a new thread context when the test is being torn down.
Relates #23441
This commit fixes the date format in warning headers. There is some
confusion around whether or not RFC 1123 requires two-digit
days. However, the warning header specification very clearly relies on a
format that requires two-digit days. This commit removes the usage of
RFC 1123 date/time format from Java 8, which allows for one-digit days,
in favor of a format that forces two-digit days (it's otherwise
identical to RFC 1123 format, it is just fixed width).
Relates #23418
This commit adds a convenience method for simultaneously asserting
settings deprecations and other warnings and fixes some tests where
setting deprecations and general warnings were present.
Previously, cluster.routing.allocation.same_shard.host was not a dynamic
setting and could not be updated after startup. This commit changes the
behavior to allow the setting to be dynamically updatable. The
documentation already states that the setting is dynamic so no
documentation changes are required.
Closes#22992
The warning header used by Elasticsearch for delivering deprecation
warnings has a specific format (RFC 7234, section 5.5). The format
specifies that the warning header should be of the form
warn-code warn-agent warn-text [warn-date]
Here, the warn-code is a three-digit code which communicates various
meanings. The warn-agent is a string used to identify the source of the
warning (either a host:port combination, or some other identifier). The
warn-text is quoted string which conveys the semantic meaning of the
warning. The warn-date is an optional quoted date that can be in a few
different formats.
This commit corrects the warning header within Elasticsearch to follow
this specification. We use the warn-code 299 which means a
"miscellaneous persistent warning." For the warn-agent, we use the
version of Elasticsearch that produced the warning. The warn-text is
unchanged from what we deliver today, but is wrapped in quotes as
specified (this is important as a problem that exists today is that
multiple warnings can not be split by comma to obtain the individual
warnings as the warnings might themselves contain commas). For the
warn-date, we use the RFC 1123 format.
Relates #23275
This allows to set content-type together with the body itself. At the moment it is always json, but this change allows makes it easier to randomize it later
Console.readText may return null in certain cases. This commit fixes a
bug in Terminal.promptYesNo which assumed a non-null return value. It
also adds a test for this, and modifies mock terminal to be able to
handle null input values.
In #23253 we added an the ability to incrementally reduce search results.
This change exposes the parameter to control the batch since and therefore
the memory consumption of a large search request.
Today all query results are buffered up until we received responses of
all shards. This can hold on to a significant amount of memory if the number of
shards is large. This commit adds a first step towards incrementally reducing
aggregations results if a, per search request, configurable amount of responses
are received. If enough query results have been received and buffered all so-far
received aggregation responses will be reduced and released to be GCed.
This commit cleans up some parsing tests added from the High Level Rest Client: IndexResponseTests, DeleteResponseTests, UpdateResponseTests, BulkItemResponseTests.
These tests are now more uniform with the others test-from-to-XContent tests we have, they now shuffle the XContent fields before parsing, the asserting method for parsed objects does not used a Map<String, Object> anymore, and buggy equals/hasCode methods in ShardInfo and ShardInfo.Failure have been removed.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
With #22977, network disruption also disconnects nodes from the transport service. That has the side effect that when the disruption is healed, the disconnected node stay disconnected until the `NodeConnectionsService` restores the connection. This can take too long for the tests. This PR adds logic to the cluster healing to restore connections immediately.
See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=debian/611/console for an example failure.
When nested objects are present in the mappings, many queries get deoptimized
due to the need to exclude documents that are not in the right space. For
instance, a filter is applied to all queries that prevents them from matching
non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all
child queries of `nested` queries in order to make sure that the child query
only matches child documents (`_type:__nested_path`), which is required by
`ToParentBlockJoinQuery` (the Lucene query behing Elasticsearch's `nested`
queries).
These additional filters slow down `nested` queries. In 1.7-, the cost was
somehow amortized by the fact that we cached filters very aggressively. However,
this has proven to be a significant source of slow downs since 2.0 for users
of `nested` mappings and queries, see #20797.
This change makes the filtering a bit smarter. For instance if the query is a
`match_all` query, then we need to exclude nested docs. However, if the query
is `foo: bar` then it may only match root documents since `foo` is a top-level
field, so no additional filtering is required.
Another improvement is to use a `FILTER` clause on all types rather than a
`MUST_NOT` clause on all nested paths when possible since `FILTER` clauses
are more efficient.
Here are some examples of queries and how they get rewritten:
```
"match_all": {}
```
This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and
`ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})`
with this change. The automaton is the complement of `_type:__*` so it matches
the same documents, but is faster since it is now a positive clause. Simplistic
performance testing on a 10M index where each root document has 5 nested
documents on average gave a latency of 420ms on master and 90ms with this change
applied.
```
"term": {
"foo": {
"value": "0"
}
}
```
This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on
master and `foo:0` with this change: we do not need to filter nested docs out
since the query cannot match nested docs. While doing performance testing in
the same conditions as above, response times went from 250ms to 50ms.
```
"nested": {
"path": "nested",
"query": {
"term": {
"nested.foo": {
"value": "0"
}
}
}
}
```
This query is rewritten to
`+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0`
on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The
top-level filter (`-_type:__*`) could be removed since `nested` queries only
match documents of the parent space, as well as the child filter
(`#_type:__nested`) since the child query may only match nested docs since the
`nested` object has both `include_in_parent` and `include_in_root` set to
`false`. While doing performance testing in the same conditions as above,
response times went from 850ms to 270ms.
I encountered several cases of duplicate field names when generating random
fields using the RandomObjects helper. This leads to invalid json in some tests,
so increasing the minimum field name length to four to make this less likely to
happen.
When Netty decodes a bad HTTP request, it marks the decoder result on
the HTTP request as a failure, and reroutes the request to GET
/bad-request. This either leads to puzzling responses when a bad request
is sent to Elasticsearch (if an index named "bad-request" does not exist
then it produces an index not found exception and otherwise responds
with the index settings for the index named "bad-request"). This commit
addresses this by inspecting the decoder result on the HTTP request and
dispatching the request to a bad request handler preserving the initial
cause of the bad request and providing an error message to the client.
Relates #23153
This commit adds a new method to the TransportChannel that provides access to the version of the
remote node that the response is being sent on and that the request came from. This is helpful
for serialization of data attached as headers.
The traces callback is only called after responses are set. This can lead to concurrent issues where the trace is notified of previously sent responses if it was added after the response was sent (enabling further execution of the test) but before the tracer call backs are called.
EvillPeerRecoveryIT checks scenario where recovery is happening while there are on going indexing operation that already have been assigned a seq# . This is fairly hard to achieve and the test goes through a couple of hoops via the plugin infra to achieve that. This PR extends the unit tests infra to allow for those hoops to happen in unit tests. This allows the test to be moved to RecoveryDuringReplicationTests
Relates to #22484
We have a bunch of interfaces that have only a single implementation
for 6 years now. These interfaces are pretty useless from a SW development
perspective and only add unnecessary abstractions. They also require
lots of casting in many places where we expect that there is only one
concrete implementation. This change removes the interfaces, makes
all of the classes final and removes the duplicate `foo` `getFoo` accessors
in favor of `getFoo` from these classes.
Elasticsearch v5.0.0 uses allocation IDs to safely allocate primary shards whereas prior versions of ES used a version-based mode instead. Elasticsearch v5 still has support for version-based primary shard allocation as it needs to be able to load 2.x shards. ES v6 can drop the legacy support.
#22194 gave us the ability to open low level temporary connections to remote node based on their address. With this use case out of the way, actual full blown connections should validate the node on the other side, making sure we speak to who we think we speak to. This helps in case where multiple nodes are started on the same host and a quick node restart causes them to swap addresses, which in turn can cause confusion down the road.
Secure settings from the elasticsearch keystore were not yet validated.
This changed improves support in Settings so that secure settings more
seamlessly blend in with normal settings, allowing the existing settings
validation to work. Note that the setting names are still not validated
(yet) when using the elasticsearc-keystore tool.
As part of #22116 we are going to forbid usage of api
java.net.URL#openStream(). However in a number of places across the
we use this method to read files from the local filesystem. This commit
introduces a helper method openFileURLStream(URL url) to read files
from URLs. It does specific validation to only ensure that file:/
urls are read.
Additionlly, this commit removes unneeded method
FileSystemUtil.newBufferedReader(URL, Charset). This method used the
openStream () method which will soon be forbidden. Instead we use the
Files.newBufferedReader(Path, Charset).
This is in order to trigger listeners for disconnect events, most importantly the NodeFaultDetection. MockTransportService now does slightly a better job at mimicking real life failures: connecting to already connected node will be a noop (we don't detect any errors here in production either) and failing to send will cause the target node to be disconnected.
This is the cause of failure in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+5.2+multijob-unix-compatibility/os=debian/72
When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR.
This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
This commit change ElasticsearchException.failureFromXContent() method so that it now parses root causes which were ignored before, and adds them as suppressed exceptions of the returned exception.
The seq# base recovery logic relies on rolling back lucene to remove any operations above the global checkpoint. This part of the plan is not implemented yet but have to have these guarantees. Instead we should make the seq# logic validate that the last commit point (and the only one we have) maintains the invariant and if not, fall back to file based recovery.
This commit adds a test that creates situation where rollback is needed (primary failover with ops in flight) and fixes another issue that was surfaced by it - if a primary can't serve a seq# based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter.
Relates to #22484 & #10708
With the new secure settings, methods like getAsMap() no longer work
correctly as a means of checking for empty settings, or the total size.
This change converts the existing uses of that method to use methods
directly on Settings. Note this does not update the implementations to
account for SecureSettings, as that will require a followup which
changes how secure settings work.
Also adds many `equals` and `hashCode` implementations and moves
the failure printing in `MatchAssertion` into a common spot and
exposes it over `assertEqualsWithErrorMessageFromXContent` which
does an object equality test but then uses `toXContent` to print
the differences.
Relates to #22278
This moves the building blocks for delete by query into core. This
should enabled two thigns:
1. Plugins other than reindex to implement "bulk by scroll" style
operations.
2. Plugins to directly call delete by query. Those plugins should
be careful to make sure that task cancellation still works, but
this should be possible.
Notes:
1. I've mostly just moved classes and moved around tests methods.
2. I haven't been super careful about cohesion between these core
classes and reindex. They are quite interconnected because I wanted
to make the change as mechanical as possible.
Closes#22616
* S3 repository: Add named configurations
This change implements named configurations for s3 repository as
proposed in #22520. The access/secret key secure settings which were
added in #22479 are reverted, and the only secure settings are those
with the new named configs. All other previously used settings for the
connection are deprecated.
closes#22520
Also adds many `equals` and `hashCode` implementations and moves
the failure printing in `MatchAssertion` into a common spot and
exposes it over `assertEqualsWithErrorMessageFromXContent` which
does an object equality test but then uses `toXContent` to print
the differences.
Relates to #22278
This commit introduces sequence-number-based recovery. When a replica
has fallen out of sync, rather than performing a file-based recovery we
first attempt to replay operations since the last local checkpoint on
the replica. To do this, at the start of recovery the replica tells the
primary what its local checkpoint is. The primary will then wait for all
operations between that local checkpoint and the current maximum
sequence number to complete; this is to ensure that there are no gaps in
the operations that will be replayed from the primary to the
replica. This is a best-effort attempt as we currently have no
guarantees on the primary that these operations will be available; if we
are not able to replay all operations in the desired range, we just
fallback to file-based recovery. Later work will strengthen the
guarantees.
Relates #22484
* Add top hits collapsing to search request
The field collapsing is done with a custom top docs collector that "collapse" search hits with same field value.
The distributed aspect is resolve using the two passes that the regular search uses. The first pass "collapse" the top hits, then the coordinating node merge/collapse the top hits from each shard.
```
GET _search
{
"collapse": {
"field": "category",
}
}
```
This change also adds an ExpandCollapseSearchResponseListener that intercepts the search response and expands collapsed hits using the CollapseBuilder#innerHit} options.
The retrieval of each inner_hits is done by sending a query to all shards filtered by the collapse key.
```
GET _search
{
"collapse": {
"field": "category",
"inner_hits": {
"size": 2
}
}
}
```
To effectively allow a plugin to intercept a transport handler it needs
to know if the handler must be executed even if there is a rejection on the
thread pool in the case the wrapper forks a thread to execute the actual handler.
Today we try to be smart and make a generic decision if an exception should
be treated as a document failure but in some cases concurrency in the index writer
make this decision very difficult since we don't have a consistent state in the case
another thread is currently failing the IndexWriter/InternalEngine due to a tragic event.
This change simplifies the exception handling and makes specific decisions about document failures
rather than using a generic heuristic. This prevent exceptions to be treated as document failures
that should have failed the engine but backed out of failing since since some other thread has
already taken over the failure procedure but didn't finish yet.
* S3 repository: Deprecate specifying credentials through env vars and sys props
This is a follow up to #22479, where storing credentials secure way was
added.
Today we do not preserve response headers if they are present on a transport protocol
response. While preserving these headers is not always desired, in the most cases we
should pass on these headers to have consistent results for depreciation headers etc.
yet, this hasn't been much of a problem since most of the deprecations are detected early
ie. on the coordinating node such that this bug wasn't uncovered until #22647
This commit allow to optionally preserve headers when a context is restored and also streamlines
the context restore since it leaked frequently into the callers thread context when the callers
context wasn't restored again.
Previously, certain settings that could take multiple comma delimited
values would pick up incorrect values for all entries but the first if
each comma separated value was followed by a whitespace character. For
example, the multi-value "A,B,C" would be correctly parsed as
["A", "B", "C"] but the multi-value "A, B, C" would be incorrectly parsed
as ["A", " B", " C"].
This commit allows a comma separated list to have whitespace characters
after each entry. The specific settings that were affected by this are:
cluster.routing.allocation.awareness.attributes
index.routing.allocation.require.*
index.routing.allocation.include.*
index.routing.allocation.exclude.*
cluster.routing.allocation.require.*
cluster.routing.allocation.include.*
cluster.routing.allocation.exclude.*
http.cors.allow-methods
http.cors.allow-headers
For the allocation filtering related settings, this commit also provides
validation of each specified entry if the filtering is done by _ip,
_host_ip, or _publish_ip, to ensure that each entry is a valid IP
address.
Closes#22297
Today we have quite some abstractions that are essentially providing a simple
dispatch method to the plugins defining a `HttpServerTransport`. This commit
removes `HttpServer` and `HttpServerAdaptor` and introduces a simple `Dispatcher` functional
interface that delegate to `RestController` by default.
Relates to #18482
All the language clients support a special ignore parameter that doesn't get passed to elasticsearch with the request, but used to indicate which error code should not lead to an exception if returned for a specific request.
Moving this to the low level REST client will allow the high level REST client to make use of it too, for instance so that it doesn't have to intercept ResponseExceptions when the get api returns a 404.
TransportInterceptors are commonly used to enrich requests with headers etc.
which requires access the the thread context. This is not always easily possible
since threadpools are hard to access for instance if the interceptor is used on a transport client.
This commit passes on the thread context to all the interceptors for further consumption.
Closes#22585
ClusterService and TransportService expect the local discovery node to be set
before they are started but this requires manual interaction and is error prone since
to work absolutely correct they should share the same instance (same ephemeral ID).
TransportService also has 2 modes of operation, mainly realted to transport client vs. internal
to a node. This change removes the mode where we don't maintain a local node and uses a dummy local
node in the transport client since we don't bind to any port in such a case.
Local discovery node instances are now managed by the node itself and only suppliers and factories that allow
creation only once are passed to TransportService and ClusterService.
There was still small race in MockTcpTransport where channesl that are concurrently
closing are not yet removed from the reference tracking causing tests to fail. Compared to
the other races before this is a rather small windown and requires very very short test durations.
Today there are several races / holes in TcpTransport and MockTcpTransport
that can allow connections to be opened and remain unclosed while the actual
transport implementation is closed. A recently added assertions in #22554 exposes
these problems. This commit fixes several issues related to missed locks or channel
creations outside of a lock not checking if the resource is still open.