* Register data node stats from info carried back in search responses
This is part of #24915, where we now calculate the EWMA of service time for
tasks in the search threadpool, and send that as well as the current queue size
back to the coordinating node. The coordinating node now tracks this information
for each node in the cluster.
This information will be used in the future the determining the best replica a
search request should be routed to. This change has no user-visible difference.
* Move response time timing into ResponseListenerWrapper
* Move ResponseListenerWrapper to ActionListener instead of SearchActionListener
Also removes the logger
* Move `requestIndex` back to private
* De-guice-ify ResponseCollectorService \o/
* Undo all changes to SearchQueryThenFetchAsyncAction
* Remove unneeded response collector from TransportSearchAction
* Undo all changes to SearchDfsQueryThenFetchAsyncAction
* Completely rewrite the inside of ResponseCollectorService's record keeping
* Documentation and cleanups for ResponseCollectorService
* Add unit test for collection of queue size and service time
* Fix Guice construction error
* Add basic unit tests for ResponseCollectorService
* Fix version constant for the master merge
* Fix test compilation after master merge
* Add a test for node removal on cluster changed event
* Remove integration test as there are now unit tests
* Rename ResponseListenerWrapper -> SearchExecutionStatsCollector
* Fix line-length
* Make classes private and final where appropriate
* Pass nodeId into SearchExecutionStatsCollector and use only ActionListener
* Get nodeId from connection so searchShardTarget can be private
* Remove threadpool from SearchContext, get it from IndexShard instead
* Add missing import
* Use BiFunction for responseWrapper rather than passing in collector service
This change removes the leniency of having a `null` index to fetch
terms from in 6.0 onwards. This feature will be deprecated in the 5.x series
and 6.0 nodes will require the index to be set.
Closes#25750
We used to compare agaisnt the min compatible version which is misleading since
it might move over time and since we backported the `can_match` API entirely
it's better to compare against a version constant.
We can't do it in the general case because of prefix queries, but I believe this
is mostly used in query strings and not in explicit `terms` queries.
Closes#25667
When sending replica requests for replication operations, we skip
sending the request to pre-6.0 nodes for operations that such nodes
would not be aware of (e.g., the background global checkpoint sync, or
the primary/replica resync) since they would not know what to do with
these requests. Yet, we simulate that we received responses from these
nodes. Today, this is done by simulating that they sent us that their
local checkpoint is unassigned sequence number. However, for pre-6.0
nodes we have introduced a special local checkpoint used in the global
checkpoint tracker for such nodes and that is what we should use here
too. This commit fixes this issue.
Relates #25744
The following token filters were moved: arabic_normalization, german_normalization, hindi_normalization, indic_normalization, persian_normalization, scandinavian_normalization, serbian_normalization, sorani_normalization, cjk_width and cjk_width
Relates to #23658
Currently replication and recovery are both coordinated through the latest cluster state available on the ClusterService as well as through the GlobalCheckpointTracker (to have consistent local/global checkpoint information), making it difficult to understand the relation between recovery and replication, and requiring some tricky checks in the recovery code to coordinate between the two. This commit makes the primary the single owner of its replication group, which simplifies the replication model and allows to clean up corner cases we have in our recovery code. It also reduces the dependencies in the code, so that neither RecoverySourceXXX nor ReplicationOperation need access to the latest state on ClusterService anymore. Finally, it gives us the property that in-sync shard copies won't receive global checkpoint updates which are above their local checkpoint (relates #25485).
When parsing indices options from REST, we parse the optional parameters that are supported at REST (ignore_unavailable, allow_no_indices and expand_wildcards) and we provide the API default values for all the other (internal) options so that they are set to the new indices options while parsing. The `ignoreAliases` option was forgotten though, which means that whenever you pass in any index option at REST to the delete index API, you get to delete aliases like it was supported before (as ignoreAliases gets set to false like in all the other APIs).
Added unit tests for IndicesOptions parsing from REST parameters, and yaml tests for the delete index API.
This change refactors the query_string query to analyze the query text around logical operators of the query string the same way than a match_query/multi_match_query.
It also adds a type parameter that can be used to change the way multi fields query are built the same way than a multi_match query does.
Now that these queries share the same behavior regarding text analysis, some parameters are obsolete and have been deprecated:
split_on_whitespace: This setting is now ignored with a deprecation notice
if it is used explicitely. With this PR The query_string always splits on logical operator.
It simplifies the understanding of the other parameters that can have different meanings
depending on the value of split_on_whitespace.
auto_generate_phrase_queries: This setting is now ignored with a deprecation notice
if it is used explicitely. This setting only makes sense when the parser splits on whitespace.
use_dismax: This setting is now ignored with a deprecation notice
if it is used explicitely. The tie_breaker parameter is sufficient to handle best_fields/most_fields.
Fixes#25574
With cross cluster search we can potentially proxy `can_match` requests
to nodes that don't have the endpoint. This might not cause any problem
from a functional perspecitve but will cause ugly error messages on
the target node. This commit will cause an IAE if we try to talk to an
incompatible node via a proxy.
Relates to #25704
It was brought up that our current client artifacts have generic names like 'rest' that may cause conflicts with other artifacts.
This commit renames:
- rest -> elasticsearch-rest-client
- sniffer -> elasticsearch-rest-client-sniffer
- rest-high-level -> elasticsearch-rest-high-level-client
A couple of small changes are also preparing the high level client for its first release.
Closes#20248
The slop parameter defaults to 0 in the Lucene SpanNearQuery, so we can set it
to this default value also and don't have to require it being specified in the
query when using the Rest API. Leaving `slop` a ctro arg in the Java API as it
should normally be specified and we can keep it `final` that way.
Closes#25642
A shrunk index should ignore anything from templates and instead take
its mappings, aliases, and settings from the original index, plus any
new settings and aliases passed in with the shrink request. This commit
causes this to be the case.
Relates #25380
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.
This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
Requests that execute a stored script will no longer be allowed to specify the lang of the script. This information is stored in the cluster state making only an id necessary to execute against. Putting a stored script will still require a lang.
* Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits
Closes#24986
* iter
* Update ScriptDocValues to not reuse GeoPoint and Date objects
* added Javadoc about script value re-use
* Enable doc values for range fields by default.
* Store ranges in a binary format that support multi field fields.
* Added BinaryDocValuesRangeQuery that can query ranges that have been encoded into a binary doc values field.
* Wrap range queries on a range field in IndexOrDocValuesQuery query.
Closes#24314
This method does exactly what getHits() does and is used in only a few places,
so it can safely be removed. It seems to be a left-over from when
InternalSearchHits was folded into the SearchHits interface, which didn't
contain this method.
In certain situations we can early terminate and just skip the entire
query phase or make the lucene level rewrite very cheap if we can already
tell that a query won't match any documents. For instance if there is a single
`match_none` ie. due to some range rewrite in a filter or must clause of a boolean
query it can just drop all it's other queries since it will never match.
Flake ids organize bytes in such a way that ids are ordered. However, we do not
need that property and could reorganize bytes in an order that would better suit
Lucene's terms dict instead.
Some synthetic tests suggest that this change decreases the disk footprint of
the `_id` field by about 50% in many cases (see `UUIDTests.testCompression`).
For instance, when simulating the indexing of 10M docs at a rate of 10k docs
per second, the current uid generator used 20.2 bytes per document on average,
while this new generator which only puts bytes in a different order uses 9.6
bytes per document on average.
We had already explored this idea in #18209 but the attempt to share long common
prefixes had had a bad impact on indexing speed. This time I have been more
careful about putting discriminant bytes early in the `_id` in a way that
preserves indexing speed on par with today, while still allowing for better
compression.
There is a bug when a call to `BytesReferenceStreamInput` skip is made
on a `BytesReference` that has an initial offset. The offset for the
current slice is added to the current index and then subtracted from the
length. This introduces the possibility of a negative number of bytes to
skip. This happens inside a loop, which leads to an infinte loop.
This commit correctly subtracts the current slice index from the
slice.length. Additionally, the `BytesArrayTests` are modified to test
instances that include an offset.
This is a protection mechanism to prevent a single search request from
hitting a large number of shards in the cluster concurrently. If a search is
executed against all indices in the cluster this can easily overload the cluster
causing rejections etc. which is not necessarily desirable. Instead this PR adds
a per request limit of `max_concurrent_shard_requests` that throttles the number of
concurrent initial phase requests to `256` by default. This limit can be increased per request
and protects single search requests from overloading the cluster. Subsequent PRs can introduces
addiontional improvemetns ie. limiting this on a `_msearch` level, making defaults a factor of
the number of nodes or sort shards iters such that we gain the best concurrency across nodes.
We lost the cluster alias due to some special caseing in inner hits
and due to the fact that we didn't pass on the alias to the shard request.
This change ensures that we have the cluster alias present on the shard to
ensure all SearchShardTarget reads preserve the alias.
Relates to #25606
Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.
For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.
This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.
With #24236, tribe nodes submit cluster state changes to their MasterService, making it unnecessary to explicitly update the cluster state version. This PR fixes the double-incrementing of cluster state versions on tribe nodes, which are not harmful, but unnecessary.
This change collapses some of the packages for the bucket aggregations into their parent packages. This was done for the following aggregations:
* The variants of the range aggregation (geo_distance, date and ip) were moved into the `o.e.s.a.bucket.range` package
* The `o.e.s.a.bucket.terms.support` package was removed and the classes were moved to `o.e.s.a.bucket.terms`
* The filter aggregation was moved to `o.e.s.a.bucket.filter`
Since this PR is already relatively large with only the above changes subsequent PRs will do similar operations on relevant metric and pipeline aggregations
Relates to #22868
The test is currently serializing the cluster state using an older ES version format, but then deserializes those same bytes by
assuming they are of the current ES version.
When resolving wildcards, aliases should be treated as unavailable indices when the `ignoreAliases` option is set to `true` (currently enabled with delete index api and update aliases api). This way the `allow_no_indices` and `ignore_unavailable` options can be honoured, otherwise WildcardExpressionResolver ends up treating aliases differently and there is no way to control when an error is thrown.
The default behaviour for the delete index api, which has `ignore_unavailable` set to `false` and `allow_no_indices` set to `true` by default, is to throw an error when executed against an alias, same as when it's executed against an index that does not exist.
We currently check whether translog files can be trimmed whenever we create a new translog generation or close a view. However #25294 added a long translog retention period (12h, max 512MB by default), which means translog files should potentially be cleaned up long after there isn't any indexing activity to trigger flushes/the creation of new translog files. We therefore need a scheduled background check to clean up those files once they are no longer needed.
Relates to #10708
This commit does two things:
- bumps the version from 6.0.0-alpha3 to 6.0.0-beta1
- renames the 6.0.0-alpha3 version constant to 6.0.0-beta1
Relates #25621
This commit adjusts the expectation for the max number of threads in the
scaling thread pool configuration test. The reason that this expectation
is incorrect is because we removed the limitation that the number of
processors maxes out at 32, instead letting it be the true number of
logical processors on the machine. However, when we removed this
limitation, this test was never adjusted to reflect the new reality yet
it never arose since our tests were not running on machines with
incredibly high core counts.
Relates #20874
Updating the global checkpoint on a replica can occur for a few
different reasons:
- from inlined global checkpoint updates
- from a primary term transition
- from finalizing recovery
Yet, the trace logging for a global checkpoint update does not present
this information that can be useful when tracing test failures. This
commit adds a reason for the global checkpoint update on a replica so
that we can trace these updates.
Relates #25612
The current BWC code in `BulkItemRequest` mutates the underlying `DocWriteRequests` which causes test failures and unexpected state (our test infra checks bwc serialization on the fly). This PR removes this logic from master. Another PR will add a BWC layer to 5.x only.
This PR contains the logic in https://github.com/elastic/elasticsearch/pull/25510 , which is needed to run the tests.
Previously the primary didn't update it's own local checkpoint (and thus the global checkpoint) before some indexing occurred. With recent changes the primary now properly initializes it self and thus ops recovery is possible even if no indexing has occurred.
This commit adds cross-settings validation for the low/high/flood stage
disk watermark settings. This validation was enabled by the introduction
of multiple settings validation.
Relates #25600
When a shard is promoted to replica, it's possible that it was
previously a replica that started following a new primary. When it
started following this new primary, the state of its local checkpoint
tracker was reset. Upon promotion, it's possible that the state of the
local checkpoint tracker has not yet restored from a successful
primary-replica re-sync. To account for this, we must restore the state
of the local checkpoint tracker when a replica shard is promoted to
primary. To do this, we stream the operations in the translog, marking
the operations that are in the translog as completed. We do this before
we fill the gaps on the newly promoted primary, ensuring that we have a
primary shard with a complete history up to the largest maximum sequence
number it has ever seen.
Relates #25553
This commit refactors the global checkpont tracker to make it more
resilient. The main idea is to make it more explicit what state is
actually captured and how that state is updated through
replication/cluster state updates etc. It also fixes the issue where the
local checkpoint information is not being updated when a shard becomes
primary. The primary relocation handoff becomes very simple too, we can
just verbatim copy over the internal state.
Relates #25468
The created and found fields in index and delete responses became obsolete after the introduction of the result field in index, update and delete responses (#19566).
After deprecating the created and found fields in 5.x (#19633), now they are removed.
Fixes#19630
* Improved REST endpoint exception handling, see #15335
Also improved OPTIONS http method handling to better conform with the
http spec.
* Tidied up formatting and comments
See #15335
* Tests for #15335
* Cleaned up comments, added section number
* Swapped out tab indents for space indents
* Test class now extends ESSingleNodeTestCase
* Capture RestResponse so it can be examined in test cases
Simple addition to surface the RestResponse object so we can run tests
against it (see issue #15335).
* Refactored class name, included feedback
See #15335.
* Unit test for REST error handling enhancements
Randomizing unit test for enhanced REST response error handling. See
issue #15335 for more details.
* Cleaned up formatting
* New constructor to set HTTP method
Constructor added to support RestController test cases.
* Refactored FakeRestRequest, streamlined test case.
* Cleaned up conflicts
* Tests for #15335
* Added functionality to ignore or include path wildcards
See #15335
* Further enhancements to request handling
Refactored executeHandler to prioritize explicit path matches. See
#15335 for more information.
* Cosmetic fixes
* Refactored method handlers
* Removed redundant import
* Updated integration tests
* Refactoring to address issue #17853
* Cleaned up test assertions
* Fixed edge case if OPTIONS method randomly selected as invalid method
In this test, an OPTIONS method request is valid, and should not return
a 405 error.
* Remove redundant static modifier
* Hook the multiple PathTrie attempts into RestHandler.dispatchRequest
* Add missing space
* Correctly retrieve new handler for each Trie strategy
* Only copy headers to threadcontext once
* Fix test after REST header copying moved higher up
* Restore original params when trying the next trie candidate
* Remove OPTIONS for invalidHttpMethodArray so a 405 is guaranteed in tests
* Re-add the fix I already added and got removed during merge :-/
* Add missing GET method to test
* Add documentation to migration guide about breaking 404 -> 405 changes
* Explain boolean response, pull into local var
* fixup! Explain boolean response, pull into local var
* Encapsulate multiple HTTP methods into PathTrie<MethodHandlers>
* Add PathTrie.retrieveAll where all matching modes can be retrieved
Then TrieMatchingMode can be package private and not leak into RestController
* Include body of error with 405 responses to give hint about valid methods
* Fix missing usageService handler addition
I accidentally removed this :X
* Initialize PathTrieIterator modes with Arrays.asList
* Use "== false" instead of !
* Missing paren :-/
Indexing ids in binary form should help with indexing speed since we would
have to compare fewer bytes upon sorting, should help with memory usage of
the live version map since keys will be shorter, and might help with disk
usage depending on how efficient the terms dictionary is at compressing
terms.
Since we can only expect base64 ids in the auto-generated case, this PR tries
to use an encoding that makes the binary id equal to the base64-decoded id in
the majority of cases (253 out of 256). It also specializes numeric ids, since
this seems to be common when content that is stored in Elasticsearch comes
from another database that uses eg. auto-increment ids.
Another option could be to require base64 ids all the time. It would make things
simpler but I'm not sure users would welcome this requirement.
This PR should bring some benefits, but I expect it to be mostly useful when
coupled with something like #24615.
Closes#18154
Adds a unit test that checks the TermSuggestionContext contents that is the result
of TermSuggestionBuilder#build vs. the values the original builder contains.
Transport profiles unfortunately have never been validated. Yet, it's very
easy to make a mistake when configuring profiles which will most likely stay
undetected since we don't validate the settings but allow almost everything
based on the wildcard in `transport.profiles.*`. This change removes the
settings subset based parsing of profiles but rather uses concrete affix settings
for the profiles which makes it easier to fall back to higher level settings since
the fallback settings are present when the profile setting is parsed. Previously, it was
unclear in the code which setting is used ie. if the profiles settings (with removed
prefixes) or the global node setting. There is no distinction anymore since we don't pull
prefix based settings.
This change adds validation to the RemoteClusterConnection to ensure
we always use seed nodes from the same cluster. While we still allow to use
an arbitrary cluster alias we ensure that we, once we connected to a cluster the first time,
we always check against that initial cluster name when we execute a seed node handshake.
sequence number data in Lucene commit points. Instead, the test
retrieves the _seq_no value from the commit point directly and converts
it to a Long value.
This change adds a basic unit test for the SuggestionSearchContext that is
created as output of SuggestionBuilder#build. The current test only adds checks
for the common fields (like text, prefix, fieldName etc...).
Relates to #17118
* Refactor PathTrie and RestController to use a single trie for all methods
This changes `PathTrie` and `RestController` to use a single `PathTrie` for all
endpoints, it also allows retrieving the endpoints' supported HTTP methods more
easily.
This is a spin-off and prerequisite of #24437
* Use EnumSet instead of multiple if conditions
* Make MethodHandlers package-private and final
* Remove duplicate registerHandler method
* Remove public modifier
Today when we run out of disk all kinds of crazy things can happen
and nodes are becoming hard to maintain once out of disk is hit.
While we try to move shards away if we hit watermarks this might not
be possible in many situations. Based on the discussion in #24299
this change monitors disk utilization and adds a flood-stage watermark
that causes all indices that are allocated on a node hitting the flood-stage
mark to be switched read-only (with the option to be deleted). This allows users to react on the low disk
situation while subsequent write requests will be rejected. Users can switch
individual indices read-write once the situation is sorted out. There is no
automatic read-write switch once the node has enough space. This requires
user interaction.
The flood-stage watermark is set to `95%` utilization by default.
Closes#24299
This commit causes a replica to throwback its local checkpoint to the
global checkpoint when learning of a new primary through a replica
operation.
Relates #25452
In 6.x we prevent multiple types and default to `index.mapping.single_type: false`
This change removes the registered setting and ensures that it's preserved for
5.x indices.
Relates to #24961
All query builders written as self contained xContent objects, to we should mark
them accordingly using ToXContentObject. This also makes it possible to use
things like XContentHelper#toXContent to render query builders in tests.
* Adds rewrite phase to aggregations
This change adds aggregations to the rewrite performed by the `SearchSourceBuilder`. This means that `AggregationBuilder`s are able to implement a `rewrite()` method where they can return a new `AggregationBuilder` which is functionally the same but in a more primitive form. This is exactly analogous to the rewrite done by the `QueryBuilder`s.
The first aggregation to implement the rewrite are the filter and filters aggregations so they can rewrite the filters they contain.
Closes#17676
* Removes rewrite from PipelineAggregationBuilder
Rewrite is based on shard level information. Since pipeline aggregation are run in the reduce phase it doesn’t make sense to rewrite them on the shards. In fact eventually we shouldn’t be transporting them to the shards at all and should be retaining them on the coordinating node for execution in the reduce phase
* Addresses review comments
* addresses more review comments
* Fixed imports
The constructor using `types` has been deprecated for a while now (starting with
ES 5.1.). It can be removed in the next mayor version. Since types are optional
they should be added with the #types() setter.
* Adds check for negative search request size
This change adds a check to `SearchSourceBuilder` to throw and exception if the size set on it is set to a negative value.
Closes#22530
* fix error in reindex
* update re-index tests
* Addresses review comment
* Fixed tests
* Added random negative size test
* Fixes test
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
We have two ways to filter XContent:
- The first method is to parse the XContent as a map and use
XContentMapValues.filter(). This method filters the content of the map
using an automaton. It is used for source filtering, both at search and
indexing time. It performs well but can generate a lot of objects and
garbage collections when large XContent are filtered. It also returns
empty objects (see f2710c16eb) when all
the sub fields have been filtered out and handle dots in field names as
if they were sub fields.
- The second method is to parse the XContent and copy the XContentParser
structure to a XContentBuilder initialized with includes/excludes
filters. This method uses the Jackson streaming filter feature. It is
used by the Response Filtering ('filter_path') feature. It does not
generate a lot of objects, and does not return empty objects and also
does not handle dots in field names explicitely.
Both methods have similar goals but different tests. This commit changes
the current XContentBuilder test class so that it becomes a more generic
testing class and we can now ensure that filtering methods generate the
same results.
It also removes some tests from the XContentMapValuesTests class that
should be in XContentParserTests.
The significance aggs return Lucene index-level statistics that when merged are assumed to be from different shards. The Aggregator unit tests assume segments can be treated as shards and thus break the significance stats and introduce double-counting of background doc frequencies. This change addresses this problem by ensuring test indexes have only one shard.
Closes#25429
If all nodes get disconnected before we can send the request we might
try to reconnect and that will fail with an ISE instead of the a transport
exception.
Closes#25301
ensureYellow ensures at least yellow.
Also, since we only have 1 replica, we don't need to index for it to know about the primary term promotion
Closes#25287
This commit makes the use of the global network settings explicit instead
of implicit within NetworkService. It cleans up several places where we fall
back to the global settings while we should have used tcp or http ones.
In addition this change also removes unnecessary settings classes
The replica replication response object has an extra allocationId field that contains the allocation id of the replica on which the request was executed. As we are sending the allocation id with the actual replica replication request, and check when executing the replica replication action that the allocation id of the replica shard is what we expect, there is no need to communicate back the allocation id as part of the response object.
When a user requests a cluster allocation explain in a situation where
it does not make sense (for example, there are no unassigned shards), we
should consider this a bad request instead of a server error. Yet, today
by throwing an illegal state exception, these are treated as server
errors. This commit adjusts these so that they throw illegal argument
exceptions and are treated as bad requests.
Relates #25503
This commit adds a test for a scenario where a replica receives an extra
document that the promoted replica does not receive, misses the
primary/replica re-sync, and the recovers from the newly-promoted
primary.
Relates #25493
Failing to do so can cause other errors later on during query execution.
For example if `WrapperQueryBuilder` wraps a `GeoShapeQueryBuilder` that fetches the shape from an index then it will skip the shape fetching
and fail later with the error that no shapes have been fetched.
This commit adds an LRU set to used to determine if a keyed deprecation
message should be written to the deprecation logs, or only added to the
response headers on the thread context.
Relates #25474
We have various assertions that check we never block on transport
threads. This commit adds the thread names for the NioTransport to
these assertions.
With this change I had to fix two places where we were calling blocking
methods from the transport threads.
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
* Promote replica on the highest version node
This changes the replica selection to prefer to return replicas on the highest
version when choosing a replacement to promote when the primary shard fails.
Consider this situation:
- A replica on a 5.6 node
- Another replica on a 6.0 node
- The primary on a 6.0 node
The primary shard is sending sequence numbers to the replica on the 6.0 node and
skipping sending them for the 5.6 node. Now assume that the primary shard fails
and (prior to this change) the replica on 5.6 node gets promoted to primary, it
now has no knowledge of sequence numbers and the replica on the 6.0 node will be
expecting sequence numbers but will never receive them.
Relates to #10708
* Switch from map of node to version to retrieving the version from the node
* Remove uneeded null check
* You can pretend you're a functional language Java, but you're not fooling me.
* Randomize node versions
* Add test with random cluster state with multiple versions that fails shards
* Re-add comment and remove extra import
* Remove unneeded stuff, randomly start replicas a few more times
* Move test into FailedNodeRoutingTests
* Make assertions actually test replica version promotion
* Rewrite test, taking Yannick's feedback into account
When a setting is deprecated, if that setting is used repeatedly we
currently emit a deprecation warning every time the setting is used. In
cases like hitting settings endpoints over and over against a node with
a lot of deprecated settings, this can lead to excessive deprecation
warnings which can crush a node. This commit ensures that a given
setting only sees deprecation logging at most once.
Relates #25457
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}. The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information. This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!
In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty. This commit is to return these fields in the
response, even if empty, if the verbose option is set.
This commit fixes a race condition in the node supplier used by the RemoteClusterConnection. The
node supplier stores an iterator over a set backed by a ConcurrentHashMap, but the get operation
of the supplier uses multiple methods of the iterator and is suceptible to a race between the
calls to hasNext() and next(). The test in this commit fails under the old implementation with a
NoSuchElementException. This commit adds a wrapper object over a set and a iterator, with all methods
being synchronized to avoid races. Modifications to the set result in the iterator being set to null
and the next retrieval creates a new iterator.
This commit changes how we determine if there were any remote indices that a search should have
been executed against. Previously, we used the list of remote shard iterators but if the remote
index pattern resolved to no indices there would be no remote shard iterators even though the
request specified remote indices. The map of remote cluster names to the original indices is used
instead so that we can determine if there were remote indices even when there are no remote shard
iterators.
Closes#25426
Expand `/_cat/nodes` with already present information about available disk space `diskAvail` (alias: `d`, `disk`) by:
* `diskTotal` (alias `dt`): total disk space
* `diskUsed` (alias `du`): used disk space (`diskTotal - diskAvail`)
* `diskUsedPercent` (alias `dup`): used disk space percentage
Note: The available disk space is the number of bytes available to the node's Java virtual machine. The size might be smaller than the real one. That means the used disk space (percentage) is larger.
Closes#21679
This commit introduces a nio based tcp transport into framework for
testing.
Currently Elasticsearch uses a simple blocking tcp transport for
testing purposes (MockTcpTransport). This diverges from production
where our current transport (netty) is non-blocking.
The point of this commit is to introduce a testing variant that more
closely matches the behavior of production instances.
This catches `AlreadyClosedException` during `stats` calls to avoid failing a `_nodes/stats` request because of the ignorable, concurrent index closure.
When relocating a shard before changing the state to relocated, we
verify that a relocation is a still taking place. Yet, this can throw an
exception if the relocation is in fact no longer valid. Sadly, we were
swallowing the exception in this situation. This commit allows such an
exception to bubble up after safely releasing resources.
We previously tried to maintain (while not formally supporting) 32-bit
support, although we never tested this anywhere in CI. Since we do not
formally support this, and 32-bit usage is very low, we have elected to
no longer maintain 32-bit support. This commit removes any implication
of 32-bit support.
Relates #25435
The primary shard uses the GlobalCheckPointTracker to track local checkpoint information of recovering and started replicas in order to calculate the global checkpoint. As the tracker is updated through recoveries as well, it is easier to reason about the tracker if we can ensure that there are no concurrent recovery attempts for the same target shard (which can happen in case of network disconnects).
When a replica shard increases its primary term under the mandate of a new primary, it should also update its global checkpoint; this gives us the guarantee that its global checkpoint is at least as high as the new primary and gives a starting point for the primary/replica resync.
Relates to #25355, #10708
We already have these tests in InternalAggregationTestCase to check random insertions into the response xContent so that we don't fail on future changes in the response format. This change adds the same to AggregationsTests and runs on a whole aggregations tree. Unfortunately we need to exclude many places in the xContent from random insertion, but I added a long comment trying to explaine those.
This commit marks a failing test as awaits fix. The test is failing due
to a primary shard not knowing its own local checkpoint in the global
checkpoint tracker after recovery. If such a shard becomes primary after
promotion, and is then subsequently relocated, it can lead to a
violation of an assertion that when the primary context is transferred
the knowledge of all in-sync local checkpoints is consistent with the
global checkpoint on the relocation target.
Relates #25415
This commit removes the default path settings for data and logs. With
this change, we now ship the packages with these settings set in the
elasticsearch.yml configuration file rather than going through the
default.path.data and default.path.logs dance that we went through in
the past.
Relates #25408
Today we load plugins reflectively, looking for constructors that
conform to specific signatures. This commit tightens the reflective
operations here, not allowing plugins to have ambiguous constructors.
Relates #25405
This commit removes path.conf as a valid setting and replaces it with a
command-line flag for specifying a non-default path for configuration.
Relates #25392
This commit removes an abstraction that was introduced when introducing
the primary context. As this abstraction is used in exactly one place,
we simply make that abstraction local to its usage so that we do not
accumulate yet another general abstraction with exactly one usage.
Relates #25402
This commit updates some assertions in the primary context sealing test
after the restriction on updating allocation IDs from master and
updating global checkpoint on replica while sealed were removed.
* Introduce primary context
The target of a primary relocation is not aware of the state of the
replication group. In particular, it is not tracking in-sync and
initializing shards and their checkpoints. This means that after the
target shard is started, its knowledge of the replication group could
differ from that of the relocation source. In particular, this differing
view can lead to it computing a global checkpoint that moves backwards
after it becomes aware of the state of the entire replication
group. This commit addresses this issue by transferring a primary
context during relocation handoff.
* Fix test
* Add assertion messages
* Javadocs
* Barrier between marking a shard in sync and relocating
* Fix misplaced call
* Paranoia
* Better latch countdown
* Catch any exception
* Fix comment
* Fix wait for cluster state relocation test
* Update knowledge via upate local checkpoint API
* toString
* Visibility
* Refactor permit
* Push down
* Imports
* Docs
* Fix compilation
* Remove assertion
* Fix compilation
* Remove context wrapper
* Move PrimaryContext to new package
* Piping for cluster state version
This commit adds piping for the cluster state version to the global
checkpoint tracker. We do not use it yet.
* Remove unused import
* Implement versioning in tracker
* Fix test
* Unneeded public
* Imports
* Promote on our own
* Add tests
* Import
* Newline
* Update comment
* Serialization
* Assertion message
* Update stale comment
* Remove newline
* Less verbose
* Remove redundant assertion
* Tracking -> in-sync
* Assertions
* Just say no
Friends do not let friends block the cluster state update thread on
network operations.
* Extra newline
* Add allocation ID to assertion
* Rename method
* Another rename
* Introduce sealing
* Sealing tests
* One more assertion
* Fix imports
* Safer sealing
* Remove check
* Remove another sealed check
The following token filters were moved: stemmer, stemmer_override, kstem, dictionary_decompounder, hyphenation_decompounder, reverse, elision and truncate.
Relates to #23658
While real secure settings (ie an ES keystore) cannot be merged
together, mocked secure settings can and need to be sometimes merged.
This commit adds a merge method to allow tests to merge together
multiple instances of secure settings.
This change removes the remaining explicitly specified `index.mapper.single_type`
settings from tests in order to allow the removal of the setting.
This is the already approved part of #25375 broken out to simplfiy reviews on
When Log4j 2 was introduced, we removed support for the system property
es.logger.prefix. Yet, some code was left behind. This commit removes
that dead code.
Relates #25377
Added unit test coverage for GlobalOrdinalsSignificantTermsAggregator, GlobalOrdinalsSignificantTermsAggregator.WithHash, SignificantLongTermsAggregator and SignificantStringTermsAggregator.
Removed integration test.
Relates #22278
This change cleans up remaining tests to not use index.mapping.single_type=false
but instead where applicable use a single type or markt the index as created
with a pre 6.x version.
Yet, there is still on leftover in the client tests that needs special attention.
See `org.elasticsearch.client.SearchIT`
Relates to #24961
OldIndexBackwardsCompatibilityIT#testOldClusterStates tested whether global and index metadata could be read from data directory,
this can also be tested in full cluster qa test that checks cluster state via api.
Relates to #24939