With #29331 we added support for the cluster health API to the
high-level REST client. The transport client does not support the level
parameter, and it always returns all the info needed for shards level
rendering. We have maintained that behaviour when adding support for
cluster health to the high-level REST client, to ease migration, but the
correct thing to do is to default the high-level REST client to
`cluster` level, which is the same default as when going through the
Elasticsearch REST layer.
If the publishing of a cluster state to a node fails, we currently only log it as debug information and
only on the master. This makes it hard to see the cause of (test) failures when logging is set to
default levels. This PR adds a warn level log on the node receiving the cluster state when it fails to
deserialise the cluster state and a warn level log on the master with a list of nodes for which
publication failed.
Today, if GET /_cluster/health?wait_for_active_shards=all does not immediately
succeed then it throws an exception due to an erroneous and unnecessary call to
ActiveShardCount#enoughShardsActive(). This commit fixes this logic.
Fixes#31151
TransportAction has many variants of execute. One of those variants
executes by returning a future, which is then often blocked on by
calling get(). This commit removes this variant of execute, instead
using a helper method for tests that want to block, or having tests
pass in a PlainActionFuture directly as a listener.
Co-authored-by: Simon Willnauer <simonw@apache.org>
Here is the problem: if two threads are racing and one hits a failure
freeing a context and the other succeeded, we can expose the value of
the has failure marker to the succeeding thread before the failing
thread has had a chance to set the failure marker. This is a problem if
the failing thread counted down the expected number of operations, then
be put to sleep by a gentle lullaby from the OS, and then the other
thread could count down to zero. Since the failing thread did not get to
set the failure marker, the succeeding thread would respond that the
clear scroll succeeded and that makes that thread a liar. This commit
addresses by first setting the failure marker before we potentially
expose its value to another thread.
Given the weirdness of the response returned by the get alias API, we went for a client specific response, which allows us to hold the error message, exception and status returned as part of the response together with aliases. See #30536 .
Relates to #27205
This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search()
This method can hang forever if the regex expression is too complex.
The thread interrupter in the background checks every 3 seconds whether there are threads
execution the org.joni.Matcher#search() method for longer than 5 seconds and
if so interrupts these threads.
Joni has checks that that for every 30k iterations it checks if the current thread is interrupted and
if so returns org.joni.Matcher#INTERRUPTED
Closes#28731
This filesystem needs to be suppressed during these tests because it
adds random files to the directory upon directory creation. That means
that the size of these directories is off from what we expect them to
be. Rather than loosening the assertion which could hide bugs on real
directories, this commit suppresses this file system in this test suite.
This removes the abstract `getTranslog` method in `Engine`, instead leaving it
to the abstract implementations of the other methods that use the translog. This
allows future Engines not to have a Translog, as instead they must implement the
methods that use the translog pieces to return necessary values.
This test was failing from time to time due to a ConcurrentModificationException, which
was triggered due to the primary-replica resync running concurrently with shards being
removed.
Closes#30767
With `max_concurrent_shard_requests` we used to throttle / limit
the number of concurrent shard requests a high level search request
can execute per node. This had several problems since it limited the
number on a global level based on the number of nodes. This change
now throttles the number of concurrent requests per node while still
allowing concurrency across multiple nodes.
Closes#31192
Previously this was called for the combine script only. This change checks for self references for
init, map, and reduce scripts as well, and adds unit test coverage for the init, map, and combine cases.
* Fully encapsulate LocalCheckpointTracker inside of the engine
This makes the Engine interface not expose the `LocalCheckpointTracker`, instead
exposing the pieces needed (like retrieving the local checkpoint) as individual
methods.
* Remove DocumentFieldMappers#simpleMatchToFullName, as it is duplicative of MapperService#simpleMatchToIndexNames.
* Rename MapperService#simpleMatchToIndexNames -> simpleMatchToFullName for consistency.
* Simplify EsIntegTestCase#assertConcreteMappingsOnAll to accept concrete fields instead of wildcard patterns.
The following analyzers were moved from server module to analysis-common module:
`snowball`, `arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`,
`catalan`, `chinese`, `cjk`, `czech`, `danish`, `dutch`, `english`, `finnish`,
`french`, `galician` and `german`.
Relates to #23658
We moved to 1 shard by default which caused some issues in how many
concurrent shard requests we allow by default. For instance searching
a 5 shard index on a single node will now be executed serially per shard
while we want these cases to have a good concurrency out of the box. This
change moves to `numNodes * 5` which corresponds to the default we used to
have in the previous version.
Relates to #30783Closes#30994
* Initial commit of rest high level exposure of cancel task
* fix javadocs
* address some code review comments
* update branch to use tasks namespace instead of cluster
* High-level client: list tasks failure to not lose nodeId
This commit reworks testing for `ListTasksResponse` so that random
fields insertion can be tested and xcontent equivalence can be checked
too. Proper exclusions need to be configured, and failures need to be
tested separately. This helped finding a little problem, whenever there
is a node failure returned, the nodeId was lost as it was never printed
out as part of the exception toXContent.
* added comment
* merge from master
* re-work CancelTasksResponseTests to separate XContent failure cases from non-failure cases
* remove duplication of logic in parser creation
* code review changes
* refactor TasksClient to support RequestOptions
* add tests for parent task id
* address final PR review comments, mostly formatting and such
This is related to #27260 and #28898. This commit adds the transport-nio
plugin as a random option when running the http smoke tests. As part of
this PR, I identified an issue where cors support was not properly
enabled causing these tests to fail when using transport-nio. This
commit also fixes that issue.
Adds support for `ignore_unmapped` parameter in geo distance sorting,
which is functionally equivalent to specifying an `unmapped_type` in
the field sort.
Closes#28152
Several AcknowledgedResponse implementations only parse the boolean acknowledged
flag and then create an instance of their class using that flag. This can be
simplified by adding this basic parser to the superclass, provide a common
helper method and call the appropriate ctor in the fromXContent methods.
This change moves an integration test that relies on setting
the value of a static variable (boolean max clause count) to
an unit test where we are sure that the same jvm is used to access
the static variable.
By default span_multi query will limit term expansions = boolean max clause.
This will limit high heap usage in case of high cardinality term
expansions. This applies only if top_terms_N is not used in inner multi
query.
This is related to #28898. This commit adds the acceptor thread name to
the method checking if this thread is a transport thread. Additionally,
it modifies the nio http transport to use the same worker name as the
netty4 http server transport.
This is related to #27260. This commit combines the AcceptingSelector
and SocketSelector classes into a single NioSelector. This change
allows the same selector to handle both server and socket channels. This
is valuable as we do not necessarily want a dedicated thread running for
accepting channels.
With this change, this commit removes the configuration for dedicated
accepting selectors for the normal transport class. The accepting
workload for new node connections is likely low, meaning that there is
no need to dedicate a thread to this process.
Currently the engine is initialized with a hardcoded 256MB of RAM. Elasticsearch
may never use more than that for a given shard, `IndexingMemoryController` only
has the power to flush segments to disk earlier in case multiple shards are
actively indexing and use too much memory.
While this amount of memory is enough for an index with few fields and larger
RAM buffers are not expected to improve indexing speed, this might actually be
little for an index that has many fields.
Kudos to @bleskes for finding it out when looking into a user who was reporting
a **much** slower indexing speed when upgrading from 2.x to 5.6 with an index
that has about 20,000 fields.
We currently have a specific REST action to retrieve all indices and types mappings, which used internally the get index API. This doesn't seem to be required anymore though as the existing RestGetMappingAction could as well take the requests with no indices and types specified.
This commit removes the RestGetAllMappingsAction in favour of using RestGetMappingAction also for requests that don't specify indices nor types.
The Index Audit trail allows the override of the template index
settings with settings specified on the conf file.
A bug will manifest when such conf file settings are specified
for templates that need to be upgraded. The bug is an endless
upgrade loop because the upgrade, although successful, is
not reckoned as such by the upgrade service.
move `finger_print`, `pattern` and `standard_html_strip` analyzers
to analysis-common module. (both AnalysisProvider and PreBuiltAnalyzerProvider)
Changed PreBuiltAnalyzerProviderFactory to extend from PreConfiguredAnalysisComponent and
changed to make sure that predefined analyzers are always instantiated with the current
ES version and if an instance is requested for a different version then delegate to PreBuiltCache.
This is similar to the behaviour that exists today in AnalysisRegistry.PreBuiltAnalysis and
PreBuiltAnalyzerProviderFactory. (#31095)
Relates to #23658
This is related to #28898. This commit adds cors support to the nio http
transport. Most of the work is copied directly from the netty module
implementation. Additionally, this commit adds tests for the nio http
channel.
When the last indexing operation is completed, we will fire a global
checkpoint sync. Since a global checkpoint sync request is a replication
request, it will acquire an index shard permit on the primary when
executing. If this happens at the same time while we are issuing the
synced-flush, the synced-flush request will fail as it thinks there are
in-flight operations. We can avoid such situation by retrying another
synced-flush if the current request fails due to ongoing operations on
the primary.
Closes#29392
This commit adds a new writeBlobAtomic() method to the BlobContainer
interface that can be implemented by repository implementations which
support atomic writes operations.
When the BlobContainer implementation does not provide a specific
implementation of writeBlobAtomic(), then the writeBlob() method is used.
Related to #30680
This will be necessary for the `docvalue_fields` option to work correctly once
we use the field's doc-value format to format doc-value fields. Binary values
are formatted as base64-encoded strings.
In spite of the existing caching, I have seen a number of nodes hot threads
where one thread had been spending all its cpu on computing the size of a
directory. I am proposing to move the computation of the size of the directory
to `StoreDirectory` in order to skip recomputing the size of the directory if
no changes have been made. This should help with users that have read-only
indices, which is very common for time-based indices.
Currently this class takes care of moth selecting the relevant value, and
replacing missing values if any. This is fine for sorting, which always needs
to do both at the same time, but we also have a number of aggregations and
script utils that need to retain information about missing values so this change
proposes to decouple selection of the relevant value and replacement of missing
values.
* Fix index prefixes to work with span_multi
Text fields that use `index_prefixes` can rewrite `prefix` queries into
`term` queries internally. This commit fix the handling of this rewriting
in the `span_multi` query.
This change also copies the index options of the text field into the
prefix field in order to be able to run positional queries. This is mandatory
for `span_multi` to work but this could also be useful to optimize `match_phrase_prefix`
queries in a follow up. Note that this change can only be done on indices created
after 6.3 since we set the index options to doc only in this version.
Fixes#31056
ObjectParser should throw XContentParseExceptions, not IAE. A dedicated parsing
exception can includes the place where the error occurred.
Closes#30605
This snapshot includes:
- LUCENE-8341: Record soft deletes in SegmentCommitInfo which will resolve#30851
- LUCENE-8335: Enforce soft-deletes field up-front
When `lenient=false`, attempts to create match phrase queries with custom analyzers against non-text fields will throw an IllegalArgumentException.
Also changes `*Match*QueryBuilderTests` so that it avoids this scenario
Fixes#31061
The majority of Responses inheriting from AcknowledgeResponse implement
the readFrom and writeTo serialization method in the same way. Moving this
as a default into AcknowledgeResponse and letting the few exceptions that
need a slightly different implementation handle this themselves saves a lot
of duplication.
Specifying `index_phrases: true` on a text field mapping will add a subsidiary
[field]._index_phrase field, indexing two-term shingles from the parent field.
The parent analysis chain is re-used, wrapped with a FixedShingleFilter.
At query time, if a phrase match query is executed, the mapping will redirect it
to run against the subsidiary field.
This should trade faster phrase querying for a larger index and longer indexing
times.
Relates to #27049
This commit fixes an issue with
PersistentTasksCustomMetaDataTests#testMinVersionSerialization. There
were two problems here:
- some versions do not have future compatible version (e.g., betas)
- the feature logic was incorrect
With #31020 we introduced the ability for transport clients to indicate what features they support
in order to make sure we don't serialize object to them they don't support. This PR adapts the
serialization logic of persistent tasks to be aware of those features and not serialize tasks that
aren't supported.
Also, a version check is added for the future where we may add new tasks implementations and
need to be able to indicate they shouldn't be serialized both to nodes and clients.
As the implementation relies on the interface of `PersistentTaskParams`, these are no longer
optional. That's acceptable as all current implementation have them and we plan to make
`PersistentTaskParams` more central in the future.
Relates to #30731
We compute a random version and later try to compute the version prior
that random version. If the random version is the earliest version in
our list of versions then it, by definition, does not have a previous
version. Yet trying to find its previous is someting we do and so the
test fails. This commit adds a version check to the randomization so
that we do not select the earliest version in our list.
This is related to #31017. That issue identified that these three http
methods were treated like GET requests. This commit adds them to
RestRequest. This means that these methods will be handled properly and
generate 405s.
This commit introduces the ability for a client to communicate to the
server features that it can support and for these features to be used in
influencing the decisions that the server makes when communicating with
the client. To this end we carry the features from the client to the
underlying stream as we carry the version of the client today. This
enables us to enhance the logic where we make protocol decisions on the
basis of the version on the stream to also make protocol decisions on
the basis of the features on the stream. With such functionality, the
client can communicate to the server if it is a transport client, or if
it has, for example, X-Pack installed. This enables us to support
rolling upgrades from the OSS distribution to the default distribution
without breaking client connectivity as we can now elect to serialize
customs in the cluster state depending on whether or not the client
reports to us using the feature capabilities that it can under these
customs. This means that we would avoid sending a client pieces of the
cluster state that it can not understand. However, we want to take care
and always send the full cluster state during node-to-node communication
as otherwise we would end up with different understanding of what is in
the cluster state across nodes depending on which features they reported
to have. This is why when deciding whether or not to write out a custom
we always send the custom if the client is not a transport client and
otherwise do not send the custom if the client is transport client that
does not report to have the feature required by the custom.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
With the default distribution changing in 6.3, clusters might now contain custom metadata that a
pure OSS transport client cannot deserialize. As this can break transport clients when accessing
the cluster state or reroute APIs, we've decided to exclude any custom metadata that the transport
client might not be able to deserialize. This will ensure compatibility between a < 6.3 transport
client and a 6.3 default distribution cluster. Note that this PR only covers interoperability with older
clients, another follow-up PR will cover full interoperability for >= 6.3 transport clients where we will
make it possible again to get the custom metadata from the cluster state.
Relates to #30731
This change adds an option named `split_queries_on_whitespace` to the `keyword`
field type. When set to true full text queries (`match`, `multi_match`, `query_string`, ...) that target the field will split the input on whitespace to build the query terms. Defaults to `false`.
Closes#30393
The randomized alias names could contain unicode controll charactes that don't
survive an xContent rendering and parsing roundtrip when using the YAML xContent
type. This fix filters the randomized unicode string for control characters to
avoid this particular problem.
Closes#30911
In case an error is returned when calling search_shards on a remote
cluster, which will lead to throwing an exception in the coordinating
node, we should make sure that the status code returned by the
coordinating node is the same as the one returned by the remote
cluster. Up until now a 500 - Internal Server Error was always
returned. This commit changes this behaviour so that for instance if an
index is not found, which causes an 404, a 404 is also returned by the
coordinating node to the client.
Closes#27461
This commit reworks testing for `ListTasksResponse` so that random
fields insertion can be tested and xcontent equivalence can be checked
too. Proper exclusions need to be configured, and failures need to be
tested separately. This helped finding a little problem, whenever there
is a node failure returned, the nodeId was lost as it was never printed
out as part of the exception toXContent.
This is related to #30141. Right now in the transport client we open a
temporary node connection and take the node information. This node
information is used to open a permanent connection that is used for the
client. However, we continue to use the configured transport address.
If the configured transport address is a load balancer, you might
connect to a different node for the permanent connection. This causes
the handshake validation to fail. This commit removes the handshake
validation for the transport client when it simple node sample mode.
Since master will always communicate with a >=6.4 node, the logic for
checking if the node is 6.4 and conditionally reading and writing based
on that can be removed from master. This logic will stay in 6.x as it is
the bridge to the cleaner response in master. This also unmutes the
failing test due to this bwc change.
Closes#30807
This commit removes the RequestBuilder generic type from Action. It was
needed to be used by the newRequest method, which in turn was used by
client.prepareExecute. Both of these methods are now removed, along with
the existing users of prepareExecute constructing the appropriate
builder directly.
This change deprecates completion queries and documents without context that target a
context enabled completion field. Querying without context degrades the search
performance considerably (even when the number of indexed contexts is low).
This commit targets master but the deprecation will take place in 6.x and the functionality
will be removed in 7 in a follow up.
Closes#29222
This commit adds Verify Repository, the associated docs and tests for
the high level REST API client. A few small changes to the Verify
Repository Response went into the commit as well.
Relates #27205
Currently failures to compile a script usually lead to a ScriptException, which
inherits the 500 INTERNAL_SERVER_ERROR from ElasticsearchException if it does
not contain another root cause. Instead, this should be a 400 Bad Request error.
This PR changes this more generally for script compilation errors by changing
ScriptException to return 400 (bad request) as status code.
Closes#12315
When we are connecting to a remote cluster we should never select
dedicated master nodes as gateway nodes, or we will end up loading them
with requests that should rather go to other type of nodes e.g. data
nodes or coord_only nodes.
This commit adds the selection based on the node role, to the existing
selection based on version and potential node attributes.
Closes#30687
AliasMetaData should be parsed more leniently so that the high-level REST client can support forward compatibility on it. This commit addresses this issue that was found as part of #28799 and adds dedicated XContent tests as well.
With multiple data paths, we write the state files for index metadata to all data paths. We only properly fsync on the first location, though. For other locations, we possibly expose the file before its contents is properly fsynced. This can lead to situations where, after a crash, and where the first data path is not available anymore, ES will see a partially-written state file, preventing the node to start up.
This change adds a new option to the composite aggregation named `missing_bucket`.
This option can be set by source and dictates whether documents without a value for the
source should be ignored. When set to true, documents without a value for a field emits
an explicit `null` value which is then added in the composite bucket.
The `missing` option that allows to set an explicit value (instead of `null`) is deprecated in this change and will be removed in a follow up (only in 7.x).
This commit also changes how the big arrays are allocated, instead of reserving
the provided `size` for all sources they are created with a small intial size and they grow
depending on the number of buckets created by the aggregation:
Closes#29380