Today we have several code paths to merge top docs based on the number of
search results returned from the shards. If there is a only a single shard
holding any hits we go a different code path with quite some complexity while
if there are more than one the code is basically duplicated to safe the
creation of a dense array of top docs which can be large if there are many results.
This commit removes the need of the dense array and in-turn the justification for
the optimization. This commit introduces a single code path to merge top docs.
The InternalEngine Index/Delete methods (plus satellites like version loading from Lucene) have accumulated some cruft over the years making it hard to clearly the code flows for various use cases (primary indexing/recovery/replicas etc). This PR refactors those methods for better readability. The methods are broken up into smaller sub methods, albeit at the price of less code I reused.
To support the refactoring I have considerably beefed up the versioning tests.
This PR is a spin-off from #23543 , which made it clear this is needed.
The purpose of this validation is to make sure that the master doesn't step down
due to a change in master nodes, which also means that there is no way to revert
an accidental change. Since we validate using the current cluster state (and
not the one from which the settings come from) we have to be careful and only
validate if the local node is already a master. Doing so all the time causes
subtle issues. For example, a node that joins a cluster has no nodes in its
current cluster state. When it receives a cluster state from the master with
a dynamic minimum master nodes setting int it, we must make sure we don't reject it.
Closes#23695
SingleNodeDiscoveryIT uses a hardcoded port for the purpose of binding
two nodes within the limited port range that an unconfigured unicast zen
ping hosts list would try to discover another node on. This commit at
least removes this hardcoding for the first node to come up, although
still tries to bind the second node to the limited port range after the
first node has bound.
This commit makes closing a ReleasableBytesStreamOutput release the underlying BigArray so
that we can use try-with-resources with these streams and avoid leaking memory by not returning
the BigArray. As part of this change, the ReleasableBytesStreamOutput adds protection to only release the BigArray once.
In order to make some of the changes cleaner, the ReleasableBytesStream interface has been
removed. The BytesStream interface is changed to a abstract class so that we can use it as a
useable return type for a new method, Streams#flushOnCloseStream. This new method wraps a
given stream and overrides the close method so that the stream is simply flushed and not closed.
This behavior is used in the TcpTransport when compression is used with a
ReleasableBytesStreamOutput as we need to close the compressed stream to ensure all of the data
is written from this stream. Closing the compressed stream will try to close the underlying stream
but we only want to flush so that all of the written bytes are available.
Additionally, an error message method added in the BytesRestResponse did not use a builder
provided by the channel and instead created its own JSON builder. This changes that method to use the channel builder and in turn the bytes stream output that is managed by the channel.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit adds a description for a parameter that was added to
BootstrapChecks#enforceLimits(BoundTransportAddress, String) without the
Javadocs having been updated.
While there are use-cases where a single-node is in production, there
are also use-cases for starting a single-node that binds transport to an
external interface where the node is not in production (for example, for
testing the transport client against a node started in a Docker
container). It's tricky to balance the desire to always enforce the
bootstrap checks when a node might be in production with the need for
the community to perform testing in situations that would trip the
bootstrap checks. This commit enables some flexibility for these
users. By setting the discovery type to "single-node", we disable the
bootstrap checks independently of how transport is bound. While this
sounds like a hole in the bootstrap checks, the bootstrap checks can
already be avoided in the single-node use-case by binding only HTTP but
not transport. For users that are genuinely in production on a
single-node use-case with transport bound to an external use-case, they
can set the system property "es.enable.bootstrap.checks" to force
running the bootstrap checks. It would be a mistake for them not to do
this.
Relates #23598
This change adds a setting property that sets the value of a setting as final.
Updating a final setting is prohibited in any context, for instance an index setting
marked as final must be set at index creation and will refuse any update even if the index is closed.
This change also marks the setting `index.number_of_shards` as Final and the special casing for refusing the updates on this setting has been removed.
This commit adds a single node discovery type. With this discovery type,
a node will elect itself as master and never form a cluster with another
node.
Relates #23595
When terminating an executor service or a thread pool, we first
shutdown. Then, we do a timed await termination. If the await
termination fails because there are still tasks running, we then
shutdown now. However, this method does not wait for actively executing
tasks to terminate, so we should again wait for termination of these
tasks before returning. This commit does that.
Relates #23889
If a test touches ElasticsearchExceptionHandle before the class
initialzer for ElasticsearchException has run, a circular class
initialization problem can arise. Namely, the class initializer for
ElasticsearchExceptionHandle depends on the class initializer for
ElasticsearchExceptionHandle which depends on the class initializer for
all the classes that extend ElasticsearchException, but these classes
can not be loaded because ElasticsearchException has not finished its
class initializer. There are tests that can trigger this before
ElasticsearchException has been loaded due to an unlucky ordering of
test execution. This commit addresses this issue by making
ElasticsearchExceptionHandle private, and then exposing methods that
provide the necessary values from ElasticsearchExceptionHandle. Touching
these methods will force the class initializer for
ElasticsearchException to run first.
Currently for field sorting we always use a custom sort field and a custom comparator source.
Though for numeric fields this custom sort field could be replaced with a standard SortedNumericSortField unless
the field is nested especially since we removed the FieldData for numerics.
We can also use a SortedSetSortField for string sort based on doc_values when the field is not nested.
This change replaces IndexFieldData#comparatorSource with IndexFieldData#sortField that returns a Sorted{Set,Numeric}SortField when possible or a custom
sort field when the field sort spec is not handled by the SortedSortFields.
Today we prevent nodes from joining when indices exists that are too old.
Yet, the opposite can happen too since lucene / elasticsearch is not forward
compatible when it gets to indices we won't let nodes join the cluster once
there are indices in the clusterstate that are newer than the nodes version.
This prevents forward compatibility issues which we never test against. Yet,
this will not prevent rolling restarts or anything like this since indices
are always created with the minimum node version in the cluster such that an index
can only get the version of the higher nodes once all nodes are upgraded to this version.
TopDocs et.al. got additional parameters to incrementally reduce
top docs. In order to add incremental reduction `CollapseTopFieldDocs`
needs to have the same properties.
Remote nodes in cross-cluster search can be marked as eligible for
acting a gateway node via a remote node attribute setting. For example,
if search.remote.node.attr is set to "gateway", only nodes that have
node.attr.gateway set to "true" can be connected to for cross-cluster
search. Unfortunately, there is a bug in the handling of these
attributes due to the use of a dangerous method
Boolean#getBoolean(String) which obtains the system property with
specified name as a boolean. We are not looking at system properties
here, but node settings. This commit fixes this situation, and adds a
test. A follow-up will ban the use of Boolean#getBoolean.
Relates #23863
This commit changes the ClusterStatsNodes.NetworkTypes so that is does
not print out empty field names when no Transport or HTTP type is defined:
```
{
"network_types": {
...
"http_types": {
"": 2
}
}
}
```
is now rendered as:
```
{
"network_types": {
...
"http_types": {
}
}
}
```
SearchPhaseController is tighly coupled to AtomicArray which makes
non-dense representations of results very difficult. This commit removes
the coupling and cuts over to Collection rather than List to ensure no
order or random access lookup is implied.
This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields.
Example:
````
GET t,s,v,w/_field_caps?fields=field1,field2
````
... returns:
````
{
"fields": {
"field1": {
"string": {
"searchable": true,
"aggregatable": true
}
},
"field2": {
"keyword": {
"searchable": false,
"aggregatable": true,
"non_searchable_indices": ["t"]
"indices": ["t", "s"]
},
"long": {
"searchable": true,
"aggregatable": false,
"non_aggregatable_indices": ["v"]
"indices": ["v", "w"]
}
}
}
}
````
In this example `field1` have the same type `text` across the requested indices `t`, `s`, `v`, `w`.
Conversely `field2` is defined with two conflicting types `keyword` and `long`.
Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field.
Today we have no way to mark an execution as internal. This commit adds
a simple thread context header that allows executing code in a system context.
This allows intercepting code can make better decisions down the road when
it gets to authentication.
This commit changes the listener passed to sendMessage from a Runnable
to a ActionListener.
This change also removes IOException from the sendMessage signature.
That signature is misleading as it allows implementers to assume an
exception will be thrown in case of failure. That does not happen due
to Netty's async nature.
When executing an update request, the request timeout is not transferred
to the index/delete request executed on behalf of the update
request. This leads to update requests not timing out when they should
(e.g., if not all shards are available when the request specifies
wait_for_shards=all with a small timeout). This commit causes the
index/delete requests to honor the update request timeout.
Relates #23825
Today we have the shard target and the target request ID available in SearchPhaseResults.
Yet, the coordinating node maintains a shard index to reference the request, response tuples
internally which is also used in many other classes to reference back from fetch results to
query results etc. Today this shard index is implicitly passed via the index in AtomicArray
which causes an undesirable dependency on this interface.
This commit moves the shard index into the SearchPhaseResult and removes some dependencies
on AtomicArray. Further removals will follow in the future. The most important refactoring here
is the removal of AtomicArray.Entry which used to be created for every element in the atomic array
to maintain the shard index during result processing. This caused an unnecessary indirection, dependency
and potentially thousands of unnecessary objects in every search phase.
I think this query should not use the hashCode provided BytesRef#hashCode().
It uses StringHelper#GOOD_FAST_HASH_SEED which is initialized in a static
block to System.currentTimeMillis().
Running this query on different replicas may return inconsistent results.
Using a fixed seed should guaranty that the docs are sliced consistently
accross replicas.
Fixes#23096
This moves `updateReplicaRequest` to `createPrimaryResponse` and separates the
translog updating to be a separate function so that the function purpose is more
easily understood (and testable).
It also separates the logic for `MappingUpdatePerformer` into two functions,
`updateMappingsIfNeeded` and `verifyMappings` so they don't do too much in a
single function. This allows finer-grained error testing for when a mapping
fails to parse or be applied.
Finally, it separates parsing and version validation for
`executeIndexRequestOnReplica` into a separate
method (`prepareIndexOperationOnReplica`) and adds a test for it.
Relates to #23359
The translog already occupies 43 bytes on disk when empty. If the
translog generation threshold is below this, the flush thread can get
stuck in an infinite loop repeatedly rolling the generation. This commit
adds a lower bound on the translog generation to avoid this problem,
however we keep the lower bound small for convenience in testing.
Relates #23779
This commit adds support for the pattern keyword marker filter in
Lucene. Previously, the keyword marker filter in Elasticsearch
supported specifying a keywords set or a path to a set of keywords.
This commit exposes the regular expression pattern based keyword marker
filter also available in Lucene, so that any token matching the pattern
specified by the `keywords_pattern` setting is excluded from being
stemmed by any stemming filters.
Closes#4877
As the query of a search request defaults to match_all,
calling _delete_by_query without an explicit query may
result in deleting all data.
In order to protect users against falling into that
pitfall, this commit adds a check to require the explicit
setting of a query.
Closes#23629
This commit adds the boolean similarity scoring from Lucene to
Elasticsearch. The boolean similarity provides a means to specify that
a field should not be scored with typical full-text ranking algorithms,
but rather just whether the query terms match the document or not.
Boolean similarity scores a query term equal to its query boost only.
Boolean similarity is available as a default similarity option and thus
a field can be specified to have boolean similarity by declaring in its
mapping:
"similarity": "boolean"
Closes#6731
Change the error response when using a non UTF timezone for range queries with epoch_millis
or epoch_second formats to an illegal argument exception. The goal is to provide a better
explanation of why the query has failed. The current behavior is to respond with a parse exception.
Closes#22621
Today, when parsing mget requests, we silently ignore keys in the top
level that do not match "docs" or "ids". This commit addresses this
situation by throwing an exception if any other key occurs here, and
providing the names of valid keys.
Relates #23746
This commit fixes the serialization for plugin info. Namely, the
serialization incorrectly specified the backwards compatibility version
as strictly after version 5.4.0, whereas it should be on or after
version 5.4.0.
This commit introduces a maximum size for a translog generation and
automatically rolls the translog when a generation exceeds the threshold
into a new generation. This threshold is configurable per index and
defaults to sixty-four megabytes. We introduce this constraint as
sequence numbers will require keeping around more than the current
generation (to ensure that we can rollback to the global
checkpoint). Without keeping the size of generations under control,
having to keep old generations around could consume excessive disk
space. A follow-up will enable commits to trim previous generations
based on the global checkpoint.
Relates #23606
The OpenJDK project provides early-access builds of upcoming
releases. These early-access builds are not suitable for
production. These builds sometimes end up on systems due to aggressive
packaging (e.g., Ubuntu). This commit adds a bootstrap check to ensure
these early-access builds are not being used in production.
Relates #23743
The hit object can be very small e.g. when using "stored_fields": ["_none_"],
this adds a test that checks that we can still parse back the object.
* also check type/id null
Removed `parse(String index, String type, String id, BytesReference source)` in DocumentMapper.java and replaced all of its use in Test files with `parse(SourceToParse source)`.
`parse(String index, String type, String id, BytesReference source)` was only used in test files and never in the main code so it was removed. All of the test files that used it was then modified to use `parse(SourceToParse source)` method that existing in DocumentMapper.java
This commit fixes an issue manifested in the
SharedClusterSnapshotRestoreIT#testGetSnapshotsRequest where a delete
request on a snapshot encounters an in-progress snapshot, so it first
tries to abort the snapshot. During the aborting process, an exception
is thrown which is handled by the snapshot listener's onSnapshotFailure
method. This method retries the delete snapshot request, only to
encounter that the snapshot is missing, throwing an exception. It is
possible that the snapshot failure resulted in the snapshot never having
been written to the repository, and hence, there is nothing to delete.
This commit handles the SnapshotMissingException by logging it and
notifying the listener of the missing snapshot.
Closes#23663
This is especially useful when we rewrite the query because the result of the rewrite can be very different on different shards. See #18254 for example.
The output of the different implementations of terms aggs is always very similar. The toXContent methods for each of those classes though was duplicating almost the same code multiple times. This commit centralizes the code for rendering XContent to a single place, which can be reused from the different terms aggs implementations.
We currently use POSIX exit codes in all of our CLIs. However, posix
only suggests these exit codes are standard across tools. It does not
prescribe particular uses for codes outside of that range. This commit
adds 2 exit codes specific to plugin installation to make distinguishing
an incorrectly built plugin and a plugin already existing easier.
closes#15295
This commit catches the underlying failure when trying to list plugin
information when a plugin is incompatible with the current version of
elasticsearch. This could happen when elasticsearch is upgraded but old
plugins still exist. With this change, all plugins will be output,
instead of failing at the first out of date plugin.
closes#20691
Changes reindex and friends to wait until the entire request has
been "cleaned up" before responding. "Clean up" in this context
is clearing the scroll and (for reindex-from-remote) shutting
down the client. Failures to clean up are still only logged, not
returned to the user.
Closes#23653
After the removal of the joda time hack we used to have, we can cleanup
the codebase handling in security, jarhell and plugins to be more picky
about uniqueness. This was originally in #18959 which was never merged.
closes#18959
The SharedClusterSnapshotRestoreIT#testDataFileCorruptionDuringRestore
would fail sporadically because it tried to simulate restoring a
corrupted index. The test would wait until the restore is finished (and
marked as failed) before exiting. However, in the background, the
cluster still continues to retry allocation of the failed shards,
despite the restore operation being marked as completed, which in turn
generates cluster states to process. The end of every ESIntegTestCase
verifies that none of the nodes have any pending cluster states to
process. Hence, this check sometimes fails on this particular test.
This commit solves the issue by ensuring the index is deleted before
exiting the test.
Search took time uses an absolute clock to measure elapsed time, and
then tries to deal with the complexities of using an absolute clock for
this purpose. Instead, we should use a high-precision monotonic relative
clock that is designed exactly for measuring elapsed time. This commit
modifies the search infrastructure to use a relative clock for measuring
took time, but still provides an absolute clock for the components of
search that require a real clock (e.g., index name expression
resolution, etc.).
Relates #23662
A better toString() is added for snapshot operations in progress in the
cluster state and logging has been increased to help debug
SharedClusterSnapshotRestoreIT tests.
MapperService#parentTypes is rewrapped in an UnmodifiableSet in MapperService#internalMerge every time the cluster state is updated. After thousands of updates the collection is wrapped so deeply that calling a method on it generates a StackOverflowError.
Closes#23604
When adding filesystem stats from individual filesystems, free and
available can overflow. This commit guards against this by adjusting
these situations to Long.MAX_VALUE.
Relates #23641
Currently the task manager is tied to the transport and can only create tasks based on TransportRequests. This commit enables task manager to support tasks created by non-transport services such as the persistent tasks service.
* Add support for fragment_length in the unified highlighter
This commit introduce a new break iterator (a BoundedBreakIterator) designed for the unified highlighter
that is able to limit the size of fragments produced by generic break iterator like `sentence`.
The `unified` highlighter now supports `boundary_scanner` which can `words` or `sentence`.
The `sentence` mode will use the bounded break iterator in order to limit the size of the sentence to `fragment_length`.
When sentences bigger than `fragment_length` are produced, this mode will break the sentence at the next word boundary **after**
`fragment_length` is reached.
This commit changes the method for checking the interrupt status of a
thread that is intentionally interrupted during
AdapterActionFutureTests#testInteruption. Namely, we want to check and
clear the interrupt status before joining on the interrupting thread. If
we do not clear the status, when we lose a race where the interrupting
thread is not yet finished, an interrupted exception will be thrown when
we try to join on it. Clearing the interrupted status on the main thread
addresses this issue.
When a thread blocking on an adapter action future is interrupted, we
throw an illegal state exception. This is documented, but it is rude to
not restore the interrupt flag. This commit restores the interrupt flag
in this situation, and adds a test.
Relates #23618
In cases where the user specifies only the `text` option on the top level
suggest element (either via REST or the java api), this gets transferred to the
`text` property in the SuggestionSearchContext. CompletionSuggestionContext
currently requires prefix or regex to be specified, otherwise errors. We should
use the global `text` property as a fallback if neither prefix nor regex is provided.
Closes to #23340
This fixes an NPE in finding scaled float stats. The type of min/max
methods on the wrapped long stats returns a boxed type, but in the case
this is null, the unbox done for the FieldStats.Double ctor primitive
types will cause the NPE. These methods would have null for min/max when
the field exists, but does not actually have points values.
fixes#23487
This commit adds a system property that enables end-users to explicitly
enforce the bootstrap checks, independently of the binding of the
transport protocol. This can be useful for single-node production
systems that do not bind the transport protocol (and thus the bootstrap
checks would not be enforced).
Relates #23585
This change adds a new method that returns the underlying char[] of a SecureString and the ability
to clone the SecureString so that the original SecureString is not vulnerable to modification.
Closing the cloned SecureString will wipe the char[] that backs the clone but the original SecureString remains unaffected.
Additionally, while making a separate change I found that SecureSettings will fail when diff is called on them and there
is no fallback setting. Given the idea behind SecureSetting, I think that diff should just be a no-op and I have
implemented this here as well.
Currently GeoHashGridAggregatorTests#testWithSeveralDocs increases the expected
document count per hash for each geo point added to a document. When points
added to the same doc fall into one bucket (one hash cell) the document should
only be counted once.
Closes#23555
A previous change to the multi-search request execution to avoid stack
overflows regressed on limiting the number of concurrent search requests
from a batched multi-search request. In particular, the replacement of
the tail-recursive call with a loop could asynchronously fire off all of
the remaining search requests in the batch while max concurrent search
requests are already executing. This commit attempts to address this
issue by taking a more careful approach to the initial problem of
recurisve calls. The cause of the initial problem was due to possibility
of individual requests completing on the same thread as invoked the
search action execution. This can happen, for example, in cases when an
individual request does not resolve to any shards. To address this
problem, when an individual request completes we check if it completed
on the same thread as fired off the request. In this case, we loop and
otherwise safely recurse. Sadly, there was a unit test to check that the
maximum number of concurrent search requests was not exceeded, but that
test was broken while modifying the test to reproduce a case that led to
the possibility of stack overflow. As such, we randomize whether or not
search actions execute on the same thread as the thread that invoked the
action.
Relates #23538
When plugins are installed on a union filesystem (for example, inside a
Docker container), removing them can fail because we attempt an atomic
move which will not work if the plugin is not installed in the top
layer. This commit modifies removing a plugin to fall back to a
non-atomic move in cases when the underlying filesystem does not support
atomic moves.
Relates #23548
Today when handling a multi-search request, we asynchornously execute as
many search requests as the minimum of the number of search requests in
the multi-search request and the maximum number of concurrent
requests. When these search requests return, we poll more search
requests from a queue of search requests from the original multi-search
request. The implementation of this was recursive, and if the number of
requests in the multi-search request was large, a stack overflow could
arise due to the recursive invocation. This commit replaces this
recursive implementation with a simple iterative implementation.
Relates #23527
This commit upgrades to the newest version of randomized runner. There
is a new additional check that allows ensuring the working directory
for each child jvm is empty. By default, this check will fail the test
run. However, for elasticsearch, we default to wipe the directory. For
example, if you previously told the runner to not wipe the directory, in
order to investigate a failure, the wipe option will delete this data
upon re-running the test.
When parsing the control groups to which the Elasticsearch process
belongs, we extract a map from subsystems to paths by parsing
/proc/self/cgroup. This file contains colon-delimited entries of the
form hierarchy-ID:subsystem-list:cgroup-path. For control group version
1 hierarchies, the subsystem-list is a comma-delimited list of the
subsystems for that hierarchy. For control group version 2 hierarchies
(which can only exist on Linux kernels since version 4.5), the
subsystem-list is an empty string. The previous parsing of
/proc/self/cgroup incorrectly accounted for this possibility (a +
instead of a * in a regular expression). This commit addresses this
issue, adds a test case that covers this possibility, and simplifies the
code that parses /proc/self/cgroup.
Relates #23493
Previously, the RestController would stash the context prior to copying headers. However, there could be deprecation
log messages logged and in turn warning headers being added to the context prior to the stashing of the context. These
headers in the context would then be removed from the request and also leaked back into the calling thread's context.
This change moves the stashing of the context to the HttpTransport so that the network threads' context isn't
accidentally populated with warning headers and to ensure the headers added early on in the RestController are not
excluded from the response.
This commit adds the size of the cluster state to the response for the
get cluster state API call (GET /_cluster/state). The size that is
returned is the size of the full cluster state in bytes when compressed.
This is the same size of the full cluster state when serialized to
transmit over the network. Specifying the ?human flag displays the
compressed size in a more human friendly manner. Note that even if the
cluster state request filters items from the cluster state (so a subset
of the cluster state is returned), the size that is returned is the
compressed size of the entire cluster state.
Closes#3415
Currently "foo:*" is parsed as prefix query on the field `foo` unless the field is defined in `default_field` or `fields`.
This commit fixes this behavior, "foo:*" is now rewritten to an exists query on the field name.
This change also removes the assumption that "_all:*" should return all docs.
relates #23356
Today the status is lost when parsing back a BulkItemResponse.Failure. This commit changes the BulkItemResponse.Failure parsing method so that it correctly instantiates a failure with the parsed status instead of realying on the parsed ElasticsearchException (that always return an internal server error status).
Adds a common base class for testing subclasses of
`InternalSingleBucketAggregation`. They are so similar they
call into question the utility of having all of these classes.
We maybe could just use `InternalSingleBucketAggregation` in
all those cases.... But for now, let's test the classes!
Relates to #22278
The code was testing `PointRangeQuery` however we now use the
`IndexOrDocValuesQuery` in field mappers. This makes the test generate queries
through mappers so that we test the actual queries that would be generated.
Two tests were periodically failing. What both tests are doing is starting a relocation of a shard from one node to another. Once the
recovery process is started, the test blocks cluster state processing on the relocation target using the BlockClusterStateProcessing disruption. The test then indefinitely
waits for the relocation to complete. The stack dump shows that the relocation is stuck in the PeerRecoveryTargetService.waitForClusterState method, waiting for the relocation target node to have at least the same cluster state
version as the relocation source.
The reason why it gets stuck is the following race:
1) The test code executes a reroute command that relocates a shard from one node to another
2) Relocation target node starts applying the clusterstate with relocation info, starting the recovery process.
4) Recovery is super fast and quickly goes to the waitForClusterState method, which wants to ensure that the cluster state that is
applied on the relocation target is at least as new as the one on the relocation source. The relocation source has already applied the
cluster state but the relocation target is still in the process of applying it. The waitForClusterState method thus uses a
ClusterObserver to wait for the next cluster state. Internally this means submitting a task with priority HIGH to the cluster service.
5) Cluster state application is about to finish on the relocation target. As one of the last steps, it acks to the master which makes the
reroute command return successfully.
6) The test code then blocks cluster state processing on the relocation target by submitting a cluster state update task (with priority
IMMEDIATE) that blocks execution.
If the task that is submitted in step 6 is handled before the one in step 4 by ClusterService's thread executor, cluster state
processing becomes blocked and prevents the cluster state observer from observing the applied cluster state.
This refactors the `TransportShardBulkAction` to split it appart and make it
unit-testable, and then it also adds unit tests that use these methods.
In particular, this makes `executeBulkItemRequest` shorter and more readable
This commit fixes the date format in warning headers. There is some
confusion around whether or not RFC 1123 requires two-digit
days. However, the warning header specification very clearly relies on a
format that requires two-digit days. This commit removes the usage of
RFC 1123 date/time format from Java 8, which allows for one-digit days,
in favor of a format that forces two-digit days (it's otherwise
identical to RFC 1123 format, it is just fixed width).
Relates #23418
We have many version constants in master that have already been
released, but are still marked (by naming convention) as unreleased.
This commit renames those version constants.
Previously, cluster.routing.allocation.same_shard.host was not a dynamic
setting and could not be updated after startup. This commit changes the
behavior to allow the setting to be dynamically updatable. The
documentation already states that the setting is dynamic so no
documentation changes are required.
Closes#22992
When multiple reduce phases were needed the per term error got lost in subsequent reduces in some situations:
When a previous reduce phase had calculated a non-zero error for a particular bucket we were not accounting for this error in subsequent reduce phases and instead were relying on the overall error for the agg which meant we were implicitly assuming that all shards that made up that aggregation had returned the term. This is plainly not true so we need to make sure the per term error for the aggregation is used when calcualting the error for that term in the new reduced aggregation.
Change Setting#get(Settings, Settings) to fallback only if the setting is present in the secondary.
This is needed to fix setting that relies on other settings.
Replace IT with uts.
The warning header used by Elasticsearch for delivering deprecation
warnings has a specific format (RFC 7234, section 5.5). The format
specifies that the warning header should be of the form
warn-code warn-agent warn-text [warn-date]
Here, the warn-code is a three-digit code which communicates various
meanings. The warn-agent is a string used to identify the source of the
warning (either a host:port combination, or some other identifier). The
warn-text is quoted string which conveys the semantic meaning of the
warning. The warn-date is an optional quoted date that can be in a few
different formats.
This commit corrects the warning header within Elasticsearch to follow
this specification. We use the warn-code 299 which means a
"miscellaneous persistent warning." For the warn-agent, we use the
version of Elasticsearch that produced the warning. The warn-text is
unchanged from what we deliver today, but is wrapped in quotes as
specified (this is important as a problem that exists today is that
multiple warnings can not be split by comma to obtain the individual
warnings as the warnings might themselves contain commas). For the
warn-date, we use the RFC 1123 format.
Relates #23275
We recently added parsing code to parse suggesters responses into java api objects. This was done using a switch based on the type of the returned suggestion. We can now replace the switch with using NamedXContentRegistry, which will also be used for aggs parsing.
Previously when multiple reduces occured for the terms aggregation we would add up the errors for the aggregations but would not take into account the errors that had already been calculated for the previous reduce phases.
This change corrects that by adding the previously created errors to the new error value.
Closes#23286
This test makes little sense when sent from the REST layer, as WrapperQueryBuilder is supposed to be used from the Java api. Also, providing the inner query as base64 string will work only for string formats and break for binary formats like SMILE and CBOR, whcih doesn't play well with randomizing content type in our REST tests
Previously this code was duplicated across the 3 different topdocs variants
we have. It also had no real unittest (where we tested with holes in the results)
which caused a sneaky bug where the comparison used `result.size()` vs `results.size()`
causing several NPEs downstream. This change adds a static method to fill the top docs
that is shared across all variants and adds a unittest that would have caught the issue
very quickly.
Closes#19356Closes#23357
Console.readText may return null in certain cases. This commit fixes a
bug in Terminal.promptYesNo which assumed a non-null return value. It
also adds a test for this, and modifies mock terminal to be able to
handle null input values.
The IndexShardOperationsLock has a mechanism to delay operations if there is currently a block on the lock. These
delayed operations are executed when the block is released and are executed by a different thread. When the different
thread executes the operations, the ThreadContext is that of the thread that was blocking operations. In order to
preserve the ThreadContext, we need to store it and wrap the listener when the operation is delayed.
This change exposes the new Lucene graph based word delimiter token filter in the analysis filters.
Unlike the `word_delimiter` this token filter named `word_delimiter_graph` correctly handles multi terms expansion at query time.
Closes#23104
This commit adds a boundary_scanner property to the search highlight
request so the user can specify different boundary scanners:
* `chars` (default, current behavior)
* `word` Use a WordBreakIterator
* `sentence` Use a SentenceBreakIterator
This commit also adds "boundary_scanner_locale" to define which locale
should be used when scanning the text.
There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitely.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).
Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.
For non-read-only repositories, this could produce race conditions with
the file system. In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation. On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.
This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value. If that values because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read. This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.
In oder to use lucene's utilities to merge top docs the results
need to be passed in a dense array where the index corresponds to the shard index in
the result list. Yet, we were sorting results before merging them just to order them
in the incoming order again for the above mentioned reason. This change removes the
obsolet sort and prevents unnecessary materializing of results.
In update scripts, `ctx._now` uses the same milliseconds value used by the
rest of the system to calculate deltas. However, that time is not
actually epoch milliseconds, as it is derived from `System.nanoTime()`.
This change reworks the estimated time thread in ThreadPool which this
time is based on to make available both the relative time, as well as
absolute milliseconds (epoch) which may be used with calendar system. It
also renames the EstimatedTimeThread to a more apt CachedTimeThread.
closes#23169
From #23093, we fixed the issue where a filesystem can be so large that it
overflows and returns a negative number. However, there is another issue when
adding a path as a sub-path to another `FsInfo.Path` object, when adding the
totals the values can still overflow.
This adds the same safety to return `Long.MAX_VALUE` instead of the negative
number, as well as a test exercising the logic.
When a node wants to join a cluster, it sends a join request to the master. The master then sends a join validation request to the node. This checks that the node can deserialize the current cluster state that exists on the master and that it can thus handle all the indices that are currently in the cluster (see #21830).
The current code can trip an assertion as it does not take the cluster state as is but sets itself as the local node on the cluster state. This can result in an inconsistent DiscoveryNodes object as the local node is not yet part of the cluster state and a node with same id but different address can still exist in the cluster state. Also another node with the same address but different id can exist in the cluster state if multiple nodes are run on the same machine and ports have been swapped after node crashes/restarts.
Also expand testing on the different ways to provide index settings and remove dead code around ability to provide settings as query string parameters
Closes#23242
Elasticsearch accepts multiple content-type formats, hence scripts can be stored/provided in json, yaml, cbor or smile. Yet the format that should be used internally is json. This is a problem mainly around search templates, as they only support json out of the four content-types, so instead of maintaining the content-type of the request we should rather convert the scripts/templates to json.
Binary formats were not previously supported. If you stored a template in yaml format, you'd get back an error "No encoder found for MIME type [application/yaml]" when trying to execute it. With this commit the request content-type is independent from the template, which always gets converted to json internally. That is transparent to users and doesn't affect the content type of the response obtained when executing the template.
Both PRs below have been backported to 5.4 such that we can enable
BWC tests of this feature as well as remove version dependend serialization
for search request / responses.
Relates to #23288
Relates to #23253
* Make document write requests immutable
Previously, write requests were mutated at the
transport level to update request version, version type
and sequence no before replication.
Now that all write requests go through the shard bulk
transport action, we can use the primary response stored
in item level bulk requests to pass the updated version,
seqence no. to replicas.
* incorporate feedback
* minor cleanup
* Add bwc test to ensure correct index version propagates to replica
* Fix bwc for propagating write operation versions
* Add assertion on replica request version type
* fix tests using internal version type for replica op
* Fix assertions to assert version type in replica and recovery
* add bwc tests for version checks in concurrent indexing
* incorporate feedback
The assertion that if there are buffered aggs at least one incremental
reduce phase should have happened doens't hold if there are shard failure.
This commit removes this assertion.
Relates to #23288
In #23253 we added an the ability to incrementally reduce search results.
This change exposes the parameter to control the batch since and therefore
the memory consumption of a large search request.
InternalTopHits uses "==" to compare hit scores and fails when score is NaN.
This commit changes the comparaison to always use Double.compare.
Relates #23253
We can and should randomly reduce down to a single result before
we passing the aggs to the final reduce. This commit changes the logic
to do that and ensures we don't trip the assertions the previous imple tripped.
Relates to #23253
Today all query results are buffered up until we received responses of
all shards. This can hold on to a significant amount of memory if the number of
shards is large. This commit adds a first step towards incrementally reducing
aggregations results if a, per search request, configurable amount of responses
are received. If enough query results have been received and buffered all so-far
received aggregation responses will be reduced and released to be GCed.
Today, the relationship between Lucene and the translog is rather
simple: every document not in Lucene is guaranteed to be in the
translog. We need a stronger guarantee from the translog though, namely
that it can replay all operations after a certain sequence number. For
this to be possible, the translog has to made sequence-number aware. As
a first step, we introduce the min and max sequence numbers into the
translog so that each generation knows the possible range of operations
contained in the generation. This will enable future work to keep around
all generations containing operations after a certain sequence number
(e.g., the global checkpoint).
Relates #22822
A follow up to #23202, this adds parsing from xContent and tests to the four Suggestion implementations
and the top level suggest element to be used later when parsing the entire SearchResponse.
This commit cleans up some parsing tests added from the High Level Rest Client: IndexResponseTests, DeleteResponseTests, UpdateResponseTests, BulkItemResponseTests.
These tests are now more uniform with the others test-from-to-XContent tests we have, they now shuffle the XContent fields before parsing, the asserting method for parsed objects does not used a Map<String, Object> anymore, and buggy equals/hasCode methods in ShardInfo and ShardInfo.Failure have been removed.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
The file /proc/self/cgroup lists the control groups to which the process
belongs. This file is a colon separated list of three fields:
1. a hierarchy ID number
2. a comma-separated list of hierarchies
3. the pathname of the control group in the hierarchy
The regex pattern for this contains a bug for the second field. It
allows one or two entries in the comma-separated list, but not
more. This commit fixes the pattern to allow one or more entires in the
comma-separated list.
Relates #23219
Get HEAD requests incorrectly return a content-length header of 0. This
commit addresses this by removing the special handling for get HEAD
requests, and just relying on the general mechanism that exists for
handling HEAD requests in the REST layer.
Relates #23186
This commit adds a parsing method to the BulkItemResponse class. In order to do that, the way DocWriteResponses are parsed has to be changed: ConstructingObjectParser/ObjectParser is removed in favor of a simpler and more readable way to parse these objects.
DocWriteResponse now provides the parseInnerToXContent() method that can be used by subclasses (IndexResponse, UpdateReponse and DeleteResponse) to parse the current token/field and potentially update a DocWriteResponseBuilder. The DocWriteResponseBuilder is a simple POJO used
to contain parsed values. It can be passed around from one parsing method to another parsing method. For example, this is what is done in IndexResponse: a IndexResponseBuilder is created in IndexResponse.fromXContent(), it get passed to IndexResponse.parseXContentFields() that
parses fields specific to IndexResponse (like "created") and updates the context, delegating to DocWriteResponse.parseInnerToXContent() the parsing of any other field. Once all XContent is parsed, IndexResponse.fromXContent() uses the method
IndexResponseBuilder.build() to create the new instance of IndexResponse.
This behavior allow to reuse parsing code among the class hierarchy while keeping the current behavior. It also allows other objects like BulkItemResponse to reuse the same parsing code to parse DocWriteResponses.
Finally, IndexResponseTests, UpdateResponseTests and DeleteResponseTests have been updated to introduce some random shuffling of fields before the XContent is parsed in order to ensure that the parsing code does not rely on field order.