A previous change to the multi-search request execution to avoid stack
overflows regressed on limiting the number of concurrent search requests
from a batched multi-search request. In particular, the replacement of
the tail-recursive call with a loop could asynchronously fire off all of
the remaining search requests in the batch while max concurrent search
requests are already executing. This commit attempts to address this
issue by taking a more careful approach to the initial problem of
recurisve calls. The cause of the initial problem was due to possibility
of individual requests completing on the same thread as invoked the
search action execution. This can happen, for example, in cases when an
individual request does not resolve to any shards. To address this
problem, when an individual request completes we check if it completed
on the same thread as fired off the request. In this case, we loop and
otherwise safely recurse. Sadly, there was a unit test to check that the
maximum number of concurrent search requests was not exceeded, but that
test was broken while modifying the test to reproduce a case that led to
the possibility of stack overflow. As such, we randomize whether or not
search actions execute on the same thread as the thread that invoked the
action.
Relates #23538
When plugins are installed on a union filesystem (for example, inside a
Docker container), removing them can fail because we attempt an atomic
move which will not work if the plugin is not installed in the top
layer. This commit modifies removing a plugin to fall back to a
non-atomic move in cases when the underlying filesystem does not support
atomic moves.
Relates #23548
Today when handling a multi-search request, we asynchornously execute as
many search requests as the minimum of the number of search requests in
the multi-search request and the maximum number of concurrent
requests. When these search requests return, we poll more search
requests from a queue of search requests from the original multi-search
request. The implementation of this was recursive, and if the number of
requests in the multi-search request was large, a stack overflow could
arise due to the recursive invocation. This commit replaces this
recursive implementation with a simple iterative implementation.
Relates #23527
This commit upgrades to the newest version of randomized runner. There
is a new additional check that allows ensuring the working directory
for each child jvm is empty. By default, this check will fail the test
run. However, for elasticsearch, we default to wipe the directory. For
example, if you previously told the runner to not wipe the directory, in
order to investigate a failure, the wipe option will delete this data
upon re-running the test.
When parsing the control groups to which the Elasticsearch process
belongs, we extract a map from subsystems to paths by parsing
/proc/self/cgroup. This file contains colon-delimited entries of the
form hierarchy-ID:subsystem-list:cgroup-path. For control group version
1 hierarchies, the subsystem-list is a comma-delimited list of the
subsystems for that hierarchy. For control group version 2 hierarchies
(which can only exist on Linux kernels since version 4.5), the
subsystem-list is an empty string. The previous parsing of
/proc/self/cgroup incorrectly accounted for this possibility (a +
instead of a * in a regular expression). This commit addresses this
issue, adds a test case that covers this possibility, and simplifies the
code that parses /proc/self/cgroup.
Relates #23493
Previously, the RestController would stash the context prior to copying headers. However, there could be deprecation
log messages logged and in turn warning headers being added to the context prior to the stashing of the context. These
headers in the context would then be removed from the request and also leaked back into the calling thread's context.
This change moves the stashing of the context to the HttpTransport so that the network threads' context isn't
accidentally populated with warning headers and to ensure the headers added early on in the RestController are not
excluded from the response.
This commit adds the size of the cluster state to the response for the
get cluster state API call (GET /_cluster/state). The size that is
returned is the size of the full cluster state in bytes when compressed.
This is the same size of the full cluster state when serialized to
transmit over the network. Specifying the ?human flag displays the
compressed size in a more human friendly manner. Note that even if the
cluster state request filters items from the cluster state (so a subset
of the cluster state is returned), the size that is returned is the
compressed size of the entire cluster state.
Closes#3415
Currently "foo:*" is parsed as prefix query on the field `foo` unless the field is defined in `default_field` or `fields`.
This commit fixes this behavior, "foo:*" is now rewritten to an exists query on the field name.
This change also removes the assumption that "_all:*" should return all docs.
relates #23356
Today the status is lost when parsing back a BulkItemResponse.Failure. This commit changes the BulkItemResponse.Failure parsing method so that it correctly instantiates a failure with the parsed status instead of realying on the parsed ElasticsearchException (that always return an internal server error status).
Adds a common base class for testing subclasses of
`InternalSingleBucketAggregation`. They are so similar they
call into question the utility of having all of these classes.
We maybe could just use `InternalSingleBucketAggregation` in
all those cases.... But for now, let's test the classes!
Relates to #22278
The code was testing `PointRangeQuery` however we now use the
`IndexOrDocValuesQuery` in field mappers. This makes the test generate queries
through mappers so that we test the actual queries that would be generated.
Two tests were periodically failing. What both tests are doing is starting a relocation of a shard from one node to another. Once the
recovery process is started, the test blocks cluster state processing on the relocation target using the BlockClusterStateProcessing disruption. The test then indefinitely
waits for the relocation to complete. The stack dump shows that the relocation is stuck in the PeerRecoveryTargetService.waitForClusterState method, waiting for the relocation target node to have at least the same cluster state
version as the relocation source.
The reason why it gets stuck is the following race:
1) The test code executes a reroute command that relocates a shard from one node to another
2) Relocation target node starts applying the clusterstate with relocation info, starting the recovery process.
4) Recovery is super fast and quickly goes to the waitForClusterState method, which wants to ensure that the cluster state that is
applied on the relocation target is at least as new as the one on the relocation source. The relocation source has already applied the
cluster state but the relocation target is still in the process of applying it. The waitForClusterState method thus uses a
ClusterObserver to wait for the next cluster state. Internally this means submitting a task with priority HIGH to the cluster service.
5) Cluster state application is about to finish on the relocation target. As one of the last steps, it acks to the master which makes the
reroute command return successfully.
6) The test code then blocks cluster state processing on the relocation target by submitting a cluster state update task (with priority
IMMEDIATE) that blocks execution.
If the task that is submitted in step 6 is handled before the one in step 4 by ClusterService's thread executor, cluster state
processing becomes blocked and prevents the cluster state observer from observing the applied cluster state.
This refactors the `TransportShardBulkAction` to split it appart and make it
unit-testable, and then it also adds unit tests that use these methods.
In particular, this makes `executeBulkItemRequest` shorter and more readable
This commit fixes the date format in warning headers. There is some
confusion around whether or not RFC 1123 requires two-digit
days. However, the warning header specification very clearly relies on a
format that requires two-digit days. This commit removes the usage of
RFC 1123 date/time format from Java 8, which allows for one-digit days,
in favor of a format that forces two-digit days (it's otherwise
identical to RFC 1123 format, it is just fixed width).
Relates #23418
We have many version constants in master that have already been
released, but are still marked (by naming convention) as unreleased.
This commit renames those version constants.
Previously, cluster.routing.allocation.same_shard.host was not a dynamic
setting and could not be updated after startup. This commit changes the
behavior to allow the setting to be dynamically updatable. The
documentation already states that the setting is dynamic so no
documentation changes are required.
Closes#22992
When multiple reduce phases were needed the per term error got lost in subsequent reduces in some situations:
When a previous reduce phase had calculated a non-zero error for a particular bucket we were not accounting for this error in subsequent reduce phases and instead were relying on the overall error for the agg which meant we were implicitly assuming that all shards that made up that aggregation had returned the term. This is plainly not true so we need to make sure the per term error for the aggregation is used when calcualting the error for that term in the new reduced aggregation.
Change Setting#get(Settings, Settings) to fallback only if the setting is present in the secondary.
This is needed to fix setting that relies on other settings.
Replace IT with uts.
The warning header used by Elasticsearch for delivering deprecation
warnings has a specific format (RFC 7234, section 5.5). The format
specifies that the warning header should be of the form
warn-code warn-agent warn-text [warn-date]
Here, the warn-code is a three-digit code which communicates various
meanings. The warn-agent is a string used to identify the source of the
warning (either a host:port combination, or some other identifier). The
warn-text is quoted string which conveys the semantic meaning of the
warning. The warn-date is an optional quoted date that can be in a few
different formats.
This commit corrects the warning header within Elasticsearch to follow
this specification. We use the warn-code 299 which means a
"miscellaneous persistent warning." For the warn-agent, we use the
version of Elasticsearch that produced the warning. The warn-text is
unchanged from what we deliver today, but is wrapped in quotes as
specified (this is important as a problem that exists today is that
multiple warnings can not be split by comma to obtain the individual
warnings as the warnings might themselves contain commas). For the
warn-date, we use the RFC 1123 format.
Relates #23275
We recently added parsing code to parse suggesters responses into java api objects. This was done using a switch based on the type of the returned suggestion. We can now replace the switch with using NamedXContentRegistry, which will also be used for aggs parsing.
Previously when multiple reduces occured for the terms aggregation we would add up the errors for the aggregations but would not take into account the errors that had already been calculated for the previous reduce phases.
This change corrects that by adding the previously created errors to the new error value.
Closes#23286
This test makes little sense when sent from the REST layer, as WrapperQueryBuilder is supposed to be used from the Java api. Also, providing the inner query as base64 string will work only for string formats and break for binary formats like SMILE and CBOR, whcih doesn't play well with randomizing content type in our REST tests
Previously this code was duplicated across the 3 different topdocs variants
we have. It also had no real unittest (where we tested with holes in the results)
which caused a sneaky bug where the comparison used `result.size()` vs `results.size()`
causing several NPEs downstream. This change adds a static method to fill the top docs
that is shared across all variants and adds a unittest that would have caught the issue
very quickly.
Closes#19356Closes#23357
Console.readText may return null in certain cases. This commit fixes a
bug in Terminal.promptYesNo which assumed a non-null return value. It
also adds a test for this, and modifies mock terminal to be able to
handle null input values.
The IndexShardOperationsLock has a mechanism to delay operations if there is currently a block on the lock. These
delayed operations are executed when the block is released and are executed by a different thread. When the different
thread executes the operations, the ThreadContext is that of the thread that was blocking operations. In order to
preserve the ThreadContext, we need to store it and wrap the listener when the operation is delayed.
This change exposes the new Lucene graph based word delimiter token filter in the analysis filters.
Unlike the `word_delimiter` this token filter named `word_delimiter_graph` correctly handles multi terms expansion at query time.
Closes#23104
This commit adds a boundary_scanner property to the search highlight
request so the user can specify different boundary scanners:
* `chars` (default, current behavior)
* `word` Use a WordBreakIterator
* `sentence` Use a SentenceBreakIterator
This commit also adds "boundary_scanner_locale" to define which locale
should be used when scanning the text.
There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitely.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).
Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.
For non-read-only repositories, this could produce race conditions with
the file system. In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation. On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.
This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value. If that values because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read. This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.
In oder to use lucene's utilities to merge top docs the results
need to be passed in a dense array where the index corresponds to the shard index in
the result list. Yet, we were sorting results before merging them just to order them
in the incoming order again for the above mentioned reason. This change removes the
obsolet sort and prevents unnecessary materializing of results.
In update scripts, `ctx._now` uses the same milliseconds value used by the
rest of the system to calculate deltas. However, that time is not
actually epoch milliseconds, as it is derived from `System.nanoTime()`.
This change reworks the estimated time thread in ThreadPool which this
time is based on to make available both the relative time, as well as
absolute milliseconds (epoch) which may be used with calendar system. It
also renames the EstimatedTimeThread to a more apt CachedTimeThread.
closes#23169
From #23093, we fixed the issue where a filesystem can be so large that it
overflows and returns a negative number. However, there is another issue when
adding a path as a sub-path to another `FsInfo.Path` object, when adding the
totals the values can still overflow.
This adds the same safety to return `Long.MAX_VALUE` instead of the negative
number, as well as a test exercising the logic.
When a node wants to join a cluster, it sends a join request to the master. The master then sends a join validation request to the node. This checks that the node can deserialize the current cluster state that exists on the master and that it can thus handle all the indices that are currently in the cluster (see #21830).
The current code can trip an assertion as it does not take the cluster state as is but sets itself as the local node on the cluster state. This can result in an inconsistent DiscoveryNodes object as the local node is not yet part of the cluster state and a node with same id but different address can still exist in the cluster state. Also another node with the same address but different id can exist in the cluster state if multiple nodes are run on the same machine and ports have been swapped after node crashes/restarts.
Also expand testing on the different ways to provide index settings and remove dead code around ability to provide settings as query string parameters
Closes#23242
Elasticsearch accepts multiple content-type formats, hence scripts can be stored/provided in json, yaml, cbor or smile. Yet the format that should be used internally is json. This is a problem mainly around search templates, as they only support json out of the four content-types, so instead of maintaining the content-type of the request we should rather convert the scripts/templates to json.
Binary formats were not previously supported. If you stored a template in yaml format, you'd get back an error "No encoder found for MIME type [application/yaml]" when trying to execute it. With this commit the request content-type is independent from the template, which always gets converted to json internally. That is transparent to users and doesn't affect the content type of the response obtained when executing the template.
Both PRs below have been backported to 5.4 such that we can enable
BWC tests of this feature as well as remove version dependend serialization
for search request / responses.
Relates to #23288
Relates to #23253
* Make document write requests immutable
Previously, write requests were mutated at the
transport level to update request version, version type
and sequence no before replication.
Now that all write requests go through the shard bulk
transport action, we can use the primary response stored
in item level bulk requests to pass the updated version,
seqence no. to replicas.
* incorporate feedback
* minor cleanup
* Add bwc test to ensure correct index version propagates to replica
* Fix bwc for propagating write operation versions
* Add assertion on replica request version type
* fix tests using internal version type for replica op
* Fix assertions to assert version type in replica and recovery
* add bwc tests for version checks in concurrent indexing
* incorporate feedback
The assertion that if there are buffered aggs at least one incremental
reduce phase should have happened doens't hold if there are shard failure.
This commit removes this assertion.
Relates to #23288
In #23253 we added an the ability to incrementally reduce search results.
This change exposes the parameter to control the batch since and therefore
the memory consumption of a large search request.
InternalTopHits uses "==" to compare hit scores and fails when score is NaN.
This commit changes the comparaison to always use Double.compare.
Relates #23253
We can and should randomly reduce down to a single result before
we passing the aggs to the final reduce. This commit changes the logic
to do that and ensures we don't trip the assertions the previous imple tripped.
Relates to #23253
Today all query results are buffered up until we received responses of
all shards. This can hold on to a significant amount of memory if the number of
shards is large. This commit adds a first step towards incrementally reducing
aggregations results if a, per search request, configurable amount of responses
are received. If enough query results have been received and buffered all so-far
received aggregation responses will be reduced and released to be GCed.
Today, the relationship between Lucene and the translog is rather
simple: every document not in Lucene is guaranteed to be in the
translog. We need a stronger guarantee from the translog though, namely
that it can replay all operations after a certain sequence number. For
this to be possible, the translog has to made sequence-number aware. As
a first step, we introduce the min and max sequence numbers into the
translog so that each generation knows the possible range of operations
contained in the generation. This will enable future work to keep around
all generations containing operations after a certain sequence number
(e.g., the global checkpoint).
Relates #22822
A follow up to #23202, this adds parsing from xContent and tests to the four Suggestion implementations
and the top level suggest element to be used later when parsing the entire SearchResponse.
This commit cleans up some parsing tests added from the High Level Rest Client: IndexResponseTests, DeleteResponseTests, UpdateResponseTests, BulkItemResponseTests.
These tests are now more uniform with the others test-from-to-XContent tests we have, they now shuffle the XContent fields before parsing, the asserting method for parsed objects does not used a Map<String, Object> anymore, and buggy equals/hasCode methods in ShardInfo and ShardInfo.Failure have been removed.
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.
While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.
Relates #19388
The file /proc/self/cgroup lists the control groups to which the process
belongs. This file is a colon separated list of three fields:
1. a hierarchy ID number
2. a comma-separated list of hierarchies
3. the pathname of the control group in the hierarchy
The regex pattern for this contains a bug for the second field. It
allows one or two entries in the comma-separated list, but not
more. This commit fixes the pattern to allow one or more entires in the
comma-separated list.
Relates #23219
Get HEAD requests incorrectly return a content-length header of 0. This
commit addresses this by removing the special handling for get HEAD
requests, and just relying on the general mechanism that exists for
handling HEAD requests in the REST layer.
Relates #23186
This commit adds a parsing method to the BulkItemResponse class. In order to do that, the way DocWriteResponses are parsed has to be changed: ConstructingObjectParser/ObjectParser is removed in favor of a simpler and more readable way to parse these objects.
DocWriteResponse now provides the parseInnerToXContent() method that can be used by subclasses (IndexResponse, UpdateReponse and DeleteResponse) to parse the current token/field and potentially update a DocWriteResponseBuilder. The DocWriteResponseBuilder is a simple POJO used
to contain parsed values. It can be passed around from one parsing method to another parsing method. For example, this is what is done in IndexResponse: a IndexResponseBuilder is created in IndexResponse.fromXContent(), it get passed to IndexResponse.parseXContentFields() that
parses fields specific to IndexResponse (like "created") and updates the context, delegating to DocWriteResponse.parseInnerToXContent() the parsing of any other field. Once all XContent is parsed, IndexResponse.fromXContent() uses the method
IndexResponseBuilder.build() to create the new instance of IndexResponse.
This behavior allow to reuse parsing code among the class hierarchy while keeping the current behavior. It also allows other objects like BulkItemResponse to reuse the same parsing code to parse DocWriteResponses.
Finally, IndexResponseTests, UpdateResponseTests and DeleteResponseTests have been updated to introduce some random shuffling of fields before the XContent is parsed in order to ensure that the parsing code does not rely on field order.
This adds parsing from xContent to the CompletionSuggestion.Entry.Option.
The completion suggestion option also inlines the xContent rendering of the
containes SearchHit, so in order to reuse the SearchHit parser this also changes
the way SearchHit is parsed from using a loop-based parser to using a
ConstructingObjectParser that creates an intermediate map representation and
then later uses this output to create either a single SearchHit or use it with
additional fields defined in the parser for the completion suggestion option.
This change modifies the deprecation log message emitted when a setting
is found which is deprecated. The new message indicates docs for the
deprecated settings can be found in the breaking changes docs for the
next major version.
closes#22849
This commit is just a code cleanup of RestGetIndicesAction.java. For
example, we remove an unnecessary class, remove some unnecessary local
variables, and simplify some code flow.
Relates #23129
Get source HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for get
source HEAD requests, and just relying on the general mechanism that
exists for handling HEAD requests in the REST layer.
Relates #23151
Now that we have more flexible search phases we should move the rather
hacky integration of the collapse feature as a real search phase that can
be tested and used by itself. This commit adds a new ExpandSearchPhase
including a unittest for the phase. It's integrated into the fetch phase
as an optional successor.
When nested objects are present in the mappings, many queries get deoptimized
due to the need to exclude documents that are not in the right space. For
instance, a filter is applied to all queries that prevents them from matching
non-root documents (`+*:* -_type:__*`). Moreover, a filter is applied to all
child queries of `nested` queries in order to make sure that the child query
only matches child documents (`_type:__nested_path`), which is required by
`ToParentBlockJoinQuery` (the Lucene query behing Elasticsearch's `nested`
queries).
These additional filters slow down `nested` queries. In 1.7-, the cost was
somehow amortized by the fact that we cached filters very aggressively. However,
this has proven to be a significant source of slow downs since 2.0 for users
of `nested` mappings and queries, see #20797.
This change makes the filtering a bit smarter. For instance if the query is a
`match_all` query, then we need to exclude nested docs. However, if the query
is `foo: bar` then it may only match root documents since `foo` is a top-level
field, so no additional filtering is required.
Another improvement is to use a `FILTER` clause on all types rather than a
`MUST_NOT` clause on all nested paths when possible since `FILTER` clauses
are more efficient.
Here are some examples of queries and how they get rewritten:
```
"match_all": {}
```
This query gets rewritten to `ConstantScore(+*:* -_type:__*)` on master and
`ConstantScore(_type:AutomatonQuery {\norg.apache.lucene.util.automaton.Automaton@4371da44})`
with this change. The automaton is the complement of `_type:__*` so it matches
the same documents, but is faster since it is now a positive clause. Simplistic
performance testing on a 10M index where each root document has 5 nested
documents on average gave a latency of 420ms on master and 90ms with this change
applied.
```
"term": {
"foo": {
"value": "0"
}
}
```
This query is rewritten to `+foo:0 #(ConstantScore(+*:* -_type:__*))^0.0` on
master and `foo:0` with this change: we do not need to filter nested docs out
since the query cannot match nested docs. While doing performance testing in
the same conditions as above, response times went from 250ms to 50ms.
```
"nested": {
"path": "nested",
"query": {
"term": {
"nested.foo": {
"value": "0"
}
}
}
}
```
This query is rewritten to
`+ToParentBlockJoinQuery (+nested.foo:0 #_type:__nested) #(ConstantScore(+*:* -_type:__*))^0.0`
on master and `ToParentBlockJoinQuery (nested.foo:0)` with this change. The
top-level filter (`-_type:__*`) could be removed since `nested` queries only
match documents of the parent space, as well as the child filter
(`#_type:__nested`) since the child query may only match nested docs since the
`nested` object has both `include_in_parent` and `include_in_root` set to
`false`. While doing performance testing in the same conditions as above,
response times went from 850ms to 270ms.
This gives Lucene the choice to use index/point-based queries or
doc-values-based queries depending on which one is more efficient. This commit
integrates this feature for:
- long/integer/short/byte/double/float/half_float/scaled_float ranges,
- date ranges,
- geo bounding box queries,
- geo distance queries.
Today all search phases are inner classes of AbstractSearchAsyncAction or one of it's
subclasses. This makes unit testing of these classes practically impossible. This commit
Extracts `DfsQueryPhase` and `FetchSearchPhase` or of the code that composes the actual
query execution types and moves most of the fan-out and collect code into an `InitialSearchPhase`
class that can be used to build initial search phases (phases that retry on shards). This will
make modification to these classes simpler and allows to easily compose or add new search phases
down the road if additional roundtrips are required.
When Netty decodes a bad HTTP request, it marks the decoder result on
the HTTP request as a failure, and reroutes the request to GET
/bad-request. This either leads to puzzling responses when a bad request
is sent to Elasticsearch (if an index named "bad-request" does not exist
then it produces an index not found exception and otherwise responds
with the index settings for the index named "bad-request"). This commit
addresses this by inspecting the decoder result on the HTTP request and
dispatching the request to a bad request handler preserving the initial
cause of the bad request and providing an error message to the client.
Relates #23153
This commit adds a new method to the TransportChannel that provides access to the version of the
remote node that the response is being sent on and that the request came from. This is helpful
for serialization of data attached as headers.
* Fix total disk bytes returning negative value
This adds a workaround for JDK-8162520 -
https://bugs.openjdk.java.net/browse/JDK-8162520
Some filesystems can be so large that they return a negative value for their
free/used/available disk bytes due to being larger than `Long.MAX_VALUE`.
This adds protection for our `FsProbe` implementation and adds a test that it
does the right thing.
Today when trying to encode the location header to ASCII, we rely on the
Java URI API. This API requires a proper URI which blows up whenever the
URI contains, for example, a space (which can happen if the type, ID, or
routing contain a space). This commit addresses this issue by properly
encoding the URI. Additionally, we remove the need to create a URI
simplifying the code flow.
Relates #23133
This commit changes the RefreshPolicy enum so that string representation are exposed. This will help the high level rest client to simply use refreshPolicy.getValue() to get the corresponding parameter value of a given refresh policy.
#21817 introduced the notion of a cluster state applier and banned those for sampling the cluster state directly (as it is not applied yet). Testing has exposed one exceptional use case - if the appliers want to spawn off a follow up it may require waiting for specific new cluster state (for example, the shard started action, called by the IndicesClusterStateService, may run into trouble connecting to the master and wait for a new master to be elected). This requires creating an observer which, in turn, samples the cluster state.
An example failure can be seen at https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/1701/console
This commit allows creating an observer from a cluster state applier. The observer is adapted to exclude any potential old cluster state in its logic.
Template HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for
template HEAD requests, and just relying on the general mechanism that
exists for handling HEAD requests in the REST layer.
Relates #23130
GraphQueries are now generated as simple clauses in BooleanQuery. So for instance a multi terms synonym will generate
a GraphQuery but only for the side paths, the other part of the query will not be impacted. This means that we cannot apply
`minimum_should_match` or `cutoff_frequency` on GraphQuery anymore (only ES 5.3 does that because we generate all possible paths if a query has at least one multi terms synonym).
Starting in 5.4 multi terms synonym will now be treated as a single term when `minimum_should_match` is computed and will be ignored when `cutoff_frequency` is set.
Fixes#23102
Limits the length of `IndexRequest#toString` which also limits the size of the task description generated for `IndexRequest`s. If the document being written is larger than 2kb we skip logging the _source entirely. This is because truncating the source is tricky and it isn't worth it.
Index HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for index
HEAD requests, and just relying on the general mechanism that exists for
handling HEAD requests in the REST layer.
Relates #23112
Alias HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for alias
HEAD requests, and just relying on the general mechanism that exists for
handling HEAD requests in the REST layer.
Relates #23094
This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The
methods that do not accept XContentType with bytes have been deprecated by this commit.
Relates #22691
This commit is just a code cleanup of RestGetAliasesAction.java. For
example, we remove an unnecessary class, simplify a convenience method,
and simplify some code flow.
Relates #23095
This pull request reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.
EvillPeerRecoveryIT checks scenario where recovery is happening while there are on going indexing operation that already have been assigned a seq# . This is fairly hard to achieve and the test goes through a couple of hoops via the plugin infra to achieve that. This PR extends the unit tests infra to allow for those hoops to happen in unit tests. This allows the test to be moved to RecoveryDuringReplicationTests
Relates to #22484
This changes removes the SearchResponseListener that was used by the ExpandCollapseSearchResponseListener to expand collapsed hits.
The removal of SearchResponseListener is not a breaking change because it was never released.
This change also replace the blocking call in ExpandCollapseSearchResponseListener by a single asynchronous multi search request. The parallelism of the expand request can be set via CollapseBuilder#max_concurrent_group_searches
Closes#23048
Now that we use bulk for single item indexing, this is often the case. Having an indicator of the id of the indexed document helps debugging.
It now looks like this `BulkShardRequest to [[test][0]] containing [index {[test][type][AVojzy9ZxfWASZ-ysmN7], source[{"auto":true}]}]`
Empty response bodies should only be sent for HEAD requests, otherwise we should always send back info about the exception that was thrown. Removed some manual exception handling in the REST action that should be rather bubbled up and handled by our rest action infra like every other rest action does.
This pull request adds a new parameter to the REST Search API named `typed_keys`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation.
Here is a simple example:
```
GET /_search?typed_keys
{
"aggs": {
"tweets_per_user": {
"terms": {
"field": "user"
}
}
},
"size": 0
}
```
And the response:
```
{
"aggs": {
"sterms:tweets_per_user": {
...
}
}
}
```
This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.
Today we account for too many response with an `IllegalStateException` in
`AbstractSearchAsyncAction` while this is something that should never happen
we should rather assert that we are always have less or equal the number of
expected ops when waiting for responses.
This commit removes support for the `application/x-ldjson` Content-Type header as this was only used in the first draft
of the spec and had very little uptake. Additionally, the docs for bulk and msearch have been updated to specifically
call out ndjson and mention that the newline character may be preceded by a carriage return.
Finally, the bulk request handling of the carriage return has been improved to remove this character from the source.
Closes#23025
This is just a temporary fix until #23048 is fixed. FieldCollapsing
is executing blocking calls on a network thread which causes potential deadlocks
and trips assertions.
Relates to #23048
We have a bunch of interfaces that have only a single implementation
for 6 years now. These interfaces are pretty useless from a SW development
perspective and only add unnecessary abstractions. They also require
lots of casting in many places where we expect that there is only one
concrete implementation. This change removes the interfaces, makes
all of the classes final and removes the duplicate `foo` `getFoo` accessors
in favor of `getFoo` from these classes.
After the first cluster state from a new master is processed, NodeConnectionService guarantees we connect to the new master. This removes the need to explicitly connect to the master in the MasterFaultDetection code making it simpler and bypasses the assertion triggered due to the blocking operation on the cluster state thread.
Relates to #22828
Today we carry on all search results including aggs, suggest and profile results
until we have successfully fetched all hits for the search request. This can potentially
hold on to a large amount of memory if there are heavy aggregations involved. With
this change aggs and profiles are entirely consumed an released for GC before the fetch
phase is executing. This is a first step towards reducing results on-the-fly if the number
of non-empty response are large.
Elasticsearch v5.0.0 uses allocation IDs to safely allocate primary shards whereas prior versions of ES used a version-based mode instead. Elasticsearch v5 still has support for version-based primary shard allocation as it needs to be able to load 2.x shards. ES v6 can drop the legacy support.
Since `_all` is now deprecated and cannot be set for new indices, we should also
disallow any field that has the `include_in_all` parameter set.
Resolves#22923
This is related to #22116. This commit adds calls that require
SocketPermission connect to forbidden APIs.
The following calls are now forbidden:
- java.net.URL#openStream()
- java.net.URLConnection#connect()
- java.net.URLConnection#getInputStream()
- java.net.Socket#connect(java.net.SocketAddress)
- java.net.Socket#connect(java.net.SocketAddress, int)
- java.nio.channels.SocketChannel#open(java.net.SocketAddress)
- java.nio.channels.SocketChannel#connect(java.net.SocketAddress)
#22194 gave us the ability to open low level temporary connections to remote node based on their address. With this use case out of the way, actual full blown connections should validate the node on the other side, making sure we speak to who we think we speak to. This helps in case where multiple nodes are started on the same host and a quick node restart causes them to swap addresses, which in turn can cause confusion down the road.
This is related to #23020. There are some cases for where this method
might be called with a URL to a file inside a jar. This commit allows
this method to read URLs with a protocol of 'jar:/'.
Secure settings from the elasticsearch keystore were not yet validated.
This changed improves support in Settings so that secure settings more
seamlessly blend in with normal settings, allowing the existing settings
validation to work. Note that the setting names are still not validated
(yet) when using the elasticsearc-keystore tool.
As part of #22116 we are going to forbid usage of api
java.net.URL#openStream(). However in a number of places across the
we use this method to read files from the local filesystem. This commit
introduces a helper method openFileURLStream(URL url) to read files
from URLs. It does specific validation to only ensure that file:/
urls are read.
Additionlly, this commit removes unneeded method
FileSystemUtil.newBufferedReader(URL, Charset). This method used the
openStream () method which will soon be forbidden. Instead we use the
Files.newBufferedReader(Path, Charset).
This commit adds support for the newline delimited JSON Content-Type, which is how
the bulk, multi-search, and multi-search template APIs expect data to be formatted. The
`elasticsearch-js` client has also been using this content type for these types of requests.
Closes#22943
Today either all nodes in the cluster connect to remote clusters of only nodes
that have remote clusters configured in their node config. To allow global remote
cluster configuration but restrict connections to a set of nodes in the cluster
this change adds a new setting `search.remote.connect` (defaults to `true`) to allow
to disable remote cluster connections on a per node basis.
In order to support the evolving GeoPoint encodings in Lucene 5 and 6, ES 2.x and 5.x implements an abstraction layer to the GeoPointFieldMapper classes. As of 5.x the geo_point field mapper settled on using Lucene's more performant LatLonPoint field type and deprecated all other encodings. In 6.0 all encodings except LatLonPoint have been removed rendering this abstraction layer useless. This commit removes the abstraction layer and renames the LatLonPointFieldMapper back to GeoPointFieldMapper to mantain consistency with ES field naming.
create a snapshot with a name that already exists in the repository.
Instead of throwing a SnapshotCreateException, which results in a
generic 500 status code, a duplicate snapshot name will throw a
InvalidSnapshotNameException, which will result in a 400 status code
(bad request).
`QUERY_AND_FETCH` has been treated as an internal optimization for 2 major
versions. This commit removes the search type and it's implementation details and
folds the optimization in the case of a single shard into the search controller such
that every search with a single shard (non DFS) will receive this optimization.
When a node receives a new cluster state from the master, it opens up connections to any new node in the cluster state. That has always been done serially on the cluster state thread but it has been a long standing TODO to do this concurrently, which is done by this PR.
This is spin off of #22828, where an extra handshake is done whenever connecting to a node, which may slow down connecting. Also, the handshake is done in a blocking fashion which triggers assertions w.r.t blocking requests on the cluster state thread. Instead of adding an exception, I opted to implement concurrent connections which both side steps the assertion and compensates for the extra handshake.
This changes the way that replica failures are handled such that not all
failures will cause the replica shard to be failed or marked as stale.
In some cases such as refresh operations, or global checkpoint syncs, it is
"okay" for the operation to fail without the shard being failed (because no data
is out of sync). In these cases, instead of failing the shard we should simply
fail the operation, and, in the event it is a user-facing operation, return a
5xx response code including the shard-specific failures.
This was accomplished by having two forms of the `Replicas` proxy, one that is
for non-write operations that does not fail the shard, and one that is for write
operations that will fail the shard when an operation fails.
Relates to #10708
It was accidentally renamed `enabled_position_increment` in the cleanups
for 5.0. This adds `enable_position_increment` as a deprecated alias
so it will continue to work.
This commit removes the following queries and parameters (which were deprecated in 5.0):
* GeoPointDistanceRangeQuery
* coerce, and ignore_malformed for GeoBoundingBoxQuery, GeoDistanceQuery, GeoPolygonQuery, and GeoDistanceSort
This is related to #22116. Core no longer needs `SocketPermission`
`connect`.
This permission is relegated to these modules/plugins:
- transport-netty4 module
- reindex module
- repository-url module
- discovery-azure-classic plugin
- discovery-ec2 plugin
- discovery-gce plugin
- repository-azure plugin
- repository-gcs plugin
- repository-hdfs plugin
- repository-s3 plugin
And for tests:
- mocksocket jar
- rest client
- httpcore-nio jar
- httpasyncclient jar
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
When a primary is relocated from an old node to a new node, it can have
ops in its translog that do not have a sequence number assigned. When a
file-based recovery is started, this can lead to skipping these ops when
replaying the translog due to a bug in the recovery logic. This commit
addresses this bug and adds a test in the BWC tests.
Relates #22945
This change also removes the reference to the difference bewteen full name and index name.
They are always the same since 2.x and `name` does not refer anymore to `author.name` automatically.
A simple pattern must be used instead.
Remove redundant code that checks the field name twice.
Today if a user invokes the remove plugin command without specifying the
name of a plugin to remove, we arrive at a null pointer exception. This
commit adds logic to cleanly handle this situation and provide clear
feedback to the user.
Relates #22930
Currently, update action internally uses deprecated index and delete
transport actions. As of #21964, these tranport actions were deprecated
in favour of using single item bulk request. In this commit, update action
uses single item bulk action.
This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
GeoDistance query, sort, and scripts make use of a crazy GeoDistance enum for handling 4 different ways of computing geo distance: SLOPPY_ARC, ARC, FACTOR, and PLANE. Only two of these are necessary: ARC, PLANE. This commit removes SLOPPY_ARC, and FACTOR and cleans up the way Geo distance is computed.
This commit change ElasticsearchException.failureFromXContent() method so that it now parses root causes which were ignored before, and adds them as suppressed exceptions of the returned exception.
Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.
Also improves the error message returned when attempting to modify
the `ScriptdocValues`, removes a couple of allocations, and documents
that the date functions are available in Painless.
Relates to #22162
This disallows object mappings that would accidentally create something like
`foo..bar`, which is then unparsable for the `bar` field as it does not know
what its parent is.
Resolves#22794
The test tried to create a situation where a stale replica is the only shard available. It did so by stopping the node with the replica, indexing some, stopping the primary node, starting a new node. This is flawed because the newly started node may reuse the data path of the primary node and things go back to green. Instead we should make sure that the replica is on the path that will be selected when the new node is started (i.e., the path with the smaller ordinal)
This commit adds a BytesRestResponse.errorFromXContent() method to parse the error returned by BytesRestResponse. It returns a ElasticsearchStatusException instance.
Currently, if a previously allocated shard has no in-sync copy in the
cluster, but there is a stale replica copy, the explain API does not
include information about the stale replica copies in its output. This
commit includes any shard copy information available (even for stale
copies) when explaining an unassigned primary shard that was previously
allocated in the cluster.
This situation can arise as follows: imagine an index with 1 primary and
1 replica and a cluster with 2 nodes. If the node holding the replica
is shut down, and data continues to be indexed, only the primary will
have the latest data and the replica that has gone offline will be
marked as stale. Now, suppose the node holding the primary is shut
down. There are no copies of the shard data in the cluster. Now, start
the first stopped node (holding the stale replica) back up. The cluster
is red because there is no in-sync copy available. Running the explain
API before would inform the user that there is no valid shard copy in
the cluster for that shard, but it would not provide any information
about the existence of the stale replica that exists on the restarted
node. With this commit, the explain API provides information about all
the stale replica copies when explaining the unassigned primary.
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
The seq# base recovery logic relies on rolling back lucene to remove any operations above the global checkpoint. This part of the plan is not implemented yet but have to have these guarantees. Instead we should make the seq# logic validate that the last commit point (and the only one we have) maintains the invariant and if not, fall back to file based recovery.
This commit adds a test that creates situation where rollback is needed (primary failover with ops in flight) and fixes another issue that was surfaced by it - if a primary can't serve a seq# based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter.
Relates to #22484 & #10708
With the new secure settings, methods like getAsMap() no longer work
correctly as a means of checking for empty settings, or the total size.
This change converts the existing uses of that method to use methods
directly on Settings. Note this does not update the implementations to
account for SecureSettings, as that will require a followup which
changes how secure settings work.
* Integrate UnifiedHighlighter
This change integrates the Lucene highlighter called "unified" in the list of supported highlighters for ES.
This highlighter can extract offsets from either postings, term vectors, or via re-analyzing text.
The best strategy is picked automatically at query time and depends on the field and the query to highlight.
In #22762, settings preparation during bootstrap was changed slightly to
account for SecureSettings, by starting with a fresh settings builder
after reading the initial configuration. However, this the defaults from
system properties were never re-read. This change fixes that bug (which
was never released).
closes#22861
Usually the order in which we serialize sets and maps of things doesn't matter,
but since InnerHitBuilder is part of SearchSourceBuilder, which is in turn used
as a cache key in its bytes serialization, we need to ensure the order of all
these fields when writing them to an output stream.
This adds tests and makes sure we iterate over the scriptField set and the
childInnerHits map in a fixed order.
Closes#22808
Also adds many `equals` and `hashCode` implementations and moves
the failure printing in `MatchAssertion` into a common spot and
exposes it over `assertEqualsWithErrorMessageFromXContent` which
does an object equality test but then uses `toXContent` to print
the differences.
Relates to #22278
This moves the building blocks for delete by query into core. This
should enabled two thigns:
1. Plugins other than reindex to implement "bulk by scroll" style
operations.
2. Plugins to directly call delete by query. Those plugins should
be careful to make sure that task cancellation still works, but
this should be possible.
Notes:
1. I've mostly just moved classes and moved around tests methods.
2. I haven't been super careful about cohesion between these core
classes and reindex. They are quite interconnected because I wanted
to make the change as mechanical as possible.
Closes#22616
* S3 repository: Add named configurations
This change implements named configurations for s3 repository as
proposed in #22520. The access/secret key secure settings which were
added in #22479 are reverted, and the only secure settings are those
with the new named configs. All other previously used settings for the
connection are deprecated.
closes#22520
Also adds many `equals` and `hashCode` implementations and moves
the failure printing in `MatchAssertion` into a common spot and
exposes it over `assertEqualsWithErrorMessageFromXContent` which
does an object equality test but then uses `toXContent` to print
the differences.
Relates to #22278
This commit introduces sequence-number-based recovery. When a replica
has fallen out of sync, rather than performing a file-based recovery we
first attempt to replay operations since the last local checkpoint on
the replica. To do this, at the start of recovery the replica tells the
primary what its local checkpoint is. The primary will then wait for all
operations between that local checkpoint and the current maximum
sequence number to complete; this is to ensure that there are no gaps in
the operations that will be replayed from the primary to the
replica. This is a best-effort attempt as we currently have no
guarantees on the primary that these operations will be available; if we
are not able to replay all operations in the desired range, we just
fallback to file-based recovery. Later work will strengthen the
guarantees.
Relates #22484
At this point AbstractSearchAsyncAction is just a base-class for the first phase of a search where we have multiple replicas
for each shardID. If one of them is not available we move to the next one. Yet, once we passed that first stage we have to work with
the shards we succeeded on the initial phase.
Unfortunately, subsequent phases are not fully detached from the initial phase since they are all non-static inner classes.
In future changes this will be changed to detach the inner classes to test them in isolation and to simplify their creation.
The AbstractSearchAsyncAction should be final and it should just get a factory for the next phase instead of requiring subclasses
etc.
This commit adds a ElasticsearchException.failureFromXContent() that can be used to parse the result of ElasticsearchException.generateFailureXContent().
Today we cache query results even if the query timed out. This is obviously
problematic since results are not complete. Yet, the decision if a query timed
out or not happens too late to simply not cache the result since if we'd just throw
an exception all currently waiting requests with the same request / cache key would
fail with the same exception without the option to access the result or to re-execute.
Instead, this change will allow the request to enter the cache but invalidates it immediately.
Concurrent request might not get executed and return the timed out result which is not absolutely
correct but very likely since identical requests will likely timeout as well. As a side-effect
we won't hammer the node with concurrent slow searches but rather only execute one of them
and return shortly cached result.
Closes#22789
The output of the ElasticsearchException.generateThrowableXContent() method can be parsed back by the ElasticsearchException.fromXContent() method.
This commit adds unit tests in the style of the to-and-from-xcontent tests we already have for other parsing methods. It also relax the strict parsing of the ElasticsearchException.fromXContent() so that it does not throw an exception when custom metadata and headers are parsed, as long as they are either strings or arrays of strings. Every other type is ignored at parsing time.
Some tests verify that all connection have been closed but due to the
async / concurrent nature of `RemoteClusterConnection` there are situations
where we notify listeners that trigger tests to finish before we actually
closed all connections. The race is very very small and has no impact on the
code correctness. This commit documents and improves the way we close
connections to ensure test won't fail with false positives.
Closes#22803
This commit removes the search type `dfs_query_and_fetch` without a
replacement. We don't allow to use this type via REST since 2.x
but still keep it around for no particular reason. There we no users
complaining about the availability. This should now be removed from the
codebase. `query_and_fetch` is still used internally to safe a roundtrip
if there is only one shard but it can't be used via the rest interface.
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
Add unit tests for `TopHitsAggregator` and convert some snippets in
docs for `top_hits` aggregation to `// CONSOLE`.
Relates to #22278
Relates to #18160