Update scripts might want to update the documents `_timestamp` but need a notion of `now()`.
Painless doesn't support any notion of now() since it would make scripts non-pure functions. Yet,
in the update case this is a valid value and we can pass it with the context together to allow the
script to record the timestamp the document was updated.
Relates to #17895
SynonymQuery was ignored by the FastVectorHighlighter.
This change adds the support for SynonymQuery in the FVH.
Although this change should be implemented in Lucene directly which is why https://issues.apache.org/jira/browse/LUCENE-7484 has been opened.
In the meantime this PR handles the issue on ES side and could be removed when LUCENE-7484 gets merged.
Fixes#20781
* Replace org.elasticsearch.common.lucene.search.MatchNoDocsQuery with its Lucene version (org.apache.lucene.search.MatchNoDocsQuery)
This change removes the ES version of the match no docs query and replaces it with the Lucene version.
relates #18030
* Add missing change
Today we throw an assertion error if we release an AbstractArray more than once.
Yet, it's recommended to implement close methods such that they can be invoked
more than once. Guaranteed single release calls are hard to implement and some
situations might not be tested causing for instance `CircuitBreaker` to operate on
corrupted memory stats.
* Fix match_phrase_prefix query with single term on _all field
This change fixes the match_phrase_prefix query when a single term is queried on the _all field.
It builds a prefix query instead of an AllTermQuery which would not match any prefix.
Fixes#20470
* Add missing change
Mappings treat dots in field names as sub objects, for instance
```
{
"a.b": "c"
}
```
generates the same dynamic mappings as
```
{
"a": {
"b": "c"
}
}
```
Source filtering should be consistent with this behaviour so that an include
list containing `a` should include fields whose name is `a.b`.
To make this change easier, source filtering was refactored to use automata.
The ability to treat dots in field names as sub objects is provided by the
`makeMatchDotsInFieldNames` method of `XContentMapValues`.
Closes#20719
Instead provide services where they are needed. The class worked
well as a temporary measure to easy removal of guice from the index
level but now we can remove it entirely.
-1 @Inject annotation
This commit upgrades the Log4j 2 dependency to version 2.7 and removes
some hacks that we had in place to work around bugs in Log4j 2 version
2.6.2.
Relates #20805
UpdateHelper, MetaDataIndexUpgradeService, and some recovery
stuff.
Move ClusterSettings to nullable ctor parameter of TransportService
so it isn't forgotten.
The `QueryShardContext.failIfFrozen()` and `QueryShardContext.freezeContext()`
methods should be final so that overriding/bypassing the freezing of
`QueryShardContext` is not possible. This is important so that we can
trust when the `QueryShardContext` says a request is cacheable.
This change also makes the methods that call `QueryShardContext.failIfFrozen()`
`final` so they cannot be overridden to bypass setting the request as not
cacheable.
Elasticsearch 1.x used to implicitly round up upper bounds of queries when they
were inclusive so that eg. `[2016-09-18 TO 2016-09-20]` would actually run
`[2016-09-18T00:00:00.000Z TO 2016-09-20T23:59:59.999Z]` and include dates like
`2016-09-20T15:32:44`. This behaviour was lost in the cleanups of #8889.
Closes#20579
The shards preference on a search request enables specifying a list of
shards to hit, and then a secondary preference (e.g., "_primary") can be
added. Today, the separator between the shards list and the secondary
preference is ';'. Unfortunately, this is also a valid separtor for URL
query parameters. This means that a preference like "_shards:0;_primary"
will be parsed into two URL parameters: "_shards:0" and "_primary". With
the recent change to strict URL parsing, the second parameter will be
rejected, "_primary" is not a valid URL parameter on a search
request. This means that this feature has never worked (unless the ';'
is escaped, but no one does that because our docs do not that, and there
was no indication from Elasticsearch that this did not work). This
commit changes the separator to '|'.
Relates #20786
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of local transport that has
roughly the same performance compared to TCP or at least not noticeably slower.
This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
Previous to this change the DateMathParser accepted a Callable<Long> to use for accessing the now value. The implementations of this callable would fall back on System.currentTimeMillis() if there was no context object provided. This is no longer necessary for two reasons:
We should not fall back to System.currentTimeMillis() as a context should always be provided. This ensures consistency between shards for the now value in all cases
We should use a LongSupplier rather than requiring an implementation of Callable. This means that we can just pass in context::noInMillis for this parameter and not have not implement anything.
This commit improves the shard decision container class in the following
ways:
1. Renames UnassignedShardDecision to ShardAllocationDecision, so that
the class can be used for general shard decisions, not just unassigned
shard decisions.
2. Changes ShardAllocationDecision to have the final decision as a Type
instead of a Decision, because all the information needed from the final
decision is contained in `Type`.
3. Uses cached instances of ShardAllocationDecision for NO and THROTTLE
decisions when no explanation is needed (which is the common case when
executing reroute's as opposed to using the explain API).
Today SearchContext expose the current context as a thread local which makes any kind of sane interface design very very hard. This PR removes the thread local entirely and instead passes the relevant context anywhere needed. This simplifies state management dramatically and will allow for a much leaner SearchContext interface down the road.
As the wise man @ywelsch said: currently when we batch cluster state update tasks by the same executor, we the first task un-queued from the pending task queue. That means that other tasks for the same executor are left in the queue. When those are dequeued, they will trigger another run for the same executor. This can give unfair precedence to future tasks of the same executor, even if they weren't batched in the first run. Take this queue for example (all with equal priority)
```
T1 (executor 1)
T2 (executor 1)
T3 (executor 2)
T4 (executor 2)
T5 (executor 1)
T6 (executor 1)
```
If T1 & T2 are picked up first (when T5 & T6 are not yet queued), one would expect T3 & T4 to run second. However, since T2 is still in the queue, it will trigger execution of T5 & T6.
The fix is easy - ignore processed tasks when extracting them from the queue.
Closes#20768
Currently, update action delegates to index and delete actions
for replication using a dedicated transport action. This change
makes update a replication operation, removing the dedicated
transport action. This simplifies bulk execution and removes
duplicate logic for update retries and translation. This
consolidates the interface for single document write requests.
Now on the primary, the update request is translated to
an index or delete request before execution and the translated
request is sent to copies for replication.
Today it compiles when creating the aggregator, meaning that scripts will be
compiled as many times as there are buckets. Instead it should compile when
creating the factory so that scripts are compiled only once regardless of the
number of buckets.
TRA currently resolves incoming requests to IndexShards in order to acquire operations locks on them. There is no need for all subclasses to have to go through the same IndicesService/IndexService song and dance. Also, doing it once means we don't need to worry about edge cases where the shard is removed while a TRA is in flight.
This commit improves the logic flow of BalancedShardsAllocator in
preparation for separating out components of this class to be used
in the cluster allocation explain APIs. In particular, this commit:
1. Adds a minimum value for the index/shard balance factor settings (0.0)
2. Makes the Balancer data structures immutable and pre-calculated at
construction time.
3. Removes difficult to follow labeled blocks / GOTOs
4. Better logic for skipping over the same replica set when one of
the replicas received a NO decision
5. Separates the decision making logic for a single shard from the logic
to iterate over all unassigned shards.
Before this change the processing of the ranges in the date range (and
other range type) aggregations was done when the Aggregator was created.
This meant that the SearchContext did not know that now had been used in
a range until after the decision to cache was made.
This change moves the processing of the ranges to the aggregation builders
so that the search context is made aware that now has been used before
it decides if the request should be cached
This commit adds a did you mean feature to the strict REST params error
message. This works by comparing any unconsumed parameters to all of the
consumer parameters, comparing the Levenstein distance between those
parameters, and taking any consumed parameters that are close to an
unconsumed parameter as candiates for the did you mean.
* Fix pluralization in strict REST params message
This commit fixes the pluralization in the strict REST parameters error
message so that the word "parameter" is not unconditionally written as
"parameters" even when there is only one unrecognized parameter.
* Strength strict REST params did you mean test
This commit adds an unconsumed parameter that is too far from every
consumed parameter to have any candidate suggestions.
Relates #20747
This commit changes the strict REST parameters message to say that
unconsumed parameters are unrecognized rather than unused. Additionally,
the test is beefed up to include two unused parameters.
Relates #20745
Before this change the processing of the ranges in the date range (and
other range type) aggregations was done when the Aggregator was created.
This meant that the SearchContext did not know that now had been used in
a range until after the decision to cache was made.
This change moves the processing of the ranges to the aggregation builders
so that the search context is made aware that now has been used before
it decides if the request should be cached
This commit adds a did you mean feature to the strict REST params error
message. This works by comparing any unconsumed parameters to all of the
consumer parameters, comparing the Levenstein distance between those
parameters, and taking any consumed parameters that are close to an
unconsumed parameter as candiates for the did you mean.
* Fix pluralization in strict REST params message
This commit fixes the pluralization in the strict REST parameters error
message so that the word "parameter" is not unconditionally written as
"parameters" even when there is only one unrecognized parameter.
* Strength strict REST params did you mean test
This commit adds an unconsumed parameter that is too far from every
consumed parameter to have any candidate suggestions.
Relates #20747
This commit changes the strict REST parameters message to say that
unconsumed parameters are unrecognized rather than unused. Additionally,
the test is beefed up to include two unused parameters.
Relates #20745
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
Today, the individual allocation deciders appear in random
order when initialized in AllocationDeciders, which means
potentially more performance intensive allocation deciders
could run before less expensive deciders. This adds to the
execution time when a less expensive decider could terminate
the decision making process early with a NO decision. This
commit orders the initialization of allocation deciders,
based on a general assessment of the big O runtime of each
decider, moving the likely more expensive deciders last.
Closes#12815
IndicesClusterStateService and IndicesStore are responsible for synchronizing local shard state based on incoming cluster state updates. On client/tribe nodes, which don't store any such shard/index data/metadata, all of the logic that computes which data is to be deleted, which shards to be initialized etc. can be completely skipped, saving precious CPU cycles.
CompositeIndicesRequest should be implemented by all requests that are composed of multiple subrequests which relate to one or more indices. A composite request is
executed by its own transport action class (e.g. TransportMultiSearchAction for _msearch), which goes through all the subrequests and delegates their execution to the appropriate transport action (e.g. TransportSearchAction for _msearch) for each single item. IndicesAliasesRequest is a particular request as it holds multiple items that implement AliasesRequest, but it shouldn't be considered a composite request, as it has no specific transport action for each of its items. Also, either all of its subitems fail or succeed.
Also clarified javadocs for CompositeIndicesRequest.
We have new icons for elastic products with 5.0. This change updates the
favicon embedded in elasticsearch that users see when using the rest api
through a browser.
When a node get disconnected from the cluster and rejoins during a master election, it may be that the new master already has that node in it's cluster and will try to assign it shards. If the node hosts started primaries, the new shards will be initializing and will have the same allocation id as the allocation ids of the current started size. We currently do not recognize this currently. We should clean the current IndexShard instances and initialize new ones.
This also hardens test assertions in the same area.
This makes geo-distance sorting use `LatLonDocValuesField.newDistanceSort`
whenever applicable, which should be faster that the current approach since it
tracks a bounding box that documents need to be in in order to be competitive
instead of doing a costly distance computation all the time.
Closes#20450
today it's not possible to use date-math efficiently with the `_rollover`
API. This change adds support for date-math in the target index as well as
support for preserving the math logic when an existing index that was created with
a date math expression all subsequent indices are created with the same expression.
Right now our unit tests in that area only simulate indexing single documents. As we go forward it should be easy
to add other actions, like delete & bulk indexing. This commit extracts the common parts of the current indexing
logic to a based class make it easier to extend.
The Setting.timeValue() method uses TimeValue.toString() which can produce fractional time values. These fractional time values cannot be parsed again by the settings framework.
This commit fix a method that still use the .toString() method and replaces it with .getStringRep(). It also changes a second method so that it's not up to the caller to decide which stringify method to call.
closes#20662
This commit fixes a failing cluster settings tests, namely the logger
level update test. The test was incorrectly assuming the default log
level was info, but it could be non-info, for example, if
tests.es.logger.level is set to some non-info level.
Closes#20318
Today we allow system bootstrap checks to be ignored with a
setting. Yet, the system bootstrap checks are as vital to the health of
a production node as the non-system checks (e.g., the original bootstrap
check, the file descriptor check, is critical for reducing the chances
of data loss from being too low). This commit removes the ability to
ignore system bootstrap checks.
Relates #20511
The invalid ingest configuration field name used to show itself,
even when it was null, in error messages. Sometimes this does not make
sense.
e.g.
```[null] Only one of [file], [id], or [inline] may be configure```
vs.
```Only one of [file], [id], or [inline] may be configure```
The above deals with three fields, therefore this no one property
responsible.
this change adds a hard limit to `index.number_of_shard` that prevents
indices from being created that have more than 1024 shards. This is still
a huge limit and can only be changed via settings a system property.
Today when getting setting via an API like the cluster settings API,
complex settings are excluded (e.g.,
discovery.zen.ping.unicast.hosts). This commit adds these settings to
the output of such APIs.
Relates #20622
It currently returns something like:
```
"No feature for name [_siohgjoidfhjfihfg]"
```
Which is not the most understandable message, this changes it to be a
little more readable.
Resolves#10946
Today when executing the install plugin command without a plugin id, we
end up throwing an NPE because the plugin id is null yet we just keep
going (ultimatley we try to lookup the null plugin id in a set, the
direct cause of the NPE). This commit modifies the install command so
that a missing plugin id is detected and help is provided to the user.
Relates #20660
reindex-from-remote should ignore unknown fields so it is mostly
future compatible. This makes it ignore unknown fields by adding an
option to `ObjectParser` and `ConstructingObjectParser` that, if
enabled, causes them to ignore unknown fields.
Closes#20504
Many of our unit tests instantiate an `AllocationService`, which requires having a `GatewayAllocator`. Today almost all of our test use a class called `NoopGatewayAllocator` which does nothing, effectively leaving all shard assignments to the balanced allocator. This is sad as it means we test a system that behaves differently than our production logic in very basic things. For example, a started primary that is lost will be assigned to a node that didn't use to have it.
This PR removes `NoopGatewayAllocator` in favor of a new `TestGatewayAllocator` that inherits the standard `GatewayAllocator` and overrides shard information fetching to return information based on historical assignments the allocator has done. The only exception is `BalanceConfigurationTests` which does test only the balancer and I opted to not have it work around the `GatewayAllocator` being in it's way.
Changes the API of GatewayAllocator#applyStartedShards and
GatewayAllocator#applyFailedShards to take both a RoutingAllocation
and a list of shards to apply. This allows better mock allocators
to be created as being done in #20637.
Closes#20642
Removes the FailedRerouteAllocation class and StartedRerouteAllocation
class, as they were just wrappers for RerouteAllocation that stored
started and failed shards, but these started and failed shards can
be passed in directly to the methods that needed them, removing the
need for this wrapper class and extra level of indirection.
Closes#20626
When initializing a new index routing table, we make a decision where the primary shards should be recovered from. This can be an empty folder for new indices, a set of specific allocation ids for old indices or a snapshot. We currently allow callers of `IndexRoutingTable.initializeEmpty` to supply the source but also set it automatically if null is given. Sadly the current logic is reusing the supplied parameter to store the result of the automatic decision. This is flawed if some of the decision should be *different* between the different index shard (as the first decision that is maid sticks).
This commit fixes this but also simplifies the API to always make an automatic decision.
This was discovered while working on #20637 which strengthens the testing infra and caused this to bubble up. I put it as a separate commit to make sure it is not lost as part of a bigger test only PR.
Today we hold on to all possible tokenizers, tokenfilters etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitive. We fixed this in #19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map like class. This
ensures we don't create a gazillion long living objects that are entirely useless since
they are never used in most of the indices. Also those objects might consume a considerable
amount of memory since they might load stopwords or synonyms etc.
Closes#19828
When testing tribe nodes in an integration test, we should pass the classpath
plugins of the node down to the tribe client nodes. Without this the tribe client
nodes could be prevented from communicating with the tribes.