* master: (911 commits)
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Addressing review comments
Only filter initial recovery (post API) when shrinking an index (#18661)
Add tests to check that toQuery() doesn't return null
Removing handling of null lucene query where we catch this at parse time
Handle empty query bodies at parse time and remove EmptyQueryBuilder
Mute failing assertions in IndexWithShadowReplicasIT until fix
Remove allow running as root
Add upgrade-not-supported warning to alpha release notes
remove unrecognized javadoc tag from matrix aggregation module
set ValuesSourceConfig fields as private
Adding MultiValuesSource support classes and documentation to matrix stats agg module
...
Relocation of non-primary shards is realized by recovering from the primary shard. Recovery throttling, however, wrongly treats such a relocation as a recovery from the non-primary relocation source.
Closes #18640
This commit adds a UUID for each snapshot, in addition to the already
existing repository and snapshot name. The addition of UUIDs will enable
more robust handling of the deletion of previous snapshots and lingering
files from partially failed delete operations, on top of being able to
uniquely track each snapshot.
Closes #18228
Relates #18156
Today we use `index.routing.allocation.include._id` to filter the allocation
for the shrink target index. That has the side effect that the user has to
delete or change that setting once the primary has been recovered (i.e. the shrink is done).
This PR adds a dedicated filter that can only be set internally and that only filters
allocation for unassigned shards.
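For illustration, this is the manual cleanup the `_id`-based filter used to force on users once the shrink had finished, and which the dedicated internal filter makes unnecessary (index name assumed; setting the value to null is assumed to reset it):
```BASH
$ curl -XPUT 'http://localhost:9200/logs_single_shard/_settings' -d '{
    "index.routing.allocation.include._id" : null
}'
```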
Currently we support empty query clauses like the filter in
"constant_score" : { "filter" : { } }
How these clauses are handled depends on the surrounding query.
They are later either ignored, converted to match all or no documents, or
passed further up the query hierarchy. During parsing these clauses are
currently represented as EmptyQueryBuilders. When not handled anywhere else,
these special cases need to be checked for on the shard when building the
lucene query.
This is trappy, so this PR changes the parsing of compound queries. Instead
of returning QueryBuilder, the core query parsing method
QueryShardContext#parseInnerQueryBuilder() now returns an Optional, which can
be empty in the case of empty query clauses. This has the advantage of forcing
callers to deal with this sooner or later. When encountering empty Optionals,
compound query builders now have the choice to ignore them, pass them on or
rewrite to a different query, depending on context.
The setting bootstrap.mlockall is useful on both POSIX-like systems
(POSIX mlockall) and Windows (Win32 VirtualLock). But mlockall is really
a POSIX-only thing, so the name should not be tied to POSIX. This commit
renames the setting to "bootstrap.memory_lock".
Relates #18669
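A minimal sketch of enabling the renamed setting (the config file path is assumed to be the default):
```BASH
$ echo 'bootstrap.memory_lock: true' >> config/elasticsearch.yml
```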
Index deletion requests currently use a custom acknowledgement mechanism that waits for the data nodes to actually delete the data before acknowledging the request to the client. This was initially put into place because a new index with the same name could only be created once the old index was wiped, as we used the index name as the data folder on the data nodes. With PR #16442, we now use the index uuid as folder name, which avoids collisions between indices that are named the same (deleted and recreated). This allows us to get rid of the custom acknowledgment mechanism altogether and rely on the standard cluster state-based acknowledgment instead.
Closes #18558
If this option is enabled on a processor, it silently catches any processor-related failure and continues executing the rest of the pipeline.
Closes #18493
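A hedged sketch of a pipeline using it; the option name `ignore_failure` and the `convert` processor parameters here are assumptions based on this description, not confirmed by the commit:
```BASH
$ curl -XPUT 'http://localhost:9200/_ingest/pipeline/my_pipeline' -d '{
    "processors" : [
        {
            "convert" : {
                "field" : "bytes",
                "type" : "integer",
                "ignore_failure" : true
            }
        }
    ]
}'
```
If the convert processor fails on a document, the failure is swallowed and the remaining processors still run.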
We throw a ConnectTransportException, which is logged at trace level, hiding potentially
important information when an old or wrong node wants to connect. We should throw an ISE and
log it as a warning.
Similar reasoning as #18133 but for the aggs API. One important change is that
I moved the base PipelineAggregatorBuilder class to the o.e.s.aggregations
package instead of o.e.s.aggregations.pipeline so that the create method does
not need to be public.
- Share applying template with MetaDataCreateIndexService and MetaDataIndexTemplateService
- Add some unit tests
- Collapse addMappingsToMapperService and move it to MapperService
- Extract validateTemplate method
- Use expectThrows in test case
- Add TODO comment
Closes #2415
This commit clarifies the behavior that must be adhered to by any
implementors of the BlobContainer interface. This is done through
expanded Javadocs.
Closes #18157
Closes #15580
The page cache recycler has a dependency on thread pool that was there
for historical reasons but is no longer needed. This commit removes this
now unneeded dependency.
Relates #18664
Contains a number of cleanups related to recent changes in RoutingNodes:
- PR #17821 (Immutable ShardRouting) changed RoutingNode to use a map indexed by ShardId to manage ShardRouting elements. This means that we can directly select the right ShardRouting without iterating over all elements. This lets us get rid of RoutingNodeIterator and all kinds of iteration all over the place.
- Second cleanup is an extension of #18390 (Expose cluster state before reroute in RoutingAllocation instead of RoutingNodes). We should not re-expose the RoutingTable in RoutingNodes and only use it in the constructor. This makes it clear that the RoutingTable is only used to construct the RoutingNodes and can diverge from it afterwards (only RoutingNodes is mutable).
- Remove AllocationService.applyNewNodes() (that is already done as part of construction of RoutingNodes)
When we shrink an index we can estimate the shard size for the primary
from the source index. This is important for allocation decisions since we
should try our best to ensure we have enough space on the node where we shrink the
index.
Currently we return `null` when the query in a common terms query has
zero tokens after analysis. It would be better if a query builder's
`toQuery()` never returned null and instead returned a meaningful lucene
query. Since an ExtendedCommonTermsQuery with no terms gets
rewritten to a MatchNoDocsQuery later, it is enough to leave out the
check for zero tokens.
The fact that ip fields used a different doc values representation in 2.x causes
issues when querying 2.x and 5.0 indices in the same request. This change hides the
2.x doc values on ip fields behind binary doc values that use the
same encoding as 5.0. This way the coordinating node will be able to merge shard
responses that have different major versions.
One known issue is that this makes sorting/aggregating slower on ip fields for
indices that have been generated with elasticsearch 2.x.
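Concretely, this lets a single request sort on an ip field across a 2.x and a 5.0 index (index and field names are assumptions):
```BASH
$ curl -XGET 'http://localhost:9200/logs-2x,logs-5x/_search' -d '{
    "sort" : [
        { "client_ip" : "asc" }
    ]
}'
```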
This adds a low-level primitive operation to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to be allocated on a single node. Once the target index is initializing on the shrink node, it takes a snapshot of the source index shards and copies all files into the target index's data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for an optional constant-time copy if hard links are supported by the filesystem. All mappings are merged into the new index's metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
    "settings" : {
        "index.routing.allocation.require._name" : "shrink_node_name",
        "index.blocks.write" : true
    }
}'
```
Once all shards are started on the shrink node, the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
    "settings" : {
        "index.codec" : "best_compression",
        "index.number_of_replicas" : 1
    }
}'
```
This API will perform all needed checks before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately; to monitor shrink progress, the recovery API should be used, since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index. If a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.
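Since all copy operations show up as recoveries, progress can be followed with the standard recovery API, e.g. for the target index created above:
```BASH
$ curl -XGET 'http://localhost:9200/logs_single_shard/_recovery?pretty'
```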
When calling `findTemplateBuilder(context, currentFieldName, "text", null)`,
elasticsearch ignores all templates that have a `match_mapping_type` set since
no dynamic type is provided (the last parameter, which is null in that case).
So this should only be called _last_. Otherwise, if a path-based template
matches, it will have precedence over all type-based templates.
Closes #18625
We have 3 evil tests for jarhell. They have been failing in java 9
because of how evil they are. The first checks the leniency we add for
jarhell in the jdk itself. This is unnecessary, since if the leniency
wasn't there, we would already be failing all jarhell checks. The second
is checking the compile version is compatible with the jdk. This is
simpler since we don't need to fake the java version: we know 1.7 should
be compatible with both java 8 and 9, so we can use that as a constant.
Finally the last test checks if the java version system property is
broken. This is simply something we should not check, we have to trust
that java specifies it correctly, and again, if it was broken, all
jarhell checks would be broken.
There is no reason to read the entire marvel hero file to test the features;
it might take several seconds to do so, which is unnecessary.
This commit also splits SearchSuggestTests into core and modules/mustache,
and also adds @Nightly to the forbidden APIs to make sure we don't use it, since nightly tests won't run in CI these days.
Modifying the translog replay to not replay again into the translog
introduced a bug for the case of multiple operations for the same
doc. Namely, since we were no longer updating the version map for each
operation, the second operation for a doc would be treated as a creation
instead of as an update. This commit fixes this bug by placing these
operations into the version map. This commit includes a failing test case.
Relates #18611
Today we pull doc stats from an index reader which might not reflect reality.
IndexWriter might have merged all deletes away but due to a missing refresh
the stats are completely off. This change pulls doc stats from the IndexWriter
directly instead of relying on refreshes to run regularly. Note: Buffered deletes
are still not visible until the segments are flushed to disk.
If the relocation source fails during the relocation of a shard from one node to another, the
relocation target is currently failed as well. For replica shards this is not necessary,
however, as the actual shard recovery of the relocation target is done via the primary shard.
Our current testing of TimeUnitRounding's rounding() and nextRoundingValue()
methods, which are used especially for date histograms, lacked proper randomization
for time zones. We only did randomized tests for fixed-offset time zones
(e.g. +01:00, -05:00) but didn't account for real-world time zones with
DST transitions.
Adding those tests revealed a couple of problems with our current rounding logic.
In some cases, usually happening around those transitions, rounding a value down
could land on a value that itself is not a proper rounded value. Also sometimes
the nextRoundingValue would not line up properly with the rounded value of all
dates in the next unit interval.
This change improves the current rounding logic in TimeUnitRounding in two ways:
it makes sure that calling round(date) always returns a date that won't change when
rounded again (making round() idempotent), handling the special cases that occur
during DST transitions by performing a second rounding. It also changes the
nextRoundingValue() method to itself rely on the round method to make sure we
always return rounded values for the unit interval boundaries.
Also adding tests for randomized TimeUnitRounding that assert important basic
properties the rounding values should have. For better understanding and readability
a few of the pathological edge cases are also added as a special test case.
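Where this rounding surfaces for users is a date histogram over a DST-observing time zone; a hedged example (index and field names are assumptions):
```BASH
$ curl -XGET 'http://localhost:9200/logs/_search' -d '{
    "size" : 0,
    "aggs" : {
        "per_day" : {
            "date_histogram" : {
                "field" : "timestamp",
                "interval" : "day",
                "time_zone" : "Europe/Berlin"
            }
        }
    }
}'
```
With the fix, every bucket key around a DST transition is itself a value that rounds to itself.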
As in other places in the query dsl, the full field name should be used.
Before this change this wasn't the case for nested inner hits when source filtering was used.
Highlighting had a workaround, which is now removed, as the source of nested inner hits can only be referred to by the full name.
Closes #16653
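A sketch of the now-consistent usage (index, path and field names are assumptions): the source filter inside inner_hits refers to the nested field by its full name `comments.message`, not just `message`:
```BASH
$ curl -XGET 'http://localhost:9200/articles/_search' -d '{
    "query" : {
        "nested" : {
            "path" : "comments",
            "query" : { "match" : { "comments.message" : "elasticsearch" } },
            "inner_hits" : {
                "_source" : [ "comments.message" ]
            }
        }
    }
}'
```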
When performing a local recovery, the engine replays operations
recovered from the translog into the translog again. These duplicated
operations are eventually cleared after successful recovery on flush,
but there is no need to play these operations into the translog at
all. This commit modifies the local recovery process so as to not replay
these operations into the translog.
Relates #18547
This PR changes the InternalTestCluster to support dedicated master nodes. The creation of dedicated master nodes can be controlled using a new `supportsMasterNodes` parameter to the ClusterScope annotation. If set to true (the default), dedicated master nodes will randomly be used. If set to false, no dedicated master nodes will be created and data nodes will also be allowed to become masters. If active, test runs will have either 1 or 3 master nodes.
When mocking unassigned shards which have failed with reason ALLOCATION_FAILED we
have to ensure that the failed allocation counter is strictly positive.
This commit removes the ability to specify a custom plugins
path. Instead, the plugins path will always be a subdirectory called
"plugins" off of the home directory.
This commit simplifies the delayed shard allocation implementation by assigning clear responsibilities to the various components that are affected by delayed shard allocation:
- UnassignedInfo gets a boolean flag delayed which determines whether assignment of the shard should be delayed. The flag gets persisted in the cluster state and is thus available across nodes, i.e. each node knows whether a shard was delayed-unassigned in a specific cluster state. Before, nodes other than the current master were unaware of that information.
- This flag is initially set to true if the shard becomes unassigned due to a node leaving and the index setting index.unassigned.node_left.delayed_timeout is strictly positive. From then on, unassigned shards can only transition from delayed to non-delayed, never in the other direction.
- The reroute step is in charge of removing the delay marker (comparing timestamp when node left to current timestamp).
- A dedicated service DelayedAllocationService, reacting to cluster change events, has the responsibility to schedule reroutes to remove the delay marker.
Closes #18293
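For reference, the delay itself is controlled by the index setting named above; a minimal example of adjusting it (index name assumed):
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
    "index.unassigned.node_left.delayed_timeout" : "5m"
}'
```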
Failing the build on deprecation warnings was removed in
19b3ec88af. This commit removes the
suppressed deprecation warnings so that their use is surfaced in the
build now.
Relates #18582
`doc_values` for the _type field are created but any attempt to load them throws an IAE.
This PR re-enables `doc_values` loading for _type. It also enables `fielddata` loading for indices created between 2.0 and 2.1, since doc_values were disabled during that period.
It also restores the old docs that give examples of how to sort or aggregate on the _type field.
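In the spirit of those restored docs, a terms aggregation on _type might look like this (index name assumed):
```BASH
$ curl -XGET 'http://localhost:9200/logs/_search' -d '{
    "size" : 0,
    "aggs" : {
        "types" : {
            "terms" : { "field" : "_type" }
        }
    }
}'
```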
The setting minimum_master_nodes is crucial to avoid split brains in a cluster. In order to avoid data loss, it should always be configured to at least a quorum (majority) of master-eligible nodes.
This commit adds a warning to the logs on the master node if the value is set to less than a quorum of master-eligible nodes.
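For example, with three master-eligible nodes the quorum is two; the setting can be applied dynamically (the setting name is real, the value illustrative):
```BASH
$ curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}'
```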
The 'Setting' constructor has some outdated Javadoc that suggested that it would automatically apply 'Property.NodeScope' if no scope is supplied, but no scope is added in that case.
Significant changes:
* AbstractQueryTestCase has moved to the test framework module, in order to allow query builder tests in modules and plugins
* Added support to AbstractQueryTestCase to register plugins
* Lift the restriction that only one percolator could be added per index. This validation existed in MapperService, but because the percolator moved to a module it could no longer exist there. Instead of bringing it back it was removed. This validation existed since the percolator cache only supported one percolator query per document; since the percolator cache has been removed this restriction could be removed as well.
* While moving percolator tests to the new module, also removed a couple of tests for the deprecated percolate and mpercolate APIs. These APIs are now sugar APIs for bwc and redirect to the search and msearch APIs. Some tests were still testing as if the percolate and mpercolate APIs did the percolation, but this is no longer the case and these tests could be removed.
Before, the query extraction would have been aborted and the percolator query would be marked as unknown.
This resulted in a situation where these queries always needed to be evaluated by the memory index at search time.
By adding support for this query, many more percolator query candidate hits can skip the expensive memory index verification step. For example, the `match` query parser returns a MatchNoDocsQuery if the query terms are removed by text analysis (e.g. the query text only contained stop words).
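The zero-token case is easy to reproduce with the analyze API (a sketch; the built-in `stop` analyzer removes all three words):
```BASH
$ curl -XGET 'http://localhost:9200/_analyze' -d '{
    "analyzer" : "stop",
    "text" : "the and or"
}'
```
The token list comes back empty, so a match query on that text parses to a MatchNoDocsQuery instead of an unknown percolator query.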
Remove the arbitrary limit on epoch_millis and epoch_seconds of 13 and 10
characters, respectively. Instead allow any character combination that can
be converted to a Java Long.
Update the docs to reflect this change.
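As a quick illustration of what now parses (index, type and field names are assumptions), a 14-character epoch_millis value is accepted as long as it converts to a Long:
```BASH
$ curl -XPUT 'http://localhost:9200/logs' -d '{
    "mappings" : {
        "doc" : {
            "properties" : {
                "timestamp" : { "type" : "date", "format" : "epoch_millis" }
            }
        }
    }
}'
$ curl -XPUT 'http://localhost:9200/logs/doc/1' -d '{ "timestamp" : 10000000000000 }'
```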
This commit is a slight refactoring of the use of environment variables
in replacing property placeholders. In commit
115f983827 the constructor for
Settings.Builder was made package visible to provide a hook for tests to
mock obtaining environment variables. But we do not need to go that far
and can instead provide a small hook for this for tests without opening
up the constructor. Thus, in this commit we refactor
Settings.Builder#replacePropertyPlaceholders to a package-visible method
that accepts a function providing environment variables by names. The
public-visible method just delegates to this method passing in
System::getenv and tests can use the package-visible method to mock the
behavior they need without relying on external environment variables.
This change makes ES compile with java9 again, build 118.
* There are a handful of changes due to failure to determine types during compile.
* The attachment plugins which use tika needed to have tika upgraded in order to pick up fixes there for java 9.
* azure discovery and s3 repository indirectly depend on jaxb, which is no longer in the default modules. They now add a jaxb dependency externally, and make JarHell allow for this package.
This commit modifies the settings test for environment variables
placeholders so that it is reproducible. The underlying issue is that
the set of environment variables from system to system can vary, and
this means that we can never be sure that a failing test will be
reproducible. This commit simplifies this test to not rely on external
forces that could influence reproducibility.
Relates #18501
This removes the ScriptMode class entirely, which was an enum with two
options (ON and OFF) which essentially boiled down to true and false.
Now the boolean values are used instead.
Today if a shard fails during the initialization phase due to misconfiguration, broken disks,
missing analyzers, not installed plugins etc., elasticsearch keeps on trying to initialize
or rather allocate that shard. Yet, in the worst case scenario this ends in an endless
allocation loop. To prevent this loop and all its side effects, like spamming log files over
and over again, this commit adds an allocation decider that stops allocating a shard that
failed to allocate more than N times in a row. The number of retries can be configured via
`index.allocation.max_retry` and its default is set to `5`. Once the setting is updated,
shards with fewer failures than the number set per index will be allowed to allocate again.
Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard
has been started.
Relates to #18417
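A sketch of raising that limit so a previously failed shard is allowed to allocate again, using the setting name exactly as given above (index name assumed):
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
    "index.allocation.max_retry" : 10
}'
```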
Today when sending a REST error to a client, we send the decoded
path. But decoding that path can already be the cause of the error in
which case decoding it again will just throw an exception leading to us
never sending an error back to the client. It would be better to send
the entire raw path to the client and that is what we do in this commit.
Relates #18477
This test fails spuriously in CI and is not reproducible locally.
With this commit we temporarily increase the log level in a few
packages that are suspected to reveal the cause.
This commit fixes a test bug in the scaling thread pool configuration
test. In particular, the test randomization could select min and max for
a thread pool configuration where both are equal to zero. This is a
violation of the requirements of the ThreadPoolExecutor. With this
commit, we now ensure that the max is bounded below by one.
Before 5.0 it was required that the percolator queries were cached in jvm heap as Lucene queries, for two reasons:
1) Performance. The percolator evaluated all percolator queries all the time. There was no pre-selecting of queries that are likely to match like we have today.
2) Updates made to percolator queries were visible in realtime. Today these changes are visible in near realtime, so updating no longer requires the percolator to have the queries in jvm heap.
So having the percolator queries in jvm heap via the percolator cache is now less attractive, especially when there are many percolator queries, as these queries can consume many GBs of jvm heap.
Removing the percolator cache does make the percolate query slower compared to the execution time in 5.0.0-alpha1 and alpha2, but it is still faster compared to 2.x and before.
Currently the query builders expose the clauses of the span
query as a modifiable list. Instead we should make that
getter return an unmodifiable list. Also renaming the method
used to add a clause from `clause(spanQuery)` to
`addClause(spanQuery)`.
#18360 introduced an extra lock in order to allow writes while syncing the translog. This caused a potential deadlock with snapshotting code where we first acquire the instance lock, followed by a sync (which acquires the syncLock). However, the sync logic acquires the syncLock first, followed by the instance lock.
I considered solving this by not syncing the translog on snapshot - I think we can get away with just flushing it. That however would create subtleties around snapshotting and whether operations in them are persisted. I opted instead to have slightly uglier code with nested synchronized blocks, where the scope of the change is contained to the TranslogWriter class alone.
Today when parsing settings during bootstrap, we add a system property
for every Elasticsearch setting. Additionally, settings can be set via
system properties. This commit simplifies this situation.
- settings are no longer propagated to system properties
- system properties can not be used to set settings
- the "es." prefix on settings is no longer required (nor permitted)
- test logging has a dedicated system property (tests.logger.level)
Relates #18198
The preserve_original option to the ASCIIFoldingFilter doesn't
play well with the FingerprintFilter, as it ends up producing
fingerprints like:
"and consistent godel gödel is said sentence this yes"
The goal of the OpenRefine algorithm is to produce a small normalized
ASCII fingerprint. There's no need to expose preserve_original.
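For reference, the intended behaviour can be checked with the analyze API (a sketch, assuming the fingerprint analyzer is registered under that name; the sample text mirrors the example above):
```BASH
$ curl -XGET 'http://localhost:9200/_analyze' -d '{
    "analyzer" : "fingerprint",
    "text" : "Yes yes, Gödel said this sentence is consistent and."
}'
```
Without preserve_original this yields the single token "and consistent godel is said sentence this yes".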
We often require a random joda DateTimeZone in our tests. Currently
there are a few options for generating such a random DateTimeZone
from the set of available ids. Currently most random picks are not
really reproducible across different jvms because they rely on the
order of the ids in the set implementation. The helper in DateProcessorFactoryTests
thus performs a sort on the set of ids before randomly picking from
the result, so I moved this to ESTestCase to make it publicly
available and changed all other tests to use that method.