This commit clarifies the behavior that must be adhered to by any
implementors of the BlobContainer interface. This is done through
expanded Javadocs.
Closes#18157Closes#15580
The page cache recycler has a dependency on thread pool that was there
for historical reasons but is no longer needed. This commit removes this
now unneeded dependency.
Relates #18664
Contains a number of cleanups related to recent changes in RoutingNodes:
- PR #17821 (Immutable ShardRouting) changed RoutingNode to use a map indexed by ShardId to manage ShardRouting elements. This means that we can directly select the right ShardRouting without iterating over all elements. This lets us get rid of RoutingNodeIterator and all kind of iterations all over the place.
- Second cleanup is an extension of #18390 (Expose cluster state before reroute in RoutingAllocation instead of RoutingNodes). We should not reexpose RoutingTable in RoutingNodes and only use it in the constructor. This makes it clear that the RoutingTable is only used to construct the RoutingNodes and can diverge from it afterwards (only RoutingNodes is mutable).
- Remove AllocationService.applyNewNodes() (that is already done as part of construction of RoutingNodes)
When we shrink an index we can estimate the shards size for the primary
from the source index. This is important for allocation decisions since we
should try out best to ensure we have enough space on the node we shrink the
index.
Currently we return `null` when the query in a common terms query has
zero tokens after analysis. It would be better if query builders
`toQuery()` would never return null and return a meaningful lucene
query instead. Since an ExtendedCommonTermsQuery with no terms gets
rewritten to a MatchNoDocsQuery later, it is enough to leave out the
check for zero tokens.
The fact that ip fields used a different doc values representation in 2.x causes
issues when querying 2.x and 5.0 indices in the same request. This changes 2.x
doc values on ip fields/2.x to be hidden behind binary doc values that use the
same encoding as 5.0. This way the coordinating node will be able to merge shard
responses that have different major versions.
One known issue is that this makes sorting/aggregating slower on ip fields for
indices that have been generated with elasticsearch 2.x.
This adds a low level primitive operations to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to allocated on a single node. Once the target index is initializing on the shrink node it takes a snapshot of the source index shards and copies all files into the target indices data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for optional constant time copy if hard-links are supported by the filesystem. All mappings are merged into the new indexes metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name",
"index.blocks.write" : true
}
}
```
once all shards are started on the shrink node. the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression",
"index.number_of_replicas" : 1
}
}'
```
This API will perform all needed check before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately, to monitor shrink progress the recovery API should be used since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index, if a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.
When calling `findTemplateBuilder(context, currentFieldName, "text", null)`,
elasticsearch ignores all templates that have a `match_mapping_type` set since
no dynamic type is provided (the last parameter, which is null in that case).
So this should only be called _last_. Otherwise, if a path-based template
matches, it will have precedence over all type-based templates.
Closes#18625
We have 3 evil tests for jarhell. They have been failing in java 9
because of how evil they are. The first checks the leniency we add for
jarhell in the jdk itself. This is unecessary, since if the leniency
wasn't there, we would already be failing all jarhell checks. The second
is checking the compile version is compatible with the jdk. This is
simpler since we don't need to fake the java version: we know 1.7 should
be compatibile with both java 8 and 9, so we can use that as a constant.
Finally the last test checks if the java version system property is
broken. This is simply something we should not check, we have to trust
that java specifies it correctly, and again, if it was broken, all
jarhell checks would be broken.
There is no reason to read the entire marvel hero file to test the features,
it might take several seconds to do so which is unnecessary.
This commit also splits SearchSuggestTests into core and modules/mustache
also add @Nighlty to forbidden API to make sure we don't use it since they won't run in CI these days.
Lucene SuppressForbidden is marked lucene.internal and should not be
used outside of Lucene. This commit removes the uses of this class
within Elasticsearch. Instead,
org.elasticsearch.common.SuppressForbidden should be used, which was
already the case in most places.
Modifying the translog replay to not replay again into the translog
introduced a bug for the case of multiple operations for the same
doc. Namely, since we were no longer updating the version map for each
operation, the second operation for a doc would be treated as a creation
instead of as an update. This commit fixes this bug by placing these
operations into version map. This commit includes a failing test case.
Relates #18611
Today we pull doc stats from an index reader which might not reflect reality.
IndexWriter might have merged all deletes away but due to a missing refresh
the stats are completely off. This change pulls doc stats from the IndexWriter
directly instead of relying on refreshes to run regularly. Note: Buffered deletes
are still not visible until the segments are flushed to disk.
If the relocation source fails during the relocation of a shard from one node to another, the
relocation target is currently failed as well. For replica shards this is not necessary,
however, as the actual shard recovery of the relocation target is done via the primary shard.
Our current testing for TimeUnitRoundings rounding() and nextRoundingValue()
methods that are used especially for date histograms lacked proper randomization
for time zones. We only did randomized tests for fixed offset time zones
(e.g. +01:00, -05:00) but didn't account for real world time zones with
DST transitions.
Adding those tests revealed a couple of problems with our current rounding logic.
In some cases, usually happening around those transitions, rounding a value down
could land on a value that itself is not a proper rounded value. Also sometimes
the nextRoundingValue would not line up properly with the rounded value of all
dates in the next unit interval.
This change improves the current rounding logic in TimeUnitRounding in two ways:
it makes sure that calling round(date) always returns a date that when rounded
again won't change (making round() idempotent) by handling special cases happening
during dst transitions by performing a second rounding. It also changes the
nextRoundingValue() method to itself rely on the round method to make sure we
always return rounded values for the unit interval boundaries.
Also adding tests for randomized TimeUnitRounding that assert important basic
properties the rounding values should have. For better understanding and readability
a few of the pathological edge cases are also added as a special test case.
Like on other places in the query dsl the full field name should be used.
Before this change this wasn't the case for nested inner hits when source filtering was used.
Highlighting has a workaround, which is now removed as the source of nested inner hits can only be refered by the full name.
Closes#16653
When performing a local recovery, the engine replays operations
recovered from the translog into the translog again. These duplicated
operations are eventually cleared after successful recovery on flush,
but there is no need to play these operations into the translog at
all. This commit modifies the local recovery process so as to not replay
these operations into the translog.
Relates #18547
This PR changes the InternalTestCluster to support dedicated master nodes. The creation of dedicated master nodes can be controlled using a new `supportsMasterNodes` parameter to the ClusterScope annotation. If set to true (the default), dedicated master nodes will randomly be used. If set to false, no master nodes will be created and data nodes will also be allowed to become masters. If active, test runs will either have 1 or 3 masternodes
This commit fixes an issue with the plugins directory being a symbolic
link. Namely, the install plugins command attempts to always create the
plugins directory just in case it does not exist. The JDK method used
here guarantees that the directory is created, and an exception is not
thrown if the directory could not be created because it already
exists. The problem is that this JDK method does not respect symlinks so
its internal existence checks fails, it proceeds to attempt to create
the directory, but the directory creation fails because the symlink
exists. This is documented as being not an issue. We work around this by
checking if there is a symlink where we expect the plugins directory to
be, and only attempt to create if not. We add a unit test that plugin
installation to a symlinked plugins directory works as expected.
When mocking unassigned shards which have failed with reason ALLOCATION_FAILED we
have to ensure that the failed allocation counter is strictly positive.
It looks like the readme was duplicated when plugins were merged back
into the repo. We removed all these extra files from the plugins, this
removes the remaining duplicate from core.
closes#18597
This commit removes the ability to specify a custom plugins
path. Instead, the plugins path will always be a subdirectory called
"plugins" off of the home directory.
This commit simplifies the delayed shard allocation implementation by assigning clear responsibilities to the various components that are affected by delayed shard allocation:
- UnassignedInfo gets a boolean flag delayed which determines whether assignment of the shard should be delayed. The flag gets persisted in the cluster state and is thus available across nodes, i.e. each node knows whether a shard was delayed-unassigned in a specific cluster state. Before, nodes other than the current master were unaware of that information.
- This flag is initially set as true if the shard becomes unassigned due to a node leaving and the index setting index.unassigned.node_left.delayed_timeout being strictly positive. From then on, unassigned shards can only transition from delayed to non-delayed, never in the other direction.
- The reroute step is in charge of removing the delay marker (comparing timestamp when node left to current timestamp).
- A dedicated service DelayedAllocationService, reacting to cluster change events, has the responsibility to schedule reroutes to remove the delay marker.
Closes#18293