On rare occasions, the translog replay phase of recovery may require mapping changes on the target shard. This can happen when indexing on the primary introduces new mappings while the recovery is in phase 1. If the source node processes the new mapping from the master (allowing the indexing to proceed) before the target node does, and the recovery also moves to phase 2 (translog replay) before the target node has processed that mapping, the translog operations arriving on the target node may miss the mapping changes. To protect against this, we now throw and catch an exception, so we can properly wait and retry when the next cluster state arrives.
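A minimal Java sketch of the throw-and-catch pattern described above. It rests on stated assumptions: `MappingNotReadyException`, `TranslogReplaySketch` and their methods are hypothetical names, not Elasticsearch's recovery classes, and the arrival of new cluster states is modeled as a plain list of mapping sets rather than real cluster state listeners.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical exception: a translog operation touches a field whose mapping
// has not reached the target node yet.
class MappingNotReadyException extends RuntimeException {
    final String field;

    MappingNotReadyException(String field) {
        super("mapping for [" + field + "] not yet available");
        this.field = field;
    }
}

// Hypothetical, greatly simplified recovery target.
class TranslogReplaySketch {
    final Set<String> knownMappings = new HashSet<>();
    final List<String> applied = new ArrayList<>();

    // Applies one translog operation writing to `field`; throws instead of
    // proceeding without the mapping the master has published.
    void applyOperation(String field) {
        if (!knownMappings.contains(field)) {
            throw new MappingNotReadyException(field);
        }
        applied.add(field);
    }

    // Replays operations in order. On MappingNotReadyException, "waits" for the
    // next cluster state (the next entry of clusterStates), merges its mappings
    // and retries the same operation. Returns how many cluster states were awaited.
    int replay(List<String> ops, List<Set<String>> clusterStates) {
        int statesConsumed = 0;
        int i = 0;
        while (i < ops.size()) {
            try {
                applyOperation(ops.get(i));
                i++;
            } catch (MappingNotReadyException e) {
                if (statesConsumed == clusterStates.size()) {
                    throw new IllegalStateException("gave up waiting: " + e.getMessage(), e);
                }
                knownMappings.addAll(clusterStates.get(statesConsumed++));
            }
        }
        return statesConsumed;
    }
}
```

In this sketch the failed operation is retried unchanged once a newer cluster state has been applied, so the replay loop itself never has to decide what the mapping should be.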
Closes #11281
Closes #11363
In order to wait for events of a certain priority to pass, TransportClusterHealthAction submits a cluster state update task. If the current master steps down while this task is in the queue, the task fails, causing the cluster health request to report an unexpected error.
We often use this request to ensure cluster stability in tests after a disruption. However, depending on the nature of the failure, it may happen (if we're unfortunate) that two master election rounds are needed. The issue above causes the health request to fail after the first round. Instead, we should wait for a new master to be elected (or for the local node to be re-elected).
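A minimal sketch of the "wait for a new master instead of failing" idea, assuming hypothetical names throughout: `MasterSteppedDownException`, `HealthWaitSketch.runWithMasterRetry`, and a `BooleanSupplier` that stands in for blocking until a master is (re-)elected. It is not the actual TransportClusterHealthAction code.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical: thrown when the master processing the task steps down.
class MasterSteppedDownException extends RuntimeException {
    MasterSteppedDownException(String message) {
        super(message);
    }
}

class HealthWaitSketch {
    // Runs submitTask; if it fails because the master stepped down, waits for a
    // (possibly re-)elected master and resubmits, tolerating up to maxElections
    // election rounds before giving up.
    static <T> T runWithMasterRetry(Supplier<T> submitTask, BooleanSupplier awaitNewMaster, int maxElections) {
        for (int elections = 0; ; elections++) {
            try {
                return submitTask.get();
            } catch (MasterSteppedDownException e) {
                // Give up after too many elections, or if no master appeared in time.
                if (elections >= maxElections || !awaitNewMaster.getAsBoolean()) {
                    throw e;
                }
            }
        }
    }
}
```

Bounding the number of tolerated elections keeps a permanently unstable cluster from hanging the caller, while still covering the two-round case described above.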
Closes #11493
Core: time-duration and byte-sized settings now require explicit units.
On upgrade, if there are any cluster or index settings that are missing units, a warning is logged and the default unit is applied.
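A minimal sketch of strict versus upgrade-lenient parsing of a time-duration value. `TimeSettingSketch.parseMillis` is a hypothetical helper that only understands the units `ms`, `s`, `m`, `h` and `d`; it is not Elasticsearch's `TimeValue` parser, and byte-size values would follow the same pattern with their own units.

```java
import java.util.List;
import java.util.Locale;

class TimeSettingSketch {
    // Parses a duration into milliseconds. A bare number is rejected unless
    // lenient (upgrade) mode is on, in which case a warning is recorded and
    // defaultUnit is applied.
    static long parseMillis(String raw, String setting, boolean lenient, String defaultUnit, List<String> warnings) {
        String v = raw.trim().toLowerCase(Locale.ROOT);
        // "ms" must be checked before "s" and "m".
        if (v.endsWith("ms")) return number(v, 2);
        if (v.endsWith("s")) return number(v, 1) * 1_000L;
        if (v.endsWith("m")) return number(v, 1) * 60_000L;
        if (v.endsWith("h")) return number(v, 1) * 3_600_000L;
        if (v.endsWith("d")) return number(v, 1) * 86_400_000L;
        if (!v.matches("\\d+")) {
            throw new IllegalArgumentException("unknown unit in [" + raw + "] for setting [" + setting + "]");
        }
        // A bare number: the unit is missing.
        if (!lenient) {
            throw new IllegalArgumentException("setting [" + setting + "] with value [" + raw + "] is missing a unit");
        }
        warnings.add("setting [" + setting + "] with value [" + raw + "] is missing units; assuming [" + defaultUnit + "]");
        return parseMillis(v + defaultUnit, setting, false, defaultUnit, warnings);
    }

    private static long number(String v, int suffixLength) {
        return Long.parseLong(v.substring(0, v.length() - suffixLength));
    }
}
```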
Closes #7616
Closes #10888
Previously, AggregationBuilder would wrap binary aggregations in an `aggregations` object, which broke parsing. This has been fixed so that normally specified aggregations are wrapped in an `aggregations` object; binary aggregations with the same XContentType as the builder use an `aggregations` field name with aggregationsBinary as the value (this renders the same as normal aggregations); and binary aggregations with a different XContentType from the builder use an `aggregations_binary` field name, with aggregationsBinary added as a binary value.
Additionally, the logic in AggregationParsers had to be changed, as it previously did not parse `aggregations_binary` fields in sub-aggregations. A check has been added for the `aggregations_binary` field name, and the binaryValue of this field is used to create a new parser and build the correct AggregatorFactories.
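A minimal sketch of the field-name decision on the builder side and the matching dispatch on the parser side. `ContentType`, `AggregationsFieldSketch` and `SubAggHandling` are hypothetical stand-ins, not Elasticsearch's XContentType or AggregationBuilder APIs; only the two field names come from the text above.

```java
// Hypothetical stand-in for the content types a builder or a binary payload can have.
enum ContentType { JSON, SMILE, YAML, CBOR }

class AggregationsFieldSketch {
    // Builder side: which field name to emit for the aggregations.
    static String fieldNameFor(boolean binary, ContentType builderType, ContentType aggsType) {
        if (!binary) {
            return "aggregations";        // normal aggregations, wrapped in an object
        }
        if (builderType == aggsType) {
            return "aggregations";        // same content type: aggregationsBinary used directly as the value
        }
        return "aggregations_binary";     // different content type: emitted as a binary value
    }

    // Parser side: how a field found inside an aggregation should be handled.
    enum SubAggHandling { NOT_SUB_AGGS, PARSE_INLINE, PARSE_WITH_NEW_PARSER }

    static SubAggHandling handlingFor(String fieldName) {
        switch (fieldName) {
            case "aggregations":
                return SubAggHandling.PARSE_INLINE;
            case "aggregations_binary":
                // The binary value is decoded with a fresh parser for its own content type.
                return SubAggHandling.PARSE_WITH_NEW_PARSER;
            default:
                return SubAggHandling.NOT_SUB_AGGS;
        }
    }
}
```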
Closes #11457
To better distribute the memory allocated to indexing, the IndexingMemoryController periodically checks the different shards for their last indexing activity. If no activity has happened for a while, the controller marks the shard as inactive and reallocates its memory buffer budget (all but a small minimal budget) to other active shards. The recently added synced flush feature (#11179, #11336) uses this inactivity as a trigger to attempt adding a sync id marker (which will speed up future recoveries).
We wait for 30m before declaring a shard inactive. However, these days the operation just requires a refresh and is light. We can be stricter (5m) and increase the chance that a synced flush will be triggered.
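A minimal sketch of the budget reallocation, assuming a hypothetical `IndexingBufferSketch.allocate` that receives each shard's last indexing time: shards idle for at least the 5m threshold keep only a small minimal budget, and the rest of the total is split evenly among active shards. The real controller's accounting may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexingBufferSketch {
    // The new inactivity threshold from this commit (previously 30m).
    static final long INACTIVE_AFTER_MILLIS = 5 * 60 * 1000L;

    static boolean isActive(long lastIndexingMillis, long nowMillis) {
        return nowMillis - lastIndexingMillis < INACTIVE_AFTER_MILLIS;
    }

    // Returns the indexing buffer (in bytes) assigned to each shard.
    static Map<String, Long> allocate(Map<String, Long> lastIndexingMillis, long nowMillis,
                                      long totalBytes, long inactiveBytes) {
        int active = 0;
        for (long last : lastIndexingMillis.values()) {
            if (isActive(last, nowMillis)) active++;
        }
        int inactive = lastIndexingMillis.size() - active;
        // Inactive shards keep only the minimal budget; active shards share the rest.
        long perActive = active == 0 ? 0 : (totalBytes - inactive * inactiveBytes) / active;
        Map<String, Long> budget = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : lastIndexingMillis.entrySet()) {
            budget.put(e.getKey(), isActive(e.getValue(), nowMillis) ? perActive : inactiveBytes);
        }
        return budget;
    }
}
```

A shorter threshold simply means more shards reach the inactive branch sooner, which is the point at which a synced flush can be attempted.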
Closes #11479
When a snapshot operation on a particular shard finishes, the data node where this shard resides sends an update shard status request to the master node to indicate that the operation on the shard is done. When the master node receives the command, it queues a cluster state update task and acknowledges receipt of the command to the data node.
The update snapshot shard status tasks have relatively low priority, so during cluster instability they tend to get stuck at the end of the queue. If the master node is restarted before processing these tasks, the information about the shards can be lost, and the new master assumes that they are still in progress while the data node thinks that these shards are already done.
This commit adds a retry mechanism that compares the cluster state of a newly elected master with the current state of the snapshot shards and, if needed, updates the cluster state on the master again.
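A minimal sketch of the comparison step, assuming a hypothetical `ShardSnapshotStage` enum and `SnapshotStatusResendSketch.shardsToResend` helper: a shard is re-reported when the data node considers it finished but the new master's cluster state still shows it as running. How the update is actually re-sent is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical per-shard snapshot stages.
enum ShardSnapshotStage { INIT, STARTED, DONE, FAILED }

class SnapshotStatusResendSketch {
    // masterView: shard -> stage in the newly elected master's cluster state.
    // localView:  shard -> stage as known by the data node.
    // Returns the shards whose final status must be sent to the master again.
    static List<String> shardsToResend(Map<String, ShardSnapshotStage> masterView,
                                       Map<String, ShardSnapshotStage> localView) {
        List<String> resend = new ArrayList<>();
        for (Map.Entry<String, ShardSnapshotStage> e : localView.entrySet()) {
            ShardSnapshotStage local = e.getValue();
            ShardSnapshotStage onMaster = masterView.get(e.getKey());
            boolean locallyFinished = local == ShardSnapshotStage.DONE || local == ShardSnapshotStage.FAILED;
            boolean masterThinksRunning = onMaster == ShardSnapshotStage.INIT || onMaster == ShardSnapshotStage.STARTED;
            if (locallyFinished && masterThinksRunning) {
                resend.add(e.getKey());
            }
        }
        return resend;
    }
}
```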
Closes #11314
This is happening because #4074 started requiring that the top-level "query" be present in delete-by-query requests, whereas before that it was required to be absent. So the translog contains a delete-by-query (DBQ) without a top-level "query", and when we try to parse it we hit this exception.
This commit adds special handling for pre-1.0.0 indices: if we hit a parse exception, we
try to reparse without a top-level query object, to stay backwards compatible for these indices.
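A minimal sketch of the fallback, with the request source modeled as a `Map` instead of an XContent parser. `DeleteByQueryReplaySketch` and the `indexCreatedBefore1_0_0` flag are hypothetical; the real code checks the index's creation version and works on parsed XContent.

```java
import java.util.Map;

class DeleteByQueryReplaySketch {
    // Current format: the query must sit under a top-level "query" key.
    static Object parseStrict(Map<String, Object> source) {
        Object query = source.get("query");
        if (query == null) {
            throw new IllegalArgumentException("delete-by-query source is missing a top-level [query]");
        }
        return query;
    }

    // Translog replay: if strict parsing fails and the index predates 1.0.0,
    // reparse assuming the old format, where the whole source is the query.
    static Object parseForReplay(Map<String, Object> source, boolean indexCreatedBefore1_0_0) {
        try {
            return parseStrict(source);
        } catch (IllegalArgumentException e) {
            if (!indexCreatedBefore1_0_0) {
                throw e;
            }
            return source;
        }
    }
}
```

Limiting the fallback to old indices keeps malformed requests against newer indices failing loudly.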
Closes #10262
Conflicts:
src/main/java/org/elasticsearch/index/shard/IndexShard.java
src/test/java/org/elasticsearch/index/shard/IndexShardTests.java
This commit changes the date handling. First and foremost, Elasticsearch
no longer tries to convert every date to a Unix timestamp first and only
then use the configured date format. This now allows dates like `2015121212`
to be parsed correctly.
Instead, parsing timestamps is now explicit, via the new `epoch_second` and
`epoch_millis` date formats. This also means that the default date format is
now `epoch_millis||dateOptionalTime`, to remain backwards compatible.
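A minimal sketch of an explicit `||` format chain, where each format is tried in order and epoch parsing happens only if `epoch_millis` or `epoch_second` is listed. `DateFormatChainSketch` is hypothetical; `dateOptionalTime` is approximated with java.time's ISO local date and date-time parsers in UTC, and any other entry is treated as a java.time pattern, which differs from Elasticsearch's Joda-based formats.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class DateFormatChainSketch {
    // Tries each "||"-separated format in order and returns epoch milliseconds.
    static long parseToEpochMillis(String value, String formats) {
        for (String format : formats.split("\\|\\|")) {
            try {
                return parseWith(value, format);
            } catch (NumberFormatException | DateTimeParseException e) {
                // Not this format; try the next one.
            }
        }
        throw new IllegalArgumentException("failed to parse [" + value + "] with [" + formats + "]");
    }

    private static long parseWith(String value, String format) {
        switch (format) {
            case "epoch_millis":
                return Long.parseLong(value);
            case "epoch_second":
                return Math.multiplyExact(Long.parseLong(value), 1000L);
            case "dateOptionalTime":
                // Simplification: ISO date or date-time, interpreted as UTC.
                return value.contains("T")
                        ? LocalDateTime.parse(value).toInstant(ZoneOffset.UTC).toEpochMilli()
                        : LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
            default:
                // Any other entry is a java.time pattern containing date and time fields.
                return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(format))
                        .toInstant(ZoneOffset.UTC).toEpochMilli();
        }
    }
}
```

In this sketch, `2015121212` with the pattern `yyyyMMddHH` parses as 2015-12-12T12:00Z, while under the default chain a purely numeric string is still read as epoch milliseconds first.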
Closes#5328
Relates #10971
Each shard repository consists of a snapshot file for each snapshot - this file contains a map between each original physical file that is snapshotted and its representation in the repository. This data includes the original filename, checksum and length. When a new snapshot is created, Elasticsearch needs to read all these snapshot files to figure out which files are already present in the repository and which files still have to be copied there. This change adds a new index file that combines all this information into a single file. So, if a repository has 1000 snapshots with 1000 shards, Elasticsearch will only need to read 1000 blobs (one per shard) instead of 1,000,000 to delete a snapshot. This change should also improve snapshot creation speed on repositories with a large number of snapshots and high latency.
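A minimal sketch of how a combined per-shard index answers "which files still have to be copied" from a single lookup structure, plus the blob-count arithmetic from the text. `FileInfo` and `ShardSnapshotIndexSketch` are hypothetical names, and keying files by name is a simplification of how the repository actually names blobs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-file metadata: original name, checksum and length.
final class FileInfo {
    final String name;
    final String checksum;
    final long length;

    FileInfo(String name, String checksum, long length) {
        this.name = name;
        this.checksum = checksum;
        this.length = length;
    }

    // Treated as identical content if checksum and length both match.
    boolean isSameContent(FileInfo other) {
        return checksum.equals(other.checksum) && length == other.length;
    }
}

class ShardSnapshotIndexSketch {
    // Combined index: files referenced by any snapshot of this shard.
    private final Map<String, FileInfo> filesByName = new HashMap<>();

    // Records the files of one snapshot in the combined index.
    void addSnapshot(List<FileInfo> snapshotFiles) {
        for (FileInfo f : snapshotFiles) {
            filesByName.put(f.name, f);
        }
    }

    // Names of local files not already stored with identical content.
    List<String> filesToCopy(List<FileInfo> localFiles) {
        List<String> toCopy = new ArrayList<>();
        for (FileInfo local : localFiles) {
            FileInfo stored = filesByName.get(local.name);
            if (stored == null || !stored.isSameContent(local)) {
                toCopy.add(local.name);
            }
        }
        return toCopy;
    }

    // Blobs read per operation: one per shard with the combined index, versus
    // one per (shard, snapshot) pair when every snapshot file must be read.
    static long blobReads(long shards, long snapshots, boolean combinedIndex) {
        return combinedIndex ? shards : shards * snapshots;
    }
}
```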
Fixes #8958