Previously, we would determine index deletes in the cluster state by
comparing the index metadatas between the current cluster state and the
previous cluster state and decipher which ones were missing (the missing
ones are deleted indices). This led to a situation where a node that
went offline and rejoined the cluster could potentially cause dangling
indices to be imported which should have been deleted, because when a node
rejoins, its previous cluster state does not contain reliable state.
This commit introduces the notion of index tombstones in the cluster
state, where we are explicit about which indices have been deleted.
In the case where the previous cluster state is not useful for index
metadata comparisons, a node now determines which indices are to be
deleted based on these tombstones in the cluster state. There is also
functionality to purge the tombstones after exceeding a certain amount.
Closes#17265Closes#16358Closes#17435
When I pulled on the thread that is "Remove PROTOTYPEs from
SignificanceHeuristics" I ended up removing SignificanceHeuristicStreams
and replacing it with readNamedWriteable. That seems like a lot at once
but it made sense at the time. And it is what we want in the end, I think.
Anyway, this also converts registration of SignificanceHeuristics to
use ParseFieldRegistry to make them consistent with Queries, Aggregations
and lots of other stuff.
Adds a new and wonderous hack to support serialization checking of
NamedWriteables registered by plugins!
Related to #17085
This commit contains the following improvements/fixes:
1. Renaming method names and variables to better reflect the purpose
of the method and the semantics of the variable.
2. For deleting indexes, replace the closed parameter passed to the
delete index/store methods with obtaining the index's state from the
IndexSettings that is already passed in.
3. Added tests to the IndexWithShadowReplicaIT suite, some of which
show issues in the shadow replica delete process that are captured in
Github issue 17695.
Closes#17638
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.
Relates #16011
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.
By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting
network.breaker.inflight_requests.limit
Relates #16011
We have a couple places in the code base that assume that search is always done
on the inverted index. However with the new points API in Lucene 6, this is not
true anymore. This commit makes MappedFieldType.indexedValueForSearch protected
and fixes call sites to keep working for field types that use the inverted
index and either work differently ar throw an exception otherwise. For instance,
it will still be possible to run cross_fields multi match queries on numeric
fields, but the score contributions will not be blended as well as before, and
significant terms aggregations on long terms will not be possible anymore since
points do not record document frequencies.
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.
This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
Sometimes we get a test failure caused by search contexts left open.
The tests include a stack trace of the call that opened the context
but nothing else about the context. This adds more information about
the context that has been left open like what query it was running,
what shard it targeted, and whether or not it was a scroll.
Relates to #17582
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard
to reason about API responses, such as to _nodes.
With this change in place, it will never print the hostname as part of the default format, which has the
added benefit that it can be used consistently for URIs, which was not the case when the hostname might
appear at the front with "hostname/ip:port".
Aggregations need to perform instanceof calls on MappedFieldType instances in
order to know how they should be parsed or formatted. Instead, we should let
the field types provide a formatter/parser that can can be used.
This change makes the root (/) rest api delegate to a transport action to get the
data for the response. This aligns this rest api with all of the other apis, which
delegate to one or more actions.
In doing this, unit tests were added to provide coverage of the RestMainAction
and the associated classes.
* master: (156 commits)
Make JNA calls optional
Added RPM metadata
Remove PROTOTYPE from MLT.Item
Remove PROTOTYPE from VersionType
Fix mistake in TopHits change
Remove PROTOTYPEs from highlighting
Clean up some log messages
Command line arguments with comma must be quoted on windows
Cluster Health should run on applied states, even if waitFor=0 #17440
ingest: make concrete processor impl final, like all other processor concrete impls.
Improve some test method comments.
Document task id's as string in the rest spec
Replace FieldStatsProvider with a method on MappedFieldType. #17334
cleanup test
Remove MathUtils. #17454
Addressing review comments
fix javadocs
Make TranslogConfig immutable and pass TranslogGeneration as a ctor arg to Translog
[reindex] Don't get rejected
Remove redundant commit - #openTranslog() already commits in that case
...
Move translog recover outside of the engine
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
Currently our testing of parsing query builders is limited to the
default order of the parameters that each builders toXContent()
method produces. To better test real queries where the order of
parameters can be different, this change adds a helper
method to ESTestCase that takes a XContentBuilder and randomly
shuffles the order of the fields inside an object. This is
used in AbstractQueryTestCase, but it can be used in other similar
places in the future.
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
* master: (25 commits)
Replication operation that try to perform the primary phase on a replica should be retried
split long line in ConvertProcessorTests
add type conversion support to ConvertProcessor
percolator: Make explain use the two phase iterator
test: make sure we don't flush during indexing the percolator queries
Added experimental annotation to the update-by-query and reindex docs
Fixed bad YAML in reindex REST test: 50_routing.yaml
Update-by-query rest tests: fixed bad yaml and deleted a client-dependent test
Prevents exception being raised when ordering by an aggregation which wasn't collected
The reindex body is now required, which changes the exception thrown by the REST test
Docs: Included Nodes Task API and tidied reindex/update-by-query
Rename update-by-query REST tests to update_by_query
REST: The body is required in the reindex API
The source parameter should not be defined in the delete-by-query REST spec
Renamed update-by-query REST spec to update_by_query
Fix test bug in TypeQueryBuilderTests.
Add comment why it is safe to check the number of nested fields in MapperService.merge.
Automatically add a sub keyword field to string dynamic mappings. #17188
Type filters should not have a performance impact when there is a single type. #17350
Add API to explain why a shard is or isn't assigned
...
If a terms aggregation was ordered by a metric nested in a single bucket aggregator which did not collect any documents (e.g. a filters aggregation which did not match in that term bucket) an ArrayOutOfBoundsException would be thrown when the ordering code tried to retrieve the value for the metric. This fix fixes all numeric metric aggregators so they return their default value when a bucket ordinal is requested which was not collected.
Closes#17225
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.
It looks like this:
```
GET /_cluster/allocation/explain?pretty
{
"index": "only-foo",
"shard": 0,
"primary": false
}
```
Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".
The output when a shard is unassigned looks like this:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : false
},
"assigned" : false,
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-03-22T20:04:23.620Z"
},
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 0.06666675,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "NO",
"weight" : -1.3833332,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 2.3166666,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
And when the shard *is* assigned, the output looks like:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : true
},
"assigned" : true,
"assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 1.4499999,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "CURRENTLY_ASSIGNED",
"weight" : 0.0,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 3.6999998,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
Resolves#14593
* master: (419 commits)
Remove PROTOTYPE from ShapeBuilders
Take filterNodeIds into consideration while sending tasks actions requests to nodes
test: cleanup imports and method rename
Remove PROTOTYPE from SortBuilders
percolator: Add query extract support for the blended term query and the common terms query.
Don't iterate over shard routing if it's null
[TEST] Reduce size of random shapes
Add some debug logging to testPrimaryRelocationWhileIndexing
Order methods in IndicesClusterStateService according to execution
Tidied up percolator doc annotations
In cat.snapshots, repository is required
Do not retrieve all indices stats when checking for cache resets
Enforce `discovery.zen.minimum_master_nodes` is set when bound to a public ip #17288
Port Primary Terms to master #17044
Revert "Add debug logging for Vagrant upgrade test"
Ownership for data, logs, and configs for packages
add on_failure exception metadata to ingest document for verbose simulate
Revert "Merge pull request #16843 from xuzha/s3-encryption"
Update Format, add new settings into the setting test
Update and rebase the init implementation.
...
Node roles are now serialized as well, they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires to pass in attributes and roles separately.
Primary terms is a way to make sure that operations replicated from stale primary are rejected by shards following a newly elected primary.
Original PRs adding this to the seq# feature branch #14062 , #14651 . Unlike those PR, here we take a different approach (based on newer code in master) where the primary terms are stored in the meta data only (and not in `ShardRouting` objects).
Relates to #17038Closes#17044