By default, tasks are grouped by node. However, task execution in elasticsearch can be quite complex and an individual task that runs on a coordinating node can have many subtasks running on other nodes in the cluster. This commit makes it possible to list task grouped by common parents instead of by node. When this option is enabled all subtask are grouped under the coordinating node task that started all subtasks in the group. To group tasks by common parents, use the following syntax:
GET /tasks?group_by=parents
This commit adds the new `action.search.shard_count.limit` setting which
configures the maximum number of shards that can be queried in a single search
request. It has a default value of 1000.
Both top level and inline inner hits are now covered by InnerHitBuilder.
Although there are differences between top level and inline inner hits,
they now make use of the same builder logic.
The parsing of top level inner hits slightly changed to be more readable.
Before the nested path or parent/child type had to be specified as encapsuting
json object, now these settings are simple fields. Before this was required
to allow streaming parsing of inner hits without missing contextual information.
Once some issues are fixed with inline inner hits (around multi level hierachy of inner hits),
top level inner hits will be deprecated and removed in the next major version.
Today the basic node settings like `node.data` and `node.master` can't really be fully validated
since we allow to specify custom user attributes on the node level. We have to, in order to
support that, add a wildcard setting for `node.*` to let these setting pass validation.
Instead we should require a more contraint prefix like `node.attr.` that defines a namespace
that is reserved for user attributes.
This commit adds a new namespace for attributes in `node.attr`.
Closes#17280
This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable
Closes#11443
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.
It looks like this:
```
GET /_cluster/allocation/explain?pretty
{
"index": "only-foo",
"shard": 0,
"primary": false
}
```
Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".
The output when a shard is unassigned looks like this:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : false
},
"assigned" : false,
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-03-22T20:04:23.620Z"
},
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 0.06666675,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "NO",
"weight" : -1.3833332,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 2.3166666,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
And when the shard *is* assigned, the output looks like:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : true
},
"assigned" : true,
"assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 1.4499999,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "CURRENTLY_ASSIGNED",
"weight" : 0.0,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 3.6999998,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
Resolves#14593
https://github.com/elastic/elasticsearch/pull/17288 added a check to enforce that the `discovery.zen.minimum_master_nodes` configuration is set when nodes have the `host`, `port`, or `bind_host` set in either `transport` or general `network` configuration sections. This was documented incorrectly as "nodes that are bound to a non-loopback interface", which lead to confusion as I set `network.host: "localhost"` and the check was still failing.
This change updates the docs to detail the actual check. I think it also highlights how complex the check is and the need for a simpler solution.
discovery.zen.minimum_master_nodes is the single most important setting to set on a production cluster. We have no way of supplying a good default so it must be set by the user. Binding a node to a public IP (as opposed to the default local host) is a good enough indication that a node will be part of a production cluster cluster and thus it's a good tradeoff to enforce the settings. Note that nothing prevent users from setting it to 1 in a single node cluster.
Closes#17288
We currently have a `discovery.zen.master_election.filter_client` setting that control whether their ping responses are ignored for master election (which is the current default). With the push to treat client nodes as normal nodes (and promote the transport/rest clients for client work), this should be changed. This commit remove this setting and it's companion `discovery.zen.master_election.filter_data` setting (currently defaulting to false) in favor of singe `discovery.zen.master_election.ignore_non_master_pings` setting with more intuitive name (defaulting to false).
Resolves#17325Closes#17329
The available memory metric was always set to `0` since 2.0.beta1 (bug). was left behind but never set. Turns out the section wasn't that useful, as it would only output the total memory available throughout all nodes in the cluster. We decided to remove the section then.
In #17198, we removed suggest transport action, which
used the `suggest` threadpool to execute requests. Now
`suggest` threadpool is unused and suggest requests are
executed on the `search` threadpool.
We can be better at checking `buffer_size` and `chunk_size` for S3 repositories.
For example, we know that:
* `buffer_size` should be more than `5mb`
* `chunk_size` should be no more than `5tb`
* `buffer_size` should be lower than `chunk_size`
Otherwise, setting `buffer_size` is useless.
For the record:
`chunk_size` is a Snapshot setting whatever the implementation is.
`buffer_size` is an S3 implementation setting.
Let say that you are snapshotting a 500mb file. If you set `chunk_size` to `200mb`, then Snapshot service will call S3 repository to snapshot 3 files with the following sizes:
* `200mb`
* `200mb`
* `100mb`
If you set `buffer_size` to `100mb` (AWS maximum size recommendation), the first file of `200mb` will be uploaded on S3 using the multipart feature in 2 chunks and the workflow is basically the following:
* create the multipart request and get back an `id` from AWS S3 platform
* upload part1: `100mb`
* upload part2: `100mb`
* "commit" the full upload using the `id`.
Closes#17244.