Commit Graph

6 Commits

Author SHA1 Message Date
Simon Willnauer 35e705877b Limit retries of failed allocations per index (#18467)
Today if a shard fails during initialization phase due to misconfiguration, broken disks,
missing analyzers, not installed plugins etc. elasticsaerch keeps on trying to initialize
or rather allocate that shard. Yet, in the worst case scenario this ends in an endless
allocation loop. To prevent this loop and all it's sideeffects like spamming log files over
and over again this commit adds an allocation decider that stops allocating a shard that
failed more than N times in a row to allocate. The number or retries can be configured via
`index.allocation.max_retry` and it's default is set to `5`. Once the setting is updated
shards with less failures than the number set per index will be allowed to allocate again.

Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shards
has been started.

Relates to #18417
2016-05-20 20:37:45 +02:00
Yannick Welsch d5b691b68e Extend reroute with an option to force assign stale primary shard copies
Closes #15708
2016-01-19 12:07:01 +01:00
Clinton Gormley db541d6fbe Docs: Add warning about allow_primary to the cluster reroute docs
Closes #12503
2015-08-07 12:03:19 +02:00
Lee Hinman e53a43800e Add `explain` flag support to the reroute API
By specifying the `explain` flag, an explanation for the reason a
command can or cannot be executed is returned. No allocation commands
are actually performed.

Returns a response similar to:

{
  "state": {...cluster state...},
  "acknowledged": true,
  "explanations" : [ {
    "command" : "cancel",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "allow_primary" : false
      },
      "decisions" : [ {
        "decider" : "cancel_allocation_command",
        "decision" : "YES",
        "explanation" : "..."
        } ]
     }, {
      "command" : "move",
      "parameters" : {
        "index" : "decide",
        "shard" : 0,
        "from_node" : "IvpoKRdtRiGrQ_WKtt4_4w",
        "to_node" : "IvpoKRdtRiGrQ_WKtt4_4w"
       },
       "decisions" : [ {
         "decider" : "same_shard",
         "decision" : "NO",
         "explanation" : "shard cannot be allocated on same node [IvpoKRdtRiGrQ_WKtt4_4w] it already exists on"
       },
       etc
       ]
  }]
}

also removes AllocationExplanation from cluster state

Closes #2483
Closes #5169
2014-02-27 09:48:51 -07:00
Sean Gallagher e935a301df Doc fix explaining resynchronization with the Cancel command.
Added line explaining resync process to Reroute/Cancel command.

Closes #5025
2014-02-07 17:02:36 -05:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00