Commit Graph

45 Commits

Author SHA1 Message Date
David Turner bd41150338 Move 'lost cluster state updates' issue to DONE (#36959)
Relates #34714.
2018-12-24 09:29:08 +00:00
David Turner e2e82039d9
Add resiliency note on replica divergence (#36960) 2018-12-23 09:50:40 +00:00
lipsill b7c0d2830a [Docs] Remove repeating words (#33087) 2018-08-28 13:16:43 +02:00
Boaz Leskes 654378f504 Resilience page - Remove 6.0.0 as a target for the discovery refactoring. (#26311) 2017-08-21 18:15:24 +02:00
Jason Tedor f4a432e456 Add note regarding out-of-sync replicas
This commit adds a note to the resiliency status page regarding the fact
that replicas can fall out of sync with the primary shard after primary
promotion occurs due to a failing primary shard.

Relates #23503
2017-03-07 14:25:23 -05:00
Boaz Leskes 75ee2bb61d Update resiliency page for the release of v5 (#21177) 2016-10-28 18:46:54 +02:00
Lee Hinman 45d4d08f32 [DOCS] Mark combinatorial explosion in aggs as 'done'
This marks the "Prevent combinatorial explosion in aggregations from
causing OOM" task as done in 5.0.0.

Relates to #8081 and #19394
2016-09-20 16:07:43 -06:00
Boaz Leskes 577dcb3237 Add current cluster state version to zen pings and use them in master election (#20384)
During a network partition, cluster state updates (like mapping changes or shard assignments) are committed if a majority of the master nodes received the update correctly. This means that the current master has access to enough nodes in the cluster to continue to operate correctly. When the network partition heals, the isolated nodes catch up with the current state and get the changes they couldn't receive before. However, if a second partition happens while the cluster is still recovering from the previous one *and* the old master ends up on the minority side, a new master may be elected that has not yet caught up. If that happens, cluster state updates can be lost.

This commit fixes 95% of this rare problem by adding the current cluster state version to `PingResponse` and using it when deciding which master to join (and thus casting the node's vote).

Note: this doesn't fully mitigate the problem, as a cluster state update issued concurrently with a network partition can be lost if the partition prevents the commit message (part of the two-phase commit of cluster state updates) from reaching any single node on the majority side *and* the partition does allow the master to acknowledge the change. We are working on a more comprehensive fix, but that requires considerable work and is targeted at 6.0.
2016-09-15 23:39:11 +02:00
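
The version-aware election described in commit 577dcb3237 above can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the `PingResponse` fields, the `pick_master` helper, and the tie-break on node id are assumptions made for illustration only. The idea shown is simply that a node prefers the candidate with the highest known cluster state version, so a node that has not caught up is less likely to win.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PingResponse:
    # Hypothetical, simplified stand-in for a zen ping response.
    node_id: str
    cluster_state_version: int
    master_eligible: bool = True


def pick_master(pings: Iterable[PingResponse],
                minimum_master_nodes: int) -> Optional[PingResponse]:
    """Return the preferred master candidate, or None if too few are visible."""
    candidates = [p for p in pings if p.master_eligible]
    if len(candidates) < minimum_master_nodes:
        return None  # not enough master-eligible nodes to hold an election
    # Highest cluster state version wins; node id breaks ties (an assumption).
    return min(candidates, key=lambda p: (-p.cluster_state_version, p.node_id))


pings = [
    PingResponse("node-a", cluster_state_version=41),  # has not caught up yet
    PingResponse("node-b", cluster_state_version=42),
    PingResponse("node-c", cluster_state_version=42),
]
chosen = pick_master(pings, minimum_master_nodes=2)
```

With these sample pings, `node-a` is passed over because its cluster state is older than that of the other two candidates.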
Ali Beyad a21dd80f1b Documentation changes for wait_for_active_shards (#19581)
Documentation changes and migration doc changes for introducing 
wait_for_active_shards and removing write consistency level.

Closes #19581
2016-08-02 09:15:01 -04:00
Yannick Welsch 7dff8fbb1d Update resiliency docs (#19303)
Adds clarifications about Jepsen tests and new section on issues with versioning.
2016-07-08 17:30:46 +02:00
Boaz Leskes 09ca6d6ed2 Add a BridgePartition to be used by testAckedIndexing (#19172)
We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which had been overlooked so far - namely a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.
2016-06-30 17:58:12 +02:00
Clinton Gormley 6b7acc0ca2 Update index.asciidoc
In-flight requests circuit breaker is done
2016-06-29 10:24:43 +02:00
Boaz Leskes 8eee28e798 Update resiliency page (#17586)
#14252, #7572, #15900, #12573, #14671, #15281 and #9126 have all been closed/merged and will be part of 5.0.0.
2016-04-07 12:17:13 +02:00
Lee Hinman 6adbbff97c Fix organization rename in all files in project
Basically a query-replace of "https://github.com/elasticsearch/" with "https://github.com/elastic/"
2016-03-03 12:04:13 -07:00
Jason Tedor aa8ee74c6c Bump Elasticsearch version to 5.0.0-SNAPSHOT
This commit bumps the Elasticsearch version to 5.0.0-SNAPSHOT in line
with the alignment of versions across the stack.

Closes #16862
2016-03-01 17:03:47 -05:00
Yannick Welsch 4ea83740a2 Updates to resiliency documentation
Closes #16658
2016-02-19 09:48:33 -08:00
Boaz Leskes 316f07743a feedback 2015-11-23 13:15:22 +01:00
Boaz Leskes c6f0798a20 Update the resiliency page to 2.0.0 2015-11-17 14:32:33 +01:00
Jason Tedor a846a257c5 Clarify stale shard issue on Resiliency page 2015-11-13 12:40:51 -05:00
Jason Tedor 7b16e82df4 Add stale shard issue to Resiliency page
This commit adds a simple description of the stale shard issue to the
Resiliency page.

Relates #14671
2015-11-11 08:30:54 -05:00
debadair c258d8c341 Removed cross document link to field data topic. 2015-10-22 11:05:43 -07:00
Jose Diaz-Gonzalez 8782c8e08d Update link to Jepsen related test class 2015-10-01 16:34:19 -04:00
Boaz Leskes d9f6e302b5 doc feedback 2015-08-28 12:31:45 +02:00
Boaz Leskes f70ed876d6 added docs 2015-08-28 12:31:45 +02:00
Boaz Leskes 1e35bf3171 Discovery: wait on incoming joins before electing local node as master
During master election each node pings in order to discover other nodes and validate the liveness of existing nodes. Based on this information the node either discovers an existing master or, if enough nodes are found (based on `discovery.zen.minimum_master_nodes`), elects a new master.

Currently, the node that is elected as master immediately updates the cluster state to indicate the result of the election. Other nodes then submit a join request to the newly elected master node. Instead of immediately processing the election result, the elected master node should wait for the incoming joins from other nodes, thus validating that the election result is properly applied. As soon as enough nodes have sent their join requests (based on the `minimum_master_nodes` setting), the cluster state is modified.

Note that if `minimum_master_nodes` is not set, this change has no effect.

Closes #12161
2015-07-15 07:43:49 +02:00
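
The "wait on incoming joins" behaviour described in commit 1e35bf3171 above can be sketched as follows. This is a simplified illustration, not the actual discovery code: the `PendingElection` class and its methods are hypothetical, and counting the local node as one of the required votes is an assumption. The elected node only treats the election as won (and would only then modify the cluster state) once enough distinct joins have arrived.

```python
from typing import Set


class PendingElection:
    """Hypothetical tracker for joins received by a tentatively elected master."""

    def __init__(self, local_node: str, minimum_master_nodes: int) -> None:
        self.local_node = local_node
        self.minimum_master_nodes = minimum_master_nodes
        self.joins: Set[str] = set()  # distinct nodes that asked to join

    def on_join(self, node_id: str) -> bool:
        """Record a join request; return True once the election is confirmed."""
        self.joins.add(node_id)
        return self.is_won()

    def is_won(self) -> bool:
        # Assumption: the local node counts as one vote for itself.
        voters = self.joins | {self.local_node}
        return len(voters) >= self.minimum_master_nodes


election = PendingElection("node-a", minimum_master_nodes=3)
first = election.on_join("node-b")   # still waiting for one more join
repeat = election.on_join("node-b")  # a duplicate join is not counted twice
second = election.on_join("node-c")  # enough joins: safe to publish the result
```

With a threshold of 1 (standing in here for an unset `minimum_master_nodes`), `is_won()` is true immediately, which matches the commit's note that the change has no effect in that case.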
Clinton Gormley 5324855224 Merge pull request #12223 from awislowski/patch-1
Update index.asciidoc
2015-07-14 14:36:03 +02:00
Andrzej Wisłowski 7b4824f318 fix github source link 2015-07-14 10:14:15 +02:00
Andrzej Wisłowski aaea4a2f52 Update index.asciidoc 2015-07-14 09:35:15 +02:00
Clinton Gormley b89bd99cd1 Refixed bad link 2015-06-22 23:59:33 +02:00
Clinton Gormley 81d66d8ef3 Docs: Fixed bad link 2015-06-22 23:57:29 +02:00
Clinton Gormley f123a53d72 Docs: Refactored modules and index modules sections 2015-06-22 23:49:45 +02:00
Alex Chan e31049988b [Docs] Fix minor spelling errors
Closes #11320
2015-05-25 19:56:43 +02:00
Pascal Borreli af6d890ad5 Docs: Fixed typos
Closes #10973
2015-05-05 10:38:05 +02:00
Clinton Gormley c28bf3bb3f Docs: Updated elasticsearch.org links to elastic.co 2015-05-01 20:46:12 +02:00
Clinton Gormley 7c147b2db5 Doc: Updates to resiliency page for 1.5.0/1 2015-04-14 15:30:33 +02:00
Clinton Gormley 6a43ed8b28 Updated the resiliency status page for v1.4.0
Closes #9969
2015-03-03 19:50:13 +01:00
Martijn van Groningen b669e37c0b Docs: updated resilience page 2015-03-03 15:25:10 +01:00
Boaz Leskes fb2e4da56c Add #8720 to the resiliency page
Closes #9277
2015-01-16 05:24:54 -08:00
Boaz Leskes 1e87604b36 Docs: minor update to resiliency page 2014-11-05 15:00:53 +01:00
Boaz Leskes ddd06c772e Docs: updated resiliency page for 1.4.0 2014-11-05 14:50:52 +01:00
Martijn van Groningen 960fcf4e61 Update the status of the shard info header work. 2014-11-05 12:53:45 +01:00
Clinton Gormley 3267c2a2bf Docs: Updated the resiliency docs to point to the DiscoveryWithServiceDisruptions class 2014-10-02 21:08:32 +02:00
Clinton Gormley 12265aae02 Docs: Fixed issue link in doc values section of resiliency status 2014-10-02 13:34:27 +02:00
Clinton Gormley 1c7f4ca513 Updated resiliency docs to remove improve_zen branch and update link to dakrone's repo 2014-10-01 18:16:13 +02:00
Clinton Gormley fb18e2e9dd Added resiliency page to docs 2014-10-01 16:16:32 +02:00