OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Turner	8f4f844e6e	Add docs for filesystem health checks (#59134) Documents the feature and settings introduced in #52680. Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-07 14:14:58 +01:00
James Rodewig	cde5b7d2b3	[DOCS] Relocate discovery module content (#56611) (#57454) * Moves `Discovery and cluster formation` content from `Modules` to `Set up Elasticsearch`. * Combines `Adding and removing nodes` with `Adding nodes to your cluster`. Adds related redirect. * Removes and redirects the `Modules` page. * Rewrites parts of `Discovery and cluster formation` to remove `module` references and meta references to the section.	2020-06-01 14:13:13 -04:00
David Turner	69b78f7f8a	"Adding nodes" instructions only work on localhost (#52677) The introductory sections of the reference manual contain some simplified instructions for adding a node to the cluster. Unfortunately they are a little too simplified and only really work for clusters running on `localhost`. If you try to follow these instructions for a distributed cluster then the new node will, confusingly, auto-bootstrap itself into a distinct one-node cluster. Multiple nodes running on localhost is a valid config, of course, but we should spell out that these instructions are really only for experimentation and that it takes a bit more work to add nodes to a distributed cluster. This commit does so. Also, the "important config" instructions for discovery say that you MUST set `discovery.seed_hosts`, whereas in fact it is fine to ignore this setting and use a dynamic discovery mechanism instead. This commit weakens this statement and links to the docs for dynamic discovery mechanisms. Finally, this section is also overloaded with some technical details that are not important for this context and are adequately covered elsewhere, and it completely fails to note that the default discovery port is 9300. This commit addresses this.	2020-02-27 09:18:37 +00:00
David Turner	00b9098250	Ignore timeouts with single-node discovery (#52159) Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely if joining a faulty master that is too slow to respond, and `cluster.publish.timeout` to allow a faulty master to detect that it is unable to publish its cluster state updates in a timely fashion. If these timeouts occur then the node restarts the discovery process in an attempt to find a healthier master. In the special case of `discovery.type: single-node` there is no point in looking for another healthier master since the single node in the cluster is all we've got. This commit suppresses these timeouts and instead lets the node wait for joins and publications to succeed no matter how long this might take.	2020-02-11 14:15:01 +00:00
David Turner	532ade7816	More logging for slow cluster state application (#45007) Today the lag detector may remove nodes from the cluster if they fail to apply a cluster state within a reasonable timeframe, but it is rather unclear from the default logging that this has occurred and there is very little extra information beyond the fact that the removed node was lagging. Moreover the only forewarning that the lag detector might be invoked is a message indicating that cluster state publication took unreasonably long, which does not contain enough information to investigate the problem further. This commit adds a good deal more detail to make the issues of slow nodes more prominent: - after 10 seconds (by default) we log an INFO message indicating that a publication is still waiting for responses from some nodes, including the identities of the problematic nodes. - when the publication times out after 30 seconds (by default) we log a WARN message identifying the nodes that are still pending. - the lag detector logs a more detailed warning when a fatally-lagging node is detected. - if applying a cluster state takes too long then the cluster applier service logs a breakdown of all the tasks it ran as part of that process.	2019-08-01 13:20:46 +01:00
Lisa Cawley	757c6a45a0	[DOCS] Adds discovery.type (#42823) Co-Authored-By: David Turner <david.turner@elastic.co>	2019-06-05 12:37:17 -07:00
David Turner	15fd233ae3	Minor cluster coordination docs fixes (#42111) Fixes a typo and a badly-formatted warning.	2019-05-15 09:27:08 -04:00
David Turner	36a8c7aa0b	Add 'DO NOT TOUCH' warnings to disco settings docs (#41211)	2019-04-16 06:26:52 +01:00
David Turner	5a3c452480	Align docs etc with new discovery setting names (#38492) In #38333 and #38350 we moved away from the `discovery.zen` settings namespace since these settings have an effect even though Zen Discovery itself is being phased out. This change aligns the documentation and the names of related classes and methods with the newly-introduced naming conventions.	2019-02-06 11:34:38 +00:00
David Turner	3b2a0d7959	Rename no-master-block setting (#38350) Replaces `discovery.zen.no_master_block` with `cluster.no_master_block`. Any value set for the old setting is now ignored.	2019-02-05 08:47:56 +00:00
David Turner	2d114a02ff	Rename static Zen1 settings (#38333) Renames the following settings to remove the mention of `zen` in their names: - `discovery.zen.hosts_provider` -> `discovery.seed_providers` - `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers` - `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout` - `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`	2019-02-05 08:46:52 +00:00
Yannick Welsch	ece8c659c5	Decrease leader and follower check timeout (#38298) Reduces the leader and follower check timeout to 3 * 10 = 30s instead of 3 * 30 = 90s, with 30s still being a very long time for a node to be completely unresponsive.	2019-02-04 15:11:12 +01:00
Lisa Cawley	f307847f29	[DOCS] Adds overview and API ref for cluster voting configurations (#36954)	2019-01-07 09:11:14 -08:00
Lisa Cawley	33e9cf3892	[DOCS] Merges list of discovery and cluster formation settings (#36909)	2018-12-21 11:24:48 -08:00

14 Commits