Boaz Leskes 6861d3571e Persistent Node Ids (#19140)
Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API creates new fields after every restart, because node IDs are used as keys.

The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and probably works in most cases, it is not perfect. For example, if after a restart the node is not able to bind to the same port (because the OS has not yet freed it), the node will still change identity. Also, in environments where the host IP can change due to a host restart, the identity will not be preserved.

Due to those limitations, I opted to go with a different approach where the node id is persisted in the node's data folder. This has the upside of connecting the id to the node's data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM).
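
To illustrate the idea, here is a minimal sketch of loading an id from the data folder or generating and persisting one on first start. It is not the code in this PR: the class name `PersistentNodeId`, the file name `node.id` and the use of a random UUID are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper for illustration only; not the actual NodeEnvironment logic.
final class PersistentNodeId {
    // Assumed file name; the real on-disk format is not described in this message.
    static final String ID_FILE = "node.id";

    /** Returns the id stored in dataDir, creating and persisting a new one if none exists. */
    static String loadOrGenerate(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        Path idFile = dataDir.resolve(ID_FILE);
        if (Files.exists(idFile)) {
            // A previous start already wrote an id: reuse it.
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        // First start on this data folder: generate an id and persist it.
        String id = UUID.randomUUID().toString();
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Files.createTempDirectory("node-data");
        String first = loadOrGenerate(dataDir);
        String second = loadOrGenerate(dataDir); // simulates a restart on the same data folder
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Because the id lives next to the data, moving the data folder to another host carries the identity with it.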

It does however also have downsides - we now run the risk of two nodes having the same id if someone copies a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.
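
The stricter check can be sketched roughly as follows. This is a simplified model, not the `NodeJoinController` code: the class `JoinValidator`, its methods and the use of plain strings for ids and addresses are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the join check; not the real NodeJoinController.
final class JoinValidator {
    // Node id -> published address of the nodes currently in the cluster.
    private final Map<String, String> addressById = new HashMap<>();

    /** Returns true if the join is accepted, false if it is rejected. */
    boolean tryJoin(String nodeId, String address) {
        String existing = addressById.get(nodeId);
        if (existing != null && !existing.equals(address)) {
            // Same id but a different address: possibly a copied data folder, so reject.
            return false;
        }
        addressById.put(nodeId, address);
        return true;
    }

    /** Called when an existing node stops responding to pings. */
    void removeNode(String nodeId) {
        addressById.remove(nodeId);
    }

    public static void main(String[] args) {
        JoinValidator cluster = new JoinValidator();
        System.out.println(cluster.tryJoin("node-a", "10.0.0.1:9300")); // true
        System.out.println(cluster.tryJoin("node-a", "10.0.0.2:9300")); // false, id already in use
        cluster.removeNode("node-a");                                   // old node failed its pings
        System.out.println(cluster.tryJoin("node-a", "10.0.0.2:9300")); // true on the retry
    }
}
```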

Last, and most importantly, this change requires that *all* nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.

Other less important notes:
- DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead.
- I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic.
- I removed the `version` parameter from `MetaDataStateFormat#write`, it wasn't really used and was just in the way :)
- TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
2016-07-04 21:09:25 +02:00

The Elasticsearch docs are in AsciiDoc format and can be built using the
Elasticsearch documentation build process.

See: https://github.com/elastic/docs

Snippets marked with `// CONSOLE` are automatically annotated with "VIEW IN
SENSE" in the documentation and are automatically tested by the command
`gradle :docs:check`. By default each `// CONSOLE` snippet runs as its own isolated
test. You can manipulate the test execution in the following ways:

* `// TEST`: Explicitly marks a snippet as a test. Snippets marked this way
are tests even if they don't have `// CONSOLE`.
  * `// TEST[s/foo/bar/]`: Replace `foo` with `bar` in the test. This should be
  used sparingly because it makes the test "lie". Sometimes, though, you can use
  it to make the tests more clear.
  * `// TEST[catch:foo]`: Used to expect errors in the requests. Replace `foo`
  with `request` to expect a 400 error, for example. If the snippet contains
  multiple requests then only the last request will expect the error.
  * `// TEST[continued]`: Continue the test started in the last snippet. Between
  tests the nodes are cleaned: indexes are removed, etc. Continuing skips that cleanup.
  This is really useful when you have text and snippets that work together to
  tell the story of some use case because it merges the snippets (and thus the
  use case) into one big test.
  * `// TEST[skip:reason]`: Skip this test. Replace `reason` with the actual
  reason to skip the test. Snippets without `// TEST` or `// CONSOLE` aren't
  considered tests anyway but this is useful for explicitly documenting the
  reason why the test shouldn't be run.
  * `// TEST[setup:name]`: Run some setup code before running the snippet. This
  is useful for creating and populating indexes used in the snippet. The setup
  code is defined in `docs/build.gradle`.
* `// TESTRESPONSE`: Matches this snippet against the body of the response of
  the last test. If the response is JSON then order is ignored. With
  `// TEST[continued]` you can make tests that contain multiple command snippets
  and multiple response snippets.
  * `// TESTRESPONSE[s/foo/bar/]`: Substitutions. See `// TEST[s/foo/bar/]`.
* `// TESTSETUP`: Marks this snippet as the "setup" for all other snippets in
  this file. This is a somewhat natural way of structuring documentation. You
  say "this is the data we use to explain this feature" then you add the
  snippet that you mark `// TESTSETUP` and then every snippet will turn into
  a test that runs the setup snippet first. See the "painless" docs for a file
  that puts this to good use. This is fairly similar to `// TEST[setup:name]`
  but rather than the setup defined in `docs/build.gradle` the setup is defined
  right in the documentation file.

Any place you can use JSON you can use elements like `$body.path.to.thing`
which is replaced on the fly with the contents of the thing at `path.to.thing`
in the last response.
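
As a rough illustration of what such a lookup involves, the sketch below resolves a dotted path against a response body that has already been parsed into nested maps. It is not the test runner's implementation: the class `BodyPath` and its `resolve` method are hypothetical, and array elements are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; the docs test runner's real lookup is not shown here.
final class BodyPath {
    /** Resolves a dotted path such as "path.to.thing" against a parsed JSON object. */
    static Object resolve(Map<String, Object> body, String dottedPath) {
        Object current = body;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                // A path segment tried to descend into a non-object value.
                throw new IllegalArgumentException("no value at [" + dottedPath + "]");
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Equivalent to the JSON body {"path": {"to": {"thing": "value"}}}.
        Map<String, Object> to = new HashMap<>();
        to.put("thing", "value");
        Map<String, Object> path = new HashMap<>();
        path.put("to", to);
        Map<String, Object> body = new HashMap<>();
        body.put("path", path);

        System.out.println(resolve(body, "path.to.thing")); // prints "value"
    }
}
```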