Commit Graph

264 Commits

Author SHA1 Message Date
Lee Hinman 4a08928c47
[7.x] Add index.routing.allocation.include._tier_preference setting (#62589) (#62667)
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow them to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
2020-09-18 15:41:36 -06:00
James Rodewig e65778c222
[DOCS] Fix typo in nodes stats docs (#61601) (#61717)
Co-authored-by: Henry <henryloh@ucla.edu>
2020-08-31 09:29:50 -04:00
Lee Hinman 1bfebd54ea
[7.x] Allocate newly created indices on data_hot tier nodes (#61342) (#61650)
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
2020-08-27 13:41:12 -06:00
James Rodewig 60876a0e32
[DOCS] Replace Wikipedia links with attribute (#61171) (#61209) 2020-08-17 11:27:04 -04:00
James Rodewig 26d51089da
[DOCS] Replace `twitter` dataset in docs (#60604) (#60609) 2020-08-03 13:31:19 -04:00
Tim Brooks 85fdf959ad
Add configured indexing memory limit to node stats (#60414)
This commit adds the configured memory limit to the node stats API.
2020-07-29 12:28:21 -06:00
David Turner 9450ea08b4 Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 17:08:04 +01:00
James Rodewig aba785cb6e
[DOCS] Update my-index examples (#60132) (#60248)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 15:58:26 -04:00
Tim Brooks ba01540d7e
Implement human readable indexing pressure stats (#60058)
The indexing pressure stats do not currently have human readable
variants. This commit add human readable variants and updates the
documentation.
2020-07-22 12:07:59 -06:00
Tim Brooks ceb54ed655
Add indexing pressure documentation (#59456)
This commit adds documentation about the new indexing pressure memory
limit setting and exposure of this metrics in node stats.
2020-07-22 10:09:18 -06:00
James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
David Turner b75207a09f Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
2020-07-20 15:22:04 +01:00
James Rodewig 6ed356ffc3
[DOCS] Replace `datatype` with `data type` (#58972) (#59184) 2020-07-07 14:59:35 -04:00
James Rodewig e64fe15c1f
[DOCS] Add data streams to cluster APIs docs (#58945) (#58974)
Makes existing docs for the cluster health and cluster state APIs aware
of data streams.
2020-07-02 17:20:55 -04:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
Lisa Cawley 6680271691 [DOCS] Updates pull and issue release attributes (#58348) 2020-06-18 12:55:02 -07:00
David Turner 9b52a250f8 Add admonition to cluster state instability note (#57985)
We document that the cluster state API is an internal representation which may
change, but apparently not emphatically enough. This commit adds a `NOTE:`
admonition to this paragraph.
2020-06-11 15:32:18 +01:00
Lisa Cawley db5bf92acf
[7.x][DOCS] Replace docdir attribute with es-repo-dir (#57489) (#57494) 2020-06-01 16:42:53 -07:00
James Rodewig 12813d5918
[DOCS] Document `dynamic` and `static` setting types (#56919) (#57457) 2020-06-01 14:19:52 -04:00
Théophile Helleboid - chtitux 8a23da429a Docs fix node_id spec for secure settings reload API (#55712)
Fix docs typo for the `node_id` parameter in the secure settings reload API.
2020-05-05 11:21:02 +03:00
David Turner 69f50fe79f Improve same-shard allocation explanations (#56010)
I see occasional confusion about the explanations emitted by the same-shard
allocation decider, particularly amongst new users setting up a single-node
cluster and trying to determine why their cluster has `yellow` health. For
example:

    the shard cannot be allocated to the same node on which a copy of the shard
    already exists

This is technically correct but it's quite a complicated sentence. Also, by
starting with "the shard cannot be allocated" it makes it sound like this is
the problem, whereas in fact this message is a good thing and users should
typically focus their attention elsewhere.

This commit simplifies the wording of these messages and makes them sound more
positive, for example:

    a copy of this shard is already allocated to this node
2020-05-01 10:07:14 +01:00
Igor Motov d8f9df771d
Expose agg usage in Feature Usage API (#55732) (#56048)
Counts usage of the aggs and exposes them on the _nodes/usage/.

Closes #53746
2020-04-30 12:53:36 -04:00
David Turner 8e618fdf10 Adjust docs for voting config exclusions API (#55006)
In #50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjust the docs to match.
2020-04-20 19:47:33 +01:00
James Rodewig f87a3f0c48 [DOCS] Document analysis/mapping response for cluster stats API (#55054)
PR #51260 moved usage counts about mapping field types and analysis to
the `_cluster/stats` API.

This documents those stats in the response section of the cluster stats
API docs.
2020-04-17 08:44:10 -04:00
Nhat Nguyen 96bb1164f0 Support hierarchical task cancellation (#54757)
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.

Closes #50990
2020-04-13 12:35:21 -04:00
Ioannis Kakavas 7a8a66d9ae
[7.x] Fix ReloadSecureSettings API to consume password (#54771) (#55059)
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:

- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
2020-04-13 09:50:55 +03:00
Jason Tedor 9eeae59a83
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the codes and in the docs to adjust to the current terminology.
2020-04-10 08:48:27 -04:00
Vishal Patel 51cb0c5c7b [DOCS] Collapse nested objects in cluster reroute docs (#54851) 2020-04-09 15:29:22 -04:00
James Rodewig e9c3bfc8e5 [DOCS] Collapse nested objects in node stats API response (#54755)
Replaces dot notation with collapsed nested object formatting
per the [Elastic API reference template][0].

[0]:https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc
2020-04-06 15:19:54 -04:00
James Rodewig 548ad03941 [DOCS] Collapse nested objects in cluster stats API response (#54739)
Replaces dot notation with collapsed nested object formatting
per the [Elastic API reference template][0].

[0]:https://github.com/elastic/docs/blob/master/shared/api-ref-ex.asciidoc
2020-04-06 13:11:46 -04:00
Nhat Nguyen 2fdbed7797 Broadcast cancellation to only nodes have outstanding child tasks (#54312)
Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.

Relates #50990
Supersedes #51157

Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-04-06 11:11:29 -04:00
James Rodewig 2fdf6b2f96
[DOCS] Document missing data types for node stats API's response parameters (#53475)
Documents missing data types for several response parameters returned
by the node stats API.

Also adds several missing human-readable parameters returned by the API.
2020-03-25 08:42:58 -04:00
Tim Brooks caefa78513
Align remote info api with new settings (#54102)
Currently the remote info api has added a number of possible fields
(proxy, num_socket_connections, etc) that are available in proxy mode.
These fields are not aligned with what the settings are named. This
commit modifies this API to align with the settings.
2020-03-24 10:27:24 -06:00
Tim Brooks 531cd5aaa4
Add documentation for remote cluster proxy mode (#52779)
This is related to #49067.
2020-03-16 14:18:29 -06:00
James Rodewig 641eb69520 [DOCS] Document `nodes` cluster stats (#52813)
Documents the `nodes` response parameters returned by the
`_cluster/stats` API.

Also adds collapsible attributes for the `indices` and `nodes`
sections.
2020-03-10 05:05:05 -04:00
Nattachai Suteerapongpan 14f847cc8f [DOCS] Fix typo in task management API docs (#52881) 2020-02-27 11:31:11 -05:00
Marios Trivyzas c03f51f68f
[Docs] Clarify default value for `allow_no_indices` (#52635) (#52697)
Add default value to each one of the usages of `allow_no_indices`
since it differs between different APIs.

Relates to: #52534

(cherry picked from commit 2eb986488ac326d6da6ab8ad0203a94e08684a36)
2020-02-24 11:57:32 +01:00
James Rodewig 068181b0b6 [DOCS] Add missing `indices` parms returned by `_nodes/stats` (#52055)
Adds several human-readable `indices` parameters returned by the
`_nodes/stats` API.
2020-02-21 08:15:59 -05:00
Adrien Grand ad9d2f1922
Move analysis/mappings stats to cluster-stats. (#51875)
Closes #51138
2020-02-05 11:02:25 +01:00
William Brafford ba2810f23d
Use standard format for reload settings API (#51560) (#51828)
* Use standard format for reload settings API

The reload-secure-settings API page was not reorganized for the standard
API format, so this commit is reorganizing the page and adding some
links to the page in related documentation.

* Fix broken links

* Reorder examples to correctly check API response

* Note that only certain settings are reloadable

* [DOCS] Edits layout

* [DOCS] Removes unnecessary callouts

Co-authored-by: Lisa Cawley <lcawley@elastic.co>

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-03 18:07:26 -05:00
James Rodewig 1545c2ab26 [DOCS] Document node stats response meta (#51263)
Documents several metadata-related parameters returned by the
`GET _nodes/stats` API.
2020-02-03 08:33:57 -05:00
Henning Andersen ca8601373a [DOCS] Task management API experimental status issue (#51634)
Add issue reference to documentation.

Relates #51628
2020-01-30 14:15:47 +01:00
James Rodewig 139305ffc8 [DOCS] Document `indices` cluster stats (#50527)
Documents the header and `indices` response parameters returned by the
`_cluster/stats` API.

Co-Authored-By: David Turner <david.turner@elastic.co>
2020-01-28 11:00:00 -05:00
William Brafford 9efa5be60e
Password-protected Keystore Feature Branch PR (#51123) (#51510)
* Reload secure settings with password (#43197)

If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.

* Add passphrase support to elasticsearch-keystore (#38498)

This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.

Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores

When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.

When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase

Passphrase can be set to be empty when changing/setting it for an
existing keystore

Relates to: #32691
Supersedes: #37472

* Restore behavior for force parameter (#44847)

Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.

*  Handle pwd protected keystores in all CLI tools  (#45289)

This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.

* Modify docs for setup passwords and saml metadata cli (#45797)

Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* Elasticsearch keystore passphrase for startup scripts (#44775)

This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.

The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)

In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.

In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.

A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.

* Adjust docs for password protected keystore (#45054)

This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.

* Fix failing Keystore Passphrase test for feature branch (#50154)

One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.

It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.

* Run keystore management tests on docker distros (#50610)

* Add Docker handling to PackagingTestCase

Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.

* Improve ES startup check for docker

Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.

* Test password-protected keystore with Docker (#50803)

This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.

We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.

* Add documentation for keystore startup prompting (#50821)

When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.

Co-authored-by: Lisa Cawley <lcawley@elastic.co>

* Warn when unable to upgrade keystore on debian (#51011)

For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.

* Restore handling of string input

Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null

* Apply spotless reformatting

* Use '--since' flag to get recent journal messages

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.

Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.

* Use new journald wrapper pattern

* Update version added in secure settings request

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
2020-01-28 05:32:32 -05:00
James Rodewig b6bf64b969 [DOCS] Collapse node stats response sections (#51063)
elastic/docs#1687 added support for the `[%collapsible]` Asciidoc
attribute, which creates collapsible sections in the HTML output.

This PR makes two related changes to the nodes stats API documentation:

* Makes the response parameter sections collapsible. This allows users
  to more easily navigate the page without long walls of text.

* Reorders the response parameter sections to match the default order
  returned by the API.

Relates to #47524.
2020-01-16 13:19:29 -05:00
James Rodewig 1211772f6a [DOCS] Use same index in Cluster Allocation Explain docs (#50936)
Updates several example snippets in the Cluster Allocation Explain API
docs to consistently use the `my_index` index.

Previously, the snippets switches from `my_index` to `idx`, which could
confuse users.

Co-authored-by: Emmanuel DEMEY <demey.emmanuel@gmail.com>

Co-authored-by: Emmanuel DEMEY <demey.emmanuel@gmail.com>
2020-01-16 09:15:26 -05:00
James Rodewig a290762df1 [DOCS] Document `breakers`, `script`, and `discovery` node stats (#50509)
Documents the `breakers`, `script`, and `discovery` parameters returned
by the `_nodes/stats` API.
2020-01-14 16:51:50 -05:00
James Rodewig 1cec87c0e6 [DOCS] Document `transport` and `http` node stats (#50473)
Documents the `transport` and `http` parameters returned by the
`_nodes/stats` API.
2019-12-26 07:43:14 -05:00
James Rodewig 73ca2e5826 [DOCS] Document `thread_pool` node stats (#50330) 2019-12-18 17:02:00 -05:00
James Rodewig 249334f38d [DOCS] Document JVM node stats (#49500)
* [DOCS] Document JVM node stats

Documents the `jvm` parameters returned by the `_nodes/stats` API.

Co-Authored-By: James Baiera <james.baiera@gmail.com>
2019-12-12 15:42:18 -05:00