The remove keystore command can handle multiple settings. In a few
places, we were not consistent about mentioning this. This commit
addreses this, in the CLI help, and the docs.
Today the keystore add-file command can only handle adding a single
setting/file pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add-file keystore command to accept adding multiple
settings in a single invocation.
The documentation was missing the long option for the force option, and
the short option for the stdin option. This commit addresses this by
adding these to the documentation.
Today the keystore add command can only handle adding a single
setting/value pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add keystore command to accept adding multiple
settings in a single invocation.
* Reload secure settings with password (#43197)
If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.
* Add passphrase support to elasticsearch-keystore (#38498)
This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.
Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores
When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.
When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase
Passphrase can be set to be empty when changing/setting it for an
existing keystore
Relates to: #32691
Supersedes: #37472
* Restore behavior for force parameter (#44847)
Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.
* Handle pwd protected keystores in all CLI tools (#45289)
This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.
* Modify docs for setup passwords and saml metadata cli (#45797)
Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* Elasticsearch keystore passphrase for startup scripts (#44775)
This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.
The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)
In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.
In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.
A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.
* Adjust docs for password protected keystore (#45054)
This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.
* Fix failing Keystore Passphrase test for feature branch (#50154)
One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.
It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.
* Run keystore management tests on docker distros (#50610)
* Add Docker handling to PackagingTestCase
Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.
* Improve ES startup check for docker
Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.
* Test password-protected keystore with Docker (#50803)
This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.
We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.
* Add documentation for keystore startup prompting (#50821)
When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
* Warn when unable to upgrade keystore on debian (#51011)
For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.
* Restore handling of string input
Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null
* Apply spotless reformatting
* Use '--since' flag to get recent journal messages
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.
Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.
It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.
Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
* Use new journald wrapper pattern
* Update version added in secure settings request
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Move metadata storage to Lucene (#50907)
Today we split the on-disk cluster metadata across many files: one file for the metadata of each
index, plus one file for the global metadata and another for the manifest. Most metadata updates
only touch a few of these files, but some must write them all. If a node holds a large number of
indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election
from succeeding.
This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version
of the following PRs that were targeting a feature branch:
* Introduce Lucene-based metadata persistence (#48733)
This commit introduces `LucenePersistedState` which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.
Relates #48701
* Remove per-index metadata without assigned shards (#49234)
Today on master-eligible nodes we maintain per-index metadata files for every
index. However, we also keep this metadata in the `LucenePersistedState`, and
only use the per-index metadata files for importing dangling indices. However
there is no point in importing a dangling index without any shard data, so we
do not need to maintain these extra files any more.
This commit removes per-index metadata files from nodes which do not hold any
shards of those indices.
Relates #48701
* Use Lucene exclusively for metadata storage (#50144)
This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds
an interoperability layer for upgrades from prior versions.
This commit disables a number of tests related to dangling indices and command-line tools.
Those will be addressed in follow-ups.
Relates #48701
* Add command-line tool support for Lucene-based metadata storage (#50179)
Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard
commands) for the Lucene-based metadata storage.
Relates #48701
* Use single directory for metadata (#50639)
Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed
though, and introduces an additional unnecessary cognitive burden to the users.
Co-Authored-By: David Turner <david.turner@elastic.co>
* Add async dangling indices support (#50642)
Adds support for writing out dangling indices in an asynchronous way. Also provides an option to
avoid writing out dangling indices at all.
Relates #48701
* Fold node metadata into new node storage (#50741)
Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source.
* Write CS asynchronously on data-only nodes (#50782)
Writes cluster states out asynchronously on data-only nodes. The main reason for writing out
the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a
bit of bootstrap validation and so that the shard recovery tools work.
Cluster states that are written asynchronously have their voting configuration adapted to a non
existing configuration so that these nodes cannot mistakenly become master even if their node
role is changed back and forth.
Relates #48701
* Remove persistent cluster settings tool (#50694)
Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on
disk cluster state in case where it contains incompatible settings that prevent the cluster from
forming.
Relates #48701
* Make cluster state writer resilient to disk issues (#50805)
Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701
* Omit writing global metadata if no change (#50901)
Uses the same optimization for the new cluster state storage layer as the old one, writing global
metadata only when changed. Avoids writing out the global metadata if none of the persistent
fields changed. Speeds up server:integTest by ~10%.
Relates #48701
* DanglingIndicesIT should ensure node removed first (#50896)
These tests occasionally failed because the deletion was submitted before the
restarting node was removed from the cluster, causing the deletion not to be
fully acked. This commit fixes this by checking the restarting node has been
removed from the cluster.
Co-authored-by: David Turner <david.turner@elastic.co>
* fix tests
Co-authored-by: David Turner <david.turner@elastic.co>
Today the `elasticsearch-shard remove-corrupted-data` tool will only truncate a
translog it determines to be corrupt. However there may be other cases in which
it is desirable to truncate the translog, for instance if an operation in the
translog cannot be replayed for some reason other than corruption. This commit
adds a `--truncate-clean-translog` option to skip the corruption check on the
translog and blindly truncate it.
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
The migrate tool was added when the native realm was created, to aid
users in converting from file realms that were per node, into the
cluster managed native realm. While this tool was useful at the time,
users should now be using the native realm directly. This commit
deprecates the tool, to be removed in a followup for 8.0.
Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used.
Co-Authored-By: David Turner <david.turner@elastic.co>
Improve the documentation of parameter --pass of elasticsearch-certutil
Backport of: #40137
Co-Authored-By: Diego Cardozo Sandrim <diegocsandrim@users.noreply.github.com>
Co-Authored-By: Vigneash Sundar <vikene@users.noreply.github.com>
This commit, mostly authored by @DaveCTurner,
adds documentation for elasticsearch-node tool #37696.
(cherry picked from commit 09425d5a5158c2d3fdad794411b3bbc4bba47b15)
* Adding new MonitoredSystem for APM server
* Teaching Monitoring template utils about APM server monitoring indices
* Documenting new monitoring index for APM server
* Adding monitoring index template for APM server
* Copy pasta typo
* Removing metrics.libbeat.config section from mapping
* Adding built-in user and role for APM server user
* Actually define the role :)
* Adding missing import
* Removing index template and system ID for apm server
* Shortening line lengths
* Updating expected number of built-in users in integration test
* Removing "system" from role and user names
* Rearranging users to make tests pass