🔎 Open source distributed and RESTful search engine.
Go to file
William Brafford 9efa5be60e
Password-protected Keystore Feature Branch PR (#51123) (#51510)
* Reload secure settings with password (#43197)

If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.

* Add passphrase support to elasticsearch-keystore (#38498)

This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.

Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores

When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.

When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase

Passphrase can be set to be empty when changing/setting it for an
existing keystore

Relates to: #32691
Supersedes: #37472

* Restore behavior for force parameter (#44847)

Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.

*  Handle pwd protected keystores in all CLI tools  (#45289)

This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.

* Modify docs for setup passwords and saml metadata cli (#45797)

Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* Elasticsearch keystore passphrase for startup scripts (#44775)

This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.

The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)

In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.

In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.

A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.

* Adjust docs for password protected keystore (#45054)

This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.

* Fix failing Keystore Passphrase test for feature branch (#50154)

One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.

It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.

* Run keystore management tests on docker distros (#50610)

* Add Docker handling to PackagingTestCase

Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.

* Improve ES startup check for docker

Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.

* Test password-protected keystore with Docker (#50803)

This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.

We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.

* Add documentation for keystore startup prompting (#50821)

When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.

Co-authored-by: Lisa Cawley <lcawley@elastic.co>

* Warn when unable to upgrade keystore on debian (#51011)

For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.

* Restore handling of string input

Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null

* Apply spotless reformatting

* Use '--since' flag to get recent journal messages

When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.

Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.

It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.

Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.

* Use new journald wrapper pattern

* Update version added in secure settings request

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
2020-01-28 05:32:32 -05:00
.ci Enable tests in FIPS 140 in JDK 11 (#49485) 2020-01-27 11:14:52 +02:00
.github Add version command to issue template 2017-07-31 08:55:31 +09:00
benchmarks Apply 2-space indent to all gradle scripts (#49071) 2019-11-14 11:01:23 +00:00
buildSrc Formalize build snapshot (#51484) 2020-01-27 16:56:31 -05:00
client Enable tests in FIPS 140 in JDK 11 (#49485) 2020-01-27 11:14:52 +02:00
dev-tools Add shell script for performing atomic pushes across branches (#50401) 2019-12-19 12:55:36 -08:00
distribution Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
docs Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
gradle Upgrade gradle to 6.1.1 (#51460) 2020-01-27 11:25:05 -08:00
libs Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
licenses Reorganize license files 2018-04-20 15:33:59 -07:00
modules Disable reindex against 0.90 on mac (#51449) 2020-01-27 12:42:12 +01:00
plugins Enable tests in FIPS 140 in JDK 11 (#49485) 2020-01-27 11:14:52 +02:00
qa Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
rest-api-spec Begin moving date_histogram to offset rounding (take two) (#51271) (#51495) 2020-01-27 13:40:54 -05:00
server Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
test Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
x-pack Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
.dir-locals.el Go back to 140 column limit in .dir-locals.el 2017-04-14 08:50:53 -06:00
.eclipseformat.xml Tweak formatter config for long generic lines (#51027) 2020-01-15 13:17:37 +00:00
.editorconfig Remove default indent from .editorconfig (#49183) 2019-11-18 08:05:53 +00:00
.gitattributes Add a CHANGELOG file for release notes. (#29450) 2018-04-18 07:42:05 -07:00
.gitignore Add generated benchmark files to gitignore (#51000) 2020-01-14 16:18:28 -07:00
CONTRIBUTING.md Require JDK 13 for compilation (#50004) 2019-12-11 16:29:15 -05:00
LICENSE.txt Clarify mixed license text (#45637) 2019-08-16 13:39:12 -04:00
NOTICE.txt Restore date aggregation performance in UTC case (#38221) (#38700) 2019-02-11 16:30:48 +03:00
README.asciidoc [DOCS] Convert main README to asciidoc (#50303) (#50384) 2019-12-19 12:58:22 -05:00
TESTING.asciidoc Detail the IDEs options for configuring the debug step (#48507) 2019-10-25 17:27:48 +03:00
Vagrantfile Password-protected Keystore Feature Branch PR (#51123) (#51510) 2020-01-28 05:32:32 -05:00
build.gradle Format projects under :distribution:tools (#51292) 2020-01-22 11:19:17 +00:00
gradle.properties Testclusters: improove timeout handling (#43440) 2019-07-01 11:39:53 +03:00
gradlew Upgrade to Gradle 6.0 (#49211) (#49994) 2019-12-09 11:34:35 -08:00
gradlew.bat Upgrade to Gradle 5.5 (#43788) (#43832) 2019-07-01 11:54:58 -07:00
settings.gradle Remove UBI-based Docker images (#50747) 2020-01-08 20:30:12 -05:00

README.asciidoc

= Elasticsearch

== A Distributed RESTful Search Engine

=== https://www.elastic.co/products/elasticsearch[https://www.elastic.co/products/elasticsearch]

Elasticsearch is a distributed RESTful search engine built for the cloud. Features include:

* Distributed and Highly Available Search Engine.
** Each index is fully sharded with a configurable number of shards.
** Each shard can have one or more replicas.
** Read / Search operations performed on any of the replica shards.
* Multi Tenant.
** Support for more than one index.
** Index level configuration (number of shards, index storage, ...).
* Various set of APIs
** HTTP RESTful API
** All APIs perform automatic node operation rerouting.
* Document oriented
** No need for upfront schema definition.
** Schema can be defined for customization of the indexing process.
* Reliable, Asynchronous Write Behind for long term persistency.
* (Near) Real Time Search.
* Built on top of Apache Lucene
** Each shard is a fully functional Lucene index
** All the power of Lucene easily exposed through simple configuration / plugins.
* Per operation consistency
** Single document level operations are atomic, consistent, isolated and durable.

== Getting Started

First of all, DON'T PANIC. It will take 5 minutes to get the gist of what Elasticsearch is all about.

=== Installation

* https://www.elastic.co/downloads/elasticsearch[Download] and unpack the Elasticsearch official distribution.
* Run `bin/elasticsearch` on Linux or macOS. Run `bin\elasticsearch.bat` on Windows.
* Run `curl -X GET http://localhost:9200/`.
* Start more servers ...

=== Indexing

Let's try and index some twitter like information. First, let's index some tweets (the `twitter` index will be created automatically):

----
curl -XPUT 'http://localhost:9200/twitter/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T13:12:00",
    "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'

curl -XPUT 'http://localhost:9200/twitter/_doc/3?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "elastic",
    "post_date": "2010-01-15T01:46:38",
    "message": "Building the site, should be kewl"
}'
----

Now, let's see if the information was added by GETting it:

----
curl -XGET 'http://localhost:9200/twitter/_doc/1?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/2?pretty=true'
curl -XGET 'http://localhost:9200/twitter/_doc/3?pretty=true'
----

=== Searching

Mmm search..., shouldn't it be elastic?
Let's find all the tweets that `kimchy` posted:

----
curl -XGET 'http://localhost:9200/twitter/_search?q=user:kimchy&pretty=true'
----

We can also use the JSON query language Elasticsearch provides instead of a query string:

----
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match" : { "user": "kimchy" }
    }
}'
----

Just for kicks, let's get all the documents stored (we should see the tweet from `elastic` as well):

----
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'
----

We can also do range search (the `post_date` was automatically identified as date)

----
curl -XGET 'http://localhost:9200/twitter/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "range" : {
            "post_date" : { "from" : "2009-11-15T13:00:00", "to" : "2009-11-15T14:00:00" }
        }
    }
}'
----

There are many more options to perform search, after all, it's a search product no? All the familiar Lucene queries are available through the JSON query language, or through the query parser.

=== Multi Tenant - Indices

Man, that twitter index might get big (in this case, index size == valuation). Let's see if we can structure our twitter system a bit differently in order to support such large amounts of data.

Elasticsearch supports multiple indices. In the previous example we used an index called `twitter` that stored tweets for every user.

Another way to define our simple twitter system is to have a different index per user (note, though that each index has an overhead). Here is the indexing curl's in this case:

----
curl -XPUT 'http://localhost:9200/kimchy/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T13:12:00",
    "message": "Trying out Elasticsearch, so far so good?"
}'

curl -XPUT 'http://localhost:9200/kimchy/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
    "user": "kimchy",
    "post_date": "2009-11-15T14:12:12",
    "message": "Another tweet, will it be indexed?"
}'
----

The above will index information into the `kimchy` index. Each user will get their own special index.

Complete control on the index level is allowed. As an example, in the above case, we might want to change from the default 1 shards with 1 replica per index, to 2 shards with 1 replica per index (because this user tweets a lot). Here is how this can be done (the configuration can be in yaml as well):

----
curl -XPUT http://localhost:9200/another_user?pretty -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index.number_of_shards" : 2,
        "index.number_of_replicas" : 1
    }
}'
----

Search (and similar operations) are multi index aware. This means that we can easily search on more than one
index (twitter user), for example:

----
curl -XGET 'http://localhost:9200/kimchy,another_user/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'
----

Or on all the indices:

----
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "match_all" : {}
    }
}'
----

And the cool part about that? You can easily search on multiple twitter users (indices), with different boost levels per user (index), making social search so much simpler (results from my friends rank higher than results from friends of my friends).

=== Distributed, Highly Available

Let's face it, things will fail....

Elasticsearch is a highly available and distributed search engine. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 1 shard and 1 replica per shard (1/1). There are many topologies that can be used, including 1/10 (improve search performance), or 20/1 (improve indexing performance, with search executed in a map reduce fashion across shards).

In order to play with the distributed nature of Elasticsearch, simply bring more nodes up and shut down nodes. The system will continue to serve requests (make sure you use the correct http port) with the latest data indexed.

=== Where to go from here?

We have just covered a very small portion of what Elasticsearch is all about. For more information, please refer to the http://www.elastic.co/products/elasticsearch[elastic.co] website. General questions can be asked on the https://discuss.elastic.co[Elastic Forum] or https://ela.st/slack[on Slack]. The Elasticsearch GitHub repository is reserved for bug reports and feature requests only.

=== Building from Source

Elasticsearch uses https://gradle.org[Gradle] for its build system.

In order to create a distribution, simply run the `./gradlew assemble` command in the cloned directory.

The distribution for each project will be created under the `build/distributions` directory in that project.

See the xref:TESTING.asciidoc[TESTING] for more information about running the Elasticsearch test suite.

=== Upgrading from older Elasticsearch versions

In order to ensure a smooth upgrade process from earlier versions of Elasticsearch, please see our https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-upgrade.html[upgrade documentation] for more details on the upgrade process.