[[getting-started]]
|
|
= Getting started with {es}
|
|
|
|
[partintro]
|
|
--
|
|
Ready to take {es} for a test drive and see for yourself how you can use the
|
|
REST APIs to store, search, and analyze data?
|
|
|
|
Step through this getting started tutorial to:
|
|
|
|
. Get an {es} cluster up and running
|
|
. Index some sample documents
|
|
. Search for documents using the {es} query language
|
|
. Analyze the results using bucket and metrics aggregations
|
|
|
|
|
|
Need more context?
|
|
|
|
Check out the <<elasticsearch-intro,
|
|
{es} Introduction>> to learn the lingo and understand the basics of
|
|
how {es} works. If you're already familiar with {es} and want to see how it works
|
|
with the rest of the stack, you might want to jump to the
|
|
{stack-gs}/get-started-elastic-stack.html[Elastic Stack
|
|
Tutorial] to see how to set up a system monitoring solution with {es}, {kib},
|
|
{beats}, and {ls}.
|
|
|
|
TIP: The fastest way to get started with {es} is to
|
|
https://www.elastic.co/cloud/elasticsearch-service/signup[start a free 14-day
|
|
trial of {ess}] in the cloud.
|
|
--
|
|
|
|
[[getting-started-install]]
|
|
== Get {es} up and running
|
|
|
|
To take {es} for a test drive, you can create a
|
|
https://www.elastic.co/cloud/elasticsearch-service/signup[hosted deployment] on
|
|
the {ess} or set up a multi-node {es} cluster on your own
|
|
Linux, macOS, or Windows machine.
|
|
|
|
|
|
[float]
|
|
[[run-elasticsearch-local]]
|
|
=== Run {es} locally on Linux, macOS, or Windows
|
|
|
|
When you create a deployment on the {ess}, a master node and
|
|
two data nodes are provisioned automatically. By installing from the tar or zip
|
|
archive, you can start multiple instances of {es} locally to see how a multi-node
|
|
cluster behaves.
|
|
|
|
To run a three-node {es} cluster locally:
|
|
|
|
. Download the {es} archive for your OS:
|
|
+
|
|
Linux: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz[elasticsearch-{version}-linux-x86_64.tar.gz]
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-linux-x86_64.tar.gz
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
+
|
|
macOS: https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-darwin-x86_64.tar.gz[elasticsearch-{version}-darwin-x86_64.tar.gz]
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-darwin-x86_64.tar.gz
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
+
|
|
Windows:
|
|
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-{version}-windows-x86_64.zip[elasticsearch-{version}-windows-x86_64.zip]
|
|
|
|
. Extract the archive:
|
|
+
|
|
Linux:
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
tar -xvf elasticsearch-{version}-linux-x86_64.tar.gz
|
|
--------------------------------------------------
|
|
+
|
|
macOS:
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
tar -xvf elasticsearch-{version}-darwin-x86_64.tar.gz
|
|
--------------------------------------------------
|
|
+
|
|
Windows PowerShell:
|
|
+
|
|
["source","powershell",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
Expand-Archive elasticsearch-{version}-windows-x86_64.zip
|
|
--------------------------------------------------
|
|
|
|
. Start {es} from the `bin` directory:
|
|
+
|
|
Linux and macOS:
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
cd elasticsearch-{version}/bin
|
|
./elasticsearch
|
|
--------------------------------------------------
|
|
+
|
|
Windows:
|
|
+
|
|
["source","powershell",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
cd elasticsearch-{version}\bin
|
|
.\elasticsearch.bat
|
|
--------------------------------------------------
|
|
+
|
|
You now have a single-node {es} cluster up and running!
|
|
|
|
. Start two more instances of {es} so you can see how a typical multi-node
|
|
cluster behaves. You need to specify unique data and log paths
|
|
for each node.
|
|
+
|
|
Linux and macOS:
|
|
+
|
|
["source","sh",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
./elasticsearch -Epath.data=data2 -Epath.logs=log2
|
|
./elasticsearch -Epath.data=data3 -Epath.logs=log3
|
|
--------------------------------------------------
|
|
+
|
|
Windows:
|
|
+
|
|
["source","powershell",subs="attributes,callouts"]
|
|
--------------------------------------------------
|
|
.\elasticsearch.bat -E path.data=data2 -E path.logs=log2
|
|
.\elasticsearch.bat -E path.data=data3 -E path.logs=log3
|
|
--------------------------------------------------
|
|
+
|
|
The additional nodes are assigned unique IDs. Because you're running all three
|
|
nodes locally, they automatically join the cluster with the first node.
|
|
|
|
. Use the cat health API to verify that your three-node cluster is up and running.
|
|
The cat APIs return information about your cluster and indices in a
|
|
format that's easier to read than raw JSON.
|
|
+
|
|
You can interact directly with your cluster by submitting HTTP requests to
|
|
the {es} REST API. Most of the examples in this guide enable you to copy the
|
|
appropriate cURL command and submit the request to your local {es} instance from
|
|
the command line. If you have Kibana installed and running, you can also
|
|
open Kibana and submit requests through the Dev Console.
|
|
+
|
|
TIP: You'll want to check out the
|
|
https://www.elastic.co/guide/en/elasticsearch/client/index.html[{es} language
|
|
clients] when you're ready to start using {es} in your own applications.
|
|
+
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /_cat/health?v
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
+
|
|
The response should indicate that the status of the `elasticsearch` cluster
|
|
is `green` and it has three nodes:
|
|
+
|
|
[source,txt]
|
|
--------------------------------------------------
|
|
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
|
|
1565052807 00:53:27 elasticsearch green 3 3 6 3 0 0 0 0 - 100.0%
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/1565052807 00:53:27 elasticsearch/\\d+ \\d+:\\d+:\\d+ integTest/]
|
|
// TESTRESPONSE[s/3 3 6 3/\\d+ \\d+ \\d+ \\d+/]
|
|
// TESTRESPONSE[s/0 0 -/0 \\d+ -/]
|
|
// TESTRESPONSE[non_json]
|
|
+
|
|
NOTE: The cluster status will remain yellow if you are only running a single
|
|
instance of {es}. A single node cluster is fully functional, but data
|
|
cannot be replicated to another node to provide resiliency. Replica shards must
|
|
be available for the cluster status to be green. If the cluster status is red,
|
|
some data is unavailable.
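
If you would rather check the cluster status from a script than read the cat output, the following is a minimal Python sketch. It is an illustration, not part of the official tooling: it assumes a cluster listening on `http://localhost:9200`, it uses the cluster health API (`_cluster/health`) instead of the cat health API because that endpoint returns JSON, and the helper names `describe_health` and `fetch_health` are made up for this example.

```python
import json
import urllib.request


def describe_health(health):
    """Summarize a cluster health document (a dict with `status` and `number_of_nodes`)."""
    status = health["status"]
    nodes = health["number_of_nodes"]
    meaning = {
        "green": "all primary and replica shards are allocated",
        "yellow": "all primaries are allocated, but some replicas are not",
        "red": "some data is unavailable",
    }[status]
    return f"{status} ({nodes} nodes): {meaning}"


def fetch_health(base_url="http://localhost:9200"):
    """Call the cluster health API. Assumes a cluster is running at base_url."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as response:
        return json.load(response)


# With a cluster running, you could call:
#   print(describe_health(fetch_health()))
```

Keeping the interpretation (`describe_health`) separate from the network call (`fetch_health`) lets you try the logic on a hand-written dictionary before any cluster is running.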
|
|
|
|
[float]
|
|
[[gs-other-install]]
|
|
=== Other installation options
|
|
|
|
Installing {es} from an archive file enables you to easily install and run
|
|
multiple instances locally so you can try things out. To run a single instance,
|
|
you can run {es} in a Docker container, install {es} using the DEB or RPM
|
|
packages on Linux, install using Homebrew on macOS, or install using the MSI
|
|
package installer on Windows. See <<install-elasticsearch>> for more information.
|
|
|
|
[[getting-started-index]]
|
|
== Index some documents
|
|
|
|
Once you have a cluster up and running, you're ready to index some data.
|
|
There are a variety of ingest options for {es}, but in the end they all
|
|
do the same thing: put JSON documents into an {es} index.
|
|
|
|
You can do this directly with a simple PUT request that specifies
|
|
the index you want to add the document to, a unique document ID, and one or more
|
|
`"field": "value"` pairs in the request body:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT /customer/_doc/1
|
|
{
|
|
"name": "John Doe"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
This request automatically creates the `customer` index if it doesn't already
|
|
exist, adds a new document that has an ID of `1`, and stores and
|
|
indexes the `name` field.
|
|
|
|
Since this is a new document, the response shows that the result of the
|
|
operation was that version 1 of the document was created:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_index" : "customer",
|
|
"_type" : "_doc",
|
|
"_id" : "1",
|
|
"_version" : 1,
|
|
"result" : "created",
|
|
"_shards" : {
|
|
"total" : 2,
|
|
"successful" : 2,
|
|
"failed" : 0
|
|
},
|
|
"_seq_no" : 26,
|
|
"_primary_term" : 4
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/]
|
|
// TESTRESPONSE[s/"successful" : \d+/"successful" : $body._shards.successful/]
|
|
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]
|
|
|
|
|
|
The new document is available immediately from any node in the cluster.
|
|
You can retrieve it with a GET request that specifies its document ID:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /customer/_doc/1
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
The response indicates that a document with the specified ID was found
|
|
and shows the original source fields that were indexed.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_index" : "customer",
|
|
"_type" : "_doc",
|
|
"_id" : "1",
|
|
"_version" : 1,
|
|
"_seq_no" : 26,
|
|
"_primary_term" : 4,
|
|
"found" : true,
|
|
"_source" : {
|
|
"name": "John Doe"
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ ]
|
|
// TESTRESPONSE[s/"_primary_term" : \d+/"_primary_term" : $body._primary_term/]
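
The same index request can be sent from a script. The sketch below uses only the Python standard library and assumes a local cluster on the default port; `build_index_request` and `send` are hypothetical helper names, not part of any {es} client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster on the default port


def build_index_request(index, doc_id, document, base_url=BASE_URL):
    """Build (but do not send) a PUT request that indexes `document` under `doc_id`."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a request and decode the JSON response. Needs a running cluster."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_index_request("customer", 1, {"name": "John Doe"})
# With a cluster running, send(request) returns a response like the one shown above.
```

Building the request separately from sending it makes it easy to inspect the URL, method, and body before anything goes over the network.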
|
|
|
|
[float]
|
|
[[getting-started-batch-processing]]
|
|
=== Indexing documents in bulk
|
|
|
|
If you have a lot of documents to index, you can submit them in batches with
|
|
the {ref}/docs-bulk.html[bulk API]. Using bulk to batch document
|
|
operations is significantly faster than submitting requests individually as it minimizes network roundtrips.
|
|
|
|
The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents
|
|
and a total payload between 5MB and 15MB. From there, you can experiment
|
|
to find the sweet spot.
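
To make the batching idea concrete, here is a minimal Python sketch that splits a list of documents into bulk request bodies. A bulk body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The function name `bulk_batches` and the sample documents are invented for this illustration.

```python
import json


def bulk_batches(documents, batch_size=1000):
    """Yield newline-delimited bulk bodies, each holding at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        lines = []
        for doc in documents[start:start + batch_size]:
            lines.append(json.dumps({"index": {}}))  # action line: index with an auto-generated ID
            lines.append(json.dumps(doc))            # source line: the document itself
        yield "\n".join(lines) + "\n"                # a bulk body must end with a newline


docs = [{"account_number": n} for n in range(2500)]  # made-up sample documents
batches = list(bulk_batches(docs))
```

Each body could then be posted to an index's `_bulk` endpoint. To tune the batch size, you might compare `len(body.encode("utf-8"))` against the payload range suggested above.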
|
|
|
|
To get some data into {es} that you can start searching and analyzing:
|
|
|
|
. Download the https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[`accounts.json`] sample data set. The documents in this randomly-generated data set represent user accounts with the following information:
|
|
+
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"account_number": 0,
|
|
"balance": 16623,
|
|
"firstname": "Bradshaw",
|
|
"lastname": "Mckenzie",
|
|
"age": 29,
|
|
"gender": "F",
|
|
"address": "244 Columbus Place",
|
|
"employer": "Euron",
|
|
"email": "bradshawmckenzie@euron.com",
|
|
"city": "Hobucken",
|
|
"state": "CO"
|
|
}
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
|
|
. Index the account data into the `bank` index with the following `_bulk` request:
|
|
+
|
|
[source,sh]
|
|
--------------------------------------------------
|
|
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
|
|
curl "localhost:9200/_cat/indices?v"
|
|
--------------------------------------------------
|
|
// NOTCONSOLE
|
|
+
|
|
////
|
|
This replicates the above in a document-testing friendly way but isn't visible
|
|
in the docs:
|
|
+
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /_cat/indices?v
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:bank]
|
|
////
|
|
+
|
|
The response indicates that 1,000 documents were indexed successfully.
|
|
+
|
|
[source,txt]
|
|
--------------------------------------------------
|
|
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
|
|
yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/128.6kb/\\d+(\\.\\d+)?[mk]?b/]
|
|
// TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ non_json]
|
|
|
|
[[getting-started-search]]
|
|
== Start searching
|
|
|
|
Now let's start with some simple searches. There are two basic ways to run a search: you can send search parameters in the {ref}/search-uri-request.html[REST request URI] or in the {ref}/search-request-body.html[REST request body]. The request body method is more expressive and lets you define searches in a more readable JSON format. This tutorial shows one example of the request URI method and uses the request body method for everything else.
|
|
|
|
The REST API for search is accessible from the `_search` endpoint. This example returns all documents in the bank index:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search?q=*&sort=account_number:asc&pretty
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Let's dissect the search call. The request targets the `_search` endpoint of the `bank` index, and the `q=*` parameter instructs Elasticsearch to match all documents in the index. The `sort=account_number:asc` parameter sorts the results by the `account_number` field of each document in ascending order. The `pretty` parameter tells Elasticsearch to return pretty-printed JSON results.
|
|
|
|
And the response (partially shown):
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took" : 63,
|
|
"timed_out" : false,
|
|
"_shards" : {
|
|
"total" : 5,
|
|
"successful" : 5,
|
|
"skipped" : 0,
|
|
"failed" : 0
|
|
},
|
|
"hits" : {
|
|
"total" : {
|
|
"value": 1000,
|
|
"relation": "eq"
|
|
},
|
|
"max_score" : null,
|
|
"hits" : [ {
|
|
"_index" : "bank",
|
|
"_type" : "_doc",
|
|
"_id" : "0",
|
|
"sort": [0],
|
|
"_score" : null,
|
|
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
|
|
}, {
|
|
"_index" : "bank",
|
|
"_type" : "_doc",
|
|
"_id" : "1",
|
|
"sort": [1],
|
|
"_score" : null,
|
|
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
|
|
}, ...
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
|
|
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
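
If you consume a response like this in code, the parts you usually need are `hits.total.value` and the `_source` of each hit. The following Python sketch shows one way to pull them out. The `sample` dictionary is a trimmed copy of the response above, and `summarize_hits` is a hypothetical helper name.

```python
def summarize_hits(response):
    """Return the total hit count and the account numbers of the returned hits."""
    total = response["hits"]["total"]["value"]
    accounts = [hit["_source"]["account_number"] for hit in response["hits"]["hits"]]
    return total, accounts


# Trimmed copy of the response shown above (only the fields used here).
sample = {
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "hits": [
            {"_id": "0", "_source": {"account_number": 0, "balance": 16623}},
            {"_id": "1", "_source": {"account_number": 1, "balance": 39225}},
        ],
    }
}

print(summarize_hits(sample))  # (1000, [0, 1])
```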
|
|
|
|
The following request runs the same search using the request body method. It retrieves all documents in the `bank`
index sorted by account number:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match_all": {} },
|
|
"sort": [
|
|
{ "account_number": "asc" }
|
|
]
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
The difference here is that instead of passing `q=*` in the URI, we provide a JSON-style query request body to the `_search` API. We'll discuss this JSON query in the next section.
|
|
|
|
////
|
|
Hidden response just so we can assert that it is indeed the same but don't have
|
|
to clutter the docs with it:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took" : 63,
|
|
"timed_out" : false,
|
|
"_shards" : {
|
|
"total" : 5,
|
|
"successful" : 5,
|
|
"skipped" : 0,
|
|
"failed" : 0
|
|
},
|
|
"hits" : {
|
|
"total" : {
|
|
"value": 1000,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": null,
|
|
"hits" : [ {
|
|
"_index" : "bank",
|
|
"_type" : "_doc",
|
|
"_id" : "0",
|
|
"sort": [0],
|
|
"_score": null,
|
|
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
|
|
}, {
|
|
"_index" : "bank",
|
|
"_type" : "_doc",
|
|
"_id" : "1",
|
|
"sort": [1],
|
|
"_score": null,
|
|
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
|
|
}, ...
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
|
|
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
|
|
|
|
////
|
|
|
|
It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any server-side resources or open cursors into your results. This is in stark contrast to many other platforms, such as SQL databases, where you may initially get a partial subset of your query results and then have to keep going back to the server to fetch (or page through) the rest using some kind of stateful server-side cursor.
|
|
|
|
[float]
|
|
[[getting-started-query-lang]]
|
|
=== Introducing the Query Language
|
|
|
|
Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the {ref}/query-dsl.html[Query DSL]. The query language is quite comprehensive and can be intimidating at first glance, but the best way to learn it is to start with a few basic examples.
|
|
|
|
Going back to our last example, we executed this query:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match_all": {} }
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Dissecting the above, the `query` part tells us what our query definition is and the `match_all` part is simply the type of query that we want to run. The `match_all` query is simply a search for all documents in the specified index.
|
|
|
|
In addition to the `query` parameter, we also can pass other parameters to
|
|
influence the search results. In the example in the section above we passed in
|
|
`sort`; here we pass in `size`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match_all": {} },
|
|
"size": 1
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Each search request is self-contained: {es} does not maintain any
|
|
state information across requests. To page through the search hits, specify
|
|
the `from` and `size` parameters in your request.
|
|
|
|
This example does a `match_all` and returns documents 10 through 19:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match_all": {} },
|
|
"from": 10,
|
|
"size": 10
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
The `from` parameter (0-based) specifies which document to start from, and the `size` parameter specifies how many documents to return starting at `from`. This is useful when implementing paging of search results. If `from` is not specified, it defaults to 0.
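
The arithmetic behind paging is simple enough to sketch. In the Python snippet below, `page_body` is a hypothetical helper that turns a zero-based page number into the `from` and `size` values of a request body. It only builds the body and leaves sending it to whatever HTTP client you use.

```python
def page_body(page, page_size=10):
    """Build a match_all search body for a zero-based page number."""
    return {
        "query": {"match_all": {}},
        "from": page * page_size,  # zero-based offset of the first hit to return
        "size": page_size,         # number of hits to return
    }


second_page = page_body(1)  # hits 10 through 19, as in the request above
```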
|
|
|
|
To search for specific terms within a field, you can use a `match` query.
|
|
For example, the following request searches the `address` field to find
|
|
customers whose addresses contain `mill` or `lane`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match": { "address": "mill lane" } }
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
This example uses `match_phrase`, a variant of `match`, to return all accounts containing the phrase "mill lane" in the address:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": { "match_phrase": { "address": "mill lane" } }
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Let's now introduce the {ref}/query-dsl-bool-query.html[`bool` query]. The `bool` query allows us to compose smaller queries into bigger queries using boolean logic.
|
|
|
|
This example composes two `match` queries and returns all accounts containing "mill" and "lane" in the address:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"must": [
|
|
{ "match": { "address": "mill" } },
|
|
{ "match": { "address": "lane" } }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
In the above example, the `bool must` clause specifies all the queries that must be true for a document to be considered a match.
|
|
|
|
In contrast, this example composes two `match` queries and returns all accounts containing "mill" or "lane" in the address:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"should": [
|
|
{ "match": { "address": "mill" } },
|
|
{ "match": { "address": "lane" } }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
In the above example, the `bool should` clause specifies a list of queries, at least one of which must be true for a document to be considered a match.
|
|
|
|
This example composes two `match` queries and returns all accounts that contain neither "mill" nor "lane" in the address:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"must_not": [
|
|
{ "match": { "address": "mill" } },
|
|
{ "match": { "address": "lane" } }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
In the above example, the `bool must_not` clause specifies a list of queries, none of which may be true for a document to be considered a match.
|
|
|
|
We can combine `must`, `should`, and `must_not` clauses simultaneously inside a `bool` query. Furthermore, we can compose `bool` queries inside any of these `bool` clauses to mimic any complex multi-level boolean logic.
|
|
|
|
This example returns all accounts of anybody who is 40 years old but doesn't live in Idaho (`ID`):
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"must": [
|
|
{ "match": { "age": "40" } }
|
|
],
|
|
"must_not": [
|
|
{ "match": { "state": "ID" } }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
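
When queries are assembled in application code, it can help to build `bool` bodies from small pieces. The Python sketch below is one possible shape for that. The helpers `bool_query` and `match` are invented for this example and are not part of any {es} client. The result reproduces the request body shown above.

```python
def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Combine lists of query clauses into one bool query, omitting empty clauses."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: qs for name, qs in clauses.items() if qs}}}


# Accounts of 40-year-olds who do not live in Idaho.
body = bool_query(must=[match("age", "40")], must_not=[match("state", "ID")])
```

Because each clause is plain data, a nested `bool` query is just another dictionary placed inside one of the lists.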
|
|
|
|
[float]
|
|
[[getting-started-filters]]
|
|
=== Executing filters
|
|
|
|
In the previous section, we skipped over a little detail called the document score (`_score` field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query that we specified. The higher the score, the more relevant the document; the lower the score, the less relevant.
|
|
|
|
But queries do not always need to produce scores, in particular when they are only used for "filtering" the document set. Elasticsearch detects these situations and automatically optimizes query execution in order not to compute useless scores.
|
|
|
|
The {ref}/query-dsl-bool-query.html[`bool` query] that we introduced in the previous section also supports `filter` clauses which allow us to use a query to restrict the documents that will be matched by other clauses, without changing how scores are computed. As an example, let's introduce the {ref}/query-dsl-range-query.html[`range` query], which allows us to filter documents by a range of values. This is generally used for numeric or date filtering.
|
|
|
|
This example uses a bool query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"query": {
|
|
"bool": {
|
|
"must": { "match_all": {} },
|
|
"filter": {
|
|
"range": {
|
|
"balance": {
|
|
"gte": 20000,
|
|
"lte": 30000
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Dissecting the above, the bool query contains a `match_all` query (the query part) and a `range` query (the filter part). We can substitute any other queries into the query and the filter parts. In the above case, the range query makes perfect sense since documents falling into the range all match "equally", i.e., no document is more relevant than another.
|
|
|
|
In addition to the `match_all`, `match`, `bool`, and `range` queries, many other query types are available that we won't go into here. With a basic understanding of how these queries work, you should be able to apply that knowledge to learning and experimenting with the others.
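
As a small illustration of the filtered query above, this Python sketch builds the same `bool` body for an arbitrary field and range. The function name `range_filter_body` is hypothetical. Bounds left as `None` are omitted, so the same helper can express an open-ended range.

```python
def range_filter_body(field, gte=None, lte=None):
    """Build a bool query that matches everything and filters on an inclusive range."""
    bounds = {name: value for name, value in (("gte", gte), ("lte", lte)) if value is not None}
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},            # the scoring part of the query
                "filter": {"range": {field: bounds}},  # the non-scoring filter part
            }
        }
    }


body = range_filter_body("balance", gte=20000, lte=30000)
```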
|
|
|
|
[[getting-started-aggregations]]
|
|
== Analyze results with aggregations
|
|
|
|
{es} aggregations enable you to get meta-information about your search results
|
|
and answer questions like, "How many account holders are in Texas?" or
|
|
"What's the average balance of accounts in Tennessee?" You can search
|
|
documents, filter hits, and use aggregations to analyze the results all in one
|
|
request.
|
|
|
|
For example, the following request uses a `terms` aggregation to group
|
|
all of the accounts in the `bank` index by state, and returns the ten states
|
|
with the most accounts in descending order:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"size": 0,
|
|
"aggs": {
|
|
"group_by_state": {
|
|
"terms": {
|
|
"field": "state.keyword"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
The `buckets` in the response are the values of the `state` field. The
|
|
`doc_count` shows the number of accounts in each state. For example, you
|
|
can see that there are 27 accounts in `ID` (Idaho). Because the request
|
|
set `size=0`, the response only contains the aggregation results.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 29,
|
|
"timed_out": false,
|
|
"_shards": {
|
|
"total": 5,
|
|
"successful": 5,
|
|
"skipped" : 0,
|
|
"failed": 0
|
|
},
|
|
"hits" : {
|
|
"total" : {
|
|
"value": 1000,
|
|
"relation": "eq"
|
|
},
|
|
"max_score" : null,
|
|
"hits" : [ ]
|
|
},
|
|
"aggregations" : {
|
|
"group_by_state" : {
|
|
"doc_count_error_upper_bound": 20,
|
|
"sum_other_doc_count": 770,
|
|
"buckets" : [ {
|
|
"key" : "ID",
|
|
"doc_count" : 27
|
|
}, {
|
|
"key" : "TX",
|
|
"doc_count" : 27
|
|
}, {
|
|
"key" : "AL",
|
|
"doc_count" : 25
|
|
}, {
|
|
"key" : "MD",
|
|
"doc_count" : 25
|
|
}, {
|
|
"key" : "TN",
|
|
"doc_count" : 23
|
|
}, {
|
|
"key" : "MA",
|
|
"doc_count" : 21
|
|
}, {
|
|
"key" : "NC",
|
|
"doc_count" : 21
|
|
}, {
|
|
"key" : "ND",
|
|
"doc_count" : 21
|
|
}, {
|
|
"key" : "ME",
|
|
"doc_count" : 20
|
|
}, {
|
|
"key" : "MO",
|
|
"doc_count" : 20
|
|
} ]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 29/"took": $body.took/]
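
To work with these buckets in code, you can flatten them into a mapping from state to count. The Python sketch below does that for the bucket values shown above. `bucket_counts` is a hypothetical helper name. The final line illustrates that the returned buckets plus `sum_other_doc_count` (the documents in buckets that were not returned) add up to the 1,000 accounts in the index.

```python
def bucket_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# The aggregation part of the response shown above.
sample = {
    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": 20,
            "sum_other_doc_count": 770,
            "buckets": [
                {"key": "ID", "doc_count": 27}, {"key": "TX", "doc_count": 27},
                {"key": "AL", "doc_count": 25}, {"key": "MD", "doc_count": 25},
                {"key": "TN", "doc_count": 23}, {"key": "MA", "doc_count": 21},
                {"key": "NC", "doc_count": 21}, {"key": "ND", "doc_count": 21},
                {"key": "ME", "doc_count": 20}, {"key": "MO", "doc_count": 20},
            ],
        }
    }
}

counts = bucket_counts(sample, "group_by_state")
total = sum(counts.values()) + sample["aggregations"]["group_by_state"]["sum_other_doc_count"]
```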
|
|
|
|
|
|
You can combine aggregations to build more complex summaries of your data. For
|
|
example, the following request nests an `avg` aggregation within the previous
|
|
`group_by_state` aggregation to calculate the average account balances for
|
|
each state.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"size": 0,
|
|
"aggs": {
|
|
"group_by_state": {
|
|
"terms": {
|
|
"field": "state.keyword"
|
|
},
|
|
"aggs": {
|
|
"average_balance": {
|
|
"avg": {
|
|
"field": "balance"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Instead of sorting the results by count, you could sort using the result of
|
|
the nested aggregation by specifying the order within the `terms` aggregation:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /bank/_search
|
|
{
|
|
"size": 0,
|
|
"aggs": {
|
|
"group_by_state": {
|
|
"terms": {
|
|
"field": "state.keyword",
|
|
"order": {
|
|
"average_balance": "desc"
|
|
}
|
|
},
|
|
"aggs": {
|
|
"average_balance": {
|
|
"avg": {
|
|
"field": "balance"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
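
The two nested-aggregation requests above differ only in the `order` entry, which a small builder makes explicit. This Python sketch is an illustration only. The function name `terms_with_avg` is invented, and the aggregation names are copied from the requests above.

```python
def terms_with_avg(group_field, avg_field, order_by_avg=False):
    """Build a terms aggregation with a nested avg, optionally ordered by that average."""
    terms = {"field": group_field}
    if order_by_avg:
        # Refer to the nested aggregation by name to sort buckets by its value.
        terms["order"] = {"average_balance": "desc"}
    return {
        "size": 0,  # return aggregation results only, no search hits
        "aggs": {
            "group_by_state": {
                "terms": terms,
                "aggs": {"average_balance": {"avg": {"field": avg_field}}},
            }
        },
    }


ordered = terms_with_avg("state.keyword", "balance", order_by_avg=True)
```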
|
|
|
|
In addition to basic bucketing and metrics aggregations like these, {es}
|
|
provides specialized aggregations for operating on multiple fields and
|
|
analyzing particular types of data such as dates, IP addresses, and geo
|
|
data. You can also feed the results of individual aggregations into pipeline
|
|
aggregations for further analysis.
|
|
|
|
The core analysis capabilities provided by aggregations enable advanced
|
|
features such as using machine learning to detect anomalies.
|
|
|
|
[[getting-started-next-steps]]
|
|
== Where to go from here
|
|
|
|
Now that you've set up a cluster, indexed some documents, and run some
|
|
searches and aggregations, you might want to:
|
|
|
|
* {stack-gs}/get-started-elastic-stack.html#install-kibana[Dive into the Elastic
|
|
Stack Tutorial] to install Kibana, Logstash, and Beats and
|
|
set up a basic system monitoring solution.
|
|
|
|
* {kibana-ref}/add-sample-data.html[Load one of the sample data sets into Kibana]
|
|
to see how you can use {es} and Kibana together to visualize your data.
|
|
|
|
* Try out one of the Elastic search solutions:
|
|
** https://swiftype.com/documentation/site-search/crawler-quick-start[Site Search]
|
|
** https://swiftype.com/documentation/app-search/getting-started[App Search]
|
|
** https://swiftype.com/documentation/enterprise-search/getting-started[Enterprise Search]
|