CONSOLEify remaining getting-started docs

Moves some test data used by those docs into the Elasticsearch
repository so we can use it when we test the docs during the build.

Relates to #18160
Nik Everett 2016-10-02 23:16:21 -04:00
parent d9781bd069
commit 6d42e197b8
3 changed files with 2233 additions and 124 deletions


@@ -128,7 +128,6 @@ buildRestTests.expectedUnconvertedCandidates = [
'reference/docs/termvectors.asciidoc',
'reference/docs/update-by-query.asciidoc',
'reference/docs/update.asciidoc',
'reference/getting-started.asciidoc',
'reference/index-modules/similarity.asciidoc',
'reference/index-modules/store.asciidoc',
'reference/index-modules/translog.asciidoc',
@@ -297,3 +296,26 @@ buildRestTests.setups['sales'] = '''
{"date": "2015/03/01 00:00:00", "price": 200, "type": "hat"}
{"index":{}}
{"date": "2015/03/01 00:00:00", "price": 175, "type": "t-shirt"}'''
// Dummy bank account data used by getting-started.asciidoc
buildRestTests.setups['bank'] = '''
- do:
bulk:
index: bank
type: account
refresh: true
body: |
#bank_data#
'''
/* Load the actual accounts only if we're going to use them. This complicates
* dependency checking but that is a small price to pay for not building a
* 400kb string every time we start the build. */
File accountsFile = new File("$projectDir/src/test/resources/accounts.json")
buildRestTests.inputs.file(accountsFile)
buildRestTests.doFirst {
String accounts = accountsFile.getText('UTF-8')
// Indent like a yaml test needs
accounts = accounts.replaceAll('(?m)^', ' ')
buildRestTests.setups['bank'] =
buildRestTests.setups['bank'].replace('#bank_data#', accounts)
}
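The `(?m)^` multiline-anchor substitution above prefixes every line of the accounts file so it nests correctly in the YAML test body. As a rough sketch of the same trick in plain Python (a hypothetical `indent_for_yaml` helper, not part of the build):

```python
import re

def indent_for_yaml(text, prefix="    "):
    # Prefix every line using a multiline ^ anchor, mirroring the
    # Groovy replaceAll('(?m)^', ...) used in the build script above.
    return re.sub(r"(?m)^", prefix, text)
```

With multiline mode on, `^` matches at the start of every line, so each line of the bulk data picks up the indentation in one pass.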


@@ -386,6 +386,7 @@ PUT /customer/external/1
GET /customer/external/1
DELETE /customer
--------------------------------------------------
// CONSOLE
If we study the above commands carefully, we can actually see a pattern of how we access data in Elasticsearch. That pattern can be summarized as follows:
@@ -455,7 +456,7 @@ POST /customer/external?pretty
// CONSOLE
// TEST[continued]
Note that in the above case, we are using the POST verb instead of PUT since we didn't specify an ID.
Note that in the above case, we are using the `POST` verb instead of `PUT` since we didn't specify an ID.
=== Updating Documents
@@ -555,7 +556,7 @@ The bulk API executes all the actions sequentially and in order. If a single act
Now that we've gotten a glimpse of the basics, let's try to work on a more realistic dataset. I've prepared a sample of fictitious JSON documents of customer bank account information. Each document has the following schema:
[source,sh]
[source,js]
--------------------------------------------------
{
"account_number": 0,
@@ -571,28 +572,43 @@ Now that we've gotten a glimpse of the basics, let's try to work on a more reali
"state": "CO"
}
--------------------------------------------------
// NOTCONSOLE
For the curious, I generated this data from http://www.json-generator.com/[`www.json-generator.com/`] so please ignore the actual values and semantics of the data as these are all randomly generated.
[float]
=== Loading the Sample Dataset
You can download the sample dataset (accounts.json) from https://github.com/bly2k/files/blob/master/accounts.zip?raw=true[here]. Extract it to our current directory and let's load it into our cluster as follows:
You can download the sample dataset (accounts.json) from https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true[here]. Save it into our current directory and let's load it into our cluster as follows:
[source,sh]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json"
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'
--------------------------------------------------
// NOTCONSOLE
////
This replicates the above in a document-testing friendly way but isn't visible
in the docs:
[source,js]
--------------------------------------------------
GET /_cat/indices?v
--------------------------------------------------
// CONSOLE
// TEST[setup:bank]
////
And the response:
[source,sh]
[source,js]
--------------------------------------------------
curl 'localhost:9200/_cat/indices?v'
health index pri rep docs.count docs.deleted store.size pri.store.size
yellow bank 5 1 1000 0 424.4kb 424.4kb
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
--------------------------------------------------
// TESTRESPONSE[s/128.6kb/\\d+\\.\\d+[mk]?b/]
// TESTRESPONSE[s/l7sSYV2cQXmu6_4rJWVIww/.+/ _cat]
This means that we just successfully bulk indexed 1000 documents into the bank index (under the account type).
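The `_bulk` request above sends newline-delimited JSON: an action line followed by a source line for each document. A rough plain-Python sketch of assembling such a body (a hypothetical helper, not part of these docs' tooling):

```python
import json

def bulk_index_body(docs):
    # The _bulk format: an action line followed by the document's
    # source line, one pair per document, as newline-terminated NDJSON.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_index_body([{"account_number": 0, "balance": 16623},
                        {"account_number": 1, "balance": 39225}])
```

Sending 1000 documents this way produces 2000 lines plus the trailing newline that the bulk endpoint requires.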
@@ -602,18 +618,19 @@ Now let's start with some simple searches. There are two basic ways to run searc
The REST API for search is accessible from the `_search` endpoint. This example returns all documents in the bank index:
[source,sh]
[source,js]
--------------------------------------------------
curl 'localhost:9200/bank/_search?q=*&pretty'
GET /bank/_search?q=*&sort=account_number:asc
--------------------------------------------------
// CONSOLE
// TEST[continued]
Let's first dissect the search call. We are searching (`_search` endpoint) in the bank index, and the `q=*` parameter instructs Elasticsearch to match all documents in the index. The `sort=account_number:asc` parameter indicates to sort the results using the `account_number` field of each document in ascending order.
And the response (partially shown):
[source,sh]
[source,js]
--------------------------------------------------
curl 'localhost:9200/bank/_search?q=*&pretty'
{
"took" : 63,
"timed_out" : false,
@@ -624,21 +641,28 @@ curl 'localhost:9200/bank/_search?q=*&pretty'
},
"hits" : {
"total" : 1000,
"max_score" : 1.0,
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
}, {
"_index" : "bank",
"_type" : "account",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
As for the response, we see the following parts:
@@ -648,30 +672,34 @@ As for the response, we see the following parts:
* `hits` search results
* `hits.total` total number of documents matching our search criteria
* `hits.hits` actual array of search results (defaults to first 10 documents)
* `sort` - sort key for results (missing if sorting by score)
* `_score` and `max_score` - ignore these fields for now
Here is the same exact search above using the alternative request body method:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_all": {} }
}'
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
The difference here is that instead of passing `q=*` in the URI, we provide a JSON-style query request body to the `_search` API. We'll discuss this JSON query in the next section.
And the response (partially shown):
////
Hidden response just so we can assert that it is indeed the same but don't have
to clutter the docs with it:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} }
}'
{
"took" : 26,
"took" : 63,
"timed_out" : false,
"_shards" : {
"total" : 5,
@@ -680,22 +708,30 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
},
"hits" : {
"total" : 1000,
"max_score" : 1.0,
"max_score": null,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"sort": [0],
"_score": null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_score" : 1.0, "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "6",
"_score" : 1.0, "_source" : {"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "13",
"sort": [1],
"_score": null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 63/"took" : $body.took/]
// TESTRESPONSE[s/\.\.\./$body.hits.hits.2, $body.hits.hits.3, $body.hits.hits.4, $body.hits.hits.5, $body.hits.hits.6, $body.hits.hits.7, $body.hits.hits.8, $body.hits.hits.9/]
////
It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any kind of server-side resources or open cursors into your results. This is in stark contrast to many other platforms such as SQL, where you may initially get a partial subset of your query results up front and then have to continuously go back to the server to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.
@@ -705,52 +741,63 @@ Elasticsearch provides a JSON-style domain-specific language that you can use to
Going back to our last example, we executed this query:
[source,sh]
[source,js]
--------------------------------------------------
GET /bank/_search
{
"query": { "match_all": {} }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Dissecting the above, the `query` part tells us what our query definition is and the `match_all` part is the type of query that we want to run. The `match_all` query is simply a search for all documents in the specified index.
In addition to the `query` parameter, we also can pass other parameters to influence the search results. For example, the following does a `match_all` and returns only the first document:
In addition to the `query` parameter, we also can pass other parameters to
influence the search results. In the example in the section above we passed in
`sort`, here we pass in `size`:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_all": {} },
"size": 1
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Note that if `size` is not specified, it defaults to 10.
This example does a `match_all` and returns documents 11 through 20:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_all": {} },
"from": 10,
"size": 10
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
The `from` parameter (0-based) specifies which document index to start from and the `size` parameter specifies how many documents to return starting at `from`. This feature is useful when implementing paging of search results. Note that if `from` is not specified, it defaults to 0.
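The paging arithmetic is simple: page N starts at `N * size`. A small sketch with a hypothetical helper (plain Python, not Elasticsearch code), using the defaults described above:

```python
def page_params(page, page_size=10):
    # 0-based page number -> from/size pair; `from` defaults to 0 and
    # `size` to 10, matching the search API defaults described above.
    return {"from": page * page_size, "size": page_size}
```

So the "documents 11 through 20" example above is exactly page 1 with the default size of 10.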
This example does a `match_all` and sorts the results by account balance in descending order and returns the top 10 (default size) documents.
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
=== Executing Searches
@@ -758,14 +805,16 @@ Now that we have seen a few of the basic search parameters, let's dig in some mo
This example shows how to return two fields, `account_number` and `balance` (inside of `_source`), from the search:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Note that the above example simply reduces the `_source` field. It will still only return one field named `_source` but within it, only the fields `account_number` and `balance` are included.
@@ -775,51 +824,59 @@ Now let's move on to the query part. Previously, we've seen how the `match_all`
This example returns the account numbered 20:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match": { "account_number": 20 } }
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
This example returns all accounts containing the term "mill" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match": { "address": "mill" } }
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
This example returns all accounts containing the term "mill" or "lane" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
This example is a variant of `match` (`match_phrase`) that returns all accounts containing the phrase "mill lane" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
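Conceptually, `match` looks for any of the query terms while `match_phrase` requires the terms to appear contiguously and in order. A plain-Python sketch over pre-tokenized text (a simplification that ignores analysis, scoring, and the positional data the real index stores):

```python
def match(field_tokens, query):
    # Loose analogue of `match`: true if any query term occurs in the field.
    return any(term in field_tokens for term in query.split())

def match_phrase(field_tokens, query):
    # Loose analogue of `match_phrase`: true only if the query terms
    # occur contiguously and in the given order.
    terms = query.split()
    return any(field_tokens[i:i + len(terms)] == terms
               for i in range(len(field_tokens) - len(terms) + 1))
```

An address containing "990 mill road" matches the `match` query for "mill lane" but not the `match_phrase` query.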
Let's now introduce the <<query-dsl-bool-query,`bool`(ean) query>>. The `bool` query allows us to compose smaller queries into bigger queries using boolean logic.
This example composes two `match` queries and returns all accounts containing "mill" and "lane" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": {
"bool": {
@@ -829,16 +886,18 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
]
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
In the above example, the `bool must` clause specifies all the queries that must be true for a document to be considered a match.
In contrast, this example composes two `match` queries and returns all accounts containing "mill" or "lane" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": {
"bool": {
@@ -848,16 +907,18 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
]
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
In the above example, the `bool should` clause specifies a list of queries either of which must be true for a document to be considered a match.
This example composes two `match` queries and returns all accounts that contain neither "mill" nor "lane" in the address:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": {
"bool": {
@@ -867,8 +928,10 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
]
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
In the above example, the `bool must_not` clause specifies a list of queries, none of which may be true for a document to be considered a match.
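The boolean composition of the last three examples can be sketched in plain Python over predicate functions (a simplification: in Elasticsearch, `should` clauses alongside a `must` affect only scoring by default):

```python
def bool_query(doc, must=(), should=(), must_not=()):
    # Loose analogue of the bool query: every `must` clause matches,
    # no `must_not` clause matches, and when there are `should` clauses
    # but no `must` clauses, at least one `should` clause must match.
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must:
        return any(clause(doc) for clause in should)
    return True
```

Swapping the same two predicates between `must`, `should`, and `must_not` reproduces the "and", "or", and "neither" behaviors described above.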
@@ -876,9 +939,9 @@ We can combine `must`, `should`, and `must_not` clauses simultaneously inside a
This example returns all accounts of anybody who is 40 years old but doesn't live in ID(aho):
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": {
"bool": {
@@ -890,8 +953,10 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
]
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
=== Executing Filters
@@ -903,9 +968,9 @@ The <<query-dsl-bool-query,`bool` query>> that we introduced in the previous sec
This example uses a bool query to return all accounts with balances between 20000 and 30000, inclusive. In other words, we want to find accounts with a balance that is greater than or equal to 20000 and less than or equal to 30000.
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"query": {
"bool": {
@@ -920,8 +985,10 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Dissecting the above, the bool query contains a `match_all` query (the query part) and a `range` query (the filter part). We can substitute any other queries into the query and the filter parts. In the above case, the range query makes perfect sense since documents falling into the range all match "equally", i.e., no document is more relevant than another.
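The inclusive `gte`/`lte` bounds of the range query can be sketched in plain Python (a simplification that ignores everything except the bounds check; the helper name is hypothetical):

```python
def balance_in_range(account, gte=20000, lte=30000):
    # Inclusive bounds, mirroring the range query's gte/lte semantics.
    return gte <= account["balance"] <= lte

accounts = [{"balance": b} for b in (19999, 20000, 25000, 30000, 30001)]
hits = [a for a in accounts if balance_in_range(a)]
```

Both endpoints are included, so balances of exactly 20000 and 30000 match while 19999 and 30001 do not.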
@@ -933,9 +1000,9 @@ Aggregations provide the ability to group and extract statistics from your data.
To start with, this example groups all the accounts by state, and then returns the top 10 (default) states sorted by count descending (also default):
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"size": 0,
"aggs": {
@@ -945,8 +1012,10 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
In SQL, the above aggregation is similar in concept to:
@@ -957,8 +1026,16 @@ SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
And the response (partially shown):
[source,sh]
[source,js]
--------------------------------------------------
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits" : {
"total" : 1000,
"max_score" : 0.0,
@@ -966,51 +1043,55 @@ And the response (partially shown):
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound": 20,
"sum_other_doc_count": 770,
"buckets" : [ {
"key" : "al",
"key" : "ID",
"doc_count" : 27
}, {
"key" : "TX",
"doc_count" : 27
}, {
"key" : "AL",
"doc_count" : 25
}, {
"key" : "MD",
"doc_count" : 25
}, {
"key" : "TN",
"doc_count" : 23
}, {
"key" : "MA",
"doc_count" : 21
}, {
"key" : "tx",
"doc_count" : 17
"key" : "NC",
"doc_count" : 21
}, {
"key" : "id",
"doc_count" : 15
"key" : "ND",
"doc_count" : 21
}, {
"key" : "ma",
"doc_count" : 15
"key" : "ME",
"doc_count" : 20
}, {
"key" : "md",
"doc_count" : 15
}, {
"key" : "pa",
"doc_count" : 15
}, {
"key" : "dc",
"doc_count" : 14
}, {
"key" : "me",
"doc_count" : 14
}, {
"key" : "mo",
"doc_count" : 14
}, {
"key" : "nd",
"doc_count" : 14
"key" : "MO",
"doc_count" : 20
} ]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 29/"took": $body.took/]
We can see that there are 21 accounts in AL(abama), followed by 17 accounts in TX, followed by 15 accounts in ID(aho), and so forth.
We can see that there are 27 accounts in `ID` (Idaho), followed by 27 accounts
in `TX` (Texas), followed by 25 accounts in `AL` (Alabama), and so forth.
Note that we set `size=0` to not show search hits because we only want to see the aggregation results in the response.
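Conceptually, the terms aggregation above behaves like the SQL group-by-count it is compared to. A plain-Python sketch (ignoring the shard-level approximation that produces fields like `doc_count_error_upper_bound` in the real response):

```python
from collections import Counter

def group_by_state(accounts, size=10):
    # Loose analogue of the terms aggregation: one bucket per state,
    # returning the top `size` buckets ordered by doc count descending.
    counts = Counter(account["state"] for account in accounts)
    return [{"key": state, "doc_count": count}
            for state, count in counts.most_common(size)]

buckets = group_by_state([{"state": s} for s in
                          ["ID", "ID", "ID", "TX", "TX", "AL"]])
```

Each bucket carries its `key` and `doc_count`, matching the shape of the `buckets` array in the response above.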
Building on the previous aggregation, this example calculates the average account balance by state (again only for the top 10 states sorted by count in descending order):
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"size": 0,
"aggs": {
@@ -1027,16 +1108,18 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Notice how we nested the `average_balance` aggregation inside the `group_by_state` aggregation. This is a common pattern for all the aggregations. You can nest aggregations inside aggregations arbitrarily to extract pivoted summarizations that you require from your data.
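The nesting described above amounts to a group-by followed by a per-group average. A plain-Python sketch of that pivot (hypothetical helper, ignoring bucket ordering and shard approximation):

```python
from collections import defaultdict

def average_balance_by_state(accounts):
    # Loose analogue of nesting an `avg` aggregation inside a `terms`
    # aggregation: per-state doc_count plus the average balance.
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    return {state: {"doc_count": len(vals),
                    "average_balance": sum(vals) / len(vals)}
            for state, vals in balances.items()}

stats = average_balance_by_state([
    {"state": "ID", "balance": 100},
    {"state": "ID", "balance": 300},
    {"state": "TX", "balance": 200},
])
```

The inner metric is computed once per outer bucket, which is exactly the pivoted summary the nested aggregation produces.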
Building on the previous aggregation, let's now sort on the average balance in descending order:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"size": 0,
"aggs": {
@@ -1056,14 +1139,16 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and 40-49), then by gender, and then finally get the average account balance, per age bracket, per gender:
[source,sh]
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
GET /bank/_search
{
"size": 0,
"aggs": {
@@ -1101,8 +1186,10 @@ curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
}
}
}
}'
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
There are many other aggregation capabilities that we won't go into detail on here. The <<search-aggregations,aggregations reference guide>> is a great starting point if you want to do further experimentation.

File diff suppressed because it is too large.