No-op changes to:

* Move `Search your data` source files into the same directory
* Rename `Search your data` source files based on page ID
* Remove unneeded includes
* Remove the `Request` dir
parent
1ae2923632
commit
17b5a0d25e
|
@ -32,7 +32,7 @@ include::data-streams/data-streams.asciidoc[]
|
|||
|
||||
include::ingest.asciidoc[]
|
||||
|
||||
include::search/search-your-data.asciidoc[]
|
||||
include::search/search-your-data/search-your-data.asciidoc[]
|
||||
|
||||
include::query-dsl.asciidoc[]
|
||||
|
||||
|
|
|
@ -1,20 +0,0 @@
|
|||
[[filter-search-results]]
|
||||
== Filter search results
|
||||
|
||||
You can use two methods to filter search results:
|
||||
|
||||
* Use a boolean query with a `filter` clause. Search requests apply
|
||||
<<query-dsl-bool-query,boolean filters>> to both search hits and
|
||||
<<search-aggregations,aggregations>>.
|
||||
|
||||
* Use the search API's `post_filter` parameter. Search requests apply
|
||||
<<post-filter,post filters>> only to search hits, not aggregations. You can use
|
||||
a post filter to calculate aggregations based on a broader result set, and then
|
||||
further narrow the results.
|
||||
+
|
||||
You can also <<rescore,rescore>> hits after the post filter to
|
||||
improve relevance and reorder results.
|
||||
|
||||
include::request/post-filter.asciidoc[]
|
||||
|
||||
include::request/rescore.asciidoc[]
|
|
@ -1,52 +0,0 @@
|
|||
[[paginate-search-results]]
|
||||
== Paginate search results
|
||||
|
||||
By default, the <<search-search,search API>> returns the top 10 matching documents.
|
||||
|
||||
To paginate through a larger set of results, you can use the search API's `size`
|
||||
and `from` parameters. The `size` parameter is the number of matching documents
|
||||
to return. The `from` parameter is a zero-indexed offset from the beginning of
|
||||
the complete result set that indicates the document you want to start with.
|
||||
|
||||
The following search API request sets the `from` offset to `5`, meaning the
|
||||
request offsets, or skips, the first five matching documents.
|
||||
|
||||
The `size` parameter is `20`, meaning the request can return up to 20 documents,
|
||||
starting at the offset.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /_search
|
||||
{
|
||||
"from": 5,
|
||||
"size": 20,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
By default, you cannot page through more than 10,000 documents using the `from`
|
||||
and `size` parameters. This limit is set using the
|
||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
||||
|
||||
Deep paging or requesting many results at once can result in slow searches.
|
||||
Results are sorted before being returned. Because search requests usually span
|
||||
multiple shards, each shard must generate its own sorted results. These separate
|
||||
results must then be combined and sorted to ensure that the overall sort order
|
||||
is correct.
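
The cost described above can be illustrated with a small local simulation. This is only a sketch, not how {es} is implemented: the `search_page` function, the `shards` lists, and the integer sort values are all hypothetical. It assumes each shard keeps its first `from + size` sorted results and that a coordinating step merges them before discarding the offset.

```python
import heapq


def search_page(shards, from_, size):
    """Return one page of globally sorted values (hypothetical helper)."""
    window = from_ + size
    # Each shard sorts its own matches and keeps the first from + size,
    # because any of them might belong to the requested page.
    per_shard = [sorted(shard)[:window] for shard in shards]
    # The coordinating step merges the already sorted shard results...
    merged = heapq.merge(*per_shard)
    # ...then skips the first `from_` entries and keeps `size` of them.
    return list(merged)[from_:from_ + size]


# Three made-up shards holding made-up sort values.
shards = [
    [9, 1, 5, 13, 20, 17],
    [2, 8, 4, 16, 19, 14],
    [7, 3, 11, 6, 18, 15],
]
page = search_page(shards, from_=2, size=3)
print(page)  # [3, 4, 5]
```

In this sketch every shard handles `from + size` entries no matter how few are returned, which is why the work grows with the depth of the page.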
|
||||
|
||||
As an alternative to deep paging, we recommend using
|
||||
<<scroll-search-results,scroll>> or the
|
||||
<<search-after,`search_after`>> parameter.
|
||||
|
||||
WARNING: {es} uses Lucene's internal doc IDs as tie-breakers. These internal
|
||||
doc IDs can be completely different across replicas of the same
|
||||
data. When paginating, you might occasionally see that documents with the same
|
||||
sort values are not ordered consistently.
|
||||
|
||||
include::request/scroll.asciidoc[]
|
||||
|
||||
include::request/search-after.asciidoc[]
|
|
@ -1,60 +0,0 @@
|
|||
[discrete]
|
||||
[[quickly-check-for-matching-docs]]
|
||||
=== Quickly check for matching docs
|
||||
|
||||
If you only want to know whether any documents match a
specific query, you can set `size` to `0` to indicate that you are not
interested in the search results. You can also set `terminate_after` to `1`
to indicate that query execution can stop as soon as the first
matching document is found (per shard).
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search?q=user.id:elkbee&size=0&terminate_after=1
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
NOTE: `terminate_after` is always applied **after** the
<<post-filter,`post_filter`>> and stops both the query and the aggregation
executions when enough hits have been collected on the shard. However, the doc
count on aggregations may not reflect the `hits.total` in the response because
aggregations are applied **before** the post filtering.
|
||||
|
||||
The response does not contain any hits because `size` was set to `0`. The
`hits.total` is either equal to `0`, indicating that there were no
matching documents, or greater than `0`, meaning that at least that many
documents matched the query when it was terminated early.
If the query was terminated early, the `terminated_early` flag is also
set to `true` in the response.
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"took": 3,
|
||||
"timed_out": false,
|
||||
"terminated_early": true,
|
||||
"_shards": {
|
||||
"total": 1,
|
||||
"successful": 1,
|
||||
"skipped" : 0,
|
||||
"failed": 0
|
||||
},
|
||||
"hits": {
|
||||
"total" : {
|
||||
"value": 1,
|
||||
"relation": "eq"
|
||||
},
|
||||
"max_score": null,
|
||||
"hits": []
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"took": 3/"took": $body.took/]
|
||||
|
||||
|
||||
The `took` time in the response is the number of milliseconds that the request
took to process, beginning shortly after the node received the query and
ending when all search-related work is done, before the above JSON is returned
to the client. This includes the time spent waiting in thread pools,
executing a distributed search across the whole cluster, and gathering all the
results.
|
|
@ -1,132 +0,0 @@
|
|||
[discrete]
|
||||
[[post-filter]]
|
||||
=== Post filter
|
||||
|
||||
When you use the `post_filter` parameter to filter search results, the search
|
||||
hits are filtered after the aggregations are calculated. A post filter has no
|
||||
impact on the aggregation results.
|
||||
|
||||
For example, you are selling shirts that have the following properties:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT /shirts
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"brand": { "type": "keyword"},
|
||||
"color": { "type": "keyword"},
|
||||
"model": { "type": "keyword"}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT /shirts/_doc/1?refresh
|
||||
{
|
||||
"brand": "gucci",
|
||||
"color": "red",
|
||||
"model": "slim"
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTSETUP
|
||||
|
||||
|
||||
Imagine a user has specified two filters:
|
||||
|
||||
`color:red` and `brand:gucci`. You only want to show them red shirts made by
|
||||
Gucci in the search results. Normally you would do this with a
|
||||
<<query-dsl-bool-query,`bool` query>>:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": [
|
||||
{ "term": { "color": "red" }},
|
||||
{ "term": { "brand": "gucci" }}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
However, you would also like to use _faceted navigation_ to display a list of
|
||||
other options that the user could click on. Perhaps you have a `model` field
|
||||
that would allow the user to limit their search results to red Gucci
|
||||
`t-shirts` or `dress-shirts`.
|
||||
|
||||
This can be done with a
|
||||
<<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": [
|
||||
{ "term": { "color": "red" }},
|
||||
{ "term": { "brand": "gucci" }}
|
||||
]
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"models": {
|
||||
"terms": { "field": "model" } <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
<1> Returns the most popular models of red shirts by Gucci.
|
||||
|
||||
But perhaps you would also like to tell the user how many Gucci shirts are
|
||||
available in *other colors*. If you just add a `terms` aggregation on the
|
||||
`color` field, you will only get back the color `red`, because your query
|
||||
returns only red shirts by Gucci.
|
||||
|
||||
Instead, you want to include shirts of all colors during aggregation, then
|
||||
apply the `color` filter only to the search results. This is the purpose of
|
||||
the `post_filter`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": {
|
||||
"term": { "brand": "gucci" } <1>
|
||||
}
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"colors": {
|
||||
"terms": { "field": "color" } <2>
|
||||
},
|
||||
"color_red": {
|
||||
"filter": {
|
||||
"term": { "color": "red" } <3>
|
||||
},
|
||||
"aggs": {
|
||||
"models": {
|
||||
"terms": { "field": "model" } <3>
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"post_filter": { <4>
|
||||
"term": { "color": "red" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
<1> The main query now finds all shirts by Gucci, regardless of color.
|
||||
<2> The `colors` agg returns popular colors for shirts by Gucci.
|
||||
<3> The `color_red` agg limits the `models` sub-aggregation
|
||||
to *red* Gucci shirts.
|
||||
<4> Finally, the `post_filter` removes colors other than red
|
||||
from the search `hits`.
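
The ordering of the steps above can be mimicked locally. The following is a minimal sketch under stated assumptions, not the {es} implementation: the `search` function is hypothetical, and the extra shirts (a blue Gucci shirt, a `regular` model, and a brand named `other`) are made-up sample data. It only shows that the aggregation sees every document matched by the main query, while the post filter narrows the hits afterwards.

```python
from collections import Counter

# Made-up sample documents; only the first one mirrors the example above.
shirts = [
    {"brand": "gucci", "color": "red", "model": "slim"},
    {"brand": "gucci", "color": "blue", "model": "slim"},
    {"brand": "gucci", "color": "red", "model": "regular"},
    {"brand": "other", "color": "red", "model": "slim"},
]


def search(docs, query, post_filter):
    """Hypothetical helper: run query, then aggregate, then post-filter."""
    # 1. The main query selects the documents that match.
    matched = [doc for doc in docs if query(doc)]
    # 2. Aggregations are computed on everything the query matched.
    colors = Counter(doc["color"] for doc in matched)
    # 3. The post filter is applied last and only affects the hits.
    hits = [doc for doc in matched if post_filter(doc)]
    return {"hits": hits, "aggs": {"colors": dict(colors)}}


result = search(
    shirts,
    query=lambda doc: doc["brand"] == "gucci",
    post_filter=lambda doc: doc["color"] == "red",
)
print(result["aggs"]["colors"])  # {'red': 2, 'blue': 1}
print(len(result["hits"]))       # 2
```

The `colors` counts still include the blue shirt even though it is absent from the hits, which is the behavior faceted navigation relies on.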
|
||||
|
|
@ -1,72 +0,0 @@
|
|||
[discrete]
|
||||
[[script-fields]]
|
||||
=== Script fields
|
||||
|
||||
You can use the `script_fields` parameter to retrieve a <<modules-scripting,script
|
||||
evaluation>> (based on different fields) for each hit. For example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"query": {
|
||||
"match_all": {}
|
||||
},
|
||||
"script_fields": {
|
||||
"test1": {
|
||||
"script": {
|
||||
"lang": "painless",
|
||||
"source": "doc['price'].value * 2"
|
||||
}
|
||||
},
|
||||
"test2": {
|
||||
"script": {
|
||||
"lang": "painless",
|
||||
"source": "doc['price'].value * params.factor",
|
||||
"params": {
|
||||
"factor": 2.0
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:sales]
|
||||
|
||||
Script fields can work on fields that are not stored (`price` in
the above case), and let you return custom values (the
evaluated value of the script).
|
||||
|
||||
Script fields can also access the actual `_source` document and
|
||||
extract specific elements to be returned from it by using `params['_source']`.
|
||||
Here is an example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"query" : {
|
||||
"match_all": {}
|
||||
},
|
||||
"script_fields" : {
|
||||
"test1" : {
|
||||
"script" : "params['_source']['message']"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
Note the `_source` keyword, which is used here to navigate the JSON-like model.
|
||||
|
||||
It's important to understand the difference between
|
||||
`doc['my_field'].value` and `params['_source']['my_field']`. The first,
|
||||
using the doc keyword, will cause the terms for that field to be loaded to
|
||||
memory (cached), which will result in faster execution, but more memory
|
||||
consumption. Also, the `doc[...]` notation only allows for simple valued
|
||||
fields (you can't return a json object from it) and makes sense only for
|
||||
non-analyzed or single term based fields. However, using `doc` is
|
||||
still the recommended way to access values from the document, if at all
|
||||
possible, because `_source` must be loaded and parsed every time it's used.
|
||||
Using `_source` is very slow.
|
||||
|
|
@ -1,81 +0,0 @@
|
|||
[discrete]
|
||||
[[search-after]]
|
||||
=== Search after
|
||||
|
||||
You can paginate results using the `from` and `size` parameters, but the cost becomes prohibitive for deep pagination.
The `index.max_result_window` setting, which defaults to 10,000, is a safeguard: search requests take heap memory and time proportional to `from + size`.
The <<scroll-search-results,scroll>> API is recommended for efficient deep scrolling, but scroll contexts are costly and are not
recommended for real-time user requests.
The `search_after` parameter circumvents this problem by providing a live cursor.
The idea is to use the results from the previous page to help retrieve the next page.
|
||||
|
||||
Suppose that the query to retrieve the first page looks like this:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"size": 10,
|
||||
"query": {
|
||||
"match" : {
|
||||
"message" : "foo"
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{"@timestamp": "asc"},
|
||||
{"tie_breaker_id": "asc"} <1>
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
||||
|
||||
<1> A copy of the `_id` field with `doc_values` enabled
|
||||
|
||||
[IMPORTANT]
|
||||
A field with one unique value per document should be used as the tiebreaker
of the sort specification. Otherwise the sort order for documents that have
the same sort values would be undefined and could lead to missing or duplicate
results. The <<mapping-id-field,`_id` field>> has a unique value per document
but it is not recommended to use it as a tiebreaker directly.
Beware that `search_after` looks for the first document that fully or partially
matches the tiebreaker's provided value. Therefore, if a document has a tiebreaker value of
`"654323"` and you `search_after` for `"654"`, it would still match that document
and return results found after it.
<<doc-values,Doc values>> are disabled on the `_id` field, so sorting on it requires
loading a lot of data into memory. Instead, it is advised to duplicate (client side
or with a <<ingest-processors,set ingest processor>>) the content
of the <<mapping-id-field,`_id` field>> into another field that has
<<doc-values,doc values>> enabled and to use this new field as the tiebreaker
for the sort.
|
||||
|
||||
The result from the above request includes an array of `sort values` for each document.
These `sort values` can be used with the `search_after` parameter to start returning results "after" any
document in the result list.
For instance, you can pass the `sort values` of the last document to `search_after` to retrieve the next page of results:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"size": 10,
|
||||
"query": {
|
||||
"match" : {
|
||||
"message" : "foo"
|
||||
}
|
||||
},
|
||||
"search_after": [1463538857, "654323"],
|
||||
"sort": [
|
||||
{"@timestamp": "asc"},
|
||||
{"tie_breaker_id": "asc"}
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
||||
|
||||
NOTE: The parameter `from` must be set to 0 (or -1) when `search_after` is used.
|
||||
|
||||
`search_after` is not a solution for jumping freely to a random page, but rather for scrolling many queries in parallel.
It is very similar to the `scroll` API but, unlike it, the `search_after` parameter is stateless: it is always resolved against the latest
version of the searcher. For this reason, the sort order may change during a walk depending on the updates and deletes of your index.
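
The cursor idea can be sketched locally. The `next_page` function and the sample documents below are hypothetical and do not call {es}; the sketch assumes the cursor is compared against whole sort values as a tuple, which is a simplification of the real matching behavior. It shows why a unique tiebreaker keeps pages free of gaps and duplicates even when `@timestamp` values repeat.

```python
def next_page(docs, sort_keys, size, search_after=None):
    """Hypothetical helper: return a page and the cursor for the next one."""
    def sort_values(doc):
        return tuple(doc[key] for key in sort_keys)

    ordered = sorted(docs, key=sort_values)
    if search_after is not None:
        cursor = tuple(search_after)
        # Keep only documents that sort strictly after the cursor.
        ordered = [doc for doc in ordered if sort_values(doc) > cursor]
    page = ordered[:size]
    # The sort values of the last hit become the next `search_after`.
    new_cursor = list(sort_values(page[-1])) if page else None
    return page, new_cursor


# Made-up documents; two of them share the same timestamp.
docs = [
    {"@timestamp": 1463538857, "tie_breaker_id": "654323"},
    {"@timestamp": 1463538857, "tie_breaker_id": "654324"},
    {"@timestamp": 1463538858, "tie_breaker_id": "654321"},
    {"@timestamp": 1463538856, "tie_breaker_id": "654322"},
    {"@timestamp": 1463538859, "tie_breaker_id": "654325"},
]

seen = []
cursor = None
while True:
    page, cursor = next_page(
        docs, ["@timestamp", "tie_breaker_id"], size=2, search_after=cursor
    )
    if not page:
        break
    seen.extend(doc["tie_breaker_id"] for doc in page)

print(seen)
```

Because the sketch re-sorts the current data on every call, it is stateless in the same sense as the walk described above: changes to `docs` between calls would change what later pages return.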
|
|
@ -1,65 +0,0 @@
|
|||
WARNING: The `stored_fields` parameter is for fields that are explicitly marked as
|
||||
stored in the mapping, which is off by default and generally not recommended.
|
||||
Use <<source-filtering,source filtering>> instead to select
|
||||
subsets of the original source document to be returned.
|
||||
|
||||
The `stored_fields` parameter lets you selectively load specific stored fields for each document represented
by a search hit.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields" : ["user", "postDate"],
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
`*` can be used to load all stored fields from the document.
|
||||
|
||||
An empty array will cause only the `_id` and `_type` for each hit to be
|
||||
returned, for example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields" : [],
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
If the requested fields are not stored (`store` mapping set to `false`), they will be ignored.
|
||||
|
||||
Stored field values fetched from the document itself are always returned as an array. On the contrary, metadata fields like `_routing` are never returned as an array.
|
||||
|
||||
Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored.
|
||||
|
||||
NOTE: On its own, `stored_fields` cannot be used to load fields in nested
|
||||
objects -- if a field contains a nested object in its path, then no data will
|
||||
be returned for that stored field. To access nested fields, `stored_fields`
|
||||
must be used within an <<inner-hits, `inner_hits`>> block.
|
||||
|
||||
[discrete]
|
||||
[[disable-stored-fields]]
|
||||
==== Disable stored fields
|
||||
|
||||
To disable the stored fields (and metadata fields) entirely, use `_none_`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields": "_none_",
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
|
||||
|
|
@ -1,178 +0,0 @@
|
|||
[discrete]
|
||||
[[track-total-hits]]
|
||||
=== Track total hits
|
||||
|
||||
Generally the total hit count can't be computed accurately without visiting all
|
||||
matches, which is costly for queries that match lots of documents. The
|
||||
`track_total_hits` parameter allows you to control how the total number of hits
|
||||
should be tracked.
|
||||
Given that it is often enough to have a lower bound of the number of hits,
such as "there are at least 10000 hits", the default is set to `10,000`.
This means that requests will count the total hits accurately up to `10,000` hits.
It is a good trade-off to speed up searches if you don't need the accurate number
of hits after a certain threshold.
|
||||
|
||||
When set to `true`, the search response will always track the number of hits that
match the query accurately (i.e. `total.relation` will always be equal to `"eq"`
when `track_total_hits` is set to true). Otherwise the `"total.relation"` returned
in the `"total"` object in the search response determines how the `"total.value"`
should be interpreted. A value of `"gte"` means that the `"total.value"` is a
lower bound of the total hits that match the query and a value of `"eq"` indicates
that `"total.value"` is the accurate count.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": true,
|
||||
"query": {
|
||||
"match" : {
|
||||
"user.id" : "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
\... returns:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 100,
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total" : {
|
||||
"value": 2048, <1>
|
||||
"relation": "eq" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 100/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> The total number of hits that match the query.
|
||||
<2> The count is accurate (i.e. `"eq"` means equals).
|
||||
|
||||
It is also possible to set `track_total_hits` to an integer.
|
||||
For instance, the following query accurately tracks the total number of hits that match
the query up to 100 documents:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": 100,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[continued]
|
||||
|
||||
The `hits.total.relation` in the response will indicate if the
|
||||
value returned in `hits.total.value` is accurate (`"eq"`) or a lower
|
||||
bound of the total (`"gte"`).
|
||||
|
||||
For instance the following response:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 30,
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total": {
|
||||
"value": 42, <1>
|
||||
"relation": "eq" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 30/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> 42 documents match the query
|
||||
<2> and the count is accurate (`"eq"`)
|
||||
|
||||
\... indicates that the number of hits returned in the `total`
|
||||
is accurate.
|
||||
|
||||
If the total number of hits that match the query is greater than the
|
||||
value set in `track_total_hits`, the total hits in the response
|
||||
will indicate that the returned value is a lower bound:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total": {
|
||||
"value": 100, <1>
|
||||
"relation": "gte" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[skip:response is already tested in the previous snippet]
|
||||
|
||||
<1> There are at least 100 documents that match the query
|
||||
<2> This is a lower bound (`"gte"`).
|
||||
|
||||
If you don't need to track the total number of hits at all you can improve query
|
||||
times by setting this option to `false`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": false,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[continued]
|
||||
|
||||
\... returns:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 10,
|
||||
"hits": { <1>
|
||||
"max_score": 1.0,
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> The total number of hits is unknown.
|
||||
|
||||
Finally you can force an accurate count by setting `"track_total_hits"`
|
||||
to `true` in the request.
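
The three modes described in this section can be summarized with a small local model. The `total_hits` function is hypothetical and is not part of any {es} client. It assumes that a count exactly equal to the threshold is still reported as `"eq"`, which this section does not state explicitly.

```python
def total_hits(match_count, track_total_hits=10_000):
    """Hypothetical model of the `hits.total` object for a given setting."""
    if track_total_hits is False:
        # Tracking disabled: the response carries no total at all.
        return None
    if track_total_hits is True or match_count <= track_total_hits:
        # Either counting is exact, or the count stayed within the threshold.
        return {"value": match_count, "relation": "eq"}
    # More matches than the threshold: only a lower bound is known.
    return {"value": track_total_hits, "relation": "gte"}


print(total_hits(42, track_total_hits=100))
print(total_hits(2048, track_total_hits=100))
print(total_hits(2048, track_total_hits=True))
print(total_hits(2048, track_total_hits=False))
```

The `is True` check comes first on purpose: in Python `True == 1`, so comparing a boolean as if it were a threshold would give misleading results.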
|
|
@ -1,222 +0,0 @@
|
|||
[[search-your-data]]
|
||||
= Search your data
|
||||
|
||||
[[search-query]]
|
||||
A _search query_, or _query_, is a request for information about data in
|
||||
{es} data streams or indices.
|
||||
|
||||
You can think of a query as a question, written in a way {es} understands.
|
||||
Depending on your data, you can use a query to get answers to questions like:
|
||||
|
||||
* What processes on my server take longer than 500 milliseconds to respond?
|
||||
* What users on my network ran `regsvr32.exe` within the last week?
|
||||
* What pages on my website contain a specific word or phrase?
|
||||
|
||||
A _search_ consists of one or more queries that are combined and sent to {es}.
|
||||
Documents that match a search's queries are returned in the _hits_, or
|
||||
_search results_, of the response.
|
||||
|
||||
A search may also contain additional information used to better process its
|
||||
queries. For example, a search may be limited to a specific index or only return
|
||||
a specific number of results.
|
||||
|
||||
[discrete]
|
||||
[[run-an-es-search]]
|
||||
== Run a search
|
||||
|
||||
You can use the <<search-search,search API>> to search and
|
||||
<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
|
||||
The API's `query` request body parameter accepts queries written in
|
||||
<<query-dsl,Query DSL>>.
|
||||
|
||||
The following request searches `my-index-000001` using a
|
||||
<<query-dsl-match-query,`match`>> query. This query matches documents with a
|
||||
`user.id` value of `kimchy`.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /my-index-000001/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[setup:my_index]
|
||||
|
||||
The API response returns the top 10 documents matching the query in the
|
||||
`hits.hits` property.
|
||||
|
||||
[source,console-result]
|
||||
----
|
||||
{
|
||||
"took": 5,
|
||||
"timed_out": false,
|
||||
"_shards": {
|
||||
"total": 1,
|
||||
"successful": 1,
|
||||
"skipped": 0,
|
||||
"failed": 0
|
||||
},
|
||||
"hits": {
|
||||
"total": {
|
||||
"value": 1,
|
||||
"relation": "eq"
|
||||
},
|
||||
"max_score": 1.3862942,
|
||||
"hits": [
|
||||
{
|
||||
"_index": "my-index-000001",
|
||||
"_type": "_doc",
|
||||
"_id": "kxWFcnMByiguvud1Z8vC",
|
||||
"_score": 1.3862942,
|
||||
"_source": {
|
||||
"@timestamp": "2099-11-15T14:12:12",
|
||||
"http": {
|
||||
"request": {
|
||||
"method": "get"
|
||||
},
|
||||
"response": {
|
||||
"bytes": 1070000,
|
||||
"status_code": 200
|
||||
},
|
||||
"version": "1.1"
|
||||
},
|
||||
"message": "GET /search HTTP/1.1 200 1070000",
|
||||
"source": {
|
||||
"ip": "127.0.0.1"
|
||||
},
|
||||
"user": {
|
||||
"id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE[s/"took": 5/"took": "$body.took"/]
|
||||
// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]
|
||||
|
||||
[discrete]
|
||||
[[common-search-options]]
|
||||
=== Common search options
|
||||
|
||||
You can use the following options to customize your searches.
|
||||
|
||||
*Query DSL* +
|
||||
<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
|
||||
to get the results you want. Query types include:
|
||||
|
||||
* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
|
||||
queries>>, which let you combine queries and match results based on multiple
|
||||
criteria
|
||||
* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
|
||||
* <<full-text-queries,Full text queries>>, which are commonly used in search
|
||||
engines
|
||||
* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>
|
||||
|
||||
*Aggregations* +
|
||||
You can use <<search-aggregations,search aggregations>> to get statistics and
|
||||
other analytics for your search results. Aggregations help you answer questions
|
||||
like:
|
||||
|
||||
* What's the average response time for my servers?
|
||||
* What are the top IP addresses hit by users on my network?
|
||||
* What is the total transaction revenue by customer?
|
||||
|
||||
*Search multiple data streams and indices* +
|
||||
You can use comma-separated values and grep-like index patterns to search
|
||||
several data streams and indices in the same request. You can even boost search
|
||||
results from specific indices. See <<search-multiple-indices>>.
|
||||
|
||||
*Paginate search results* +
|
||||
By default, searches return only the top 10 matching hits. To retrieve
|
||||
more or fewer documents, see <<paginate-search-results>>.
|
||||
|
||||
*Retrieve selected fields* +
|
||||
The search response's `hits.hits` property includes the full document
|
||||
<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
|
||||
the `_source` or other fields, see <<search-fields>>.
|
||||
|
||||
*Sort search results* +
|
||||
By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
|
||||
score>> that measures how well each document matches the query. To customize the
|
||||
calculation of these scores, use the
|
||||
<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
|
||||
other field values, see <<sort-search-results>>.
|
||||
|
||||
*Run an async search* +
|
||||
{es} searches are designed to run on large volumes of data quickly, often
|
||||
returning results in milliseconds. For this reason, searches are
|
||||
_synchronous_ by default. The search request waits for complete results before
|
||||
returning a response.
|
||||
|
||||
However, complete results can take longer for searches across
|
||||
<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
|
||||
clusters>>.
|
||||
|
||||
To avoid long waits, you can run an _asynchronous_, or _async_, search
|
||||
instead. An <<async-search-intro,async search>> lets you retrieve partial
|
||||
results for a long-running search now and get complete results later.
|
||||
|
||||
[discrete]
|
||||
[[search-timeout]]
|
||||
=== Search timeout
|
||||
|
||||
By default, search requests don't time out. The request waits for complete
|
||||
results before returning a response.
|
||||
|
||||
While <<async-search-intro,async search>> is designed for long-running
|
||||
searches, you can also use the `timeout` parameter to specify a duration you'd
|
||||
like to wait for a search to complete. If no response is received before this
|
||||
period ends, the request fails and returns an error.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /my-index-000001/_search
|
||||
{
|
||||
"timeout": "2s",
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[setup:my_index]
|
||||
|
||||
To set a cluster-wide default timeout for all search requests, configure
|
||||
`search.default_search_timeout` using the <<cluster-update-settings,cluster
|
||||
settings API>>. This global timeout duration is used if no `timeout` argument is
|
||||
passed in the request. If the global search timeout expires before the search
|
||||
request finishes, the request is cancelled using <<task-cancellation,task
|
||||
cancellation>>. The `search.default_search_timeout` setting defaults to `-1` (no
|
||||
timeout).
|
||||
|
||||
[discrete]
|
||||
[[global-search-cancellation]]
|
||||
=== Search cancellation
|
||||
|
||||
You can cancel a search request using the <<task-cancellation,task management
|
||||
API>>. {es} also automatically cancels a search request when your client's HTTP
|
||||
connection closes. We recommend you set up your client to close HTTP connections
|
||||
when a search request is aborted or times out.
|
||||
|
||||
include::request/track-total-hits.asciidoc[]
|
||||
include::quickly-check-for-matching-docs.asciidoc[]
|
||||
|
||||
include::request/collapse.asciidoc[]
|
||||
include::filter-search-results.asciidoc[]
|
||||
include::request/highlighting.asciidoc[]
|
||||
include::{es-repo-dir}/async-search.asciidoc[]
|
||||
include::{es-repo-dir}/search/near-real-time.asciidoc[]
|
||||
include::paginate-search-results.asciidoc[]
|
||||
include::request/inner-hits.asciidoc[]
|
||||
include::search-fields.asciidoc[]
|
||||
include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
|
||||
include::search-multiple-indices.asciidoc[]
|
||||
include::search-shard-routing.asciidoc[]
|
||||
include::request/sort.asciidoc[]
|
|
@ -1,3 +1,152 @@
|
|||
[[filter-search-results]]
|
||||
== Filter search results
|
||||
|
||||
You can use two methods to filter search results:
|
||||
|
||||
* Use a boolean query with a `filter` clause. Search requests apply
|
||||
<<query-dsl-bool-query,boolean filters>> to both search hits and
|
||||
<<search-aggregations,aggregations>>.
|
||||
|
||||
* Use the search API's `post_filter` parameter. Search requests apply
|
||||
<<post-filter,post filters>> only to search hits, not aggregations. You can use
|
||||
a post filter to calculate aggregations based on a broader result set, and then
|
||||
further narrow the results.
|
||||
+
|
||||
You can also <<rescore,rescore>> hits after the post filter to
|
||||
improve relevance and reorder results.
|
||||
|
||||
[discrete]
|
||||
[[post-filter]]
|
||||
=== Post filter
|
||||
|
||||
When you use the `post_filter` parameter to filter search results, the search
|
||||
hits are filtered after the aggregations are calculated. A post filter has no
|
||||
impact on the aggregation results.
|
||||
|
||||
For example, you are selling shirts that have the following properties:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
PUT /shirts
|
||||
{
|
||||
"mappings": {
|
||||
"properties": {
|
||||
"brand": { "type": "keyword"},
|
||||
"color": { "type": "keyword"},
|
||||
"model": { "type": "keyword"}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT /shirts/_doc/1?refresh
|
||||
{
|
||||
"brand": "gucci",
|
||||
"color": "red",
|
||||
"model": "slim"
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTSETUP
|
||||
|
||||
|
||||
Imagine a user has specified two filters:
`color:red` and `brand:gucci`. You only want to show them red shirts made by
|
||||
Gucci in the search results. Normally you would do this with a
|
||||
<<query-dsl-bool-query,`bool` query>>:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": [
|
||||
{ "term": { "color": "red" }},
|
||||
{ "term": { "brand": "gucci" }}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
However, you would also like to use _faceted navigation_ to display a list of
|
||||
other options that the user could click on. Perhaps you have a `model` field
|
||||
that would allow the user to limit their search results to red Gucci
|
||||
`t-shirts` or `dress-shirts`.
|
||||
|
||||
This can be done with a
|
||||
<<search-aggregations-bucket-terms-aggregation,`terms` aggregation>>:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": [
|
||||
{ "term": { "color": "red" }},
|
||||
{ "term": { "brand": "gucci" }}
|
||||
]
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"models": {
|
||||
"terms": { "field": "model" } <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
<1> Returns the most popular models of red shirts by Gucci.
|
||||
|
||||
But perhaps you would also like to tell the user how many Gucci shirts are
|
||||
available in *other colors*. If you just add a `terms` aggregation on the
|
||||
`color` field, you will only get back the color `red`, because your query
|
||||
returns only red shirts by Gucci.
|
||||
|
||||
Instead, you want to include shirts of all colors during aggregation, then
|
||||
apply the `color` filter only to the search results. This is the purpose of
|
||||
the `post_filter`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /shirts/_search
|
||||
{
|
||||
"query": {
|
||||
"bool": {
|
||||
"filter": {
|
||||
"term": { "brand": "gucci" } <1>
|
||||
}
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"colors": {
|
||||
"terms": { "field": "color" } <2>
|
||||
},
|
||||
"color_red": {
|
||||
"filter": {
|
||||
"term": { "color": "red" } <3>
|
||||
},
|
||||
"aggs": {
|
||||
"models": {
|
||||
"terms": { "field": "model" } <3>
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"post_filter": { <4>
|
||||
"term": { "color": "red" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
<1> The main query now finds all shirts by Gucci, regardless of color.
|
||||
<2> The `colors` agg returns popular colors for shirts by Gucci.
|
||||
<3> The `color_red` agg limits the `models` sub-aggregation
|
||||
to *red* Gucci shirts.
|
||||
<4> Finally, the `post_filter` removes colors other than red
|
||||
from the search `hits`.
|
||||
|
||||
[discrete]
|
||||
[[rescore]]
|
||||
=== Rescore filtered search results
|
|
@ -1,3 +1,52 @@
|
|||
[[paginate-search-results]]
|
||||
== Paginate search results
|
||||
|
||||
By default, the <<search-search,search API>> returns the top 10 matching documents.
|
||||
|
||||
To paginate through a larger set of results, you can use the search API's `size`
|
||||
and `from` parameters. The `size` parameter is the number of matching documents
|
||||
to return. The `from` parameter is a zero-indexed offset from the beginning of
|
||||
the complete result set that indicates the document you want to start with.
|
||||
|
||||
The following search API request sets the `from` offset to `5`, meaning the
|
||||
request offsets, or skips, the first five matching documents.
|
||||
|
||||
The `size` parameter is `20`, meaning the request can return up to 20 documents,
|
||||
starting at the offset.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /_search
|
||||
{
|
||||
"from": 5,
|
||||
"size": 20,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
By default, you cannot page through more than 10,000 documents using the `from`
|
||||
and `size` parameters. This limit is set using the
|
||||
<<index-max-result-window,`index.max_result_window`>> index setting.
|
||||
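If you do need to page past this limit with `from` and `size`, you can raise the setting on an index. The following is a minimal sketch that assumes an index named `my-index-000001` and an illustrative new limit of `20000`. Because search requests take heap memory and time proportional to `from + size`, treat a higher limit as a stopgap rather than a fix for deep paging.

[source,console]
----
PUT /my-index-000001/_settings
{
  "index.max_result_window": 20000
}
----
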
|
||||
Deep paging or requesting many results at once can result in slow searches.
|
||||
Results are sorted before being returned. Because search requests usually span
|
||||
multiple shards, each shard must generate its own sorted results. These separate
|
||||
results must then be combined and sorted to ensure that the overall sort order
|
||||
is correct.
|
||||
|
||||
As an alternative to deep paging, we recommend using
|
||||
<<scroll-search-results,scroll>> or the
|
||||
<<search-after,`search_after`>> parameter.
|
||||
|
||||
WARNING: {es} uses Lucene's internal doc IDs as tie-breakers. These internal
|
||||
doc IDs can be completely different across replicas of the same
|
||||
data. When paginating, you might occasionally see that documents with the same
|
||||
sort values are not ordered consistently.
|
||||
|
||||
[discrete]
|
||||
[[scroll-search-results]]
|
||||
=== Scroll search results
|
||||
|
@ -291,3 +340,85 @@ For append only time-based indices, the `timestamp` field can be used safely.
|
|||
|
||||
NOTE: By default the maximum number of slices allowed per scroll is limited to 1024.
|
||||
You can update the `index.max_slices_per_scroll` index setting to bypass this limit.
|
||||
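For example, the following sketch raises the slice limit for a single index. The index name `my-index-000001` and the value `2048` are illustrative assumptions, not recommendations.

[source,console]
----
PUT /my-index-000001/_settings
{
  "index.max_slices_per_scroll": 2048
}
----
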
|
||||
[discrete]
|
||||
[[search-after]]
|
||||
=== Search after
|
||||
|
||||
You can paginate results using `from` and `size`, but the cost becomes prohibitive for deep pagination.
The `index.max_result_window` setting, which defaults to 10,000, is a safeguard: search requests take heap memory and time proportional to `from + size`.
The <<scroll-search-results,scroll>> API is recommended for efficient deep scrolling, but scroll contexts are costly and are not
recommended for real-time user requests.
The `search_after` parameter circumvents this problem by providing a live cursor.
The idea is to use the results from the previous page to help retrieve the next page.
|
||||
|
||||
Suppose that the query to retrieve the first page looks like this:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"size": 10,
|
||||
"query": {
|
||||
"match" : {
|
||||
"message" : "foo"
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{"@timestamp": "asc"},
|
||||
{"tie_breaker_id": "asc"} <1>
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
||||
|
||||
<1> A copy of the `_id` field with `doc_values` enabled
|
||||
|
||||
[IMPORTANT]
|
||||
Use a field with one unique value per document as the tiebreaker
of the sort specification. Otherwise, the sort order for documents that have
the same sort values is undefined and could lead to missing or duplicate
results. The <<mapping-id-field,`_id` field>> has a unique value per document,
but using it directly as a tiebreaker is not recommended.
Beware that `search_after` looks for the first document that fully or partially
matches the tiebreaker's provided value. Therefore, if a document has a tiebreaker value of
`"654323"` and you `search_after` for `"654"`, it would still match that document
and return results found after it.
<<doc-values,Doc values>> are disabled on the `_id` field, so sorting on it requires
loading a lot of data into memory. Instead, duplicate the content
of the <<mapping-id-field,`_id` field>> (client side
or with a <<ingest-processors,set ingest processor>>) into another field that has
<<doc-values,doc values>> enabled, and use this new field as the tiebreaker
for the sort.
|
||||
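One way to populate such a tiebreaker field is with an ingest pipeline. The following is a minimal sketch, not a definitive setup. The pipeline name `copy-id-to-tie-breaker` is hypothetical, and the target field `tie_breaker_id` matches the sort examples in this section. The sketch assumes documents are indexed with explicit IDs, since an automatically generated `_id` may not be available to the pipeline, and that `tie_breaker_id` is mapped as a `keyword` field so that doc values are enabled.

[source,console]
----
PUT /_ingest/pipeline/copy-id-to-tie-breaker
{
  "description": "Copy the document ID into a field with doc values",
  "processors": [
    {
      "set": {
        "field": "tie_breaker_id",
        "value": "{{_id}}"
      }
    }
  ]
}

PUT /my-index-000001/_doc/654323?pipeline=copy-id-to-tie-breaker
{
  "@timestamp": "2099-11-15T14:12:12",
  "message": "foo"
}
----
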
|
||||
The result from the above request includes an array of `sort values` for each document.
These `sort values` can be used with the `search_after` parameter to start returning results "after" any
document in the result list.
For instance, you can pass the `sort values` of the last document to `search_after` to retrieve the next page of results:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"size": 10,
|
||||
"query": {
|
||||
"match" : {
|
||||
"message" : "foo"
|
||||
}
|
||||
},
|
||||
"search_after": [1463538857, "654323"],
|
||||
"sort": [
|
||||
{"@timestamp": "asc"},
|
||||
{"tie_breaker_id": "asc"}
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
// TEST[s/"tie_breaker_id": "asc"/"tie_breaker_id": {"unmapped_type": "keyword"}/]
|
||||
|
||||
NOTE: The parameter `from` must be set to 0 (or -1) when `search_after` is used.
|
||||
|
||||
`search_after` is not a solution for jumping freely to a random page, but rather for scrolling many queries in parallel.
It is very similar to the `scroll` API, but unlike it, the `search_after` parameter is stateless: it is always resolved against the latest
version of the searcher. For this reason, the sort order may change during a walk, depending on the updates and deletes of your index.
|
|
@ -231,7 +231,70 @@ It's also possible to store an individual field's values by using the
|
|||
<<mapping-store,`store`>> mapping option. You can use the
|
||||
`stored_fields` parameter to include these stored values in the search response.
|
||||
|
||||
include::request/stored-fields.asciidoc[]
|
||||
WARNING: The `stored_fields` parameter is for fields that are explicitly marked as
|
||||
stored in the mapping, which is off by default and generally not recommended.
|
||||
Use <<source-filtering,source filtering>> instead to select
|
||||
subsets of the original source document to be returned.
|
||||
|
||||
The `stored_fields` parameter lets you selectively load specific stored fields for each document represented
by a search hit.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields" : ["user", "postDate"],
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
`*` can be used to load all stored fields from the document.
|
||||
|
||||
An empty array will cause only the `_id` and `_type` for each hit to be
|
||||
returned, for example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields" : [],
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
If the requested fields are not stored (`store` mapping set to `false`), they will be ignored.
|
||||
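For reference, a field is stored only when its mapping sets `store` to `true`. The following sketch assumes a hypothetical index `my-index-000002` with a stored `title` field and an unstored `content` field. Under that assumption, the search can return `title` through `stored_fields`, while a request for `content` would be ignored.

[source,console]
----
PUT /my-index-000002
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "content": {
        "type": "text"
      }
    }
  }
}

GET /my-index-000002/_search
{
  "stored_fields": [ "title" ],
  "query": {
    "match_all": {}
  }
}
----
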
|
||||
Stored field values fetched from the document itself are always returned as an array. By contrast, metadata fields like `_routing` are never returned as an array.
|
||||
|
||||
Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored.
|
||||
|
||||
NOTE: On its own, `stored_fields` cannot be used to load fields in nested
|
||||
objects -- if a field contains a nested object in its path, then no data will
|
||||
be returned for that stored field. To access nested fields, `stored_fields`
|
||||
must be used within an <<inner-hits, `inner_hits`>> block.
|
||||
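As a hedged sketch of that pattern, the following request loads a stored field from nested objects through `inner_hits`. The index name `my-index-000003`, the nested field `comments`, and its sub-field `comments.author` are hypothetical. The sketch assumes a mapping in which `comments` has the `nested` type and `comments.author` sets `store` to `true`.

[source,console]
----
GET /my-index-000003/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "match_all": {}
      },
      "inner_hits": {
        "stored_fields": [ "comments.author" ]
      }
    }
  }
}
----
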
|
||||
[discrete]
|
||||
[[disable-stored-fields]]
|
||||
==== Disable stored fields
|
||||
|
||||
To disable the stored fields (and metadata fields) entirely, use `_none_`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"stored_fields": "_none_",
|
||||
"query" : {
|
||||
"term" : { "user" : "kimchy" }
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
|
||||
|
||||
[discrete]
|
||||
[[source-filtering]]
|
||||
|
@ -319,4 +382,74 @@ GET /_search
|
|||
}
|
||||
----
|
||||
|
||||
include::request/script-fields.asciidoc[]
|
||||
[discrete]
|
||||
[[script-fields]]
|
||||
=== Script fields
|
||||
|
||||
You can use the `script_fields` parameter to retrieve a <<modules-scripting,script
|
||||
evaluation>> (based on different fields) for each hit. For example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"query": {
|
||||
"match_all": {}
|
||||
},
|
||||
"script_fields": {
|
||||
"test1": {
|
||||
"script": {
|
||||
"lang": "painless",
|
||||
"source": "doc['price'].value * 2"
|
||||
}
|
||||
},
|
||||
"test2": {
|
||||
"script": {
|
||||
"lang": "painless",
|
||||
"source": "doc['price'].value * params.factor",
|
||||
"params": {
|
||||
"factor": 2.0
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:sales]
|
||||
|
||||
Script fields can work on fields that are not stored (`price` in
the above case) and can return custom values (the
evaluated value of the script).
|
||||
|
||||
Script fields can also access the actual `_source` document and
|
||||
extract specific elements to be returned from it by using `params['_source']`.
|
||||
Here is an example:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search
|
||||
{
|
||||
"query" : {
|
||||
"match_all": {}
|
||||
},
|
||||
"script_fields" : {
|
||||
"test1" : {
|
||||
"script" : "params['_source']['message']"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
Note the `_source` keyword used here to navigate the JSON-like model.
|
||||
|
||||
It's important to understand the difference between
`doc['my_field'].value` and `params['_source']['my_field']`. The first,
using the `doc` keyword, causes the terms for that field to be loaded into
memory (cached), which results in faster execution but more memory
consumption. Also, the `doc[...]` notation only allows for simple-valued
fields (you can't return a JSON object from it) and makes sense only for
non-analyzed or single-term-based fields. However, using `doc` is
still the recommended way to access values from the document, if at all
possible, because `_source` must be loaded and parsed every time it's used.
Using `_source` is very slow.
|
|
@ -0,0 +1,459 @@
|
|||
[[search-your-data]]
|
||||
= Search your data
|
||||
|
||||
[[search-query]]
|
||||
A _search query_, or _query_, is a request for information about data in
|
||||
{es} data streams or indices.
|
||||
|
||||
You can think of a query as a question, written in a way {es} understands.
|
||||
Depending on your data, you can use a query to get answers to questions like:
|
||||
|
||||
* What processes on my server take longer than 500 milliseconds to respond?
|
||||
* What users on my network ran `regsvr32.exe` within the last week?
|
||||
* What pages on my website contain a specific word or phrase?
|
||||
|
||||
A _search_ consists of one or more queries that are combined and sent to {es}.
|
||||
Documents that match a search's queries are returned in the _hits_, or
|
||||
_search results_, of the response.
|
||||
|
||||
A search may also contain additional information used to better process its
|
||||
queries. For example, a search may be limited to a specific index or only return
|
||||
a specific number of results.
|
||||
|
||||
[discrete]
|
||||
[[run-an-es-search]]
|
||||
== Run a search
|
||||
|
||||
You can use the <<search-search,search API>> to search and
|
||||
<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
|
||||
The API's `query` request body parameter accepts queries written in
|
||||
<<query-dsl,Query DSL>>.
|
||||
|
||||
The following request searches `my-index-000001` using a
|
||||
<<query-dsl-match-query,`match`>> query. This query matches documents with a
|
||||
`user.id` value of `kimchy`.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /my-index-000001/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[setup:my_index]
|
||||
|
||||
The API response returns the top 10 documents matching the query in the
|
||||
`hits.hits` property.
|
||||
|
||||
[source,console-result]
|
||||
----
|
||||
{
|
||||
"took": 5,
|
||||
"timed_out": false,
|
||||
"_shards": {
|
||||
"total": 1,
|
||||
"successful": 1,
|
||||
"skipped": 0,
|
||||
"failed": 0
|
||||
},
|
||||
"hits": {
|
||||
"total": {
|
||||
"value": 1,
|
||||
"relation": "eq"
|
||||
},
|
||||
"max_score": 1.3862942,
|
||||
"hits": [
|
||||
{
|
||||
"_index": "my-index-000001",
|
||||
"_type": "_doc",
|
||||
"_id": "kxWFcnMByiguvud1Z8vC",
|
||||
"_score": 1.3862942,
|
||||
"_source": {
|
||||
"@timestamp": "2099-11-15T14:12:12",
|
||||
"http": {
|
||||
"request": {
|
||||
"method": "get"
|
||||
},
|
||||
"response": {
|
||||
"bytes": 1070000,
|
||||
"status_code": 200
|
||||
},
|
||||
"version": "1.1"
|
||||
},
|
||||
"message": "GET /search HTTP/1.1 200 1070000",
|
||||
"source": {
|
||||
"ip": "127.0.0.1"
|
||||
},
|
||||
"user": {
|
||||
"id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
// TESTRESPONSE[s/"took": 5/"took": "$body.took"/]
|
||||
// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]
|
||||
|
||||
[discrete]
|
||||
[[common-search-options]]
|
||||
=== Common search options
|
||||
|
||||
You can use the following options to customize your searches.
|
||||
|
||||
*Query DSL* +
|
||||
<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
|
||||
to get the results you want. Query types include:
|
||||
|
||||
* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
|
||||
queries>>, which let you combine queries and match results based on multiple
|
||||
criteria
|
||||
* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
|
||||
* <<full-text-queries,Full text queries>>, which are commonly used in search
|
||||
engines
|
||||
* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>
|
||||
|
||||
*Aggregations* +
|
||||
You can use <<search-aggregations,search aggregations>> to get statistics and
|
||||
other analytics for your search results. Aggregations help you answer questions
|
||||
like:
|
||||
|
||||
* What's the average response time for my servers?
|
||||
* What are the top IP addresses hit by users on my network?
|
||||
* What is the total transaction revenue by customer?
|
||||
|
||||
*Search multiple data streams and indices* +
|
||||
You can use comma-separated values and grep-like index patterns to search
|
||||
several data streams and indices in the same request. You can even boost search
|
||||
results from specific indices. See <<search-multiple-indices>>.
|
||||
|
||||
*Paginate search results* +
|
||||
By default, searches return only the top 10 matching hits. To retrieve
|
||||
more or fewer documents, see <<paginate-search-results>>.
|
||||
|
||||
*Retrieve selected fields* +
|
||||
The search response's `hits.hits` property includes the full document
|
||||
<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
|
||||
the `_source` or other fields, see <<search-fields>>.
|
||||
|
||||
*Sort search results* +
|
||||
By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
|
||||
score>> that measures how well each document matches the query. To customize the
|
||||
calculation of these scores, use the
|
||||
<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
|
||||
other field values, see <<sort-search-results>>.
|
||||
|
||||
*Run an async search* +
|
||||
{es} searches are designed to run on large volumes of data quickly, often
|
||||
returning results in milliseconds. For this reason, searches are
|
||||
_synchronous_ by default. The search request waits for complete results before
|
||||
returning a response.
|
||||
|
||||
However, complete results can take longer for searches across
|
||||
<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
|
||||
clusters>>.
|
||||
|
||||
To avoid long waits, you can run an _asynchronous_, or _async_, search
|
||||
instead. An <<async-search-intro,async search>> lets you retrieve partial
|
||||
results for a long-running search now and get complete results later.
|
||||
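The following is a minimal sketch of that flow using the async search API. The `wait_for_completion_timeout` value of `2s` is illustrative, and `<id>` is a placeholder for the `id` value that the first request returns when the search is still running after that wait.

[source,console]
----
POST /my-index-000001/_async_search?wait_for_completion_timeout=2s
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

GET /_async_search/<id>
----
// TEST[skip:uses a placeholder search ID]
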
|
||||
[discrete]
|
||||
[[search-timeout]]
|
||||
=== Search timeout
|
||||
|
||||
By default, search requests don't time out. The request waits for complete
|
||||
results before returning a response.
|
||||
|
||||
While <<async-search-intro,async search>> is designed for long-running
|
||||
searches, you can also use the `timeout` parameter to specify a duration you'd
|
||||
like to wait for a search to complete. If no response is received before this
|
||||
period ends, the request fails and returns an error.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
GET /my-index-000001/_search
|
||||
{
|
||||
"timeout": "2s",
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "kimchy"
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
// TEST[setup:my_index]
|
||||
|
||||
To set a cluster-wide default timeout for all search requests, configure
|
||||
`search.default_search_timeout` using the <<cluster-update-settings,cluster
|
||||
settings API>>. This global timeout duration is used if no `timeout` argument is
|
||||
passed in the request. If the global search timeout expires before the search
|
||||
request finishes, the request is cancelled using <<task-cancellation,task
|
||||
cancellation>>. The `search.default_search_timeout` setting defaults to `-1` (no
|
||||
timeout).
|
||||
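As a minimal sketch, the following request sets the cluster-wide default with the cluster update settings API. The `30s` value and the choice of a `persistent` rather than a `transient` setting are illustrative assumptions, not recommendations.

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "search.default_search_timeout": "30s"
  }
}
----

Setting the value to `null` in the same kind of request restores the default of `-1`. A `timeout` passed in an individual search request still takes precedence over this default.
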
|
||||
[discrete]
|
||||
[[global-search-cancellation]]
|
||||
=== Search cancellation
|
||||
|
||||
You can cancel a search request using the <<task-cancellation,task management
|
||||
API>>. {es} also automatically cancels a search request when your client's HTTP
|
||||
connection closes. We recommend you set up your client to close HTTP connections
|
||||
when a search request is aborted or times out.
|
||||
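The following sketch shows one way to cancel a search manually, assuming you first look up the task ID of the running search. The task ID `oTUltX4IQMOUUVeiohTt8A:12345` is a hypothetical placeholder. Use an ID returned by the first request instead.

[source,console]
----
GET /_tasks?actions=*search&detailed

POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
----
// TEST[skip:uses a hypothetical task ID]
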
|
||||
[discrete]
|
||||
[[track-total-hits]]
|
||||
=== Track total hits
|
||||
|
||||
Generally the total hit count can't be computed accurately without visiting all
|
||||
matches, which is costly for queries that match lots of documents. The
|
||||
`track_total_hits` parameter allows you to control how the total number of hits
|
||||
should be tracked.
|
||||
Given that it is often enough to have a lower bound of the number of hits,
|
||||
such as "there are at least 10000 hits", the default is set to `10,000`.
|
||||
This means that requests count total hits accurately up to `10,000` hits.
This is a good trade-off to speed up searches if you don't need the accurate number
of hits after a certain threshold.
|
||||
|
||||
When set to `true` the search response will always track the number of hits that
|
||||
match the query accurately (e.g. `total.relation` will always be equal to `"eq"`
|
||||
when `track_total_hits` is set to true). Otherwise the `"total.relation"` returned
|
||||
in the `"total"` object in the search response determines how the `"total.value"`
|
||||
should be interpreted. A value of `"gte"` means that the `"total.value"` is a
|
||||
lower bound of the total hits that match the query and a value of `"eq"` indicates
|
||||
that `"total.value"` is the accurate count.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": true,
|
||||
"query": {
|
||||
"match" : {
|
||||
"user.id" : "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
\... returns:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 100,
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total" : {
|
||||
"value": 2048, <1>
|
||||
"relation": "eq" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 100/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> The total number of hits that match the query.
|
||||
<2> The count is accurate (`"eq"` means equals).
|
||||
|
||||
It is also possible to set `track_total_hits` to an integer.
|
||||
For instance, the following query accurately tracks the total number of hits that match
the query, up to 100 documents:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": 100,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[continued]
|
||||
|
||||
The `hits.total.relation` in the response will indicate if the
|
||||
value returned in `hits.total.value` is accurate (`"eq"`) or a lower
|
||||
bound of the total (`"gte"`).
|
||||
|
||||
For instance the following response:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 30,
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total": {
|
||||
"value": 42, <1>
|
||||
"relation": "eq" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 30/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> 42 documents match the query
|
||||
<2> and the count is accurate (`"eq"`)
|
||||
|
||||
\... indicates that the number of hits returned in the `total`
|
||||
is accurate.
|
||||
|
||||
If the total number of hits that match the query is greater than the
|
||||
value set in `track_total_hits`, the total hits in the response
|
||||
will indicate that the returned value is a lower bound:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"hits": {
|
||||
"max_score": 1.0,
|
||||
"total": {
|
||||
"value": 100, <1>
|
||||
"relation": "gte" <2>
|
||||
},
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[skip:response is already tested in the previous snippet]
|
||||
|
||||
<1> There are at least 100 documents that match the query
|
||||
<2> This is a lower bound (`"gte"`).
|
||||
|
||||
If you don't need to track the total number of hits at all you can improve query
|
||||
times by setting this option to `false`:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET my-index-000001/_search
|
||||
{
|
||||
"track_total_hits": false,
|
||||
"query": {
|
||||
"match": {
|
||||
"user.id": "elkbee"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[continued]
|
||||
|
||||
\... returns:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"_shards": ...
|
||||
"timed_out": false,
|
||||
"took": 10,
|
||||
"hits": { <1>
|
||||
"max_score": 1.0,
|
||||
"hits": ...
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
||||
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
|
||||
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
||||
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
||||
|
||||
<1> The total number of hits is unknown.
|
||||
|
||||
Finally, you can force an accurate count by setting `"track_total_hits"`
|
||||
to `true` in the request.
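
As a rough client-side illustration of the three cases above, the following
Python sketch reads `hits.total` from an already parsed search response. The
helper name `describe_total` and the sample dictionaries are hypothetical and
are not part of any Elasticsearch client; only the `value` and `relation`
fields come from the responses shown above.

```python
def describe_total(response: dict) -> str:
    """Summarize hits.total from a parsed search response (hypothetical helper)."""
    total = response.get("hits", {}).get("total")
    if total is None:
        # No total in the response, as when track_total_hits is false.
        return "total unknown"
    if total["relation"] == "eq":
        # The count is accurate.
        return f"exactly {total['value']} hits"
    # "gte": the value is only a lower bound of the real count.
    return f"at least {total['value']} hits"


# Sample bodies modeled on the responses shown above (illustrative only).
exact = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}
bounded = {"hits": {"total": {"value": 100, "relation": "gte"}, "hits": []}}
untracked = {"hits": {"max_score": 1.0, "hits": []}}

for body in (exact, bounded, untracked):
    print(describe_total(body))
```

Checking `relation` before displaying `value` avoids presenting a lower bound
as if it were an exact count.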
|
||||
|
||||
[discrete]
|
||||
[[quickly-check-for-matching-docs]]
|
||||
=== Quickly check for matching docs
|
||||
|
||||
If you only want to know whether any documents match a specific query, you can
set `size` to `0` to indicate that you are not interested in the search
results. You can also set `terminate_after` to `1` so that query execution
stops on each shard as soon as the first matching document is found.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /_search?q=user.id:elkbee&size=0&terminate_after=1
|
||||
--------------------------------------------------
|
||||
// TEST[setup:my_index]
|
||||
|
||||
NOTE: `terminate_after` is always applied **after** the
<<post-filter,`post_filter`>> and stops both the query and the aggregation
executions once enough hits have been collected on the shard. However, the doc
count on aggregations may not reflect the `hits.total` in the response, because
aggregations are applied **before** the post filtering.
|
||||
|
||||
The response will not contain any hits because `size` was set to `0`. The
`hits.total` will be either equal to `0`, indicating that there were no
matching documents, or greater than `0`, meaning that at least that many
documents matched the query when it was terminated early.
If the query was terminated early, the `terminated_early` flag is also
set to `true` in the response.
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"took": 3,
|
||||
"timed_out": false,
|
||||
"terminated_early": true,
|
||||
"_shards": {
|
||||
"total": 1,
|
||||
"successful": 1,
|
||||
"skipped" : 0,
|
||||
"failed": 0
|
||||
},
|
||||
"hits": {
|
||||
"total" : {
|
||||
"value": 1,
|
||||
"relation": "eq"
|
||||
},
|
||||
"max_score": null,
|
||||
"hits": []
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TESTRESPONSE[s/"took": 3/"took": $body.took/]
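
A client could interpret such a response along the following lines. This is a
minimal Python sketch that assumes the response body has already been parsed
into a dictionary; the helper names `has_matches` and `stopped_early` are
hypothetical.

```python
def has_matches(response: dict) -> bool:
    """True if at least one document matched the query (hypothetical helper)."""
    return response["hits"]["total"]["value"] > 0


def stopped_early(response: dict) -> bool:
    """True if the response reports that the query was terminated early."""
    # The flag is assumed to be absent when the query was not terminated early.
    return response.get("terminated_early", False)


# Sample body copied from the response shown above.
response = {
    "took": 3,
    "timed_out": False,
    "terminated_early": True,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": None,
        "hits": [],
    },
}

print(has_matches(response), stopped_early(response))
```

Because the query may stop early, the helper treats `hits.total.value` only as
evidence that matches exist, not as a full count.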
|
||||
|
||||
|
||||
The `took` time in the response is the number of milliseconds the request took
to process, starting shortly after the node received the query and ending when
all search-related work is done, before the above JSON is returned to the
client. It therefore includes the time spent waiting in thread pools,
executing a distributed search across the whole cluster, and gathering all the
results.
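
Because `took` stops before the response is sent back, a round-trip time
measured on the client is normally larger. The following Python sketch
estimates that difference. The helper name `overhead_ms` and the
`client_elapsed_ms` value are hypothetical, and the sketch assumes the client
measured its own round-trip time in milliseconds.

```python
def overhead_ms(response: dict, client_elapsed_ms: float) -> float:
    """Client-observed time that `took` does not cover (hypothetical helper)."""
    # Clamp at zero in case of timer skew between client and server.
    return max(0.0, client_elapsed_ms - response["took"])


# Illustrative values: `took` from the response above, elapsed time assumed.
response = {"took": 3, "timed_out": False}
print(overhead_ms(response, client_elapsed_ms=12.5))
```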
|
||||
|
||||
include::collapse-search-results.asciidoc[]
|
||||
include::filter-search-results.asciidoc[]
|
||||
include::highlighting.asciidoc[]
|
||||
include::long-running-searches.asciidoc[]
|
||||
include::near-real-time.asciidoc[]
|
||||
include::paginate-search-results.asciidoc[]
|
||||
include::retrieve-inner-hits.asciidoc[]
|
||||
include::retrieve-selected-fields.asciidoc[]
|
||||
include::search-across-clusters.asciidoc[]
|
||||
include::search-multiple-indices.asciidoc[]
|
||||
include::search-shard-routing.asciidoc[]
|
||||
include::sort-search-results.asciidoc[]
|