[[search-your-data]]
= Search your data

[[search-query]]
A _search query_, or _query_, is a request for information about data in {es}
data streams or indices.

You can think of a query as a question, written in a way {es} understands.
Depending on your data, you can use a query to get answers to questions like:

* What processes on my server take longer than 500 milliseconds to respond?
* What users on my network ran `regsvr32.exe` within the last week?
* What pages on my website contain a specific word or phrase?

A _search_ consists of one or more queries that are combined and sent to {es}.
Documents that match a search's queries are returned in the _hits_, or
_search results_, of the response.

A search may also contain additional information used to better process its
queries. For example, a search may be limited to a specific index or only return
a specific number of results.

[discrete]
[[run-an-es-search]]
== Run a search

You can use the <> to search and <> data stored in {es} data streams or indices.
The API's `query` request body parameter accepts queries written in <>.

The following request searches `my-index-000001` using a <> query. This query
matches documents with a `user.id` value of `kimchy`.

[source,console]
----
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

The API response returns the top 10 documents matching the query in the
`hits.hits` property.

[source,console-result]
----
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.3862942,
    "hits": [
      {
        "_index": "my-index-000001",
        "_type": "_doc",
        "_id": "kxWFcnMByiguvud1Z8vC",
        "_score": 1.3862942,
        "_source": {
          "@timestamp": "2099-11-15T14:12:12",
          "http": {
            "request": {
              "method": "get"
            },
            "response": {
              "bytes": 1070000,
              "status_code": 200
            },
            "version": "1.1"
          },
          "message": "GET /search HTTP/1.1 200 1070000",
          "source": {
            "ip": "127.0.0.1"
          },
          "user": {
            "id": "kimchy"
          }
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/"took": 5/"took": "$body.took"/]
// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]

[discrete]
[[common-search-options]]
=== Common search options

You can use the following options to customize your searches.

*Query DSL* +
<> supports a variety of query types you can mix and match to get the results
you want. Query types include:

* <> and other <>, which let you combine queries and match results based on
multiple criteria
* <> for filtering and finding exact matches
* <>, which are commonly used in search engines
* <> and <>

*Aggregations* +
You can use <> to get statistics and other analytics for your search results.
Aggregations help you answer questions like:

* What's the average response time for my servers?
* What are the top IP addresses hit by users on my network?
* What is the total transaction revenue by customer?

*Search multiple data streams and indices* +
You can use comma-separated values and grep-like index patterns to search
several data streams and indices in the same request. You can even boost search
results from specific indices. See <>.

*Paginate search results* +
By default, searches return only the top 10 matching hits. To retrieve more or
fewer documents, see <>.

*Retrieve selected fields* +
The search response's `hits.hits` property includes the full document <> for
each hit.
To retrieve only a subset of the `_source` or other fields, see <>.

*Sort search results* +
By default, search hits are sorted by `_score`, a <> that measures how well each
document matches the query. To customize the calculation of these scores, use
the <> query. To sort search hits by other field values, see <>.

*Run an async search* +
{es} searches are designed to run on large volumes of data quickly, often
returning results in milliseconds. For this reason, searches are _synchronous_
by default. The search request waits for complete results before returning a
response.

However, complete results can take longer for searches across <> or <>.

To avoid long waits, you can run an _asynchronous_, or _async_, search
instead. An <> lets you retrieve partial results for a long-running search now
and get complete results later.

[discrete]
[[search-timeout]]
=== Search timeout

By default, search requests don't time out. The request waits for complete
results before returning a response.

While <> is designed for long-running searches, you can also use the `timeout`
parameter to specify a duration you'd like to wait for a search to complete. If
no response is received before this period ends, the request fails and returns
an error.

[source,console]
----
GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

To set a cluster-wide default timeout for all search requests, configure
`search.default_search_timeout` using the <>. This global timeout duration is
used if no `timeout` argument is passed in the request. If the global search
timeout expires before the search request finishes, the request is cancelled
using <>. The `search.default_search_timeout` setting defaults to `-1` (no
timeout).

[discrete]
[[global-search-cancellation]]
=== Search cancellation

You can cancel a search request using the <>. {es} also automatically cancels a
search request when your client's HTTP connection closes.
We recommend you set up your client to close HTTP connections when a search
request is aborted or times out.

[discrete]
[[track-total-hits]]
=== Track total hits

Generally, the total hit count can't be computed accurately without visiting all
matches, which is costly for queries that match lots of documents. The
`track_total_hits` parameter allows you to control how the total number of hits
should be tracked. Given that it is often enough to have a lower bound of the
number of hits, such as "there are at least 10000 hits", the default is set to
`10,000`. This means that requests will count the total hits accurately up to
`10,000` hits. It's a good trade-off to speed up searches if you don't need the
accurate number of hits after a certain threshold.

When set to `true`, the search response will always track the number of hits
that match the query accurately (e.g. `total.relation` will always be equal to
`"eq"` when `track_total_hits` is set to `true`). Otherwise, the
`"total.relation"` returned in the `"total"` object in the search response
determines how the `"total.value"` should be interpreted. A value of `"gte"`
means that the `"total.value"` is a lower bound of the total hits that match the
query, and a value of `"eq"` indicates that `"total.value"` is the accurate
count.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "track_total_hits": true,
  "query": {
    "match" : {
      "user.id" : "elkbee"
    }
  }
}
--------------------------------------------------
// TEST[setup:my_index]

\... returns:

[source,console-result]
--------------------------------------------------
{
  "_shards": ...
  "timed_out": false,
  "took": 100,
  "hits": {
    "max_score": 1.0,
    "total" : {
      "value": 2048,    <1>
      "relation": "eq"  <2>
    },
    "hits": ...
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 100/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits that match the query.
<2> The count is accurate (e.g. `"eq"` means equals).

It is also possible to set `track_total_hits` to an integer. For instance, the
following query accurately tracks the total number of hits that match the query
up to 100 documents:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "track_total_hits": 100,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}
--------------------------------------------------
// TEST[continued]

The `hits.total.relation` in the response will indicate if the value returned in
`hits.total.value` is accurate (`"eq"`) or a lower bound of the total (`"gte"`).

For instance, the following response:

[source,console-result]
--------------------------------------------------
{
  "_shards": ...
  "timed_out": false,
  "took": 30,
  "hits": {
    "max_score": 1.0,
    "total": {
      "value": 42,      <1>
      "relation": "eq"  <2>
    },
    "hits": ...
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 30/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> 42 documents match the query
<2> and the count is accurate (`"eq"`)

\... indicates that the number of hits returned in the `total` is accurate.
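To make the `"eq"` / `"gte"` distinction concrete, here is a minimal Python sketch that turns the `hits.total` object of a search response into a readable summary. It assumes the response body has already been parsed into a dict (for example with `json.loads`) and that `hits.total` uses the object form shown in the examples above. The function `describe_total_hits` is hypothetical and is not part of any {es} client.

```python
from typing import Any, Dict, Optional


def describe_total_hits(response: Dict[str, Any]) -> Optional[str]:
    """Summarize `hits.total` from a parsed search response body.

    Hypothetical helper, not part of any Elasticsearch client. Returns None
    when the response has no `hits.total` object, as happens with
    "track_total_hits": false.
    """
    total = response.get("hits", {}).get("total")
    if total is None:
        return None  # The total hit count was not tracked.
    value = total["value"]
    relation = total["relation"]
    if relation == "eq":
        return f"exactly {value} hits"  # accurate count
    if relation == "gte":
        return f"at least {value} hits"  # lower bound only
    raise ValueError(f"unexpected hits.total.relation: {relation!r}")


# Response bodies trimmed to the fields the helper reads.
exact = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}
lower_bound = {"hits": {"total": {"value": 100, "relation": "gte"}, "hits": []}}
untracked = {"hits": {"max_score": 1.0, "hits": []}}

print(describe_total_hits(exact))        # exactly 42 hits
print(describe_total_hits(lower_bound))  # at least 100 hits
print(describe_total_hits(untracked))    # None
```

Treating a missing `total` as "unknown" rather than as zero matters: a response to a request with `"track_total_hits": false` carries no count at all, so reading it as zero matches would be wrong.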
If the total number of hits that match the query is greater than the value set
in `track_total_hits`, the total hits in the response will indicate that the
returned value is a lower bound:

[source,console-result]
--------------------------------------------------
{
  "_shards": ...
  "hits": {
    "max_score": 1.0,
    "total": {
      "value": 100,       <1>
      "relation": "gte"   <2>
    },
    "hits": ...
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:response is already tested in the previous snippet]

<1> There are at least 100 documents that match the query
<2> This is a lower bound (`"gte"`).

If you don't need to track the total number of hits at all, you can improve
query times by setting this option to `false`:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "track_total_hits": false,
  "query": {
    "match": {
      "user.id": "elkbee"
    }
  }
}
--------------------------------------------------
// TEST[continued]

\... returns:

[source,console-result]
--------------------------------------------------
{
  "_shards": ...
  "timed_out": false,
  "took": 10,
  "hits": {             <1>
    "max_score": 1.0,
    "hits": ...
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
// TESTRESPONSE[s/"took": 10/"took": $body.took/]
// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]

<1> The total number of hits is unknown.

Finally, you can force an accurate count by setting `"track_total_hits"` to
`true` in the request.

[discrete]
[[quickly-check-for-matching-docs]]
=== Quickly check for matching docs

If you only want to know whether any documents match a specific query, you can
set the `size` to `0` to indicate that you are not interested in the search
results. You can also set `terminate_after` to `1` to indicate that query
execution can be terminated as soon as the first matching document is found on
each shard.

[source,console]
--------------------------------------------------
GET /_search?q=user.id:elkbee&size=0&terminate_after=1
--------------------------------------------------
// TEST[setup:my_index]

NOTE: `terminate_after` is always applied **after** the <> and stops the query
as well as the aggregation executions when enough hits have been collected on
the shard. However, the doc count on aggregations may not reflect the
`hits.total` in the response, since aggregations are applied **before** the post
filtering.

The response will not contain any hits, as the `size` was set to `0`. The
`hits.total.value` will be either equal to `0`, indicating that there were no
matching documents, or greater than `0`, meaning that at least that many
documents matched the query before it was terminated early. Also, if the query
was terminated early, the `terminated_early` flag will be set to `true` in the
response.

[source,console-result]
--------------------------------------------------
{
  "took": 3,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
      "value": 1,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 3/"took": $body.took/]

The `took` time in the response is the number of milliseconds this request took
to process, measured from shortly after the node received the query until all
search-related work is done and before the above JSON is returned to the client.
This means it includes the time spent waiting in thread pools, executing a
distributed search across the whole cluster, and gathering all the results.
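The matching-docs check described in this section can be sketched on the client side as follows. This is a minimal Python example, assuming the response to the `size=0&terminate_after=1` request has already been parsed into a dict; the helper names `has_matching_docs` and `stopped_early` are hypothetical and do not belong to any {es} client.

```python
from typing import Any, Dict


def has_matching_docs(response: Dict[str, Any]) -> bool:
    """Return True if a size=0, terminate_after=1 search found any match.

    Hypothetical helper: it reads only `hits.total.value` from a parsed
    response body and does not send the request itself.
    """
    return response["hits"]["total"]["value"] > 0


def stopped_early(response: Dict[str, Any]) -> bool:
    # The flag may be absent (for example, when `terminate_after` was not
    # set), so treat a missing flag as False.
    return bool(response.get("terminated_early", False))


# The example response shown above, as a Python dict.
response = {
    "took": 3,
    "timed_out": False,
    "terminated_early": True,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": None,
        "hits": [],
    },
}

print(has_matching_docs(response))  # True
print(stopped_early(response))      # True
```

Because execution stops after the first match on each shard, a positive `hits.total.value` here only proves that at least one match exists; it should not be read as the full count.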

include::collapse-search-results.asciidoc[]

include::filter-search-results.asciidoc[]

include::highlighting.asciidoc[]

include::long-running-searches.asciidoc[]

include::near-real-time.asciidoc[]

include::paginate-search-results.asciidoc[]

include::retrieve-inner-hits.asciidoc[]

include::retrieve-selected-fields.asciidoc[]

include::search-across-clusters.asciidoc[]

include::search-multiple-indices.asciidoc[]

include::search-shard-routing.asciidoc[]

include::sort-search-results.asciidoc[]