[DOCS] Prune `Search your data` content (#61303) (#61462)

Changes:
* Removes narrative around URI searches. These aren't commonly used in production. The `q` param is already covered in the search API docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-search.html#search-api-query-params-q
* Adds a common options section that highlights narrative docs for query DSL, aggregations, multi-index search, search fields, pagination, sorting, and async search.
* Adds a `Search shard routing` page. Moves narrative docs for adaptive replica selection, preference, routing, and shard limits to that page.
* Moves search timeout and cancellation content to the `Search your data` page.
* Creates a `Search multiple data streams and indices` page. Moves related narrative docs for multi-target syntax searches and `indices_boost` to that page.
* Removes narrative examples for the `search_type` parameter. Moves documentation for this parameter to the search API docs.
James Rodewig 2020-08-24 09:31:53 -04:00 committed by GitHub
parent 0d8d0f423c
commit da89ff87bb
9 changed files with 522 additions and 464 deletions


@ -65,7 +65,7 @@ include-tagged::{doc-tests-file}[{api}-request-item-extras]
<2> Version
<3> Version type
{ref}/search-search.html#search-preference[`preference`],
{ref}/docs-get.html#realtime[`realtime`]
and
{ref}/docs-get.html#get-refresh[`refresh`] can be set on the main request but


@ -1,156 +1,51 @@
[[search]]
== Search APIs

Search APIs are used to search and aggregate data stored in {es} indices and
data streams. For an overview and related tutorials, see <<search-your-data>>.

Most search APIs support <<multi-index,multi-target syntax>>, with the
exception of the <<search-explain,explain API>>.

[discrete]
[[core-search-apis]]
=== Core search

* <<search-search>>
* <<search-multi-search>>
* <<async-search>>
* <<scroll-api>>
* <<clear-scroll-api>>
* <<search-suggesters>>

[discrete]
[[search-testing-apis]]
=== Search testing

* <<search-explain>>
* <<search-field-caps>>
* <<search-profile>>
* <<search-rank-eval>>
* <<search-shards>>
* <<search-validate>>

[discrete]
[[search-template-apis]]
=== Search templates

* <<search-template>>
* <<multi-search-template>>

[discrete]
[[eql-search-apis]]
=== EQL search

For an overview of EQL and related tutorials, see <<eql>>.

* <<eql-search-api>>
* <<get-async-eql-search-api>>
* <<delete-async-eql-search-api>>

include::search/search.asciidoc[]


@ -1,37 +0,0 @@
[discrete]
[[index-boost]]
=== Index boost
When searching multiple indices, you can use the `indices_boost` parameter to
boost results from one or more specified indices. This is useful when hits
coming from one index matter more than hits coming from another index.
[source,console]
--------------------------------------------------
GET /_search
{
"indices_boost": [
{ "my-index-000001": 1.4 },
{ "my-index-000002": 1.3 }
]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
Index aliases and wildcard expressions can also be used:
[source,console]
--------------------------------------------------
GET /_search
{
"indices_boost": [
{ "my-alias": 1.4 },
{ "my-index*": 1.3 }
]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]
This is important when you use aliases or wildcard expressions.
If multiple matches are found, the first match is used.
For example, if an index is included in both `my-alias` and matches `my-index*`, a boost value of `1.4` is applied.


@ -1,80 +0,0 @@
[discrete]
[[search-preference]]
=== Preference
You can use the `preference` parameter to control the shard copies on which a search runs. By
default, Elasticsearch selects from the available shard copies in an
unspecified order, taking the <<shard-allocation-awareness,allocation awareness>> and
<<search-adaptive-replica,adaptive replica selection>> configuration into
account. However, it may sometimes be desirable to try and route certain
searches to certain sets of shard copies.
A possible use case would be to make use of per-copy caches like the
<<shard-request-cache,request cache>>. Doing this, however, runs contrary to the
idea of search parallelization and can create hotspots on certain nodes because
the load might not be evenly distributed anymore.
The `preference` is a query string parameter which can be set to:
[horizontal]
`_only_local`::
The operation will be executed only on shards allocated to the local
node.
`_local`::
The operation will be executed on shards allocated to the local node if
possible, and will fall back to other shards if not.
`_prefer_nodes:abc,xyz`::
The operation will be executed on nodes with one of the provided node
ids (`abc` or `xyz` in this case) if possible. If suitable shard copies
exist on more than one of the selected nodes then the order of
preference between these copies is unspecified.
`_shards:2,3`::
Restricts the operation to the specified shards (`2` and `3` in this
case). This preference can be combined with other preferences but it
has to appear first: `_shards:2,3|_local`
`_only_nodes:abc*,x*yz,...`::
Restricts the operation to nodes specified according to the
<<cluster,node specification>>. If suitable shard copies exist on more
than one of the selected nodes then the order of preference between
these copies is unspecified.
Custom (string) value::
Any value that does not start with `_`. If two searches both give the same
custom string value for their preference and the underlying cluster state
does not change then the same ordering of shards will be used for the
searches. This does not guarantee that the exact same shards will be used
each time: the cluster state, and therefore the selected shards, may change
for a number of reasons including shard relocations and shard failures, and
nodes may sometimes reject searches causing fallbacks to alternative nodes.
However, in practice the ordering of shards tends to remain stable for long
periods of time. A good candidate for a custom preference value is something
like the web session id or the user name.
For instance, use the user's session ID `xyzabc123` as follows:
[source,console]
------------------------------------------------
GET /_search?preference=xyzabc123
{
"query": {
"match": {
"title": "elasticsearch"
}
}
}
------------------------------------------------
This can be an effective strategy to increase use of the request cache:
unique users running similar searches repeatedly always hit the same cache, while
requests from different users are still spread across all shard copies.
NOTE: The `_only_local` preference guarantees only to use shard copies on the
local node, which is sometimes useful for troubleshooting. All other options do
not _fully_ guarantee that any particular shard copies are used in a search,
and on a changing index this may mean that repeated searches may yield
different results if they are executed on different shard copies which are in
different refresh states.


@ -1,78 +0,0 @@
[discrete]
[[search-type]]
=== Search type
There are different execution paths that can be done when executing a
distributed search. The distributed search operation needs to be
scattered to all the relevant shards and then all the results are
gathered back. When doing scatter/gather type execution, there are
several ways to do that, specifically with search engines.
One of the questions when executing a distributed search is how many
results to retrieve from each shard. For example, if we have 10 shards, the
first shard might hold the most relevant results from positions 0 through 10,
with the other shards' results ranking below it. For this reason, when executing
a request, we need to get results from positions 0 through 10 from all shards,
sort them, and then return them to ensure correct results.
Another question, which relates to the search engine, is the fact that each
shard stands on its own. When a query is executed on a specific shard,
it does not take into account term frequencies and other search engine
information from the other shards. If we want to support accurate
ranking, we would need to first gather the term frequencies from all
shards to calculate global term frequencies, then execute the query on
each shard using these global frequencies.
Also, because of the need to sort the results, getting back a large
document set, or even scrolling it, while maintaining the correct sorting
behavior can be a very expensive operation. For large result set
scrolling, it is best to sort by `_doc` if the order in which documents
are returned is not important.
Elasticsearch is flexible and allows you to control the type of search to
execute on a *per search request* basis. The type can be configured by setting
the *search_type* parameter in the query string. The types are:
[discrete]
[[query-then-fetch]]
==== Query Then Fetch
Parameter value: *query_then_fetch*.
The request is processed in two phases. In the first phase, the query
is forwarded to *all involved shards*. Each shard executes the search
and generates a sorted list of results, local to that shard. Each
shard returns *just enough information* to the coordinating node
to allow it to merge and re-sort the shard level results into a globally
sorted set of results, of maximum length `size`.
During the second phase, the coordinating node requests the document
content (and highlighted snippets, if any) from *only the relevant
shards*.
[source,console]
--------------------------------------------------
GET my-index-000001/_search?search_type=query_then_fetch
--------------------------------------------------
// TEST[setup:my_index]
NOTE: This is the default setting if you do not specify a `search_type`
in your request.
[discrete]
[[dfs-query-then-fetch]]
==== Dfs, Query Then Fetch
Parameter value: *dfs_query_then_fetch*.
Same as "Query Then Fetch", except for an initial scatter phase which
goes and computes the distributed term frequencies for more accurate
scoring.
[source,console]
--------------------------------------------------
GET my-index-000001/_search?search_type=dfs_query_then_fetch
--------------------------------------------------
// TEST[setup:my_index]


@ -0,0 +1,117 @@
[[search-multiple-indices]]
== Search multiple data streams and indices
To search multiple data streams and indices, add them as comma-separated values
in the <<search-search,search API>>'s request path.
The following request searches the `my-index-000001` and `my-index-000002`
indices.
[source,console]
----
GET /my-index-000001,my-index-000002/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
// TEST[s/^/PUT my-index-000002\n/]
You can also search multiple data streams and indices using an index pattern.
The following request targets the `my-index-*` index pattern. The request
searches any data streams or indices in the cluster that start with `my-index-`.
[source,console]
----
GET /my-index-*/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
To search all data streams and indices in a cluster, omit the target from the
request path. Alternatively, you can use `_all` or `*`.
The following requests are equivalent and search all data streams and indices in
the cluster.
[source,console]
----
GET /_search
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
GET /_all/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
GET /*/_search
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
[discrete]
[[index-boost]]
=== Index boost
When searching multiple indices, you can use the `indices_boost` parameter to
boost results from one or more specified indices. This is useful when hits
coming from some indices matter more than hits from others.
NOTE: You cannot use `indices_boost` with data streams.
[source,console]
--------------------------------------------------
GET /_search
{
"indices_boost": [
{ "my-index-000001": 1.4 },
{ "my-index-000002": 1.3 }
]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
Index aliases and index patterns can also be used:
[source,console]
--------------------------------------------------
GET /_search
{
"indices_boost": [
{ "my-alias": 1.4 },
{ "my-index*": 1.3 }
]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]
If multiple matches are found, the first match is used. For example, if an
index is included in `my-alias` and also matches the `my-index*` pattern, a
boost value of `1.4` is applied.


@ -0,0 +1,184 @@
[[search-shard-routing]]
== Search shard routing
To protect against hardware failure and increase search capacity, {es} can store
copies of an index's data across multiple shards on multiple nodes. When running
a search request, {es} selects a node containing a copy of the index's data and
forwards the search request to that node's shards. This process is known as
_search shard routing_ or _routing_.
[discrete]
[[search-adaptive-replica]]
=== Adaptive replica selection
By default, {es} uses _adaptive replica selection_ to route search requests.
This method selects an eligible node using <<allocation-awareness,allocation
awareness>> and the following criteria:
* Response time of prior requests between the coordinating node
and the eligible node
* How long the eligible node took to run previous searches
* Queue size of the eligible node's `search` <<modules-threadpool,threadpool>>
Adaptive replica selection is designed to decrease search latency. However, you
can disable adaptive replica selection by setting
`cluster.routing.use_adaptive_replica_selection` to `false` using the
<<cluster-update-settings,cluster settings API>>. If disabled, {es} routes
search requests using a round-robin method, which may result in slower searches.
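For example, a cluster settings request like the following, carried over from
the removed narrative docs, disables adaptive replica selection as a transient
setting:

[source,console]
----
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": false
  }
}
----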
[discrete]
[[shard-and-node-preference]]
=== Set a preference
By default, adaptive replica selection chooses from all eligible nodes and
shards. However, you may only want data from a local node or want to route
searches to a specific node based on its hardware. Or you may want to send
repeated searches to the same shard to take advantage of caching.
To limit the set of nodes and shards eligible for a search request, use
the search API's <<search-preference,`preference`>> query parameter.
For example, the following request searches `my-index-000001` with a
`preference` of `_local`. This restricts the search to shards on the
local node. If the local node contains no shard copies of the index's data, the
request falls back to adaptive replica selection to pick another eligible node.
[source,console]
----
GET /my-index-000001/_search?preference=_local
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
You can also use the `preference` parameter to route searches to specific shards
based on a provided string. If the cluster state and selected shards
do not change, searches using the same `preference` string are routed to the
same shards in the same order.
We recommend using a unique `preference` string, such as a user name or web
session ID. This string cannot start with a `_`.
TIP: You can use this option to serve cached results for frequently used and
resource-intensive searches. If the shard's data doesn't change, repeated
searches with the same `preference` string retrieve results from the same
<<shard-request-cache,shard request cache>>. For time-series use cases, such as
logging, data in older indices is rarely updated and can be served directly from
this cache.
The following request searches `my-index-000001` with a `preference` string of
`my-custom-shard-string`.
[source,console]
----
GET /my-index-000001/_search?preference=my-custom-shard-string
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
NOTE: If the cluster state or selected shards change, the same `preference`
string may not route searches to the same shards in the same order. This can
occur for a number of reasons, including shard relocations and shard failures. A
node can also reject a search request, which {es} would re-route to another
node.
[discrete]
[[search-routing]]
=== Use a routing value
When you index a document, you can specify an optional
<<mapping-routing-field,routing value>>, which routes the document to a
specific shard.
For example, the following indexing request routes a document using
`my-routing-value`.
[source,console]
----
POST /my-index-000001/_doc?routing=my-routing-value
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
----
You can use the same routing value in the search API's `routing` query
parameter. This ensures the search runs on the same shard used to index the
document.
[source,console]
----
GET /my-index-000001/_search?routing=my-routing-value
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
You can also provide multiple comma-separated routing values:
[source,console]
----
GET /my-index-000001/_search?routing=my-routing-value,my-routing-value-2
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
[discrete]
[[search-concurrency-and-parallelism]]
=== Search concurrency and parallelism
By default, {es} doesn't reject search requests based on the number of shards
the request hits. However, hitting a large number of shards can significantly
increase CPU and memory usage.
TIP: For tips on preventing indices with large numbers of shards, see
<<avoid-oversharding>>.
You can use the `max_concurrent_shard_requests` query parameter to control the
maximum number of concurrent shard requests a search can run per node. This
prevents a single request from overloading a cluster. The parameter defaults to
a maximum of `5`.
[source,console]
----
GET /my-index-000001/_search?max_concurrent_shard_requests=3
{
"query": {
"match": {
"user.id": "kimchy"
}
}
}
----
// TEST[setup:my_index]
You can also use the `action.search.shard_count.limit` cluster setting to set a
search shard limit and reject requests that hit too many shards. You can
configure `action.search.shard_count.limit` using the
<<cluster-update-settings,cluster settings API>>.
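For example, a transient cluster settings update along these lines would reject
searches that target more than 1,000 shards; the `1000` threshold here is
illustrative, not a recommendation:

[source,console]
----
PUT /_cluster/settings
{
  "transient": {
    "action.search.shard_count.limit": 1000
  }
}
----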


@ -2,7 +2,7 @@
= Search your data

[[search-query]]
A _search query_, or _query_, is a request for information about data in
{es} data streams or indices.

You can think of a query as a question, written in a way {es} understands.
@ -24,55 +24,30 @@ a specific number of results.
[[run-an-es-search]]
== Run a search

You can use the <<search-search,search API>> to search and
<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
The API's `query` request body parameter accepts queries written in
<<query-dsl,Query DSL>>.

The following request searches `my-index-000001` using a
<<query-dsl-match-query,`match`>> query. This query matches documents with a
`user.id` value of `kimchy`.

[source,console]
----
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

The API response returns the top 10 documents matching the query in the
`hits.hits` property.

[source,console-result]
----
@ -126,20 +101,84 @@ fields, see <<search-fields>>.
// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]
[discrete]
[[common-search-options]]
=== Common search options

You can use the following options to customize your searches.

*Query DSL* +
<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
to get the results you want. Query types include:

* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
queries>>, which let you combine queries and match results based on multiple
criteria
* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
* <<full-text-queries,Full text queries>>, which are commonly used in search
engines
* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>

*Aggregations* +
You can use <<search-aggregations,search aggregations>> to get statistics and
other analytics for your search results. Aggregations help you answer questions
like:

* What's the average response time for my servers?
* What are the top IP addresses hit by users on my network?
* What is the total transaction revenue by customer?

*Search multiple data streams and indices* +
You can use comma-separated values and grep-like index patterns to search
several data streams and indices in the same request. You can even boost search
results from specific indices. See <<search-multiple-indices>>.

*Paginate search results* +
By default, searches return only the top 10 matching hits. To retrieve
more or fewer documents, see <<paginate-search-results>>.

*Retrieve selected fields* +
The search response's `hits.hits` property includes the full document
<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
the `_source` or other fields, see <<search-fields>>.

*Sort search results* +
By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
score>> that measures how well each document matches the query. To customize the
calculation of these scores, use the
<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
other field values, see <<sort-search-results>>.

*Run an async search* +
{es} searches are designed to run on large volumes of data quickly, often
returning results in milliseconds. For this reason, searches are
_synchronous_ by default. The search request waits for complete results before
returning a response.

However, complete results can take longer for searches across
<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
clusters>>.

To avoid long waits, you can run an _asynchronous_, or _async_, search
instead. An <<async-search-intro,async search>> lets you retrieve partial
results for a long-running search now and get complete results later.

[discrete]
[[search-timeout]]
=== Search timeout

By default, search requests don't time out. The request waits for complete
results before returning a response.

While <<async-search-intro,async search>> is designed for long-running
searches, you can also use the `timeout` parameter to specify a duration you'd
like to wait for a search to complete. If no response is received before this
period ends, the request fails and returns an error.

[source,console]
----
GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

@ -149,88 +188,23 @@ GET /my-index-000001/_search
To set a cluster-wide default timeout for all search requests, configure
`search.default_search_timeout` using the <<cluster-update-settings,cluster
settings API>>. This global timeout duration is used if no `timeout` argument is
passed in the request. If the global search timeout expires before the search
request finishes, the request is cancelled using <<task-cancellation,task
cancellation>>. The `search.default_search_timeout` setting defaults to `-1` (no
timeout).
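For example, a cluster settings request along the following lines sets a
cluster-wide default timeout of 30 seconds; the `30s` value here is only an
illustration:

[source,console]
----
PUT /_cluster/settings
{
  "transient": {
    "search.default_search_timeout": "30s"
  }
}
----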
[discrete]
[[global-search-cancellation]]
=== Search cancellation

You can cancel a search request using the <<task-cancellation,task management
API>>. {es} also automatically cancels a search request when your client's HTTP
connection closes. We recommend you set up your client to close HTTP connections
when a search request is aborted or times out.
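For example, assuming you have located the search's task ID through the task
management API, a sketch of the flow looks like this; the task ID below is a
placeholder, not a real value:

[source,console]
----
GET /_tasks?actions=*search&detailed

POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
----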
include::request/track-total-hits.asciidoc[]
include::quickly-check-for-matching-docs.asciidoc[]
@ -243,4 +217,6 @@ include::paginate-search-results.asciidoc[]
include::request/inner-hits.asciidoc[]
include::search-fields.asciidoc[]
include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
include::search-multiple-indices.asciidoc[]
include::search-shard-routing.asciidoc[]
include::request/sort.asciidoc[]


@ -129,9 +129,44 @@ When unspecified, the pre-filter phase is executed if any of these conditions is
- The request targets one or more read-only index.
- The primary sort of the query targets an indexed field.
[[search-preference]]
`preference`::
(Optional, string)
Nodes and shards used for the search. By default, {es} selects from eligible
nodes and shards using <<search-adaptive-replica,adaptive replica selection>>,
accounting for <<shard-allocation-awareness,allocation awareness>>.
+
.Valid values for `preference`
[%collapsible%open]
====
`_only_local`::
Run the search only on shards on the local node.
`_local`::
If possible, run the search on shards on the local node. If not, select shards
using the default method.
`_only_nodes:<node-id>,<node-id>`::
Run the search on only the specified node IDs. If suitable shards exist on more
than one of the selected nodes, select shards from those nodes using the default
method. If none of the specified nodes are available, select shards from any
available node using the default method.
`_prefer_nodes:<node-id>,<node-id>`::
If possible, run the search on the specified node IDs. If not, select shards
using the default method.
`_shards:<shard>,<shard>`::
Run the search only on the specified shards. This value can be combined with
other `preference` values, but this value must come first. For example:
`_shards:2,3|_local`
<custom-string>::
Any string that does not start with `_`. If the cluster state and selected
shards do not change, searches using the same `<custom-string>` value are routed
to the same shards in the same order.
====
[[search-api-query-params-q]]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search-q]
@ -164,7 +199,28 @@ Period to retain the <<scroll-search-context,search context>> for scrolling. See
By default, this value cannot exceed `1d` (24 hours). You can change
this limit using the `search.max_keep_alive` cluster-level setting.
[[search-type]]
`search_type`::
(Optional, string)
How {wikipedia}/Tfidf[distributed term frequencies] are calculated for
<<relevance-scores,relevance scoring>>.
+
.Valid values for `search_type`
[%collapsible%open]
====
`query_then_fetch`::
(Default)
Distributed term frequencies are calculated locally for each shard running the
search. We recommend this option for faster searches with potentially less
accurate scoring.
[[dfs-query-then-fetch]]
`dfs_query_then_fetch`::
Distributed term frequencies are calculated globally, using information gathered
from all shards running the search. While this option increases the accuracy of
scoring, it adds a round-trip to each shard, which can result in slower
searches.
====
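For example, the following request, carried over from the removed narrative
docs, opts a single search into `dfs_query_then_fetch`:

[source,console]
----
GET /my-index-000001/_search?search_type=dfs_query_then_fetch
----
// TEST[setup:my_index]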
`seq_no_primary_term`::
(Optional, boolean) If `true`, returns sequence number and primary term of the
@ -284,7 +340,7 @@ You can specify items in the array as a string or object.
See <<docvalue-fields>>.
+
.Properties of `docvalue_fields` objects
[%collapsible%open]
====
`field`::
(Required, string)
@ -326,6 +382,24 @@ As an alternative to deep paging, we recommend using
<<search-after,`search_after`>> parameter.
--
`indices_boost`::
(Optional, array of objects)
Boosts the <<relevance-scores,`_score`>> of documents from specified indices.
+
.Properties of `indices_boost` objects
[%collapsible%open]
====
`<index>: <boost-value>`::
(Required, float)
`<index>` is the name of the index or index alias. Wildcard (`*`) expressions
are supported.
+
`<boost-value>` is the factor by which scores are multiplied.
+
A boost value greater than `1.0` increases the score. A boost value between
`0` and `1.0` decreases the score.
====
[[search-api-min-score]]
`min_score`::
(Optional, float)
@ -409,6 +483,13 @@ exclude fields from this subset using the `excludes` property.
=====
====
[[stats-groups]]
`stats`::
(Optional, array of strings)
Stats groups to associate with the search. Each group maintains a statistics
aggregation for its associated searches. You can retrieve these stats using the
<<indices-stats,indices stats API>>.
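For example, the following search, adapted from the removed `Stats Groups`
narrative, associates the request with two groups:

[source,console]
----
POST /_search
{
  "query": {
    "match_all": {}
  },
  "stats": [ "group1", "group2" ]
}
----
// TEST[setup:my_index]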
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
+
Defaults to `0`, which does not terminate query execution early.