Changes:

* Removes narrative around URI searches. These aren't commonly used in production. The `q` param is already covered in the search API docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-search.html#search-api-query-params-q
* Adds a common options section that highlights narrative docs for query DSL, aggregations, multi-index search, search fields, pagination, sorting, and async search.
* Adds a `Search shard routing` page. Moves narrative docs for adaptive replica selection, preference, routing, and shard limits to that section.
* Moves search timeout and cancellation content to the `Search your data` page.
* Creates a `Search multiple data streams and indices` page. Moves related narrative docs for multi-target syntax searches and `indices_boost` to that page.
* Removes narrative examples for the `search_type` parameters. Moves documentation for this parameter to the search API docs.

parent 0d8d0f423c
commit da89ff87bb
@@ -65,7 +65,7 @@ include-tagged::{doc-tests-file}[{api}-request-item-extras]
<2> Version
<3> Version type

{ref}/search-your-data.html#search-preference[`preference`],
{ref}/search-search.html#search-preference[`preference`],
{ref}/docs-get.html#realtime[`realtime`]
and
{ref}/docs-get.html#get-refresh[`refresh`] can be set on the main request but
@@ -1,156 +1,51 @@
[[search]]
== Search APIs

Search APIs are used to search and aggregate data stored in {es} indices and
data streams. For an overview and related tutorials, see <<search-your-data>>.

Most search APIs support <<multi-index,multi-target syntax>>, with the
exception of the <<search-explain>> endpoints.
exception of the <<search-explain,explain API>>.

[discrete]
[[search-routing]]
=== Routing
[[core-search-apis]]
=== Core search

When executing a search, Elasticsearch will pick the "best" copy of the data
based on the <<search-adaptive-replica,adaptive replica selection>> formula.
Which shards will be searched on can also be controlled by providing the
`routing` parameter.

For example, the following indexing request routes documents to shard `1`:

[source,console]
--------------------------------------------------
POST /my-index-000001/_doc?routing=1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
--------------------------------------------------

Later, you can use the `routing` parameter in a search request to search only
the specified shard. The following search request hits only shard `1`.

[source,console]
--------------------------------------------------
POST /my-index-000001/_search?routing=1
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "some query string here"
        }
      },
      "filter": {
        "term": { "user.id": "kimchy" }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

The `routing` parameter can be multi-valued, represented as a comma-separated
string. This results in hitting the relevant shards where the routing values
match.
* <<search-search>>
* <<search-multi-search>>
* <<async-search>>
* <<scroll-api>>
* <<clear-scroll-api>>
* <<search-suggesters>>

[discrete]
[[search-adaptive-replica]]
=== Adaptive Replica Selection
[[search-testing-apis]]
=== Search testing

By default, Elasticsearch will use what is called adaptive replica selection.
This allows the coordinating node to send the request to the copy deemed "best"
based on a number of criteria:

- Response time of past requests between the coordinating node and the node
containing the copy of the data
- Time past search requests took to execute on the node containing the data
- The queue size of the search threadpool on the node containing the data

This can be turned off by changing the dynamic cluster setting
`cluster.routing.use_adaptive_replica_selection` from `true` to `false`:

[source,console]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": false
  }
}
--------------------------------------------------

If adaptive replica selection is turned off, searches are sent to the
index/indices shards in a round robin fashion between all copies of the data
(primaries and replicas).
* <<search-explain>>
* <<search-field-caps>>
* <<search-profile>>
* <<search-rank-eval>>
* <<search-shards>>
* <<search-validate>>

[discrete]
[[stats-groups]]
=== Stats Groups
[[search-template-apis]]
=== Search templates

A search can be associated with stats groups, which maintain a
statistics aggregation per group. It can later be retrieved using the
<<indices-stats,indices stats>> API
specifically. For example, here is a search body request that associates
the request with two different groups:

[source,console]
--------------------------------------------------
POST /_search
{
  "query" : {
    "match_all" : {}
  },
  "stats" : ["group1", "group2"]
}
--------------------------------------------------
// TEST[setup:my_index]
* <<search-template>>
* <<multi-search-template>>

[discrete]
[[global-search-timeout]]
=== Global Search Timeout
[[eql-search-apis]]
=== EQL search

Individual searches can have a timeout as part of the
<<search-request-body>>. Since search requests can originate from many
sources, Elasticsearch has a dynamic cluster-level setting for a global
search timeout that applies to all search requests that do not set a
timeout in the request body. These requests will be cancelled after
the specified time using the mechanism described in the following section on
<<global-search-cancellation>>. Therefore the same caveats about timeout
responsiveness apply.
For an overview of EQL and related tutorials, see <<eql>>.

The setting key is `search.default_search_timeout` and can be set using the
<<cluster-update-settings>> endpoints. The default value is no global timeout.
Setting this value to `-1` resets the global search timeout to no timeout.
* <<eql-search-api>>
* <<get-async-eql-search-api>>
* <<delete-async-eql-search-api>>

[discrete]
[[global-search-cancellation]]
=== Search Cancellation

Searches can be cancelled using the standard <<task-cancellation,task cancellation>>
mechanism and are also automatically cancelled when the HTTP connection used to
perform the request is closed by the client. It is fundamental that the HTTP
client sending requests closes connections whenever requests time out or are
aborted.

[discrete]
[[search-concurrency-and-parallelism]]
=== Search concurrency and parallelism

By default, Elasticsearch doesn't reject any search requests based on the number
of shards the request hits. While Elasticsearch will optimize the search
execution on the coordinating node, a large number of shards can have a
significant impact on CPU and memory usage. It is usually a better idea to organize
data in such a way that there are fewer larger shards. In case you would like to
configure a soft limit, you can update the `action.search.shard_count.limit`
cluster setting in order to reject search requests that hit too many shards.

The request parameter `max_concurrent_shard_requests` can be used to control the
maximum number of concurrent shard requests the search API will execute per node
for the request. This parameter should be used to protect a single request from
overloading a cluster (e.g., a default request will hit all indices in a cluster
which could cause shard request rejections if the number of shards per node is
high). The default value is `5`.

include::search/search.asciidoc[]
@@ -1,37 +0,0 @@
[discrete]
[[index-boost]]
=== Index boost

When searching multiple indices, you can use the `indices_boost` parameter to
boost results from one or more specified indices. This is useful when hits
coming from one index matter more than hits coming from another index.

[source,console]
--------------------------------------------------
GET /_search
{
  "indices_boost": [
    { "my-index-000001": 1.4 },
    { "my-index-000002": 1.3 }
  ]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]

You can also specify it as an array to control the order of boosts.

[source,console]
--------------------------------------------------
GET /_search
{
  "indices_boost": [
    { "my-alias": 1.4 },
    { "my-index*": 1.3 }
  ]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]

This is important when you use aliases or wildcard expressions.
If multiple matches are found, the first match will be used.
For example, if an index is included in both `alias1` and `index*`, a boost value of `1.4` is applied.
@@ -1,80 +0,0 @@
[discrete]
[[search-preference]]
=== Preference

You can use the `preference` parameter to control the shard copies on which a search runs. By
default, Elasticsearch selects from the available shard copies in an
unspecified order, taking the <<shard-allocation-awareness,allocation awareness>> and
<<search-adaptive-replica,adaptive replica selection>> configuration into
account. However, it may sometimes be desirable to try and route certain
searches to certain sets of shard copies.

A possible use case would be to make use of per-copy caches like the
<<shard-request-cache,request cache>>. Doing this, however, runs contrary to the
idea of search parallelization and can create hotspots on certain nodes because
the load might not be evenly distributed anymore.

The `preference` is a query string parameter which can be set to:

[horizontal]
`_only_local`::
The operation will be executed only on shards allocated to the local
node.

`_local`::
The operation will be executed on shards allocated to the local node if
possible, and will fall back to other shards if not.

`_prefer_nodes:abc,xyz`::
The operation will be executed on nodes with one of the provided node
ids (`abc` or `xyz` in this case) if possible. If suitable shard copies
exist on more than one of the selected nodes then the order of
preference between these copies is unspecified.

`_shards:2,3`::
Restricts the operation to the specified shards (`2` and `3` in this
case). This preference can be combined with other preferences but it
has to appear first: `_shards:2,3|_local`

`_only_nodes:abc*,x*yz,...`::
Restricts the operation to nodes specified according to the
<<cluster,node specification>>. If suitable shard copies exist on more
than one of the selected nodes then the order of preference between
these copies is unspecified.

Custom (string) value::
Any value that does not start with `_`. If two searches both give the same
custom string value for their preference and the underlying cluster state
does not change then the same ordering of shards will be used for the
searches. This does not guarantee that the exact same shards will be used
each time: the cluster state, and therefore the selected shards, may change
for a number of reasons including shard relocations and shard failures, and
nodes may sometimes reject searches causing fallbacks to alternative nodes.
However, in practice the ordering of shards tends to remain stable for long
periods of time. A good candidate for a custom preference value is something
like the web session ID or the user name.

For instance, use the user's session ID `xyzabc123` as follows:

[source,console]
------------------------------------------------
GET /_search?preference=xyzabc123
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
------------------------------------------------

This can be an effective strategy to increase usage of e.g. the request cache for
unique users running similar searches repeatedly by always hitting the same cache, while
requests of different users are still spread across all shard copies.

NOTE: The `_only_local` preference guarantees only to use shard copies on the
local node, which is sometimes useful for troubleshooting. All other options do
not _fully_ guarantee that any particular shard copies are used in a search,
and on a changing index this may mean that repeated searches may yield
different results if they are executed on different shard copies which are in
different refresh states.
@@ -1,78 +0,0 @@
[discrete]
[[search-type]]
=== Search type

There are different execution paths that can be done when executing a
distributed search. The distributed search operation needs to be
scattered to all the relevant shards and then all the results are
gathered back. When doing scatter/gather type execution, there are
several ways to do that, specifically with search engines.

One of the questions when executing a distributed search is how many
results to retrieve from each shard. For example, if we have 10 shards,
the 1st shard might hold the most relevant results from 0 till 10, with
other shards' results ranking below it. For this reason, when executing a
request, we will need to get results from 0 till 10 from all shards,
sort them, and then return the results if we want to ensure correct
results.

Another question, which relates to the search engine, is the fact that each
shard stands on its own. When a query is executed on a specific shard,
it does not take into account term frequencies and other search engine
information from the other shards. If we want to support accurate
ranking, we would need to first gather the term frequencies from all
shards to calculate global term frequencies, then execute the query on
each shard using these global frequencies.

Also, because of the need to sort the results, getting back a large
document set, or even scrolling it, while maintaining the correct sorting
behavior can be a very expensive operation. For large result set
scrolling, it is best to sort by `_doc` if the order in which documents
are returned is not important.

Elasticsearch is very flexible and allows you to control the type of search
to execute on a *per search request* basis. The type can be configured
by setting the *search_type* parameter in the query string. The types
are:

[discrete]
[[query-then-fetch]]
==== Query Then Fetch

Parameter value: *query_then_fetch*.

The request is processed in two phases. In the first phase, the query
is forwarded to *all involved shards*. Each shard executes the search
and generates a sorted list of results, local to that shard. Each
shard returns *just enough information* to the coordinating node
to allow it to merge and re-sort the shard level results into a globally
sorted set of results, of maximum length `size`.

During the second phase, the coordinating node requests the document
content (and highlighted snippets, if any) from *only the relevant
shards*.

[source,console]
--------------------------------------------------
GET my-index-000001/_search?search_type=query_then_fetch
--------------------------------------------------
// TEST[setup:my_index]

NOTE: This is the default setting if you do not specify a `search_type`
in your request.

[discrete]
[[dfs-query-then-fetch]]
==== Dfs, Query Then Fetch

Parameter value: *dfs_query_then_fetch*.

Same as "Query Then Fetch", except for an initial scatter phase which
goes and computes the distributed term frequencies for more accurate
scoring.

[source,console]
--------------------------------------------------
GET my-index-000001/_search?search_type=dfs_query_then_fetch
--------------------------------------------------
// TEST[setup:my_index]
@@ -0,0 +1,117 @@
[[search-multiple-indices]]
== Search multiple data streams and indices

To search multiple data streams and indices, add them as comma-separated values
in the <<search-search,search API>>'s request path.

The following request searches the `my-index-000001` and `my-index-000002`
indices.

[source,console]
----
GET /my-index-000001,my-index-000002/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/^/PUT my-index-000002\n/]

You can also search multiple data streams and indices using an index pattern.

The following request targets the `my-index-*` index pattern. The request
searches any data streams or indices in the cluster that start with `my-index-`.

[source,console]
----
GET /my-index-*/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

To search all data streams and indices in a cluster, omit the target from the
request path. Alternatively, you can use `_all` or `*`.

The following requests are equivalent and search all data streams and indices in
the cluster.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

GET /_all/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

GET /*/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

[discrete]
[[index-boost]]
=== Index boost

When searching multiple indices, you can use the `indices_boost` parameter to
boost results from one or more specified indices. This is useful when hits
coming from some indices matter more than hits from others.

NOTE: You cannot use `indices_boost` with data streams.

[source,console]
--------------------------------------------------
GET /_search
{
  "indices_boost": [
    { "my-index-000001": 1.4 },
    { "my-index-000002": 1.3 }
  ]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]

Index aliases and index patterns can also be used:

[source,console]
--------------------------------------------------
GET /_search
{
  "indices_boost": [
    { "my-alias": 1.4 },
    { "my-index*": 1.3 }
  ]
}
--------------------------------------------------
// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]

If multiple matches are found, the first match will be used. For example, if an
index is included in `my-alias` and matches the `my-index*` pattern, a boost
value of `1.4` is applied.
@@ -0,0 +1,184 @@
[[search-shard-routing]]
== Search shard routing

To protect against hardware failure and increase search capacity, {es} can store
copies of an index's data across multiple shards on multiple nodes. When running
a search request, {es} selects a node containing a copy of the index's data and
forwards the search request to that node's shards. This process is known as
_search shard routing_ or _routing_.

[discrete]
[[search-adaptive-replica]]
=== Adaptive replica selection

By default, {es} uses _adaptive replica selection_ to route search requests.
This method selects an eligible node using <<allocation-awareness,allocation
awareness>> and the following criteria:

* Response time of prior requests between the coordinating node
and the eligible node
* How long the eligible node took to run previous searches
* Queue size of the eligible node's `search` <<modules-threadpool,threadpool>>
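
To see the per-node statistics behind these criteria, you can request the
`adaptive_selection` metric from the nodes stats API. This is an illustrative
sketch. The exact response fields, such as average queue size, service time,
and response time, may vary between versions.

[source,console]
----
GET /_nodes/stats/adaptive_selection
----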

Adaptive replica selection is designed to decrease search latency. However, you
can disable adaptive replica selection by setting
`cluster.routing.use_adaptive_replica_selection` to `false` using the
<<cluster-update-settings,cluster settings API>>. If disabled, {es} routes
search requests using a round-robin method, which may result in slower searches.
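
For example, the following request is a minimal sketch of disabling adaptive
replica selection. It uses a `persistent` cluster setting so the change survives
a full cluster restart. To restore the default behavior, set the value back to
`true` or to `null`.

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.use_adaptive_replica_selection": false
  }
}
----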

[discrete]
[[shard-and-node-preference]]
=== Set a preference

By default, adaptive replica selection chooses from all eligible nodes and
shards. However, you may only want data from a local node or want to route
searches to a specific node based on its hardware. Or you may want to send
repeated searches to the same shard to take advantage of caching.

To limit the set of nodes and shards eligible for a search request, use
the search API's <<search-preference,`preference`>> query parameter.

For example, the following request searches `my-index-000001` with a
`preference` of `_local`. This restricts the search to shards on the
local node. If the local node contains no shard copies of the index's data, the
request falls back to adaptive replica selection and is routed to another
eligible node.

[source,console]
----
GET /my-index-000001/_search?preference=_local
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

You can also use the `preference` parameter to route searches to specific shards
based on a provided string. If the cluster state and selected shards
do not change, searches using the same `preference` string are routed to the
same shards in the same order.

We recommend using a unique `preference` string, such as a user name or web
session ID. This string cannot start with a `_`.

TIP: You can use this option to serve cached results for frequently used and
resource-intensive searches. If the shard's data doesn't change, repeated
searches with the same `preference` string retrieve results from the same
<<shard-request-cache,shard request cache>>. For time-series use cases, such as
logging, data in older indices is rarely updated and can be served directly from
this cache.

The following request searches `my-index-000001` with a `preference` string of
`my-custom-shard-string`.

[source,console]
----
GET /my-index-000001/_search?preference=my-custom-shard-string
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

NOTE: If the cluster state or selected shards change, the same `preference`
string may not route searches to the same shards in the same order. This can
occur for a number of reasons, including shard relocations and shard failures. A
node can also reject a search request, which {es} would re-route to another
node.

[discrete]
[[search-routing]]
=== Use a routing value

When you index a document, you can specify an optional
<<mapping-routing-field,routing value>>, which routes the document to a
specific shard.

For example, the following indexing request routes a document using
`my-routing-value`.

[source,console]
----
POST /my-index-000001/_doc?routing=my-routing-value
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
----

You can use the same routing value in the search API's `routing` query
parameter. This ensures the search runs on the same shard used to index the
document.

[source,console]
----
GET /my-index-000001/_search?routing=my-routing-value
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

You can also provide multiple comma-separated routing values:

[source,console]
----
GET /my-index-000001/_search?routing=my-routing-value,my-routing-value-2
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

[discrete]
[[search-concurrency-and-parallelism]]
=== Search concurrency and parallelism

By default, {es} doesn't reject search requests based on the number of shards
the request hits. However, hitting a large number of shards can significantly
increase CPU and memory usage.

TIP: For tips on preventing indices with large numbers of shards, see
<<avoid-oversharding>>.

You can use the `max_concurrent_shard_requests` query parameter to control the
maximum number of concurrent shard requests a search request can run per node.
This prevents a single request from overloading a cluster. The parameter
defaults to `5`.

[source,console]
----
GET /my-index-000001/_search?max_concurrent_shard_requests=3
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

You can also use the `action.search.shard_count.limit` cluster setting to set a
search shard limit and reject requests that hit too many shards. You can
configure `action.search.shard_count.limit` using the
<<cluster-update-settings,cluster settings API>>.
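
For example, the following sketch sets a search shard limit of `1000`. The value
is an arbitrary example, not a recommendation. Choose a limit that fits your
cluster.

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "action.search.shard_count.limit": 1000 <1>
  }
}
----
<1> Example value only. Search requests that hit more shards than this limit
are rejected.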
@@ -2,7 +2,7 @@
= Search your data

[[search-query]]
A _search query_, or _query_, is a request for information about data in
A _search query_, or _query_, is a request for information about data in
{es} data streams or indices.

You can think of a query as a question, written in a way {es} understands.
@@ -24,55 +24,30 @@ a specific number of results.
[[run-an-es-search]]
== Run a search

You can use the <<search-search,search API>> to search data stored in
{es} data streams or indices.
You can use the <<search-search,search API>> to search and
<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
The API's `query` request body parameter accepts queries written in
<<query-dsl,Query DSL>>.

The API can run two types of searches, depending on how you provide
queries:

<<run-uri-search,URI searches>>::
Queries are provided through a query parameter. URI searches tend to be
simpler and best suited for testing.

<<run-request-body-search,Request body searches>>::
Queries are provided through the JSON body of the API request. These queries
are written in <<query-dsl,Query DSL>>. We recommend using request body
searches in most production use cases.

[WARNING]
====
If you specify a query in both the URI and request body, the search API request
runs only the URI query.
====

[discrete]
[[run-uri-search]]
=== Run a URI search

You can use the search API's <<search-api-query-params-q,`q` query string
parameter>> to run a search in the request's URI. The `q` parameter only accepts
queries written in Lucene's <<query-string-syntax,query string syntax>>.

The following URI search matches documents with a `user.id` value of `kimchy`.
The following request searches `my-index-000001` using a
<<query-dsl-match-query,`match`>> query. This query matches documents with a
`user.id` value of `kimchy`.

[source,console]
----
GET /my-index-000001/_search?q=user.id:kimchy
GET /my-index-000001/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

The API returns the following response.

By default, the `hits.hits` property returns the top 10 documents matching the
query. To retrieve more documents, see <<paginate-search-results>>.

The response sorts documents in `hits.hits` by `_score`, a
<<relevance-scores,relevance score>> that measures how well each document
matches the query.

The `hits.hits` property also includes the <<mapping-source-field,`_source`>> for
each matching document. To retrieve only a subset of the `_source` or other
fields, see <<search-fields>>.
The API response returns the top 10 documents matching the query in the
`hits.hits` property.

[source,console-result]
----
@@ -126,20 +101,84 @@ fields, see <<search-fields>>.
// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]

[discrete]
[[run-request-body-search]]
=== Run a request body search
[[common-search-options]]
=== Common search options

You can use the search API's <<request-body-search-query,`query` request
body parameter>> to provide a query as a JSON object, written in
<<query-dsl,Query DSL>>.
You can use the following options to customize your searches.

The following request body search uses the <<query-dsl-match-query,`match`>>
query to match documents with a `user.id` value of `kimchy`.
*Query DSL* +
<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
to get the results you want. Query types include:

* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
queries>>, which let you combine queries and match results based on multiple
criteria
* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
* <<full-text-queries,Full text queries>>, which are commonly used in search
engines
* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>
|
||||
|
||||
*Aggregations* +
|
||||
You can use <<search-aggregations,search aggregations>> to get statistics and
|
||||
other analytics for your search results. Aggregations help you answer questions
|
||||
like:
|
||||
|
||||
* What's the average response time for my servers?
|
||||
* What are the top IP addresses hit by users on my network?
|
||||
* What is the total transaction revenue by customer?
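
The three questions above map onto standard aggregation types: `avg`, `terms`, and a `terms` bucket with a nested `sum`. The following is a minimal sketch that builds such a request body in Python. The field names (`response_time_ms`, `source.ip`, `customer.id`, `transaction.amount`) are hypothetical and should be replaced with fields from your own mapping.

```python
import json


def build_aggs_body() -> dict:
    """Build a search body whose aggregations answer the three example questions.

    All field names are hypothetical placeholders.
    """
    return {
        "size": 0,  # skip hits; only the aggregation results are needed
        "aggs": {
            # Average response time across all matching documents.
            "avg_response_time": {"avg": {"field": "response_time_ms"}},
            # Ten most frequent source IP addresses.
            "top_ips": {"terms": {"field": "source.ip", "size": 10}},
            # Total revenue per customer: a terms bucket with a sum sub-aggregation.
            "revenue_by_customer": {
                "terms": {"field": "customer.id"},
                "aggs": {"total_revenue": {"sum": {"field": "transaction.amount"}}},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_aggs_body(), indent=2))
```

The printed JSON can be sent as the body of a search request such as `GET /my-index-000001/_search`.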

*Search multiple data streams and indices* +
You can use comma-separated values and grep-like index patterns to search
several data streams and indices in the same request. You can even boost search
results from specific indices. See <<search-multiple-indices>>.

*Paginate search results* +
By default, searches return only the top 10 matching hits. To retrieve
more or fewer documents, see <<paginate-search-results>>.
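
Page-based navigation with the `from` and `size` body parameters reduces to simple arithmetic: `from` is the number of hits to skip. A minimal sketch, assuming 1-based page numbers and the default `index.max_result_window` of 10,000 (by default, `from` + `size` can't exceed it). The helper name is hypothetical.

```python
# Assumed default of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000


def page_params(page: int, page_size: int = 10) -> dict:
    """Return `from`/`size` search body parameters for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size  # number of hits to skip
    if start + page_size > MAX_RESULT_WINDOW:
        # Pages this deep exceed the result window; `search_after` is the usual alternative.
        raise ValueError("from + size exceeds the result window; use search_after")
    return {"from": start, "size": page_size}
```

For example, `page_params(3, 20)` asks for hits 41 through 60.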

*Retrieve selected fields* +
The search response's `hits.hits` property includes the full document
<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
the `_source` or other fields, see <<search-fields>>.

*Sort search results* +
By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
score>> that measures how well each document matches the query. To customize the
calculation of these scores, use the
<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
other field values, see <<sort-search-results>>.

*Run an async search* +
{es} searches are designed to run on large volumes of data quickly, often
returning results in milliseconds. For this reason, searches are
_synchronous_ by default. The search request waits for complete results before
returning a response.

However, complete results can take longer for searches across
<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
clusters>>.

To avoid long waits, you can run an _asynchronous_, or _async_, search
instead. An <<async-search-intro,async search>> lets you retrieve partial
results for a long-running search now and get complete results later.

[discrete]
[[search-timeout]]
=== Search timeout

By default, search requests don't time out. The request waits for complete
results before returning a response.

While <<async-search-intro,async search>> is designed for long-running
searches, you can also use the `timeout` parameter to specify a duration you'd
like to wait for a search to complete. If no response is received before this
period ends, the request fails and returns an error.

[source,console]
----
GET /my-index-000001/_search
{
  "timeout": "2s",
  "query": {
    "match": {
      "user.id": "kimchy"
@@ -149,88 +188,23 @@ GET /my-index-000001/_search
----
// TEST[setup:my_index]

To set a cluster-wide default timeout for all search requests, configure
`search.default_search_timeout` using the <<cluster-update-settings,cluster
settings API>>. This global timeout duration is used if no `timeout` argument is
passed in the request. If the global search timeout expires before the search
request finishes, the request is cancelled using <<task-cancellation,task
cancellation>>. The `search.default_search_timeout` setting defaults to `-1` (no
timeout).
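
The two timeout mechanisms above can be sketched as two request bodies built in Python. This is a hedged illustration: the helper names are hypothetical, `50s` is only an example value, and the sketch uses the `persistent` settings scope. A `null` value in a cluster settings update resets the setting to its default.

```python
import json
from typing import Optional


def default_timeout_settings(timeout: Optional[str]) -> dict:
    """Body for PUT /_cluster/settings; None serializes to null, which resets the setting."""
    return {"persistent": {"search.default_search_timeout": timeout}}


def search_body(query: dict, timeout: Optional[str] = None) -> dict:
    """Search request body; a per-request `timeout` takes precedence over the cluster default."""
    body = {"query": query}
    if timeout is not None:
        body["timeout"] = timeout
    return body


if __name__ == "__main__":
    print(json.dumps(default_timeout_settings("50s")))
    print(json.dumps(search_body({"match": {"user.id": "kimchy"}}, timeout="2s")))
```

Send the first body with `PUT /_cluster/settings`. Because the cluster-wide default applies only when a request has no `timeout`, the `2s` value in the second body wins for that request.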

[discrete]
[[search-multiple-indices]]
=== Search multiple data streams and indices
[[global-search-cancellation]]
=== Search cancellation

To search multiple data streams and indices, add them as comma-separated values
in the search API request path.
You can cancel a search request using the <<task-cancellation,task management
API>>. {es} also automatically cancels a search request when your client's HTTP
connection closes. We recommend you set up your client to close HTTP connections
when a search request is aborted or times out.

The following request searches the `my-index-000001` and `my-index-000002`
indices.

[source,console]
----
GET /my-index-000001,my-index-000002/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]
// TEST[s/^/PUT my-index-000002\n/]

You can also search multiple data streams and indices using a wildcard (`*`)
pattern.

The following request targets the wildcard pattern `my-index-*`. The request
searches any data streams or indices in the cluster that start with `my-index-`.

[source,console]
----
GET /my-index-*/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]
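
The target rules described here (comma-separated names, `*` wildcards, and `_all`, `*`, or an omitted target for everything) can be modeled as a small resolver. This is a simplified sketch over a hypothetical list of names. It doesn't model other parts of {es} target resolution, such as exclusions, hidden indices, or aliases.

```python
import re
from typing import List


def _matches(pattern: str, name: str) -> bool:
    # Only `*` is a wildcard here; every other character must match literally.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def resolve_targets(expression: str, names: List[str]) -> List[str]:
    """Return the names selected by a comma-separated target expression.

    An empty expression, `_all`, or `*` selects every name.
    """
    if expression in ("", "_all", "*"):
        return sorted(names)
    selected = set()
    for part in expression.split(","):
        selected.update(n for n in names if _matches(part.strip(), n))
    return sorted(selected)


if __name__ == "__main__":
    cluster = ["my-index-000001", "my-index-000002", "logs-000001"]
    print(resolve_targets("my-index-*", cluster))
```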

To search all data streams and indices in a cluster, omit the target from the
request path. Alternatively, you can use `_all` or `*`.

The following requests are equivalent and search all data streams and indices in the cluster.

[source,console]
----
GET /_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

GET /_all/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}

GET /*/_search
{
  "query": {
    "match": {
      "user.id": "kimchy"
    }
  }
}
----
// TEST[setup:my_index]

include::request/index-boost.asciidoc[]
include::request/preference.asciidoc[]
include::request/search-type.asciidoc[]
include::request/track-total-hits.asciidoc[]
include::quickly-check-for-matching-docs.asciidoc[]

@@ -243,4 +217,6 @@ include::paginate-search-results.asciidoc[]
include::request/inner-hits.asciidoc[]
include::search-fields.asciidoc[]
include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
include::search-multiple-indices.asciidoc[]
include::search-shard-routing.asciidoc[]
include::request/sort.asciidoc[]

@@ -129,9 +129,44 @@ When unspecified, the pre-filter phase is executed if any of these conditions is
- The request targets one or more read-only indices.
- The primary sort of the query targets an indexed field.

[[search-preference]]
`preference`::
(Optional, string) Specifies the node or shard the operation should be
performed on. Random by default.
(Optional, string)
Nodes and shards used for the search. By default, {es} selects from eligible
nodes and shards using <<search-adaptive-replica,adaptive replica selection>>,
accounting for <<shard-allocation-awareness,allocation awareness>>.
+
.Valid values for `preference`
[%collapsible%open]
====
`_only_local`::
Run the search only on shards on the local node.

`_local`::
If possible, run the search on shards on the local node. If not, select shards
using the default method.

`_only_nodes:<node-id>,<node-id>`::
Run the search on only the specified node IDs. If suitable shards exist on more
than one selected node, use shards on those nodes using the default method. If
none of the specified nodes are available, select shards from any available node
using the default method.

`_prefer_nodes:<node-id>,<node-id>`::
If possible, run the search on the specified node IDs. If not, select shards
using the default method.

`_shards:<shard>,<shard>`::
Run the search only on the specified shards. This value can be combined with
other `preference` values, but this value must come first. For example:
`_shards:2,3|_local`

<custom-string>::
Any string that does not start with `_`. If the cluster state and selected
shards do not change, searches using the same `<custom-string>` value are routed
to the same shards in the same order.
====
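
Because `_shards` must come first when combined with another value, client code that builds `preference` strings can enforce that ordering itself. A minimal sketch; `build_preference` and `is_custom_string` are hypothetical helpers, not part of any {es} client.

```python
from typing import Iterable, Optional


def build_preference(shards: Optional[Iterable[int]] = None,
                     other: Optional[str] = None) -> str:
    """Compose a `preference` value, always placing the `_shards:` part first."""
    parts = []
    if shards is not None:
        parts.append("_shards:" + ",".join(str(s) for s in shards))
    if other:
        parts.append(other)  # for example "_local" or a custom string
    return "|".join(parts)


def is_custom_string(value: str) -> bool:
    # Custom preference strings are any values that don't start with `_`.
    return not value.startswith("_")


if __name__ == "__main__":
    print(build_preference([2, 3], "_local"))  # prints _shards:2,3|_local
```

Reusing one custom string per user session, such as a hypothetical session ID, routes that session's repeated searches to the same shards in the same order, as long as the cluster state and selected shards don't change.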

[[search-api-query-params-q]]
include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search-q]

@@ -164,7 +199,28 @@ Period to retain the <<scroll-search-context,search context>> for scrolling. See
By default, this value cannot exceed `1d` (24 hours). You can change
this limit using the `search.max_keep_alive` cluster-level setting.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_type]
[[search-type]]
`search_type`::
(Optional, string)
How {wikipedia}/Tf–idf[distributed term frequencies] are calculated for
<<relevance-scores,relevance scoring>>.
+
.Valid values for `search_type`
[%collapsible%open]
====
`query_then_fetch`::
(Default)
Distributed term frequencies are calculated locally for each shard running the
search. We recommend this option for faster searches with potentially less
accurate scoring.

[[dfs-query-then-fetch]]
`dfs_query_then_fetch`::
Distributed term frequencies are calculated globally, using information gathered
from all shards running the search. While this option increases the accuracy of
scoring, it adds a round-trip to each shard, which can result in slower
searches.
====
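
The accuracy difference comes from shard-local term statistics. The sketch below uses the IDF component of Lucene's BM25 similarity, the {es} default, with hypothetical per-shard document counts. It shows how one term gets a different weight on each shard under `query_then_fetch` but a single shared weight under `dfs_query_then_fetch`. It ignores term frequency, field length, and every other part of the real score.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of Lucene's BM25 similarity: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one term: (documents in shard, documents containing the term).
shard_stats = {"shard_0": (100, 1), "shard_1": (100, 50)}

# query_then_fetch: each shard weights the term using only its own statistics.
local_idf = {name: bm25_idf(n, df) for name, (n, df) in shard_stats.items()}

# dfs_query_then_fetch: statistics from all shards are combined first (the extra
# round-trip), so every shard weights the term with the same global value.
global_idf = bm25_idf(sum(n for n, _ in shard_stats.values()),
                      sum(df for _, df in shard_stats.values()))

if __name__ == "__main__":
    for name, idf in local_idf.items():
        print(f"{name}: local IDF = {idf:.3f}")
    print(f"global IDF = {global_idf:.3f}")
```

Here the term is rare on `shard_0` (local IDF about 4.21) and common on `shard_1` (about 0.69), so otherwise identical matches would score very differently depending on their shard. The global IDF (about 1.36) removes that skew.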

`seq_no_primary_term`::
(Optional, boolean) If `true`, returns sequence number and primary term of the
@@ -284,7 +340,7 @@ You can specify items in the array as a string or object.
See <<docvalue-fields>>.
+
.Properties of `docvalue_fields` objects
[%collapsible]
[%collapsible%open]
====
`field`::
(Required, string)
@@ -326,6 +382,24 @@ As an alternative to deep paging, we recommend using
<<search-after,`search_after`>> parameter.
--

`indices_boost`::
(Optional, array of objects)
Boosts the <<relevance-scores,`_score`>> of documents from specified indices.
+
.Properties of `indices_boost` objects
[%collapsible%open]
====
`<index>: <boost-value>`::
(Required, float)
`<index>` is the name of the index or index alias. Wildcard (`*`) expressions
are supported.
+
`<boost-value>` is the factor by which scores are multiplied.
+
A boost value greater than `1.0` increases the score. A boost value between
`0` and `1.0` decreases the score.
====
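
The boost itself is a plain multiplication of `_score`. The following sketch shows that arithmetic under two stated assumptions: when several entries match an index, the first matching entry in the array wins, and only literal names and `*` patterns are resolved, not aliases. The function names are hypothetical.

```python
import re
from typing import Dict, List


def _matches(pattern: str, index: str) -> bool:
    # Only `*` is a wildcard here; every other character must match literally.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.fullmatch(regex, index) is not None


def boosted_score(score: float, index: str,
                  indices_boost: List[Dict[str, float]]) -> float:
    """Multiply `score` by the boost of the first entry whose name or pattern matches `index`."""
    for entry in indices_boost:
        for pattern, boost in entry.items():
            if _matches(pattern, index):
                return score * boost
    return score  # no entry matches: the score is left unchanged


if __name__ == "__main__":
    boosts = [{"my-index-000001": 1.4}, {"my-index-*": 1.3}]
    print(boosted_score(2.0, "my-index-000002", boosts))
```

With these example boosts, a hit from `my-index-000001` scores 1.4 times its original `_score`, while hits from other `my-index-*` indices score 1.3 times theirs.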

[[search-api-min-score]]
`min_score`::
(Optional, float)
@@ -409,6 +483,13 @@ exclude fields from this subset using the `excludes` property.
=====
====

[[stats-groups]]
`stats`::
(Optional, array of strings)
Stats groups to associate with the search. Each group maintains a statistics
aggregation for its associated searches. You can retrieve these stats using the
<<indices-stats,indices stats API>>.

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
+
Defaults to `0`, which does not terminate query execution early.