379 lines
11 KiB
Plaintext
379 lines
11 KiB
Plaintext
[[modules-cross-cluster-search]]
|
|
== Search across clusters
|
|
|
|
*{ccs-cap}* lets you run a single search request against one or more
|
|
<<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
|
|
filter and analyze log data stored on clusters in different data centers.
|
|
|
|
IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.
|
|
|
|
[float]
|
|
[[ccs-supported-apis]]
|
|
=== Supported APIs
|
|
|
|
The following APIs support {ccs}:
|
|
|
|
* <<search-search,Search>>
|
|
* <<search-multi-search,Multi search>>
|
|
* <<search-template,Search template>>
|
|
* <<multi-search-template,Multi search template>>
|
|
|
|
[float]
|
|
[[ccs-example]]
|
|
=== {ccs-cap} examples
|
|
|
|
[float]
|
|
[[ccs-remote-cluster-setup]]
|
|
==== Remote cluster setup
|
|
|
|
To perform a {ccs}, you must have at least one remote cluster configured.
|
|
|
|
The following <<cluster-update-settings,cluster update settings>> API request
|
|
adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.
|
|
|
|
[source,console]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster": {
|
|
"remote": {
|
|
"cluster_one": {
|
|
"seeds": [
|
|
"127.0.0.1:9300"
|
|
]
|
|
},
|
|
"cluster_two": {
|
|
"seeds": [
|
|
"127.0.0.1:9301"
|
|
]
|
|
},
|
|
"cluster_three": {
|
|
"seeds": [
|
|
"127.0.0.1:9302"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------
|
|
// TEST[setup:host]
|
|
// TEST[s/127.0.0.1:930\d+/\${transport_host}/]
|
|
|
|
[float]
|
|
[[ccs-search-remote-cluster]]
|
|
==== Search a single remote cluster
|
|
|
|
The following <<search-search,search>> API request searches the
|
|
`twitter` index on a single remote cluster, `cluster_one`.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
// TEST[setup:twitter]
|
|
|
|
The API returns the following response:
|
|
|
|
[source,console-result]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 1,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "cluster_one:twitter", <1>
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
|
|
<1> The search response body includes the name of the remote cluster in the
|
|
`_index` parameter.
|
|
|
|
[float]
|
|
[[ccs-search-multi-remote-cluster]]
|
|
==== Search multiple remote clusters
|
|
|
|
The following <<search,search>> API request searches the `twitter` index on
|
|
three clusters:
|
|
|
|
* Your local cluster
|
|
* Two remote clusters, `cluster_one` and `cluster_two`
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
The API returns the following response:
|
|
|
|
[source,console-result]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 4,
|
|
"_shards": {
|
|
"total": 3,
|
|
"successful": 3,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 3,
|
|
"successful": 3,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 3,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "twitter", <1>
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 2,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "cluster_one:twitter", <2>
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "cluster_two:twitter", <3>
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
|
|
|
|
<1> This document's `_index` parameter doesn't include a cluster name. This
|
|
means the document came from the local cluster.
|
|
<2> This document came from `cluster_one`.
|
|
<3> This document came from `cluster_two`.
|
|
|
|
[float]
|
|
[[skip-unavailable-clusters]]
|
|
=== Skip unavailable clusters
|
|
|
|
By default, a {ccs} returns an error if *any* cluster in the request is
|
|
unavailable.
|
|
|
|
To skip an unavailable cluster during a {ccs}, set the
|
|
<<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.
|
|
|
|
The following <<cluster-update-settings,cluster update settings>> API request
|
|
changes `cluster_two`'s `skip_unavailable` setting to `true`.
|
|
|
|
[source,console]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster.remote.cluster_two.skip_unavailable": true
|
|
}
|
|
}
|
|
--------------------------------
|
|
// TEST[continued]
|
|
|
|
If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
|
|
include matching documents from that cluster in the final results.
|
|
|
|
[discrete]
|
|
[[ccs-gateway-seed-nodes]]
|
|
=== Selecting gateway and seed nodes in sniff mode
|
|
|
|
For remote clusters using the <<sniff-mode,sniff connection>> mode, gateway and
|
|
seed nodes need to be accessible from the local cluster via your network.
|
|
|
|
By default, any non-<<master-node,master-eligible>> node can act as a
|
|
gateway node. If wanted, you can define the gateway nodes for a cluster by
|
|
setting `cluster.remote.node.attr.gateway` to `true`.
|
|
|
|
For {ccs}, we recommend you use gateway nodes that are capable of serving as
|
|
<<coordinating-node,coordinating nodes>> for search requests. If
|
|
wanted, the seed nodes for a cluster can be a subset of these gateway nodes.
|
|
|
|
[discrete]
|
|
[[ccs-proxy-mode]]
|
|
=== {ccs} in proxy mode
|
|
|
|
<<proxy-mode,Proxy mode>> remote cluster connections support {ccs}. All remote
|
|
connections connect to the configured `proxy_address`. Any desired connection
|
|
routing to gateway or <<coordinating-node,coordinating nodes>> must
|
|
be implemented by the intermediate proxy at this configured address.
|
|
|
|
[discrete]
|
|
[[ccs-network-delays]]
|
|
=== How {ccs} handles network delays
|
|
|
|
Because {ccs} involves sending requests to remote clusters, any network delays
|
|
can impact search speed. To avoid slow searches, {ccs} offers two options for
|
|
handling network delays:
|
|
|
|
<<ccs-min-roundtrips,Minimize network roundtrips>>::
|
|
By default, {es} reduces the number of network roundtrips between remote
|
|
clusters. This reduces the impact of network delays on search speed. However,
|
|
{es} can't reduce network roundtrips for large search requests, such as those
|
|
including a <<request-body-search-scroll, scroll>> or
|
|
<<request-body-search-inner-hits,inner hits>>.
|
|
+
|
|
See <<ccs-min-roundtrips>> to learn how this option works.
|
|
|
|
<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>:: For search
|
|
requests that include a scroll or inner hits, {es} sends multiple outgoing and
|
|
ingoing requests to each remote cluster. You can also choose this option by
|
|
setting the <<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to
|
|
`false`. While typically slower, this approach may work well for networks with
|
|
low latency.
|
|
+
|
|
See <<ccs-unmin-roundtrips>> to learn how this option works.
|
|
|
|
[discrete]
|
|
[[ccs-min-roundtrips]]
|
|
==== Minimize network roundtrips
|
|
|
|
Here's how {ccs} works when you minimize network roundtrips.
|
|
|
|
. You send a {ccs} request to your local cluster. A coordinating node in that
|
|
cluster receives and parses the request.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-request.svg[]
|
|
|
|
. The coordinating node sends a single search request to each cluster, including
|
|
the local cluster. Each cluster performs the search request independently,
|
|
applying its own cluster-level settings to the request.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
|
|
|
|
. Each remote cluster sends its search results back to the coordinating node.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
|
|
|
|
. After collecting results from each cluster, the coordinating node returns the
|
|
final results in the {ccs} response.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-response.svg[]
|
|
|
|
[discrete]
|
|
[[ccs-unmin-roundtrips]]
|
|
==== Don't minimize network roundtrips
|
|
|
|
Here's how {ccs} works when you don't minimize network roundtrips.
|
|
|
|
. You send a {ccs} request to your local cluster. A coordinating node in that
|
|
cluster receives and parses the request.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-request.svg[]
|
|
|
|
. The coordinating node sends a <<search-shards,search shards>> API request to
|
|
each remote cluster.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]
|
|
|
|
. Each remote cluster sends its response back to the coordinating node.
|
|
This response contains information about the indices and shards the {ccs}
|
|
request will be executed on.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]
|
|
|
|
. The coordinating node sends a search request to each shard, including those in
|
|
its own cluster. Each shard performs the search request independently.
|
|
+
|
|
[WARNING]
|
|
====
|
|
When network roundtrips aren't minimized, the search is executed as if all data
|
|
were in the coordinating node's cluster. We recommend updating cluster-level
|
|
settings that limit searches, such as `action.search.shard_count.limit`,
|
|
`pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for
|
|
this. If these limits are too low, the search may be rejected.
|
|
====
|
|
+
|
|
image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]
|
|
|
|
. Each shard sends its search results back to the coordinating node.
|
|
+
|
|
image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]
|
|
|
|
. After collecting results from each cluster, the coordinating node returns the
|
|
final results in the {ccs} response.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-response.svg[]
|