370 lines
9.7 KiB
Plaintext
370 lines
9.7 KiB
Plaintext
[[modules-cross-cluster-search]]
|
|
== {ccs-cap}
|
|
|
|
The _{ccs}_ feature allows any node to act as a federated client across
|
|
multiple clusters. A {ccs} node won't join the remote cluster, instead
|
|
it connects to a remote cluster in a light fashion in order to execute
|
|
federated search requests. For details on communication and compatibility
|
|
between different clusters, see <<modules-remote-clusters>>.
|
|
|
|
[float]
|
|
=== Using {ccs}
|
|
|
|
{ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.
|
|
|
|
[source,js]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster": {
|
|
"remote": {
|
|
"cluster_one": {
|
|
"seeds": [
|
|
"127.0.0.1:9300"
|
|
]
|
|
},
|
|
"cluster_two": {
|
|
"seeds": [
|
|
"127.0.0.1:9301"
|
|
]
|
|
},
|
|
"cluster_three": {
|
|
"seeds": [
|
|
"127.0.0.1:9302"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:host]
|
|
// TEST[s/127.0.0.1:9300/\${transport_host}/]
|
|
|
|
To search the `twitter` index on remote cluster `cluster_one` the index name
|
|
must be prefixed with the alias of the remote cluster followed by the `:`
|
|
character:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
// TEST[setup:twitter]
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 1,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
|
|
|
|
Indices with the same name on different clusters can also be searched:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter,twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Search results are disambiguated the same way as the indices are disambiguated in the request.
|
|
Indices with same names are treated as different indices when results are merged. All results
|
|
retrieved from an index located in a remote cluster are prefixed with their corresponding
|
|
cluster alias:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 3,
|
|
"_shards": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 2,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
|
|
|
|
[float]
|
|
=== Skipping disconnected clusters
|
|
|
|
By default, all remote clusters that are searched via {ccs} need to be
|
|
available when the search request is executed. Otherwise, the whole request
|
|
fails; even if some of the clusters are available, no search results are
|
|
returned. You can use the boolean `skip_unavailable` setting to make remote
|
|
clusters optional. By default, it is set to `false`.
|
|
|
|
[source,js]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster.remote.cluster_two.skip_unavailable": true <1>
|
|
}
|
|
}
|
|
--------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
<1> `cluster_two` is made optional
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 3,
|
|
"_shards": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": { <1>
|
|
"total": 3,
|
|
"successful": 2,
|
|
"skipped": 1
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 2,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
|
|
<1> The `clusters` section indicates that one cluster was unavailable and got skipped
|
|
|
|
[float]
|
|
[[ccs-works]]
|
|
=== How {ccs} works
|
|
Because {ccs} involves sending requests to remote clusters, any network delays
|
|
can impact search speed. To avoid slow searches, {ccs} offers two options for
|
|
handling network delays:
|
|
|
|
<<ccs-min-roundtrips,Minimize network roundtrips>>::
|
|
By default, {es} reduces the number of network roundtrips between remote
|
|
clusters. This reduces the impact of network delays on search speed. However,
|
|
{es} can't reduce network roundtrips for large search requests, such as those
|
|
including a <<request-body-search-scroll, scroll>> or
|
|
<<request-body-search-inner-hits,inner hits>>.
|
|
+
|
|
See <<ccs-min-roundtrips>> to learn how this option works.
|
|
|
|
<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
|
|
For search requests that include a scroll or inner hits, {es} sends multiple
|
|
outgoing and ingoing requests to each remote cluster. You can also choose this
|
|
option by setting the <<search,search>> API's
|
|
<<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
|
|
While typically slower, this approach may work well for networks with low
|
|
latency.
|
|
+
|
|
See <<ccs-unmin-roundtrips>> to learn how this option works.
|
|
|
|
|
|
|
|
[float]
|
|
[[ccs-min-roundtrips]]
|
|
==== Minimize network roundtrips
|
|
|
|
Here's how {ccs} works when you minimize network roundtrips.
|
|
|
|
. You send a {ccs} request to your local cluster. A coordinating node in that
|
|
cluster receives and parses the request.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-request.png[]
|
|
|
|
. The coordinating node sends a single search request to each cluster, including
|
|
its own. Each cluster performs the search request independently.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]
|
|
|
|
. Each remote cluster sends its search results back to the coordinating node.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]
|
|
|
|
. After collecting results from each cluster, the coordinating node returns the
|
|
final results in the {ccs} response.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-response.png[]
|
|
|
|
[float]
|
|
[[ccs-unmin-roundtrips]]
|
|
==== Don't minimize network roundtrips
|
|
|
|
Here's how {ccs} works when you don't minimize network roundtrips.
|
|
|
|
. You send a {ccs} request to your local cluster. A coordinating node in that
|
|
cluster receives and parses the request.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-request.png[]
|
|
|
|
. The coordinating node sends a <<search-shards,search shards>> API request to
|
|
each remote cluster.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]
|
|
|
|
. Each remote cluster sends its response back to the coordinating node.
|
|
This response contains information about the indices and shards the {ccs}
|
|
request will be executed on.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]
|
|
|
|
. The coordinating node sends a search request to each shard, including those in
|
|
its own cluster. Each shard performs the search request independently.
|
|
+
|
|
image:images/ccs/ccs-dont-min-roundtrip-shard-search.png[]
|
|
|
|
. Each shard sends its search results back to the coordinating node.
|
|
+
|
|
image:images/ccs/ccs-dont-min-roundtrip-shard-results.png[]
|
|
|
|
. After collecting results from each cluster, the coordinating node returns the
|
|
final results in the {ccs} response.
|
|
+
|
|
image:images/ccs/ccs-min-roundtrip-client-response.png[]
|