mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-09 14:34:43 +00:00
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters. This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour. Relates to #32125
319 lines
8.8 KiB
Plaintext
319 lines
8.8 KiB
Plaintext
[[modules-cross-cluster-search]]
|
|
== Cross Cluster Search
|
|
|
|
The _cross cluster search_ feature allows any node to act as a federated client across
|
|
multiple clusters. A cross cluster search node won't join the remote cluster, instead
|
|
it connects to a remote cluster in a light fashion in order to execute
|
|
federated search requests.
|
|
|
|
[float]
|
|
=== Using cross cluster search
|
|
|
|
Cross-cluster search requires <<modules-remote-clusters,configuring remote clusters>>.
|
|
|
|
[source,js]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster": {
|
|
"remote": {
|
|
"cluster_one": {
|
|
"seeds": [
|
|
"127.0.0.1:9300"
|
|
]
|
|
},
|
|
"cluster_two": {
|
|
"seeds": [
|
|
"127.0.0.1:9301"
|
|
]
|
|
},
|
|
"cluster_three": {
|
|
"seeds": [
|
|
"127.0.0.1:9302"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------
|
|
// CONSOLE
|
|
// TEST[setup:host]
|
|
// TEST[s/127.0.0.1:9300/\${transport_host}/]
|
|
|
|
To search the `twitter` index on remote cluster `cluster_one` the index name
|
|
must be prefixed with the cluster alias separated by a `:` character:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
// TEST[setup:twitter]
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 2,
|
|
"_shards": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 1,
|
|
"successful": 1,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 1,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
|
|
|
|
Indices can also be searched with the same name on different clusters:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter,twitter/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
|
|
Search results are disambiguated the same way as the indices are disambiguated in the request. Even if index names are
|
|
identical these indices will be treated as different indices when results are merged. All results retrieved from a
|
|
remote index
|
|
will be prefixed with their remote cluster name:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 3,
|
|
"_shards": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"skipped": 0
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 2,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
|
|
|
|
[float]
|
|
=== Skipping disconnected clusters
|
|
|
|
By default all remote clusters that are searched via Cross Cluster Search need to be available when
|
|
the search request is executed, otherwise the whole request fails and no search results are returned
|
|
despite some of the clusters are available. Remote clusters can be made optional through the
|
|
boolean `skip_unavailable` setting, set to `false` by default.
|
|
|
|
[source,js]
|
|
--------------------------------
|
|
PUT _cluster/settings
|
|
{
|
|
"persistent": {
|
|
"cluster.remote.cluster_two.skip_unavailable": true <1>
|
|
}
|
|
}
|
|
--------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
<1> `cluster_two` is made optional
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"user": "kimchy"
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[continued]
|
|
<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"took": 150,
|
|
"timed_out": false,
|
|
"num_reduce_phases": 3,
|
|
"_shards": {
|
|
"total": 2,
|
|
"successful": 2,
|
|
"failed": 0,
|
|
"skipped": 0
|
|
},
|
|
"_clusters": { <1>
|
|
"total": 3,
|
|
"successful": 2,
|
|
"skipped": 1
|
|
},
|
|
"hits": {
|
|
"total" : {
|
|
"value": 2,
|
|
"relation": "eq"
|
|
},
|
|
"max_score": 1,
|
|
"hits": [
|
|
{
|
|
"_index": "cluster_one:twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 1,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
},
|
|
{
|
|
"_index": "twitter",
|
|
"_type": "_doc",
|
|
"_id": "0",
|
|
"_score": 2,
|
|
"_source": {
|
|
"user": "kimchy",
|
|
"date": "2009-11-15T14:12:12",
|
|
"message": "trying out Elasticsearch",
|
|
"likes": 0
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
|
|
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
|
|
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
|
|
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
|
|
<1> The `clusters` section indicates that one cluster was unavailable and got skipped
|
|
|
|
[float]
|
|
[[ccs-reduction]]
|
|
=== CCS reduction phase
|
|
|
|
Cross-cluster search requests can be executed in two ways:
|
|
|
|
- the CCS coordinating node minimizes network round-trips by sending one search
|
|
request to each cluster. Each cluster performs the search independently,
|
|
reducing and fetching results. Once the CCS node has received all the
|
|
responses, it performs another reduction and returns the relevant results back
|
|
to the user. This strategy is beneficial when there is network latency between
|
|
the CCS coordinating node and the remote clusters involved, which is typically
|
|
the case. A single request is sent to each remote cluster, at the cost of
|
|
retrieving `from` + `size` already fetched results. This is the default
|
|
strategy, used whenever possible. In case a scroll is provided, or inner hits
|
|
are requested as part of field collapsing, this strategy is not supported hence
|
|
network round-trips cannot be minimized and the following strategy is used
|
|
instead.
|
|
|
|
- the CCS coordinating node sends a <<search-shards,search shards>> request to
|
|
each remote cluster, in order to collect information about their corresponding
|
|
remote indices involved in the search request and the shards where their data
|
|
is located. Once each cluster has responded to such request, the search
|
|
executes as if all shards were part of the same cluster. The coordinating node
|
|
sends one request to each shard involved, each shard executes the query and
|
|
returns its own results which are then reduced (and fetched, depending on the
|
|
<<search-request-search-type, search type>>) by the CCS coordinating node.
|
|
This strategy may be beneficial whenever there is very low network latency
|
|
between the CCS coordinating node and the remote clusters involved, as it
|
|
treats all shards the same, at the cost of sending many requests to each remote
|
|
cluster, which is problematic in presence of network latency.
|
|
|
|
The <<search-request-body, search API>> supports the `ccs_minimize_roundtrips`
|
|
parameter, which defaults to `true` and can be set to `false` in case
|
|
minimizing network round-trips is not desirable.
|
|
|
|
Note that all the communication between the nodes, regardless of which cluster
|
|
they belong to and the selected reduce mode, happens through the
|
|
<<modules-transport,transport layer>>.
|