OpenSearch/docs/reference/modules/cross-cluster-search.asciidoc

[[modules-cross-cluster-search]]
== {ccs-cap}

The _{ccs}_ feature allows any node to act as a federated client across
multiple clusters. A {ccs} node won't join the remote cluster, instead
it connects to a remote cluster in a light fashion in order to execute
federated search requests. For details on communication and compatibility
between different clusters, see <<modules-remote-clusters>>.

[float]
=== Using {ccs}

{ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.

[source,js]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
// TEST[setup:host]
// TEST[s/127.0.0.1:9300/\${transport_host}/]

To search the `twitter` index on remote cluster `cluster_one` the index name
must be prefixed with the alias of the remote cluster followed by the `:`
character:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[setup:twitter]

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]


Indices with the same name on different clusters can also be searched:

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Search results are disambiguated the same way as the indices are disambiguated in the request.
Indices with same names are treated as different indices when results are merged. All results
retrieved from an index located in a remote cluster are prefixed with their corresponding
cluster alias:

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 2,
    "successful": 2,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

[float]
=== Skipping disconnected clusters

By default, all remote clusters that are searched via {ccs} need to be
available when the search request is executed. Otherwise, the whole request
fails; even if some of the clusters are available, no search results are
returned. You can use the boolean `skip_unavailable` setting to make remote
clusters optional. By default, it is set to `false`.

[source,js]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true <1>
  }
}
--------------------------------
// CONSOLE
// TEST[continued]
<1> `cluster_two` is made optional

[source,js]
--------------------------------------------------
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally

[source,js]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 3,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": { <1>
    "total": 3,
    "successful": 2,
    "skipped": 1
  },
  "hits": {
    "total" : {
        "value": 2,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter",
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
<1> The `clusters` section indicates that one cluster was unavailable and got skipped

[float]
[[ccs-works]]
=== How {ccs} works
Because {ccs} involves sending requests to remote clusters, any network delays
can impact search speed. To avoid slow searches, {ccs} offers two options for
handling network delays:

<<ccs-min-roundtrips,Minimize network roundtrips>>::
By default, {es} reduces the number of network roundtrips between remote
clusters. This reduces the impact of network delays on search speed. However,
{es} can't reduce network roundtrips for large search requests, such as those
including a <<request-body-search-scroll, scroll>> or
<<request-body-search-inner-hits,inner hits>>.
+
See <<ccs-min-roundtrips>> to learn how this option works.

<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
For search requests that include a scroll or inner hits, {es} sends multiple
outgoing and ingoing requests to each remote cluster. You can also choose this
option by setting the <<search,search>> API's
<<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
While typically slower, this approach may work well for networks with low
latency.
+
See <<ccs-unmin-roundtrips>> to learn how this option works.


[float]
[[ccs-min-roundtrips]]
==== Minimize network roundtrips

Here's how {ccs} works when you minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a single search request to each cluster, including
its own. Each cluster performs the search request independently.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its search results back to the coordinating node.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]

[float]
[[ccs-unmin-roundtrips]]
==== Don't minimize network roundtrips

Here's how {ccs} works when you don't minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.png[]

. The coordinating node sends a <<search-shards,search shards>> API request to
each remote cluster.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.png[]

. Each remote cluster sends its response back to the coordinating node.
This response contains information about the indices and shards the {ccs}
request will be executed on.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.png[]

. The coordinating node sends a search request to each shard, including those in
its own cluster. Each shard performs the search request independently.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-search.png[]

. Each shard sends its search results back to the coordinating node.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-results.png[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.png[]