OpenSearch/docs/reference/modules/cross-cluster-search.asciidoc

[chapter]
[[modules-cross-cluster-search]]
= Search across clusters

*{ccs-cap}* lets you run a single search request against one or more
<<modules-remote-clusters,remote clusters>>. For example, you can use a {ccs} to
filter and analyze log data stored on clusters in different data centers.

IMPORTANT: {ccs-cap} requires <<modules-remote-clusters, remote clusters>>.

[float]
[[ccs-example]]
== {ccs-cap} examples

[float]
[[ccs-remote-cluster-setup]]
=== Remote cluster setup

To perform a {ccs}, you must have at least one remote cluster configured.

The following <<cluster-update-settings,cluster update settings>> API request
adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}
--------------------------------
// TEST[setup:host]
// TEST[s/127.0.0.1:930\d+/\${transport_host}/]

[float]
[[ccs-search-remote-cluster]]
=== Search a single remote cluster

The following <<search,search>> API request searches the
`twitter` index on a single remote cluster, `cluster_one`.

[source,console]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[setup:twitter]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 1,
    "successful": 1,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "cluster_one:twitter", <1>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]

<1> The search response body includes the name of the remote cluster in the
`_index` parameter.

[float]
[[ccs-search-multi-remote-cluster]]
=== Search multiple remote clusters

The following <<search,search>> API request searches the `twitter` index on
three clusters:

* Your local cluster
* Two remote clusters, `cluster_one` and `cluster_two`

[source,console]
--------------------------------------------------
GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
{
  "query": {
    "match": {
      "user": "kimchy"
    }
  }
}
--------------------------------------------------
// TEST[continued]

The API returns the following response:

[source,console-result]
--------------------------------------------------
{
  "took": 150,
  "timed_out": false,
  "num_reduce_phases": 4,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0,
    "skipped": 0
  },
  "_clusters": {
    "total": 3,
    "successful": 3,
    "skipped": 0
  },
  "hits": {
    "total" : {
        "value": 3,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter", <1>
        "_type": "_doc",
        "_id": "0",
        "_score": 2,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_one:twitter", <2>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      },
      {
        "_index": "cluster_two:twitter", <3>
        "_type": "_doc",
        "_id": "0",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch",
          "likes": 0
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]

<1> This document's `_index` parameter doesn't include a cluster name. This
means the document came from the local cluster.
<2> This document came from `cluster_one`.
<3> This document came from `cluster_two`.

[float]
[[skip-unavailable-clusters]]
== Skip unavailable clusters

By default, a {ccs} returns an error if *any* cluster in the request is
unavailable.

To skip an unavailable cluster during a {ccs}, set the
<<skip-unavailable,`skip_unavailable`>> cluster setting to `true`.

The following <<cluster-update-settings,cluster update settings>> API request
changes `cluster_two`'s `skip_unavailable` setting to `true`.

[source,console]
--------------------------------
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.cluster_two.skip_unavailable": true
  }
}
--------------------------------
// TEST[continued]

If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't
include matching documents from that cluster in the final results.

[discrete]
[[ccs-works]]
== How {ccs} works

include::./remote-clusters.asciidoc[tag=how-remote-clusters-work]

[discrete]
[[ccs-gateway-seed-nodes]]
=== Selecting gateway and seed nodes

Gateway and seed nodes need to be accessible from the local cluster via your
network.

By default, any master-ineligible node can act as a gateway node. If wanted,
you can define the gateway nodes for a cluster by setting
`cluster.remote.node.attr.gateway` to `true`.

For {ccs}, we recommend you use gateway nodes that are capable of serving as
<<coordinating-node,coordinating nodes>> for search requests. If
wanted, the seed nodes for a cluster can be a subset of these gateway nodes.

[discrete]
[[ccs-network-delays]]
=== How {ccs} handles network delays

Because {ccs} involves sending requests to remote clusters, any network delays
can impact search speed. To avoid slow searches, {ccs} offers two options for
handling network delays:

<<ccs-min-roundtrips,Minimize network roundtrips>>::
By default, {es} reduces the number of network roundtrips between remote
clusters. This reduces the impact of network delays on search speed. However,
{es} can't reduce network roundtrips for large search requests, such as those
including a <<request-body-search-scroll, scroll>> or
<<request-body-search-inner-hits,inner hits>>.
+
See <<ccs-min-roundtrips>> to learn how this option works.

<<ccs-unmin-roundtrips, Don't minimize network roundtrips>>::
For search requests that include a scroll or inner hits, {es} sends multiple
outgoing and ingoing requests to each remote cluster. You can also choose this
option by setting the <<search,search>> API's
<<ccs-minimize-roundtrips,`ccs_minimize_roundtrips`>> parameter to `false`.
While typically slower, this approach may work well for networks with low
latency.
+
See <<ccs-unmin-roundtrips>> to learn how this option works.

[float]
[[ccs-min-roundtrips]]
==== Minimize network roundtrips

Here's how {ccs} works when you minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.svg[]

. The coordinating node sends a single search request to each cluster, including
the local cluster. Each cluster performs the search request independently,
applying its own cluster-level settings to the request.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]

. Each remote cluster sends its search results back to the coordinating node.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.svg[]

[float]
[[ccs-unmin-roundtrips]]
==== Don't minimize network roundtrips

Here's how {ccs} works when you don't minimize network roundtrips.

. You send a {ccs} request to your local cluster. A coordinating node in that
cluster receives and parses the request.
+
image:images/ccs/ccs-min-roundtrip-client-request.svg[]

. The coordinating node sends a <<search-shards,search shards>> API request to
each remote cluster.
+
image:images/ccs/ccs-min-roundtrip-cluster-search.svg[]

. Each remote cluster sends its response back to the coordinating node.
This response contains information about the indices and shards the {ccs}
request will be executed on.
+
image:images/ccs/ccs-min-roundtrip-cluster-results.svg[]

. The coordinating node sends a search request to each shard, including those in
its own cluster. Each shard performs the search request independently.
+
[WARNING]
====
When network roundtrips aren't minimized, the search is executed as if all data
were in the coordinating node's cluster. We recommend updating cluster-level
settings that limit searches, such as `action.search.shard_count.limit`,
`pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for
this. If these limits are too low, the search may be rejected.
====
+
image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[]

. Each shard sends its search results back to the coordinating node.
+
image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[]

. After collecting results from each cluster, the coordinating node returns the
final results in the {ccs} response.
+
image:images/ccs/ccs-min-roundtrip-client-response.svg[]