[chapter] [[modules-cross-cluster-search]] = Search across clusters *{ccs-cap}* lets you run a single search request against one or more <>. For example, you can use a {ccs} to filter and analyze log data stored on clusters in different data centers. IMPORTANT: {ccs-cap} requires <>. [float] [[ccs-example]] == {ccs-cap} examples [float] [[ccs-remote-cluster-setup]] === Remote cluster setup To perform a {ccs}, you must have at least one remote cluster configured. The following <> API request adds three remote clusters:`cluster_one`, `cluster_two`, and `cluster_three`. [source,console] -------------------------------- PUT _cluster/settings { "persistent": { "cluster": { "remote": { "cluster_one": { "seeds": [ "127.0.0.1:9300" ] }, "cluster_two": { "seeds": [ "127.0.0.1:9301" ] }, "cluster_three": { "seeds": [ "127.0.0.1:9302" ] } } } } } -------------------------------- // TEST[setup:host] // TEST[s/127.0.0.1:930\d+/\${transport_host}/] [float] [[ccs-search-remote-cluster]] === Search a single remote cluster The following <> API request searches the `twitter` index on a single remote cluster, `cluster_one`. [source,console] -------------------------------------------------- GET /cluster_one:twitter/_search { "query": { "match": { "user": "kimchy" } } } -------------------------------------------------- // TEST[continued] // TEST[setup:twitter] The API returns the following response: [source,console-result] -------------------------------------------------- { "took": 150, "timed_out": false, "_shards": { "total": 1, "successful": 1, "failed": 0, "skipped": 0 }, "_clusters": { "total": 1, "successful": 1, "skipped": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "cluster_one:twitter", <1> "_type": "_doc", "_id": "0", "_score": 1, "_source": { "user": "kimchy", "date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch", "likes": 0 } } ] } } -------------------------------------------------- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/] // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/] // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/] <1> The search response body includes the name of the remote cluster in the `_index` parameter. [float] [[ccs-search-multi-remote-cluster]] === Search multiple remote clusters The following <> API request searches the `twitter` index on three clusters: * Your local cluster * Two remote clusters, `cluster_one` and `cluster_two` [source,console] -------------------------------------------------- GET /twitter,cluster_one:twitter,cluster_two:twitter/_search { "query": { "match": { "user": "kimchy" } } } -------------------------------------------------- // TEST[continued] The API returns the following response: [source,console-result] -------------------------------------------------- { "took": 150, "timed_out": false, "num_reduce_phases": 4, "_shards": { "total": 3, "successful": 3, "failed": 0, "skipped": 0 }, "_clusters": { "total": 3, "successful": 3, "skipped": 0 }, "hits": { "total" : { "value": 3, "relation": "eq" }, "max_score": 1, "hits": [ { "_index": "twitter", <1> "_type": "_doc", "_id": "0", "_score": 2, "_source": { "user": "kimchy", "date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch", "likes": 0 } }, { "_index": "cluster_one:twitter", <2> "_type": "_doc", "_id": "0", "_score": 1, "_source": { "user": "kimchy", "date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch", "likes": 0 } }, { "_index": "cluster_two:twitter", <3> "_type": "_doc", "_id": "0", "_score": 1, "_source": { "user": "kimchy", "date": "2009-11-15T14:12:12", "message": "trying out Elasticsearch", "likes": 0 } } ] } } -------------------------------------------------- // TESTRESPONSE[s/"took": 150/"took": "$body.took"/] // TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/] // TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/] // TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/] <1> This document's `_index` parameter doesn't include a cluster name. This means the document came from the local cluster. <2> This document came from `cluster_one`. <3> This document came from `cluster_two`. [float] [[skip-unavailable-clusters]] == Skip unavailable clusters By default, a {ccs} returns an error if *any* cluster in the request is unavailable. To skip an unavailable cluster during a {ccs}, set the <> cluster setting to `true`. The following <> API request changes `cluster_two`'s `skip_unavailable` setting to `true`. [source,console] -------------------------------- PUT _cluster/settings { "persistent": { "cluster.remote.cluster_two.skip_unavailable": true } } -------------------------------- // TEST[continued] If `cluster_two` is disconnected or unavailable during a {ccs}, {es} won't include matching documents from that cluster in the final results. [discrete] [[ccs-works]] == How {ccs} works include::./remote-clusters.asciidoc[tag=how-remote-clusters-work] [discrete] [[ccs-gateway-seed-nodes]] === Selecting gateway and seed nodes Gateway and seed nodes need to be accessible from the local cluster via your network. By default, any master-ineligible node can act as a gateway node. If wanted, you can define the gateway nodes for a cluster by setting `cluster.remote.node.attr.gateway` to `true`. For {ccs}, we recommend you use gateway nodes that are capable of serving as <> for search requests. If wanted, the seed nodes for a cluster can be a subset of these gateway nodes. [discrete] [[ccs-network-delays]] === How {ccs} handles network delays Because {ccs} involves sending requests to remote clusters, any network delays can impact search speed. To avoid slow searches, {ccs} offers two options for handling network delays: <>:: By default, {es} reduces the number of network roundtrips between remote clusters. This reduces the impact of network delays on search speed. However, {es} can't reduce network roundtrips for large search requests, such as those including a <> or <>. + See <> to learn how this option works. <>:: For search requests that include a scroll or inner hits, {es} sends multiple outgoing and ingoing requests to each remote cluster. You can also choose this option by setting the <> API's <> parameter to `false`. While typically slower, this approach may work well for networks with low latency. + See <> to learn how this option works. [float] [[ccs-min-roundtrips]] ==== Minimize network roundtrips Here's how {ccs} works when you minimize network roundtrips. . You send a {ccs} request to your local cluster. A coordinating node in that cluster receives and parses the request. + image:images/ccs/ccs-min-roundtrip-client-request.svg[] . The coordinating node sends a single search request to each cluster, including the local cluster. Each cluster performs the search request independently, applying its own cluster-level settings to the request. + image:images/ccs/ccs-min-roundtrip-cluster-search.svg[] . Each remote cluster sends its search results back to the coordinating node. + image:images/ccs/ccs-min-roundtrip-cluster-results.svg[] . After collecting results from each cluster, the coordinating node returns the final results in the {ccs} response. + image:images/ccs/ccs-min-roundtrip-client-response.svg[] [float] [[ccs-unmin-roundtrips]] ==== Don't minimize network roundtrips Here's how {ccs} works when you don't minimize network roundtrips. . You send a {ccs} request to your local cluster. A coordinating node in that cluster receives and parses the request. + image:images/ccs/ccs-min-roundtrip-client-request.svg[] . The coordinating node sends a <> API request to each remote cluster. + image:images/ccs/ccs-min-roundtrip-cluster-search.svg[] . Each remote cluster sends its response back to the coordinating node. This response contains information about the indices and shards the {ccs} request will be executed on. + image:images/ccs/ccs-min-roundtrip-cluster-results.svg[] . The coordinating node sends a search request to each shard, including those in its own cluster. Each shard performs the search request independently. + [WARNING] ==== When network roundtrips aren't minimized, the search is executed as if all data were in the coordinating node's cluster. We recommend updating cluster-level settings that limit searches, such as `action.search.shard_count.limit`, `pre_filter_shard_size`, and `max_concurrent_shard_requests`, to account for this. If these limits are too low, the search may be rejected. ==== + image:images/ccs/ccs-dont-min-roundtrip-shard-search.svg[] . Each shard sends its search results back to the coordinating node. + image:images/ccs/ccs-dont-min-roundtrip-shard-results.svg[] . After collecting results from each cluster, the coordinating node returns the final results in the {ccs} response. + image:images/ccs/ccs-min-roundtrip-client-response.svg[]