OpenSearch/docs/reference/modules/cross-cluster-search.asciidoc

318 lines
8.6 KiB
Plaintext
Raw Normal View History

2017-01-05 10:10:34 -05:00
[[modules-cross-cluster-search]]
== {ccs-cap}
2017-01-05 10:10:34 -05:00
The _{ccs}_ feature allows any node to act as a federated client across
multiple clusters. A {ccs} node won't join the remote cluster, instead
it connects to a remote cluster in a light fashion in order to execute
federated search requests. For details on communication and compatibility
between different clusters, see <<modules-remote-clusters>>.
2017-01-05 10:10:34 -05:00
[float]
=== Using {ccs}
2017-01-05 10:10:34 -05:00
{ccs-cap} requires <<modules-remote-clusters,configuring remote clusters>>.
[source,js]
--------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {
"seeds": [
"127.0.0.1:9300"
]
},
"cluster_two": {
"seeds": [
"127.0.0.1:9301"
]
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
},
"cluster_three": {
"seeds": [
"127.0.0.1:9302"
]
}
}
}
}
}
--------------------------------
// CONSOLE
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
// TEST[setup:host]
// TEST[s/127.0.0.1:9300/\${transport_host}/]
To search the `twitter` index on remote cluster `cluster_one` the index name
must be prefixed with the alias of the remote cluster followed by the `:`
character:
2017-01-05 10:10:34 -05:00
[source,js]
--------------------------------------------------
GET /cluster_one:twitter/_search
2017-01-05 10:10:34 -05:00
{
"query": {
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"match": {
"user": "kimchy"
}
}
2017-01-05 10:10:34 -05:00
}
--------------------------------------------------
// CONSOLE
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
// TEST[continued]
// TEST[setup:twitter]
[source,js]
--------------------------------------------------
{
"took": 150,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0,
"skipped": 0
},
"_clusters": {
"total": 1,
"successful": 1,
"skipped": 0
},
"hits": {
"total" : {
"value": 1,
"relation": "eq"
},
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"max_score": 1,
"hits": [
{
"_index": "cluster_one:twitter",
"_type": "_doc",
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_id": "0",
"_score": 1,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
2017-01-05 10:10:34 -05:00
Indices with the same name on different clusters can also be searched:
2017-01-05 10:10:34 -05:00
[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search
2017-01-05 10:10:34 -05:00
{
"query": {
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"match": {
"user": "kimchy"
}
}
2017-01-05 10:10:34 -05:00
}
--------------------------------------------------
// CONSOLE
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
// TEST[continued]
2017-01-05 10:10:34 -05:00
Search results are disambiguated the same way as the indices are disambiguated in the request.
Indices with same names are treated as different indices when results are merged. All results
retrieved from an index located in a remote cluster are prefixed with their corresponding
cluster alias:
2017-01-05 10:10:34 -05:00
[source,js]
--------------------------------------------------
{
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"took": 150,
"timed_out": false,
"num_reduce_phases": 3,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_shards": {
"total": 2,
"successful": 2,
"failed": 0,
"skipped": 0
2017-01-05 10:10:34 -05:00
},
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_clusters": {
"total": 2,
"successful": 2,
"skipped": 0
},
"hits": {
"total" : {
"value": 2,
"relation": "eq"
},
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"max_score": 1,
"hits": [
2017-01-05 10:10:34 -05:00
{
"_index": "twitter",
"_type": "_doc",
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_id": "0",
"_score": 2,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
2017-01-05 10:10:34 -05:00
}
},
{
"_index": "cluster_one:twitter",
"_type": "_doc",
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_id": "0",
"_score": 1,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
2017-01-05 10:10:34 -05:00
}
}
]
}
}
--------------------------------------------------
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
[float]
=== Skipping disconnected clusters
By default, all remote clusters that are searched via {ccs} need to be
available when the search request is executed. Otherwise, the whole request
fails; even if some of the clusters are available, no search results are
returned. You can use the boolean `skip_unavailable` setting to make remote
clusters optional. By default, it is set to `false`.
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
[source,js]
--------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.remote.cluster_two.skip_unavailable": true <1>
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
}
}
--------------------------------
// CONSOLE
// TEST[continued]
<1> `cluster_two` is made optional
[source,js]
--------------------------------------------------
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
{
"query": {
"match": {
"user": "kimchy"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
[source,js]
--------------------------------------------------
{
"took": 150,
"timed_out": false,
"num_reduce_phases": 3,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_shards": {
"total": 2,
"successful": 2,
"failed": 0,
"skipped": 0
},
"_clusters": { <1>
"total": 3,
"successful": 2,
"skipped": 1
},
"hits": {
"total" : {
"value": 2,
"relation": "eq"
},
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"max_score": 1,
"hits": [
{
"_index": "twitter",
"_type": "_doc",
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_id": "0",
"_score": 2,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
},
{
"_index": "cluster_one:twitter",
"_type": "_doc",
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_id": "0",
"_score": 1,
Cross Cluster Search: make remote clusters optional (#27182) Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory. Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } Such section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 05:41:47 -05:00
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
<1> The `clusters` section indicates that one cluster was unavailable and got skipped
[float]
[[ccs-reduction]]
=== {ccs-cap} reduction phase
{ccs-cap} (CCS) requests can be executed in two ways:
- the CCS coordinating node minimizes network round-trips by sending one search
request to each cluster. Each cluster performs the search independently,
reducing and fetching results. Once the CCS node has received all the
responses, it performs another reduction and returns the relevant results back
to the user. This strategy is beneficial when there is network latency between
the CCS coordinating node and the remote clusters involved, which is typically
the case. A single request is sent to each remote cluster, at the cost of
retrieving `from` + `size` already fetched results. This is the default
strategy, used whenever possible. In case a scroll is provided, or inner hits
are requested as part of field collapsing, this strategy is not supported hence
network round-trips cannot be minimized and the following strategy is used
instead.
- the CCS coordinating node sends a <<search-shards,search shards>> request to
each remote cluster, in order to collect information about their corresponding
remote indices involved in the search request and the shards where their data
is located. Once each cluster has responded to such request, the search
executes as if all shards were part of the same cluster. The coordinating node
sends one request to each shard involved, each shard executes the query and
returns its own results which are then reduced (and fetched, depending on the
<<search-request-search-type, search type>>) by the CCS coordinating node.
This strategy may be beneficial whenever there is very low network latency
between the CCS coordinating node and the remote clusters involved, as it
treats all shards the same, at the cost of sending many requests to each remote
cluster, which is problematic in presence of network latency.
The <<search-request-body, search API>> supports the `ccs_minimize_roundtrips`
parameter, which defaults to `true` and can be set to `false` in case
minimizing network round-trips is not desirable.