OpenSearch/docs/reference/modules/cross-cluster-search.asciidoc
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00

276 lines
6.5 KiB
Plaintext

[[modules-cross-cluster-search]]
== Cross Cluster Search
The _cross cluster search_ feature allows any node to act as a federated client across
multiple clusters. A cross cluster search node won't join the remote cluster, instead
it connects to a remote cluster in a light fashion in order to execute
federated search requests.
[float]
=== Using cross cluster search
Cross-cluster search requires <<modules-remote-clusters,configuring remote clusters>>.
[source,js]
--------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster": {
"remote": {
"cluster_one": {
"seeds": [
"127.0.0.1:9300"
]
},
"cluster_two": {
"seeds": [
"127.0.0.1:9301"
]
},
"cluster_three": {
"seeds": [
"127.0.0.1:9302"
]
}
}
}
}
}
--------------------------------
// CONSOLE
// TEST[setup:host]
// TEST[s/127.0.0.1:9300/\${transport_host}/]
To search the `twitter` index on remote cluster `cluster_one` the index name
must be prefixed with the cluster alias separated by a `:` character:
[source,js]
--------------------------------------------------
GET /cluster_one:twitter/_search
{
"query": {
"match": {
"user": "kimchy"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[setup:twitter]
[source,js]
--------------------------------------------------
{
"took": 150,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0,
"skipped": 0
},
"_clusters": {
"total": 1,
"successful": 1,
"skipped": 0
},
"hits": {
"total" : {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "cluster_one:twitter",
"_type": "_doc",
"_id": "0",
"_score": 1,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
Indices can also be searched with the same name on different clusters:
[source,js]
--------------------------------------------------
GET /cluster_one:twitter,twitter/_search
{
"query": {
"match": {
"user": "kimchy"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
Search results are disambiguated the same way as the indices are disambiguated in the request. Even if index names are
identical these indices will be treated as different indices when results are merged. All results retrieved from a
remote index
will be prefixed with their remote cluster name:
[source,js]
--------------------------------------------------
{
"took": 150,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0,
"skipped": 0
},
"_clusters": {
"total": 2,
"successful": 2,
"skipped": 0
},
"hits": {
"total" : {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "cluster_one:twitter",
"_type": "_doc",
"_id": "0",
"_score": 1,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "0",
"_score": 2,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
[float]
=== Skipping disconnected clusters
By default all remote clusters that are searched via Cross Cluster Search need to be available when
the search request is executed, otherwise the whole request fails and no search results are returned
despite some of the clusters are available. Remote clusters can be made optional through the
boolean `skip_unavailable` setting, set to `false` by default.
[source,js]
--------------------------------
PUT _cluster/settings
{
"persistent": {
"cluster.remote.cluster_two.skip_unavailable": true <1>
}
}
--------------------------------
// CONSOLE
// TEST[continued]
<1> `cluster_two` is made optional
[source,js]
--------------------------------------------------
GET /cluster_one:twitter,cluster_two:twitter,twitter/_search <1>
{
"query": {
"match": {
"user": "kimchy"
}
}
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> Search against the `twitter` index in `cluster_one`, `cluster_two` and also locally
[source,js]
--------------------------------------------------
{
"took": 150,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0,
"skipped": 0
},
"_clusters": { <1>
"total": 3,
"successful": 2,
"skipped": 1
},
"hits": {
"total" : {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "cluster_one:twitter",
"_type": "_doc",
"_id": "0",
"_score": 1,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "0",
"_score": 2,
"_source": {
"user": "kimchy",
"date": "2009-11-15T14:12:12",
"message": "trying out Elasticsearch",
"likes": 0
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 150/"took": "$body.took"/]
// TESTRESPONSE[s/"max_score": 1/"max_score": "$body.hits.max_score"/]
// TESTRESPONSE[s/"_score": 1/"_score": "$body.hits.hits.0._score"/]
// TESTRESPONSE[s/"_score": 2/"_score": "$body.hits.hits.1._score"/]
<1> The `clusters` section indicates that one cluster was unavailable and got skipped