mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-05-05 16:07:35 +00:00
327 lines
9.5 KiB
Plaintext
327 lines
9.5 KiB
Plaintext
[[docs-get]]
|
||
== Get API
|
||
|
||
The get API allows to get a JSON document from the index based on
|
||
its id. The following example gets a JSON document from an index called
|
||
twitter with id valued 0:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/0
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[setup:twitter]
|
||
|
||
The result of the above get operation is:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
{
|
||
"_index" : "twitter",
|
||
"_type" : "_doc",
|
||
"_id" : "0",
|
||
"_version" : 1,
|
||
"_seq_no" : 10,
|
||
"_primary_term" : 1,
|
||
"found": true,
|
||
"_source" : {
|
||
"user" : "kimchy",
|
||
"date" : "2009-11-15T14:12:12",
|
||
"likes": 0,
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
|
||
|
||
The above result includes the `_index`, `_id`, and `_version`
|
||
of the document we wish to retrieve, including the actual `_source`
|
||
of the document if it could be found (as indicated by the `found`
|
||
field in the response).
|
||
|
||
The API also allows to check for the existence of a document using
|
||
`HEAD`, for example:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
HEAD twitter/_doc/0
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[setup:twitter]
|
||
|
||
[float]
|
||
[[realtime]]
|
||
=== Realtime
|
||
|
||
By default, the get API is realtime, and is not affected by the refresh
|
||
rate of the index (when data will become visible for search). If a document
|
||
has been updated but is not yet refreshed, the get API will issue a refresh
|
||
call in-place to make the document visible. This will also make other documents
|
||
changed since the last refresh visible. In order to disable realtime GET,
|
||
one can set the `realtime` parameter to `false`.
|
||
|
||
[float]
|
||
[[get-source-filtering]]
|
||
=== Source filtering
|
||
|
||
By default, the get operation returns the contents of the `_source` field unless
|
||
you have used the `stored_fields` parameter or if the `_source` field is disabled.
|
||
You can turn off `_source` retrieval by using the `_source` parameter:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/0?_source=false
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[setup:twitter]
|
||
|
||
If you only need one or two fields from the complete `_source`, you can use the `_source_includes`
|
||
and `_source_excludes` parameters to include or filter out the parts you need. This can be especially helpful
|
||
with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list
|
||
of fields or wildcard expressions. Example:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/0?_source_includes=*.id&_source_excludes=entities
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[setup:twitter]
|
||
|
||
If you only want to specify includes, you can use a shorter notation:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/0?_source=*.id,retweeted
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[setup:twitter]
|
||
|
||
[float]
|
||
[[get-stored-fields]]
|
||
=== Stored Fields
|
||
|
||
The get operation allows specifying a set of stored fields that will be
|
||
returned by passing the `stored_fields` parameter.
|
||
If the requested fields are not stored, they will be ignored.
|
||
Consider for instance the following mapping:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
PUT twitter
|
||
{
|
||
"mappings": {
|
||
"properties": {
|
||
"counter": {
|
||
"type": "integer",
|
||
"store": false
|
||
},
|
||
"tags": {
|
||
"type": "keyword",
|
||
"store": true
|
||
}
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
|
||
Now we can add a document:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/1
|
||
{
|
||
"counter" : 1,
|
||
"tags" : ["red"]
|
||
}
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
And then try to retrieve it:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/1?stored_fields=tags,counter
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
The result of the above get operation is:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
{
|
||
"_index": "twitter",
|
||
"_type": "_doc",
|
||
"_id": "1",
|
||
"_version": 1,
|
||
"_seq_no" : 22,
|
||
"_primary_term" : 1,
|
||
"found": true,
|
||
"fields": {
|
||
"tags": [
|
||
"red"
|
||
]
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
|
||
|
||
|
||
Field values fetched from the document itself are always returned as an array.
|
||
Since the `counter` field is not stored the get request simply ignores it when trying to get the `stored_fields.`
|
||
|
||
It is also possible to retrieve metadata fields like the `_routing` field:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/2?routing=user1
|
||
{
|
||
"counter" : 1,
|
||
"tags" : ["white"]
|
||
}
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/2?routing=user1&stored_fields=tags,counter
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
The result of the above get operation is:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
{
|
||
"_index": "twitter",
|
||
"_type": "_doc",
|
||
"_id": "2",
|
||
"_version": 1,
|
||
"_seq_no" : 13,
|
||
"_primary_term" : 1,
|
||
"_routing": "user1",
|
||
"found": true,
|
||
"fields": {
|
||
"tags": [
|
||
"white"
|
||
]
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
|
||
|
||
Also only leaf fields can be returned via the `stored_field` option. So object fields can't be returned and such requests
|
||
will fail.
|
||
|
||
[float]
|
||
[[_source]]
|
||
=== Getting the +_source+ directly
|
||
|
||
Use the `/{index}/_source/{id}` endpoint to get
|
||
just the `_source` field of the document,
|
||
without any additional content around it. For example:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_source/1
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
You can also use the same source filtering parameters to control which parts of the `_source` will be returned:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_source/1/?_source_includes=*.id&_source_excludes=entities
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
Note, there is also a HEAD variant for the _source endpoint to efficiently test for document _source existence.
|
||
An existing document will not have a _source if it is disabled in the <<mapping-source-field,mapping>>.
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
HEAD twitter/_source/1
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
[float]
|
||
[[get-routing]]
|
||
=== Routing
|
||
|
||
When indexing using the ability to control the routing, in order to get
|
||
a document, the routing value should also be provided. For example:
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
GET twitter/_doc/2?routing=user1
|
||
--------------------------------------------------
|
||
// CONSOLE
|
||
// TEST[continued]
|
||
|
||
The above will get a tweet with id `2`, but will be routed based on the
|
||
user. Note that issuing a get without the correct routing will cause the
|
||
document not to be fetched.
|
||
|
||
[float]
|
||
[[preference]]
|
||
=== Preference
|
||
|
||
Controls a `preference` of which shard replicas to execute the get
|
||
request on. By default, the operation is randomized between the shard
|
||
replicas.
|
||
|
||
The `preference` can be set to:
|
||
|
||
`_local`::
|
||
The operation will prefer to be executed on a local
|
||
allocated shard if possible.
|
||
|
||
Custom (string) value::
|
||
A custom value will be used to guarantee that
|
||
the same shards will be used for the same custom value. This can help
|
||
with "jumping values" when hitting different shards in different refresh
|
||
states. A sample value can be something like the web session id, or the
|
||
user name.
|
||
|
||
[float]
|
||
[[get-refresh]]
|
||
=== Refresh
|
||
|
||
The `refresh` parameter can be set to `true` in order to refresh the
|
||
relevant shard before the get operation and make it searchable. Setting
|
||
it to `true` should be done after careful thought and verification that
|
||
this does not cause a heavy load on the system (and slows down
|
||
indexing).
|
||
|
||
[float]
|
||
[[get-distributed]]
|
||
=== Distributed
|
||
|
||
The get operation gets hashed into a specific shard id. It then gets
|
||
redirected to one of the replicas within that shard id and returns the
|
||
result. The replicas are the primary shard and its replicas within that
|
||
shard id group. This means that the more replicas we have, the
|
||
better GET scaling we will have.
|
||
|
||
|
||
[float]
|
||
[[get-versioning]]
|
||
=== Versioning support
|
||
|
||
You can use the `version` parameter to retrieve the document only if
|
||
its current version is equal to the specified one. This behavior is the same
|
||
for all version types with the exception of version type `FORCE` which always
|
||
retrieves the document. Note that `FORCE` version type is deprecated.
|
||
|
||
Internally, Elasticsearch has marked the old document as deleted and added an
|
||
entirely new document. The old version of the document doesn’t disappear
|
||
immediately, although you won’t be able to access it. Elasticsearch cleans up
|
||
deleted documents in the background as you continue to index more data.
|