mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-14 00:45:30 +00:00
- added _version to response - exists call use -XHEAD with -i flag to include headers in the output
215 lines
7.3 KiB
Plaintext
215 lines
7.3 KiB
Plaintext
[[docs-get]]
|
|
== Get API
|
|
|
|
The get API allows to get a typed JSON document from the index based on
|
|
its id. The following example gets a JSON document from an index called
|
|
twitter, under a type called tweet, with id valued 1:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1'
|
|
--------------------------------------------------
|
|
|
|
The result of the above get operation is:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_index" : "twitter",
|
|
"_type" : "tweet",
|
|
"_id" : "1",
|
|
"_version" : 1,
|
|
"found": true,
|
|
"_source" : {
|
|
"user" : "kimchy",
|
|
"postDate" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
The above result includes the `_index`, `_type`, `_id` and `_version`
|
|
of the document we wish to retrieve, including the actual `_source`
|
|
of the document if it could be found (as indicated by the `found`
|
|
field in the response).
|
|
|
|
The API also allows to check for the existence of a document using
|
|
`HEAD`, for example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1'
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[realtime]]
|
|
=== Realtime
|
|
|
|
By default, the get API is realtime, and is not affected by the refresh
|
|
rate of the index (when data will become visible for search).
|
|
|
|
In order to disable realtime GET, one can either set `realtime`
|
|
parameter to `false`, or globally default it to by setting the
|
|
`action.get.realtime` to `false` in the node configuration.
|
|
|
|
When getting a document, one can specify `fields` to fetch from it. They
|
|
will, when possible, be fetched as stored fields (fields mapped as
|
|
stored in the mapping). When using realtime GET, there is no notion of
|
|
stored fields (at least for a period of time, basically, until the next
|
|
flush), so they will be extracted from the source itself (note, even if
|
|
source is not enabled). It is a good practice to assume that the fields
|
|
will be loaded from source when using realtime GET, even if the fields
|
|
are stored.
|
|
|
|
[float]
|
|
[[type]]
|
|
=== Optional Type
|
|
|
|
The get API allows for `_type` to be optional. Set it to `_all` in order
|
|
to fetch the first document matching the id across all types.
|
|
|
|
|
|
[float]
|
|
[[get-source-filtering]]
|
|
=== Source filtering
|
|
|
|
added[1.0.0.Beta1]
|
|
|
|
By default, the get operation returns the contents of the `_source` field unless
|
|
you have used the `fields` parameter or if the `_source` field is disabled.
|
|
You can turn off `_source` retrieval by using the `_source` parameter:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=false'
|
|
--------------------------------------------------
|
|
|
|
If you only need one or two fields from the complete `_source`, you can use the `_source_include`
|
|
& `_source_exclude` parameters to include or filter out that parts you need. This can be especially helpful
|
|
with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list
|
|
of fields or wildcard expressions. Example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source_include=*.id&_source_exclude=entities'
|
|
--------------------------------------------------
|
|
|
|
If you only want to specify includes, you can use a shorter notation:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=*.id,retweeted'
|
|
--------------------------------------------------
|
|
|
|
|
|
[float]
|
|
[[get-fields]]
|
|
=== Fields
|
|
|
|
The get operation allows specifying a set of stored fields that will be
|
|
returned by passing the `fields` parameter. For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1?fields=title,content'
|
|
--------------------------------------------------
|
|
|
|
For backward compatibility, if the requested fields are not stored, they will be fetched
|
|
from the `_source` (parsed and extracted). This functionality has been replaced by the
|
|
<<get-source-filtering,source filtering>> parameter.
|
|
|
|
Field values fetched from the document it self are always returned as an array. Metadata fields like `_routing` and
|
|
`_parent` fields are never returned as an array.
|
|
|
|
Also only leaf fields can be returned via the `field` option. So object fields can't be returned and such requests
|
|
will fail.
|
|
|
|
[float]
|
|
[[_source]]
|
|
=== Getting the _source directly
|
|
|
|
Use the `/{index}/{type}/{id}/_source` endpoint to get
|
|
just the `_source` field of the document,
|
|
without any additional content around it. For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1/_source'
|
|
--------------------------------------------------
|
|
|
|
You can also use the same source filtering parameters to control which parts of the `_source` will be returned:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1/_source?_source_include=*.id&_source_exclude=entities'
|
|
--------------------------------------------------
|
|
|
|
Note, there is also a HEAD variant for the _source endpoint to efficiently test for document existence.
|
|
Curl example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1/_source'
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
[[get-routing]]
|
|
=== Routing
|
|
|
|
When indexing using the ability to control the routing, in order to get
|
|
a document, the routing value should also be provided. For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
curl -XGET 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
|
|
--------------------------------------------------
|
|
|
|
The above will get a tweet with id 1, but will be routed based on the
|
|
user. Note, issuing a get without the correct routing, will cause the
|
|
document not to be fetched.
|
|
|
|
[float]
|
|
[[preference]]
|
|
=== Preference
|
|
|
|
Controls a `preference` of which shard replicas to execute the get
|
|
request on. By default, the operation is randomized between the shard
|
|
replicas.
|
|
|
|
The `preference` can be set to:
|
|
|
|
`_primary`::
|
|
The operation will go and be executed only on the primary
|
|
shards.
|
|
|
|
`_local`::
|
|
The operation will prefer to be executed on a local
|
|
allocated shard if possible.
|
|
|
|
Custom (string) value::
|
|
A custom value will be used to guarantee that
|
|
the same shards will be used for the same custom value. This can help
|
|
with "jumping values" when hitting different shards in different refresh
|
|
states. A sample value can be something like the web session id, or the
|
|
user name.
|
|
|
|
[float]
|
|
[[get-refresh]]
|
|
=== Refresh
|
|
|
|
The `refresh` parameter can be set to `true` in order to refresh the
|
|
relevant shard before the get operation and make it searchable. Setting
|
|
it to `true` should be done after careful thought and verification that
|
|
this does not cause a heavy load on the system (and slows down
|
|
indexing).
|
|
|
|
[float]
|
|
[[get-distributed]]
|
|
=== Distributed
|
|
|
|
The get operation gets hashed into a specific shard id. It then gets
|
|
redirected to one of the replicas within that shard id and returns the
|
|
result. The replicas are the primary shard and its replicas within that
|
|
shard id group. This means that the more replicas we will have, the
|
|
better GET scaling we will have.
|