diff --git a/docs/reference/docs/get.asciidoc b/docs/reference/docs/get.asciidoc index 3e9f1dc053e..14f37770fa5 100644 --- a/docs/reference/docs/get.asciidoc +++ b/docs/reference/docs/get.asciidoc @@ -1,9 +1,235 @@ [[docs-get]] === Get API +++++ +Get +++++ -The get API allows to get a JSON document from the index based on -its id. The following example gets a JSON document from an index called -twitter with id valued 0: +Retrieves the specified JSON document from an index. + +[[docs-get-api-request]] +==== {api-request-title} + +`GET <index>/_doc/<_id>` + +`HEAD <index>/_doc/<_id>` + +`GET <index>/_source/<_id>` + +`HEAD <index>/_source/<_id>` + +[[docs-get-api-desc]] +==== {api-description-title} +You use GET to retrieve a document and its source or stored fields from a +particular index. Use HEAD to verify that a document exists. You can +use the `_source` resource to retrieve just the document source or verify +that it exists. + +[float] +[[realtime]] +===== Realtime + +By default, the get API is realtime, and is not affected by the refresh +rate of the index (when data will become visible for search). If a document +has been updated but is not yet refreshed, the get API will issue a refresh +call in-place to make the document visible. This will also make other documents +changed since the last refresh visible. In order to disable realtime GET, +one can set the `realtime` parameter to `false`. + +[float] +[[get-source-filtering]] +===== Source filtering + +By default, the get operation returns the contents of the `_source` field unless +you have used the `stored_fields` parameter or if the `_source` field is disabled.
+You can turn off `_source` retrieval by using the `_source` parameter: + +[source,js] +-------------------------------------------------- +GET twitter/_doc/0?_source=false +-------------------------------------------------- +// CONSOLE +// TEST[setup:twitter] + +If you only need one or two fields from the `_source`, use the `_source_includes` +or `_source_excludes` parameters to include or filter out particular fields. +This can be especially helpful with large documents where partial retrieval can +save on network overhead. Both parameters take a comma separated list +of fields or wildcard expressions. Example: + +[source,js] +-------------------------------------------------- +GET twitter/_doc/0?_source_includes=*.id&_source_excludes=entities +-------------------------------------------------- +// CONSOLE +// TEST[setup:twitter] + +If you only want to specify includes, you can use a shorter notation: + +[source,js] +-------------------------------------------------- +GET twitter/_doc/0?_source=*.id,retweeted +-------------------------------------------------- +// CONSOLE +// TEST[setup:twitter] + +[float] +[[get-routing]] +===== Routing + +If routing is used during indexing, the routing value also needs to be +specified to retrieve a document. For example: + +[source,js] +-------------------------------------------------- +GET twitter/_doc/2?routing=user1 +-------------------------------------------------- +// CONSOLE +// TEST[continued] + +This request gets the tweet with id `2`, but it is routed based on the +user. The document is not fetched if the correct routing is not specified. + +[float] +[[preference]] +===== Preference + +Controls a `preference` of which shard replicas to execute the get +request on. By default, the operation is randomized between the shard +replicas. + +The `preference` can be set to: + +`_local`:: + The operation will prefer to be executed on a local + allocated shard if possible. 
+ +Custom (string) value:: + A custom value will be used to guarantee that + the same shards will be used for the same custom value. This can help + with "jumping values" when hitting different shards in different refresh + states. A sample value can be something like the web session id, or the + user name. + +[float] +[[get-refresh]] +===== Refresh + +The `refresh` parameter can be set to `true` in order to refresh the +relevant shard before the get operation and make it searchable. Setting +it to `true` should be done after careful thought and verification that +this does not cause a heavy load on the system (and slows down +indexing). + +[float] +[[get-distributed]] +===== Distributed + +The get operation gets hashed into a specific shard id. It then gets +redirected to one of the replicas within that shard id and returns the +result. The replicas are the primary shard and its replicas within that +shard id group. This means that the more replicas we have, the +better GET scaling we will have. + +[float] +[[get-versioning]] +===== Versioning support + +You can use the `version` parameter to retrieve the document only if +its current version is equal to the specified one. This behavior is the same +for all version types with the exception of version type `FORCE` which always +retrieves the document. Note that `FORCE` version type is deprecated. + +Internally, Elasticsearch has marked the old document as deleted and added an +entirely new document. The old version of the document doesn’t disappear +immediately, although you won’t be able to access it. Elasticsearch cleans up +deleted documents in the background as you continue to index more data. + +[[docs-get-api-path-params]] +==== {api-path-parms-title} + +`<index>`:: +(Required, string) Name of the index that contains the document. + +`<_id>`:: +(Required, string) Unique identifier of the document.
+ +[[docs-get-api-query-params]] +==== {api-query-parms-title} + +`preference`:: +(Optional, string) Specify the node or shard the operation should +be performed on (default: random). + +`realtime`:: +(Optional, boolean) Set to `false` to disable real time GET +(default: `true`). See <>. + +include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-refresh] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-routing] + +`stored_fields`:: +(Optional, boolean) Set to `true` to retrieve the document fields stored in the +index rather than the document `_source` (default: `false`). + +`_source`:: +(Optional, list) Set to `false` to disable source retrieval (default: `true`). + You can also specify a comma-separated list of the fields +you want to retrieve. + +`_source_excludes`:: +(Optional, list) Specify the source fields you want to exclude. + +`_source_includes`:: +(Optional, list) Specify the source fields you want to retrieve. + +include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-version] + +include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-version-type] + +[[docs-get-api-response-body]] +==== {api-response-body-title} + +`_index`:: +The name of the index the document belongs to. + +`_type`:: +The document type. {es} indices now support a single document type, `_doc`. + +`_id`:: +The unique identifier for the document. + +`_version`:: +The document version. Incremented each time the document is updated. + +`_seq_no`:: +The sequence number assigned to the document for the indexing +operation. Sequence numbers are used to ensure an older version of a document +doesn’t overwrite a newer version. See <>. + +`_primary_term`:: +The primary term assigned to the document for the indexing operation. +See <>. + +`found`:: +Indicates whether the document exists: `true` or `false`. + +`_routing`:: +The explicit routing, if set. + +`_source`:: +If `found` is `true`, contains the document data formatted in JSON.
+Excluded if the `_source` parameter is set to `false` or the `stored_fields` +parameter is set to `true`. + +`_fields`:: +If the `stored_fields` parameter is set to `true` and `found` is +`true`, contains the document fields stored in the index. + +[[docs-get-api-example]] +==== {api-examples-title} + +Retrieve the JSON document with the `_id` 0 from the `twitter` index: [source,js] -------------------------------------------------- @@ -12,7 +238,7 @@ GET twitter/_doc/0 // CONSOLE // TEST[setup:twitter] -The result of the above get operation is: +The API returns the following result: [source,js] -------------------------------------------------- @@ -34,13 +260,7 @@ The result of the above get operation is: -------------------------------------------------- // TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] -The above result includes the `_index`, `_id`, and `_version` -of the document we wish to retrieve, including the actual `_source` -of the document if it could be found (as indicated by the `found` -field in the response). - -The API also allows to check for the existence of a document using -`HEAD`, for example: +Check to see if a document with the `_id` 0 exists: [source,js] -------------------------------------------------- @@ -49,60 +269,50 @@ HEAD twitter/_doc/0 // CONSOLE // TEST[setup:twitter] -[float] -[[realtime]] -==== Realtime - -By default, the get API is realtime, and is not affected by the refresh -rate of the index (when data will become visible for search). If a document -has been updated but is not yet refreshed, the get API will issue a refresh -call in-place to make the document visible. This will also make other documents -changed since the last refresh visible. In order to disable realtime GET, -one can set the `realtime` parameter to `false`. +{es} returns a status code of `200 - OK` if the document exists, or +`404 - Not Found` if it doesn't.
[float] -[[get-source-filtering]] -==== Source filtering +[[_source]] +===== Get the source field only -By default, the get operation returns the contents of the `_source` field unless -you have used the `stored_fields` parameter or if the `_source` field is disabled. -You can turn off `_source` retrieval by using the `_source` parameter: +Use the `<index>/_source/<id>` resource to get +just the `_source` field of a document. For example: [source,js] -------------------------------------------------- -GET twitter/_doc/0?_source=false +GET twitter/_source/1 -------------------------------------------------- // CONSOLE -// TEST[setup:twitter] +// TEST[continued] -If you only need one or two fields from the complete `_source`, you can use the `_source_includes` -and `_source_excludes` parameters to include or filter out the parts you need. This can be especially helpful -with large documents where partial retrieval can save on network overhead. Both parameters take a comma separated list -of fields or wildcard expressions. Example: +You can use the source filtering parameters to control which parts of the +`_source` are returned: [source,js] -------------------------------------------------- -GET twitter/_doc/0?_source_includes=*.id&_source_excludes=entities +GET twitter/_source/1/?_source_includes=*.id&_source_excludes=entities -------------------------------------------------- // CONSOLE -// TEST[setup:twitter] +// TEST[continued] -If you only want to specify includes, you can use a shorter notation: +You can use HEAD with the `_source` endpoint to efficiently +test whether or not the document `_source` exists. A document's source is not +available if it is disabled in the <>.
[source,js] -------------------------------------------------- -GET twitter/_doc/0?_source=*.id,retweeted +HEAD twitter/_source/1 -------------------------------------------------- // CONSOLE -// TEST[setup:twitter] +// TEST[continued] [float] [[get-stored-fields]] -==== Stored Fields +===== Get stored fields -The get operation allows specifying a set of stored fields that will be -returned by passing the `stored_fields` parameter. -If the requested fields are not stored, they will be ignored. +Use the `stored_fields` parameter to specify the set of stored fields you want +to retrieve. Any requested fields that are not stored are ignored. Consider for instance the following mapping: [source,js] @@ -147,7 +357,7 @@ GET twitter/_doc/1?stored_fields=tags,counter // CONSOLE // TEST[continued] -The result of the above get operation is: +The API returns the following result: [source,js] -------------------------------------------------- @@ -168,11 +378,10 @@ The result of the above get operation is: -------------------------------------------------- // TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] - Field values fetched from the document itself are always returned as an array. -Since the `counter` field is not stored the get request simply ignores it when trying to get the `stored_fields.` +Since the `counter` field is not stored, the get request ignores it. 
-It is also possible to retrieve metadata fields like the `_routing` field: +You can also retrieve metadata fields like the `_routing` field: [source,js] -------------------------------------------------- @@ -192,7 +401,7 @@ GET twitter/_doc/2?routing=user1&stored_fields=tags,counter // CONSOLE // TEST[continued] -The result of the above get operation is: +The API returns the following result: [source,js] -------------------------------------------------- @@ -214,113 +423,5 @@ The result of the above get operation is: -------------------------------------------------- // TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] -Also only leaf fields can be returned via the `stored_field` option. So object fields can't be returned and such requests -will fail. - -[float] -[[_source]] -==== Getting the +_source+ directly - -Use the `/{index}/_source/{id}` endpoint to get -just the `_source` field of the document, -without any additional content around it. For example: - -[source,js] --------------------------------------------------- -GET twitter/_source/1 --------------------------------------------------- -// CONSOLE -// TEST[continued] - -You can also use the same source filtering parameters to control which parts of the `_source` will be returned: - -[source,js] --------------------------------------------------- -GET twitter/_source/1/?_source_includes=*.id&_source_excludes=entities --------------------------------------------------- -// CONSOLE -// TEST[continued] - -Note, there is also a HEAD variant for the _source endpoint to efficiently test for document _source existence. -An existing document will not have a _source if it is disabled in the <>. 
- -[source,js] --------------------------------------------------- -HEAD twitter/_source/1 --------------------------------------------------- -// CONSOLE -// TEST[continued] - -[float] -[[get-routing]] -==== Routing - -When indexing using the ability to control the routing, in order to get -a document, the routing value should also be provided. For example: - -[source,js] --------------------------------------------------- -GET twitter/_doc/2?routing=user1 --------------------------------------------------- -// CONSOLE -// TEST[continued] - -The above will get a tweet with id `2`, but will be routed based on the -user. Note that issuing a get without the correct routing will cause the -document not to be fetched. - -[float] -[[preference]] -==== Preference - -Controls a `preference` of which shard replicas to execute the get -request on. By default, the operation is randomized between the shard -replicas. - -The `preference` can be set to: - -`_local`:: - The operation will prefer to be executed on a local - allocated shard if possible. - -Custom (string) value:: - A custom value will be used to guarantee that - the same shards will be used for the same custom value. This can help - with "jumping values" when hitting different shards in different refresh - states. A sample value can be something like the web session id, or the - user name. - -[float] -[[get-refresh]] -==== Refresh - -The `refresh` parameter can be set to `true` in order to refresh the -relevant shard before the get operation and make it searchable. Setting -it to `true` should be done after careful thought and verification that -this does not cause a heavy load on the system (and slows down -indexing). - -[float] -[[get-distributed]] -==== Distributed - -The get operation gets hashed into a specific shard id. It then gets -redirected to one of the replicas within that shard id and returns the -result. The replicas are the primary shard and its replicas within that -shard id group. 
This means that the more replicas we have, the -better GET scaling we will have. - - -[float] -[[get-versioning]] -==== Versioning support - -You can use the `version` parameter to retrieve the document only if -its current version is equal to the specified one. This behavior is the same -for all version types with the exception of version type `FORCE` which always -retrieves the document. Note that `FORCE` version type is deprecated. - -Internally, Elasticsearch has marked the old document as deleted and added an -entirely new document. The old version of the document doesn’t disappear -immediately, although you won’t be able to access it. Elasticsearch cleans up -deleted documents in the background as you continue to index more data. +Only leaf fields can be retrieved with the `stored_fields` option. Object fields +can't be returned; if specified, the request fails.
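
Reviewer note: the query parameters documented in this change (`_source`, `_source_includes`, `_source_excludes`, `routing`, `stored_fields`) combine into a request line in a regular way. The following is a minimal sketch of that composition, using only the Python standard library. The helper name `build_get_path` and its `source_only` flag are hypothetical and are not part of any Elasticsearch client; the sketch only builds the path and query string and does not contact a cluster.

```python
from urllib.parse import quote, urlencode


def build_get_path(index, doc_id, *, source_only=False, **params):
    """Build the path and query string for a get request.

    Hypothetical helper for illustration only. `source_only` selects the
    `_source` resource instead of `_doc`; every other keyword argument is
    passed through as a query parameter.
    """
    resource = "_source" if source_only else "_doc"
    path = f"{quote(str(index), safe='')}/{resource}/{quote(str(doc_id), safe='')}"

    query = {}
    for name, value in params.items():
        if value is None:
            continue  # skip parameters that were not set
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase booleans in the URL
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # list parameters are comma separated
        query[name] = value

    if query:
        # Keep "*" and "," readable so wildcard expressions match the docs.
        path += "?" + urlencode(query, safe="*,")
    return path


if __name__ == "__main__":
    print("GET", build_get_path("twitter", 0, _source=False))
    print("GET", build_get_path("twitter", 0,
                                _source_includes=["*.id"],
                                _source_excludes=["entities"]))
    print("GET", build_get_path("twitter", 2, routing="user1",
                                stored_fields=["tags", "counter"]))
    print("HEAD", build_get_path("twitter", 1, source_only=True))
```

The printed request lines should match the console examples in the documentation, which can serve as a quick consistency check when the parameter descriptions and the examples are edited separately.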