mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-25 17:38:44 +00:00
[DOCS] Backporting API ref reformatting for document APIs (#47631)
* [DOCS] Reformats bulk API. (#47479)
* Reformats bulk API.
* Update docs/reference/docs/bulk.asciidoc
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Reformats mget API (#47477)
* Reformats mget API
* Update docs/reference/docs/get.asciidoc
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Incorporated feedback.
* Reformats reindex API (#47483)
* Reformats reindex API
* Incorporated review feedback.
* Reformats term vectors APIs (#47484)
* Reformat termvectors APIs
* Reformats mtermvectors
* Apply suggestions from code review
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Incorporated review feedback.
parent ffacfc642c
commit 41c04ef39c
@ -1,28 +1,37 @@
|
||||
[[docs-bulk]]
|
||||
=== Bulk API
|
||||
++++
|
||||
<titleabbrev>Bulk</titleabbrev>
|
||||
++++
|
||||
|
||||
The bulk API makes it possible to perform many index/delete operations
|
||||
in a single API call. This can greatly increase the indexing speed.
|
||||
Performs multiple indexing or delete operations in a single API call.
|
||||
This reduces overhead and can greatly increase indexing speed.
|
||||
|
||||
.Client support for bulk requests
|
||||
*********************************************
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
POST _bulk
|
||||
{ "index" : { "_index" : "test", "_id" : "1" } }
|
||||
{ "field1" : "value1" }
|
||||
{ "delete" : { "_index" : "test", "_id" : "2" } }
|
||||
{ "create" : { "_index" : "test", "_id" : "3" } }
|
||||
{ "field1" : "value3" }
|
||||
{ "update" : {"_id" : "1", "_index" : "test"} }
|
||||
{ "doc" : {"field2" : "value2"} }
|
||||
--------------------------------------------------
|
||||
|
||||
Some of the officially supported clients provide helpers to assist with
|
||||
bulk requests and reindexing of documents from one index to another:
|
||||
[[docs-bulk-api-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
Perl::
|
||||
`POST /_bulk`
|
||||
|
||||
See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
|
||||
and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
|
||||
`POST /<index>/_bulk`
|
||||
|
||||
Python::
|
||||
[[docs-bulk-api-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
|
||||
Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
|
||||
|
||||
*********************************************
|
||||
|
||||
The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON
|
||||
(NDJSON) structure:
|
||||
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
@ -36,19 +45,70 @@ optional_source\n
|
||||
--------------------------------------------------
|
||||
// NOTCONSOLE
|
||||
|
||||
*NOTE*: The final line of data must end with a newline character `\n`. Each newline character
|
||||
may be preceded by a carriage return `\r`. When sending requests to this endpoint the
|
||||
`Content-Type` header should be set to `application/x-ndjson`.
|
||||
The `index` and `create` actions expect a source on the next line,
|
||||
and have the same semantics as the `op_type` parameter in the standard index API:
|
||||
`create` fails if a document with the same ID already exists in the index,
|
||||
`index` adds or replaces a document as necessary.
|
||||
|
||||
The possible actions are `index`, `create`, `delete`, and `update`.
|
||||
`index` and `create` expect a source on the next
|
||||
line, and have the same semantics as the `op_type` parameter to the
|
||||
standard index API (i.e. create will fail if a document with the same
|
||||
ID exists already, whereas index will add or replace a
|
||||
document as necessary). `delete` does not expect a source on the
|
||||
following line, and has the same semantics as the standard delete API.
|
||||
`update` expects that the partial doc, upsert and script and its options
|
||||
are specified on the next line.
|
||||
`update` expects that the partial doc, upsert,
|
||||
and script and its options are specified on the next line.
|
||||
|
||||
`delete` does not expect a source on the next line and
|
||||
has the same semantics as the standard delete API.
|
||||
|
||||
[NOTE]
|
||||
====
|
||||
The final line of data must end with a newline character `\n`.
|
||||
Each newline character may be preceded by a carriage return `\r`.
|
||||
When sending requests to the `_bulk` endpoint,
|
||||
the `Content-Type` header should be set to `application/x-ndjson`.
|
||||
====
|
||||
|
||||
Because this format uses literal `\n`'s as delimiters,
|
||||
make sure that the JSON actions and sources are not pretty printed.
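
The formatting rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the bulk API itself: the helper name `build_bulk_body` and the `(action, metadata, source)` tuple layout are assumptions made for this example, and no request is actually sent.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` is None for actions that take no source line, such as `delete`.
    This helper is hypothetical and only illustrates the wire format.
    """
    lines = []
    for action, metadata, source in operations:
        # Action and metadata line: compact JSON on a single line,
        # never pretty printed.
        lines.append(json.dumps({action: metadata}, separators=(",", ":")))
        if source is not None:
            # Optional source line, also compact.
            lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the final one, ends with a newline character.
    return "".join(line + "\n" for line in lines)


body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])

# Header to send along with the body when posting it to the `_bulk` endpoint.
headers = {"Content-Type": "application/x-ndjson"}
```

Because `json.dumps` escapes newline characters inside string values, each action and source stays on exactly one line.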
|
||||
|
||||
If you specify an index in the request URI,
|
||||
it is used for any actions that don't explicitly specify an index.
|
||||
|
||||
A note on the format: The idea here is to make processing of this as
|
||||
fast as possible. As some of the actions are redirected to other
|
||||
shards on other nodes, only `action_meta_data` is parsed on the
|
||||
receiving node side.
|
||||
|
||||
Client libraries using this protocol should strive to do
|
||||
something similar on the client side, and reduce buffering as much as
|
||||
possible.
|
||||
|
||||
The response to a bulk action is a large JSON structure with
|
||||
the individual results of each action performed,
|
||||
in the same order as the actions that appeared in the request.
|
||||
The failure of a single action does not affect the remaining actions.
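
Since one failed action does not stop the others, a client usually has to inspect each result individually. The following Python sketch shows one way to do that. It assumes the response carries an `items` array whose entries are keyed by action name and contain an `error` object on failure; the helper name `failed_items` and the sample response are hand-written for illustration and were not captured from a cluster.

```python
def failed_items(bulk_response):
    """Return (position, action, result) for every action that failed.

    `position` is the zero-based index of the action in the request,
    which works because results come back in request order.
    """
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is an object with a single key naming the action.
        for action, result in item.items():
            if "error" in result:
                failures.append((position, action, result))
    return failures


# Hand-written sample response (an assumption for this sketch).
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_id": "1", "status": 201}},
        {"delete": {"_index": "test", "_id": "2", "status": 404}},
        {"create": {"_index": "test", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}

failures = failed_items(sample_response)
```

A caller could retry only the actions at the returned positions instead of resending the whole request.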
|
||||
|
||||
There is no "correct" number of actions to perform in a single bulk request.
|
||||
Experiment with different settings to find the optimal size for your particular workload.
|
||||
|
||||
When using the HTTP API, make sure that the client does not send HTTP chunks,
|
||||
as this will slow things down.
|
||||
|
||||
[float]
|
||||
[[bulk-clients]]
|
||||
===== Client support for bulk requests
|
||||
|
||||
Some of the officially supported clients provide helpers to assist with
|
||||
bulk requests and reindexing of documents from one index to another:
|
||||
|
||||
Perl::
|
||||
|
||||
See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
|
||||
and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
|
||||
|
||||
Python::
|
||||
|
||||
See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
|
||||
|
||||
[float]
|
||||
[[bulk-curl]]
|
||||
===== Submitting bulk requests with cURL
|
||||
|
||||
If you're providing text file input to `curl`, you *must* use the
|
||||
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
|
||||
@ -65,9 +125,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --
|
||||
// NOTCONSOLE
|
||||
// Not converting to console because this shows how curl works
|
||||
|
||||
Because this format uses literal `\n`'s as delimiters, please be sure
|
||||
that the JSON actions and sources are not pretty printed. Here is an
|
||||
example of a correct sequence of bulk commands:
|
||||
[float]
|
||||
[[bulk-optimistic-concurrency-control]]
|
||||
===== Optimistic Concurrency Control
|
||||
|
||||
Each `index` and `delete` action within a bulk API call may include the
|
||||
`if_seq_no` and `if_primary_term` parameters in their respective action
|
||||
and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
|
||||
how operations are executed, based on the last modification to existing
|
||||
documents. See <<optimistic-concurrency-control>> for more details.
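
As a rough illustration, the two parameters sit next to `_index` and `_id` in the action and metadata line. The Python sketch below builds such a pair of lines; the helper name `index_action_with_occ` and the sequence number and primary term values are hypothetical.

```python
import json


def index_action_with_occ(index, doc_id, source, seq_no, primary_term):
    """Build the two NDJSON lines for a conditional `index` action.

    The action is meant to apply only if the document's last modification
    matches the given sequence number and primary term.
    """
    metadata = {
        "_index": index,
        "_id": doc_id,
        "if_seq_no": seq_no,
        "if_primary_term": primary_term,
    }
    return json.dumps({"index": metadata}) + "\n" + json.dumps(source) + "\n"


# Hypothetical values: in practice they come from an earlier response
# for the same document.
occ_lines = index_action_with_occ("test", "1", {"field1": "value1"}, 10, 1)
```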
|
||||
|
||||
|
||||
[float]
|
||||
[[bulk-versioning]]
|
||||
===== Versioning
|
||||
|
||||
Each bulk item can include the version value using the
|
||||
`version` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_version` mapping. It also
|
||||
supports the `version_type` (see <<index-versioning, versioning>>).
|
||||
|
||||
[float]
|
||||
[[bulk-routing]]
|
||||
===== Routing
|
||||
|
||||
Each bulk item can include the routing value using the
|
||||
`routing` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_routing` mapping.
|
||||
|
||||
[float]
|
||||
[[bulk-wait-for-active-shards]]
|
||||
===== Wait For Active Shards
|
||||
|
||||
When making bulk calls, you can set the `wait_for_active_shards`
|
||||
parameter to require a minimum number of shard copies to be active
|
||||
before starting to process the bulk request. See
|
||||
<<index-wait-for-active-shards,here>> for further details and a usage
|
||||
example.
|
||||
|
||||
[float]
|
||||
[[bulk-refresh]]
|
||||
===== Refresh
|
||||
|
||||
Control when the changes made by this request are visible to search. See
|
||||
<<docs-refresh,refresh>>.
|
||||
|
||||
NOTE: Only the shards that receive the bulk request will be affected by
|
||||
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
|
||||
documents in it that happen to be routed to different shards in an index
|
||||
with five shards. The request will only wait for those three shards to
|
||||
refresh. The other two shards that make up the index do not
|
||||
participate in the `_bulk` request at all.
|
||||
|
||||
[float]
|
||||
[[bulk-security]]
|
||||
===== Security
|
||||
|
||||
See <<url-access-control>>.
|
||||
|
||||
[float]
|
||||
[[bulk-partial-responses]]
|
||||
===== Partial responses
|
||||
To ensure fast responses, the bulk API will respond with partial results if one or more shards fail.
|
||||
See <<shard-failures, Shard failures>> for more information.
|
||||
|
||||
[[docs-bulk-api-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<index>`::
|
||||
(Optional, string) Name of the index to perform the bulk actions against.
|
||||
|
||||
[[docs-bulk-api-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
|
||||
|
||||
[[docs-bulk-api-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -81,7 +229,7 @@ POST _bulk
|
||||
{ "doc" : {"field2" : "value2"} }
|
||||
--------------------------------------------------
|
||||
|
||||
The result of this bulk operation is:
|
||||
The API returns the following result:
|
||||
|
||||
[source,console-result]
|
||||
--------------------------------------------------
|
||||
@ -171,85 +319,9 @@ The result of this bulk operation is:
|
||||
// TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
|
||||
// TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
|
||||
|
||||
The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it
|
||||
will be used by default on bulk items that don't provide it explicitly.
|
||||
|
||||
A note on the format. The idea here is to make processing of this as
|
||||
fast as possible. As some of the actions will be redirected to other
|
||||
shards on other nodes, only `action_meta_data` is parsed on the
|
||||
receiving node side.
|
||||
|
||||
Client libraries using this protocol should strive to do
|
||||
something similar on the client side, and reduce buffering as much as
|
||||
possible.
|
||||
|
||||
The response to a bulk action is a large JSON structure with the individual
|
||||
results of each action that was performed in the same order as the actions that
|
||||
appeared in the request. The failure of a single action does not affect the
|
||||
remaining actions.
|
||||
|
||||
There is no "correct" number of actions to perform in a single bulk
|
||||
call. You should experiment with different settings to find the optimum
|
||||
size for your particular workload.
|
||||
|
||||
If using the HTTP API, make sure that the client does not send HTTP
|
||||
chunks, as this will slow things down.
|
||||
|
||||
[float]
|
||||
[[bulk-optimistic-concurrency-control]]
|
||||
==== Optimistic Concurrency Control
|
||||
|
||||
Each `index` and `delete` action within a bulk API call may include the
|
||||
`if_seq_no` and `if_primary_term` parameters in their respective action
|
||||
and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
|
||||
how operations are executed, based on the last modification to existing
|
||||
documents. See <<optimistic-concurrency-control>> for more details.
|
||||
|
||||
|
||||
[float]
|
||||
[[bulk-versioning]]
|
||||
==== Versioning
|
||||
|
||||
Each bulk item can include the version value using the
|
||||
`version` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_version` mapping. It also
|
||||
supports the `version_type` (see <<index-versioning, versioning>>).
|
||||
|
||||
[float]
|
||||
[[bulk-routing]]
|
||||
==== Routing
|
||||
|
||||
Each bulk item can include the routing value using the
|
||||
`routing` field. It automatically follows the behavior of the
|
||||
index / delete operation based on the `_routing` mapping.
|
||||
|
||||
[float]
|
||||
[[bulk-wait-for-active-shards]]
|
||||
==== Wait For Active Shards
|
||||
|
||||
When making bulk calls, you can set the `wait_for_active_shards`
|
||||
parameter to require a minimum number of shard copies to be active
|
||||
before starting to process the bulk request. See
|
||||
<<index-wait-for-active-shards,here>> for further details and a usage
|
||||
example.
|
||||
|
||||
[float]
|
||||
[[bulk-refresh]]
|
||||
==== Refresh
|
||||
|
||||
Control when the changes made by this request are visible to search. See
|
||||
<<docs-refresh,refresh>>.
|
||||
|
||||
NOTE: Only the shards that receive the bulk request will be affected by
|
||||
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
|
||||
documents in it that happen to be routed to different shards in an index
|
||||
with five shards. The request will only wait for those three shards to
|
||||
refresh. The other two shards that make up the index do not
|
||||
participate in the `_bulk` request at all.
|
||||
|
||||
[float]
|
||||
[[bulk-update]]
|
||||
==== Update
|
||||
===== Bulk update example
|
||||
|
||||
When using the `update` action, `retry_on_conflict` can be used as a field in
|
||||
the action itself (not in the extra payload line), to specify how many
|
||||
@ -276,13 +348,3 @@ POST _bulk
|
||||
--------------------------------------------------
|
||||
// TEST[continued]
|
||||
|
||||
[float]
|
||||
[[bulk-security]]
|
||||
==== Security
|
||||
|
||||
See <<url-access-control>>.
|
||||
|
||||
[float]
|
||||
[[bulk-partial-responses]]
|
||||
==== Partial responses
|
||||
To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.
|
@ -6,6 +6,12 @@
|
||||
|
||||
Retrieves the specified JSON document from an index.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET twitter/_doc/0
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
[[docs-get-api-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
@ -150,32 +156,21 @@ deleted documents in the background as you continue to index more data.
|
||||
[[docs-get-api-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
`preference`::
|
||||
(Optional, string) Specify the node or shard the operation should
|
||||
be performed on (default: random).
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
|
||||
|
||||
`realtime`::
|
||||
(Optional, boolean) Set to `false` to disable real time GET
|
||||
(default: `true`). See <<realtime>>.
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
|
||||
|
||||
`stored_fields`::
|
||||
(Optional, boolean) Set to `true` to retrieve the document fields stored in the
|
||||
index rather than the document `_source` (default: `false`).
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=stored_fields]
|
||||
|
||||
`_source`::
|
||||
(Optional, list) Set to `false` to disable source retrieval (default: `true`).
|
||||
You can also specify a comma-separated list of the fields
|
||||
you want to retrieve.
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
|
||||
|
||||
`_source_excludes`::
|
||||
(Optional, list) Specify the source fields you want to exclude.
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
|
||||
|
||||
`_source_includes`::
|
||||
(Optional, list) Specify the source fields you want to retrieve.
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-version]
|
||||
|
||||
|
@ -1,15 +1,10 @@
|
||||
[[docs-multi-get]]
|
||||
=== Multi Get API
|
||||
=== Multi get (mget) API
|
||||
++++
|
||||
<titleabbrev>Multi get</titleabbrev>
|
||||
++++
|
||||
|
||||
The Multi get API returns multiple documents based on an index, type,
|
||||
(optional) and id (and possibly routing). The response includes a `docs` array
|
||||
with all the fetched documents in order corresponding to the original multi-get
|
||||
request (if there was a failure for a specific get, an object containing this
|
||||
error is included in place in the response instead). The structure of a
|
||||
successful get is similar in structure to a document provided by the
|
||||
<<docs-get,get>> API.
|
||||
|
||||
Here is an example:
|
||||
Retrieves multiple JSON documents by ID.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -17,25 +12,121 @@ GET /_mget
|
||||
{
|
||||
"docs" : [
|
||||
{
|
||||
"_index" : "test",
|
||||
"_type" : "_doc",
|
||||
"_index" : "twitter",
|
||||
"_id" : "1"
|
||||
},
|
||||
{
|
||||
"_index" : "test",
|
||||
"_type" : "_doc",
|
||||
"_index" : "twitter",
|
||||
"_id" : "2"
|
||||
}
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
The `mget` endpoint can also be used against an index (in which case it
|
||||
is not required in the body):
|
||||
[[docs-multi-get-api-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
`GET /_mget`
|
||||
|
||||
`GET /<index>/_mget`
|
||||
|
||||
[[docs-multi-get-api-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
You use `mget` to retrieve multiple documents from one or more indices.
|
||||
If you specify an index in the request URI, you only need to specify the document IDs in the request body.
|
||||
|
||||
[[mget-security]]
|
||||
===== Security
|
||||
|
||||
See <<url-access-control>>.
|
||||
|
||||
[[multi-get-partial-responses]]
|
||||
===== Partial responses
|
||||
|
||||
To ensure fast responses, the multi get API responds with partial results if one or more shards fail.
|
||||
See <<shard-failures, Shard failures>> for more information.
|
||||
|
||||
[[docs-multi-get-api-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=index]
|
||||
|
||||
[[docs-multi-get-api-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=stored_fields]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
|
||||
|
||||
[[docs-multi-get-api-request-body]]
|
||||
==== {api-request-body-title}
|
||||
|
||||
`docs`::
|
||||
(Optional, array) The documents you want to retrieve.
|
||||
Required if no index is specified in the request URI.
|
||||
You can specify the following attributes for each
|
||||
document:
|
||||
+
|
||||
--
|
||||
`_id`::
|
||||
(Required, string) The unique document ID.
|
||||
|
||||
`_index`::
|
||||
(Optional, string)
|
||||
The index that contains the document.
|
||||
Required if no index is specified in the request URI.
|
||||
|
||||
`_routing`::
|
||||
(Optional, string) The key for the primary shard the document resides on.
|
||||
Required if routing is used during indexing.
|
||||
|
||||
`_source`::
|
||||
(Optional, boolean) If `false`, excludes all `_source` fields. Defaults to `true`.
|
||||
`source_include`:::
|
||||
(Optional, array) The fields to extract and return from the `_source` field.
|
||||
`source_exclude`:::
|
||||
(Optional, array) The fields to exclude from the returned `_source` field.
|
||||
|
||||
`_stored_fields`::
|
||||
(Optional, array) The stored fields you want to retrieve.
|
||||
--
|
||||
|
||||
`ids`::
|
||||
(Optional, array) The IDs of the documents you want to retrieve.
|
||||
Allowed when the index is specified in the request URI.
|
||||
|
||||
[[multi-get-api-response-body]]
|
||||
==== {api-response-body-title}
|
||||
|
||||
The response includes a `docs` array that contains the documents in the order specified in the request.
|
||||
The structure of the returned documents is similar to that returned by the <<docs-get,get>> API.
|
||||
If there is a failure getting a particular document, the error is included in place of the document.
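
A client can walk the `docs` array in order and sort each entry into one of three groups. The Python sketch below is an illustration under assumptions: it expects each entry to carry either an `error` object or the `found` flag used by the get API, and both the helper name `split_mget_docs` and the sample response are hand-written for this example.

```python
def split_mget_docs(mget_response):
    """Split the `docs` array into found, missing, and failed entries."""
    found, missing, errors = [], [], []
    for doc in mget_response.get("docs", []):
        if "error" in doc:
            # The error object takes the place of the document.
            errors.append(doc)
        elif doc.get("found"):
            found.append(doc)
        else:
            missing.append(doc)
    return found, missing, errors


# Hand-written sample response (an assumption for this sketch).
sample_mget = {
    "docs": [
        {"_index": "twitter", "_id": "1", "found": True,
         "_source": {"user": "kimchy"}},
        {"_index": "twitter", "_id": "2", "found": False},
        {"_index": "missing", "_id": "3",
         "error": {"type": "index_not_found_exception"}},
    ]
}

found, missing, errors = split_mget_docs(sample_mget)
```

Because the entries keep the request order, the position of an error tells the caller which requested document it belongs to.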
|
||||
|
||||
[[docs-multi-get-api-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
[[mget-ids]]
|
||||
===== Get documents by ID
|
||||
|
||||
If you specify an index in the request URI, only the document IDs are required in the request body:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /test/_mget
|
||||
GET /twitter/_mget
|
||||
{
|
||||
"docs" : [
|
||||
{
|
||||
@ -66,30 +157,31 @@ GET /test/_doc/_mget
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
//CONSOLE
|
||||
// TEST[setup:twitter]
|
||||
|
||||
In which case, the `ids` element can directly be used to simplify the
|
||||
request:
|
||||
You can use the `ids` element to simplify the request:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
GET /test/_doc/_mget
|
||||
GET /twitter/_mget
|
||||
{
|
||||
"ids" : ["1", "2"]
|
||||
}
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
[float]
|
||||
[[mget-source-filtering]]
|
||||
==== Source filtering
|
||||
===== Filter source fields
|
||||
|
||||
By default, the `_source` field will be returned for every document (if stored).
|
||||
Similar to the <<get-source-filtering,get>> API, you can retrieve only parts of
|
||||
the `_source` (or not at all) by using the `_source` parameter. You can also use
|
||||
the url parameters `_source`, `_source_includes`, and `_source_excludes` to specify defaults,
|
||||
which will be used when there are no per-document instructions.
|
||||
By default, the `_source` field is returned for every document (if stored).
|
||||
Use the `_source` and `source_include` or `source_exclude` attributes to
|
||||
filter what fields are returned for a particular document.
|
||||
You can include the `_source`, `_source_includes`, and `_source_excludes` query parameters in the
|
||||
request URI to specify the defaults to use when there are no per-document instructions.
|
||||
|
||||
For example:
|
||||
For example, the following request sets `_source` to false for document 1 to exclude the
|
||||
source entirely, retrieves `field3` and `field4` from document 2, and retrieves the `user` field
|
||||
from document 3 but filters out the `user.location` field.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -121,13 +213,16 @@ GET /_mget
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
|
||||
[float]
|
||||
[[mget-fields]]
|
||||
==== Fields
|
||||
===== Get stored fields
|
||||
|
||||
Specific stored fields can be specified to be retrieved per document to get, similar to the <<get-stored-fields,stored_fields>> parameter of the Get API.
|
||||
For example:
|
||||
Use the `stored_fields` attribute to specify the set of stored fields you want
|
||||
to retrieve. Any requested fields that are not stored are ignored.
|
||||
You can include the `stored_fields` query parameter in the request URI to specify the defaults
|
||||
to use when there are no per-document instructions.
|
||||
|
||||
For example, the following request retrieves `field1` and `field2` from document 1, and
|
||||
`field3` and `field4` from document 2:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -150,8 +245,9 @@ GET /_mget
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Alternatively, you can specify the `stored_fields` parameter in the query string
|
||||
as a default to be applied to all documents.
|
||||
The following request retrieves `field1` and `field2` from all documents by default.
|
||||
These default fields are returned for document 1, but
|
||||
overridden to return `field3` and `field4` for document 2.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -159,23 +255,22 @@ GET /test/_doc/_mget?stored_fields=field1,field2
|
||||
{
|
||||
"docs" : [
|
||||
{
|
||||
"_id" : "1" <1>
|
||||
"_id" : "1"
|
||||
},
|
||||
{
|
||||
"_id" : "2",
|
||||
"stored_fields" : ["field3", "field4"] <2>
|
||||
"stored_fields" : ["field3", "field4"]
|
||||
}
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
<1> Returns `field1` and `field2`
|
||||
<2> Returns `field3` and `field4`
|
||||
|
||||
[float]
|
||||
[[mget-routing]]
|
||||
==== Routing
|
||||
===== Specify document routing
|
||||
|
||||
You can also specify a routing value as a parameter:
|
||||
If routing is used during indexing, you need to specify the routing value to retrieve documents.
|
||||
For example, the following request fetches `test/_doc/2` from the shard corresponding to routing key `key1`,
|
||||
and fetches `test/_doc/1` from the shard corresponding to routing key `key2`.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -195,18 +290,4 @@ GET /_mget?routing=key1
|
||||
}
|
||||
]
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
In this example, document `test/_doc/2` will be fetched from the shard corresponding to routing key `key1` but
|
||||
document `test/_doc/1` will be fetched from the shard corresponding to routing key `key2`.
|
||||
|
||||
[float]
|
||||
[[mget-security]]
|
||||
==== Security
|
||||
|
||||
See <<url-access-control>>.
|
||||
|
||||
[float]
|
||||
[[multi-get-partial-responses]]
|
||||
==== Partial responses
|
||||
To ensure fast responses, the multi get API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.
|
||||
--------------------------------------------------
|
@ -1,14 +1,10 @@
|
||||
[[docs-multi-termvectors]]
|
||||
=== Multi termvectors API
|
||||
=== Multi term vectors API
|
||||
++++
|
||||
<titleabbrev>Multi term vectors</titleabbrev>
|
||||
++++
|
||||
|
||||
Multi termvectors API allows to get multiple termvectors at once. The
|
||||
documents from which to retrieve the term vectors are specified by an index and id.
|
||||
But the documents could also be artificially provided in the request itself.
|
||||
|
||||
The response includes a `docs`
|
||||
array with all the fetched termvectors, each element having the structure
|
||||
provided by the <<docs-termvectors,termvectors>>
|
||||
API. Here is an example:
|
||||
Retrieves multiple term vectors with a single request.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -32,10 +28,64 @@ POST /_mtermvectors
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
See the <<docs-termvectors,termvectors>> API for a description of possible parameters.
|
||||
[[docs-multi-termvectors-api-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
The `_mtermvectors` endpoint can also be used against an index (in which case it
|
||||
is not required in the body):
|
||||
`POST /_mtermvectors`
|
||||
|
||||
`POST /<index>/_mtermvectors`
|
||||
|
||||
[[docs-multi-termvectors-api-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
You can specify existing documents by index and ID or
|
||||
provide artificial documents in the body of the request.
|
||||
The index can be specified in the body of the request or in the request URI.
|
||||
|
||||
The response contains a `docs` array with all the fetched termvectors.
|
||||
Each element has the structure provided by the <<docs-termvectors,termvectors>>
|
||||
API.
|
||||
|
||||
See the <<docs-termvectors,termvectors>> API for more details about the information
|
||||
that can be included in the response.
|
||||
|
||||
[[docs-multi-termvectors-api-path-params]]
|
||||
==== {api-path-parms-title}
|
||||
|
||||
`<index>`::
|
||||
(Optional, string) Name of the index that contains the documents.
|
||||
|
||||
[[docs-multi-termvectors-api-query-params]]
|
||||
==== {api-query-parms-title}
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=version]
|
||||
|
||||
include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]
|
||||
|
||||
[float]
|
||||
[[docs-multi-termvectors-api-example]]
|
||||
==== {api-examples-title}
|
||||
|
||||
If you specify an index in the request URI, the index does not need to be specified for each document
|
||||
in the request body:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -57,7 +107,8 @@ POST /twitter/_mtermvectors
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
If all requested documents are on same index and also the parameters are the same, the request can be simplified:
|
||||
If all requested documents are in the same index and the parameters are the same, you can use the
following simplified syntax:
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -74,9 +125,11 @@ POST /twitter/_mtermvectors
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
Additionally, just like for the <<docs-termvectors,termvectors>>
|
||||
API, term vectors could be generated for user provided documents.
|
||||
The mapping used is determined by `_index`.
|
||||
[[docs-multi-termvectors-artificial-doc]]
|
||||
===== Artificial documents
|
||||
|
||||
You can also use `mtermvectors` to generate term vectors for _artificial_ documents provided
|
||||
in the body of the request. The mapping used is determined by the specified `_index`.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
|
File diff suppressed because it is too large
@ -1,10 +1,10 @@
|
||||
[[docs-termvectors]]
|
||||
=== Term Vectors
|
||||
=== Term vectors API
|
||||
++++
|
||||
<titleabbrev>Term vectors</titleabbrev>
|
||||
++++
|
||||
|
||||
Returns information and statistics on terms in the fields of a particular
|
||||
document. The document could be stored in the index or artificially provided
|
||||
by the user. Term vectors are <<realtime,realtime>> by default, not near
|
||||
realtime. This can be changed by setting `realtime` parameter to `false`.
|
||||
Retrieves information and statistics for terms in the fields of a particular document.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -12,8 +12,19 @@ GET /twitter/_termvectors/1
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
Optionally, you can specify the fields for which the information is
|
||||
retrieved either with a parameter in the url
|
||||
[[docs-termvectors-api-request]]
|
||||
==== {api-request-title}
|
||||
|
||||
`GET /<index>/_termvectors/<_id>`
|
||||
|
||||
[[docs-termvectors-api-desc]]
|
||||
==== {api-description-title}
|
||||
|
||||
You can retrieve term vectors for documents stored in the index or
|
||||
for _artificial_ documents passed in the body of the request.
|
||||
|
||||
You can specify the fields you are interested in through the `fields` parameter,
|
||||
or by adding the fields to the request body.
|
||||
|
||||
[source,console]
|
||||
--------------------------------------------------
|
||||
@ -21,18 +32,16 @@ GET /twitter/_termvectors/1?fields=message
|
||||
--------------------------------------------------
|
||||
// TEST[setup:twitter]
|
||||
|
||||
or by adding the requested fields in the request body (see
|
||||
example below). Fields can also be specified with wildcards
|
||||
in similar way to the <<query-dsl-multi-match-query,multi match query>>
|
||||
Fields can be specified using wildcards, similar to the <<query-dsl-multi-match-query,multi match query>>.
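
As an illustrative sketch, the following request lists the fields in the request body
instead of the URL. It assumes the `twitter` index used by the other examples contains a
document with ID `1`. The `mess*` pattern is only an assumed wildcard that would match a
field such as `message`.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1
{
  "fields" : ["mess*"]
}
--------------------------------------------------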
|
||||
|
||||
[float]
|
||||
==== Return values
|
||||
Term vectors are <<realtime,real-time>> by default, not near real-time.
|
||||
This can be changed by setting the `realtime` parameter to `false`.
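
For illustration, a request that opts out of real-time behavior might look like the
following. It assumes the `twitter` index and document ID `1` from the other examples.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1?realtime=false
--------------------------------------------------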
|
||||
|
||||
Three types of values can be requested: _term information_, _term statistics_
|
||||
You can request three types of values: _term information_, _term statistics_
|
||||
and _field statistics_. By default, all term information and field
|
||||
statistics are returned for all fields but no term statistics.
|
||||
statistics are returned for all fields but term statistics are excluded.
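
The following sketch overrides both defaults: it requests term statistics and omits
field statistics. The index name, document ID, and `message` field are assumptions
carried over from the other examples.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1
{
  "fields" : ["message"],
  "term_statistics" : true,
  "field_statistics" : false
}
--------------------------------------------------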
|
||||
|
||||
[float]
|
||||
[[docs-termvectors-api-term-info]]
|
||||
===== Term information
|
||||
|
||||
* term frequency in the field (always returned)
|
||||
@ -52,7 +61,7 @@ should make sure that the string you are taking a sub-string of is also encoded
|
||||
using UTF-16.
|
||||
======
|
||||
|
||||
[float]
|
||||
[[docs-termvectors-api-term-stats]]
|
||||
===== Term statistics
|
||||
|
||||
Setting `term_statistics` to `true` (default is `false`) will
|
||||
@ -65,7 +74,7 @@ return
|
||||
By default these values are not returned since term statistics can
|
||||
have a serious performance impact.
|
||||
|
||||
[float]
|
||||
[[docs-termvectors-api-field-stats]]
|
||||
===== Field statistics
|
||||
|
||||
Setting `field_statistics` to `false` (default is `true`) will
|
||||
@ -77,8 +86,8 @@ omit :
|
||||
* sum of total term frequencies (the sum of total term frequencies of
|
||||
each term in this field)
|
||||
|
||||
[float]
|
||||
===== Terms Filtering
|
||||
[[docs-termvectors-api-terms-filtering]]
|
||||
===== Terms filtering
|
||||
|
||||
With the parameter `filter`, the terms returned could also be filtered based
|
||||
on their tf-idf scores. This could be useful in order to find out a good
|
||||
@ -105,7 +114,7 @@ The following sub-parameters are supported:
|
||||
`max_word_length`::
|
||||
The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
|
||||
|
||||
[float]
|
||||
[[docs-termvectors-api-behavior]]
|
||||
==== Behaviour
|
||||
|
||||
The term and field statistics are not accurate. Deleted documents
|
||||
@ -116,8 +125,45 @@ whereas the absolute numbers have no meaning in this context. By default,
|
||||
when requesting term vectors of artificial documents, a shard to get the statistics
|
||||
from is randomly selected. Use `routing` only to hit a particular shard.
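
As a hedged example, the request below generates term vectors for an artificial document
and uses `routing` to pin the statistics to one shard. The routing value `kimchy` and the
document content are hypothetical placeholders.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors?routing=kimchy
{
  "doc" : {
    "message" : "twitter test test test"
  }
}
--------------------------------------------------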
|
||||
|
||||
[float]
|
||||
===== Example: Returning stored term vectors
|
||||
[[docs-termvectors-api-path-params]]
==== {api-path-parms-title}

`<index>`::
(Required, string) Name of the index that contains the document.

`<_id>`::
(Optional, string) Unique identifier of the document.

[[docs-termvectors-api-query-params]]
==== {api-query-parms-title}

include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]

include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]

include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]

include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]

include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]

include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]

include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]

include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]

include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version]

include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]

[[docs-termvectors-api-example]]
==== {api-examples-title}

[[docs-termvectors-api-stored-termvectors]]
===== Returning stored term vectors

First, we create an index that stores term vectors, payloads, etc.:
|
||||
|
||||
@ -260,8 +306,8 @@ Response:
|
||||
// TEST[continued]
|
||||
// TESTRESPONSE[s/"took": 6/"took": "$body.took"/]
|
||||
|
||||
[float]
|
||||
===== Example: Generating term vectors on the fly
|
||||
[[docs-termvectors-api-generate-termvectors]]
|
||||
===== Generating term vectors on the fly
|
||||
|
||||
Term vectors which are not explicitly stored in the index are automatically
|
||||
computed on the fly. The following request returns all information and statistics for the
|
||||
@ -282,8 +328,7 @@ GET /twitter/_termvectors/1
|
||||
// TEST[continued]
|
||||
|
||||
[[docs-termvectors-artificial-doc]]
|
||||
[float]
|
||||
===== Example: Artificial documents
|
||||
===== Artificial documents
|
||||
|
||||
Term vectors can also be generated for artificial documents,
|
||||
that is for documents not present in the index. For example, the following request would
|
||||
@ -305,7 +350,6 @@ GET /twitter/_termvectors
|
||||
// TEST[continued]
|
||||
|
||||
[[docs-termvectors-per-field-analyzer]]
|
||||
[float]
|
||||
====== Per-field analyzer
|
||||
|
||||
Additionally, a different analyzer than the one at the field may be provided
|
||||
@ -371,8 +415,7 @@ Response:
|
||||
|
||||
|
||||
[[docs-termvectors-terms-filtering]]
|
||||
[float]
|
||||
===== Example: Terms filtering
|
||||
===== Terms filtering
|
||||
|
||||
Finally, the terms returned could be filtered based on their tf-idf scores. In
|
||||
the example below we obtain the three most "interesting" keywords from the
|
||||
|
@ -143,13 +143,12 @@ Wildcard expressions are not accepted.
|
||||
--
|
||||
end::expand-wildcards[]
|
||||
|
||||
tag::index-alias-filter[]
|
||||
<<query-dsl-bool-query, Filter query>>
|
||||
used to limit the index alias.
|
||||
+
|
||||
If specified,
|
||||
the index alias only applies to documents returned by the filter.
|
||||
end::index-alias-filter[]
|
||||
tag::field_statistics[]
|
||||
`field_statistics`::
|
||||
(Optional, boolean) If `true`, the response includes the document count, sum of document frequencies,
|
||||
and sum of total term frequencies.
|
||||
Defaults to `true`.
|
||||
end::field_statistics[]
|
||||
|
||||
tag::fielddata-fields[]
|
||||
`fielddata_fields`::
|
||||
@ -243,7 +242,7 @@ end::cat-h[]
|
||||
|
||||
tag::help[]
|
||||
`help`::
|
||||
(Optional, boolean) If `true`, the response returns help information. Defaults
|
||||
(Optional, boolean) If `true`, the response includes help information. Defaults
|
||||
to `false`.
|
||||
end::help[]
|
||||
|
||||
@ -465,6 +464,12 @@ Comma-separated list of node IDs or names
|
||||
used to limit returned information.
|
||||
end::node-id-query-parm[]
|
||||
|
||||
tag::offsets[]
|
||||
`offsets`::
|
||||
(Optional, boolean) If `true`, the response includes term offsets.
|
||||
Defaults to `true`.
|
||||
end::offsets[]
|
||||
|
||||
tag::parent-task-id[]
|
||||
`parent_task_id`::
|
||||
+
|
||||
@ -490,6 +495,18 @@ tag::path-pipeline[]
|
||||
used to limit the request.
|
||||
end::path-pipeline[]
|
||||
|
||||
tag::payloads[]
|
||||
`payloads`::
|
||||
(Optional, boolean) If `true`, the response includes term payloads.
|
||||
Defaults to `true`.
|
||||
end::payloads[]
|
||||
|
||||
tag::positions[]
|
||||
`positions`::
|
||||
(Optional, boolean) If `true`, the response includes term positions.
|
||||
Defaults to `true`.
|
||||
end::positions[]
|
||||
|
||||
tag::preference[]
|
||||
`preference`::
|
||||
(Optional, string) Specifies the node or shard the operation should be
|
||||
@ -507,6 +524,12 @@ tag::query[]
|
||||
<<query-dsl,Query DSL>>.
|
||||
end::query[]
|
||||
|
||||
tag::realtime[]
|
||||
`realtime`::
|
||||
(Optional, boolean) If `true`, the request is real-time as opposed to near-real-time.
|
||||
Defaults to `true`. See <<realtime>>.
|
||||
end::realtime[]
|
||||
|
||||
tag::refresh[]
|
||||
`refresh`::
|
||||
(Optional, enum) If `true`, {es} refreshes the affected shards to make this
|
||||
@ -517,8 +540,8 @@ end::refresh[]
|
||||
|
||||
tag::request_cache[]
|
||||
`request_cache`::
|
||||
(Optional, boolean) Specifies if the request cache should be used for this
|
||||
request. Defaults to the index-level setting.
|
||||
(Optional, boolean) If `true`, the request cache is used for this request.
|
||||
Defaults to the index-level setting.
|
||||
end::request_cache[]
|
||||
|
||||
tag::requests_per_second[]
|
||||
@ -637,6 +660,12 @@ tag::stats[]
|
||||
purposes.
|
||||
end::stats[]
|
||||
|
||||
tag::stored_fields[]
|
||||
`stored_fields`::
|
||||
(Optional, boolean) If `true`, retrieves the document fields stored in the
|
||||
index rather than the document `_source`. Defaults to `false`.
|
||||
end::stored_fields[]
|
||||
|
||||
tag::target-index[]
|
||||
`<target-index>`::
|
||||
+
|
||||
@ -654,6 +683,12 @@ tag::task-id[]
|
||||
(`node_id:task_number`).
|
||||
end::task-id[]
|
||||
|
||||
tag::term_statistics[]
|
||||
`term_statistics`::
|
||||
(Optional, boolean) If `true`, the response includes term frequency and document frequency.
|
||||
Defaults to `false`.
|
||||
end::term_statistics[]
|
||||
|
||||
tag::terminate_after[]
|
||||
`terminate_after`::
|
||||
(Optional, integer) The maximum number of documents to collect for each shard,
|
||||
@ -680,8 +715,8 @@ end::timeoutparms[]
|
||||
|
||||
tag::cat-v[]
|
||||
`v`::
|
||||
(Optional, boolean) If `true`, the response includes column headings. Defaults
|
||||
to `false`.
|
||||
(Optional, boolean) If `true`, the response includes column headings.
|
||||
Defaults to `false`.
|
||||
end::cat-v[]
|
||||
|
||||
tag::version[]
|
||||
@ -721,6 +756,6 @@ end::wait_for_active_shards[]
|
||||
|
||||
tag::wait_for_completion[]
|
||||
`wait_for_completion`::
|
||||
(Optional, boolean) Should the request block until the operation is
|
||||
complete. Defaults to `true`.
|
||||
(Optional, boolean) If `true`, the request blocks until the operation is complete.
|
||||
Defaults to `true`.
|
||||
end::wait_for_completion[]
|
||||
|