[DOCS] Backporting API ref reformatting for document APIs (#47631)

* [DOCS] Reformats bulk API. (#47479)

* Reformats bulk API.

* Update docs/reference/docs/bulk.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Reformats mget API (#47477)

* Reformats mget API

* Update docs/reference/docs/get.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Incorporated feedback.

* Reformats reindex API (#47483)

* Reformats reindex API

* Incorporated review feedback.

* Reformats term vectors APIs (#47484)

* Reformat termvectors APIs

* Reformats mtermvectors

* Apply suggestions from code review

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Incorporated review feedback.
debadair 2019-10-06 22:25:21 -07:00 committed by GitHub
parent ffacfc642c
commit 41c04ef39c
7 changed files with 1238 additions and 1069 deletions


@ -1,28 +1,37 @@
[[docs-bulk]]
=== Bulk API
++++
<titleabbrev>Bulk</titleabbrev>
++++
The bulk API makes it possible to perform many index/delete operations
in a single API call. This can greatly increase the indexing speed.
Performs multiple indexing or delete operations in a single API call.
This reduces overhead and can greatly increase indexing speed.
.Client support for bulk requests
*********************************************
[source,console]
--------------------------------------------------
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------
Some of the officially supported clients provide helpers to assist with
bulk requests and reindexing of documents from one index to another:
[[docs-bulk-api-request]]
==== {api-request-title}
Perl::
`POST /_bulk`
See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
`POST /<index>/_bulk`
Python::
[[docs-bulk-api-desc]]
==== {api-description-title}
See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
Provides a way to perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
*********************************************
The REST API endpoint is `/_bulk`, and it expects the following newline delimited JSON
(NDJSON) structure:
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
[source,js]
--------------------------------------------------
@ -36,19 +45,70 @@ optional_source\n
--------------------------------------------------
// NOTCONSOLE
*NOTE*: The final line of data must end with a newline character `\n`. Each newline character
may be preceded by a carriage return `\r`. When sending requests to this endpoint the
`Content-Type` header should be set to `application/x-ndjson`.
The `index` and `create` actions expect a source on the next line,
and have the same semantics as the `op_type` parameter in the standard index API:
`create` fails if a document with the same ID already exists in the index,
whereas `index` adds or replaces a document as necessary.
The possible actions are `index`, `create`, `delete`, and `update`.
`index` and `create` expect a source on the next
line, and have the same semantics as the `op_type` parameter to the
standard index API (i.e. create will fail if a document with the same
index exists already, whereas index will add or replace a
document as necessary). `delete` does not expect a source on the
following line, and has the same semantics as the standard delete API.
`update` expects that the partial doc, upsert and script and its options
are specified on the next line.
`update` expects that the partial doc, upsert,
and script and its options are specified on the next line.
`delete` does not expect a source on the next line and
has the same semantics as the standard delete API.
[NOTE]
====
The final line of data must end with a newline character `\n`.
Each newline character may be preceded by a carriage return `\r`.
When sending requests to the `_bulk` endpoint,
the `Content-Type` header should be set to `application/x-ndjson`.
====
Because this format uses literal `\n`'s as delimiters,
make sure that the JSON actions and sources are not pretty printed.
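
As an illustration of the NDJSON rules above, here is a minimal Python sketch that serializes a list of actions into a bulk request body. The helper name `build_bulk_body` and the tuple layout are assumptions made for this example and are not part of any Elasticsearch client.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `build_bulk_body` is a hypothetical helper for illustration only.
    Pass `None` as the source for actions that take no source line (`delete`).
    """
    lines = []
    for action, metadata, source in operations:
        # json.dumps emits compact single-line JSON by default (no pretty printing).
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the final one, must end with a newline character.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "test", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_id": "2"}, None),
    ("create", {"_index": "test", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_index": "test", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
```

The resulting string would be sent as the request body with the `Content-Type` header set to `application/x-ndjson`.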
If you specify an index in the request URI,
it is used for any actions that don't explicitly specify an index.
A note on the format: The idea here is to make processing of this as
fast as possible. As some of the actions are redirected to other
shards on other nodes, only `action_meta_data` is parsed on the
receiving node side.
Client libraries using this protocol should try and strive to do
something similar on the client side, and reduce buffering as much as
possible.
The response to a bulk action is a large JSON structure with
the individual results of each action performed,
in the same order as the actions that appeared in the request.
The failure of a single action does not affect the remaining actions.
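
Because individual actions can fail independently, a client typically needs to scan the per-item results. The following Python sketch shows one way to do that. The helper name `failed_items` is hypothetical, and the sample response is a hand-written, heavily abbreviated illustration rather than real server output.

```python
def failed_items(bulk_response):
    """Return (position, action, _id, error) for every failed bulk item.

    Hypothetical helper. Assumes each entry in `items` is a single-key
    object such as {"index": {...}} and that failed entries carry an
    `error` object.
    """
    failures = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Items appear in the same order as the actions in the request.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"), result["error"]))
    return failures


# Abbreviated, hand-written sample for illustration only.
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_id": "1", "status": 201}},
        {"create": {"_index": "test", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}

failures = failed_items(sample_response)
```

Because results keep the request order, the `position` value can be used to match a failure back to the action that caused it.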
There is no "correct" number of actions to perform in a single bulk request.
Experiment with different settings to find the optimal size for your particular workload.
When using the HTTP API, make sure that the client does not send HTTP chunks,
as this will slow things down.
[float]
[[bulk-clients]]
===== Client support for bulk requests
Some of the officially supported clients provide helpers to assist with
bulk requests and reindexing of documents from one index to another:
Perl::
See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
Python::
See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
[float]
[[bulk-curl]]
===== Submitting bulk requests with cURL
If you're providing text file input to `curl`, you *must* use the
`--data-binary` flag instead of plain `-d`. The latter doesn't preserve
@ -65,9 +125,97 @@ $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --
// NOTCONSOLE
// Not converting to console because this shows how curl works
Because this format uses literal `\n`'s as delimiters, please be sure
that the JSON actions and sources are not pretty printed. Here is an
example of a correct sequence of bulk commands:
[float]
[[bulk-optimistic-concurrency-control]]
===== Optimistic Concurrency Control
Each `index` and `delete` action within a bulk API call may include the
`if_seq_no` and `if_primary_term` parameters in their respective action
and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
how operations are executed, based on the last modification to existing
documents. See <<optimistic-concurrency-control>> for more details.
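
As a rough sketch of how these parameters fit into an action and meta data line, the Python snippet below builds a conditional `index` action. The helper name `conditional_action` and the sequence number and primary term values are assumptions for illustration. In practice the values would come from a previous response for the same document.

```python
import json


def conditional_action(action, index, doc_id, seq_no, primary_term):
    """Build one action line that only applies if the document is unchanged.

    Hypothetical helper. This sketch restricts itself to the two actions
    named in the text above, `index` and `delete`.
    """
    if action not in ("index", "delete"):
        raise ValueError("this sketch only handles index and delete actions")
    metadata = {
        "_index": index,
        "_id": doc_id,
        "if_seq_no": seq_no,
        "if_primary_term": primary_term,
    }
    return json.dumps({action: metadata})


# 10 and 1 are placeholder values, not taken from a real index.
action_line = conditional_action("index", "test", "1", seq_no=10, primary_term=1)
```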
[float]
[[bulk-versioning]]
===== Versioning
Each bulk item can include the version value using the
`version` field. It automatically follows the behavior of the
index / delete operation based on the `_version` mapping. It also
supports the `version_type` (see <<index-versioning, versioning>>).
[float]
[[bulk-routing]]
===== Routing
Each bulk item can include the routing value using the
`routing` field. It automatically follows the behavior of the
index / delete operation based on the `_routing` mapping.
[float]
[[bulk-wait-for-active-shards]]
===== Wait For Active Shards
When making bulk calls, you can set the `wait_for_active_shards`
parameter to require a minimum number of shard copies to be active
before starting to process the bulk request. See
<<index-wait-for-active-shards,here>> for further details and a usage
example.
[float]
[[bulk-refresh]]
===== Refresh
Control when the changes made by this request are visible to search. See
<<docs-refresh,refresh>>.
NOTE: Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards that make up the index do not
participate in the `_bulk` request at all.
[float]
[[bulk-security]]
===== Security
See <<url-access-control>>.
[float]
[[bulk-partial-responses]]
===== Partial responses
To ensure fast responses, the bulk API will respond with partial results if one or more shards fail.
See <<shard-failures, Shard failures>> for more information.
[[docs-bulk-api-path-params]]
==== {api-path-parms-title}
`<index>`::
(Optional, string) Name of the index to perform the bulk actions against.
[[docs-bulk-api-query-params]]
==== {api-query-parms-title}
include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
include::{docdir}/rest-api/common-parms.asciidoc[tag=timeout]
include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
[[docs-bulk-api-example]]
==== {api-examples-title}
[source,console]
--------------------------------------------------
@ -81,7 +229,7 @@ POST _bulk
{ "doc" : {"field2" : "value2"} }
--------------------------------------------------
The result of this bulk operation is:
The API returns the following result:
[source,console-result]
--------------------------------------------------
@ -171,85 +319,9 @@ The result of this bulk operation is:
// TESTRESPONSE[s/"_seq_no" : 3/"_seq_no" : $body.items.3.update._seq_no/]
// TESTRESPONSE[s/"_primary_term" : 4/"_primary_term" : $body.items.3.update._primary_term/]
The endpoints are `/_bulk` and `/{index}/_bulk`. When the index is provided, it
will be used by default on bulk items that don't provide it explicitly.
A note on the format. The idea here is to make processing of this as
fast as possible. As some of the actions will be redirected to other
shards on other nodes, only `action_meta_data` is parsed on the
receiving node side.
Client libraries using this protocol should try and strive to do
something similar on the client side, and reduce buffering as much as
possible.
The response to a bulk action is a large JSON structure with the individual
results of each action that was performed in the same order as the actions that
appeared in the request. The failure of a single action does not affect the
remaining actions.
There is no "correct" number of actions to perform in a single bulk
call. You should experiment with different settings to find the optimum
size for your particular workload.
If using the HTTP API, make sure that the client does not send HTTP
chunks, as this will slow things down.
[float]
[[bulk-optimistic-concurrency-control]]
==== Optimistic Concurrency Control
Each `index` and `delete` action within a bulk API call may include the
`if_seq_no` and `if_primary_term` parameters in their respective action
and meta data lines. The `if_seq_no` and `if_primary_term` parameters control
how operations are executed, based on the last modification to existing
documents. See <<optimistic-concurrency-control>> for more details.
[float]
[[bulk-versioning]]
==== Versioning
Each bulk item can include the version value using the
`version` field. It automatically follows the behavior of the
index / delete operation based on the `_version` mapping. It also
support the `version_type` (see <<index-versioning, versioning>>).
[float]
[[bulk-routing]]
==== Routing
Each bulk item can include the routing value using the
`routing` field. It automatically follows the behavior of the
index / delete operation based on the `_routing` mapping.
[float]
[[bulk-wait-for-active-shards]]
==== Wait For Active Shards
When making bulk calls, you can set the `wait_for_active_shards`
parameter to require a minimum number of shard copies to be active
before starting to process the bulk request. See
<<index-wait-for-active-shards,here>> for further details and a usage
example.
[float]
[[bulk-refresh]]
==== Refresh
Control when the changes made by this request are visible to search. See
<<docs-refresh,refresh>>.
NOTE: Only the shards that receive the bulk request will be affected by
`refresh`. Imagine a `_bulk?refresh=wait_for` request with three
documents in it that happen to be routed to different shards in an index
with five shards. The request will only wait for those three shards to
refresh. The other two shards that make up the index do not
participate in the `_bulk` request at all.
[float]
[[bulk-update]]
==== Update
===== Bulk update example
When using the `update` action, `retry_on_conflict` can be used as a field in
the action itself (not in the extra payload line), to specify how many
@ -276,13 +348,3 @@ POST _bulk
--------------------------------------------------
// TEST[continued]
[float]
[[bulk-security]]
==== Security
See <<url-access-control>>.
[float]
[[bulk-partial-responses]]
==== Partial responses
To ensure fast responses, the bulk API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.


@ -6,6 +6,12 @@
Retrieves the specified JSON document from an index.
[source,console]
--------------------------------------------------
GET twitter/_doc/0
--------------------------------------------------
// TEST[setup:twitter]
[[docs-get-api-request]]
==== {api-request-title}
@ -150,32 +156,21 @@ deleted documents in the background as you continue to index more data.
[[docs-get-api-query-params]]
==== {api-query-parms-title}
`preference`::
(Optional, string) Specify the node or shard the operation should
be performed on (default: random).
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
`realtime`::
(Optional, boolean) Set to `false` to disable real time GET
(default: `true`). See <<realtime>>.
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
`stored_fields`::
(Optional, boolean) Set to `true` to retrieve the document fields stored in the
index rather than the document `_source` (default: `false`).
include::{docdir}/rest-api/common-parms.asciidoc[tag=stored_fields]
`_source`::
(Optional, list) Set to `false` to disable source retrieval (default: `true`).
You can also specify a comma-separated list of the fields
you want to retrieve.
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
`_source_excludes`::
(Optional, list) Specify the source fields you want to exclude.
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
`_source_includes`::
(Optional, list) Specify the source fields you want to retrieve.
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-version]


@ -1,15 +1,10 @@
[[docs-multi-get]]
=== Multi Get API
=== Multi get (mget) API
++++
<titleabbrev>Multi get</titleabbrev>
++++
The Multi get API returns multiple documents based on an index, type,
(optional) and id (and possibly routing). The response includes a `docs` array
with all the fetched documents in order corresponding to the original multi-get
request (if there was a failure for a specific get, an object containing this
error is included in place in the response instead). The structure of a
successful get is similar in structure to a document provided by the
<<docs-get,get>> API.
Here is an example:
Retrieves multiple JSON documents by ID.
[source,console]
--------------------------------------------------
@ -17,25 +12,121 @@ GET /_mget
{
"docs" : [
{
"_index" : "test",
"_type" : "_doc",
"_index" : "twitter",
"_id" : "1"
},
{
"_index" : "test",
"_type" : "_doc",
"_index" : "twitter",
"_id" : "2"
}
]
}
--------------------------------------------------
// TEST[setup:twitter]
The `mget` endpoint can also be used against an index (in which case it
is not required in the body):
[[docs-multi-get-api-request]]
==== {api-request-title}
`GET /_mget`
`GET /<index>/_mget`
[[docs-multi-get-api-desc]]
==== {api-description-title}
You use `mget` to retrieve multiple documents from one or more indices.
If you specify an index in the request URI, you only need to specify the document IDs in the request body.
[[mget-security]]
===== Security
See <<url-access-control>>.
[[multi-get-partial-responses]]
===== Partial responses
To ensure fast responses, the multi get API responds with partial results if one or more shards fail.
See <<shard-failures, Shard failures>> for more information.
[[docs-multi-get-api-path-params]]
==== {api-path-parms-title}
include::{docdir}/rest-api/common-parms.asciidoc[tag=index]
[[docs-multi-get-api-query-params]]
==== {api-query-parms-title}
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
include::{docdir}/rest-api/common-parms.asciidoc[tag=stored_fields]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_excludes]
include::{docdir}/rest-api/common-parms.asciidoc[tag=source_includes]
[[docs-multi-get-api-request-body]]
==== {api-request-body-title}
`docs`::
(Optional, array) The documents you want to retrieve.
Required if no index is specified in the request URI.
You can specify the following attributes for each
document:
+
--
`_id`::
(Required, string) The unique document ID.
`_index`::
(Optional, string)
The index that contains the document.
Required if no index is specified in the request URI.
`_routing`::
(Optional, string) The key for the primary shard the document resides on.
Required if routing is used during indexing.
`_source`::
(Optional, boolean) If `false`, excludes all `_source` fields. Defaults to `true`.
`source_include`:::
(Optional, array) The fields to extract and return from the `_source` field.
`source_exclude`:::
(Optional, array) The fields to exclude from the returned `_source` field.
`_stored_fields`::
(Optional, array) The stored fields you want to retrieve.
--
`ids`::
(Optional, array) The IDs of the documents you want to retrieve.
Allowed when the index is specified in the request URI.
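
To make the choice between `docs` and `ids` concrete, here is a small Python sketch that builds either form of the request body. The helper name `build_mget_body` is an assumption for this example.

```python
def build_mget_body(doc_ids, index=None):
    """Build an mget request body (hypothetical helper).

    With index=None the index is assumed to be given in the request URI,
    so the shorter `ids` form is enough. Otherwise each document names
    its index explicitly in a `docs` entry.
    """
    if index is None:
        return {"ids": list(doc_ids)}
    return {"docs": [{"_index": index, "_id": doc_id} for doc_id in doc_ids]}


# For a request such as GET /twitter/_mget
short_body = build_mget_body(["1", "2"])
# For a request such as GET /_mget
full_body = build_mget_body(["1", "2"], index="twitter")
```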
[[multi-get-api-response-body]]
==== {api-response-body-title}
The response includes a `docs` array that contains the documents in the order specified in the request.
The structure of the returned documents is similar to that returned by the <<docs-get,get>> API.
If there is a failure getting a particular document, the error is included in place of the document.
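
Since errors appear in place of documents, client code usually has to sort the entries of the `docs` array. The Python sketch below shows one possible approach. The helper name `split_mget_docs` is hypothetical, the sample response is hand-written for illustration, and the sketch assumes that retrieved documents carry a `found` flag as in the get API.

```python
def split_mget_docs(mget_response):
    """Sort mget results into found documents, missing documents, and errors.

    Hypothetical helper. Assumes failed entries carry an `error` object and
    other entries carry a boolean `found` flag.
    """
    found, missing, errors = [], [], []
    for doc in mget_response.get("docs", []):
        if "error" in doc:
            errors.append(doc)
        elif doc.get("found"):
            found.append(doc)
        else:
            missing.append(doc)
    return found, missing, errors


# Abbreviated, hand-written sample for illustration only.
sample_response = {
    "docs": [
        {"_index": "twitter", "_id": "1", "found": True, "_source": {"user": "kimchy"}},
        {"_index": "twitter", "_id": "2", "found": False},
        {"_index": "missing", "_id": "3", "error": {"type": "index_not_found_exception"}},
    ]
}

found, missing, errors = split_mget_docs(sample_response)
```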
[[docs-multi-get-api-example]]
==== {api-examples-title}
[[mget-ids]]
===== Get documents by ID
If you specify an index in the request URI, only the document IDs are required in the request body:
[source,console]
--------------------------------------------------
GET /test/_mget
GET /twitter/_mget
{
"docs" : [
{
@ -66,30 +157,31 @@ GET /test/_doc/_mget
]
}
--------------------------------------------------
//CONSOLE
// TEST[setup:twitter]
In which case, the `ids` element can directly be used to simplify the
request:
You can use the `ids` element to simplify the request:
[source,console]
--------------------------------------------------
GET /test/_doc/_mget
GET /twitter/_mget
{
"ids" : ["1", "2"]
}
--------------------------------------------------
// TEST[setup:twitter]
[float]
[[mget-source-filtering]]
==== Source filtering
===== Filter source fields
By default, the `_source` field will be returned for every document (if stored).
Similar to the <<get-source-filtering,get>> API, you can retrieve only parts of
the `_source` (or not at all) by using the `_source` parameter. You can also use
the url parameters `_source`, `_source_includes`, and `_source_excludes` to specify defaults,
which will be used when there are no per-document instructions.
By default, the `_source` field is returned for every document (if stored).
Use the `_source` and `_source_include` or `source_exclude` attributes to
filter what fields are returned for a particular document.
You can include the `_source`, `_source_includes`, and `_source_excludes` query parameters in the
request URI to specify the defaults to use when there are no per-document instructions.
For example:
For example, the following request sets `_source` to false for document 1 to exclude the
source entirely, retrieves `field3` and `field4` from document 2, and retrieves the `user` field
from document 3 but filters out the `user.location` field.
[source,console]
--------------------------------------------------
@ -121,13 +213,16 @@ GET /_mget
}
--------------------------------------------------
[float]
[[mget-fields]]
==== Fields
===== Get stored fields
Specific stored fields can be specified to be retrieved per document to get, similar to the <<get-stored-fields,stored_fields>> parameter of the Get API.
For example:
Use the `stored_fields` attribute to specify the set of stored fields you want
to retrieve. Any requested fields that are not stored are ignored.
You can include the `stored_fields` query parameter in the request URI to specify the defaults
to use when there are no per-document instructions.
For example, the following request retrieves `field1` and `field2` from document 1, and
`field3` and `field4` from document 2:
[source,console]
--------------------------------------------------
@ -150,8 +245,9 @@ GET /_mget
}
--------------------------------------------------
Alternatively, you can specify the `stored_fields` parameter in the query string
as a default to be applied to all documents.
The following request retrieves `field1` and `field2` from all documents by default.
These default fields are returned for document 1, but
overridden to return `field3` and `field4` for document 2.
[source,console]
--------------------------------------------------
@ -159,23 +255,22 @@ GET /test/_doc/_mget?stored_fields=field1,field2
{
"docs" : [
{
"_id" : "1" <1>
"_id" : "1"
},
{
"_id" : "2",
"stored_fields" : ["field3", "field4"] <2>
"stored_fields" : ["field3", "field4"]
}
]
}
--------------------------------------------------
<1> Returns `field1` and `field2`
<2> Returns `field3` and `field4`
[float]
[[mget-routing]]
==== Routing
===== Specify document routing
You can also specify a routing value as a parameter:
If routing is used during indexing, you need to specify the routing value to retrieve documents.
For example, the following request fetches `test/_doc/2` from the shard corresponding to routing key `key1`,
and fetches `test/_doc/1` from the shard corresponding to routing key `key2`.
[source,console]
--------------------------------------------------
@ -195,18 +290,4 @@ GET /_mget?routing=key1
}
]
}
--------------------------------------------------
In this example, document `test/_doc/2` will be fetched from the shard corresponding to routing key `key1` but
document `test/_doc/1` will be fetched from the shard corresponding to routing key `key2`.
[float]
[[mget-security]]
==== Security
See <<url-access-control>>.
[float]
[[multi-get-partial-responses]]
==== Partial responses
To ensure fast responses, the multi get API will respond with partial results if one or more shards fail. See <<shard-failures, Shard failures>> for more information.
--------------------------------------------------


@ -1,14 +1,10 @@
[[docs-multi-termvectors]]
=== Multi termvectors API
=== Multi term vectors API
++++
<titleabbrev>Multi term vectors</titleabbrev>
++++
Multi termvectors API allows to get multiple termvectors at once. The
documents from which to retrieve the term vectors are specified by an index and id.
But the documents could also be artificially provided in the request itself.
The response includes a `docs`
array with all the fetched termvectors, each element having the structure
provided by the <<docs-termvectors,termvectors>>
API. Here is an example:
Retrieves multiple term vectors with a single request.
[source,console]
--------------------------------------------------
@ -32,10 +28,64 @@ POST /_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]
See the <<docs-termvectors,termvectors>> API for a description of possible parameters.
[[docs-multi-termvectors-api-request]]
==== {api-request-title}
The `_mtermvectors` endpoint can also be used against an index (in which case it
is not required in the body):
`POST /_mtermvectors`
`POST /<index>/_mtermvectors`
[[docs-multi-termvectors-api-desc]]
==== {api-description-title}
You can specify existing documents by index and ID or
provide artificial documents in the body of the request.
The index can be specified in the body of the request or in the request URI.
The response contains a `docs` array with all the fetched termvectors.
Each element has the structure provided by the <<docs-termvectors,termvectors>>
API.
See the <<docs-termvectors,termvectors>> API for details about what
can be included in the response.
[[docs-multi-termvectors-api-path-params]]
==== {api-path-parms-title}
`<index>`::
(Optional, string) Name of the index that contains the documents.
[[docs-multi-termvectors-api-query-params]]
==== {api-query-parms-title}
include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]
include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]
include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]
include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]
include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]
include::{docdir}/rest-api/common-parms.asciidoc[tag=version]
include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]
[float]
[[docs-multi-termvectors-api-example]]
==== {api-examples-title}
If you specify an index in the request URI, the index does not need to be specified for each document
in the request body:
[source,console]
--------------------------------------------------
@ -57,7 +107,8 @@ POST /twitter/_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]
If all requested documents are on same index and also the parameters are the same, the request can be simplified:
If all requested documents are in the same index and the parameters are the same, you can use the
following simplified syntax:
[source,console]
--------------------------------------------------
@ -74,9 +125,11 @@ POST /twitter/_mtermvectors
--------------------------------------------------
// TEST[setup:twitter]
Additionally, just like for the <<docs-termvectors,termvectors>>
API, term vectors could be generated for user provided documents.
The mapping used is determined by `_index`.
[[docs-multi-termvectors-artificial-doc]]
===== Artificial documents
You can also use `mtermvectors` to generate term vectors for _artificial_ documents provided
in the body of the request. The mapping used is determined by the specified `_index`.
[source,console]
--------------------------------------------------

File diff suppressed because it is too large


@ -1,10 +1,10 @@
[[docs-termvectors]]
=== Term Vectors
=== Term vectors API
++++
<titleabbrev>Term vectors</titleabbrev>
++++
Returns information and statistics on terms in the fields of a particular
document. The document could be stored in the index or artificially provided
by the user. Term vectors are <<realtime,realtime>> by default, not near
realtime. This can be changed by setting `realtime` parameter to `false`.
Retrieves information and statistics for terms in the fields of a particular document.
[source,console]
--------------------------------------------------
@ -12,8 +12,19 @@ GET /twitter/_termvectors/1
--------------------------------------------------
// TEST[setup:twitter]
Optionally, you can specify the fields for which the information is
retrieved either with a parameter in the url
[[docs-termvectors-api-request]]
==== {api-request-title}
`GET /<index>/_termvectors/<_id>`
[[docs-termvectors-api-desc]]
==== {api-description-title}
You can retrieve term vectors for documents stored in the index or
for _artificial_ documents passed in the body of the request.
You can specify the fields you are interested in through the `fields` parameter,
or by adding the fields to the request body.
[source,console]
--------------------------------------------------
@ -21,18 +32,16 @@ GET /twitter/_termvectors/1?fields=message
--------------------------------------------------
// TEST[setup:twitter]
or by adding the requested fields in the request body (see
example below). Fields can also be specified with wildcards
in similar way to the <<query-dsl-multi-match-query,multi match query>>
Fields can be specified using wildcards, similar to the <<query-dsl-multi-match-query,multi match query>>.
[float]
==== Return values
Term vectors are <<realtime,real-time>> by default, not near real-time.
This can be changed by setting the `realtime` parameter to `false`.
Three types of values can be requested: _term information_, _term statistics_
You can request three types of values: _term information_, _term statistics_
and _field statistics_. By default, all term information and field
statistics are returned for all fields but no term statistics.
statistics are returned for all fields but term statistics are excluded.
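
As a hedged illustration of these defaults, the Python sketch below builds a term vectors request body that makes the choices explicit. The helper name `termvectors_body` is an assumption for this example. Its defaults mirror the behavior described above: field statistics on, term statistics off.

```python
def termvectors_body(fields, term_statistics=False, field_statistics=True):
    """Build a term vectors request body (hypothetical helper).

    The keyword defaults mirror the documented defaults: term statistics
    are excluded and field statistics are returned unless overridden.
    """
    return {
        "fields": list(fields),
        "term_statistics": term_statistics,
        "field_statistics": field_statistics,
    }


# Request term statistics for the `message` field in addition to the defaults.
body = termvectors_body(["message"], term_statistics=True)
```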
[float]
[[docs-termvectors-api-term-info]]
===== Term information
* term frequency in the field (always returned)
@ -52,7 +61,7 @@ should make sure that the string you are taking a sub-string of is also encoded
using UTF-16.
======
[float]
[[docs-termvectors-api-term-stats]]
===== Term statistics
Setting `term_statistics` to `true` (default is `false`) will
@ -65,7 +74,7 @@ return
By default these values are not returned since term statistics can
have a serious performance impact.
[float]
[[docs-termvectors-api-field-stats]]
===== Field statistics
Setting `field_statistics` to `false` (default is `true`) will
@ -77,8 +86,8 @@ omit :
* sum of total term frequencies (the sum of total term frequencies of
each term in this field)
[float]
===== Terms Filtering
[[docs-termvectors-api-terms-filtering]]
===== Terms filtering
With the parameter `filter`, the terms returned could also be filtered based
on their tf-idf scores. This could be useful in order to find out a good
@ -105,7 +114,7 @@ The following sub-parameters are supported:
`max_word_length`::
The maximum word length above which words will be ignored. Defaults to unbounded (`0`).
[float]
[[docs-termvectors-api-behavior]]
==== Behaviour
The term and field statistics are not accurate. Deleted documents
@ -116,8 +125,45 @@ whereas the absolute numbers have no meaning in this context. By default,
when requesting term vectors of artificial documents, a shard to get the statistics
from is randomly selected. Use `routing` only to hit a particular shard.
[float]
===== Example: Returning stored term vectors
[[docs-termvectors-api-path-params]]
==== {api-path-parms-title}
`<index>`::
(Required, string) Name of the index that contains the document.
`<_id>`::
(Optional, string) Unique identifier of the document.
[[docs-termvectors-api-query-params]]
==== {api-query-parms-title}
include::{docdir}/rest-api/common-parms.asciidoc[tag=fields]
include::{docdir}/rest-api/common-parms.asciidoc[tag=field_statistics]
include::{docdir}/rest-api/common-parms.asciidoc[tag=offsets]
include::{docdir}/rest-api/common-parms.asciidoc[tag=payloads]
include::{docdir}/rest-api/common-parms.asciidoc[tag=positions]
include::{docdir}/rest-api/common-parms.asciidoc[tag=preference]
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
include::{docdir}/rest-api/common-parms.asciidoc[tag=realtime]
include::{docdir}/rest-api/common-parms.asciidoc[tag=term_statistics]
include::{docdir}/rest-api/common-parms.asciidoc[tag=version]
include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]
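
As a minimal sketch of how these query parameters combine, the following request
asks for term statistics while skipping field statistics, offsets, and payloads.
The `twitter` index, the document ID `1`, and the `text` field are assumptions
made for illustration only.

[source,console]
--------------------------------------------------
GET /twitter/_termvectors/1?fields=text&term_statistics=true&field_statistics=false&offsets=false&payloads=false
--------------------------------------------------

Because term statistics can have a serious performance impact, it may be worth
enabling them only for the fields you actually need.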
[[docs-termvectors-api-example]]
==== {api-examples-title}
[[docs-termvectors-api-stored-termvectors]]
===== Returning stored term vectors
First, we create an index that stores term vectors, payloads, etc.:
@ -260,8 +306,8 @@ Response:
// TEST[continued]
// TESTRESPONSE[s/"took": 6/"took": "$body.took"/]
[float]
===== Example: Generating term vectors on the fly
[[docs-termvectors-api-generate-termvectors]]
===== Generating term vectors on the fly
Term vectors which are not explicitly stored in the index are automatically
computed on the fly. The following request returns all information and statistics for the
@ -282,8 +328,7 @@ GET /twitter/_termvectors/1
// TEST[continued]
[[docs-termvectors-artificial-doc]]
[float]
===== Example: Artificial documents
===== Artificial documents
Term vectors can also be generated for artificial documents,
that is for documents not present in the index. For example, the following request would
@ -305,7 +350,6 @@ GET /twitter/_termvectors
// TEST[continued]
[[docs-termvectors-per-field-analyzer]]
[float]
====== Per-field analyzer
Additionally, an analyzer other than the one configured for the field may be provided
@ -371,8 +415,7 @@ Response:
[[docs-termvectors-terms-filtering]]
[float]
===== Example: Terms filtering
===== Terms filtering
Finally, the terms returned could be filtered based on their tf-idf scores. In
the example below we obtain the three most "interesting" keywords from the


@ -143,13 +143,12 @@ Wildcard expressions are not accepted.
--
end::expand-wildcards[]
tag::index-alias-filter[]
<<query-dsl-bool-query, Filter query>>
used to limit the index alias.
+
If specified,
the index alias only applies to documents returned by the filter.
end::index-alias-filter[]
tag::field_statistics[]
`field_statistics`::
(Optional, boolean) If `true`, the response includes the document count, sum of document frequencies,
and sum of total term frequencies.
Defaults to `true`.
end::field_statistics[]
tag::fielddata-fields[]
`fielddata_fields`::
@ -243,7 +242,7 @@ end::cat-h[]
tag::help[]
`help`::
(Optional, boolean) If `true`, the response returns help information. Defaults
(Optional, boolean) If `true`, the response includes help information. Defaults
to `false`.
end::help[]
@ -465,6 +464,12 @@ Comma-separated list of node IDs or names
used to limit returned information.
end::node-id-query-parm[]
tag::offsets[]
`offsets`::
(Optional, boolean) If `true`, the response includes term offsets.
Defaults to `true`.
end::offsets[]
tag::parent-task-id[]
`parent_task_id`::
+
@ -490,6 +495,18 @@ tag::path-pipeline[]
used to limit the request.
end::path-pipeline[]
tag::payloads[]
`payloads`::
(Optional, boolean) If `true`, the response includes term payloads.
Defaults to `true`.
end::payloads[]
tag::positions[]
`positions`::
(Optional, boolean) If `true`, the response includes term positions.
Defaults to `true`.
end::positions[]
tag::preference[]
`preference`::
(Optional, string) Specifies the node or shard the operation should be
@ -507,6 +524,12 @@ tag::query[]
<<query-dsl,Query DSL>>.
end::query[]
tag::realtime[]
`realtime`::
(Optional, boolean) If `true`, the request is real-time as opposed to near-real-time.
Defaults to `true`. See <<realtime>>.
end::realtime[]
tag::refresh[]
`refresh`::
(Optional, enum) If `true`, {es} refreshes the affected shards to make this
@ -517,8 +540,8 @@ end::refresh[]
tag::request_cache[]
`request_cache`::
(Optional, boolean) Specifies if the request cache should be used for this
request. Defaults to the index-level setting.
(Optional, boolean) If `true`, the request cache is used for this request.
Defaults to the index-level setting.
end::request_cache[]
tag::requests_per_second[]
@ -637,6 +660,12 @@ tag::stats[]
purposes.
end::stats[]
tag::stored_fields[]
`stored_fields`::
(Optional, boolean) If `true`, retrieves the document fields stored in the
index rather than the document `_source`. Defaults to `false`.
end::stored_fields[]
tag::target-index[]
`<target-index>`::
+
@ -654,6 +683,12 @@ tag::task-id[]
(`node_id:task_number`).
end::task-id[]
tag::term_statistics[]
`term_statistics`::
(Optional, boolean) If `true`, the response includes the total term frequency and document frequency of each term.
Defaults to `false`.
end::term_statistics[]
tag::terminate_after[]
`terminate_after`::
(Optional, integer) The maximum number of documents to collect for each shard,
@ -680,8 +715,8 @@ end::timeoutparms[]
tag::cat-v[]
`v`::
(Optional, boolean) If `true`, the response includes column headings. Defaults
to `false`.
(Optional, boolean) If `true`, the response includes column headings.
Defaults to `false`.
end::cat-v[]
tag::version[]
@ -721,6 +756,6 @@ end::wait_for_active_shards[]
tag::wait_for_completion[]
`wait_for_completion`::
(Optional, boolean) Should the request block until the operation is
complete. Defaults to `true`.
(Optional, boolean) If `true`, the request blocks until the operation is complete.
Defaults to `true`.
end::wait_for_completion[]