508 lines
18 KiB
Plaintext
508 lines
18 KiB
Plaintext
[[docs-index_]]
|
||
=== Index API
|
||
++++
|
||
<titleabbrev>Index</titleabbrev>
|
||
++++
|
||
|
||
IMPORTANT: See <<removal-of-types>>.
|
||
|
||
Adds a JSON document to the specified index and makes
|
||
it searchable. If the document already exists,
|
||
updates the document and increments its version.
|
||
|
||
[[docs-index-api-request]]
|
||
==== {api-request-title}
|
||
|
||
`PUT /<index>/_doc/<_id>`
|
||
|
||
`POST /<index>/_doc/`
|
||
|
||
`PUT /<index>/_create/<_id>`
|
||
|
||
`POST /<index>/_create/<_id>`
|
||
|
||
[[docs-index-api-path-params]]
|
||
==== {api-path-parms-title}
|
||
|
||
`<index>`::
|
||
(Required, string) Name of the target index. By default, the index is created
|
||
automatically if it doesn't exist. For more information, see <<index-creation>>.
|
||
|
||
`<_id>`::
|
||
(Optional, string) Unique identifier for the document. Required if you are
|
||
using a PUT request. Omit to automatically generate an ID when using a
|
||
POST request.
|
||
|
||
|
||
[[docs-index-api-query-params]]
|
||
==== {api-query-parms-title}
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=if_seq_no]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=if_primary_term]
|
||
|
||
`op_type`::
|
||
(Optional, enum) Set to `create` to only index the document
|
||
if it does not already exist (_put if absent_). If a document with the specified
|
||
`_id` already exists, the indexing operation will fail. Same as using the
|
||
`<index>/_create` endpoint. Valid values: `index`, `create`. Default: `index`.
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=pipeline]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=refresh]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=routing]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=timeoutparms]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=doc-version]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=version_type]
|
||
|
||
include::{docdir}/rest-api/common-parms.asciidoc[tag=wait_for_active_shards]
|
||
|
||
[[docs-index-api-request-body]]
|
||
==== {api-request-body-title}
|
||
|
||
`<field>`::
|
||
(Required, string) Request body contains the JSON source for the document
|
||
data.
|
||
|
||
[[docs-index-api-response-body]]
|
||
==== {api-response-body-title}
|
||
|
||
`_shards`::
|
||
Provides information about the replication process of the index operation.
|
||
|
||
`_shards.total`::
|
||
Indicates how many shard copies (primary and replica shards) the index operation
|
||
should be executed on.
|
||
|
||
`_shards.successful`::
|
||
Indicates the number of shard copies the index operation succeeded on.
|
||
When the index operation is successful, `successful` is at least 1.
|
||
+
|
||
NOTE: Replica shards might not all be started when an indexing operation
|
||
returns successfully--by default, only the primary is required. Set
|
||
`wait_for_active_shards` to change this default behavior. See
|
||
<<index-wait-for-active-shards>>.
|
||
|
||
`_shards.failed`::
|
||
An array that contains replication-related errors in the case an index operation
|
||
failed on a replica shard. 0 indicates there were no failures.
|
||
|
||
`_index`::
|
||
The name of the index the document was added to.
|
||
|
||
`_type`::
|
||
The document type. {es} indices now support a single document type, `_doc`.
|
||
|
||
`_id`::
|
||
The unique identifier for the added document.
|
||
|
||
`_version`::
|
||
The document version. Incremented each time the document is updated.
|
||
|
||
`_seq_no`::
|
||
The sequence number assigned to the document for the indexing operation.
|
||
Sequence numbers are used to ensure an older version of a document
|
||
doesn’t overwrite a newer version. See <<optimistic-concurrency-control-index>>.
|
||
|
||
`_primary_term`::
|
||
The primary term assigned to the document for the indexing operation.
|
||
See <<optimistic-concurrency-control-index>>.
|
||
|
||
`result`::
|
||
The result of the indexing operation, `created` or `updated`.
|
||
|
||
[[docs-index-api-desc]]
|
||
==== {api-description-title}
|
||
|
||
You can index a new JSON document with the `_doc` or `_create` resource. Using
|
||
`_create` guarantees that the document is only indexed if it does not already
|
||
exist. To update an existing document, you must use the `_doc` resource.
|
||
|
||
[[index-creation]]
|
||
===== Create indices automatically
|
||
|
||
If the specified index does not already exist, by default the index operation
|
||
automatically creates it and applies any configured
|
||
<<indices-templates,index templates>>. If no mapping exists, the index opration
|
||
creates a dynamic mapping. By default, new fields and objects are
|
||
automatically added to the mapping if needed. For more information about field
|
||
mapping, see <<mapping,mapping>> and the <<indices-put-mapping,put mapping>> API.
|
||
|
||
Automatic index creation is controlled by the `action.auto_create_index`
|
||
setting. This setting defaults to `true`, which allows any index to be created
|
||
automatically. You can modify this setting to explicitly allow or block
|
||
automatic creation of indices that match specified patterns, or set it to
|
||
`false` to disable automatic index creation entirely. Specify a
|
||
comma-separated list of patterns you want to allow, or prefix each pattern with
|
||
`+` or `-` to indicate whether it should be allowed or blocked. When a list is
|
||
specified, the default behaviour is to disallow.
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT _cluster/settings
|
||
{
|
||
"persistent": {
|
||
"action.auto_create_index": "twitter,index10,-index1*,+ind*" <1>
|
||
}
|
||
}
|
||
|
||
PUT _cluster/settings
|
||
{
|
||
"persistent": {
|
||
"action.auto_create_index": "false" <2>
|
||
}
|
||
}
|
||
|
||
PUT _cluster/settings
|
||
{
|
||
"persistent": {
|
||
"action.auto_create_index": "true" <3>
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
<1> Allow auto-creation of indices called `twitter` or `index10`, block the
|
||
creation of indices that match the pattern `index1*`, and allow creation of
|
||
any other indices that match the `ind*` pattern. Patterns are matched in
|
||
the order specified.
|
||
|
||
<2> Disable automatic index creation entirely.
|
||
|
||
<3> Allow automatic creation of any index. This is the default.
|
||
|
||
[float]
|
||
[[operation-type]]
|
||
===== Put if absent
|
||
|
||
You can force a create operation by using the `_create` resource or
|
||
setting the `op_type` parameter to _create_. In this case,
|
||
the index operation fails if a document with the specified ID
|
||
already exists in the index.
|
||
|
||
[float]
|
||
===== Create document IDs automatically
|
||
|
||
If you don't specify a document ID when using POST, the `op_type` is
|
||
automatically set to `create` and the index operation generates a unique ID
|
||
for the document.
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
POST twitter/_doc/
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|
||
|
||
The API returns the following result:
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"_shards" : {
|
||
"total" : 2,
|
||
"failed" : 0,
|
||
"successful" : 2
|
||
},
|
||
"_index" : "twitter",
|
||
"_type" : "_doc",
|
||
"_id" : "W0tpsmIBdwcYyG50zbta",
|
||
"_version" : 1,
|
||
"_seq_no" : 0,
|
||
"_primary_term" : 1,
|
||
"result": "created"
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/W0tpsmIBdwcYyG50zbta/$body._id/ s/"successful" : 2/"successful" : 1/]
|
||
|
||
[float]
|
||
[[optimistic-concurrency-control-index]]
|
||
===== Optimistic concurrency control
|
||
|
||
Index operations can be made conditional and only be performed if the last
|
||
modification to the document was assigned the sequence number and primary
|
||
term specified by the `if_seq_no` and `if_primary_term` parameters. If a
|
||
mismatch is detected, the operation will result in a `VersionConflictException`
|
||
and a status code of 409. See <<optimistic-concurrency-control>> for more details.
|
||
|
||
[float]
|
||
[[index-routing]]
|
||
===== Routing
|
||
|
||
By default, shard placement ? or `routing` ? is controlled by using a
|
||
hash of the document's id value. For more explicit control, the value
|
||
fed into the hash function used by the router can be directly specified
|
||
on a per-operation basis using the `routing` parameter. For example:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
POST twitter/_doc?routing=kimchy
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|
||
|
||
In this example, the document is routed to a shard based on
|
||
the `routing` parameter provided: "kimchy".
|
||
|
||
When setting up explicit mapping, you can also use the `_routing` field
|
||
to direct the index operation to extract the routing value from the
|
||
document itself. This does come at the (very minimal) cost of an
|
||
additional document parsing pass. If the `_routing` mapping is defined
|
||
and set to be `required`, the index operation will fail if no routing
|
||
value is provided or extracted.
|
||
|
||
[float]
|
||
[[index-distributed]]
|
||
===== Distributed
|
||
|
||
The index operation is directed to the primary shard based on its route
|
||
(see the Routing section above) and performed on the actual node
|
||
containing this shard. After the primary shard completes the operation,
|
||
if needed, the update is distributed to applicable replicas.
|
||
|
||
[float]
|
||
[[index-wait-for-active-shards]]
|
||
===== Active shards
|
||
|
||
To improve the resiliency of writes to the system, indexing operations
|
||
can be configured to wait for a certain number of active shard copies
|
||
before proceeding with the operation. If the requisite number of active
|
||
shard copies are not available, then the write operation must wait and
|
||
retry, until either the requisite shard copies have started or a timeout
|
||
occurs. By default, write operations only wait for the primary shards
|
||
to be active before proceeding (i.e. `wait_for_active_shards=1`).
|
||
This default can be overridden in the index settings dynamically
|
||
by setting `index.write.wait_for_active_shards`. To alter this behavior
|
||
per operation, the `wait_for_active_shards` request parameter can be used.
|
||
|
||
Valid values are `all` or any positive integer up to the total number
|
||
of configured copies per shard in the index (which is `number_of_replicas+1`).
|
||
Specifying a negative value or a number greater than the number of
|
||
shard copies will throw an error.
|
||
|
||
For example, suppose we have a cluster of three nodes, `A`, `B`, and `C` and
|
||
we create an index `index` with the number of replicas set to 3 (resulting in
|
||
4 shard copies, one more copy than there are nodes). If we
|
||
attempt an indexing operation, by default the operation will only ensure
|
||
the primary copy of each shard is available before proceeding. This means
|
||
that even if `B` and `C` went down, and `A` hosted the primary shard copies,
|
||
the indexing operation would still proceed with only one copy of the data.
|
||
If `wait_for_active_shards` is set on the request to `3` (and all 3 nodes
|
||
are up), then the indexing operation will require 3 active shard copies
|
||
before proceeding, a requirement which should be met because there are 3
|
||
active nodes in the cluster, each one holding a copy of the shard. However,
|
||
if we set `wait_for_active_shards` to `all` (or to `4`, which is the same),
|
||
the indexing operation will not proceed as we do not have all 4 copies of
|
||
each shard active in the index. The operation will timeout
|
||
unless a new node is brought up in the cluster to host the fourth copy of
|
||
the shard.
|
||
|
||
It is important to note that this setting greatly reduces the chances of
|
||
the write operation not writing to the requisite number of shard copies,
|
||
but it does not completely eliminate the possibility, because this check
|
||
occurs before the write operation commences. Once the write operation
|
||
is underway, it is still possible for replication to fail on any number of
|
||
shard copies but still succeed on the primary. The `_shards` section of the
|
||
write operation's response reveals the number of shard copies on which
|
||
replication succeeded/failed.
|
||
|
||
[source,js]
|
||
--------------------------------------------------
|
||
{
|
||
"_shards" : {
|
||
"total" : 2,
|
||
"failed" : 0,
|
||
"successful" : 2
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// NOTCONSOLE
|
||
|
||
[float]
|
||
[[index-refresh]]
|
||
===== Refresh
|
||
|
||
Control when the changes made by this request are visible to search. See
|
||
<<docs-refresh,refresh>>.
|
||
|
||
[float]
|
||
[[index-noop]]
|
||
===== Noop updates
|
||
|
||
When updating a document using the index API a new version of the document is
|
||
always created even if the document hasn't changed. If this isn't acceptable
|
||
use the `_update` API with `detect_noop` set to true. This option isn't
|
||
available on the index API because the index API doesn't fetch the old source
|
||
and isn't able to compare it against the new source.
|
||
|
||
There isn't a hard and fast rule about when noop updates aren't acceptable.
|
||
It's a combination of lots of factors like how frequently your data source
|
||
sends updates that are actually noops and how many queries per second
|
||
Elasticsearch runs on the shard receiving the updates.
|
||
|
||
[float]
|
||
[[timeout]]
|
||
===== Timeout
|
||
|
||
The primary shard assigned to perform the index operation might not be
|
||
available when the index operation is executed. Some reasons for this
|
||
might be that the primary shard is currently recovering from a gateway
|
||
or undergoing relocation. By default, the index operation will wait on
|
||
the primary shard to become available for up to 1 minute before failing
|
||
and responding with an error. The `timeout` parameter can be used to
|
||
explicitly specify how long it waits. Here is an example of setting it
|
||
to 5 minutes:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/1?timeout=5m
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|
||
|
||
[float]
|
||
[[index-versioning]]
|
||
===== Versioning
|
||
|
||
Each indexed document is given a version number. By default,
|
||
internal versioning is used that starts at 1 and increments
|
||
with each update, deletes included. Optionally, the version number can be
|
||
set to an external value (for example, if maintained in a
|
||
database). To enable this functionality, `version_type` should be set to
|
||
`external`. The value provided must be a numeric, long value greater than or equal to 0,
|
||
and less than around 9.2e+18.
|
||
|
||
When using the external version type, the system checks to see if
|
||
the version number passed to the index request is greater than the
|
||
version of the currently stored document. If true, the document will be
|
||
indexed and the new version number used. If the value provided is less
|
||
than or equal to the stored document's version number, a version
|
||
conflict will occur and the index operation will fail. For example:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/1?version=2&version_type=external
|
||
{
|
||
"message" : "elasticsearch now has versioning support, double cool!"
|
||
}
|
||
--------------------------------------------------
|
||
// TEST[continued]
|
||
|
||
NOTE: Versioning is completely real time, and is not affected by the
|
||
near real time aspects of search operations. If no version is provided,
|
||
then the operation is executed without any version checks.
|
||
|
||
In the previous example, the operation will succeed since the supplied
|
||
version of 2 is higher than
|
||
the current document version of 1. If the document was already updated
|
||
and its version was set to 2 or higher, the indexing command will fail
|
||
and result in a conflict (409 http status code).
|
||
|
||
A nice side effect is that there is no need to maintain strict ordering
|
||
of async indexing operations executed as a result of changes to a source
|
||
database, as long as version numbers from the source database are used.
|
||
Even the simple case of updating the Elasticsearch index using data from
|
||
a database is simplified if external versioning is used, as only the
|
||
latest version will be used if the index operations arrive out of order for
|
||
whatever reason.
|
||
|
||
[float]
|
||
[[index-version-types]]
|
||
===== Version types
|
||
|
||
In addition to the `external` version type, Elasticsearch
|
||
also supports other types for specific use cases:
|
||
|
||
[[_version_types]]
|
||
`internal`:: Only index the document if the given version is identical to the version
|
||
of the stored document.
|
||
|
||
`external` or `external_gt`:: Only index the document if the given version is strictly higher
|
||
than the version of the stored document *or* if there is no existing document. The given
|
||
version will be used as the new version and will be stored with the new document. The supplied
|
||
version must be a non-negative long number.
|
||
|
||
`external_gte`:: Only index the document if the given version is *equal* or higher
|
||
than the version of the stored document. If there is no existing document
|
||
the operation will succeed as well. The given version will be used as the new version
|
||
and will be stored with the new document. The supplied version must be a non-negative long number.
|
||
|
||
NOTE: The `external_gte` version type is meant for special use cases and
|
||
should be used with care. If used incorrectly, it can result in loss of data.
|
||
There is another option, `force`, which is deprecated because it can cause
|
||
primary and replica shards to diverge.
|
||
|
||
[[docs-index-api-example]]
|
||
==== {api-examples-title}
|
||
|
||
Insert a JSON document into the `twitter` index with an `_id` of 1:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/1
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|
||
|
||
The API returns the following result:
|
||
|
||
[source,console-result]
|
||
--------------------------------------------------
|
||
{
|
||
"_shards" : {
|
||
"total" : 2,
|
||
"failed" : 0,
|
||
"successful" : 2
|
||
},
|
||
"_index" : "twitter",
|
||
"_type" : "_doc",
|
||
"_id" : "1",
|
||
"_version" : 1,
|
||
"_seq_no" : 0,
|
||
"_primary_term" : 1,
|
||
"result" : "created"
|
||
}
|
||
--------------------------------------------------
|
||
// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]
|
||
|
||
Use the `_create` resource to index a document into the `twitter` index if
|
||
no document with that ID exists:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT twitter/_create/1
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|
||
|
||
Set the `op_type` parameter to _create_ to index a document into the `twitter`
|
||
index if no document with that ID exists:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
PUT twitter/_doc/1?op_type=create
|
||
{
|
||
"user" : "kimchy",
|
||
"post_date" : "2009-11-15T14:12:12",
|
||
"message" : "trying out Elasticsearch"
|
||
}
|
||
--------------------------------------------------
|