[[docs-index_]]
== Index API

IMPORTANT: See <<removal-of-types>>.

The index API adds or updates a JSON document in a specific index,
making it searchable. The following example inserts the JSON document
into the "twitter" index with an id of 1:

[source,js]
--------------------------------------------------
PUT twitter/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

The result of the above index operation is:

[source,js]
--------------------------------------------------
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "1",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result" : "created"
}
--------------------------------------------------
// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]

The `_shards` header provides information about the replication process of the index operation:

`total`:: Indicates how many shard copies (primary and replica shards) the index operation should be executed on.

`successful`:: Indicates the number of shard copies the index operation succeeded on.

`failed`:: Indicates the number of shard copies the index operation failed on. When an index operation fails
on a replica shard, the replication-related errors are reported in a `failures` array.

The index operation is successful if `successful` is at least 1.

NOTE: Replica shards may not all be started when an indexing operation successfully returns (by default, only the
primary is required, but this behavior can be <<index-wait-for-active-shards,changed>>). In that case,
`total` will be equal to the total number of shard copies implied by the `number_of_replicas` setting, and
`successful` will be equal to the number of copies started (the primary plus any started replicas). If there
were no failures, `failed` will be 0.
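
A client usually has to decide what the `_shards` header means for its own write. The following is a minimal Python sketch, assuming the response has already been parsed into a `dict` (for example with `json.loads`). The `ShardsHeader` class and its method names are hypothetical helpers, not part of any Elasticsearch client.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ShardsHeader:
    """Hypothetical client-side view of the `_shards` section of a response."""

    total: int
    successful: int
    failed: int
    failures: list = field(default_factory=list)

    @classmethod
    def from_response(cls, response: dict[str, Any]) -> ShardsHeader:
        shards = response["_shards"]
        return cls(
            total=shards["total"],
            successful=shards["successful"],
            failed=shards["failed"],
            # `failures` is only present when some copy reported an error.
            failures=shards.get("failures", []),
        )

    def indexed(self) -> bool:
        # The write counts as successful once at least one copy accepted it.
        return self.successful >= 1

    def fully_replicated(self) -> bool:
        # Every configured copy was written and none reported a failure.
        return self.successful == self.total and self.failed == 0


# Single-node case: the replica is unassigned, so only the primary is written.
header = ShardsHeader.from_response(
    {"_shards": {"total": 2, "failed": 0, "successful": 1}, "result": "created"}
)
print(header.indexed(), header.fully_replicated())  # True False
```

Treating `successful < total` as a warning rather than an error follows the rule above; whether a partially replicated write is acceptable remains an application decision.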

[float]
[[index-creation]]
=== Automatic Index Creation

The index operation automatically creates an index if it does not already
exist, and applies any <<indices-templates,index templates>> that are
configured. The index operation also creates a dynamic mapping if one does not
already exist. By default, new fields and objects will automatically be added
to the mapping definition if needed. Check out the <<mapping,mapping>> section
for more information on mapping definitions, and the
<<indices-put-mapping,put mapping>> API for information about updating mappings
manually.

Automatic index creation is controlled by the `action.auto_create_index`
setting. This setting defaults to `true`, meaning that indices are always
automatically created. Automatic index creation can be permitted only for
indices matching certain patterns by changing the value of this setting to a
comma-separated list of these patterns. Patterns in the list can be prefixed
with `+` or `-` to explicitly permit or forbid matching indices. Finally, it
can be completely disabled by changing this setting to `false`.

[source,js]
--------------------------------------------------
PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "twitter,index10,-index1*,+ind*" <1>
    }
}

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "false" <2>
    }
}

PUT _cluster/settings
{
    "persistent": {
        "action.auto_create_index": "true" <3>
    }
}
--------------------------------------------------
// CONSOLE

<1> Permit only the auto-creation of indices called `twitter`, `index10`, no
other index matching `index1*`, and any other index matching `ind*`. The
patterns are matched in the order in which they are given.

<2> Completely disable the auto-creation of indices.

<3> Permit the auto-creation of indices with any name. This is the default.
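
The matching rules can be modelled in a few lines. This Python sketch illustrates the semantics described above and is not Elasticsearch's own code: it treats `*` as the only wildcard, lets the first matching pattern decide, and, as an assumption, refuses auto-creation for names that match no pattern in the list. `auto_create_allowed` is a hypothetical helper name.

```python
import re


def _wildcard_match(pattern: str, name: str) -> bool:
    # Only `*` is a wildcard; every other character is matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def auto_create_allowed(setting: str, index: str) -> bool:
    """Sketch of the documented `action.auto_create_index` semantics."""
    value = setting.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        allow = not pattern.startswith("-")  # `+` or no prefix permits
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if _wildcard_match(pattern, index):
            return allow  # the first matching pattern decides
    return False  # assumption: names matching no pattern are not auto-created


setting = "twitter,index10,-index1*,+ind*"
for name in ("twitter", "index10", "index11", "index2", "logs"):
    print(name, auto_create_allowed(setting, name))
# twitter True, index10 True, index11 False, index2 True, logs False
```

Because `index10` appears before `-index1*`, it is permitted even though the later pattern would forbid it; this is why the order of the patterns matters.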

[float]
[[operation-type]]
=== Operation Type

The index operation also accepts an `op_type` that can be used to force
a `create` operation, allowing for "put-if-absent" behavior. When
`create` is used, the index operation will fail if a document with that id
already exists in the index.

Here is an example of using the `op_type` parameter:

[source,js]
--------------------------------------------------
PUT twitter/_doc/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

Another option to specify `create` is to use the following URI:

[source,js]
--------------------------------------------------
PUT twitter/_create/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE
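
To make the put-if-absent contract concrete, here is a toy in-memory model in Python. `InMemoryIndex` and `VersionConflict` are hypothetical names; the sketch only mirrors the documented behavior (a `create` fails when the id already exists, a plain `index` overwrites) and ignores everything else the real operation does.

```python
from typing import Dict


class VersionConflict(Exception):
    """Stands in for the 409 conflict the API returns."""


class InMemoryIndex:
    """Hypothetical toy model of `op_type`; not how Elasticsearch stores data."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.versions: Dict[str, int] = {}

    def index(self, doc_id: str, source: dict, op_type: str = "index") -> str:
        if op_type not in ("index", "create"):
            raise ValueError(f"unknown op_type [{op_type}]")
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            # put-if-absent: never overwrite an existing document
            raise VersionConflict(f"[{doc_id}]: document already exists")
        self.docs[doc_id] = source
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return "updated" if exists else "created"


idx = InMemoryIndex()
print(idx.index("1", {"user": "kimchy"}, op_type="create"))  # created
try:
    # Equivalent in spirit to repeating PUT twitter/_create/1
    idx.index("1", {"user": "kimchy"}, op_type="create")
except VersionConflict as err:
    print("409", err)
```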

[float]
=== Automatic ID Generation

The index operation can be executed without specifying the id. In such a
case, an id will be generated automatically. In addition, the `op_type`
will automatically be set to `create`. Here is an example (note the
*POST* used instead of *PUT*):

[source,js]
--------------------------------------------------
POST twitter/_doc/
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

The result of the above index operation is:

[source,js]
--------------------------------------------------
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    },
    "_index" : "twitter",
    "_type" : "_doc",
    "_id" : "W0tpsmIBdwcYyG50zbta",
    "_version" : 1,
    "_seq_no" : 0,
    "_primary_term" : 1,
    "result": "created"
}
--------------------------------------------------
// TESTRESPONSE[s/W0tpsmIBdwcYyG50zbta/$body._id/ s/"successful" : 2/"successful" : 1/]
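
The generated `_id` in the response is a 20-character, URL-safe string. The Python sketch below imitates that shape with random bytes and shows the documented rule that an id-less request runs as a `create`. It is only a stand-in: Elasticsearch's own generator is time-based rather than random, and `generate_doc_id` and `resolve_id` are hypothetical helpers.

```python
import secrets
import string
from typing import Optional, Tuple

URL_SAFE_ALPHABET = set(string.ascii_letters + string.digits + "-_")


def generate_doc_id() -> str:
    # 15 random bytes encode to exactly 20 URL-safe Base64 characters, the
    # same length as "W0tpsmIBdwcYyG50zbta". Real IDs are time-based instead.
    return secrets.token_urlsafe(15)


def resolve_id(doc_id: Optional[str], op_type: str = "index") -> Tuple[str, str]:
    """Return the id and op_type a request would run with."""
    if doc_id is None:
        # No id supplied: generate one and force put-if-absent semantics.
        return generate_doc_id(), "create"
    return doc_id, op_type


new_id, op = resolve_id(None)
print(len(new_id), op)  # 20 create
```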

[float]
[[optimistic-concurrency-control-index]]
=== Optimistic concurrency control

Index operations can be made conditional and only be performed if the last
modification to the document was assigned the sequence number and primary
term specified by the `if_seq_no` and `if_primary_term` parameters. If a
mismatch is detected, the operation will result in a `VersionConflictEngineException`
and a status code of 409. See <<optimistic-concurrency-control>> for more details.
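
The check can be pictured with a toy single-shard model. In this Python sketch, `ToyShard`, `StoredDoc`, and `VersionConflict` are hypothetical names; every write takes the next sequence number, and a conditional write is rejected unless the stored document carries exactly the given `if_seq_no` and `if_primary_term`. As an assumption, the sketch also insists that the two parameters are supplied together.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredDoc:
    source: dict
    seq_no: int
    primary_term: int


class VersionConflict(Exception):
    status = 409


class ToyShard:
    """Hypothetical single-shard model of if_seq_no / if_primary_term checks."""

    def __init__(self, primary_term: int = 1) -> None:
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs: Dict[str, StoredDoc] = {}

    def index(self, doc_id: str, source: dict,
              if_seq_no: Optional[int] = None,
              if_primary_term: Optional[int] = None) -> StoredDoc:
        if (if_seq_no is None) != (if_primary_term is None):
            raise ValueError("if_seq_no and if_primary_term must be used together")
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if (current is None or current.seq_no != if_seq_no
                    or current.primary_term != if_primary_term):
                raise VersionConflict(f"[{doc_id}]: version conflict")
        doc = StoredDoc(source, self.next_seq_no, self.primary_term)
        self.next_seq_no += 1  # every write on the shard gets the next number
        self.docs[doc_id] = doc
        return doc


shard = ToyShard()
first = shard.index("1", {"user": "kimchy"})
shard.index("1", {"user": "kimchy", "likes": 1},
            if_seq_no=first.seq_no, if_primary_term=first.primary_term)
try:
    # A second writer still holding the old seq_no loses the race.
    shard.index("1", {"user": "someone else"}, if_seq_no=first.seq_no, if_primary_term=1)
except VersionConflict as err:
    print(err.status, err)  # 409 [1]: version conflict
```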

[float]
[[index-routing]]
=== Routing

By default, shard placement (or `routing`) is controlled by using a
hash of the document's id value. For more explicit control, the value
fed into the hash function used by the router can be directly specified
on a per-operation basis using the `routing` parameter. For example:

[source,js]
--------------------------------------------------
POST twitter/_doc?routing=kimchy
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE

In the example above, the "_doc" document is routed to a shard based on
the `routing` parameter provided: "kimchy".

When setting up explicit mapping, the `_routing` field can optionally be
used to direct the index operation to extract the routing value from the
document itself. This comes at the (very minimal) cost of an
additional document parsing pass. If the `_routing` mapping is defined
and set to be `required`, the index operation will fail if no routing
value is provided or extracted.
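
Routing follows the simplified formula `shard_num = hash(_routing) % num_primary_shards`, where `_routing` defaults to the document id. This Python sketch assumes CRC32 as a stand-in for Elasticsearch's Murmur3 hash, so the concrete shard numbers will not match a real cluster; `shard_for` and `route` are hypothetical helpers that only show the shape of the computation.

```python
import zlib
from typing import Optional


def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards, with a stand-in hash."""
    # CRC32 replaces Elasticsearch's Murmur3 here, so shard numbers differ
    # from a real cluster; only the structure of the formula is the same.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def route(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    # Without an explicit `routing` parameter, the document id is hashed.
    return shard_for(routing if routing is not None else doc_id, num_primary_shards)


# Two different documents with routing=kimchy always land on the same shard.
print(route("1", 5, routing="kimchy") == route("2", 5, routing="kimchy"))  # True
```

Because the result depends on the number of primary shards, changing that number would move documents; this is one reason the primary shard count is fixed when an index is created.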

[float]
[[index-distributed]]
=== Distributed

The index operation is directed to the primary shard based on its route
(see <<index-routing,Routing>>) and performed on the actual node
containing this shard. After the primary shard completes the operation,
if needed, the update is distributed to applicable replicas.

[float]
[[index-wait-for-active-shards]]
=== Wait For Active Shards

To improve the resiliency of writes to the system, indexing operations
can be configured to wait for a certain number of active shard copies
before proceeding with the operation. If the requisite number of active
shard copies is not available, then the write operation must wait and
retry, until either the requisite shard copies have started or a timeout
occurs. By default, write operations only wait for the primary shards
to be active before proceeding (i.e. `wait_for_active_shards=1`).
This default can be overridden in the index settings dynamically
by setting `index.write.wait_for_active_shards`. To alter this behavior
per operation, the `wait_for_active_shards` request parameter can be used.

Valid values are `all` or any positive integer up to the total number
of configured copies per shard in the index (which is `number_of_replicas+1`).
Specifying a negative value or a number greater than the number of
shard copies will throw an error.

For example, suppose we have a cluster of three nodes, `A`, `B`, and `C`, and
we create an index `index` with the number of replicas set to 3 (resulting in
4 shard copies, one more copy than there are nodes). If we
attempt an indexing operation, by default the operation will only ensure
the primary copy of each shard is available before proceeding. This means
that even if `B` and `C` went down, and `A` hosted the primary shard copies,
the indexing operation would still proceed with only one copy of the data.
If `wait_for_active_shards` is set on the request to `3` (and all 3 nodes
are up), then the indexing operation will require 3 active shard copies
before proceeding, a requirement which should be met because there are 3
active nodes in the cluster, each one holding a copy of the shard. However,
if we set `wait_for_active_shards` to `all` (or to `4`, which is the same),
the indexing operation will not proceed as we do not have all 4 copies of
each shard active in the index. The operation will time out
unless a new node is brought up in the cluster to host the fourth copy of
the shard.
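
The three-node example can be replayed with a small Python sketch. It encodes only the rules stated above (`all` means `number_of_replicas+1`, and other values must lie between 1 and that total) plus the fact that a node holds at most one copy of a given shard; `required_active_copies` and `can_proceed` are hypothetical helper names, and the real check happens inside Elasticsearch, not in a client.

```python
from typing import Union

WaitValue = Union[int, str]


def required_active_copies(wait_for_active_shards: WaitValue, number_of_replicas: int) -> int:
    """Translate a wait_for_active_shards value into a number of copies."""
    total_copies = number_of_replicas + 1  # primary plus replicas
    if wait_for_active_shards == "all":
        return total_copies
    value = wait_for_active_shards
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"invalid wait_for_active_shards [{value}]")
    if not 1 <= value <= total_copies:
        # Negative values and values above the number of copies are rejected.
        raise ValueError(
            f"wait_for_active_shards [{value}] must be between 1 and {total_copies}"
        )
    return value


def can_proceed(wait_for_active_shards: WaitValue, number_of_replicas: int,
                active_copies: int) -> bool:
    """True if the write may start now; False means it waits and may time out."""
    return active_copies >= required_active_copies(wait_for_active_shards,
                                                   number_of_replicas)


# Three nodes, 3 replicas: 4 copies are configured, but at most 3 can be active.
for wait in (1, 3, "all", 4):
    print(wait, can_proceed(wait, number_of_replicas=3, active_copies=3))
# 1 True, 3 True, all False, 4 False
```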

It is important to note that this setting greatly reduces the chances of
the write operation not writing to the requisite number of shard copies,
but it does not completely eliminate the possibility, because this check
occurs before the write operation commences. Once the write operation
is underway, it is still possible for replication to fail on any number of
shard copies but still succeed on the primary. The `_shards` section of the
write operation's response reveals the number of shard copies on which
replication succeeded or failed.

[source,js]
--------------------------------------------------
{
    "_shards" : {
        "total" : 2,
        "failed" : 0,
        "successful" : 2
    }
}
--------------------------------------------------
// NOTCONSOLE

[float]
[[index-refresh]]
=== Refresh

Control when the changes made by this request are visible to search. See
<<docs-refresh,refresh>>.

[float]
[[index-noop]]
=== Noop Updates

When updating a document using the index API, a new version of the document is
always created even if the document hasn't changed. If this isn't acceptable,
use the `_update` API with `detect_noop` set to true. This option isn't
available on the index API because the index API doesn't fetch the old source
and isn't able to compare it against the new source.

There isn't a hard and fast rule about when noop updates aren't acceptable.
It depends on a combination of factors, such as how frequently your data source
sends updates that are actually noops and how many queries per second
Elasticsearch runs on the shard receiving the updates.
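
The difference between the two APIs comes down to whether the old source is read before writing. In this Python sketch, `index_api` and `update_api` are hypothetical stand-ins that return `(source, version, result)`; the update side uses a shallow dictionary merge for brevity, whereas the real `_update` API merges nested objects too.

```python
from __future__ import annotations


def index_api(stored_version: int, new_source: dict) -> tuple[dict, int, str]:
    # The index API never reads the stored source, so it cannot tell that
    # nothing changed: every call produces a new version.
    return new_source, stored_version + 1, "updated"


def update_api(stored_source: dict, stored_version: int, partial_doc: dict,
               detect_noop: bool) -> tuple[dict, int, str]:
    # Shallow merge for brevity; the real update API merges objects recursively.
    merged = {**stored_source, **partial_doc}
    if detect_noop and merged == stored_source:
        return stored_source, stored_version, "noop"
    return merged, stored_version + 1, "updated"


doc = {"user": "kimchy", "message": "trying out Elasticsearch"}
print(index_api(1, dict(doc))[1:])                                   # (2, 'updated')
print(update_api(doc, 1, {"user": "kimchy"}, detect_noop=True)[1:])  # (1, 'noop')
```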

[float]
[[timeout]]
=== Timeout

The primary shard assigned to perform the index operation might not be
available when the index operation is executed. Some reasons for this
might be that the primary shard is currently recovering from a gateway
or undergoing relocation. By default, the index operation will wait on
the primary shard to become available for up to 1 minute before failing
and responding with an error. The `timeout` parameter can be used to
explicitly specify how long it waits. Here is an example of setting it
to 5 minutes:

[source,js]
--------------------------------------------------
PUT twitter/_doc/1?timeout=5m
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
--------------------------------------------------
// CONSOLE
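
The waiting behavior can be pictured as a deadline loop. This Python sketch assumes a caller-supplied `is_available` check and polls it; the polling is a simplification (Elasticsearch reacts to cluster state changes rather than polling), `wait_for_primary` and `parse_time_value` are hypothetical helpers, and only a subset of time units is parsed. The injectable `clock` and `sleep` arguments make the loop testable without real delays.

```python
import time
from typing import Callable

# Subset of time units, in seconds. "ms" is checked before "s" and "m".
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_time_value(value: str) -> float:
    """Convert strings such as "5m" or "250ms" to seconds."""
    for suffix in ("ms", "s", "m", "h", "d"):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * _UNIT_SECONDS[suffix]
    raise ValueError(f"unsupported time value [{value}]")


def wait_for_primary(is_available: Callable[[], bool], timeout: str = "1m",
                     poll_interval_s: float = 0.5,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Return once is_available() is true; raise if the timeout elapses first."""
    deadline = clock() + parse_time_value(timeout)  # default mirrors the 1m above
    while not is_available():
        if clock() >= deadline:
            raise TimeoutError(f"primary shard not available after [{timeout}]")
        sleep(poll_interval_s)
```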

[float]
[[index-versioning]]
=== Versioning

Each indexed document is given a version number. By default,
internal versioning is used: the version starts at 1 and increments
with each update, deletes included. Optionally, the version number can be
set to an external value (for example, if maintained in a
database). To enable this functionality, `version_type` should be set to
`external`. The value provided must be a numeric, long value greater than or equal to 0,
and less than around 9.2e+18.

When using the external version type, the system checks to see if
the version number passed to the index request is greater than the
version of the currently stored document. If true, the document will be
indexed and the new version number used. If the value provided is less
than or equal to the stored document's version number, a version
conflict will occur and the index operation will fail. For example:

[source,js]
--------------------------------------------------
PUT twitter/_doc/1?version=2&version_type=external
{
    "message" : "elasticsearch now has versioning support, double cool!"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

The above will succeed since the supplied version of 2 is higher than
the current document version of 1. If the document was already updated
and its version was set to 2 or higher, the indexing command will fail
and result in a conflict (409 HTTP status code).

NOTE: Versioning is completely real time, and is not affected by the
near real time aspects of search operations. If no version is provided,
then the operation is executed without any version checks.

A nice side effect of external versioning is that there is no need to
maintain strict ordering of async indexing operations executed as a result
of changes to a source database, as long as version numbers from the source
database are used. Even the simple case of updating the Elasticsearch index
using data from a database is simplified if external versioning is used, as
only the latest version will be used if the index operations arrive out of
order for whatever reason.
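
The out-of-order guarantee follows directly from the "strictly greater" rule. This Python sketch models an index with a plain `dict` and a hypothetical `apply_external` helper that returns `False` where the real API would answer with a 409; whatever order the three changes arrive in, the highest version is the one that ends up stored.

```python
import random


def apply_external(store: dict, doc_id: str, version: int, source: dict) -> bool:
    """Index with version_type=external; False stands in for a 409 conflict."""
    current = store.get(doc_id)
    if current is not None and version <= current[0]:
        return False  # stale or duplicate version: keep what is stored
    store[doc_id] = (version, source)
    return True


# Changes captured from a source database, each tagged with its row version.
changes = [(1, {"message": "v1"}), (2, {"message": "v2"}), (3, {"message": "v3"})]
random.shuffle(changes)  # simulate asynchronous, out-of-order delivery

store: dict = {}
for version, source in changes:
    apply_external(store, "1", version, source)

print(store["1"])  # (3, {'message': 'v3'}) regardless of arrival order
```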

[float]
==== Version types

In addition to the `external` version type explained above, Elasticsearch
also supports other types for specific use cases. Here is an overview of
the different version types and their semantics.

`internal`:: Only index the document if the given version is identical to the version
of the stored document.

`external` or `external_gt`:: Only index the document if the given version is strictly higher
than the version of the stored document *or* if there is no existing document. The given
version will be used as the new version and will be stored with the new document. The supplied
version must be a non-negative long number.

`external_gte`:: Only index the document if the given version is *equal* to or higher
than the version of the stored document. If there is no existing document,
the operation will succeed as well. The given version will be used as the new version
and will be stored with the new document. The supplied version must be a non-negative long number.

NOTE: The `external_gte` version type is meant for special use cases and
should be used with care. If used incorrectly, it can result in loss of data.
There is another option, `force`, which is deprecated because it can cause
primary and replica shards to diverge.
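
The rules for each version type can be collected into one decision function. In this Python sketch, `should_index` is a hypothetical helper: `stored` is the version of the existing document (or `None` if there is none), and `False` corresponds to a 409 conflict. Two points are assumptions: "non-negative long" is read as `0` to `2**63 - 1` (Java's `Long.MAX_VALUE`), and `internal` with no stored document is treated as a conflict. The deprecated `force` type is deliberately left out.

```python
from typing import Optional

MAX_LONG = 2**63 - 1  # assumed bound: Java Long.MAX_VALUE


def should_index(version_type: str, given: int, stored: Optional[int]) -> bool:
    """Return True if an index request with version `given` may proceed."""
    if version_type == "internal":
        # Exact match against the stored document's version (assumption:
        # no stored document means there is nothing to match).
        return stored is not None and given == stored
    if version_type in ("external", "external_gt", "external_gte"):
        if not 0 <= given <= MAX_LONG:
            raise ValueError("external versions must be non-negative longs")
        if stored is None:
            return True  # no existing document: always accepted
        if version_type == "external_gte":
            return given >= stored
        return given > stored  # external and external_gt are strictly greater
    # `force` is deprecated and intentionally not modelled.
    raise ValueError(f"unsupported version_type [{version_type}]")
```

The only difference between `external_gt` and `external_gte` is the equality case, which is exactly what lets `external_gte` overwrite a document with the same version and why it calls for care.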