375 lines
14 KiB
Plaintext
375 lines
14 KiB
Plaintext
[[docs-index_]]
|
|
== Index API
|
|
|
|
The index API adds or updates a typed JSON document in a specific index,
|
|
making it searchable. The following example inserts the JSON document
|
|
into the "twitter" index, under a type called "tweet" with an id of 1:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter/tweet/1
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
The result of the above index operation is:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_shards" : {
|
|
"total" : 2,
|
|
"failed" : 0,
|
|
"successful" : 2
|
|
},
|
|
"_index" : "twitter",
|
|
"_type" : "tweet",
|
|
"_id" : "1",
|
|
"_version" : 1,
|
|
"created" : true,
|
|
"forced_refresh": false
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/"successful" : 2/"successful" : 1/]
|
|
|
|
The `_shards` header provides information about the replication process of the index operation.
|
|
|
|
* `total` - Indicates to how many shard copies (primary and replica shards) the index operation should be executed on.
|
|
* `successful`- Indicates the number of shard copies the index operation succeeded on.
|
|
* `failures` - An array that contains replication related errors in the case an index operation failed on a replica shard.
|
|
|
|
The index operation is successful in the case `successful` is at least 1.
|
|
|
|
NOTE: Replica shards may not all be started when an indexing operation successfully returns (by default, a quorum is
|
|
required). In that case, `total` will be equal to the total shards based on the index replica settings and
|
|
`successful` will be equal to the number of shards started (primary plus replicas). As there were no failures,
|
|
the `failed` will be 0.
|
|
|
|
[float]
|
|
[[index-creation]]
|
|
=== Automatic Index Creation
|
|
|
|
The index operation automatically creates an index if it has not been
|
|
created before (check out the
|
|
<<indices-create-index,create index API>> for manually
|
|
creating an index), and also automatically creates a
|
|
dynamic type mapping for the specific type if one has not yet been
|
|
created (check out the <<indices-put-mapping,put mapping>>
|
|
API for manually creating a type mapping).
|
|
|
|
The mapping itself is very flexible and is schema-free. New fields and
|
|
objects will automatically be added to the mapping definition of the
|
|
type specified. Check out the <<mapping,mapping>>
|
|
section for more information on mapping definitions.
|
|
|
|
Automatic index creation can be disabled by setting
|
|
`action.auto_create_index` to `false` in the config file of all nodes.
|
|
Automatic mapping creation can be disabled by setting
|
|
`index.mapper.dynamic` to `false` in the config files of all nodes (or
|
|
on the specific index settings).
|
|
|
|
Automatic index creation can include a pattern based white/black list,
|
|
for example, set `action.auto_create_index` to `+aaa*,-bbb*,+ccc*,-*` (+
|
|
meaning allowed, and - meaning disallowed).
|
|
|
|
[float]
|
|
[[index-versioning]]
|
|
=== Versioning
|
|
|
|
Each indexed document is given a version number. The associated
|
|
`version` number is returned as part of the response to the index API
|
|
request. The index API optionally allows for
|
|
http://en.wikipedia.org/wiki/Optimistic_concurrency_control[optimistic
|
|
concurrency control] when the `version` parameter is specified. This
|
|
will control the version of the document the operation is intended to be
|
|
executed against. A good example of a use case for versioning is
|
|
performing a transactional read-then-update. Specifying a `version` from
|
|
the document initially read ensures no changes have happened in the
|
|
meantime (when reading in order to update, it is recommended to set
|
|
`preference` to `_primary`). For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter/tweet/1?version=2
|
|
{
|
|
"message" : "elasticsearch now has versioning support, double cool!"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
// TEST[catch: conflict]
|
|
|
|
*NOTE:* versioning is completely real time, and is not affected by the
|
|
near real time aspects of search operations. If no version is provided,
|
|
then the operation is executed without any version checks.
|
|
|
|
By default, internal versioning is used that starts at 1 and increments
|
|
with each update, deletes included. Optionally, the version number can be
|
|
supplemented with an external value (for example, if maintained in a
|
|
database). To enable this functionality, `version_type` should be set to
|
|
`external`. The value provided must be a numeric, long value greater or equal to 0,
|
|
and less than around 9.2e+18. When using the external version type, instead
|
|
of checking for a matching version number, the system checks to see if
|
|
the version number passed to the index request is greater than the
|
|
version of the currently stored document. If true, the document will be
|
|
indexed and the new version number used. If the value provided is less
|
|
than or equal to the stored document's version number, a version
|
|
conflict will occur and the index operation will fail.
|
|
|
|
WARNING: External versioning supports the value 0 as a valid version number.
|
|
This allows the version to be in sync with an external versioning system
|
|
where version numbers start from zero instead of one. It has the side effect
|
|
that documents with version number equal to zero cannot neither be updated
|
|
using the <<docs-update-by-query,Update-By-Query API>> nor be deleted
|
|
using the <<docs-delete-by-query,Delete By Query API>> as long as their
|
|
version number is equal to zero.
|
|
|
|
A nice side effect is that there is no need to maintain strict ordering
|
|
of async indexing operations executed as a result of changes to a source
|
|
database, as long as version numbers from the source database are used.
|
|
Even the simple case of updating the elasticsearch index using data from
|
|
a database is simplified if external versioning is used, as only the
|
|
latest version will be used if the index operations are out of order for
|
|
whatever reason.
|
|
|
|
[float]
|
|
==== Version types
|
|
|
|
Next to the `internal` & `external` version types explained above, Elasticsearch
|
|
also supports other types for specific use cases. Here is an overview of
|
|
the different version types and their semantics.
|
|
|
|
`internal`:: only index the document if the given version is identical to the version
|
|
of the stored document.
|
|
|
|
`external` or `external_gt`:: only index the document if the given version is strictly higher
|
|
than the version of the stored document *or* if there is no existing document. The given
|
|
version will be used as the new version and will be stored with the new document. The supplied
|
|
version must be a non-negative long number.
|
|
|
|
`external_gte`:: only index the document if the given version is *equal* or higher
|
|
than the version of the stored document. If there is no existing document
|
|
the operation will succeed as well. The given version will be used as the new version
|
|
and will be stored with the new document. The supplied version must be a non-negative long number.
|
|
|
|
`force`:: the document will be indexed regardless of the version of the stored document or if there
|
|
is no existing document. The given version will be used as the new version and will be stored
|
|
with the new document. This version type is typically used for correcting errors.
|
|
|
|
*NOTE*: The `external_gte` & `force` version types are meant for special use cases and should be used
|
|
with care. If used incorrectly, they can result in loss of data.
|
|
|
|
[float]
|
|
[[operation-type]]
|
|
=== Operation Type
|
|
|
|
The index operation also accepts an `op_type` that can be used to force
|
|
a `create` operation, allowing for "put-if-absent" behavior. When
|
|
`create` is used, the index operation will fail if a document by that id
|
|
already exists in the index.
|
|
|
|
Here is an example of using the `op_type` parameter:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter/tweet/1?op_type=create
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
Another option to specify `create` is to use the following uri:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter/tweet/1/_create
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
[float]
|
|
=== Automatic ID Generation
|
|
|
|
The index operation can be executed without specifying the id. In such a
|
|
case, an id will be generated automatically. In addition, the `op_type`
|
|
will automatically be set to `create`. Here is an example (note the
|
|
*POST* used instead of *PUT*):
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST twitter/tweet/
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
The result of the above index operation is:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
{
|
|
"_shards" : {
|
|
"total" : 2,
|
|
"failed" : 0,
|
|
"successful" : 2
|
|
},
|
|
"_index" : "twitter",
|
|
"_type" : "tweet",
|
|
"_id" : "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
|
|
"_version" : 1,
|
|
"created" : true,
|
|
"forced_refresh": false
|
|
}
|
|
--------------------------------------------------
|
|
// TESTRESPONSE[s/6a8ca01c-7896-48e9-81cc-9f70661fcb32/$body._id/ s/"successful" : 2/"successful" : 1/]
|
|
|
|
[float]
|
|
[[index-routing]]
|
|
=== Routing
|
|
|
|
By default, shard placement — or `routing` — is controlled by using a
|
|
hash of the document's id value. For more explicit control, the value
|
|
fed into the hash function used by the router can be directly specified
|
|
on a per-operation basis using the `routing` parameter. For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
POST twitter/tweet?routing=kimchy
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
In the example above, the "tweet" document is routed to a shard based on
|
|
the `routing` parameter provided: "kimchy".
|
|
|
|
When setting up explicit mapping, the `_routing` field can be optionally
|
|
used to direct the index operation to extract the routing value from the
|
|
document itself. This does come at the (very minimal) cost of an
|
|
additional document parsing pass. If the `_routing` mapping is defined
|
|
and set to be `required`, the index operation will fail if no routing
|
|
value is provided or extracted.
|
|
|
|
[float]
|
|
[[parent-children]]
|
|
=== Parents & Children
|
|
|
|
A child document can be indexed by specifying its parent when indexing.
|
|
For example:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT blogs
|
|
{
|
|
"mappings": {
|
|
"tag_parent": {},
|
|
"blog_tag": {
|
|
"_parent": {
|
|
"type": "tag_parent"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT blogs/blog_tag/1122?parent=1111
|
|
{
|
|
"tag" : "something"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
When indexing a child document, the routing value is automatically set
|
|
to be the same as its parent, unless the routing value is explicitly
|
|
specified using the `routing` parameter.
|
|
|
|
[float]
|
|
[[index-distributed]]
|
|
=== Distributed
|
|
|
|
The index operation is directed to the primary shard based on its route
|
|
(see the Routing section above) and performed on the actual node
|
|
containing this shard. After the primary shard completes the operation,
|
|
if needed, the update is distributed to applicable replicas.
|
|
|
|
[float]
|
|
[[index-consistency]]
|
|
=== Write Consistency
|
|
|
|
To prevent writes from taking place on the "wrong" side of a network
|
|
partition, by default, index operations only succeed if a quorum
|
|
(>replicas/2+1) of active shards are available. This default can be
|
|
overridden on a node-by-node basis using the `action.write_consistency`
|
|
setting. To alter this behavior per-operation, the `consistency` request
|
|
parameter can be used.
|
|
|
|
Valid write consistency values are `one`, `quorum`, and `all`.
|
|
|
|
Note, for the case where the number of replicas is 1 (total of 2 copies
|
|
of the data), then the default behavior is to succeed if 1 copy (the primary)
|
|
can perform the write.
|
|
|
|
The index operation only returns after all *active* shards within the
|
|
replication group have indexed the document (sync replication).
|
|
|
|
[float]
|
|
[[index-refresh]]
|
|
=== Refresh
|
|
|
|
Control when the changes made by this request are visible to search. See
|
|
<<docs-refresh>>.
|
|
|
|
[float]
|
|
[[index-noop]]
|
|
=== Noop Updates
|
|
|
|
When updating a document using the index api a new version of the document is
|
|
always created even if the document hasn't changed. If this isn't acceptable
|
|
use the `_update` api with `detect_noop` set to true. This option isn't
|
|
available on the index api because the index api doesn't fetch the old source
|
|
and isn't able to compare it against the new source.
|
|
|
|
There isn't a hard and fast rule about when noop updates aren't acceptable.
|
|
It's a combination of lots of factors like how frequently your data source
|
|
sends updates that are actually noops and how many queries per second
|
|
elasticsearch runs on the shard with receiving the updates.
|
|
|
|
[float]
|
|
[[timeout]]
|
|
=== Timeout
|
|
|
|
The primary shard assigned to perform the index operation might not be
|
|
available when the index operation is executed. Some reasons for this
|
|
might be that the primary shard is currently recovering from a gateway
|
|
or undergoing relocation. By default, the index operation will wait on
|
|
the primary shard to become available for up to 1 minute before failing
|
|
and responding with an error. The `timeout` parameter can be used to
|
|
explicitly specify how long it waits. Here is an example of setting it
|
|
to 5 minutes:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT twitter/tweet/1?timeout=5m
|
|
{
|
|
"user" : "kimchy",
|
|
"post_date" : "2009-11-15T14:12:12",
|
|
"message" : "trying out Elasticsearch"
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|