Document Seq No powered optimistic concurrency control (#37284)

Add documentation to describe the new sequence number powered optimistic concurrency control

Relates #36148
Relates #10708
Boaz Leskes 2019-01-11 07:59:15 -08:00 committed by GitHub
parent 1eba1d1df9
commit cae71cddfe
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 227 additions and 83 deletions


@ -50,3 +50,5 @@ include::docs/termvectors.asciidoc[]
include::docs/multi-termvectors.asciidoc[]
include::docs/refresh.asciidoc[]
include::docs/concurrency-control.asciidoc[]


@ -197,6 +197,17 @@ size for your particular workload.
If using the HTTP API, make sure that the client does not send HTTP
chunks, as this will slow things down.
[float]
[[bulk-optimistic-concurrency-control]]
=== Optimistic Concurrency Control
Each `index` and `delete` action within a bulk API call may include the
`if_seq_no` and `if_primary_term` parameters in their respective action
and metadata lines. The `if_seq_no` and `if_primary_term` parameters control
how operations are executed, based on the last modification to existing
documents. See <<optimistic-concurrency-control>> for more details.
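As a rough illustration, the following Python sketch builds a bulk request body in which each action line carries `if_seq_no` and `if_primary_term`. It only serializes the newline-delimited JSON; sending it to the `_bulk` endpoint is left out. The second document ID and its sequence number and primary term are made-up values, and depending on the Elasticsearch version a `_type` entry may also be needed in each action line.

```python
import json


def bulk_body(actions):
    """Serialize (action_metadata, source_or_None) pairs as newline-delimited JSON.

    Delete actions have no source line, so pass None as their source.
    """
    lines = []
    for meta, source in actions:
        lines.append(json.dumps(meta))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    # Only index if the last change to document 1567 had seq_no 362 in primary term 2.
    ({"index": {"_index": "products", "_id": "1567",
                "if_seq_no": 362, "if_primary_term": 2}},
     {"product": "r2d2", "details": "A resourceful astromech droid", "tags": ["droid"]}),
    # Conditional delete of a hypothetical document; no source line follows it.
    ({"delete": {"_index": "products", "_id": "1568",
                 "if_seq_no": 10, "if_primary_term": 1}},
     None),
])
print(body)
```

Each action is checked on its own, so a conflict on one action does not stop the others in the same bulk request from being executed.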
[float]
[[bulk-versioning]]
=== Versioning


@ -0,0 +1,114 @@
[[optimistic-concurrency-control]]
== Optimistic concurrency control
Elasticsearch is distributed. When documents are created, updated, or deleted,
the new version of the document has to be replicated to other nodes in the cluster.
Elasticsearch is also asynchronous and concurrent, meaning that these replication
requests are sent in parallel, and may arrive at their destination out of sequence.
Elasticsearch needs a way of ensuring that an older version of a document never
overwrites a newer version.
To ensure an older version of a document doesn't overwrite a newer version, every
operation performed on a document is assigned a sequence number by the primary
shard that coordinates that change. The sequence number is increased with each
operation and thus newer operations are guaranteed to have a higher sequence
number than older operations. Elasticsearch can then use the sequence number of
operations to make sure a newer document version is never overridden by
a change that has a smaller sequence number assigned to it.
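The ordering rule can be pictured with a deliberately simplified model. The Python sketch below is not how Elasticsearch is implemented: it ignores primary terms, deletes, and everything else a real shard tracks, and `Op` and `ReplicaModel` are hypothetical names. It only shows that comparing sequence numbers lets a copy of the data ignore a stale operation that arrives late.

```python
from dataclasses import dataclass


@dataclass
class Op:
    doc_id: str
    seq_no: int  # assigned by the primary, increasing with each operation
    source: dict


class ReplicaModel:
    """Hypothetical, heavily simplified copy of a shard."""

    def __init__(self):
        self.docs = {}  # doc_id -> (seq_no, source)

    def apply(self, op):
        current = self.docs.get(op.doc_id)
        # Skip the operation if a change with an equal or higher seq_no is already applied.
        if current is not None and current[0] >= op.seq_no:
            return False
        self.docs[op.doc_id] = (op.seq_no, op.source)
        return True


older = Op("1567", 362, {"details": "old"})
newer = Op("1567", 363, {"details": "new"})

replica = ReplicaModel()
# The replication requests arrive out of sequence: the newer one first.
replica.apply(newer)
replica.apply(older)  # ignored, because 362 < 363
print(replica.docs["1567"])  # (363, {'details': 'new'})
```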
For example, the following indexing command will create a document and assign it
an initial sequence number and primary term:
[source,js]
--------------------------------------------------
PUT products/_doc/1567
{
"product" : "r2d2",
"details" : "A resourceful astromech droid"
}
--------------------------------------------------
// CONSOLE
You can see the assigned sequence number and primary term in the
`_seq_no` and `_primary_term` fields of the response:
[source,js]
--------------------------------------------------
{
"_shards" : {
"total" : 2,
"failed" : 0,
"successful" : 1
},
"_index" : "products",
"_type" : "_doc",
"_id" : "1567",
"_version" : 1,
"_seq_no" : 362,
"_primary_term" : 2,
"result" : "created"
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 2/"_primary_term" : $body._primary_term/]
Elasticsearch keeps track of the sequence number and primary term of the last
operation to have changed each of the documents it stores. The sequence number
and primary term are returned in the `_seq_no` and `_primary_term` fields in
the response of the <<docs-get,GET API>>:
[source,js]
--------------------------------------------------
GET products/_doc/1567
--------------------------------------------------
// CONSOLE
// TEST[continued]
returns:
[source,js]
--------------------------------------------------
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1567",
"_version" : 1,
"_seq_no" : 362,
"_primary_term" : 2,
"found": true,
"_source" : {
"product" : "r2d2",
"details" : "A resourceful astromech droid"
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 2/"_primary_term" : $body._primary_term/]
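A client that wants to make a conditional change needs to hold on to these two values. As a sketch, assuming the response has already been parsed into a Python dict, the following builds the path of a conditional index request from the GET response shown above. The helper name `conditional_index_path` is hypothetical, and the HTTP call itself is omitted.

```python
import json
from urllib.parse import urlencode

# The GET response shown above, as a client would receive it.
get_response = json.loads("""
{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "1567",
  "_version" : 1,
  "_seq_no" : 362,
  "_primary_term" : 2,
  "found": true,
  "_source" : {"product" : "r2d2", "details" : "A resourceful astromech droid"}
}
""")


def conditional_index_path(resp):
    """Build an index request path that only succeeds if the document is unchanged."""
    params = urlencode({
        "if_seq_no": resp["_seq_no"],
        "if_primary_term": resp["_primary_term"],
    })
    return f'{resp["_index"]}/{resp["_type"]}/{resp["_id"]}?{params}'


print(conditional_index_path(get_response))
# products/_doc/1567?if_seq_no=362&if_primary_term=2
```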
NOTE: The <<search-search,Search API>> can return the `_seq_no` and `_primary_term`
for each search hit by requesting the `_seq_no` and `_primary_term` <<search-request-docvalue-fields,Doc Value Fields>>.
The sequence number and the primary term uniquely identify a change. By noting down
the sequence number and primary term returned, you can make sure to only change the
document if no other change was made to it since you retrieved it. This
is done by setting the `if_seq_no` and `if_primary_term` parameters of either the
<<docs-index_,Index API>> or the <<docs-delete,Delete API>>.
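The check behind `if_seq_no` and `if_primary_term` behaves like a compare-and-set. The following Python sketch models it for a single shard under stated simplifications: `DocStoreModel` and `VersionConflict` are hypothetical names, a conflict is raised as an exception where Elasticsearch would answer with a 409 status code, and replication is not modelled at all. Requiring both parameters together is an assumption of this model.

```python
class VersionConflict(Exception):
    """Stands in for the 409 conflict response (hypothetical name)."""


class DocStoreModel:
    """Hypothetical single-shard model of if_seq_no / if_primary_term checks."""

    def __init__(self, primary_term=1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (seq_no, primary_term, source)

    def _check(self, doc_id, if_seq_no, if_primary_term):
        if if_seq_no is None and if_primary_term is None:
            return  # unconditional operation: no check
        if if_seq_no is None or if_primary_term is None:
            raise ValueError("this model requires if_seq_no and if_primary_term together")
        current = self.docs.get(doc_id)
        # Conflict if the document is gone or was changed by a later operation.
        if current is None or current[:2] != (if_seq_no, if_primary_term):
            raise VersionConflict(f"[{doc_id}]: version conflict (409)")

    def _next(self):
        # Every write that passes the check consumes the next sequence number.
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        return seq_no

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        self._check(doc_id, if_seq_no, if_primary_term)
        seq_no = self._next()
        self.docs[doc_id] = (seq_no, self.primary_term, source)
        return {"_seq_no": seq_no, "_primary_term": self.primary_term}

    def delete(self, doc_id, if_seq_no=None, if_primary_term=None):
        self._check(doc_id, if_seq_no, if_primary_term)
        self._next()
        self.docs.pop(doc_id, None)


store = DocStoreModel(primary_term=2)
first = store.index("1567", {"product": "r2d2"})
# Another client changes the document in the meantime.
store.index("1567", {"product": "r2d2", "tags": ["astromech"]})
try:
    # Based on the stale read, so it is rejected instead of silently dropping the tag.
    store.index("1567", {"product": "r2d2", "tags": ["droid"]},
                if_seq_no=first["_seq_no"], if_primary_term=first["_primary_term"])
except VersionConflict as err:
    print("rejected:", err)
```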
For example, the following indexing call will make sure to add a tag to the
document without losing any potential change to the description or the addition
of another tag by another API call:
[source,js]
--------------------------------------------------
PUT products/_doc/1567?if_seq_no=362&if_primary_term=2
{
"product" : "r2d2",
"details" : "A resourceful astromech droid",
"tags": ["droid"]
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[catch: conflict]
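When such a conditional request fails with a conflict, one common pattern (an application-side choice, not something Elasticsearch does for you) is to read the document again, reapply the change and retry. The sketch below uses a hypothetical in-memory `TinyStore` in place of the GET and Index APIs, and a test hook to simulate another client adding a tag between the read and the write.

```python
class Conflict(Exception):
    """Stands in for a 409 conflict response (hypothetical)."""


class TinyStore:
    """Hypothetical stand-in for one document behind the GET and Index APIs."""

    def __init__(self):
        self.seq_no, self.primary_term = 0, 1
        self.source = {"product": "r2d2", "tags": []}

    def get(self):
        # Return a copy so callers cannot modify the stored document in place.
        return {"_seq_no": self.seq_no, "_primary_term": self.primary_term,
                "_source": dict(self.source, tags=list(self.source["tags"]))}

    def index(self, source, if_seq_no, if_primary_term):
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            raise Conflict("version conflict (409)")
        self.seq_no += 1
        self.source = source


def add_tag(store, tag, max_attempts=3, before_write=None):
    """Read-modify-write with a conditional write; re-read and retry on conflict."""
    for _ in range(max_attempts):
        doc = store.get()  # 1. read, remembering _seq_no and _primary_term
        source = doc["_source"]
        if tag not in source["tags"]:
            source["tags"].append(tag)  # 2. modify the local copy
        if before_write is not None:
            before_write()  # test hook: lets another client write in between
        try:
            store.index(source, if_seq_no=doc["_seq_no"],
                        if_primary_term=doc["_primary_term"])  # 3. conditional write
            return True
        except Conflict:
            before_write = None  # someone else won the race: start over
    return False


store = TinyStore()

def other_client():
    doc = store.get()
    doc["_source"]["tags"].append("astromech")
    store.index(doc["_source"], doc["_seq_no"], doc["_primary_term"])

add_tag(store, "droid", before_write=other_client)
print(store.source)  # {'product': 'r2d2', 'tags': ['astromech', 'droid']}
```

Because the retry starts from a fresh read, the concurrently added `astromech` tag is kept rather than overwritten. Bounding the number of attempts keeps a heavily contended document from looping forever.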


@ -35,6 +35,16 @@ The result of the above delete operation is:
// TESTRESPONSE[s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]
// TESTRESPONSE[s/"_seq_no" : 5/"_seq_no" : $body._seq_no/]
[float]
[[optimistic-concurrency-control-delete]]
=== Optimistic concurrency control
Delete operations can be made conditional and only be performed if the last
modification to the document was assigned the sequence number and primary
term specified by the `if_seq_no` and `if_primary_term` parameters. If a
mismatch is detected, the operation will result in a `VersionConflictException`
and a status code of 409. See <<optimistic-concurrency-control>> for more details.
[float]
[[delete-versioning]]
=== Versioning


@ -79,89 +79,6 @@ Automatic index creation can include a pattern based white/black list,
for example, set `action.auto_create_index` to `+aaa*,-bbb*,+ccc*,-*` (+
meaning allowed, and - meaning disallowed).
[float]
[[index-versioning]]
=== Versioning
Each indexed document is given a version number. The associated
`version` number is returned as part of the response to the index API
request. The index API optionally allows for
http://en.wikipedia.org/wiki/Optimistic_concurrency_control[optimistic
concurrency control] when the `version` parameter is specified. This
will control the version of the document the operation is intended to be
executed against. A good example of a use case for versioning is
performing a transactional read-then-update. Specifying a `version` from
the document initially read ensures no changes have happened in the
meantime. For example:
[source,js]
--------------------------------------------------
PUT twitter/_doc/1?version=2
{
"message" : "elasticsearch now has versioning support, double cool!"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
// TEST[catch: conflict]
*NOTE:* versioning is completely real time, and is not affected by the
near real time aspects of search operations. If no version is provided,
then the operation is executed without any version checks.
By default, internal versioning is used that starts at 1 and increments
with each update, deletes included. Optionally, the version number can be
supplemented with an external value (for example, if maintained in a
database). To enable this functionality, `version_type` should be set to
`external`. The value provided must be a numeric, long value greater than or equal to 0,
and less than around 9.2e+18. When using the external version type, instead
of checking for a matching version number, the system checks to see if
the version number passed to the index request is greater than the
version of the currently stored document. If true, the document will be
indexed and the new version number used. If the value provided is less
than or equal to the stored document's version number, a version
conflict will occur and the index operation will fail.
WARNING: External versioning supports the value 0 as a valid version number.
This allows the version to be in sync with an external versioning system
where version numbers start from zero instead of one. It has the side effect
that documents with version number equal to zero can neither be updated
using the <<docs-update-by-query,Update-By-Query API>> nor be deleted
using the <<docs-delete-by-query,Delete By Query API>> as long as their
version number is equal to zero.
A nice side effect is that there is no need to maintain strict ordering
of async indexing operations executed as a result of changes to a source
database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from
a database is simplified if external versioning is used, as only the
latest version will be used if the index operations are out of order for
whatever reason.
[float]
==== Version types
Next to the `internal` & `external` version types explained above, Elasticsearch
also supports other types for specific use cases. Here is an overview of
the different version types and their semantics.
`internal`:: only index the document if the given version is identical to the version
of the stored document.
`external` or `external_gt`:: only index the document if the given version is strictly higher
than the version of the stored document *or* if there is no existing document. The given
version will be used as the new version and will be stored with the new document. The supplied
version must be a non-negative long number.
`external_gte`:: only index the document if the given version is *equal* or higher
than the version of the stored document. If there is no existing document
the operation will succeed as well. The given version will be used as the new version
and will be stored with the new document. The supplied version must be a non-negative long number.
*NOTE*: The `external_gte` version type is meant for special use cases and
should be used with care. If used incorrectly, it can result in loss of data.
There is another option, `force`, which is deprecated because it can cause
primary and replica shards to diverge.
[float]
[[operation-type]]
@ -238,6 +155,16 @@ The result of the above index operation is:
--------------------------------------------------
// TESTRESPONSE[s/W0tpsmIBdwcYyG50zbta/$body._id/ s/"successful" : 2/"successful" : 1/]
[float]
[[optimistic-concurrency-control-index]]
=== Optimistic concurrency control
Index operations can be made conditional and only be performed if the last
modification to the document was assigned the sequence number and primary
term specified by the `if_seq_no` and `if_primary_term` parameters. If a
mismatch is detected, the operation will result in a `VersionConflictException`
and a status code of 409. See <<optimistic-concurrency-control>> for more details.
[float]
[[index-routing]]
=== Routing
@ -380,3 +307,83 @@ PUT twitter/_doc/1?timeout=5m
}
--------------------------------------------------
// CONSOLE
[float]
[[index-versioning]]
=== Versioning
Each indexed document is given a version number. By default,
internal versioning is used, which starts at 1 and increments
with each update, deletes included. Optionally, the version number can be
set to an external value (for example, if maintained in a
database). To enable this functionality, `version_type` should be set to
`external`. The value provided must be a numeric, long value greater than or equal to 0,
and less than around 9.2e+18.
When using the external version type, the system checks to see if
the version number passed to the index request is greater than the
version of the currently stored document. If true, the document will be
indexed and the new version number used. If the value provided is less
than or equal to the stored document's version number, a version
conflict will occur and the index operation will fail. For example:
[source,js]
--------------------------------------------------
PUT twitter/_doc/1?version=2&version_type=external
{
"message" : "elasticsearch now has versioning support, double cool!"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
The above will succeed since the supplied version of 2 is higher than
the current document version of 1. If the document was already updated
and its version was set to 2 or higher, the indexing command will fail
and result in a conflict (409 HTTP status code).
*NOTE:* versioning is completely real time, and is not affected by the
near real time aspects of search operations. If no version is provided,
then the operation is executed without any version checks.
WARNING: External versioning supports the value 0 as a valid version number.
This allows the version to be in sync with an external versioning system
where version numbers start from zero instead of one. It has the side effect
that documents with version number equal to zero can neither be updated
using the <<docs-update-by-query,Update-By-Query API>> nor be deleted
using the <<docs-delete-by-query,Delete By Query API>> as long as their
version number is equal to zero.
A nice side effect is that there is no need to maintain strict ordering
of async indexing operations executed as a result of changes to a source
database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from
a database is simplified if external versioning is used, as only the
latest version will be used if the index operations are out of order for
whatever reason.
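The same property can be checked with a small Python sketch. `apply_external` is a hypothetical helper that applies the `external` rule described above (accept only if no document exists or the supplied version is strictly higher); the sketch tries every arrival order of three versioned changes and records the final state.

```python
import itertools


def apply_external(stored, version, source):
    """Apply one write under version_type=external (hypothetical helper).

    `stored` is the current (version, source) pair, or None if no document exists.
    """
    if stored is None or version > stored[0]:
        return (version, source)
    return stored  # equal or lower version: conflict, stored document is kept


# Three changes from a source database, each with that database's version number.
changes = [(1, {"status": "draft"}), (2, {"status": "review"}), (3, {"status": "published"})]

final_states = set()
for order in itertools.permutations(changes):
    stored = None
    for version, source in order:
        stored = apply_external(stored, version, source)
    final_states.add((stored[0], stored[1]["status"]))

print(final_states)  # {(3, 'published')}
```

Whatever the arrival order, only the change with the highest external version survives.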
[float]
==== Version types
In addition to the `external` version type explained above, Elasticsearch
also supports other types for specific use cases. Here is an overview of
the different version types and their semantics.
`internal`:: only index the document if the given version is identical to the version
of the stored document.
`external` or `external_gt`:: only index the document if the given version is strictly higher
than the version of the stored document *or* if there is no existing document. The given
version will be used as the new version and will be stored with the new document. The supplied
version must be a non-negative long number.
`external_gte`:: only index the document if the given version is *equal* or higher
than the version of the stored document. If there is no existing document
the operation will succeed as well. The given version will be used as the new version
and will be stored with the new document. The supplied version must be a non-negative long number.
*NOTE*: The `external_gte` version type is meant for special use cases and
should be used with care. If used incorrectly, it can result in loss of data.
There is another option, `force`, which is deprecated because it can cause
primary and replica shards to diverge.
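The rules in the list above can be summarized as a single predicate. The Python sketch below is a simplified model, not Elasticsearch code: `accepts` is a hypothetical name, `stored` is `None` when no document exists, and the deprecated `force` type is not modelled.

```python
def accepts(version_type, given, stored):
    """Would an index request carrying version `given` be accepted?

    Hypothetical, simplified model of the version types listed above.
    `stored` is the stored document's version, or None if there is none.
    """
    if version_type == "internal":
        # Only if the given version matches the stored document's version.
        return stored is not None and given == stored
    if given < 0:
        raise ValueError("external versions must be non-negative")
    if version_type in ("external", "external_gt"):
        return stored is None or given > stored
    if version_type == "external_gte":
        return stored is None or given >= stored
    raise ValueError(f"version_type not modelled here: {version_type}")


print(accepts("external_gt", 4, 4))   # False: not strictly higher
print(accepts("external_gte", 4, 4))  # True: equal is enough
print(accepts("external", 0, None))   # True: no existing document
```

The only difference between `external_gt` and `external_gte` in this model is `>` versus `>=`. Accepting an equal version lets a write with the same version replace the stored document, which is why `external_gte` needs care.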