6.1 KiB
layout | title | parent | nav_order |
---|---|---|---|
default | Bulk | Document APIs | 20 |
Bulk
Introduced 1.0 {: .label .label-purple }
The bulk operation lets you add, update, or delete many documents in a single request. Compared to individual OpenSearch indexing requests, the bulk operation has significant performance benefits. Whenever practical, we recommend batching indexing operations into bulk requests.
Example
POST _bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
{% include copy-curl.html %}
Path and HTTP methods
POST _bulk
POST <index>/_bulk
Specifying the index in the path means you don't need to include it in the request body.
OpenSearch also accepts PUT requests to the _bulk
path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for bulk requests.
{: .note }
URL parameters
All bulk URL parameters are optional.
Parameter | Type | Description |
---|---|---|
pipeline | String | The pipeline ID for preprocessing documents. |
refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is false . true makes the changes show up in search results immediately, but hurts cluster performance. wait_for waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer. |
require_alias | Boolean | Set to true to require that all actions target an index alias rather than an index. Default is false . |
routing | String | Routes the request to the specified shard. |
timeout | Time | How long to wait for the request to return. Default 1m . |
type | String | (Deprecated) The default document type for documents that don't specify a type. Default is _doc . We highly recommend ignoring this parameter and using a type of _doc for all indices. |
wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to all or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed. |
{% comment %}_source | List | asdf |
_source_excludes | list | asdf |
_source_includes | list | asdf{% endcomment %} |
Request body
The bulk request body follows this pattern:
Action and metadata\n
Optional document\n
Action and metadata\n
Optional document\n
The optional JSON document doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse bulk requests and requires that the request body end with a newline character.
All actions support the same metadata: _index
, _id
, and _require_alias
. If you don't provide an ID, OpenSearch generates one automatically, which can make it challenging to update the document at a later time.
-
Create
Creates a document if it doesn't already exist and returns an error otherwise. The next line must include a JSON document.
{ "create": { "_index": "movies", "_id": "tt1392214" } } { "title": "Prisoners", "year": 2013 }
-
Delete
This action deletes a document if it exists. If the document doesn't exist, OpenSearch doesn't return an error, but instead returns
not_found
underresult
. Delete actions don't require documents on the next line.{ "delete": { "_index": "movies", "_id": "tt2229499" } }
-
Index
Index actions create a document if it doesn't yet exist and replace the document if it already exists. The next line must include a JSON document.
{ "index": { "_index": "movies", "_id": "tt1979320" } } { "title": "Rush", "year": 2013}
-
Update
This action updates existing documents and returns an error if the document doesn't exist. The next line must include a full or partial JSON document, depending on how much of the document you want to update.
{ "update": { "_index": "movies", "_id": "tt0816711" } } { "doc" : { "title": "World War Z" } }
It can also include a script or upsert for more complex document updates.
- Script
{ "update": { "_index": "movies", "_id": "tt0816711" } } { "script" : { "source": "ctx._source.title = \"World War Z\"" } }
- Upsert
{ "update": { "_index": "movies", "_id": "tt0816711" } } { "doc" : { "title": "World War Z" }, "doc_as_upsert": true }
Response
In the response, pay particular attention to the top-level errors
boolean. If true, you can iterate over the individual actions for more detailed information.
{
"took": 11,
"errors": true,
"items": [
{
"index": {
"_index": "movies",
"_id": "tt1979320",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1,
"status": 201
}
},
{
"create": {
"_index": "movies",
"_id": "tt1392214",
"status": 409,
"error": {
"type": "version_conflict_engine_exception",
"reason": "[tt1392214]: version conflict, document already exists (current version [1])",
"index": "movies",
"shard": "0",
"index_uuid": "yhizhusbSWmP0G7OJnmcLg"
}
}
},
{
"update": {
"_index": "movies",
"_id": "tt0816711",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[_doc][tt0816711]: document missing",
"index": "movies",
"shard": "0",
"index_uuid": "yhizhusbSWmP0G7OJnmcLg"
}
}
}
]
}