opensearch-docs-cn/_opensearch/rest-api/document-apis/bulk.md

173 lines
5.9 KiB
Markdown
Raw Normal View History

2021-05-28 13:48:19 -04:00
---
layout: default
title: Bulk
parent: Document APIs
grand_parent: REST API reference
nav_order: 20
2021-05-28 13:48:19 -04:00
---
# Bulk
2021-07-26 19:14:22 -04:00
Introduced 1.0
{: .label .label-purple }
2021-05-28 13:48:19 -04:00
The bulk operation lets you add, update, or delete many documents in a single request. Compared to individual OpenSearch indexing requests, the bulk operation has significant performance benefits. Whenever practical, we recommend batching indexing operations into bulk requests.
## Example
```json
POST _bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
```
## Path and HTTP methods
```
POST _bulk
POST <index>/_bulk
2021-05-28 13:48:19 -04:00
```
Specifying the index in the path means you don't need to include it in the [request body]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/#request-body).
2021-05-28 13:48:19 -04:00
OpenSearch also accepts PUT requests to the `_bulk` path, but we highly recommend using POST. The accepted usage of PUT---adding or replacing a single resource at a given path---doesn't make sense for bulk requests.
{: .note }
## URL parameters
All bulk URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
pipeline | String | The pipeline ID for preprocessing documents.
refresh | Enum | Whether to refresh the affected shards after performing the indexing operations. Default is `false`. `true` makes the changes show up in search results immediately, but hurts cluster performance. `wait_for` waits for a refresh. Requests take longer to return, but cluster performance doesn't suffer.
require_alias | Boolean | Set to `true` to require that all actions target an index alias rather than an index. Default is `false`.
routing | String | Routes the request to the specified shard.
timeout | Time | How long to wait for the request to return. Default `1m`.
type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indices.
wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed.
{% comment %}_source | List | asdf
_source_excludes | list | asdf
_source_includes | list | asdf{% endcomment %}
## Request body
The bulk request body follows this pattern:
```
Action and metadata\n
Optional document\n
Action and metadata\n
Optional document\n
```
The optional JSON document doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse bulk requests and requires that the request body end with a newline character.
All actions support the same metadata: `_index`, `_id`, and `_require_alias`. If you don't provide an ID, OpenSearch generates one automatically, which can make it challenging to update the document at a later time.
- Create
Creates a document if it doesn't already exist and returns an error otherwise. The next line must include a JSON document.
```json
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
```
- Delete
This action deletes a document if it exists. If the document doesn't exist, OpenSearch doesn't return an error, but instead returns `not_found` under `result`. Delete actions don't require documents on the next line.
```json
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
```
- Index
Index actions create a document if it doesn't yet exist and replace the document if it already exists. The next line must include a JSON document.
```json
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013}
```
- Update
This action updates existing documents and returns an error if the document doesn't exist. The next line must include a full or partial JSON document, depending on how much of the document you want to update. It can also include a script or upsert for more complex document updates.
```json
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
```
## Response
In the response, pay particular attention to the top-level `errors` boolean. If true, you can iterate over the individual actions for more detailed information.
```json
{
"took": 11,
"errors": true,
"items": [
{
"index": {
"_index": "movies",
"_type": "_doc",
"_id": "tt1979320",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1,
"status": 201
}
},
{
"create": {
"_index": "movies",
"_type": "_doc",
"_id": "tt1392214",
"status": 409,
"error": {
"type": "version_conflict_engine_exception",
"reason": "[tt1392214]: version conflict, document already exists (current version [1])",
"index": "movies",
"shard": "0",
"index_uuid": "yhizhusbSWmP0G7OJnmcLg"
}
}
},
{
"update": {
"_index": "movies",
"_type": "_doc",
"_id": "tt0816711",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[_doc][tt0816711]: document missing",
"index": "movies",
"shard": "0",
"index_uuid": "yhizhusbSWmP0G7OJnmcLg"
}
}
}
]
}
```