[[bulk]]
== Bulk API

The bulk API allows one to index and delete several documents in a
single request. Here is a sample usage:

[source,java]
--------------------------------------------------
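import static org.elasticsearch.common.xcontent.XContentFactory.*;

import java.util.Date;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;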

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );
        
BulkResponse bulkResponse = bulkRequest.execute().actionGet();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
--------------------------------------------------
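
The comment in the `hasFailures()` branch can be expanded along the following lines. In the Java client, a `BulkResponse` can be iterated to obtain one `BulkItemResponse` per action, and each item exposes `isFailed()` and `getFailureMessage()`. So that the sketch runs without a cluster, it uses a hypothetical stand-in class, `ItemResult`, that mirrors only those accessors. `ItemResult` and `BulkFailures` are illustrative names, not part of the client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for BulkItemResponse: only the accessors used below.
final class ItemResult {
    private final String id;
    private final String failureMessage; // null when the item succeeded

    ItemResult(String id, String failureMessage) {
        this.id = id;
        this.failureMessage = failureMessage;
    }

    String getId() { return id; }
    boolean isFailed() { return failureMessage != null; }
    String getFailureMessage() { return failureMessage; }
}

final class BulkFailures {
    // Collects one "id: reason" entry per failed item, in response order.
    static List<String> describeFailures(Iterable<ItemResult> items) {
        List<String> failures = new ArrayList<>();
        for (ItemResult item : items) {
            if (item.isFailed()) {
                failures.add(item.getId() + ": " + item.getFailureMessage());
            }
        }
        return failures;
    }
}
```

With the real client, the same loop would run over `bulkResponse` inside the `hasFailures()` branch, and the collected entries could be logged or used to decide which documents to resubmit.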

[float]
== Using Bulk Processor

The `BulkProcessor` class offers a simple interface to flush bulk operations automatically based on the number or size
of requests, or after a given period.

To use it, first create a `BulkProcessor` instance:

[source,java]
--------------------------------------------------
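import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;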

BulkProcessor bulkProcessor = BulkProcessor.builder(
        client,  <1>
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId,
                                   BulkRequest request) { ... } <2>

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  BulkResponse response) { ... } <3>

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  Throwable failure) { ... } <4>
        })
        .setBulkActions(10000) <5>
        .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.GB)) <6>
        .setFlushInterval(TimeValue.timeValueSeconds(5)) <7>
        .setConcurrentRequests(1) <8>
        .build();
--------------------------------------------------
<1> Pass in your Elasticsearch client.
<2> Called just before the bulk request is executed. For example, you can read the number of actions with
    `request.numberOfActions()`.
<3> Called after the bulk request has been executed. For example, you can check whether any requests failed
    with `response.hasFailures()`.
<4> Called when the bulk request failed and raised a `Throwable`.
<5> Execute the bulk request every 10,000 actions.
<6> Flush the bulk request whenever 1 GB of data has accumulated.
<7> Flush the bulk request every 5 seconds, regardless of the number of requests.
<8> Set the number of concurrent requests. A value of 0 means that only a single request is allowed to
    execute at a time. A value of 1 means that 1 concurrent request is allowed to execute while new bulk requests accumulate.
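
How the action-count and size thresholds interact can be pictured with a small, self-contained sketch. It is not the `BulkProcessor` implementation. `ThresholdBuffer` and its methods are hypothetical names, the sketch covers only the two thresholds (no flush interval, no concurrency), and it assumes a flush is triggered as soon as either limit is reached.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of count- and size-based flushing.
final class ThresholdBuffer {
    private final int maxActions;
    private final long maxBytes;
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;

    ThresholdBuffer(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    // Buffers one document, then flushes if either threshold has been reached.
    void add(String doc) {
        pending.add(doc);
        pendingBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // A real processor would send the batch here; the sketch only counts flushes.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    int flushCount() { return flushCount; }
    int pendingCount() { return pending.size(); }
}
```

For example, with `new ThresholdBuffer(3, 1000)` the third `add` triggers a flush because of the action count, while with `new ThresholdBuffer(100, 10)` a 5-byte document followed by a 6-byte document triggers a flush because of the size limit.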

Then you can simply add your requests to the `BulkProcessor`:

[source,java]
--------------------------------------------------
bulkProcessor.add(new IndexRequest("twitter", "tweet", "1").source(/* your doc here */));
bulkProcessor.add(new DeleteRequest("twitter", "tweet", "2"));
--------------------------------------------------

By default, `BulkProcessor`:

* sets `bulkActions` to `1000`
* sets `bulkSize` to `5mb`
* does not set `flushInterval`
* sets `concurrentRequests` to `1`

Once all documents have been added to the `BulkProcessor`, close it with either the `awaitClose` or the `close` method:

[source,java]
--------------------------------------------------
bulkProcessor.awaitClose(10, TimeUnit.MINUTES);
--------------------------------------------------

or

[source,java]
--------------------------------------------------
bulkProcessor.close();
--------------------------------------------------

Both methods flush any remaining documents and cancel any further scheduled flushes that were set up through
`flushInterval`. If concurrent requests are enabled, `awaitClose` waits up to the specified timeout for all bulk
requests to complete and returns `true` if they do. If the timeout elapses before all bulk requests complete, it
returns `false`. The `close` method does not wait for remaining bulk requests to complete and exits immediately.
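
The `true`/`false` result of a bounded wait resembles `ExecutorService.awaitTermination` in the JDK. The sketch below uses only `java.util.concurrent` to show that pattern. It is an analogy, not how `BulkProcessor` is implemented, and `AwaitDemo` and `finishesWithin` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AwaitDemo {
    // Runs one task that takes taskMillis, then waits up to timeoutMillis for it.
    // Returns true if the task completed within the timeout, false otherwise.
    static boolean finishesWithin(long taskMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {
            try {
                Thread.sleep(taskMillis); // stands in for an in-flight request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        executor.shutdown(); // accept no new work; the in-flight task keeps running
        try {
            boolean done = executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!done) {
                executor.shutdownNow(); // give up on the remaining work
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
            return false;
        }
    }
}
```

In the same spirit, a caller of `awaitClose` would typically check the returned boolean and log or otherwise handle the case where the timeout elapsed before all bulk requests completed.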