OpenSearch/docs/reference/search/request/scroll.asciidoc

[[request-body-search-scroll]]
=== Scroll

While a `search` request returns a single ``page'' of results, the `scroll`
API can be used to retrieve large numbers of results (or even all results)
from a single search request, in much the same way as you would use a cursor
on a traditional database.

Scrolling is not intended for real time user requests, but rather for
processing large amounts of data, e.g. in order to reindex the contents of one
index into a new index with a different configuration.

.Client support for scrolling and reindexing
*********************************************

Some of the officially supported clients provide helpers to assist with
scrolled searches and reindexing of documents from one index to another:

Perl::

    See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
    and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]

Python::

    See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]

*********************************************

NOTE: The results that are returned from a scroll request reflect the state of
the index at the time that the initial `search` request was made, like a
snapshot in time. Subsequent changes to documents (index, update or delete)
will only affect later search requests.

In order to use scrolling, the initial search request should specify the
`scroll` parameter in the query string, which tells Elasticsearch how long it
should keep the ``search context'' alive (see <<scroll-search-context>>), e.g. `?scroll=1m`.

[source,js]
--------------------------------------------------
POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]

The result from the above request includes a `_scroll_id`, which should
be passed to the `scroll` API in order to retrieve the next batch of
results.

[source,js]
--------------------------------------------------
POST /_search/scroll <1>
{
    "scroll" : "1m", <2>
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" <3>
}
--------------------------------------------------
// CONSOLE
// TEST[continued s/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==/$body._scroll_id/]

<1> `GET` or `POST` can be used and the URL should not include the `index`
    name -- this is specified in the original `search` request instead.
<2> The `scroll` parameter tells Elasticsearch to keep the search context open
    for another `1m`.
<3> The `scroll_id` parameter, set to the `_scroll_id` returned by the
    previous request.

The `size` parameter allows you to configure the maximum number of hits to be
returned with each batch of results. Each call to the `scroll` API returns the
next batch of results until there are no more results left to return, i.e. the
`hits` array is empty.

IMPORTANT: The initial search request and every subsequent scroll request
return a `_scroll_id`. The `_scroll_id` may or may not change between requests;
in either case, only the most recently received `_scroll_id` should be used.
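
The request cycle described above can be sketched as a small loop. This is a minimal Python sketch rather than a client implementation: `send` is a hypothetical callable supplied by the caller that takes an HTTP method, a path and a JSON body, and returns the parsed response as a dict. The sketch assumes only the response fields mentioned above (`_scroll_id` and the `hits` array).

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_hits(send: Send, index: str, body: Dict[str, Any],
                keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit of a scrolled search, one batch at a time."""
    # The initial search request opens the search context.
    response = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    # An empty hits array means there are no more results.
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always pass on the most recently received _scroll_id.
        response = send("POST", "/_search/scroll", {
            "scroll": keep_alive,
            "scroll_id": response["_scroll_id"],
        })
```

Because the function is a generator, the next batch is requested only once the caller has consumed the previous one, so the keep-alive value only has to cover the processing of a single batch.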

NOTE: If the request specifies aggregations, only the initial search response
will contain the aggregation results.

NOTE: Scroll requests have optimizations that make them faster when the sort
order is `_doc`. If you want to iterate over all documents regardless of the
order, this is the most efficient option:

[source,js]
--------------------------------------------------
GET /_search?scroll=1m
{
  "sort": [
    "_doc"
  ]
}
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]

[[scroll-search-context]]
==== Keeping the search context alive

A scroll returns all the documents which matched the search at the time of the
initial search request. It ignores any subsequent changes to these documents.
The `scroll_id` identifies a _search context_ which keeps track of everything
that {es} needs to return the correct documents. The search context is created
by the initial request and kept alive by subsequent requests.

The `scroll` parameter (passed to the `search` request and to every `scroll`
request) tells Elasticsearch how long it should keep the search context alive.
Its value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
process all data -- it just needs to be long enough to process the previous
batch of results. Each `scroll` request (with the `scroll` parameter) sets a
new expiry time. If a `scroll` request doesn't pass in the `scroll`
parameter, then the search context will be freed as part of _that_ `scroll`
request.

Normally, the background merge process optimizes the index by merging together
smaller segments to create new, bigger segments. Once the smaller segments are
no longer needed they are deleted. This process continues during scrolling, but
an open search context prevents the old segments from being deleted since they
are still in use.

TIP: Keeping older segments alive means that more disk space and file handles
are needed. Ensure that you have configured your nodes to have ample free file
handles. See <<file-descriptors>>.

Additionally, if a segment contains deleted or updated documents then the
search context must keep track of whether each document in the segment was live
at the time of the initial search request. Ensure that your nodes have
sufficient heap space if you have many open scrolls on an index that is subject
to ongoing deletes or updates.

NOTE: To prevent issues caused by having too many scrolls open, the
user is not allowed to open scrolls past a certain limit. By default, the
maximum number of open scrolls is 500. This limit can be updated with the
`search.max_open_scroll_context` cluster setting.

You can check how many search contexts are open with the
<<cluster-nodes-stats,nodes stats API>>:

[source,js]
---------------------------------------
GET /_nodes/stats/indices/search
---------------------------------------
// CONSOLE
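
As a rough illustration, the per-node counts in that response can be added up to get a cluster-wide total. The Python sketch below assumes that each node reports the count as `open_contexts` under `indices.search`; the sample response is trimmed and made up for illustration only.

```python
from typing import Any, Dict


def total_open_contexts(nodes_stats: Dict[str, Any]) -> int:
    """Sum the open search contexts of all nodes in a nodes stats response."""
    return sum(
        node["indices"]["search"]["open_contexts"]
        for node in nodes_stats["nodes"].values()
    )


# Trimmed, made-up response; real responses contain many more fields.
sample = {
    "nodes": {
        "node-a": {"indices": {"search": {"open_contexts": 3}}},
        "node-b": {"indices": {"search": {"open_contexts": 1}}},
    }
}
print(total_open_contexts(sample))  # prints 4
```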

==== Clear scroll API

Search contexts are automatically removed when the `scroll` timeout has been
exceeded. However, keeping scrolls open has a cost, as discussed in the
<<scroll-search-context,previous section>>, so scrolls should be explicitly
cleared with the `clear-scroll` API as soon as they are no longer being used:

[source,js]
---------------------------------------
DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
---------------------------------------
// CONSOLE
// TEST[catch:missing]

Multiple scroll IDs can be passed as an array:

[source,js]
---------------------------------------
DELETE /_search/scroll
{
    "scroll_id" : [
      "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==",
      "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAAAxZrUllkUVlCa1NqNmRMaUhiQlZkMWFBAAAAAAAAAAIWa1JZZFFZQmtTajZkTGlIYkJWZDFhQQAAAAAAAAAFFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAABBZrUllkUVlCa1NqNmRMaUhiQlZkMWFB"
    ]
}
---------------------------------------
// CONSOLE
// TEST[catch:missing]

All search contexts can be cleared with the `_all` parameter:

[source,js]
---------------------------------------
DELETE /_search/scroll/_all
---------------------------------------
// CONSOLE

The `scroll_id` can also be passed as a query string parameter or in the request body.
Multiple scroll IDs can be passed as comma-separated values:

[source,js]
---------------------------------------
DELETE /_search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==,DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAAAxZrUllkUVlCa1NqNmRMaUhiQlZkMWFBAAAAAAAAAAIWa1JZZFFZQmtTajZkTGlIYkJWZDFhQQAAAAAAAAAFFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAABBZrUllkUVlCa1NqNmRMaUhiQlZkMWFB
---------------------------------------
// CONSOLE
// TEST[catch:missing]
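
One way to make sure scrolls are cleared even when processing fails is to tie the clear request to a scope. The Python sketch below is one possible approach, not part of any client library: `send` is a hypothetical callable taking a method, a path and a JSON body, and `clearing_scrolls` is an illustrative name. It uses the array form of the request body shown above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


@contextmanager
def clearing_scrolls(send: Send) -> Iterator[List[str]]:
    """Collect scroll IDs and clear them all when the block exits."""
    scroll_ids: List[str] = []
    try:
        yield scroll_ids
    finally:
        if scroll_ids:
            # A single DELETE with the IDs passed as an array in the body.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_ids})
```

Inside a `with clearing_scrolls(send) as ids:` block, the caller appends each `_scroll_id` it receives to `ids`; the clear request is sent when the block exits, whether normally or through an exception.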

[[sliced-scroll]]
==== Sliced Scroll

For scroll queries that return a lot of documents, it is possible to split the scroll into multiple slices which
can be consumed independently:

[source,js]
--------------------------------------------------
GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 0, <1>
        "max": 2 <2>
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:big_twitter]

<1> The id of the slice
<2> The maximum number of slices

The first request returns documents that belong to the first slice (id: 0) and the
second request returns documents that belong to the second slice (id: 1). Since the maximum number of slices is set to 2,
the union of the results of the two requests is equivalent to the results of a scroll query without slicing.
By default, the splitting is done on the shards first and then locally on each shard, using the `_id` field
with the following formula:
`slice(doc) = floorMod(hashCode(doc._id), max)`
For instance, if the number of shards is equal to 2 and the user requests 4 slices, then slices 0 and 2 are assigned
to the first shard and slices 1 and 3 are assigned to the second shard.
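
The formula and the shard example above can be sketched in Python. Two assumptions are worth flagging: `java_style_hash` is only a stand-in for `hashCode` (a 31-based polynomial hash wrapped to a signed 32-bit integer) and is not necessarily the hash the server applies, and the round-robin shard assignment is inferred from the two-shard, four-slice example rather than from the implementation.

```python
from typing import Dict, List


def java_style_hash(text: str) -> int:
    """Stand-in hash: 31-based polynomial hash wrapped to a signed 32-bit int."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def slice_of(doc_id: str, max_slices: int) -> int:
    """slice(doc) = floorMod(hash(doc._id), max)."""
    # Python's % takes the sign of the divisor, so it acts as floorMod here
    # and the result stays in [0, max_slices) even for negative hashes.
    return java_style_hash(doc_id) % max_slices


def slices_per_shard(num_shards: int, max_slices: int) -> Dict[int, List[int]]:
    """Round-robin assignment of slice ids to shards (assumed from the example)."""
    assignment: Dict[int, List[int]] = {shard: [] for shard in range(num_shards)}
    for slice_id in range(max_slices):
        assignment[slice_id % num_shards].append(slice_id)
    return assignment


print(slices_per_shard(2, 4))  # prints {0: [0, 2], 1: [1, 3]}
```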

Each scroll is independent and can be processed in parallel like any scroll request.
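
Since the slice requests differ only in their `id`, they can be generated and handed to parallel workers. The Python sketch below is illustrative only: `consume` is a hypothetical callable that scrolls one slice to completion and returns the number of documents it processed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def slice_bodies(query: Dict[str, Any], max_slices: int) -> List[Dict[str, Any]]:
    """Build one search body per slice; the bodies differ only in slice.id."""
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": query}
        for slice_id in range(max_slices)
    ]


def run_slices(consume: Callable[[Dict[str, Any]], int],
               bodies: List[Dict[str, Any]]) -> List[int]:
    """Run one consumer per slice in its own thread and collect the results."""
    with ThreadPoolExecutor(max_workers=len(bodies)) as pool:
        # pool.map preserves the order of the input bodies.
        return list(pool.map(consume, bodies))
```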

NOTE: If the number of slices is bigger than the number of shards, the slice filter is very slow on the first calls: it has a complexity of O(N) and a memory cost
of N bits per slice, where N is the total number of documents in the shard.
After a few calls the filter should be cached and subsequent calls should be faster, but you should limit the number of
sliced queries you run in parallel to avoid excessive memory use.
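
To get a feel for the N-bits-per-slice figure, the arithmetic can be worked through in Python. The shard size of 100 million documents and the eight concurrent slices are assumed values chosen purely for illustration.

```python
def slice_filter_bytes(docs_in_shard: int) -> float:
    """Memory for one slice filter: N bits, i.e. N / 8 bytes."""
    return docs_in_shard / 8


docs = 100_000_000  # assumed shard size, for illustration only
per_slice_mb = slice_filter_bytes(docs) / 1_000_000
print(per_slice_mb)      # prints 12.5
print(per_slice_mb * 8)  # eight concurrent slices on that shard: prints 100.0
```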

To avoid this cost entirely, it is possible to use the `doc_values` of another field to do the slicing,
but the user must ensure that the field has the following properties:

    * The field is numeric.

    * `doc_values` are enabled on that field.

    * Every document should contain a single value. If a document has multiple values for the specified field, the first value is used.

    * The value for each document should be set once when the document is created and never updated. This ensures that each
slice gets deterministic results.

    * The cardinality of the field should be high. This ensures that each slice gets approximately the same number of documents.

[source,js]
--------------------------------------------------
GET /twitter/_search?scroll=1m
{
    "slice": {
        "field": "date",
        "id": 0,
        "max": 10
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:big_twitter]

For append-only time-based indices, the `timestamp` field can be used safely.

NOTE: By default, the maximum number of slices allowed per scroll is 1024.
You can update the `index.max_slices_per_scroll` index setting to change this limit.
-												[DOCS] Move Elasticsearch APIs to REST APIs section. (#44238) (#44372)

Moves the following API sections under the REST APIs navigations:
- API Conventions
- Document APIs
- Search APIs
- Index APIs (previously named Indices APIs)
- cat APIs
- Cluster APIs

Other supporting changes:
- Removes the previous index APIs page under REST APIs. Adds a redirect for the removed page.
- Removes several [partintro] macros so the docs build correctly.
- Changes anchors for pages that become sections of a parent page.
- Adds several redirects for existing pages that become sections of a parent page.

This commit re-applies changes from #44238. Changes from that PR were reverted due to broken links in several repos. This commit adds redirects for those broken links.
											
										
										
											2019-07-17 08:49:22 -04:00
+								[[request-body-search-scroll]]
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
+								=== Scroll
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								While a `search` request returns a single ``page'' of results, the `scroll`
 								API can be used to retrieve large numbers of results (or even all results)
 								from a single search request, in much the same way as you would use a cursor
 								on a traditional database.
 								Scrolling is not intended for real time user requests, but rather for
 								processing large amounts of data, e.g. in order to reindex the contents of one
 								index into a new index with a different configuration.
-												Docs: Add links to client helper classes for bulk/scroll/reindexing

											
										
										
											2014-07-18 07:55:20 -04:00
+								.Client support for scrolling and reindexing
 								*********************************************
 								Some of the officially supported clients provide helpers to assist with
 								scrolled searches and reindexing of documents from one index to another:
 								Perl::
-												Fix link to perl docs (#24842)

* Fixes Elasticsearch issue #24606.

* Fixes Elasticsearch issue #24606.

* Fixes Elasticsearch issue #24606.

* Fixes Elasticsearch issue #24606.

* Issue #24606 - Changed the link text to Search::Elasticsearch::Client::5_0::Bulk and
Search::Elasticsearch::Client::5_0::Scroll.

											
										
										
											2017-05-24 05:43:54 -04:00
+								    See https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk[Search::Elasticsearch::Client::5_0::Bulk]
 								    and https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll[Search::Elasticsearch::Client::5_0::Scroll]
-												Docs: Add links to client helper classes for bulk/scroll/reindexing

											
										
										
											2014-07-18 07:55:20 -04:00
 								Python::
 								    See http://elasticsearch-py.readthedocs.org/en/master/helpers.html[elasticsearch.helpers.*]
 								*********************************************
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								NOTE: The results that are returned from a scroll request reflect the state of
 								the index at the time that the initial `search` request was  made, like a
 								snapshot in time. Subsequent changes to documents (index, update or delete)
 								will only affect later search requests.
 								In order to use scrolling, the initial search request should specify the
 								`scroll` parameter in the query string, which tells Elasticsearch how long it
 								should keep the ``search context'' alive (see <<scroll-search-context>>), eg `?scroll=1m`.
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
 								[source,js]
 								--------------------------------------------------
-												Allow `_doc` as a type. (#27816)

Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
											
										
										
											2017-12-14 11:47:53 -05:00
+								POST /twitter/_search?scroll=1m
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								{
-												Documentation updates for scroll API size parameter (#21229)

* Document size parameter for scroll API

* Fix size parameter behavior description for scroll

											
										
										
											2016-11-01 15:55:09 -04:00
+								    "size": 100,
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
+								    "query": {
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								        "match" : {
 								            "title" : "elasticsearch"
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
+								        }
 								    }
 								}
 								--------------------------------------------------
-												CONSOLEify scroll docs

This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.

Relates to #18160

											
										
										
											2016-10-05 11:16:40 -04:00
+								// CONSOLE
 								// TEST[setup:twitter]
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
-												Docs: The name of scroll ID attribute in the response is "_scroll_id" rather than "scroll_id"

Closes #10691

											
										
										
											2015-04-20 19:47:36 -04:00
+								The result from the above request includes a `_scroll_id`, which should
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								be passed to the `scroll` API in order to retrieve the next batch of
 								results.
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
+								[source,js]
 								--------------------------------------------------
-												Docs: Drop inline callout from scroll example (#38340)

Coalesces two calls into one in a scroll example so all callouts are at
the end of the line. This is the only sort of callouts that are
supported by asciidoctor and we'd like to start building our docs with
asciidoctor.

At present we don't have any mechanism to stop folks adding more inline
callouts but we ought to be able to have one in a few weeks. For now,
though, removing these inline callouts is a step in the right direction.

Relates to #38335
											
										
										
											2019-02-04 14:57:38 -05:00
+								POST /_search/scroll <1>
-												Rest: Add json in request body to scroll, clear scroll, and analyze API

Change analyze.asciidoc and scroll.asciidoc
Add json support to Analyze and Scroll, and clear scrollAPI
Add rest-api-spec/test

Closes #5866

											
										
										
											2015-04-02 21:51:15 -04:00
+								{
-												Docs: Drop inline callout from scroll example (#38340)

Coalesces two calls into one in a scroll example so all callouts are at
the end of the line. This is the only sort of callouts that are
supported by asciidoctor and we'd like to start building our docs with
asciidoctor.

At present we don't have any mechanism to stop folks adding more inline
callouts but we ought to be able to have one in a few weeks. For now,
though, removing these inline callouts is a step in the right direction.

Relates to #38335
											
										
										
											2019-02-04 14:57:38 -05:00
+								    "scroll" : "1m", <2>
 								    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" <3>
-												Rest: Add json in request body to scroll, clear scroll, and analyze API

Change analyze.asciidoc and scroll.asciidoc
Add json support to Analyze and Scroll, and clear scrollAPI
Add rest-api-spec/test

Closes #5866

											
										
										
											2015-04-02 21:51:15 -04:00
+								}
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								--------------------------------------------------
-												CONSOLEify scroll docs

This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.

Relates to #18160

											
										
										
											2016-10-05 11:16:40 -04:00
+								// CONSOLE
 								// TEST[continued s/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==/$body._scroll_id/]
-												Rest: Add json in request body to scroll, clear scroll, and analyze API

Change analyze.asciidoc and scroll.asciidoc
Add json support to Analyze and Scroll, and clear scrollAPI
Add rest-api-spec/test

Closes #5866

											
										
										
											2015-04-02 21:51:15 -04:00
-												Docs: Drop inline callout from scroll example (#38340)

Coalesces two calls into one in a scroll example so all callouts are at
the end of the line. This is the only sort of callouts that are
supported by asciidoctor and we'd like to start building our docs with
asciidoctor.

At present we don't have any mechanism to stop folks adding more inline
callouts but we ought to be able to have one in a few weeks. For now,
though, removing these inline callouts is a step in the right direction.

Relates to #38335
											
										
										
											2019-02-04 14:57:38 -05:00
+								<1> `GET` or `POST` can be used and the URL should not include the `index`
 								    name -- this is specified in the original `search` request instead.
 								<2> The `scroll` parameter tells Elasticsearch to keep the search context open
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								    for another `1m`.
-												Docs: Drop inline callout from scroll example (#38340)

Coalesces two calls into one in a scroll example so all callouts are at
the end of the line. This is the only sort of callouts that are
supported by asciidoctor and we'd like to start building our docs with
asciidoctor.

At present we don't have any mechanism to stop folks adding more inline
callouts but we ought to be able to have one in a few weeks. For now,
though, removing these inline callouts is a step in the right direction.

Relates to #38335
											
										
										
											2019-02-04 14:57:38 -05:00
+								<3> The `scroll_id` parameter
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
-												Documentation updates for scroll API size parameter (#21229)

* Document size parameter for scroll API

* Fix size parameter behavior description for scroll

											
										
										
											2016-11-01 15:55:09 -04:00
+								The `size` parameter allows you to configure the maximum number of hits to be
 								returned with each batch of results.  Each call to the `scroll` API returns the
 								next batch of results until there are no more results left to return, ie the
 								`hits` array is empty.
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
-												Clarify documentation of scroll_id (#29424)

* Clarify documentation of scroll_id

The Scroll API may return the same scroll ID for multiple requests due to server side state. This is not clear from the current documentation.

* Further clarify scroll ID return behaviour

											
										
										
											2018-04-26 04:45:48 -04:00
+								IMPORTANT: The initial search request and each subsequent scroll request each
-												[DOCS] Update scroll.asciidoc (#32530)


											
										
										
											2018-09-18 10:59:26 -04:00
+								return a `_scroll_id`. While the `_scroll_id` may change between requests, it doesn’t
 								always change — in any case, only the most recently received `_scroll_id` should be used.
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
-												Aggregations: Only return aggregations on the first page when scrolling.

Aggregations are collection-wide statistics so they would always be the same.
In order to save CPU/bandwidth, we can just return them on the first page.

Same as #1642 but for aggregations.

											
										
										
											2014-08-28 06:19:13 -04:00
+								NOTE: If the request specifies aggregations, only the initial search response
 								will contain the aggregations results.
-												Deprecate the `scan` search type.

This commit deprecates the `scan` search type in favour of regular scroll
requests sorted by `_doc`.

Related to #12983

											
										
										
											2015-08-19 10:43:50 -04:00
+								NOTE: Scroll requests have optimizations that make them faster when the sort
 								order is `_doc`. If you want to iterate over all documents regardless of the
 								order, this is the most efficient option:
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
 								[source,js]
 								--------------------------------------------------
-												CONSOLEify scroll docs

This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.

Relates to #18160

											
										
										
											2016-10-05 11:16:40 -04:00
+								GET /_search?scroll=1m
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								{
-												Deprecate the `scan` search type.

This commit deprecates the `scan` search type in favour of regular scroll
requests sorted by `_doc`.

Related to #12983

											
										
										
											2015-08-19 10:43:50 -04:00
+								  "sort": [
 								    "_doc"
-												Fix typo in scroll.asciidoc

Fix scroll request with sort.

Closes #15493

											
										
										
											2015-12-16 17:43:50 -05:00
+								  ]
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								}
-												Migrated documentation into the main repo

											
										
										
											2013-08-28 19:24:34 -04:00
+								--------------------------------------------------
-												CONSOLEify scroll docs

This causes the snippets to be tested during the build and gives
helpful links to the reader to open the docs in console or copy them
as curl commands.

Relates to #18160

											
										
										
											2016-10-05 11:16:40 -04:00
+								// CONSOLE
 								// TEST[setup:twitter]
-												[DOCS] Be explicit about scan doing no scoring

											
										
										
											2015-04-19 18:13:51 -04:00
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								[[scroll-search-context]]
 								==== Keeping the search context alive
-												Mention the cost of tracking live docs in scrolls (#41375)

Relates #41337, in which a heap dump shows hundreds of MBs allocated on the
heap for tracking the live docs for each scroll.
											
										
										
											2019-04-23 10:26:14 -04:00
+								A scroll returns all the documents which matched the search at the time of the
 								initial search request. It ignores any subsequent changes to these documents.
 								The `scroll_id` identifies a _search context_ which keeps track of everything
 								that {es} needs to return the correct documents. The search context is created
 								by the initial request and kept alive by subsequent requests.
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
+								The `scroll` parameter (passed to the `search` request and to every `scroll`
 								request) tells Elasticsearch how long it should keep the search context alive.
 								Its value (e.g. `1m`, see <<time-units>>) does not need to be long enough to
 								process all data -- it just needs to be long enough to process the previous
 								batch of results. Each `scroll` request (with the `scroll` parameter) sets a
-												document the search context is freed if the scroll is not extended (#34739)

The `fetchPhaseShouldFreeContext` returns true when there is a scroll context but the scroll parameter is null, thus freeing the search context.

https://github.com/elastic/elasticsearch/blob/183c32d4c39948e037a7fb44dccf31ab0d60d3c3/server/src/main/java/org/elasticsearch/search/SearchService.java#L491

											
										
										
											2018-10-25 16:48:06 -04:00
+								new  expiry time. If a `scroll` request doesn't pass in the `scroll`
 								parameter, then the search context will be freed as part of _that_ `scroll`
 								request.
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
-												Mention the cost of tracking live docs in scrolls (#41375)

Relates #41337, in which a heap dump shows hundreds of MBs allocated on the
heap for tracking the live docs for each scroll.
											
										
										
											2019-04-23 10:26:14 -04:00
+								Normally, the background merge process optimizes the index by merging together
 								smaller segments to create new, bigger segments. Once the smaller segments are
 								no longer needed they are deleted. This process continues during scrolling, but
 								an open search context prevents the old segments from being deleted since they
 								are still in use.
 								TIP: Keeping older segments alive means that more disk space and file handles
 								are needed. Ensure that you have configured your nodes to have ample free file
 								handles. See <<file-descriptors>>.
 								Additionally, if a segment contains deleted or updated documents then the
 								search context must keep track of whether each document in the segment was live
 								at the time of the initial search request. Ensure that your nodes have
 								sufficient heap space if you have many open scrolls on an index that is subject
 								to ongoing deletes or updates.
-												Docs: Rewrote the scroll/scan docs

Closes #6774

											
										
										
											2014-07-08 05:54:53 -04:00
-												Added soft limit to open scroll contexts #25244 (#36009)

This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).

											
										
										
											2018-12-03 13:57:10 -05:00

NOTE: To guard against issues caused by having too many scrolls open, users
are not allowed to open scrolls past a certain limit. By default, the
maximum number of open scrolls is 500. This limit can be updated with the
`search.max_open_scroll_context` cluster setting.

You can check how many search contexts are open with the
<<cluster-nodes-stats,nodes stats API>>:

[source,js]
---------------------------------------
GET /_nodes/stats/indices/search
---------------------------------------
// CONSOLE
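
As a rough illustration of reading that response, the following Python sketch
sums the `open_contexts` counter reported under `indices.search` for each node.
The sample response is abbreviated and made up (the node names and counts are
placeholders), and `total_open_contexts` is a hypothetical helper, not part of
any client library.

[source,python]
--------------------------------------------------
def total_open_contexts(nodes_stats):
    """Sum indices.search.open_contexts over every node in a nodes stats response."""
    total = 0
    for node in nodes_stats.get("nodes", {}).values():
        search = node.get("indices", {}).get("search", {})
        total += search.get("open_contexts", 0)
    return total


# Abbreviated, invented response body used only for illustration.
sample_response = {
    "nodes": {
        "node-a": {"indices": {"search": {"open_contexts": 2}}},
        "node-b": {"indices": {"search": {"open_contexts": 1}}},
    }
}

print(total_open_contexts(sample_response))
--------------------------------------------------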

==== Clear scroll API

Search contexts are automatically removed when the `scroll` timeout has been
exceeded. However, keeping scrolls open has a cost, as discussed in the
<<scroll-search-context,previous section>>, so scrolls should be explicitly
cleared with the `clear-scroll` API as soon as they are no longer being used:

[source,js]
---------------------------------------
DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
---------------------------------------
// CONSOLE
// TEST[catch:missing]

Multiple scroll IDs can be passed as an array:

[source,js]
---------------------------------------
DELETE /_search/scroll
{
    "scroll_id" : [
      "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==",
      "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAAAxZrUllkUVlCa1NqNmRMaUhiQlZkMWFBAAAAAAAAAAIWa1JZZFFZQmtTajZkTGlIYkJWZDFhQQAAAAAAAAAFFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAABBZrUllkUVlCa1NqNmRMaUhiQlZkMWFB"
    ]
}
---------------------------------------
// CONSOLE
// TEST[catch:missing]
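
A client that tracks its own scroll IDs could assemble the request body for
either form along the following lines. This is only a sketch:
`clear_scroll_body` is a hypothetical helper, and the IDs in the example are
placeholders rather than real scroll IDs.

[source,python]
--------------------------------------------------
import json


def clear_scroll_body(scroll_ids):
    """Build a clear-scroll request body: a string for one ID, an array for several."""
    ids = list(scroll_ids)
    if not ids:
        raise ValueError("at least one scroll ID is required")
    return {"scroll_id": ids[0] if len(ids) == 1 else ids}


# Placeholder IDs for illustration only.
print(json.dumps(clear_scroll_body(["id-one"])))
print(json.dumps(clear_scroll_body(["id-one", "id-two"])))
--------------------------------------------------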

All search contexts can be cleared with the `_all` parameter:

[source,js]
---------------------------------------
DELETE /_search/scroll/_all
---------------------------------------
// CONSOLE

The `scroll_id` can also be passed as a query string parameter or in the request body.
Multiple scroll IDs can be passed as comma-separated values:

[source,js]
---------------------------------------
DELETE /_search/scroll/DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==,DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAABFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAAAxZrUllkUVlCa1NqNmRMaUhiQlZkMWFBAAAAAAAAAAIWa1JZZFFZQmtTajZkTGlIYkJWZDFhQQAAAAAAAAAFFmtSWWRRWUJrU2o2ZExpSGJCVmQxYUEAAAAAAAAABBZrUllkUVlCa1NqNmRMaUhiQlZkMWFB
---------------------------------------
// CONSOLE
// TEST[catch:missing]

[[sliced-scroll]]
==== Sliced Scroll

For scroll queries that return a lot of documents it is possible to split the scroll into multiple slices that
can be consumed independently:

[source,js]
--------------------------------------------------
GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 0, <1>
        "max": 2 <2>
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
GET /twitter/_search?scroll=1m
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:big_twitter]
<1> The id of the slice
<2> The maximum number of slices

The first request returns documents that belong to the first slice (id: 0) and the
second request returns documents that belong to the second slice. Since the maximum number of slices is set to 2,
the union of the results of the two requests is equivalent to the results of a scroll query without slicing.

By default the splitting is done on the shards first and then locally on each shard using the `_id` field
with the following formula:
`slice(doc) = floorMod(hashCode(doc._id), max)`
For instance, if the number of shards is equal to 2 and the user requested 4 slices then the slices 0 and 2 are assigned
to the first shard and the slices 1 and 3 are assigned to the second shard.

Each scroll is independent and can be processed in parallel like any scroll request.
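
The two-step assignment can be sketched in Python as follows. This is an
illustration under stated assumptions, not the server's implementation: the
round-robin rule in `shard_for_slice` is inferred from the 2-shard, 4-slice
example above, and `zlib.crc32` is only a stand-in for the internal
`hashCode`, so the slice numbers it produces will not match a live cluster.
Both function names are hypothetical.

[source,python]
--------------------------------------------------
import zlib


def shard_for_slice(slice_id, num_shards):
    """Shard serving a slice, assuming slices are spread round-robin over shards."""
    return slice_id % num_shards


def slice_for_doc(doc_id, max_slices):
    """slice(doc) = floorMod(hash(doc._id), max), with crc32 as a stand-in hash."""
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    # Python's % already behaves like floorMod for a positive modulus.
    return hashed % max_slices


# Which shard serves each of 4 slices on a 2-shard index.
assignment = {slice_id: shard_for_slice(slice_id, 2) for slice_id in range(4)}
print(assignment)

# Every document lands in exactly one slice, so the slices partition the set.
doc_ids = ["doc-%d" % i for i in range(20)]
slices = {}
for doc_id in doc_ids:
    slices.setdefault(slice_for_doc(doc_id, 4), []).append(doc_id)
print(sum(len(docs) for docs in slices.values()))
--------------------------------------------------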

NOTE: If the number of slices is bigger than the number of shards the slice filter is very slow on the first calls: it has a complexity of O(N) and a memory cost equal
to N bits per slice, where N is the total number of documents in the shard.
After a few calls the filter should be cached and subsequent calls should be faster, but you should limit the number of
sliced queries you perform in parallel to avoid a memory explosion.
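
To get a feel for the N-bits-per-slice cost, the arithmetic can be written out
as below. The document counts are invented for illustration and
`slice_filter_bytes` is a hypothetical helper; it only restates the estimate
given in the note and ignores any caching overhead.

[source,python]
--------------------------------------------------
def slice_filter_bytes(docs_in_shard, slices_on_shard=1):
    """Estimate filter memory: one bit per document in the shard, per slice."""
    bits = docs_in_shard * slices_on_shard
    return (bits + 7) // 8  # round up to whole bytes


# A shard with 100 million documents (made-up figure).
per_slice = slice_filter_bytes(100_000_000)
eight_slices = slice_filter_bytes(100_000_000, slices_on_shard=8)
print(per_slice, eight_slices)
--------------------------------------------------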

To avoid this cost entirely it is possible to use the `doc_values` of another field to do the slicing,
but the user must ensure that the field has the following properties:

* The field is numeric.
* `doc_values` are enabled on that field.
* Every document should contain a single value. If a document has multiple values for the specified field, the first value is used.
* The value for each document should be set once when the document is created and never updated. This ensures that each
slice gets deterministic results.
* The cardinality of the field should be high. This ensures that each slice gets approximately the same number of documents.

[source,js]
--------------------------------------------------
GET /twitter/_search?scroll=1m
{
    "slice": {
        "field": "date",
        "id": 0,
        "max": 10
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:big_twitter]

For append-only time-based indices, the `timestamp` field can be used safely.

NOTE: By default the maximum number of slices allowed per scroll is limited to 1024.
You can update the `index.max_slices_per_scroll` index setting to change this limit.
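
A client that fans a scroll out over several workers might generate one
request body per slice and check the slice count on its side first. The
sketch below assumes the default limit of 1024 mentioned above;
`build_slice_bodies` and `DEFAULT_MAX_SLICES_PER_SCROLL` are hypothetical
names, and the range check is a client-side guard rather than a description
of the server's validation.

[source,python]
--------------------------------------------------
import copy

# Default stated in the note above; an index may be configured differently.
DEFAULT_MAX_SLICES_PER_SCROLL = 1024


def build_slice_bodies(query, max_slices, limit=DEFAULT_MAX_SLICES_PER_SCROLL):
    """Return one search body per slice id in the range 0..max_slices-1."""
    if not 1 <= max_slices <= limit:
        raise ValueError("max_slices must be between 1 and %d" % limit)
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": copy.deepcopy(query)}
        for slice_id in range(max_slices)
    ]


query = {"match": {"title": "elasticsearch"}}
bodies = build_slice_bodies(query, 2)
for body in bodies:
    print(body["slice"])
--------------------------------------------------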