OpenSearch/docs/reference/indices/shadow-replicas.asciidoc

[[indices-shadow-replicas]]
== Shadow replica indices

experimental[]

If you would like to use a shared filesystem, you can use the shadow replicas
settings to choose where on disk the data for an index should be kept, as well
as how Elasticsearch should replay operations on all the replica shards of an
index.

In order to fully utilize the `index.data_path` and `index.shadow_replicas`
settings, you need to enable using it in elasticsearch.yml:

[source,yaml]
--------------------------------------------------
node.enable_custom_paths: true
--------------------------------------------------

You can then create an index with a custom data path, where each node will use
this path for the data:

[WARNING]
========================
Because shadow replicas do not index the document on replica shards, it's
possible for the replica's known mapping to be behind the index's known mapping
if the latest cluster state has not yet been processed on the node containing
the replica. Because of this, it is highly recommended to use pre-defined
mappings when using shadow replicas.
========================

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index' -d '
{
    "index" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 4,
        "data_path": "/var/data/my_index",
        "shadow_replicas": true
    } 
}'
--------------------------------------------------

[WARNING]
========================
In the above example, the "/var/data/my_index" path is a shared filesystem that
must be available on every node in the Elasticsearch cluster. You must also
ensure that the Elasticsearch process has the correct permissions to read from
and write to the directory used in the `index.data_path` setting.
========================

An index that has been created with the `index.shadow_replicas` setting set to
"true" will not replicate document operations to any of the replica shards,
instead, it will only continually refresh. Once segments are available on the
filesystem where the shadow replica resides (after an Elasticsearch "flush"), a
regular refresh (governed by the `index.refresh_interval`) can be used to make
the new data searchable.

NOTE: Since documents are only indexed on the primary shard, realtime GET
requests could fail to return a document if executed on the replica shard,
therefore, GET API requests automatically have the `?preference=_primary` flag
set if there is no preference flag already set.

In order to ensure the data is being synchronized in a fast enough manner, you
may need to tune the flush threshold for the index to a desired number. A flush
is needed to fsync segment files to disk, so they will be visible to all other
replica nodes. Users should test what flush threshold levels they are
comfortable with, as increased flushing can impact indexing performance.

The Elasticsearch cluster will still detect the loss of a primary shard, and
transform the replica into a primary in this situation. This transformation will
take slightly longer, since no `IndexWriter` is maintained for each shadow
replica.

Below is the list of settings that can be changed using the update
settings API:

`index.data_path` (string)::
    Path to use for the index's data. Note that by default Elasticsearch will
    append the node ordinal by default to the path to ensure multiple instances
    of Elasticsearch on the same machine do not share a data directory.

`index.shadow_replicas`::
    Boolean value indicating this index should use shadow replicas. Defaults to
    `false`.

`index.shared_filesystem`::
    Boolean value indicating this index uses a shared filesystem. Defaults to
    the `true` if `index.shadow_replicas` is set to true, `false` otherwise.

=== Node level settings related to shadow replicas

These are non-dynamic settings that need to be configured in `elasticsearch.yml`

`node.add_id_to_custom_path`::
    Boolean setting indicating whether Elasticsearch should append the node's
    ordinal to the custom data path. For example, if this is enabled and a path
    of "/tmp/foo" is used, the first locally-running node will use "/tmp/foo/0",
    the second will use "/tmp/foo/1", the third "/tmp/foo/2", etc. Defaults to
    `true`.

`node.enable_custom_paths`::
    Boolean value that must be set to `true` in order to use the
    `index.data_path` setting. Defaults to `false`.
Add shadow replicas for shared filesystems Squashed commit of the following: commit 20835037c98e7d2fac4206c372717a05a27c4790 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 15:27:17 2015 -0700 Use Enum for "_primary" preference commit 325acbe4585179190a959ba3101ee63b99f1931a Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 14:32:41 2015 -0700 Use ?preference=_primary automatically for realtime GET operations commit edd49434af5de7e55928f27a1c9ed0fddb1fb133 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 14:32:06 2015 -0700 Move engine creation into protected createNewEngine method commit 67a797a9235d4aa376ff4af16f3944d907df4577 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 13:14:01 2015 -0700 Factor out AssertingSearcher so it can be used by mock Engines commit 62b0c28df8c23cc0b8205b33f7595c68ff940e2b Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 11:43:17 2015 -0700 Use IndexMetaData.isIndexUsingShadowReplicas helper commit 1a0d45629457578a60ae5bccbeba05acf5d79ddd Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 09:59:31 2015 -0700 Rename usesSharedFilesystem -> isOnSharedFilesystem commit 73c62df4fc7da8a5ed557620a83910d89b313aa1 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 09:58:02 2015 -0700 Add MockShadowEngine and hook it up to be used commit c8e8db473830fce1bdca3c4df80a685e782383bc Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 09:45:50 2015 -0700 Clarify comment about pre-defined mappings commit 60a4d5374af5262bd415f4ef40f635278ed12a03 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 09:18:22 2015 -0700 Add a test for shadow replicas that uses field data commit 7346f9f382f83a21cd2445b3386fe67472bc3184 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 08:37:14 2015 -0700 Revert changes to RecoveryTarget.java commit d90d6980c9b737bd8c0f4339613a5373b1645e95 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 08:35:44 2015 -0700 Rename `ownsShard` to `canDeleteShardContent` commit 23001af834d66278ac84d9a72c37b5d1f3a10a7b Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 08:35:25 2015 -0700 Remove ShadowEngineFactory, add .newReadOnlyEngine method in EngineFactory commit b64fef1d2c5e167713e869b22d388ff479252173 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 18 08:25:19 2015 -0700 Add warning that predefined mappings should be used commit a1b8b8cf0db49d1bd1aeb84e51491f7f0de43b59 Author: Lee Hinman <lee@writequit.org> Date: Tue Feb 17 14:31:50 2015 -0700 Remove unused import and fix index creation example in docs commit 0b1b852365ceafc0df86866ac3a4ffb6988b08e4 Merge: b9d1fed a22bd49 Author: Lee Hinman <lee@writequit.org> Date: Tue Feb 17 10:56:02 2015 -0700 Merge remote-tracking branch 'refs/remotes/origin/master' into shadow-replicas commit b9d1fed25ae472a9dce1904eb806702fba4d9786 Merge: 4473e63 41fd4d8 Author: Lee Hinman <lee@writequit.org> Date: Tue Feb 17 09:02:27 2015 -0700 Merge remote-tracking branch 'refs/remotes/origin/master' into shadow-replicas commit 4473e630460e2f0ca2a2e2478f3712f39a64c919 Author: Lee Hinman <lee@writequit.org> Date: Tue Feb 17 09:00:39 2015 -0700 Add asciidoc documentation for shadow replicas commit eb699c19f04965952ae45e2caf107124837c4654 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 16:15:39 2015 +0100 remove last nocommit commit c5ece6d16d423fbdd36f5d789bd8daa5724d77b0 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 16:13:12 2015 +0100 simplify shadow engine commit 45cd34a12a442080477da3ef14ab2fe7947ea97e Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 11:32:57 2015 +0100 fix tests commit 744f228c192602a6737051571e040731d413ba8b Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 11:28:12 2015 +0100 revert changes to IndexShardGateway - these are leftovers from previous iterations commit 11886b7653dabc23655ec76d112f291301f98f4a Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 11:26:48 2015 +0100 Back out non-shared FS code. this will go in in a second iteration commit 77fba571f150a0ca7fb340603669522c3ed65363 Merge: e8ad614 2e3c6a9 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 11:16:46 2015 +0100 Merge branch 'master' into shadow-replicas Conflicts: src/main/java/org/elasticsearch/index/engine/Engine.java commit e8ad61467304e6d175257e389b8406d2a6cf8dba Merge: 48a700d 1b8d8da Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 10:54:20 2015 +0100 Merge branch 'master' into shadow-replicas commit 48a700d23cff117b8e4851d4008364f92b8272a0 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 10:50:59 2015 +0100 add test for failing shadow engine / remove nocommit commit d77414c5e7b2cde830a8e3f70fe463ccc904d4d0 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 17 10:27:56 2015 +0100 remove nocommits in IndexMetaData commit abb696563a9e418d3f842a790fcb832f91150be2 Author: Simon Willnauer <simonw@apache.org> Date: Mon Feb 16 17:05:02 2015 +0100 remove nocommit and simplify delete logic commit 82b9f0449108cd4741568d9b4495bf6c10a5b019 Author: Simon Willnauer <simonw@apache.org> Date: Mon Feb 16 16:45:27 2015 +0100 reduce the changes compared to master commit 28f069b6d99a65e285ac8c821e6a332a1d8eb315 Author: Simon Willnauer <simonw@apache.org> Date: Mon Feb 16 16:43:46 2015 +0100 fix primary relocation commit c4c999dd61a44a7a0db9798275a622f2b85b1039 Merge: 2ae80f9 455a85d Author: Simon Willnauer <simonw@apache.org> Date: Mon Feb 16 15:04:26 2015 +0100 Merge branch 'master' into shadow-replicas commit 2ae80f9689346f8fd346a0d3775a6341874d8bef Author: Lee Hinman <lee@writequit.org> Date: Fri Feb 13 16:25:34 2015 -0700 throw UnsupportedOperationException on write operations in ShadowEngine commit 740c28dd9ef987bf56b670fa1a8bcc6de2845819 Merge: e5bc047 305ba33 Author: Lee Hinman <lee@writequit.org> Date: Fri Feb 13 15:38:39 2015 -0700 Merge branch 'master' into shadow-replicas commit e5bc047d7c872ae960d397b1ae7b4b78d6a1ea10 Author: Lee Hinman <lee@writequit.org> Date: Fri Feb 13 11:38:09 2015 -0700 Don't replicate document request when using shadow replicas commit 213292e0679d8ae1492ea11861178236f4abd8ea Author: Simon Willnauer <simonw@apache.org> Date: Fri Feb 13 13:58:05 2015 +0100 add one more nocommit commit 83d171cf632f9b77cca9de58505f7db8fcda5599 Merge: aea9692 09eb8d1 Author: Simon Willnauer <simonw@apache.org> Date: Fri Feb 13 13:52:29 2015 +0100 Merge branch 'master' into shadow-replicas commit aea96920d995dacef294e48e719ba18f1ecf5860 Author: Simon Willnauer <simonw@apache.org> Date: Fri Feb 13 09:56:41 2015 +0100 revert unneeded changes on Store commit ea4e3e58dc6959a92c06d5990276268d586735f3 Author: Lee Hinman <lee@writequit.org> Date: Thu Feb 12 14:26:30 2015 -0700 Add documentation to ShadowIndexShard, remove nocommit commit 4f71c8d9f706a0c1c39aa3a370efb1604559d928 Author: Lee Hinman <lee@writequit.org> Date: Thu Feb 12 14:17:22 2015 -0700 Add documentation to ShadowEngine commit 28a9d1842722acba7ea69e0fa65200444532a30c Author: Lee Hinman <lee@writequit.org> Date: Thu Feb 12 14:08:25 2015 -0700 Remove nocommit, document canDeleteIndexContents commit d8d59dbf6d0525cd823d97268d035820e5727ac9 Author: Lee Hinman <lee@writequit.org> Date: Thu Feb 12 10:34:32 2015 -0700 Refactor more shared methods into the abstract Engine commit a7eb53c1e8b8fbfd9281b43ae39eacbe3cd1a0a6 Author: Simon Willnauer <simonw@apache.org> Date: Thu Feb 12 17:38:59 2015 +0100 Simplify shared filesystem recovery by using a dedicated recovery handler that skip most phases and enforces shard closing on the soruce before the target opens it's engine commit a62b9a70adad87d7492c526f4daf868cb05018d9 Author: Simon Willnauer <simonw@apache.org> Date: Thu Feb 12 15:59:54 2015 +0100 fix compile error after upstream changes commit abda7807bc3328a89fd783ca7ad8c6deac35f16f Merge: f229719 35f6496 Author: Simon Willnauer <simonw@apache.org> Date: Thu Feb 12 15:57:28 2015 +0100 Merge branch 'master' into shadow-replicas Conflicts: src/main/java/org/elasticsearch/index/engine/Engine.java commit f2297199b7dd5d3f9f1f109d0ddf3dd83390b0d1 Author: Simon Willnauer <simonw@apache.org> Date: Thu Feb 12 12:41:32 2015 +0100 first cut at catchup from primary make flush to a refresh factor our ShadowIndexShard to have IndexShard be idential to the master and least intrusive cleanup abstractions commit 4a367c07505b84b452807a58890f1cbe21711f27 Author: Simon Willnauer <simonw@apache.org> Date: Thu Feb 12 09:50:36 2015 +0100 fix primary promotion commit cf2fb807e7e243f1ad603a79bc9d5f31a499b769 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 16:45:41 2015 -0700 Make assertPathHasBeenCleared recursive commit 5689b7d2f84ca1c41e4459030af56cb9c0151eff Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 15:58:19 2015 -0700 Add testShadowReplicaNaturalRelocation commit fdbe4133537eaeb768747c2200cfc91878afeb97 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 15:28:57 2015 -0700 Use check for shared filesystem in primary -> primary relocation Also adds a nocommit commit 06e2eb4496762130af87ce68a47d360962091697 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 15:21:32 2015 -0700 Add a test checking that indices with shadow replicas clean up after themselves commit e4dbfb09a689b449f0edf6ee24222d7eaba2a215 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 15:08:18 2015 -0700 Fix segment info for ShadowEngine, remove test nocommit commit 80cf0e884c66eda7d59ac5d59235e1ce215af8f5 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 14:30:13 2015 -0700 Remove nocommit in ShadowEngineTests#testFailStart() commit 5e33eeaca971807b342f9be51a6a566eee005251 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 14:22:59 2015 -0700 Remove overly-complex test commit 2378fbb917b467e79c0262d7a41c23321bbeb147 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 13:45:44 2015 -0700 Fix missing import commit 52e9cd1b8334a5dd228d5d68bd03fd0040e9c8e9 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 13:45:05 2015 -0700 Add a test for replica -> primary promotion commit a95adbeded426d7f69f6ddc4cbd6712b6f6380b4 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 12:54:14 2015 -0700 Remove tests that don't apply to ShadowEngine commit 1896feda9de69e4f9cf774ef6748a5c50e953946 Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 10:29:12 2015 -0700 Add testShadowEngineIgnoresWriteOperations and testSearchResultRelease commit 67d7df41eac5e10a1dd63ddb31de74e326e9d38b Author: Lee Hinman <lee@writequit.org> Date: Wed Feb 11 10:06:05 2015 -0700 Add start of ShadowEngine unit tests commit ca9beb2d93d9b5af9aa6c75dbc0ead4ef57e220d Merge: 2d42736 57a4646 Author: Simon Willnauer <simonw@apache.org> Date: Wed Feb 11 18:03:53 2015 +0100 Merge branch 'master' into shadow-replicas commit 2d42736fed3ed8afda7e4aff10b65d292e1c6f92 Author: Simon Willnauer <simonw@apache.org> Date: Wed Feb 11 17:51:22 2015 +0100 shortcut recovery if we are on a shared FS - no need to compare files etc. commit 24d36c92dd82adce650e7ac8e9f0b43c83b2dc53 Author: Simon Willnauer <simonw@apache.org> Date: Wed Feb 11 17:08:08 2015 +0100 utilize the new delete code commit 2a2eed10f58825aae29ffe4cf01aefa5743a97c7 Merge: 343dc0b 173cfc1 Author: Simon Willnauer <simonw@apache.org> Date: Wed Feb 11 16:07:41 2015 +0100 Merge branch 'master' into shadow-replicas Conflicts: src/main/java/org/elasticsearch/gateway/GatewayMetaState.java commit 343dc0b527a7052acdc783ac5abcaad1ef78dbda Author: Simon Willnauer <simonw@apache.org> Date: Wed Feb 11 16:05:28 2015 +0100 long adder is not available in java7 commit be02cabfeebaea74b51b212957a2a466cfbfb716 Author: Lee Hinman <lee@writequit.org> Date: Tue Feb 10 22:04:24 2015 -0700 Add test that restarts nodes to ensure shadow replicas recover commit 7fcb373f0617050ca1a5a577b8cf32e32dc612b0 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 10 23:19:21 2015 +0100 make test more evil commit 38135af0c1991b88f168ece0efb72ffe9498ff59 Author: Simon Willnauer <simonw@apache.org> Date: Tue Feb 10 22:25:11 2015 +0100 make tests pass commit 05975af69e6db63cb95f3e40d25bfa7174e006ea Author: Lee Hinman <lee@writequit.org> Date: Mon Jan 12 18:44:29 2015 +0100 Add ShadowEngine 2015-02-18 17:34:06 -05:00			`[[indices-shadow-replicas]]`
			`== Shadow replica indices`

			`experimental[]`

			`If you would like to use a shared filesystem, you can use the shadow replicas`
			`settings to choose where on disk the data for an index should be kept, as well`
			`as how Elasticsearch should replay operations on all the replica shards of an`
			`index.`

			In order to fully utilize the `index.data_path` and `index.shadow_replicas`
			`settings, you need to enable using it in elasticsearch.yml:`

			`[source,yaml]`
			`--------------------------------------------------`
			`node.enable_custom_paths: true`
			`--------------------------------------------------`

			`You can then create an index with a custom data path, where each node will use`
			`this path for the data:`

			`[WARNING]`
			`========================`
			`Because shadow replicas do not index the document on replica shards, it's`
			`possible for the replica's known mapping to be behind the index's known mapping`
			`if the latest cluster state has not yet been processed on the node containing`
			`the replica. Because of this, it is highly recommended to use pre-defined`
			`mappings when using shadow replicas.`
			`========================`

			`[source,js]`
			`--------------------------------------------------`
			`curl -XPUT 'localhost:9200/my_index' -d '`
			`{`
			`"index" : {`
			`"number_of_shards" : 1,`
			`"number_of_replicas" : 4,`
			`"data_path": "/var/data/my_index",`
			`"shadow_replicas": true`
			`}`
			`}'`
			`--------------------------------------------------`

			`[WARNING]`
			`========================`
			`In the above example, the "/var/data/my_index" path is a shared filesystem that`
			`must be available on every node in the Elasticsearch cluster. You must also`
			`ensure that the Elasticsearch process has the correct permissions to read from`
			and write to the directory used in the `index.data_path` setting.
			`========================`

			An index that has been created with the `index.shadow_replicas` setting set to
			`"true" will not replicate document operations to any of the replica shards,`
			`instead, it will only continually refresh. Once segments are available on the`
			`filesystem where the shadow replica resides (after an Elasticsearch "flush"), a`
			regular refresh (governed by the `index.refresh_interval`) can be used to make
			`the new data searchable.`

			`NOTE: Since documents are only indexed on the primary shard, realtime GET`
			`requests could fail to return a document if executed on the replica shard,`
			therefore, GET API requests automatically have the `?preference=_primary` flag
			`set if there is no preference flag already set.`

			`In order to ensure the data is being synchronized in a fast enough manner, you`
			`may need to tune the flush threshold for the index to a desired number. A flush`
			`is needed to fsync segment files to disk, so they will be visible to all other`
			`replica nodes. Users should test what flush threshold levels they are`
			`comfortable with, as increased flushing can impact indexing performance.`

			`The Elasticsearch cluster will still detect the loss of a primary shard, and`
			`transform the replica into a primary in this situation. This transformation will`
			take slightly longer, since no `IndexWriter` is maintained for each shadow
			`replica.`

			`Below is the list of settings that can be changed using the update`
			`settings API:`

			`index.data_path` (string)::
			`Path to use for the index's data. Note that by default Elasticsearch will`
			`append the node ordinal by default to the path to ensure multiple instances`
			`of Elasticsearch on the same machine do not share a data directory.`

			`index.shadow_replicas`::
			`Boolean value indicating this index should use shadow replicas. Defaults to`
			`false`.

			`index.shared_filesystem`::
			`Boolean value indicating this index uses a shared filesystem. Defaults to`
			the `true` if `index.shadow_replicas` is set to true, `false` otherwise.

			`=== Node level settings related to shadow replicas`

			These are non-dynamic settings that need to be configured in `elasticsearch.yml`

			`node.add_id_to_custom_path`::
			`Boolean setting indicating whether Elasticsearch should append the node's`
			`ordinal to the custom data path. For example, if this is enabled and a path`
			`of "/tmp/foo" is used, the first locally-running node will use "/tmp/foo/0",`
			`the second will use "/tmp/foo/1", the third "/tmp/foo/2", etc. Defaults to`
			`true`.

			`node.enable_custom_paths`::
			Boolean value that must be set to `true` in order to use the
			`index.data_path` setting. Defaults to `false`.