
[[repository-hdfs]]
=== Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using the HDFS file system as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

[[repository-hdfs-install]]
[float]
==== Installation

This plugin can be installed through the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install repository-hdfs
----------------------------------------------------------------

The plugin must be installed on _every_ node in the cluster, and each node must
be restarted after installation.

This plugin can be downloaded for <<plugin-management-custom-url,offline install>> from
{plugin_url}/repository-hdfs/{version}/repository-hdfs-{version}.zip.
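
For an offline install, the plugin manager accepts a `file://` URL in place of the plugin name, so the downloaded archive must be referenced by an absolute file URL. The following is a minimal sketch of building that command. It assumes the zip has already been downloaded; the `/tmp/repository-hdfs.zip` location and the `offline_install_command` helper are hypothetical.

[source,python]
----
import shlex
from pathlib import Path


def offline_install_command(zip_path: str) -> list[str]:
    """Build the argv that installs the plugin from a local zip (hypothetical helper)."""
    # The plugin manager expects an absolute file:// URL, not a bare path.
    url = Path(zip_path).absolute().as_uri()
    return ["sudo", "bin/elasticsearch-plugin", "install", url]


# Hypothetical download location; substitute the real path to the archive.
cmd = offline_install_command("/tmp/repository-hdfs.zip")
print(shlex.join(cmd))
----

The command still has to be run on every node, followed by a restart.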

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed by specifying the _installed_ package:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove repository-hdfs
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distribution you are using is not protocol
compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the required security permissions).

Even if Hadoop is already installed on the Elasticsearch nodes, the required libraries must be placed under the plugin folder for security reasons. Note that in most cases, if the distribution is compatible, you only need to configure the repository with the appropriate Hadoop configuration files (see below).

Windows Users::
Using Apache Hadoop on Windows is problematic and is therefore not recommended. If you _really_ want to use it, place the elusive `winutils.exe` under the
plugin folder and point the `HADOOP_HOME` variable to it; this should minimize the permissions Hadoop requires (though some additional permissions are still needed).

[[repository-hdfs-config]]
==== Configuration Properties

Once installed, define the configuration for the `hdfs` repository through the
{ref}/modules-snapshots.html[REST API]:

[source,js]
----
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "true"
  }
}
----
// CONSOLE
// TEST[skip:we don't have hdfs set up while testing this]
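
The same registration request can be sketched with only the Python standard library. This is an illustration, not an official client: `ES_URL` (`http://localhost:9200`) is a hypothetical cluster address and `register_hdfs_repository` is a hypothetical helper. The request is built but not sent, so nothing contacts a cluster until it is passed to `urllib.request.urlopen`.

[source,python]
----
import json
import urllib.request

# Hypothetical cluster address; adjust for your deployment.
ES_URL = "http://localhost:9200"


def register_hdfs_repository(name: str, settings: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT _snapshot/<name> request for an hdfs repository."""
    body = json.dumps({"type": "hdfs", "settings": settings}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = register_hdfs_repository(
    "my_hdfs_repository",
    {
        "uri": "hdfs://namenode:8020/",
        "path": "elasticsearch/repositories/my_hdfs_repository",
        "conf.dfs.client.read.shortcircuit": "true",
    },
)
# To actually send it: urllib.request.urlopen(req)
----

Keeping request construction separate from sending makes the payload easy to inspect before it reaches the cluster.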

The following settings are supported:

[horizontal]
`uri`::

    The URI address for HDFS, for example "hdfs://<host>:<port>/". (Required)

`path`::

    The file path within the file system where data is stored and loaded, for example "path/to/file". (Required)

`load_defaults`::

    Whether to load the default Hadoop configuration or not. (Enabled by default)

`conf.<key>`::

    Inlined configuration parameter to be added to the Hadoop configuration. (Optional)
    Only client-oriented properties from the Hadoop http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml[core] and http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs] configuration files are recognized by the plugin.

`compress`::

    Whether to compress the metadata or not. (Disabled by default)

`chunk_size`::

    Override the chunk size. (Disabled by default)
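
The following is a minimal sketch of how these settings fit together. It is based on assumptions, not on the plugin's source:

* Keys prefixed with `conf.` are handed to Hadoop with the prefix removed.
* `uri` must use the `hdfs://` scheme and carry only the address, with the directory supplied through `path`. This matches the "HDFS address only" note in the `elasticsearch.yml` example in this section.

The helpers `split_hdfs_settings` and `check_required` are hypothetical.

[source,python]
----
from urllib.parse import urlparse

CONF_PREFIX = "conf."


def split_hdfs_settings(settings: dict) -> tuple[dict, dict]:
    """Separate plugin settings from inlined Hadoop configuration entries."""
    plugin, hadoop = {}, {}
    for key, value in settings.items():
        if key.startswith(CONF_PREFIX):
            # e.g. conf.dfs.client.read.shortcircuit -> dfs.client.read.shortcircuit
            hadoop[key[len(CONF_PREFIX):]] = value
        else:
            plugin[key] = value
    return plugin, hadoop


def check_required(plugin: dict) -> None:
    """Raise ValueError if the required uri/path settings are missing or malformed."""
    for key in ("uri", "path"):
        if not plugin.get(key):
            raise ValueError(f"missing required setting [{key}]")
    uri = urlparse(plugin["uri"])
    if uri.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// uri, got [{plugin['uri']}]")
    if uri.path not in ("", "/"):
        # The directory belongs in `path`, not in the uri.
        raise ValueError("uri must be an address only; use [path] for the directory")
----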


Alternatively, you can define the `hdfs` repository and its settings in your `elasticsearch.yml`:
[source,yaml]
----
repositories:
  hdfs:
    uri: "hdfs://<host>:<port>/"    \# required - HDFS address only
    path: "some/path"               \# required - path within the file-system where data is stored/loaded
    load_defaults: "true"           \# optional - whether to load the default Hadoop configuration (default) or not
    conf.<key> : "<value>"          \# optional - 'inlined' key=value added to the Hadoop configuration
    compress: "false"               \# optional - whether to compress the metadata or not (default)
    chunk_size: "10mb"              \# optional - chunk size (disabled by default)
----
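
The `chunk_size` value above ("10mb") is an Elasticsearch byte-size string. The following sketch shows how such a string maps to a number of bytes, assuming the binary (1024-based) multipliers Elasticsearch uses for its `kb`/`mb`/`gb` suffixes. `parse_byte_size` is a hypothetical helper, not part of the plugin, and it handles only whole numbers and a subset of suffixes.

[source,python]
----
import re

# Binary multipliers, assumed to match Elasticsearch byte-size units.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}


def parse_byte_size(value: str) -> int:
    """Convert a size string such as "10mb" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", value.lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized byte size [{value}]")
    return int(match.group(1)) * _UNITS[match.group(2)]
----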