[DOC] simplify docs for repository-hdfs

Costin Leau 2015-12-20 01:49:28 +02:00
parent d171773bdb
commit 323111b715
1 changed files with 11 additions and 60 deletions


@ -8,29 +8,25 @@ The HDFS repository plugin adds support for using HDFS File System as a reposito
[float]
==== Installation
This plugin can be installed using the plugin manager using _one_ of the following packages:
This plugin can be installed through the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs
sudo bin/plugin install repository-hdfs-hadoop2
sudo bin/plugin install repository-hdfs-lite
----------------------------------------------------------------
The chosen plugin must be installed on every node in the cluster, and each node must
The plugin must be installed on _every_ node in the cluster, and each node must
be restarted after installation.
[[repository-hdfs-remove]]
[float]
==== Removal
The plugin can be removed by specifying the _installed_ package using _one_ of the following commands:
The plugin can be removed by specifying the _installed_ package:
[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
sudo bin/plugin remove repository-hdfs-hadoop2
sudo bin/plugin remove repository-hdfs-lite
----------------------------------------------------------------
The node must be stopped before removing the plugin.
@ -38,49 +34,15 @@ The node must be stopped before removing the plugin.
[[repository-hdfs-usage]]
==== Getting started with HDFS
The HDFS snapshot/restore plugin comes in three _flavors_:
The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distro you are using is not protocol
compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions required).
* Default / Hadoop 1.x::
The default version contains the plugin jar alongside Apache Hadoop 1.x (stable) dependencies.
* YARN / Hadoop 2.x::
The `hadoop2` version contains the plugin jar plus the Apache Hadoop 2.x (also known as YARN) dependencies.
* Lite::
The `lite` version contains just the plugin jar, without any Hadoop dependencies. The user should provide these (read below).
Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons, the required libraries need to be placed under the plugin folder.
Note that in most cases, if the distro is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files (see below).
[[repository-hdfs-flavor]]
===== What version to use?
It depends on whether Hadoop is locally installed or not and if not, whether it is compatible with Apache Hadoop clients.
* Are you using Apache Hadoop (or a _compatible_ distro) and do not have it installed on the Elasticsearch nodes?::
+
If the answer is yes, for Apache Hadoop 1 use the default `repository-hdfs` or `repository-hdfs-hadoop2` for Apache Hadoop 2.
+
* If you have Hadoop installed locally on the Elasticsearch nodes or are using a certain distro::
+
Use the `lite` version and place your Hadoop _client_ jars and their dependencies in the plugin folder under `hadoop-libs`.
For large deployments, it is recommended to package the libraries in the plugin zip and deploy it manually across nodes
(thus avoiding the library setup on each node).
[[repository-hdfs-security]]
==== Handling JVM Security and Permissions
Out of the box, Elasticsearch runs in a JVM with the security manager turned _on_ to make sure that unsafe or sensitive actions
are allowed only from trusted code. Hadoop however is not really designed to run under one; it does not rely on privileged blocks
to execute sensitive code, of which it uses plenty.
The `repository-hdfs` plugin provides the necessary permissions for both Apache Hadoop 1.x and 2.x (latest versions) to successfully
run in a secured JVM as one can tell from the number of permissions required when installing the plugin.
However, using a particular Hadoop File-System (outside DFS), distro or operating system (in particular Windows) might require
additional permissions which are not provided by the plugin.
In this case there are several workarounds:
* add the permission into `plugin-security.policy` (available in the plugin folder)
* disable the security manager through the `es.security.manager.enabled=false` configuration setting - NOT RECOMMENDED
If you find yourself in such a situation, please let us know what Hadoop distro version and OS you are using and what permission is missing
by raising an issue. Thank you!
Windows Users::
Using Apache Hadoop on Windows is problematic and thus it is not recommended. For those _really_ wanting to use it, make sure you place the elusive `winutils.exe` under the
plugin folder and point the `HADOOP_HOME` variable to it; this should minimize the number of permissions Hadoop requires (though one would still have to add some more).
[[repository-hdfs-config]]
==== Configuration Properties
@ -104,15 +66,4 @@ repositories
----
NOTE: Be careful when including a path within the `uri` setting; some implementations ignore it completely while
others consider them. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
[[repository-hdfs-other-fs]]
==== Plugging other file-systems
Any HDFS-compatible file-system (like Amazon `s3://` or Google `gs://`) can be used as long as the proper Hadoop
configuration is passed to the Elasticsearch plugin. In practice, this means making sure the correct Hadoop configuration
files (`core-site.xml` and `hdfs-site.xml`) and the file-system's jars are available in the plugin classpath, just as you would with any
other Hadoop client or job.
Otherwise, the plugin will only read the _default_, vanilla configuration of Hadoop and will not be able to recognize
the plugged-in file-system.
others consider them. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
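
As an illustration of the recommendation above, the following is a minimal sketch of a repository definition that keeps the `uri` down to the file-system address and puts the directory in `path`. The NameNode address `namenode:8020`, the directory `elasticsearch/snapshots` and the repository name `my_hdfs_repository` are hypothetical placeholders, and the `hdfs` repository type and the use of the snapshot REST endpoint are assumptions not spelled out in this page; adjust them to your own setup.

```shell
# Hypothetical values; replace them with those of your cluster.
HDFS_URI="hdfs://namenode:8020/"         # file-system address only, no directory component
SNAPSHOT_PATH="elasticsearch/snapshots"  # directory inside the file system

# Build the repository definition: the address goes in `uri`, the directory in `path`.
BODY=$(printf '{"type":"hdfs","settings":{"uri":"%s","path":"%s"}}' "$HDFS_URI" "$SNAPSHOT_PATH")
echo "$BODY"

# To register the repository, send the body to a running node that has the plugin installed
# (left commented out because it needs a live cluster):
# curl -XPUT "http://localhost:9200/_snapshot/my_hdfs_repository" -d "$BODY"
```

Keeping the two values separate means the same `path` can be reused unchanged if the file-system address changes.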