[DOC] simplify docs for repository-hdfs
parent d171773bdb · commit 323111b715

The HDFS repository plugin adds support for using the HDFS File System as a repository for snapshot/restore.

[float]
==== Installation

This plugin can be installed through the plugin manager using _one_ of the following packages:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs
sudo bin/plugin install repository-hdfs-hadoop2
sudo bin/plugin install repository-hdfs-lite
----------------------------------------------------------------

The chosen plugin must be installed on _every_ node in the cluster, and each node must
be restarted after installation.

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed by specifying the _installed_ package using _one_ of the following commands:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
sudo bin/plugin remove repository-hdfs-hadoop2
sudo bin/plugin remove repository-hdfs-lite
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin comes in three _flavors_:

* Default / Hadoop 1.x::
The default version contains the plugin jar alongside Apache Hadoop 1.x (stable) dependencies.
* YARN / Hadoop 2.x::
The `hadoop2` version contains the plugin jar plus the Apache Hadoop 2.x (also known as YARN) dependencies. It is built against the latest Apache Hadoop 2.x (currently 2.7.1); if the distro you are using is not protocol compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions required).
* Lite::
The `lite` version contains just the plugin jar, without any Hadoop dependencies. The user should provide these (read below).

Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons the required libraries need to be placed under the plugin folder.
Note that in most cases, if the distro is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files (see below).

[[repository-hdfs-flavor]]
===== What version to use?

It depends on whether Hadoop is installed locally and, if it is not, whether your distro is compatible with Apache Hadoop clients.

* Are you using Apache Hadoop (or a _compatible_ distro) and do not have it installed on the Elasticsearch nodes?::
+
If the answer is yes, use the default `repository-hdfs` for Apache Hadoop 1, or `repository-hdfs-hadoop2` for Apache Hadoop 2.
+
* Do you have Hadoop installed locally on the Elasticsearch nodes, or are you using a certain distro?::
+
Use the `lite` version and place your Hadoop _client_ jars and their dependencies in the plugin folder under `hadoop-libs`, as sketched below.
For large deployments, it is recommended to package the libraries in the plugin zip and deploy it manually across nodes
(thus avoiding having to set up the libraries on each node).
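
For illustration, a minimal sketch of the `lite` setup, assuming the plugin lives under `plugins/repository-hdfs` and the Hadoop client jars are already on the machine (all paths here are hypothetical placeholders):

[source,sh]
----------------------------------------------------------------
# hypothetical paths - adjust to your Elasticsearch and Hadoop layout
cd /usr/share/elasticsearch/plugins/repository-hdfs
mkdir -p hadoop-libs
# copy the Hadoop client jars and their dependencies into the plugin folder
cp /opt/hadoop/share/hadoop/common/*.jar     hadoop-libs/
cp /opt/hadoop/share/hadoop/common/lib/*.jar hadoop-libs/
cp /opt/hadoop/share/hadoop/hdfs/*.jar       hadoop-libs/
----------------------------------------------------------------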

[[repository-hdfs-security]]
==== Handling JVM Security and Permissions

Out of the box, Elasticsearch runs in a JVM with the security manager turned _on_ to make sure that unsafe or sensitive actions
are allowed only from trusted code. Hadoop, however, is not really designed to run under one; it does not rely on privileged blocks
to execute sensitive code, of which it uses plenty.

The `repository-hdfs` plugin provides the necessary permissions for both Apache Hadoop 1.x and 2.x (latest versions) to successfully
run in a secured JVM, as one can tell from the number of permissions requested when installing the plugin.
However, using a certain Hadoop file-system (outside DFS), a certain distro, or a certain operating system (in particular Windows) might require
additional permissions which are not provided by the plugin.

In this case there are several workarounds:

* add the permission to `plugin-security.policy` (available in the plugin folder); see the sketch after this list
* disable the security manager through the `es.security.manager.enabled=false` configuration setting - NOT RECOMMENDED
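
As an illustration of the first workaround, a grant entry in `plugin-security.policy` looks roughly like the following. The permission shown is a made-up example (a socket permission for a hypothetical NameNode host); use whatever permission the security exception in your own logs actually reports:

----------------------------------------------------------------
grant {
  // hypothetical example entry - replace with the permission
  // your Hadoop distro actually fails on
  permission java.net.SocketPermission "namenode.example.com:8020", "connect,resolve";
};
----------------------------------------------------------------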

If you find yourself in such a situation, please let us know which Hadoop distro version and OS you are using and which permission is missing
by raising an issue. Thank you!

Windows Users::
Using Apache Hadoop on Windows is problematic and thus not recommended. For those _really_ wanting to use it, make sure you place the elusive `winutils.exe` under the
plugin folder and point the `HADOOP_HOME` variable to it; this should minimize the number of permissions Hadoop requires (though some would still have to be added).
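
For example, a hypothetical sketch for a Windows command prompt; the plugin path is an assumption about your installation, and Hadoop typically looks for `winutils.exe` under `%HADOOP_HOME%\bin`:

----------------------------------------------------------------
REM hypothetical paths - adjust to your installation
copy winutils.exe C:\elasticsearch\plugins\repository-hdfs\bin\
SET HADOOP_HOME=C:\elasticsearch\plugins\repository-hdfs
----------------------------------------------------------------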

[[repository-hdfs-config]]
==== Configuration Properties

NOTE: Be careful when including paths within the `uri` setting; some implementations ignore them completely while
others consider them. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
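
As an illustration of that recommendation, a hedged sketch of registering an HDFS repository through the snapshot REST API; the repository name, host, and paths below are hypothetical placeholders:

[source,sh]
----------------------------------------------------------------
# hypothetical names and paths - keep the uri minimal and put the
# directory layout in the path setting instead
curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode.example.com:8020/",
    "path": "elasticsearch/repositories/my_backup"
  }
}'
----------------------------------------------------------------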

[[repository-hdfs-other-fs]]
==== Plugging other file-systems

Any HDFS-compatible file-system (like Amazon `s3://` or Google `gs://`) can be used as long as the proper Hadoop
configuration is passed to the Elasticsearch plugin. In practice, this means making sure the correct Hadoop configuration
files (`core-site.xml` and `hdfs-site.xml`) and the file-system's jars are available on the plugin classpath, just as you would with any
other Hadoop client or job.

Otherwise, the plugin will only read the _default_, vanilla configuration of Hadoop and will not be able to recognize
the plugged-in file-system.
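
In practice that can be as simple as the following hedged sketch; the source paths, and the `hadoop-aws` jar used here as the `s3://` example, are assumptions about your particular layout:

[source,sh]
----------------------------------------------------------------
# hypothetical paths - put the Hadoop configuration files and the
# extra file-system jars where the plugin classpath can see them
cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml \
   /usr/share/elasticsearch/plugins/repository-hdfs/
cp /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-*.jar \
   /usr/share/elasticsearch/plugins/repository-hdfs/
----------------------------------------------------------------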