From 323111b715eb384337c1188d6ca52555127c1acf Mon Sep 17 00:00:00 2001
From: Costin Leau
Date: Sun, 20 Dec 2015 01:49:28 +0200
Subject: [PATCH] [DOC] simplify docs for repository-hdfs

---
 docs/plugins/repository-hdfs.asciidoc | 102 ++++++++++++-----------------
 1 file changed, 41 insertions(+), 61 deletions(-)

diff --git a/docs/plugins/repository-hdfs.asciidoc b/docs/plugins/repository-hdfs.asciidoc
index 114dbf13035..53052604514 100644
--- a/docs/plugins/repository-hdfs.asciidoc
+++ b/docs/plugins/repository-hdfs.asciidoc
@@ -8,29 +8,25 @@ The HDFS repository plugin adds support for using HDFS File System as a reposito
 
 [float]
 ==== Installation
 
-This plugin can be installed using the plugin manager using _one_ of the following packages:
+This plugin can be installed through the plugin manager:
 
 [source,sh]
 ----------------------------------------------------------------
 sudo bin/plugin install repository-hdfs
-sudo bin/plugin install repository-hdfs-hadoop2
-sudo bin/plugin install repository-hdfs-lite
 ----------------------------------------------------------------
 
-The chosen plugin must be installed on every node in the cluster, and each node must
+The plugin must be installed on _every_ node in the cluster, and each node must
 be restarted after installation.
 
 [[repository-hdfs-remove]]
 [float]
 ==== Removal
 
-The plugin can be removed by specifying the _installed_ package using _one_ of the following commands:
+The plugin can be removed by specifying the _installed_ package:
 
 [source,sh]
 ----------------------------------------------------------------
 sudo bin/plugin remove repository-hdfs
-sudo bin/plugin remove repository-hdfs-hadoop2
-sudo bin/plugin remove repository-hdfs-lite
 ----------------------------------------------------------------
 
 The node must be stopped before removing the plugin.
@@ -38,49 +34,25 @@ The node must be stopped before removing the plugin.
 [[repository-hdfs-usage]]
 ==== Getting started with HDFS
 
-The HDFS snapshot/restore plugin comes in three _flavors_:
+The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distro you are using is not
+protocol-compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions they require).
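+
+For example, swapping in a distro's own client libraries could look like the commands below (the jar names and paths are
+illustrative, not an exact listing of the plugin folder contents):
+
+[source,sh]
+----------------------------------------------------------------
+cd plugins/repository-hdfs
+rm hadoop-*.jar                     # remove the bundled Hadoop 2.7.1 client jars
+cp /path/to/distro/client/*.jar .   # drop in the distro's protocol-compatible jars
+----------------------------------------------------------------
 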
-* Default / Hadoop 1.x::
-The default version contains the plugin jar alongside Apache Hadoop 1.x (stable) dependencies.
-* YARN / Hadoop 2.x::
-The `hadoop2` version contains the plugin jar plus the Apache Hadoop 2.x (also known as YARN) dependencies.
-* Lite::
-The `lite` version contains just the plugin jar, without any Hadoop dependencies. The user should provide these (read below).
+Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons the required libraries need to be placed under the plugin folder.
+Note that in most cases, if the distro is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files (see below).
 
-[[repository-hdfs-flavor]]
-===== What version to use?
-
-It depends on whether Hadoop is locally installed or not and if not, whether it is compatible with Apache Hadoop clients.
-
-* Are you using Apache Hadoop (or a _compatible_ distro) and do not have installed on the Elasticsearch nodes?::
-+
-If the answer is yes, for Apache Hadoop 1 use the default `repository-hdfs` or `repository-hdfs-hadoop2` for Apache Hadoop 2.
-+
-* If you are have Hadoop installed locally on the Elasticsearch nodes or are using a certain distro::
-+
-Use the `lite` version and place your Hadoop _client_ jars and their dependencies in the plugin folder under `hadoop-libs`.
-For large deployments, it is recommended to package the libraries in the plugin zip and deploy it manually across nodes
-(and thus avoiding having to do the libraries setup on each node).
-
-[[repository-hdfs-security]]
-==== Handling JVM Security and Permissions
-
-Out of the box, Elasticsearch runs in a JVM with the security manager turned _on_ to make sure that unsafe or sensitive actions
-are allowed only from trusted code. Hadoop however is not really designed to run under one; it does not rely on privileged blocks
-to execute sensitive code, of which it uses plenty.
-
-The `repository-hdfs` plugin provides the necessary permissions for both Apache Hadoop 1.x and 2.x (latest versions) to successfully
-run in a secured JVM as one can tell from the number of permissions required when installing the plugin.
-However using a certain Hadoop File-System (outside DFS), a certain distro or operating system (in particular Windows), might require
-additional permissions which are not provided by the plugin.
-
-In this case there are several workarounds:
-* add the permission into `plugin-security.policy` (available in the plugin folder)
-
-* disable the security manager through `es.security.manager.enabled=false` configurations setting - NOT RECOMMENDED
-
-If you find yourself in such a situation, please let us know what Hadoop distro version and OS you are using and what permission is missing
-by raising an issue. Thank you!
+Windows Users::
+Using Apache Hadoop on Windows is problematic and thus not recommended. For those _really_ wanting to use it, make sure you place the elusive `winutils.exe` under the
+plugin folder and point the `HADOOP_HOME` environment variable to it; this should minimize the number of permissions Hadoop requires (though some will still need to be added).
 
 [[repository-hdfs-config]]
 ==== Configuration Properties
@@ -104,15 +76,23 @@ repositories
 ----
 
-NOTE: Be careful when including a paths within the `uri` setting; Some implementations ignore them completely while
-others consider them. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
-
-[[repository-hdfs-other-fs]]
-==== Plugging other file-systems
-
-Any HDFS-compatible file-systems (like Amazon `s3://` or Google `gs://`) can be used as long as the proper Hadoop
-configuration is passed to the Elasticsearch plugin. In practice, this means making sure the correct Hadoop configuration
-files (`core-site.xml` and `hdfs-site.xml`) and its jars are available in plugin classpath, just as you would with any
-other Hadoop client or job.
-
-Otherwise, the plugin will only read the _default_, vanilla configuration of Hadoop and will not be able to recognized
-the plugged-in file-system.
+NOTE: Be careful when including a path within the `uri` setting; some implementations ignore it completely while
+others honor it. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.
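+
+Once the plugin is installed and the libraries are in place, the repository can be registered and used through the
+regular snapshot/restore API. The commands below are a minimal sketch: the repository name `my_hdfs_repository`, the
+NameNode address `namenode:8020` and the HDFS path are placeholders, to be replaced with values matching your cluster:
+
+[source,sh]
+----------------------------------------------------------------
+# register the repository with the snapshot API
+curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository' -d '{
+    "type": "hdfs",
+    "settings": {
+        "uri": "hdfs://namenode:8020/",
+        "path": "elasticsearch/repositories/my_hdfs_repository"
+    }
+}'
+
+# snapshot all indices into the repository
+curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository/snapshot_1?wait_for_completion=true'
+----------------------------------------------------------------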