From f9c938666b1dac84514a2a18c4f8d632959afd11 Mon Sep 17 00:00:00 2001 From: Uma Maheswara Rao G Date: Tue, 12 Jun 2012 18:19:46 +0000 Subject: [PATCH] HDFS-3389. Document the BKJM usage in Namenode HA. Contributed by Uma Maheswara Rao G and Ivan Kelly. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1349466 13f79535-47bb-0310-9956-ffa450edef68 --- .../src/site/apt/HDFSHighAvailability.apt.vm | 153 +++++++++++++++++- 1 file changed, 152 insertions(+), 1 deletion(-) diff --git a/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HDFSHighAvailability.apt.vm b/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HDFSHighAvailability.apt.vm index 67d423221c0..7e7cb66772f 100644 --- a/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HDFSHighAvailability.apt.vm +++ b/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HDFSHighAvailability.apt.vm @@ -712,4 +712,155 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda Even if automatic failover is configured, you may initiate a manual failover using the same <<>> command. It will perform a coordinated - failover. \ No newline at end of file + failover. + + +* BookKeeper as a Shared storage (EXPERIMENTAL) + + One option for shared storage for the NameNode is BookKeeper. + BookKeeper achieves high availability and strong durability guarantees by replicating + edit log entries across multiple storage nodes. The edit log can be striped across + the storage nodes for high performance. Fencing is supported in the protocol, i.e, + BookKeeper will not allow two writers to write the single edit log. + + The meta data for BookKeeper is stored in ZooKeeper. + In current HA architecture, a Zookeeper cluster is required for ZKFC. The same cluster can be + for BookKeeper metadata. + + For more details on building a BookKeeper cluster, please refer to the + {{{http://zookeeper.apache.org/bookkeeper/docs/trunk/bookkeeperConfig.html }BookKeeper documentation}} + + The BookKeeperJournalManager is an implementation of the HDFS JournalManager interface, which allows custom write ahead logging implementations to be plugged into the HDFS NameNode. + + **<> + + To use BookKeeperJournalManager, add the following to hdfs-site.xml. + +---- + + dfs.namenode.shared.edits.dir + bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsjournal + + + + dfs.namenode.edits.journal-plugin.bookkeeper + org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager + +---- + + The URI format for bookkeeper is <<>> is a list of semi-colon separated, zookeeper host:port + pairs. In the example above there are 3 servers, in the ensemble, + zk1, zk2 & zk3, each one listening on port 2181. + + <<<[root znode]>>> is the path of the zookeeper znode, under which the edit log + information will be stored. + + The class specified for the journal-plugin must be available in the NameNode's + classpath. We explain how to generate a jar file with the journal manager and + its dependencies, and how to put it into the classpath below. + + *** <> + + * <> - + Number of bytes a bookkeeper journal stream will buffer before + forcing a flush. Default is 1024. + +---- + + dfs.namenode.bookkeeperjournal.output-buffer-size + 1024 + +---- + + * <> - + Number of bookkeeper servers in edit log ensembles. This + is the number of bookkeeper servers which need to be available + for the edit log to be writable. Default is 3. + +---- + + dfs.namenode.bookkeeperjournal.ensemble-size + 3 + +---- + + * <> - + Number of bookkeeper servers in the write quorum. This is the + number of bookkeeper servers which must have acknowledged the + write of an entry before it is considered written. Default is 2. + +---- + + dfs.namenode.bookkeeperjournal.quorum-size + 2 + +---- + + * <> - + Password to use when creating edit log segments. + +---- + + dfs.namenode.bookkeeperjournal.digestPw + myPassword + +---- + + * <> - + Session timeout for Zookeeper client from BookKeeper Journal Manager. + Hadoop recommends that this value should be less than the ZKFC + session timeout value. Default value is 3000. + +---- + + dfs.namenode.bookkeeperjournal.zk.session.timeout + 3000 + +---- + + *** <> + + To generate the distribution packages for BK journal, do the + following. + + $ mvn clean package -Pdist + + This will generate a jar with the BookKeeperJournalManager, all the dependencies + needed by the journal manager, + hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-.jar + + Note that the -Pdist part of the build command is important, as otherwise + the dependencies would not be packaged in the jar. The dependencies included in + the jar are {{{http://maven.apache.org/plugins/maven-shade-plugin/}shaded}} to + avoid conflicts with other dependencies of the NameNode. + + *** <> + + To run a HDFS namenode using BookKeeper as a backend, copy the bkjournal + jar, generated above, into the lib directory of hdfs. In the standard + distribution of HDFS, this is at $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/ + + cp hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-.jar $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/ + + *** <> + + 1) NameNode format command will not format the BookKeeper data automatically. + We have to clean the data manually from BookKeeper cluster + and create the /ledgers/available path in Zookeeper. +---- +$ zkCli.sh create /ledgers 0 +$ zkCli.sh create /ledgers/available 0 +---- + Note: + bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsjournal + The final part /hdfsjournal specifies the znode in zookeeper where + ledger metadata will be stored. Administrators may set this to anything + they wish. + + 2) Security in BookKeeper. BookKeeper does not support SASL nor SSL for + connections between the NameNode and BookKeeper storage nodes. + + 3) Auto-Recovery of storage node failures. Work inprogress + {{{https://issues.apache.org/jira/browse/BOOKKEEPER-237 }BOOKKEEPER-237}}. + Currently we have the tools to manually recover the data from failed storage nodes. \ No newline at end of file