HDFS-7581. HDFS documentation needs updating post-shell rewrite (aw)

Allen Wittenauer 2015-01-15 07:48:55 -08:00
parent 533e551eb4
commit ce0117636a
6 changed files with 560 additions and 287 deletions


@ -265,6 +265,8 @@ Trunk (Unreleased)
HDFS-7407. Minor typo in privileged pid/out/log names (aw)
HDFS-7581. HDFS documentation needs updating post-shell rewrite (aw)
Release 2.7.0 - UNRELEASED
INCOMPATIBLE CHANGES


@ -34,40 +34,40 @@ HDFS Federation
* Consists of directories, files and blocks
* It supports all the namespace related file system operations such as
create, delete, modify and list files and directories.
* <<Block Storage Service>> has two parts
* Block Management (which is done in Namenode)
* Provides datanode cluster membership by handling registrations, and
periodic heart beats.
* Processes block reports and maintains location of blocks.
* Supports block related operations such as create, delete, modify and
get block location.
* Manages replica placement and replication of a block for under
replicated blocks and deletes blocks that are over replicated.
* Storage - is provided by datanodes by storing blocks on the local file
system and allows read/write access.
The prior HDFS architecture allows only a single namespace for the
entire cluster. A single Namenode manages this namespace. HDFS
Federation addresses this limitation by adding support for multiple
Namenodes/namespaces to the HDFS file system.
* {Multiple Namenodes/Namespaces}
In order to scale the name service horizontally, federation uses multiple
independent Namenodes/namespaces. The Namenodes are federated; that is, the
Namenodes are independent and do not require coordination with each other.
The datanodes are used as common storage for blocks by all the Namenodes.
Each datanode registers with all the Namenodes in the cluster. Datanodes
send periodic heartbeats and block reports and handle commands from the
Namenodes.
Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views,
@ -78,48 +78,48 @@ HDFS Federation
<<Block Pool>>
A Block Pool is a set of blocks that belong to a single namespace.
Datanodes store blocks for all the block pools in the cluster.
It is managed independently of other block pools. This allows a namespace
to generate Block IDs for new blocks without the need for coordination
with the other namespaces. The failure of a Namenode does not prevent
the datanode from serving other Namenodes in the cluster.
A Namespace and its block pool together are called Namespace Volume.
It is a self-contained unit of management. When a Namenode/namespace
is deleted, the corresponding block pool at the datanodes is deleted.
Each namespace volume is upgraded as a unit, during cluster upgrade.
<<ClusterID>>
A new identifier <<ClusterID>> is added to identify all the nodes in
the cluster. When a Namenode is formatted, this identifier is provided
or auto generated. This ID should be used for formatting the other
Namenodes into the cluster.
** Key Benefits
* Namespace Scalability - HDFS cluster storage scales horizontally but
the namespace does not. Large deployments, or deployments using a large
number of small files, benefit from scaling the namespace by adding more
Namenodes to the cluster.
* Performance - File system operation throughput is limited by a single
Namenode in the prior architecture. Adding more Namenodes to the cluster
scales the file system read/write operations throughput.
* Isolation - A single Namenode offers no isolation in a multi-user
environment. An experimental application can overload the Namenode
and slow down production-critical applications. With multiple Namenodes,
different categories of applications and users can be isolated to
different namespaces.
* {Federation Configuration}
Federation configuration is <<backward compatible>> and allows existing
single-Namenode configurations to work without any change. The new
configuration is designed such that all the nodes in the cluster have the
same configuration, without the need for deploying different configurations
based on the type of the node in the cluster.
A new abstraction called <<<NameServiceID>>> is added with
@ -132,12 +132,12 @@ HDFS Federation
** Configuration:
<<Step 1>>: Add the following parameters to your configuration:
<<<dfs.nameservices>>>: Configure with a list of comma-separated
NameServiceIDs. This will be used by the Datanodes to determine all the
Namenodes in the cluster.
<<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
add the following configuration suffixed with the corresponding
<<<NameServiceID>>> into the common configuration file.
*---------------------+--------------------------------------------+
@ -159,7 +159,7 @@ HDFS Federation
| BackupNode | <<<dfs.namenode.backup.address>>> |
| | <<<dfs.secondary.namenode.keytab.file>>> |
*---------------------+--------------------------------------------+
Here is an example configuration with two namenodes:
----
@ -200,31 +200,31 @@ HDFS Federation
** Formatting Namenodes
<<Step 1>>: Format a namenode using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
----
Choose a unique cluster_id that will not conflict with other clusters in
your environment. If it is not provided, then a unique ClusterID is
auto-generated.
<<Step 2>>: Format additional namenode using the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
----
Note that the cluster_id in step 2 must be the same as the
cluster_id in step 1. If they are different, the additional Namenodes
will not be part of the federated cluster.
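For example, a sketch of the two steps with a hypothetical cluster ID <<<mycluster1>>>:
----
# on the first Namenode host
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId mycluster1
# on each additional Namenode host, reusing the same ID
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId mycluster1
----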
** Upgrading from an older release and configuring federation
Older releases support only a single Namenode.
Upgrade the cluster to a newer release in order to enable federation.
During upgrade you can provide a ClusterID as follows:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --config $HADOOP_CONF_DIR --daemon start namenode -upgrade -clusterId <cluster_ID>
----
If ClusterID is not provided, it is auto generated.
@ -234,8 +234,8 @@ HDFS Federation
* Add configuration parameter <<<dfs.nameservices>>> to the configuration.
* Update the configuration with the NameServiceID suffix. Configuration
key names have changed post release 0.20. You must use the new configuration
parameter names for federation.
* Add new Namenode related config to the configuration files.
@ -244,11 +244,11 @@ HDFS Federation
* Start the new Namenode, Secondary/Backup.
* Refresh the datanodes to pick up the newly added Namenode by running
the following command:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNamenodes <datanode_host_name>:<datanode_rpc_port>
----
* The above command must be run against all the datanodes in the cluster.
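As a sketch, assuming the datanode hostnames are listed one per line in the
<<<slaves>>> file and the datanodes use the default IPC port of 50020, the
refresh can be scripted as:
----
[hdfs]$ for dn in $(cat $HADOOP_CONF_DIR/slaves); do
          $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNamenodes ${dn}:50020
        done
----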
@ -260,37 +260,37 @@ HDFS Federation
To start the cluster run the following command:
----
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
----
To stop the cluster run the following command:
----
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
----
These commands can be run from any node where the HDFS configuration is
available. The command uses the configuration to determine the Namenodes
in the cluster and starts the Namenode process on those nodes. The
datanodes are started on the nodes specified in the <<<slaves>>> file. The
script can be used as a reference for building your own scripts for
starting and stopping the cluster.
** Balancer
Balancer has been changed to work with multiple Namenodes in the cluster to
balance the cluster. Balancer can be run using the command:
----
"$HADOOP_PREFIX"/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start balancer [-policy <policy>]
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
----
Policy could be:
* <<<datanode>>> - this is the <default> policy. This balances the storage at
the datanode level. This is similar to the balancing policy from prior releases.
* <<<blockpool>>> - this balances the storage at the block pool level.
Balancing at the block pool level also balances storage at the datanode level.
Note that Balancer only balances the data and does not balance the namespace.
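For example, a sketch of starting the balancer as a daemon with the block pool policy:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer -policy blockpool
----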
@ -298,43 +298,43 @@ HDFS Federation
** Decommissioning
Decommissioning is similar to prior releases. The nodes that need to be
decommissioned are added to the exclude file at all of the Namenodes. Each
Namenode decommissions its Block Pool. When all the Namenodes finish
decommissioning a datanode, the datanode is considered decommissioned.
<<Step 1>>: To distribute an exclude file to all the Namenodes, use the
following command:
----
"$HADOOP_PREFIX"/bin/distributed-exclude.sh <exclude_file>
[hdfs]$ $HADOOP_PREFIX/sbin/distributed-exclude.sh <exclude_file>
----
<<Step 2>>: Refresh all the Namenodes to pick up the new exclude file.
----
"$HADOOP_PREFIX"/bin/refresh-namenodes.sh
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
----
The above command uses HDFS configuration to determine the Namenodes
configured in the cluster and refreshes all the Namenodes to pick up
the new exclude file.
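Putting the two steps together, a sketch of decommissioning a single datanode
(the hostname and exclude file path are hypothetical):
----
[hdfs]$ echo "dn3.example.com" >> /tmp/exclude_file
[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh /tmp/exclude_file
[hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
----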
** Cluster Web Console
Similar to Namenode status web page, a Cluster Web Console is added in
federation to monitor the federated cluster at
<<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
Any Namenode in the cluster can be used to access this web page.
The web page provides the following information:
* Cluster summary that shows number of files, number of blocks and
total configured storage capacity, available and used storage information
for the entire cluster.
* Provides list of Namenodes and summary that includes number of files,
blocks, missing blocks, number of live and dead data nodes for each
Namenode. It also provides a link to conveniently access Namenode web UI.
* It also provides decommissioning status of datanodes.


@ -18,7 +18,7 @@
HDFS Commands Guide
%{toc|section=1|fromDepth=2|toDepth=3}
* Overview
@ -26,39 +26,37 @@ HDFS Commands Guide
hdfs script without any arguments prints the description for all
commands.
Usage: <<<hdfs [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
Hadoop has an option parsing framework that employs parsing generic options as
well as running classes.
*-------------------------+---------------------------------------------------+
|| COMMAND_OPTIONS        || Description
*-------------------------+---------------------------------------------------+
| SHELL_OPTIONS           | The common set of shell options. These are documented on the {{{../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell Options}Commands Manual}} page.
*-------------------------+---------------------------------------------------+
| GENERIC_OPTIONS         | The common set of options supported by multiple commands. See the Hadoop {{{../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic Options}Commands Manual}} for more information.
*-------------------------+---------------------------------------------------+
| COMMAND COMMAND_OPTIONS | Various commands with their options are described
|                         | in the following sections. The commands have been
|                         | grouped into {{User Commands}} and
|                         | {{Administration Commands}}.
*-------------------------+---------------------------------------------------+
* {User Commands}
Commands useful for users of a hadoop cluster.
** <<<classpath>>>
Usage: <<<hdfs classpath>>>
Prints the class path needed to get the Hadoop jar and the required libraries.
** <<<dfs>>>
Usage: <<<hdfs dfs [COMMAND [COMMAND_OPTIONS]]>>>
Run a filesystem command on the file system supported in Hadoop.
The various COMMAND_OPTIONS can be found at
@ -66,97 +64,307 @@ HDFS Commands Guide
** <<<fetchdt>>>
Usage: <<<hdfs fetchdt [--webservice <namenode_http_addr>] <path> >>>
*------------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------------+---------------------------------------------+
| --webservice <https_address> | use http protocol instead of RPC
*------------------------------+---------------------------------------------+
| <fileName> | File name to store the token into.
*------------------------------+---------------------------------------------+
Gets Delegation Token from a NameNode.
See {{{./HdfsUserGuide.html#fetchdt}fetchdt}} for more info.
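For example (the NameNode host and output path are hypothetical), a sketch of
fetching a token over HTTP and storing it in a local file:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs fetchdt --webservice http://nn1.example.com:50070 /tmp/nn1.token
----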
** <<<fsck>>>
Usage:
---
hdfs fsck <path>
     [-list-corruptfileblocks |
     [-move | -delete | -openforwrite]
     [-files [-blocks [-locations | -racks]]]
     [-includeSnapshots] [-showprogress]
---
*----------------------------+--------------------------------------------------+
|| COMMAND_OPTION             || Description
*----------------------------+--------------------------------------------------+
| <path>                      | Start checking from this path.
*----------------------------+--------------------------------------------------+
| -delete                     | Delete corrupted files.
*----------------------------+--------------------------------------------------+
| -files                      | Print out files being checked.
*----------------------------+--------------------------------------------------+
| -files -blocks              | Print out the block report.
*----------------------------+--------------------------------------------------+
| -files -blocks -locations   | Print out locations for every block.
*----------------------------+--------------------------------------------------+
| -files -blocks -racks       | Print out network topology for data-node locations.
*----------------------------+--------------------------------------------------+
|                             | Include snapshot data if the given path
| -includeSnapshots           | indicates a snapshottable directory or
|                             | there are snapshottable directories under it.
*----------------------------+--------------------------------------------------+
| -list-corruptfileblocks     | Print out list of missing blocks and
|                             | files they belong to.
*----------------------------+--------------------------------------------------+
| -move                       | Move corrupted files to /lost+found.
*----------------------------+--------------------------------------------------+
| -openforwrite               | Print out files opened for write.
*----------------------------+--------------------------------------------------+
| -showprogress               | Print out dots for progress in output. Default is OFF
|                             | (no progress).
*----------------------------+--------------------------------------------------+
Runs the HDFS filesystem checking utility.
See {{{./HdfsUserGuide.html#fsck}fsck}} for more info.
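For example, a sketch that checks the <<</user>>> tree and prints per-file block locations:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs fsck /user -files -blocks -locations
----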
** <<<getconf>>>
Usage:
---
hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes
hdfs getconf -backupNodes
hdfs getconf -includeFile
hdfs getconf -excludeFile
hdfs getconf -nnRpcAddresses
hdfs getconf -confKey [key]
---
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -namenodes | gets list of namenodes in the cluster.
*------------------------+---------------------------------------------+
| -secondaryNameNodes | gets list of secondary namenodes in the cluster.
*------------------------+---------------------------------------------+
| -backupNodes | gets list of backup nodes in the cluster.
*------------------------+---------------------------------------------+
| -includeFile | gets the include file path that defines the datanodes that can join the cluster.
*------------------------+---------------------------------------------+
| -excludeFile | gets the exclude file path that defines the datanodes that need to be decommissioned.
*------------------------+---------------------------------------------+
| -nnRpcAddresses | gets the namenode rpc addresses
*------------------------+---------------------------------------------+
| -confKey [key] | gets a specific key from the configuration
*------------------------+---------------------------------------------+
Gets configuration information from the configuration directory, post-processing.
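For example, a sketch that lists the configured Namenodes and then looks up a
single key (the key name is just an illustration):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs getconf -namenodes
[hdfs]$ $HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.name.dir
----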
** <<<groups>>>
Usage: <<<hdfs groups [username ...]>>>
Returns the group information given one or more usernames.
** <<<lsSnapshottableDir>>>
Usage: <<<hdfs lsSnapshottableDir [-help]>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -help | print help
*------------------------+---------------------------------------------+
Get the list of snapshottable directories. When this is run as a super user,
it returns all snapshottable directories. Otherwise it returns those directories
that are owned by the current user.
** <<<jmxget>>>
Usage: <<<hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -help | print help
*------------------------+---------------------------------------------+
| -localVM ConnectorURL | connect to the VM on the same machine
*------------------------+---------------------------------------------+
| -port <mbean server port> | specify mbean server port, if missing
| | it will try to connect to MBean Server in
| | the same VM
*------------------------+---------------------------------------------+
| -service | specify the JMX service, either DataNode or NameNode; NameNode is the default
*------------------------+---------------------------------------------+
Dump JMX information from a service.
** <<<oev>>>
Usage: <<<hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE>>>
*** Required command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | edits file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
*** Optional command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -f,--fix-txids | Renumber the transaction IDs in the input,
| so that there are no gaps or invalid transaction IDs.
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
| -r,--recover | When reading binary edit logs, use recovery
| mode. This will give you the chance to skip
| corrupt parts of the edit log.
*------------------------+---------------------------------------------+
| -p,--processor <arg> | Select which type of processor to apply
| against image file, currently supported
| processors are: binary (native binary format
| that Hadoop uses), xml (default, XML
| format), stats (prints statistics about
| edits file)
*------------------------+---------------------------------------------+
| -v,--verbose | More verbose output, prints the input and
| output filenames, for processors that write
| to a file, also output to screen. On large
| image files this will dramatically increase
| processing time (default is false).
*------------------------+---------------------------------------------+
Hadoop offline edits viewer.
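As a sketch (the edits file name is hypothetical), converting a binary edits
file to XML and then back to binary:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oev -i edits.xml -o edits.rebuilt -p binary
----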
** <<<oiv>>>
Usage: <<<hdfs oiv [OPTIONS] -i INPUT_FILE>>>
*** Required command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | image file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
*** Optional command line arguments:
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
| -p,--processor <arg> | Select which type of processor to apply
| against image file, currently supported
| processors are: binary (native binary format
| that Hadoop uses), xml (default, XML
| format), stats (prints statistics about
| edits file)
*------------------------+---------------------------------------------+
Hadoop Offline Image Viewer for newer image files.
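As a sketch (the fsimage file name is hypothetical, and the <<<XML>>> processor
name is assumed to be accepted as shown), dumping an image to XML:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs oiv -p XML -i fsimage_0000000000000000055 -o fsimage.xml
----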
** <<<oiv_legacy>>>
Usage: <<<hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE>>>
*------------------------+---------------------------------------------+
|| COMMAND_OPTION || Description
*------------------------+---------------------------------------------+
| -h,--help | Display usage information and exit
*------------------------+---------------------------------------------+
|-i,--inputFile <arg> | image file to process, xml (case
| insensitive) extension means XML format,
| any other filename means binary format
*------------------------+---------------------------------------------+
| -o,--outputFile <arg> | Name of output file. If the specified
| file exists, it will be overwritten,
| format of the file is determined
| by -p option
*------------------------+---------------------------------------------+
Hadoop offline image viewer for older versions of Hadoop.
** <<<snapshotDiff>>>
Usage: <<<hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot> >>>
Determine the difference between HDFS snapshots. See the
{{{./HdfsSnapshots.html#Get_Snapshots_Difference_Report}HDFS Snapshot Documentation}} for more information.
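For example (the path and snapshot names are hypothetical):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs snapshotDiff /user/alice s1 s2
----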
** <<<version>>>
Usage: <<<hdfs version>>>
Prints the version.
* Administration Commands
Commands useful for administrators of a hadoop cluster.
** <<<balancer>>>
Usage: <<<hdfs balancer [-threshold <threshold>] [-policy <policy>]>>>
*------------------------+----------------------------------------------------+
|| COMMAND_OPTION | Description
*------------------------+----------------------------------------------------+
| -policy <policy> | <<<datanode>>> (default): Cluster is balanced if
| | each datanode is balanced. \
| | <<<blockpool>>>: Cluster is balanced if each block
| | pool in each datanode is balanced.
*------------------------+----------------------------------------------------+
| -threshold <threshold> | Percentage of disk capacity. This overwrites the
| | default threshold.
*------------------------+----------------------------------------------------+
Runs a cluster balancing utility. An administrator can simply press Ctrl-C
to stop the rebalancing process. See
{{{./HdfsUserGuide.html#Balancer}Balancer}} for more details.
Note that the <<<blockpool>>> policy is more strict than the <<<datanode>>>
policy.
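For example, a sketch that runs the balancer with a tighter 5% threshold:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs balancer -threshold 5
----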
** <<<cacheadmin>>>
Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>
See the {{{./CentralizedCacheManagement.html#cacheadmin_command-line_interface}HDFS Cache Administration Documentation}} for more information.
** <<<crypto>>>
Usage:
---
hdfs crypto -createZone -keyName <keyName> -path <path>
hdfs crypto -help <command-name>
hdfs crypto -listZones
---
See the {{{./TransparentEncryption.html#crypto_command-line_interface}HDFS Transparent Encryption Documentation}} for more information.
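A minimal sketch of creating an encryption zone, assuming a KMS is already
configured; the key name and path are hypothetical:
----
[hdfs]$ $HADOOP_PREFIX/bin/hadoop key create myzonekey
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfs -mkdir /secure
[hdfs]$ $HADOOP_PREFIX/bin/hdfs crypto -createZone -keyName myzonekey -path /secure
----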
** <<<datanode>>>
Usage: <<<hdfs datanode [-regular | -rollback | -rollingupgrade rollback]>>>
@ -172,12 +380,14 @@ HDFS Commands Guide
| -rollingupgrade rollback | Rollback a rolling upgrade operation.
*-----------------+-----------------------------------------------------------+
Runs a HDFS datanode.
** <<<dfsadmin>>>
Runs a HDFS dfsadmin client.
Usage:
------------------------------------------
hdfs dfsadmin [GENERIC_OPTIONS]
[-report [-live] [-dead] [-decommissioning]]
[-safemode enter | leave | get | wait]
[-saveNamespace]
@ -210,7 +420,7 @@ HDFS Commands Guide
[-getDatanodeInfo <datanode_host:ipc_port>]
[-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
[-help [cmd]]
------------------------------------------
*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description
@ -323,11 +533,11 @@ HDFS Commands Guide
*-----------------+-----------------------------------------------------------+
| -allowSnapshot \<snapshotDir\> | Allowing snapshots of a directory to be
| created. If the operation completes successfully, the
| directory becomes snapshottable. See the {{{./HdfsSnapshots.html}HDFS Snapshot Documentation}} for more information.
*-----------------+-----------------------------------------------------------+
| -disallowSnapshot \<snapshotDir\> | Disallowing snapshots of a directory to
| be created. All snapshots of the directory must be deleted
| before disallowing snapshots. See the {{{./HdfsSnapshots.html}HDFS Snapshot Documentation}} for more information.
*-----------------+-----------------------------------------------------------+
| -fetchImage \<local directory\> | Downloads the most recent fsimage from the
| NameNode and saves it in the specified local directory.
@ -351,30 +561,68 @@ HDFS Commands Guide
| is specified.
*-----------------+-----------------------------------------------------------+
** <<<haadmin>>>
Usage:
---
hdfs haadmin -checkHealth <serviceId>
hdfs haadmin -failover [--forcefence] [--forceactive] <serviceId> <serviceId>
hdfs haadmin -getServiceState <serviceId>
hdfs haadmin -help <command>
hdfs haadmin -transitionToActive <serviceId> [--forceactive]
hdfs haadmin -transitionToStandby <serviceId>
---
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
*--------------------+--------------------------------------------------------+
| -checkHealth | check the health of the given NameNode
*--------------------+--------------------------------------------------------+
| -failover | initiate a failover between two NameNodes
*--------------------+--------------------------------------------------------+
| -getServiceState | determine whether the given NameNode is Active or Standby
*--------------------+--------------------------------------------------------+
| -transitionToActive | transition the state of the given NameNode to Active (Warning: No fencing is done)
*--------------------+--------------------------------------------------------+
| -transitionToStandby | transition the state of the given NameNode to Standby (Warning: No fencing is done)
*--------------------+--------------------------------------------------------+
See {{{./HDFSHighAvailabilityWithNFS.html#Administrative_commands}HDFS HA with NFS}} or
{{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}} for more
information on this command.
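For example (the NameNode IDs <<<nn1>>> and <<<nn2>>> are hypothetical), a
sketch of checking the current state of one NameNode and then requesting a failover:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -getServiceState nn1
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -failover nn1 nn2
----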
** <<<journalnode>>>
Usage: <<<hdfs journalnode>>>
This command starts a journalnode for use with {{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}}.
** <<<mover>>>
Usage: <<<hdfs mover [-p <files/dirs> | -f <local file name>]>>>
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
*--------------------+--------------------------------------------------------+
| -f \<local file\> | Specify a local file containing a list of HDFS files/dirs to migrate.
*--------------------+--------------------------------------------------------+
| -p \<files/dirs\> | Specify a space separated list of HDFS files/dirs to migrate.
*--------------------+--------------------------------------------------------+
Runs the data migration utility.
See {{{./ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool}Mover}} for more details.
Note that, when both -p and -f options are omitted, the default path is the root directory.
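For example (the paths are hypothetical), migrating two directories whose
storage policy has changed:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs mover -p /archive/2013 /archive/2014
----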
** <<<namenode>>>
Usage:
------------------------------------------
hdfs namenode [-backup] |
[-checkpoint] |
[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
[-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
@ -387,7 +635,7 @@ HDFS Commands Guide
[-bootstrapStandby] |
[-recover [-force] ] |
[-metadataVersion ]
------------------------------------------
*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description
@ -443,11 +691,23 @@ HDFS Commands Guide
| metadata versions of the software and the image.
*--------------------+--------------------------------------------------------+
Runs the namenode. More info about the upgrade, rollback and finalize is at
{{{./HdfsUserGuide.html#Upgrade_and_Rollback}Upgrade Rollback}}.
** <<<nfs3>>>
Usage: <<<hdfs nfs3>>>
This command starts the NFS3 gateway for use with the {{{./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service}HDFS NFS3 Service}}.
** <<<portmap>>>
Usage: <<<hdfs portmap>>>
This command starts the RPC portmap for use with the {{{./HdfsNfsGateway.html#Start_and_stop_NFS_gateway_service}HDFS NFS3 Service}}.
** <<<secondarynamenode>>>
Usage: <<<hdfs secondarynamenode [-checkpoint [force]] | [-format] |
[-geteditsize]>>>
@ -465,6 +725,33 @@ HDFS Commands Guide
| the NameNode.
*----------------------+------------------------------------------------------+
Runs the HDFS secondary namenode.
See {{{./HdfsUserGuide.html#Secondary_NameNode}Secondary Namenode}}
for more info.
** <<<storagepolicies>>>
Usage: <<<hdfs storagepolicies>>>
Lists out all storage policies. See the {{{./ArchivalStorage.html}HDFS Storage Policy Documentation}} for more information.
** <<<zkfc>>>
Usage: <<<hdfs zkfc [-formatZK [-force] [-nonInteractive]]>>>
*----------------------+------------------------------------------------------+
|| COMMAND_OPTION || Description
*----------------------+------------------------------------------------------+
| -formatZK | Format the Zookeeper instance
*----------------------+------------------------------------------------------+
| -h | Display help
*----------------------+------------------------------------------------------+
This command starts a Zookeeper Failover Controller process for use with {{{./HDFSHighAvailabilityWithQJM.html#Administrative_commands}HDFS HA with QJM}}.
* Debug Commands
Useful commands to help administrators debug HDFS issues, like validating
@ -472,30 +759,25 @@ HDFS Commands Guide
** <<<verify>>>
Usage: <<<hdfs debug verify [-meta <metadata-file>] [-block <block-file>]>>>
*------------------------+----------------------------------------------------+
|| COMMAND_OPTION | Description
*------------------------+----------------------------------------------------+
| -block <block-file> | Optional parameter to specify the absolute path for
| | the block file on the local file system of the data
| | node.
*------------------------+----------------------------------------------------+
| -meta <metadata-file> | Absolute path for the metadata file on the local file
| | system of the data node.
*------------------------+----------------------------------------------------+
Verify HDFS metadata and block files. If a block file is specified, we
will verify that the checksums in the metadata file match the block
file.
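For example (the local file paths are hypothetical; block and metadata files
normally live under the datanode data directories):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs debug verify -meta /tmp/blk_1073741825_1001.meta -block /tmp/blk_1073741825
----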
** <<<recoverLease>>>
Usage: <<<hdfs debug recoverLease [-path <path>] [-retries <num-retries>]>>>
*-------------------------------+--------------------------------------------+
@ -507,3 +789,6 @@ HDFS Commands Guide
| | recoverLease. The default number of retries
| | is 1.
*-------------------------------+---------------------------------------------+
Recover the lease on the specified path. The path must reside on an
HDFS filesystem. The default number of retries is 1.
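For example (the path is hypothetical):
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs debug recoverLease -path /user/alice/pending-writes.log -retries 3
----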


@ -25,7 +25,7 @@ HDFS High Availability
This guide provides an overview of the HDFS High Availability (HA) feature and
how to configure and manage an HA HDFS cluster, using NFS for the shared
storage required by the NameNodes.
This document assumes that the reader has a general understanding of the
components and node types in an HDFS cluster. Please refer to the
HDFS Architecture guide for details.
@ -44,7 +44,7 @@ HDFS High Availability
an HDFS cluster. Each cluster had a single NameNode, and if that machine or
process became unavailable, the cluster as a whole would be unavailable
until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways:
* In the case of an unplanned event such as a machine crash, the cluster would
@ -52,7 +52,7 @@ HDFS High Availability
* Planned maintenance events such as software or hardware upgrades on the
NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing
the option of running two redundant NameNodes in the same cluster in an
Active/Passive configuration with a hot standby. This allows a fast failover to
@ -67,7 +67,7 @@ HDFS High Availability
for all client operations in the cluster, while the Standby is simply acting
as a slave, maintaining enough state to provide a fast failover if
necessary.
In order for the Standby node to keep its state synchronized with the Active
node, the current implementation requires that the two nodes both have access
to a directory on a shared storage device (eg an NFS mount from a NAS). This
@ -80,12 +80,12 @@ HDFS High Availability
a failover, the Standby will ensure that it has read all of the edits from the
shared storage before promoting itself to the Active state. This ensures that
the namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node
have up-to-date information regarding the location of blocks in the cluster.
In order to achieve this, the DataNodes are configured with the location of
both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the
NameNodes be Active at a time. Otherwise, the namespace state would quickly
diverge between the two, risking data loss or other incorrect results. In
@ -116,7 +116,7 @@ HDFS High Availability
network, and power). Because of this, it is recommended that the shared storage
server be a high-quality dedicated NAS appliance rather than a simple Linux
server.
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of
the namespace state, and thus it is not necessary to run a Secondary NameNode,
CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an
@ -133,7 +133,7 @@ HDFS High Availability
The new configuration is designed such that all the nodes in the cluster may
have the same configuration without the need for deploying different
configuration files to different machines based on the type of the node.
Like HDFS Federation, HA clusters reuse the <<<nameservice ID>>> to identify a
single HDFS instance that may in fact consist of multiple HA NameNodes. In
addition, a new abstraction called <<<NameNode ID>>> is added with HA. Each
@ -330,7 +330,7 @@ HDFS High Availability
<<dfs_namenode_rpc-address>> will contain the RPC address of the target node, even
though the configuration may specify that variable as
<<dfs.namenode.rpc-address.ns1.nn1>>.
Additionally, the following variables referring to the target node to be fenced
are also available:
@ -345,7 +345,7 @@ HDFS High Availability
*-----------------------:-----------------------------------+
| $target_namenodeid | the namenode ID of the NN to be fenced |
*-----------------------:-----------------------------------+
These environment variables may also be used as substitutions in the shell
command itself. For example:
@ -355,7 +355,7 @@ HDFS High Availability
<value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
---
If the shell command returns an exit
code of 0, the fencing is determined to be successful. If it returns any other
exit code, the fencing was not successful and the next fencing method in the
@ -386,7 +386,7 @@ HDFS High Availability
* If you are setting up a fresh HDFS cluster, you should first run the format
command (<hdfs namenode -format>) on one of NameNodes.
* If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
@ -394,7 +394,7 @@ HDFS High Availability
unformatted NameNode. Running this command will also ensure that the shared
edits directory (as configured by <<dfs.namenode.shared.edits.dir>>) contains
sufficient edits transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the
command "<hdfs -initializeSharedEdits>", which will initialize the shared
edits directory with the edits data from the local NameNode edits directories.
@ -484,7 +484,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
of coordination data, notifying clients of changes in that data, and
monitoring clients for failures. The implementation of automatic HDFS failover
relies on ZooKeeper for the following things:
* <<Failure detection>> - each of the NameNode machines in the cluster
maintains a persistent session in ZooKeeper. If the machine crashes, the
ZooKeeper session will expire, notifying the other NameNode that a failover
@ -585,7 +585,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
from one of the NameNode hosts.
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs zkfc -formatZK
----
This will create a znode in ZooKeeper inside of which the automatic failover
@ -605,7 +605,7 @@ $ hdfs zkfc -formatZK
can start the daemon by running:
----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start zkfc
----
** Securing access to ZooKeeper
@ -646,7 +646,7 @@ digest:hdfs-zkfcs:mypassword
a command like the following:
----
[hdfs]$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
output: hdfs-zkfcs:mypassword->hdfs-zkfcs:P/OQvnYyU/nF/mGYvB/xurX8dYs=
----
@ -726,24 +726,24 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
using the same <<<hdfs haadmin>>> command. It will perform a coordinated
failover.
* BookKeeper as a Shared storage (EXPERIMENTAL)
One option for shared storage for the NameNode is BookKeeper.
BookKeeper achieves high availability and strong durability guarantees by replicating
edit log entries across multiple storage nodes. The edit log can be striped across
the storage nodes for high performance. Fencing is supported in the protocol, i.e.,
BookKeeper will not allow two writers to write the single edit log.
The meta data for BookKeeper is stored in ZooKeeper.
In the current HA architecture, a ZooKeeper cluster is required for ZKFC. The
same cluster can be used for BookKeeper metadata.
For more details on building a BookKeeper cluster, please refer to the
{{{http://zookeeper.apache.org/bookkeeper/docs/trunk/bookkeeperConfig.html }BookKeeper documentation}}
The BookKeeperJournalManager is an implementation of the HDFS JournalManager interface, which allows custom write ahead logging implementations to be plugged into the HDFS NameNode.
**<<BookKeeper Journal Manager>>
To use BookKeeperJournalManager, add the following to hdfs-site.xml.
@ -772,12 +772,12 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
classpath. We explain how to generate a jar file with the journal manager and
its dependencies, and how to put it into the classpath below.
*** <<More configuration options>>
* <<dfs.namenode.bookkeeperjournal.output-buffer-size>> -
Number of bytes a bookkeeper journal stream will buffer before
forcing a flush. Default is 1024.
----
<property>
<name>dfs.namenode.bookkeeperjournal.output-buffer-size</name>
@ -785,7 +785,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.ensemble-size>> -
Number of bookkeeper servers in edit log ensembles. This
is the number of bookkeeper servers which need to be available
for the edit log to be writable. Default is 3.
@ -797,7 +797,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.quorum-size>> -
Number of bookkeeper servers in the write quorum. This is the
number of bookkeeper servers which must have acknowledged the
write of an entry before it is considered written. Default is 2.
@ -809,7 +809,7 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.digestPw>> -
Password to use when creating edit log segments.
----
@ -819,9 +819,9 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
</property>
----
* <<dfs.namenode.bookkeeperjournal.zk.session.timeout>> -
Session timeout for Zookeeper client from BookKeeper Journal Manager.
Hadoop recommends that this value should be less than the ZKFC
session timeout value. Default value is 3000.
----
@ -838,22 +838,22 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
$ mvn clean package -Pdist
This will generate a jar with the BookKeeperJournalManager,
hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-<VERSION>.jar
Note that the -Pdist part of the build command is important; it copies
the dependent bookkeeper-server jar under
hadoop-hdfs/src/contrib/bkjournal/target/lib.
*** <<Putting the BookKeeperJournalManager in the NameNode classpath>>
To run a HDFS namenode using BookKeeper as a backend, copy the bkjournal and
bookkeeper-server jar, mentioned above, into the lib directory of hdfs. In the
standard distribution of HDFS, this is at $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/
cp hadoop-hdfs/src/contrib/bkjournal/target/hadoop-hdfs-bkjournal-<VERSION>.jar $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/
*** <<Current limitations>>
1) Security in BookKeeper. BookKeeper does not support SASL nor SSL for
connections between the NameNode and BookKeeper storage nodes.


@ -25,7 +25,7 @@ HDFS High Availability Using the Quorum Journal Manager
This guide provides an overview of the HDFS High Availability (HA) feature
and how to configure and manage an HA HDFS cluster, using the Quorum Journal
Manager (QJM) feature.
This document assumes that the reader has a general understanding of the
components and node types in an HDFS cluster. Please refer to the
HDFS Architecture guide for details.
@ -44,7 +44,7 @@ HDFS High Availability Using the Quorum Journal Manager
an HDFS cluster. Each cluster had a single NameNode, and if that machine or
process became unavailable, the cluster as a whole would be unavailable
until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways:
* In the case of an unplanned event such as a machine crash, the cluster would
@ -52,7 +52,7 @@ HDFS High Availability Using the Quorum Journal Manager
* Planned maintenance events such as software or hardware upgrades on the
NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing
the option of running two redundant NameNodes in the same cluster in an
Active/Passive configuration with a hot standby. This allows a fast failover to
@ -67,7 +67,7 @@ HDFS High Availability Using the Quorum Journal Manager
for all client operations in the cluster, while the Standby is simply acting
as a slave, maintaining enough state to provide a fast failover if
necessary.
In order for the Standby node to keep its state synchronized with the Active
node, both nodes communicate with a group of separate daemons called
"JournalNodes" (JNs). When any namespace modification is performed by the
@ -78,12 +78,12 @@ HDFS High Availability Using the Quorum Journal Manager
failover, the Standby will ensure that it has read all of the edits from the
JounalNodes before promoting itself to the Active state. This ensures that the
namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node
have up-to-date information regarding the location of blocks in the cluster.
In order to achieve this, the DataNodes are configured with the location of
both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the
NameNodes be Active at a time. Otherwise, the namespace state would quickly
diverge between the two, risking data loss or other incorrect results. In
@ -113,7 +113,7 @@ HDFS High Availability Using the Quorum Journal Manager
you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when
running with N JournalNodes, the system can tolerate at most (N - 1) / 2
failures and continue to function normally.
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of
the namespace state, and thus it is not necessary to run a Secondary NameNode,
CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an
@ -130,7 +130,7 @@ HDFS High Availability Using the Quorum Journal Manager
The new configuration is designed such that all the nodes in the cluster may
have the same configuration without the need for deploying different
configuration files to different machines based on the type of the node.
Like HDFS Federation, HA clusters reuse the <<<nameservice ID>>> to identify a
single HDFS instance that may in fact consist of multiple HA NameNodes. In
addition, a new abstraction called <<<NameNode ID>>> is added with HA. Each
@ -347,7 +347,7 @@ HDFS High Availability Using the Quorum Journal Manager
<<dfs_namenode_rpc-address>> will contain the RPC address of the target node, even
though the configuration may specify that variable as
<<dfs.namenode.rpc-address.ns1.nn1>>.
Additionally, the following variables referring to the target node to be fenced
are also available:
@ -362,7 +362,7 @@ HDFS High Availability Using the Quorum Journal Manager
*-----------------------:-----------------------------------+
| $target_namenodeid | the namenode ID of the NN to be fenced |
*-----------------------:-----------------------------------+
These environment variables may also be used as substitutions in the shell
command itself. For example:
@ -372,7 +372,7 @@ HDFS High Availability Using the Quorum Journal Manager
<value>shell(/path/to/my/script.sh --nameservice=$target_nameserviceid $target_host:$target_port)</value>
</property>
---
If the shell command returns an exit
code of 0, the fencing is determined to be successful. If it returns any other
exit code, the fencing was not successful and the next fencing method in the
@ -424,7 +424,7 @@ HDFS High Availability Using the Quorum Journal Manager
* If you are setting up a fresh HDFS cluster, you should first run the format
command (<hdfs namenode -format>) on one of the NameNodes.
* If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
@ -432,7 +432,7 @@ HDFS High Availability Using the Quorum Journal Manager
unformatted NameNode. Running this command will also ensure that the
JournalNodes (as configured by <<dfs.namenode.shared.edits.dir>>) contain
sufficient edits transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the
command "<hdfs namenode -initializeSharedEdits>", which will initialize the
JournalNodes with the edits data from the local NameNode edits directories
(these commands are summarized in the sketch below).
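Purely as a summary sketch of the steps above (which host plays which role is your
choice; run each command on the host indicated in the comment):

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format                 # fresh cluster: on the first NameNode only
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -bootstrapStandby       # on the other, unformatted NameNode
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -initializeSharedEdits  # converting a non-HA NameNode: on the existing NameNode
----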
@ -522,7 +522,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
of coordination data, notifying clients of changes in that data, and
monitoring clients for failures. The implementation of automatic HDFS failover
relies on ZooKeeper for the following things:
* <<Failure detection>> - each of the NameNode machines in the cluster
maintains a persistent session in ZooKeeper. If the machine crashes, the
ZooKeeper session will expire, notifying the other NameNode that a failover
@ -623,7 +623,7 @@ Usage: DFSHAAdmin [-ns <nameserviceId>]
from one of the NameNode hosts.
----
$ hdfs zkfc -formatZK
[hdfs]$ $HADOOP_PREFIX/bin/hdfs zkfc -formatZK
----
This will create a znode in ZooKeeper inside of which the automatic failover
@ -643,7 +643,7 @@ $ hdfs zkfc -formatZK
can start the daemon by running:
----
$ hadoop-daemon.sh start zkfc
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start zkfc
----
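Once the ZKFC daemons are running on both NameNode hosts, one of the NameNodes should
automatically become active. As a quick check (where <<<nn1>>> is one of your configured
NameNode IDs), the following prints either "active" or "standby":

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs haadmin -getServiceState nn1
----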
** Securing access to ZooKeeper
@ -684,7 +684,7 @@ digest:hdfs-zkfcs:mypassword
a command like the following:
----
$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
[hdfs]$ java -cp $ZK_HOME/lib/*:$ZK_HOME/zookeeper-3.4.2.jar org.apache.zookeeper.server.auth.DigestAuthenticationProvider hdfs-zkfcs:mypassword
output: hdfs-zkfcs:mypassword->hdfs-zkfcs:P/OQvnYyU/nF/mGYvB/xurX8dYs=
----
@ -786,18 +786,18 @@ digest:hdfs-zkfcs:vlUvLnd8MlacsE80rDuu6ONESbM=:rwcda
operations, the operation will fail.
[[3]] Start one of the NNs with the <<<'-upgrade'>>> flag.
[[4]] On start, this NN will not enter the standby state as usual in an HA
setup. Rather, this NN will immediately enter the active state, perform an
upgrade of its local storage dirs, and also perform an upgrade of the shared
edit log.
[[5]] At this point the other NN in the HA pair will be out of sync with
the upgraded NN. In order to bring it back in sync and once again have a highly
available setup, you should re-bootstrap this NameNode by running the NN with
the <<<'-bootstrapStandby'>>> flag. It is an error to start this second NN with
the <<<'-upgrade'>>> flag.
Note that if at any time you want to restart the NameNodes before finalizing
or rolling back the upgrade, you should start the NNs as normal, i.e. without
any special startup flag.
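As an illustrative sketch of the commands involved in the upgrade steps above (the
NameNodes may equally be started through your usual service scripts; only the flags
matter here):

----
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -upgrade           # on the first NN: becomes active, upgrades local and shared edits
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -bootstrapStandby  # on the second NN: re-sync with the upgraded NN
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -finalizeUpgrade   # later, once you are satisfied the upgrade is good
----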
View File
@ -36,15 +36,15 @@ HDFS NFS Gateway
HDFS file system.
* Users can stream data directly to HDFS through the mount point. File
append is supported but random write is not supported.
append is supported but random write is not supported.
The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and a HADOOP_CONF directory.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
* {Configuration}
The NFS-gateway uses a proxy user to proxy all the users accessing the NFS mounts.
The NFS-gateway uses a proxy user to proxy all the users accessing the NFS mounts.
In non-secure mode, the user running the gateway is the proxy user, while in secure mode the
user in Kerberos keytab is the proxy user. Suppose the proxy user is 'nfsserver'
and users belonging to the groups 'users-group1'
@ -57,10 +57,10 @@ HDFS NFS Gateway
<name>hadoop.proxyuser.nfsserver.groups</name>
<value>root,users-group1,users-group2</value>
<description>
The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and
The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and
'users-group2' groups. Note that in most cases you will need to include the
group "root" because the user "root" (which usually belonges to "root" group) will
generally be the user that initially executes the mount on the NFS client system.
generally be the user that initially executes the mount on the NFS client system.
Set this to '*' to allow nfsserver user to proxy any group.
</description>
</property>
@ -78,7 +78,7 @@ HDFS NFS Gateway
----
The above is the only configuration required for the NFS gateway in non-secure mode. For Kerberized
hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace
hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace
string "nfsserver" with the proxy user name and ensure the user contained in the keytab is
also the same proxy user):
@ -95,7 +95,7 @@ HDFS NFS Gateway
<value>nfsserver/_HOST@YOUR-REALM.COM</value>
</property>
----
The rest of the NFS gateway configurations are optional for both secure and non-secure mode.
The AIX NFS client has a {{{https://issues.apache.org/jira/browse/HDFS-6549}few known issues}}
@ -119,11 +119,11 @@ HDFS NFS Gateway
It is strongly recommended that users update a few configuration properties based on their use
cases. All the following configuration properties can be added or updated in hdfs-site.xml.
* If the client mounts the export with access time update allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to restart after
* If the client mounts the export with access time update allowed, make sure the following
property is not disabled in the configuration file. Only the NameNode needs to restart after
this property is changed. On some Unix systems, the user can disable access time update
by mounting the export with "noatime". If the export is mounted with "noatime", the user
by mounting the export with "noatime". If the export is mounted with "noatime", the user
does not need to change the following property, and thus there is no need to restart the NameNode.
----
@ -149,11 +149,11 @@ HDFS NFS Gateway
this property is updated.
----
<property>
<property>
<name>nfs.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
----
----
* By default, the export can be mounted by any client. To better control the access,
users can update the following property. The value string contains machine name and
@ -161,7 +161,7 @@ HDFS NFS Gateway
characters. The machine name format can be a single host, a Java regular expression, or an IPv4 address. The
access privilege uses rw or ro to specify read/write or read-only access of the machines to exports. If the access
privilege is not provided, the default is read-only. Entries are separated by ";".
For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to restart after
For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to restart after
this property is updated.
----
@ -171,22 +171,22 @@ HDFS NFS Gateway
</property>
----
* JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
HADOOP_NFS3_OPTS. More NFS related settings can be found in hadoop-env.sh.
To get NFS debug trace, you can edit the log4j.properties file
* JVM and log settings. You can export JVM settings (e.g., heap size and GC log) in
HADOOP_NFS3_OPTS; an example is given after the log settings below. More NFS-related
settings can be found in hadoop-env.sh.
To get NFS debug trace, you can edit the log4j.properties file
to add the following. Note, debug trace, especially for ONCRPC, can be very verbose.
To change logging level:
-----------------------------------------------
-----------------------------------------------
log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
-----------------------------------------------
-----------------------------------------------
To get more details of ONCRPC requests:
-----------------------------------------------
-----------------------------------------------
log4j.logger.org.apache.hadoop.oncrpc=DEBUG
-----------------------------------------------
-----------------------------------------------
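As an example only (the heap size and GC log path below are arbitrary values, not
recommendations), the HADOOP_NFS3_OPTS variable mentioned above can be exported from
hadoop-env.sh like this:

-----------------------------------------------
export HADOOP_NFS3_OPTS="-Xmx2048m -Xloggc:/var/log/hadoop/nfs3-gc.log"
-----------------------------------------------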
* {Start and stop NFS gateway service}
@ -195,53 +195,39 @@ HDFS NFS Gateway
The NFS gateway process has both nfsd and mountd. It shares the HDFS root "/" as the
only export. It is recommended to use the portmap included in the NFS gateway package. Even
though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
{{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussions can
be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
[[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
service nfs stop
service rpcbind stop
-------------------------
[[2]] Start package included portmap (needs root privileges):
[[1]] Stop nfsv3 and rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-------------------------
hdfs portmap
OR
[root]> service nfs stop
[root]> service rpcbind stop
-------------------------
hadoop-daemon.sh start portmap
[[2]] Start Hadoop's portmap (needs root privileges):
-------------------------
[root]> $HADOOP_PREFIX/bin/hdfs --daemon start portmap
-------------------------
[[3]] Start mountd and nfsd.
No root privileges are required for this command. In non-secure mode, the NFS gateway
should be started by the proxy user mentioned at the beginning of this user guide.
While in secure mode, any user can start NFS gateway
should be started by the proxy user mentioned at the beginning of this user guide.
While in secure mode, any user can start NFS gateway
as long as the user has read access to the Kerberos keytab defined in "nfs.keytab.file".
-------------------------
hdfs nfs3
OR
hadoop-daemon.sh start nfs3
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start nfs3
-------------------------
Note, if the hadoop-daemon.sh script starts the NFS gateway, its log can be found in the hadoop log folder.
[[4]] Stop NFS gateway services.
-------------------------
hadoop-daemon.sh stop nfs3
hadoop-daemon.sh stop portmap
[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon stop nfs3
[root]> $HADOOP_PREFIX/bin/hdfs --daemon stop portmap
-------------------------
Optionally, you can forgo running the Hadoop-provided portmap daemon and
@ -263,7 +249,7 @@ HDFS NFS Gateway
[[1]] Execute the following command to verify if all the services are up and running:
-------------------------
rpcinfo -p $nfs_server_ip
[root]> rpcinfo -p $nfs_server_ip
-------------------------
You should see output similar to the following:
@ -293,11 +279,11 @@ HDFS NFS Gateway
[[2]] Verify if the HDFS namespace is exported and can be mounted.
-------------------------
showmount -e $nfs_server_ip
[root]> showmount -e $nfs_server_ip
-------------------------
You should see output similar to the following:
-------------------------
Exports list on $nfs_server_ip :
@ -307,22 +293,22 @@ HDFS NFS Gateway
* {Mount the export “/”}
Currently NFS v3 only uses TCP as the transport protocol.
Currently NFS v3 only uses TCP as the transport protocol.
NLM is not supported so mount option "nolock" is needed. It's recommended to use
hard mount. This is because, even after the client sends all data to
the NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS
hard mount. This is because, even after the client sends all data to
the NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS
when writes were reordered by the NFS client kernel.
If a soft mount has to be used, the user should give it a relatively
If a soft mount has to be used, the user should give it a relatively
long timeout (at least no less than the default timeout on the host).
The users can mount the HDFS namespace as shown below:
-------------------------------------------------------------------
mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
-------------------------------------------------------------------
[root]> mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
-------------------------------------------------------------------
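Equivalently (the gateway host name and the mount point <<</hdfs_mount>>> below are purely
illustrative), the same mount can be made persistent with an /etc/fstab entry such as:

-------------------------------------------------------------------
nfs_server.example.com:/   /hdfs_mount   nfs   vers=3,proto=tcp,nolock,noacl   0 0
-------------------------------------------------------------------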
Then the users can access HDFS as part of the local file system, except that
Then the users can access HDFS as part of the local file system, except that
hard link and random write are not supported yet. To optimize the performance
of large file I/O, one can increase the NFS transfer size (rsize and wsize) during mount.
By default, the NFS gateway supports 1MB as the maximum transfer size. For larger data
@ -347,7 +333,7 @@ HDFS NFS Gateway
* {User authentication and mapping}
The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
accesses the mount point, the NFS client passes the UID to the NFS gateway.
accesses the mount point, the NFS client passes the UID to the NFS gateway.
The NFS gateway does a lookup to find the user name from the UID, and then passes the
username to HDFS along with the HDFS requests.
For example, if the NFS client has current user as "admin", when the user accesses
@ -358,7 +344,7 @@ HDFS NFS Gateway
The system administrator must ensure that the user on NFS client host has the same
name and UID as that on the NFS gateway host. This is usually not a problem if
the same user management system (e.g., LDAP/NIS) is used to create and deploy users on
HDFS nodes and NFS client node. In case the user account is created manually on different hosts, one might need to
HDFS nodes and NFS client node. In case the user account is created manually on different hosts, one might need to
modify UID (e.g., do "usermod -u 123 myusername") on either NFS client or NFS gateway host
in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
in {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
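For example, to confirm that a given account (here the placeholder <<<myusername>>>) resolves to
the same UID on both sides, compare the output of:

-------------------------
[root]> id -u myusername    # on the NFS client host
[root]> id -u myusername    # on the NFS gateway host; the two numbers should match
-------------------------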