HDFS-6864. Archival Storage: add user documentation. Contributed by Tsz Wo Nicholas Sze.

parent 91f6ddeb34
commit b014e83bc5

@@ -71,6 +71,8 @@ HDFS-6584: Archival Storage

    HDFS-7072. Fix TestBlockManager and TestStorageMover. (jing9 via szetszwo)

    HDFS-6864. Archival Storage: add user documentation. (szetszwo via jing9)

Trunk (Unreleased)

  INCOMPATIBLE CHANGES

@@ -472,6 +472,12 @@ public class DistributedFileSystem extends FileSystem {
    }.resolve(this, absF);
  }

  /**
   * Set the source path to the specified storage policy.
   *
   * @param src The source path referring to either a directory or a file.
   * @param policyName The name of the storage policy.
   */
  public void setStoragePolicy(final Path src, final String policyName)
      throws IOException {
    statistics.incrementWriteOps(1);
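
For illustration, a minimal client-side sketch of calling this new method follows; the namenode URI, the path, and the policy name are placeholder values and are not part of this change.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SetStoragePolicyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder namenode URI; setStoragePolicy is defined on
    // DistributedFileSystem, so the FileSystem handle is cast first.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    // Assign the COLD policy to an existing directory (placeholder path).
    dfs.setStoragePolicy(new Path("/data/archive"), "COLD");
  }
}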

@@ -498,9 +498,9 @@ public class Mover {

  static class Cli extends Configured implements Tool {
    private static final String USAGE = "Usage: java "
-        + Mover.class.getSimpleName()
-        + " [-p <space separated files/dirs> specify a list of files/dirs to migrate]"
-        + " [-f <local file name> specify a local file containing files/dirs to migrate]";
+        + Mover.class.getSimpleName() + " [-p <files/dirs> | -f <local file>]"
+        + "\n\t-p <files/dirs>\ta space separated list of HDFS files/dirs to migrate."
+        + "\n\t-f <local file>\ta local file containing a list of HDFS files/dirs to migrate.";

    private static Options buildCliOptions() {
      Options opts = new Options();

@@ -0,0 +1,302 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  HDFS Archival Storage
  ---
  ---
  ${maven.build.timestamp}

HDFS Archival Storage

%{toc|section=1|fromDepth=0}

* {Introduction}

  <Archival Storage> is a solution to decouple growing storage capacity from compute capacity.
  Nodes with denser, less expensive storage and low compute power are becoming available
  and can be used as cold storage in the clusters.
  Based on policy, data can be moved from the hot storage to the cold storage.
  Adding more nodes to the cold storage grows the storage capacity independently of the compute capacity
  in the cluster.

* {Storage Types and Storage Policies}

** {Storage Types: DISK, SSD and ARCHIVE}

  The first phase of
  {{{https://issues.apache.org/jira/browse/HDFS-2832}Heterogeneous Storage (HDFS-2832)}}
  changed the datanode storage model from a single storage,
  which may correspond to multiple physical storage media,
  to a collection of storages, each corresponding to a physical storage medium.
  It also added the notion of storage types, DISK and SSD,
  where DISK is the default storage type.

  A new storage type <ARCHIVE>,
  which has high storage density (petabytes of storage) but little compute power,
  is added to support archival storage.

** {Storage Policies: Hot, Warm and Cold}

  A new concept of storage policies is introduced in order to allow files to be stored
  in different storage types according to the storage policy.

  We have the following storage policies:

  * <<Hot>> - for both storage and compute.
    The data that is popular and still being used for processing will stay in this policy.
    When a block is hot, all replicas are stored in DISK.

  * <<Cold>> - only for storage, with limited compute.
    The data that is no longer being used, or data that needs to be archived, is moved
    from hot storage to cold storage.
    When a block is cold, all replicas are stored in ARCHIVE.

  * <<Warm>> - partially hot and partially cold.
    When a block is warm, some of its replicas are stored in DISK
    and the remaining replicas are stored in ARCHIVE.

  []

  More formally, a storage policy consists of the following fields:

  [[1]] Policy ID

  [[2]] Policy name

  [[3]] A list of storage types for block placement

  [[4]] A list of fallback storage types for file creation

  [[5]] A list of fallback storage types for replication

  []

  When there is enough space,
  block replicas are stored according to the storage type list specified in #3.
  When some of the storage types in list #3 are running out of space,
  the fallback storage type lists specified in #4 and #5 are used
  to replace the out-of-space storage types for file creation and replication, respectively.

  The following is a typical storage policy table.

*--------+---------------+-------------------------+-----------------------+-----------------------+
| <<Policy>> | <<Policy>>| <<Block Placement>> | <<Fallback storages>> | <<Fallback storages>> |
| <<ID>> | <<Name>> | <<(n\ replicas)>> | <<for creation>> | <<for replication>> |
*--------+---------------+-------------------------+-----------------------+-----------------------+
| 12 | Hot (default) | DISK: <n> | \<none\> | ARCHIVE |
*--------+---------------+-------------------------+-----------------------+-----------------------+
| 8 | Warm | DISK: 1, ARCHIVE: <n>-1 | ARCHIVE, DISK | ARCHIVE, DISK |
*--------+---------------+-------------------------+-----------------------+-----------------------+
| 4 | Cold | ARCHIVE: <n> | \<none\> | \<none\> |
*--------+---------------+-------------------------+-----------------------+-----------------------+

  Note that cluster administrators may change the storage policy table
  according to the characteristics of the cluster.
  For example, in order to prevent losing archival data,
  administrators may want to use DISK as a fallback storage for replication in the Cold policy.
  A drawback of such a setting is that the DISK storages could be filled up with archival data.
  As a result, the entire cluster may become full and can no longer serve hot data.
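
  The placement column of the table can be read as a simple mapping from replica index to storage type.
  The following is a minimal, hypothetical Java sketch of that reading
  (the <<<StorageType>>> enum and <<<StoragePolicy>>> record are illustrative, not the actual HDFS classes):
  the <i>-th replica takes the <i>-th entry of the placement list,
  and replicas beyond the end of the list reuse the last entry.

+------------------------------------------+
// Hypothetical sketch, not the actual HDFS implementation.
import java.util.ArrayList;
import java.util.List;

enum StorageType { DISK, SSD, ARCHIVE }

record StoragePolicy(byte id, String name, List<StorageType> placement) {

  // The i-th replica uses the i-th placement entry; replicas past the end of
  // the list reuse the last entry.
  List<StorageType> chooseStorageTypes(short replication) {
    List<StorageType> chosen = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      chosen.add(placement.get(Math.min(i, placement.size() - 1)));
    }
    return chosen;
  }
}

class StoragePolicyDemo {
  public static void main(String[] args) {
    StoragePolicy warm = new StoragePolicy((byte) 8, "Warm",
        List.of(StorageType.DISK, StorageType.ARCHIVE));
    // Matches the "DISK: 1, ARCHIVE: <n>-1" row above for n = 3.
    System.out.println(warm.chooseStorageTypes((short) 3)); // [DISK, ARCHIVE, ARCHIVE]
  }
}
+------------------------------------------+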

** {Configurations}

*** {Setting The List of All Storage Policies}

  * <<dfs.block.storage.policies>>
    - a list of block storage policy names and IDs.
    The syntax is

      NAME_1:ID_1, NAME_2:ID_2, ..., NAME_<n>:ID_<n>

    where ID is an integer in the closed range [1,15] and NAME is case insensitive.
    The first element is the <default policy>. An empty list is not allowed.

    The default value is shown below.

+------------------------------------------+
<property>
  <name>dfs.block.storage.policies</name>
  <value>HOT:12, WARM:8, COLD:4</value>
</property>
+------------------------------------------+

  []

*** {Setting Storage Policy Details}

  The following configuration properties are for setting the details of each storage policy,
  where <<<\<ID\>>>> is the actual policy ID.

  * <<dfs.block.storage.policy.\<ID\>>>
    - a list of storage types for storing the block replicas.
    The syntax is

      STORAGE_TYPE_1, STORAGE_TYPE_2, ..., STORAGE_TYPE_<n>

    When creating a block, the <i>-th replica is stored using the <i>-th storage type
    for <i> less than or equal to <n>, and
    the <j>-th replica is stored using the <n>-th storage type for <j> greater than <n>.

    An empty list is not allowed.

    Examples:

+------------------------------------------+
DISK          : all replicas stored using DISK.
DISK, ARCHIVE : the first replica is stored using DISK and all the
                remaining replicas are stored using ARCHIVE.
+------------------------------------------+

  * <<dfs.block.storage.policy.creation-fallback.\<ID\>>>
    - a list of storage types for creation fallback storage.
    The syntax is

      STORAGE_TYPE_1, STORAGE_TYPE_2, ..., STORAGE_TYPE_n

    When creating a block, if a particular storage type specified in the policy
    is unavailable, the fallback STORAGE_TYPE_1 is used. Further, if
    STORAGE_TYPE_<i> is also unavailable, the fallback STORAGE_TYPE_<(i+1)> is used.
    If all fallback storage types are unavailable, the block will be created
    with fewer replicas than the specified replication factor.
    (A sketch of this fallback selection follows this list.)

    An empty list indicates that there is no fallback storage.

  * <<dfs.block.storage.policy.replication-fallback.\<ID\>>>
    - a list of storage types for replication fallback storage.
    The usage of this configuration property is similar to
    <<<dfs.block.storage.policy.creation-fallback.\<ID\>>>>
    except that it takes effect on replication rather than on block creation.

  []
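
  To illustrate how a fallback list is consulted, here is a short, hypothetical sketch;
  it reuses the <<<StorageType>>> enum from the earlier sketch and is not the actual HDFS code.

+------------------------------------------+
// Hypothetical helper: pick a storage type for one replica, falling back in
// order when the requested type has no space.
import java.util.List;
import java.util.Set;

class FallbackDemo {
  static StorageType resolveStorageType(StorageType requested,
      List<StorageType> fallbacks, Set<StorageType> hasSpace) {
    if (hasSpace.contains(requested)) {
      return requested;               // the type named by the policy still fits
    }
    for (StorageType fallback : fallbacks) {
      if (hasSpace.contains(fallback)) {
        return fallback;              // first available fallback wins
      }
    }
    // No fallback available: on creation this replica is skipped, so the block
    // is created with fewer replicas than the requested replication factor.
    return null;
  }
}
+------------------------------------------+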

  The following are the default configuration values for Hot, Warm and Cold storage policies.

  * Block Storage Policy <<HOT:12>>

+------------------------------------------+
<property>
  <name>dfs.block.storage.policy.12</name>
  <value>DISK</value>
</property>
<property>
  <name>dfs.block.storage.policy.creation-fallback.12</name>
  <value></value>
</property>
<property>
  <name>dfs.block.storage.policy.replication-fallback.12</name>
  <value>ARCHIVE</value>
</property>
+------------------------------------------+

  * Block Storage Policy <<WARM:8>>

+------------------------------------------+
<property>
  <name>dfs.block.storage.policy.8</name>
  <value>DISK, ARCHIVE</value>
</property>
<property>
  <name>dfs.block.storage.policy.creation-fallback.8</name>
  <value>DISK, ARCHIVE</value>
</property>
<property>
  <name>dfs.block.storage.policy.replication-fallback.8</name>
  <value>DISK, ARCHIVE</value>
</property>
+------------------------------------------+

  * Block Storage Policy <<COLD:4>>

+------------------------------------------+
<property>
  <name>dfs.block.storage.policy.4</name>
  <value>ARCHIVE</value>
</property>
<property>
  <name>dfs.block.storage.policy.creation-fallback.4</name>
  <value></value>
</property>
<property>
  <name>dfs.block.storage.policy.replication-fallback.4</name>
  <value></value>
</property>
+------------------------------------------+

  []

* {Mover - A New Data Migration Tool}

  A new data migration tool is added for archiving data.
  The tool is similar to Balancer.
  It periodically scans the files in HDFS to check if the block placement satisfies the storage policy.
  For the blocks violating the storage policy,
  it moves the replicas to a different storage type
  in order to fulfill the storage policy requirement.
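
  A simplified, hypothetical sketch of that scan-and-migrate cycle is shown below;
  the <<<expectedTypes>>>, <<<currentTypes>>> and <<<scheduleReplicaMoves>>> helpers are
  illustrative placeholders rather than the actual Mover internals, and <<<StorageType>>>
  is the enum from the earlier sketches.

+------------------------------------------+
// Hypothetical outline of the cycle described above (not the real Mover code).
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class MoverSketch {
  void scanAndMigrate(DistributedFileSystem dfs, Path root) throws IOException {
    RemoteIterator<LocatedFileStatus> files = dfs.listFiles(root, true);
    while (files.hasNext()) {
      LocatedFileStatus file = files.next();
      List<StorageType> expected = expectedTypes(file);  // from the storage policy
      List<StorageType> actual = currentTypes(file);     // from block locations
      if (!actual.equals(expected)) {
        scheduleReplicaMoves(file, actual, expected);     // move replicas across tiers
      }
    }
  }

  // Placeholder helpers for illustration only.
  List<StorageType> expectedTypes(LocatedFileStatus f) { return List.of(); }
  List<StorageType> currentTypes(LocatedFileStatus f) { return List.of(); }
  void scheduleReplicaMoves(LocatedFileStatus f,
      List<StorageType> from, List<StorageType> to) { }
}
+------------------------------------------+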

  * Command:

+------------------------------------------+
hdfs mover [-p <files/dirs> | -f <local file name>]
+------------------------------------------+

  * Arguments:

*-------------------------+--------------------------------------------------------+
| <<<-p \<files/dirs\>>>> | Specify a space separated list of HDFS files/dirs to migrate.
*-------------------------+--------------------------------------------------------+
| <<<-f \<local file\>>>> | Specify a local file containing a list of HDFS files/dirs to migrate.
*-------------------------+--------------------------------------------------------+

  Note that, when both -p and -f options are omitted, the default path is the root directory.

  []

* {<<<DFSAdmin>>> Commands}

** {Set Storage Policy}

  Set a storage policy to a file or a directory.

  * Command:

+------------------------------------------+
hdfs dfsadmin -setStoragePolicy <path> <policyName>
+------------------------------------------+

  * Arguments:

*----------------------+-----------------------------------------------------+
| <<<\<path\>>>> | The path referring to either a directory or a file. |
*----------------------+-----------------------------------------------------+
| <<<\<policyName\>>>> | The name of the storage policy. |
*----------------------+-----------------------------------------------------+

  []

** {Get Storage Policy}

  Get the storage policy of a file or a directory.

  * Command:

+------------------------------------------+
hdfs dfsadmin -getStoragePolicy <path>
+------------------------------------------+

  * Arguments:

*----------------------+-----------------------------------------------------+
| <<<\<path\>>>> | The path referring to either a directory or a file. |
*----------------------+-----------------------------------------------------+

  []

@@ -147,18 +147,19 @@ HDFS Commands Guide
*-----------------+-----------------------------------------------------------+
| -regular | Normal datanode startup (default).
*-----------------+-----------------------------------------------------------+
-| -rollback | Rollsback the datanode to the previous version. This should
+| -rollback | Rollback the datanode to the previous version. This should
| | be used after stopping the datanode and distributing the
| | old hadoop version.
*-----------------+-----------------------------------------------------------+
-| -rollingupgrade rollback | Rollsback a rolling upgrade operation.
+| -rollingupgrade rollback | Rollback a rolling upgrade operation.
*-----------------+-----------------------------------------------------------+

** <<<dfsadmin>>>

  Runs a HDFS dfsadmin client.

-  Usage: <<<hdfs dfsadmin [GENERIC_OPTIONS]
++------------------------------------------+
+  Usage: hdfs dfsadmin [GENERIC_OPTIONS]
          [-report [-live] [-dead] [-decommissioning]]
          [-safemode enter | leave | get | wait]
          [-saveNamespace]

@@ -169,6 +170,8 @@ HDFS Commands Guide
          [-clrQuota <dirname>...<dirname>]
          [-setSpaceQuota <quota> <dirname>...<dirname>]
          [-clrSpaceQuota <dirname>...<dirname>]
+         [-setStoragePolicy <path> <policyName>]
+         [-getStoragePolicy <path>]
          [-finalizeUpgrade]
          [-rollingUpgrade [<query>|<prepare>|<finalize>]]
          [-metasave filename]

@@ -186,7 +189,8 @@ HDFS Commands Guide
          [-fetchImage <local directory>]
          [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
          [-getDatanodeInfo <datanode_host:ipc_port>]
-         [-help [cmd]]>>>
+         [-help [cmd]]
++------------------------------------------+

*-----------------+-----------------------------------------------------------+
|| COMMAND_OPTION || Description

@@ -236,6 +240,10 @@ HDFS Commands Guide
| {{{../hadoop-hdfs/HdfsQuotaAdminGuide.html#Administrative_Commands}HDFS Quotas Guide}}
| for the detail.
*-----------------+-----------------------------------------------------------+
+| -setStoragePolicy \<path\> \<policyName\> | Set a storage policy to a file or a directory.
+*-----------------+-----------------------------------------------------------+
+| -getStoragePolicy \<path\> | Get the storage policy of a file or a directory.
+*-----------------+-----------------------------------------------------------+
| -finalizeUpgrade| Finalize upgrade of HDFS. Datanodes delete their previous
| version working directories, followed by Namenode doing the
| same. This completes the upgrade process.

@@ -250,7 +258,7 @@ HDFS Commands Guide
| <filename> will contain one line for each of the following\
| 1. Datanodes heart beating with Namenode\
| 2. Blocks waiting to be replicated\
-| 3. Blocks currrently being replicated\
+| 3. Blocks currently being replicated\
| 4. Blocks waiting to be deleted
*-----------------+-----------------------------------------------------------+
| -refreshServiceAcl | Reload the service-level authorization policy file.

@@ -312,12 +320,30 @@ HDFS Commands Guide
| is specified.
*-----------------+-----------------------------------------------------------+

+** <<<mover>>>
+
+  Runs the data migration utility.
+  See {{{./ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool}Mover}} for more details.
+
+  Usage: <<<hdfs mover [-p <files/dirs> | -f <local file name>]>>>
+
+*--------------------+--------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*--------------------+--------------------------------------------------------+
+| -p \<files/dirs\> | Specify a space separated list of HDFS files/dirs to migrate.
+*--------------------+--------------------------------------------------------+
+| -f \<local file\> | Specify a local file containing a list of HDFS files/dirs to migrate.
+*--------------------+--------------------------------------------------------+
+
+  Note that, when both -p and -f options are omitted, the default path is the root directory.
+
** <<<namenode>>>

  Runs the namenode. More info about the upgrade, rollback and finalize is at
  {{{./HdfsUserGuide.html#Upgrade_and_Rollback}Upgrade Rollback}}.

-  Usage: <<<hdfs namenode [-backup] |
++------------------------------------------+
+  Usage: hdfs namenode [-backup] |
          [-checkpoint] |
          [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
          [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |

@@ -329,7 +355,8 @@ HDFS Commands Guide
          [-initializeSharedEdits] |
          [-bootstrapStandby] |
          [-recover [-force] ] |
-         [-metadataVersion ]>>>
+         [-metadataVersion ]
++------------------------------------------+

*--------------------+--------------------------------------------------------+
|| COMMAND_OPTION || Description

@@ -351,7 +378,7 @@ HDFS Commands Guide
| -upgradeOnly [-clusterid cid] [-renameReserved\<k-v pairs\>] | Upgrade the
| specified NameNode and then shutdown it.
*--------------------+--------------------------------------------------------+
-| -rollback | Rollsback the NameNode to the previous version. This
+| -rollback | Rollback the NameNode to the previous version. This
| should be used after stopping the cluster and
| distributing the old Hadoop version.
*--------------------+--------------------------------------------------------+

@@ -93,6 +93,7 @@
      <item name="Extended Attributes" href="hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html"/>
      <item name="Transparent Encryption" href="hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html"/>
      <item name="HDFS Support for Multihoming" href="hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html"/>
+     <item name="Archival Storage" href="hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html"/>
    </menu>

    <menu name="MapReduce" inherit="top">