<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-03-01
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.4.0-SNAPSHOT &#x2013; Archival Storage, SSD &amp; Memory</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230301" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2023-03-01
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../../index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/AsyncProfilerServlet.html">Async Profiler</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
<li class="none">
<a href="../../hadoop-huaweicloud/cloud-storage/index.html">Huaweicloud OBS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../../hadoop-federation-balance/HDFSFederationBalance.html">HDFS Federation Balance</a>
</li>
<li class="none">
<a href="../../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Archival Storage, SSD &amp; Memory</h1>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Storage_Types_and_Storage_Policies">Storage Types and Storage Policies</a>
<ul>
<li><a href="#Storage_Types:_ARCHIVE.2C_DISK.2C_SSD.2C_RAM_DISK_and_NVDIMM">Storage Types: ARCHIVE, DISK, SSD, RAM_DISK and NVDIMM</a></li>
<li><a href="#Storage_Policies:_Hot.2C_Warm.2C_Cold.2C_All_SSD.2C_One_SSD.2C_Lazy_Persist.2C_Provided_and_All_NVDIMM">Storage Policies: Hot, Warm, Cold, All_SSD, One_SSD, Lazy_Persist, Provided and All_NVDIMM</a></li>
<li><a href="#Storage_Policy_Resolution">Storage Policy Resolution</a></li>
<li><a href="#Configuration">Configuration</a></li></ul></li>
<li><a href="#Storage_Policy_Based_Data_Movement">Storage Policy Based Data Movement</a>
<ul>
<li><a href="#Storage_Policy_Satisfier_.28SPS.29">Storage Policy Satisfier (SPS)</a>
<ul>
<li><a href="#Configurations:">Configurations:</a></li></ul></li>
<li><a href="#Mover_-_A_New_Data_Migration_Tool">Mover - A New Data Migration Tool</a>
<ul>
<li><a href="#Administrator_notes:">Administrator notes:</a></li></ul></li></ul></li>
<li><a href="#Storage_Policy_Commands">Storage Policy Commands</a>
<ul>
<li><a href="#List_Storage_Policies">List Storage Policies</a></li>
<li><a href="#Set_Storage_Policy">Set Storage Policy</a></li>
<li><a href="#Unset_Storage_Policy">Unset Storage Policy</a></li>
<li><a href="#Get_Storage_Policy">Get Storage Policy</a></li>
<li><a href="#Satisfy_Storage_Policy">Satisfy Storage Policy</a></li>
<li><a href="#Enable_external_service_outside_NN_or_Disable_SPS_without_restarting_Namenode">Enable external service outside NN or Disable SPS without restarting Namenode</a></li>
<li><a href="#Start_External_SPS_Service.">Start External SPS Service.</a></li></ul></li></ul>
<section>
<h2><a name="Introduction"></a>Introduction</h2>
<p><i>Archival Storage</i> is a solution to decouple growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to cold storage grows storage capacity independently of the compute capacity of the cluster.</p>
<p>The frameworks provided by Heterogeneous Storage and Archival Storage generalize the HDFS architecture to include other kinds of storage media, including <i>SSD</i> and <i>memory</i>. Users may choose to store their data in SSD or memory for better performance.</p></section><section>
<h2><a name="Storage_Types_and_Storage_Policies"></a>Storage Types and Storage Policies</h2><section>
<h3><a name="Storage_Types:_ARCHIVE.2C_DISK.2C_SSD.2C_RAM_DISK_and_NVDIMM"></a>Storage Types: ARCHIVE, DISK, SSD, RAM_DISK and NVDIMM</h3>
<p>The first phase of <a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-2832">Heterogeneous Storage (HDFS-2832)</a> changed the DataNode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to one physical storage medium. It also added the notion of storage types, DISK and SSD, where DISK is the default storage type.</p>
<p>A new storage type, <i>ARCHIVE</i>, which has high storage density (petabytes of storage) but little compute power, is added to support archival storage.</p>
<p>Another new storage type, <i>RAM_DISK</i>, is added to support writing single-replica files in memory.</p>
<p>Starting with Hadoop 3.4, a new storage type, <i>NVDIMM</i>, is added to support writing replica files to non-volatile memory, which retains saved data even when the power is turned off.</p></section><section>
<h3><a name="Storage_Policies:_Hot.2C_Warm.2C_Cold.2C_All_SSD.2C_One_SSD.2C_Lazy_Persist.2C_Provided_and_All_NVDIMM"></a>Storage Policies: Hot, Warm, Cold, All_SSD, One_SSD, Lazy_Persist, Provided and All_NVDIMM</h3>
<p>The concept of storage policies is introduced to allow files to be stored on different storage types according to their storage policy.</p>
<p>We have the following storage policies:</p>
<ul>
<li><b>Hot</b> - for both storage and compute. The data that is popular and still being used for processing will stay in this policy. When a block is hot, all replicas are stored in DISK.</li>
<li><b>Cold</b> - only for storage with limited compute. Data that is no longer being used, or data that needs to be archived, is moved from hot storage to cold storage. When a block is cold, all replicas are stored in ARCHIVE.</li>
<li><b>Warm</b> - partially hot and partially cold. When a block is warm, some of its replicas are stored in DISK and the remaining replicas are stored in ARCHIVE.</li>
<li><b>All_SSD</b> - for storing all replicas in SSD.</li>
<li><b>One_SSD</b> - for storing one of the replicas in SSD. The remaining replicas are stored in DISK.</li>
<li><b>Lazy_Persist</b> - for writing blocks with a single replica in memory. The replica is first written to RAM_DISK and then lazily persisted to DISK.</li>
<li><b>Provided</b> - for storing data outside HDFS. See also <a href="./HdfsProvidedStorage.html">HDFS Provided Storage</a>.</li>
<li><b>All_NVDIMM</b> - for storing all replicas in NVDIMM.</li>
</ul>
<p>More formally, a storage policy consists of the following fields:</p>
<ol style="list-style-type: decimal">
<li>Policy ID</li>
<li>Policy name</li>
<li>A list of storage types for block placement</li>
<li>A list of fallback storage types for file creation</li>
<li>A list of fallback storage types for replication</li>
</ol>
<p>When there is enough space, block replicas are stored according to the storage type list specified in #3. When some of the storage types in list #3 are running out of space, the fallback storage type lists specified in #4 and #5 are used to replace the out-of-space storage types for file creation and replication, respectively.</p>
<p>The following is a typical storage policy table.</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> <b>Policy ID</b> </th>
<th align="left"> <b>Policy Name</b> </th>
<th align="left"> <b>Block Placement</b> <b>(<i>n</i> replicas)</b> </th>
<th align="left"> <b>Fallback Storages for Creation</b> </th>
<th align="left"> <b>Fallback Storages for Replication</b> </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> 15 </td>
<td align="left"> Lazy_Persist </td>
<td align="left"> RAM_DISK: 1, DISK: <i>n</i>-1 </td>
<td align="left"> DISK </td>
<td align="left"> DISK </td></tr>
<tr class="a">
<td align="left"> 14 </td>
<td align="left"> All_NVDIMM </td>
<td align="left"> NVDIMM: <i>n</i> </td>
<td align="left"> DISK </td>
<td align="left"> DISK </td></tr>
<tr class="b">
<td align="left"> 12 </td>
<td align="left"> All_SSD </td>
<td align="left"> SSD: <i>n</i> </td>
<td align="left"> DISK </td>
<td align="left"> DISK </td></tr>
<tr class="a">
<td align="left"> 10 </td>
<td align="left"> One_SSD </td>
<td align="left"> SSD: 1, DISK: <i>n</i>-1 </td>
<td align="left"> SSD, DISK </td>
<td align="left"> SSD, DISK </td></tr>
<tr class="b">
<td align="left"> 7 </td>
<td align="left"> Hot (default) </td>
<td align="left"> DISK: <i>n</i> </td>
<td align="left"> &lt;none&gt; </td>
<td align="left"> ARCHIVE </td></tr>
<tr class="a">
<td align="left"> 5 </td>
<td align="left"> Warm </td>
<td align="left"> DISK: 1, ARCHIVE: <i>n</i>-1 </td>
<td align="left"> ARCHIVE, DISK </td>
<td align="left"> ARCHIVE, DISK </td></tr>
<tr class="b">
<td align="left"> 2 </td>
<td align="left"> Cold </td>
<td align="left"> ARCHIVE: <i>n</i> </td>
<td align="left"> &lt;none&gt; </td>
<td align="left"> &lt;none&gt; </td></tr>
<tr class="a">
<td align="left"> 1 </td>
<td align="left"> Provided </td>
<td align="left"> PROVIDED: 1, DISK: <i>n</i>-1 </td>
<td align="left"> PROVIDED, DISK </td>
<td align="left"> PROVIDED, DISK </td></tr>
</tbody>
</table>
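<p>The interaction between the placement list (#3) and the creation fallback list (#4) can be illustrated with a minimal Java sketch. It is a hypothetical, simplified model of five rows of the table above, not the HDFS <code>BlockStoragePolicy</code> class: the names <code>PolicySketch</code>, <code>placement</code> and <code>placementForCreation</code> are invented for illustration, and replication fallbacks (#5) are left out.</p>
<div class="source">
<div class="source">
<pre>
```java
// Hypothetical model of part of the storage policy table; not HDFS code.
enum StorageType { RAM_DISK, NVDIMM, SSD, DISK, ARCHIVE, PROVIDED }

enum PolicySketch {
    // id, first replica, remaining n-1 replicas, then the creation fallbacks
    HOT(7, StorageType.DISK, StorageType.DISK),
    WARM(5, StorageType.DISK, StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.DISK),
    COLD(2, StorageType.ARCHIVE, StorageType.ARCHIVE),
    ONE_SSD(10, StorageType.SSD, StorageType.DISK, StorageType.SSD, StorageType.DISK),
    ALL_SSD(12, StorageType.SSD, StorageType.SSD, StorageType.DISK);

    final int id;
    final StorageType first;               // storage type of the first replica
    final StorageType rest;                // storage type of the other n-1 replicas
    final StorageType[] creationFallbacks; // list #4, tried in order

    PolicySketch(int id, StorageType first, StorageType rest,
                 StorageType... creationFallbacks) {
        this.id = id;
        this.first = first;
        this.rest = rest;
        this.creationFallbacks = creationFallbacks;
    }

    /** Placement for n replicas when every storage type has space (list #3). */
    StorageType[] placement(int n) {
        StorageType[] types = new StorageType[n];
        java.util.Arrays.fill(types, rest);
        if (n > 0) {
            types[0] = first;
        }
        return types;
    }

    /**
     * Placement at file creation when the types in {@code full} are out of
     * space: each full type is replaced by the first creation fallback that
     * still has space, or by null when no fallback is available.
     */
    StorageType[] placementForCreation(int n, StorageType... full) {
        StorageType[] types = placement(n);
        java.util.Arrays.setAll(types,
            i -> contains(full, types[i]) ? firstAvailable(creationFallbacks, full) : types[i]);
        return types;
    }

    private static boolean contains(StorageType[] types, StorageType t) {
        for (StorageType x : types) {
            if (x == t) {
                return true;
            }
        }
        return false;
    }

    private static StorageType firstAvailable(StorageType[] candidates, StorageType[] full) {
        for (StorageType c : candidates) {
            if (!contains(full, c)) {
                return c;
            }
        }
        return null;
    }
}
```
</pre></div></div>
<p>For example, <code>PolicySketch.WARM.placementForCreation(3, StorageType.DISK)</code> returns three ARCHIVE entries, because the DISK replica falls back to ARCHIVE, whereas the same call on <code>HOT</code>, which has no creation fallback, returns only <code>null</code> entries.</p>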
<p>Note 1: The Lazy_Persist policy is useful only for single-replica blocks. For blocks with more than one replica, all the replicas are written to DISK, since writing only one of the replicas to RAM_DISK does not improve overall performance.</p>
<p>Note 2: For erasure-coded files with the striped layout, the suitable storage policies are All_SSD, Hot, Cold and All_NVDIMM. If a user sets any other policy on striped EC files, that policy is not followed when creating or moving blocks.</p></section><section>
<h3><a name="Storage_Policy_Resolution"></a>Storage Policy Resolution</h3>
<p>When a file or directory is created, its storage policy is <i>unspecified</i>. The storage policy can be specified using the &#x201c;<a href="#Set_Storage_Policy"><code>storagepolicies -setStoragePolicy</code></a>&#x201d; command. The effective storage policy of a file or directory is resolved by the following rules.</p>
<ol style="list-style-type: decimal">
<li>
<p>If the file or directory is specified with a storage policy, return it.</p>
</li>
<li>
<p>For an unspecified file or directory, if it is the root directory, return the <i>default storage policy</i>. Otherwise, return its parent&#x2019;s effective storage policy.</p>
</li>
</ol>
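<p>The two rules amount to a walk up the directory tree. The Java sketch below assumes a hypothetical <code>NodeSketch</code> class standing in for files and directories; it is not the NameNode's inode implementation. The <code>defaultPolicy</code> argument plays the role of <code>dfs.storage.default.policy</code>, described in the Configuration section.</p>
<div class="source">
<div class="source">
<pre>
```java
// Hypothetical inode-like node used only to illustrate policy resolution.
final class NodeSketch {
    final String name;
    final NodeSketch parent;   // null only for the root directory
    String policy;             // null means the policy is unspecified

    NodeSketch(String name, NodeSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Applies the two resolution rules above. */
    String effectivePolicy(String defaultPolicy) {
        if (policy != null) {
            return policy;                              // rule 1: explicitly specified
        }
        if (parent == null) {
            return defaultPolicy;                       // rule 2: unspecified root
        }
        return parent.effectivePolicy(defaultPolicy);   // rule 2: inherit from parent
    }
}
```
</pre></div></div>
<p>With <code>COLD</code> set on <code>/data/archive</code>, an unspecified file under it resolves to <code>COLD</code>, while <code>/data</code> itself still resolves to the default.</p>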
<p>The effective storage policy can be retrieved by the &#x201c;<a href="#Get_Storage_Policy"><code>storagepolicies -getStoragePolicy</code></a>&#x201d; command.</p></section><section>
<h3><a name="Configuration"></a>Configuration</h3>
<ul>
<li><b>dfs.storage.policy.enabled</b> - for enabling/disabling the storage policy feature. The default value is <code>true</code>.</li>
<li><b>dfs.storage.default.policy</b> - sets the default storage policy by policy name. The default value is <code>HOT</code>. All possible policies are defined in enum StoragePolicy, including <code>LAZY_PERSIST</code>, <code>ALL_SSD</code>, <code>ONE_SSD</code>, <code>HOT</code>, <code>WARM</code>, <code>COLD</code>, <code>PROVIDED</code> and <code>ALL_NVDIMM</code>.</li>
<li><b>dfs.datanode.data.dir</b> - on each DataNode, the comma-separated storage locations should be tagged with their storage types. This allows storage policies to place blocks on different storage types according to the policy. For example:
<ol style="list-style-type: decimal">
<li>A datanode storage location /grid/dn/disk0 on DISK should be configured with <code>[DISK]file:///grid/dn/disk0</code></li>
<li>A datanode storage location /grid/dn/ssd0 on SSD should be configured with <code>[SSD]file:///grid/dn/ssd0</code></li>
<li>A datanode storage location /grid/dn/archive0 on ARCHIVE should be configured with <code>[ARCHIVE]file:///grid/dn/archive0</code></li>
<li>A datanode storage location /grid/dn/ram0 on RAM_DISK should be configured with <code>[RAM_DISK]file:///grid/dn/ram0</code></li>
<li>A datanode storage location /grid/dn/nvdimm0 on NVDIMM should be configured with <code>[NVDIMM]file:///grid/dn/nvdimm0</code></li>
</ol>
<p>The default storage type of a datanode storage location will be DISK if it does not have a storage type tagged explicitly.</p>
<p>Users may set up the DataNode data directories to point to multiple volumes with different storage types. It is important to check that each volume is mounted correctly before the storage locations are initialized. Users can optionally enforce the filesystem for a storage type with the key <b>dfs.datanode.storagetype.*.filesystem</b>, replacing &#x2018;*&#x2019; with a storage type; for example, <code>dfs.datanode.storagetype.ARCHIVE.filesystem=fuse_filesystem</code>.</p>
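<p>The tagging convention can be sketched in Java as follows. This is a hypothetical illustration, not the DataNode's own parser: the class <code>DataDirSketch</code> and its methods are invented names, and the sketch does not check that a tag names a real storage type. It also shows how the per-type filesystem key is formed.</p>
<div class="source">
<div class="source">
<pre>
```java
// Hypothetical parser for the "[TYPE]uri" tagging convention; not DataNode code.
final class DataDirSketch {
    /** One parsed storage location: its storage type tag and its URI. */
    static final class Location {
        final String storageType;
        final String uri;

        Location(String storageType, String uri) {
            this.storageType = storageType;
            this.uri = uri;
        }
    }

    /** Parses one entry; an untagged entry defaults to the DISK storage type. */
    static Location parse(String entry) {
        String s = entry.trim();
        if (s.startsWith("[")) {
            int end = s.indexOf(']');
            if (end == -1) {
                throw new IllegalArgumentException("Unterminated storage type tag: " + s);
            }
            // The tag text is taken as-is; this sketch does not validate it.
            return new Location(s.substring(1, end), s.substring(end + 1));
        }
        return new Location("DISK", s);
    }

    /** Parses a whole comma-separated dfs.datanode.data.dir value. */
    static Location[] parseAll(String value) {
        return java.util.Arrays.stream(value.split(","))
            .map(DataDirSketch::parse)
            .toArray(Location[]::new);
    }

    /** Builds the dfs.datanode.storagetype.*.filesystem key for one storage type. */
    static String filesystemKey(String storageType) {
        return "dfs.datanode.storagetype." + storageType + ".filesystem";
    }
}
```
</pre></div></div>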
</li>
</ul></section></section><section>
<h2><a name="Storage_Policy_Based_Data_Movement"></a>Storage Policy Based Data Movement</h2>
<p>Setting a new storage policy on an existing file or directory changes the policy in the namespace, but it does not physically move the blocks across storage media. The following two options allow users to move the blocks according to the new policy. After changing or setting the policy on a file or directory, users should therefore perform one of these options to achieve the desired data movement. Note that the two options must not run simultaneously.</p><section>
<h3><a name="Storage_Policy_Satisfier_.28SPS.29"></a><u>S</u>torage <u>P</u>olicy <u>S</u>atisfier (SPS)</h3>
<p>When a user changes the storage policy on a file or directory, the user can call the <code>HdfsAdmin</code> API <code>satisfyStoragePolicy()</code> to move the blocks according to the new policy. The SPS tool, running externally to the NameNode, periodically scans for storage mismatches between the new policy and the physical placement of the blocks. It tracks only the files and directories for which the user invoked <code>satisfyStoragePolicy()</code>. If SPS identifies blocks of a file that need to be moved, it schedules block movement tasks on the DataNodes. If any movement fails, SPS retries by sending new block movement tasks.</p>
<p>SPS can be enabled as an external service outside the NameNode, or disabled dynamically without restarting the NameNode.</p>
<p>Detailed design documentation can be found at <a class="externalLink" href="https://issues.apache.org/jira/browse/HDFS-10285">Storage Policy Satisfier (SPS) (HDFS-10285)</a>.</p>
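<p>Conceptually, detecting a mismatch means comparing the storage types that the policy asks for with those the replicas actually occupy. The Java sketch below treats both as multisets; it is a hypothetical illustration rather than the SPS (or Mover) implementation, and <code>MismatchSketch</code>, <code>delta</code>, <code>satisfies</code> and <code>moves</code> are invented names.</p>
<div class="source">
<div class="source">
<pre>
```java
// Hypothetical sketch of a placement-versus-policy check; not SPS or Mover code.
enum StorageType { RAM_DISK, NVDIMM, SSD, DISK, ARCHIVE, PROVIDED }

final class MismatchSketch {
    /** Per storage type ordinal: replicas still needed (+) or surplus (-). */
    static int[] delta(StorageType[] expected, StorageType[] actual) {
        int[] d = new int[StorageType.values().length];
        for (StorageType t : expected) {
            d[t.ordinal()]++;
        }
        for (StorageType t : actual) {
            d[t.ordinal()]--;
        }
        return d;
    }

    /** True when the replicas match the policy's placement, ignoring order. */
    static boolean satisfies(StorageType[] expected, StorageType[] actual) {
        return java.util.Arrays.stream(delta(expected, actual)).allMatch(x -> x == 0);
    }

    /** Expands d into storage types, taking counts with the given sign (+1 or -1). */
    private static StorageType[] expand(int[] d, int sign) {
        return java.util.Arrays.stream(StorageType.values())
            .flatMap(t -> java.util.Collections.nCopies(Math.max(0, d[t.ordinal()] * sign), t).stream())
            .toArray(StorageType[]::new);
    }

    /** Pairs each misplaced replica with a missing type, e.g. "DISK->ARCHIVE". */
    static String[] moves(StorageType[] expected, StorageType[] actual) {
        int[] d = delta(expected, actual);
        StorageType[] surplus = expand(d, -1);  // replicas on types the policy does not want
        StorageType[] missing = expand(d, 1);   // types the policy still needs
        return java.util.stream.IntStream.range(0, Math.min(surplus.length, missing.length))
            .mapToObj(i -> surplus[i] + "->" + missing[i])
            .toArray(String[]::new);
    }
}
```
</pre></div></div>
<p>For a file whose three replicas are all on DISK after its policy changed from Hot to Warm, <code>moves</code> returns <code>DISK-&gt;ARCHIVE</code> twice, matching the Warm placement of one DISK and two ARCHIVE replicas. The check ignores replica order, since only the count per storage type matters.</p>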
<ul>
<li>
<p><b>Note</b>: When a user invokes the <code>satisfyStoragePolicy()</code> API on a directory, SPS scans all sub-directories and considers all files in them when satisfying the policy.</p>
</li>
<li>
<p>HdfsAdmin API: <code>public void satisfyStoragePolicy(final Path path) throws IOException</code></p>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left"> Argument </th>
<th align="left"> Description </th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>path</code> </td>
<td align="left"> The path whose blocks should be moved to satisfy its storage policy. </td></tr>
</tbody>
</table><section>
<h4><a name="Configurations:"></a>Configurations:</h4>
<ul>
<li>
<p><b>dfs.storage.policy.satisfier.mode</b> - Enables SPS as an external service outside the NameNode, or disables it. Supported values are <code>external</code> (SPS enabled) and <code>none</code> (SPS disabled). The default value is <code>none</code>.</p>
</li>
<li>
<p><b>dfs.storage.policy.satisfier.recheck.timeout.millis</b> - Timeout, in milliseconds, after which SPS re-checks the results of processed block storage movement commands reported by DataNodes.</p>
</li>
<li>
<p><b>dfs.storage.policy.satisfier.self.retry.timeout.millis</b> - Timeout, in milliseconds: if a DataNode reports no block movement results within this period, SPS retries the movement.</p>
</li>
</ul></section></section><section>
<h3><a name="Mover_-_A_New_Data_Migration_Tool"></a>Mover - A New Data Migration Tool</h3>
<p>A new data migration tool, the Mover, is added for archiving data. It is similar to the Balancer. It periodically scans the files in HDFS to check whether their block placement satisfies the storage policy. For blocks that violate the storage policy, it moves the replicas to a different storage type to fulfill the policy. It always tries to move block replicas within the same node whenever possible. If that is not possible (e.g. when a node doesn&#x2019;t have the target storage type), it copies the block replicas to another node over the network.</p>
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs mover [-p &lt;files/dirs&gt; | -f &lt;local file name&gt;]
</pre></div></div>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Argument</th>
<th align="left">Description</th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>-p &lt;files/dirs&gt;</code> </td>
<td align="left"> Specify a space-separated list of HDFS files/dirs to migrate. </td></tr>
<tr class="a">
<td align="left"> <code>-f &lt;local file&gt;</code> </td>
<td align="left"> Specify a local file containing a list of HDFS files/dirs to migrate. </td></tr>
</tbody>
</table>
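<p>The Mover and SPS must not run at the same time, so a wrapper can check the SPS mode before it starts the Mover. In the hypothetical sketch below, the function name is invented. The mode is read with <code>hdfs getconf -confKey</code>, which reports the value in the local client configuration, not the live NameNode setting. The sketch also assumes that the <code>-f</code> file lists one HDFS path per line.</p>

```shell
# Hypothetical wrapper (not part of Hadoop): run the Mover on the given HDFS
# paths via "-f", refusing to start while SPS is configured as enabled.
run_mover() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_mover HDFS_PATH..." >&2
    return 2
  fi
  # Local configuration only; assumed to match the NameNode's current mode.
  # Refuse to run if the mode cannot be read at all.
  local mode
  mode="$(hdfs getconf -confKey dfs.storage.policy.satisfier.mode 2>/dev/null)" || {
    echo "cannot read dfs.storage.policy.satisfier.mode" >&2
    return 1
  }
  if [ "$mode" != "none" ]; then
    echo "SPS mode is '$mode'; disable SPS before running the Mover" >&2
    return 1
  fi
  # Write the paths, one per line, to a temporary local file for -f.
  local list rc
  list="$(mktemp)" || return 1
  printf '%s\n' "$@" > "$list"
  hdfs mover -f "$list"
  rc=$?
  rm -f "$list"
  return "$rc"
}
```

<p>For example, <code>run_mover /data/archive /data/logs</code>. If the wrapper reports that SPS is enabled, disable SPS first, as described in the administrator notes.</p>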
<p>Note that when both the <code>-p</code> and <code>-f</code> options are omitted, the default path is the root directory.</p><section>
<h4><a name="Administrator_notes:"></a>Administrator notes:</h4>
<p><code>StoragePolicySatisfier</code> and the Mover tool cannot run simultaneously. If a Mover instance is already running when SPS starts, SPS is disabled. In that case, the administrator should wait for the Mover to finish and then enable the external SPS service again. Similarly, the Mover cannot run while SPS is enabled; to run the Mover explicitly, the administrator must first disable SPS. See the commands section below for how to enable the external SPS service or disable SPS dynamically.</p></section></section></section><section>
<h2><a name="Storage_Policy_Commands"></a>Storage Policy Commands</h2><section>
<h3><a name="List_Storage_Policies"></a>List Storage Policies</h3>
<p>List all storage policies.</p>
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs storagepolicies -listPolicies
</pre></div></div>
</li>
<li>
<p>Arguments: none.</p>
</li>
</ul></section><section>
<h3><a name="Set_Storage_Policy"></a>Set Storage Policy</h3>
<p>Set a storage policy on a file or a directory.</p>
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs storagepolicies -setStoragePolicy -path &lt;path&gt; -policy &lt;policy&gt;
</pre></div></div>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Argument</th>
<th align="left">Description</th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>-path &lt;path&gt;</code> </td>
<td align="left"> The path referring to either a directory or a file. </td></tr>
<tr class="a">
<td align="left"> <code>-policy &lt;policy&gt;</code> </td>
<td align="left"> The name of the storage policy. </td></tr>
</tbody>
</table></section><section>
<h3><a name="Unset_Storage_Policy"></a>Unset Storage Policy</h3>
<p>Unset the storage policy of a file or a directory. After the unset command, the storage policy of the nearest ancestor applies; if no ancestor has a policy, the default storage policy applies.</p>
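<p>The lookup rule above (the nearest ancestor first, then the default) can be sketched in bash 4 or later. This is a simplified model, not HDFS code. The associative array <code>POLICIES</code> is a hypothetical stand-in for policies set explicitly in the namespace, and <code>HOT</code> is assumed to be the default policy.</p>

```shell
# Simplified model of storage policy resolution (not HDFS code).
# POLICIES maps absolute paths to explicitly set policy names (hypothetical).
declare -A POLICIES=()
DEFAULT_POLICY="HOT"   # assumed default when no ancestor has a policy

# Print the effective policy of an absolute path: the policy set on the path
# itself or on its nearest ancestor, otherwise DEFAULT_POLICY.
effective_policy() {
  local p="${1:-}"
  case "$p" in
    /*) ;;
    *) echo "effective_policy: absolute path required" >&2; return 2 ;;
  esac
  while :; do
    if [ -n "${POLICIES[$p]+set}" ]; then
      echo "${POLICIES[$p]}"
      return 0
    fi
    [ "$p" = "/" ] && break
    p="$(dirname "$p")"   # step up to the parent directory
  done
  echo "$DEFAULT_POLICY"
}
```

<p>In this model, unsetting a policy corresponds to removing its entry, e.g. <code>unset 'POLICIES[/data/hot]'</code>. After that, files under <code>/data/hot</code> resolve to the policy of <code>/data</code>, or to the default if no ancestor has one.</p>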
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs storagepolicies -unsetStoragePolicy -path &lt;path&gt;
</pre></div></div>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Argument</th>
<th align="left">Description</th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>-path &lt;path&gt;</code> </td>
<td align="left"> The path referring to either a directory or a file. </td></tr>
</tbody>
</table></section><section>
<h3><a name="Get_Storage_Policy"></a>Get Storage Policy</h3>
<p>Get the storage policy of a file or a directory.</p>
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs storagepolicies -getStoragePolicy -path &lt;path&gt;
</pre></div></div>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Argument</th>
<th align="left">Description</th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>-path &lt;path&gt;</code> </td>
<td align="left"> The path referring to either a directory or a file. </td></tr>
</tbody>
</table></section><section>
<h3><a name="Satisfy_Storage_Policy"></a>Satisfy Storage Policy</h3>
<p>Schedule block movement based on the current storage policy of a file or directory.</p>
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs storagepolicies -satisfyStoragePolicy -path &lt;path&gt;
</pre></div></div>
</li>
<li>
<p>Arguments:</p>
</li>
</ul>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th align="left">Argument</th>
<th align="left">Description</th></tr>
</thead><tbody>
<tr class="b">
<td align="left"> <code>-path &lt;path&gt;</code> </td>
<td align="left"> The path referring to either a directory or a file. </td></tr>
</tbody>
</table></section><section>
<h3><a name="Enable_external_service_outside_NN_or_Disable_SPS_without_restarting_Namenode"></a>Enable external service outside NN or Disable SPS without restarting Namenode</h3>
<p>To switch the SPS mode while the NameNode is running, the administrator first updates <code>dfs.storage.policy.satisfier.mode</code> to the desired value (<code>external</code> or <code>none</code>) in the configuration file (<code>hdfs-site.xml</code>), and then runs the following NameNode reconfig command:</p>
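<p>A minimal sketch of the reconfig step, assuming <code>hdfs-site.xml</code> on the NameNode has already been edited. The function name and the example address <code>nn1.example.com:8020</code> are hypothetical. NameNode reconfiguration runs as a background task, so the sketch follows <code>start</code> with a <code>status</code> query.</p>

```shell
# Hypothetical helper (not part of Hadoop): ask a running NameNode to reload
# dfs.storage.policy.satisfier.mode from its already-edited hdfs-site.xml.
reconfigure_sps_mode() {
  local nn_addr="${1:-}"   # NameNode host:ipc_port
  if [ -z "$nn_addr" ]; then
    echo "usage: reconfigure_sps_mode HOST:IPC_PORT" >&2
    return 2
  fi
  # Start the reconfiguration task on the NameNode.
  hdfs dfsadmin -reconfig namenode "$nn_addr" start || return 1
  # Query the task status once; if it is still running, query again later.
  hdfs dfsadmin -reconfig namenode "$nn_addr" status
}
```

<p>For example, <code>reconfigure_sps_mode nn1.example.com:8020</code>.</p>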
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs dfsadmin -reconfig namenode &lt;host:ipc_port&gt; start
</pre></div></div>
</li>
</ul></section><section>
<h3><a name="Start_External_SPS_Service."></a>Start External SPS Service.</h3>
<p>To start the external SPS, the administrator first sets <code>dfs.storage.policy.satisfier.mode</code> to <code>external</code> in the configuration file (<code>hdfs-site.xml</code>) and then runs the NameNode reconfig command. Make sure the network topology configuration in this file is the same as the NameNode's, because SPS uses it to match target nodes. Then start the external SPS service with the following command:</p>
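<p>The procedure above can be guarded by a small start-up check. In this hypothetical sketch, the SPS host uses <code>hdfs getconf -confKey</code> to verify that its local configuration sets <code>dfs.storage.policy.satisfier.mode</code> to <code>external</code> before it launches the daemon. The sketch does not run the NameNode reconfig command, which it assumes has already been done.</p>

```shell
# Hypothetical helper (not part of Hadoop): start the external SPS daemon only
# if the local hdfs-site.xml sets dfs.storage.policy.satisfier.mode=external.
start_external_sps() {
  local mode
  mode="$(hdfs getconf -confKey dfs.storage.policy.satisfier.mode 2>/dev/null)" || mode=""
  if [ "$mode" != "external" ]; then
    echo "dfs.storage.policy.satisfier.mode is '${mode:-unset}', expected 'external'" >&2
    return 1
  fi
  # The NameNode must already have been reconfigured to external mode.
  hdfs --daemon start sps
}
```

<p>Checking the local value matters because the external SPS reads its own <code>hdfs-site.xml</code>, which may differ from the NameNode's copy.</p>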
<ul>
<li>
<p>Command:</p>
<div class="source">
<div class="source">
<pre>hdfs --daemon start sps
</pre></div></div>
</li>
</ul></section></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>