<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-02-28
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop 3.4.0-SNAPSHOT &#x2013; ViewFs Guide</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230228" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
|
<a href="http://hadoop.apache.org/" class="externalLink">Apache Hadoop</a>
&nbsp;| Last Published: 2023-02-28
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>ViewFs Guide</h1>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#The_Old_World_.28Prior_to_Federation.29">The Old World (Prior to Federation)</a>
<ul>
<li><a href="#Single_Namenode_Clusters">Single Namenode Clusters</a></li>
<li><a href="#Pathnames_Usage_Patterns">Pathnames Usage Patterns</a></li>
<li><a href="#Pathname_Usage_Best_Practices">Pathname Usage Best Practices</a></li></ul></li>
<li><a href="#New_World_.E2.80.93_Federation_and_ViewFs">New World &#x2013; Federation and ViewFs</a>
<ul>
<li><a href="#How_The_Clusters_Look">How The Clusters Look</a></li>
<li><a href="#A_Global_Namespace_Per_Cluster_Using_ViewFs">A Global Namespace Per Cluster Using ViewFs</a></li>
<li><a href="#Pathname_Usage_Patterns">Pathname Usage Patterns</a></li>
<li><a href="#Pathname_Usage_Best_Practices">Pathname Usage Best Practices</a></li>
<li><a href="#Renaming_Pathnames_Across_Namespaces">Renaming Pathnames Across Namespaces</a></li></ul></li>
<li><a href="#Multi-Filesystem_I.2F0_with_Nfly_Mount_Points">Multi-Filesystem I/O with Nfly Mount Points</a>
<ul>
<li><a href="#Basic_Configuration">Basic Configuration</a></li>
<li><a href="#Advanced_Configuration">Advanced Configuration</a></li>
<li><a href="#Network_Topology">Network Topology</a></li>
<li><a href="#Example_Nfly_Configuration">Example Nfly Configuration</a></li>
<li><a href="#How_Nfly_File_Creation_works">How Nfly File Creation works</a></li>
<li><a href="#FAQ">FAQ</a></li></ul></li>
<li><a href="#Don.E2.80.99t_want_to_change_scheme_or_difficult_to_copy_mount-table_configurations_to_all_clients.3F">Don&#x2019;t want to change scheme or difficult to copy mount-table configurations to all clients?</a></li>
<li><a href="#Regex_Pattern_Based_Mount_Points">Regex Pattern Based Mount Points</a>
<ul>
<li><a href="#Understand_the_Difference">Understand the Difference</a></li>
<li><a href="#Basic_Regex_Link_Mapping_Config">Basic Regex Link Mapping Config</a></li>
<li><a href="#Regex_Link_Mapping_With_Interceptors">Regex Link Mapping With Interceptors</a></li></ul></li>
<li><a href="#Appendix:_A_Mount_Table_Configuration_Example">Appendix: A Mount Table Configuration Example</a></li></ul>
<section>
<h2><a name="Introduction"></a>Introduction</h2>
<p>The View File System (ViewFs) provides a way to manage multiple Hadoop file system namespaces (or namespace volumes). It is particularly useful for clusters having multiple namenodes, and hence multiple namespaces, in <a href="./Federation.html">HDFS Federation</a>. ViewFs is analogous to <i>client side mount tables</i> in some Unix/Linux systems. ViewFs can be used to create personalized namespace views and also per-cluster common views.</p>
<p>This guide is presented in the context of Hadoop systems that have several clusters, each of which may be federated into multiple namespaces. It also describes how to use ViewFs in federated HDFS to provide a per-cluster global namespace so that applications can operate in a way similar to the pre-federation world.</p></section><section>
<h2><a name="The_Old_World_.28Prior_to_Federation.29"></a>The Old World (Prior to Federation)</h2><section>
<h3><a name="Single_Namenode_Clusters"></a>Single Namenode Clusters</h3>
<p>In the old world prior to <a href="./Federation.html">HDFS Federation</a>, a cluster has a single namenode which provides a single file system namespace for that cluster. Suppose there are multiple clusters. The file system namespaces of each cluster are completely independent and disjoint. Furthermore, physical storage is NOT shared across clusters (i.e. the Datanodes are not shared across clusters.)</p>
<p>The <code>core-site.xml</code> of each cluster has a configuration property that sets the default file system to the namenode of that cluster:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.default.name&lt;/name&gt;
&lt;value&gt;hdfs://namenodeOfClusterX:port&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Such a configuration property allows one to use slash-relative names to resolve paths relative to the cluster namenode. For example, the path <code>/foo/bar</code> is referring to <code>hdfs://namenodeOfClusterX:port/foo/bar</code> using the above configuration.</p>
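<p>In current Hadoop releases, <code>fs.default.name</code> is a deprecated alias of <code>fs.defaultFS</code>. The following is a minimal sketch of the same setting using the newer key; the hostname <code>nn-clusterx.example.com</code> and the port <code>8020</code> are placeholder values chosen for illustration, not values taken from a real cluster. With this setting, <code>/foo/bar</code> would resolve to <code>hdfs://nn-clusterx.example.com:8020/foo/bar</code>.</p>
<div class="source">
<div class="source">
<pre>&lt;!-- Sketch only: host and port below are illustrative placeholders --&gt;
&lt;property&gt;
&lt;name&gt;fs.defaultFS&lt;/name&gt;
&lt;value&gt;hdfs://nn-clusterx.example.com:8020&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>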
<p>This configuration property is set on each gateway on the clusters and also on key services of that cluster, such as the JobTracker and Oozie.</p></section><section>
<h3><a name="Pathnames_Usage_Patterns"></a>Pathnames Usage Patterns</h3>
<p>Hence on Cluster X where the <code>core-site.xml</code> is set as above, the typical pathnames are</p>
<ol style="list-style-type: decimal">
<li>
<p><code>/foo/bar</code></p>
<ul>
<li>This is equivalent to <code>hdfs://namenodeOfClusterX:port/foo/bar</code> as before.</li>
</ul>
</li>
<li>
<p><code>hdfs://namenodeOfClusterX:port/foo/bar</code></p>
<ul>
<li>While this is a valid pathname, it is better to use <code>/foo/bar</code>, as it allows the application and its data to be transparently moved to another cluster when needed.</li>
</ul>
</li>
<li>
<p><code>hdfs://namenodeOfClusterY:port/foo/bar</code></p>
<ul>
<li>This is a URI that refers to a pathname on another cluster, such as Cluster Y. In particular, the command for copying files from Cluster Y to Cluster Z looks like:
<div class="source">
<div class="source">
<pre>distcp hdfs://namenodeClusterY:port/pathSrc hdfs://namenodeClusterZ:port/pathDest
</pre></div></div>
</li>
</ul>
</li>
<li>
<p><code>webhdfs://namenodeClusterX:http_port/foo/bar</code></p>
<ul>
<li>This is a URI for accessing files via the WebHDFS file system. Note that WebHDFS uses the HTTP port of the namenode, not the RPC port.</li>
</ul>
</li>
<li>
<p><code>http://namenodeClusterX:http_port/webhdfs/v1/foo/bar</code> and <code>http://proxyClusterX:http_port/foo/bar</code></p>
<ul>
<li>These are HTTP URLs for accessing files via the <a href="./WebHDFS.html">WebHDFS REST API</a> and the HDFS proxy, respectively.</li>
</ul>
</li>
</ol></section><section>
<h3><a name="Pathname_Usage_Best_Practices"></a>Pathname Usage Best Practices</h3>
<p>When one is within a cluster, it is recommended to use the pathname of type (1) above instead of a fully qualified URI like (2). Fully qualified URIs are similar to addresses and do not allow the application to move along with its data.</p></section></section><section>
<h2><a name="New_World_.E2.80.93_Federation_and_ViewFs"></a>New World &#x2013; Federation and ViewFs</h2><section>
<h3><a name="How_The_Clusters_Look"></a>How The Clusters Look</h3>
<p>Suppose there are multiple clusters. Each cluster has one or more namenodes. Each namenode has its own namespace. A namenode belongs to one and only one cluster. The namenodes in the same cluster share the physical storage of that cluster. The namespaces across clusters are independent as before.</p>
<p>Operations decide what is stored on each namenode within a cluster based on the storage needs. For example, they may put all the user data (<code>/user/&lt;username&gt;</code>) in one namenode, all the feed-data (<code>/data</code>) in another namenode, all the projects (<code>/projects</code>) in yet another namenode, etc.</p></section><section>
<h3><a name="A_Global_Namespace_Per_Cluster_Using_ViewFs"></a>A Global Namespace Per Cluster Using ViewFs</h3>
<p>In order to provide transparency with the old world, the ViewFs file system (i.e. client-side mount table) is used to create, for each cluster, an independent cluster namespace view that is similar to the namespace in the old world. The client-side mount tables are like Unix mount tables: they mount the new namespace volumes using the old naming convention. The following figure shows a mount table mounting four namespace volumes <code>/user</code>, <code>/data</code>, <code>/projects</code>, and <code>/tmp</code>:</p>
<p><img src="./images/viewfs_TypicalMountTable.png" alt="Typical Mount Table for each Cluster" /></p>
<p>ViewFs implements the Hadoop file system interface just like HDFS and the local file system. It is a trivial file system in the sense that it only allows linking to other file systems. Because ViewFs implements the Hadoop file system interface, it works transparently with Hadoop tools. For example, all the shell commands work with ViewFs just as they do with HDFS and the local file system.</p>
<p>In the configuration of each cluster, the default file system is set to the mount table for that cluster as shown below (compare it with the configuration in <a href="#Single_Namenode_Clusters">Single Namenode Clusters</a>).</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.defaultFS&lt;/name&gt;
&lt;value&gt;viewfs://clusterX&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The authority following the <code>viewfs://</code> scheme in the URI is the mount table name. It is recommended that the mount table of a cluster be named after the cluster. The Hadoop system will then look for a mount table with the name &#x201c;clusterX&#x201d; in the Hadoop configuration files. Operations arrange for all gateways and service machines to contain the mount tables for ALL clusters such that, for each cluster, the default file system is set to the ViewFs mount table for that cluster as described above.</p>
<p>The mount points of a mount table are specified in the standard Hadoop configuration files. All the mount table config entries for <code>viewfs</code> are prefixed by <code>fs.viewfs.mounttable.</code>. The mount points that link to other filesystems are specified using <code>link</code> tags. It is recommended that mount point names match the target locations in the linked filesystems. All namespaces that are not configured in the mount table can fall back to a default filesystem via <code>linkFallback</code>.</p>
<p>In the mount table configuration below, the namespace <code>/data</code> is linked to the filesystem <code>hdfs://nn1-clusterx.example.com:8020/data</code>, and <code>/project</code> is linked to the filesystem <code>hdfs://nn2-clusterx.example.com:8020/project</code>. All namespaces that are not configured in the mount table, such as <code>/logs</code>, are linked to the filesystem <code>hdfs://nn5-clusterx.example.com:8020/home</code>.</p>
<div class="source">
<div class="source">
<pre>&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./data&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clusterx.example.com:8020/data&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./project&lt;/name&gt;
&lt;value&gt;hdfs://nn2-clusterx.example.com:8020/project&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./user&lt;/name&gt;
&lt;value&gt;hdfs://nn3-clusterx.example.com:8020/user&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./tmp&lt;/name&gt;
&lt;value&gt;hdfs://nn4-clusterx.example.com:8020/tmp&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.linkFallback&lt;/name&gt;
&lt;value&gt;hdfs://nn5-clusterx.example.com:8020/home&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>
<p>Alternatively, the mount table&#x2019;s root can be merged with the root of another filesystem via <code>linkMergeSlash</code>. In the mount table configuration below, clusterY&#x2019;s root is merged with the root filesystem at <code>hdfs://nn1-clustery.example.com:8020</code>.</p>
<div class="source">
<div class="source">
<pre>&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterY.linkMergeSlash&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clustery.example.com:8020/&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>
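<p>Because gateways and service machines carry the mount tables for all clusters, a single configuration can hold entries for more than one mount table. The following is a minimal sketch under assumed names: the <code>clusterZ</code> table and the hostname <code>nn1-clusterz.example.com</code> are illustrative and not a prescribed layout. With such a configuration, a client whose default file system is <code>viewfs://clusterX</code> resolves <code>/data</code> through the <code>clusterX</code> table, while <code>viewfs://clusterZ/data</code> is resolved through the <code>clusterZ</code> table because the URI authority selects the mount table.</p>
<div class="source">
<div class="source">
<pre>&lt;configuration&gt;
&lt;!-- Default file system: the mount table named clusterX --&gt;
&lt;property&gt;
&lt;name&gt;fs.defaultFS&lt;/name&gt;
&lt;value&gt;viewfs://clusterX&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Mount table clusterX --&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./data&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clusterx.example.com:8020/data&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Mount table clusterZ (hypothetical second cluster) --&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterZ.link./data&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clusterz.example.com:8020/data&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div>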
</section><section>
<h3><a name="Pathname_Usage_Patterns"></a>Pathname Usage Patterns</h3>
<p>Hence on Cluster X, where the <code>core-site.xml</code> is set so that the default file system uses the mount table of that cluster, the typical pathnames are:</p>
<ol style="list-style-type: decimal">
<li>
<p><code>/foo/bar</code></p>
<ul>
<li>This is equivalent to <code>viewfs://clusterX/foo/bar</code>. If such a pathname is used in the old non-federated world, then the transition to the federated world is transparent.</li>
</ul>
</li>
<li>
<p><code>viewfs://clusterX/foo/bar</code></p>
<ul>
<li>While this is a valid pathname, it is better to use <code>/foo/bar</code>, as it allows the application and its data to be transparently moved to another cluster when needed.</li>
</ul>
</li>
<li>
<p><code>viewfs://clusterY/foo/bar</code></p>
<ul>
<li>This is a URI that refers to a pathname on another cluster, such as Cluster Y. In particular, the command for copying files from Cluster Y to Cluster Z looks like:
<div class="source">
<div class="source">
<pre>distcp viewfs://clusterY/pathSrc viewfs://clusterZ/pathDest
</pre></div></div>
</li>
</ul>
</li>
<li>
<p><code>viewfs://clusterX-webhdfs/foo/bar</code></p>
<ul>
<li>This is a URI for accessing files via the WebHDFS file system.</li>
</ul>
</li>
<li>
<p><code>http://namenodeClusterX:http_port/webhdfs/v1/foo/bar</code> and <code>http://proxyClusterX:http_port/foo/bar</code></p>
<ul>
<li>These are HTTP URLs for accessing files via the <a href="./WebHDFS.html">WebHDFS REST API</a> and the HDFS proxy, respectively. Note that they are the same as before.</li>
</ul>
</li>
</ol></section><section>
<h3><a name="Pathname_Usage_Best_Practices"></a>Pathname Usage Best Practices</h3>
<p>When one is within a cluster, it is recommended to use the pathname of type (1) above instead of a fully qualified URI like (2). Furthermore, applications should not rely on knowledge of the mount points by using a path like <code>hdfs://namenodeContainingUserDirs:port/joe/foo/bar</code> to refer to a file on a particular namenode. One should use <code>/user/joe/foo/bar</code> instead.</p></section><section>
<h3><a name="Renaming_Pathnames_Across_Namespaces"></a>Renaming Pathnames Across Namespaces</h3>
<p>Recall that one cannot rename files or directories across namenodes or clusters in the old world. The same is true in the new world, but with an additional twist. For example, in the old world one can perform the command below.</p>
<div class="source">
<div class="source">
<pre>rename /user/joe/myStuff /data/foo/bar
</pre></div></div>
<p>This will NOT work in the new world if <code>/user</code> and <code>/data</code> are actually stored on different namenodes within a cluster.</p></section></section><section>
<h2><a name="Multi-Filesystem_I.2F0_with_Nfly_Mount_Points"></a>Multi-Filesystem I/O with Nfly Mount Points</h2>
<p>HDFS and other distributed filesystems provide data resilience via some sort of redundancy, such as block replication or more sophisticated distributed encoding. However, modern setups may comprise multiple Hadoop clusters and enterprise filers, hosted on and off premise. Nfly mount points make it possible for a single logical file to be synchronously replicated by multiple filesystems. It is designed for relatively small files, up to about a gigabyte. In general, throughput is bounded by the performance of a single core and a single network link, since the logic resides in a single client JVM using ViewFs, such as FsShell or a MapReduce task.</p><section>
<h3><a name="Basic_Configuration"></a>Basic Configuration</h3>
<p>Consider the following example to understand the basic configuration of Nfly. Suppose we want to keep the directory <code>ads</code> replicated on three filesystems represented by URIs: <code>uri1</code>, <code>uri2</code> and <code>uri3</code>.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.global.linkNfly../ads&lt;/name&gt;
&lt;value&gt;uri1,uri2,uri3&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Note the two consecutive dots (<code>..</code>) in the property name. They arise because the settings for advanced tweaking of the mount point, which we will show in subsequent sections, are empty here. The property value is a comma-separated list of URIs.</p>
<p>URIs may point to different clusters in different regions, such as <code>hdfs://datacenter-east/ads</code>, <code>s3a://models-us-west/ads</code>, <code>hdfs://datacenter-west/ads</code>, or, in the simplest case, to different directories under the same filesystem, e.g., <code>file:/tmp/ads1</code>, <code>file:/tmp/ads2</code>, <code>file:/tmp/ads3</code>.</p>
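<p>As an illustration, the simplest case could be configured as sketched below. This only substitutes the three local example directories into the property shown earlier; no other names are introduced.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.global.linkNfly../ads&lt;/name&gt;
&lt;value&gt;file:/tmp/ads1,file:/tmp/ads2,file:/tmp/ads3&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>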
<p>All <i>modifications</i> performed under the global path <code>viewfs://global/ads</code> are propagated to all destination URIs if the underlying system is available.</p>
<p>For instance, if we create a file via the Hadoop shell:</p>
<div class="source">
<div class="source">
<pre>hadoop fs -touchz viewfs://global/ads/z1
</pre></div></div>
<p>We will find it on the local filesystem under every target directory in the latter (local directories) configuration:</p>
<div class="source">
<div class="source">
<pre>ls -al /tmp/ads*/z1
-rw-r--r-- 1 user wheel 0 Mar 11 12:17 /tmp/ads1/z1
-rw-r--r-- 1 user wheel 0 Mar 11 12:17 /tmp/ads2/z1
-rw-r--r-- 1 user wheel 0 Mar 11 12:17 /tmp/ads3/z1
</pre></div></div>
<p>A read from the global path is processed by the first filesystem that does not result in an exception. The order in which filesystems are accessed depends on whether they are available at the moment and whether a topological order exists.</p></section><section>
<h3><a name="Advanced_Configuration"></a>Advanced Configuration</h3>
<p>Mount points <code>linkNfly</code> can be further configured using parameters passed as a comma-separated list of key=value pairs. The following parameters are currently supported.</p>
<p><code>minReplication=int</code> determines the minimum number of destinations that have to process a write modification without exceptions; if fewer succeed, the Nfly write fails. It is a configuration error to set minReplication higher than the number of target URIs. The default is 2.</p>
<p>If minReplication is lower than the number of target URIs, some target URIs may be missing the latest writes. This can be compensated for by employing more expensive read operations, controlled by the following settings.</p>
<p><code>readMostRecent=boolean</code>, if set to <code>true</code>, causes the Nfly client to check the path under all target URIs instead of just the first one based on the topology order. Among all the replicas available at the moment, the one with the most recent modification time is processed.</p>
<p><code>repairOnRead=boolean</code>, if set to <code>true</code>, causes Nfly to copy the most recent replica to stale targets so that subsequent reads can again be served cheaply from the closest replica.</p></section><section>
<h3><a name="Network_Topology"></a>Network Topology</h3>
<p>Nfly seeks to satisfy reads from the &#x201c;closest&#x201d; target URI.</p>
<p>To this end, Nfly extends the notion of <a href="../hadoop-common/RackAwareness.html">Rack Awareness</a> to the authorities of target URIs.</p>
<p>Nfly applies NetworkTopology to resolve the authorities of the URIs. Most commonly, a script-based mapping is used in a heterogeneous setup. We could have a script providing the following topology mapping:</p>
<table border="0" class="bodyTable">
<thead>
<tr class="a">
<th> URI </th>
<th> Topology </th></tr>
</thead><tbody>
<tr class="b">
<td> <code>hdfs://datacenter-east/ads</code> </td>
<td> /us-east/onpremise-hdfs </td></tr>
<tr class="a">
<td> <code>s3a://models-us-west/ads</code> </td>
<td> /us-west/aws </td></tr>
<tr class="b">
<td> <code>hdfs://datacenter-west/ads</code> </td>
<td> /us-west/onpremise-hdfs </td></tr>
</tbody>
</table>
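<p>Script-based mapping is normally enabled through the <code>net.topology.script.file.name</code> property in <code>core-site.xml</code>. The fragment below is a minimal sketch, assuming Nfly picks up the same topology settings as the rest of the client configuration. The script path is hypothetical, and the script itself would have to print one topology path (as in the table above) for each name it is given.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;net.topology.script.file.name&lt;/name&gt;
&lt;!-- hypothetical path; point this at your own mapping script --&gt;
&lt;value&gt;/etc/hadoop/conf/nfly-topology.sh&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>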
<p>If a target URI does not have an authority part, as in <code>file:/</code>, Nfly injects the client&#x2019;s local node name.</p></section><section>
<h3><a name="Example_Nfly_Configuration"></a>Example Nfly Configuration</h3>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.global.linkNfly.minReplication=3,readMostRecent=true,repairOnRead=false./ads&lt;/name&gt;
&lt;value&gt;hdfs://datacenter-east/ads,hdfs://datacenter-west/ads,s3a://models-us-west/ads,file:/tmp/ads&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="How_Nfly_File_Creation_works"></a>How Nfly File Creation works</h3>
<div class="source">
<div class="source">
<pre>FileSystem fs = FileSystem.get(&quot;viewfs://global/&quot;, ...);
FSDataOutputStream out = fs.create(&quot;viewfs://global/ads/f1&quot;);
out.write(...);
out.close();
</pre></div></div>
<p>The code above would result in the following execution.</p>
<ol style="list-style-type: decimal">
<li>
<p>Create an invisible file <code>_nfly_tmp_f1</code> under each target URI, i.e., <code>hdfs://datacenter-east/ads/_nfly_tmp_f1</code>, <code>hdfs://datacenter-west/ads/_nfly_tmp_f1</code>, etc. This is done by calling <code>create</code> on the underlying filesystems; the call returns an <code>FSDataOutputStream</code> object <code>out</code> that wraps all four output streams.</p>
</li>
<li>
<p>Thus each subsequent write on <code>out</code> can be forwarded to each wrapped stream.</p>
</li>
<li>
<p>On <code>out.close</code> all streams are closed, and the files are renamed from <code>_nfly_tmp_f1</code> to <code>f1</code>. All files receive the same <i>modification time</i>, corresponding to the client system time as of the beginning of this step.</p>
</li>
<li>
<p>If at least <code>minReplication</code> destinations have gone through steps 1-3 without failures, Nfly considers the transaction logically committed; otherwise it tries to clean up the temporary files in a best-effort attempt.</p>
</li>
</ol>
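<p>The four steps above can be sketched in plain Java against local directories. This is an illustration of the protocol only, not the Nfly implementation: the class name <code>NflyCreateSketch</code> and the method <code>create</code> are hypothetical, <code>java.nio.file</code> stands in for the Hadoop <code>FileSystem</code> API, and error handling is reduced to counting the targets that completed every step.</p>
<div class="source">
<div class="source">
<pre>import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.List;

class NflyCreateSketch {
    /** Writes data under every target directory; returns the number of committed targets. */
    static int create(List&lt;Path&gt; targets, String name, byte[] data, int minReplication)
            throws IOException {
        String tmpName = &quot;_nfly_tmp_&quot; + name;
        OutputStream[] outs = new OutputStream[targets.size()];
        // Step 1: create the temporary file under each target.
        for (int i = 0; i &lt; outs.length; i++) {
            try {
                outs[i] = Files.newOutputStream(targets.get(i).resolve(tmpName));
            } catch (IOException e) {
                outs[i] = null; // this target takes no further part
            }
        }
        // Step 2: forward the write to every stream that is still healthy.
        for (int i = 0; i &lt; outs.length; i++) {
            if (outs[i] == null) continue;
            try {
                outs[i].write(data);
            } catch (IOException e) {
                try { outs[i].close(); } catch (IOException ignored) { }
                outs[i] = null;
            }
        }
        // Step 3: close, rename, and stamp one shared modification time.
        FileTime commitTime = FileTime.fromMillis(System.currentTimeMillis());
        int committed = 0;
        for (int i = 0; i &lt; outs.length; i++) {
            if (outs[i] == null) continue;
            try {
                outs[i].close();
                Path dst = targets.get(i).resolve(name);
                Files.move(targets.get(i).resolve(tmpName), dst,
                        StandardCopyOption.REPLACE_EXISTING);
                Files.setLastModifiedTime(dst, commitTime);
                committed++;
            } catch (IOException e) {
                // counted as a failed target
            }
        }
        // Step 4: commit only if enough targets succeeded; otherwise clean up.
        if (committed &lt; minReplication) {
            for (Path dir : targets) {
                try { Files.deleteIfExists(dir.resolve(tmpName)); } catch (IOException ignored) { }
            }
            throw new IOException(committed + &quot; targets committed, &quot;
                    + minReplication + &quot; required&quot;);
        }
        return committed;
    }

    public static void main(String[] args) throws IOException {
        List&lt;Path&gt; targets = Arrays.asList(
                Files.createTempDirectory(&quot;ads1&quot;),
                Files.createTempDirectory(&quot;ads2&quot;),
                Files.createTempDirectory(&quot;ads3&quot;));
        byte[] data = &quot;hello&quot;.getBytes(StandardCharsets.UTF_8);
        System.out.println(create(targets, &quot;f1&quot;, data, 2)); // prints 3
    }
}
</pre></div></div>
<p>In this sketch a target that fails at any step simply drops out of the later steps, which mirrors how the commit decision depends only on the number of targets that finished steps 1-3.</p>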
<p>Note that because step 4 is best-effort and the client JVM could crash and never resume its work, it is a good idea to provision some sort of cron job to purge such <code>_nfly_tmp</code> files.</p></section><section>
<h3><a name="FAQ"></a>FAQ</h3>
<ol style="list-style-type: decimal">
<li>
<p><b>As I move from non-federated world to the federated world, I will have to keep track of namenodes for different volumes; how do I do that?</b></p>
<p>No, you won&#x2019;t. See the examples above &#x2013; you are either using a relative name and taking advantage of the default file system, or changing your path from <code>hdfs://namenodeClusterX/foo/bar</code> to <code>viewfs://clusterX/foo/bar</code>.</p>
</li>
<li>
<p><b>What happens if Operations moves some files from one namenode to another namenode within a cluster?</b></p>
<p>Operations may move files from one namenode to another in order to deal with storage capacity issues. They will do this in a way that avoids breaking applications. Let&#x2019;s take some examples.</p>
<ul>
<li>
<p>Example 1: <code>/user</code> and <code>/data</code> were on one namenode and later need to be on separate namenodes to deal with capacity issues. Operations would have created separate mount points for <code>/user</code> and <code>/data</code> from the start. Prior to the change, the mounts for <code>/user</code> and <code>/data</code> would have pointed to the same namenode, say <code>namenodeContainingUserAndData</code>. Operations will update the mount tables so that the mount points are changed to <code>namenodeContainingUser</code> and <code>namenodeContainingData</code>, respectively.</p>
</li>
<li>
<p>Example 2: All projects fit on one namenode at first, but later they need two or more namenodes. ViewFs allows mounts like <code>/project/foo</code> and <code>/project/bar</code>. This allows the mount tables to be updated to point to the corresponding namenodes.</p>
</li>
</ul>
</li>
<li>
<p><b>Is the mount table in each</b> <code>core-site.xml</code> <b>or in a separate file of its own?</b></p>
<p>The plan is to keep the mount tables in separate files and have the <code>core-site.xml</code> <a class="externalLink" href="http://www.w3.org/2001/XInclude">xinclude</a> them. While one can keep these files on each machine locally, it is better to use HTTP to access them from a central location.</p>
</li>
<li>
<p><b>Should the configuration have the mount table definitions for only one cluster or all clusters?</b></p>
<p>The configuration should have the mount definitions for all clusters since one needs to have access to data in other clusters such as with distcp.</p>
</li>
<li>
<p><b>When is the mount table actually read given that Operations may change a mount table over time?</b></p>
<p>The mount table is read when the job is submitted to the cluster. The <code>XInclude</code> in <code>core-site.xml</code> is expanded at job submission time. This means that if the mount table is changed, the jobs need to be resubmitted. For this reason, we want to implement merge-mount, which will greatly reduce the need to change mount tables. Further, in the future we would like to read the mount tables via another mechanism that is initialized at job start time.</p>
</li>
<li>
<p><b>Will JobTracker (or YARN&#x2019;s Resource Manager) itself use the ViewFs?</b></p>
<p>No, it does not need to. Neither does the NodeManager.</p>
</li>
<li>
<p><b>Does ViewFs allow only mounts at the top level?</b></p>
<p>No; it is more general. For example, one can mount <code>/user/joe</code> and <code>/user/jane</code>. In this case, an internal read-only directory is created for <code>/user</code> in the mount table. All operations on <code>/user</code> are valid except that <code>/user</code> is read-only.</p>
</li>
<li>
<p><b>An application works across the clusters and needs to persistently store file paths. Which paths should it store?</b></p>
<p>Store <code>viewfs://cluster/path</code> type path names, the same as the application uses when running. This insulates you from movement of data between namenodes inside a cluster, as long as operations do the moves in a transparent fashion. It does not insulate you if data gets moved from one cluster to another; the older (pre-federation) world did not protect you from such data movements across clusters anyway.</p>
</li>
<li>
<p><b>What about delegation tokens?</b></p>
<p>Delegation tokens for the cluster to which you are submitting the job (including all mounted volumes for that cluster&#x2019;s mount table), and for input and output paths to your map-reduce job (including all volumes mounted via mount tables for the specified input and output paths) are all handled automatically. In addition, there is a way to add further delegation tokens to the base cluster configuration for special circumstances.</p>
</li>
</ol></section></section><section>
<h2><a name="Don.E2.80.99t_want_to_change_scheme_or_difficult_to_copy_mount-table_configurations_to_all_clients.3F"></a>Don&#x2019;t want to change scheme or difficult to copy mount-table configurations to all clients?</h2>
<p>Please refer to the <a href="./ViewFsOverloadScheme.html">View File System Overload Scheme Guide</a>.</p></section><section>
<h2><a name="Regex_Pattern_Based_Mount_Points"></a>Regex Pattern Based Mount Points</h2>
<p>The view file system mount points described so far form a key-value based mapping. This is unfriendly to use cases where the mapping configuration could be abstracted into rules. For example, users may want to provide a GCS bucket per user, and there might be thousands of users in total. The key-value based approach does not work well here for several reasons:</p>
<ol style="list-style-type: decimal">
<li>
<p>The mount table is used by FileSystem clients. There is a cost to distributing the configuration to all clients, and it should be avoided if possible. The <a href="./ViewFsOverloadScheme.html">View File System Overload Scheme Guide</a> can help with distribution through central mount table management, but the mount table still has to be updated on every change. Most such changes could be avoided with a rule-based mount table.</p>
</li>
<li>
<p>The client has to understand all the key-value pairs in the mount table. This is not ideal when the mount table grows to thousands of entries; for example, thousands of file systems might be initialized even if a user only needs one. The configuration itself also becomes bloated at scale.</p>
</li>
</ol><section>
<h3><a name="Understand_the_Difference"></a>Understand the Difference</h3>
<p>In the key-value based mount table, the view file system treats every mount point as a partition. Several file system APIs lead to operations on all partitions. For example, consider an HDFS cluster with multiple mount points, where a user runs <code>hadoop fs -put file viewfs://hdfs.namenode.apache.org/tmp/</code> to copy data from local disk to the HDFS cluster. The command triggers ViewFileSystem to call the <code>setVerifyChecksum()</code> method, which initializes the file system for every mount point. For a regex-based mount table entry, the target path is not known until a path is parsed against the rule, so regex-based entries are ignored in such cases. The file system (ChRootedFileSystem) is created on access instead, while the underlying file system is cached in the inner cache of ViewFileSystem.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.viewfs.rename.strategy&lt;/name&gt;
&lt;value&gt;SAME_FILESYSTEM_ACROSS_MOUNTPOINT&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="Basic_Regex_Link_Mapping_Config"></a>Basic Regex Link Mapping Config</h3>
<p>Here&#x2019;s an example of a basic regex mount point config. <code>${username}</code> refers to the named capture group in the Java regex.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.hadoop-nn.linkRegx.^/(?&lt;username&gt;\\w+)&lt;/name&gt;
&lt;value&gt;gs://${username}.hadoop.apache.org/&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Parsing example.</p>
<div class="source">
<div class="source">
<pre>viewfs://hadoop-nn/user1/dir1 =&gt; gs://user1.hadoop.apache.org/dir1
viewfs://hadoop-nn/user2 =&gt; gs://user2.hadoop.apache.org/
</pre></div></div>
<p>The format of the source key is:</p>
<div class="source">
<div class="source">
<pre>fs.viewfs.mounttable.${VIEWNAME}.linkRegx.${REGEX_STR}
</pre></div></div>
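<p>The parsing example can be reproduced with <code>java.util.regex</code> alone. The sketch below is a simplification and not the ViewFs resolver: the class name <code>RegexMountSketch</code> and the method <code>resolve</code> are hypothetical, and it assumes that the anchor precedes the leading slash (<code>^/</code>) and that whatever follows the matched prefix is appended to the resolved target.</p>
<div class="source">
<div class="source">
<pre>import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexMountSketch {
    // Source regex and target template from the example above; the anchor is
    // assumed to come before the leading slash so that the pattern can match.
    static final Pattern SRC = Pattern.compile(&quot;^/(?&lt;username&gt;\\w+)&quot;);
    static final String TARGET = &quot;gs://${username}.hadoop.apache.org/&quot;;

    /** Returns the resolved target URI, or null if the path does not match. */
    static String resolve(String path) {
        Matcher m = SRC.matcher(path);
        if (!m.find()) {
            return null;
        }
        // Substitute the named capture group into the target template.
        String base = TARGET.replace(&quot;${username}&quot;, m.group(&quot;username&quot;));
        // Append the part after the matched prefix, avoiding a double slash.
        String rest = path.substring(m.end());
        if (rest.startsWith(&quot;/&quot;)) {
            rest = rest.substring(1);
        }
        return base + rest;
    }

    public static void main(String[] args) {
        System.out.println(resolve(&quot;/user1/dir1&quot;)); // gs://user1.hadoop.apache.org/dir1
        System.out.println(resolve(&quot;/user2&quot;));      // gs://user2.hadoop.apache.org/
    }
}
</pre></div></div>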
</section><section>
<h3><a name="Regex_Link_Mapping_With_Interceptors"></a>Regex Link Mapping With Interceptors</h3>
<p>An interceptor is a mechanism for modifying the source or target during the resolution process. It is optional and can be used to satisfy use cases such as replacing a specific character or word. Interceptors only work for regex mount points. RegexMountPointResolvedDstPathReplaceInterceptor is currently the only built-in interceptor.</p>
<p>Here&#x2019;s an example regex mount point entry with RegexMountPointResolvedDstPathReplaceInterceptor set.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.hadoop-nn.linkRegx.replaceresolveddstpath:_:-#.^/(?&lt;username&gt;\\w+)&lt;/name&gt;
&lt;value&gt;gs://${username}.hadoop.apache.org/&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The <code>replaceresolveddstpath:_:-</code> part is an interceptor setting. &#x201c;replaceresolveddstpath&#x201d; is the interceptor type, &#x201c;_&#x201d; is the string to replace, and &#x201c;-&#x201d; is the replacement string.</p>
<p>Parsing example.</p>
<div class="source">
<div class="source">
<pre>viewfs://hadoop-nn/user_ad/dir1 =&gt; gs://user-ad.hadoop.apache.org/dir1
viewfs://hadoop-nn/user_ad_click =&gt; gs://user-ad-click.hadoop.apache.org/
</pre></div></div>
<p>The source key takes one of the following formats:</p>
<div class="source">
<div class="source">
<pre>fs.viewfs.mounttable.${VIEWNAME}.linkRegx.${REGEX_STR}
fs.viewfs.mounttable.${VIEWNAME}.linkRegx.${interceptorSettings}#.${srcRegex}
</pre></div></div>
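<p>The effect of the interceptor setting can be sketched as follows. This is an illustration under stated assumptions rather than the built-in interceptor: the class name <code>InterceptorSketch</code> and its methods are hypothetical, the replacement is done as a literal string replace, and it is applied only to the resolved target, not to the remaining path.</p>
<div class="source">
<div class="source">
<pre>import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InterceptorSketch {
    // Same source regex and target template as in the example entry above,
    // with the anchor assumed to come before the leading slash.
    static final Pattern SRC = Pattern.compile(&quot;^/(?&lt;username&gt;\\w+)&quot;);
    static final String TARGET = &quot;gs://${username}.hadoop.apache.org/&quot;;

    /** Splits &quot;type:from:to&quot; into its three parts. */
    static String[] parseSetting(String setting) {
        return setting.split(&quot;:&quot;, 3);
    }

    /** Resolves a path and applies the interceptor setting to the resolved target. */
    static String resolve(String path, String setting) {
        Matcher m = SRC.matcher(path);
        if (!m.find()) {
            return null;
        }
        String resolved = TARGET.replace(&quot;${username}&quot;, m.group(&quot;username&quot;));
        String[] s = parseSetting(setting);
        if (s[0].equals(&quot;replaceresolveddstpath&quot;)) {
            // Literal replacement in the resolved destination only (assumption).
            resolved = resolved.replace(s[1], s[2]);
        }
        String rest = path.substring(m.end());
        if (rest.startsWith(&quot;/&quot;)) {
            rest = rest.substring(1);
        }
        return resolved + rest;
    }

    public static void main(String[] args) {
        String setting = &quot;replaceresolveddstpath:_:-&quot;;
        System.out.println(resolve(&quot;/user_ad/dir1&quot;, setting));  // gs://user-ad.hadoop.apache.org/dir1
        System.out.println(resolve(&quot;/user_ad_click&quot;, setting)); // gs://user-ad-click.hadoop.apache.org/
    }
}
</pre></div></div>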
</section></section><section>
<h2><a name="Appendix:_A_Mount_Table_Configuration_Example"></a>Appendix: A Mount Table Configuration Example</h2>
<p>Generally, users do not have to define mount tables or the <code>core-site.xml</code> to use the mount table. This is done by operations and the correct configuration is set on the right gateway machines as is done for <code>core-site.xml</code> today.</p>
<p>The mount tables can be described in <code>core-site.xml</code> but it is better to use indirection in <code>core-site.xml</code> to reference a separate configuration file, say <code>mountTable.xml</code>. Add the following configuration element to <code>core-site.xml</code> for referencing <code>mountTable.xml</code>:</p>
<div class="source">
<div class="source">
<pre>&lt;configuration xmlns:xi=&quot;http://www.w3.org/2001/XInclude&quot;&gt;
&lt;xi:include href=&quot;mountTable.xml&quot; /&gt;
&lt;/configuration&gt;
</pre></div></div>
<p>In the file <code>mountTable.xml</code>, there is a definition of the mount table &#x201c;clusterX&#x201d; for the hypothetical cluster that is a federation of the three namespace volumes managed by the following three namenodes:</p>
<ol style="list-style-type: decimal">
<li>nn1-clusterx.example.com:8020,</li>
<li>nn2-clusterx.example.com:8020, and</li>
<li>nn3-clusterx.example.com:8020.</li>
</ol>
<p>Here <code>/home</code> and <code>/tmp</code> are in the namespace managed by namenode nn1-clusterx.example.com:8020, and projects <code>/foo</code> and <code>/bar</code> are hosted on the other namenodes of the federated cluster. The home directory base path is set to <code>/home</code> so that each user can access their home directory using the getHomeDirectory() method defined in <a href="../../api/org/apache/hadoop/fs/FileSystem.html">FileSystem</a>/<a href="../../api/org/apache/hadoop/fs/FileContext.html">FileContext</a>.</p>
<div class="source">
<div class="source">
<pre>&lt;configuration&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.homedir&lt;/name&gt;
&lt;value&gt;/home&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./home&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clusterx.example.com:8020/home&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./tmp&lt;/name&gt;
&lt;value&gt;hdfs://nn1-clusterx.example.com:8020/tmp&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./projects/foo&lt;/name&gt;
&lt;value&gt;hdfs://nn2-clusterx.example.com:8020/projects/foo&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.viewfs.mounttable.clusterX.link./projects/bar&lt;/name&gt;
&lt;value&gt;hdfs://nn3-clusterx.example.com:8020/projects/bar&lt;/value&gt;
&lt;/property&gt;
&lt;/configuration&gt;
</pre></div></div></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>