<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
| Generated by Apache Maven Doxia at 2023-03-08
| Rendered using Apache Maven Stylus Skin 1.5
-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache Hadoop Azure support &#x2013; Hadoop Azure Support: Azure Blob Storage</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20230308" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body class="composite">
<div id="banner">
<a href="http://hadoop.apache.org/" id="bannerLeft">
<img src="http://hadoop.apache.org/images/hadoop-logo.jpg" alt="" />
</a>
<a href="http://www.apache.org/" id="bannerRight">
<img src="http://www.apache.org/images/asf_logo_wide.png" alt="" />
</a>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xright"> <a href="http://wiki.apache.org/hadoop" class="externalLink">Wiki</a>
|
<a href="https://gitbox.apache.org/repos/asf/hadoop.git" class="externalLink">git</a>
&nbsp;| Last Published: 2023-03-08
&nbsp;| Version: 3.4.0-SNAPSHOT
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>General</h5>
<ul>
<li class="none">
<a href="../index.html">Overview</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/SingleCluster.html">Single Node Setup</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/ClusterSetup.html">Cluster Setup</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/CommandsManual.html">Commands Reference</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/Compatibility.html">Compatibility Specification</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/DownstreamDev.html">Downstream Developer's Guide</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/AdminCompatibilityGuide.html">Admin Compatibility Guide</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/InterfaceClassification.html">Interface Classification</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/filesystem/index.html">FileSystem Specification</a>
</li>
</ul>
<h5>Common</h5>
<ul>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/CLIMiniCluster.html">CLI Mini Cluster</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/FairCallQueue.html">Fair Call Queue</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/NativeLibraries.html">Native Libraries</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/Superusers.html">Proxy User</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/RackAwareness.html">Rack Awareness</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/SecureMode.html">Secure Mode</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/ServiceLevelAuth.html">Service Level Authorization</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/HttpAuthentication.html">HTTP Authentication</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>
</li>
<li class="none">
<a href="../hadoop-kms/index.html">Hadoop KMS</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/Tracing.html">Tracing</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/UnixShellGuide.html">Unix Shell Guide</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/registry/index.html">Registry</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/AsyncProfilerServlet.html">Async Profiler</a>
</li>
</ul>
<h5>HDFS</h5>
<ul>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">Architecture</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">User Guide</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HDFSCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html">NameNode HA With QJM</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html">NameNode HA With NFS</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html">Observer NameNode</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/Federation.html">Federation</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ViewFs.html">ViewFs</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ViewFsOverloadScheme.html">ViewFsOverloadScheme</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">Snapshots</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html">Edits Viewer</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html">Image Viewer</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html">Permissions and HDFS</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsQuotaAdminGuide.html">Quotas and HDFS</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/LibHdfs.html">libhdfs (C API)</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/WebHDFS.html">WebHDFS (REST API)</a>
</li>
<li class="none">
<a href="../hadoop-hdfs-httpfs/index.html">HttpFS</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html">Short Circuit Local Reads</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html">Centralized Cache Management</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html">NFS Gateway</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html">Rolling Upgrade</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html">Extended Attributes</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html">Transparent Encryption</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html">Multihoming</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html">Storage Policies</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/MemoryStorage.html">Memory Storage Support</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/SLGUserGuide.html">Synthetic Load Generator</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html">Erasure Coding</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html">Disk Balancer</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html">Upgrade Domain</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html">DataNode Admin</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html">Router Federation</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html">Provided Storage</a>
</li>
</ul>
<h5>MapReduce</h5>
<ul>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html">Tutorial</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html">Compatibility with 1.x</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html">Encrypted Shuffle</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html">Pluggable Shuffle/Sort</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html">Distributed Cache Deploy</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/SharedCacheSupport.html">Support for YARN Shared Cache</a>
</li>
</ul>
<h5>MapReduce REST APIs</h5>
<ul>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html">MR Application Master</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html">MR History Server</a>
</li>
</ul>
<h5>YARN</h5>
<ul>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/YARN.html">Architecture</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/YarnCommands.html">Commands Reference</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html">Capacity Scheduler</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/FairScheduler.html">Fair Scheduler</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html">ResourceManager Restart</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html">ResourceManager HA</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/ResourceModel.html">Resource Model</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/NodeLabel.html">Node Labels</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/NodeAttributes.html">Node Attributes</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html">Web Application Proxy</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/TimelineServer.html">Timeline Server</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html">Timeline Service V.2</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html">Writing YARN Applications</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html">YARN Application Security</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/NodeManager.html">NodeManager</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/DockerContainers.html">Running Applications in Docker Containers</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/RuncContainers.html">Running Applications in runC Containers</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/NodeManagerCgroups.html">Using CGroups</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/SecureContainer.html">Secure Containers</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/ReservationSystem.html">Reservation System</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html">Graceful Decommission</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html">Opportunistic Containers</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/Federation.html">YARN Federation</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/SharedCache.html">Shared Cache</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/UsingGpus.html">Using GPU</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/UsingFPGA.html">Using FPGA</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html">Placement Constraints</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/YarnUI2.html">YARN UI2</a>
</li>
</ul>
<h5>YARN REST APIs</h5>
<ul>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html">Introduction</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html">Resource Manager</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html">Node Manager</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Timeline_Server_REST_API_v1">Timeline Server</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html#Timeline_Service_v.2_REST_API">Timeline Service V.2</a>
</li>
</ul>
<h5>YARN Service</h5>
<ul>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/Overview.html">Overview</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html">QuickStart</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/Concepts.html">Concepts</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html">Yarn Service API</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/ServiceDiscovery.html">Service Discovery</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-site/yarn-service/SystemServices.html">System Services</a>
</li>
</ul>
<h5>Hadoop Compatible File Systems</h5>
<ul>
<li class="none">
<a href="../hadoop-aliyun/tools/hadoop-aliyun/index.html">Aliyun OSS</a>
</li>
<li class="none">
<a href="../hadoop-aws/tools/hadoop-aws/index.html">Amazon S3</a>
</li>
<li class="none">
<a href="../hadoop-azure/index.html">Azure Blob Storage</a>
</li>
<li class="none">
<a href="../hadoop-azure-datalake/index.html">Azure Data Lake Storage</a>
</li>
<li class="none">
<a href="../hadoop-cos/cloud-storage/index.html">Tencent COS</a>
</li>
<li class="none">
<a href="../hadoop-huaweicloud/cloud-storage/index.html">Huaweicloud OBS</a>
</li>
</ul>
<h5>Auth</h5>
<ul>
<li class="none">
<a href="../hadoop-auth/index.html">Overview</a>
</li>
<li class="none">
<a href="../hadoop-auth/Examples.html">Examples</a>
</li>
<li class="none">
<a href="../hadoop-auth/Configuration.html">Configuration</a>
</li>
<li class="none">
<a href="../hadoop-auth/BuildingIt.html">Building</a>
</li>
</ul>
<h5>Tools</h5>
<ul>
<li class="none">
<a href="../hadoop-streaming/HadoopStreaming.html">Hadoop Streaming</a>
</li>
<li class="none">
<a href="../hadoop-archives/HadoopArchives.html">Hadoop Archives</a>
</li>
<li class="none">
<a href="../hadoop-archive-logs/HadoopArchiveLogs.html">Hadoop Archive Logs</a>
</li>
<li class="none">
<a href="../hadoop-distcp/DistCp.html">DistCp</a>
</li>
<li class="none">
<a href="../hadoop-federation-balance/HDFSFederationBalance.html">HDFS Federation Balance</a>
</li>
<li class="none">
<a href="../hadoop-gridmix/GridMix.html">GridMix</a>
</li>
<li class="none">
<a href="../hadoop-rumen/Rumen.html">Rumen</a>
</li>
<li class="none">
<a href="../hadoop-resourceestimator/ResourceEstimator.html">Resource Estimator Service</a>
</li>
<li class="none">
<a href="../hadoop-sls/SchedulerLoadSimulator.html">Scheduler Load Simulator</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/Benchmarking.html">Hadoop Benchmarking</a>
</li>
<li class="none">
<a href="../hadoop-dynamometer/Dynamometer.html">Dynamometer</a>
</li>
</ul>
<h5>Reference</h5>
<ul>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/release/">Changelog and Release Notes</a>
</li>
<li class="none">
<a href="../api/index.html">Java API docs</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/UnixShellAPI.html">Unix Shell API</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/Metrics.html">Metrics</a>
</li>
</ul>
<h5>Configuration</h5>
<ul>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/core-default.xml">core-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs/hdfs-default.xml">hdfs-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-hdfs-rbf/hdfs-rbf-default.xml">hdfs-rbf-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml">mapred-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-yarn/hadoop-yarn-common/yarn-default.xml">yarn-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-kms/kms-default.html">kms-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-hdfs-httpfs/httpfs-default.html">httpfs-default.xml</a>
</li>
<li class="none">
<a href="../hadoop-project-dist/hadoop-common/DeprecatedProperties.html">Deprecated Properties</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<h1>Hadoop Azure Support: Azure Blob Storage</h1>
<ul>
<li><a href="#Introduction">Introduction</a></li>
<li><a href="#Features">Features</a></li>
<li><a href="#Limitations">Limitations</a></li>
<li><a href="#Usage">Usage</a>
<ul>
<li><a href="#Concepts">Concepts</a></li>
<li><a href="#Configuring_Credentials">Configuring Credentials</a>
<ul>
<li><a href="#Protecting_the_Azure_Credentials_for_WASB_with_Credential_Providers">Protecting the Azure Credentials for WASB with Credential Providers</a></li>
<li><a href="#Protecting_the_Azure_Credentials_for_WASB_within_an_Encrypted_File">Protecting the Azure Credentials for WASB within an Encrypted File</a></li></ul></li>
<li><a href="#Block_Blob_with_Compaction_Support_and_Configuration">Block Blob with Compaction Support and Configuration</a></li>
<li><a href="#Page_Blob_Support_and_Configuration">Page Blob Support and Configuration</a></li>
<li><a href="#Custom_User-Agent">Custom User-Agent</a></li>
<li><a href="#Atomic_Folder_Rename">Atomic Folder Rename</a></li>
<li><a href="#Accessing_wasb_URLs">Accessing wasb URLs</a></li>
<li><a href="#Append_API_Support_and_Configuration">Append API Support and Configuration</a></li>
<li><a href="#Multithread_Support">Multithread Support</a></li>
<li><a href="#WASB_Secure_mode_and_configuration">WASB Secure mode and configuration</a></li>
<li><a href="#Authorization_Support_in_WASB">Authorization Support in WASB</a></li>
<li><a href="#Delegation_token_support_in_WASB">Delegation token support in WASB</a></li>
<li><a href="#chown_behaviour_when_authorization_is_enabled_in_WASB">chown behaviour when authorization is enabled in WASB</a></li>
<li><a href="#chmod_behaviour_when_authorization_is_enabled_in_WASB">chmod behaviour when authorization is enabled in WASB</a></li>
<li><a href="#Performance_optimization_configurations">Performance optimization configurations</a></li></ul></li>
<li><a href="#Further_Reading">Further Reading</a></li></ul>
<p>See also:</p>
<ul>
<li><a href="./abfs.html">ABFS</a></li>
<li><a href="./testing_azure.html">Testing</a></li>
</ul><section>
<h2><a name="Introduction"></a>Introduction</h2>
<p>The <code>hadoop-azure</code> module provides support for integration with <a class="externalLink" href="http://azure.microsoft.com/en-us/documentation/services/storage/">Azure Blob Storage</a>. The built jar file, named <code>hadoop-azure.jar</code>, also declares transitive dependencies on the additional artifacts it requires, notably the <a class="externalLink" href="https://github.com/Azure/azure-storage-java">Azure Storage SDK for Java</a>.</p>
<p>To make it part of Apache Hadoop&#x2019;s default classpath, simply make sure that <code>HADOOP_OPTIONAL_TOOLS</code> in <code>hadoop-env.sh</code> has <code>hadoop-azure</code> in the list. Example:</p>
<div class="source">
<div class="source">
<pre>export HADOOP_OPTIONAL_TOOLS=&quot;hadoop-azure,hadoop-azure-datalake&quot;
</pre></div></div>
</section><section>
<h2><a name="Features"></a>Features</h2>
<ul>
<li>Read and write data stored in an Azure Blob Storage account.</li>
<li>Present a hierarchical file system view by implementing the standard Hadoop <a href="../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a> interface.</li>
<li>Supports configuration of multiple Azure Blob Storage accounts.</li>
<li>Supports both block blobs (suitable for most use cases, such as MapReduce) and page blobs (suitable for continuous write use cases, such as an HBase write-ahead log).</li>
<li>Reference file system paths using URLs with the <code>wasb</code> scheme.</li>
<li>Also reference file system paths using URLs with the <code>wasbs</code> scheme for SSL encrypted access.</li>
<li>Can act as a source of data in a MapReduce job, or a sink.</li>
<li>Tested on both Linux and Windows.</li>
<li>Tested at scale.</li>
</ul></section><section>
<h2><a name="Limitations"></a>Limitations</h2>
<ul>
<li>File owner and group are persisted, but the permissions model is not enforced. Authorization occurs at the level of the entire Azure Blob Storage account.</li>
<li>File last access time is not tracked.</li>
</ul></section><section>
<h2><a name="Usage"></a>Usage</h2><section>
<h3><a name="Concepts"></a>Concepts</h3>
<p>The Azure Blob Storage data model presents 3 core concepts:</p>
<ul>
<li><b>Storage Account</b>: All access is done through a storage account.</li>
<li><b>Container</b>: A container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.</li>
<li><b>Blob</b>: A file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.</li>
</ul></section><section>
<h3><a name="Configuring_Credentials"></a>Configuring Credentials</h3>
<p>Usage of Azure Blob Storage requires configuration of credentials. Typically this is set in core-site.xml. The configuration property name is of the form <code>fs.azure.account.key.&lt;account name&gt;.blob.core.windows.net</code> and the value is the access key. <b>The access key is a secret that protects access to your storage account. Do not share the access key (or the core-site.xml file) with an untrusted party.</b></p>
<p>For example:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.account.key.youraccount.blob.core.windows.net&lt;/name&gt;
&lt;value&gt;YOUR ACCESS KEY&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>In many Hadoop clusters, the core-site.xml file is world-readable. It is possible to protect the access key within a credential provider as well. This provides an encrypted file format along with protection with file permissions.</p><section>
<h4><a name="Protecting_the_Azure_Credentials_for_WASB_with_Credential_Providers"></a>Protecting the Azure Credentials for WASB with Credential Providers</h4>
<p>To protect these credentials from prying eyes, it is recommended that you use the credential provider framework to securely store them and access them through configuration. The following describes its use for Azure credentials in the WASB FileSystem.</p>
<p>For additional reading on the credential provider API see: <a href="../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html">Credential Provider API</a>.</p><section>
<h5><a name="End_to_End_Steps_for_Distcp_and_WASB_with_Credential_Providers"></a>End to End Steps for Distcp and WASB with Credential Providers</h5><section>
<h6><a name="provision"></a>provision</h6>
<div class="source">
<div class="source">
<pre>% hadoop credential create fs.azure.account.key.youraccount.blob.core.windows.net -value 123
-provider localjceks://file/home/lmccay/wasb.jceks
</pre></div></div>
</section><section>
<h6><a name="configure_core-site.xml_or_command_line_system_property"></a>configure core-site.xml or command line system property</h6>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;hadoop.security.credential.provider.path&lt;/name&gt;
&lt;value&gt;localjceks://file/home/lmccay/wasb.jceks&lt;/value&gt;
&lt;description&gt;Path to interrogate for protected credentials.&lt;/description&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h6><a name="distcp"></a>distcp</h6>
<div class="source">
<div class="source">
<pre>% hadoop distcp
[-D hadoop.security.credential.provider.path=localjceks://file/home/lmccay/wasb.jceks]
hdfs://hostname:9001/user/lmccay/007020615 wasb://yourcontainer@youraccount.blob.core.windows.net/testDir/
</pre></div></div>
<p>NOTE: You may optionally add the provider path property to the distcp command line instead of adding job-specific configuration to a generic core-site.xml. The square brackets above illustrate this capability.</p></section></section></section><section>
<h4><a name="Protecting_the_Azure_Credentials_for_WASB_within_an_Encrypted_File"></a>Protecting the Azure Credentials for WASB within an Encrypted File</h4>
<p>In addition to using the credential provider framework to protect your credentials, it&#x2019;s also possible to store the access key in encrypted form. An additional configuration property specifies an external program to be invoked by Hadoop processes to decrypt the key. The encrypted key value is passed to this external program as a command line argument:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.account.keyprovider.youraccount&lt;/name&gt;
&lt;value&gt;org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.azure.account.key.youraccount.blob.core.windows.net&lt;/name&gt;
&lt;value&gt;YOUR ENCRYPTED ACCESS KEY&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.azure.shellkeyprovider.script&lt;/name&gt;
&lt;value&gt;PATH TO DECRYPTION PROGRAM&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
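<p>The decryption program itself is site-specific. As an illustrative sketch only (assuming the provider invokes the script with the encrypted key as its argument and reads the decrypted key from standard output), it might look like:</p>
<div class="source">
<div class="source">
<pre>#!/usr/bin/env bash
# Illustrative only: decrypt the encrypted access key passed as $1 and
# print the plaintext key to stdout. Replace the openssl invocation with
# your site's actual decryption mechanism and key material.
printf '%s' &quot;$1&quot; | openssl enc -d -aes-256-cbc -pbkdf2 -a -pass file:/etc/hadoop/wasb.pass
</pre></div></div>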
</section></section><section>
<h3><a name="Block_Blob_with_Compaction_Support_and_Configuration"></a>Block Blob with Compaction Support and Configuration</h3>
<p>Block blobs are the default kind of blob and are good for most big-data use cases. However, block blobs have a strict limit of 50,000 blocks per blob. To avoid reaching this limit, WASB by default does not upload a new block to the service after every <code>hflush()</code> or <code>hsync()</code>.</p>
<p>For most cases, combining data from multiple <code>write()</code> calls into blocks of 4 MB is a good optimization. But in other cases, like HBase log files, every call to <code>hflush()</code> or <code>hsync()</code> must upload the data to the service.</p>
<p>Block blobs with compaction upload the data to the cloud service after every <code>hflush()</code>/<code>hsync()</code>. To mitigate the limit of 50,000 blocks, <code>hflush()</code>/<code>hsync()</code> runs a compaction process once the number of blocks in the blob exceeds 32,000.</p>
<p>Block compaction searches for and replaces a sequence of small blocks with one big block. That means block compaction has an associated cost: the small blocks are read back to the client and written again as one big block.</p>
<p>In order to have the files you create be block blobs with block compaction enabled, the client must set the configuration variable <code>fs.azure.block.blob.with.compaction.dir</code> to a comma-separated list of folder names.</p>
<p>For example:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.block.blob.with.compaction.dir&lt;/name&gt;
&lt;value&gt;/hbase/WALs,/data/myblobfiles&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="Page_Blob_Support_and_Configuration"></a>Page Blob Support and Configuration</h3>
<p>The Azure Blob Storage interface for Hadoop supports two kinds of blobs, <a class="externalLink" href="http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx">block blobs and page blobs</a>. Block blobs are the default kind of blob and are good for most big-data use cases, like input data for Hive, Pig, analytical map-reduce jobs etc. Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas block blobs can only be appended to 50,000 times before you run out of blocks and your writes will fail. That won&#x2019;t work for HBase logs, so page blob support was introduced to overcome this limitation.</p>
<p>Page blobs can be up to 1TB in size, larger than the maximum 200GB size for block blobs. You should stick to block blobs for most usage; page blobs are tested only in the context of HBase write-ahead logs.</p>
<p>In order to have the files you create be page blobs, you must set the configuration variable <code>fs.azure.page.blob.dir</code> to a comma-separated list of folder names.</p>
<p>For example:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.page.blob.dir&lt;/name&gt;
&lt;value&gt;/hbase/WALs,/hbase/oldWALs,/data/mypageblobfiles&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>You can set this simply to <code>/</code> to make all files page blobs.</p>
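<p>The page blob sizing options described next may also be set explicitly. A sketch with illustrative values (both in bytes, at least 128MB):</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.page.blob.size&lt;/name&gt;
&lt;value&gt;1073741824&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.azure.page.blob.extension.size&lt;/name&gt;
&lt;value&gt;268435456&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>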
<p>The configuration option <code>fs.azure.page.blob.size</code> is the default initial size for a page blob. It must be 128MB or greater, and no more than 1TB, specified as an integer number of bytes.</p>
<p>The configuration option <code>fs.azure.page.blob.extension.size</code> is the page blob extension size. This defines the amount to extend a page blob if it starts to get full. It must be 128MB or greater, specified as an integer number of bytes.</p></section><section>
<h3><a name="Custom_User-Agent"></a>Custom User-Agent</h3>
<p>WASB passes a User-Agent header to the Azure back-end. The default value contains the WASB version, the Java Runtime version, the Azure Client library version, and the value of the configuration option <code>fs.azure.user.agent.prefix</code>. A customized User-Agent header enables better troubleshooting and analysis by the Azure service.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.user.agent.prefix&lt;/name&gt;
&lt;value&gt;Identifier&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="Atomic_Folder_Rename"></a>Atomic Folder Rename</h3>
<p>Azure storage stores files as a flat key/value store without formal support for folders. The hadoop-azure file system layer simulates folders on top of Azure storage. By default, folder rename in the hadoop-azure file system layer is not atomic. That means that a failure during a folder rename could, for example, leave some folders in the original directory and some in the new one.</p>
<p>HBase depends on atomic folder rename. Hence, a configuration setting was introduced called <code>fs.azure.atomic.rename.dir</code> that allows you to specify a comma-separated list of directories to receive special treatment so that folder rename is made atomic. The default value of this setting is just <code>/hbase</code>. Redo will be applied to finish a folder rename that fails. A file <code>&lt;folderName&gt;-renamePending.json</code> may appear temporarily and is the record of the intention of the rename operation, to allow redo in event of a failure.</p>
<p>For example:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.atomic.rename.dir&lt;/name&gt;
&lt;value&gt;/hbase,/data&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="Accessing_wasb_URLs"></a>Accessing wasb URLs</h3>
<p>After credentials are configured in core-site.xml, any Hadoop component may reference files in that Azure Blob Storage account by using URLs of the following format:</p>
<div class="source">
<div class="source">
<pre>wasb[s]://&lt;containername&gt;@&lt;accountname&gt;.blob.core.windows.net/&lt;path&gt;
</pre></div></div>
<p>The schemes <code>wasb</code> and <code>wasbs</code> identify a URL on a file system backed by Azure Blob Storage. <code>wasb</code> utilizes unencrypted HTTP access for all interaction with the Azure Blob Storage API. <code>wasbs</code> utilizes SSL encrypted HTTPS access.</p>
<p>For example, the following <a href="../hadoop-project-dist/hadoop-common/FileSystemShell.html">FileSystem Shell</a> commands demonstrate access to a storage account named <code>youraccount</code> and a container named <code>yourcontainer</code>.</p>
<div class="source">
<div class="source">
<pre>% hadoop fs -mkdir wasb://yourcontainer@youraccount.blob.core.windows.net/testDir
% hadoop fs -put testFile wasb://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile
% hadoop fs -cat wasbs://yourcontainer@youraccount.blob.core.windows.net/testDir/testFile
test file content
</pre></div></div>
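<p>The account and container above can also be set as the cluster&#x2019;s default file system; a sketch in <code>core-site.xml</code> (names reused from the example above):</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.defaultFS&lt;/name&gt;
&lt;value&gt;wasbs://yourcontainer@youraccount.blob.core.windows.net&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>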
<p>It&#x2019;s also possible to configure <code>fs.defaultFS</code> to use a <code>wasb</code> or <code>wasbs</code> URL. This causes all bare paths, such as <code>/testDir/testFile</code>, to resolve automatically to that file system.</p></section><section>
<h3><a name="Append_API_Support_and_Configuration"></a>Append API Support and Configuration</h3>
<p>The Azure Blob Storage interface for Hadoop has optional support for the Append API for a single writer, enabled by setting the configuration <code>fs.azure.enable.append.support</code> to true.</p>
<p>For example:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.enable.append.support&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Note that append support in the Azure Blob Storage interface DIFFERS FROM HDFS SEMANTICS. Append support does not enforce a single writer internally, but requires applications to guarantee this semantic. It becomes the responsibility of the application either to ensure single-threaded handling for a particular file path, or to rely on some external locking mechanism of its own. Failure to do so will result in unexpected behavior.</p></section><section>
<h3><a name="Multithread_Support"></a>Multithread Support</h3>
<p>Rename and delete operations on directories with a large number of files and subdirectories are currently very slow, as these operations are performed serially, one blob at a time. These files and subfolders can instead be deleted or renamed in parallel. The following configurations can be used to enable threads to do the processing in parallel.</p>
<p>To enable 10 threads for the delete operation, set the value below. Set the configuration value to 0 or 1 to disable threads. The default behavior is threads disabled.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.delete.threads&lt;/name&gt;
&lt;value&gt;10&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>To enable 20 threads for the rename operation, set the value below. Set the configuration value to 0 or 1 to disable threads. The default behavior is threads disabled.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.rename.threads&lt;/name&gt;
&lt;value&gt;20&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="WASB_Secure_mode_and_configuration"></a>WASB Secure mode and configuration</h3>
<p>WASB can operate in secure mode, where the storage access keys required to communicate with Azure storage do not have to be in the same address space as the process using WASB. In this mode, all interactions with Azure storage are performed using SAS URIs. There are two sub-modes within secure mode: remote SAS key mode, where the SAS keys are generated by a remote process, and local mode, where SAS keys are generated within WASB. By default, the SAS key mode is expected to run in remote mode; however, for testing purposes the local mode can be enabled to generate SAS keys in the same process as WASB.</p>
<p>To enable secure mode, the following property needs to be set to true.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.secure.mode&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>To enable SAS key generation locally, the following property needs to be set to true.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.local.sas.key.mode&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>To use the remote SAS key generation mode, a comma-separated list of external REST services is expected to provide the required SAS keys. The following property can be used to provide the endpoints to use for remote SAS key generation:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.cred.service.urls&lt;/name&gt;
&lt;value&gt;{URL}&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The remote service is expected to provide support for two REST calls, <code>{URL}/GET_CONTAINER_SAS</code> and <code>{URL}/GET_RELATIVE_BLOB_SAS</code>, for generating container and relative blob SAS keys respectively. Example requests:</p>
<p><code>{URL}/GET_CONTAINER_SAS?storage_account=&lt;account_name&gt;&amp;container=&lt;container&gt;&amp;sas_expiry=&lt;expiry period&gt;&amp;delegation_token=&lt;delegation token&gt;</code> <code>{URL}/GET_RELATIVE_BLOB_SAS?storage_account=&lt;account_name&gt;&amp;container=&lt;container&gt;&amp;relative_path=&lt;relative path&gt;&amp;sas_expiry=&lt;expiry period&gt;&amp;delegation_token=&lt;delegation token&gt;</code></p>
<p>The service is expected to return a response in JSON format:</p>
<div class="source">
<div class="source">
<pre>{
&quot;responseCode&quot; : 0 or non-zero &lt;int&gt;,
&quot;responseMessage&quot; : relevant message on failure &lt;String&gt;,
&quot;sasKey&quot; : Requested SAS Key &lt;String&gt;
}
</pre></div></div>
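<p>Putting the pieces together, a remote SAS key mode setup might look like the following sketch (the service URLs are placeholders for deployment-specific endpoints; <code>fs.azure.local.sas.key.mode</code> is left at its default of false, so keys are fetched remotely):</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.secure.mode&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.azure.cred.service.urls&lt;/name&gt;
&lt;value&gt;http://cred1.example.com:8080,http://cred2.example.com:8080&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>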
</section><section>
<h3><a name="Authorization_Support_in_WASB"></a>Authorization Support in WASB</h3>
<p>Authorization support can be enabled in WASB using the following configuration:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.authorization&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The current implementation of authorization relies on the presence of an external service that can enforce the authorization. The service is expected to be running at the comma-separated URLs provided by the following configuration.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.authorization.remote.service.urls&lt;/name&gt;
&lt;value&gt;{URL}&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The remote service is expected to provide support for the following REST call: <code>{URL}/CHECK_AUTHORIZATION</code>. An example request: <code>{URL}/CHECK_AUTHORIZATION?wasb_absolute_path=&lt;absolute_path&gt;&amp;operation_type=&lt;operation type&gt;&amp;delegation_token=&lt;delegation token&gt;</code></p>
<p>The service is expected to return a response in JSON format:</p>
<div class="source">
<div class="source">
<pre>{
&quot;responseCode&quot; : 0 or non-zero &lt;int&gt;,
&quot;responseMessage&quot; : relevant message on failure &lt;String&gt;,
&quot;authorizationResult&quot; : true/false &lt;boolean&gt;
}
</pre></div></div>
</section><section>
<h3><a name="Delegation_token_support_in_WASB"></a>Delegation token support in WASB</h3>
<p>Delegation token support can be enabled in WASB using the following configuration:</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.enable.kerberos.support&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The current delegation token implementation relies on the presence of external service instances that can generate and manage delegation tokens. The service is expected to be running at the comma-separated URLs provided by the following configuration.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.delegation.token.service.urls&lt;/name&gt;
&lt;value&gt;{URL}&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The remote service is expected to provide support for the following REST calls: <code>{URL}?op=GETDELEGATIONTOKEN</code>, <code>{URL}?op=RENEWDELEGATIONTOKEN</code> and <code>{URL}?op=CANCELDELEGATIONTOKEN</code>. Example requests: <code>{URL}?op=GETDELEGATIONTOKEN&amp;renewer=&lt;renewer&gt;</code> <code>{URL}?op=RENEWDELEGATIONTOKEN&amp;token=&lt;delegation token&gt;</code> <code>{URL}?op=CANCELDELEGATIONTOKEN&amp;token=&lt;delegation token&gt;</code></p>
<p>The service is expected to return a response in JSON format for the GETDELEGATIONTOKEN request:</p>
<div class="source">
<div class="source">
<pre>{
&quot;Token&quot; : {
&quot;urlString&quot;: URL string of delegation token.
}
}
</pre></div></div>
</section><section>
<h3><a name="chown_behaviour_when_authorization_is_enabled_in_WASB"></a>chown behaviour when authorization is enabled in WASB</h3>
<p>When authorization is enabled, only the users listed in the following configuration are allowed to change the owning user of files/folders in WASB. The configuration value takes a comma-separated list of user names who are allowed to perform chown.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.chown.allowed.userlist&lt;/name&gt;
&lt;value&gt;user1,user2&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="chmod_behaviour_when_authorization_is_enabled_in_WASB"></a>chmod behaviour when authorization is enabled in WASB</h3>
<p>When authorization is enabled, only the owner and the users listed in the following configurations are allowed to change the permissions of files/folders in WASB. The configuration value takes a comma-separated list of user names who are allowed to perform chmod.</p>
<div class="source">
<div class="source">
<pre>&lt;property&gt;
&lt;name&gt;fs.azure.daemon.userlist&lt;/name&gt;
&lt;value&gt;user1,user2&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;fs.azure.chmod.allowed.userlist&lt;/name&gt;
&lt;value&gt;userA,userB&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Caching of both SAS keys and authorization responses can be enabled using the following setting. The cache settings are applicable only when <code>fs.azure.authorization</code> is enabled. The cache is maintained at a filesystem object level.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.azure.authorization.caching.enable&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The maximum number of entries that the cache can hold can be customized using the following setting:</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.azure.authorization.caching.maxentries&lt;/name&gt;
&lt;value&gt;512&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The validity of an authorization cache-entry can be controlled using the following setting. Setting the value to zero disables authorization caching. If the key is not specified, a default expiry duration of 5m takes effect.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.azure.authorization.cacheentry.expiry.period&lt;/name&gt;
&lt;value&gt;5m&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>The validity of a SAS key cache-entry can be controlled using the following setting. Setting the value to zero disables SAS key caching. If the key is not specified, the default expiry duration specified in the SAS key request takes effect.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.azure.saskey.cacheentry.expiry.period&lt;/name&gt;
&lt;value&gt;90d&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<p>Use the container SAS key for access to all blobs within the container. Blob-specific SAS keys are not used when this setting is enabled. This setting provides better performance than blob-specific SAS keys.</p>
<div class="source">
<div class="source">
<pre> &lt;property&gt;
&lt;name&gt;fs.azure.saskey.usecontainersaskeyforallaccess&lt;/name&gt;
&lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
</section><section>
<h3><a name="Performance_optimization_configurations"></a>Performance optimization configurations</h3>
<p><code>fs.azure.block.blob.buffered.pread.disable</code>: By default, the positional read API performs a seek and read on the input stream, and the read fills the buffer cache in BlockBlobInputStream. If this configuration is true, it skips the buffer and performs a lock-free read from the blob. This optimization is particularly helpful for HBase-style short random reads over a shared InputStream instance. Note: this is not a configuration that can be set at the cluster level. It can be used as an option on FutureDataInputStreamBuilder; see FileSystem#openFile(Path path).</p></section></section><section>
<h2><a name="Further_Reading"></a>Further Reading</h2>
<ul>
<li><a href="testing_azure.html">Testing the Azure WASB client</a>.</li>
<li>MSDN article, <a class="externalLink" href="https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs">Understanding Block Blobs, Append Blobs, and Page Blobs</a></li>
</ul></section>
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
&#169; 2008-2023
Apache Software Foundation
- <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>