HDFS-6394. HDFS encryption documentation. (wang)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/fs-encryption@1616016 13f79535-47bb-0310-9956-ffa450edef68
parent 7a246c447f
commit ac47ad11de
@@ -74,6 +74,8 @@ fs-encryption (Unreleased)

     HDFS-6780. Batch the encryption zones listing API. (wang)

+    HDFS-6394. HDFS encryption documentation. (wang)
+
   OPTIMIZATIONS

   BUG FIXES

@@ -125,7 +125,7 @@ public class CryptoAdmin extends Configured implements Tool {
     @Override
     public String getShortUsage() {
-      return "[" + getName() + " -keyName <keyName> -path <path> " + "]\n";
+      return "[" + getName() + " -keyName <keyName> -path <path>]\n";
     }

     @Override
@@ -187,7 +187,7 @@ public class CryptoAdmin extends Configured implements Tool {
     @Override
     public String getLongUsage() {
       return getShortUsage() + "\n" +
-        "List all encryption zones.\n\n";
+        "List all encryption zones. Requires superuser permissions.\n\n";
     }

     @Override
@@ -0,0 +1,206 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Hadoop Distributed File System-${project.version} - Transparent Encryption in HDFS
  ---
  ---
  ${maven.build.timestamp}

Transparent Encryption in HDFS

%{toc|section=1|fromDepth=2|toDepth=3}

* {Overview}

  HDFS implements <transparent>, <end-to-end> encryption.
  Once configured, data read from and written to HDFS is <transparently>
  encrypted and decrypted without requiring changes to user application code.
  This encryption is also <end-to-end>, which means the data can only be
  encrypted and decrypted by the client. HDFS never stores or has access to
  unencrypted data or data encryption keys. This satisfies two typical
  requirements for encryption: <at-rest encryption> (meaning data on
  persistent media, such as a disk) as well as <in-transit encryption>
  (e.g. when data is travelling over the network).

* {Use Cases}

  Data encryption is required by a number of different government, financial,
  and regulatory entities. For example, the health-care industry has HIPAA
  regulations, the card payment industry has PCI DSS regulations, and the US
  government has FISMA regulations. Having transparent encryption built into
  HDFS makes it easier for organizations to comply with these regulations.

  Encryption can also be performed at the application level, but by
  integrating it into HDFS, existing applications can operate on encrypted
  data without changes. This integrated architecture implies stronger
  encrypted file semantics and better coordination with other HDFS functions.

* {Architecture}

** {Key Management Server, KeyProvider, EDEKs}

  A new cluster service is required to store, manage, and access encryption
  keys: the Hadoop <Key Management Server (KMS)>. The KMS is a proxy that
  interfaces with a backing key store on behalf of HDFS daemons and clients.
  Both the backing key store and the KMS implement the Hadoop KeyProvider
  client API. See the {{{../../hadoop-kms/index.html}KMS documentation}} for
  more information.

  In the KeyProvider API, each encryption key has a unique <key name>.
  Because keys can be rolled, a key can have multiple <key versions>, where
  each key version has its own <key material> (the actual secret bytes used
  during encryption and decryption). An encryption key can be fetched by
  either its key name, returning the latest version of the key, or by a
  specific key version.

  The KMS implements additional functionality which enables creation and
  decryption of <encrypted encryption keys (EEKs)>. Creation and decryption
  of EEKs happens entirely on the KMS. Importantly, the client requesting
  creation or decryption of an EEK never handles the EEK's encryption key.
  To create a new EEK, the KMS generates a new random key, encrypts it with
  the specified key, and returns the EEK to the client. To decrypt an EEK,
  the KMS checks that the user has access to the encryption key, uses it to
  decrypt the EEK, and returns the decrypted encryption key.

  In the context of HDFS encryption, EEKs are <encrypted data encryption
  keys (EDEKs)>, where a <data encryption key (DEK)> is what is used to
  encrypt and decrypt file data. Typically, the key store is configured to
  only allow end users access to the keys used to encrypt DEKs. This means
  that EDEKs can be safely stored and handled by HDFS, since the HDFS user
  will not have access to EDEK encryption keys.

** {Encryption zones}

  For transparent encryption, we introduce a new abstraction to HDFS: the
  <encryption zone>. An encryption zone is a special directory whose
  contents will be transparently encrypted upon write and transparently
  decrypted upon read. Each encryption zone is associated with a single
  <encryption zone key> which is specified when the zone is created. Each
  file within an encryption zone has its own unique EDEK.

  When creating a new file in an encryption zone, the NameNode asks the KMS
  to generate a new EDEK encrypted with the encryption zone's key. The EDEK
  is then stored persistently as part of the file's metadata on the
  NameNode.

  When reading a file within an encryption zone, the NameNode provides the
  client with the file's EDEK and the encryption zone key version used to
  encrypt the EDEK. The client then asks the KMS to decrypt the EDEK, which
  involves checking that the client has permission to access the encryption
  zone key version. Assuming that is successful, the client uses the DEK to
  decrypt the file's contents.

  All of the above steps for the read and write path happen automatically
  through interactions between the DFSClient, the NameNode, and the KMS.
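
  The write and read paths can be modeled compactly as below. The sketch is
  illustrative only: <<<xor32>>> is a toy stand-in for the KMS's real
  cipher, the dictionaries stand in for NameNode metadata and the key store,
  and the real path goes through Hadoop RPC and the KeyProvider API.

```python
import os

def xor32(a: bytes, b: bytes) -> bytes:
    # Toy fixed-length XOR "cipher", standing in for real KMS encryption.
    return bytes(x ^ y for x, y in zip(a, b))

# --- KMS side: the only place zone keys and plaintext DEKs ever exist ---
zone_keys = {"myzone": os.urandom(32)}

def kms_generate_edek(zone_key_name: str) -> bytes:
    """Generate a fresh DEK; return only its encrypted form (the EDEK)."""
    dek = os.urandom(32)
    return xor32(dek, zone_keys[zone_key_name])

def kms_decrypt_edek(zone_key_name: str, edek: bytes) -> bytes:
    """Return the DEK; a key-access permission check would happen here."""
    return xor32(edek, zone_keys[zone_key_name])

# --- Write path: the NameNode requests an EDEK from the KMS and persists
# --- it as part of the file's metadata (it never sees the plaintext DEK)
file_metadata = {"/zone/file": ("myzone", kms_generate_edek("myzone"))}

# --- Read path: the client gets the EDEK from the NameNode, then asks the
# --- KMS to decrypt it, and uses the resulting DEK on the file contents
zone_key_name, edek = file_metadata["/zone/file"]
dek = kms_decrypt_edek(zone_key_name, edek)
```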

  Access to encrypted file data and metadata is controlled by normal HDFS
  filesystem permissions. This means that if HDFS is compromised (for
  example, by gaining unauthorized access to an HDFS superuser account), a
  malicious user only gains access to ciphertext and encrypted keys.
  However, since access to encryption zone keys is controlled by a separate
  set of permissions on the KMS and key store, this does not pose a
  security threat.

* {Configuration}

  A necessary prerequisite is an instance of the KMS, as well as a backing
  key store for the KMS. See the {{{../../hadoop-kms/index.html}KMS
  documentation}} for more information.

** Selecting an encryption algorithm and codec

*** hadoop.security.crypto.codec.classes.EXAMPLECIPHERSUITE

  Prefix for a given crypto codec; the property contains a comma-separated
  list of implementation classes for that codec (e.g. EXAMPLECIPHERSUITE).
  The first implementation will be used if available, others are fallbacks.

*** hadoop.security.crypto.codec.classes.aes.ctr.nopadding

  Default: <<<org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec>>>

  Comma-separated list of crypto codec implementations for
  AES/CTR/NoPadding. The first implementation will be used if available,
  others are fallbacks.

*** hadoop.security.crypto.cipher.suite

  Default: <<<AES/CTR/NoPadding>>>

  Cipher suite for crypto codec.

*** hadoop.security.crypto.jce.provider

  Default: None

  The JCE provider name used in CryptoCodec.

*** hadoop.security.crypto.buffer.size

  Default: <<<8192>>>

  The buffer size used by CryptoInputStream and CryptoOutputStream.
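
  Taken together, these properties might appear in <<<core-site.xml>>> as
  in the following fragment. The values shown are the defaults described
  above; the fragment is illustrative, not required configuration.

```xml
<property>
  <name>hadoop.security.crypto.cipher.suite</name>
  <value>AES/CTR/NoPadding</value>
</property>
<property>
  <name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name>
  <value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value>
</property>
<property>
  <name>hadoop.security.crypto.buffer.size</name>
  <value>8192</value>
</property>
```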

** Namenode configuration

*** dfs.namenode.list.encryption.zones.num.responses

  Default: <<<100>>>

  When listing encryption zones, the maximum number of zones that will be
  returned in a batch. Fetching the list incrementally in batches improves
  namenode performance.
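
  For example, to halve the batch size, this could be set in
  <<<hdfs-site.xml>>> (an illustrative fragment; the default is 100):

```xml
<property>
  <name>dfs.namenode.list.encryption.zones.num.responses</name>
  <value>50</value>
</property>
```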

* {<<<crypto>>> command-line interface}

** {createZone}

  Usage: <<<[-createZone -keyName <keyName> -path <path>]>>>

  Create a new encryption zone.

*--+--+
<path> | The path of the encryption zone to create. It must be an empty directory.
*--+--+
<keyName> | Name of the key to use for the encryption zone.
*--+--+

** {listZones}

  Usage: <<<[-listZones]>>>

  List all encryption zones. Requires superuser permissions.
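
  A session combining the two subcommands might look like the following.
  This transcript is illustrative: the key name and path are invented, and
  the use of the <<<hadoop key>>> KeyShell to pre-create the zone key is an
  assumption about the surrounding setup, not output from this document.

```shell
# Create an encryption key in the KMS-backed key store (illustrative name)
hadoop key create myzonekey

# Make an empty directory and turn it into an encryption zone
hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName myzonekey -path /zone

# List all encryption zones (requires superuser permissions)
hdfs crypto -listZones
```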

* {Attack vectors}

** {Hardware access exploits}

  These exploits assume that the attacker has gained physical access to
  hard drives from cluster machines, i.e. datanodes and namenodes.

  [[1]] Access to swap files of processes containing data encryption keys.

    * By itself, this does not expose cleartext, as it also requires access
      to encrypted block files.

    * This can be mitigated by disabling swap, using encrypted swap, or
      using mlock to prevent keys from being swapped out.

  [[1]] Access to encrypted block files.

    * By itself, this does not expose cleartext, as it also requires access
      to DEKs.

** {Root access exploits}

  These exploits assume that the attacker has gained root shell access to
  cluster machines, i.e. datanodes and namenodes. Many of these exploits
  cannot be addressed in HDFS, since a malicious root user has access to
  the in-memory state of processes holding encryption keys and cleartext.
  For these exploits, the only mitigation technique is carefully
  restricting and monitoring root shell access.

  [[1]] Access to encrypted block files.

    * By itself, this does not expose cleartext, as it also requires access
      to encryption keys.

  [[1]] Dump memory of client processes to obtain DEKs, delegation tokens,
  cleartext.

    * No mitigation.

  [[1]] Recording network traffic to sniff encryption keys and encrypted
  data in transit.

    * By itself, insufficient to read cleartext without the EDEK encryption
      key.

  [[1]] Dump memory of datanode process to obtain encrypted block data.

    * By itself, insufficient to read cleartext without the DEK.

  [[1]] Dump memory of namenode process to obtain encrypted data encryption
  keys.

    * By itself, insufficient to read cleartext without the EDEK's
      encryption key and encrypted block files.

** {HDFS admin exploits}

  These exploits assume that the attacker has compromised HDFS, but does
  not have root or <<<hdfs>>> user shell access.

  [[1]] Access to encrypted block files.

    * By itself, insufficient to read cleartext without the EDEK and EDEK
      encryption key.

  [[1]] Access to encryption zone and encrypted file metadata (including
  encrypted data encryption keys), via <<<-fetchImage>>>.

    * By itself, insufficient to read cleartext without EDEK encryption
      keys.

** {Rogue user exploits}

  A rogue user can collect keys to which they have access, and use them
  later to decrypt encrypted data. This can be mitigated through periodic
  key rolling policies.

@@ -89,6 +89,7 @@
       <item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
       <item name="HDFS Rolling Upgrade" href="hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html"/>
       <item name="Extended Attributes" href="hadoop-project-dist/hadoop-hdfs/ExtendedAttributes.html"/>
+      <item name="Transparent Encryption" href="hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html"/>
       <item name="HDFS Support for Multihoming" href="hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html"/>
     </menu>