HADOOP-5323. Trash documentation should describe its directory structure and configurations. Contributed by Weiwei Yang.

(cherry picked from commit 9a87f81ce6)
This commit is contained in:
Tsuyoshi Ozawa 2015-09-03 22:34:25 +09:00
parent a53054c853
commit f4ee0bf19f
2 changed files with 7 additions and 2 deletions

View File

@ -953,6 +953,9 @@ Release 2.7.2 - UNRELEASED
HDFS-8659. Block scanner INFO message is spamming logs. (Yongjun Zhang)
HADOOP-5323. Trash documentation should describe its directory structure and
configurations. (Weiwei Yang via ozawa)
OPTIMIZATIONS
HDFS-8722. Optimize datanode writes for small writes and flushes (kihwal)

View File

@ -225,9 +225,11 @@ Space Reclamation
### File Deletes and Undeletes
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the trash directory(`/user/<username>/.Trash`). The file can be restored quickly as long as it remains in trash. A file remains in trash for a configurable amount of time. After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has its own trash directory under `/user/<username>/.Trash`).
The file can be restored quickly as long as it remains in trash. Most recent deleted files are moved to the current trash directory (`/user/<username>/.Trash/Current`), and in a configurable interval, HDFS creates checkpoints (under `/user/<username>/.Trash/<date>`) for files in current trash directory and deletes old checkpoints when they are expired.
After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
A user can Undelete a file after deleting it as long as it remains in the trash directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the trash directory and retrieve the file. The trash directory contains only the latest copy of the file that was deleted. The trash directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. Current default trash interval is set to 0 (Deletes file without storing in trash). This value is configurable parameter stored as `fs.trash.interval` stored in core-site.xml.
Currently, the trash feature is disabled by default (deleting files without storing in trash). User can enable this feature by setting a value greater than zero for parameter `fs.trash.interval` (in core-site.xml). This value tells the NameNode how long a checkpoint will be expired and removed from HDFS. In addition, user can configure an appropriate time to tell NameNode how often to create checkpoints in trash (the parameter stored as `fs.trash.checkpoint.interval` in core-site.xml), this value should be smaller or equal to fs.trash.interval.
### Decrease Replication Factor