HADOOP-5323. Trash documentation should describe its directory structure and configurations. Contributed by Weiwei Yang.
(cherry picked from commit 9a87f81ce6)
Conflicts:
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
parent 2d1ff2e1ca
commit 139fd21c83
|
@ -10,6 +10,9 @@ Release 2.7.2 - UNRELEASED
|
|||
|
||||
HDFS-8659. Block scanner INFO message is spamming logs. (Yongjun Zhang)
|
||||
|
||||
HADOOP-5323. Trash documentation should describe its directory structure and
|
||||
configurations. (Weiwei Yang via ozawa)
|
||||
|
||||
OPTIMIZATIONS
|
||||
|
||||
HDFS-8722. Optimize datanode writes for small writes and flushes (kihwal)
|
||||
|
|
|
@ -224,9 +224,11 @@ Space Reclamation
|
|||
|
||||
### File Deletes and Undeletes
|
||||
|
||||
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the `/trash` directory. The file can be restored quickly as long as it remains in `/trash`. A file remains in `/trash` for a configurable amount of time. After the expiry of its life in `/trash`, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
|
||||
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has its own trash directory under `/user/<username>/.Trash`).
|
||||
The file can be restored quickly as long as it remains in trash. Most recent deleted files are moved to the current trash directory (`/user/<username>/.Trash/Current`), and in a configurable interval, HDFS creates checkpoints (under `/user/<username>/.Trash/<date>`) for files in current trash directory and deletes old checkpoints when they are expired.
|
||||
After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
|
||||
|
||||
A user can Undelete a file after deleting it as long as it remains in the `/trash` directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the `/trash` directory and retrieve the file. The `/trash` directory contains only the latest copy of the file that was deleted. The `/trash` directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. Current default trash interval is set to 0 (Deletes file without storing in trash). This value is configurable parameter stored as `fs.trash.interval` stored in core-site.xml.
|
||||
Currently, the trash feature is disabled by default (deleting files without storing in trash). User can enable this feature by setting a value greater than zero for parameter `fs.trash.interval` (in core-site.xml). This value tells the NameNode how long a checkpoint will be expired and removed from HDFS. In addition, user can configure an appropriate time to tell NameNode how often to create checkpoints in trash (the parameter stored as `fs.trash.checkpoint.interval` in core-site.xml), this value should be smaller or equal to fs.trash.interval.
|
||||
|
||||
### Decrease Replication Factor
|
||||
|
||||
|
|
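For readers applying the documentation added in this change, the two trash settings it describes could be set along the following lines. This is only a sketch: the values 1440 and 60 are illustrative assumptions and do not come from the commit. The unit is assumed to be minutes, which is how Hadoop's `core-default.xml` describes both properties; verify this against the release in use.

```xml
<!-- core-site.xml: illustrative values only, not taken from this commit -->
<configuration>
  <property>
    <name>fs.trash.interval</name>
    <!-- Assumed example: keep trash checkpoints for 1440 minutes (one day).
         A value of 0 leaves the trash feature disabled. -->
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <!-- Assumed example: create a checkpoint every 60 minutes.
         Should be smaller than or equal to fs.trash.interval. -->
    <value>60</value>
  </property>
</configuration>
```

With settings like these, a deleted file would first appear under `/user/<username>/.Trash/Current`, be rolled into a `/user/<username>/.Trash/<date>` checkpoint at the next checkpoint interval, and be removed once that checkpoint expires.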