From 139fd21c8383f7535de16ea4cd8e0e147f7a23e9 Mon Sep 17 00:00:00 2001
From: Tsuyoshi Ozawa
Date: Thu, 3 Sep 2015 22:34:25 +0900
Subject: [PATCH] HADOOP-5323. Trash documentation should describe its directory structure and configurations. Contributed by Weiwei Yang.

(cherry picked from commit 9a87f81ce6839d1b5ab3d663c1e82c7e572bdeb8)

Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
---
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt                     | 3 +++
 .../hadoop-hdfs/src/site/markdown/HdfsDesign.md                 | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 2a41a9ed888..5c4e16e4515 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -10,6 +10,9 @@ Release 2.7.2 - UNRELEASED

     HDFS-8659. Block scanner INFO message is spamming logs. (Yongjun Zhang)

+    HADOOP-5323. Trash documentation should describe its directory structure and
+    configurations. (Weiwei Yang via ozawa)
+
   OPTIMIZATIONS

     HDFS-8722. Optimize datanode writes for small writes and flushes (kihwal)
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
index 30571c88a2b..8f1d011490c 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDesign.md
@@ -224,9 +224,11 @@ Space Reclamation

 ### File Deletes and Undeletes

-When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the `/trash` directory. The file can be restored quickly as long as it remains in `/trash`. A file remains in `/trash` for a configurable amount of time. After the expiry of its life in `/trash`, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
+When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS moves it to a trash directory (each user has its own trash directory under `/user//.Trash`).
+The file can be restored quickly as long as it remains in trash. Most recent deleted files are moved to the current trash directory (`/user//.Trash/Current`), and in a configurable interval, HDFS creates checkpoints (under `/user//.Trash/`) for files in current trash directory and deletes old checkpoints when they are expired.
+After the expiry of its life in trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.

-A user can Undelete a file after deleting it as long as it remains in the `/trash` directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the `/trash` directory and retrieve the file. The `/trash` directory contains only the latest copy of the file that was deleted. The `/trash` directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. Current default trash interval is set to 0 (Deletes file without storing in trash). This value is configurable parameter stored as `fs.trash.interval` stored in core-site.xml.
+Currently, the trash feature is disabled by default (deleting files without storing in trash). User can enable this feature by setting a value greater than zero for parameter `fs.trash.interval` (in core-site.xml). This value tells the NameNode how long a checkpoint will be expired and removed from HDFS. In addition, user can configure an appropriate time to tell NameNode how often to create checkpoints in trash (the parameter stored as `fs.trash.checkpoint.interval` in core-site.xml), this value should be smaller or equal to fs.trash.interval.

 ### Decrease Replication Factor
