HADOOP-12229 Fix inconsistent subsection titles in filesystem.md. Contributed by Masatake Iwasaki

Steve Loughran 2016-06-29 14:31:13 +01:00
parent f4f0c5074d
commit 52dbafecc5
2 changed files with 77 additions and 66 deletions


@ -19,6 +19,10 @@
# class `org.apache.hadoop.fs.FileSystem` # class `org.apache.hadoop.fs.FileSystem`
* [Invariants](#Invariants)
* [Predicates and other state access operations](#Predicates_and_other_state_access_operations)
* [State Changing Operations](#State_Changing_Operations)
The abstract `FileSystem` class is the original class to access Hadoop filesystems; The abstract `FileSystem` class is the original class to access Hadoop filesystems;
non-abstract subclasses exist for all Hadoop-supported filesystems. non-abstract subclasses exist for all Hadoop-supported filesystems.
@ -59,38 +63,6 @@ all operations on a valid FileSystem MUST result in a new FileSystem that is als
def isFile(FS, p) = p in files(FS) def isFile(FS, p) = p in files(FS)
### `boolean isSymlink(Path p)`
def isSymlink(FS, p) = p in symlinks(FS)
### 'boolean inEncryptionZone(Path p)'
Return True if the data for p is encrypted. The nature of the encryption and the
mechanism for creating an encryption zone are implementation details not covered
in this specification. No guarantees are made about the quality of the
encryption. The metadata is not encrypted.
#### Preconditions
if not exists(FS, p) : raise FileNotFoundException
#### Postconditions
#### Invariants
All files and directories under a directory in an encryption zone are also in an
encryption zone
forall d in directories(FS): inEncyptionZone(FS, d) implies
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
inEncyptionZone(FS, c)
For all files in an encrypted zone, the data is encrypted, but the encryption
type and specification are not defined.
forall f in files(FS) where inEncyptionZone(FS, c):
isEncrypted(data(f))
### `FileStatus getFileStatus(Path p)` ### `FileStatus getFileStatus(Path p)`
@ -98,12 +70,10 @@ Get the status of a path
#### Preconditions #### Preconditions
if not exists(FS, p) : raise FileNotFoundException if not exists(FS, p) : raise FileNotFoundException
#### Postconditions #### Postconditions
result = stat: FileStatus where: result = stat: FileStatus where:
if isFile(FS, p) : if isFile(FS, p) :
stat.length = len(FS.Files[p]) stat.length = len(FS.Files[p])
@ -120,6 +90,7 @@ Get the status of a path
else else
stat.isEncrypted = False stat.isEncrypted = False
### `Path getHomeDirectory()` ### `Path getHomeDirectory()`
The function `getHomeDirectory` returns the home directory for the FileSystem The function `getHomeDirectory` returns the home directory for the FileSystem
@ -152,7 +123,7 @@ code may fail.
fail with a RuntimeException or subclass thereof if there is a connectivity fail with a RuntimeException or subclass thereof if there is a connectivity
problem. The time to execute the operation is not bounded. problem. The time to execute the operation is not bounded.
### `FileSystem.listStatus(Path, PathFilter )` ### `FileStatus[] listStatus(Path p, PathFilter filter)`
A `PathFilter` `f` is a predicate function that returns true iff the path `p` A `PathFilter` `f` is a predicate function that returns true iff the path `p`
meets the filter's conditions. meets the filter's conditions.
@ -184,7 +155,7 @@ to the same path:
fs == getFileStatus(fs.path) fs == getFileStatus(fs.path)
### Atomicity and Consistency #### Atomicity and Consistency
By the time the `listStatus()` operation returns to the caller, there By the time the `listStatus()` operation returns to the caller, there
is no guarantee that the information contained in the response is current. is no guarantee that the information contained in the response is current.
@ -239,7 +210,7 @@ these inconsistent views are only likely when listing a directory with many chil
Other filesystems may have stronger consistency guarantees, or return inconsistent Other filesystems may have stronger consistency guarantees, or return inconsistent
data more readily. data more readily.
### ` List[BlockLocation] getFileBlockLocations(FileStatus f, int s, int l)` ### `BlockLocation[] getFileBlockLocations(FileStatus f, int s, int l)`
#### Preconditions #### Preconditions
@ -286,7 +257,7 @@ of elements as the cluster topology MUST be provided, hence Filesystems SHOULD
return that `"/default/localhost"` path return that `"/default/localhost"` path
### `getFileBlockLocations(Path P, int S, int L)` ### `BlockLocation[] getFileBlockLocations(Path P, int S, int L)`
#### Preconditions #### Preconditions
@ -300,7 +271,7 @@ return that `"/default/localhost"` path
result = getFileBlockLocations(getStatus(P), S, L) result = getFileBlockLocations(getStatus(P), S, L)
### `getDefaultBlockSize()` ### `long getDefaultBlockSize()`
#### Preconditions #### Preconditions
@ -318,7 +289,7 @@ Any FileSystem that does not actually break files into blocks SHOULD
return a number for this that results in efficient processing. return a number for this that results in efficient processing.
A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this). A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this).
### `getDefaultBlockSize(Path P)` ### `long getDefaultBlockSize(Path p)`
#### Preconditions #### Preconditions
@ -336,7 +307,7 @@ different paths, in which case the specific default value for the destination pa
SHOULD be returned. SHOULD be returned.
### `getBlockSize(Path P)` ### `long getBlockSize(Path p)`
#### Preconditions #### Preconditions
@ -511,7 +482,7 @@ exists in the metadata, but no copies of any its blocks can be located;
-`FileNotFoundException` would seem more accurate and useful. -`FileNotFoundException` would seem more accurate and useful.
### `FileSystem.delete(Path P, boolean recursive)` ### `boolean delete(Path p, boolean recursive)`
#### Preconditions #### Preconditions
@ -615,12 +586,8 @@ implement `delete()` as recursive listing and file delete operation.
This can break the expectations of client applications -and means that This can break the expectations of client applications -and means that
they cannot be used as drop-in replacements for HDFS. they cannot be used as drop-in replacements for HDFS.
<!-- ============================================================= -->
<!-- METHOD: rename() -->
<!-- ============================================================= -->
### `boolean rename(Path src, Path d)`
### `FileSystem.rename(Path src, Path d)`
In terms of its specification, `rename()` is one of the most complex operations within a filesystem. In terms of its specification, `rename()` is one of the most complex operations within a filesystem.
@ -787,7 +754,7 @@ The behavior of HDFS here should not be considered a feature to replicate.
to the `DFSFileSystem` implementation is an ongoing matter for debate. to the `DFSFileSystem` implementation is an ongoing matter for debate.
### `concat(Path p, Path sources[])` ### `void concat(Path p, Path sources[])`
Joins multiple blocks together to create a single file. This Joins multiple blocks together to create a single file. This
is a little-used operation currently implemented only by HDFS. is a little-used operation currently implemented only by HDFS.


@ -14,9 +14,21 @@
# A Model of a Hadoop Filesystem # A Model of a Hadoop Filesystem
* [Paths and Path Elements](#Paths_and_Path_Elements)
* [Predicates and Functions](#Predicates_and_Functions)
* [Notes for relative paths](#Notes_for_relative_paths)
* [Defining the Filesystem](#Defining_the_Filesystem)
* [Directory references](#Directory_references)
* [File references](#File_references)
* [Symbolic references](#Symbolic_references)
* [File Length](#File_Length)
* [User home](#User_home)
* [Exclusivity](#Exclusivity)
* [Encryption Zone](#Encryption_Zone)
* [Notes](#Notes)
#### Paths and Path Elements ## Paths and Path Elements
A Path is a list of Path elements which represents a path to a file, directory or symbolic link A Path is a list of Path elements which represents a path to a file, directory or symbolic link
@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element.
When validating path elements, the exception `InvalidPathException` SHOULD When validating path elements, the exception `InvalidPathException` SHOULD
be raised when a path is invalid [HDFS] be raised when a path is invalid [HDFS]
Predicate: `valid-path-element:List[String];` ### Predicates and Functions
#### `valid-path-element(List[String]): bool`
A path element `pe` is invalid if any character in it is in the set of forbidden characters, A path element `pe` is invalid if any character in it is in the set of forbidden characters,
or the element as a whole is invalid or the element as a whole is invalid
@ -41,17 +55,20 @@ or the element as a whole is invalid
not pe in {"", ".", "..", "/"} not pe in {"", ".", "..", "/"}
Predicate: `valid-path:List<PathElement>` #### `valid-path(List[PathElement]): bool`
A Path `p` is *valid* if all path elements in it are valid A Path `p` is *valid* if all path elements in it are valid
def valid-path(pe): forall pe in Path: valid-path-element(pe) def valid-path(path): forall pe in path: valid-path-element(pe)
The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements. The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.
The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`. The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`.
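The two predicates above can be sketched in Python as follows. This is only an illustration: paths are modelled as lists of strings, and the forbidden character set `FORBIDDEN_CHARS` is an assumption made for the example, since each filesystem defines its own.

```python
from typing import List

# Assumption: illustrative forbidden characters; real filesystems define their own set.
FORBIDDEN_CHARS = {"/", ":"}
# Elements that are invalid as a whole, as listed in the specification.
INVALID_ELEMENTS = {"", ".", "..", "/"}


def valid_path_element(pe: str) -> bool:
    """True if no character is forbidden and the element as a whole is allowed."""
    return all(c not in FORBIDDEN_CHARS for c in pe) and pe not in INVALID_ELEMENTS


def valid_path(path: List[str]) -> bool:
    """A path is valid if every element in it is valid; the root path [] is valid."""
    return all(valid_path_element(pe) for pe in path)
```

Under this model the root path `[]` is trivially valid, because it has no elements to reject.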
#### `parent(path:Path): Path`
The partial function `parent(path:Path):Path` provides the parent path and can be defined using The partial function `parent(path:Path):Path` provides the parent path and can be defined using
list slicing. list slicing.
@ -62,7 +79,7 @@ Preconditions:
path != [] path != []
#### `filename:Path->PathElement` #### `filename(Path): PathElement`
The last Path Element in a Path is called the filename. The last Path Element in a Path is called the filename.
@ -72,7 +89,7 @@ Preconditions:
p != [] p != []
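A minimal Python sketch of `parent` and `filename`, with paths modelled as lists of strings. Raising `ValueError` when the precondition fails is an assumption of this sketch; the specification only states that both functions are undefined for the root path `[]`.

```python
from typing import List


def parent(path: List[str]) -> List[str]:
    """Return every element except the last, using list slicing."""
    if path == []:
        # Precondition: path != []
        raise ValueError("the root path has no parent")
    return path[:-1]


def filename(path: List[str]) -> str:
    """Return the last path element."""
    if path == []:
        # Precondition: p != []
        raise ValueError("the root path has no filename")
    return path[-1]
```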
#### `childElements:(Path p, Path q):Path` #### `childElements(Path p, Path q): Path`
The partial function `childElements:(Path p, Path q):Path` The partial function `childElements:(Path p, Path q):Path`
@ -87,12 +104,12 @@ Preconditions:
q == p[:len(q)] q == p[:len(q)]
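The precondition says that `q` must be a prefix of `p`. A hedged Python sketch, assuming the result is the list of elements of `p` that follow `q` (the function body is not visible in this hunk, and the `ValueError` is a choice made for the example):

```python
from typing import List


def child_elements(p: List[str], q: List[str]) -> List[str]:
    """Return the elements of p that come after the prefix q."""
    if q != p[:len(q)]:
        # Precondition: q == p[:len(q)]
        raise ValueError("q is not a prefix of p")
    return p[len(q):]
```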
#### ancestors(Path): List[Path] #### `ancestors(Path): List[Path]`
The list of all paths that are either the direct parent of a path p, or a parent of an The list of all paths that are either the direct parent of a path p, or a parent of an
ancestor of p. ancestor of p.
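One possible Python rendering of `ancestors`, with paths modelled as lists of strings. Returning the nearest ancestor first and the root path `[]` last is an assumption of this sketch; the specification does not fix an order.

```python
from typing import List


def ancestors(path: List[str]) -> List[List[str]]:
    """Return every proper prefix of path, from the direct parent up to the root []."""
    return [path[:i] for i in range(len(path) - 1, -1, -1)]
```

The root path has no ancestors, so `ancestors([])` is the empty list.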
#### Notes ### Notes for relative paths
This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
by declaring that the root (and only the root) path element may be ['/']. by declaring that the root (and only the root) path element may be ['/'].
@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/'].
Relative paths can then be distinguished from absolute paths as the input to any function and resolved when they are the second entry in a two-argument function Relative paths can then be distinguished from absolute paths as the input to any function and resolved when they are the second entry in a two-argument function
such as `rename`. such as `rename`.
### Defining the Filesystem ## Defining the Filesystem
A filesystem `FS` contains a set of directories, a dictionary mapping file paths to their data, and a set of symbolic links A filesystem `FS` contains a set of directories, a dictionary mapping file paths to their data, and a set of symbolic links
(Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path]) (Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
Accessor functions return the specific element of a filesystem Accessor functions return the specific element of a filesystem
def directories(FS) = FS.Directories def directories(FS) = FS.Directories
def file(FS) = FS.Files def files(FS) = FS.Files
def symlinks(FS) = FS.Symlinks def symlinks(FS) = FS.Symlinks
def filenames(FS) = keys(FS.Files) def filenames(FS) = keys(FS.Files)
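The filesystem tuple and its accessors can be modelled in Python roughly as below. This is a sketch under stated assumptions: the `FS` dataclass and the `Path` alias are names introduced for illustration, paths are tuples rather than lists so that they can be stored in sets and used as dictionary keys, and file data is held as `bytes`.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Assumption: tuples of path elements, so that paths are hashable.
Path = Tuple[str, ...]


@dataclass
class FS:
    """Illustrative model of (Directories, Files, Symlinks)."""
    Directories: Set[Path] = field(default_factory=set)
    Files: Dict[Path, bytes] = field(default_factory=dict)
    Symlinks: Set[Path] = field(default_factory=set)


def directories(fs: FS) -> Set[Path]:
    return fs.Directories


def files(fs: FS) -> Dict[Path, bytes]:
    return fs.Files


def symlinks(fs: FS) -> Set[Path]:
    return fs.Symlinks


def filenames(fs: FS) -> Set[Path]:
    """The paths of all files, i.e. the keys of the Files dictionary."""
    return set(fs.Files.keys())
```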
@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al
#### Directory references ### Directory references
A path MAY refer to a directory in a FileSystem: A path MAY refer to a directory in a FileSystem:
@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)} def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
#### File references ### File references
A path MAY refer to a file; that is, it has data in the filesystem; its path is a key in the data dictionary A path MAY refer to a file; that is, it has data in the filesystem; its path is a key in the data dictionary
def isFile(FS, p) = p in FS.Files def isFile(FS, p) = p in FS.Files
#### Symbolic references ### Symbolic references
A path MAY refer to a symbolic link: A path MAY refer to a symbolic link:
def isSymlink(FS, p) = p in symlinks(FS) def isSymlink(FS, p) = p in symlinks(FS)
#### File Length ### File Length
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory: The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However, For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
for HDFS, for HDFS,
#### Exclusivity
### Exclusivity
A path cannot refer to more than one of a file, a directory or a symbolic link A path cannot refer to more than one of a file, a directory or a symbolic link
@ -218,7 +236,33 @@ This implies that only files may have data.
This condition is invariant and is an implicit postcondition of all This condition is invariant and is an implicit postcondition of all
operations that manipulate the state of a FileSystem `FS`. operations that manipulate the state of a FileSystem `FS`.
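As an illustration, the exclusivity invariant can be checked in Python by testing that the three sets of paths are pairwise disjoint. The function name `is_exclusive` and the use of tuples for paths are assumptions of this sketch.

```python
from typing import Set, Tuple

Path = Tuple[str, ...]


def is_exclusive(directories: Set[Path], filenames: Set[Path], symlinks: Set[Path]) -> bool:
    """True if no path is more than one of a directory, a file or a symbolic link."""
    return (
        directories.isdisjoint(filenames)
        and directories.isdisjoint(symlinks)
        and filenames.isdisjoint(symlinks)
    )
```

A test harness could assert this after each state-changing operation, which mirrors its role as an implicit postcondition.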
### Notes
### Encryption Zone
The data is encrypted if the file is in an encryption zone.
def inEncryptionZone(FS, path): bool
The nature of the encryption and the mechanism for creating an encryption zone
are implementation details not covered in this specification.
No guarantees are made about the quality of the encryption.
The metadata is not encrypted.
All files and directories under a directory in an encryption zone are also in an
encryption zone.
forall d in directories(FS): inEncryptionZone(FS, d) implies
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
inEncryptionZone(FS, c)
For all files in an encrypted zone, the data is encrypted, but the encryption
type and specification are not defined.
forall f in files(FS) where inEncryptionZone(FS, f):
isEncrypted(data(f))
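The inheritance rule for encryption zones can be sketched as a Python check. Everything here is illustrative: paths are tuples, `in_zone` is a hypothetical predicate supplied by the caller, and `children` is assumed to return the direct children of a directory.

```python
from typing import Callable, Iterable, Set, Tuple

Path = Tuple[str, ...]


def children(paths: Iterable[Path], d: Path) -> Set[Path]:
    """Direct children of d: paths exactly one element longer that start with d."""
    return {p for p in paths if len(p) == len(d) + 1 and p[:len(d)] == d}


def zone_invariant_holds(
    directories: Set[Path], files: Set[Path], in_zone: Callable[[Path], bool]
) -> bool:
    """True if every file or directory directly under an in-zone directory is also in the zone."""
    paths = set(directories) | set(files)
    return all(
        in_zone(c)
        for d in directories
        if in_zone(d)
        for c in children(paths, d)
    )
```

Because the rule is applied to every in-zone directory, checking direct children is enough to cover the whole subtree.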
## Notes
Not covered: hard links in a FileSystem. If a FileSystem supports multiple Not covered: hard links in a FileSystem. If a FileSystem supports multiple
references in *paths(FS)* to point to the same data, the outcome of operations references in *paths(FS)* to point to the same data, the outcome of operations