HADOOP-12229 Fix inconsistent subsection titles in filesystem.md. Contributed by Masatake Iwasaki

This commit is contained in:
Steve Loughran 2016-06-29 14:31:13 +01:00
parent 8113855b3a
commit 111739df8f
2 changed files with 77 additions and 66 deletions

@ -19,6 +19,10 @@
# class `org.apache.hadoop.fs.FileSystem`
* [Invariants](#Invariants)
* [Predicates and other state access operations](#Predicates_and_other_state_access_operations)
* [State Changing Operations](#State_Changing_Operations)
The abstract `FileSystem` class is the original class to access Hadoop filesystems;
non-abstract subclasses exist for all Hadoop-supported filesystems.
@ -59,38 +63,6 @@ all operations on a valid FileSystem MUST result in a new FileSystem that is als
def isFile(FS, p) = p in files(FS)
### `boolean isSymlink(Path p)`
def isSymlink(FS, p) = p in symlinks(FS)
### `boolean inEncryptionZone(Path p)`
Return True if the data for p is encrypted. The nature of the encryption and the
mechanism for creating an encryption zone are implementation details not covered
in this specification. No guarantees are made about the quality of the
encryption. The metadata is not encrypted.
#### Preconditions
if not exists(FS, p) : raise FileNotFoundException
#### Postconditions
#### Invariants
All files and directories under a directory in an encryption zone are also in an
encryption zone.
forall d in directories(FS): inEncryptionZone(FS, d) implies
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
inEncryptionZone(FS, c)
For all files in an encrypted zone, the data is encrypted, but the encryption
type and specification are not defined.
forall f in files(FS) where inEncryptionZone(FS, f):
isEncrypted(data(f))
### `FileStatus getFileStatus(Path p)`
@ -98,12 +70,10 @@ Get the status of a path
#### Preconditions
if not exists(FS, p) : raise FileNotFoundException
#### Postconditions
result = stat: FileStatus where:
if isFile(FS, p) :
stat.length = len(FS.Files[p])
@ -120,6 +90,7 @@ Get the status of a path
else
stat.isEncrypted = False
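The postconditions in this hunk can be sketched as a function that builds a status record over a model filesystem. This is an illustrative sketch, not Hadoop code: the field names follow the spec, while the `zone_roots` parameter is a modelling assumption for `isEncrypted`.

```python
# Illustrative sketch of the getFileStatus() postconditions over a model
# filesystem (Files: path -> data, Directories: set of paths). Paths are
# modelled as tuples of elements; zone_roots is an assumption of this sketch.
def get_file_status(fs, p, zone_roots=frozenset()):
    if p not in fs["Files"] and p not in fs["Directories"]:
        raise FileNotFoundError(p)
    is_file = p in fs["Files"]
    return {
        "path": p,
        # length is the data length for a file, 0 for a directory
        "length": len(fs["Files"][p]) if is_file else 0,
        "isdir": not is_file,
        # encrypted iff the path, or any ancestor, is a zone root
        "isEncrypted": any(p[:i] in zone_roots
                           for i in range(len(p) + 1)),
    }
```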
### `Path getHomeDirectory()`
The function `getHomeDirectory` returns the home directory for the FileSystem
@ -152,7 +123,7 @@ code may fail.
fail with a RuntimeException or subclass thereof if there is a connectivity
problem. The time to execute the operation is not bounded.
### `FileSystem.listStatus(Path, PathFilter )`
### `FileStatus[] listStatus(Path p, PathFilter filter)`
A `PathFilter` `f` is a predicate function that returns true iff the path `p`
meets the filter's conditions.
@ -188,7 +159,7 @@ While HDFS currently returns an alphanumerically sorted list, neither the Posix
nor Java's `File.listFiles()` API calls define any ordering of returned values. Applications
which require a uniform sort order on the results must perform the sorting themselves.
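The caller-side contract can be sketched as follows. A `PathFilter` is just a predicate on paths, and since `listStatus()` guarantees no ordering, the caller imposes one; this sketch models the listing as plain path strings rather than `FileStatus` entries, and `filtered_sorted_listing` is an illustrative name, not part of the Hadoop API.

```python
# Illustrative sketch: filter a listing with a predicate, then impose a
# uniform (lexicographic) order, since no ordering is guaranteed.
def filtered_sorted_listing(entries, path_filter):
    """Apply the filter, then sort to get a deterministic order."""
    return sorted(e for e in entries if path_filter(e))

# Example: keep only the .txt entries of an arbitrarily ordered listing.
listing = ["/d/b.txt", "/d/a.log", "/d/a.txt"]
txt_only = filtered_sorted_listing(listing, lambda p: p.endswith(".txt"))
```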
### Atomicity and Consistency
#### Atomicity and Consistency
By the time the `listStatus()` operation returns to the caller, there
is no guarantee that the information contained in the response is current.
@ -243,7 +214,7 @@ these inconsistent views are only likely when listing a directory with many chil
Other filesystems may have stronger consistency guarantees, or return inconsistent
data more readily.
### ` List[BlockLocation] getFileBlockLocations(FileStatus f, int s, int l)`
### `BlockLocation[] getFileBlockLocations(FileStatus f, int s, int l)`
#### Preconditions
@ -290,7 +261,7 @@ of elements as the cluster topology MUST be provided, hence Filesystems SHOULD
return that `"/default/localhost"` path
### `getFileBlockLocations(Path P, int S, int L)`
### `BlockLocation[] getFileBlockLocations(Path P, int S, int L)`
#### Preconditions
@ -304,7 +275,7 @@ return that `"/default/localhost"` path
result = getFileBlockLocations(getStatus(P), S, L)
### `getDefaultBlockSize()`
### `long getDefaultBlockSize()`
#### Preconditions
@ -322,7 +293,7 @@ Any FileSystem that does not actually break files into blocks SHOULD
return a number for this that results in efficient processing.
A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this).
### `getDefaultBlockSize(Path P)`
### `long getDefaultBlockSize(Path p)`
#### Preconditions
@ -340,7 +311,7 @@ different paths, in which case the specific default value for the destination pa
SHOULD be returned.
### `getBlockSize(Path P)`
### `long getBlockSize(Path p)`
#### Preconditions
@ -358,7 +329,7 @@ the `FileStatus` returned from `getFileStatus(P)`.
## State Changing Operations
### `boolean mkdirs(Path p, FsPermission permission )`
### `boolean mkdirs(Path p, FsPermission permission)`
Create a directory and all its parents
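A sketch of this over the directory-set model: creating a directory also creates every missing ancestor. The function name mirrors the API, but the set-based model is illustrative.

```python
# Illustrative sketch of mkdirs over a set of directory paths (tuples
# of path elements): the path and all of its ancestors are added.
def mkdirs(dirs, path):
    """Return the directory set with path and every ancestor added."""
    return dirs | {path[:i] for i in range(len(path) + 1)}
```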
@ -515,7 +486,7 @@ exists in the metadata, but no copies of any its blocks can be located;
-`FileNotFoundException` would seem more accurate and useful.
### `FileSystem.delete(Path P, boolean recursive)`
### `boolean delete(Path p, boolean recursive)`
#### Preconditions
@ -619,12 +590,8 @@ implement `delete()` as recursive listing and file delete operation.
This can break the expectations of client applications -and means that
they cannot be used as drop-in replacements for HDFS.
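On a path-set model, the `delete` contract can be sketched as follows; this illustrates the specified behaviour, not any object-store implementation.

```python
# Illustrative sketch of delete(path, recursive) over a set of paths:
# a recursive delete removes the target and every descendant, while a
# non-recursive delete fails if any descendants exist.
def delete(paths, target, recursive):
    if target not in paths:
        raise FileNotFoundError(target)
    descendants = {p for p in paths
                   if p[:len(target)] == target and p != target}
    if descendants and not recursive:
        raise IOError("directory is not empty: %r" % (target,))
    return paths - descendants - {target}
```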
<!-- ============================================================= -->
<!-- METHOD: rename() -->
<!-- ============================================================= -->
### `FileSystem.rename(Path src, Path d)`
### `boolean rename(Path src, Path d)`
In terms of its specification, `rename()` is one of the most complex operations within a filesystem.
@ -791,7 +758,7 @@ The behavior of HDFS here should not be considered a feature to replicate.
to the `DFSFileSystem` implementation is an ongoing matter for debate.
### `concat(Path p, Path sources[])`
### `void concat(Path p, Path sources[])`
Joins multiple blocks together to create a single file. This
is a little-used operation currently implemented only by HDFS.
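The `concat` contract can be sketched over the files dictionary of the model: the sources' data is appended, in order, to the target, and the source paths cease to exist. This is an illustrative sketch of the outcome, assuming the target is not among the sources.

```python
# Illustrative sketch of concat(target, sources) over a dictionary
# mapping paths to byte strings: source data is appended to the target
# in order, and the source entries are removed.
def concat(files, target, sources):
    new_files = dict(files)
    appended = b"".join(new_files.pop(s) for s in sources)
    new_files[target] = new_files[target] + appended
    return new_files
```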

@ -14,9 +14,21 @@
# A Model of a Hadoop Filesystem
* [Paths and Path Elements](#Paths_and_Path_Elements)
* [Predicates and Functions](#Predicates_and_Functions)
* [Notes for relative paths](#Notes_for_relative_paths)
* [Defining the Filesystem](#Defining_the_Filesystem)
* [Directory references](#Directory_references)
* [File references](#File_references)
* [Symbolic references](#Symbolic_references)
* [File Length](#File_Length)
* [User home](#User_home)
* [Exclusivity](#Exclusivity)
* [Encryption Zone](#Encryption_Zone)
* [Notes](#Notes)
#### Paths and Path Elements
## Paths and Path Elements
A Path is a list of Path elements which represents a path to a file, directory or symbolic link
@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element.
When validating path elements, the exception `InvalidPathException` SHOULD
be raised when a path is invalid [HDFS]
Predicate: `valid-path-element:List[String];`
### Predicates and Functions
#### `valid-path-element(List[String]): bool`
A path element `pe` is invalid if any character in it is in the set of forbidden characters,
or the element as a whole is invalid
@ -41,17 +55,20 @@ or the element as a whole is invalid
not pe in {"", ".", "..", "/"}
Predicate: `valid-path:List<PathElement>`
#### `valid-path(List[PathElement]): bool`
A Path `p` is *valid* if all path elements in it are valid
def valid-path(pe): forall pe in Path: valid-path-element(pe)
def valid-path(path): forall pe in path: valid-path-element(pe)
The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.
The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`.
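The two validity predicates can be modelled directly in Python. The forbidden-character set below is an assumption for illustration (filesystems MAY forbid other characters); the reserved element names are those given above.

```python
# Illustrative model of the path-validity predicates, in the spec's
# python-like style. FORBIDDEN_CHARS is an assumption of this sketch.
FORBIDDEN_CHARS = {"/", ":"}
RESERVED_ELEMENTS = {"", ".", "..", "/"}

def valid_path_element(pe):
    """A path element is valid iff it is not a reserved name and
    contains no forbidden character."""
    return (pe not in RESERVED_ELEMENTS
            and not any(c in FORBIDDEN_CHARS for c in pe))

def valid_path(path):
    """A path is valid iff every element in it is valid; the empty
    list [] is the root path "/" and is trivially valid."""
    return all(valid_path_element(pe) for pe in path)
```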
#### `parent(path:Path): Path`
The partial function `parent(path:Path):Path` returns the parent path; it can be defined using
list slicing.
@ -62,7 +79,7 @@ Preconditions:
path != []
#### `filename:Path->PathElement`
#### `filename(Path): PathElement`
The last Path Element in a Path is called the filename.
@ -72,7 +89,7 @@ Preconditions:
p != []
#### `childElements:(Path p, Path q):Path`
#### `childElements(Path p, Path q): Path`
The partial function `childElements:(Path p, Path q):Path`
@ -87,12 +104,12 @@ Preconditions:
q == p[:len(q)]
#### ancestors(Path): List[Path]
#### `ancestors(Path): List[Path]`
The list of all paths that are either the direct parent of a path p, or a parent of
an ancestor of p.
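The path functions above (`parent`, `filename`, `childElements`, `ancestors`) can be sketched with Python list slicing, as the definitions suggest; violated preconditions are modelled as exceptions. This is an illustrative model, not Hadoop code.

```python
# Illustrative sketches of the partial path functions via list slicing.
def parent(path):
    """parent(path) = path[0:len(path)-1]; undefined for the root []."""
    if not path:
        raise ValueError("the root path has no parent")
    return path[:-1]

def filename(path):
    """The last path element of a non-empty path."""
    if not path:
        raise ValueError("the root path has no filename")
    return path[-1]

def child_elements(p, q):
    """The elements of p below its ancestor q: p == q + child_elements(p, q).
    Precondition: q == p[:len(q)]."""
    if p[:len(q)] != q:
        raise ValueError("q is not an ancestor of p")
    return p[len(q):]

def ancestors(path):
    """Every proper prefix of path: parent, grandparent, ... down to the root []."""
    return [path[:i] for i in range(len(path))]
```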
#### Notes
### Notes for relative paths
This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
by declaring that the root (and only the root) path element may be ['/'].
@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/'].
Relative paths can then be distinguished from absolute paths as the input to any function and resolved when the second entry in a two-argument function
such as `rename`.
### Defining the Filesystem
## Defining the Filesystem
A filesystem `FS` contains a set of directories, a dictionary mapping paths to file data, and a set of symbolic links
(Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path])
(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
Accessor functions return the specific element of a filesystem
def directories(FS) = FS.Directories
def file(FS) = FS.Files
def files(FS) = FS.Files
def symlinks(FS) = FS.Symlinks
def filenames(FS) = keys(FS.Files)
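The model and its accessor functions map naturally onto a named tuple. Representing paths as tuples of path elements, so that they can serve as set members and dictionary keys, is an assumption of this sketch rather than part of the spec.

```python
from collections import namedtuple

# Illustrative model: the three components of a filesystem.
FS = namedtuple("FS", ["Directories", "Files", "Symlinks"])

# Accessor functions return the specific element of a filesystem.
def directories(fs): return fs.Directories
def files(fs):       return fs.Files
def symlinks(fs):    return fs.Symlinks
def filenames(fs):   return set(fs.Files.keys())

# Example instance: the root and /a are directories; /a/b is a file.
fs = FS(Directories={(), ("a",)},
        Files={("a", "b"): b"data"},
        Symlinks=set())
```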
@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al
#### Directory references
### Directory references
A path MAY refer to a directory in a FileSystem:
@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
#### File references
### File references
A path MAY refer to a file; that is, it has data in the filesystem; its path is a key in the data dictionary
def isFile(FS, p) = p in FS.Files
#### Symbolic references
### Symbolic references
A path MAY refer to a symbolic link:
def isSymlink(FS, p) = p in symlinks(FS)
#### File Length
### File Length
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
for HDFS,
#### Exclusivity
### Exclusivity
A path cannot refer to more than one of a file, a directory or a symbolic link
@ -218,7 +236,33 @@ This implies that only files may have data.
This condition is invariant and is an implicit postcondition of all
operations that manipulate the state of a FileSystem `FS`.
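The invariant can be checked mechanically as pairwise disjointness of the three components of the model; the following is an illustrative Python sketch, using the same (Directories, Files, Symlinks) shape as above.

```python
from collections import namedtuple

# Illustrative check of the exclusivity invariant: the three path sets
# of the model must be pairwise disjoint.
FS = namedtuple("FS", ["Directories", "Files", "Symlinks"])

def exclusivity_holds(fs):
    """True iff no path refers to more than one of a file, a directory
    or a symbolic link."""
    dirs, file_paths, links = fs.Directories, set(fs.Files), fs.Symlinks
    return (dirs.isdisjoint(file_paths)
            and dirs.isdisjoint(links)
            and file_paths.isdisjoint(links))
```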
### Notes
### Encryption Zone
The data is encrypted if the file is in an encryption zone.
def inEncryptionZone(FS, path): bool
The nature of the encryption and the mechanism for creating an encryption zone
are implementation details not covered in this specification.
No guarantees are made about the quality of the encryption.
The metadata is not encrypted.
All files and directories under a directory in an encryption zone are also in an
encryption zone.
forall d in directories(FS): inEncryptionZone(FS, d) implies
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
inEncryptionZone(FS, c)
For all files in an encrypted zone, the data is encrypted, but the encryption
type and specification are not defined.
forall f in files(FS) where inEncryptionZone(FS, f):
isEncrypted(data(f))
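A minimal sketch of zone membership, assuming zones are modelled as a set of zone-root paths; the spec deliberately leaves the creation mechanism undefined, so the names here are assumptions of this sketch, not the Hadoop API.

```python
# Illustrative sketch: encryption zones as a set of zone-root paths
# (tuples of path elements). A path is in a zone iff it, or any of its
# ancestors, is a zone root, which makes the invariant above hold by
# construction.
def in_encryption_zone(zone_roots, path):
    prefixes = (path[:i] for i in range(len(path) + 1))
    return any(prefix in zone_roots for prefix in prefixes)
```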
## Notes
Not covered: hard links in a FileSystem. If a FileSystem supports multiple
references in *paths(FS)* to point to the same data, the outcome of operations