HADOOP-12229 Fix inconsistent subsection titles in filesystem.md. Contributed by Masatake Iwasaki

parent f4f0c5074d
commit 52dbafecc5
```diff
@@ -19,6 +19,10 @@

 # class `org.apache.hadoop.fs.FileSystem`

+* [Invariants](#Invariants)
+* [Predicates and other state access operations](#Predicates_and_other_state_access_operations)
+* [State Changing Operations](#State_Changing_Operations)
+
 The abstract `FileSystem` class is the original class to access Hadoop filesystems;
 non-abstract subclasses exist for all Hadoop-supported filesystems.

```
|
```diff
@@ -59,38 +63,6 @@ all operations on a valid FileSystem MUST result in a new FileSystem that is als

 def isFile(FS, p) = p in files(FS)

-### `boolean isSymlink(Path p)`
-
-
-def isSymlink(FS, p) = p in symlinks(FS)
-
-### 'boolean inEncryptionZone(Path p)'
-
-Return True if the data for p is encrypted. The nature of the encryption and the
-mechanism for creating an encryption zone are implementation details not covered
-in this specification. No guarantees are made about the quality of the
-encryption. The metadata is not encrypted.
-
-#### Preconditions
-
-if not exists(FS, p) : raise FileNotFoundException
-
-#### Postconditions
-
-#### Invariants
-
-All files and directories under a directory in an encryption zone are also in an
-encryption zone
-
-forall d in directories(FS): inEncyptionZone(FS, d) implies
-forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
-inEncyptionZone(FS, c)
-
-For all files in an encrypted zone, the data is encrypted, but the encryption
-type and specification are not defined.
-
-forall f in files(FS) where inEncyptionZone(FS, c):
-isEncrypted(data(f))

 ### `FileStatus getFileStatus(Path p)`

```
|
```diff
@@ -98,12 +70,10 @@ Get the status of a path

 #### Preconditions

-
 if not exists(FS, p) : raise FileNotFoundException

 #### Postconditions

-
 result = stat: FileStatus where:
 if isFile(FS, p) :
 stat.length = len(FS.Files[p])
```
|
```diff
@@ -120,6 +90,7 @@ Get the status of a path
 else
 stat.isEncrypted = False

+
 ### `Path getHomeDirectory()`

 The function `getHomeDirectory` returns the home directory for the FileSystem
```
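Only part of the `getFileStatus()` postconditions is visible in the two hunks above: the `stat.length` rule for files and the final `isEncrypted` branch. The Python model below covers those parts. It assumes that paths are tuples of path elements with `()` as the root, and that a directory reports length 0 (the model's File Length section says so). It also assumes that the elided branch sets `isEncrypted` exactly when the path is in an encryption zone. `FS`, `FileStatus`, `zone_roots` and `in_encryption_zone` are hypothetical names for this sketch, not Hadoop APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]  # a path is a tuple of path elements; () is the root "/"


@dataclass
class FS:
    directories: Set[Path] = field(default_factory=lambda: {()})
    files: Dict[Path, bytes] = field(default_factory=dict)
    # Hypothetical: roots of encryption zones (how zones are created is out of scope).
    zone_roots: Set[Path] = field(default_factory=set)


@dataclass
class FileStatus:
    path: Path
    length: int
    isdir: bool
    isEncrypted: bool


def in_encryption_zone(fs: FS, p: Path) -> bool:
    # p is in a zone if p itself or any ancestor of p is a zone root.
    return any(p[:i] in fs.zone_roots for i in range(len(p) + 1))


def get_file_status(fs: FS, p: Path) -> FileStatus:
    # Precondition: if not exists(FS, p) : raise FileNotFoundException
    if p not in fs.files and p not in fs.directories:
        raise FileNotFoundError("/" + "/".join(p))
    is_file = p in fs.files
    return FileStatus(
        path=p,
        # stat.length = len(FS.Files[p]) for files; directories report 0
        length=len(fs.files[p]) if is_file else 0,
        isdir=not is_file,
        # Assumed elided branch: True iff p is inside an encryption zone.
        isEncrypted=in_encryption_zone(fs, p),
    )
```

Python's `FileNotFoundError` stands in for the Java `FileNotFoundException` named in the precondition.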
|
```diff
@@ -152,7 +123,7 @@ code may fail.
 fail with a RuntimeException or subclass thereof if there is a connectivity
 problem. The time to execute the operation is not bounded.

-### `FileSystem.listStatus(Path, PathFilter )`
+### `FileStatus[] listStatus(Path p, PathFilter filter)`

 A `PathFilter` `f` is a predicate function that returns true iff the path `p`
 meets the filter's conditions.
```
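The renamed `listStatus(Path p, PathFilter filter)` section defines a `PathFilter` as a predicate over paths. The sketch below applies such a filter to a directory's direct children in a toy model. For brevity it returns paths rather than `FileStatus` objects. Two choices are assumptions, since the hunk does not show the relevant text:

- A file argument lists as itself if the filter accepts it.
- The result is sorted only to make the output deterministic, not because the specification requires an order.

```python
from typing import Callable, Dict, List, Set, Tuple

Path = Tuple[str, ...]
# A PathFilter returns True iff the path meets the filter's conditions.
PathFilter = Callable[[Path], bool]


def children(directories: Set[Path], files: Dict[Path, bytes], p: Path) -> List[Path]:
    # Direct children: entries exactly one element longer than p that start with p.
    entries = set(directories) | set(files)
    return sorted(c for c in entries if len(c) == len(p) + 1 and c[: len(p)] == p)


def list_status(directories: Set[Path], files: Dict[Path, bytes],
                p: Path, f: PathFilter) -> List[Path]:
    """Return the paths whose FileStatus entries listStatus(p, f) would contain."""
    if p in files:
        # Assumed: a file lists as itself, if the filter accepts it.
        return [p] if f(p) else []
    if p in directories:
        return [c for c in children(directories, files, p) if f(c)]
    raise FileNotFoundError("/" + "/".join(p))
```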
|
```diff
@@ -184,7 +155,7 @@ to the same path:
 fs == getFileStatus(fs.path)


-### Atomicity and Consistency
+#### Atomicity and Consistency

 By the time the `listStatus()` operation returns to the caller, there
 is no guarantee that the information contained in the response is current.
```
|
```diff
@@ -239,7 +210,7 @@ these inconsistent views are only likely when listing a directory with many chil
 Other filesystems may have stronger consistency guarantees, or return inconsistent
 data more readily.

-### ` List[BlockLocation] getFileBlockLocations(FileStatus f, int s, int l)`
+### `BlockLocation[] getFileBlockLocations(FileStatus f, int s, int l)`

 #### Preconditions

```
|
```diff
@@ -286,7 +257,7 @@ of elements as the cluster topology MUST be provided, hence Filesystems SHOULD
 return that `"/default/localhost"` path


-### `getFileBlockLocations(Path P, int S, int L)`
+### `BlockLocation[] getFileBlockLocations(Path P, int S, int L)`

 #### Preconditions

```
|
```diff
@@ -300,7 +271,7 @@ return that `"/default/localhost"` path
 result = getFileBlockLocations(getStatus(P), S, L)


-### `getDefaultBlockSize()`
+### `long getDefaultBlockSize()`

 #### Preconditions

```
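These hunks show only fragments of the `getFileBlockLocations()` specification: the `"/default/localhost"` topology path, and the rule that the `Path` variant delegates to the `FileStatus` variant. The Python sketch below models both for a filesystem with no real block placement. The handling of a null status, negative arguments, and a start offset at or past end of file is assumed, not taken from the elided text. `BlockLocation`, `FileStatus` and both functions are hypothetical model names, not the Hadoop classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileStatus:
    path: str
    length: int


@dataclass
class BlockLocation:
    hosts: List[str]
    topology_paths: List[str]
    offset: int
    length: int


def get_file_block_locations(f: Optional[FileStatus], s: int, l: int) -> Optional[List[BlockLocation]]:
    if f is None:
        return None  # assumed: a null status yields a null result
    if s < 0 or l < 0:
        raise ValueError("start and length must be non-negative")  # assumed check
    if f.length <= s:
        return []  # assumed: nothing to locate at or beyond the end of the file
    # No real block placement: report one block covering the whole file on
    # localhost, with the "/default/localhost" topology path.
    return [BlockLocation(["localhost"], ["/default/localhost"], 0, f.length)]


def get_file_block_locations_for_path(statuses: Dict[str, FileStatus],
                                      p: str, s: int, l: int) -> Optional[List[BlockLocation]]:
    # result = getFileBlockLocations(getStatus(P), S, L)
    if p not in statuses:
        raise FileNotFoundError(p)
    return get_file_block_locations(statuses[p], s, l)
```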
|
```diff
@@ -318,7 +289,7 @@ Any FileSystem that does not actually break files into blocks SHOULD
 return a number for this that results in efficient processing.
 A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this).

-### `getDefaultBlockSize(Path P)`
+### `long getDefaultBlockSize(Path p)`

 #### Preconditions

```
|
```diff
@@ -336,7 +307,7 @@ different paths, in which case the specific default value for the destination pa
 SHOULD be returned.


-### `getBlockSize(Path P)`
+### `long getBlockSize(Path p)`

 #### Preconditions

```
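The block-size hunks above set three rules:

- A filesystem without real blocks SHOULD still return a value that makes processing efficient.
- It MAY make that value configurable.
- It SHOULD return the destination-specific value when different paths have different defaults.

The class below is a hypothetical illustration of these rules. The configuration key `example.fs.block.size`, the 32 MiB fallback and the prefix-based `mount_block_sizes` overrides are invented for this sketch and do not correspond to any real Hadoop setting.

```python
from typing import Dict, Optional

DEFAULT_BLOCK_SIZE = 32 * 1024 * 1024  # hypothetical fallback: 32 MiB


class BlocklessFS:
    """Hypothetical filesystem that stores whole objects and has no real blocks."""

    def __init__(self, conf: Optional[Dict[str, str]] = None,
                 mount_block_sizes: Optional[Dict[str, int]] = None):
        self.conf = conf or {}
        # Hypothetical per-prefix defaults, e.g. paths routed to another store.
        self.mount_block_sizes = mount_block_sizes or {}

    def get_default_block_size(self, p: Optional[str] = None) -> int:
        if p is not None:
            # Path variant: the longest matching prefix supplies the default for p.
            matches = [m for m in self.mount_block_sizes
                       if p == m or p.startswith(m.rstrip("/") + "/")]
            if matches:
                return self.mount_block_sizes[max(matches, key=len)]
        # User-configurable value, falling back to a size chosen for efficient splits.
        return int(self.conf.get("example.fs.block.size", DEFAULT_BLOCK_SIZE))
```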
|
```diff
@@ -511,7 +482,7 @@ exists in the metadata, but no copies of any its blocks can be located;
 -`FileNotFoundException` would seem more accurate and useful.


-### `FileSystem.delete(Path P, boolean recursive)`
+### `boolean delete(Path p, boolean recursive)`

 #### Preconditions

```
|
```diff
@@ -615,12 +586,8 @@ implement `delete()` as recursive listing and file delete operation.
 This can break the expectations of client applications -and means that
 they cannot be used as drop-in replacements for HDFS.

-<!-- ============================================================= -->
-<!-- METHOD: rename() -->
-<!-- ============================================================= -->

-
-### `FileSystem.rename(Path src, Path d)`
+### `boolean rename(Path src, Path d)`

 In terms of its specification, `rename()` is one of the most complex operations within a filesystem .

```
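The hunk header notes that some filesystems implement `delete()` as a recursive listing followed by per-file delete operations. The hypothetical `FlatObjectStore` below models that approach over a flat key space to show why it is not atomic. A failure part-way through leaves some children deleted and others still present, and other clients can observe each step. Two rules here are assumed HDFS-like semantics, not taken from the hunks above: deleting a missing path returns `False`, and a non-recursive delete of a non-empty directory raises `IOError`.

```python
from typing import Dict


class FlatObjectStore:
    """Hypothetical object store: a flat map of keys to bytes, with no real directories."""

    def __init__(self, objects: Dict[str, bytes], fail_after: int = -1):
        self.objects = dict(objects)
        self.fail_after = fail_after  # test hook: simulate a failure after N deletes
        self.deletes = 0

    def _delete_one(self, key: str) -> None:
        if self.deletes == self.fail_after:
            raise IOError("simulated failure during delete")
        del self.objects[key]
        self.deletes += 1

    def delete(self, path: str, recursive: bool) -> bool:
        if path in self.objects:  # a plain object ("file")
            self._delete_one(path)
            return True
        # A "directory" exists only as a common key prefix.
        prefix = path.rstrip("/") + "/"
        keys = sorted(k for k in self.objects if k.startswith(prefix))
        if not keys:
            return False  # assumed: deleting a missing path returns False
        if not recursive:
            raise IOError("directory is not empty: " + path)
        # Recursive listing, then one delete per object: each step is visible
        # to other clients, and a failure leaves a partial result behind.
        for key in keys:
            self._delete_one(key)
        return True
```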
|
```diff
@@ -787,7 +754,7 @@ The behavior of HDFS here should not be considered a feature to replicate.
 to the `DFSFileSystem` implementation is an ongoing matter for debate.


-### `concat(Path p, Path sources[])`
+### `void concat(Path p, Path sources[])`

 Joins multiple blocks together to create a single file. This
 is a little-used operation currently implemented only by HDFS.
```
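Here `concat()` is described only by its opening sentence. The sketch below models it over a map of file paths to bytes and assumes an HDFS-style postcondition. Under that postcondition, the target's data becomes its old data followed by each source's data in order, and the sources no longer exist afterwards. The preconditions checked here are a simplified assumption: the target and sources exist, and the sources are non-empty, distinct, and do not include the target. The real specification may impose further constraints that this sketch does not model.

```python
from typing import Dict, List


def concat(files: Dict[str, bytes], p: str, sources: List[str]) -> Dict[str, bytes]:
    """Return the file map FS' after concatenating sources onto p (a model, not HDFS code)."""
    if p not in files:
        raise FileNotFoundError(p)
    if not sources:
        raise ValueError("sources must not be empty")
    if p in sources or len(set(sources)) != len(sources):
        raise ValueError("sources must be distinct and must not include the target")
    for s in sources:
        if s not in files:
            raise FileNotFoundError(s)
    result = dict(files)  # leave the input FS unchanged
    # data(FS', p) = data(FS, p) + data(FS, sources[0]) + ... + data(FS, sources[-1])
    result[p] = files[p] + b"".join(files[s] for s in sources)
    # ...and none of the sources exist in FS'.
    for s in sources:
        del result[s]
    return result
```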
|
|
|
```diff
@@ -14,9 +14,21 @@

 # A Model of a Hadoop Filesystem

+* [Paths and Path Elements](#Paths_and_Path_Elements)
+* [Predicates and Functions](#Predicates_and_Functions)
+* [Notes for relative paths](#Notes_for_relative_paths)
+* [Defining the Filesystem](#Defining_the_Filesystem)
+* [Directory references](#Directory_references)
+* [File references](#File_references)
+* [Symbolic references](#Symbolic_references)
+* [File Length](#File_Length)
+* [User home](#User_home)
+* [Exclusivity](#Exclusivity)
+* [Encryption Zone](#Encryption_Zone)
+* [Notes](#Notes)


-#### Paths and Path Elements
+## Paths and Path Elements

 A Path is a list of Path elements which represents a path to a file, directory of symbolic link

```
|
```diff
@@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element.
 When validating path elements, the exception `InvalidPathException` SHOULD
 be raised when a path is invalid [HDFS]

-Predicate: `valid-path-element:List[String];`
+### Predicates and Functions
+
+#### `valid-path-element(List[String]): bool`

 A path element `pe` is invalid if any character in it is in the set of forbidden characters,
 or the element as a whole is invalid
```
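The `valid-path-element` predicate translates directly into Python. The specification leaves the set of forbidden characters to each filesystem, so the `{"/", ":"}` default here is only an illustrative assumption. The reserved whole-element values come from the `not pe in {"", ".", "..", "/"}` condition in the model.

```python
from typing import FrozenSet

# Illustrative assumption: each filesystem defines its own forbidden characters.
DEFAULT_FORBIDDEN: FrozenSet[str] = frozenset({"/", ":"})
# Elements that are invalid as a whole: not pe in {"", ".", "..", "/"}
RESERVED_ELEMENTS: FrozenSet[str] = frozenset({"", ".", "..", "/"})


def valid_path_element(pe: str, forbidden: FrozenSet[str] = DEFAULT_FORBIDDEN) -> bool:
    # Invalid if any character in the element is forbidden...
    if any(ch in forbidden for ch in pe):
        return False
    # ...or if the element as a whole is reserved.
    return pe not in RESERVED_ELEMENTS
```

A caller that must reject bad input could raise an exception when this returns `False`; the specification names `InvalidPathException` for that case.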
|
```diff
@@ -41,17 +55,20 @@ or the element as a whole is invalid
 not pe in {"", ".", "..", "/"}


-Predicate: `valid-path:List<PathElement>`
+#### `valid-path(List[PathElement]): bool`

 A Path `p` is *valid* if all path elements in it are valid

-def valid-path(pe): forall pe in Path: valid-path-element(pe)
+def valid-path(path): forall pe in path: valid-path-element(pe)


 The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.

 The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`.

+
+#### `parent(path:Path): Path`
+
 The partial function `parent(path:Path):Path` provides the parent path can be defined using
 list slicing.

```
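The corrected `valid-path` definition quantifies over the elements of `path`. The old definition reused `pe` as both the argument and the bound variable. The Python rendering below represents the root path `[]` as the empty tuple. The forbidden-character set is an illustrative assumption, and `to_string` is a hypothetical helper added only to show that the root renders as `"/"`.

```python
from typing import FrozenSet, Sequence, Tuple

# Illustrative assumption: the forbidden characters are filesystem-specific.
FORBIDDEN_CHARS: FrozenSet[str] = frozenset({"/", ":"})
RESERVED_ELEMENTS: FrozenSet[str] = frozenset({"", ".", "..", "/"})
ROOT: Tuple[str, ...] = ()  # the empty list [] is the root path


def valid_path_element(pe: str) -> bool:
    return not any(ch in FORBIDDEN_CHARS for ch in pe) and pe not in RESERVED_ELEMENTS


def valid_path(path: Sequence[str]) -> bool:
    # def valid-path(path): forall pe in path: valid-path-element(pe)
    return all(valid_path_element(pe) for pe in path)


def to_string(path: Sequence[str]) -> str:
    # Hypothetical helper: the root [] is denoted by the string "/".
    return "/" + "/".join(path)
```

The root path is valid vacuously, because it has no elements to check.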
|
```diff
@@ -62,7 +79,7 @@ Preconditions:
 path != []


-#### `filename:Path->PathElement`
+#### `filename(Path): PathElement`

 The last Path Element in a Path is called the filename.

```
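Both `parent` and `filename` are partial functions. Each is defined by list slicing and requires `path != []`. The sketch below models paths as tuples and raises `ValueError` outside the domain. That exception choice is an assumption; the specification only states the precondition.

```python
from typing import Tuple

Path = Tuple[str, ...]


def parent(path: Path) -> Path:
    # Precondition: path != [] (the root has no parent).
    if not path:
        raise ValueError("parent() is undefined for the root path")
    return path[:-1]


def filename(path: Path) -> str:
    # Precondition: path != []; the filename is the last path element.
    if not path:
        raise ValueError("filename() is undefined for the root path")
    return path[-1]
```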
|
```diff
@@ -72,7 +89,7 @@ Preconditions:

 p != []

-#### `childElements:(Path p, Path q):Path`
+#### `childElements(Path p, Path q): Path`


 The partial function `childElements:(Path p, Path q):Path`
```
|
```diff
@@ -87,12 +104,12 @@ Preconditions:
 q == p[:len(q)]


-#### ancestors(Path): List[Path]
+#### `ancestors(Path): List[Path]`

 The list of all paths that are either the direct parent of a path p, or a parent of
 ancestor of p.

-#### Notes
+### Notes for relative paths

 This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
 by declaring that the root (and only the root) path element may be ['/'].
```
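`childElements(p, q)` requires `q == p[:len(q)]`, which means `q` is a prefix of `p`. The sketch below assumes it returns the remaining elements `p[len(q):]`. `ancestors(p)` collects the parent, the parent's parent, and so on up to the root. Returning the ancestors nearest-first is an arbitrary choice made for this sketch.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def child_elements(p: Path, q: Path) -> Path:
    # Precondition: q == p[:len(q)], i.e. q is a prefix of p.
    if q != p[: len(q)]:
        raise ValueError("q is not a prefix of p")
    # Assumed result: the elements of p that follow q.
    return p[len(q):]


def ancestors(p: Path) -> List[Path]:
    # Parent first, then the parent's parent, ..., ending with the root ().
    return [p[:i] for i in range(len(p) - 1, -1, -1)]
```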
|
```diff
@@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/'].
 Relative paths can then be distinguished from absolute paths as the input to any function and resolved when the second entry in a two-argument function
 such as `rename`.

-### Defining the Filesystem
+## Defining the Filesystem


 A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links

-(Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path])
+(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])


 Accessor functions return the specific element of a filesystem

 def FS.Directories = FS.Directories
-def file(FS) = FS.Files
+def files(FS) = FS.Files
 def symlinks(FS) = FS.Symlinks
 def filenames(FS) = keys(FS.Files)

```
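The filesystem tuple `(Directories, Files, Symlinks)` and its accessors can be modeled with a Python dataclass. This includes the `files()` accessor renamed in this hunk. `bytes` stands in for `List[byte]`. Seeding `Directories` with the root path `()` is a choice of this sketch. `paths()` is a hypothetical convenience accessor that returns every path the model knows about.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]  # () is the root path "/"


@dataclass
class FS:
    # (Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
    Directories: Set[Path] = field(default_factory=lambda: {()})
    Files: Dict[Path, bytes] = field(default_factory=dict)
    Symlinks: Set[Path] = field(default_factory=set)


def directories(fs: FS) -> Set[Path]:
    return fs.Directories


def files(fs: FS) -> Dict[Path, bytes]:
    return fs.Files


def symlinks(fs: FS) -> Set[Path]:
    return fs.Symlinks


def filenames(fs: FS) -> Set[Path]:
    # def filenames(FS) = keys(FS.Files)
    return set(fs.Files.keys())


def paths(fs: FS) -> Set[Path]:
    # Hypothetical convenience accessor: every directory, file and symlink path.
    return set(fs.Directories) | set(fs.Files) | set(fs.Symlinks)
```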
|
```diff
@@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al



-#### Directory references
+### Directory references

 A path MAY refer to a directory in a FileSystem:

```
|
```diff
@@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P
 def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}


-#### File references
+### File references

 A path MAY refer to a file; that it it has data in the filesystem; its path is a key in the data dictionary

 def isFile(FS, p) = p in FS.Files


-#### Symbolic references
+### Symbolic references

 A path MAY refer to a symbolic link:

 def isSymlink(FS, p) = p in symlinks(FS)


-#### File Length
+### File Length

 The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:

```
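The reference predicates, `descendants`, and the File Length rule in this hunk can be exercised together on a small model. Here `is_descendant(d, p)` is assumed to mean that `d` is a proper prefix of `p`. That matches the hunk header's description of a path whose parent or ancestor is `P`. Raising `FileNotFoundError` for the length of a nonexistent path is also an assumption.

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


def is_dir(directories: Set[Path], p: Path) -> bool:
    return p in directories


def is_file(files: Dict[Path, bytes], p: Path) -> bool:
    # def isFile(FS, p) = p in FS.Files
    return p in files


def is_symlink(symlinks: Set[Path], p: Path) -> bool:
    # def isSymlink(FS, p) = p in symlinks(FS)
    return p in symlinks


def is_descendant(d: Path, p: Path) -> bool:
    # p lies strictly below d: d is a proper prefix of p.
    return len(p) > len(d) and p[: len(d)] == d


def descendants(paths: Set[Path], d: Path) -> Set[Path]:
    # def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
    return {p for p in paths if is_descendant(d, p)}


def length(directories: Set[Path], files: Dict[Path, bytes], p: Path) -> int:
    # Length of the stored data for a file, 0 for a directory.
    if is_file(files, p):
        return len(files[p])
    if is_dir(directories, p):
        return 0
    raise FileNotFoundError("/" + "/".join(p))
```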
|
```diff
@@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an
 For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
 for HDFS,

-#### Exclusivity
+
+### Exclusivity

 A path cannot refer to more than one of a file, a directory or a symbolic link

```
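The Exclusivity condition amounts to pairwise disjointness of the directory, file, and symlink sets. The condition is an implicit postcondition of every state-changing operation. To illustrate this, the sketch also includes a hypothetical `mkdirs_checked` operation that verifies the condition before returning. That function's name and the `FileExistsError` it raises are illustrative, not Hadoop behavior.

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


def exclusive(directories: Set[Path], files: Dict[Path, bytes], symlinks: Set[Path]) -> bool:
    """True iff no path is more than one of a directory, a file, or a symlink."""
    file_paths = set(files)
    return not (directories & file_paths or directories & symlinks or file_paths & symlinks)


def mkdirs_checked(directories: Set[Path], files: Dict[Path, bytes],
                   symlinks: Set[Path], p: Path) -> Set[Path]:
    """Hypothetical mkdirs: add p and all its ancestors, then enforce exclusivity."""
    new_dirs = set(directories) | {p[:i] for i in range(len(p) + 1)}
    if not exclusive(new_dirs, files, symlinks):
        # Some path on the way to p is already a file or a symlink.
        raise FileExistsError("/" + "/".join(p))
    return new_dirs
```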
|
```diff
@@ -218,7 +236,33 @@ This implies that only files may have data.
 This condition is invariant and is an implicit postcondition of all
 operations that manipulate the state of a FileSystem `FS`.

-### Notes
+
+### Encryption Zone
+
+The data is encrypted if the file is in encryption zone.
+
+def inEncryptionZone(FS, path): bool
+
+The nature of the encryption and the mechanism for creating an encryption zone
+are implementation details not covered in this specification.
+No guarantees are made about the quality of the encryption.
+The metadata is not encrypted.
+
+All files and directories under a directory in an encryption zone are also in an
+encryption zone.
+
+forall d in directories(FS): inEncyptionZone(FS, d) implies
+forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
+inEncyptionZone(FS, c)
+
+For all files in an encrypted zone, the data is encrypted, but the encryption
+type and specification are not defined.
+
+forall f in files(FS) where inEncyptionZone(FS, f):
+isEncrypted(data(f))
+
+
+## Notes

 Not covered: hard links in a FileSystem. If a FileSystem supports multiple
 references in *paths(FS)* to point to the same data, the outcome of operations
```
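This hunk adds two encryption-zone invariants, and both can be checked mechanically on a toy model. The specification deliberately leaves zone membership and the encryption mechanism undefined. The checker therefore takes `in_zone` and `is_encrypted` as hypothetical callables supplied by the caller.

```python
from typing import Callable, Dict, Set, Tuple

Path = Tuple[str, ...]


def check_encryption_invariants(
    directories: Set[Path],
    files: Dict[Path, bytes],
    in_zone: Callable[[Path], bool],        # hypothetical stand-in for inEncryptionZone
    is_encrypted: Callable[[bytes], bool],  # hypothetical stand-in for isEncrypted
) -> bool:
    entries = set(directories) | set(files)
    # forall d in directories(FS): inEncryptionZone(FS, d) implies
    #   forall c in children(FS, d): inEncryptionZone(FS, c)
    for d in directories:
        if in_zone(d):
            for c in entries:
                if len(c) == len(d) + 1 and c[: len(d)] == d and not in_zone(c):
                    return False
    # forall f in files(FS) where inEncryptionZone(FS, f): isEncrypted(data(f))
    return all(is_encrypted(data) for f, data in files.items() if in_zone(f))
```

For example, a caller might model zones as subtree roots, `lambda p: any(p[:i] in zone_roots for i in range(len(p) + 1))`. It could treat data starting with `b"ENC:"` as encrypted, purely for illustration.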
|
|