HADOOP-12229 Fix inconsistent subsection titles in filesystem.md. Contributed by Masatake Iwasaki
parent 8113855b3a
commit 111739df8f
|
@ -19,6 +19,10 @@
|
|||
|
||||
# class `org.apache.hadoop.fs.FileSystem`
|
||||
|
||||
* [Invariants](#Invariants)
|
||||
* [Predicates and other state access operations](#Predicates_and_other_state_access_operations)
|
||||
* [State Changing Operations](#State_Changing_Operations)
|
||||
|
||||
The abstract `FileSystem` class is the original class to access Hadoop filesystems;
|
||||
non-abstract subclasses exist for all Hadoop-supported filesystems.
|
||||
|
||||
|
@ -59,38 +63,6 @@ all operations on a valid FileSystem MUST result in a new FileSystem that is als
|
|||
|
||||
def isFile(FS, p) = p in files(FS)
|
||||
|
||||
### `boolean isSymlink(Path p)`
|
||||
|
||||
|
||||
def isSymlink(FS, p) = p in symlinks(FS)
|
||||
|
||||
### 'boolean inEncryptionZone(Path p)'
|
||||
|
||||
Return True if the data for p is encrypted. The nature of the encryption and the
|
||||
mechanism for creating an encryption zone are implementation details not covered
|
||||
in this specification. No guarantees are made about the quality of the
|
||||
encryption. The metadata is not encrypted.
|
||||
|
||||
#### Preconditions
|
||||
|
||||
if not exists(FS, p) : raise FileNotFoundException
|
||||
|
||||
#### Postconditions
|
||||
|
||||
#### Invariants
|
||||
|
||||
All files and directories under a directory in an encryption zone are also in an
|
||||
encryption zone
|
||||
|
||||
forall d in directories(FS): inEncyptionZone(FS, d) implies
|
||||
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
|
||||
inEncyptionZone(FS, c)
|
||||
|
||||
For all files in an encrypted zone, the data is encrypted, but the encryption
|
||||
type and specification are not defined.
|
||||
|
||||
forall f in files(FS) where inEncyptionZone(FS, c):
|
||||
isEncrypted(data(f))
|
||||
|
||||
### `FileStatus getFileStatus(Path p)`
|
||||
|
||||
|
@ -98,12 +70,10 @@ Get the status of a path
|
|||
|
||||
#### Preconditions
|
||||
|
||||
|
||||
if not exists(FS, p) : raise FileNotFoundException
|
||||
|
||||
#### Postconditions
|
||||
|
||||
|
||||
result = stat: FileStatus where:
|
||||
if isFile(FS, p) :
|
||||
stat.length = len(FS.Files[p])
|
||||
|
@ -120,6 +90,7 @@ Get the status of a path
|
|||
else
|
||||
stat.isEncrypted = False
|
||||
|
||||
|
||||
### `Path getHomeDirectory()`
|
||||
|
||||
The function `getHomeDirectory` returns the home directory for the FileSystem
|
||||
|
@ -152,7 +123,7 @@ code may fail.
|
|||
fail with a RuntimeException or subclass thereof if there is a connectivity
|
||||
problem. The time to execute the operation is not bounded.
|
||||
|
||||
### `FileSystem.listStatus(Path, PathFilter )`
|
||||
### `FileStatus[] listStatus(Path p, PathFilter filter)`
|
||||
|
||||
A `PathFilter` `f` is a predicate function that returns true iff the path `p`
|
||||
meets the filter's conditions.
|
||||
|
@ -188,7 +159,7 @@ While HDFS currently returns an alphanumerically sorted list, neither the Posix
|
|||
nor Java's `File.listFiles()` API calls define any ordering of returned values. Applications
|
||||
which require a uniform sort order on the results must perform the sorting themselves.
|
||||
|
||||
### Atomicity and Consistency
|
||||
#### Atomicity and Consistency
|
||||
|
||||
By the time the `listStatus()` operation returns to the caller, there
|
||||
is no guarantee that the information contained in the response is current.
|
||||
|
@ -243,7 +214,7 @@ these inconsistent views are only likely when listing a directory with many chil
|
|||
Other filesystems may have stronger consistency guarantees, or return inconsistent
|
||||
data more readily.
|
||||
|
||||
### ` List[BlockLocation] getFileBlockLocations(FileStatus f, int s, int l)`
|
||||
### `BlockLocation[] getFileBlockLocations(FileStatus f, int s, int l)`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -290,7 +261,7 @@ of elements as the cluster topology MUST be provided, hence Filesystems SHOULD
|
|||
return that `"/default/localhost"` path
|
||||
|
||||
|
||||
### `getFileBlockLocations(Path P, int S, int L)`
|
||||
### `BlockLocation[] getFileBlockLocations(Path P, int S, int L)`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -304,7 +275,7 @@ return that `"/default/localhost"` path
|
|||
result = getFileBlockLocations(getStatus(P), S, L)
|
||||
|
||||
|
||||
### `getDefaultBlockSize()`
|
||||
### `long getDefaultBlockSize()`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -322,7 +293,7 @@ Any FileSystem that does not actually break files into blocks SHOULD
|
|||
return a number for this that results in efficient processing.
|
||||
A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this).
|
||||
|
||||
### `getDefaultBlockSize(Path P)`
|
||||
### `long getDefaultBlockSize(Path p)`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -340,7 +311,7 @@ different paths, in which case the specific default value for the destination pa
|
|||
SHOULD be returned.
|
||||
|
||||
|
||||
### `getBlockSize(Path P)`
|
||||
### `long getBlockSize(Path p)`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -515,7 +486,7 @@ exists in the metadata, but no copies of any its blocks can be located;
|
|||
-`FileNotFoundException` would seem more accurate and useful.
|
||||
|
||||
|
||||
### `FileSystem.delete(Path P, boolean recursive)`
|
||||
### `boolean delete(Path p, boolean recursive)`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
|
@ -619,12 +590,8 @@ implement `delete()` as recursive listing and file delete operation.
|
|||
This can break the expectations of client applications -and means that
|
||||
they cannot be used as drop-in replacements for HDFS.
|
||||
|
||||
<!-- ============================================================= -->
|
||||
<!-- METHOD: rename() -->
|
||||
<!-- ============================================================= -->
|
||||
|
||||
|
||||
### `FileSystem.rename(Path src, Path d)`
|
||||
### `boolean rename(Path src, Path d)`
|
||||
|
||||
In terms of its specification, `rename()` is one of the most complex operations within a filesystem.
|
||||
|
||||
|
@ -791,7 +758,7 @@ The behavior of HDFS here should not be considered a feature to replicate.
|
|||
to the `DFSFileSystem` implementation is an ongoing matter for debate.
|
||||
|
||||
|
||||
### `concat(Path p, Path sources[])`
|
||||
### `void concat(Path p, Path sources[])`
|
||||
|
||||
Joins multiple blocks together to create a single file. This
|
||||
is a little-used operation currently implemented only by HDFS.
|
||||
|
|
|
@ -14,9 +14,21 @@
|
|||
|
||||
# A Model of a Hadoop Filesystem
|
||||
|
||||
* [Paths and Path Elements](#Paths_and_Path_Elements)
|
||||
* [Predicates and Functions](#Predicates_and_Functions)
|
||||
* [Notes for relative paths](#Notes_for_relative_paths)
|
||||
* [Defining the Filesystem](#Defining_the_Filesystem)
|
||||
* [Directory references](#Directory_references)
|
||||
* [File references](#File_references)
|
||||
* [Symbolic references](#Symbolic_references)
|
||||
* [File Length](#File_Length)
|
||||
* [User home](#User_home)
|
||||
* [Exclusivity](#Exclusivity)
|
||||
* [Encryption Zone](#Encryption_Zone)
|
||||
* [Notes](#Notes)
|
||||
|
||||
|
||||
#### Paths and Path Elements
|
||||
## Paths and Path Elements
|
||||
|
||||
A Path is a list of Path elements which represents a path to a file, directory or symbolic link
|
||||
|
||||
|
@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element.
|
|||
When validating path elements, the exception `InvalidPathException` SHOULD
|
||||
be raised when a path is invalid [HDFS]
|
||||
|
||||
Predicate: `valid-path-element:List[String];`
|
||||
### Predicates and Functions
|
||||
|
||||
#### `valid-path-element(List[String]): bool`
|
||||
|
||||
A path element `pe` is invalid if any character in it is in the set of forbidden characters,
|
||||
or the element as a whole is invalid
|
||||
|
@ -41,17 +55,20 @@ or the element as a whole is invalid
|
|||
not pe in {"", ".", "..", "/"}
|
||||
|
||||
|
||||
Predicate: `valid-path:List<PathElement>`
|
||||
#### `valid-path(List[PathElement]): bool`
|
||||
|
||||
A Path `p` is *valid* if all path elements in it are valid
|
||||
|
||||
def valid-path(pe): forall pe in Path: valid-path-element(pe)
|
||||
def valid-path(path): forall pe in path: valid-path-element(pe)
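The two predicates above can be sketched in Python as follows. This is a minimal illustration, not the specification's definition: the names `valid_path_element`, `valid_path`, `FORBIDDEN_CHARS` and `INVALID_ELEMENTS` are hypothetical, and the forbidden-character set is an assumption, because the hunk does not show the set the specification actually uses.

```python
from typing import List

# ASSUMPTION: illustrative forbidden characters only; the real set is
# defined in a part of the specification not shown in this diff.
FORBIDDEN_CHARS = {"/", ":"}

# Elements that are invalid as a whole, taken from the context line
# `not pe in {"", ".", "..", "/"}`.
INVALID_ELEMENTS = {"", ".", "..", "/"}


def valid_path_element(pe: str) -> bool:
    """A path element is valid if it is not a reserved element and
    contains no forbidden character."""
    if pe in INVALID_ELEMENTS:
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in pe)


def valid_path(path: List[str]) -> bool:
    """A path is valid if every element in it is valid.
    The empty list (the root path) is therefore valid."""
    return all(valid_path_element(pe) for pe in path)
```

Under these assumptions, `valid_path(["user", "alice"])` holds, while a path containing `".."` or an element with an embedded `"/"` is rejected.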
|
||||
|
||||
|
||||
The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.
|
||||
|
||||
The path represented by the empty list, `[]`, is the *root path*, and is denoted by the string `"/"`.
|
||||
|
||||
|
||||
#### `parent(path:Path): Path`
|
||||
|
||||
The partial function `parent(path:Path):Path` provides the parent path, and can be defined using
|
||||
list slicing.
|
||||
|
||||
|
@ -62,7 +79,7 @@ Preconditions:
|
|||
path != []
|
||||
|
||||
|
||||
#### `filename:Path->PathElement`
|
||||
#### `filename(Path): PathElement`
|
||||
|
||||
The last Path Element in a Path is called the filename.
|
||||
|
||||
|
@ -72,7 +89,7 @@ Preconditions:
|
|||
|
||||
p != []
|
||||
|
||||
#### `childElements:(Path p, Path q):Path`
|
||||
#### `childElements(Path p, Path q): Path`
|
||||
|
||||
|
||||
The partial function `childElements:(Path p, Path q):Path`
|
||||
|
@ -87,12 +104,12 @@ Preconditions:
|
|||
q == p[:len(q)]
|
||||
|
||||
|
||||
#### ancestors(Path): List[Path]
|
||||
#### `ancestors(Path): List[Path]`
|
||||
|
||||
The list of all paths that are either the direct parent of a path p, or a parent of
|
||||
an ancestor of p.
|
||||
|
||||
#### Notes
|
||||
### Notes for relative paths
|
||||
|
||||
This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
|
||||
by declaring that the root (and only the root) path element may be ['/'].
|
||||
|
@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/'].
|
|||
Relative paths can then be distinguished from absolute paths as the input to any function and resolved when the second entry in a two-argument function
|
||||
such as `rename`.
|
||||
|
||||
### Defining the Filesystem
|
||||
## Defining the Filesystem
|
||||
|
||||
|
||||
A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links
|
||||
|
||||
(Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path])
|
||||
(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
|
||||
|
||||
|
||||
Accessor functions return the specific element of a filesystem
|
||||
|
||||
def FS.Directories = FS.Directories
|
||||
def file(FS) = FS.Files
|
||||
def files(FS) = FS.Files
|
||||
def symlinks(FS) = FS.Symlinks
|
||||
def filenames(FS) = keys(FS.Files)
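As a rough illustration of the model tuple and its accessor functions, the following Python sketch represents a filesystem as a named tuple. The class name `FS`, the `Path` alias, and the use of tuples for paths are assumptions made for this example; the specification itself only gives the abstract definitions above.

```python
from typing import Dict, NamedTuple, Set, Tuple

# ASSUMPTION: a path is modelled as a tuple of path elements so that it
# is hashable; the specification describes it as a list.
Path = Tuple[str, ...]


class FS(NamedTuple):
    """(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])"""
    Directories: Set[Path]
    Files: Dict[Path, bytes]
    Symlinks: Set[Path]


def directories(fs: FS) -> Set[Path]:
    return fs.Directories


def files(fs: FS) -> Dict[Path, bytes]:
    return fs.Files


def symlinks(fs: FS) -> Set[Path]:
    return fs.Symlinks


def filenames(fs: FS) -> Set[Path]:
    # The keys of the Files dictionary are the paths that refer to files.
    return set(fs.Files.keys())
```

A filesystem holding only the root directory and one file could then be written as `FS({()}, {("a",): b"hi"}, set())`.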
|
||||
|
||||
|
@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al
|
|||
|
||||
|
||||
|
||||
#### Directory references
|
||||
### Directory references
|
||||
|
||||
A path MAY refer to a directory in a FileSystem:
|
||||
|
||||
|
@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P
|
|||
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
|
||||
|
||||
|
||||
#### File references
|
||||
### File references
|
||||
|
||||
A path MAY refer to a file; that is, it has data in the filesystem; its path is a key in the data dictionary
|
||||
|
||||
def isFile(FS, p) = p in FS.Files
|
||||
|
||||
|
||||
#### Symbolic references
|
||||
### Symbolic references
|
||||
|
||||
A path MAY refer to a symbolic link:
|
||||
|
||||
def isSymlink(FS, p) = p in symlinks(FS)
|
||||
|
||||
|
||||
#### File Length
|
||||
### File Length
|
||||
|
||||
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
|
||||
|
||||
|
@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an
|
|||
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
|
||||
for HDFS,
|
||||
|
||||
#### Exclusivity
|
||||
|
||||
### Exclusivity
|
||||
|
||||
A path cannot refer to more than one of a file, a directory or a symbolic link
|
||||
|
||||
|
@ -218,7 +236,33 @@ This implies that only files may have data.
|
|||
This condition is invariant and is an implicit postcondition of all
|
||||
operations that manipulate the state of a FileSystem `FS`.
|
||||
|
||||
### Notes
|
||||
|
||||
### Encryption Zone
|
||||
|
||||
The data is encrypted if the file is in an encryption zone.
|
||||
|
||||
def inEncryptionZone(FS, path): bool
|
||||
|
||||
The nature of the encryption and the mechanism for creating an encryption zone
|
||||
are implementation details not covered in this specification.
|
||||
No guarantees are made about the quality of the encryption.
|
||||
The metadata is not encrypted.
|
||||
|
||||
All files and directories under a directory in an encryption zone are also in an
|
||||
encryption zone.
|
||||
|
||||
forall d in directories(FS): inEncryptionZone(FS, d) implies
|
||||
forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
|
||||
inEncryptionZone(FS, c)
|
||||
|
||||
For all files in an encrypted zone, the data is encrypted, but the encryption
|
||||
type and specification are not defined.
|
||||
|
||||
forall f in files(FS) where inEncryptionZone(FS, f):
|
||||
isEncrypted(data(f))
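The first invariant (everything directly under a directory in an encryption zone is also in a zone) can be checked against a toy model, as in the sketch below. All names here are hypothetical: `in_zone` stands in for the `inEncryptionZone` predicate as a plain set of paths, and `children` is a simplified version of the specification's function of the same name. The second invariant is not modelled, since `isEncrypted` is left undefined by the specification.

```python
from typing import Dict, Set, Tuple

# ASSUMPTION: paths are tuples of path elements, with () as the root.
Path = Tuple[str, ...]


def children(paths: Set[Path], d: Path) -> Set[Path]:
    """Paths whose direct parent is d."""
    return {p for p in paths if len(p) == len(d) + 1 and p[:len(d)] == d}


def zone_invariant_holds(directories: Set[Path],
                         files: Dict[Path, bytes],
                         in_zone: Set[Path]) -> bool:
    """True if every file or directory directly under an in-zone
    directory is itself in a zone."""
    # Only files and directories are considered, matching the
    # `isFile(FS, c) or isDir(FS, c)` filter in the invariant.
    paths = directories | set(files)
    for d in directories:
        if d in in_zone:
            if any(c not in in_zone for c in children(paths, d)):
                return False
    return True
```

Because the check is applied to every in-zone directory, and each in-zone child directory is itself checked, the property extends to all descendants without explicit recursion.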
|
||||
|
||||
|
||||
## Notes
|
||||
|
||||
Not covered: hard links in a FileSystem. If a FileSystem supports multiple
|
||||
references in *paths(FS)* to point to the same data, the outcome of operations
|
||||
|
|