diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md index 84e375508c2..d81208d1636 100644 --- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md +++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md @@ -19,6 +19,10 @@ # class `org.apache.hadoop.fs.FileSystem` +* [Invariants](#Invariants) +* [Predicates and other state access operations](#Predicates_and_other_state_access_operations) +* [State Changing Operations](#State_Changing_Operations) + The abstract `FileSystem` class is the original class to access Hadoop filesystems; non-abstract subclasses exist for all Hadoop-supported filesystems. @@ -59,38 +63,6 @@ all operations on a valid FileSystem MUST result in a new FileSystem that is als def isFile(FS, p) = p in files(FS) -### `boolean isSymlink(Path p)` - - - def isSymlink(FS, p) = p in symlinks(FS) - -### 'boolean inEncryptionZone(Path p)' - -Return True if the data for p is encrypted. The nature of the encryption and the -mechanism for creating an encryption zone are implementation details not covered -in this specification. No guarantees are made about the quality of the -encryption. The metadata is not encrypted. - -#### Preconditions - - if not exists(FS, p) : raise FileNotFoundException - -#### Postconditions - -#### Invariants - -All files and directories under a directory in an encryption zone are also in an -encryption zone - - forall d in directories(FS): inEncyptionZone(FS, d) implies - forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) : - inEncyptionZone(FS, c) - -For all files in an encrypted zone, the data is encrypted, but the encryption -type and specification are not defined. 
- - forall f in files(FS) where inEncyptionZone(FS, c): - isEncrypted(data(f)) ### `FileStatus getFileStatus(Path p)` @@ -98,12 +70,10 @@ Get the status of a path #### Preconditions - if not exists(FS, p) : raise FileNotFoundException #### Postconditions - result = stat: FileStatus where: if isFile(FS, p) : stat.length = len(FS.Files[p]) @@ -120,6 +90,7 @@ Get the status of a path else stat.isEncrypted = False + ### `Path getHomeDirectory()` The function `getHomeDirectory` returns the home directory for the FileSystem @@ -152,7 +123,7 @@ code may fail. fail with a RuntimeException or subclass thereof if there is a connectivity problem. The time to execute the operation is not bounded. -### `FileSystem.listStatus(Path, PathFilter )` +### `FileStatus[] listStatus(Path p, PathFilter filter)` A `PathFilter` `f` is a predicate function that returns true iff the path `p` meets the filter's conditions. @@ -184,7 +155,7 @@ to the same path: fs == getFileStatus(fs.path) -### Atomicity and Consistency +#### Atomicity and Consistency By the time the `listStatus()` operation returns to the caller, there is no guarantee that the information contained in the response is current. @@ -239,7 +210,7 @@ these inconsistent views are only likely when listing a directory with many chil Other filesystems may have stronger consistency guarantees, or return inconsistent data more readily. 
-### ` List[BlockLocation] getFileBlockLocations(FileStatus f, int s, int l)` +### `BlockLocation[] getFileBlockLocations(FileStatus f, int s, int l)` #### Preconditions @@ -286,7 +257,7 @@ of elements as the cluster topology MUST be provided, hence Filesystems SHOULD return that `"/default/localhost"` path -### `getFileBlockLocations(Path P, int S, int L)` +### `BlockLocation[] getFileBlockLocations(Path P, int S, int L)` #### Preconditions @@ -300,7 +271,7 @@ return that `"/default/localhost"` path result = getFileBlockLocations(getStatus(P), S, L) -### `getDefaultBlockSize()` +### `long getDefaultBlockSize()` #### Preconditions @@ -318,7 +289,7 @@ Any FileSystem that does not actually break files into blocks SHOULD return a number for this that results in efficient processing. A FileSystem MAY make this user-configurable (the S3 and Swift filesystem clients do this). -### `getDefaultBlockSize(Path P)` +### `long getDefaultBlockSize(Path p)` #### Preconditions @@ -336,7 +307,7 @@ different paths, in which case the specific default value for the destination pa SHOULD be returned. -### `getBlockSize(Path P)` +### `long getBlockSize(Path p)` #### Preconditions @@ -354,7 +325,7 @@ the `FileStatus` returned from `getFileStatus(P)`. ## State Changing Operations -### `boolean mkdirs(Path p, FsPermission permission )` +### `boolean mkdirs(Path p, FsPermission permission)` Create a directory and all its parents @@ -511,7 +482,7 @@ exists in the metadata, but no copies of any its blocks can be located; -`FileNotFoundException` would seem more accurate and useful. -### `FileSystem.delete(Path P, boolean recursive)` +### `boolean delete(Path p, boolean recursive)` #### Preconditions @@ -615,12 +586,8 @@ implement `delete()` as recursive listing and file delete operation. This can break the expectations of client applications -and means that they cannot be used as drop-in replacements for HDFS. 
- - - - -### `FileSystem.rename(Path src, Path d)` +### `boolean rename(Path src, Path d)` In terms of its specification, `rename()` is one of the most complex operations within a filesystem . @@ -787,7 +754,7 @@ The behavior of HDFS here should not be considered a feature to replicate. to the `DFSFileSystem` implementation is an ongoing matter for debate. -### `concat(Path p, Path sources[])` +### `void concat(Path p, Path sources[])` Joins multiple blocks together to create a single file. This is a little-used operation currently implemented only by HDFS. diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md index d00dcd674b3..c458671a155 100644 --- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md +++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md @@ -14,9 +14,21 @@ # A Model of a Hadoop Filesystem +* [Paths and Path Elements](#Paths_and_Path_Elements) + * [Predicates and Functions](#Predicates_and_Functions) + * [Notes for relative paths](#Notes_for_relative_paths) +* [Defining the Filesystem](#Defining_the_Filesystem) + * [Directory references](#Directory_references) + * [File references](#File_references) + * [Symbolic references](#Symbolic_references) + * [File Length](#File_Length) + * [User home](#User_home) + * [Exclusivity](#Exclusivity) + * [Encryption Zone](#Encryption_Zone) +* [Notes](#Notes) -#### Paths and Path Elements +## Paths and Path Elements A Path is a list of Path elements which represents a path to a file, directory of symbolic link @@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element. 
When validating path elements, the exception `InvalidPathException` SHOULD be raised when a path is invalid [HDFS] -Predicate: `valid-path-element:List[String];` +### Predicates and Functions + +#### `valid-path-element(List[String]): bool` A path element `pe` is invalid if any character in it is in the set of forbidden characters, or the element as a whole is invalid @@ -41,17 +55,20 @@ or the element as a whole is invalid not pe in {"", ".", "..", "/"} -Predicate: `valid-path:List` +#### `valid-path(List[PathElement]): bool` A Path `p` is *valid* if all path elements in it are valid - def valid-path(pe): forall pe in Path: valid-path-element(pe) + def valid-path(path): forall pe in path: valid-path-element(pe) The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements. The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`. + +#### `parent(path:Path): Path` + The partial function `parent(path:Path):Path` provides the parent path can be defined using list slicing. @@ -62,7 +79,7 @@ Preconditions: path != [] -#### `filename:Path->PathElement` +#### `filename(Path): PathElement` The last Path Element in a Path is called the filename. @@ -72,7 +89,7 @@ Preconditions: p != [] -#### `childElements:(Path p, Path q):Path` +#### `childElements(Path p, Path q): Path` The partial function `childElements:(Path p, Path q):Path` @@ -87,12 +104,12 @@ Preconditions: q == p[:len(q)] -#### ancestors(Path): List[Path] +#### `ancestors(Path): List[Path]` The list of all paths that are either the direct parent of a path p, or a parent of ancestor of p. -#### Notes +### Notes for relative paths This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably by declaring that the root (and only the root) path element may be ['/']. @@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/']. 
Relative paths can then be distinguished from absolute paths as the input to any function and resolved when the second entry in a two-argument function such as `rename`. -### Defining the Filesystem +## Defining the Filesystem A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links - (Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path]) + (Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path]) Accessor functions return the specific element of a filesystem def FS.Directories = FS.Directories - def file(FS) = FS.Files + def files(FS) = FS.Files def symlinks(FS) = FS.Symlinks def filenames(FS) = keys(FS.Files) @@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al -#### Directory references +### Directory references A path MAY refer to a directory in a FileSystem: @@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)} -#### File references +### File references A path MAY refer to a file; that it it has data in the filesystem; its path is a key in the data dictionary def isFile(FS, p) = p in FS.Files -#### Symbolic references +### Symbolic references A path MAY refer to a symbolic link: def isSymlink(FS, p) = p in symlinks(FS) -#### File Length +### File Length The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory: @@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However, for HDFS, -#### Exclusivity + +### Exclusivity A path cannot refer to more than one of a file, a directory or a symbolic link @@ -218,7 +236,33 @@ This implies that only files may have data. 
This condition is invariant and is an implicit postcondition of all operations that manipulate the state of a FileSystem `FS`. -### Notes + +### Encryption Zone + +The data is encrypted if the file is in an encryption zone. + + def inEncryptionZone(FS, path): bool + +The nature of the encryption and the mechanism for creating an encryption zone +are implementation details not covered in this specification. +No guarantees are made about the quality of the encryption. +The metadata is not encrypted. + +All files and directories under a directory in an encryption zone are also in an +encryption zone. + + forall d in directories(FS): inEncryptionZone(FS, d) implies + forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) : + inEncryptionZone(FS, c) + +For all files in an encryption zone, the data is encrypted, but the encryption +type and specification are not defined. + + forall f in files(FS) where inEncryptionZone(FS, f): + isEncrypted(data(f)) + + +## Notes Not covered: hard links in a FileSystem. If a FileSystem supports multiple references in *paths(FS)* to point to the same data, the outcome of operations