diff --git a/pep-0428.txt b/pep-0428.txt
new file mode 100644
index 000000000..0c890f2b9
--- /dev/null
+++ b/pep-0428.txt
@@ -0,0 +1,568 @@
+PEP: 428
+Title: The pathlib module -- object-oriented filesystem paths
+Version: $Revision$
+Last-Modified: $Date$
+Author: Antoine Pitrou
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 30-July-2012
+Python-Version: 3.4
+Post-History:
+
+
+Abstract
+========
+
+This PEP proposes the inclusion of a third-party module, `pathlib`_, in
+the standard library. The inclusion is proposed under the provisional
+label, as described in :pep:`411`. Therefore, API changes can be made,
+either as part of the PEP process or after acceptance into the standard
+library (until the provisional label is removed).
+
+The aim of this library is to provide a simple hierarchy of classes to
+handle filesystem paths and the common operations users perform on them.
+
+.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
+
+
+Related work
+============
+
+An object-oriented API for filesystem paths has already been proposed
+and rejected in :pep:`355`. Several third-party implementations of the
+idea of object-oriented filesystem paths exist in the wild:
+
+* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
+  and others, which provides a ``str``-subclassing ``Path`` class;
+
+* Twisted's slightly specialized `FilePath class`_;
+
+* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
+  ``str``;
+
+* `Unipath`_, a variation on the str-subclassing approach with two public
+  classes, an ``AbstractPath`` class for operations which don't do I/O and a
+  ``Path`` class for all common operations.
+
+This proposal attempts to learn from these previous attempts and the
+rejection of :pep:`355`.
+
+
+.. _`path.py module`: https://github.com/jaraco/path.py
+.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
+.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
+.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
+
+
+Why an object-oriented API
+==========================
+
+The rationale for representing filesystem paths using dedicated classes is
+the same as for other kinds of stateless objects, such as dates, times or IP
+addresses. Python has been slowly moving away from strictly replicating
+the C language's APIs to providing better, more helpful abstractions around
+all kinds of common functionality. Even if this PEP isn't accepted, it is
+likely that another form of filesystem handling abstraction will be adopted
+one day into the standard library.
+
+Indeed, many people prefer handling dates and times using the high-level
+objects provided by the ``datetime`` module, rather than using numeric
+timestamps and the ``time`` module API. Moreover, using a dedicated class
+makes it possible to enable desirable behaviours by default, for example
+the case insensitivity of Windows paths.
+
+
+Proposal
+========
+
+Class hierarchy
+---------------
+
+The `pathlib`_ module implements a simple hierarchy of classes::
+
+                       +----------+
+                       |          |
+              ---------| PurePath |--------
+              |        |          |       |
+              |        +----------+       |
+              |             |             |
+              |             |             |
+              v             |             v
+      +---------------+     |       +------------+
+      |               |     |       |            |
+      | PurePosixPath |     |       | PureNTPath |
+      |               |     |       |            |
+      +---------------+     |       +------------+
+              |             v             |
+              |          +------+         |
+              |          |      |         |
+              |   -------| Path |------   |
+              |   |      |      |     |   |
+              |   |      +------+     |   |
+              |   |                   |   |
+              |   |                   |   |
+              v   v                   v   v
+          +-----------+            +--------+
+          |           |            |        |
+          | PosixPath |            | NTPath |
+          |           |            |        |
+          +-----------+            +--------+
+
+
+This hierarchy divides path classes along two dimensions:
+
+* a path class can be either pure or concrete: pure classes support only
+  operations that don't need to do any actual I/O, which are most path
+  manipulation operations; concrete classes support all the operations
+  of pure classes, plus operations that do I/O.
+
+* a path class is of a given flavour according to the kind of operating
+  system paths it represents. `pathlib`_ implements two flavours: NT paths
+  for the filesystem semantics embodied in Windows systems, POSIX paths for
+  other systems (``os.name``'s terminology is re-used here).
+
+Any pure class can be instantiated on any system: for example, you can
+manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
+under Unix, and so on. However, concrete classes can only be instantiated
+on a matching system: indeed, it would be error-prone to start doing I/O
+with ``NTPath`` objects under Unix, or vice-versa.
+
+Furthermore, there are two base classes which also act as system-dependent
+factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
+``PureNTPath`` depending on the operating system. Similarly, ``Path``
+will instantiate either a ``PosixPath`` or an ``NTPath``.
+
+It is expected that, in most uses, using the ``Path`` class is adequate,
+which is why it has the shortest name of all.
+
+
+No confusion with builtins
+--------------------------
+
+In this proposal, the path classes do not derive from a builtin type. This
+contrasts with some other Path class proposals which were derived from
+``str``. They also do not pretend to implement the sequence protocol:
+if you want a path to act as a sequence, you have to look up a dedicated
+attribute (the ``parts`` attribute).
+
+By not masquerading as builtin types, the path classes minimize the
+potential for confusion if they are combined by accident with genuine
+builtin types.
+
+
+Immutability
+------------
+
+Path objects are immutable, which makes them hashable and also prevents a
+class of programming errors.
+
+
+Sane behaviour
+--------------
+
+Little of the functionality from ``os.path`` is reused. Many ``os.path``
+functions are tied by backwards compatibility to confusing or plain wrong
+behaviour (for example, the fact that ``os.path.abspath()`` simplifies
+".." path components without resolving symlinks first).
+
+Also, using classes instead of plain strings helps make system-dependent
+behaviours natural. For example, comparing and ordering Windows path
+objects is case-insensitive, and path separators are automatically converted
+to the platform default.
+
+
+Useful notations
+----------------
+
+The API tries to provide useful notations while avoiding magic.
+Some examples::
+
+    >>> p = Path('/home/antoine/pathlib/setup.py')
+    >>> p.name
+    'setup.py'
+    >>> p.ext
+    '.py'
+    >>> p.root
+    '/'
+    >>> p.parts
+    <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
+    >>> list(p.parents())
+    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
+    >>> p.exists()
+    True
+    >>> p.st_size
+    928
+
+
+Pure paths API
+==============
+
+The philosophy of the ``PurePath`` API is to provide a consistent array of
+useful path manipulation operations, without exposing a hodge-podge of
+functions like ``os.path`` does.
+
+
+Definitions
+-----------
+
+First a couple of conventions:
+
+* All paths can have a drive and a root. For POSIX paths, the drive is
+  always empty.
+
+* A relative path has neither drive nor root.
+
+* A POSIX path is absolute if it has a root. A Windows path is absolute if
+  it has both a drive *and* a root. A Windows UNC path (e.g.
+  ``\\some\share\myfile.txt``) always has a drive and a root
+  (here, ``\\some\share`` and ``\``, respectively).
+
+* A path which has either a drive *or* a root is said to be anchored.
+  Its anchor is the concatenation of the drive and root. Under POSIX,
+  "anchored" is the same as "absolute".
+
+
+Construction and joining
+------------------------
+
+We will present construction and joining together since they expose
+similar semantics.
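The joining rules illustrated in this section can be summarized in a short sketch built on the drive, root and anchor definitions given above. It is purely illustrative and is not `pathlib`'s own code. It assumes each argument has already been split into a drive, a root and a tuple of components; ``Parsed``, ``join_parsed()`` and ``render()`` are hypothetical names invented for this example. For simplicity it also compares drives exactly, whereas an NT flavour would presumably compare them case-insensitively.

```python
from typing import NamedTuple, Tuple


class Parsed(NamedTuple):
    """Hypothetical pre-split form of a path: drive, root and components."""
    drive: str
    root: str
    parts: Tuple[str, ...]


def join_parsed(*paths: Parsed) -> Parsed:
    """Join pre-split paths, letting anchored arguments override earlier ones."""
    drive, root, parts = "", "", []
    for p in paths:
        if p.root:
            # A rooted argument discards the previous root and components,
            # but keeps the current drive if it carries none of its own.
            drive = p.drive or drive
            root = p.root
            parts = list(p.parts)
        elif p.drive and p.drive != drive:
            # A new drive without a root (e.g. 'd:') starts over on that drive.
            drive, root, parts = p.drive, "", list(p.parts)
        else:
            # A relative argument (or the same drive without a root) is appended.
            parts.extend(p.parts)
    return Parsed(drive, root, tuple(parts))


def render(p: Parsed, sep: str) -> str:
    """Anchor (drive + root) followed by the components; empty means '.'."""
    return (p.drive + p.root + sep.join(p.parts)) or "."


# PurePath('/etc', '/usr', 'bin') on POSIX
print(render(join_parsed(Parsed("", "/", ("etc",)),
                         Parsed("", "/", ("usr",)),
                         Parsed("", "", ("bin",))), "/"))      # /usr/bin

# PureNTPath('c:/foo', '/Windows') and PureNTPath('c:/foo', 'd:')
c_foo = Parsed("c:", "\\", ("foo",))
print(render(join_parsed(c_foo, Parsed("", "\\", ("Windows",))), "\\"))  # c:\Windows
print(render(join_parsed(c_foo, Parsed("d:", "", ())), "\\"))            # d:
```

Each call mirrors one of the constructor examples in this section. An argument with a root discards what came before but keeps the current drive when it has none, while an argument naming a different drive starts over from scratch.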
+
+The simplest way to construct a path is to pass it its string representation::
+
+    >>> PurePath('setup.py')
+    PurePosixPath('setup.py')
+
+Extraneous path separators and ``"."`` components are eliminated::
+
+    >>> PurePath('a///b/c/./d/')
+    PurePosixPath('a/b/c/d')
+
+If you pass several arguments, they will be automatically joined::
+
+    >>> PurePath('docs', 'Makefile')
+    PurePosixPath('docs/Makefile')
+
+Joining semantics are similar to ``os.path.join``, in that anchored paths
+ignore the information from the previously joined components::
+
+    >>> PurePath('/etc', '/usr', 'bin')
+    PurePosixPath('/usr/bin')
+
+However, with Windows paths, the drive is retained as necessary::
+
+    >>> PureNTPath('c:/foo', '/Windows')
+    PureNTPath('c:\\Windows')
+    >>> PureNTPath('c:/foo', 'd:')
+    PureNTPath('d:')
+
+Calling the constructor without any argument creates a path object pointing
+to the logical "current directory"::
+
+    >>> PurePosixPath()
+    PurePosixPath('.')
+
+A path can be joined with another using the indexing operator
+(``__getitem__``)::
+
+    >>> p = PurePosixPath('foo')
+    >>> p['bar']
+    PurePosixPath('foo/bar')
+    >>> p[PurePosixPath('bar')]
+    PurePosixPath('foo/bar')
+
+As with constructing, multiple path components can be specified at once::
+
+    >>> p['bar/xyzzy']
+    PurePosixPath('foo/bar/xyzzy')
+
+A ``join()`` method is also provided, with the same behaviour. It can serve
+as a factory function::
+
+    >>> path_factory = p.join
+    >>> path_factory('bar')
+    PurePosixPath('foo/bar')
+
+
+Representing
+------------
+
+To represent a path (e.g. to pass it to third-party libraries), just call
+``str()`` on it::
+
+    >>> p = PurePath('/home/antoine/pathlib/setup.py')
+    >>> str(p)
+    '/home/antoine/pathlib/setup.py'
+    >>> p = PureNTPath('c:/windows')
+    >>> str(p)
+    'c:\\windows'
+
+To force the string representation with forward slashes, use the
+``as_posix()`` method::
+
+    >>> p.as_posix()
+    'c:/windows'
+
+To get the bytes representation (which might be useful under Unix systems),
+call ``bytes()`` on it, or use the ``as_bytes()`` method::
+
+    >>> p = PurePosixPath('/home/antoine/pathlib/setup.py')
+    >>> bytes(p)
+    b'/home/antoine/pathlib/setup.py'
+
+
+Properties
+----------
+
+Five simple properties are provided on every path (each can be empty)::
+
+    >>> p = PureNTPath('c:/pathlib/setup.py')
+    >>> p.drive
+    'c:'
+    >>> p.root
+    '\\'
+    >>> p.anchor
+    'c:\\'
+    >>> p.name
+    'setup.py'
+    >>> p.ext
+    '.py'
+
+
+Sequence-like access
+--------------------
+
+The ``parts`` property provides read-only sequence access to a path object::
+
+    >>> p = PurePosixPath('/etc/init.d')
+    >>> p.parts
+    <PurePosixPath.parts: ['/', 'etc', 'init.d']>
+
+Simple indexing returns the individual path component as a string, while
+slicing returns a new path object constructed from the selected components::
+
+    >>> p.parts[-1]
+    'init.d'
+    >>> p.parts[:-1]
+    PurePosixPath('/etc')
+
+Windows paths handle the drive and the root as a single path component::
+
+    >>> p = PureNTPath('c:/setup.py')
+    >>> p.parts
+    <PureNTPath.parts: ['c:\\', 'setup.py']>
+    >>> p.root
+    '\\'
+    >>> p.parts[0]
+    'c:\\'
+
+(separating them would be wrong, since ``C:`` is not the parent of ``C:\``).
+
+The ``parent()`` method returns an ancestor of the path::
+
+    >>> p = PureNTPath('c:/python33/bin/python.exe')
+    >>> p.parent()
+    PureNTPath('c:\\python33\\bin')
+    >>> p.parent(2)
+    PureNTPath('c:\\python33')
+    >>> p.parent(3)
+    PureNTPath('c:\\')
+
+The ``parents()`` method automates repeated invocations of ``parent()``, until
+the anchor is reached::
+
+    >>> p = PureNTPath('c:/python33/bin/python.exe')
+    >>> for parent in p.parents(): parent
+    ...
+    PureNTPath('c:\\python33\\bin')
+    PureNTPath('c:\\python33')
+    PureNTPath('c:\\')
+
+
+Querying
+--------
+
+``is_relative()`` returns True if the path is relative (see definition
+above), False otherwise.
+
+``is_reserved()`` returns True if a Windows path is a reserved path such
+as ``CON`` or ``NUL``. It always returns False for POSIX paths.
+
+``match()`` matches the path against a glob pattern::
+
+    >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
+    True
+
+``relative()`` returns a new relative path by stripping the drive and root::
+
+    >>> PurePosixPath('setup.py').relative()
+    PurePosixPath('setup.py')
+    >>> PurePosixPath('/setup.py').relative()
+    PurePosixPath('setup.py')
+
+``relative_to()`` computes the relative difference of a path to another::
+
+    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
+    PurePosixPath('bin/python')
+
+``normcase()`` returns a case-folded version of the path for NT paths::
+
+    >>> PurePosixPath('CAPS').normcase()
+    PurePosixPath('CAPS')
+    >>> PureNTPath('CAPS').normcase()
+    PureNTPath('caps')
+
+
+Concrete paths API
+==================
+
+In addition to the operations of the pure API, concrete paths provide
+methods which actually access the filesystem to query or mutate
+information.
+
+
+Constructing
+------------
+
+The classmethod ``cwd()`` creates a path object pointing to the current
+working directory in absolute form::
+
+    >>> Path.cwd()
+    PosixPath('/home/antoine/pathlib')
+
+
+File metadata
+-------------
+
+The ``stat()`` method caches and returns the file's ``stat()`` result;
+``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
+but doesn't have any caching behaviour::
+
+    >>> p = Path('/home/antoine/pathlib/setup.py')
+    >>> p.stat()
+    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
+
+For ease of use, direct attribute access to the fields of the stat structure
+is provided over the path object itself::
+
+    >>> p.st_size
+    928
+    >>> p.st_mtime
+    1328287308.889562
+
+Higher-level methods help examine the kind of the file::
+
+    >>> p.exists()
+    True
+    >>> p.is_file()
+    True
+    >>> p.is_dir()
+    False
+    >>> p.is_symlink()
+    False
+
+The file owner and group names (rather than numeric ids) are queried
+through matching properties::
+
+    >>> p = Path('/etc/shadow')
+    >>> p.owner
+    'root'
+    >>> p.group
+    'shadow'
+
+
+Path resolution
+---------------
+
+The ``resolve()`` method makes a path absolute, resolving any symlink on
+the way. It is the only operation which will remove "``..``" path components.
+
+
+Directory walking
+-----------------
+
+Simple (non-recursive) directory access is done by iteration::
+
+    >>> p = Path('docs')
+    >>> for child in p: child
+    ...
+    PosixPath('docs/conf.py')
+    PosixPath('docs/_templates')
+    PosixPath('docs/make.bat')
+    PosixPath('docs/index.rst')
+    PosixPath('docs/_build')
+    PosixPath('docs/_static')
+    PosixPath('docs/Makefile')
+
+This allows simple filtering through list comprehensions::
+
+    >>> p = Path('.')
+    >>> [child for child in p if child.is_dir()]
+    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
+
+Simple and recursive globbing is also provided::
+
+    >>> for child in p.glob('**/*.py'): child
+    ...
+    PosixPath('test_pathlib.py')
+    PosixPath('setup.py')
+    PosixPath('pathlib.py')
+    PosixPath('docs/conf.py')
+    PosixPath('build/lib/pathlib.py')
+
+
+File opening
+------------
+
+The ``open()`` method provides a file opening API similar to the builtin
+``open()`` function::
+
+    >>> p = Path('setup.py')
+    >>> with p.open() as f: f.readline()
+    ...
+    '#!/usr/bin/env python3\n'
+
+The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
+
+    >>> fd = p.raw_open(os.O_RDONLY)
+    >>> os.read(fd, 15)
+    b'#!/usr/bin/env '
+
+
+Filesystem alteration
+---------------------
+
+Several common filesystem operations are provided as methods: ``touch()``,
+``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
+``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
+provided, for example some of the functionality of the ``shutil`` module.
+
+
+Experimental openat() support
+-----------------------------
+
+On compatible POSIX systems, the concrete PosixPath class can take advantage
+of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
+open file descriptors as necessary. Support is enabled by passing the
+*use_openat* keyword argument to the constructor::
+
+    >>> p = Path(".", use_openat=True)
+
+Then all paths constructed by navigating this path (either by iteration or
+indexing) will also use the openat() family of functions. The point of using
+these functions is to avoid race conditions whereby a given directory is
+silently replaced with another (often a symbolic link to a sensitive system
+location) between two accesses.
+
+.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
+
+
+Copyright
+=========
+
+This document has been placed into the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8