PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4
Post-History: http://mail.python.org/pipermail/python-ideas/2012-October/016338.html
Resolution: https://mail.python.org/pipermail/python-dev/2013-November/130424.html


Abstract
========

This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library.  The inclusion is proposed under the provisional
label, as described in :pep:`411`.  Therefore, API changes can be made,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users perform on them.

.. _`pathlib`: http://pypi.python.org/pypi/pathlib/


Related work
============

An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`.  Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:

* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
  and others, which provides a ``str``-subclassing ``Path`` class;

* Twisted's slightly specialized `FilePath class`_;

* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
  ``str``;

* `Unipath`_, a variation on the str-subclassing approach with two public
  classes, an ``AbstractPath`` class for operations which don't do I/O and a
  ``Path`` class for all common operations.

This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.

.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview


Implementation
==============

The implementation of this proposal is tracked in the ``pep428`` branch
of pathlib's `Mercurial repository`_.

.. _`Mercurial repository`: https://bitbucket.org/pitrou/pathlib/


Why an object-oriented API
==========================

The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses.  Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality.  Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.

Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API.  Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example the
case insensitivity of Windows paths.


Proposal
========

Class hierarchy
---------------

The `pathlib`_ module implements a simple hierarchy of classes::

                       +----------+
                       |          |
              ---------| PurePath |--------
              |        |          |       |
              |        +----------+       |
              |             |             |
              |             |             |
              v             |             v
       +---------------+    |    +-----------------+
       |               |    |    |                 |
       | PurePosixPath |    |    | PureWindowsPath |
       |               |    |    |                 |
       +---------------+    |    +-----------------+
              |             v             |
              |          +------+         |
              |          |      |         |
              |   -------| Path |------   |
              |   |      |      |     |   |
              |   |      +------+     |   |
              |   |                   |   |
              |   |                   |   |
              v   v                   v   v
         +-----------+           +-------------+
         |           |           |             |
         | PosixPath |           | WindowsPath |
         |           |           |             |
         +-----------+           +-------------+


This hierarchy divides path classes along two dimensions:

* a path class can be either pure or concrete: pure classes support only
  operations that don't need to do any actual I/O, which are most path
  manipulation operations; concrete classes support all the operations
  of pure classes, plus operations that do I/O.
* a path class is of a given flavour according to the kind of operating
  system paths it represents.  `pathlib`_ implements two flavours: Windows
  paths for the filesystem semantics embodied in Windows systems, POSIX
  paths for other systems.

Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureWindowsPath``
objects under Unix, and so on.  However, concrete classes can only be
instantiated on a matching system: indeed, it would be error-prone to start
doing I/O with ``WindowsPath`` objects under Unix, or vice-versa.

Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureWindowsPath`` depending on the operating system.  Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``WindowsPath``.

It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.


No confusion with builtins
--------------------------

In this proposal, the path classes do not derive from a builtin type.  This
contrasts with some other Path class proposals which were derived from
``str``.  They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).

The key reasoning behind not inheriting from ``str`` is to prevent accidentally
performing operations with a string representing a path and a string that
doesn't, e.g. ``path + an_accident``.  Since operations with a string will not
necessarily lead to a valid or expected file system path, not subclassing
``str`` follows "explicit is better than implicit": it rules out accidental
operations with strings.  A `blog post`_ by a Python core developer goes into
more detail on the reasons behind this specific design decision.

.. _blog post: http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str


Immutability
------------

Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.


Sane behaviour
--------------

Little of the functionality from os.path is reused.  Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).


Comparisons
-----------

Paths of the same flavour are comparable and orderable, whether pure or not::

    >>> PurePosixPath('a') == PurePosixPath('b')
    False
    >>> PurePosixPath('a') < PurePosixPath('b')
    True
    >>> PurePosixPath('a') == PosixPath('a')
    True

Comparing and ordering Windows path objects is case-insensitive::

    >>> PureWindowsPath('a') == PureWindowsPath('A')
    True

Paths of different flavours always compare unequal, and cannot be ordered::

    >>> PurePosixPath('a') == PureWindowsPath('a')
    False
    >>> PurePosixPath('a') < PureWindowsPath('a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: PurePosixPath() < PureWindowsPath()

Paths compare unequal to, and are not orderable with, instances of builtin
types (such as ``str``) and any other types.


Useful notations
----------------

The API tries to provide useful notations all the while avoiding magic.
Some examples::

    >>> p = Path('/home/antoine/pathlib/setup.py')
    >>> p.name
    'setup.py'
    >>> p.suffix
    '.py'
    >>> p.root
    '/'
    >>> p.parts
    ('/', 'home', 'antoine', 'pathlib', 'setup.py')
    >>> p.relative_to('/home/antoine')
    PosixPath('pathlib/setup.py')
    >>> p.exists()
    True


Pure paths API
==============

The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.


Definitions
-----------

First a couple of conventions:

* All paths can have a drive and a root.
  For POSIX paths, the drive is always empty.

* A relative path has neither drive nor root.

* A POSIX path is absolute if it has a root.  A Windows path is absolute if
  it has both a drive *and* a root.  A Windows UNC path (e.g.
  ``\\host\share\myfile.txt``) always has a drive and a root
  (here, ``\\host\share`` and ``\``, respectively).

* A path which has either a drive *or* a root is said to be anchored.
  Its anchor is the concatenation of the drive and root.  Under POSIX,
  "anchored" is the same as "absolute".


Construction
------------

We will present construction and joining together since they expose
similar semantics.

The simplest way to construct a path is to pass it its string representation::

    >>> PurePath('setup.py')
    PurePosixPath('setup.py')

Extraneous path separators and ``"."`` components are eliminated::

    >>> PurePath('a///b/c/./d/')
    PurePosixPath('a/b/c/d')

If you pass several arguments, they will be automatically joined::

    >>> PurePath('docs', 'Makefile')
    PurePosixPath('docs/Makefile')

Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::

    >>> PurePath('/etc', '/usr', 'bin')
    PurePosixPath('/usr/bin')

However, with Windows paths, the drive is retained as necessary::

    >>> PureWindowsPath('c:/foo', '/Windows')
    PureWindowsPath('c:/Windows')
    >>> PureWindowsPath('c:/foo', 'd:')
    PureWindowsPath('d:')

Also, path separators are normalized to the platform default::

    >>> PureWindowsPath('a/b') == PureWindowsPath('a\\b')
    True

Extraneous path separators and ``"."`` components are eliminated, but not
``".."`` components::

    >>> PurePosixPath('a//b/./c/')
    PurePosixPath('a/b/c')
    >>> PurePosixPath('a/../b')
    PurePosixPath('a/../b')

Multiple leading slashes are treated differently depending on the path
flavour.
They are always retained on Windows paths (because of the UNC
notation)::

    >>> PureWindowsPath('//some/path')
    PureWindowsPath('//some/path/')

On POSIX, they are collapsed except if there are exactly two leading slashes,
which is a special case in the POSIX specification on `pathname resolution`_
(this is also necessary for Cygwin compatibility)::

    >>> PurePosixPath('///some/path')
    PurePosixPath('/some/path')
    >>> PurePosixPath('//some/path')
    PurePosixPath('//some/path')

Calling the constructor without any argument creates a path object pointing
to the logical "current directory" (without looking up its absolute path,
which is the job of the ``cwd()`` classmethod on concrete paths)::

    >>> PurePosixPath()
    PurePosixPath('.')

.. _pathname resolution: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_11


Representing
------------

To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> str(p)
    '/home/antoine/pathlib/setup.py'
    >>> p = PureWindowsPath('c:/windows')
    >>> str(p)
    'c:\\windows'

To force the string representation with forward slashes, use the ``as_posix()``
method::

    >>> p.as_posix()
    'c:/windows'

To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, which internally uses ``os.fsencode()``::

    >>> p = PurePosixPath('/home/antoine/pathlib/setup.py')
    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'

To represent the path as a ``file:`` URI, call the ``as_uri()`` method::

    >>> p = PurePosixPath('/etc/passwd')
    >>> p.as_uri()
    'file:///etc/passwd'
    >>> p = PureWindowsPath('c:/Windows')
    >>> p.as_uri()
    'file:///c:/Windows'

The repr() of a path always uses forward slashes, even under Windows, for
readability and to remind users that forward slashes are ok::

    >>> p = PureWindowsPath('c:/Windows')
    >>> p
    PureWindowsPath('c:/Windows')


Properties
----------

Several simple properties are provided on every path (each can be empty)::

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.drive
    'c:'
    >>> p.root
    '\\'
    >>> p.anchor
    'c:\\'
    >>> p.name
    'pathlib.tar.gz'
    >>> p.stem
    'pathlib.tar'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']


Deriving new paths
------------------

Joining
^^^^^^^

A path can be joined with another using the ``/`` operator::

    >>> p = PurePosixPath('foo')
    >>> p / 'bar'
    PurePosixPath('foo/bar')
    >>> p / PurePosixPath('bar')
    PurePosixPath('foo/bar')
    >>> 'bar' / p
    PurePosixPath('bar/foo')

As with the constructor, multiple path components can be specified, either
collapsed or separately::

    >>> p / 'bar/xyzzy'
    PurePosixPath('foo/bar/xyzzy')
    >>> p / 'bar' / 'xyzzy'
    PurePosixPath('foo/bar/xyzzy')

A joinpath() method is also provided, with the same behaviour::

    >>> p.joinpath('Python')
    PurePosixPath('foo/Python')

Changing the path's final component
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``with_name()`` method returns a new path, with the name changed::

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_name('setup.py')
    PureWindowsPath('c:/Downloads/setup.py')

It fails with a ``ValueError`` if the path doesn't have an actual name::

    >>> p = PureWindowsPath('c:/')
    >>> p.with_name('setup.py')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 875, in with_name
        raise ValueError("%r has an empty name" % (self,))
    ValueError: PureWindowsPath('c:/') has an empty name
    >>> p.name
    ''

The ``with_suffix()`` method returns a new path with the suffix changed.
However, if the path has no suffix, the new suffix is added::

    >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_suffix('.bz2')
    PureWindowsPath('c:/Downloads/pathlib.tar.bz2')
    >>> p = PureWindowsPath('README')
    >>> p.with_suffix('.bz2')
    PureWindowsPath('README.bz2')

Making the path relative
^^^^^^^^^^^^^^^^^^^^^^^^

The ``relative_to()`` method computes the relative difference of a path to
another::

    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
    PurePosixPath('bin/python')

ValueError is raised if the method cannot return a meaningful value::

    >>> PurePosixPath('/usr/bin/python').relative_to('/etc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 926, in relative_to
        .format(str(self), str(formatted)))
    ValueError: '/usr/bin/python' does not start with '/etc'


Sequence-like access
--------------------

The ``parts`` property returns a tuple providing read-only sequence access
to a path's components::

    >>> p = PurePosixPath('/etc/init.d')
    >>> p.parts
    ('/', 'etc', 'init.d')

Windows paths handle the drive and the root as a single path component::

    >>> p = PureWindowsPath('c:/setup.py')
    >>> p.parts
    ('c:\\', 'setup.py')

(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).

The ``parent`` property returns the logical parent of the path::

    >>> p = PureWindowsPath('c:/python33/bin/python.exe')
    >>> p.parent
    PureWindowsPath('c:/python33/bin')

The ``parents`` property returns an immutable sequence of the path's
logical ancestors::

    >>> p = PureWindowsPath('c:/python33/bin/python.exe')
    >>> len(p.parents)
    3
    >>> p.parents[0]
    PureWindowsPath('c:/python33/bin')
    >>> p.parents[1]
    PureWindowsPath('c:/python33')
    >>> p.parents[2]
    PureWindowsPath('c:/')


Querying
--------

``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.

``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``.  It always returns False for POSIX paths.
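
The definitions of "absolute" and "anchored" given earlier, together with
``parts``, ``parents`` and ``relative_to()``, can be exercised without touching
the filesystem.  The following is a minimal sketch, assuming the ``pathlib``
module as shipped in the standard library, where the relative/absolute query
is spelled ``is_absolute()`` (this text names ``is_relative()``).  The helpers
``describe`` and ``relative_or_none`` are hypothetical names introduced only
for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def describe(path):
    """Summarise a pure path using only operations that do no I/O.

    Hypothetical helper, not part of pathlib.
    """
    return {
        "absolute": path.is_absolute(),
        "parts": path.parts,
        "parents": [str(parent) for parent in path.parents],
    }


def relative_or_none(path, base):
    """Return ``path`` relative to ``base``, or None if ``base`` is not a prefix.

    Hypothetical helper: relative_to() raises ValueError when it cannot
    return a meaningful value, so the exception is mapped to None here.
    """
    try:
        return path.relative_to(base)
    except ValueError:
        return None


posix = describe(PurePosixPath("/etc/init.d"))
windows = describe(PureWindowsPath("c:/python33/bin/python.exe"))

# A Windows path with a drive but no root is anchored, yet not absolute.
drive_only = PureWindowsPath("c:setup.py")

inside = relative_or_none(PurePosixPath("/usr/bin/python"), PurePosixPath("/usr"))
outside = relative_or_none(PurePosixPath("/usr/bin/python"), PurePosixPath("/etc"))
```

Because only pure classes are used, the sketch behaves the same on any
operating system; mapping ``ValueError`` to ``None`` is one possible
convention, not something this proposal prescribes.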
``match()`` matches the path against a glob pattern.  It operates on
individual parts and matches from the right:

    >>> p = PurePosixPath('/usr/bin')
    >>> p.match('/usr/b*')
    True
    >>> p.match('usr/b*')
    True
    >>> p.match('b*')
    True
    >>> p.match('/u*')
    False

This behaviour respects the following expectations:

- A simple pattern such as "\*.py" matches arbitrarily long paths as long
  as the last part matches, e.g. "/usr/foo/bar.py".

- Longer patterns can be used as well for more complex matching, e.g.
  "/usr/foo/\*.py" matches "/usr/foo/bar.py".


Concrete paths API
==================

In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.


Constructing
------------

The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::

    >>> Path.cwd()
    PosixPath('/home/antoine/pathlib')


File metadata
-------------

The ``stat()`` method returns the file's stat() result; similarly,
``lstat()`` returns the file's lstat() result (which is different iff the
file is a symbolic link)::

    >>> p.stat()
    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)

Higher-level methods help examine the kind of the file::

    >>> p.exists()
    True
    >>> p.is_file()
    True
    >>> p.is_dir()
    False
    >>> p.is_symlink()
    False
    >>> p.is_socket()
    False
    >>> p.is_fifo()
    False
    >>> p.is_block_device()
    False
    >>> p.is_char_device()
    False

The file owner and group names (rather than numeric ids) are queried
through corresponding methods::

    >>> p = Path('/etc/shadow')
    >>> p.owner()
    'root'
    >>> p.group()
    'shadow'


Path resolution
---------------

The ``resolve()`` method makes a path absolute, resolving any symlink on
the way (like the POSIX realpath() call).  It is the only operation which
will remove "``..``" path components.
On Windows, this method will also take care to return the canonical path
(with the right casing).


Directory walking
-----------------

Simple (non-recursive) directory access is done by calling the iterdir()
method, which returns an iterator over the child paths::

    >>> p = Path('docs')
    >>> for child in p.iterdir(): child
    ...
    PosixPath('docs/conf.py')
    PosixPath('docs/_templates')
    PosixPath('docs/make.bat')
    PosixPath('docs/index.rst')
    PosixPath('docs/_build')
    PosixPath('docs/_static')
    PosixPath('docs/Makefile')

This allows simple filtering through list comprehensions::

    >>> p = Path('.')
    >>> [child for child in p.iterdir() if child.is_dir()]
    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]

Simple and recursive globbing is also provided::

    >>> for child in p.glob('**/*.py'): child
    ...
    PosixPath('test_pathlib.py')
    PosixPath('setup.py')
    PosixPath('pathlib.py')
    PosixPath('docs/conf.py')
    PosixPath('build/lib/pathlib.py')


File opening
------------

The ``open()`` method provides a file opening API similar to the builtin
``open()`` function::

    >>> p = Path('setup.py')
    >>> with p.open() as f: f.readline()
    ...
    '#!/usr/bin/env python3\n'


Filesystem modification
-----------------------

Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``.  More operations could be
provided, for example some of the functionality of the shutil module.

Detailed documentation of the proposed API can be found at the `pathlib
docs`_.

.. _pathlib docs: https://pathlib.readthedocs.org/en/pep428/


Discussion
==========

Division operator
-----------------

The division operator came out first in a `poll`_ about the path joining
operator.  Initial versions of `pathlib`_ used square brackets
(i.e. ``__getitem__``) instead.

.. _poll: https://mail.python.org/pipermail/python-ideas/2012-October/016544.html

joinpath()
----------

The joinpath() method was initially called join(), but several people
objected that it could be confused with str.join() which has different
semantics.  Therefore it was renamed to joinpath().

Case-sensitivity
----------------

Windows users consider filesystem paths to be case-insensitive and expect
path objects to observe that characteristic, even though in some rare
situations some foreign filesystem mounts may be case-sensitive under
Windows.

In the words of one commenter, "If glob("\*.py") failed to find SETUP.PY on
Windows, that would be a usability disaster".

-- Paul Moore in
https://mail.python.org/pipermail/python-dev/2013-April/125254.html


Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8