PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4
Post-History: http://mail.python.org/pipermail/python-ideas/2012-October/016338.html


Abstract
========

This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library. The inclusion is proposed under the provisional
label, as described in :pep:`411`. Therefore, API changes can be made,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users perform on them.

.. _`pathlib`: http://pypi.python.org/pypi/pathlib/


Related work
============

An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`. Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:

* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
  and others, which provides a ``str``-subclassing ``Path`` class;

* Twisted's slightly specialized `FilePath class`_;

* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
  ``str``;

* `Unipath`_, a variation on the str-subclassing approach with two public
  classes, an ``AbstractPath`` class for operations which don't do I/O and a
  ``Path`` class for all common operations.

This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.


.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview


Why an object-oriented API
==========================

The rationale for representing filesystem paths using dedicated classes is
the same as for other kinds of stateless objects, such as dates, times or IP
addresses. Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality. Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.

Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API. Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example the
case insensitivity of Windows paths.


Proposal
========

Class hierarchy
---------------

The `pathlib`_ module implements a simple hierarchy of classes::

                           +----------+
                           |          |
                  ---------| PurePath |--------
                  |        |          |       |
                  |        +----------+       |
                  |             |             |
                  |             |             |
                  v             |             v
           +---------------+    |       +------------+
           |               |    |       |            |
           | PurePosixPath |    |       | PureNTPath |
           |               |    |       |            |
           +---------------+    |       +------------+
                  |             v              |
                  |          +------+          |
                  |          |      |          |
                  |   -------| Path |------    |
                  |   |      |      |     |    |
                  |   |      +------+     |    |
                  |   |                   |    |
                  |   |                   |    |
                  v   v                   v    v
             +-----------+               +--------+
             |           |               |        |
             | PosixPath |               | NTPath |
             |           |               |        |
             +-----------+               +--------+


This hierarchy divides path classes along two dimensions:

* a path class can be either pure or concrete: pure classes support only
  operations that don't need to do any actual I/O, which are most path
  manipulation operations; concrete classes support all the operations
  of pure classes, plus operations that do I/O.

* a path class is of a given flavour according to the kind of operating
  system paths it represents. `pathlib`_ implements two flavours: NT paths
  for the filesystem semantics embodied in Windows systems, POSIX paths for
  other systems (``os.name``'s terminology is re-used here).

Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
under Unix, and so on. However, concrete classes can only be instantiated
on a matching system: indeed, it would be error-prone to start doing I/O
with ``NTPath`` objects under Unix, or vice-versa.

Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system. Similarly, ``Path``
will instantiate either a ``PosixPath`` or an ``NTPath``.

It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.

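
The two dimensions can be illustrated with a minimal sketch. It runs against
the ``pathlib`` module as shipped with current Python versions, which is an
assumption relative to this draft: that module spells the Windows flavour
``PureWindowsPath`` instead of ``PureNTPath``.

```python
import os
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Pure classes never touch the filesystem, so both flavours can be
# instantiated on any operating system.
posix = PurePosixPath('/usr/bin/python')
nt = PureWindowsPath('c:/python33/python.exe')   # PureNTPath in this draft

# The base classes act as factories for the flavour of the host system.
host_flavour = PureWindowsPath if os.name == 'nt' else PurePosixPath
picked_host_flavour = isinstance(PurePath('setup.py'), host_flavour)

# Concrete classes support everything their pure counterparts do.
concrete_is_pure = isinstance(Path('setup.py'), host_flavour)
```

Code that only manipulates path strings can therefore stick to the pure
classes and remain testable on every platform.
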

No confusion with builtins
--------------------------

In this proposal, the path classes do not derive from a builtin type. This
contrasts with some other Path class proposals which were derived from
``str``. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).

By not passing themselves off as builtin types, the path classes minimize
the potential for confusion if they are combined by accident with genuine
builtin types.

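
A short sketch of what this means in practice, assuming the ``pathlib``
module shipped with current Python versions (where ``parts`` is a plain
tuple rather than the dedicated object shown in this draft):

```python
from pathlib import PurePosixPath

p = PurePosixPath('/etc/init.d')

# A path object is not a string and does not pose as one.
is_string = isinstance(p, str)

# Combining it with a genuine str by accident fails loudly.
try:
    p + '.bak'
    concatenated = True
except TypeError:
    concatenated = False

# Explicit conversions are required instead.
as_text = str(p)
components = list(p.parts)
```
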

Immutability
------------

Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.

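
As a small illustration (a sketch using the ``pathlib`` module shipped with
current Python versions), equal paths collapse in sets, work as dictionary
keys, and reject attribute assignment:

```python
from pathlib import PurePosixPath

a = PurePosixPath('/etc/hosts')
b = PurePosixPath('/etc', 'hosts')    # same path, built from two components

# Hashable: equal paths hash alike, so they deduplicate and act as keys.
seen = {a, b}
owners = {a: 'root'}
owner_via_b = owners[b]

# Immutable: components are read-only properties.
try:
    a.name = 'passwd'
    mutated = True
except AttributeError:
    mutated = False
```
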

Sane behaviour
--------------

Little of the functionality from ``os.path`` is reused. Many ``os.path``
functions are tied by backwards compatibility to confusing or plain wrong
behaviour (for example, the fact that ``os.path.abspath()`` simplifies
``".."`` path components without resolving symlinks first).

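
The pitfall can be sketched without touching the filesystem. The example
below uses ``posixpath.normpath()``, the lexical simplification that
``os.path.abspath()`` applies on POSIX systems, so that it behaves the same
on any platform. The directory layout mentioned in the comments is
hypothetical.

```python
import posixpath
from pathlib import PurePosixPath

# Suppose /srv/current is a symlink to /data/releases/v2 (hypothetical).
raw = '/srv/current/../shared'

# Lexical simplification cancels "current" against "..", giving /srv/shared,
# although the operating system would follow the symlink first and look in
# /data/releases/shared.
lexical = posixpath.normpath(raw)

# A pure path leaves ".." untouched, so no wrong answer is baked in.
preserved = str(PurePosixPath(raw))
```
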

Comparisons
-----------

Paths of the same flavour are comparable and orderable, whether pure or not::

    >>> PurePosixPath('a') == PurePosixPath('b')
    False
    >>> PurePosixPath('a') < PurePosixPath('b')
    True
    >>> PurePosixPath('a') == PosixPath('a')
    True

Comparing and ordering Windows path objects is case-insensitive::

    >>> PureNTPath('a') == PureNTPath('A')
    True

Paths of different flavours always compare unequal, and cannot be ordered::

    >>> PurePosixPath('a') == PureNTPath('a')
    False
    >>> PurePosixPath('a') < PureNTPath('a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: PurePosixPath() < PureNTPath()


Useful notations
----------------

The API tries to provide useful notations while avoiding magic.
Some examples::

    >>> p = Path('/home/antoine/pathlib/setup.py')
    >>> p.name
    'setup.py'
    >>> p.suffix
    '.py'
    >>> p.root
    '/'
    >>> p.parts
    <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
    >>> list(p.parents())
    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
    >>> p.exists()
    True
    >>> p.st_size
    928


Pure paths API
==============

The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.


Definitions
-----------

First a couple of conventions:

* All paths can have a drive and a root. For POSIX paths, the drive is
  always empty.

* A relative path has neither drive nor root.

* A POSIX path is absolute if it has a root. A Windows path is absolute if
  it has both a drive *and* a root. A Windows UNC path (e.g.
  ``\\some\share\myfile.txt``) always has a drive and a root
  (here, ``\\some\share`` and ``\``, respectively).

* A path which has either a drive *or* a root is said to be anchored.
  Its anchor is the concatenation of the drive and root. Under POSIX,
  "anchored" is the same as "absolute".

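
These conventions can be checked with a minimal sketch. It assumes the
``pathlib`` module shipped with current Python versions, where the Windows
flavour is named ``PureWindowsPath``; ``is_anchored()`` is a hypothetical
helper written for this illustration, not part of any API.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_anchored(path):
    """Hypothetical helper: true if the path has a drive or a root."""
    return bool(path.anchor)


# A UNC path always has both a drive and a root.
unc = PureWindowsPath(r'\\some\share\myfile.txt')
unc_drive, unc_root = unc.drive, unc.root

rooted_only = PureWindowsPath('/Windows')     # root, but no drive
drive_only = PureWindowsPath('c:setup.py')    # drive, but no root
full = PureWindowsPath('c:/Windows')          # drive and root: absolute
posix = PurePosixPath('/etc/init.d')          # the drive is always empty
relative = PurePosixPath('docs/Makefile')     # neither drive nor root
```

Under Windows semantics, ``rooted_only`` and ``drive_only`` are anchored
without being absolute, which is the distinction the last bullet draws.
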

Construction
------------

We will present construction and joining together since they expose
similar semantics.

The simplest way to construct a path is to pass it its string representation::

    >>> PurePath('setup.py')
    PurePosixPath('setup.py')

Extraneous path separators and ``"."`` components are eliminated::

    >>> PurePath('a///b/c/./d/')
    PurePosixPath('a/b/c/d')

If you pass several arguments, they will be automatically joined::

    >>> PurePath('docs', 'Makefile')
    PurePosixPath('docs/Makefile')

Joining semantics are similar to ``os.path.join()``, in that anchored paths
ignore the information from the previously joined components::

    >>> PurePath('/etc', '/usr', 'bin')
    PurePosixPath('/usr/bin')

However, with Windows paths, the drive is retained as necessary::

    >>> PureNTPath('c:/foo', '/Windows')
    PureNTPath('c:\\Windows')
    >>> PureNTPath('c:/foo', 'd:')
    PureNTPath('d:')

Also, path separators are normalized to the platform default::

    >>> PureNTPath('a/b') == PureNTPath('a\\b')
    True

Extraneous path separators and ``"."`` components are eliminated, but not
``".."`` components::

    >>> PurePosixPath('a//b/./c/')
    PurePosixPath('a/b/c')
    >>> PurePosixPath('a/../b')
    PurePosixPath('a/../b')

Multiple leading slashes are treated differently depending on the path
flavour::

    >>> PurePosixPath('//some/path')
    PurePosixPath('/some/path')
    >>> PureNTPath('//some/path')
    PureNTPath('\\\\some\\path\\')

Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::

    >>> PurePosixPath()
    PurePosixPath('.')


Representing
------------

To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> str(p)
    '/home/antoine/pathlib/setup.py'
    >>> p = PureNTPath('c:/windows')
    >>> str(p)
    'c:\\windows'

To force the string representation with forward slashes, use the ``as_posix()``
method::

    >>> p.as_posix()
    'c:/windows'

To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'


Properties
----------

Seven simple properties are provided on every path (each can be empty)::

    >>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
    >>> p.drive
    'c:'
    >>> p.root
    '\\'
    >>> p.anchor
    'c:\\'
    >>> p.name
    'pathlib.tar.gz'
    >>> p.basename
    'pathlib.tar'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']


Deriving new paths
------------------

Joining
^^^^^^^

A path can be joined with another using the ``/`` operator::

    >>> p = PurePosixPath('foo')
    >>> p / 'bar'
    PurePosixPath('foo/bar')
    >>> p / PurePosixPath('bar')
    PurePosixPath('foo/bar')
    >>> 'bar' / p
    PurePosixPath('bar/foo')

As with the constructor, multiple path components can be specified, either
collapsed or separately::

    >>> p / 'bar/xyzzy'
    PurePosixPath('foo/bar/xyzzy')
    >>> p / 'bar' / 'xyzzy'
    PurePosixPath('foo/bar/xyzzy')

A ``joinpath()`` method is also provided, with the same behaviour. It can
serve as a factory function::

    >>> path_factory = p.joinpath
    >>> path_factory('bar')
    PurePosixPath('foo/bar')

Changing the path's final component
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``with_name()`` method returns a new path, with the name changed::

    >>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_name('setup.py')
    PureNTPath('c:\\Downloads\\setup.py')

It fails with a ``ValueError`` if the path doesn't have an actual name::

    >>> p = PureNTPath('c:/')
    >>> p.with_name('setup.py')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 875, in with_name
        raise ValueError("%r has an empty name" % (self,))
    ValueError: PureNTPath('c:\\') has an empty name
    >>> p.name
    ''

The ``with_suffix()`` method returns a new path with the suffix changed.
However, if the path has no suffix, the new suffix is added::

    >>> p = PureNTPath('c:/Downloads/pathlib.tar.gz')
    >>> p.with_suffix('.bz2')
    PureNTPath('c:\\Downloads\\pathlib.tar.bz2')
    >>> p = PureNTPath('README')
    >>> p.with_suffix('.bz2')
    PureNTPath('README.bz2')

Making the path relative
^^^^^^^^^^^^^^^^^^^^^^^^

The ``relative()`` method computes the relative difference of a path to
another::

    >>> PurePosixPath('/usr/bin/python').relative('/usr')
    PurePosixPath('bin/python')

``ValueError`` is raised if the method cannot return a meaningful value::

    >>> PurePosixPath('/usr/bin/python').relative('/etc')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pathlib.py", line 926, in relative
        .format(str(self), str(formatted)))
    ValueError: '/usr/bin/python' does not start with '/etc'


Sequence-like access
--------------------

The ``parts`` property provides read-only sequence access to a path object::

    >>> p = PurePosixPath('/etc/init.d')
    >>> p.parts
    <PurePosixPath.parts: ['/', 'etc', 'init.d']>

Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected components::

    >>> p.parts[-1]
    'init.d'
    >>> p.parts[:-1]
    PurePosixPath('/etc')

Windows paths handle the drive and the root as a single path component::

    >>> p = PureNTPath('c:/setup.py')
    >>> p.parts
    <PureNTPath.parts: ['c:\\', 'setup.py']>
    >>> p.root
    '\\'
    >>> p.parts[0]
    'c:\\'

(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).

The ``parent()`` method returns an ancestor of the path; an optional
argument selects how many levels to go up::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> p.parent()
    PureNTPath('c:\\python33\\bin')
    >>> p.parent(2)
    PureNTPath('c:\\python33')
    >>> p.parent(3)
    PureNTPath('c:\\')

The ``parents()`` method automates repeated invocations of ``parent()``, until
the anchor is reached::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> for parent in p.parents(): parent
    ...
    PureNTPath('c:\\python33\\bin')
    PureNTPath('c:\\python33')
    PureNTPath('c:\\')


Querying
--------

``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.

``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``. It always returns False for POSIX paths.

``match()`` matches the path against a glob pattern::

    >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
    True

``normcase()`` returns a case-folded version of the path for NT paths::

    >>> PurePosixPath('CAPS').normcase()
    PurePosixPath('CAPS')
    >>> PureNTPath('CAPS').normcase()
    PureNTPath('caps')


Concrete paths API
==================

In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.


Constructing
------------

The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::

    >>> Path.cwd()
    PosixPath('/home/antoine/pathlib')


File metadata
-------------

The ``stat()`` method caches and returns the file's stat() result;
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
but doesn't have any caching behaviour::

    >>> p.stat()
    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)

For ease of use, direct attribute access to the fields of the stat structure
is provided over the path object itself::

    >>> p.st_size
    928
    >>> p.st_mtime
    1328287308.889562

Higher-level methods help examine the kind of the file::

    >>> p.exists()
    True
    >>> p.is_file()
    True
    >>> p.is_dir()
    False
    >>> p.is_symlink()
    False

The file owner and group names (rather than numeric ids) are queried
through matching properties::

    >>> p = Path('/etc/shadow')
    >>> p.owner
    'root'
    >>> p.group
    'shadow'


Path resolution
---------------

The ``resolve()`` method makes a path absolute, resolving any symlink on
the way. It is the only operation which will remove "``..``" path components.

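
The following sketch shows the difference between pure manipulation and
resolution. It assumes the ``pathlib`` module shipped with current Python
versions and works inside a throwaway temporary directory; the file and
directory names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory itself may sit behind a symlink,
    # so resolve it once to get a stable base.
    base = Path(tmp).resolve()
    (base / 'docs').mkdir()
    (base / 'setup.py').touch()

    # Joining never simplifies "..": the component is kept as written.
    indirect = base / 'docs' / '..' / 'setup.py'
    kept_dotdot = '..' in indirect.parts

    # resolve() consults the filesystem and removes it.
    resolved = indirect.resolve()
    same_file = resolved == base / 'setup.py'
    dotdot_gone = '..' not in resolved.parts
```
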

Directory walking
-----------------

Simple (non-recursive) directory access is done by iteration::

    >>> p = Path('docs')
    >>> for child in p: child
    ...
    PosixPath('docs/conf.py')
    PosixPath('docs/_templates')
    PosixPath('docs/make.bat')
    PosixPath('docs/index.rst')
    PosixPath('docs/_build')
    PosixPath('docs/_static')
    PosixPath('docs/Makefile')

This allows simple filtering through list comprehensions::

    >>> p = Path('.')
    >>> [child for child in p if child.is_dir()]
    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]

Simple and recursive globbing is also provided::

    >>> for child in p.glob('**/*.py'): child
    ...
    PosixPath('test_pathlib.py')
    PosixPath('setup.py')
    PosixPath('pathlib.py')
    PosixPath('docs/conf.py')
    PosixPath('build/lib/pathlib.py')


File opening
------------

The ``open()`` method provides a file opening API similar to the builtin
``open()`` function::

    >>> p = Path('setup.py')
    >>> with p.open() as f: f.readline()
    ...
    '#!/usr/bin/env python3\n'

The ``raw_open()`` method, on the other hand, is similar to ``os.open()``::

    >>> fd = p.raw_open(os.O_RDONLY)
    >>> os.read(fd, 15)
    b'#!/usr/bin/env '


Filesystem alteration
---------------------

Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
provided, for example some of the functionality of the ``shutil`` module.

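
A minimal sketch of a few of these methods, assuming the ``pathlib`` module
shipped with current Python versions. It operates only inside a temporary
directory, and the file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    build = base / 'build'
    build.mkdir()                      # create a directory

    draft = build / 'notes.txt'
    draft.touch()                      # create an empty file

    final = build / 'README'
    draft.replace(final)               # move, overwriting any existing target
    state_after_move = (draft.exists(), final.exists())

    final.unlink()                     # remove the file
    build.rmdir()                      # remove the now-empty directory
    build_removed = not build.exists()
```
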

Experimental openat() support
-----------------------------

On compatible POSIX systems, the concrete PosixPath class can take advantage
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
open file descriptors as necessary. Support is enabled by passing the
*use_openat* argument to the constructor::

    >>> p = Path(".", use_openat=True)

Then all paths constructed by navigating this path (either by iteration or
indexing) will also use the openat() family of functions. The point of using
these functions is to avoid race conditions whereby a given directory is
silently replaced with another (often a symbolic link to a sensitive system
location) between two accesses.

.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

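
To show what openat()-style access looks like, here is a sketch that uses
the ``dir_fd`` parameter of ``os.open()`` from the standard ``os`` module.
It does not use the *use_openat* argument described above. The helper
``read_relative()`` and the file name are hypothetical, and the code does
nothing on platforms without ``dir_fd`` support.

```python
import os
import tempfile


def read_relative(dir_fd, name, size=64):
    """Hypothetical helper: read a file located relative to an open directory."""
    # The lookup starts from the directory the descriptor refers to, so it is
    # unaffected if the directory's path is later rebound to something else.
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)


result = None
if os.open in os.supports_dir_fd:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'data.txt'), 'wb') as f:
            f.write(b'hello')
        dir_fd = os.open(tmp, os.O_RDONLY)   # pin the directory once
        try:
            result = read_relative(dir_fd, 'data.txt')
        finally:
            os.close(dir_fd)
```
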

Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8