Push PEP 428 - object-oriented filesystem paths
parent
c373707569
commit
56f8cb38a7
|
@ -0,0 +1,568 @@
|
|||
PEP: 428
|
||||
Title: The pathlib module -- object-oriented filesystem paths
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Antoine Pitrou <solipsis@pitrou.net>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 30-July-2012
|
||||
Python-Version: 3.4
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes the inclusion of a third-party module, `pathlib`_, in
|
||||
the standard library. The inclusion is proposed under the provisional
|
||||
label, as described in :pep:`411`. Therefore, API changes can be done,
|
||||
either as part of the PEP process, or after acceptance in the standard
|
||||
library (and until the provisional label is removed).
|
||||
|
||||
The aim of this library is to provide a simple hierarchy of classes to
|
||||
handle filesystem paths and the common operations users do over them.
|
||||
|
||||
.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
|
||||
|
||||
|
||||
Related work
|
||||
============
|
||||
|
||||
An object-oriented API for filesystem paths has already been proposed
|
||||
and rejected in :pep:`355`. Several third-party implementations of the
|
||||
idea of object-oriented filesystem paths exist in the wild:
|
||||
|
||||
* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
|
||||
and others, which provides a ``str``-subclassing ``Path`` class;
|
||||
|
||||
* Twisted's slightly specialized `FilePath class`_;
|
||||
|
||||
* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
|
||||
``str``;
|
||||
|
||||
* `Unipath`_, a variation on the str-subclassing approach with two public
|
||||
classes, an ``AbstractPath`` class for operations which don't do I/O and a
|
||||
``Path`` class for all common operations.
|
||||
|
||||
This proposal attempts to learn from these previous attempts and the
|
||||
rejection of :pep:`355`.
|
||||
|
||||
|
||||
.. _`path.py module`: https://github.com/jaraco/path.py
|
||||
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
|
||||
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
|
||||
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
|
||||
|
||||
|
||||
Why an object-oriented API
|
||||
==========================
|
||||
|
||||
The rationale to represent filesystem paths using dedicated classes is the
|
||||
same as for other kinds of stateless objects, such as dates, times or IP
|
||||
addresses. Python has been slowly moving away from strictly replicating
|
||||
the C language's APIs to providing better, more helpful abstractions around
|
||||
all kinds of common functionality. Even if this PEP isn't accepted, it is
|
||||
likely that another form of filesystem handling abstraction will be adopted
|
||||
one day into the standard library.
|
||||
|
||||
Indeed, many people will prefer handling dates and times using the high-level
|
||||
objects provided by the ``datetime`` module, rather than using numeric
|
||||
timestamps and the ``time`` module API. Moreover, using a dedicated class
|
||||
makes it possible to enable desirable behaviours by default, for example the case
|
||||
insensitivity of Windows paths.
|
||||
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
||||
Class hierarchy
|
||||
---------------
|
||||
|
||||
The `pathlib`_ module implements a simple hierarchy of classes::
|
||||
|
||||
                      +----------+
                      |          |
             ---------| PurePath |--------
             |        |          |       |
             |        +----------+       |
             |             |             |
             |             |             |
             v             |             v
      +---------------+    |    +------------+
      |               |    |    |            |
      | PurePosixPath |    |    | PureNTPath |
      |               |    |    |            |
      +---------------+    |    +------------+
             |             v             |
             |          +------+         |
             |          |      |         |
             |   -------| Path |------   |
             |   |      |      |     |   |
             |   |      +------+     |   |
             |   |                   |   |
             |   |                   |   |
             v   v                   v   v
        +-----------+             +--------+
        |           |             |        |
        | PosixPath |             | NTPath |
        |           |             |        |
        +-----------+             +--------+
|
||||
|
||||
|
||||
This hierarchy divides path classes along two dimensions:
|
||||
|
||||
* a path class can be either pure or concrete: pure classes support only
|
||||
operations that don't need to do any actual I/O, which are most path
|
||||
manipulation operations; concrete classes support all the operations
|
||||
of pure classes, plus operations that do I/O.
|
||||
|
||||
* a path class is of a given flavour according to the kind of operating
|
||||
system paths it represents. `pathlib`_ implements two flavours: NT paths
|
||||
for the filesystem semantics embodied in Windows systems, POSIX paths for
|
||||
other systems (``os.name``'s terminology is re-used here).
|
||||
|
||||
Any pure class can be instantiated on any system: for example, you can
|
||||
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
|
||||
under Unix, and so on. However, concrete classes can only be instantiated
|
||||
on a matching system: indeed, it would be error-prone to start doing I/O
|
||||
with ``NTPath`` objects under Unix, or vice-versa.
|
||||
|
||||
Furthermore, there are two base classes which also act as system-dependent
|
||||
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
|
||||
``PureNTPath`` depending on the operating system. Similarly, ``Path``
|
||||
will instantiate either a ``PosixPath`` or a ``NTPath``.
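As a rough illustration of the pure/concrete split, the sketch below uses
the ``pathlib`` module as it later shipped in the standard library. This is
an assumption of the example: there, the NT flavour is spelled
``PureWindowsPath`` / ``WindowsPath`` rather than ``PureNTPath`` / ``NTPath``.

```python
import os
from pathlib import (PurePath, PurePosixPath, PureWindowsPath,
                     PosixPath, WindowsPath)

# Pure classes of either flavour can be created on any system.
posix = PurePosixPath('/usr/bin')
nt = PureWindowsPath('c:/Windows')

# PurePath acts as a factory for the flavour of the running system.
native_pure = PurePosixPath if os.name == 'posix' else PureWindowsPath
factory_ok = type(PurePath('setup.py')) is native_pure

# A concrete class of the *other* flavour refuses to be instantiated.
foreign = WindowsPath if os.name == 'posix' else PosixPath
try:
    foreign('setup.py')
    foreign_allowed = True
except NotImplementedError:
    foreign_allowed = False
```

Only the foreign concrete class is rejected; the foreign pure class stays
usable for path manipulation.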
|
||||
|
||||
It is expected that, in most uses, using the ``Path`` class is adequate,
|
||||
which is why it has the shortest name of all.
|
||||
|
||||
|
||||
No confusion with builtins
|
||||
--------------------------
|
||||
|
||||
In this proposal, the path classes do not derive from a builtin type. This
|
||||
contrasts with some other Path class proposals which were derived from
|
||||
``str``. They also do not pretend to implement the sequence protocol:
|
||||
if you want a path to act as a sequence, you have to look up a dedicated
|
||||
attribute (the ``parts`` attribute).
|
||||
|
||||
By not posing as builtin types, the path classes minimize the potential
|
||||
for confusion if they are combined by accident with genuine builtin types.
|
||||
|
||||
|
||||
Immutability
|
||||
------------
|
||||
|
||||
Path objects are immutable, which makes them hashable and also prevents a
|
||||
class of programming errors.
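A small sketch of what immutability buys in practice, again assuming the
standard-library ``pathlib`` behaves as this section describes: equal paths
hash equally, and there is no supported way to modify a path in place.

```python
from pathlib import PurePosixPath

a = PurePosixPath('docs/Makefile')
b = PurePosixPath('docs//Makefile')   # normalizes to the same path

# Equal paths hash equally, so they work as set members and dict keys.
unique = {a, b}
sizes = {a: 928}

# Path properties are read-only: assignment is rejected.
try:
    a.name = 'other'
    mutated = True
except AttributeError:
    mutated = False
```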
|
||||
|
||||
|
||||
Sane behaviour
|
||||
--------------
|
||||
|
||||
Little of the functionality from ``os.path`` is reused. Many ``os.path`` functions
|
||||
are tied by backwards compatibility to confusing or plain wrong behaviour
|
||||
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
|
||||
components without resolving symlinks first).
|
||||
|
||||
Also, using classes instead of plain strings helps make system-dependent
|
||||
behaviours natural. For example, comparing and ordering Windows path
|
||||
objects is case-insensitive, and path separators are automatically converted
|
||||
to the platform default.
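The two points above can be sketched as follows. The example assumes the
standard-library ``pathlib`` (where the NT flavour is named
``PureWindowsPath``) and uses ``posixpath.normpath()``, the textual
normalization that ``os.path.abspath()`` relies on; the paths are made up.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# String-based normalization collapses ".." textually. If "link" were a
# symlink, the result could point somewhere else entirely.
collapsed = posixpath.normpath('/srv/link/../data')

# A path object keeps ".." until the filesystem is actually consulted.
kept = str(PurePosixPath('/srv/link/../data'))

# Windows-flavoured paths compare case-insensitively and print with "\".
same = PureWindowsPath('C:/Python33/BIN') == PureWindowsPath('c:\\python33\\bin')
shown = str(PureWindowsPath('c:/python33/bin'))

# POSIX-flavoured paths stay case-sensitive.
differ = PurePosixPath('/Tmp') != PurePosixPath('/tmp')
```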
|
||||
|
||||
|
||||
Useful notations
|
||||
----------------
|
||||
|
||||
The API tries to provide useful notations while avoiding magic.
|
||||
Some examples::
|
||||
|
||||
>>> p = Path('/home/antoine/pathlib/setup.py')
|
||||
>>> p.name
|
||||
'setup.py'
|
||||
>>> p.ext
|
||||
'.py'
|
||||
>>> p.root
|
||||
'/'
|
||||
>>> p.parts
|
||||
<PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
|
||||
>>> list(p.parents())
|
||||
[PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
|
||||
>>> p.exists()
|
||||
True
|
||||
>>> p.st_size
|
||||
928
|
||||
|
||||
|
||||
Pure paths API
|
||||
==============
|
||||
|
||||
The philosophy of the ``PurePath`` API is to provide a consistent array of
|
||||
useful path manipulation operations, without exposing a hodge-podge of
|
||||
functions like ``os.path`` does.
|
||||
|
||||
|
||||
Definitions
|
||||
-----------
|
||||
|
||||
First a couple of conventions:
|
||||
|
||||
* All paths can have a drive and a root. For POSIX paths, the drive is
|
||||
always empty.
|
||||
|
||||
* A relative path has neither drive nor root.
|
||||
|
||||
* A POSIX path is absolute if it has a root. A Windows path is absolute if
|
||||
it has both a drive *and* a root. A Windows UNC path (e.g.
|
||||
``\\some\share\myfile.txt``) always has a drive and a root
(here, ``\\some\share`` and ``\``, respectively).
|
||||
|
||||
* A path which has either a drive *or* a root is said to be anchored.
|
||||
Its anchor is the concatenation of the drive and root. Under POSIX,
|
||||
"anchored" is the same as "absolute".
|
||||
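These definitions can be checked against the standard-library ``pathlib``,
assuming its ``drive``, ``root``, ``anchor`` and ``is_absolute()`` follow the
conventions above. ``describe()`` is a hypothetical helper introduced only
for this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath

def describe(path):
    """Return (drive, root, anchor, is_absolute) for a pure path."""
    return (path.drive, path.root, path.anchor, path.is_absolute())

posix_abs = describe(PurePosixPath('/etc/passwd'))
relative = describe(PureWindowsPath('docs/Makefile'))
drive_only = describe(PureWindowsPath('c:setup.py'))   # anchored, not absolute
root_only = describe(PureWindowsPath('/Windows'))      # anchored, not absolute
unc = describe(PureWindowsPath('//some/share/myfile.txt'))
```

The two "anchored, not absolute" cases show why the Windows flavour needs
both a drive and a root before a path counts as absolute.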
|
||||
|
||||
Construction and joining
|
||||
------------------------
|
||||
|
||||
We will present construction and joining together since they expose
|
||||
similar semantics.
|
||||
|
||||
The simplest way to construct a path is to pass it its string representation::
|
||||
|
||||
>>> PurePath('setup.py')
|
||||
PurePosixPath('setup.py')
|
||||
|
||||
Extraneous path separators and ``"."`` components are eliminated::
|
||||
|
||||
>>> PurePath('a///b/c/./d/')
|
||||
PurePosixPath('a/b/c/d')
|
||||
|
||||
If you pass several arguments, they will be automatically joined::
|
||||
|
||||
>>> PurePath('docs', 'Makefile')
|
||||
PurePosixPath('docs/Makefile')
|
||||
|
||||
Joining semantics are similar to ``os.path.join()``, in that anchored paths ignore
|
||||
the information from the previously joined components::
|
||||
|
||||
>>> PurePath('/etc', '/usr', 'bin')
|
||||
PurePosixPath('/usr/bin')
|
||||
|
||||
However, with Windows paths, the drive is retained as necessary::
|
||||
|
||||
>>> PureNTPath('c:/foo', '/Windows')
|
||||
PureNTPath('c:\\Windows')
|
||||
>>> PureNTPath('c:/foo', 'd:')
|
||||
PureNTPath('d:')
|
||||
|
||||
Calling the constructor without any argument creates a path object pointing
|
||||
to the logical "current directory"::
|
||||
|
||||
>>> PurePosixPath()
|
||||
PurePosixPath('.')
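The construction rules above, collected into one runnable sketch. It assumes
the standard-library ``pathlib``, in which ``PureWindowsPath`` plays the role
of ``PureNTPath``.

```python
from pathlib import PurePosixPath, PureWindowsPath

normalized = PurePosixPath('a///b/c/./d/')          # extra "/" and "." dropped
joined = PurePosixPath('docs', 'Makefile')          # arguments are joined
restarted = PurePosixPath('/etc', '/usr', 'bin')    # a rooted part restarts
kept_drive = PureWindowsPath('c:/foo', '/Windows')  # root resets, drive stays
new_drive = PureWindowsPath('c:/foo', 'd:')         # a new drive replaces all
current = PurePosixPath()                           # no argument means "."
```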
|
||||
|
||||
A path can be joined with another using the ``__getitem__`` operator::
|
||||
|
||||
>>> p = PurePosixPath('foo')
|
||||
>>> p['bar']
|
||||
PurePosixPath('foo/bar')
|
||||
>>> p[PurePosixPath('bar')]
|
||||
PurePosixPath('foo/bar')
|
||||
|
||||
As with constructing, multiple path components can be specified at once::
|
||||
|
||||
>>> p['bar/xyzzy']
|
||||
PurePosixPath('foo/bar/xyzzy')
|
||||
|
||||
A ``join()`` method is also provided, with the same behaviour. It can serve
|
||||
as a factory function::
|
||||
|
||||
>>> path_factory = p.join
|
||||
>>> path_factory('bar')
|
||||
PurePosixPath('foo/bar')
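To make the relationship between indexing and ``join()`` concrete, here is a
deliberately tiny model. ``SketchPath`` is a hypothetical class written for
this illustration only; it is not how ``pathlib`` is implemented and it
handles POSIX-style strings only.

```python
import posixpath

class SketchPath:
    """Hypothetical toy path class; not the pathlib implementation."""

    def __init__(self, *parts):
        # posixpath.join() already lets a rooted part restart the path.
        self._text = posixpath.join(*map(str, parts)) if parts else '.'

    def join(self, *others):
        # Joining never mutates: it builds and returns a new object.
        return SketchPath(self._text, *others)

    def __getitem__(self, key):
        # p['bar'] is just another spelling of p.join('bar').
        return self.join(key)

    def __str__(self):
        return self._text

    def __repr__(self):
        return 'SketchPath({!r})'.format(self._text)

p = SketchPath('foo')
path_factory = p.join        # a bound method usable as a factory
```

Because ``join()`` returns a new object, the bound method can be handed
around as a factory without any risk of altering ``p``.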
|
||||
|
||||
|
||||
Representing
|
||||
------------
|
||||
|
||||
To represent a path (e.g. to pass it to third-party libraries), just call
|
||||
``str()`` on it::
|
||||
|
||||
>>> p = PurePath('/home/antoine/pathlib/setup.py')
|
||||
>>> str(p)
|
||||
'/home/antoine/pathlib/setup.py'
|
||||
>>> p = PureNTPath('c:/windows')
|
||||
>>> str(p)
|
||||
'c:\\windows'
|
||||
|
||||
To force the string representation with forward slashes, use the ``as_posix()``
|
||||
method::
|
||||
|
||||
>>> p.as_posix()
|
||||
'c:/windows'
|
||||
|
||||
To get the bytes representation (which might be useful under Unix systems),
|
||||
call ``bytes()`` on it, or use the ``as_bytes()`` method::
|
||||

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'
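The three representations side by side, as a sketch against the
standard-library ``pathlib`` (an assumption; only ``str()``, ``as_posix()``
and ``bytes()`` are exercised here).

```python
from pathlib import PurePosixPath, PureWindowsPath

nt = PureWindowsPath('c:/windows')
native = str(nt)          # backslashes for the Windows flavour
forward = nt.as_posix()   # forward slashes on request

posix = PurePosixPath('/home/antoine/pathlib/setup.py')
raw = bytes(posix)        # encoded with the filesystem encoding
```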
|
||||
|
||||
|
||||
Properties
|
||||
----------
|
||||
|
||||
Five simple properties are provided on every path (each can be empty)::
|
||||
|
||||
>>> p = PureNTPath('c:/pathlib/setup.py')
|
||||
>>> p.drive
|
||||
'c:'
|
||||
>>> p.root
|
||||
'\\'
|
||||
>>> p.anchor
|
||||
'c:\\'
|
||||
>>> p.name
|
||||
'setup.py'
|
||||
>>> p.ext
|
||||
'.py'
|
||||
|
||||
|
||||
Sequence-like access
|
||||
--------------------
|
||||
|
||||
The ``parts`` property provides read-only sequence access to a path object::
|
||||
|
||||
>>> p = PurePosixPath('/etc/init.d')
|
||||
>>> p.parts
|
||||
<PurePosixPath.parts: ['/', 'etc', 'init.d']>
|
||||
|
||||
Simple indexing returns the individual path component as a string, while
|
||||
slicing returns a new path object constructed from the selected components::
|
||||
|
||||
>>> p.parts[-1]
|
||||
'init.d'
|
||||
>>> p.parts[:-1]
|
||||
PurePosixPath('/etc')
|
||||
|
||||
Windows paths handle the drive and the root as a single path component::
|
||||
|
||||
>>> p = PureNTPath('c:/setup.py')
|
||||
>>> p.parts
|
||||
<PureNTPath.parts: ['c:\\', 'setup.py']>
|
||||
>>> p.root
|
||||
'\\'
|
||||
>>> p.parts[0]
|
||||
'c:\\'
|
||||
|
||||
(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
|
||||
|
||||
The ``parent()`` method returns an ancestor of the path::
|
||||

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> p.parent()
    PureNTPath('c:\\python33\\bin')
    >>> p.parent(2)
    PureNTPath('c:\\python33')
    >>> p.parent(3)
    PureNTPath('c:\\')
|
||||
|
||||
The ``parents()`` method automates repeated invocations of ``parent()``, until
|
||||
the anchor is reached::
|
||||
|
||||
>>> p = PureNTPath('c:/python33/bin/python.exe')
|
||||
>>> for parent in p.parents(): parent
|
||||
...
|
||||
PureNTPath('c:\\python33\\bin')
|
||||
PureNTPath('c:\\python33')
|
||||
PureNTPath('c:\\')
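For comparison, a sketch of the same navigation with the standard-library
``pathlib``. This rests on the assumption that the shipped spellings apply:
there, ``parts`` is a plain tuple, ``parent`` is a property and ``parents``
is a sequence, rather than the methods shown in this draft.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('c:/python33/bin/python.exe')

parts = p.parts            # drive and root form a single first component
last = p.parts[-1]
first_parent = p.parent
ancestors = list(p.parents)
```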
|
||||
|
||||
|
||||
Querying
|
||||
--------
|
||||
|
||||
``is_relative()`` returns True if the path is relative (see definition
|
||||
above), False otherwise.
|
||||
|
||||
``is_reserved()`` returns True if a Windows path is a reserved path such
|
||||
as ``CON`` or ``NUL``. It always returns False for POSIX paths.
|
||||
|
||||
``match()`` matches the path against a glob pattern::
|
||||
|
||||
>>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
|
||||
True
|
||||
|
||||
``relative()`` returns a new relative path by stripping the drive and root::
|
||||
|
||||
>>> PurePosixPath('setup.py').relative()
|
||||
PurePosixPath('setup.py')
|
||||
>>> PurePosixPath('/setup.py').relative()
|
||||
PurePosixPath('setup.py')
|
||||
|
||||
``relative_to()`` computes the relative difference of a path to another::
|
||||
|
||||
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
|
||||
PurePosixPath('bin/python')
|
||||
|
||||
``normcase()`` returns a case-folded version of the path for NT paths::
|
||||
|
||||
>>> PurePosixPath('CAPS').normcase()
|
||||
PurePosixPath('CAPS')
|
||||
>>> PureNTPath('CAPS').normcase()
|
||||
PureNTPath('caps')
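A few of these queries, sketched with the standard-library ``pathlib``. As
an assumption of this example, ``is_absolute()`` stands in for the inverse of
``is_relative()``, and ``relative_to(anchor)`` stands in for ``relative()``,
since those are the spellings known to exist there.

```python
from pathlib import PurePosixPath, PureWindowsPath

absolute = PurePosixPath('/usr/bin/python')

is_abs = absolute.is_absolute()
suffix_match = absolute.match('bin/*')                 # matched from the right
nt_match = PureWindowsPath('SETUP.PY').match('*.py')   # case-insensitive
below_usr = absolute.relative_to('/usr')
unanchored = absolute.relative_to(absolute.anchor)     # strips the root
```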
|
||||
|
||||
|
||||
Concrete paths API
|
||||
==================
|
||||
|
||||
In addition to the operations of the pure API, concrete paths provide
|
||||
additional methods which actually access the filesystem to query or mutate
|
||||
information.
|
||||
|
||||
|
||||
Constructing
|
||||
------------
|
||||
|
||||
The classmethod ``cwd()`` creates a path object pointing to the current
|
||||
working directory in absolute form::
|
||||
|
||||
>>> Path.cwd()
|
||||
PosixPath('/home/antoine/pathlib')
|
||||
|
||||
|
||||
File metadata
|
||||
-------------
|
||||
|
||||
The ``stat()`` method caches and returns the file's stat() result;
|
||||
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
|
||||
but doesn't have any caching behaviour::
|
||||
|
||||
>>> p.stat()
|
||||
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
|
||||
|
||||
For ease of use, direct attribute access to the fields of the stat structure
|
||||
is provided over the path object itself::
|
||||
|
||||
>>> p.st_size
|
||||
928
|
||||
>>> p.st_mtime
|
||||
1328287308.889562
|
||||
|
||||
Higher-level methods help examine the kind of the file::
|
||||
|
||||
>>> p.exists()
|
||||
True
|
||||
>>> p.is_file()
|
||||
True
|
||||
>>> p.is_dir()
|
||||
False
|
||||
>>> p.is_symlink()
|
||||
False
|
||||
|
||||
The file owner and group names (rather than numeric ids) are queried
|
||||
through matching properties::
|
||||
|
||||
>>> p = Path('/etc/shadow')
|
||||
>>> p.owner
|
||||
'root'
|
||||
>>> p.group
|
||||
'shadow'
|
||||
|
||||
|
||||
Path resolution
|
||||
---------------
|
||||
|
||||
The ``resolve()`` method makes a path absolute, resolving any symlink on
|
||||
the way. It is the only operation which will remove "``..``" path components.
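The difference between resolving and textually collapsing "``..``" shows up
as soon as a symlink is involved. The sketch below assumes a POSIX-like
system where the process may create symlinks, and uses the standard-library
``pathlib``; the directory names are invented.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()          # the temp dir itself may be a symlink
    (base / 'real' / 'sub').mkdir(parents=True)
    (base / 'link').symlink_to(base / 'real' / 'sub')

    dotted = base / 'link' / '..'
    # resolve() follows the symlink first, then applies "..".
    resolved = dotted.resolve()
    # Textual normalization drops "link/.." without looking at the disk.
    collapsed = Path(os.path.normpath(str(dotted)))
```

Here ``resolved`` ends up in ``real`` while ``collapsed`` ends up in the
base directory, which is the kind of discrepancy described earlier for
``os.path.abspath()``.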
|
||||
|
||||
|
||||
Directory walking
|
||||
-----------------
|
||||
|
||||
Simple (non-recursive) directory access is done by iteration::
|
||||
|
||||
>>> p = Path('docs')
|
||||
>>> for child in p: child
|
||||
...
|
||||
PosixPath('docs/conf.py')
|
||||
PosixPath('docs/_templates')
|
||||
PosixPath('docs/make.bat')
|
||||
PosixPath('docs/index.rst')
|
||||
PosixPath('docs/_build')
|
||||
PosixPath('docs/_static')
|
||||
PosixPath('docs/Makefile')
|
||||
|
||||
This allows simple filtering through list comprehensions::
|
||||
|
||||
>>> p = Path('.')
|
||||
>>> [child for child in p if child.is_dir()]
|
||||
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
|
||||
|
||||
Simple and recursive globbing is also provided::
|
||||
|
||||
>>> for child in p.glob('**/*.py'): child
|
||||
...
|
||||
PosixPath('test_pathlib.py')
|
||||
PosixPath('setup.py')
|
||||
PosixPath('pathlib.py')
|
||||
PosixPath('docs/conf.py')
|
||||
PosixPath('build/lib/pathlib.py')
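A self-contained version of the walking examples, run against a throwaway
directory. It assumes the standard-library ``pathlib``, where non-recursive
listing is spelled ``iterdir()`` instead of iterating over the path object
itself; the file names are placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'docs').mkdir()
    (root / 'setup.py').touch()
    (root / 'docs' / 'conf.py').touch()
    (root / 'docs' / 'index.rst').touch()

    # Non-recursive listing of the direct children.
    children = sorted(child.name for child in root.iterdir())
    # Filtering with a comprehension, as in the text above.
    subdirs = [child.name for child in root.iterdir() if child.is_dir()]
    # Recursive globbing: "**" also matches the starting directory.
    scripts = sorted(f.relative_to(root).as_posix()
                     for f in root.glob('**/*.py'))
```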
|
||||
|
||||
|
||||
File opening
|
||||
------------
|
||||
|
||||
The ``open()`` method provides a file opening API similar to the builtin
|
||||
``open()`` function::
|
||||
|
||||
>>> p = Path('setup.py')
|
||||
>>> with p.open() as f: f.readline()
|
||||
...
|
||||
'#!/usr/bin/env python3\n'
|
||||
|
||||
The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
|
||||
|
||||
>>> fd = p.raw_open(os.O_RDONLY)
|
||||
>>> os.read(fd, 15)
|
||||
b'#!/usr/bin/env '
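Both styles of opening, in a form that can be run anywhere. This sketch
assumes the standard-library ``pathlib`` and uses plain ``os.open()`` for the
low-level case, since ``raw_open()`` is specific to this proposal; the file
is created in a temporary directory.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'setup.py'
    p.write_text('#!/usr/bin/env python3\nprint("hi")\n')

    # High-level: a file object, as with the builtin open().
    with p.open() as f:
        first_line = f.readline()

    # Low-level: a raw file descriptor, as with os.open().
    fd = os.open(str(p), os.O_RDONLY)
    try:
        head = os.read(fd, 15)
    finally:
        os.close(fd)
```

The descriptor returned by the low-level call is not managed for the caller,
hence the explicit ``os.close()``.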
|
||||
|
||||
|
||||
Filesystem alteration
|
||||
---------------------
|
||||
|
||||
Several common filesystem operations are provided as methods: ``touch()``,
|
||||
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
|
||||
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
|
||||
provided, for example some of the functionality of the shutil module.
|
||||
|
||||
|
||||
Experimental openat() support
|
||||
-----------------------------
|
||||
|
||||
On compatible POSIX systems, the concrete PosixPath class can take advantage
|
||||
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
|
||||
open file descriptors as necessary. Support is enabled by passing the
|
||||
*use_openat* argument to the constructor::
|
||||
|
||||
>>> p = Path(".", use_openat=True)
|
||||
|
||||
Then all paths constructed by navigating this path (either by iteration or
|
||||
indexing) will also use the openat() family of functions. The point of using
|
||||
these functions is to avoid race conditions whereby a given directory is
|
||||
silently replaced with another (often a symbolic link to a sensitive system
|
||||
location) between two accesses.
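The underlying mechanism can be sketched with the ``dir_fd`` parameter of
``os.open()``, which maps to ``openat()`` where the platform supports it.
This is only an illustration of the idea, not the bookkeeping that
``PosixPath`` would perform; ``read_relative()`` and the file name are
hypothetical.

```python
import os
import tempfile

def read_relative(dir_fd, name, size=1024):
    """Read up to `size` bytes from `name`, resolved relative to `dir_fd`."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)

data = None
if os.open in os.supports_dir_fd:          # openat() is available
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'secret.txt'), 'wb') as f:
            f.write(b'hello')
        # Pin the directory once; later lookups go through the descriptor,
        # so replacing the directory's *name* cannot redirect them.
        dir_fd = os.open(tmp, os.O_RDONLY)
        try:
            data = read_relative(dir_fd, 'secret.txt')
        finally:
            os.close(dir_fd)
```

On platforms without ``openat()``, the guard leaves ``data`` as ``None``
instead of raising.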
|
||||
|
||||
.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed into the public domain.
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|