205 lines
6.4 KiB
ReStructuredText
205 lines
6.4 KiB
ReStructuredText
PEP: 721
|
|
Title: Using tarfile.data_filter for source distribution extraction
|
|
Author: Petr Viktorin <encukou@gmail.com>
|
|
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
|
|
Status: Accepted
|
|
Type: Standards Track
|
|
Topic: Packaging
|
|
Content-Type: text/x-rst
|
|
Requires: 706
|
|
Created: 12-Jul-2023
|
|
Python-Version: 3.12
|
|
Post-History: `04-Jul-2023 <https://discuss.python.org/t/28928>`__,
|
|
Resolution: https://discuss.python.org/t/28928/13
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Extracting a source distribution archive should normally use the ``data``
|
|
filter added in :pep:`706`.
|
|
We clarify details, and specify the behaviour for tools that cannot use the
|
|
filter directly.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
The *source distribution* ``sdist`` is defined as a tar archive.
|
|
|
|
The ``tar`` format is designed to capture all metadata of Unix-like files.
|
|
Some of these are dangerous, unnecessary for source code, and/or
|
|
platform-dependent.
|
|
As explained in :pep:`706`, when extracting a tarball, one should always either
|
|
limit the allowed features, or explicitly give the tarball total control.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
For source distributions, the ``data`` filter introduced in :pep:`706`
|
|
is enough. It allows slightly more features than ``git`` and ``zip`` (both
|
|
commonly used in packaging workflows).
|
|
|
|
However, not all tools can use the ``data`` filter,
|
|
so this PEP specifies an explicit set of expectations.
|
|
The aim is that the current behaviour of ``pip download``
|
|
and ``setuptools.archive_util.unpack_tarfile`` is valid,
|
|
except cases deemed too dangerous to allow.
|
|
Another consideration is ease of implementation for non-Python tools.
|
|
|
|
|
|
Unpatched versions of Python
|
|
----------------------------
|
|
|
|
Tools are allowed to ignore this PEP when running on Python without tarfile
|
|
filters.
|
|
|
|
The feature has been backported to all versions of Python supported by
|
|
``python.org``. Vendoring it in third-party libraries is tricky,
|
|
and we should not force all tools to do so.
|
|
This shifts the responsibility to keep up with security updates from the tools
|
|
to the users.
|
|
|
|
|
|
Permissions
|
|
-----------
|
|
|
|
Common tools (``git``, ``zip``) don't preserve Unix permissions (mode bits).
|
|
Telling users to not rely on them in *sdists*, and allowing tools to handle
|
|
them relatively freely, seems fair.
|
|
|
|
The only exception is the *executable* permission.
|
|
We recommend, but not require, that tools preserve it.
|
|
Given that scripts are generally platform-specific, it seems fitting to
|
|
say that keeping them executable is tool-specific behaviour.
|
|
|
|
Note that while ``git`` preserves executability, ``zip`` (and thus ``wheel``)
|
|
doesn't do it natively. (It is possible to encode it in “external attributes”,
|
|
but Python's ``ZipFile.extract`` does not honour that.)
|
|
|
|
|
|
Specification
|
|
=============
|
|
|
|
The following will be added to `the PyPA source distribution format spec <https://packaging.python.org/en/latest/specifications/source-distribution-format/>`_
|
|
under a new heading, “*Source distribution archive features*”:
|
|
|
|
Because extracting tar files as-is is dangerous, and the results are
|
|
platform-specific, archive features of source distributions are limited.
|
|
|
|
Unpacking with the data filter
|
|
------------------------------
|
|
|
|
When extracting a source distribution, tools MUST either use
|
|
``tarfile.data_filter`` (e.g. ``TarFile.extractall(..., filter='data')``), OR
|
|
follow the *Unpacking without the data filter* section below.
|
|
|
|
As an exception, on Python interpreters without ``hasattr(tarfile, 'data_filter')``
|
|
(:pep:`706`), tools that normally use that filter (directly on indirectly)
|
|
MAY warn the user and ignore this specification.
|
|
The trade-off between usability (e.g. fully trusting the archive) and
|
|
security (e.g. refusing to unpack) is left up to the tool in this case.
|
|
|
|
|
|
Unpacking without the data filter
|
|
---------------------------------
|
|
|
|
Tools that do not use the ``data`` filter directly (e.g. for backwards
|
|
compatibility, allowing additional features, or not using Python) MUST follow
|
|
this section.
|
|
(At the time of this writing, the ``data`` filter also follows this section,
|
|
but it may get out of sync in the future.)
|
|
|
|
The following files are invalid in an ``sdist`` archive.
|
|
Upon encountering such an entry, tools SHOULD notify the user,
|
|
MUST NOT unpack the entry, and MAY abort with a failure:
|
|
|
|
- Files that would be placed outside the destination directory.
|
|
- Links (symbolic or hard) pointing outside the destination directory.
|
|
- Device files (including pipes).
|
|
|
|
The following are also invalid. Tools MAY treat them as above,
|
|
but are NOT REQUIRED to do so:
|
|
|
|
- Files with a ``..`` component in the filename or link target.
|
|
- Links pointing to a file that is not part of the archive.
|
|
|
|
Tools MAY unpack links (symbolic or hard) as regular files,
|
|
using content from the archive.
|
|
|
|
When extracting ``sdist`` archives:
|
|
|
|
- Leading slashes in file names MUST be dropped.
|
|
(This is nowadays standard behaviour for ``tar`` unpacking.)
|
|
- For each ``mode`` (Unix permission) bit, tools MUST either:
|
|
|
|
- use the platform's default for a new file/directory (respectively),
|
|
- set the bit according to the archive, or
|
|
- use the bit from ``rw-r--r--`` (``0o644``) for non-executable files or
|
|
``rwxr-xr-x`` (``0o755``) for executable files and directories.
|
|
|
|
- High ``mode`` bits (setuid, setgid, sticky) MUST be cleared.
|
|
- It is RECOMMENDED to preserve the user *executable* bit.
|
|
|
|
|
|
Further hints
|
|
-------------
|
|
|
|
Tool authors are encouraged to consider how *hints for further
|
|
verification* in ``tarfile`` documentation apply for their tool.
|
|
|
|
|
|
Backwards Compatibility
|
|
=======================
|
|
|
|
The existing behaviour is unspecified, and treated differently by different
|
|
tools.
|
|
This PEP makes the expectations explicit.
|
|
|
|
There is no known case of backwards incompatibility, but some project out there
|
|
probably does rely on details that aren't guaranteed.
|
|
This PEP bans the most dangerous of those features, and the rest is
|
|
made tool-specific.
|
|
|
|
|
|
Security Implications
|
|
=====================
|
|
|
|
The recommended ``data`` filter is believed safe against common exploits,
|
|
and is a single place to amend if flaws are found in the future.
|
|
|
|
The explicit specification includes protections from the ``data`` filter.
|
|
|
|
|
|
How to Teach This
|
|
=================
|
|
|
|
The PEP is aimed at authors of packaging tools, who should be fine with
|
|
a PEP and an updated packaging spec.
|
|
|
|
|
|
Reference Implementation
|
|
========================
|
|
|
|
TBD
|
|
|
|
|
|
Rejected Ideas
|
|
==============
|
|
|
|
None yet.
|
|
|
|
|
|
Open Issues
|
|
===========
|
|
|
|
None yet.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document is placed in the public domain or under the
|
|
CC0-1.0-Universal license, whichever is more permissive.
|