PEP 721 (new): Using tarfile.data_filter for source distribution extraction (#3198)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Petr Viktorin 2023-07-13 12:01:17 +02:00 committed by GitHub
parent 08124264e6
commit b91880181b
2 changed files with 205 additions and 0 deletions

2
.github/CODEOWNERS vendored

@@ -597,6 +597,8 @@ pep-0713.rst @ambv
pep-0714.rst @dstufft
pep-0715.rst @dstufft
pep-0719.rst @Yhg1s
# pep-0720.rst (reserved for https://github.com/python/peps/pull/3192)
pep-0721.rst @encukou
# ...
# pep-0754.txt
# ...

203
pep-0721.rst Normal file

@@ -0,0 +1,203 @@
PEP: 721
Title: Using tarfile.data_filter for source distribution extraction
Author: Petr Viktorin <encukou@gmail.com>
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Requires: 706
Created: 12-Jul-2023
Python-Version: 3.12
Post-History: `04-Jul-2023 <https://discuss.python.org/t/28928>`__
Abstract
========
Extracting a source distribution archive should normally use the ``data``
filter added in :pep:`706`.
We clarify details, and specify the behaviour for tools that cannot use the
filter directly.
Motivation
==========
The *source distribution* ``sdist`` is defined as a tar archive.
The ``tar`` format is designed to capture all metadata of Unix-like files.
Some of this metadata is dangerous, unnecessary for source code, and/or
platform-dependent.
As explained in :pep:`706`, when extracting a tarball, one should always either
limit the allowed features, or explicitly give the tarball total control.
Rationale
=========
For source distributions, the ``data`` filter introduced in :pep:`706`
is enough. It allows slightly more features than ``git`` and ``zip`` (both
commonly used in packaging workflows).
However, not all tools can use the ``data`` filter,
so this PEP specifies an explicit set of expectations.
The aim is that the current behaviour of ``pip download``
and ``setuptools.archive_util.unpack_tarfile`` is valid,
except cases deemed too dangerous to allow.
Another consideration is ease of implementation for non-Python tools.
Unpatched versions of Python
----------------------------
Tools are allowed to ignore this PEP when running on a Python without tarfile
filters.
The feature has been backported to all versions of Python supported by
``python.org``. Vendoring it in third-party libraries is tricky,
and we should not force all tools to do so.
This shifts the responsibility to keep up with security updates from the tools
to the users.
Permissions
-----------
Common tools (``git``, ``zip``) don't preserve Unix permissions (mode bits).
Telling users to not rely on them in *sdists*, and allowing tools to handle
them relatively freely, seems fair.
The only exception is the *executable* permission.
We recommend, but do not require, that tools preserve it.
Given that scripts are generally platform-specific, it seems fitting to
say that keeping them executable is tool-specific behaviour.
Note that while ``git`` preserves executability, ``zip`` (and thus ``wheel``)
doesn't do it natively. (It is possible to encode it in “external attributes”,
but Python's ``ZipFile.extract`` does not honour that.)
Specification
=============
The following will be added to `the PyPA source distribution format spec <https://packaging.python.org/en/latest/specifications/source-distribution-format/>`_
under a new heading, “*Source distribution archive features*”:
Because extracting tar files as-is is dangerous, and the results are
platform-specific, archive features of source distributions are limited.
Unpacking with the data filter
------------------------------
When extracting a source distribution, tools MUST either use
``tarfile.data_filter`` (e.g. ``TarFile.extractall(..., filter='data')``), OR
follow the *Unpacking without the data filter* section below.
As an exception, on Python interpreters without ``hasattr(tarfile, 'data_filter')``
(:pep:`706`), tools that normally use that filter (directly or indirectly)
MAY warn the user and ignore this specification.
The trade-off between usability (e.g. fully trusting the archive) and
security (e.g. refusing to unpack) is left up to the tool in this case.
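As an illustration only, the rule above could be sketched as follows.
The helper name ``unpack_sdist`` and its return value are hypothetical, and
the fallback branch shows just one of the trade-offs a tool may choose
(warn, then fully trust the archive):

```python
import tarfile
import warnings


def unpack_sdist(archive_path, dest_dir):
    """Extract an sdist tarball; return True if the data filter was used.

    Hypothetical helper, not part of any existing tool.
    """
    with tarfile.open(archive_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # PEP 706 filter: dangerous members are rejected or sanitised.
            tar.extractall(dest_dir, filter="data")
            return True
        # Interpreter without tarfile filters. This sketch picks usability:
        # it warns and then trusts the archive. A stricter tool could
        # refuse to unpack instead.
        warnings.warn(
            "tarfile.data_filter is unavailable; extracting without a filter",
            RuntimeWarning,
        )
        tar.extractall(dest_dir)
        return False
```

A caller can use the return value to record whether the archive was
filtered, for example when logging.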
Unpacking without the data filter
---------------------------------
Tools that do not use the ``data`` filter directly (e.g. for backwards
compatibility, allowing additional features, or not using Python) MUST follow
this section.
(At the time of this writing, the ``data`` filter also follows this section,
but it may get out of sync in the future.)
The following files are invalid in an ``sdist`` archive.
Upon encountering such an entry, tools SHOULD notify the user,
MUST NOT unpack the entry, and MAY abort with a failure:
- Files that would be placed outside the destination directory.
- Links (symbolic or hard) pointing outside the destination directory.
- Device files (including pipes).
The following are also invalid. Tools MAY treat them as above,
but are NOT REQUIRED to do so:
- Files with a ``..`` component in the filename or link target.
- Links pointing to a file that is not part of the archive.
Tools MAY unpack links (symbolic or hard) as regular files,
using content from the archive.
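A minimal sketch of the three MUST NOT checks above, for a tool that
inspects ``tarfile.TarInfo`` members itself. The function name
``forbidden_reason`` is hypothetical, the path handling assumes a POSIX-like
platform, and the logic is a simplification that is not guaranteed to match
``tarfile.data_filter`` in every detail:

```python
import os
import tarfile


def forbidden_reason(member, dest_dir):
    """Return why *member* must not be unpacked, or None if it passes.

    Hypothetical helper covering only the three mandatory checks.
    """
    dest = os.path.realpath(dest_dir)

    def inside(path):
        # True if *path* resolves to the destination or something below it.
        return os.path.commonpath([dest, os.path.realpath(path)]) == dest

    if member.isdev():
        # isdev() covers character devices, block devices and FIFOs.
        return "device file or pipe"

    # Leading slashes are dropped before the name is resolved.
    target = os.path.join(dest, member.name.lstrip("/"))
    if not inside(target):
        return "file outside the destination directory"

    if member.issym() or member.islnk():
        if os.path.isabs(member.linkname):
            return "link pointing outside the destination directory"
        if member.issym():
            # Symlink targets are relative to the link's own directory.
            link = os.path.join(os.path.dirname(target), member.linkname)
        else:
            # Hard link targets are names relative to the archive root.
            link = os.path.join(dest, member.linkname)
        if not inside(link):
            return "link pointing outside the destination directory"
    return None
```

A tool would typically call this for every member before extraction, skip
members with a non-``None`` result, and notify the user of the reason.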
When extracting ``sdist`` archives:
- Leading slashes in file names MUST be dropped.
  (This is nowadays standard behaviour for ``tar`` unpacking.)
- For each ``mode`` (Unix permission) bit, tools MUST either:

  - use the platform's default for a new file/directory (respectively),
  - set the bit according to the archive, or
  - use the bit from ``rw-r--r--`` (``0o644``) for non-executable files or
    ``rwxr-xr-x`` (``0o755``) for executable files and directories.

- High ``mode`` bits (setuid, setgid, sticky) MUST be cleared.
- It is RECOMMENDED to preserve the user *executable* bit.
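The mode rules above can be sketched with two small functions. Both names
are hypothetical, and each shows only one of the permitted choices:
``normalized_mode`` takes every bit from ``0o644``/``0o755``, while
``archive_mode_without_high_bits`` takes every permission bit from the
archive:

```python
import stat


def normalized_mode(archive_mode, is_dir):
    """Map an archive mode to 0o644 or 0o755 (hypothetical helper).

    The user executable bit is preserved, as recommended; setuid, setgid
    and sticky can never appear in the result.
    """
    executable = is_dir or bool(archive_mode & stat.S_IXUSR)
    return 0o755 if executable else 0o644


def archive_mode_without_high_bits(archive_mode):
    """Keep the archive's permission bits but clear setuid/setgid/sticky."""
    return archive_mode & 0o777
```

For example, a setuid executable stored as ``0o4755`` would be written as
``0o755`` by either function, and a private file stored as ``0o600`` would
become ``0o644`` under the first function and stay ``0o600`` under the second.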
Further hints
-------------
Tool authors are encouraged to consider how the *hints for further
verification* in the ``tarfile`` documentation apply to their tool.
Backwards Compatibility
=======================
The existing behaviour is unspecified, and treated differently by different
tools.
This PEP makes the expectations explicit.
There is no known case of backwards incompatibility, but some project out there
probably does rely on details that aren't guaranteed.
This PEP bans the most dangerous of those features, and the rest is
made tool-specific.
Security Implications
=====================
The recommended ``data`` filter is believed to be safe against common exploits,
and is a single place to amend if flaws are found in the future.
The explicit specification includes protections from the ``data`` filter.
How to Teach This
=================
The PEP is aimed at authors of packaging tools, who should be fine with
a PEP and an updated packaging spec.
Reference Implementation
========================
TBD
Rejected Ideas
==============
None yet.
Open Issues
===========
None yet.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.