PEP 721 (new): Using tarfile.data_filter for source distribution extraction (#3198)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
This commit is contained in:
parent
08124264e6
commit
b91880181b
|
@ -597,6 +597,8 @@ pep-0713.rst @ambv
|
|||
pep-0714.rst @dstufft
|
||||
pep-0715.rst @dstufft
|
||||
pep-0719.rst @Yhg1s
|
||||
# pep-0720.rst (reserved for https://github.com/python/peps/pull/3192)
|
||||
pep-0721.rst @encukou
|
||||
# ...
|
||||
# pep-0754.txt
|
||||
# ...
|
||||
|
|
|
@ -0,0 +1,203 @@
|
|||
PEP: 721
|
||||
Title: Using tarfile.data_filter for source distribution extraction
|
||||
Author: Petr Viktorin <encukou@gmail.com>
|
||||
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Topic: Packaging
|
||||
Content-Type: text/x-rst
|
||||
Requires: 706
|
||||
Created: 12-Jul-2023
|
||||
Python-Version: 3.12
|
||||
Post-History: `04-Jul-2023 <https://discuss.python.org/t/28928>`__,
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
Extracting a source distribution archive should normally use the ``data``
|
||||
filter added in :pep:`706`.
|
||||
We clarify details, and specify the behaviour for tools that cannot use the
|
||||
filter directly.
|
||||
|
||||
|
||||
Motivation
|
||||
==========
|
||||
|
||||
The *source distribution* ``sdist`` is defined as a tar archive.
|
||||
|
||||
The ``tar`` format is designed to capture all metadata of Unix-like files.
|
||||
Some of these are dangerous, unnecessary for source code, and/or
|
||||
platform-dependent.
|
||||
As explained in :pep:`706`, when extracting a tarball, one should always either
|
||||
limit the allowed features, or explicitly give the tarball total control.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
For source distributions, the ``data`` filter introduced in :pep:`706`
|
||||
is enough. It allows slightly more features than ``git`` and ``zip`` (both
|
||||
commonly used in packaging workflows).
|
||||
|
||||
However, not all tools can use the ``data`` filter,
|
||||
so this PEP specifies an explicit set of expectations.
|
||||
The aim is that the current behaviour of ``pip download``
|
||||
and ``setuptools.archive_util.unpack_tarfile`` is valid,
|
||||
except cases deemed too dangerous to allow.
|
||||
Another consideration is ease of implementation for non-Python tools.
|
||||
|
||||
|
||||
Unpatched versions of Python
|
||||
----------------------------
|
||||
|
||||
Tools are allowed to ignore this PEP when running on Python pithout tarfile
|
||||
filters.
|
||||
|
||||
The feature has been backported to all versions of Python supported by
|
||||
``python.org``. Vendoring it in third-party libraries is tricky,
|
||||
and we should not force all tools to do so.
|
||||
This shifts the responsibility to keep up with security updates from the tools
|
||||
to the users.
|
||||
|
||||
|
||||
Permissions
|
||||
-----------
|
||||
|
||||
Common tools (``git``, ``zip``) don't preserve Unix permissions (mode bits).
|
||||
Telling users to not rely on them in *sdists*, and allowing tools to handle
|
||||
them relatively freely, seems fair.
|
||||
|
||||
The only exception is the *executable* permission.
|
||||
We recommend, but not require, that tools preserve it.
|
||||
Given that scripts are generally platform-specific, it seems fitting to
|
||||
say that keeping them executable is tool-specific behaviour.
|
||||
|
||||
Note that while ``git`` preserves executability, ``zip`` (and thus ``wheel``)
|
||||
doesn't do it natively. (It is possible to encode it in “external attributes”,
|
||||
but Python's ``ZipFile.extract`` does not honour that.)
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
The following will be added to `the PyPA source distribution format spec <https://packaging.python.org/en/latest/specifications/source-distribution-format/>`_
|
||||
under a new heading, “*Source distribution archive features*”:
|
||||
|
||||
Because extracting tar files as-is is dangerous, and the results are
|
||||
platform-specific, archive features of source distributions are limited.
|
||||
|
||||
Unpacking with the data filter
|
||||
------------------------------
|
||||
|
||||
When extracting a source distribution, tools MUST either use
|
||||
``tarfile.data_filter`` (e.g. ``TarFile.extractall(..., filter='data')``), OR
|
||||
follow the *Unpacking without the data filter* section below.
|
||||
|
||||
As an exception, on Python interpreters without ``hasattr(tarfile, 'data_filter')``
|
||||
(:pep:`706`), tools that normally use that filter (directly on indirectly)
|
||||
MAY warn the user and ignore this specification.
|
||||
The trade-off between usability (e.g. fully trusting the archive) and
|
||||
security (e.g. refusing to unpack) is left up to the tool in this case.
|
||||
|
||||
|
||||
Unpacking without the data filter
|
||||
---------------------------------
|
||||
|
||||
Tools that do not use the ``data`` filter directly (e.g. for backwards
|
||||
compatibility, allowing additional features, or not using Python) MUST follow
|
||||
this section.
|
||||
(At the time of this writing, the ``data`` filter also follows this section,
|
||||
but it may get out of sync in the future.)
|
||||
|
||||
The following files are invalid in an ``sdist`` archive.
|
||||
Upon encountering such an entry, tools SHOULD notify the user,
|
||||
MUST NOT unpack the entry, and MAY abort with a failure:
|
||||
|
||||
- Files that would be placed outside the destination directory.
|
||||
- Links (symbolic or hard) pointing outside the destination directory.
|
||||
- Device files (including pipes).
|
||||
|
||||
The following are also invalid. Tools MAY treat them as above,
|
||||
but are NOT REQUIRED to do so:
|
||||
|
||||
- Files with a ``..`` component in the filename or link target.
|
||||
- Links pointing to a file that is not part of the archive.
|
||||
|
||||
Tools MAY unpack links (symbolic or hard) as regular files,
|
||||
using content from the archive.
|
||||
|
||||
When extracting ``sdist`` archives:
|
||||
|
||||
- Leading slashes in file names MUST be dropped.
|
||||
(This is nowadays standard behaviour for ``tar`` unpacking.)
|
||||
- For each ``mode`` (Unix permission) bit, tools MUST either:
|
||||
|
||||
- use the platform's default for a new file/directory (respectively),
|
||||
- set the bit according to the archive, or
|
||||
- use the bit from ``rw-r--r--`` (``0o644``) for non-executable files or
|
||||
``rwxr-xr-x`` (``0o755``) for executable files and directories.
|
||||
|
||||
- High ``mode`` bits (setuid, setgid, sticky) MUST be cleared.
|
||||
- It is RECOMMENDED to preserve the user *executable* bit.
|
||||
|
||||
|
||||
Further hints
|
||||
-------------
|
||||
|
||||
Tool authors are encouraged to consider how *hints for further
|
||||
verification* in ``tarfile`` documentation apply for their tool.
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
=======================
|
||||
|
||||
The existing behaviour is unspecified, and treated differently by different
|
||||
tools.
|
||||
This PEP makes the expectations explicit.
|
||||
|
||||
There is no known case of backwards incompatibility, but some project out there
|
||||
probably does rely on details that aren't guaranteed.
|
||||
This PEP bans the most dangerous of those features, and the rest is
|
||||
made tool-specific.
|
||||
|
||||
|
||||
Security Implications
|
||||
=====================
|
||||
|
||||
The recommended ``data`` filter is believed safe against common exploits,
|
||||
and is a single place to amend if flaws are found in the future.
|
||||
|
||||
The explicit specification includes protections from the ``data`` filter.
|
||||
|
||||
|
||||
How to Teach This
|
||||
=================
|
||||
|
||||
The PEP is aimed at authors of packaging tools, who should be fine with
|
||||
a PEP and an updated packaging spec.
|
||||
|
||||
|
||||
Reference Implementation
|
||||
========================
|
||||
|
||||
TBD
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
==============
|
||||
|
||||
None yet.
|
||||
|
||||
|
||||
Open Issues
|
||||
===========
|
||||
|
||||
None yet.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document is placed in the public domain or under the
|
||||
CC0-1.0-Universal license, whichever is more permissive.
|
Loading…
Reference in New Issue