PEP 625: Update following discussions (GH-2671)

Paul Moore 2022-07-09 11:27:18 +01:00 committed by GitHub
parent ed9d65c1d4
commit 43782e9251
1 changed files with 102 additions and 61 deletions


@ -1,9 +1,9 @@
PEP: 625 PEP: 625
Title: File name of a Source Distribution Title: Filename of a Source Distribution
Author: Tzu-ping Chung <uranusjr@gmail.com>, Author: Tzu-ping Chung <uranusjr@gmail.com>,
Paul Moore <p.f.moore@gmail.com> Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686 Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686
Status: Draft Status: Deferred
Type: Standards Track Type: Standards Track
Topic: Packaging Topic: Packaging
Content-Type: text/x-rst Content-Type: text/x-rst
@ -14,14 +14,17 @@ Abstract
======== ========
This PEP describes a standard naming scheme for a Source Distribution, also This PEP describes a standard naming scheme for a Source Distribution, also
known as an *sdist*. This scheme distinguishes an sdist from an arbitrary known as an *sdist*. An sdist is distinct from an arbitrary archive file
archive file containing source code of Python packages, and can be used to containing source code of Python packages, and can be used to communicate
communicate information about the distribution to packaging tools. information about the distribution to packaging tools.
A standard sdist specified here is a gzipped tar file with a specially A standard sdist specified here is a gzipped tar file with a specially
formatted file stem and a ``.sdist`` suffix. This PEP does not specify the formatted filename and the usual ``.tar.gz`` suffix. This PEP does not specify
contents of the tarball. the contents of the tarball, as that is covered in other specifications.
**Note**: This PEP has been deferred until :pep:`643` has seen wider adoption
(in particular, until Metadata 2.2 is accepted on PyPI, and a number of common
backends have implemented it).
Motivation Motivation
========== ==========
@ -32,24 +35,25 @@ installation. This format is often considered as an unbuilt counterpart of a
:pep:`427` wheel, and given special treatments in various parts of the :pep:`427` wheel, and given special treatments in various parts of the
packaging ecosystem. packaging ecosystem.
Compared to wheel, however, the sdist is entirely unspecified, and currently The content of an sdist is specified in :pep:`517` and :pep:`643`, but currently
works by convention. The widely accepted format of an sdist is defined by the the filename of the sdist is incompletely specified, meaning that consumers
implementation of distutils and setuptools, which creates a source code of the format must download and process the sdist to confirm the name and
archive in a predictable format and file name scheme. Installers exploit this version of the distribution included within.
predictability to assign this format certain contextual information that helps
the installation process. pip, for example, parses the file name of an sdist Installers currently rely on heuristics to infer the name and/or version from
from a :pep:`503` index, to obtain the distribution's project name and version the filename, to help the installation process. pip, for example, parses the
for dependency resolution purposes. But due to the lack of specification, filename of an sdist from a :pep:`503` index, to obtain the distribution's
the installer does not have any guarantee as to the correctness of the inferred project name and version for dependency resolution purposes. But due to the
message, and must verify it at some point by locally building the distribution lack of specification, the installer does not have any guarantee as to the
metadata. correctness of the inferred data, and must verify it at some point by locally
building the distribution metadata.
This build step is awkward for a certain class of operations, when the user This build step is awkward for a certain class of operations, when the user
does not expect the build process to occur. `pypa/pip#8387`_ describes an does not expect the build process to occur. `pypa/pip#8387`_ describes an
example. The command ``pip download --no-deps --no-binary=numpy numpy`` is example. The command ``pip download --no-deps --no-binary=numpy numpy`` is
expected to only download an sdist for numpy, since we do not need to check expected to only download an sdist for numpy, since we do not need to check
for dependencies, and both the name and version are available by introspecting for dependencies, and both the name and version are available by introspecting
the downloaded file name. pip, however, cannot assume the downloaded archive the downloaded filename. pip, however, cannot assume the downloaded archive
follows the convention, and must build and check the metadata. For a :pep:`518` follows the convention, and must build and check the metadata. For a :pep:`518`
project, this means running the ``prepare_metadata_for_build_wheel`` hook project, this means running the ``prepare_metadata_for_build_wheel`` hook
specified in :pep:`517`, which incurs significant overhead. specified in :pep:`517`, which incurs significant overhead.
@ -58,78 +62,102 @@ specified in :pep:`517`, which incurs significant overhead.
Rationale Rationale
========= =========
By creating a special file name scheme for the sdist format, this PEP frees up By creating a special filename scheme for the sdist format, this PEP frees up
tools from the time-consuming metadata verification step when they only need tools from the time-consuming metadata verification step when they only need
the metadata available in the file name. the metadata available in the filename.
This PEP also serves as the formal specification to the long-standing This PEP also serves as the formal specification to the long-standing
file name convention used by the current sdist implementations. The file name filename convention used by the current sdist implementations. The filename
contains the distribution name and version, to aid tools identifying a contains the distribution name and version, to aid tools identifying a
distribution without needing to download, unarchive the file, and perform distribution without needing to download, unarchive the file, and perform
costly metadata generation for introspection, if all the information they need costly metadata generation for introspection, if all the information they need
is available in the file name. is available in the filename.
Specification Specification
============= =============
The name of an sdist should be ``{distribution}-{version}.sdist``. The name of an sdist should be ``{distribution}-{version}.tar.gz``.
* ``distribution`` is the name of the distribution as defined in :pep:`345`, * ``distribution`` is the name of the distribution as defined in :pep:`345`,
and normalised according to :pep:`503`, e.g. ``'pip'``, ``'flit-core'``. and normalised as described in `the wheel spec`_, e.g. ``'pip'``,
``'flit_core'``.
* ``version`` is the version of the distribution as defined in :pep:`440`, * ``version`` is the version of the distribution as defined in :pep:`440`,
e.g. ``20.2``. e.g. ``20.2``, and normalised according to the rules in that PEP.
Each component is escaped according to the same rules as :pep:`427`. An sdist must be a gzipped tar archive in pax format, that is able to be
extracted by the standard library ``tarfile`` module with the open flag
``'r:gz'``.
An sdist must be a gzipped tar archive that is able to be extracted by the Code that produces an sdist file MUST give the file a name that matches this
standard library ``tarfile`` module with the open flag ``'r:gz'``. specification. The specification of the ``build_sdist`` hook from :pep:`517` is
extended to require this naming convention.
Code that processes sdist files MAY determine the distribution name and version
by simply parsing the filename, and is not required to verify that information
by generating or reading the metadata from the sdist contents.
Conforming sdist files can be recognised by the presence of the ``.tar.gz``
suffix and a *single* hyphen in the filename. Note that some legacy files may
also match these criteria, but this is not expected to be an issue in practice.
See the "Backwards Compatibility" section of this document for more details.
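
The naming rules in the updated Specification can be sketched in a few lines of Python. This is an illustration only, not code from the PEP or from any packaging tool: the function names are hypothetical, the name normalisation follows the rule given in the wheel spec (runs of ``-``, ``_`` and ``.`` become a single ``_``, then lowercase), and the version is assumed to be already normalised per :pep:`440`.

```python
import re
from typing import Optional, Tuple

SDIST_SUFFIX = ".tar.gz"


def normalise_name(name: str) -> str:
    """Normalise a distribution name the way the wheel spec describes:
    runs of '-', '_' and '.' collapse to one '_', and letters are lowercased."""
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, version: str) -> str:
    """Build ``{distribution}-{version}.tar.gz``.

    Assumption: ``version`` is already normalised according to PEP 440.
    """
    return f"{normalise_name(name)}-{version}{SDIST_SUFFIX}"


def parse_sdist_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Return (name, version) for a conforming filename, else None.

    A filename is treated as conforming when it ends in ``.tar.gz`` and
    the remaining stem contains exactly one hyphen.
    """
    if not filename.endswith(SDIST_SUFFIX):
        return None
    stem = filename[: -len(SDIST_SUFFIX)]
    if stem.count("-") != 1:
        return None
    name, version = stem.split("-")
    if not name or not version:
        return None
    return name, version
```

Under these assumptions, ``sdist_filename("Flit.Core", "3.7.1")`` yields ``flit_core-3.7.1.tar.gz``, while a legacy-style name such as ``flit-core-3.7.1.tar.gz`` is rejected by ``parse_sdist_filename`` because its stem contains two hyphens.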
Backwards Compatibility Backwards Compatibility
======================= =======================
The new file name scheme should not incur backwards incompatibility in The new filename scheme is a subset of the current informal naming
existing tools. Installers are likely to have already implemented logic to convention for sdist files, so tools that create or publish files conforming
exclude extensions they do not understand, since they already need to deal to this standard will be readable by older tools that only understand the
with legacy formats on PyPI such as ``.rpm`` and ``.egg``. They should be able previous naming conventions.
to correctly ignore files with extension ``.sdist``.
pip, for example, skips this extension with the following debug message:: Tools that consume sdist filenames would technically not be able to determine
whether a file is using the new standard or a legacy form. However, a review
of the filenames on PyPI determined that 37% of files are obviously legacy
(because they contain multiple or no hyphens) and of the remainder, parsing
according to this PEP gives the correct answer in all but 0.004% of cases.
Skipping link: unsupported archive format: sdist: <URL to file> Currently, tools that consume sdists should, if they are to be fully correct,
treat the name and version parsed from the filename as provisional, and verify
While setuptools ignores it silently. them by downloading the file and generating the actual metadata (or reading it,
if the sdist conforms to :pep:`643`). Tools supporting this specification can
treat the name and version from the filename as definitive. In theory, this
could risk mistakes if a legacy filename is assumed to conform to this PEP,
but in practice the chance of this appears to be vanishingly small.
Rejected Ideas Rejected Ideas
============== ==============
Create specification for sdist metadata Rely on the specification for sdist metadata
--------------------------------------- --------------------------------------------
The topic of creating a trustworthy, standard sdist metadata format as a means Since this PEP was first written, :pep:`643` has been accepted, defining a
to distinguish sdists from arbitrary archive files has been raised and trustworthy, standard sdist metadata format. This allows distribution metadata
discussed multiple times, but has yet to make significant progress due to (and in particular name and version) to be determined statically.
the complexity of potential metadata inconsistency between an sdist and a
wheel built from it.
This PEP does not exclude the possibility of creating a metadata specification This is not considered sufficient, however, as in a number of significant
for sdists in the future. But by specifying only the file name of an sdist, a cases (for example, reading filenames from a package index) the application
tool can reliably identify an sdist, and perform useful introspection on its only has access to the filename, and reading metadata would involve a
identity, without going into the details required for metadata specification. potentially costly download.
Use a currently common sdist naming scheme Use a dedicated file extension
------------------------------------------ ------------------------------
There is a currently established practice to name an sdist in the format of The original version of this PEP proposed a filename of
``{distribution}-{version}.[tar.gz|zip]``. ``{distribution}-{version}.sdist``. This has the advantage of being explicit,
as well as allowing a future change to the storage format without needing a
further change of the file naming convention.
Popular source code management services use a similar scheme to name the However, there are significant compatibility issues with a new extension. Index
downloaded source archive. GitHub, for example, uses ``distribution-1.0.zip`` servers may currently disallow unknown extensions, and if we introduced a new
as the archive name containing source code of repository ``distribution`` on one, it is not clear how to handle cases like a legacy index trying to mirror an
branch ``1.0``. Giving this scheme a special meaning would cause confusion index that hosts new-style sdists. Is it acceptable to only partially mirror,
since a source archive may not be a valid sdist. omitting sdists for newer versions of projects? Also, build backends that produce
the new format would be incompatible with index servers that only accept the old
format, and as there is often no way for a user to request an older version of a
backend when doing a build, this could make it impossible to build and upload
sdists.
Augment a currently common sdist naming scheme Augment a currently common sdist naming scheme
---------------------------------------------- ----------------------------------------------
@ -141,15 +169,28 @@ parse ``distribution-1.0.sdist.tar.gz`` as project ``distribution`` with
version ``1.0.sdist``. This would cause the sdist to be downloaded, but fail to version ``1.0.sdist``. This would cause the sdist to be downloaded, but fail to
install due to inconsistent metadata. install due to inconsistent metadata.
The same problem exists for all common archive suffixes. To avoid confusing The main advantage of this proposal was that it is easier for tools to
old installers, the sdist scheme must use a suffix that they do not identify recognise the new-style naming. But this is not a particularly significant
as an archive. benefit, given that all sdists with a single hyphen in the name are parsed
the same way under the old and new rules.
Open Issues
===========
The contents of an sdist are required to contain a single top-level directory
named ``{name}-{version}``. Currently no normalisation rules are required
for the components of this name. Should this PEP require that the same normalisation
rules are applied here as for the filename? Note that in practice, it is likely
that tools will create the two names using the same code, so normalisation is
likely to happen naturally, even if it is not explicitly required.
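
To make the open question concrete, the following sketch inspects the top-level directory of an archive, opening it with the ``'r:gz'`` mode that the Specification names. It is a hedged illustration: the helper names and the demo archive contents are invented for this example, and the comparison against the filename stem is exactly the behaviour this section leaves undecided.

```python
import io
import tarfile


def top_level_names(fileobj) -> set:
    """Return the set of top-level path components in a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return {member.name.split("/", 1)[0] for member in archive.getmembers()}


def make_demo_sdist(stem: str) -> io.BytesIO:
    """Build a tiny in-memory pax-format .tar.gz with one file under ``stem/``.

    The PKG-INFO payload is placeholder data for the demonstration only.
    """
    buffer = io.BytesIO()
    with tarfile.open(
        fileobj=buffer, mode="w:gz", format=tarfile.PAX_FORMAT
    ) as archive:
        data = b"Metadata-Version: 2.2\nName: demo\nVersion: 1.0\n"
        info = tarfile.TarInfo(name=f"{stem}/PKG-INFO")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# A tool could compare the single top-level directory with the filename stem.
names = top_level_names(make_demo_sdist("demo-1.0"))
matches_stem = names == {"demo-1.0"}
```

If the normalisation rules were required for the directory name as well, a check like ``matches_stem`` would always hold for conforming sdists; without that requirement, a tool can only rely on it by convention.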
References References
========== ==========
.. _`pypa/pip#8387`: https://github.com/pypa/pip/issues/8387 .. _`pypa/pip#8387`: https://github.com/pypa/pip/issues/8387
.. _`the wheel spec`: https://packaging.python.org/en/latest/specifications/binary-distribution-format/
Copyright Copyright