PEP to specify a new sdist file naming scheme (#1505)

This commit is contained in:
Tzu-ping Chung 2020-07-08 17:56:33 +08:00 committed by GitHub
parent 501ba2d69b
commit 4d21729080
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 168 additions and 0 deletions

168
pep-0625.rst Normal file
View File

@ -0,0 +1,168 @@
PEP: 625
Title: File name of a Source Distribution
Author: Tzu-ping Chung <uranusjr@gmail.com>,
Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 08-Jul-2020
Post-History: 08-Jul-2020
Abstract
========
This PEP describes a standard naming scheme for a Source Distribution, also
known as an *sdist*. This scheme distinguishes an sdist from an arbitrary
archive file containing source code of Python packages, and can be used to
communicate information about the distribution to packaging tools.
A standard sdist specified here is a gzipped tar file with a specially
formatted file stem and a ``.sdist`` suffix. This PEP does not specify the
contents of the tarball.
Motivation
==========
An sdist is a Python package distribution that contains "source code" of the
Python package, and requires a build step to be turned into a wheel on
installation. This format is often considered as an unbuilt counterpart of a
:pep:`427` wheel, and given special treatments in various parts of the
packaging ecosystem.
Compared to wheel, however, the sdist is entirely unspecified, and currently
works by convention. The widely accepted format of an sdist is defined by the
implementation of distutils and setuptools, which creates a source code
archive in a predictable format and file name scheme. Installers exploit this
predictability to assign this format certain contextual information that helps
the installation process. pip, for example, parses the file name of an sdist
from a :pep:`503` index, to obtain the distribution's project name and version
for dependency resolution purposes. But due to the lack of specification,
the installer does not have any guarantee to the correctness of the inferred
message, and must verify it at some point by locally building the distribution
metadata.
This build step is awkward for a certin class of operations, when the user
does not expect the build process to occur. `pypa/pip#8387`_ describes an
example. The command ``pip download --no-deps --no-binary=numpy numpy`` is
expected to only download an sdist for numpy, since we do not need to check
for dependencies, and both the name and version are available by introspecting
the downloaded file name. pip, however, cannot assume the downloaded archive
follows the convention, and must build check the metadata. For a :pep:`518`
project, this means running the ``prepare_metadata_for_build_wheel`` hook
specified in :pep:`517`, which incurs significant overhead.
Rationale
=========
By creating a special file name scheme for the sdist format, this PEP frees up
tools from the time-consuming metadata verification step when they only need
the metadata available in the file name.
This PEP also serves as the formal specification to the long-standing
file name convention used by the current sdist implementations. The file name
contains the distribution name and version, to aid tools identifying a
distribution without needing to download, unarchieve the file, and perform
costly metadata generation for introspection, if all the information they need
are available in the file name.
Specification
=============
The name of an sdist should be ``{distribution}-{version}.sdist``.
* ``distribution`` is the name of the distribution as defined in :pep:`345`,
and normalised according to :pep:`503`, e.g. ``'pip'``, ``'flit-core'``.
* ``version`` is the version of the distribution as defined in :pep:`440`,
e.g. ``20.2``.
Each component is escaped according to the same rules as :pep:`427`.
An sdist must be a gzipped tar archive that is able to be extracted by the
standard library ``tarfile`` module with the open flag ``'r:gz'``.
Backwards Compatibility
=======================
The new file name scheme should not incur backwards incompatibility in
existing tools. Installers are likely to have already implemented logic to
exclude extensions they do not understand, since they already need to deal
with legacy formats on PyPI such as ``.rpm`` and ``.egg``. They should be able
to correctly ignore files with extension ``.sdist``.
pip, for example, skips this extension with the following debug message::
Skipping link: unsupported archive format: sdist: <URL to file>
While setuptools ignores it silently.
Rejected Ideas
==============
Create specification for sdist metadata
---------------------------------------
The topic of creating a trustworthy, standard sdist metadata format as a mean
to distinguish sdists from arbitrary archive files has been raised and
discussed multiple times, but has yet to make significant progress due to
the complexity of potential metadata inconsistency between an sdist and a
wheel built from it.
This PEP does not exclude the possibility of creating a metadata specification
for sdists in the future. But by specifying only the file name of an sdist, a
tool can reliably identify an sdist, and perform useful introspection to its
identity, without going into the details required for metadata specification.
Use a currently common sdist naming scheme
------------------------------------------
There is a currently established practice to name an sdist in the format of
``{distribution}-{version}.[tar.gz|zip]``.
Popular source code management services use a similar scheme to name the
downloaded source archive. GitHub, for example, uses ``distribution-1.0.zip``
as the arhieve name containing source code of repository ``distribution`` on
branch ``1.0``. Giving this scheme a special meaning would cause confusion
since a source archive may not a valid sdist.
Augment a currently common sdist naming scheme
----------------------------------------------
A scheme ``{distribution}-{version}.sdist.tar.gz`` was raised during the
initial discussion. This was abandoned due to backwards compatibility issues
with currently available installation tools. pip 20.1, for example, would
parse ``distribution-1.0.sdist.tar.gz`` as project ``distribution`` with
version ``1.0.sdist``. This would cause the sdist be downloaded, but fail to
install due to inconsistent metadata.
The same problem exists for all common archive suffixes. To avoid confusing
old installers, the sdist scheme must use a suffix that they do not identify
as an archive.
References
==========
.. _`pypa/pip#8387`: https://github.com/pypa/pip/issues/8387
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: