PEP: 625
Title: Filename of a Source Distribution
Author: Tzu-ping Chung, Paul Moore
PEP-Delegate: Pradyun Gedam
Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 08-Jul-2020
Post-History: 08-Jul-2020

Abstract
========

This PEP describes a standard naming scheme for a Source Distribution,
also known as an *sdist*. An sdist is distinct from an arbitrary
archive file containing source code of Python packages, and can be
used to communicate information about the distribution to packaging
tools.

A standard sdist specified here is a gzipped tar file with a specially
formatted filename and the usual ``.tar.gz`` suffix. This PEP does not
specify the contents of the tarball, as that is covered in other
specifications.

Motivation
==========

An sdist is a Python package distribution that contains "source code"
of the Python package, and requires a build step to be turned into a
wheel on installation. This format is often considered an unbuilt
counterpart of a :pep:`427` wheel, and is given special treatment in
various parts of the packaging ecosystem.

The content of an sdist is specified in :pep:`517` and :pep:`643`, but
currently the filename of the sdist is incompletely specified, meaning
that consumers of the format must download and process the sdist to
confirm the name and version of the distribution included within.

Installers currently rely on heuristics to infer the name and/or
version from the filename, to help the installation process. pip, for
example, parses the filename of an sdist from a :pep:`503` index, to
obtain the distribution's project name and version for dependency
resolution purposes. But due to the lack of specification, the
installer does not have any guarantee as to the correctness of the
inferred data, and must verify it at some point by locally building
the distribution metadata.


This build step is awkward for a certain class of operations, when the
user does not expect the build process to occur. `pypa/pip#8387`_
describes an example. The command
``pip download --no-deps --no-binary=numpy numpy`` is expected to only
download an sdist for numpy, since we do not need to check for
dependencies, and both the name and version are available by
introspecting the downloaded filename. pip, however, cannot assume the
downloaded archive follows the convention, and must build and check
the metadata. For a :pep:`518` project, this means running the
``prepare_metadata_for_build_wheel`` hook specified in :pep:`517`,
which incurs significant overhead.

Rationale
=========

By creating a special filename scheme for the sdist format, this PEP
frees tools from the time-consuming metadata verification step when
they only need the metadata available in the filename.

This PEP also serves as the formal specification of the long-standing
filename convention used by the current sdist implementations. The
filename contains the distribution name and version, so that tools can
identify a distribution without needing to download and unarchive the
file and perform costly metadata generation for introspection, if all
the information they need is available in the filename.

Specification
=============

The name of an sdist should be ``{distribution}-{version}.tar.gz``.

* ``distribution`` is the name of the distribution as defined in
  :pep:`345`, and normalised as described in `the wheel spec`_, e.g.
  ``'pip'``, ``'flit_core'``.
* ``version`` is the version of the distribution as defined in
  :pep:`440`, e.g. ``20.2``, and normalised according to the rules in
  that PEP.

An sdist must be a gzipped tar archive in pax format, that is able to
be extracted by the standard library ``tarfile`` module with the open
flag ``'r:gz'``.

Code that produces an sdist file MUST give the file a name that matches
this specification.
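
As a rough illustration of the producer side, the following sketch
builds a conforming filename and writes a gzipped pax-format archive
using only the standard library. The helper names ``sdist_filename``
and ``write_sdist`` are hypothetical. The name escaping shown (runs of
``-``, ``_`` and ``.`` replaced by ``_``, then lowercased) is an
assumption based on the escaping rule in the wheel spec, and the
version is assumed to have been normalised per :pep:`440` already.

```python
import io
import re
import tarfile


def sdist_filename(distribution: str, version: str) -> str:
    """Return ``{distribution}-{version}.tar.gz`` (hypothetical helper).

    Assumption: ``version`` is already normalised per PEP 440.
    """
    # Assumed wheel-spec escaping: collapse runs of "-", "_" and "."
    # into a single "_" and lowercase the result.
    name = re.sub(r"[-_.]+", "_", distribution).lower()
    if "-" in version:
        # A hyphen here would break the single-hyphen rule.
        raise ValueError(f"version must not contain '-': {version!r}")
    return f"{name}-{version}.tar.gz"


def write_sdist(fileobj, top_level: str, files: dict) -> None:
    """Write ``files`` (relative name -> bytes) as a gzipped pax tarball."""
    with tarfile.open(fileobj=fileobj, mode="w:gz",
                      format=tarfile.PAX_FORMAT) as tar:
        for relative_name, data in files.items():
            info = tarfile.TarInfo(f"{top_level}/{relative_name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

An archive written this way can be read back with
``tarfile.open(..., mode="r:gz")``, which is the extraction mode the
specification names.
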
The specification of the ``build_sdist`` hook from :pep:`517` is
extended to require this naming convention.

Code that processes sdist files MAY determine the distribution name and
version by simply parsing the filename, and is not required to verify
that information by generating or reading the metadata from the sdist
contents.

Conforming sdist files can be recognised by the presence of the
``.tar.gz`` suffix and a *single* hyphen in the filename. Note that
some legacy files may also match these criteria, but this is not
expected to be an issue in practice. See the "Backwards Compatibility"
section of this document for more details.

Backwards Compatibility
=======================

The new filename scheme is a subset of the current informal naming
convention for sdist files, so files conforming to this standard will
be readable by older tools that only understand the previous naming
conventions.

Tools that consume sdist filenames would technically not be able to
determine whether a file is using the new standard or a legacy form.
However, a review of the filenames on PyPI determined that 37% of
files are obviously legacy (because they contain multiple or no
hyphens) and of the remainder, parsing according to this PEP gives the
correct answer in all but 0.004% of cases.

Currently, tools that consume sdists should, if they are to be fully
correct, treat the name and version parsed from the filename as
provisional, and verify them by downloading the file and generating
the actual metadata (or reading it, if the sdist conforms to
:pep:`643`). Tools supporting this specification can treat the name
and version from the filename as definitive. In theory, this could
risk mistakes if a legacy filename is assumed to conform to this PEP,
but in practice the chance of this appears to be vanishingly small.

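
The recognition rule described above (a ``.tar.gz`` suffix and exactly
one hyphen) could be applied by a consumer roughly as follows. This is
a minimal sketch, not the behaviour of any existing tool; the names
``SdistName`` and ``parse_sdist_filename`` are hypothetical, and the
function does not check that the two components are themselves
correctly normalised.

```python
from typing import NamedTuple, Optional

SDIST_SUFFIX = ".tar.gz"


class SdistName(NamedTuple):
    distribution: str
    version: str


def parse_sdist_filename(filename: str) -> Optional[SdistName]:
    """Split a filename into name and version, or return None.

    None means the filename does not have the ``.tar.gz`` suffix or
    does not contain exactly one hyphen, so it should be treated as
    legacy (or as not being an sdist at all).
    """
    if not filename.endswith(SDIST_SUFFIX):
        return None
    stem = filename[: -len(SDIST_SUFFIX)]
    if stem.count("-") != 1:
        return None
    distribution, version = stem.split("-")
    if not distribution or not version:
        return None
    return SdistName(distribution, version)
```

A caller that receives ``None`` would fall back to the provisional
handling described above, verifying the name and version against the
actual metadata.
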

Rejected Ideas
==============

Rely on the specification for sdist metadata
--------------------------------------------

Since this PEP was first written, :pep:`643` has been accepted,
defining a trustworthy, standard sdist metadata format. This allows
distribution metadata (and in particular name and version) to be
determined statically.

This is not considered sufficient, however, as in a number of
significant cases (for example, reading filenames from a package
index) the application only has access to the filename, and reading
metadata would involve a potentially costly download.

Use a dedicated file extension
------------------------------

The original version of this PEP proposed a filename of
``{distribution}-{version}.sdist``. This has the advantage of being
explicit, as well as allowing a future change to the storage format
without needing a further change of the file naming convention.

However, there are significant compatibility issues with a new
extension. Index servers may currently disallow unknown extensions,
and if we introduced a new one, it is not clear how to handle cases
like a legacy index trying to mirror an index that hosts new-style
sdists. Is it acceptable to only partially mirror, omitting sdists for
newer versions of projects? Also, build backends that produce the new
format would be incompatible with index servers that only accept the
old format, and as there is often no way for a user to request an
older version of a backend when doing a build, this could make it
impossible to build and upload sdists.

Augment a currently common sdist naming scheme
----------------------------------------------

A scheme ``{distribution}-{version}.sdist.tar.gz`` was raised during
the initial discussion. This was abandoned due to backwards
compatibility issues with currently available installation tools.
pip 20.1, for example, would parse ``distribution-1.0.sdist.tar.gz``
as project ``distribution`` with version ``1.0.sdist``.
This would cause the sdist to be downloaded, but fail to install due
to inconsistent metadata.

The main advantage of this proposal was that it is easier for tools to
recognise the new-style naming. But this is not a particularly
significant benefit, given that all sdists with a single hyphen in the
name are parsed the same way under the old and new rules.

Open Issues
===========

The contents of an sdist are required to contain a single top-level
directory named ``{name}-{version}``. Currently no normalisation rules
are required for the components of this name. Should this PEP require
that the same normalisation rules are applied here as for the
filename? Note that in practice, it is likely that tools will create
the two names using the same code, so normalisation is likely to
happen naturally, even if it is not explicitly required.

References
==========

.. _`pypa/pip#8387`: https://github.com/pypa/pip/issues/8387
.. _`the wheel spec`: https://packaging.python.org/en/latest/specifications/binary-distribution-format/

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End: