diff --git a/pep-0625.rst b/pep-0625.rst index 499011d57..cee3ca9ba 100644 --- a/pep-0625.rst +++ b/pep-0625.rst @@ -1,9 +1,9 @@ PEP: 625 -Title: File name of a Source Distribution +Title: Filename of a Source Distribution Author: Tzu-ping Chung , Paul Moore Discussions-To: https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686 -Status: Draft +Status: Deferred Type: Standards Track Topic: Packaging Content-Type: text/x-rst @@ -14,14 +14,17 @@ Abstract ======== This PEP describes a standard naming scheme for a Source Distribution, also -known as an *sdist*. This scheme distinguishes an sdist from an arbitrary -archive file containing source code of Python packages, and can be used to -communicate information about the distribution to packaging tools. +known as an *sdist*. An sdist is distinct from an arbitrary archive file +containing source code of Python packages, and can be used to communicate +information about the distribution to packaging tools. A standard sdist specified here is a gzipped tar file with a specially -formatted file stem and a ``.sdist`` suffix. This PEP does not specify the -contents of the tarball. +formatted filename and the usual ``.tar.gz`` suffix. This PEP does not specify +the contents of the tarball, as that is covered in other specifications. +**Note**: This PEP has been deferred until :pep:`643` has seen wider adoption +(in particular, until Metadata 2.2 is accepted on PyPI, and a number of common +backends have implemented it). Motivation ========== @@ -32,24 +35,25 @@ installation. This format is often considered as an unbuilt counterpart of a :pep:`427` wheel, and given special treatments in various parts of the packaging ecosystem. -Compared to wheel, however, the sdist is entirely unspecified, and currently -works by convention. 
The widely accepted format of an sdist is defined by the -implementation of distutils and setuptools, which creates a source code -archive in a predictable format and file name scheme. Installers exploit this -predictability to assign this format certain contextual information that helps -the installation process. pip, for example, parses the file name of an sdist -from a :pep:`503` index, to obtain the distribution's project name and version -for dependency resolution purposes. But due to the lack of specification, -the installer does not have any guarantee as to the correctness of the inferred -message, and must verify it at some point by locally building the distribution -metadata. +The content of an sdist is specified in :pep:`517` and :pep:`643`, but currently +the filename of the sdist is incompletely specified, meaning that consumers +of the format must download and process the sdist to confirm the name and +version of the distribution included within. + +Installers currently rely on heuristics to infer the name and/or version from +the filename, to help the installation process. pip, for example, parses the +filename of an sdist from a :pep:`503` index, to obtain the distribution's +project name and version for dependency resolution purposes. But due to the +lack of specification, the installer does not have any guarantee as to the +correctness of the inferred data, and must verify it at some point by locally +building the distribution metadata. This build step is awkward for a certain class of operations, when the user does not expect the build process to occur. `pypa/pip#8387`_ describes an example. The command ``pip download --no-deps --no-binary=numpy numpy`` is expected to only download an sdist for numpy, since we do not need to check for dependencies, and both the name and version are available by introspecting -the downloaded file name. pip, however, cannot assume the downloaded archive +the downloaded filename. 
pip, however, cannot assume the downloaded archive follows the convention, and must build and check the metadata. For a :pep:`518` project, this means running the ``prepare_metadata_for_build_wheel`` hook specified in :pep:`517`, which incurs significant overhead. @@ -58,78 +62,102 @@ specified in :pep:`517`, which incurs significant overhead. Rationale ========= -By creating a special file name scheme for the sdist format, this PEP frees up +By creating a special filename scheme for the sdist format, this PEP frees up tools from the time-consuming metadata verification step when they only need -the metadata available in the file name. +the metadata available in the filename. This PEP also serves as the formal specification to the long-standing -file name convention used by the current sdist implementations. The file name +filename convention used by the current sdist implementations. The filename contains the distribution name and version, to aid tools identifying a distribution without needing to download, unarchive the file, and perform costly metadata generation for introspection, if all the information they need -is available in the file name. +is available in the filename. Specification ============= -The name of an sdist should be ``{distribution}-{version}.sdist``. +The name of an sdist should be ``{distribution}-{version}.tar.gz``. * ``distribution`` is the name of the distribution as defined in :pep:`345`, - and normalised according to :pep:`503`, e.g. ``'pip'``, ``'flit-core'``. + and normalised as described in `the wheel spec`_, e.g. ``'pip'``, + ``'flit_core'``. * ``version`` is the version of the distribution as defined in :pep:`440`, - e.g. ``20.2``. + e.g. ``20.2``, and normalised according to the rules in that PEP. -Each component is escaped according to the same rules as :pep:`427`. 
+An sdist must be a gzipped tar archive in pax format that can be +extracted by the standard library ``tarfile`` module with the open flag +``'r:gz'``.
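As an illustration only (it is not part of this specification), the following is a minimal Python sketch of the naming scheme and archive format. The helper names ``normalize_distribution_name``, ``sdist_filename``, ``parse_sdist_filename`` and ``make_sdist_bytes`` are hypothetical. The sketch assumes the version string is already normalised according to :pep:`440` (for example with the third-party ``packaging`` library's ``Version`` class) and does not normalise it itself. Name normalisation follows the wheel spec rule: collapse each run of ``-``, ``_`` and ``.`` into ``_``, then lowercase. The parser treats a filename as conforming only if it has the ``.tar.gz`` suffix and exactly one hyphen in its stem.

```python
import io
import re
import tarfile
from typing import Dict, Optional, Tuple

SDIST_SUFFIX = ".tar.gz"


def normalize_distribution_name(name: str) -> str:
    # Wheel-spec style escaping: collapse each run of '-', '_' and '.'
    # into a single '_' and lowercase the result.
    return re.sub(r"[-_.]+", "_", name).lower()


def sdist_filename(name: str, version: str) -> str:
    # Assumption: `version` is already normalised per PEP 440; this
    # sketch does not normalise it.
    return f"{normalize_distribution_name(name)}-{version}{SDIST_SUFFIX}"


def parse_sdist_filename(filename: str) -> Optional[Tuple[str, str]]:
    # Treat a filename as conforming only if it ends in '.tar.gz' and its
    # stem contains exactly one hyphen; otherwise return None (legacy name).
    if not filename.endswith(SDIST_SUFFIX):
        return None
    stem = filename[: -len(SDIST_SUFFIX)]
    if stem.count("-") != 1:
        return None
    distribution, version = stem.split("-")
    if not distribution or not version:
        return None
    return distribution, version


def make_sdist_bytes(name: str, version: str, files: Dict[str, bytes]) -> bytes:
    # Hypothetical helper: build an in-memory gzipped tar archive in pax
    # format, placing every member under a '{distribution}-{version}/'
    # top-level directory built with the same normalisation as the filename.
    top = f"{normalize_distribution_name(name)}-{version}"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
        for relpath, data in files.items():
            info = tarfile.TarInfo(f"{top}/{relpath}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    print(sdist_filename("Flit-Core", "3.9.0"))  # flit_core-3.9.0.tar.gz
    print(parse_sdist_filename("flit_core-3.9.0.tar.gz"))  # ('flit_core', '3.9.0')
    print(parse_sdist_filename("my-project-1.0.tar.gz"))  # None: two hyphens
    archive = make_sdist_bytes("Flit-Core", "3.9.0", {"PKG-INFO": b"Metadata-Version: 2.2\n"})
    # A conforming archive must open with tarfile's 'r:gz' mode.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        print(tar.getnames())  # ['flit_core-3.9.0/PKG-INFO']
```

Because a normalised name contains no hyphens and a normalised :pep:`440` version contains none either, the single remaining hyphen unambiguously separates the two components. That property is what lets a consumer split the stem without consulting the archive's metadata.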
-An sdist must be a gzipped tar archive that is able to be extracted by the -standard library ``tarfile`` module with the open flag ``'r:gz'``. +Code that produces an sdist file MUST give the file a name that matches this +specification. The specification of the ``build_sdist`` hook from :pep:`517` is +extended to require this naming convention. + +Code that processes sdist files MAY determine the distribution name and version +by simply parsing the filename, and is not required to verify that information +by generating or reading the metadata from the sdist contents. + +Conforming sdist files can be recognised by the presence of the ``.tar.gz`` +suffix and a *single* hyphen in the filename. Note that some legacy files may +also match these criteria, but this is not expected to be an issue in practice. +See the "Backwards Compatibility" section of this document for more details. Backwards Compatibility ======================= -The new file name scheme should not incur backwards incompatibility in -existing tools. Installers are likely to have already implemented logic to -exclude extensions they do not understand, since they already need to deal -with legacy formats on PyPI such as ``.rpm`` and ``.egg``. They should be able -to correctly ignore files with extension ``.sdist``. +The new filename scheme is a subset of the current informal naming +convention for sdist files, so tools that create or publish files conforming +to this standard will be readable by older tools that only understand the +previous naming conventions. -pip, for example, skips this extension with the following debug message:: +Tools that consume sdist filenames would technically not be able to determine +whether a file is using the new standard or a legacy form. 
However, a review +of the filenames on PyPI determined that 37% of files are obviously legacy +(because they contain multiple or no hyphens) and of the remainder, parsing +according to this PEP gives the correct answer in all but 0.004% of cases. - Skipping link: unsupported archive format: sdist: - -While setuptools ignores it silently. +Currently, tools that consume sdists should, if they are to be fully correct, +treat the name and version parsed from the filename as provisional, and verify +them by downloading the file and generating the actual metadata (or reading it, +if the sdist conforms to :pep:`643`). Tools supporting this specification can +treat the name and version from the filename as definitive. In theory, this +could risk mistakes if a legacy filename is assumed to conform to this PEP, +but in practice the chance of this appears to be vanishingly small. Rejected Ideas ============== -Create specification for sdist metadata ---------------------------------------- +Rely on the specification for sdist metadata +-------------------------------------------- -The topic of creating a trustworthy, standard sdist metadata format as a means -to distinguish sdists from arbitrary archive files has been raised and -discussed multiple times, but has yet to make significant progress due to -the complexity of potential metadata inconsistency between an sdist and a -wheel built from it. +Since this PEP was first written, :pep:`643` has been accepted, defining a +trustworthy, standard sdist metadata format. This allows distribution metadata +(and in particular name and version) to be determined statically. -This PEP does not exclude the possibility of creating a metadata specification -for sdists in the future. But by specifying only the file name of an sdist, a -tool can reliably identify an sdist, and perform useful introspection on its -identity, without going into the details required for metadata specification. 
+This is not considered sufficient, however, as in a number of significant +cases (for example, reading filenames from a package index) the application +only has access to the filename, and reading metadata would involve a +potentially costly download. -Use a currently common sdist naming scheme ------------------------------------------ +Use a dedicated file extension +------------------------------ -There is a currently established practice to name an sdist in the format of -``{distribution}-{version}.[tar.gz|zip]``. +The original version of this PEP proposed a filename of +``{distribution}-{version}.sdist``. This has the advantage of being explicit, +as well as allowing a future change to the storage format without needing a +further change of the file naming convention. -Popular source code management services use a similar scheme to name the -downloaded source archive. GitHub, for example, uses ``distribution-1.0.zip`` -as the archive name containing source code of repository ``distribution`` on -branch ``1.0``. Giving this scheme a special meaning would cause confusion -since a source archive may not a valid sdist. +However, there are significant compatibility issues with a new extension. Index +servers may currently disallow unknown extensions, and if we introduced a new +one, it is not clear how to handle cases like a legacy index trying to mirror an +index that hosts new-style sdists. Is it acceptable to only partially mirror, +omitting sdists for newer versions of projects? Also, build backends that produce +the new format would be incompatible with index servers that only accept the old +format, and as there is often no way for a user to request an older version of a +backend when doing a build, this could make it impossible to build and upload +sdists.
Augment a currently common sdist naming scheme ---------------------------------------------- @@ -141,15 +169,28 @@ parse ``distribution-1.0.sdist.tar.gz`` as project ``distribution`` with version ``1.0.sdist``. This would cause the sdist to be downloaded, but fail to install due to inconsistent metadata. -The same problem exists for all common archive suffixes. To avoid confusing -old installers, the sdist scheme must use a suffix that they do not identify -as an archive. +The main advantage of this proposal was that it is easier for tools to +recognise the new-style naming. But this is not a particularly significant +benefit, given that all sdists with a single hyphen in the name are parsed +the same way under the old and new rules. + + +Open Issues +=========== + +An sdist is required to contain a single top-level directory +named ``{name}-{version}``. Currently no normalisation rules are required +for the components of this name. Should this PEP require that the same normalisation +rules are applied here as for the filename? Note that in practice, it is likely +that tools will create the two names using the same code, so normalisation will +probably happen naturally, even if it is not explicitly required. References ========== .. _`pypa/pip#8387`: https://github.com/pypa/pip/issues/8387 +.. _`the wheel spec`: https://packaging.python.org/en/latest/specifications/binary-distribution-format/ Copyright