PEP 753: Uniform URLs in core metadata (#3936)

Signed-off-by: William Woodruff <william@yossarian.net>
Co-authored-by: Alyssa Coghlan <ncoghlan@gmail.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
William Woodruff 2024-09-10 15:07:31 -04:00 committed by GitHub
parent 2dd6c9dfee
commit ce38a96f43
2 changed files with 328 additions and 0 deletions

.github/CODEOWNERS vendored

@@ -632,6 +632,7 @@ peps/pep-0749.rst @JelleZijlstra
peps/pep-0750.rst @gvanrossum @lysnikolaou
peps/pep-0751.rst @brettcannon
peps/pep-0752.rst @warsaw
peps/pep-0753.rst @warsaw
# ...
# peps/pep-0754.rst
# ...

peps/pep-0753.rst Normal file

@@ -0,0 +1,327 @@
PEP: 753
Title: Uniform project URLs in core metadata
Author: William Woodruff <william@yossarian.net>,
        Facundo Tuesca <facundo.tuesca@trailofbits.com>
Sponsor: Barry Warsaw <barry at python.org>
PEP-Delegate: Paul Moore <p.f.moore at gmail.com>
Discussions-To: https://discuss.python.org/t/pep-753-uniform-urls-in-core-metadata/62792
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 29-Aug-2024
Post-History: `26-Aug-2024 <https://discuss.python.org/t/core-metadata-should-home-page-and-download-url-be-deprecated/62037>`__,
              `03-Sep-2024 <https://discuss.python.org/t/pep-753-uniform-urls-in-core-metadata/62792>`__

Abstract
========
This PEP recommends two discrete changes to the handling of core metadata by
indices, such as PyPI:

* Deprecation of the ``Home-page`` and ``Download-URL`` fields in favor of
  their ``Project-URL`` equivalents;
* A set of conventions for normalizing and assigning semantics to
  ``Project-URL`` labels.

Rationale and Motivation
========================
Python's standard :ref:`core metadata <packaging:core-metadata>` has gone
through many years of revisions, with various standardized milestone versions.
These revisions of the core metadata have introduced various mechanisms
for expressing a package's relationship to external resources, via URLs:

1. Metadata 1.0 introduced ``Home-page``, a single-use field containing
   a URL to the distribution's home page.

   .. code-block::

       Home-page: https://example.com/sampleproject

2. Metadata 1.1 introduced ``Download-URL``, a complementary single-use field
   containing a URL suitable for downloading the current distribution.

   .. code-block::

       Download-URL: https://example.com/sampleproject/sampleproject-1.2.3.tar.gz

3. Metadata 1.2 introduced ``Project-URL``, a **multiple-use** field containing
   a label-and-URL pair. Each label is free text conveying the URL's semantics.

   .. code-block::

       Project-URL: Homepage, https://example.com/sampleproject
       Project-URL: Download, https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
       Project-URL: Documentation, https://example.com/sampleproject/docs

Metadata 2.1, 2.2, and 2.3 leave the behavior of these fields as originally
specified.
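
Core metadata uses an RFC 822-style header format, so the three fields above
can be read with the standard library's ``email.parser`` module. The following
is a minimal sketch rather than part of this PEP's specification; the
``read_urls`` helper and the sample ``METADATA`` text are hypothetical names
introduced purely for illustration.

.. code-block:: python

    from email.parser import HeaderParser

    # Hypothetical metadata, assembled from the field examples above.
    METADATA = (
        "Metadata-Version: 2.1\n"
        "Name: sampleproject\n"
        "Version: 1.2.3\n"
        "Home-page: https://example.com/sampleproject\n"
        "Project-URL: Homepage, https://example.com/sampleproject\n"
        "Project-URL: Documentation, https://example.com/sampleproject/docs\n"
    )


    def read_urls(metadata_text: str) -> dict:
        """Collect the single-use and multiple-use URL fields (illustrative helper)."""
        message = HeaderParser().parsestr(metadata_text)
        project_urls = {}
        # Project-URL may appear many times; each value is "<label>, <url>".
        for value in message.get_all("Project-URL", []):
            label, _, url = value.partition(",")
            project_urls[label.strip()] = url.strip()
        return {
            "home_page": message.get("Home-page"),        # None when absent
            "download_url": message.get("Download-URL"),  # None when absent
            "project_urls": project_urls,
        }


    urls = read_urls(METADATA)

Here the same home page URL is reachable through both ``Home-page`` and a
``Project-URL`` entry labeled ``Homepage``, which is the redundancy that the
informal conventions described next grew out of.
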
Because ``Project-URL`` allows free text labels and is multiple-use, informal
conventions have arisen for representing the values of
``Home-page`` and ``Download-URL`` within ``Project-URL`` instead.
These conventions have seen significant adoption, with :pep:`621` explicitly
choosing to provide only a ``project.urls`` table rather than a
``project.home-page`` field. From :pep:`621`'s rejected ideas:

.. pull-quote::

    While the core metadata supports it, having a single field for a project's
    URL while also supporting a full table seemed redundant and confusing.

This PEP exists to formalize the informal conventions that have arisen, as well
as explicitly document ``Home-page`` and ``Download-URL`` as deprecated in
favor of equivalent ``Project-URL`` representations.
Specification
=============
This PEP proposes that ``Home-page`` and ``Download-URL`` be considered
**deprecated**. This deprecation has implications for both package metadata
producers (e.g. build backends and packaging tools) and package indices
(e.g. PyPI).
.. _metadata-producers:
Metadata producers
------------------
This PEP stipulates the following for metadata producers:

* When generating metadata 1.2 or later, producers **SHOULD** emit only
  ``Project-URL``, and **SHOULD NOT** emit ``Home-page`` or ``Download-URL``
  fields.
* When generating ``Project-URL`` equivalents for ``Home-page`` and
  ``Download-URL``, producers **SHOULD** use the
  :ref:`label conventions <conventions-for-labels>` described below.

These stipulations do not change the optionality of URL fields in core metadata.
In other words, producers **MAY** choose to omit ``Project-URL`` entirely
at their discretion.
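
As a rough sketch of how a producer might follow these stipulations, the
function below folds legacy ``Home-page`` and ``Download-URL`` values into
``Project-URL`` lines. The ``emit_url_fields`` helper is hypothetical, and its
choice to let an explicit ``Project-URL`` entry win over a legacy value is an
assumption of this sketch, not a requirement of this PEP. It reuses the
canonicalization function defined in this PEP's label conventions.

.. code-block:: python

    import string


    def canonicalize_label(label: str) -> str:
        # Canonicalization as defined by this PEP's label conventions.
        chars_to_remove = string.punctuation + string.whitespace
        return label.translate(str.maketrans("", "", chars_to_remove)).lower()


    def emit_url_fields(project_urls: dict, home_page=None, download_url=None) -> list:
        """Return only Project-URL lines, folding legacy values in (hypothetical helper)."""
        merged = dict(project_urls)
        present = {canonicalize_label(label) for label in merged}
        # Assumption: an explicit Project-URL entry wins over a legacy value.
        if home_page and "homepage" not in present:
            merged["homepage"] = home_page
        if download_url and "download" not in present:
            merged["download"] = download_url
        # No Home-page or Download-URL lines are ever produced.
        return [f"Project-URL: {label}, {url}" for label, url in merged.items()]


    lines = emit_url_fields(
        {"Docs": "https://example.com/sampleproject/docs"},
        home_page="https://example.com/sampleproject",
    )

Passing an empty mapping and no legacy values yields an empty list, which
matches the statement above that producers may omit ``Project-URL`` entirely.
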
This PEP does **not** propose the outright removal of support for ``Home-page``
or ``Download-URL``. However, see :ref:`future-considerations` for
thoughts on how a new (as yet unspecified) major core metadata version
could complete the deprecation cycle via removal of these deprecated fields.
.. _package-indices:
Package indices
---------------
This PEP stipulates the following for package indices:

* When interpreting a distribution's metadata of version 1.2 or later
  (e.g. for rendering on a web page), the index **MUST** prefer
  ``Project-URL`` fields as a source of URLs over ``Home-page`` and
  ``Download-URL``, even if the latter are explicitly provided.

* If a distribution's metadata contains only the ``Home-page`` and
  ``Download-URL`` fields, the index **MAY** choose to ignore those fields
  and behave as though no URLs were provided in the metadata. In this case,
  the index **SHOULD** present an appropriate warning or notification to
  the uploading party.

  * The mechanism for presenting this warning or notification is not
    specified, since it will vary by index. By way of example, an index may
    choose to present a warning in the HTTP response to an upload request, or
    send an email or other notification to the maintainer(s) of the project.

* If a distribution's metadata contains both sets of fields, the index **MAY**
  choose to reject the distribution outright. However, this is
  **NOT RECOMMENDED** until a future unspecified major metadata version
  formally removes support for ``Home-page`` and ``Download-URL``.

* Any changes to the interpretation of metadata of version 1.2 or later that
  result in previously recognized URLs no longer being recognized
  **SHOULD NOT** be retroactively applied to previously uploaded packages.

These stipulations do not change the optionality of URL processing by indices.
In other words, an index that does not process URLs within uploaded
distributions may continue to ignore all URL fields entirely.
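
One possible reading of the first two stipulations can be sketched as follows.
The ``resolve_urls`` helper, its ``ignore_legacy`` switch, and the warning text
are all hypothetical; in particular, treating "prefer" as "use ``Project-URL``
exclusively whenever any is present" is an assumption of this sketch. The
rejection and retroactivity stipulations are not modeled.

.. code-block:: python

    def resolve_urls(project_urls, home_page=None, download_url=None, ignore_legacy=True):
        """Return (urls_to_display, warnings) for one distribution (hypothetical helper)."""
        legacy = {}
        if home_page:
            legacy["Home-page"] = home_page
        if download_url:
            legacy["Download-URL"] = download_url

        if project_urls:
            # Project-URL is preferred even when legacy fields are also present.
            return dict(project_urls), []
        if legacy and ignore_legacy:
            # Only deprecated fields were supplied: ignore them, but tell the uploader.
            warning = "Home-page and Download-URL are deprecated; use Project-URL instead."
            return {}, [warning]
        # The index chose to keep honoring the legacy fields.
        return legacy, []


    shown, warnings = resolve_urls(
        {"homepage": "https://example.com/sampleproject"},
        home_page="https://example.com/old-home",
    )

How the returned warnings reach the uploader (an HTTP response body, an email,
or something else) is left open here, just as it is in the stipulations above.
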
.. _conventions-for-labels:
Conventions for ``Project-URL`` labels
======================================
The deprecations proposed above require a formalization of the currently
informal relationship between ``Home-page``, ``Download-URL``, and their
``Project-URL`` equivalents.
This formalization has two parts:

1. A set of rules for canonicalizing ``Project-URL`` labels;
2. A set of "well-known" canonical label values that indices may specialize
   URL presentation for.

Label canonicalization
----------------------
The core metadata specification stipulates that ``Project-URL`` labels are
free text, limited to 32 characters.
This PEP proposes adding the concept of a "canonicalized" label to the core
metadata specification. Label canonicalization is defined via the following
Python function:

.. code-block:: python

    import string

    def canonicalize_label(label: str) -> str:
        # Delete every ASCII punctuation and whitespace character...
        chars_to_remove = string.punctuation + string.whitespace
        removal_map = str.maketrans("", "", chars_to_remove)
        # ...then lowercase whatever remains.
        return label.translate(removal_map).lower()

In plain language: a label is *canonicalized* by deleting all ASCII punctuation and
whitespace, and then converting the result to lowercase.
The following table shows examples of labels before (raw) and after
canonicalization:

.. csv-table::
    :header: "Raw", "Canonicalized"

    "``Homepage``", "``homepage``"
    "``Home-page``", "``homepage``"
    "``Home page``", "``homepage``"
    "``Change_Log``", "``changelog``"
    "``What's New?``", "``whatsnew``"

Metadata producers **SHOULD** emit the canonicalized form of a
user-specified label, but **MAY** choose to emit the un-canonicalized form so
long as it adheres to the existing 32-character constraint.
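
The rows of the example table in this section can be checked mechanically. The
snippet below is only a worked example: it repeats the canonicalization
function so that it runs on its own, and the ``EXAMPLES`` mapping is a
hypothetical name holding the table's rows.

.. code-block:: python

    import string


    def canonicalize_label(label: str) -> str:
        chars_to_remove = string.punctuation + string.whitespace
        removal_map = str.maketrans("", "", chars_to_remove)
        return label.translate(removal_map).lower()


    # Raw labels from the example table, mapped to their canonical forms.
    EXAMPLES = {
        "Homepage": "homepage",
        "Home-page": "homepage",
        "Home page": "homepage",
        "Change_Log": "changelog",
        "What's New?": "whatsnew",
    }

    results = {raw: canonicalize_label(raw) for raw in EXAMPLES}
    assert results == EXAMPLES

    # Only ASCII punctuation is deleted: the inverted question mark survives.
    non_ascii = canonicalize_label("¿Qué?")
    assert non_ascii == "¿qué"

The last two lines illustrate a consequence of the function as written:
punctuation outside of ASCII is left in place, so labels that differ only in
such characters remain distinct after canonicalization.
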
Package indices **SHOULD NOT** use the canonicalized labels belonging to the set
of well-known labels directly as UI elements (instead replacing them with
appropriately capitalized text labels). Labels not belonging to the well-known
set **MAY** be used directly as UI elements.
Well-known labels
-----------------
In addition to the canonicalization rules above, this PEP proposes a
fixed (but extensible) set of "well-known" ``Project-URL`` labels,
as well as equivalent aliases.
The following table lists these labels, in canonical form:

.. csv-table::
    :header: "Label", "Description", "Aliases"
    :widths: 20, 50, 30

    "``homepage``", "The project's home page", "*(none)*"
    "``download``", "A download URL for the current distribution, equivalent to ``Download-URL``", "*(none)*"
    "``changelog``", "The project's changelog", "``changes``, ``releasenotes``, ``whatsnew``, ``history``"
    "``documentation``", "The project's online documentation", "``docs``"
    "``issues``", "The project's bug tracker", "``bugs``, ``issue``, ``bug``, ``tracker``, ``report``"
    "``sponsor``", "Sponsoring information", "``funding``, ``donate``, ``donation``"

Packagers and metadata producers **MAY** choose to use these well-known
labels to communicate specific URL intents to package indices and downstreams.
Packagers and metadata producers **SHOULD** produce the canonicalized version
of the well-known labels in package metadata.
Similarly, indices **MAY** choose to specialize their rendering or presentation
of URLs with these labels, e.g. by presenting an appropriate icon or tooltip
for each label.
Indices **MAY** also specialize the rendering or presentation of additional labels or URLs,
including (but not limited to) labels that start with a well-known label and URLs that refer
to a known service provider domain (e.g. for documentation hosting or issue tracking).
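
To illustrate how an index might combine canonicalization with the well-known
labels, the sketch below maps a raw label to its well-known label, if any. The
``WELL_KNOWN`` mapping transcribes the table in this section; the
``well_known_label`` helper is hypothetical, and it performs exact matching
only (it does not implement the optional prefix or domain-based
specializations mentioned above).

.. code-block:: python

    import string

    # Well-known labels and their aliases, transcribed from the table in this section.
    WELL_KNOWN = {
        "homepage": (),
        "download": (),
        "changelog": ("changes", "releasenotes", "whatsnew", "history"),
        "documentation": ("docs",),
        "issues": ("bugs", "issue", "bug", "tracker", "report"),
        "sponsor": ("funding", "donate", "donation"),
    }

    # Reverse lookup: every label and alias points at its well-known label.
    ALIAS_TO_LABEL = {
        name: label
        for label, aliases in WELL_KNOWN.items()
        for name in (label, *aliases)
    }


    def canonicalize_label(label: str) -> str:
        chars_to_remove = string.punctuation + string.whitespace
        return label.translate(str.maketrans("", "", chars_to_remove)).lower()


    def well_known_label(raw_label: str):
        """Return the well-known label for raw_label, or None (hypothetical helper)."""
        return ALIAS_TO_LABEL.get(canonicalize_label(raw_label))


    resolved = [well_known_label(raw) for raw in ("What's New?", "Docs", "Bug Tracker")]

In this sketch ``Bug Tracker`` canonicalizes to ``bugtracker``, which is
neither a well-known label nor a listed alias, so it resolves to ``None`` and
would be rendered like any other free-text label.
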
This PEP recognizes that the list of well-known labels is unlikely to remain
static, and that subsequent additions to it should not require the overhead
associated with a formal PEP process or new metadata version. As the primary
expected use case for this information is to control the way project URLs are
displayed on the Python Package Index, this PEP proposes that the list above
become a "living" list within PyPI's documentation (at time of writing, the
documentation for influencing PyPI's URL display can be found
`here <https://docs.pypi.org/project_metadata/#icons>`__).
Backwards Compatibility
=======================
Limited Impact
--------------
This PEP is expected to have little to no impact on existing packaging tooling
or package indices:

* Packaging tooling: no changes to the correctness or well-formedness
  of the core metadata. This PEP proposes deprecations as well as behavioral
  refinements, but all currently (and historically) produced metadata will
  continue to be valid per the rules of its respective version.
* Package indices: indices will continue to expect well-formed core metadata,
  with no behavioral changes. Indices **MAY** choose to emit warnings or
  notifications on the presence of now-deprecated fields,
  :ref:`per above <package-indices>`.

.. _future-considerations:
Future Considerations
=====================
This PEP does not stipulate or require any future metadata changes.
However, per :ref:`metadata-producers` and :ref:`conventions-for-labels`,
we identify the following potential future goals for a new major release of
the core metadata standards:

* Outright removal of support for ``Home-page`` and ``Download-URL`` in the
  next major core metadata version. If removed, package indices and consumers
  **MUST** reject metadata containing these fields when said metadata is of
  the new major version.
* Enforcement of label canonicalization. If enforced, package producers
  **MUST** emit only canonicalized ``Project-URL`` labels when generating
  distribution metadata, and package indices and consumers **MUST** reject
  distributions containing non-canonicalized labels. Note: requiring
  canonicalization merely restricts labels to lowercase text, and excludes
  whitespace and punctuation. It does NOT restrict project URLs solely to
  the use of "well-known" labels.

These potential changes would be backwards incompatible, hence their
inclusion only in this section. Acceptance of this PEP does NOT commit
any future metadata revision to actually making these changes.
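
Purely as an illustration of what such enforcement could look like, the sketch
below checks a list of ``(field name, value)`` pairs against both potential
rules. No such metadata version exists; the ``check_strict`` helper, its error
messages, and the decision to report errors rather than raise are all
hypothetical.

.. code-block:: python

    import string

    # Fields that a hypothetical future major version might remove.
    DEPRECATED_FIELDS = {"home-page", "download-url"}


    def canonicalize_label(label: str) -> str:
        chars_to_remove = string.punctuation + string.whitespace
        return label.translate(str.maketrans("", "", chars_to_remove)).lower()


    def check_strict(fields: list) -> list:
        """Return error strings for (name, value) pairs under the two potential rules."""
        errors = []
        for name, value in fields:
            if name.lower() in DEPRECATED_FIELDS:
                errors.append(f"{name} is not permitted")
            elif name.lower() == "project-url":
                label = value.partition(",")[0].strip()
                # A label is canonical when canonicalizing it changes nothing.
                if label != canonicalize_label(label):
                    errors.append(f"non-canonical Project-URL label: {label!r}")
        return errors


    errors = check_strict([
        ("Home-page", "https://example.com/sampleproject"),
        ("Project-URL", "Home Page, https://example.com/sampleproject"),
        ("Project-URL", "docs, https://example.com/sampleproject/docs"),
    ])

The third entry passes even though ``docs`` is an alias rather than a
well-known label, reflecting the note above that canonicalization constrains
only the spelling of labels, not which labels may be used.
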
Security Implications
=====================
This PEP does not identify any positive or negative security implications
associated with deprecating ``Home-page`` and ``Download-URL`` or with
label canonicalization.
How To Teach This
=================
The changes in this PEP should be transparent to the majority of the packaging
ecosystem's userbase; the primary beneficiaries of this PEP's changes are
packaging tooling authors and index maintainers, who will be able to reduce the
number of unique URL fields produced and checked.
A small number of package maintainers may observe new warnings or notifications
from their index of choice, should the index choose to ignore ``Home-page``
and ``Download-URL`` as suggested. Similarly, a small number of package
maintainers may observe that their index of choice no longer renders
their URLs, if only present in the deprecated fields. However, no package
maintainers should observe rejected package uploads or other breaking
changes to packaging workflows due to this PEP's proposed changes.
Anybody who observes warnings or changes to the presentation of
URLs on indices can be taught about this PEP's behavior via official
packaging resources, such as the
:ref:`Python Packaging User Guide <packaging>`
and `PyPI's user documentation <https://docs.pypi.org/>`__, the latter of which
already contains an informal description of PyPI's URL handling behavior.
If this PEP is accepted, the authors of this PEP will coordinate to update
and cross-link the resources mentioned above.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.