PEP 639: Apply changes to the draft based on the Discourse thread (#3835)

Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Alyssa Coghlan <ncoghlan@gmail.com>
Karolina Surma 2024-06-29 00:20:47 +02:00 committed by GitHub
parent 15b87159e5
commit c21ff55815
2 changed files with 85 additions and 183 deletions

@ -33,14 +33,11 @@ To achieve that, it:
- Specifies the necessary changes to :term:`Core Metadata` and
the corresponding :term:`Pyproject Metadata key`\s
- Describes the necessary changes to related specifications,
namely the `source distribution (sdist) <sdistspec_>`__,
- Describes the necessary changes to
the `source distribution (sdist) <sdistspec_>`__,
`built distribution (wheel) <wheelspec_>`__ and
`installed project <installedspec_>`__ standards.
- :ref:`Provides guidance <639-spec-converting-metadata>`
for authors and tools converting legacy license metadata.
This will make license declaration simpler and less ambiguous for
package authors to create, end users to understand,
and tools to programmatically process.
@ -101,7 +98,7 @@ including on `outdated and ambiguous PyPI classifiers <classifierissue_>`__,
`license interoperability with other ecosystems <interopissue_>`__,
`too many confusing license metadata options <packagingissue_>`__,
`limited support for license files in the Wheel project <wheelfiles_>`__, and
`the lack of clear, precise and standardized license metadata <pepissue_>`__.
`the lack of precise license metadata <pepissue_>`__.
As a result, on average, Python packages tend to have more ambiguous and
missing license information than other common ecosystems. This is supported by
@ -218,10 +215,12 @@ particularly :term:`license identifier` and :term:`license expression`.
as described in the
:ref:`639-spec-field-license-expression` section of this PEP.
This includes all valid SPDX identifiers and
the strings ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary``.
the custom ``LicenseRef-[idstring]`` strings conforming to the
`SPDX specification, clause 10.1 <spdxcustom_>`__.
Examples:
``MIT``,
``GPL-3.0-only``
``GPL-3.0-only``,
``LicenseRef-My-Custom-License``
root license directory
license directory
@ -254,7 +253,7 @@ The changes necessary to implement this PEP include:
:ref:`project source metadata <639-spec-source-metadata>`,
as defined in the `specification <pyprojecttoml_>`__.
- :ref:`minor additions <639-spec-project-formats>` to the
- :ref:`additions <639-spec-project-formats>` to the
source distribution (sdist), built distribution (wheel) and installed project
specifications.
@ -283,8 +282,11 @@ A license expression can use the following :term:`license identifier`\s:
version. Note that the SPDX working group never removes any license
identifiers; instead, they may choose to mark an identifier as "deprecated".
- The ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary`` strings, to
identify licenses that are not included in the SPDX license list.
- The custom ``LicenseRef-[idstring]`` string(s), where
``[idstring]`` is a unique string containing letters, numbers,
``.`` and/or ``-``, to identify licenses that are not included in the SPDX
license list. The custom identifiers must follow the SPDX specification,
`clause 10.1 <spdxcustom_>`__ of the given specification version.
Examples of valid SPDX expressions:
@ -293,10 +295,10 @@ Examples of valid SPDX expressions:
MIT
BSD-3-Clause
MIT AND (Apache-2.0 OR BSD-2-clause)
MIT AND (Apache-2.0 OR BSD-2-Clause)
MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
LicenseRef-Public-Domain OR CC0-1.0 OR Unlicense
LicenseRef-Special-License OR CC0-1.0 OR Unlicense
LicenseRef-Proprietary
@ -306,6 +308,8 @@ Examples of invalid SPDX expressions:
Use-it-after-midnight
Apache-2.0 OR 2-BSD-Clause
LicenseRef-License with spaces
LicenseRef-License_with_underscores
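The custom identifier rule above can be sketched as a syntactic check. This is a minimal illustration, not part of the PEP: the function name ``is_custom_license_id`` is hypothetical, the ``LicenseRef-`` prefix is assumed to be case-sensitive, and the optional ``DocumentRef-`` prefix from the SPDX specification is not handled.

```python
import re

# Assumption: [idstring] is one or more ASCII letters, digits, "." or "-",
# following the wording of the hunk above. Identifiers from the SPDX
# License List are NOT checked here, only the custom LicenseRef form.
_LICENSEREF_RE = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def is_custom_license_id(identifier: str) -> bool:
    """Return True if ``identifier`` is a syntactically valid LicenseRef string."""
    return _LICENSEREF_RE.fullmatch(identifier) is not None
```

Under these assumptions, ``LicenseRef-My-Custom-License`` is accepted, while the two invalid ``LicenseRef`` examples listed above (one with spaces, one with underscores) are rejected.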
.. _639-spec-core-metadata:
@ -328,34 +332,21 @@ Add ``License-Expression`` field
The ``License-Expression`` optional :term:`Core Metadata field`
is specified to contain a text string
that is a valid SPDX :term:`license expression`, as defined by this PEP.
that is a valid SPDX :term:`license expression`,
as :ref:`defined above <639-spdx>`.
Publishing tools SHOULD issue an informational warning if this field is
missing, and MAY raise an error. Build tools MAY issue a similar warning,
but MUST NOT raise an error.
A license expression is an SPDX expression as :ref:`defined above <639-spdx>`.
When processing the ``License-Expression`` field, build and publishing tools:
- SHOULD halt execution and raise an error if:
- The field does not contain a valid license expression
- One or more license identifiers are not valid
(as :ref:`defined above <639-spdx>`)
- SHOULD report an informational warning, and publishing tools MAY raise an
error, if one or more license identifiers have been marked as deprecated in
the `SPDX License List <spdxlist_>`__.
- MUST store a case-normalized version of the ``License-Expression`` field
using the reference case for each SPDX license identifier and
uppercase for the ``AND``, ``OR`` and ``WITH`` keywords.
- SHOULD report an informational warning, and MAY raise an error if
the normalization process results in changes to the
``License-Expression`` field contents.
Build and publishing tools SHOULD
check that the ``License-Expression`` field contains a valid SPDX expression,
including the validity of the particular license identifiers
(as :ref:`defined above <639-spdx>`).
Tools MAY halt execution and raise an error when an invalid expression is found.
If tools choose to validate the SPDX expression, they also SHOULD
store a case-normalized version of the ``License-Expression``
field using the reference case for each SPDX license identifier and uppercase
for the ``AND``, ``OR`` and ``WITH`` keywords.
Tools SHOULD report a warning and publishing tools MAY raise an error
if one or more license identifiers
have been marked as deprecated in the `SPDX License List <spdxlist_>`__.
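As an illustration of the case normalization described above, the following sketch uppercases the ``AND``, ``OR`` and ``WITH`` keywords and restores the reference case of each identifier. It is a simplified example under stated assumptions: ``KNOWN_IDS`` is a hypothetical, tiny subset of the SPDX License List, the function name is invented for this example, and the expression grammar itself (operator placement, balanced parentheses, the ``+`` suffix) is not validated.

```python
import re

# Hypothetical subset of the SPDX License List, keyed by lowercase form.
# A real tool would load the full list for the SPDX version it targets.
KNOWN_IDS = {
    name.lower(): name
    for name in ("MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "GPL-3.0-only")
}
KEYWORDS = {"and", "or", "with"}
# A token is a single parenthesis or a run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def normalize_expression(expression: str) -> str:
    """Return ``expression`` with keywords uppercased and known IDs in reference case."""
    out = []
    for token in TOKEN_RE.findall(expression):
        lowered = token.lower()
        if token in ("(", ")"):
            out.append(token)
        elif lowered in KEYWORDS:
            out.append(lowered.upper())
        elif lowered in KNOWN_IDS:
            out.append(KNOWN_IDS[lowered])
        elif token.startswith("LicenseRef-"):
            # Custom identifiers are kept verbatim; they are not case-normalized.
            out.append(token)
        else:
            raise ValueError(f"unknown license identifier: {token!r}")
    # Re-join tokens and remove the spaces introduced around parentheses.
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

With this table, ``mit and (apache-2.0 or bsd-2-clause)`` becomes ``MIT AND (Apache-2.0 OR BSD-2-Clause)``. A tool that wants to warn about changes can compare the input with the returned string.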
For all newly-uploaded :term:`distribution archive`\s
that include a ``License-Expression`` field,
@ -363,6 +354,8 @@ the `Python Package Index (PyPI) <pypi_>`__ MUST
validate that they contain a valid, case-normalized license expression with
valid identifiers (as :ref:`defined above <639-spdx>`)
and MUST reject uploads that do not.
Custom license identifiers which conform to the SPDX specification
are considered valid.
PyPI MAY reject an upload for using a deprecated license identifier,
so long as it was deprecated as of the above-mentioned SPDX License List
version.
@ -430,19 +423,19 @@ Deprecate ``License`` field
The legacy unstructured-text ``License`` :term:`Core Metadata field`
is deprecated and replaced by the new ``License-Expression`` field.
Build and publishing tools MUST raise an error
if both these fields are present and their values are not identical,
including capitalization and excluding leading and trailing whitespace.
The fields are mutually exclusive.
Tools which generate Core Metadata MUST NOT create both these fields.
Tools which read Core Metadata, when dealing with both these fields present
at the same time, MUST read the value of ``License-Expression`` and MUST
disregard the value of the ``License`` field.
If only the ``License`` field is present, such tools SHOULD issue a warning
If only the ``License`` field is present, tools MAY issue a warning
informing users it is deprecated and recommending ``License-Expression``
instead.
For all newly-uploaded :term:`distribution archive`\s that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
reject any that specify a ``License`` field and the text of which is not
identical to that of ``License-Expression``,
as :ref:`defined here <639-spdx>`.
reject any that specify both ``License`` and ``License-Expression`` fields.
The ``License`` field may be removed from a new version of the specification
in a future PEP.
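The reading rule above (prefer ``License-Expression``, disregard ``License`` when both are present) could look roughly like this in a tool that consumes Core Metadata. This is a sketch only: the function name ``read_license`` and the warning text are hypothetical, and the metadata is assumed to be parseable as RFC 822-style headers with the standard library ``email`` package.

```python
import warnings
from email.parser import HeaderParser


def read_license(metadata_text: str):
    """Return ``(field_name, value)`` for the license field to trust, or None."""
    msg = HeaderParser().parsestr(metadata_text)
    expression = msg.get("License-Expression")
    legacy = msg.get("License")
    if expression is not None:
        # When both fields are present, the legacy License value is disregarded.
        return ("License-Expression", expression)
    if legacy is not None:
        # Optional behaviour: tools MAY warn that License is deprecated.
        warnings.warn(
            "The License field is deprecated; use License-Expression instead.",
            DeprecationWarning,
        )
        return ("License", legacy)
    return None
```

For metadata that contains both ``License: some text`` and ``License-Expression: MIT``, this returns ``("License-Expression", "MIT")``.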
@ -499,15 +492,10 @@ string value. It is a valid SPDX license expression as
:ref:`defined in this PEP <639-spdx>`.
Its value maps to the ``License-Expression`` field in the core metadata.
Build tools SHOULD validate the expression as described in the
Build tools SHOULD validate and perform case normalization of the expression
as described in the
:ref:`639-spec-field-license-expression` section,
outputting an error or warning as specified.
When generating the Core Metadata, tools MUST perform case normalization.
If a top-level string value for the ``license`` key is present and valid,
for purposes of backward compatibility
tools MAY back-fill the ``License`` Core Metadata field
with the normalized value of the ``license`` key.
Examples:
@ -815,13 +803,14 @@ Users of packaging tools will learn the valid license expression of their
package through the messages issued by the tools when they detect invalid
ones, or when the deprecated ``License`` field or license classifiers are used.
If an invalid ``License-Expression`` is used, an error message will help users
understand they need to use SPDX identifiers. For authors using the
now-deprecated ``License`` field or license classifiers, packaging tools will
warn them and inform them of the modern replacement, ``License-Expression``.
Finally, the users who may not be aware of this PEP will be guided by the
publishing tools toward including ``license`` and ``license-files`` in their
project source metadata.
If an invalid ``License-Expression`` is used, the users will not be able
to publish their package to PyPI and an error message will help them
understand they need to use SPDX identifiers.
It will be possible to generate a distribution with incorrect license metadata,
but not to publish one on PyPI or any other index server that enforces ``License-Expression`` validity.
For authors using the now-deprecated ``License`` field or license classifiers,
packaging tools may warn them and inform them of the replacement,
``License-Expression``.
Tools may also help with the conversion and suggest a license expression in
many common cases:
@ -833,13 +822,6 @@ many common cases:
- Tools may be able to suggest how to update an existing ``License`` value
in project source metadata and convert that to a license expression,
as also :ref:`specified in this PEP <639-spec-converting-metadata>`.
For instance, a tool may suggest converting a value of ``MIT`` in the
``license.text`` key in ``[project]`` (or the equivalent in tool-specific
formats) to a top-level string value of the ``license`` key (or equivalent).
Likewise, a tool could suggest converting from a ``License`` of ``Apache2``
(which is not a valid license expression as :ref:`defined in this PEP
<639-spdx>`) to a ``License-Expression`` of ``Apache-2.0``.
.. _639-reference-implementation:
@ -847,16 +829,13 @@ Reference Implementation
========================
Tools will need to support parsing and validating license expressions in the
``License-Expression`` field.
The `license-expression library <licenseexplib_>`__ is a reference Python
implementation that handles license expressions including parsing,
formatting and validation, using flexible lists of license symbols
(including SPDX license IDs and any extra identifiers included here).
It is licensed under Apache-2.0 and is already used in several projects,
including the `SPDX Python Tools <spdxpy_>`__,
the `ScanCode toolkit <scancodetk_>`__
and the Free Software Foundation Europe (FSFE) `REUSE project <reuse_>`__.
``License-Expression`` field if they decide to implement this part of the
specification.
It's up to the tools whether they prefer to implement the validation on their
side (e.g. like `hatch <hatchparseimpl_>`__) or use one of the available
Python libraries (e.g. `license-expression <licenseexplib_>`__).
This PEP does not mandate using any specific library and leaves it to
tool authors to choose the best implementation for their projects.
.. _639-rejected-ideas:
@ -869,77 +848,6 @@ rejected. The exhaustive list including the rationale for rejecting can be found
in a :ref:`separate page <639-rejected-ideas-details>`.
Open Issues
===========
Should the ``License`` field be back-filled, or mutually exclusive?
-------------------------------------------------------------------
At present, this PEP explicitly allows, but does not require, build tools to
back-fill the ``License`` Core Metadata field with the verbatim text from the
``License-Expression`` field. This would improve backwards compatibility and was
suggested by some on the Discourse thread. On the other hand, allowing it does
increase complexity and is less of a clean separation, preventing the
``License`` field from being mutually exclusive with the new
``License-Expression`` field and requiring that their values match.
As such, it would be useful to have a more concrete rationale and use cases for
the back-filled data in order to come to a final consensus on this matter.
Therefore, is the status quo acceptable, allowing tools to decide this for
themselves? Should this PEP recommend, or even require, that tools back-fill
this metadata (which would presumably be reversed once a breaking revision of
the metadata spec is issued)? Or should this not be explicitly allowed, or even
prohibited?
Should custom license identifiers be allowed?
---------------------------------------------
The current version of this PEP specifies the possibility to use the
custom identifiers ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary``
to handle the cases where projects have a license, but there is not a
recognized SPDX license identifier for it. For maximum flexibility, custom
``LicenseRef-<CUSTOM-TEXT>`` license identifiers could be allowed. In some cases
``LicenseRef-Proprietary`` may not be appropriate or specific enough, but
package authors could still want to benefit from the mainstream Python build
tooling.
However, this could increase the confusion about licensing. Custom identifiers
cannot be checked for correctness and users may think they always have to
prepend identifiers with ``LicenseRef``. This would lead to tools producing
invalid metadata. Additionally, this promotes the use of custom license
identifiers, leading to even more ambiguity.
Standards-conforming tools should not be required to allow custom license
identifiers, since they will not recognize or know how to treat them. By
contrast, custom tools, which would be required to understand custom
identifiers, don't have to follow the listed rules for license identifiers. This
specification already allows such use in specific ecosystems, which avoids the
disadvantages of forcing them on all mainstream packaging tools.
As an alternative, a ``LicenseRef-Custom`` identifier could be defined, which
would more explicitly indicate that the license cannot be expressed with
existing identifiers and the license text should be referenced for details,
in cases where ``LicenseRef-Proprietary`` is not appropriate. This would avoid
the main downsides of the approach of allowing an arbitrary ``LicenseRef``,
while addressing several of the potential scenarios cited for it.
On the other hand, as SPDX aims to encompass all FSF-recognized "Free" and
OSI-approved "Open Source" licenses, anything outside those bounds would
generally be covered by ``LicenseRef-Proprietary``, thus making
``LicenseRef-Custom`` somewhat redundant to it. Furthermore, it may mislead
authors of projects with complex/multiple licenses that they should use it over
specifying a license expression.
At present, the PEP retains the existing approach over either of these, since
the benefits
otherwise seem marginal. Not defining this now enables allowing it later (or
even now, with custom packaging tools) without affecting backward compatibility.
This would be problematic, if they were allowed now and later determined to be
unnecessary.
Appendices
==========
@ -967,6 +875,7 @@ References
.. _globmodule: https://docs.python.org/3/library/glob.html
.. _hatch: https://hatch.pypa.io/latest/
.. _hatchimplementation: https://discuss.python.org/t/12622/22
.. _hatchparseimpl: https://github.com/pypa/hatch/blob/hatchling-v1.24.2/backend/src/hatchling/licenses/parse.py#L8-L18
.. _installedspec: https://packaging.python.org/specifications/recording-installed-packages/
.. _interopissue: https://github.com/pypa/interoperability-peps/issues/46
.. _licenseexplib: https://github.com/nexB/license-expression/
@ -978,16 +887,14 @@ References
.. _pypugdistributionpackage: https://packaging.python.org/en/latest/glossary/#term-Distribution-Package
.. _pypugglossary: https://packaging.python.org/glossary/
.. _pypugproject: https://packaging.python.org/en/latest/glossary/#term-Project
.. _reuse: https://reuse.software/
.. _scancodetk: https://github.com/nexB/scancode-toolkit
.. _sdistspec: https://packaging.python.org/specifications/source-distribution-format/
.. _setuptoolsfiles: https://github.com/pypa/setuptools/issues/2739
.. _setuptoolspep639: https://github.com/pypa/setuptools/pull/2645
.. _spdx: https://spdx.dev/
.. _spdxcustom: https://spdx.github.io/spdx-spec/v2.2.2/other-licensing-information-detected/
.. _spdxid: https://spdx.dev/ids/
.. _spdxlist: https://spdx.org/licenses/
.. _spdxpression: https://spdx.github.io/spdx-spec/v2.2.2/SPDX-license-expressions/
.. _spdxpy: https://github.com/spdx/tools-python/
.. _spdxversion: https://github.com/pombredanne/spdx-pypi-pep/issues/6
.. _wheelfiles: https://github.com/pypa/wheel/issues/138
.. _wheelproject: https://wheel.readthedocs.io/en/stable/

@ -300,37 +300,6 @@ Therefore, for these reasons, we reject this here in favor of
the reserved string value of the ``license`` key.
Must be marked dynamic to back-fill
'''''''''''''''''''''''''''''''''''
The ``license`` key in the ``pyproject.toml`` could be required to be
explicitly set to dynamic in order for the ``License`` Core Metadata field
to be automatically back-filled from
the top-level string value of the ``license`` key.
This would be more explicit that the filling will be done,
as strictly speaking the ``license`` key is not (and cannot be) specified in
``pyproject.toml``, and satisfies a stricter interpretation of the letter
of the previous :pep:`621` specification that PEP 639 revises.
However, this doesn't seem to be necessary, because it is simply using the
static, literal value of the ``license`` key, as specified
strictly in PEP 639. Therefore, any conforming tool can
deterministically derive this using only the static data
in the ``pyproject.toml`` file itself.
Furthermore, this actually adds significant ambiguity, as it means the value
could get filled arbitrarily by other tools, which would in turn compromise
and conflict with the value of the new ``License-Expression`` field, which is
why such is explicitly prohibited by PEP 639. Therefore, not marking it as
``dynamic`` will ensure it is only handled in accordance with PEP 639's
requirements.
Finally, users explicitly being told to mark it as ``dynamic``, or not, to
control filling behavior seems to be a bit of a misuse of the ``dynamic``
field as apparently intended, and prevents tools from adapting to best
practices (fill, don't fill, etc.) as they develop and evolve over time.
Source metadata ``license-files`` key
-------------------------------------
@ -738,6 +707,32 @@ tools to immediate take advantage of improvements and accept new
licenses balancing flexibility and compatibility.
Don't allow custom license identifiers
''''''''''''''''''''''''''''''''''''''
A previous draft of this PEP specified the possibility to use only two
custom identifiers: ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary``
to handle the cases where projects have a license, but there is not a
recognized SPDX license identifier for it.
The custom identifiers cannot be checked for correctness and users may think
they always have to prepend identifiers with ``LicenseRef``.
This would lead to tools producing invalid metadata.
However, Python packages are produced in many open and closed
environments,
where it may be impossible to declare the license using only the small subset
of the allowed custom identifiers and where, for various reasons,
it's not possible to add the license to the SPDX license list.
The custom license identifiers are explicitly allowed and described in the
official SPDX specification and they can be syntactically validated although
not case-normalized.
Therefore, with acknowledgement that the custom identifiers can't be fully
validated and may contain mistakes, it was decided to allow
them in line with the official SPDX specification.
.. _639-rejected-ideas-difference-license-source-binary:
Different licenses for source and binary distributions