PEP: 639
Title: Improving License Clarity with Better Package Metadata
Author: Philippe Ombredanne <pombredanne@nexb.com>,
        C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/12622
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 15-Aug-2019
Post-History: `15-Aug-2019 <https://discuss.python.org/t/2154>`__,
              `17-Dec-2021 <https://discuss.python.org/t/12622>`__
.. _639-abstract:
Abstract
========
This PEP defines a specification for how licenses are documented in the
`core metadata <coremetadataspec_>`__, with
:ref:`license expression strings <639-spec-field-license-expression>` using
`SPDX identifiers <spdxid_>`__ in a new ``License-Expression`` field.
This will make license declarations simpler and less ambiguous for
package authors to create, end users to read and understand, and
tools to programmatically process.
The PEP also:
- :ref:`Formally specifies <639-spec-field-license-file>`
a new ``License-File`` field, and defines how license files should be
:ref:`included in distributions <639-spec-project-formats>`,
as already used by the Wheel and Setuptools projects.
- Deprecates the legacy ``License`` :ref:`field <639-spec-field-license>`
and ``license ::`` :ref:`classifiers <639-spec-field-classifier>`.
- :ref:`Adds and deprecates <639-spec-source-metadata>` the corresponding keys
in the ``pyproject.toml`` ``[project]`` table.
- :ref:`Provides clear guidance <639-spec-converting-metadata>` for authors and
tools converting legacy license metadata, adding license files and
validating license expressions.
- Describes a :ref:`reference implementation <639-reference-implementation>`
and analyzes numerous :ref:`potential alternatives <639-rejected-ideas>`.
The changes in this PEP will update the
`core metadata <coremetadataspec_>`__ to version 2.4, modify the
`project (source) metadata specification <pep621spec_>`__,
and make minor additions to the `source distribution (sdist) <sdistspec_>`__,
`built distribution (wheel) <wheelspec_>`__ and
`installed project <installedspec_>`__ standards.
.. _639-goals:
Goals
=====
This PEP's scope is limited to covering new mechanisms for documenting
the license of a distribution package, specifically defining:
- A means of specifying an SPDX license expression.
- A method of including license texts in distributions and installed projects.
The changes to the core metadata specification that this PEP requires have been
designed to minimize impact and maximize backward compatibility.
This specification builds on existing ways to document licenses that are
already in use in popular tools (e.g. adding support to core metadata for the
``License-File`` field :ref:`already used <639-license-doc-setuptools-wheel>`
in the Wheel and Setuptools projects) and by some package authors
(e.g. storing an SPDX license expression in the existing ``License`` field).
In addition to these proposed changes, this PEP contains guidance for tools
handling and converting these metadata, a tutorial for package authors
covering various common use cases, detailed examples of them in use,
and a comprehensive survey of license documentation in Python and other
languages.
It is the intent of the PEP authors to work closely with tool maintainers to
implement the recommendations for validation and warnings specified here.
.. _639-non-goals:
Non-Goals
=========
This PEP is neutral regarding the choice of license by any particular
package author. This PEP makes no recommendation for specific licenses,
and does not require the use of a particular license documentation convention.
Rather, the SPDX license expression syntax proposed in this PEP provides a
simpler and more expressive mechanism to accurately document any kind of
license that applies to a Python package, whether it is open source,
free/libre, proprietary, or a combination of such.
This PEP also does not impose any additional restrictions when uploading to
PyPI, unless projects choose to make use of the new fields.
Instead, it is intended to document best practices already in use, extend them
to use a new formally-specified and supported mechanism, and provide guidance
for packaging tools on how to handle the transition and inform users accordingly.
This PEP also is not about license documentation in files inside projects,
though this is a :ref:`surveyed topic <639-license-doc-source-files>`
in an appendix, nor does it intend to cover cases where the source and
binary distribution packages don't have :ref:`the same licenses
<639-rejected-ideas-difference-license-source-binary>`.
.. _639-motivation:
Motivation
==========
Software must be licensed in order for anyone other than its creator to
download, use, share and modify it, so providing accurate license information
to Python package users is an important matter.
Today, there are multiple fields where
licenses are documented in core metadata, and there are limitations to what
can be expressed in each of them. This often leads to confusion and a lack of
clarity, both for package authors and end users.
Many package authors have expressed difficulty and frustrations due to the
limited capabilities to express licensing in project metadata, and this
creates further trouble for Linux and BSD distribution re-packagers.
This has triggered a number of license-related discussions and issues,
including on `outdated and ambiguous PyPI classifiers <classifierissue_>`__,
`license interoperability with other ecosystems <interopissue_>`__,
`too many confusing license metadata options <packagingissue_>`__,
`limited support for license files in the Wheel project <wheelfiles_>`__, and
`the lack of clear, precise and standardized license metadata <pepissue_>`__.
The current license classifiers address some common cases, and could
be extended to include the full range of current SPDX identifiers
while deprecating the many ambiguous classifiers
(including some popular and problematic ones,
such as ``License :: OSI Approved :: BSD License``).
However, this requires a substantial amount of effort
to duplicate the SPDX license list and keep it in sync.
Furthermore, it is effectively a hard break in backward compatibility,
forcing a huge proportion of package authors to update to new
classifiers (in most cases, with many possible choices that require closely
examining the project's license) immediately when PyPI deprecates the old ones.
Moreover, this approach only covers simple packages entirely under a single license;
it doesn't address the substantial fraction of common projects that vendor
dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging)
or were relicensed, adapt code from other projects or contain fonts, images,
examples, binaries or other assets under other licenses. It also requires
that both authors and tools understand and implement the PyPI-specific bespoke
classifier system, rather than using short, easy to add and standardized
SPDX identifiers in a simple text field, as increasingly widely adopted by
most other packaging systems to reduce the overall burden on the ecosystem.
Finally, this does not provide as clear an indicator that a package
has adopted the new system, and should be treated accordingly.
On average, Python packages tend to have more ambiguous and missing license
information than other common ecosystems (such as npm, Maven or
Gem). This is supported by the `statistics page <cdstats_>`__ of the
`ClearlyDefined project <clearlydefined_>`__, an
`Open Source Initiative <osi_>`__ incubated effort to help
improve licensing clarity of other FOSS projects, covering all packages
from PyPI, Maven, npm and Rubygems.
.. _639-rationale:
Rationale
=========
A survey of existing license metadata definitions in use in the Python
ecosystem today is provided in
:ref:`an appendix <639-license-doc-python>` of this PEP,
and license documentation in a variety of other packaging systems,
Linux distros, language ecosystems and applications is surveyed in
:ref:`another appendix <639-license-doc-other-projects>`.
There are a few takeaways from the survey, which have guided the design
and recommendations of this PEP:
- Most package formats use a single ``License`` field.
- Many modern package systems use some form of license expression syntax to
optionally combine more than one license identifier together.
SPDX and SPDX-like syntaxes are the most popular in use.
- SPDX license identifiers are becoming the de facto way to reference common
licenses everywhere, whether or not a full license expression syntax is used.
- Several package formats support documenting both a license expression and the
paths of the corresponding files that contain the license text. Most Free and
Open Source Software licenses require package authors to include their full
text in a distribution.
The use of a new ``License-Expression`` field will provide an intuitive,
structured and unambiguous way to express the license of a
package using a well-defined syntax and well-known license identifiers.
Similarly, a formally-specified ``License-File`` field offers a standardized
way to ensure that the full text of the license(s) is included with the
package when distributed, as legally required, and allows other tools consuming
the core metadata to unambiguously locate a distribution's license files.
While dramatically simplifying and improving the present Python license
metadata story, this specification standardizes and builds upon
existing practice in the `Setuptools <setuptoolsfiles_>`__ and
`Wheel <wheelfiles_>`__ projects.
Furthermore, an up-to-date version of the current draft of this PEP is
`already successfully implemented <hatchimplementation_>`__ in the popular
PyPA `Hatch <hatch_>`__ packaging tool, and an earlier draft of the
license files portion is `implemented in Setuptools <setuptoolspep639_>`__.
Over time, encouraging the use of these fields and deprecating the ambiguous,
duplicative and confusing legacy alternatives will help Python software
publishers improve the clarity, accuracy and portability of their licensing
practices, to the benefit of package authors, consumers and redistributors
alike.
.. _639-terminology:
Terminology
===========
This PEP seeks to clearly define the terms it uses, given that some have
multiple established meanings (e.g. import vs. distribution package,
wheel *format* vs. Wheel *project*); are related and often used
interchangeably, but have critical distinctions in meaning
(e.g. ``[project]`` *key* vs. core metadata *field*); are existing concepts
that don't have formal terms/definitions (e.g. project/source metadata vs.
distribution/built metadata, build vs. publishing tools), or are new concepts
introduced here (e.g. license expression/identifier).
This PEP also uses terms defined in the
`PyPA PyPUG Glossary <pypugglossary_>`__
(specifically *built/binary distribution*, *distribution package*,
*project* and *source distribution*), and by the `SPDX Project <spdx_>`__
(*license identifier*, *license expression*).
The keywords "MUST", "MUST NOT", "REQUIRED",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in :rfc:`2119`.
Terms are listed here in their full versions;
related words (``Rel:``) are in parentheses,
including short forms (``Short:``), sub-terms (``Sub:``) and common synonyms
for the purposes of this PEP (``Syn:``).
**Core Metadata** *(Syn: Package Metadata, Sub: Distribution Metadata)*
The `PyPA specification <coremetadataspec_>`__ and the set of metadata fields
it defines that describe key static attributes of distribution packages
and installed projects.
The **distribution metadata** refers to, more specifically, the concrete form
core metadata takes when included inside a distribution archive
(``PKG-INFO`` in a sdist and ``METADATA`` in a wheel) or installed project
(``METADATA``).
**Core Metadata Field** *(Short: Metadata Field/Field)*
A single key-value pair, or sequence of such with the same key, as defined
by the `core metadata specification <coremetadataspec_>`__.
Notably, distinct from a ``pyproject.toml`` ``[project]`` table *key*.
**Distribution Package** *(Sub: Package, Distribution Archive)*
(`See PyPUG <pypugdistributionpackage_>`__)
In this PEP, **package** is used to refer to the abstract concept of a
distributable form of a Python project, while **distribution** more
specifically references the physical **distribution archive**.
**License Classifier**
A `PyPI Trove classifier <classifiers_>`__
(as `described in the core metadata specification
<coremetadataclassifiers_>`__)
which begins with ``License ::``, currently used to indicate
a project's license status by including it as a ``Classifier``
in the core metadata.
**License Expression** *(Syn: SPDX Expression)*
A string with valid `SPDX license expression syntax <spdxpression_>`__
including any SPDX license identifiers as defined here, which describes
a project's license(s) and how they relate to one another. Examples:
``GPL-3.0-or-later``, ``MIT AND (Apache-2.0 OR BSD-2-Clause)``
**License Identifier** *(Syn: License ID/SPDX Identifier)*
A valid `SPDX short-form license identifier <spdxid_>`__, as described in the
:ref:`639-spec-field-license-expression` section of this PEP; briefly,
this includes all valid SPDX identifiers and the ``LicenseRef-Public-Domain``
and ``LicenseRef-Proprietary`` strings. Examples: ``MIT``, ``GPL-3.0-only``
**Project** *(Sub: Project Source Tree, Installed Project)*
(`See PyPUG <pypugproject_>`__)
Here, a **project source tree** refers to the on-disk format of
a project used for development, while an **installed project** is the form a
project takes once installed from a distribution, as
`specified by PyPA <installedspec_>`__.
**Project Source Metadata** *(Sub: Project Table Metadata, Key, Subkey)*
Core metadata defined by the package author in the project source tree,
as top-level keys in the ``[project]`` table of a ``pyproject.toml`` file,
in the ``[metadata]`` table of ``setup.cfg``, or the equivalent for other
build tools.
The **Project Table Metadata**, or ``pyproject.toml`` ``[project]`` metadata,
refers specifically to the former, as defined by the
`PyPA Declaring Project Metadata specification <pep621spec_>`__
and originally specified in :pep:`621`.
A **Project Table Key**, or an unqualified *key* refers specifically to
a top-level ``[project]`` key
(notably, distinct from a core metadata *field*),
while a **subkey** refers to a second-level key in a table-valued
``[project]`` key.
**Root License Directory** *(Short: License Directory)*
The directory under which license files are stored in a project/distribution
and the root directory that their paths, as recorded under the
``License-File`` core metadata fields, are relative to.
Defined here to be the project root directory for source trees and source
distributions, and a subdirectory named ``licenses`` of the directory
containing the core metadata (i.e., the ``.dist-info/licenses``
directory) for built distributions and installed projects.
**Tool** *(Sub: Packaging Tool, Build Tool, Install Tool, Publishing Tool)*
A program, script or service executed by the user or automatically that
seeks to conform to the specification defined in this PEP.
A **packaging tool** refers to a tool used to build, publish,
install, or otherwise directly interact with Python packages.
A **build tool** is a packaging tool used to generate a source or built
distribution from a project source tree or sdist, when directly invoked
as such (as opposed to by end-user-facing install tools).
Examples: Wheel project, :pep:`517` backends via ``build`` or other
package-developer-facing frontends, calling ``setup.py`` directly.
An **install tool** is a packaging tool used to install a source or built
distribution in a target environment. Examples include the PyPA pip and
``installer`` projects.
A **publishing tool** is a packaging tool used to upload distribution
archives to a package index, such as Twine for PyPI.
**Wheel** *(Short: wheel, Rel: wheel format, Wheel project)*
Here, **wheel**, the standard built distribution format introduced in
:pep:`427` and `specified by the PyPA <wheelspec_>`__, will be referred to in
lowercase, while the `Wheel project <wheelproject_>`__, its reference
implementation, will be referred to as such with **Wheel** in Title Case.
.. _639-specification:
Specification
=============
The changes necessary to implement the improved license handling outlined in
this PEP include those in both
:ref:`distribution package metadata <639-spec-core-metadata>`,
as defined in the `core metadata specification <coremetadataspec_>`__, and
:ref:`author-provided project source metadata <639-spec-source-metadata>`,
as defined in the `project source metadata specification <pep621spec_>`__
(and originally introduced in :pep:`621`).
Further, :ref:`minor additions <639-spec-project-formats>` to the
source distribution (sdist), built distribution (wheel) and installed project
specifications will help document and clarify the already allowed,
now formally standardized behavior in these respects.
Finally, :ref:`guidance is established <639-spec-converting-metadata>`
for tools handling and converting legacy license metadata to license
expressions, to ensure the results are consistent, correct and unambiguous.
Note that the guidance on errors and warnings is for tools' default behavior;
they MAY operate more strictly if users explicitly configure them to do so,
such as by a CLI flag or a configuration option.
.. _639-spec-core-metadata:
Core metadata
-------------
The `PyPA Core Metadata specification <coremetadataspec_>`__ defines the names
and semantics of each of the supported fields in the distribution metadata of
Python distribution packages and installed projects.
This PEP :ref:`adds <639-spec-field-license-expression>` the
``License-Expression`` field,
:ref:`adds <639-spec-field-license-file>` the ``License-File`` field,
:ref:`deprecates <639-spec-field-license>` the ``License`` field,
and :ref:`deprecates <639-spec-field-classifier>` the license classifiers
in the ``Classifier`` field.
The error and warning guidance in this section applies to build and
publishing tools; end-user-facing install tools MAY be more lenient than
mentioned here when encountering malformed metadata
that does not conform to this specification.
As it adds new fields, this PEP updates the core metadata to version 2.4.
.. _639-spec-field-license-expression:
Add ``License-Expression`` field
''''''''''''''''''''''''''''''''
The ``License-Expression`` optional field is specified to contain a text string
that is a valid SPDX license expression, as defined herein.
Publishing tools SHOULD issue an informational warning if this field is
missing, and MAY raise an error. Build tools MAY issue a similar warning,
but MUST NOT raise an error.
.. _639-license-expression-definition:
A license expression is a string using the SPDX license expression syntax as
documented in the `SPDX specification <spdxpression_>`__, either
Version 2.2 or a later compatible version.
When used in the ``License-Expression`` field and as a specialization of
the SPDX license expression definition, a license expression can use the
following license identifiers:
- Any SPDX-listed license short-form identifiers that are published in the
`SPDX License List <spdxlist_>`__, version 3.17 or any later compatible
version. Note that the SPDX working group never removes any license
identifiers; instead, they may choose to mark an identifier as "deprecated".
- The ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary`` strings, to
identify licenses that are not included in the SPDX license list.
When processing the ``License-Expression`` field to determine if it contains
a valid license expression, build and publishing tools:
- SHOULD halt execution and raise an error if:
- The field does not contain a valid license expression
- One or more license identifiers are not valid
(as :ref:`defined above <639-license-expression-definition>`)
- SHOULD report an informational warning, and publishing tools MAY raise an
error, if one or more license identifiers have been marked as deprecated in
the `SPDX License List <spdxlist_>`__.
- MUST store a case-normalized version of the ``License-Expression`` field
using the reference case for each SPDX license identifier and
uppercase for the ``AND``, ``OR`` and ``WITH`` keywords.
- SHOULD report an informational warning, and MAY raise an error if
the normalization process results in changes to the
``License-Expression`` field contents.
For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
validate that it contains a valid, case-normalized license expression with
valid identifiers (as defined here) and MUST reject uploads that do not.
PyPI MAY reject an upload for using a deprecated license identifier,
so long as it was deprecated as of the above-mentioned SPDX License List
version.
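
As an illustration of the case normalization step described above, the
following is a minimal sketch rather than a reference implementation.
The function name ``normalize_license_expression`` and the small ``KNOWN_IDS``
table are hypothetical; a real tool would load the full SPDX License List,
and would also need to validate the complete expression grammar and the
license exception identifiers that follow ``WITH``, which this sketch does not
attempt.

.. code-block:: python

    import re

    # Hypothetical, tiny subset of the SPDX License List, for illustration only.
    KNOWN_IDS = [
        "MIT", "Apache-2.0", "BSD-2-Clause", "GPL-3.0-or-later",
        "LicenseRef-Public-Domain", "LicenseRef-Proprietary",
    ]
    _REFERENCE_CASE = {identifier.lower(): identifier for identifier in KNOWN_IDS}
    _KEYWORDS = {"and", "or", "with"}
    _TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")


    def normalize_license_expression(expression: str) -> str:
        """Return the expression with reference-case IDs and uppercase keywords.

        Raises ValueError for unknown identifiers or unexpected characters.
        Operator placement and parenthesis balancing are NOT checked here.
        """
        text = expression.strip()
        tokens = []
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
                continue
            match = _TOKEN.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character {text[pos]!r}")
            token = match.group()
            pos = match.end()
            if token in ("(", ")"):
                tokens.append(token)
            elif token.lower() in _KEYWORDS:
                tokens.append(token.upper())
            elif token.lower() in _REFERENCE_CASE:
                tokens.append(_REFERENCE_CASE[token.lower()])
            else:
                raise ValueError(f"unknown license identifier {token!r}")
        # Re-join with single spaces, keeping parentheses tight.
        return " ".join(tokens).replace("( ", "(").replace(" )", ")")

A tool following this PEP could compare the returned string with the user's
input and issue the informational warning when the two differ.
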
.. _639-spec-field-license-file:
Add ``License-File`` field
''''''''''''''''''''''''''
Each instance of the ``License-File`` optional field is specified to contain
the string representation of the path in the project source tree, relative to
the project root directory, of a license-related file.
It is a multi-use field that may appear zero or
more times, each instance listing the path to one such file. Files specified
under this field could include license text, author/attribution information,
or other legal notices that need to be distributed with the package.
As :ref:`specified by this PEP <639-spec-project-formats>`, its value
is also that file's path relative to the root license directory in both
installed projects and the standardized distribution package types.
In other legacy, non-standard or new distribution package formats and
mechanisms of accessing and storing core metadata, the value MAY correspond
to the license file path relative to a format-defined root license directory.
Alternatively, it MAY be treated as a unique abstract key to access the
license file contents by another means, as specified by the format.
If a ``License-File`` is listed in a source or built distribution's core
metadata, that file MUST be included in the distribution at the specified path
relative to the root license directory, and MUST be installed with the
distribution at that same relative path.
The specified relative path MUST be consistent between project source trees,
source distributions (sdists), built distributions (wheels) and installed
projects. Therefore, inside the root license directory, packaging tools
MUST reproduce the directory structure under which the
source license files are located relative to the project root.
Path delimiters MUST be the forward slash character (``/``),
and parent directory indicators (``..``) MUST NOT be used.
License file content MUST be UTF-8 encoded text.
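
The path rules above can be sketched as follows. This is an illustrative
example only: the function names, the ``built`` flag and the
``example-1.0.dist-info`` directory name are hypothetical, and treating a
backslash as a disallowed delimiter (rather than as part of a file name) is an
assumption of this sketch, as is rejecting absolute paths on the grounds that
the value is defined to be a relative path.

.. code-block:: python

    from pathlib import PurePosixPath


    def validate_license_file_path(value: str) -> PurePosixPath:
        """Check a ``License-File`` value against the path rules above."""
        if "\\" in value:
            # Assumption: a backslash is treated as a (disallowed) delimiter.
            raise ValueError(f"path delimiters must be '/': {value!r}")
        path = PurePosixPath(value)
        if path.is_absolute():
            raise ValueError(f"path must be relative: {value!r}")
        if ".." in path.parts:
            raise ValueError(f"parent directory indicators not allowed: {value!r}")
        return path


    def license_file_location(value: str, *, built: bool,
                              dist_info: str = "example-1.0.dist-info") -> str:
        """Return where the file lives relative to the archive or project root.

        ``built=False``: source trees and sdists (root license directory is
        the project root). ``built=True``: wheels and installed projects
        (root license directory is ``<dist_info>/licenses``).
        """
        path = validate_license_file_path(value)
        root = PurePosixPath(dist_info, "licenses") if built else PurePosixPath()
        return str(root / path)

Because the same relative path is reused under each root license directory,
the directory structure of the source tree is reproduced unchanged inside
``licenses``.
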
Build tools MAY and publishing tools SHOULD produce an informative warning
if a built distribution's metadata contains no ``License-File`` entries,
and publishing tools MAY but build tools MUST NOT raise an error.
For all newly-uploaded distribution packages that include one or more
``License-File`` fields and declare a ``Metadata-Version`` of ``2.4`` or
higher, PyPI SHOULD validate that the specified files are present in all
uploaded distributions, and MUST reject uploads that do not validate.
.. _639-spec-field-license:
Deprecate ``License`` field
'''''''''''''''''''''''''''
The legacy unstructured-text ``License`` field is deprecated and replaced by
the new ``License-Expression`` field. Build and publishing tools MUST raise
an error if both these fields are present and their values are not identical,
including capitalization and excluding leading and trailing whitespace.
If only the ``License`` field is present, such tools SHOULD issue a warning
informing users it is deprecated and recommending ``License-Expression``
instead.
For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
reject any that specify a ``License`` field whose text is not
identical to that of ``License-Expression``, as defined in this section.
Along with license classifiers, the ``License`` field may be removed from a
new version of the specification in a future PEP.
.. _639-spec-field-classifier:
Deprecate license classifiers
'''''''''''''''''''''''''''''
Using license `classifiers <classifiers_>`__ in the ``Classifier`` field
(`described in the core metadata specification <coremetadataclassifiers_>`__)
is deprecated and replaced by the more precise ``License-Expression`` field.
If the ``License-Expression`` field is present, build tools SHOULD and
publishing tools MUST raise an error if one or more license classifiers
is included in a ``Classifier`` field, and MUST NOT add
such classifiers themselves.
Otherwise, if this field contains a license classifier, build tools MAY
and publishing tools SHOULD issue a warning informing users such classifiers
are deprecated, and recommending ``License-Expression`` instead.
For compatibility with existing publishing and installation processes,
the presence of license classifiers SHOULD NOT raise an error unless
``License-Expression`` is also provided.
For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
reject any that also specify any license classifiers.
New license classifiers MUST NOT be `added to PyPI <classifiersrepo_>`__;
users needing them SHOULD use the ``License-Expression`` field instead.
Along with the ``License`` field, license classifiers may be removed from a
new version of the specification in a future PEP.
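
The interaction between ``License-Expression`` and the two deprecated
mechanisms (the ``License`` field and license classifiers) can be sketched as
below. The function name and its arguments are hypothetical, and the sketch
adopts the stricter publishing-tool behavior of raising an error for
conflicting classifiers; a build tool could reasonably choose to be more
lenient where this PEP only says SHOULD.

.. code-block:: python

    import warnings
    from typing import Optional, Sequence


    def check_legacy_license_metadata(
        license_expression: Optional[str],
        license_field: Optional[str],
        classifiers: Sequence[str] = (),
    ) -> None:
        """Raise or warn according to the deprecation rules sketched above."""
        license_classifiers = [
            item for item in classifiers if item.startswith("License ::")
        ]
        if license_expression is not None:
            # Comparison is case-sensitive but ignores surrounding whitespace.
            if (license_field is not None
                    and license_field.strip() != license_expression.strip()):
                raise ValueError("License and License-Expression are not identical")
            if license_classifiers:
                raise ValueError(
                    "license classifiers must not accompany License-Expression"
                )
            return
        if license_field is not None:
            warnings.warn("License is deprecated; use License-Expression instead")
        if license_classifiers:
            warnings.warn(
                "license classifiers are deprecated; use License-Expression instead"
            )

When only legacy metadata is present the function merely warns, which keeps
existing publishing and installation workflows working.
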
.. _639-spec-source-metadata:
Project source metadata
-----------------------
As originally introduced in :pep:`621`, the
`PyPA Declaring Project Metadata specification <pep621spec_>`__
defines how to declare a project's source
metadata under a ``[project]`` table in the ``pyproject.toml`` file for
build tools to consume and output distribution core metadata.
This PEP :ref:`adds <639-spec-key-license-expression>`
a top-level string value for the ``license`` key,
:ref:`adds <639-spec-key-license-files>` the new ``license-files`` key
and :ref:`deprecates <639-spec-key-license>`
the table value for the ``license`` key
along with its corresponding table subkeys, ``text`` and ``file``.
.. _639-spec-key-license-expression:
Add string value to ``license`` key
'''''''''''''''''''''''''''''''''''
A top-level string value is defined
for the ``license`` key in the ``[project]`` table,
which is specified to be a valid SPDX license expression,
as :ref:`defined previously <639-license-expression-definition>`.
Its value maps to the ``License-Expression`` field in the core metadata.
Build tools SHOULD validate the expression as described in the
:ref:`639-spec-field-license-expression` section,
outputting an error or warning as specified.
When generating the core metadata, tools MUST perform case normalization.
If a top-level string value for the ``license`` key is present and valid,
for purposes of backward compatibility
tools MAY back-fill the ``License`` core metadata field
with the normalized value of the ``license`` key.
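
A rough sketch of this mapping is shown below. It operates on an already
parsed ``[project]`` table (a plain ``dict``), and both the function name and
the ``normalize`` callable are hypothetical; the callable stands in for
whatever validation and case normalization routine a tool uses.

.. code-block:: python

    from typing import Callable, Dict, Mapping


    def license_core_metadata(
        project: Mapping[str, object],
        normalize: Callable[[str], str],
        backfill_license: bool = False,
    ) -> Dict[str, str]:
        """Map a string-valued ``license`` key to core metadata fields."""
        value = project.get("license")
        if not isinstance(value, str):
            # Key absent, or a deprecated table value that is handled elsewhere.
            return {}
        expression = normalize(value)
        fields = {"License-Expression": expression}
        if backfill_license:
            # Optional backward-compatibility back-fill of the legacy field.
            fields["License"] = expression
        return fields

Back-filling with the normalized value keeps the two fields identical, which
is what the ``License`` deprecation rules require when both are present.
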
.. _639-spec-key-license-files:
Add ``license-files`` key
'''''''''''''''''''''''''
A new ``license-files`` key is added to the ``[project]`` table for specifying
paths in the project source tree relative to ``pyproject.toml`` to file(s)
containing licenses and other legal notices to be distributed with the package.
It corresponds to the ``License-File`` fields in the core metadata.
Its value is a table, which if present MUST contain one of two optional,
mutually exclusive subkeys, ``paths`` and ``globs``; if both are specified,
tools MUST raise an error. Both are arrays of strings; the ``paths`` subkey
contains verbatim file paths, and the ``globs`` subkey valid glob patterns,
which MUST be parsable by the ``glob`` `module <globmodule_>`__ in the
Python standard library.
**Note**: To avoid ambiguity, confusion and (per :pep:`20`, the Zen of Python)
"more than one (obvious) way to do it", allowing a flat array of strings
as the value for the ``license-files`` key has been
:ref:`left out for now <639-license-files-allow-flat-array>`.
Path delimiters MUST be the forward slash character (``/``),
and parent directory indicators (``..``) MUST NOT be used.
Tools MUST assume that license file content is valid UTF-8 encoded text,
and SHOULD validate this and raise an error if it is not.
If the ``paths`` subkey is a non-empty array, build tools:
- MUST treat each value as a verbatim, literal file path, and
MUST NOT treat them as glob patterns.
- MUST include each listed file in all distribution archives.
- MUST NOT match any additional license files beyond those explicitly
statically specified by the user under the ``paths`` subkey.
- MUST list each file path under a ``License-File`` field in the core metadata.
- MUST raise an error if one or more paths do not correspond to a valid file
in the project source that can be copied into the distribution archive.
If the ``globs`` subkey is a non-empty array, build tools:
- MUST treat each value as a glob pattern, and MUST raise an error if the
pattern contains invalid glob syntax.
- MUST include all files matched by at least one listed pattern in all
distribution archives.
- MAY exclude files matched by glob patterns that can be unambiguously
determined to be backup, temporary, hidden, OS-generated or VCS-ignored.
- MUST list each matched file path under a ``License-File`` field in the
core metadata.
- SHOULD issue a warning and MAY raise an error if no files are matched.
- MAY issue a warning if any individual user-specified pattern
does not match at least one file.
If the ``license-files`` key is present, and the ``paths`` or ``globs`` subkey
is set to a value of an empty array, then tools MUST NOT include any
license files and MUST NOT raise an error.
.. _639-default-patterns:
If the ``license-files`` key is not present and not explicitly marked as
``dynamic``, tools MUST assume a default value of the following:
.. code-block:: toml
license-files.globs = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]
In this case, tools MAY issue a warning if no license files are matched,
but MUST NOT raise an error.
If the ``license-files`` key is marked as ``dynamic`` (and not present),
to preserve consistent behavior with current tools and help ensure the packages
they create are legally distributable, build tools SHOULD default to
including at least the license files matching the above patterns, unless the
user has explicitly specified their own.
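
The resolution of the ``license-files`` key into ``License-File`` values can
be sketched as follows, assuming the ``[project]`` table has already been
parsed. The function name is hypothetical, ``None`` is used here to mean
"key not present", and the sketch does not handle the ``dynamic`` case, the
optional exclusion of backup or VCS-ignored files, or any of the warnings.

.. code-block:: python

    import glob
    import os
    from typing import List, Mapping, Optional, Sequence

    DEFAULT_GLOBS = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]


    def resolve_license_files(
        project_root: str,
        license_files: Optional[Mapping[str, Sequence[str]]],
    ) -> List[str]:
        """Return the ``License-File`` values for a ``license-files`` table."""
        if license_files is None:
            license_files = {"globs": DEFAULT_GLOBS}
        if "paths" in license_files and "globs" in license_files:
            raise ValueError("'paths' and 'globs' are mutually exclusive")
        if "paths" in license_files:
            # Verbatim paths: every listed file must exist; no pattern matching.
            for path in license_files["paths"]:
                if not os.path.isfile(os.path.join(project_root, path)):
                    raise FileNotFoundError(path)
            return list(license_files["paths"])
        matched = set()
        for pattern in license_files.get("globs", ()):
            full_pattern = os.path.join(glob.escape(project_root), pattern)
            for hit in glob.glob(full_pattern):
                if os.path.isfile(hit):
                    relative = os.path.relpath(hit, project_root)
                    matched.add(relative.replace(os.sep, "/"))
        return sorted(matched)

An empty ``paths`` or ``globs`` array yields an empty list without raising,
matching the rule that such a value means no license files are included.
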
.. _639-spec-key-license:
Deprecate ``license`` key table subkeys
'''''''''''''''''''''''''''''''''''''''
Table values for the ``license`` key in the ``[project]`` table,
including the ``text`` and ``file`` table subkeys, are now deprecated.
If the new ``license-files`` key is present,
build tools MUST raise an error if the ``license`` key is defined
and has a value other than a single top-level string.
If the new ``license-files`` key is not present
and the ``text`` subkey is present in a ``license`` table,
tools SHOULD issue a warning informing users it is deprecated
and recommending a license expression as a top-level string key instead.
Likewise, if the new ``license-files`` key is not present
and the ``file`` subkey is present in the ``license`` table,
tools SHOULD issue a warning informing users it is deprecated and recommending
the ``license-files`` key instead.
If the specified license ``file`` is present in the source tree,
build tools SHOULD use it to fill the ``License-File`` field
in the core metadata, and MUST include the specified file
as if it were specified in a ``license-files.paths`` subkey.
If the file does not exist at the specified path,
tools MUST raise an informative error as previously specified.
However, tools MUST also still assume the
:ref:`specified default value <639-default-patterns>`
for the ``license-files`` key and also include,
in addition to a license file specified under the ``license.file`` subkey,
any license files that match the specified list of patterns.
Table values for the ``license`` key MAY be removed
from a new version of the specification in a future PEP.
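The rules in this section could be sketched as a small check over an
already-parsed ``[project]`` table, represented here as a plain ``dict``.
The function name ``check_license_key`` and the message wording are
hypothetical; the sketch only covers the error and warning conditions
described above, not the inclusion of the referenced files.

```python
def check_license_key(project):
    """Return deprecation warnings for the ``license`` key of ``[project]``.

    Raises ValueError in the one case where the text above says build tools
    MUST raise an error. Hypothetical helper, not part of the specification.
    """
    license_value = project.get("license")
    warnings = []

    # A missing key or a top-level string (license expression) is fine.
    if license_value is None or isinstance(license_value, str):
        return warnings

    # With the new ``license-files`` key present, a non-string value is an error.
    if "license-files" in project:
        raise ValueError(
            "'license' must be a single top-level string when "
            "'license-files' is present"
        )

    # Otherwise the legacy table subkeys are deprecated, not rejected.
    if isinstance(license_value, dict):
        if "text" in license_value:
            warnings.append(
                "license.text is deprecated; use a license expression "
                "as the top-level string value of 'license'"
            )
        if "file" in license_value:
            warnings.append(
                "license.file is deprecated; use the 'license-files' key"
            )
    return warnings
```

For example, ``check_license_key({"license": {"file": "LICENSE"}})`` yields a
single deprecation warning, while adding a ``license-files`` entry to the same
table makes the sketch raise.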
.. _639-spec-project-formats:
License files in project formats
--------------------------------
A few minor additions will be made to the relevant existing specifications
to document, standardize and clarify behavior that is already supported,
allowed and implemented, and to explicitly state, for each format, the root
license directory in which the license files are located and to which their
paths are relative, per the :ref:`639-spec-field-license-file` section.
**Project source trees**
As described in the :ref:`639-spec-source-metadata` section, the
`Declaring Project Metadata specification <pep621spec_>`__
will be updated to reflect that license file paths MUST be relative to the
project root directory; i.e. the directory containing the ``pyproject.toml``
(or equivalently, other legacy project configuration,
e.g. ``setup.py``, ``setup.cfg``, etc).
**Source distributions** *(sdists)*
The `sdist specification <sdistspec_>`__ will be updated to reflect that if the
``Metadata-Version`` is ``2.4`` or greater, the sdist MUST contain any
license files specified by ``License-File`` in the ``PKG-INFO`` at their
respective paths relative to the top-level directory of the sdist
(containing the ``pyproject.toml`` and the ``PKG-INFO`` core metadata).
**Built distributions** *(wheels)*
The `wheel specification <wheelspec_>`__ will be updated to reflect that if
the ``Metadata-Version`` is ``2.4`` or greater and one or more
``License-File`` fields is specified, the ``.dist-info`` directory MUST
contain a ``licenses`` subdirectory, which MUST contain the files listed
in the ``License-File`` fields in the ``METADATA`` file at their respective
paths relative to the ``licenses`` directory.
**Installed projects**
The `Recording Installed Projects specification <installedspec_>`__ will be
updated to reflect that if the ``Metadata-Version`` is ``2.4`` or greater
and one or more ``License-File`` fields is specified, the ``.dist-info``
directory MUST contain a ``licenses`` subdirectory which MUST contain
the files listed in the ``License-File`` fields in the ``METADATA`` file
at their respective paths relative to the ``licenses`` directory,
and that any files in this directory MUST be copied from wheels
by install tools.
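To make the wheel and installed-project layout concrete, the following sketch
maps each ``License-File`` value to its location under the ``licenses``
subdirectory of ``.dist-info``. The function name
``license_paths_in_dist_info`` is hypothetical, and the rejection of absolute
paths and ``..`` components is a defensive assumption of this sketch rather
than a rule quoted from this section.

```python
from pathlib import PurePosixPath


def license_paths_in_dist_info(dist_info_dir, license_files):
    """Map each License-File value to its path under <dist-info>/licenses/.

    Hypothetical helper: relative paths are preserved rather than flattened,
    so files with the same name in different directories do not collide.
    """
    licenses_dir = PurePosixPath(dist_info_dir) / "licenses"
    destinations = {}
    for name in license_files:
        relative = PurePosixPath(name)
        # Assumption of this sketch: refuse paths that could escape the
        # licenses directory.
        if relative.is_absolute() or ".." in relative.parts:
            raise ValueError(f"unsafe License-File path: {name!r}")
        destinations[name] = str(licenses_dir / relative)
    return destinations
```

With ``["LICENSE", "vendor/pkg/LICENSE"]`` as input, both files keep distinct
destinations because their relative paths are retained.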
.. _639-spec-converting-metadata:
Converting legacy metadata
--------------------------
Tools MUST NOT use the contents of the ``license.text`` ``[project]`` key
(or equivalent tool-specific format),
license classifiers or the value of the core metadata ``License`` field
to fill the top-level string value of the ``license`` key
or the core metadata ``License-Expression`` field
without informing the user and requiring unambiguous, affirmative user action
to select and confirm the desired license expression value before proceeding.
Tool authors who need to automatically convert license classifiers to
SPDX identifiers can use the
:ref:`recommendation <639-spec-mapping-classifiers-identifiers>` prepared by
the PEP authors.
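One way a tool might honor the confirmation requirement above is sketched
below. The two-entry mapping is an illustrative assumption, not the mapping
recommended by the PEP authors, and the names ``CLASSIFIER_CANDIDATES``,
``suggest_license_expression`` and ``confirm`` are hypothetical. The sketch
only proposes a value when exactly one candidate exists, and returns it only
after the caller-supplied ``confirm`` callback approves it.

```python
# Illustrative subset only; a real tool would use a complete, reviewed mapping.
CLASSIFIER_CANDIDATES = {
    "License :: OSI Approved :: MIT License": ["MIT"],
    # Ambiguous: the classifier does not say which version applies.
    "License :: OSI Approved :: Apache Software License": [
        "Apache-1.1",
        "Apache-2.0",
    ],
}


def suggest_license_expression(classifiers, confirm):
    """Return a user-confirmed SPDX identifier, or None.

    ``confirm`` is a callable taking the suggested identifier and returning
    True only on explicit, affirmative user action.
    """
    candidates = []
    for classifier in classifiers:
        candidates.extend(CLASSIFIER_CANDIDATES.get(classifier, []))

    # Unknown or ambiguous input: do not guess on the user's behalf.
    if len(candidates) != 1:
        return None

    suggestion = candidates[0]
    return suggestion if confirm(suggestion) else None
```

An interactive tool could pass a prompt function as ``confirm``, while a
non-interactive one could pass ``lambda suggestion: False`` so that nothing
is filled in silently.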
.. _639-backwards-compatibility:
Backwards Compatibility
=======================
Adding a new, dedicated ``License-Expression`` core metadata field
and a top-level string value for the ``license`` key reserved for this purpose
in the ``pyproject.toml`` ``[project]`` table
unambiguously signals support for the specification in this PEP.
This avoids the risk of new tooling
misinterpreting a license expression as a free-form license description
or vice versa, and raises an error if and only if the user affirmatively
upgrades to the latest metadata version and adds the new field/key.
The legacy ``License`` core metadata field
and the ``license`` key table subkeys (``text`` and ``file``)
in the ``pyproject.toml`` ``[project]`` table
will be deprecated along with the license classifiers,
retaining backwards compatibility while gently preparing users for their
future removal. Such a removal would follow a suitable transition period, and
be left to a future PEP and a new version of the core metadata specification.
Formally specifying the new ``License-File`` core metadata field and the
inclusion of the listed files in the distribution merely codifies and
refines the existing practices in popular packaging tools, including the Wheel
and Setuptools projects, and is designed to be largely backwards-compatible
with their existing use of that field. Likewise, the new ``license-files``
key in the ``[project]`` table of ``pyproject.toml``
standardizes statically specifying the files to include,
as well as the default behavior, and allows other tools to make use of them,
while only having an effect once users and tools expressly adopt it.
Because this PEP requires that license files not be flattened into
``.dist-info`` and specifies that they be placed in a dedicated ``licenses``
subdirectory, wheels produced following this change will locate their license
files differently from those produced under the previous unspecified,
installer-specific behavior. However, until this PEP there was no way of
discovering these files or accessing them programmatically, and the new layout
is further distinguished by a new metadata version, so there is no foreseen
mechanism for this to pose a practical issue.
Furthermore, this resolves existing compatibility issues with the current
ad hoc behavior, namely license files being silently clobbered if they have
the same names as others at different paths, unknowingly rendering the wheel
undistributable, and conflicting with the names of other metadata files in
the same directory. Formally specifying otherwise would in fact block full
forward compatibility with additional standard or installer-specified files
and directories added to ``.dist-info``, as they too could conflict with
the names of existing licenses.
While minor additions will be made to the source distribution (sdist),
built distribution (wheel) and installed project specifications, all of these
are merely documenting, clarifying and formally specifying behaviors explicitly
allowed under their current respective specifications, and already implemented
in practice, and gating them behind the explicit presence of both the new
metadata versions and the new fields. In particular, sdists may contain
arbitrary files following the project source tree layout, and formally
mentioning that these must include the license files listed in the metadata
merely documents and codifies existing Setuptools practice. Likewise, arbitrary
installer-specific files are allowed in the ``.dist-info`` directory of wheels
and copied to installed projects, and again this PEP just formally clarifies
and standardizes what is already being done.
Finally, while this PEP does propose PyPI implement validation of the new
``License-Expression`` and ``License-File`` fields, this has no effect on
existing packages, nor any effect on any new distributions uploaded unless they
explicitly choose to opt in to using these new fields while not
following the requirements in the specification. Therefore, this does not have
a backward compatibility impact, and in fact ensures forward compatibility with
any future changes by ensuring all distributions uploaded to PyPI with the new
fields are valid and conform to the specification.
.. _639-security-implications:
Security Implications
=====================
This PEP has no foreseen security implications: the ``License-Expression``
field is a plain string and the ``License-File`` fields are file paths.
Neither introduces any known new security concerns.
.. _639-how-to-teach-this:
How to Teach This
=================
The simple cases are simple: a single license identifier is a valid license
expression, and a large majority of packages use a single license.
The plan to teach users of packaging tools how to express their package's
license with a valid license expression is to have tools issue informative
messages when they detect invalid license expressions, or when the deprecated
``License`` field or license classifiers are used.
An immediate, descriptive error message if an invalid ``License-Expression``
is used will help users understand they need to use SPDX identifiers in
this field, and catch them if they make a mistake.
For authors still using the now-deprecated, less precise and more redundant
``License`` field or license classifiers, packaging tools will warn
them and inform them of the modern replacement, ``License-Expression``.
Finally, for users who may have forgotten or may not be aware that they need to do so,
publishing tools will gently guide them toward including ``license``
and ``license-files`` in their project source metadata.
Tools may also help with the conversion and suggest a license expression in
many, if not most common cases:
- The appendix :ref:`639-spec-mapping-classifiers-identifiers` provides
tool authors with recommendations on how to suggest a license expression produced
from legacy classifiers.
- Tools may also be able to infer and suggest how to update
an existing ``License`` value in project source metadata
and convert that to a license expression,
as also :ref:`specified in this PEP <639-spec-converting-metadata>`.
For instance, a tool may suggest converting a value of ``MIT``
in the ``license.text`` key in ``[project]``
(or the equivalent in tool-specific formats)
to a top-level string value of the ``license`` key (or equivalent).
Likewise, a tool could suggest converting from a ``License`` of ``Apache2``
(which is not a valid license expression
as :ref:`defined in this PEP <639-spec-field-license-expression>`)
to a ``License-Expression`` of ``Apache-2.0``
(the equivalent valid license expression using an SPDX license identifier).
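The kind of descriptive message discussed in this section could look roughly
like the sketch below, which uses the standard library's ``difflib`` to
propose a near match such as ``Apache-2.0`` for ``Apache2``. The identifier
list is a tiny illustrative subset, the name ``explain_invalid_license_id`` is
hypothetical, and only a single identifier is handled, not a compound license
expression.

```python
import difflib

# Tiny illustrative subset of SPDX license identifiers; a real tool would
# load the full SPDX license list instead.
KNOWN_IDS = ["Apache-2.0", "BSD-3-Clause", "GPL-3.0-or-later", "MIT"]


def explain_invalid_license_id(value):
    """Return None if ``value`` is a known identifier, else an error message."""
    if value in KNOWN_IDS:
        return None
    message = f"{value!r} is not a recognized SPDX license identifier."
    # Offer at most one suggestion, and only if something is reasonably close.
    close = difflib.get_close_matches(value, KNOWN_IDS, n=1)
    if close:
        message += f" Did you mean {close[0]!r}?"
    return message
```

Consistent with the conversion rules in this PEP, such a suggestion is only
shown to the user; the tool would not apply it without confirmation.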
.. _639-reference-implementation:
Reference Implementation
========================
Tools will need to support parsing and validating license expressions in the
``License-Expression`` field.
The `license-expression library <licenseexplib_>`__ is a reference Python
implementation that handles license expressions including parsing,
formatting and validation, using flexible lists of license symbols
(including SPDX license IDs and any extra identifiers included here).
It is licensed under Apache-2.0 and is already used in several projects,
including the `SPDX Python Tools <spdxpy_>`__,
the `ScanCode toolkit <scancodetk_>`__
and the Free Software Foundation Europe (FSFE) `REUSE project <reuse_>`__.
.. _639-rejected-ideas:
Rejected Ideas
==============
Many alternative ideas were proposed and, after careful consideration,
rejected. The exhaustive list, including the rationale for each rejection,
can be found on a :ref:`separate page <639-rejected-ideas-details>`.
Open Issues
===========
Should the ``License`` field be back-filled, or mutually exclusive?
-------------------------------------------------------------------
At present, this PEP explicitly allows, but does not formally recommend or
require, build tools to back-fill the ``License`` core metadata field with
the verbatim text from the ``License-Expression`` field. This would
presumably improve backwards compatibility and was suggested
by some on the Discourse thread. On the other hand, allowing it does
increase complexity and is less of a clean, consistent separation,
preventing the ``License`` field from being completely mutually exclusive
with the new ``License-Expression`` field and requiring that their values
match.
As such, it would be very useful to have a more concrete and specific
rationale and use cases for the back-filled data, and give fuller
consideration to any potential benefits or drawbacks of this approach,
in order to come to a final consensus on this matter that can be appropriately
justified here.
Therefore, is the status quo expressed here acceptable, allowing tools
leeway to decide this for themselves? Should this PEP formally recommend,
or even require, that tools back-fill this metadata (which would presumably
be reversed once a breaking revision of the metadata spec is issued)?
Or should this not be explicitly allowed, discouraged or even prohibited?
Should custom license identifiers be allowed?
---------------------------------------------
The current version of this PEP retains the behavior of only specifying
the use of SPDX-defined license identifiers, as well as the explicitly defined
custom identifiers ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary``
to handle the two common cases where projects have a license, but it is not
one that has a recognized SPDX license identifier.
For maximum flexibility, custom ``LicenseRef-<CUSTOM-TEXT>`` license
identifiers could be allowed. This could potentially be useful for niche
cases or corporate environments where ``LicenseRef-Proprietary`` is
inappropriate or insufficiently specific, but where relying on mainstream
Python build tooling and the ``License-Expression`` metadata field is still
desirable.
This has the downsides, however, of not catching misspellings of the
canonically defined license identifiers and thus producing license metadata
that does not match what the author intended, as well as users
potentially thinking they have to prepend ``LicenseRef`` to valid
license identifiers, a point on which there already seems to have been some
confusion. Furthermore, this encourages the proliferation of bespoke license
identifiers, which defeats the purpose for which this PEP was created:
enabling clear, unambiguous and well-understood license metadata.
Indeed, projects in niche cases that need specific, proprietary custom licenses
could always simply specify ``LicenseRef-Proprietary``, and then
include the actual license files needed to unambiguously identify the license
regardless (if not using SPDX license identifiers) under the ``License-File``
fields. Requiring standards-conforming tools to allow custom license
identifiers does not seem very useful, since standard tools will not recognize
bespoke ones or know how to treat them. By contrast, bespoke tools, which
would be required in any case to understand and act on custom identifiers,
are explicitly allowed, with good reason (thus the ``SHOULD`` keyword)
to not require that license identifiers conform to those listed here.
Therefore, this specification still allows such use in private corporate
environments or specific ecosystems, while avoiding the disadvantages of
imposing them on all mainstream packaging tools.
As an alternative, a literal ``LicenseRef-Custom`` identifier could be
defined, which would more explicitly indicate that the license cannot be
expressed with defined identifiers and the license text should be referenced
for details, without carrying the negative and potentially inappropriate
implications of ``LicenseRef-Proprietary``. This would avoid the main
mentioned downsides (misspellings, confusion, license proliferation) of
the above approach of allowing an arbitrary ``LicenseRef``, while
addressing several of the potential theoretical scenarios cited for it.
On the other hand, as SPDX aims to (and generally does) encompass all
FSF-recognized "Free" and OSI-approved "Open Source" licenses,
and those sources are kept closely in sync and are now relatively stable,
anything outside those bounds would generally be covered by
``LicenseRef-Proprietary``, thus making ``LicenseRef-Custom`` less specific
in that regard, and somewhat redundant to it. Furthermore, it may mislead
authors of projects with complex/multiple licenses that they should use it
over specifying a license expression.
At present, the PEP retains the existing approach over either of these, given
the use cases and benefits were judged to be sufficiently marginal based
on the current understanding of the packaging landscape. For both these
proposals, however, if more concrete use cases emerge, this can certainly
be reconsidered, either for this current PEP or a future one (before or
in tandem with actually removing the legacy unstructured ``License``
metadata field). Not defining this now enables allowing it later
(or still now, with custom packaging tools), without affecting backward
compatibility, while the same is not so if they are allowed now and later
determined to be unnecessary or too problematic in practice.
Appendices
==========
A list of auxiliary documents is provided:
- Detailed :ref:`Licensing Examples <639-examples>`,
- :ref:`User Scenarios <639-user-scenarios>`,
- :ref:`License Documentation in Python and Other Projects <639-license-doc-python>`,
- :ref:`Mapping License Classifiers to SPDX Identifiers <639-spec-mapping-classifiers-identifiers>`,
- :ref:`Rejected Ideas <639-rejected-ideas-details>` in detail.
References
==========
.. _cc0: https://creativecommons.org/publicdomain/zero/1.0/
.. _cdstats: https://clearlydefined.io/stats
.. _choosealicense: https://choosealicense.com/
.. _classifierissue: https://github.com/pypa/trove-classifiers/issues/17
.. _classifiers: https://pypi.org/classifiers
.. _classifiersrepo: https://github.com/pypa/trove-classifiers
.. _clearlydefined: https://clearlydefined.io
.. _coremetadataspec: https://packaging.python.org/specifications/core-metadata
.. _coremetadataclassifiers: https://packaging.python.org/en/latest/specifications/core-metadata/#classifier-multiple-use
.. _globmodule: https://docs.python.org/3/library/glob.html
.. _hatch: https://hatch.pypa.io/latest/
.. _hatchimplementation: https://discuss.python.org/t/12622/22
.. _installedspec: https://packaging.python.org/specifications/recording-installed-packages/
.. _interopissue: https://github.com/pypa/interoperability-peps/issues/46
.. _licenseexplib: https://github.com/nexB/license-expression/
.. _osi: https://opensource.org
.. _packagingissue: https://github.com/pypa/packaging-problems/issues/41
.. _pep621spec: https://packaging.python.org/specifications/declaring-project-metadata/
.. _pep621specdynamic: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#dynamic
.. _pepissue: https://github.com/pombredanne/spdx-pypi-pep/issues/1
.. _pypi: https://pypi.org/
.. _pypugdistributionpackage: https://packaging.python.org/en/latest/glossary/#term-Distribution-Package
.. _pypugglossary: https://packaging.python.org/glossary/
.. _pypugproject: https://packaging.python.org/en/latest/glossary/#term-Project
.. _reuse: https://reuse.software/
.. _scancodetk: https://github.com/nexB/scancode-toolkit
.. _sdistspec: https://packaging.python.org/specifications/source-distribution-format/
.. _setuptoolsfiles: https://github.com/pypa/setuptools/issues/2739
.. _setuptoolspep639: https://github.com/pypa/setuptools/pull/2645
.. _spdx: https://spdx.dev/
.. _spdxid: https://spdx.dev/ids/
.. _spdxlist: https://spdx.org/licenses/
.. _spdxpression: https://spdx.github.io/spdx-spec/v2.2.2/SPDX-license-expressions/
.. _spdxpy: https://github.com/spdx/tools-python/
.. _spdxversion: https://github.com/pombredanne/spdx-pypi-pep/issues/6
.. _wheelfiles: https://github.com/pypa/wheel/issues/138
.. _wheelproject: https://wheel.readthedocs.io/en/stable/
.. _wheelspec: https://packaging.python.org/specifications/binary-distribution-format/
Acknowledgments
===============
- Alyssa Coghlan
- Kevin P. Fleming
- Pradyun Gedam
- Oleg Grenrus
- Dustin Ingram
- Chris Jerdonek
- Cyril Roelandt
- Luis Villa
Copyright
=========
This document is placed in the public domain or under the
`CC0-1.0-Universal license <cc0_>`__, whichever is more permissive.