PEP 723: Update based on feedback (#3279)

Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com>
Ofek Lev 2023-08-09 21:30:10 -04:00 committed by GitHub
parent dc8c5c47a9
commit 6adff45cef
1 changed files with 266 additions and 57 deletions


@ -73,36 +73,62 @@ begun getting frustrated with the lack of unification regarding both tooling
and specs. Adding yet another way to define metadata, even for a currently
unsatisfied use case, would further fragment the community.
A use case that this PEP wishes to support that other formats may preclude is The following are some of the use cases that this PEP wishes to support:
a script that desires to transition to a directory-type project. A user may
be rapidly prototyping locally or in a remote REPL environment and then decide * A user facing CLI that is capable of executing scripts. If we take Hatch as
to transition to a more formal project if their idea works out. This an example, the interface would be simply
intermediate script stage would be very useful to have fully reproducible bug ``hatch run /path/to/script.py [args]`` and Hatch will manage the
reports. By using the same metadata format, the user can simply copy and paste environment for that script. Such tools could be used as shebang lines on
the metadata into a ``pyproject.toml`` file and continue working without having non-Windows systems e.g. ``#!/usr/bin/env hatch run``. You would also be
to learn a new format. More likely, even, is that tooling will eventually able to enter a shell into that environment like other projects by doing
support this transformation with a single command. ``hatch -p /path/to/script.py shell`` since the project flag would learn
that project metadata could be read from a single file.
* A script that desires to transition to a directory-type project. A user may
be rapidly prototyping locally or in a remote REPL environment and then
decide to transition to a more formal project layout if their idea works
out. This intermediate script stage would be very useful to have fully
reproducible bug reports. By using the same metadata format, the user can
simply copy and paste the metadata into a ``pyproject.toml`` file and
continue working without having to learn a new format. More likely, even, is
that tooling will eventually support this transformation with a single
command.
* Users that wish to avoid manual dependency management. For example, package
managers that have commands to add/remove dependencies or dependency update
automation in CI that triggers based on new versions or in response to
CVEs [1]_.
Specification
=============
Any Python script may assign a variable named ``__pyproject__`` to a multi-line
*double-quoted* string (``"""``) containing a valid TOML document. The opening *double-quoted* string literal (``"""``) containing a valid TOML document. The
of the string MUST be on the same line as the assignment. The closing of the variable MUST start at the beginning of the line and the opening of the string
string MUST be on a line by itself, and MUST NOT be indented. MUST be on the same line as the assignment. The closing of the string MUST be
on a line by itself, and MUST NOT be indented.
When there are multiple ``__pyproject__`` variables defined, tools MUST produce
an error.
The TOML document MUST NOT contain multi-line double-quoted strings, as that
would conflict with the Python string containing the document. Single-quoted
multi-line TOML strings may be used instead.
This is the canonical regular expression that MUST be used to parse the
metadata:
.. code:: text
(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$
In circumstances where there is a discrepancy between the regular expression
and the text specification, the regular expression takes precedence.
Tools reading embedded metadata MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as UTF-8.
This document MAY include the ``[project]`` and ``[tool]`` tables but MUST NOT This document MAY include the ``[project]``, ``[tool]`` and ``[build-system]``
define the ``[build-system]`` table. The ``[build-system]`` table MAY be tables.
allowed in a future PEP that standardizes how backends are to build
distributions from single file scripts.
The ``[project]`` table differs in the following ways:
@ -110,11 +136,15 @@ The ``[project]`` table differs in the following ways:
dynamically by tools if the user does not define them
* These fields do not need to be listed in the ``dynamic`` array
Non-script running tools MAY choose to read from their expected ``[tool]`` Non-script running tools MAY choose to alter their behavior based on
sub-table. If a single-file script is not the sole input to a tool then configuration that is stored in their expected ``[tool]`` sub-table.
behavior SHOULD NOT be altered based on the embedded metadata. For example,
if a linter is invoked with the path to a directory, it SHOULD behave the same Build frontends SHOULD NOT use the backend defined in the ``[build-system]``
as if zero files had embedded metadata. table to build scripts with embedded metadata. This requires a new PEP because
the current methods defined in :pep:`517` act upon a directory, not a file.
We use ``SHOULD NOT`` instead of ``MUST NOT`` in order to allow tools to
experiment [2]_ with such functionality before we standardize (indeed this
would be a requirement).
Example
-------
@ -180,15 +210,6 @@ raised in the Rust pre-RFC.
Reference Implementation
========================
This regular expression may be used to parse the metadata:
.. code:: text
(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$
In circumstances where there is a discrepancy between the regular expression
and the text specification, the text specification takes precedence.
The following is an example of how to read the metadata on Python 3.11 or
higher.
@ -196,9 +217,16 @@ higher.
import re, tomllib
REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'
def read(script: str) -> dict | None:
match = re.search(r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$', script) matches = list(re.finditer(REGEX, script))
return tomllib.loads(match.group(1)) if match else None if len(matches) > 1:
raise ValueError('Multiple __pyproject__ definitions found')
elif len(matches) == 1:
return tomllib.loads(matches[0].group(1))
else:
return None
Often tools will edit dependencies like package managers or dependency update
automation in CI. The following is a crude example of modifying the content
@ -218,7 +246,7 @@ using the ``tomlkit`` library.
Note that this example used a library that preserves TOML formatting. This is
not a requirement for editing by any means but rather is a "nice to have"
especially since there are unlikely to be embedded comments. feature.
Backwards Compatibility
@ -257,8 +285,13 @@ The risk here is part of the functionality of the tool being used to run the
script, and as such should already be addressed by the tool itself. The only
additional risk introduced by this PEP is if an untrusted script with
embedded metadata is run, when a potentially malicious dependency might be
installed. This risk is addressed by the normal good practice of reviewing code installed.
before running it.
This risk is addressed by the normal good practice of reviewing code
before running it. Additionally, tools may be able to provide locking
functionality when configured by their ``[tool]`` sub-table to, for example,
add the resolution result as managed metadata somewhere in the script (this
is what Go's ``gorun`` can do).
How to Teach This
@ -270,9 +303,13 @@ about metadata itself direct users to the living document that describes
`project metadata <pyproject metadata_>`_.
We will document that the name and version fields in the ``[project]`` table
may be elided for simplicity. Additionally, we will have guidance (perhaps may be elided for simplicity. Additionally, we will have guidance explaining
temporary) explaining that single-file scripts cannot be built into a wheel that single-file scripts cannot (yet) be built into a wheel via standard means.
and therefore you would never see the associated ``[build-system]`` metadata.
We will explain that it is up to individual tools whether or not their behavior
is altered based on the embedded metadata. For example, every script runner may
not be able to provide an environment for specific Python versions as defined
by the ``requires-python`` field.
Finally, we may want to list some tools that support this PEP's format.
@ -284,6 +321,47 @@ Tools that support managing different versions of Python should attempt to use
the highest available version of Python that is compatible with the script's
``requires-python`` metadata, if defined.
For projects that have large multi-line external metadata to embed like a
README file, it is recommended that they become directories with a
``pyproject.toml`` file. While this is technically allowed, it is strongly
discouraged to have large chunks of multi-line metadata and is indicative
of the fact that a script has graduated to a more traditional layout.
If the content is small, for example in the case of internal packages, it is
recommended that multi-line *single-quoted* TOML strings (``'''``) be used.
For example:
.. code:: python
__pyproject__ = """
[project]
readme.content-type = "text/markdown"
readme.text = '''
# Some Project
Please refer to our corporate docs
for more information.
'''
"""
Tooling buy-in
==============
The following is a list of tools that have expressed support for this PEP or
have committed to implementing support should it be accepted:
* `Pantsbuild and Pex <https://discuss.python.org/t/31151/15>`__: expressed
support for any way to define dependencies and also features that this PEP
considers as valid use cases such as building packages from scripts and
embedding tool configuration
* `Mypy <https://discuss.python.org/t/31151/16>`__ and
`Ruff <https://discuss.python.org/t/31151/42>`__: strongly expressed support
for embedding tool configuration as it would solve existing pain points for
users
* `Hatch <https://discuss.python.org/t/31151/53>`__: (author of this PEP)
expressed support for all aspects of this PEP, and will be one of the first
tools to support running scripts with specifically configured Python versions
Rejected Ideas
==============
@ -317,6 +395,7 @@ the setup is too complex for the average user like when requiring Nvidia
drivers. Situations like this would allow users to proceed with what they want
to do whereas otherwise they may stop at that point altogether.
.. _723-comment-block:
Why not use a comment block resembling requirements.txt?
--------------------------------------------------------
@ -411,23 +490,32 @@ small subset of users.
Studio Code would be able to provide TOML syntax highlighting much more
easily than each writing custom logic for this feature.
Additionally, the block comment format goes against the recommendation of Additionally, since the original block comment alternative format went against
:pep:`8`: the recommendation of :pep:`8` and as a result linters and IDE auto-formatters
that respected the recommendation would
`fail by default <https://discuss.python.org/t/29905/247>`__, the final
proposal uses standard comments starting with a single ``#`` character.
Each line of a block comment starts with a ``#`` and a single space (unless The concept of regular comments that do not appear to be intended for machines
it is indented text inside the comment). [...] Paragraphs inside a block (i.e. `encoding declarations`__) affecting behavior would not be customary to
comment are separated by a line containing a single ``#``. users of Python and goes directly against the "explicit is better than
implicit" foundational principle.
Linters and IDE auto-formatters that respect this long-time recommendation __ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
would fail by default. The following uses the example from :pep:`722`:
.. code:: bash Users typing what to them looks like prose could alter runtime behavior. This
PEP takes the view that the possibility of that happening, even when a tool
$ flake8 . has been set up as such (maybe by a sysadmin), is unfriendly to users.
.\script.py:3:1: E266 too many leading '#' for block comment
.\script.py:4:1: E266 too many leading '#' for block comment
.\script.py:5:1: E266 too many leading '#' for block comment
Finally, and critically, the alternatives to this PEP like :pep:`722` do not
satisfy the use cases enumerated herein, such as setting the supported Python
versions, the eventual building of scripts into packages, and the ability to
have machines edit metadata on behalf of users. It is very likely that the
requests for such features persist and conceivable that another PEP in the
future would allow for the embedding of such metadata. At that point there
would be multiple ways to achieve the same thing which goes against our
foundational principle of "there should be one - and preferably only one -
obvious way to do it".
Why not consider scripts as projects without wheels?
----------------------------------------------------
@ -443,13 +531,67 @@ pinning e.g. a lock file with some sort of hash checking. Such projects would
never be distributed as a wheel (except for maybe a transient editable one
that is created when doing ``pip install -e .``).
In contrast, scripts are managed loosely by its runner and would almost In contrast, scripts are managed loosely by their runners and would almost
always have relaxed dependency constraints. Additionally, to reduce always have relaxed dependency constraints. Additionally, there may be a future
friction associated with managing small projects there may be a future in which there is `a standard way <723-limit-build-backend_>`_ to ship projects
in which there is a standard prescribed way to ship projects that are in that are in the form of a single file.
the form of a single file. The author of the Rust RFC for embedding metadata
.. _723-limit-build-backend:
Why not limit build backend behavior?
-------------------------------------
A previous version of this PEP proposed that the ``[build-system]`` table
mustn't be defined. The rationale was that builds would never occur so it
did not make sense to allow this section.
We removed that limitation based on
`feedback <https://discuss.python.org/t/31151/9>`__ stating that there
are already tools that exist in the wild that build wheels and source
distributions from single files.
The author of the Rust RFC for embedding metadata
`mentioned to us <https://discuss.python.org/t/29905/179>`__ that they are
actively looking into that based on user feedback. actively looking into that as well based on user feedback saying that there
is unnecessary friction with managing small projects, which we have also
heard in the Python community.
There has been `a commitment <https://discuss.python.org/t/31151/15>`__ to
support this by at least one major build system.
Why not limit tool behavior?
----------------------------
A previous version of this PEP proposed that non-script running tools SHOULD
NOT modify their behavior when the script is not the sole input to the tool.
For example, if a linter is invoked with the path to a directory, it SHOULD
behave the same as if zero files had embedded metadata.
This was done as a precaution to avoid tool behavior confusion and generating
various feature requests for tools to support this PEP. However, during
discussion we received `feedback <https://discuss.python.org/t/31151/16>`__
from maintainers of tools that this would be undesirable and potentially
confusing to users. Additionally, this may allow for a universally easier
way to configure tools in certain circumstances and solve existing issues.
Why not accept all valid Python expression syntax?
--------------------------------------------------
There has been a suggestion that we should not restrict how the
``__pyproject__`` variable is defined and we should parse the abstract syntax
tree. For example:
.. code:: python
__pyproject__ = (
"""
[project]
dependencies = []
"""
)
We will not be doing this so that every language has the possibility to read
the metadata without dependence on knowledge of every version of Python.
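To make the trade-off concrete, the following sketch shows that the canonical expression from the specification does not match the parenthesized form, so a reader in any language only needs a regular expression engine and not a Python parser. The constant ``PARENTHESIZED`` is a hypothetical rendering of the example above.

```python
import re

# Canonical expression from the specification.
REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'

# Hypothetical script text using the parenthesized assignment.
PARENTHESIZED = (
    '__pyproject__ = (\n'
    '    """\n'
    '    [project]\n'
    '    dependencies = []\n'
    '    """\n'
    ')\n'
)

# The opening quotes are not on the assignment line, so nothing matches.
parenthesized_match = re.search(REGEX, PARENTHESIZED)
```

Here ``parenthesized_match`` is ``None``; a conforming tool would treat such a script as having no embedded metadata.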
Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------
@ -472,6 +614,61 @@ suggestion until the `current discussion on Discourse
won't be distributed as wheels is resolved. And even then, it doesn't address
the "sending someone a script in a gist or email" use case.
Why not infer the requirements from import statements?
------------------------------------------------------
The idea would be to automatically recognize ``import`` statements in the source
file and turn them into a list of requirements.
However, this is infeasible for several reasons. First, the points above about
the necessity to keep the syntax easily parsable, for all Python versions, also
by tools written in other languages, apply equally here.
Second, PyPI and other package repositories conforming to the Simple Repository
API do not provide a mechanism to resolve package names from the module names
that are imported (see also `this related discussion`__).
__ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
Third, even if repositories did offer this information, the same import name may
correspond to several packages on PyPI. One might object that disambiguating
which package is wanted would only be needed if there are several projects
providing the same import name. However, this would make it easy for anyone to
unintentionally or malevolently break working scripts, by uploading a package to
PyPI providing an import name that is the same as an existing project. The
alternative where, among the candidates, the first package to have been
registered on the index is chosen, would be confusing in case a popular package
is developed with the same import name as an existing obscure package, and even
harmful if the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being reused.
A related idea would be to attach the requirements as comments to the import
statements instead of gathering them in a block, with a syntax such as::
import numpy as np # requires: numpy
import rich # requires: rich
This still suffers from parsing difficulties. Also, where to place the comment
in the case of multiline imports is ambiguous and may look ugly::
from PyQt5.QtWidgets import (
QCheckBox, QComboBox, QDialog, QDialogButtonBox,
QGridLayout, QLabel, QSpinBox, QTextEdit
) # requires: PyQt5
Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider::
import platform
if platform.system() == "Windows":
import pywin32 # requires: pywin32
Here, the user's intent is that the package is only required on Windows, but
this cannot be understood by the script runner (the correct way to write
it would be ``requires: pywin32 ; sys_platform == 'win32'``).
(Thanks to Jean Abou-Samra for the clear discussion of this point)
Why not use a requirements file for dependencies?
-------------------------------------------------
@ -574,6 +771,18 @@ References
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
Footnotes
=========
.. [1] A large number of users use scripts that are version controlled. For
example, `the SREs that were mentioned <723-comment-block_>`_ or
projects that require special maintenance like the
`AWS CLI <https://github.com/aws/aws-cli/tree/4393dcdf044a5275000c9c193d1933c07a08fdf1/scripts>`__
or `Calibre <https://github.com/kovidgoyal/calibre/tree/master/setup>`__.
.. [2] For example, projects like Hatch and Poetry have their own backends
and may wish to support this use case only when their backend is used.
Copyright
=========