From 6adff45cefa58fa483607c62130b5441b9859e0d Mon Sep 17 00:00:00 2001 From: Ofek Lev Date: Wed, 9 Aug 2023 21:30:10 -0400 Subject: [PATCH] PEP 723: Update based on feedback (#3279) Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com> --- pep-0723.rst | 323 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 266 insertions(+), 57 deletions(-) diff --git a/pep-0723.rst b/pep-0723.rst index a3197e8ae..ae36e5ba3 100644 --- a/pep-0723.rst +++ b/pep-0723.rst @@ -73,36 +73,62 @@ begun getting frustrated with the lack of unification regarding both tooling and specs. Adding yet another way to define metadata, even for a currently unsatisfied use case, would further fragment the community. -A use case that this PEP wishes to support that other formats may preclude is -a script that desires to transition to a directory-type project. A user may -be rapidly prototyping locally or in a remote REPL environment and then decide -to transition to a more formal project if their idea works out. This -intermediate script stage would be very useful to have fully reproducible bug -reports. By using the same metadata format, the user can simply copy and paste -the metadata into a ``pyproject.toml`` file and continue working without having -to learn a new format. More likely, even, is that tooling will eventually -support this transformation with a single command. +The following are some of the use cases that this PEP wishes to support: + +* A user facing CLI that is capable of executing scripts. If we take Hatch as + an example, the interface would be simply + ``hatch run /path/to/script.py [args]`` and Hatch will manage the + environment for that script. Such tools could be used as shebang lines on + non-Windows systems e.g. ``#!/usr/bin/env hatch run``. 
You would also be + able to enter a shell into that environment like other projects by doing + ``hatch -p /path/to/script.py shell`` since the project flag would learn + that project metadata could be read from a single file. +* A script that desires to transition to a directory-type project. A user may + be rapidly prototyping locally or in a remote REPL environment and then + decide to transition to a more formal project layout if their idea works + out. This intermediate script stage would be very useful to have fully + reproducible bug reports. By using the same metadata format, the user can + simply copy and paste the metadata into a ``pyproject.toml`` file and + continue working without having to learn a new format. More likely, even, is + that tooling will eventually support this transformation with a single + command. +* Users that wish to avoid manual dependency management. For example, package + managers that have commands to add/remove dependencies or dependency update + automation in CI that triggers based on new versions or in response to + CVEs [1]_. Specification ============= Any Python script may assign a variable named ``__pyproject__`` to a multi-line -*double-quoted* string (``"""``) containing a valid TOML document. The opening -of the string MUST be on the same line as the assignment. The closing of the -string MUST be on a line by itself, and MUST NOT be indented. +*double-quoted* string literal (``"""``) containing a valid TOML document. The +variable MUST start at the beginning of the line and the opening of the string +MUST be on the same line as the assignment. The closing of the string MUST be +on a line by itself, and MUST NOT be indented. + +When there are multiple ``__pyproject__`` variables defined, tools MUST produce +an error. The TOML document MUST NOT contain multi-line double-quoted strings, as that would conflict with the Python string containing the document. Single-quoted multi-line TOML strings may be used instead. 
+This is the canonical regular expression that MUST be used to parse the +metadata: + +.. code:: text + + (?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$ + +In circumstances where there is a discrepancy between the regular expression +and the text specification, the regular expression takes precedence. + Tools reading embedded metadata MAY respect the standard Python encoding declaration. If they choose not to do so, they MUST process the file as UTF-8. -This document MAY include the ``[project]`` and ``[tool]`` tables but MUST NOT -define the ``[build-system]`` table. The ``[build-system]`` table MAY be -allowed in a future PEP that standardizes how backends are to build -distributions from single file scripts. +This document MAY include the ``[project]``, ``[tool]`` and ``[build-system]`` +tables. The ``[project]`` table differs in the following ways: @@ -110,11 +136,15 @@ The ``[project]`` table differs in the following ways: dynamically by tools if the user does not define them * These fields do not need to be listed in the ``dynamic`` array -Non-script running tools MAY choose to read from their expected ``[tool]`` -sub-table. If a single-file script is not the sole input to a tool then -behavior SHOULD NOT be altered based on the embedded metadata. For example, -if a linter is invoked with the path to a directory, it SHOULD behave the same -as if zero files had embedded metadata. +Non-script running tools MAY choose to alter their behavior based on +configuration that is stored in their expected ``[tool]`` sub-table. + +Build frontends SHOULD NOT use the backend defined in the ``[build-system]`` +table to build scripts with embedded metadata. This requires a new PEP because +the current methods defined in :pep:`517` act upon a directory, not a file. +We use ``SHOULD NOT`` instead of ``MUST NOT`` in order to allow tools to +experiment [2]_ with such functionality before we standardize (indeed this +would be a requirement). 
Example ------- @@ -180,15 +210,6 @@ raised in the Rust pre-RFC. Reference Implementation ======================== -This regular expression may be used to parse the metadata: - -.. code:: text - - (?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$ - -In circumstances where there is a discrepancy between the regular expression -and the text specification, the text specification takes precedence. - The following is an example of how to read the metadata on Python 3.11 or higher. @@ -196,9 +217,16 @@ higher. import re, tomllib + REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$' + def read(script: str) -> dict | None: - match = re.search(r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$', script) - return tomllib.loads(match.group(1)) if match else None + matches = list(re.finditer(REGEX, script)) + if len(matches) > 1: + raise ValueError('Multiple __pyproject__ definitions found') + elif len(matches) == 1: + return tomllib.loads(matches[0].group(1)) + else: + return None Often tools will edit dependencies like package managers or dependency update automation in CI. The following is a crude example of modifying the content @@ -218,7 +246,7 @@ using the ``tomlkit`` library. Note that this example used a library that preserves TOML formatting. This is not a requirement for editing by any means but rather is a "nice to have" -especially since there are unlikely to be embedded comments. +feature. Backwards Compatibility @@ -257,8 +285,13 @@ The risk here is part of the functionality of the tool being used to run the script, and as such should already be addressed by the tool itself. The only additional risk introduced by this PEP is if an untrusted script with a embedded metadata is run, when a potentially malicious dependency might be -installed. This risk is addressed by the normal good practice of reviewing code -before running it. +installed. + +This risk is addressed by the normal good practice of reviewing code +before running it. 
Additionally, tools may be able to provide locking +functionality when configured by their ``[tool]`` sub-table to, for example, +add the resolution result as managed metadata somewhere in the script (this +is what Go's ``gorun`` can do). How to Teach This @@ -270,9 +303,13 @@ about metadata itself direct users to the living document that describes `project metadata `_. We will document that the name and version fields in the ``[project]`` table -may be elided for simplicity. Additionally, we will have guidance (perhaps -temporary) explaining that single-file scripts cannot be built into a wheel -and therefore you would never see the associated ``[build-system]`` metadata. +may be elided for simplicity. Additionally, we will have guidance explaining +that single-file scripts cannot (yet) be built into a wheel via standard means. + +We will explain that it is up to individual tools whether or not their behavior +is altered based on the embedded metadata. For example, every script runner may +not be able to provide an environment for specific Python versions as defined +by the ``requires-python`` field. Finally, we may want to list some tools that support this PEP's format. @@ -284,6 +321,47 @@ Tools that support managing different versions of Python should attempt to use the highest available version of Python that is compatible with the script's ``requires-python`` metadata, if defined. +For projects that have large multi-line external metadata to embed like a +README file, it is recommended that they become directories with a +``pyproject.toml`` file. While this is technically allowed, it is strongly +discouraged to have large chunks of multi-line metadata and is indicative +of the fact that a script has graduated to a more traditional layout. + +If the content is small, for example in the case of internal packages, it is +recommended that multi-line *single-quoted* TOML strings (``'''``) be used. +For example: + +.. 
code:: python + + __pyproject__ = """ + [project] + readme.content-type = "text/markdown" + readme.text = ''' + # Some Project + Please refer to our corporate docs + for more information. + ''' + """ + + +Tooling buy-in +============== + +The following is a list of tools that have expressed support for this PEP or +have committed to implementing support should it be accepted: + +* `Pantsbuild and Pex `__: expressed + support for any way to define dependencies and also features that this PEP + considers as valid use cases such as building packages from scripts and + embedding tool configuration +* `Mypy `__ and + `Ruff `__: strongly expressed support + for embedding tool configuration as it would solve existing pain points for + users +* `Hatch `__: (author of this PEP) + expressed support for all aspects of this PEP, and will be one of the first + tools to support running scripts with specifically configured Python versions + Rejected Ideas ============== @@ -317,6 +395,7 @@ the setup is too complex for the average user like when requiring Nvidia drivers. Situations like this would allow users to proceed with what they want to do whereas otherwise they may stop at that point altogether. +.. _723-comment-block: Why not use a comment block resembling requirements.txt? -------------------------------------------------------- @@ -411,23 +490,32 @@ small subset of users. Studio Code would be able to provide TOML syntax highlighting much more easily than each writing custom logic for this feature. -Additionally, the block comment format goes against the recommendation of -:pep:`8`: +Additionally, since the original block comment alternative format went against +the recommendation of :pep:`8` and as a result linters and IDE auto-formatters +that respected the recommendation would +`fail by default `__, the final +proposal uses standard comments starting with a single ``#`` character. 
- Each line of a block comment starts with a ``#`` and a single space (unless - it is indented text inside the comment). [...] Paragraphs inside a block - comment are separated by a line containing a single ``#``. +The concept of regular comments that do not appear to be intended for machines +(i.e. `encoding declarations`__) affecting behavior would not be customary to +users of Python and goes directly against the "explicit is better than +implicit" foundational principle. -Linters and IDE auto-formatters that respect this long-time recommendation -would fail by default. The following uses the example from :pep:`722`: +__ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations -.. code:: bash - - $ flake8 . - .\script.py:3:1: E266 too many leading '#' for block comment - .\script.py:4:1: E266 too many leading '#' for block comment - .\script.py:5:1: E266 too many leading '#' for block comment +Users typing what to them looks like prose could alter runtime behavior. This +PEP takes the view that the possibility of that happening, even when a tool +has been set up as such (maybe by a sysadmin), is unfriendly to users. +Finally, and critically, the alternatives to this PEP like :pep:`722` do not +satisfy the use cases enumerated herein, such as setting the supported Python +versions, the eventual building of scripts into packages, and the ability to +have machines edit metadata on behalf of users. It is very likely that the +requests for such features persist and conceivable that another PEP in the +future would allow for the embedding of such metadata. At that point there +would be multiple ways to achieve the same thing which goes against our +foundational principle of "there should be one - and preferably only one - +obvious way to do it". Why not consider scripts as projects without wheels? ---------------------------------------------------- @@ -443,13 +531,67 @@ pinning e.g. a lock file with some sort of hash checking. 
Such projects would never be distributed as a wheel (except for maybe a transient editable one that is created when doing ``pip install -e .``). -In contrast, scripts are managed loosely by its runner and would almost -always have relaxed dependency constraints. Additionally, to reduce -friction associated with managing small projects there may be a future -in which there is a standard prescribed way to ship projects that are in -the form of a single file. The author of the Rust RFC for embedding metadata +In contrast, scripts are managed loosely by their runners and would almost +always have relaxed dependency constraints. Additionally, there may be a future +in which there is `a standard way <723-limit-build-backend_>`_ to ship projects +that are in the form of a single file. + +.. _723-limit-build-backend: + +Why not limit build backend behavior? +------------------------------------- + +A previous version of this PEP proposed that the ``[build-system]`` table +mustn't be defined. The rationale was that builds would never occur so it +did not make sense to allow this section. + +We removed that limitation based on +`feedback `__ stating that there +are already tools that exist in the wild that build wheels and source +distributions from single files. + +The author of the Rust RFC for embedding metadata `mentioned to us `__ that they are -actively looking into that based on user feedback. +actively looking into that as well based on user feedback saying that there +is unnecessary friction with managing small projects, which we have also +heard in the Python community. + +There has been `a commitment `__ to +support this by at least one major build system. + +Why not limit tool behavior? +---------------------------- + +A previous version of this PEP proposed that non-script running tools SHOULD +NOT modify their behavior when the script is not the sole input to the tool. 
+For example, if a linter is invoked with the path to a directory, it SHOULD +behave the same as if zero files had embedded metadata. + +This was done as a precaution to avoid tool behavior confusion and generating +various feature requests for tools to support this PEP. However, during +discussion we received `feedback `__ +from maintainers of tools that this would be undesirable and potentially +confusing to users. Additionally, this may allow for a universally easier +way to configure tools in certain circumstances and solve existing issues. + +Why not accept all valid Python expression syntax? +-------------------------------------------------- + +There has been a suggestion that we should not restrict how the +``__pyproject__`` variable is defined and we should parse the abstract syntax +tree. For example: + +.. code:: python + + __pyproject__ = ( + """ + [project] + dependencies = [] + """ + ) + +We will not be doing this so that every language has the possibility to read +the metadata without dependence on knowledge of every version of Python. Why not just set up a Python project with a ``pyproject.toml``? --------------------------------------------------------------- @@ -472,6 +614,61 @@ suggestion until the `current discussion on Discourse won't be distributed as wheels is resolved. And even then, it doesn't address the "sending someone a script in a gist or email" use case. +Why not infer the requirements from import statements? +------------------------------------------------------ + +The idea would be to automatically recognize ``import`` statements in the source +file and turn them into a list of requirements. + +However, this is infeasible for several reasons. First, the points above about +the necessity to keep the syntax easily parsable, for all Python versions, also +by tools written in other languages, apply equally here. 
+ +Second, PyPI and other package repositories conforming to the Simple Repository +API do not provide a mechanism to resolve package names from the module names +that are imported (see also `this related discussion`__). + +__ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494 + +Third, even if repositories did offer this information, the same import name may +correspond to several packages on PyPI. One might object that disambiguating +which package is wanted would only be needed if there are several projects +providing the same import name. However, this would make it easy for anyone to +unintentionally or malevolently break working scripts, by uploading a package to +PyPI providing an import name that is the same as an existing project. The +alternative where, among the candidates, the first package to have been +registered on the index is chosen, would be confusing in case a popular package +is developed with the same import name as an existing obscure package, and even +harmful if the existing package is malware intentionally uploaded with a +sufficiently generic import name that has a high probability of being reused. + +A related idea would be to attach the requirements as comments to the import +statements instead of gathering them in a block, with a syntax such as:: + + import numpy as np # requires: numpy + import rich # requires: rich + +This still suffers from parsing difficulties. Also, where to place the comment +in the case of multiline imports is ambiguous and may look ugly:: + + from PyQt5.QtWidgets import ( + QCheckBox, QComboBox, QDialog, QDialogButtonBox, + QGridLayout, QLabel, QSpinBox, QTextEdit + ) # requires: PyQt5 + +Furthermore, this syntax cannot behave as might be intuitively expected +in all situations. 
Consider:: + + import platform + if platform.system() == "Windows": + import pywin32 # requires: pywin32 + +Here, the user's intent is that the package is only required on Windows, but +this cannot be understood by the script runner (the correct way to write +it would be ``requires: pywin32 ; sys_platform == 'win32'``). + +(Thanks to Jean Abou-Samra for the clear discussion of this point) + Why not use a requirements file for dependencies? ------------------------------------------------- @@ -574,6 +771,18 @@ References .. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684 +Footnotes +========= + +.. [1] A large number of users use scripts that are version controlled. For + example, `the SREs that were mentioned <723-comment-block_>`_ or + projects that require special maintenance like the + `AWS CLI `__ + or `Calibre `__. +.. [2] For example, projects like Hatch and Poetry have their own backends + and may wish to support this use case only when their backend is used. + + Copyright =========