PEP 723: Update based on feedback (#3279)

Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com>
commit 6adff45cef (parent dc8c5c47a9)
Date: 2023-08-09 21:30:10 -04:00
1 changed file with 266 additions and 57 deletions
--- a/pep-0723.rst
+++ b/pep-0723.rst
@@ -73,36 +73,62 @@ begun getting frustrated with the lack of unification regarding both tooling
 and specs. Adding yet another way to define metadata, even for a currently
 unsatisfied use case, would further fragment the community.
-A use case that this PEP wishes to support that other formats may preclude is
+The following are some of the use cases that this PEP wishes to support:
-a script that desires to transition to a directory-type project. A user may
+
-be rapidly prototyping locally or in a remote REPL environment and then decide
+* A user-facing CLI that is capable of executing scripts. If we take Hatch as
-to transition to a more formal project if their idea works out. This
+  an example, the interface would be simply
-intermediate script stage would be very useful to have fully reproducible bug
+  ``hatch run /path/to/script.py [args]`` and Hatch will manage the
-reports. By using the same metadata format, the user can simply copy and paste
+  environment for that script. Such tools could be used as shebang lines on
-the metadata into a ``pyproject.toml`` file and continue working without having
+  non-Windows systems, e.g. ``#!/usr/bin/env -S hatch run``. You would also be
-to learn a new format. More likely, even, is that tooling will eventually
+  able to enter a shell into that environment like other projects by doing
-support this transformation with a single command.
+  ``hatch -p /path/to/script.py shell`` since the project flag would learn
  that project metadata could be read from a single file.
 * A script that desires to transition to a directory-type project. A user may
  be rapidly prototyping locally or in a remote REPL environment and then
  decide to transition to a more formal project layout if their idea works
  out. This intermediate script stage would be very useful to have fully
  reproducible bug reports. By using the same metadata format, the user can
  simply copy and paste the metadata into a ``pyproject.toml`` file and
  continue working without having to learn a new format. More likely, even, is
  that tooling will eventually support this transformation with a single
  command.
 * Users that wish to avoid manual dependency management. For example, package
  managers that have commands to add/remove dependencies or dependency update
  automation in CI that triggers based on new versions or in response to
  CVEs [1]_.
 Specification
 =============
 Any Python script may assign a variable named ``__pyproject__`` to a multi-line
-*double-quoted* string (``"""``) containing a valid TOML document. The opening
+*double-quoted* string literal (``"""``) containing a valid TOML document. The
-of the string MUST be on the same line as the assignment. The closing of the
+variable MUST start at the beginning of the line and the opening of the string
-string MUST be on a line by itself, and MUST NOT be indented.
+MUST be on the same line as the assignment. The closing of the string MUST be
 on a line by itself, and MUST NOT be indented.
 When there are multiple ``__pyproject__`` variables defined, tools MUST produce
 an error.
 The TOML document MUST NOT contain multi-line double-quoted strings, as that
 would conflict with the Python string containing the document. Single-quoted
 multi-line TOML strings may be used instead.
 This is the canonical regular expression that MUST be used to parse the
 metadata:
 .. code:: text
    (?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$
 In circumstances where there is a discrepancy between the regular expression
 and the text specification, the regular expression takes precedence.
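The placement rules above can be checked against the canonical expression directly. The following is a minimal sketch using only the standard library ``re`` module; ``extract`` and the sample scripts are hypothetical and only illustrate which layouts the expression accepts.

.. code:: python

   import re

   # Canonical expression from the specification above.
   REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'


   def extract(script: str) -> str | None:
       """Return the raw TOML text of the first embedded block, or None."""
       match = re.search(REGEX, script)
       return match.group(1) if match else None


   # Conforming: assignment at column 0, closing quotes alone and unindented.
   conforming = '__pyproject__ = """\n[project]\ndependencies = ["rich"]\n"""\n'
   # Valid Python, but the closing quotes are indented, so nothing matches.
   indented_close = '__pyproject__ = """\n[project]\n    """\n'
   # Valid Python, but the assignment does not start at the beginning of a line.
   indented_assignment = 'if True:\n    __pyproject__ = """\n[project]\n"""\n'

   print(repr(extract(conforming)))  # '\n[project]\ndependencies = ["rich"]\n'
   print(extract(indented_close))  # None
   print(extract(indented_assignment))  # None

Note that the captured text keeps the newline after the opening quotes; TOML parsers ignore it.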
 Tools reading embedded metadata MAY respect the standard Python encoding
 declaration. If they choose not to do so, they MUST process the file as UTF-8.
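A tool that chooses to respect the encoding declaration does not need its own parser for it: the standard library's ``tokenize.detect_encoding`` implements the :pep:`263` rules and falls back to UTF-8. A minimal sketch, where ``decode_script`` is a hypothetical helper name:

.. code:: python

   import io
   import tokenize


   def decode_script(data: bytes) -> str:
       """Decode raw script bytes, honoring a PEP 263 encoding declaration."""
       # detect_encoding() looks at no more than the first two lines (and a
       # UTF-8 BOM) and returns "utf-8" when no declaration is present.
       encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
       return data.decode(encoding)


   latin1 = '# -*- coding: latin-1 -*-\n__pyproject__ = """\n[project]\ndescription = "café"\n"""\n'
   print(decode_script(latin1.encode("latin-1")).splitlines()[3])  # description = "café"

A tool that does not respect the declaration would simply call ``data.decode("utf-8")`` instead.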
-This document MAY include the ``[project]`` and ``[tool]`` tables but MUST NOT
+This document MAY include the ``[project]``, ``[tool]`` and ``[build-system]``
-define the ``[build-system]`` table. The ``[build-system]`` table MAY be
+tables.
-allowed in a future PEP that standardizes how backends are to build
-distributions from single file scripts.
 The ``[project]`` table differs in the following ways:
@@ -110,11 +136,15 @@ The ``[project]`` table differs in the following ways:
  dynamically by tools if the user does not define them
 * These fields do not need to be listed in the ``dynamic`` array
-Non-script running tools MAY choose to read from their expected ``[tool]``
+Non-script running tools MAY choose to alter their behavior based on
-sub-table. If a single-file script is not the sole input to a tool then
+configuration that is stored in their expected ``[tool]`` sub-table.
-behavior SHOULD NOT be altered based on the embedded metadata. For example,
+
-if a linter is invoked with the path to a directory, it SHOULD behave the same
+Build frontends SHOULD NOT use the backend defined in the ``[build-system]``
-as if zero files had embedded metadata.
+table to build scripts with embedded metadata. This requires a new PEP because
 the current methods defined in :pep:`517` act upon a directory, not a file.
 We use ``SHOULD NOT`` instead of ``MUST NOT`` in order to allow tools to
 experiment [2]_ with such functionality before we standardize (indeed this
 would be a requirement).
 Example
 -------
@@ -180,15 +210,6 @@ raised in the Rust pre-RFC.
 Reference Implementation
 ========================
-This regular expression may be used to parse the metadata:
-.. code:: text
-   (?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$
-In circumstances where there is a discrepancy between the regular expression
-and the text specification, the text specification takes precedence.
 The following is an example of how to read the metadata on Python 3.11 or
 higher.
@@ -196,9 +217,16 @@ higher.
    import re, tomllib
    REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'
    def read(script: str) -> dict | None:
-        match = re.search(r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$', script)
+        matches = list(re.finditer(REGEX, script))
-        return tomllib.loads(match.group(1)) if match else None
+        if len(matches) > 1:
            raise ValueError('Multiple __pyproject__ definitions found')
        elif len(matches) == 1:
            return tomllib.loads(matches[0].group(1))
        else:
            return None
 Tools such as package managers or dependency update automation in CI will often
 edit dependencies. The following is a crude example of modifying the content
@@ -218,7 +246,7 @@ using the ``tomlkit`` library.
 Note that this example used a library that preserves TOML formatting. This is
 not a requirement for editing by any means but rather is a "nice to have"
-especially since there are unlikely to be embedded comments.
+feature.
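Without a formatting-preserving library such as ``tomlkit``, a tool can still rewrite the metadata by replacing the whole embedded document at the span the canonical expression reports. The sketch below assumes the caller has already serialized the new TOML text (the standard library's ``tomllib`` can only read TOML); ``replace_metadata`` is a hypothetical name.

.. code:: python

   import re

   REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'


   def replace_metadata(script: str, new_toml: str) -> str:
       """Return ``script`` with its embedded TOML document replaced."""
       matches = list(re.finditer(REGEX, script))
       if len(matches) != 1:
           raise ValueError(f"Expected one __pyproject__ block, found {len(matches)}")
       # Group 1 spans everything after the opening quotes (and optional
       # backslash) up to the closing quotes, including the leading newline.
       start, end = matches[0].span(1)
       body = "\n" + new_toml.strip("\n") + "\n"
       return script[:start] + body + script[end:]


   script = 'import rich\n\n__pyproject__ = """\n[project]\ndependencies = ["rich"]\n"""\n'
   print(replace_metadata(script, '[project]\ndependencies = ["rich", "httpx"]\n'))

Any comments inside the old document are lost, which is exactly the trade-off the note above describes.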
 Backwards Compatibility
@@ -257,8 +285,13 @@ The risk here is part of the functionality of the tool being used to run the
 script, and as such should already be addressed by the tool itself. The only
 additional risk introduced by this PEP is if an untrusted script with
 embedded metadata is run, in which case a potentially malicious dependency might be
-installed. This risk is addressed by the normal good practice of reviewing code
+installed.
-before running it.
+
 This risk is addressed by the normal good practice of reviewing code
 before running it. Additionally, tools may be able to provide locking
 functionality when configured by their ``[tool]`` sub-table to, for example,
 add the resolution result as managed metadata somewhere in the script (this
 is what Go's ``gorun`` can do).
 How to Teach This
@@ -270,9 +303,13 @@ about metadata itself direct users to the living document that describes
 `project metadata <pyproject metadata_>`_.
 We will document that the name and version fields in the ``[project]`` table
-may be elided for simplicity. Additionally, we will have guidance (perhaps
+may be elided for simplicity. Additionally, we will have guidance explaining
-temporary) explaining that single-file scripts cannot be built into a wheel
+that single-file scripts cannot (yet) be built into a wheel via standard means.
-and therefore you would never see the associated ``[build-system]`` metadata.
+
 We will explain that it is up to individual tools whether or not their behavior
 is altered based on the embedded metadata. For example, not every script runner
 may be able to provide an environment for specific Python versions as defined
 by the ``requires-python`` field.
 Finally, we may want to list some tools that support this PEP's format.
@@ -284,6 +321,47 @@ Tools that support managing different versions of Python should attempt to use
 the highest available version of Python that is compatible with the script's
 ``requires-python`` metadata, if defined.
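Selecting an interpreter could look roughly like the following. This is a sketch only: it supports a small, hypothetical subset of :pep:`440` specifiers (comma-separated ``>=``, ``>``, ``<=``, ``<``, ``==`` and ``!=`` clauses with numeric versions, no wildcards or ``~=``); a real tool would more likely rely on a complete implementation such as the ``packaging`` library. ``satisfies`` and ``pick_python`` are hypothetical names.

.. code:: python

   import operator

   # Hypothetical subset of PEP 440: comma-separated clauses such as ">=3.9".
   # Two-character operators are listed first so ">=" is not read as ">".
   OPERATORS = [
       (">=", operator.ge),
       ("<=", operator.le),
       ("==", operator.eq),
       ("!=", operator.ne),
       (">", operator.gt),
       ("<", operator.lt),
   ]


   def _padded(a: tuple, b: tuple) -> tuple[tuple, tuple]:
       """Zero-pad two version tuples to equal length, so 3.11 == 3.11.0."""
       width = max(len(a), len(b))
       return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


   def satisfies(version: tuple, requires_python: str) -> bool:
       for clause in requires_python.split(","):
           clause = clause.strip()
           for symbol, compare in OPERATORS:
               if clause.startswith(symbol):
                   bound = tuple(int(p) for p in clause[len(symbol):].split("."))
                   if not compare(*_padded(version, bound)):
                       return False
                   break
           else:
               raise ValueError(f"Unsupported specifier clause: {clause!r}")
       return True


   def pick_python(available: list, requires_python: str | None = None):
       """Return the highest available version compatible with requires-python."""
       compatible = [
           v for v in available
           if requires_python is None or satisfies(v, requires_python)
       ]
       return max(compatible) if compatible else None


   print(pick_python([(3, 8), (3, 11), (3, 12)], ">=3.9,<3.12"))  # (3, 11)

When ``requires-python`` is absent, the sketch falls back to the highest available version; when nothing is compatible it returns ``None`` so the caller can report an error.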
 For projects that have large multi-line external metadata to embed like a
 README file, it is recommended that they become directories with a
 ``pyproject.toml`` file. Although embedding such content is technically allowed,
 large chunks of multi-line metadata are strongly discouraged, as they indicate
 that a script has graduated to a more traditional layout.
 If the content is small, for example in the case of internal packages, it is
 recommended that multi-line *single-quoted* TOML strings (``'''``) be used.
 For example:
 .. code:: python
    __pyproject__ = """
    [project]
    readme.content-type = "text/markdown"
    readme.text = '''
    # Some Project
    Please refer to our corporate docs
    for more information.
    '''
    """
 Tooling buy-in
 ==============
 The following is a list of tools that have expressed support for this PEP or
 have committed to implementing support should it be accepted:
 * `Pantsbuild and Pex <https://discuss.python.org/t/31151/15>`__: expressed
  support for any way to define dependencies and also features that this PEP
  considers as valid use cases such as building packages from scripts and
  embedding tool configuration
 * `Mypy <https://discuss.python.org/t/31151/16>`__ and
  `Ruff <https://discuss.python.org/t/31151/42>`__: strongly expressed support
  for embedding tool configuration as it would solve existing pain points for
  users
 * `Hatch <https://discuss.python.org/t/31151/53>`__: (author of this PEP)
  expressed support for all aspects of this PEP, and will be one of the first
  tools to support running scripts with specifically configured Python versions
 Rejected Ideas
 ==============
@@ -317,6 +395,7 @@ the setup is too complex for the average user like when requiring Nvidia
 drivers. Situations like this would allow users to proceed with what they want
 to do whereas otherwise they may stop at that point altogether.
 .. _723-comment-block:
 Why not use a comment block resembling requirements.txt?
 --------------------------------------------------------
@@ -411,23 +490,32 @@ small subset of users.
  Studio Code would be able to provide TOML syntax highlighting much more
  easily than each writing custom logic for this feature.
-Additionally, the block comment format goes against the recommendation of
+Additionally, since the original block comment alternative format went against
-:pep:`8`:
+the recommendation of :pep:`8` and as a result linters and IDE auto-formatters
 that respected the recommendation would
 `fail by default <https://discuss.python.org/t/29905/247>`__, the final
 proposal uses standard comments starting with a single ``#`` character.
-    Each line of a block comment starts with a ``#`` and a single space (unless
+The concept of regular comments that do not appear to be intended for machines
-    it is indented text inside the comment). [...] Paragraphs inside a block
+(unlike `encoding declarations`__) affecting behavior would be unfamiliar to
-    comment are separated by a line containing a single ``#``.
+users of Python and would go directly against the "explicit is better than
 implicit" foundational principle.
-Linters and IDE auto-formatters that respect this long-time recommendation
+__ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
-would fail by default. The following uses the example from :pep:`722`:
-.. code:: bash
+Users typing what to them looks like prose could alter runtime behavior. This
-
+PEP takes the view that the possibility of that happening, even when a tool
-    $ flake8 .
+has been set up as such (maybe by a sysadmin), is unfriendly to users.
-    .\script.py:3:1: E266 too many leading '#' for block comment
-    .\script.py:4:1: E266 too many leading '#' for block comment
-    .\script.py:5:1: E266 too many leading '#' for block comment
 Finally, and critically, the alternatives to this PEP like :pep:`722` do not
 satisfy the use cases enumerated herein, such as setting the supported Python
 versions, the eventual building of scripts into packages, and the ability to
 have machines edit metadata on behalf of users. It is very likely that the
 requests for such features persist and conceivable that another PEP in the
 future would allow for the embedding of such metadata. At that point there
 would be multiple ways to achieve the same thing which goes against our
 foundational principle of "there should be one - and preferably only one -
 obvious way to do it".
 Why not consider scripts as projects without wheels?
 ----------------------------------------------------
@@ -443,13 +531,67 @@ pinning e.g. a lock file with some sort of hash checking. Such projects would
 never be distributed as a wheel (except for maybe a transient editable one
 that is created when doing ``pip install -e .``).
-In contrast, scripts are managed loosely by its runner and would almost
+In contrast, scripts are managed loosely by their runners and would almost
-always have relaxed dependency constraints. Additionally, to reduce
+always have relaxed dependency constraints. Additionally, there may be a future
-friction associated with managing small projects there may be a future
+in which there is `a standard way <723-limit-build-backend_>`_ to ship projects
-in which there is a standard prescribed way to ship projects that are in
+that are in the form of a single file.
-the form of a single file. The author of the Rust RFC for embedding metadata
+
 .. _723-limit-build-backend:
 Why not limit build backend behavior?
 -------------------------------------
 A previous version of this PEP proposed that the ``[build-system]`` table
 mustn't be defined. The rationale was that builds would never occur so it
 did not make sense to allow this section.
 We removed that limitation based on
 `feedback <https://discuss.python.org/t/31151/9>`__ stating that there
 are already tools that exist in the wild that build wheels and source
 distributions from single files.
 The author of the Rust RFC for embedding metadata
 `mentioned to us <https://discuss.python.org/t/29905/179>`__ that they are
-actively looking into that based on user feedback.
+actively looking into that as well based on user feedback saying that there
 is unnecessary friction with managing small projects, which we have also
 heard in the Python community.
 There has been `a commitment <https://discuss.python.org/t/31151/15>`__ to
 support this by at least one major build system.
 Why not limit tool behavior?
 ----------------------------
 A previous version of this PEP proposed that non-script running tools SHOULD
 NOT modify their behavior when the script is not the sole input to the tool.
 For example, if a linter is invoked with the path to a directory, it SHOULD
 behave the same as if zero files had embedded metadata.
 This was done as a precaution to avoid confusing tool behavior and to avoid
 generating various feature requests for tools to support this PEP. However, during
 discussion we received `feedback <https://discuss.python.org/t/31151/16>`__
 from maintainers of tools that this would be undesirable and potentially
 confusing to users. Additionally, this may allow for a universally easier
 way to configure tools in certain circumstances and solve existing issues.
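As an illustration of that kind of configuration, a tool could overlay its embedded ``[tool]`` sub-table on its defaults. The tool name ``mylinter``, its ``line-length`` key and the ``load_tool_config`` helper are hypothetical; the sketch reuses the reference regular expression and requires Python 3.11+ for ``tomllib``.

.. code:: python

   import re
   import tomllib

   REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'


   def load_tool_config(script: str, tool: str, defaults: dict) -> dict:
       """Merge a script's [tool.<tool>] table over the tool's defaults."""
       matches = list(re.finditer(REGEX, script))
       if len(matches) > 1:
           raise ValueError("Multiple __pyproject__ definitions found")
       if not matches:
           return dict(defaults)
       embedded = tomllib.loads(matches[0].group(1)).get("tool", {}).get(tool, {})
       # Shallow merge: embedded keys win, unspecified keys keep their defaults.
       return {**defaults, **embedded}


   script = '__pyproject__ = """\n[tool.mylinter]\nline-length = 100\n"""\n'
   print(load_tool_config(script, "mylinter", {"line-length": 88, "select": ["E"]}))
   # {'line-length': 100, 'select': ['E']}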
 Why not accept all valid Python expression syntax?
 --------------------------------------------------
 There has been a suggestion that we should not restrict how the
 ``__pyproject__`` variable is defined and we should parse the abstract syntax
 tree. For example:
 .. code:: python
    __pyproject__ = (
        """
        [project]
        dependencies = []
            """
      )
 We will not do this, so that tools written in any language can read the
 metadata without needing to know the syntax of every version of Python.
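The trade-off is visible with the standard library alone: Python's own ``ast`` module finds the parenthesized assignment, but the canonical expression, which is what a tool written in another language would use, does not. A minimal sketch; ``from_ast`` is a hypothetical helper.

.. code:: python

   import ast
   import re

   REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'

   script = '__pyproject__ = (\n    """\n[project]\ndependencies = []\n"""\n)\n'


   def from_ast(source: str) -> str | None:
       """Find a string literal assigned to __pyproject__ via Python's parser."""
       for node in ast.walk(ast.parse(source)):
           if (
               isinstance(node, ast.Assign)
               and any(isinstance(t, ast.Name) and t.id == "__pyproject__" for t in node.targets)
               and isinstance(node.value, ast.Constant)
               and isinstance(node.value.value, str)
           ):
               return node.value.value
       return None


   print(re.search(REGEX, script))  # None: the canonical form is not used
   print(repr(from_ast(script)))  # '\n[project]\ndependencies = []\n'

The ``ast`` route only works because the reader is a Python interpreter whose grammar understands the file.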
 Why not just set up a Python project with a ``pyproject.toml``?
 ---------------------------------------------------------------
@@ -472,6 +614,61 @@ suggestion until the `current discussion on Discourse
 won't be distributed as wheels is resolved. And even then, it doesn't address
 the "sending someone a script in a gist or email" use case.
 Why not infer the requirements from import statements?
 ------------------------------------------------------
 The idea would be to automatically recognize ``import`` statements in the source
 file and turn them into a list of requirements.
 However, this is infeasible for several reasons. First, the points above about
 the necessity to keep the syntax easily parsable, for all Python versions, also
 by tools written in other languages, apply equally here.
 Second, PyPI and other package repositories conforming to the Simple Repository
 API do not provide a mechanism to resolve package names from the module names
 that are imported (see also `this related discussion`__).
 __ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
 Third, even if repositories did offer this information, the same import name may
 correspond to several packages on PyPI. One might object that disambiguating
 which package is wanted would only be needed if there are several projects
 providing the same import name. However, this would make it easy for anyone to
 unintentionally or malevolently break working scripts, by uploading a package to
 PyPI providing an import name that is the same as an existing project. The
 alternative where, among the candidates, the first package to have been
 registered on the index is chosen, would be confusing in case a popular package
 is developed with the same import name as an existing obscure package, and even
 harmful if the existing package is malware intentionally uploaded with a
 sufficiently generic import name that has a high probability of being reused.
 A related idea would be to attach the requirements as comments to the import
 statements instead of gathering them in a block, with a syntax such as::
   import numpy as np # requires: numpy
   import rich # requires: rich
 This still suffers from parsing difficulties. Also, where to place the comment
 in the case of multiline imports is ambiguous and may look ugly::
   from PyQt5.QtWidgets import (
       QCheckBox, QComboBox, QDialog, QDialogButtonBox,
       QGridLayout, QLabel, QSpinBox, QTextEdit
   ) # requires: PyQt5
 Furthermore, this syntax cannot behave as might be intuitively expected
 in all situations. Consider::
  import platform
  if platform.system() == "Windows":
      import pywin32 # requires: pywin32
 Here, the user's intent is that the package is only required on Windows, but
 this cannot be understood by the script runner (the correct way to write
 it would be ``requires: pywin32 ; sys_platform == 'win32'``).
 (Thanks to Jean Abou-Samra for the clear discussion of this point)
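The conditional-import problem can be reproduced with a naive line scanner for the hypothetical ``# requires:`` comments. The pattern below is an assumption for illustration, not part of any proposal; the scanner sees the comment but not the ``if`` around it, so it reports the requirement unconditionally.

.. code:: python

   import re

   # Hypothetical pattern for the rejected "# requires: <requirement>" comments.
   REQUIRES = re.compile(r"#\s*requires:\s*(.+?)\s*$", re.MULTILINE)

   script = (
       "import platform\n"
       'if platform.system() == "Windows":\n'
       "    import pywin32 # requires: pywin32\n"
   )

   # The platform condition is lost: the result is the same on every OS.
   print(REQUIRES.findall(script))  # ['pywin32']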
 Why not use a requirements file for dependencies?
 -------------------------------------------------
@@ -574,6 +771,18 @@ References
 .. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
 Footnotes
 =========
 .. [1] A large number of users use scripts that are version controlled. For
   example, `the SREs that were mentioned <723-comment-block_>`_ or
   projects that require special maintenance like the
   `AWS CLI <https://github.com/aws/aws-cli/tree/4393dcdf044a5275000c9c193d1933c07a08fdf1/scripts>`__
   or `Calibre <https://github.com/kovidgoyal/calibre/tree/master/setup>`__.
 .. [2] For example, projects like Hatch and Poetry have their own backends
   and may wish to support this use case only when their backend is used.
 Copyright
 =========