PEP: 723
Title: Embedding pyproject.toml in single-file scripts
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Adam Turner <python@quite.org.uk>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/31151
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 04-Aug-2023
Post-History: `04-Aug-2023 <https://discuss.python.org/t/30979>`__,
              `06-Aug-2023 <https://discuss.python.org/t/31151>`__,
              `23-Aug-2023 <https://discuss.python.org/t/32149>`__,
Replaces: 722
Abstract
========
This PEP specifies a metadata format that can be embedded in single-file Python
scripts to assist launchers, IDEs and other external tools which may need to
interact with such scripts.
Motivation
==========
Python is routinely used as a scripting language, with Python scripts as a
(better) alternative to shell scripts, batch files, etc. When Python code is
structured as a script, it is usually stored as a single file and does not
expect the availability of any other local code that may be used for imports.
As such, it is possible to share with others over arbitrary text-based means
such as email, a URL to the script, or even a chat window. Code that is
structured like this may live as a single file forever, never becoming a
full-fledged project with its own directory and ``pyproject.toml`` file.
An issue that users encounter with this approach is that there is no standard
mechanism to define metadata for tools whose job it is to execute such scripts.
For example, a tool that runs a script may need to know which dependencies are
required or the supported version(s) of Python.
There is currently no standard tool that addresses this issue, and this PEP
does *not* attempt to define one. However, any tool that *does* address this
issue will need to know what the runtime requirements of scripts are. By
defining a standard format for storing such metadata, existing tools, as well
as any future tools, will be able to obtain that information without requiring
users to include tool-specific metadata in their scripts.
Rationale
=========
This PEP defines a mechanism for embedding metadata *within the script itself*,
and not in an external file.
We choose to follow the latest developments of other modern packaging
ecosystems (namely `Go`__ and provisionally `Rust`__) by embedding the existing
file used to describe projects, in our case ``pyproject.toml``.
__ https://github.com/erning/gorun
__ https://rust-lang.github.io/rfcs/3424-cargo-script.html
The format is intended to bridge the gap between different types of users
of Python. Users will benefit from seamless interoperability with tools that
already work with TOML.
One of the central themes we discovered from the recent
`packaging survey <https://discuss.python.org/t/22420>`__ is that users have
begun getting frustrated with the lack of unification regarding both tooling
and specs. Adding yet another metadata format (like :pep:`722` syntax for a
list of dependencies), even for a currently unsatisfied use case, would
further fragment the community.
The following are some of the use cases that this PEP wishes to support:

* A user-facing CLI that is capable of executing scripts. If we take Hatch as
  an example, the interface would be simply
  ``hatch run /path/to/script.py [args]`` and Hatch will manage the
  environment for that script. Such tools could be used as shebang lines on
  non-Windows systems e.g. ``#!/usr/bin/env hatch run``.
* A script that desires to transition to a directory-type project. A user may
  be rapidly prototyping locally or in a remote REPL environment and then
  decide to transition to a more formal project layout if their idea works
  out. This intermediate script stage would be very useful for producing fully
  reproducible bug reports. By using the same format, the user can simply copy
  and paste the metadata into a ``pyproject.toml`` file and continue working
  without having to learn a new format. More likely, even, is that tooling will
  eventually support this transformation with a single command.
* Users that wish to avoid manual dependency management. For example, package
  managers that have commands to add/remove dependencies or dependency update
  automation in CI that triggers based on new versions or in response to
  CVEs [1]_.

Specification
=============
This PEP defines a metadata comment block format loosely inspired [2]_ by
`reStructuredText Directives`__.
__ https://docutils.sourceforge.io/docs/ref/rst/directives.html
Any Python script may have top-level comment blocks that start with the line
``# /// TYPE``, where ``TYPE`` determines how to process the content, and end
with the line ``# ///``. Every line between these two lines MUST be a comment
starting with ``#``. If there are characters after the ``#`` then the first
character MUST be a space. The embedded content is formed by taking away the
first two characters of each line if the second character is a space, otherwise
just the first character (which means the line consists of only a single
``#``).
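
The stripping rule above can be sketched for a single line as follows. The
function name ``strip_comment_prefix`` is hypothetical, and raising
``ValueError`` for a malformed line is an assumption; this PEP does not
prescribe how tools report such lines:

```python
def strip_comment_prefix(line: str) -> str:
    """Apply the stripping rule to one line taken from inside a metadata block."""
    if line.startswith("# "):
        # "#" followed by a space: drop both characters, keep the rest verbatim
        return line[2:]
    if line.rstrip("\r\n") == "#":
        # A bare "#": drop only that character, leaving an empty line
        return line[1:]
    # Anything else (for example "#text") violates the format
    raise ValueError(f"Not a valid metadata line: {line!r}")
```

Any indentation after the first space is preserved, so ``#   "rich",`` yields
the content line ``"rich",`` preceded by two spaces.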
When there are multiple comment blocks of the same ``TYPE`` defined, tools MUST
produce an error.
Tools reading embedded metadata MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as UTF-8.
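
As a minimal sketch of the two permitted behaviors, the standard library's
``tokenize.detect_encoding`` can be used to honor an encoding declaration,
with plain UTF-8 decoding as the alternative. The helper name
``read_script_text`` and its ``respect_declaration`` parameter are
illustrative assumptions and not part of this PEP:

```python
import io
import tokenize


def read_script_text(data: bytes, respect_declaration: bool = True) -> str:
    """Decode the raw bytes of a script before searching for metadata blocks."""
    if respect_declaration:
        # detect_encoding() looks for a BOM and for an encoding declaration
        # on the first two lines, falling back to UTF-8 if neither is found.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    else:
        # Tools that ignore the declaration must treat the file as UTF-8.
        encoding = "utf-8"
    return data.decode(encoding)
```

A tool would call this once on the file's bytes and then run its block
detection on the returned text.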
This is the canonical regular expression that MAY be used to parse the
metadata:

.. code:: text

    (?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$

In circumstances where there is a discrepancy between the text specification
and the regular expression, the text specification takes precedence.
Tools MUST NOT read from metadata blocks with types that have not been
standardized by this PEP or future ones.
pyproject type
--------------
The first type of metadata block is named ``pyproject`` which represents
content similar to [3]_ what one would see in a ``pyproject.toml`` file.
This document MAY include the ``[run]`` and ``[tool]`` tables.
The :pep:`tool table <518#tool-table>` MAY be used by any tool, script runner
or otherwise, to configure behavior.
The ``[run]`` table MAY include the following optional fields:

* ``dependencies``: A list of strings that specifies the runtime dependencies
  of the script. Each entry MUST be a valid :pep:`508` dependency.
* ``requires-python``: A string that specifies the Python version(s) with which
  the script is compatible. The value of this field MUST be a valid
  :pep:`version specifier <440#version-specifiers>`.
* ``version``: A string that specifies the version of the script. The value of
  this field MUST be a valid :pep:`440` version.

Any future PEPs that define additional fields for the ``[run]`` table when used
in a ``pyproject.toml`` file MUST include the aforementioned fields exactly as
specified. The fields defined by this PEP are equally applicable to
full-fledged projects and to single-file scripts.
Script runners MUST error if the specified ``dependencies`` cannot be provided.
Script runners SHOULD error if no version of Python that satisfies the specified
``requires-python`` can be provided.
Example
-------
The following is an example of a script with an embedded ``pyproject.toml``:

.. code:: python

    # /// pyproject
    # [run]
    # requires-python = ">=3.11"
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # ///

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

The following [4]_ is an example of a proposed syntax for single-file Rust
projects that embeds their equivalent of ``pyproject.toml``, which is called
``Cargo.toml``:

.. code:: rust

    #!/usr/bin/env cargo

    //! ```cargo
    //! [dependencies]
    //! regex = "1.8.0"
    //! ```

    use regex::Regex;

    fn main() {
        let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
        println!("Did our date match? {}", re.is_match("2014-01-01"));
    }

Reference Implementation
========================
The following is an example of how to read the metadata on Python 3.11 or
higher.
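
.. code:: python

    import re
    import tomllib

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'


    def read(script: str) -> dict | None:
        name = 'pyproject'
        matches = list(
            filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
        )
        if len(matches) > 1:
            raise ValueError(f'Multiple {name} blocks found')
        elif len(matches) == 1:
            # Strip the leading "# " (or a bare "#") from every line of the
            # block before handing the content to the TOML parser.
            content = ''.join(
                line[2:] if line.startswith('# ') else line[1:]
                for line in matches[0].group('content').splitlines(keepends=True)
            )
            return tomllib.loads(content)
        else:
            return None
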
Tools such as package managers or dependency update automation in CI will
often edit dependencies. The following is a crude example of modifying the
content using the ``tomlkit`` library__.
__ https://tomlkit.readthedocs.io/en/latest/

.. code:: python

    import re

    import tomlkit

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'


    def add(script: str, dependency: str) -> str:
        # Crude: assumes the first metadata block in the script is the
        # pyproject block and that it already defines run.dependencies.
        match = re.search(REGEX, script)
        # Remove the comment prefix from each line to recover the TOML document.
        content = ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in match.group('content').splitlines(keepends=True)
        )

        config = tomlkit.parse(content)
        config['run']['dependencies'].append(dependency)
        # Re-apply the comment prefix; empty lines become a bare "#".
        new_content = ''.join(
            f'# {line}' if line.strip() else f'#{line}'
            for line in tomlkit.dumps(config).splitlines(keepends=True)
        )

        start, end = match.span('content')
        return script[:start] + new_content + script[end:]

Note that this example used a library that preserves TOML formatting. This is
not a requirement for editing by any means but rather is a "nice to have"
feature.
Backwards Compatibility
=======================
At the time of writing, the ``# /// pyproject`` block comment starter does not
appear `on GitHub`__. Therefore, there is little risk of existing scripts being
broken by this PEP.
__ https://github.com/search?q=%22%23+%2F%2F%2F+pyproject%22&type=code
Security Implications
=====================
If a script containing embedded metadata is run using a tool that automatically
installs dependencies, this could cause arbitrary code to be downloaded and
installed in the user's environment.
The risk here is part of the functionality of the tool being used to run the
script, and as such should already be addressed by the tool itself. The only
additional risk introduced by this PEP is if an untrusted script with
embedded metadata is run, in which case a potentially malicious dependency or
transitive dependency might be installed.
This risk is addressed by the normal good practice of reviewing code
before running it. Additionally, tools may be able to provide
`locking functionality <723-tool-configuration_>`__ to ameliorate this risk.
How to Teach This
=================
To embed metadata in a script, define a comment block that starts with the
line ``# /// pyproject`` and ends with the line ``# ///``. Every line between
those two lines must be a comment, and the full content is derived by removing
the leading ``#`` and the single space that follows it from each line (a line
consisting of only ``#`` becomes an empty line). The ``pyproject`` type
indicates that the content is TOML and resembles a ``pyproject.toml`` file.

.. code:: python

    # /// pyproject
    # [run]
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # requires-python = ">=3.11"
    # version = "0.1.0"
    # ///

The two allowed tables are ``[run]`` and ``[tool]``. The ``[run]`` table may
contain the following fields:

.. list-table::
   :header-rows: 1

   * - Field
     - Description
     - Tool behavior

   * - ``dependencies``
     - A list of strings that specifies the runtime dependencies of the script.
       Each entry must be a valid :pep:`508` dependency.
     - Tools will error if the specified dependencies cannot be provided.

   * - ``requires-python``
     - A string that specifies the Python version(s)
       with which the script is compatible.
       The value of this field must be a valid
       :pep:`version specifier <440#version-specifiers>`.
     - Tools might error if no version of Python that satisfies
       the constraint can be executed.

   * - ``version``
     - A string that specifies the version of the script.
       The value of this field must be a valid :pep:`440` version.
     - Tools may use this however they wish, if defined.

It is up to individual tools whether or not their behavior is altered based on
the embedded metadata. For example, not every script runner will be able to
provide an environment for the specific Python versions defined by the
``requires-python`` field.
The :pep:`tool table <518#tool-table>` may be used by any tool, script runner
or otherwise, to configure behavior.
Recommendations
===============
Tools that support managing different versions of Python should attempt to use
the highest available version of Python that is compatible with the script's
``requires-python`` metadata, if defined.
Tooling buy-in
==============
The following is a list of tools that have expressed support for this PEP or
have committed to implementing support should it be accepted:

* `Pantsbuild and Pex <https://discuss.python.org/t/31151/15>`__: expressed
  support for any way to define dependencies and also features that this PEP
  considers as valid use cases such as building packages from scripts and
  embedding tool configuration
* `Mypy <https://discuss.python.org/t/31151/16>`__ and
  `Ruff <https://discuss.python.org/t/31151/42>`__: strongly expressed support
  for embedding tool configuration as it would solve existing pain points for
  users
* `Hatch <https://discuss.python.org/t/31151/53>`__: (author of this PEP)
  expressed support for all aspects of this PEP, and will be one of the first
  tools to support running scripts with specifically configured Python versions

Rejected Ideas
==============
.. _723-comment-block:
Why not use a comment block resembling requirements.txt?
--------------------------------------------------------
This PEP considers there to be different types of users for whom Python code
would live as single-file scripts:

* Non-programmers who are just using Python as a scripting language to achieve
  a specific task. These users are unlikely to be familiar with concepts of
  operating systems like shebang lines or the ``PATH`` environment variable.
  Some examples:

  * The average person, perhaps at a workplace, who wants to write a script to
    automate something for efficiency or to reduce tedium
  * Someone doing data science or machine learning in industry or academia who
    wants to write a script to analyze some data or for research purposes.
    These users are special in that, although they have limited programming
    knowledge, they learn from sources like StackOverflow and blogs that have a
    programming bent and are increasingly likely to be part of communities that
    share knowledge and code. Therefore, a non-trivial number of these users
    will have some familiarity with things like Git(Hub), Jupyter, HuggingFace,
    etc.
* Non-programmers who manage operating systems e.g. a sysadmin. These users are
  able to set up ``PATH``, for example, but are unlikely to be familiar with
  Python concepts like virtual environments. These users often operate in
  isolation and have limited need to gain exposure to tools intended for
  sharing like Git.
* Programmers who manage operating systems/infrastructure e.g. SREs. These
  users are not very likely to be familiar with Python concepts like virtual
  environments, but are likely to be familiar with Git and most often use it
  to version control everything required to manage infrastructure like Python
  scripts and Kubernetes config.
* Programmers who write scripts primarily for themselves. These users over time
  accumulate a great number of scripts in various languages that they use to
  automate their workflow and often store them in a single directory, that is
  potentially version controlled for persistence. Non-Windows users may set
  up each Python script with a shebang line pointing to the desired Python
  executable or script runner.

This PEP argues that reusing our TOML-based metadata format is the best for
each category of user and that the requirements-like block comment is only
approachable for those who have familiarity with ``requirements.txt``, which
represents a small subset of users.

* For the average person automating a task or the data scientist, they are
  already starting with zero context and are unlikely to be familiar with
  either TOML or ``requirements.txt``. These users will very likely rely on
  snippets found online via a search engine or utilize AI in the form
  of a chat bot or direct code completion software. Searching for Python
  metadata formatting will lead them to the TOML-based format that already
  exists which they can reuse. The author tested GitHub Copilot with this
  PEP and it already supports auto-completion of ``dependencies``. In contrast,
  a new format may take years of being trained on the Internet for models to
  learn.

  Additionally, these users are most susceptible to formatting quirks and
  syntax errors. TOML is a well-defined format with existing online
  validators; it features assignment that is compatible with Python
  expressions and has no strict indenting rules. The block comment format
  on the other hand could be easily malformed by forgetting the colon, for
  example, and debugging why it's not working with a search engine would be
  a difficult task for such a user.
* For the sysadmin types, they are as unlikely as the previously described
  users to be familiar with TOML or ``requirements.txt``. For either format
  they would have to read documentation. They would likely be more comfortable
  with TOML since they are used to structured data formats and there would be
  less perceived magic in their systems.

  Additionally, for maintenance of their systems ``/// pyproject`` would be
  much easier to search for from a shell than a block comment with potentially
  numerous extensions over time.
* For the SRE types, they are likely to be familiar with TOML already from
  other projects that they might have to work with like configuring the
  `GitLab Runner`__ or `Cloud Native Buildpacks`__.

  __ https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  __ https://buildpacks.io/docs/reference/config/

  These users are responsible for the security of their systems and most likely
  have security scanners set up to automatically open PRs to update versions
  of dependencies. Such automated tools like Dependabot would have a much
  easier time using existing TOML libraries than writing their own custom
  parser for a block comment format.
* For the programmer types, they are more likely to be familiar with TOML
  than to have ever seen a ``requirements.txt`` file, unless they are a
  Python programmer who has had previous experience with writing applications.
  In the case of experience with the requirements format, it necessarily means
  that they are at least somewhat familiar with the ecosystem and therefore
  it is safe to assume they know what TOML is.

  Another benefit of this PEP to these users is that their IDEs like Visual
  Studio Code would be able to provide TOML syntax highlighting much more
  easily than if each had to write custom logic for this feature.

Additionally, since the original block comment alternative format (double
``#``) went against the recommendation of :pep:`8` and as a result linters
and IDE auto-formatters that respected the recommendation would
`fail by default <https://discuss.python.org/t/29905/247>`__, the final
proposal uses standard comments starting with a single ``#`` character without
any obvious start or end sequence.
The concept of regular comments that do not appear to be intended for machines
(i.e. `encoding declarations`__) affecting behavior would not be customary to
users of Python and goes directly against the "explicit is better than
implicit" foundational principle.
__ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
Users typing what to them looks like prose could alter runtime behavior. This
PEP takes the view that the possibility of that happening, even when a tool
has been set up as such (maybe by a sysadmin), is unfriendly to users.
Finally, and critically, the alternatives to this PEP like :pep:`722` do not
satisfy the use cases enumerated herein, such as setting the supported Python
versions, the eventual building of scripts into packages, and the ability to
have machines edit metadata on behalf of users. It is very likely that
requests for such features will persist, and it is conceivable that another
PEP in the future would allow for the embedding of such metadata. At that
point there would be multiple ways to achieve the same thing, which goes
against our foundational principle of "there should be one - and preferably
only one - obvious way to do it".
Why not use a multi-line string?
--------------------------------
A previous version of this PEP proposed that the metadata be stored as follows:

.. code:: python

    __pyproject__ = """
    ...
    """

The most significant problem with this proposal is that the embedded TOML would
be limited in the following ways:

* It would not be possible to use multi-line double-quoted strings in the TOML
  as that would conflict with the Python string containing the document. Many
  TOML writers do not preserve style and may produce output that would be
  malformed.
* The way in which character escaping works in Python strings is not quite the
  way it works in TOML strings. It would be possible to preserve a one-to-one
  character mapping by enforcing raw strings, but this ``r`` prefix requirement
  may be confusing to users.

Why not reuse core metadata fields?
-----------------------------------
A previous version of this PEP proposed to reuse the existing
`metadata standard <pyproject metadata_>`_ that is used to describe projects.
There are two significant problems with this proposal:
* The ``name`` and ``version`` fields are required and changing that would
require its own PEP
* Reusing the data is `fundamentally a misuse of it`__
__ https://snarky.ca/differentiating-between-writing-down-dependencies-to-use-packages-and-for-packages-themselves/
Why not limit to specific metadata fields?
------------------------------------------
By limiting the metadata to a specific set of fields, for example just
``dependencies``, we would prevent legitimate known use cases:

* ``requires-python``: For tools that support managing Python installations,
  this allows users to target specific versions of Python for new syntax
  or standard library functionality.
* ``version``: It is quite common to version scripts for persistence even when
  using a VCS like Git. When not using a VCS it is even more common to version
  scripts. For example, the author has been in multiple time-sensitive
  debugging sessions with customers where, due to the air-gapped nature of the
  environment, the only way to transfer the script was via email or copying
  and pasting it into a chat window. In these cases, versioning is invaluable
  to ensure that the customer is using the latest (or a specific) version of
  the script.

.. _723-tool-configuration:
Why not limit tool configuration?
---------------------------------
By not allowing the ``[tool]`` table, we would prevent known functionality
that would benefit users. For example:

* A script runner may support injection of dependency resolution data for an
  embedded lock file (this is what Go's ``gorun`` can do).
* A script runner may support configuration instructing to run scripts in
  containers for situations in which there is no cross-platform support for a
  dependency or if the setup is too complex for the average user like when
  requiring Nvidia drivers. Situations like this would allow users to proceed
  with what they want to do whereas otherwise they may stop at that point
  altogether.
* Tools may wish to experiment with features to ease development burden for
  users such as the building of single-file scripts into packages. We received
  `feedback <https://discuss.python.org/t/31151/9>`__ stating that there are
  already tools that exist in the wild that build wheels and source
  distributions from single files.

  The author of the Rust RFC for embedding metadata
  `mentioned to us <https://discuss.python.org/t/29905/179>`__ that they are
  actively looking into that as well based on user feedback saying that there
  is unnecessary friction with managing small projects.

  There has been `a commitment <https://discuss.python.org/t/31151/15>`__ to
  support this by at least one major build system.

Why not limit tool behavior?
----------------------------
A previous version of this PEP proposed that non-script running tools SHOULD
NOT modify their behavior when the script is not the sole input to the tool.
For example, if a linter is invoked with the path to a directory, it SHOULD
behave the same as if zero files had embedded metadata.
This was done as a precaution to avoid tool behavior confusion and generating
various feature requests for tools to support this PEP. However, during
discussion we received `feedback <https://discuss.python.org/t/31151/16>`__
from maintainers of tools that this would be undesirable and potentially
confusing to users. Additionally, this may allow for a universally easier
way to configure tools in certain circumstances and solve existing issues.
Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------
A key issue here is that the target audience for this proposal is people
writing scripts which aren't intended for distribution. Sometimes scripts will
be "shared", but this is far more informal than "distribution" - it typically
involves sending a script via an email with some written instructions on how to
run it, or passing someone a link to a GitHub gist.
Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".
In addition, if the expectation here is that the ``pyproject.toml`` will
somehow be designed for running scripts in place, that's a new feature of the
standard that doesn't currently exist. At a minimum, this isn't a reasonable
suggestion until the `current discussion on Discourse
<pyproject without wheels_>`_ about using ``pyproject.toml`` for projects that
won't be distributed as wheels is resolved. And even then, it doesn't address
the "sending someone a script in a gist or email" use case.
Why not infer the requirements from import statements?
------------------------------------------------------
The idea would be to automatically recognize ``import`` statements in the source
file and turn them into a list of requirements.
However, this is infeasible for several reasons. First, the points above about
the necessity to keep the syntax easily parsable, for all Python versions, also
by tools written in other languages, apply equally here.
Second, PyPI and other package repositories conforming to the Simple Repository
API do not provide a mechanism to resolve package names from the module names
that are imported (see also `this related discussion`__).
__ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
Third, even if repositories did offer this information, the same import name may
correspond to several packages on PyPI. One might object that disambiguating
which package is wanted would only be needed if there are several projects
providing the same import name. However, this would make it easy for anyone to
unintentionally or malevolently break working scripts, by uploading a package to
PyPI providing an import name that is the same as an existing project. The
alternative where, among the candidates, the first package to have been
registered on the index is chosen, would be confusing in case a popular package
is developed with the same import name as an existing obscure package, and even
harmful if the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being reused.
A related idea would be to attach the requirements as comments to the import
statements instead of gathering them in a block, with a syntax such as::

    import numpy as np  # requires: numpy
    import rich  # requires: rich

This still suffers from parsing difficulties. Also, where to place the comment
in the case of multiline imports is ambiguous and may look ugly::

    from PyQt5.QtWidgets import (
        QCheckBox, QComboBox, QDialog, QDialogButtonBox,
        QGridLayout, QLabel, QSpinBox, QTextEdit
    )  # requires: PyQt5

Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider::

    import platform
    if platform.system() == "Windows":
        import pywin32  # requires: pywin32

Here, the user's intent is that the package is only required on Windows, but
this cannot be understood by the script runner (the correct way to write
it would be ``requires: pywin32 ; sys_platform == 'win32'``).
(Thanks to Jean Abou-Samra for the clear discussion of this point)
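
For comparison, with the block format defined by this PEP the condition is
stated once, next to the dependency, as a :pep:`508` environment marker. The
following is only an illustrative sketch; the use of ``win32api`` (one of the
modules shipped in the ``pywin32`` distribution) is an assumption made for the
example:

```python
# /// pyproject
# [run]
# dependencies = [
#   "pywin32; sys_platform == 'win32'",
# ]
# ///
import sys

if sys.platform == "win32":
    # Only reached on Windows, where the marker above applies
    import win32api

    print(win32api.GetComputerName())
else:
    print("pywin32 is not needed on this platform")
```

A script runner can evaluate the marker without executing or parsing any of
the Python code that follows the block.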
Why not use a requirements file for dependencies?
-------------------------------------------------
Putting your requirements in a requirements file doesn't require a PEP. You
can do that right now, and in fact it's quite likely that many ad hoc solutions
do this. However, without a standard, there's no way of knowing how to locate a
script's dependency data. And furthermore, the requirements file format is
pip-specific, so tools relying on it are depending on a pip implementation
detail.
So in order to make a standard, two things would be required:
1. A standardised replacement for the requirements file format.
2. A standard for how to locate the requirements file for a given script.
The first item is a significant undertaking. It has been discussed on a number
of occasions, but so far no-one has attempted to actually do it. The most
likely approach would be for standards to be developed for individual use cases
currently addressed with requirements files. One option here would be for this
PEP to define a new file format which is simply a text file containing
:pep:`508` requirements, one per line. That would just leave the question of
how to locate that file.
The "obvious" solution here would be to do something like name the file the
same as the script, but with a ``.reqs`` extension (or something similar).
However, this still requires *two* files, where currently only a single file is
needed, and as such, does not match the "better batch file" model (shell
scripts and batch files are typically self-contained). It requires the
developer to remember to keep the two files together, and this may not always
be possible. For example, system administration policies may require that *all*
files in a certain directory are executable (the Linux filesystem standards
require this of ``/usr/bin``, for example). And some methods of sharing a
script (for example, publishing it on a text file sharing service like GitHub's
gist, or a corporate intranet) may not allow for deriving the location of an
associated requirements file from the script's location (tools like ``pipx``
support running a script directly from a URL, so "download and unpack a zip of
the script and its dependencies" may not be an appropriate requirement).
Essentially, though, the issue here is that there is an explicitly stated
requirement that the format supports storing dependency data *in the script
file itself*. Solutions that don't do that are simply ignoring that
requirement.
Why not use (possibly restricted) Python syntax?
------------------------------------------------
This would typically involve storing metadata as multiple special variables,
such as the following.

.. code:: python

    __requires_python__ = ">=3.11"
    __dependencies__ = [
        "requests",
        "click",
    ]

The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if the
syntax is restricted, the *rest* of the script will use the full Python syntax,
and trying to define a syntax which can be successfully parsed in isolation
from the surrounding code is likely to be extremely difficult and error-prone.
Furthermore, Python's syntax changes in every release. If extracting dependency
data needs a Python parser, the parser will need to know which version of
Python the script is written for, and the overhead for a generic tool of having
a parser that can handle *multiple* versions of Python is unsustainable.
With this approach there is the potential to clutter scripts with many
variables as new extensions get added. Additionally, intuiting which metadata
fields correspond to which variable names would cause confusion for users.
It is worth noting, though, that the ``pip-run`` utility does implement (an
extended form of) this approach. `Further discussion <pip-run issue_>`_ of
the ``pip-run`` design is available on the project's issue tracker.
What about local dependencies?
------------------------------
These can be handled without needing special metadata and tooling, simply by
adding the location of the dependencies to ``sys.path``. This PEP simply isn't
needed for this case. If, on the other hand, the "local dependencies" are
actual distributions which are published locally, they can be specified as
usual with a :pep:`508` requirement, and the local package index specified when
running a tool by using the tool's UI for that.
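
A minimal sketch of the ``sys.path`` approach is shown below. The helper name
``add_local_dependencies`` and the ``lib`` directory next to the script are
assumptions chosen for illustration:

```python
import sys
from pathlib import Path


def add_local_dependencies(script_path: str, directory: str = "lib") -> Path:
    """Make modules stored in a directory next to the script importable."""
    location = Path(script_path).resolve().parent / directory
    if str(location) not in sys.path:
        # Insert at the front so local modules take precedence
        sys.path.insert(0, str(location))
    return location
```

A script would typically call ``add_local_dependencies(__file__)`` before
importing any of its local modules.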
Open Issues
===========
None at this point.
References
==========
.. _pyproject metadata: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
Footnotes
=========

.. [1] A large number of users use scripts that are version controlled. For
   example, `the SREs that were mentioned <723-comment-block_>`_ or
   projects that require special maintenance like the
   `AWS CLI <https://github.com/aws/aws-cli/tree/4393dcdf044a5275000c9c193d1933c07a08fdf1/scripts>`__
   or `Calibre <https://github.com/kovidgoyal/calibre/tree/master/setup>`__.
.. [2] The syntax is taken directly from the final resolution of the
   `Blocks extension`__ to `Python Markdown`__.

   __ https://github.com/facelessuser/pymdown-extensions/discussions/1973
   __ https://github.com/Python-Markdown/markdown
.. [3] A future PEP that officially introduces the ``[run]`` table to
   ``pyproject.toml`` files will make this PEP not just similar but a strict
   subset.
.. [4] One important thing to note is that the metadata is embedded in a
   `doc-comment`__ (their equivalent of docstrings). `Other syntaxes`__ are
   under consideration within the Rust project.

   __ https://doc.rust-lang.org/stable/book/ch14-02-publishing-to-crates-io.html#making-useful-documentation-comments
   __ https://github.com/epage/cargo-script-mvs/blob/main/0000-cargo-script.md#embedded-manifest-format

Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.