PEP: 723
Title: Embedding pyproject.toml in single-file scripts
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Adam Turner <python@quite.org.uk>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/31151
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 04-Aug-2023
Post-History: `04-Aug-2023 <https://discuss.python.org/t/30979>`__,
              `06-Aug-2023 <https://discuss.python.org/t/31151>`__,
Replaces: 722
Abstract
========
This PEP specifies a metadata format that can be embedded in single-file Python
scripts to assist launchers, IDEs and other external tools which may need to
interact with such scripts.
Motivation
==========
Python is routinely used as a scripting language, with Python scripts as a
(better) alternative to shell scripts, batch files, etc. When Python code is
structured as a script, it is usually stored as a single file and does not
expect the availability of any other local code that may be used for imports.
As such, it can be shared with others over arbitrary text-based means
such as email, a URL to the script, or even a chat window. Code that is
structured like this may live as a single file forever, never becoming a
full-fledged project with its own directory and ``pyproject.toml`` file.
An issue that users encounter with this approach is that there is no standard
mechanism to define metadata for tools whose job it is to execute such scripts.
For example, a tool that runs a script may need to know which dependencies are
required or the supported version(s) of Python.
There is currently no standard tool that addresses this issue, and this PEP
does *not* attempt to define one. However, any tool that *does* address this
issue will need to know what the runtime requirements of scripts are. By
defining a standard format for storing such metadata, existing tools, as well
as any future tools, will be able to obtain that information without requiring
users to include tool-specific metadata in their scripts.
Rationale
=========
This PEP defines a mechanism for embedding metadata *within the script itself*,
and not in an external file.
We choose to follow the latest developments of other modern packaging
ecosystems (namely `Go`__ and provisionally `Rust`__) by embedding the existing
`metadata standard <pyproject metadata_>`_ that is used to describe
projects.
__ https://github.com/erning/gorun
__ https://rust-lang.github.io/rfcs/3424-cargo-script.html
The format is intended to bridge the gap between different types of users
of Python. Knowledge of how to write project metadata will be directly
transferable to all use cases, whether writing a script or maintaining a
project that is distributed via PyPI. Additionally, users will benefit from
seamless interoperability with tools that are already familiar with the format.
One of the central themes we discovered from the recent
`packaging survey <https://discuss.python.org/t/22420>`__ is that users have
begun getting frustrated with the lack of unification regarding both tooling
and specs. Adding yet another way to define metadata, even for a currently
unsatisfied use case, would further fragment the community.
The following are some of the use cases that this PEP wishes to support:
* A user-facing CLI that is capable of executing scripts. If we take Hatch as
an example, the interface would be simply
``hatch run /path/to/script.py [args]`` and Hatch will manage the
environment for that script. Such tools could be used as shebang lines on
non-Windows systems e.g. ``#!/usr/bin/env hatch run``. You would also be
able to enter a shell into that environment like other projects by doing
``hatch -p /path/to/script.py shell`` since the project flag would learn
that project metadata could be read from a single file.
* A script that desires to transition to a directory-type project. A user may
be rapidly prototyping locally or in a remote REPL environment and then
decide to transition to a more formal project layout if their idea works
out. This intermediate script stage would be very useful for producing fully
reproducible bug reports. By using the same metadata format, the user can
simply copy and paste the metadata into a ``pyproject.toml`` file and
continue working without having to learn a new format. More likely, even, is
that tooling will eventually support this transformation with a single
command.
* Users that wish to avoid manual dependency management. For example, package
managers that have commands to add/remove dependencies or dependency update
automation in CI that triggers based on new versions or in response to
CVEs [1]_.
Specification
=============
Any Python script may assign a multi-line *double-quoted* string literal
(``"""``) containing a valid TOML document to a variable named
``__pyproject__``. The
variable MUST start at the beginning of the line and the opening of the string
MUST be on the same line as the assignment. The closing of the string MUST be
on a line by itself, and MUST NOT be indented.
When there are multiple ``__pyproject__`` variables defined, tools MUST produce
an error.
The TOML document MUST NOT contain multi-line double-quoted strings, as that
would conflict with the Python string containing the document. Single-quoted
multi-line TOML strings may be used instead.
This is the canonical regular expression that MUST be used to parse the
metadata:
.. code:: text

    (?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$

In circumstances where there is a discrepancy between the regular expression
and the text specification, the regular expression takes precedence.
Tools reading embedded metadata MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as UTF-8.
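
As a rough sketch of the encoding rule above, a tool written in Python could
read a script with the standard library's ``tokenize.open()``, which honors a
byte order mark or a :pep:`263` encoding declaration and otherwise falls back
to UTF-8. The helper name ``read_script_text`` is hypothetical and is not
defined by this PEP.

.. code:: python

    import tokenize


    def read_script_text(path: str) -> str:
        # tokenize.open() detects the encoding from a BOM or an encoding
        # declaration and defaults to UTF-8 when neither is present.
        with tokenize.open(path) as script_file:
            return script_file.read()

Tools that choose not to respect the declaration could instead open the file
with ``encoding="utf-8"`` directly.
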
The TOML document MAY include the ``[project]``, ``[tool]`` and ``[build-system]``
tables.
The ``[project]`` table differs from its ``pyproject.toml`` counterpart in the
following ways:
* The ``name`` and ``version`` fields are not required and MAY be defined
dynamically by tools if the user does not define them
* These fields do not need to be listed in the ``dynamic`` array
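
To illustrate the first point, a tool might fill in missing fields along the
following lines. This is only a sketch: the function name ``with_defaults``,
the use of the file stem as the name, and the placeholder version ``0.0.0``
are assumptions made for this example, since this PEP leaves the choice to
each tool.

.. code:: python

    from pathlib import Path


    def with_defaults(metadata: dict, script_path: str) -> dict:
        # Copy the table so the caller's parsed metadata is left untouched.
        project = dict(metadata.get("project", {}))
        # Hypothetical defaults; user-defined values always take precedence.
        project.setdefault("name", Path(script_path).stem)
        project.setdefault("version", "0.0.0")
        return {**metadata, "project": project}

A file stem is not guaranteed to be a valid project name, so a real tool
would likely normalize or validate the derived value.
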
Non-script running tools MAY choose to alter their behavior based on
configuration that is stored in their expected ``[tool]`` sub-table.
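
For instance, a hypothetical linter named ``examplelint`` could look up its
own configuration roughly as follows. The tool name and the ``tool_config``
helper are invented for illustration. The sketch requires Python 3.11 or
higher for ``tomllib`` and, for brevity, does not handle the error case of
multiple definitions.

.. code:: python

    import re
    import tomllib

    REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'


    def tool_config(script: str, tool_name: str) -> dict:
        match = re.search(REGEX, script)
        if match is None:
            # No embedded metadata: behave as if nothing was configured.
            return {}
        document = tomllib.loads(match.group(1))
        return document.get("tool", {}).get(tool_name, {})

A script containing a ``[tool.examplelint]`` table would then yield that
table's contents, and any other script would yield an empty mapping.
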
Build frontends SHOULD NOT use the backend defined in the ``[build-system]``
table to build scripts with embedded metadata. This requires a new PEP because
the current methods defined in :pep:`517` act upon a directory, not a file.
We use ``SHOULD NOT`` instead of ``MUST NOT`` in order to allow tools to
experiment [2]_ with such functionality before we standardize (indeed this
would be a requirement).
Example
-------
The following is an example of a script with an embedded ``pyproject.toml``:
.. code:: python

    __pyproject__ = """
    [project]
    requires-python = ">=3.11"
    dependencies = [
      "requests<3",
      "rich",
    ]
    """

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

The following is an example of a proposed syntax for a single-file Rust project
that embeds the Rust equivalent of ``pyproject.toml``,
which is called ``Cargo.toml``:
.. code:: rust

    #!/usr/bin/env cargo

    //! ```cargo
    //! [dependencies]
    //! regex = "1.8.0"
    //! ```

    use regex::Regex;

    fn main() {
        let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
        println!("Did our date match? {}", re.is_match("2014-01-01"));
    }

One important thing to note is that the metadata is embedded in a
`doc-comment`_ (their equivalent of docstrings).
`Other syntaxes <cargo embedded manifest_>`_ are under consideration
within the Rust project,
including attributes, which are roughly a
syntactically recognized equivalent of dunder variables.
The key difference between that approach and this PEP is that
any valid Rust syntax would be allowed,
so reading the metadata would require a Rust syntax parser such as `syn`__.
__ https://crates.io/crates/syn
We argue that our choice, in comparison to the `doc-comment`_ approach,
is easier for humans to read and edit because the contents start at the
beginning of each line and therefore precisely match
the contents of a ``pyproject.toml`` file.
It is also easier for tools to parse and modify this contiguous block
of text, which was `one of the concerns <cargo embedded manifest_>`_
raised in the Rust pre-RFC.
Reference Implementation
========================
The following is an example of how to read the metadata on Python 3.11 or
higher.
.. code:: python

    import re
    import tomllib

    REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'


    def read(script: str) -> dict | None:
        matches = list(re.finditer(REGEX, script))
        if len(matches) > 1:
            raise ValueError('Multiple __pyproject__ definitions found')
        elif len(matches) == 1:
            # Group 1 holds the TOML document between the triple quotes.
            return tomllib.loads(matches[0].group(1))
        else:
            return None

Tools such as package managers or dependency update automation in CI will
often edit dependencies. The following is a crude example of modifying the
content using the ``tomlkit`` library.
.. code:: python

    import re
    import tomlkit


    def add(script: str, dependency: str) -> str:
        match = re.search(r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$', script)
        if match is None:
            raise ValueError('No __pyproject__ definition found')
        config = tomlkit.parse(match.group(1))
        config['project']['dependencies'].append(dependency)
        # Replace only the TOML document, leaving the rest of the script as is.
        start, end = match.span(1)
        return script[:start] + tomlkit.dumps(config) + script[end:]

Note that this example used a library that preserves TOML formatting. This is
not a requirement for editing by any means but rather is a "nice to have"
feature.
Backwards Compatibility
=======================
At the time of writing, the ``__pyproject__`` variable only appears five times
`on GitHub`__ and four of those belong to a user who appears to already be
using this PEP's exact format.
__ https://github.com/search?q=__pyproject__&type=code
For example, `this script`__ uses ``matplotlib`` and ``pandas`` to plot a
timeseries. It is a good example of a script that you would see in the wild:
self-contained and short.
__ https://github.com/cjolowicz/scripts/blob/31c61e7dad8d17e0070b080abee68f4f505da211/python/plot_timeseries.py
This user's tooling invokes scripts by creating a project at runtime using the
embedded metadata and then uses an entry point that references the main
function.
This PEP allows this user's tooling to remove that extra step of indirection.
After writing a draft, this PEP's author discovered (via private messages) that
this pattern is used in the wild by others.
Security Implications
=====================
If a script containing embedded metadata is run using a tool that automatically
installs dependencies, this could cause arbitrary code to be downloaded and
installed in the user's environment.
The risk here is part of the functionality of the tool being used to run the
script, and as such should already be addressed by the tool itself. The only
additional risk introduced by this PEP is that running an untrusted script with
embedded metadata might cause a potentially malicious dependency to be
installed.
This risk is addressed by the normal good practice of reviewing code
before running it. Additionally, tools may be able to provide locking
functionality when configured by their ``[tool]`` sub-table to, for example,
add the resolution result as managed metadata somewhere in the script (this
is what Go's ``gorun`` can do).
How to Teach This
=================
Since the format chosen is the same as the official metadata standard, we can
have a page that describes how to embed the metadata in scripts and, for
learning about the metadata itself, directs users to the living document that
describes `project metadata <pyproject metadata_>`_.
We will document that the name and version fields in the ``[project]`` table
may be elided for simplicity. Additionally, we will have guidance explaining
that single-file scripts cannot (yet) be built into a wheel via standard means.
We will explain that it is up to individual tools whether or not their behavior
is altered based on the embedded metadata. For example, not every script runner
will be able to provide an environment for specific Python versions as defined
by the ``requires-python`` field.
Finally, we may want to list some tools that support this PEP's format.
Recommendations
===============
Tools that support managing different versions of Python should attempt to use
the highest available version of Python that is compatible with the script's
``requires-python`` metadata, if defined.
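
A minimal sketch of this recommendation follows. It assumes that the tool
already has a list of installed interpreter versions (the ``available``
argument), and it deliberately supports only plain numeric versions with the
``>=``, ``<=``, ``==``, ``!=``, ``>`` and ``<`` operators. A real tool would
more likely rely on a complete version specifier implementation such as the
``packaging`` library. All function names are hypothetical.

.. code:: python

    import operator

    # Two-character operators come first so ">=" is not mistaken for ">".
    _OPS = {
        ">=": operator.ge,
        "<=": operator.le,
        "==": operator.eq,
        "!=": operator.ne,
        ">": operator.gt,
        "<": operator.lt,
    }


    def _parse(version: str) -> tuple[int, ...]:
        parts = [int(part) for part in version.split(".")]
        # Pad to three components so "3.11" compares equal to "3.11.0".
        return tuple(parts + [0] * (3 - len(parts)))


    def _satisfies(version: str, requires_python: str) -> bool:
        for clause in requires_python.split(","):
            clause = clause.strip()
            if not clause:
                continue
            for symbol, compare in _OPS.items():
                if clause.startswith(symbol):
                    bound = clause[len(symbol):].strip()
                    if not compare(_parse(version), _parse(bound)):
                        return False
                    break
            else:
                raise ValueError(f"Unsupported clause: {clause!r}")
        return True


    def pick_python(available: list[str], requires_python: str | None) -> str | None:
        # Without a requires-python value, every interpreter is a candidate.
        candidates = [
            version
            for version in available
            if requires_python is None or _satisfies(version, requires_python)
        ]
        return max(candidates, key=_parse, default=None)

With interpreters ``3.9.18``, ``3.11.4`` and ``3.12.0`` available, a script
declaring ``>=3.9,<3.12`` would be run with ``3.11.4``. When no interpreter
is compatible, the function returns ``None`` and the tool can report an error.
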
For projects that have large multi-line external metadata to embed like a
README file, it is recommended that they become directories with a
``pyproject.toml`` file. While embedding such content is technically allowed,
having large chunks of multi-line metadata is strongly discouraged and indicates
that a script has graduated to a more traditional layout.
If the content is small, for example in the case of internal packages, it is
recommended that multi-line *single-quoted* TOML strings (``'''``) be used.
For example:
.. code:: python

    __pyproject__ = """
    [project]
    readme.content-type = "text/markdown"
    readme.text = '''
    # Some Project
    Please refer to our corporate docs
    for more information.
    '''
    """

Tooling buy-in
==============
The following is a list of tools that have expressed support for this PEP or
have committed to implementing support should it be accepted:
* `Pantsbuild and Pex <https://discuss.python.org/t/31151/15>`__: expressed
support for any way to define dependencies and also features that this PEP
considers as valid use cases such as building packages from scripts and
embedding tool configuration
* `Mypy <https://discuss.python.org/t/31151/16>`__ and
`Ruff <https://discuss.python.org/t/31151/42>`__: strongly expressed support
for embedding tool configuration as it would solve existing pain points for
users
* `Hatch <https://discuss.python.org/t/31151/53>`__: (author of this PEP)
expressed support for all aspects of this PEP, and will be one of the first
tools to support running scripts with specifically configured Python versions
Rejected Ideas
==============
Why not limit to specific metadata fields?
------------------------------------------
By limiting the metadata to a specific set of fields, for example just
``dependencies``, we would prevent legitimate use cases both known and unknown.
The following are examples of known use cases:
* ``requires-python``: For tools that support managing Python installations,
this allows users to target specific versions of Python for new syntax
or standard library functionality.
* ``version``: It is quite common to version scripts for persistence even when
using a VCS like Git. When not using a VCS, versioning is even more common.
For example, the author has been in multiple time-sensitive debugging sessions
with customers where, due to the air-gapped nature of the environment, the only
way to transfer the script was via email or copying and pasting it into a
chat window. In these cases, versioning is invaluable to ensure that the
customer is using the latest (or a specific) version of the script.
* ``description``: For scripts that don't need an argument parser, or if the
author has never used one, tools can treat this as help text which can be
shown to the user.
By not allowing the ``[tool]`` section, we would prevent tools, especially
script runners, from allowing users to configure behavior. For example, a script
runner may support configuration instructing it to run scripts in containers for
situations in which there is no cross-platform support for a dependency or if
the setup is too complex for the average user like when requiring Nvidia
drivers. Situations like this would allow users to proceed with what they want
to do whereas otherwise they may stop at that point altogether.
.. _723-comment-block:
Why not use a comment block resembling requirements.txt?
--------------------------------------------------------
This PEP considers there to be different types of users for whom Python code
would live as single-file scripts:
* Non-programmers who are just using Python as a scripting language to achieve
a specific task. These users are unlikely to be familiar with concepts of
operating systems like shebang lines or the ``PATH`` environment variable.
Some examples:
* The average person, perhaps at a workplace, who wants to write a script to
automate something for efficiency or to reduce tedium
* Someone doing data science or machine learning in industry or academia who
wants to write a script to analyze some data or for research purposes.
These users are special in that, although they have limited programming
knowledge, they learn from sources like StackOverflow and blogs that have a
programming bent and are increasingly likely to be part of communities that
share knowledge and code. Therefore, a non-trivial number of these users
will have some familiarity with things like Git(Hub), Jupyter, HuggingFace,
etc.
* Non-programmers who manage operating systems e.g. a sysadmin. These users are
able to set up ``PATH``, for example, but are unlikely to be familiar with
Python concepts like virtual environments. These users often operate in
isolation and have limited need to gain exposure to tools intended for
sharing like Git.
* Programmers who manage operating systems/infrastructure e.g. SREs. These
users are not very likely to be familiar with Python concepts like virtual
environments, but are likely to be familiar with Git and most often use it
to version control everything required to manage infrastructure like Python
scripts and Kubernetes config.
* Programmers who write scripts primarily for themselves. These users over time
accumulate a great number of scripts in various languages that they use to
automate their workflow and often store them in a single directory, that is
potentially version controlled for persistence. Non-Windows users may set
up each Python script with a shebang line pointing to the desired Python
executable or script runner.
This PEP argues that reusing our TOML-based metadata format is the best for
each category of user and that the block comment is only approachable for
those who have familiarity with ``requirements.txt``, which represents a
small subset of users.
* For the average person automating a task or the data scientist, they are
already starting with zero context and are unlikely to be familiar with
TOML or ``requirements.txt``. These users will very likely rely on
snippets found online via a search engine or utilize AI in the form
of a chat bot or direct code completion software. Searching for Python
metadata formatting will lead them to the TOML-based format that already
exists which they can reuse. The author tested GitHub Copilot with this
PEP and it already supports auto-completion of fields and dependencies.
In contrast, a new format may take years of being trained on the Internet
for models to learn.
Additionally, these users are most susceptible to formatting quirks and
syntax errors. TOML is a well-defined format with existing online
validators that features assignment that is compatible with Python
expressions and has no strict indenting rules. The block comment format
on the other hand could be easily malformed by forgetting the colon, for
example, and debugging why it's not working with a search engine would be
a difficult task for such a user.
* For the sysadmin types, they are as unlikely as the previously described
users to be familiar with TOML or ``requirements.txt``. For either format
they would have to read documentation. They would likely be more comfortable
with TOML since they are used to structured data formats and there would be
less perceived magic in their systems.
Additionally, for maintenance of their systems ``__pyproject__`` would be
much easier to search for from a shell than a block comment with potentially
numerous extensions over time.
* For the SRE types, they are likely to be familiar with TOML already from
other projects that they might have to work with like configuring the
`GitLab Runner`__ or `Cloud Native Buildpacks`__.
__ https://docs.gitlab.com/runner/configuration/advanced-configuration.html
__ https://buildpacks.io/docs/reference/config/
These users are responsible for the security of their systems and most likely
have security scanners set up to automatically open PRs to update versions
of dependencies. Such automated tools like Dependabot would have a much
easier time using existing TOML libraries than writing their own custom
parser for a block comment format.
* For the programmer types, they are more likely to be familiar with TOML
than to have ever seen a ``requirements.txt`` file, unless they are a
Python programmer who has had previous experience with writing applications.
In the case of experience with the requirements format, it necessarily means
that they are at least somewhat familiar with the ecosystem and therefore
it is safe to assume they know what TOML is.
Another benefit of this PEP to these users is that their IDEs like Visual
Studio Code would be able to provide TOML syntax highlighting much more
easily than if each had to write custom logic for this feature.
Additionally, the original block comment alternative format went against
the recommendation of :pep:`8`, so linters and IDE auto-formatters
that respected the recommendation would
`fail by default <https://discuss.python.org/t/29905/247>`__. As a result, the
final proposal uses standard comments starting with a single ``#`` character.
The concept of regular comments that do not appear to be intended for machines
(unlike, for example, `encoding declarations`__) affecting behavior would not be customary to
users of Python and goes directly against the "explicit is better than
implicit" foundational principle.
__ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations
Users typing what to them looks like prose could alter runtime behavior. This
PEP takes the view that the possibility of that happening, even when a tool
has been set up as such (maybe by a sysadmin), is unfriendly to users.
Finally, and critically, the alternatives to this PEP like :pep:`722` do not
satisfy the use cases enumerated herein, such as setting the supported Python
versions, the eventual building of scripts into packages, and the ability to
have machines edit metadata on behalf of users. It is very likely that
requests for such features will persist, and it is conceivable that another PEP
in the future would allow for the embedding of such metadata. At that point there
would be multiple ways to achieve the same thing which goes against our
foundational principle of "there should be one - and preferably only one -
obvious way to do it".
Why not consider scripts as projects without wheels?
----------------------------------------------------
There is `an ongoing discussion <pyproject without wheels_>`_ about how to
use ``pyproject.toml`` for projects that are not intended to be built as
wheels. This PEP considers the discussion only tangentially related.
The use case described in that thread is primarily talking about projects that
represent applications like a Django app or a Flask app. These projects are
often installed on each server in a virtual environment with strict dependency
pinning e.g. a lock file with some sort of hash checking. Such projects would
never be distributed as a wheel (except for maybe a transient editable one
that is created when doing ``pip install -e .``).
In contrast, scripts are managed loosely by their runners and would almost
always have relaxed dependency constraints. Additionally, there may be a future
in which there is `a standard way <723-limit-build-backend_>`_ to ship projects
that are in the form of a single file.
.. _723-limit-build-backend:
Why not limit build backend behavior?
-------------------------------------
A previous version of this PEP proposed that the ``[build-system]`` table
mustn't be defined. The rationale was that builds would never occur so it
did not make sense to allow this section.
We removed that limitation based on
`feedback <https://discuss.python.org/t/31151/9>`__ stating that there
are already tools that exist in the wild that build wheels and source
distributions from single files.
The author of the Rust RFC for embedding metadata
`mentioned to us <https://discuss.python.org/t/29905/179>`__ that they are
actively looking into that as well based on user feedback saying that there
is unnecessary friction with managing small projects, which we have also
heard in the Python community.
There has been `a commitment <https://discuss.python.org/t/31151/15>`__ to
support this by at least one major build system.
Why not limit tool behavior?
----------------------------
A previous version of this PEP proposed that non-script running tools SHOULD
NOT modify their behavior when the script is not the sole input to the tool.
For example, if a linter is invoked with the path to a directory, it SHOULD
behave the same as if zero files had embedded metadata.
This was done as a precaution to avoid tool behavior confusion and generating
various feature requests for tools to support this PEP. However, during
discussion we received `feedback <https://discuss.python.org/t/31151/16>`__
from maintainers of tools that this would be undesirable and potentially
confusing to users. Additionally, this may allow for a universally easier
way to configure tools in certain circumstances and solve existing issues.
Why not accept all valid Python expression syntax?
--------------------------------------------------
There has been a suggestion that we should not restrict how the
``__pyproject__`` variable is defined and we should parse the abstract syntax
tree. For example:
.. code:: python

    __pyproject__ = (
        """
        [project]
        dependencies = []
        """
    )

We will not be doing this, so that tools written in any language are able to
read the metadata without depending on knowledge of every version of Python.
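
The practical consequence can be seen with a short experiment, sketched here
under the assumption that a tool uses the canonical regular expression from
the specification. The standard form is found, while the parenthesized form
is not, even though both are valid Python.

.. code:: python

    import re

    REGEX = r'(?ms)^__pyproject__ *= *"""\\?$(.+?)^"""$'

    canonical = '''__pyproject__ = """
    [project]
    dependencies = []
    """
    '''.replace("\n    ", "\n")

    parenthesized = '''__pyproject__ = (
        """
        [project]
        dependencies = []
        """
    )
    '''

    # The canonical form matches; the parenthesized form does not, because
    # the opening quotes are not on the same line as the assignment.
    found_canonical = re.search(REGEX, canonical) is not None
    found_parenthesized = re.search(REGEX, parenthesized) is not None

A tool in any language can reproduce this check with its own regular
expression engine, without needing a Python parser.
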
Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------
Again, a key issue here is that the target audience for this proposal is people
writing scripts which aren't intended for distribution. Sometimes scripts will
be "shared", but this is far more informal than "distribution" - it typically
involves sending a script via an email with some written instructions on how to
run it, or passing someone a link to a GitHub gist.
Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".
In addition, if the expectation here is that the ``pyproject.toml`` will
somehow be designed for running scripts in place, that's a new feature of the
standard that doesn't currently exist. At a minimum, this isn't a reasonable
suggestion until the `current discussion on Discourse
<pyproject without wheels_>`_ about using ``pyproject.toml`` for projects that
won't be distributed as wheels is resolved. And even then, it doesn't address
the "sending someone a script in a gist or email" use case.
Why not infer the requirements from import statements?
------------------------------------------------------
The idea would be to automatically recognize ``import`` statements in the source
file and turn them into a list of requirements.
However, this is infeasible for several reasons. First, the points above about
the necessity to keep the syntax easily parsable, for all Python versions, also
by tools written in other languages, apply equally here.
Second, PyPI and other package repositories conforming to the Simple Repository
API do not provide a mechanism to resolve package names from the module names
that are imported (see also `this related discussion`__).
__ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
Third, even if repositories did offer this information, the same import name may
correspond to several packages on PyPI. One might object that disambiguating
which package is wanted would only be needed if there are several projects
providing the same import name. However, this would make it easy for anyone to
unintentionally or malevolently break working scripts, by uploading a package to
PyPI providing an import name that is the same as an existing project. The
alternative where, among the candidates, the first package to have been
registered on the index is chosen, would be confusing in case a popular package
is developed with the same import name as an existing obscure package, and even
harmful if the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being reused.
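
To make the second and third points concrete, the following sketch collects
the top-level module names that a script imports. The helper
``imported_top_level_names`` is hypothetical. Note that it relies on the
``ast`` module and is therefore tied to the Python version running it, which
is the first problem described above.

.. code:: python

    import ast


    def imported_top_level_names(source: str) -> set[str]:
        names = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Relative imports (level > 0) refer to local code and are skipped.
                names.add(node.module.split(".")[0])
        return names


    print(sorted(imported_top_level_names("import yaml\nfrom PIL import Image\n")))
    # ['PIL', 'yaml']

The names ``yaml`` and ``PIL`` are provided by the PyPI projects ``PyYAML``
and ``Pillow`` respectively, so the collected names alone do not identify
what to install.
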
A related idea would be to attach the requirements as comments to the import
statements instead of gathering them in a block, with a syntax such as::

    import numpy as np # requires: numpy
    import rich # requires: rich

This still suffers from parsing difficulties. Also, where to place the comment
in the case of multiline imports is ambiguous and may look ugly::

    from PyQt5.QtWidgets import (
        QCheckBox, QComboBox, QDialog, QDialogButtonBox,
        QGridLayout, QLabel, QSpinBox, QTextEdit
    ) # requires: PyQt5

Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider::

    import platform
    if platform.system() == "Windows":
        import pywin32 # requires: pywin32

Here, the user's intent is that the package is only required on Windows, but
this cannot be understood by the script runner (the correct way to write
it would be ``requires: pywin32 ; sys_platform == 'win32'``).
(Thanks to Jean Abou-Samra for the clear discussion of this point)
Why not use a requirements file for dependencies?
-------------------------------------------------
Putting your requirements in a requirements file doesn't require a PEP. You
can do that right now, and in fact it's quite likely that many ad hoc solutions
do this. However, without a standard, there's no way of knowing how to locate a
script's dependency data. And furthermore, the requirements file format is
pip-specific, so tools relying on it are depending on a pip implementation
detail.
So in order to make a standard, two things would be required:
1. A standardised replacement for the requirements file format.
2. A standard for how to locate the requirements file for a given script.
The first item is a significant undertaking. It has been discussed on a number
of occasions, but so far no-one has attempted to actually do it. The most
likely approach would be for standards to be developed for individual use cases
currently addressed with requirements files. One option here would be for this
PEP to define a new file format which is simply a text file containing
:pep:`508` requirements, one per line. That would just leave the question of
how to locate that file.
The "obvious" solution here would be to do something like name the file the
same as the script, but with a ``.reqs`` extension (or something similar).
However, this still requires *two* files, where currently only a single file is
needed, and as such, does not match the "better batch file" model (shell
scripts and batch files are typically self-contained). It requires the
developer to remember to keep the two files together, and this may not always
be possible. For example, system administration policies may require that *all*
files in a certain directory are executable (the Linux filesystem standards
require this of ``/usr/bin``, for example). And some methods of sharing a
script (for example, publishing it on a text file sharing service like GitHub's
gist, or a corporate intranet) may not allow for deriving the location of an
associated requirements file from the script's location (tools like ``pipx``
support running a script directly from a URL, so "download and unpack a zip of
the script and its dependencies" may not be an appropriate requirement).
Essentially, though, the issue here is that there is an explicitly stated
requirement that the format supports storing dependency data *in the script
file itself*. Solutions that don't do that are simply ignoring that
requirement.
Why not use (possibly restricted) Python syntax?
------------------------------------------------
This would typically involve storing metadata as multiple special variables,
such as the following.
.. code:: python

    __requires_python__ = ">=3.11"
    __dependencies__ = [
        "requests",
        "click",
    ]

The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if the
syntax is restricted, the *rest* of the script will use the full Python syntax,
and trying to define a syntax which can be successfully parsed in isolation
from the surrounding code is likely to be extremely difficult and error-prone.
Furthermore, Python's syntax changes in every release. If extracting dependency
data needs a Python parser, the parser will need to know which version of
Python the script is written for, and the overhead for a generic tool of having
a parser that can handle *multiple* versions of Python is unsustainable.
With this approach there is the potential to clutter scripts with many
variables as new extensions get added. Additionally, intuiting which metadata
fields correspond to which variable names would cause confusion for users.
It is worth noting, though, that the ``pip-run`` utility does implement (an
extended form of) this approach. `Further discussion <pip-run issue_>`_ of
the ``pip-run`` design is available on the project's issue tracker.
What about local dependencies?
------------------------------
These can be handled without needing special metadata and tooling, simply by
adding the location of the dependencies to ``sys.path``. This PEP simply isn't
needed for this case. If, on the other hand, the "local dependencies" are
actual distributions which are published locally, they can be specified as
usual with a :pep:`508` requirement, and the local package index specified when
running a tool by using the tool's UI for that.
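
As an illustration of the ``sys.path`` approach, a script could make a
sibling directory importable as sketched below. The ``lib`` directory name
and the ``add_local_path`` helper are assumptions made for this example.

.. code:: python

    import sys
    from pathlib import Path


    def add_local_path(script_file: str, subdir: str = "lib") -> str:
        # Resolve the directory relative to the script, not the working directory.
        location = str(Path(script_file).resolve().parent / subdir)
        if location not in sys.path:
            sys.path.insert(0, location)
        return location

A script would call ``add_local_path(__file__)`` before importing its local
modules.
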
Open Issues
===========
None at this point.
References
==========
.. _pyproject metadata: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
.. _doc-comment: https://doc.rust-lang.org/stable/book/ch14-02-publishing-to-crates-io.html#making-useful-documentation-comments
.. _cargo embedded manifest: https://github.com/epage/cargo-script-mvs/blob/main/0000-cargo-script.md#embedded-manifest-format
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
Footnotes
=========
.. [1] A large number of users use scripts that are version controlled. For
example, `the SREs that were mentioned <723-comment-block_>`_ or
projects that require special maintenance like the
`AWS CLI <https://github.com/aws/aws-cli/tree/4393dcdf044a5275000c9c193d1933c07a08fdf1/scripts>`__
or `Calibre <https://github.com/kovidgoyal/calibre/tree/master/setup>`__.
.. [2] For example, projects like Hatch and Poetry have their own backends
and may wish to support this use case only when their backend is used.
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.