PEP: 723
Title: Embedding pyproject.toml in single-file scripts
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Adam Turner <python@quite.org.uk>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/31151
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 04-Aug-2023
Post-History: `04-Aug-2023 <https://discuss.python.org/t/30979>`__,
              `06-Aug-2023 <https://discuss.python.org/t/31151>`__,
              `23-Aug-2023 <https://discuss.python.org/t/32149>`__,
Replaces: 722


Abstract
========

This PEP specifies a metadata format that can be embedded in single-file Python
scripts to assist launchers, IDEs and other external tools which may need to
interact with such scripts.


Motivation
==========

Python is routinely used as a scripting language, with Python scripts as a
(better) alternative to shell scripts, batch files, etc. When Python code is
structured as a script, it is usually stored as a single file and does not
expect the availability of any other local code that may be used for imports.
As such, it is possible to share with others over arbitrary text-based means
such as email, a URL to the script, or even a chat window. Code that is
structured like this may live as a single file forever, never becoming a
full-fledged project with its own directory and ``pyproject.toml`` file.

An issue that users encounter with this approach is that there is no standard
mechanism to define metadata for tools whose job it is to execute such scripts.
For example, a tool that runs a script may need to know which dependencies are
required or the supported version(s) of Python.

There is currently no standard tool that addresses this issue, and this PEP
does *not* attempt to define one. However, any tool that *does* address this
issue will need to know what the runtime requirements of scripts are. By
defining a standard format for storing such metadata, existing tools, as well
as any future tools, will be able to obtain that information without requiring
users to include tool-specific metadata in their scripts.


Rationale
=========

This PEP defines a mechanism for embedding metadata *within the script itself*,
and not in an external file.

We choose to follow the latest developments of other modern packaging
ecosystems (namely `Go`__ and provisionally `Rust`__) by embedding the existing
file used to describe projects, in our case ``pyproject.toml``.

__ https://github.com/erning/gorun
__ https://rust-lang.github.io/rfcs/3424-cargo-script.html

The format is intended to bridge the gap between different types of users
of Python. Users will benefit from seamless interoperability with tools that
already work with TOML.

One of the central themes we discovered from the recent
`packaging survey <https://discuss.python.org/t/22420>`__ is that users have
begun getting frustrated with the lack of unification regarding both tooling
and specs. Adding yet another metadata format (like :pep:`722` syntax for a
list of dependencies), even for a currently unsatisfied use case, would
further fragment the community.

The following are some of the use cases that this PEP wishes to support:

* A user-facing CLI that is capable of executing scripts. If we take Hatch as
  an example, the interface would be simply
  ``hatch run /path/to/script.py [args]`` and Hatch will manage the
  environment for that script. Such tools could be used as shebang lines on
  non-Windows systems e.g. ``#!/usr/bin/env hatch run``.
* A script that desires to transition to a directory-type project. A user may
  be rapidly prototyping locally or in a remote REPL environment and then
  decide to transition to a more formal project layout if their idea works
  out. This intermediate script stage would be very useful for producing fully
  reproducible bug reports. By using the same format, the user can simply copy
  and paste the metadata into a ``pyproject.toml`` file and continue working
  without having to learn a new format. More likely, even, is that tooling will
  eventually support this transformation with a single command.
* Users that wish to avoid manual dependency management. For example, package
  managers that have commands to add/remove dependencies or dependency update
  automation in CI that triggers based on new versions or in response to
  CVEs [1]_.


Specification
=============

This PEP defines a metadata comment block format loosely inspired [2]_ by
`reStructuredText Directives`__.

__ https://docutils.sourceforge.io/docs/ref/rst/directives.html

Any Python script may have top-level comment blocks that start with the line
``# /// TYPE``, where ``TYPE`` determines how to process the content, and end
with the line ``# ///``. Every line between these two lines MUST be a comment
starting with ``#``. If there are characters after the ``#`` then the first
character MUST be a space. The embedded content is formed by taking away the
first two characters of each line if the second character is a space, otherwise
just the first character (which means the line consists of only a single
``#``).

When there are multiple comment blocks of the same ``TYPE`` defined, tools MUST
produce an error.

Tools reading embedded metadata MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as UTF-8.

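As a rough illustration of the encoding rule above, a tool could lean on the
standard library's ``tokenize.open``, which honors an encoding declaration (or
a UTF-8 BOM) and otherwise falls back to UTF-8. The helper name
``read_script_text`` and its ``respect_declaration`` flag are hypothetical and
exist only for this sketch; they are not part of this PEP.

.. code:: python

    import tokenize
    from pathlib import Path


    def read_script_text(path: str, respect_declaration: bool = True) -> str:
        """Return the text of a script for metadata parsing (hypothetical helper)."""
        if respect_declaration:
            # tokenize.open() detects an encoding declaration or a UTF-8 BOM
            # and defaults to UTF-8 when neither is present.
            with tokenize.open(path) as f:
                return f.read()
        # Tools that ignore the declaration MUST process the file as UTF-8.
        return Path(path).read_text(encoding='utf-8')
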
This is the canonical regular expression that MAY be used to parse the
metadata:

.. code:: text

    (?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$

In circumstances where there is a discrepancy between the text specification
and the regular expression, the text specification takes precedence.

Tools MUST NOT read from metadata blocks with types that have not been
standardized by this PEP or future ones.

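The rules above (strip the comment prefix, reject duplicate blocks, ignore
unknown types) can be sketched as follows. This is only one possible reading:
the names ``collect_blocks`` and ``KNOWN_TYPES`` are hypothetical, and the
choice to skip unknown types *before* checking for duplicates is an assumption
of this sketch rather than something the text mandates.

.. code:: python

    import re

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

    # Hypothetical registry; ``pyproject`` is the only type defined by this PEP.
    KNOWN_TYPES = {'pyproject'}


    def collect_blocks(script: str) -> dict[str, str]:
        """Map each standardized block type to its embedded content."""
        blocks: dict[str, str] = {}
        for match in re.finditer(REGEX, script):
            block_type = match.group('type')
            if block_type not in KNOWN_TYPES:
                # Tools MUST NOT read from blocks with unknown types.
                continue
            if block_type in blocks:
                raise ValueError(f'Multiple {block_type} blocks found')
            # Drop '# ' from each line, or just '#' for otherwise empty lines.
            blocks[block_type] = ''.join(
                line[2:] if line.startswith('# ') else line[1:]
                for line in match.group('content').splitlines(keepends=True)
            )
        return blocks
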
pyproject type
--------------

The first type of metadata block is named ``pyproject`` which represents
content similar to [3]_ what one would see in a ``pyproject.toml`` file.

This document MAY include the ``[run]`` and ``[tool]`` tables.

The :pep:`tool table <518#tool-table>` MAY be used by any tool, script runner
or otherwise, to configure behavior.

The ``[run]`` table MAY include the following optional fields:

* ``dependencies``: A list of strings that specifies the runtime dependencies
  of the script. Each entry MUST be a valid :pep:`508` dependency.
* ``requires-python``: A string that specifies the Python version(s) with which
  the script is compatible. The value of this field MUST be a valid
  :pep:`version specifier <440#version-specifiers>`.
* ``version``: A string that specifies the version of the script. The value of
  this field MUST be a valid :pep:`440` version.

Any future PEPs that define additional fields for the ``[run]`` table when used
in a ``pyproject.toml`` file MUST include the aforementioned fields exactly as
specified. The fields defined by this PEP are equally applicable to
full-fledged projects and to single-file scripts.

Script runners MUST error if the specified ``dependencies`` cannot be provided.
Script runners SHOULD error if no version of Python that satisfies the specified
``requires-python`` can be provided.

Example
-------

The following is an example of a script with an embedded ``pyproject.toml``:

.. code:: python

    # /// pyproject
    # [run]
    # requires-python = ">=3.11"
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # ///

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

The following [4]_ is an example of a proposed syntax for single-file Rust
projects that embeds their equivalent of ``pyproject.toml``, which is called
``Cargo.toml``:

.. code:: rust

    #!/usr/bin/env cargo

    //! ```cargo
    //! [dependencies]
    //! regex = "1.8.0"
    //! ```

    fn main() {
        let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
        println!("Did our date match? {}", re.is_match("2014-01-01"));
    }

Reference Implementation
========================

The following is an example of how to read the metadata on Python 3.11 or
higher.

.. code:: python

    import re
    import tomllib

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'


    def read(script: str) -> dict | None:
        name = 'pyproject'
        matches = list(
            filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
        )
        if len(matches) > 1:
            raise ValueError(f'Multiple {name} blocks found')
        elif len(matches) == 1:
            # Strip the leading '# ' (or a bare '#') from every line so that
            # only the embedded TOML document is handed to the parser.
            content = ''.join(
                line[2:] if line.startswith('# ') else line[1:]
                for line in matches[0].group('content').splitlines(keepends=True)
            )
            return tomllib.loads(content)
        else:
            return None

Tools such as package managers or dependency update automation in CI will
often edit dependencies. The following is a crude example of modifying the
content using the ``tomlkit`` library__.

__ https://tomlkit.readthedocs.io/en/latest/

.. code:: python

    import re

    import tomlkit

    REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'


    def add(script: str, dependency: str) -> str:
        match = re.search(REGEX, script)
        # Recover the embedded TOML by stripping the comment prefix of each line.
        content = ''.join(
            line[2:] if line.startswith('# ') else line[1:]
            for line in match.group('content').splitlines(keepends=True)
        )

        config = tomlkit.parse(content)
        # Dependencies live in the [run] table defined by this PEP.
        config['run']['dependencies'].append(dependency)
        # Re-apply the comment prefix; empty lines become a bare '#'.
        new_content = ''.join(
            f'# {line}' if line.strip() else f'#{line}'
            for line in tomlkit.dumps(config).splitlines(keepends=True)
        )

        start, end = match.span('content')
        return script[:start] + new_content + script[end:]

Note that this example used a library that preserves TOML formatting. This is
not a requirement for editing by any means but rather is a "nice to have"
feature.


Backwards Compatibility
=======================

At the time of writing, the ``# /// pyproject`` block comment starter does not
appear `on GitHub`__. Therefore, there is little risk of existing scripts being
broken by this PEP.

__ https://github.com/search?q=%22%23+%2F%2F%2F+pyproject%22&type=code


Security Implications
=====================

If a script containing embedded metadata is run using a tool that automatically
installs dependencies, this could cause arbitrary code to be downloaded and
installed in the user's environment.

The risk here is part of the functionality of the tool being used to run the
script, and as such should already be addressed by the tool itself. The only
additional risk introduced by this PEP is if an untrusted script with
embedded metadata is run, when a potentially malicious dependency or transitive
dependency might be installed.

This risk is addressed by the normal good practice of reviewing code
before running it. Additionally, tools may be able to provide
`locking functionality <723-tool-configuration_>`__ to ameliorate this risk.


How to Teach This
=================

To embed metadata in a script, define a comment block that starts with the
line ``# /// pyproject`` and ends with the line ``# ///``. Every line between
those two lines must be a comment and the full content is derived by removing
the first two characters of each line. The ``pyproject`` type indicates that
the content is TOML and resembles a ``pyproject.toml`` file.

.. code:: python

    # /// pyproject
    # [run]
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # requires-python = ">=3.11"
    # version = "0.1.0"
    # ///

The two allowed tables are ``[run]`` and ``[tool]``. The ``[run]`` table may
contain the following fields:

.. list-table::

   * - Field
     - Description
     - Tool behavior

   * - ``dependencies``
     - A list of strings that specifies the runtime dependencies of the script.
       Each entry must be a valid :pep:`508` dependency.
     - Tools will error if the specified dependencies cannot be provided.

   * - ``requires-python``
     - A string that specifies the Python version(s)
       with which the script is compatible.
       The value of this field must be a valid
       :pep:`version specifier <440#version-specifiers>`.
     - Tools might error if no version of Python that satisfies
       the constraint can be executed.

   * - ``version``
     - A string that specifies the version of the script.
       The value of this field must be a valid :pep:`440` version.
     - Tools may use this however they wish, if defined.

It is up to individual tools whether or not their behavior is altered based on
the embedded metadata. For example, not every script runner will be able to
provide an environment for specific Python versions as defined by the
``requires-python`` field.

The :pep:`tool table <518#tool-table>` may be used by any tool, script runner
or otherwise, to configure behavior.


Recommendations
===============

Tools that support managing different versions of Python should attempt to use
the highest available version of Python that is compatible with the script's
``requires-python`` metadata, if defined.

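A minimal sketch of this recommendation follows. It is deliberately
simplified: it understands only comma-separated clauses built from ``>=``,
``<=``, ``>``, ``<``, ``==`` and ``!=`` with plain numeric versions, whereas a
real tool would more likely rely on a complete :pep:`440` implementation such
as ``packaging.specifiers.SpecifierSet``. The function names ``satisfies`` and
``pick_python`` are hypothetical.

.. code:: python

    import operator
    import re

    _OPS = {
        '>=': operator.ge, '<=': operator.le, '==': operator.eq,
        '!=': operator.ne, '>': operator.gt, '<': operator.lt,
    }
    _CLAUSE = re.compile(r'^\s*(>=|<=|==|!=|>|<)\s*(\d+(?:\.\d+)*)\s*$')


    def _parse(version: str) -> tuple[int, ...]:
        return tuple(int(part) for part in version.split('.'))


    def _pad(release: tuple[int, ...], width: int) -> tuple[int, ...]:
        # Pad with zeros so that 3.11 compares equal to 3.11.0.
        return release + (0,) * (width - len(release))


    def satisfies(version: str, requires_python: str) -> bool:
        """Check a numeric version against a simplified specifier string."""
        candidate = _parse(version)
        for clause in requires_python.split(','):
            m = _CLAUSE.match(clause)
            if m is None:
                raise ValueError(f'Unsupported clause in this sketch: {clause!r}')
            compare, bound = _OPS[m.group(1)], _parse(m.group(2))
            width = max(len(candidate), len(bound))
            if not compare(_pad(candidate, width), _pad(bound, width)):
                return False
        return True


    def pick_python(available: list[str], requires_python: str | None) -> str | None:
        """Return the highest compatible version, or None if there is none."""
        compatible = [
            v for v in available
            if requires_python is None or satisfies(v, requires_python)
        ]
        return max(compatible, key=_parse, default=None)

For example, given the interpreters ``3.10.13``, ``3.11.8`` and ``3.12.2``, a
script declaring ``requires-python = ">=3.11"`` would be run with ``3.12.2``
under this sketch.

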
Tooling buy-in
==============

The following is a list of tools that have expressed support for this PEP or
have committed to implementing support should it be accepted:

* `Pantsbuild and Pex <https://discuss.python.org/t/31151/15>`__: expressed
  support for any way to define dependencies and also features that this PEP
  considers as valid use cases such as building packages from scripts and
  embedding tool configuration
* `Mypy <https://discuss.python.org/t/31151/16>`__ and
  `Ruff <https://discuss.python.org/t/31151/42>`__: strongly expressed support
  for embedding tool configuration as it would solve existing pain points for
  users
* `Hatch <https://discuss.python.org/t/31151/53>`__: (author of this PEP)
  expressed support for all aspects of this PEP, and will be one of the first
  tools to support running scripts with specifically configured Python versions


Rejected Ideas
==============

.. _723-comment-block:

Why not use a comment block resembling requirements.txt?
--------------------------------------------------------

This PEP considers there to be different types of users for whom Python code
would live as single-file scripts:

* Non-programmers who are just using Python as a scripting language to achieve
  a specific task. These users are unlikely to be familiar with concepts of
  operating systems like shebang lines or the ``PATH`` environment variable.
  Some examples:

  * The average person, perhaps at a workplace, who wants to write a script to
    automate something for efficiency or to reduce tedium
  * Someone doing data science or machine learning in industry or academia who
    wants to write a script to analyze some data or for research purposes.
    These users are special in that, although they have limited programming
    knowledge, they learn from sources like StackOverflow and blogs that have a
    programming bent and are increasingly likely to be part of communities that
    share knowledge and code. Therefore, a non-trivial number of these users
    will have some familiarity with things like Git(Hub), Jupyter, HuggingFace,
    etc.

* Non-programmers who manage operating systems e.g. a sysadmin. These users are
  able to set up ``PATH``, for example, but are unlikely to be familiar with
  Python concepts like virtual environments. These users often operate in
  isolation and have limited need to gain exposure to tools intended for
  sharing like Git.
* Programmers who manage operating systems/infrastructure e.g. SREs. These
  users are not very likely to be familiar with Python concepts like virtual
  environments, but are likely to be familiar with Git and most often use it
  to version control everything required to manage infrastructure like Python
  scripts and Kubernetes config.
* Programmers who write scripts primarily for themselves. These users over time
  accumulate a great number of scripts in various languages that they use to
  automate their workflow and often store them in a single directory, that is
  potentially version controlled for persistence. Non-Windows users may set
  up each Python script with a shebang line pointing to the desired Python
  executable or script runner.

This PEP argues that reusing our TOML-based metadata format is the best for
each category of user and that the requirements-like block comment is only
approachable for those who have familiarity with ``requirements.txt``, which
represents a small subset of users.

* For the average person automating a task or the data scientist, they are
  already starting with zero context and are unlikely to be familiar with
  either TOML or ``requirements.txt``. These users will very likely rely on
  snippets found online via a search engine or utilize AI in the form
  of a chat bot or direct code completion software. Searching for Python
  metadata formatting will lead them to the TOML-based format that already
  exists which they can reuse. The author tested GitHub Copilot with this
  PEP and it already supports auto-completion of ``dependencies``. In contrast,
  a new format may take years of being trained on the Internet for models to
  learn.

  Additionally, these users are most susceptible to formatting quirks and
  syntax errors. TOML is a well-defined format with existing online
  validators that features assignment that is compatible with Python
  expressions and has no strict indenting rules. The block comment format
  on the other hand could be easily malformed by forgetting the colon, for
  example, and debugging why it's not working with a search engine would be
  a difficult task for such a user.
* For the sysadmin types, they are as unlikely as the previously described
  users to be familiar with TOML or ``requirements.txt``. For either format
  they would have to read documentation. They would likely be more comfortable
  with TOML since they are used to structured data formats and there would be
  less perceived magic in their systems.

  Additionally, for maintenance of their systems ``/// pyproject`` would be
  much easier to search for from a shell than a block comment with potentially
  numerous extensions over time.
* For the SRE types, they are likely to be familiar with TOML already from
  other projects that they might have to work with like configuring the
  `GitLab Runner`__ or `Cloud Native Buildpacks`__.

  __ https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  __ https://buildpacks.io/docs/reference/config/

  These users are responsible for the security of their systems and most likely
  have security scanners set up to automatically open PRs to update versions
  of dependencies. Such automated tools like Dependabot would have a much
  easier time using existing TOML libraries than writing their own custom
  parser for a block comment format.
* For the programmer types, they are more likely to be familiar with TOML
  than to have ever seen a ``requirements.txt`` file, unless they are a
  Python programmer who has had previous experience with writing applications.
  In the case of experience with the requirements format, it necessarily means
  that they are at least somewhat familiar with the ecosystem and therefore
  it is safe to assume they know what TOML is.

  Another benefit of this PEP to these users is that their IDEs like Visual
  Studio Code would be able to provide TOML syntax highlighting much more
  easily than each writing custom logic for this feature.

Additionally, since the original block comment alternative format (double
``#``) went against the recommendation of :pep:`8` and as a result linters
and IDE auto-formatters that respected the recommendation would
`fail by default <https://discuss.python.org/t/29905/247>`__, the final
proposal uses standard comments starting with a single ``#`` character without
any obvious start or end sequence.

The concept of regular comments that do not appear to be intended for machines
(unlike, for example, `encoding declarations`__) affecting behavior would not
be customary to users of Python and goes directly against the "explicit is
better than implicit" foundational principle.

__ https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations

Users typing what to them looks like prose could alter runtime behavior. This
PEP takes the view that the possibility of that happening, even when a tool
has been set up as such (maybe by a sysadmin), is unfriendly to users.

Finally, and critically, the alternatives to this PEP like :pep:`722` do not
satisfy the use cases enumerated herein, such as setting the supported Python
versions, the eventual building of scripts into packages, and the ability to
have machines edit metadata on behalf of users. It is very likely that
requests for such features will persist, and it is conceivable that another
PEP in the future would allow for the embedding of such metadata. At that point
there would be multiple ways to achieve the same thing which goes against our
foundational principle of "there should be one - and preferably only one -
obvious way to do it".

Why not use a multi-line string?
--------------------------------

A previous version of this PEP proposed that the metadata be stored as follows:

.. code:: python

    __pyproject__ = """
    ...
    """

The most significant problem with this proposal is that the embedded TOML would
be limited in the following ways:

* It would not be possible to use multi-line double-quoted strings in the TOML
  as that would conflict with the Python string containing the document. Many
  TOML writers do not preserve style and may potentially produce output that
  would be malformed.
* The way in which character escaping works in Python strings is not quite the
  way it works in TOML strings. It would be possible to preserve a one-to-one
  character mapping by enforcing raw strings, but this ``r`` prefix requirement
  may be potentially confusing to users.

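The escaping mismatch described above can be seen in a small experiment on
Python 3.11 or higher. This is only an illustration of the problem with the
rejected string-based approach; the variable names are arbitrary.

.. code:: python

    import tomllib

    # With a raw string, the TOML parser receives the text exactly as typed:
    # the TOML escape ``\\`` yields a single backslash in the parsed value.
    raw = r'pattern = "\\d+"'

    # Without the ``r`` prefix, Python first collapses ``\\`` into ``\``, so
    # the TOML parser receives ``"\d+"``, which is not a valid TOML escape.
    cooked = 'pattern = "\\d+"'

    value = tomllib.loads(raw)['pattern']

    try:
        tomllib.loads(cooked)
        cooked_ok = True
    except tomllib.TOMLDecodeError:
        cooked_ok = False
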
Why not reuse core metadata fields?
-----------------------------------

A previous version of this PEP proposed to reuse the existing
`metadata standard <pyproject metadata_>`_ that is used to describe projects.

There are two significant problems with this proposal:

* The ``name`` and ``version`` fields are required and changing that would
  require its own PEP
* Reusing the data is `fundamentally a misuse of it`__

  __ https://snarky.ca/differentiating-between-writing-down-dependencies-to-use-packages-and-for-packages-themselves/

Why not limit to specific metadata fields?
------------------------------------------

By limiting the metadata to a specific set of fields, for example just
``dependencies``, we would prevent legitimate known use cases:

* ``requires-python``: For tools that support managing Python installations,
  this allows users to target specific versions of Python for new syntax
  or standard library functionality.
* ``version``: It is quite common to version scripts for persistence even when
  using a VCS like Git. When not using a VCS, versioning is even more common.
  For example, the author has been in multiple time-sensitive debugging
  sessions with customers where, due to the airgapped nature of the
  environment, the only way to transfer the script was via email or copying and
  pasting it into a chat window. In these cases, versioning is invaluable to
  ensure that the customer is using the latest (or a specific) version of the
  script.

.. _723-tool-configuration:

Why not limit tool configuration?
---------------------------------

By not allowing the ``[tool]`` table, we would prevent known functionality
that would benefit users. For example:

* A script runner may support injecting dependency resolution data for an
  embedded lock file (this is what Go's ``gorun`` can do).
* A script runner may support configuration instructing it to run scripts in
  containers for situations in which there is no cross-platform support for a
  dependency or if the setup is too complex for the average user like when
  requiring Nvidia drivers. Situations like this would allow users to proceed
  with what they want to do whereas otherwise they may stop at that point
  altogether.
* Tools may wish to experiment with features to ease development burden for
  users such as the building of single-file scripts into packages. We received
  `feedback <https://discuss.python.org/t/31151/9>`__ stating that there are
  already tools that exist in the wild that build wheels and source
  distributions from single files.

  The author of the Rust RFC for embedding metadata
  `mentioned to us <https://discuss.python.org/t/29905/179>`__ that they are
  actively looking into that as well based on user feedback saying that there
  is unnecessary friction with managing small projects.

  There has been `a commitment <https://discuss.python.org/t/31151/15>`__ to
  support this by at least one major build system.

Why not limit tool behavior?
----------------------------

A previous version of this PEP proposed that non-script running tools SHOULD
NOT modify their behavior when the script is not the sole input to the tool.
For example, if a linter is invoked with the path to a directory, it SHOULD
behave the same as if zero files had embedded metadata.

This was done as a precaution to avoid tool behavior confusion and generating
various feature requests for tools to support this PEP. However, during
discussion we received `feedback <https://discuss.python.org/t/31151/16>`__
from maintainers of tools that this would be undesirable and potentially
confusing to users. Additionally, this may allow for a universally easier
way to configure tools in certain circumstances and solve existing issues.

Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------

Again, a key issue here is that the target audience for this proposal is people
writing scripts which aren't intended for distribution. Sometimes scripts will
be "shared", but this is far more informal than "distribution" - it typically
involves sending a script via an email with some written instructions on how to
run it, or passing someone a link to a GitHub gist.

Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".

In addition, if the expectation here is that the ``pyproject.toml`` will
somehow be designed for running scripts in place, that's a new feature of the
standard that doesn't currently exist. At a minimum, this isn't a reasonable
suggestion until the `current discussion on Discourse
<pyproject without wheels_>`_ about using ``pyproject.toml`` for projects that
won't be distributed as wheels is resolved. And even then, it doesn't address
the "sending someone a script in a gist or email" use case.

Why not infer the requirements from import statements?
------------------------------------------------------

The idea would be to automatically recognize ``import`` statements in the source
file and turn them into a list of requirements.

However, this is infeasible for several reasons. First, the points above about
the necessity to keep the syntax easily parsable, for all Python versions, also
by tools written in other languages, apply equally here.

Second, PyPI and other package repositories conforming to the Simple Repository
API do not provide a mechanism to resolve package names from the module names
that are imported (see also `this related discussion`__).

__ https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494

Third, even if repositories did offer this information, the same import name may
correspond to several packages on PyPI. One might object that disambiguating
which package is wanted would only be needed if there are several projects
providing the same import name. However, this would make it easy for anyone to
unintentionally or malevolently break working scripts, by uploading a package to
PyPI providing an import name that is the same as an existing project. The
alternative where, among the candidates, the first package to have been
registered on the index is chosen, would be confusing in case a popular package
is developed with the same import name as an existing obscure package, and even
harmful if the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being reused.

A related idea would be to attach the requirements as comments to the import
statements instead of gathering them in a block, with a syntax such as::

    import numpy as np # requires: numpy
    import rich # requires: rich

This still suffers from parsing difficulties. Also, where to place the comment
in the case of multiline imports is ambiguous and may look ugly::

    from PyQt5.QtWidgets import (
        QCheckBox, QComboBox, QDialog, QDialogButtonBox,
        QGridLayout, QLabel, QSpinBox, QTextEdit
    ) # requires: PyQt5

Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider::

    import platform
    if platform.system() == "Windows":
        import pywin32 # requires: pywin32

Here, the user's intent is that the package is only required on Windows, but
this cannot be understood by the script runner (the correct way to write
it would be ``requires: pywin32 ; sys_platform == 'win32'``).

(Thanks to Jean Abou-Samra for the clear discussion of this point)

Why not use a requirements file for dependencies?
-------------------------------------------------

Putting your requirements in a requirements file doesn't require a PEP. You
can do that right now, and in fact it's quite likely that many ad hoc solutions
do this. However, without a standard, there's no way of knowing how to locate a
script's dependency data. Furthermore, the requirements file format is
pip-specific, so tools relying on it are depending on a pip implementation
detail.

So in order to make a standard, two things would be required:

1. A standardised replacement for the requirements file format.
2. A standard for how to locate the requirements file for a given script.
|
|
|
|
The first item is a significant undertaking. It has been discussed on a number
of occasions, but so far no one has attempted to actually do it. The most
likely approach would be for standards to be developed for individual use cases
currently addressed with requirements files. One option here would be for this
PEP to define a new file format which is simply a text file containing
:pep:`508` requirements, one per line. That would just leave the question of
how to locate that file.

The "obvious" solution here would be to do something like name the file the
same as the script, but with a ``.reqs`` extension (or something similar).
However, this still requires *two* files, where currently only a single file is
needed, and as such, does not match the "better batch file" model (shell
scripts and batch files are typically self-contained). It requires the
developer to remember to keep the two files together, and this may not always
be possible. For example, system administration policies may require that *all*
files in a certain directory are executable (the Linux filesystem standards
require this of ``/usr/bin``, for example). And some methods of sharing a
script (for example, publishing it on a text file sharing service like GitHub's
gist, or a corporate intranet) may not allow for deriving the location of an
associated requirements file from the script's location (tools like ``pipx``
support running a script directly from a URL, so "download and unpack a zip of
the script and its dependencies" may not be an appropriate requirement).
|
|
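As a rough sketch of the "same name, ``.reqs`` extension" convention described
above, the lookup might resemble the following. The helper name
``sidecar_requirements_path`` is hypothetical, and treating every HTTP(S) URL
as having no derivable companion file is an assumption made here to mirror the
sharing-service case, not a rule stated by this PEP.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def sidecar_requirements_path(script: str) -> Optional[Path]:
    """Return the hypothetical ``.reqs`` companion of a local script.

    Returns None when the script is identified by an HTTP(S) URL, since
    (by assumption) the hosting service gives no way to derive the
    location of a second file from the script's location.
    """
    if urlsplit(script).scheme in ("http", "https"):
        return None
    return Path(script).with_suffix(".reqs")


print(sidecar_requirements_path("scripts/tool.py"))
print(sidecar_requirements_path("https://example.com/tool.py"))
```

Even in the local case the lookup only works while both files stay side by
side, which is the fragility the paragraph above points out.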
|
|
Essentially, though, the issue here is that there is an explicitly stated
requirement that the format supports storing dependency data *in the script
file itself*. Solutions that don't do that are simply ignoring that
requirement.
|
|
|
|
Why not use (possibly restricted) Python syntax?
------------------------------------------------

This would typically involve storing metadata as multiple special variables,
such as the following.

.. code:: python

    __requires_python__ = ">=3.11"
    __dependencies__ = [
        "requests",
        "click",
    ]
|
|
|
|
The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if the
syntax is restricted, the *rest* of the script will use the full Python syntax,
and trying to define a syntax which can be successfully parsed in isolation
from the surrounding code is likely to be extremely difficult and error-prone.

Furthermore, Python's syntax changes in every release. If extracting dependency
data needs a Python parser, the parser will need to know which version of
Python the script is written for, and the overhead for a generic tool of having
a parser that can handle *multiple* versions of Python is unsustainable.
|
|
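To make the parsing concern concrete, here is a minimal sketch of a consumer
written in Python itself, using the standard library ``ast`` module. The
``read_dependencies`` helper is hypothetical and only handles a plain
top-level assignment of literals; it is an illustration of the approach being
rejected, not a recommended implementation.

```python
import ast


def read_dependencies(source: str) -> list:
    """Extract ``__dependencies__`` by parsing the *entire* script.

    ``ast.parse`` raises SyntaxError if any part of the script uses syntax
    that the running interpreter does not understand, even when the
    metadata assignment itself is trivial.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__dependencies__":
                    # Only literal values are accepted; anything else raises.
                    return ast.literal_eval(node.value)
    return []


SCRIPT = '''\
__requires_python__ = ">=3.11"
__dependencies__ = [
    "requests",
    "click",
]
'''

print(read_dependencies(SCRIPT))
```

Note that the whole file has to parse before the metadata can be read, so a
tool running on an older interpreter cannot extract the dependencies of a
script that uses newer syntax elsewhere. Tools not written in Python would
need their own parser to do even this much.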
|
|
With this approach there is the potential to clutter scripts with many
variables as new extensions get added. Additionally, users would have to guess
which metadata fields correspond to which variable names, which would cause
confusion.

It is worth noting, though, that the ``pip-run`` utility does implement (an
extended form of) this approach. `Further discussion <pip-run issue_>`_ of
the ``pip-run`` design is available on the project's issue tracker.
|
|
|
|
What about local dependencies?
------------------------------

These can be handled without needing special metadata and tooling, simply by
adding the location of the dependencies to ``sys.path``. This PEP isn't
needed for this case. If, on the other hand, the "local dependencies" are
actual distributions which are published locally, they can be specified as
usual with a :pep:`508` requirement, and the local package index can be
specified through the tool's own UI when running it.
|
|
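A minimal sketch of the ``sys.path`` approach follows. The helper name
``add_local_dependencies`` and the ``vendor`` directory name are assumptions
chosen for illustration; any directory that holds the local modules would do.

```python
import sys
from pathlib import Path


def add_local_dependencies(script_path: str, dirname: str = "vendor") -> str:
    """Put a directory located next to the script at the front of ``sys.path``.

    ``dirname`` is a hypothetical directory name; the function returns the
    absolute location that was added.
    """
    location = str(Path(script_path).resolve().parent / dirname)
    if location not in sys.path:
        sys.path.insert(0, location)
    return location
```

A script would typically call ``add_local_dependencies(__file__)`` before its
other imports, so that modules in the chosen directory become importable
without any metadata block.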
|
|
Open Issues
===========

None at this point.
|
|
|
|
|
|
References
==========

.. _pyproject metadata: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
|
|
|
|
|
|
Footnotes
=========

.. [1] A large number of users use scripts that are version controlled. For
   example, `the SREs that were mentioned <723-comment-block_>`_ or
   projects that require special maintenance like the
   `AWS CLI <https://github.com/aws/aws-cli/tree/4393dcdf044a5275000c9c193d1933c07a08fdf1/scripts>`__
   or `Calibre <https://github.com/kovidgoyal/calibre/tree/master/setup>`__.
.. [2] The syntax is taken directly from the final resolution of the
   `Blocks extension`__ to `Python Markdown`__.

   __ https://github.com/facelessuser/pymdown-extensions/discussions/1973
   __ https://github.com/Python-Markdown/markdown
.. [3] A future PEP that officially introduces the ``[run]`` table to
   ``pyproject.toml`` files will make this PEP not just similar but a strict
   subset.
.. [4] One important thing to note is that the metadata is embedded in a
   `doc-comment`__ (their equivalent of docstrings). `Other syntaxes`__ are
   under consideration within the Rust project.

   __ https://doc.rust-lang.org/stable/book/ch14-02-publishing-to-crates-io.html#making-useful-documentation-comments
   __ https://github.com/epage/cargo-script-mvs/blob/main/0000-cargo-script.md#embedded-manifest-format
|
|
|
|
|
|
Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.