2023-08-03 17:14:28 -04:00
|
|
|
PEP: 722
|
|
|
|
Title: Dependency specification for single-file scripts
|
|
|
|
Author: Paul Moore <p.f.moore@gmail.com>
|
|
|
|
PEP-Delegate: Brett Cannon <brett@python.org>
|
|
|
|
Discussions-To: https://discuss.python.org/t/29905
|
|
|
|
Status: Draft
|
|
|
|
Type: Standards Track
|
|
|
|
Topic: Packaging
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
Created: 19-Jul-2023
|
|
|
|
Post-History: `19-Jul-2023 <https://discuss.python.org/t/29905>`__
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
========
|
|
|
|
|
|
|
|
This PEP specifies a format for including 3rd-party dependencies in a
|
|
|
|
single-file Python script.
|
|
|
|
|
|
|
|
|
|
|
|
Motivation
|
|
|
|
==========
|
|
|
|
|
|
|
|
Not all Python code is structured as a "project", in the sense of having its own
|
|
|
|
directory complete with a ``pyproject.toml`` file, and being built into an
|
|
|
|
installable distribution package. Python is also routinely used as a scripting
|
|
|
|
language, with Python scripts as a (better) alternative to shell scripts, batch
|
|
|
|
files, etc. When used to create scripts, Python code is typically stored as a
|
|
|
|
single file, often in a directory dedicated to such "utility scripts", which
|
|
|
|
might be in a mix of languages with Python being only one possibility among
|
|
|
|
many. Such scripts may be shared, often by something as simple as email, or a
|
|
|
|
link to a URL such as a Github gist. But they are typically *not* "distributed"
|
|
|
|
or "installed" as part of a normal workflow.
|
|
|
|
|
|
|
|
One problem when using Python as a scripting language in this way is how to run
|
|
|
|
the script in an environment that contains whatever third party dependencies are
|
|
|
|
required by the script. There is currently no standard tool that addresses this
|
|
|
|
issue, and this PEP does *not* attempt to define one. However, any tool that
|
|
|
|
*does* address this issue will need to know what 3rd party dependencies a script
|
|
|
|
requires. By defining a standard format for storing such data, existing tools,
|
|
|
|
as well as any future tools, will be able to obtain that information without
|
|
|
|
requiring users to include tool-specific metadata in their scripts.
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
=========
|
|
|
|
|
|
|
|
Because a key requirement is writing single-file scripts, and simple sharing by
|
|
|
|
giving someone a copy of the script, the PEP defines a mechanism for embedding
|
|
|
|
dependency data *within the script itself*, and not in an external file.
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
We define the concept of a *dependency block* that contains information about
|
|
|
|
what 3rd party packages a script depends on.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
In order to identify dependency blocks, the script can simply be read as a text
|
2023-08-03 17:14:28 -04:00
|
|
|
file. This is deliberate, as Python syntax changes over time, so attempting to
|
|
|
|
parse the script as Python code would require choosing a specific version of
|
|
|
|
Python syntax. Also, it is likely that at least some tools will not be written
|
|
|
|
in Python, and expecting them to implement a Python parser is too much of a
|
|
|
|
burden.
|
|
|
|
|
|
|
|
However, to avoid needing changes to core Python, the format is designed to
|
|
|
|
appear as comments to the Python parser. It is possible to write code where a
|
2023-08-08 12:26:32 -04:00
|
|
|
dependency block is *not* interpreted as a comment (for example, by embedding it
|
2023-08-03 17:14:28 -04:00
|
|
|
in a Python multi-line string), but such uses are discouraged and can easily be
|
|
|
|
avoided assuming you are not deliberately trying to create a pathological
|
|
|
|
example.
|
|
|
|
|
|
|
|
A `review <language survey_>`_ of how other languages allow scripts to specify
|
|
|
|
their dependencies shows that a "structured comment" like this is a
|
|
|
|
commonly-used approach.
|
|
|
|
|
|
|
|
Specification
|
|
|
|
=============
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
The content of this section will be published in the Python Packaging user
|
|
|
|
guide, PyPA Specifications section, as a document with the title "Embedding
|
|
|
|
Metadata in Script Files".
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Any Python script may contain a *dependency block*. The dependency block is
|
|
|
|
identified by reading the script *as a text file* (i.e., the file is not parsed
|
|
|
|
as Python source code), looking for the first line of the form::
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
# Script Dependencies:
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
The hash character must be at the start of the line with no preceding whitespace.
|
|
|
|
The text "Script Dependencies" is recognised regardless of case, and the spaces
|
|
|
|
represent arbitrary whitespace (although at least one space must be present). The
|
|
|
|
following regular expression recognises the dependency block header line::
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
(?i)^#\s+script\s+dependencies:\s*$
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Tools reading the dependency block MAY respect the standard Python encoding
|
|
|
|
declaration. If they choose not to do so, they MUST process the file as UTF-8.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
After the header line, all lines in the file up to the first line that doesn't
|
|
|
|
start with a ``#`` sign are considered *dependency lines* and are treated as
|
|
|
|
follows:
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
1. The initial ``#`` sign is stripped.
|
|
|
|
2. If the line contains the character sequence " # " (SPACE HASH SPACE), then
|
|
|
|
those characters and any subsequent characters are discarded. This allows
|
|
|
|
dependency blocks to contain inline comments.
|
|
|
|
3. Whitespace at the start and end of the remaining text is discarded.
|
|
|
|
4. If the line is now empty, it is ignored.
|
|
|
|
5. The content of the line MUST now be a valid :pep:`508` dependency specifier.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
The requirement for spaces before and after the ``#`` in an inline comment is
|
|
|
|
necessary to distinguish them from part of a :pep:`508` URL specifier (which
|
|
|
|
can contain a hash, but without surrounding whitespace).
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
Consumers MUST validate that at a minimum, all dependencies start with a
|
|
|
|
``name`` as defined in :pep:`508`, and they MAY validate that all dependencies
|
|
|
|
conform fully to :pep:`508`. They MUST fail with an error if they find an
|
|
|
|
invalid specifier.
|
|
|
|
|
|
|
|
Example
|
|
|
|
-------
|
|
|
|
|
|
|
|
The following is an example of a script with an embedded dependency block::
|
|
|
|
|
|
|
|
# In order to run, this script needs the following 3rd party libraries
|
|
|
|
#
|
2023-08-08 12:26:32 -04:00
|
|
|
# Script Dependencies:
|
|
|
|
# requests
|
|
|
|
# rich # Needed for the output
|
|
|
|
#
|
|
|
|
# # Not needed - just to show that fragments in URLs do not
|
|
|
|
# # get treated as comments
|
|
|
|
# pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
import requests
|
|
|
|
from rich.pretty import pprint
|
|
|
|
|
|
|
|
resp = requests.get("https://peps.python.org/api/peps.json")
|
|
|
|
data = resp.json()
|
|
|
|
pprint([(k, v["title"]) for k, v in data.items()][:10])
|
|
|
|
|
|
|
|
|
|
|
|
Backwards Compatibility
|
|
|
|
=======================
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
As dependency blocks take the form of a structured comment, they can be added
|
2023-08-03 17:14:28 -04:00
|
|
|
without altering the meaning of existing code.
|
|
|
|
|
|
|
|
It is possible that a comment may already exist which matches the form of a
|
2023-08-08 12:26:32 -04:00
|
|
|
dependency block. While the identifying header text, "Script Dependencies" is
|
|
|
|
chosen to minimise this risk, it is still possible.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
In the rare case where an existing comment would be interpreted incorrectly as a
|
|
|
|
dependency block, this can be addressed by adding an actual dependency block
|
|
|
|
(which can be empty if the script has no dependencies) earlier in the code.
|
|
|
|
|
|
|
|
|
|
|
|
Security Implications
|
|
|
|
=====================
|
|
|
|
|
|
|
|
If a script containing a dependency block is run using a tool that automatically
|
|
|
|
installs dependencies, this could cause arbitrary code to be downloaded and
|
|
|
|
installed in the user's environment.
|
|
|
|
|
|
|
|
The risk here is part of the functionality of the tool being used to run the
|
|
|
|
script, and as such should already be addressed by the tool itself. The only
|
|
|
|
additional risk introduced by this PEP is if an untrusted script with a
|
|
|
|
dependency block is run, when a potentially malicious dependency might be
|
|
|
|
installed. This risk is addressed by the normal good practice of reviewing code
|
|
|
|
before running it.
|
|
|
|
|
|
|
|
|
|
|
|
How to Teach This
|
|
|
|
=================
|
|
|
|
|
|
|
|
The format is intended to be close to how a developer might already specify
|
|
|
|
script dependencies in an explanatory comment. The required structure is
|
2023-08-08 12:26:32 -04:00
|
|
|
deliberately minimal, so that formatting rules are easy to learn.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
Users will need to know how to write Python dependency specifiers. This is
|
|
|
|
covered by :pep:`508`, but for simple examples (which is expected to be the norm
|
|
|
|
for inexperienced users) the syntax is either just a package name, or a name and
|
|
|
|
a version restriction, which is fairly well-understood syntax.
|
|
|
|
|
|
|
|
Users will also know how to *run* a script using a tool that interprets
|
|
|
|
dependency data. This is not covered by this PEP, as it is the responsibility of
|
|
|
|
such a tool to document how it should be used.
|
|
|
|
|
|
|
|
Note that the core Python interpreter does *not* interpret dependency blocks.
|
|
|
|
This may be a point of confusion for beginners, who try to run ``python
|
|
|
|
some_script.py`` and do not understand why it fails. This is no different than
|
|
|
|
the current status quo, though, where running a script without its dependencies
|
|
|
|
present will give an error.
|
|
|
|
|
|
|
|
In general, it is assumed that if a beginner is given a script with dependencies
|
|
|
|
(regardless of whether they are specified in a dependency block), the person
|
|
|
|
supplying the script should explain how to run that script, and if that involves
|
|
|
|
using a script runner tool, that should be noted.
|
|
|
|
|
|
|
|
|
|
|
|
Recommendations
|
|
|
|
===============
|
|
|
|
|
|
|
|
This section is non-normative and simply describes "good practices" when using
|
2023-08-08 18:28:16 -04:00
|
|
|
dependency blocks.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
While it is permitted for tools to do minimal validation of requirements, in
|
|
|
|
practice they should do as much "sanity check" validation as possible, even if
|
|
|
|
they cannot do a full check for :pep:`508` syntax. This helps to ensure that
|
|
|
|
dependency blocks that are not correctly terminated are reported early. A good
|
|
|
|
compromise between the minimal approach of checking just that the requirement
|
|
|
|
starts with a name, and full :pep:`508` validation, is to check for a bare name,
|
|
|
|
or a name followed by optional whitespace, and then one of ``[`` (extra), ``@``
|
|
|
|
(urlspec), ``;`` (marker) or one of ``(<!=>~`` (version).
|
|
|
|
|
|
|
|
Scripts should, in general, place the dependency block at the top of the file,
|
2023-08-03 17:14:28 -04:00
|
|
|
either immediately after any shebang line, or straight after the script
|
2023-08-08 12:26:32 -04:00
|
|
|
docstring. In particular, the dependency block should always be placed before
|
2023-08-03 17:14:28 -04:00
|
|
|
any executable code in the file. This makes it easy for the human reader to
|
2023-08-08 12:26:32 -04:00
|
|
|
locate it.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
|
|
|
|
Reference Implementation
|
|
|
|
========================
|
|
|
|
|
|
|
|
Code to implement this proposal in Python is fairly straightforward, so the
|
|
|
|
reference implementation can be included here.
|
|
|
|
|
|
|
|
.. code:: python
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
import re
|
2023-08-03 17:14:28 -04:00
|
|
|
import tokenize
|
|
|
|
from packaging.requirements import Requirement
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
DEPENDENCY_BLOCK_MARKER = r"(?i)^#\s+script\s+dependencies:\s*$"
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
def read_dependency_block(filename):
|
2023-08-08 12:26:32 -04:00
|
|
|
# Use the tokenize module to handle any encoding declaration.
|
|
|
|
with tokenize.open(filename) as f:
|
2023-08-08 18:28:16 -04:00
|
|
|
# Skip lines until we reach a dependency block (OR EOF).
|
2023-08-08 12:26:32 -04:00
|
|
|
for line in f:
|
|
|
|
if re.match(DEPENDENCY_BLOCK_MARKER, line):
|
|
|
|
break
|
2023-08-08 18:28:16 -04:00
|
|
|
# Read dependency lines until we hit a line that doesn't
|
|
|
|
# start with #, or we are at EOF.
|
|
|
|
for line in f:
|
|
|
|
if not line.startswith("#"):
|
|
|
|
break
|
|
|
|
# Remove comments. An inline comment is introduced by
|
|
|
|
# a hash, which must be preceded and followed by a
|
|
|
|
# space.
|
|
|
|
line = line[1:].split(" # ", maxsplit=1)[0]
|
|
|
|
line = line.strip()
|
|
|
|
# Ignore empty lines
|
|
|
|
if not line:
|
|
|
|
continue
|
|
|
|
# Try to convert to a requirement. This will raise
|
|
|
|
# an error if the line is not a PEP 508 requirement
|
|
|
|
yield Requirement(line)
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
A format similar to the one proposed here is already supported `in pipx
|
|
|
|
<https://github.com/pypa/pipx/pull/916>`__ and in `pip-run
|
|
|
|
<https://pypi.org/project/pip-run/>`__.
|
|
|
|
|
|
|
|
|
|
|
|
Rejected Ideas
|
|
|
|
==============
|
|
|
|
|
|
|
|
Why not include other metadata?
|
|
|
|
-------------------------------
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
The core use case addressed by this proposal is that of identifying what
|
|
|
|
dependencies a standalone script needs in order to run successfully. This is a
|
|
|
|
common real-world issue that is currently solved by script runner tools, using
|
|
|
|
implementation-specific ways of storing the data. Standardising the storage
|
|
|
|
format improves interoperability by not typing the script to a particular
|
|
|
|
runner.
|
|
|
|
|
|
|
|
While it is arguable that other forms of metadata could be useful in a
|
|
|
|
standalone script, the need is largely theoretical at this point. In practical
|
|
|
|
terms, scripts either don't use other metadata, or they store it in existing,
|
|
|
|
widely used (and therefore de facto standard) formats. For example, scripts
|
|
|
|
needing README style text typically use the standard Python module docstring,
|
|
|
|
and scripts wanting to declare a version use the common convention of having a
|
|
|
|
``__version__`` variable.
|
|
|
|
|
|
|
|
One case which was raised during the discussion on this PEP, was the ability to
|
|
|
|
declare a minimum Python version that a script needed to run, by analogy with
|
|
|
|
the ``Requires-Python`` core metadata item for packages. Unlike packages,
|
|
|
|
scripts are normally only run by one user or in one environment, in contexts
|
|
|
|
where multiple versions of Python are uncommon. The need for this metadata is
|
|
|
|
therefore much less critical in the case of scripts. As further evidence of
|
|
|
|
this, the two key script runners currently available, ``pipx`` and ``pip-run``
|
|
|
|
do not offer a means of including this data in a script.
|
|
|
|
|
|
|
|
Creating a standard "metadata container" format would unify the various
|
|
|
|
approaches, but in practical terms there is no real need for unification, and
|
|
|
|
the disruption would either delay adoption, or more likely simply mean script
|
|
|
|
authors would ignore the standard.
|
|
|
|
|
|
|
|
This proposal therefore chooses to focus just on the one use case where there is
|
|
|
|
a clear need for something, and no existing standard or common practice.
|
|
|
|
|
|
|
|
|
|
|
|
Why not use a marker per line?
|
|
|
|
------------------------------
|
|
|
|
|
|
|
|
Rather than using a comment block with a header, another possibility would be to
|
|
|
|
use a marker on each line, something like::
|
|
|
|
|
|
|
|
# Script-Dependency: requests
|
|
|
|
# Script-Dependency: click
|
|
|
|
|
|
|
|
While this makes it easier to parse lines individually, it has a number of
|
|
|
|
issues. The first is simply that it's rather verbose, and less readable. This is
|
|
|
|
clearly affected by the chosen keyword, but all of the suggested options were
|
|
|
|
(in the author's opinion) less readable than the block comment form.
|
|
|
|
|
|
|
|
More importantly, this form *by design* makes it impossible to require that the
|
|
|
|
dependency specifiers are all together in a single block. As a result, it's not
|
|
|
|
possible for a human reader, without a careful check of the whole file, to be
|
|
|
|
sure that they have identified all of the dependencies. See the question below,
|
|
|
|
"Why not allow multiple dependency blocks and merge them?", for further
|
|
|
|
discussion of this problem.
|
|
|
|
|
|
|
|
Finally, as the reference implementation demonstrates, parsing the "comment
|
|
|
|
block" form isn't, in practice, significantly more difficult than parsing this
|
|
|
|
form.
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Why not use a distinct form of comment for the dependency block?
|
|
|
|
----------------------------------------------------------------
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
A previous version of this proposal used ``##`` to identify dependency blocks.
|
|
|
|
Unfortunately, however, the flake8 linter implements a rule requiring that
|
|
|
|
comments must have a space after the initial ``#`` sign. While the PEP author
|
|
|
|
considers that rule misguided, it is on by default and as a result would cause
|
|
|
|
checks to fail when faced with a dependency block.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Furthermore, the ``black`` formatter, although it allows the ``##`` form, does
|
|
|
|
add a space after the ``#`` for most other forms of comment. This means that if
|
|
|
|
we chose an alternative like ``#%``, automatic reformatting would corrupt the
|
|
|
|
dependency block. Forms including a space, like ``# #`` are possible, but less
|
|
|
|
natural for the average user (omitting the space is an obvious mistake to make).
|
2023-08-03 17:14:28 -04:00
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
While it is possible that linters and formatters could be changed to recognise
|
|
|
|
the new standard, the benefit of having a dedicated prefix did not seem
|
|
|
|
sufficient to justify the transition cost, or the risk that users might be using
|
|
|
|
older tools.
|
|
|
|
|
|
|
|
|
|
|
|
Why not allow multiple dependency blocks and merge them?
|
|
|
|
--------------------------------------------------------
|
|
|
|
|
|
|
|
Because it's too easy for the human reader to miss the fact that there's a
|
|
|
|
second dependency block. This could simply result in the script runner
|
|
|
|
unexpectedly downloading extra packages, or it could even be a way to smuggle
|
|
|
|
malicious packages onto a user's machine (by "hiding" a second dependency block
|
|
|
|
in the body of the script).
|
|
|
|
|
|
|
|
While the principle of "don't run untrusted code" applies here, the benefits
|
|
|
|
aren't sufficient to be worth the risk.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
|
|
|
|
Why not use a more standard data format (e.g., TOML)?
|
|
|
|
-----------------------------------------------------
|
|
|
|
|
|
|
|
First of all, the only practical choice for an alternative format is TOML.
|
|
|
|
Python packaging has standardised on TOML for structured data, and using a
|
|
|
|
different format, such as YAML or JSON, would add complexity and confusion for
|
|
|
|
no real benefit.
|
|
|
|
|
|
|
|
So the question is essentially, "why not use TOML?"
|
|
|
|
|
2023-08-08 18:28:16 -04:00
|
|
|
The key idea behind the "dependency block" format is to define something that
|
2023-08-03 17:14:28 -04:00
|
|
|
reads naturally as a comment in the script. Dependency data is useful both for
|
|
|
|
tools and for the human reader, so having a human readable format is beneficial.
|
|
|
|
On the other hand, TOML of necessity has a syntax of its own, which distracts
|
|
|
|
from the underlying data.
|
|
|
|
|
|
|
|
It is important to remember that developers who *write* scripts in Python are
|
|
|
|
often *not* experienced in Python, or Python packaging. They are often systems
|
|
|
|
administrators, or data analysts, who may simply be using Python as a "better
|
|
|
|
batch file". For such users, the TOML format is extremely likely to be
|
|
|
|
unfamiliar, and the syntax will be obscure to them, and not particularly
|
|
|
|
intuitive. Such developers may well be copying dependency specifiers from
|
|
|
|
sources such as Stack Overflow, without really understanding them. Having to
|
|
|
|
embed such a requirement into a TOML structure is an additional complexity --
|
|
|
|
and it is important to remember that the goal here is to make using 3rd party
|
|
|
|
libraries *easy* for such users.
|
|
|
|
|
|
|
|
Furthermore, TOML, by its nature, is a flexible format intended to support very
|
|
|
|
general data structures. There are *many* ways of writing a simple list of
|
|
|
|
strings in it, and it will not be clear to inexperienced users which form to use.
|
|
|
|
|
2023-08-16 12:13:33 -04:00
|
|
|
Another potential issue is that using a generalised TOML parser can `in some cases
|
|
|
|
<https://discuss.python.org/t/pep-722-dependency-specification-for-single-file-scripts/29905/275>`__
|
|
|
|
result in a measurable performance overhead. Startup time is often quoted as an
|
|
|
|
issue when running small scripts, so this may be a problem for script runners that
|
|
|
|
are aiming for high performance.
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
And finally, there will be tools that expect to *write* dependency data into
|
|
|
|
scripts -- for example, an IDE with a feature that automatically adds an import
|
|
|
|
and a dependency specifier when you reference a library function. While
|
|
|
|
libraries exist that allow editing TOML data, they are not always good at
|
2023-08-08 12:26:32 -04:00
|
|
|
preserving the user's layout. Even if libraries exist which do an effective job
|
|
|
|
at this, expecting all tools to use such a library is a significant imposition
|
|
|
|
on code supporting this PEP.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
By choosing a simple, line-based format with no quoting rules, dependency data
|
|
|
|
is easy to read (for humans and tools) and easy to write. The format doesn't
|
|
|
|
have the flexibility of something like TOML, but the use case simply doesn't
|
|
|
|
demand that sort of flexibility.
|
|
|
|
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Why not use (possibly restricted) Python syntax?
|
|
|
|
------------------------------------------------
|
|
|
|
|
|
|
|
This would typically involve storing the dependencies as a (runtime) list
|
|
|
|
variable with a conventional name, such as::
|
|
|
|
|
|
|
|
__requires__ = [
|
|
|
|
"requests",
|
|
|
|
"click",
|
|
|
|
]
|
|
|
|
|
|
|
|
Other suggestions include a static multi-line string, or including the
|
|
|
|
dependencies in the script's docstring.
|
|
|
|
|
|
|
|
The most significant problem with this proposal is that it requires all
|
|
|
|
consumers of the dependency data to implement a Python parser. Even if the
|
|
|
|
syntax is restricted, the *rest* of the script will use the full Python syntax,
|
|
|
|
and trying to define a syntax which can be successfully parsed in isolation from
|
|
|
|
the surrounding code is likely to be extremely difficult and error-prone.
|
|
|
|
|
|
|
|
Furthermore, Python's syntax changes in every release. If extracting dependency
|
|
|
|
data needs a Python parser, the parser will need to know which version of Python
|
|
|
|
the script is written for, and the overhead for a generic tool of having a
|
|
|
|
parser that can handle *multiple* versions of Python is unsustainable.
|
|
|
|
|
|
|
|
Even if the above issues could be addressed, the format would give the
|
|
|
|
impression that the data could be altered at runtime. However, this is not the
|
|
|
|
case in general, and code that tries to do so will encounter unexpected and
|
|
|
|
confusing behaviour.
|
|
|
|
|
|
|
|
And finally, there is no evidence that having dependency data available at
|
|
|
|
runtime is of any practical use. Should such a use be found, it is simple enough
|
|
|
|
to get the data by parsing the source - ``read_dependency_block(__file__)``.
|
|
|
|
|
|
|
|
It is worth noting, though, that the ``pip-run`` utility does implement (an
|
|
|
|
extended form of) this approach. `Further discussion <pip-run issue_>`_ of
|
|
|
|
the ``pip-run`` design is available on the project's issue tracker.
|
|
|
|
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
Why not embed a ``pyproject.toml`` file in the script?
|
|
|
|
------------------------------------------------------
|
|
|
|
|
|
|
|
First of all, ``pyproject.toml`` is a TOML based format, so all of the previous
|
|
|
|
concerns around TOML as a format apply. However, ``pyproject.toml`` is a
|
|
|
|
standard used by Python packaging, and re-using an existing standard is a
|
|
|
|
reasonable suggestion that deserves to be addressed on its own merits.
|
|
|
|
|
|
|
|
The first issue is that the suggestion rarely implies that *all* of
|
|
|
|
``pyproject.toml`` is to be supported for scripts. A script is not intended to
|
|
|
|
be "built" into any sort of distributable artifact like a wheel (see below for
|
|
|
|
more on this point), so the ``[build-system]`` section of ``pyproject.toml``
|
|
|
|
makes little sense, for example. And while the tool-specific sections of
|
|
|
|
``pyproject.toml`` might be useful for scripts, it's not at all clear that a
|
|
|
|
tool like `ruff <https://beta.ruff.rs/docs/>`__ would want to support per-file
|
|
|
|
configuration in this way, leading to confusion when users *expect* it to work,
|
|
|
|
but it doesn't. Furthermore, this sort of tool-specific configuration is just as
|
|
|
|
useful for individual files in a larger project, so we have to consider what it
|
|
|
|
would mean to embed a ``pyproject.toml`` into a single file in a larger project
|
|
|
|
that has its own ``pyproject.toml``.
|
|
|
|
|
|
|
|
In addition, ``pyproject.toml`` is currently focused on projects that are to be
|
|
|
|
built into wheels. There is `an ongoing discussion <pyproject without wheels_>`_
|
|
|
|
about how to use ``pyproject.toml`` for projects that are not intended to be
|
|
|
|
built as wheels, and until that question is resolved (which will likely require
|
|
|
|
some PEPs of its own) it seems premature to be discussing embedding
|
|
|
|
``pyproject.toml`` into scripts, which are *definitely* not intended to be built
|
|
|
|
and distributed in that manner.
|
|
|
|
|
|
|
|
The conclusion, therefore (which has been stated explicitly in some, but not
|
|
|
|
all, cases) is that this proposal is intended to mean that we would embed *part
|
|
|
|
of* ``pyproject.toml``. Typically this is the ``[project]`` section from
|
|
|
|
:pep:`621`, or even just the ``dependencies`` item from that section.
|
|
|
|
|
|
|
|
At this point, the first issue is that by framing the proposal as "embedding
|
|
|
|
``pyproject.toml``", we would be encouraging the sort of confusion discussed in
|
|
|
|
the previous paragraphs - developers will expect the full capabilities of
|
|
|
|
``pyproject.toml``, and be confused when there are differences and limitations.
|
|
|
|
It would be better, therefore, to consider this suggestion as simply being a
|
|
|
|
proposal to use an embedded TOML format, but specifically re-using the
|
|
|
|
*structure* of a particular part of ``pyproject.toml``. The problem then becomes
|
|
|
|
how we describe that structure, *without* causing confusion for people familiar
|
|
|
|
with ``pyproject.toml``. If we describe it with reference to ``pyproject.toml``,
|
|
|
|
the link is still there. But if we describe it in isolation, people will be
|
|
|
|
confused by the "similar but different" nature of the structure.
|
|
|
|
|
|
|
|
It is also important to remember that a key part of the target audience for this
|
|
|
|
proposal is developers who are simply using Python as a "better batch file"
|
|
|
|
solution. These developers will generally not be familiar with Python packaging
|
|
|
|
and its conventions, and are often the people most critical of the "complexity"
|
|
|
|
and "difficulty" of packaging solutions. As a result, proposals based on those
|
|
|
|
existing solutions are likely to be unwelcome to that audience, and could easily
|
|
|
|
result in people simply continuing to use existing adhoc solutions, and ignoring
|
|
|
|
the standard that was intended to make their lives easier.
|
|
|
|
|
2023-08-08 12:26:32 -04:00
|
|
|
Why not infer the requirements from import statements?
|
|
|
|
------------------------------------------------------
|
|
|
|
|
|
|
|
The idea would be to automatically recognize ``import`` statements in the source
|
|
|
|
file and turn them into a list of requirements.
|
|
|
|
|
|
|
|
However, this is infeasible for several reasons. First, the points above about
|
|
|
|
the necessity to keep the syntax easily parsable, for all Python versions, also
|
|
|
|
by tools written in other languages, apply equally here.
|
|
|
|
|
|
|
|
Second, PyPI and other package repositories conforming to the Simple Repository
|
|
|
|
API do not provide a mechanism to resolve package names from the module names
|
|
|
|
that are imported (see also `this related discussion <import-names_>`_).
|
|
|
|
|
|
|
|
Third, even if repositories did offer this information, the same import name may
|
|
|
|
correspond to several packages on PyPI. One might object that disambiguating
|
|
|
|
which package is wanted would only be needed if there are several projects
|
|
|
|
providing the same import name. However, this would make it easy for anyone to
|
|
|
|
unintentionally or malevolently break working scripts, by uploading a package to
|
|
|
|
PyPI providing an import name that is the same as an existing project. The
|
|
|
|
alternative where, among the candidates, the first package to have been
|
|
|
|
registered on the index is chosen, would be confusing in case a popular package
|
|
|
|
is developed with the same import name as an existing obscure package, and even
|
|
|
|
harmful if the existing package is malware intentionally uploaded with a
|
|
|
|
sufficiently generic import name that has a high probability of being reused.
|
|
|
|
|
|
|
|
A related idea would be to attach the requirements as comments to the import
|
|
|
|
statements instead of gathering them in a block, with a syntax such as::
|
|
|
|
|
|
|
|
import numpy as np # requires: numpy
|
|
|
|
import rich # requires: rich
|
|
|
|
|
|
|
|
This still suffers from parsing difficulties. Also, where to place the comment
|
|
|
|
in the case of multiline imports is ambiguous and may look ugly::
|
|
|
|
|
|
|
|
from PyQt5.QtWidgets import (
|
|
|
|
QCheckBox, QComboBox, QDialog, QDialogButtonBox,
|
|
|
|
QGridLayout, QLabel, QSpinBox, QTextEdit
|
|
|
|
) # requires: PyQt5
|
|
|
|
|
|
|
|
Furthermore, this syntax cannot behave as might be intuitively expected
|
|
|
|
in all situations. Consider::
|
|
|
|
|
|
|
|
import platform
|
|
|
|
if platform.system() == "Windows":
|
|
|
|
import pywin32 # requires: pywin32
|
|
|
|
|
|
|
|
Here, the user's intent is that the package is only required on Windows, but
|
|
|
|
this cannot be understood by the script runner (the correct way to write
|
|
|
|
it would be ``requires: pywin32 ; sys_platform == 'win32'``).
|
|
|
|
|
|
|
|
(Thanks to Jean Abou-Samra for the clear discussion of this point)
|
|
|
|
|
|
|
|
|
2023-08-16 12:13:33 -04:00
|
|
|
Why not simply manage the environment at runtime?
|
|
|
|
-------------------------------------------------
|
|
|
|
|
|
|
|
Another approach to running scripts with dependencies is simply to manage those
|
|
|
|
dependencies at runtime. This can be done by using a library that makes packages
|
|
|
|
available. There are many options for implementing such a library, for example
|
|
|
|
by installing them directly into the user's environment or by manipulating
|
|
|
|
``sys.path`` to make them available from a local cache.
|
|
|
|
|
|
|
|
These approaches are not incompatible with this PEP. An API such as
|
|
|
|
|
|
|
|
.. code:: python
|
|
|
|
|
|
|
|
env_mgr.install("rich")
|
|
|
|
env_mgr.install("click")
|
|
|
|
|
|
|
|
import rich
|
|
|
|
import click
|
|
|
|
|
|
|
|
...
|
|
|
|
|
|
|
|
is certainly feasible. However, such a library could be written without the need
|
|
|
|
for any new standards, and as far as the PEP author is aware, this has not
|
|
|
|
happened. This suggests that an approach like this is not as attractive as it
|
|
|
|
first seems. There is also the bootstrapping issue of making the ``env_mgr``
|
|
|
|
library available in the first place. And finally, this approach doesn't
|
|
|
|
actually offer any interoperability benefits, as it does not use a standard form
|
|
|
|
for the dependency list, and so other tools cannot access the data.
|
|
|
|
|
|
|
|
In any case, such a library could still benefit from this proposal, as it could
|
|
|
|
include an API to read the packages to install from the script dependency block.
|
|
|
|
This would give the same functionality while allowing interoperability with
|
|
|
|
other tools that support this specification.
|
|
|
|
|
|
|
|
.. code:: python
|
|
|
|
|
|
|
|
# Script Dependencies:
|
|
|
|
# rich
|
|
|
|
# click
|
|
|
|
env_mgr.install_dependencies(__file__)
|
|
|
|
|
|
|
|
import rich
|
|
|
|
import click
|
|
|
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
2023-08-03 17:14:28 -04:00
|
|
|
Why not just set up a Python project with a ``pyproject.toml``?
|
|
|
|
---------------------------------------------------------------
|
|
|
|
|
|
|
|
Again, a key issue here is that the target audience for this proposal is people
|
|
|
|
writing scripts which aren't intended for distribution. Sometimes scripts will
|
|
|
|
be "shared", but this is far more informal than "distribution" - it typically
|
|
|
|
involves sending a script via an email with some written instructions on how to
|
|
|
|
run it, or passing someone a link to a gist.
|
|
|
|
|
|
|
|
Expecting such users to learn the complexities of Python packaging is a
|
|
|
|
significant step up in complexity, and would almost certainly give the
|
|
|
|
impression that "Python is too hard for scripts".
|
|
|
|
|
|
|
|
In addition, if the expectation here is that the ``pyproject.toml`` will somehow
|
|
|
|
be designed for running scripts in place, that's a new feature of the standard
|
|
|
|
that doesn't currently exist. At a minimum, this isn't a reasonable suggestion
|
|
|
|
until the `current discussion on Discourse <pyproject without wheels_>`_ about
|
|
|
|
using ``pyproject.toml`` for projects that won't be distributed as wheels is
|
|
|
|
resolved. And even then, it doesn't address the "sending someone a script in a
|
|
|
|
gist or email" use case.
|
|
|
|
|
|
|
|
Why not use a requirements file for dependencies?
|
|
|
|
-------------------------------------------------
|
|
|
|
|
|
|
|
Putting your requirements in a requirements file, doesn't require a PEP. You can
|
|
|
|
do that right now, and in fact it's quite likely that many adhoc solutions do
|
|
|
|
this. However, without a standard, there's no way of knowing how to locate a
|
|
|
|
script's dependency data. And furthermore, the requirements file format is
|
|
|
|
pip-specific, so tools relying on it are depending on a pip implementation
|
|
|
|
detail.
|
|
|
|
|
|
|
|
So in order to make a standard, two things would be required:
|
|
|
|
|
|
|
|
1. A standardised replacement for the requirements file format.
|
2023-08-23 10:33:49 -04:00
|
|
|
2. A standard for how to locate the requirements file for a given script.
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
The first item is a significant undertaking. It has been discussed on a number
|
|
|
|
of occasions, but so far no-one has attempted to actually do it. The most likely
|
|
|
|
approach would be for standards to be developed for individual use cases
|
|
|
|
currently addressed with requirements files. One option here would be for this
|
|
|
|
PEP to simply define a new file format which is simply a text file containing
|
|
|
|
:pep:`508` requirements, one per line. That would just leave the question of how
|
|
|
|
to locate that file.
|
|
|
|
|
|
|
|
The "obvious" solution here would be to do something like name the file the same
|
|
|
|
as the script, but with a ``.reqs`` extension (or something similar). However,
|
|
|
|
this still requires *two* files, where currently only a single file is needed,
|
|
|
|
and as such, does not match the "better batch file" model (shell scripts and
|
|
|
|
batch files are typically self-contained). It requires the developer to remember
|
|
|
|
to keep the two files together, and this may not always be possible. For
|
|
|
|
example, system administration policies may require that *all* files in a
|
|
|
|
certain directory are executable (the Linux filesystem standards require this of
|
|
|
|
``/usr/bin``, for example). And some methods of sharing a script (for example,
|
|
|
|
publishing it on a text file sharing service like Github's gist, or a corporate
|
|
|
|
intranet) may not allow for deriving the location of an associated requirements
|
|
|
|
file from the script's location (tools like ``pipx`` support running a script
|
|
|
|
directly from a URL, so "download and unpack a zip of the script and its
|
|
|
|
dependencies" may not be an appropriate requirement).
|
|
|
|
|
|
|
|
Essentially, though, the issue here is that there is an explicitly stated
|
|
|
|
requirement that the format supports storing dependency data *in the script file
|
|
|
|
itself*. Solutions that don't do that are simply ignoring that requirement.
|
|
|
|
|
|
|
|
Should scripts be able to specify a package index?
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
Dependency metadata is about *what* package the code depends on, and not *where*
|
|
|
|
that package comes from. There is no difference here between metadata for
|
|
|
|
scripts, and metadata for distribution packages (as defined in
|
|
|
|
``pyproject.toml``). In both cases, dependencies are given in "abstract" form,
|
|
|
|
without specifying how they are obtained.
|
|
|
|
|
|
|
|
Some tools that use the dependency information may, of course, need to locate
|
|
|
|
concrete dependency artifacts - for example if they expect to create an
|
|
|
|
environment containing those dependencies. But the way they choose to do that
|
|
|
|
will be closely linked to the tool's UI in general, and this PEP does not try to
|
|
|
|
dictate the UI for tools.
|
|
|
|
|
|
|
|
There is more discussion of this point, and in particular of the UI choices made
|
|
|
|
by the ``pip-run`` tool, in `the previously mentioned pip-run issue <pip-run
|
|
|
|
issue_>`_.
|
|
|
|
|
|
|
|
What about local dependencies?
|
|
|
|
------------------------------
|
|
|
|
|
|
|
|
These can be handled without needing special metadata and tooling, simply by
|
|
|
|
adding the location of the dependencies to ``sys.path``. This PEP simply isn't
|
|
|
|
needed for this case. If, on the other hand, the "local dependencies" are actual
|
|
|
|
distributions which are published locally, they can be specified as usual with a
|
|
|
|
:pep:`508` requirement, and the local package index specified when running a
|
|
|
|
tool by using the tool's UI for that.
|
|
|
|
|
|
|
|
Open Issues
|
|
|
|
===========
|
|
|
|
|
|
|
|
None at this point.
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
==========
|
|
|
|
|
|
|
|
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
|
|
|
|
.. _language survey: https://dbohdan.com/scripts-with-dependencies
|
|
|
|
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
|
2023-08-08 12:26:32 -04:00
|
|
|
.. _import-names: https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
|
2023-08-03 17:14:28 -04:00
|
|
|
|
|
|
|
Copyright
|
|
|
|
=========
|
|
|
|
|
|
|
|
This document is placed in the public domain or under the
|
|
|
|
CC0-1.0-Universal license, whichever is more permissive.
|