python-peps/peps/pep-0722.rst

721 lines
33 KiB
ReStructuredText
Raw Permalink Normal View History

PEP: 722
Title: Dependency specification for single-file scripts
Author: Paul Moore <p.f.moore@gmail.com>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/29905
2023-10-21 06:30:17 -04:00
Status: Rejected
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 19-Jul-2023
Post-History: `19-Jul-2023 <https://discuss.python.org/t/29905>`__
2023-10-21 06:30:17 -04:00
Resolution: https://discuss.python.org/t/pep-722-723-decision/36763/
Abstract
========
This PEP specifies a format for including 3rd-party dependencies in a
single-file Python script.
Motivation
==========
Not all Python code is structured as a "project", in the sense of having its own
directory complete with a ``pyproject.toml`` file, and being built into an
installable distribution package. Python is also routinely used as a scripting
language, with Python scripts as a (better) alternative to shell scripts, batch
files, etc. When used to create scripts, Python code is typically stored as a
single file, often in a directory dedicated to such "utility scripts", which
might be in a mix of languages with Python being only one possibility among
many. Such scripts may be shared, often by something as simple as email, or a
link to a URL such as a Github gist. But they are typically *not* "distributed"
or "installed" as part of a normal workflow.
One problem when using Python as a scripting language in this way is how to run
the script in an environment that contains whatever third party dependencies are
required by the script. There is currently no standard tool that addresses this
issue, and this PEP does *not* attempt to define one. However, any tool that
*does* address this issue will need to know what 3rd party dependencies a script
requires. By defining a standard format for storing such data, existing tools,
as well as any future tools, will be able to obtain that information without
requiring users to include tool-specific metadata in their scripts.
Rationale
=========
Because a key requirement is writing single-file scripts, and simple sharing by
giving someone a copy of the script, the PEP defines a mechanism for embedding
dependency data *within the script itself*, and not in an external file.
2023-08-08 12:26:32 -04:00
We define the concept of a *dependency block* that contains information about
what 3rd party packages a script depends on.
2023-08-08 12:26:32 -04:00
In order to identify dependency blocks, the script can simply be read as a text
file. This is deliberate, as Python syntax changes over time, so attempting to
parse the script as Python code would require choosing a specific version of
Python syntax. Also, it is likely that at least some tools will not be written
in Python, and expecting them to implement a Python parser is too much of a
burden.
However, to avoid needing changes to core Python, the format is designed to
appear as comments to the Python parser. It is possible to write code where a
2023-08-08 12:26:32 -04:00
dependency block is *not* interpreted as a comment (for example, by embedding it
in a Python multi-line string), but such uses are discouraged and can easily be
avoided assuming you are not deliberately trying to create a pathological
example.
A `review <language survey_>`_ of how other languages allow scripts to specify
their dependencies shows that a "structured comment" like this is a
commonly-used approach.
Specification
=============
2023-08-08 12:26:32 -04:00
The content of this section will be published in the Python Packaging user
guide, PyPA Specifications section, as a document with the title "Embedding
Metadata in Script Files".
2023-08-08 12:26:32 -04:00
Any Python script may contain a *dependency block*. The dependency block is
identified by reading the script *as a text file* (i.e., the file is not parsed
as Python source code), looking for the first line of the form::
2023-08-08 12:26:32 -04:00
# Script Dependencies:
2023-08-08 12:26:32 -04:00
The hash character must be at the start of the line with no preceding whitespace.
The text "Script Dependencies" is recognised regardless of case, and the spaces
represent arbitrary whitespace (although at least one space must be present). The
following regular expression recognises the dependency block header line::
2023-08-08 12:26:32 -04:00
(?i)^#\s+script\s+dependencies:\s*$
2023-08-08 12:26:32 -04:00
Tools reading the dependency block MAY respect the standard Python encoding
declaration. If they choose not to do so, they MUST process the file as UTF-8.
2023-08-08 12:26:32 -04:00
After the header line, all lines in the file up to the first line that doesn't
start with a ``#`` sign are considered *dependency lines* and are treated as
follows:
2023-08-08 12:26:32 -04:00
1. The initial ``#`` sign is stripped.
2. If the line contains the character sequence " # " (SPACE HASH SPACE), then
those characters and any subsequent characters are discarded. This allows
dependency blocks to contain inline comments.
3. Whitespace at the start and end of the remaining text is discarded.
4. If the line is now empty, it is ignored.
5. The content of the line MUST now be a valid :pep:`508` dependency specifier.
2023-08-08 12:26:32 -04:00
The requirement for spaces before and after the ``#`` in an inline comment is
necessary to distinguish them from part of a :pep:`508` URL specifier (which
can contain a hash, but without surrounding whitespace).
Consumers MUST validate that at a minimum, all dependencies start with a
``name`` as defined in :pep:`508`, and they MAY validate that all dependencies
conform fully to :pep:`508`. They MUST fail with an error if they find an
invalid specifier.
Example
-------
The following is an example of a script with an embedded dependency block::
# In order to run, this script needs the following 3rd party libraries
#
2023-08-08 12:26:32 -04:00
# Script Dependencies:
# requests
# rich # Needed for the output
#
# # Not needed - just to show that fragments in URLs do not
# # get treated as comments
# pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686
import requests
from rich.pretty import pprint
resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])
Backwards Compatibility
=======================
2023-08-08 12:26:32 -04:00
As dependency blocks take the form of a structured comment, they can be added
without altering the meaning of existing code.
It is possible that a comment may already exist which matches the form of a
2023-08-08 12:26:32 -04:00
dependency block. While the identifying header text, "Script Dependencies" is
chosen to minimise this risk, it is still possible.
In the rare case where an existing comment would be interpreted incorrectly as a
dependency block, this can be addressed by adding an actual dependency block
(which can be empty if the script has no dependencies) earlier in the code.
Security Implications
=====================
If a script containing a dependency block is run using a tool that automatically
installs dependencies, this could cause arbitrary code to be downloaded and
installed in the user's environment.
The risk here is part of the functionality of the tool being used to run the
script, and as such should already be addressed by the tool itself. The only
additional risk introduced by this PEP is if an untrusted script with a
dependency block is run, when a potentially malicious dependency might be
installed. This risk is addressed by the normal good practice of reviewing code
before running it.
How to Teach This
=================
The format is intended to be close to how a developer might already specify
script dependencies in an explanatory comment. The required structure is
2023-08-08 12:26:32 -04:00
deliberately minimal, so that formatting rules are easy to learn.
Users will need to know how to write Python dependency specifiers. This is
covered by :pep:`508`, but for simple examples (which is expected to be the norm
for inexperienced users) the syntax is either just a package name, or a name and
a version restriction, which is fairly well-understood syntax.
Users will also know how to *run* a script using a tool that interprets
dependency data. This is not covered by this PEP, as it is the responsibility of
such a tool to document how it should be used.
Note that the core Python interpreter does *not* interpret dependency blocks.
This may be a point of confusion for beginners, who try to run ``python
some_script.py`` and do not understand why it fails. This is no different than
the current status quo, though, where running a script without its dependencies
present will give an error.
In general, it is assumed that if a beginner is given a script with dependencies
(regardless of whether they are specified in a dependency block), the person
supplying the script should explain how to run that script, and if that involves
using a script runner tool, that should be noted.
Recommendations
===============
This section is non-normative and simply describes "good practices" when using
dependency blocks.
2023-08-08 12:26:32 -04:00
While it is permitted for tools to do minimal validation of requirements, in
practice they should do as much "sanity check" validation as possible, even if
they cannot do a full check for :pep:`508` syntax. This helps to ensure that
dependency blocks that are not correctly terminated are reported early. A good
compromise between the minimal approach of checking just that the requirement
starts with a name, and full :pep:`508` validation, is to check for a bare name,
or a name followed by optional whitespace, and then one of ``[`` (extra), ``@``
(urlspec), ``;`` (marker) or one of ``(<!=>~`` (version).
Scripts should, in general, place the dependency block at the top of the file,
either immediately after any shebang line, or straight after the script
2023-08-08 12:26:32 -04:00
docstring. In particular, the dependency block should always be placed before
any executable code in the file. This makes it easy for the human reader to
2023-08-08 12:26:32 -04:00
locate it.
Reference Implementation
========================
Code to implement this proposal in Python is fairly straightforward, so the
reference implementation can be included here.
.. code:: python
2023-08-08 12:26:32 -04:00
import re
import tokenize
from packaging.requirements import Requirement
2023-08-08 12:26:32 -04:00
DEPENDENCY_BLOCK_MARKER = r"(?i)^#\s+script\s+dependencies:\s*$"
def read_dependency_block(filename):
2023-08-08 12:26:32 -04:00
# Use the tokenize module to handle any encoding declaration.
with tokenize.open(filename) as f:
# Skip lines until we reach a dependency block (OR EOF).
2023-08-08 12:26:32 -04:00
for line in f:
if re.match(DEPENDENCY_BLOCK_MARKER, line):
break
# Read dependency lines until we hit a line that doesn't
# start with #, or we are at EOF.
for line in f:
if not line.startswith("#"):
break
# Remove comments. An inline comment is introduced by
# a hash, which must be preceded and followed by a
# space.
line = line[1:].split(" # ", maxsplit=1)[0]
line = line.strip()
# Ignore empty lines
if not line:
continue
# Try to convert to a requirement. This will raise
# an error if the line is not a PEP 508 requirement
yield Requirement(line)
A format similar to the one proposed here is already supported `in pipx
<https://github.com/pypa/pipx/pull/916>`__ and in `pip-run
<https://pypi.org/project/pip-run/>`__.
Rejected Ideas
==============
Why not include other metadata?
-------------------------------
2023-08-08 12:26:32 -04:00
The core use case addressed by this proposal is that of identifying what
dependencies a standalone script needs in order to run successfully. This is a
common real-world issue that is currently solved by script runner tools, using
implementation-specific ways of storing the data. Standardising the storage
format improves interoperability by not typing the script to a particular
runner.
While it is arguable that other forms of metadata could be useful in a
standalone script, the need is largely theoretical at this point. In practical
terms, scripts either don't use other metadata, or they store it in existing,
widely used (and therefore de facto standard) formats. For example, scripts
needing README style text typically use the standard Python module docstring,
and scripts wanting to declare a version use the common convention of having a
``__version__`` variable.
One case which was raised during the discussion on this PEP, was the ability to
declare a minimum Python version that a script needed to run, by analogy with
the ``Requires-Python`` core metadata item for packages. Unlike packages,
scripts are normally only run by one user or in one environment, in contexts
where multiple versions of Python are uncommon. The need for this metadata is
therefore much less critical in the case of scripts. As further evidence of
this, the two key script runners currently available, ``pipx`` and ``pip-run``
do not offer a means of including this data in a script.
Creating a standard "metadata container" format would unify the various
approaches, but in practical terms there is no real need for unification, and
the disruption would either delay adoption, or more likely simply mean script
authors would ignore the standard.
This proposal therefore chooses to focus just on the one use case where there is
a clear need for something, and no existing standard or common practice.
Why not use a marker per line?
------------------------------
Rather than using a comment block with a header, another possibility would be to
use a marker on each line, something like::
# Script-Dependency: requests
# Script-Dependency: click
While this makes it easier to parse lines individually, it has a number of
issues. The first is simply that it's rather verbose, and less readable. This is
clearly affected by the chosen keyword, but all of the suggested options were
(in the author's opinion) less readable than the block comment form.
More importantly, this form *by design* makes it impossible to require that the
dependency specifiers are all together in a single block. As a result, it's not
possible for a human reader, without a careful check of the whole file, to be
sure that they have identified all of the dependencies. See the question below,
"Why not allow multiple dependency blocks and merge them?", for further
discussion of this problem.
Finally, as the reference implementation demonstrates, parsing the "comment
block" form isn't, in practice, significantly more difficult than parsing this
form.
2023-08-08 12:26:32 -04:00
Why not use a distinct form of comment for the dependency block?
----------------------------------------------------------------
2023-08-08 12:26:32 -04:00
A previous version of this proposal used ``##`` to identify dependency blocks.
Unfortunately, however, the flake8 linter implements a rule requiring that
comments must have a space after the initial ``#`` sign. While the PEP author
considers that rule misguided, it is on by default and as a result would cause
checks to fail when faced with a dependency block.
2023-08-08 12:26:32 -04:00
Furthermore, the ``black`` formatter, although it allows the ``##`` form, does
add a space after the ``#`` for most other forms of comment. This means that if
we chose an alternative like ``#%``, automatic reformatting would corrupt the
dependency block. Forms including a space, like ``# #`` are possible, but less
natural for the average user (omitting the space is an obvious mistake to make).
2023-08-08 12:26:32 -04:00
While it is possible that linters and formatters could be changed to recognise
the new standard, the benefit of having a dedicated prefix did not seem
sufficient to justify the transition cost, or the risk that users might be using
older tools.
Why not allow multiple dependency blocks and merge them?
--------------------------------------------------------
Because it's too easy for the human reader to miss the fact that there's a
second dependency block. This could simply result in the script runner
unexpectedly downloading extra packages, or it could even be a way to smuggle
malicious packages onto a user's machine (by "hiding" a second dependency block
in the body of the script).
While the principle of "don't run untrusted code" applies here, the benefits
aren't sufficient to be worth the risk.
Why not use a more standard data format (e.g., TOML)?
-----------------------------------------------------
First of all, the only practical choice for an alternative format is TOML.
Python packaging has standardised on TOML for structured data, and using a
different format, such as YAML or JSON, would add complexity and confusion for
no real benefit.
So the question is essentially, "why not use TOML?"
The key idea behind the "dependency block" format is to define something that
reads naturally as a comment in the script. Dependency data is useful both for
tools and for the human reader, so having a human readable format is beneficial.
On the other hand, TOML of necessity has a syntax of its own, which distracts
from the underlying data.
It is important to remember that developers who *write* scripts in Python are
often *not* experienced in Python, or Python packaging. They are often systems
administrators, or data analysts, who may simply be using Python as a "better
batch file". For such users, the TOML format is extremely likely to be
unfamiliar, and the syntax will be obscure to them, and not particularly
intuitive. Such developers may well be copying dependency specifiers from
sources such as Stack Overflow, without really understanding them. Having to
embed such a requirement into a TOML structure is an additional complexity --
and it is important to remember that the goal here is to make using 3rd party
libraries *easy* for such users.
Furthermore, TOML, by its nature, is a flexible format intended to support very
general data structures. There are *many* ways of writing a simple list of
strings in it, and it will not be clear to inexperienced users which form to use.
Another potential issue is that using a generalised TOML parser can `in some cases
<https://discuss.python.org/t/pep-722-dependency-specification-for-single-file-scripts/29905/275>`__
result in a measurable performance overhead. Startup time is often quoted as an
issue when running small scripts, so this may be a problem for script runners that
are aiming for high performance.
And finally, there will be tools that expect to *write* dependency data into
scripts -- for example, an IDE with a feature that automatically adds an import
and a dependency specifier when you reference a library function. While
libraries exist that allow editing TOML data, they are not always good at
2023-08-08 12:26:32 -04:00
preserving the user's layout. Even if libraries exist which do an effective job
at this, expecting all tools to use such a library is a significant imposition
on code supporting this PEP.
By choosing a simple, line-based format with no quoting rules, dependency data
is easy to read (for humans and tools) and easy to write. The format doesn't
have the flexibility of something like TOML, but the use case simply doesn't
demand that sort of flexibility.
2023-08-08 12:26:32 -04:00
Why not use (possibly restricted) Python syntax?
------------------------------------------------
This would typically involve storing the dependencies as a (runtime) list
variable with a conventional name, such as::
__requires__ = [
"requests",
"click",
]
Other suggestions include a static multi-line string, or including the
dependencies in the script's docstring.
The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if the
syntax is restricted, the *rest* of the script will use the full Python syntax,
and trying to define a syntax which can be successfully parsed in isolation from
the surrounding code is likely to be extremely difficult and error-prone.
Furthermore, Python's syntax changes in every release. If extracting dependency
data needs a Python parser, the parser will need to know which version of Python
the script is written for, and the overhead for a generic tool of having a
parser that can handle *multiple* versions of Python is unsustainable.
Even if the above issues could be addressed, the format would give the
impression that the data could be altered at runtime. However, this is not the
case in general, and code that tries to do so will encounter unexpected and
confusing behaviour.
And finally, there is no evidence that having dependency data available at
runtime is of any practical use. Should such a use be found, it is simple enough
to get the data by parsing the source - ``read_dependency_block(__file__)``.
It is worth noting, though, that the ``pip-run`` utility does implement (an
extended form of) this approach. `Further discussion <pip-run issue_>`_ of
the ``pip-run`` design is available on the project's issue tracker.
Why not embed a ``pyproject.toml`` file in the script?
------------------------------------------------------
First of all, ``pyproject.toml`` is a TOML based format, so all of the previous
concerns around TOML as a format apply. However, ``pyproject.toml`` is a
standard used by Python packaging, and re-using an existing standard is a
reasonable suggestion that deserves to be addressed on its own merits.
The first issue is that the suggestion rarely implies that *all* of
``pyproject.toml`` is to be supported for scripts. A script is not intended to
be "built" into any sort of distributable artifact like a wheel (see below for
more on this point), so the ``[build-system]`` section of ``pyproject.toml``
makes little sense, for example. And while the tool-specific sections of
``pyproject.toml`` might be useful for scripts, it's not at all clear that a
tool like `ruff <https://beta.ruff.rs/docs/>`__ would want to support per-file
configuration in this way, leading to confusion when users *expect* it to work,
but it doesn't. Furthermore, this sort of tool-specific configuration is just as
useful for individual files in a larger project, so we have to consider what it
would mean to embed a ``pyproject.toml`` into a single file in a larger project
that has its own ``pyproject.toml``.
In addition, ``pyproject.toml`` is currently focused on projects that are to be
built into wheels. There is `an ongoing discussion <pyproject without wheels_>`_
about how to use ``pyproject.toml`` for projects that are not intended to be
built as wheels, and until that question is resolved (which will likely require
some PEPs of its own) it seems premature to be discussing embedding
``pyproject.toml`` into scripts, which are *definitely* not intended to be built
and distributed in that manner.
The conclusion, therefore (which has been stated explicitly in some, but not
all, cases) is that this proposal is intended to mean that we would embed *part
of* ``pyproject.toml``. Typically this is the ``[project]`` section from
:pep:`621`, or even just the ``dependencies`` item from that section.
At this point, the first issue is that by framing the proposal as "embedding
``pyproject.toml``", we would be encouraging the sort of confusion discussed in
the previous paragraphs - developers will expect the full capabilities of
``pyproject.toml``, and be confused when there are differences and limitations.
It would be better, therefore, to consider this suggestion as simply being a
proposal to use an embedded TOML format, but specifically re-using the
*structure* of a particular part of ``pyproject.toml``. The problem then becomes
how we describe that structure, *without* causing confusion for people familiar
with ``pyproject.toml``. If we describe it with reference to ``pyproject.toml``,
the link is still there. But if we describe it in isolation, people will be
confused by the "similar but different" nature of the structure.
It is also important to remember that a key part of the target audience for this
proposal is developers who are simply using Python as a "better batch file"
solution. These developers will generally not be familiar with Python packaging
and its conventions, and are often the people most critical of the "complexity"
and "difficulty" of packaging solutions. As a result, proposals based on those
existing solutions are likely to be unwelcome to that audience, and could easily
result in people simply continuing to use existing adhoc solutions, and ignoring
the standard that was intended to make their lives easier.
2023-08-08 12:26:32 -04:00
Why not infer the requirements from import statements?
------------------------------------------------------
The idea would be to automatically recognize ``import`` statements in the source
file and turn them into a list of requirements.
However, this is infeasible for several reasons. First, the points above about
the necessity to keep the syntax easily parsable, for all Python versions, also
by tools written in other languages, apply equally here.
Second, PyPI and other package repositories conforming to the Simple Repository
API do not provide a mechanism to resolve package names from the module names
that are imported (see also `this related discussion <import-names_>`_).
Third, even if repositories did offer this information, the same import name may
correspond to several packages on PyPI. One might object that disambiguating
which package is wanted would only be needed if there are several projects
providing the same import name. However, this would make it easy for anyone to
unintentionally or malevolently break working scripts, by uploading a package to
PyPI providing an import name that is the same as an existing project. The
alternative where, among the candidates, the first package to have been
registered on the index is chosen, would be confusing in case a popular package
is developed with the same import name as an existing obscure package, and even
harmful if the existing package is malware intentionally uploaded with a
sufficiently generic import name that has a high probability of being reused.
A related idea would be to attach the requirements as comments to the import
statements instead of gathering them in a block, with a syntax such as::
import numpy as np # requires: numpy
import rich # requires: rich
This still suffers from parsing difficulties. Also, where to place the comment
in the case of multiline imports is ambiguous and may look ugly::
from PyQt5.QtWidgets import (
QCheckBox, QComboBox, QDialog, QDialogButtonBox,
QGridLayout, QLabel, QSpinBox, QTextEdit
) # requires: PyQt5
Furthermore, this syntax cannot behave as might be intuitively expected
in all situations. Consider::
import platform
if platform.system() == "Windows":
import pywin32 # requires: pywin32
Here, the user's intent is that the package is only required on Windows, but
this cannot be understood by the script runner (the correct way to write
it would be ``requires: pywin32 ; sys_platform == 'win32'``).
(Thanks to Jean Abou-Samra for the clear discussion of this point)
Why not simply manage the environment at runtime?
-------------------------------------------------
Another approach to running scripts with dependencies is simply to manage those
dependencies at runtime. This can be done by using a library that makes packages
available. There are many options for implementing such a library, for example
by installing them directly into the user's environment or by manipulating
``sys.path`` to make them available from a local cache.
These approaches are not incompatible with this PEP. An API such as
.. code:: python
env_mgr.install("rich")
env_mgr.install("click")
import rich
import click
...
is certainly feasible. However, such a library could be written without the need
for any new standards, and as far as the PEP author is aware, this has not
happened. This suggests that an approach like this is not as attractive as it
first seems. There is also the bootstrapping issue of making the ``env_mgr``
library available in the first place. And finally, this approach doesn't
actually offer any interoperability benefits, as it does not use a standard form
for the dependency list, and so other tools cannot access the data.
In any case, such a library could still benefit from this proposal, as it could
include an API to read the packages to install from the script dependency block.
This would give the same functionality while allowing interoperability with
other tools that support this specification.
.. code:: python
# Script Dependencies:
# rich
# click
env_mgr.install_dependencies(__file__)
import rich
import click
...
Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------
Again, a key issue here is that the target audience for this proposal is people
writing scripts which aren't intended for distribution. Sometimes scripts will
be "shared", but this is far more informal than "distribution" - it typically
involves sending a script via an email with some written instructions on how to
run it, or passing someone a link to a gist.
Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".
In addition, if the expectation here is that the ``pyproject.toml`` will somehow
be designed for running scripts in place, that's a new feature of the standard
that doesn't currently exist. At a minimum, this isn't a reasonable suggestion
until the `current discussion on Discourse <pyproject without wheels_>`_ about
using ``pyproject.toml`` for projects that won't be distributed as wheels is
resolved. And even then, it doesn't address the "sending someone a script in a
gist or email" use case.
Why not use a requirements file for dependencies?
-------------------------------------------------
Putting your requirements in a requirements file, doesn't require a PEP. You can
do that right now, and in fact it's quite likely that many adhoc solutions do
this. However, without a standard, there's no way of knowing how to locate a
script's dependency data. And furthermore, the requirements file format is
pip-specific, so tools relying on it are depending on a pip implementation
detail.
So in order to make a standard, two things would be required:
1. A standardised replacement for the requirements file format.
2. A standard for how to locate the requirements file for a given script.
The first item is a significant undertaking. It has been discussed on a number
of occasions, but so far no-one has attempted to actually do it. The most likely
approach would be for standards to be developed for individual use cases
currently addressed with requirements files. One option here would be for this
PEP to simply define a new file format which is simply a text file containing
:pep:`508` requirements, one per line. That would just leave the question of how
to locate that file.
The "obvious" solution here would be to do something like name the file the same
as the script, but with a ``.reqs`` extension (or something similar). However,
this still requires *two* files, where currently only a single file is needed,
and as such, does not match the "better batch file" model (shell scripts and
batch files are typically self-contained). It requires the developer to remember
to keep the two files together, and this may not always be possible. For
example, system administration policies may require that *all* files in a
certain directory are executable (the Linux filesystem standards require this of
``/usr/bin``, for example). And some methods of sharing a script (for example,
publishing it on a text file sharing service like Github's gist, or a corporate
intranet) may not allow for deriving the location of an associated requirements
file from the script's location (tools like ``pipx`` support running a script
directly from a URL, so "download and unpack a zip of the script and its
dependencies" may not be an appropriate requirement).
Essentially, though, the issue here is that there is an explicitly stated
requirement that the format supports storing dependency data *in the script file
itself*. Solutions that don't do that are simply ignoring that requirement.
Should scripts be able to specify a package index?
--------------------------------------------------
Dependency metadata is about *what* package the code depends on, and not *where*
that package comes from. There is no difference here between metadata for
scripts, and metadata for distribution packages (as defined in
``pyproject.toml``). In both cases, dependencies are given in "abstract" form,
without specifying how they are obtained.
Some tools that use the dependency information may, of course, need to locate
concrete dependency artifacts - for example if they expect to create an
environment containing those dependencies. But the way they choose to do that
will be closely linked to the tool's UI in general, and this PEP does not try to
dictate the UI for tools.
There is more discussion of this point, and in particular of the UI choices made
by the ``pip-run`` tool, in `the previously mentioned pip-run issue <pip-run
issue_>`_.
What about local dependencies?
------------------------------
These can be handled without needing special metadata and tooling, simply by
adding the location of the dependencies to ``sys.path``. This PEP simply isn't
needed for this case. If, on the other hand, the "local dependencies" are actual
distributions which are published locally, they can be specified as usual with a
:pep:`508` requirement, and the local package index specified when running a
tool by using the tool's UI for that.
Open Issues
===========
None at this point.
References
==========
.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
.. _language survey: https://dbohdan.com/scripts-with-dependencies
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684
2023-08-08 12:26:32 -04:00
.. _import-names: https://discuss.python.org/t/record-the-top-level-names-of-a-wheel-in-metadata/29494
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.