PEP: 722
Title: Dependency specification for single-file scripts
Author: Paul Moore
PEP-Delegate: Brett Cannon
Discussions-To: https://discuss.python.org/t/29905
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 19-Jul-2023
Post-History: `19-Jul-2023 `__


Abstract
========

This PEP specifies a format for including 3rd-party dependencies in a
single-file Python script.


Motivation
==========

Not all Python code is structured as a "project", in the sense of having
its own directory complete with a ``pyproject.toml`` file, and being built
into an installable distribution package. Python is also routinely used as
a scripting language, with Python scripts as a (better) alternative to
shell scripts, batch files, etc. When used to create scripts, Python code
is typically stored as a single file, often in a directory dedicated to
such "utility scripts", which might be in a mix of languages with Python
being only one possibility among many. Such scripts may be shared, often
by something as simple as email, or a link to a URL such as a GitHub gist.
But they are typically *not* "distributed" or "installed" as part of a
normal workflow.

One problem when using Python as a scripting language in this way is how
to run the script in an environment that contains whatever third party
dependencies are required by the script. There is currently no standard
tool that addresses this issue, and this PEP does *not* attempt to define
one. However, any tool that *does* address this issue will need to know
what 3rd party dependencies a script requires. By defining a standard
format for storing such data, existing tools, as well as any future tools,
will be able to obtain that information without requiring users to include
tool-specific metadata in their scripts.

Rationale
=========

Because a key requirement is writing single-file scripts, and simple
sharing by giving someone a copy of the script, the PEP defines a
mechanism for embedding dependency data *within the script itself*, and
not in an external file.

We define the concept of a *metadata block* that contains information
about a script. The only type of metadata defined here is dependency
information, but making the concept general allows expansion in the
future, should it be needed.

In order to identify metadata blocks, the script can simply be read as a
text file. This is deliberate, as Python syntax changes over time, so
attempting to parse the script as Python code would require choosing a
specific version of Python syntax. Also, it is likely that at least some
tools will not be written in Python, and expecting them to implement a
Python parser is too much of a burden.

However, to avoid needing changes to core Python, the format is designed
to appear as comments to the Python parser. It is possible to write code
where a metadata block is *not* interpreted as a comment (for example, by
embedding it in a Python multi-line string), but such uses are discouraged
and can easily be avoided assuming you are not deliberately trying to
create a pathological example.

A `review <language survey_>`_ of how other languages allow scripts to
specify their dependencies shows that a "structured comment" like this is
a commonly-used approach.


Specification
=============

Any Python script may contain one or more *metadata blocks*. Metadata
blocks are identified by reading the script *as a text file* (i.e., the
file is not parsed as Python source code), looking for contiguous blocks
of lines that start with the identifying characters ``##``. Whitespace is
not allowed before the identifying ``##``. More than one metadata block
may exist in a Python file.

Tools reading dependency blocks MAY respect the standard Python encoding
declaration.
If they choose not to do so, they MUST process the file as UTF-8.

Within a metadata block, whitespace after the ``##`` and at the end of the
line is ignored, and lines that are empty after the ``##`` are ignored.

The first line of a metadata block is special, and identifies the type of
block. This line MUST contain a colon character, ``:``. If the colon is
not present, the block is not considered to be a metadata block, and tools
MUST ignore it.

The block type is all of the text on the initial line, up to the colon.
There must be no whitespace before the colon. The initial line MAY contain
text after the colon. How this is interpreted depends on the block type.
Block types MUST be treated as case sensitive.

The interpretation of any lines in a metadata block after the initial
identifying line is defined by the type of block. Tools MUST ignore any
blocks with types they do not handle.

Block types starting with the characters ``X-`` are reserved for the user,
and MUST NOT be given a meaning in any future standard.

Otherwise, the only defined block type is ``Script Dependencies``. For
this block type,

1. Text after the colon on the initial line is NOT allowed.
2. All subsequent lines MUST contain :pep:`508` requirement specifiers,
   one per line.

There SHOULD only be a single ``Script Dependencies`` block in the file.
Tools consuming dependency data MAY simply process the first such block
found. This avoids the need for tools to process more data than is
necessary.

Consumers MUST validate that at a minimum, all dependencies start with a
``name`` as defined in :pep:`508`, and they MAY validate that all
dependencies conform fully to :pep:`508`. They MUST fail with an error if
they find an invalid specifier.
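
The minimum validation required of consumers can be sketched as follows.
This is an illustrative sketch under stated assumptions, not part of the
specification: the helper names ``leading_name`` and ``check_dependencies``
are hypothetical, and the regular expression is a prefix-matching
adaptation of the name rule given in :pep:`508`.

```python
import re

# Assumption: prefix form of the PEP 508 name rule. A name starts with a
# letter or digit, may continue with letters, digits, ".", "_" or "-", and
# ends with a letter or digit.
_NAME_PREFIX = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def leading_name(specifier):
    """Return the name that starts ``specifier``, or raise ValueError."""
    match = _NAME_PREFIX.match(specifier)
    if match is None:
        # Consumers must fail with an error on an invalid specifier.
        raise ValueError(f"Invalid dependency specifier: {specifier!r}")
    return match.group(0)


def check_dependencies(lines):
    """Apply the minimal check to every line of a dependency block."""
    return [leading_name(line) for line in lines]
```

This only checks that each line begins with a name. A consumer that wants
full validation could instead pass each line to
``packaging.requirements.Requirement``, as the reference implementation
does.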

Example
-------

The following is an example of a script with an embedded dependency
block::

    # In order to run, this script needs the following 3rd party libraries
    #
    ## Script Dependencies:
    ## requests
    ## rich

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])


Backwards Compatibility
=======================

As metadata blocks take the form of a structured comment, they can be
added without altering the meaning of existing code.

It is possible that a comment may already exist which matches the form of
a metadata block. While the use of a double ``#`` prefix is intended to
minimise this risk, it is still possible.

Because tools must ignore unrecognised metadata types, the only potential
issue we need to consider is script dependencies. In that case, a tool
might read the wrong dependencies. In practice, though, this is unlikely
to happen, as (a) the header text (``Script Dependencies:``) is fairly
unusual, and (b) any following lines are unlikely to conform to :pep:`508`
unless they *are* dependencies.

In the rare case where an existing comment would be interpreted
incorrectly as a dependency block, this can be addressed by adding an
actual dependency block (which can be empty if the script has no
dependencies) earlier in the code.


Security Implications
=====================

If a script containing a dependency block is run using a tool that
automatically installs dependencies, this could cause arbitrary code to be
downloaded and installed in the user's environment.

The risk here is part of the functionality of the tool being used to run
the script, and as such should already be addressed by the tool itself.
The only additional risk introduced by this PEP is if an untrusted script
with a dependency block is run, when a potentially malicious dependency
might be installed.
This risk is addressed by the normal good practice of reviewing code
before running it.


How to Teach This
=================

The format is intended to be close to how a developer might already
specify script dependencies in an explanatory comment. The required
structure is deliberately minimal, and the concept of using a special
comment marker (``##`` in this case) is not unusual (the "shebang" line in
a Unix shell script is an example).

Users will need to know how to write Python dependency specifiers. This is
covered by :pep:`508`, but for simple examples (which are expected to be
the norm for inexperienced users) the syntax is either just a package
name, or a name and a version restriction, which is fairly well-understood
syntax.

Users will also need to know how to *run* a script using a tool that
interprets dependency data. This is not covered by this PEP, as it is the
responsibility of such a tool to document how it should be used.

Note that the core Python interpreter does *not* interpret dependency
blocks. This may be a point of confusion for beginners, who try to run
``python some_script.py`` and do not understand why it fails. This is no
different than the current status quo, though, where running a script
without its dependencies present will give an error.

In general, it is assumed that if a beginner is given a script with
dependencies (regardless of whether they are specified in a dependency
block), the person supplying the script should explain how to run that
script, and if that involves using a script runner tool, that should be
noted.


Recommendations
===============

This section is non-normative and simply describes "good practices" when
using metadata blocks.

Scripts should, in general, place metadata blocks at the top of the file,
either immediately after any shebang line, or straight after the script
docstring. In particular, the metadata block should always be placed
before any executable code in the file.
This makes it easy for the human reader to locate the metadata block, and
allows tools to read only the minimum necessary to identify it.


Reference Implementation
========================

Code to implement this proposal in Python is fairly straightforward, so
the reference implementation can be included here.

A parser that reads *only* the script dependency metadata:

.. code:: python

    import tokenize

    from packaging.requirements import Requirement

    DEPENDENCY_BLOCK_MARKER = "Script Dependencies:"

    def read_dependency_block(filename):
        """Yield the requirements from the first dependency block."""
        # Use the tokenize module to handle any encoding declaration.
        with tokenize.open(filename) as f:
            for line in f:
                if line.startswith("##"):
                    line = line[2:].strip()
                    if line == DEPENDENCY_BLOCK_MARKER:
                        for line in f:
                            if not line.startswith("##"):
                                break
                            line = line[2:].strip()
                            if not line:
                                continue
                            # Try to convert to a requirement. This will raise
                            # an error if the line is not a PEP 508 requirement
                            yield Requirement(line)
                        # Only the first dependency block is processed.
                        break

A full metadata block parser that returns all metadata blocks in a script:

.. code:: python

    import tokenize

    def read_metadata_blocks(filename):
        """Yield (block_type, extra, block_data) for each metadata block."""
        # Use the tokenize module to handle any encoding declaration.
        with tokenize.open(filename) as f:
            for line in f:
                if line.startswith("##"):
                    block_type, sep, extra = line[2:].strip().partition(":")
                    # Without a colon this is not a metadata block.
                    if not sep:
                        continue
                    block_data = []
                    for line in f:
                        if not line.startswith("##"):
                            break
                        line = line[2:].strip()
                        if not line:
                            continue
                        block_data.append(line)
                    yield block_type, extra, block_data

A format similar to the one proposed here is already supported `in pipx
`__ and in `pip-run `__.


Rejected Ideas
==============

Why not include other metadata?
-------------------------------

The "metadata block" format is designed to allow additional metadata
types, but none are defined at this time. Currently, the only data used by
tools is dependency information, and therefore this is the only
information required by this standard.
If, in future, a need is identified for other data to be standardised,
adding further metadata types is straightforward. By reserving metadata
types starting with ``X-``, the specification allows experimentation with
additional data *before* standardising.

Two particular cases are a script version number, and the version of
Python needed to run the script.

In the case of the version number, there are no known tools that try to
extract version information from scripts, so there is no immediate benefit
to having the version as metadata, rather than, for example, as a normal
comment or a ``__version__`` attribute (see :pep:`396`). If it becomes
common for tools to want to introspect script versions, this could be
added at a later date.

In the case of the Python version, existing tools provide a means for the
*user* to specify what Python interpreter to use when running the script
(for example, ``pipx run`` provides the ``--python`` command line option),
but they do not typically allow the *script* to define a version range,
and then automatically pick an interpreter based on that. Having a
"supported version" for a script may allow the tool to provide better
error messages when run with an inappropriate interpreter, but currently,
this is largely a theoretical benefit. Again, it is something that can be
added later if it becomes a commonly requested feature.

Why not use a more standard data format (e.g., TOML)?
-----------------------------------------------------

First of all, the only practical choice for an alternative format is TOML.
Python packaging has standardised on TOML for structured data, and using a
different format, such as YAML or JSON, would add complexity and confusion
for no real benefit.

So the question is essentially, "why not use TOML?"

The key idea behind the "metadata block" format is to define something
that reads naturally as a comment in the script.
Dependency data is useful both for tools and for the human reader, so
having a human readable format is beneficial. On the other hand, TOML of
necessity has a syntax of its own, which distracts from the underlying
data.

It is important to remember that developers who *write* scripts in Python
are often *not* experienced in Python, or Python packaging. They are often
systems administrators, or data analysts, who may simply be using Python
as a "better batch file". For such users, the TOML format is extremely
likely to be unfamiliar, and the syntax will be obscure to them, and not
particularly intuitive.

Such developers may well be copying dependency specifiers from sources
such as Stack Overflow, without really understanding them. Having to embed
such a requirement into a TOML structure is an additional complexity --
and it is important to remember that the goal here is to make using 3rd
party libraries *easy* for such users.

Furthermore, TOML, by its nature, is a flexible format intended to support
very general data structures. There are *many* ways of writing a simple
list of strings in it, and it will not be clear to inexperienced users
which form to use.

And finally, there will be tools that expect to *write* dependency data
into scripts -- for example, an IDE with a feature that automatically adds
an import and a dependency specifier when you reference a library
function. While libraries exist that allow editing TOML data, they are not
always good at preserving the user's layout, which could include comments,
specific formatting, etc. Even if libraries exist which do an effective
job at this, expecting all tools to use such a library is a significant
imposition on code supporting this PEP.

By choosing a simple, line-based format with no quoting rules, dependency
data is easy to read (for humans and tools) and easy to write. The format
doesn't have the flexibility of something like TOML, but the use case
simply doesn't demand that sort of flexibility.
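
To illustrate the point about tools that *write* dependency data, the
following is a minimal sketch of how a tool might append a requirement to
a script using plain string handling. It is not part of this proposal: the
function name ``add_dependency`` is hypothetical, and placing a newly
created block directly after any shebang line is an assumption based on
the placement suggested in the Recommendations section.

```python
MARKER = "Script Dependencies:"


def _ensure_newline(lines, index):
    """Make sure the line before ``index`` ends with a newline."""
    if index > 0 and not lines[index - 1].endswith("\n"):
        lines[index - 1] += "\n"


def add_dependency(source, requirement):
    """Return ``source`` with ``requirement`` added to its dependency block."""
    lines = source.splitlines(keepends=True)
    for start, line in enumerate(lines):
        if line.startswith("##") and line[2:].strip() == MARKER:
            # Walk to the end of the contiguous run of "##" lines.
            end = start + 1
            while end < len(lines) and lines[end].startswith("##"):
                end += 1
            _ensure_newline(lines, end)
            lines.insert(end, f"## {requirement}\n")
            return "".join(lines)
    # No dependency block yet: create one, keeping any shebang line first.
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    _ensure_newline(lines, insert_at)
    lines[insert_at:insert_at] = [f"## {MARKER}\n", f"## {requirement}\n", "\n"]
    return "".join(lines)
```

Every other line of the script is passed through untouched, so the user's
comments and layout are preserved without the need for a format-aware
editing library.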

Why not embed a ``pyproject.toml`` file in the script?
------------------------------------------------------

First of all, ``pyproject.toml`` is a TOML based format, so all of the
previous concerns around TOML as a format apply. However,
``pyproject.toml`` is a standard used by Python packaging, and re-using an
existing standard is a reasonable suggestion that deserves to be addressed
on its own merits.

The first issue is that the suggestion rarely implies that *all* of
``pyproject.toml`` is to be supported for scripts. A script is not
intended to be "built" into any sort of distributable artifact like a
wheel (see below for more on this point), so the ``[build-system]``
section of ``pyproject.toml`` makes little sense, for example. And while
the tool-specific sections of ``pyproject.toml`` might be useful for
scripts, it's not at all clear that a tool like `ruff `__ would want to
support per-file configuration in this way, leading to confusion when
users *expect* it to work, but it doesn't. Furthermore, this sort of
tool-specific configuration is just as useful for individual files in a
larger project, so we have to consider what it would mean to embed a
``pyproject.toml`` into a single file in a larger project that has its own
``pyproject.toml``.

In addition, ``pyproject.toml`` is currently focused on projects that are
to be built into wheels. There is `an ongoing discussion <pyproject
without wheels_>`_ about how to use ``pyproject.toml`` for projects that
are not intended to be built as wheels, and until that question is
resolved (which will likely require some PEPs of its own) it seems
premature to be discussing embedding ``pyproject.toml`` into scripts,
which are *definitely* not intended to be built and distributed in that
manner.

The conclusion, therefore (which has been stated explicitly in some, but
not all, cases) is that this proposal is intended to mean that we would
embed *part of* ``pyproject.toml``.
Typically this is the ``[project]`` section from :pep:`621`, or even just
the ``dependencies`` item from that section.

At this point, the first issue is that by framing the proposal as
"embedding ``pyproject.toml``", we would be encouraging the sort of
confusion discussed in the previous paragraphs - developers will expect
the full capabilities of ``pyproject.toml``, and be confused when there
are differences and limitations. It would be better, therefore, to
consider this suggestion as simply being a proposal to use an embedded
TOML format, but specifically re-using the *structure* of a particular
part of ``pyproject.toml``. The problem then becomes how we describe that
structure, *without* causing confusion for people familiar with
``pyproject.toml``. If we describe it with reference to
``pyproject.toml``, the link is still there. But if we describe it in
isolation, people will be confused by the "similar but different" nature
of the structure.

It is also important to remember that a key part of the target audience
for this proposal is developers who are simply using Python as a "better
batch file" solution. These developers will generally not be familiar with
Python packaging and its conventions, and are often the people most
critical of the "complexity" and "difficulty" of packaging solutions. As a
result, proposals based on those existing solutions are likely to be
unwelcome to that audience, and could easily result in people simply
continuing to use existing ad hoc solutions, and ignoring the standard
that was intended to make their lives easier.

Why not just set up a Python project with a ``pyproject.toml``?
---------------------------------------------------------------

Again, a key issue here is that the target audience for this proposal is
people writing scripts which aren't intended for distribution.
Sometimes scripts will be "shared", but this is far more informal than
"distribution" - it typically involves sending a script via an email with
some written instructions on how to run it, or passing someone a link to a
gist.

Expecting such users to learn the complexities of Python packaging is a
significant step up in complexity, and would almost certainly give the
impression that "Python is too hard for scripts".

In addition, if the expectation here is that the ``pyproject.toml`` will
somehow be designed for running scripts in place, that's a new feature of
the standard that doesn't currently exist. At a minimum, this isn't a
reasonable suggestion until the `current discussion on Discourse
<pyproject without wheels_>`_ about using ``pyproject.toml`` for projects
that won't be distributed as wheels is resolved. And even then, it doesn't
address the "sending someone a script in a gist or email" use case.

Why not use a requirements file for dependencies?
-------------------------------------------------

Putting your requirements in a requirements file doesn't require a PEP.
You can do that right now, and in fact it's quite likely that many ad hoc
solutions do this. However, without a standard, there's no way of knowing
how to locate a script's dependency data. And furthermore, the
requirements file format is pip-specific, so tools relying on it are
depending on a pip implementation detail.

So in order to make a standard, two things would be required:

1. A standardised replacement for the requirements file format.
2. A standard for how to locate the requirements file for a given script.

The first item is a significant undertaking. It has been discussed on a
number of occasions, but so far no-one has attempted to actually do it.
The most likely approach would be for standards to be developed for
individual use cases currently addressed with requirements files.
One option here would be for this PEP to simply define a new file format
which is simply a text file containing :pep:`508` requirements, one per
line. That would just leave the question of how to locate that file.

The "obvious" solution here would be to do something like name the file
the same as the script, but with a ``.reqs`` extension (or something
similar). However, this still requires *two* files, where currently only a
single file is needed, and as such, does not match the "better batch file"
model (shell scripts and batch files are typically self-contained). It
requires the developer to remember to keep the two files together, and
this may not always be possible. For example, system administration
policies may require that *all* files in a certain directory are
executable (the Linux filesystem standards require this of ``/usr/bin``,
for example). And some methods of sharing a script (for example,
publishing it on a text file sharing service like GitHub's gist, or a
corporate intranet) may not allow for deriving the location of an
associated requirements file from the script's location (tools like
``pipx`` support running a script directly from a URL, so "download and
unpack a zip of the script and its dependencies" may not be an appropriate
requirement).

Essentially, though, the issue here is that there is an explicitly stated
requirement that the format supports storing dependency data *in the
script file itself*. Solutions that don't do that are simply ignoring that
requirement.

Why not use (possibly restricted) Python syntax?
------------------------------------------------

This would typically involve storing the dependencies as a (runtime) list
variable with a conventional name, such as::

    __requires__ = [
        "requests",
        "click",
    ]

Other suggestions include a static multi-line string, or including the
dependencies in the script's docstring.

The most significant problem with this proposal is that it requires all
consumers of the dependency data to implement a Python parser. Even if the
syntax is restricted, the *rest* of the script will use the full Python
syntax, and trying to define a syntax which can be successfully parsed in
isolation from the surrounding code is likely to be extremely difficult
and error-prone.

Furthermore, Python's syntax changes in every release. If extracting
dependency data needs a Python parser, the parser will need to know which
version of Python the script is written for, and the overhead for a
generic tool of having a parser that can handle *multiple* versions of
Python is unsustainable.

Even if the above issues could be addressed, the format would give the
impression that the data could be altered at runtime. However, this is not
the case in general, and code that tries to do so will encounter
unexpected and confusing behaviour.

And finally, there is no evidence that having dependency data available at
runtime is of any practical use. Should such a use be found, it is simple
enough to get the data by parsing the source -
``read_dependency_block(__file__)``.

It is worth noting, though, that the ``pip-run`` utility does implement
(an extended form of) this approach. `Further discussion <pip-run
issue_>`_ of the ``pip-run`` design is available on the project's issue
tracker.

Should scripts be able to specify a package index?
--------------------------------------------------

Dependency metadata is about *what* package the code depends on, and not
*where* that package comes from. There is no difference here between
metadata for scripts, and metadata for distribution packages (as defined
in ``pyproject.toml``). In both cases, dependencies are given in
"abstract" form, without specifying how they are obtained.

Some tools that use the dependency information may, of course, need to
locate concrete dependency artifacts - for example if they expect to
create an environment containing those dependencies. But the way they
choose to do that will be closely linked to the tool's UI in general, and
this PEP does not try to dictate the UI for tools.

There is more discussion of this point, and in particular of the UI
choices made by the ``pip-run`` tool, in `the previously mentioned pip-run
issue <pip-run issue_>`_.

What about local dependencies?
------------------------------

These can be handled without needing special metadata and tooling, simply
by adding the location of the dependencies to ``sys.path``. This PEP
simply isn't needed for this case. If, on the other hand, the "local
dependencies" are actual distributions which are published locally, they
can be specified as usual with a :pep:`508` requirement, and the local
package index specified when running a tool by using the tool's UI for
that.


Open Issues
===========

None at this point.


References
==========

.. _pip-run issue: https://github.com/jaraco/pip-run/issues/44
.. _language survey: https://dbohdan.com/scripts-with-dependencies
.. _pyproject without wheels: https://discuss.python.org/t/projects-that-arent-meant-to-generate-a-wheel-and-pyproject-toml/29684


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.