PEP 711: PyBI: a standard format for distributing Python Binaries (#3090)

* Add PEP 711 (PyBI format)

* pygments doesn't understand csv

* remove some lints

* Doh, fix filename
Nathaniel J. Smith 2023-04-07 00:30:10 -07:00 committed by GitHub
parent 3f4e0209e3
commit c9f29a8d42
2 changed files with 585 additions and 0 deletions

.github/CODEOWNERS (vendored): 1 addition

@@ -591,6 +591,7 @@ pep-0707.rst @iritkatriel
pep-0708.rst @dstufft
pep-0709.rst @carljm
pep-0710.rst @dstufft
pep-0711.rst @njsmith
# ...
# pep-0754.txt
# ...

pep-0711.rst (new file): 584 additions

@@ -0,0 +1,584 @@
PEP: 711
Title: PyBI: a standard format for distributing Python Binaries
Author: Nathaniel J. Smith <njs@pobox.com>
PEP-Delegate: TODO
Discussions-To: TODO
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 06-Apr-2023
Post-History: TODO
Abstract
========
“Like wheels, but instead of a pre-built python package, it's a
pre-built python interpreter”
Motivation
==========
End goal: PyPI.org has pre-built packages for all Python versions on all
popular platforms, so automated tools can easily grab any of them and
set it up. It becomes quick and easy to try Python prereleases, pin
Python versions in CI, make a temporary environment to reproduce a bug
report that only happens on a specific Python point release, etc.
First step (this PEP): define a standard packaging file format to hold pre-built
Python interpreters, that reuses existing Python packaging standards as much as
possible.
Examples
========
Example pybi builds are available at `pybi.vorpus.org
<https://pybi.vorpus.org>`__. They're zip files, so you can unpack them and poke
around inside if you want to get a feel for how they're laid out.
You can also look at the `tooling I used to create them
<https://github.com/njsmith/pybi-tools>`__.
Specification
=============
Filename
--------
Filename: ``{distribution}-{version}[-{build tag}]-{platform tag}.pybi``
This matches the wheel file format defined in :pep:`427`, except dropping the
``{python tag}`` and ``{abi tag}`` and changing the extension from ``.whl``
to ``.pybi``.
For example:
- ``cpython-3.9.3-manylinux_2014.pybi``
- ``cpython-3.10b2-win_amd64.pybi``
Just like for wheels, if a pybi supports multiple platforms, you can
separate them by dots to make a “compressed tag set”:
- ``cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi``
(Though in practice this probably won't be used much, e.g. the above
filename is more idiomatically written as
``cpython-3.9.5-macosx_11_0_universal2.pybi``.)
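For illustration only (not part of the spec), here's a rough sketch of how a
tool might split a pybi filename into its components and expand a compressed
tag set; the helper name is made up:

.. code:: python

   def parse_pybi_filename(filename: str):
       # Mirrors the wheel filename convention, minus the python/abi tags.
       stem, ext = filename.rsplit(".", 1)
       assert ext == "pybi"
       parts = stem.split("-")
       if len(parts) == 3:
           distribution, version, platform = parts
           build_tag = None
       elif len(parts) == 4:
           distribution, version, build_tag, platform = parts
       else:
           raise ValueError(f"invalid pybi filename: {filename}")
       # A "compressed tag set" separates multiple platform tags with "."
       platform_tags = platform.split(".")
       return distribution, version, build_tag, platform_tags

   print(parse_pybi_filename(
       "cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi"
   ))
   # ('cpython', '3.9.5', None, ['macosx_11_0_x86_64', 'macosx_11_0_arm64'])
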
File contents
-------------
A ``.pybi`` file is a zip file, that can be unpacked directly into an
arbitrary location and then used as a self-contained Python environment.
There's no ``.data`` directory or install scheme keys, because the
Python environment knows which install scheme it's using, so it can just
put things in the right places to start with.
The “arbitrary location” part is important: the pybi can't contain any
hardcoded absolute paths. In particular, any preinstalled scripts MUST
NOT embed absolute paths in their shebang lines.
Similar to wheels' ``<package>-<version>.dist-info`` directory, the pybi archive
must contain a top-level directory named ``pybi-info/``. (Rationale: calling it
``pybi-info`` instead of ``dist-info`` makes sure that tools don't get confused
about which kind of metadata they're looking at; leaving off the
``{name}-{version}`` part is fine because only one pybi can be installed into a
given directory.) The ``pybi-info/`` directory contains at least the following
files:
- ``.../PYBI``: metadata about the archive itself, in the same
RFC822-ish format as ``METADATA`` and ``WHEEL`` files (a parsing sketch follows this list):
::
Pybi-Version: 1.0
Generator: {name} {version}
Tag: {platform tag}
Tag: {another platform tag}
Tag: {...and so on...}
Build: 1 # optional
- ``.../RECORD``: same as in wheels, except see the note about
symlinks, below.
- ``.../METADATA``: In the same format as described in the current core
metadata spec, except that the following keys are forbidden because
they don't make sense:
- ``Requires-Dist``
- ``Provides-Extra``
- ``Requires-Python``
And also there are some new, required keys described below.
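Because ``PYBI`` uses the same RFC822-ish format as ``METADATA`` and ``WHEEL``,
it can be read with the standard library's email parser. A minimal,
non-normative sketch (the archive filename is invented):

.. code:: python

   import zipfile
   from email.parser import Parser

   # Hypothetical archive name, just for illustration.
   with zipfile.ZipFile("cpython-3.10.8-manylinux_2_17_x86_64.pybi") as zf:
       raw = zf.read("pybi-info/PYBI").decode("utf-8")

   pybi = Parser().parsestr(raw)
   print(pybi["Pybi-Version"])   # e.g. "1.0"
   print(pybi.get_all("Tag"))    # all supported platform tags
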
Pybi-specific core metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here's an example of the new ``METADATA`` fields, before we give the full details::
Pybi-Environment-Marker-Variables: {"implementation_name": "cpython", "implementation_version": "3.10.8", "os_name": "posix", "platform_machine": "x86_64", "platform_system": "Linux", "python_full_version": "3.10.8", "platform_python_implementation": "CPython", "python_version": "3.10", "sys_platform": "linux"}
Pybi-Paths: {"stdlib": "lib/python3.10", "platstdlib": "lib/python3.10", "purelib": "lib/python3.10/site-packages", "platlib": "lib/python3.10/site-packages", "include": "include/python3.10", "platinclude": "include/python3.10", "scripts": "bin", "data": "."}
Pybi-Wheel-Tag: cp310-cp310-PLATFORM
Pybi-Wheel-Tag: cp310-abi3-PLATFORM
Pybi-Wheel-Tag: cp310-none-PLATFORM
Pybi-Wheel-Tag: cp39-abi3-PLATFORM
Pybi-Wheel-Tag: cp38-abi3-PLATFORM
Pybi-Wheel-Tag: cp37-abi3-PLATFORM
Pybi-Wheel-Tag: cp36-abi3-PLATFORM
Pybi-Wheel-Tag: cp35-abi3-PLATFORM
Pybi-Wheel-Tag: cp34-abi3-PLATFORM
Pybi-Wheel-Tag: cp33-abi3-PLATFORM
Pybi-Wheel-Tag: cp32-abi3-PLATFORM
Pybi-Wheel-Tag: py310-none-PLATFORM
Pybi-Wheel-Tag: py3-none-PLATFORM
Pybi-Wheel-Tag: py39-none-PLATFORM
Pybi-Wheel-Tag: py38-none-PLATFORM
Pybi-Wheel-Tag: py37-none-PLATFORM
Pybi-Wheel-Tag: py36-none-PLATFORM
Pybi-Wheel-Tag: py35-none-PLATFORM
Pybi-Wheel-Tag: py34-none-PLATFORM
Pybi-Wheel-Tag: py33-none-PLATFORM
Pybi-Wheel-Tag: py32-none-PLATFORM
Pybi-Wheel-Tag: py31-none-PLATFORM
Pybi-Wheel-Tag: py30-none-PLATFORM
Pybi-Wheel-Tag: py310-none-any
Pybi-Wheel-Tag: py3-none-any
Pybi-Wheel-Tag: py39-none-any
Pybi-Wheel-Tag: py38-none-any
Pybi-Wheel-Tag: py37-none-any
Pybi-Wheel-Tag: py36-none-any
Pybi-Wheel-Tag: py35-none-any
Pybi-Wheel-Tag: py34-none-any
Pybi-Wheel-Tag: py33-none-any
Pybi-Wheel-Tag: py32-none-any
Pybi-Wheel-Tag: py31-none-any
Pybi-Wheel-Tag: py30-none-any
Specification:
- ``Pybi-Environment-Marker-Variables``: The value of all PEP 508
environment marker variables that are static across installs of this
Pybi, as a JSON dict. So for example:
- ``python_version`` will always be present, because a Python 3.10 package
always has ``python_version == "3.10"``.
- ``platform_version`` will generally not be present, because it gives
detailed information about the OS where Python is running, for example::
#60-Ubuntu SMP Thu May 6 07:46:32 UTC 2021
``platform_release`` has similar issues.
- ``platform_machine`` will *usually* be present, except for macOS universal2
pybis: these can potentially be run in either x86-64 or arm64 mode, and we
don't know which until the interpreter is actually invoked, so we can't
record it in static metadata.
**Rationale:** In many cases, this should allow a resolver running on Linux
to compute package pins for a Python environment on Windows, or vice-versa,
so long as the resolver has access to the target platform's .pybi file; a
sketch of such cross-platform marker evaluation appears below. (Note
that ``Requires-Python`` constraints can be checked by using the
``python_full_version`` value.) While we have to leave out a few keys
sometimes, they're either fairly useless (``platform_version``,
``platform_release``) or can be reconstructed by the resolver
(``platform_machine``).
The markers are also just generally useful information to have
accessible. For example, if you have a ``pypy3-7.3.2`` pybi, and you
want to know what version of the Python language it supports, then
that's recorded in the ``python_version`` marker.
(Note: we may want to deprecate/remove ``platform_version`` and
``platform_release``? They're problematic and I can't figure out any cases
where they're useful. But that's out of scope of this particular PEP.)
- ``Pybi-Paths``: The install paths needed to install wheels (same keys
as ``sysconfig.get_paths()``), as relative paths starting at the root
of the zip file, as a JSON dict.
These paths MUST be written in Unix format, using forward slashes as
a separator, not backslashes.
It must be possible to invoke the Python interpreter by running
``{paths["scripts"]}/python``. If there are alternative interpreter
entry points (e.g. ``pythonw`` for Windows GUI apps), then they
should also be in that directory under their conventional names, with
no version number attached. (You can *also* have a ``python3.11``
symlink if you want; there's no rule against that. It's just that
``python`` has to exist and work.)
**Rationale:** ``Pybi-Paths`` and ``Pybi-Wheel-Tag``\ s (see below) are
together enough to let an installer choose wheels and install them into an
unpacked pybi environment, without invoking Python. Besides, we need to write
down the interpreter location somewhere, so it's two birds with one stone.
- ``Pybi-Wheel-Tag``: The wheel tags supported by this interpreter, in
preference order (most-preferred first, least-preferred last), except
that the special platform tag ``PLATFORM`` should replace any
platform tags that depend on the final installation system.
**Discussion:** It would be nice™ if installers could compute a pybi's
corresponding wheel tags ahead of time, so that they could install
wheels into the unpacked pybi without needing to actually invoke the
Python interpreter to query its tags, both for efficiency and to
allow for more exotic use cases like setting up a Windows environment
from a Linux host.
But unfortunately, it's impossible to compute the full set of
platform tags supported by a Python installation ahead of time,
because they can depend on the final system:
- A pybi tagged ``manylinux_2_12_x86_64`` can always use wheels
tagged as ``manylinux_2_12_x86_64``. It also *might* be able to
use wheels tagged ``manylinux_2_17_x86_64``, but only if the final
installation system has glibc 2.17+.
- A pybi tagged ``macosx_11_0_universal2`` (= x86-64 + arm64 support
in the same binary) might be able to use wheels tagged as
``macosx_11_0_arm64``, but only if it's installed on an “Apple
Silicon” machine and running in arm64 mode.
In these two cases, an installation tool can still work out the
appropriate set of wheel tags by computing the local platform tags,
taking the wheel tag templates from ``Pybi-Wheel-Tag``, and swapping
in the actual supported platforms in place of the magic ``PLATFORM``
string.
However, there are other cases that are even more complicated:
- You can (usually) run both 32- and 64-bit apps on 64-bit Windows. So a pybi
installer might compute the set of allowable pybi tags on the current
platform as [``win32``, ``win_amd64``]. But you can't then just take that
set and swap it into the pybi's wheel tag template or you get nonsense:
::
[
"cp39-cp39-win32",
"cp39-cp39-win_amd64",
"cp39-abi3-win32",
"cp39-abi3-win_amd64",
...
]
To handle this, the installer needs to somehow understand that a
``manylinux_2_12_x86_64`` pybi can use a ``manylinux_2_17_x86_64`` wheel
as long as those are both valid tags on the current machine, but a
``win32`` pybi *can't* use a ``win_amd64`` wheel, even if those are both
valid tags on the current machine.
- A pybi tagged ``macosx_11_0_universal2`` might be able to use
wheels tagged as ``macosx_11_0_x86_64``, but only if it's
installed on an x86-64 machine *or* it's installed on an ARM
machine *and* the interpreter is invoked with the magic
incantation that tells macOS to run a binary in x86-64 mode. So
how the installer plans to invoke the pybi matters too!
So actually using ``Pybi-Wheel-Tag`` values is less trivial than it
might seem, and they're probably only useful with fairly
sophisticated tooling (a sketch of the basic ``PLATFORM`` substitution
follows this list). But, smart pybi installers will already have
to understand a lot of these platform compatibility issues in order
to select a working pybi, and for the cross-platform
pinning/environment building case, users can potentially provide
whatever information is needed to disambiguate exactly what platform
they're targeting. So, it's still useful enough to include in the PyBI
metadata -- tools that don't find it useful can simply ignore it.
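As a non-normative sketch of the simple case described above, here's roughly
what the ``PLATFORM`` substitution might look like for an installer that
already knows which platform tags the target system supports. The function and
example values are illustrative only, and as discussed above a real installer
needs extra logic for cases like ``win32`` vs ``win_amd64``:

.. code:: python

   import json

   def expand_wheel_tags(pybi_wheel_tags, target_platform_tags):
       # Replace the magic PLATFORM placeholder with the platform tags the
       # target system actually supports, keeping the preference order.
       expanded = []
       for template in pybi_wheel_tags:
           if template.endswith("-PLATFORM"):
               for plat in target_platform_tags:
                   expanded.append(template[: -len("PLATFORM")] + plat)
           else:
               # e.g. "py3-none-any" passes through unchanged
               expanded.append(template)
       return expanded

   # A manylinux_2_12 pybi being installed on a glibc 2.17+ x86-64 system:
   print(expand_wheel_tags(
       ["cp310-cp310-PLATFORM", "cp310-abi3-PLATFORM", "py3-none-any"],
       ["manylinux_2_17_x86_64", "manylinux_2_12_x86_64"],
   ))

   # Pybi-Paths then says where the interpreter lives, relative to wherever
   # the pybi was unpacked:
   pybi_paths = json.loads('{"stdlib": "lib/python3.10", "scripts": "bin", "data": "."}')
   python_exe = pybi_paths["scripts"] + "/python"
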
You can probably generate these metadata values by running this script on the
built interpreter:
.. code:: python

   import packaging.markers
   import packaging.tags
   import sysconfig
   import os.path
   import json
   import sys

   marker_vars = packaging.markers.default_environment()
   # Delete any keys that depend on the final installation
   del marker_vars["platform_release"]
   del marker_vars["platform_version"]
   # Darwin binaries are often multi-arch, so play it safe and
   # delete the architecture marker. (Better would be to only
   # do this if the pybi actually is multi-arch.)
   if marker_vars["sys_platform"] == "darwin":
       del marker_vars["platform_machine"]

   # Copied and tweaked version of packaging.tags.sys_tags
   tags = []
   interp_name = packaging.tags.interpreter_name()
   if interp_name == "cp":
       tags += list(packaging.tags.cpython_tags(platforms=["xyzzy"]))
   else:
       tags += list(packaging.tags.generic_tags(platforms=["xyzzy"]))
   tags += list(packaging.tags.compatible_tags(platforms=["xyzzy"]))
   # Gross hack: packaging.tags normalizes platforms by lowercasing them,
   # so we generate the tags with a unique string and then replace it
   # with our special uppercase placeholder.
   str_tags = [str(t).replace("xyzzy", "PLATFORM") for t in tags]

   (base_path,) = sysconfig.get_config_vars("installed_base")
   # For some reason, macOS framework builds report their
   # installed_base as a directory deep inside the framework.
   while "Python.framework" in base_path:
       base_path = os.path.dirname(base_path)
   paths = {
       key: os.path.relpath(path, base_path).replace("\\", "/")
       for (key, path) in sysconfig.get_paths().items()
   }

   json.dump({"marker_vars": marker_vars, "tags": str_tags, "paths": paths}, sys.stdout)
This emits a JSON dict on stdout with separate entries for each piece of
pybi-specific metadata: the marker variables, the wheel tags, and the install paths.
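And here's a non-normative sketch of the other direction: a resolver using a
pybi's recorded ``Pybi-Environment-Marker-Variables`` to evaluate a
dependency's environment marker for the target platform, without running that
interpreter (values abbreviated from the earlier example):

.. code:: python

   import json
   from packaging.markers import Marker
   from packaging.specifiers import SpecifierSet

   # Parsed out of the target pybi's METADATA (abbreviated):
   target_env = json.loads(
       '{"python_version": "3.10", "python_full_version": "3.10.8",'
       ' "sys_platform": "linux", "os_name": "posix",'
       ' "platform_system": "Linux", "platform_machine": "x86_64",'
       ' "implementation_name": "cpython",'
       ' "platform_python_implementation": "CPython",'
       ' "implementation_version": "3.10.8"}'
   )

   # A dependency like:  colorama ; sys_platform == "win32"
   marker = Marker('sys_platform == "win32"')
   print(marker.evaluate(environment=target_env))  # False for this pybi

   # A Requires-Python constraint can be checked against python_full_version:
   print(SpecifierSet(">=3.9").contains(target_env["python_full_version"]))  # True
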
Symlinks
--------
Currently, symlinks are used by default in all Unix Python installs (e.g.,
``bin/python3 -> bin/python3.9``). And furthermore, symlinks are *required* to
store macOS framework builds in ``.pybi`` files. So, unlike wheel files, we
absolutely have to support symlinks in ``.pybi`` files for them to be useful at
all.
Representing symlinks in zip files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The de-facto standard for representing symlinks in zip files is the
Info-Zip symlink extension, which works as follows:
- The symlink's target path is stored as if it were the file contents
- The top 4 bits of the Unix permissions field are set to ``0xa``,
i.e.: ``permissions & 0xf000 == 0xa000``
- The Unix permissions field, in turn, is stored as the top 16 bits of
the “external attributes” field.
So if using Python's ``zipfile`` module, you can check whether a
``ZipInfo`` represents a symlink by doing:

.. code:: python

   (zip_info.external_attr >> 16) & 0xf000 == 0xa000
Or if using Rust's ``zip`` crate, the equivalent check is:

.. code:: rust

   fn is_symlink(zip_file: &zip::ZipFile) -> bool {
       match zip_file.unix_mode() {
           Some(mode) => mode & 0xf000 == 0xa000,
           None => false,
       }
   }
If you're on Unix, your ``zip`` and ``unzip`` commands probably understand this
format already.
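For pybi creators, here's a non-normative sketch of writing an Info-Zip-style
symlink entry with Python's ``zipfile`` module (the archive contents are
invented):

.. code:: python

   import zipfile

   def add_symlink(zf, name, target):
       # The symlink's target is stored as the entry's "contents", and the
       # Unix mode bits in the top 16 bits of external_attr mark it as a
       # symlink (S_IFLNK == 0xa000).
       zi = zipfile.ZipInfo(name)
       zi.create_system = 3  # 3 = Unix, so tools trust the mode bits
       zi.external_attr = (0xA000 | 0o777) << 16
       zf.writestr(zi, target)

   with zipfile.ZipFile("example.pybi", "w") as zf:
       zf.writestr("bin/python3.11", b"(pretend this is the interpreter)")
       add_symlink(zf, "bin/python3", "python3.11")
       add_symlink(zf, "bin/python", "python3.11")
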
Representing symlinks in RECORD files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Normally, a ``RECORD`` file lists each file + its hash + its length:
.. code:: text

   my/favorite/file,sha256=...,12345
For symlinks, we instead write:
.. code:: text

   name/of/symlink,symlink=path/to/symlink/target,
That is: we use a special “hash function” called ``symlink``, and then
store the actual symlink target as the “hash value”. And the length is
left empty.
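Since ``RECORD`` is just a CSV file, telling the two kinds of entries apart is
straightforward; a non-normative sketch with made-up entries:

.. code:: python

   import csv
   import io

   # Two made-up RECORD lines: a regular file and a symlink.
   record = io.StringIO(
       "lib/python3.10/os.py,sha256=abc123,12345\n"
       "bin/python3,symlink=python3.11,\n"
   )
   for name, hash_field, size in csv.reader(record):
       if hash_field.startswith("symlink="):
           print(f"{name} -> {hash_field[len('symlink='):]}")
       else:
           print(f"{name}: {hash_field}, {size} bytes")
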
**Rationale:** we're already committed to the ``RECORD`` file containing a
redundant check on everything in the main archive, so for symlinks we at least
need to store some kind of hash, plus some kind of flag to indicate that this is
a symlink. Given that symlink target strings are roughly the same size as a
hash, we might as well store them directly. This also makes the symlink
information easier to access for tools that don't understand the Info-Zip
symlink extension, and makes it possible to losslessly unpack and repack a Unix
pybi on a Windows system, which someone might find handy at some point.
Storing symlinks in ``pybi`` files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a pybi creator stores a symlink, they MUST use both of the
mechanisms defined above: storing it in the zip archive directly using
the Info-Zip representation, and also recording it in the ``RECORD``
file.
Pybi consumers SHOULD validate that the symlinks in the archive and
``RECORD`` file are consistent with each other.
We also considered using *only* the ``RECORD`` file to store symlinks,
but then the vanilla ``unzip`` tool wouldn't be able to unpack them, and
that would make it hard to install a pybi from a shell script.
Limitations
~~~~~~~~~~~
Symlinks enable a lot of potential messiness. To keep things under
control, we impose the following restrictions:
- Symlinks MUST NOT be used in ``.pybi``\ s targeting Windows, or other
platforms that are missing first-class symlink support.
- Symlinks MUST NOT be used inside the ``pybi-info`` directory.
(Rationale: there's no need, and it makes things simpler for
resolvers that need to extract info from ``pybi-info`` without
unpacking the whole archive.)
- Symlink targets MUST be relative paths, and MUST be inside the pybi
directory.
- If ``A/B/...`` is recorded as a symlink in the archive, then there
MUST NOT be any other entries in the archive named like
``A/B/.../C``.
For example, if an archive has a symlink ``foo -> bar``, and then
later in the archive there's a regular file named ``foo/blah.py``,
then a naive unpacker could potentially end up writing a file called
``bar/blah.py``. Don't be naive.
Unpackers MUST verify that these rules are followed, because without
them attackers could create evil symlinks like ``foo -> /etc/passwd`` or
``foo -> ../../../../../etc`` + ``foo/passwd -> ...`` and cause havoc.
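Here's a non-normative sketch of the kind of checks an unpacker might perform
before trusting a symlink entry (the helper name and error messages are made
up):

.. code:: python

   import posixpath

   def check_symlink(link_name, target, all_archive_names):
       # Targets must be relative paths...
       if posixpath.isabs(target):
           raise ValueError(f"absolute symlink target: {target!r}")
       # ...and must resolve to somewhere inside the unpacked pybi.
       resolved = posixpath.normpath(
           posixpath.join(posixpath.dirname(link_name), target)
       )
       if resolved == ".." or resolved.startswith("../"):
           raise ValueError(f"symlink {link_name!r} escapes the pybi: {target!r}")
       # No other archive entry may be nested "underneath" a symlink.
       prefix = link_name.rstrip("/") + "/"
       for other in all_archive_names:
           if other.startswith(prefix):
               raise ValueError(f"{other!r} is nested under symlink {link_name!r}")

   # OK:
   check_symlink("bin/python3", "python3.11", {"bin/python3", "bin/python3.11"})
   # Would raise: the classic "escape the unpack directory" attack.
   # check_symlink("foo", "../../../../etc/passwd", {"foo"})
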
Non-normative comments
======================
Why not just use conda?
-----------------------
This isn't really in the scope of this PEP, but since conda is a popular way to
distribute binary Python interpreters, it's a natural question.
The simple answer is: conda is great! But, there are lots of python users who
aren't conda users, and they deserve nice things too. This PEP just gives them
another option.
The deeper answer is: the maintainers who upload packages to PyPI are the
backbone of the Python ecosystem. They're the first audience for Python
packaging tools. And one thing they want is to upload a package once, and have
it be accessible across all the different ways Python is deployed: in Debian and
Fedora and Homebrew and FreeBSD, in Conda environments, in big companies'
monorepos, in Nix, in Blender plugins, in RenPy games, ..... you get the idea.
All of these environments have their own tooling and strategies for managing
packages and dependencies. So what's special about PyPI and wheels is that
they're designed to describe dependencies in a *standard, abstract way*, that
all these downstream systems can consume and convert into their local
conventions. That's why package maintainers use Python-specific metadata and
upload to PyPI: because it lets them address all of those systems
simultaneously. Every time you build a Python package for conda, there's an
intermediate wheel that's generated, because wheels are the common language that
Python package build systems and conda can use to talk to each other.
But then, if you're a maintainer releasing an sdist+wheels, then you naturally
want to test what you're releasing, which may depend on arbitrary PyPI packages
and versions. So you need tools that build Python environments directly from
PyPI, and conda is fundamentally not designed to do that. So conda and pip are
both necessary for different cases, and this proposal happens to be targeting
the pip side of that equation.
Sdists (or not)
---------------
It might be cool to have an “sdist” equivalent for pybis, i.e., some
kind of format for a Python source release that's structured enough to
let tools automatically fetch and build it into a pybi, for platforms
where prebuilt pybis aren't available. But, this isn't necessary for the
MVP and opens a can of worms, so let's worry about it later.
What packages should be bundled inside a pybi?
----------------------------------------------
Pybi builders have the power to pick and choose what exactly goes inside. For
example, you could include some preinstalled packages in the pybi's
``site-packages`` directory, or prune out bits of the stdlib that you don't
want. We can't stop you! Though if you do preinstall packages, then it's
strongly recommended to also include the correct metadata (``.dist-info`` etc.),
so that it's possible for pip or other tools to figure out what's going on.
For my prototype “general purpose” pybis, what I chose is:
- Make sure ``site-packages`` is *empty*.
**Rationale:** for traditional standalone python installers that are targeted
at end-users, you probably want to include at least ``pip``, to avoid
bootstrapping issues (:pep:`453`). But pybis are different: they're designed
to be installed by “smart” tooling that consumes the pybi as part of some
kind of larger automated deployment process. It's easier for these installers
to start from a blank slate and then add whatever they need, than for them to
start with some preinstalled packages that they may or may not want. (And
besides, you can still run ``python -m ensurepip``.)
- Include the full stdlib, *except* for ``test``.
**Rationale:** the top-level ``test`` module contains CPython's own test
suite. It's huge (CPython without ``test`` is ~37 MB, then ``test``
adds another ~25 MB on top of that!), and essentially never used by
regular user code. Also, as precedent, the official nuget packages,
the official manylinux images, and multiple Linux distributions all
leave it out, and this hasn't caused any major problems.
So this seems like the best way to balance broad compatibility with
reasonable download/install sizes.
- I'm not shipping any ``.pyc`` files. They take up space in the
download, can be generated on the final system at minimal cost, and
dropping them removes a source of location-dependence. (``.pyc``
files store the absolute path of the corresponding ``.py`` file and
include it in tracebacks; but, pybis are relocatable, so the correct
path isn't known until after install.)
Backwards Compatibility
=======================
No backwards compatibility considerations.
Security Implications
=====================
No security implications, beyond the fact that anyone who takes it upon
themselves to distribute binaries has to come up with a plan to manage their
security (e.g., whether they roll a new build after an OpenSSL CVE drops). But
collectively, we core Python folks are already maintaining binary builds for all
major platforms (macOS + Windows through python.org, and Linux builds through
the official manylinux image), so even if we do start releasing official CPython
builds on PyPI it doesn't really raise any new security issues.
How to Teach This
=================
This isn't targeted at end-users; their experience will simply be that e.g.
their pyenv or tox invocation magically gets faster and more reliable (if those
projects' maintainers decide to take advantage of this PEP).
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.