PEP: 513 Title: A Platform Tag for Portable Linux Built Distributions Version: $Revision$ Last-Modified: $Date$ Author: Robert T. McGibbon , Nathaniel J. Smith BDFL-Delegate: Nick Coghlan Discussions-To: Distutils SIG Status: Active Type: Informational Content-Type: text/x-rst Created: 19-Jan-2016 Post-History: 19-Jan-2016, 25-Jan-2016, 29-Jan-2016 Resolution: https://mail.python.org/pipermail/distutils-sig/2016-January/028211.html Abstract ======== This PEP proposes the creation of a new platform tag for Python package built distributions, such as wheels, called ``manylinux1_{x86_64,i686}`` with external dependencies limited to a standardized, restricted subset of the Linux kernel and core userspace ABI. It proposes that PyPI support uploading and distributing wheels with this platform tag, and that ``pip`` support downloading and installing these packages on compatible platforms. Rationale ========= Currently, distribution of binary Python extensions for Windows and OS X is straightforward. Developers and packagers build wheels [1]_ [2]_, which are assigned platform tags such as ``win32`` or ``macosx_10_6_intel``, and upload these wheels to PyPI. Users can download and install these wheels using tools such as ``pip``. For Linux, the situation is much more delicate. In general, compiled Python extension modules built on one Linux distribution will not work on other Linux distributions, or even on different machines running the same Linux distribution with different system libraries installed. Build tools using PEP 425 platform tags [3]_ do not track information about the particular Linux distribution or installed system libraries, and instead assign all wheels the too-vague ``linux_i686`` or ``linux_x86_64`` tags. Because of this ambiguity, there is no expectation that ``linux``-tagged built distributions compiled on one machine will work properly on another, and for this reason, PyPI has not permitted the uploading of wheels for Linux. It would be ideal if wheel packages could be compiled that would work on *any* linux system. But, because of the incredible diversity of Linux systems -- from PCs to Android to embedded systems with custom libcs -- this cannot be guaranteed in general. Instead, we define a standard subset of the kernel+core userspace ABI that, in practice, is compatible enough that packages conforming to this standard will work on *many* linux systems, including essentially all of the desktop and server distributions in common use. We know this because there are companies who have been distributing such widely-portable pre-compiled Python extension modules for Linux -- e.g. Enthought with Canopy [4]_ and Continuum Analytics with Anaconda [5]_. Building on the compatibility lessons learned from these companies, we thus define a baseline ``manylinux1`` platform tag for use by binary Python wheels, and introduce the implementation of preliminary tools to aid in the construction of these ``manylinux1`` wheels. Key Causes of Inter-Linux Binary Incompatibility ================================================ To properly define a standard that will guarantee that wheel packages meeting this specification will operate on *many* linux platforms, it is necessary to understand the root causes which often prevent portability of pre-compiled binaries on Linux. The two key causes are dependencies on shared libraries which are not present on users' systems, and dependencies on particular versions of certain core libraries like ``glibc``. External Shared Libraries ------------------------- Most desktop and server linux distributions come with a system package manager (examples include ``APT`` on Debian-based systems, ``yum`` on ``RPM``-based systems, and ``pacman`` on Arch linux) that manages, among other responsibilities, the installation of shared libraries installed to system directories such as ``/usr/lib``. Most non-trivial Python extensions will depend on one or more of these shared libraries, and thus function properly only on systems where the user has the proper libraries (and the proper versions thereof), either installed using their package manager, or installed manually by setting certain environment variables such as ``LD_LIBRARY_PATH`` to notify the runtime linker of the location of the depended-upon shared libraries. Versioning of Core Shared Libraries ----------------------------------- Even if the developers a Python extension module wish to use no external shared libraries, the modules will generally have a dynamic runtime dependency on the GNU C library, ``glibc``. While it is possible, statically linking ``glibc`` is usually a bad idea because certain important C functions like ``dlopen()`` cannot be called from code that statically links ``glibc``. A runtime shared library dependency on a system-provided ``glibc`` is unavoidable in practice. The maintainers of the GNU C library follow a strict symbol versioning scheme for backward compatibility. This ensures that binaries compiled against an older version of ``glibc`` can run on systems that have a newer ``glibc``. The opposite is generally not true -- binaries compiled on newer Linux distributions tend to rely upon versioned functions in ``glibc`` that are not available on older systems. This generally prevents wheels compiled on the latest Linux distributions from being portable. The ``manylinux1`` policy ========================= For these reasons, to achieve broad portability, Python wheels * should depend only on an extremely limited set of external shared libraries; and * should depend only on "old" symbol versions in those external shared libraries; and * should depend only on a widely-compatible kernel ABI. To be eligible for the ``manylinux1`` platform tag, a Python wheel must therefore both (a) contain binary executables and compiled code that links *only* to libraries with SONAMEs included in the following list: :: libpanelw.so.5 libncursesw.so.5 libgcc_s.so.1 libstdc++.so.6 libm.so.6 libdl.so.2 librt.so.1 libcrypt.so.1 libc.so.6 libnsl.so.1 libutil.so.1 libpthread.so.0 libX11.so.6 libXext.so.6 libXrender.so.1 libICE.so.6 libSM.so.6 libGL.so.1 libgobject-2.0.so.0 libgthread-2.0.so.0 libglib-2.0.so.0 and, (b) work on a stock CentOS 5.11 [6]_ system that contains the system package manager's provided versions of these libraries. Because CentOS 5 is only available for x86_64 and i686 architectures, these are the only architectures currently supported by the ``manylinux1`` policy. On Debian-based systems, these libraries are provided by the packages :: libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6 libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0 On RPM-based systems, these libraries are provided by the packages :: ncurses libgcc libstdc++ glibc libXext libXrender libICE libSM mesa-libGL glib2 This list was compiled by checking the external shared library dependencies of the Canopy [4]_ and Anaconda [5]_ distributions, which both include a wide array of the most popular Python modules and have been confirmed in practice to work across a wide swath of Linux systems in the wild. Many of the permitted system libraries listed above use symbol versioning schemes for backward compatibility. The latest symbol versions provided with the CentOS 5.11 versions of these libraries are: :: GLIBC_2.5 CXXABI_3.4.8 GLIBCXX_3.4.9 GCC_4.2.0 Therefore, as a consequence of requirement (b), any wheel that depends on versioned symbols from the above shared libraries may depend only on symbols with the following versions: :: GLIBC <= 2.5 CXXABI <= 3.4.8 GLIBCXX <= 3.4.9 GCC <= 4.2.0 These recommendations are the outcome of the relevant discussions in January 2016 [7]_, [8]_. Note that in our recommendations below, we do not suggest that ``pip`` or PyPI should attempt to check for and enforce the details of this policy (just as they don't check for and enforce the details of existing platform tags like ``win32``). The text above is provided (a) as advice to package builders, and (b) as a method for allocating blame if a given wheel doesn't work on some system: if it satisfies the policy above, then this is a bug in the spec or the installation tool; if it does not satisfy the policy above, then it's a bug in the wheel. One useful consequence of this approach is that it leaves open the possibility of further updates and tweaks as we gain more experience, e.g., we could have a "manylinux 1.1" policy which targets the same systems and uses the same ``manylinux1`` platform tag (and thus requires no further changes to ``pip`` or PyPI), but that adjusts the list above to remove libraries that have turned out to be problematic or add libraries that have turned out to be safe. libpythonX.Y.so.1 ----------------- Note that ``libpythonX.Y.so.1`` is *not* on the list of libraries that a ``manylinux1`` extension is allowed to link to. Explicitly linking to ``libpythonX.Y.so.1`` is unnecessary in almost all cases: the way ELF linking works, extension modules that are loaded into the interpreter automatically get access to all of the interpreter's symbols, regardless of whether or not the extension itself is explicitly linked against libpython. Furthermore, explicit linking to libpython creates problems in the common configuration where Python is not built with ``--enable-shared``. In particular, on Debian and Ubuntu systems, ``apt install pythonX.Y`` does not even install ``libpythonX.Y.so.1``, meaning that any wheel that *did* depend on ``libpythonX.Y.so.1`` could fail to import. There is one situation where extensions that are linked in this way can fail to work: if a host program (e.g., ``apache2``) uses ``dlopen()`` to load a module (e.g., ``mod_wsgi``) that embeds the CPython interpreter, and the host program does *not* pass the ``RTLD_GLOBAL`` flag to ``dlopen()``, then the embedded CPython will be unable to load any extension modules that do not themselves link explicitly to ``libpythonX.Y.so.1``. Fortunately, ``apache2`` *does* set the ``RTLD_GLOBAL`` flag, as do all the other programs that embed-CPython-via-a-dlopened-plugin that we could locate, so this does not seem to be a serious problem in practice. The incompatibility with Debian/Ubuntu is more of an issue than the theoretical incompatibility with a rather obscure corner case. This is a rather complex and subtle issue that extends beyond the scope of ``manylinux1``; for more discussion see: [9]_, [10]_, [11]_. UCS-2 vs UCS-4 builds --------------------- All versions of CPython 2.x, plus CPython 3.0-3.2 inclusive, can be built in two ABI-incompatible modes: builds using the ``--enable-unicode=ucs2`` configure flag store Unicode data in UCS-2 (or really UTF-16) format, while builds using the ``--enable-unicode=ucs4`` configure flag store Unicode data in UCS-4. (CPython 3.3 and greater use a different storage method that always supports UCS-4.) If we want to make sure ``ucs2`` wheels don't get installed into ``ucs4`` CPythons and vice-versa, then something must be done. An earlier version of this PEP included a requirement that ``manylinux1`` wheels targeting these older CPython versions should always use the ``ucs4`` ABI. But then, in between the PEP's initial acceptance and its implementation, ``pip`` and ``wheel`` gained first-class support for tracking and checking this aspect of ABI compatibility for the relevant CPython versions, which is a better solution. So we now allow the ``manylinux1`` platform tags to be used in combination with any ABI tag. However, to maintain compatibility it is crucial to ensure that all ``manylinux1`` wheels include a non-trivial abi tag. For example, a wheel built against a ``ucs4`` CPython might have a name like:: PKG-VERSION-cp27-cp27mu-manylinux1_x86_64.whl ^^^^^^ Good! While a wheel built against the ``ucs2`` ABI might have a name like:: PKG-VERSION-cp27-cp27m-manylinux1_x86_64.whl ^^^^^ Okay! But you should never have a wheel with a name like:: PKG-VERSION-cp27-none-manylinux1_x86_64.whl ^^^^ BAD! Don't do this! We note for information that the ``ucs4`` ABI appears to be much more widespread among Linux CPython distributors. Compilation of Compliant Wheels =============================== The way glibc, libgcc, and libstdc++ manage their symbol versioning means that in practice, the compiler toolchains that most developers use to do their daily work are incapable of building ``manylinux1``-compliant wheels. Therefore, we do not attempt to change the default behavior of ``pip wheel`` / ``bdist_wheel``: they will continue to generate regular ``linux_*`` platform tags, and developers who wish to use them to generate ``manylinux1``-tagged wheels will have to change the tag as a second post-processing step. To support the compilation of wheels meeting the ``manylinux1`` standard, we provide initial drafts of two tools. Docker Image ------------ The first tool is a Docker image based on CentOS 5.11, which is recommended as an easy to use self-contained build box for compiling ``manylinux1`` wheels [12]_. Compiling on a more recently-released linux distribution will generally introduce dependencies on too-new versioned symbols. The image comes with a full compiler suite installed (``gcc``, ``g++``, and ``gfortran`` 4.8.2) as well as the latest releases of Python and ``pip``. Auditwheel ---------- The second tool is a command line executable called ``auditwheel`` [13]_ that may aid in package maintainers in dealing with third-party external dependencies. There are at least three methods for building wheels that use third-party external libraries in a way that meets the above policy. 1. The third-party libraries can be statically linked. 2. The third-party shared libraries can be distributed in separate packages on PyPI which are depended upon by the wheel. 3. The third-party shared libraries can be bundled inside the wheel libraries, linked with a relative path. All of these are valid option which may be effectively used by different packages and communities. Statically linking generally requires package-specific modifications to the build system, and distributing third-party dependencies on PyPI may require some coordination of the community of users of the package. As an often-automatic alternative to these options, we introduce ``auditwheel``. The tool inspects all of the ELF files inside a wheel to check for dependencies on versioned symbols or external shared libraries, and verifies conformance with the ``manylinux1`` policy. This includes the ability to add the new platform tag to conforming wheels. More importantly, ``auditwheel`` has the ability to automatically modify wheels that depend on external shared libraries by copying those shared libraries from the system into the wheel itself, and modifying the appropriate ``RPATH`` entries such that these libraries will be picked up at runtime. This accomplishes a similar result as if the libraries had been statically linked without requiring changes to the build system. Packagers are advised that bundling, like static linking, may implicate copyright concerns. Bundled Wheels on Linux ======================= While we acknowledge many approaches for dealing with third-party library dependencies within ``manylinux1`` wheels, we recognize that the ``manylinux1`` policy encourages bundling external dependencies, a practice which runs counter to the package management policies of many linux distributions' system package managers [14]_, [15]_. The primary purpose of this is cross-distro compatibility. Furthermore, ``manylinux1`` wheels on PyPI occupy a different niche than the Python packages available through the system package manager. The decision in this PEP to encourage departure from general Linux distribution unbundling policies is informed by the following concerns: 1. In these days of automated continuous integration and deployment pipelines, publishing new versions and updating dependencies is easier than it was when those policies were defined. 2. ``pip`` users remain free to use the ``"--no-binary"`` option if they want to force local builds rather than using pre-built wheel files. 3. The popularity of modern container based deployment and "immutable infrastructure" models involve substantial bundling at the application layer anyway. 4. Distribution of bundled wheels through PyPI is currently the norm for Windows and OS X. 5. This PEP doesn't rule out the idea of offering more targeted binaries for particular Linux distributions in the future. The model described in this PEP is most ideally suited for cross-platform Python packages, because it means they can reuse much of the work that they're already doing to make static Windows and OS X wheels. We recognize that it is less optimal for Linux-specific packages that might prefer to interact more closely with Linux's unique package management functionality and only care about targeting a small set of particular distos. Security Implications --------------------- One of the advantages of dependencies on centralized libraries in Linux is that bugfixes and security updates can be deployed system-wide, and applications which depend on these libraries will automatically feel the effects of these patches when the underlying libraries are updated. This can be particularly important for security updates in packages engaged in communication across the network or cryptography. ``manylinux1`` wheels distributed through PyPI that bundle security-critical libraries like OpenSSL will thus assume responsibility for prompt updates in response disclosed vulnerabilities and patches. This closely parallels the security implications of the distribution of binary wheels on Windows that, because the platform lacks a system package manager, generally bundle their dependencies. In particular, because it lacks a stable ABI, OpenSSL cannot be included in the ``manylinux1`` profile. Platform Detection for Installers ================================= Above, we defined what it means for a *wheel* to be ``manylinux1``-compatible. Here we discuss what it means for a *Python installation* to be ``manylinux1``-compatible. In particular, this is important for tools like ``pip`` to know when deciding whether or not they should consider ``manylinux1``-tagged wheels for installation. Because the ``manylinux1`` profile is already known to work for the many thousands of users of popular commercial Python distributions, we suggest that installation tools should error on the side of assuming that a system *is* compatible, unless there is specific reason to think otherwise. We know of four main sources of potential incompatibility that are likely to arise in practice: * Eventually, in the future, there may exist distributions that break compatibility with this profile (e.g., if one of the libraries in the profile changes its ABI in a backwards-incompatible way) * A linux distribution that is too old (e.g. RHEL 4) * A linux distribution that does not use ``glibc`` (e.g. Alpine Linux, which is based on musl ``libc``, or Android) To address these we propose a two-pronged approach. To handle potential future incompatibilities, we standardize a mechanism for a Python distributor to signal that a particular Python install definitely is or is not compatible with ``manylinux1``: this is done by installing a module named ``_manylinux``, and setting its ``manylinux1_compatible`` attribute. We do not propose adding any such module to the standard library -- this is merely a well-known name by which distributors and installation tools can rendezvous. However, if a distributor does add this module, *they should add it to the standard library* rather than to a ``site-packages/`` directory, because the standard library is inherited by virtualenvs (which we want), and ``site-packages/`` in general is not. Then, to handle the last two cases for existing Python distributions, we suggest a simple and reliable method to check for the presence and version of ``glibc`` (basically using it as a "clock" for the overall age of the distribution). Specifically, the algorithm we propose is:: def is_manylinux1_compatible(): # Only Linux, and only x86-64 / i686 from distutils.util import get_platform if get_platform() not in ["linux-x86_64", "linux-i686"]: return False # Check for presence of _manylinux module try: import _manylinux return bool(_manylinux.manylinux1_compatible) except (ImportError, AttributeError): # Fall through to heuristic check below pass # Check glibc version. CentOS 5 uses glibc 2.5. return have_compatible_glibc(2, 5) def have_compatible_glibc(major, minimum_minor): import ctypes process_namespace = ctypes.CDLL(None) try: gnu_get_libc_version = process_namespace.gnu_get_libc_version except AttributeError: # Symbol doesn't exist -> therefore, we are not linked to # glibc. return False # Call gnu_get_libc_version, which returns a string like "2.5". gnu_get_libc_version.restype = ctypes.c_char_p version_str = gnu_get_libc_version() # py2 / py3 compatibility: if not isinstance(version_str, str): version_str = version_str.decode("ascii") # Parse string and check against requested version. version = [int(piece) for piece in version_str.split(".")] assert len(version) == 2 if major != version[0]: return False if minimum_minor > version[1]: return False return True **Rejected alternatives:** We also considered using a configuration file, e.g. ``/etc/python/compatibility.cfg``. The problem with this is that a single filesystem might contain many different interpreter environments, each with their own ABI profile -- the ``manylinux1`` compatibility of a system-installed x86_64 CPython might not tell us much about the ``manylinux1`` compatibility of a user-installed i686 PyPy. Locating this configuration information within the Python environment itself ensures that it remains attached to the correct binary, and dramatically simplifies lookup code. We also considered using a more elaborate structure, like a list of all platform tags that should be considered compatible, together with their preference ordering, for example: ``_binary_compat.compatible = ["manylinux1_x86_64", "centos5_x86_64", "linux_x86_64"]``. However, this introduces several complications. For example, we want to be able to distinguish between the state of "doesn't support ``manylinux1``" (or eventually ``manylinux2``, etc.) versus "doesn't specify either way whether it supports ``manylinux1``", which is not entirely obvious in the above representation; and, it's not at all clear what features are really needed vis a vis preference ordering given that right now the only possible platform tags are ``manylinux1`` and ``linux``. So we're deferring a more complete solution here for a separate PEP, when / if Linux gets more platform tags. For the library compatibility check, we also considered much more elaborate checks (e.g. checking the kernel version, searching for and checking the versions of all the individual libraries listed in the ``manylinux1`` profile, etc.), but ultimately decided that this would be more likely to introduce confusing bugs than actually help the user. (For example: different distributions vary in where they actually put these libraries, and if our checking code failed to use the correct path search then it could easily return incorrect answers.) PyPI Support ============ PyPI should permit wheels containing the ``manylinux1`` platform tag to be uploaded. PyPI should not attempt to formally verify that wheels containing the ``manylinux1`` platform tag adhere to the ``manylinux1`` policy described in this document. This verification tasks should be left to other tools, like ``auditwheel``, that are developed separately. Rejected Alternatives ===================== One alternative would be to provide separate platform tags for each Linux distribution (and each version thereof), e.g. ``RHEL6``, ``ubuntu14_10``, ``debian_jessie``, etc. Nothing in this proposal rules out the possibility of adding such platform tags in the future, or of further extensions to wheel metadata that would allow wheels to declare dependencies on external system-installed packages. However, such extensions would require substantially more work than this proposal, and still might not be appreciated by package developers who would prefer not to have to maintain multiple build environments and build multiple wheels in order to cover all the common Linux distributions. Therefore, we consider such proposals to be out-of-scope for this PEP. Future updates ============== We anticipate that at some point in the future there will be a ``manylinux2`` specifying a more modern baseline environment (perhaps based on CentOS 6), and someday a ``manylinux3`` and so forth, but we defer specifying these until we have more experience with the initial ``manylinux1`` proposal. References ========== .. [1] PEP 427 -- The Wheel Binary Package Format 1.0 (https://www.python.org/dev/peps/pep-0427/) .. [2] PEP 491 -- The Wheel Binary Package Format 1.9 (https://www.python.org/dev/peps/pep-0491/) .. [3] PEP 425 -- Compatibility Tags for Built Distributions (https://www.python.org/dev/peps/pep-0425/) .. [4] Enthought Canopy Python Distribution (https://store.enthought.com/downloads/) .. [5] Continuum Analytics Anaconda Python Distribution (https://www.continuum.io/downloads) .. [6] CentOS 5.11 Release Notes (https://wiki.centos.org/Manuals/ReleaseNotes/CentOS5.11) .. [7] manylinux-discuss mailing list discussion (https://groups.google.com/forum/#!topic/manylinux-discuss/-4l3rrjfr9U) .. [8] distutils-sig discussion (https://mail.python.org/pipermail/distutils-sig/2016-January/027997.html) .. [9] distutils-sig discussion (https://mail.python.org/pipermail/distutils-sig/2016-February/028275.html) .. [10] github issue discussion (https://github.com/pypa/manylinux/issues/30) .. [11] python bug tracker discussion (https://bugs.python.org/issue21536) .. [12] manylinux1 docker images (Source: https://github.com/pypa/manylinux; x86-64: https://quay.io/repository/pypa/manylinux1_x86_64; x86-32: https://quay.io/repository/pypa/manylinux1_i686) .. [13] auditwheel tool (https://pypi.python.org/pypi/auditwheel) .. [14] Fedora Bundled Software Policy (https://fedoraproject.org/wiki/Bundled_Software_policy) .. [15] Debian Policy Manual -- 4.13: Convenience copies of code (https://www.debian.org/doc/debian-policy/ch-source.html#s-embeddedfiles) Copyright ========= This document has been placed into the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: