PEP: 513
Title: A Platform Tag for Portable Linux Built Distributions
Version: $Revision$
Last-Modified: $Date$
Author: Robert T. McGibbon, Nathaniel J. Smith
BDFL-Delegate: Nick Coghlan
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 19-Jan-2016
Post-History: 19-Jan-2016


Abstract
========

This PEP proposes the creation of a new platform tag for Python
package built distributions, such as wheels, called
``manylinux1_{x86_64,i386}``, with external dependencies restricted to
a standardized subset of the Linux kernel and core userspace ABI. It
proposes that PyPI support uploading and distributing wheels with this
platform tag, and that ``pip`` support downloading and installing
these packages on compatible platforms.


Rationale
=========

Currently, distribution of binary Python extensions for Windows and
OS X is straightforward. Developers and packagers build wheels, which
are assigned platform tags such as ``win32`` or
``macosx_10_6_intel``, and upload these wheels to PyPI. Users can
download and install these wheels using tools such as ``pip``.

For Linux, the situation is much more delicate. In general, compiled
Python extension modules built on one Linux distribution will not work
on other Linux distributions, or even on the same Linux distribution
with different system libraries installed.

Build tools using PEP 425 platform tags [1]_ do not track information
about the particular Linux distribution or installed system libraries,
and instead assign all wheels the too-vague ``linux_i386`` or
``linux_x86_64`` tags. Because of this ambiguity, there is no
expectation that ``linux``-tagged built distributions compiled on one
machine will work properly on another, and for this reason, PyPI has
not permitted the uploading of wheels for Linux.

It would be ideal if wheel packages could be compiled that would work
on *any* Linux system.
But, because of the incredible diversity of Linux systems -- from PCs
to Android to embedded systems with custom libcs -- this cannot be
guaranteed in general.

Instead, we define a standard subset of the kernel+core userspace ABI
that, in practice, is compatible enough that packages conforming to
this standard will work on *many* Linux systems, including essentially
all of the desktop and server distributions in common use. We know
this because there are companies that have been distributing such
widely portable pre-compiled Python extension modules for Linux --
e.g. Enthought with Canopy [2]_ and Continuum Analytics with
Anaconda [3]_.

Building on the compatibility lessons learned from these companies, we
thus define a baseline ``manylinux1`` platform tag for use by binary
Python wheels, and introduce the implementation of preliminary tools
to aid in the construction of these ``manylinux1`` wheels.


Key Causes of Inter-Linux Binary Incompatibility
================================================

To properly define a standard that will guarantee that wheel packages
meeting this specification will operate on *many* Linux platforms, it
is necessary to understand the root causes which often prevent
portability of pre-compiled binaries on Linux. The two key causes are
dependencies on shared libraries which are not present on users'
systems, and dependencies on particular versions of certain core
libraries like ``glibc``.


External Shared Libraries
-------------------------

Most desktop and server Linux distributions come with a system package
manager (examples include ``APT`` on Debian-based systems, ``yum`` on
``RPM``-based systems, and ``pacman`` on Arch Linux) that manages,
among other responsibilities, the installation of shared libraries
installed to system directories such as ``/usr/lib``.
Most non-trivial Python extensions will depend on one or more of these
shared libraries, and thus function properly only on systems where the
user has the proper libraries (and the proper versions thereof),
either installed using their package manager, or installed manually,
with environment variables such as ``LD_LIBRARY_PATH`` set to tell the
runtime linker where to find the depended-upon shared libraries.


Versioning of Core Shared Libraries
-----------------------------------

Even if the author or maintainers of a Python extension module wish to
use no external shared libraries, the modules will generally have a
dynamic runtime dependency on the GNU C library, ``glibc``. While it
is possible, statically linking ``glibc`` is usually a bad idea
because of bloat, and because certain important C functions like
``dlopen()`` cannot be called from code that statically links
``glibc``. A runtime shared library dependency on a system-provided
``glibc`` is unavoidable in practice.

The maintainers of the GNU C library follow a strict symbol versioning
scheme for backward compatibility. This ensures that binaries compiled
against an older version of ``glibc`` can run on systems that have a
newer ``glibc``. The opposite is generally not true -- binaries
compiled on newer Linux distributions tend to rely upon versioned
functions in ``glibc`` that are not available on older systems.

This generally prevents built distributions compiled on the latest
Linux distributions from being portable.


The ``manylinux1`` policy
=========================

For these reasons, to achieve broad portability, Python wheels

* should depend only on an extremely limited set of external shared
  libraries; and
* should depend only on "old" symbol versions in those external shared
  libraries.

The ``manylinux1`` policy thus encompasses a standard for which
external shared libraries a wheel may depend on, and the maximum
depended-upon symbol versions therein.


The permitted external shared libraries are: ::

    libpanelw.so.5
    libncursesw.so.5
    libgcc_s.so.1
    libstdc++.so.6
    libm.so.6
    libdl.so.2
    librt.so.1
    libcrypt.so.1
    libc.so.6
    libnsl.so.1
    libutil.so.1
    libpthread.so.0
    libX11.so.6
    libXext.so.6
    libXrender.so.1
    libICE.so.6
    libSM.so.6
    libGL.so.1
    libgobject-2.0.so.0
    libgthread-2.0.so.0
    libglib-2.0.so.0

On Debian-based systems, these libraries are provided by the
packages ::

    libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6
    libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0

On RPM-based systems, these libraries are provided by the packages ::

    ncurses libgcc libstdc++ glibc libXext libXrender
    libICE libSM mesa-libGL glib2

This list was compiled by checking the external shared library
dependencies of the Canopy [2]_ and Anaconda [3]_ distributions, which
both include a wide array of the most popular Python modules and have
been confirmed in practice to work across a wide swath of Linux
systems in the wild.

For dependencies on externally-provided versioned symbols in the above
shared libraries, the following symbol versions are permitted: ::

    GLIBC <= 2.5
    CXXABI <= 3.4.8
    GLIBCXX <= 3.4.9
    GCC <= 4.2.0

These symbol versions were determined by inspecting the latest symbol
version provided in the libraries distributed with CentOS 5, a Linux
distribution released in April 2007. In practice, this means that
Python wheels which conform to this policy should function on almost
any Linux distribution released after this date.


Compilation and Tooling
=======================

To support the compilation of wheels meeting the ``manylinux1``
standard, we provide initial drafts of two tools.

The first is a Docker image based on CentOS 5.11, which is recommended
as an easy-to-use, self-contained build box for compiling
``manylinux1`` wheels [4]_. Compiling on a more recently released
Linux distribution will generally introduce dependencies on too-new
versioned symbols.
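
To make "too-new" concrete, the symbol version limits listed in the
policy can be expressed as a small comparison. The following is only
an illustrative sketch: the names ``MANYLINUX1_MAX_VERSIONS`` and
``is_symbol_version_permitted`` are hypothetical, the ``NAME_X.Y.Z``
tag format is assumed, and treating unrecognized tags as not permitted
is a conservative choice of this sketch rather than part of the
policy. How the tags are extracted from a compiled binary is out of
scope here. ::

    import re

    # Maximum symbol versions, copied from the policy limits above.
    MANYLINUX1_MAX_VERSIONS = {
        "GLIBC": (2, 5),
        "CXXABI": (3, 4, 8),
        "GLIBCXX": (3, 4, 9),
        "GCC": (4, 2, 0),
    }

    # Assumed tag format: an upper-case name, an underscore, and a
    # dotted numeric version, e.g. "GLIBC_2.4".
    _VERSION_TAG = re.compile(r"^([A-Z]+)_(\d+(?:\.\d+)*)$")

    def is_symbol_version_permitted(tag):
        """Return True if a tag such as 'GLIBC_2.4' is within the limits."""
        match = _VERSION_TAG.match(tag)
        if match is None:
            # Not in the assumed format (e.g. "GLIBC_PRIVATE").
            return False
        name, version = match.groups()
        if name not in MANYLINUX1_MAX_VERSIONS:
            # Sketch assumption: unknown version families are rejected.
            return False
        # Compare numerically, component by component, so that
        # 3.4.10 is correctly treated as newer than 3.4.9.
        parsed = tuple(int(piece) for piece in version.split("."))
        return parsed <= MANYLINUX1_MAX_VERSIONS[name]

The versions are compared as integer tuples rather than as strings
because a plain string comparison would order ``3.4.10`` before
``3.4.9``.
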
The Docker image comes with a full compiler suite installed (``gcc``,
``g++``, and ``gfortran`` 4.8.2) as well as the latest releases of
Python and pip.

The second tool is a command line executable called ``auditwheel``
[5]_. First, it inspects all of the ELF files inside a wheel to check
for dependencies on versioned symbols or external shared libraries,
and verifies conformance with the ``manylinux1`` policy. This includes
the ability to add the new platform tag to conforming wheels.

In addition, ``auditwheel`` has the ability to automatically modify
wheels that depend on external shared libraries by copying those
shared libraries from the system into the wheel itself, and modifying
the appropriate RPATH entries such that these libraries will be picked
up at runtime. This achieves a result similar to static linking,
without requiring changes to the build system.

Neither of these tools is necessary to build wheels which conform with
the ``manylinux1`` policy. Similar results can usually be achieved by
statically linking external dependencies and/or using certain inline
assembly constructs to instruct the linker to prefer older symbol
versions; however, these tricks can be quite esoteric.


Platform Detection for Installers
=================================

Because the ``manylinux1`` profile is already known to work for the
many thousands of users of popular commercial Python distributions, we
suggest that installation tools like ``pip`` should err on the side of
assuming that a system *is* compatible, unless there is specific
reason to think otherwise.

We know of three main sources of potential incompatibility that are
likely to arise in practice:

* A Linux distribution that is too old (e.g. RHEL 4)
* A Linux distribution that does not use glibc (e.g. Alpine Linux,
  which is based on musl libc, or Android)
* Eventually, in the future, there may exist distributions that break
  compatibility with this profile

To handle the first two cases, we propose the following simple and
reliable check: ::

    def have_glibc_version(major, minimum_minor):
        import ctypes

        process_namespace = ctypes.CDLL(None)
        try:
            gnu_get_libc_version = process_namespace.gnu_get_libc_version
        except AttributeError:
            # We are not linked to glibc.
            return False

        gnu_get_libc_version.restype = ctypes.c_char_p
        version_str = gnu_get_libc_version()
        # py2 / py3 compatibility:
        if not isinstance(version_str, str):
            version_str = version_str.decode("ascii")

        version = [int(piece) for piece in version_str.split(".")]
        assert len(version) == 2
        if major != version[0]:
            return False
        if minimum_minor > version[1]:
            return False
        return True

    # CentOS 5 uses glibc 2.5.
    is_manylinux1_compatible = have_glibc_version(2, 5)

To handle the third case, we propose the creation of a file
``/etc/python/compatibility.cfg`` in ConfigParser format, with sample
contents: ::

    [manylinux1]
    compatible = true

where the supported values for the ``manylinux1.compatible`` entry are
the same as those supported by the ConfigParser ``getboolean`` method.

The proposed logic for ``pip`` or related tools, then, is:

0) If ``distutils.util.get_platform()`` does not start with the string
   ``"linux"``, then assume the current system is not ``manylinux1``
   compatible.

1) If ``/etc/python/compatibility.cfg`` exists and contains a
   ``manylinux1`` key, then trust that.

2) Otherwise, if ``have_glibc_version(2, 5)`` returns true, then
   assume the current system can handle ``manylinux1`` wheels.

3) Otherwise, assume that the current system cannot handle
   ``manylinux1`` wheels.

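
The four steps above can be sketched as a single decision function.
This is only an illustrative sketch, not an implementation taken from
``pip``: the names ``read_compat_override`` and
``manylinux1_supported`` are hypothetical, the sketch uses the
Python 3 ``configparser`` module, and it assumes the ``.cfg`` spelling
of the configuration file path. The platform string and the result of
the glibc check are passed in as arguments so that the logic can be
exercised without depending on the running system. ::

    import configparser

    # Path proposed in the text above; the ".cfg" spelling is assumed.
    COMPAT_FILE = "/etc/python/compatibility.cfg"

    def read_compat_override(path=COMPAT_FILE):
        """Return the manylinux1.compatible setting, or None if unset."""
        parser = configparser.ConfigParser()
        # ConfigParser.read() skips files that cannot be opened, so a
        # missing file simply leaves the parser empty.
        parser.read(path)
        if parser.has_option("manylinux1", "compatible"):
            return parser.getboolean("manylinux1", "compatible")
        return None

    def manylinux1_supported(platform, override, glibc_ok):
        """Apply steps 0-3 of the proposed logic.

        platform: the platform string, e.g. "linux-x86_64"
        override: True/False from the config file, or None if absent
        glibc_ok: result of the glibc version check
        """
        # Step 0: non-Linux platforms are never compatible.
        if not platform.startswith("linux"):
            return False
        # Step 1: an explicit setting in the config file wins.
        if override is not None:
            return override
        # Steps 2 and 3: otherwise fall back to the glibc check.
        return bool(glibc_ok)

A caller would pass ``distutils.util.get_platform()``,
``read_compat_override()`` and ``have_glibc_version(2, 5)`` as the
three arguments. Note that ``getboolean`` raises ``ValueError`` for
values it does not recognize, which this sketch does not handle.
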
Security Implications
=====================

One of the advantages of dependencies on centralized libraries in
Linux is that bugfixes and security updates can be deployed
system-wide, and applications which depend on these libraries will
automatically feel the effects of these patches when the underlying
libraries are updated. This can be particularly important for security
updates in packages that handle communication across the network or
cryptography.

``manylinux1`` wheels distributed through PyPI that bundle
security-critical libraries like OpenSSL will thus assume
responsibility for prompt updates in response to disclosed
vulnerabilities and patches. This closely parallels the security
implications of the distribution of binary wheels on Windows that,
because the platform lacks a system package manager, generally bundle
their dependencies. In particular, because it lacks a stable ABI,
OpenSSL cannot be included in the ``manylinux1`` profile.


Rejected Alternatives
=====================

One alternative would be to provide separate platform tags for each
Linux distribution (and each version thereof), e.g. ``RHEL6``,
``ubuntu14_10``, ``debian_jessie``, etc. Nothing in this proposal
rules out the possibility of adding such platform tags in the future,
or of further extensions to wheel metadata that would allow wheels to
declare dependencies on external system-installed packages. However,
such extensions would require substantially more work than this
proposal, and still might not be appreciated by package developers who
would prefer not to have to maintain multiple build environments and
build multiple wheels in order to cover all the common Linux
distributions. Therefore we consider such proposals to be out of scope
for this PEP.


References
==========

.. [1] PEP 425 -- Compatibility Tags for Built Distributions
   (https://www.python.org/dev/peps/pep-0425/)
.. [2] Enthought Canopy Python Distribution
   (https://store.enthought.com/downloads/)
.. [3] Continuum Analytics Anaconda Python Distribution
   (https://www.continuum.io/downloads)
.. [4] manylinux1 docker image
   (https://quay.io/repository/manylinux/manylinux)
.. [5] auditwheel
   (https://pypi.python.org/pypi/auditwheel)


Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: