diff --git a/pep-0513.txt b/pep-0513.txt new file mode 100644 index 000000000..ee89e6d64 --- /dev/null +++ b/pep-0513.txt @@ -0,0 +1,346 @@ +PEP: 513 +Title: A Platform Tag for Portable Linux Built Distributions +Version: $Revision$ +Last-Modified: $Date$ +Author: Robert T. McGibbon , Nathaniel J. Smith +BDFL-Delegate: Nick Coghlan +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 19-Jan-2016 +Post-History: 19-Jan-2016 + + +Abstract +======== + +This PEP proposes the creation of a new platform tag for Python package built +distributions, such as wheels, called ``manylinux1_{x86_64,i386}`` with +external dependencies limited restricted to a standardized subset of +the Linux kernel and core userspace ABI. It proposes that PyPI support +uploading and distributing Wheels with this platform tag, and that ``pip`` +support downloading and installing these packages on compatible platforms. + + +Rationale +========= + +Currently, distribution of binary Python extensions for Windows and OS X is +straightforward. Developers and packagers build wheels, which are assigned +platform tags such as ``win32`` or ``macosx_10_6_intel``, and upload these +wheels to PyPI. Users can download and install these wheels using tools such +as ``pip``. + +For Linux, the situation is much more delicate. In general, compiled Python +extension modules built on one Linux distribution will not work on other Linux +distributions, or even on the same Linux distribution with different system +libraries installed. + +Build tools using PEP 425 platform tags [1]_ do not track information about the +particular Linux distribution or installed system libraries, and instead assign +all wheels the too-vague ``linux_i386`` or ``linux_x86_64`` tags. Because of +this ambiguity, there is no expectation that ``linux``-tagged built +distributions compiled on one machine will work properly on another, and for +this reason, PyPI has not permitted the uploading of wheels for Linux. + +It would be ideal if wheel packages could be compiled that would work on *any* +linux system. But, because of the incredible diversity of Linux systems -- from +PCs to Android to embedded systems with custom libcs -- this cannot +be guaranteed in general. + +Instead, we define a standard subset of the kernel+core userspace ABI that, +in practice, is compatible enough that packages conforming to this standard +will work on *many* linux systems, including essentially all of the desktop +and server distributions in common use. We know this because there are +companies who have been distributing such widely-portable pre-compiled Python +extension modules for Linux -- e.g. Enthought with Canopy [2]_ and Continuum +Analytics with Anaconda [3]_. + +Building on the compability lessons learned from these companies, we thus +define a baseline ``manylinux1`` platform tag for use by binary Python +wheels, and introduce the implementation of preliminary tools to aid in the +construction of these ``manylinux1`` wheels. + + +Key Causes of Inter-Linux Binary Incompatibility +================================================ + +To properly define a standard that will guarantee that wheel packages meeting +this specification will operate on *many* linux platforms, it is necessary to +understand the root causes which often prevent portability of pre-compiled +binaries on Linux. The two key causes are dependencies on shared libraries +which are not present on users' systems, and dependencies on particular +versions of certain core libraries like ``glibc``. + + +External Shared Libraries +------------------------- + +Most desktop and server linux distributions come with a system package manager +(examples include ``APT`` on Debian-based systems, ``yum`` on +``RPM``-based systems, and ``pacman`` on Arch linux) that manages, among other +responsibilities, the installation of shared libraries installed to system +directories such as ``/usr/lib``. Most non-trivial Python extensions will depend +on one or more of these shared libraries, and thus function properly only on +systems where the user has the proper libraries (and the proper +versions thereof), either installed using their package manager, or installed +manually by setting certain environment variables such as ``LD_LIBRARY_PATH`` +to notify the runtime linker of the location of the depended-upon shared +libraries. + + +Versioning of Core Shared Libraries +----------------------------------- + +Even if author or maintainers of a Python extension module with to use no +external shared libraries, the modules will generally have a dynamic runtime +dependency on the GNU C library, ``glibc``. While it is possible, statically +linking ``glibc`` is usually a bad idea because of bloat, and because certain +important C functions like ``dlopen()`` cannot be called from code that +statically links ``glibc``. A runtime shared library dependency on a +system-provided ``glibc`` is unavoidable in practice. + +The maintainers of the GNU C library follow a strict symbol versioning scheme +for backward compatibility. This ensures that binaries compiled against an older +version of ``glibc`` can run on systems that have a newer ``glibc``. The +opposite is generally not true -- binaries compiled on newer Linux +distributions tend to rely upon versioned functions in glibc that are not +available on older systems. + +This generally prevents built distributions compiled on the latest Linux +distributions from being portable. + + +The ``manylinux1`` policy +========================= + +For these reasons, to achieve broad portability, Python wheels + + * should depend only on an extremely limited set of external shared + libraries; and + * should depend only on ``old`` symbol versions in those external shared + libraries. + +The ``manylinux1`` policy thus encompasses a standard for what the +permitted external shared libraries a wheel may depend on, and the maximum +depended-upon symbol versions therein. + +The permitted external shared libraries are: :: + + libpanelw.so.5 + libncursesw.so.5 + libgcc_s.so.1 + libstdc++.so.6 + libm.so.6 + libdl.so.2 + librt.so.1 + libcrypt.so.1 + libc.so.6 + libnsl.so.1 + libutil.so.1 + libpthread.so.0 + libX11.so.6 + libXext.so.6 + libXrender.so.1 + libICE.so.6 + libSM.so.6 + libGL.so.1 + libgobject-2.0.so.0 + libgthread-2.0.so.0 + libglib-2.0.so.0 + +On Debian-based systems, these libraries are provided by the packages :: + + libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6 + libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0 + +On RPM-based systems, these libraries are provided by the packages :: + + ncurses libgcc libstdc++ glibc libXext libXrender + libICE libSM mesa-libGL glib2 + +This list was compiled by checking the external shared library dependencies of +the Canopy [1]_ and Anaconda [2]_ distributions, which both include a wide array +of the most popular Python modules and have been confirmed in practice to work +across a wide swath of Linux systems in the wild. + +For dependencies on externally-provided versioned symbols in the above shared +libraries, the following symbol versions are permitted: :: + + GLIBC <= 2.5 + CXXABI <= 3.4.8 + GLIBCXX <= 3.4.9 + GCC <= 4.2.0 + +These symbol versions were determined by inspecting the latest symbol version +provided in the libraries distributed with CentOS 5, a Linux distribution +released in April 2007. In practice, this means that Python wheels which conform +to this policy should function on almost any linux distribution released after +this date. + + +Compilation and Tooling +======================= + +To support the compilation of wheels meeting the ``manylinux1`` standard, we +provide initial drafts of two tools. + +The first is a Docker image based on CentOS 5.11, which is recommended as an +easy to use self-contained build box for compiling ``manylinux1`` wheels [4]_. +Compiling on a more recently-released linux distribution will generally +introduce dependencies on too-new versioned symbols. The image comes with a +full compiler suite installed (``gcc``, ``g++``, and ``gfortran`` 4.8.2) as +well as the latest releases of Python and pip. + +The second tool is a command line executable called ``auditwheel`` [5]_. First, +it inspects all of the ELF files inside a wheel to check for dependencies on +versioned symbols or external shared libraries, and verifies conformance with +the ``manylinux1`` policy. This includes the ability to add the new platform +tag to conforming wheels. + +In addition, ``auditwheel`` has the ability to automatically modify wheels that +depend on external shared libraries by copying those shared libraries from +the system into the wheel itself, and modifying the appropriate RPATH entries +such that these libraries will be picked up at runtime. This accomplishes a +similar result as if the libraries had been statically linked without requiring +changes to the build system. + +Neither of these tools are necessary to build wheels which conform with the +``manylinux1`` policy. Similar results can usually be achieved by statically +linking external dependencies and/or using certain inline assembly constructs +to instruct the linker to prefer older symbol versions, however these tricks +can be quite esoteric. + + +Platform Detection for Installers +================================= + +Because the ``manylinux1`` profile is already known to work for the many +thousands of users of popular commercial Python distributions, we suggest that +installation tools like ``pip`` should error on the side of assuming that a +system *is* compatible, unless there is specific reason to think otherwise. + +We know of three main sources of potential incompatibility that are likely to +arise in practice: + +* A linux distribution that is too old (e.g. RHEL 4) +* A linux distribution that does not use glibc (e.g. Alpine Linux, which is + based on musl libc, or Android) +* Eventually, in the future, there may exist distributions that break + compatibility with this profile + +To handle the first two cases, we propose the following simple and reliable +check: :: + + def have_glibc_version(major, minimum_minor): + import ctypes + + process_namespace = ctypes.CDLL(None) + try: + gnu_get_libc_version = process_namespace.gnu_get_libc_version + except AttributeError: + # We are not linked to glibc. + return False + + gnu_get_libc_version.restype = ctypes.c_char_p + version_str = gnu_get_libc_version() + # py2 / py3 compatibility: + if not isinstance(version_str, str): + version_str = version_str.decode("ascii") + + version = [int(piece) for piece in version_str.split(".")] + assert len(version) == 2 + if major != version[0]: + return False + if minimum_minor > version[1]: + return False + return True + + # CentOS 5 uses glibc 2.5. + is_manylinux1_compatible = have_glibc_version(2, 5) + +To handle the third case, we propose the creation of a file +``/etc/python/compatibility.cfg`` in ConfigParser format, with sample +contents: :: + + [manylinux1] + compatible = true + +where the supported values for the ``manylinux1.compatible`` entry are the +same as those supported by the ConfigParser ``getboolean`` method. + +The proposed logic for ``pip`` or related tools, then, is: + +0) If ``distutils.util.get_platform()`` does not start with the string + ``"linux"``, then assume the current system is not ``manylinux1`` + compatible. +1) If ``/etc/python/compatibility.conf`` exists and contains a ``manylinux1`` + key, then trust that. +2) Otherwise, if ``have_glibc_version(2, 5)`` returns true, then assume the + current system can handle ``manylinux1`` wheels. +3) Otherwise, assume that the current system cannot handle ``manylinux1`` + wheels. + + +Security Implications +===================== + +One of the advantages of dependencies on centralized libraries in Linux is +that bugfixes and security updates can be deployed system-wide, and +applications which depend on on these libraries will automatically feel the +effects of these patches when the underlying libraries are updated. This can +be particularly important for security updates in packages communication +across the network or cryptography. + +``manylinux1`` wheels distributed through PyPI that bundle security-critical +libraries like OpenSSL will thus assume responsibility for prompt updates in +response disclosed vulnerabilities and patches. This closely parallels the +security implications of the distribution of binary wheels on Windows that, +because the platform lacks a system package manager, generally bundle their +dependencies. In particular, because its lacks a stable ABI, OpenSSL cannot be +included in the ``manylinux1`` profile. + + +Rejected Alternatives +===================== + +One alternative would be to provide separate platform tags for each Linux +distribution (and each version thereof), e.g. ``RHEL6``, ``ubuntu14_10``, +``debian_jessie``, etc. Nothing in this proposal rules out the possibility of +adding such platform tags in the future, or of further extensions to wheel +metadata that would allow wheels to declare dependencies on external +system-installed packages. However, such extensions would require substantially +more work than this proposal, and still might not be appreciated by package +developers who would prefer not to have to maintain multiple build environments +and build multiple wheels in order to cover all the common Linux distributions. +Therefore we consider such proposals to be out-of-scope for this PEP. + + +References +========== + +.. [1] PEP 425 -- Compatibility Tags for Built Distributions + (https://www.python.org/dev/peps/pep-0425/) +.. [2] Enthought Canopy Python Distribution + (https://store.enthought.com/downloads/) +.. [3] Continuum Analytics Anaconda Python Distribution + (https://www.continuum.io/downloads) +.. [4] manylinux1 docker image + (https://quay.io/repository/manylinux/manylinux) +.. [5] auditwheel + (https://pypi.python.org/pypi/auditwheel) + +Copyright +========= + +This document has been placed into the public domain. + +.. + + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: