PEP: 421
Title: Adding sys.implementation
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 26-April-2012
Post-History: 26-April-2012


Abstract
========

This PEP introduces a new attribute for the ``sys`` module:
``sys.implementation``.  The attribute holds consolidated information
about the implementation of the running interpreter.  Thus
``sys.implementation`` is the source to which the standard library may
look for implementation-specific information.

The proposal in this PEP is in line with a broader emphasis on making
Python friendlier to alternate implementations.  It describes the new
variable and the constraints on what that variable contains.  The PEP
also explains some immediate use cases for ``sys.implementation``.


Motivation
==========

For a number of years now, the distinction between Python-the-language
and CPython (the reference implementation) has been growing.  Most of
this change is due to the emergence of Jython, IronPython, and PyPy as
viable alternate implementations of Python.

Consider, however, the nearly two decades of CPython-centric Python
(i.e. most of its existence).  That focus has understandably
contributed to quite a few CPython-specific artifacts both in the
standard library and exposed in the interpreter.  Though the core
developers have made an effort in recent years to address this, quite
a few of the artifacts remain.

Part of the solution is presented in this PEP: a single namespace in
which to consolidate implementation specifics.  This will help focus
efforts to differentiate the implementation specifics from the
language.  Additionally, it will foster a multiple-implementation
mindset.


Proposal
========

We will add a new attribute to the ``sys`` module, called
``sys.implementation``, as an instance of a new type to contain
implementation-specific information.
The attributes of this object will remain fixed during interpreter
execution and through the course of an implementation version.  This
ensures that behaviors which depend on variables in
``sys.implementation`` don't change between versions.

The object will have each of the attributes described in the
`Required Variables`_ section below.  Any other per-implementation
values may be stored in ``sys.implementation.metadata``.  However,
nothing in the standard library will rely on
``sys.implementation.metadata``.  Examples of possible metadata values
are described in the `Example Metadata Values`_ section.

This proposal takes a conservative approach in requiring only four
variables.  As more become appropriate, they may be added with
discretion.


Required Variables
------------------

These are the variables in ``sys.implementation`` that implementers
must define.  The standard library would rely on each of them, with
the exception of ``metadata``:

**name**
   This is the common name of the implementation (case sensitive).
   Examples include 'PyPy', 'Jython', 'IronPython', and 'CPython'.

**version**
   This is the version of the implementation, as opposed to the
   version of the language it implements.  This value conforms to the
   format described in `Version Format`_.

**cache_tag**
   A string used for the PEP 3147 cache tag [#cachetag]_.  It would
   normally be a composite of the name and version (e.g. 'cpython-33'
   for CPython 3.3).  However, an implementation may explicitly use a
   different cache tag.  If ``cache_tag`` is set to None, it indicates
   that module caching should be disabled.

**metadata**
   Any other values that an implementation wishes to specify,
   particularly informational ones.  Neither the standard library nor
   the language specification will rely on implementation metadata.
   Also see the list of `Example Metadata Values`_.


Adding New Required Attributes
------------------------------

XXX PEP?  something lighter?
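
As an illustration of the variables listed in `Required Variables`_,
the following sketch builds a stand-in object with the four proposed
attributes and reads them back.  The names ``VersionInfo``,
``make_example_implementation`` and ``read_required`` are hypothetical
helpers invented for this example, and ``types.SimpleNamespace`` merely
stands in for the new type, which this PEP does not name.  The
``metadata`` attribute exists only as proposed here, so the reader
falls back to an empty dict when an interpreter does not provide it::

    import sys
    from collections import namedtuple
    from types import SimpleNamespace

    # Hypothetical stand-in that mirrors the fields of sys.version_info.
    VersionInfo = namedtuple(
        "VersionInfo", "major minor micro releaselevel serial")


    def make_example_implementation():
        """Build an illustrative object shaped like the proposal."""
        return SimpleNamespace(
            name="CPython",
            version=VersionInfo(3, 3, 0, "final", 0),
            cache_tag="cpython-33",
            # Informational only; nothing in the stdlib would rely on it.
            metadata={"gc_type": "reference counting"},
        )


    def read_required(impl):
        """Collect the four proposed variables from ``impl``."""
        return {
            "name": impl.name,
            "version": tuple(impl.version),
            "cache_tag": impl.cache_tag,
            # Assumption: 'metadata' may be absent, so default to empty.
            "metadata": dict(getattr(impl, "metadata", {})),
        }


    example_info = read_required(make_example_implementation())
    running_info = read_required(sys.implementation)

Because only the attributes described in this section are read, the
same helper works on the stand-in and on whatever object the running
interpreter exposes.
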

Version Format
--------------

A main point of ``sys.implementation`` is to contain information that
will be used internally in the standard library.  To make a version
variable useful, its value should be in a consistent format across
implementations.

As such, the format of ``sys.implementation.version`` must follow that
of ``sys.version_info``, which is effectively a named tuple.  It is a
familiar format and generally consistent with normal version format
conventions.

XXX The following is not exactly true:

Keep in mind, however, that ``sys.implementation.version`` is the
version of the Python *implementation*, while ``sys.version_info``
(and friends) is the version of the Python language.


Example Metadata Values
-----------------------

These are the sorts of values an implementation may put into
``sys.implementation.metadata``.  However, these names and
descriptions are only examples and are not being proposed here.  If
they later have meaningful use cases, they can be added by following
the process described in `Adding New Required Attributes`_.

**vcs_url**
   The URL pointing to the main VCS repository for the implementation
   project.

**vcs_revision_id**
   A value that identifies the VCS revision of the implementation
   that is currently running.

**build_toolchain**
   Identifies the tools used to build the interpreter.

**build_date**
   The timestamp of when the interpreter was built.

**homepage**
   The URL of the implementation's website.

**site_prefix**
   The preferred site prefix for this implementation.

**runtime**
   The run-time environment in which the interpreter is running, as
   in "Common Language *Runtime*" (.NET CLR) or "Java *Runtime*
   Environment".

**gc_type**
   The type of garbage collection used, like "reference counting" or
   "mark and sweep".


Rationale
=========

The status quo for implementation-specific information gives us that
information in a more fragile, harder-to-maintain way.
It is spread out over different modules or inferred from other
information, as we see with `platform.python_implementation()`_.

This PEP is the main alternative to that approach.  It consolidates
the implementation-specific information into a single namespace and
makes explicit that which was implicit.


Why a Custom Type?
------------------

A dedicated class, of which ``sys.implementation`` is an instance,
would facilitate the dotted access of a "named" tuple.  At the same
time, it allows us to avoid the problems of the other approaches (see
below), like confusion about ordering and iteration.

The alternatives to a custom type are considered separately here:

**Dictionary**
   A dictionary reflects a simple namespace with item access.  It
   maps names to values and that's all.  It also reflects the more
   variable nature of ``sys.implementation``.

   However, a simple dictionary does not set expectations very well
   about the nature of ``sys.implementation``.  The custom type
   approach, with a fixed set of required attributes, does a better
   job of this.

**Named Tuple**
   Another close alternative is a namedtuple or a structseq or some
   other tuple type with dotted access (a la ``sys.version_info``).
   This type is immutable and simple.  It is a well established
   pattern for implementation-specific variables in Python.  Dotted
   access on a namespace is also very convenient.

   Fallback lookup may favor dicts::

      cache_tag = sys.implementation.get('cache_tag')

      # vs.

      cache_tag = getattr(sys.implementation, 'cache_tag', None)

   However, this is mitigated by having
   ``sys.implementation.metadata``.

   One problem with using a named tuple is that
   ``sys.implementation`` does not have meaning as a sequence.  Also,
   unlike other similar ``sys`` variables, it has a far greater
   potential to change over time.

   If a named tuple were used, we'd be very clear in the
   documentation that the length and order of the value are not
   reliable.  Iterability would not be guaranteed.

**Module**
   Using a module instead of a custom type is another option.  It has
   similar characteristics to an instance, but with a slight hint of
   immutability (at least by convention).  Such a module could be a
   stand-alone sub-module of ``sys`` or added on, like ``os.path``.
   Unlike a concrete class, no new type would be necessary.  This is
   a pretty close fit to what we need.

   The downside is that the module type is much less conducive to
   extension, making it more difficult to address the weaknesses of
   using an instance of a concrete class.


Why metadata?
-------------

``sys.implementation.metadata`` will hold any optional,
strictly-informational, or per-implementation data.  This allows us to
restrict ``sys.implementation`` to the required attributes.  In that
way, its type can reflect the more stable namespace and
``sys.implementation.metadata`` (as a dict) can reflect the less
certain namespace.

``sys.implementation.metadata`` is the place an implementation can put
values that must be built-in, "without having to pollute the main sys
namespace" [#Nick]_.


Why a Part of ``sys``?
----------------------

The ``sys`` module should hold the new namespace because ``sys`` is
the depot for interpreter-centric variables and functions.  Many
implementation-specific variables are already found in ``sys``.


Why Strict Constraints on Any of the Values?
--------------------------------------------

As already noted in `Version Format`_, values in
``sys.implementation`` are intended for use by the standard library.
Constraining those values, essentially specifying an API for them,
allows them to be used consistently, regardless of how they are
otherwise implemented.


Discussion
==========

The topic of ``sys.implementation`` came up on the python-ideas list
in 2009, where the reception was broadly positive [#original]_.  I
revived the discussion recently while working on a pure-Python
``imp.get_tag()`` [#revived]_.  Discussion has been ongoing
[#feedback]_.
The messages in `issue #14673`_ are also relevant.


Use-cases
=========

platform.python_implementation()
--------------------------------

"explicit is better than implicit"

The ``platform`` module determines the Python implementation by
looking for clues in a couple of different ``sys`` variables
[#guess]_.  However, this approach is fragile, requiring changes to
the standard library each time an implementation changes.  Beyond
that, support in ``platform`` is limited to those implementations that
core developers have blessed by special-casing them in the
``platform`` module.

With ``sys.implementation`` the various implementations would
*explicitly* set the values in their own version of the ``sys``
module.

Another concern is that the ``platform`` module is part of the stdlib,
which ideally would contain as few implementation details as possible,
such as those that would be moved to ``sys.implementation``.

Any overlap between ``sys.implementation`` and the ``platform`` module
would simply defer to ``sys.implementation`` (with the same interface
in ``platform`` wrapping it).


Cache Tag Generation in Frozen Importlib
----------------------------------------

PEP 3147 defined the use of a module cache and cache tags for file
names.  The importlib bootstrap code, frozen into the Python binary as
of 3.3, uses the cache tags during the import process.  Part of the
project to bootstrap importlib has been to clean out of
`Python/import.c`_ any code that no longer needed to be there.

The cache tag defined in ``Python/import.c`` was hard-coded to
``"cpython" MAJOR MINOR`` [#tag_impl]_.  For importlib the options are
either hard-coding it in the same way, or guessing the implementation
in the same way as does ``platform.python_implementation()``.

As long as the hard-coded tag is limited to CPython-specific code, it
is livable.  However, inasmuch as other Python implementations use the
importlib code to work with the module cache, a hard-coded tag would
become a problem.
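
The following is a minimal sketch of how import machinery could obtain
the tag without hard-coding or guessing, assuming an object shaped
like the proposed ``sys.implementation``.  ``derive_cache_tag`` and
``cached_name`` are hypothetical helpers, not importlib functions.
The fallback that composes the tag from the name and version is an
assumption based on the 'cpython-33' example, and the file name uses
the PEP 3147 ``<module>.<tag>.pyc`` pattern in simplified form, with
no ``__pycache__`` directory handling::

    from types import SimpleNamespace

    # Sentinel so that an explicit ``cache_tag = None`` is not confused
    # with a missing attribute.
    _MISSING = object()


    def derive_cache_tag(impl):
        """Return the cache tag for ``impl``, or None if caching is off."""
        tag = getattr(impl, "cache_tag", _MISSING)
        if tag is not _MISSING:
            return tag  # May be None: module caching is disabled.
        # Assumed fallback: lower-cased name plus major and minor version.
        major, minor = impl.version[0], impl.version[1]
        return "{}-{}{}".format(impl.name.lower(), major, minor)


    def cached_name(module_name, impl):
        """Return a simplified cache file name, or None when disabled."""
        tag = derive_cache_tag(impl)
        if tag is None:
            return None
        return "{}.{}.pyc".format(module_name, tag)


    # Stand-in objects; a real interpreter would supply its own values.
    composed = SimpleNamespace(
        name="CPython", version=(3, 3, 0, "final", 0))
    disabled = SimpleNamespace(
        name="Example", version=(1, 0, 0, "final", 0), cache_tag=None)

    composed_name = cached_name("spam", composed)
    disabled_name = cached_name("spam", disabled)

Here ``composed_name`` is ``'spam.cpython-33.pyc'`` and
``disabled_name`` is ``None``, which shows how a tag of None would
switch caching off for that implementation.
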

Directly using the ``platform`` module in this case is a non-starter.
Any module used in the importlib bootstrap must be built-in or frozen,
neither of which applies to the ``platform`` module.  This is the
point that led to the recent interest in ``sys.implementation``.

Regardless of the outcome for the implementation name used, another
problem relates to the version used in the cache tag.  That version is
likely to be the implementation version rather than the language
version.  However, the implementation version is not readily
identified anywhere in the standard library.


Implementation-Specific Tests
-----------------------------

Currently there are a number of implementation-specific tests in the
test suite under ``Lib/test``.  The test support module
(`Lib/test/support.py`_) provides some functionality for dealing with
these tests.  However, like the ``platform`` module, ``test.support``
must do some guessing that ``sys.implementation`` would render
unnecessary.


Jython's ``os.name`` Hack
-------------------------

In Jython, ``os.name`` is set to 'java' to accommodate special
treatment of the Java environment in the standard library [#os_name]_
[#javatest]_.  Unfortunately it masks the OS name that would otherwise
go there.  ``sys.implementation`` would help obviate the need for this
special case.


Feedback From Other Python Implementers
=======================================

IronPython
----------

XXX

Jython
------

XXX

PyPy
----

XXX


Past Efforts
============

PEP 3139
--------

This PEP from 2008 recommended a clean-up of the ``sys`` module in
part by extracting implementation-specific variables and functions
into a separate module.  While PEP 3139 was rejected, its goals are
reflected in PEP 421 to a large extent, though with a much lighter
approach.


PEP 399
-------

This informational PEP dictates policy regarding the standard library,
helping to make it friendlier to alternate implementations.
PEP 421 is proposed in that same spirit.


Alternatives
============

Since the single-namespace-under-sys approach is relatively
straightforward, no alternatives have been considered for this PEP.


Open Issues
===========

* What are the long-term objectives for ``sys.implementation``?

  - possibly pull in implementation details from the main ``sys``
    namespace and elsewhere (PEP 3139 lite).

* What is the process for introducing new required variables?  PEP?

* Is the ``sys.version_info`` format the right one here?

* Should ``sys.implementation.hexversion`` be part of the PEP?

* Does ``sys.(version|version_info|hexversion)`` need to better
  reflect the version of the language spec?  Micro version, series,
  and release seem like implementation-specific values.

* Alternatives to the approach dictated by this PEP?

* Do we really want to commit to using a dict for
  ``sys.implementation``?

  Backward compatibility issues will make it difficult to change our
  minds later.

  The type we use ultimately depends on how general we expect the
  consumption of ``sys.implementation`` to be.  If its practicality is
  oriented toward internal use then the data structure is not as
  critical.  However, ``sys.implementation`` is intended to have a
  non-localized impact across the standard library and the
  interpreter.  It is better to *not* make hacking it become an
  attractive nuisance, regardless of our intentions for usage.

* Should ``sys.implementation`` and its values be immutable?  A
  benefit of an immutable type is it communicates that the value is
  not expected to change and should not be manipulated.

* Should ``sys.implementation`` be strictly disallowed to have
  methods?  Classes often imply the presence (or possibility) of
  methods, which may be misleading in this case.

* Should ``sys.implementation`` implement the
  ``collections.abc.Mapping`` interface?


Implementation
==============

The implementation of this PEP is covered in `issue #14673`_.


References
==========

.. [#original] The 2009 sys.implementation discussion:
   http://mail.python.org/pipermail/python-dev/2009-October/092893.html

.. [#revived] The initial 2012 discussion:
   http://mail.python.org/pipermail/python-ideas/2012-April/014878.html

.. [#feedback] Feedback on the PEP:
   http://mail.python.org/pipermail/python-ideas/2012-April/014954.html

.. [#guess] The ``platform`` code which divines the implementation
   name:
   http://hg.python.org/cpython/file/2f563908ebc5/Lib/platform.py#l1247

.. [#cachetag] The definition for cache tags in PEP 3147:
   http://www.python.org/dev/peps/pep-3147/#id53

.. [#tag_impl] The original implementation of the cache tag in
   CPython:
   http://hg.python.org/cpython/file/2f563908ebc5/Python/import.c#l121

.. [#tests] Examples of implementation-specific handling in
   test.support:

   * http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py#l509
   * http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py#l1246
   * http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py#l1252
   * http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py#l1275

.. [#os_name] The standard library entry for os.name:
   http://docs.python.org/3.3/library/os.html#os.name

.. [#javatest] The use of ``os.name`` as 'java' in the stdlib test
   suite:
   http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py#l512

.. [#Nick] Nick Coghlan's proposal for
   ``sys.implementation.metadata``:
   http://mail.python.org/pipermail/python-ideas/2012-May/014984.html

.. _issue #14673: http://bugs.python.org/issue14673

.. _Lib/test/support.py: http://hg.python.org/cpython/file/2f563908ebc5/Lib/test/support.py

.. _Python/import.c: http://hg.python.org/cpython/file/2f563908ebc5/Python/import.c


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: