python-peps/pep-0611.rst

PEP: 611
Title: The one million limit
Author: Mark Shannon <mark@hotpy.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Dec-2019
Post-History:


Abstract
========
This PR proposes a limit of one million (1 000 000) for various aspects of Python code and its implementation.

The Python language does not specify limits for many of its features.
Not having any limit to these values seems to enhance programmer freedom,
at least superficially, but in practice the CPython VM and other Python virtual
machines have implicit limits or are forced to assume that the limits are
astronomical, which is expensive.

This PR lists a number of features which are to have a limit of one million.

Motivation
==========

There are many values that need to be represented in a virtual machine.
If no limit is specified for these values,
then the representation must either be inefficient or vulnerable to overflow.
The CPython virtual machine represents values like line numbers,
stack offsets and instruction offsets by 32 bit values. This is inefficient, and potentially unsafe.

It is inefficient as actual values rarely need more than a dozen or so bits to represent them.

It is unsafe as malicious or poorly generated code could cause values to exceed 2\ :sup:`32`.

For example, line numbers are represented by 32 bit values internally.
This is inefficient, given that modules almost never exceed a few thousand lines.
Despite being inefficient, is is still vulnerable to overflow as
it is easy for an attacker to created a module with billions of newline characters.

Memory access is usually a limiting factor in the performance of modern CPUs.
Better packing of data structures enhances locality and reduces memory bandwidth,
at a modest increase in ALU usage (for shifting and masking).
Being able to safely store important values in 20 bits would allow memory savings
in several data structures including, but not limited to:

* Frame objects
* Object headers
* Code objects

There is also the potential for a more efficient instruction format, speeding up interpreter dispatch.

Is this a worthwhile trade off?
-------------------------------

The downside of any form of limit is that it might potentially make someone's job harder,
for example, it may be harder to write a code generator that keeps the size of modules to one million lines.
However, it is the author's opinion, having written many code generators,
that such a limit is extremely unlikely to be a problem in practice.

The upside of these limits is the freedom it grants implementers of runtimes, whether CPython,
PyPy, or any other implementation, to improve performance.
It is the author's belief, that the potential value of even a 0.1% reduction in the cost
of running Python programs globally will hugely exceed the cost of modifying a handful of code generators.

Rationale
=========

Imposing a limit on values such as lines of code in a module, and the number of local variables,
has significant advantages for ease of implementation and efficiency of virtual machines.
If the limit is sufficiently large, there is no adverse effect on users of the language.

By selecting a fixed but large limit for these values,
it is possible to have both safety and efficiency whilst causing no inconvenience to human programmers
and only very rare problems for code generators.

One million
-----------

The Java Virtual Machine (JVM) [1]_ specifies a limit of 2\ :sup:`16`-1 (65535) for many program
elements similar to those covered here.
This limit enables limited values to fit in 16 bits, which is a very efficient machine representation.
However, this limit is quite easily exceeded in practice by code generators and
the author is aware of existing Python code that already exceeds 2\ :sup:`16` lines of code.

A limit of one million fits into 20 bits which, although not as convenient for machine representation,
is still reasonably compact. Three signed values in the range -1000_000 to +1000_000 can fit into a 64 bit word.
A limit of one million is small enough for efficiency advantages (only 20 bits),
but large enough not to impact users (no one has ever written a module of one million lines).

The value "one million" is very easy to remember.

The one million limit is mostly a limit on human generated code, not runtime sizes.

One million lines in a single module is a ridiculous concentration of code;
the entire Python standard library is about 2/3rd of a million lines, spread over 1600 files.

While it is possible that generated code could exceed the limit,
it is easy for a code generator to modify its output to conform.
The author has hit the 64K limit in the JVM on at least two occasions when generating Java code.
The workarounds were relatively straightforward and wouldn't
have been necessary with a limit of one million bytecodes or lines of code.

Specification
=============

This PR proposes that the following language features and runtime values be limited to one million.

* The number of source code lines in a module
* The number of bytecode instructions in a code object.
* The sum of local variables and stack usage for a code object.
* The number of classes in a running interpreter.

It is likely that memory constraints would be a limiting factor before the number of classes reaches one million.

Reducing the limits
-------------------

An API will be provided to modify the above limits at runtime, but the limits may not be raised above one million.

Inferred limits
---------------

These limits are not part of the specification, but a limit of less than one million
can be inferred from the limit on the number of bytecode instructions in a code object.
Because there would be insufficient instructions to load more than
one million constants or use more than one million names.

* The number of distinct names in a code object.
* The number of constants in a code object.

The advantages for CPython of imposing these limits:
----------------------------------------------------

Line of code in a module and code object restrictions.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When compiling source code to bytecode or modifying bytecode for profiling or debugging,
an intermediate form is required. By limiting operands to 20 bits,
instructions can be represented in a compact 64 bit form allowing
very fast passes over the instruction sequence.

Having 20 bit operands (21 bits for relative branches) allows instructions
to fit into 32 bits without needing additional ``EXTENDED_ARG`` instructions.
This improves dispatch, as the operand is strictly local to the instruction.
Using super-instructions would make that the 32 bit format
almost as compact as the 16 bit format, and significantly faster.

The benefit of restricting the number of lines in a module is primarily the implied limit on bytecodes.
It is more important for implementations that it is instructions per code object, not lines per module, that is limited to one million,
but it is much easier to explain a one million line limit. Having a consistent limit of one million is just easier to remember.
It is mostly likely, although not guaranteed, that the line limit will be hit first and thus provide a simpler to understand error message to the developer.

Total number of classes in a running interpreter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This limit has to the potential to reduce the size of object headers considerably.

Currently objects have a two word header, for objects without references
(int, float, str, etc.) or a four word header for objects with references.
By reducing the maximum number of classes, the space for the class reference
can be reduced from 64 bits to fewer than 32 bits allowing a much more compact header.

For example, a super-compact header format might look like this:

.. code-block::

    struct header {
        uint32_t gc_flags:6; /* Needs finalisation, might be part of a cycle, etc. */
        uint32_t class_id:26; /* Can be efficiently mapped to address by ensuring suitable alignment of classes */
        uint32_t refcount; /* Limited memory or saturating */
    }

This format would reduce the size of a Python object without slots, on a 64 bit machine, from 40 to 16 bytes.

Note that there are two ways to use a 32 bit refcount on a 64 bit machine.
One is to limit each sub-interpreter to 32Gb of memory.
The other is to use a saturating reference count, which would be a little bit slower, but allow unlimited memory allocation.

Enforcement
-----------

Python implementations are not obliged to enforce the limits.
However, if a limit can be enforced without hurting performance, then it should be.

It is anticipated that CPython will enforce the limits as follows:

* The number of source code lines in a module: version 3.9 onward.
* The number of bytecode instructions in a code object: 3.9 onward.
* The sum of local variables and stack usage for a code object: 3.9 onward.
* The number of classes in a running interpreter: probably 3.10 onward, maybe warning in 3.9.

Backwards Compatibility
=======================

It is hypothetically possible that some machine generated code exceeds one or more of the above limits.
The author believes that to be highly unlikely and easily fixed by modifying the output stage of the code generator.

We would like to gain the benefit from the above limits for performance as soon as possible.
To that end, CPython will start applying limits from version 3.9 onward.
To ease the transition and minimize breakage, the initial limits would be much larger than one million
and will be gradually reduced to one million over several versions.

The actual limits enforced by CPython will be:

=============  ===============
   Version      Limit
=============  ===============
 3.9           8 million
 3.10          4 million
 3.11          2 million
 3.12 onward   1 million
=============  ===============

Given the rarity of code generators that would exceed the one million limits,
and the environments in which they are typically used, it seems reasonable
to start issuing warnings in 3.9 if any limited quantity exceeds one million.

Other implementations
=====================

Implementations of Python other than CPython have different purposes, so different limits might be appropriate.
This is acceptable, provided the limits are clearly documented.

General purpose implementations
-------------------------------

General purpose implementations, such as PyPy, should use the one million limit.
If maximum compatibility is a goal, then they should also follow CPython's behaviour for 3.9 to 3.11.

Special purpose implementations
-------------------------------

Special purpose implementations may use lower limits, as long as they are clearly documented.
For example, Jython might need to use a lower class limit of fifty or sixty thousand because of JVM limits.
An implementation designed for embedded systems, for example MicroPython, might impose limits as low as a few thousand.

Security Implications
=====================

Minimal. This reduces the attack surface of any Python virtual machine by a small amount.

Reference Implementation
========================

None, as yet. This will be implemented in CPython, once the PEP has been accepted.


Rejected Ideas
==============

Being able to modify the limits upwards at compile time was suggested by Tal Einat.
This is rejected as the current limits of 2\ :sup:`32` have not been an issue, and the practical
advantages of allowing limits between 2\ :sup:`20` and 2\ :sup:`32` seem slight compared to the additional
code complexity of supporting such a feature.


Open Issues
===========

None, as yet.


References
==========

.. [1] The Java Virtual Machine specification

https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End: