PEP 611: relax hard limits and make 1 million a soft limit. Include recursion depth in limits. (GH-1279)

* PEP 611: relax hard limits and make 1 million a soft limit. Include recursion depth in limits.

* Reorganize part of motivation to account for soft vs. hard limits. Add qualification about recursion depth.
Mark Shannon 2020-01-29 11:31:28 +00:00 committed by GitHub
parent 2aea60e07c
commit a65ee4d2ad
1 changed file with 77 additions and 33 deletions


@ -10,7 +10,8 @@ Post-History:
Abstract
========
This PEP proposes a soft limit of one million (1 000 000), and a larger hard limit,
for various aspects of Python code and its implementation.
The Python language does not specify limits for many of its features.
Not having any limit to these values seems to enhance programmer freedom,
@ -20,6 +21,8 @@ astronomical, which is expensive.
This PEP lists a number of features which are to have a limit of one million.
For CPython the hard limit will be eight million (8 000 000).
Motivation
==========
@ -77,17 +80,6 @@ and only very rare problems for code generators.
One million
-----------
The value "one million" is very easy to remember.
The one million limit is mostly a limit on human generated code, not runtime sizes.
@ -95,28 +87,65 @@ The one million limit is mostly a limit on human generated code, not runtime siz
One million lines in a single module is a ridiculous concentration of code;
the entire Python standard library is about two thirds of a million lines, spread over 1600 files.
The Java Virtual Machine (JVM) [1]_ specifies a limit of 2\ :sup:`16`-1 (65535) for many program
elements similar to those covered here.
This limit allows the limited values to fit in 16 bits, which is a very efficient machine representation.
However, this limit is quite easily exceeded in practice by code generators and
the author is aware of existing Python code that already exceeds 2\ :sup:`16` lines of code.
The hard limit of eight million fits into 23 bits which, although not as convenient for machine representation,
is still reasonably compact.
A limit of eight million is small enough for efficiency advantages (only 23 bits),
but large enough not to impact users (no one has ever written a module that large).
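The bit widths quoted above can be checked with Python's built-in ``int.bit_length()``. The constant names below are illustrative only; the snippet simply confirms that 65535 needs 16 bits, one million needs 20 bits and eight million needs 23 bits.

```python
# Number of bits needed to hold each limit as an unsigned integer.
JVM_LIMIT = 2**16 - 1   # 65535, the JVM limit cited above
SOFT_LIMIT = 1_000_000  # proposed soft limit
HARD_LIMIT = 8_000_000  # proposed CPython hard limit

WIDTHS = {
    name: value.bit_length()
    for name, value in [("jvm", JVM_LIMIT), ("soft", SOFT_LIMIT), ("hard", HARD_LIMIT)]
}

# 2**23 - 1 == 8_388_607, so every value up to the hard limit fits in 23 bits,
# while 22 bits (maximum 4_194_303) would not be enough.
assert 2**22 - 1 < HARD_LIMIT <= 2**23 - 1

for name, bits in WIDTHS.items():
    print(f"{name}: {bits} bits")
# jvm: 16 bits
# soft: 20 bits
# hard: 23 bits
```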
While it is possible that generated code could exceed the limit,
it is easy for a code generator to modify its output to conform.
The author has hit the 64K limit in the JVM on at least two occasions when generating Java code.
The workarounds were relatively straightforward and wouldn't
have been necessary with a limit of one million bytecodes or lines of code.
Where necessary, the soft limit can be increased for those programs that exceed the one million limit.
Having a soft limit of one million provides a warning of problematic code, without causing an error and forcing an immediate fix.
It also allows dynamic optimizers to use more compact formats without inline checks.
Specification
=============
This PEP proposes that the following language features and runtime values have a soft limit of one million.
* The number of source code lines in a module.
* The number of bytecode instructions in a code object.
* The sum of local variables and stack usage for a code object.
* The number of classes in a running interpreter.
* The recursion depth of Python code.
It is likely that memory constraints would be a limiting factor before the number of classes reaches one million.
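As a rough illustration of what the first three quantities mean for one module, the sketch below compiles source text and reports its line count, the largest instruction count of any code object, and the largest ``co_nlocals + co_stacksize``. It assumes that ``dis.get_instructions()`` and these code-object attributes are reasonable proxies for the listed quantities; the PEP does not prescribe how an implementation measures them, and the helper names ``measure()`` and ``over_soft_limit()`` are hypothetical.

```python
from __future__ import annotations

import dis
import types
from collections.abc import Iterator

SOFT_LIMIT = 1_000_000  # proposed soft limit for each quantity


def iter_code_objects(code: types.CodeType) -> Iterator[types.CodeType]:
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def measure(source: str, filename: str = "<example>") -> dict[str, int]:
    """Approximate the first three limited quantities for one module's source."""
    codes = list(iter_code_objects(compile(source, filename, "exec")))
    return {
        # Source code lines in the module.
        "lines": len(source.splitlines()),
        # Largest number of bytecode instructions in any single code object.
        "instructions": max(len(list(dis.get_instructions(c))) for c in codes),
        # Largest sum of local variables and stack usage in any code object.
        "locals_and_stack": max(c.co_nlocals + c.co_stacksize for c in codes),
    }


def over_soft_limit(stats: dict[str, int]) -> list[str]:
    """Names of the measured quantities that exceed the soft limit."""
    return [name for name, value in stats.items() if value > SOFT_LIMIT]


stats = measure("def f(a, b):\n    c = a + b\n    return c\n")
print(stats, over_soft_limit(stats))
```

Nested functions and classes are found through ``co_consts``, which is where the compiler stores their code objects.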
Recursion depth
---------------
The recursion depth limit only applies to pure Python code. Code written in a foreign language, such as C,
may consume hardware stack and thus be limited to a recursion depth of a few thousand.
It is expected that implementations will raise an exception should the hardware stack get close to its limit.
For code that mixes Python and C calls, it is most likely that the hardware limit will apply first.
The depth of recursion that the hardware stack permits may vary at runtime and will not be visible.
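Today's CPython already exposes a single recursion limit through ``sys.getrecursionlimit()`` and ``sys.setrecursionlimit()``. The sketch below uses that existing API, not the functions this PEP proposes, to show a limit on pure Python recursion being enforced with ``RecursionError``. The limit of 500 and the depths tried are arbitrary choices for the example.

```python
import sys


def nest(n: int) -> int:
    """Recurse n levels in pure Python, one frame per level."""
    return 0 if n == 0 else 1 + nest(n - 1)


def reaches(depth: int) -> bool:
    """Return True if recursing `depth` levels stays within the current limit."""
    try:
        nest(depth)
    except RecursionError:
        return False
    return True


saved = sys.getrecursionlimit()
try:
    sys.setrecursionlimit(500)    # arbitrary, deliberately low limit
    shallow_ok = reaches(100)     # well under the limit
    deep_ok = reaches(5_000)      # far over the limit
finally:
    sys.setrecursionlimit(saved)  # always restore the original limit

print(shallow_ok, deep_ok)  # True False
```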
Soft and hard limits
====================
Implementations should emit a warning whenever a soft limit is exceeded, unless the hard limit has the same value as the soft limit.
When a hard limit is exceeded, an exception should be raised.
Depending on the implementation, different hard limits might apply. In some cases the hard limit might be below the soft limit.
For example, many MicroPython ports are unlikely to be able to support such large limits.
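The warning-versus-exception rule above can be sketched as a single check. ``SoftLimitWarning``, ``HardLimitError`` and ``check_limit()`` are hypothetical names chosen for this example; the PEP does not specify which warning category or exception an implementation should use.

```python
import warnings


class SoftLimitWarning(Warning):
    """Hypothetical category for exceeding a soft limit."""


class HardLimitError(Exception):
    """Hypothetical exception for exceeding a hard limit."""


def check_limit(name: str, value: int, soft: int, hard: int) -> None:
    """Apply the soft/hard rule to one limited quantity."""
    # Exceeding the hard limit is always an error, even when the hard
    # limit is below the soft limit (e.g. a small embedded port).
    if value > hard:
        raise HardLimitError(f"{name}: {value} exceeds the hard limit of {hard}")
    # Warn only when the soft limit is distinct from the hard limit.
    if value > soft and soft != hard:
        warnings.warn(
            f"{name}: {value} exceeds the soft limit of {soft}",
            SoftLimitWarning,
            stacklevel=2,
        )
```

With CPython's proposed values, ``check_limit("lines", 1_200_000, soft=1_000_000, hard=8_000_000)`` only warns, while a value of 9 million raises. With a hypothetical port whose hard limit is 4 000, a value of 5 000 raises even though it is far below the soft limit.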
Introspecting and modifying the limits
--------------------------------------
One or more functions will be provided in the ``sys`` module to introspect or modify the soft limits at runtime,
but the limits may not be raised above the hard limit.
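The PEP leaves the names and signatures of these ``sys`` functions open, so the class below is only a hypothetical stand-in showing the intended rule: soft limits can be read and changed at runtime, but never raised above the hard limit. The quantity names and the requirement that a soft limit be at least 1 are assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical names for the five limited quantities listed in the Specification.
QUANTITIES = ("lines", "instructions", "locals_and_stack", "classes", "recursion")


class LimitRegistry:
    """Hypothetical stand-in for the proposed ``sys`` functions (not a real API)."""

    def __init__(self, hard: int, default_soft: int = 1_000_000) -> None:
        self._hard = hard
        # A soft limit never starts above the hard limit.
        self._soft = {name: min(default_soft, hard) for name in QUANTITIES}

    def get_hard(self) -> int:
        return self._hard

    def get_soft(self, name: str) -> int:
        return self._soft[name]  # KeyError for an unknown quantity

    def set_soft(self, name: str, value: int) -> None:
        if name not in self._soft:
            raise KeyError(name)
        # The upper bound comes from the PEP; the lower bound of 1 is an assumption.
        if not 1 <= value <= self._hard:
            raise ValueError(f"soft limit for {name!r} must be in 1..{self._hard}")
        self._soft[name] = value
```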
Inferred limits
---------------
@ -136,15 +165,14 @@ Line of code in a module and code object restrictions.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When compiling source code to bytecode or modifying bytecode for profiling or debugging,
an intermediate form is required. By limiting operands to 23 bits,
instructions can be represented in a compact 64 bit form allowing
very fast passes over the instruction sequence.
Having 23 bit operands (24 bits for relative branches) allows instructions
to fit into 32 bits without needing additional ``EXTENDED_ARG`` instructions.
This improves dispatch, as the operand is strictly local to the instruction.
Using super-instructions would make the 32 bit format
almost as compact as the 16 bit format, and significantly faster.
It is unclear whether this would help performance; it is merely an example of what is possible.
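To make the bit arithmetic concrete, here is a sketch of one possible 32-bit layout: an 8-bit opcode in the high byte and a 24-bit field holding either an operand (values up to eight million need 23 bits) or a signed relative branch offset. The layout and the ``encode()``/``decode()`` names are assumptions for illustration; this is not CPython's actual wordcode, which pairs an 8-bit opcode with an 8-bit argument and uses ``EXTENDED_ARG`` prefixes for larger values.

```python
from __future__ import annotations

OPERAND_BITS = 24                       # 23-bit operands or 24-bit signed branch offsets
OPERAND_MASK = (1 << OPERAND_BITS) - 1  # 0xFFFFFF
SIGN_BIT = 1 << (OPERAND_BITS - 1)      # bit 23


def encode(opcode: int, operand: int) -> int:
    """Pack an 8-bit opcode and a signed 24-bit field into one 32-bit word."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not -SIGN_BIT <= operand < SIGN_BIT:
        raise ValueError("operand must fit in 24 signed bits")
    # Masking stores negative branch offsets in two's complement.
    return (opcode << OPERAND_BITS) | (operand & OPERAND_MASK)


def decode(word: int) -> tuple[int, int]:
    """Unpack a word produced by encode(), sign-extending the 24-bit field."""
    opcode = word >> OPERAND_BITS
    field = word & OPERAND_MASK
    operand = field - (1 << OPERAND_BITS) if field & SIGN_BIT else field
    return opcode, operand
```

Because the hard limit of eight million is below 2\ :sup:`23`, every non-negative operand up to the limit decodes unchanged, and the operand never depends on a preceding instruction.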
The benefit of restricting the number of lines in a module is primarily the implied limit on bytecodes.
It is more important for implementations that it is instructions per code object, not lines per module, that is limited to one million,
@ -190,32 +218,49 @@ It is anticipated that CPython will enforce the limits as follows:
* The sum of local variables and stack usage for a code object: 3.9 onward.
* The number of classes in a running interpreter: probably 3.10 onward, maybe warning in 3.9.
Hard limits in CPython
======================
CPython will enforce a hard limit on all the above values. The value of the hard limit will be 8 million.
It is hypothetically possible that some machine generated code exceeds one or more of the above limits.
The author believes that to be incredibly unlikely and easily fixed by modifying the output stage of the code generator.
We would like to gain the benefit from the above limits for performance as soon as possible.
To that end, CPython will start applying limits from version 3.9 onward.
To ease the transition and minimize breakage, the initial limits will be 16 million, reducing to 8 million in a later version.
Backwards Compatibility
=======================
The actual hard limits enforced by CPython will be:
============= ===============
Version       Hard limit
============= ===============
3.9           16 million
3.10 onward   8 million
============= ===============
Given the rarity of code generators that would exceed the one million limits,
and the environments in which they are typically used, it seems reasonable
to start issuing warnings in 3.9 if any limited quantity exceeds one million.
Historically the recursion limit has been set at 1000. To avoid breaking code that implicitly relies on the value being small,
the soft recursion limit will be increased gradually, as follows:
============= ===============
Version       Soft limit
============= ===============
3.9           4 000
3.10          16 000
3.11          64 000
3.12          125 000
3.13          1 million
============= ===============
The hard limit will be set to 8 million immediately.
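The proposed CPython schedules can be restated as a lookup. This is only a restatement of the proposal, not a description of shipped CPython behaviour; ``proposed_limits()`` is a hypothetical helper, and treating the 3.13 recursion value as applying to every later version is an assumption, since the table lists 3.13 alone.

```python
from __future__ import annotations

SOFT_LIMIT = 1_000_000            # soft limit for every quantity except recursion
RECURSION_HARD_LIMIT = 8_000_000  # set to 8 million immediately
RECURSION_SOFT_SCHEDULE = {       # from the soft recursion limit table
    (3, 9): 4_000,
    (3, 10): 16_000,
    (3, 11): 64_000,
    (3, 12): 125_000,
}


def proposed_limits(version: tuple[int, int]) -> dict[str, int]:
    """Return the limits proposed for a CPython (major, minor) version."""
    if version < (3, 9):
        raise ValueError("the proposed limits start with CPython 3.9")
    return {
        "soft": SOFT_LIMIT,
        # 16 million in 3.9, then 8 million from 3.10 onward.
        "hard": 16_000_000 if version == (3, 9) else 8_000_000,
        # Assumption: the 3.13 value of one million also applies to later versions.
        "recursion_soft": RECURSION_SOFT_SCHEDULE.get(version, 1_000_000),
        "recursion_hard": RECURSION_HARD_LIMIT,
    }
```

Note that in 3.9 the recursion hard limit (8 million) is already lower than the 16 million hard limit that applies to the other quantities.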
Other implementations
=====================
@ -232,7 +277,6 @@ Special purpose implementations
-------------------------------
Special purpose implementations may use lower limits, as long as they are clearly documented.
An implementation designed for embedded systems, for example MicroPython, might impose limits as low as a few thousand.
Security Implications
@ -249,7 +293,7 @@ None, as yet. This will be implemented in CPython, once the PEP has been accepte
Rejected Ideas
==============
Being able to modify the hard limits upwards at compile time was suggested by Tal Einat.
This is rejected as the current limits of 2\ :sup:`32` have not been an issue, and the practical
advantages of allowing limits between 2\ :sup:`20` and 2\ :sup:`32` seem slight compared to the additional
code complexity of supporting such a feature.