Major simplification of PEP 504

- drop the submodule idea
- call random.ensure_repeatable() to opt in to the PRNG
- seed(), getstate(), setstate() all call ensure_repeatable()
Nick Coghlan 2015-09-17 00:21:45 +10:00
parent 2702cd15b3
commit 107927ee7a
1 changed files with 137 additions and 104 deletions

@@ -24,22 +24,19 @@ Unfortunately, this approach has resulted in a situation where developers that
 aren't aware that they're doing security sensitive work use the default module
 level APIs, and thus expose their users to unnecessary risks.

-This isn't an acute problem, but it is a chronic one, and if documentation and
-developer education were going to solve it, they would have done so by now.
+This isn't an acute problem, but it is a chronic one, and the often long
+delays between the introduction of security flaws and their exploitation means
+that it is difficult for developers to naturally learn from experience.

 In order to provide an eventually pervasive solution to the problem, this PEP
 proposes that Python switch to using the system random number generator by
 default in Python 3.6, and require developers to opt-in to using the
-deterministic random number generator.
+deterministic random number generator process wide either by using a new
+``random.ensure_repeatable()`` API, or by explicitly creating their own
+``random.Random()`` instance.

-To minimise the compatibility break, calling any of the following module level
-functions will count as opting in to using the deterministic random number
-generator for all future calls to module level functions in the random
-module in the same process:
-
-* ``random.seed``
-* ``random.getstate``
-* ``random.setstate``
+To minimise the impact on existing code, module level APIs that require
+determinism will implicitly switch to the deterministic PRNG.

 Proposal
 ========
@@ -48,94 +45,130 @@ Currently, it is never correct to use the module level functions in the
 ``random`` module for security sensitive applications. This PEP proposes to
 change that admonition in Python 3.6+ to instead be that it is not correct to
 use the module level functions in the ``random`` module for security sensitive
-applications if ``random.seed``, ``random.getstate``, or ``random.setstate``
-are ever called in that process.
+applications if ``random.ensure_repeatable()`` is ever called (directly or
+indirectly) in that process.

-This PEP further proposes to make it easier to explicitly opt in to using
-either the system random number generator or Python's deterministic PRNG by
-converting the random module to a package that exposes the same top-level API,
-and offering two new subpackages:
+To achieve this, rather than being bound methods of a ``random.Random``
+instance as they are today, the module level callables in ``random`` would
+change to be functions that delegate to the corresponding method of the
+existing ``random._inst`` module attribute.

-* ``random.system``
-* ``random.seedable``
+By default, this attribute will be bound to a ``random.SystemRandom`` instance.

-The ``random.system`` submodule would provide the following bound methods of a
-module global ``random.SystemRandom`` instance as module attributes:
-``betavariate``, ``choice``, ``expovariate``, ``gammavariate``, ``gauss``, ``getrandbits``, ``lognormvariate``, ``normalvariate``, ``paretovariate``,
-``randint``, ``random``, ``randrange``, ``sample``, ``shuffle``,
-``triangular``, ``uniform``, ``vonmisesvariate``, ``weibullvariate``
+A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
+attribute to a ``random.Random`` instance, restoring the same module level
+API behaviour as existed in previous Python versions (aside from the
+additional level of indirection)::

-The ``random.seedable`` submodule would provide the same operations, but as
-methods of a ``random.Random`` instance. In addition, it would provide the
-following additional methods which are only meaningful when using a
-deterministic random number generator: ``seed``, ``getstate``, ``setstate``.
+    def ensure_repeatable():
+        """Switch to using random.Random() for the module level APIs"""
+        if not isinstance(_inst, Random):
+            _inst = random.Random()

-Rather than being bound methods of a ``random.Random`` instance as they are
-today, the module level callables in ``random`` itself would change to be
-functions that, by default, delegated to the ``random.SystemRandom`` instance
-in ``random.system``.
+To minimise the impact on existing code, calling any of the following module
+level functions will implicitly call ``random.ensure_repeatable()``:

-Calling any one of ``random.seed``, ``random.getstate``, or ``random.setstate``
-would change the delegation to instead refer to the ``random.Random`` instance
-in ``random.seedable``.
+* ``random.seed``
+* ``random.getstate``
+* ``random.setstate``
+
+There are no changes proposed to the ``random.Random`` or
+``random.SystemRandom`` class APIs - applications that explicitly instantiate
+their own random number generators will be entirely unaffected by this
+proposal.

 Warning on implicit opt-in
 --------------------------

-In Python 3.6, implicitly opting in to the use of the seedable PRNG will emit a
-deprecation warning. This warning will suggest explicitly opting in to either
-the system RNG or the seedable PRNG. Possible wording:
+In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
+emit a deprecation warning using the following check::

-"DeprecationWarning: Implicitly switching to the seedable PRNG. Consider
-importing from random.system or random.seedable as appropriate"
+    if not isinstance(_inst, Random):
+        warnings.warn(DeprecationWarning,
+                      "Implicitly ensuring repeatability. "
+                      "See help(random.ensure_repeatable) for details")
+    ensure_repeatable()

-Whatever precise wording is chosen should have an answer added to Stack
-Overflow as was done for the custom error message that was added for missing
-parentheses in a call to print [#print]_.
+The specific wording of the warning should have a suitable answer added to
+Stack Overflow as was done for the custom error message that was added for
+missing parentheses in a call to print [#print]_.

 In the first Python 3 release after Python 2.7 switches to security fix only
 mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
 visible by default.

-This PEP does *not* propose removing the ability to seed the default RNG used
-process wide - it's not a good idea relative to the alternative of explicitly
-importing from the appropriate submodule (hence the eventually
-visible-by-default warning), but it's also a concern that can be more
-readily addressed on a project-by-project basis.
+This PEP does *not* propose ever removing the ability to ensure the default RNG
+used process wide is a deterministic PRNG that will produce the same series of
+outputs given a specific seed. That capability is widely used in modelling
+and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
+either directly or indirectly is a sufficient enhancement to address the cases
+where the module level random API is used for security sensitive tasks in web
+applications without due consideration for the potential security implications
+of using a deterministic PRNG.

+Performance impact
+------------------
+
+Due to the large performance difference between ``random.Random`` and
+``random.SystemRandom``, applications ported to Python 3.6 will encounter a
+significant performance regression in cases where:
+
+* the application is using the module level random API
+* cryptographic quality randomness isn't needed
+* the application doesn't already implicitly opt back in to the deterministic
+  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
+* the application isn't updated to explicitly call ``random.ensure_repeatable``
+
+This would be noted in the Porting section of the Python 3.6 What's New guide,
+with the recommendation to include the following code in the ``__main__``
+module of affected applications::
+
+    if hasattr(random, "ensure_repeatable"):
+        random.ensure_repeatable()
+
+Applications that do need cryptographic quality randomness should be using the
+system random number generator regardless of speed considerations, so in those
+cases the change proposed in this PEP will fix a previously latent security
+defect.
+
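
The size of the performance gap depends on the platform, so it is worth measuring rather than assuming. The following is a small sketch using the stdlib ``timeit`` module; ``time_generators`` is a hypothetical helper name, the call count is arbitrary, and no particular ratio is claimed here.

```python
import random
import timeit


def time_generators(calls=20000):
    """Time the same number of random() calls on both generator classes.

    Returns a dict of elapsed seconds keyed by class name. The figures are
    only meaningful relative to each other on the machine that produced them.
    """
    prng = random.Random()
    sysrng = random.SystemRandom()
    return {
        "Random": timeit.timeit(prng.random, number=calls),
        "SystemRandom": timeit.timeit(sysrng.random, number=calls),
    }
```

Calling ``time_generators()`` on the target system gives a rough idea of whether the regression described above would matter for a given application.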
 Documentation changes
 ---------------------

 The ``random`` module documentation would be updated to move the documentation
 of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
-along with the associated security warning.
+along with the documentation of the new ``ensure_repeatable`` function and the
+associated security warning.

-The docs would gain a discussion of the respective use cases for the seedable
-PRNG (games, modelling & simulation, software testing) and the system RNG
-(cryptography, security token generation).
+That section of the module documentation would also gain a discussion of the
+respective use cases for the deterministic PRNG enabled by
+``ensure_repeatable`` (games, modelling & simulation, software testing) and the
+system RNG that is used by default (cryptography, security token generation).
+This discussion will also recommend the use of third party security libraries
+for the latter task.

 Rationale
 =========

 Writing secure software under deadline and budget pressures is a hard problem.
-This is reflected in ongoing problems with data breaches involving personally
+This is reflected in regular notifications of data breaches involving personally
 identifiable information [#breaches]_, as well as with failures to take
 security considerations into account when new systems, like motor vehicles
-[#uconnect]_, are connected to the internet. Compounding the issue is the fact
-that a lot of the programming advice readily available on the internet [#search]
-simply doesn't take the mathematical arcana of computer security into account,
-and the fact that defenders have to cover *all* of their potential
-vulnerabilities, as a single mistake can make it possible to subvert other
-defences [#bcrypt]_.
+[#uconnect]_, are connected to the internet. It's also the case that a lot of
+the programming advice readily available on the internet [#search] simply
+doesn't take the mathematical arcana of computer security into account.
+Compounding these issues is the fact that defenders have to cover *all* of
+their potential vulnerabilities, as a single mistake can make it possible to
+subvert other defences [#bcrypt]_.

 One of the factors that contributes to making this last aspect particularly
 difficult is APIs where using them inappropriately creates a *silent* security
 failure - one where the only way to find out that what you're doing is
 incorrect is for someone reviewing your code to say "that's a potential
 security problem", or for a system you're responsible for to be compromised
-through such an oversight (and your intrusion detection and auditing mechanisms
-are good enough for you to be able to figure out after the event how the
-compromise took place).
+through such an oversight (and you're not only still responsible for that
+system when it is compromised, but your intrusion detection and auditing
+mechanisms are good enough for you to be able to figure out after the event
+how the compromise took place).

 This kind of situation is a significant contributor to "security fatigue",
 where developers (often rightly [#owasptopten]_) feel that security engineers
@@ -151,8 +184,8 @@ threats, and less time fighting with default language behaviours.
 Discussion
 ==========

-Why "seedable" over "deterministic"?
-------------------------------------
+Why "ensure_repeatable" over "ensure_deterministic"?
+----------------------------------------------------

 This is a case where the meaning of a word as specialist jargon conflicts with
 the typical meaning of the word, even though it's *technically* the same.
@@ -163,16 +196,17 @@ future states.

 The problem is that "deterministic" on its own doesn't convey those qualifiers,
 so it's likely to instead be interpreted as "predictable" or "not random" by
-folks that aren't familiar with the technical meaning.
+folks that are familiar with the conventional meaning, but aren't familiar with
+the additional qualifiers on the technical meaning.

-The other problem with "deterministic" as a description for the traditional RNG
-is that it doesn't tell you what you can *do* with the traditional RNG that you
-can't do with the system one.
+A second problem with "deterministic" as a description for the traditional RNG
+is that it doesn't really tell you what you can *do* with the traditional RNG
+that you can't do with the system one.

-"seedable" aims to address both those problems, as it doesn't have a misleading
-common meaning, and it's a word form that means "you can seed this", which then
-leads naturally into an exploration of what it means to "seed" a random number
-generator.
+"ensure_repeatable" aims to address both of those problems, as its common
+meaning accurately describes the main reason for preferring the deterministic
+PRNG over the system RNG: ensuring you can repeat the same series of outputs
+by providing the same seed value, or by restoring a previously saved PRNG state.
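
The two forms of repeatability named above (reusing a seed, and restoring saved state) can be shown with an explicit ``random.Random`` instance, which works the same way in current Python versions. This is a small sketch: the helper names ``series``, ``repeat_by_seed``, ``repeat_by_state`` and ``system_rng_has_state`` are hypothetical and exist only for the illustration.

```python
import random


def series(rng, count=5):
    """Draw a short series of integers from the given generator."""
    return [rng.randint(0, 99) for _ in range(count)]


def repeat_by_seed(seed_value):
    """Return two series produced from the same seed value."""
    rng = random.Random(seed_value)
    first = series(rng)
    rng.seed(seed_value)  # re-seeding restarts the same sequence
    return first, series(rng)


def repeat_by_state(seed_value):
    """Return two series produced from the same saved generator state."""
    rng = random.Random(seed_value)
    saved = rng.getstate()
    first = series(rng)
    rng.setstate(saved)  # rewind to the saved point
    return first, series(rng)


def system_rng_has_state():
    """Report whether the system RNG exposes saveable state."""
    try:
        random.SystemRandom().getstate()
    except NotImplementedError:
        return False
    return True
```

Both helper functions return a pair of identical lists, while ``system_rng_has_state()`` returns ``False``: the system RNG offers nothing to save or replay, which is the practical difference the name "ensure_repeatable" is meant to convey.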

 Only changing the default for Python 3.6+
 -----------------------------------------
@@ -184,9 +218,9 @@ change to all currently supported versions of Python.

 The difference in this case is one of degree - the additional benefits from
 rolling out this particular change a couple of years earlier than will
-otherwise be the case aren't sufficient to justify the additional effort and
-stability risks involved in making such an intrusive change in a maintenance
-release.
+otherwise be the case aren't sufficient to justify either the additional effort
+or the stability risks involved in making such an intrusive change in a
+maintenance release.

 Keeping the module level functions
 ----------------------------------
@@ -198,18 +232,24 @@ of the current ``random`` module API. Accordingly, this proposal ensures that
 most of the public API can continue to be used not only without modification,
 but without generating any new warnings.

-Implicitly opting in to the deterministic RNG
----------------------------------------------
+Warning when implicitly opting in to the deterministic RNG
+----------------------------------------------------------

-Python is widely used for modelling and simulation purposes, and in many cases,
-these software models won't have a dedicated maintenance team tasked with
-ensuing they keep working on the latest versions of Python.
+It's necessary to implicitly opt in to the deterministic PRNG as Python is
+widely used for modelling and simulation purposes where this is the right
+thing to do, and in many cases, these software models won't have a dedicated
+maintenance team tasked with ensuring they keep working on the latest versions
+of Python.
+
+Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
+is also a mistake that appears in a number of the flawed "how to generate a
+security token in Python" guides readily available online.

 Using first DeprecationWarning, and then eventually a RuntimeWarning, to
-advise against implicitly switching to the deterministic PRNG, preserves
-compatibility with this existing software, while still nudging future users
-that need a deterministic generator towards importing ``random.seedable``
-explicitly.
+advise against implicitly switching to the deterministic PRNG aims to
+nudge future users that need a cryptographically secure RNG away from
+calling ``random.seed()`` and those that genuinely need a deterministic
+generator towards explicitly calling ``random.ensure_repeatable()``.

 Avoiding the introduction of a userspace CSPRNG
 -----------------------------------------------
@@ -224,23 +264,9 @@ point of failure in security sensitive situations, for the sake of applications
 where the random number generation may not even be on a critical performance
 path.

-What about the performance impact?
-----------------------------------
-
-Rather than introducing a userspace CSPRNG, this PEP instead proposes that we
-accept the performance regression in cases where:
-
-* an application is using the module level random API
-* cryptographic quality randomness isn't needed
-* the application doesn't already implicitly opt back in to the deterministic
-  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
-* the application isn't updated to explicitly import from ``random.seedable``
-  rather than ``random``
-
-Applications that need cryptographic quality randomness should be using the
-system random number generator regardless of speed considerations, while other
-applications where speed is a more important consideration are better off with
-the current PRNG implementation than they would be with a new CSPRNG.
+Applications that do need cryptographic quality randomness should be using the
+system random number generator regardless of speed considerations, so in those
+cases.

 Isn't the deterministic PRNG "secure enough"?
 ---------------------------------------------
@@ -252,6 +278,11 @@ studies of PHP's random number generator [#php]_ have demonstrated the ability
 to use weaknesses in that subsystem to facilitate a practical attack on
 password recovery tokens in popular PHP web applications.

+However, one of the rules of secure software development is that "attacks only
+get better, never worse", so it may be that by the time Python 3.6 is released
+we will actually see a practical attack on Python's deterministic PRNG publicly
+documented.
+
 Security fatigue in the Python ecosystem
 ----------------------------------------

@@ -264,9 +295,9 @@ on Linux systems in general, a fair share of that burden has fallen on the
 Python ecosystem, which is understandably frustrating for Pythonistas using
 Python in other contexts where these issues aren't of as great a concern.

-This consideration is one of the primary factors driving the backwards
-compatibility improvements in this proposal relative to the initial draft
-concept posted to python-ideas [#draft]_.
+This consideration is one of the primary factors driving the substantial
+backwards compatibility improvements in this proposal relative to the initial
+draft concept posted to python-ideas [#draft]_.

 Acknowledgements
 ================
@@ -284,6 +315,8 @@ Acknowledgements
   experts that suggested the introduction of a userspace CSPRNG would mean
   additional complexity for insufficient gain relative to just using the
   system RNG directly
+* Paul Moore for eloquently making the case for the current level of security
+  fatigue in the Python ecosystem

 References
 ==========