Major simplification of PEP 504
- drop the submodule idea
- call random.ensure_repeatable() to opt in to the PRNG
- seed(), getstate(), setstate() all call ensure_repeatable()
parent
2702cd15b3
commit
107927ee7a
241
pep-0504.txt
|
@ -24,22 +24,19 @@ Unfortunately, this approach has resulted in a situation where developers that
|
|||
aren't aware that they're doing security sensitive work use the default module
|
||||
level APIs, and thus expose their users to unnecessary risks.
|
||||
|
||||
This isn't an acute problem, but it is a chronic one, and if documentation and
|
||||
developer education were going to solve it, they would have done so by now.
|
||||
This isn't an acute problem, but it is a chronic one, and the often long
|
||||
delays between the introduction of security flaws and their exploitation means
|
||||
that it is difficult for developers to naturally learn from experience.
|
||||
|
||||
In order to provide an eventually pervasive solution to the problem, this PEP
|
||||
proposes that Python switch to using the system random number generator by
|
||||
default in Python 3.6, and require developers to opt-in to using the
|
||||
deterministic random number generator.
|
||||
deterministic random number generator process wide either by using a new
|
||||
``random.ensure_repeatable()`` API, or by explicitly creating their own
|
||||
``random.Random()`` instance.
|
||||
|
||||
To minimise the compatibility break, calling any of the following module level
|
||||
functions will count as opting in to using the deterministic random number
|
||||
generator for all future calls to module level functions in the random
|
||||
module in the same process:
|
||||
|
||||
* ``random.seed``
|
||||
* ``random.getstate``
|
||||
* ``random.setstate``
|
||||
To minimise the impact on existing code, module level APIs that require
|
||||
determinism will implicitly switch to the deterministic PRNG.
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
@ -48,94 +45,130 @@ Currently, it is never correct to use the module level functions in the
|
|||
``random`` module for security sensitive applications. This PEP proposes to
|
||||
change that admonition in Python 3.6+ to instead be that it is not correct to
|
||||
use the module level functions in the ``random`` module for security sensitive
|
||||
applications if ``random.seed``, ``random.getstate``, or ``random.setstate``
|
||||
are ever called in that process.
|
||||
applications if ``random.ensure_repeatable()`` is ever called (directly or
|
||||
indirectly) in that process.
|
||||
|
||||
This PEP further proposes to make it easier to explicitly opt in to using
|
||||
either the system random number generator or Python's deterministic PRNG by
|
||||
converting the random module to a package that exposes the same top-level API,
|
||||
and offering two new subpackages:
|
||||
To achieve this, rather than being bound methods of a ``random.Random``
|
||||
instance as they are today, the module level callables in ``random`` would
|
||||
change to be functions that delegate to the corresponding method of the
|
||||
existing ``random._inst`` module attribute.
|
||||
|
||||
* ``random.system``
|
||||
* ``random.seedable``
|
||||
By default, this attribute will be bound to a ``random.SystemRandom`` instance.
|
||||
|
||||
The ``random.system`` submodule would provide the following bound methods of a
|
||||
module global ``random.SystemRandom`` instance as module attributes:
|
||||
``betavariate``, ``choice``, ``expovariate``, ``gammavariate``, ``gauss``, ``getrandbits``, ``lognormvariate``, ``normalvariate``, ``paretovariate``,
|
||||
``randint``, ``random``, ``randrange``, ``sample``, ``shuffle``,
|
||||
``triangular``, ``uniform``, ``vonmisesvariate``, ``weibullvariate``
|
||||
A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
|
||||
attribute to a ``random.Random`` instance, restoring the same module level
|
||||
API behaviour as existed in previous Python versions (aside from the
|
||||
additional level of indirection)::
|
||||
|
||||
The ``random.seedable`` submodule would provide the same operations, but as
|
||||
methods of a ``random.Random`` instance. In addition, it would provide the
|
||||
following additional methods which are only meaningful when using a
|
||||
deterministic random number generator: ``seed``, ``getstate``, ``setstate``.
|
||||
    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs"""
        global _inst
        # SystemRandom subclasses Random, so test for SystemRandom directly
        if isinstance(_inst, SystemRandom):
            _inst = Random()
|
||||
|
||||
Rather than being bound methods of a ``random.Random`` instance as they are
|
||||
today, the module level callables in ``random`` itself would change to be
|
||||
functions that, by default, delegated to the ``random.SystemRandom`` instance
|
||||
in ``random.system``.
|
||||
To minimise the impact on existing code, calling any of the following module
|
||||
level functions will implicitly call ``random.ensure_repeatable()``:
|
||||
|
||||
Calling any one of ``random.seed``, ``random.getstate``, or ``random.setstate``
|
||||
would change the delegation to instead refer to the ``random.Random`` instance
|
||||
in ``random.seedable``.
|
||||
* ``random.seed``
|
||||
* ``random.getstate``
|
||||
* ``random.setstate``
|
||||
|
||||
There are no changes proposed to the ``random.Random`` or
|
||||
``random.SystemRandom`` class APIs - applications that explicitly instantiate
|
||||
their own random number generators will be entirely unaffected by this
|
||||
proposal.
|
||||
|
||||
Warning on implicit opt-in
|
||||
--------------------------
|
||||
|
||||
In Python 3.6, implicitly opting in to the use of the seedable PRNG will emit a
|
||||
deprecation warning. This warning will suggest explicitly opting in to either
|
||||
the system RNG or the seedable PRNG. Possible wording:
|
||||
In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
|
||||
emit a deprecation warning using the following check::
|
||||
|
||||
"DeprecationWarning: Implicitly switching to the seedable PRNG. Consider
|
||||
importing from random.system or random.seedable as appropriate"
|
||||
    if isinstance(_inst, SystemRandom):
        # warnings.warn() takes the message first, then the category
        warnings.warn("Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details",
                      DeprecationWarning)
        ensure_repeatable()
|
||||
|
||||
Whatever precise wording is chosen should have an answer added to Stack
|
||||
Overflow as was done for the custom error message that was added for missing
|
||||
parentheses in a call to print [#print]_.
|
||||
The specific wording of the warning should have a suitable answer added to
|
||||
Stack Overflow as was done for the custom error message that was added for
|
||||
missing parentheses in a call to print [#print]_.
|
||||
|
||||
In the first Python 3 release after Python 2.7 switches to security fix only
|
||||
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
|
||||
visible by default.
|
||||
|
||||
This PEP does *not* propose removing the ability to seed the default RNG used
|
||||
process wide - it's not a good idea relative to the alternative of explicitly
|
||||
importing from the appropriate submodule (hence the eventually
|
||||
visible-by-default warning), but it's also a concern that can be more
|
||||
readily addressed on a project-by-project basis.
|
||||
This PEP does *not* propose ever removing the ability to ensure the default RNG
|
||||
used process wide is a deterministic PRNG that will produce the same series of
|
||||
outputs given a specific seed. That capability is widely used in modelling
|
||||
and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
|
||||
either directly or indirectly is a sufficient enhancement to address the cases
|
||||
where the module level random API is used for security sensitive tasks in web
|
||||
applications without due consideration for the potential security implications
|
||||
of using a deterministic PRNG.
|
||||
|
||||
Performance impact
|
||||
------------------
|
||||
|
||||
Due to the large performance difference between ``random.Random`` and
|
||||
``random.SystemRandom``, applications ported to Python 3.6 will encounter a
|
||||
significant performance regression in cases where:
|
||||
|
||||
* the application is using the module level random API
|
||||
* cryptographic quality randomness isn't needed
|
||||
* the application doesn't already implicitly opt back in to the deterministic
|
||||
PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
|
||||
* the application isn't updated to explicitly call ``random.ensure_repeatable``
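The size of the regression depends on the platform, so it is worth measuring rather than assuming. The following sketch, using only the standard ``timeit`` module, compares the two existing classes directly. The call count is an arbitrary choice for illustration, and no particular ratio is implied:

```python
import random
import timeit

deterministic = random.Random()
system = random.SystemRandom()

calls = 20_000  # arbitrary sample size for this illustration

# Time the same method on each generator.
t_det = timeit.timeit(deterministic.random, number=calls)
t_sys = timeit.timeit(system.random, number=calls)

print(f"random.Random.random():       {t_det:.4f}s for {calls} calls")
print(f"random.SystemRandom.random(): {t_sys:.4f}s for {calls} calls")
```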
|
||||
|
||||
This would be noted in the Porting section of the Python 3.6 What's New guide,
|
||||
with the recommendation to include the following code in the ``__main__``
|
||||
module of affected applications::
|
||||
|
||||
    if hasattr(random, "ensure_repeatable"):
        random.ensure_repeatable()
|
||||
|
||||
Applications that do need cryptographic quality randomness should be using the
|
||||
system random number generator regardless of speed considerations, so in those
|
||||
cases the change proposed in this PEP will fix a previously latent security
|
||||
defect.
|
||||
|
||||
Documentation changes
|
||||
---------------------
|
||||
|
||||
The ``random`` module documentation would be updated to move the documentation
|
||||
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
|
||||
along with the associated security warning.
|
||||
along with the documentation of the new ``ensure_repeatable`` function and the
|
||||
associated security warning.
|
||||
|
||||
The docs would gain a discussion of the respective use cases for the seedable
|
||||
PRNG (games, modelling & simulation, software testing) and the system RNG
|
||||
(cryptography, security token generation).
|
||||
That section of the module documentation would also gain a discussion of the
|
||||
respective use cases for the deterministic PRNG enabled by
|
||||
``ensure_repeatable`` (games, modelling & simulation, software testing) and the
|
||||
system RNG that is used by default (cryptography, security token generation).
|
||||
This discussion will also recommend the use of third party security libraries
|
||||
for the latter task.
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Writing secure software under deadline and budget pressures is a hard problem.
|
||||
This is reflected in ongoing problems with data breaches involving personally
|
||||
This is reflected in regular notifications of data breaches involving personally
|
||||
identifiable information [#breaches]_, as well as with failures to take
|
||||
security considerations into account when new systems, like motor vehicles
|
||||
[#uconnect]_, are connected to the internet. Compounding the issue is the fact
|
||||
that a lot of the programming advice readily available on the internet [#search]_
|
||||
simply doesn't take the mathematical arcana of computer security into account,
|
||||
and the fact that defenders have to cover *all* of their potential
|
||||
vulnerabilities, as a single mistake can make it possible to subvert other
|
||||
defences [#bcrypt]_.
|
||||
[#uconnect]_, are connected to the internet. It's also the case that a lot of
|
||||
the programming advice readily available on the internet [#search]_ simply
|
||||
doesn't take the mathematical arcana of computer security into account.
|
||||
Compounding these issues is the fact that defenders have to cover *all* of
|
||||
their potential vulnerabilities, as a single mistake can make it possible to
|
||||
subvert other defences [#bcrypt]_.
|
||||
|
||||
One of the factors that contributes to making this last aspect particularly
|
||||
difficult is APIs where using them inappropriately creates a *silent* security
|
||||
failure - one where the only way to find out that what you're doing is
|
||||
incorrect is for someone reviewing your code to say "that's a potential
|
||||
security problem", or for a system you're responsible for to be compromised
|
||||
through such an oversight (and your intrusion detection and auditing mechanisms
|
||||
are good enough for you to be able to figure out after the event how the
|
||||
compromise took place).
|
||||
through such an oversight (and you're not only still responsible for that
|
||||
system when it is compromised, but your intrusion detection and auditing
|
||||
mechanisms are good enough for you to be able to figure out after the event
|
||||
how the compromise took place).
|
||||
|
||||
This kind of situation is a significant contributor to "security fatigue",
|
||||
where developers (often rightly [#owasptopten]_) feel that security engineers
|
||||
|
@ -151,8 +184,8 @@ threats, and less time fighting with default language behaviours.
|
|||
Discussion
|
||||
==========
|
||||
|
||||
Why "seedable" over "deterministic"?
|
||||
------------------------------------
|
||||
Why "ensure_repeatable" over "ensure_deterministic"?
|
||||
----------------------------------------------------
|
||||
|
||||
This is a case where the meaning of a word as specialist jargon conflicts with
|
||||
the typical meaning of the word, even though it's *technically* the same.
|
||||
|
@ -163,16 +196,17 @@ future states.
|
|||
|
||||
The problem is that "deterministic" on its own doesn't convey those qualifiers,
|
||||
so it's likely to instead be interpreted as "predictable" or "not random" by
|
||||
folks that aren't familiar with the technical meaning.
|
||||
folks that are familiar with the conventional meaning, but aren't familiar with
|
||||
the additional qualifiers on the technical meaning.
|
||||
|
||||
The other problem with "deterministic" as a description for the traditional RNG
|
||||
is that it doesn't tell you what you can *do* with the traditional RNG that you
|
||||
can't do with the system one.
|
||||
A second problem with "deterministic" as a description for the traditional RNG
|
||||
is that it doesn't really tell you what you can *do* with the traditional RNG
|
||||
that you can't do with the system one.
|
||||
|
||||
"seedable" aims to address both those problems, as it doesn't have a misleading
|
||||
common meaning, and it's a word form that means "you can seed this", which then
|
||||
leads naturally into an exploration of what it means to "seed" a random number
|
||||
generator.
|
||||
"ensure_repeatable" aims to address both of those problems, as its common
|
||||
meaning accurately describes the main reason for preferring the deterministic
|
||||
PRNG over the system RNG: ensuring you can repeat the same series of outputs
|
||||
by providing the same seed value, or by restoring a previously saved PRNG state.
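Both forms of repeatability can be shown with the existing ``random.Random`` class, which this proposal leaves unchanged. The seed value and variable names below are arbitrary choices for illustration. The last part shows that ``random.SystemRandom`` has no state to save:

```python
import random

# Repeating a series by providing the same seed value.
rng = random.Random(2015)
first_run = [rng.randint(1, 100) for _ in range(5)]
rng.seed(2015)
second_run = [rng.randint(1, 100) for _ in range(5)]
print(first_run == second_run)

# Repeating a series by restoring a previously saved state.
saved = rng.getstate()
a = [rng.random() for _ in range(3)]
rng.setstate(saved)
b = [rng.random() for _ in range(3)]
print(a == b)

# The system RNG offers neither: getstate() is not implemented for it.
system = random.SystemRandom()
try:
    system.getstate()
    system_has_state = True
except NotImplementedError:
    system_has_state = False
print(system_has_state)
```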
|
||||
|
||||
Only changing the default for Python 3.6+
|
||||
-----------------------------------------
|
||||
|
@ -184,9 +218,9 @@ change to all currently supported versions of Python.
|
|||
|
||||
The difference in this case is one of degree - the additional benefits from
|
||||
rolling out this particular change a couple of years earlier than will
|
||||
otherwise be the case aren't sufficient to justify the additional effort and
|
||||
stability risks involved in making such an intrusive change in a maintenance
|
||||
release.
|
||||
otherwise be the case aren't sufficient to justify either the additional effort
|
||||
or the stability risks involved in making such an intrusive change in a
|
||||
maintenance release.
|
||||
|
||||
Keeping the module level functions
|
||||
----------------------------------
|
||||
|
@ -198,18 +232,24 @@ of the current ``random`` module API. Accordingly, this proposal ensures that
|
|||
most of the public API can continue to be used not only without modification,
|
||||
but without generating any new warnings.
|
||||
|
||||
Implicitly opting in to the deterministic RNG
|
||||
---------------------------------------------
|
||||
Warning when implicitly opting in to the deterministic RNG
|
||||
----------------------------------------------------------
|
||||
|
||||
Python is widely used for modelling and simulation purposes, and in many cases,
|
||||
these software models won't have a dedicated maintenance team tasked with
|
||||
ensuing they keep working on the latest versions of Python.
|
||||
It's necessary to implicitly opt in to the deterministic PRNG as Python is
|
||||
widely used for modelling and simulation purposes where this is the right
|
||||
thing to do, and in many cases, these software models won't have a dedicated
|
||||
maintenance team tasked with ensuring they keep working on the latest versions
|
||||
of Python.
|
||||
|
||||
Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
|
||||
is also a mistake that appears in a number of the flawed "how to generate a
|
||||
security token in Python" guides readily available online.
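For contrast, a token generator that never touches the seedable generator could look like the sketch below. It uses the existing ``random.SystemRandom`` class. The ``make_token`` name, the alphabet and the default length are hypothetical choices for illustration, not recommendations from this PEP, which also suggests third party security libraries for this kind of task:

```python
import random
import string

# Illustrative alphabet: ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def make_token(length=32):
    """Build a token from the system RNG, without any call to seed()."""
    system_rng = random.SystemRandom()
    return "".join(system_rng.choice(ALPHABET) for _ in range(length))


token = make_token()
print(len(token))
```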
|
||||
|
||||
Using first DeprecationWarning, and then eventually a RuntimeWarning, to
|
||||
advise against implicitly switching to the deterministic PRNG, preserves
|
||||
compatibility with this existing software, while still nudging future users
|
||||
that need a deterministic generator towards importing ``random.seedable``
|
||||
explicitly.
|
||||
advise against implicitly switching to the deterministic PRNG aims to
|
||||
nudge future users that need a cryptographically secure RNG away from
|
||||
calling ``random.seed()`` and those that genuinely need a deterministic
|
||||
generator towards explicitly calling ``random.ensure_repeatable()``.
|
||||
|
||||
Avoiding the introduction of a userspace CSPRNG
|
||||
-----------------------------------------------
|
||||
|
@ -224,23 +264,9 @@ point of failure in security sensitive situations, for the sake of applications
|
|||
where the random number generation may not even be on a critical performance
|
||||
path.
|
||||
|
||||
What about the performance impact?
|
||||
----------------------------------
|
||||
|
||||
Rather than introducing a userspace CSPRNG, this PEP instead proposes that we
|
||||
accept the performance regression in cases where:
|
||||
|
||||
* an application is using the module level random API
|
||||
* cryptographic quality randomness isn't needed
|
||||
* the application doesn't already implicitly opt back in to the deterministic
|
||||
PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
|
||||
* the application isn't updated to explicitly import from ``random.seedable``
|
||||
rather than ``random``
|
||||
|
||||
Applications that need cryptographic quality randomness should be using the
|
||||
system random number generator regardless of speed considerations, while other
|
||||
applications where speed is a more important consideration are better off with
|
||||
the current PRNG implementation than they would be with a new CSPRNG.
|
||||
Applications that do need cryptographic quality randomness should be using the
|
||||
system random number generator regardless of speed considerations, so in those
|
||||
cases the change proposed in this PEP will fix a previously latent security
defect.
|
||||
|
||||
Isn't the deterministic PRNG "secure enough"?
|
||||
---------------------------------------------
|
||||
|
@ -252,6 +278,11 @@ studies of PHP's random number generator [#php]_ have demonstrated the ability
|
|||
to use weaknesses in that subsystem to facilitate a practical attack on
|
||||
password recovery tokens in popular PHP web applications.
|
||||
|
||||
However, one of the rules of secure software development is that "attacks only
|
||||
get better, never worse", so it may be that by the time Python 3.6 is released
|
||||
we will actually see a practical attack on Python's deterministic PRNG publicly
|
||||
documented.
|
||||
|
||||
Security fatigue in the Python ecosystem
|
||||
----------------------------------------
|
||||
|
||||
|
@ -264,9 +295,9 @@ on Linux systems in general, a fair share of that burden has fallen on the
|
|||
Python ecosystem, which is understandably frustrating for Pythonistas using
|
||||
Python in other contexts where these issues aren't of as great a concern.
|
||||
|
||||
This consideration is one of the primary factors driving the backwards
|
||||
compatibility improvements in this proposal relative to the initial draft
|
||||
concept posted to python-ideas [#draft]_.
|
||||
This consideration is one of the primary factors driving the substantial
|
||||
backwards compatibility improvements in this proposal relative to the initial
|
||||
draft concept posted to python-ideas [#draft]_.
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
@ -284,6 +315,8 @@ Acknowledgements
|
|||
experts that suggested the introduction of a userspace CSPRNG would mean
|
||||
additional complexity for insufficient gain relative to just using the
|
||||
system RNG directly
|
||||
* Paul Moore for eloquently making the case for the current level of security
|
||||
fatigue in the Python ecosystem
|
||||
|
||||
References
|
||||
==========
|
||||
|