PEP 504: Using the System RNG by default
parent a1b52cbcd9
commit 4634a8bad8

PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015

Abstract
========

Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.

Unfortunately, this approach has resulted in a situation where developers who
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.

This isn't an acute problem, but it is a chronic one, and if documentation and
developer education were going to solve it, they would have done so by now.

In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt in to using the
deterministic random number generator.

To minimise the compatibility break, calling any of the following module level
functions will count as opting in to using the deterministic random number
generator for all future calls to module level functions in the ``random``
module in the same process:

* ``random.seed``
* ``random.getstate``
* ``random.setstate``

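For context, the split between the two kinds of generator can be seen with the
standard library as it stands today. The following is a minimal sketch rather
than part of the proposal: the seed value ``12345`` and the token length are
arbitrary choices made for illustration.

```python
import os
import random

# The module level functions share one Mersenne Twister instance.
# Seeding it makes the output fully reproducible.
random.seed(12345)  # arbitrary illustrative seed
first_run = [random.randint(0, 99) for _ in range(5)]
random.seed(12345)
second_run = [random.randint(0, 99) for _ in range(5)]
assert first_run == second_run  # same seed, same "random" numbers

# The system RNG cannot be seeded; it draws from the operating system.
system_rng = random.SystemRandom()
token = "".join(system_rng.choice("0123456789abcdef") for _ in range(32))

# os.urandom exposes the same source as raw bytes.
raw = os.urandom(16)
```

Code that needs reproducibility relies on the first behaviour, while code that
generates secrets needs the second.
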
Proposal
========

Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.seed``, ``random.getstate``, or ``random.setstate``
are ever called in that process.

This PEP further proposes to make it easier to explicitly opt in to using
either the system random number generator or Python's deterministic PRNG by
converting the ``random`` module to a package that exposes the same top-level
API, and offering two new submodules:

* ``random.system``
* ``random.seedable``

The ``random.system`` submodule would provide the following bound methods of a
module global ``random.SystemRandom`` instance as module attributes:
``betavariate``, ``choice``, ``expovariate``, ``gammavariate``, ``gauss``,
``getrandbits``, ``lognormvariate``, ``normalvariate``, ``paretovariate``,
``randint``, ``random``, ``randrange``, ``sample``, ``shuffle``,
``triangular``, ``uniform``, ``vonmisesvariate``, ``weibullvariate``.

The ``random.seedable`` submodule would provide the same operations, but as
methods of a ``random.Random`` instance. In addition, it would provide the
following methods, which are only meaningful when using a deterministic random
number generator: ``seed``, ``getstate``, ``setstate``.

Rather than being bound methods of a ``random.Random`` instance as they are
today, the module level callables in ``random`` itself would change to be
functions that, by default, delegate to the ``random.SystemRandom`` instance
in ``random.system``.

Calling any one of ``random.seed``, ``random.getstate``, or ``random.setstate``
would change the delegation to instead refer to the ``random.Random`` instance
in ``random.seedable``.

Warning on implicit opt-in
--------------------------

In Python 3.6, implicitly opting in to the use of the seedable PRNG will emit a
deprecation warning. This warning will suggest explicitly opting in to either
the system RNG or the seedable PRNG. Possible wording:

"DeprecationWarning: Implicitly switching to the seedable PRNG. Consider
importing from random.system or random.seedable as appropriate"

Whatever precise wording is chosen should have an answer added to Stack
Overflow as was done for the custom error message that was added for missing
parentheses in a call to print [#print]_.

In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.

This PEP does *not* propose removing the ability to seed the default RNG used
process wide - it's not a good idea relative to the alternative of explicitly
importing from the appropriate submodule (hence the eventually
visible-by-default warning), but it's also a concern that can be more
readily addressed on a project-by-project basis.

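As a rough illustration of how such a warning might be raised, the sketch below
uses the standard ``warnings`` module. The ``seed`` function and ``_seedable``
instance here are hypothetical stand-ins, and the message text is the possible
wording quoted above rather than a settled choice.

```python
import random
import warnings

_MESSAGE = (
    "Implicitly switching to the seedable PRNG. Consider importing "
    "from random.system or random.seedable as appropriate"
)

_seedable = random.Random()  # stand-in for the random.seedable instance


def seed(a=None):
    """Hypothetical module level seed() that warns before opting in."""
    # stacklevel=2 attributes the warning to the caller of seed().
    warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)
    _seedable.seed(a)


# DeprecationWarning is frequently hidden by the default filters, so the
# demonstration records warnings explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    seed(504)

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```

Switching the category to ``RuntimeWarning`` in a later release would make the
same message visible without any filter changes.
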
Documentation changes
---------------------

The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the associated security warning.

The docs would gain a discussion of the respective use cases for the seedable
PRNG (games, modelling & simulation, software testing) and the system RNG
(cryptography, security token generation).

Rationale
=========

Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in ongoing problems with data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. Compounding the issue is the fact
that a lot of the programming advice readily available on the internet
[#search]_ simply doesn't take the mathematical arcana of computer security
into account, and the fact that defenders have to cover *all* of their
potential vulnerabilities, as a single mistake can make it possible to subvert
other defences [#bcrypt]_.

One of the factors that contributes to making this last aspect particularly
difficult is APIs where using them inappropriately creates a *silent* security
failure - one where the only way to find out that what you're doing is
incorrect is for someone reviewing your code to say "that's a potential
security problem", or for a system you're responsible for to be compromised
through such an oversight (and only if your intrusion detection and auditing
mechanisms are good enough for you to figure out after the event how the
compromise took place).

This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".

As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.

Discussion
==========

Why "seedable" over "deterministic"?
------------------------------------

This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.

From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.

The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that aren't familiar with the technical meaning.

The other problem with "deterministic" as a description for the traditional RNG
is that it doesn't tell you what you can *do* with the traditional RNG that you
can't do with the system one.

"seedable" aims to address both those problems, as it doesn't have a misleading
common meaning, and it's a word form that means "you can seed this", which then
leads naturally into an exploration of what it means to "seed" a random number
generator.

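The technical sense of "deterministic" described above can be demonstrated with
the existing ``random.Random`` class. The sketch below is illustrative only:
it captures the state of one generator, replays it in a second one, and shows
that the system RNG offers no equivalent operation. The variable names are
arbitrary.

```python
import random

rng = random.Random()  # seeded from the OS, so the starting point is unknown

snapshot = rng.getstate()  # capture the complete internal state
observed = [rng.random() for _ in range(5)]

# Anyone holding the snapshot can recompute the same outputs exactly.
replica = random.Random()
replica.setstate(snapshot)
predicted = [replica.random() for _ in range(5)]

assert predicted == observed

# The system RNG has no state to capture or restore.
try:
    random.SystemRandom().getstate()
except NotImplementedError:
    system_state_available = False
else:
    system_state_available = True

assert not system_state_available
```

This is what "you can seed this" buys in practice: reproducible runs for the
seedable generator, and no such facility for the system one.
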
Only changing the default for Python 3.6+
-----------------------------------------

Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.

The difference in this case is one of degree - the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify the additional effort and
stability risks involved in making such an intrusive change in a maintenance
release.

Keeping the module level functions
----------------------------------

In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the availability
of the current ``random`` module API. Accordingly, this proposal ensures that
most of the public API can continue to be used not only without modification,
but without generating any new warnings.

Implicitly opting in to the deterministic RNG
---------------------------------------------

Python is widely used for modelling and simulation purposes, and in many cases,
these software models won't have a dedicated maintenance team tasked with
ensuring they keep working on the latest versions of Python.

Using first DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG preserves
compatibility with this existing software, while still nudging future users
that need a deterministic generator towards importing ``random.seedable``
explicitly.

Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------

The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.

The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.

What about the performance impact?
----------------------------------

Rather than introducing a userspace CSPRNG, this PEP instead proposes that we
accept the performance regression in cases where:

* an application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly import from ``random.seedable``
  rather than ``random``

Applications that need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, while other
applications where speed is a more important consideration are better off with
the current PRNG implementation than they would be with a new CSPRNG.

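For readers who want to gauge the size of this regression on their own
platform, a rough measurement can be made with ``timeit``. This is a sketch
only: the call count is an arbitrary choice, and the relative timings depend
on the operating system and Python version, so no particular result is claimed
here.

```python
import random
import timeit

seedable_rng = random.Random()
system_rng = random.SystemRandom()

calls = 10_000  # arbitrary illustrative workload

# timeit accepts a zero-argument callable and returns total seconds.
seedable_seconds = timeit.timeit(seedable_rng.random, number=calls)
system_seconds = timeit.timeit(system_rng.random, number=calls)

print(f"random.Random:       {seedable_seconds:.4f}s for {calls} calls")
print(f"random.SystemRandom: {system_seconds:.4f}s for {calls} calls")
```

An application that finds the difference significant can keep its current
speed by using the seedable generator explicitly.
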
Isn't the deterministic PRNG "secure enough"?
---------------------------------------------

In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.

Security fatigue in the Python ecosystem
----------------------------------------

Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As one of the most widely used
programming languages for network service development (including the OpenStack
Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.

This consideration is one of the primary factors driving the backwards
compatibility improvements in this proposal relative to the initial draft
concept posted to python-ideas [#draft]_.

Acknowledgements
================

* Theo de Raadt, for making the suggestion to Guido van Rossum that we
  seriously consider defaulting to a cryptographically secure random number
  generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
  python-ideas threads that suggested the approach of transparently switching
  to the ``random.Random`` implementation when any of the functions that only
  make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
  PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
  experts that suggested the introduction of a userspace CSPRNG would mean
  additional complexity for insufficient gain relative to just using the
  system RNG directly

References
==========

.. [#breaches] Visualization of data breaches involving more than 30k records (each)
   (http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

.. [#uconnect] Remote UConnect hack for Jeep Cherokee
   (http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)

.. [#php] PRNG based attack against password reset tokens in PHP applications
   (https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)

.. [#search] Search link for "python password generator"
   (https://www.google.com.au/search?q=python+password+generator)

.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
   (https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)

.. [#draft] Initial draft concept that eventually became this PEP
   (https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)

.. [#nocsprng] Safely generating random numbers
   (http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)

.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
   (http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)

.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
   (https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)

.. [#print] Stack Overflow answer for missing parentheses in call to print
   (http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)

.. [#bcrypt] Bypassing bcrypt through an insecure data cache
   (http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: