PEP 504: Using the System RNG by default
Commit 4634a8bad8 (parent a1b52cbcd9)
@ -0,0 +1,337 @
PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015

Abstract
========

Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.

Unfortunately, this approach has resulted in a situation where developers that
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.

This isn't an acute problem, but it is a chronic one, and if documentation and
developer education were going to solve it, they would have done so by now.

In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt in to using the
deterministic random number generator.

To minimise the compatibility break, calling any of the following module level
functions will count as opting in to using the deterministic random number
generator for all future calls to module level functions in the ``random``
module in the same process:

* ``random.seed``
* ``random.getstate``
* ``random.setstate``

Proposal
========

Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.seed``, ``random.getstate``, or ``random.setstate``
are ever called in that process.

This PEP further proposes to make it easier to explicitly opt in to using
either the system random number generator or Python's deterministic PRNG by
converting the random module to a package that exposes the same top-level API,
and offering two new submodules:

* ``random.system``
* ``random.seedable``

The ``random.system`` submodule would provide the following bound methods of a
module global ``random.SystemRandom`` instance as module attributes:
``betavariate``, ``choice``, ``expovariate``, ``gammavariate``, ``gauss``,
``getrandbits``, ``lognormvariate``, ``normalvariate``, ``paretovariate``,
``randint``, ``random``, ``randrange``, ``sample``, ``shuffle``,
``triangular``, ``uniform``, ``vonmisesvariate``, ``weibullvariate``

The ``random.seedable`` submodule would provide the same operations, but as
methods of a ``random.Random`` instance. In addition, it would provide the
following additional methods which are only meaningful when using a
deterministic random number generator: ``seed``, ``getstate``, ``setstate``.

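As a rough illustration of the two namespaces described above, the following
sketch builds stand-ins for ``random.system`` and ``random.seedable`` from the
bound methods of one shared instance each. ``types.SimpleNamespace`` is used
here only as a substitute for real submodules, and the helper name ``_export``
is hypothetical; the PEP itself does not prescribe an implementation.

```python
import random
import types

# Operations the proposal lists as common to both namespaces.
_COMMON = (
    "betavariate", "choice", "expovariate", "gammavariate", "gauss",
    "getrandbits", "lognormvariate", "normalvariate", "paretovariate",
    "randint", "random", "randrange", "sample", "shuffle",
    "triangular", "uniform", "vonmisesvariate", "weibullvariate",
)
# Operations that only make sense for a deterministic generator.
_SEEDABLE_ONLY = ("seed", "getstate", "setstate")


def _export(instance, names):
    """Hypothetical helper: expose bound methods of *instance* by name."""
    return types.SimpleNamespace(
        **{name: getattr(instance, name) for name in names}
    )


# Stand-ins for the proposed submodules (assumption: not real modules).
system = _export(random.SystemRandom(), _COMMON)
seedable = _export(random.Random(), _COMMON + _SEEDABLE_ONLY)
```

With these stand-ins, ``system.randint(1, 6)`` draws from the operating
system's generator, while ``seedable.seed(42)`` followed by
``seedable.random()`` gives a repeatable value. The ``system`` namespace
deliberately has no ``seed`` attribute.
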
Rather than being bound methods of a ``random.Random`` instance as they are
today, the module level callables in ``random`` itself would change to be
functions that, by default, delegated to the ``random.SystemRandom`` instance
in ``random.system``.

Calling any one of ``random.seed``, ``random.getstate``, or ``random.setstate``
would change the delegation to instead refer to the ``random.Random`` instance
in ``random.seedable``.

Warning on implicit opt-in
--------------------------

In Python 3.6, implicitly opting in to the use of the seedable PRNG will emit a
deprecation warning. This warning will suggest explicitly opting in to either
the system RNG or the seedable PRNG. Possible wording:

    "DeprecationWarning: Implicitly switching to the seedable PRNG. Consider
    importing from random.system or random.seedable as appropriate"

Whatever precise wording is chosen should have an answer added to Stack
Overflow as was done for the custom error message that was added for missing
parentheses in a call to print [#print]_.

In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.

This PEP does *not* propose removing the ability to seed the default RNG used
process wide - it's not a good idea relative to the alternative of explicitly
importing from the appropriate submodule (hence the eventually
visible-by-default warning), but it's also a concern that can be more
readily addressed on a project-by-project basis.

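The warning behaviour in this subsection can be sketched with the standard
``warnings`` module. The function below is a hypothetical stand-in for the
module level ``seed``; the ``_category`` parameter is an assumption added only
so the later upgrade to ``RuntimeWarning`` can be shown with the same code.

```python
import warnings

OPT_IN_MESSAGE = (
    "Implicitly switching to the seedable PRNG. Consider importing from "
    "random.system or random.seedable as appropriate"
)


def seed(a=None, *, _category=DeprecationWarning):
    """Hypothetical module level seed(): warn about the implicit opt-in."""
    # stacklevel=2 attributes the warning to the code that called seed().
    warnings.warn(OPT_IN_MESSAGE, _category, stacklevel=2)
    # ... switching the delegate and seeding the PRNG would happen here ...


# Record the warnings so that they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    seed(42)                              # initial 3.6 behaviour
    seed(42, _category=RuntimeWarning)    # later, visible-by-default form
```

Here ``caught`` holds one ``DeprecationWarning`` and one ``RuntimeWarning``
carrying the same message, which mirrors the two stages of the proposal.
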
Documentation changes
---------------------

The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the associated security warning.

The docs would gain a discussion of the respective use cases for the seedable
PRNG (games, modelling & simulation, software testing) and the system RNG
(cryptography, security token generation).

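A short example of the kind that such a use case discussion might include is
sketched below, using only the existing ``random.SystemRandom`` and
``random.Random`` classes. The function names and the choice of alphabet are
illustrative assumptions, not part of the proposal.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def make_token(length=32):
    """Security token: drawn from the system RNG, so not reproducible."""
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def simulate_dice(rolls, seed):
    """Simulation: a private seeded generator gives repeatable results."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(rolls)]
```

Calling ``simulate_dice(10, seed=1)`` twice returns the same list, which is
what a test suite or simulation wants, while ``make_token()`` should never be
repeatable.
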
Rationale
=========

Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in ongoing problems with data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. Compounding the issue is the fact
that a lot of the programming advice readily available on the internet [#search]_
simply doesn't take the mathematical arcana of computer security into account,
and the fact that defenders have to cover *all* of their potential
vulnerabilities, as a single mistake can make it possible to subvert other
defences [#bcrypt]_.

One of the factors that contributes to making this last aspect particularly
difficult is APIs where using them inappropriately creates a *silent* security
failure - one where the only way to find out that what you're doing is
incorrect is for someone reviewing your code to say "that's a potential
security problem", or for a system you're responsible for to be compromised
through such an oversight (and your intrusion detection and auditing mechanisms
are good enough for you to be able to figure out after the event how the
compromise took place).

This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".

As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.

Discussion
==========

Why "seedable" over "deterministic"?
------------------------------------

This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.

From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.

The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that aren't familiar with the technical meaning.

The other problem with "deterministic" as a description for the traditional RNG
is that it doesn't tell you what you can *do* with the traditional RNG that you
can't do with the system one.

"seedable" aims to address both those problems, as it doesn't have a misleading
common meaning, and it's a word form that means "you can seed this", which then
leads naturally into an exploration of what it means to "seed" a random number
generator.

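What "seedable" and "deterministic" mean in practice can be shown with the
existing ``random.Random`` and ``random.SystemRandom`` classes. The sketch
below uses an arbitrary seed value and makes no claim about the specific
numbers produced.

```python
import random

rng = random.Random(2015)
first = [rng.random() for _ in range(3)]

# Re-seeding with the same value replays the same sequence.
rng.seed(2015)
replay = [rng.random() for _ in range(3)]

# Knowing the current state is enough to compute future outputs in advance.
state = rng.getstate()
predictor = random.Random()
predictor.setstate(state)
future = [predictor.random() for _ in range(3)]
actual = [rng.random() for _ in range(3)]

# The system RNG ignores seed() and has no state to save or restore.
sys_rng = random.SystemRandom()
sys_rng.seed(2015)  # accepted, but has no effect
try:
    sys_rng.getstate()
except NotImplementedError:
    system_has_state = False
else:
    system_has_state = True
```

In this sketch ``first == replay`` and ``future == actual`` both hold, while
``system_has_state`` ends up ``False``: seeding and state capture are exactly
the things you can do with the traditional generator and not with the system
one.
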
Only changing the default for Python 3.6+
-----------------------------------------

Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.

The difference in this case is one of degree - the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify the additional effort and
stability risks involved in making such an intrusive change in a maintenance
release.

Keeping the module level functions
----------------------------------

In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the availability
of the current ``random`` module API. Accordingly, this proposal ensures that
most of the public API can continue to be used not only without modification,
but without generating any new warnings.

Implicitly opting in to the deterministic RNG
---------------------------------------------

Python is widely used for modelling and simulation purposes, and in many cases,
these software models won't have a dedicated maintenance team tasked with
ensuring they keep working on the latest versions of Python.

Using first a DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG preserves
compatibility with this existing software, while still nudging future users
that need a deterministic generator towards importing ``random.seedable``
explicitly.

Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------

The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.

The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.

What about the performance impact?
----------------------------------

Rather than introducing a userspace CSPRNG, this PEP instead proposes that we
accept the performance regression in cases where:

* an application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly import from ``random.seedable``
  rather than ``random``

Applications that need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, while other
applications where speed is a more important consideration are better off with
the current PRNG implementation than they would be with a new CSPRNG.

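Anyone wanting to gauge the size of that regression on their own system could
start from a micro-benchmark along the following lines. This is only a sketch:
the call count is arbitrary, and results will vary with the platform and
Python version, so no figures are quoted here.

```python
import random
import timeit

seedable = random.Random()
system = random.SystemRandom()

N = 10000  # arbitrary number of calls per measurement

# Time the same operation on each generator.
t_seedable = timeit.timeit(seedable.random, number=N)
t_system = timeit.timeit(system.random, number=N)

print(f"Random.random:       {t_seedable / N * 1e9:10.0f} ns per call")
print(f"SystemRandom.random: {t_system / N * 1e9:10.0f} ns per call")
```

A benchmark like this only measures the raw call overhead; whether the
difference matters depends on whether random number generation is actually on
the application's critical performance path.
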
Isn't the deterministic PRNG "secure enough"?
---------------------------------------------

In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.

Security fatigue in the Python ecosystem
----------------------------------------

Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As one of the most widely used
programming languages for network service development (including the OpenStack
Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.

This consideration is one of the primary factors driving the backwards
compatibility improvements in this proposal relative to the initial draft
concept posted to python-ideas [#draft]_.

Acknowledgements
================

* Theo de Raadt, for making the suggestion to Guido van Rossum that we
  seriously consider defaulting to a cryptographically secure random number
  generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
  python-ideas threads that suggested the approach of transparently switching
  to the ``random.Random`` implementation when any of the functions that only
  make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
  PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
  experts that suggested the introduction of a userspace CSPRNG would mean
  additional complexity for insufficient gain relative to just using the
  system RNG directly

References
==========

.. [#breaches] Visualization of data breaches involving more than 30k records (each)
   (http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

.. [#uconnect] Remote UConnect hack for Jeep Cherokee
   (http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)

.. [#php] PRNG based attack against password reset tokens in PHP applications
   (https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)

.. [#search] Search link for "python password generator"
   (https://www.google.com.au/search?q=python+password+generator)

.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
   (https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)

.. [#draft] Initial draft concept that eventually became this PEP
   (https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)

.. [#nocsprng] Safely generating random numbers
   (http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)

.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
   (http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)

.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
   (https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)

.. [#print] Stack Overflow answer for missing parentheses in call to print
   (http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)

.. [#bcrypt] Bypassing bcrypt through an insecure data cache
   (http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: