PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015

Abstract
========

Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.

Unfortunately, this approach has resulted in a situation where developers that
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.

This isn't an acute problem, but it is a chronic one, and if documentation and
developer education were going to solve it, they would have done so by now.

In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt in to using the
deterministic random number generator.

To minimise the compatibility break, calling any of the following module level
functions will count as opting in to using the deterministic random number
generator for all future calls to module level functions in the ``random``
module in the same process:

* ``random.seed``
* ``random.getstate``
* ``random.setstate``

Proposal
========

Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.seed``, ``random.getstate``, or ``random.setstate``
are ever called in that process.

This PEP further proposes to make it easier to explicitly opt in to using
either the system random number generator or Python's deterministic PRNG by
converting the ``random`` module to a package that exposes the same top-level
API, and offering two new submodules:

* ``random.system``
* ``random.seedable``

The ``random.system`` submodule would provide the following bound methods of a
module global ``random.SystemRandom`` instance as module attributes:
``betavariate``, ``choice``, ``expovariate``, ``gammavariate``, ``gauss``,
``getrandbits``, ``lognormvariate``, ``normalvariate``, ``paretovariate``,
``randint``, ``random``, ``randrange``, ``sample``, ``shuffle``,
``triangular``, ``uniform``, ``vonmisesvariate``, ``weibullvariate``.

The ``random.seedable`` submodule would provide the same operations, but as
methods of a ``random.Random`` instance. In addition, it would provide the
following methods, which are only meaningful when using a deterministic
random number generator: ``seed``, ``getstate``, ``setstate``.
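
The following rough sketch shows the shape of the two proposed namespaces. It builds module objects whose attributes are bound methods of a ``random.SystemRandom`` instance and of a ``random.Random`` instance. ``make_namespace``, ``random_system`` and ``random_seedable`` are hypothetical stand-ins for the proposed ``random.system`` and ``random.seedable`` submodules. This is an illustration only, not the proposed implementation.

```python
import random
import types

# Operations shared by both proposed submodules (the list above).
SHARED_NAMES = (
    "betavariate", "choice", "expovariate", "gammavariate", "gauss",
    "getrandbits", "lognormvariate", "normalvariate", "paretovariate",
    "randint", "random", "randrange", "sample", "shuffle",
    "triangular", "uniform", "vonmisesvariate", "weibullvariate",
)
# Operations that only make sense for a deterministic generator.
SEEDABLE_ONLY_NAMES = ("seed", "getstate", "setstate")

def make_namespace(name, instance, names):
    """Return a module whose attributes are bound methods of ``instance``."""
    module = types.ModuleType(name)
    for attr in names:
        setattr(module, attr, getattr(instance, attr))
    return module

# Hypothetical stand-ins for the proposed random.system and random.seedable.
random_system = make_namespace(
    "random_system", random.SystemRandom(), SHARED_NAMES)
random_seedable = make_namespace(
    "random_seedable", random.Random(), SHARED_NAMES + SEEDABLE_ONLY_NAMES)
```

In this layout, ``random_seedable.seed(42)`` makes later ``random_seedable`` calls reproducible. ``random_system`` deliberately has no ``seed`` attribute at all.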

Rather than being bound methods of a ``random.Random`` instance as they are
today, the module level callables in ``random`` itself would change to be
functions that, by default, delegate to the ``random.SystemRandom`` instance
in ``random.system``.

Calling any one of ``random.seed``, ``random.getstate``, or ``random.setstate``
would change the delegation to instead refer to the ``random.Random`` instance
in ``random.seedable``.

Warning on implicit opt-in
--------------------------

In Python 3.6, implicitly opting in to the use of the seedable PRNG will emit a
deprecation warning. This warning will suggest explicitly opting in to either
the system RNG or the seedable PRNG. Possible wording:

    "DeprecationWarning: Implicitly switching to the seedable PRNG. Consider
    importing from random.system or random.seedable as appropriate"

Whatever precise wording is chosen, an answer explaining it should be added to
Stack Overflow, as was done for the custom error message that was added for
missing parentheses in a call to print [#print]_.

In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.

This PEP does *not* propose removing the ability to seed the default RNG used
process wide. Seeding the process-wide RNG is not a good idea relative to the
alternative of explicitly importing from the appropriate submodule (hence the
eventually visible-by-default warning), but it's also a concern that can be
more readily addressed on a project-by-project basis.

Documentation changes
---------------------

The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the associated security warning.

The docs would gain a discussion of the respective use cases for the seedable
PRNG (games, modelling & simulation, software testing) and the system RNG
(cryptography, security token generation).

Rationale
=========

Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in ongoing problems with data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. Compounding the issue is the fact
that a lot of the programming advice readily available on the internet
[#search]_ simply doesn't take the mathematical arcana of computer security
into account, and the fact that defenders have to cover *all* of their
potential vulnerabilities, as a single mistake can make it possible to subvert
other defences [#bcrypt]_.

One of the factors that makes this last aspect particularly difficult is APIs
that, when used inappropriately, create a *silent* security failure. In such a
failure, the only way to find out that what you're doing is incorrect is for
someone reviewing your code to say "that's a potential security problem", or
for a system you're responsible for to be compromised through such an
oversight (and for your intrusion detection and auditing mechanisms to be good
enough that you can figure out after the event how the compromise took place).

This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".

As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.

Discussion
==========

Why "seedable" over "deterministic"?
------------------------------------

This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.

From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.

The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that aren't familiar with the technical meaning.

The other problem with "deterministic" as a description for the traditional RNG
is that it doesn't tell you what you can *do* with the traditional RNG that you
can't do with the system one.

"seedable" aims to address both those problems, as it doesn't have a misleading
common meaning, and it's a word form that means "you can seed this", which then
leads naturally into an exploration of what it means to "seed" a random number
generator.
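
The example below shows what you can *do* with today's seedable generator but not with ``random.SystemRandom``: seeding, and saving and restoring state to replay a sequence. All calls are existing ``random`` APIs. On ``SystemRandom``, ``seed()`` is a stub that is accepted but ignored, and ``getstate()`` / ``setstate()`` raise ``NotImplementedError``.

```python
import random

# Seedable PRNG: a fixed seed fixes the sequence...
prng = random.Random(42)
saved = prng.getstate()              # ...the state can be saved...
first = [prng.random() for _ in range(3)]
prng.setstate(saved)                 # ...and restored to replay it exactly.
replayed = [prng.random() for _ in range(3)]
print(first == replayed)

# System RNG: seed() is a no-op stub and there is no state to save.
system = random.SystemRandom()
system.seed(42)                      # accepted, but has no effect
try:
    system.getstate()
except NotImplementedError as exc:
    print("SystemRandom.getstate():", exc)
```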

Only changing the default for Python 3.6+
-----------------------------------------

Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.

The difference in this case is one of degree: the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify the additional effort and
stability risks involved in making such an intrusive change in a maintenance
release.

Keeping the module level functions
----------------------------------

In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the
availability of the current ``random`` module API. Accordingly, this proposal
ensures that most of the public API can continue to be used not only without
modification, but without generating any new warnings.

Implicitly opting in to the deterministic RNG
---------------------------------------------

Python is widely used for modelling and simulation purposes, and in many cases,
these software models won't have a dedicated maintenance team tasked with
ensuring they keep working on the latest versions of Python.

Continuing to honour the implicit switch to the deterministic PRNG, while
advising against it with first a DeprecationWarning and eventually a
RuntimeWarning, preserves compatibility with this existing software while
still nudging future users that need a deterministic generator towards
importing ``random.seedable`` explicitly.
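
For modelling code, the two styles might compare as follows. ``legacy_model`` relies on the process-wide ``random.seed``, which this proposal would treat as an implicit opt-in and so trigger the warning. ``explicit_model`` keeps its own ``random.Random`` instance, the closest equivalent available today to importing from the proposed ``random.seedable``. Both function names are hypothetical.

```python
import random

def legacy_model(seed, steps=5):
    # Seeds the shared module level generator: under this proposal, this
    # call is what would implicitly opt the whole process in to the PRNG.
    random.seed(seed)
    return [random.gauss(0.0, 1.0) for _ in range(steps)]

def explicit_model(seed, steps=5):
    # Uses a private Mersenne Twister instance: no process-wide side effects.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(steps)]

# Today both produce the same sequence for the same seed, because the module
# level functions are bound methods of a hidden random.Random instance.
print(legacy_model(7) == explicit_model(7))
```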

Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------

The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.

The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.

What about the performance impact?
----------------------------------

Rather than introducing a userspace CSPRNG, this PEP instead proposes that we
accept the performance regression in cases where:

* an application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly import from ``random.seedable``
  rather than ``random``

Applications that need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, while other
applications where speed is a more important consideration are better off with
the current PRNG implementation than they would be with a new CSPRNG.
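
The size of that regression varies by platform. The sketch below is one way to measure it locally with ``timeit``, and it gives no reference figures because results depend on the operating system, hardware and Python version. The ``per_call_ns`` helper and the call count are arbitrary choices.

```python
import random
import timeit

def per_call_ns(rng, number=20_000):
    """Average wall-clock time of rng.random(), in nanoseconds."""
    return timeit.timeit(rng.random, number=number) / number * 1e9

mt_ns = per_call_ns(random.Random())            # Mersenne Twister (seedable)
system_ns = per_call_ns(random.SystemRandom())  # backed by os.urandom()
print(f"random.Random().random():       {mt_ns:8.1f} ns/call")
print(f"random.SystemRandom().random(): {system_ns:8.1f} ns/call")
print(f"ratio: {system_ns / mt_ns:.1f}x")
```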

Isn't the deterministic PRNG "secure enough"?
---------------------------------------------

In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.
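
This kind of weakness is not specific to PHP. ``random.Random`` uses the Mersenne Twister (MT19937), whose output tempering is invertible. Anyone who observes 624 consecutive 32-bit outputs can therefore rebuild its full internal state and predict every later value. The sketch below illustrates that well-known generic technique. It is not the attack from the cited PHP study, and it assumes the observer sees full, consecutive ``getrandbits(32)`` results (real attacks often have to work from less). The helper names are hypothetical.

```python
import random

def _undo_right_shift_xor(value: int, shift: int) -> int:
    # Invert y ^= y >> shift by re-applying it until all 32 bits settle.
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ (result >> shift)
    return result

def _undo_left_shift_xor_and(value: int, shift: int, mask: int) -> int:
    # Invert y ^= (y << shift) & mask in the same iterative way.
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ ((result << shift) & mask)
    return result & 0xFFFFFFFF

def untemper(output: int) -> int:
    # Undo the four MT19937 tempering steps, in reverse order.
    y = _undo_right_shift_xor(output, 18)
    y = _undo_left_shift_xor_and(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor_and(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y

def clone_from_outputs(outputs):
    """Rebuild a generator from 624 consecutive getrandbits(32) outputs."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = [untemper(o) for o in outputs]
    clone = random.Random()
    # Version 3 state: (3, 624 state words + index, gauss_next). Index 624
    # makes the next call regenerate the state array, as the victim's will.
    clone.setstate((3, tuple(words + [624]), None))
    return clone

victim = random.Random()  # seeded from the OS; the observer never sees the seed
observed = [victim.getrandbits(32) for _ in range(624)]
attacker = clone_from_outputs(observed)
print(all(attacker.getrandbits(32) == victim.getrandbits(32)
          for _ in range(1000)))
```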

Security fatigue in the Python ecosystem
----------------------------------------

Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As Python is one of the most widely
used programming languages for network service development (including the
OpenStack Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.

This consideration is one of the primary factors driving the backwards
compatibility improvements in this proposal relative to the initial draft
concept posted to python-ideas [#draft]_.

Acknowledgements
================

* Theo de Raadt, for making the suggestion to Guido van Rossum that we
  seriously consider defaulting to a cryptographically secure random number
  generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
  python-ideas threads that suggested the approach of transparently switching
  to the ``random.Random`` implementation when any of the functions that only
  make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
  PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
  experts that suggested the introduction of a userspace CSPRNG would mean
  additional complexity for insufficient gain relative to just using the
  system RNG directly

References
==========

.. [#breaches] Visualization of data breaches involving more than 30k records (each)
   (http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

.. [#uconnect] Remote UConnect hack for Jeep Cherokee
   (http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)

.. [#php] PRNG based attack against password reset tokens in PHP applications
   (https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)

.. [#search] Search link for "python password generator"
   (https://www.google.com.au/search?q=python+password+generator)

.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
   (https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)

.. [#draft] Initial draft concept that eventually became this PEP
   (https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)

.. [#nocsprng] Safely generating random numbers
   (http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)

.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
   (http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)

.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
   (https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)

.. [#print] Stack Overflow answer for missing parentheses in call to print
   (http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)

.. [#bcrypt] Bypassing bcrypt through an insecure data cache
   (http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: