PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015

Abstract
========

Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.

Unfortunately, this approach has resulted in a situation where developers who
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.

This isn't an acute problem, but it is a chronic one, and the often long
delays between the introduction of security flaws and their exploitation mean
that it is difficult for developers to naturally learn from experience.

In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt in to using the
deterministic random number generator process wide either by using a new
``random.ensure_repeatable()`` API, or by explicitly creating their own
``random.Random()`` instance.

To minimise the impact on existing code, module level APIs that require
determinism will implicitly switch to the deterministic PRNG.

PEP Withdrawal
==============

During discussion of this PEP, Steven D'Aprano proposed the simpler alternative
of offering a standardised ``secrets`` module that provides "one obvious way"
to handle security sensitive tasks like generating default passwords and other
tokens.

Steven's proposal has the desired effect of aligning the easy way to generate
such tokens and the right way to generate them, without introducing any
compatibility risks for the existing ``random`` module API, so this PEP has
been withdrawn in favour of further work on refining Steven's proposal as
:pep:`506`.
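
For illustration only: the ``secrets`` module that grew out of :pep:`506`
shipped in Python 3.6. The following minimal sketch shows the kind of token
generation it covers. The variable names and token sizes are arbitrary choices
made for this example, not recommendations from either PEP.

```python
import secrets
import string

# URL-safe text token built from 16 random bytes from the system RNG
reset_token = secrets.token_urlsafe(16)

# Hexadecimal token: 16 random bytes give 32 hex digits
session_id = secrets.token_hex(16)

# A default password drawn from an explicit alphabet
alphabet = string.ascii_letters + string.digits
default_password = "".join(secrets.choice(alphabet) for _ in range(12))
```

None of these calls touch the state of the module level ``random`` functions,
which is why this approach carries no compatibility risk for existing code.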


Proposal
========

Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.ensure_repeatable()`` is ever called (directly or
indirectly) in that process.

To achieve this, rather than being bound methods of a ``random.Random``
instance as they are today, the module level callables in ``random`` would
change to be functions that delegate to the corresponding method of the
existing ``random._inst`` module attribute.

By default, this attribute will be bound to a ``random.SystemRandom`` instance.

A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
attribute to a ``random.Random`` instance, restoring the same module level
API behaviour as existed in previous Python versions (aside from the
additional level of indirection)::

    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs

        This switches the default RNG instance from the cryptographically
        secure random.SystemRandom() to the deterministic random.Random(),
        enabling the seed(), getstate() and setstate() operations. This means
        a particular random scenario can be replayed later by providing the
        same seed value or restoring a previously saved state.

        NOTE: Libraries implementing security sensitive operations should
        always explicitly use random.SystemRandom() or os.urandom in order to
        correctly handle applications that call this function.
        """
        global _inst  # rebind the module level attribute, not a local
        # SystemRandom subclasses Random, so check for it explicitly
        if isinstance(_inst, SystemRandom):
            _inst = Random()

To minimise the impact on existing code, calling any of the following module
level functions will implicitly call ``random.ensure_repeatable()``:

* ``random.seed``
* ``random.getstate``
* ``random.setstate``

There are no changes proposed to the ``random.Random`` or
``random.SystemRandom`` class APIs - applications that explicitly instantiate
their own random number generators will be entirely unaffected by this
proposal.

Warning on implicit opt-in
--------------------------

In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
emit a deprecation warning using the following check::

    # SystemRandom subclasses Random, so check for it explicitly
    if isinstance(_inst, SystemRandom):
        # warnings.warn() takes the message first, then the category
        warnings.warn("Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details",
                      DeprecationWarning)
        ensure_repeatable()

The specific wording of the warning should have a suitable answer added to
Stack Overflow as was done for the custom error message that was added for
missing parentheses in a call to print [#print]_.

In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.

This PEP does *not* propose ever removing the ability to ensure the default RNG
used process wide is a deterministic PRNG that will produce the same series of
outputs given a specific seed. That capability is widely used in modelling
and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
either directly or indirectly is a sufficient enhancement to address the cases
where the module level random API is used for security sensitive tasks in web
applications without due consideration for the potential security implications
of using a deterministic PRNG.

Performance impact
------------------

Due to the large performance difference between ``random.Random`` and
``random.SystemRandom``, applications ported to Python 3.6 will encounter a
significant performance regression in cases where:

* the application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
  PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly call ``random.ensure_repeatable``

This would be noted in the Porting section of the Python 3.6 What's New guide,
with the recommendation to include the following code in the ``__main__``
module of affected applications::

    import random

    if hasattr(random, "ensure_repeatable"):
        random.ensure_repeatable()

Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, so in those
cases the change proposed in this PEP will fix a previously latent security
defect.
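
One way to gauge the size of the difference on a given machine is to time the
same call on both generator classes. This is only a measurement sketch: the
call count is an arbitrary choice, and the absolute numbers depend on the
platform and Python version.

```python
import timeit
from random import Random, SystemRandom

deterministic = Random()
system = SystemRandom()

# Time the same call on both generators; absolute numbers depend on the
# platform, so only the ratio is of interest.
calls = 20_000
t_deterministic = timeit.timeit(deterministic.random, number=calls)
t_system = timeit.timeit(system.random, number=calls)

print(f"random.Random:       {t_deterministic:.4f}s for {calls} calls")
print(f"random.SystemRandom: {t_system:.4f}s for {calls} calls")
print(f"slowdown factor:     {t_system / t_deterministic:.1f}x")
```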

Documentation changes
---------------------

The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the documentation of the new ``ensure_repeatable`` function and the
associated security warning.

That section of the module documentation would also gain a discussion of the
respective use cases for the deterministic PRNG enabled by
``ensure_repeatable`` (games, modelling & simulation, software testing) and the
system RNG that is used by default (cryptography, security token generation).
This discussion will also recommend the use of third party security libraries
for the latter task.

Rationale
=========

Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in regular notifications of data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. It's also the case that a lot of
the programming advice readily available on the internet [#search]_ simply
doesn't take the mathematical arcana of computer security into account.
Compounding these issues is the fact that defenders have to cover *all* of
their potential vulnerabilities, as a single mistake can make it possible to
subvert other defences [#bcrypt]_.

One of the factors that contributes to making this last aspect particularly
difficult is APIs where using them inappropriately creates a *silent* security
failure - one where the only way to find out that what you're doing is
incorrect is for someone reviewing your code to say "that's a potential
security problem", or for a system you're responsible for to be compromised
through such an oversight (and you're not only still responsible for that
system when it is compromised, but your intrusion detection and auditing
mechanisms are good enough for you to be able to figure out after the event
how the compromise took place).

This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".

As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.

Discussion
==========

Why "ensure_repeatable" over "ensure_deterministic"?
----------------------------------------------------

This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.

From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.

The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that are familiar with the conventional meaning, but aren't familiar with
the additional qualifiers on the technical meaning.

A second problem with "deterministic" as a description for the traditional RNG
is that it doesn't really tell you what you can *do* with the traditional RNG
that you can't do with the system one.

"ensure_repeatable" aims to address both of those problems, as its common
meaning accurately describes the main reason for preferring the deterministic
PRNG over the system RNG: ensuring you can repeat the same series of outputs
by providing the same seed value, or by restoring a previously saved PRNG state.
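
The two forms of repeatability named above can be demonstrated with the
existing ``random.Random`` and ``random.SystemRandom`` classes. This is a small
sketch using explicit instances, with an arbitrary seed value chosen for the
example.

```python
from random import Random, SystemRandom

# Same seed, same series of outputs.
first_run = Random(2015)
second_run = Random(2015)
series_a = [first_run.random() for _ in range(3)]
series_b = [second_run.random() for _ in range(3)]

# Saving and restoring the state replays the outputs that followed the save.
rng = Random()
saved = rng.getstate()
original = [rng.randrange(100) for _ in range(5)]
rng.setstate(saved)
replayed = [rng.randrange(100) for _ in range(5)]

# The system RNG has no state to save, so it cannot be replayed this way.
try:
    SystemRandom().getstate()
except NotImplementedError:
    system_rng_has_state = False
else:
    system_rng_has_state = True
```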

Only changing the default for Python 3.6+
-----------------------------------------

Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.

The difference in this case is one of degree - the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify either the additional effort
or the stability risks involved in making such an intrusive change in a
maintenance release.

Keeping the module level functions
----------------------------------

In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the availability
of the current ``random`` module API. Accordingly, this proposal ensures that
most of the public API can continue to be used not only without modification,
but without generating any new warnings.

Warning when implicitly opting in to the deterministic RNG
----------------------------------------------------------

It's necessary to implicitly opt in to the deterministic PRNG as Python is
widely used for modelling and simulation purposes where this is the right
thing to do, and in many cases, these software models won't have a dedicated
maintenance team tasked with ensuring they keep working on the latest versions
of Python.

Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
is also a mistake that appears in a number of the flawed "how to generate a
security token in Python" guides readily available online.

Using first DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG aims to
nudge future users that need a cryptographically secure RNG away from
calling ``random.seed()`` and those that genuinely need a deterministic
generator towards explicitly calling ``random.ensure_repeatable()``.
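
The seeding mistake mentioned above can be contrasted with direct use of the
system RNG. In this sketch the function names, alphabet and token length are
invented for illustration, and the flawed variant seeds a private ``Random``
instance rather than calling the module level ``random.seed``, so that running
the example doesn't disturb global state.

```python
import os
import string
from random import Random, SystemRandom

ALPHABET = string.ascii_letters + string.digits


def flawed_token(length=16):
    """Pattern from flawed guides: strong seed, but a deterministic PRNG."""
    rng = Random()
    rng.seed(os.urandom(16))  # output is still Mersenne Twister output
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def system_token(length=16):
    """Draw every character from the system RNG instead."""
    rng = SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

Both functions return strings that look equally random, which is what makes
the first pattern a silent failure of the kind described in the Rationale.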

Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------

The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.

The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.

Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations.

Isn't the deterministic PRNG "secure enough"?
---------------------------------------------

In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.

However, one of the rules of secure software development is that "attacks only
get better, never worse", so it may be that by the time Python 3.6 is released
we will actually see a practical attack on Python's deterministic PRNG publicly
documented.

Security fatigue in the Python ecosystem
----------------------------------------

Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As one of the most widely used
programming languages for network service development (including the OpenStack
Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.

This consideration is one of the primary factors driving the substantial
backwards compatibility improvements in this proposal relative to the initial
draft concept posted to python-ideas [#draft]_.

Acknowledgements
================

* Theo de Raadt, for making the suggestion to Guido van Rossum that we
  seriously consider defaulting to a cryptographically secure random number
  generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
  python-ideas threads that suggested the approach of transparently switching
  to the ``random.Random`` implementation when any of the functions that only
  make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
  PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
  experts that suggested the introduction of a userspace CSPRNG would mean
  additional complexity for insufficient gain relative to just using the
  system RNG directly
* Paul Moore for eloquently making the case for the current level of security
  fatigue in the Python ecosystem

References
==========

.. [#breaches] Visualization of data breaches involving more than 30k records (each)
   (http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)

.. [#uconnect] Remote UConnect hack for Jeep Cherokee
   (http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)

.. [#php] PRNG based attack against password reset tokens in PHP applications
   (https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)

.. [#search] Search link for "python password generator"
   (https://www.google.com.au/search?q=python+password+generator)

.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
   (https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)

.. [#draft] Initial draft concept that eventually became this PEP
   (https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)

.. [#nocsprng] Safely generating random numbers
   (http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)

.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
   (http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)

.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
   (https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)

.. [#print] Stack Overflow answer for missing parentheses in call to print
   (http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)

.. [#bcrypt] Bypassing bcrypt through an insecure data cache
   (http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: