PEP: 522
Title: Allow BlockingIOError in security sensitive APIs
Version: $Revision$
Last-Modified: $Date$
Author: Alyssa Coghlan <ncoghlan@gmail.com>, Nathaniel J. Smith <njs@pobox.com>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Requires: 506
Created: 16-Jun-2016
Python-Version: 3.6
Resolution: https://mail.python.org/pipermail/security-sig/2016-August/000101.html


Abstract
========

A number of APIs in the standard library that return random values nominally
suitable for use in security sensitive operations currently have an obscure
operating system dependent failure mode that allows them to return values that
are not, in fact, suitable for such operations.

This is due to some operating system kernels (most notably the Linux kernel)
permitting reads from ``/dev/urandom`` before the system random number
generator is fully initialized, whereas most other operating systems will
implicitly block on such reads until the random number generator is ready.

For the lower level ``os.urandom`` and ``random.SystemRandom`` APIs, this PEP
proposes changing such failures in Python 3.6 from the current silent,
hard to detect, and hard to debug, errors to easily detected and debugged errors
by raising ``BlockingIOError`` with a suitable error message, allowing
developers the opportunity to unambiguously specify their preferred approach
for handling the situation.

For the new high level ``secrets`` API, it proposes to block implicitly if
needed whenever a random number is generated by that module, as well as to
expose a new ``secrets.wait_for_system_rng()`` function to allow code otherwise
using the low level APIs to explicitly wait for the system random number
generator to be available.

This change will impact any operating system that offers the ``getrandom()``
system call, regardless of whether the default behaviour of the
``/dev/urandom`` device is to return potentially predictable results when the
system random number generator is not ready (e.g. Linux, NetBSD) or to block
(e.g. FreeBSD, Solaris, Illumos). Operating systems that prevent execution of
userspace code prior to the initialization of the system random number
generator, or do not offer the ``getrandom()`` syscall, will be entirely
unaffected by the proposed change (e.g. Windows, Mac OS X, OpenBSD).

The new exception or the blocking behaviour in the ``secrets`` module would
potentially be encountered in the following situations:

* Python code calling these APIs during Linux system initialization
* Python code running on improperly initialized Linux systems (e.g. embedded
  hardware without adequate sources of entropy to seed the system random number
  generator, or Linux VMs that aren't configured to accept entropy from the
  VM host)


Relationship with other PEPs
============================

This PEP depends on the Accepted :pep:`506`, which adds the ``secrets`` module.

This PEP competes with Victor Stinner's :pep:`524`, which proposes to make
``os.urandom`` itself implicitly block when the system RNG is not ready.


PEP Rejection
=============

For the reference implementation, Guido rejected this PEP in favour of the
unconditional implicit blocking proposal in :pep:`524` (which brings CPython's
behaviour on Linux into line with its behaviour on other operating systems).

This means any further discussion of appropriate default behaviour for
``os.urandom()`` in system Python installations in Linux distributions should
take place on the respective distro mailing lists, rather than on the upstream
CPython mailing lists.


Changes independent of this PEP
===============================

CPython interpreter initialization and ``random`` module initialization have
already been updated to gracefully fall back to alternative seeding options if
the system random number generator is not ready.

This PEP does not compete with the proposal in :pep:`524` to add an
``os.getrandom()`` API to expose the ``getrandom`` syscall on platforms that
offer it. There is sufficient motive for adding that API in the ``os`` module's
role as a thin wrapper around potentially platform dependent operating system
features that it can be added regardless of what happens to the default
behaviour of ``os.urandom()`` on these systems.


Proposal
========

Changing ``os.urandom()`` on platforms with the getrandom() system call
-----------------------------------------------------------------------

This PEP proposes that in Python 3.6+, ``os.urandom()`` be updated to call
the ``getrandom()`` syscall in non-blocking mode if available and raise
``BlockingIOError: system random number generator is not ready; see secrets.token_bytes()``
if the kernel reports that the call would block.

This behaviour will then propagate through to the existing
``random.SystemRandom``, which provides a relatively thin wrapper around
``os.urandom()`` that matches the ``random.Random()`` API.

However, the new ``secrets`` module introduced by :pep:`506` will be updated to
catch the new exception and implicitly wait for the system random number
generator if the exception is ever encountered.

In all cases, as soon as a call to one of these security sensitive APIs
succeeds, all future calls to these APIs in that process will succeed
without blocking (once the operating system random number generator is ready
after system boot, it remains ready).

On Linux and NetBSD, this will replace the previous behaviour of returning
potentially predictable results read from ``/dev/urandom``.

On FreeBSD, Solaris, and Illumos, this will replace the previous behaviour of
implicitly blocking until the system random number generator is ready. However,
it is not clear if these operating systems actually allow userspace code (and
hence Python) to run before the system random number generator is ready.

Note that in all cases, if calling the underlying ``getrandom()`` API reports
``ENOSYS`` rather than returning a successful response or reporting ``EAGAIN``,
CPython will continue to fall back to reading from ``/dev/urandom`` directly.
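
As a rough illustration of the control flow described in this section, the
proposed behaviour could be sketched in pure Python as follows. This is not the
reference implementation: ``proposed_urandom`` and its ``getrandom`` parameter
are hypothetical names used only for this sketch, and handling of short reads
is omitted::

    import errno
    import os

    _NOT_READY = ("system random number generator is not ready; "
                  "see secrets.token_bytes()")

    def proposed_urandom(nbytes, getrandom=getattr(os, "getrandom", None)):
        """Hypothetical sketch of the os.urandom() behaviour proposed here."""
        if getrandom is not None:
            # 0x0001 is the Linux value of GRND_NONBLOCK, used as a fallback
            flags = getattr(os, "GRND_NONBLOCK", 0x0001)
            try:
                return getrandom(nbytes, flags)
            except BlockingIOError:
                # EAGAIN: the kernel says the call would block
                raise BlockingIOError(errno.EAGAIN, _NOT_READY) from None
            except OSError as exc:
                if exc.errno != errno.ENOSYS:
                    raise
                # ENOSYS: the running kernel lacks the syscall; fall through
        # Legacy path: read the device directly
        with open("/dev/urandom", "rb") as device:
            return device.read(nbytes)

The ``getrandom`` parameter exists only so that the not-ready and ``ENOSYS``
cases can be simulated without a freshly booted machine.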


Adding ``secrets.wait_for_system_rng()``
----------------------------------------

A new exception shouldn't be added without a straightforward recommendation
for how to resolve that error when encountered (however rare encountering
the new error is expected to be in practice). For security sensitive code that
actually does need to use the lower level interfaces to the system random
number generator (rather than the new ``secrets`` module), and does receive
live bug reports indicating this is a real problem for the userbase of that
particular application rather than a theoretical one, this PEP's recommendation
will be to add the following snippet (directly or indirectly) to the
``__main__`` module::

    import secrets
    secrets.wait_for_system_rng()

Or, if compatibility with versions prior to Python 3.6 is needed::

    try:
        import secrets
    except ImportError:
        pass
    else:
        secrets.wait_for_system_rng()

Within the ``secrets`` module itself, this will then be used in
``token_bytes()`` to block implicitly if the new exception is encountered::

    def token_bytes(nbytes=None):
        if nbytes is None:
            nbytes = DEFAULT_ENTROPY
        try:
            result = os.urandom(nbytes)
        except BlockingIOError:
            wait_for_system_rng()
            result = os.urandom(nbytes)
        return result

Other parts of the module will then be updated to use ``token_bytes()`` as
their basic random number generation building block, rather than calling
``os.urandom()`` directly.
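
For illustration, the other helpers could be layered on ``token_bytes()``
roughly as follows. This sketch is modelled on the :pep:`506` recipes rather
than copied from the ``secrets`` module, and it imports the existing
``secrets.token_bytes`` as a stand-in for the blocking-aware version shown
above::

    import base64
    import binascii
    from secrets import token_bytes

    def token_hex(nbytes=None):
        """Return a random text string of hexadecimal digits."""
        return binascii.hexlify(token_bytes(nbytes)).decode("ascii")

    def token_urlsafe(nbytes=None):
        """Return a random URL-safe text string without padding."""
        raw = base64.urlsafe_b64encode(token_bytes(nbytes))
        return raw.rstrip(b"=").decode("ascii")

Because both helpers obtain their bytes from ``token_bytes()``, the implicit
wait only has to be implemented in one place.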

Application frameworks covering use cases where access to the system random
number generator is almost certain to be needed (e.g. web frameworks) may
choose to incorporate a call to ``secrets.wait_for_system_rng()`` implicitly
into the commands that start the application such that existing calls to
``os.urandom()`` will be guaranteed to never raise the new exception when using
those frameworks.
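
A framework's start command might wrap that call along these lines. This is
only a sketch: ``run_application`` and its ``app_main`` argument are
hypothetical, and because ``secrets.wait_for_system_rng()`` exists only under
this proposal, the lookup is guarded with ``getattr`` so the code also runs
where the function is absent::

    import secrets

    def run_application(app_main, *args, **kwargs):
        """Hypothetical framework entry point that waits for the RNG once."""
        # wait_for_system_rng() is the API proposed in this PEP; it may not
        # exist in the running interpreter.
        wait = getattr(secrets, "wait_for_system_rng", None)
        if wait is not None:
            wait()
        # Under the proposed behaviour, os.urandom() calls made by app_main
        # would not raise BlockingIOError from this point on.
        return app_main(*args, **kwargs)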

For cases where the error is encountered for an application which cannot be
modified directly, the following command can be used to wait for the
system random number generator to initialize before starting that application::

    python3 -c "import secrets; secrets.wait_for_system_rng()"

For example, this snippet could be added to a shell script or a systemd
``ExecStartPre`` hook (and may prove useful in reliably waiting for the
system random number generator to be ready, even if the subsequent command
is not itself an application running under Python 3.6).

Given the changes proposed to ``os.urandom()`` above, and the inclusion of
an ``os.getrandom()`` API on systems that support it, the suggested
implementation of this function would be::

    import os
    import time

    if hasattr(os, "getrandom"):
        # os.getrandom() always blocks waiting for the system RNG by default
        def wait_for_system_rng():
            """Block waiting for system random number generator to be ready"""
            os.getrandom(1)
            return
    else:
        # As far as we know, other platforms will never get BlockingIOError
        # below but the implementation makes pessimistic assumptions
        def wait_for_system_rng():
            """Block waiting for system random number generator to be ready"""
            # If the system RNG is already seeded, don't wait at all
            try:
                os.urandom(1)
                return
            except BlockingIOError:
                pass
            # Avoid the below busy loop if possible
            try:
                block_on_system_rng = open("/dev/random", "rb")
            except FileNotFoundError:
                pass
            else:
                with block_on_system_rng:
                    block_on_system_rng.read(1)
            # Busy loop until the system RNG is ready
            while True:
                try:
                    os.urandom(1)
                    break
                except BlockingIOError:
                    # Only check once per millisecond
                    time.sleep(0.001)

On systems where it is possible to wait for the system RNG to be ready, this
function will do so without a busy loop if ``os.getrandom()`` is defined,
``os.urandom()`` itself implicitly blocks, or the ``/dev/random`` device is
available. If the system random number generator is ready, this call is
guaranteed to never block, even if the system's ``/dev/random`` device uses
a design that permits it to block intermittently during normal system operation.


Limitations on scope
--------------------

No changes are proposed for Windows or Mac OS X systems, as neither of those
platforms provides any mechanism to run Python code before the operating
system random number generator has been initialized. Mac OS X goes so far as
to kernel panic and abort the boot process if it can't properly initialize the
random number generator (although Apple's restrictions on the supported
hardware platforms make that exceedingly unlikely in practice).

Similarly, no changes are proposed for other \*nix systems that do not offer
the ``getrandom()`` syscall. On these systems, ``os.urandom()`` will continue
to block waiting for the system random number generator to be initialized.

While other \*nix systems that offer a non-blocking API (other than
``getrandom()``) for requesting random numbers suitable for use in security
sensitive applications could potentially receive a similar update to the one
proposed for ``getrandom()`` in this PEP, such changes are out of scope for
this particular proposal.

Python's behaviour on older versions of affected platforms that do not offer
the new ``getrandom()`` syscall will also remain unchanged.


Rationale
=========

Ensuring the ``secrets`` module implicitly blocks when needed
-------------------------------------------------------------

This is done so that, for folks who want the simplest possible answer to the
question of how to generate security sensitive random numbers correctly, the
meme that takes hold is "Use the secrets module when available or your
application might crash unexpectedly", rather than the more boilerplate heavy
"Always call secrets.wait_for_system_rng() when available or your application
might crash unexpectedly".

It's also done due to the BDFL having a higher tolerance for APIs that might
block unexpectedly than he does for APIs that might throw an unexpected
exception [11]_.


Raising ``BlockingIOError`` in ``os.urandom()`` on Linux
--------------------------------------------------------

For several years now, the security community's guidance has been to use
``os.urandom()`` (or the ``random.SystemRandom()`` wrapper) when implementing
security sensitive operations in Python.

To help improve API discoverability and make it clearer that secrecy and
simulation are not the same problem (even though they both involve
random numbers), :pep:`506` collected several of the one line recipes based
on the lower level ``os.urandom()`` API into a new ``secrets`` module.

However, this guidance has also come with a longstanding caveat: developers
writing security sensitive software at least for Linux, and potentially for
some other \*BSD systems, may need to wait until the operating system's
random number generator is ready before relying on it for security sensitive
operations. This generally only occurs if ``os.urandom()`` is read very
early in the system initialization process, or on systems with few sources of
available entropy (e.g. some kinds of virtualized or embedded systems), but
unfortunately the exact conditions that trigger this are difficult to predict,
and when it occurs there is no direct way for userspace to tell it has
happened without querying operating system specific interfaces.

On \*BSD systems (if the particular \*BSD variant allows the problem to occur
at all) and potentially also Solaris and Illumos, encountering this situation
means ``os.urandom()`` will either block waiting for the system random number
generator to be ready (the associated symptom would be for the affected script
to pause unexpectedly on the first call to ``os.urandom()``) or else will
behave the same way as it does on Linux.

On Linux, in Python versions up to and including Python 3.4, and in
Python 3.5 maintenance versions following Python 3.5.2, there's no clear
indicator to developers that their software may not be working as expected
when run early in the Linux boot process, or on hardware without good
sources of entropy to seed the operating system's random number generator: due
to the behaviour of the underlying ``/dev/urandom`` device, ``os.urandom()``
on Linux returns a result either way, and it takes extensive statistical
analysis to show that a security vulnerability exists.

By contrast, if ``BlockingIOError`` is raised in those situations, then
developers using Python 3.6+ can easily choose their desired behaviour:

1. Wait for the system RNG at or before application startup (security sensitive)
2. Switch to using the random module (non-security sensitive)
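
Under the proposed behaviour, making that choice explicit might look like the
sketch below. The function names are hypothetical, ``wait_for_rng`` stands in
for the proposed ``secrets.wait_for_system_rng()``, and the ``urandom``
parameter exists only so the not-ready case can be simulated::

    import os
    import random

    def new_session_key(wait_for_rng, nbytes=32, urandom=os.urandom):
        """Option 1 (security sensitive): wait, then retry."""
        try:
            return urandom(nbytes)
        except BlockingIOError:
            # Never substitute weaker data; wait for the system RNG instead
            wait_for_rng()
            return urandom(nbytes)

    def simulation_seed(bits=64):
        """Option 2 (not security sensitive): use the random module."""
        return random.getrandbits(bits)

In practice the wait would more likely happen once at application startup
than at each call site.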


Making ``secrets.wait_for_system_rng()`` public
-----------------------------------------------

Earlier versions of this PEP proposed a number of recipes for wrapping
``os.urandom()`` to make it suitable for use in security sensitive use cases.

Discussion of the proposal on the security-sig mailing list prompted the
realization [9]_ that the core assumption driving the API design in this PEP
was that choosing between letting the exception cause the application to fail,
blocking waiting for the system RNG to be ready and switching to using the
``random`` module instead of ``os.urandom`` is an application and use-case
specific decision that should take into account application and use-case
specific details.

There is no way for the interpreter runtime or support libraries to determine
whether a particular use case is security sensitive or not, and while it's
straightforward for an application developer to decide how to handle an
exception thrown by a particular API, they can't readily work around an API
blocking when
they expected it to be non-blocking.

Accordingly, the PEP was updated to add ``secrets.wait_for_system_rng()`` as
an API for applications, scripts and frameworks to use to indicate that they
wanted to ensure the system RNG was available before continuing, while library
developers could continue to call ``os.urandom()`` without worrying that it
might unexpectedly start blocking waiting for the system RNG to be available.


Backwards Compatibility Impact Assessment
=========================================

Similar to :pep:`476`, this is a proposal to turn a previously silent security
failure into a noisy exception that requires the application developer to
make an explicit decision regarding the behaviour they desire.

As no changes are proposed for operating systems that don't provide the
``getrandom()`` syscall, ``os.urandom()`` retains its existing behaviour as
a nominally blocking API that is non-blocking in practice due to the difficulty
of scheduling Python code to run before the operating system random number
generator is ready. We believe it may be possible to encounter problems akin to
those described in this PEP on at least some \*BSD variants, but nobody has
explicitly demonstrated that. On Mac OS X and Windows, it appears to be
straight up impossible to even try to run a Python interpreter that early in
the boot process.

On Linux and other platforms with similar ``/dev/urandom`` behaviour,
``os.urandom()`` retains its status as a guaranteed non-blocking API.
However, the means of achieving that status changes in the specific case of
the operating system random number generator not being ready for use in security
sensitive operations: historically it would return potentially predictable
random data, with this PEP it would change to raise ``BlockingIOError``.

Developers of affected applications would then be required to make one of the
following changes to gain forward compatibility with Python 3.6, based on the
kind of application they're developing.


Unaffected Applications
-----------------------

The following kinds of applications would be entirely unaffected by the change,
regardless of whether or not they perform security sensitive operations:

- applications that don't support Linux
- applications that are only run on desktops or conventional servers
- applications that are only run after the system RNG is ready (including
  those where an application framework calls ``secrets.wait_for_system_rng()``
  on their behalf)

Applications in this category simply won't encounter the new exception, so it
will be reasonable for developers to wait and see if they receive
Python 3.6 compatibility bugs related to the new runtime behaviour, rather than
attempting to pre-emptively determine whether or not they're affected.


Affected security sensitive applications
----------------------------------------

Security sensitive applications would need to either change their system
configuration so the application is only started after the operating system
random number generator is ready for security sensitive operations, change the
application startup code to invoke ``secrets.wait_for_system_rng()``, or
else switch to using the new ``secrets.token_bytes()`` API.

As an example for components started via a systemd unit file, the following
snippet would delay activation until the system RNG was ready::

    ExecStartPre=python3 -c "import secrets; secrets.wait_for_system_rng()"

Alternatively, the following snippet will use ``secrets.token_bytes()`` if
available, and fall back to ``os.urandom()`` otherwise::

    try:
        from secrets import token_bytes as _get_random_bytes
    except ImportError:
        from os import urandom as _get_random_bytes


Affected non-security sensitive applications
--------------------------------------------

Non-security sensitive applications should be updated to use the ``random``
module rather than ``os.urandom``::

    import random

    def pseudorandom_bytes(num_bytes):
        return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")

Depending on the details of the application, the random module may offer
other APIs that can be used directly, rather than needing to emulate the
raw byte sequence produced by the ``os.urandom()`` API.
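
For instance, code that only needs a throwaway identifier or some retry jitter
can call the ``random`` module directly. The helper names below are
illustrative only, and nothing here is suitable for generating secrets::

    import random
    import string

    _ALPHABET = string.ascii_lowercase + string.digits

    def example_identifier(length=8):
        """Return a short pseudorandom label, e.g. for temporary names."""
        return "".join(random.choice(_ALPHABET) for _ in range(length))

    def retry_delay(base=0.5):
        """Return a delay of between base and 2*base seconds."""
        return base + random.uniform(0, base)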


Additional Background
=====================

Why propose this now?
---------------------

The main reason is that the Python 3.5.0 release switched to using the new
Linux ``getrandom()`` syscall when available in order to avoid consuming a
file descriptor [1]_, and this had the side effect of making the following
operations block waiting for the system random number generator to be ready:

* ``os.urandom`` (and APIs that depend on it)
* importing the ``random`` module
* initializing the randomized hash algorithm used by some builtin types

While the first of those behaviours is arguably desirable (and consistent with
the existing behaviour of ``os.urandom`` on other operating systems), the
latter two behaviours are unnecessary and undesirable, and the last one is now
known to cause a system level deadlock when attempting to run Python scripts
during the Linux init process with Python 3.5.0 or 3.5.1 [2]_, while the second
one can cause problems when using virtual machines without robust entropy
sources configured [3]_.

Since decoupling these behaviours in CPython will involve a number of
implementation changes more appropriate for a feature release than a maintenance
release, the relatively simple resolution applied in Python 3.5.2 was to revert
all three of them to a behaviour similar to that of previous Python versions:
if the new Linux syscall indicates it will block, then Python 3.5.2 will
implicitly fall back on reading ``/dev/urandom`` directly [4]_.

However, this bug report *also* resulted in a range of proposals to add *new*
APIs like ``os.getrandom()`` [5]_, ``os.urandom_block()`` [6]_,
``os.pseudorandom()`` and ``os.cryptorandom()`` [7]_, or adding new optional
parameters to ``os.urandom()`` itself [8]_, and then attempting to educate
users on when they should call those APIs instead of just using a plain
``os.urandom()`` call.

These proposals arguably represent overreactions, as the question of reliably
obtaining random numbers suitable for security sensitive work on Linux is a
relatively obscure problem of interest mainly to operating system developers
and embedded systems programmers, that may not justify expanding the
Python standard library's cross-platform APIs with new Linux-specific concerns.
This is especially so with the ``secrets`` module already being added as the
"use this and don't worry about the low level details" option for developers
writing security sensitive software that for some reason can't rely on even
higher level domain specific APIs (like web frameworks) and also don't need to
worry about Python versions prior to Python 3.6.

That said, it's also the case that low cost ARM devices are becoming
increasingly prevalent, with a lot of them running Linux, and a lot of folks
writing Python applications that run on those devices. That creates an
opportunity to take an obscure security problem that currently requires a lot
of knowledge about Linux boot processes and provably unpredictable random
number generation to diagnose and resolve, and instead turn it into a
relatively mundane and easy-to-find-in-an-internet-search runtime exception.


The cross-platform behaviour of ``os.urandom()``
------------------------------------------------

On operating systems other than Linux and NetBSD, ``os.urandom()`` may already
block waiting for the operating system's random number generator to be ready.
This will happen at most once in the lifetime of the process, and the call is
subsequently guaranteed to be non-blocking.

Linux and NetBSD are outliers in that, even when the operating system's random
number generator doesn't consider itself ready for use in security sensitive
operations, reading from the ``/dev/urandom`` device will return random values
based on the entropy it has available.

This behaviour is potentially problematic, so Linux 3.17 added a new
``getrandom()`` syscall that (amongst other benefits) allows callers to
either block waiting for the random number generator to be ready, or
else request an error return if the random number generator is not ready.
Notably, the new API does *not* support the old behaviour of returning
data that is not suitable for security sensitive use cases.
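
On Python versions that expose the syscall as ``os.getrandom()``, the
error-return mode can serve as a readiness probe. A minimal sketch, in which
``system_rng_ready`` is a hypothetical helper and a result of ``None`` means
that no such check is available::

    import os

    def system_rng_ready(getrandom=getattr(os, "getrandom", None)):
        """Return True or False if readiness can be probed, else None."""
        if getrandom is None:
            return None  # no getrandom() wrapper on this platform
        # 0x0001 is the Linux value of GRND_NONBLOCK, used as a fallback
        flags = getattr(os, "GRND_NONBLOCK", 0x0001)
        try:
            getrandom(1, flags)
        except BlockingIOError:
            return False  # the kernel reported that the call would block
        return True

The ``getrandom`` parameter is there purely so that the not-ready case can be
exercised in tests.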

Versions of Python up to and including Python 3.4 access the
Linux ``/dev/urandom`` device directly.

Python 3.5.0 and 3.5.1 (when built on a system that offered the new syscall)
called ``getrandom()`` in blocking mode in order to avoid the use of a file
descriptor to access ``/dev/urandom``. While there were no specific problems
reported due to ``os.urandom()`` blocking in user code, there *were* problems
due to CPython implicitly invoking the blocking behaviour during interpreter
startup and when importing the ``random`` module.

Rather than trying to decouple SipHash initialization from the
``os.urandom()`` implementation, Python 3.5.2 switched to calling
``getrandom()`` in non-blocking mode, and falling back to reading from
``/dev/urandom`` if the syscall indicates it will block.

As a result of the above, ``os.urandom()`` in all Python versions up to and
including Python 3.5 propagates the behaviour of the underlying
``/dev/urandom`` device to Python code.


Problems with the behaviour of ``/dev/urandom`` on Linux
--------------------------------------------------------

The Python ``os`` module has largely co-evolved with Linux APIs, so having
``os`` module functions closely follow the behaviour of their Linux operating
system level counterparts when running on Linux is typically considered to be
a desirable feature.

However, ``/dev/urandom`` represents a case where the current behaviour is
acknowledged to be problematic, but fixing it unilaterally at the kernel level
has been shown to prevent some Linux distributions from booting (at least in
part due to components like Python currently using it for
non-security-sensitive purposes early in the system initialization process).

As an analogy, consider the following two functions::

    def generate_example_password():
        """Generates passwords solely for use in code examples"""
        return generate_unpredictable_password()

    def generate_actual_password():
        """Generates actual passwords for use in real applications"""
        return generate_unpredictable_password()

If you think of an operating system's random number generator as a method for
generating unpredictable, secret passwords, then you can think of Linux's
``/dev/urandom`` as being implemented like::

    # Oversimplified artist's conception of the kernel code
    # implementing /dev/urandom
    def generate_unpredictable_password():
        if system_rng_is_ready:
            return use_system_rng_to_generate_password()
        else:
            # we can't make an unpredictable password; silently return a
            # potentially predictable one instead:
            return "p4ssw0rd"

In this scenario, the author of ``generate_example_password`` is fine - even if
``"p4ssw0rd"`` shows up a bit more often than they expect, it's only used in
examples anyway. However, the author of ``generate_actual_password`` has a
problem - how do they prove that their calls to
``generate_unpredictable_password`` never follow the path that returns a
predictable answer?

In real life it's slightly more complicated than this, because there
might be some level of system entropy available -- so the fallback might
be more like ``return random.choice(["p4ssword", "passw0rd",
"p4ssw0rd"])`` or something even more variable and hence only statistically
predictable with better odds than the author of ``generate_actual_password``
was expecting. This doesn't really make things more provably secure, though;
mostly it just means that if you try to catch the problem in the obvious way --
``if returned_password == "p4ssw0rd": raise UhOh`` -- then it doesn't work,
because ``returned_password`` might instead be ``p4ssword`` or even
``pa55word``, or just an arbitrary 64 bit sequence selected from fewer than
2**64 possibilities. So this rough sketch does give the right general idea of
the consequences of the "more predictable than expected" fallback behaviour,
even though it's thoroughly unfair to the Linux kernel team's efforts to
mitigate the practical consequences of this problem without resorting to
breaking backwards compatibility.

This design is generally agreed to be a bad idea. As far as we can
tell, there are no use cases whatsoever in which this is the behavior
you actually want. It has led to the use of insecure ``ssh`` keys on
real systems, and many \*nix-like systems (including at least Mac OS
X, OpenBSD, and FreeBSD) have modified their ``/dev/urandom``
implementations so that they never return predictable outputs, either
by making reads block in this case, or by simply refusing to run any
userspace programs until the system RNG has been
initialized. Unfortunately, Linux has so far been unable to follow
suit, because it's been empirically determined that enabling the
blocking behavior causes some currently extant distributions to
fail to boot.

Instead, the new ``getrandom()`` syscall was introduced, making
it *possible* for userspace applications to access the system random number
generator safely, without introducing hard to debug deadlock problems into
the system initialization processes of existing Linux distros.


Consequences of ``getrandom()`` availability for Python
-------------------------------------------------------

Prior to the introduction of the ``getrandom()`` syscall, it simply wasn't
feasible to access the Linux system random number generator in a provably
safe way, so we were forced to settle for reading from ``/dev/urandom`` as the
best available option. However, with ``getrandom()`` insisting on raising an
error or blocking rather than returning predictable data, as well as having
other advantages, it is now the recommended method for accessing the kernel
RNG on Linux, with reading ``/dev/urandom`` directly relegated to "legacy"
status. This moves Linux into the same category as other operating systems
like Windows, which doesn't provide a ``/dev/urandom`` device at all: the
best available option for implementing ``os.urandom()`` is no longer simply
reading bytes from the ``/dev/urandom`` device.

This means that what used to be somebody else's problem (the Linux kernel
development team's) is now Python's problem -- given a way to detect that the
system RNG is not initialized, we have to choose how to handle this
situation whenever we try to use the system RNG.

It could simply block, as was somewhat inadvertently implemented in 3.5.0,
and as is proposed in Victor Stinner's competing PEP::

    # artist's impression of the CPython 3.5.0-3.5.1 behavior
    def generate_unpredictable_bytes_or_block(num_bytes):
        while not system_rng_is_ready:
            wait
        return unpredictable_bytes(num_bytes)

Or it could raise an error, as this PEP proposes (in *some* cases)::

    # artist's impression of the behavior proposed in this PEP
    def generate_unpredictable_bytes_or_raise(num_bytes):
        if system_rng_is_ready:
            return unpredictable_bytes(num_bytes)
        else:
            raise BlockingIOError

Or it could explicitly emulate the ``/dev/urandom`` fallback behavior,
as was implemented in 3.5.2rc1 and is expected to remain for the rest
of the 3.5.x cycle::

    # artist's impression of the CPython 3.5.2rc1+ behavior
    def generate_unpredictable_bytes_or_maybe_not(num_bytes):
        if system_rng_is_ready:
            return unpredictable_bytes(num_bytes)
        else:
            return (b"p4ssw0rd" * (num_bytes // 8 + 1))[:num_bytes]

(And the same caveats apply to this sketch as applied to the
``generate_unpredictable_password`` sketch of ``/dev/urandom`` above.)
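
For readers who prefer something runnable, the blocking and raising options
can be approximated with ``os.getrandom()``, which exposes the syscall
directly on Linux in Python 3.6 and later. This is only a sketch under stated
assumptions: the function names are illustrative rather than part of any API,
and the ``os.urandom()`` fallback for platforms without ``os.getrandom()`` is
an assumption made to keep the example portable::

    import os

    def _read_getrandom(num_bytes, flags):
        # getrandom() may return fewer bytes than requested,
        # so keep reading until the request is satisfied.
        chunks = []
        remaining = num_bytes
        while remaining > 0:
            chunk = os.getrandom(remaining, flags)
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)

    def system_rng_bytes_or_block(num_bytes):
        # flags=0: wait until the kernel RNG has been initialized.
        if not hasattr(os, "getrandom"):
            return os.urandom(num_bytes)  # assumed portable fallback
        return _read_getrandom(num_bytes, 0)

    def system_rng_bytes_or_raise(num_bytes):
        # GRND_NONBLOCK: raise BlockingIOError instead of waiting.
        if not hasattr(os, "getrandom"):
            return os.urandom(num_bytes)  # assumed portable fallback
        return _read_getrandom(num_bytes, os.GRND_NONBLOCK)

On a system whose RNG is already initialized, both functions behave
identically; they only differ during the early boot window discussed above.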

There are five places where CPython and the standard library attempt to use the
operating system's random number generator, and thus five places where this
decision has to be made:

* initializing the SipHash used to protect ``str.__hash__`` and
  friends against DoS attacks (called unconditionally at startup)
* initializing the ``random`` module (called when ``random`` is
  imported)
* servicing user calls to the ``os.urandom`` public API
* the higher level ``random.SystemRandom`` public API
* the new ``secrets`` module public API added by :pep:`506`

Previously, these five places all used the same underlying code, and
thus made this decision in the same way.

This whole problem was first noticed because 3.5.0 switched that
underlying code to the ``generate_unpredictable_bytes_or_block`` behavior.
It turns out that there are some rare cases where Linux boot scripts
attempt to run a Python program as part of system initialization. In those
cases, the Python startup sequence blocked while trying to initialize
SipHash, which triggered a deadlock because the system stopped doing
anything -- including gathering new entropy -- until the Python script
was forcibly terminated by an external timer. This is particularly unfortunate
since the scripts in question never processed untrusted input, so there was no
need for SipHash to be initialized with provably unpredictable random data in
the first place. This motivated the change in 3.5.2rc1 to emulate the old
``/dev/urandom`` behavior in all cases (by calling ``getrandom()`` in
non-blocking mode, and then falling back to reading ``/dev/urandom``
if the syscall indicates that the ``/dev/urandom`` pool is not yet
fully initialized).
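
The fallback just described can be sketched in pure Python as follows. This
is an illustration rather than the actual CPython implementation (which is
written in C); the function name is hypothetical, the example assumes a
Unix-like system with a ``/dev/urandom`` device, and it assumes requests of
at most 256 bytes so that ``getrandom()`` does not return a short read::

    import os

    def urandom_with_legacy_fallback(num_bytes):
        if hasattr(os, "getrandom"):
            try:
                # Non-blocking mode: fails if the pool is not yet initialized.
                return os.getrandom(num_bytes, os.GRND_NONBLOCK)
            except BlockingIOError:
                # Fall through to the legacy device, which never blocks
                # on Linux but may return predictable data this early.
                pass
        with open("/dev/urandom", "rb") as dev:
            return dev.read(num_bytes)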

We don't know whether such problems may also exist in the Fedora/RHEL/CentOS
ecosystem, as the build systems for those distributions use chroots on servers
running an older operating system kernel that doesn't offer the ``getrandom()``
syscall, which means CPython's current build configuration compiles out the
runtime check for that syscall [10]_.

A similar problem was found due to the ``random`` module calling
``os.urandom`` as a side-effect of import in order to seed the default
global ``random.Random()`` instance.

We have not received any specific complaints regarding direct calls to
``os.urandom()`` or ``random.SystemRandom()`` blocking with 3.5.0 or 3.5.1 --
only problem reports due to the implicit blocking on interpreter startup and
as a side-effect of importing the random module.

Independently of this PEP, the first two cases have already been updated to
never block, regardless of the behaviour of ``os.urandom()``.

Where :pep:`524` proposes to make all three of the latter cases block
implicitly, this PEP proposes that approach only for the last case (the
``secrets`` module), with ``os.urandom()`` and ``random.SystemRandom()``
instead raising an exception when they detect that the underlying operating
system call would block.
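
As a rough illustration of what this would mean for application code, the
sketch below catches the proposed ``BlockingIOError`` and then waits
explicitly. Both helper names are hypothetical: ``wait_for_system_rng`` is
only a stand-in for the proposed ``secrets.wait_for_system_rng()``, and the
``urandom`` parameter exists solely so the error path can be exercised
without a system whose RNG is genuinely uninitialized::

    import os

    def wait_for_system_rng():
        # Stand-in for the proposed secrets.wait_for_system_rng():
        # a blocking one-byte read returns once the RNG is ready.
        if hasattr(os, "getrandom"):
            os.getrandom(1)
        else:
            os.urandom(1)  # assumed fallback on other platforms

    def make_session_key(urandom=os.urandom):
        try:
            return urandom(32)
        except BlockingIOError:
            # Under this proposal the application decides what to do;
            # here it chooses to wait and then retry once.
            wait_for_system_rng()
            return urandom(32)

Code that would rather fail fast could simply let the exception propagate
instead of waiting.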


References
==========

.. [1] os.urandom() should use Linux 3.17 getrandom() syscall
   (http://bugs.python.org/issue22181)

.. [2] Python 3.5 running on Linux kernel 3.17+ can block at startup or on
   importing the random module on getrandom()
   (http://bugs.python.org/issue26839)

.. [3] "import random" blocks on entropy collection on Linux with low entropy
   (http://bugs.python.org/issue25420)

.. [4] os.urandom() doesn't block on Linux anymore
   (https://hg.python.org/cpython/rev/9de508dc4837)

.. [5] Proposal to add os.getrandom()
   (http://bugs.python.org/issue26839#msg267803)

.. [6] Add os.urandom_block()
   (http://bugs.python.org/issue27250)

.. [7] Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom()
   (http://bugs.python.org/issue27279)

.. [8] Always use getrandom() in os.random() on Linux and add
   block=False parameter to os.urandom()
   (http://bugs.python.org/issue27266)

.. [9] Application level vs library level design decisions
   (https://mail.python.org/pipermail/security-sig/2016-June/000057.html)

.. [10] Does the HAVE_GETRANDOM_SYSCALL config setting make sense?
   (https://mail.python.org/pipermail/security-sig/2016-June/000060.html)

.. [11] Take a decision for os.urandom() in Python 3.6
   (https://mail.python.org/pipermail/security-sig/2016-August/000084.html)


For additional background details beyond those captured in this PEP and Victor's
competing PEP, also see Victor's prior collection of relevant information and
links at https://haypo-notes.readthedocs.io/summary_python_random_issue.html


Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8