PEP: 522
Title: Allow BlockingIOError in security sensitive APIs
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>, Nathaniel J. Smith <njs@pobox.com>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Requires: 506
Created: 16 June 2016
Python-Version: 3.6
Resolution: https://mail.python.org/pipermail/security-sig/2016-August/000101.html


Abstract
========

A number of APIs in the standard library that return random values nominally
suitable for use in security sensitive operations currently have an obscure
operating system dependent failure mode that allows them to return values that
are not, in fact, suitable for such operations.

This is due to some operating system kernels (most notably the Linux kernel)
permitting reads from ``/dev/urandom`` before the system random number
generator is fully initialized, whereas most other operating systems will
implicitly block on such reads until the random number generator is ready.

For the lower level ``os.urandom`` and ``random.SystemRandom`` APIs, this PEP
proposes changing such failures in Python 3.6 from the current silent,
hard to detect, and hard to debug, errors to easily detected and debugged errors
by raising ``BlockingIOError`` with a suitable error message, allowing
developers the opportunity to unambiguously specify their preferred approach
for handling the situation.

For the new high level ``secrets`` API, it proposes to block implicitly if
needed whenever a random number is generated by that module, as well as to
expose a new ``secrets.wait_for_system_rng()`` function to allow code otherwise
using the low level APIs to explicitly wait for the system random number
generator to be available.

This change will impact any operating system that offers the ``getrandom()``
system call, regardless of whether the default behaviour of the
``/dev/urandom`` device is to return potentially predictable results when the
system random number generator is not ready (e.g. Linux, NetBSD) or to block
(e.g. FreeBSD, Solaris, Illumos). Operating systems that prevent execution of
userspace code prior to the initialization of the system random number
generator, or do not offer the ``getrandom()`` syscall, will be entirely
unaffected by the proposed change (e.g. Windows, Mac OS X, OpenBSD).

The new exception or the blocking behaviour in the ``secrets`` module would
potentially be encountered in the following situations:

* Python code calling these APIs during Linux system initialization
* Python code running on improperly initialized Linux systems (e.g. embedded
  hardware without adequate sources of entropy to seed the system random number
  generator, or Linux VMs that aren't configured to accept entropy from the
  VM host)


Relationship with other PEPs
============================

This PEP depends on the Accepted PEP 506, which adds the ``secrets`` module.

This PEP competes with Victor Stinner's PEP 524, which proposes to make
``os.urandom`` itself implicitly block when the system RNG is not ready.


PEP Rejection
=============

For the reference implementation, Guido rejected this PEP in favour of the
unconditional implicit blocking proposal in PEP 524.

This means any further discussion of appropriate default behaviour for
``os.urandom()`` in system Python installations in Linux distributions should
take place on the respective distro mailing lists, rather than on the upstream
CPython mailing lists.


Changes independent of this PEP
===============================

CPython interpreter initialization and ``random`` module initialization have
already been updated to gracefully fall back to alternative seeding options if
the system random number generator is not ready.

This PEP does not compete with the proposal in PEP 524 to add an
``os.getrandom()`` API to expose the ``getrandom`` syscall on platforms that
offer it. There is sufficient motive for adding that API in the ``os`` module's
role as a thin wrapper around potentially platform dependent operating system
features that it can be added regardless of what happens to the default
behaviour of ``os.urandom()`` on these systems.


Proposal
========

Changing ``os.urandom()`` on platforms with the getrandom() system call
-----------------------------------------------------------------------

This PEP proposes that in Python 3.6+, ``os.urandom()`` be updated to call
the ``getrandom()`` syscall in non-blocking mode if available and raise
``BlockingIOError: system random number generator is not ready; see secrets.token_bytes()``
if the kernel reports that the call would block.

This behaviour will then propagate through to the existing
``random.SystemRandom``, which provides a relatively thin wrapper around
``os.urandom()`` that matches the ``random.Random()`` API.

However, the new ``secrets`` module introduced by PEP 506 will be updated to
catch the new exception and implicitly wait for the system random number
generator if the exception is ever encountered.

In all cases, as soon as a call to one of these security sensitive APIs
succeeds, all future calls to these APIs in that process will succeed
without blocking (once the operating system random number generator is ready
after system boot, it remains ready).

On Linux and NetBSD, this will replace the previous behaviour of returning
potentially predictable results read from ``/dev/urandom``.

On FreeBSD, Solaris, and Illumos, this will replace the previous behaviour of
implicitly blocking until the system random number generator is ready. However,
it is not clear if these operating systems actually allow userspace code (and
hence Python) to run before the system random number generator is ready.

Note that in all cases, if calling the underlying ``getrandom()`` API reports
``ENOSYS`` rather than returning a successful response or reporting ``EAGAIN``,
CPython will continue to fall back to reading from ``/dev/urandom`` directly.


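As a rough illustration of the semantics described in this section, the
following sketch emulates them in pure Python on top of ``os.getrandom()``.
It is not the reference implementation (that change would live in CPython's
C code). The helper name ``urandom_or_raise`` is hypothetical, and the sketch
assumes that ``os.getrandom()`` and ``os.GRND_NONBLOCK`` are exposed on
platforms that offer the syscall:

```python
import errno
import os


def urandom_or_raise(nbytes):
    """Return nbytes of random data, or raise BlockingIOError.

    Hypothetical helper emulating the proposed os.urandom() behaviour.
    """
    if not hasattr(os, "getrandom"):
        # No getrandom() wrapper on this platform: keep existing behaviour
        return os.urandom(nbytes)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            # Non-blocking mode: raises BlockingIOError if the RNG is not ready
            chunk = os.getrandom(remaining, os.GRND_NONBLOCK)
        except BlockingIOError:
            raise BlockingIOError(
                errno.EAGAIN,
                "system random number generator is not ready; "
                "see secrets.token_bytes()",
            ) from None
        except OSError as exc:
            if exc.errno == errno.ENOSYS:
                # Kernel lacks the syscall: fall back to /dev/urandom
                return os.urandom(nbytes)
            raise
        # getrandom() may return fewer bytes than requested, so accumulate
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

On a system whose random number generator is already initialized, calling
``urandom_or_raise(32)`` simply returns 32 bytes, just as ``os.urandom(32)``
does today.
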
Adding ``secrets.wait_for_system_rng()``
----------------------------------------

A new exception shouldn't be added without a straightforward recommendation
for how to resolve that error when encountered (however rare encountering
the new error is expected to be in practice). For security sensitive code that
actually does need to use the lower level interfaces to the system random
number generator (rather than the new ``secrets`` module), and does receive
live bug reports indicating this is a real problem for the userbase of that
particular application rather than a theoretical one, this PEP's recommendation
will be to add the following snippet (directly or indirectly) to the
``__main__`` module::

    import secrets
    secrets.wait_for_system_rng()

Or, if compatibility with versions prior to Python 3.6 is needed::

    try:
        import secrets
    except ImportError:
        pass
    else:
        secrets.wait_for_system_rng()

Within the ``secrets`` module itself, this will then be used in
``token_bytes()`` to block implicitly if the new exception is encountered::

    def token_bytes(nbytes=None):
        if nbytes is None:
            nbytes = DEFAULT_ENTROPY
        try:
            result = os.urandom(nbytes)
        except BlockingIOError:
            wait_for_system_rng()
            result = os.urandom(nbytes)
        return result

Other parts of the module will then be updated to use ``token_bytes()`` as
their basic random number generation building block, rather than calling
``os.urandom()`` directly.

Application frameworks covering use cases where access to the system random
number generator is almost certain to be needed (e.g. web frameworks) may
choose to incorporate a call to ``secrets.wait_for_system_rng()`` implicitly
into the commands that start the application such that existing calls to
``os.urandom()`` will be guaranteed to never raise the new exception when using
those frameworks.

For cases where the error is encountered for an application which cannot be
modified directly, the following command can be used to wait for the
system random number generator to initialize before starting that application::

    python3 -c "import secrets; secrets.wait_for_system_rng()"

For example, this snippet could be added to a shell script or a systemd
``ExecStartPre`` hook (and may prove useful in reliably waiting for the
system random number generator to be ready, even if the subsequent command
is not itself an application running under Python 3.6).

Given the changes proposed to ``os.urandom()`` above, and the inclusion of
an ``os.getrandom()`` API on systems that support it, the suggested
implementation of this function would be::

    import os
    import time

    if hasattr(os, "getrandom"):
        # os.getrandom() always blocks waiting for the system RNG by default
        def wait_for_system_rng():
            """Block waiting for system random number generator to be ready"""
            os.getrandom(1)
            return
    else:
        # As far as we know, other platforms will never get BlockingIOError
        # below but the implementation makes pessimistic assumptions
        def wait_for_system_rng():
            """Block waiting for system random number generator to be ready"""
            # If the system RNG is already seeded, don't wait at all
            try:
                os.urandom(1)
                return
            except BlockingIOError:
                pass
            # Avoid the below busy loop if possible
            try:
                block_on_system_rng = open("/dev/random", "rb")
            except FileNotFoundError:
                pass
            else:
                with block_on_system_rng:
                    block_on_system_rng.read(1)
            # Busy loop until the system RNG is ready
            while True:
                try:
                    os.urandom(1)
                    break
                except BlockingIOError:
                    # Only check once per millisecond
                    time.sleep(0.001)

On systems where it is possible to wait for the system RNG to be ready, this
function will do so without a busy loop if ``os.getrandom()`` is defined,
``os.urandom()`` itself implicitly blocks, or the ``/dev/random`` device is
available. If the system random number generator is ready, this call is
guaranteed to never block, even if the system's ``/dev/random`` device uses
a design that permits it to block intermittently during normal system operation.


Limitations on scope
--------------------

No changes are proposed for Windows or Mac OS X systems, as neither of those
platforms provides any mechanism to run Python code before the operating
system random number generator has been initialized. Mac OS X goes so far as
to kernel panic and abort the boot process if it can't properly initialize the
random number generator (although Apple's restrictions on the supported
hardware platforms make that exceedingly unlikely in practice).

Similarly, no changes are proposed for other \*nix systems that do not offer
the ``getrandom()`` syscall. On these systems, ``os.urandom()`` will continue
to block waiting for the system random number generator to be initialized.

While other \*nix systems that offer a non-blocking API (other than
``getrandom()``) for requesting random numbers suitable for use in security
sensitive applications could potentially receive a similar update to the one
proposed for ``getrandom()`` in this PEP, such changes are out of scope for
this particular proposal.

Python's behaviour on older versions of affected platforms that do not offer
the new ``getrandom()`` syscall will also remain unchanged.


Rationale
=========

Ensuring the ``secrets`` module implicitly blocks when needed
-------------------------------------------------------------

This is done to help encourage the meme that arises for folks that want the
simplest possible answer to the right way to generate security sensitive random
numbers to be "Use the secrets module when available or your application might
crash unexpectedly", rather than the more boilerplate heavy "Always call
secrets.wait_for_system_rng() when available or your application might crash
unexpectedly".

It's also done due to the BDFL having a higher tolerance for APIs that might
block unexpectedly than he does for APIs that might throw an unexpected
exception [11]_.


Raising ``BlockingIOError`` in ``os.urandom()`` on Linux
--------------------------------------------------------

For several years now, the security community's guidance has been to use
``os.urandom()`` (or the ``random.SystemRandom()`` wrapper) when implementing
security sensitive operations in Python.

To help improve API discoverability and make it clearer that secrecy and
simulation are not the same problem (even though they both involve
random numbers), PEP 506 collected several of the one line recipes based
on the lower level ``os.urandom()`` API into a new ``secrets`` module.

However, this guidance has also come with a longstanding caveat: developers
writing security sensitive software at least for Linux, and potentially for
some \*BSD systems, may need to wait until the operating system's
random number generator is ready before relying on it for security sensitive
operations. This generally only occurs if ``os.urandom()`` is read very
early in the system initialization process, or on systems with few sources of
available entropy (e.g. some kinds of virtualized or embedded systems), but
unfortunately the exact conditions that trigger this are difficult to predict,
and when it occurs there is no direct way for userspace to tell it has
happened without querying operating system specific interfaces.

On \*BSD systems (if the particular \*BSD variant allows the problem to occur
at all) and potentially also Solaris and Illumos, encountering this situation
means ``os.urandom()`` will either block waiting for the system random number
generator to be ready (the associated symptom would be for the affected script
to pause unexpectedly on the first call to ``os.urandom()``) or else will
behave the same way as it does on Linux.

On Linux, in Python versions up to and including Python 3.4, and in
Python 3.5 maintenance versions following Python 3.5.2, there's no clear
indicator to developers that their software may not be working as expected
when run early in the Linux boot process, or on hardware without good
sources of entropy to seed the operating system's random number generator: due
to the behaviour of the underlying ``/dev/urandom`` device, ``os.urandom()``
on Linux returns a result either way, and it takes extensive statistical
analysis to show that a security vulnerability exists.

By contrast, if ``BlockingIOError`` is raised in those situations, then
developers using Python 3.6+ can easily choose their desired behaviour:

1. Wait for the system RNG at or before application startup (security sensitive)
2. Switch to using the random module (non-security sensitive)


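The two choices listed above can be sketched as a single wrapper. This is
only an illustration: ``random_bytes`` and ``_wait_for_system_rng`` are
hypothetical names, the latter stands in for the proposed
``secrets.wait_for_system_rng()``, and the ``except`` branch is only reachable
if ``os.urandom()`` behaves as this PEP proposes:

```python
import os
import random


def _wait_for_system_rng():
    """Hypothetical stand-in for the proposed secrets.wait_for_system_rng()."""
    if hasattr(os, "getrandom"):
        # In its default mode, os.getrandom() blocks until the RNG is ready
        os.getrandom(1)


def random_bytes(nbytes, security_sensitive=True):
    """Return nbytes of random data, handling an unready system RNG."""
    try:
        return os.urandom(nbytes)
    except BlockingIOError:
        # Only reachable if os.urandom() raises as proposed in this PEP
        if security_sensitive:
            # Option 1: wait for the system RNG, then retry
            _wait_for_system_rng()
            return os.urandom(nbytes)
        # Option 2: not security sensitive, so the random module suffices
        return random.getrandbits(nbytes * 8).to_bytes(nbytes, "little")
```

In practice an application would normally pick one option at startup rather
than per call; the flag is used here only to show both paths side by side.
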
Making ``secrets.wait_for_system_rng()`` public
-----------------------------------------------

Earlier versions of this PEP proposed a number of recipes for wrapping
``os.urandom()`` to make it suitable for use in security sensitive use cases.

Discussion of the proposal on the security-sig mailing list prompted the
realization [9]_ that the core assumption driving the API design in this PEP
was that choosing between letting the exception cause the application to fail,
blocking waiting for the system RNG to be ready and switching to using the
``random`` module instead of ``os.urandom`` is an application and use-case
specific decision that should take into account application and use-case
specific details.

There is no way for the interpreter runtime or support libraries to determine
whether a particular use case is security sensitive or not, and while it's
straightforward for an application developer to decide how to handle an
exception thrown by a particular API, they can't readily work around an API
blocking when they expected it to be non-blocking.

Accordingly, the PEP was updated to add ``secrets.wait_for_system_rng()`` as
an API for applications, scripts and frameworks to use to indicate that they
wanted to ensure the system RNG was available before continuing, while library
developers could continue to call ``os.urandom()`` without worrying that it
might unexpectedly start blocking waiting for the system RNG to be available.


Backwards Compatibility Impact Assessment
=========================================

Similar to PEP 476, this is a proposal to turn a previously silent security
failure into a noisy exception that requires the application developer to
make an explicit decision regarding the behaviour they desire.

As no changes are proposed for operating systems that don't provide the
``getrandom()`` syscall, ``os.urandom()`` retains its existing behaviour as
a nominally blocking API that is non-blocking in practice due to the difficulty
of scheduling Python code to run before the operating system random number
generator is ready. We believe it may be possible to encounter problems akin to
those described in this PEP on at least some \*BSD variants, but nobody has
explicitly demonstrated that. On Mac OS X and Windows, it appears to be
straight up impossible to even try to run a Python interpreter that early in
the boot process.

On Linux and other platforms with similar ``/dev/urandom`` behaviour,
``os.urandom()`` retains its status as a guaranteed non-blocking API.
However, the means of achieving that status changes in the specific case of
the operating system random number generator not being ready for use in security
sensitive operations: historically it would return potentially predictable
random data, with this PEP it would change to raise ``BlockingIOError``.

Developers of affected applications would then be required to make one of the
following changes to gain forward compatibility with Python 3.6, based on the
kind of application they're developing.


Unaffected Applications
-----------------------

The following kinds of applications would be entirely unaffected by the change,
regardless of whether or not they perform security sensitive operations:

- applications that don't support Linux
- applications that are only run on desktops or conventional servers
- applications that are only run after the system RNG is ready (including
  those where an application framework calls ``secrets.wait_for_system_rng()``
  on their behalf)

Applications in this category simply won't encounter the new exception, so it
will be reasonable for developers to wait and see if they receive
Python 3.6 compatibility bugs related to the new runtime behaviour, rather than
attempting to pre-emptively determine whether or not they're affected.


Affected security sensitive applications
----------------------------------------

Security sensitive applications would need to either change their system
configuration so the application is only started after the operating system
random number generator is ready for security sensitive operations, change the
application startup code to invoke ``secrets.wait_for_system_rng()``, or
else switch to using the new ``secrets.token_bytes()`` API.

As an example for components started via a systemd unit file, the following
snippet would delay activation until the system RNG was ready::

    ExecStartPre=python3 -c "import secrets; secrets.wait_for_system_rng()"

Alternatively, the following snippet will use ``secrets.token_bytes()`` if
available, and fall back to ``os.urandom()`` otherwise::

    try:
        from secrets import token_bytes as _get_random_bytes
    except ImportError:
        from os import urandom as _get_random_bytes


Affected non-security sensitive applications
--------------------------------------------

Non-security sensitive applications should be updated to use the ``random``
module rather than ``os.urandom``::

    import random

    def pseudorandom_bytes(num_bytes):
        return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")

Depending on the details of the application, the random module may offer
other APIs that can be used directly, rather than needing to emulate the
raw byte sequence produced by the ``os.urandom()`` API.


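As a brief illustration of that last point, a non-security-sensitive
application that only needs identifiers, shuffles, or jitter values can call
the higher level ``random`` APIs directly. The variable names and the seed
below are arbitrary and purely illustrative:

```python
import random

# A dedicated generator instance; the seed is arbitrary and only makes the
# illustration repeatable. Never use this for secrets.
rng = random.Random(12345)

# Pick an integer directly instead of decoding raw bytes
sample_id = rng.randrange(10**6)

# Produce a shuffled copy of a sequence
shuffled = rng.sample(range(10), k=10)

# Draw a floating point jitter value in [0.0, 1.0]
jitter = rng.uniform(0.0, 1.0)
```

None of these calls touch the operating system's random number generator
after the generator has been seeded, so they are unaffected by whether the
system RNG is ready.
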
Additional Background
=====================

Why propose this now?
---------------------

The main reason is that the Python 3.5.0 release switched to using the new
Linux ``getrandom()`` syscall when available in order to avoid consuming a
file descriptor [1]_, and this had the side effect of making the following
operations block waiting for the system random number generator to be ready:

* ``os.urandom`` (and APIs that depend on it)
* importing the ``random`` module
* initializing the randomized hash algorithm used by some builtin types

While the first of those behaviours is arguably desirable (and consistent with
the existing behaviour of ``os.urandom`` on other operating systems), the
latter two behaviours are unnecessary and undesirable, and the last one is now
known to cause a system level deadlock when attempting to run Python scripts
during the Linux init process with Python 3.5.0 or 3.5.1 [2]_, while the second
one can cause problems when using virtual machines without robust entropy
sources configured [3]_.

Since decoupling these behaviours in CPython will involve a number of
implementation changes more appropriate for a feature release than a maintenance
release, the relatively simple resolution applied in Python 3.5.2 was to revert
all three of them to a behaviour similar to that of previous Python versions:
if the new Linux syscall indicates it will block, then Python 3.5.2 will
implicitly fall back on reading ``/dev/urandom`` directly [4]_.

However, this bug report *also* resulted in a range of proposals to add *new*
APIs like ``os.getrandom()`` [5]_, ``os.urandom_block()`` [6]_,
``os.pseudorandom()`` and ``os.cryptorandom()`` [7]_, or adding new optional
parameters to ``os.urandom()`` itself [8]_, and then attempting to educate
users on when they should call those APIs instead of just using a plain
``os.urandom()`` call.

These proposals arguably represent overreactions, as the question of reliably
obtaining random numbers suitable for security sensitive work on Linux is a
relatively obscure problem of interest mainly to operating system developers
and embedded systems programmers, that may not justify expanding the
Python standard library's cross-platform APIs with new Linux-specific concerns.
This is especially so with the ``secrets`` module already being added as the
"use this and don't worry about the low level details" option for developers
writing security sensitive software that for some reason can't rely on even
higher level domain specific APIs (like web frameworks) and also don't need to
worry about Python versions prior to Python 3.6.

That said, it's also the case that low cost ARM devices are becoming
increasingly prevalent, with a lot of them running Linux, and a lot of folks
writing Python applications that run on those devices. That creates an
opportunity to take an obscure security problem that currently requires a lot
of knowledge about Linux boot processes and provably unpredictable random
number generation to diagnose and resolve, and instead turn it into a
relatively mundane and easy-to-find-in-an-internet-search runtime exception.


The cross-platform behaviour of ``os.urandom()``
------------------------------------------------

On operating systems other than Linux and NetBSD, ``os.urandom()`` may already
block waiting for the operating system's random number generator to be ready.
This will happen at most once in the lifetime of the process, and the call is
subsequently guaranteed to be non-blocking.

Linux and NetBSD are outliers in that, even when the operating system's random
number generator doesn't consider itself ready for use in security sensitive
operations, reading from the ``/dev/urandom`` device will return random values
based on the entropy it has available.

This behaviour is potentially problematic, so Linux 3.17 added a new
``getrandom()`` syscall that (amongst other benefits) allows callers to
either block waiting for the random number generator to be ready, or
else request an error return if the random number generator is not ready.
Notably, the new API does *not* support the old behaviour of returning
data that is not suitable for security sensitive use cases.

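Where Python exposes the syscall, its non-blocking mode can be used to probe
readiness. The following is a minimal sketch, assuming the ``os.getrandom()``
wrapper and the ``os.GRND_NONBLOCK`` flag are available; the helper name
``system_rng_is_ready`` is hypothetical and used only for illustration:

```python
import os


def system_rng_is_ready():
    """Return True if the system RNG can be read without blocking."""
    if not hasattr(os, "getrandom"):
        # Assumption: without getrandom() this sketch has no way to probe,
        # so it reports ready and leaves os.urandom() to its platform
        # specific behaviour
        return True
    try:
        # Non-blocking mode requests an error instead of waiting
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True
```

Calling ``os.getrandom(1)`` without the flag selects the other mode, blocking
until the random number generator has been initialized.
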
Versions of Python up to and including Python 3.4 access the
Linux ``/dev/urandom`` device directly.

Python 3.5.0 and 3.5.1 (when built on a system that offered the new syscall)
called ``getrandom()`` in blocking mode in order to avoid the use of a file
descriptor to access ``/dev/urandom``. While there were no specific problems
reported due to ``os.urandom()`` blocking in user code, there *were* problems
due to CPython implicitly invoking the blocking behaviour during interpreter
startup and when importing the ``random`` module.

Rather than trying to decouple SipHash initialization from the
``os.urandom()`` implementation, Python 3.5.2 switched to calling
``getrandom()`` in non-blocking mode, and falling back to reading from
``/dev/urandom`` if the syscall indicates it will block.

As a result of the above, ``os.urandom()`` in all Python versions up to and
including Python 3.5 propagates the behaviour of the underlying
``/dev/urandom`` device to Python code.


Problems with the behaviour of ``/dev/urandom`` on Linux
--------------------------------------------------------

The Python ``os`` module has largely co-evolved with Linux APIs, so having
``os`` module functions closely follow the behaviour of their Linux operating
system level counterparts when running on Linux is typically considered to be
a desirable feature.

However, ``/dev/urandom`` represents a case where the current behaviour is
acknowledged to be problematic, but fixing it unilaterally at the kernel level
has been shown to prevent some Linux distributions from booting (at least in
part due to components like Python currently using it for
non-security-sensitive purposes early in the system initialization process).

As an analogy, consider the following two functions::

    def generate_example_password():
        """Generates passwords solely for use in code examples"""
        return generate_unpredictable_password()

    def generate_actual_password():
        """Generates actual passwords for use in real applications"""
        return generate_unpredictable_password()

If you think of an operating system's random number generator as a method for
generating unpredictable, secret passwords, then you can think of Linux's
``/dev/urandom`` as being implemented like::

    # Oversimplified artist's conception of the kernel code
    # implementing /dev/urandom
    def generate_unpredictable_password():
        if system_rng_is_ready:
            return use_system_rng_to_generate_password()
        else:
            # we can't make an unpredictable password; silently return a
            # potentially predictable one instead:
            return "p4ssw0rd"

In this scenario, the author of ``generate_example_password`` is fine - even if
``"p4ssw0rd"`` shows up a bit more often than they expect, it's only used in
examples anyway. However, the author of ``generate_actual_password`` has a
problem - how do they prove that their calls to
``generate_unpredictable_password`` never follow the path that returns a
predictable answer?

In real life it's slightly more complicated than this, because there
might be some level of system entropy available -- so the fallback might
be more like ``return random.choice(["p4ssword", "passw0rd",
"p4ssw0rd"])`` or something even more variable and hence only statistically
predictable with better odds than the author of ``generate_actual_password``
was expecting. This doesn't really make things more provably secure, though;
mostly it just means that if you try to catch the problem in the obvious way --
``if returned_password == "p4ssw0rd": raise UhOh`` -- then it doesn't work,
because ``returned_password`` might instead be ``p4ssword`` or even
``pa55word``, or just an arbitrary 64 bit sequence selected from fewer than
2**64 possibilities. So this rough sketch does give the right general idea of
the consequences of the "more predictable than expected" fallback behaviour,
even though it's thoroughly unfair to the Linux kernel team's efforts to
mitigate the practical consequences of this problem without resorting to
breaking backwards compatibility.

This design is generally agreed to be a bad idea. As far as we can
tell, there are no use cases whatsoever in which this is the behavior
you actually want. It has led to the use of insecure ``ssh`` keys on
real systems, and many \*nix-like systems (including at least Mac OS
X, OpenBSD, and FreeBSD) have modified their ``/dev/urandom``
implementations so that they never return predictable outputs, either
by making reads block in this case, or by simply refusing to run any
userspace programs until the system RNG has been
initialized. Unfortunately, Linux has so far been unable to follow
suit, because it's been empirically determined that enabling the
blocking behavior causes some currently extant distributions to
fail to boot.

Instead, the new ``getrandom()`` syscall was introduced, making
it *possible* for userspace applications to access the system random number
generator safely, without introducing hard to debug deadlock problems into
the system initialization processes of existing Linux distros.


Consequences of ``getrandom()`` availability for Python
-------------------------------------------------------

Prior to the introduction of the ``getrandom()`` syscall, it simply wasn't
feasible to access the Linux system random number generator in a provably
safe way, so we were forced to settle for reading from ``/dev/urandom`` as the
best available option. However, with ``getrandom()`` insisting on raising an
error or blocking rather than returning predictable data, as well as having
other advantages, it is now the recommended method for accessing the kernel
RNG on Linux, with reading ``/dev/urandom`` directly relegated to "legacy"
status. This moves Linux into the same category as other operating systems
like Windows, which doesn't provide a ``/dev/urandom`` device at all: the
best available option for implementing ``os.urandom()`` is no longer simply
reading bytes from the ``/dev/urandom`` device.

This means that what used to be somebody else's problem (the Linux kernel
development team's) is now Python's problem -- given a way to detect that the
system RNG is not initialized, we have to choose how to handle this
situation whenever we try to use the system RNG.

It could simply block, as was somewhat inadvertently implemented in 3.5.0,
|
|
and as is proposed in Victor Stinner's competing PEP::
|
|
|
|
# artist's impression of the CPython 3.5.0-3.5.1 behavior
|
|
def generate_unpredictable_bytes_or_block(num_bytes):
|
|
while not system_rng_is_ready:
|
|
wait
|
|
return unpredictable_bytes(num_bytes)
|
|
|
|
Or it could raise an error, as this PEP proposes (in *some* cases)::
|
|
|
|
# artist's impression of the behavior proposed in this PEP
|
|
def generate_unpredictable_bytes_or_raise(num_bytes):
|
|
if system_rng_is_ready:
|
|
return unpredictable_bytes(num_bytes)
|
|
else:
|
|
raise BlockingIOError
|
|
|
|
Or it could explicitly emulate the ``/dev/urandom`` fallback behavior,
|
|
as was implemented in 3.5.2rc1 and is expected to remain for the rest
|
|
of the 3.5.x cycle::
|
|
|
|
# artist's impression of the CPython 3.5.2rc1+ behavior
|
|
def generate_unpredictable_bytes_or_maybe_not(num_bytes):
|
|
if system_rng_is_ready:
|
|
return unpredictable_bytes(num_bytes)
|
|
else:
|
|
return (b"p4ssw0rd" * (num_bytes // 8 + 1))[:num_bytes]
|
|
|
|
(And the same caveats apply to this sketch as applied to the
|
|
``generate_unpredictable_password`` sketch of ``/dev/urandom`` above.)
|
|
|
|
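For comparison, the first two behaviors can be approximated with a real
API on Linux. The following is only a minimal sketch, not the implementation
proposed by either PEP: it assumes Python 3.6 or later on Linux, where
``os.getrandom()`` and ``os.GRND_NONBLOCK`` are available, and the function
names are hypothetical.

```python
import os


def _getrandom_exactly(num_bytes, flags=0):
    # os.getrandom() may return fewer bytes than requested, so keep
    # calling it until the full amount has been collected.
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        chunk = os.getrandom(remaining, flags)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def unpredictable_bytes_or_block(num_bytes):
    # With flags=0 the call waits until the kernel RNG is initialized.
    return _getrandom_exactly(num_bytes)


def unpredictable_bytes_or_raise(num_bytes):
    # With GRND_NONBLOCK the call raises BlockingIOError instead of
    # waiting when the kernel RNG is not yet initialized.
    return _getrandom_exactly(num_bytes, os.GRND_NONBLOCK)
```

A caller using the second function can decide for itself whether to wait,
retry later, or abort when ``BlockingIOError`` is raised.
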
There are five places where CPython and the standard library attempt to use the
operating system's random number generator, and thus five places where this
decision has to be made:

* initializing the SipHash used to protect ``str.__hash__`` and
  friends against DoS attacks (called unconditionally at startup)
* initializing the ``random`` module (called when ``random`` is
  imported)
* servicing user calls to the ``os.urandom`` public API
* the higher level ``random.SystemRandom`` public API
* the new ``secrets`` module public API added by PEP 506

Previously, these five places all used the same underlying code, and
thus made this decision in the same way.

This whole problem was first noticed because 3.5.0 switched that
underlying code to the ``generate_unpredictable_bytes_or_block`` behavior,
and it turns out that there are some rare cases where Linux boot
scripts attempted to run a Python program as part of system initialization, the
Python startup sequence blocked while trying to initialize SipHash,
and then this triggered a deadlock because the system stopped doing
anything -- including gathering new entropy -- until the Python script
was forcibly terminated by an external timer. This is particularly unfortunate
since the scripts in question never processed untrusted input, so there was no
need for SipHash to be initialized with provably unpredictable random data in
the first place. This motivated the change in 3.5.2rc1 to emulate the old
``/dev/urandom`` behavior in all cases (by calling ``getrandom()`` in
non-blocking mode, and then falling back to reading ``/dev/urandom``
if the syscall indicates that the ``/dev/urandom`` pool is not yet
fully initialized).

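The non-blocking call followed by a ``/dev/urandom`` fallback can be
sketched in Python as follows. This is an illustration of the strategy only,
not CPython's actual C implementation: it assumes a Unix-like system with a
``/dev/urandom`` device, uses ``os.getrandom()`` where available (Linux,
Python 3.6 or later), and the names ``urandom_with_fallback`` and
``_read_exactly`` are hypothetical.

```python
import os


def _read_exactly(read, num_bytes):
    # Both os.getrandom() and unbuffered device reads may return
    # fewer bytes than requested, so loop until we have them all.
    chunks = []
    remaining = num_bytes
    while remaining > 0:
        chunk = read(remaining)
        if not chunk:
            raise OSError("random source returned no data")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def urandom_with_fallback(num_bytes):
    if hasattr(os, "getrandom"):
        try:
            return _read_exactly(
                lambda n: os.getrandom(n, os.GRND_NONBLOCK), num_bytes
            )
        except OSError:
            # BlockingIOError (a subclass of OSError) means the kernel
            # RNG is not initialized yet. Other OSErrors, such as a
            # kernel without the syscall, are treated the same way in
            # this sketch.
            pass
    # Legacy path: never blocks on Linux, but may return predictable
    # data if the system RNG has not been initialized.
    with open("/dev/urandom", "rb", buffering=0) as dev:
        return _read_exactly(dev.read, num_bytes)
```

Note that the fallback path silently trades safety for availability, which
is exactly the failure mode this PEP objects to for security sensitive
callers.
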
We don't know whether such problems may also exist in the Fedora/RHEL/CentOS
ecosystem, as the build systems for those distributions use chroots on servers
running an older operating system kernel that doesn't offer the ``getrandom()``
syscall, which means CPython's current build configuration compiles out the
runtime check for that syscall [10]_.

A similar problem was found due to the ``random`` module calling
``os.urandom`` as a side-effect of import in order to seed the default
global ``random.Random()`` instance.

We have not received any specific complaints regarding direct calls to
``os.urandom()`` or ``random.SystemRandom()`` blocking with 3.5.0 or 3.5.1 -
only problem reports due to the implicit blocking on interpreter startup and
as a side-effect of importing the random module.

Independently of this PEP, the first two cases have already been updated to
never block, regardless of the behavior of ``os.urandom()``.

Where PEP 524 proposes to make all 3 of the latter cases block implicitly,
this PEP proposes that approach only for the last case (the ``secrets``
module), with ``os.urandom()`` and ``random.SystemRandom()`` instead raising
an exception when they detect that the underlying operating system call
would block.


References
==========

.. [1] os.urandom() should use Linux 3.17 getrandom() syscall
   (http://bugs.python.org/issue22181)

.. [2] Python 3.5 running on Linux kernel 3.17+ can block at startup or on
   importing the random module on getrandom()
   (http://bugs.python.org/issue26839)

.. [3] "import random" blocks on entropy collection on Linux with low entropy
   (http://bugs.python.org/issue25420)

.. [4] os.urandom() doesn't block on Linux anymore
   (https://hg.python.org/cpython/rev/9de508dc4837)

.. [5] Proposal to add os.getrandom()
   (http://bugs.python.org/issue26839#msg267803)

.. [6] Add os.urandom_block()
   (http://bugs.python.org/issue27250)

.. [7] Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom()
   (http://bugs.python.org/issue27279)

.. [8] Always use getrandom() in os.random() on Linux and add
   block=False parameter to os.urandom()
   (http://bugs.python.org/issue27266)

.. [9] Application level vs library level design decisions
   (https://mail.python.org/pipermail/security-sig/2016-June/000057.html)

.. [10] Does the HAVE_GETRANDOM_SYSCALL config setting make sense?
   (https://mail.python.org/pipermail/security-sig/2016-June/000060.html)

.. [11] Take a decision for os.urandom() in Python 3.6
   (https://mail.python.org/pipermail/security-sig/2016-August/000084.htm)


For additional background details beyond those captured in this PEP and Victor's
competing PEP, also see Victor's prior collection of relevant information and
links at https://haypo-notes.readthedocs.io/summary_python_random_issue.html


Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8