Merge pull request #23 from ncoghlan/security-sensitive-rng
PEP 522: Rewrite accounting for Nathaniel's feedback
commit
48c750982c
567
pep-0522.txt
|
@ -1,8 +1,8 @@
|
||||||
PEP: 522
|
PEP: 522
|
||||||
Title: Raise BlockingIOError in security sensitive APIs on Linux
|
Title: Allow BlockingIOError in security sensitive APIs on Linux
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Nick Coghlan <ncoghlan@gmail.com>
|
Author: Nick Coghlan <ncoghlan@gmail.com>, Nathaniel J. Smith <njs@pobox.com>
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/x-rst
|
Content-Type: text/x-rst
|
||||||
|
@ -13,56 +13,105 @@ Python-Version: 3.6
|
||||||
Abstract
|
Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
On Linux systems, the documentation for ``os.urandom`` currently makes the
|
A number of APIs in the standard library that return random values nominally
|
||||||
following contradictory promises:
|
suitable for use in security sensitive operations currently have an obscure
|
||||||
|
Linux-specific failure mode that allows them to return values that are not,
|
||||||
|
in fact, suitable for such operations.
|
||||||
|
|
||||||
* to provide random numbers that are suitable for security sensitive
|
This PEP proposes changing such failures in Python 3.6 from the current silent,
|
||||||
operations (such as client authentication and cryptography)
|
hard to detect, and hard to debug, errors to easily detected and debugged errors
|
||||||
* to provide access to the best available randomness source provided by
|
by raising ``BlockingIOError`` with a suitable error message, allowing
|
||||||
the underlying operating system
|
developers the opportunity to unambiguously specify their preferred approach
|
||||||
* to present a relatively thin wrapper around the system ``/dev/urandom``
|
for handling the situation.
|
||||||
device
|
|
||||||
|
|
||||||
This PEP proposes that in Python 3.6+ the 3rd guarantee be dropped in order to
|
The APIs affected by this change would be:
|
||||||
preserve the first two: on Linux systems that provide the ``getrandom()``
|
|
||||||
syscall, ``os.urandom()`` would become a wrapper around that API, and raise
|
|
||||||
``BlockingIOError`` in cases where directly accessing ``/dev/urandom`` would
|
|
||||||
instead return random data that may not be adequately unpredictable for use in
|
|
||||||
security sensitive operations.
|
|
||||||
|
|
||||||
As higher level abstractions over the lower level ``os.urandom()`` API, both
|
* ``os.urandom``
|
||||||
``random.SystemRandom()`` and the ``secrets`` module would also be documented as
|
* ``random.SystemRandom``
|
||||||
potentially raising ``BlockingIOError``.
|
* the new ``secrets`` module added by PEP 506
|
||||||
|
|
||||||
In all cases, as soon as a call to ``os.urandom()`` succeeds, all future
|
The new exception would potentially be encountered in the following situations:
|
||||||
calls to ``os.urandom()`` in that process will succeed (once the operating
|
|
||||||
system random number generator is ready after system boot, it remains ready).
|
* Python code calling these APIs during Linux system initialization
|
||||||
|
* Python code running on improperly initialized Linux systems (e.g. embedded
|
||||||
|
hardware without adequate sources of entropy to seed the system random number
|
||||||
|
generator, or Linux VMs that aren't configured to accept entropy from the
|
||||||
|
VM host)
|
||||||
|
|
||||||
|
CPython interpreter initialization and ``random`` module initialization would
|
||||||
|
also be updated to gracefully fall back to alternative seeding options if the
|
||||||
|
system random number generator is not ready.
|
||||||
|
|
||||||
|
|
||||||
Proposal
|
Proposal
|
||||||
========
|
========
|
||||||
|
|
||||||
|
Changing ``os.urandom()`` on Linux
|
||||||
|
----------------------------------
|
||||||
|
|
||||||
This PEP proposes that in Python 3.6+, ``os.urandom()`` be updated to call
|
This PEP proposes that in Python 3.6+, ``os.urandom()`` be updated to call
|
||||||
the new Linux ``getrandom()`` syscall in non-blocking mode if available and
|
the new Linux ``getrandom()`` syscall in non-blocking mode if available and
|
||||||
raise ``BlockingIOError: system random number generator is not ready`` if
|
raise ``BlockingIOError: system random number generator is not ready`` if
|
||||||
the kernel reports that the call would block.
|
the kernel reports that the call would block.
|
||||||
|
|
||||||
|
This behaviour will then
|
||||||
|
propagate through to higher level standard library APIs that depend on
|
||||||
|
``os.urandom`` (specifically ``random.SystemRandom`` and the new ``secrets``
|
||||||
|
module introduced by PEP 506).
|
||||||
|
|
||||||
|
In all cases, as soon as a call to one of these security sensitive APIs
|
||||||
|
succeeds, all future calls to these APIs in that process will succeed (once
|
||||||
|
the operating system random number generator is ready after system boot, it
|
||||||
|
remains ready).
|
||||||
|
|
||||||
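As a rough illustration of the proposed semantics (not the actual C implementation), the following sketch uses ``os.getrandom()`` with ``os.GRND_NONBLOCK``, which is only available on Linux in Python 3.6+. The helper name ``urandom_nonblocking`` is hypothetical, and the fallback branch for platforms without the syscall is an assumption made so that the example runs everywhere.

```python
import os


def urandom_nonblocking(num_bytes):
    """Hypothetical helper approximating the proposed os.urandom() behaviour."""
    if hasattr(os, "getrandom"):
        chunks = []
        remaining = num_bytes
        while remaining > 0:
            # GRND_NONBLOCK makes the call raise BlockingIOError instead of
            # blocking while the system RNG is still uninitialized.
            chunk = os.getrandom(remaining, os.GRND_NONBLOCK)
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    # Assumption: platforms without getrandom() keep their current behaviour.
    return os.urandom(num_bytes)
```

The loop is needed because ``os.getrandom()`` may return fewer bytes than requested.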
|
|
||||||
|
Related changes
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Currently, SipHash initialization and ``random`` module initialization
|
||||||
|
both gather random bytes using the same code that underlies
|
||||||
|
``os.urandom``. This PEP proposes to modify these so that in situations where
|
||||||
|
``os.urandom`` would raise a ``BlockingIOError``, they automatically
|
||||||
|
fall back on potentially more predictable sources of randomness (and in the
|
||||||
|
SipHash case, print a warning message to ``stderr`` indicating that that
|
||||||
|
particular Python process should not be used to process untrusted data).
|
||||||
|
|
||||||
|
To transparently accommodate a potential future where Linux adopts the same
|
||||||
|
"potentially blocking during system initialization" ``/dev/urandom`` behaviour
|
||||||
|
used by other \*nix systems, this fallback source of randomness will *not* be
|
||||||
|
the ``/dev/urandom`` device.
|
||||||
|
|
||||||
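The fallback described above might look roughly like the following sketch for ``random`` module seeding. The helper name ``seed_bytes_with_fallback`` and the choice of clock values and process ID as the fallback source are assumptions for illustration only; the PEP does not specify the exact fallback, other than that it will not read ``/dev/urandom``.

```python
import os
import random
import time


def seed_bytes_with_fallback(num_bytes=32):
    """Return (seed, from_system_rng); hypothetical helper for illustration."""
    try:
        return os.urandom(num_bytes), True
    except BlockingIOError:
        # Assumed fallback: mix the clocks and the process ID. This is
        # predictable and must not be used for security sensitive purposes.
        fallback = "{}:{}:{}".format(
            time.time_ns(), time.monotonic_ns(), os.getpid()
        )
        return fallback.encode("ascii"), False


seed, from_system_rng = seed_bytes_with_fallback()
rng = random.Random(seed)  # random.Random accepts bytes as a seed
```

Returning a flag alongside the seed lets the caller decide whether a warning is appropriate, as proposed for the SipHash case.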
|
|
||||||
|
Limitations on scope
|
||||||
|
--------------------
|
||||||
|
|
||||||
No changes are proposed for Windows or Mac OS X systems, as neither of those
|
No changes are proposed for Windows or Mac OS X systems, as neither of those
|
||||||
platforms provides any mechanism to run Python code before the operating
|
platforms provides any mechanism to run Python code before the operating
|
||||||
system random number generator has been initialised. Mac OS X goes so far as
|
system random number generator has been initialized. Mac OS X goes so far as
|
||||||
to kernel panic and abort the boot process if it can't properly initialise the
|
to kernel panic and abort the boot process if it can't properly initialize the
|
||||||
random number generator (although Apple's restrictions on the supported
|
random number generator (although Apple's restrictions on the supported
|
||||||
hardware platforms make that exceedingly unlikely in practice).
|
hardware platforms make that exceedingly unlikely in practice).
|
||||||
|
|
||||||
Other \*nix systems that offer a non-blocking API for requesting random numbers
|
Similarly, no changes are proposed for other \*nix systems where
|
||||||
suitable for use in security sensitive applications could potentially receive
|
``os.urandom()`` will currently block waiting for the system random number
|
||||||
a similar update, but such changes are out of scope for this particular
|
generator to be initialized, rather than returning values that are potentially
|
||||||
proposal.
|
unsuitable for use in security sensitive applications.
|
||||||
|
|
||||||
|
While other \*nix systems that offer a non-blocking API for requesting random
|
||||||
|
numbers suitable for use in security sensitive applications could potentially
|
||||||
|
receive a similar update to the one proposed for Linux in this PEP, such
|
||||||
|
changes are out of scope for this particular proposal.
|
||||||
|
|
||||||
|
Python's behaviour on older Linux systems that do not offer the new
|
||||||
|
``getrandom()`` syscall will also remain unchanged.
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
Raising ``BlockingIOError`` in ``os.urandom()`` on Linux
|
||||||
|
--------------------------------------------------------
|
||||||
|
|
||||||
For several years now, the security community's guidance has been to use
|
For several years now, the security community's guidance has been to use
|
||||||
``os.urandom()`` (or the ``random.SystemRandom()`` wrapper) when implementing
|
``os.urandom()`` (or the ``random.SystemRandom()`` wrapper) when implementing
|
||||||
security sensitive operations in Python.
|
security sensitive operations in Python.
|
||||||
|
@ -76,78 +125,84 @@ However, this guidance has also come with a longstanding caveat: developers
|
||||||
writing security sensitive software at least for Linux, and potentially for
|
writing security sensitive software at least for Linux, and potentially for
|
||||||
some other \*BSD systems, may need to wait until the operating system's
|
some other \*BSD systems, may need to wait until the operating system's
|
||||||
random number generator is ready before relying on it for security sensitive
|
random number generator is ready before relying on it for security sensitive
|
||||||
operations.
|
operations. This generally only occurs if ``os.urandom()`` is read very
|
||||||
|
early in the system initialization process, or on systems with few sources of
|
||||||
|
available entropy (e.g. some kinds of virtualized or embedded systems), but
|
||||||
|
unfortunately the exact conditions that trigger this are difficult to predict,
|
||||||
|
and when it occurs there is no direct way for userspace to tell it has
|
||||||
|
happened without querying operating system specific interfaces.
|
||||||
|
|
||||||
Unfortunately, there's currently no clear indicator to developers that their
|
On \*BSD systems, encountering this situation means ``os.urandom()`` will block
|
||||||
software may not be working as expected when run early in the Linux boot
|
waiting for the system random number generator to be ready - the associated
|
||||||
process, or on hardware without good sources of entropy to seed the operating
|
symptom would be for the affected script to pause unexpectedly on the first
|
||||||
system's random number generator: due to the behaviour of the underlying
|
call to ``os.urandom()``.
|
||||||
``/dev/urandom`` device, ``os.urandom()`` on Linux returns a result either way,
|
|
||||||
and it takes extensive statistical analysis to show that a security
|
However, on Linux, in Python versions up to and including Python 3.4, and in
|
||||||
vulnerability exists.
|
Python 3.5 maintenance versions from Python 3.5.2 onwards, there's no clear
|
||||||
|
indicator to developers that their software may not be working as expected
|
||||||
|
when run early in the Linux boot process, or on hardware without good
|
||||||
|
sources of entropy to seed the operating system's random number generator: due
|
||||||
|
to the behaviour of the underlying ``/dev/urandom`` device, ``os.urandom()``
|
||||||
|
on Linux returns a result either way, and it takes extensive statistical
|
||||||
|
analysis to show that a security vulnerability exists.
|
||||||
|
|
||||||
By contrast, if ``BlockingIOError`` is raised in those situations, then
|
By contrast, if ``BlockingIOError`` is raised in those situations, then
|
||||||
developers can easily choose their desired behaviour:
|
developers using Python 3.6+ can easily choose their desired behaviour:
|
||||||
|
|
||||||
1. Loop until the call succeeds (security sensitive)
|
1. Loop until the call succeeds (security sensitive)
|
||||||
2. Switch to using the random module (non-security sensitive)
|
2. Switch to using the random module (non-security sensitive)
|
||||||
3. Switch to reading ``/dev/urandom`` directly (non-security sensitive)
|
3. Switch to reading ``/dev/urandom`` directly (non-security sensitive)
|
||||||
|
|
||||||
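Option 1 in the list above could be sketched as follows, assuming the behaviour proposed in this PEP. The helper name ``urandom_when_ready`` and its ``poll_interval`` and ``timeout`` parameters are hypothetical; sleeping between attempts is a design choice made here to avoid spinning the CPU while the system RNG initializes.

```python
import os
import time


def urandom_when_ready(num_bytes, poll_interval=0.1, timeout=None):
    """Hypothetical helper: retry until the system RNG is ready."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return os.urandom(num_bytes)
        except BlockingIOError:
            # Give up once the optional deadline has passed.
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(poll_interval)
```

On systems where the RNG is already initialized, the first call succeeds and the loop exits immediately.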
|
|
||||||
Why now?
|
Issuing a warning for potentially predictable internal hash initialization
|
||||||
--------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
The main reason is because the 3.5 SipHash initialisation bug causing a deadlock
|
The challenge for internal hash initialization is that it might be very
|
||||||
when attempting to run Python scripts during the Linux init process resulted in
|
important to initialize SipHash with a reliably unpredictable random seed
|
||||||
a rash of proposals to add *new* APIs like ``getrandom()``, ``urandom_block()``,
|
(for processes that are exposed to potentially hostile input) or it might be
|
||||||
``pseudorandom()`` and ``cryptorandom()`` to the ``os`` module and to start
|
totally unimportant (for processes that never have to deal with untrusted data).
|
||||||
trying to educate users on when they should call those APIs instead of
|
|
||||||
``os.urandom()``.
|
|
||||||
|
|
||||||
This is a *really* obscure problem, and we definitely shouldn't clutter up the
|
The Python runtime has no way to know which case a given invocation involves,
|
||||||
standard library with new APIs without a compelling reason, especially with the
|
which means that if we allow SipHash initialization to block or error out,
|
||||||
``secrets`` module already being added as the "use this and don't worry about
|
then our intended security enhancement may break code that is already safe
|
||||||
the low level details" for developers that don't need to worry about versions
|
and working fine, which is unacceptable -- especially since we are reasonably
|
||||||
prior to Python 3.6.
|
confident that most Python invocations that might run during Linux system
|
||||||
|
initialization fall into this category (exposure to untrusted input tends to
|
||||||
|
involve network access, which typically isn't brought up until after the system
|
||||||
|
random number generator is initialized).
|
||||||
|
|
||||||
However, it's also the case that low cost ARM devices are becoming increasingly
|
However, at the same time, since Python has no way to know whether any given
|
||||||
prevalent, with a lot of them running Linux, and a lot of folks writing
|
invocation needs to handle untrusted data, when the default SipHash
|
||||||
Python applications that run on those devices. That creates an opportunity to
|
initialization fails this *might* indicate a genuine security problem, which
|
||||||
take an obscure security problem that requires a lot of knowledge about
|
should not be allowed to pass silently.
|
||||||
Linux boot processes and secure random number generation and turn it into a
|
|
||||||
relatively mundane and easy-to-find-in-an-internet-search runtime exception.
|
Accordingly, if internal hash initialization needs to fall back to a potentially
|
||||||
|
predictable seed due to the system random number generator not being ready, it
|
||||||
|
will also emit a warning message on ``stderr`` to say that the system random
|
||||||
|
number generator is not available and that processing potentially hostile
|
||||||
|
untrusted data should be avoided.
|
||||||
|
|
||||||
|
|
||||||
Background
|
Allowing potentially predictable ``random`` module initialization
|
||||||
==========
|
-----------------------------------------------------------------
|
||||||
|
|
||||||
On operating systems other than Linux, ``os.urandom()`` may already block
|
Other than for ``random.SystemRandom`` (which is a relatively thin
|
||||||
waiting for the operating system's random number generator to be ready.
|
wrapper around ``os.urandom``), the ``random`` module has never made
|
||||||
|
any guarantees that the numbers it generates are suitable for use in
|
||||||
|
security sensitive operations, so the use of the system random number
|
||||||
|
generator to seed the default Mersenne Twister instance is mainly beneficial
|
||||||
|
as a harm mitigation measure for code that is using the ``random`` module
|
||||||
|
inappropriately.
|
||||||
|
|
||||||
On Linux, even when the operating system's random number generator doesn't
|
Since a single call to ``os.urandom()`` is cheap once the system random
|
||||||
consider itself ready for use in security sensitive operations, it will return
|
number generator has been initialized it makes sense to retain that as the
|
||||||
random values based on the entropy it has available.
|
default behaviour, but there's no need to issue a warning when falling back to
|
||||||
|
a potentially more predictable alternative when necessary (in such cases,
|
||||||
This behaviour is potentially problematic, so Linux 3.17 added a new
|
a warning will typically already have been issued as part of interpreter
|
||||||
``getrandom()`` syscall that (amongst other benefits) allows callers to
|
startup, as the only way for the call when importing the random module to
|
||||||
either block waiting for the random number generator to be ready, or
|
fail without the implicit call during interpreter startup also failing is for
|
||||||
else request an error return if the random number generator is not ready.
|
the latter to have been skipped by entirely disabling the hash randomization
|
||||||
Notably, the new API does *not* support the old behaviour of returning
|
mechanism).
|
||||||
data that is not suitable for security sensitive use cases.
|
|
||||||
|
|
||||||
Versions of Python up to and including Python 3.4 access the
|
|
||||||
Linux ``/dev/urandom`` device directly.
|
|
||||||
|
|
||||||
Python 3.5.0 and 3.5.1 called ``getrandom()`` in blocking mode in order to
|
|
||||||
avoid the use of a file descriptor to access ``/dev/urandom``. While there
|
|
||||||
were no specific problems reported due to ``os.urandom()`` blocking in user
|
|
||||||
code, there *were* problems due to CPython implicitly invoking the blocking
|
|
||||||
behaviour during interpreter startup.
|
|
||||||
|
|
||||||
Rather than trying to decouple SipHash initialisation from the
|
|
||||||
``os.urandom()`` implementation, Python 3.5.2 switched to calling
|
|
||||||
``getrandom()`` in non-blocking mode, and falling back to reading from
|
|
||||||
``/dev/urandom`` if the syscall indicates it will block.
|
|
||||||
|
|
||||||
|
|
||||||
Backwards Compatibility Impact Assessment
|
Backwards Compatibility Impact Assessment
|
||||||
|
@ -172,8 +227,8 @@ sensitive operations: historically it would return potentially predictable
|
||||||
random data, with this PEP it would change to raise ``BlockingIOError``.
|
random data, with this PEP it would change to raise ``BlockingIOError``.
|
||||||
|
|
||||||
Developers of affected applications would then be required to make one of the
|
Developers of affected applications would then be required to make one of the
|
||||||
following changes to forward compatibility with Python 3.6, based on the kind
|
following changes to gain forward compatibility with Python 3.6, based on the
|
||||||
of application they're developing.
|
kind of application they're developing.
|
||||||
|
|
||||||
|
|
||||||
Unaffected Applications
|
Unaffected Applications
|
||||||
|
@ -186,6 +241,11 @@ regardless of whether or not they perform security sensitive operations:
|
||||||
- applications that are only run on desktops or conventional servers
|
- applications that are only run on desktops or conventional servers
|
||||||
- applications that are only run after the system RNG is ready
|
- applications that are only run after the system RNG is ready
|
||||||
|
|
||||||
|
Applications in this category simply won't encounter the new exception, so it
|
||||||
|
will be reasonable for developers to wait and see if they receive
|
||||||
|
Python 3.6 compatibility bugs related to the new runtime behaviour, rather than
|
||||||
|
attempting to pre-emptively determine whether or not they're affected.
|
||||||
|
|
||||||
|
|
||||||
Affected security sensitive applications
|
Affected security sensitive applications
|
||||||
----------------------------------------
|
----------------------------------------
|
||||||
|
@ -203,31 +263,350 @@ change their code to busy loop until the operating system is ready::
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
Affected non-security sensitive applications
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
Non-security sensitive applications that don't want to assume access to
|
||||||
|
``/dev/urandom`` (or assume a non-blocking implementation of that device)
|
||||||
|
can be updated to use the ``random`` module as a fallback option::
|
||||||
|
|
||||||
|
def pseudorandom_fallback(num_bytes):
|
||||||
|
    try:
|
||||||
|
        return os.urandom(num_bytes)
|
||||||
|
    except BlockingIOError:
|
||||||
|
        return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")
|
||||||
|
|
||||||
|
Depending on the application, it may also be appropriate to skip accessing
|
||||||
|
``os.urandom`` at all, and instead rely solely on the ``random`` module.
|
||||||
|
|
||||||
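For applications that skip ``os.urandom`` entirely, a minimal sketch might look like this. The helper name ``pseudorandom_bytes`` is hypothetical, and the explicit zero-length check is an assumption added because ``getrandbits(0)`` is not accepted by every Python version.

```python
import random


def pseudorandom_bytes(num_bytes):
    """Return num_bytes of non-cryptographic pseudorandom data."""
    if num_bytes == 0:
        return b""
    # Not suitable for security sensitive use: Mersenne Twister output
    # is predictable once enough of it has been observed.
    return random.getrandbits(num_bytes * 8).to_bytes(num_bytes, "little")
```

This never touches the system random number generator after the ``random`` module has been imported, so it cannot raise ``BlockingIOError``.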
|
|
||||||
Affected Linux specific non-security sensitive applications
|
Affected Linux specific non-security sensitive applications
|
||||||
-----------------------------------------------------------
|
-----------------------------------------------------------
|
||||||
|
|
||||||
Non-security sensitive applications that don't need to worry about cross
|
Non-security sensitive applications that don't need to worry about cross
|
||||||
platform compatibility can be updated to access ``/dev/urandom`` directly::
|
platform compatibility and are willing to assume that ``/dev/urandom`` on
|
||||||
|
Linux will always retain its current behaviour can be updated to access
|
||||||
|
``/dev/urandom`` directly::
|
||||||
|
|
||||||
def dev_urandom(num_bytes):
|
def dev_urandom(num_bytes):
|
||||||
    with open("/dev/urandom", "rb") as f:
|
    with open("/dev/urandom", "rb") as f:
|
||||||
        return f.read(num_bytes)
|
        return f.read(num_bytes)
|
||||||
|
|
||||||
|
However, pursuing this option has the downside of helping to ensure
|
||||||
|
that the default behaviour of Linux at the operating system level can never
|
||||||
|
be changed.
|
||||||
|
|
||||||
Affected portable non-security sensitive applications
|
|
||||||
-----------------------------------------------------
|
|
||||||
|
|
||||||
Non-security sensitive applications that don't want to assume access to
|
Additional Background
|
||||||
``/dev/urandom`` can be updated to use the ``random`` module instead::
|
=====================
|
||||||
|
|
||||||
def pseudorandom(num_bytes):
|
Why propose this now?
|
||||||
    return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")
|
---------------------
|
||||||
|
|
||||||
|
The main reason is that the Python 3.5.0 release switched to using the new
|
||||||
|
Linux ``getrandom()`` syscall when available in order to avoid consuming a
|
||||||
|
file descriptor [1]_, and this had the side effect of making the following
|
||||||
|
operations block waiting for the system random number generator to be ready:
|
||||||
|
|
||||||
|
* ``os.urandom`` (and APIs that depend on it)
|
||||||
|
* importing the ``random`` module
|
||||||
|
* initializing the randomized hash algorithm used by some builtin types
|
||||||
|
|
||||||
|
While the first of those behaviours is arguably desirable (and consistent with
|
||||||
|
``os.urandom``'s existing behaviour on other operating systems), the latter two
|
||||||
|
behaviours are unnecessary and undesirable, and the last one is now known to
|
||||||
|
cause a system level deadlock when attempting to run Python scripts during the
|
||||||
|
Linux init process with Python 3.5.0 or 3.5.1 [2]_, while the second one can
|
||||||
|
cause problems when using virtual machines without robust entropy sources
|
||||||
|
configured [3]_.
|
||||||
|
|
||||||
|
Since decoupling these behaviours in CPython will involve a number of
|
||||||
|
implementation changes more appropriate for a feature release than a maintenance
|
||||||
|
release, the relatively simple resolution applied in Python 3.5.2 was to revert
|
||||||
|
all three of them to a behaviour similar to that of previous Python versions:
|
||||||
|
if the new Linux syscall indicates it will block, then Python 3.5.2 will
|
||||||
|
implicitly fall back on reading ``/dev/urandom`` directly [4]_.
|
||||||
|
|
||||||
|
However, this bug report *also* resulted in a range of proposals to add *new*
|
||||||
|
APIs like ``os.getrandom()`` [5]_, ``os.urandom_block()`` [6]_,
|
||||||
|
``os.pseudorandom()`` and ``os.cryptorandom()`` [7]_, or adding new optional
|
||||||
|
parameters to ``os.urandom()`` itself [8]_, and then attempting to educate
|
||||||
|
users on when they should call those APIs instead of just using a plain
|
||||||
|
``os.urandom()`` call.
|
||||||
|
|
||||||
|
These proposals represent dramatic overreactions, as the question of reliably
|
||||||
|
obtaining random numbers suitable for security sensitive work on Linux is a
|
||||||
|
relatively obscure problem of interest mainly to operating system developers
|
||||||
|
and embedded systems programmers, that in no way justifies cluttering up the
|
||||||
|
Python standard library's cross-platform APIs with new Linux-specific concerns.
|
||||||
|
This is especially so with the ``secrets`` module already being added as the
|
||||||
|
"use this and don't worry about the low level details" option for developers
|
||||||
|
writing security sensitive software that for some reason can't rely on even
|
||||||
|
higher level domain specific APIs (like web frameworks) and also don't need to
|
||||||
|
worry about Python versions prior to Python 3.6.
|
||||||
|
|
||||||
|
That said, it's also the case that low cost ARM devices are becoming
|
||||||
|
increasingly prevalent, with a lot of them running Linux, and a lot of folks
|
||||||
|
writing Python applications that run on those devices. That creates an
|
||||||
|
opportunity to take an obscure security problem that currently requires a lot
|
||||||
|
of knowledge about Linux boot processes and provably unpredictable random
|
||||||
|
number generation to diagnose and resolve, and instead turn it into a
|
||||||
|
relatively mundane and easy-to-find-in-an-internet-search runtime exception.
|
||||||
|
|
||||||
|
|
||||||
|
The cross-platform behaviour of ``os.urandom()``
|
||||||
|
------------------------------------------------
|
||||||
|
|
||||||
|
On operating systems other than Linux, ``os.urandom()`` may already block
|
||||||
|
waiting for the operating system's random number generator to be ready. This
|
||||||
|
will happen at most once in the lifetime of the process, and the call is
|
||||||
|
subsequently guaranteed to be non-blocking.
|
||||||
|
|
||||||
|
Linux is unique in that, even when the operating system's random number
|
||||||
|
generator doesn't consider itself ready for use in security sensitive
|
||||||
|
operations, reading from the ``/dev/urandom`` device will return random values
|
||||||
|
based on the entropy it has available.
|
||||||
|
|
||||||
|
This behaviour is potentially problematic, so Linux 3.17 added a new
|
||||||
|
``getrandom()`` syscall that (amongst other benefits) allows callers to
|
||||||
|
either block waiting for the random number generator to be ready, or
|
||||||
|
else request an error return if the random number generator is not ready.
|
||||||
|
Notably, the new API does *not* support the old behaviour of returning
|
||||||
|
data that is not suitable for security sensitive use cases.
|
||||||
|
|
||||||
|
Versions of Python up to and including Python 3.4 access the
|
||||||
|
Linux ``/dev/urandom`` device directly.
|
||||||
|
|
||||||
|
Python 3.5.0 and 3.5.1 called ``getrandom()`` in blocking mode in order to
|
||||||
|
avoid the use of a file descriptor to access ``/dev/urandom``. While there
|
||||||
|
were no specific problems reported due to ``os.urandom()`` blocking in user
|
||||||
|
code, there *were* problems due to CPython implicitly invoking the blocking
|
||||||
|
behaviour during interpreter startup and when importing the ``random`` module.
|
||||||
|
|
||||||
|
Rather than trying to decouple SipHash initialization from the
|
||||||
|
``os.urandom()`` implementation, Python 3.5.2 switched to calling
|
||||||
|
``getrandom()`` in non-blocking mode, and falling back to reading from
|
||||||
|
``/dev/urandom`` if the syscall indicates it will block.
|
||||||
|
|
||||||
|
As a result of the above, ``os.urandom()`` in all Python versions up to and
|
||||||
|
including Python 3.5 propagates the behaviour of the underlying ``/dev/urandom``
|
||||||
|
device to Python code.
|
||||||
|
|
||||||
|
|
||||||
|
Problems with the behaviour of ``/dev/urandom`` on Linux
|
||||||
|
--------------------------------------------------------
|
||||||
|
|
||||||
|
The Python ``os`` module has largely co-evolved with Linux APIs, so having
|
||||||
|
``os`` module functions closely follow the behaviour of their Linux operating
|
||||||
|
system level counterparts when running on Linux is typically considered to be
|
||||||
|
a desirable feature.
|
||||||
|
|
||||||
|
However, ``/dev/urandom`` represents a case where the current behaviour is
|
||||||
|
acknowledged to be problematic, but fixing it unilaterally at the kernel level
|
||||||
|
has been shown to prevent some Linux distributions from booting (at least in
|
||||||
|
part due to components like Python currently using it for
|
||||||
|
non-security-sensitive purposes early in the system initialization process).
|
||||||
|
|
||||||
|
As an analogy, consider the following two functions::
|
||||||
|
|
||||||
|
def generate_example_password():
|
||||||
|
    """Generates passwords solely for use in code examples"""
|
||||||
|
    return generate_unpredictable_password()
|
||||||
|
|
||||||
|
def generate_actual_password():
|
||||||
|
    """Generates actual passwords for use in real applications"""
|
||||||
|
    return generate_unpredictable_password()
|
||||||
|
|
||||||
|
If you think of an operating system's random number generator as a method for
|
||||||
|
generating unpredictable, secret passwords, then you can think of Linux's
|
||||||
|
``/dev/urandom`` as being implemented like::
|
||||||
|
|
||||||
|
# Oversimplified artist's conception of the kernel code
|
||||||
|
# implementing /dev/urandom
|
||||||
|
def generate_unpredictable_password():
|
||||||
|
    if system_rng_is_ready:
|
||||||
|
        return use_system_rng_to_generate_password()
|
||||||
|
    else:
|
||||||
|
        # we can't make an unpredictable password; silently return a
|
||||||
|
        # potentially predictable one instead:
|
||||||
|
        return "p4ssw0rd"
|
||||||
|
|
||||||
|
In this scenario, the author of ``generate_example_password`` is fine - even if
|
||||||
|
``"p4ssw0rd"`` shows up a bit more often than they expect, it's only used in
|
||||||
|
examples anyway. However, the author of ``generate_actual_password`` has a
|
||||||
|
problem - how do they prove that their calls to
|
||||||
|
``generate_unpredictable_password`` never follow the path that returns a
|
||||||
|
predictable answer?
|
||||||
|
|
||||||
|
In real life it's slightly more complicated than this, because there
|
||||||
|
might be some level of system entropy available -- so the fallback might
|
||||||
|
be more like ``return random.choice(["p4ssword", "passw0rd",
|
||||||
|
"p4ssw0rd"])`` or something even more variable and hence only statistically
|
||||||
|
predictable with better odds than the author of ``generate_actual_password``
|
||||||
|
was expecting. This doesn't really make things more provably secure, though;
|
||||||
|
mostly it just means that if you try to catch the problem in the obvious way --
|
||||||
|
``if returned_password == "p4ssw0rd": raise UhOh`` -- then it doesn't work,
|
||||||
|
because ``returned_password`` might instead be ``p4ssword`` or even
|
||||||
|
``pa55word``, or just an arbitrary 64 bit sequence selected from fewer than
|
||||||
|
2**64 possibilities. So this rough sketch does give the right general idea of
|
||||||
|
the consequences of the "more predictable than expected" fallback behaviour,
|
||||||
|
even though it's thoroughly unfair to the Linux kernel team's efforts to
|
||||||
|
mitigate the practical consequences of this problem without resorting to
|
||||||
|
breaking backwards compatibility.
|
||||||
|
|
||||||
|
This design is generally agreed to be a bad idea. As far as we can
|
||||||
|
tell, there are no use cases whatsoever in which this is the behavior
|
||||||
|
you actually want. It has led to the use of insecure ``ssh`` keys on
|
||||||
|
real systems, and many \*nix-like systems (including at least Mac OS
|
||||||
|
X, OpenBSD, and FreeBSD) have modified their ``/dev/urandom``
|
||||||
|
implementations so that they never return predictable outputs, either
|
||||||
|
by making reads block in this case, or by simply refusing to run any
|
||||||
|
userspace programs until the system RNG has been
|
||||||
|
initialized. Unfortunately, Linux has so far been unable to follow
|
||||||
|
suit, because it's been empirically determined that enabling the
|
||||||
|
blocking behavior causes some currently extant distributions to
|
||||||
|
fail to boot.
|
||||||
|
|
||||||
|
Instead, the new ``getrandom()`` syscall was introduced, making
|
||||||
|
it *possible* for userspace applications to access the system random number
|
||||||
|
generator safely, without introducing hard to debug deadlock problems into
|
||||||
|
the system initialization processes of existing Linux distros.
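
As a rough illustration of what the syscall makes possible, the sketch below uses ``os.getrandom()``, the wrapper that later Python versions (3.6+) expose on Linux. The helper name ``try_getrandom`` is hypothetical, and treating every other ``OSError`` as "unavailable" is a simplifying assumption rather than a recommendation.

```python
import os


def try_getrandom(num_bytes):
    """Return kernel random bytes, or None if they cannot be obtained safely.

    Hypothetical helper: the name and the None-on-failure convention are
    assumptions made for this example only.
    """
    if not hasattr(os, "getrandom"):
        # Not Linux, or a Python version without the os.getrandom() wrapper.
        return None
    try:
        # GRND_NONBLOCK asks the kernel to fail immediately instead of
        # waiting when the RNG has not been initialized yet.
        return os.getrandom(num_bytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        # The kernel RNG is not ready: no predictable data is returned.
        return None
    except OSError:
        # For example, a kernel older than 3.17 that lacks the syscall.
        return None


data = try_getrandom(16)
print("kernel RNG ready" if data is not None else "no safe random data")
```

Calling ``os.getrandom(num_bytes)`` without the flag would instead block until the kernel RNG has been initialized. In neither mode does the caller receive predictable data.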
|
||||||
|
|
||||||
|
|
||||||
|
Consequences of ``getrandom()`` availability for Python
|
||||||
|
-------------------------------------------------------
|
||||||
|
|
||||||
|
Prior to the introduction of the ``getrandom()`` syscall, it simply wasn't
|
||||||
|
feasible to access the Linux system random number generator in a provably
|
||||||
|
safe way, so we were forced to settle for reading from ``/dev/urandom`` as the
|
||||||
|
best available option. However, with ``getrandom()`` insisting on raising an
|
||||||
|
error or blocking rather than returning predictable data, as well as having
|
||||||
|
other advantages, it is now the recommended method for accessing the kernel
|
||||||
|
RNG on Linux, with reading ``/dev/urandom`` directly relegated to "legacy"
|
||||||
|
status. This moves Linux into the same category as other operating systems
|
||||||
|
like Windows, which doesn't provide a ``/dev/urandom`` device at all: the
|
||||||
|
best available option for implementing ``os.urandom()`` is no longer simply
|
||||||
|
reading bytes from the ``/dev/urandom`` device.
|
||||||
|
|
||||||
|
This means that what used to be somebody else's problem (the Linux kernel
|
||||||
|
development team's) is now Python's problem -- given a way to detect that the
|
||||||
|
system RNG is not initialized, we have to choose how to handle this
|
||||||
|
situation whenever we try to use the system RNG.
|
||||||
|
|
||||||
|
It could simply block, as was somewhat inadvertently implemented in 3.5.0::
|
||||||
|
|
||||||
|
    # artist's impression of the CPython 3.5.0-3.5.1 behavior
    def generate_unpredictable_bytes_or_block(num_bytes):
        while not system_rng_is_ready:
            wait
        return unpredictable_bytes(num_bytes)
|
||||||
|
|
||||||
|
Or it could raise an error, as this PEP proposes (in *some* cases)::
|
||||||
|
|
||||||
|
    # artist's impression of the behavior proposed in this PEP
    def generate_unpredictable_bytes_or_raise(num_bytes):
        if system_rng_is_ready:
            return unpredictable_bytes(num_bytes)
        else:
            raise BlockingIOError
|
||||||
|
|
||||||
|
Or it could explicitly emulate the ``/dev/urandom`` fallback behavior,
|
||||||
|
as was implemented in 3.5.2rc1 and is expected to remain for the rest
|
||||||
|
of the 3.5.x cycle::
|
||||||
|
|
||||||
|
    # artist's impression of the CPython 3.5.2rc1+ behavior
    def generate_unpredictable_bytes_or_maybe_not(num_bytes):
        if system_rng_is_ready:
            return unpredictable_bytes(num_bytes)
        else:
            return (b"p4ssw0rd" * (num_bytes // 8 + 1))[:num_bytes]
|
||||||
|
|
||||||
|
(And the same caveats apply to this sketch as applied to the
|
||||||
|
``generate_unpredictable_password`` sketch of ``/dev/urandom`` above.)
|
||||||
|
|
||||||
|
There are five places where CPython and the standard library attempt to use the
|
||||||
|
operating system's random number generator, and thus five places where this
|
||||||
|
decision has to be made:
|
||||||
|
|
||||||
|
* initializing the SipHash used to protect ``str.__hash__`` and
|
||||||
|
friends against DoS attacks (called unconditionally at startup)
|
||||||
|
* initializing the ``random`` module (called when ``random`` is
|
||||||
|
imported)
|
||||||
|
* servicing user calls to the ``os.urandom`` public API
|
||||||
|
* the higher level ``random.SystemRandom`` public API
|
||||||
|
* the new ``secrets`` module public API added by PEP 506
|
||||||
|
|
||||||
|
Currently, these five places all use the same underlying code, and
|
||||||
|
thus make this decision in the same way.
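
For reference, the three public entry points in that list look like this from user code. This is only a usage sketch. The variable names are illustrative, and ``secrets`` requires Python 3.6 or later.

```python
import os
import random
import secrets

# Direct access to the operating system's random number generator.
raw_bytes = os.urandom(16)

# Higher level API drawing every value from the OS source.
system_rng = random.SystemRandom()
pin = system_rng.randrange(10**6)

# PEP 506 convenience API intended for tokens and similar secrets.
token = secrets.token_hex(16)

print(len(raw_bytes), pin, token)
```

All three draw from the same operating system source, which is why a change to the shared underlying code affects them together.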
|
||||||
|
|
||||||
|
This whole problem was first noticed because 3.5.0 switched that
|
||||||
|
underlying code to the ``generate_unpredictable_bytes_or_block`` behavior,
|
||||||
|
and it turns out that there are some rare cases where Linux boot
|
||||||
|
scripts attempted to run a Python program as part of system initialization, the
|
||||||
|
Python startup sequence blocked while trying to initialize SipHash,
|
||||||
|
and then this triggered a deadlock because the system stopped doing
|
||||||
|
anything -- including gathering new entropy -- until the Python script
|
||||||
|
was forcibly terminated by an external timer. This is particularly unfortunate
|
||||||
|
since the scripts in question never processed untrusted input, so there was no
|
||||||
|
need for SipHash to be initialized with provably unpredictable random data in
|
||||||
|
the first place. This motivated the change in 3.5.2rc1 to emulate the old
|
||||||
|
``/dev/urandom`` behavior in all cases (by calling ``getrandom()`` in
|
||||||
|
non-blocking mode, and then falling back to reading ``/dev/urandom``
|
||||||
|
if the syscall indicates that the ``/dev/urandom`` pool is not yet
|
||||||
|
fully initialized).
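
The fallback described above can be approximated in pure Python as follows. This is a sketch of the idea, not the actual C implementation in CPython. The function name ``urandom_with_legacy_fallback`` is hypothetical, and the code assumes a POSIX system with a ``/dev/urandom`` device and a small request size.

```python
import os


def urandom_with_legacy_fallback(num_bytes):
    """Emulate the 3.5.2rc1 approach: try getrandom(), else read the device.

    Hypothetical helper for illustration. Like the behaviour it imitates,
    it may return predictable data if the kernel RNG is not yet initialized.
    """
    if hasattr(os, "getrandom"):
        try:
            # Non-blocking call: raises BlockingIOError if the RNG is not
            # ready. Small requests are returned in full once it is.
            return os.getrandom(num_bytes, os.GRND_NONBLOCK)
        except BlockingIOError:
            pass  # RNG not initialized: fall through to /dev/urandom
        except OSError:
            pass  # syscall unavailable: fall through to /dev/urandom
    # Legacy path: on Linux this never blocks, even before initialization.
    with open("/dev/urandom", "rb", buffering=0) as device:
        return device.read(num_bytes)


print(urandom_with_legacy_fallback(16).hex())
```

The caller cannot tell which path produced the result, which is exactly the property that makes this behaviour hard to audit.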
|
||||||
|
|
||||||
|
A similar problem was found due to the ``random`` module calling
|
||||||
|
``os.urandom`` as a side-effect of import in order to seed the default
|
||||||
|
global ``random.Random()`` instance.
|
||||||
|
|
||||||
|
We have not received any specific complaints regarding direct calls to
|
||||||
|
``os.urandom()`` or ``random.SystemRandom()`` blocking with 3.5.0 or 3.5.1 -
|
||||||
|
only problem reports due to the implicit blocking on interpreter startup and
|
||||||
|
as a side-effect of importing the random module.
|
||||||
|
|
||||||
|
Accordingly, this PEP proposes providing consistent shared behaviour for the
|
||||||
|
latter three cases (ensuring that their behaviour is unequivocally suitable for
|
||||||
|
all security sensitive operations), while updating the first two cases to
|
||||||
|
account for that behavioural change.
|
||||||
|
|
||||||
|
This approach should mean that the vast majority of Python users never need to
|
||||||
|
even be aware that this change was made, while those few whom it affects will
|
||||||
|
receive an exception at runtime that they can look up online and find suitable
|
||||||
|
guidance on addressing.
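
One possible way for affected code to respond is sketched below. It assumes the behaviour proposed in this PEP, under which ``secrets.token_bytes()`` may raise ``BlockingIOError``. On interpreters that never raise it, the ``except`` branch is simply never taken. The helper name, retry count, and delay are illustrative assumptions.

```python
import secrets
import time


def token_with_retry(num_bytes=32, attempts=5, delay=0.1):
    """Wait briefly for the system RNG instead of failing at once.

    Hypothetical helper: the retry policy is an assumption for this example.
    """
    for _ in range(attempts):
        try:
            return secrets.token_bytes(num_bytes)
        except BlockingIOError:
            # Only reachable under the behaviour proposed in this PEP:
            # the system RNG is not ready yet, so pause and try again.
            time.sleep(delay)
    raise RuntimeError("system RNG still not ready after waiting")


session_key = token_with_retry()
print(len(session_key))
```

An application that must not wait could instead let the exception propagate, making the failure visible at the point where the secret was requested.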
|
||||||
|
|
||||||
|
|
||||||
References
|
References
|
||||||
==========
|
==========
|
||||||
|
|
||||||
* Victor's summary: http://haypo-notes.readthedocs.io/pep_random.html
|
.. [1] os.urandom() should use Linux 3.17 getrandom() syscall
|
||||||
|
(http://bugs.python.org/issue22181)
|
||||||
|
|
||||||
|
.. [2] Python 3.5 running on Linux kernel 3.17+ can block at startup or on
|
||||||
|
importing the random module on getrandom()
|
||||||
|
(http://bugs.python.org/issue26839)
|
||||||
|
|
||||||
|
.. [3] "import random" blocks on entropy collection on Linux with low entropy
|
||||||
|
(http://bugs.python.org/issue25420)
|
||||||
|
|
||||||
|
.. [4] os.urandom() doesn't block on Linux anymore
|
||||||
|
(https://hg.python.org/cpython/rev/9de508dc4837)
|
||||||
|
|
||||||
|
.. [5] Proposal to add os.getrandom()
|
||||||
|
(http://bugs.python.org/issue26839#msg267803)
|
||||||
|
|
||||||
|
.. [6] Add os.urandom_block()
|
||||||
|
(http://bugs.python.org/issue27250)
|
||||||
|
|
||||||
|
.. [7] Add random.cryptorandom() and random.pseudorandom, deprecate os.urandom()
|
||||||
|
(http://bugs.python.org/issue27279)
|
||||||
|
|
||||||
|
.. [8] Always use getrandom() in os.random() on Linux and add
|
||||||
|
block=False parameter to os.urandom()
|
||||||
|
(http://bugs.python.org/issue27266)
|
||||||
|
|
||||||
|
For additional background details beyond those captured in this PEP, also see
|
||||||
|
Victor Stinner's summary at http://haypo-notes.readthedocs.io/pep_random.html
|
||||||
|
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
=========
|
=========
|
||||||
|
|