561 lines
18 KiB
Plaintext
561 lines
18 KiB
Plaintext
PEP: 524
|
|
Title: Make os.urandom() blocking on Linux
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Victor Stinner <victor.stinner@gmail.com>
|
|
Status: Draft
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 20-June-2016
|
|
Python-Version: 3.6
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
Modify ``os.urandom()`` to block on Linux 3.17 and newer until the OS
|
|
urandom is initialized to increase the security.
|
|
|
|
Add also a new ``os.getrandom()`` function (for Linux and Solaris) to be
|
|
able to choose how to handle when ``os.urandom()`` is going to block on
|
|
Linux.
|
|
|
|
|
|
The bug
|
|
=======
|
|
|
|
Original bug
|
|
------------
|
|
|
|
Python 3.5.0 was enhanced to use the new ``getrandom()`` syscall
|
|
introduced in Linux 3.17 and Solaris 11.3. The problem is that users
|
|
started to complain that Python 3.5 blocks at startup on Linux in
|
|
virtual machines and embedded devices: see issues `#25420
|
|
<http://bugs.python.org/issue25420>`_ and `#26839
|
|
<http://bugs.python.org/issue26839>`_.
|
|
|
|
On Linux, ``getrandom(0)`` blocks until the kernel initialized urandom
|
|
with 128 bits of entropy. The issue #25420 describes a Linux build
|
|
platform blocking at ``import random``. The issue #26839 describes a
|
|
short Python script used to compute a MD5 hash, systemd-cron, script
|
|
called very early in the init process. The system initialization blocks
|
|
on this script which blocks on ``getrandom(0)`` to initialize Python.
|
|
|
|
The Python initilization requires random bytes to implement a
|
|
counter-measure against the hash denial-of-service (hash DoS), see:
|
|
|
|
* `Issue #13703: Hash collision security issue
|
|
<http://bugs.python.org/issue13703>`_
|
|
* `PEP 456: Secure and interchangeable hash algorithm
|
|
<https://www.python.org/dev/peps/pep-0456/>`_
|
|
|
|
Importing the ``random`` module creates an instance of
|
|
``random.Random``: ``random._inst``. On Python 3.5, random.Random
|
|
constructor reads 2500 bytes from ``os.urandom()`` to seed a Mersenne
|
|
Twister RNG (random number generator).
|
|
|
|
Other platforms may be affected by this bug, but in practice, only Linux
|
|
systems use Python scripts to initialize the system.
|
|
|
|
|
|
Status in Python 3.5.2
|
|
----------------------
|
|
|
|
Python 3.5.2 behaves like Python 2.7 and Python 3.4. If the system
|
|
urandom is not initialized, the startup does not block, but
|
|
``os.urandom()`` can return low-quality entropy (even it is not easily
|
|
guessable).
|
|
|
|
|
|
Use Cases
|
|
=========
|
|
|
|
The following use cases are used to help to choose the right compromise
|
|
between security and practicability.
|
|
|
|
|
|
Use Case 1: init script
|
|
-----------------------
|
|
|
|
Use a Python 3 script to initialize the system, like systemd-cron. If
|
|
the script blocks, the system initialize is stuck too. The issue #26839
|
|
is a good example of this use case.
|
|
|
|
Use case 1.1: No secret needed
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If the init script doesn't have to generate any secure secret, this use
|
|
case is already handled correctly in Python 3.5.2: Python startup
|
|
doesn't block on system urandom anymore.
|
|
|
|
Use case 1.2: Secure secret required
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
If the init script has to generate a secure secret, there is no safe
|
|
solution.
|
|
|
|
Falling back to weak entropy is not acceptable, it would
|
|
reduce the security of the program.
|
|
|
|
Python cannot produce itself secure entropy, it can only wait until
|
|
system urandom is initialized. But in this use case, the whole system
|
|
initialization is blocked by this script, so the system fails to boot.
|
|
|
|
The real answer is that the system iniitalization must not be blocked by
|
|
such script. It is ok to start the script very early at system
|
|
initialization, but the script may blocked a few seconds until it is
|
|
able to generate the secret.
|
|
|
|
Reminder: in some cases, the initialization of the system urandom never
|
|
occurs and so programs waiting for system urandom blocks forever.
|
|
|
|
|
|
Use Case 2: Web server
|
|
----------------------
|
|
|
|
Run a Python 3 web server serving web pages using HTTP and HTTPS
|
|
protocols. The server is started as soon as possible.
|
|
|
|
The first target of the hash DoS attack was web server: it's important
|
|
that the hash secret cannot be easily guessed by an attacker.
|
|
|
|
If serving a web page needs a secret to create a cookie, create an
|
|
encryption key, ..., the secret must be created with good entropy:
|
|
again, it must be hard to guess the secret.
|
|
|
|
A web server requires security. If a choice must be made between
|
|
security and running the server with weak entropy, security is more
|
|
important. If there is no good entropy: the server must block or fail
|
|
with an error.
|
|
|
|
The question is if it makes sense to start a web server on a host before
|
|
system urandom is initialized.
|
|
|
|
The issues #25420 and #26839 are restricted to the Python startup, not
|
|
to generate a secret before the system urandom is initialized.
|
|
|
|
|
|
Fix system urandom
|
|
==================
|
|
|
|
Load entropy from disk at boot
|
|
-------------------------------
|
|
|
|
Collecting entropy can take up to several minutes. To accelerate the
|
|
system initialization, operating systems store entropy on disk at
|
|
shutdown, and then reload entropy from disk at the boot.
|
|
|
|
If a system collects enough entropy at least once, the system urandom
|
|
will be initialized quickly, as soon as the entropy is reloaded from
|
|
disk.
|
|
|
|
|
|
Virtual machines
|
|
----------------
|
|
|
|
Virtual machines don't have a direct access to the hardware and so have
|
|
less sources of entropy than bare metal. A solution is to add a
|
|
`virtio-rng device
|
|
<https://fedoraproject.org/wiki/Features/Virtio_RNG>`_ to pass entropy
|
|
from the host to the virtual machine.
|
|
|
|
|
|
Embedded devices
|
|
----------------
|
|
|
|
A solution for embedded devices is to plug an hardware RNG.
|
|
|
|
For example, Raspberry Pi have an hardware RNG but it's not used by
|
|
default. See: `Hardware RNG on Raspberry Pi
|
|
<http://fios.sector16.net/hardware-rng-on-raspberry-pi/>`_.
|
|
|
|
|
|
|
|
Denial-of-service when reading random
|
|
=====================================
|
|
|
|
Don't use /dev/random but /dev/urandom
|
|
--------------------------------------
|
|
|
|
The ``/dev/random`` device should only used for very specific use cases.
|
|
Reading from ``/dev/random`` on Linux is likely to block. Users don't
|
|
like when an application blocks longer than 5 seconds to generate a
|
|
secret. It is only expected for specific cases like generating
|
|
explicitly an encryption key.
|
|
|
|
When the system has no available entropy, choosing between blocking
|
|
until entropy is available or falling back on lower quality entropy is a
|
|
matter of compromise between security and practicability. The choice
|
|
depends on the use case.
|
|
|
|
On Linux, ``/dev/urandom`` is secure, it should be used instead of
|
|
``/dev/random``. See `Myths about /dev/urandom
|
|
<http://www.2uo.de/myths-about-urandom/>`_ by Thomas Hühn: "Fact:
|
|
/dev/urandom is the preferred source of cryptographic randomness on
|
|
UNIX-like systems"
|
|
|
|
|
|
getrandom(size, 0) can block forever on Linux
|
|
---------------------------------------------
|
|
|
|
The origin of the Python issue #26839 is the `Debian bug
|
|
report #822431
|
|
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=822431>`_: in fact,
|
|
``getrandom(size, 0)`` blocks forever on the virtual machine. The system
|
|
succeeded to boot because systemd killed the blocked process after 90
|
|
seconds.
|
|
|
|
Solutions like `Load entropy from disk at boot`_ reduces the risk of
|
|
this bug.
|
|
|
|
|
|
Rationale
|
|
=========
|
|
|
|
On Linux, reading the ``/dev/urandom`` can return "weak" entropy before
|
|
urandom is fully initialized, before the kernel collected 128 bits of
|
|
entropy. Linux 3.17 adds a new ``getrandom()`` syscall which allows to
|
|
block until urandom is initialized.
|
|
|
|
On Python 3.5.2, os.urandom() uses the
|
|
``getrandom(size, GRND_NONBLOCK)``, but falls back on reading the
|
|
non-blocking ``/dev/urandom`` if ``getrandom(size, GRND_NONBLOCK)``
|
|
fails with ``EAGAIN``.
|
|
|
|
Security experts promotes ``os.urandom()`` to genereate cryptographic
|
|
keys because it is implemented with a `Cryptographically secure
|
|
pseudo-random number generator (CSPRNG)
|
|
<https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator>`_.
|
|
By the way, ``os.urandom()`` is preferred over ``ssl.RAND_bytes()`` for
|
|
different reasons.
|
|
|
|
This PEP proposes to modify os.urandom() to use ``getrandom()`` in
|
|
blocking mode to not return weak entropy, but also ensure that Python
|
|
will not block at startup.
|
|
|
|
|
|
Changes
|
|
=======
|
|
|
|
Make os.urandom() blocking on Linux
|
|
-----------------------------------
|
|
|
|
All changes described in this section are specific to the Linux
|
|
platform.
|
|
|
|
Changes:
|
|
|
|
* Modify os.urandom() to block until system urandom is initialized:
|
|
``os.urandom()`` (C function ``_PyOS_URandom()``) is modified to
|
|
always call ``getrandom(size, 0)`` (blocking mode) on Linux and
|
|
Solaris.
|
|
* Add a new private ``_PyOS_URandom_Nonblocking()`` function: try to
|
|
call ``getrandom(size, GRND_NONBLOCK)`` on Linux and Solaris, but
|
|
falls back on reading ``/dev/urandom`` if it fails with ``EAGAIN``.
|
|
* Initialize hash secret from non-blocking system urandom:
|
|
``_PyRandom_Init()`` is modified to call
|
|
``_PyOS_URandom_Nonblocking()``.
|
|
* ``random.Random`` constructor now uses non-blocking system urandom: it
|
|
is modified to use internally the new ``_PyOS_URandom_Nonblocking()``
|
|
function to seed the RNG.
|
|
|
|
|
|
Add a new os.getrandom() function
|
|
---------------------------------
|
|
|
|
A new ``os.getrandom(size, flags=0)`` function is added: use
|
|
``getrandom()`` syscall on Linux and ``getrandom()`` C function on
|
|
Solaris.
|
|
|
|
The function comes with 2 new flags:
|
|
|
|
* ``os.GRND_RANDOM``: read bytes from ``/dev/random`` rather than
|
|
reading ``/dev/urandom``
|
|
* ``os.GRND_NONBLOCK``: raise a BlockingIOError if ``os.getrandom()``
|
|
would block
|
|
|
|
The ``os.getrandom()`` is a thin wrapper on the ``getrandom()``
|
|
syscall/C function and so inherit of its behaviour. For example, on
|
|
Linux, it can return less bytes than requested if the syscall is
|
|
interrupted by a signal.
|
|
|
|
|
|
Examples using os.getrandom()
|
|
=============================
|
|
|
|
Best-effort RNG
|
|
---------------
|
|
|
|
Example of a portable non-blocking RNG function: try to get random bytes
|
|
from the OS urandom, or fallback on the random module.
|
|
|
|
::
|
|
|
|
def best_effort_rng(size):
|
|
# getrandom() is only available on Linux and Solaris
|
|
if not hasattr(os, 'getrandom'):
|
|
return os.urandom(size)
|
|
|
|
result = bytearray()
|
|
try:
|
|
# need a loop because getrandom() can return less bytes than
|
|
# requested for different reasons
|
|
while size:
|
|
data = os.getrandom(size, os.GRND_NONBLOCK)
|
|
result += data
|
|
size -= len(data)
|
|
except BlockingIOError:
|
|
# OS urandom is not initialized yet:
|
|
# fallback on the Python random module
|
|
data = bytes(random.randrange(256) for byte in range(size))
|
|
result += data
|
|
return bytes(result)
|
|
|
|
This function *can* block in theory on a platform where
|
|
``os.getrandom()`` is not available but ``os.urandom()`` can block.
|
|
|
|
|
|
wait_for_system_rng()
|
|
---------------------
|
|
|
|
Example of function waiting *timeout* seconds until the OS urandom is
|
|
initialized on Linux or Solaris::
|
|
|
|
def wait_for_system_rng(timeout, interval=1.0):
|
|
if not hasattr(os, 'getrandom'):
|
|
return
|
|
|
|
deadline = time.monotonic() + timeout
|
|
while True:
|
|
try:
|
|
os.getrandom(1, os.GRND_NONBLOCK)
|
|
except BlockingIOError:
|
|
pass
|
|
else:
|
|
return
|
|
|
|
if time.monotonic() > deadline:
|
|
raise Exception('OS urandom not initialized after %s seconds'
|
|
% timeout)
|
|
|
|
time.sleep(interval)
|
|
|
|
This function is *not* portable. For example, ``os.urandom()`` can block
|
|
on FreeBSD in theory, at the early stage of the system initialization.
|
|
|
|
|
|
Create a best-effort RNG
|
|
------------------------
|
|
|
|
Simpler example to create a non-blocking RNG on Linux: choose between
|
|
``Random.SystemRandom`` and ``Random.Random`` depending if
|
|
``getrandom(size)`` would block.
|
|
|
|
::
|
|
|
|
def create_nonblocking_random():
|
|
if not hasattr(os, 'getrandom'):
|
|
return random.Random()
|
|
|
|
try:
|
|
os.getrandom(1, os.GRND_NONBLOCK)
|
|
except BlockingIOError:
|
|
return random.Random()
|
|
else:
|
|
return random.SystemRandom()
|
|
|
|
This function is *not* portable. For example, ``random.SystemRandom``
|
|
can block on FreeBSD in theory, at the early stage of the system
|
|
initialization.
|
|
|
|
|
|
Alternative
|
|
===========
|
|
|
|
Leave os.urandom() unchanged, add os.getrandom()
|
|
------------------------------------------------
|
|
|
|
os.urandom() remains unchanged: never block, but it can return weak
|
|
entropy if system urandom is not initialized yet.
|
|
|
|
Only add the new ``os.getrandom()`` function (wrapper to the
|
|
``getrandom()`` syscall/C function).
|
|
|
|
The ``secrets.token_bytes()`` function should be used to write portable
|
|
code.
|
|
|
|
The problem with this change is that it expects that users understand
|
|
well security and know well each platforms. Python has the tradition of
|
|
hiding "implementation details". For example, ``os.urandom()`` is not a
|
|
thin wrapper to the ``/dev/urandom`` device: it uses
|
|
``CryptGenRandom()`` on Windows, it uses ``getentropy()`` on OpenBSD, it
|
|
tries ``getrandom()`` on Linux and Solaris or falls back on reading
|
|
``/dev/urandom``. Python already uses the best available system RNG
|
|
depending on the platform.
|
|
|
|
This PEP does not change the API:
|
|
|
|
* ``os.urandom()``, ``random.SystemRandom`` and ``secrets`` for security
|
|
* ``random`` module (except ``random.SystemRandom``) for all other usages
|
|
|
|
|
|
Raise BlockingIOError in os.urandom()
|
|
-------------------------------------
|
|
|
|
Proposition
|
|
^^^^^^^^^^^
|
|
|
|
`PEP 522: Allow BlockingIOError in security sensitive APIs on Linux
|
|
<https://www.python.org/dev/peps/pep-0522/>`_.
|
|
|
|
Python should not decide for the developer how to handle `The bug`_:
|
|
raising immediatly a ``BlockingIOError`` if ``os.urandom()`` is going to
|
|
block allows developers to choose how to handle this case:
|
|
|
|
* catch the exception and falls back to a non-secure entropy source:
|
|
read ``/dev/urandom`` on Linux, use the Python ``random`` module
|
|
(which is not secure at all), use time, use process identifier, etc.
|
|
* don't catch the error, the whole program fails with this fatal
|
|
exception
|
|
|
|
More generally, the exception helps to notify when sometimes goes wrong.
|
|
The application can emit a warning when it starts to wait for
|
|
``os.urandom()``.
|
|
|
|
Criticism
|
|
^^^^^^^^^
|
|
|
|
For the use case 2 (web server), falling back on non-secure entropy is
|
|
not acceptable. The application must handle ``BlockingIOError``: poll
|
|
``os.urandom()`` until it completes. Example::
|
|
|
|
def secret(n=16):
|
|
try:
|
|
return os.urandom(n)
|
|
except BlockingIOError:
|
|
pass
|
|
|
|
print("Wait for system urandom initialiation: move your "
|
|
"mouse, use your keyboard, use your disk, ...")
|
|
while 1:
|
|
# Avoid busy-loop: sleep 1 ms
|
|
time.sleep(0.001)
|
|
try:
|
|
return os.urandom(n)
|
|
except BlockingIOError:
|
|
pass
|
|
|
|
For correctness, all applications which must generate a secure secret
|
|
must be modified to handle ``BlockingIOError`` even if `The bug`_ is
|
|
unlikely.
|
|
|
|
The case of applications using ``os.urandom()`` but don't really require
|
|
security is not well defined. Maybe these applications should not use
|
|
``os.urandom()`` at the first place, but always the non-blocking
|
|
``random`` module. If ``os.urandom()`` is used for security, we are back
|
|
to the use case 2 described above: `Use Case 2: Web server`_. If a
|
|
developer doesn't want to drop ``os.urandom()``, the code should be
|
|
modified. Example::
|
|
|
|
def almost_secret(n=16):
|
|
try:
|
|
return os.urandom(n)
|
|
except BlockingIOError:
|
|
return bytes(random.randrange(256) for byte in range(n))
|
|
|
|
The question is if `The bug`_ is common enough to require that so many
|
|
applications have to be modified.
|
|
|
|
Another simpler choice is to refuse to start before the system urandom
|
|
is initialized::
|
|
|
|
def secret(n=16):
|
|
try:
|
|
return os.urandom(n)
|
|
except BlockingIOError:
|
|
print("Fatal error: the system urandom is not initialized")
|
|
print("Wait a bit, and rerun the program later.")
|
|
sys.exit(1)
|
|
|
|
Compared to Python 2.7, Python 3.4 and Python 3.5.2 where os.urandom()
|
|
never blocks nor raise an exception on Linux, such behaviour change can
|
|
be seen as a major regression.
|
|
|
|
|
|
Add an optional block parameter to os.urandom()
|
|
-----------------------------------------------
|
|
|
|
See the `issue #27250: Add os.urandom_block()
|
|
<http://bugs.python.org/issue27250>`_.
|
|
|
|
Add an optional block parameter to os.urandom(). The default value may
|
|
be ``True`` (block by default) or ``False`` (non-blocking).
|
|
|
|
The first technical issue is to implement ``os.urandom(block=False)`` on
|
|
all platforms. Only Linux 3.17 (and newer) and Solaris 11.3 (and newer)
|
|
have a well defined non-blocking API (``getrandom(size,
|
|
GRND_NONBLOCK)``).
|
|
|
|
As `Raise BlockingIOError in os.urandom()`_, it doesn't seem worth it to
|
|
make the API more complex for a theorical (or at least very rare) use
|
|
case.
|
|
|
|
As `Leave os.urandom() unchanged, add os.getrandom()`_, the problem is
|
|
that it makes the API more complex and so more error-prone.
|
|
|
|
|
|
|
|
|
|
|
|
Annexes
|
|
=======
|
|
|
|
Operating system random functions
|
|
---------------------------------
|
|
|
|
``os.urandom()`` uses the following functions:
|
|
|
|
* `OpenBSD: getentropy()
|
|
<http://man.openbsd.org/OpenBSD-current/man2/getentropy.2>`_
|
|
(OpenBSD 5.6)
|
|
* `Linux: getrandom()
|
|
<http://man7.org/linux/man-pages/man2/getrandom.2.html>`_ (Linux 3.17)
|
|
-- see also `A system call for random numbers: getrandom()
|
|
<https://lwn.net/Articles/606141/>`_
|
|
* Solaris: `getentropy()
|
|
<https://docs.oracle.com/cd/E53394_01/html/E54765/getentropy-2.html#scrolltoc>`_,
|
|
`getrandom()
|
|
<https://docs.oracle.com/cd/E53394_01/html/E54765/getrandom-2.html>`_
|
|
(both need Solaris 11.3)
|
|
* UNIX, BSD: /dev/urandom, /dev/random
|
|
* Windows: `CryptGenRandom()
|
|
<https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx>`_
|
|
(Windows XP)
|
|
|
|
On Linux, commands to get the status of ``/dev/random`` (results are
|
|
number of bytes)::
|
|
|
|
$ cat /proc/sys/kernel/random/entropy_avail
|
|
2850
|
|
$ cat /proc/sys/kernel/random/poolsize
|
|
4096
|
|
|
|
Why using os.urandom()?
|
|
-----------------------
|
|
|
|
Since ``os.urandom()`` is implemented in the kernel, it doesn't have
|
|
issues of user-space RNG. For example, it is much harder to get its
|
|
state. It is usually built on a CSPRNG, so even if its state is
|
|
"stolen", it is hard to compute previously generated numbers. The kernel
|
|
has a good knowledge of entropy sources and feed regulary the entropy
|
|
pool.
|
|
|
|
That's also why ``os.urandom()`` is preferred over ``ssl.RAND_bytes()``.
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|