Add PEP 524: Make os.urandom() blocking on Linux
parent 672f5f41b1
commit f6cda7909e

@@ -0,0 +1,469 @@

PEP: 524
Title: Make os.urandom() blocking on Linux
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-June-2016
Python-Version: 3.6


Abstract
========

Modify ``os.urandom()`` to block on Linux 3.17 and newer until the OS
urandom is initialized.


The bug
=======

Original bug
------------

Python 3.5.0 was enhanced to use the new ``getrandom()`` syscall
introduced in Linux 3.17 and Solaris 11.3. The problem is that users
started to complain that Python 3.5 blocks at startup on Linux in
virtual machines and embedded devices: see issues `#25420
<http://bugs.python.org/issue25420>`_ and `#26839
<http://bugs.python.org/issue26839>`_.

On Linux, ``getrandom(0)`` blocks until the kernel has initialized
urandom with 128 bits of entropy. Issue #25420 describes a Linux build
platform blocking at ``import random``. Issue #26839 describes a short
Python script from systemd-cron that computes an MD5 hash and is called
very early in the init process. The system initialization blocks on
this script, which itself blocks on ``getrandom(0)`` while Python
initializes.

The Python initialization requires random bytes to implement a
counter-measure against the hash denial-of-service (hash DoS), see:

* `Issue #13703: Hash collision security issue
  <http://bugs.python.org/issue13703>`_
* `PEP 456: Secure and interchangeable hash algorithm
  <https://www.python.org/dev/peps/pep-0456/>`_

Importing the ``random`` module creates an instance of
``random.Random``: ``random._inst``. On Python 3.5, the
``random.Random`` constructor reads 2500 bytes from ``os.urandom()`` to
seed a Mersenne Twister RNG (random number generator).

Other platforms may be affected by this bug, but in practice, only Linux
systems use Python scripts to initialize the system.


Status in Python 3.5.2
----------------------

Python 3.5.2 behaves like Python 2.7 and Python 3.4. If the system
urandom is not initialized, the startup does not block, but
``os.urandom()`` can return low-quality entropy (even if it is not
easily guessable).


Use Cases
=========

The following use cases help to choose the right compromise between
security and practicability.


Use Case 1: init script
-----------------------

Use a Python 3 script to initialize the system, like systemd-cron. If
the script blocks, the system initialization is stuck too.

Issue #26839 is a good example of this use case.

Use case 1.1: No secret needed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the init script doesn't have to generate any secure secret, this use
case is already handled correctly by Python 3.5.2: Python startup
doesn't block on system urandom anymore.

Use case 1.2: Secure secret required
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the init script has to generate a secure secret, there is no safe
solution.

Falling back to weak entropy is not acceptable: it would reduce the
security of the program.

Python cannot produce secure entropy itself; it can only wait until
system urandom is initialized. But in this use case, the whole system
initialization is blocked by this script, so the system fails to boot.

The real answer is that the system initialization must not be blocked by
such a script. It is ok to start the script very early at system
initialization, but the script may block for a few seconds until it is
able to generate the secret.

Reminder: in some cases, the initialization of the system urandom never
occurs, and so programs waiting for system urandom block forever.


Use Case 2: Web server
----------------------

Run a Python 3 web server serving web pages using HTTP and HTTPS
protocols. The server is started as soon as possible.

The first targets of the hash DoS attack were web servers: it's
important that the hash secret cannot be easily guessed by an attacker.

If serving a web page needs a secret to create a cookie, create an
encryption key, ..., the secret must be created with good entropy:
again, it must be hard to guess the secret.

A web server requires security. If a choice must be made between
security and running the server with weak entropy, security is more
important. If there is no good entropy, the server must block or fail
with an error.

The question is whether it makes sense to start a web server on a host
before system urandom is initialized.

Issues #25420 and #26839 are restricted to the Python startup; they are
not about generating a secret before the system urandom is initialized.


Fix system urandom
==================

Load entropy from disk at boot
-------------------------------

Collecting entropy can take several minutes. To accelerate the system
initialization, operating systems store entropy on disk at shutdown, and
then reload entropy from disk at boot.

If a system collects enough entropy at least once, the system urandom
will be initialized quickly, as soon as the entropy is reloaded from
disk.


Virtual machines
----------------

Virtual machines don't have direct access to the hardware and so have
fewer sources of entropy than bare metal. A solution is to add a
`virtio-rng device
<https://fedoraproject.org/wiki/Features/Virtio_RNG>`_ to pass entropy
from the host to the virtual machine.


Embedded devices
----------------

A solution for embedded devices is to plug in a hardware RNG.

For example, the Raspberry Pi has a hardware RNG but it's not used by
default. See: `Hardware RNG on Raspberry Pi
<http://fios.sector16.net/hardware-rng-on-raspberry-pi/>`_.



Denial-of-service when reading random
=====================================

Don't use /dev/random but /dev/urandom
--------------------------------------

The ``/dev/random`` device should only be used for very specific use
cases. Reading from ``/dev/random`` on Linux is likely to block. Users
don't like it when an application blocks longer than 5 seconds to
generate a secret. It is only expected for specific cases like
explicitly generating an encryption key.

When the system has no available entropy, choosing between blocking
until entropy is available or falling back on lower quality entropy is a
matter of compromise between security and practicability. The choice
depends on the use case.

On Linux, ``/dev/urandom`` is secure and should be used instead of
``/dev/random``:

* `Myths about /dev/urandom <http://www.2uo.de/myths-about-urandom/>`_
  by Thomas Hühn: "Fact: /dev/urandom is the preferred source of
  cryptographic randomness on UNIX-like systems"


getrandom(0) can block forever on Linux
---------------------------------------

The origin of the Python issue #26839 is the `Debian bug
report #822431
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=822431>`_: in that
report, ``getrandom(0)`` blocks forever on the virtual machine.

`Load entropy from disk at boot`_ reduces the risk of this case.


Rationale
=========

On Linux, reading ``/dev/urandom`` can return "weak" entropy before
urandom is fully initialized, that is, before the kernel has collected
128 bits of entropy. Linux 3.17 adds a new ``getrandom()`` syscall which
makes it possible to block until urandom is initialized.

On Python 3.5.2, ``os.urandom()`` uses ``getrandom(GRND_NONBLOCK)``, but
falls back on reading the non-blocking ``/dev/urandom`` if
``getrandom(GRND_NONBLOCK)`` fails with ``EAGAIN``.

Security experts promote ``os.urandom()`` to generate cryptographic
keys. By the way, ``os.urandom()`` is preferred over
``ssl.RAND_bytes()`` for different reasons.

This PEP proposes to modify ``os.urandom()`` to use ``getrandom()`` in
blocking mode so that it does not return weak entropy, while also
ensuring that Python will not block at startup.


Changes
=======

All changes described in this section are specific to the Linux
platform.

* Initialize hash secret from non-blocking system urandom
* Initialize ``random._inst`` with non-blocking system urandom
* Modify ``os.urandom()`` to block (until system urandom is initialized)

A new ``_PyOS_URandom_Nonblocking()`` private function is added: it
tries to call ``getrandom(GRND_NONBLOCK)``, but falls back on reading
``/dev/urandom`` if it fails with ``EAGAIN``.

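The fallback logic described for ``_PyOS_URandom_Nonblocking()`` could
be sketched at the Python level as follows. This is only an
illustration, not the C implementation: ``urandom_nonblocking()`` is a
hypothetical helper name, and ``os.getrandom()`` and
``os.GRND_NONBLOCK`` are assumed to be available (Python 3.6 and newer
on Linux), hence the ``hasattr()`` guard.

```python
import os


def urandom_nonblocking(n):
    """Return n random bytes without blocking (hypothetical helper)."""
    # os.getrandom() is an assumption: it only exists on Python 3.6+
    # built for a platform providing the getrandom() syscall.
    if hasattr(os, "getrandom"):
        try:
            data = os.getrandom(n, os.GRND_NONBLOCK)
            if len(data) == n:
                return data
        except BlockingIOError:
            # EAGAIN: the kernel has not initialized urandom yet.
            pass
        except OSError:
            # For example ENOSYS when the kernel lacks the syscall.
            pass
    # Fallback: reading /dev/urandom does not block on Linux, but it may
    # return weak entropy early during boot.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)
```

A caller that only needs a hash secret or a seed, and not a
cryptographic key, could use such a helper at startup without risking a
hang.
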
``_PyRandom_Init()`` is modified to call
``_PyOS_URandom_Nonblocking()``. Moreover, a new ``random_inst_seed``
field is added to the ``_Py_HashSecret_t`` structure.

``random._inst`` (an instance of ``random.Random``) is initialized with
the new ``random_inst_seed`` secret. A "fuse" flag is used to ensure
that this secret is only used once.

If a second instance of ``random.Random`` is created, blocking
``os.urandom()`` is used.

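The "fuse" behaviour can be pictured with a small Python sketch. The
``SeedFuse`` class and its ``take()`` method are hypothetical names used
only for illustration; the PEP describes a flag next to the
``random_inst_seed`` field in C, not a Python class.

```python
import os
import random


class SeedFuse:
    """Hand out a startup seed exactly once (illustrative only)."""

    def __init__(self, startup_seed):
        self._seed = startup_seed
        self._blown = False

    def take(self, n=32):
        if not self._blown:
            # First caller: consume the seed collected at startup.
            self._blown = True
            seed, self._seed = self._seed, None
            return seed
        # Later callers: ask the (possibly blocking) system RNG.
        return os.urandom(n)


fuse = SeedFuse(os.urandom(32))   # stand-in for random_inst_seed
first = random.Random(fuse.take())    # plays the role of random._inst
second = random.Random(fuse.take())   # seeded from os.urandom()
```

Only the first generator is seeded from the startup secret, so the
secret is never shared between two instances.
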
``os.urandom()`` (C function ``_PyOS_URandom()``) is modified to always
call ``getrandom(0)`` (blocking mode).



Alternatives
============

Never use blocking urandom in the random module
-----------------------------------------------

The random module can use ``random_inst_seed`` as a seed, but add other
sources of entropy like the process identifier (``os.getpid()``), the
current time (``time.time()``), memory addresses, etc.

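A rough sketch of this alternative, under the assumption that the extra
sources are simply hashed together with the startup seed. The
``mixed_seed()`` helper is hypothetical, and the result is not meant to
be cryptographically strong: it only makes seeds differ between
processes and runs.

```python
import hashlib
import os
import random
import time


def mixed_seed(startup_seed):
    """Combine a startup seed with weak extra sources (illustrative)."""
    h = hashlib.sha512()
    h.update(startup_seed)
    h.update(str(os.getpid()).encode())    # process identifier
    h.update(repr(time.time()).encode())   # current time
    h.update(str(id(h)).encode())          # a memory address on CPython
    return h.digest()


rng = random.Random(mixed_seed(b"\x00" * 32))
```
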
Reading 2500 bytes from ``os.urandom()`` to initialize the Mersenne
Twister RNG in ``random.Random`` is a deliberate choice to get access to
the full range of the RNG. This PEP is a compromise between "security"
and "feature". Python should not block at startup before the OS has
collected enough entropy. But in the regular use case (system urandom
initialized), the random module should continue to use its current code
to initialize the seed.

Python 3.5.0 was blocked on ``import random``, not on building a second
instance of ``random.Random``.


Leave os.urandom() unchanged, add os.getrandom()
------------------------------------------------

``os.urandom()`` remains unchanged: it never blocks, but it can return
weak entropy if system urandom is not initialized yet.

A new ``os.getrandom()`` function is added: a thin wrapper to the
``getrandom()`` syscall.

The ``secrets.token_bytes()`` function should be used to write portable
code.

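For reference, portable code based on the ``secrets`` module (available
in Python 3.6 and newer) might look like the short example below. The
variable names are arbitrary.

```python
import secrets

# 16 random bytes from the best available system source.
key = secrets.token_bytes(16)

# The same amount of randomness as a hexadecimal string.
token = secrets.token_hex(16)
```
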
The problem with this change is that it expects users to understand
security well and to know each platform well. Python has the tradition
of hiding "implementation details". For example, ``os.urandom()`` is not
a thin wrapper to the ``/dev/urandom`` device: it uses
``CryptGenRandom()`` on Windows, it uses ``getentropy()`` on OpenBSD, it
tries ``getrandom()`` on Linux and Solaris or falls back on reading
``/dev/urandom``. Python already uses the best available system RNG
depending on the platform.

This PEP does not change the API, which has not changed since the
creation of Python:

* ``os.urandom()``, ``random.SystemRandom`` and ``secrets`` for security
* ``random`` module (except ``random.SystemRandom``) for all other usages


Raise BlockingIOError in os.urandom()
-------------------------------------

Proposition
^^^^^^^^^^^

`PEP 522: Allow BlockingIOError in security sensitive APIs on Linux
<https://www.python.org/dev/peps/pep-0522/>`_.

Python should not decide for the developer how to handle `The bug`_:
immediately raising a ``BlockingIOError`` if ``os.urandom()`` is going
to block allows developers to choose how to handle this case:

* catch the exception and fall back to a non-secure entropy source:
  read ``/dev/urandom`` on Linux, use the Python ``random`` module
  (which is not secure at all), use time, use process identifier, etc.
* don't catch the error, so the whole program fails with this fatal
  exception

More generally, the exception helps to notify when something goes wrong.
The application can emit a warning when it starts to wait for
``os.urandom()``.

Criticism
^^^^^^^^^

For use case 2 (web server), falling back on non-secure entropy is
not acceptable. The application must handle ``BlockingIOError``: poll
``os.urandom()`` until it completes. Example::

    import os
    import time

    def secret(n=16):
        try:
            return os.urandom(n)
        except BlockingIOError:
            pass

        print("Wait for system urandom initialization: move your "
              "mouse, use your keyboard, use your disk, ...")
        while True:
            # Avoid busy-loop: sleep 1 ms
            time.sleep(0.001)
            try:
                return os.urandom(n)
            except BlockingIOError:
                pass

For correctness, all applications which must generate a secure secret
must be modified to handle ``BlockingIOError`` even if `The bug`_ is
unlikely.

The case of applications which use ``os.urandom()`` but don't really
require security is less clear. Maybe these applications should not use
``os.urandom()`` in the first place, but always the non-blocking
``random`` module. If ``os.urandom()`` is used for security, we are back
to use case 2 (web server) described above. If a developer doesn't
want to drop ``os.urandom()``, the code should be modified. Example::

    import os
    import random

    def almost_secret(n=16):
        try:
            return os.urandom(n)
        except BlockingIOError:
            # Not secure: return bytes, like os.urandom() does
            return bytes(random.randrange(256) for _ in range(n))

The question is whether `The bug`_ is common enough to require that so
many applications be modified.

Another simpler choice is to refuse to start before the system urandom
is initialized::

    import os
    import sys

    def secret(n=16):
        try:
            return os.urandom(n)
        except BlockingIOError:
            print("Fatal error: the system urandom is not initialized")
            print("Wait a bit, and rerun the program later.")
            sys.exit(1)

Compared to Python 2.7, Python 3.4 and Python 3.5.2, where
``os.urandom()`` never blocks nor raises an exception on Linux, such a
behaviour change can be seen as a major regression.


Add an optional block parameter to os.urandom()
-----------------------------------------------

Add an optional block parameter to ``os.urandom()``. The default value
may be ``True`` (block by default) or ``False`` (non-blocking).

The first technical issue is to implement ``os.urandom(block=False)`` on
all platforms. Linux 3.17 and newer has a well-defined non-blocking
API.

See the `issue #27250: Add os.urandom_block()
<http://bugs.python.org/issue27250>`_.

As with `Raise BlockingIOError in os.urandom()`_, it doesn't seem worth
it to make the API more complex for a theoretical (or at least very
rare) use case.

As with `Leave os.urandom() unchanged, add os.getrandom()`_, the problem
is that it makes the API more complex and so more error-prone.



Annexes
=======

Operating system random functions
---------------------------------

``os.urandom()`` uses the following functions:

* `OpenBSD: getentropy()
  <http://man.openbsd.org/OpenBSD-current/man2/getentropy.2>`_
  (OpenBSD 5.6)
* `Linux: getrandom()
  <http://man7.org/linux/man-pages/man2/getrandom.2.html>`_ (Linux 3.17)
  -- see also `A system call for random numbers: getrandom()
  <https://lwn.net/Articles/606141/>`_
* Solaris: `getentropy()
  <https://docs.oracle.com/cd/E53394_01/html/E54765/getentropy-2.html#scrolltoc>`_,
  `getrandom()
  <https://docs.oracle.com/cd/E53394_01/html/E54765/getrandom-2.html>`_
  (both need Solaris 11.3)
* Windows: `CryptGenRandom()
  <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx>`_
  (Windows XP)
* UNIX, BSD: /dev/urandom, /dev/random
* OpenBSD: /dev/srandom

On Linux, commands to get the status of ``/dev/random`` (results are
numbers of bits)::

    $ cat /proc/sys/kernel/random/entropy_avail
    2850
    $ cat /proc/sys/kernel/random/poolsize
    4096

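The same values can be read from Python. The following is a minimal
sketch; ``read_random_status()`` is a hypothetical helper, and it
returns ``None`` on systems where these ``/proc`` files do not exist or
cannot be parsed.

```python
from pathlib import Path

# Directory exposing the kernel RNG status on Linux.
RANDOM_DIR = Path("/proc/sys/kernel/random")


def read_random_status(name):
    """Return the integer stored in a kernel random status file."""
    try:
        return int((RANDOM_DIR / name).read_text())
    except (OSError, ValueError):
        # Not Linux, file missing, or unexpected content.
        return None


entropy_avail = read_random_status("entropy_avail")
poolsize = read_random_status("poolsize")
```
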
Why use os.urandom()?
---------------------

Since ``os.urandom()`` is implemented in the kernel, it doesn't have
some of the issues of a user-space RNG. For example, it is much harder
to get its state. It is usually built on a CSPRNG, so even if its state
is obtained, it is hard to compute previously generated numbers. The
kernel has a good knowledge of entropy sources and regularly feeds the
entropy pool.


Links
=====

* `Cryptographically secure pseudo-random number generator (CSPRNG)
  <https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator>`_


Copyright
=========

This document has been placed in the public domain.