PEP: 433 Title: Easier suppression of file descriptor inheritance Version: $Revision$ Last-Modified: $Date$ Author: Victor Stinner Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 10-January-2013 Python-Version: 3.4 Abstract ======== Add a new optional *cloexec* parameter on functions creating file descriptors, add different ways to change default values of this parameter, and add four new functions: * ``os.get_cloexec(fd)`` * ``os.set_cloexec(fd, cloexec=True)`` * ``sys.getdefaultcloexec()`` * ``sys.setdefaultcloexec(cloexec)`` Rationale ========= A file descriptor has a close-on-exec flag which indicates if the file descriptor will be inherited or not. On UNIX, if the close-on-exec flag is set, the file descriptor is not inherited: it will be closed at the execution of child processes; otherwise the file descriptor is inherited by child processes. On Windows, if the close-on-exec flag is set, the file descriptor is not inherited; the file descriptor is inherited by child processes if the close-on-exec flag is cleared and if ``CreateProcess()`` is called with the *bInheritHandles* parameter set to ``TRUE`` (when ``subprocess.Popen`` is created with ``close_fds=False`` for example). Windows does now have "close-on-exec" flag but an inherance flag which is just the opposite value. For example, setting close-on-exec flag means clearing the ``HANDLE_FLAG_INHERIT`` flag of an handle. Status in Python 3.3 -------------------- On UNIX, the subprocess module closes file descriptors greater than 2 by default since Python 3.2 [#subprocess_close]_. All file descriptors created by the parent process are automatically closed in the child process. ``xmlrpc.server.SimpleXMLRPCServer`` sets the close-on-exec flag of the listening socket, the parent class ``socketserver.TCPServer`` does not set this flag. There are other cases creating a subprocess or executing a new program where file descriptors are not closed: functions of the ``os.spawn*()`` and the ``os.exec*()`` families and third party modules calling ``exec()`` or ``fork()`` + ``exec()``. In this case, file descriptors are shared between the parent and the child processes which is usually unexpected and causes various issues. This PEP proposes to continue the work started with the change in the subprocess in Python 3.2, to fix the issue in any code, and not just code using subprocess. Inherited file descriptors issues --------------------------------- Closing the file descriptor in the parent process does not close the related resource (file, socket, ...) because it is still open in the child process. The listening socket of TCPServer is not closed on ``exec()``: the child process is able to get connection from new clients; if the parent closes the listening socket and create a new listening socket on the same address, it would get an "address already is used" error. Not closing file descriptors can lead to resource exhaustion: even if the parent closes all files, creating a new file descriptor may fail with "too many files" because files are still open in the child process. See also the following issues: * `Issue #2320: Race condition in subprocess using stdin `_ (2008) * `Issue #3006: subprocess.Popen causes socket to remain open after close `_ (2008) * `Issue #7213: subprocess leaks open file descriptors between Popen instances causing hangs `_ (2009) * `Issue #12786: subprocess wait() hangs when stdin is closed `_ (2011) Security -------- Leaking file descriptors is a major security vulnerability. An untrusted child process can read sensitive data like passwords and take control of the parent process though leaked file descriptors. It is for example a known vulnerability to escape from a chroot. See also the CERT recommandation: `FIO42-C. Ensure files are properly closed when they are no longer needed `_. Example of vulnerabilities: * `OpenSSH Security Advisory: portable-keysign-rand-helper.adv `_ (April 2011) * `CWE-403: Exposure of File Descriptor to Unintended Control Sphere `_ (2008) * `Hijacking Apache https by mod_php `_ (Dec 2003) Atomicity --------- Using ``fcntl()`` to set the close-on-exec flag is not safe in a multithreaded application. If a thread calls ``fork()`` and ``exec()`` between the creation of the file descriptor and the call to ``fcntl(fd, F_SETFD, new_flags)``: the file descriptor will be inherited by the child process. Modern operating systems offer functions to set the flag during the creation of the file descriptor, which avoids the race condition. Portability ----------- Python 3.2 added ``socket.SOCK_CLOEXEC`` flag, Python 3.3 added ``os.O_CLOEXEC`` flag and ``os.pipe2()`` function. It is already possible to set atomically close-on-exec flag in Python 3.3 when opening a file and creating a pipe or socket. The problem is that these flags and functions are not portable: only recent versions of operating systems support them. ``O_CLOEXEC`` and ``SOCK_CLOEXEC`` flags are ignored by old Linux versions and so ``FD_CLOEXEC`` flag must be checked using ``fcntl(fd, F_GETFD)``. If the kernel ignores ``O_CLOEXEC`` or ``SOCK_CLOEXEC`` flag, a call to ``fcntl(fd, F_SETFD, flags)`` is required to set close-on-exec flag. .. note:: OpenBSD older 5.2 does not close the file descriptor with close-on-exec flag set if ``fork()`` is used before ``exec()``, but it works correctly if ``exec()`` is called without ``fork()``. Try `openbsd_bug.py `_. Scope ----- Applications still have to close explicitly file descriptors after a ``fork()``. The close-on-exec flag only closes file descriptors after ``exec()``, and so after ``fork()`` + ``exec()``. This PEP only change the close-on-exec flag of file descriptors created by the Python standard library, or by modules using the standard library. Third party modules not using the standard library should be modified to conform to this PEP. The new ``os.set_cloexec()`` function can be used for example. XXX Should ``subprocess.Popen`` clear the close-on-exec flag on file XXX descriptors of the constructor the ``pass_fds`` parameter? .. note:: See `Close file descriptors after fork`_ for a possible solution for ``fork()`` without ``exec()``. Proposal ======== Add a new optional *cloexec* parameter on functions creating file descriptors and different ways to change default values of this parameter. Add new functions: * ``os.get_cloexec(fd:int) -> bool``: get the close-on-exec flag of a file descriptor. Not available on all platforms. * ``os.set_cloexec(fd:int, cloexec:bool=True)``: set or clear the close-on-exec flag on a file descriptor. Not available on all platforms. * ``sys.getdefaultcloexec() -> bool``: get the current default value of the *cloexec* parameter * ``sys.setdefaultcloexec(cloexec: bool)``: set the default value of the *cloexec* parameter Add a new optional *cloexec* parameter to: * ``asyncore.dispatcher.create_socket()`` * ``io.FileIO`` * ``io.open()`` * ``open()`` * ``os.dup()`` * ``os.dup2()`` * ``os.fdopen()`` * ``os.open()`` * ``os.openpty()`` * ``os.pipe()`` * ``select.devpoll()`` * ``select.epoll()`` * ``select.kqueue()`` * ``socket.socket()`` * ``socket.socket.accept()`` * ``socket.socket.dup()`` * ``socket.socket.fromfd`` * ``socket.socketpair()`` The default value of the *cloexec* parameter is ``sys.getdefaultcloexec()``. Add a new command line option ``-e`` and an environment variable ``PYTHONCLOEXEC`` to the set close-on-exec flag by default. All functions creating file descriptors in the standard library must respect the default *cloexec* parameter (``sys.getdefaultcloexec()``). File descriptors 0 (stdin), 1 (stdout) and 2 (stderr) are expected to be inherited, but Python does not handle them differently. When ``os.dup2()`` is used to replace standard streams, ``cloexec=False`` must be specified explicitly. Drawbacks of the proposal: * It is not more possible to know if the close-on-exec flag will be set or not on a newly created file descriptor just by reading the source code. * If the inherance of a file descriptor matters, the *cloexec* parameter must now be specified explicitly, or the library or the application will not work depending on the default value of the *cloexec* parameter. Alternatives ============ Enable inherance by default and no configurable default ------------------------------------------------------- Add a new optional parameter *cloexec* on functions creating file descriptors. The default value of the *cloexec* parameter is ``False``, and this default cannot be changed. No file descriptor inherance by default is also the default on POSIX and on Windows. This alternative is the most convervative option. This option does solve issues listed in the `Rationale`_ section, it only provides an helper to fix them. All functions creating file descriptors have to be modified to set *cloexec=True* in each module used by an application to fix all these issues. Enable inherance by default, default can only be set to True ------------------------------------------------------------ This alternative is based on the proposal: the only difference is that ``sys.setdefaultcloexec()`` does not take any argument, it can only be used to set the default value of the *cloexec* parameter to ``True``. Disable inheritance by default ------------------------------ This alternative is based on the proposal: the only difference is that the default value of the *cloexec* parameter is ``True`` (instead of ``False``). If a file must be inherited by child processes, ``cloexec=False`` parameter can be used. ``subprocess.Popen`` constructor has an ``pass_fds`` parameter to specify which file descriptors must be inherited. The close-on-exec flag of these file descriptors must be changed with ``os.set_cloexec()``. Advantages of setting close-on-exec flag by default: * There are far more programs that are bitten by FD inheritance upon exec (see `Inherited file descriptors issues`_ and `Security`_) than programs relying on it (see `Applications using inherance of file descriptors`_). Drawbacks of setting close-on-exec flag by default: * It violates the principle of least surprise. Developers using the os module may expect that Python respects the POSIX standard and so that close-on-exec flag is not set by default. * The os module is written as a thin wrapper to system calls (to functions of the C standard library). If atomic flags to set close-on-exec flag are not supported (see `Appendix: Operating system support`_), a single Python function call may call 2 or 3 system calls (see `Performances`_ section). * Extra system calls, if any, may slow down Python: see `Performances`_. Backward compatibility: only a few programs rely on inherance of file descriptors, and they only pass a few file descriptors, usually just one. These programs will fail immediatly with ``EBADF`` error, and it will be simple to fix them: add ``cloexec=False`` parameter or use ``os.set_cloexec(fd, False)``. The ``subprocess`` module will be changed anyway to clear close-on-exec flag on file descriptors listed in the ``pass_fds`` parameter of Popen constructor. So it possible that these programs will not need any fix if they use the ``subprocess`` module. Close file descriptors after fork --------------------------------- This PEP does not fix issues with applications using ``fork()`` without ``exec()``. Python needs a generic process to register callbacks which would be called after a fork, see `#16500: Add an atfork module`_. Such registry could be used to close file descriptors just after a ``fork()``. Drawbacks: * It does not solve the problem on Windows: ``fork()`` does not exist on Windows * This alternative does not solve the problem for programs using ``exec()`` without ``fork()``. * A third party module may call directly the C function ``fork()`` which will not call "atfork" callbacks. * All functions creating file descriptors must be changed to register a callback and then unregister their callback when the file is closed. Or a list of *all* open file descriptors must be maintained. * The operating system is a better place than Python to close automatically file descriptors. For example, it is not easy to avoid a race condition between closing the file and unregistering the callback closing the file. open(): add "e" flag to mode ---------------------------- A new "e" mode would set close-on-exec flag (best-effort). This alternative only solves the problem for ``open()``. socket.socket() and os.pipe() do not have a ``mode`` parameter for example. Since its version 2.7, the GNU libc supports ``"e"`` flag for ``fopen()``. It uses ``O_CLOEXEC`` if available, or use ``fcntl(fd, F_SETFD, FD_CLOEXEC)``. With Visual Studio, fopen() accepts a "N" flag which uses ``O_NOINHERIT``. Bikeshedding on the name of the new parameter --------------------------------------------- * ``inherit``, ``inherited``: closer to Windows definition * ``sensitive`` * ``sterile``: "Does not produce offspring." Applications using inherance of file descriptors ================================================ Most developers don't know that file descriptors are inherited by default. Most programs do not rely on inherance of file descriptors. For example, ``subprocess.Popen`` was changed in Python 3.2 to close all file descriptors greater than 2 in the child process by default. No user complained about this behavior change yet. Network servers using fork may want to pass the client socket to the child process. For example, on UNIX a CGI server pass the socket client through file descriptors 0 (stdin) and 1 (stdout) using ``dup2()``. To access a restricted resource like creating a socket listening on a TCP port lower than 1024 or reading a file containing sensitive data like passwords, a common practice is: start as the root user, create a file descriptor, create a child process, drop privileges (ex: change the current user), pass the file descriptor to the child process and exit the parent process. Security is very important in such use case: leaking another file descriptor would be a critical security vulnerability (see `Security`_). The root process may not exit but monitors the child process instead, and restarts a new child process and pass the same file descriptor if the previous child process crashed. Example of programs taking file descriptors from the parent process using a command line option: * gpg: ``--status-fd ``, ``--logger-fd ``, etc. * openssl: ``-pass fd:`` * qemu: ``-add-fd `` * valgrind: ``--log-fd=``, ``--input-fd=``, etc. * xterm: ``-S `` On Linux, it is possible to use ``"/dev/fd/"`` filename to pass a file descriptor to a program expecting a filename. Performances ============ Setting close-on-exec flag may require additional system calls for each creation of new file descriptors. The number of additional system calls depends on the method used to set the flag: * ``O_NOINHERIT``: no additional system call * ``O_CLOEXEC``: one additional system call, but only at the creation of the first file descriptor, to check if the flag is supported. If the flag is not supported, Python has to fallback to the next method. * ``ioctl(fd, FIOCLEX)``: one additional system call per file descriptor * ``fcntl(fd, F_SETFD, flags)``: two additional system calls per file descriptor, one to get old flags and one to set new flags On Linux, setting the close-on-flag has a low overhead on performances. Results of `bench_cloexec.py `_ on Linux 3.6: * close-on-flag not set: 7.8 us * ``O_CLOEXEC``: 1% slower (7.9 us) * ``ioctl()``: 3% slower (8.0 us) * ``fcntl()``: 3% slower (8.0 us) Implementation ============== os.get_cloexec(fd) ------------------ Get the close-on-exec flag of a file descriptor. Pseudo-code:: if os.name == 'nt': def get_cloexec(fd): handle = _winapi._get_osfhandle(fd); flags = _winapi.GetHandleInformation(handle) return not(flags & _winapi.HANDLE_FLAG_INHERIT) else: try: import fcntl except ImportError: pass else: def get_cloexec(fd): flags = fcntl.fcntl(fd, fcntl.F_GETFD) return bool(flags & fcntl.FD_CLOEXEC) os.set_cloexec(fd, cloexec=True) -------------------------------- Set or clear the close-on-exec flag on a file descriptor. The flag is set after the creation of the file descriptor and so it is not atomic. Pseudo-code:: if os.name == 'nt': def set_cloexec(fd, cloexec=True): handle = _winapi._get_osfhandle(fd); mask = _winapi.HANDLE_FLAG_INHERIT if cloexec: flags = 0 else: flags = mask _winapi.SetHandleInformation(handle, mask, flags) else: fnctl = None ioctl = None try: import ioctl except ImportError: try: import fcntl except ImportError: pass if ioctl is not None and hasattr('FIOCLEX', ioctl): def set_cloexec(fd, cloexec=True): if cloexec: ioctl.ioctl(fd, ioctl.FIOCLEX) else: ioctl.ioctl(fd, ioctl.FIONCLEX) elif fnctl is not None: def set_cloexec(fd, cloexec=True): flags = fcntl.fcntl(fd, fcntl.F_GETFD) if cloexec: flags |= FD_CLOEXEC else: flags &= ~FD_CLOEXEC fcntl.fcntl(fd, fcntl.F_SETFD, flags) ioctl is preferred over fcntl because it requires only one syscall, instead of two syscalls for fcntl. .. note:: ``fcntl(fd, F_SETFD, flags)`` only supports one flag (``FD_CLOEXEC``), so it would be possible to avoid ``fcntl(fd, F_GETFD)``. But it may drop other flags in the future, and so it is safer to keep the two functions calls. .. note:: ``fopen()`` function of the GNU libc ignores the error if ``fcntl(fd, F_SETFD, flags)`` failed. open() ------ * Windows: ``open()`` with ``O_NOINHERIT`` flag [atomic] * ``open()`` with ``O_CLOEXEC flag`` [atomic] * ``open()`` + ``os.set_cloexec(fd, True)`` [best-effort] os.dup() -------- * ``fcntl(fd, F_DUPFD_CLOEXEC)`` [atomic] * ``dup()`` + ``os.set_cloexec(fd, True)`` [best-effort] os.dup2() --------- * ``dup3()`` with ``O_CLOEXEC`` flag [atomic] * ``dup2()`` + ``os.set_cloexec(fd, True)`` [best-effort] os.pipe() --------- * Windows: ``CreatePipe()`` with ``SECURITY_ATTRIBUTES.bInheritHandle=TRUE``, or ``_pipe()`` with ``O_NOINHERIT`` flag [atomic] * ``pipe2()`` with ``O_CLOEXEC`` flag [atomic] * ``pipe()`` + ``os.set_cloexec(fd, True)`` [best-effort] socket.socket() --------------- * Windows: ``WSASocket()`` with ``WSA_FLAG_NO_HANDLE_INHERIT`` flag [atomic] * ``socket()`` with ``SOCK_CLOEXEC`` flag [atomic] * ``socket()`` + ``os.set_cloexec(fd, True)`` [best-effort] socket.socketpair() ------------------- * ``socketpair()`` with ``SOCK_CLOEXEC`` flag [atomic] * ``socketpair()`` + ``os.set_cloexec(fd, True)`` [best-effort] socket.socket.accept() ---------------------- * ``accept4()`` with ``SOCK_CLOEXEC`` flag [atomic] * ``accept()`` + ``os.set_cloexec(fd, True)`` [best-effort] Backward compatibility ====================== There is no backward incompatible change. The default behaviour is unchanged: the close-on-exec flag is not set by default. Appendix: Operating system support ================================== Windows ------- Windows has an ``O_NOINHERIT`` flag: "Do not inherit in child processes". For example, it is supported by ``open()`` and ``_pipe()``. The flag can be cleared using ``SetHandleInformation(fd, HANDLE_FLAG_INHERIT, 0)``. ``CreateProcess()`` has an ``bInheritHandles`` parameter: if it is ``FALSE``, the handles are not inherited. If it is ``TRUE``, handles with ``HANDLE_FLAG_INHERIT`` flag set are inherited. ``subprocess.Popen`` uses ``close_fds`` option to define ``bInheritHandles``. ioctl ----- Functions: * ``ioctl(fd, FIOCLEX, 0)``: set the close-on-exec flag * ``ioctl(fd, FIONCLEX, 0)``: clear the close-on-exec flag Availability: Linux, Mac OS X, QNX, NetBSD, OpenBSD, FreeBSD. fcntl ----- Functions: * ``flags = fcntl(fd, F_GETFD); fcntl(fd, F_SETFD, flags | FD_CLOEXEC)``: set the close-on-exec flag * ``flags = fcntl(fd, F_GETFD); fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC)``: clear the close-on-exec flag Availability: AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, Mac OS X, OpenBSD, Solaris, SunOS, Unicos. Atomic flags ------------ New flags: * ``O_CLOEXEC``: available on Linux (2.6.23), FreeBSD (8.3), OpenBSD 5.0, QNX, BeOS, next NetBSD release (6.1?). This flag is part of POSIX.1-2008. * ``SOCK_CLOEXEC`` flag for ``socket()`` and ``socketpair()``, available on Linux 2.6.27, OpenBSD 5.2, NetBSD 6.0. * ``WSA_FLAG_NO_HANDLE_INHERIT`` flag for ``WSASocket()``: supported on Windows 7 with SP1, Windows Server 2008 R2 with SP1, and later * ``fcntl()``: ``F_DUPFD_CLOEXEC`` flag, available on Linux 2.6.24, OpenBSD 5.0, FreeBSD 9.1, NetBSD 6.0. This flag is part of POSIX.1-2008. * ``recvmsg()``: ``MSG_CMSG_CLOEXEC``, available on Linux 2.6.23, NetBSD 6.0. On Linux older than 2.6.23, ``O_CLOEXEC`` flag is simply ignored. So we have to check that the flag is supported by calling ``fcntl()``. If it does not work, we have to set the flag using ``ioctl()`` or ``fcntl()``. On Linux older than 2.6.27, if the ``SOCK_CLOEXEC`` flag is set in the socket type, ``socket()`` or ``socketpair()`` fail and ``errno`` is set to ``EINVAL``. New functions: * ``dup3()``: available on Linux 2.6.27 (and glibc 2.9) * ``pipe2()``: available on Linux 2.6.27 (and glibc 2.9) * ``accept4()``: available on Linux 2.6.28 (and glibc 2.10) If ``accept4()`` is called on Linux older than 2.6.28, ``accept4()`` returns ``-1`` (fail) and ``errno`` is set to ``ENOSYS``. Links ===== Links: * `Secure File Descriptor Handling `_ (Ulrich Drepper, 2008) * `win32_support.py of the Tornado project `_: emulate fcntl(fd, F_SETFD, FD_CLOEXEC) using ``SetHandleInformation(fd, HANDLE_FLAG_INHERIT, 1)`` Python issues: * `#10115: Support accept4() for atomic setting of flags at socket creation `_ * `#12105: open() does not able to set flags, such as O_CLOEXEC `_ * `#12107: TCP listening sockets created without FD_CLOEXEC flag `_ * `#16500: Add an atfork module `_ * `#16850: Add "e" mode to open(): close-and-exec (O_CLOEXEC) / O_NOINHERIT `_ * `#16860: Use O_CLOEXEC in the tempfile module `_ * `#17036: Implementation of the PEP 433 `_ * `#16946: subprocess: _close_open_fd_range_safe() does not set close-on-exec flag on Linux < 2.6.23 if O_CLOEXEC is defined `_ * `#17070: PEP 433: Use the new cloexec to improve security and avoid bugs `_ Ruby: * `Set FD_CLOEXEC for all fds (except 0, 1, 2) `_ * `O_CLOEXEC flag missing for Kernel::open `_: the `commit was reverted later `_ Footnotes ========= .. [#subprocess_close] On UNIX since Python 3.2, subprocess.Popen() closes all file descriptors by default: ``close_fds=True``. It closes file descriptors in range 3 inclusive to ``local_max_fd`` exclusive, where ``local_max_fd`` is ``fcntl(0, F_MAXFD)`` on NetBSD, or ``sysconf(_SC_OPEN_MAX)`` otherwise. If the error pipe has a descriptor smaller than 3, ``ValueError`` is raised.