PEP: 3151
Title: Reworking the OS and IO exception hierarchy
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2010-07-21
Python-Version: 3.2 or 3.3
Post-History:
Resolution: TBD


Abstract
========

The standard exception hierarchy is an important part of the Python
language.  It has two defining qualities: it is both generic and
selective.  Generic in that the same exception type can be raised
- and handled - regardless of the context (for example, whether you are
trying to add something to an integer, to call a string method, or to write
an object on a socket, a TypeError will be raised for bad argument types).
Selective in that it allows the user to easily handle (silence, examine,
process, store or encapsulate...) specific kinds of error conditions
while letting other errors bubble up to higher calling contexts.  For
example, you can choose to catch ZeroDivisionErrors without affecting
the default handling of other ArithmeticErrors (such as OverflowErrors).

This PEP proposes changes to a part of the exception hierarchy in
order to better embody the qualities mentioned above: the errors
related to operating system calls (OSError, IOError, mmap.error,
select.error, and all their subclasses).


Rationale
=========

Confusing set of OS-related exceptions
--------------------------------------

OS-related (or system call-related) exceptions are currently a diversity
of classes, arranged in the following sub-hierarchies::

    +-- EnvironmentError
        +-- IOError
            +-- io.BlockingIOError
            +-- io.UnsupportedOperation (also inherits from ValueError)
            +-- socket.error
        +-- OSError
            +-- VMSError
            +-- WindowsError
        +-- mmap.error
    +-- select.error

While some of these distinctions can be explained by implementation
considerations, they are often not very logical at a higher level.  The
line separating OSError and IOError, for example, is often blurry.
Consider the following::

    >>> os.remove("fff")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 2] No such file or directory: 'fff'
    >>> open("fff")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'fff'

The same error condition (a non-existing file) gets cast as two different
exceptions depending on which library function was called.  The reason
for this is that the ``os`` module exclusively raises OSError (or its
subclass WindowsError) while the ``io`` module mostly raises IOError.
However, the user is interested in the nature of the error, not in which
part of the interpreter it comes from (since the latter is obvious from
reading the traceback message or application source code).

In fact, it is hard to think of any situation where OSError should be
caught but not IOError, or the reverse.

A further proof of the ambiguity of this segmentation is that the standard
library itself sometimes has problems deciding.  For example, in the
``select`` module, similar failures will raise either ``select.error``,
``OSError`` or ``IOError`` depending on whether you are using select(),
a poll object, a kqueue object, or an epoll object.  This makes user code
uselessly complicated since it has to be prepared to catch various
exception types, depending on which exact implementation of a single
primitive it chooses to use at runtime.

As for WindowsError, it seems to be a pointless distinction.  First, it
only exists on Windows systems, which requires tedious compatibility code
in cross-platform applications (such code can be found in
``Lib/shutil.py``).  Second, it inherits from OSError and is raised for
similar errors as OSError is raised for on other systems.  Third, the user
wanting access to low-level exception specifics has to examine the
``errno`` or ``winerror`` attribute anyway.

.. note::
   `Appendix B`_ surveys the use of the various exception types across
   the interpreter and the standard library.

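The point made in this section, that the same failure is reported through
different classes while carrying the same ``errno``, can be checked with a
small script.  This is only a sketch: ``failure_of`` is a hypothetical helper
introduced here for illustration, and the class names it reports depend on
the interpreter version it is run with.

```python
import errno
import os
import tempfile


def failure_of(func, *args):
    """Call func(*args); return (exception class name, errno) if it fails."""
    try:
        func(*args)
    except EnvironmentError as exc:  # common base of OSError and IOError
        return type(exc).__name__, exc.errno
    return None


# A path inside a fresh temporary directory, so it cannot already exist.
missing = os.path.join(tempfile.mkdtemp(), "fff")

# Both calls fail for the same reason; compare the class names and errnos.
print(failure_of(os.remove, missing))
print(failure_of(open, missing))
```

Whatever class names are printed, the second element of both tuples is
``errno.ENOENT``, which is the information the caller actually cares about.
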
Lack of fine-grained exceptions
-------------------------------

The current variety of OS-related exceptions doesn't allow the user to
filter easily for the desired kinds of failures.  As an example, consider
the task of deleting a file if it exists.  The Look Before You Leap (LBYL)
idiom suffers from an obvious race condition::

    if os.path.exists(filename):
        os.remove(filename)

If a file named as ``filename`` is created by another thread or process
between the calls to ``os.path.exists`` and ``os.remove``, it won't be
deleted.  This can produce bugs in the application, or even security issues.

Therefore, the solution is to try to remove the file, and ignore the error
if the file doesn't exist (an idiom known as Easier to Ask Forgiveness
than to get Permission, or EAFP).  Careful code will read like the following
(which works under both POSIX and Windows systems)::

    try:
        os.remove(filename)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise

or even::

    try:
        os.remove(filename)
    except EnvironmentError as e:
        if e.errno != errno.ENOENT:
            raise

This is a lot more to type, and also forces the user to remember the various
cryptic mnemonics from the ``errno`` module.  It imposes an additional
cognitive burden and gets tiresome rather quickly.  Consequently, many
programmers will instead write the following code, which silences exceptions
too broadly::

    try:
        os.remove(filename)
    except OSError:
        pass

``os.remove`` can raise an OSError not only when the file doesn't exist,
but in other possible situations (for example, the filename points to a
directory, or the current process doesn't have permission to remove
the file), which all indicate bugs in the application logic and therefore
shouldn't be silenced.  What the programmer would like to write instead is
something such as::

    try:
        os.remove(filename)
    except FileNotFoundError:
        pass


Compatibility strategy
======================

Reworking the exception hierarchy will obviously change the exact semantics
of at least some existing code.  While it is not possible to improve on the
current situation without changing exact semantics, it is possible to define
a narrower type of compatibility, which we will call **useful compatibility**,
and define as follows:

* *useful compatibility* doesn't make exception catching any narrower, but
  it can be broader for *naïve* exception-catching code.  Given the following
  kind of snippet, all exceptions caught before this PEP will also be
  caught after this PEP, but the reverse may be false::

      try:
          os.remove(filename)
      except OSError:
          pass

* *useful compatibility* doesn't alter the behaviour of *careful*
  exception-catching code.  Given the following kind of snippet, the same
  errors should be silenced or re-raised, regardless of whether this PEP
  has been implemented or not::

      try:
          os.remove(filename)
      except OSError as e:
          if e.errno != errno.ENOENT:
              raise

The rationale for this compromise is that careless (or "naïve") code
can't really be helped, but at least code which "works" won't suddenly
raise errors and crash.  This is important since such code is likely to
be present in scripts used as cron tasks or automated system administration
programs.

Careful code should not be penalized.


.. _Step 1:

Step 1: coalesce exception types
================================

The first step of the resolution is to coalesce existing exception types.
The following changes are proposed:

* alias both socket.error and select.error to IOError
* alias mmap.error to OSError
* alias both WindowsError and VMSError to OSError
* alias OSError to IOError
* coalesce EnvironmentError into IOError

None of these changes preserves exact compatibility, but each of them does
preserve *useful compatibility* (see "compatibility" section above).

Each of these changes can be accepted or refused individually, but of course
it is considered that the greatest impact can be achieved if this first step
is accepted in full.  In this case, the IO exception sub-hierarchy would
become::

    +-- IOError (replacing OSError, WindowsError, EnvironmentError, etc.)
        +-- io.BlockingIOError
        +-- io.UnsupportedOperation (also inherits from ValueError)

Justification
-------------

Not only does this first step present the user a simpler landscape as
explained in the rationale_ section, but it also allows for a better
and more complete resolution of `Step 2`_ (see Prerequisite_).

The rationale for keeping ``IOError`` as the official name for generic
OS-related exceptions is the survey in `Appendix B`_, which shows it is the
dominant error today in the standard library.  ``EnvironmentError`` might be
more accurate, but it is more tedious to type and also much lesser-known.
As for third-party Python code, Google Code Search shows IOError
being ten times more popular than EnvironmentError in user code, and three
times more popular than OSError [3]_.

Exception attributes
--------------------

Coalescing WindowsError would mean the ``winerror`` attribute would be
present on all platforms, just set to ``None`` if the platform
isn't Windows.  Indeed, ``errno``, ``filename`` and ``strerror`` can all
already be None, as is often the case when IOError is raised directly
by Python code.

Deprecation of names
--------------------

It is not yet decided whether the old names will be deprecated (then removed)
or all names will continue living forever in the builtins namespace.

built-in exceptions
'''''''''''''''''''

Deprecating the old built-in exceptions cannot be done in a straightforward
fashion by intercepting all lookups in the builtins namespace, since these
are performance-critical.  We also cannot work at the object level, since
the deprecated names will be aliased to non-deprecated objects.

A solution is to recognize these names at compilation time, and
then emit a separate ``LOAD_OLD_GLOBAL`` opcode instead of the regular
``LOAD_GLOBAL``.  This specialized opcode will handle the output of a
DeprecationWarning (or PendingDeprecationWarning, depending on the policy
decided upon) when the name doesn't exist in the globals namespace, but
only in the builtins one.  This will be enough to avoid false positives
(for example if someone defines their own ``OSError`` in a module), and
false negatives will be rare (for example when someone accesses ``OSError``
through the ``builtins`` module rather than directly).

module-level exceptions
'''''''''''''''''''''''

The above approach cannot be used easily, since it would require
special-casing some modules when compiling code objects.  However, these
names are by construction much less visible (they don't appear in the
builtins namespace), and lesser-known too, so we might decide to let them
live in their own namespaces.


.. _Step 2:

Step 2: define additional subclasses
====================================

The second step of the resolution is to extend the hierarchy by defining
subclasses which will be raised, rather than their parent, for specific
errno values.  Which errno values to include is subject to discussion, but
a survey of existing exception matching practices (see `Appendix A`_) helps
us propose a reasonable subset of all values.  Trying to map all errno
mnemonics, indeed, seems foolish, pointless, and would pollute the root
namespace.

Furthermore, in a couple of cases, different errno values could raise
the same exception subclass.  For example, EAGAIN, EALREADY, EWOULDBLOCK
and EINPROGRESS are all used to signal that an operation on a non-blocking
socket would block (and therefore needs trying again later).  They could
therefore all raise an identical subclass and let the user examine the
``errno`` attribute if (s)he so desires (see below "exception
attributes").

Prerequisite
------------

`Step 1`_ is a loose prerequisite for this.

Prerequisite, because some errnos can currently be attached to different
exception classes: for example, EBADF can be attached to both OSError and
IOError, depending on the context.  If we don't want to break *useful
compatibility*, we can't make an ``except OSError`` (or IOError) fail to
match an exception where it would succeed today.

Loose, because we could decide for a partial resolution of step 2
if existing exception classes are not coalesced: for example, EBADF could
raise a hypothetical BadFileDescriptor where an IOError was previously
raised, but continue to raise OSError otherwise.

The dependency on step 1 could be totally removed if the new subclasses
used multiple inheritance to match with all of the existing superclasses
(or, at least, OSError and IOError, which are arguably the most prevalent
ones).  It would, however, make the hierarchy more complicated and
therefore harder to grasp for the user.

New exception classes
---------------------

The following tentative list of subclasses, along with a description and
the list of errnos mapped to them, is submitted to discussion:

* ``FileExistsError``: trying to create a file or directory which already
  exists (EEXIST)

* ``FileNotFoundError``: for all circumstances where a file or directory is
  requested but doesn't exist (ENOENT)

* ``IsADirectoryError``: file-level operation (open(), os.remove()...)
  requested on a directory (EISDIR)

* ``NotADirectoryError``: directory-level operation requested on something
  else (ENOTDIR)

* ``PermissionError``: trying to run an operation without the adequate access
  rights - for example filesystem permissions (EACCES, EPERM)

* ``BlockingIOError``: an operation would block on an object (e.g. socket) set
  for non-blocking operation (EAGAIN, EALREADY, EWOULDBLOCK, EINPROGRESS);
  this is the existing ``io.BlockingIOError`` with an extended role

* ``FileDescriptorError``: operation on an invalid file descriptor (EBADF);
  the default error message could point out that the most common cause is
  an existing file descriptor having been closed

* ``ConnectionAbortedError``: connection attempt aborted by peer (ECONNABORTED)

* ``ConnectionRefusedError``: connection refused by peer (ECONNREFUSED)

* ``ConnectionResetError``: connection reset by peer (ECONNRESET)

* ``TimeoutError``: connection timed out (ETIMEDOUT); this can be re-cast
  as a generic timeout exception, useful for other types of timeout (for
  example in Lock.acquire())

In addition, the following exception classes are proposed for inclusion:

* ``ConnectionError``: a base class for ``ConnectionAbortedError``,
  ``ConnectionRefusedError`` and ``ConnectionResetError``

* ``FileSystemError``: a base class for ``FileExistsError``,
  ``FileNotFoundError``, ``IsADirectoryError`` and ``NotADirectoryError``

The following drawing tries to sum up the proposed additions, along with
the corresponding errno values (where applicable).  The root of the
sub-hierarchy (IOError, assuming `Step 1`_ is accepted in full) is not
shown::

    +-- BlockingIOError        EAGAIN, EALREADY, EWOULDBLOCK, EINPROGRESS
    +-- ConnectionError
        +-- ConnectionAbortedError                           ECONNABORTED
        +-- ConnectionRefusedError                           ECONNREFUSED
        +-- ConnectionResetError                               ECONNRESET
    +-- FileDescriptorError                                         EBADF
    +-- FileSystemError
        +-- FileExistsError                                        EEXIST
        +-- FileNotFoundError                                      ENOENT
        +-- IsADirectoryError                                      EISDIR
        +-- NotADirectoryError                                    ENOTDIR
    +-- PermissionError                                     EACCES, EPERM
    +-- TimeoutError                                            ETIMEDOUT

Naming
------

Various naming controversies can arise.  One of them is whether all
exception class names should end in "``Error``".  In favour is consistency
with the rest of the exception hierarchy, against is concision (especially
with long names such as ``ConnectionAbortedError``).

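Setting naming aside for a moment, the errno-to-subclass mapping listed under
"New exception classes" can be sketched in pure Python.  Everything below is
an assumption made for illustration: ``PROPOSED``, ``SKETCH_CLASSES``,
``ERRNO_TO_CLASS`` and ``make_io_error`` are hypothetical names, the classes
are local stand-ins rather than builtins, and the mnemonics use the spellings
found in the ``errno`` module (``EACCES``, ``ETIMEDOUT``).

```python
import errno

# Tentative class names and the errno mnemonics mapped to them (assumption:
# copied from the list in this PEP, using the errno module's spellings).
PROPOSED = {
    "FileExistsError": ("EEXIST",),
    "FileNotFoundError": ("ENOENT",),
    "IsADirectoryError": ("EISDIR",),
    "NotADirectoryError": ("ENOTDIR",),
    "PermissionError": ("EACCES", "EPERM"),
    "BlockingIOError": ("EAGAIN", "EALREADY", "EWOULDBLOCK", "EINPROGRESS"),
    "FileDescriptorError": ("EBADF",),
    "ConnectionAbortedError": ("ECONNABORTED",),
    "ConnectionRefusedError": ("ECONNREFUSED",),
    "ConnectionResetError": ("ECONNRESET",),
    "TimeoutError": ("ETIMEDOUT",),
}

# Local stand-in classes deriving from IOError, so the sketch does not
# depend on which exception classes the running interpreter provides.
SKETCH_CLASSES = {name: type(name, (IOError,), {}) for name in PROPOSED}

# Several errno values may share one class (e.g. EAGAIN and EWOULDBLOCK).
ERRNO_TO_CLASS = {
    getattr(errno, mnemonic): SKETCH_CLASSES[name]
    for name, mnemonics in PROPOSED.items()
    for mnemonic in mnemonics
}


def make_io_error(code, message, filename=None):
    """Build the most specific sketch class for an errno value.

    Unmapped errno values fall back to a plain IOError; errno, strerror
    and filename stay available on the instance in every case.
    """
    cls = ERRNO_TO_CLASS.get(code, IOError)
    if filename is None:
        return cls(code, message)
    return cls(code, message, filename)


exc = make_io_error(errno.ENOENT, "No such file or directory", "fff")
print(type(exc).__name__, exc.errno == errno.ENOENT, exc.filename)
```

Because every stand-in class derives from ``IOError`` and keeps its ``errno``
attribute, both an ``except IOError`` clause and an explicit ``errno`` check
keep matching, which is what *useful compatibility* asks for.
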
Another cosmetic issue is whether ``FileSystemError`` should be spelled
``FilesystemError`` instead.

Exception attributes
--------------------

In order to preserve *useful compatibility*, these subclasses should still
set adequate values for the various exception attributes defined on the
superclass (for example ``errno``, ``filename``, and optionally
``winerror``).

Implementation
--------------

Since it is proposed that the subclasses are raised based purely on the
value of ``errno``, little or no changes should be required in extension
modules (either standard or third-party).  As long as they use the
``PyErr_SetFromErrno()`` family of functions (or the
``PyErr_SetFromWindowsErr()`` family of functions under Windows), they
should automatically benefit from the new, finer-grained exception classes.

Library modules written in Python, though, will have to be adapted where
they currently use the following idiom (seen in ``Lib/tempfile.py``)::

    raise IOError(_errno.EEXIST, "No usable temporary file name found")

Fortunately, such Python code is quite rare since raising OSError or IOError
with an errno value normally happens when interfacing with system calls,
which is usually done in C extensions.

If there is popular demand, the subroutine choosing an exception type based
on the errno value could be exposed for use in pure Python.


Possible objections
===================

Namespace pollution
-------------------

Making the exception hierarchy finer-grained makes the root (or builtins)
namespace larger.  This is to be moderated, however, as:

* only a handful of additional classes are proposed;

* while standard exception types live in the root namespace, they are
  visually distinguished by the fact that they use the CamelCase convention,
  while almost all other builtins use lowercase naming (except True, False,
  None, Ellipsis and NotImplemented)

An alternative would be to provide a separate module containing the
finer-grained exceptions, but that would defeat the purpose of
encouraging careful code over careless code, since the user would first
have to import the new module instead of using names already accessible.


Earlier discussion
==================

While this is the first time such a formal proposal is made, the idea
has received informal support in the past [1]_; both the introduction
of finer-grained exception classes and the coalescing of OSError and
IOError.

The removal of WindowsError alone has been discussed and rejected
as part of another PEP [2]_, but there seemed to be a consensus that the
distinction with OSError wasn't meaningful.  This supports at least its
aliasing with OSError.


Moratorium
==========

The moratorium in effect on language builtins means this PEP has little
chance to be accepted for Python 3.2.


Possible alternative
====================

Pattern matching
----------------

Another possibility would be to introduce an advanced pattern matching
syntax when catching exceptions.  For example::

    try:
        os.remove(filename)
    except OSError as e if e.errno == errno.ENOENT:
        pass

Several problems with this proposal:

* it introduces new syntax, which is perceived by the author to be a heavier
  change compared to reworking the exception hierarchy
* it doesn't decrease typing effort significantly
* it doesn't relieve the programmer from the burden of having to remember
  errno mnemonics


Exceptions ignored by this PEP
==============================

This PEP ignores ``EOFError``, which signals a truncated input stream in
various protocol and file format implementations (for example ``GzipFile``).
``EOFError`` is not OS- or IO-related, it is a logical error raised at
a higher level.

This PEP also ignores ``SSLError``, which is raised by the ``ssl`` module
in order to propagate errors signalled by the ``OpenSSL`` library.  Ideally,
``SSLError`` would benefit from a similar but separate treatment since it
defines its own constants for error types (``ssl.SSL_ERROR_WANT_READ``,
etc.).


.. _Appendix A:

Appendix A: Survey of common errnos
===================================

This is a quick inventory of the various errno mnemonics checked for in
the standard library and its tests, as part of ``except`` clauses.

Common errnos with OSError
--------------------------

* ``EBADF``: bad file descriptor (usually means the file descriptor was
  closed)

* ``EEXIST``: file or directory exists

* ``EINTR``: interrupted function call

* ``EISDIR``: is a directory

* ``ENOTDIR``: not a directory

* ``ENOENT``: no such file or directory

* ``EOPNOTSUPP``: operation not supported on socket
  (possible confusion with the existing io.UnsupportedOperation)

* ``EPERM``: operation not permitted (when using e.g. os.setuid())

Common errnos with IOError
--------------------------

* ``EACCES``: permission denied (for filesystem operations)

* ``EBADF``: bad file descriptor (with select.epoll); read operation on a
  write-only GzipFile, or vice-versa

* ``EBUSY``: device or resource busy

* ``EISDIR``: is a directory (when trying to open())

* ``ENODEV``: no such device

* ``ENOENT``: no such file or directory (when trying to open())

* ``ETIMEDOUT``: connection timed out

Common errnos with socket.error
-------------------------------

All these errors may also be associated with a plain IOError, for example
when calling read() on a socket's file descriptor.

* ``EAGAIN``: resource temporarily unavailable (during a non-blocking socket
  call except connect())

* ``EALREADY``: connection already in progress (during a non-blocking
  connect())

* ``EINPROGRESS``: operation in progress (during a non-blocking connect())

* ``EINTR``: interrupted function call

* ``EISCONN``: the socket is connected

* ``ECONNABORTED``: connection aborted by peer (during an accept() call)

* ``ECONNREFUSED``: connection refused by peer

* ``ECONNRESET``: connection reset by peer

* ``ENOTCONN``: socket not connected

* ``ESHUTDOWN``: cannot send after transport endpoint shutdown

* ``EWOULDBLOCK``: same reasons as ``EAGAIN``

Common errnos with select.error
-------------------------------

* ``EINTR``: interrupted function call


.. _Appendix B:

Appendix B: Survey of raised OS and IO errors
=============================================

About VMSError
--------------

VMSError is completely unused by the interpreter core and the standard
library.  It was added as part of the OpenVMS patches submitted in 2002
by Jean-François Piéronne [4]_; the motivation for including VMSError was that
it could be raised by third-party packages.

Interpreter core
----------------

Handling of PYTHONSTARTUP raises IOError (but the error gets discarded)::

    $ PYTHONSTARTUP=foox ./python
    Python 3.2a0 (py3k:82920M, Jul 16 2010, 22:53:23)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    Could not open PYTHONSTARTUP
    IOError: [Errno 2] No such file or directory: 'foox'

``PyObject_Print()`` raises IOError when ferror() signals an error on the
`FILE *` parameter (which, in the source tree, is always either stdout or
stderr).

Unicode encoding and decoding using the ``mbcs`` encoding can raise
WindowsError for some error conditions.

Standard library
----------------

bz2
'''

Raises IOError throughout (OSError is unused)::

    >>> bz2.BZ2File("foox", "rb")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory
    >>> bz2.BZ2File("LICENSE", "rb").read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: invalid data stream
    >>> bz2.BZ2File("/tmp/zzz.bz2", "wb").read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: file is not ready for reading

curses
''''''

Not examined.

dbm.gnu, dbm.ndbm
'''''''''''''''''

_dbm.error and _gdbm.error inherit from IOError::

    >>> dbm.gnu.open("foox")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    _gdbm.error: [Errno 2] No such file or directory

fcntl
'''''

Raises IOError throughout (OSError is unused).

imp module
''''''''''

Raises IOError for bad file descriptors::

    >>> imp.load_source("foo", "foo", 123)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 9] Bad file descriptor

io module
'''''''''

Raises IOError when trying to open a directory under Unix::

    >>> open("Python/", "r")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 21] Is a directory: 'Python/'

Raises IOError or io.UnsupportedOperation (which inherits from the former)
for unsupported operations::

    >>> open("LICENSE").write("bar")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: not writable
    >>> io.StringIO().fileno()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    io.UnsupportedOperation: fileno
    >>> open("LICENSE").seek(1, 1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: can't do nonzero cur-relative seeks

Raises either IOError or TypeError when the inferior I/O layer misbehaves
(i.e. violates the API it is expected to implement).

Raises IOError when the underlying OS resource becomes invalid::

    >>> f = open("LICENSE")
    >>> os.close(f.fileno())
    >>> f.read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 9] Bad file descriptor

...or for implementation-specific optimizations::

    >>> f = open("LICENSE")
    >>> next(f)
    'A. HISTORY OF THE SOFTWARE\n'
    >>> f.tell()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: telling position disabled by next() call

Raises BlockingIOError (inheriting from IOError) when a call on a non-blocking
object would block.

mmap
''''

Under Unix, raises its own ``mmap.error`` (inheriting from EnvironmentError)
throughout::

    >>> mmap.mmap(123, 10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    mmap.error: [Errno 9] Bad file descriptor
    >>> mmap.mmap(os.open("/tmp", os.O_RDONLY), 10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    mmap.error: [Errno 13] Permission denied

Under Windows, however, it mostly raises WindowsError (the source code
also shows a few occurrences of ``mmap.error``)::

    >>> fd = os.open("LICENSE", os.O_RDONLY)
    >>> m = mmap.mmap(fd, 16384)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 5] Accès refusé
    >>> sys.last_value.errno
    13
    >>> errno.errorcode[13]
    'EACCES'

    >>> m = mmap.mmap(-1, 4096)
    >>> m.resize(16384)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 87] Paramètre incorrect
    >>> sys.last_value.errno
    22
    >>> errno.errorcode[22]
    'EINVAL'

multiprocessing
'''''''''''''''

Not examined.

os / posix
''''''''''

The ``os`` (or ``posix``) module raises OSError throughout, except under
Windows where WindowsError can be raised instead.

ossaudiodev
'''''''''''

Raises IOError throughout (OSError is unused)::

    >>> ossaudiodev.open("foo", "r")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'foo'

readline
''''''''

Raises IOError in various file-handling functions::

    >>> readline.read_history_file("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory
    >>> readline.read_init_file("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory
    >>> readline.write_history_file("/dev/nonexistent")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 13] Permission denied

select
''''''

* select() and poll objects raise ``select.error``, which doesn't inherit from
  anything (but poll.modify() raises IOError);
* epoll objects raise IOError;
* kqueue objects raise both OSError and IOError.

signal
''''''

``signal.ItimerError`` inherits from IOError.

socket
''''''

``socket.error`` inherits from IOError.

sys
'''

``sys.getwindowsversion()`` raises WindowsError with a bogus error number
if the ``GetVersionEx()`` call fails.

time
''''

Raises IOError for internal errors in time.time() and time.sleep().

zipimport
'''''''''

zipimporter.get_data() can raise IOError.


Acknowledgments
===============

Significant input has been received from Nick Coghlan.


References
==========

.. [1] "IO module precisions and exception hierarchy":
   http://mail.python.org/pipermail/python-dev/2009-September/092130.html

.. [2] Discussion of "Removing WindowsError" in PEP 348:
   http://www.python.org/dev/peps/pep-0348/#removing-windowserror

.. [3] Google Code Search of ``IOError`` in Python code: around 40000
   results; ``OSError``: around 15200 results; ``EnvironmentError``:
   around 3000 results

.. [4] http://bugs.python.org/issue614055


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: