Splits PEP 551 into PEP 578 (#674)

2018-06-19 17:09:57 -07:00 · 2018-06-19 17:09:57 -07:00 · 87fb9ab25a
parent 9900d8d696
commit 87fb9ab25a
2 changed files with 622 additions and 458 deletions
--- a/pep-0551.rst
+++ b/pep-0551.rst
@@ -4,28 +4,37 @@ Version: $Revision$
 Last-Modified: $Date$
 Author: Steve Dower <steve.dower@python.org>
 Status: Draft
-Type: Standards Track
+Type: Informational
 Content-Type: text/x-rst
 Created: 23-Aug-2017
 Python-Version: 3.7
 Post-History: 24-Aug-2017 (security-sig), 28-Aug-2017 (python-dev)

+Status
+======
+
+This PEP is currently in the process of being split into two.
+
+See PEP 578 for the new auditing APIs proposed for addition to the next
+version of Python.
+
+PEP 551 is now a draft informational PEP, providing guidance to those
+planning to integrate Python into their secure or audited environments.
+
 Abstract
 ========

-This PEP describes additions to the Python API and specific behaviors
-for the CPython implementation that make actions taken by the Python
-runtime visible to security and auditing tools. The goals in order of
-increasing importance are to prevent malicious use of Python, to detect
-and report on malicious use, and most importantly to detect attempts to
-bypass detection. Most of the responsibility for implementation is
-required from users, who must customize and build Python for their own
-environment.
+This PEP describes the concept of security transparency and how it
+applies to the Python runtime. Visibility into actions taken by the
+runtime is invaluable in integrating Python into an otherwise secure
+and/or monitored environment.

-We propose two small sets of public APIs to enable users to reliably
-build their copy of Python without having to modify the core runtime,
-protecting future maintainability. We also discuss recommendations for
-users to help them develop and configure their copy of Python.
+The audit hooks described in PEP 578 are an essential component in
+detecting, identifying and analyzing misuse of Python. While the hooks
+themselves are neutral (in that not every reported event is inherently
+misuse), they provide essential context to those who are responsible
+for monitoring an overall system or network. With enough transparency,
+attackers are no longer able to hide.

 Background
 ==========
@@ -126,14 +135,14 @@ tools, most network access and DNS resolution, and attempts to create
 and hide files or configuration settings on the local machine.

 To summarize, defenders have a need to audit specific uses of Python in
-order to detect abnormal or malicious usage. Currently, the Python
-runtime does not provide any ability to do this, which (anecdotally) has
-led to organizations switching to other languages. The aim of this PEP
-is to enable system administrators to deploy a security transparent copy
-of Python that can integrate with their existing auditing and protection
-systems.
+order to detect abnormal or malicious usage. With PEP 578, the Python
+runtime gains the ability to provide this. The aim of this PEP is to
+assist system administrators with deploying a security transparent
+version of Python that can integrate with their existing auditing and
+protection systems.

-On Windows, some specific features that may be enabled by this include:
+On Windows, some specific features that may be integrated through the
+hooks added by PEP 578 include:

 * Script Block Logging [3]_
 * DeviceGuard [4]_
@@ -151,7 +160,7 @@ On Linux, some specific features that may be integrated are:
 * SELinux labels [13]_
 * check execute bit on imported modules

-On macOS, some features that may be used with the expanded APIs are:
+On macOS, some features that may be integrated are:

 * OpenBSM [10]_
 * syslog [11]_
@@ -161,9 +170,6 @@ production machines is highly appealing to system administrators and
 will make Python a more trustworthy dependency for application
 developers.

-Overview of Changes
-===================
-
 True security transparency is not fully achievable by Python in
 isolation. The runtime can audit as many events as it likes, but unless
 the logs are reviewed and analyzed there is no value. Python may impose
@@ -173,340 +179,64 @@ implementations of certain security features, and organizations with the
 resources to fully customize their runtime should be encouraged to do
 so.

-The aim of these changes is to enable system administrators to integrate
-Python into their existing security systems, without dictating what
-those systems look like or how they should behave. We propose two API
-changes to enable this: an Audit Hook and Verified Open Hook. Both are
-not set by default, and both require modifications to the entry point
-binary to enable any functionality. For the purposes of validation and
-example, we propose a new ``spython``/``spython.exe`` entry point
-program that enables some basic functionality using these hooks.
-**However, security-conscious organizations are expected to create their
-own entry points to meet their own needs.**
+Summary Recommendations
+=======================

-Audit Hook
----------
+These are discussed in greater detail in later sections, but are
+presented here to frame the overall discussion.

-In order to achieve security transparency, an API is required to raise
-messages from within certain operations. These operations are typically
-deep within the Python runtime or standard library, such as dynamic code
-compilation, module imports, DNS resolution, or use of certain modules
-such as ``ctypes``.
+Sysadmins should provide and use an alternate entry point (besides
+``python.exe`` or ``pythonX.Y``) in order to reduce surface area and
+securely enable audit hooks. A discussion of what could be restricted
+is below in `Restricting the Entry Point`_.

-The new C APIs required for audit hooks are::
+Sysadmins should use all available measures provided by their operating
+system to prevent modifications to their Python installation, such as
+file permissions, access control lists and signature validation.

-   # Add an auditing hook
-   typedef int (*hook_func)(const char *event, PyObject *args,
-                            void *userData);
-   int PySys_AddAuditHook(hook_func hook, void *userData);
+Sysadmins should log everything and collect logs to a central location
+as quickly as possible - avoid keeping logs on outer-ring machines.

-   # Raise an event with all auditing hooks
-   int PySys_Audit(const char *event, PyObject *args);
-
-   # Internal API used during Py_Finalize() - not publicly accessible
-   void _Py_ClearAuditHooks(void);
-
-The new Python APIs for audit hooks are::
-
-   # Add an auditing hook
-   sys.addaudithook(hook: Callable[str, tuple]) -> None
-
-   # Raise an event with all auditing hooks
-   sys.audit(str, *args) -> None
+Sysadmins should prioritize *detection* of misuse over *prevention* of
+misuse.


-Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
-including before ``Py_Initialize()``, or by calling
-``sys.addaudithook()`` from Python code. Hooks are never removed or
-replaced, and existing hooks have an opportunity to refuse to allow new
-hooks to be added (adding an audit hook is audited, and so preexisting
-hooks can raise an exception to block the new addition).
+Restricting the Entry Point
+===========================

-When events of interest are occurring, code can either call
-``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
-string argument is the name of the event, and the tuple contains
-arguments. A given event name should have a fixed schema for arguments,
-and both arguments are considered a public API (for a given x.y version
-of Python), and thus should only change between feature releases with
-updated documentation.
+One of the primary vulnerabilities exposed by the presence of Python
+on a machine is the ability to execute arbitrary code without
+detection or verification by the system. This is made significantly
+easier because the default entry point (``python.exe`` on Windows and
+``pythonX.Y`` on other platforms) allows execution from the command
+line and from standard input, and does not have any hooks enabled by
+default.

-When an event is audited, each hook is called in the order it was added
-with the event name and tuple. If any hook returns with an exception
-set, later hooks are ignored and *in general* the Python runtime should
-terminate. This is intentional to allow hook implementations to decide
-how to respond to any particular event. The typical responses will be to
-log the event, abort the operation with an exception, or to immediately
-terminate the process with an operating system exit call.
+Our recommendation is that production machines should use a modified
+entry point instead of the default. Once outside of the development
+environment, there is rarely a need for the flexibility offered by the
+default entry point.

-When an event is audited but no hooks have been set, the ``audit()``
-function should include minimal overhead. Ideally, each argument is a
-reference to existing data rather than a value calculated just for the
-auditing call.
+In this section, we describe a hypothetical ``spython`` entry point
+(``spython.exe`` on Windows; ``spythonX.Y`` on other platforms) that
+provides a level of security transparency recommended for production
+machines. An associated example implementation shows many of the
+features described here, though with a number of concessions for the
+sake of avoiding platform-specific code. A sufficient implementation
+will inherently require some integration with platform-specific
+security features.

-As hooks may be Python objects, they need to be freed during
-``Py_Finalize()``. To do this, we add an internal API
-``_Py_ClearAuditHooks()`` that releases any ``PyObject*`` hooks that are
-held, as well as any heap memory used. This is an internal function with
-no public export, but it triggers an event for all audit hooks to ensure
-that unexpected calls are logged.
+Official distributions will not include any ``spython`` by default, but
+third party distributions may include appropriately modified entry
+points that use the same name.

-See `Audit Hook Locations`_ for proposed audit hook points and schemas,
-and the `Recommendations`_ section for discussion on
-appropriate responses.
-
-Verified Open Hook
------------------
-
-Most operating systems have a mechanism to distinguish between files
-that can be executed and those that can not. For example, this may be an
-execute bit in the permissions field, or a verified hash of the file
-contents to detect potential code tampering. These are an important
-security mechanism for preventing execution of data or code that is not
-approved for a given environment. Currently, Python has no way to
-integrate with these when launching scripts or importing modules.
-
-The new public C API for the verified open hook is::
-
-   # Set the handler
-   typedef PyObject *(*hook_func)(PyObject *path)
-   int PyImport_SetOpenForImportHook(void *handler)
-
-   # Open a file using the handler
-   PyObject *PyImport_OpenForImport(const char *path)
-
-The new public Python API for the verified open hook is::
-
-   # Open a file using the handler
-   _imp.open_for_import(path)
-
-The ``_imp.open_for_import()`` function is a drop-in replacement for
-``open(str(pathlike), 'rb')``. Its default behaviour is to open a file
-for raw, binary access - any more restrictive behaviour requires the
-use of a custom handler. Only ``str`` arguments are accepted.
-
-A custom handler may be set by calling ``PyImport_SetOpenForImportHook()``
-from C at any time, including before ``Py_Initialize()``. However, if a
-hook has already been set then the call will fail. When
-``open_for_import()`` is called with a hook set, the hook will be passed
-the path and its return value will be returned directly. The returned
-object should be an open file-like object that supports reading raw
-bytes. This is explicitly intended to allow a ``BytesIO`` instance if
-the open handler has already had to read the file into memory in order
-to perform whatever verification is necessary to determine whether the
-content is permitted to be executed.
-
-Note that these hooks can import and call the ``_io.open()`` function on
-CPython without triggering themselves.
-
-If the hook determines that the file is not suitable for execution, it
-should raise an exception of its choice, as well as raising any other
-auditing events or notifications.
-
-All import and execution functionality involving code from a file will
-be changed to use ``open_for_import()`` unconditionally. It is important
-to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
-through this function - an audit hook that includes the code from these
-calls will be added and is the best opportunity to validate code that is
-read from the file. Given the current decoupling between import and
-execution in Python, most imported code will go through both
-``open_for_import()`` and the log hook for ``compile``, and so care
-should be taken to avoid repeating verification steps.
-
-.. note::
-   The use of ``open_for_import()`` by ``importlib`` is a valuable
-   first defence, but should not be relied upon to prevent misuse. In
-   particular, it is easy to monkeypatch ``importlib`` in order to
-   bypass the call. Auditing hooks are the primary way to achieve
-   security transparency, and are essential for detecting attempts to
-   bypass other functionality.
-
-API Availability
----------------
-
-While all the functions added here are considered public and stable API,
-the behavior of the functions is implementation specific. The
-descriptions here refer to the CPython implementation, and while other
-implementations should provide the functions, there is no requirement
-that they behave the same.
-
-For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
-may do nothing. This allows code to make calls to ``sys.audit()``
-without having to test for existence, but it should not assume that its
-call will have any effect. (Including existence tests in
-security-critical code allows another vector to bypass auditing, so it
-is preferable that the function always exist.)
-
-``_imp.open_for_import(path)`` should at a minimum always return
-``_io.open(path, 'rb')``. Code using the function should make no further
-assumptions about what may occur, and implementations other than CPython
-are not required to let developers override the behavior of this
-function with a hook.
-
-Audit Hook Locations
-====================
-
-Calls to ``sys.audit()`` or ``PySys_Audit()`` will be added to the
-following operations with the schema in Table 1. Unless otherwise
-specified, the ability for audit hooks to abort any listed operation
-should be considered part of the rationale for including the hook.
-
-.. csv-table:: Table 1: Audit Hooks
-   :header: "API Function", "Event Name", "Arguments", "Rationale"
-   :widths: 2, 2, 3, 6
-   
-   ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
-   audit hooks are being added.
-   "
-   ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
-   hooks they are being cleaned up, mainly in case the event is
-   triggered unexpectedly. This event cannot be aborted.
-   "
-   ``PyImport_SetOpenForImportHook``, ``setopenforimporthook``, "", "
-   Detects any attempt to set the ``open_for_import`` hook.
-   "
-   "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
-   ``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
-   Detect dynamic code compilation, where ``code`` could be a string or
-   AST. Note that this will be called for regular imports of source
-   code, including those that were opened with ``open_for_import``.
-   "
-   "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
-   Detect dynamic execution of code objects. This only occurs for
-   explicit calls, and is not raised for normal function invocation.
-   "
-   ``import``, ``import``, "``(module, filename, sys.path,
-   sys.meta_path, sys.path_hooks)``", "Detect when modules are
-   imported. This is raised before the module name is resolved to a
-   file. All arguments other than the module name may be ``None`` if
-   they are not used or available.
-   "
-   ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
-   Detect dynamic creation of code objects. This only occurs for
-   direct instantiation, and is not raised for normal compilation.
-   "
-   ``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect
-   dynamic creation of function objects. This only occurs for direct
-   instantiation, and is not raised for normal compilation.
-   "
-   "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
-   ``(module_or_path,)``", "Detect when native modules are used.
-   "
-   ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
-   Collect information about specific symbols retrieved from native
-   modules.
-   "
-   ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
-   when code is accessing arbitrary memory using ``ctypes``.
-   "
-   ``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing
-   the id of objects, which in CPython reveals information about
-   memory layout.
-   "
-   ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
-   when code is accessing frames directly.
-   "
-   ``sys._current_frames``, ``sys._current_frames``, "", "Detect when
-   code is accessing frames directly.
-   "
-   ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
-   injecting trace functions. Because of the implementation, exceptions
-   raised from the hook will abort the operation, but will not be
-   raised in Python code. Note that ``threading.setprofile`` eventually
-   calls this function, so the event will be audited for each thread.
-   "
-   ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
-   injecting trace functions. Because of the implementation, exceptions
-   raised from the hook will abort the operation, but will not be
-   raised in Python code. Note that ``threading.settrace`` eventually
-   calls this function, so the event will be audited for each thread.
-   "
-   ``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "
-   ", "Detect changes to async generator hooks.
-   "
-   ``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "
-   ", "Detect changes to async generator hooks.
-   "
-   ``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "
-   ", "Detect changes to the coroutine wrapper.
-   "
-   "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
-   ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
-   ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
-   access to network resources. The address is unmodified from the
-   original call.
-   "
-   ``socket.__init__``, "socket()", "``(family, type, proto)``", "
-   Detect creation of sockets. The arguments will be int values.
-   "
-   ``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts
-   to retrieve the current host name.
-   "
-   ``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "
-   Detect attempts to change the current host name. The name argument
-   is passed as a bytes object.
-   "
-   "``socket.gethostbyname``, ``socket.gethostbyname_ex``",
-   "``socket.gethostbyname``", "``(name,)``", "Detect host name
-   resolution. The name argument is a str or bytes object.
-   "
-   ``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "
-   ``(address,)``", "Detect host resolution. The address argument is a
-   str or bytes object.
-   "
-   ``socket.getservbyname``, ``socket.getservbyname``, "``(name,
-   protocol)``", "Detect service resolution. The arguments are str
-   objects.
-   "
-   "``socket.getservbyport``", ``socket.getservbyport``, "``(port,
-   protocol)``", "Detect service resolution. The port argument is an
-   int and protocol is a str.
-   "
-   "``member_get``, ``func_get_code``, ``func_get_[kw]defaults``
-   ",``object.__getattr__``,"``(object, attr)``","Detect access to
-   restricted attributes. This event is raised for any built-in
-   members that are marked as restricted, and members that may allow
-   bypassing imports.
-   "
-   "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
-   ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``","
-   ``object.__setattr__``","``(object, attr, value)``","Detect monkey
-   patching of types and objects. This event
-   is raised for the ``__class__`` attribute and any attribute on
-   ``type`` objects.
-   "
-   "``_PyObject_GenericSetAttr``",``object.__delattr__``,"``(object,
-   attr)``","Detect deletion of object attributes. This event is raised
-   for any attribute on ``type`` objects.
-   "
-   "``Unpickler.find_class``",``pickle.find_class``,"``(module_name,
-   global_name)``","Detect imports and global name lookup when
-   unpickling.
-   "
-   "``array_new``",``array.__new__``,"``(typecode, initial_value)``", "
-   Detects creation of array objects.
-   "
-
-TODO - more hooks in ``_socket``, ``_ssl``, others?
-
-SPython Entry Point
-===================
-
-A new entry point binary will be added, called ``spython.exe`` on
-Windows and ``spythonX.Y`` on other platforms. This entry point is
-intended primarily as an example, as we expect most users of this
-functionality to implement their own entry point and hooks (see
-`Recommendations`_). It will also be used for tests.
-
-Source builds will build ``spython`` by default, but distributions
-should not include it except as a test binary. The python.org managed
-binary distributions will not include ``spython``.
-
-**Do not accept most command-line arguments**
+**Remove most command-line arguments**

 The ``spython`` entry point requires a script file be passed as the
-first argument, and does not allow any options. This prevents arbitrary
-code execution from in-memory data or non-script files (such as pickles,
-which can be executed using ``-m pickle <path>``.
+first argument, and does not allow any options to precede it. This
+prevents arbitrary code execution from in-memory data or non-script
+files (such as pickles, which could be executed using
+``-m pickle <path>``).

 Options ``-B`` (do not write bytecode), ``-E`` (ignore environment
 variables) and ``-s`` (no user site) are assumed.
@@ -517,38 +247,57 @@ will be used to initialize ``sys.path`` following the rules currently
 described `for Windows
 <https://docs.python.org/3/using/windows.html#finding-modules>`_.

-When built with ``Py_DEBUG``, the ``spython`` entry point will allow a
-``-i`` option with no other arguments to enter into interactive mode,
-with audit messages being written to standard error rather than a file.
-This is intended for testing and debugging only.
+For the sake of demonstration, the example implementation of
+``spython`` also allows the ``-i`` option to start in interactive mode.
+This is not recommended for restricted entry points.

-**Log security events to a file**
+**Log audited events**

-Before initialization, ``spython`` will set an audit hook that writes
-events to a local file. By default, this file is the full path of the
-process with a ``.log`` suffix, but may be overridden with the
-``SPYTHONLOG`` environment variable (despite such overrides being
-explicitly discouraged in `Recommendations`_).
+Before initialization, ``spython`` sets an audit hook that writes all
+audited events to an OS-managed log file. On Windows, this is the Event
+Tracing functionality [7]_, and on other platforms they go to
+syslog [11]_. Logs are copied from the machine as frequently as possible
+to prevent loss of information should an attacker attempt to clear
+local logs or prevent legitimate access to the machine.

 The audit hook will also abort all ``sys.addaudithook`` events,
 preventing any other hooks from being added.

+The logging hook is written in native code and configured before the
+interpreter is initialized. This is the only opportunity to ensure that
+no Python code executes without auditing, and that Python code cannot
+prevent registration of the hook.
+
+Our primary aim is to record all actions taken by all Python processes,
+so that detection may be performed offline against logged events.
+Having all events recorded also allows for deeper analysis and the use
+of machine learning algorithms. These are useful for detecting
+persistent attacks, where the attacker is intending to remain within
+the protected machines for some period of time, as well as for later
+analysis to determine the impact and exposure caused by a successful
+attack.
+
+The example implementation of ``spython`` writes to a log file on the
+local machine, for the sake of demonstration. When started with ``-i``,
+the example implementation writes all audit events to standard error
+instead of the log file. The ``SPYTHONLOG`` environment variable can be
+used to specify the log file location.
+
 **Restrict importable modules**

-Also before initialization, ``spython`` will set an open-for-import
-hook that validates all files opened with ``os.open_for_import``. This
-implementation will require all files to have a ``.py`` suffix (thereby
-blocking the use of cached bytecode), and will raise a custom audit
-event ``spython.open_for_import`` containing ``(filename,
-True_if_allowed)``.
+Also before initialization, ``spython`` sets an open-for-import hook
+that validates all files opened with ``os.open_for_import``. This
+implementation requires all files to have a ``.py`` suffix (preventing
+the use of cached bytecode), and will raise a custom audit event
+``spython.open_for_import`` containing ``(filename, True_if_allowed)``.

-On Windows, the hook will also open the file with flags that prevent any
-other process from opening it with write access, which allows the hook
-to perform additional validation on the contents with confidence that it
-will not be modified between the check and use. Compilation will later
-trigger a ``compile`` event, so there is no need to read the contents
-now for AMSI, but other validation mechanisms such as DeviceGuard [4]_
-should be performed here.
+After opening the file, the entire contents are read into memory in a
+single buffer and the file is closed.
+
+Compilation will later trigger a ``compile`` event, so there is no need
+to validate the contents now using mechanisms that also apply to
+dynamically generated code. However, if a whitelist of source files or
+file hashes is available, then this is the point

 **Restrict globals in pickles**

@@ -556,35 +305,37 @@ The ``spython`` entry point will abort all ``pickle.find_class`` events
 that use the default implementation. Overrides will not raise audit
 events unless explicitly added, and so they will continue to be allowed.

-Performance Impact
-==================
+**Prevent os.system**

-The important performance impact is the case where events are being
-raised but there are no hooks attached. This is the unavoidable case -
-once a distributor or sysadmin begins adding audit hooks they have
-explicitly chosen to trade performance for functionality. Performance
-impact using ``spython`` or with hooks added are not of interest here,
-since this is considered opt-in functionality.
+The ``spython`` entry point aborts all ``os.system`` calls.

-Analysis using the ``performance`` tool shows no significant impact,
-with the vast majority of benchmarks showing between 1.05x faster to
-1.05x slower.
+It should be noted here that ``subprocess.Popen(shell=True)`` is
+allowed (though logged via the platform-specific process creation
+events). This tradeoff is made because it is much simpler to induce a
+running application to call ``os.system`` with a single string argument
+than a function with multiple arguments, and so it is more likely to be
+used as part of an exploit. There is also little justification for
+using ``os.system`` in production code, while ``subprocess.Popen`` has
+a large number of legitimate uses. However, logs indicating the use of
+the ``shell=True`` argument should be scrutinised more carefully.

-In our opinion, the performance impact of the set of auditing points
-described in this PEP is negligible.
+Sysadmins are encouraged to make these kinds of tradeoffs between
+restriction and detection, and generally should prefer detection.

-Recommendations
-===============
+General Recommendations
+=======================

-Specific recommendations are difficult to make, as the ideal
-configuration for any environment will depend on the user's ability to
-manage, monitor, and respond to activity on their own network. However,
-many of the proposals here do not appear to be of value without deeper
-illustration. This section provides recommendations using the terms
-**should** (or **should not**), indicating that we consider it dangerous
-to ignore the advice, and **may**, indicating that for the advice ought
-to be considered for high value systems. The term **sysadmins** refers
-to whoever is responsible for deploying Python throughout your network;
+Recommendations beyond those suggested in the previous section are
+difficult, as the ideal configuration for any environment depends on
+the sysadmin's ability to manage, monitor, and respond to activity on
+their own network. Nonetheless, here we attempt to provide some context
+and guidance for integrating Python into a complete system.
+
+This section provides recommendations using the terms **should** (or
+**should not**), indicating that we consider it risky to ignore the
+advice, and **may**, indicating that the advice ought to be
+considered for high value systems. The term **sysadmin** refers to
+whoever is responsible for deploying Python throughout the network;
 different organizations may have an alternative title for the
 responsible people.

@@ -668,72 +419,6 @@ attribute changes on type objects.

 [TODO: more good advice; less bad advice]

-Rejected Ideas
-==============
-
-Separate module for audit hooks
-------------------------------
-
-The proposal is to add a new module for audit hooks, hypothetically
-``audit``. This would separate the API and implementation from the
-``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
-``PyAudit_Audit`` rather than the current variations.
-
-Any such module would need to be a built-in module that is guaranteed to
-always be present. The nature of these hooks is that they must be
-callable without condition, as any conditional imports or calls provide
-more opportunities to intercept and suppress or modify events.
-
-Given its nature as one of the most core modules, the ``sys`` module is
-somewhat protected against module shadowing attacks. Replacing ``sys``
-with a sufficiently functional module that the application can still run
-is a much more complicated task than replacing a module with only one
-function of interest. An attacker that has the ability to shadow the
-``sys`` module is already capable of running arbitrary code from files,
-whereas an ``audit`` module can be replaced with a single statement::
-
-    import sys; sys.modules['audit'] = type('audit', (object,),
-        {'audit': lambda *a: None, 'addhook': lambda *a: None})
-
-Multiple layers of protection already exist for monkey patching attacks
-against either ``sys`` or ``audit``, but assignments or insertions to
-``sys.modules`` are not audited.
-
-This idea is rejected because it makes substituting ``audit`` calls
-throughout all callers near trivial.
-
-Flag in sys.flags to indicate "secure" mode
-------------------------------------------
-
-The proposal is to add a value in ``sys.flags`` to indicate when Python
-is running in a "secure" mode. This would allow applications to detect
-when some features are enabled and modify their behaviour appropriately.
-
-Currently there are no guarantees made about security by this PEP - this
-section is the first time the word "secure" has been used. Security
-**transparency** does not result in any changed behaviour, so there is
-no appropriate reason for applications to modify their behaviour.
-
-Both application-level APIs ``sys.audit`` and ``_imp.open_for_import``
-are always present and functional, regardless of whether the regular
-``python`` entry point or some alternative entry point is used. Callers
-cannot determine whether any hooks have been added (except by performing
-side-channel analysis), nor do they need to. The calls should be fast
-enough that callers do not need to avoid them, and the sysadmin is
-responsible for ensuring their added hooks are fast enough to not affect
-application performance.
-
-The argument that this is "security by obscurity" is valid, but
-irrelevant. Security by obscurity is only an issue when there are no
-other protective mechanisms; obscurity as the first step in avoiding
-attack is strongly recommended (see `this article
-<https://danielmiessler.com/study/security-by-obscurity/>`_ for
-discussion).
-
-This idea is rejected because there are no appropriate reasons for an
-application to change its behaviour based on whether these APIs are in
-use.
-
 Further Reading
 ===============

@ -820,7 +505,7 @@ discussions.
 Copyright
 =========

-Copyright (c) 2017 by Microsoft Corporation. This material may be
+Copyright (c) 2017-2018 by Microsoft Corporation. This material may be
 distributed only subject to the terms and conditions set forth in the
 Open Publication License, v1.0 or later (the latest version is presently
 available at http://www.opencontent.org/openpub/).
--- a/pep-0578.rst
+++ b/pep-0578.rst
@ -0,0 +1,479 @@
+PEP: 578
+Title: Python Runtime Audit Hooks
+Version: $Revision$
+Last-Modified: $Date$
+Author: Steve Dower <steve.dower@python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 16-Jun-2018
+Python-Version: 3.8
+Post-History: 
+
+Abstract
+========
+
+This PEP describes additions to the Python API and specific behaviors
+for the CPython implementation that make actions taken by the Python
+runtime visible to auditing tools. Visibility into these actions
+provides opportunities for test frameworks, logging frameworks, and
+security tools to monitor and optionally limit actions taken by the
+runtime.
+
+This PEP proposes adding two APIs to provide insights into a running
+Python application: one for arbitrary events, and another specific to
+the module import system. The APIs are intended to be available in all
+Python implementations, though the specific messages and values used
+are unspecified here to allow implementations the freedom to determine
+how best to provide information to their users. Some examples likely
+to be used in CPython are provided for explanatory purposes.
+
+See PEP-551 for discussion and recommendations on enhancing the
+security of a Python runtime making use of these auditing APIs.
+
+Background
+==========
+
+Python provides access to a wide range of low-level functionality on
+many common operating systems in a consistent manner. While this is
+incredibly useful for "write-once, run-anywhere" scripting, it also
+makes monitoring of software written in Python difficult. Because
+Python uses native system APIs directly, existing monitoring
+tools either suffer from limited context or auditing bypass.
+
+Limited context occurs when system monitoring can report that an
+action occurred, but cannot explain the sequence of events leading to
+it. For example, network monitoring at the OS level may be able to
+report "listening started on port 5678", but may not be able to
+provide the process ID, command line or parent process, or the local
+state in the program at the point that triggered the action. Firewall
+controls to prevent such an action are similarly limited, typically
+to a process name or some global state such as the current user, and
+in any case rarely provide a useful log file correlated with other
+application messages.
+
+Auditing bypass can occur when the typical system tool used for an
+action would ordinarily report its use, but accessing the APIs via
+Python does not trigger this. For example, invoking "curl" to make HTTP
+requests may be specifically monitored in an audited system, but
+Python's "urlretrieve" function is not.
+
+Within a long-running Python application, particularly one that
+processes user-provided information such as a web app, there is a risk
+of unexpected behavior. This may be due to bugs in the code, or
+deliberately induced by a malicious user. In both cases, normal
+application logging may be bypassed resulting in no indication that
+anything out of the ordinary has occurred.
+
+Additionally, and somewhat unique to Python, it is very easy to affect
+the code that is run in an application by either manipulating the
+import system's search path or placing files earlier on the path than
+intended. This is often seen when developers create a script with the
+same name as the module they intend to use - for example, a
+``random.py`` file that attempts to import the standard library
+``random`` module.
+
+Overview of Changes
+===================
+
+The aim of these changes is to enable both application developers and
+system administrators to integrate Python into their existing
+monitoring systems without dictating how those systems look or behave.
+
+We propose two API changes to enable this: an Audit Hook and Verified
+Open Hook. Both are available from Python and native code, allowing
+applications and frameworks written in pure Python code to take
+advantage of the extra messages, while also allowing embedders or
+system administrators to deploy "always-on" builds of Python.
+
+Only CPython is bound to provide the native APIs as described here.
+Other implementations should provide the pure Python APIs, and
+may provide native versions as appropriate for their underlying
+runtimes.
+
+Audit Hook
+----------
+
+In order to observe actions taken by the runtime (on behalf of the
+caller), an API is required to raise messages from within certain
+operations. These operations are typically deep within the Python
+runtime or standard library, such as dynamic code compilation, module
+imports, DNS resolution, or use of certain modules such as ``ctypes``.
+
+The following new C APIs allow embedders and CPython implementors to
+send and receive audit hook messages::
+
+   /* Add an auditing hook */
+   typedef int (*hook_func)(const char *event, PyObject *args,
+                            void *userData);
+   int PySys_AddAuditHook(hook_func hook, void *userData);
+
+   /* Raise an event with all auditing hooks */
+   int PySys_Audit(const char *event, PyObject *args);
+
+   /* Internal API used during Py_Finalize() - not publicly accessible */
+   void _Py_ClearAuditHooks(void);
+
+The new Python APIs for receiving and raising audit hooks are::
+
+   # Add an auditing hook
+   sys.addaudithook(hook: Callable[[str, tuple]])
+
+   # Raise an event with all auditing hooks
+   sys.audit(str, *args)
+
+
+Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
+including before ``Py_Initialize()``, or by calling
+``sys.addaudithook()`` from Python code. Hooks cannot be removed or
+replaced.
+
+When events of interest are occurring, code can either call
+``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
+string argument is the name of the event, and the tuple contains
+arguments. A given event name should have a fixed schema for arguments,
+which should be considered a public API (for a given x.y version
+release), and thus should only change between feature releases with
+updated documentation.
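
As a rough illustration of the two Python APIs, the sketch below registers a hook and raises a single event. It assumes an interpreter that implements ``sys.addaudithook()`` and ``sys.audit()`` as proposed here; the ``myapp.`` prefix, the event name and its arguments are hypothetical and not part of this PEP.

```python
import sys

recorded = []


def record_hook(event, args):
    # The hook receives the event name and a tuple of its arguments.
    # Only keep events from the hypothetical "myapp." namespace.
    if event.startswith("myapp."):
        recorded.append((event, args))


sys.addaudithook(record_hook)

# Raise an event; positional arguments after the name become the tuple.
sys.audit("myapp.config_loaded", "/etc/myapp.conf", 3)
```

After this runs, ``recorded`` holds one entry pairing the event name with the tuple ``("/etc/myapp.conf", 3)``. Because hooks cannot be removed, ``record_hook`` stays active for the rest of the process.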
+
+For maximum compatibility, events using the same name as an event in
+the reference interpreter CPython should make every attempt to use
+compatible arguments. Including the name or an abbreviation of the
+implementation in implementation-specific event names will also help
+prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
+distinguished from an ``ipy.jit_invoked`` event.
+
+When an event is audited, each hook is called in the order it was added
+with the event name and tuple. If any hook returns with an exception
+set, later hooks are ignored and *in general* the Python runtime should
+terminate. This is intentional to allow hook implementations to decide
+how to respond to any particular event. The typical responses will be to
+log the event, abort the operation with an exception, or to immediately
+terminate the process with an operating system exit call.
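
The "abort the operation with an exception" response might look like the following sketch. It assumes the exception raised by a hook propagates out of ``sys.audit()`` to the caller; the event name ``myapp.delete_all`` and the function ``delete_all()`` are hypothetical.

```python
import sys


def deny_hook(event, args):
    # Abort one specific (hypothetical) operation by raising from the hook.
    if event == "myapp.delete_all":
        raise PermissionError(f"blocked by audit hook: {event}")


sys.addaudithook(deny_hook)


def delete_all(target):
    # Audit first, so a hook can stop the operation before it happens.
    sys.audit("myapp.delete_all", target)
    return "deleted"


try:
    result = delete_all("/srv/data")
except PermissionError as exc:
    result = f"aborted: {exc}"
```

Here the caller chooses to catch the exception for demonstration purposes. In a deployment that follows the guidance above, the exception would normally be left to terminate the runtime.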
+
+When an event is audited but no hooks have been set, the ``audit()``
+function should include minimal overhead. Ideally, each argument is a
+reference to existing data rather than a value calculated just for the
+auditing call.
+
+As hooks may be Python objects, they need to be freed during
+``Py_Finalize()``. To do this, we add an internal API
+``_Py_ClearAuditHooks()`` that releases any Python hooks and any
+memory held. This is an internal function with no public export, and
+we recommend it should raise its own audit event for all current hooks
+to ensure that unexpected calls are observed.
+
+Below in `Suggested Audit Hook Locations`_, we recommend some important
+operations that should raise audit events. In PEP 551, more audited
+operations are recommended with a view to security transparency. 
+
+Python implementations should document which operations will raise
+audit events, along with the event schema. It is intended that
+``sys.addaudithook(print)`` be a trivial way to display all messages.
+
+Verified Open Hook
+------------------
+
+Most operating systems have a mechanism to distinguish between files
+that can be executed and those that cannot. For example, this may be an
+execute bit in the permissions field, or a verified hash of the file
+contents to detect potential code tampering. These are an important
+security mechanism for preventing execution of data or code that is not
+approved for a given environment. Currently, Python has no way to
+integrate with these when launching scripts or importing modules.
+
+The new public C API for the verified open hook is::
+
+   /* Set the handler */
+   typedef PyObject *(*hook_func)(PyObject *path, void *userData);
+   int PyImport_SetOpenForImportHook(hook_func handler, void *userData);
+
+   /* Open a file using the handler */
+   PyObject *PyImport_OpenForImport(const char *path);
+
+The new public Python API for the verified open hook is::
+
+   # Open a file using the handler
+   importlib.util.open_for_import(path : str) -> io.IOBase
+
+
+The ``importlib.util.open_for_import()`` function is a drop-in
+replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is
+to open a file for raw, binary access. To change the behaviour a new
+handler should be set. Handler functions only accept ``str`` arguments.
+
+A custom handler may be set by calling ``PyImport_SetOpenForImportHook()``
+from C at any time, including before ``Py_Initialize()``. However, if a
+hook has already been set then the call will fail. When
+``open_for_import()`` is called with a hook set, the hook will be passed
+the path and its return value will be returned directly. The returned
+object should be an open file-like object that supports reading raw
+bytes. This is explicitly intended to allow a ``BytesIO`` instance if
+the open handler has already had to read the file into memory in order
+to perform whatever verification is necessary to determine whether the
+content is permitted to be executed.
+
+Note that these hooks can import and call the ``_io.open()`` function on
+CPython without triggering themselves. They can also use ``_io.BytesIO``
+to return a compatible result using an in-memory buffer.
+
+If the hook determines that the file should not be loaded, it should
+raise an exception of its choice, as well as performing any other
+logging.
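
The verification logic that such a handler performs can be sketched in pure Python. This is only an illustration of the read, verify and return steps: the real handler is installed from C, and ``verified_open()`` and ``APPROVED_DIGESTS`` are hypothetical names. The choice of a SHA-256 allowlist is likewise an assumption, not something this PEP prescribes.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allowlist of SHA-256 digests approved for execution.
APPROVED_DIGESTS = set()


def verified_open(path):
    """Return a readable binary file object, or raise if not approved."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in APPROVED_DIGESTS:
        raise PermissionError(f"{path} is not approved for execution")
    # The content is already in memory, so return an in-memory buffer
    # rather than reopening the file (which could have changed on disk).
    return io.BytesIO(data)


# Small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "wb") as f:
        f.write(b"print('hello')\n")
    with open(bad, "wb") as f:
        f.write(b"print('tampered')\n")
    APPROVED_DIGESTS.add(hashlib.sha256(b"print('hello')\n").hexdigest())

    approved_bytes = verified_open(good).read()
    try:
        verified_open(bad)
        rejected = False
    except PermissionError:
        rejected = True
```

Returning the bytes that were actually hashed avoids a window in which the file could be replaced between verification and execution.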
+
+All import and execution functionality involving code from a file will
+be changed to use ``open_for_import()`` unconditionally. It is important
+to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
+through this function - an audit hook that includes the code from these
+calls is the best opportunity to validate code that is read from the
+file. Given the current decoupling between import and execution in
+Python, most imported code will go through both ``open_for_import()``
+and the audit hook for ``compile``, and so care should be taken to avoid
+repeating verification steps.
+
+There is no Python API provided for changing the open hook. To modify
+import behavior from Python code, use the existing functionality
+provided by ``importlib``.
+
+API Availability
+----------------
+
+While all the functions added here are considered public and stable API,
+the behavior of the functions is implementation specific. Most
+descriptions here refer to the CPython implementation, and while other
+implementations should provide the functions, there is no requirement
+that they behave the same.
+
+For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
+may do nothing. This allows code to make calls to ``sys.audit()``
+without having to test for existence, but it should not assume that its
+call will have any effect. (Including existence tests in
+security-critical code allows another vector to bypass auditing, so it
+is preferable that the function always exist.)
+
+``importlib.util.open_for_import(path)`` should at a minimum always
+return ``_io.open(path, 'rb')``. Code using the function should make no
+further assumptions about what may occur, and implementations other than
+CPython are not required to let developers override the behavior of this
+function with a hook.
+
+Suggested Audit Hook Locations
+==============================
+
+The locations and parameters in calls to ``sys.audit()`` or
+``PySys_Audit()`` are to be determined by individual Python
+implementations. This is to allow maximum freedom for implementations
+to expose the operations that are most relevant to their platform,
+and to avoid or ignore potentially expensive or noisy events.
+
+Table 1 acts as both suggestions of operations that should trigger
+audit events on all implementations, and examples of event schemas.
+
+Table 2 provides further examples that are not required, but are
+likely to be available in CPython.
+
+Refer to the documentation associated with your version of Python to
+see which operations provide audit events.
+
+.. csv-table:: Table 1: Suggested Audit Hooks
+   :header: "API Function", "Event Name", "Arguments", "Rationale"
+   :widths: 2, 2, 3, 6
+   
+   ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
+   audit hooks are being added.
+   "
+   ``PyImport_SetOpenForImportHook``, ``setopenforimporthook``, "", "
+   Detects any attempt to set the ``open_for_import`` hook.
+   "
+   "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
+   ``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
+   Detect dynamic code compilation, where ``code`` could be a string or
+   AST. Note that this will be called for regular imports of source
+   code, including those that were opened with ``open_for_import``.
+   "
+   "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
+   Detect dynamic execution of code objects. This only occurs for
+   explicit calls, and is not raised for normal function invocation.
+   "
+   ``import``, ``import``, "``(module, filename, sys.path,
+   sys.meta_path, sys.path_hooks)``", "Detect when modules are
+   imported. This is raised before the module name is resolved to a
+   file. All arguments other than the module name may be ``None`` if
+   they are not used or available.
+   "
+   ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
+   injecting trace functions. Because of the implementation, exceptions
+   raised from the hook will abort the operation, but will not be
+   raised in Python code. Note that ``threading.setprofile`` eventually
+   calls this function, so the event will be audited for each thread.
+   "
+   ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
+   injecting trace functions. Because of the implementation, exceptions
+   raised from the hook will abort the operation, but will not be
+   raised in Python code. Note that ``threading.settrace`` eventually
+   calls this function, so the event will be audited for each thread.
+   "
+   "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
+   ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``","
+   ``object.__setattr__``","``(object, attr, value)``","Detect monkey
+   patching of types and objects. This event
+   is raised for the ``__class__`` attribute and any attribute on
+   ``type`` objects.
+   "
+   "``_PyObject_GenericSetAttr``",``object.__delattr__``,"``(object,
+   attr)``","Detect deletion of object attributes. This event is raised
+   for any attribute on ``type`` objects.
+   "
+   "``Unpickler.find_class``",``pickle.find_class``,"``(module_name,
+   global_name)``","Detect imports and global name lookup when
+   unpickling.
+   "
+
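A hook that watches for two of the events in Table 1 could be sketched as follows. It assumes the interpreter raises events named ``compile`` and ``exec`` as suggested in the table. Event names are left to each implementation, so its documentation should be checked before relying on them.

```python
import sys

seen = []


def watch(event, args):
    # Record only the two event names of interest from Table 1.
    if event in ("compile", "exec"):
        seen.append(event)


sys.addaudithook(watch)

# Explicit compilation followed by explicit execution of the code object.
code = compile("x = 1 + 1", "<example>", "exec")
namespace = {}
exec(code, namespace)
```

On an implementation that follows Table 1, ``seen`` contains both event names after this runs, and the executed code has still bound ``x`` in ``namespace``.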
+
+.. csv-table:: Table 2: Potential CPython Audit Hooks
+   :header: "API Function", "Event Name", "Arguments", "Rationale"
+   :widths: 2, 2, 3, 6
+   
+   ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
+   hooks they are being cleaned up, mainly in case the event is
+   triggered unexpectedly. This event cannot be aborted.
+   "
+   ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
+   Detect dynamic creation of code objects. This only occurs for
+   direct instantiation, and is not raised for normal compilation.
+   "
+   ``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect
+   dynamic creation of function objects. This only occurs for direct
+   instantiation, and is not raised for normal compilation.
+   "
+   "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
+   ``(module_or_path,)``", "Detect when native modules are used.
+   "
+   ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
+   Collect information about specific symbols retrieved from native
+   modules.
+   "
+   ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
+   when code is accessing arbitrary memory using ``ctypes``.
+   "
+   ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
+   when code is accessing frames directly.
+   "
+   ``sys._current_frames``, ``sys._current_frames``, "", "Detect when
+   code is accessing frames directly.
+   "
+   "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
+   ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
+   ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
+   access to network resources. The address is unmodified from the
+   original call.
+   "
+   "``member_get``, ``func_get_code``, ``func_get_[kw]defaults``
+   ",``object.__getattr__``,"``(object, attr)``","Detect access to
+   restricted attributes. This event is raised for any built-in
+   members that are marked as restricted, and members that may allow
+   bypassing imports.
+   "
+
+
+Performance Impact
+==================
+
+The important performance impact is the case where events are being
+raised but there are no hooks attached. This is the unavoidable case -
+once a distributor or sysadmin begins adding audit hooks they have
+explicitly chosen to trade performance for functionality. Performance
+impact using ``spython`` or with hooks added is not of interest here,
+since this is considered opt-in functionality.
+
+Analysis using the ``performance`` tool shows no significant impact,
+with the vast majority of benchmarks showing between 1.05x faster and
+1.05x slower.
+
+In our opinion, the performance impact of the set of auditing points
+described in this PEP is negligible.
+
+Rejected Ideas
+==============
+
+Separate module for audit hooks
+-------------------------------
+
+The proposal is to add a new module for audit hooks, hypothetically
+``audit``. This would separate the API and implementation from the
+``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
+``PyAudit_Audit`` rather than the current variations.
+
+Any such module would need to be a built-in module that is guaranteed to
+always be present. The nature of these hooks is that they must be
+callable without condition, as any conditional imports or calls provide
+more opportunities to intercept and suppress or modify events.
+
+Given its nature as one of the most core modules, the ``sys`` module is
+somewhat protected against module shadowing attacks. Replacing ``sys``
+with a sufficiently functional module that the application can still run
+is a much more complicated task than replacing a module with only one
+function of interest. An attacker that has the ability to shadow the
+``sys`` module is already capable of running arbitrary code from files,
+whereas an ``audit`` module can be replaced with a single statement::
+
+    import sys; sys.modules['audit'] = type('audit', (object,),
+        {'audit': lambda *a: None, 'addhook': lambda *a: None})
+
+Multiple layers of protection already exist for monkey patching attacks
+against either ``sys`` or ``audit``, but assignments or insertions to
+``sys.modules`` are not audited.
+
+This idea is rejected because it makes substituting ``audit`` calls
+throughout all callers near trivial.
+
+Flag in sys.flags to indicate "secure" mode
+-------------------------------------------
+
+The proposal is to add a value in ``sys.flags`` to indicate when Python
+is running in a "secure" mode. This would allow applications to detect
+when some features are enabled and modify their behaviour appropriately.
+
+Currently there are no guarantees made about security by this PEP - this
+section is the first time the word "secure" has been used. Security
+**transparency** does not result in any changed behaviour, so there is
+no appropriate reason for applications to modify their behaviour.
+
+Both application-level APIs ``sys.audit`` and
+``importlib.util.open_for_import``
+are always present and functional, regardless of whether the regular
+``python`` entry point or some alternative entry point is used. Callers
+cannot determine whether any hooks have been added (except by performing
+side-channel analysis), nor do they need to. The calls should be fast
+enough that callers do not need to avoid them, and the sysadmin is
+responsible for ensuring their added hooks are fast enough to not affect
+application performance.
+
+The argument that this is "security by obscurity" is valid, but
+irrelevant. Security by obscurity is only an issue when there are no
+other protective mechanisms; obscurity as the first step in avoiding
+attack is strongly recommended (see `this article
+<https://danielmiessler.com/study/security-by-obscurity/>`_ for
+discussion).
+
+This idea is rejected because there are no appropriate reasons for an
+application to change its behaviour based on whether these APIs are in
+use.
+
+
+Acknowledgments
+===============
+
+Thanks to all the people from Microsoft involved in helping make the
+Python runtime safer for production use, and especially to James Powell
+for doing much of the initial research, analysis and implementation, Lee
+Holmes for invaluable insights into the info-sec field and PowerShell's
+responses, and Brett Cannon for the restraining and grounding
+discussions.
+
+Copyright
+=========
+
+Copyright (c) 2018 by Microsoft Corporation. This material may be
+distributed only subject to the terms and conditions set forth in the
+Open Publication License, v1.0 or later (the latest version is presently
+available at http://www.opencontent.org/openpub/).