PEP: 578
Title: Python Runtime Audit Hooks
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower <steve.dower@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 16-Jun-2018
Python-Version: 3.8
Post-History:

Abstract
========

This PEP describes additions to the Python API and specific behaviors
for the CPython implementation that make actions taken by the Python
runtime visible to auditing tools. Visibility into these actions
provides opportunities for test frameworks, logging frameworks, and
security tools to monitor and optionally limit actions taken by the
runtime.

This PEP proposes adding two APIs to provide insights into a running
Python application: one for arbitrary events, and another specific to
the module import system. The APIs are intended to be available in all
Python implementations, though the specific messages and values used
are unspecified here to allow implementations the freedom to determine
how best to provide information to their users. Some examples likely
to be used in CPython are provided for explanatory purposes.

See PEP 551 for discussion and recommendations on enhancing the
security of a Python runtime making use of these auditing APIs.

Background
==========

Python provides access to a wide range of low-level functionality on
many common operating systems. While this is incredibly useful for
"write-once, run-anywhere" scripting, it also makes monitoring of
software written in Python difficult. Because Python uses native system
APIs directly, existing monitoring tools suffer from either limited
context or auditing bypass.

Limited context occurs when system monitoring can report that an
action occurred, but cannot explain the sequence of events leading to
it. For example, network monitoring at the OS level may be able to
report "listening started on port 5678", but may not be able to
provide the process ID, command line, parent process, or the local
state in the program at the point that triggered the action. Firewall
controls to prevent such an action are similarly limited, typically
to process names or some global state such as the current user, and
in any case rarely provide a useful log file correlated with other
application messages.

Auditing bypass can occur when the typical system tool used for an
action would ordinarily report its use, but accessing the APIs via
Python does not trigger this. For example, invoking "curl" to make HTTP
requests may be specifically monitored in an audited system, but
Python's "urlretrieve" function is not.

Within a long-running Python application, particularly one that
processes user-provided information such as a web app, there is a risk
of unexpected behavior. This may be due to bugs in the code, or
deliberately induced by a malicious user. In both cases, normal
application logging may be bypassed, resulting in no indication that
anything out of the ordinary has occurred.

Additionally, and somewhat unique to Python, it is very easy to affect
the code that is run in an application, either by manipulating the
import system's search path or by placing files earlier on the path
than intended. This is often seen when developers create a script with
the same name as the module they intend to use - for example, a
``random.py`` file that attempts to import the standard library
``random`` module.

This is not sandboxing, as this proposal does not attempt to prevent
malicious behavior (though it enables some new options to do so).
See the `Why Not A Sandbox`_ section below for further discussion.

Overview of Changes
===================

The aim of these changes is to enable both application developers and
system administrators to integrate Python into their existing
monitoring systems without dictating how those systems look or behave.

We propose two API changes to enable this: an Audit Hook and a Verified
Open Hook. Both are available from Python and native code, allowing
applications and frameworks written in pure Python code to take
advantage of the extra messages, while also allowing embedders or
system administrators to deploy builds of Python where auditing is
always enabled.

Only CPython is bound to provide the native APIs as described here.
Other implementations should provide the pure Python APIs, and
may provide native versions as appropriate for their underlying
runtimes. Auditing events are likewise considered implementation
specific, but are bound by normal feature compatibility guarantees.

Audit Hook
----------

In order to observe actions taken by the runtime (on behalf of the
caller), an API is required to raise messages from within certain
operations. These operations are typically deep within the Python
runtime or standard library, such as dynamic code compilation, module
imports, DNS resolution, or use of certain modules such as ``ctypes``.

The following new C APIs allow embedders and CPython implementors to
send and receive audit hook messages::

    /* Add an auditing hook */
    typedef int (*hook_func)(const char *event, PyObject *args,
                             void *userData);
    int PySys_AddAuditHook(hook_func hook, void *userData);

    /* Raise an event with all auditing hooks */
    int PySys_Audit(const char *event, PyObject *args);

    /* Internal API used during Py_Finalize() - not publicly accessible */
    void _Py_ClearAuditHooks(void);

The new Python APIs for receiving and raising audit hooks are::

    # Add an auditing hook
    sys.addaudithook(hook: Callable[[str, tuple], None])

    # Raise an event with all auditing hooks
    sys.audit(str, *args)


Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
including before ``Py_Initialize()``, or by calling
``sys.addaudithook()`` from Python code. Hooks cannot be removed or
replaced.

When events of interest are occurring, code can call either
``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``
from Python. The string argument is the name of the event, and the
tuple contains arguments. A given event name should have a fixed schema
for arguments, which should be considered a public API (for each x.y
version release), and thus should only change between feature releases
with updated documentation.

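As a rough illustration of the Python-level API described above, the
sketch below registers a hook and raises a single event. It assumes an
interpreter that implements ``sys.addaudithook()`` and ``sys.audit()``
as proposed here (for example CPython 3.8 or later). The event name
``example.cache_miss`` and the function ``record_demo_events`` are
hypothetical and are not defined by this PEP.

```python
import sys

# Hypothetical event name, used only for this illustration.
DEMO_EVENT = "example.cache_miss"

seen = []


def record_demo_events(event, args):
    """Collect events from our own namespace and ignore everything else."""
    if event.startswith("example."):
        seen.append((event, args))


# Hooks cannot be removed once added, so this one stays registered
# for the lifetime of the interpreter.
sys.addaudithook(record_demo_events)

# Raise an event: the first argument is the event name, and the
# remaining arguments are delivered to each hook as a tuple.
sys.audit(DEMO_EVENT, "user-profile", 42)

print(seen)
```

Filtering on the event name inside the hook keeps the example quiet,
because the same hook is also called for every event raised by the
runtime itself.
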
For maximum compatibility, events using the same name as an event in
the reference interpreter CPython should make every attempt to use
compatible arguments. Including the name or an abbreviation of the
implementation in implementation-specific event names will also help
prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
distinguished from an ``ipy.jit_invoked`` event.

When an event is audited, each hook is called in the order it was added
with the event name and tuple. If any hook returns with an exception
set, later hooks are ignored and *in general* the Python runtime should
terminate. This is intentional to allow hook implementations to decide
how to respond to any particular event. The typical responses will be to
log the event, abort the operation with an exception, or to immediately
terminate the process with an operating system exit call.

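The "abort the operation with an exception" response can be sketched
as follows. This is a minimal example that assumes CPython 3.8 or
later, where an exception raised by a hook propagates out of
``sys.audit()``. The ``example.delete_record`` event, the
``OperationBlocked`` exception and the ``delete_record`` function are
all hypothetical names chosen for the illustration.

```python
import sys


class OperationBlocked(RuntimeError):
    """Hypothetical exception type that a policy hook might raise."""


def deny_demo_delete(event, args):
    # React only to the illustrative event and let everything else pass.
    if event == "example.delete_record":
        raise OperationBlocked(f"blocked {event} with args {args!r}")


sys.addaudithook(deny_demo_delete)


def delete_record(record_id):
    # Audit first: if a hook raises, the return statement never runs.
    sys.audit("example.delete_record", record_id)
    return f"deleted {record_id}"


try:
    outcome = delete_record(7)
except OperationBlocked as exc:
    outcome = f"refused: {exc}"

print(outcome)
```

Raising the audit event before performing the work is what lets a hook
veto the operation; auditing afterwards would only allow logging.
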
When an event is audited but no hooks have been set, the ``audit()``
function should impose minimal overhead. Ideally, each argument is a
reference to existing data rather than a value calculated just for the
auditing call.

As hooks may be Python objects, they need to be freed during
``Py_Finalize()``. To do this, we add an internal API
``_Py_ClearAuditHooks()`` that releases any Python hooks and any
memory held. This is an internal function with no public export, and
we recommend it raise its own audit event for all current hooks to
ensure that unexpected calls are observed.

Below in `Suggested Audit Hook Locations`_, we recommend some important
operations that should raise audit events.

Python implementations should document which operations will raise
audit events, along with the event schema. It is intentional that
``sys.addaudithook(print)`` be a trivial way to display all messages.

Verified Open Hook
------------------

Most operating systems have a mechanism to distinguish between files
that can be executed and those that cannot. For example, this may be an
execute bit in the permissions field, a verified hash of the file
contents to detect potential code tampering, or file system path
restrictions. These are an important security mechanism for preventing
execution of data or code that is not approved for a given environment.
Currently, Python has no way to integrate with these when launching
scripts or importing modules.

The new public C API for the verified open hook is::

    /* Set the handler */
    typedef PyObject *(*hook_func)(PyObject *path, void *userData);
    int PyFile_SetOpenCodeHook(hook_func handler, void *userData);

    /* Open a file using the handler */
    PyObject *PyFile_OpenCode(const char *path);

The new public Python API for the verified open hook is::

    # Open a file using the handler
    io.open_code(path : str) -> io.IOBase


The ``io.open_code()`` function is a drop-in replacement for
``open(abspath(str(pathlike)), 'rb')``. Its default behaviour is to
open a file for raw, binary access. To change the behaviour, a new
handler should be set. Handler functions only accept ``str`` arguments.
The C API ``PyFile_OpenCode`` function assumes UTF-8 encoding. Paths
must be absolute, and it is the responsibility of the caller to ensure
the full path is correctly resolved.

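The default behaviour described above can be exercised with a short
sketch. It assumes an interpreter that provides ``io.open_code()`` as
proposed here (for example CPython 3.8 or later) and that no custom
handler has been installed. The file name ``demo_module.py`` is made up
for the example.

```python
import io
import os
import tempfile

# Create a throwaway source file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, "demo_module.py")
    with open(module_path, "wb") as f:
        f.write(b"ANSWER = 42\n")

    # open_code() expects an absolute str path; resolving it is the
    # caller's responsibility. The result is a binary stream.
    with io.open_code(os.path.abspath(module_path)) as stream:
        source_bytes = stream.read()

print(source_bytes)
```

With no handler set, the call behaves like opening the file in
``'rb'`` mode, so the bytes read back match what was written.
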
A custom handler may be set by calling ``PyFile_SetOpenCodeHook()`` from
C at any time, including before ``Py_Initialize()``. However, if a hook
has already been set then the call will fail. When ``open_code()`` is
called with a hook set, the hook will be passed the path and its return
value will be returned directly. The returned object should be an open
file-like object that supports reading raw bytes. This is explicitly
intended to allow a ``BytesIO`` instance if the open handler has already
read the entire file into memory.

Note that these hooks can import and call the ``_io.open()`` function on
CPython without triggering themselves. They can also use ``_io.BytesIO``
to return a compatible result using an in-memory buffer.

If the hook determines that the file should not be loaded, it should
raise an exception of its choice, as well as performing any other
logging.

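The kind of decision a handler might make can be sketched in pure
Python. This is only a stand-in for the logic: a real handler is
installed from C with ``PyFile_SetOpenCodeHook()``, and nothing below
registers one. The ``verified_open`` function, the
``APPROVED_DIGESTS`` allow-list and the choice of SHA-256 and
``PermissionError`` are assumptions made for the illustration.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allow-list mapping absolute paths to SHA-256 digests.
APPROVED_DIGESTS = {}


def verified_open(path):
    """Return a readable binary stream for path, or raise PermissionError."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if APPROVED_DIGESTS.get(path) != digest:
        raise PermissionError(f"{path} is not approved for execution")
    # The whole file is already in memory, so hand back a BytesIO.
    return io.BytesIO(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "approved.py")
    original = b"print('hello')\n"
    with open(path, "wb") as f:
        f.write(original)
    APPROVED_DIGESTS[path] = hashlib.sha256(original).hexdigest()

    approved_source = verified_open(path).read()

    # Tamper with the file after approval: the digest no longer matches.
    with open(path, "ab") as f:
        f.write(b"import os\n")
    try:
        verified_open(path)
        tampered_result = "loaded"
    except PermissionError:
        tampered_result = "rejected"

print(approved_source, tampered_result)
```

Returning the bytes that were actually hashed, rather than reopening
the file, avoids a window in which the file could change between
verification and use.
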
All import and execution functionality involving code from a file will
be changed to use ``open_code()`` unconditionally. It is important to
note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
through this function - an audit hook that includes the code from these
calls is the best opportunity to validate code that is read from the
file. Given the current decoupling between import and execution in
Python, most imported code will go through both ``open_code()`` and the
audit hook for ``compile``, and so care should be taken to avoid
repeating verification steps.

There is no Python API provided for changing the open hook. To modify
import behavior from Python code, use the existing functionality
provided by ``importlib``.

API Availability
----------------

While all the functions added here are considered public and stable API,
the behavior of the functions is implementation specific. Most
descriptions here refer to the CPython implementation, and while other
implementations should provide the functions, there is no requirement
that they behave the same.

For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
may do nothing. This allows code to make calls to ``sys.audit()``
without having to test for existence, but it should not assume that its
call will have any effect. (Including existence tests in
security-critical code allows another vector to bypass auditing, so it
is preferable that the function always exist.)

``io.open_code(path)`` should at a minimum always return
``_io.open(path, 'rb')``. Code using the function should make no further
assumptions about what may occur, and implementations other than CPython
are not required to let developers override the behavior of this
function with a hook.

Suggested Audit Hook Locations
==============================

The locations and parameters in calls to ``sys.audit()`` or
``PySys_Audit()`` are to be determined by individual Python
implementations. This is to allow maximum freedom for implementations
to expose the operations that are most relevant to their platform,
and to avoid or ignore potentially expensive or noisy events.

Table 1 provides both suggestions for operations that should trigger
audit events on all implementations and examples of event schemas.

Table 2 provides further examples that are not required, but are
likely to be available in CPython.

Refer to the documentation associated with your version of Python to
see which operations provide audit events.

.. csv-table:: Table 1: Suggested Audit Hooks
   :header: "API Function", "Event Name", "Arguments", "Rationale"
   :widths: 2, 2, 3, 6

   ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
   audit hooks are being added.
   "
   ``PyFile_SetOpenCodeHook``, ``setopencodehook``, "", "
   Detect any attempt to set the ``open_code`` hook.
   "
   "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
   ``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
   Detect dynamic code compilation, where ``code`` could be a string or
   AST. Note that this will be called for regular imports of source
   code, including those that were opened with ``open_code``.
   "
   "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
   Detect dynamic execution of code objects. This only occurs for
   explicit calls, and is not raised for normal function invocation.
   "
   ``import``, ``import``, "``(module, filename, sys.path,
   sys.meta_path, sys.path_hooks)``", "Detect when modules are
   imported. This is raised before the module name is resolved to a
   file. All arguments other than the module name may be ``None`` if
   they are not used or available.
   "
   "``open``", ``open``, "``(path, mode, flags)``", "Detect when a file
   is about to be opened. *path* and *mode* are the usual parameters to
   ``open`` if available, while *flags* is provided instead of *mode*
   in some cases.
   "
   ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
   injecting trace functions. Because of the implementation, exceptions
   raised from the hook will abort the operation, but will not be
   raised in Python code. Note that ``threading.setprofile`` eventually
   calls this function, so the event will be audited for each thread.
   "
   ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
   injecting trace functions. Because of the implementation, exceptions
   raised from the hook will abort the operation, but will not be
   raised in Python code. Note that ``threading.settrace`` eventually
   calls this function, so the event will be audited for each thread.
   "
   "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
   ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``",
   ``object.__setattr__``, "``(object, attr, value)``", "Detect monkey
   patching of types and objects. This event
   is raised for the ``__class__`` attribute and any attribute on
   ``type`` objects.
   "
   "``_PyObject_GenericSetAttr``", ``object.__delattr__``, "``(object,
   attr)``", "Detect deletion of object attributes. This event is raised
   for any attribute on ``type`` objects.
   "
   "``Unpickler.find_class``", ``pickle.find_class``, "``(module_name,
   global_name)``", "Detect imports and global name lookup when
   unpickling.
   "


.. csv-table:: Table 2: Potential CPython Audit Hooks
   :header: "API Function", "Event Name", "Arguments", "Rationale"
   :widths: 2, 2, 3, 6

   ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
   hooks they are being cleaned up, mainly in case the event is
   triggered unexpectedly. This event cannot be aborted.
   "
   ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
   Detect dynamic creation of code objects. This only occurs for
   direct instantiation, and is not raised for normal compilation.
   "
   ``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect
   dynamic creation of function objects. This only occurs for direct
   instantiation, and is not raised for normal compilation.
   "
   "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``,
   "``(module_or_path,)``", "Detect when native modules are used.
   "
   ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
   Collect information about specific symbols retrieved from native
   modules.
   "
   ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
   when code is accessing arbitrary memory using ``ctypes``.
   "
   "``new_mmap_object``", ``mmap.__new__``, "``(fileno, map_size, access,
   offset)``", "Detect creation of mmap objects. On POSIX, access may
   have been calculated from the ``prot`` and ``flags`` arguments.
   "
   ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
   when code is accessing frames directly.
   "
   ``sys._current_frames``, ``sys._current_frames``, "", "Detect when
   code is accessing frames directly.
   "
   "``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
   ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
   ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
   access to network resources. The address is unmodified from the
   original call.
   "
   "``member_get``, ``func_get_code``, ``func_get_[kw]defaults``",
   ``object.__getattr__``, "``(object, attr)``", "Detect access to
   restricted attributes. This event is raised for any built-in
   members that are marked as restricted, and members that may allow
   bypassing imports.
   "
   "``urllib.urlopen``", ``urllib.Request``, "``(url, data, headers,
   method)``", "Detect URL requests.
   "

Performance Impact
==================

The performance impact that matters is the case where events are being
raised but there are no hooks attached. This is the unavoidable case -
once a developer has added audit hooks they have explicitly chosen to
trade performance for functionality. Performance impact with hooks added
is not of interest here, since this is opt-in functionality.

Analysis using the Python Performance Benchmark Suite [1]_ shows no
significant impact, with the vast majority of benchmarks showing
between 1.05x faster and 1.05x slower.

In our opinion, the performance impact of the set of auditing points
described in this PEP is negligible.

Rejected Ideas
==============

Separate module for audit hooks
-------------------------------

The proposal is to add a new module for audit hooks, hypothetically
``audit``. This would separate the API and implementation from the
``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
``PyAudit_Audit`` rather than the current variations.

Any such module would need to be a built-in module that is guaranteed to
always be present. The nature of these hooks is that they must be
callable without condition, as any conditional imports or calls provide
opportunities to intercept and suppress or modify events.

Given it is one of the most core modules, the ``sys`` module is somewhat
protected against module shadowing attacks. Replacing ``sys`` with a
sufficiently functional module that the application can still run is a
much more complicated task than replacing a module with only one
function of interest. An attacker that has the ability to shadow the
``sys`` module is already capable of running arbitrary code from files,
whereas an ``audit`` module could be replaced with a single line in a
``.pth`` file anywhere on the search path::

    import sys; sys.modules['audit'] = type('audit', (object,),
        {'audit': lambda *a: None, 'addhook': lambda *a: None})

Multiple layers of protection already exist for monkey patching attacks
against either ``sys`` or ``audit``, but assignments or insertions to
``sys.modules`` are not audited.

This idea is rejected because it makes it trivial to suppress all calls
to ``audit``.

Flag in sys.flags to indicate "audited" mode
--------------------------------------------

The proposal is to add a value in ``sys.flags`` to indicate when Python
is running in a "secure" or "audited" mode. This would allow
applications to detect when some features are enabled or when hooks
have been added and modify their behaviour appropriately.

Currently, we are not aware of any legitimate reasons for a program to
behave differently in the presence of audit hooks.

Both application-level APIs ``sys.audit`` and ``io.open_code`` are
always present and functional, regardless of whether the regular
``python`` entry point or some alternative entry point is used. Callers
cannot determine whether any hooks have been added (except by performing
side-channel analysis), nor do they need to. The calls should be fast
enough that callers do not need to avoid them, and the program is
responsible for ensuring that any added hooks are fast enough to not
affect application performance.

The argument that this is "security by obscurity" is valid, but
irrelevant. Security by obscurity is only an issue when there are no
other protective mechanisms; obscurity as the first step in avoiding
attack is strongly recommended (see `this article
<https://danielmiessler.com/study/security-by-obscurity/>`_ for
discussion).

This idea is rejected because there are no appropriate reasons for an
application to change its behaviour based on whether these APIs are in
use.

Why Not A Sandbox
=================

Sandboxing CPython has been attempted many times in the past, and each
past attempt has failed. Fundamentally, the problem is that certain
functionality has to be restricted when executing the sandboxed code,
but otherwise needs to be available for normal operation of Python. For
example, completely removing the ability to compile strings into
bytecode also breaks the ability to import modules from source code, and
if it is not completely removed then there are too many ways to get
access to that functionality indirectly. There is not yet any feasible
way to generically determine whether a given operation is "safe" or not.
Further information and references are available at [2]_.

This proposal does not attempt to restrict functionality, but simply
exposes the fact that the functionality is being used. Particularly for
intrusion scenarios, detection is significantly more important than
early prevention (as early prevention will generally drive attackers to
use an alternate, less-detectable, approach). The availability of audit
hooks alone does not change the attack surface of Python in any way, but
they enable defenders to integrate Python into their environment in ways
that are currently not possible.

Since audit hooks have the ability to safely prevent an operation
from occurring, this feature does enable the ability to provide some
level of sandboxing. In most cases, however, the intention is to enable
logging rather than creating a sandbox.

Relationship to PEP 551
=======================

This API was originally presented as part of
`PEP 551 <https://www.python.org/dev/peps/pep-0551/>`_, Security
Transparency in the Python Runtime.

For simpler review purposes, and due to the broader applicability of
these APIs beyond security, the API design is now presented separately.

PEP 551 is an informational PEP discussing how to integrate Python into
a secure or audited environment.

References
==========

.. [1] Python Performance Benchmark Suite `<https://github.com/python/performance>`_

.. [2] Python Security model - Sandbox `<https://python-security.readthedocs.io/security.html#sandbox>`_

Copyright
=========

Copyright (c) 2019 by Microsoft Corporation. This material may be
distributed only subject to the terms and conditions set forth in the
Open Publication License, v1.0 or later (the latest version is presently
available at http://www.opencontent.org/openpub/).