2018-06-19 20:09:57 -04:00
|
|
|
|
PEP: 578
|
|
|
|
|
Title: Python Runtime Audit Hooks
|
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
|
|
|
|
Author: Steve Dower <steve.dower@python.org>
|
2019-04-17 15:15:36 -04:00
|
|
|
|
BDFL-Delegate: Christian Heimes <christian@python.org>
|
2019-05-07 16:51:11 -04:00
|
|
|
|
Status: Accepted
|
2018-06-19 20:09:57 -04:00
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
|
Created: 16-Jun-2018
|
|
|
|
|
Python-Version: 3.8
|
2019-05-07 16:51:11 -04:00
|
|
|
|
Post-History: 28-March-2019, 07-May-2019
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
This PEP describes additions to the Python API and specific behaviors
|
|
|
|
|
for the CPython implementation that make actions taken by the Python
|
|
|
|
|
runtime visible to auditing tools. Visibility into these actions
|
|
|
|
|
provides opportunities for test frameworks, logging frameworks, and
|
|
|
|
|
security tools to monitor and optionally limit actions taken by the
|
|
|
|
|
runtime.
|
|
|
|
|
|
|
|
|
|
This PEP proposes adding two APIs to provide insights into a running
|
|
|
|
|
Python application: one for arbitrary events, and another specific to
|
|
|
|
|
the module import system. The APIs are intended to be available in all
|
|
|
|
|
Python implementations, though the specific messages and values used
|
|
|
|
|
are unspecified here to allow implementations the freedom to determine
|
|
|
|
|
how best to provide information to their users. Some examples likely
|
|
|
|
|
to be used in CPython are provided for explanatory purposes.
|
|
|
|
|
|
2022-01-21 06:03:51 -05:00
|
|
|
|
See :pep:`551` for discussion and recommendations on enhancing the
|
2018-06-19 20:09:57 -04:00
|
|
|
|
security of a Python runtime making use of these auditing APIs.
|
|
|
|
|
|
|
|
|
|
Background
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
Python provides access to a wide range of low-level functionality on
|
2019-03-28 18:53:22 -04:00
|
|
|
|
many common operating systems. While this is incredibly useful for
|
|
|
|
|
"write-once, run-anywhere" scripting, it also makes monitoring of
|
|
|
|
|
software written in Python difficult. Because Python uses native system
|
|
|
|
|
APIs directly, existing monitoring tools either suffer from limited
|
|
|
|
|
context or auditing bypass.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Limited context occurs when system monitoring can report that an
|
|
|
|
|
action occurred, but cannot explain the sequence of events leading to
|
|
|
|
|
it. For example, network monitoring at the OS level may be able to
|
|
|
|
|
report "listening started on port 5678", but may not be able to
|
2019-03-28 18:53:22 -04:00
|
|
|
|
provide the process ID, command line, parent process, or the local
|
2018-06-19 20:09:57 -04:00
|
|
|
|
state in the program at the point that triggered the action. Firewall
|
|
|
|
|
controls to prevent such an action are similarly limited, typically
|
2019-03-28 18:53:22 -04:00
|
|
|
|
to process names or some global state such as the current user, and
|
2018-06-19 20:09:57 -04:00
|
|
|
|
in any case rarely provide a useful log file correlated with other
|
|
|
|
|
application messages.
|
|
|
|
|
|
|
|
|
|
Auditing bypass can occur when the typical system tool used for an
|
|
|
|
|
action would ordinarily report its use, but accessing the APIs via
|
|
|
|
|
Python do not trigger this. For example, invoking "curl" to make HTTP
|
|
|
|
|
requests may be specifically monitored in an audited system, but
|
|
|
|
|
Python's "urlretrieve" function is not.
|
|
|
|
|
|
|
|
|
|
Within a long-running Python application, particularly one that
|
|
|
|
|
processes user-provided information such as a web app, there is a risk
|
|
|
|
|
of unexpected behavior. This may be due to bugs in the code, or
|
|
|
|
|
deliberately induced by a malicious user. In both cases, normal
|
|
|
|
|
application logging may be bypassed resulting in no indication that
|
|
|
|
|
anything out of the ordinary has occurred.
|
|
|
|
|
|
|
|
|
|
Additionally, and somewhat unique to Python, it is very easy to affect
|
|
|
|
|
the code that is run in an application by manipulating either the
|
|
|
|
|
import system's search path or placing files earlier on the path than
|
|
|
|
|
intended. This is often seen when developers create a script with the
|
|
|
|
|
same name as the module they intend to use - for example, a
|
|
|
|
|
``random.py`` file that attempts to import the standard library
|
|
|
|
|
``random`` module.
|
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
This is not sandboxing, as this proposal does not attempt to prevent
|
|
|
|
|
malicious behavior (though it enables some new options to do so).
|
|
|
|
|
See the `Why Not A Sandbox`_ section below for further discussion.
|
|
|
|
|
|
2018-06-19 20:09:57 -04:00
|
|
|
|
Overview of Changes
|
|
|
|
|
===================
|
|
|
|
|
|
|
|
|
|
The aim of these changes is to enable both application developers and
|
|
|
|
|
system administrators to integrate Python into their existing
|
|
|
|
|
monitoring systems without dictating how those systems look or behave.
|
|
|
|
|
|
|
|
|
|
We propose two API changes to enable this: an Audit Hook and Verified
|
|
|
|
|
Open Hook. Both are available from Python and native code, allowing
|
|
|
|
|
applications and frameworks written in pure Python code to take
|
|
|
|
|
advantage of the extra messages, while also allowing embedders or
|
2019-03-28 18:53:22 -04:00
|
|
|
|
system administrators to deploy builds of Python where auditing is
|
|
|
|
|
always enabled.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Only CPython is bound to provide the native APIs as described here.
|
|
|
|
|
Other implementations should provide the pure Python APIs, and
|
|
|
|
|
may provide native versions as appropriate for their underlying
|
2019-03-28 18:53:22 -04:00
|
|
|
|
runtimes. Auditing events are likewise considered implementation
|
|
|
|
|
specific, but are bound by normal feature compatibility guarantees.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Audit Hook
|
|
|
|
|
----------
|
|
|
|
|
|
|
|
|
|
In order to observe actions taken by the runtime (on behalf of the
|
|
|
|
|
caller), an API is required to raise messages from within certain
|
|
|
|
|
operations. These operations are typically deep within the Python
|
|
|
|
|
runtime or standard library, such as dynamic code compilation, module
|
|
|
|
|
imports, DNS resolution, or use of certain modules such as ``ctypes``.
|
|
|
|
|
|
|
|
|
|
The following new C APIs allow embedders and CPython implementors to
|
|
|
|
|
send and receive audit hook messages::
|
|
|
|
|
|
|
|
|
|
# Add an auditing hook
|
|
|
|
|
typedef int (*hook_func)(const char *event, PyObject *args,
|
|
|
|
|
void *userData);
|
|
|
|
|
int PySys_AddAuditHook(hook_func hook, void *userData);
|
|
|
|
|
|
|
|
|
|
# Raise an event with all auditing hooks
|
|
|
|
|
int PySys_Audit(const char *event, PyObject *args);
|
|
|
|
|
|
|
|
|
|
The new Python APIs for receiving and raising audit hooks are::
|
|
|
|
|
|
|
|
|
|
# Add an auditing hook
|
|
|
|
|
sys.addaudithook(hook: Callable[[str, tuple]])
|
|
|
|
|
|
|
|
|
|
# Raise an event with all auditing hooks
|
|
|
|
|
sys.audit(str, *args)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
|
|
|
|
|
including before ``Py_Initialize()``, or by calling
|
|
|
|
|
``sys.addaudithook()`` from Python code. Hooks cannot be removed or
|
2019-05-07 15:36:45 -04:00
|
|
|
|
replaced. For CPython, hooks added from C are global, while hooks added
|
|
|
|
|
from Python are only for the current interpreter. Global hooks are
|
|
|
|
|
executed before interpreter hooks.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
When events of interest are occurring, code can either call
|
|
|
|
|
``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
|
|
|
|
|
string argument is the name of the event, and the tuple contains
|
|
|
|
|
arguments. A given event name should have a fixed schema for arguments,
|
2019-03-28 18:53:22 -04:00
|
|
|
|
which should be considered a public API (for each x.y version release),
|
|
|
|
|
and thus should only change between feature releases with updated
|
2019-05-07 15:36:45 -04:00
|
|
|
|
documentation. To minimize overhead and simplify handling in native code
|
|
|
|
|
hook implementations, named arguments are not supported.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
For maximum compatibility, events using the same name as an event in
|
|
|
|
|
the reference interpreter CPython should make every attempt to use
|
|
|
|
|
compatible arguments. Including the name or an abbreviation of the
|
|
|
|
|
implementation in implementation-specific event names will also help
|
|
|
|
|
prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
|
2019-05-07 15:36:45 -04:00
|
|
|
|
distinguished from an ``ipy.jit_invoked`` event. Events raised from
|
|
|
|
|
Python modules should include their module or package name in the event
|
|
|
|
|
name.
|
|
|
|
|
|
|
|
|
|
While event names may be arbitrary UTF-8 strings, for consistency across
|
|
|
|
|
implementations it is recommended to use valid Python dotted names and
|
|
|
|
|
avoid encoding specific details in the name. For example, an ``import``
|
|
|
|
|
event with the module name ``spam`` as an argument is preferable to a
|
|
|
|
|
``spam module imported`` event with no arguments. Avoid using embedded
|
|
|
|
|
null characters or you may upset those who implement hooks using C.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
When an event is audited, each hook is called in the order it was added
|
2019-05-07 15:36:45 -04:00
|
|
|
|
(as much as is possible), passing the event name and arguments. If any
|
|
|
|
|
hook returns with an exception set, later hooks are ignored and *in
|
|
|
|
|
general* the Python runtime should terminate - exceptions from hooks are
|
|
|
|
|
not intended to be handled or treated as expected occurrences. This
|
|
|
|
|
allows hook implementations to decide how to respond to any particular
|
|
|
|
|
event. The typical responses will be to log the event, abort the
|
|
|
|
|
operation with an exception, or to immediately terminate the process with
|
|
|
|
|
an operating system exit call.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
When an event is audited but no hooks have been set, the ``audit()``
|
2019-03-28 18:53:22 -04:00
|
|
|
|
function should impose minimal overhead. Ideally, each argument is a
|
2019-05-07 15:36:45 -04:00
|
|
|
|
reference to existing data rather than a value calculated just for the
|
2018-06-19 20:09:57 -04:00
|
|
|
|
auditing call.
|
|
|
|
|
|
|
|
|
|
As hooks may be Python objects, they need to be freed during
|
2019-05-07 15:36:45 -04:00
|
|
|
|
interpreter or runtime finalization. These should not be triggered at
|
|
|
|
|
any other time, and should raise an event hook to ensure that any
|
|
|
|
|
unexpected calls are observed.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Below in `Suggested Audit Hook Locations`_, we recommend some important
|
2019-05-07 15:36:45 -04:00
|
|
|
|
operations that should raise audit events. In general, events should be
|
|
|
|
|
raised at the lowest possible level. Given the choice between raising an
|
|
|
|
|
event from Python code or native code, raising from native code should be
|
|
|
|
|
preferred.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Python implementations should document which operations will raise
|
2019-03-28 18:53:22 -04:00
|
|
|
|
audit events, along with the event schema. It is intentional that
|
2019-05-07 15:36:45 -04:00
|
|
|
|
``sys.addaudithook(print)`` is a trivial way to display all messages.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Verified Open Hook
|
|
|
|
|
------------------
|
|
|
|
|
|
|
|
|
|
Most operating systems have a mechanism to distinguish between files
|
|
|
|
|
that can be executed and those that can not. For example, this may be an
|
2019-03-28 18:53:22 -04:00
|
|
|
|
execute bit in the permissions field, a verified hash of the file
|
|
|
|
|
contents to detect potential code tampering, or file system path
|
2019-04-10 14:46:20 -04:00
|
|
|
|
restrictions. These are an important security mechanism for ensuring
|
|
|
|
|
that only code that has been approved for a given environment is
|
2019-05-07 15:36:45 -04:00
|
|
|
|
executed.
|
|
|
|
|
|
|
|
|
|
Most kernels offer ways to restrict or audit binaries loaded and executed
|
|
|
|
|
by the kernel. File types owned by Python appear as regular data and
|
|
|
|
|
these features do not apply. This open hook allows Python embedders to
|
|
|
|
|
integrate with operating system support when launching scripts or
|
|
|
|
|
importing Python code.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
The new public C API for the verified open hook is::
|
|
|
|
|
|
|
|
|
|
# Set the handler
|
|
|
|
|
typedef PyObject *(*hook_func)(PyObject *path, void *userData)
|
2019-04-02 12:11:22 -04:00
|
|
|
|
int PyFile_SetOpenCodeHook(hook_func handler, void *userData)
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
# Open a file using the handler
|
2019-04-02 12:11:22 -04:00
|
|
|
|
PyObject *PyFile_OpenCode(const char *path)
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
The new public Python API for the verified open hook is::
|
|
|
|
|
|
|
|
|
|
# Open a file using the handler
|
2019-04-02 12:11:22 -04:00
|
|
|
|
io.open_code(path : str) -> io.IOBase
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
|
2019-04-02 12:11:22 -04:00
|
|
|
|
The ``io.open_code()`` function is a drop-in replacement for
|
|
|
|
|
``open(abspath(str(pathlike)), 'rb')``. Its default behaviour is to
|
|
|
|
|
open a file for raw, binary access. To change the behaviour a new
|
2018-06-19 20:09:57 -04:00
|
|
|
|
handler should be set. Handler functions only accept ``str`` arguments.
|
2019-04-02 12:11:22 -04:00
|
|
|
|
The C API ``PyFile_OpenCode`` function assumes UTF-8 encoding. Paths
|
|
|
|
|
must be absolute, and it is the responsibility of the caller to ensure
|
|
|
|
|
the full path is correctly resolved.
|
|
|
|
|
|
|
|
|
|
A custom handler may be set by calling ``PyFile_SetOpenCodeHook()`` from
|
|
|
|
|
C at any time, including before ``Py_Initialize()``. However, if a hook
|
|
|
|
|
has already been set then the call will fail. When ``open_code()`` is
|
|
|
|
|
called with a hook set, the hook will be passed the path and its return
|
|
|
|
|
value will be returned directly. The returned object should be an open
|
|
|
|
|
file-like object that supports reading raw bytes. This is explicitly
|
|
|
|
|
intended to allow a ``BytesIO`` instance if the open handler has already
|
|
|
|
|
read the entire file into memory.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Note that these hooks can import and call the ``_io.open()`` function on
|
|
|
|
|
CPython without triggering themselves. They can also use ``_io.BytesIO``
|
|
|
|
|
to return a compatible result using an in-memory buffer.
|
|
|
|
|
|
|
|
|
|
If the hook determines that the file should not be loaded, it should
|
|
|
|
|
raise an exception of its choice, as well as performing any other
|
|
|
|
|
logging.
|
|
|
|
|
|
|
|
|
|
All import and execution functionality involving code from a file will
|
2019-04-02 12:11:22 -04:00
|
|
|
|
be changed to use ``open_code()`` unconditionally. It is important to
|
|
|
|
|
note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
|
2018-06-19 20:09:57 -04:00
|
|
|
|
through this function - an audit hook that includes the code from these
|
|
|
|
|
calls is the best opportunity to validate code that is read from the
|
|
|
|
|
file. Given the current decoupling between import and execution in
|
2019-04-02 12:11:22 -04:00
|
|
|
|
Python, most imported code will go through both ``open_code()`` and the
|
|
|
|
|
log hook for ``compile``, and so care should be taken to avoid
|
2018-06-19 20:09:57 -04:00
|
|
|
|
repeating verification steps.
|
|
|
|
|
|
2019-04-10 14:46:20 -04:00
|
|
|
|
File accesses that are not intentionally planning to execute code are
|
|
|
|
|
not expected to use this function. This includes loading pickles, XML
|
|
|
|
|
or YAML files, where code execution is generally considered malicious
|
|
|
|
|
rather than intentional. These operations should provide their own
|
|
|
|
|
auditing events, preferably distinguishing between normal functionality
|
|
|
|
|
(for example, ``Unpickler.load``) and code execution
|
|
|
|
|
(``Unpickler.find_class``).
|
|
|
|
|
|
|
|
|
|
A few examples: if the file type normally requires an execute bit (on
|
|
|
|
|
POSIX) or would warn when marked as having been downloaded from the
|
|
|
|
|
internet (on Windows), it should probably use ``open_code()`` rather
|
|
|
|
|
than plain ``open()``. Opening ZIP files using the ``ZipFile`` class
|
|
|
|
|
should use ``open()``, while opening them via ``zipimport`` should use
|
|
|
|
|
``open_code()`` to signal the correct intent. Code that uses the wrong
|
|
|
|
|
function for a particular context may bypass the hook, which in CPython
|
|
|
|
|
and the standard library should be considered a bug. Using a combination
|
|
|
|
|
of ``open_code`` hooks and auditing hooks is necessary to trace all
|
|
|
|
|
executed sources in the presence of arbitrary code.
|
|
|
|
|
|
2018-06-19 20:09:57 -04:00
|
|
|
|
There is no Python API provided for changing the open hook. To modify
|
|
|
|
|
import behavior from Python code, use the existing functionality
|
|
|
|
|
provided by ``importlib``.
|
|
|
|
|
|
|
|
|
|
API Availability
|
|
|
|
|
----------------
|
|
|
|
|
|
|
|
|
|
While all the functions added here are considered public and stable API,
|
|
|
|
|
the behavior of the functions is implementation specific. Most
|
|
|
|
|
descriptions here refer to the CPython implementation, and while other
|
|
|
|
|
implementations should provide the functions, there is no requirement
|
|
|
|
|
that they behave the same.
|
|
|
|
|
|
|
|
|
|
For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
|
|
|
|
|
may do nothing. This allows code to make calls to ``sys.audit()``
|
|
|
|
|
without having to test for existence, but it should not assume that its
|
|
|
|
|
call will have any effect. (Including existence tests in
|
|
|
|
|
security-critical code allows another vector to bypass auditing, so it
|
|
|
|
|
is preferable that the function always exist.)
|
|
|
|
|
|
2019-04-02 12:11:22 -04:00
|
|
|
|
``io.open_code(path)`` should at a minimum always return
|
|
|
|
|
``_io.open(path, 'rb')``. Code using the function should make no further
|
|
|
|
|
assumptions about what may occur, and implementations other than CPython
|
|
|
|
|
are not required to let developers override the behavior of this
|
2018-06-19 20:09:57 -04:00
|
|
|
|
function with a hook.
|
|
|
|
|
|
|
|
|
|
Suggested Audit Hook Locations
|
|
|
|
|
==============================
|
|
|
|
|
|
|
|
|
|
The locations and parameters in calls to ``sys.audit()`` or
|
|
|
|
|
``PySys_Audit()`` are to be determined by individual Python
|
|
|
|
|
implementations. This is to allow maximum freedom for implementations
|
|
|
|
|
to expose the operations that are most relevant to their platform,
|
|
|
|
|
and to avoid or ignore potentially expensive or noisy events.
|
|
|
|
|
|
|
|
|
|
Table 1 acts as both suggestions of operations that should trigger
|
|
|
|
|
audit events on all implementations, and examples of event schemas.
|
|
|
|
|
|
|
|
|
|
Table 2 provides further examples that are not required, but are
|
|
|
|
|
likely to be available in CPython.
|
|
|
|
|
|
|
|
|
|
Refer to the documentation associated with your version of Python to
|
|
|
|
|
see which operations provide audit events.
|
|
|
|
|
|
|
|
|
|
.. csv-table:: Table 1: Suggested Audit Hooks
|
|
|
|
|
:header: "API Function", "Event Name", "Arguments", "Rationale"
|
|
|
|
|
:widths: 2, 2, 3, 6
|
2019-04-16 10:50:15 -04:00
|
|
|
|
|
2018-06-19 20:09:57 -04:00
|
|
|
|
``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
|
|
|
|
|
audit hooks are being added.
|
|
|
|
|
"
|
2019-05-07 15:36:45 -04:00
|
|
|
|
``PyFile_SetOpenCodeHook``, ``cpython.PyFile_SetOpenCodeHook``, "
|
|
|
|
|
", "Detects any attempt to set the ``open_code`` hook.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
"
|
|
|
|
|
"``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
|
|
|
|
|
``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
|
|
|
|
|
Detect dynamic code compilation, where ``code`` could be a string or
|
|
|
|
|
AST. Note that this will be called for regular imports of source
|
2019-04-02 12:11:22 -04:00
|
|
|
|
code, including those that were opened with ``open_code``.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
"
|
|
|
|
|
"``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
|
|
|
|
|
Detect dynamic execution of code objects. This only occurs for
|
|
|
|
|
explicit calls, and is not raised for normal function invocation.
|
|
|
|
|
"
|
|
|
|
|
``import``, ``import``, "``(module, filename, sys.path,
|
|
|
|
|
sys.meta_path, sys.path_hooks)``", "Detect when modules are
|
|
|
|
|
imported. This is raised before the module name is resolved to a
|
|
|
|
|
file. All arguments other than the module name may be ``None`` if
|
|
|
|
|
they are not used or available.
|
|
|
|
|
"
|
2019-05-07 15:36:45 -04:00
|
|
|
|
"``open``", ``io.open``, "``(path, mode, flags)``", "Detect when a
|
|
|
|
|
file is about to be opened. *path* and *mode* are the usual parameters
|
|
|
|
|
to ``open`` if available, while *flags* is provided instead of *mode*
|
2019-03-28 18:53:22 -04:00
|
|
|
|
in some cases.
|
|
|
|
|
"
|
2018-06-19 20:09:57 -04:00
|
|
|
|
``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
|
|
|
|
|
injecting trace functions. Because of the implementation, exceptions
|
|
|
|
|
raised from the hook will abort the operation, but will not be
|
|
|
|
|
raised in Python code. Note that ``threading.setprofile`` eventually
|
|
|
|
|
calls this function, so the event will be audited for each thread.
|
|
|
|
|
"
|
|
|
|
|
``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
|
|
|
|
|
injecting trace functions. Because of the implementation, exceptions
|
|
|
|
|
raised from the hook will abort the operation, but will not be
|
|
|
|
|
raised in Python code. Note that ``threading.settrace`` eventually
|
|
|
|
|
calls this function, so the event will be audited for each thread.
|
|
|
|
|
"
|
|
|
|
|
"``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
|
|
|
|
|
``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``","
|
|
|
|
|
``object.__setattr__``","``(object, attr, value)``","Detect monkey
|
|
|
|
|
patching of types and objects. This event
|
|
|
|
|
is raised for the ``__class__`` attribute and any attribute on
|
|
|
|
|
``type`` objects.
|
|
|
|
|
"
|
|
|
|
|
"``_PyObject_GenericSetAttr``",``object.__delattr__``,"``(object,
|
|
|
|
|
attr)``","Detect deletion of object attributes. This event is raised
|
|
|
|
|
for any attribute on ``type`` objects.
|
|
|
|
|
"
|
|
|
|
|
"``Unpickler.find_class``",``pickle.find_class``,"``(module_name,
|
|
|
|
|
global_name)``","Detect imports and global name lookup when
|
|
|
|
|
unpickling.
|
|
|
|
|
"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
.. csv-table:: Table 2: Potential CPython Audit Hooks
|
|
|
|
|
:header: "API Function", "Event Name", "Arguments", "Rationale"
|
|
|
|
|
:widths: 2, 2, 3, 6
|
2019-04-16 10:50:15 -04:00
|
|
|
|
|
2018-06-19 20:09:57 -04:00
|
|
|
|
``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
|
|
|
|
|
hooks they are being cleaned up, mainly in case the event is
|
|
|
|
|
triggered unexpectedly. This event cannot be aborted.
|
|
|
|
|
"
|
|
|
|
|
``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
|
|
|
|
|
Detect dynamic creation of code objects. This only occurs for
|
|
|
|
|
direct instantiation, and is not raised for normal compilation.
|
|
|
|
|
"
|
|
|
|
|
``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect
|
|
|
|
|
dynamic creation of function objects. This only occurs for direct
|
|
|
|
|
instantiation, and is not raised for normal compilation.
|
|
|
|
|
"
|
|
|
|
|
"``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
|
|
|
|
|
``(module_or_path,)``", "Detect when native modules are used.
|
|
|
|
|
"
|
|
|
|
|
``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
|
|
|
|
|
Collect information about specific symbols retrieved from native
|
|
|
|
|
modules.
|
|
|
|
|
"
|
|
|
|
|
``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
|
|
|
|
|
when code is accessing arbitrary memory using ``ctypes``.
|
|
|
|
|
"
|
2019-04-16 10:50:15 -04:00
|
|
|
|
"``new_mmap_object``",``mmap.__new__``,"``(fileno, map_size, access,
|
|
|
|
|
offset)``", "Detects creation of mmap objects. On POSIX, access may
|
2018-07-04 12:53:40 -04:00
|
|
|
|
have been calculated from the ``prot`` and ``flags`` arguments.
|
|
|
|
|
"
|
2018-06-19 20:09:57 -04:00
|
|
|
|
``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
|
|
|
|
|
when code is accessing frames directly.
|
|
|
|
|
"
|
|
|
|
|
``sys._current_frames``, ``sys._current_frames``, "", "Detect when
|
|
|
|
|
code is accessing frames directly.
|
|
|
|
|
"
|
|
|
|
|
"``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
|
|
|
|
|
``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
|
2019-05-07 15:36:45 -04:00
|
|
|
|
``socket.sendto``", ``socket.address``, "``(socket, address,)``", "
|
|
|
|
|
Detect access to network resources. The address is unmodified from
|
|
|
|
|
the original call.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
"
|
|
|
|
|
"``member_get``, ``func_get_code``, ``func_get_[kw]defaults``
|
|
|
|
|
",``object.__getattr__``,"``(object, attr)``","Detect access to
|
|
|
|
|
restricted attributes. This event is raised for any built-in
|
|
|
|
|
members that are marked as restricted, and members that may allow
|
|
|
|
|
bypassing imports.
|
|
|
|
|
"
|
2019-04-16 10:50:15 -04:00
|
|
|
|
"``urllib.urlopen``",``urllib.Request``,"``(url, data, headers,
|
|
|
|
|
method)``", "Detects URL requests.
|
|
|
|
|
"
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
Performance Impact
|
|
|
|
|
==================
|
|
|
|
|
|
|
|
|
|
The important performance impact is the case where events are being
|
|
|
|
|
raised but there are no hooks attached. This is the unavoidable case -
|
2019-03-28 18:53:22 -04:00
|
|
|
|
once a developer has added audit hooks they have explicitly chosen to
|
|
|
|
|
trade performance for functionality. Performance impact with hooks added
|
|
|
|
|
are not of interest here, since this is opt-in functionality.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2018-07-04 12:53:40 -04:00
|
|
|
|
Analysis using the Python Performance Benchmark Suite [1]_ shows no
|
|
|
|
|
significant impact, with the vast majority of benchmarks showing
|
|
|
|
|
between 1.05x faster to 1.05x slower.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
In our opinion, the performance impact of the set of auditing points
|
|
|
|
|
described in this PEP is negligible.
|
|
|
|
|
|
|
|
|
|
Rejected Ideas
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
Separate module for audit hooks
|
|
|
|
|
-------------------------------
|
|
|
|
|
|
|
|
|
|
The proposal is to add a new module for audit hooks, hypothetically
|
|
|
|
|
``audit``. This would separate the API and implementation from the
|
|
|
|
|
``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
|
|
|
|
|
``PyAudit_Audit`` rather than the current variations.
|
|
|
|
|
|
|
|
|
|
Any such module would need to be a built-in module that is guaranteed to
|
|
|
|
|
always be present. The nature of these hooks is that they must be
|
|
|
|
|
callable without condition, as any conditional imports or calls provide
|
2018-07-04 12:53:40 -04:00
|
|
|
|
opportunities to intercept and suppress or modify events.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
Given it is one of the most core modules, the ``sys`` module is somewhat
|
|
|
|
|
protected against module shadowing attacks. Replacing ``sys`` with a
|
|
|
|
|
sufficiently functional module that the application can still run is a
|
|
|
|
|
much more complicated task than replacing a module with only one
|
2018-06-19 20:09:57 -04:00
|
|
|
|
function of interest. An attacker that has the ability to shadow the
|
|
|
|
|
``sys`` module is already capable of running arbitrary code from files,
|
2019-03-28 18:53:22 -04:00
|
|
|
|
whereas an ``audit`` module could be replaced with a single line in a
|
2018-07-04 12:53:40 -04:00
|
|
|
|
``.pth`` file anywhere on the search path::
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
import sys; sys.modules['audit'] = type('audit', (object,),
|
|
|
|
|
{'audit': lambda *a: None, 'addhook': lambda *a: None})
|
|
|
|
|
|
|
|
|
|
Multiple layers of protection already exist for monkey patching attacks
|
|
|
|
|
against either ``sys`` or ``audit``, but assignments or insertions to
|
|
|
|
|
``sys.modules`` are not audited.
|
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
This idea is rejected because it makes it trivial to suppress all calls
|
|
|
|
|
to ``audit``.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2018-07-04 12:53:40 -04:00
|
|
|
|
Flag in sys.flags to indicate "audited" mode
|
|
|
|
|
--------------------------------------------
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
The proposal is to add a value in ``sys.flags`` to indicate when Python
|
2018-07-04 12:53:40 -04:00
|
|
|
|
is running in a "secure" or "audited" mode. This would allow
|
|
|
|
|
applications to detect when some features are enabled or when hooks
|
|
|
|
|
have been added and modify their behaviour appropriately.
|
|
|
|
|
|
|
|
|
|
Currently, we are not aware of any legitimate reasons for a program to
|
|
|
|
|
behave differently in the presence of audit hooks.
|
|
|
|
|
|
2019-04-02 12:11:22 -04:00
|
|
|
|
Both application-level APIs ``sys.audit`` and ``io.open_code`` are
|
|
|
|
|
always present and functional, regardless of whether the regular
|
|
|
|
|
``python`` entry point or some alternative entry point is used. Callers
|
|
|
|
|
cannot determine whether any hooks have been added (except by performing
|
|
|
|
|
side-channel analysis), nor do they need to. The calls should be fast
|
|
|
|
|
enough that callers do not need to avoid them, and the program is
|
|
|
|
|
responsible for ensuring that any added hooks are fast enough to not
|
|
|
|
|
affect application performance.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
|
|
|
|
The argument that this is "security by obscurity" is valid, but
|
|
|
|
|
irrelevant. Security by obscurity is only an issue when there are no
|
|
|
|
|
other protective mechanisms; obscurity as the first step in avoiding
|
|
|
|
|
attack is strongly recommended (see `this article
|
|
|
|
|
<https://danielmiessler.com/study/security-by-obscurity/>`_ for
|
|
|
|
|
discussion).
|
|
|
|
|
|
|
|
|
|
This idea is rejected because there are no appropriate reasons for an
|
|
|
|
|
application to change its behaviour based on whether these APIs are in
|
|
|
|
|
use.
|
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
Why Not A Sandbox
|
|
|
|
|
=================
|
|
|
|
|
|
|
|
|
|
Sandboxing CPython has been attempted many times in the past, and each
|
|
|
|
|
past attempt has failed. Fundamentally, the problem is that certain
|
|
|
|
|
functionality has to be restricted when executing the sandboxed code,
|
|
|
|
|
but otherwise needs to be available for normal operation of Python. For
|
|
|
|
|
example, completely removing the ability to compile strings into
|
|
|
|
|
bytecode also breaks the ability to import modules from source code, and
|
|
|
|
|
if it is not completely removed then there are too many ways to get
|
|
|
|
|
access to that functionality indirectly. There is not yet any feasible
|
|
|
|
|
way to generically determine whether a given operation is "safe" or not.
|
|
|
|
|
Further information and references available at [2]_.
|
|
|
|
|
|
|
|
|
|
This proposal does not attempt to restrict functionality, but simply
|
|
|
|
|
exposes the fact that the functionality is being used. Particularly for
|
|
|
|
|
intrusion scenarios, detection is significantly more important than
|
|
|
|
|
early prevention (as early prevention will generally drive attackers to
|
|
|
|
|
use an alternate, less-detectable, approach). The availability of audit
|
|
|
|
|
hooks alone does not change the attack surface of Python in any way, but
|
|
|
|
|
they enable defenders to integrate Python into their environment in ways
|
|
|
|
|
that are currently not possible.
|
|
|
|
|
|
|
|
|
|
Since audit hooks have the ability to safely prevent an operation
|
2019-06-25 00:58:50 -04:00
|
|
|
|
occurring, this feature does enable the ability to provide some level of
|
2019-03-28 18:53:22 -04:00
|
|
|
|
sandboxing. In most cases, however, the intention is to enable logging
|
|
|
|
|
rather than creating a sandbox.
|
|
|
|
|
|
2018-07-04 12:53:40 -04:00
|
|
|
|
Relationship to PEP 551
|
|
|
|
|
=======================
|
|
|
|
|
|
|
|
|
|
This API was originally presented as part of
|
2022-01-21 06:03:51 -05:00
|
|
|
|
:pep:`551` Security
|
2018-07-04 12:53:40 -04:00
|
|
|
|
Transparency in the Python Runtime.
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2018-07-04 12:53:40 -04:00
|
|
|
|
For simpler review purposes, and due to the broader applicability of
|
|
|
|
|
these APIs beyond security, the API design is now presented separately.
|
|
|
|
|
|
2022-01-21 06:03:51 -05:00
|
|
|
|
:pep:`551` is an informational PEP discussing how to integrate Python into
|
2018-07-04 12:53:40 -04:00
|
|
|
|
a secure or audited environment.
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2018-07-04 12:53:40 -04:00
|
|
|
|
.. [1] Python Performance Benchmark Suite `<https://github.com/python/performance>`_
|
2018-06-19 20:09:57 -04:00
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
.. [2] Python Security model - Sandbox `<https://python-security.readthedocs.io/security.html#sandbox>`_
|
|
|
|
|
|
2018-06-19 20:09:57 -04:00
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
2019-03-28 18:53:22 -04:00
|
|
|
|
Copyright (c) 2019 by Microsoft Corporation. This material may be
|
2018-06-19 20:09:57 -04:00
|
|
|
|
distributed only subject to the terms and conditions set forth in the
|
|
|
|
|
Open Publication License, v1.0 or later (the latest version is presently
|
|
|
|
|
available at http://www.opencontent.org/openpub/).
|