PEP 578: Updated text for clarity (#959)

Steve Dower 2019-03-28 15:53:22 -07:00 committed by GitHub
parent f350734f80
commit cd42999322
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 78 additions and 39 deletions


@@ -28,27 +28,27 @@ are unspecified here to allow implementations the freedom to determine
 how best to provide information to their users. Some examples likely
 to be used in CPython are provided for explanatory purposes.

-See PEP-551 for discussion and recommendations on enhancing the
+See PEP 551 for discussion and recommendations on enhancing the
 security of a Python runtime making use of these auditing APIs.

 Background
 ==========

 Python provides access to a wide range of low-level functionality on
-many common operating systems in a consistent manner. While this is
-incredibly useful for "write-once, run-anywhere" scripting, it also
-makes monitoring of software written in Python difficult. Because
-Python uses native system APIs directly, existing monitoring
-tools either suffer from limited context or auditing bypass.
+many common operating systems. While this is incredibly useful for
+"write-once, run-anywhere" scripting, it also makes monitoring of
+software written in Python difficult. Because Python uses native system
+APIs directly, existing monitoring tools either suffer from limited
+context or auditing bypass.

 Limited context occurs when system monitoring can report that an
 action occurred, but cannot explain the sequence of events leading to
 it. For example, network monitoring at the OS level may be able to
 report "listening started on port 5678", but may not be able to
-provide the process ID, command line or parent process, or the local
+provide the process ID, command line, parent process, or the local
 state in the program at the point that triggered the action. Firewall
 controls to prevent such an action are similarly limited, typically
-to a process name or some global state such as the current user, and
+to process names or some global state such as the current user, and
 in any case rarely provide a useful log file correlated with other
 application messages.
@@ -73,6 +73,10 @@ same name as the module they intend to use - for example, a
 ``random.py`` file that attempts to import the standard library
 ``random`` module.

+This is not sandboxing, as this proposal does not attempt to prevent
+malicious behavior (though it enables some new options to do so).
+See the `Why Not A Sandbox`_ section below for further discussion.
+
 Overview of Changes
 ===================
@@ -84,12 +88,14 @@ We propose two API changes to enable this: an Audit Hook and Verified
 Open Hook. Both are available from Python and native code, allowing
 applications and frameworks written in pure Python code to take
 advantage of the extra messages, while also allowing embedders or
-system administrators to deploy "always-on" builds of Python.
+system administrators to deploy builds of Python where auditing is
+always enabled.

 Only CPython is bound to provide the native APIs as described here.
 Other implementations should provide the pure Python APIs, and
 may provide native versions as appropriate for their underlying
-runtimes.
+runtimes. Auditing events are likewise considered implementation
+specific, but are bound by normal feature compatibility guarantees.

 Audit Hook
 ----------
@@ -132,9 +138,9 @@ When events of interest are occurring, code can either call
 ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
 string argument is the name of the event, and the tuple contains
 arguments. A given event name should have a fixed schema for arguments,
-which should be considered a public API (for a given x.y version
-release), and thus should only change between feature releases with
-updated documentation.
+which should be considered a public API (for each x.y version release),
+and thus should only change between feature releases with updated
+documentation.

 For maximum compatibility, events using the same name as an event in
 the reference interpreter CPython should make every attempt to use
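The event model in the hunk above, a name plus an argument tuple with a fixed schema, can be exercised from pure Python. This is a minimal sketch that assumes CPython 3.8 or later, where ``sys.addaudithook()`` and ``sys.audit()`` shipped. The ``myapp.login`` event and its ``(user, ok)`` schema are hypothetical.

```python
import sys

# Events observed by our hook, in the order they were raised.
seen = []

def record_hook(event, args):
    # Hooks receive the event name and a tuple of the audited arguments.
    # Filter on our own (hypothetical) namespace so the interpreter's
    # built-in events are ignored.
    if event.startswith("myapp."):
        seen.append((event, args))

# Hooks stay installed for the life of the process; there is no removal API.
sys.addaudithook(record_hook)

def login(user, ok):
    # Fixed argument schema for this event: (user: str, ok: bool).
    sys.audit("myapp.login", user, ok)
    return ok

login("alice", True)
print(seen)  # [('myapp.login', ('alice', True))]
```

Keeping the schema fixed means a hook can unpack ``args`` positionally without having to check the interpreter version.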
@@ -152,7 +158,7 @@ log the event, abort the operation with an exception, or to immediately
 terminate the process with an operating system exit call.

 When an event is audited but no hooks have been set, the ``audit()``
-function should include minimal overhead. Ideally, each argument is a
+function should impose minimal overhead. Ideally, each argument is a
 reference to existing data rather than a value calculated just for the
 auditing call.
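A hook that aborts an operation by raising is one of the hook behaviours described in this hunk. The sketch below shows that behaviour and assumes CPython 3.8+. In that version, an exception raised by a hook propagates out of ``sys.audit()`` to the caller. The ``example.delete_all`` event and the ``delete_all()`` function are hypothetical.

```python
import sys

def blocking_hook(event, args):
    # Abort one (hypothetical) event by raising; every other event passes.
    if event == "example.delete_all":
        raise PermissionError(f"{event} blocked for {args[0]!r}")

sys.addaudithook(blocking_hook)

def delete_all(path):
    # Audit *before* doing the work, passing a reference to data that
    # already exists rather than computing anything just for the hook.
    sys.audit("example.delete_all", path)
    return f"deleted everything under {path}"  # not reached when blocked

try:
    delete_all("/srv/data")
except PermissionError as exc:
    print("blocked:", exc)  # blocked: example.delete_all blocked for '/srv/data'
```

The event is raised before the side effect happens. This ordering is what allows a hook to prevent the operation rather than only record it.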
@@ -160,15 +166,14 @@ As hooks may be Python objects, they need to be freed during
 ``Py_Finalize()``. To do this, we add an internal API
 ``_Py_ClearAuditHooks()`` that releases any Python hooks and any
 memory held. This is an internal function with no public export, and
-we recommend it should raise its own audit event for all current hooks
-to ensure that unexpected calls are observed.
+we recommend it raise its own audit event for all current hooks to
+ensure that unexpected calls are observed.

 Below in `Suggested Audit Hook Locations`_, we recommend some important
-operations that should raise audit events. In PEP 551, more audited
-operations are recommended with a view to security transparency.
+operations that should raise audit events.

 Python implementations should document which operations will raise
-audit events, along with the event schema. It is intended that
+audit events, along with the event schema. It is intentional that
 ``sys.addaudithook(print)`` be a trivial way to display all messages.

 Verified Open Hook
@@ -176,11 +181,12 @@ Verified Open Hook
 Most operating systems have a mechanism to distinguish between files
 that can be executed and those that can not. For example, this may be an
-execute bit in the permissions field, or a verified hash of the file
-contents to detect potential code tampering. These are an important
-security mechanism for preventing execution of data or code that is not
-approved for a given environment. Currently, Python has no way to
-integrate with these when launching scripts or importing modules.
+execute bit in the permissions field, a verified hash of the file
+contents to detect potential code tampering, or file system path
+restrictions. These are an important security mechanism for preventing
+execution of data or code that is not approved for a given environment.
+Currently, Python has no way to integrate with these when launching
+scripts or importing modules.

 The new public C API for the verified open hook is::
@@ -201,6 +207,7 @@ The ``importlib.util.open_for_import()`` function is a drop-in
 replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is
 to open a file for raw, binary access. To change the behaviour a new
 handler should be set. Handler functions only accept ``str`` arguments.
+The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding.

 A custom handler may be set by calling ``PyImport_SetOpenForImportHook()``
 from C at any time, including before ``Py_Initialize()``. However, if a
@@ -209,9 +216,7 @@ hook has already been set then the call will fail. When
 the path and its return value will be returned directly. The returned
 object should be an open file-like object that supports reading raw
 bytes. This is explicitly intended to allow a ``BytesIO`` instance if
-the open handler has already had to read the file into memory in order
-to perform whatever verification is necessary to determine whether the
-content is permitted to be executed.
+the open handler has already read the entire file into memory.

 Note that these hooks can import and call the ``_io.open()`` function on
 CPython without triggering themselves. They can also use ``_io.BytesIO``
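The handler contract described in the two hunks above can be sketched in pure Python. Under that contract, a handler takes a ``str`` path and returns a readable file-like object of raw bytes, and it may return a ``BytesIO`` once the whole file has been read. The names ``verified_open`` and ``ALLOWED_SHA256`` are hypothetical. The hash allowlist stands in for whatever verification a real environment performs. In released CPython 3.8, this hook became ``io.open_code()``, whose handler can only be set from C through ``PyFile_SetOpenCodeHook()``. The draft name ``importlib.util.open_for_import()`` used here was not kept.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allowlist of SHA-256 digests approved for execution.
ALLOWED_SHA256 = set()

def verified_open(path):
    """Return the contents of *path* as raw bytes, only if approved."""
    if not isinstance(path, str):
        raise TypeError("handler only accepts str paths")
    with open(path, "rb") as f:
        data = f.read()  # read everything so we verify exactly what will run
    if hashlib.sha256(data).hexdigest() not in ALLOWED_SHA256:
        raise PermissionError(f"{path!r} is not approved for execution")
    # The verified bytes are already in memory; returning a BytesIO avoids
    # reopening the file, which would let it change after the check.
    return io.BytesIO(data)

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "approved.py")
    with open(script, "wb") as f:
        f.write(b"print('hello')\n")
    ALLOWED_SHA256.add(hashlib.sha256(b"print('hello')\n").hexdigest())
    print(verified_open(script).read())  # b"print('hello')\n"
```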
@@ -301,6 +306,11 @@ see which operations provide audit events.
 file. All arguments other than the module name may be ``None`` if
 they are not used or available.
 "
+"``open``", ``open``, "``(path, mode, flags)``", "Detect when a file
+is about to be opened. *path* and *mode* are the usual parameters to
+``open`` if available, while *flags* is provided instead of *mode*
+in some cases.
+"
 ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
 injecting trace functions. Because of the implementation, exceptions
 raised from the hook will abort the operation, but will not be
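A hook that watches for the ``open`` event added to the table above can observe it as follows. This sketch assumes CPython 3.8+, where the built-in ``open()`` raises an ``open`` audit event with three arguments ``(path, mode, flags)``. The exact mode string and flag bits that are reported depend on the API and the platform. The sketch therefore only records them and makes no assumptions about their values. The ``.audit-demo`` suffix is an arbitrary marker used to ignore unrelated opens.

```python
import os
import sys
import tempfile

opened = []  # (path, mode, flags) for files carrying our marker suffix

def open_watcher(event, args):
    # Only look at the "open" event; its arguments are (path, mode, flags).
    if event != "open":
        return
    path, mode, flags = args
    # path may be a str, bytes, a path-like object or an int descriptor.
    if isinstance(path, str) and path.endswith(".audit-demo"):
        opened.append((path, mode, flags))

sys.addaudithook(open_watcher)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "notes.audit-demo")
    with open(target, "w") as f:  # should be reported with a write mode
        f.write("hello")
    with open(target) as f:       # and again for the read
        f.read()

for path, mode, flags in opened:
    print(os.path.basename(path), mode, flags)
```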
@@ -387,10 +397,9 @@ Performance Impact
 The important performance impact is the case where events are being
 raised but there are no hooks attached. This is the unavoidable case -
-once a distributor begins adding audit hooks they have explicitly
-chosen to trade performance for functionality. Performance import
-with hooks added are not of interest here, since this is considered
-opt-in functionality.
+once a developer has added audit hooks they have explicitly chosen to
+trade performance for functionality. Performance impact with hooks added
+are not of interest here, since this is opt-in functionality.

 Analysis using the Python Performance Benchmark Suite [1]_ shows no
 significant impact, with the vast majority of benchmarks showing
@@ -415,13 +424,13 @@ always be present. The nature of these hooks is that they must be
 callable without condition, as any conditional imports or calls provide
 opportunities to intercept and suppress or modify events.

-Given its nature as one of the most core modules, the ``sys`` module is
-somewhat protected against module shadowing attacks. Replacing ``sys``
-with a sufficiently functional module that the application can still run
-is a much more complicated task than replacing a module with only one
+Given it is one of the most core modules, the ``sys`` module is somewhat
+protected against module shadowing attacks. Replacing ``sys`` with a
+sufficiently functional module that the application can still run is a
+much more complicated task than replacing a module with only one
 function of interest. An attacker that has the ability to shadow the
 ``sys`` module is already capable of running arbitrary code from files,
-whereas an ``audit`` module can be replaced with a single line in a
+whereas an ``audit`` module could be replaced with a single line in a
 ``.pth`` file anywhere on the search path::

 import sys; sys.modules['audit'] = type('audit', (object,),
@@ -431,8 +440,8 @@ Multiple layers of protection already exist for monkey patching attacks
 against either ``sys`` or ``audit``, but assignments or insertions to
 ``sys.modules`` are not audited.

-This idea is rejected because it makes substituting ``audit`` calls
-throughout all callers trivial.
+This idea is rejected because it makes it trivial to suppress all calls
+to ``audit``.

 Flag in sys.flags to indicate "audited" mode
 --------------------------------------------
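The weakness described in the two hunks above can be demonstrated directly. Inserting an object into ``sys.modules`` is enough to replace a module that callers import later. This sketch uses ``types.ModuleType`` instead of the ``type(...)`` one-liner quoted in the PEP text. The ``audit`` module and its ``audit`` and ``addhook`` functions represent the hypothetical rejected design and are not a real module.

```python
import sys
import types

# Hypothetical stand-in for the rejected design: a separate ``audit`` module
# that every caller would have to import before raising events.
fake = types.ModuleType("audit")
fake.audit = lambda *args: None    # silently drops every event
fake.addhook = lambda *args: None  # silently ignores new hooks

# One assignment (for example, from a .pth line) is enough, and writing to
# sys.modules does not itself raise an audit event.
sys.modules["audit"] = fake

import audit  # resolved from sys.modules, so no file lookup takes place

audit.audit("example.event", 1, 2)  # returns None; nothing is recorded
print(audit is fake)  # True
```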
@@ -465,6 +474,34 @@ This idea is rejected because there are no appropriate reasons for an
 application to change its behaviour based on whether these APIs are in
 use.

+Why Not A Sandbox
+=================
+
+Sandboxing CPython has been attempted many times in the past, and each
+past attempt has failed. Fundamentally, the problem is that certain
+functionality has to be restricted when executing the sandboxed code,
+but otherwise needs to be available for normal operation of Python. For
+example, completely removing the ability to compile strings into
+bytecode also breaks the ability to import modules from source code, and
+if it is not completely removed then there are too many ways to get
+access to that functionality indirectly. There is not yet any feasible
+way to generically determine whether a given operation is "safe" or not.
+Further information and references available at [2]_.
+
+This proposal does not attempt to restrict functionality, but simply
+exposes the fact that the functionality is being used. Particularly for
+intrusion scenarios, detection is significantly more important than
+early prevention (as early prevention will generally drive attackers to
+use an alternate, less-detectable, approach). The availability of audit
+hooks alone does not change the attack surface of Python in any way, but
+they enable defenders to integrate Python into their environment in ways
+that are currently not possible.
+
+Since audit hooks have the ability to safely prevent an operation
+occuring, this feature does enable the ability to provide some level of
+sandboxing. In most cases, however, the intention is to enable logging
+rather than creating a sandbox.
+
 Relationship to PEP 551
 =======================
@@ -483,10 +520,12 @@ References
 .. [1] Python Performance Benchmark Suite `<https://github.com/python/performance>`_

+.. [2] Python Security model - Sandbox `<https://python-security.readthedocs.io/security.html#sandbox>`_
+
 Copyright
 =========

-Copyright (c) 2018 by Microsoft Corporation. This material may be
+Copyright (c) 2019 by Microsoft Corporation. This material may be
 distributed only subject to the terms and conditions set forth in the
 Open Publication License, v1.0 or later (the latest version is presently
 available at http://www.opencontent.org/openpub/).