234 lines
8.6 KiB
ReStructuredText
234 lines
8.6 KiB
ReStructuredText
PEP: 330
|
|
Title: Python Bytecode Verification
|
|
Version: $Revision$
|
|
Last-Modified: $Date$
|
|
Author: Michel Pelletier <michel@users.sourceforge.net>
|
|
Status: Rejected
|
|
Type: Standards Track
|
|
Content-Type: text/x-rst
|
|
Created: 17-Jun-2004
|
|
Python-Version: 2.6
|
|
Post-History:
|
|
|
|
|
|
Abstract
|
|
========
|
|
|
|
If Python Virtual Machine (PVM) bytecode is not "well-formed" it
|
|
is possible to crash or exploit the PVM by causing various errors
|
|
such as under/overflowing the value stack or reading/writing into
|
|
arbitrary areas of the PVM program space. Most of these kinds of
|
|
errors can be eliminated by verifying that PVM bytecode does not
|
|
violate a set of simple constraints before execution.
|
|
|
|
This PEP proposes a set of constraints on the format and structure
|
|
of Python Virtual Machine (PVM) bytecode and provides an
|
|
implementation in Python of this verification process.
|
|
|
|
|
|
Pronouncement
|
|
=============
|
|
|
|
Guido believes that a verification tool has some value. If
|
|
someone wants to add it to ``Tools/scripts``, no PEP is required.
|
|
|
|
Such a tool may have value for validating the output from
|
|
"bytecodehacks" or from direct edits of PYC files. As security
|
|
measure, its value is somewhat limited because perfectly valid
|
|
bytecode can still do horrible things. That situation could
|
|
change if the concept of restricted execution were to be
|
|
successfully resurrected.
|
|
|
|
|
|
Motivation
|
|
==========
|
|
|
|
The Python Virtual Machine executes Python programs that have been
|
|
compiled from the Python language into a bytecode representation.
|
|
The PVM assumes that any bytecode being executed is "well-formed"
|
|
with regard to a number implicit constraints. Some of these
|
|
constraints are checked at run-time, but most of them are not due
|
|
to the overhead they would create.
|
|
|
|
When running in debug mode the PVM does do several run-time checks
|
|
to ensure that any particular bytecode cannot violate these
|
|
constraints that, to a degree, prevent bytecode from crashing or
|
|
exploiting the interpreter. These checks add a measurable
|
|
overhead to the interpreter, and are typically turned off in
|
|
common use.
|
|
|
|
Bytecode that is not well-formed and executed by a PVM not running
|
|
in debug mode may create a variety of fatal and non-fatal errors.
|
|
Typically, ill-formed code will cause the PVM to seg-fault and
|
|
cause the OS to immediately and abruptly terminate the
|
|
interpreter.
|
|
|
|
Conceivably, ill-formed bytecode could exploit the interpreter and
|
|
allow Python bytecode to execute arbitrary C-level machine
|
|
instructions or to modify private, internal data structures in the
|
|
interpreter. If used cleverly this could subvert any form of
|
|
security policy an application may want to apply to its objects.
|
|
|
|
Practically, it would be difficult for a malicious user to
|
|
"inject" invalid bytecode into a PVM for the purposes of
|
|
exploitation, but not impossible. Buffer overflow and memory
|
|
overwrite attacks are commonly understood, particularly when the
|
|
exploit payload is transmitted unencrypted over a network or when
|
|
a file or network security permission weakness is used as a
|
|
foothold for further attacks.
|
|
|
|
Ideally, no bytecode should ever be allowed to read or write
|
|
underlying C-level data structures to subvert the operation of the
|
|
PVM, whether the bytecode was maliciously crafted or not. A
|
|
simple pre-execution verification step could ensure that bytecode
|
|
cannot over/underflow the value stack or access other sensitive
|
|
areas of PVM program space at run-time.
|
|
|
|
This PEP proposes several validation steps that should be taken on
|
|
Python bytecode before it is executed by the PVM so that it
|
|
compiles with static and structure constraints on its instructions
|
|
and their operands. These steps are simple and catch a large
|
|
class of invalid bytecode that can cause crashes. There is also
|
|
some possibility that some run-time checks can be eliminated up
|
|
front by a verification pass.
|
|
|
|
There is, of course, no way to verify that bytecode is "completely
|
|
safe", for every definition of complete and safe. Even with
|
|
bytecode verification, Python programs can and most likely in the
|
|
future will seg-fault for a variety of reasons and continue to
|
|
cause many different classes of run-time errors, fatal or not.
|
|
The verification step proposed here simply plugs an easy hole that
|
|
can cause a large class of fatal and subtle errors at the bytecode
|
|
level.
|
|
|
|
Currently, the Java Virtual Machine (JVM) verifies Java bytecode
|
|
in a way very similar to what is proposed here. The JVM
|
|
Specification version 2 [1]_, Sections 4.8 and 4.9 were therefore
|
|
used as a basis for some of the constraints explained below. Any
|
|
Python bytecode verification implementation at a minimum must
|
|
enforce these constraints, but may not be limited to them.
|
|
|
|
|
|
Static Constraints on Bytecode Instructions
|
|
===========================================
|
|
|
|
1. The bytecode string must not be empty. (``len(co_code) > 0``).
|
|
|
|
2. The bytecode string cannot exceed a maximum size
|
|
(``len(co_code) < sizeof(unsigned char) - 1``).
|
|
|
|
3. The first instruction in the bytecode string begins at index 0.
|
|
|
|
4. Only valid byte-codes with the correct number of operands can
|
|
be in the bytecode string.
|
|
|
|
|
|
Static Constraints on Bytecode Instruction Operands
|
|
===================================================
|
|
|
|
1. The target of a jump instruction must be within the code
|
|
boundaries and must fall on an instruction, never between an
|
|
instruction and its operands.
|
|
|
|
2. The operand of a ``LOAD_*`` instruction must be a valid index into
|
|
its corresponding data structure.
|
|
|
|
3. The operand of a ``STORE_*`` instruction must be a valid index
|
|
into its corresponding data structure.
|
|
|
|
|
|
Structural Constraints between Bytecode Instructions
|
|
====================================================
|
|
|
|
1. Each instruction must only be executed with the appropriate
|
|
number of arguments in the value stack, regardless of the
|
|
execution path that leads to its invocation.
|
|
|
|
2. If an instruction can be executed along several different
|
|
execution paths, the value stack must have the same depth prior
|
|
to the execution of the instruction, regardless of the path
|
|
taken.
|
|
|
|
3. At no point during execution can the value stack grow to a
|
|
depth greater than that implied by ``co_stacksize``.
|
|
|
|
4. Execution never falls off the bottom of ``co_code``.
|
|
|
|
|
|
Implementation
|
|
==============
|
|
|
|
This PEP is the working document for a Python bytecode
|
|
verification implementation written in Python. This
|
|
implementation is not used implicitly by the PVM before executing
|
|
any bytecode, but is to be used explicitly by users concerned
|
|
about possibly invalid bytecode with the following snippet::
|
|
|
|
import verify
|
|
verify.verify(object)
|
|
|
|
The ``verify`` module provides a ``verify`` function which accepts the
|
|
same kind of arguments as ``dis.dis``: classes, methods, functions,
|
|
or code objects. It verifies that the object's bytecode is
|
|
well-formed according to the specifications of this PEP.
|
|
|
|
If the code is well-formed the call to ``verify`` returns silently
|
|
without error. If an error is encountered, it throws a
|
|
``VerificationError`` whose argument indicates the cause of the
|
|
failure. It is up to the programmer whether or not to handle the
|
|
error in some way or execute the invalid code regardless.
|
|
|
|
Phillip Eby has proposed a pseudo-code algorithm for bytecode
|
|
stack depth verification used by the reference implementation.
|
|
|
|
|
|
Verification Issues
|
|
===================
|
|
|
|
This PEP describes only a small number of verifications. While
|
|
discussion and analysis will lead to many more, it is highly
|
|
possible that future verification may need to be done or custom,
|
|
project-specific verifications. For this reason, it might be
|
|
desirable to add a verification registration interface to the test
|
|
implementation to register future verifiers. The need for this is
|
|
minimal since custom verifiers can subclass and extend the current
|
|
implementation for added behavior.
|
|
|
|
|
|
Required Changes
|
|
================
|
|
|
|
Armin Rigo noted that several byte-codes will need modification in
|
|
order for their stack effect to be statically analyzed. These are
|
|
``END_FINALLY``, ``POP_BLOCK``, and ``MAKE_CLOSURE``. Armin and Guido have
|
|
already agreed on how to correct the instructions. Currently the
|
|
Python implementation punts on these instructions.
|
|
|
|
This PEP does not propose to add the verification step to the
|
|
interpreter, but only to provide the Python implementation in the
|
|
standard library for optional use. Whether or not this
|
|
verification procedure is translated into C, included with the PVM
|
|
or enforced in any way is left for future discussion.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
.. [1] The Java Virtual Machine Specification 2nd Edition
|
|
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
|
|
|
|
|
|
Copyright
|
|
=========
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
..
|
|
Local Variables:
|
|
mode: indented-text
|
|
indent-tabs-mode: nil
|
|
sentence-end-double-space: t
|
|
fill-column: 70
|
|
End:
|