220 lines
8.8 KiB
Plaintext
220 lines
8.8 KiB
Plaintext
PEP: 330
|
||
Title: Python Bytecode Verification
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Michel Pelletier <michel@users.sourceforge.net>
|
||
Status: Rejected
|
||
Type: Standards Track
|
||
Content-Type: text/plain
|
||
Created: 17-Jun-2004
|
||
Python-Version: 2.6?
|
||
Post-History:
|
||
|
||
|
||
Abstract
|
||
|
||
If Python Virtual Machine (PVM) bytecode is not "well-formed" it
|
||
is possible to crash or exploit the PVM by causing various errors
|
||
such as under/overflowing the value stack or reading/writing into
|
||
arbitrary areas of the PVM program space. Most of these kinds of
|
||
errors can be eliminated by verifying that PVM bytecode does not
|
||
violate a set of simple constraints before execution.
|
||
|
||
This PEP proposes a set of constraints on the format and structure
|
||
of Python Virtual Machine (PVM) bytecode and provides an
|
||
implementation in Python of this verification process.
|
||
|
||
Pronouncement
|
||
|
||
Guido believes that a verification tool has some value. If
|
||
someone wants to add it to Tools/scripts, no PEP is required.
|
||
|
||
Such a tool may have value for validating the output from
|
||
"bytecodehacks" or from direct edits of PYC files. As security
|
||
measure, its value is somewhat limited because perfectly valid
|
||
bytecode can still do horrible things. That situation could
|
||
change if the concept of restricted execution were to be
|
||
successfully resurrected.
|
||
|
||
Motivation
|
||
|
||
The Python Virtual Machine executes Python programs that have been
|
||
compiled from the Python language into a bytecode representation.
|
||
The PVM assumes that any bytecode being executed is "well-formed"
|
||
with regard to a number implicit constraints. Some of these
|
||
constraints are checked at run-time, but most of them are not due
|
||
to the overhead they would create.
|
||
|
||
When running in debug mode the PVM does do several run-time checks
|
||
to ensure that any particular bytecode cannot violate these
|
||
constraints that, to a degree, prevent bytecode from crashing or
|
||
exploiting the interpreter. These checks add a measurable
|
||
overhead to the interpreter, and are typically turned off in
|
||
common use.
|
||
|
||
Bytecode that is not well-formed and executed by a PVM not running
|
||
in debug mode may create a variety of fatal and non-fatal errors.
|
||
Typically, ill-formed code will cause the PVM to seg-fault and
|
||
cause the OS to immediately and abruptly terminate the
|
||
interpreter.
|
||
|
||
Conceivably, ill-formed bytecode could exploit the interpreter and
|
||
allow Python bytecode to execute arbitrary C-level machine
|
||
instructions or to modify private, internal data structures in the
|
||
interpreter. If used cleverly this could subvert any form of
|
||
security policy an application may want to apply to it's objects.
|
||
|
||
Practically, it would be difficult for a malicious user to
|
||
"inject" invalid bytecode into a PVM for the purposes of
|
||
exploitation, but not impossible. Buffer overflow and memory
|
||
overwrite attacks are commonly understood, particularly when the
|
||
exploit payload is transmitted unencrypted over a network or when
|
||
a file or network security permission weakness is used as a
|
||
foothold for further attacks.
|
||
|
||
Ideally, no bytecode should ever be allowed to read or write
|
||
underlying C-level data structures to subvert the operation of the
|
||
PVM, whether the bytecode was maliciously crafted or not. A
|
||
simple pre-execution verification step could ensure that bytecode
|
||
cannot over/underflow the value stack or access other sensitive
|
||
areas of PVM program space at run-time.
|
||
|
||
This PEP proposes several validation steps that should be taken on
|
||
Python bytecode before it is executed by the PVM so that it
|
||
compiles with static and structure constraints on its instructions
|
||
and their operands. These steps are simple and catch a large
|
||
class of invalid bytecode that can cause crashes. There is also
|
||
some possibility that some run-time checks can be eliminated up
|
||
front by a verification pass.
|
||
|
||
There is, of course, no way to verify that bytecode is "completely
|
||
safe", for every definition of complete and safe. Even with
|
||
bytecode verification, Python programs can and most likely in the
|
||
future will seg-fault for a variety of reasons and continue to
|
||
cause many different classes of run-time errors, fatal or not.
|
||
The verification step proposed here simply plugs an easy hole that
|
||
can cause a large class of fatal and subtle errors at the bytecode
|
||
level.
|
||
|
||
Currently, the Java Virtual Machine (JVM) verifies Java bytecode
|
||
in a way very similar to what is proposed here. The JVM
|
||
Specification version 2 [1], Sections 4.8 and 4.9 were therefore
|
||
used as a basis for some of the constraints explained below. Any
|
||
Python bytecode verification implementation at a minimum must
|
||
enforce these constraints, but may not be limited to them.
|
||
|
||
|
||
Static Constraints on Bytecode Instructions
|
||
|
||
1. The bytecode string must not be empty. (len(co_code) > 0).
|
||
|
||
2. The bytecode string cannot exceed a maximum size
|
||
(len(co_code) < sizeof(unsigned char) - 1).
|
||
|
||
3. The first instruction in the bytecode string begins at index 0.
|
||
|
||
4. Only valid byte-codes with the correct number of operands can
|
||
be in the bytecode string.
|
||
|
||
|
||
Static Constraints on Bytecode Instruction Operands
|
||
|
||
1. The target of a jump instruction must be within the code
|
||
boundaries and must fall on an instruction, never between an
|
||
instruction and its operands.
|
||
|
||
2. The operand of a LOAD_* instruction must be an valid index into
|
||
its corresponding data structure.
|
||
|
||
3. The operand of a STORE_* instruction must be an valid index
|
||
into its corresponding data structure.
|
||
|
||
|
||
Structural Constraints between Bytecode Instructions
|
||
|
||
1. Each instruction must only be executed with the appropriate
|
||
number of arguments in the value stack, regardless of the
|
||
execution path that leads to its invocation.
|
||
|
||
2. If an instruction can be executed along several different
|
||
execution paths, the value stack must have the same depth prior
|
||
to the execution of the instruction, regardless of the path
|
||
taken.
|
||
|
||
3. At no point during execution can the value stack grow to a
|
||
depth greater than that implied by co_stacksize.
|
||
|
||
4. Execution never falls off the bottom of co_code.
|
||
|
||
|
||
Implementation
|
||
|
||
This PEP is the working document for an Python bytecode
|
||
verification implementation written in Python. This
|
||
implementation is not used implicitly by the PVM before executing
|
||
any bytecode, but is to be used explicitly by users concerned
|
||
about possibly invalid bytecode with the following snippet:
|
||
|
||
import verify
|
||
verify.verify(object)
|
||
|
||
The `verify` module provides a `verify` function which accepts the
|
||
same kind of arguments as `dis.dis`: classes, methods, functions,
|
||
or code objects. It verifies that the object's bytecode is
|
||
well-formed according to the specifications of this PEP.
|
||
|
||
If the code is well-formed the call to `verify` returns silently
|
||
without error. If an error is encountered, it throws a
|
||
'VerificationError' whose argument indicates the cause of the
|
||
failure. It is up to the programmer whether or not to handle the
|
||
error in some way or execute the invalid code regardless.
|
||
|
||
Phillip Eby has proposed a pseudo-code algorithm for bytecode
|
||
stack depth verification used by the reference implementation.
|
||
|
||
|
||
Verification Issues
|
||
|
||
This PEP describes only a small number of verifications. While
|
||
discussion and analysis will lead to many more, it is highly
|
||
possible that future verification may need to be done or custom,
|
||
project-specific verifications. For this reason, it might be
|
||
desirable to add a verification registration interface to the test
|
||
implementation to register future verifiers. The need for this is
|
||
minimal since custom verifiers can subclass and extend the current
|
||
implementation for added behavior.
|
||
|
||
|
||
Required Changes
|
||
|
||
Armin Rigo noted that several byte-codes will need modification in
|
||
order for their stack effect to be statically analyzed. These are
|
||
END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have
|
||
already agreed on how to correct the instructions. Currently the
|
||
Python implementation punts on these instructions.
|
||
|
||
This PEP does not propose to add the verification step to the
|
||
interpreter, but only to provide the Python implementation in the
|
||
standard library for optional use. Whether or not this
|
||
verification procedure is translated into C, included with the PVM
|
||
or enforced in any way is left for future discussion.
|
||
|
||
|
||
References
|
||
|
||
[1] The Java Virtual Machine Specification 2nd Edition
|
||
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
|
||
|
||
|
||
Copyright
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
End:
|