added PEP 330, Python Bytecode Verification, by Michel Pelletier
This commit is contained in:
parent
d8aca009a6
commit
068a7f290b
|
@ -122,6 +122,7 @@ Index by Category
|
|||
S 324 popen5 - New POSIX process module Astrand
|
||||
S 325 Resource-Release Support for Generators Pedroni
|
||||
S 327 Decimal Data Type Batista
|
||||
S 330 Python Bytecode Verification Pelletier
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
|
||||
Finished PEPs (done, implemented in CVS)
|
||||
|
@ -351,6 +352,7 @@ Numerical Index
|
|||
S 327 Decimal Data Type Batista
|
||||
SA 328 Imports: Multi-Line and Absolute/Relative Aahz
|
||||
SR 329 Treating Builtins as Constants in the Standard Library Hettinger
|
||||
S 330 Python Bytecode Verification Pelletier
|
||||
SR 666 Reject Foolish Indentation Creighton
|
||||
S 754 IEEE 754 Floating Point Special Values Warnes
|
||||
|
||||
|
|
|
@ -0,0 +1,208 @@
|
|||
PEP: 330
|
||||
Title: Python Bytecode Verification
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Michel Pelletier <michel@users.sourceforge.net>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/plain
|
||||
Created: 17-Jun-2004
|
||||
Python-Version: 2.6?
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
|
||||
If Python Virtual Machine (PVM) bytecode is not "well-formed" it
|
||||
is possible to crash or exploit the PVM by causing various errors
|
||||
such as under/overflowing the value stack or reading/writing into
|
||||
arbitrary areas of the PVM program space. Most of these kinds of
|
||||
errors can be eliminated by verifying that PVM bytecode does not
|
||||
violate a set of simple constraints before execution.
|
||||
|
||||
This PEP proposes a set of constraints on the format and structure
|
||||
of Python Virtual Machine (PVM) bytecode and provides an
|
||||
implementation in Python of this verification process.
|
||||
|
||||
|
||||
Motivation
|
||||
|
||||
The Python Virtual Machine executes Python programs that have been
|
||||
compiled from the Python language into a bytecode representation.
|
||||
The PVM assumes that any bytecode being executed is "well-formed"
|
||||
with regard to a number implicit constraints. Some of these
|
||||
constraints are checked at run-time, but most of them are not due
|
||||
to the overhead they would create.
|
||||
|
||||
When running in debug mode the PVM does do several run-time checks
|
||||
to ensure that any particular bytecode cannot violate these
|
||||
constraints that, to a degree, prevent bytecode from crashing or
|
||||
exploiting the interpreter. These checks add a measurable
|
||||
overhead to the interpreter, and are typically turned off in
|
||||
common use.
|
||||
|
||||
Bytecode that is not well-formed and executed by a PVM not running
|
||||
in debug mode may create a variety of fatal and non-fatal errors.
|
||||
Typically, ill-formed code will cause the PVM to seg-fault and
|
||||
cause the OS to immediately and abruptly terminate the
|
||||
interpreter.
|
||||
|
||||
Conceivably, ill-formed bytecode could exploit the interpreter and
|
||||
allow Python bytecode to execute arbitrary C-level machine
|
||||
instructions or to modify private, internal data structures in the
|
||||
interpreter. If used cleverly this could subvert any form of
|
||||
security policy an application may want to apply to it's objects.
|
||||
|
||||
Practically, it would be difficult for a malicious user to
|
||||
"inject" invalid bytecode into a PVM for the purposes of
|
||||
exploitation, but not impossible. Buffer overflow and memory
|
||||
overwrite attacks are commonly understood, particularly when the
|
||||
exploit payload is transmitted unencrypted over a network or when
|
||||
a file or network security permission weakness is used as a
|
||||
foothold for further attacks.
|
||||
|
||||
Ideally, no bytecode should ever be allowed to read or write
|
||||
underlying C-level data structures to subvert the operation of the
|
||||
PVM, whether the bytecode was maliciously crafted or not. A
|
||||
simple pre-execution verification step could ensure that bytecode
|
||||
cannot over/underflow the value stack or access other sensitive
|
||||
areas of PVM program space at run-time.
|
||||
|
||||
This PEP proposes several validation steps that should be taken on
|
||||
Python bytecode before it is executed by the PVM so that it
|
||||
compiles with static and structure constraints on its instructions
|
||||
and their operands. These steps are simple and catch a large
|
||||
class of invalid bytecode that can cause crashes. There is also
|
||||
some possibility that some run-time checks can be eliminated up
|
||||
front by a verification pass.
|
||||
|
||||
There is, of course, no way to verify that bytecode is "completely
|
||||
safe", for every definition of complete and safe. Even with
|
||||
bytecode verification, Python programs can and most likely in the
|
||||
future will seg-fault for a variety of reasons and continue to
|
||||
cause many different classes of run-time errors, fatal or not.
|
||||
The verification step proposed here simply plugs an easy hole that
|
||||
can cause a large class of fatal and subtle errors at the bytecode
|
||||
level.
|
||||
|
||||
Currently, the Java Virtual Machine (JVM) verifies Java bytecode
|
||||
in a way very similar to what is proposed here. The JVM
|
||||
Specification version 2 [1], Sections 4.8 and 4.9 were therefore
|
||||
used as a basis for some of the constraints explained below. Any
|
||||
Python bytecode verification implementation at a minimum must
|
||||
enforce these constraints, but may not be limited to them.
|
||||
|
||||
|
||||
Static Constraints on Bytecode Instructions
|
||||
|
||||
1. The bytecode string must not be empty. (len(co_code) > 0).
|
||||
|
||||
2. The bytecode string cannot exceed a maximum size
|
||||
(len(co_code) < sizeof(unsigned char) - 1).
|
||||
|
||||
3. The first instruction in the bytecode string begins at index 0.
|
||||
|
||||
4. Only valid byte-codes with the correct number of operands can
|
||||
be in the bytecode string.
|
||||
|
||||
|
||||
Static Constraints on Bytecode Instruction Operands
|
||||
|
||||
1. The target of a jump instruction must be within the code
|
||||
boundaries and must fall on an instruction, never between an
|
||||
instruction and its operands.
|
||||
|
||||
2. The operand of a LOAD_* instruction must be an valid index into
|
||||
its corresponding data structure.
|
||||
|
||||
3. The operand of a STORE_* instruction must be an valid index
|
||||
into its corresponding data structure.
|
||||
|
||||
|
||||
Structural Constraints between Bytecode Instructions
|
||||
|
||||
1. Each instruction must only be executed with the appropriate
|
||||
number of arguments in the value stack, regardless of the
|
||||
execution path that leads to its invocation.
|
||||
|
||||
2. If an instruction can be executed along several different
|
||||
execution paths, the value stack must have the same depth prior
|
||||
to the execution of the instruction, regardless of the path
|
||||
taken.
|
||||
|
||||
3. At no point during execution can the value stack grow to a
|
||||
depth greater than that implied by co_stacksize.
|
||||
|
||||
4. Execution never falls off the bottom of co_code.
|
||||
|
||||
|
||||
Implementation
|
||||
|
||||
This PEP is the working document for an Python bytecode
|
||||
verification implementation written in Python. This
|
||||
implementation is not used implicitly by the PVM before executing
|
||||
any bytecode, but is to be used explicitly by users concerned
|
||||
about possibly invalid bytecode with the following snippet:
|
||||
|
||||
import verify
|
||||
verify.verify(object)
|
||||
|
||||
The `verify` module provides a `verify` function which accepts the
|
||||
same kind of arguments as `dis.dis`: classes, methods, functions,
|
||||
or code objects. It verifies that the object's bytecode is
|
||||
well-formed according to the specifications of this PEP.
|
||||
|
||||
If the code is well-formed the call to `verify` returns silently
|
||||
without error. If an error is encountered, it throws a
|
||||
'VerificationError' whose argument indicates the cause of the
|
||||
failure. It is up to the programmer whether or not to handle the
|
||||
error in some way or execute the invalid code regardless.
|
||||
|
||||
Phillip Eby has proposed a pseudo-code algorithm for bytecode
|
||||
stack depth verification used by the reference implementation.
|
||||
|
||||
|
||||
Verification Issues
|
||||
|
||||
This PEP describes only a small number of verifications. While
|
||||
discussion and analysis will lead to many more, it is highly
|
||||
possible that future verification may need to be done or custom,
|
||||
project-specific verifications. For this reason, it might be
|
||||
desirable to add a verification registration interface to the test
|
||||
implementation to register future verifiers. The need for this is
|
||||
minimal since custom verifiers can subclass and extend the current
|
||||
implementation for added behavior.
|
||||
|
||||
|
||||
Required Changes
|
||||
|
||||
Armin Rigo noted that several byte-codes will need modification in
|
||||
order for their stack effect to be statically analyzed. These are
|
||||
END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have
|
||||
already agreed on how to correct the instructions. Currently the
|
||||
Python implementation punts on these instructions.
|
||||
|
||||
This PEP does not propose to add the verification step to the
|
||||
interpreter, but only to provide the Python implementation in the
|
||||
standard library for optional use. Whether or not this
|
||||
verification procedure is translated into C, included with the PVM
|
||||
or enforced in any way is left for future discussion.
|
||||
|
||||
|
||||
References
|
||||
|
||||
[1] The Java Virtual Machine Specification 2nd Edition
|
||||
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
|
||||
|
||||
|
||||
Copyright
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
End:
|
Loading…
Reference in New Issue