Convert 10 text PEPs to reST format (#196)

PEP 208

PEP 272:
- Fix example traceback from File "<stdin>", line 1, in ?
  to File "<stdin>", line 1, in <module>

PEP 273:
- replace http://python.org/sf/* URLs
   with https://bugs.python.org/issue*

PEP 292

PEP 320:
- move URLs into the References section
- replace http://python.org/sf/* URLs
   with https://bugs.python.org/issue*

PEP 330

PEP 331:
- fix PSF contributor agreement URL
- replace http://python.org/sf/* URLs
   with https://bugs.python.org/issue*

PEP 357

PEP 3130

PEP 3140
Mariatta, 2017-02-02 09:58:49 -08:00 (committed by GitHub)
parent 9b58a292ea
commit 96d2e59aa4
10 changed files with 1607 additions and 1386 deletions


@@ -5,229 +5,242 @@ Last-Modified: $Date$
Author: nas@arctrix.com (Neil Schemenauer), mal@lemburg.com (Marc-André Lemburg)
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 04-Dec-2000
Python-Version: 2.1
Post-History:
Abstract
========
Many Python types implement numeric operations. When the arguments of
a numeric operation are of different types, the interpreter tries to
coerce the arguments into a common type. The numeric operation is
then performed using this common type. This PEP proposes a new type
flag to indicate that arguments to a type's numeric operations should
not be coerced. Operations that do not support the supplied types
indicate it by returning a new singleton object. Types which do not
set the type flag are handled in a backwards compatible manner.
Allowing operations to handle different types is often simpler, more
flexible, and faster than having the interpreter do coercion.
Rationale
=========
When implementing numeric or other related operations, it is often
desirable to provide not only operations between operands of one type
only, e.g. integer + integer, but to generalize the idea behind the
operation to other type combinations as well, e.g. integer + float.
A common approach to this mixed type situation is to provide a method
of "lifting" the operands to a common type (coercion) and then use
that type's operand method as execution mechanism. Yet, this strategy
has a few drawbacks:
* the "lifting" process creates at least one new (temporary)
operand object,
* since the coercion method is not being told about the operation
that is to follow, it is not possible to implement operation
specific coercion of types,
* there is no elegant way to solve situations where a common type
is not at hand, and
* the coercion method will always have to be called prior to the
operation's method itself.
A fix for this situation is obviously needed, since these drawbacks
make implementations of types needing these features very cumbersome,
if not impossible. As an example, have a look at the ``DateTime`` and
``DateTimeDelta`` [1]_ types, the first being absolute, the second
relative. You can always add a relative value to an absolute one,
giving a new absolute value. Yet, there is no common type which the
existing coercion mechanism could use to implement that operation.
Currently, ``PyInstance`` types are treated specially by the interpreter
in that their numeric methods are passed arguments of different types.
Removing this special case simplifies the interpreter and allows other
types to implement numeric methods that behave like instance types.
This is especially useful for extension types like ExtensionClass.
Specification
=============
Instead of using a central coercion method, the process of handling
different operand types is simply left to the operation. If the
operation finds that it cannot handle the given operand type
combination, it may return a special singleton as indicator.
Note that "numbers" (anything that implements the number protocol, or
part of it) written in Python already use the first part of this
strategy - it is the C level API that we focus on here.
To maintain nearly 100% backward compatibility we have to be very
careful to make numbers that don't know anything about the new
strategy (old style numbers) work just as well as those that expect
the new scheme (new style numbers). Furthermore, binary compatibility
is a must, meaning that the interpreter may only access and use new
style operations if the number indicates the availability of these.
A new style number is considered by the interpreter as such if and
only if it sets the type flag ``Py_TPFLAGS_CHECKTYPES``. The main
difference between an old style number and a new style one is that the
numeric slot functions can no longer assume to be passed arguments of
identical type. New style slots must check all arguments for proper
type and implement the necessary conversions themselves. This may seem
to cause more work on the behalf of the type implementor, but is in
fact no more difficult than writing the same kind of routines for an
old style coercion slot.
If a new style slot finds that it cannot handle the passed argument
type combination, it may return a new reference of the special
singleton ``Py_NotImplemented`` to the caller. This will cause the caller
to try the other operand's operation slots until it finds a slot that
does implement the operation for the specific type combination. If
none of the possible slots succeed, it raises a ``TypeError``.
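A rough Python-level sketch of the same dispatch may help: the
``NotImplemented`` singleton plays the role of ``Py_NotImplemented``, and
an operator method that cannot handle its arguments defers to the other
operand's slot (the ``Seconds`` and ``Minutes`` types below are invented
purely for illustration)::

    class Seconds:
        def __init__(self, value):
            self.value = value

        def __add__(self, other):
            if isinstance(other, Seconds):
                return Seconds(self.value + other.value)
            return NotImplemented      # let the other operand try

    class Minutes:
        def __init__(self, value):
            self.value = value

        def __radd__(self, other):
            if isinstance(other, Seconds):
                return Seconds(other.value + 60 * self.value)
            return NotImplemented

    # Seconds.__add__ defers, so Minutes.__radd__ handles the operation:
    print((Seconds(30) + Minutes(2)).value)   # 150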
To make the implementation easy to understand (the whole topic is
esoteric enough), a new layer in the handling of numeric operations is
introduced. This layer takes care of all the different cases that need
to be taken into account when dealing with all the possible
combinations of old and new style numbers. It is implemented by the
two static functions ``binary_op()`` and ``ternary_op()``, which are both
internal functions that only the functions in Objects/abstract.c
have access to. The numeric API (``PyNumber_*``) is easy to adapt to
this new layer.
As a side-effect all numeric slots can be NULL-checked (this has to be
done anyway, so the added feature comes at no extra cost).
The scheme used by the layer to execute a binary operation is as
follows:
=== === =================================
v w Action taken
=== === =================================
new new v.op(v,w), w.op(v,w)
new old v.op(v,w), coerce(v,w), v.op(v,w)
old new w.op(v,w), coerce(v,w), v.op(v,w)
old old coerce(v,w), v.op(v,w)
=== === =================================
The indicated action sequence is executed from left to right until
either the operation succeeds and a valid result (!=
``Py_NotImplemented``) is returned or an exception is raised. Exceptions
are returned to the calling function as-is. If a slot returns
``Py_NotImplemented``, the next item in the sequence is executed.
Note that coerce(v,w) will use the old style ``nb_coerce`` slot methods
via a call to ``PyNumber_Coerce()``.
Ternary operations have a few more cases to handle:
=== === === ====================================================
v w z Action taken
=== === === ====================================================
new new new v.op(v,w,z), w.op(v,w,z), z.op(v,w,z)
new old new v.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
old new new w.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
old old new z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
new new old v.op(v,w,z), w.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
new old old v.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
old new old w.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
old old old coerce(v,w,z), v.op(v,w,z)
=== === === ====================================================
The same notes as above, except that coerce(v,w,z) actually does::
    if z != Py_None:
        coerce(v,w), coerce(v,z), coerce(w,z)
    else:
        # treat z as absent variable
        coerce(v,w)
The current implementation uses this scheme already (there's only one
ternary slot: ``nb_pow(a,b,c)``).
Note that the numeric protocol is also used for some other related
tasks, e.g. sequence concatenation. These can also benefit from the
new mechanism by implementing right-hand operations for type
combinations that would otherwise fail to work. As an example, take
string concatenation: currently you can only do string + string. With
the new mechanism, a new string-like type could implement new_type +
string and string + new_type, even though strings don't know anything
about new_type.
Since comparisons also rely on coercion (every time you compare an
integer to a float, the integer is first converted to float and then
compared...), a new slot to handle numeric comparisons is needed::
PyObject *nb_cmp(PyObject *v, PyObject *w)
This slot should compare the two objects and return an integer object
stating the result. Currently, this result integer may only be -1, 0, 1.
If the slot cannot handle the type combination, it may return a
reference to ``Py_NotImplemented``. [XXX Note that this slot is still
in flux since it should take into account rich comparisons
(i.e. PEP 207).]
Numeric comparisons are handled by a new numeric protocol API::
PyObject *PyNumber_Compare(PyObject *v, PyObject *w)
This function compares the two objects as "numbers" and returns an
integer object stating the result. Currently, this result integer may
only be -1, 0, 1. In case the operation cannot be handled by the given
objects, a ``TypeError`` is raised.
The ``PyObject_Compare()`` API needs to be adjusted accordingly to make use
of this new API.
Other changes include adapting some of the built-in functions (e.g.
``cmp()``) to use this API as well. Also, ``PyNumber_CoerceEx()`` will need to
check for new style numbers before calling the ``nb_coerce`` slot. New
style numbers don't provide a coercion slot and thus cannot be
explicitly coerced.
Reference Implementation
========================
A preliminary patch for the CVS version of Python is available through
the Source Forge patch manager [2]_.
Credits
=======
This PEP and the patch are heavily based on work done by Marc-André
Lemburg [3]_.
Copyright
=========
This document has been placed in the public domain.
References
==========
.. [1] http://www.lemburg.com/files/python/mxDateTime.html
.. [2] http://sourceforge.net/patch/?func=detailpatch&patch_id=102652&group_id=5470
.. [3] http://www.lemburg.com/files/python/CoercionProposal.html
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:


@@ -5,173 +5,185 @@ Last-Modified: $Date$
Author: A.M. Kuchling <amk@amk.ca>
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 18-Sep-2001
Post-History: 17-Apr-2002, 29-May-2002
Abstract
========
This document specifies a standard API for secret-key block
encryption algorithms such as DES or Rijndael, making it easier to
switch between different algorithms and implementations.
Introduction
============
Encryption algorithms transform their input data (called
plaintext) in some way that is dependent on a variable key,
producing ciphertext. The transformation can easily be reversed
if and only if one knows the key. The key is a sequence of bits
chosen from some very large space of possible keys. There are two
classes of encryption algorithms: block ciphers and stream ciphers.
Block ciphers encrypt multibyte inputs of a fixed size (frequently
8 or 16 bytes long), and can be operated in various feedback
modes. The feedback modes supported in this specification are:
====== ======== =====================
Number Constant Description
====== ======== =====================
1 MODE_ECB Electronic Code Book
2 MODE_CBC Cipher Block Chaining
3 MODE_CFB Cipher Feedback
5 MODE_OFB Output Feedback
6 MODE_CTR Counter
====== ======== =====================
These modes are to be implemented as described in NIST publication
SP 800-38A [1]_. Descriptions of the first three feedback modes can
also be found in Bruce Schneier's book *Applied Cryptography* [2]_.
(The numeric value 4 is reserved for MODE_PGP, a variant of CFB
described in RFC 2440: "OpenPGP Message Format" [3]_. This mode
isn't considered important enough to make it worth requiring it
for all block encryption ciphers, though supporting it is a nice
extra feature.)
In a strict formal sense, stream ciphers encrypt data bit-by-bit;
practically, stream ciphers work on a character-by-character
basis. This PEP only aims at specifying an interface for block
ciphers, though stream ciphers can support the interface described
here by fixing 'block_size' to 1. Feedback modes also don't make
sense for stream ciphers, so the only reasonable feedback mode
would be ECB mode.
Specification
=============
Encryption modules can add additional functions, methods, and
attributes beyond those described in this PEP, but all of the
features described in this PEP must be present for a module to
claim compliance with it.
Secret-key encryption modules should define one function::
new(key, mode, [IV], **kwargs)
Returns a ciphering object, using the secret key contained in the
string 'key', and using the feedback mode 'mode', which must be
one of the constants from the table above.
If 'mode' is MODE_CBC or MODE_CFB, 'IV' must be provided and must
be a string of the same length as the block size. Not providing a
value of 'IV' will result in a ``ValueError`` exception being raised.
Depending on the algorithm, a module may support additional
keyword arguments to this function. Some keyword arguments are
specified by this PEP, and modules are free to add additional
keyword arguments. If a value isn't provided for a given keyword,
a secure default value should be used. For example, if an
algorithm has a selectable number of rounds between 1 and 16, and
1-round encryption is insecure and 8-round encryption is believed
secure, the default value for 'rounds' should be 8 or more.
(Module implementors can choose a very slow but secure value, too,
such as 16 in this example. This decision is left up to the
implementor.)
The following table lists keyword arguments defined by this PEP:
============ ============================================
Keyword Meaning
============ ============================================
counter Callable object that returns counter blocks
(see below; CTR mode only)
rounds Number of rounds of encryption to use
segment_size Size of data and ciphertext segments,
measured in bits (see below; CFB mode only)
============ ============================================
The Counter feedback mode requires a sequence of input blocks,
called counters, that are used to produce the output. When 'mode'
is MODE_CTR, the 'counter' keyword argument must be provided, and
its value must be a callable object, such as a function or method.
Successive calls to this callable object must return a sequence of
strings that are of the length 'block_size' and that never
repeats. (Appendix B of the NIST publication gives a way to
generate such a sequence, but that's beyond the scope of this
PEP.)
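A minimal sketch of such a counter callable, assuming modern Python and a
16-byte block size (the helper name is an invention for illustration, not
part of this PEP)::

    def make_counter(block_size=16):
        """Return a callable producing successive big-endian counter
        blocks of exactly block_size bytes."""
        state = {"n": 0}

        def counter():
            block = state["n"].to_bytes(block_size, "big")
            state["n"] += 1
            return block

        return counter

    # ctr = make_counter(8)    # e.g. for an 8-byte block cipher
    # ctr(), ctr(), ...        # b'\x00' * 8, then ...\x01, and so on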
The CFB mode operates on segments of the plaintext and ciphertext
that are 'segment_size' bits long. Therefore, when using this
mode, the input and output strings must be a multiple of
'segment_size' bits in length. 'segment_size' must be an integer
between 1 and block_size\*8, inclusive. (The factor of 8 comes
from 'block_size' being measured in bytes and not in bits). The
default value for this parameter should be block_size\*8.
Implementors are allowed to constrain 'segment_size' to be a
multiple of 8 for simplicity, but they're encouraged to support
arbitrary values for generality.
Secret-key encryption modules should define two variables:
- block_size
An integer value; the size of the blocks encrypted by this
module, measured in bytes. For all feedback modes, the length
of strings passed to the encrypt() and decrypt() must be a
multiple of the block size.
- key_size
An integer value; the size of the keys required by this
module, measured in bytes. If key_size is None, then the
algorithm accepts variable-length keys. This may mean the
module accepts keys of any random length, or that there are a
few different possible lengths, e.g. 16, 24, or 32 bytes. You
cannot pass a key of length 0 (that is, the null string '') as
a variable-length key.
Cipher objects should have two attributes:
- block_size
An integer value equal to the size of the blocks encrypted by
this object. For algorithms with a variable block size, this
value is equal to the block size selected for this object.
- IV
Contains the initial value which will be used to start a
cipher feedback mode; it will always be a string exactly one
block in length. After encrypting or decrypting a string,
this value is updated to reflect the modified feedback text.
It is read-only, and cannot be assigned a new value.
Cipher objects require the following methods:
- decrypt(string)
Decrypts 'string', using the key-dependent data in the object
and with the appropriate feedback mode. The string's length
must be an exact multiple of the algorithm's block size or, in
CFB mode, of the segment size. Returns a string containing
the plaintext.
- encrypt(string)
Encrypts a non-empty string, using the key-dependent data in
the object, and with the appropriate feedback mode. The
string's length must be an exact multiple of the algorithm's
block size or, in CFB mode, of the segment size. Returns a
string containing the ciphertext.
Here's an example, using a module named 'DES'::
>>> import DES
>>> obj = DES.new('abcdefgh', DES.MODE_ECB)
@@ -179,8 +191,8 @@ Specification
>>> len(plaintext)
34
>>> obj.encrypt(plaintext)
Traceback (innermost last):
File "<stdin>", line 1, in <module>
ValueError: Strings for DES must be a multiple of 8 in length
>>> ciphertext = obj.encrypt(plain+'XXXXXX') # Add padding
>>> ciphertext
@@ -190,41 +202,46 @@ Specification
References
==========
.. [1] NIST publication SP 800-38A, "Recommendation for Block Cipher
Modes of Operation" (http://csrc.nist.gov/encryption/modes/)
.. [2] Applied Cryptography
.. [3] RFC2440: "OpenPGP Message Format" (http://rfc2440.x42.com,
http://www.faqs.org/rfcs/rfc2440.html)
Changes
=======
2002-04: Removed references to stream ciphers; retitled PEP;
prefixed feedback mode constants with ``MODE_``; removed PGP feedback
mode; added CTR and OFB feedback modes; clarified where numbers
are measured in bytes and where in bits.
2002-09: Clarified the discussion of key length by using
"variable-length keys" instead of "arbitrary-length".
Acknowledgements
================
Thanks to the readers of the python-crypto list for their comments on
this PEP.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:


@@ -5,212 +5,240 @@ Last-Modified: $Date$
Author: jim@interet.com (James C. Ahlstrom)
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Oct-2001
Python-Version: 2.3
Post-History: 26-Oct-2001
Abstract
========
This PEP adds the ability to import Python modules
``*.py``, ``*.py[co]`` and packages from zip archives. The
same code is used to speed up normal directory imports
provided ``os.listdir`` is available.
Note
====
Zip imports were added to Python 2.3, but the final implementation
uses an approach different from the one described in this PEP.
The 2.3 implementation is SourceForge patch #652586 [1]_, which adds
new import hooks described in PEP 302.
The rest of this PEP is therefore only of historical interest.
Specification
=============
Currently, ``sys.path`` is a list of directory names as strings. If
this PEP is implemented, an item of ``sys.path`` can be a string
naming a zip file archive. The zip archive can contain a
subdirectory structure to support package imports. The zip
archive satisfies imports exactly as a subdirectory would.
The implementation is in C code in the Python core and works on
all supported Python platforms.
Any files may be present in the zip archive, but only files
``*.py`` and ``*.py[co]`` are available for import. Zip import of
dynamic modules (``*.pyd``, ``*.so``) is disallowed.
Just as ``sys.path`` currently has default directory names, a default
zip archive name is added too. Otherwise there is no way to
import all Python library files from an archive.
Subdirectory Equivalence
========================
The zip archive must be treated exactly as a subdirectory tree so
we can support package imports based on current and future rules.
All zip data is taken from the Central Directory, the data must be
correct, and brain dead zip files are not accommodated.
Suppose ``sys.path`` contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
and we are trying to import ``modfoo`` from the ``Q`` package. Then
``import.c`` will generate a list of paths and extensions and will
look for the file. The list of generated paths does not change
for zip imports. Suppose ``import.c`` generates the path
"/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
"/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.
Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
your zip file will satisfy imports just as your subdirectory did.
Well, not quite. You can't satisfy dynamic modules from a zip
file. Dynamic modules have extensions like ``.dll``, ``.pyd``, and ``.so``.
They are operating system dependent, and probably can't be loaded
except from a file. It might be possible to extract the dynamic
module from the zip file, write it to a plain file and load it.
But that would mean creating temporary files, and dealing with all
the ``dynload_*.c``, and that's probably not a good idea.
When trying to import ``*.pyc``, if it is not available then
``*.pyo`` will be used instead. And vice versa when looking for ``*.pyo``.
If neither ``*.pyc`` nor ``*.pyo`` is available, or if the magic numbers
are invalid, then ``*.py`` will be compiled and used to satisfy the
import, but the compiled file will not be saved. Python would
normally write it to the same directory as ``*.py``, but surely we
don't want to write to the zip file. We could write to the
directory of the zip archive, but that would clutter it up, not
good if it is ``/usr/bin`` for example.
Failing to write the compiled files will make zip imports very slow,
and the user will probably not figure out what is wrong. So it
is best to put ``*.pyc`` and ``*.pyo`` in the archive with the ``*.py``.
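One way to prepare such an archive is sketched below with the standard
library; the paths are hypothetical and nothing here is part of the
proposed C implementation::

    import os
    import py_compile
    import zipfile

    src_dir = "SubDir"                    # hypothetical package directory
    with zipfile.ZipFile("Archive.zip", "w") as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                if not name.endswith(".py"):
                    continue
                py_path = os.path.join(root, name)
                pyc_path = py_path + "c"  # keep the .pyc next to the .py
                py_compile.compile(py_path, cfile=pyc_path)
                for path in (py_path, pyc_path):
                    zf.write(path, os.path.relpath(path, src_dir))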
Efficiency
==========
The only way to find files in a zip archive is linear search. So
for each zip file in ``sys.path``, we search for its names once, and
put the names plus other relevant data into a static Python
dictionary. The key is the archive name from ``sys.path`` joined with
the file name (including any subdirectories) within the archive.
This is exactly the name generated by ``import.c``, and makes lookup
easy.
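A rough Python equivalent of that lookup table (the proposal implements it
in C inside ``import.c``; the helper below is purely illustrative)::

    import os
    import zipfile

    def index_zip_archive(archive_path):
        """Map archive-path-joined member names to the member names,
        mirroring the keys that import.c would generate."""
        table = {}
        with zipfile.ZipFile(archive_path) as zf:
            for name in zf.namelist():
                table[os.path.join(archive_path, name)] = name
        return table

    # index_zip_archive("/C/D/E/Archive.zip") would then contain a key
    # such as "/C/D/E/Archive.zip/Q/R/modfoo.pyc" if that member exists.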
This same mechanism is used to speed up directory (non-zip) imports.
See below.
zlib
====
Compressed zip archives require ``zlib`` for decompression. Prior to
any other imports, we attempt an import of ``zlib``. Import of
compressed files will fail with a message "missing ``zlib``" unless
``zlib`` is available.
Booting
=======
Python imports ``site.py`` itself, and this imports ``os``, ``nt``, ``ntpath``,
``stat``, and ``UserDict``. It also imports ``sitecustomize.py`` which may
import more modules. Zip imports must be available before ``site.py``
is imported.
Just as there are default directories in ``sys.path``, there must be
one or more default zip archives too.
The problem is what the name should be. The name should be linked
with the Python version, so the Python executable can correctly
find its corresponding libraries even when there are multiple
Python versions on the same machine.
We add one name to ``sys.path``. On Unix, the directory is
``sys.prefix + "/lib"``, and the file name is
``"python%s%s.zip" % (sys.version[0], sys.version[2])``.
So for Python 2.2 and prefix ``/usr/local``, the path
``/usr/local/lib/python2.2/`` is already on ``sys.path``, and
``/usr/local/lib/python22.zip`` would be added.
On Windows, the file is the full path to ``python22.dll``, with
"dll" replaced by "zip". The zip archive name is always inserted
as the second item in ``sys.path``. The first is the directory of the
``main.py`` (thanks Tim).
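The Unix naming rule can be restated as a short sketch for the running
interpreter; this merely echoes the formula above and is not part of the
implementation::

    import os
    import sys

    zip_name = "python%s%s.zip" % (sys.version[0], sys.version[2])
    zip_path = os.path.join(sys.prefix, "lib", zip_name)
    print(zip_path)   # e.g. /usr/local/lib/python22.zip under Python 2.2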
Directory Imports
=================
The static Python dictionary used to speed up zip imports can be
used to speed up normal directory imports too. For each item in
``sys.path`` that is not a zip archive, we call ``os.listdir``, and add
the directory contents to the dictionary. Then instead of calling
``fopen()`` in a double loop, we just check the dictionary. This
greatly speeds up imports. If ``os.listdir`` doesn't exist, the
dictionary is not used.
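The directory case can be sketched the same way (again purely
illustrative; the proposal does this in C)::

    import os

    def index_directory(path):
        """Build the same kind of lookup table for a plain sys.path entry."""
        try:
            names = os.listdir(path)
        except OSError:
            return {}
        return dict((os.path.join(path, name), name) for name in names)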
Benchmarks
==========
==== ================= ================= ========== ==========
Case Original 2.2a3 Using os.listdir Zip Uncomp Zip Compr
==== ================= ================= ========== ==========
1 3.2 2.5 3.2->1.02 2.3 2.5 2.3->0.87 1.66->0.93 1.5->1.07
2 2.8 3.9 3.0->1.32 Same as Case 1.
3 5.7 5.7 5.7->5.7 2.1 2.1 2.1->1.8 1.25->0.99 1.19->1.13
4 9.4 9.4 9.3->9.35 Same as Case 3.
==== ================= ================= ========== ==========
Case 1: Local drive C:, ``sys.path`` has its default value.
Case 2: Local drive C:, directory with files is at the end of ``sys.path``.
Case 3: Network drive, ``sys.path`` has its default value.
Case 4: Network drive, directory with files is at the end of ``sys.path``.
Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, 256 Meg.
The machine was running Windows 2000 with a Linux/Samba network server.
Times are in seconds, and are the time to import about 100 Lib modules.
Case 2 and 4 have the "correct" directory moved to the end of ``sys.path``.
"Uncomp" means uncompressed zip archive, "Compr" means compressed.
Initial times are after a re-boot of the system; the time after
"->" is the time after repeated runs. Times to import from C:
after a re-boot are rather highly variable for the "Original" case,
but are more realistic.
Custom Imports
==============
The logic demonstrates the ability to import using default searching
until a needed Python module (in this case, os) becomes available.
This can be used to bootstrap custom importers. For example, if
"``importer()``" in ``__init__.py`` exists, then it could be used for imports.
The "``importer()``" can freely import os and other modules, and these
will be satisfied from the default mechanism. This PEP does not
define any custom importers, and this note is for information only.
Implementation
==============
A C implementation is available as SourceForge patch 492105.
Superseded by patch 652586 and current CVS. [2]_
A newer version (updated for recent CVS by Paul Moore) is 645650.
Superseded by patch 652586 and current CVS. [3]_
A competing implementation by Just van Rossum is 652586, which is
the basis for the final implementation of PEP 302. PEP 273 has
been implemented using PEP 302's import hooks. [1]_
References
==========
.. [1] Just van Rossum, New import hooks + Import from Zip files
https://bugs.python.org/issue652586
.. [2] Import from Zip archive, James C. Ahlstrom
https://bugs.python.org/issue492105
.. [3] Import from Zip Archive, Paul Moore
https://bugs.python.org/issue645650
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End:


@@ -5,205 +5,217 @@ Last-Modified: $Date$
Author: barry@python.org (Barry Warsaw)
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Jun-2002
Python-Version: 2.4
Post-History: 18-Jun-2002, 23-Mar-2004, 22-Aug-2004
Abstract
========
This PEP describes a simpler string substitution feature, also
known as string interpolation. This PEP is "simpler" in two
respects:
1. Python's current string substitution feature
(i.e. ``%``-substitution) is complicated and error prone. This PEP
is simpler at the cost of some expressiveness.
2. PEP 215 proposed an alternative string interpolation feature,
introducing a new ``$`` string prefix. PEP 292 is simpler than
this because it involves no syntax changes and has much simpler
rules for what substitutions can occur in the string.
Rationale
=========
Python currently supports a string substitution syntax based on
C's ``printf()`` '``%``' formatting character [1]_. While quite rich,
``%``-formatting codes are also error prone, even for
experienced Python programmers. A common mistake is to leave off
the trailing format character, e.g. the '``s``' in ``'%(name)s'``.
In addition, the rules for what can follow a ``%`` sign are fairly
complex, while the usual application rarely needs such complexity.
Most scripts need to do some string interpolation, but most of
those use simple 'stringification' formats, i.e. ``%s`` or ``%(name)s``.
This form should be made simpler and less error prone.
A Simpler Proposal
==================
We propose the addition of a new class, called ``Template``, which
will live in the string module. The ``Template`` class supports new
rules for string substitution; its value contains placeholders,
introduced with the ``$`` character. The following rules for
``$``-placeholders apply:
1. ``$$`` is an escape; it is replaced with a single ``$``
2. ``$identifier`` names a substitution placeholder matching a mapping
key of "identifier". By default, "identifier" must spell a
Python identifier as defined in [2]_. The first non-identifier
character after the ``$`` character terminates this placeholder
specification.
3. ``${identifier}`` is equivalent to ``$identifier``. It is required
when valid identifier characters follow the placeholder but are
not part of the placeholder, e.g. "${noun}ification".
If the ``$`` character appears at the end of the line, or is followed
by any other character than those described above, a ``ValueError``
will be raised at interpolation time. Values in mapping are
converted automatically to strings.
No other characters have special meaning, however it is possible
to derive from the ``Template`` class to define different substitution
rules. For example, a derived class could allow for periods in
the placeholder (e.g. to support a kind of dynamic namespace and
attribute path lookup), or could define a delimiter character
other than ``$``.
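For instance, a derived class might simply swap the delimiter. The sketch
below follows the shape the ``string.Template`` implementation eventually
took; the ``delimiter`` override and the class name are illustrative, not
requirements of this PEP::

    from string import Template

    class PercentTemplate(Template):
        # use '%' instead of '$' to introduce placeholders
        delimiter = '%'

    t = PercentTemplate('%{name} was born in %country')
    print(t.substitute(name='Guido', country='the Netherlands'))
    # -> Guido was born in the Netherlands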
Once the ``Template`` has been created, substitutions can be performed
by calling one of two methods:
- ``substitute()``. This method returns a new string which results
when the values of a mapping are substituted for the
placeholders in the ``Template``. If there are placeholders which
are not present in the mapping, a ``KeyError`` will be raised.
- safe_substitute(). This is similar to the substitute() method,
except that KeyErrors are never raised (due to placeholders
missing from the mapping). When a placeholder is missing, the
original placeholder will appear in the resulting string.
- ``safe_substitute()``. This is similar to the ``substitute()`` method,
except that ``KeyErrors`` are never raised (due to placeholders
missing from the mapping). When a placeholder is missing, the
original placeholder will appear in the resulting string.
Here are some examples:
>>> from string import Template
>>> s = Template('${name} was born in ${country}')
>>> print s.substitute(name='Guido', country='the Netherlands')
Guido was born in the Netherlands
>>> print s.substitute(name='Guido')
Traceback (most recent call last):
[...]
KeyError: 'country'
>>> print s.safe_substitute(name='Guido')
Guido was born in ${country}
The signature of substitute() and safe_substitute() allows for
passing the mapping of placeholders to values, either as a single
dictionary-like object in the first positional argument, or as
keyword arguments as shown above. The exact details and
signatures of these two methods is reserved for the standard
library documentation.
Here are some examples::
Why `$' and Braces?
>>> from string import Template
>>> s = Template('${name} was born in ${country}')
>>> print s.substitute(name='Guido', country='the Netherlands')
Guido was born in the Netherlands
>>> print s.substitute(name='Guido')
Traceback (most recent call last):
[...]
KeyError: 'country'
>>> print s.safe_substitute(name='Guido')
Guido was born in ${country}
The BDFL said it best[4]: "The $ means "substitution" in so many
languages besides Perl that I wonder where you've been. [...]
We're copying this from the shell."
The signature of ``substitute()`` and ``safe_substitute()`` allows for
passing the mapping of placeholders to values, either as a single
dictionary-like object in the first positional argument, or as
keyword arguments as shown above. The exact details and
signatures of these two methods are reserved for the standard
library documentation.
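For example, continuing the session above, the same substitution can
be driven by a dictionary (an illustrative sketch)::

    >>> d = dict(name='Guido', country='the Netherlands')
    >>> print s.substitute(d)
    Guido was born in the Netherlands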
Thus the substitution rules are chosen because of the similarity
with so many other languages. This makes the substitution rules
easier to teach, learn, and remember.
Why ``$`` and Braces?
=====================
The BDFL said it best [4]_: "The ``$`` means "substitution" in so many
languages besides Perl that I wonder where you've been. [...]
We're copying this from the shell."
Thus the substitution rules are chosen because of the similarity
with so many other languages. This makes the substitution rules
easier to teach, learn, and remember.
Comparison to PEP 215
=====================
PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
library module. PEP 215 proposes a new string prefix
representation such as $"" which signal to Python that a new type
of string is present. $-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.
PEP 215 describes an alternate proposal for string interpolation.
Unlike that PEP, this one does not propose any new syntax for
Python. All the proposed new features are embodied in a new
library module. PEP 215 proposes a new string prefix
representation such as ``$""`` which signals to Python that a new type
of string is present. ``$``-strings would have to interact with the
existing r-prefixes and u-prefixes, essentially doubling the
number of string prefix combinations.
PEP 215 also allows for arbitrary Python expressions inside the
$-strings, so that you could do things like:
PEP 215 also allows for arbitrary Python expressions inside the
``$``-strings, so that you could do things like::
import sys
print $"sys = $sys, sys = $sys.modules['sys']"
import sys
print $"sys = $sys, sys = $sys.modules['sys']"
which would return
which would return::
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see the substitution
placeholder in the original $-string.
It's generally accepted that the rules in PEP 215 are safe in the
sense that they introduce no new security issues (see PEP 215,
"Security Issues" for details). However, the rules are still
quite complex, and make it more difficult to see the substitution
placeholder in the original ``$``-string.
The interesting thing is that the Template class defined in this
PEP is designed for inheritance and, with a little extra work,
it's possible to support PEP 215's functionality using existing
Python syntax.
The interesting thing is that the ``Template`` class defined in this
PEP is designed for inheritance and, with a little extra work,
it's possible to support PEP 215's functionality using existing
Python syntax.
For example, one could define subclasses of Template and dict that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
For example, one could define subclasses of ``Template`` and ``dict`` that
allowed for a more complex placeholder syntax and a mapping that
evaluated those placeholders.
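A rough sketch of that idea follows (all names are hypothetical, and
the ``eval()``-based mapping is shown purely to illustrate the
mechanism; it carries the obvious security caveats)::

    from string import Template
    import sys

    class ExprTemplate(Template):
        # hypothetical subclass: a braced placeholder may hold any
        # expression (only sensible with ${...} placeholders)
        idpattern = r'[^{}]+'

    class ExprMapping(dict):
        # hypothetical mapping that evaluates each placeholder as a
        # Python expression; the dict copy keeps eval()'s name lookups
        # from re-entering __getitem__
        def __getitem__(self, key):
            return eval(key, {}, dict(self))

    t = ExprTemplate('sys = ${sys}, modules loaded = ${len(sys.modules)}')
    print t.substitute(ExprMapping(sys=sys))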
Internationalization
====================
The implementation supports internationalization by recording the
original template string in the Template instance's 'template'
attribute. This attribute would serve as the lookup key in an
gettext-based catalog. It is up to the application to turn the
resulting string back into a Template for substitution.
The implementation supports internationalization by recording the
original template string in the ``Template`` instance's ``template``
attribute. This attribute would serve as the lookup key in a
gettext-based catalog. It is up to the application to turn the
resulting string back into a ``Template`` for substitution.
However, the Template class was designed to work more intuitively
in an internationalized application, by supporting the mixing-in
of Template and unicode subclasses. Thus an internationalized
application could create an application-specific subclass,
multiply inheriting from Template and unicode, and using instances
of that subclass as the gettext catalog key. Further, the
subclass could alias the special __mod__() method to either
.substitute() or .safe_substitute() to provide a more traditional
string/unicode like %-operator substitution syntax.
However, the ``Template`` class was designed to work more intuitively
in an internationalized application, by supporting the mixing-in
of ``Template`` and unicode subclasses. Thus an internationalized
application could create an application-specific subclass,
multiply inheriting from ``Template`` and unicode, and using instances
of that subclass as the gettext catalog key. Further, the
subclass could alias the special ``__mod__()`` method to either
``.substitute()`` or ``.safe_substitute()`` to provide a more traditional
string/unicode-like ``%``-operator substitution syntax.
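A compressed sketch of that arrangement, in the Python 2 spelling
used elsewhere in this PEP (the class name is invented and the
gettext lookup is elided)::

    from string import Template

    class i18nTemplate(Template, unicode):
        # hypothetical mix-in: instances are unicode strings (and so
        # can be used directly as gettext catalog keys) as well as
        # Templates; aliasing __mod__ keeps the traditional
        # %-operator spelling
        __mod__ = Template.safe_substitute

    key = i18nTemplate(u'$name was born in $country')
    # ordinarily the catalog lookup _(key) would happen here
    print key % dict(name='Guido', country='the Netherlands')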
Reference Implementation
========================
The implementation has been committed to the Python 2.4 source tree.
The implementation has been committed to the Python 2.4 source tree.
References
==========
[1] String Formatting Operations
http://docs.python.org/library/stdtypes.html#string-formatting-operations
.. [1] String Formatting Operations
http://docs.python.org/library/stdtypes.html#string-formatting-operations
[2] Identifiers and Keywords
http://docs.python.org/reference/lexical_analysis.html#identifiers-and-keywords
.. [2] Identifiers and Keywords
http://docs.python.org/reference/lexical_analysis.html#identifiers-and-keywords
[3] Guido's python-dev posting from 21-Jul-2002
http://mail.python.org/pipermail/python-dev/2002-July/026397.html
.. [3] Guido's python-dev posting from 21-Jul-2002
http://mail.python.org/pipermail/python-dev/2002-July/026397.html
[4] http://mail.python.org/pipermail/python-dev/2002-June/025652.html
.. [4] http://mail.python.org/pipermail/python-dev/2002-June/025652.html
[5] Reference Implementation
http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470
.. [5] Reference Implementation
http://sourceforge.net/tracker/index.php?func=detail&aid=1014055&group_id=5470&atid=305470
Copyright
=========
This document has been placed in the public domain.
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
@ -5,214 +5,253 @@ Last-Modified: $Date$
Author: Barry Warsaw, Raymond Hettinger, Anthony Baxter
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 29-Jul-2003
Python-Version: 2.4
Post-History: 1-Dec-2004
Abstract
========
This document describes the development and release schedule for
Python 2.4. The schedule primarily concerns itself with PEP-sized
items. Small features may be added up to and including the first
beta release. Bugs may be fixed until the final release.
This document describes the development and release schedule for
Python 2.4. The schedule primarily concerns itself with PEP-sized
items. Small features may be added up to and including the first
beta release. Bugs may be fixed until the final release.
There will be at least two alpha releases, two beta releases, and
one release candidate. The release date was 30th November, 2004.
There will be at least two alpha releases, two beta releases, and
one release candidate. The release date was 30th November, 2004.
Release Manager
===============
Anthony Baxter
Anthony Baxter
Martin von Lowis is building the Windows installers, Fred the
doc packages, Sean the RPMs.
Martin von Lowis is building the Windows installers, Fred the
doc packages, Sean the RPMs.
Release Schedule
================
July 9: alpha 1 [completed]
- July 9: alpha 1 [completed]
August 5/6: alpha 2 [completed]
- August 5/6: alpha 2 [completed]
Sept 3: alpha 3 [completed]
- Sept 3: alpha 3 [completed]
October 15: beta 1 [completed]
- October 15: beta 1 [completed]
November 3: beta 2 [completed]
- November 3: beta 2 [completed]
November 18: release candidate 1 [completed]
- November 18: release candidate 1 [completed]
- November 30: final [completed]
November 30: final [completed]
Completed features for 2.4
==========================
PEP 218 Builtin Set Objects.
- PEP 218 Builtin Set Objects.
PEP 289 Generator expressions.
- PEP 289 Generator expressions.
PEP 292 Simpler String Substitutions to be implemented as a module.
- PEP 292 Simpler String Substitutions to be implemented as a module.
PEP 318: Function/method decorator syntax, using @syntax
- PEP 318: Function/method decorator syntax, using @syntax
PEP 322 Reverse Iteration.
- PEP 322 Reverse Iteration.
PEP 327: A Decimal package for fixed precision arithmetic.
- PEP 327: A Decimal package for fixed precision arithmetic.
PEP 328: Multi-line Imports
- PEP 328: Multi-line Imports
Encapsulate the decorate-sort-undecorate pattern in a keyword for
list.sort().
- Encapsulate the decorate-sort-undecorate pattern in a keyword for
``list.sort()``.
Added a builtin called sorted() which may be used in expressions.
- Added a builtin called ``sorted()`` which may be used in expressions.
The itertools module has two new functions, tee() and groupby().
Add a collections module with a deque() object.
- The itertools module has two new functions, ``tee()`` and ``groupby()``.
Add two statistical/reduction functions, nlargest() and nsmallest()
to the heapq module.
- Add a collections module with a ``deque()`` object.
Python's windows installer now uses MSI
- Add two statistical/reduction functions, ``nlargest()`` and ``nsmallest()``
to the heapq module.
Deferred until 2.5:
- Python's windows installer now uses MSI
- Deprecate and/or remove the modules listed in PEP 4 (posixfile,
gopherlib, pre, others)
- Remove support for platforms as described in PEP 11.
Deferred until 2.5
==================
- Finish implementing the Distutils bdist_dpkg command. (AMK)
- Deprecate and/or remove the modules listed in PEP 4 (``posixfile``,
  ``gopherlib``, ``pre``, and others)
- Add support for reading shadow passwords (www.python.org/sf/579435)
- Remove support for platforms as described in PEP 11.
- It would be nice if the built-in SSL socket type could be used
for non-blocking SSL I/O. Currently packages such as Twisted
which implement async servers using SSL have to require third-party
packages such as pyopenssl.
- Finish implementing the Distutils ``bdist_dpkg`` command. (AMK)
- AST-based compiler: this branch was not completed in time for
2.4, but will land on the trunk some time after 2.4 final is
out, for inclusion in 2.5.
- Add support for reading shadow passwords [1]_
- reST is going to be used a lot in Zope3. Maybe it could become
a standard library module? (Since reST's author thinks it's too
instable, I'm inclined not to do this.)
- It would be nice if the built-in SSL socket type could be used
for non-blocking SSL I/O. Currently packages such as Twisted
which implement async servers using SSL have to require third-party
packages such as pyopenssl.
- AST-based compiler: this branch was not completed in time for
2.4, but will land on the trunk some time after 2.4 final is
out, for inclusion in 2.5.
- reST is going to be used a lot in Zope3. Maybe it could become
a standard library module? (Since reST's author thinks it's too
unstable, I'm inclined not to do this.)
Ongoing tasks
=============
The following are ongoing TO-DO items which we should attempt to
work on without hoping for completion by any particular date.
The following are ongoing TO-DO items which we should attempt to
work on without hoping for completion by any particular date.
- Documentation: complete the distribution and installation
manuals.
- Documentation: complete the distribution and installation
manuals.
- Documentation: complete the documentation for new-style
classes.
- Documentation: complete the documentation for new-style
classes.
- Look over the Demos/ directory and update where required (Andrew
Kuchling has done a lot of this)
- Look over the Demos/ directory and update where required (Andrew
Kuchling has done a lot of this)
- New tests.
- New tests.
- Fix doc bugs on SF.
- Fix doc bugs on SF.
- Remove use of deprecated features in the core.
- Remove use of deprecated features in the core.
- Document deprecated features appropriately.
- Document deprecated features appropriately.
- Mark deprecated C APIs with Py_DEPRECATED.
- Mark deprecated C APIs with ``Py_DEPRECATED``.
- Deprecate modules which are unmaintained, or perhaps make a new
category for modules 'Unmaintained'
- Deprecate modules which are unmaintained, or perhaps make a new
category for modules 'Unmaintained'
- In general, lots of cleanup so it is easier to move forward.
- In general, lots of cleanup so it is easier to move forward.
Open issues
===========
None at this time.
None at this time.
Carryover features from Python 2.3
==================================
- The import lock could use some redesign. (SF 683658.)
- The import lock could use some redesign. [2]_
- A nicer API to open text files, replacing the ugly (in some
people's eyes) "U" mode flag. There's a proposal out there to
have a new built-in type textfile(filename, mode, encoding).
(Shouldn't it have a bufsize argument too?)
- A nicer API to open text files, replacing the ugly (in some
people's eyes) "U" mode flag. There's a proposal out there to
have a new built-in type textfile(filename, mode, encoding).
(Shouldn't it have a bufsize argument too?)
- New widgets for Tkinter???
- New widgets for Tkinter???
Has anyone gotten the time for this? *Are* there any new
widgets in Tk 8.4? Note that we've got better Tix support
already (though not on Windows yet).
Has anyone gotten the time for this? *Are* there any new
widgets in Tk 8.4? Note that we've got better Tix support
already (though not on Windows yet).
- PEP 304 (Controlling Generation of Bytecode Files by Montanaro)
seems to have lost steam.
- PEP 304 (Controlling Generation of Bytecode Files by Montanaro)
seems to have lost steam.
- For a class defined inside another class, the __name__ should be
"outer.inner", and pickling should work. (SF 633930. I'm no
longer certain this is easy or even right.)
- For a class defined inside another class, the ``__name__`` should be
"outer.inner", and pickling should work. ([3]_. I'm no
longer certain this is easy or even right.)
- Decide on a clearer deprecation policy (especially for modules)
and act on it. For a start, see this message from Neal Norwitz:
http://mail.python.org/pipermail/python-dev/2002-April/023165.html
There seems insufficient interest in moving this further in an
organized fashion, and it's not particularly important.
- Decide on a clearer deprecation policy (especially for modules)
and act on it. For a start, see this message from Neal Norwitz [4]_.
There seems insufficient interest in moving this further in an
organized fashion, and it's not particularly important.
- Provide alternatives for common uses of the types module;
Skip Montanaro has posted a proto-PEP for this idea:
http://mail.python.org/pipermail/python-dev/2002-May/024346.html
There hasn't been any progress on this, AFAICT.
- Provide alternatives for common uses of the types module;
Skip Montanaro has posted a proto-PEP for this idea [5]_.
There hasn't been any progress on this, AFAICT.
- Use pending deprecation for the types and string modules. This
requires providing alternatives for the parts that aren't
covered yet (e.g. string.whitespace and types.TracebackType).
It seems we can't get consensus on this.
- Use pending deprecation for the types and string modules. This
requires providing alternatives for the parts that aren't
covered yet (e.g. ``string.whitespace`` and ``types.TracebackType``).
It seems we can't get consensus on this.
- PEP 262 Database of Installed Python Packages Kuchling
- PEP 262 Database of Installed Python Packages (Kuchling)
This turns out to be useful for Jack Jansen's Python installer,
so the database is worth implementing. Code will go in
sandbox/pep262.
This turns out to be useful for Jack Jansen's Python installer,
so the database is worth implementing. Code will go in
sandbox/pep262.
- PEP 269 Pgen Module for Python Riehl
- PEP 269 Pgen Module for Python (Riehl)
(Some necessary changes are in; the pgen module itself needs to
mature more.)
(Some necessary changes are in; the pgen module itself needs to
mature more.)
- PEP 266 Optimizing Global Variable/Attribute Access Montanaro
PEP 267 Optimized Access to Module Namespaces Hylton
PEP 280 Optimizing access to globals van Rossum
- PEP 266 Optimizing Global Variable/Attribute Access (Montanaro)
These are basically three friendly competing proposals. Jeremy
has made a little progress with a new compiler, but it's going
slowly and the compiler is only the first step. Maybe we'll be
able to refactor the compiler in this release. I'm tempted to
say we won't hold our breath.
PEP 267 Optimized Access to Module Namespaces (Hylton)
- Lazily tracking tuples?
http://mail.python.org/pipermail/python-dev/2002-May/023926.html
http://www.python.org/sf/558745
Not much enthusiasm I believe.
PEP 280 Optimizing access to globals (van Rossum)
- PEP 286 Enhanced Argument Tuples von Loewis
These are basically three friendly competing proposals. Jeremy
has made a little progress with a new compiler, but it's going
slowly and the compiler is only the first step. Maybe we'll be
able to refactor the compiler in this release. I'm tempted to
say we won't hold our breath.
I haven't had the time to review this thoroughly. It seems a
deep optimization hack (also makes better correctness guarantees
though).
- Lazily tracking tuples? [6]_ [7]_
Not much enthusiasm I believe.
- Make 'as' a keyword. It has been a pseudo-keyword long enough.
Too much effort to bother.
- PEP 286 Enhanced Argument Tuples (von Loewis)
I haven't had the time to review this thoroughly. It seems a
deep optimization hack (also makes better correctness guarantees
though).
- Make '``as``' a keyword. It has been a pseudo-keyword long enough.
Too much effort to bother.
References
==========
.. [1] Shadow Password Support Module
https://bugs.python.org/issue579435
.. [2] PyErr_Warn may cause import deadlock
https://bugs.python.org/issue683658
.. [3] Nested class __name__
https://bugs.python.org/issue633930
.. [4] Neal Norwitz, random vs whrandom
https://mail.python.org/pipermail/python-dev/2002-April/023165.html
.. [5] Skip Montanaro, python/dist/src/Lib types.py,1.26,1.27
https://mail.python.org/pipermail/python-dev/2002-May/024346.html
.. [6] Daniel Dunbar, Lazily GC tracking tuples
https://mail.python.org/pipermail/python-dev/2002-May/023926.html
.. [7] GC: untrack simple objects
https://bugs.python.org/issue558745
Copyright
=========
This document has been placed in the public domain.
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
@ -5,215 +5,229 @@ Last-Modified: $Date$
Author: Michel Pelletier <michel@users.sourceforge.net>
Status: Rejected
Type: Standards Track
Content-Type: text/plain
Content-Type: text/x-rst
Created: 17-Jun-2004
Python-Version: 2.6?
Post-History:
Post-History:
Abstract
========
If Python Virtual Machine (PVM) bytecode is not "well-formed" it
is possible to crash or exploit the PVM by causing various errors
such as under/overflowing the value stack or reading/writing into
arbitrary areas of the PVM program space. Most of these kinds of
errors can be eliminated by verifying that PVM bytecode does not
violate a set of simple constraints before execution.
If Python Virtual Machine (PVM) bytecode is not "well-formed" it
is possible to crash or exploit the PVM by causing various errors
such as under/overflowing the value stack or reading/writing into
arbitrary areas of the PVM program space. Most of these kinds of
errors can be eliminated by verifying that PVM bytecode does not
violate a set of simple constraints before execution.
This PEP proposes a set of constraints on the format and structure
of Python Virtual Machine (PVM) bytecode and provides an
implementation in Python of this verification process.
This PEP proposes a set of constraints on the format and structure
of Python Virtual Machine (PVM) bytecode and provides an
implementation in Python of this verification process.
Pronouncement
=============
Guido believes that a verification tool has some value. If
someone wants to add it to Tools/scripts, no PEP is required.
Guido believes that a verification tool has some value. If
someone wants to add it to Tools/scripts, no PEP is required.
Such a tool may have value for validating the output from
"bytecodehacks" or from direct edits of PYC files. As security
measure, its value is somewhat limited because perfectly valid
bytecode can still do horrible things. That situation could
change if the concept of restricted execution were to be
successfully resurrected.
Such a tool may have value for validating the output from
"bytecodehacks" or from direct edits of PYC files. As security
measure, its value is somewhat limited because perfectly valid
bytecode can still do horrible things. That situation could
change if the concept of restricted execution were to be
successfully resurrected.
Motivation
==========
The Python Virtual Machine executes Python programs that have been
compiled from the Python language into a bytecode representation.
The PVM assumes that any bytecode being executed is "well-formed"
with regard to a number implicit constraints. Some of these
constraints are checked at run-time, but most of them are not due
to the overhead they would create.
The Python Virtual Machine executes Python programs that have been
compiled from the Python language into a bytecode representation.
The PVM assumes that any bytecode being executed is "well-formed"
with regard to a number of implicit constraints. Some of these
constraints are checked at run-time, but most of them are not, due
to the overhead they would create.
When running in debug mode the PVM does do several run-time checks
to ensure that any particular bytecode cannot violate these
constraints that, to a degree, prevent bytecode from crashing or
exploiting the interpreter. These checks add a measurable
overhead to the interpreter, and are typically turned off in
common use.
When running in debug mode, the PVM does perform several run-time
checks to ensure that any particular bytecode cannot violate these
constraints; to a degree, these checks prevent bytecode from
crashing or exploiting the interpreter. They add a measurable
overhead to the interpreter, and are typically turned off in
common use.
Bytecode that is not well-formed and executed by a PVM not running
in debug mode may create a variety of fatal and non-fatal errors.
Typically, ill-formed code will cause the PVM to seg-fault and
cause the OS to immediately and abruptly terminate the
interpreter.
Bytecode that is not well-formed and executed by a PVM not running
in debug mode may create a variety of fatal and non-fatal errors.
Typically, ill-formed code will cause the PVM to seg-fault and
cause the OS to immediately and abruptly terminate the
interpreter.
Conceivably, ill-formed bytecode could exploit the interpreter and
allow Python bytecode to execute arbitrary C-level machine
instructions or to modify private, internal data structures in the
interpreter. If used cleverly this could subvert any form of
security policy an application may want to apply to its objects.
Conceivably, ill-formed bytecode could exploit the interpreter and
allow Python bytecode to execute arbitrary C-level machine
instructions or to modify private, internal data structures in the
interpreter. If used cleverly this could subvert any form of
security policy an application may want to apply to its objects.
Practically, it would be difficult for a malicious user to
"inject" invalid bytecode into a PVM for the purposes of
exploitation, but not impossible. Buffer overflow and memory
overwrite attacks are commonly understood, particularly when the
exploit payload is transmitted unencrypted over a network or when
a file or network security permission weakness is used as a
foothold for further attacks.
Practically, it would be difficult for a malicious user to
"inject" invalid bytecode into a PVM for the purposes of
exploitation, but not impossible. Buffer overflow and memory
overwrite attacks are commonly understood, particularly when the
exploit payload is transmitted unencrypted over a network or when
a file or network security permission weakness is used as a
foothold for further attacks.
Ideally, no bytecode should ever be allowed to read or write
underlying C-level data structures to subvert the operation of the
PVM, whether the bytecode was maliciously crafted or not. A
simple pre-execution verification step could ensure that bytecode
cannot over/underflow the value stack or access other sensitive
areas of PVM program space at run-time.
Ideally, no bytecode should ever be allowed to read or write
underlying C-level data structures to subvert the operation of the
PVM, whether the bytecode was maliciously crafted or not. A
simple pre-execution verification step could ensure that bytecode
cannot over/underflow the value stack or access other sensitive
areas of PVM program space at run-time.
This PEP proposes several validation steps that should be taken on
Python bytecode before it is executed by the PVM so that it
compiles with static and structure constraints on its instructions
and their operands. These steps are simple and catch a large
class of invalid bytecode that can cause crashes. There is also
some possibility that some run-time checks can be eliminated up
front by a verification pass.
This PEP proposes several validation steps that should be taken on
Python bytecode before it is executed by the PVM so that it
complies with static and structural constraints on its instructions
and their operands. These steps are simple and catch a large
class of invalid bytecode that can cause crashes. There is also
some possibility that some run-time checks can be eliminated up
front by a verification pass.
There is, of course, no way to verify that bytecode is "completely
safe", for every definition of complete and safe. Even with
bytecode verification, Python programs can and most likely in the
future will seg-fault for a variety of reasons and continue to
cause many different classes of run-time errors, fatal or not.
The verification step proposed here simply plugs an easy hole that
can cause a large class of fatal and subtle errors at the bytecode
level.
There is, of course, no way to verify that bytecode is "completely
safe", for every definition of complete and safe. Even with
bytecode verification, Python programs can and most likely in the
future will seg-fault for a variety of reasons and continue to
cause many different classes of run-time errors, fatal or not.
The verification step proposed here simply plugs an easy hole that
can cause a large class of fatal and subtle errors at the bytecode
level.
Currently, the Java Virtual Machine (JVM) verifies Java bytecode
in a way very similar to what is proposed here. The JVM
Specification version 2 [1], Sections 4.8 and 4.9 were therefore
used as a basis for some of the constraints explained below. Any
Python bytecode verification implementation at a minimum must
enforce these constraints, but may not be limited to them.
Currently, the Java Virtual Machine (JVM) verifies Java bytecode
in a way very similar to what is proposed here. The JVM
Specification version 2 [1]_, Sections 4.8 and 4.9 were therefore
used as a basis for some of the constraints explained below. Any
Python bytecode verification implementation at a minimum must
enforce these constraints, but need not be limited to them.
Static Constraints on Bytecode Instructions
===========================================
1. The bytecode string must not be empty. (len(co_code) > 0).
1. The bytecode string must not be empty. (``len(co_code) > 0``).
2. The bytecode string cannot exceed a maximum size
(len(co_code) < sizeof(unsigned char) - 1).
2. The bytecode string cannot exceed a maximum size
(``len(co_code) < sizeof(unsigned char) - 1``).
3. The first instruction in the bytecode string begins at index 0.
3. The first instruction in the bytecode string begins at index 0.
4. Only valid byte-codes with the correct number of operands can
be in the bytecode string.
4. Only valid byte-codes with the correct number of operands can
be in the bytecode string.
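For illustration only, a minimal sketch of checking constraints 1, 3
and 4 (assuming the classic CPython 2.x bytecode layout of a
one-byte opcode followed by an optional two-byte operand; this is
not the reference implementation)::

    import dis

    def check_instructions(code):
        co_code = code.co_code
        if not co_code:                     # constraint 1: non-empty
            raise ValueError("empty bytecode string")
        valid = set(dis.opmap.values())
        offset = 0                          # constraint 3: start at index 0
        while offset < len(co_code):
            op = ord(co_code[offset])
            if op not in valid:             # constraint 4: known opcodes only
                raise ValueError("bad opcode %d at offset %d" % (op, offset))
            if op >= dis.HAVE_ARGUMENT:     # opcode carries a 2-byte operand
                if offset + 3 > len(co_code):
                    raise ValueError("truncated operand at offset %d" % offset)
                offset += 3
            else:
                offset += 1

    check_instructions((lambda x: x + 1).func_code)   # well-formed: silent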
Static Constraints on Bytecode Instruction Operands
===================================================
1. The target of a jump instruction must be within the code
boundaries and must fall on an instruction, never between an
instruction and its operands.
1. The target of a jump instruction must be within the code
boundaries and must fall on an instruction, never between an
instruction and its operands.
2. The operand of a LOAD_* instruction must be a valid index into
its corresponding data structure.
2. The operand of a ``LOAD_*`` instruction must be a valid index into
its corresponding data structure.
3. The operand of a STORE_* instruction must be a valid index
into its corresponding data structure.
3. The operand of a ``STORE_*`` instruction must be a valid index
into its corresponding data structure.
Structural Constraints between Bytecode Instructions
====================================================
1. Each instruction must only be executed with the appropriate
number of arguments in the value stack, regardless of the
execution path that leads to its invocation.
1. Each instruction must only be executed with the appropriate
number of arguments in the value stack, regardless of the
execution path that leads to its invocation.
2. If an instruction can be executed along several different
execution paths, the value stack must have the same depth prior
to the execution of the instruction, regardless of the path
taken.
2. If an instruction can be executed along several different
execution paths, the value stack must have the same depth prior
to the execution of the instruction, regardless of the path
taken.
3. At no point during execution can the value stack grow to a
depth greater than that implied by co_stacksize.
3. At no point during execution can the value stack grow to a
depth greater than that implied by ``co_stacksize``.
4. Execution never falls off the bottom of co_code.
4. Execution never falls off the bottom of ``co_code``.
Implementation
==============
This PEP is the working document for a Python bytecode
verification implementation written in Python. This
implementation is not used implicitly by the PVM before executing
any bytecode, but is to be used explicitly by users concerned
about possibly invalid bytecode with the following snippet:
This PEP is the working document for a Python bytecode
verification implementation written in Python. This
implementation is not used implicitly by the PVM before executing
any bytecode, but is to be used explicitly by users concerned
about possibly invalid bytecode with the following snippet::
import verify
verify.verify(object)
import verify
verify.verify(object)
The `verify` module provides a `verify` function which accepts the
same kind of arguments as `dis.dis`: classes, methods, functions,
or code objects. It verifies that the object's bytecode is
well-formed according to the specifications of this PEP.
The ``verify`` module provides a ``verify`` function which accepts the
same kind of arguments as ``dis.dis``: classes, methods, functions,
or code objects. It verifies that the object's bytecode is
well-formed according to the specifications of this PEP.
If the code is well-formed the call to `verify` returns silently
without error. If an error is encountered, it throws a
'VerificationError' whose argument indicates the cause of the
failure. It is up to the programmer whether or not to handle the
error in some way or execute the invalid code regardless.
If the code is well-formed, the call to ``verify`` returns silently
without error. If an error is encountered, it raises a
``VerificationError`` whose argument indicates the cause of the
failure. It is up to the programmer whether or not to handle the
error in some way or execute the invalid code regardless.
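A usage sketch (``verify`` is the module proposed by this PEP, not
part of the standard library, and the exception is assumed here to
be importable as ``verify.VerificationError``)::

    import verify                        # module proposed by this PEP

    def f(x):
        return x + 1

    verify.verify(f)                     # well-formed: returns silently

    try:
        verify.verify(f.func_code)       # code objects are accepted too
    except verify.VerificationError, e:
        print "refusing to run ill-formed bytecode:", e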
Phillip Eby has proposed a pseudo-code algorithm for bytecode
stack depth verification used by the reference implementation.
Phillip Eby has proposed a pseudo-code algorithm for bytecode
stack depth verification used by the reference implementation.
Verification Issues
===================
This PEP describes only a small number of verifications. While
discussion and analysis will lead to many more, it is highly
possible that future verification may need to be done or custom,
project-specific verifications. For this reason, it might be
desirable to add a verification registration interface to the test
implementation to register future verifiers. The need for this is
minimal since custom verifiers can subclass and extend the current
implementation for added behavior.
This PEP describes only a small number of verifications. While
discussion and analysis will lead to many more, it is quite
possible that additional or custom, project-specific verifications
will need to be done in the future. For this reason, it might be
desirable to add a verification registration interface to the test
implementation to register future verifiers. The need for this is
minimal, since custom verifiers can subclass and extend the current
implementation for added behavior.
Required Changes
================
Armin Rigo noted that several byte-codes will need modification in
order for their stack effect to be statically analyzed. These are
END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have
already agreed on how to correct the instructions. Currently the
Python implementation punts on these instructions.
Armin Rigo noted that several byte-codes will need modification in
order for their stack effect to be statically analyzed. These are
``END_FINALLY``, ``POP_BLOCK``, and ``MAKE_CLOSURE``. Armin and Guido have
already agreed on how to correct the instructions. Currently the
Python implementation punts on these instructions.
This PEP does not propose to add the verification step to the
interpreter, but only to provide the Python implementation in the
standard library for optional use. Whether or not this
verification procedure is translated into C, included with the PVM
or enforced in any way is left for future discussion.
This PEP does not propose to add the verification step to the
interpreter, but only to provide the Python implementation in the
standard library for optional use. Whether or not this
verification procedure is translated into C, included with the PVM
or enforced in any way is left for future discussion.
References
==========
[1] The Java Virtual Machine Specification 2nd Edition
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
.. [1] The Java Virtual Machine Specification 2nd Edition
http://java.sun.com/docs/books/vmspec/2nd-edition/html/ClassFile.doc.html
Copyright
=========
This document has been placed in the public domain.
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
@ -5,205 +5,216 @@ Last-Modified: $Date$
Author: Christian R. Reis <kiko at async.com.br>
Status: Final
Type: Standards Track
Content-Type: text/plain
Content-Type: text/x-rst
Created: 19-Jul-2003
Python-Version: 2.4
Post-History: 21-Jul-2003, 13-Aug-2003, 18-Jun-2004
Abstract
========
Support for the LC_NUMERIC locale category in Python 2.3 is
implemented only in Python-space. This causes inconsistent
behavior and thread-safety issues for applications that use
extension modules and libraries implemented in C that parse and
generate floats from strings. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.
Support for the ``LC_NUMERIC`` locale category in Python 2.3 is
implemented only in Python-space. This causes inconsistent
behavior and thread-safety issues for applications that use
extension modules and libraries implemented in C that parse and
generate floats from strings. This document proposes a plan for
removing this inconsistency by providing and using substitute
locale-agnostic functions as necessary.
Introduction
============
Python provides generic localization services through the locale
module, which among other things allows localizing the display and
conversion process of numeric types. Locale categories, such as
LC_TIME and LC_COLLATE, allow configuring precisely what aspects
of the application are to be localized.
Python provides generic localization services through the locale
module, which among other things allows localizing the display and
conversion process of numeric types. Locale categories, such as
``LC_TIME`` and ``LC_COLLATE``, allow configuring precisely what aspects
of the application are to be localized.
The LC_NUMERIC category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the LC_NUMERIC category
is currently implemented only in Python-space; C libraries invoked
from the Python runtime are unaware of Python's LC_NUMERIC
setting. This is done to avoid changing the behavior of certain
low-level functions that are used by the Python parser and related
code [2].
The ``LC_NUMERIC`` category specifies formatting for non-monetary
numeric information, such as the decimal separator in float and
fixed-precision numbers. Localization of the ``LC_NUMERIC`` category
is currently implemented only in Python-space; C libraries invoked
from the Python runtime are unaware of Python's ``LC_NUMERIC``
setting. This is done to avoid changing the behavior of certain
low-level functions that are used by the Python parser and related
code [2]_.
However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.
However, this presents a problem for extension modules that wrap C
libraries. Applications that use these extension modules will
inconsistently display and convert floating-point values.
James Henstridge, the author of PyGTK [3], has additionally
pointed out that the setlocale() function also presents
thread-safety issues, since a thread may call the C library
setlocale() outside of the GIL, and cause Python to parse and
generate floats incorrectly.
James Henstridge, the author of PyGTK [3]_, has additionally
pointed out that the ``setlocale()`` function also presents
thread-safety issues, since a thread may call the C library
``setlocale()`` outside of the GIL, and cause Python to parse and
generate floats incorrectly.
Rationale
=========
The inconsistency between Python and C library localization for
LC_NUMERIC is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending
on the application, but it will most likely occur when parsing or
formatting a floating-point value.
The inconsistency between Python and C library localization for
``LC_NUMERIC`` is a problem for any localized application using C
extensions. The exact nature of the problem will vary depending
on the application, but it will most likely occur when parsing or
formatting a floating-point value.
Example Problem
===============
The initial problem that motivated this PEP is related to the
GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the
PyGTK module. The widget can be set to numeric mode, and when
this occurs, characters typed into it are evaluated as a number.
The initial problem that motivated this PEP is related to the
GtkSpinButton [4]_ widget in the GTK+ UI toolkit, wrapped by the
PyGTK module. The widget can be set to numeric mode, and when
this occurs, characters typed into it are evaluated as a number.
Problems occur when LC_NUMERIC is set to a locale with a float
separator that differs from the C locale's standard (for instance,
`,' instead of `.' for the Brazilian locale pt_BR). Because
LC_NUMERIC is not set at the libc level, float values are
displayed incorrectly (using `.' as a separator) in the
spinbutton's text entry, and it is impossible to enter fractional
values using the `,' separator.
Problems occur when ``LC_NUMERIC`` is set to a locale with a float
separator that differs from the C locale's standard (for instance,
',' instead of '.' for the Brazilian locale pt_BR). Because
``LC_NUMERIC`` is not set at the libc level, float values are
displayed incorrectly (using '.' as a separator) in the
spinbutton's text entry, and it is impossible to enter fractional
values using the ',' separator.
This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.
This small example demonstrates reduced usability for localized
applications using this toolkit when coded in Python.
Proposal
========
Martin v. Löwis commented on the initial constraints for an
acceptable solution to the problem on python-dev:
Martin v. Löwis commented on the initial constraints for an
acceptable solution to the problem on python-dev:
- LC_NUMERIC can be set at the C library level without
breaking the parser.
- float() and str() stay locale-unaware.
- locale-aware str() and atof() stay in the locale module.
- ``LC_NUMERIC`` can be set at the C library level without
breaking the parser.
- ``float()`` and ``str()`` stay locale-unaware.
- locale-aware ``str()`` and ``atof()`` stay in the locale module.
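Concretely, once those constraints are met an application should be
able to behave along the following lines (an illustrative sketch;
the locale name is system-dependent and must be installed)::

    import locale

    locale.setlocale(locale.LC_NUMERIC, 'pt_BR.UTF-8')

    print float('3.25'), str(3.25)   # float() and str() stay locale-unaware
    print locale.atof('3,25')        # locale-aware parsing    -> 3.25
    print locale.str(3.25)           # locale-aware formatting -> 3,25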
An analysis of the Python source suggests that the following
functions currently depend on LC_NUMERIC being set to the C
locale:
An analysis of the Python source suggests that the following
functions currently depend on ``LC_NUMERIC`` being set to the C
locale:
- Python/compile.c:parsenumber()
- Python/marshal.c:r_object()
- Objects/complexobject.c:complex_to_buf()
- Objects/complexobject.c:complex_subtype_from_string()
- Objects/floatobject.c:PyFloat_FromString()
- Objects/floatobject.c:format_float()
- Objects/stringobject.c:formatfloat()
- Modules/stropmodule.c:strop_atof()
- Modules/cPickle.c:load_float()
- ``Python/compile.c:parsenumber()``
- ``Python/marshal.c:r_object()``
- ``Objects/complexobject.c:complex_to_buf()``
- ``Objects/complexobject.c:complex_subtype_from_string()``
- ``Objects/floatobject.c:PyFloat_FromString()``
- ``Objects/floatobject.c:format_float()``
- ``Objects/stringobject.c:formatfloat()``
- ``Modules/stropmodule.c:strop_atof()``
- ``Modules/cPickle.c:load_float()``
The proposed approach is to implement LC_NUMERIC-agnostic
functions for converting from (strtod()/atof()) and to
(snprintf()) float formats, using these functions where the
formatting should not vary according to the user-specified locale.
The proposed approach is to implement ``LC_NUMERIC``-agnostic
functions for converting floats from strings (``strtod()``/``atof()``) and
to strings (``snprintf()``), using these functions where the
formatting should not vary according to the user-specified locale.
The locale module should also be changed to remove the
special-casing for LC_NUMERIC.
The locale module should also be changed to remove the
special-casing for ``LC_NUMERIC``.
This change should also solve the aforementioned thread-safety
problems.
This change should also solve the aforementioned thread-safety
problems.
Potential Code Contributions
============================
This problem was initially reported as a problem in the GTK+
libraries [5]; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not
to be confused with the GNU C library) implements a number of
LC_NUMERIC-agnostic functions (for an example, see [6]) for
reasons similar to those presented in this paper.
This problem was initially reported as a problem in the GTK+
libraries [5]_; since then it has been correctly diagnosed as an
inconsistency in Python's implementation. However, in a fortunate
coincidence, the glib library (developed primarily for GTK+, not
to be confused with the GNU C library) implements a number of
``LC_NUMERIC``-agnostic functions (for an example, see [6]_) for
reasons similar to those presented in this paper.
In the same GTK+ problem report, Havoc Pennington suggested that
the glib authors would be willing to contribute this code to the
PSF, which would simplify implementation of this PEP considerably.
Alex Larsson, the original author of the glib code, submitted a
PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code
could be safely integrated; this agreement has been received and
accepted.
In the same GTK+ problem report, Havoc Pennington suggested that
the glib authors would be willing to contribute this code to the
PSF, which would simplify implementation of this PEP considerably.
Alex Larsson, the original author of the glib code, submitted a
PSF Contributor Agreement [7]_ on 2003-08-20 [8]_ to ensure the code
could be safely integrated; this agreement has been received and
accepted.
Risks
=====
There may be cross-platform issues with the provided
locale-agnostic functions, though this risk is low given that the
code supplied simply reverses any locale-dependent changes made to
floating-point numbers.
There may be cross-platform issues with the provided
locale-agnostic functions, though this risk is low given that the
code supplied simply reverses any locale-dependent changes made to
floating-point numbers.
Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area
as members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been
mailed in to ensure this safety.
Martin and Guido pointed out potential copyright issues with the
contributed code. I believe we will have no problems in this area
as members of the GTK+ and glib teams have said they are fine with
relicensing the code, and a PSF contributor agreement has been
mailed in to ensure this safety.
Tim Peters has pointed out [9] that there are situations involving
threading in which the proposed change is insufficient to solve
the problem completely. A complete solution, however, does not
currently exist.
Tim Peters has pointed out [9]_ that there are situations involving
threading in which the proposed change is insufficient to solve
the problem completely. A complete solution, however, does not
currently exist.
Implementation
==============
An implementation was developed by Gustavo Carneiro <gjc at
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]
An implementation was developed by Gustavo Carneiro <gjc at
inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]_.
The final patch [11] was integrated into Python CVS by Martin v.
Löwis on 2004-06-08, as stated in the bug report.
The final patch [11]_ was integrated into Python CVS by Martin v.
Löwis on 2004-06-08, as stated in the bug report.
References
==========
[1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/dev/peps/pep-0001/
.. [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
http://www.python.org/dev/peps/pep-0001/
[2] Python locale documentation for embedding,
http://docs.python.org/library/locale.html
.. [2] Python locale documentation for embedding,
http://docs.python.org/library/locale.html
[3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
.. [3] PyGTK homepage, http://www.daa.com.au/~james/pygtk/
[4] GtkSpinButton screenshot (demonstrating problem),
http://www.async.com.br/~kiko/spin.png
.. [4] GtkSpinButton screenshot (demonstrating problem),
http://www.async.com.br/~kiko/spin.png
[5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
.. [5] GNOME bug report, http://bugzilla.gnome.org/show_bug.cgi?id=114132
[6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
renamed g_ascii_formatd) by Alex Larsson,
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
.. [6] Code submission of g_ascii_strtod and g_ascii_dtostr (later
renamed g_ascii_formatd) by Alex Larsson,
http://mail.gnome.org/archives/gtk-devel-list/2001-October/msg00114.html
[7] PSF Contributor Agreement,
http://www.python.org/psf/psf-contributor-agreement.html
.. [7] PSF Contributor Agreement,
https://www.python.org/psf/contrib/contrib-form/
[8] Alex Larsson's email confirming his agreement was mailed in,
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
.. [8] Alex Larsson's email confirming his agreement was mailed in,
http://mail.python.org/pipermail/python-dev/2003-August/037755.html
[9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
.. [9] Tim Peters' email summarizing LC_NUMERIC trouble with Spambayes,
http://mail.python.org/pipermail/python-dev/2003-September/037898.html
[10] Python bug report, http://www.python.org/sf/774665
.. [10] Python bug report, https://bugs.python.org/issue774665
[11] Integrated LC_NUMERIC-agnostic patch,
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
.. [11] Integrated LC_NUMERIC-agnostic patch,
https://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=89685&aid=774665
Copyright
=========
This document has been placed in the public domain.
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
@ -5,201 +5,245 @@ Last-Modified: $Date$
Author: Travis Oliphant <oliphant@ee.byu.edu>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Feb-2006
Python-Version: 2.5
Post-History:
Abstract
This PEP proposes adding an nb_index slot in PyNumberMethods and an
__index__ special method so that arbitrary objects can be used
whenever integers are explicitly needed in Python, such as in slice
syntax (from which the slot gets its name).
Abstract
========
This PEP proposes adding an ``nb_index`` slot in ``PyNumberMethods`` and an
``__index__`` special method so that arbitrary objects can be used
whenever integers are explicitly needed in Python, such as in slice
syntax (from which the slot gets its name).
Rationale
=========
Currently integers and long integers play a special role in
slicing in that they are the only objects allowed in slice
syntax. In other words, if X is an object implementing the
sequence protocol, then X[obj1:obj2] is only valid if obj1 and
obj2 are both integers or long integers. There is no way for obj1
and obj2 to tell Python that they could be reasonably used as
indexes into a sequence. This is an unnecessary limitation.
Currently integers and long integers play a special role in
slicing in that they are the only objects allowed in slice
syntax. In other words, if X is an object implementing the
sequence protocol, then ``X[obj1:obj2]`` is only valid if ``obj1`` and
``obj2`` are both integers or long integers. There is no way for ``obj1``
and ``obj2`` to tell Python that they could be reasonably used as
indexes into a sequence. This is an unnecessary limitation.
In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64
bits. These type-objects could reasonably be used as integers in
many places where Python expects true integers but cannot inherit from
the Python integer type because of incompatible memory layouts.
There should be some way to be able to tell Python that an object can
behave like an integer.
In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64
bits. These type-objects could reasonably be used as integers in
many places where Python expects true integers but cannot inherit from
the Python integer type because of incompatible memory layouts.
There should be some way to be able to tell Python that an object can
behave like an integer.
It is not possible to use the ``nb_int`` (and ``__int__`` special method)
for this purpose because that method is used to *coerce* objects
to integers. It would be inappropriate to allow every object that
can be coerced to an integer to be used as an integer everywhere
Python expects a true integer. For example, if ``__int__`` were used
to convert an object to an integer in slicing, then float objects
would be allowed in slicing and ``x[3.2:5.8]`` would not raise an error
as it should.
It is not possible to use the nb_int (and __int__ special method)
for this purpose because that method is used to *coerce* objects
to integers. It would be inappropriate to allow every object that
can be coerced to an integer to be used as an integer everywhere
Python expects a true integer. For example, if __int__ were used
to convert an object to an integer in slicing, then float objects
would be allowed in slicing and x[3.2:5.8] would not raise an error
as it should.
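For illustration, under this proposal an object of an otherwise
unrelated type could opt in to being used as a sequence index simply
by implementing the new special method (the class is hypothetical)::

    class NamedIndex(object):
        """A hypothetical object that is not an int but can act as an index."""
        def __init__(self, value, name):
            self.value = value
            self.name = name
        def __index__(self):
            # must return a true int (or long)
            return self.value

    seq = range(10)
    print seq[NamedIndex(2, 'start'):NamedIndex(7, 'stop')]   # [2, 3, 4, 5, 6]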
Proposal
Add an nb_index slot to PyNumberMethods, and a corresponding
__index__ special method. Objects could define a function to
place in the nb_index slot that returns a Python integer
(either an int or a long). This integer can
then be appropriately converted to a Py_ssize_t value whenever
Python needs one such as in PySequence_GetSlice,
PySequence_SetSlice, and PySequence_DelSlice.
========
Specification:
1) The nb_index slot will have the following signature
Add an ``nb_index`` slot to ``PyNumberMethods``, and a corresponding
``__index__`` special method. Objects could define a function to
place in the ``nb_index`` slot that returns a Python integer
(either an int or a long). This integer can
then be appropriately converted to a ``Py_ssize_t`` value whenever
Python needs one such as in ``PySequence_GetSlice``,
``PySequence_SetSlice``, and ``PySequence_DelSlice``.
Specification
=============
1) The ``nb_index`` slot will have the following signature::
PyObject *index_func (PyObject *self)
The returned object must be a Python ``IntType`` or
Python ``LongType``. NULL should be returned on
error with an appropriate error set.
2) The ``__index__`` special method will have the signature::
def __index__(self):
return obj
where obj must be either an int or a long.
3) 3 new abstract C-API functions will be added
a) The first checks to see if the object supports the index
slot and if it is filled in.
::
int PyIndex_Check(obj)
This will return true if the object defines the ``nb_index``
slot.
b) The second is a simple wrapper around the ``nb_index`` call that
raises ``PyExc_TypeError`` if the call is not available or if it
doesn't return an int or long. Because the
``PyIndex_Check`` is performed inside the ``PyNumber_Index`` call
you can call it directly and manage any error rather than
check for compatibility first.
::
PyObject *PyNumber_Index (PyObject *obj)
c) The third call helps deal with the common situation of
actually needing a ``Py_ssize_t`` value from the object to use for
indexing or other needs.
::
Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)
The function calls the ``nb_index`` slot of obj if it is
available and then converts the returned Python integer into
a ``Py_ssize_t`` value. If this goes well, then the value is
returned. The second argument allows control over what
happens if the integer returned from ``nb_index`` cannot fit
into a ``Py_ssize_t`` value.
If exc is NULL, then the returned value will be clipped to
``PY_SSIZE_T_MAX`` or ``PY_SSIZE_T_MIN`` depending on whether the
``nb_index`` slot of obj returned a positive or negative
integer. If exc is non-NULL, then it is the error object
that will be set to replace the ``PyExc_OverflowError`` that was
raised when the Python integer or long was converted to ``Py_ssize_t``.
4) A new ``operator.index(obj)`` function will be added that calls the
   equivalent of ``obj.__index__()`` and raises an error if obj does not
   implement the special method.
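
As an illustration only (this is not part of the specification), a
user-defined type could take advantage of the new slot from pure Python
roughly as follows; the ``Offset`` class is invented for the example::

    import operator

    class Offset(object):
        """A toy integer-like type that is not an int subclass."""
        def __init__(self, value):
            self.value = value
        def __index__(self):
            # Must return a true Python integer (int or long).
            return self.value

    data = [0, 1, 2, 3, 4, 5]
    data[Offset(2):Offset(5)]   # usable in slice syntax -> [2, 3, 4]
    operator.index(Offset(4))   # -> 4
    operator.index(3.2)         # raises TypeError: float has no __index__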
Implementation Plan
===================
1) Add the ``nb_index`` slot in ``object.h`` and modify ``typeobject.c`` to
   create the ``__index__`` method.
2) Change the ``ISINT`` macro in ``ceval.c`` to ``ISINDEX`` and alter it to
accommodate objects with the index slot defined.
3) Change the ``_PyEval_SliceIndex`` function to accommodate objects
with the index slot defined.
4) Change all builtin objects (e.g. lists) that use the ``as_mapping``
   slots for subscript access and special-case integers so that they
   check for the slot as well.
5) Add the ``nb_index`` slot to integers and long_integers
(which just return themselves)
6) Add ``PyNumber_Index`` C-API to return an integer from any
Python Object that has the ``nb_index`` slot.
7) Add the ``operator.index(x)`` function.
8) Alter ``arrayobject.c`` and ``mmapmodule.c`` to use the new C-API for their
sub-scripting and other needs.
9) Add unit-tests
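
For item 9, a minimal test along these lines (the class and test names
are invented here) could exercise the new behaviour at the Python level::

    import unittest

    class WithIndex(object):
        def __index__(self):
            return 2

    class IndexTest(unittest.TestCase):
        def test_slice_accepts_index_object(self):
            self.assertEqual([0, 1, 2, 3, 4][WithIndex():4], [2, 3])

        def test_float_is_still_rejected(self):
            self.assertRaises(TypeError, [0, 1, 2].__getitem__, 1.5)

    if __name__ == '__main__':
        unittest.main()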
Discussion Questions
====================
Speed
-----
Implementation should not slow down Python because integers and long
integers used as indexes will complete in the same number of
instructions. The only change will be that what used to generate
an error will now be acceptable.
Why not use ``nb_int`` which is already there?
----------------------------------------------
The ``nb_int`` method is used for coercion and so means something
fundamentally different from what is requested here. This PEP
proposes a method for something that *can* already be thought of as
an integer to communicate that information to Python when it needs an
integer. The biggest example of why using ``nb_int`` would be a bad
thing is that float objects already define the ``nb_int`` method, but
float objects *should not* be used as indexes in a sequence.
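
The distinction is easy to see interactively (session from a recent
CPython; the exact error message varies between versions)::

    >>> x = 3.2
    >>> int(x)            # __int__ (nb_int): explicit coercion is fine
    3
    >>> [0, 1, 2, 3][x]   # floats define no __index__, so indexing fails
    Traceback (most recent call last):
      ...
    TypeError: list indices must be integers or slices, not float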
Why the name ``__index__``?
---------------------------
Some questions were raised regarding the name ``__index__`` when other
interpretations of the slot are possible. For example, the slot
can be used any time Python requires an integer internally (such
as in "mystring" \* 3). The name was suggested by Guido because
slicing syntax is the biggest reason for having such a slot and
in the end no better name emerged. See the discussion thread [1]_
for examples of names that were suggested such as "``__discrete__``" and
"``__ordinal__``".
Why return ``PyObject *`` from ``nb_index``?
--------------------------------------------
Initially ``Py_ssize_t`` was selected as the return type for the
``nb_index`` slot. However, this led to an inability to track and
distinguish overflow and underflow errors without ugly and brittle
hacks. As the ``nb_index`` slot is used in at least 3 different ways
in the Python core (to get an integer, to get a slice end-point,
and to get a sequence index), there is quite a bit of flexibility
needed to handle all these cases. The importance of having the
necessary flexibility to handle all the use cases is critical.
For example, the initial implementation that returned ``Py_ssize_t`` for
``nb_index`` led to the discovery that on a 32-bit machine with >=2GB of RAM
``s = 'x' * (2**100)`` works but ``len(s)`` was clipped at 2147483647.
Several fixes were suggested but eventually it was decided that
``nb_index`` needed to return a Python Object similar to the ``nb_int``
and ``nb_long`` slots in order to handle overflow correctly.
Why can't ``__index__`` return any object with the ``nb_index`` method?
-----------------------------------------------------------------------
This would allow infinite recursion in many different ways that are not
easy to check for. This restriction is similar to the requirement that
``__nonzero__`` return an int or a bool.
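
A small sketch of the failure mode this restriction avoids (the class
name is invented; the message shown is from a recent CPython)::

    >>> class Evil(object):
    ...     def __index__(self):
    ...         return Evil()    # not a real int
    ...
    >>> [1, 2, 3][Evil()]
    Traceback (most recent call last):
      ...
    TypeError: __index__ returned non-int (type Evil)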
Reference Implementation
========================
Submitted as patch 1436368 to SourceForge.
References
==========
.. [1] Travis Oliphant, PEP for adding an sq_index slot so that any object, a
or b, can be used in X[a:b] notation,
http://mail.python.org/pipermail/python-dev/2006-February/thread.html#60594
Copyright
=========
This document is placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

View File

@ -5,210 +5,235 @@ Last-Modified: $Date$
Author: Jim J. Jewett <jimjjewett@gmail.com>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 22-Apr-2007
Python-Version: 3.0
Post-History: 22-Apr-2007
Rejection Notice
================
This PEP is rejected.  It is not clear how it should be
implemented or what the precise semantics should be in edge cases,
and there aren't enough important use cases given.  Response has
been lukewarm at best.
Abstract
========
It is common to need a reference to the current module, class,
or function, but there is currently no entirely correct way to
do this. This PEP proposes adding the keywords ``__module__``,
``__class__``, and ``__function__``.
Rationale for __module__
========================
Many modules export various functions, classes, and other objects,
but will perform additional activities (such as running unit
tests) when run as a script.  The current idiom is to test whether
the module's name has been set to a magic value.
::

    if __name__ == "__main__": ...

More complicated introspection requires a module to (attempt to)
import itself.  If importing the expected name actually produces
a different module, there is no good workaround.

::

    # __import__ lets you use a variable, but... it gets more
    # complicated if the module is in a package.
    __import__(__name__)

    # So just go to sys modules... and hope that the module wasn't
    # hidden/removed (perhaps for security), that __name__ wasn't
    # changed, and definitely hope that no other module with the
    # same name is now available.
    class X(object):
        pass

    import sys
    mod = sys.modules[__name__]
    mod = sys.modules[X.__class__.__module__]

Proposal: Add a ``__module__`` keyword which refers to the module
currently being defined (executed).  (But see open issues.)

::

    # XXX sys.main is still changing as draft progresses.  May
    # really need sys.modules[sys.main]
    if __module__ is sys.main:    # assumes PEP (3122), Cannon
        ...
Rationale for __class__
=======================
Class methods are passed the current instance; from this they can
determine ``self.__class__`` (or cls, for class methods).
Unfortunately, this reference is to the object's actual class,
which may be a subclass of the defining class. The current
workaround is to repeat the name of the class, and assume that the
name will not be rebound.
::

    class C(B):
        def meth(self):
            super(C, self).meth()  # Hope C is never rebound.

    class D(C):
        def meth(self):
            # ?!? issubclass(D,C), so it "works":
            super(C, self).meth()

Proposal: Add a ``__class__`` keyword which refers to the class
currently being defined (executed).  (But see open issues.)

::

    class C(B):
        def meth(self):
            super(__class__, self).meth()
Note that super calls may be further simplified by the "New Super"
PEP (Spealman). The ``__class__`` (or ``__this_class__``) attribute came
up in attempts to simplify the explanation and/or implementation
of that PEP, but was separated out as an independent decision.
Note that ``__class__`` (or ``__this_class__``) is not quite the same as
the ``__thisclass__`` property on bound super objects. The existing
``super.__thisclass__`` property refers to the class from which the
Method Resolution Order search begins. In the above class D, it
would refer to (the current reference of name) C.
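
The difference can be seen with the attribute that already exists
today; a small, hedged sketch::

    class B(object):
        def meth(self):
            return "B.meth"

    class C(B):
        def meth(self):
            s = super(C, self)
            # s.__thisclass__ is C -- the class after which the MRO search
            # starts -- even when self is an instance of the subclass D.
            return s.__thisclass__, s.meth()

    class D(C):
        pass

    print(D().meth())    # roughly: (<class 'C'>, 'B.meth')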
Rationale for ``__function__``
==============================
Functions (including methods) often want access to themselves,
usually for a private storage location or true recursion. While
there are several workarounds, all have their drawbacks.
::

    def counter(_total=[0]):
        # _total shouldn't really appear in the
        # signature at all; the list wrapping and
        # [0] unwrapping obscure the code
        _total[0] += 1
        return _total[0]

    @annotate(total=0)
    def counter():
        # Assume name counter is never rebound:
        counter.total += 1
        return counter.total

    # class exists only to provide storage:
    class _wrap(object):
        __total = 0

        def f(self):
            self.__total += 1
            return self.__total

    # set module attribute to a bound method:
    accum = _wrap().f

    # This function calls "factorial", which should be itself --
    # but the same programming styles that use heavy recursion
    # often have a greater willingness to rebind function names.
    def factorial(n):
        return (n * factorial(n-1) if n else 1)

Proposal: Add a ``__function__`` keyword which refers to the function
(or method) currently being defined (executed).  (But see open
issues.)

::

    @annotate(total=0)
    def counter():
        # Always refers to this function obj:
        __function__.total += 1
        return __function__.total

    def factorial(n):
        return (n * __function__(n-1) if n else 1)
Backwards Compatibility
=======================
While a user could be using these names already, double-underscore
names ( ``__anything__`` ) are explicitly reserved to the interpreter.
It is therefore acceptable to introduce special meaning to these
names within a single feature release.
Implementation
==============
Ideally, these names would be keywords treated specially by the
bytecode compiler.
Guido has suggested [1]_ using a cell variable filled in by the
metaclass.
Michele Simionato has provided a prototype using bytecode hacks [2]_.
This does not require any new bytecode operators; it just
modifies which specific sequence of existing operators gets
run.
Open Issues
===========
- Are ``__module__``, ``__class__``, and ``__function__`` the right names? In
particular, should the names include the word "this", either as
``__this_module__``, ``__this_class__``, and ``__this_function__``, (format
discussed on the python-3000 and python-ideas lists) or as
``__thismodule__``, ``__thisclass__``, and ``__thisfunction__`` (inspired
by, but conflicting with, current usage of ``super.__thisclass__``).
- Are all three keywords needed, or should this enhancement be
limited to a subset of the objects? Should methods be treated
separately from other functions?
References
==========
.. [1] Fixing super anyone? Guido van Rossum
http://mail.python.org/pipermail/python-3000/2007-April/006671.html
.. [2] Descriptor/Decorator challenge, Michele Simionato
http://groups.google.com/group/comp.lang.python/browse_frm/thread/a6010c7494871bb1/62a2da68961caeb6?lnk=gst&q=simionato+challenge&rnum=1&hl=en#62a2da68961caeb6
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

View File

@ -3,203 +3,221 @@ Title: str(container) should call str(item), not repr(item)
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytman <phd@phdru.name>,
Jim J. Jewett <jimjjewett@gmail.com>
Discussions-To: python-3000@python.org
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 27-May-2008
Post-History: 28-May-2008
Rejection
=========
Guido said this would cause too much disturbance too close to beta. See [1]_.
Abstract
========
This document discusses the advantages and disadvantages of the
current implementation of ``str(container)``. It also discusses the
pros and cons of a different approach - to call ``str(item)`` instead
of ``repr(item)``.
Motivation
==========
Currently ``str(container)`` calls ``repr`` on items.  Arguments for it:

* containers refuse to guess what the user wants to see on
  ``str(container)`` - surroundings, delimiters, and so on;
* ``repr(item)`` usually displays type information - apostrophes
  around strings, class names, etc.
Arguments against:
* it's illogical; ``str()`` is expected to call ``__str__`` if it exists,
not ``__repr__``;
* there is no standard way to print a container's content calling
items' ``__str__``, that's inconvenient in cases where ``__str__`` and
``__repr__`` return different results;
* ``repr(item)`` sometimes does wrong things (hex-escapes non-ASCII strings,
  e.g.)

This PEP proposes to change how ``str(container)`` works. It is
proposed to mimic how ``repr(container)`` works except for one detail - call
``str`` on items instead of ``repr``. This allows a user to choose
what results she wants to get - from ``item.__repr__`` or ``item.__str__``.
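
To make the proposed difference concrete, here is the current behaviour
next to the proposed one emulated by hand (Python 3 session shown)::

    >>> from decimal import Decimal
    >>> items = [Decimal('3'), 'spam']
    >>> str(items)                   # today: repr() is used on each item
    "[Decimal('3'), 'spam']"
    >>> '[%s]' % ', '.join(str(item) for item in items)   # proposed result
    '[3, spam]'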
Current situation
=================
Most container types (tuples, lists, dicts, sets, etc.) do not
implement the ``__str__`` method, so ``str(container)`` calls
``container.__repr__``, and ``container.__repr__``, once called, forgets
it is called from ``str`` and always calls ``repr`` on the container's
items.
This behaviour has advantages and disadvantages. One advantage is
that most items are represented with type information - strings
are surrounded by apostrophes, instances may have both class name
and instance data::
>>> print([42, '42'])
[42, '42']
>>> print([Decimal('42'), datetime.now()])
[Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]
The disadvantage is that ``__repr__`` often returns technical data
(like '``<object at address>``') or an unreadable string (hex-encoded
string if the input is a non-ASCII string)::
>>> print(['тест'])
['\xd4\xc5\xd3\xd4']
One of the motivations for PEP 3138 is that neither ``repr`` nor ``str``
will allow the sensible printing of dicts whose keys are non-ASCII
text strings. Now that Unicode identifiers are allowed, it
includes Python's own attribute dicts. This also includes JSON
serialization (and caused some hoops for the json lib).
PEP 3138 proposes to fix this by breaking the "repr is safe ASCII"
invariant, and changing the way ``repr`` (which is used for
persistence) outputs some objects, with system-dependent failures.
Changing how ``str(container)`` works would allow easy debugging in
the normal case, and retain the safety of ASCII-only for the
machine-readable case. The only downside is that ``str(x)`` and
``repr(x)`` would more often be different -- but only in those cases
where the current almost-the-same version is insufficient.
It also seems illogical that ``str(container)`` calls ``repr`` on items
instead of ``str``. It's only logical to expect the following code::
class Test:
def __str__(self):
return "STR"
def __repr__(self):
return "REPR"
test = Test()
print(test)
print(repr(test))
print([test])
print(str([test]))
to print::
STR
REPR
[STR]
[STR]
where it actually prints::
STR
REPR
[REPR]
[REPR]
Especially it is illogical to see that print in Python 2 uses ``str``
if it is called on what seems to be a tuple::
>>> print Decimal('42'), datetime.now()
42 2008-05-27 20:16:22.534285
where on an actual tuple it prints::
>>> print((Decimal('42'), datetime.now()))
(Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))
A different approach - call ``str(item)``
=========================================
For example, with numbers it is often only the value that people
care about.
::

    >>> print Decimal('3')
    3

But putting the value in a list forces users to read the type
information, exactly as if ``repr`` had been called for the benefit of
a machine::

    >>> print [Decimal('3')]
    [Decimal("3")]

After this change, the type information would not clutter the ``str``
output::

    >>> print "%s".format([Decimal('3')])
    [3]
    >>> str([Decimal('3')])  # ==
    [3]

But it would still be available if desired::

    >>> print "%r".format([Decimal('3')])
    [Decimal('3')]
    >>> repr([Decimal('3')])  # ==
    [Decimal('3')]
There are a number of strategies to fix the problem. The most
radical is to change ``__repr__`` so it accepts a new parameter (flag)
"called from ``str``, so call ``str`` on items, not ``repr``". The
drawback of the proposal is that every ``__repr__`` implementation
must be changed. Introspection could help a bit (inspect ``__repr__``
before calling if it accepts 2 or 3 parameters), but introspection
doesn't work on classes written in C, like all built-in containers.
A less radical proposal is to implement ``__str__`` methods for built-in
container types. The obvious drawback is a duplication of effort - all
those ``__str__`` and ``__repr__`` implementations differ only
in one small detail - whether they call ``str`` or ``repr`` on items.
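
A rough sketch of what that proposal would mean for one container type
(``StrList`` is an invented name; the real built-ins would do the
equivalent in C)::

    from decimal import Decimal

    class StrList(list):
        def __str__(self):
            # Like list.__repr__, but calls str() on the items.
            return '[%s]' % ', '.join(str(item) for item in self)

    items = StrList([Decimal('3'), 'spam'])
    print(str(items))     # [3, spam]
    print(repr(items))    # [Decimal('3'), 'spam']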
The most conservative proposal is not to change str at all but
to allow developers to implement their own application- or
library-specific pretty-printers. The drawback is again
a multiplication of effort and proliferation of many small
specific container-traversal algorithms.
Backward compatibility
======================
In those cases where type information is more important than
usual, it will still be possible to get the current results by
calling ``repr`` explicitly.
References
==========
.. [1] Guido van Rossum, PEP: str(container) should call str(item), not
repr(item)
http://mail.python.org/pipermail/python-3000/2008-May/013876.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: