New version of PEP 349. Propose that str() be changed rather than
adding a new built-in function.
Neil Schemenauer 2005-08-22 21:12:08 +00:00
parent 35b61a7c94
commit fb89a4ee52
2 changed files with 46 additions and 80 deletions

@@ -105,7 +105,7 @@ Index by Category
  S 345 Metadata for Python Software Packages 1.2 Jones
  P 347 Migrating the Python CVS to Subversion von Löwis
  S 348 Exception Reorganization for Python 3.0 Cannon
- S 349 Generalized String Coercion Schemenauer
+ S 349 Allow str() to return unicode strings Schemenauer
  S 754 IEEE 754 Floating Point Special Values Warnes

  Finished PEPs (done, implemented in CVS)
@@ -393,7 +393,7 @@ Numerical Index
  SR 346 User Defined ("with") Statements Coghlan
  P 347 Migrating the Python CVS to Subversion von Löwis
  S 348 Exception Reorganization for Python 3.0 Cannon
- S 349 Generalized String Coercion Schemenauer
+ S 349 Allow str() to return unicode strings Schemenauer
  SR 666 Reject Foolish Indentation Creighton
  S 754 IEEE 754 Floating Point Special Values Warnes
  I 3000 Python 3.0 Plans Kuchling, Cannon

@@ -1,5 +1,5 @@
 PEP: 349
-Title: Generalised String Coercion
+Title: Allow str() to return unicode strings
 Version: $Revision$
 Last-Modified: $Date$
 Author: Neil Schemenauer <nas@arctrix.com>
@@ -7,20 +7,18 @@ Status: Draft
 Type: Standards Track
 Content-Type: text/plain
 Created: 02-Aug-2005
-Post-History:
+Post-History: 06-Aug-2005
 Python-Version: 2.5


 Abstract

-    This PEP proposes the introduction of a new built-in function,
-    text(), that provides a way of generating a string representation
-    of an object without forcing the result to be a particular string
-    type. In addition, the behavior %s format specifier would be
-    changed to call text() on the argument. These two changes would
-    make it easier to write library code that can be used by
-    applications that use only the str type and by others that also use
-    the unicode type.
+    This PEP proposes to change the str() built-in function so that it
+    can return unicode strings. This change would make it easier to
+    write code that works with either string type and would also make
+    some existing code handle unicode strings. The C function
+    PyObject_Str() would remain unchanged and the function
+    PyString_New() would be added instead.


 Rationale
@@ -64,51 +62,35 @@ Rationale
     object; an operation traditionally accomplished by using the str()
     built-in function.

-    Using str() makes the code not Unicode-safe. Replacing a str()
-    call with a unicode() call makes the code not str-stable. Using a
-    string format almost accomplishes the goal but not quite.
-    Consider the following code:
-
-        def text(obj):
-            return '%s' % obj
-
-    It behaves as desired except if 'obj' is not a basestring instance
-    and needs to return a Unicode representation of itself. In that
-    case, the string format will attempt to coerce the result of
-    __str__ to a str instance. Defining a __unicode__ method does not
-    help since it will only be called if the right-hand operand is a
-    unicode instance. Using a unicode instance for the right-hand
-    operand does not work because the function is no longer str-stable
-    (i.e. it will coerce everything to unicode).
+    Using the current str() function makes the code not Unicode-safe.
+    Replacing a str() call with a unicode() call makes the code not
+    str-stable. Changing str() so that it could return unicode
+    instances would solve this problem. As a further benefit, some code
+    that is currently not Unicode-safe because it uses str() would
+    become Unicode-safe.


 Specification

-    A Python implementation of the text() built-in follows:
+    A Python implementation of the str() built-in follows:

-        def text(s):
+        def str(s):
             """Return a nice string representation of the object. The
-            return value is a basestring instance.
+            return value is a str or unicode instance.
             """
-            if isinstance(s, basestring):
+            if type(s) is str or type(s) is unicode:
                 return s
             r = s.__str__()
-            if not isinstance(r, basestring):
+            if not isinstance(r, (str, unicode)):
                 raise TypeError('__str__ returned non-string')
             return r

-    Note that it is currently possible, although not very useful, to
-    write __str__ methods that return unicode instances.
-
-    The %s format specifier for str objects would be changed to call
-    text() on the argument. Currently it calls str() unless the
-    argument is a unicode instance (in which case the object is
-    substituted as is and the % operation returns a unicode instance).
-
     The following function would be added to the C API and would be the
-    equivalent of the text() function:
+    equivalent to the str() built-in (ideally it be called PyObject_Str,
+    but changing that function could cause a massive number of
+    compatibility problems):

-        PyObject *PyObject_Text(PyObject *o);
+        PyObject *PyString_New(PyObject *);

     A reference implementation is available on Sourceforge [1] as a
     patch.
@@ -116,52 +98,36 @@ Specification

 Backwards Compatibility

-    The change to the %s format specifier would result in some %
-    operations returning a unicode instance rather than raising a
-    UnicodeDecodeError exception. It seems unlikely that the change
-    would break currently working code.
+    Some code may require that str() returns a str instance. In the
+    standard library, only one such case has been found so far. The
+    function email.header_decode() requires a str instance and the
+    email.Header.decode_header() function tries to ensure this by
+    calling str() on its argument. The code was fixed by changing
+    the line "header = str(header)" to:
+
+        if isinstance(header, unicode):
+            header = header.encode('ascii')
+
+    Whether this is truly a bug is questionable since decode_header()
+    really operates on byte strings, not character strings. Code that
+    passes it a unicode instance could itself be considered buggy.


 Alternative Solutions

-    Rather than adding the text() built-in, if PEP 246 were
-    implemented then adapt(s, basestring) could be equivalent to
-    text(s). The advantage would be one less built-in function. The
-    problem is that PEP 246 is not implemented.
+    A new built-in function could be added instead of changing str().
+    Doing so would introduce virtually no backwards compatibility
+    problems. However, since the compatibility problems are expected to
+    rare, changing str() seems preferable to adding a new built-in.

-    Fredrik Lundh has suggested [2] that perhaps a new slot should be
-    added (e.g. __text__), that could return any kind of string that's
-    compatible with Python's text model. That seems like an
-    attractive idea but many details would still need to be worked
-    out.
-
-    Instead of providing the text() built-in, the %s format specifier
-    could be changed and a string format could be used instead of
-    calling text(). However, it seems like the operation is important
-    enough to justify a built-in.
-
-    Instead of providing the text() built-in, the basestring type
-    could be changed to provide the same functionality. That would
-    possibly be confusing behaviour for an abstract base type.
-
-    Some people have suggested [3] that an easier migration path would
-    be to change the default encoding to be UTF-8. Code that is not
-    Unicode safe would then encode Unicode strings as UTF-8 and
-    operate on them as str instances, rather than raising a
-    UnicodeDecodeError exception. Other code would assume that str
-    instances were encoded using UTF-8 and decode them if necessary.
-    While that solution may work for some applications, it seems
-    unsuitable as a general solution. For example, some applications
-    get string data from many different sources and assuming that all
-    str instances were encoded using UTF-8 could easily introduce
-    subtle bugs.
+    The basestring type could be changed to have the proposed behaviour,
+    rather than changing str(). However, that would be confusing
+    behaviour for an abstract base type.


 References

-    [1] http://www.python.org/sf/1159501
-    [2] http://mail.python.org/pipermail/python-dev/2004-September/048755.html
-    [3] http://blog.ianbicking.org/illusive-setdefaultencoding.html
+    [1] http://www.python.org/sf/1266570


 Copyright
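
Illustrative note (not part of the commit): the sketch below is one way to try out the str() behaviour proposed in the Specification hunk on a current interpreter. It is a rough model under stated assumptions, not the reference implementation from the patch. It assumes Python 3, with bytes standing in for the Python 2 str type and Python 3 str standing in for unicode. The names proposed_str, as_byte_string, Label, BYTE_STRING and TEXT_STRING are hypothetical and appear nowhere in the PEP.

```python
# Python 3 model of the str() semantics proposed in this commit.
# Assumption: Python 3 `bytes` plays the role of the Python 2 `str` type,
# and Python 3 `str` plays the role of the Python 2 `unicode` type.
BYTE_STRING = bytes   # stand-in for Python 2 str
TEXT_STRING = str     # stand-in for Python 2 unicode


def proposed_str(s):
    """Hypothetical model of the proposed built-in.

    Exact instances of either string type pass through unchanged; any
    other object is asked for __str__(), and the result may be of
    either string type.
    """
    if type(s) is BYTE_STRING or type(s) is TEXT_STRING:
        return s
    r = s.__str__()
    if not isinstance(r, (BYTE_STRING, TEXT_STRING)):
        raise TypeError('__str__ returned non-string')
    return r


def as_byte_string(header):
    """Guard modelled on the decode_header() fix quoted in the
    Backwards Compatibility hunk: code that needs a byte string
    encodes text explicitly instead of relying on str()."""
    if isinstance(header, TEXT_STRING):
        header = header.encode('ascii')
    return header


class Label:
    """Hypothetical object whose __str__ returns text."""

    def __str__(self):
        return 'caf\u00e9'


if __name__ == '__main__':
    print(repr(proposed_str(b'abc')))       # byte string passes through
    print(repr(proposed_str(Label())))      # text result is not coerced
    print(repr(as_byte_string('Subject')))  # explicit ASCII encoding
```

The model shows the compatibility hazard the PEP describes: a caller can no longer assume the result is a byte string, so code that requires one needs an explicit guard such as as_byte_string().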