New version of PEP 349. Propose that str() be changed rather than
adding a new built-in function.
2005-08-22 21:12:08 +00:00
parent 35b61a7c94
commit fb89a4ee52
2 changed files with 46 additions and 80 deletions
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -105,7 +105,7 @@ Index by Category
 S   345  Metadata for Python Software Packages 1.2    Jones
 P   347  Migrating the Python CVS to Subversion       von Löwis
 S   348  Exception Reorganization for Python 3.0      Cannon
- S   349  Generalized String Coercion                  Schemenauer
+ S   349  Allow str() to return unicode strings        Schemenauer
 S   754  IEEE 754 Floating Point Special Values       Warnes
 Finished PEPs (done, implemented in CVS)
@@ -393,7 +393,7 @@ Numerical Index
 SR  346  User Defined ("with") Statements             Coghlan
 P   347  Migrating the Python CVS to Subversion       von Löwis
 S   348  Exception Reorganization for Python 3.0      Cannon
- S   349  Generalized String Coercion                  Schemenauer
+ S   349  Allow str() to return unicode strings        Schemenauer
 SR  666  Reject Foolish Indentation                   Creighton
 S   754  IEEE 754 Floating Point Special Values       Warnes
 I  3000  Python 3.0 Plans                             Kuchling, Cannon
--- a/pep-0349.txt
+++ b/pep-0349.txt
@@ -1,5 +1,5 @@
 PEP: 349
-Title: Generalised String Coercion
+Title: Allow str() to return unicode strings
 Version: $Revision$
 Last-Modified: $Date$
 Author: Neil Schemenauer <nas@arctrix.com>
@@ -7,20 +7,18 @@ Status: Draft
 Type: Standards Track
 Content-Type: text/plain
 Created: 02-Aug-2005
-Post-History:
+Post-History: 06-Aug-2005
 Python-Version: 2.5
 Abstract
-    This PEP proposes the introduction of a new built-in function,
+    This PEP proposes to change the str() built-in function so that it
-    text(), that provides a way of generating a string representation
+    can return unicode strings.  This change would make it easier to
-    of an object without forcing the result to be a particular string
+    write code that works with either string type and would also make
-    type.  In addition, the behavior of the %s format specifier would be
+    some existing code handle unicode strings.  The C function
-    changed to call text() on the argument.  These two changes would
+    PyObject_Str() would remain unchanged and the function
-    make it easier to write library code that can be used by
+    PyString_New() would be added instead.
    applications that use only the str type and by others that also use
    the unicode type.
 Rationale
@@ -64,51 +62,35 @@ Rationale
    object; an operation traditionally accomplished by using the str()
    built-in function.
-    Using str() makes the code not Unicode-safe.  Replacing a str()
+    Using the current str() function makes the code not Unicode-safe.
-    call with a unicode() call makes the code not str-stable.  Using a
+    Replacing a str() call with a unicode() call makes the code not
-    string format almost accomplishes the goal but not quite.
+    str-stable.  Changing str() so that it could return unicode
-    Consider the following code:
+    instances would solve this problem.  As a further benefit, some code
-
+    that is currently not Unicode-safe because it uses str() would
-        def text(obj):
+    become Unicode-safe.
            return '%s' % obj
    It behaves as desired except if 'obj' is not a basestring instance
    and needs to return a Unicode representation of itself.  In that
    case, the string format will attempt to coerce the result of
    __str__ to a str instance.  Defining a __unicode__ method does not
    help since it will only be called if the right-hand operand is a
    unicode instance.  Using a unicode instance for the right-hand
    operand does not work because the function is no longer str-stable
    (i.e. it will coerce everything to unicode).
 Specification
-    A Python implementation of the text() built-in follows:
+    A Python implementation of the str() built-in follows:
-        def text(s):
+        def str(s):
            """Return a nice string representation of the object.  The
-            return value is a basestring instance.
+            return value is a str or unicode instance.
            """
-            if isinstance(s, basestring):
+            if type(s) is str or type(s) is unicode:
                return s
            r = s.__str__()
-            if not isinstance(r, basestring):
+            if not isinstance(r, (str, unicode)):
                raise TypeError('__str__ returned non-string')
            return r
    Note that it is currently possible, although not very useful, to
    write __str__ methods that return unicode instances.
    The %s format specifier for str objects would be changed to call
    text() on the argument.  Currently it calls str() unless the
    argument is a unicode instance (in which case the object is
    substituted as is and the % operation returns a unicode instance).
    The following function would be added to the C API and would be the
-    equivalent of the text() function:
+    equivalent to the str() built-in (ideally it would be called PyObject_Str,
    but changing that function could cause a massive number of
    compatibility problems):
-        PyObject *PyObject_Text(PyObject *o);
+        PyObject *PyString_New(PyObject *);
    A reference implementation is available on Sourceforge [1] as a
    patch.
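
The proposed str() semantics in the hunk above can be approximated on a
modern interpreter. The following is a minimal sketch, not the reference
implementation. It assumes Python 3, where bytes stands in for the Python 2
str type and str stands in for unicode. The names proposed_str, BYTE_STRING,
TEXT_STRING, ByteName, and TextName are hypothetical and do not appear in
the PEP.

```python
# Sketch only: Python 3 stand-ins for the two Python 2 string types.
BYTE_STRING = bytes   # plays the role of Python 2 'str'
TEXT_STRING = str     # plays the role of Python 2 'unicode'


def proposed_str(obj):
    """Return obj's string form without forcing one string type."""
    # Exact instances of either string type pass through unchanged.
    if type(obj) is BYTE_STRING or type(obj) is TEXT_STRING:
        return obj
    # Call __str__ directly so the result is not coerced to one type.
    result = obj.__str__()
    if not isinstance(result, (BYTE_STRING, TEXT_STRING)):
        raise TypeError('__str__ returned non-string')
    return result


class ByteName:
    """Hypothetical object whose __str__ yields a byte string."""

    def __str__(self):
        return b'von Loewis'


class TextName:
    """Hypothetical object whose __str__ yields a text string."""

    def __str__(self):
        return 'von L\u00f6wis'
```

Under these assumptions, proposed_str(TextName()) hands back the text
string as is, proposed_str(ByteName()) hands back the byte string, and an
object whose __str__ returns neither type raises TypeError.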
@@ -116,52 +98,36 @@ Specification
 Backwards Compatibility
-    The change to the %s format specifier would result in some %
+    Some code may require that str() returns a str instance.  In the
-    operations returning a unicode instance rather than raising a
+    standard library, only one such case has been found so far.  The
-    UnicodeDecodeError exception.  It seems unlikely that the change
+    function email.header_decode() requires a str instance and the
-    would break currently working code.
+    email.Header.decode_header() function tries to ensure this by
    calling str() on its argument.  The code was fixed by changing
    the line "header = str(header)" to:
        if isinstance(header, unicode):
            header = header.encode('ascii')
    Whether this is truly a bug is questionable since decode_header()
    really operates on byte strings, not character strings.  Code that
    passes it a unicode instance could itself be considered buggy.
 Alternative Solutions
-    Rather than adding the text() built-in, if PEP 246 were
+    A new built-in function could be added instead of changing str().
-    implemented then adapt(s, basestring) could be equivalent to
+    Doing so would introduce virtually no backwards compatibility
-    text(s).  The advantage would be one less built-in function.  The
+    problems.  However, since the compatibility problems are expected to
-    problem is that PEP 246 is not implemented.
+    be rare, changing str() seems preferable to adding a new built-in.
-    Fredrik Lundh has suggested [2] that perhaps a new slot should be
+    The basestring type could be changed to have the proposed behaviour,
-    added (e.g. __text__), that could return any kind of string that's
+    rather than changing str().  However, that would be confusing
-    compatible with Python's text model.  That seems like an
+    behaviour for an abstract base type.
    attractive idea but many details would still need to be worked
    out.
    Instead of providing the text() built-in, the %s format specifier
    could be changed and a string format could be used instead of
    calling text().  However, it seems like the operation is important
    enough to justify a built-in.
    Instead of providing the text() built-in, the basestring type
    could be changed to provide the same functionality.  That would
    possibly be confusing behaviour for an abstract base type.
    Some people have suggested [3] that an easier migration path would
    be to change the default encoding to be UTF-8.  Code that is not
    Unicode safe would then encode Unicode strings as UTF-8 and
    operate on them as str instances, rather than raising a
    UnicodeDecodeError exception.  Other code would assume that str
    instances were encoded using UTF-8 and decode them if necessary.
    While that solution may work for some applications, it seems
    unsuitable as a general solution.  For example, some applications
    get string data from many different sources and assuming that all
    str instances were encoded using UTF-8 could easily introduce
    subtle bugs.
 References
-    [1] http://www.python.org/sf/1159501
+    [1] http://www.python.org/sf/1266570
    [2] http://mail.python.org/pipermail/python-dev/2004-September/048755.html
    [3] http://blog.ianbicking.org/illusive-setdefaultencoding.html
 Copyright
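
The decode_header() fix described under Backwards Compatibility can be
sketched in isolation. This is an illustration under stated assumptions,
not the patched library code. It targets Python 3, where str plays the role
of the Python 2 unicode type and bytes plays the role of the Python 2 str
type. The helper name normalize_header is hypothetical.

```python
def normalize_header(header):
    """Return header as a byte string, mirroring the patched line.

    Assumption: Python 3 'str' stands in for Python 2 'unicode'.
    """
    # Only character strings are encoded; byte strings pass through.
    if isinstance(header, str):
        header = header.encode('ascii')
    return header
```

A character string containing non-ASCII characters raises
UnicodeEncodeError here, which matches the PEP's observation that
decode_header() really operates on byte strings.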