PEP: 349
Title: Allow str() to return unicode strings
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas@arctrix.com>
Status: Deferred
Type: Standards Track
Content-Type: text/plain
Created: 02-Aug-2005
Python-Version: 2.5
Post-History: 06-Aug-2005


Abstract

    This PEP proposes to change the str() built-in function so that it
    can return unicode strings.  This change would make it easier to
    write code that works with either string type and would also make
    some existing code handle unicode strings.  The C function
    PyObject_Str() would remain unchanged and the function
    PyString_New() would be added instead.


Rationale

    Python has had a Unicode string type for some time now but use of
    it is not yet widespread.  There is a large amount of Python code
    that assumes that string data is represented as str instances.
    The long term plan for Python is to phase out the str type and use
    unicode for all string data.  Clearly, a smooth migration path
    must be provided.

    Existing libraries, written for str instances, need to be upgraded
    so that they can operate in an all-unicode string world.  We can't
    change to an all-unicode world until all essential libraries are
    capable of it.  Upgrading the libraries in one shot does not seem
    feasible.  A more realistic strategy is to make the libraries
    capable of operating on unicode strings one at a time, while
    preserving their current behaviour in an all-str environment.

    First, we need to be able to write code that can accept unicode
    instances without attempting to coerce them to str instances.  Let
    us label such code as Unicode-safe.  Unicode-safe libraries can be
    used in an all-unicode world.

    Second, we need to be able to write code that, when provided only
    str instances, will not create unicode results.  Let us label such
    code as str-stable.  Libraries that are str-stable can be used by
    libraries and applications that are not yet Unicode-safe.
    
    Sometimes it is simple to write code that is both str-stable and
    Unicode-safe.  For example, the following function just works:

        def appendx(s):
            return s + 'x'

    That's not too surprising since the unicode type is designed to
    make the task easier.  The principle is that when str and unicode
    instances meet, the result is a unicode instance.  One notable
    difficulty arises when code requires a string representation of an
    object, an operation traditionally accomplished by using the str()
    built-in function.
    
    Using the current str() function makes the code not Unicode-safe.
    Replacing a str() call with a unicode() call makes the code not
    str-stable.  Changing str() so that it could return unicode
    instances would solve this problem.  As a further benefit, some code
    that is currently not Unicode-safe because it uses str() would
    become Unicode-safe.
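
    As a sketch of this dilemma under Python 2 semantics, consider two
    variants of a small helper.  The names tag_with_str and
    tag_with_unicode are hypothetical and not part of this proposal.
    The try/except shim only keeps the example importable on Python 3,
    where the two string types coincide and the difference vanishes.

```python
# Assumption: Python 2 semantics. On Python 3 the shim below makes
# "unicode" an alias of str, so both helpers behave identically there.
try:
    unicode
except NameError:
    unicode = str


def tag_with_str(obj):
    # Not Unicode-safe on Python 2: the current str() encodes unicode
    # arguments with the default (ASCII) codec, so non-ASCII text
    # raises UnicodeEncodeError.
    return '<' + str(obj) + '>'


def tag_with_unicode(obj):
    # Unicode-safe, but not str-stable on Python 2: the result is a
    # unicode instance even when obj is a plain str.
    return '<' + unicode(obj) + '>'
```

    Under the proposed str(), tag_with_str would return a str for str
    input and a unicode instance for unicode input, which would make
    it both str-stable and Unicode-safe.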


Specification

    A Python implementation of the str() built-in follows:

        def str(s):
            """Return a nice string representation of the object.  The
            return value is a str or unicode instance.
            """
            if type(s) is str or type(s) is unicode:
                return s
            r = s.__str__()
            if not isinstance(r, (str, unicode)):
                raise TypeError('__str__ returned non-string')
            return r
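
    To illustrate the behaviour specified above, the following sketch
    copies the function under the assumed name proposed_str (so that
    the built-in is not shadowed) and exercises it with two
    hypothetical classes, Name and Broken.  The shim is only there so
    that the example also runs on Python 3, where str and unicode are
    the same type.

```python
try:
    unicode
except NameError:
    unicode = str  # Python 3: the two string types coincide


def proposed_str(s):
    """Copy of the specified str(), renamed for illustration."""
    if type(s) is str or type(s) is unicode:
        return s
    r = s.__str__()
    if not isinstance(r, (str, unicode)):
        raise TypeError('__str__ returned non-string')
    return r


class Name(object):
    """Hypothetical class whose __str__ returns the text it holds."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


class Broken(object):
    """Hypothetical class whose __str__ violates the contract."""

    def __str__(self):
        return 42


text = u'caf\xe9'
same = proposed_str(text)               # exact string types pass through
from_obj = proposed_str(Name(text))     # unicode from __str__ is kept as is
from_str = proposed_str(Name('abc'))    # str from __str__ stays a str
from_int = proposed_str(12)             # other objects go through __str__

try:
    proposed_str(Broken())
    rejected = False
except TypeError:
    rejected = True                     # non-string results are refused
```

    Note that the exact-type check means instances of str or unicode
    subclasses are not returned directly; they go through __str__()
    like any other object.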
            
    The following function would be added to the C API and would be
    equivalent to the str() built-in (ideally it would be called
    PyObject_Str, but changing that function could cause a massive
    number of compatibility problems):

        PyObject *PyString_New(PyObject *);

    A reference implementation is available on SourceForge [1] as a
    patch.

                
Backwards Compatibility

    Some code may require that str() returns a str instance.  In the
    standard library, only one such case has been found so far.  The
    function email.header_decode() requires a str instance and the
    email.Header.decode_header() function tries to ensure this by
    calling str() on its argument.  The code was fixed by changing
    the line "header = str(header)" to:

        if isinstance(header, unicode):
            header = header.encode('ascii')

    Whether this is truly a bug is questionable since decode_header()
    really operates on byte strings, not character strings.  Code that
    passes it a unicode instance could itself be considered buggy.
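
    The effect of the replacement lines can be seen by wrapping them
    in a small helper.  The name to_byte_header is hypothetical and
    does not appear in the email package; the shim again only keeps
    the sketch runnable on Python 3.

```python
try:
    unicode
except NameError:
    unicode = str  # Python 3: text strings play the role of unicode


def to_byte_header(header):
    # Same two lines as the fix quoted above: unicode input is
    # encoded to an ASCII byte string, anything else passes through.
    if isinstance(header, unicode):
        header = header.encode('ascii')
    return header


ascii_header = to_byte_header(u'Subject')   # encoded to a byte string

try:
    to_byte_header(u'caf\xe9')
    non_ascii_rejected = False
except UnicodeEncodeError:
    # Non-ASCII text is still refused, as it was when the Python 2
    # str() applied the default ASCII codec.
    non_ascii_rejected = True
```

    Byte strings are returned unchanged, so callers that already pass
    str instances see no difference.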


Alternative Solutions

    A new built-in function could be added instead of changing str().
    Doing so would introduce virtually no backwards compatibility
    problems.  However, since the compatibility problems are expected to
    be rare, changing str() seems preferable to adding a new built-in.

    The basestring type could be changed to have the proposed behaviour,
    rather than changing str().  However, that would be confusing
    behaviour for an abstract base type.


References

    [1] http://www.python.org/sf/1266570


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: