PEP: 349
Title: Allow str() to return unicode strings
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas@arctrix.com>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Aug-2005
Python-Version: 2.5
Post-History: 06-Aug-2005
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/M2Y3PUFLAE23NPRJPVBYF6P5LW5LVN6F/


Abstract
========

This PEP proposes to change the ``str()`` built-in function so that it
can return unicode strings.  This change would make it easier to
write code that works with either string type and would also make
some existing code handle unicode strings.  The C function
``PyObject_Str()`` would remain unchanged and the function
``PyString_New()`` would be added instead.


Rationale
=========

Python has had a Unicode string type for some time now but use of
it is not yet widespread.  There is a large amount of Python code
that assumes that string data is represented as str instances.
The long-term plan for Python is to phase out the str type and use
unicode for all string data.  Clearly, a smooth migration path
must be provided.

We need to upgrade existing libraries, written for str instances,
so that they can operate in an all-unicode string world.  We can't
change to an all-unicode world until all essential libraries are
capable of it.  Upgrading the libraries in one shot does not seem
feasible.  A more realistic strategy is to make the libraries
capable of operating on unicode strings one at a time, while
preserving their current behaviour in an all-str environment.

First, we need to be able to write code that can accept unicode
instances without attempting to coerce them to str instances.  Let
us label such code as Unicode-safe.  Unicode-safe libraries can be
used in an all-unicode world.

Second, we need to be able to write code that, when provided only
str instances, will not create unicode results.  Let us label such
code as str-stable.  Libraries that are str-stable can be used by
libraries and applications that are not yet Unicode-safe.

Sometimes it is simple to write code that is both str-stable and
Unicode-safe.  For example, the following function just works::

    def appendx(s):
        return s + 'x'

That's not too surprising since the unicode type is designed to
make the task easier.  The principle is that when str and unicode
instances meet, the result is a unicode instance.  One notable
difficulty arises when code requires a string representation of an
object, an operation traditionally accomplished by using the ``str()``
built-in function.

Using the current ``str()`` function makes the code not Unicode-safe.
Replacing a ``str()`` call with a ``unicode()`` call makes the code not
str-stable.  Changing ``str()`` so that it could return unicode
instances would solve this problem.  As a further benefit, some code
that is currently not Unicode-safe because it uses ``str()`` would
become Unicode-safe.

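As an illustration only, the two properties can be sketched with a
small library function.  The names ``proposed_str`` and ``tag`` are
hypothetical and are not part of this PEP; ``proposed_str`` is a
simplified stand-in for the changed ``str()``, and the
``unicode_type`` shim exists only so that the sketch also runs on
interpreters that have a single text type.

```python
# Python 2 has two text types (str and unicode); later versions have one.
# This shim lets the sketch run on either.
try:
    unicode_type = unicode  # Python 2
except NameError:
    unicode_type = str      # single text type

def proposed_str(obj):
    """Simplified, hypothetical stand-in for the changed str()."""
    if isinstance(obj, (str, unicode_type)):
        return obj          # strings pass through without coercion
    return str(obj)

def tag(value):
    """Hypothetical library function that needs a string representation."""
    # Unicode-safe: a unicode argument is never coerced to str.
    # str-stable: a str argument produces a str result.
    return proposed_str(value) + ':'

from_str = tag('abc')
from_unicode = tag(u'caf\xe9')
from_int = tag(42)
```

Under these assumptions, ``from_str`` is a str instance, while
``from_unicode`` keeps its non-ASCII character and remains a unicode
instance.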

Specification
=============

A Python implementation of the ``str()`` built-in follows::

    def str(s):
        """Return a nice string representation of the object.  The
        return value is a str or unicode instance.
        """
        if type(s) is str or type(s) is unicode:
            return s
        r = s.__str__()
        if not isinstance(r, (str, unicode)):
            raise TypeError('__str__ returned non-string')
        return r

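The behaviour of this implementation can be exercised with a short
sketch.  The function is repeated here under the hypothetical name
``proposed_str`` so that the built-in is not shadowed; the classes
``Name`` and ``Broken`` and the ``TEXT_TYPES`` shim are assumptions
made for the example and do not appear in the reference patch.

```python
# Shim so the sketch runs on Python 2 (two text types) and on
# interpreters with a single text type.
try:
    TEXT_TYPES = (str, unicode)  # Python 2
except NameError:
    TEXT_TYPES = (str,)          # single text type

def proposed_str(s):
    """The implementation above, under a hypothetical name."""
    if type(s) in TEXT_TYPES:
        return s
    r = s.__str__()
    if not isinstance(r, TEXT_TYPES):
        raise TypeError('__str__ returned non-string')
    return r

class Name(object):
    """Hypothetical class whose __str__ returns non-ASCII text."""
    def __str__(self):
        return u'Andr\xe9'

class Broken(object):
    """Hypothetical class whose __str__ returns a non-string."""
    def __str__(self):
        return 42

result = proposed_str(Name())  # returned as-is, never encoded
plain = proposed_str(3.5)      # ordinary objects are unaffected

try:
    proposed_str(Broken())
    rejected = False
except TypeError:
    rejected = True            # non-string results are refused
```

For comparison, the current ``str()`` in Python 2 would encode the
unicode result of ``Name.__str__`` with the default codec, which
fails for non-ASCII text.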

The following function would be added to the C API and would be the
equivalent of the ``str()`` built-in (ideally it would be called
``PyObject_Str``, but changing that function could cause a massive
number of compatibility problems)::

    PyObject *PyString_New(PyObject *);

A reference implementation is available on Sourceforge [1]_ as a
patch.


Backwards Compatibility
=======================

Some code may require that ``str()`` returns a str instance.  In the
standard library, only one such case has been found so far.  The
function ``email.header_decode()`` requires a str instance and the
``email.Header.decode_header()`` function tries to ensure this by
calling ``str()`` on its argument.  The code was fixed by changing
the line ``header = str(header)`` to::

    if isinstance(header, unicode):
        header = header.encode('ascii')

Whether this is truly a bug is questionable since ``decode_header()``
really operates on byte strings, not character strings.  Code that
passes it a unicode instance could itself be considered buggy.

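Code elsewhere that depends on receiving a byte string could apply
the same guard.  The following is a minimal sketch, not code from
the standard library: the helper name ``ensure_byte_string`` is
hypothetical, and the ``unicode_type`` shim is only there so that the
example runs on interpreters with a single text type.

```python
# Shim: on Python 2 this is the unicode type; elsewhere it is str.
try:
    unicode_type = unicode  # Python 2
except NameError:
    unicode_type = str      # single text type

def ensure_byte_string(header):
    """Hypothetical helper: encode text to ASCII bytes, pass bytes through."""
    if isinstance(header, unicode_type):
        # Raises UnicodeEncodeError if the text is not pure ASCII.
        header = header.encode('ascii')
    return header

encoded = ensure_byte_string(u'Subject')
unchanged = ensure_byte_string(b'Subject')
```

Text that cannot be represented in ASCII raises
``UnicodeEncodeError`` at the point of the guard, which makes the
faulty caller visible early.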


Alternative Solutions
=====================

A new built-in function could be added instead of changing ``str()``.
Doing so would introduce virtually no backwards compatibility
problems.  However, since the compatibility problems are expected to
be rare, changing ``str()`` seems preferable to adding a new built-in.

The basestring type could be changed to have the proposed behaviour,
rather than changing ``str()``.  However, that would be confusing
behaviour for an abstract base type.


References
==========

.. [1] https://bugs.python.org/issue1266570


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End: