From 4e21b23c38d1b75509dfe3286677b4812e421c93 Mon Sep 17 00:00:00 2001 From: Mariatta Date: Sat, 3 Dec 2016 16:03:37 -0800 Subject: [PATCH] restify PEP 349 (#148) --- pep-0349.txt | 196 +++++++++++++++++++++++++++------------------------ 1 file changed, 102 insertions(+), 94 deletions(-) diff --git a/pep-0349.txt b/pep-0349.txt index 2a30bda5b..df4518cb7 100644 --- a/pep-0349.txt +++ b/pep-0349.txt @@ -5,140 +5,148 @@ Last-Modified: $Date$ Author: Neil Schemenauer Status: Deferred Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 02-Aug-2005 Python-Version: 2.5 Post-History: 06-Aug-2005 Abstract +======== - This PEP proposes to change the str() built-in function so that it - can return unicode strings. This change would make it easier to - write code that works with either string type and would also make - some existing code handle unicode strings. The C function - PyObject_Str() would remain unchanged and the function - PyString_New() would be added instead. +This PEP proposes to change the ``str()`` built-in function so that it +can return unicode strings. This change would make it easier to +write code that works with either string type and would also make +some existing code handle unicode strings. The C function +``PyObject_Str()`` would remain unchanged and the function +``PyString_New()`` would be added instead. Rationale +========= - Python has had a Unicode string type for some time now but use of - it is not yet widespread. There is a large amount of Python code - that assumes that string data is represented as str instances. - The long term plan for Python is to phase out the str type and use - unicode for all string data. Clearly, a smooth migration path - must be provided. +Python has had a Unicode string type for some time now but use of +it is not yet widespread. There is a large amount of Python code +that assumes that string data is represented as str instances. +The long term plan for Python is to phase out the str type and use +unicode for all string data. Clearly, a smooth migration path +must be provided. - We need to upgrade existing libraries, written for str instances, - to be made capable of operating in an all-unicode string world. - We can't change to an all-unicode world until all essential - libraries are made capable for it. Upgrading the libraries in one - shot does not seem feasible. A more realistic strategy is to - individually make the libraries capable of operating on unicode - strings while preserving their current all-str environment - behaviour. +We need to upgrade existing libraries, written for str instances, +to be made capable of operating in an all-unicode string world. +We can't change to an all-unicode world until all essential +libraries are made capable for it. Upgrading the libraries in one +shot does not seem feasible. A more realistic strategy is to +individually make the libraries capable of operating on unicode +strings while preserving their current all-str environment +behaviour. - First, we need to be able to write code that can accept unicode - instances without attempting to coerce them to str instances. Let - us label such code as Unicode-safe. Unicode-safe libraries can be - used in an all-unicode world. +First, we need to be able to write code that can accept unicode +instances without attempting to coerce them to str instances. Let +us label such code as Unicode-safe. Unicode-safe libraries can be +used in an all-unicode world. - Second, we need to be able to write code that, when provided only - str instances, will not create unicode results. Let us label such - code as str-stable. Libraries that are str-stable can be used by - libraries and applications that are not yet Unicode-safe. - - Sometimes it is simple to write code that is both str-stable and - Unicode-safe. For example, the following function just works: +Second, we need to be able to write code that, when provided only +str instances, will not create unicode results. Let us label such +code as str-stable. Libraries that are str-stable can be used by +libraries and applications that are not yet Unicode-safe. - def appendx(s): - return s + 'x' +Sometimes it is simple to write code that is both str-stable and +Unicode-safe. For example, the following function just works:: - That's not too surprising since the unicode type is designed to - make the task easier. The principle is that when str and unicode - instances meet, the result is a unicode instance. One notable - difficulty arises when code requires a string representation of an - object; an operation traditionally accomplished by using the str() - built-in function. - - Using the current str() function makes the code not Unicode-safe. - Replacing a str() call with a unicode() call makes the code not - str-stable. Changing str() so that it could return unicode - instances would solve this problem. As a further benefit, some code - that is currently not Unicode-safe because it uses str() would - become Unicode-safe. + def appendx(s): + return s + 'x' + +That's not too surprising since the unicode type is designed to +make the task easier. The principle is that when str and unicode +instances meet, the result is a unicode instance. One notable +difficulty arises when code requires a string representation of an +object; an operation traditionally accomplished by using the ``str()`` +built-in function. + +Using the current ``str()`` function makes the code not Unicode-safe. +Replacing a ``str()`` call with a ``unicode()`` call makes the code not +str-stable. Changing ``str()`` so that it could return unicode +instances would solve this problem. As a further benefit, some code +that is currently not Unicode-safe because it uses ``str()`` would +become Unicode-safe. Specification +============= - A Python implementation of the str() built-in follows: +A Python implementation of the ``str()`` built-in follows:: - def str(s): - """Return a nice string representation of the object. The - return value is a str or unicode instance. - """ - if type(s) is str or type(s) is unicode: - return s - r = s.__str__() - if not isinstance(r, (str, unicode)): - raise TypeError('__str__ returned non-string') - return r - - The following function would be added to the C API and would be the - equivalent to the str() built-in (ideally it be called PyObject_Str, - but changing that function could cause a massive number of - compatibility problems): + def str(s): + """Return a nice string representation of the object. The + return value is a str or unicode instance. + """ + if type(s) is str or type(s) is unicode: + return s + r = s.__str__() + if not isinstance(r, (str, unicode)): + raise TypeError('__str__ returned non-string') + return r - PyObject *PyString_New(PyObject *); +The following function would be added to the C API and would be the +equivalent to the ``str()`` built-in (ideally it be called ``PyObject_Str``, +but changing that function could cause a massive number of +compatibility problems):: + + PyObject *PyString_New(PyObject *); + +A reference implementation is available on Sourceforge [1]_ as a +patch. - A reference implementation is available on Sourceforge [1] as a - patch. - Backwards Compatibility +======================= - Some code may require that str() returns a str instance. In the - standard library, only one such case has been found so far. The - function email.header_decode() requires a str instance and the - email.Header.decode_header() function tries to ensure this by - calling str() on its argument. The code was fixed by changing - the line "header = str(header)" to: +Some code may require that ``str()`` returns a str instance. In the +standard library, only one such case has been found so far. The +function ``email.header_decode()`` requires a str instance and the +``email.Header.decode_header()`` function tries to ensure this by +calling ``str()`` on its argument. The code was fixed by changing +the line "header = str(header)" to:: - if isinstance(header, unicode): - header = header.encode('ascii') + if isinstance(header, unicode): + header = header.encode('ascii') - Whether this is truly a bug is questionable since decode_header() - really operates on byte strings, not character strings. Code that - passes it a unicode instance could itself be considered buggy. +Whether this is truly a bug is questionable since ``decode_header()`` +really operates on byte strings, not character strings. Code that +passes it a unicode instance could itself be considered buggy. Alternative Solutions +===================== - A new built-in function could be added instead of changing str(). - Doing so would introduce virtually no backwards compatibility - problems. However, since the compatibility problems are expected to - rare, changing str() seems preferable to adding a new built-in. +A new built-in function could be added instead of changing ``str()``. +Doing so would introduce virtually no backwards compatibility +problems. However, since the compatibility problems are expected to +rare, changing ``str()`` seems preferable to adding a new built-in. - The basestring type could be changed to have the proposed behaviour, - rather than changing str(). However, that would be confusing - behaviour for an abstract base type. +The basestring type could be changed to have the proposed behaviour, +rather than changing ``str()``. However, that would be confusing +behaviour for an abstract base type. References +========== - [1] http://www.python.org/sf/1266570 +.. [1] http://www.python.org/sf/1266570 Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: