Reformat.

2006-02-22 20:49:37 +00:00 · 2006-02-22 20:49:37 +00:00 · 61b80831fe
parent 2da9d64f5e
commit 61b80831fe
1 changed file with 139 additions and 141 deletions
--- a/pep-0358.txt
+++ b/pep-0358.txt
@@ -12,197 +12,195 @@ Post-History:


 Abstract
-========

-This PEP outlines the introduction of a raw bytes sequence object.
-Adding the bytes object is one step in the transition to Unicode based
-str objects.
+    This PEP outlines the introduction of a raw bytes sequence object.
+    Adding the bytes object is one step in the transition to Unicode
+    based str objects.


 Motivation
-==========

-Python's current string objects are overloaded. They serve to hold
-both sequences of characters and sequences of bytes. This overloading
-of purpose leads to confusion and bugs. In future versions of Python,
-string objects will be used for holding character data. The bytes object
-will fulfil the role of a byte container. Eventually the unicode
-built-in will be renamed to str and the str object will be removed.
+    Python's current string objects are overloaded. They serve to hold
+    both sequences of characters and sequences of bytes. This
+    overloading of purpose leads to confusion and bugs. In future
+    versions of Python, string objects will be used for holding
+    character data. The bytes object will fulfil the role of a byte
+    container. Eventually the unicode built-in will be renamed to str
+    and the str object will be removed.


 Specification
-=============

-A bytes object stores a mutable sequence of integers that are in the
-range 0 to 255.  Unlike string objects, indexing a bytes object returns
-an integer.  Assigning an object that is not an integer to an element
-causes a TypeError exception.  Assigning a value outside the range 0
-to 255 to an element causes a ValueError exception.  The __len__ method
-of bytes returns the number of integers stored in the sequence (i.e.
-the number of bytes).
+    A bytes object stores a mutable sequence of integers that are in the
+    range 0 to 255.  Unlike string objects, indexing a bytes object
+    returns an integer.  Assigning an object that is not an integer to
+    an element causes a TypeError exception.  Assigning a value outside
+    the range 0 to 255 to an element causes a ValueError exception.  The
+    __len__ method of bytes returns the number of integers stored in the
+    sequence (i.e. the number of bytes).
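Python 3's built-in bytearray is a mutable sequence of integers in the range 0 to 255, so it can serve as a rough stand-in for trying out the element rules in this paragraph. The sketch below assumes bytearray as that stand-in; it is not the object this PEP specifies.

```python
# Sketch: Python 3's bytearray used as a stand-in for the proposed
# mutable bytes object (an assumption; it is not the PEP's type).
b = bytearray([10, 20, 30])

first = b[0]        # indexing returns an int, not a one-character string
length = len(b)     # __len__ counts the stored integers, i.e. the bytes

b[1] = 255          # any integer in range(0, 256) may be assigned

type_error = value_error = None
try:
    b[1] = "x"      # assigning a non-integer element
except TypeError as exc:
    type_error = exc

try:
    b[1] = 256      # assigning an integer outside 0..255
except ValueError as exc:
    value_error = exc
```

Both failed assignments leave the sequence unchanged, so b still holds [10, 255, 30] afterwards.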

-The constructor of the bytes object has the following signature:
+    The constructor of the bytes object has the following signature:

-    bytes([initialiser[, encoding]])
+        bytes([initialiser[, encoding]])

-If no arguments are provided then an object containing zero elements is
-created and returned.  The initialiser argument can be a string or a
-sequence of integers.  The pseudo-code for the constructor is:
+    If no arguments are provided then an object containing zero elements
+    is created and returned.  The initialiser argument can be a string
+    or a sequence of integers.  The pseudo-code for the constructor is:

-    def bytes(initialiser=[], encoding=None):
-        if isinstance(initialiser, basestring):
-            if isinstance(initialiser, unicode):
-                if encoding is None:
-                    encoding = sys.getdefaultencoding()
-                initialiser = initialiser.encode(encoding)
-            initialiser = [ord(c) for c in initialiser]
-        elif encoding is not None:
-            raise TypeError("explicit encoding invalid for non-string "
-                            "initialiser")
-        create bytes object and fill with integers from initialiser
-        return bytes object
+        def bytes(initialiser=[], encoding=None):
+            if isinstance(initialiser, basestring):
+                if isinstance(initialiser, unicode):
+                    if encoding is None:
+                        encoding = sys.getdefaultencoding()
+                    initialiser = initialiser.encode(encoding)
+                initialiser = [ord(c) for c in initialiser]
+            elif encoding is not None:
+                raise TypeError("explicit encoding invalid for non-string "
+                                "initialiser")
+            create bytes object and fill with integers from initialiser
+            return bytes object
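The constructor pseudo-code can be turned into a runnable Python 3 sketch. This rests on an assumed mapping that is not part of the PEP: Python 3 str plays the role of the PEP's unicode type, Python 3 bytes plays the role of the PEP's str type, and bytearray stands in for the proposed object. The function name make_bytes is hypothetical, chosen because the name bytes now refers to Python 3's immutable built-in.

```python
import sys


def make_bytes(initialiser=(), encoding=None):
    """Hypothetical Python 3 rendering of the constructor pseudo-code.

    Assumed mapping: Python 3 str ~ the PEP's unicode, Python 3 bytes ~
    the PEP's str, bytearray ~ the proposed mutable bytes object.
    """
    if isinstance(initialiser, str):
        # Character data has to be encoded before it can become bytes.
        if encoding is None:
            # Python 2 returned 'ascii' here; Python 3 returns 'utf-8'.
            encoding = sys.getdefaultencoding()
        initialiser = initialiser.encode(encoding)
    elif isinstance(initialiser, bytes):
        # Raw bytes are copied unchanged and any encoding is ignored,
        # mirroring the PEP's treatment of str initialisers.
        pass
    elif encoding is not None:
        raise TypeError("explicit encoding invalid for non-string initialiser")
    # bytearray() raises TypeError for non-integers and ValueError for
    # integers outside 0..255, in line with the element rules.
    return bytearray(initialiser)
```

For example, make_bytes("\u00e9", "latin-1") yields a single element, 233, while make_bytes([1, 2], "ascii") raises TypeError because an encoding only makes sense for text.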

-The __repr__ method returns a string that can be evaluated to generate a
-new bytes object containing the same sequence of integers.  The sequence
-is represented by a list of ints.  For example:
+    The __repr__ method returns a string that can be evaluated to
+    generate a new bytes object containing the same sequence of
+    integers.  The sequence is represented by a list of ints.  For
+    example:

-    >>> repr(bytes([10, 20, 30]))
-    'bytes([10, 20, 30])'
+        >>> repr(bytes([10, 20, 30]))
+        'bytes([10, 20, 30])'
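A minimal sketch of the repr format described here, assuming a hypothetical bytearray subclass named PEPBytes. It also shows the evaluability requirement: the repr only evaluates back to an equal object when the name bytes is bound to this class.

```python
class PEPBytes(bytearray):
    """Hypothetical bytearray subclass whose repr follows the PEP's format."""

    def __repr__(self):
        # Render the contents as a list of ints, e.g. bytes([10, 20, 30]).
        return "bytes(%r)" % (list(self),)


value = PEPBytes([10, 20, 30])
text = repr(value)

# eval() round-trips only when "bytes" names this class.
round_trip = eval(text, {"bytes": PEPBytes})
```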

-The object has a decode method equivalent to the decode method of the
-str object.  The object has a classmethod fromhex that takes a string of
-characters from the set [0-9a-fA-F ] and returns a bytes object (similar
-to binascii.unhexlify).  For example:
+    The object has a decode method equivalent to the decode method of
+    the str object.  The object has a classmethod fromhex that takes a
+    string of characters from the set [0-9a-fA-F ] and returns a bytes
+    object (similar to binascii.unhexlify).  For example:

-    >>> bytes.fromhex('5c5350ff')
-    bytes([92, 83, 80, 255])
-    >>> bytes.fromhex('5c 53 50 ff')
-    bytes([92, 83, 80, 255])
+        >>> bytes.fromhex('5c5350ff')
+        bytes([92, 83, 80, 255])
+        >>> bytes.fromhex('5c 53 50 ff')
+        bytes([92, 83, 80, 255])

-The object has a hex method that does the reverse conversion (similar to
-binascii.hexlify):
+    The object has a hex method that does the reverse conversion
+    (similar to binascii.hexlify):

-    >>> bytes([92, 83, 80, 255]).hex()
-    '5c5350ff'
+        >>> bytes([92, 83, 80, 255]).hex()
+        '5c5350ff'
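Because both directions are described as similar to binascii, a minimal sketch can build them on binascii.unhexlify and binascii.hexlify. The helper names fromhex and tohex are hypothetical, and bytearray again stands in for the proposed type.

```python
import binascii


def fromhex(text):
    """Hypothetical helper: hex digits (spaces allowed) -> bytearray.

    Spaces are removed before decoding, so unlike the built-in
    bytes.fromhex this also tolerates a space inside a digit pair.
    """
    return bytearray(binascii.unhexlify(text.replace(" ", "")))


def tohex(data):
    """Hypothetical helper for the reverse conversion (lowercase digits)."""
    return binascii.hexlify(data).decode("ascii")
```

Python 3 later provided bytes.fromhex, bytearray.fromhex and (from 3.5) a hex() method with this behaviour, so the helpers above only illustrate the round trip.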

-The bytes object has methods similar to the list object:
+    The bytes object has methods similar to the list object:

-    __add__
-    __contains__
-    __delitem__
-    __delslice__
-    __eq__
-    __ge__
-    __getitem__
-    __getslice__
-    __gt__
-    __hash__
-    __iadd__
-    __imul__
-    __iter__
-    __le__
-    __len__
-    __lt__
-    __mul__
-    __ne__
-    __reduce__
-    __reduce_ex__
-    __repr__
-    __rmul__
-    __setitem__
-    __setslice__
-    append
-    count
-    extend
-    index
-    insert
-    pop
-    remove
+        __add__
+        __contains__
+        __delitem__
+        __delslice__
+        __eq__
+        __ge__
+        __getitem__
+        __getslice__
+        __gt__
+        __hash__
+        __iadd__
+        __imul__
+        __iter__
+        __le__
+        __len__
+        __lt__
+        __mul__
+        __ne__
+        __reduce__
+        __reduce_ex__
+        __repr__
+        __rmul__
+        __setitem__
+        __setslice__
+        append
+        count
+        extend
+        index
+        insert
+        pop
+        remove
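The list-like methods can be exercised on Python 3's bytearray, assumed here as a stand-in. One difference from the list above: __getslice__, __setslice__ and __delslice__ are Python 2 hooks, and Python 3 routes slicing through __getitem__, __setitem__ and __delitem__ with slice objects.

```python
# bytearray stands in for the proposed type (an assumption).
b = bytearray([1, 2, 3])

b.append(4)          # [1, 2, 3, 4]
b.extend([5, 6])     # [1, 2, 3, 4, 5, 6]
b.insert(0, 0)       # [0, 1, 2, 3, 4, 5, 6]
last = b.pop()       # removes and returns 6
b.remove(3)          # drops the first 3: [0, 1, 2, 4, 5]
b[1:3] = [9, 9]      # slice assignment: [0, 9, 9, 4, 5]
del b[0]             # [9, 9, 4, 5]

nines = b.count(9)   # 2
where = b.index(4)   # 2
has_five = 5 in b    # __contains__ with an int
doubled = b * 2      # __mul__ repeats the sequence
```

Another visible difference is that Python 3's bytearray is unhashable even though __hash__ appears in the list above; the PEP lists hashing among its unresolved issues.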


 Out of scope issues
-===================

-* If we provide a literal syntax for bytes then it should look distinctly
-  different from the syntax for literal strings.  Also, a new type, even
-  built-in, is much less drastic than a new literal (which requires
-  lexer and parser support in addition to everything else).  Since there
-  appears to be no immediate need for a literal representation,
-  designing and implementing one is out of the scope of this PEP.
+    * If we provide a literal syntax for bytes then it should look
+      distinctly different from the syntax for literal strings.  Also, a
+      new type, even built-in, is much less drastic than a new literal
+      (which requires lexer and parser support in addition to everything
+      else).  Since there appears to be no immediate need for a literal
+      representation, designing and implementing one is out of the scope
+      of this PEP.

-* Python 3k will have a much different I/O subsystem.  Deciding how that
-  I/O subsystem will work and interact with the bytes object is out of
-  the scope of this PEP.
+    * Python 3k will have a much different I/O subsystem.  Deciding how
+      that I/O subsystem will work and interact with the bytes object is
+      out of the scope of this PEP.

-* It has been suggested that a special method named __bytes__ be added
-  to the language to allow objects to be converted into byte arrays.
-  This decision is out of scope.
+    * It has been suggested that a special method named __bytes__ be
+      added to the language to allow objects to be converted into byte
+      arrays.  This decision is out of scope.


 Unresolved issues
-=================

-* Perhaps the bytes object should be implemented as an extension module
-  until we are more sure of the design (similar to how the set object
-  was prototyped).
+    * Perhaps the bytes object should be implemented as an extension
+      module until we are more sure of the design (similar to how the
+      set object was prototyped).

-* Should the bytes object implement the buffer interface?  Probably, but
-  we need to look into the implications of that (e.g. regex operations
-  on byte arrays).
+    * Should the bytes object implement the buffer interface?  Probably,
+      but we need to look into the implications of that (e.g. regex
+      operations on byte arrays).

-* Should the object implement __reversed__ and reverse?  Should it
-  implement sort?
+    * Should the object implement __reversed__ and reverse?  Should it
+      implement sort?

-* Need to clarify what some of the methods do.  How are comparisons
-  done?  Hashing?  Pickling and marshalling?
+    * Need to clarify what some of the methods do.  How are comparisons
+      done?  Hashing?  Pickling and marshalling?


 Questions and answers
-=====================

-Q: Why have the optional encoding argument when the encode method of
-   Unicode objects does the same thing?
+    Q: Why have the optional encoding argument when the encode method of
+       Unicode objects does the same thing?

-A: In the current version of Python, the encode method returns a str
-   object and we cannot change that without breaking code.  The construct
-   bytes(s.encode(...)) is expensive because it has to copy the byte
-   sequence multiple times.  Also, Python generally provides two ways of
-   converting an object of type A into an object of type B: ask an A
-   instance to convert itself to a B, or ask the type B to create a new
-   instance from an A. Depending on what A and B are, both APIs make
-   sense; sometimes reasons of decoupling require that A can't know
-   about B, in which case you have to use the latter approach; sometimes
-   B can't know about A, in which case you have to use the former.
+    A: In the current version of Python, the encode method returns a str
+       object and we cannot change that without breaking code.  The
+       construct bytes(s.encode(...)) is expensive because it has to
+       copy the byte sequence multiple times.  Also, Python generally
+       provides two ways of converting an object of type A into an
+       object of type B: ask an A instance to convert itself to a B, or
+       ask the type B to create a new instance from an A. Depending on
+       what A and B are, both APIs make sense; sometimes reasons of
+       decoupling require that A can't know about B, in which case you
+       have to use the latter approach; sometimes B can't know about A,
+       in which case you have to use the former.
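Python 3's bytearray happens to accept both conversion styles described in this answer, which makes them easy to compare side by side. The sketch only shows that the two spellings agree; whether the constructor form saves a copy is an implementation detail it does not measure.

```python
text = "na\u00efve"

# Ask the A instance (a str) to convert itself, then copy into the target type.
via_method = bytearray(text.encode("utf-8"))

# Ask the target type B to build itself from an A plus an encoding name.
via_constructor = bytearray(text, "utf-8")
```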


-Q: Why does bytes ignore the encoding argument if the initialiser is a
-   str?
+    Q: Why does bytes ignore the encoding argument if the initialiser is
+       a str?

-A: There is no sane meaning that the encoding can have in that case.
-   str objects *are* byte arrays and they know nothing about the
-   encoding of character data they contain.  We need to assume that the
-   programmer has provided a str object that already uses the desired
-   encoding. If you need something other than a pure copy of the bytes
-   then you need to first decode the string.  For example:
+    A: There is no sane meaning that the encoding can have in that case.
+       str objects *are* byte arrays and they know nothing about the
+       encoding of character data they contain.  We need to assume that
+       the programmer has provided a str object that already uses the
+       desired encoding. If you need something other than a pure copy of
+       the bytes then you need to first decode the string.  For example:

-       bytes(s.decode(encoding1), encoding2)
+           bytes(s.decode(encoding1), encoding2)
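In Python 3 the same decode-then-encode recipe can be written as below, with bytearray standing in for the proposed type and Latin-1 and UTF-8 chosen only as example values for encoding1 and encoding2.

```python
raw = "caf\u00e9".encode("latin-1")   # bytes already in Latin-1

# Decode with the encoding the data is in, then re-encode with the target.
transcoded = bytearray(raw.decode("latin-1"), "utf-8")
```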


-Q: Why not have the encoding argument default to Latin-1 (or some other
-   encoding that covers the entire byte range) rather than ASCII?
+    Q: Why not have the encoding argument default to Latin-1 (or some
+       other encoding that covers the entire byte range) rather than
+       ASCII?

-A: The system default encoding for Python is ASCII.  It seems least
-   confusing to use that default.  Also, in Py3k, using Latin-1 as
-   the default might not be what users expect.  For example, they might
-   prefer a Unicode encoding.  Any default will not always work as
-   expected.  At least ASCII will complain loudly if you try to encode
-   non-ASCII data.
+    A: The system default encoding for Python is ASCII.  It seems least
+       confusing to use that default.  Also, in Py3k, using Latin-1 as
+       the default might not be what users expect.  For example, they
+       might prefer a Unicode encoding.  Any default will not always
+       work as expected.  At least ASCII will complain loudly if you try
+       to encode non-ASCII data.
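The trade-off in this answer can be seen directly in Python 3. Latin-1 accepts every code point below U+0100 and round-trips all 256 byte values, so text encoded with the wrong default can pass without any error. ASCII raises UnicodeEncodeError as soon as it meets non-ASCII input. A short sketch:

```python
text = "caf\u00e9"

# Latin-1 maps every code point below U+0100 to one byte, so no error here.
latin1 = text.encode("latin-1")

# ASCII refuses anything outside U+0000..U+007F.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Latin-1 also round-trips all 256 byte values unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")
```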


 Copyright
-=========

-This document has been placed in the public domain.
+    This document has been placed in the public domain.