Update the bytes object to better resemble my intentions.

2007-02-22 23:57:46 +00:00 · 3ed8bc79de
parent 7c97ac0ea4
commit 3ed8bc79de
2 changed files with 84 additions and 48 deletions
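
Reviewer's sketch (not part of the commit): the constructor pseudo-code
and the hexadecimal repr described in the pep-0358.txt hunks below can
be approximated in present-day Python 3.  This is a minimal sketch
under stated assumptions, not the PEP's implementation: the names
make_bytes and hex_repr are hypothetical, the built-in bytearray stands
in for the proposed mutable bytes type, and only the "In 3.0" branch of
the pseudo-code (explicit encoding required for text) is followed.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical stand-in for the proposed bytes() constructor (3.0 rules)."""
    if isinstance(initialiser, int):
        # A single integer n produces n zero-valued elements.
        values = [0] * initialiser
    elif isinstance(initialiser, str):
        # 3.0 branch of the pseudo-code: text needs an explicit encoding.
        if encoding is None:
            raise TypeError("explicit encoding required")
        values = list(initialiser.encode(encoding))
    else:
        if encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        # Any iterable of integers is accepted.
        values = list(initialiser)
    for v in values:
        if not isinstance(v, int):
            raise TypeError("elements must be integers")
        if not 0 <= v <= 255:
            raise ValueError("elements must be in range(256)")
    # bytearray is an assumption: a mutable sequence of ints in 0..255.
    return bytearray(values)


def hex_repr(b):
    """Format contents as a list of ints in hexadecimal notation."""
    return "bytes([%s])" % ", ".join("0x%02x" % v for v in b)


print(hex_repr(make_bytes([10, 20, 30])))
print(make_bytes(3))
print(make_bytes("abc", "ascii"))
```

In current Python, bytearray.fromhex('5c 53 50 ff') and the .hex()
method of the result behave like the .fromhex() and .hex() examples in
the diff, which is why no separate helper is sketched for them here.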
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -107,7 +107,7 @@ Index by Category
 I   350  Codetags                                     Elliott
 S   354  Enumerations in Python                       Finney
 S   355  Path - Object oriented filesystem paths      Lindqvist
- S   358  The "bytes" Object                           Schemenauer
+ S   358  The "bytes" Object                           Schemenauer, GvR
 S   362  Function Signature Object                    Cannon, Seo
 S   754  IEEE 754 Floating Point Special Values       Warnes
 S  3101  Advanced String Formatting                   Talin
@@ -431,7 +431,7 @@ Numerical Index
 S   355  Path - Object oriented filesystem paths      Lindqvist
 IF  356  Python 2.5 Release Schedule                  Norwitz, et al
 SF  357  Allowing Any Object to be Used for Slicing   Oliphant
- S   358  The "bytes" Object                           Schemenauer
+ S   358  The "bytes" Object                           Schemenauer, GvR
 SW  359  The "make" Statement                         Bethard
 I   360  Externally Maintained Packages               Cannon
 I   361  Python 2.6 Release Schedule                  Norwitz, et al
--- a/pep-0358.txt
+++ b/pep-0358.txt
@@ -2,12 +2,12 @@ PEP: 358
 Title: The "bytes" Object
 Version: $Revision$
 Last-Modified: $Date$
-Author: Neil Schemenauer <nas@arctrix.com>
+Author: Neil Schemenauer <nas@arctrix.com>, Guido van Rossum <guido@google.com>
 Status: Draft
 Type: Standards Track
 Content-Type: text/plain
 Created: 15-Feb-2006
-Python-Version: 2.5
+Python-Version: 2.6, 3.0
 Post-History:


@@ -20,74 +20,86 @@ Abstract

 Motivation

-    Python's current string objects are overloaded. They serve to hold
-    both sequences of characters and sequences of bytes. This
-    overloading of purpose leads to confusion and bugs. In future
+    Python's current string objects are overloaded.  They serve to hold
+    both sequences of characters and sequences of bytes.  This
+    overloading of purpose leads to confusion and bugs.  In future
    versions of Python, string objects will be used for holding
-    character data. The bytes object will fulfil the role of a byte
-    container. Eventually the unicode built-in will be renamed to str
+    character data.  The bytes object will fulfil the role of a byte
+    container.  Eventually the unicode built-in will be renamed to str
    and the str object will be removed.


 Specification

-    A bytes object stores a mutable sequence of integers that are in the
-    range 0 to 255.  Unlike string objects, indexing a bytes object
-    returns an integer.  Assigning an element using a object that is not
-    an integer causes a TypeError exception.  Assigning an element to a
-    value outside the range 0 to 255 causes a ValueError exception.  The
-    __len__ method of bytes returns the number of integers stored in the
-    sequence (i.e. the number of bytes).
+    A bytes object stores a mutable sequence of integers that are in
+    the range 0 to 255.  Unlike string objects, indexing a bytes
+    object returns an integer.  Assigning an element using an object
+    that is not an integer causes a TypeError exception.  Assigning an
+    element to a value outside the range 0 to 255 causes a ValueError
+    exception.  The .__len__() method of bytes returns the number of
+    integers stored in the sequence (i.e. the number of bytes).

    The constructor of the bytes object has the following signature:

        bytes([initialiser[, encoding]])

    If no arguments are provided then an object containing zero elements
-    is created and returned.  The initialiser argument can be a string
-    or a sequence of integers.  The pseudo-code for the constructor is:
+    is created and returned.  The initialiser argument can be a string,
+    a sequence of integers, or a single integer.  The pseudo-code for the
+    constructor is:

        def bytes(initialiser=[], encoding=None):
-            if isinstance(initialiser, basestring):
-                if isinstance(initialiser, unicode):
+            if isinstance(initialiser, int): # In 2.6, (int, long)
+                initialiser = [0]*initialiser
+            elif isinstance(initialiser, basestring):
+                if isinstance(initialiser, unicode): # In 3.0, always
                    if encoding is None:
+                        # In 3.0, raise TypeError("explicit encoding required")
                        encoding = sys.getdefaultencoding()
                    initialiser = initialiser.encode(encoding)
                initialiser = [ord(c) for c in initialiser]
-            elif encoding is not None:
-                raise TypeError("explicit encoding invalid for non-string "
-                                "initialiser")
-            create bytes object and fill with integers from initialiser
+            else:
+                if encoding is not None:
+                    raise TypeError("explicit encoding invalid for non-string "
+                                    "initialiser")
+            # Create bytes object and fill with integers from initialiser
+            # while ensuring each integer is in range(256); initialiser
+            # can be any iterable
            return bytes object

-    The __repr__ method returns a string that can be evaluated to
+    The .__repr__() method returns a string that can be evaluated to
    generate a new bytes object containing the same sequence of
-    integers.  The sequence is represented by a list of ints.  For
-    example:
+    integers.  The sequence is represented by a list of ints using
+    hexadecimal notation.  For example:

        >>> repr(bytes([10, 20, 30]))
-        'bytes([10, 20, 30])'
+        'bytes([0x0a, 0x14, 0x1e])'

-    The object has a decode method equivalent to the decode method of
-    the str object.  The object has a classmethod fromhex that takes a
-    string of characters from the set [0-9a-zA-Z ] and returns a bytes
-    object (similar to binascii.unhexlify).  For example:
+    The object has a .decode() method equivalent to the .decode()
+    method of the str object.  (This is redundant since it can also be
+    decoded by calling unicode(b, <encoding>) (in 2.6) or str(b,
+    <encoding>) (in 3.0); do we need encode/decode methods?  In a
+    sense the spelling using a constructor is cleaner.)  The object
+    has a classmethod .fromhex() that takes a string of characters
+    from the set [0-9a-fA-F ] and returns a bytes object (similar to
+    binascii.unhexlify).  For example:

        >>> bytes.fromhex('5c5350ff')
        bytes([92, 83, 80, 255])
        >>> bytes.fromhex('5c 53 50 ff')
        bytes([92, 83, 80, 255])

-    The object has a hex method that does the reverse conversion
+    The object has a .hex() method that does the reverse conversion
    (similar to binascii.hexlify):

        >>> bytes([92, 83, 80, 255]).hex()
        '5c5350ff'

-    The bytes object has methods similar to the list object:
+    The bytes object has some methods similar to list methods, and
+    others similar to str methods:

        __add__
-        __contains__
+        __contains__ (with int arg, like list; with bytes arg, like str)
        __delitem__
        __delslice__
        __eq__
@@ -95,7 +107,6 @@ Specification
        __getitem__
        __getslice__
        __gt__
-        __hash__
        __iadd__
        __imul__
        __iter__
@@ -107,16 +118,39 @@ Specification
        __reduce__
        __reduce_ex__
        __repr__
+        __reversed__
        __rmul__
        __setitem__
        __setslice__
        append
        count
+        decode
+        endswith
        extend
+        find
        index
        insert
+        join
+        partition
        pop
        remove
+        replace
+        rindex
+        rpartition
+        split
+        startswith
+        reverse
+        rfind
+        rindex
+        rsplit
+        translate
+
+    Note the conspicuous absence of .isupper(), .upper(), and friends.
+    There is no __hash__ because the object is mutable.  There is no
+    use case for a .sort() method.
+
+    The bytes object also supports the buffer interface, for reading
+    and writing binary (but not character) data.


 Out of scope issues
@@ -127,7 +161,9 @@ Out of scope issues
      (which requires lexer and parser support in addition to everything
      else).  Since there appears to be no immediate need for a literal
      representation, designing and implementing one is out of the scope
-      of this PEP.
+      of this PEP.  (Hmm...  A b"..." literal accepting only ASCII
+      values is likely to be added to 3.0; not clear about 2.6.  This
+      needs a PEP.)

    * Python 3k will have a much different I/O subsystem.  Deciding how
      that I/O subsystem will work and interact with the bytes object is
@@ -140,19 +176,19 @@ Out of scope issues

 Unresolved issues

-    * Perhaps the bytes object should be implemented as a extension
-      module until we are more sure of the design (similar to how the
-      set object was prototyped).
+    * Need to specify the methods more carefully.

-    * Should the bytes object implement the buffer interface?  Probably,
-      but we need to look into the implications of that (e.g. regex
-      operations on byte arrays).
+    * Should all those list methods really be implemented?

-    * Should the object implement __reversed__ and reverse?  Should it
-      implement sort?
+    * A case could be made for supporting .ljust(), .rjust(),
+      .center() with a mandatory second argument.

-    * Need to clarify what some of the methods do.  How are comparisons
-      done?  Hashing?  Pickling and marshalling?
+    * A case could be made for supporting .split() with a mandatory
+      argument.
+
+    * How should pickling and marshalling work?
+
+    * I probably forgot a few things.


 Questions and answers
@@ -174,7 +210,7 @@ Questions and answers


    Q: Why does bytes ignore the encoding argument if the initialiser is
-       a str?
+       a str?  (This only applies to 2.6.)

    A: There is no sane meaning that the encoding can have in that case.
       str objects *are* byte arrays and they know nothing about the