Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.
Guido van Rossum 2007-02-23 04:31:15 +00:00
parent 3ed8bc79de
commit 9ca7df06ee
1 changed file with 136 additions and 107 deletions


@@ -13,9 +13,16 @@ Post-History:

Abstract

This PEP outlines the introduction of a raw bytes sequence type.
Adding the bytes type is one step in the transition to Unicode-based
str objects, which will be introduced in Python 3.0.

The PEP describes how the bytes type should work in Python 2.6, as
well as how it should work in Python 3.0.  (Occasionally there are
differences because in Python 2.6, we have two string types, str
and unicode, while in Python 3.0 we will only have one string
type, whose name will be str but whose semantics will be like the
2.6 unicode type.)

Motivation
@@ -33,39 +40,48 @@ Specification

A bytes object stores a mutable sequence of integers that are in
the range 0 to 255.  Unlike string objects, indexing a bytes
object returns an integer.  Assigning or comparing an object that
is not an integer to an element causes a TypeError exception.
Assigning an element to a value outside the range 0 to 255 causes
a ValueError exception.  The .__len__() method of bytes returns
the number of integers stored in the sequence (i.e. the number of
bytes).

The constructor of the bytes object has the following signature:

    bytes([initializer[, encoding]])

If no arguments are provided then a bytes object containing zero
elements is created and returned.  The initializer argument can be
a string (in 2.6, either str or unicode), an iterable of integers,
or a single integer.  The pseudo-code for the constructor
(optimized for clear semantics, not for speed) is:
    def bytes(initializer=0, encoding=None):
        if isinstance(initializer, int):  # In 2.6, (int, long)
            initializer = [0]*initializer
        elif isinstance(initializer, basestring):
            if isinstance(initializer, unicode):  # In 3.0, always
                if encoding is None:
                    # In 3.0, raise TypeError("explicit encoding required")
                    encoding = sys.getdefaultencoding()
                initializer = initializer.encode(encoding)
            initializer = [ord(c) for c in initializer]
        else:
            if encoding is not None:
                raise TypeError("no encoding allowed for this initializer")
            tmp = []
            for c in initializer:
                if not isinstance(c, int):
                    raise TypeError("initializer must be iterable of ints")
                if not 0 <= c < 256:
                    raise ValueError("initializer element out of range")
                tmp.append(c)
            initializer = tmp
        new = <new bytes object of length len(initializer)>
        for i, c in enumerate(initializer):
            new[i] = c
        return new
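For illustration only (these examples are not part of the PEP's text,
and tracebacks are omitted), the pseudo-code above implies behavior
like the following; list() is used to show the stored integers without
depending on the exact repr format:

    >>> list(bytes())                 # no arguments: zero elements
    []
    >>> list(bytes(3))                # an int creates that many zero bytes
    [0, 0, 0]
    >>> list(bytes([65, 66, 67]))     # an iterable of ints is copied
    [65, 66, 67]
    >>> list(bytes(u"abc", "ascii"))  # a unicode string is encoded first
    [97, 98, 99]
    >>> bytes([1, 2, 300])            # out-of-range element
    ValueError: initializer element out of range
    >>> bytes([1, 2], "ascii")        # encoding with a non-string initializer
    TypeError: no encoding allowed for this initializer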
The .__repr__() method returns a string that can be evaluated to
generate a new bytes object containing the same sequence of
@@ -76,13 +92,10 @@ Specification

    'bytes([0x0a, 0x14, 0x1e])'
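As an illustrative sketch (not from the PEP) of the round-trip
property just described:

    >>> b = bytes([10, 20, 30])
    >>> eval(repr(b)) == b            # the repr evaluates back to an equal object
    True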
The object has a .decode() method equivalent to the .decode()
method of the str object.  The object has a classmethod .fromhex()
that takes a string of characters from the set [0-9a-fA-F ] and
returns a bytes object (similar to binascii.unhexlify).  For
example:

    >>> bytes.fromhex('5c5350ff')
    bytes([92, 83, 80, 255])
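Since the accepted character set includes a space, presumably
whitespace may be used to group the hex digits; an illustrative
(non-normative) example:

    >>> bytes.fromhex('5c 53 50 ff') == bytes.fromhex('5c5350ff')
    True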
@@ -96,102 +109,118 @@ Specification

    '5c5350ff'
The bytes object has some methods similar to list methods, and
others similar to str methods.  Here is a complete list of
methods, with their approximate signatures:
    .__add__(bytes) -> bytes
    .__contains__(int | bytes) -> bool
    .__delitem__(int | slice) -> None
    .__delslice__(int, int) -> None
    .__eq__(bytes) -> bool
    .__ge__(bytes) -> bool
    .__getitem__(int | slice) -> int | bytes
    .__getslice__(int, int) -> bytes
    .__gt__(bytes) -> bool
    .__iadd__(bytes) -> bytes
    .__imul__(int) -> bytes
    .__iter__() -> iterator
    .__le__(bytes) -> bool
    .__len__() -> int
    .__lt__(bytes) -> bool
    .__mul__(int) -> bytes
    .__ne__(bytes) -> bool
    .__reduce__(...) -> ...
    .__reduce_ex__(...) -> ...
    .__repr__() -> str
    .__reversed__() -> bytes
    .__rmul__(int) -> bytes
    .__setitem__(int | slice, int | iterable[int]) -> None
    .__setslice__(int, int, iterable[int]) -> None
    .append(int) -> None
    .count(int) -> int
    .decode(str) -> str | unicode  # in 3.0, only str
    .endswith(bytes) -> bool
    .extend(iterable[int]) -> None
    .find(bytes) -> int
    .index(bytes | int) -> int
    .insert(int, int) -> None
    .join(iterable[bytes]) -> bytes
    .partition(bytes) -> (bytes, bytes, bytes)
    .pop([int]) -> int
    .remove(int) -> None
    .replace(bytes, bytes) -> bytes
    .rindex(bytes | int) -> int
    .rpartition(bytes) -> (bytes, bytes, bytes)
    .split(bytes) -> list[bytes]
    .startswith(bytes) -> bool
    .reverse() -> None
    .rfind(bytes) -> int
    .rsplit(bytes) -> list[bytes]
    .translate(bytes, [bytes]) -> bytes
Note the conspicuous absence of .isupper(), .upper(), and friends.
(But see "Open Issues" below.)  There is no .__hash__() because
the object is mutable.  There is no use case for a .sort() method.

The bytes type also supports the buffer interface, supporting
reading and writing binary (but not character) data.
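To illustrate a few of the methods listed above (an informal sketch,
not part of the specification):

    >>> b = bytes([72, 101, 108, 108, 111])    # the ASCII codes for "Hello"
    >>> b[0]                                   # indexing returns an int
    72
    >>> b[0] = 104                             # elements are assignable ints
    >>> b.append(33)                           # list-like methods mutate in place
    >>> len(b)
    6
    >>> b.find(bytes([108, 108]))              # str-like methods take bytes arguments
    2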
Out of Scope Issues

* Python 3k will have a much different I/O subsystem.  Deciding
  how that I/O subsystem will work and interact with the bytes
  object is out of the scope of this PEP.  The expectation however
  is that binary I/O will read and write bytes, while text I/O
  will read strings.  Since the bytes type supports the buffer
  interface, the existing binary I/O operations in Python 2.6 will
  support bytes objects (see the sketch after this list).

* It has been suggested that a special method named .__bytes__()
  be added to the language to allow objects to be converted into
  byte arrays.  This decision is out of scope.
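As a sketch of the point about the buffer interface above (the file
name is made up and the example is not part of the PEP's text):

    >>> b = bytes([83, 80, 65, 77])            # the ASCII codes for "SPAM"
    >>> f = open('example.bin', 'wb')
    >>> f.write(b)                             # accepted via the buffer interface
    >>> f.close()
    >>> open('example.bin', 'rb').read()       # in 2.6, binary read() still returns a str
    'SPAM'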
Open Issues

* The .decode() method is redundant since a bytes object b can
  also be decoded by calling unicode(b, <encoding>) (in 2.6) or
  str(b, <encoding>) (in 3.0).  Do we need encode/decode methods
  at all?  In a sense the spelling using a constructor is cleaner.

* Need to specify the methods still more carefully.

* Pickling and marshalling support needs to be specified.

* Should all those list methods really be implemented?

* There is growing support for a b"..." literal.  Here's a brief
  spec.  Each invocation of b"..." produces a new bytes object
  (this is unlike "..." but similar to [...] and {...}).  Inside
  the literal, only ASCII characters and non-Unicode backslash
  escapes are allowed; non-ASCII characters not specified as
  escapes are rejected by the compiler regardless of the source
  encoding.  The resulting object's value is the same as if
  bytes(map(ord, "...")) were called.  (A sketch of this
  equivalence follows this list.)

* A case could be made for supporting .ljust(), .rjust(),
  .center() with a mandatory second argument.

* A case could be made for supporting .split() with a mandatory
  argument.

* A case could even be made for supporting .islower(), .isupper(),
  .isspace(), .isalpha(), .isalnum(), .isdigit() and the
  corresponding conversions (.lower() etc.), using the ASCII
  definitions for letters, digits and whitespace.  If this is
  accepted, the cases for .ljust(), .rjust(), .center() and
  .split() become much stronger, and they should have default
  arguments as well, using an ASCII space or all ASCII whitespace
  (for .split()).
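The following sketch illustrates the proposed b"..." literal (the
exact error message is made up; only the behavior follows the brief
spec above):

    >>> b"SPAM" == bytes(map(ord, "SPAM"))     # same value as the bytes() spelling
    True
    >>> b"ab" is b"ab"                         # each use creates a new, mutable object
    False
    >>> b"caf\xe9"[3]                          # backslash escapes are allowed
    233
    >>> b"café"                                # literal non-ASCII characters are rejected
    SyntaxError: non-ASCII character in bytes literal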
Frequently Asked Questions

Q: Why have the optional encoding argument when the encode method of
   Unicode objects does the same thing?

@@ -209,7 +238,7 @@ Questions and answers

   in which case you have to use the former.

Q: Why does bytes ignore the encoding argument if the initializer is
   a str?  (This only applies to 2.6.)

A: There is no sane meaning that the encoding can have in that case.
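An illustrative example (not from the PEP) of the behavior asked about
in the last question, following the constructor pseudo-code in the
Specification:

    >>> bytes("abc", "utf-16") == bytes("abc")   # the encoding is silently ignored for a str
    True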