python-peps/pep-0358.txt

PEP: 358
Title: The "bytes" Object
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas@arctrix.com>, Guido van Rossum <guido@python.org>
Status: Accepted
Type: Standards Track
Content-Type: text/plain
Created: 15-Feb-2006
Python-Version: 2.6, 3.0
Post-History:


Update

    This PEP has been partially superseded by PEP 3137.


Abstract

    This PEP outlines the introduction of a raw bytes sequence type.
    Adding the bytes type is one step in the transition to
    Unicode-based str objects, which will be introduced in Python 3.0.

    The PEP describes how the bytes type should work in Python 2.6, as
    well as how it should work in Python 3.0.  (Occasionally there are
    differences because in Python 2.6, we have two string types, str
    and unicode, while in Python 3.0 we will only have one string
    type, whose name will be str but whose semantics will be like the
    2.6 unicode type.)


Motivation

    Python's current string objects are overloaded.  They serve to hold
    both sequences of characters and sequences of bytes.  This
    overloading of purpose leads to confusion and bugs.  In future
    versions of Python, string objects will be used for holding
    character data.  The bytes object will fulfil the role of a byte
    container.  Eventually the unicode type will be renamed to str
    and the old str type will be removed.


Specification

    A bytes object stores a mutable sequence of integers that are in
    the range 0 to 255.  Unlike string objects, indexing a bytes
    object returns an integer.  Assigning or comparing an object that
    is not an integer to an element causes a TypeError exception.
    Assigning a value outside the range 0 to 255 to an element causes
    a ValueError exception.  The .__len__() method of bytes returns
    the number of integers stored in the sequence (i.e. the number of
    bytes).
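
    As a rough illustration of the element rules above, the sketch
    below uses Python 3's built-in bytearray as a stand-in.  That
    substitution is an assumption made for this example only; it is
    not the exact type this PEP specifies, and it does not show the
    comparison rule.

```python
# Assumption: bytearray stands in for the mutable bytes object described here.
data = bytearray([72, 105])

first = data[0]        # indexing yields an int, not a length-1 string
length = len(data)     # number of stored bytes

data[1] = 33           # an integer in the range 0..255 can be assigned

try:
    data[0] = 256      # outside the range 0..255
except ValueError as exc:
    range_error = exc

try:
    data[0] = "H"      # not an integer
except TypeError as exc:
    type_error = exc
```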

    The constructor of the bytes object has the following signature:

        bytes([initializer[, encoding]])

    If no arguments are provided then a bytes object containing zero
    elements is created and returned.  The initializer argument can be
    a string (in 2.6, either str or unicode), an iterable of integers,
    or a single integer.  The pseudo-code for the constructor
    (optimized for clear semantics, not for speed) is:

        def bytes(initializer=0, encoding=None):
            if isinstance(initializer, int): # In 2.6, int -> (int, long)
                initializer = [0]*initializer
            elif isinstance(initializer, basestring):
                if isinstance(initializer, unicode): # In 3.0, "if True"
                    if encoding is None:
                        # In 3.0, raise TypeError("explicit encoding required")
                        encoding = sys.getdefaultencoding()
                    initializer = initializer.encode(encoding)
                initializer = [ord(c) for c in initializer]
            else:
                if encoding is not None:
                    raise TypeError("no encoding allowed for this initializer")
                tmp = []
                for c in initializer:
                    if not isinstance(c, int):
                        raise TypeError("initializer must be iterable of ints")
                    if not 0 <= c < 256:
                        raise ValueError("initializer element out of range")
                    tmp.append(c)
                initializer = tmp
            new = <new bytes object of length len(initializer)>
            for i, c in enumerate(initializer):
                new[i] = c
            return new
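
    The pseudo-code above can be approximated in runnable form.  The
    following is only a sketch under stated assumptions: it follows
    the 3.0 branch (an explicit encoding is required for strings),
    the name make_bytes is hypothetical, and bytearray is used as a
    stand-in for the new object.

```python
def make_bytes(initializer=0, encoding=None):
    """Hypothetical helper mirroring the 3.0 branch of the pseudo-code."""
    if isinstance(initializer, int):
        # A single integer asks for that many zero bytes.
        values = [0] * initializer
    elif isinstance(initializer, str):
        if encoding is None:
            raise TypeError("explicit encoding required")
        values = list(initializer.encode(encoding))
    else:
        if encoding is not None:
            raise TypeError("no encoding allowed for this initializer")
        values = []
        for c in initializer:
            if not isinstance(c, int):
                raise TypeError("initializer must be iterable of ints")
            if not 0 <= c < 256:
                raise ValueError("initializer element out of range")
            values.append(c)
    # Assumption: bytearray plays the role of "<new bytes object>".
    return bytearray(values)


empty = make_bytes()
zeros = make_bytes(3)
encoded = make_bytes("\u00e9", "utf-8")
from_ints = make_bytes([10, 20, 30])
```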

    The .__repr__() method returns a string containing a bytes literal
    that can be evaluated to generate a new bytes object:

        >>> bytes([10, 20, 30])
        b'\n\x14\x1e'
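
    The round trip can be checked as follows.  This is a minimal
    sketch that assumes a Python 3 interpreter, where bytes literals
    (PEP 3112) are available; ast.literal_eval is used here instead
    of eval purely as a cautious choice for the example.

```python
import ast

original = bytes([10, 20, 30])
text = repr(original)               # the bytes-literal form
rebuilt = ast.literal_eval(text)    # evaluates the literal back to bytes
```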

    The object has a .decode() method equivalent to the .decode()
    method of the str object.  The object has a classmethod .fromhex()
    that takes a string of characters from the set [0-9a-fA-F ] and
    returns a bytes object (similar to binascii.unhexlify).  For
    example:

        >>> bytes.fromhex('5c5350ff')
        b'\\SP\xff'
        >>> bytes.fromhex('5c 53 50 ff')
        b'\\SP\xff'

    The object has a .hex() method that does the reverse conversion
    (similar to binascii.hexlify):

        >>> bytes([92, 83, 80, 255]).hex()
        '5c5350ff'
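
    For comparison with the binascii functions mentioned above, a
    short sketch (assuming a Python 3 interpreter recent enough to
    provide bytes.hex()):

```python
import binascii

packed = bytes.fromhex("5c 53 50 ff")          # spaces between byte pairs are skipped
as_hex = packed.hex()                          # returns a str
via_unhexlify = binascii.unhexlify("5c5350ff")
via_hexlify = binascii.hexlify(packed)         # returns bytes, not str
```

    Note that binascii.hexlify returns a bytes object while .hex()
    returns a string, so the two are similar but not interchangeable.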

    The bytes object has some methods similar to list methods, and
    others similar to str methods.  Here is a complete list of
    methods, with their approximate signatures:

        .__add__(bytes) -> bytes
        .__contains__(int | bytes) -> bool
        .__delitem__(int | slice) -> None
        .__delslice__(int, int) -> None
        .__eq__(bytes) -> bool
        .__ge__(bytes) -> bool
        .__getitem__(int | slice) -> int | bytes
        .__getslice__(int, int) -> bytes
        .__gt__(bytes) -> bool
        .__iadd__(bytes) -> bytes
        .__imul__(int) -> bytes
        .__iter__() -> iterator
        .__le__(bytes) -> bool
        .__len__() -> int
        .__lt__(bytes) -> bool
        .__mul__(int) -> bytes
        .__ne__(bytes) -> bool
        .__reduce__(...) -> ...
        .__reduce_ex__(...) -> ...
        .__repr__() -> str
        .__reversed__() -> bytes
        .__rmul__(int) -> bytes
        .__setitem__(int | slice, int | iterable[int]) -> None
        .__setslice__(int, int, iterable[int]) -> None
        .append(int) -> None
        .count(int) -> int
        .decode(str) -> str | unicode # in 3.0, only str
        .endswith(bytes) -> bool
        .extend(iterable[int]) -> None
        .find(bytes) -> int
        .index(bytes | int) -> int
        .insert(int, int) -> None
        .join(iterable[bytes]) -> bytes
        .partition(bytes) -> (bytes, bytes, bytes)
        .pop([int]) -> int
        .remove(int) -> None
        .replace(bytes, bytes) -> bytes
        .reverse() -> None
        .rfind(bytes) -> int
        .rindex(bytes | int) -> int
        .rpartition(bytes) -> (bytes, bytes, bytes)
        .rsplit(bytes) -> list[bytes]
        .split(bytes) -> list[bytes]
        .startswith(bytes) -> bool
        .translate(bytes, [bytes]) -> bytes

    Note the conspicuous absence of .isupper(), .upper(), and friends.
    (But see "Open Issues" below.)  There is no .__hash__() because
    the object is mutable.  There is no use case for a .sort() method.
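
    A few of the list-like and str-like methods can be tried out
    together.  The sketch below assumes Python 3's bytearray as a
    stand-in for the mutable type described here; its method set is
    close to, but not guaranteed identical to, the list above.

```python
# Assumption: bytearray stands in for the mutable bytes object.
buf = bytearray(b"spam")

buf.append(33)                   # list-like: takes a single int
last = buf.pop()                 # list-like: returns an int
buf.extend(b", eggs")            # list-like: takes an iterable of ints

parts = buf.split(b", ")         # str-like: separator is a bytes object
head, sep, tail = buf.partition(b", ")
position = buf.find(b"eggs")
starts = buf.startswith(b"spam")

try:
    hash(buf)
    hashable = True
except TypeError:
    hashable = False             # mutable, so no usable __hash__
```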

    The bytes type also supports the buffer interface, supporting
    reading and writing binary (but not character) data.
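
    What buffer support means in practice can be sketched as
    follows.  The example assumes Python 3, with bytearray standing
    in for the mutable bytes type and memoryview and io.BytesIO as
    the consumers of the buffer interface; none of these names come
    from this PEP.

```python
import io

buf = bytearray(4)                           # four zero bytes
stream = io.BytesIO(b"\x01\x02\x03\x04\x05")
count = stream.readinto(buf)                 # binary read fills the existing buffer

view = memoryview(buf)                       # view onto buf, no copy made
view[0] = 255                                # writes through to buf
view.release()
```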


Out of Scope Issues

    * Python 3k will have a much different I/O subsystem.  Deciding
      how that I/O subsystem will work and interact with the bytes
      object is out of the scope of this PEP.  The expectation however
      is that binary I/O will read and write bytes, while text I/O
      will read strings.  Since the bytes type supports the buffer
      interface, the existing binary I/O operations in Python 2.6 will
      support bytes objects.

    * It has been suggested that a special method named .__bytes__()
      be added to the language to allow objects to be converted into
      byte arrays.  This decision is out of scope.

    * A bytes literal of the form b"..." is also proposed.  This is
      the subject of PEP 3112.


Open Issues

    * The .decode() method is redundant since a bytes object b can
      also be decoded by calling unicode(b, <encoding>) (in 2.6) or
      str(b, <encoding>) (in 3.0).  Do we need encode/decode methods
      at all?  In a sense the spelling using a constructor is cleaner.

    * Need to specify the methods still more carefully.

    * Pickling and marshalling support needs to be specified.

    * Should all those list methods really be implemented?

    * A case could be made for supporting .ljust(), .rjust(),
      .center() with a mandatory second argument.

    * A case could be made for supporting .split() with a mandatory
      argument.

    * A case could even be made for supporting .islower(), .isupper(),
      .isspace(), .isalpha(), .isalnum(), .isdigit() and the
      corresponding conversions (.lower() etc.), using the ASCII
      definitions for letters, digits and whitespace.  If this is
      accepted, the cases for .ljust(), .rjust(), .center() and
      .split() become much stronger, and they should have default
      arguments as well, using an ASCII space or all ASCII whitespace
      (for .split()).
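
    The redundancy raised in the first open issue, and the
    .split() spelling with an explicit argument, can be illustrated
    with a small sketch.  It assumes Python 3 semantics and does not
    settle any of the questions above.

```python
raw = "na\u00efve".encode("utf-8")

via_method = raw.decode("utf-8")         # the .decode() spelling
via_constructor = str(raw, "utf-8")      # the constructor spelling

fields = b"a,b,c".split(b",")            # separator passed explicitly
```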


Frequently Asked Questions

    Q: Why have the optional encoding argument when the encode method of
       Unicode objects does the same thing?

    A: In the current version of Python, the encode method returns a str
       object and we cannot change that without breaking code.  The
       construct bytes(s.encode(...)) is expensive because it has to
       copy the byte sequence multiple times.  Also, Python generally
       provides two ways of converting an object of type A into an
       object of type B: ask an A instance to convert itself to a B, or
       ask the type B to create a new instance from an A. Depending on
       what A and B are, both APIs make sense; sometimes reasons of
       decoupling require that A can't know about B, in which case you
       have to use the latter approach; sometimes B can't know about A,
       in which case you have to use the former.
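
       The two conversion directions described in this answer look
       as follows in a sketch that assumes Python 3, where
       str.encode returns bytes:

```python
text = "caf\u00e9"

asked_source = text.encode("utf-8")      # the A instance converts itself to a B
asked_target = bytes(text, "utf-8")      # the type B builds an instance from an A
```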


    Q: Why does bytes ignore the encoding argument if the initializer is
       a str?  (This only applies to 2.6.)

    A: There is no sane meaning that the encoding can have in that case.
       str objects *are* byte arrays and they know nothing about the
       encoding of character data they contain.  We need to assume that
       the programmer has provided a str object that already uses the
       desired encoding. If you need something other than a pure copy of
       the bytes then you need to first decode the string.  For example:

           bytes(s.decode(encoding1), encoding2)
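
       A concrete instance of that pattern, sketched for Python 3
       (where the byte string is a bytes object rather than a 2.6
       str) with Latin-1 and UTF-8 chosen arbitrarily as the two
       encodings:

```python
latin1_data = b"caf\xe9"                          # "cafe" with e-acute, in Latin-1

plain_copy = bytes(latin1_data)                   # pure copy, no re-encoding
utf8_data = bytes(latin1_data.decode("latin-1"), "utf-8")
```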


    Q: Why not have the encoding argument default to Latin-1 (or some
       other encoding that covers the entire byte range) rather than
       ASCII?

    A: The system default encoding for Python is ASCII.  It seems least
       confusing to use that default.  Also, in Py3k, using Latin-1 as
       the default might not be what users expect.  For example, they
       might prefer a Unicode encoding.  Any default will not always
       work as expected.  At least ASCII will complain loudly if you try
       to encode non-ASCII data.
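
       The difference in failure behaviour can be seen in a short
       sketch (assuming Python 3; the sample strings are made up for
       the example):

```python
text = "caf\u00e9"

try:
    bytes(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True                  # ASCII refuses non-ASCII data

latin1_bytes = bytes(text, "latin-1")    # succeeds without complaint

# Latin-1 maps every byte to a character, so decoding UTF-8 data with
# it never raises; the result is simply wrong.
mojibake = b"caf\xc3\xa9".decode("latin-1")
```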


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
+								PEP: 358
-												Add PEP 358 to index.

											
										
										
											2006-02-22 15:43:33 -05:00
+								Title: The "bytes" Object
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
+								Version: $Revision$
 								Last-Modified: $Date$
-												Fix my email.

											
										
										
											2007-06-20 00:21:24 -04:00
+								Author: Neil Schemenauer <nas@arctrix.com>, Guido van Rossum <guido@python.org>
-												PEP 358 is accepted.

											
										
										
											2007-03-18 16:04:00 -04:00
+								Status: Accepted
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
+								Type: Standards Track
 								Content-Type: text/plain
 								Created: 15-Feb-2006
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								Python-Version: 2.6, 3.0
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
+								Post-History:
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								Update
 								    This PEP has partially been superseded by PEP 3137.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
+								Abstract
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    This PEP outlines the introduction of a raw bytes sequence type.
 								    Adding the bytes type is one step in the transition to Unicode
 								    based str objects which will be introduced in Python 3.0.
 								    The PEP describes how the bytes type should work in Python 2.6, as
 								    well as how it should work in Python 3.0.  (Occasionally there are
 								    differences because in Python 2.6, we have two string types, str
 								    and unicode, while in Python 3.0 we will only have one string
 								    type, whose name will be str but whose semantics will be like the
 .6 unicode type.)
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
 								Motivation
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    Python's current string objects are overloaded.  They serve to hold
 								    both sequences of characters and sequences of bytes.  This
 								    overloading of purpose leads to confusion and bugs.  In future
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								    versions of Python, string objects will be used for holding
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    character data.  The bytes object will fulfil the role of a byte
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								    container.  Eventually the unicode type will be renamed to str
 								    and the old str type will be removed.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
 								Specification
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    A bytes object stores a mutable sequence of integers that are in
 								    the range 0 to 255.  Unlike string objects, indexing a bytes
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								    object returns an integer.  Assigning or comparing an object that
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    is not an integer to an element causes a TypeError exception.
 								    Assigning an element to a value outside the range 0 to 255 causes
 								    a ValueError exception.  The .__len__() method of bytes returns
 								    the number of integers stored in the sequence (i.e. the number of
 								    bytes).
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
 								    The constructor of the bytes object has the following signature:
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								        bytes([initializer[, encoding]])
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    If no arguments are provided then a bytes object containing zero
 								    elements is created and returned.  The initializer argument can be
 								    a string (in 2.6, either str or unicode), an iterable of integers,
 								    or a single integer.  The pseudo-code for the constructor
 								    (optimized for clear semantics, not for speed) is:
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								        def bytes(initializer=0, encoding=None):
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								            if isinstance(initializer, int): # In 2.6, int -> (int, long)
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								                initializer = [0]*initializer
 								            elif isinstance(initializer, basestring):
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								                if isinstance(initializer, unicode): # In 3.0, "if True"
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								                    if encoding is None:
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								                        # In 3.0, raise TypeError("explicit encoding required")
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								                        encoding = sys.getdefaultencoding()
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								                    initializer = initializer.encode(encoding)
 								                initializer = [ord(c) for c in initializer]
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								            else:
 								                if encoding is not None:
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								                    raise TypeError("no encoding allowed for this initializer")
 								                tmp = []
 								                for c in initializer:
 								                    if not isinstance(c, int):
 								                        raise TypeError("initializer must be iterable of ints")
 								                    if not 0 <= c < 256:
 								                        raise ValueError("initializer element out of range")
 								                    tmp.append(c)
 								                initializer = tmp
 								            new = <new bytes object of length len(initializer)>
 								            for i, c in enumerate(initializer):
 								                new[i] = c
 								            return new
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    The .__repr__() method returns a string that can be evaluated to
-												Mark a few PEP 3100 items as done, update PEP 358 wrt. bytes literals.

											
										
										
											2007-02-26 12:33:15 -05:00
+								    generate a new bytes object containing a bytes literal:
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Mark a few PEP 3100 items as done, update PEP 358 wrt. bytes literals.

											
										
										
											2007-02-26 12:33:15 -05:00
+								        >>> bytes([10, 20, 30])
 								        b'\n\x14\x1e'
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    The object has a .decode() method equivalent to the .decode()
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    method of the str object.  The object has a classmethod .fromhex()
-												Correctly specify the set of hexadecimal characters.

											
										
										
											2007-02-27 03:39:07 -05:00
+								    that takes a string of characters from the set [0-9a-fA-F ] and
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    returns a bytes object (similar to binascii.unhexlify).  For
 								    example:
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
 								        >>> bytes.fromhex('5c5350ff')
-												Mark a few PEP 3100 items as done, update PEP 358 wrt. bytes literals.

											
										
										
											2007-02-26 12:33:15 -05:00
+									b'\\SP\xff'
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								        >>> bytes.fromhex('5c 53 50 ff')
-												Mark a few PEP 3100 items as done, update PEP 358 wrt. bytes literals.

											
										
										
											2007-02-26 12:33:15 -05:00
+									b'\\SP\xff'
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
+								    The object has a .hex() method that does the reverse conversion
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								    (similar to binascii.hexlify):
-												Mark a few PEP 3100 items as done, update PEP 358 wrt. bytes literals.

											
										
										
											2007-02-26 12:33:15 -05:00
+								        >> bytes([92, 83, 80, 255]).hex()
-												Reformat.

											
										
										
											2006-02-22 15:49:37 -05:00
+								        '5c5350ff'
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								    The bytes object has some methods similar to list methods, and
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    others similar to str methods.  Here is a complete list of
 								    methods, with their approximate signatures:
 								        .__add__(bytes) -> bytes
 								        .__contains__(int | bytes) -> bool
 								        .__delitem__(int | slice) -> None
 								        .__delslice__(int, int) -> None
 								        .__eq__(bytes) -> bool
 								        .__ge__(bytes) -> bool
 								        .__getitem__(int | slice) -> int | bytes
 								        .__getslice__(int, int) -> bytes
 								        .__gt__(bytes) -> bool
 								        .__iadd__(bytes) -> bytes
 								        .__imul__(int) -> bytes
 								        .__iter__() -> iterator
 								        .__le__(bytes) -> bool
 								        .__len__() -> int
 								        .__lt__(bytes) -> bool
 								        .__mul__(int) -> bytes
 								        .__ne__(bytes) -> bool
 								        .__reduce__(...) -> ...
 								        .__reduce_ex__(...) -> ...
 								        .__repr__() -> str
 								        .__reversed__() -> bytes
 								        .__rmul__(int) -> bytes
 								        .__setitem__(int | slice, int | iterable[int]) -> None
 								        .__setslice__(int, int, iterable[int]) -> Bote
 								        .append(int) -> None
 								        .count(int) -> int
 								        .decode(str) -> str | unicode # in 3.0, only str
 								        .endswith(bytes) -> bool
 								        .extend(iterable[int]) -> None
 								        .find(bytes) -> int
 								        .index(bytes | int) -> int
 								        .insert(int, int) -> None
 								        .join(iterable[bytes]) -> bytes
 								        .partition(bytes) -> (bytes, bytes, bytes)
 								        .pop([int]) -> int
 								        .remove(int) -> None
 								        .replace(bytes, bytes) -> bytes
 								        .rindex(bytes | int) -> int
 								        .rpartition(bytes) -> (bytes, bytes, bytes)
 								        .split(bytes) -> list[bytes]
 								        .startswith(bytes) -> bool
 								        .reverse() -> None
 								        .rfind(bytes) -> int
 								        .rindex(bytes | int) -> int
 								        .rsplit(bytes) -> list[bytes]
 								        .translate(bytes, [bytes]) -> bytes
-												Update the bytes object to better resemble my intentions.

											
										
										
											2007-02-22 18:57:46 -05:00
 								    Note the conspicuous absence of .isupper(), .upper(), and friends.
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    (But see "Open Issues" below.)  There is no .__hash__() because
 								    the object is mutable.  There is no use case for a .sort() method.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    The bytes type also supports the buffer interface, supporting
 								    reading and writing binary (but not character) data.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								Out of Scope Issues
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    * Python 3k will have a much different I/O subsystem.  Deciding
 								      how that I/O subsystem will work and interact with the bytes
 								      object is out of the scope of this PEP.  The expectation however
 								      is that binary I/O will read and write bytes, while text I/O
 								      will read strings.  Since the bytes type supports the buffer
 								      interface, the existing binary I/O operations in Python 2.6 will
 								      support bytes objects.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								    * It has been suggested that a special method named .__bytes__()
-												Mark PEP 3137 as accepted.
Mark Summerfield sent some comments on PEP 358 (I also referenced PEP 3137).

											
										
										
											2007-10-03 12:04:53 -04:00
+								      be added to the language to allow objects to be converted into
 								      byte arrays.  This decision is out of scope.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Add the bytes literal PEP, by Jason Orendorff.
And accept it!

											
										
										
											2007-02-24 00:42:52 -05:00
+								    * A bytes literal of the form b"..." is also proposed.  This is
 								      the subject of PEP 3112.
-												Add 'The "bytes" object' PEP.

											
										
										
											2006-02-22 15:40:03 -05:00
-												Another update, clarifying (I hope) the method signatures and mentioning
other stuff that came up over dinner.

											
										
										
											2007-02-22 23:31:15 -05:00
+								Open Issues
 								    * The .decode() method is redundant since a bytes object b can
 								      also be decoded by calling unicode(b, <encoding>) (in 2.6) or
      str(b, <encoding>) (in 3.0).  Do we need encode/decode methods
      at all?  In a sense the spelling using a constructor is cleaner.

    * Need to specify the methods still more carefully.

    * Pickling and marshalling support need to be specified.

    * Should all those list methods really be implemented?

    * A case could be made for supporting .ljust(), .rjust(),
      .center() with a mandatory second argument.

    * A case could be made for supporting .split() with a mandatory
      argument.

    * A case could even be made for supporting .islower(), .isupper(),
      .isspace(), .isalpha(), .isalnum(), .isdigit() and the
      corresponding conversions (.lower() etc.), using the ASCII
      definitions for letters, digits and whitespace.  If this is
      accepted, the cases for .ljust(), .rjust(), .center() and
      .split() become much stronger, and they should have default
      arguments as well, using an ASCII space or all ASCII whitespace
      (for .split()).
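
    For orientation only, and not as part of this PEP's specification:
    the bytes type in released Python 3 does provide ASCII-based
    versions of the methods discussed in the last three items.  The
    sketch below assumes a current Python 3 interpreter; the variable
    names are illustrative.

```python
# Assumes a current Python 3 interpreter; names are illustrative only.
raw = b"  GET /index.html  HTTP/1.0 "

# .split() with no argument splits on runs of ASCII whitespace.
fields = raw.split()

# .ljust(), .rjust() and .center() pad with an ASCII space by default;
# an explicit fill byte may be given as a length-1 bytes object.
padded_default = b"ab".ljust(4)
padded_explicit = b"ab".rjust(4, b"0")

# Classification and case conversion use the ASCII definitions only:
# bytes outside the ASCII range are left unchanged by .lower().
is_upper = fields[0].isupper()
lowered = b"ABC\xc9".lower()
```

    Note that 0xC9 is left untouched by .lower(): only the ASCII
    letters A-Z are converted.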


Frequently Asked Questions

    Q: Why have the optional encoding argument when the encode method of
       Unicode objects does the same thing?

    A: In the current version of Python, the encode method returns a str
       object and we cannot change that without breaking code.  The
       construct bytes(s.encode(...)) is expensive because it has to
       copy the byte sequence multiple times.  Also, Python generally
       provides two ways of converting an object of type A into an
       object of type B: ask an A instance to convert itself to a B, or
       ask the type B to create a new instance from an A. Depending on
       what A and B are, both APIs make sense; sometimes reasons of
       decoupling require that A can't know about B, in which case you
       have to use the latter approach; sometimes B can't know about A,
       in which case you have to use the former.
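
       As a rough illustration of the two conversion styles, the
       sketch below assumes a current Python 3 interpreter, where str
       is the Unicode type and encode already returns bytes (unlike
       the 2.x behaviour described above).  The variable names are
       illustrative.

```python
# Assumes a current Python 3 interpreter, where str is the Unicode type.
text = "caf\u00e9"

# Ask the A instance (str) to convert itself to a B (bytes).
via_method = text.encode("utf-8")

# Ask the type B (bytes) to create a new instance from an A (str).
via_constructor = bytes(text, "utf-8")

# The reverse direction has the same two spellings.
back_via_method = via_method.decode("utf-8")
back_via_constructor = str(via_constructor, "utf-8")
```

       Both spellings give the same result in each direction.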

    Q: Why does bytes ignore the encoding argument if the initializer is
       a str?  (This only applies to 2.6.)

    A: There is no sane meaning that the encoding can have in that case.
       str objects *are* byte arrays and they know nothing about the
       encoding of character data they contain.  We need to assume that
       the programmer has provided a str object that already uses the
       desired encoding. If you need something other than a pure copy of
       the bytes then you need to first decode the string.  For example:

           bytes(s.decode(encoding1), encoding2)
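
       The same decode-then-encode step can be sketched for a current
       Python 3 interpreter, where the role of the 2.6 str is played
       by bytes.  The helper name transcode and the sample data are
       hypothetical and used only for illustration.

```python
# Assumes a current Python 3 interpreter, where the 2.6 str role
# described above is played by bytes.
def transcode(data, encoding1, encoding2):
    """Reinterpret data from encoding1 and re-encode it as encoding2."""
    # Hypothetical helper, not part of any library.
    text = data.decode(encoding1)      # bytes -> characters
    return bytes(text, encoding2)      # characters -> bytes

latin1_data = b"caf\xe9"               # "cafe" with e-acute, as Latin-1
utf8_data = transcode(latin1_data, "latin-1", "utf-8")
```

       The single Latin-1 byte 0xE9 becomes the two-byte UTF-8
       sequence 0xC3 0xA9, so the result is not a pure copy of the
       input.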

    Q: Why not have the encoding argument default to Latin-1 (or some
       other encoding that covers the entire byte range) rather than
       ASCII?

    A: The system default encoding for Python is ASCII.  It seems least
       confusing to use that default.  Also, in Py3k, using Latin-1 as
       the default might not be what users expect.  For example, they
       might prefer a Unicode encoding.  Any default will not always
       work as expected.  At least ASCII will complain loudly if you try
       to encode non-ASCII data.
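
       To make "complain loudly" concrete, the sketch below assumes a
       current Python 3 interpreter and uses the str.encode method
       with an explicit codec name, not the constructor default
       discussed in this PEP.  The variable names are illustrative.

```python
# Assumes a current Python 3 interpreter; names are illustrative only.
text = "caf\u00e9"                      # contains one non-ASCII character

# The ASCII codec refuses data it cannot represent.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Latin-1 and UTF-8 both succeed, but produce different byte sequences.
latin1_bytes = text.encode("latin-1")
utf8_bytes = text.encode("utf-8")
```

       A Latin-1 default would accept this input silently even when
       the caller expected a Unicode encoding such as UTF-8, whereas
       ASCII reports the problem at once.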


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: