Update the bytes object to better resemble my intentions.

Guido van Rossum 2007-02-22 23:57:46 +00:00
parent 7c97ac0ea4
commit 3ed8bc79de
2 changed files with 84 additions and 48 deletions

@ -107,7 +107,7 @@ Index by Category
I 350 Codetags Elliott
S 354 Enumerations in Python Finney
S 355 Path - Object oriented filesystem paths Lindqvist
S 358 The "bytes" Object Schemenauer
S 358 The "bytes" Object Schemenauer, GvR
S 362 Function Signature Object Cannon, Seo
S 754 IEEE 754 Floating Point Special Values Warnes
S 3101 Advanced String Formatting Talin
@ -431,7 +431,7 @@ Numerical Index
S 355 Path - Object oriented filesystem paths Lindqvist
IF 356 Python 2.5 Release Schedule Norwitz, et al
SF 357 Allowing Any Object to be Used for Slicing Oliphant
S 358 The "bytes" Object Schemenauer
S 358 The "bytes" Object Schemenauer, GvR
SW 359 The "make" Statement Bethard
I 360 Externally Maintained Packages Cannon
I 361 Python 2.6 Release Schedule Norwitz, et al

@ -2,12 +2,12 @@ PEP: 358
Title: The "bytes" Object
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas@arctrix.com>
Author: Neil Schemenauer <nas@arctrix.com>, Guido van Rossum <guido@google.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-Feb-2006
Python-Version: 2.5
Python-Version: 2.6, 3.0
Post-History:
@ -20,74 +20,86 @@ Abstract
Motivation
Python's current string objects are overloaded. They serve to hold
both sequences of characters and sequences of bytes. This
overloading of purpose leads to confusion and bugs. In future
Python's current string objects are overloaded. They serve to hold
both sequences of characters and sequences of bytes. This
overloading of purpose leads to confusion and bugs. In future
versions of Python, string objects will be used for holding
character data. The bytes object will fulfil the role of a byte
container. Eventually the unicode built-in will be renamed to str
character data. The bytes object will fulfil the role of a byte
container. Eventually the unicode built-in will be renamed to str
and the str object will be removed.
Specification
A bytes object stores a mutable sequence of integers that are in the
range 0 to 255. Unlike string objects, indexing a bytes object
returns an integer. Assigning an element using a object that is not
an integer causes a TypeError exception. Assigning an element to a
value outside the range 0 to 255 causes a ValueError exception. The
__len__ method of bytes returns the number of integers stored in the
sequence (i.e. the number of bytes).
A bytes object stores a mutable sequence of integers that are in
the range 0 to 255. Unlike string objects, indexing a bytes
object returns an integer. Assigning an element using an object
that is not an integer causes a TypeError exception. Assigning an
element to a value outside the range 0 to 255 causes a ValueError
exception. The .__len__() method of bytes returns the number of
integers stored in the sequence (i.e. the number of bytes).
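The rules in this paragraph can be illustrated with a short sketch. It uses Python 3's bytearray as a stand-in for the object proposed here; that substitution, and the helper name try_assign, are assumptions of this example and not part of the PEP text.

```python
# Sketch only: bytearray (Python 3) stands in for the proposed
# mutable bytes object.

def try_assign(buf, index, value):
    """Hypothetical helper: assign buf[index] = value and return the
    name of the exception raised, or None if the assignment worked."""
    try:
        buf[index] = value
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


b = bytearray([10, 20, 30])

first = b[0]        # indexing yields an int, not a one-character string
length = len(b)     # number of bytes stored

ok = try_assign(b, 0, 255)         # in range: accepted
not_int = try_assign(b, 0, "a")    # not an integer
too_big = try_assign(b, 0, 256)    # outside range(256)
```

Here not_int names a TypeError and too_big names a ValueError, matching the two failure cases the paragraph describes.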
The constructor of the bytes object has the following signature:
bytes([initialiser[, encoding]])
If no arguments are provided then an object containing zero elements
is created and returned. The initialiser argument can be a string
or a sequence of integers. The pseudo-code for the constructor is:
is created and returned. The initialiser argument can be a string,
a sequence of integers, or a single integer. The pseudo-code for the
constructor is:
def bytes(initialiser=[], encoding=None):
if isinstance(initialiser, basestring):
if isinstance(initialiser, unicode):
if isinstance(initialiser, int): # In 2.6, (int, long)
initialiser = [0]*initialiser
elif isinstance(initialiser, basestring):
if isinstance(initialiser, unicode): # In 3.0, always
if encoding is None:
# In 3.0, raise TypeError("explicit encoding required")
encoding = sys.getdefaultencoding()
initialiser = initialiser.encode(encoding)
initialiser = [ord(c) for c in initialiser]
elif encoding is not None:
raise TypeError("explicit encoding invalid for non-string "
"initialiser")
create bytes object and fill with integers from initialiser
else:
if encoding is not None:
raise TypeError("explicit encoding invalid for non-string "
"initialiser")
# Create bytes object and fill with integers from initialiser
# while ensuring each integer is in range(256); initialiser
# can be any iterable
return bytes object
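A runnable approximation of the pseudo-code above, under the 3.0 rules (a text string always needs an explicit encoding), might look like the following. The name make_bytes is hypothetical, and returning a bytearray is an assumption made so the sketch runs on Python 3; neither is specified by this PEP.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical helper mirroring the constructor pseudo-code."""
    if isinstance(initialiser, int):
        # A single integer requests that many zero bytes.
        values = [0] * initialiser
    elif isinstance(initialiser, str):
        # Text must be encoded explicitly (the 3.0 behaviour).
        if encoding is None:
            raise TypeError("explicit encoding required")
        values = list(initialiser.encode(encoding))
    else:
        if encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        # Any iterable of integers is accepted.
        values = list(initialiser)

    # Ensure every element is an integer in range(256).
    for value in values:
        if not isinstance(value, int):
            raise TypeError("an integer is required")
        if value not in range(256):
            raise ValueError("byte must be in range(256)")
    return bytearray(values)
```

For instance, make_bytes(3) gives three zero bytes, make_bytes("hi", "ascii") gives the two bytes 104 and 105, and make_bytes([1, 2], "ascii") raises TypeError.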
The __repr__ method returns a string that can be evaluated to
The .__repr__() method returns a string that can be evaluated to
generate a new bytes object containing the same sequence of
integers. The sequence is represented by a list of ints. For
example:
integers. The sequence is represented by a list of ints using
hexadecimal notation. For example:
>>> repr(bytes([10, 20, 30]))
'bytes([10, 20, 30])'
'bytes([0x0a, 0x14, 0x1e])'
The object has a decode method equivalent to the decode method of
the str object. The object has a classmethod fromhex that takes a
string of characters from the set [0-9a-zA-Z ] and returns a bytes
object (similar to binascii.unhexlify). For example:
The object has a .decode() method equivalent to the .decode()
method of the str object. (This is redundant since it can also be
decoded by calling unicode(b, <encoding>) (in 2.6) or str(b,
<encoding>) (in 3.0); do we need encode/decode methods? In a
sense the spelling using a constructor is cleaner.) The object
has a classmethod .fromhex() that takes a string of characters
from the set [0-9a-fA-F ] and returns a bytes object (similar to
binascii.unhexlify). For example:
>>> bytes.fromhex('5c5350ff')
bytes([92, 83, 80, 255])
>>> bytes.fromhex('5c 53 50 ff')
bytes([92, 83, 80, 255])
The object has a hex method that does the reverse conversion
The object has a .hex() method that does the reverse conversion
(similar to binascii.hexlify):
>>> bytes([92, 83, 80, 255]).hex()
'5c5350ff'
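The round trip between hex text and byte values can be tried out with Python 3's bytearray, which provides fromhex() and hex() methods of the same shape. Using bytearray here is an assumption of this sketch; the comparison with binascii follows the "similar to" remarks above.

```python
import binascii

# Spaces between byte pairs are accepted, as in the second example above.
data = bytearray.fromhex("5c 53 50 ff")

values = list(data)     # the integers stored in the sequence
text = data.hex()       # reverse conversion, back to hex digits

# The binascii functions perform the same conversions on immutable bytes.
same_decode = bytes(data) == binascii.unhexlify("5c5350ff")
same_encode = binascii.hexlify(data) == text.encode("ascii")
```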
The bytes object has methods similar to the list object:
The bytes object has some methods similar to list methods, and
others similar to str methods:
__add__
__contains__
__contains__ (with int arg, like list; with bytes arg, like str)
__delitem__
__delslice__
__eq__
@ -95,7 +107,6 @@ Specification
__getitem__
__getslice__
__gt__
__hash__
__iadd__
__imul__
__iter__
@ -107,16 +118,39 @@ Specification
__reduce__
__reduce_ex__
__repr__
__reversed__
__rmul__
__setitem__
__setslice__
append
count
decode
endswith
extend
find
index
insert
join
partition
pop
remove
replace
rindex
rpartition
split
startswith
reverse
rfind
rindex
rsplit
translate
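To give a feel for how the list-like and str-like halves of this method list combine, here is a small sketch. It again relies on Python 3's bytearray as a stand-in, which is an assumption of the example and not a statement about the final API of this proposal.

```python
b = bytearray(b"hello world")

# List-like operations work on individual integers.
b.append(33)            # 33 is "!"
b.extend(b"?")          # extend with another byte sequence
last = b.pop()          # removes and returns the trailing "?" as an int

# __contains__ accepts an int (like list) or a byte sequence (like str).
has_int = 104 in b      # 104 is "h"
has_sub = b"wor" in b

# Str-like operations work on subsequences.
pos = b.find(b"world")
parts = b.split(b" ")
starts = b.startswith(b"hello")
```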
Note the conspicuous absence of .isupper(), .upper(), and friends.
There is no __hash__ because the object is mutable. There is no
use case for a .sort() method.
The bytes object also supports the buffer interface, which allows
reading and writing binary (but not character) data.
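Both points, the missing __hash__ and the buffer interface, can be observed with Python 3's bytearray and memoryview. Treating these as representative of the proposed object is an assumption of this sketch, and is_hashable is a hypothetical helper.

```python
def is_hashable(obj):
    """Hypothetical helper: report whether hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


buf = bytearray(b"abc")

mutable_hashable = is_hashable(buf)           # mutable, so not hashable
frozen_hashable = is_hashable(bytes(buf))     # an immutable copy is

# The buffer interface exposes the same memory for reading and writing.
with memoryview(buf) as view:
    view[0] = ord("A")        # the write is visible through buf
    tail = bytes(view[1:])    # reading a slice copies out binary data
```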
Out of scope issues
@ -127,7 +161,9 @@ Out of scope issues
(which requires lexer and parser support in addition to everything
else). Since there appears to be no immediate need for a literal
representation, designing and implementing one is out of the scope
of this PEP.
of this PEP. (Hmm... A b"..." literal accepting only ASCII
values is likely to be added to 3.0; not clear about 2.6. This
needs a PEP.)
* Python 3k will have a much different I/O subsystem. Deciding how
that I/O subsystem will work and interact with the bytes object is
@ -140,19 +176,19 @@ Out of scope issues
Unresolved issues
* Perhaps the bytes object should be implemented as a extension
module until we are more sure of the design (similar to how the
set object was prototyped).
* Need to specify the methods more carefully.
* Should the bytes object implement the buffer interface? Probably,
but we need to look into the implications of that (e.g. regex
operations on byte arrays).
* Should all those list methods really be implemented?
* Should the object implement __reversed__ and reverse? Should it
implement sort?
* A case could be made for supporting .ljust(), .rjust(),
.center() with a mandatory second argument.
* Need to clarify what some of the methods do. How are comparisons
done? Hashing? Pickling and marshalling?
* A case could be made for supporting .split() with a mandatory
argument.
* How should pickling and marshalling work?
* I probably forgot a few things.
Questions and answers
@ -174,7 +210,7 @@ Questions and answers
Q: Why does bytes ignore the encoding argument if the initialiser is
a str?
a str? (This only applies to 2.6.)
A: There is no sane meaning that the encoding can have in that case.
str objects *are* byte arrays and they know nothing about the