Update the bytes object to better resemble my intentions.
parent 7c97ac0ea4
commit 3ed8bc79de
@@ -107,7 +107,7 @@ Index by Category
I 350 Codetags Elliott
S 354 Enumerations in Python Finney
S 355 Path - Object oriented filesystem paths Lindqvist
S 358 The "bytes" Object Schemenauer
S 358 The "bytes" Object Schemenauer, GvR
S 362 Function Signature Object Cannon, Seo
S 754 IEEE 754 Floating Point Special Values Warnes
S 3101 Advanced String Formatting Talin
@@ -431,7 +431,7 @@ Numerical Index
S 355 Path - Object oriented filesystem paths Lindqvist
IF 356 Python 2.5 Release Schedule Norwitz, et al
SF 357 Allowing Any Object to be Used for Slicing Oliphant
S 358 The "bytes" Object Schemenauer
S 358 The "bytes" Object Schemenauer, GvR
SW 359 The "make" Statement Bethard
I 360 Externally Maintained Packages Cannon
I 361 Python 2.6 Release Schedule Norwitz, et al
128
pep-0358.txt
@@ -2,12 +2,12 @@ PEP: 358
Title: The "bytes" Object
Version: $Revision$
Last-Modified: $Date$
Author: Neil Schemenauer <nas@arctrix.com>
Author: Neil Schemenauer <nas@arctrix.com>, Guido van Rossum <guido@google.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-Feb-2006
Python-Version: 2.5
Python-Version: 2.6, 3.0
Post-History:
@@ -20,74 +20,86 @@ Abstract

Motivation

Python's current string objects are overloaded. They serve to hold
both sequences of characters and sequences of bytes. This
overloading of purpose leads to confusion and bugs. In future
Python's current string objects are overloaded. They serve to hold
both sequences of characters and sequences of bytes. This
overloading of purpose leads to confusion and bugs. In future
versions of Python, string objects will be used for holding
character data. The bytes object will fulfil the role of a byte
container. Eventually the unicode built-in will be renamed to str
character data. The bytes object will fulfil the role of a byte
container. Eventually the unicode built-in will be renamed to str
and the str object will be removed.


Specification

A bytes object stores a mutable sequence of integers that are in the
range 0 to 255. Unlike string objects, indexing a bytes object
returns an integer. Assigning an element using an object that is not
an integer causes a TypeError exception. Assigning an element to a
value outside the range 0 to 255 causes a ValueError exception. The
__len__ method of bytes returns the number of integers stored in the
sequence (i.e. the number of bytes).
A bytes object stores a mutable sequence of integers that are in
the range 0 to 255. Unlike string objects, indexing a bytes
object returns an integer. Assigning an element using an object
that is not an integer causes a TypeError exception. Assigning an
element to a value outside the range 0 to 255 causes a ValueError
exception. The .__len__() method of bytes returns the number of
integers stored in the sequence (i.e. the number of bytes).
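
As a rough illustration of the element semantics described in the
paragraph above, the following sketch uses Python 3's built-in
bytearray as a stand-in for the mutable bytes object proposed here.
That substitution is an assumption made for the example, and the
helper name try_assign is hypothetical; neither comes from the PEP.

```python
# Sketch only: bytearray (Python 3) stands in for the mutable bytes
# object described in this PEP. This is an assumption, not the PEP's API.
buf = bytearray([10, 20, 30])

first = buf[0]      # indexing yields an int, not a length-1 string
buf[1] = 255        # in-range integers can be assigned in place
length = len(buf)   # number of integers (bytes) stored


def try_assign(target, value):
    """Hypothetical helper: name of the exception raised by target[0] = value."""
    try:
        target[0] = value
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


non_integer = try_assign(buf, "a")    # a str is not an integer
out_of_range = try_assign(buf, 256)   # outside the range 0 to 255
print(first, length, non_integer, out_of_range)
```

The two failing assignments leave the buffer unchanged, so only the
explicit buf[1] = 255 store is visible afterwards.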

The constructor of the bytes object has the following signature:

bytes([initialiser[, encoding]])

If no arguments are provided then an object containing zero elements
is created and returned. The initialiser argument can be a string
or a sequence of integers. The pseudo-code for the constructor is:
is created and returned. The initialiser argument can be a string,
a sequence of integers, or a single integer. The pseudo-code for the
constructor is:

def bytes(initialiser=[], encoding=None):
if isinstance(initialiser, basestring):
if isinstance(initialiser, unicode):
if isinstance(initialiser, int): # In 2.6, (int, long)
initialiser = [0]*initialiser
elif isinstance(initialiser, basestring):
if isinstance(initialiser, unicode): # In 3.0, always
if encoding is None:
# In 3.0, raise TypeError("explicit encoding required")
encoding = sys.getdefaultencoding()
initialiser = initialiser.encode(encoding)
initialiser = [ord(c) for c in initialiser]
elif encoding is not None:
raise TypeError("explicit encoding invalid for non-string "
"initialiser")
create bytes object and fill with integers from initialiser
else:
if encoding is not None:
raise TypeError("explicit encoding invalid for non-string "
"initialiser")
# Create bytes object and fill with integers from initialiser
# while ensuring each integer is in range(256); initialiser
# can be any iterable
return bytes object
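
The revised constructor pseudo-code above can be approximated as a
runnable Python 3 sketch. It follows only the "In 3.0" comments (every
string is Unicode and an explicit encoding is required), returns a
bytearray as a stand-in for the proposed type, and uses the
hypothetical name make_bytes; all three choices are assumptions made
for illustration, not part of the PEP.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical helper mirroring the 3.0 branch of the pseudo-code."""
    if isinstance(initialiser, int):
        # A single integer asks for that many zero-valued elements.
        return bytearray([0] * initialiser)
    if isinstance(initialiser, str):
        # In 3.0 every string is Unicode, so an encoding is mandatory.
        if encoding is None:
            raise TypeError("explicit encoding required")
        return bytearray(initialiser.encode(encoding))
    if encoding is not None:
        raise TypeError("explicit encoding invalid for non-string initialiser")
    result = bytearray()
    for value in initialiser:  # any iterable of integers
        if not isinstance(value, int):
            raise TypeError("initialiser elements must be integers")
        if value not in range(256):
            raise ValueError("initialiser elements must be in range(256)")
        result.append(value)
    return result


print(make_bytes(3), make_bytes("abc", "ascii"), make_bytes(range(3)))
```

The 2.6 behaviour (falling back to sys.getdefaultencoding() and
accepting 8-bit str without an encoding) is deliberately left out of
this sketch.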

The __repr__ method returns a string that can be evaluated to
The .__repr__() method returns a string that can be evaluated to
generate a new bytes object containing the same sequence of
integers. The sequence is represented by a list of ints. For
example:
integers. The sequence is represented by a list of ints using
hexadecimal notation. For example:

>>> repr(bytes([10, 20, 30]))
'bytes([10, 20, 30])'
'bytes([0x0a, 0x14, 0x1e])'

The object has a decode method equivalent to the decode method of
the str object. The object has a classmethod fromhex that takes a
string of characters from the set [0-9a-fA-F ] and returns a bytes
object (similar to binascii.unhexlify). For example:
The object has a .decode() method equivalent to the .decode()
method of the str object. (This is redundant since it can also be
decoded by calling unicode(b, <encoding>) (in 2.6) or str(b,
<encoding>) (in 3.0); do we need encode/decode methods? In a
sense the spelling using a constructor is cleaner.) The object
has a classmethod .fromhex() that takes a string of characters
from the set [0-9a-fA-F ] and returns a bytes object (similar to
binascii.unhexlify). For example:

>>> bytes.fromhex('5c5350ff')
bytes([92, 83, 80, 255])
>>> bytes.fromhex('5c 53 50 ff')
bytes([92, 83, 80, 255])

The object has a hex method that does the reverse conversion
The object has a .hex() method that does the reverse conversion
(similar to binascii.hexlify):

>>> bytes([92, 83, 80, 255]).hex()
'5c5350ff'
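
The hex round trip shown above can be tried in Python 3, again using
bytearray as an assumed stand-in for the proposed type. bytearray
offers a fromhex() classmethod and a hex() method that behave like the
ones described here for these inputs.

```python
# Sketch only: bytearray stands in for the PEP's mutable bytes object.
compact = bytearray.fromhex("5c5350ff")
spaced = bytearray.fromhex("5c 53 50 ff")  # spaces between byte pairs are skipped

values = list(compact)      # the stored integers
round_trip = compact.hex()  # back to a lowercase hex string
print(values, spaced == compact, round_trip)
```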

The bytes object has methods similar to the list object:
The bytes object has some methods similar to list methods, and
others similar to str methods:

__add__
__contains__
__contains__ (with int arg, like list; with bytes arg, like str)
__delitem__
__delslice__
__eq__
@@ -95,7 +107,6 @@ Specification
__getitem__
__getslice__
__gt__
__hash__
__iadd__
__imul__
__iter__
@@ -107,16 +118,39 @@ Specification
__reduce__
__reduce_ex__
__repr__
__reversed__
__rmul__
__setitem__
__setslice__
append
count
decode
endswith
extend
find
index
insert
join
partition
pop
remove
replace
rindex
rpartition
split
startswith
reverse
rfind
rindex
rsplit
translate

Note the conspicuous absence of .isupper(), .upper(), and friends.
There is no __hash__ because the object is mutable. There is no
use case for a .sort() method.
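
Two of the points in the method list above, the dual behaviour of
__contains__ and the missing __hash__, can be observed on Python 3's
bytearray. Using bytearray as a stand-in is an assumption; it differs
from this draft in other respects (for example, it does provide
.upper()), so the sketch covers only membership, hashing and .sort().

```python
# Sketch only: bytearray stands in for the PEP's mutable bytes object.
buf = bytearray([92, 83, 80, 255])

has_int = 83 in buf                       # int argument: element test, like list
has_run = bytearray([83, 80]) in buf      # bytes argument: substring test, like str
missing_run = bytearray([80, 83]) in buf  # same values, wrong order

try:
    hash(buf)
    hashable = True
except TypeError:
    hashable = False  # mutable, so no usable __hash__

has_sort = hasattr(buf, "sort")  # no .sort() method is provided
print(has_int, has_run, missing_run, hashable, has_sort)
```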

The bytes object also supports the buffer interface, supporting reading
and writing binary (but not character) data.
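
To give a feel for what reading and writing through the buffer
interface looks like, the sketch below uses Python 3's memoryview over
a bytearray. Both are stand-ins chosen for illustration; the PEP does
not specify this API.

```python
# Sketch only: memoryview over a bytearray illustrates buffer access.
buf = bytearray(b"\x00\x01\x02\x03")
view = memoryview(buf)

view[0] = 0xFF           # single-element write goes through to buf
view[1:3] = b"\xaa\xbb"  # same-length slice write, no copy of buf
snapshot = bytes(view)   # read the current contents back out
view.release()

print(list(buf))
```

Because the view shares memory with the underlying object, the writes
show up in buf without any intermediate copy.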


Out of scope issues
@@ -127,7 +161,9 @@ Out of scope issues
(which requires lexer and parser support in addition to everything
else). Since there appears to be no immediate need for a literal
representation, designing and implementing one is out of the scope
of this PEP.
of this PEP. (Hmm... A b"..." literal accepting only ASCII
values is likely to be added to 3.0; not clear about 2.6. This
needs a PEP.)

* Python 3k will have a much different I/O subsystem. Deciding how
that I/O subsystem will work and interact with the bytes object is
@@ -140,19 +176,19 @@ Out of scope issues

Unresolved issues

* Perhaps the bytes object should be implemented as an extension
module until we are more sure of the design (similar to how the
set object was prototyped).
* Need to specify the methods more carefully.

* Should the bytes object implement the buffer interface? Probably,
but we need to look into the implications of that (e.g. regex
operations on byte arrays).
* Should all those list methods really be implemented?

* Should the object implement __reversed__ and reverse? Should it
implement sort?
* A case could be made for supporting .ljust(), .rjust(),
.center() with a mandatory second argument.

* Need to clarify what some of the methods do. How are comparisons
done? Hashing? Pickling and marshalling?
* A case could be made for supporting .split() with a mandatory
argument.

* How should pickling and marshalling work?

* I probably forgot a few things.


Questions and answers
@@ -174,7 +210,7 @@ Questions and answers


Q: Why does bytes ignore the encoding argument if the initialiser is
a str?
a str? (This only applies to 2.6.)

A: There is no sane meaning that the encoding can have in that case.
str objects *are* byte arrays and they know nothing about the