Reformat.

parent 2da9d64f5e
commit 61b80831fe

pep-0358.txt | 280

@@ -12,197 +12,195 @@ Post-History:


 Abstract
 ========

-This PEP outlines the introduction of a raw bytes sequence object.
-Adding the bytes object is one step in the transition to Unicode based
-str objects.
+This PEP outlines the introduction of a raw bytes sequence object.
+Adding the bytes object is one step in the transition to Unicode
+based str objects.


 Motivation
 ==========

-Python's current string objects are overloaded. They serve to hold
-both sequences of characters and sequences of bytes. This overloading
-of purpose leads to confusion and bugs. In future versions of Python,
-string objects will be used for holding character data. The bytes object
-will fulfil the role of a byte container. Eventually the unicode
-built-in will be renamed to str and the str object will be removed.
+Python's current string objects are overloaded. They serve to hold
+both sequences of characters and sequences of bytes. This
+overloading of purpose leads to confusion and bugs. In future
+versions of Python, string objects will be used for holding
+character data. The bytes object will fulfil the role of a byte
+container. Eventually the unicode built-in will be renamed to str
+and the str object will be removed.


 Specification
 =============

-A bytes object stores a mutable sequence of integers that are in the
-range 0 to 255. Unlike string objects, indexing a bytes object returns
-an integer. Assigning an element using an object that is not an integer
-causes a TypeError exception. Assigning an element to a value outside
-the range 0 to 255 causes a ValueError exception. The __len__ method of
-bytes returns the number of integers stored in the sequence (i.e. the
-number of bytes).
+A bytes object stores a mutable sequence of integers that are in the
+range 0 to 255. Unlike string objects, indexing a bytes object
+returns an integer. Assigning an element using an object that is not
+an integer causes a TypeError exception. Assigning an element to a
+value outside the range 0 to 255 causes a ValueError exception. The
+__len__ method of bytes returns the number of integers stored in the
+sequence (i.e. the number of bytes).
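The element rules in this paragraph can be tried out in current Python 3. The sketch below uses the built-in `bytearray` as a stand-in for the proposed mutable bytes object; that substitution is an assumption made for illustration, and the proposed type may differ in details.

```python
# Assumption: bytearray stands in for the mutable bytes object proposed here.
b = bytearray([10, 20, 30])

first = b[0]       # indexing yields an int, not a one-character string
length = len(b)    # number of integers (bytes) stored

try:
    b[0] = "a"     # assigning a non-integer
except TypeError as exc:
    type_error = exc

try:
    b[0] = 256     # assigning a value outside 0..255
except ValueError as exc:
    value_error = exc

b[0] = 255         # an in-range integer is accepted
```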

-The constructor of the bytes object has the following signature:
+The constructor of the bytes object has the following signature:

-    bytes([initialiser[, encoding]])
+    bytes([initialiser[, encoding]])

-If no arguments are provided then an object containing zero elements is
-created and returned. The initialiser argument can be a string or a
-sequence of integers. The pseudo-code for the constructor is:
+If no arguments are provided then an object containing zero elements
+is created and returned. The initialiser argument can be a string
+or a sequence of integers. The pseudo-code for the constructor is:

-    def bytes(initialiser=[], encoding=None):
-        if isinstance(initialiser, basestring):
-            if isinstance(initialiser, unicode):
-                if encoding is None:
-                    encoding = sys.getdefaultencoding()
-                initialiser = initialiser.encode(encoding)
-            initialiser = [ord(c) for c in initialiser]
-        elif encoding is not None:
-            raise TypeError("explicit encoding invalid for non-string "
-                            "initialiser")
-        create bytes object and fill with integers from initialiser
-        return bytes object
+    def bytes(initialiser=[], encoding=None):
+        if isinstance(initialiser, basestring):
+            if isinstance(initialiser, unicode):
+                if encoding is None:
+                    encoding = sys.getdefaultencoding()
+                initialiser = initialiser.encode(encoding)
+            initialiser = [ord(c) for c in initialiser]
+        elif encoding is not None:
+            raise TypeError("explicit encoding invalid for non-string "
+                            "initialiser")
+        create bytes object and fill with integers from initialiser
+        return bytes object
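The pseudo-code above targets the Python 2 type names (`basestring`, `unicode`). A minimal runnable sketch of the same branching in Python 3 might look as follows. The name `make_bytes` is hypothetical, `bytearray` is used as the mutable container, and Python 3's `str` and `bytes` are assumed to play the roles of `unicode` and the old `str`. Note that `sys.getdefaultencoding()` returns UTF-8 on Python 3 rather than ASCII.

```python
import sys


def make_bytes(initialiser=(), encoding=None):
    """Hypothetical Python 3 rendering of the constructor pseudo-code."""
    if isinstance(initialiser, str):
        # Character data: encode it, falling back to the default encoding.
        if encoding is None:
            encoding = sys.getdefaultencoding()
        initialiser = initialiser.encode(encoding)
    elif isinstance(initialiser, (bytes, bytearray)):
        # Already raw bytes: the encoding argument is ignored.
        pass
    elif encoding is not None:
        raise TypeError("explicit encoding invalid for non-string "
                        "initialiser")
    # Fill a mutable byte container with the integers.
    return bytearray(initialiser)


empty = make_bytes()
from_text = make_bytes("abc", "ascii")
from_ints = make_bytes([1, 2, 3])
```

Calling `make_bytes([1, 2, 3], "ascii")` raises `TypeError`, matching the `elif` branch of the pseudo-code.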

-The __repr__ method returns a string that can be evaluated to generate a
-new bytes object containing the same sequence of integers. The sequence
-is represented by a list of ints. For example:
+The __repr__ method returns a string that can be evaluated to
+generate a new bytes object containing the same sequence of
+integers. The sequence is represented by a list of ints. For
+example:

-    >>> repr(bytes([10, 20, 30]))
-    'bytes([10, 20, 30])'
+    >>> repr(bytes([10, 20, 30]))
+    'bytes([10, 20, 30])'
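The round-trip property described here (evaluating the repr gives back an equal object) can be checked in Python 3 with `bytearray`, used again as an assumed stand-in. Its repr uses a different textual form from the `bytes([...])` form proposed above, so only the round trip is illustrated, not the exact output.

```python
# Assumption: bytearray stands in for the proposed type; its repr format differs.
original = bytearray([10, 20, 30])
text = repr(original)   # a string containing a Python expression
rebuilt = eval(text)    # evaluating it builds a new, equal object
```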

-The object has a decode method equivalent to the decode method of the
-str object. The object has a classmethod fromhex that takes a string of
-characters from the set [0-9a-fA-F ] and returns a bytes object (similar
-to binascii.unhexlify). For example:
+The object has a decode method equivalent to the decode method of
+the str object. The object has a classmethod fromhex that takes a
+string of characters from the set [0-9a-fA-F ] and returns a bytes
+object (similar to binascii.unhexlify). For example:

-    >>> bytes.fromhex('5c5350ff')
-    bytes([92, 83, 80, 255])
-    >>> bytes.fromhex('5c 53 50 ff')
-    bytes([92, 83, 80, 255])
+    >>> bytes.fromhex('5c5350ff')
+    bytes([92, 83, 80, 255])
+    >>> bytes.fromhex('5c 53 50 ff')
+    bytes([92, 83, 80, 255])
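For comparison, Python 3's `bytearray.fromhex` and `binascii.unhexlify` behave much like the classmethod described above. This is a sketch using those existing calls, not the proposed API itself.

```python
import binascii

compact = bytearray.fromhex("5c5350ff")
spaced = bytearray.fromhex("5c 53 50 ff")        # spaces between byte pairs are skipped
via_binascii = binascii.unhexlify("5c5350ff")    # returns an immutable bytes object
```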

-The object has a hex method that does the reverse conversion (similar to
-binascii.hexlify):
+The object has a hex method that does the reverse conversion
+(similar to binascii.hexlify):

-    >>> bytes([92, 83, 80, 255]).hex()
-    '5c5350ff'
+    >>> bytes([92, 83, 80, 255]).hex()
+    '5c5350ff'
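The reverse direction can be sketched the same way with the existing Python 3 calls. One difference worth noting is that `binascii.hexlify` returns a bytes object, while the `hex` method returns text.

```python
import binascii

data = bytearray([92, 83, 80, 255])
hex_text = data.hex()                  # text form of the hex digits
hex_bytes = binascii.hexlify(data)     # same digits, as a bytes object
round_trip = bytearray.fromhex(hex_text)
```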

-The bytes object has methods similar to the list object:
+The bytes object has methods similar to the list object:

-    __add__
-    __contains__
-    __delitem__
-    __delslice__
-    __eq__
-    __ge__
-    __getitem__
-    __getslice__
-    __gt__
-    __hash__
-    __iadd__
-    __imul__
-    __iter__
-    __le__
-    __len__
-    __lt__
-    __mul__
-    __ne__
-    __reduce__
-    __reduce_ex__
-    __repr__
-    __rmul__
-    __setitem__
-    __setslice__
-    append
-    count
-    extend
-    index
-    insert
-    pop
-    remove
+    __add__
+    __contains__
+    __delitem__
+    __delslice__
+    __eq__
+    __ge__
+    __getitem__
+    __getslice__
+    __gt__
+    __hash__
+    __iadd__
+    __imul__
+    __iter__
+    __le__
+    __len__
+    __lt__
+    __mul__
+    __ne__
+    __reduce__
+    __reduce_ex__
+    __repr__
+    __rmul__
+    __setitem__
+    __setslice__
+    append
+    count
+    extend
+    index
+    insert
+    pop
+    remove
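A few of the list-like methods named above, exercised on Python 3's `bytearray` as an assumed stand-in. This only illustrates the list-style behaviour; it does not cover every method in the list.

```python
# Assumption: bytearray stands in for the proposed mutable bytes object.
b = bytearray([1, 2, 3])
b.append(4)           # 1 2 3 4
b.extend([5, 2])      # 1 2 3 4 5 2
b.insert(0, 9)        # 9 1 2 3 4 5 2
b.remove(3)           # 9 1 2 4 5 2  (first occurrence of the value 3)
last = b.pop()        # removes and returns the trailing 2
twos = b.count(2)     # occurrences of the value 2
pos = b.index(4)      # position of the first 4
del b[0]              # __delitem__: drops the leading 9
joined = b + bytearray([0])   # __add__ builds a new object
```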


 Out of scope issues
 ===================

-* If we provide a literal syntax for bytes then it should look distinctly
-  different than the syntax for literal strings. Also, a new type, even
-  built-in, is much less drastic than a new literal (which requires
-  lexer and parser support in addition to everything else). Since there
-  appears to be no immediate need for a literal representation,
-  designing and implementing one is out of the scope of this PEP.
+* If we provide a literal syntax for bytes then it should look
+  distinctly different than the syntax for literal strings. Also, a
+  new type, even built-in, is much less drastic than a new literal
+  (which requires lexer and parser support in addition to everything
+  else). Since there appears to be no immediate need for a literal
+  representation, designing and implementing one is out of the scope
+  of this PEP.

-* Python 3k will have a much different I/O subsystem. Deciding how that
-  I/O subsystem will work and interact with the bytes object is out of
-  the scope of this PEP.
+* Python 3k will have a much different I/O subsystem. Deciding how
+  that I/O subsystem will work and interact with the bytes object is
+  out of the scope of this PEP.

-* It has been suggested that a special method named __bytes__ be added
-  to the language to allow objects to be converted into byte arrays.
-  This decision is out of scope.
+* It has been suggested that a special method named __bytes__ be
+  added to the language to allow objects to be converted into byte
+  arrays. This decision is out of scope.


 Unresolved issues
 =================

-* Perhaps the bytes object should be implemented as an extension module
-  until we are more sure of the design (similar to how the set object
-  was prototyped).
+* Perhaps the bytes object should be implemented as an extension
+  module until we are more sure of the design (similar to how the
+  set object was prototyped).

-* Should the bytes object implement the buffer interface? Probably, but
-  we need to look into the implications of that (e.g. regex operations
-  on byte arrays).
+* Should the bytes object implement the buffer interface? Probably,
+  but we need to look into the implications of that (e.g. regex
+  operations on byte arrays).

-* Should the object implement __reversed__ and reverse? Should it
-  implement sort?
+* Should the object implement __reversed__ and reverse? Should it
+  implement sort?

-* Need to clarify what some of the methods do. How are comparisons
-  done? Hashing? Pickling and marshalling?
+* Need to clarify what some of the methods do. How are comparisons
+  done? Hashing? Pickling and marshalling?


 Questions and answers
 =====================

-Q: Why have the optional encoding argument when the encode method of
-   Unicode objects does the same thing?
+Q: Why have the optional encoding argument when the encode method of
+   Unicode objects does the same thing?

-A: In the current version of Python, the encode method returns a str
-   object and we cannot change that without breaking code. The construct
-   bytes(s.encode(...)) is expensive because it has to copy the byte
-   sequence multiple times. Also, Python generally provides two ways of
-   converting an object of type A into an object of type B: ask an A
-   instance to convert itself to a B, or ask the type B to create a new
-   instance from an A. Depending on what A and B are, both APIs make
-   sense; sometimes reasons of decoupling require that A can't know
-   about B, in which case you have to use the latter approach; sometimes
-   B can't know about A, in which case you have to use the former.
+A: In the current version of Python, the encode method returns a str
+   object and we cannot change that without breaking code. The
+   construct bytes(s.encode(...)) is expensive because it has to
+   copy the byte sequence multiple times. Also, Python generally
+   provides two ways of converting an object of type A into an
+   object of type B: ask an A instance to convert itself to a B, or
+   ask the type B to create a new instance from an A. Depending on
+   what A and B are, both APIs make sense; sometimes reasons of
+   decoupling require that A can't know about B, in which case you
+   have to use the latter approach; sometimes B can't know about A,
+   in which case you have to use the former.
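The two conversion directions described in this answer both exist in Python 3, which makes for a compact illustration. This shows Python 3 behaviour, where `encode` already returns a bytes object, rather than the Python version the answer discusses.

```python
text = "abc"

# Ask the A instance (the string) to convert itself to a B.
from_instance = text.encode("ascii")

# Ask the type B to create a new instance from an A.
from_type = bytes(text, "ascii")
```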


-Q: Why does bytes ignore the encoding argument if the initialiser is a
-   str?
+Q: Why does bytes ignore the encoding argument if the initialiser is
+   a str?

-A: There is no sane meaning that the encoding can have in that case.
-   str objects *are* byte arrays and they know nothing about the
-   encoding of character data they contain. We need to assume that the
-   programmer has provided a str object that already uses the desired
-   encoding. If you need something other than a pure copy of the bytes
-   then you need to first decode the string. For example:
+A: There is no sane meaning that the encoding can have in that case.
+   str objects *are* byte arrays and they know nothing about the
+   encoding of character data they contain. We need to assume that
+   the programmer has provided a str object that already uses the
+   desired encoding. If you need something other than a pure copy of
+   the bytes then you need to first decode the string. For example:

-       bytes(s.decode(encoding1), encoding2)
+       bytes(s.decode(encoding1), encoding2)
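As a concrete instance of the decode-then-encode pattern above, here is a Python 3 sketch. The sample data and the two encodings (Latin-1 in, UTF-8 out) are chosen purely for illustration.

```python
s = b"caf\xe9"   # the text "café" encoded as Latin-1 (illustrative data)

# Decode with the source encoding, then build bytes in the target encoding.
recoded = bytes(s.decode("latin-1"), "utf-8")

# With no decoding step, the result is a plain copy of the same bytes.
plain_copy = bytes(s)
```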


-Q: Why not have the encoding argument default to Latin-1 (or some other
-   encoding that covers the entire byte range) rather than ASCII ?
+Q: Why not have the encoding argument default to Latin-1 (or some
+   other encoding that covers the entire byte range) rather than
+   ASCII?

-A: The system default encoding for Python is ASCII. It seems least
-   confusing to use that default. Also, in Py3k, using Latin-1 as
-   the default might not be what users expect. For example, they might
-   prefer a Unicode encoding. No default will always work as
-   expected. At least ASCII will complain loudly if you try to encode
-   non-ASCII data.
+A: The system default encoding for Python is ASCII. It seems least
+   confusing to use that default. Also, in Py3k, using Latin-1 as
+   the default might not be what users expect. For example, they
+   might prefer a Unicode encoding. No default will always work
+   as expected. At least ASCII will complain loudly if you try
+   to encode non-ASCII data.
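The "complain loudly" argument can be seen directly in Python 3. ASCII refuses non-ASCII text, while Latin-1 and UTF-8 both succeed but produce different bytes, so a silent default could easily be the wrong one. The sample string is illustrative.

```python
text = "caf\u00e9"   # contains one non-ASCII character

try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True          # ASCII rejects the data outright

latin1 = text.encode("latin-1")  # succeeds: one byte per character
utf8 = text.encode("utf-8")      # succeeds too, with different bytes
```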


 Copyright
 =========

-This document has been placed in the public domain.
+This document has been placed in the public domain.