84 lines
2.0 KiB
Plaintext
84 lines
2.0 KiB
Plaintext
|
PEP: 332
|
|||
|
Title: Byte vectors and String/Unicode Unification
|
|||
|
Version: $Revision$
|
|||
|
Last-Modified: $Date$
|
|||
|
Author: Skip Montanaro <skip@pobox.com>
|
|||
|
Status: Draft
|
|||
|
Type: Standards Track
|
|||
|
Content-Type: text/x-rst
|
|||
|
Created: 11-Aug-2004
|
|||
|
Python-Version: 2.5
|
|||
|
Post-History:
|
|||
|
|
|||
|
|
|||
|
Abstract
|
|||
|
========
|
|||
|
|
|||
|
This PEP outlines the introduction of a raw ``bytes`` sequence object
|
|||
|
and the unification of the current ``str`` and ``unicode`` objects.
|
|||
|
|
|||
|
|
|||
|
Rationale
|
|||
|
=========
|
|||
|
|
|||
|
Python's current string objects are overloaded. They serve both to
|
|||
|
hold ASCII and non-ASCII character data and to also hold sequences of
|
|||
|
raw bytes which have no reasonable interpretation as displayable
|
|||
|
character sequences. This overlap hasn't been a big problem in the
|
|||
|
past, but as Python moves closer to requiring source code to be
|
|||
|
properly encoded, the use of strings to represent raw byte sequences
|
|||
|
will be more problematic. In addition, as Python's Unicode support
|
|||
|
has improved, it's easier to consider strings as ASCII-encoded Unicode
|
|||
|
objects.
|
|||
|
|
|||
|
|
|||
|
Proposed Implementation
|
|||
|
=======================
|
|||
|
|
|||
|
The number in parentheses indicates the Python version in which the
|
|||
|
feature will be introduced.
|
|||
|
|
|||
|
- Add a ``bytes`` builtin which is just a synonym for ``str``. (2.5)
|
|||
|
|
|||
|
- Add a ``b"..."`` string literal which is equivalent to raw string
|
|||
|
literals, with the exception that values which conflict with the
|
|||
|
source encoding of the containing file not generate warnings. (2.5)
|
|||
|
|
|||
|
- Warn about the use of variables named "bytes". (2.5 or 2.6)
|
|||
|
|
|||
|
- Introduce a ``bytes`` builtin which refers to a sequence distinct
|
|||
|
from the ``str`` type. (2.6)
|
|||
|
|
|||
|
- Make ``str`` a synonym for ``unicode``. (3.0)
|
|||
|
|
|||
|
|
|||
|
Bytes Object API
|
|||
|
================
|
|||
|
|
|||
|
TBD.
|
|||
|
|
|||
|
|
|||
|
Issues
|
|||
|
======
|
|||
|
|
|||
|
- Can this be accomplished before Python 3.0?
|
|||
|
|
|||
|
- Should ``bytes`` objects be mutable or immutable? (Guido seems to
|
|||
|
like them to be mutable.)
|
|||
|
|
|||
|
|
|||
|
Copyright
|
|||
|
=========
|
|||
|
|
|||
|
This document has been placed in the public domain.
|
|||
|
|
|||
|
|
|||
|
|
|||
|
..
|
|||
|
Local Variables:
|
|||
|
mode: indented-text
|
|||
|
indent-tabs-mode: nil
|
|||
|
sentence-end-double-space: t
|
|||
|
fill-column: 70
|
|||
|
End:
|