Remove utf-8b codec. Rename error handler to utf8b.

Martin v. Löwis 2009-05-03 08:16:57 +00:00
parent 16ede3ad18
commit def841b431
1 changed file with 5 additions and 15 deletions

@@ -72,27 +72,17 @@ be decoded. With this PEP, non-decodable bytes >= 128 will be
represented as lone half surrogate codes U+DC80..U+DCFF. Bytes below
128 will produce exceptions; see the discussion below.
-To convert non-decodable bytes, a new error handler ([2])
-"python-escape" is introduced, which produces these half
-surrogates. On encoding, the error handler converts the half surrogate
-back to the corresponding byte. This error handler will be used in any
-API that receives or produces file names, command line arguments, or
-environment variables.
+To convert non-decodable bytes, a new error handler ([2]) "utf8b" is
+introduced, which produces these half surrogates. On encoding, the
+error handler converts the half surrogate back to the corresponding
+byte. This error handler will be used in any API that receives or
+produces file names, command line arguments, or environment variables.
The error handler interface is extended to allow the encode error
handler to return byte strings immediately, in addition to returning
Unicode strings which then get encoded again (also see the discussion
below).
-If the locale's encoding is UTF-8, the file system encoding is set to
-a new encoding "utf-8b", as the regular UTF-8 codec would not
-re-encode half surrogates as single bytes. The UTF-8b codec decodes
-invalid bytes (which must be >= 0x80) into half surrogate codes
-U+DC80..U+DCFF. Unlike the utf-8 codec, the utf-8b codec follows the
-strict definition of UTF-8 to determine what an invalid byte is
-(which, among other restrictions, disallows to encode surrogate codes
-in UTF-8).
Byte-oriented interfaces that already exist in Python 3.0 are not
affected by this specification. They are neither enhanced nor
deprecated.
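
The round trip described in the hunk above (undecodable byte to half
surrogate on decoding, half surrogate back to byte on encoding) can be
sketched with codecs.register_error. This is only an illustration
under stated assumptions, not the implementation from this commit: the
handler name "utf8b_sketch" and the sample file name are hypothetical.
The encode branch returns a byte string directly, which relies on the
extended error handler interface mentioned in the text.

```python
import codecs


def utf8b_sketch(exc):
    """Hypothetical handler illustrating the escaping scheme (not the PEP's code)."""
    if isinstance(exc, UnicodeDecodeError):
        # Map each undecodable byte 0x80..0xFF to U+DC80..U+DCFF.
        bad = exc.object[exc.start:exc.end]
        chars = []
        for b in bad:
            if b < 128:
                # Bytes below 128 are never escaped; keep the original error.
                raise exc
            chars.append(chr(0xDC00 + b))
        return "".join(chars), exc.end
    if isinstance(exc, UnicodeEncodeError):
        # Map each half surrogate U+DC80..U+DCFF back to its original byte.
        bad = exc.object[exc.start:exc.end]
        out = bytearray()
        for ch in bad:
            code = ord(ch)
            if not 0xDC80 <= code <= 0xDCFF:
                raise exc
            out.append(code - 0xDC00)
        # Returning bytes skips a second encoding pass.
        return bytes(out), exc.end
    raise exc


# Register under a made-up name so it cannot clash with a built-in handler.
codecs.register_error("utf8b_sketch", utf8b_sketch)

raw = b"caf\xe9.txt"  # hypothetical file name in Latin-1, invalid as UTF-8
name = raw.decode("utf-8", "utf8b_sketch")
assert name == "caf\udce9.txt"
assert name.encode("utf-8", "utf8b_sketch") == raw
```

Because the escaping lives entirely in the error handler, the same
handler can be paired with any codec, which is consistent with this
commit dropping the separate utf-8b codec.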