Remove utf-8b codec. Rename error handler to utf8b.

Martin v. Löwis 2009-05-03 08:16:57 +00:00
parent 16ede3ad18
commit def841b431
1 changed file with 5 additions and 15 deletions

@@ -72,27 +72,17 @@ be decoded. With this PEP, non-decodable bytes >128 will be
 represented as lone half surrogate codes U+DC80..U+DCFF. Bytes below
 128 will produce exceptions; see the discussion below.

-To convert non-decodable bytes, a new error handler ([2])
-"python-escape" is introduced, which produces these half
-surrogates. On encoding, the error handler converts the half surrogate
-back to the corresponding byte. This error handler will be used in any
-API that receives or produces file names, command line arguments, or
-environment variables.
+To convert non-decodable bytes, a new error handler ([2]) "utf8b" is
+introduced, which produces these half surrogates. On encoding, the
+error handler converts the half surrogate back to the corresponding
+byte. This error handler will be used in any API that receives or
+produces file names, command line arguments, or environment variables.

 The error handler interface is extended to allow the encode error
 handler to return byte strings immediately, in addition to returning
 Unicode strings which then get encoded again (also see the discussion
 below).

-If the locale's encoding is UTF-8, the file system encoding is set to
-a new encoding "utf-8b", as the regular UTF-8 codec would not
-re-encode half surrogates as single bytes. The UTF-8b codec decodes
-invalid bytes (which must be >= 0x80) into half surrogate codes
-U+DC80..U+DCFF. Unlike the utf-8 codec, the utf-8b codec follows the
-strict definition of UTF-8 to determine what an invalid byte is
-(which, among other restrictions, disallows to encode surrogate codes
-in UTF-8).
-
 Byte-orientied interfaces that already exist in Python 3.0 are not
 affected by this specification. They are neither enhanced nor
 deprecated.
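
The behaviour described in the changed paragraphs can be sketched in
Python. This is an illustration under stated assumptions, not part of
the commit: released Python 3 ships this error handler under the name
"surrogateescape", so the sketch uses that name in place of the "utf8b"
name chosen here. The file name bytes, the question_mark function, and
the "demo-question-mark" handler name are hypothetical and exist only
for this example.

```python
import codecs

# Assumption: "surrogateescape" is the released name of the handler
# that this commit calls "utf8b".
HANDLER = "surrogateescape"

# Hypothetical file name in Latin-1; byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Decoding maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
name = raw.decode("utf-8", HANDLER)

# Encoding with the same handler turns the surrogate back into the byte.
roundtrip = name.encode("utf-8", HANDLER)

# The strict UTF-8 codec refuses to encode a lone surrogate, which is
# why the round trip depends on the error handler.
try:
    name.encode("utf-8")
except UnicodeEncodeError:
    strict_rejects = True
else:
    strict_rejects = False


def question_mark(exc):
    """Hypothetical encode error handler that returns bytes directly."""
    if isinstance(exc, UnicodeEncodeError):
        # The bytes are copied to the output as-is, not encoded again.
        return (b"?", exc.end)
    raise exc


codecs.register_error("demo-question-mark", question_mark)
replaced = "a\u20acb".encode("ascii", "demo-question-mark")
```

The last part shows the extended handler interface from the hunk: an
encode error handler may hand back a byte string instead of a Unicode
string that would have to pass through the codec a second time.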