Remove utf-8b codec. Rename error handler to utf8b.
parent 16ede3ad18
commit def841b431

pep-0383.txt | 20

@@ -72,27 +72,17 @@ be decoded. With this PEP, non-decodable bytes >128 will be
 represented as lone half surrogate codes U+DC80..U+DCFF. Bytes below
 128 will produce exceptions; see the discussion below.
 
-To convert non-decodable bytes, a new error handler ([2])
-"python-escape" is introduced, which produces these half
-surrogates. On encoding, the error handler converts the half surrogate
-back to the corresponding byte. This error handler will be used in any
-API that receives or produces file names, command line arguments, or
-environment variables.
+To convert non-decodable bytes, a new error handler ([2]) "utf8b" is
+introduced, which produces these half surrogates. On encoding, the
+error handler converts the half surrogate back to the corresponding
+byte. This error handler will be used in any API that receives or
+produces file names, command line arguments, or environment variables.
 
 The error handler interface is extended to allow the encode error
 handler to return byte strings immediately, in addition to returning
 Unicode strings which then get encoded again (also see the discussion
 below).
 
-If the locale's encoding is UTF-8, the file system encoding is set to
-a new encoding "utf-8b", as the regular UTF-8 codec would not
-re-encode half surrogates as single bytes. The UTF-8b codec decodes
-invalid bytes (which must be >= 0x80) into half surrogate codes
-U+DC80..U+DCFF. Unlike the utf-8 codec, the utf-8b codec follows the
-strict definition of UTF-8 to determine what an invalid byte is
-(which, among other restrictions, disallows to encode surrogate codes
-in UTF-8).
-
 Byte-orientied interfaces that already exist in Python 3.0 are not
 affected by this specification. They are neither enhanced nor
 deprecated.
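The decode/encode round trip that the new wording of the PEP describes can be sketched in Python. This is a minimal sketch assuming Python 3.1 or later, where the handler this commit calls "utf8b" is registered under its final name, "surrogateescape"; "utf8b" itself is not a registered handler name, and the helpers `decode_name` and `encode_name` are hypothetical names used only for illustration.

```python
# Sketch only: assumes Python 3.1+, where the PEP 383 handler ("utf8b" in
# this revision of the PEP) is registered as "surrogateescape".
# decode_name/encode_name are hypothetical helper names.


def decode_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes; each undecodable byte >= 0x80 becomes U+DC80..U+DCFF."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_name(name: str, encoding: str = "utf-8") -> bytes:
    """Encode text; each lone surrogate U+DC80..U+DCFF becomes its original byte."""
    return name.encode(encoding, errors="surrogateescape")


raw = b"caf\xe9.txt"             # a Latin-1 e-acute byte, invalid as UTF-8 here
name = decode_name(raw)
print(ascii(name))               # 'caf\udce9.txt'
assert encode_name(name) == raw  # the original bytes are recovered exactly

# Half surrogates below U+DC80 never result from decoding, so encoding one
# still raises instead of producing a byte below 128.
try:
    encode_name("\udc41")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

Because the escaped bytes live in the lone-surrogate range, which valid decoded text never produces, the mapping is reversible without a separate codec.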
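The extended error handler interface, under which an encode error handler may return bytes directly rather than a string that is encoded again, is what lets a handler undo the escaping on its own. The sketch below registers a hypothetical handler named "demo-escape" that reimplements only the encoding half of the scheme. It assumes Python 3, whose `codecs.register_error` accepts encode handlers returning a `(bytes, position)` tuple; in practice the built-in "surrogateescape" handler already does this.

```python
import codecs
from typing import Tuple


def escape_encode_handler(exc: UnicodeError) -> Tuple[bytes, int]:
    """Hypothetical handler: turn U+DC80..U+DCFF back into bytes 0x80..0xFF."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles the encoding direction
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        if not 0xDC80 <= cp <= 0xDCFF:
            raise exc  # not an escaped byte: keep the original error
        out.append(cp - 0xDC00)
    # Returning bytes (not str) makes the codec copy them into the output
    # verbatim instead of encoding a replacement string a second time.
    return bytes(out), exc.end


# "demo-escape" is a made-up handler name used only for this example.
codecs.register_error("demo-escape", escape_encode_handler)

print("caf\udce9.txt".encode("utf-8", "demo-escape"))  # b'caf\xe9.txt'
```

Returning a `str` here would not work: the codec would try to encode the replacement again, which is exactly the limitation the interface change removes.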