Add discussion of error handlers proposed by Glen
Linderman.
parent d99932829d
commit d0df1270f4
pep-0383.txt (14 lines changed)

@@ -80,7 +80,8 @@ environment variables.
 
 The error handler interface is extended to allow the encode error
 handler to return byte strings immediately, in addition to returning
-Unicode strings which then get encoded again.
+Unicode strings which then get encoded again (also see the discussion
+below).
 
 If the locale's encoding is UTF-8, the file system encoding is set to
 a new encoding "utf-8b", as the regular UTF-8 codec would not
@@ -123,6 +124,17 @@ for accepting and returning bytes, would be written as::
 # fn is now a str object
 yield fn.encode(fse, "python-escape")
 
+The encode error handler interface presently requires replacement
+Unicode to be provide in lieu of the non-encodable Unicode from the
+source string. It promptly encodes that replacement Unicode. In some
+error handlers, such as the python-escape proposed here, it is simpler
+and more efficient for the error handler to provide a pre-encoded
+replacement byte string, rather than forcing it to calculating Unicode
+from which the encoder would create the desired bytes. In fact, with
+python-escape, there are required byte sequences which cannot be
+generated from replacement Unicode.
+
+
 References
 ==========
 
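The paragraph added by this commit argues that an encode error handler should be able to return pre-encoded bytes. A minimal sketch of that idea follows, using the `codecs.register_error` interface as it exists in current Python 3, where a handler may return either `str` or `bytes`. The handler name `demo-escape-to-bytes` is hypothetical, and the mapping of U+DC80..U+DCFF back to bytes 0x80..0xFF is an assumption made for illustration; it is not taken from the lines shown in this diff.

```python
import codecs


def escape_to_bytes(exc):
    """Hypothetical encode error handler that returns bytes directly."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        # Assumption: lone surrogates U+DC80..U+DCFF stand for raw bytes
        # 0x80..0xFF that could not be decoded earlier.
        if 0xDC80 <= code <= 0xDCFF:
            out.append(code - 0xDC00)
        else:
            # Anything else is a genuine encoding error.
            raise exc
    # Returning bytes tells the encoder to copy them to the output as-is,
    # instead of encoding a replacement string a second time.
    return bytes(out), exc.end


codecs.register_error("demo-escape-to-bytes", escape_to_bytes)

raw = "caf\udce9.txt".encode("utf-8", "demo-escape-to-bytes")
print(raw)
```

The lone byte 0xE9 in the result is not valid UTF-8, so no replacement Unicode string could have made the UTF-8 encoder emit it. This is the case the added paragraph describes as byte sequences that cannot be generated from replacement Unicode.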