Add discussion of error handlers proposed by Glen Linderman.
Martin v. Löwis 2009-04-30 08:34:04 +00:00
parent d99932829d
commit d0df1270f4
1 changed files with 13 additions and 1 deletions

@@ -80,7 +80,8 @@ environment variables.
The error handler interface is extended to allow the encode error
handler to return byte strings immediately, in addition to returning
-Unicode strings which then get encoded again.
+Unicode strings which then get encoded again (also see the discussion
+below).
If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b", as the regular UTF-8 codec would not
@@ -123,6 +124,17 @@ for accepting and returning bytes, would be written as::
# fn is now a str object
yield fn.encode(fse, "python-escape")
+The encode error handler interface presently requires replacement
+Unicode to be provided in lieu of the non-encodable Unicode from the
+source string. It promptly encodes that replacement Unicode. In some
+error handlers, such as the python-escape proposed here, it is simpler
+and more efficient for the error handler to provide a pre-encoded
+replacement byte string, rather than forcing it to calculate Unicode
+from which the encoder would create the desired bytes. In fact, with
+python-escape, there are required byte sequences which cannot be
+generated from replacement Unicode.
References
==========
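
The paragraph added in the second hunk can be illustrated with a small sketch of an encode error handler that returns bytes directly. This is not the PEP's python-escape implementation: the handler name ``demo-bytes-escape`` and the mapping of lone surrogates U+DC80..U+DCFF to bytes 0x80..0xFF are assumptions made for illustration. It relies on the behaviour of current Python 3, where a handler registered with ``codecs.register_error`` may return a bytes replacement that the encoder copies into the output unchanged.

```python
import codecs


def demo_bytes_escape(exc):
    """Hypothetical handler: map lone surrogates U+DC80..U+DCFF to raw bytes."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only covers the encoding direction.
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if not 0xDC80 <= code <= 0xDCFF:
            # Not an escaped byte (assumed mapping), so report the error.
            raise exc
        out.append(code - 0xDC00)
    # Return bytes rather than str: the encoder copies them as-is
    # instead of encoding a Unicode replacement a second time.
    return bytes(out), exc.end


codecs.register_error("demo-bytes-escape", demo_bytes_escape)

# The byte 0xFF never appears in valid UTF-8, so no replacement
# Unicode string could make the UTF-8 encoder emit it.
encoded = "fn\udcff.txt".encode("utf-8", "demo-bytes-escape")
print(encoded)
```

With a handler restricted to returning Unicode, the replacement would be passed through the UTF-8 encoder again, which is why a byte such as 0xFF would be unreachable.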