Replace error handler discussion with text from
Stephen Turnbull.
parent 8f3e8ce98e
commit eeac57f9fa
20
pep-0383.txt
@@ -126,15 +126,17 @@ and returning bytes, would be written as::
 # fn is now a str object
 yield fn.encode(fse, "utf8b")
 
-The encode error handler interface presently requires replacement
-Unicode to be provide in lieu of the non-encodable Unicode from the
-source string. It promptly encodes that replacement Unicode. In some
-error handlers, such as the utf8 proposed here, it is simpler
-and more efficient for the error handler to provide a pre-encoded
-replacement byte string, rather than forcing it to calculating Unicode
-from which the encoder would create the desired bytes. In fact, with
-utf8b, there are required byte sequences which cannot be
-generated from replacement Unicode.
+The extension to the encode error handler interface proposed by this
+PEP is necessary to implement the 'utf8b' error handler, because there
+are required byte sequences which cannot be generated from replacement
+Unicode. However, the encode error handler interface presently
+requires replacement Unicode to be provided in lieu of the
+non-encodable Unicode from the source string. Then it promptly
+encodes that replacement Unicode. In some error handlers, such as the
+'utf8b' proposed here, it is also simpler and more efficient for the
+error handler to provide a pre-encoded replacement byte string, rather
+than forcing it to calculating Unicode from which the encoder would
+create the desired bytes.
 
 A few alternative approaches have been proposed:
 
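The interface extension described in the added paragraph can be illustrated with a short sketch. It assumes a Python 3 interpreter in which an encode error handler may return bytes instead of replacement Unicode, and in which the handler this diff calls 'utf8b' is available under its released name, "surrogateescape". The handler name "demo_raw_bytes" and the function raw_byte_handler are hypothetical and exist only for this illustration; this is not the PEP's implementation.

```python
import codecs


def raw_byte_handler(exc):
    """Hypothetical encode error handler that returns pre-encoded bytes.

    Code points U+DC80..U+DCFF are mapped back to the raw bytes 0x80..0xFF.
    Anything else is treated as a genuine encoding error.
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            out.append(code - 0xDC00)
        else:
            raise exc
    # Returning bytes (not str) lets the encoder copy them to the output
    # unchanged instead of encoding replacement Unicode.
    return bytes(out), exc.end


# "demo_raw_bytes" is an illustrative name, not a standard handler.
codecs.register_error("demo_raw_bytes", raw_byte_handler)

# A file name whose original bytes contained a lone 0xE9.
name = "caf\udce9.txt"
encoded = name.encode("utf-8", "demo_raw_bytes")
builtin = name.encode("utf-8", "surrogateescape")
round_trip = encoded.decode("utf-8", "surrogateescape")
```

The byte 0xE9 on its own is not valid UTF-8, so no replacement Unicode string could make the UTF-8 encoder emit it. That is the case the paragraph refers to when it says some required byte sequences cannot be generated from replacement Unicode.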