Replace remaining python-escape occurrences.
parent 5dd402c72d
commit 8f3e8ce98e
pep-0383.txt (16 changed lines)
@@ -95,7 +95,7 @@ Discussion
 While providing a uniform API to non-decodable bytes, this interface
 has the limitation that chosen representation only "works" if the data
-get converted back to bytes with the python-escape error handler
+get converted back to bytes with the utf8b error handler
 also. Encoding the data with the locale's encoding and the (default)
 strict error handler will raise an exception, encoding them with UTF-8
 will produce non-sensical data.
 
@@ -115,25 +115,25 @@ likely pass the result strings back into APIs like os.stat() or
 open(), which then encodes them back into their original byte
 representation. Applications that need to process the original byte
 strings can obtain them by encoding the character strings with the
-file system encoding, passing "python-escape" as the error handler
-name. For example, a function that works like os.listdir, except
-for accepting and returning bytes, would be written as::
+file system encoding, passing "utf8b" as the error handler name. For
+example, a function that works like os.listdir, except for accepting
+and returning bytes, would be written as::
 
     def listdir_b(dirname):
         fse = sys.getfilesystemencoding()
-        dirname = dirname.decode(fse, "python-escape")
+        dirname = dirname.decode(fse, "utf8b")
         for fn in os.listdir(dirname):
             # fn is now a str object
-            yield fn.encode(fse, "python-escape")
+            yield fn.encode(fse, "utf8b")
 
 The encode error handler interface presently requires replacement
 Unicode to be provide in lieu of the non-encodable Unicode from the
 source string. It promptly encodes that replacement Unicode. In some
-error handlers, such as the python-escape proposed here, it is simpler
+error handlers, such as the utf8 proposed here, it is simpler
 and more efficient for the error handler to provide a pre-encoded
 replacement byte string, rather than forcing it to calculating Unicode
 from which the encoder would create the desired bytes. In fact, with
-python-escape, there are required byte sequences which cannot be
+utf8b, there are required byte sequences which cannot be
 generated from replacement Unicode.
 
 A few alternative approaches have been proposed:
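Note: the round trip that the changed text describes can be tried out with a short sketch. It rests on one assumption that is not part of this commit: the handler name "utf8b" is not registered in released Python versions, so the sketch substitutes "surrogateescape", the name under which PEP 383 was eventually implemented. The helpers to_text and to_bytes, and the choice of UTF-8 as the example encoding, are illustrative only.

```python
import os
import sys

# Assumption: "surrogateescape" stands in for the "utf8b" handler named in the diff.
HANDLER = "surrogateescape"


def to_text(raw, encoding="utf-8"):
    """Decode bytes; each undecodable byte 0xNN becomes the lone surrogate U+DCNN."""
    return raw.decode(encoding, HANDLER)


def to_bytes(text, encoding="utf-8"):
    """Encode text; lone surrogates U+DCNN are turned back into the byte 0xNN."""
    return text.encode(encoding, HANDLER)


def listdir_b(dirname):
    """Like os.listdir, but accepts and yields bytes (adapted from the diff)."""
    fse = sys.getfilesystemencoding()
    dirname = dirname.decode(fse, HANDLER)
    for fn in os.listdir(dirname):
        # fn is a str object here; encode it back to its byte form
        yield fn.encode(fse, HANDLER)


raw = b"caf\xe9.txt"          # Latin-1 bytes, not valid UTF-8
text = to_text(raw)
assert text == "caf\udce9.txt"  # 0xE9 was escaped as U+DCE9
assert to_bytes(text) == raw    # same handler on the way back restores the bytes

try:
    text.encode("utf-8")        # default strict handler
except UnicodeEncodeError:
    print("strict encoding rejects the escaped character")
```

The data only survives when the same handler is used in both directions, which is the limitation the first hunk spells out.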