diff --git a/pep-0383.txt b/pep-0383.txt
index 9bf4fff70..09d8042c5 100644
--- a/pep-0383.txt
+++ b/pep-0383.txt
@@ -95,7 +95,7 @@ Discussion
 
 While providing a uniform API to non-decodable bytes, this interface
 has the limitation that chosen representation only "works" if the data
-get converted back to bytes with the python-escape error handler
+get converted back to bytes with the utf8b error handler
 also. Encoding the data with the locale's encoding and the (default)
 strict error handler will raise an exception, encoding them with UTF-8
 will produce non-sensical data.
@@ -115,25 +115,25 @@ likely pass the result strings back into APIs like os.stat() or
 open(), which then encodes them back into their original byte
 representation. Applications that need to process the original byte
 strings can obtain them by encoding the character strings with the
-file system encoding, passing "python-escape" as the error handler
-name. For example, a function that works like os.listdir, except
-for accepting and returning bytes, would be written as::
+file system encoding, passing "utf8b" as the error handler name. For
+example, a function that works like os.listdir, except for accepting
+and returning bytes, would be written as::
 
     def listdir_b(dirname):
         fse = sys.getfilesystemencoding()
-        dirname = dirname.decode(fse, "python-escape")
+        dirname = dirname.decode(fse, "utf8b")
         for fn in os.listdir(dirname):
             # fn is now a str object
-            yield fn.encode(fse, "python-escape")
+            yield fn.encode(fse, "utf8b")
 
 The encode error handler interface presently requires replacement
 Unicode to be provide in lieu of the non-encodable Unicode from the
 source string. It promptly encodes that replacement Unicode. In some
-error handlers, such as the python-escape proposed here, it is simpler
+error handlers, such as the utf8b proposed here, it is simpler
 and more efficient for the error handler to provide a pre-encoded
 replacement byte string, rather than forcing it to calculating Unicode
 from which the encoder would create the desired bytes. In fact, with
-python-escape, there are required byte sequences which cannot be
+utf8b, there are required byte sequences which cannot be
 generated from replacement Unicode.
 
 A few alternative approaches have been proposed: