PEP 597: Copy editing (#1100)

2019-06-06 08:19:01 -05:00 · 091ba8436e
parent 36da0b1352
commit 091ba8436e
1 changed file with 82 additions and 78 deletions
--- a/pep-0597.rst
+++ b/pep-0597.rst
@@ -28,69 +28,71 @@ Package authors using macOS or Linux may forget that the default encoding
 is not always UTF-8.

 For example, ``long_description = open("README.md").read()`` in
-``setup.py`` is a common mistake.  If there are at least one emoji or any
-other non-ASCII characters in the ``README.md`` file, many Windows users
-cannot install the package by ``UnicodeDecodeError``.
+``setup.py`` is a common mistake.  If there is at least one emoji or any
+other non-ASCII character in the ``README.md`` file, many Windows users
+cannot install the package due to a ``UnicodeDecodeError``.


-Code page is not stable
-----------------------
+Active code page is not stable
+------------------------------

-Some tools on Windows change code page to 65001 (UTF-8), and Microsoft
-is using UTF-8 and cp65001 more widely in recent Windows 10.
+Some tools on Windows change the active code page to 65001 (UTF-8), and
+Microsoft is using UTF-8 and cp65001 more widely in recent versions of
+Windows 10.

-For example, "Command Prompt" uses legacy code page by default.
-But WSL changes the code page to 65001, and  ``python.exe`` on Windows
-can be executed from WSL.  So ``python.exe`` executed from legacy
-console and from WSL cannot read text files written by each other.
+For example, "Command Prompt" uses the legacy code page by default.
+But the Windows Subsystem for Linux (WSL) changes the active code page to
+65001, and ``python.exe`` can be executed from the WSL.  So ``python.exe``
+executed from the legacy console and from the WSL cannot read text files
+written by each other.

-But many Windows users don't understand which code page is currently used.
-So changing default text file encoding based on current code page will
-cause confusion.
+But many Windows users don't understand which code page is active.
+So changing the default text file encoding based on the active code page
+causes confusion.

 Consistent default text encoding will make Python behavior more expectable
-and easy to learn.
+and easier to learn.


-Use UTF-8 by default is easier to new programmers
-------------------------------------------------
+Using UTF-8 by default is easier on new programmers
+---------------------------------------------------

 Python is one of the most popular first programming languages.

 New programmers may not know about encoding.  When they download text data
-written in UTF-8 from the internet, they are forced to know encoding.
+written in UTF-8 from the Internet, they are forced to learn about encoding.

 Popular text editors like VS Code or Atom use UTF-8 by default.
-Even notepad.exe uses UTF-8 by default from Windows 10 2019 may update.
-(Note that Python 3.9 will be released in 2021.)
+Even Microsoft Notepad uses UTF-8 by default since the Windows 10 May 2019
+Update.  (Note that Python 3.9 will be released in 2021.)

-Additionally, the default encoding of Python source file is UTF-8.
+Additionally, the default encoding of Python source files is UTF-8.
 We can assume new Python programmers who don't know about encoding
 use editors which use UTF-8 by default.

-It would be nice if new programmers are not forced to know about encoding
+It would be nice if new programmers are not forced to learn about encoding
 until they need to handle text files encoded in encoding other than UTF-8.


 Specification
 =============

-From Python 3.9, default encoding of ``TextIOWrapper`` and ``open()`` is
-changed from ``locale.getpreferredencoding(False)`` (called "locale encoding"
-in this PEP) to "UTF-8".
+From Python 3.9, the default encoding of ``TextIOWrapper`` and ``open()`` is
+changed from ``locale.getpreferredencoding(False)`` to "UTF-8".

 When there is device encoding (``os.device_encoding(buffer.fileno())``),
-it still precedes than the default encoding.
+it still supersedes the default encoding.


-Not affected areas
------------------
+Unaffected areas
+----------------

-Unlike UTF-8 mode, ``locale.getpreferredencoding(False)`` still respect
+Unlike UTF-8 mode, ``locale.getpreferredencoding(False)`` still respects
 locale encoding.

-``stdin``, ``stdout``, and ``stderr`` keep respecting locale too.  For example,
-these commands don't cause mojibake regardless code page::
+``stdin``, ``stdout``, and ``stderr`` continue to respect locale encoding
+as well.  For example, these commands do not cause mojibake regardless of the
+active code page::

   > python -c "print('こんにちは')" | more
   こんにちは
@@ -98,35 +100,35 @@ these commands don't cause mojibake regardless code page::
   > type temp.txt
   こんにちは

-Pipes and TTY should use locale encoding:
+Pipes and TTY should use the locale encoding:

-* ``subprocess`` and ``os.popen`` use locale encoding because subprocess
-  will use locale encoding.
-* ``getpass.getpass`` uses locale encoding when using TTY.
+* ``subprocess`` and ``os.popen`` use the locale encoding because the
+  subprocess will use the locale encoding.
+* ``getpass.getpass`` uses the locale encoding when using TTY.


 Affected APIs
--------------
+-------------

-All other code using default encoding of ``TextIOWrapper`` or ``open`` are
-affected.  This is incomplete list of APIs affected by this PEP:
+All other code using the default encoding of ``TextIOWrapper`` or ``open`` are
+affected.  This is an incomplete list of APIs affected by this PEP:

-* ``lzma.open``, ``gzip.open``, ``bz2.open``, ``ZipFile.read_text``.
+* ``lzma.open``, ``gzip.open``, ``bz2.open``, ``ZipFile.read_text``
 * ``socket.makefile``
 * ``tempfile.TemporaryFile``, ``tempfile.NamedTemporaryFile``
 * ``trace.CoverageResults.write_results_file``

-These APIs will use always "UTF-8" when opening text files.
+These APIs will always use "UTF-8" when opening text files.


 Deprecation Warning
 -------------------

-From 3.8, ``DeprecationWarning`` is shown when encoding is omitted and
-locale encoding is not UTF-8.  This helps not only
-writing forward compatible code, but also investigating unexpected
-``UnicodeDecodeError`` caused by assuming default text encoding is
-UTF-8. (See `People assume it is always UTF-8`_ above.)
+From 3.8 onwards, ``DeprecationWarning`` is shown when encoding is omitted and
+the locale encoding is not UTF-8.  This helps not only when writing
+forward-compatible code, but also when investigating an unexpected
+``UnicodeDecodeError`` caused by assuming the default text encoding is UTF-8.
+(See `People assume it is always UTF-8`_ above.)


 Rationale
@@ -141,69 +143,70 @@ If we enable UTF-8 mode by default, even people using Windows will forget
 the default encoding is not always UTF-8.  More scripts will be written
 assuming the default encoding is UTF-8.

-So changing default encoding of text files to always UTF-8 would be
-better even if UTF-8 mode is enabled by default at some point.
+So changing the default encoding of text files to UTF-8 would be better
+even if UTF-8 mode is enabled by default at some point.


 Why not change std(in|out|err) encoding too?
 --------------------------------------------

-Even when locale encoding is not UTF-8, there will be many UTF-8
-text files.  These files are downloaded from the internet, or
-written by modern text editor same to editing Python source.
+Even when the locale encoding is not UTF-8, there can be many UTF-8
+text files.  These files could be downloaded from the Internet or
+written by modern text editors.

-On the other hand, terminal encoding is assumed to be equal to
-locale encoding.  And other tools are assumed to read and write
-locale encoding too.
+On the other hand, terminal encoding is assumed to be the same as
+locale encoding.  And other tools are assumed to read and write the
+locale encoding as well.

-std(in|out|err) are likely to be connected to a terminal or other
-tools.  So locale encoding should be respected.
+std(in|out|err) are likely to be connected to a terminal or other tools.
+So the locale encoding should be respected.


-Why not warn always when encoding is omitted?
----------------------------------------------
+Why not always warn when encoding is omitted?
+---------------------------------------------

-Omitting default encoding is a common mistake when writing portable code.
+Omitting encoding is a common mistake when writing portable code.

 But when portability does not matter, assuming UTF-8 is not so bad because
-Python already implemented locale coercion (:pep:`538`) and UTF-8 mode
+Python already implements locale coercion (:pep:`538`) and UTF-8 mode
 (:pep:`540`).

-And these scripts will become portable when default encoding is changed
-to always UTF-8.
-
+And these scripts will become portable when the default encoding is changed
+to UTF-8.


 Backward compatibility
 ======================

-There may be scripts relying on locale or code page which is not UTF-8.
-They must be rewritten to specify ``encoding`` explicitly.
+There may be scripts relying on the locale encoding or active code page not
+being UTF-8.  They must be rewritten to specify ``encoding`` explicitly.

-* If the script assumed ``latin1`` or ``cp932``, use ``encoding="latin1"``
+* If the script assumes ``latin1`` or ``cp932``, ``encoding="latin1"``
  or ``encoding="cp932"`` should be used.

 * If the script is designed to respect locale encoding,
  ``locale.getpreferredencoding(False)`` should be used.

-  There are non-portable short forms of ``locale.getpreferredencoding(False)``.
+  There are non-portable short forms of
+  ``locale.getpreferredencoding(False)``.

  * On Windows, ``"mbcs"`` can be used instead.
  * On Unix, ``os.fsencoding()`` can be used instead.

-Note that such scripts will be broken even without upgrading Python:
+Note that such scripts will be broken even without upgrading Python, such as
+when:

 * Upgrading Windows
 * Changing the language setting
 * Changing terminal from legacy console to a modern one
-* Using tools which does ``chcp 65001``
+* Using tools which do ``chcp 65001``


 How to Teach This
 =================

-When opening text files, "UTF-8" is used by default.  It is consistent
-with default encoding used for ``text.encode()``.
+When opening text files, "UTF-8" is used by default.  It is consistent with
+the default encoding used for ``text.encode()``.


 Reference Implementation
@@ -222,18 +225,19 @@ Open Issues
 ===========

 Alias for locale encoding
--------------------------
+-------------------------

 ``encoding=locale.getpreferredencoding(False)`` is too long, and
-``"mbcs"`` or ``os.fsencoding()`` are not portable.
+``"mbcs"`` and ``os.fsencoding()`` are not portable.

-We may be possible to add new alias encoding "locale" for easy and
+It may be possible to add a new "locale" encoding alias as an easy and
 portable version of ``locale.getpreferredencoding(False)``.

-I'm not sure this is easy enough because ``encodings`` is imported
-before ``_bootlocale`` currently.
+The difficulty of this is uncertain because ``encodings`` is currently
+imported prior to ``_bootlocale``.

-Another option is ``TextIOWrapper`` treats `"locale"` as special case::
+Another option is for ``TextIOWrapper`` to treat `"locale"` as a special
+case::

   if encoding == "locale":
       encoding = locale.getpreferredencoding(False)
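
Note for readers of this diff: the portability advice in "Backward
compatibility" and the ``"locale"`` special case in the last hunk can be
sketched in plain Python as below. This is an illustrative sketch only, not
part of the commit or of any reference implementation. The helper names
``resolve_encoding`` and ``read_text_portably`` are hypothetical, and the
``"locale"`` alias is handled by the helper itself rather than by
``TextIOWrapper``.

```python
import locale
import os
import tempfile


def resolve_encoding(encoding="utf-8"):
    """Map the hypothetical "locale" alias to the current locale encoding.

    Any other value is returned unchanged, so callers can still pass
    "latin1", "cp932", and so on.
    """
    if encoding == "locale":
        return locale.getpreferredencoding(False)
    return encoding


def read_text_portably(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the default."""
    with open(path, encoding=resolve_encoding(encoding)) as f:
        return f.read()


# Round-trip non-ASCII text. Because the encoding is passed explicitly,
# the result does not depend on the locale or the active code page.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write("こんにちは 🎉")
    text = read_text_portably(readme)
```

A script that is meant to follow the locale would call
``read_text_portably(path, encoding="locale")``, and one that assumes a
fixed legacy encoding would pass it by name, for example
``encoding="cp932"``.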