PEP 597: Copy editing (#1100)

2019-06-06 08:19:01 -05:00 · 091ba8436e
parent 36da0b1352
commit 091ba8436e
1 changed files with 82 additions and 78 deletions
--- a/pep-0597.rst
+++ b/pep-0597.rst
@@ -28,69 +28,71 @@ Package authors using macOS or Linux may forget that the default encoding
 is not always UTF-8.
 For example, ``long_description = open("README.md").read()`` in
-``setup.py`` is a common mistake.  If there are at least one emoji or any
+``setup.py`` is a common mistake.  If there is at least one emoji or any
-other non-ASCII characters in the ``README.md`` file, many Windows users
+other non-ASCII character in the ``README.md`` file, many Windows users
-cannot install the package by ``UnicodeDecodeError``.
+cannot install the package due to a ``UnicodeDecodeError``.
-Code page is not stable
+Active code page is not stable
-----------------------
+------------------------------
-Some tools on Windows change code page to 65001 (UTF-8), and Microsoft
+Some tools on Windows change the active code page to 65001 (UTF-8), and
-is using UTF-8 and cp65001 more widely in recent Windows 10.
+Microsoft is using UTF-8 and cp65001 more widely in recent versions of
+Windows 10.
-For example, "Command Prompt" uses legacy code page by default.
+For example, "Command Prompt" uses the legacy code page by default.
-But WSL changes the code page to 65001, and  ``python.exe`` on Windows
+But the Windows Subsystem for Linux (WSL) changes the active code page to
-can be executed from WSL.  So ``python.exe`` executed from legacy
+65001, and ``python.exe`` can be executed from the WSL.  So ``python.exe``
-console and from WSL cannot read text files written by each other.
+executed from the legacy console and from the WSL cannot read text files
+written by each other.
-But many Windows users don't understand which code page is currently used.
+But many Windows users don't understand which code page is active.
-So changing default text file encoding based on current code page will
+So changing the default text file encoding based on the active code page
-cause confusion.
+causes confusion.
 Consistent default text encoding will make Python behavior more expectable
-and easy to learn.
+and easier to learn.
-Use UTF-8 by default is easier to new programmers
+Using UTF-8 by default is easier on new programmers
-------------------------------------------------
+---------------------------------------------------
 Python is one of the most popular first programming languages.
 New programmers may not know about encoding.  When they download text data
-written in UTF-8 from the internet, they are forced to know encoding.
+written in UTF-8 from the Internet, they are forced to learn about encoding.
 Popular text editors like VS Code or Atom use UTF-8 by default.
-Even notepad.exe uses UTF-8 by default from Windows 10 2019 may update.
+Even Microsoft Notepad uses UTF-8 by default since the Windows 10 May 2019
-(Note that Python 3.9 will be released in 2021.)
+Update.  (Note that Python 3.9 will be released in 2021.)
-Additionally, the default encoding of Python source file is UTF-8.
+Additionally, the default encoding of Python source files is UTF-8.
 We can assume new Python programmers who don't know about encoding
 use editors which use UTF-8 by default.
-It would be nice if new programmers are not forced to know about encoding
+It would be nice if new programmers are not forced to learn about encoding
 until they need to handle text files encoded in encoding other than UTF-8.
 Specification
 =============
-From Python 3.9, default encoding of ``TextIOWrapper`` and ``open()`` is
+From Python 3.9, the default encoding of ``TextIOWrapper`` and ``open()`` is
-changed from ``locale.getpreferredencoding(False)`` (called "locale encoding"
+changed from ``locale.getpreferredencoding(False)`` to "UTF-8".
-in this PEP) to "UTF-8".
 When there is device encoding (``os.device_encoding(buffer.fileno())``),
-it still precedes than the default encoding.
+it still supersedes the default encoding.
-Not affected areas
+Unaffected areas
------------------
+----------------
-Unlike UTF-8 mode, ``locale.getpreferredencoding(False)`` still respect
+Unlike UTF-8 mode, ``locale.getpreferredencoding(False)`` still respects
 locale encoding.
-``stdin``, ``stdout``, and ``stderr`` keep respecting locale too.  For example,
+``stdin``, ``stdout``, and ``stderr`` continue to respect locale encoding
-these commands don't cause mojibake regardless code page::
+as well.  For example, these commands do not cause mojibake regardless of the
+active code page::
   > python -c "print('こんにちは')" | more
   こんにちは
@@ -98,35 +100,35 @@ these commands don't cause mojibake regardless code page::
   > type temp.txt
   こんにちは
-Pipes and TTY should use locale encoding:
+Pipes and TTY should use the locale encoding:
-* ``subprocess`` and ``os.popen`` use locale encoding because subprocess
+* ``subprocess`` and ``os.popen`` use the locale encoding because the
-  will use locale encoding.
+  subprocess will use the locale encoding.
-* ``getpass.getpass`` uses locale encoding when using TTY.
+* ``getpass.getpass`` uses the locale encoding when using TTY.
 Affected APIs
--------------
+-------------
-All other code using default encoding of ``TextIOWrapper`` or ``open`` are
+All other code using the default encoding of ``TextIOWrapper`` or ``open`` are
-affected.  This is incomplete list of APIs affected by this PEP:
+affected.  This is an incomplete list of APIs affected by this PEP:
-* ``lzma.open``, ``gzip.open``, ``bz2.open``, ``ZipFile.read_text``.
+* ``lzma.open``, ``gzip.open``, ``bz2.open``, ``ZipFile.read_text``
 * ``socket.makefile``
 * ``tempfile.TemporaryFile``, ``tempfile.NamedTemporaryFile``
 * ``trace.CoverageResults.write_results_file``
-These APIs will use always "UTF-8" when opening text files.
+These APIs will always use "UTF-8" when opening text files.
 Deprecation Warning
 -------------------
-From 3.8, ``DeprecationWarning`` is shown when encoding is omitted and
+From 3.8 onwards, ``DeprecationWarning`` is shown when encoding is omitted and
-locale encoding is not UTF-8.  This helps not only
+the locale encoding is not UTF-8.  This helps not only when writing
-writing forward compatible code, but also investigating unexpected
+forward-compatible code, but also when investigating an unexpected
-``UnicodeDecodeError`` caused by assuming default text encoding is
+``UnicodeDecodeError`` caused by assuming the default text encoding is UTF-8.
-UTF-8. (See `People assume it is always UTF-8`_ above.)
+(See `People assume it is always UTF-8`_ above.)
 Rationale
@@ -141,69 +143,70 @@ If we enable UTF-8 mode by default, even people using Windows will forget
 the default encoding is not always UTF-8.  More scripts will be written
 assuming the default encoding is UTF-8.
-So changing default encoding of text files to always UTF-8 would be
+So changing the default encoding of text files to UTF-8 would be better
-better even if UTF-8 mode is enabled by default at some point.
+even if UTF-8 mode is enabled by default at some point.
 Why not change std(in|out|err) encoding too?
 --------------------------------------------
-Even when locale encoding is not UTF-8, there will be many UTF-8
+Even when the locale encoding is not UTF-8, there can be many UTF-8
-text files.  These files are downloaded from the internet, or
+text files.  These files could be downloaded from the Internet or
-written by modern text editor same to editing Python source.
+written by modern text editors.
-On the other hand, terminal encoding is assumed to be equal to
+On the other hand, terminal encoding is assumed to be the same as
-locale encoding.  And other tools are assumed to read and write
+locale encoding.  And other tools are assumed to read and write the
-locale encoding too.
+locale encoding as well.
-std(in|out|err) are likely to be connected to a terminal or other
+std(in|out|err) are likely to be connected to a terminal or other tools.
-tools.  So locale encoding should be respected.
+So the locale encoding should be respected.
-Why not warn always when encoding is omitted?
+Why not always warn when encoding is omitted?
----------------------------------------------
+---------------------------------------------
-Omitting default encoding is a common mistake when writing portable code.
+Omitting encoding is a common mistake when writing portable code.
 But when portability does not matter, assuming UTF-8 is not so bad because
-Python already implemented locale coercion (:pep:`538`) and UTF-8 mode
+Python already implements locale coercion (:pep:`538`) and UTF-8 mode
 (:pep:`540`).
-And these scripts will become portable when default encoding is changed
+And these scripts will become portable when the default encoding is changed
-to always UTF-8.
+to UTF-8.
 Backward compatibility
 ======================
-There may be scripts relying on locale or code page which is not UTF-8.
+There may be scripts relying on the locale encoding or active code page not
-They must be rewritten to specify ``encoding`` explicitly.
+being UTF-8.  They must be rewritten to specify ``encoding`` explicitly.
-* If the script assumed ``latin1`` or ``cp932``, use ``encoding="latin1"``
+* If the script assumes ``latin1`` or ``cp932``, ``encoding="latin1"``
  or ``encoding="cp932"`` should be used.
 * If the script is designed to respect locale encoding,
  ``locale.getpreferredencoding(False)`` should be used.
-  There are non-portable short forms of ``locale.getpreferredencoding(False)``.
+  There are non-portable short forms of
+  ``locale.getpreferredencoding(False)``.
  * On Windows, ``"mbcs"`` can be used instead.
  * On Unix, ``os.fsencoding()`` can be used instead.
-Note that such scripts will be broken even without upgrading Python:
+Note that such scripts will be broken even without upgrading Python, such as
+when:
 * Upgrading Windows
 * Changing the language setting
 * Changing terminal from legacy console to a modern one
-* Using tools which does ``chcp 65001``
+* Using tools which do ``chcp 65001``
 How to Teach This
 =================
-When opening text files, "UTF-8" is used by default.  It is consistent
+When opening text files, "UTF-8" is used by default.  It is consistent with
-with default encoding used for ``text.encode()``.
+the default encoding used for ``text.encode()``.
 Reference Implementation
@@ -222,18 +225,19 @@ Open Issues
 ===========
 Alias for locale encoding
--------------------------
+-------------------------
 ``encoding=locale.getpreferredencoding(False)`` is too long, and
-``"mbcs"`` or ``os.fsencoding()`` are not portable.
+``"mbcs"`` and ``os.fsencoding()`` are not portable.
-We may be possible to add new alias encoding "locale" for easy and
+It may be possible to add a new "locale" encoding alias as an easy and
 portable version of ``locale.getpreferredencoding(False)``.
-I'm not sure this is easy enough because ``encodings`` is imported
+The difficulty of this is uncertain because ``encodings`` is currently
-before ``_bootlocale`` currently.
+imported prior to ``_bootlocale``.
-Another option is ``TextIOWrapper`` treats `"locale"` as special case::
+Another option is for ``TextIOWrapper`` to treat `"locale"` as a special
+case::
   if encoding == "locale":
       encoding = locale.getpreferredencoding(False)
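
The special-case option above could be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the reference implementation: ``resolve_encoding`` and ``open_text`` are hypothetical helper names that do not appear in the PEP, and treating an omitted encoding as UTF-8 mirrors the default this PEP proposes rather than the behavior of any particular Python release.

```python
import locale
import os
import tempfile


def resolve_encoding(encoding=None):
    """Hypothetical helper: pick the encoding a text file would be opened with.

    - None      -> "utf-8" (the default proposed by this PEP)
    - "locale"  -> the locale encoding (the proposed alias)
    - otherwise -> the encoding the caller asked for, unchanged
    """
    if encoding is None:
        return "utf-8"
    if encoding == "locale":
        return locale.getpreferredencoding(False)
    return encoding


def open_text(path, mode="r", encoding=None):
    """Hypothetical wrapper around open() that always passes an explicit encoding."""
    return open(path, mode, encoding=resolve_encoding(encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "temp.txt")
        # Written and read back as UTF-8 regardless of the active code page.
        with open_text(path, "w") as f:
            f.write("こんにちは")
        with open_text(path) as f:
            print(f.read())
        # Scripts that rely on the locale encoding would opt in explicitly.
        print(resolve_encoding("locale"))
```

A script that is designed to respect the locale encoding would call ``open_text(path, encoding="locale")``; every other caller gets UTF-8 without having to name it.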