Fix a couple of issues with pep0540 (#252)

2017-05-09 01:24:28 +03:00 · 2017-05-09 01:24:28 +03:00 · ae226965ea
parent 95bdb222e1
commit ae226965ea
1 changed file with 151 additions and 151 deletions
--- a/pep-0540.txt
+++ b/pep-0540.txt
@@ -17,11 +17,11 @@ Abstract
 Add a new UTF-8 mode, disabled by default, to ignore the locale and
 force the usage of the UTF-8 encoding.

-Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't
+Basically, UTF-8 mode behaves like Python 2: it "just works" and doesn't
 bother users with encodings, but it can produce mojibake. The UTF-8 mode
 can be configured as strict to prevent mojibake.

-New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
+A new ``-X utf8`` command line option and a ``PYTHONUTF8`` environment
 variable are added to control the UTF-8 mode. The POSIX locale enables
 the UTF-8 mode.

@@ -35,31 +35,31 @@ Rationale
 Since Python 3.0 was released in 2008, the usual answer to users getting
 Unicode errors is to ask developers to fix their code to handle Unicode
 properly. Most applications and Python modules were fixed, but users
-keep reporting Unicode errors regulary: see the long list of issues in
+kept reporting Unicode errors regularly: see the long list of issues in
 the `Links`_ section below.

-In fact, a second class of bug comes from a locale which is not properly
-configured. The usual answer to such bug report is: "it is not a bug,
+In fact, a second class of bugs comes from a locale which is not properly
+configured. The usual answer to such a bug report is: "it is not a bug,
 you must fix your locale".

 Technically, the answer is correct, but from a practical point of view,
-the answer is not acceptable. In many cases, "fixing the issue" is an
+the answer is not acceptable. In many cases, "fixing the issue" is a
 hard task. Moreover, sometimes, the usage of the POSIX locale is
 deliberate.

 A good example of a concrete issue are build systems which create a
 fresh environment for each build using a chroot, a container, a virtual
-machine or something else to get reproductible builds. Such setup
-usually uses the POSIX locale.  To get 100% reproductible builds, the
+machine or something else to get reproducible builds. Such a setup
+usually uses the POSIX locale.  To get 100% reproducible builds, the
 POSIX locale is a good choice: see the `Locales section of
 reproducible-builds.org
 <https://reproducible-builds.org/docs/locales/>`_.

 UNIX users don't expect Unicode errors, since the common command lines
 tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors.
-These users expect that Python 3 "just works" with any locale and don't
+These users expect that Python 3 "just works" with any locale and won't
 bother them with encodings. From their point of view, the bug is not
-their locale but is obviously Python 3.
+their locale; it's obviously Python 3.

 Since Python 2 handles data as bytes, it's rarer in Python 2
 compared to Python 3 to get Unicode errors. It also explains why users
@@ -68,7 +68,7 @@ also perceive Python 3 as the root cause of their Unicode errors.
 Some users expect that Python 3 just works with any locale and so don't
 bother with mojibake, whereas some developers are working hard to prevent
 mojibake and so expect that Python 3 fails early before creating
-mojibake.
+it.

 Since different group of users have different expectations, there is no
 silver bullet which solves all issues at once. Last but not least,
@@ -105,7 +105,7 @@ decode and encode operating system data. These functions use the
 filesystem error handler: ``sys.getfilesystemencodeerrors()``.

 .. note::
-   In some corner case, the *current* ``LC_CTYPE`` locale must be used
+   In some corner cases, the *current* ``LC_CTYPE`` locale must be used
   instead of ``sys.getfilesystemencoding()``. For example, the ``time``
   module uses the *current* ``LC_CTYPE`` locale to decode timezone
   names.
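A minimal sketch of what the filesystem error handler buys on POSIX, using a hypothetical filename that is not valid UTF-8: with ``surrogateescape`` (the handler that, per PEP 383, is used for operating system data outside Windows) the undecodable byte survives a round trip through ``str``.

```python
import os
import sys

# Hypothetical filename: the byte 0xFF is invalid in UTF-8.
raw = b"xxx\xff"

# Decoding with UTF-8 and surrogateescape maps the invalid byte 0xFF
# to the lone surrogate U+DCFF instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))

# Encoding with the same codec and error handler restores the bytes.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)

# os.fsdecode()/os.fsencode() use sys.getfilesystemencoding() and
# sys.getfilesystemencodeerrors(); on POSIX the round trip is lossless.
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
if os.name == "posix":
    print(os.fsencode(os.fsdecode(raw)) == raw)
```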
@@ -121,7 +121,7 @@ this preference order:
 * ``LC_CTYPE``
 * ``LANG``

-The POSIX locale,also known as "the C locale", is used:
+The POSIX locale, also known as "the C locale", is used:

 * if the first set variable is set to ``"C"``
 * if all these variables are unset, for example when a program is
@@ -140,7 +140,7 @@ arguments are decoded by ``mbstowcs()`` and encoded back by
 ``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead
 of retrieving the original byte string.

-To fix this issue, Python checks since Python 3.4 if ``mbstowcs()``
+To fix this issue, from Python 3.4, a check is made to see if ``mbstowcs()``
 really uses the ASCII encoding if the ``LC_CTYPE`` uses the
 POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an
 alias to ASCII). If not (the effective encoding is not ASCII), Python
@@ -158,7 +158,7 @@ In many cases, the POSIX locale is not really expected by users who get
 it by mistake. Examples:

 * program started in an empty environment
-* User forcing LANG=C to get messages in english
+* User forcing LANG=C to get messages in English
 * LANG=C used for bad reasons, without being aware of the ASCII encoding
 * SSH shell
 * Linux installed with no configured locale
@@ -178,7 +178,7 @@ the UTF-8 encoding:
 * Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"``
 * HP-UX: ``"C.utf8"``

-It was proposed to add a ``C.UTF-8`` locale to the glibc: `glibc C.UTF-8
+It was proposed to add a ``C.UTF-8`` locale to glibc: `glibc C.UTF-8
 proposal <https://sourceware.org/glibc/wiki/Proposals/C.UTF-8>`_.

 It is not planned to add such a locale to BSD systems.
@@ -190,7 +190,7 @@ Popularity of the UTF-8 encoding
 Python 3 uses UTF-8 by default for Python source files.

 On Mac OS X, Windows and Android, Python always use UTF-8 for operating
-system data. For Windows, see the `PEP 529`_: "Change Windows filesystem
+system data. For Windows, see `PEP 529`_: "Change Windows filesystem
 encoding to UTF-8".

 On Linux, UTF-8 became the de facto standard encoding,
@@ -198,8 +198,8 @@ replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example,
 using different encodings for filenames and standard streams is likely
 to create mojibake, so UTF-8 is now used *everywhere*.

-The UTF-8 encoding is the default encoding of XML and JSON file format.
-In January 2017, UTF-8 was used in `more than 88% of web pages
+The UTF-8 encoding is the default encoding of XML and JSON file formats.
+As of January 2017, UTF-8 was used in `more than 88% of web pages
 <https://w3techs.com/technologies/details/en-utf8/all/all>`_ (HTML,
 Javascript, CSS, etc.).

@@ -209,7 +209,7 @@ information on the UTF-8 codec.
 .. note::
   Some applications and operating systems (especially Windows) use Byte
   Order Markers (BOM) to indicate the used Unicode encoding: UTF-7,
-   UTF-8, UTF-16-LE, etc. BOM are not well supported and rarely used in
+   UTF-8, UTF-16-LE, etc. BOMs are not well supported and are rarely used in
   Python.


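As an illustration of the BOM handling mentioned in the note (the sample bytes are made up for the example), Python's plain ``utf-8`` codec keeps a leading BOM as the character U+FEFF, whereas the ``utf-8-sig`` codec strips it when decoding and writes it when encoding:

```python
# Sample data: the UTF-8 encoded BOM (EF BB BF) followed by "hello".
data = b"\xef\xbb\xbfhello"

# The plain UTF-8 codec keeps the BOM as the character U+FEFF.
plain = data.decode("utf-8")
print(ascii(plain))

# The utf-8-sig codec skips a leading BOM when decoding...
stripped = data.decode("utf-8-sig")
print(stripped)

# ...and prepends one when encoding.
print("hello".encode("utf-8-sig") == data)
```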
@@ -221,7 +221,7 @@ the wild which don't use UTF-8. And there are a lot of data stored in
 different encodings. For example, an old USB key using the ext3
 filesystem with filenames encoded to ISO 8859-1.

-The Linux kernel and the libc don't decode filenames: a filename is used
+The Linux kernel and libc don't decode filenames: a filename is used
 as a raw array of bytes. The common solution to support any filename is
 to store filenames as bytes and not try to decode them. When displayed
 to stdout, mojibake is displayed if the filename and the terminal don't
@@ -231,8 +231,8 @@ Python 3 promotes Unicode everywhere including filenames. A solution to
 support filenames not decodable from the locale encoding was found: the
 ``surrogateescape`` error handler (`PEP 383`_), store undecodable bytes
 as surrogate characters. This error handler is used by default for
-`operating system data`_, by ``os.fsdecode()`` and ``os.fsencode()`` for
-example (except on Windows which uses the ``strict`` error handler).
+`operating system data`_, for example, by ``os.fsdecode()`` and
+``os.fsencode()`` (except on Windows which uses the ``strict`` error handler).


 Standard streams
@@ -243,7 +243,7 @@ stderr. The ``strict`` error handler is used by stdin and stdout to
 prevent mojibake.

 The ``backslashreplace`` error handler is used by stderr to avoid
-Unicode encode error when displaying non-ASCII text. It is especially
+Unicode encode errors when displaying non-ASCII text. It is especially
 useful when the POSIX locale is used, because this locale usually uses
 the ASCII encoding.

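A small sketch of why ``backslashreplace`` suits stderr under an ASCII locale; it calls ``str.encode()`` directly instead of writing to ``sys.stderr``, and the sample text is arbitrary:

```python
# Sample text with non-ASCII characters (EURO SIGN, e with acute).
text = "price: 10 \u20ac, caf\xe9"

# The strict handler refuses anything outside ASCII.
try:
    text.encode("ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# backslashreplace never fails: unencodable characters become
# Python escape sequences such as \u20ac and \xe9.
escaped = text.encode("ascii", "backslashreplace")
print(escaped)

# It also copes with lone surrogates produced by surrogateescape.
surrogate_escaped = "xxx\udcff".encode("ascii", "backslashreplace")
print(surrogate_escaped)
```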
@@ -254,15 +254,15 @@ contains an undecoded byte stored as a surrogate character.

 Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the
 POSIX locale is used: `issue #19977
-<http://bugs.python.org/issue19977>`_. The idea is to passthrough
-`operating system data`_ even if it means mojibake, because most UNIX
+<http://bugs.python.org/issue19977>`_. The idea is to pass through
+`operating system data`_ even if it creates mojibake, because most UNIX
 applications work like that. Most UNIX applications store filenames as
-bytes, usually simply because bytes are first-citizen class in the used
+bytes, usually because bytes are first-class citizens in their
 programming language, whereas Unicode is badly supported.

 .. note::
   The encoding and/or the error handler of standard streams can be
-   overriden with the ``PYTHONIOENCODING`` environment variable.
+   overridden with the ``PYTHONIOENCODING`` environment variable.


 Proposal
@@ -276,18 +276,18 @@ force the usage of the UTF-8 encoding with the ``surrogateescape`` error
 handler, instead using the locale encoding (with ``strict`` or
 ``surrogateescape`` error handler depending on the case).

-Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't
+Basically, the UTF-8 mode behaves like Python 2: it "just works" and doesn't
 bother users with encodings, but it can produce mojibake. It can be
 configured as strict to prevent mojibake: the UTF-8 encoding is used
 with the ``strict`` error handler for inputs and outputs, but the
 ``surrogateescape`` error handler is still used for `operating system
 data`_.

-New ``-X utf8`` command line option and ``PYTHONUTF8`` environment
+A new ``-X utf8`` command line option and a ``PYTHONUTF8`` environment
 variable are added to control the UTF-8 mode. The UTF-8 mode is enabled
-by ``-X utf8`` or ``PYTHONUTF8=1``.  The UTF-8 is configured as strict
-by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values fail
-with an error.
+by using ``-X utf8`` or ``PYTHONUTF8=1``.  It can be configured as strict
+by using ``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values
+fail with an error.

 The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode
 can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``.
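A hedged sketch of checking these options from Python, assuming a Python 3.7 or later interpreter (the release in which UTF-8 mode shipped) and its ``sys.flags.utf8_mode`` flag; the helper ``utf8_mode()`` is hypothetical and only covers the enable/disable values, not the strict variant proposed here:

```python
import os
import subprocess
import sys


def utf8_mode(*options, env=None):
    """Start a child interpreter and return its sys.flags.utf8_mode."""
    cmd = [sys.executable, *options, "-c",
           "import sys; print(sys.flags.utf8_mode)"]
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True,
                          check=True)
    return int(proc.stdout.strip())


# -X utf8 enables UTF-8 mode; -X utf8=0 disables it.
enabled = utf8_mode("-X", "utf8")
disabled = utf8_mode("-X", "utf8=0")
# PYTHONUTF8=1 enables it through the environment.
from_env = utf8_mode(env=dict(os.environ, PYTHONUTF8="1"))
# The command line option has priority over the environment variable.
cli_wins = utf8_mode("-X", "utf8", env=dict(os.environ, PYTHONUTF8="0"))
print(enabled, disabled, from_env, cli_wins)
```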
@@ -300,23 +300,23 @@ Options priority for the UTF-8 mode:
 * POSIX locale

 For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode,
-whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and so
-use the encoding of the POSIX locale.
+whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and
+uses the encoding of the POSIX locale.

 Encodings used by ``open()``, highest priority first:

 * *encoding* and *errors* parameters (if set)
 * UTF-8 mode
-* os.device_encoding(fd)
-* os.getpreferredencoding(False)
+* ``os.device_encoding(fd)``
+* ``os.getpreferredencoding(False)``


 Encoding and error handler
 --------------------------

 The UTF-8 mode changes the default encoding and error handler used by
-open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and
-sys.stderr:
+``open()``, ``os.fsdecode()``, ``os.fsencode()``, ``sys.stdin``,
+``sys.stdout`` and ``sys.stderr``:

 ============================  =======================  ==========================  ==========================
 Function                      Default                  UTF-8 mode or POSIX locale  UTF-8 Strict mode
@@ -342,7 +342,7 @@ The UTF-8 mode uses the ``surrogateescape`` error handler instead of the
 strict mode for convenience: the idea is that data not encoded to UTF-8
 are passed through "Python" without being modified, as raw bytes.

-The ``PYTHONIOENCODING`` environment variable has the priority over the
+The ``PYTHONIOENCODING`` environment variable has priority over the
 UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1
 python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr.

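To observe this priority, the sketch below (assuming, again, Python 3.7 or later and ``sys.flags.utf8_mode``) starts a child interpreter with ``-X utf8`` and ``PYTHONIOENCODING=latin1`` and reports which encoding its stdout ended up with:

```python
import codecs
import os
import subprocess
import sys

# The child prints its UTF-8 mode flag and the encoding of its stdout.
code = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"
env = dict(os.environ, PYTHONIOENCODING="latin1")
proc = subprocess.run([sys.executable, "-X", "utf8", "-c", code], env=env,
                      capture_output=True, text=True, check=True)
flag, name = proc.stdout.split()
mode = int(flag)
# Normalize the codec name (e.g. "latin1") to its canonical form.
stdout_encoding = codecs.lookup(name).name
print(mode, stdout_encoding)
```

UTF-8 mode is reported as enabled, yet stdout uses Latin-1 because ``PYTHONIOENCODING`` wins for the standard streams.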
@@ -372,15 +372,15 @@ sys.stderr                    UTF-8/backslashreplace   UTF-8/backslashreplace
 ============================  =======================  ==========================

 The "Legacy Windows FS encoding" is enabled by setting the
-``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1``, see the
-`PEP 529`.
+``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1`` as specified
+in `PEP 529`_.

 Enabling the legacy Windows filesystem encoding disables the UTF-8 mode
 (as ``-X utf8=0``).

-If stdin and/or stdout is redirected to a pipe, sys.stdin and/or
-sys.output uses ``mbcs`` encoding by default, rather than UTF-8. But
-with the UTF-8 mode, sys.stdin and sys.stdout always use the UTF-8
+If stdin and/or stdout is redirected to a pipe, ``sys.stdin`` and/or
+``sys.stdout`` use ``mbcs`` encoding by default rather than UTF-8. But
+with the UTF-8 mode, ``sys.stdin`` and ``sys.stdout`` always use the UTF-8
 encoding.

 There is no POSIX locale on Windows. The ANSI code page is used to the
@@ -390,23 +390,23 @@ locale encoding, and this code page never uses the ASCII encoding.
 Rationale
 ---------

-The UTF-8 mode is disabled by default to keep hard Unicode errors when
-encoding or decoding `operating system data`_ failed, and to keep the
-backward compatibility. The user is responsible to enable explicitly the
-UTF-8 mode, and so is better prepared for mojibake than if the UTF-8
-mode would be enabled *by default*.
+UTF-8 mode is disabled by default in order to keep hard Unicode errors when
+encoding or decoding `operating system data`_ fails and preserve
+backward compatibility. In addition, users will be better prepared for
+mojibake if it is their responsibility to explicitly enable UTF-8 mode
+than they would be if it were enabled *by default*.

-The UTF-8 mode should be used on systems known to be configured with
+UTF-8 mode should be used on systems known to be configured with
 UTF-8 where most applications speak UTF-8. It prevents Unicode errors if
 the user overrides a locale *by mistake* or if a Python program is
 started with no locale configured (and so with the POSIX locale).

 Most UNIX applications handle `operating system data`_ as bytes, so
-``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a
+the ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a
 limited impact on how these data are handled by the application.

-The Python UTF-8 mode should help to make Python more interoperable with
-the  other UNIX applications in the system assuming that *UTF-8* is used
+The UTF-8 mode should help make Python more interoperable with
+other UNIX applications on the system assuming that *UTF-8* is used
 everywhere and that users *expect* UTF-8.

 Ignoring ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in
@@ -434,14 +434,14 @@ it should work fine.
 External code using bytes
 ^^^^^^^^^^^^^^^^^^^^^^^^^

-If the external code process data as bytes, surrogate characters are not
+If the external code processes data as bytes, surrogate characters are not
 an issue since they are only used inside Python. Python encodes back
 surrogate characters to bytes at the edges, before calling external
 code.

 The UTF-8 mode can produce mojibake since Python and external code don't
 both of invalid bytes, but it's a deliberate choice. The UTF-8 mode can
-be configured as strict to prevent mojibake and be fail early when data
+be configured as strict to prevent mojibake and fail early when data
 is not decodable from UTF-8 or not encodable to UTF-8.

 External code using text
@@ -455,14 +455,14 @@ surrogate characters.
 Use Cases
 =========

-The following use cases were written to help to understand the impact of
-chosen encodings and error handlers on concrete examples.
+The following use cases were written to highlight the impact of
+the chosen encodings and error handlers on concrete examples.

 The "Always work" results were written to prove the benefit of having a
 UTF-8 mode which works with any data and any locale, compared to the
 existing old Python versions.

-The "Mojibake" column shows that ignoring the locale causes a pratical
+The "Mojibake" column shows that ignoring the locale causes a practical
 issue: the UTF-8 mode produces mojibake if the terminal doesn't use the
 UTF-8 encoding.

@@ -477,15 +477,15 @@ Script listing the content of the current directory into stdout::

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  **Yes**       **Yes**
-Python 3                  No            No
-Python 3.5, POSIX locale  **Yes**       **Yes**
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  **Yes**        **Yes**
+Python 3                  No             No
+Python 3.5, POSIX locale  **Yes**        **Yes**
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

 "No" means that the script can fail on decoding or encoding a filename
 depending on the locale or the filename.
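The ``ls.py`` script itself is outside this excerpt, so as a stand-in the sketch below writes a filename containing the byte ``0xFF`` (decoded with ``surrogateescape``, as operating system data is on POSIX) to an in-memory UTF-8 stream, once with the ``strict`` error handler and once with ``surrogateescape``. The helper ``print_name()`` is hypothetical:

```python
import io

# Hypothetical filename b'xxx\xff': 0xFF is invalid in UTF-8, so
# decoding with surrogateescape yields the lone surrogate U+DCFF.
name = b"xxx\xff".decode("utf-8", "surrogateescape")


def print_name(errors):
    """Write name to an in-memory UTF-8 stream using the given handler.

    Returns the bytes written, or None if encoding failed.
    """
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors,
                              newline="\n")
    try:
        stream.write(name + "\n")
        stream.flush()
    except UnicodeEncodeError as exc:
        print(f"{errors}: {exc.reason}")
        return None
    return buffer.getvalue()


# strict: the surrogate cannot be encoded, so "printing" fails.
print(print_name("strict"))
# surrogateescape: the original byte is written back unchanged.
print(print_name("surrogateescape"))
```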
@@ -494,7 +494,7 @@ To be able to always work, the program must be able to produce mojibake.
 Mojibake is more user friendly than an error with a truncated or empty
 output.

-Example with a directory which contains the file called ``b'xxx\xff'``
+For example, take a directory which contains a file called ``b'xxx\xff'``
 (the byte ``0xFF`` is invalid in UTF-8).

 Default and UTF-8 Strict mode fail on ``print()`` with an encode error::
@@ -511,7 +511,7 @@ Default and UTF-8 Strict mode fail on ``print()`` with an encode error::
        print(name)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ...

-The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work
+UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work
 but display mojibake::

    $ python3.7 -X utf8 ../ls.py
@ -541,17 +541,17 @@ a text file::

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  **Yes**       **Yes**
-Python 3                  No            No
-Python 3.5, POSIX locale  No            No
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  **Yes**        **Yes**
+Python 3                  No             No
+Python 3.5, POSIX locale  No             No
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

-"Yes" involves that mojibake can be produced. "No" means that the script
+"Yes" implies that mojibake can be produced. "No" means that the script
 can fail on decoding or encoding a filename depending on the locale or
 the filename. Typical error::

@@ -572,15 +572,15 @@ Very basic example used to illustrate a common issue, display the euro sign

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  No            No
-Python 3                  No            No
-Python 3.5, POSIX locale  No            No
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         **Yes**       **Yes**
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  No             No
+Python 3                  No             No
+Python 3.5, POSIX locale  No             No
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         **Yes**        **Yes**
+========================  =============  =========

 The UTF-8 and UTF-8 Strict modes will always encode the euro sign as
 UTF-8. If the terminal uses a different encoding, we get mojibake.
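To make the mojibake concrete, a sketch assuming a hypothetical terminal that decodes its input as Windows-1252 (``cp1252``) rather than UTF-8:

```python
euro = "\u20ac"  # EURO SIGN

# In UTF-8 mode, text written to stdout is encoded as UTF-8.
data = euro.encode("utf-8")
print(data)

# A terminal assuming Windows-1252 renders those three bytes as
# three unrelated characters ("a" with circumflex, a low quote, a
# not sign): mojibake.
shown = data.decode("cp1252")
print(ascii(shown))
```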
@@ -589,7 +589,7 @@ UTF-8. If the terminal uses a different encoding, we get mojibake.
 Replace a word in a text
 ------------------------

-The following scripts replaces the word "apple" with "orange". It
+The following script replaces the word "apple" with "orange". It
 reads input from stdin and writes the output into stdout::

    import sys
@@ -598,15 +598,15 @@ reads input from stdin and writes the output into stdout::

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  **Yes**       **Yes**
-Python 3                  No            No
-Python 3.5, POSIX locale  **Yes**       **Yes**
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  **Yes**        **Yes**
+Python 3                  No             No
+Python 3.5, POSIX locale  **Yes**        **Yes**
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

 Producer-consumer model using pipes
 -----------------------------------
@@ -618,32 +618,32 @@ On a shell, such programs are run with the command::

    producer | consumer

-The question if these programs will work with any data and any locale.
+The question is whether these programs will work with any data and any locale.
 UNIX users don't expect Unicode errors, and so expect that such programs
-"just works".
+"just work".

 If the producer only produces ASCII output, no error should occur. Let's
-say the that producer writes at least one non-ASCII character (at least
+say that the producer writes at least one non-ASCII character (at least
 one byte in the range ``0x80..0xff``).

 To simplify the problem, let's say that the consumer has no output
-(don't write result into a file or stdout).
+(doesn't write results into a file or stdout).

 A "Bytes producer" is an application which cannot fail with a Unicode
 error and produces bytes into stdout.

 Let's say that a "Bytes consumer" does not decode stdin but stores data
-as bytes: such consumer always work. Common UNIX command line tools like
+as bytes: such a consumer always works. Common UNIX command line tools like
 ``cat``, ``grep`` or ``sed`` are in this category. Many Python 2
 applications are also in this category.

-"Python producer" and "Python consumer" are producer and consumer
+"Python producer" and "Python consumer" are a producer and a consumer
 implemented in Python.

 Bytes producer, Bytes consumer
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-It always work, but it is out of the scope of this PEP since it doesn't
+It always works, but it is out of the scope of this PEP since it doesn't
 involve Python.

 Python producer, Bytes consumer
@@ -655,18 +655,18 @@ Python producer::

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  No            No
-Python 3                  No            No
-Python 3.5, POSIX locale  No            No
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  No             No
+Python 3                  No             No
+Python 3.5, POSIX locale  No             No
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

 The question here is not if the consumer is able to decode the input,
-but if Python is able to produce its ouput. So it's similar to the
+but if Python is able to produce its output. So it's similar to the
 `Display Unicode characters into stdout`_ case.

 UTF-8 modes work with any locale since the consumer doesn't try to
@@ -684,15 +684,15 @@ Python consumer::

 Result:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  **Yes**       **Yes**
-Python 3                  No            No
-Python 3.5, POSIX locale  **Yes**       **Yes**
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  **Yes**        **Yes**
+Python 3                  No             No
+Python 3.5, POSIX locale  **Yes**        **Yes**
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

 Python 3 fails on decoding stdin depending on the input and the locale.

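A minimal sketch of the Python consumer's decoding step, assuming the producer emitted the hypothetical bytes ``b"apple \xff\n"``; an in-memory ``io.TextIOWrapper`` stands in for ``sys.stdin`` and the helper ``consume()`` is hypothetical:

```python
import io

# Hypothetical producer output: 0xFF is invalid in UTF-8.
produced = b"apple \xff\n"


def consume(errors):
    """Decode the produced bytes as UTF-8, then replace the word."""
    stdin = io.TextIOWrapper(io.BytesIO(produced), encoding="utf-8",
                             errors=errors, newline="\n")
    text = stdin.read()
    return text.replace("apple", "orange")


# strict fails on the invalid byte.
try:
    consume("strict")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# surrogateescape (as in UTF-8 mode) keeps the byte as U+DCFF, the
# replacement still works, and encoding back restores the byte.
result = consume("surrogateescape")
print(ascii(result))
print(result.encode("utf-8", "surrogateescape"))
```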
@@ -711,17 +711,17 @@ Python consumer::
    result = text.replace("apple", "orange")
    # ignore the result

-Result, same Python version used for the producer and the consumer:
+Result, using the same Python version for the producer and the consumer:

-========================  ============  =========
-Python                    Always work?  Mojibake?
-========================  ============  =========
-Python 2                  No            No
-Python 3                  No            No
-Python 3.5, POSIX locale  No            No
-UTF-8 mode                **Yes**       **Yes**
-UTF-8 Strict mode         No            No
-========================  ============  =========
+========================  =============  =========
+Python                    Always works?  Mojibake?
+========================  =============  =========
+Python 2                  No             No
+Python 3                  No             No
+Python 3.5, POSIX locale  No             No
+UTF-8 mode                **Yes**        **Yes**
+UTF-8 Strict mode         No             No
+========================  =============  =========

 This case combines a Python producer with a Python consumer, so the
 result is the subset of `Python producer, Bytes consumer`_ and `Bytes
@@ -737,7 +737,7 @@ with the ``surrogateescape`` error handler, encoding errors should not
 occur and so the change should not break applications.

 The more likely source of trouble comes from external libraries. Python
-can decode successfully data from UTF-8, but a library using the locale
+can successfully decode data from UTF-8, but a library using the locale
 encoding can fail to encode the decoded text back to bytes.  Hopefully,
 encoding text in a library is a rare operation. Very few libraries
 expect text, most libraries expect bytes and even manipulate bytes
@@ -754,14 +754,14 @@ Don't modify the encoding of the POSIX locale
 ---------------------------------------------

 A first version of the PEP did not change the encoding and error handler
-used of the POSIX locale.
+used for the POSIX locale.

 The problem is that adding the ``-X utf8`` command line option or
 setting the ``PYTHONUTF8`` environment variable is not possible in some
 cases, or at least not convenient.

-Moreover, many users simply expect that Python 3 behaves as Python 2:
-don't bother them with encodings and "just works" in all cases. These
+Moreover, many users simply expect that Python 3 behaves like Python 2:
+it doesn't bother them with encodings and "just works" in all cases. These
 users don't worry about mojibake, or even expect mojibake because of
 complex documents using multiple incompatibles encodings.

@@ -769,7 +769,7 @@ complex documents using multiple incompatibles encodings.
 Always use UTF-8
 ----------------

-Python already always use the UTF-8 encoding on Mac OS X, Android and
+Python already always uses the UTF-8 encoding on Mac OS X, Android and
 Windows.  Since UTF-8 became the de facto encoding, it makes sense to
 always use it on all platforms with any locale.

@@ -783,7 +783,7 @@ Force UTF-8 for the POSIX locale
 An alternative to always using UTF-8 in any case is to only use UTF-8 when the
 ``LC_CTYPE`` locale is the POSIX locale.

-The `PEP 538`_ "Coercing the legacy C locale to C.UTF-8" of  Nick
+`PEP 538`_ "Coercing the legacy C locale to C.UTF-8" by Nick
 Coghlan proposes to implement that using the ``C.UTF-8`` locale.


@@ -791,20 +791,20 @@ Use the strict error handler for operating system data
 ------------------------------------------------------

 Using the ``surrogateescape`` error handler for `operating system data`_
-creates surprising surrogate characters. No Python codec (except of
-``utf-7``) accept surrogates, and so encoding text coming from the
-operating system is likely to raise an error error. The problem is that
+creates surprising surrogate characters. No Python codec (except for
+``utf-7``) accepts surrogates, so encoding text coming from the
+operating system is likely to raise an error. The problem is that
 the error comes late, very far from where the data was read.

 The ``strict`` error handler can be used instead to decode
 (``os.fsdecode()``) and encode (``os.fsencode()``) operating system
-data, to raise encoding errors as soon as possible. It helps to find
+data and raise encoding errors as soon as possible. Using it helps find
 bugs more quickly.

 The main drawback of this strategy is that it doesn't work in practice.
 Python 3 is designed on top on Unicode strings. Most functions expect
 Unicode and produce Unicode. Even if many operating system functions
-have two flavors, bytes and Unicode, the Unicode flavar is used is most
+have two flavors, bytes and Unicode, the Unicode flavor is used in most
 cases. There are good reasons for that: Unicode is more convenient in
 Python 3 and using Unicode helps to support the full Unicode Character
 Set (UCS) on Windows (even if Python now uses UTF-8 since Python 3.6,
@@ -884,11 +884,11 @@ with the POSIX locale:
  stdout <http://bugs.python.org/issue8533>`_, regrtest fails with Unicode
  encode error if the locale is POSIX

-Some issues are real bug in applications which must set explicitly the
+Some issues are real bugs in applications which must explicitly set the
 encoding. Well, it just works in the common case (locale configured
-correctly), so what? But the program "suddenly" fails when the POSIX
-locale is used (probably for bad reasons). Such bug is not well
-understood by users. Example of such issue:
+correctly), so what? The program "suddenly" fails when the POSIX
+locale is used (probably for bad reasons). Such bugs are not well
+understood by users. Examples of such issues:

 * 2013-11-21: `pip: open() uses the locale encoding to parse Python
  script, instead of the encoding cookie
@@ -902,7 +902,7 @@ Prior Art
 =========

 Perl has a ``-C`` command line option and a ``PERLUNICODE`` environment
-varaible to force UTF-8: see `perlrun
+variable to force UTF-8: see `perlrun
 <http://perldoc.perl.org/perlrun.html>`_. It is possible to configure
 UTF-8 per standard stream, on input and output streams, etc.