From 6ccd6bc4284e26f27686718b386ebb97007dd32c Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Thu, 31 Mar 2022 13:07:01 +0900
Subject: [PATCH] PEP 686: Update (#2470)

---
 pep-0686.rst | 82 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 720e1b6b0..9d24934b6 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -54,22 +54,37 @@ Users can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or
 ``-X utf8=0``.
 
 
-``locale.get_encoding()``
--------------------------
+``locale.getencoding()``
+------------------------
 
-Currently, ``TextIOWrapper`` uses ``locale.getpreferredencoding(False)``
-when ``encoding="locale"`` option is specified. It is ``"UTF-8"`` in UTF-8 mode.
+Since UTF-8 mode affects ``locale.getpreferredencoding(False)``,
+we need an API to get locale encoding regardless of UTF-8 mode.
 
-This behavior is inconsistent with the :pep:`597` motivation.
+``locale.getencoding()`` will be added for this purpose.
+It returns locale encoding too, but ignores UTF-8 mode.
+
+When ``warn_default_encoding`` option is specified,
+``locale.getpreferredencoding()`` will emit ``EncodingWarning`` like
+``open()`` (see also :pep:`597`).
+
+
+Fixing ``encoding="locale"`` option
+-----------------------------------
+
+:pep:`597` added the ``encoding="locale"`` option to the ``TextIOWrapper``.
+This option is used to specify the locale encoding explicitly.
+``TextIOWrapper`` should use locale encoding when the option is specified,
+regardless of default text encoding.
+
+But ``TextIOWrapper`` uses ``"UTF-8"`` in UTF-8 mode even if
+``encoding="locale"`` is specified for now.
+This behavior is inconsistent with the :pep:`597` motivation.
+It is because we didn't expect making UTF-8 mode default when Python
+changes its default text encoding.
+
+This inconsistency should be fixed before making UTF-8 mode default.
 ``TextIOWrapper`` should use locale encoding when ``encoding="locale"`` is
-passed before/after the default encoding is changed to UTF-8.
-
-To fix this inconsistency, we will add ``locale.get_encoding()``.
-It is the same as ``locale.getpreferredencoding(False)`` but it ignores
-the UTF-8 mode.
-
-This change will be released in Python 3.11 so that users can use UTF-8 mode
-that is the same as Python 3.13.
+passed even in UTF-8 mode.
 
 
 Backward Compatibility
@@ -83,16 +98,18 @@ When a Python program depends on the default encoding, this change may
 cause ``UnicodeError``, mojibake, or even silent data corruption.
 So this change should be announced loudly.
 
-To resolve this backward incompatibility, users can do:
+This is the guideline to fix this backward compatibility issue:
 
-* Disable UTF-8 mode.
-* Use ``EncodingWarning`` to find where the default encoding is used and use
-  ``encoding="locale"`` option if locale encoding should be used
-  (as defined in :pep:`597`).
-* Find every occurrence of ``locale.getpreferredencoding(False)`` in the
-  application, and replace it with ``locale.get_locale_encoding()`` if
-  locale encoding should be used.
-* Test the application with UTF-8 mode.
+1. Disable UTF-8 mode.
+2. Use ``EncodingWarning`` (:pep:`597`) to find every places UTF-8 mode
+   affects.
+
+   * If ``encoding`` option is omitted, consider using ``encoding="utf-8"``
+     or ``encoding="locale"``.
+   * If ``locale.getpreferredencoding()`` is used, consider using
+     ``"utf-8"`` or ``locale.getencoding()``.
+
+3. Test the application with UTF-8 mode.
 
 
 Preceding examples
@@ -122,10 +139,31 @@ Additionally, such warnings are not useful for non-cross platform applications
 run on Unix.
 
 So forcing users to specify the ``encoding`` everywhere is too painful.
+Emitting a lot of ``DeprecationWarning`` will lead users ignore warnings.
+
+:pep:`387` requires adding a warning for backward incompatible changes.
+But it doesn't require using ``DeprecationWarning``.
+So using optional ``EncodingWarning`` doesn't violate the :pep:`387`.
 
 Java also rejected this idea in `JEP 400`_.
 
 
+Use ``PYTHONIOENCODING`` for PIPEs
+----------------------------------
+
+To ease backward compatibility issue, using ``PYTHONIOENCODING`` as the
+default encoding of PIPEs in the ``subprocess`` module is considered.
+
+With this idea, users can use legacy encoding for
+``subprocess.Popen(text=True)`` even in UTF-8 mode.
+
+But this idea makes "default encoding" complicated.
+And this idea is also backward incompatible.
+
+So this idea is rejected. Users can disable UTF-8 mode until they replace
+``text=True`` with ``encoding="utf-8"`` or ``encoding="locale"``.
+
+
 How to teach this
 =================