PEP 616: Rename the methods and restrict to one affix (#1340)

2020-03-25 13:17:50 -04:00 · 2020-03-25 13:17:50 -04:00 · 68062417c0
parent dcb29f4019
commit 68062417c0
1 changed files with 89 additions and 108 deletions
--- a/pep-0616.rst
+++ b/pep-0616.rst
@ -13,10 +13,10 @@ Post-History: 20-Mar-2020
 Abstract
 ========

-This is a proposal to add two new methods, ``cutprefix`` and
-``cutsuffix``, to the APIs of Python's various string objects.  These
+This is a proposal to add two new methods, ``removeprefix()`` and
+``removesuffix()``, to the APIs of Python's various string objects.  These
 methods would remove a prefix or suffix (respectively) from a string,
-if present, and would be added to to Unicode ``str`` objects, binary
+if present, and would be added to Unicode ``str`` objects, binary
 ``bytes`` and ``bytearray`` objects, and ``collections.UserString``.


@ -27,7 +27,7 @@ There have been repeated issues on Python-Ideas [#pyid]_ [3]_,
 Python-Dev [4]_ [5]_ [6]_ [7]_, the Bug Tracker, and
 StackOverflow [#confusion]_, related to user confusion about the
 existing ``str.lstrip`` and ``str.rstrip`` methods.  These users are
-typically expecting the behavior of ``cutprefix`` and ``cutsuffix``,
+typically expecting the behavior of ``removeprefix`` and ``removesuffix``,
 but they are surprised that the parameter for ``lstrip`` is
 interpreted as a set of characters, not a substring.  This repeated
 issue is evidence that these methods are useful.  The new methods
@ -35,9 +35,15 @@ allow a cleaner redirection of users to the desired behavior.

 As another testimonial for the usefulness of these methods, several
 users on Python-Ideas [#pyid]_ reported frequently including similar
-functions in their own code for productivity.  The implementation
+functions in their code for productivity.  The implementation
 often contained subtle mistakes regarding the handling of the empty
-string (see `Specification`_).
+string, so a well-tested built-in method would be useful.
+
+The existing solutions for creating the desired behavior are to either
+implement the methods as in the `Specification`_ below, or to use
+regular expressions as in the expression
+``re.sub('^' + re.escape(prefix), '', s)``, which is less discoverable,
+requires a module import, and results in less readable code.
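
A minimal sketch of the confusion described above, contrasting ``lstrip`` with the regular-expression workaround. The helper ``remove_prefix`` is a hypothetical stand-in for the proposed method, written so that the example also runs on interpreters that lack it:

```python
import re


def remove_prefix(s: str, prefix: str) -> str:
    """Hypothetical stand-in for the proposed str.removeprefix()."""
    return s[len(prefix):] if s.startswith(prefix) else s


name = "test_setup"

# lstrip() treats its argument as a *set of characters*, so it keeps
# stripping leading 't', 'e', 's' and '_' until it reaches 'u'.
stripped = name.lstrip("test_")

# The regular-expression workaround removes only the literal prefix.
via_regex = re.sub("^" + re.escape("test_"), "", name)

# The substring-based helper gives the same result without an import.
via_helper = remove_prefix(name, "test_")

print(stripped, via_regex, via_helper)
```

Here ``lstrip`` also consumes part of the name itself, which is the surprise that users typically report.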


 Specification
@ -46,95 +52,43 @@ Specification
 The builtin ``str`` class will gain two new methods with roughly the
 following behavior::

-    def cutprefix(self, prefix, /):
-        if not isinstance(self, str):
-            raise TypeError()
-        self_str = str(self)
-
-        if isinstance(prefix, tuple):
-            for option in tuple(prefix):
-                if not isinstance(option, str):
-                    raise TypeError()
-                option_str = str(option)
-
-                if self_str.startswith(option_str):
-                    return self_str[len(option_str):]
-
-            return self_str[:]
-
-        if not isinstance(prefix, str):
-            raise TypeError()
-
-        prefix_str = str(prefix)
-
-        if self_str.startswith(prefix_str):
-            return self_str[len(prefix_str):]
+    def removeprefix(self: str, prefix: str, /) -> str:
+        if self.startswith(prefix):
+            return self[len(prefix):]
        else:
-            return self_str[:]
+            return self[:]

-
-    def cutsuffix(self, suffix, /):
-        if not isinstance(self, str):
-            raise TypeError()
-        self_str = str(self)
-
-        if isinstance(suffix, tuple):
-            for option in tuple(suffix):
-                if not isinstance(option, str):
-                    raise TypeError()
-                option_str = str(option)
-
-                if not option_str:
-                    return self_str[:]
-                if self_str.endswith(option_str):
-                    return self_str[:-len(option_str)]
-
-            return self_str[:]
-
-        if not isinstance(suffix, str):
-            raise TypeError()
-        suffix_str = str(suffix)
-
-        if suffix_str and self_str.startswith(suffix_str):
-            return self_str[:-len(suffix_str)]
+    def removesuffix(self: str, suffix: str, /) -> str:
+        if suffix and self.endswith(suffix):
+            return self[:-len(suffix)]
        else:
-            return self_str[:]
+            return self[:]

+Note that ``self[:]`` might not actually make a copy -- if the affix
+is empty or not found, and if ``type(self) is str``, then these methods
+may, but are not required to, make the optimization of returning ``self``.
+However, when called on instances of subclasses of ``str``, these
+methods should return base ``str`` objects, not ``self``.

-Note that without the check for the truthyness of suffixes,
-``s.cutsuffix('')`` would be mishandled and always return the empty
+Without the check for the truthiness of ``suffix``,
+``s.removesuffix('')`` would be mishandled and always return the empty
 string due to the unintended evaluation of ``self[:-0]``.

 Methods with the corresponding semantics will be added to the builtin
 ``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
-or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
-will accept any bytes-like object or tuple of bytes-like objects as an
-argument.  The one-at-a-time checking of types matches the implementation
-of ``startswith()`` and ``endswith()`` methods.
-
-The ``self_str[:]`` copying behavior in the code ensures that the
-``bytearray`` methods do not return ``self``, but it does not preclude
-the ``str`` and ``bytes`` methods from returning ``self``.  Because
-``str`` and ``bytes`` instances are immutable, the  ``cutprefix()``
-and ``cutsuffix()`` methods on these objects may (but are not
-required to) make the optimization of returning ``self`` if
-``type(self) is str`` (``type(self) is bytes`` respectively)
-and the given affixes are not found, or are empty.  As such, the
-following behavior is considered a CPython implementation detail, and
-is not guaranteed by this specification::
-
-    >>> x = 'foobar' * 10**6
-    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
-    True
-    >>> x.cutprefix('') is x is x.cutsuffix('')
-    True
+or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()``
+will accept any bytes-like object as an argument.  Although the methods
+on the immutable ``str`` and ``bytes`` types may make the aforementioned
+optimization of returning the original object, ``bytearray.removeprefix()``
+and ``bytearray.removesuffix()`` should *always* return a copy, never the
+original object.

 To test whether any affixes were removed during the call, users
-should use the constant-time behavior of comparing the lengths of
+may use the constant-time behavior of comparing the lengths of
 the original and new strings::

    >>> string = 'Python String Input'
-    >>> new_string = string.cutprefix("Py")
+    >>> new_string = string.removeprefix("Py")
    >>> modified = (len(string) != len(new_string))
    >>> modified
    True
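
The semantics specified above can be exercised with a small pure-Python sketch. The free functions below are illustrative assumptions that mirror the specification; they are not the reference implementation, and they work for ``str``, ``bytes`` and ``bytearray`` because all three support ``startswith``, ``endswith`` and slicing:

```python
def removeprefix(s, prefix):
    # Drop the prefix only when it is actually present.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s[:]


def removesuffix(s, suffix):
    # The truthiness check guards against s[:-0], which is empty.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s[:]


text = "Python String Input"
empty_slice = text[:-0]                    # the pitfall: an empty string
unchanged = removesuffix(text, "")         # empty suffix leaves text intact
shortened = removeprefix(text, "Py")
modified = len(text) != len(shortened)     # constant-time "was it removed?"

# For a mutable bytearray, the [:] slice yields a new object even when
# nothing is removed, matching the "always return a copy" requirement.
buf = bytearray(b"spam.py")
copy = removesuffix(buf, b".txt")
print(shortened, modified, copy is buf)
```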
@ -159,8 +113,7 @@ one or more of the following:
 2. More performant:

   The code does not require a call to the Python built-in ``len``
-   function, nor to the more expensive ``str.replace()``
-   method.
+   function nor to the more expensive ``str.replace()`` method.

 3. More descriptive:

@ -180,7 +133,7 @@ find_recursionlimit.py

 - Improved::

-    print(test_func_name.cutprefix("test_"))
+    print(test_func_name.removeprefix("test_"))


 deccheck.py
@ -202,7 +155,7 @@ intended to be removed.
 - Improved::

    if funcname.startswith("context."):
-        self.funcname = funcname.cutprefix("context.")
+        self.funcname = funcname.removeprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
@ -211,7 +164,7 @@ intended to be removed.
 - Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
-    self.funcname = funcname.cutprefix("context.")
+    self.funcname = funcname.removeprefix("context.")


 cookiejar.py
@ -229,12 +182,15 @@ cookiejar.py
 - Improved::

    def strip_quotes(text):
-        return text.cutprefix('"').cutsuffix('"')
+        return text.removeprefix('"').removesuffix('"')


 test_concurrent_futures.py
 --------------------------

+In the following example, the meaning of the code changes slightly,
+but in context, it behaves the same.
+
 - Current::

    if name.endswith(('Mixin', 'Tests')):
@ -246,7 +202,9 @@ test_concurrent_futures.py

 - Improved::

-    return name.cutsuffix(('Mixin', 'Tests', 'Test'))
+    return (name.removesuffix('Mixin')
+                .removesuffix('Tests')
+                .removesuffix('Test'))


 There were many other such examples in the stdlib.
@ -259,7 +217,7 @@ Expand the lstrip and rstrip APIs
 ---------------------------------

 Because ``lstrip`` takes a string as its argument, it could be viewed
-as taking an iterable of length-1 strings.  The API could therefore be
+as taking an iterable of length-1 strings.  The API could, therefore, be
 generalized to accept any iterable of strings, which would be
 successively removed as prefixes.  While this behavior would be
 consistent, it would not be obvious for users to have to call
@ -275,37 +233,61 @@ expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
 function until the argument is unchanged.  This behavior is attainable
 from the proposed behavior via the following::

-    >>> s = 'FooBar' * 100 + 'Baz'
-    >>> prefixes = ('Bar', 'Foo')
-    >>> while len(s) != len(s := s.cutprefix(prefixes)): pass
+    >>> s = 'Foo' * 100 + 'Bar'
+    >>> prefix = 'Foo'
+    >>> while len(s) != len(s := s.removeprefix(prefix)): pass
    >>> s
-    'Baz'
+    'Bar'

 or the more obvious and readable alternative::

-    >>> s = 'FooBar' * 100 + 'Baz'
-    >>> prefixes = ('Bar', 'Foo')
-    >>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
+    >>> s = 'Foo' * 100 + 'Bar'
+    >>> prefix = 'Foo'
+    >>> while s.startswith(prefix): s = s.removeprefix(prefix)
    >>> s
-    'Baz'
+    'Bar'


 Raising an exception when not found
 -----------------------------------

-There was a suggestion that ``s.cutprefix(pre)`` should raise an
+There was a suggestion that ``s.removeprefix(pre)`` should raise an
 exception if ``not s.startswith(pre)``.  However, this does not match
 with the behavior and feel of other string methods.  There could be
 a ``required=False`` keyword added, but this violates the KISS
 principle.


+Accepting a tuple of affixes
+-----------------------------
+
+It could be convenient to write the ``test_concurrent_futures.py``
+example above as ``name.removesuffix(('Mixin', 'Tests', 'Test'))``, so
+there was a suggestion that the new methods be able to take a tuple of
+strings as an argument, similar to the ``startswith()`` API.  Within
+the tuple, only the first matching affix would be removed.  This was
+rejected on the following grounds:
+
+* This behavior can be surprising or visually confusing, especially
+  when one prefix is empty or is a substring of another prefix, as in
+  ``'FooBar'.removeprefix(('', 'Foo')) == 'FooBar'``
+  or ``'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'``.
+
+* The API for ``str.replace()`` only accepts a single pair of
+  replacement strings, but has stood the test of time by refusing the
+  temptation to guess in the face of ambiguous multiple replacements.
+
+* There may be a compelling use case for such a feature in the future,
+  but generalization before the basic feature sees real-world use would
+  be easy to get permanently wrong.
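
To make the first objection concrete, here is a sketch of the rejected "first matching affix wins" behavior. The function ``remove_first_matching_prefix`` is a hypothetical name used only for illustration and is not part of the proposal:

```python
def remove_first_matching_prefix(s: str, prefixes: tuple) -> str:
    """Hypothetical tuple-accepting variant: only the first match is removed."""
    for prefix in prefixes:
        if s.startswith(prefix):
            return s[len(prefix):]
    return s


# An empty prefix always matches, so it shadows every prefix after it.
shadowed = remove_first_matching_prefix("FooBar", ("", "Foo"))

# A shorter prefix listed first wins over a longer, more specific one.
partial = remove_first_matching_prefix("FooBar text", ("Foo", "FooBar "))

print(repr(shadowed), repr(partial))
```

In both calls the result depends on the order of the tuple rather than on which affix is the best match, which is the ambiguity the single-affix API avoids.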
+
+
 Alternative Method Names
 ------------------------

 Several alternative method names have been proposed.  Some are listed
 below, along with commentary for why they should be rejected in favor
-of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
+of ``removeprefix`` (the same arguments hold for ``removesuffix``).

 - ``ltrim``, ``trimprefix``, etc.:

@ -316,24 +298,23 @@ of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

  This would avoid adding a new method, but for different
  behavior, it's better to have two different methods than one
-  method with a keyword argument that select the behavior.
+  method with a keyword argument that selects the behavior.

- ``cut_prefix``:
+- ``remove_prefix``:

  All of the other methods of the string API, e.g.
  ``str.startswith()``, use ``lowercase`` rather than
  ``lower_case_with_underscores``.

- ``cutleft``, ``leftcut``, or ``lcut``:
+- ``removeleft``, ``leftremove``, or ``lremove``:

  The explicitness of "prefix" is preferred.

- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:
+- ``cutprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:

-  All of these might have been acceptable, but they have more
-  characters than ``cut``.  Some suggested that the verb "cut"
-  implies mutability, but the string API already contains verbs
-  like "replace", "strip", "split", and "swapcase".
+  Many of these might have been acceptable, but "remove" is
+  unambiguous and matches how one would describe the "remove the prefix"
+  behavior in English.

 - ``stripprefix``:

@ -345,7 +326,7 @@ of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
 Reference Implementation
 ========================

-See the pull request on GitHub [#pr]_ (not updated).
+See the pull request on GitHub [#pr]_.


 References