PEP 616 Revisions (#1333)

* Add a PEP: String methods to remove prefixes and suffixes
* add PEP number
* Add sponsor
* Fix typo
* changes after review: passing tuples, clarity on returning self, formatting
2020-03-22 15:02:10 -04:00 · 2020-03-22 15:02:10 -04:00 · 0093a853e8
parent 3ff1593f56
commit 0093a853e8
1 changed files with 142 additions and 255 deletions
--- a/pep-0616.rst
+++ b/pep-0616.rst
@@ -14,20 +14,10 @@ Abstract
 ========

 This is a proposal to add two new methods, ``cutprefix`` and
-``cutsuffix``, to the APIs of Python's various string objects.  In
-particular, the methods would be added to Unicode ``str`` objects, 
-binary ``bytes`` and ``bytearray`` objects, and
-``collections.UserString``. 
-
-If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then
-``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
-been removed.  If ``s`` does not have ``pre`` as a prefix, an 
-unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
-is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
-
-The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
-roughly equivalent to 
-``s[:-len(suf)] if suf and s.endswith(suf) else s``.
+``cutsuffix``, to the APIs of Python's various string objects.  These
+methods would remove a prefix or suffix (respectively) from a string,
+if present, and would be added to Unicode ``str`` objects, binary
+``bytes`` and ``bytearray`` objects, and ``collections.UserString``.


 Rationale
@@ -55,36 +45,79 @@ Specification
 The builtin ``str`` class will gain two new methods with roughly the
 following behavior::

-    def cutprefix(self: str, pre: str, /) -> str:
-        if self.startswith(pre):
-            return self[len(pre):]
-        return self[:]
-    
-    def cutsuffix(self: str, suf: str, /) -> str:
-        if suf and self.endswith(suf):
-            return self[:-len(suf)]
-        return self[:]
+    def cutprefix(self, prefix, /):
+        if not isinstance(self, str):
+            raise TypeError()
+        self_str = str(self)

-The only difference between the real implementation and the above is
-that, as with other string methods like ``replace``, the 
-methods will raise a ``TypeError`` if any of ``self``, ``pre`` or 
-``suf`` is not an instace of ``str``, and will cast subclasses of
-``str`` to builtin ``str`` objects.
+        if isinstance(prefix, tuple):
+            for option in prefix:
+                if not isinstance(option, str):
+                    raise TypeError()
+                option_str = str(option)

-Note that without the check for the truthyness of ``suf``, 
+                if self_str.startswith(option_str):
+                    return self_str[len(option_str):]
+
+            return self_str[:]
+
+        if not isinstance(prefix, str):
+            raise TypeError()
+
+        prefix_str = str(prefix)
+
+        if self_str.startswith(prefix_str):
+            return self_str[len(prefix_str):]
+        else:
+            return self_str[:]
+
+
+    def cutsuffix(self, suffix, /):
+        if not isinstance(self, str):
+            raise TypeError()
+        self_str = str(self)
+
+        if isinstance(suffix, tuple):
+            for option in suffix:
+                if not isinstance(option, str):
+                    raise TypeError()
+                option_str = str(option)
+
+                if option_str and self_str.endswith(option_str):
+                    return self_str[:-len(option_str)]
+
+            return self_str[:]
+
+        if not isinstance(suffix, str):
+            raise TypeError()
+        suffix_str = str(suffix)
+
+        if suffix_str and self_str.endswith(suffix_str):
+            return self_str[:-len(suffix_str)]
+        else:
+            return self_str[:]
+
+Note that without the check for the truthiness of suffixes,
 ``s.cutsuffix('')`` would be mishandled and always return the empty 
 string due to the unintended evaluation of ``self[:-0]``.
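The slicing pitfall described above can be checked with plain string slicing. This is only a sketch: ``naive_cutsuffix`` and ``guarded_cutsuffix`` are hypothetical standalone helpers written for illustration, not the proposed methods themselves.

```python
def naive_cutsuffix(s, suffix):
    # Deliberately omits the "suffix and ..." guard to expose the bug.
    if s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def guarded_cutsuffix(s, suffix):
    # Mirrors the truthiness check used in the specification's pseudo-code.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


# Every string ends with '', and s[:-0] is the same slice as s[:0].
print(repr(naive_cutsuffix("foobar", "")))    # ''
print(repr(guarded_cutsuffix("foobar", "")))  # 'foobar'
```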

 Methods with the corresponding semantics will be added to the builtin 
 ``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
 or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
-will accept any bytes-like object as an argument.
+will accept any bytes-like object or tuple of bytes-like objects as an
+argument. The one-at-a-time checking of types matches the implementation
+of the ``startswith()`` and ``endswith()`` methods.

-Note that the ``bytearray`` methods return a copy of ``self``; they do
-not operate in place.
-
-The following behavior is considered a CPython implementation detail,
-but is not guaranteed by this specification::
+The ``self_str[:]`` copying behavior in the code ensures that the
+``bytearray`` methods do not return ``self``, but it does not preclude
+the ``str`` and ``bytes`` methods from returning ``self``. Because
+``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
+and ``cutsuffix()`` methods on these objects may (but are not
+required to) make the optimization of returning ``self`` if
+``type(self) is str`` (``type(self) is bytes`` respectively)
+and the given affixes are not found, or are empty. As such, the following
+behavior is considered a CPython implementation detail, and is not
+guaranteed by this specification::

    >>> x = 'foobar' * 10**6
    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
@@ -92,20 +125,21 @@ but is not guaranteed by this specification::
    >>> x.cutprefix('') is x is x.cutsuffix('')
    True

-That is, for CPython's immutable ``str`` and ``bytes`` objects, the 
-methods return the original object when the affix is not found or if
-the affix is empty.  Because these types test for equality using 
-shortcuts for identity and length, the following equivalent 
-expressions are evaluated at approximately the same speed, for any 
-``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
+To test whether any affixes were removed during the call, users
+can compare the lengths of the original and new strings, which is a
+constant-time check::

-    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
-    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)
+    >>> string = 'Python String Input'
+    >>> new_string = string.cutprefix("Py")
+    >>> modified = (len(string) != len(new_string))
+    >>> modified
+    True

+Users may also continue using ``startswith()`` and ``endswith()``
+methods for control flow instead of testing the lengths as above.
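The ``startswith()`` alternative mentioned above might look like the following sketch. Both ``cutprefix`` (a single-prefix stand-in for the proposed method) and ``strip_scheme`` are hypothetical helpers introduced only for this illustration.

```python
def cutprefix(s, prefix):
    # Hypothetical stand-in for the proposed str.cutprefix (single prefix only).
    return s[len(prefix):] if s.startswith(prefix) else s


def strip_scheme(url):
    # Branch on startswith() directly instead of comparing lengths afterwards.
    if url.startswith("https://"):
        return True, cutprefix(url, "https://")
    return False, url


print(strip_scheme("https://python.org"))  # (True, 'python.org')
print(strip_scheme("ftp://example.org"))   # (False, 'ftp://example.org')
```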

-The two methods will also be added to ``collections.UserString``, 
-where they rely on the implementation of the new ``str`` methods.
-
+The two methods will also be added to ``collections.UserString``, with
+similar behavior.

 Motivating examples from the Python standard library
 ====================================================
@@ -113,41 +147,21 @@ Motivating examples from the Python standard library
 The examples below demonstrate how the proposed methods can make code
 one or more of the following:

-    Less fragile:
-        The code will not depend on the user to count the length of a
-        literal.
-    More performant:
-        The code does not require a call to the Python built-in 
-        ``len`` function.
-    More descriptive:
-        The methods give a higher-level API for code readability, as
-        opposed to the traditional method of string slicing.
+1. Less fragile:
+    
+    - The code will not depend on the user to count the length of a
+      literal.

+2. More performant:
+    
+    - The code does not require a call to the Python built-in 
+      ``len`` function, nor to the more expensive ``str.replace``
+      function.

-refactor.py
-----------
-
- Current::
-
-    if fix_name.startswith(self.FILE_PREFIX):
-        fix_name = fix_name[len(self.FILE_PREFIX):]
-
- Improved::
-
-    fix_name = fix_name.cutprefix(self.FILE_PREFIX)
-
-
-c_annotations.py:
-----------------
-
- Current::
-
-    if name.startswith("c."):
-        name = name[2:]
-
- Improved::
-
-    name = name.cutprefix("c.")
+3. More descriptive:
+    
+    - The methods give a higher-level API for code readability, as
+      opposed to the traditional method of string slicing.


 find_recursionlimit.py
@@ -162,7 +176,8 @@ find_recursionlimit.py

 - Improved::

-    print(test_finc_name.cutprefix("test_"))
+    print(test_func_name.cutprefix("test_"))
+

 deccheck.py
 -----------
@@ -195,83 +210,6 @@ intended to be removed.
    self.funcname = funcname.cutprefix("context.")


-test_i18n.py
------------
-
- Current::
-
-    if test_func_name.startswith("test_"):
-        print(test_func_name[5:])
-    else:
-        print(test_func_name)
-
- Improved::
-
-    print(test_finc_name.cutprefix("test_"))
-
- Current::
-
-    if creationDate.endswith('\\n'):
-        creationDate = creationDate[:-len('\\n')]
-
- Improved::
-
-    creationDate = creationDate.cutsuffix('\\n')
-
-
-shared_memory.py
----------------
-
- Current::
-
-    reported_name = self._name
-    if _USE_POSIX and self._prepend_leading_slash:
-        if self._name.startswith("/"):
-            reported_name = self._name[1:]
-    return reported_name
-
- Improved::
-
-    if _USE_POSIX and self._prepend_leading_slash:
-        return self._name.cutprefix("/")
-    return self._name
-
-
-build-installer.py
------------------
-
- Current::
-
-    if archiveName.endswith('.tar.gz'):
-        retval = os.path.basename(archiveName[:-7])
-        if ((retval.startswith('tcl') or retval.startswith('tk'))
-                and retval.endswith('-src')):
-            retval = retval[:-4]
-
- Improved::
-
-    if archiveName.endswith('.tar.gz'):
-        retval = os.path.basename(archiveName[:-7])
-        if retval.startswith(('tcl', 'tk')):
-            retval = retval.cutsuffix('-src')
-
-Depending on personal style, ``archiveName[:-7]`` could also be
-changed to ``archiveName.cutsuffix('.tar.gz')``.
-
-
-test_core.py
------------
-
- Current::
-
-    if output.endswith("\n"):
-        output = output[:-1]
-
- Improved::
-
-    output = output.cutsuffix("\n")
-
-
 cookiejar.py
 ------------

@@ -289,31 +227,6 @@ cookiejar.py
    def strip_quotes(text):
        return text.cutprefix('"').cutsuffix('"')

- Current::
-
-    if line.endswith("\n"): line = line[:-1]
-
- Improved::
-
-    line = line.cutsuffix("\n")
-    
-
-fixdiv.py
---------
-
- Current::
-
-    def chop(line):
-        if line.endswith("\n"):
-            return line[:-1]
-        else:
-            return line
-
- Improved::
-
-    def chop(line):
-        return line.cutsuffix("\n")
-

 test_concurrent_futures.py
 --------------------------
@@ -332,37 +245,10 @@ but in context, it behaves the same.

 - Improved::

-    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')
+    return name.cutsuffix(('Mixin', 'Tests', 'Test'))
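Under the tuple semantics in the specification's pseudo-code, the tuple form removes at most one suffix (the first option that matches), whereas the chained form it replaces here can remove several. A minimal sketch of the difference, where ``cutsuffix`` is a hypothetical standalone function modeled on that pseudo-code and the sample name is made up:

```python
def cutsuffix(s, suffix):
    # Hypothetical stand-in following the specification's pseudo-code:
    # for a tuple, only the first matching non-empty option is removed.
    options = suffix if isinstance(suffix, tuple) else (suffix,)
    for option in options:
        if option and s.endswith(option):
            return s[:-len(option)]
    return s


name = "ExecutorTestsMixin"  # made-up class name for illustration

# Chained calls may strip more than one suffix in sequence.
chained = cutsuffix(cutsuffix(cutsuffix(name, "Mixin"), "Tests"), "Test")

# A single call with a tuple strips only the first matching suffix.
single = cutsuffix(name, ("Mixin", "Tests", "Test"))

print(chained)  # Executor
print(single)   # ExecutorTests
```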


-msvc9compiler.py
----------------
-
- Current::
-
-    if value.endswith(os.pathsep):
-        value = value[:-1]
-
- Improved::
-
-    value = value.cutsuffix(os.pathsep)
-
-
-test_pathlib.py
---------------
-
- Current::
-
-    self.assertTrue(r.startswith(clsname + '('), r)
-    self.assertTrue(r.endswith(')'), r)
-    inner = r[len(clsname) + 1 : -1]
-
- Improved::
-
-    self.assertTrue(r.startswith(clsname + '('), r)
-    self.assertTrue(r.endswith(')'), r)
-    inner = r.cutprefix(clsname + '(').cutsuffix(')')
-
+There were many other such examples in the stdlib.


 Rejected Ideas
@@ -379,37 +265,27 @@ consistent, it would not be obvious for users to have to call
 ``'foobar'.cutprefix(('foo',))`` for the common use case of a 
 single prefix.

-Allow multiple prefixes
-----------------------
-
-Some users discussed the desire to be able to remove multiple 
-prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
-However, this adds ambiguity about the order in which the prefixes are
-removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
-After this proposal, this can be spelled explicitly as 
-``s.cutprefix('Foo').cutprefix('FooBar')``.
-
 Remove multiple copies of a prefix
 ----------------------------------

 This is the behavior that would be consistent with the aforementioned
-expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
+expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
 function until the argument is unchanged.  This behavior is attainable
-from the proposed behavior via the following::
+from the proposed behavior via the following::
    
-    >>> s = 'foo' * 100 + 'bar'
-    >>> while s != (s := s.cutprefix("foo")): pass
+    >>> s = 'foobar' * 100 + 'bar'
+    >>> prefixes = ('bar', 'foo')
+    >>> while len(s) != len(s := s.cutprefix(prefixes)): pass
    >>> s
+    ''

-The above can be modififed by chaining multiple ``cutprefix`` calls
-together to achieve the full behavior of the ``lstrip``/``rstrip``
-generalization, while being explicit in the order of removal.
+or the more obvious and readable alternative::

-While the proposed API could later be extended to include some of
-these use cases, to do so before any observation of how these methods
-are used in practice would be premature and may lead to choosing the
-wrong behavior.
+    >>> s = 'foo' * 100 + 'bar'
+    >>> prefixes = ('bar', 'foo')
+    >>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
+    >>> s
+    ''


 Raising an exception when not found
@@ -427,30 +303,41 @@ Alternative Method Names

 Several alternative method names have been proposed.  Some are listed
 below, along with commentary for why they should be rejected in favor
-of ``cutprefix`` (the same arguments hold for ``cutsuffix``)
+of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

-    ``ltrim``
-        "Trim" does in other languages (e.g. JavaScript, Java, Go,
-        PHP) what ``strip`` methods do in Python.
-    ``lstrip(string=...)``
-        This would avoid adding a new method, but for different 
-        behavior, it's better to have two different methods than one
-        method with a keyword argument that select the behavior.
-    ``cut_prefix``
-        All of the other methods of the string API, e.g.
-        ``str.startswith()``, use ``lowercase`` rather than
-        ``lower_case_with_underscores``.
-    ``cutleft``, ``leftcut``, or ``lcut``
-        The explicitness of "prefix" is preferred.
-    ``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
-        All of these might have been acceptable, but they have more
-        characters than ``cut``.  Some suggested that the verb "cut"
-        implies mutability, but the string API already contains verbs
-        like "replace", "strip", "split", and "swapcase".
-    ``stripprefix``
-        Users may benefit from the mnemonic that "strip" means working
-        with sets of characters, while other methods work with
-        substrings, so re-using "strip" here should be avoided.
+- ``ltrim``
+
+    - "Trim" does in other languages (e.g. JavaScript, Java, Go,
+      PHP) what ``strip`` methods do in Python.
+
+- ``lstrip(string=...)``
+
+    - This would avoid adding a new method, but for different 
+      behavior, it's better to have two different methods than one
+      method with a keyword argument that selects the behavior.
+
+- ``cut_prefix``
+
+    - All of the other methods of the string API, e.g.
+      ``str.startswith()``, use ``lowercase`` rather than
+      ``lower_case_with_underscores``.
+
+- ``cutleft``, ``leftcut``, or ``lcut``
+
+    - The explicitness of "prefix" is preferred.
+
+- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.
+
+    - All of these might have been acceptable, but they have more
+      characters than ``cut``.  Some suggested that the verb "cut"
+      implies mutability, but the string API already contains verbs
+      like "replace", "strip", "split", and "swapcase".
+
+- ``stripprefix``
+
+    - Users may benefit from remembering that "strip" means working
+      with sets of characters, while other methods work with
+      substrings, so re-using "strip" here should be avoided.


 Reference Implementation