PEP 616: String methods to remove prefixes and suffixes (#1332)

* Add a PEP: String methods to remove prefixes and suffixes * add PEP number * Add sponsor * Fix typo
2020-03-20 12:21:06 -04:00 · 2020-03-20 12:21:06 -04:00 · 914ef2c51a
parent 4516435ea1
commit 914ef2c51a
1 changed files with 488 additions and 0 deletions
--- a/pep-0616.rst
+++ b/pep-0616.rst
@ -0,0 +1,488 @@
+PEP: 616
+Title: String methods to remove prefixes and suffixes
+Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
+Sponsor: Eric V. Smith <eric@trueblade.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 19-Mar-2020
+Python-Version: 3.9
+Post-History: 30-Aug-2002
+
+
+Abstract
+========
+
+This is a proposal to add two new methods, ``cutprefix`` and
+``cutsuffix``, to the APIs of Python's various string objects.  In
+particular, the methods would be added to Unicode ``str`` objects, 
+binary ``bytes`` and ``bytearray`` objects, and
+``collections.UserString``. 
+
+If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then
+``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
+been removed.  If ``s`` does not have ``pre`` as a prefix, an 
+unchanged copy of ``s`` is returned.  In summary, ``s.cutprefix(pre)``
+is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
+
+The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
+roughly equivalent to 
+``s[:-len(suf)] if suf and s.endswith(suf) else s``.
+
+
+Rationale
+=========
+
+There have been repeated issues [#confusion]_ on the Bug Tracker 
+and StackOverflow related to user confusion about the existing 
+``str.lstrip`` and ``str.rstrip`` methods.  These users are typically
+expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they 
+are surprised that the parameter for ``lstrip`` is interpreted as a
+set of characters, not a substring.  This repeated issue is evidence
+that these methods are useful, and the new methods allow a cleaner
+redirection of users to the desired behavior.
+
+As another testimonial for the usefulness of these methods, several
+users on Python-Ideas [#pyid]_ reported frequently including similar
+functions in their own code for productivity.  The implementation
+often contained subtle mistakes regarding the handling of the empty
+string (see `Specification`_).
+
+
+Specification
+=============
+
+The builtin ``str`` class will gain two new methods with roughly the
+following behavior::
+
+    def cutprefix(self: str, pre: str, /) -> str:
+        if self.startswith(pre):
+            return self[len(pre):]
+        return self[:]
+    
+    def cutsuffix(self: str, suf: str, /) -> str:
+        if suf and self.endswith(suf):
+            return self[:-len(suf)]
+        return self[:]
+
+The only difference between the real implementation and the above is
+that, as with other string methods like ``replace``, the 
+methods will raise a ``TypeError`` if any of ``self``, ``pre`` or 
+``suf`` is not an instace of ``str``, and will cast subclasses of
+``str`` to builtin ``str`` objects.
+
+Note that without the check for the truthyness of ``suf``, 
+``s.cutsuffix('')`` would be mishandled and always return the empty 
+string due to the unintended evaluation of ``self[:-0]``.
+
+Methods with the corresponding semantics will be added to the builtin 
+``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
+or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
+will accept any bytes-like object as an argument.
+
+Note that the ``bytearray`` methods return a copy of ``self``; they do
+not operate in place.
+
+The following behavior is considered a CPython implementation detail,
+but is not guaranteed by this specification::
+
+    >>> x = 'foobar' * 10**6
+    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
+    True
+    >>> x.cutprefix('') is x is x.cutsuffix('')
+    True
+
+That is, for CPython's immutable ``str`` and ``bytes`` objects, the 
+methods return the original object when the affix is not found or if
+the affix is empty.  Because these types test for equality using 
+shortcuts for identity and length, the following equivalent 
+expressions are evaluated at approximately the same speed, for any 
+``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
+
+    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
+    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)
+
+
+The two methods will also be added to ``collections.UserString``, 
+where they rely on the implementation of the new ``str`` methods.
+
+
+Motivating examples from the Python standard library
+====================================================
+
+The examples below demonstrate how the proposed methods can make code
+one or more of the following:
+
+    Less fragile:
+        The code will not depend on the user to count the length of a
+        literal.
+    More performant:
+        The code does not require a call to the Python built-in 
+        ``len`` function.
+    More descriptive:
+        The methods give a higher-level API for code readability, as
+        opposed to the traditional method of string slicing.
+
+
+refactor.py
+-----------
+
+- Current::
+
+    if fix_name.startswith(self.FILE_PREFIX):
+        fix_name = fix_name[len(self.FILE_PREFIX):]
+
+- Improved::
+
+    fix_name = fix_name.cutprefix(self.FILE_PREFIX)
+
+
+c_annotations.py:
+-----------------
+
+- Current::
+
+    if name.startswith("c."):
+        name = name[2:]
+
+- Improved::
+
+    name = name.cutprefix("c.")
+
+
+find_recursionlimit.py
+----------------------
+
+- Current::
+
+    if test_func_name.startswith("test_"):
+        print(test_func_name[5:])
+    else:
+        print(test_func_name)
+
+- Improved::
+
+    print(test_finc_name.cutprefix("test_"))
+
+deccheck.py
+-----------
+
+This is an interesting case because the author chose to use the
+``str.replace`` method in a situation where only a prefix was
+intended to be removed.
+
+- Current::
+
+    if funcname.startswith("context."):
+        self.funcname = funcname.replace("context.", "")
+        self.contextfunc = True
+    else:
+        self.funcname = funcname
+        self.contextfunc = False
+
+- Improved::
+
+    if funcname.startswith("context."):
+        self.funcname = funcname.cutprefix("context.")
+        self.contextfunc = True
+    else:
+        self.funcname = funcname
+        self.contextfunc = False
+
+- Arguably further improved::
+
+    self.contextfunc = funcname.startswith("context.")
+    self.funcname = funcname.cutprefix("context.")
+
+
+test_i18n.py
+------------
+
+- Current::
+
+    if test_func_name.startswith("test_"):
+        print(test_func_name[5:])
+    else:
+        print(test_func_name)
+
+- Improved::
+
+    print(test_finc_name.cutprefix("test_"))
+
+- Current::
+
+    if creationDate.endswith('\\n'):
+        creationDate = creationDate[:-len('\\n')]
+
+- Improved::
+
+    creationDate = creationDate.cutsuffix('\\n')
+
+
+shared_memory.py
+----------------
+
+- Current::
+
+    reported_name = self._name
+    if _USE_POSIX and self._prepend_leading_slash:
+        if self._name.startswith("/"):
+            reported_name = self._name[1:]
+    return reported_name
+
+- Improved::
+
+    if _USE_POSIX and self._prepend_leading_slash:
+        return self._name.cutprefix("/")
+    return self._name
+
+
+build-installer.py
+------------------
+
+- Current::
+
+    if archiveName.endswith('.tar.gz'):
+        retval = os.path.basename(archiveName[:-7])
+        if ((retval.startswith('tcl') or retval.startswith('tk'))
+                and retval.endswith('-src')):
+            retval = retval[:-4]
+
+- Improved::
+
+    if archiveName.endswith('.tar.gz'):
+        retval = os.path.basename(archiveName[:-7])
+        if retval.startswith(('tcl', 'tk')):
+            retval = retval.cutsuffix('-src')
+
+Depending on personal style, ``archiveName[:-7]`` could also be
+changed to ``archiveName.cutsuffix('.tar.gz')``.
+
+
+test_core.py
+------------
+
+- Current::
+
+    if output.endswith("\n"):
+        output = output[:-1]
+
+- Improved::
+
+    output = output.cutsuffix("\n")
+
+
+cookiejar.py
+------------
+
+- Current::
+
+    def strip_quotes(text):
+        if text.startswith('"'):
+            text = text[1:]
+        if text.endswith('"'):
+            text = text[:-1]
+        return text
+
+- Improved::
+
+    def strip_quotes(text):
+        return text.cutprefix('"').cutsuffix('"')
+
+- Current::
+
+    if line.endswith("\n"): line = line[:-1]
+
+- Improved::
+
+    line = line.cutsuffix("\n")
+    
+
+fixdiv.py
+---------
+
+- Current::
+
+    def chop(line):
+        if line.endswith("\n"):
+            return line[:-1]
+        else:
+            return line
+
+- Improved::
+
+    def chop(line):
+        return line.cutsuffix("\n")
+
+
+test_concurrent_futures.py
+--------------------------
+
+In the following example, the meaning of the code changes slightly,
+but in context, it behaves the same.
+
+- Current::
+
+    if name.endswith(('Mixin', 'Tests')):
+        return name[:-5]
+    elif name.endswith('Test'):
+        return name[:-4]
+    else:
+        return name
+
+- Improved::
+
+    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')
+
+
+msvc9compiler.py
+----------------
+
+- Current::
+
+    if value.endswith(os.pathsep):
+        value = value[:-1]
+
+- Improved::
+
+    value = value.cutsuffix(os.pathsep)
+
+
+test_pathlib.py
+---------------
+
+- Current::
+
+    self.assertTrue(r.startswith(clsname + '('), r)
+    self.assertTrue(r.endswith(')'), r)
+    inner = r[len(clsname) + 1 : -1]
+
+- Improved::
+
+    self.assertTrue(r.startswith(clsname + '('), r)
+    self.assertTrue(r.endswith(')'), r)
+    inner = r.cutprefix(clsname + '(').cutsuffix(')')
+
+
+
+Rejected Ideas
+==============
+
+Expand the lstrip and rstrip APIs
+---------------------------------
+
+Because ``lstrip`` takes a string as its argument, it could be viewed
+as taking an iterable of length-1 strings.  The API could therefore be 
+generalized to accept any iterable of strings, which would be 
+successively removed as prefixes.  While this behavior would be 
+consistent, it would not be obvious for users to have to call 
+``'foobar'.cutprefix(('foo,))`` for the common use case of a 
+single prefix.
+
+Allow multiple prefixes
+-----------------------
+
+Some users discussed the desire to be able to remove multiple 
+prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
+However, this adds ambiguity about the order in which the prefixes are
+removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
+After this proposal, this can be spelled explicitly as 
+``s.cutprefix('Foo').cutprefix('FooBar')``.
+
+Remove multiple copies of a prefix
+----------------------------------
+
+This is the behavior that would be consistent with the aforementioned
+expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
+function until the argument is unchanged.  This behavior is attainable
+from the proposed behavior via the following::
+    
+    >>> s = 'foo' * 100 + 'bar'
+    >>> while s != (s := s.cutprefix("foo")): pass
+    >>> s
+    'bar'
+
+The above can be modififed by chaining multiple ``cutprefix`` calls
+together to achieve the full behavior of the ``lstrip``/``rstrip``
+generalization, while being explicit in the order of removal.
+
+While the proposed API could later be extended to include some of
+these use cases, to do so before any observation of how these methods
+are used in practice would be premature and may lead to choosing the
+wrong behavior.
+
+
+Raising an exception when not found
+-----------------------------------
+
+There was a suggestion that ``s.cutprefix(pre)`` should raise an
+exception if ``not s.startswith(pre)``.  However, this does not match
+with the behavior and feel of other string methods.  There could be
+``required=False`` keyword added, but this violates the KISS
+principle.
+
+
+Alternative Method Names
+------------------------
+
+Several alternatives method names have been proposed.  Some are listed
+below, along with commentary for why they should be rejected in favor
+of ``cutprefix`` (the same arguments hold for ``cutsuffix``)
+
+    ``ltrim``
+        "Trim" does in other languages (e.g. JavaScript, Java, Go,
+        PHP) what ``strip`` methods do in Python.
+    ``lstrip(string=...)``
+        This would avoid adding a new method, but for different 
+        behavior, it's better to have two different methods than one
+        method with a keyword argument that select the behavior.
+    ``cut_prefix``
+        All of the other methods of the string API, e.g.
+        ``str.startswith()``, use ``lowercase`` rather than
+        ``lower_case_with_underscores``.
+    ``cutleft``, ``leftcut``, or ``lcut``
+        The explicitness of "prefix" is preferred.
+    ``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
+        All of these might have been acceptable, but they have more
+        characters than ``cut``.  Some suggested that the verb "cut"
+        implies mutability, but the string API already contains verbs
+        like "replace", "strip", "split", and "swapcase".
+    ``stripprefix``
+        Users may benefit from the mnemonic that "strip" means working
+        with sets of characters, while other methods work with
+        substrings, so re-using "strip" here should be avoided.
+
+
+Reference Implementation
+========================
+
+See the pull request on GitHub [#pr]_.
+
+
+References
+==========
+
+.. [#pr] GitHub pull request with implementation
+   (https://github.com/python/cpython/pull/18939)
+.. [#pyid] Discussion on Python-Ideas
+   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
+.. [#confusion] Comment listing Bug Tracker and StackOverflow issues 
+   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End: