python-peps/pep-0616.rst

PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 20-Mar-2020


Abstract
========

This is a proposal to add two new methods, ``removeprefix()`` and
``removesuffix()``, to the APIs of Python's various string objects.  These
methods would remove a prefix or suffix (respectively) from a string,
if present, and would be added to Unicode ``str`` objects, binary
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.


Rationale
=========

There have been repeated issues on Python-Ideas [#pyid]_ [3]_,
Python-Dev [4]_ [5]_ [6]_ [7]_, the Bug Tracker, and
StackOverflow [#confusion]_, related to user confusion about the
existing ``str.lstrip`` and ``str.rstrip`` methods.  These users are
typically expecting the behavior of ``removeprefix`` and ``removesuffix``,
but they are surprised that the parameter for ``lstrip`` is
interpreted as a set of characters, not a substring.  This repeated
issue is evidence that these methods are useful.  The new methods
allow a cleaner redirection of users to the desired behavior.

As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their code for productivity.  The implementation
often contained subtle mistakes regarding the handling of the empty
string, so a well-tested built-in method would be useful.

The existing solutions for creating the desired behavior are to either
implement the methods as in the `Specification`_ below, or to use
regular expressions as in the expression
``re.sub('^' + re.escape(prefix), '', s)``, which is less discoverable,
requires a module import, and results in less readable code.


Specification
=============

The builtin ``str`` class will gain two new methods which will behave
as follows when ``type(self) is type(prefix) is type(suffix) is str``::

    def removeprefix(self: str, prefix: str, /) -> str:
        if self.startswith(prefix):
            return self[len(prefix):]
        else:
            return self[:]

    def removesuffix(self: str, suffix: str, /) -> str:
        # suffix='' should not call self[:-0].
        if suffix and self.endswith(suffix):
            return self[:-len(suffix)]
        else:
            return self[:]

When the arguments are instances of ``str`` subclasses, the methods should
behave as though those arguments were first coerced to base ``str``
objects, and the return value should always be a base ``str``.

Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects.  If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()``
will accept any bytes-like object as an argument. The two methods will
also be added to ``collections.UserString``, with similar behavior.


Motivating examples from the Python standard library
====================================================

The examples below demonstrate how the proposed methods can make code
one or more of the following:

1. Less fragile:

   The code will not depend on the user to count the length of a literal.

2. More performant:

   The code does not require a call to the Python built-in ``len``
   function nor to the more expensive ``str.replace()`` method.

3. More descriptive:

   The methods give a higher-level API for code readability as
   opposed to the traditional method of string slicing.


find_recursionlimit.py
----------------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.removeprefix("test_"))


deccheck.py
-----------

This is an interesting case because the author chose to use the
``str.replace`` method in a situation where only a prefix was
intended to be removed.

- Current::

    if funcname.startswith("context."):
        self.funcname = funcname.replace("context.", "")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Improved::

    if funcname.startswith("context."):
        self.funcname = funcname.removeprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
    self.funcname = funcname.removeprefix("context.")


cookiejar.py
------------

- Current::

    def strip_quotes(text):
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        return text

- Improved::

    def strip_quotes(text):
        return text.removeprefix('"').removesuffix('"')


test_i18n.py
------------

- Current::

    creationDate = header['POT-Creation-Date']

    # peel off the escaped newline at the end of string
    if creationDate.endswith('\\n'):
        creationDate = creationDate[:-len('\\n')]

- Improved::

    creationDate = header['POT-Creation-Date'].removesuffix('\\n')


There were many other such examples in the stdlib.


Rejected Ideas
==============

Expand the lstrip and rstrip APIs
---------------------------------

Because ``lstrip`` takes a string as its argument, it could be viewed
as taking an iterable of length-1 strings.  The API could, therefore, be
generalized to accept any iterable of strings, which would be
successively removed as prefixes.  While this behavior would be
consistent, it would not be obvious for users to have to call
``'foobar'.lstrip(('foo',))`` for the common use case of a
single prefix.


Remove multiple copies of a prefix
----------------------------------

This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
function until the argument is unchanged.  This behavior is attainable
from the proposed behavior via by the following::

    >>> s = 'Foo' * 100 + 'Bar'
    >>> prefix = 'Foo'
    >>> while s.startswith(prefix): s = s.removeprefix(prefix)
    >>> s
    'Bar'


Raising an exception when not found
-----------------------------------

There was a suggestion that ``s.removeprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``.  However, this does not match
with the behavior and feel of other string methods.  There could be
``required=False`` keyword added, but this violates the KISS
principle.


Accepting a tuple of affixes
----------------------------

It could be convenient to write the ``test_concurrent_futures.py``
example above as ``name.removesuffix(('Mixin', 'Tests', 'Test'))``, so
there was a suggestion that the new methods be able to take a tuple of
strings as an argument, similar to the ``startswith()`` API.  Within
the tuple, only the first matching affix would be removed.  This was
rejected on the following grounds:

* This behavior can be surprising or visually confusing, especially
  when one prefix is empty or is a substring of another prefix, as in
  ``'FooBar'.removeprefix(('', 'Foo')) == 'FooBar'``
  or ``'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'``.

* The API for ``str.replace()`` only accepts a single pair of
  replacement strings, but has stood the test of time by refusing the
  temptation to guess in the face of ambiguous multiple replacements.

* There may be a compelling use case for such a feature in the future,
  but generalization before the basic feature sees real-world use would
  be easy to get permanently wrong.


Alternative Method Names
------------------------

Several alternatives method names have been proposed.  Some are listed
below, along with commentary for why they should be rejected in favor
of ``removeprefix`` (the same arguments hold for ``removesuffix``).

- ``ltrim``, ``trimprefix``, etc.:

  "Trim" does in other languages (e.g. JavaScript, Java, Go, PHP)
  what ``strip`` methods do in Python.

- ``lstrip(string=...)``

  This would avoid adding a new method, but for different
  behavior, it's better to have two different methods than one
  method with a keyword argument that selects the behavior.

- ``remove_prefix``:

  All of the other methods of the string API, e.g.
  ``str.startswith()``, use ``lowercase`` rather than
  ``lower_case_with_underscores``.

- ``removeleft``, ``leftremove``, or ``lremove``:

  The explicitness of "prefix" is preferred.

- ``cutprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:

  Many of these might have been acceptable, but "remove" is
  unambiguous and matches how one would describe the "remove the prefix"
  behavior in English.

- ``stripprefix``:

  Users may benefit from remembering that "strip" means working
  with sets of characters, while other methods work with
  substrings, so re-using "strip" here should be avoided.


How to Teach This
=================

Among the uses for the ``partition()``, ``startswith()``, and
``split()`` string methods or the ``enumerate()`` or ``zip()``
built-in functions, a common theme is that if a beginner finds
themselves manually indexing or slicing a string, then they should
consider whether there is a higher-level method that better
communicates *what* the code should do rather than merely *how* the
code should do it.  The proposed  ``removeprefix()`` and
``removesuffix()`` methods expand the high-level string "toolbox" and
further allow for this sort of skepticism toward manual slicing.

The main opportunity for user confusion will be the conflation of
``lstrip``/``rstrip`` with ``removeprefix``/``removesuffix``.
It may therefore be helpful to emphasize (as the documentation will)
the following differences between the methods:

* ``(l/r)strip``:

  - The argument is interpreted as a character set.

  - The characters are repeatedly removed from the appropriate end of
    the string.

* ``remove(prefix/suffix)``:

  - The argument is interpreted as an unbroken substring.

  - Only at most one copy of the prefix/suffix is removed.


Reference Implementation
========================

See the pull request on GitHub [#pr]_.


History of Major revisions
==========================

* Version 3: Remove tuple behavior.

* Version 2: Changed name to ``removeprefix``/``removesuffix``;
  added support for tuples as arguments

* Version 1: Initial draft with ``cutprefix``/``cutsuffix``


References
==========

.. [#pr] GitHub pull request with implementation
   (https://github.com/python/cpython/pull/18939)
.. [#pyid] [Python-Ideas] "New explicit methods to trim strings"
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
.. [3] "Re: [Python-ideas] adding a trim convenience function"
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/SJ7CKPZSKB5RWT7H3YNXOJUQ7QLD2R3X/#C2W5T7RCFSHU5XI72HG53A6R3J3SN4MV)
.. [4] "Re: [Python-Dev] strip behavior provides inconsistent results with certain strings"
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C/#XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C)
.. [5] [Python-Dev] "correction of a bug"
   (https://mail.python.org/archives/list/python-dev@python.org/thread/AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X/#AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X)
.. [6] [Python-Dev] "str.lstrip bug?"
   (https://mail.python.org/archives/list/python-dev@python.org/thread/OJDKRIESKGTQFNLX6KZSGKU57UXNZYAN/#CYZUFFJ2Q5ZZKMJIQBZVZR4NSLK5ZPIH)
.. [7] [Python-Dev] "strip behavior provides inconsistent results with certain strings"
   (https://mail.python.org/archives/list/python-dev@python.org/thread/ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG/#ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: