374 lines
12 KiB
ReStructuredText
374 lines
12 KiB
ReStructuredText
PEP: 616
|
||
Title: String methods to remove prefixes and suffixes
|
||
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
|
||
Sponsor: Eric V. Smith <eric@trueblade.com>
|
||
Status: Final
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 19-Mar-2020
|
||
Python-Version: 3.9
|
||
Post-History: 20-Mar-2020
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
This is a proposal to add two new methods, ``removeprefix()`` and
|
||
``removesuffix()``, to the APIs of Python's various string objects. These
|
||
methods would remove a prefix or suffix (respectively) from a string,
|
||
if present, and would be added to Unicode ``str`` objects, binary
|
||
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
|
||
|
||
|
||
Rationale
|
||
=========
|
||
|
||
There have been repeated issues on Python-Ideas [#pyid]_ [3]_,
|
||
Python-Dev [4]_ [5]_ [6]_ [7]_, the Bug Tracker, and
|
||
StackOverflow [#confusion]_, related to user confusion about the
|
||
existing ``str.lstrip`` and ``str.rstrip`` methods. These users are
|
||
typically expecting the behavior of ``removeprefix`` and ``removesuffix``,
|
||
but they are surprised that the parameter for ``lstrip`` is
|
||
interpreted as a set of characters, not a substring. This repeated
|
||
issue is evidence that these methods are useful. The new methods
|
||
allow a cleaner redirection of users to the desired behavior.
|
||
|
||
As another testimonial for the usefulness of these methods, several
|
||
users on Python-Ideas [#pyid]_ reported frequently including similar
|
||
functions in their code for productivity. The implementation
|
||
often contained subtle mistakes regarding the handling of the empty
|
||
string, so a well-tested built-in method would be useful.
|
||
|
||
The existing solutions for creating the desired behavior are to either
|
||
implement the methods as in the `Specification`_ below, or to use
|
||
regular expressions as in the expression
|
||
``re.sub('^' + re.escape(prefix), '', s)``, which is less discoverable,
|
||
requires a module import, and results in less readable code.
|
||
|
||
|
||
Specification
|
||
=============
|
||
|
||
The builtin ``str`` class will gain two new methods which will behave
|
||
as follows when ``type(self) is type(prefix) is type(suffix) is str``::
|
||
|
||
def removeprefix(self: str, prefix: str, /) -> str:
|
||
if self.startswith(prefix):
|
||
return self[len(prefix):]
|
||
else:
|
||
return self[:]
|
||
|
||
def removesuffix(self: str, suffix: str, /) -> str:
|
||
# suffix='' should not call self[:-0].
|
||
if suffix and self.endswith(suffix):
|
||
return self[:-len(suffix)]
|
||
else:
|
||
return self[:]
|
||
|
||
When the arguments are instances of ``str`` subclasses, the methods should
|
||
behave as though those arguments were first coerced to base ``str``
|
||
objects, and the return value should always be a base ``str``.
|
||
|
||
Methods with the corresponding semantics will be added to the builtin
|
||
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
|
||
or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()``
|
||
will accept any bytes-like object as an argument. The two methods will
|
||
also be added to ``collections.UserString``, with similar behavior.
|
||
|
||
|
||
Motivating examples from the Python standard library
|
||
====================================================
|
||
|
||
The examples below demonstrate how the proposed methods can make code
|
||
one or more of the following:
|
||
|
||
1. Less fragile:
|
||
|
||
The code will not depend on the user to count the length of a literal.
|
||
|
||
2. More performant:
|
||
|
||
The code does not require a call to the Python built-in ``len``
|
||
function nor to the more expensive ``str.replace()`` method.
|
||
|
||
3. More descriptive:
|
||
|
||
The methods give a higher-level API for code readability as
|
||
opposed to the traditional method of string slicing.
|
||
|
||
|
||
find_recursionlimit.py
|
||
----------------------
|
||
|
||
- Current::
|
||
|
||
if test_func_name.startswith("test_"):
|
||
print(test_func_name[5:])
|
||
else:
|
||
print(test_func_name)
|
||
|
||
- Improved::
|
||
|
||
print(test_func_name.removeprefix("test_"))
|
||
|
||
|
||
deccheck.py
|
||
-----------
|
||
|
||
This is an interesting case because the author chose to use the
|
||
``str.replace`` method in a situation where only a prefix was
|
||
intended to be removed.
|
||
|
||
- Current::
|
||
|
||
if funcname.startswith("context."):
|
||
self.funcname = funcname.replace("context.", "")
|
||
self.contextfunc = True
|
||
else:
|
||
self.funcname = funcname
|
||
self.contextfunc = False
|
||
|
||
- Improved::
|
||
|
||
if funcname.startswith("context."):
|
||
self.funcname = funcname.removeprefix("context.")
|
||
self.contextfunc = True
|
||
else:
|
||
self.funcname = funcname
|
||
self.contextfunc = False
|
||
|
||
- Arguably further improved::
|
||
|
||
self.contextfunc = funcname.startswith("context.")
|
||
self.funcname = funcname.removeprefix("context.")
|
||
|
||
|
||
cookiejar.py
|
||
------------
|
||
|
||
- Current::
|
||
|
||
def strip_quotes(text):
|
||
if text.startswith('"'):
|
||
text = text[1:]
|
||
if text.endswith('"'):
|
||
text = text[:-1]
|
||
return text
|
||
|
||
- Improved::
|
||
|
||
def strip_quotes(text):
|
||
return text.removeprefix('"').removesuffix('"')
|
||
|
||
|
||
test_i18n.py
|
||
------------
|
||
|
||
- Current::
|
||
|
||
creationDate = header['POT-Creation-Date']
|
||
|
||
# peel off the escaped newline at the end of string
|
||
if creationDate.endswith('\\n'):
|
||
creationDate = creationDate[:-len('\\n')]
|
||
|
||
- Improved::
|
||
|
||
creationDate = header['POT-Creation-Date'].removesuffix('\\n')
|
||
|
||
|
||
There were many other such examples in the stdlib.
|
||
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
Expand the lstrip and rstrip APIs
|
||
---------------------------------
|
||
|
||
Because ``lstrip`` takes a string as its argument, it could be viewed
|
||
as taking an iterable of length-1 strings. The API could, therefore, be
|
||
generalized to accept any iterable of strings, which would be
|
||
successively removed as prefixes. While this behavior would be
|
||
consistent, it would not be obvious for users to have to call
|
||
``'foobar'.lstrip(('foo',))`` for the common use case of a
|
||
single prefix.
|
||
|
||
|
||
Remove multiple copies of a prefix
|
||
----------------------------------
|
||
|
||
This is the behavior that would be consistent with the aforementioned
|
||
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
|
||
function until the argument is unchanged. This behavior is attainable
|
||
from the proposed behavior via by the following::
|
||
|
||
>>> s = 'Foo' * 100 + 'Bar'
|
||
>>> prefix = 'Foo'
|
||
>>> while s.startswith(prefix): s = s.removeprefix(prefix)
|
||
>>> s
|
||
'Bar'
|
||
|
||
|
||
Raising an exception when not found
|
||
-----------------------------------
|
||
|
||
There was a suggestion that ``s.removeprefix(pre)`` should raise an
|
||
exception if ``not s.startswith(pre)``. However, this does not match
|
||
with the behavior and feel of other string methods. There could be
|
||
``required=False`` keyword added, but this violates the KISS
|
||
principle.
|
||
|
||
|
||
Accepting a tuple of affixes
|
||
----------------------------
|
||
|
||
It could be convenient to write the ``test_concurrent_futures.py``
|
||
example above as ``name.removesuffix(('Mixin', 'Tests', 'Test'))``, so
|
||
there was a suggestion that the new methods be able to take a tuple of
|
||
strings as an argument, similar to the ``startswith()`` API. Within
|
||
the tuple, only the first matching affix would be removed. This was
|
||
rejected on the following grounds:
|
||
|
||
* This behavior can be surprising or visually confusing, especially
|
||
when one prefix is empty or is a substring of another prefix, as in
|
||
``'FooBar'.removeprefix(('', 'Foo')) == 'FooBar'``
|
||
or ``'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'``.
|
||
|
||
* The API for ``str.replace()`` only accepts a single pair of
|
||
replacement strings, but has stood the test of time by refusing the
|
||
temptation to guess in the face of ambiguous multiple replacements.
|
||
|
||
* There may be a compelling use case for such a feature in the future,
|
||
but generalization before the basic feature sees real-world use would
|
||
be easy to get permanently wrong.
|
||
|
||
|
||
Alternative Method Names
|
||
------------------------
|
||
|
||
Several alternatives method names have been proposed. Some are listed
|
||
below, along with commentary for why they should be rejected in favor
|
||
of ``removeprefix`` (the same arguments hold for ``removesuffix``).
|
||
|
||
- ``ltrim``, ``trimprefix``, etc.:
|
||
|
||
"Trim" does in other languages (e.g. JavaScript, Java, Go, PHP)
|
||
what ``strip`` methods do in Python.
|
||
|
||
- ``lstrip(string=...)``
|
||
|
||
This would avoid adding a new method, but for different
|
||
behavior, it's better to have two different methods than one
|
||
method with a keyword argument that selects the behavior.
|
||
|
||
- ``remove_prefix``:
|
||
|
||
All of the other methods of the string API, e.g.
|
||
``str.startswith()``, use ``lowercase`` rather than
|
||
``lower_case_with_underscores``.
|
||
|
||
- ``removeleft``, ``leftremove``, or ``lremove``:
|
||
|
||
The explicitness of "prefix" is preferred.
|
||
|
||
- ``cutprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:
|
||
|
||
Many of these might have been acceptable, but "remove" is
|
||
unambiguous and matches how one would describe the "remove the prefix"
|
||
behavior in English.
|
||
|
||
- ``stripprefix``:
|
||
|
||
Users may benefit from remembering that "strip" means working
|
||
with sets of characters, while other methods work with
|
||
substrings, so re-using "strip" here should be avoided.
|
||
|
||
|
||
How to Teach This
|
||
=================
|
||
|
||
Among the uses for the ``partition()``, ``startswith()``, and
|
||
``split()`` string methods or the ``enumerate()`` or ``zip()``
|
||
built-in functions, a common theme is that if a beginner finds
|
||
themselves manually indexing or slicing a string, then they should
|
||
consider whether there is a higher-level method that better
|
||
communicates *what* the code should do rather than merely *how* the
|
||
code should do it. The proposed ``removeprefix()`` and
|
||
``removesuffix()`` methods expand the high-level string "toolbox" and
|
||
further allow for this sort of skepticism toward manual slicing.
|
||
|
||
The main opportunity for user confusion will be the conflation of
|
||
``lstrip``/``rstrip`` with ``removeprefix``/``removesuffix``.
|
||
It may therefore be helpful to emphasize (as the documentation will)
|
||
the following differences between the methods:
|
||
|
||
* ``(l/r)strip``:
|
||
|
||
- The argument is interpreted as a character set.
|
||
|
||
- The characters are repeatedly removed from the appropriate end of
|
||
the string.
|
||
|
||
* ``remove(prefix/suffix)``:
|
||
|
||
- The argument is interpreted as an unbroken substring.
|
||
|
||
- Only at most one copy of the prefix/suffix is removed.
|
||
|
||
|
||
Reference Implementation
|
||
========================
|
||
|
||
See the pull request on GitHub [#pr]_.
|
||
|
||
|
||
History of Major revisions
|
||
==========================
|
||
|
||
* Version 3: Remove tuple behavior.
|
||
|
||
* Version 2: Changed name to ``removeprefix``/``removesuffix``;
|
||
added support for tuples as arguments
|
||
|
||
* Version 1: Initial draft with ``cutprefix``/``cutsuffix``
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [#pr] GitHub pull request with implementation
|
||
(https://github.com/python/cpython/pull/18939)
|
||
.. [#pyid] [Python-Ideas] "New explicit methods to trim strings"
|
||
(https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
|
||
.. [3] "Re: [Python-ideas] adding a trim convenience function"
|
||
(https://mail.python.org/archives/list/python-ideas@python.org/thread/SJ7CKPZSKB5RWT7H3YNXOJUQ7QLD2R3X/#C2W5T7RCFSHU5XI72HG53A6R3J3SN4MV)
|
||
.. [4] "Re: [Python-Dev] strip behavior provides inconsistent results with certain strings"
|
||
(https://mail.python.org/archives/list/python-ideas@python.org/thread/XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C/#XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C)
|
||
.. [5] [Python-Dev] "correction of a bug"
|
||
(https://mail.python.org/archives/list/python-dev@python.org/thread/AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X/#AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X)
|
||
.. [6] [Python-Dev] "str.lstrip bug?"
|
||
(https://mail.python.org/archives/list/python-dev@python.org/thread/OJDKRIESKGTQFNLX6KZSGKU57UXNZYAN/#CYZUFFJ2Q5ZZKMJIQBZVZR4NSLK5ZPIH)
|
||
.. [7] [Python-Dev] "strip behavior provides inconsistent results with certain strings"
|
||
(https://mail.python.org/archives/list/python-dev@python.org/thread/ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG/#ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG)
|
||
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
|
||
(https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document is placed in the public domain or under the
|
||
CC0-1.0-Universal license, whichever is more permissive.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|