2020-03-20 12:21:06 -04:00
|
|
|
|
PEP: 616
|
|
|
|
|
Title: String methods to remove prefixes and suffixes
|
|
|
|
|
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
|
|
|
|
|
Sponsor: Eric V. Smith <eric@trueblade.com>
|
|
|
|
|
Status: Draft
|
|
|
|
|
Type: Standards Track
|
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
|
Created: 19-Mar-2020
|
|
|
|
|
Python-Version: 3.9
|
2020-03-20 14:55:52 -04:00
|
|
|
|
Post-History: 20-Mar-2020
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
This is a proposal to add two new methods, ``cutprefix`` and
|
2020-03-22 15:02:10 -04:00
|
|
|
|
``cutsuffix``, to the APIs of Python's various string objects. These
|
|
|
|
|
methods would remove a prefix or suffix (respectively) from a string,
|
|
|
|
|
if present, and would be added to to Unicode ``str`` objects, binary
|
|
|
|
|
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
|
=========
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
There have been repeated issues on Python-Ideas [#pyid]_ [3]_,
|
|
|
|
|
Python-Dev [4]_ [5]_ [6]_ [7]_, the Bug Tracker, and
|
|
|
|
|
StackOverflow [#confusion]_, related to user confusion about the
|
|
|
|
|
existing ``str.lstrip`` and ``str.rstrip`` methods. These users are
|
|
|
|
|
typically expecting the behavior of ``cutprefix`` and ``cutsuffix``,
|
|
|
|
|
but they are surprised that the parameter for ``lstrip`` is
|
|
|
|
|
interpreted as a set of characters, not a substring. This repeated
|
|
|
|
|
issue is evidence that these methods are useful. The new methods
|
|
|
|
|
allow a cleaner redirection of users to the desired behavior.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
As another testimonial for the usefulness of these methods, several
|
|
|
|
|
users on Python-Ideas [#pyid]_ reported frequently including similar
|
|
|
|
|
functions in their own code for productivity. The implementation
|
|
|
|
|
often contained subtle mistakes regarding the handling of the empty
|
|
|
|
|
string (see `Specification`_).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Specification
|
|
|
|
|
=============
|
|
|
|
|
|
|
|
|
|
The builtin ``str`` class will gain two new methods with roughly the
|
|
|
|
|
following behavior::
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
def cutprefix(self, prefix, /):
|
|
|
|
|
if not isinstance(self, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
self_str = str(self)
|
|
|
|
|
|
|
|
|
|
if isinstance(prefix, tuple):
|
|
|
|
|
for option in prefix:
|
|
|
|
|
if not isinstance(option, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
option_str = str(option)
|
|
|
|
|
|
|
|
|
|
if self_str.startswith(option_str):
|
|
|
|
|
return self_str[len(option_str):]
|
|
|
|
|
|
|
|
|
|
return self_str[:]
|
|
|
|
|
|
|
|
|
|
if not isinstance(prefix, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
|
|
|
|
|
prefix_str = str(prefix)
|
|
|
|
|
|
|
|
|
|
if self_str.startswith(prefix_str):
|
|
|
|
|
return self_str[len(prefix_str):]
|
|
|
|
|
else:
|
|
|
|
|
return self_str[:]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def cutsuffix(self, suffix, /):
|
|
|
|
|
if not isinstance(self, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
self_str = str(self)
|
|
|
|
|
|
|
|
|
|
if isinstance(suffix, tuple):
|
|
|
|
|
for option in suffix:
|
|
|
|
|
if not isinstance(option, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
option_str = str(option)
|
|
|
|
|
|
|
|
|
|
if option_str and self_str.endswith(option_str):
|
|
|
|
|
return self_str[:-len(option_str)]
|
|
|
|
|
|
|
|
|
|
return self_str[:]
|
|
|
|
|
|
|
|
|
|
if not isinstance(suffix, str):
|
|
|
|
|
raise TypeError()
|
|
|
|
|
suffix_str = str(suffix)
|
|
|
|
|
|
|
|
|
|
if suffix_str and self_str.startswith(suffix_str):
|
|
|
|
|
return self_str[:-len(suffix_str)]
|
|
|
|
|
else:
|
|
|
|
|
return self_str[:]
|
|
|
|
|
|
|
|
|
|
Note that without the check for the truthyness of suffixes,
|
2020-03-20 12:21:06 -04:00
|
|
|
|
``s.cutsuffix('')`` would be mishandled and always return the empty
|
|
|
|
|
string due to the unintended evaluation of ``self[:-0]``.
|
|
|
|
|
|
|
|
|
|
Methods with the corresponding semantics will be added to the builtin
|
|
|
|
|
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
|
|
|
|
|
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
|
2020-03-22 15:02:10 -04:00
|
|
|
|
will accept any bytes-like object or tuple of bytes-like objects as an
|
2020-03-22 17:16:57 -04:00
|
|
|
|
argument. The one-at-a-time checking of types matches the implementation
|
2020-03-22 15:02:10 -04:00
|
|
|
|
of ``startswith()`` and ``endswith()`` methods.
|
|
|
|
|
|
|
|
|
|
The ``self_str[:]`` copying behavior in the code ensures that the
|
|
|
|
|
``bytearray`` methods do not return ``self``, but it does not preclude
|
2020-03-22 17:16:57 -04:00
|
|
|
|
the ``str`` and ``bytes`` methods from returning ``self``. Because
|
2020-03-22 15:02:10 -04:00
|
|
|
|
``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
|
2020-03-22 17:16:57 -04:00
|
|
|
|
and ``cutsuffix()`` methods on these objects may (but are not
|
2020-03-22 15:02:10 -04:00
|
|
|
|
required to) make the optimization of returning ``self`` if
|
|
|
|
|
``type(self) is str`` (``type(self) is bytes`` respectively)
|
2020-03-22 17:16:57 -04:00
|
|
|
|
and the given affixes are not found, or are empty. As such, the
|
|
|
|
|
following behavior is considered a CPython implementation detail, and
|
|
|
|
|
is not guaranteed by this specification::
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
>>> x = 'foobar' * 10**6
|
|
|
|
|
>>> x.cutprefix('baz') is x is x.cutsuffix('baz')
|
|
|
|
|
True
|
|
|
|
|
>>> x.cutprefix('') is x is x.cutsuffix('')
|
|
|
|
|
True
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
To test whether any affixes were removed during the call, users
|
|
|
|
|
should use the constant-time behavior of comparing the lengths of
|
2020-03-22 17:16:57 -04:00
|
|
|
|
the original and new strings::
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
>>> string = 'Python String Input'
|
|
|
|
|
>>> new_string = string.cutprefix("Py")
|
|
|
|
|
>>> modified = (len(string) != len(new_string))
|
|
|
|
|
>>> modified
|
|
|
|
|
True
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
Users may also continue using ``startswith()`` and ``endswith()``
|
|
|
|
|
methods for control flow instead of testing the lengths as above.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
The two methods will also be added to ``collections.UserString``, with
|
|
|
|
|
similar behavior.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
Motivating examples from the Python standard library
|
|
|
|
|
====================================================
|
|
|
|
|
|
|
|
|
|
The examples below demonstrate how the proposed methods can make code
|
|
|
|
|
one or more of the following:
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
1. Less fragile:
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
The code will not depend on the user to count the length of a literal.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
2. More performant:
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
The code does not require a call to the Python built-in ``len``
|
|
|
|
|
function, nor to the more expensive ``str.replace()``
|
|
|
|
|
method.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
3. More descriptive:
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
The methods give a higher-level API for code readability, as
|
|
|
|
|
opposed to the traditional method of string slicing.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
find_recursionlimit.py
|
|
|
|
|
----------------------
|
|
|
|
|
|
|
|
|
|
- Current::
|
|
|
|
|
|
|
|
|
|
if test_func_name.startswith("test_"):
|
|
|
|
|
print(test_func_name[5:])
|
|
|
|
|
else:
|
|
|
|
|
print(test_func_name)
|
|
|
|
|
|
|
|
|
|
- Improved::
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
print(test_func_name.cutprefix("test_"))
|
|
|
|
|
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
deccheck.py
|
|
|
|
|
-----------
|
|
|
|
|
|
|
|
|
|
This is an interesting case because the author chose to use the
|
|
|
|
|
``str.replace`` method in a situation where only a prefix was
|
|
|
|
|
intended to be removed.
|
|
|
|
|
|
|
|
|
|
- Current::
|
|
|
|
|
|
|
|
|
|
if funcname.startswith("context."):
|
|
|
|
|
self.funcname = funcname.replace("context.", "")
|
|
|
|
|
self.contextfunc = True
|
|
|
|
|
else:
|
|
|
|
|
self.funcname = funcname
|
|
|
|
|
self.contextfunc = False
|
|
|
|
|
|
|
|
|
|
- Improved::
|
|
|
|
|
|
|
|
|
|
if funcname.startswith("context."):
|
|
|
|
|
self.funcname = funcname.cutprefix("context.")
|
|
|
|
|
self.contextfunc = True
|
|
|
|
|
else:
|
|
|
|
|
self.funcname = funcname
|
|
|
|
|
self.contextfunc = False
|
|
|
|
|
|
|
|
|
|
- Arguably further improved::
|
|
|
|
|
|
|
|
|
|
self.contextfunc = funcname.startswith("context.")
|
|
|
|
|
self.funcname = funcname.cutprefix("context.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
cookiejar.py
|
|
|
|
|
------------
|
|
|
|
|
|
|
|
|
|
- Current::
|
|
|
|
|
|
|
|
|
|
def strip_quotes(text):
|
|
|
|
|
if text.startswith('"'):
|
|
|
|
|
text = text[1:]
|
|
|
|
|
if text.endswith('"'):
|
|
|
|
|
text = text[:-1]
|
|
|
|
|
return text
|
|
|
|
|
|
|
|
|
|
- Improved::
|
|
|
|
|
|
|
|
|
|
def strip_quotes(text):
|
|
|
|
|
return text.cutprefix('"').cutsuffix('"')
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
test_concurrent_futures.py
|
|
|
|
|
--------------------------
|
|
|
|
|
|
|
|
|
|
In the following example, the meaning of the code changes slightly,
|
|
|
|
|
but in context, it behaves the same.
|
|
|
|
|
|
|
|
|
|
- Current::
|
|
|
|
|
|
|
|
|
|
if name.endswith(('Mixin', 'Tests')):
|
|
|
|
|
return name[:-5]
|
|
|
|
|
elif name.endswith('Test'):
|
|
|
|
|
return name[:-4]
|
|
|
|
|
else:
|
|
|
|
|
return name
|
|
|
|
|
|
|
|
|
|
- Improved::
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
return name.cutsuffix(('Mixin', 'Tests', 'Test'))
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
There were many other such examples in the stdlib.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rejected Ideas
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
Expand the lstrip and rstrip APIs
|
|
|
|
|
---------------------------------
|
|
|
|
|
|
|
|
|
|
Because ``lstrip`` takes a string as its argument, it could be viewed
|
|
|
|
|
as taking an iterable of length-1 strings. The API could therefore be
|
|
|
|
|
generalized to accept any iterable of strings, which would be
|
|
|
|
|
successively removed as prefixes. While this behavior would be
|
|
|
|
|
consistent, it would not be obvious for users to have to call
|
|
|
|
|
``'foobar'.cutprefix(('foo,))`` for the common use case of a
|
|
|
|
|
single prefix.
|
|
|
|
|
|
|
|
|
|
Remove multiple copies of a prefix
|
|
|
|
|
----------------------------------
|
|
|
|
|
|
|
|
|
|
This is the behavior that would be consistent with the aforementioned
|
2020-03-22 15:02:10 -04:00
|
|
|
|
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
|
2020-03-20 12:21:06 -04:00
|
|
|
|
function until the argument is unchanged. This behavior is attainable
|
2020-03-22 15:02:10 -04:00
|
|
|
|
from the proposed behavior via by the following::
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
>>> s = 'foobar' * 100 + 'bar'
|
|
|
|
|
>>> prefixes = ('bar', 'foo')
|
|
|
|
|
>>> while len(s) != len(s := s.cutprefix(prefixes)): pass
|
2020-03-20 12:21:06 -04:00
|
|
|
|
>>> s
|
|
|
|
|
'bar'
|
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
or the more obvious and readable alternative::
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
2020-03-22 15:02:10 -04:00
|
|
|
|
>>> s = 'foo' * 100 + 'bar'
|
|
|
|
|
>>> prefixes = ('bar', 'foo')
|
|
|
|
|
>>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
|
|
|
|
|
>>> s
|
|
|
|
|
'bar'
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Raising an exception when not found
|
|
|
|
|
-----------------------------------
|
|
|
|
|
|
|
|
|
|
There was a suggestion that ``s.cutprefix(pre)`` should raise an
|
|
|
|
|
exception if ``not s.startswith(pre)``. However, this does not match
|
|
|
|
|
with the behavior and feel of other string methods. There could be
|
|
|
|
|
``required=False`` keyword added, but this violates the KISS
|
|
|
|
|
principle.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Alternative Method Names
|
|
|
|
|
------------------------
|
|
|
|
|
|
|
|
|
|
Several alternatives method names have been proposed. Some are listed
|
|
|
|
|
below, along with commentary for why they should be rejected in favor
|
2020-03-22 15:02:10 -04:00
|
|
|
|
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
- ``ltrim``, ``trimprefix``, etc.:
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
"Trim" does in other languages (e.g. JavaScript, Java, Go, PHP)
|
|
|
|
|
what ``strip`` methods do in Python.
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
|
|
|
|
- ``lstrip(string=...)``
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
This would avoid adding a new method, but for different
|
|
|
|
|
behavior, it's better to have two different methods than one
|
|
|
|
|
method with a keyword argument that select the behavior.
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
- ``cut_prefix``:
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
All of the other methods of the string API, e.g.
|
|
|
|
|
``str.startswith()``, use ``lowercase`` rather than
|
|
|
|
|
``lower_case_with_underscores``.
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
- ``cutleft``, ``leftcut``, or ``lcut``:
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
The explicitness of "prefix" is preferred.
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
All of these might have been acceptable, but they have more
|
|
|
|
|
characters than ``cut``. Some suggested that the verb "cut"
|
|
|
|
|
implies mutability, but the string API already contains verbs
|
|
|
|
|
like "replace", "strip", "split", and "swapcase".
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
- ``stripprefix``:
|
2020-03-22 15:02:10 -04:00
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
Users may benefit from remembering that "strip" means working
|
|
|
|
|
with sets of characters, while other methods work with
|
|
|
|
|
substrings, so re-using "strip" here should be avoided.
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reference Implementation
|
|
|
|
|
========================
|
|
|
|
|
|
2020-03-22 17:16:57 -04:00
|
|
|
|
See the pull request on GitHub [#pr]_ (not updated).
|
2020-03-20 12:21:06 -04:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
|
|
|
|
.. [#pr] GitHub pull request with implementation
|
|
|
|
|
(https://github.com/python/cpython/pull/18939)
|
2020-03-22 17:16:57 -04:00
|
|
|
|
.. [#pyid] [Python-Ideas] "New explicit methods to trim strings"
|
2020-03-20 12:21:06 -04:00
|
|
|
|
(https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
|
2020-03-22 17:16:57 -04:00
|
|
|
|
.. [3] "Re: [Python-ideas] adding a trim convenience function"
|
|
|
|
|
(https://mail.python.org/archives/list/python-ideas@python.org/thread/SJ7CKPZSKB5RWT7H3YNXOJUQ7QLD2R3X/#C2W5T7RCFSHU5XI72HG53A6R3J3SN4MV)
|
|
|
|
|
.. [4] "Re: [Python-Dev] strip behavior provides inconsistent results with certain strings"
|
|
|
|
|
(https://mail.python.org/archives/list/python-ideas@python.org/thread/XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C/#XYFQMFPUV6FR2N5BGYWPBVMZ5BE5PJ6C)
|
|
|
|
|
.. [5] [Python-Dev] "correction of a bug"
|
|
|
|
|
(https://mail.python.org/archives/list/python-dev@python.org/thread/AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X/#AOZ7RFQTQLCZCTVNKESZI67PB3PSS72X)
|
|
|
|
|
.. [6] [Python-Dev] "str.lstrip bug?"
|
|
|
|
|
(https://mail.python.org/archives/list/python-dev@python.org/thread/OJDKRIESKGTQFNLX6KZSGKU57UXNZYAN/#CYZUFFJ2Q5ZZKMJIQBZVZR4NSLK5ZPIH)
|
|
|
|
|
.. [7] [Python-Dev] "strip behavior provides inconsistent results with certain strings"
|
|
|
|
|
(https://mail.python.org/archives/list/python-dev@python.org/thread/ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG/#ZWRGCGANHGVDPP44VQKRIYOYX7LNVDVG)
|
2020-03-20 12:21:06 -04:00
|
|
|
|
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
|
|
|
|
|
(https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document is placed in the public domain or under the
|
|
|
|
|
CC0-1.0-Universal license, whichever is more permissive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
coding: utf-8
|
|
|
|
|
End:
|