PEP 616: Rename the methods and restrict to one affix (#1340)
parent
dcb29f4019
commit
68062417c0
197
pep-0616.rst
|
@ -13,10 +13,10 @@ Post-History: 20-Mar-2020
|
|||
Abstract
|
||||
========
|
||||
|
||||
This is a proposal to add two new methods, ``cutprefix`` and
|
||||
``cutsuffix``, to the APIs of Python's various string objects. These
|
||||
This is a proposal to add two new methods, ``removeprefix()`` and
|
||||
``removesuffix()``, to the APIs of Python's various string objects. These
|
||||
methods would remove a prefix or suffix (respectively) from a string,
|
||||
if present, and would be added to to Unicode ``str`` objects, binary
|
||||
if present, and would be added to Unicode ``str`` objects, binary
|
||||
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ There have been repeated issues on Python-Ideas [#pyid]_ [3]_,
|
|||
Python-Dev [4]_ [5]_ [6]_ [7]_, the Bug Tracker, and
|
||||
StackOverflow [#confusion]_, related to user confusion about the
|
||||
existing ``str.lstrip`` and ``str.rstrip`` methods. These users are
|
||||
typically expecting the behavior of ``cutprefix`` and ``cutsuffix``,
|
||||
typically expecting the behavior of ``removeprefix`` and ``removesuffix``,
|
||||
but they are surprised that the parameter for ``lstrip`` is
|
||||
interpreted as a set of characters, not a substring. This repeated
|
||||
issue is evidence that these methods are useful. The new methods
|
||||
|
@ -35,9 +35,15 @@ allow a cleaner redirection of users to the desired behavior.
|
|||
|
||||
As another testimonial for the usefulness of these methods, several
|
||||
users on Python-Ideas [#pyid]_ reported frequently including similar
|
||||
functions in their own code for productivity. The implementation
|
||||
functions in their code for productivity. The implementation
|
||||
often contained subtle mistakes regarding the handling of the empty
|
||||
string (see `Specification`_).
|
||||
string, so a well-tested built-in method would be useful.
|
||||
|
||||
The existing solutions for creating the desired behavior are to either
|
||||
implement the methods as in the `Specification`_ below, or to use
|
||||
regular expressions as in the expression
|
||||
``re.sub('^' + re.escape(prefix), '', s)``, which is less discoverable,
|
||||
requires a module import, and results in less readable code.
|
||||
|
||||
|
||||
Specification
|
||||
|
@ -46,95 +52,43 @@ Specification
|
|||
The builtin ``str`` class will gain two new methods with roughly the
|
||||
following behavior::
|
||||
|
||||
def cutprefix(self, prefix, /):
|
||||
if not isinstance(self, str):
|
||||
raise TypeError()
|
||||
self_str = str(self)
|
||||
|
||||
if isinstance(prefix, tuple):
|
||||
for option in tuple(prefix):
|
||||
if not isinstance(option, str):
|
||||
raise TypeError()
|
||||
option_str = str(option)
|
||||
|
||||
if self_str.startswith(option_str):
|
||||
return self_str[len(option_str):]
|
||||
|
||||
return self_str[:]
|
||||
|
||||
if not isinstance(prefix, str):
|
||||
raise TypeError()
|
||||
|
||||
prefix_str = str(prefix)
|
||||
|
||||
if self_str.startswith(prefix_str):
|
||||
return self_str[len(prefix_str):]
|
||||
def removeprefix(self: str, prefix: str, /) -> str:
|
||||
if self.startswith(prefix):
|
||||
return self[len(prefix):]
|
||||
else:
|
||||
return self_str[:]
|
||||
return self[:]
|
||||
|
||||
|
||||
def cutsuffix(self, suffix, /):
|
||||
if not isinstance(self, str):
|
||||
raise TypeError()
|
||||
self_str = str(self)
|
||||
|
||||
if isinstance(suffix, tuple):
|
||||
for option in tuple(suffix):
|
||||
if not isinstance(option, str):
|
||||
raise TypeError()
|
||||
option_str = str(option)
|
||||
|
||||
if not option_str:
|
||||
return self_str[:]
|
||||
if self_str.endswith(option_str):
|
||||
return self_str[:-len(option_str)]
|
||||
|
||||
return self_str[:]
|
||||
|
||||
if not isinstance(suffix, str):
|
||||
raise TypeError()
|
||||
suffix_str = str(suffix)
|
||||
|
||||
if suffix_str and self_str.startswith(suffix_str):
|
||||
return self_str[:-len(suffix_str)]
|
||||
def removesuffix(self: str, suffix: str, /) -> str:
|
||||
if suffix and self.endswith(suffix):
|
||||
return self[:-len(suffix)]
|
||||
else:
|
||||
return self_str[:]
|
||||
return self[:]
|
||||
|
||||
Note that ``self[:]`` might not actually make a copy -- if the affix
|
||||
is empty or not found, and if ``type(self) is str``, then these methods
|
||||
may, but are not required to, make the optimization of returning ``self``.
|
||||
However, when called on instances of subclasses of ``str``, these
|
||||
methods should return base ``str`` objects, not ``self``.
|
||||
|
||||
Note that without the check for the truthyness of suffixes,
|
||||
``s.cutsuffix('')`` would be mishandled and always return the empty
|
||||
Without the check for the truthiness of ``suffix``,
|
||||
``s.removesuffix('')`` would be mishandled and always return the empty
|
||||
string due to the unintended evaluation of ``self[:-0]``.
|
||||
|
||||
Methods with the corresponding semantics will be added to the builtin
|
||||
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
|
||||
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
|
||||
will accept any bytes-like object or tuple of bytes-like objects as an
|
||||
argument. The one-at-a-time checking of types matches the implementation
|
||||
of ``startswith()`` and ``endswith()`` methods.
|
||||
|
||||
The ``self_str[:]`` copying behavior in the code ensures that the
|
||||
``bytearray`` methods do not return ``self``, but it does not preclude
|
||||
the ``str`` and ``bytes`` methods from returning ``self``. Because
|
||||
``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
|
||||
and ``cutsuffix()`` methods on these objects may (but are not
|
||||
required to) make the optimization of returning ``self`` if
|
||||
``type(self) is str`` (``type(self) is bytes`` respectively)
|
||||
and the given affixes are not found, or are empty. As such, the
|
||||
following behavior is considered a CPython implementation detail, and
|
||||
is not guaranteed by this specification::
|
||||
|
||||
>>> x = 'foobar' * 10**6
|
||||
>>> x.cutprefix('baz') is x is x.cutsuffix('baz')
|
||||
True
|
||||
>>> x.cutprefix('') is x is x.cutsuffix('')
|
||||
True
|
||||
or ``bytearray`` object, then ``b.removeprefix()`` and ``b.removesuffix()``
|
||||
will accept any bytes-like object as an argument. Although the methods
|
||||
on the immutable ``str`` and ``bytes`` types may make the aforementioned
|
||||
optimization of returning the original object, ``bytearray.removeprefix()``
|
||||
and ``bytearray.removesuffix()`` should *always* return a copy, never the
|
||||
original object.
|
||||
|
||||
To test whether any affixes were removed during the call, users
|
||||
should use the constant-time behavior of comparing the lengths of
|
||||
may use the constant-time behavior of comparing the lengths of
|
||||
the original and new strings::
|
||||
|
||||
>>> string = 'Python String Input'
|
||||
>>> new_string = string.cutprefix("Py")
|
||||
>>> new_string = string.removeprefix("Py")
|
||||
>>> modified = (len(string) != len(new_string))
|
||||
>>> modified
|
||||
True
|
||||
|
@ -159,8 +113,7 @@ one or more of the following:
|
|||
2. More performant:
|
||||
|
||||
The code does not require a call to the Python built-in ``len``
|
||||
function, nor to the more expensive ``str.replace()``
|
||||
method.
|
||||
function nor to the more expensive ``str.replace()`` method.
|
||||
|
||||
3. More descriptive:
|
||||
|
||||
|
@ -180,7 +133,7 @@ find_recursionlimit.py
|
|||
|
||||
- Improved::
|
||||
|
||||
print(test_func_name.cutprefix("test_"))
|
||||
print(test_func_name.removeprefix("test_"))
|
||||
|
||||
|
||||
deccheck.py
|
||||
|
@ -202,7 +155,7 @@ intended to be removed.
|
|||
- Improved::
|
||||
|
||||
if funcname.startswith("context."):
|
||||
self.funcname = funcname.cutprefix("context.")
|
||||
self.funcname = funcname.removeprefix("context.")
|
||||
self.contextfunc = True
|
||||
else:
|
||||
self.funcname = funcname
|
||||
|
@ -211,7 +164,7 @@ intended to be removed.
|
|||
- Arguably further improved::
|
||||
|
||||
self.contextfunc = funcname.startswith("context.")
|
||||
self.funcname = funcname.cutprefix("context.")
|
||||
self.funcname = funcname.removeprefix("context.")
|
||||
|
||||
|
||||
cookiejar.py
|
||||
|
@ -229,12 +182,15 @@ cookiejar.py
|
|||
- Improved::
|
||||
|
||||
def strip_quotes(text):
|
||||
return text.cutprefix('"').cutsuffix('"')
|
||||
return text.removeprefix('"').removesuffix('"')
|
||||
|
||||
|
||||
test_concurrent_futures.py
|
||||
--------------------------
|
||||
|
||||
In the following example, the meaning of the code changes slightly,
|
||||
but in context, it behaves the same.
|
||||
|
||||
- Current::
|
||||
|
||||
if name.endswith(('Mixin', 'Tests')):
|
||||
|
@ -246,7 +202,9 @@ test_concurrent_futures.py
|
|||
|
||||
- Improved::
|
||||
|
||||
return name.cutsuffix(('Mixin', 'Tests', 'Test'))
|
||||
return (name.removesuffix('Mixin')
|
||||
.removesuffix('Tests')
|
||||
.removesuffix('Test'))
|
||||
|
||||
|
||||
There were many other such examples in the stdlib.
|
||||
|
@ -259,7 +217,7 @@ Expand the lstrip and rstrip APIs
|
|||
---------------------------------
|
||||
|
||||
Because ``lstrip`` takes a string as its argument, it could be viewed
|
||||
as taking an iterable of length-1 strings. The API could therefore be
|
||||
as taking an iterable of length-1 strings. The API could, therefore, be
|
||||
generalized to accept any iterable of strings, which would be
|
||||
successively removed as prefixes. While this behavior would be
|
||||
consistent, it would not be obvious for users to have to call
|
||||
|
@ -275,37 +233,61 @@ expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
|
|||
function until the argument is unchanged. This behavior is attainable
|
||||
from the proposed behavior via the following::
|
||||
|
||||
>>> s = 'FooBar' * 100 + 'Baz'
|
||||
>>> prefixes = ('Bar', 'Foo')
|
||||
>>> while len(s) != len(s := s.cutprefix(prefixes)): pass
|
||||
>>> s = 'Foo' * 100 + 'Bar'
|
||||
>>> prefix = 'Foo'
|
||||
>>> while len(s) != len(s := s.cutprefix(prefix)): pass
|
||||
>>> s
|
||||
'Baz'
|
||||
'Bar'
|
||||
|
||||
or the more obvious and readable alternative::
|
||||
|
||||
>>> s = 'FooBar' * 100 + 'Baz'
|
||||
>>> prefixes = ('Bar', 'Foo')
|
||||
>>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
|
||||
>>> s = 'Foo' * 100 + 'Bar'
|
||||
>>> prefix = 'Foo'
|
||||
>>> while s.startswith(prefix): s = s.cutprefix(prefix)
|
||||
>>> s
|
||||
'Baz'
|
||||
'Bar'
|
||||
|
||||
|
||||
Raising an exception when not found
|
||||
-----------------------------------
|
||||
|
||||
There was a suggestion that ``s.cutprefix(pre)`` should raise an
|
||||
There was a suggestion that ``s.removeprefix(pre)`` should raise an
|
||||
exception if ``not s.startswith(pre)``. However, this does not match
|
||||
with the behavior and feel of other string methods. There could be
|
||||
a ``required=False`` keyword added, but this violates the KISS
|
||||
principle.
|
||||
|
||||
|
||||
Accepting a tuple of affixes
|
||||
-----------------------------
|
||||
|
||||
It could be convenient to write the ``test_concurrent_futures.py``
|
||||
example above as ``name.removesuffix(('Mixin', 'Tests', 'Test'))``, so
|
||||
there was a suggestion that the new methods be able to take a tuple of
|
||||
strings as an argument, similar to the ``startswith()`` API. Within
|
||||
the tuple, only the first matching affix would be removed. This was
|
||||
rejected on the following grounds:
|
||||
|
||||
* This behavior can be surprising or visually confusing, especially
|
||||
when one prefix is empty or is a substring of another prefix, as in
|
||||
``'FooBar'.removeprefix(('', 'Foo')) == 'Foo'``
|
||||
or ``'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'``.
|
||||
|
||||
* The API for ``str.replace()`` only accepts a single pair of
|
||||
replacement strings, but has stood the test of time by refusing the
|
||||
temptation to guess in the face of ambiguous multiple replacements.
|
||||
|
||||
* There may be a compelling use case for such a feature in the future,
|
||||
but generalization before the basic feature sees real-world use would
|
||||
be easy to get permanently wrong.
|
||||
|
||||
|
||||
Alternative Method Names
|
||||
------------------------
|
||||
|
||||
Several alternative method names have been proposed. Some are listed
|
||||
below, along with commentary for why they should be rejected in favor
|
||||
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
|
||||
of ``removeprefix`` (the same arguments hold for ``removesuffix``).
|
||||
|
||||
- ``ltrim``, ``trimprefix``, etc.:
|
||||
|
||||
|
@ -316,24 +298,23 @@ of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
|
|||
|
||||
This would avoid adding a new method, but for different
|
||||
behavior, it's better to have two different methods than one
|
||||
method with a keyword argument that select the behavior.
|
||||
method with a keyword argument that selects the behavior.
|
||||
|
||||
- ``cut_prefix``:
|
||||
- ``remove_prefix``:
|
||||
|
||||
All of the other methods of the string API, e.g.
|
||||
``str.startswith()``, use ``lowercase`` rather than
|
||||
``lower_case_with_underscores``.
|
||||
|
||||
- ``cutleft``, ``leftcut``, or ``lcut``:
|
||||
- ``removeleft``, ``leftremove``, or ``lremove``:
|
||||
|
||||
The explicitness of "prefix" is preferred.
|
||||
|
||||
- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:
|
||||
- ``cutprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.:
|
||||
|
||||
All of these might have been acceptable, but they have more
|
||||
characters than ``cut``. Some suggested that the verb "cut"
|
||||
implies mutability, but the string API already contains verbs
|
||||
like "replace", "strip", "split", and "swapcase".
|
||||
Many of these might have been acceptable, but "remove" is
|
||||
unambiguous and matches how one would describe the "remove the prefix"
|
||||
behavior in English.
|
||||
|
||||
- ``stripprefix``:
|
||||
|
||||
|
@ -345,7 +326,7 @@ of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
|
|||
Reference Implementation
|
||||
========================
|
||||
|
||||
See the pull request on GitHub [#pr]_ (not updated).
|
||||
See the pull request on GitHub [#pr]_.
|
||||
|
||||
|
||||
References
|
||||
|
|