PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
Post-History: 30-Aug-2002


Abstract
========

This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects. In
particular, the methods would be added to Unicode ``str`` objects,
binary ``bytes`` and ``bytearray`` objects, and
``collections.UserString``.

If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix,
then ``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix
has been removed. If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.

The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.

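As a minimal sketch of the two equivalences above, the expressions can
be wrapped in free functions and tried out today. The names
``cutprefix_equiv`` and ``cutsuffix_equiv`` are hypothetical and exist
only for this illustration; the proposal itself adds methods, not
functions.

```python
# Hypothetical helper names; they only wrap the expressions quoted above.
def cutprefix_equiv(s, pre):
    return s[len(pre):] if s.startswith(pre) else s


def cutsuffix_equiv(s, suf):
    return s[:-len(suf)] if suf and s.endswith(suf) else s


print(cutprefix_equiv("HelloWorld", "Hello"))  # World
print(cutsuffix_equiv("HelloWorld", "World"))  # Hello
print(cutprefix_equiv("HelloWorld", "World"))  # HelloWorld (not a prefix)
```

The same expressions also work unchanged on ``bytes`` objects, since
they only rely on ``startswith``, ``endswith``, ``len`` and slicing.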

Rationale
=========

There have been repeated issues [#confusion]_ on the Bug Tracker
and StackOverflow related to user confusion about the existing
``str.lstrip`` and ``str.rstrip`` methods. These users are typically
expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
are surprised that the parameter for ``lstrip`` is interpreted as a
set of characters, not a substring. This repeated issue is evidence
that these methods are useful, and the new methods allow a cleaner
redirection of users to the desired behavior.

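The confusion described above can be reproduced with the existing
methods. The strings below are made-up inputs chosen for illustration,
and the last lines show the manual slicing that users fall back on
today.

```python
# str.lstrip and str.rstrip treat their argument as a set of characters.
print("prefix_prod".lstrip("prefix_"))  # od
print("test.txt".rstrip(".txt"))        # tes

# What these users usually want is substring removal, written by hand.
name = "prefix_prod"
if name.startswith("prefix_"):
    name = name[len("prefix_"):]
print(name)                             # prod
```
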
As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their own code for productivity. The implementation
often contained subtle mistakes regarding the handling of the empty
string (see `Specification`_).


Specification
=============

The builtin ``str`` class will gain two new methods with roughly the
following behavior::

    def cutprefix(self: str, pre: str, /) -> str:
        if self.startswith(pre):
            return self[len(pre):]
        return self[:]

    def cutsuffix(self: str, suf: str, /) -> str:
        if suf and self.endswith(suf):
            return self[:-len(suf)]
        return self[:]

The only difference between the real implementation and the above is
that, as with other string methods like ``replace``, the
methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
``suf`` is not an instance of ``str``, and will cast subclasses of
``str`` to builtin ``str`` objects.

Note that without the check for the truthiness of ``suf``,
``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``.

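The empty-suffix pitfall noted above is easy to reproduce. In this
sketch, ``naive_cutsuffix`` and ``checked_cutsuffix`` are hypothetical
free functions written for illustration; only the second one applies
the truthiness check.

```python
def naive_cutsuffix(s, suf):
    # Hypothetical buggy helper: it omits the truthiness check on suf.
    if s.endswith(suf):
        return s[:-len(suf)]  # with suf == "", this is s[:-0], i.e. s[:0]
    return s


def checked_cutsuffix(s, suf):
    # Same logic, with the check used in the specification above.
    if suf and s.endswith(suf):
        return s[:-len(suf)]
    return s


print(repr(naive_cutsuffix("spam", "")))    # ''
print(repr(checked_cutsuffix("spam", "")))  # 'spam'
```

The prefix case has no such problem, because ``s[len(''):]`` is
``s[0:]`` and therefore the whole string.
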
Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object as an argument.

Note that the ``bytearray`` methods return a copy of ``self``; they do
not operate in place.

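A rough sketch of these binary semantics, assuming a hypothetical free
function ``cutprefix_bytes`` in place of the proposed method. It
converts the argument with ``bytes()`` so that its length is measured
in bytes for any bytes-like object; the real implementation may handle
this differently.

```python
def cutprefix_bytes(b, pre):
    # Hypothetical free function sketching the bytes/bytearray behavior.
    # bytes(pre) accepts bytes, bytearray, memoryview, and similar objects.
    pre = bytes(pre)
    if b.startswith(pre):
        return b[len(pre):]
    return b[:]


data = bytearray(b"HEADER:payload")
result = cutprefix_bytes(data, memoryview(b"HEADER:"))
print(result)  # bytearray(b'payload')
print(data)    # bytearray(b'HEADER:payload'), left unmodified
print(cutprefix_bytes(data, b"nope") is data)  # False: a copy is returned
```
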
The following behavior is considered a CPython implementation detail
and is not guaranteed by this specification::

    >>> x = 'foobar' * 10**6
    >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
    True
    >>> x.cutprefix('') is x is x.cutsuffix('')
    True

That is, for CPython's immutable ``str`` and ``bytes`` objects, the
methods return the original object when the affix is not found or if
the affix is empty. Because these types test for equality using
shortcuts for identity and length, the following equivalent
expressions are evaluated at approximately the same speed, for any
``str`` objects (or ``bytes`` objects) ``x`` and ``y`` with ``y``
non-empty::

    >>> (True, x[len(y):]) if x.startswith(y) else (False, x)
    >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)

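The two expressions above can be compared directly. This sketch uses a
hypothetical free function ``cutprefix`` as a stand-in for the proposed
method and wraps each expression in a helper whose name is made up for
illustration. It requires Python 3.8 or later for the ``:=`` operator.

```python
def cutprefix(s, pre):
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


def split_with_startswith(x, y):
    return (True, x[len(y):]) if x.startswith(y) else (False, x)


def split_with_cutprefix(x, y):
    return (True, z) if x != (z := cutprefix(x, y)) else (False, x)


print(split_with_startswith("foobar", "foo"))  # (True, 'bar')
print(split_with_cutprefix("foobar", "foo"))   # (True, 'bar')
print(split_with_startswith("foobar", "baz"))  # (False, 'foobar')
print(split_with_cutprefix("foobar", "baz"))   # (False, 'foobar')
```

The results agree whenever ``y`` is non-empty. For an empty ``y`` the
first form reports ``True`` (every string starts with ``''``) while the
second reports ``False``, because nothing was removed.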

The two methods will also be added to ``collections.UserString``,
where they rely on the implementation of the new ``str`` methods.

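One way this delegation could look is sketched below. The subclass
``CutUserString`` is hypothetical and written only for illustration;
since ``str.cutprefix`` does not exist yet, the sketch inlines the
slicing logic on the wrapped ``data`` string instead of calling it.

```python
from collections import UserString


class CutUserString(UserString):
    """Hypothetical subclass used only to illustrate the delegation."""

    def cutprefix(self, pre, /):
        if isinstance(pre, UserString):
            pre = pre.data
        data = self.data
        if data.startswith(pre):
            data = data[len(pre):]
        return self.__class__(data)

    def cutsuffix(self, suf, /):
        if isinstance(suf, UserString):
            suf = suf.data
        data = self.data
        if suf and data.endswith(suf):
            data = data[:-len(suf)]
        return self.__class__(data)


u = CutUserString("test_spam.py")
print(u.cutprefix("test_").cutsuffix(".py"))  # spam
print(type(u.cutprefix("test_")).__name__)    # CutUserString
```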

Motivating examples from the Python standard library
====================================================

The examples below demonstrate how the proposed methods can make code
one or more of the following:

Less fragile:
    The code will not depend on the user to count the length of a
    literal.
More performant:
    The code does not require a call to the Python built-in
    ``len`` function.
More descriptive:
    The methods give a higher-level API for code readability, as
    opposed to the traditional method of string slicing.


refactor.py
-----------

- Current::

    if fix_name.startswith(self.FILE_PREFIX):
        fix_name = fix_name[len(self.FILE_PREFIX):]

- Improved::

    fix_name = fix_name.cutprefix(self.FILE_PREFIX)


c_annotations.py
----------------

- Current::

    if name.startswith("c."):
        name = name[2:]

- Improved::

    name = name.cutprefix("c.")


find_recursionlimit.py
----------------------

- Current::

    if test_func_name.startswith("test_"):
        print(test_func_name[5:])
    else:
        print(test_func_name)

- Improved::

    print(test_func_name.cutprefix("test_"))


deccheck.py
-----------

This is an interesting case because the author chose to use the
``str.replace`` method in a situation where only a prefix was
intended to be removed.

- Current::

    if funcname.startswith("context."):
        self.funcname = funcname.replace("context.", "")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Improved::

    if funcname.startswith("context."):
        self.funcname = funcname.cutprefix("context.")
        self.contextfunc = True
    else:
        self.funcname = funcname
        self.contextfunc = False

- Arguably further improved::

    self.contextfunc = funcname.startswith("context.")
    self.funcname = funcname.cutprefix("context.")

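The difference between ``str.replace`` and prefix removal in the
deccheck.py example only shows up when the prefix text occurs again
later in the string. The sketch below uses a hypothetical free function
``cutprefix`` as a stand-in for the proposed method, and a made-up
function name as input.

```python
def cutprefix(s, pre):
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


funcname = "context.copy_context.flags"  # made-up input for illustration
print(funcname.replace("context.", ""))  # copy_flags
print(cutprefix(funcname, "context."))   # copy_context.flags
```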

test_i18n.py
------------

- Current::

    if creationDate.endswith('\\n'):
        creationDate = creationDate[:-len('\\n')]

- Improved::

    creationDate = creationDate.cutsuffix('\\n')


shared_memory.py
----------------

- Current::

    reported_name = self._name
    if _USE_POSIX and self._prepend_leading_slash:
        if self._name.startswith("/"):
            reported_name = self._name[1:]
    return reported_name

- Improved::

    if _USE_POSIX and self._prepend_leading_slash:
        return self._name.cutprefix("/")
    return self._name


build-installer.py
------------------

- Current::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if ((retval.startswith('tcl') or retval.startswith('tk'))
                and retval.endswith('-src')):
            retval = retval[:-4]

- Improved::

    if archiveName.endswith('.tar.gz'):
        retval = os.path.basename(archiveName[:-7])
        if retval.startswith(('tcl', 'tk')):
            retval = retval.cutsuffix('-src')

Depending on personal style, ``archiveName[:-7]`` could also be
changed to ``archiveName.cutsuffix('.tar.gz')``.


test_core.py
------------

- Current::

    if output.endswith("\n"):
        output = output[:-1]

- Improved::

    output = output.cutsuffix("\n")


cookiejar.py
------------

- Current::

    def strip_quotes(text):
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        return text

- Improved::

    def strip_quotes(text):
        return text.cutprefix('"').cutsuffix('"')

- Current::

    if line.endswith("\n"): line = line[:-1]

- Improved::

    line = line.cutsuffix("\n")


fixdiv.py
---------

- Current::

    def chop(line):
        if line.endswith("\n"):
            return line[:-1]
        else:
            return line

- Improved::

    def chop(line):
        return line.cutsuffix("\n")


test_concurrent_futures.py
--------------------------

In the following example, the meaning of the code changes slightly,
but in context, it behaves the same.

- Current::

    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name

- Improved::

    return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')

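The slight change in meaning for test_concurrent_futures.py is that
the chained version can remove more than one suffix. The sketch below
makes this visible, using a hypothetical free function ``cutsuffix`` as
a stand-in for the proposed method and made-up class names as input.

```python
def cutsuffix(s, suf):
    # Hypothetical stand-in for the proposed str.cutsuffix method.
    return s[:-len(suf)] if suf and s.endswith(suf) else s


def current(name):
    if name.endswith(('Mixin', 'Tests')):
        return name[:-5]
    elif name.endswith('Test'):
        return name[:-4]
    else:
        return name


def improved(name):
    return cutsuffix(cutsuffix(cutsuffix(name, 'Mixin'), 'Tests'), 'Test')


# Made-up names: the first has one matching suffix, the second has two.
print(current('ExecutorTests'), improved('ExecutorTests'))  # Executor Executor
print(current('WaitTestMixin'), improved('WaitTestMixin'))  # WaitTest Wait
```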

msvc9compiler.py
----------------

- Current::

    if value.endswith(os.pathsep):
        value = value[:-1]

- Improved::

    value = value.cutsuffix(os.pathsep)


test_pathlib.py
---------------

- Current::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r[len(clsname) + 1 : -1]

- Improved::

    self.assertTrue(r.startswith(clsname + '('), r)
    self.assertTrue(r.endswith(')'), r)
    inner = r.cutprefix(clsname + '(').cutsuffix(')')


Rejected Ideas
==============

Expand the lstrip and rstrip APIs
---------------------------------

Because ``lstrip`` takes a string as its argument, it could be viewed
as taking an iterable of length-1 strings. The API could therefore be
generalized to accept any iterable of strings, which would be
successively removed as prefixes. While this behavior would be
consistent, it would not be obvious for users to have to call
``'foobar'.lstrip(('foo',))`` for the common use case of a
single prefix.

Allow multiple prefixes
-----------------------

Some users discussed the desire to be able to remove multiple
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
However, this adds ambiguity about the order in which the prefixes are
removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
After this proposal, this can be spelled explicitly as
``s.cutprefix('Foo').cutprefix('FooBar')``.

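The order ambiguity can be seen by chaining the two removals both ways.
This sketch uses a hypothetical free function ``cutprefix`` as a
stand-in for the proposed method; the input string is made up.

```python
def cutprefix(s, pre):
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


s = "FooBarBaz"
print(cutprefix(cutprefix(s, "Foo"), "FooBar"))  # BarBaz
print(cutprefix(cutprefix(s, "FooBar"), "Foo"))  # Baz
```

With explicit chaining, the caller chooses which of these two results
is wanted.
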
Remove multiple copies of a prefix
----------------------------------

This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
function until the argument is unchanged. This behavior is attainable
from the proposed behavior via the following::

    >>> s = 'foo' * 100 + 'bar'
    >>> while s != (s := s.cutprefix("foo")): pass
    >>> s
    'bar'

The above can be modified by chaining multiple ``cutprefix`` calls
together to achieve the full behavior of the ``lstrip``/``rstrip``
generalization, while being explicit in the order of removal.

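One possible shape for the chained, repeated removal described above is
sketched here. Both ``cutprefix`` (a stand-in for the proposed method)
and ``strip_prefixes_repeatedly`` are hypothetical names introduced for
this illustration.

```python
def cutprefix(s, pre):
    # Hypothetical stand-in for the proposed str.cutprefix method.
    return s[len(pre):] if s.startswith(pre) else s


def strip_prefixes_repeatedly(s, prefixes):
    # Hypothetical helper: applies the prefixes in the given order and
    # repeats the whole pass until a pass removes nothing.
    while True:
        before = s
        for pre in prefixes:
            s = cutprefix(s, pre)
        if s == before:
            return s


print(strip_prefixes_repeatedly("foobarfoobarbaz", ["foo", "bar"]))  # baz
print(strip_prefixes_repeatedly("foo" * 100 + "bar", ["foo"]))       # bar
```
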
While the proposed API could later be extended to include some of
these use cases, to do so before any observation of how these methods
are used in practice would be premature and may lead to choosing the
wrong behavior.


Raising an exception when not found
-----------------------------------

There was a suggestion that ``s.cutprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``. However, this does not match
the behavior and feel of other string methods. A ``required=False``
keyword could be added, but this violates the KISS principle.

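Callers who do want an error for a missing prefix can still write the
check explicitly. The function ``cutprefix_strict`` below is a
hypothetical helper, not part of this proposal, and the choice of
``ValueError`` is an assumption made for the sketch.

```python
def cutprefix_strict(s, pre):
    # Hypothetical helper, not part of the proposal: raise instead of
    # silently returning the string unchanged.
    if not s.startswith(pre):
        raise ValueError(f"{s!r} does not start with {pre!r}")
    return s[len(pre):]


print(cutprefix_strict("foobar", "foo"))  # bar
```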

Alternative Method Names
------------------------

Several alternative method names have been proposed. Some are listed
below, along with commentary for why they should be rejected in favor
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).

``ltrim``
    "Trim" does in other languages (e.g. JavaScript, Java, Go,
    PHP) what ``strip`` methods do in Python.
``lstrip(string=...)``
    This would avoid adding a new method, but for different
    behavior, it's better to have two different methods than one
    method with a keyword argument that selects the behavior.
``cut_prefix``
    All of the other methods of the string API, e.g.
    ``str.startswith()``, use ``lowercase`` rather than
    ``lower_case_with_underscores``.
``cutleft``, ``leftcut``, or ``lcut``
    The explicitness of "prefix" is preferred.
``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
    All of these might have been acceptable, but they have more
    characters than ``cut``. Some suggested that the verb "cut"
    implies mutability, but the string API already contains verbs
    like "replace", "strip", "split", and "swapcase".
``stripprefix``
    Users may benefit from the mnemonic that "strip" means working
    with sets of characters, while other methods work with
    substrings, so re-using "strip" here should be avoided.


Reference Implementation
========================

See the pull request on GitHub [#pr]_.


References
==========

.. [#pr] GitHub pull request with implementation
   (https://github.com/python/cpython/pull/18939)
.. [#pyid] Discussion on Python-Ideas
   (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
   (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: