python-peps/pep-0616.rst

376 lines
11 KiB
ReStructuredText
Raw Normal View History

PEP: 616
Title: String methods to remove prefixes and suffixes
Author: Dennis Sweeney <sweeney.dennis650@gmail.com>
Sponsor: Eric V. Smith <eric@trueblade.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Mar-2020
Python-Version: 3.9
2020-03-20 14:55:52 -04:00
Post-History: 20-Mar-2020
Abstract
========
This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects. These
methods would remove a prefix or suffix (respectively) from a string,
if present, and would be added to to Unicode ``str`` objects, binary
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
Rationale
=========
There have been repeated issues [#confusion]_ on the Bug Tracker
and StackOverflow related to user confusion about the existing
``str.lstrip`` and ``str.rstrip`` methods. These users are typically
expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they
are surprised that the parameter for ``lstrip`` is interpreted as a
set of characters, not a substring. This repeated issue is evidence
that these methods are useful, and the new methods allow a cleaner
redirection of users to the desired behavior.
As another testimonial for the usefulness of these methods, several
users on Python-Ideas [#pyid]_ reported frequently including similar
functions in their own code for productivity. The implementation
often contained subtle mistakes regarding the handling of the empty
string (see `Specification`_).
Specification
=============
The builtin ``str`` class will gain two new methods with roughly the
following behavior::
def cutprefix(self, prefix, /):
if not isinstance(self, str):
raise TypeError()
self_str = str(self)
if isinstance(prefix, tuple):
for option in prefix:
if not isinstance(option, str):
raise TypeError()
option_str = str(option)
if self_str.startswith(option_str):
return self_str[len(option_str):]
return self_str[:]
if not isinstance(prefix, str):
raise TypeError()
prefix_str = str(prefix)
if self_str.startswith(prefix_str):
return self_str[len(prefix_str):]
else:
return self_str[:]
def cutsuffix(self, suffix, /):
if not isinstance(self, str):
raise TypeError()
self_str = str(self)
if isinstance(suffix, tuple):
for option in suffix:
if not isinstance(option, str):
raise TypeError()
option_str = str(option)
if option_str and self_str.endswith(option_str):
return self_str[:-len(option_str)]
return self_str[:]
if not isinstance(suffix, str):
raise TypeError()
suffix_str = str(suffix)
if suffix_str and self_str.startswith(suffix_str):
return self_str[:-len(suffix_str)]
else:
return self_str[:]
Note that without the check for the truthyness of suffixes,
``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``.
Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object or tuple of bytes-like objects as an
argument. The one-at-a-time checking of types matches the implementation
of ``startswith()`` and ``endswith()`` methods.
The ``self_str[:]`` copying behavior in the code ensures that the
``bytearray`` methods do not return ``self``, but it does not preclude
the ``str`` and ``bytes`` methods from returning ``self``. Because
``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
and ``cutsuffix()`` methods on these objects methods may (but are not
required to) make the optimization of returning ``self`` if
``type(self) is str`` (``type(self) is bytes`` respectively)
and the given affixes are not found, or are empty. As such, following
behavior is considered a CPython implementation detail, and is not
guaranteed by this specification::
>>> x = 'foobar' * 10**6
>>> x.cutprefix('baz') is x is x.cutsuffix('baz')
True
>>> x.cutprefix('') is x is x.cutsuffix('')
True
To test whether any affixes were removed during the call, users
should use the constant-time behavior of comparing the lengths of
original and new strings::
>>> string = 'Python String Input'
>>> new_string = string.cutprefix("Py")
>>> modified = (len(string) != len(new_string))
>>> modified
True
Users may also continue using ``startswith()`` and ``endswith()``
methods for control flow instead of testing the lengths as above.
The two methods will also be added to ``collections.UserString``, with
similar behavior.
Motivating examples from the Python standard library
====================================================
The examples below demonstrate how the proposed methods can make code
one or more of the following:
1. Less fragile:
- The code will not depend on the user to count the length of a
literal.
2. More performant:
- The code does not require a call to the Python built-in
``len`` function, nor to the more expensive ``str.replace``
function.
3. More descriptive:
- The methods give a higher-level API for code readability, as
opposed to the traditional method of string slicing.
find_recursionlimit.py
----------------------
- Current::
if test_func_name.startswith("test_"):
print(test_func_name[5:])
else:
print(test_func_name)
- Improved::
print(test_func_name.cutprefix("test_"))
deccheck.py
-----------
This is an interesting case because the author chose to use the
``str.replace`` method in a situation where only a prefix was
intended to be removed.
- Current::
if funcname.startswith("context."):
self.funcname = funcname.replace("context.", "")
self.contextfunc = True
else:
self.funcname = funcname
self.contextfunc = False
- Improved::
if funcname.startswith("context."):
self.funcname = funcname.cutprefix("context.")
self.contextfunc = True
else:
self.funcname = funcname
self.contextfunc = False
- Arguably further improved::
self.contextfunc = funcname.startswith("context.")
self.funcname = funcname.cutprefix("context.")
cookiejar.py
------------
- Current::
def strip_quotes(text):
if text.startswith('"'):
text = text[1:]
if text.endswith('"'):
text = text[:-1]
return text
- Improved::
def strip_quotes(text):
return text.cutprefix('"').cutsuffix('"')
test_concurrent_futures.py
--------------------------
In the following example, the meaning of the code changes slightly,
but in context, it behaves the same.
- Current::
if name.endswith(('Mixin', 'Tests')):
return name[:-5]
elif name.endswith('Test'):
return name[:-4]
else:
return name
- Improved::
return name.cutsuffix(('Mixin', 'Tests', 'Test'))
There were many other such examples in the stdlib.
Rejected Ideas
==============
Expand the lstrip and rstrip APIs
---------------------------------
Because ``lstrip`` takes a string as its argument, it could be viewed
as taking an iterable of length-1 strings. The API could therefore be
generalized to accept any iterable of strings, which would be
successively removed as prefixes. While this behavior would be
consistent, it would not be obvious for users to have to call
``'foobar'.cutprefix(('foo,))`` for the common use case of a
single prefix.
Remove multiple copies of a prefix
----------------------------------
This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
function until the argument is unchanged. This behavior is attainable
from the proposed behavior via by the following::
>>> s = 'foobar' * 100 + 'bar'
>>> prefixes = ('bar', 'foo')
>>> while len(s) != len(s := s.cutprefix(prefixes)): pass
>>> s
'bar'
or the more obvious and readable alternative::
>>> s = 'foo' * 100 + 'bar'
>>> prefixes = ('bar', 'foo')
>>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
>>> s
'bar'
Raising an exception when not found
-----------------------------------
There was a suggestion that ``s.cutprefix(pre)`` should raise an
exception if ``not s.startswith(pre)``. However, this does not match
with the behavior and feel of other string methods. There could be
``required=False`` keyword added, but this violates the KISS
principle.
Alternative Method Names
------------------------
Several alternatives method names have been proposed. Some are listed
below, along with commentary for why they should be rejected in favor
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
- ``ltrim``
- "Trim" does in other languages (e.g. JavaScript, Java, Go,
PHP) what ``strip`` methods do in Python.
- ``lstrip(string=...)``
- This would avoid adding a new method, but for different
behavior, it's better to have two different methods than one
method with a keyword argument that select the behavior.
- ``cut_prefix``
- All of the other methods of the string API, e.g.
``str.startswith()``, use ``lowercase`` rather than
``lower_case_with_underscores``.
- ``cutleft``, ``leftcut``, or ``lcut``
- The explicitness of "prefix" is preferred.
- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.
- All of these might have been acceptable, but they have more
characters than ``cut``. Some suggested that the verb "cut"
implies mutability, but the string API already contains verbs
like "replace", "strip", "split", and "swapcase".
- ``stripprefix``
- Users may benefit from remembering that "strip" means working
with sets of characters, while other methods work with
substrings, so re-using "strip" here should be avoided.
Reference Implementation
========================
See the pull request on GitHub [#pr]_.
References
==========
.. [#pr] GitHub pull request with implementation
(https://github.com/python/cpython/pull/18939)
.. [#pyid] Discussion on Python-Ideas
(https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/)
.. [#confusion] Comment listing Bug Tracker and StackOverflow issues
(https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/)
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: