PEP 616 Revisions (#1333)

* Add a PEP: String methods to remove prefixes and suffixes

* add PEP number

* Add sponsor

* Fix typo

* changes after review: passing tuples, clarity on returning self, formatting
sweeneyde 2020-03-22 15:02:10 -04:00 committed by GitHub
parent 3ff1593f56
commit 0093a853e8
1 changed files with 142 additions and 255 deletions

@ -14,20 +14,10 @@ Abstract
======== ========
This is a proposal to add two new methods, ``cutprefix`` and This is a proposal to add two new methods, ``cutprefix`` and
``cutsuffix``, to the APIs of Python's various string objects. In ``cutsuffix``, to the APIs of Python's various string objects. These
particular, the methods would be added to Unicode ``str`` objects, methods would remove a prefix or suffix (respectively) from a string,
binary ``bytes`` and ``bytearray`` objects, and if present, and would be added to Unicode ``str`` objects, binary
``collections.UserString``. ``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
been removed. If ``s`` does not have ``pre`` as a prefix, an
unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)``
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
roughly equivalent to
``s[:-len(suf)] if suf and s.endswith(suf) else s``.
Rationale Rationale
@ -55,36 +45,79 @@ Specification
The builtin ``str`` class will gain two new methods with roughly the The builtin ``str`` class will gain two new methods with roughly the
following behavior:: following behavior::
def cutprefix(self: str, pre: str, /) -> str: def cutprefix(self, prefix, /):
if self.startswith(pre): if not isinstance(self, str):
return self[len(pre):] raise TypeError()
return self[:] self_str = str(self)
def cutsuffix(self: str, suf: str, /) -> str: if isinstance(prefix, tuple):
if suf and self.endswith(suf): for option in prefix:
return self[:-len(suf)] if not isinstance(option, str):
return self[:] raise TypeError()
option_str = str(option)
The only difference between the real implementation and the above is if self_str.startswith(option_str):
that, as with other string methods like ``replace``, the return self_str[len(option_str):]
methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
``suf`` is not an instace of ``str``, and will cast subclasses of
``str`` to builtin ``str`` objects.
Note that without the check for the truthyness of ``suf``, return self_str[:]
if not isinstance(prefix, str):
raise TypeError()
prefix_str = str(prefix)
if self_str.startswith(prefix_str):
return self_str[len(prefix_str):]
else:
return self_str[:]
def cutsuffix(self, suffix, /):
if not isinstance(self, str):
raise TypeError()
self_str = str(self)
if isinstance(suffix, tuple):
for option in suffix:
if not isinstance(option, str):
raise TypeError()
option_str = str(option)
if option_str and self_str.endswith(option_str):
return self_str[:-len(option_str)]
return self_str[:]
if not isinstance(suffix, str):
raise TypeError()
suffix_str = str(suffix)
if suffix_str and self_str.endswith(suffix_str):
return self_str[:-len(suffix_str)]
else:
return self_str[:]
Note that without the check for the truthiness of suffixes,
``s.cutsuffix('')`` would be mishandled and always return the empty ``s.cutsuffix('')`` would be mishandled and always return the empty
string due to the unintended evaluation of ``self[:-0]``. string due to the unintended evaluation of ``self[:-0]``.
Methods with the corresponding semantics will be added to the builtin Methods with the corresponding semantics will be added to the builtin
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` ``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()`` or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
will accept any bytes-like object as an argument. will accept any bytes-like object or tuple of bytes-like objects as an
argument. The one-at-a-time checking of types matches the implementation
of ``startswith()`` and ``endswith()`` methods.
Note that the ``bytearray`` methods return a copy of ``self``; they do The ``self_str[:]`` copying behavior in the code ensures that the
not operate in place. ``bytearray`` methods do not return ``self``, but it does not preclude
the ``str`` and ``bytes`` methods from returning ``self``. Because
The following behavior is considered a CPython implementation detail, ``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
but is not guaranteed by this specification:: and ``cutsuffix()`` methods on these objects may (but are not
required to) make the optimization of returning ``self`` if
``type(self) is str`` (``type(self) is bytes`` respectively)
and the given affixes are not found, or are empty. As such, the following
behavior is considered a CPython implementation detail, and is not
guaranteed by this specification::
>>> x = 'foobar' * 10**6 >>> x = 'foobar' * 10**6
>>> x.cutprefix('baz') is x is x.cutsuffix('baz') >>> x.cutprefix('baz') is x is x.cutsuffix('baz')
@ -92,20 +125,21 @@ but is not guaranteed by this specification::
>>> x.cutprefix('') is x is x.cutsuffix('') >>> x.cutprefix('') is x is x.cutsuffix('')
True True
That is, for CPython's immutable ``str`` and ``bytes`` objects, the To test whether any affixes were removed during the call, users
methods return the original object when the affix is not found or if should use the constant-time behavior of comparing the lengths of
the affix is empty. Because these types test for equality using original and new strings::
shortcuts for identity and length, the following equivalent
expressions are evaluated at approximately the same speed, for any
``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
>>> (True, x[len(y):]) if x.startswith(y) else (False, x) >>> string = 'Python String Input'
>>> (True, z) if x != (z := x.cutprefix(y)) else (False, x) >>> new_string = string.cutprefix("Py")
>>> modified = (len(string) != len(new_string))
>>> modified
True
Users may also continue using ``startswith()`` and ``endswith()``
methods for control flow instead of testing the lengths as above.
The two methods will also be added to ``collections.UserString``, The two methods will also be added to ``collections.UserString``, with
where they rely on the implementation of the new ``str`` methods. similar behavior.
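The revised specification can be tried out without patching ``str``. What follows is a minimal sketch, not the reference implementation: ``cutprefix`` and ``cutsuffix`` are written here as hypothetical free functions (the built-in ``str`` type has no methods with these names), and they cover only the ``str`` case. They mirror the one-at-a-time type checking of tuple options and the truthiness guard that keeps an empty suffix from evaluating ``s[:-0]``.

```python
def cutprefix(s, prefix):
    """Hypothetical free-function stand-in for the proposed str.cutprefix."""
    if not isinstance(s, str):
        raise TypeError("s must be a str")
    # Treat a single prefix as a one-element tuple of options.
    options = prefix if isinstance(prefix, tuple) else (prefix,)
    for option in options:
        # Types are checked one option at a time, as startswith() does.
        if not isinstance(option, str):
            raise TypeError("prefix must be a str or a tuple of str")
        if s.startswith(option):
            return s[len(option):]
    return s


def cutsuffix(s, suffix):
    """Hypothetical free-function stand-in for the proposed str.cutsuffix."""
    if not isinstance(s, str):
        raise TypeError("s must be a str")
    options = suffix if isinstance(suffix, tuple) else (suffix,)
    for option in options:
        if not isinstance(option, str):
            raise TypeError("suffix must be a str or a tuple of str")
        # Skip empty options: s[:-0] would evaluate to the empty string.
        if option and s.endswith(option):
            return s[:-len(option)]
    return s


string = "Python String Input"
new_string = cutprefix(string, "Py")
# Comparing lengths tells us whether an affix was removed.
modified = len(string) != len(new_string)
```

With a tuple argument, only the first matching option is removed, so ``cutsuffix('MyTests', ('Mixin', 'Tests', 'Test'))`` removes ``'Tests'`` and stops.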
Motivating examples from the Python standard library Motivating examples from the Python standard library
==================================================== ====================================================
@ -113,43 +147,23 @@ Motivating examples from the Python standard library
The examples below demonstrate how the proposed methods can make code The examples below demonstrate how the proposed methods can make code
one or more of the following: one or more of the following:
Less fragile: 1. Less fragile:
The code will not depend on the user to count the length of a
- The code will not depend on the user to count the length of a
literal. literal.
More performant:
The code does not require a call to the Python built-in 2. More performant:
``len`` function.
More descriptive: - The code does not require a call to the Python built-in
The methods give a higher-level API for code readability, as ``len`` function, nor to the more expensive ``str.replace``
function.
3. More descriptive:
- The methods give a higher-level API for code readability, as
opposed to the traditional method of string slicing. opposed to the traditional method of string slicing.
refactor.py
-----------
- Current::
if fix_name.startswith(self.FILE_PREFIX):
fix_name = fix_name[len(self.FILE_PREFIX):]
- Improved::
fix_name = fix_name.cutprefix(self.FILE_PREFIX)
c_annotations.py:
-----------------
- Current::
if name.startswith("c."):
name = name[2:]
- Improved::
name = name.cutprefix("c.")
find_recursionlimit.py find_recursionlimit.py
---------------------- ----------------------
@ -162,7 +176,8 @@ find_recursionlimit.py
- Improved:: - Improved::
print(test_finc_name.cutprefix("test_")) print(test_func_name.cutprefix("test_"))
deccheck.py deccheck.py
----------- -----------
@ -195,83 +210,6 @@ intended to be removed.
self.funcname = funcname.cutprefix("context.") self.funcname = funcname.cutprefix("context.")
test_i18n.py
------------
- Current::
if test_func_name.startswith("test_"):
print(test_func_name[5:])
else:
print(test_func_name)
- Improved::
print(test_finc_name.cutprefix("test_"))
- Current::
if creationDate.endswith('\\n'):
creationDate = creationDate[:-len('\\n')]
- Improved::
creationDate = creationDate.cutsuffix('\\n')
shared_memory.py
----------------
- Current::
reported_name = self._name
if _USE_POSIX and self._prepend_leading_slash:
if self._name.startswith("/"):
reported_name = self._name[1:]
return reported_name
- Improved::
if _USE_POSIX and self._prepend_leading_slash:
return self._name.cutprefix("/")
return self._name
build-installer.py
------------------
- Current::
if archiveName.endswith('.tar.gz'):
retval = os.path.basename(archiveName[:-7])
if ((retval.startswith('tcl') or retval.startswith('tk'))
and retval.endswith('-src')):
retval = retval[:-4]
- Improved::
if archiveName.endswith('.tar.gz'):
retval = os.path.basename(archiveName[:-7])
if retval.startswith(('tcl', 'tk')):
retval = retval.cutsuffix('-src')
Depending on personal style, ``archiveName[:-7]`` could also be
changed to ``archiveName.cutsuffix('.tar.gz')``.
test_core.py
------------
- Current::
if output.endswith("\n"):
output = output[:-1]
- Improved::
output = output.cutsuffix("\n")
cookiejar.py cookiejar.py
------------ ------------
@ -289,31 +227,6 @@ cookiejar.py
def strip_quotes(text): def strip_quotes(text):
return text.cutprefix('"').cutsuffix('"') return text.cutprefix('"').cutsuffix('"')
- Current::
if line.endswith("\n"): line = line[:-1]
- Improved::
line = line.cutsuffix("\n")
fixdiv.py
---------
- Current::
def chop(line):
if line.endswith("\n"):
return line[:-1]
else:
return line
- Improved::
def chop(line):
return line.cutsuffix("\n")
test_concurrent_futures.py test_concurrent_futures.py
-------------------------- --------------------------
@ -332,37 +245,10 @@ but in context, it behaves the same.
- Improved:: - Improved::
return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test') return name.cutsuffix(('Mixin', 'Tests', 'Test'))
msvc9compiler.py There were many other such examples in the stdlib.
----------------
- Current::
if value.endswith(os.pathsep):
value = value[:-1]
- Improved::
value = value.cutsuffix(os.pathsep)
test_pathlib.py
---------------
- Current::
self.assertTrue(r.startswith(clsname + '('), r)
self.assertTrue(r.endswith(')'), r)
inner = r[len(clsname) + 1 : -1]
- Improved::
self.assertTrue(r.startswith(clsname + '('), r)
self.assertTrue(r.endswith(')'), r)
inner = r.cutprefix(clsname + '(').cutsuffix(')')
Rejected Ideas Rejected Ideas
@ -379,37 +265,27 @@ consistent, it would not be obvious for users to have to call
``'foobar'.cutprefix(('foo',))`` for the common use case of a ``'foobar'.cutprefix(('foo',))`` for the common use case of a
single prefix. single prefix.
Allow multiple prefixes
-----------------------
Some users discussed the desire to be able to remove multiple
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
However, this adds ambiguity about the order in which the prefixes are
removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
After this proposal, this can be spelled explicitly as
``s.cutprefix('Foo').cutprefix('FooBar')``.
Remove multiple copies of a prefix Remove multiple copies of a prefix
---------------------------------- ----------------------------------
This is the behavior that would be consistent with the aforementioned This is the behavior that would be consistent with the aforementioned
expansion of the ``lstrip/rstrip`` API -- repeatedly applying the expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
function until the argument is unchanged. This behavior is attainable function until the argument is unchanged. This behavior is attainable
from the proposed behavior via the following:: from the proposed behavior via the following::
>>> s = 'foo' * 100 + 'bar' >>> s = 'foobar' * 100 + 'bar'
>>> while s != (s := s.cutprefix("foo")): pass >>> prefixes = ('bar', 'foo')
>>> while len(s) != len(s := s.cutprefix(prefixes)): pass
>>> s >>> s
'bar' ''
The above can be modififed by chaining multiple ``cutprefix`` calls or the more obvious and readable alternative::
together to achieve the full behavior of the ``lstrip``/``rstrip``
generalization, while being explicit in the order of removal.
While the proposed API could later be extended to include some of >>> s = 'foo' * 100 + 'bar'
these use cases, to do so before any observation of how these methods >>> prefixes = ('bar', 'foo')
are used in practice would be premature and may lead to choosing the >>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
wrong behavior. >>> s
''
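The repeated-removal loop can be checked with a small self-contained sketch. The names ``cutprefix`` (a simplified free function standing in for the proposed method) and ``strip_prefixes`` are hypothetical and exist only for this illustration. Note that every listed prefix is removed for as long as it matches, so an input made up entirely of the listed prefixes reduces to the empty string.

```python
def cutprefix(s, prefixes):
    """Simplified hypothetical stand-in: remove the first matching prefix."""
    for option in prefixes:
        if s.startswith(option):
            return s[len(option):]
    return s


def strip_prefixes(s, prefixes):
    """Repeatedly remove any of the given prefixes until none matches."""
    # Drop empty prefixes: s.startswith('') is always True and would
    # make the loop below spin forever.
    prefixes = tuple(p for p in prefixes if p)
    while s.startswith(prefixes):
        s = cutprefix(s, prefixes)
    return s


remainder = strip_prefixes("foofoobarbaz", ("bar", "foo"))
```

Here ``remainder`` is ``'baz'``: two copies of ``'foo'`` and one of ``'bar'`` are removed before no listed prefix matches.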
Raising an exception when not found Raising an exception when not found
@ -427,28 +303,39 @@ Alternative Method Names
Several alternative method names have been proposed. Some are listed Several alternative method names have been proposed. Some are listed
below, along with commentary for why they should be rejected in favor below, along with commentary for why they should be rejected in favor
of ``cutprefix`` (the same arguments hold for ``cutsuffix``) of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
``ltrim`` - ``ltrim``
"Trim" does in other languages (e.g. JavaScript, Java, Go,
- "Trim" does in other languages (e.g. JavaScript, Java, Go,
PHP) what ``strip`` methods do in Python. PHP) what ``strip`` methods do in Python.
``lstrip(string=...)``
This would avoid adding a new method, but for different - ``lstrip(string=...)``
- This would avoid adding a new method, but for different
behavior, it's better to have two different methods than one behavior, it's better to have two different methods than one
method with a keyword argument that selects the behavior. method with a keyword argument that selects the behavior.
``cut_prefix``
All of the other methods of the string API, e.g. - ``cut_prefix``
- All of the other methods of the string API, e.g.
``str.startswith()``, use ``lowercase`` rather than ``str.startswith()``, use ``lowercase`` rather than
``lower_case_with_underscores``. ``lower_case_with_underscores``.
``cutleft``, ``leftcut``, or ``lcut``
The explicitness of "prefix" is preferred. - ``cutleft``, ``leftcut``, or ``lcut``
``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
All of these might have been acceptable, but they have more - The explicitness of "prefix" is preferred.
- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.
- All of these might have been acceptable, but they have more
characters than ``cut``. Some suggested that the verb "cut" characters than ``cut``. Some suggested that the verb "cut"
implies mutability, but the string API already contains verbs implies mutability, but the string API already contains verbs
like "replace", "strip", "split", and "swapcase". like "replace", "strip", "split", and "swapcase".
``stripprefix``
Users may benefit from the mnemonic that "strip" means working - ``stripprefix``
- Users may benefit from remembering that "strip" means working
with sets of characters, while other methods work with with sets of characters, while other methods work with
substrings, so re-using "strip" here should be avoided. substrings, so re-using "strip" here should be avoided.