PEP 616 Revisions (#1333)
* Add a PEP: String methods to remove prefixes and suffixes
* add PEP number
* Add sponsor
* Fix typo
* changes after review: passing tuples, clarity on returning self, formatting
parent 3ff1593f56
commit 0093a853e8
375
pep-0616.rst
|
@ -14,20 +14,10 @@ Abstract
|
||||||
========
|
========
|
||||||
|
|
||||||
This is a proposal to add two new methods, ``cutprefix`` and
|
This is a proposal to add two new methods, ``cutprefix`` and
|
||||||
``cutsuffix``, to the APIs of Python's various string objects. In
|
``cutsuffix``, to the APIs of Python's various string objects. These
|
||||||
particular, the methods would be added to Unicode ``str`` objects,
|
methods would remove a prefix or suffix (respectively) from a string,
|
||||||
binary ``bytes`` and ``bytearray`` objects, and
|
if present, and would be added to Unicode ``str`` objects, binary
|
||||||
``collections.UserString``.
|
``bytes`` and ``bytearray`` objects, and ``collections.UserString``.
|
||||||
|
|
||||||
If ``s`` is one of these objects, and ``s`` has ``pre`` as a prefix, then
|
|
||||||
``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has
|
|
||||||
been removed. If ``s`` does not have ``pre`` as a prefix, an
|
|
||||||
unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)``
|
|
||||||
is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``.
|
|
||||||
|
|
||||||
The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is
|
|
||||||
roughly equivalent to
|
|
||||||
``s[:-len(suf)] if suf and s.endswith(suf) else s``.
|
|
||||||
|
|
||||||
|
|
||||||
Rationale
|
Rationale
|
||||||
|
@ -55,36 +45,79 @@ Specification
|
||||||
The builtin ``str`` class will gain two new methods with roughly the
|
The builtin ``str`` class will gain two new methods with roughly the
|
||||||
following behavior::
|
following behavior::
|
||||||
|
|
||||||
def cutprefix(self: str, pre: str, /) -> str:
|
def cutprefix(self, prefix, /):
|
||||||
if self.startswith(pre):
|
if not isinstance(self, str):
|
||||||
return self[len(pre):]
|
raise TypeError()
|
||||||
return self[:]
|
self_str = str(self)
|
||||||
|
|
||||||
def cutsuffix(self: str, suf: str, /) -> str:
|
if isinstance(prefix, tuple):
|
||||||
if suf and self.endswith(suf):
|
for option in prefix:
|
||||||
return self[:-len(suf)]
|
if not isinstance(option, str):
|
||||||
return self[:]
|
raise TypeError()
|
||||||
|
option_str = str(option)
|
||||||
|
|
||||||
The only difference between the real implementation and the above is
|
if self_str.startswith(option_str):
|
||||||
that, as with other string methods like ``replace``, the
|
return self_str[len(option_str):]
|
||||||
methods will raise a ``TypeError`` if any of ``self``, ``pre`` or
|
|
||||||
``suf`` is not an instance of ``str``, and will cast subclasses of
|
|
||||||
``str`` to builtin ``str`` objects.
|
|
||||||
|
|
||||||
Note that without the check for the truthiness of ``suf``,
|
return self_str[:]
|
||||||
|
|
||||||
|
if not isinstance(prefix, str):
|
||||||
|
raise TypeError()
|
||||||
|
|
||||||
|
prefix_str = str(prefix)
|
||||||
|
|
||||||
|
if self_str.startswith(prefix_str):
|
||||||
|
return self_str[len(prefix_str):]
|
||||||
|
else:
|
||||||
|
return self_str[:]
|
||||||
|
|
||||||
|
|
||||||
|
def cutsuffix(self, suffix, /):
|
||||||
|
if not isinstance(self, str):
|
||||||
|
raise TypeError()
|
||||||
|
self_str = str(self)
|
||||||
|
|
||||||
|
if isinstance(suffix, tuple):
|
||||||
|
for option in suffix:
|
||||||
|
if not isinstance(option, str):
|
||||||
|
raise TypeError()
|
||||||
|
option_str = str(option)
|
||||||
|
|
||||||
|
if option_str and self_str.endswith(option_str):
|
||||||
|
return self_str[:-len(option_str)]
|
||||||
|
|
||||||
|
return self_str[:]
|
||||||
|
|
||||||
|
if not isinstance(suffix, str):
|
||||||
|
raise TypeError()
|
||||||
|
suffix_str = str(suffix)
|
||||||
|
|
||||||
|
if suffix_str and self_str.endswith(suffix_str):
|
||||||
|
return self_str[:-len(suffix_str)]
|
||||||
|
else:
|
||||||
|
return self_str[:]
|
||||||
|
|
||||||
|
Note that without the check for the truthiness of suffixes,
|
||||||
``s.cutsuffix('')`` would be mishandled and always return the empty
|
``s.cutsuffix('')`` would be mishandled and always return the empty
|
||||||
string due to the unintended evaluation of ``self[:-0]``.
|
string due to the unintended evaluation of ``self[:-0]``.
|
||||||
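The empty-suffix pitfall described above can be reproduced with ordinary slicing. The sketch below is only an illustration: ``cut_suffix`` and ``cut_suffix_unguarded`` are hypothetical free functions standing in for the proposed method, and they handle a single ``str`` suffix only.

```python
def cut_suffix(s: str, suffix: str) -> str:
    """Hypothetical stand-in for the proposed ``s.cutsuffix(suffix)``."""
    # The truthiness check skips the slice entirely for an empty suffix.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


def cut_suffix_unguarded(s: str, suffix: str) -> str:
    """Same logic without the guard, to show the failure mode."""
    # Every string ends with '', so this branch is taken for an empty suffix.
    if s.endswith(suffix):
        # With suffix == '', this is s[:-0], which equals s[:0]: the empty string.
        return s[:-len(suffix)]
    return s
```

Under these assumptions, ``cut_suffix("spam", "")`` leaves the string alone, while ``cut_suffix_unguarded("spam", "")`` discards it.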
|
|
||||||
Methods with the corresponding semantics will be added to the builtin
|
Methods with the corresponding semantics will be added to the builtin
|
||||||
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
|
``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes``
|
||||||
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
|
or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()``
|
||||||
will accept any bytes-like object as an argument.
|
will accept any bytes-like object or tuple of bytes-like objects as an
|
||||||
|
argument. The one-at-a-time checking of types matches the implementation
|
||||||
|
of the ``startswith()`` and ``endswith()`` methods.
|
||||||
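The one-at-a-time type checking mentioned above can be observed today on the existing ``startswith()`` and ``endswith()`` methods. The snippet below is a small demonstration of that existing behavior as seen in CPython, not of the proposed methods; ``raises_type_error`` is a hypothetical helper added for the illustration.

```python
def raises_type_error(call) -> bool:
    """Return True if calling ``call()`` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# A tuple may mix bytes-like types; any matching option is enough.
mixed_ok = b"spam.txt".endswith((bytearray(b".py"), b".txt"))

# Options are type-checked only as they are reached (CPython behavior):
# the invalid 42 is never inspected because "sp" already matched.
matched_first = "spam".startswith(("sp", 42))

# With the invalid option first, the check fails before any match is found.
invalid_first = raises_type_error(lambda: "spam".startswith((42, "sp")))
```

A consequence of this design is that an invalid tuple element may go unnoticed when an earlier element matches.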
|
|
||||||
Note that the ``bytearray`` methods return a copy of ``self``; they do
|
The ``self_str[:]`` copying behavior in the code ensures that the
|
||||||
not operate in place.
|
``bytearray`` methods do not return ``self``, but it does not preclude
|
||||||
|
the ``str`` and ``bytes`` methods from returning ``self``. Because
|
||||||
The following behavior is considered a CPython implementation detail,
|
``str`` and ``bytes`` instances are immutable, the ``cutprefix()``
|
||||||
but is not guaranteed by this specification::
|
and ``cutsuffix()`` methods on these objects may (but are not
|
||||||
|
required to) make the optimization of returning ``self`` if
|
||||||
|
``type(self) is str`` (``type(self) is bytes`` respectively)
|
||||||
|
and the given affixes are not found, or are empty. As such, the following
|
||||||
|
behavior is considered a CPython implementation detail, and is not
|
||||||
|
guaranteed by this specification::
|
||||||
|
|
||||||
>>> x = 'foobar' * 10**6
|
>>> x = 'foobar' * 10**6
|
||||||
>>> x.cutprefix('baz') is x is x.cutsuffix('baz')
|
>>> x.cutprefix('baz') is x is x.cutsuffix('baz')
|
||||||
|
@ -92,20 +125,21 @@ but is not guaranteed by this specification::
|
||||||
>>> x.cutprefix('') is x is x.cutsuffix('')
|
>>> x.cutprefix('') is x is x.cutsuffix('')
|
||||||
True
|
True
|
||||||
|
|
||||||
That is, for CPython's immutable ``str`` and ``bytes`` objects, the
|
To test whether any affixes were removed during the call, users
|
||||||
methods return the original object when the affix is not found or if
|
should use the constant-time behavior of comparing the lengths of
|
||||||
the affix is empty. Because these types test for equality using
|
original and new strings::
|
||||||
shortcuts for identity and length, the following equivalent
|
|
||||||
expressions are evaluated at approximately the same speed, for any
|
|
||||||
``str`` objects (or ``bytes`` objects) ``x`` and ``y``::
|
|
||||||
|
|
||||||
>>> (True, x[len(y):]) if x.startswith(y) else (False, x)
|
>>> string = 'Python String Input'
|
||||||
>>> (True, z) if x != (z := x.cutprefix(y)) else (False, x)
|
>>> new_string = string.cutprefix("Py")
|
||||||
|
>>> modified = (len(string) != len(new_string))
|
||||||
|
>>> modified
|
||||||
|
True
|
||||||
|
|
||||||
|
Users may also continue using ``startswith()`` and ``endswith()``
|
||||||
|
methods for control flow instead of testing the lengths as above.
|
||||||
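The length comparison can be wrapped in a small helper that reports whether anything was removed. This is a sketch under stated assumptions: ``cut_prefix`` is a hypothetical free-function stand-in for the proposed method (single ``str`` prefix only), and ``cut_prefix_reporting`` is an illustrative name, not part of the proposal.

```python
from typing import Tuple


def cut_prefix(s: str, prefix: str) -> str:
    """Hypothetical stand-in for the proposed ``s.cutprefix(prefix)``."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def cut_prefix_reporting(s: str, prefix: str) -> Tuple[bool, str]:
    """Return ``(modified, new_string)`` using the length comparison."""
    new_string = cut_prefix(s, prefix)
    # Comparing lengths is constant time and does not rely on identity,
    # which the specification leaves as an implementation detail.
    return len(new_string) != len(s), new_string
```

Note that an empty prefix is reported as "not modified", since the result has the same length as the input.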
|
|
||||||
The two methods will also be added to ``collections.UserString``,
|
The two methods will also be added to ``collections.UserString``, with
|
||||||
where they rely on the implementation of the new ``str`` methods.
|
similar behavior.
|
||||||
|
|
||||||
|
|
||||||
Motivating examples from the Python standard library
|
Motivating examples from the Python standard library
|
||||||
====================================================
|
====================================================
|
||||||
|
@ -113,43 +147,23 @@ Motivating examples from the Python standard library
|
||||||
The examples below demonstrate how the proposed methods can make code
|
The examples below demonstrate how the proposed methods can make code
|
||||||
one or more of the following:
|
one or more of the following:
|
||||||
|
|
||||||
Less fragile:
|
1. Less fragile:
|
||||||
The code will not depend on the user to count the length of a
|
|
||||||
|
- The code will not depend on the user to count the length of a
|
||||||
literal.
|
literal.
|
||||||
More performant:
|
|
||||||
The code does not require a call to the Python built-in
|
2. More performant:
|
||||||
``len`` function.
|
|
||||||
More descriptive:
|
- The code does not require a call to the Python built-in
|
||||||
The methods give a higher-level API for code readability, as
|
``len`` function, nor to the more expensive ``str.replace``
|
||||||
|
function.
|
||||||
|
|
||||||
|
3. More descriptive:
|
||||||
|
|
||||||
|
- The methods give a higher-level API for code readability, as
|
||||||
opposed to the traditional method of string slicing.
|
opposed to the traditional method of string slicing.
|
||||||
|
|
||||||
|
|
||||||
refactor.py
|
|
||||||
-----------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if fix_name.startswith(self.FILE_PREFIX):
|
|
||||||
fix_name = fix_name[len(self.FILE_PREFIX):]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
fix_name = fix_name.cutprefix(self.FILE_PREFIX)
|
|
||||||
|
|
||||||
|
|
||||||
c_annotations.py:
|
|
||||||
-----------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if name.startswith("c."):
|
|
||||||
name = name[2:]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
name = name.cutprefix("c.")
|
|
||||||
|
|
||||||
|
|
||||||
find_recursionlimit.py
|
find_recursionlimit.py
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
|
@ -162,7 +176,8 @@ find_recursionlimit.py
|
||||||
|
|
||||||
- Improved::
|
- Improved::
|
||||||
|
|
||||||
print(test_finc_name.cutprefix("test_"))
|
print(test_func_name.cutprefix("test_"))
|
||||||
|
|
||||||
|
|
||||||
deccheck.py
|
deccheck.py
|
||||||
-----------
|
-----------
|
||||||
|
@ -195,83 +210,6 @@ intended to be removed.
|
||||||
self.funcname = funcname.cutprefix("context.")
|
self.funcname = funcname.cutprefix("context.")
|
||||||
|
|
||||||
|
|
||||||
test_i18n.py
|
|
||||||
------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if test_func_name.startswith("test_"):
|
|
||||||
print(test_func_name[5:])
|
|
||||||
else:
|
|
||||||
print(test_func_name)
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
print(test_func_name.cutprefix("test_"))
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if creationDate.endswith('\\n'):
|
|
||||||
creationDate = creationDate[:-len('\\n')]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
creationDate = creationDate.cutsuffix('\\n')
|
|
||||||
|
|
||||||
|
|
||||||
shared_memory.py
|
|
||||||
----------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
reported_name = self._name
|
|
||||||
if _USE_POSIX and self._prepend_leading_slash:
|
|
||||||
if self._name.startswith("/"):
|
|
||||||
reported_name = self._name[1:]
|
|
||||||
return reported_name
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
if _USE_POSIX and self._prepend_leading_slash:
|
|
||||||
return self._name.cutprefix("/")
|
|
||||||
return self._name
|
|
||||||
|
|
||||||
|
|
||||||
build-installer.py
|
|
||||||
------------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if archiveName.endswith('.tar.gz'):
|
|
||||||
retval = os.path.basename(archiveName[:-7])
|
|
||||||
if ((retval.startswith('tcl') or retval.startswith('tk'))
|
|
||||||
and retval.endswith('-src')):
|
|
||||||
retval = retval[:-4]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
if archiveName.endswith('.tar.gz'):
|
|
||||||
retval = os.path.basename(archiveName[:-7])
|
|
||||||
if retval.startswith(('tcl', 'tk')):
|
|
||||||
retval = retval.cutsuffix('-src')
|
|
||||||
|
|
||||||
Depending on personal style, ``archiveName[:-7]`` could also be
|
|
||||||
changed to ``archiveName.cutsuffix('.tar.gz')``.
|
|
||||||
|
|
||||||
|
|
||||||
test_core.py
|
|
||||||
------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if output.endswith("\n"):
|
|
||||||
output = output[:-1]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
output = output.cutsuffix("\n")
|
|
||||||
|
|
||||||
|
|
||||||
cookiejar.py
|
cookiejar.py
|
||||||
------------
|
------------
|
||||||
|
|
||||||
|
@ -289,31 +227,6 @@ cookiejar.py
|
||||||
def strip_quotes(text):
|
def strip_quotes(text):
|
||||||
return text.cutprefix('"').cutsuffix('"')
|
return text.cutprefix('"').cutsuffix('"')
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if line.endswith("\n"): line = line[:-1]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
line = line.cutsuffix("\n")
|
|
||||||
|
|
||||||
|
|
||||||
fixdiv.py
|
|
||||||
---------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
def chop(line):
|
|
||||||
if line.endswith("\n"):
|
|
||||||
return line[:-1]
|
|
||||||
else:
|
|
||||||
return line
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
def chop(line):
|
|
||||||
return line.cutsuffix("\n")
|
|
||||||
|
|
||||||
|
|
||||||
test_concurrent_futures.py
|
test_concurrent_futures.py
|
||||||
--------------------------
|
--------------------------
|
||||||
|
@ -332,37 +245,10 @@ but in context, it behaves the same.
|
||||||
|
|
||||||
- Improved::
|
- Improved::
|
||||||
|
|
||||||
return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test')
|
return name.cutsuffix(('Mixin', 'Tests', 'Test'))
|
||||||
|
|
||||||
|
|
||||||
msvc9compiler.py
|
There were many other such examples in the stdlib.
|
||||||
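The tuple form and the chained form are not interchangeable in general: per the specification, a tuple removes at most one affix (the first option that matches), while chained calls may remove several. The sketch below illustrates the difference; ``cut_suffix`` is a hypothetical free function modelled on the specification's pseudo-code, and the sample name is invented for the example.

```python
def cut_suffix(s: str, suffix) -> str:
    """Hypothetical stand-in for ``s.cutsuffix(suffix)`` with tuple support."""
    options = suffix if isinstance(suffix, tuple) else (suffix,)
    for option in options:
        # Stop at the first non-empty option that matches.
        if option and s.endswith(option):
            return s[:-len(option)]
    return s


name = "ExecutorTestMixin"  # invented sample value

# Chained calls: each call gets a chance to remove its own suffix.
chained = cut_suffix(cut_suffix(cut_suffix(name, "Mixin"), "Tests"), "Test")

# Tuple call: only the first matching option is removed.
single_pass = cut_suffix(name, ("Mixin", "Tests", "Test"))
```

Here ``chained`` drops both ``Mixin`` and ``Test``, whereas ``single_pass`` drops only ``Mixin``, so the two spellings agree only when at most one suffix can apply.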
----------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
if value.endswith(os.pathsep):
|
|
||||||
value = value[:-1]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
value = value.cutsuffix(os.pathsep)
|
|
||||||
|
|
||||||
|
|
||||||
test_pathlib.py
|
|
||||||
---------------
|
|
||||||
|
|
||||||
- Current::
|
|
||||||
|
|
||||||
self.assertTrue(r.startswith(clsname + '('), r)
|
|
||||||
self.assertTrue(r.endswith(')'), r)
|
|
||||||
inner = r[len(clsname) + 1 : -1]
|
|
||||||
|
|
||||||
- Improved::
|
|
||||||
|
|
||||||
self.assertTrue(r.startswith(clsname + '('), r)
|
|
||||||
self.assertTrue(r.endswith(')'), r)
|
|
||||||
inner = r.cutprefix(clsname + '(').cutsuffix(')')
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Rejected Ideas
|
Rejected Ideas
|
||||||
|
@ -379,37 +265,27 @@ consistent, it would not be obvious for users to have to call
|
||||||
``'foobar'.cutprefix(('foo',))`` for the common use case of a
|
``'foobar'.cutprefix(('foo',))`` for the common use case of a
|
||||||
single prefix.
|
single prefix.
|
||||||
|
|
||||||
Allow multiple prefixes
|
|
||||||
-----------------------
|
|
||||||
|
|
||||||
Some users discussed the desire to be able to remove multiple
|
|
||||||
prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``.
|
|
||||||
However, this adds ambiguity about the order in which the prefixes are
|
|
||||||
removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``.
|
|
||||||
After this proposal, this can be spelled explicitly as
|
|
||||||
``s.cutprefix('Foo').cutprefix('FooBar')``.
|
|
||||||
|
|
||||||
Remove multiple copies of a prefix
|
Remove multiple copies of a prefix
|
||||||
----------------------------------
|
----------------------------------
|
||||||
|
|
||||||
This is the behavior that would be consistent with the aforementioned
|
This is the behavior that would be consistent with the aforementioned
|
||||||
expansion of the ``lstrip/rstrip`` API -- repeatedly applying the
|
expansion of the ``lstrip``/``rstrip`` API -- repeatedly applying the
|
||||||
function until the argument is unchanged. This behavior is attainable
|
function until the argument is unchanged. This behavior is attainable
|
||||||
from the proposed behavior via the following::
|
from the proposed behavior via the following::
|
||||||
|
|
||||||
>>> s = 'foo' * 100 + 'bar'
|
>>> s = 'foobar' * 100 + 'bar'
|
||||||
>>> while s != (s := s.cutprefix("foo")): pass
|
>>> prefixes = ('bar', 'foo')
|
||||||
|
>>> while len(s) != len(s := s.cutprefix(prefixes)): pass
|
||||||
>>> s
|
>>> s
|
||||||
'bar'
|
''
|
||||||
|
|
||||||
The above can be modified by chaining multiple ``cutprefix`` calls
|
or the more obvious and readable alternative::
|
||||||
together to achieve the full behavior of the ``lstrip``/``rstrip``
|
|
||||||
generalization, while being explicit in the order of removal.
|
|
||||||
|
|
||||||
While the proposed API could later be extended to include some of
|
>>> s = 'foo' * 100 + 'bar'
|
||||||
these use cases, to do so before any observation of how these methods
|
>>> prefixes = ('bar', 'foo')
|
||||||
are used in practice would be premature and may lead to choosing the
|
>>> while s.startswith(prefixes): s = s.cutprefix(prefixes)
|
||||||
wrong behavior.
|
>>> s
|
||||||
|
''
|
||||||
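The repeated-removal loop can also be written as a reusable function. This is a minimal sketch, assuming a hypothetical ``cut_prefix`` free function that follows the specification's pseudo-code (first matching option wins); ``cut_all_prefixes`` is an illustrative name only.

```python
def cut_prefix(s: str, prefix) -> str:
    """Hypothetical stand-in for ``s.cutprefix(prefix)`` with tuple support."""
    options = prefix if isinstance(prefix, tuple) else (prefix,)
    for option in options:
        if s.startswith(option):
            return s[len(option):]
    return s


def cut_all_prefixes(s: str, prefixes) -> str:
    """Apply ``cut_prefix`` until the string stops getting shorter."""
    while True:
        shorter = cut_prefix(s, prefixes)
        # An unchanged length means nothing (or only '') was removed.
        if len(shorter) == len(s):
            return s
        s = shorter
```

With a single prefix, ``cut_all_prefixes('foo' * 100 + 'bar', 'foo')`` leaves ``'bar'``. When ``'bar'`` is itself among the prefixes, the trailing ``'bar'`` is stripped as well and the result is the empty string.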
|
|
||||||
|
|
||||||
Raising an exception when not found
|
Raising an exception when not found
|
||||||
|
@ -427,28 +303,39 @@ Alternative Method Names
|
||||||
|
|
||||||
Several alternative method names have been proposed. Some are listed
|
Several alternative method names have been proposed. Some are listed
|
||||||
below, along with commentary for why they should be rejected in favor
|
below, along with commentary for why they should be rejected in favor
|
||||||
of ``cutprefix`` (the same arguments hold for ``cutsuffix``)
|
of ``cutprefix`` (the same arguments hold for ``cutsuffix``).
|
||||||
|
|
||||||
``ltrim``
|
- ``ltrim``
|
||||||
"Trim" does in other languages (e.g. JavaScript, Java, Go,
|
|
||||||
|
- "Trim" does in other languages (e.g. JavaScript, Java, Go,
|
||||||
PHP) what ``strip`` methods do in Python.
|
PHP) what ``strip`` methods do in Python.
|
||||||
``lstrip(string=...)``
|
|
||||||
This would avoid adding a new method, but for different
|
- ``lstrip(string=...)``
|
||||||
|
|
||||||
|
- This would avoid adding a new method, but for different
|
||||||
behavior, it's better to have two different methods than one
|
behavior, it's better to have two different methods than one
|
||||||
method with a keyword argument that selects the behavior.
|
method with a keyword argument that selects the behavior.
|
||||||
``cut_prefix``
|
|
||||||
All of the other methods of the string API, e.g.
|
- ``cut_prefix``
|
||||||
|
|
||||||
|
- All of the other methods of the string API, e.g.
|
||||||
``str.startswith()``, use ``lowercase`` rather than
|
``str.startswith()``, use ``lowercase`` rather than
|
||||||
``lower_case_with_underscores``.
|
``lower_case_with_underscores``.
|
||||||
``cutleft``, ``leftcut``, or ``lcut``
|
|
||||||
The explicitness of "prefix" is preferred.
|
- ``cutleft``, ``leftcut``, or ``lcut``
|
||||||
``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc.
|
|
||||||
All of these might have been acceptable, but they have more
|
- The explicitness of "prefix" is preferred.
|
||||||
|
|
||||||
|
- ``removeprefix``, ``deleteprefix``, ``withoutprefix``, ``dropprefix``, etc.
|
||||||
|
|
||||||
|
- All of these might have been acceptable, but they have more
|
||||||
characters than ``cut``. Some suggested that the verb "cut"
|
characters than ``cut``. Some suggested that the verb "cut"
|
||||||
implies mutability, but the string API already contains verbs
|
implies mutability, but the string API already contains verbs
|
||||||
like "replace", "strip", "split", and "swapcase".
|
like "replace", "strip", "split", and "swapcase".
|
||||||
``stripprefix``
|
|
||||||
Users may benefit from the mnemonic that "strip" means working
|
- ``stripprefix``
|
||||||
|
|
||||||
|
- Users may benefit from remembering that "strip" means working
|
||||||
with sets of characters, while other methods work with
|
with sets of characters, while other methods work with
|
||||||
substrings, so re-using "strip" here should be avoided.
|
substrings, so re-using "strip" here should be avoided.
|
||||||
|
|
||||||
|
|