From 914ef2c51a6ae6e07b930675994a172e42fc580e Mon Sep 17 00:00:00 2001 From: sweeneyde <36520290+sweeneyde@users.noreply.github.com> Date: Fri, 20 Mar 2020 12:21:06 -0400 Subject: [PATCH] PEP 616: String methods to remove prefixes and suffixes (#1332) * Add a PEP: String methods to remove prefixes and suffixes * add PEP number * Add sponsor * Fix typo --- pep-0616.rst | 488 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 488 insertions(+) create mode 100644 pep-0616.rst diff --git a/pep-0616.rst b/pep-0616.rst new file mode 100644 index 000000000..11b5d56db --- /dev/null +++ b/pep-0616.rst @@ -0,0 +1,488 @@ +PEP: 616 +Title: String methods to remove prefixes and suffixes +Author: Dennis Sweeney +Sponsor: Eric V. Smith +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 19-Mar-2020 +Python-Version: 3.9 +Post-History: 30-Aug-2002 + + +Abstract +======== + +This is a proposal to add two new methods, ``cutprefix`` and +``cutsuffix``, to the APIs of Python's various string objects. In +particular, the methods would be added to Unicode ``str`` objects, +binary ``bytes`` and ``bytearray`` objects, and +``collections.UserString``. + +If ``s`` is one these objects, and ``s`` has ``pre`` as a prefix, then +``s.cutprefix(pre)`` returns a copy of ``s`` in which that prefix has +been removed. If ``s`` does not have ``pre`` as a prefix, an +unchanged copy of ``s`` is returned. In summary, ``s.cutprefix(pre)`` +is roughly equivalent to ``s[len(pre):] if s.startswith(pre) else s``. + +The behavior of ``cutsuffix`` is analogous: ``s.cutsuffix(suf)`` is +roughly equivalent to +``s[:-len(suf)] if suf and s.endswith(suf) else s``. + + +Rationale +========= + +There have been repeated issues [#confusion]_ on the Bug Tracker +and StackOverflow related to user confusion about the existing +``str.lstrip`` and ``str.rstrip`` methods. These users are typically +expecting the behavior of ``cutprefix`` and ``cutsuffix``, but they +are surprised that the parameter for ``lstrip`` is interpreted as a +set of characters, not a substring. This repeated issue is evidence +that these methods are useful, and the new methods allow a cleaner +redirection of users to the desired behavior. + +As another testimonial for the usefulness of these methods, several +users on Python-Ideas [#pyid]_ reported frequently including similar +functions in their own code for productivity. The implementation +often contained subtle mistakes regarding the handling of the empty +string (see `Specification`_). + + +Specification +============= + +The builtin ``str`` class will gain two new methods with roughly the +following behavior:: + + def cutprefix(self: str, pre: str, /) -> str: + if self.startswith(pre): + return self[len(pre):] + return self[:] + + def cutsuffix(self: str, suf: str, /) -> str: + if suf and self.endswith(suf): + return self[:-len(suf)] + return self[:] + +The only difference between the real implementation and the above is +that, as with other string methods like ``replace``, the +methods will raise a ``TypeError`` if any of ``self``, ``pre`` or +``suf`` is not an instace of ``str``, and will cast subclasses of +``str`` to builtin ``str`` objects. + +Note that without the check for the truthyness of ``suf``, +``s.cutsuffix('')`` would be mishandled and always return the empty +string due to the unintended evaluation of ``self[:-0]``. + +Methods with the corresponding semantics will be added to the builtin +``bytes`` and ``bytearray`` objects. If ``b`` is either a ``bytes`` +or ``bytearray`` object, then ``b.cutsuffix()`` and ``b.cutprefix()`` +will accept any bytes-like object as an argument. + +Note that the ``bytearray`` methods return a copy of ``self``; they do +not operate in place. + +The following behavior is considered a CPython implementation detail, +but is not guaranteed by this specification:: + + >>> x = 'foobar' * 10**6 + >>> x.cutprefix('baz') is x is x.cutsuffix('baz') + True + >>> x.cutprefix('') is x is x.cutsuffix('') + True + +That is, for CPython's immutable ``str`` and ``bytes`` objects, the +methods return the original object when the affix is not found or if +the affix is empty. Because these types test for equality using +shortcuts for identity and length, the following equivalent +expressions are evaluated at approximately the same speed, for any +``str`` objects (or ``bytes`` objects) ``x`` and ``y``:: + + >>> (True, x[len(y):]) if x.startswith(y) else (False, x) + >>> (True, z) if x != (z := x.cutprefix(y)) else (False, x) + + +The two methods will also be added to ``collections.UserString``, +where they rely on the implementation of the new ``str`` methods. + + +Motivating examples from the Python standard library +==================================================== + +The examples below demonstrate how the proposed methods can make code +one or more of the following: + + Less fragile: + The code will not depend on the user to count the length of a + literal. + More performant: + The code does not require a call to the Python built-in + ``len`` function. + More descriptive: + The methods give a higher-level API for code readability, as + opposed to the traditional method of string slicing. + + +refactor.py +----------- + +- Current:: + + if fix_name.startswith(self.FILE_PREFIX): + fix_name = fix_name[len(self.FILE_PREFIX):] + +- Improved:: + + fix_name = fix_name.cutprefix(self.FILE_PREFIX) + + +c_annotations.py: +----------------- + +- Current:: + + if name.startswith("c."): + name = name[2:] + +- Improved:: + + name = name.cutprefix("c.") + + +find_recursionlimit.py +---------------------- + +- Current:: + + if test_func_name.startswith("test_"): + print(test_func_name[5:]) + else: + print(test_func_name) + +- Improved:: + + print(test_finc_name.cutprefix("test_")) + +deccheck.py +----------- + +This is an interesting case because the author chose to use the +``str.replace`` method in a situation where only a prefix was +intended to be removed. + +- Current:: + + if funcname.startswith("context."): + self.funcname = funcname.replace("context.", "") + self.contextfunc = True + else: + self.funcname = funcname + self.contextfunc = False + +- Improved:: + + if funcname.startswith("context."): + self.funcname = funcname.cutprefix("context.") + self.contextfunc = True + else: + self.funcname = funcname + self.contextfunc = False + +- Arguably further improved:: + + self.contextfunc = funcname.startswith("context.") + self.funcname = funcname.cutprefix("context.") + + +test_i18n.py +------------ + +- Current:: + + if test_func_name.startswith("test_"): + print(test_func_name[5:]) + else: + print(test_func_name) + +- Improved:: + + print(test_finc_name.cutprefix("test_")) + +- Current:: + + if creationDate.endswith('\\n'): + creationDate = creationDate[:-len('\\n')] + +- Improved:: + + creationDate = creationDate.cutsuffix('\\n') + + +shared_memory.py +---------------- + +- Current:: + + reported_name = self._name + if _USE_POSIX and self._prepend_leading_slash: + if self._name.startswith("/"): + reported_name = self._name[1:] + return reported_name + +- Improved:: + + if _USE_POSIX and self._prepend_leading_slash: + return self._name.cutprefix("/") + return self._name + + +build-installer.py +------------------ + +- Current:: + + if archiveName.endswith('.tar.gz'): + retval = os.path.basename(archiveName[:-7]) + if ((retval.startswith('tcl') or retval.startswith('tk')) + and retval.endswith('-src')): + retval = retval[:-4] + +- Improved:: + + if archiveName.endswith('.tar.gz'): + retval = os.path.basename(archiveName[:-7]) + if retval.startswith(('tcl', 'tk')): + retval = retval.cutsuffix('-src') + +Depending on personal style, ``archiveName[:-7]`` could also be +changed to ``archiveName.cutsuffix('.tar.gz')``. + + +test_core.py +------------ + +- Current:: + + if output.endswith("\n"): + output = output[:-1] + +- Improved:: + + output = output.cutsuffix("\n") + + +cookiejar.py +------------ + +- Current:: + + def strip_quotes(text): + if text.startswith('"'): + text = text[1:] + if text.endswith('"'): + text = text[:-1] + return text + +- Improved:: + + def strip_quotes(text): + return text.cutprefix('"').cutsuffix('"') + +- Current:: + + if line.endswith("\n"): line = line[:-1] + +- Improved:: + + line = line.cutsuffix("\n") + + +fixdiv.py +--------- + +- Current:: + + def chop(line): + if line.endswith("\n"): + return line[:-1] + else: + return line + +- Improved:: + + def chop(line): + return line.cutsuffix("\n") + + +test_concurrent_futures.py +-------------------------- + +In the following example, the meaning of the code changes slightly, +but in context, it behaves the same. + +- Current:: + + if name.endswith(('Mixin', 'Tests')): + return name[:-5] + elif name.endswith('Test'): + return name[:-4] + else: + return name + +- Improved:: + + return name.cutsuffix('Mixin').cutsuffix('Tests').cutsuffix('Test') + + +msvc9compiler.py +---------------- + +- Current:: + + if value.endswith(os.pathsep): + value = value[:-1] + +- Improved:: + + value = value.cutsuffix(os.pathsep) + + +test_pathlib.py +--------------- + +- Current:: + + self.assertTrue(r.startswith(clsname + '('), r) + self.assertTrue(r.endswith(')'), r) + inner = r[len(clsname) + 1 : -1] + +- Improved:: + + self.assertTrue(r.startswith(clsname + '('), r) + self.assertTrue(r.endswith(')'), r) + inner = r.cutprefix(clsname + '(').cutsuffix(')') + + + +Rejected Ideas +============== + +Expand the lstrip and rstrip APIs +--------------------------------- + +Because ``lstrip`` takes a string as its argument, it could be viewed +as taking an iterable of length-1 strings. The API could therefore be +generalized to accept any iterable of strings, which would be +successively removed as prefixes. While this behavior would be +consistent, it would not be obvious for users to have to call +``'foobar'.cutprefix(('foo,))`` for the common use case of a +single prefix. + +Allow multiple prefixes +----------------------- + +Some users discussed the desire to be able to remove multiple +prefixes, calling, for example, ``s.cutprefix('From: ', 'CC: ')``. +However, this adds ambiguity about the order in which the prefixes are +removed, especially in cases like ``s.cutprefix('Foo', 'FooBar')``. +After this proposal, this can be spelled explicitly as +``s.cutprefix('Foo').cutprefix('FooBar')``. + +Remove multiple copies of a prefix +---------------------------------- + +This is the behavior that would be consistent with the aforementioned +expansion of the ``lstrip/rstrip`` API -- repeatedly applying the +function until the argument is unchanged. This behavior is attainable +from the proposed behavior via the following:: + + >>> s = 'foo' * 100 + 'bar' + >>> while s != (s := s.cutprefix("foo")): pass + >>> s + 'bar' + +The above can be modififed by chaining multiple ``cutprefix`` calls +together to achieve the full behavior of the ``lstrip``/``rstrip`` +generalization, while being explicit in the order of removal. + +While the proposed API could later be extended to include some of +these use cases, to do so before any observation of how these methods +are used in practice would be premature and may lead to choosing the +wrong behavior. + + +Raising an exception when not found +----------------------------------- + +There was a suggestion that ``s.cutprefix(pre)`` should raise an +exception if ``not s.startswith(pre)``. However, this does not match +with the behavior and feel of other string methods. There could be +``required=False`` keyword added, but this violates the KISS +principle. + + +Alternative Method Names +------------------------ + +Several alternatives method names have been proposed. Some are listed +below, along with commentary for why they should be rejected in favor +of ``cutprefix`` (the same arguments hold for ``cutsuffix``) + + ``ltrim`` + "Trim" does in other languages (e.g. JavaScript, Java, Go, + PHP) what ``strip`` methods do in Python. + ``lstrip(string=...)`` + This would avoid adding a new method, but for different + behavior, it's better to have two different methods than one + method with a keyword argument that select the behavior. + ``cut_prefix`` + All of the other methods of the string API, e.g. + ``str.startswith()``, use ``lowercase`` rather than + ``lower_case_with_underscores``. + ``cutleft``, ``leftcut``, or ``lcut`` + The explicitness of "prefix" is preferred. + ``removeprefix``, ``deleteprefix``, ``withoutprefix``, etc. + All of these might have been acceptable, but they have more + characters than ``cut``. Some suggested that the verb "cut" + implies mutability, but the string API already contains verbs + like "replace", "strip", "split", and "swapcase". + ``stripprefix`` + Users may benefit from the mnemonic that "strip" means working + with sets of characters, while other methods work with + substrings, so re-using "strip" here should be avoided. + + +Reference Implementation +======================== + +See the pull request on GitHub [#pr]_. + + +References +========== + +.. [#pr] GitHub pull request with implementation + (https://github.com/python/cpython/pull/18939) +.. [#pyid] Discussion on Python-Ideas + (https://mail.python.org/archives/list/python-ideas@python.org/thread/RJARZSUKCXRJIP42Z2YBBAEN5XA7KEC3/) +.. [#confusion] Comment listing Bug Tracker and StackOverflow issues + (https://mail.python.org/archives/list/python-ideas@python.org/message/GRGAFIII3AX22K3N3KT7RB4DPBY3LPVG/) + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: