PEP: 3126 Title: Remove Implicit String Concatenation Version: $Revision$ Last-Modified: $Date$ Author: Jim J. Jewett <JimJJewett@gmail.com>, Raymond Hettinger <python@rcn.com> Status: Rejected Type: Standards Track Content-Type: text/x-rst Created: 29-Apr-2007 Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007 Rejection Notice ================ This PEP is rejected. There wasn't enough support in favor, the feature to be removed isn't all that harmful, and there are some use cases that would become harder. Abstract ======== Python inherited many of its parsing rules from C. While this has been generally useful, there are some individual rules which are less useful for python, and should be eliminated. This PEP proposes to eliminate implicit string concatenation based only on the adjacency of literals. Instead of:: "abc" "def" == "abcdef" authors will need to be explicit, and either add the strings:: "abc" + "def" == "abcdef" or join them:: "".join(["abc", "def"]) == "abcdef" Motivation ========== One goal for Python 3000 should be to simplify the language by removing unnecessary features. Implicit string concatenation should be dropped in favor of existing techniques. This will simplify the grammar and simplify a user's mental picture of Python. The latter is important for letting the language "fit in your head". A large group of current users do not even know about implicit concatenation. Of those who do know about it, a large portion never use it or habitually avoid it. Of those who both know about it and use it, very few could state with confidence the implicit operator precedence and under what circumstances it is computed when the definition is compiled versus when it is run. History or Future ----------------- Many Python parsing rules are intentionally compatible with C. This is a useful default, but Special Cases need to be justified based on their utility in Python. We should no longer assume that python programmers will also be familiar with C, so compatibility between languages should be treated as a tie-breaker, rather than a justification. In C, implicit concatenation is the only way to join strings without using a (run-time) function call to store into a variable. In Python, the strings can be joined (and still recognized as immutable) using more standard Python idioms, such ``+`` or ``"".join``. Problem ------- Implicit String concatentation leads to tuples and lists which are shorter than they appear; this is turn can lead to confusing, or even silent, errors. For example, given a function which accepts several parameters, but offers a default value for some of them:: def f(fmt, *args): print fmt % args This looks like a valid call, but isn't:: >>> f("User %s got a message %s", "Bob" "Time for dinner") Traceback (most recent call last): File "<pyshell#8>", line 2, in <module> "Bob" File "<pyshell#3>", line 2, in f print fmt % args TypeError: not enough arguments for format string Calls to this function can silently do the wrong thing:: def g(arg1, arg2=None): ... # silently transformed into the possibly very different # g("arg1 on this linearg2 on this line", None) g("arg1 on this line" "arg2 on this line") To quote Jason Orendorff [#Orendorff] Oh. I just realized this happens a lot out here. Where I work, we use scons, and each SConscript has a long list of filenames:: sourceFiles = [ 'foo.c' 'bar.c', #...many lines omitted... 'q1000x.c'] It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is. Solution ======== In Python, strings are objects and they support the __add__ operator, so it is possible to write:: "abc" + "def" Because these are literals, this addition can still be optimized away by the compiler; the CPython compiler already does so. [#rcn-constantfold]_ Other existing alternatives include multiline (triple-quoted) strings, and the join method:: """This string extends across multiple lines, but you may want to use something like Textwrap.dedent to clear out the leading spaces and/or reformat. """ >>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner" True >>> " ".join(["space", "string", "joiner"]) == "space string joiner" >>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == ( """multiple lines""") True Concerns ======== Operator Precedence ------------------- Guido indicated [#rcn-constantfold]_ that this change should be handled by PEP, because there were a few edge cases with other string operators, such as the %. (Assuming that str % stays -- it may be eliminated in favor of PEP 3101 -- Advanced String Formatting. [#PEP3101]_ [#elimpercent]_) The resolution is to use parentheses to enforce precedence -- the same solution that can be used today:: # Clearest, works today, continues to work, optimization is # already possible. ("abc %s def" + "ghi") % var # Already works today; precedence makes the optimization more # difficult to recognize, but does not change the semantics. "abc" + "def %s ghi" % var as opposed to:: # Already fails because modulus (%) is higher precedence than # addition (+) ("abc %s def" + "ghi" % var) # Works today only because adjacency is higher precedence than # modulus. This will no longer be available. "abc %s" "def" % var # So the 2-to-3 translator can automically replace it with the # (already valid): ("abc %s" + "def") % var Long Commands ------------- ... build up (what I consider to be) readable SQL queries [#skipSQL]_:: rows = self.executesql("select cities.city, state, country" " from cities, venues, events, addresses" " where cities.city like %s" " and events.active = 1" " and venues.address = addresses.id" " and addresses.city = cities.id" " and events.venue = venues.id", (city,)) Alternatives again include triple-quoted strings, ``+``, and ``.join``:: query="""select cities.city, state, country from cities, venues, events, addresses where cities.city like %s and events.active = 1" and venues.address = addresses.id and addresses.city = cities.id and events.venue = venues.id""" query=( "select cities.city, state, country" + " from cities, venues, events, addresses" + " where cities.city like %s" + " and events.active = 1" + " and venues.address = addresses.id" + " and addresses.city = cities.id" + " and events.venue = venues.id" ) query="\n".join(["select cities.city, state, country", " from cities, venues, events, addresses", " where cities.city like %s", " and events.active = 1", " and venues.address = addresses.id", " and addresses.city = cities.id", " and events.venue = venues.id"]) # And yes, you *could* inline any of the above querystrings # the same way the original was inlined. rows = self.executesql(query, (city,)) Regular Expressions ------------------- Complex regular expressions are sometimes stated in terms of several implicitly concatenated strings with each regex component on a different line and followed by a comment. The plus operator can be inserted here but it does make the regex harder to read. One alternative is to use the re.VERBOSE option. Another alternative is to build-up the regex with a series of += lines:: # Existing idiom which relies on implicit concatenation r = ('a{20}' # Twenty A's 'b{5}' # Followed by Five B's ) # Mechanical replacement r = ('a{20}' +# Twenty A's 'b{5}' # Followed by Five B's ) # already works today r = '''a{20} # Twenty A's b{5} # Followed by Five B's ''' # Compiled with the re.VERBOSE flag # already works today r = 'a{20}' # Twenty A's r += 'b{5}' # Followed by Five B's Internationalization -------------------- Some internationalization tools -- notably xgettext -- have already been special-cased for implicit concatenation, but not for Python's explicit concatenation. [#barryi8]_ These tools will fail to extract the (already legal):: _("some string" + " and more of it") but often have a special case for:: _("some string" " and more of it") It should also be possible to just use an overly long line (xgettext limits messages to 2048 characters [#xgettext2048]_, which is less than Python's enforced limit) or triple-quoted strings, but these solutions sacrifice some readability in the code:: # Lines over a certain length are unpleasant. _("some string and more of it") # Changing whitespace is not ideal. _("""Some string and more of it""") _("""Some string and more of it""") _("Some string \ and more of it") I do not see a good short-term resolution for this. Transition ========== The proposed new constructs are already legal in current Python, and can be used immediately. The 2 to 3 translator can be made to mechanically change:: "str1" "str2" ("line1" #comment "line2") into:: ("str1" + "str2") ("line1" +#comments "line2") If users want to use one of the other idioms, they can; as these idioms are all already legal in python 2, the edits can be made to the original source, rather than patching up the translator. Open Issues =========== Is there a better way to support external text extraction tools, or at least ``xgettext`` [#gettext]_ in particular? References ========== .. [#Orendorff] Implicit String Concatenation, Orendorff http://mail.python.org/pipermail/python-ideas/2007-April/000397.html .. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger, van Rossum http://mail.python.org/pipermail/python-3000/2007-April/006563.html .. [#PEP3101] PEP 3101, Advanced String Formatting, Talin http://www.python.org/dev/peps/pep-3101/ .. [#elimpercent] ps to question Re: Need help completing ABC pep, van Rossum http://mail.python.org/pipermail/python-3000/2007-April/006737.html .. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip, http://mail.python.org/pipermail/python-3000/2007-May/007261.html .. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing http://mail.python.org/pipermail/python-3000/2007-May/007305.html .. [#gettext] GNU gettext manual http://www.gnu.org/software/gettext/ .. [#xgettext2048] Unix man page for xgettext -- Notes section http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: