388 lines
11 KiB
ReStructuredText
388 lines
11 KiB
ReStructuredText
PEP: 3126
|
||
Title: Remove Implicit String Concatenation
|
||
Version: $Revision$
|
||
Last-Modified: $Date$
|
||
Author: Jim J. Jewett <JimJJewett@gmail.com>,
|
||
Raymond Hettinger <python@rcn.com>
|
||
Status: Rejected
|
||
Type: Standards Track
|
||
Content-Type: text/x-rst
|
||
Created: 29-Apr-2007
|
||
Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
|
||
|
||
|
||
Rejection Notice
|
||
================
|
||
|
||
This PEP is rejected. There wasn't enough support in favor, the
|
||
feature to be removed isn't all that harmful, and there are some use
|
||
cases that would become harder.
|
||
|
||
|
||
Abstract
|
||
========
|
||
|
||
Python inherited many of its parsing rules from C. While this has
|
||
been generally useful, there are some individual rules which are less
|
||
useful for python, and should be eliminated.
|
||
|
||
This PEP proposes to eliminate implicit string concatenation based
|
||
only on the adjacency of literals.
|
||
|
||
Instead of::
|
||
|
||
"abc" "def" == "abcdef"
|
||
|
||
authors will need to be explicit, and either add the strings::
|
||
|
||
"abc" + "def" == "abcdef"
|
||
|
||
or join them::
|
||
|
||
"".join(["abc", "def"]) == "abcdef"
|
||
|
||
|
||
Motivation
|
||
==========
|
||
|
||
One goal for Python 3000 should be to simplify the language by
|
||
removing unnecessary features. Implicit string concatenation should
|
||
be dropped in favor of existing techniques. This will simplify the
|
||
grammar and simplify a user's mental picture of Python. The latter is
|
||
important for letting the language "fit in your head". A large group
|
||
of current users do not even know about implicit concatenation. Of
|
||
those who do know about it, a large portion never use it or habitually
|
||
avoid it. Of those who both know about it and use it, very few could
|
||
state with confidence the implicit operator precedence and under what
|
||
circumstances it is computed when the definition is compiled versus
|
||
when it is run.
|
||
|
||
|
||
History or Future
|
||
-----------------
|
||
|
||
Many Python parsing rules are intentionally compatible with C. This
|
||
is a useful default, but Special Cases need to be justified based on
|
||
their utility in Python. We should no longer assume that python
|
||
programmers will also be familiar with C, so compatibility between
|
||
languages should be treated as a tie-breaker, rather than a
|
||
justification.
|
||
|
||
In C, implicit concatenation is the only way to join strings without
|
||
using a (run-time) function call to store into a variable. In Python,
|
||
the strings can be joined (and still recognized as immutable) using
|
||
more standard Python idioms, such ``+`` or ``"".join``.
|
||
|
||
|
||
Problem
|
||
-------
|
||
|
||
Implicit String concatenation leads to tuples and lists which are
|
||
shorter than they appear; this is turn can lead to confusing, or even
|
||
silent, errors. For example, given a function which accepts several
|
||
parameters, but offers a default value for some of them::
|
||
|
||
def f(fmt, *args):
|
||
print fmt % args
|
||
|
||
This looks like a valid call, but isn't::
|
||
|
||
>>> f("User %s got a message %s",
|
||
"Bob"
|
||
"Time for dinner")
|
||
|
||
Traceback (most recent call last):
|
||
File "<pyshell#8>", line 2, in <module>
|
||
"Bob"
|
||
File "<pyshell#3>", line 2, in f
|
||
print fmt % args
|
||
TypeError: not enough arguments for format string
|
||
|
||
|
||
Calls to this function can silently do the wrong thing::
|
||
|
||
def g(arg1, arg2=None):
|
||
...
|
||
|
||
# silently transformed into the possibly very different
|
||
# g("arg1 on this linearg2 on this line", None)
|
||
g("arg1 on this line"
|
||
"arg2 on this line")
|
||
|
||
To quote Jason Orendorff [#Orendorff]
|
||
|
||
Oh. I just realized this happens a lot out here. Where I work,
|
||
we use scons, and each SConscript has a long list of filenames::
|
||
|
||
sourceFiles = [
|
||
'foo.c'
|
||
'bar.c',
|
||
#...many lines omitted...
|
||
'q1000x.c']
|
||
|
||
It's a common mistake to leave off a comma, and then scons
|
||
complains that it can't find 'foo.cbar.c'. This is pretty
|
||
bewildering behavior even if you *are* a Python programmer,
|
||
and not everyone here is.
|
||
|
||
|
||
Solution
|
||
========
|
||
|
||
In Python, strings are objects and they support the __add__ operator,
|
||
so it is possible to write::
|
||
|
||
"abc" + "def"
|
||
|
||
Because these are literals, this addition can still be optimized away
|
||
by the compiler; the CPython compiler already does so.
|
||
[#rcn-constantfold]_
|
||
|
||
Other existing alternatives include multiline (triple-quoted) strings,
|
||
and the join method::
|
||
|
||
"""This string
|
||
extends across
|
||
multiple lines, but you may want to use something like
|
||
Textwrap.dedent
|
||
to clear out the leading spaces
|
||
and/or reformat.
|
||
"""
|
||
|
||
|
||
>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
|
||
True
|
||
|
||
>>> " ".join(["space", "string", "joiner"]) == "space string joiner"
|
||
True
|
||
|
||
>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
|
||
"""multiple
|
||
lines""")
|
||
True
|
||
|
||
|
||
Concerns
|
||
========
|
||
|
||
|
||
Operator Precedence
|
||
-------------------
|
||
|
||
Guido indicated [#rcn-constantfold]_ that this change should be
|
||
handled by PEP, because there were a few edge cases with other string
|
||
operators, such as the %. (Assuming that str % stays -- it may be
|
||
eliminated in favor of :pep:`3101` -- Advanced String Formatting.
|
||
[#elimpercent]_)
|
||
|
||
The resolution is to use parentheses to enforce precedence -- the same
|
||
solution that can be used today::
|
||
|
||
# Clearest, works today, continues to work, optimization is
|
||
# already possible.
|
||
("abc %s def" + "ghi") % var
|
||
|
||
# Already works today; precedence makes the optimization more
|
||
# difficult to recognize, but does not change the semantics.
|
||
"abc" + "def %s ghi" % var
|
||
|
||
as opposed to::
|
||
|
||
# Already fails because modulus (%) is higher precedence than
|
||
# addition (+)
|
||
("abc %s def" + "ghi" % var)
|
||
|
||
# Works today only because adjacency is higher precedence than
|
||
# modulus. This will no longer be available.
|
||
"abc %s" "def" % var
|
||
|
||
# So the 2-to-3 translator can automatically replace it with the
|
||
# (already valid):
|
||
("abc %s" + "def") % var
|
||
|
||
|
||
Long Commands
|
||
-------------
|
||
|
||
... build up (what I consider to be) readable SQL queries [#skipSQL]_::
|
||
|
||
rows = self.executesql("select cities.city, state, country"
|
||
" from cities, venues, events, addresses"
|
||
" where cities.city like %s"
|
||
" and events.active = 1"
|
||
" and venues.address = addresses.id"
|
||
" and addresses.city = cities.id"
|
||
" and events.venue = venues.id",
|
||
(city,))
|
||
|
||
Alternatives again include triple-quoted strings, ``+``, and ``.join``::
|
||
|
||
query="""select cities.city, state, country
|
||
from cities, venues, events, addresses
|
||
where cities.city like %s
|
||
and events.active = 1"
|
||
and venues.address = addresses.id
|
||
and addresses.city = cities.id
|
||
and events.venue = venues.id"""
|
||
|
||
query=( "select cities.city, state, country"
|
||
+ " from cities, venues, events, addresses"
|
||
+ " where cities.city like %s"
|
||
+ " and events.active = 1"
|
||
+ " and venues.address = addresses.id"
|
||
+ " and addresses.city = cities.id"
|
||
+ " and events.venue = venues.id"
|
||
)
|
||
|
||
query="\n".join(["select cities.city, state, country",
|
||
" from cities, venues, events, addresses",
|
||
" where cities.city like %s",
|
||
" and events.active = 1",
|
||
" and venues.address = addresses.id",
|
||
" and addresses.city = cities.id",
|
||
" and events.venue = venues.id"])
|
||
|
||
# And yes, you *could* inline any of the above querystrings
|
||
# the same way the original was inlined.
|
||
rows = self.executesql(query, (city,))
|
||
|
||
|
||
Regular Expressions
|
||
-------------------
|
||
|
||
Complex regular expressions are sometimes stated in terms of several
|
||
implicitly concatenated strings with each regex component on a
|
||
different line and followed by a comment. The plus operator can be
|
||
inserted here but it does make the regex harder to read. One
|
||
alternative is to use the re.VERBOSE option. Another alternative is
|
||
to build-up the regex with a series of += lines::
|
||
|
||
# Existing idiom which relies on implicit concatenation
|
||
r = ('a{20}' # Twenty A's
|
||
'b{5}' # Followed by Five B's
|
||
)
|
||
|
||
# Mechanical replacement
|
||
r = ('a{20}' +# Twenty A's
|
||
'b{5}' # Followed by Five B's
|
||
)
|
||
|
||
# already works today
|
||
r = '''a{20} # Twenty A's
|
||
b{5} # Followed by Five B's
|
||
''' # Compiled with the re.VERBOSE flag
|
||
|
||
# already works today
|
||
r = 'a{20}' # Twenty A's
|
||
r += 'b{5}' # Followed by Five B's
|
||
|
||
|
||
Internationalization
|
||
--------------------
|
||
|
||
Some internationalization tools -- notably xgettext -- have already
|
||
been special-cased for implicit concatenation, but not for Python's
|
||
explicit concatenation. [#barryi8]_
|
||
|
||
These tools will fail to extract the (already legal)::
|
||
|
||
_("some string" +
|
||
" and more of it")
|
||
|
||
but often have a special case for::
|
||
|
||
_("some string"
|
||
" and more of it")
|
||
|
||
It should also be possible to just use an overly long line (xgettext
|
||
limits messages to 2048 characters [#xgettext2048]_, which is less
|
||
than Python's enforced limit) or triple-quoted strings, but these
|
||
solutions sacrifice some readability in the code::
|
||
|
||
# Lines over a certain length are unpleasant.
|
||
_("some string and more of it")
|
||
|
||
# Changing whitespace is not ideal.
|
||
_("""Some string
|
||
and more of it""")
|
||
_("""Some string
|
||
and more of it""")
|
||
_("Some string \
|
||
and more of it")
|
||
|
||
I do not see a good short-term resolution for this.
|
||
|
||
|
||
Transition
|
||
==========
|
||
|
||
The proposed new constructs are already legal in current Python, and
|
||
can be used immediately.
|
||
|
||
The 2 to 3 translator can be made to mechanically change::
|
||
|
||
"str1" "str2"
|
||
("line1" #comment
|
||
"line2")
|
||
|
||
into::
|
||
|
||
("str1" + "str2")
|
||
("line1" +#comments
|
||
"line2")
|
||
|
||
If users want to use one of the other idioms, they can; as these
|
||
idioms are all already legal in python 2, the edits can be made
|
||
to the original source, rather than patching up the translator.
|
||
|
||
|
||
Open Issues
|
||
===========
|
||
|
||
Is there a better way to support external text extraction tools, or at
|
||
least ``xgettext`` [#gettext]_ in particular?
|
||
|
||
|
||
References
|
||
==========
|
||
|
||
.. [#Orendorff] Implicit String Concatenation, Orendorff
|
||
https://mail.python.org/pipermail/python-ideas/2007-April/000397.html
|
||
|
||
.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
|
||
van Rossum
|
||
https://mail.python.org/pipermail/python-3000/2007-April/006563.html
|
||
|
||
.. [#elimpercent] ps to question Re: Need help completing ABC pep,
|
||
van Rossum
|
||
https://mail.python.org/pipermail/python-3000/2007-April/006737.html
|
||
|
||
.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
|
||
https://mail.python.org/pipermail/python-3000/2007-May/007261.html
|
||
|
||
.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
|
||
https://mail.python.org/pipermail/python-3000/2007-May/007305.html
|
||
|
||
.. [#gettext] GNU gettext manual
|
||
http://www.gnu.org/software/gettext/
|
||
|
||
.. [#xgettext2048] Unix man page for xgettext -- Notes section
|
||
http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
|
||
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document has been placed in the public domain.
|
||
|
||
|
||
|
||
..
|
||
Local Variables:
|
||
mode: indented-text
|
||
indent-tabs-mode: nil
|
||
sentence-end-double-space: t
|
||
fill-column: 70
|
||
coding: utf-8
|
||
End:
|