New version, Raymond is now co-author.
This commit is contained in:
parent
9ebf76db3a
commit
4426f2d7c8
405
pep-3126.txt
405
pep-3126.txt
|
@ -2,111 +2,380 @@ PEP: 3126
|
|||
Title: Remove Implicit String Concatenation
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Jim J. Jewett <JimJJewett@gmail.com>
|
||||
Author: Jim J. Jewett <JimJJewett@gmail.com>,
|
||||
Raymond D. Hettinger <python at rcn.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/plain
|
||||
Content-Type: text/x-rst
|
||||
Created: 29-Apr-2007
|
||||
Post-History: 29-Apr-2007, 30-Apr-2007
|
||||
Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
Python initially inherited its parsing from C. While this has
|
||||
been generally useful, there are some remnants which have been
|
||||
less useful for python, and should be eliminated.
|
||||
Python inherited many of its parsing rules from C. While this has
|
||||
been generally useful, there are some individual rules which are less
|
||||
useful for python, and should be eliminated.
|
||||
|
||||
This PEP proposes to eliminate Implicit String concatenation
|
||||
based on adjacency of literals.
|
||||
This PEP proposes to eliminate implicit string concatenation based
|
||||
only on the adjacency of literals.
|
||||
|
||||
Instead of
|
||||
Instead of::
|
||||
|
||||
"abc" "def" == "abcdef"
|
||||
"abc" "def" == "abcdef"
|
||||
|
||||
authors will need to be explicit, and add the strings
|
||||
authors will need to be explicit, and either add the strings::
|
||||
|
||||
"abc" + "def" == "abcdef"
|
||||
"abc" + "def" == "abcdef"
|
||||
|
||||
or join them::
|
||||
|
||||
"".join(["abc", "def"]) == "abcdef"
|
||||
|
||||
|
||||
Rationale for Removing Implicit String Concatenation
|
||||
Motivation
|
||||
==========
|
||||
|
||||
Implicit String concatentation can lead to confusing, or even
|
||||
silent, errors.
|
||||
One goal for Python 3000 should be to simplify the language by
|
||||
removing unnecessary features. Implicit string concatenation should
|
||||
be dropped in favor of existing techniques. This will simplify the
|
||||
grammar and simplify a user's mental picture of Python. The latter is
|
||||
important for letting the language "fit in your head". A large group
|
||||
of current users do not even know about implicit concatenation. Of
|
||||
those who do know about it, a large portion never use it or habitually
|
||||
avoid it. Of those who both know about it and use it, very few could
|
||||
state with confidence the implicit operator precedence and under what
|
||||
circumstances it is computed when the definition is compiled versus
|
||||
when it is run.
|
||||
|
||||
def f(arg1, arg2=None): pass
|
||||
|
||||
f("abc" "def") # forgot the comma, no warning ...
|
||||
# silently becomes f("abcdef", None)
|
||||
History or Future
|
||||
-----------------
|
||||
|
||||
or, using the scons build framework,
|
||||
Many Python parsing rules are intentionally compatible with C. This
|
||||
is a useful default, but Special Cases need to be justified based on
|
||||
their utility in Python. We should no longer assume that python
|
||||
programmers will also be familiar with C, so compatibility between
|
||||
languages should be treated as a tie-breaker, rather than a
|
||||
justification.
|
||||
|
||||
In C, implicit concatenation is the only way to join strings without
|
||||
using a (run-time) function call to store into a variable. In Python,
|
||||
the strings can be joined (and still recognized as immutable) using
|
||||
more standard Python idioms, such ``+`` or ``"".join``.
|
||||
|
||||
|
||||
Problem
|
||||
-------
|
||||
|
||||
Implicit String concatentation leads to tuples and lists which are
|
||||
shorter than they appear; this is turn can lead to confusing, or even
|
||||
silent, errors. For example, given a function which accepts several
|
||||
parameters, but offers a default value for some of them::
|
||||
|
||||
def f(fmt, *args):
|
||||
print fmt % args
|
||||
|
||||
This looks like a valid call, but isn't::
|
||||
|
||||
>>> f("User %s got a message %s",
|
||||
"Bob"
|
||||
"Time for dinner")
|
||||
|
||||
Traceback (most recent call last):
|
||||
File "<pyshell#8>", line 2, in <module>
|
||||
"Bob"
|
||||
File "<pyshell#3>", line 2, in f
|
||||
print fmt % args
|
||||
TypeError: not enough arguments for format string
|
||||
|
||||
|
||||
Calls to this function can silently do the wrong thing::
|
||||
|
||||
def g(arg1, arg2=None):
|
||||
...
|
||||
|
||||
# silently transformed into the possibly very different
|
||||
# g("arg1 on this linearg2 on this line", None)
|
||||
g("arg1 on this line"
|
||||
"arg2 on this line")
|
||||
|
||||
To quote Jason Orendorff [#Orendorff]
|
||||
|
||||
Oh. I just realized this happens a lot out here. Where I work,
|
||||
we use scons, and each SConscript has a long list of filenames::
|
||||
|
||||
sourceFiles = [
|
||||
'foo.c'
|
||||
'bar.c',
|
||||
#...many lines omitted...
|
||||
'q1000x.c']
|
||||
'foo.c'
|
||||
'bar.c',
|
||||
#...many lines omitted...
|
||||
'q1000x.c']
|
||||
|
||||
It's a common mistake to leave off a comma, and then scons complains
|
||||
that it can't find 'foo.cbar.c'. This is pretty bewildering behavior
|
||||
even if you *are* a Python programmer, and not everyone here is. [1]
|
||||
It's a common mistake to leave off a comma, and then scons
|
||||
complains that it can't find 'foo.cbar.c'. This is pretty
|
||||
bewildering behavior even if you *are* a Python programmer,
|
||||
and not everyone here is.
|
||||
|
||||
Note that in C, the implicit concatenation is more justified; there
|
||||
is no other way to join strings without (at least) a function call.
|
||||
|
||||
In Python, strings are objects which support the __add__ operator;
|
||||
it is possible to write:
|
||||
Solution
|
||||
========
|
||||
|
||||
"abc" + "def"
|
||||
In Python, strings are objects and they support the __add__ operator,
|
||||
so it is possible to write::
|
||||
|
||||
Because these are literals, this addition can still be optimized
|
||||
away by the compiler. (The CPython compiler already does. [2])
|
||||
"abc" + "def"
|
||||
|
||||
Guido indicated [2] that this change should be handled by PEP, because
|
||||
there were a few edge cases with other string operators, such as the %.
|
||||
(Assuming that str % stays -- it may be eliminated in favor of
|
||||
PEP 3101 -- Advanced String Formatting. [3] [4])
|
||||
|
||||
The resolution is to treat them the same as today.
|
||||
Because these are literals, this addition can still be optimized away
|
||||
by the compiler; the CPython compiler already does so.
|
||||
[#rcn-constantfold]_
|
||||
|
||||
Other existing alternatives include multiline (triple-quoted) strings,
|
||||
and the join method::
|
||||
|
||||
"""This string
|
||||
extends across
|
||||
multiple lines, but you may want to use something like
|
||||
Textwrap.dedent
|
||||
to clear out the leading spaces
|
||||
and/or reformat.
|
||||
"""
|
||||
|
||||
|
||||
>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
|
||||
True
|
||||
|
||||
>>> " ".join(["space", "string", "joiner"]) == "space string joiner"
|
||||
|
||||
>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
|
||||
"""multiple
|
||||
lines""")
|
||||
True
|
||||
|
||||
|
||||
Concerns
|
||||
========
|
||||
|
||||
|
||||
Operator Precedence
|
||||
-------------------
|
||||
|
||||
Guido indicated [#rcn-constantfold]_ that this change should be
|
||||
handled by PEP, because there were a few edge cases with other string
|
||||
operators, such as the %. (Assuming that str % stays -- it may be
|
||||
eliminated in favor of PEP 3101 -- Advanced String Formatting.
|
||||
[#PEP3101]_ [#elimpercent]_)
|
||||
|
||||
The resolution is to use parentheses to enforce precedence -- the same
|
||||
solution that can be used today::
|
||||
|
||||
# Clearest, works today, continues to work, optimization is
|
||||
# already possible.
|
||||
("abc %s def" + "ghi") % var
|
||||
|
||||
# Already works today; precedence makes the optimization more
|
||||
# difficult to recognize, but does not change the semantics.
|
||||
"abc" + "def %s ghi" % var
|
||||
|
||||
as opposed to::
|
||||
|
||||
# Already fails because modulus (%) is higher precedence than
|
||||
# addition (+)
|
||||
("abc %s def" + "ghi" % var)
|
||||
|
||||
# Works today only because adjacency is higher precedence than
|
||||
# modulus. This will no longer be available.
|
||||
"abc %s" "def" % var
|
||||
|
||||
# So the 2-to-3 translator can automically replace it with the
|
||||
# (already valid):
|
||||
("abc %s" + "def") % var
|
||||
|
||||
|
||||
Long Commands
|
||||
-------------
|
||||
|
||||
... build up (what I consider to be) readable SQL queries [#skipSQL]_::
|
||||
|
||||
rows = self.executesql("select cities.city, state, country"
|
||||
" from cities, venues, events, addresses"
|
||||
" where cities.city like %s"
|
||||
" and events.active = 1"
|
||||
" and venues.address = addresses.id"
|
||||
" and addresses.city = cities.id"
|
||||
" and events.venue = venues.id",
|
||||
(city,))
|
||||
|
||||
Alternatives again include triple-quoted strings, ``+``, and ``.join``::
|
||||
|
||||
query="""select cities.city, state, country
|
||||
from cities, venues, events, addresses
|
||||
where cities.city like %s
|
||||
and events.active = 1"
|
||||
and venues.address = addresses.id
|
||||
and addresses.city = cities.id
|
||||
and events.venue = venues.id"""
|
||||
|
||||
query=( "select cities.city, state, country"
|
||||
+ " from cities, venues, events, addresses"
|
||||
+ " where cities.city like %s"
|
||||
+ " and events.active = 1"
|
||||
+ " and venues.address = addresses.id"
|
||||
+ " and addresses.city = cities.id"
|
||||
+ " and events.venue = venues.id"
|
||||
)
|
||||
|
||||
query="\n".join(["select cities.city, state, country",
|
||||
" from cities, venues, events, addresses",
|
||||
" where cities.city like %s",
|
||||
" and events.active = 1",
|
||||
" and venues.address = addresses.id",
|
||||
" and addresses.city = cities.id",
|
||||
" and events.venue = venues.id"])
|
||||
|
||||
# And yes, you *could* inline any of the above querystrings
|
||||
# the same way the original was inlined.
|
||||
rows = self.executesql(query, (city,))
|
||||
|
||||
|
||||
Regular Expressions
|
||||
-------------------
|
||||
|
||||
Complex regular expressions are sometimes stated in terms of several
|
||||
implicitly concatenated strings with each regex component on a
|
||||
different line and followed by a comment. The plus operator can be
|
||||
inserted here but it does make the regex harder to read. One
|
||||
alternative is to use the re.VERBOSE option. Another alternative is
|
||||
to build-up the regex with a series of += lines::
|
||||
|
||||
# Existing idiom which relies on implicit concatenation
|
||||
r = ('a{20}' # Twenty A's
|
||||
'b{5}' # Followed by Five B's
|
||||
)
|
||||
|
||||
# Mechanical replacement
|
||||
r = ('a{20}' +# Twenty A's
|
||||
'b{5}' # Followed by Five B's
|
||||
)
|
||||
|
||||
# already works today
|
||||
r = '''a{20} # Twenty A's
|
||||
b{5} # Followed by Five B's
|
||||
''' # Compiled with the re.VERBOSE flag
|
||||
|
||||
# already works today
|
||||
r = 'a{20}' # Twenty A's
|
||||
r += 'b{5}' # Followed by Five B's
|
||||
|
||||
|
||||
Internationalization
|
||||
--------------------
|
||||
|
||||
Some internationalization tools -- notably xgettext -- have already
|
||||
been special-cased for implicit concatenation, but not for Python's
|
||||
explicit concatenation. [#barryi8]_
|
||||
|
||||
These tools will fail to extract the (already legal)::
|
||||
|
||||
_("some string" +
|
||||
" and more of it")
|
||||
|
||||
but often have a special case for::
|
||||
|
||||
_("some string"
|
||||
" and more of it")
|
||||
|
||||
It should also be possible to just use an overly long line (xgettext
|
||||
limits messages to 2048 characters [#xgettext2048]_, which is less
|
||||
than Python's enforced limit) or triple-quoted strings, but these
|
||||
solutions sacrifice some readability in the code::
|
||||
|
||||
# Lines over a certain length are unpleasant.
|
||||
_("some string and more of it")
|
||||
|
||||
# Changing whitespace is not ideal.
|
||||
_("""Some string
|
||||
and more of it""")
|
||||
_("""Some string
|
||||
and more of it""")
|
||||
_("Some string \
|
||||
and more of it")
|
||||
|
||||
I do not see a good short-term resolution for this.
|
||||
|
||||
|
||||
Transition
|
||||
==========
|
||||
|
||||
The proposed new constructs are already legal in current Python, and
|
||||
can be used immediately.
|
||||
|
||||
The 2 to 3 translator can be made to mechanically change::
|
||||
|
||||
"str1" "str2"
|
||||
("line1" #comment
|
||||
"line2")
|
||||
|
||||
into::
|
||||
|
||||
("str1" + "str2")
|
||||
("line1" +#comments
|
||||
"line2")
|
||||
|
||||
If users want to use one of the other idioms, they can; as these
|
||||
idioms are all already legal in python 2, the edits can be made
|
||||
to the original source, rather than patching up the translator.
|
||||
|
||||
|
||||
Open Issues
|
||||
===========
|
||||
|
||||
Is there a better way to support external text extraction tools, or at
|
||||
least ``xgettext`` [#gettext]_ in particular?
|
||||
|
||||
("abc %s def" + "ghi" % var) # fails like today.
|
||||
# raises TypeError because of
|
||||
# precedence. (% before +)
|
||||
|
||||
("abc" + "def %s ghi" % var) # works like today; precedence makes
|
||||
# the optimization more difficult to
|
||||
# recognize, but does not change the
|
||||
# semantics.
|
||||
|
||||
("abc %s def" + "ghi") % var # works like today, because of
|
||||
# precedence: () before %
|
||||
# CPython compiler can already
|
||||
# add the literals at compile-time.
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
[1] Implicit String Concatenation, Jewett, Orendorff
|
||||
http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
|
||||
.. [#Orendorff] Implicit String Concatenation, Orendorff
|
||||
http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
|
||||
|
||||
[2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
|
||||
http://mail.python.org/pipermail/python-3000/2007-April/006563.html
|
||||
.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
|
||||
van Rossum
|
||||
http://mail.python.org/pipermail/python-3000/2007-April/006563.html
|
||||
|
||||
[3] PEP 3101, Advanced String Formatting, Talin
|
||||
http://www.python.org/peps/pep-3101.html
|
||||
.. [#PEP3101] PEP 3101, Advanced String Formatting, Talin
|
||||
http://www.python.org/peps/pep-3101.html
|
||||
|
||||
.. [#elimpercent] ps to question Re: Need help completing ABC pep,
|
||||
van Rossum
|
||||
http://mail.python.org/pipermail/python-3000/2007-April/006737.html
|
||||
|
||||
.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
|
||||
http://mail.python.org/pipermail/python-3000/2007-May/007261.html
|
||||
|
||||
.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
|
||||
http://mail.python.org/pipermail/python-3000/2007-May/007305.html
|
||||
|
||||
.. [#gettext] GNU gettext manual
|
||||
http://www.gnu.org/software/gettext/
|
||||
|
||||
.. [#xgettext2048] Unix man page for xgettext -- Notes section
|
||||
http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
|
||||
|
||||
[4] ps to question Re: Need help completing ABC pep, van Rossum
|
||||
http://mail.python.org/pipermail/python-3000/2007-April/006737.html
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
||||
|
|
Loading…
Reference in New Issue