New version, Raymond is now co-author.
parent
9ebf76db3a
commit
4426f2d7c8
405
pep-3126.txt
|
@@ -2,111 +2,380 @@ PEP: 3126
|
||||||
Title: Remove Implicit String Concatenation
|
Title: Remove Implicit String Concatenation
|
||||||
Version: $Revision$
|
Version: $Revision$
|
||||||
Last-Modified: $Date$
|
Last-Modified: $Date$
|
||||||
Author: Jim J. Jewett <JimJJewett@gmail.com>
|
Author: Jim J. Jewett <JimJJewett@gmail.com>,
|
||||||
|
Raymond D. Hettinger <python at rcn.com>
|
||||||
Status: Draft
|
Status: Draft
|
||||||
Type: Standards Track
|
Type: Standards Track
|
||||||
Content-Type: text/plain
|
Content-Type: text/x-rst
|
||||||
Created: 29-Apr-2007
|
Created: 29-Apr-2007
|
||||||
Post-History: 29-Apr-2007, 30-Apr-2007
|
Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
|
||||||
|
|
||||||
|
|
||||||
Abstract
|
Abstract
|
||||||
|
========
|
||||||
|
|
||||||
Python initially inherited its parsing from C. While this has
|
Python inherited many of its parsing rules from C. While this has
|
||||||
been generally useful, there are some remnants which have been
|
been generally useful, there are some individual rules which are less
|
||||||
less useful for python, and should be eliminated.
|
useful for Python, and should be eliminated.
|
||||||
|
|
||||||
This PEP proposes to eliminate Implicit String concatenation
|
This PEP proposes to eliminate implicit string concatenation based
|
||||||
based on adjacency of literals.
|
only on the adjacency of literals.
|
||||||
|
|
||||||
Instead of
|
Instead of::
|
||||||
|
|
||||||
"abc" "def" == "abcdef"
|
"abc" "def" == "abcdef"
|
||||||
|
|
||||||
authors will need to be explicit, and add the strings
|
authors will need to be explicit, and either add the strings::
|
||||||
|
|
||||||
"abc" + "def" == "abcdef"
|
"abc" + "def" == "abcdef"
|
||||||
|
|
||||||
|
or join them::
|
||||||
|
|
||||||
|
"".join(["abc", "def"]) == "abcdef"
|
||||||
|
|
||||||
|
|
||||||
Rationale for Removing Implicit String Concatenation
|
Motivation
|
||||||
|
==========
|
||||||
|
|
||||||
Implicit String concatenation can lead to confusing, or even
|
One goal for Python 3000 should be to simplify the language by
|
||||||
silent, errors.
|
removing unnecessary features. Implicit string concatenation should
|
||||||
|
be dropped in favor of existing techniques. This will simplify the
|
||||||
|
grammar and simplify a user's mental picture of Python. The latter is
|
||||||
|
important for letting the language "fit in your head". A large group
|
||||||
|
of current users do not even know about implicit concatenation. Of
|
||||||
|
those who do know about it, a large portion never use it or habitually
|
||||||
|
avoid it. Of those who both know about it and use it, very few could
|
||||||
|
state with confidence the implicit operator precedence and under what
|
||||||
|
circumstances it is computed when the definition is compiled versus
|
||||||
|
when it is run.
|
||||||
|
|
||||||
def f(arg1, arg2=None): pass
|
|
||||||
|
|
||||||
f("abc" "def") # forgot the comma, no warning ...
|
History or Future
|
||||||
# silently becomes f("abcdef", None)
|
-----------------
|
||||||
|
|
||||||
or, using the scons build framework,
|
Many Python parsing rules are intentionally compatible with C. This
|
||||||
|
is a useful default, but special cases need to be justified based on
|
||||||
|
their utility in Python. We should no longer assume that Python
|
||||||
|
programmers will also be familiar with C, so compatibility between
|
||||||
|
languages should be treated as a tie-breaker, rather than a
|
||||||
|
justification.
|
||||||
|
|
||||||
|
In C, implicit concatenation is the only way to join strings without
|
||||||
|
using a (run-time) function call to store into a variable. In Python,
|
||||||
|
the strings can be joined (and still recognized as immutable) using
|
||||||
|
more standard Python idioms, such as ``+`` or ``"".join``.
|
||||||
|
|
||||||
|
|
||||||
|
Problem
|
||||||
|
-------
|
||||||
|
|
||||||
|
Implicit string concatenation leads to tuples and lists which are
|
||||||
|
shorter than they appear; this in turn can lead to confusing, or even
|
||||||
|
silent, errors. For example, given a function which accepts a format
|
||||||
|
string followed by a variable number of arguments::
|
||||||
|
|
||||||
|
def f(fmt, *args):
|
||||||
|
    print fmt % args
|
||||||
|
|
||||||
|
This looks like a valid call, but isn't::
|
||||||
|
|
||||||
|
>>> f("User %s got a message %s",
|
||||||
|
"Bob"
|
||||||
|
"Time for dinner")
|
||||||
|
|
||||||
|
Traceback (most recent call last):
|
||||||
|
File "<pyshell#8>", line 2, in <module>
|
||||||
|
"Bob"
|
||||||
|
File "<pyshell#3>", line 2, in f
|
||||||
|
print fmt % args
|
||||||
|
TypeError: not enough arguments for format string
|
||||||
|
|
||||||
|
|
||||||
|
Calls to this function can silently do the wrong thing::
|
||||||
|
|
||||||
|
def g(arg1, arg2=None):
|
||||||
|
    ...
|
||||||
|
|
||||||
|
# silently transformed into the possibly very different
|
||||||
|
# g("arg1 on this linearg2 on this line", None)
|
||||||
|
g("arg1 on this line"
|
||||||
|
"arg2 on this line")
|
||||||
|
|
||||||
|
To quote Jason Orendorff [#Orendorff]_:
|
||||||
|
|
||||||
|
Oh. I just realized this happens a lot out here. Where I work,
|
||||||
|
we use scons, and each SConscript has a long list of filenames::
|
||||||
|
|
||||||
sourceFiles = [
|
sourceFiles = [
|
||||||
'foo.c'
|
'foo.c'
|
||||||
'bar.c',
|
'bar.c',
|
||||||
#...many lines omitted...
|
#...many lines omitted...
|
||||||
'q1000x.c']
|
'q1000x.c']
|
||||||
|
|
||||||
It's a common mistake to leave off a comma, and then scons complains
|
It's a common mistake to leave off a comma, and then scons
|
||||||
that it can't find 'foo.cbar.c'. This is pretty bewildering behavior
|
complains that it can't find 'foo.cbar.c'. This is pretty
|
||||||
even if you *are* a Python programmer, and not everyone here is. [1]
|
bewildering behavior even if you *are* a Python programmer,
|
||||||
|
and not everyone here is.
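
The effect described in the quote is easy to reproduce. The following is a minimal sketch, not taken from the original message; the name ``source_files`` and the entry ``baz.c`` are made up for illustration:

```python
# Hypothetical file list: the comma after 'foo.c' was left off.
source_files = [
    'foo.c'
    'bar.c',
    'baz.c',
]

# The list looks like it has three entries, but it has only two,
# and the first one names a file that does not exist.
print(len(source_files))   # 2
print(source_files[0])     # foo.cbar.c
```

No exception or warning is raised when the list is built; the problem only surfaces later, when something tries to open ``foo.cbar.c``.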
|
||||||
|
|
||||||
Note that in C, the implicit concatenation is more justified; there
|
|
||||||
is no other way to join strings without (at least) a function call.
|
|
||||||
|
|
||||||
In Python, strings are objects which support the __add__ operator;
|
Solution
|
||||||
it is possible to write:
|
========
|
||||||
|
|
||||||
"abc" + "def"
|
In Python, strings are objects that support ``+`` (via ``__add__``),
|
||||||
|
so it is possible to write::
|
||||||
|
|
||||||
Because these are literals, this addition can still be optimized
|
"abc" + "def"
|
||||||
away by the compiler. (The CPython compiler already does. [2])
|
|
||||||
|
|
||||||
Guido indicated [2] that this change should be handled by PEP, because
|
Because these are literals, this addition can still be optimized away
|
||||||
there were a few edge cases with other string operators, such as the %.
|
by the compiler; the CPython compiler already does so.
|
||||||
(Assuming that str % stays -- it may be eliminated in favor of
|
[#rcn-constantfold]_
|
||||||
PEP 3101 -- Advanced String Formatting. [3] [4])
|
|
||||||
|
Other existing alternatives include multiline (triple-quoted) strings,
|
||||||
The resolution is to treat them the same as today.
|
and the join method::
|
||||||
|
|
||||||
|
"""This string
|
||||||
|
extends across
|
||||||
|
multiple lines, but you may want to use something like
|
||||||
|
textwrap.dedent
|
||||||
|
to clear out the leading spaces
|
||||||
|
and/or reformat.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
|
||||||
|
True
|
||||||
|
|
||||||
|
>>> " ".join(["space", "string", "joiner"]) == "space string joiner"
True
|
||||||
|
|
||||||
|
>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
|
||||||
|
"""multiple
|
||||||
|
lines""")
|
||||||
|
True
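
The triple-quoted example above mentions ``textwrap.dedent``. As a rough sketch of how that might look in practice (the function name ``usage_text`` and the message are hypothetical), an indented multiline literal can be cleaned up like this:

```python
import textwrap


def usage_text():
    # The backslash after the opening quotes suppresses the first newline,
    # so every line of the literal carries the same eight-space indent.
    text = """\
        This string
        extends across
        multiple lines."""
    # dedent() strips the whitespace prefix common to all lines.
    return textwrap.dedent(text)


print(usage_text())
```

This keeps the literal aligned with the surrounding code without leaving the leading spaces in the resulting string.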
|
||||||
|
|
||||||
|
|
||||||
|
Concerns
|
||||||
|
========
|
||||||
|
|
||||||
|
|
||||||
|
Operator Precedence
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
Guido indicated [#rcn-constantfold]_ that this change should be
|
||||||
|
handled by PEP, because there were a few edge cases with other string
|
||||||
|
operators, such as the %. (Assuming that str % stays -- it may be
|
||||||
|
eliminated in favor of PEP 3101 -- Advanced String Formatting.
|
||||||
|
[#PEP3101]_ [#elimpercent]_)
|
||||||
|
|
||||||
|
The resolution is to use parentheses to enforce precedence -- the same
|
||||||
|
solution that can be used today::
|
||||||
|
|
||||||
|
# Clearest, works today, continues to work, optimization is
|
||||||
|
# already possible.
|
||||||
|
("abc %s def" + "ghi") % var
|
||||||
|
|
||||||
|
# Already works today; precedence makes the optimization more
|
||||||
|
# difficult to recognize, but does not change the semantics.
|
||||||
|
"abc" + "def %s ghi" % var
|
||||||
|
|
||||||
|
as opposed to::
|
||||||
|
|
||||||
|
# Already fails because modulus (%) is higher precedence than
|
||||||
|
# addition (+)
|
||||||
|
("abc %s def" + "ghi" % var)
|
||||||
|
|
||||||
|
# Works today only because adjacency is higher precedence than
|
||||||
|
# modulus. This will no longer be available.
|
||||||
|
"abc %s" "def" % var
|
||||||
|
|
||||||
|
# So the 2-to-3 translator can automatically replace it with the
|
||||||
|
# (already valid):
|
||||||
|
("abc %s" + "def") % var
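
The precedence cases above can be checked directly. This is only an illustrative sketch; the value bound to ``var`` and the other variable names are arbitrary:

```python
var = "X"

# Adjacency binds tighter than %, so the two literals are joined first.
adjacent = "abc %s" "def" % var

# The explicit spelling needs parentheses to get the same grouping.
explicit = ("abc %s" + "def") % var
print(adjacent == explicit)   # True

# Without parentheses, % is applied to "ghi" alone, which has no
# conversion specifier, so formatting fails before + is evaluated.
try:
    "abc %s def" + "ghi" % var
except TypeError as exc:
    print("TypeError:", exc)
```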
|
||||||
|
|
||||||
|
|
||||||
|
Long Commands
|
||||||
|
-------------
|
||||||
|
|
||||||
|
... build up (what I consider to be) readable SQL queries [#skipSQL]_::
|
||||||
|
|
||||||
|
rows = self.executesql("select cities.city, state, country"
|
||||||
|
" from cities, venues, events, addresses"
|
||||||
|
" where cities.city like %s"
|
||||||
|
" and events.active = 1"
|
||||||
|
" and venues.address = addresses.id"
|
||||||
|
" and addresses.city = cities.id"
|
||||||
|
" and events.venue = venues.id",
|
||||||
|
(city,))
|
||||||
|
|
||||||
|
Alternatives again include triple-quoted strings, ``+``, and ``.join``::
|
||||||
|
|
||||||
|
query="""select cities.city, state, country
|
||||||
|
from cities, venues, events, addresses
|
||||||
|
where cities.city like %s
|
||||||
|
and events.active = 1
|
||||||
|
and venues.address = addresses.id
|
||||||
|
and addresses.city = cities.id
|
||||||
|
and events.venue = venues.id"""
|
||||||
|
|
||||||
|
query=( "select cities.city, state, country"
|
||||||
|
+ " from cities, venues, events, addresses"
|
||||||
|
+ " where cities.city like %s"
|
||||||
|
+ " and events.active = 1"
|
||||||
|
+ " and venues.address = addresses.id"
|
||||||
|
+ " and addresses.city = cities.id"
|
||||||
|
+ " and events.venue = venues.id"
|
||||||
|
)
|
||||||
|
|
||||||
|
query="\n".join(["select cities.city, state, country",
|
||||||
|
" from cities, venues, events, addresses",
|
||||||
|
" where cities.city like %s",
|
||||||
|
" and events.active = 1",
|
||||||
|
" and venues.address = addresses.id",
|
||||||
|
" and addresses.city = cities.id",
|
||||||
|
" and events.venue = venues.id"])
|
||||||
|
|
||||||
|
# And yes, you *could* inline any of the above querystrings
|
||||||
|
# the same way the original was inlined.
|
||||||
|
rows = self.executesql(query, (city,))
|
||||||
|
|
||||||
|
|
||||||
|
Regular Expressions
|
||||||
|
-------------------
|
||||||
|
|
||||||
|
Complex regular expressions are sometimes stated in terms of several
|
||||||
|
implicitly concatenated strings with each regex component on a
|
||||||
|
different line and followed by a comment. The plus operator can be
|
||||||
|
inserted here but it does make the regex harder to read. One
|
||||||
|
alternative is to use the re.VERBOSE option. Another alternative is
|
||||||
|
to build up the regex with a series of += lines::
|
||||||
|
|
||||||
|
# Existing idiom which relies on implicit concatenation
|
||||||
|
r = ('a{20}' # Twenty A's
|
||||||
|
'b{5}' # Followed by Five B's
|
||||||
|
)
|
||||||
|
|
||||||
|
# Mechanical replacement
|
||||||
|
r = ('a{20}' + # Twenty A's
|
||||||
|
'b{5}' # Followed by Five B's
|
||||||
|
)
|
||||||
|
|
||||||
|
# already works today
|
||||||
|
r = '''a{20} # Twenty A's
|
||||||
|
b{5} # Followed by Five B's
|
||||||
|
''' # Compiled with the re.VERBOSE flag
|
||||||
|
|
||||||
|
# already works today
|
||||||
|
r = 'a{20}' # Twenty A's
|
||||||
|
r += 'b{5}' # Followed by Five B's
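
As a sanity check, the spellings above should all describe the same pattern. The sketch below assumes a sample input of twenty ``a`` characters followed by five ``b`` characters; the variable names are chosen for illustration only:

```python
import re

# Explicit concatenation with the plus operator.
explicit = ('a{20}' +   # Twenty A's
            'b{5}'      # Followed by five B's
            )

# Verbose form: whitespace and "#" comments inside the pattern are ignored.
verbose = re.compile(r"""a{20}   # Twenty A's
                         b{5}    # Followed by five B's
                      """, re.VERBOSE)

# Incremental form built up with +=.
incremental = 'a{20}'
incremental += 'b{5}'

sample = 'a' * 20 + 'b' * 5
print(explicit == incremental)                    # True
print(bool(re.fullmatch(explicit, sample)))       # True
print(bool(verbose.fullmatch(sample)))            # True
```

With ``re.VERBOSE`` the comments live inside the pattern string itself, so a literal space or ``#`` in the pattern must be escaped or placed in a character class.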
|
||||||
|
|
||||||
|
|
||||||
|
Internationalization
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Some internationalization tools -- notably xgettext -- have already
|
||||||
|
been special-cased for implicit concatenation, but not for Python's
|
||||||
|
explicit concatenation. [#barryi8]_
|
||||||
|
|
||||||
|
These tools will fail to extract the (already legal)::
|
||||||
|
|
||||||
|
_("some string" +
|
||||||
|
" and more of it")
|
||||||
|
|
||||||
|
but often have a special case for::
|
||||||
|
|
||||||
|
_("some string"
|
||||||
|
" and more of it")
|
||||||
|
|
||||||
|
It should also be possible to just use an overly long line (xgettext
|
||||||
|
limits messages to 2048 characters [#xgettext2048]_, which is less
|
||||||
|
than Python's enforced limit) or triple-quoted strings, but these
|
||||||
|
solutions sacrifice some readability in the code::
|
||||||
|
|
||||||
|
# Lines over a certain length are unpleasant.
|
||||||
|
_("some string and more of it")
|
||||||
|
|
||||||
|
# Changing whitespace is not ideal.
|
||||||
|
_("""Some string
|
||||||
|
and more of it""")
|
||||||
|
_("""Some string
|
||||||
|
and more of it""")
|
||||||
|
_("Some string \
|
||||||
|
and more of it")
|
||||||
|
|
||||||
|
I do not see a good short-term resolution for this.
|
||||||
|
|
||||||
|
|
||||||
|
Transition
|
||||||
|
==========
|
||||||
|
|
||||||
|
The proposed new constructs are already legal in current Python, and
|
||||||
|
can be used immediately.
|
||||||
|
|
||||||
|
The 2 to 3 translator can be made to mechanically change::
|
||||||
|
|
||||||
|
"str1" "str2"
|
||||||
|
("line1" #comment
|
||||||
|
"line2")
|
||||||
|
|
||||||
|
into::
|
||||||
|
|
||||||
|
("str1" + "str2")
|
||||||
|
("line1" + #comment
|
||||||
|
"line2")
|
||||||
|
|
||||||
|
If users want to use one of the other idioms, they can; as these
|
||||||
|
idioms are all already legal in Python 2, the edits can be made
|
||||||
|
to the original source, rather than patching up the translator.
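
To find the places that need such edits, one possible approach is to scan the token stream for a string literal that directly follows another. The sketch below is not the 2 to 3 translator's actual implementation; ``find_implicit_concatenations`` is a hypothetical helper built on the standard ``tokenize`` module, and it only considers plain ``STRING`` tokens:

```python
import io
import tokenize


def find_implicit_concatenations(source):
    """Return (row, col) of each string literal that directly follows another."""
    # Comments and non-logical newlines may sit between adjacent literals.
    ignorable = {tokenize.NL, tokenize.COMMENT}
    hits = []
    previous = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in ignorable:
            continue
        if (tok.type == tokenize.STRING
                and previous is not None
                and previous.type == tokenize.STRING):
            hits.append(tok.start)
        previous = tok
    return hits


sample = 'x = ["foo.c"\n     "bar.c",\n     "baz.c"]\n'
print(find_implicit_concatenations(sample))   # [(2, 5)]
```

Each reported position marks the second literal of an adjacent pair, which is where a ``+`` or a comma would have to be inserted.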
|
||||||
|
|
||||||
|
|
||||||
|
Open Issues
|
||||||
|
===========
|
||||||
|
|
||||||
|
Is there a better way to support external text extraction tools, or at
|
||||||
|
least ``xgettext`` [#gettext]_ in particular?
|
||||||
|
|
||||||
("abc %s def" + "ghi" % var) # fails like today.
|
|
||||||
# raises TypeError because of
|
|
||||||
# precedence. (% before +)
|
|
||||||
|
|
||||||
("abc" + "def %s ghi" % var) # works like today; precedence makes
|
|
||||||
# the optimization more difficult to
|
|
||||||
# recognize, but does not change the
|
|
||||||
# semantics.
|
|
||||||
|
|
||||||
("abc %s def" + "ghi") % var # works like today, because of
|
|
||||||
# precedence: () before %
|
|
||||||
# CPython compiler can already
|
|
||||||
# add the literals at compile-time.
|
|
||||||
|
|
||||||
|
|
||||||
References
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
[1] Implicit String Concatenation, Jewett, Orendorff
|
.. [#Orendorff] Implicit String Concatenation, Orendorff
|
||||||
http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
|
http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
|
||||||
|
|
||||||
[2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
|
.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
|
||||||
http://mail.python.org/pipermail/python-3000/2007-April/006563.html
|
van Rossum
|
||||||
|
http://mail.python.org/pipermail/python-3000/2007-April/006563.html
|
||||||
|
|
||||||
[3] PEP 3101, Advanced String Formatting, Talin
|
.. [#PEP3101] PEP 3101, Advanced String Formatting, Talin
|
||||||
http://www.python.org/peps/pep-3101.html
|
http://www.python.org/peps/pep-3101.html
|
||||||
|
|
||||||
|
.. [#elimpercent] ps to question Re: Need help completing ABC pep,
|
||||||
|
van Rossum
|
||||||
|
http://mail.python.org/pipermail/python-3000/2007-April/006737.html
|
||||||
|
|
||||||
|
.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
|
||||||
|
http://mail.python.org/pipermail/python-3000/2007-May/007261.html
|
||||||
|
|
||||||
|
.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
|
||||||
|
http://mail.python.org/pipermail/python-3000/2007-May/007305.html
|
||||||
|
|
||||||
|
.. [#gettext] GNU gettext manual
|
||||||
|
http://www.gnu.org/software/gettext/
|
||||||
|
|
||||||
|
.. [#xgettext2048] Unix man page for xgettext -- Notes section
|
||||||
|
http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
|
||||||
|
|
||||||
[4] ps to question Re: Need help completing ABC pep, van Rossum
|
|
||||||
http://mail.python.org/pipermail/python-3000/2007-April/006737.html
|
|
||||||
|
|
||||||
Copyright
|
Copyright
|
||||||
|
=========
|
||||||
|
|
||||||
This document has been placed in the public domain.
|
This document has been placed in the public domain.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Local Variables:
|
..
|
||||||
mode: indented-text
|
Local Variables:
|
||||||
indent-tabs-mode: nil
|
mode: indented-text
|
||||||
sentence-end-double-space: t
|
indent-tabs-mode: nil
|
||||||
fill-column: 70
|
sentence-end-double-space: t
|
||||||
coding: utf-8
|
fill-column: 70
|
||||||
End:
|
coding: utf-8
|
||||||
|
End:
|
||||||
|
|