New version, Raymond is now co-author.

Guido van Rossum 2007-05-07 18:05:23 +00:00
parent 9ebf76db3a
commit 4426f2d7c8
1 changed files with 337 additions and 68 deletions


@@ -2,43 +2,109 @@ PEP: 3126
Title: Remove Implicit String Concatenation
Version: $Revision$
Last-Modified: $Date$
Author: Jim J. Jewett <JimJJewett@gmail.com>
Author: Jim J. Jewett <JimJJewett@gmail.com>,
Raymond D. Hettinger <python at rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Content-Type: text/x-rst
Created: 29-Apr-2007
Post-History: 29-Apr-2007, 30-Apr-2007
Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007
Abstract
========
Python initially inherited its parsing from C. While this has
been generally useful, there are some remnants which have been
less useful for python, and should be eliminated.
Python inherited many of its parsing rules from C. While this has
been generally useful, there are some individual rules which are less
useful for Python, and should be eliminated.
This PEP proposes to eliminate Implicit String concatenation
based on adjacency of literals.
This PEP proposes to eliminate implicit string concatenation based
only on the adjacency of literals.
Instead of
Instead of::
"abc" "def" == "abcdef"
authors will need to be explicit, and add the strings
authors will need to be explicit, and either add the strings::
"abc" + "def" == "abcdef"
or join them::
Rationale for Removing Implicit String Concatenation
"".join(["abc", "def"]) == "abcdef"
Implicit string concatenation can lead to confusing, or even
silent, errors.
def f(arg1, arg2=None): pass
Motivation
==========
f("abc" "def") # forgot the comma, no warning ...
# silently becomes f("abcdef", None)
One goal for Python 3000 should be to simplify the language by
removing unnecessary features. Implicit string concatenation should
be dropped in favor of existing techniques. This will simplify the
grammar and simplify a user's mental picture of Python. The latter is
important for letting the language "fit in your head". A large group
of current users do not even know about implicit concatenation. Of
those who do know about it, a large portion never use it or habitually
avoid it. Of those who both know about it and use it, very few could
state with confidence the implicit operator precedence and under what
circumstances it is computed when the definition is compiled versus
when it is run.
or, using the scons build framework,
History or Future
-----------------
Many Python parsing rules are intentionally compatible with C. This
is a useful default, but special cases need to be justified based on
their utility in Python. We should no longer assume that Python
programmers will also be familiar with C, so compatibility between
languages should be treated as a tie-breaker, rather than a
justification.
In C, implicit concatenation is the only way to join strings without
using a (run-time) function call to store into a variable. In Python,
the strings can be joined (and still recognized as immutable) using
more standard Python idioms, such as ``+`` or ``"".join``.
Problem
-------
Implicit string concatenation leads to tuples and lists which are
shorter than they appear; this in turn can lead to confusing, or even
silent, errors. For example, given a function which accepts several
parameters, but offers a default value for some of them::
def f(fmt, *args):
print fmt % args
This looks like a valid call, but isn't::
>>> f("User %s got a message %s",
"Bob"
"Time for dinner")
Traceback (most recent call last):
File "<pyshell#8>", line 2, in <module>
"Bob"
File "<pyshell#3>", line 2, in f
print fmt % args
TypeError: not enough arguments for format string
Calls to this function can silently do the wrong thing::
def g(arg1, arg2=None):
...
# silently transformed into the possibly very different
# g("arg1 on this linearg2 on this line", None)
g("arg1 on this line"
"arg2 on this line")
To quote Jason Orendorff [#Orendorff]_
Oh. I just realized this happens a lot out here. Where I work,
we use scons, and each SConscript has a long list of filenames::
sourceFiles = [
'foo.c'
@@ -46,67 +112,270 @@ Rationale for Removing Implicit String Concatenation
#...many lines omitted...
'q1000x.c']
It's a common mistake to leave off a comma, and then scons complains
that it can't find 'foo.cbar.c'. This is pretty bewildering behavior
even if you *are* a Python programmer, and not everyone here is. [1]
It's a common mistake to leave off a comma, and then scons
complains that it can't find 'foo.cbar.c'. This is pretty
bewildering behavior even if you *are* a Python programmer,
and not everyone here is.
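The effect described in the quote can be reproduced in a few lines.
This is a minimal sketch written for Python 3; the file names
``bar.c`` and ``baz.c`` are hypothetical and only stand in for the
omitted entries, and ``g`` mirrors the two-argument function shown
earlier.

```python
# Hypothetical file list; the comma after 'foo.c' is missing on purpose.
source_files = [
    'foo.c'
    'bar.c',
    'baz.c',
]

# The list looks like three entries but holds only two.
print(len(source_files))
print(source_files[0])


def g(arg1, arg2=None):
    """Return the arguments exactly as received."""
    return arg1, arg2


# Same mistake in a call: arg2 silently keeps its default value.
received = g("arg1 on this line"
             "arg2 on this line")
print(received)
```

Neither case raises an error at the point of the mistake, which is why
the problem tends to surface later and far from its cause.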
Note that in C, the implicit concatenation is more justified; there
is no other way to join strings without (at least) a function call.
In Python, strings are objects which support the __add__ operator;
it is possible to write:
Solution
========
In Python, strings are objects and they support the __add__ operator,
so it is possible to write::
"abc" + "def"
Because these are literals, this addition can still be optimized
away by the compiler. (The CPython compiler already does. [2])
Because these are literals, this addition can still be optimized away
by the compiler; the CPython compiler already does so.
[#rcn-constantfold]_
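One way to observe the folding is to inspect the constants of a
compiled code object. The sketch below assumes a current CPython 3
interpreter; the helper name ``folds_literal_addition`` is made up for
this illustration, and other implementations are free to report
``False`` without changing the result of the expression.

```python
import platform


def folds_literal_addition():
    """Report whether 'abcdef' appears as a single compiled constant."""
    code = compile('x = "abc" + "def"', "<example>", "exec")
    return "abcdef" in code.co_consts


# The value is the same whether or not the compiler folds the addition.
namespace = {}
exec('x = "abc" + "def"', namespace)
print(namespace["x"])

# Folding is an implementation detail, so report it per implementation.
print(platform.python_implementation(), folds_literal_addition())
```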
Guido indicated [2] that this change should be handled by PEP, because
there were a few edge cases with other string operators, such as the %.
(Assuming that str % stays -- it may be eliminated in favor of
PEP 3101 -- Advanced String Formatting. [3] [4])
Other existing alternatives include multiline (triple-quoted) strings,
and the join method::
The resolution is to treat them the same as today.
"""This string
extends across
multiple lines, but you may want to use something like
textwrap.dedent
to clear out the leading spaces
and/or reformat.
"""
("abc %s def" + "ghi" % var) # fails like today.
# raises TypeError because of
# precedence. (% before +)
("abc" + "def %s ghi" % var) # works like today; precedence makes
# the optimization more difficult to
# recognize, but does not change the
# semantics.
>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
True
("abc %s def" + "ghi") % var # works like today, because of
# precedence: () before %
# CPython compiler can already
# add the literals at compile-time.
>>> " ".join(["space", "string", "joiner"]) == "space string joiner"
True
>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
"""multiple
lines""")
True
Concerns
========
Operator Precedence
-------------------
Guido indicated [#rcn-constantfold]_ that this change should be
handled by PEP, because there were a few edge cases with other string
operators, such as the %. (Assuming that str % stays -- it may be
eliminated in favor of PEP 3101 -- Advanced String Formatting.
[#PEP3101]_ [#elimpercent]_)
The resolution is to use parentheses to enforce precedence -- the same
solution that can be used today::
# Clearest, works today, continues to work, optimization is
# already possible.
("abc %s def" + "ghi") % var
# Already works today; precedence makes the optimization more
# difficult to recognize, but does not change the semantics.
"abc" + "def %s ghi" % var
as opposed to::
# Already fails because modulus (%) is higher precedence than
# addition (+)
("abc %s def" + "ghi" % var)
# Works today only because adjacency is higher precedence than
# modulus. This will no longer be available.
"abc %s" "def" % var
# So the 2-to-3 translator can automatically replace it with the
# (already valid):
("abc %s" + "def") % var
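The four cases above can be checked directly. This is a small sketch
for Python 3, assuming ``var`` holds a single string value (``"X"`` is
an arbitrary choice); the implicit form is included only to show the
behaviour that would go away.

```python
var = "X"

# Parentheses make the intent explicit: join first, then format.
explicit = ("abc %s def" + "ghi") % var

# % binds tighter than +, so only the right-hand literal is formatted.
right_only = "abc" + "def %s ghi" % var

# Adjacency is resolved before %, so the joined literal is formatted.
implicit = "abc %s" "def" % var

# Here % is applied to "ghi" alone, which has no placeholder.
try:
    "abc %s def" + "ghi" % var
except TypeError as exc:
    failure = str(exc)

print(explicit, right_only, implicit, failure, sep="\n")
```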
Long Commands
-------------
... build up (what I consider to be) readable SQL queries [#skipSQL]_::
rows = self.executesql("select cities.city, state, country"
" from cities, venues, events, addresses"
" where cities.city like %s"
" and events.active = 1"
" and venues.address = addresses.id"
" and addresses.city = cities.id"
" and events.venue = venues.id",
(city,))
Alternatives again include triple-quoted strings, ``+``, and ``.join``::
query="""select cities.city, state, country
from cities, venues, events, addresses
where cities.city like %s
and events.active = 1
and venues.address = addresses.id
and addresses.city = cities.id
and events.venue = venues.id"""
query=( "select cities.city, state, country"
+ " from cities, venues, events, addresses"
+ " where cities.city like %s"
+ " and events.active = 1"
+ " and venues.address = addresses.id"
+ " and addresses.city = cities.id"
+ " and events.venue = venues.id"
)
query="\n".join(["select cities.city, state, country",
" from cities, venues, events, addresses",
" where cities.city like %s",
" and events.active = 1",
" and venues.address = addresses.id",
" and addresses.city = cities.id",
" and events.venue = venues.id"])
# And yes, you *could* inline any of the above querystrings
# the same way the original was inlined.
rows = self.executesql(query, (city,))
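The alternatives do not all yield the same string, which matters if
the text is compared or logged rather than just handed to a database.
The sketch below uses a shortened, hypothetical query and assumes
``textwrap.dedent`` is acceptable for stripping the indentation of the
triple-quoted form.

```python
import textwrap

# Shortened, hypothetical query used only for illustration.
implicit = ("select city"
            " from cities"
            " where city like %s")

added = ("select city"
         + " from cities"
         + " where city like %s")

joined = "\n".join(["select city",
                    " from cities",
                    " where city like %s"])

triple = textwrap.dedent("""\
    select city
     from cities
     where city like %s""")

print(implicit == added)    # explicit + gives the identical string
print(joined == triple)     # both of these contain newlines
print(implicit == joined)   # so they differ from the one-line form
```

Only the ``+`` form is a drop-in replacement; the other two introduce
newlines, which SQL ignores but other consumers may not.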
Regular Expressions
-------------------
Complex regular expressions are sometimes stated in terms of several
implicitly concatenated strings with each regex component on a
different line and followed by a comment. The plus operator can be
inserted here but it does make the regex harder to read. One
alternative is to use the re.VERBOSE option. Another alternative is
to build up the regex with a series of += lines::
# Existing idiom which relies on implicit concatenation
r = ('a{20}' # Twenty A's
'b{5}' # Followed by Five B's
)
# Mechanical replacement
r = ('a{20}' +# Twenty A's
'b{5}' # Followed by Five B's
)
# already works today
r = '''a{20} # Twenty A's
b{5} # Followed by Five B's
''' # Compiled with the re.VERBOSE flag
# already works today
r = 'a{20}' # Twenty A's
r += 'b{5}' # Followed by Five B's
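As a quick check that the replacements describe the same pattern, the
sketch below builds the regex three ways and matches each against one
sample. It is written for Python 3 (``re.fullmatch`` is assumed to be
available), and the variable names are chosen for this example only.

```python
import re

# Mechanical replacement with an explicit +.
plus_form = ('a{20}'    # Twenty A's
             + 'b{5}')  # Followed by five B's

# Incremental build-up.
built = 'a{20}'  # Twenty A's
built += 'b{5}'  # Followed by five B's

# Verbose form: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r'''
    a{20}   # Twenty A's
    b{5}    # Followed by five B's
''', re.VERBOSE)

sample = 'a' * 20 + 'b' * 5
print(plus_form == built)
print(re.fullmatch(built, sample) is not None)
print(verbose.fullmatch(sample) is not None)
```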
Internationalization
--------------------
Some internationalization tools -- notably xgettext -- have already
been special-cased for implicit concatenation, but not for Python's
explicit concatenation. [#barryi8]_
These tools will fail to extract the (already legal)::
_("some string" +
" and more of it")
but often have a special case for::
_("some string"
" and more of it")
It should also be possible to just use an overly long line (xgettext
limits messages to 2048 characters [#xgettext2048]_, which is less
than Python's enforced limit) or triple-quoted strings, but these
solutions sacrifice some readability in the code::
# Lines over a certain length are unpleasant.
_("some string and more of it")
# Changing whitespace is not ideal.
_("""Some string
and more of it""")
_("""Some string
and more of it""")
_("Some string \
and more of it")
I do not see a good short-term resolution for this.
Transition
==========
The proposed new constructs are already legal in current Python, and
can be used immediately.
The 2 to 3 translator can be made to mechanically change::
"str1" "str2"
("line1" #comment
"line2")
into::
("str1" + "str2")
("line1" +#comments
"line2")
If users want to use one of the other idioms, they can; as these
idioms are all already legal in Python 2, the edits can be made
to the original source, rather than patching up the translator.
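Finding the places that need this change can be sketched with the
standard ``tokenize`` module. The following is an illustration only,
not the actual 2-to-3 translator: the function name
``find_implicit_concatenations`` is hypothetical, it runs on Python 3,
and it only looks at plain ``STRING`` tokens.

```python
import io
import tokenize


def find_implicit_concatenations(source):
    """Return (row, col) of each string literal that directly follows another."""
    # Comments and non-logical newlines may sit between adjacent literals.
    skippable = {tokenize.NL, tokenize.COMMENT}
    hits = []
    previous = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skippable:
            continue
        if (tok.type == tokenize.STRING
                and previous is not None
                and previous.type == tokenize.STRING):
            hits.append(tok.start)
        previous = tok
    return hits


example = (
    'x = "str1" "str2"\n'
    'y = ("line1"  # comment\n'
    '     "line2")\n'
    'z = "a" + "b"\n'
)
print(find_implicit_concatenations(example))
```

Each reported position is where a ``+`` would have to be inserted
before the second literal; the explicit ``"a" + "b"`` on the last line
is not reported.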
Open Issues
===========
Is there a better way to support external text extraction tools, or at
least ``xgettext`` [#gettext]_ in particular?
References
==========
[1] Implicit String Concatenation, Jewett, Orendorff
.. [#Orendorff] Implicit String Concatenation, Orendorff
http://mail.python.org/pipermail/python-ideas/2007-April/000397.html
[2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger,
van Rossum
http://mail.python.org/pipermail/python-3000/2007-April/006563.html
[3] PEP 3101, Advanced String Formatting, Talin
.. [#PEP3101] PEP 3101, Advanced String Formatting, Talin
http://www.python.org/peps/pep-3101.html
[4] ps to question Re: Need help completing ABC pep, van Rossum
.. [#elimpercent] ps to question Re: Need help completing ABC pep,
van Rossum
http://mail.python.org/pipermail/python-3000/2007-April/006737.html
.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip,
http://mail.python.org/pipermail/python-3000/2007-May/007261.html
.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
http://mail.python.org/pipermail/python-3000/2007-May/007305.html
.. [#gettext] GNU gettext manual
http://www.gnu.org/software/gettext/
.. [#xgettext2048] Unix man page for xgettext -- Notes section
http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext
Copyright
=========
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: