python-peps/pep-0008.txt

584 lines
22 KiB
Plaintext
Raw Normal View History

PEP: 8
Title: Style Guide for Python Code
Version: $Revision$
Author: guido@python.org (Guido van Rossum),
2001-08-14 11:45:26 -04:00
barry@zope.com (Barry Warsaw)
Status: Active
Type: Informational
Created: 05-Jul-2001
2001-07-05 14:56:34 -04:00
Post-History: 05-Jul-2001
Introduction
This document gives coding conventions for the Python code
comprising the standard library for the main Python distribution.
Please see the companion informational PEP describing style
guidelines for the C code in the C implementation of Python[1].
This document was adapted from Guido's original Python Style
Guide essay[2], with some additions from Barry's style guide[5].
Where there's conflict, Guido's style rules for the purposes of
this PEP. This PEP may still be incomplete (in fact, it may never
be finished <wink>).
A Foolish Consistency is the Hobgoblin of Little Minds
A style guide is about consistency. Consistency with this style
guide is important. Consistency within a project is more
important. Consistency within one module or function is most
important.
But most importantly: know when to be inconsistent -- sometimes
the style guide just doesn't apply. When in doubt, use your best
judgement. Look at other examples and decide what looks best.
And don't hesitate to ask!
Two good reasons to break a particular rule:
(1) When applying the rule would make the code less readable, even
for someone who is used to reading code that follows the rules.
(2) To be consistent with surrounding code that also breaks it
(maybe for historic reasons) -- although this is also an
opportunity to clean up someone else's mess (in true XP style).
Code lay-out
Indentation
Use the default of Emacs' Python-mode: 4 spaces for one
indentation level. For really old code that you don't want to
mess up, you can continue to use 8-space tabs. Emacs Python-mode
auto-detects the prevailing indentation level used in a file and
sets its indentation parameters accordingly.
Tabs or Spaces?
Never mix tabs and spaces. The most popular way of indenting
Python is with spaces only. The second-most popular way is with
tabs only. Code indented with a mixture of tabs and spaces should
be converted to using spaces exclusively. (In Emacs, select the
whole buffer and hit ESC-x untabify.) When invoking the python
command line interpreter with the -t option, it issues warnings
about code that illegally mixes tabs and spaces. When using -tt
these warnings become errors. These options are highly
recommended!
For new projects, spaces-only are strongly recommended over tabs.
Most editors have features that make this easy to do. (In Emacs,
make sure indent-tabs-mode is nil).
Maximum Line Length
There are still many devices around that are limited to 80
character lines; plus, limiting windows to 80 characters makes it
possible to have several windows side-by-side. The default
wrapping on such devices looks ugly. Therefore, please limit all
lines to a maximum of 79 characters (Emacs wraps lines that are
exactly 80 characters long). For flowing long blocks of text
(docstrings or comments), limiting the length to 72 characters is
recommended.
The preferred way of wrapping long lines is by using Python's
implied line continuation inside parentheses, brackets and braces.
If necessary, you can add an extra pair of parentheses around an
expression, but sometimes using a backslash looks better. Make
sure to indent the continued line appropriately. Emacs
Python-mode does this right. Some examples:
class Rectangle(Blob):
def __init__(self, width, height,
color='black', emphasis=None, highlight=0):
if width == 0 and height == 0 and \
color == 'red' and emphasis == 'strong' or \
highlight > 100:
raise ValueError, "sorry, you lose"
if width == 0 and height == 0 and (color == 'red' or
emphasis is None):
raise ValueError, "I don't think so"
Blob.__init__(self, width, height,
color, emphasis, highlight)
Blank Lines
Separate top-level function and class definitions with two blank
lines. Method definitions inside a class are separated by a
single blank line. Extra blank lines may be used (sparingly) to
separate groups of related functions. Blank lines may be omitted
between a bunch of related one-liners (e.g. a set of dummy
implementations).
When blank lines are used to separate method definitions, there is
also a blank line between the `class' line and the first method
definition.
Use blank lines in functions, sparingly, to indicate logical
sections.
Python accepts the control-L (i.e. ^L) form feed character as
whitespace; Emacs (and some printing tools) treat these
characters as page separators, so you may use them to separate
pages of related sections of your file.
Imports
- Imports should usually be on separate lines, e.g.:
No: import sys, os
Yes: import sys
import os
it's okay to say this though:
from types import StringType, ListType
- Imports are always put at the top of the file, just after any
module comments and docstrings, and before module globals and
constants. Imports should be grouped, with the order being
1. standard library imports
2. related major package imports (i.e. all email package imports next)
3. application specific imports
You should put a blank line between each group of imports.
- Relative imports for intra-package imports are highly
discouraged. Always use the absolute package path for all
imports.
- When importing a class from a class-containing module, it's usually
okay to spell this
from MyClass import MyClass
from foo.bar.YourClass import YourClass
If this spelling causes local name clashes, then spell them
import MyClass
import foo.bar.YourClass
and use "MyClass.MyClass" and "foo.bar.YourClass.YourClass"
Whitespace in Expressions and Statements
Pet Peeves
Guido hates whitespace in the following places:
- Immediately inside parentheses, brackets or braces, as in:
"spam( ham[ 1 ], { eggs: 2 } )". Always write this as
"spam(ham[1], {eggs: 2})".
- Immediately before a comma, semicolon, or colon, as in:
"if x == 4 : print x , y ; x , y = y , x". Always write this as
"if x == 4: print x, y; x, y = y, x".
- Immediately before the open parenthesis that starts the argument
list of a function call, as in "spam (1)". Always write
this as "spam(1)".
- Immediately before the open parenthesis that starts an indexing or
slicing, as in: "dict ['key'] = list [index]". Always
write this as "dict['key'] = list[index]".
- More than one space around an assignment (or other) operator to
align it with another, as in:
x = 1
y = 2
long_variable = 3
Always write this as
x = 1
y = 2
long_variable = 3
(Don't bother to argue with him on any of the above -- Guido's
grown accustomed to this style over 20 years.)
Other Recommendations
- Always surround these binary operators with a single space on
either side: assignment (=), comparisons (==, <, >, !=, <>, <=,
>=, in, not in, is, is not), Booleans (and, or, not).
- Use your better judgment for the insertion of spaces around
arithmetic operators. Always be consistent about whitespace on
either side of a binary operator. Some examples:
i = i+1
submitted = submitted + 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
c = (a + b) * (a - b)
- Don't use spaces around the '=' sign when used to indicate a
keyword argument or a default parameter value. For instance:
def complex(real, imag=0.0):
return magic(r=real, i=imag)
Comments
Comments that contradict the code are worse than no comments.
Always make a priority of keeping the comments up-to-date when the
code changes!
Comments should be complete sentences. If a comment is a phrase
or sentence, its first word should be capitalized, unless it is an
identifier that begins with a lower case letter (never alter the
case of identifiers!).
If a comment is short, the period at the end is best omitted.
Block comments generally consist of one or more paragraphs built
out of complete sentences, and each sentence should end in a
period.
You should use two spaces after a sentence-ending period, since it
makes Emacs wrapping and filling work consistenty.
When writing English, Strunk and White apply.
Python coders from non-English speaking countries: please write
your comments in English, unless you are 120% sure that the code
will never be read by people who don't speak your language.
Block Comments
Block comments generally apply to some (or all) code that follows
them, and are indented to the same level as that code. Each line
of a block comment starts with a # and a single space (unless it
is indented text inside the comment). Paragraphs inside a block
comment are separated by a line containing a single #. Block
comments are best surrounded by a blank line above and below them
(or two lines above and a single line below for a block comment at
the start of a a new section of function definitions).
Inline Comments
An inline comment is a comment on the same line as a statement.
Inline comments should be used sparingly. Inline comments should
be separated by at least two spaces from the statement. They
should start with a # and a single space.
Inline comments are unnecessary and in fact distracting if they state
the obvious. Don't do this:
x = x+1 # Increment x
But sometimes, this is useful:
x = x+1 # Compensate for border
Documentation Strings
Conventions for writing good documentation strings
(a.k.a. "docstrings") are immortalized in PEP 257 [3].
- Write docstrings for all public modules, functions, classes, and
methods. Docstrings are not necessary for non-public methods,
but you should have a comment that describes what the method
does. This comment should appear after the "def" line.
2002-05-24 15:42:30 -04:00
- PEP 257 describes good docstring conventions. Note that most
importantly, the """ that ends a multiline docstring should be
on a line by itself, e.g.:
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""
- For one liner docstrings, it's okay to keep the closing """ on
the same line.
Version Bookkeeping
If you have to have RCS or CVS crud in your source file, do it as
follows.
__version__ = "$Revision$"
# $Source$
These lines should be included after the module's docstring,
before any other code, separated by a blank line above and
below.
Naming Conventions
The naming conventions of Python's library are a bit of a mess, so
we'll never get this completely consistent -- nevertheless, here
are some guidelines.
Descriptive: Naming Styles
There are a lot of different naming styles. It helps to be able
to recognize what naming style is being used, independently from
what they are used for.
The following naming styles are commonly distinguished:
- x (single lowercase letter)
- X (single uppercase letter)
- lowercase
- lower_case_with_underscores
- UPPERCASE
- UPPER_CASE_WITH_UNDERSCORES
- CapitalizedWords (or CapWords, or CamelCase -- so named because
of the bumpy look of its letters[4])
- mixedCase (differs from CapitalizedWords by initial lowercase
character!)
- Capitalized_Words_With_Underscores (ugly!)
There's also the style of using a short unique prefix to group
related names together. This is not used much in Python, but it
is mentioned for completeness. For example, the os.stat()
function returns a tuple whose items traditionally have names like
st_mode, st_size, st_mtime and so on. The X11 library uses a
leading X for all its public functions. (In Python, this style is
generally deemed unnecessary because attribute and method names
are prefixed with an object, and function names are prefixed with
a module name.)
In addition, the following special forms using leading or trailing
underscores are recognized (these can generally be combined with any
case convention):
- _single_leading_underscore: weak "internal use" indicator
(e.g. "from M import *" does not import objects whose name
starts with an underscore).
- single_trailing_underscore_: used by convention to avoid
conflicts with Python keyword, e.g.
"Tkinter.Toplevel(master, class_='ClassName')".
- __double_leading_underscore: class-private names as of Python 1.4.
- __double_leading_and_trailing_underscore__: "magic" objects or
attributes that live in user-controlled namespaces,
e.g. __init__, __import__ or __file__. Sometimes these are
defined by the user to trigger certain magic behavior
(e.g. operator overloading); sometimes these are inserted by the
infrastructure for its own use or for debugging purposes. Since
the infrastructure (loosely defined as the Python interpreter
and the standard library) may decide to grow its list of magic
attributes in future versions, user code should generally
refrain from using this convention for its own use. User code
that aspires to become part of the infrastructure could combine
this with a short prefix inside the underscores,
e.g. __bobo_magic_attr__.
Prescriptive: Naming Conventions
Names to Avoid
Never use the characters `l' (lowercase letter el), `O'
(uppercase letter oh), or `I' (uppercase letter eye) as single
character variable names. In some fonts, these characters are
indistinguisable from the numerals one and zero. When tempted
to use `l' use `L' instead.
Module Names
Module names can be either CapWords or lowercase. There is no
unambiguous convention to decide which to use. Modules that
export a single class (or a number of closely related classes,
plus some additional support) are often named in CapWords, with
the module name being the same as the class name (e.g. the
standard StringIO module). Modules that export a bunch of
functions are usually named in all lowercase.
Since module names are mapped to file names, and some file
systems are case insensitive and truncate long names, it is
important that module names be chosen to be fairly short and not
in conflict with other module names that only differ in the case
-- this won't be a problem on Unix, but it may be a problem when
the code is transported to Mac or Windows.
There is an emerging convention that when an extension module
written in C or C++ has an accompanying Python module that
provides a higher level (e.g. more object oriented) interface,
the Python module's name CapWords, while the C/C++ module is
named in all lowercase and has a leading underscore
(e.g. _socket).
Python packages generally have a short all lowercase name.
Class Names
Almost without exception, class names use the CapWords
convention. Classes for internal use have a leading underscore
in addition.
Exception Names
If a module defines a single exception raised for all sorts of
conditions, it is generally called "error" or "Error". It seems
that built-in (extension) modules use "error" (e.g. os.error),
while Python modules generally use "Error" (e.g. xdrlib.Error).
The trend seems to be toward CapWords exception names.
Function Names
Plain functions exported by a module can either use the CapWords
style or lowercase (or lower_case_with_underscores). There is
no strong preference, but it seems that the CapWords style is
used for functions that provide major functionality
(e.g. nstools.WorldOpen()), while lowercase is used more for
"utility" functions (e.g. pathhack.kos_root()).
Global Variable Names
(Let's hope that these variables are meant for use inside one
module only.) The conventions are about the same as those for
exported functions. Modules that are designed for use via "from
M import *" should prefix their globals (and internal functions
and classes) with an underscore to prevent exporting them.
Method Names
The story is largely the same as for functions. Use lowercase
for methods accessed by other classes or functions that are part
of the implementation of an object type. Use one leading
underscore for "internal" methods and instance variables when
there is no chance of a conflict with subclass or superclass
attributes or when a subclass might actually need access to
them. Use two leading underscores (class-private names,
enforced by Python 1.4) in those cases where it is important
that only the current class accesses an attribute. (But realize
that Python contains enough loopholes so that an insistent user
could gain access nevertheless, e.g. via the __dict__ attribute.)
Designing for inheritance
Always decide whether a class's methods and instance variables
should be public or non-public. In general, never make data
variables public unless you're implementing essentially a
record. It's almost always preferrable to give a functional
interface to your class instead (and some Python 2.2
developments will make this much nicer).
Also decide whether your attributes should be private or not.
The difference between private and non-public is that the former
will never be useful for a derived class, while the latter might
be. Yes, you should design your classes with inheritence in
mind!
Private attributes should have two leading underscores, no
trailing underscores.
Non-public attributes should have a single leading underscore,
no trailing underscores.
Public attributes should have no leading or trailing
underscores, unless they conflict with reserved words, in which
case, a single trailing underscore is preferrable to a leading
one, or a corrupted spelling, e.g. class_ rather than klass.
(This last point is a bit controversial; if you prefer klass
over class_ then just be consistent. :).
Programming Recommendations
- Comparisons to singletons like None should always be done with
'is' or 'is not'. Also, beware of writing "if x" when you
really mean "if x is not None" -- e.g. when testing whether a
variable or argument that defaults to None was set to some other
value.
- Class-based exceptions are always preferred over string-based
exceptions. Modules or packages should define their own
domain-specific base exception class, which should be subclassed
from the built-in Exception class. Always include a class
docstring. E.g.:
class MessageError(Exception):
"""Base class for errors in the email package."""
- Avoid the use of the string module; instead use string methods.
These are always much faster and share the same API with unicode
strings.
- Avoid slicing strings when checking for prefixes or suffixes.
Use startswith() and endswith() instead, since they are faster,
cleaner and less error prone. E.g.:
No: if foo[:3] == 'bar':
Yes: if foo.startswith('bar'):
The exception is if your code must work with Python 1.5.2 (but
let's hope not!).
- Object type comparisons should always use isinstance() instead
of comparing types directly. E.g.
No: if type(obj) is type(1):
Yes: if isinstance(obj, int):
When checking if an object is a string, keep in mind that it
might be a unicode string too! In Python 2.2, the types module
has the StringTypes type defined for that purpose, e.g.:
from types import StringTypes:
if isinstance(strorunicodeobj, StringTypes):
- For sequences, (strings, lists, tuples), use the fact that empty
sequences are false, so "if not seq" or "if seq" is preferable
to "if len(seq)" or "if not len(seq)".
- Don't write string literals that rely on significant trailing
whitespace. Such trailing whitespace is visually
indistinguishable and some editors (or more recently,
reindent.py) will trim them.
- Don't compare boolean values to True or False using == (bool
types are new in Python 2.3).
References
[1] PEP 7, Style Guide for C Code, van Rossum
[2] http://www.python.org/doc/essays/styleguide.html
[3] PEP 257, Docstring Conventions, Goodger, van Rossum
[4] http://www.wikipedia.com/wiki/CamelCase
[5] Barry's GNU Mailman/mimelib style guide
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/mimelib/mimelib/STYLEGUIDE.txt?rev=1.1&content-type=text/vnd.viewcvs-markup
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: