Larry Hastings 2013-08-04 01:46:12 -07:00
commit 297ca2977e
20 changed files with 4388 additions and 1263 deletions


@ -3,12 +3,13 @@ Title: Style Guide for Python Code
Version: $Revision$
Last-Modified: $Date$
Author: Guido van Rossum <guido@python.org>,
Barry Warsaw <barry@python.org>
Barry Warsaw <barry@python.org>,
Nick Coghlan <ncoghlan@gmail.com>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 05-Jul-2001
Post-History: 05-Jul-2001
Post-History: 05-Jul-2001, 01-Aug-2013
Introduction
@ -23,6 +24,13 @@ This document and PEP 257 (Docstring Conventions) were adapted from
Guido's original Python Style Guide essay, with some additions from
Barry's style guide [2]_.
This style guide evolves over time as additional conventions are
identified and past conventions are rendered obsolete by changes in
the language itself.
Many projects have their own coding style guidelines. In the event of any
conflicts, such project-specific guides take precedence for that project.
A Foolish Consistency is the Hobgoblin of Little Minds
======================================================
@ -41,15 +49,24 @@ style guide just doesn't apply. When in doubt, use your best
judgment. Look at other examples and decide what looks best. And
don't hesitate to ask!
Two good reasons to break a particular rule:
In particular: do not break backwards compatibility just to comply with
this PEP!
1. When applying the rule would make the code less readable, even for
someone who is used to reading code that follows the rules.
Some other good reasons to ignore a particular guideline:
1. When applying the guideline would make the code less readable, even
for someone who is used to reading code that follows this PEP.
2. To be consistent with surrounding code that also breaks it (maybe
for historic reasons) -- although this is also an opportunity to
clean up someone else's mess (in true XP style).
3. Because the code in question predates the introduction of the
guideline and there is no other reason to be modifying that code.
4. When the code needs to remain compatible with older versions of
Python that don't support the feature recommended by the style guide.
Code lay-out
============
@ -59,9 +76,6 @@ Indentation
Use 4 spaces per indentation level.
For really old code that you don't want to mess up, you can continue
to use 8-space tabs.
Continuation lines should align wrapped elements either vertically
using Python's implicit line joining inside parentheses, brackets and
braces, or using a hanging indent. When using a hanging indent the
@ -101,7 +115,8 @@ Optional::
var_three, var_four)
The closing brace/bracket/parenthesis on multi-line constructs may
either line up under the last item of the list, as in::
either line up under the first non-whitespace character of the last
line of the list, as in::
my_list = [
    1, 2, 3,
@ -128,39 +143,70 @@ starts the multi-line construct, as in::
Tabs or Spaces?
---------------
Never mix tabs and spaces.
Spaces are the preferred indentation method.
The most popular way of indenting Python is with spaces only. The
second-most popular way is with tabs only. Code indented with a
mixture of tabs and spaces should be converted to using spaces
exclusively. When invoking the Python command line interpreter with
Tabs should be used solely to remain consistent with code that is
already indented with tabs.
Python 3 disallows mixing the use of tabs and spaces for indentation.
Python 2 code indented with a mixture of tabs and spaces should be
converted to using spaces exclusively.
When invoking the Python 2 command line interpreter with
the ``-t`` option, it issues warnings about code that illegally mixes
tabs and spaces. When using ``-tt`` these warnings become errors.
These options are highly recommended!
For new projects, spaces-only are strongly recommended over tabs.
Most editors have features that make this easy to do.
Maximum Line Length
-------------------
Limit all lines to a maximum of 79 characters.
There are still many devices around that are limited to 80 character
lines; plus, limiting windows to 80 characters makes it possible to
have several windows side-by-side. The default wrapping on such
devices disrupts the visual structure of the code, making it more
difficult to understand. Therefore, please limit all lines to a
maximum of 79 characters. For flowing long blocks of text (docstrings
or comments), limiting the length to 72 characters is recommended.
For flowing long blocks of text with fewer structural restrictions
(docstrings or comments), the line length should be limited to 72
characters.
Limiting the required editor window width makes it possible to have
several files open side-by-side, and works well when using code
review tools that present the two versions in adjacent columns.
The default wrapping in most tools disrupts the visual structure of the
code, making it more difficult to understand. The limits are chosen to
avoid wrapping in editors with the window width set to 80, even
if the tool places a marker glyph in the final column when wrapping
lines. Some web-based tools may not offer dynamic line wrapping at all.
Some teams strongly prefer a longer line length. For code maintained
exclusively or primarily by a team that can reach agreement on this
issue, it is okay to increase the nominal line length from 80 to
100 characters (effectively increasing the maximum length to 99
characters), provided that comments and docstrings are still wrapped
at 72 characters.
The Python standard library is conservative and requires limiting
lines to 79 characters (and docstrings/comments to 72).
The preferred way of wrapping long lines is by using Python's implied
line continuation inside parentheses, brackets and braces. Long lines
can be broken over multiple lines by wrapping expressions in
parentheses. These should be used in preference to using a backslash
for line continuation. Make sure to indent the continued line
appropriately. The preferred place to break around a binary operator
is *after* the operator, not before it. Some examples::
for line continuation.
Backslashes may still be appropriate at times. For example, long,
multiple ``with``-statements cannot use implicit continuation, so
backslashes are acceptable::
with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())
Another such case is with ``assert`` statements.
Make sure to indent the continued line appropriately. The preferred
place to break around a binary operator is *after* the operator, not
before it. Some examples::
class Rectangle(Blob):
@ -198,18 +244,21 @@ you may use them to separate pages of related sections of your file.
Note, some editors and web-based code viewers may not recognize
control-L as a form feed and will show another glyph in its place.
Encodings (PEP 263)
-------------------
Code in the core Python distribution should always use the ASCII or
Latin-1 encoding (a.k.a. ISO-8859-1). For Python 3.0 and beyond,
UTF-8 is preferred over Latin-1, see PEP 3120.
Source File Encoding
--------------------
Files using ASCII should not have a coding cookie. Latin-1 (or UTF-8)
should only be used when a comment or docstring needs to mention an
author name that requires Latin-1; otherwise, using ``\x``, ``\u`` or
``\U`` escapes is the preferred way to include non-ASCII data in
string literals.
Code in the core Python distribution should always use UTF-8 (or ASCII
in Python 2).
Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have
an encoding declaration.
In the standard library, non-default encodings should be used only for
test purposes or when a comment or docstring needs to mention an author
name that contains non-ASCII characters; otherwise, using ``\x``,
``\u``, ``\U``, or ``\N`` escapes is the preferred way to include
non-ASCII data in string literals.
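As an illustrative sketch (the variable names are made up), the
escape-based approach looks like::

    # Both literals avoid non-ASCII bytes in the source file itself.
    author = 'Se\xf1or Ejemplo'    # \x escape for U+00F1
    snowman = '\N{SNOWMAN}'        # \N named escape (Python 3 str literal)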
For Python 3.0 and beyond, the following policy is prescribed for the
standard library (see PEP 3131): All identifiers in the Python
@ -253,11 +302,27 @@ Imports
Put any relevant ``__all__`` specification after the imports.
- Relative imports for intra-package imports are highly discouraged.
Always use the absolute package path for all imports. Even now that
PEP 328 is fully implemented in Python 2.5, its style of explicit
relative imports is actively discouraged; absolute imports are more
portable and usually more readable.
- Absolute imports are recommended, as they are usually more readable
and tend to be better behaved (or at least give better error
messages) if the import system is incorrectly configured (such as
when a directory inside a package ends up on ``sys.path``)::
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
However, explicit relative imports are an acceptable alternative to
absolute imports, especially when dealing with complex package layouts
where using absolute imports would be unnecessarily verbose::
from . import sibling
from .sibling import example
Standard library code should avoid complex package layouts and always
use absolute imports.
Implicit relative imports should *never* be used and have been removed
in Python 3.
- When importing a class from a class-containing module, it's usually
okay to spell this::
@ -272,6 +337,18 @@ Imports
and use "myclass.MyClass" and "foo.bar.yourclass.YourClass".
- Wildcard imports (``from <module> import *``) should be avoided, as
they make it unclear which names are present in the namespace,
confusing both readers and many automated tools. There is one
defensible use case for a wildcard import, which is to republish an
internal interface as part of a public API (for example, overwriting
a pure Python implementation of an interface with the definitions
from an optional accelerator module and exactly which definitions
will be overwritten isn't known in advance).
When republishing names this way, the guidelines below regarding
public and internal interfaces still apply.
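  As an illustrative sketch only (the package and module names are
  hypothetical), the accelerator use case might look like::

      # mypkg/tools.py
      from mypkg._pytools import *       # pure Python implementations
      try:
          # Optional accelerator module overrides some definitions
          from mypkg._speedups import *
      except ImportError:
          pass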
Whitespace in Expressions and Statements
========================================
@ -330,7 +407,7 @@ Other Recommendations
- If operators with different priorities are used, consider adding
whitespace around the operators with the lowest priority(ies). Use
your own judgement; however, never use more than one space, and
your own judgment; however, never use more than one space, and
always have the same amount of whitespace on both sides of a binary
operator.
@ -747,6 +824,36 @@ With this in mind, here are the Pythonic guidelines:
advanced callers.
Public and internal interfaces
------------------------------
Any backwards compatibility guarantees apply only to public interfaces.
Accordingly, it is important that users be able to clearly distinguish
between public and internal interfaces.
Documented interfaces are considered public, unless the documentation
explicitly declares them to be provisional or internal interfaces exempt
from the usual backwards compatibility guarantees. All undocumented
interfaces should be assumed to be internal.
To better support introspection, modules should explicitly declare the
names in their public API using the ``__all__`` attribute. Setting
``__all__`` to an empty list indicates that the module has no public API.
Even with ``__all__`` set appropriately, internal interfaces (packages,
modules, classes, functions, attributes or other names) should still be
prefixed with a single leading underscore.
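A minimal sketch of these conventions (the module and names are
hypothetical)::

    # mymodule.py
    __all__ = ['do_work']    # explicitly declares the public API

    def do_work():
        """Public function, covered by compatibility guarantees."""

    def _helper():
        """Internal function; note the single leading underscore."""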
An interface is also considered internal if any containing namespace
(package, module or class) is considered internal.
Imported names should always be considered an implementation detail.
Other modules must not rely on indirect access to such imported names
unless they are an explicitly documented part of the containing module's
API, such as ``os.path`` or a package's ``__init__`` module that exposes
functionality from submodules.
Programming Recommendations
===========================
@ -756,10 +863,12 @@ Programming Recommendations
For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form ``a += b``
or ``a = a + b``. Those statements run more slowly in Jython. In
performance sensitive parts of the library, the ``''.join()`` form
should be used instead. This will ensure that concatenation occurs
in linear time across various implementations.
or ``a = a + b``. This optimization is fragile even in CPython (it
only works for some types) and isn't present at all in implementations
that don't use refcounting. In performance sensitive parts of the
library, the ``''.join()`` form should be used instead. This will
ensure that concatenation occurs in linear time across various
implementations.
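  For example, a sketch of the recommended form (``items`` stands in
  for any iterable of values)::

      parts = []
      for item in items:
          parts.append(str(item))
      result = ''.join(parts)    # linear time on all implementations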
- Comparisons to singletons like None should always be done with
``is`` or ``is not``, never the equality operators.
@ -786,29 +895,59 @@ Programming Recommendations
operator. However, it is best to implement all six operations so
that confusion doesn't arise in other contexts.
- Use class-based exceptions.
- Always use a def statement instead of an assignment statement that binds
a lambda expression directly to a name.
String exceptions in new code are forbidden, and this language
feature has been removed in Python 2.6.
Yes::
Modules or packages should define their own domain-specific base
exception class, which should be subclassed from the built-in
Exception class. Always include a class docstring. E.g.::
def f(x): return 2*x
class MessageError(Exception):
    """Base class for errors in the email package."""
No::
f = lambda x: 2*x
The first form means that the name of the resulting function object is
specifically 'f' instead of the generic '<lambda>'. This is more
useful for tracebacks and string representations in general. The use
of the assignment statement eliminates the sole benefit a lambda
expression can offer over an explicit def statement (i.e. that it can
be embedded inside a larger expression).
- Derive exceptions from ``Exception`` rather than ``BaseException``.
Direct inheritance from ``BaseException`` is reserved for exceptions
where catching them is almost always the wrong thing to do.
Design exception hierarchies based on the distinctions that code
*catching* the exceptions is likely to need, rather than the locations
where the exceptions are raised. Aim to answer the question
"What went wrong?" programmatically, rather than only stating that
"A problem occurred" (see PEP 3151 for an example of this lesson being
learned for the builtin exception hierarchy).
Class naming conventions apply here, although you should add the
suffix "Error" to your exception classes, if the exception is an
error. Non-error exceptions need no special suffix.
suffix "Error" to your exception classes if the exception is an
error. Non-error exceptions that are used for non-local flow control
or other forms of signaling need no special suffix.
- When raising an exception, use ``raise ValueError('message')``
- Use exception chaining appropriately. In Python 3, "raise X from Y"
should be used to indicate explicit replacement without losing the
original traceback.
When deliberately replacing an inner exception (using "raise X" in
Python 2 or "raise X from None" in Python 3.3+), ensure that relevant
details are transferred to the new exception (such as preserving the
attribute name when converting KeyError to AttributeError, or
embedding the text of the original exception in the new exception
message).
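  A sketch of the Python 3.3+ idiom (the class and its attribute storage
  are hypothetical)::

      class Record:
          def __init__(self, **extras):
              self._extras = extras

          def __getattr__(self, name):
              try:
                  return self._extras[name]
              except KeyError:
                  # Preserve the attribute name when replacing the exception
                  raise AttributeError(name) from None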
- When raising an exception in Python 2, use ``raise ValueError('message')``
instead of the older form ``raise ValueError, 'message'``.
The paren-using form is preferred because when the exception
arguments are long or include string formatting, you don't need to
use line continuation characters thanks to the containing
parentheses. The older form is not legal syntax in Python 3.
The latter form is not legal Python 3 syntax.
The paren-using form also means that when the exception arguments are
long or include string formatting, you don't need to use line
continuation characters thanks to the containing parentheses.
- When catching exceptions, mention specific exceptions whenever
possible instead of using a bare ``except:`` clause.
@ -838,6 +977,21 @@ Programming Recommendations
exception propagate upwards with ``raise``. ``try...finally``
can be a better way to handle this case.
- When binding caught exceptions to a name, prefer the explicit name
binding syntax added in Python 2.6::
    try:
        process_data()
    except Exception as exc:
        raise DataProcessingFailedError(str(exc))
This is the only syntax supported in Python 3, and avoids the
ambiguity problems associated with the older comma-based syntax.
- When catching operating system errors, prefer the explicit exception
hierarchy introduced in Python 3.3 over introspection of ``errno``
values.
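  For example, a sketch of the Python 3.3+ style (``path`` is a
  stand-in)::

      import os

      try:
          os.remove(path)
      except FileNotFoundError:
          pass    # rather than catching OSError and checking errno.ENOENT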
- Additionally, for all try/except clauses, limit the ``try`` clause
to the absolute minimum amount of code necessary. Again, this
avoids masking bugs.
@ -860,6 +1014,10 @@ Programming Recommendations
# Will also catch KeyError raised by handle_value()
return key_not_found(key)
- When a resource is local to a particular section of code, use a
``with`` statement to ensure it is cleaned up promptly and reliably
after use. A try/finally statement is also acceptable.
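  For example, a minimal sketch::

      with open('/path/to/some/file') as f:
          data = f.read()    # the file is closed promptly, even on error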
- Context managers should be invoked through separate functions or methods
whenever they do something other than acquire and release resources.
For example:
@ -894,9 +1052,6 @@ Programming Recommendations
Yes: if foo.startswith('bar'):
No: if foo[:3] == 'bar':
The exception is if your code must work with Python 1.5.2 (but let's
hope not!).
- Object type comparisons should always use isinstance() instead of
comparing types directly. ::
@ -905,11 +1060,15 @@ Programming Recommendations
No: if type(obj) is type(1):
When checking if an object is a string, keep in mind that it might
be a unicode string too! In Python 2.3, str and unicode have a
be a unicode string too! In Python 2, str and unicode have a
common base class, basestring, so you can do::
if isinstance(obj, basestring):
Note that in Python 3, ``unicode`` and ``basestring`` no longer exist
(there is only ``str``) and a bytes object is no longer a kind of
string (it is a sequence of integers instead).
- For sequences, (strings, lists, tuples), use the fact that empty
sequences are false. ::
@ -934,6 +1093,10 @@ Programming Recommendations
annotation style. Instead, the annotations are left for users to
discover and experiment with useful annotation styles.
It is recommended that third party experiments with annotations use an
associated decorator to indicate how the annotation should be
interpreted.
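For illustration only (the decorator name is hypothetical)::

    @type_hints    # declares how the annotations should be interpreted
    def clip(text: str, max_len: int = 80) -> str:
        return text[:max_len]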
Early core developer attempts to use function annotations revealed
inconsistent, ad-hoc annotation styles. For example:
@ -991,6 +1154,8 @@ References
.. [3] http://www.wikipedia.com/wiki/CamelCase
.. [4] PEP 8 modernisation, July 2013
http://bugs.python.org/issue18472
Copyright
=========


@ -4,7 +4,7 @@ Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
W Isaac Carroll <icarroll@pobox.com>
Status: Deferred
Status: Rejected
Type: Standards Track
Content-Type: text/plain
Created: 25-Apr-2003
@ -21,19 +21,32 @@ Abstract
Notice
Deferred; see
Rejected; see
http://mail.python.org/pipermail/python-ideas/2013-June/021610.html
This PEP has been deferred since 2006; see
http://mail.python.org/pipermail/python-dev/2006-February/060718.html
Subsequent efforts to revive the PEP in April 2009 did not
meet with success because no syntax emerged that could
compete with a while-True and an inner if-break.
compete with the following form:
A syntax was found for a basic do-while loop but it found
had little support because the condition was at the top:
while True:
    <setup code>
    if not <condition>:
        break
    <loop body>
A syntax alternative to the one proposed in the PEP was found for
a basic do-while loop but it gained little support because the
condition was at the top:
do ... while <cond>:
    <loop body>
Users of the language are advised to use the while-True form with
an inner if-break when a do-while loop would have been appropriate.
Motivation


@ -19,10 +19,17 @@ This PEP provides a convention to ensure that Python scripts can continue to
be portable across ``*nix`` systems, regardless of the default version of the
Python interpreter (i.e. the version invoked by the ``python`` command).
* ``python2`` will refer to some version of Python 2.x
* ``python3`` will refer to some version of Python 3.x
* ``python`` *should* refer to the same target as ``python2`` but *may*
refer to ``python3`` on some bleeding edge distributions
* ``python2`` will refer to some version of Python 2.x.
* ``python3`` will refer to some version of Python 3.x.
* for the time being, all distributions *should* ensure that ``python``
refers to the same target as ``python2``.
* however, end users should be aware that ``python`` refers to ``python3``
on at least Arch Linux (that change is what prompted the creation of this
PEP), so ``python`` should be used in the shebang line only for scripts
that are source compatible with both Python 2 and 3.
* in preparation for an eventual change in the default version of Python,
Python 2 only scripts should either be updated to be source compatible
with Python 3 or else to use ``python2`` in the shebang line.
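As an illustration of the above convention, the shebang line choices are:

* ``#!/usr/bin/env python2`` for Python 2 only scripts
* ``#!/usr/bin/env python3`` for Python 3 only scripts
* ``#!/usr/bin/env python`` only for scripts that are source compatible
  with both Python 2 and 3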
Recommendation
@ -103,15 +110,29 @@ aspects of migrating to Python 3 as the default version of Python for a
system. They will hopefully be helpful to any distributions considering
making such a change.
* Distributions that only include ``python3`` in their base install (i.e.
they do not provide ``python2`` by default) along with those that are
aggressively trying to reach that point (and are willing to break third
party scripts while attempting to get there) are already beginning to alias
the ``python`` command to ``python3``
* More conservative distributions that are less willing to tolerate breakage
of third party scripts continue to alias it to ``python2``. Until the
conventions described in this PEP are more widely adopted, having ``python``
invoke ``python2`` will remain the recommended option.
* The main barrier to a distribution switching the ``python`` command from
``python2`` to ``python3`` isn't breakage within the distribution, but
instead breakage of private third party scripts developed by sysadmins
and other users. Updating the ``python`` command to invoke ``python3``
by default indicates that a distribution is willing to break such scripts
with errors that are potentially quite confusing for users that aren't
yet familiar with the backwards incompatible changes in Python 3. For
example, while the change of ``print`` from a statement to a builtin
function is relatively simple for automated converters to handle, the
SyntaxError from attempting to use the Python 2 notation in Python 3 is
thoroughly confusing if you aren't already aware of the change::
$ python3 -c 'print "Hello, world!"'
  File "<string>", line 1
    print "Hello, world!"
                        ^
SyntaxError: invalid syntax
* Avoiding breakage of such third party scripts is the key reason this
PEP recommends that ``python`` continue to refer to ``python2`` for the
time being. Until the conventions described in this PEP are more widely
adopted, having ``python`` invoke ``python2`` will remain the recommended
option.
* The ``pythonX.X`` (e.g. ``python2.6``) commands exist on some systems, on
which they invoke specific minor versions of the Python interpreter. It
can be useful for distribution-specific packages to take advantage of these
@ -148,10 +169,13 @@ making such a change.
``python`` command is only executed in an interactive manner as a user
convenience, or to run scripts that are source compatible with both Python
2 and Python 3.
* one symbolic date being considered for a possible change to the official
recommendation in this PEP is the planned switch of Python 2.7 from full
maintenance to security update only status in 2015 (see PEP 373).
Backwards Compatibility
=========================
=======================
A potential problem can arise if a script adhering to the
``python2``/``python3`` convention is executed on a system not supporting
@ -217,7 +241,8 @@ Exclusion of MS Windows
This PEP deliberately excludes any proposals relating to Microsoft Windows, as
devising an equivalent solution for Windows was deemed too complex to handle
here. PEP 397 and the related discussion on the python-dev mailing list
address this issue.
address this issue (like this PEP, the PEP 397 launcher invokes Python 2 by
default if versions of both Python 2 and 3 are installed on the system).
References


@ -627,7 +627,7 @@ release, one possible layout for such an approach might look like::
<news entries>
# Add maint.1, compat.1 etc as releases are made
Putting the version information in the directory heirarchy isn't strictly
Putting the version information in the directory hierarchy isn't strictly
necessary (since the NEWS file generator could figure out from the version
history), but does make it easier for *humans* to keep the different versions
in order.

File diff suppressed because it is too large

pep-0426/pydist-schema.json (new file, 329 lines)

@ -0,0 +1,329 @@
{
"id": "http://www.python.org/dev/peps/pep-0426/",
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Metadata for Python Software Packages 2.0",
"type": "object",
"properties": {
"metadata_version": {
"description": "Version of the file format",
"type": "string",
"pattern": "^(\\d+(\\.\\d+)*)$"
},
"generator": {
"description": "Name and version of the program that produced this file.",
"type": "string",
"pattern": "^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])( \\(.*\\))?$"
},
"name": {
"description": "The name of the distribution.",
"type": "string",
"$ref": "#/definitions/distribution_name"
},
"version": {
"description": "The distribution's public version identifier",
"type": "string",
"pattern": "^(\\d+(\\.\\d+)*)((a|b|c|rc)(\\d+))?(\\.(post)(\\d+))?(\\.(dev)(\\d+))?$"
},
"source_label": {
"description": "A constrained identifying text string",
"type": "string",
"pattern": "^[0-9a-z_.-+]+$"
},
"source_url": {
"description": "A string containing a full URL where the source for this specific version of the distribution can be downloaded.",
"type": "string",
"format": "uri"
},
"summary": {
"description": "A one-line summary of what the distribution does.",
"type": "string"
},
"document_names": {
"description": "Names of supporting metadata documents",
"type": "object",
"properties": {
"description": {
"type": "string",
"$ref": "#/definitions/document_name"
},
"changelog": {
"type": "string",
"$ref": "#/definitions/document_name"
},
"license": {
"type": "string",
"$ref": "#/definitions/document_name"
}
},
"additionalProperties": false
},
"keywords": {
"description": "A list of additional keywords to be used to assist searching for the distribution in a larger catalog.",
"type": "array",
"items": {
"type": "string"
}
},
"license": {
"description": "A string indicating the license covering the distribution.",
"type": "string"
},
"classifiers": {
"description": "A list of strings, with each giving a single classification value for the distribution.",
"type": "array",
"items": {
"type": "string"
}
},
"contacts": {
"description": "A list of contributor entries giving the recommended contact points for getting more information about the project.",
"type": "array",
"items": {
"type": "object",
"$ref": "#/definitions/contact"
}
},
"contributors": {
"description": "A list of contributor entries for other contributors not already listed as current project points of contact.",
"type": "array",
"items": {
"type": "object",
"$ref": "#/definitions/contact"
}
},
"project_urls": {
"description": "A mapping of arbitrary text labels to additional URLs relevant to the project.",
"type": "object"
},
"extras": {
"description": "A list of optional sets of dependencies that may be used to define conditional dependencies in \"may_require\" and similar fields.",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/extra_name"
}
},
"meta_requires": {
"description": "A list of subdistributions made available through this metadistribution.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"run_requires": {
"description": "A list of other distributions needed to run this distribution.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"test_requires": {
"description": "A list of other distributions needed when this distribution is tested.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"build_requires": {
"description": "A list of other distributions needed when this distribution is built.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"dev_requires": {
"description": "A list of other distributions needed when this distribution is developed.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"provides": {
"description": "A list of strings naming additional dependency requirements that are satisfied by installing this distribution. These strings must be of the form Name or Name (Version)",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/provides_declaration"
}
},
"modules": {
"description": "A list of modules and/or packages available for import after installing this distribution.",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/qualified_name"
}
},
"namespaces": {
"description": "A list of namespace packages this distribution contributes to",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/qualified_name"
}
},
"commands": {
"description": "Command line interfaces provided by this distribution",
"type": "object",
"$ref": "#/definitions/commands"
},
"exports": {
"description": "Other exported interfaces provided by this distribution",
"type": "object",
"$ref": "#/definitions/exports"
},
"obsoleted_by": {
"description": "A string that indicates that this project is no longer being developed. The named project provides a substitute or replacement.",
"type": "string",
"$ref": "#/definitions/requirement"
},
"supports_environments": {
"description": "A list of strings specifying the environments that the distribution explicitly supports.",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/environment_marker"
}
},
"install_hooks": {
"description": "The install_hooks field is used to define various operations that may be invoked on a distribution in a platform independent manner.",
"type": "object",
"properties": {
"postinstall": {
"type": "string",
"$ref": "#/definitions/export_specifier"
},
"preuninstall": {
"type": "string",
"$ref": "#/definitions/export_specifier"
}
}
},
"extensions": {
"description": "Extensions to the metadata may be present in a mapping under the 'extensions' key.",
"type": "object"
}
},
"required": ["metadata_version", "name", "version", "summary"],
"additionalProperties": false,
"definitions": {
"contact": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
},
"url": {
"type": "string"
},
"role": {
"type": "string"
}
},
"required": ["name"],
"additionalProperties": false
},
"dependencies": {
"type": "array",
"items": {
"type": "object",
"$ref": "#/definitions/dependency"
}
},
"dependency": {
"type": "object",
"properties": {
"extra": {
"type": "string",
"$ref": "#/definitions/valid_name"
},
"environment": {
"type": "string",
"$ref": "#/definitions/environment_marker"
},
"requires": {
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/requirement"
}
}
},
"required": ["requires"],
"additionalProperties": false
},
"commands": {
"type": "object",
"properties": {
"wrap_console": {
"type": "object",
"$ref": "#/definitions/command_map"
},
"wrap_gui": {
"type": "object",
"$ref": "#/definitions/command_map"
},
"prebuilt": {
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/relative_path"
}
}
},
"additionalProperties": false
},
"exports": {
"type": "object",
"patternProperties": {
"^[A-Za-z]([0-9A-Za-z_])*([.][A-Za-z]([0-9A-Za-z_])*)*$": {
"type": "object",
"patternProperties": {
".": {
"type": "string",
"$ref": "#/definitions/export_specifier"
}
},
"additionalProperties": false
}
},
"additionalProperties": false
},
"command_map": {
"type": "object",
"patternProperties": {
"^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])?$": {
"type": "string",
"$ref": "#/definitions/export_specifier"
}
},
"additionalProperties": false
},
"distribution_name": {
"type": "string",
"pattern": "^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])?$"
},
"requirement": {
"type": "string"
},
"provides_declaration": {
"type": "string"
},
"environment_marker": {
"type": "string"
},
"document_name": {
"type": "string"
},
"extra_name" : {
"type": "string",
"pattern": "^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])?$"
},
"relative_path" : {
"type": "string"
},
"export_specifier": {
"type": "string",
"pattern": "^([A-Za-z_][A-Za-z_0-9]*([.][A-Za-z_][A-Za-z_0-9]*)*)(:[A-Za-z_][A-Za-z_0-9]*([.][A-Za-z_][A-Za-z_0-9]*)*)?(\\[[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])?\\])?$"
},
"qualified_name" : {
"type": "string",
"pattern": "^[A-Za-z_][A-Za-z_0-9]*([.][A-Za-z_][A-Za-z_0-9]*)*$"
}
}
}
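As an illustrative sketch (assuming the third-party ``jsonschema``
package, which is not part of this specification), a metadata file could
be checked against this schema like so:

    import json
    import jsonschema  # hypothetical validator choice; pip install jsonschema

    with open('pydist-schema.json') as f:
        schema = json.load(f)

    metadata = {
        "metadata_version": "2.0",
        "name": "example-dist",
        "version": "1.0",
        "summary": "An example distribution.",
    }

    # Raises jsonschema.ValidationError if the metadata does not conform
    jsonschema.validate(metadata, schema)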


@ -3,11 +3,11 @@ Title: Simplifying the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2012
Python-Version: 3.4
Python-Version: 3.5
Post-History: 28-Dec-2012, 2-Jan-2013
@ -25,6 +25,31 @@ resolution for most of these should become clearer as the reference
implementation is developed.
PEP Deferral
============
Python 3.4 is nearing its first alpha, and already includes a couple of
significant low level changes in PEP 445 (memory allocator customisation)
and PEP 442 (safe object finalization). As a result of the latter PEP,
the shutdown procedure of CPython has also been changed to be more heavily
reliant on the cyclic garbage collector, significantly reducing the
number of modules that will experience the "module globals set to None"
behaviour that is used to deliberately break cycles and attempt to release
more external resources cleanly.
Furthermore, I am heavily involved in the current round of updates to the
Python packaging ecosystem (as both the lead author of PEP 426 and
BDFL-delegate for several other PEPs), leaving little time to spare to work on
this proposal. The other developers I would trust to lead this effort are
also working on other things.
So, due to those practical resource constraints, the proximity of Python
3.4 deadlines, and recognition that making too many significant changes to
the low level CPython infrastructure in one release is likely to be unwise,
further work on this PEP has been deferred to the Python 3.5 development
cycle.
Proposal
========

View File

@ -5,7 +5,7 @@ Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>,
Eli Bendersky <eliben@gmail.com>,
Ethan Furman <ethan@stoneleaf.us>
Status: Accepted
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-02-23
@ -467,6 +467,10 @@ assignment to ``Animal`` is equivalent to::
... cat = 3
... dog = 4
The reason for defaulting to ``1`` as the starting number and not ``0`` is
that ``0`` is ``False`` in a boolean sense, but enum members all evaluate
to ``True``.
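For example, a sketch using the functional API::

    from enum import Enum

    Animal = Enum('Animal', 'ant bee cat dog')
    print(Animal.ant.value)    # 1 -- numbering starts at 1, not 0
    print(bool(Animal.ant))    # True -- all members are truthy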
Proposed variations
===================

View File

@ -45,6 +45,12 @@ Python installation the barrier to installing additional software is
considerably reduced. It is hoped that this will therefore increase
the likelihood that Python projects will reuse third party software.
The Python community also has an issue of complexity around the current
bootstrap procedure for pip and setuptools. Each has its own
bootstrap download file, with slightly different usage, and they even
refer to each other in some cases. Having a single bootstrap that is
common to them all, with a simple usage, would be far preferable.
It is also hoped that this reduces the number of proposals to
include more and more software in the Python standard library, and
therefore that more popular Python software is more easily upgradeable
@ -54,23 +60,32 @@ beyond requiring Python installation upgrades.
Proposal
========
This proposal affects three components of packaging: `the pip bootstrap`_,
`setuptools`_ and, thanks to easier package installation, `modifications to
publishing packages`_.
The bootstrap will install the pip implementation, setuptools by downloading
their installation files from PyPI.
This proposal affects two components of packaging: `the pip bootstrap`_ and,
thanks to easier package installation, `modifications to publishing
packages`_.
The core of this proposal is that the user experience of using pip should not
require the user to install pip.
The pip bootstrap
-----------------
The Python installation includes an executable called "pip3" (see PEP 394 for
naming rationale etc.) that attempts to import pip machinery. If it can
then the pip command proceeds as normal. If it cannot it will bootstrap pip by
downloading the pip implementation wheel file. Once installed, the pip command
proceeds as normal.
naming rationale etc.) that attempts to import pip machinery. If it can then
the pip command proceeds as normal. If it cannot, it will bootstrap pip by
downloading the pip implementation and setuptools wheel files. Hereafter the
installation of the "pip implementation" will imply installation of setuptools
and virtualenv. Once installed, the pip command proceeds as normal. Once the
bootstrap process is complete the "pip3" command is no longer the bootstrap
but rather the full pip command.
A bootstrap is used in place of the full pip code so that we
don't have to bundle pip and also so that the install tool is upgradeable
outside of the regular Python upgrade timeframe and processes.
A bootstrap is used in place of the full pip code so that we don't have
to bundle pip and also so that pip is upgradeable outside of the regular
Python upgrade timeframe and processes.
To avoid issues with sudo we will have the bootstrap default to
installing the pip implementation to the per-user site-packages
@ -88,82 +103,58 @@ The bootstrap process will proceed as follows:
2. The user will invoke a pip command, typically "pip3 install
<package>", for example "pip3 install Django".
3. The bootstrap script will attempt to import the pip implementation.
If this succeeds, the pip command is processed normally.
If this succeeds, the pip command is processed normally. Stop.
4. On failing to import the pip implementation the bootstrap notifies
the user that it is "upgrading pip" and contacts PyPI to obtain the
latest download wheel file (see PEP 427.)
5. Upon downloading the file it is installed using the distlib
installation machinery for wheel packages. Upon completing the
installation the user is notified that "pip3 has been upgraded."
TODO how is it verified?
6. The pip tool may now import the pip implementation and continues to
the user that it needs to "install pip". It will ask the user whether it
should install pip into the system-wide site-packages or as a user-only
package. This choice will also be present as a command-line option to pip
so non-interactive use is possible.
5. The bootstrap will then contact PyPI to obtain the latest download wheel
file (see PEP 427.)
6. Upon downloading the file it is installed using "python setup.py install".
7. The pip tool may now import the pip implementation and continues to
process the requested user command normally.
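A rough sketch of the control flow described above (the helper names are
hypothetical, not part of this proposal)::

    try:
        import pip                              # step 3
    except ImportError:
        target = ask_install_location()         # step 4 (hypothetical helper)
        wheel = fetch_wheel_from_pypi('pip')    # step 5 (hypothetical helper)
        install_wheel(wheel, target)            # step 6 (hypothetical helper)
        import pip
    pip.main()                                  # step 7: process the command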
Users may be running in an environment which cannot access the public
Internet and are relying solely on a local package repository. They
would use the "-i" (Base URL of Python Package Index) argument to the
"pip3 install" command. This use case will be handled by:
"pip3 install" command. This simply overrides the default index URL pointing
to PyPI.
1. Recognising the command-line arguments that specify alternative or
additional locations to discover packages and attempting to
download the package from those locations.
2. If the package is not found there then we attempt to download it
using the standard "https://pypi.python.org/pypi/simple/pip" index.
3. If that also fails, for any reason, we indicate to the user the
operation we were attempting, the reason for failure (if we know
it) and display further instructions for downloading and installing
the file manually.
Some users may have no Internet access suitable for fetching the pip
implementation file. These users can manually download and install the
setuptools and pip tar files. Adding specific support for this use-case is
unnecessary.
Manual installation of the pip implementation will be supported
through the manual download of the wheel file and "pip3 install
<downloaded wheel file>".
This installation will not perform standard pip installation steps of
saving the file to a cache directory or updating any local database of
installed files.
The download of the pip implementation install file should be performed
securely. The transport from pypi.python.org will be done over HTTPS but the CA
certificate check will most likely not be performed, and therefore the download
would still be vulnerable to active MITM attacks. To mitigate this
risk we will use the embedded signature support in the wheel format to validate
the downloaded file.
The download of the pip implementation install file will be performed
securely. The transport from pypi.python.org will be done over HTTPS with the
CA certificate check performed. This facility will be present in Python 3.4+
using Operating System certificates (see PEP XXXX).
Beyond those arguments controlling index location and download
options, the "pip3" boostrap command may support further standard pip
options for verbosity, quietness and logging.
The "pip3" command will support two new command-line options that are used
in the bootstrapping, and otherwise ignored. They control where the pip
implementation is installed:
--bootstrap
    Install to the user's packages directory. The name of this option is
    chosen to promote it as the preferred installation option.

--bootstrap-to-system
    Install to the system site-packages directory.
These command-line options will also need to be implemented, but otherwise
ignored, in the pip implementation.
Consideration should be given to defaulting pip to install packages to the
user's packages directory if pip is installed in that location.
The "--no-install" option to the "pip3" command will not affect the
bootstrapping process.
setuptools
----------
The deprecation of requiring setuptools for installation is an existing goal of
the packaging community (TODO ref needed). Currently pip depends upon setuptools
functionality, and it is installed by the current pip bootstrap. This PEP does
not propose installing setuptools during the new bootstrap.
It is intended that before Python 3.4 is shipped the functionality required by
pip will be present in Python's standard library as the distlib module, and that
pip would be modified to use that functionality when present. TODO PEP reference
for distlib
Many existing "setup.py" files require setuptools to be installed (because one
of the first things they do is import setuptools). It is intended that pip's
behaviour will be either:
1. If setuptools is not present it can only install from wheel files and
sdists with 2.0+ metadata, or
2. If setuptools is present it can also install from sdists with legacy
metadata and eggs
By default, installing setuptools when necessary should be automatic so that
users are not inconvenienced, but advanced users should be able to ask that it
instead be treated as an error if no wheel is available to satisfy an
installation request or dependency (so they don't inadvertently install
setuptools on their production systems if they don't want to).
Modifications to publishing packages
------------------------------------
@ -189,22 +180,36 @@ Implementation
==============
The changes to pip required by this PEP are being tracked in that project's
issue tracker [2]_
issue tracker [2]_. Most notably, the addition of ``--bootstrap`` and
``--bootstrap-to-system`` to the pip command line.
It would be preferable that the pip and setuptools projects distribute a wheel
format download.
The required code for this implementation is the "pip3" command described
above. The additional pypublish can be developed outside of the scope of this
PEP's work.
Finally, it would be desirable that "pip3" be ported to Python 2.6+ so that
the single command can replace the existing pip, setuptools and virtualenv
bootstrap scripts (virtualenv would be added to the bootstrap). Having that
bootstrap
included in a future Python 2.7 release would also be highly desirable.
Risks
=====
The Fedora variant of Linux has had a separate program called "pip" (a
Perl package installer) available for install for some time. The
current Python "pip" program is installed as "pip-python". It is
hoped that the Fedora community will resolve this issue by renaming
the Perl installer.
The key that is used to sign the pip implementation download might be
compromised and this PEP currently proposes no mechanism for key
revocation.
There is a Perl package installer also named "pip". It is quite rare and not
commonly used. The Fedora variant of Linux has historically named Python's
"pip" as "python-pip" and Perl's "pip" as "perl-pip". This policy has been
altered [3]_ so that future and upgraded Fedora installations will use the name
"pip" for Python's "pip". Existing (non-upgraded) installations will still
have the old name for the Python "pip", though the potential for confusion is
now much reduced.
References
@ -216,6 +221,9 @@ References
.. [2] pip issue tracking work needed for this PEP
https://github.com/pypa/pip/issues/863
.. [3] Fedora's python-pip package does not provide /usr/bin/pip
https://bugzilla.redhat.com/show_bug.cgi?id=958377
Acknowledgments
===============
@ -223,7 +231,9 @@ Acknowledgments
Nick Coghlan for his thoughts on the proposal and dealing with the Red
Hat issue.
Jannis Leidel and Carl Meyer for their thoughts.
Jannis Leidel and Carl Meyer for their thoughts. Marcus Smith for feedback.
Marcela Mašláňová for resolving the Fedora issue.
Copyright


@ -9,7 +9,7 @@ Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18 Mar 2013
Post-History: 30 Mar 2013, 27-May-2013
Post-History: 30 Mar 2013, 27 May 2013, 20 Jun 2013
Replaces: 386
@ -27,7 +27,7 @@ standardised approach to versioning, as described in PEP 345 and PEP 386.
This PEP was broken out of the metadata 2.0 specification in PEP 426.
Unlike PEP 426, the notes that remain in this document are intended as
part of the final specification.
part of the final specification (except for this one).
Definitions
@ -40,7 +40,7 @@ document are to be interpreted as described in RFC 2119.
The following terms are to be interpreted as described in PEP 426:
* "Distributions"
* "Versions"
* "Releases"
* "Build tools"
* "Index servers"
* "Publication tools"
@ -52,9 +52,13 @@ The following terms are to be interpreted as described in PEP 426:
Version scheme
==============
Distribution versions are identified by both a public version identifier,
which supports all defined version comparison operations, and a build
label, which supports only strict equality comparisons.
Distributions are identified by a public version identifier which
supports all defined version comparison operations.
Distributions may also define a source label, which is not used by
automated tools. Source labels are useful when a project internal
versioning scheme requires translation to create a compliant public
version identifier.
The version scheme is used both to describe the distribution version
provided by a particular distribution archive, as well as to place
@ -84,7 +88,7 @@ Public version identifiers are separated into up to four segments:
* Post-release segment: ``.postN``
* Development release segment: ``.devN``
Any given version will be a "release", "pre-release", "post-release" or
Any given release will be a "final release", "pre-release", "post-release" or
"developmental release" as defined in the following sections.
.. note::
@ -99,34 +103,43 @@ Any given version will be a "release", "pre-release", "post-release" or
sections.
Build labels
------------
Source labels
-------------
Build labels are text strings with minimal defined semantics.
Source labels are text strings with minimal defined semantics.
To ensure build labels can be readily incorporated as part of file names
and URLs, they MUST be comprised of only ASCII alphanumerics, plus signs,
periods and hyphens.
To ensure source labels can be readily incorporated as part of file names
and URLs, and to avoid formatting inconsistencies in hexadecimal hash
representations, they MUST be limited to the following set of permitted
characters:
In addition, build labels MUST be unique within a given distribution.
* Lowercase ASCII letters (``[a-z]``)
* ASCII digits (``[0-9]``)
* underscores (``_``)
* hyphens (``-``)
* periods (``.``)
* plus signs (``+``)
As with distribution names, all comparisons of build labels MUST be case
insensitive.
Source labels MUST start and end with an ASCII letter or digit.
Source labels MUST be unique within each project and MUST NOT match any
defined version for the project.
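For illustration, these rules could be checked with a regular expression
along the following lines (a sketch, not part of the specification)::

    import re

    SOURCE_LABEL = re.compile(r'^[a-z0-9]([a-z0-9_.+-]*[a-z0-9])?$')

    SOURCE_LABEL.match('1.7.2-custom')   # matches
    SOURCE_LABEL.match('_bad_label_')    # None: must start and end
                                         # with a letter or digit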
Releases
--------
Final releases
--------------
A version identifier that consists solely of a release segment is termed
a "release".
A version identifier that consists solely of a release segment is
termed a "final release".
The release segment consists of one or more non-negative integer values,
separated by dots::
The release segment consists of one or more non-negative integer
values, separated by dots::
N[.N]+
Releases within a project will typically be numbered in a consistently
increasing fashion.
Final releases within a project MUST be numbered in a consistently
increasing fashion, otherwise automated tools will not be able to upgrade
them correctly.
Comparison and ordering of release segments considers the numeric value
of each component of the release segment in turn. When comparing release
@ -157,8 +170,8 @@ For example::
2.0
2.0.1
A release series is any set of release numbers that start with a common
prefix. For example, ``3.3.1``, ``3.3.5`` and ``3.3.9.45`` are all
A release series is any set of final release numbers that start with a
common prefix. For example, ``3.3.1``, ``3.3.5`` and ``3.3.9.45`` are all
part of the ``3.3`` release series.
.. note::
@ -206,8 +219,8 @@ of both ``c`` and ``rc`` releases for a common release segment.
Post-releases
-------------
Some projects use post-releases to address minor errors in a release that
do not affect the distributed software (for example, correcting an error
Some projects use post-releases to address minor errors in a final release
that do not affect the distributed software (for example, correcting an error
in the release notes).
If used as part of a project's development cycle, these post-releases are
@ -371,7 +384,7 @@ are permitted and MUST be ordered as shown::
.devN, aN, bN, cN, rcN, <no suffix>, .postN
Note that `rc` will always sort after `c` (regardless of the numeric
component) although they are semantically equivalent. Tools are free to
component) although they are semantically equivalent. Tools MAY
reject this case as ambiguous and remain in compliance with the PEP.
Within an alpha (``1.0a1``), beta (``1.0b1``), or release candidate
@ -444,7 +457,7 @@ Compatibility with other version schemes
Some projects may choose to use a version scheme which requires
translation in order to comply with the public version scheme defined in
this PEP. In such cases, the build label can be used to
this PEP. In such cases, the source label can be used to
record the project specific version as an arbitrary label, while the
translated public version is published in the version field.
@ -488,7 +501,7 @@ identifier. As hashes cannot be ordered reliably such versions are not
permitted in the public version field.
As with semantic versioning, the public ``.devN`` suffix may be used to
uniquely identify such releases for publication, while the build label is
uniquely identify such releases for publication, while the source label is
used to record the original DVCS based version label.
@ -496,7 +509,7 @@ Date based versions
~~~~~~~~~~~~~~~~~~~
As with other incompatible version schemes, date based versions can be
stored in the build label field. Translating them to a compliant
stored in the source label field. Translating them to a compliant
public version is straightforward: use a leading ``"0."`` prefix in the
public version label, with the date based version number as the remaining
components in the release segment.
@ -506,6 +519,22 @@ numbering based on API compatibility, as well as triggering more appropriate
version comparison semantics.
Olson database versioning
~~~~~~~~~~~~~~~~~~~~~~~~~
The ``pytz`` project inherits its versioning scheme from the corresponding
Olson timezone database versioning scheme: the year followed by a lowercase
character indicating the version of the database within that year.
This can be translated to a compliant 3-part version identifier as
``0.<year>.<serial>``, where the serial starts at zero (for the '<year>a'
release) and is incremented with each subsequent database update within the
year.
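For example, the Olson database release ``2013g`` would map to the public
version ``0.2013.6``, since ``g`` is the seventh database release of 2013
and the serial starts at zero.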
As with other translated version identifiers, the corresponding Olson
database version would be recorded in the source label field.
Version specifiers
==================
@ -521,7 +550,6 @@ clause:
* ``~=``: `Compatible release`_ clause
* ``==``: `Version matching`_ clause
* ``!=``: `Version exclusion`_ clause
* ``is``: `Build reference`_ clause
* ``<=``, ``>=``: `Inclusive ordered comparison`_ clause
* ``<``, ``>``: `Exclusive ordered comparison`_ clause
@ -605,6 +633,11 @@ version. The *only* substitution performed is the zero padding of the
release segment to ensure the release segments are compared with the same
length.
Whether or not strict version matching is appropriate depends on the specific
use case for the version specifier. Automated tools SHOULD at least issue
warnings and MAY reject them entirely when strict version matches are used
inappropriately.
Prefix matching may be requested instead of strict comparison, by appending
a trailing ``.*`` to the version identifier in the version matching clause.
This means that additional trailing segments will be ignored when
@ -626,10 +659,6 @@ comparison operator is intended primarily for use when defining
dependencies for repeatable *deployments of applications* while using
a shared distribution index.
Publication tools and index servers SHOULD at least emit a warning when
dependencies are pinned in this fashion and MAY refuse to allow publication
of such overly specific dependencies.
Version exclusion
-----------------
@ -649,74 +678,6 @@ match or not as shown::
!= 1.1.* # Same prefix, so 1.1.post1 does not match clause
Build reference
---------------
A build reference includes the build reference operator ``is`` and
a build label or a build URL.
Publication tools and public index servers SHOULD NOT permit build
references in dependency specifications.
Installation tools SHOULD support the use of build references to identify
dependencies.
Build label matching works solely on strict equality comparisons: the
candidate build label must be exactly the same as the build label in the
version clause for the clause to match the candidate distribution.
For example, a build reference could be used to depend on a ``hashdist``
generated build of ``zlib`` with the ``hashdist`` hash used as a build
label::
zlib (is d4jwf2sb2g6glprsdqfdpcracwpzujwq)
A build URL is distinguished from a build label by the presence of
``:`` and ``/`` characters in the build reference. As these characters
are not permitted in build labels, they indicate that the reference uses
a build URL.
Some appropriate targets for a build URL are a binary archive, a
source tarball, an sdist archive or a direct reference to a tag or
specific commit in an online version control system. The exact URLs and
targets supported will be installation tool specific.
For example, a local prebuilt wheel file may be referenced directly::
exampledist (is file:///localbuilds/exampledist-1.0-py33-none-any.whl)
All build URL references SHOULD either specify a local file URL, a secure
transport mechanism (such as ``https``) or else include an expected hash
value in the URL for verification purposes. If an insecure network
transport is specified without any hash information (or with hash
information that the tool doesn't understand), automated tools SHOULD
at least emit a warning and MAY refuse to rely on the URL.
It is RECOMMENDED that only hashes which are unconditionally provided by
the latest version of the standard library's ``hashlib`` module be used
for source archive hashes. At time of writing, that list consists of
``'md5'``, ``'sha1'``, ``'sha224'``, ``'sha256'``, ``'sha384'``, and
``'sha512'``.
For binary or source archive references, an expected hash value may be
specified by including a ``<hash-algorithm>=<expected-hash>`` as part of
the URL fragment.
For version control references, the ``VCS+protocol`` scheme SHOULD be
used to identify both the version control system and the secure transport.
To support version control systems that do not support including commit or
tag references directly in the URL, that information may be appended to the
end of the URL using the ``@<tag>`` notation.
The use of ``is`` when defining dependencies for published distributions
is strongly discouraged as it greatly complicates the deployment of
security fixes. The build label matching operator is intended primarily
for use when defining dependencies for repeatable *deployments of
applications* while using a shared distribution index, as well as to
reference dependencies which are not published through an index server.
Inclusive ordered comparison
----------------------------
@ -755,62 +716,108 @@ Handling of pre-releases
------------------------
Pre-releases of any kind, including developmental releases, are implicitly
excluded from all version specifiers, *unless* a pre-release or developmental
release is explicitly mentioned in one of the clauses. For example, these
specifiers implicitly exclude all pre-releases and development
releases of later versions::
2.2
>= 1.0
While these specifiers would include at least some of them::
2.2.dev0
2.2, != 2.3b2
>= 1.0a1
>= 1.0c1
>= 1.0, != 1.0b2
>= 1.0, < 2.0.dev123
excluded from all version specifiers, *unless* they are already present
on the system, explicitly requested by the user, or if the only available
version that satisfies the version specifier is a pre-release.
By default, dependency resolution tools SHOULD:
* accept already installed pre-releases for all version specifiers
* accept remotely available pre-releases for version specifiers which
include at least one version clause that references a pre-release
* accept remotely available pre-releases for version specifiers where
there is no final or post release that satisfies the version specifier
* exclude all other pre-releases from consideration
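
These defaults can be sketched as follows (the names and data structures
are illustrative, and the explicit pre-release clause rule is elided for
brevity)::

    def best_candidate(candidates, installed, matches):
        # `candidates` is a list of (version, is_prerelease) pairs and
        # `matches` is the version specifier as a predicate over versions.
        if installed is not None and matches(installed):
            return installed      # installed pre-releases stay accepted
        matching = [(v, pre) for (v, pre) in candidates if matches(v)]
        finals = [v for (v, pre) in matching if not pre]
        if finals:
            return max(finals)    # other pre-releases are excluded
        # fall back to a pre-release only when nothing else satisfies
        # the specifier
        pres = [v for (v, pre) in matching if pre]
        return max(pres) if pres else None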
Dependency resolution tools MAY issue a warning if a pre-release is needed
to satisfy a version specifier.
Dependency resolution tools SHOULD also allow users to request the
following alternative behaviours:
* accepting pre-releases for all version specifiers
* excluding pre-releases for all version specifiers (reporting an error or
warning if a pre-release is already installed locally)
warning if a pre-release is already installed locally, or if a
pre-release is the only way to satisfy a particular specifier)
Dependency resolution tools MAY also allow the above behaviour to be
controlled on a per-distribution basis.
Post-releases and purely numeric releases receive no special treatment in
version specifiers - they are always included unless explicitly excluded.
Post-releases and final releases receive no special treatment in version
specifiers - they are always included unless explicitly excluded.
Examples
--------
* ``3.1``: version 3.1 or later, but not
version 4.0 or later. Excludes pre-releases and developmental releases.
* ``3.1.2``: version 3.1.2 or later, but not
version 3.2.0 or later. Excludes pre-releases and developmental releases.
* ``3.1a1``: version 3.1a1 or later, but not
version 4.0 or later. Allows pre-releases like 3.2a4 and developmental
releases like 3.2.dev1.
* ``3.1``: version 3.1 or later, but not version 4.0 or later.
* ``3.1.2``: version 3.1.2 or later, but not version 3.2.0 or later.
* ``3.1a1``: version 3.1a1 or later, but not version 4.0 or later.
* ``== 3.1``: specifically version 3.1 (or 3.1.0), excludes all pre-releases,
post releases, developmental releases and any 3.1.x maintenance releases.
* ``== 3.1.*``: any version that starts with 3.1, excluding pre-releases and
developmental releases. Equivalent to the ``3.1.0`` compatible release
clause.
* ``== 3.1.*``: any version that starts with 3.1. Equivalent to the
``3.1.0`` compatible release clause.
* ``3.1.0, != 3.1.3``: version 3.1.0 or later, but not version 3.1.3 and
not version 3.2.0 or later. Excludes pre-releases and developmental
releases.
not version 3.2.0 or later.
Direct references
=================
Some automated tools may permit the use of a direct reference as an
alternative to a normal version specifier. A direct reference consists of
the word ``from`` and an explicit URL.
Whether or not direct references are appropriate depends on the specific
use case for the version specifier. Automated tools SHOULD at least issue
warnings and MAY reject them entirely when direct references are used
inappropriately.
Public index servers SHOULD NOT allow the use of direct references in
uploaded distributions. Direct references are intended as a tool for
software integrators rather than publishers.
Depending on the use case, some appropriate targets for a direct URL
reference may be a valid ``source_url`` entry (see PEP 426), an sdist, or
a wheel binary archive. The exact URLs and targets supported will be tool
dependent.
For example, a local source archive may be referenced directly::
pip (from file:///localbuilds/pip-1.3.1.zip)
Alternatively, a prebuilt archive may also be referenced::
pip (from file:///localbuilds/pip-1.3.1-py33-none-any.whl)
All direct references that do not refer to a local file URL SHOULD
specify a secure transport mechanism (such as ``https``), include an
expected hash value in the URL for verification purposes, or both. If an
insecure transport is specified without any hash information, with hash
information that the tool doesn't understand, or with a selected hash
algorithm that the tool considers too weak to trust, automated tools
SHOULD at least emit a warning and MAY refuse to rely on the URL.
It is RECOMMENDED that only hashes which are unconditionally provided by
the latest version of the standard library's ``hashlib`` module be used
for source archive hashes. At time of writing, that list consists of
``'md5'``, ``'sha1'``, ``'sha224'``, ``'sha256'``, ``'sha384'``, and
``'sha512'``.
For source archive and wheel references, an expected hash value may be
specified by including a ``<hash-algorithm>=<expected-hash>`` entry as
part of the URL fragment.
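
For illustration, such a fragment can be checked with the standard
library's ``hashlib`` module (the helper below is a sketch, not part of
any tool)::

    import hashlib
    import urllib.parse

    def fragment_hash_ok(url, data):
        # Verify downloaded bytes against a "<hash-algorithm>=<expected-hash>"
        # URL fragment.
        fragment = urllib.parse.urlparse(url).fragment
        algorithm, _, expected = fragment.partition('=')
        return hashlib.new(algorithm, data).hexdigest() == expected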
For version control references, the ``VCS+protocol`` scheme SHOULD be
used to identify both the version control system and the secure transport.
To support version control systems that do not support including commit or
tag references directly in the URL, that information may be appended to the
end of the URL using the ``@<tag>`` notation.
Remote URL examples::
pip (from https://github.com/pypa/pip/archive/1.3.1.zip)
pip (from http://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686)
pip (from git+https://github.com/pypa/pip.git@1.3.1)
Updating the versioning specification
@ -823,56 +830,45 @@ Actually changing the version comparison semantics still requires a new
versioning scheme and metadata version defined in new PEPs.
Open issues
===========
* The new ``is`` operator seems like a reasonable way to cleanly allow
installation tools to bring in non-published dependencies, while heavily
discouraging the practice for published libraries. It also makes
build labels more useful by allowing them to be used to pin dependencies
in the integration use case.
However, it's an early draft of the idea, so feedback is definitely
welcome.
Summary of differences from \PEP 386
====================================
* Moved the description of version specifiers into the versioning PEP
* added the "build label" concept to better handle projects that wish to
* Added the "source label" concept to better handle projects that wish to
use a non-compliant versioning scheme internally, especially those based
on DVCS hashes
* added the "compatible release" clause
* Added the "direct reference" concept as a standard notation for direct
references to resources (rather than each tool needing to invent its own)
* added the "build reference" clause
* Added the "compatible release" clause
* added the trailing wildcard syntax for prefix based version matching
* Added the trailing wildcard syntax for prefix based version matching
and exclusion
* changed the top level sort position of the ``.devN`` suffix
* Changed the top level sort position of the ``.devN`` suffix
* allowed single value version numbers
* Allowed single value version numbers
* explicit exclusion of leading or trailing whitespace
* Explicit exclusion of leading or trailing whitespace
* explicit criterion for the exclusion of date based versions
* Explicit criterion for the exclusion of date based versions
* implicitly exclude pre-releases unless explicitly requested
* Implicitly exclude pre-releases unless they're already present or
needed to satisfy a dependency
* treat post releases the same way as unqualified releases
* Treat post releases the same way as unqualified releases
* Discuss ordering and dependencies across metadata versions
The rationale for major changes is given in the following sections.
Adding build labels
-------------------
Adding source labels
--------------------
The new build label support is intended to make it clearer that the
The new source label support is intended to make it clearer that the
constraints on public version identifiers are there primarily to aid in
the creation of reliable automated dependency analysis tools. Projects
are free to use whatever versioning scheme they like internally, so long
@ -1011,11 +1007,12 @@ The previous interpretation also excluded post-releases from some version
specifiers for no adequately justified reason.
The updated interpretation is intended to make it difficult to accidentally
accept a pre-release version as satisfying a dependency, while allowing
pre-release versions to be explicitly requested when needed.
accept a pre-release version as satisfying a dependency, while still
allowing pre-release versions to be retrieved automatically when that's the
only way to satisfy a dependency.
The "some forward compatibility assumed" default version constraint is
taken directly from the Ruby community's "pessimistic version constraint"
derived from the Ruby community's "pessimistic version constraint"
operator [2]_ to allow projects to take a cautious approach to forward
compatibility promises, while still easily setting a minimum required
version for their dependencies. It is made the default behaviour rather
@ -1038,16 +1035,26 @@ improved tools for dynamic path manipulation.
The trailing wildcard syntax to request prefix based version matching was
added to make it possible to sensibly define both compatible release clauses
and the desired pre-release handling semantics for ``<`` and ``>`` ordered
comparison clauses.
and the desired pre- and post-release handling semantics for ``<`` and ``>``
ordered comparison clauses.
Build references are added for two purposes. In conjunction with build
labels, they allow hash based references, such as those employed by
`hashdist <http://hashdist.readthedocs.org/en/latest/build_spec.html>`__,
or generated from version control. In conjunction with build URLs, they
allow the new metadata standard to natively support an existing feature of
``pip``, which allows arbitrary URLs like
``file:///localbuilds/exampledist-1.0-py33-none-any.whl``.
Adding direct references
------------------------
Direct references are added as an "escape clause" to handle messy real
world situations that don't map neatly to the standard distribution model.
This includes dependencies on unpublished software for internal use, as well
as handling the more complex compatibility issues that may arise when
wrapping third party libraries as C extensions (this is of particular concern
to the scientific community).
Index servers are deliberately given a lot of freedom to disallow direct
references, since they're intended primarily as a tool for integrators
rather than publishers. PyPI in particular is currently going through the
process of *eliminating* dependencies on external references, as unreliable
external services have the effect of slowing down installation operations,
as well as reducing PyPI's own apparent reliability.
References

View File

@ -4,13 +4,13 @@ Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
BDFL-Delegate: Benjamin Peterson <benjamin@python.org>
Status: Draft
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-05-18
Python-Version: 3.4
Post-History: 2013-05-18
Resolution: TBD
Resolution: http://mail.python.org/pipermail/python-dev/2013-June/126746.html
Abstract
@ -201,8 +201,7 @@ Predictability
--------------
Following this scheme, an object's finalizer is always called exactly
once. The only exception is if an object is resurrected: the finalizer
will be called again when the object becomes unreachable again.
once, even if it was resurrected afterwards.
For CI objects, the order in which finalizers are called (step 2 above)
is undefined.

View File

@ -4,11 +4,11 @@ Version: $Revision$
Last-Modified: $Date$
Author: Łukasz Langa <lukasz@langa.pl>
Discussions-To: Python-Dev <python-dev@python.org>
Status: Draft
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 22-May-2013
Post-History: 22-May-2013, 25-May-2013
Post-History: 22-May-2013, 25-May-2013, 31-May-2013
Replaces: 245, 246, 3124
@ -44,11 +44,14 @@ However, it currently:
In addition, it is currently a common anti-pattern for Python code to
inspect the types of received arguments, in order to decide what to do
with the objects. For example, code may wish to accept either an object
of some type, or a sequence of objects of that type.
with the objects.
For example, code may wish to accept either an object
of some type, or a sequence of objects of that type.
Currently, the "obvious way" to do this is by type inspection, but this
is brittle and closed to extension. Abstract Base Classes make it easier
is brittle and closed to extension.
Abstract Base Classes make it easier
to discover present behaviour, but don't help adding new behaviour.
A developer using an already-written library may be unable to change how
their objects are treated by such code, especially if the objects they
@ -63,7 +66,7 @@ User API
To define a generic function, decorate it with the ``@singledispatch``
decorator. Note that the dispatch happens on the type of the first
argument, create your function accordingly::
argument. Create your function accordingly::
>>> from functools import singledispatch
>>> @singledispatch
@ -73,7 +76,7 @@ argument, create your function accordingly::
... print(arg)
To add overloaded implementations to the function, use the
``register()`` attribute of the generic function. It is a decorator,
``register()`` attribute of the generic function. This is a decorator,
taking a type parameter and decorating a function implementing the
operation for that type::
@ -98,7 +101,7 @@ To enable registering lambdas and pre-existing functions, the
...
>>> fun.register(type(None), nothing)
The ``register()`` attribute returns the undecorated function which
The ``register()`` attribute returns the undecorated function. This
enables decorator stacking, pickling, as well as creating unit tests for
each variant independently::
@ -134,13 +137,17 @@ argument::
Where there is no registered implementation for a specific type, its
method resolution order is used to find a more generic implementation.
The original function decorated with ``@singledispatch`` is registered
for the base ``object`` type, which means it is used if no better
implementation is found.
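
For example, an unregistered subclass dispatches to the implementation
registered for the closest class in its MRO (the names below are
illustrative)::

    >>> from functools import singledispatch
    >>> @singledispatch
    ... def describe(arg):
    ...     return "object"
    ...
    >>> @describe.register(int)
    ... def _(arg):
    ...     return "int"
    ...
    >>> class Answer(int):
    ...     pass
    ...
    >>> describe(Answer(42))   # no exact match, the MRO finds ``int``
    'int'
    >>> describe("text")       # unregistered type, base function used
    'object'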
To check which implementation the generic function will choose for
a given type, use the ``dispatch()`` attribute::
>>> fun.dispatch(float)
<function fun_num at 0x104319058>
>>> fun.dispatch(dict)
<function fun at 0x103fe4788>
>>> fun.dispatch(dict) # note: default implementation
<function fun at 0x103fe0000>
To access all registered implementations, use the read-only ``registry``
attribute::
@ -152,7 +159,7 @@ attribute::
>>> fun.registry[float]
<function fun_num at 0x1035a2840>
>>> fun.registry[object]
<function fun at 0x103170788>
<function fun at 0x103fe0000>
The proposed API is intentionally limited and opinionated, so as to ensure
it is easy to explain and use, as well as to maintain consistency with
@ -168,12 +175,12 @@ implementation is mature, the goal is to move it largely as-is. The
reference implementation is available on hg.python.org [#ref-impl]_.
The dispatch type is specified as a decorator argument. An alternative
form using function annotations has been considered but its inclusion
has been deferred. As of May 2013, this usage pattern is out of scope
for the standard library [#pep-0008]_ and the best practices for
form using function annotations was considered but its inclusion
has been rejected. As of May 2013, this usage pattern is out of scope
for the standard library [#pep-0008]_, and the best practices for
annotation usage are still debated.
Based on the current ``pkgutil.simplegeneric`` implementation and
Based on the current ``pkgutil.simplegeneric`` implementation, and
following the convention on registering virtual subclasses on Abstract
Base Classes, the dispatch registry will not be thread-safe.
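
Applications registering implementations from multiple threads therefore
need to provide their own synchronization, for example (a sketch with
illustrative names)::

    import threading
    from functools import singledispatch

    @singledispatch
    def process(obj):
        return "default"

    _registration_lock = threading.Lock()

    def register_safely(cls, func):
        # Serialize writes to the dispatch registry; dispatch itself
        # only reads it.
        with _registration_lock:
            process.register(cls, func)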
@ -186,48 +193,37 @@ handling of old-style classes and Zope's ExtensionClasses. More
importantly, it introduces support for Abstract Base Classes (ABC).
When a generic function implementation is registered for an ABC, the
dispatch algorithm switches to a mode of MRO calculation for the
provided argument which includes the relevant ABCs. The algorithm is as
follows::
dispatch algorithm switches to an extended form of C3 linearization,
which includes the relevant ABCs in the MRO of the provided argument.
The algorithm inserts ABCs where their functionality is introduced, i.e.
``issubclass(cls, abc)`` returns ``True`` for the class itself but
returns ``False`` for all its direct base classes. Implicit ABCs for
a given class (either registered or inferred from the presence of
a special method like ``__len__()``) are inserted directly after the
last ABC explicitly listed in the MRO of said class.
def _compose_mro(cls, haystack):
"""Calculates the MRO for a given class `cls`, including relevant
abstract base classes from `haystack`."""
bases = set(cls.__mro__)
mro = list(cls.__mro__)
for regcls in haystack:
if regcls in bases or not issubclass(cls, regcls):
continue # either present in the __mro__ or unrelated
for index, base in enumerate(mro):
if not issubclass(base, regcls):
break
if base in bases and not issubclass(regcls, base):
# Conflict resolution: put classes present in __mro__
# and their subclasses first.
index += 1
mro.insert(index, regcls)
return mro
In its most basic form, it returns the MRO for the given type::
In its most basic form, this linearization returns the MRO for the given
type::
>>> _compose_mro(dict, [])
[<class 'dict'>, <class 'object'>]
When the haystack consists of ABCs that the specified type is a subclass
of, they are inserted in a predictable order::
When the second argument contains ABCs that the specified type is
a subclass of, they are inserted in a predictable order::
>>> _compose_mro(dict, [Sized, MutableMapping, str,
... Sequence, Iterable])
[<class 'dict'>, <class 'collections.abc.MutableMapping'>,
<class 'collections.abc.Iterable'>, <class 'collections.abc.Sized'>,
<class 'collections.abc.Mapping'>, <class 'collections.abc.Sized'>,
<class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>,
<class 'object'>]
While this mode of operation is significantly slower, all dispatch
decisions are cached. The cache is invalidated on registering new
implementations on the generic function or when user code calls
``register()`` on an ABC to register a new virtual subclass. In the
latter case, it is possible to create a situation with ambiguous
dispatch, for instance::
``register()`` on an ABC to implicitly subclass it. In the latter case,
it is possible to create a situation with ambiguous dispatch, for
instance::
>>> from collections import Iterable, Container
>>> class P:
@ -254,9 +250,9 @@ guess::
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>
Note that this exception would not be raised if ``Iterable`` and
``Container`` had been provided as base classes during class definition.
In this case dispatch happens in the MRO order::
Note that this exception would not be raised if one or more ABCs had
been provided explicitly as base classes during class definition. In
this case dispatch happens in the MRO order::
>>> class Ten(Iterable, Container):
... def __iter__(self):
@ -268,13 +264,31 @@ In this case dispatch happens in the MRO order::
>>> g(Ten())
'iterable'
A similar conflict arises when subclassing an ABC is inferred from the
presence of a special method like ``__len__()`` or ``__contains__()``::
>>> class Q:
... def __contains__(self, value):
... return False
...
>>> issubclass(Q, Container)
True
>>> Iterable.register(Q)
>>> g(Q())
Traceback (most recent call last):
...
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>
An early version of the PEP contained a custom approach that was simpler
but created a number of edge cases with surprising results [#why-c3]_.
Usage Patterns
==============
This PEP proposes extending behaviour only of functions specifically
marked as generic. Just as a base class method may be overridden by
a subclass, so too may a function be overloaded to provide custom
a subclass, so too a function may be overloaded to provide custom
functionality for a given type.
Universal overloading does not equal *arbitrary* overloading, in the
@ -371,6 +385,8 @@ References
a particular annotation style".
(http://www.python.org/dev/peps/pep-0008)
.. [#why-c3] http://bugs.python.org/issue18244
.. [#pep-3124] http://www.python.org/dev/peps/pep-3124/
.. [#peak-rules] http://peak.telecommunity.com/DevCenter/PEAK_2dRules

773
pep-0445.txt Normal file
View File

@ -0,0 +1,773 @@
PEP: 445
Title: Add new APIs to customize Python memory allocators
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
BDFL-Delegate: Antoine Pitrou <solipsis@pitrou.net>
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 15-june-2013
Python-Version: 3.4
Resolution: http://mail.python.org/pipermail/python-dev/2013-July/127222.html
Abstract
========
This PEP proposes new Application Programming Interfaces (APIs) to customize
Python memory allocators. The only implementation required to conform to
this PEP is CPython, but other implementations may choose to be compatible,
or to re-use a similar scheme.
Rationale
=========
Use cases:
* Applications embedding Python which want to isolate Python memory from
the memory of the application, or want to use a different memory
allocator optimized for its Python usage
* Python running on embedded devices with low memory and slow CPU.
A custom memory allocator can be used for efficiency and/or to get
access to all the memory of the device.
* Debug tools for memory allocators:
- track the memory usage (find memory leaks)
- get the location of a memory allocation: Python filename and line
number, and the size of a memory block
- detect buffer underflow, buffer overflow and misuse of Python
allocator APIs (see `Redesign Debug Checks on Memory Block
Allocators as Hooks`_)
- force memory allocations to fail to test handling of the
``MemoryError`` exception
Proposal
========
New Functions and Structures
----------------------------
* Add a new GIL-free (no need to hold the GIL) memory allocator:
- ``void* PyMem_RawMalloc(size_t size)``
- ``void* PyMem_RawRealloc(void *ptr, size_t new_size)``
- ``void PyMem_RawFree(void *ptr)``
- The newly allocated memory will not have been initialized in any
way.
- Requesting zero bytes returns a distinct non-*NULL* pointer if
possible, as if ``PyMem_Malloc(1)`` had been called instead.
* Add a new ``PyMemAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 3 functions */
void *ctx;
/* allocate a memory block */
void* (*malloc) (void *ctx, size_t size);
/* allocate or resize a memory block */
void* (*realloc) (void *ctx, void *ptr, size_t new_size);
/* release a memory block */
void (*free) (void *ctx, void *ptr);
} PyMemAllocator;
* Add a new ``PyMemAllocatorDomain`` enum to choose the Python
allocator domain. Domains:
- ``PYMEM_DOMAIN_RAW``: ``PyMem_RawMalloc()``, ``PyMem_RawRealloc()``
and ``PyMem_RawFree()``
- ``PYMEM_DOMAIN_MEM``: ``PyMem_Malloc()``, ``PyMem_Realloc()`` and
``PyMem_Free()``
- ``PYMEM_DOMAIN_OBJ``: ``PyObject_Malloc()``, ``PyObject_Realloc()``
and ``PyObject_Free()``
* Add new functions to get and set memory block allocators:
- ``void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
- ``void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
- The new allocator must return a distinct non-*NULL* pointer when
requesting zero bytes
- For the ``PYMEM_DOMAIN_RAW`` domain, the allocator must be
thread-safe: the GIL is not held when the allocator is called.
* Add a new ``PyObjectArenaAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 2 functions */
void *ctx;
/* allocate an arena */
void* (*alloc) (void *ctx, size_t size);
/* release an arena */
void (*free) (void *ctx, void *ptr, size_t size);
} PyObjectArenaAllocator;
* Add new functions to get and set the arena allocator used by
*pymalloc*:
- ``void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator)``
- ``void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator)``
* Add a new function to reinstall the debug checks on memory allocators when
a memory allocator is replaced with ``PyMem_SetAllocator()``:
- ``void PyMem_SetupDebugHooks(void)``
- Install the debug hooks on all memory block allocators. The function can be
called more than once; hooks are only installed once.
- The function does nothing if Python is not compiled in debug mode.
* Memory block allocators always return *NULL* if *size* is greater than
``PY_SSIZE_T_MAX``. The check is done before calling the inner
function.
.. note::
The *pymalloc* allocator is optimized for objects smaller than 512 bytes
with a short lifetime. It uses memory mappings with a fixed size of 256
KB called "arenas".
Here is how the allocators are set up by default:
* ``PYMEM_DOMAIN_RAW``, ``PYMEM_DOMAIN_MEM``: ``malloc()``,
``realloc()`` and ``free()``; call ``malloc(1)`` when requesting zero
bytes
* ``PYMEM_DOMAIN_OBJ``: *pymalloc* allocator which falls back on
``PyMem_Malloc()`` for allocations larger than 512 bytes
* *pymalloc* arena allocator: ``VirtualAlloc()`` and ``VirtualFree()`` on
Windows, ``mmap()`` and ``munmap()`` when available, or ``malloc()``
and ``free()``
Redesign Debug Checks on Memory Block Allocators as Hooks
---------------------------------------------------------
Since Python 2.3, Python implements different checks on memory
allocators in debug mode:
* Newly allocated memory is filled with the byte ``0xCB``, freed memory
is filled with the byte ``0xDB``.
* Detect API violations, e.g. ``PyObject_Free()`` called on a memory
block allocated by ``PyMem_Malloc()``
* Detect write before the start of the buffer (buffer underflow)
* Detect write after the end of the buffer (buffer overflow)
In Python 3.3, the checks are installed by replacing ``PyMem_Malloc()``,
``PyMem_Realloc()``, ``PyMem_Free()``, ``PyObject_Malloc()``,
``PyObject_Realloc()`` and ``PyObject_Free()`` using macros. The new
allocator allocates a larger buffer and writes a pattern to detect buffer
underflow, buffer overflow and use after free (by filling the buffer with
the byte ``0xDB``). It uses the original ``PyObject_Malloc()``
function to allocate memory. So ``PyMem_Malloc()`` and
``PyMem_Realloc()`` indirectly call ``PyObject_Malloc()`` and
``PyObject_Realloc()``.
This PEP redesigns the debug checks as hooks on the existing allocators
in debug mode. Examples of call traces without the hooks:
* ``PyMem_RawMalloc()`` => ``_PyMem_RawMalloc()`` => ``malloc()``
* ``PyMem_Realloc()`` => ``_PyMem_RawRealloc()`` => ``realloc()``
* ``PyObject_Free()`` => ``_PyObject_Free()``
Call traces when the hooks are installed (debug mode):
* ``PyMem_RawMalloc()`` => ``_PyMem_DebugMalloc()``
=> ``_PyMem_RawMalloc()`` => ``malloc()``
* ``PyMem_Realloc()`` => ``_PyMem_DebugRealloc()``
=> ``_PyMem_RawRealloc()`` => ``realloc()``
* ``PyObject_Free()`` => ``_PyMem_DebugFree()``
=> ``_PyObject_Free()``
As a result, ``PyMem_Malloc()`` and ``PyMem_Realloc()`` now call
``malloc()`` and ``realloc()`` in both release mode and debug mode,
instead of calling ``PyObject_Malloc()`` and ``PyObject_Realloc()`` in
debug mode.
When at least one memory allocator is replaced with
``PyMem_SetAllocator()``, the ``PyMem_SetupDebugHooks()`` function must
be called to reinstall the debug hooks on top of the new allocator.
Don't call malloc() directly anymore
------------------------------------
``PyObject_Malloc()`` falls back on ``PyMem_Malloc()`` instead of
``malloc()`` if the size is greater than or equal to 512 bytes, and
``PyObject_Realloc()`` falls back on ``PyMem_Realloc()`` instead of
``realloc()``.
Direct calls to ``malloc()`` are replaced with ``PyMem_Malloc()``, or
``PyMem_RawMalloc()`` if the GIL is not held.
External libraries like zlib or OpenSSL can be configured to allocate memory
using ``PyMem_Malloc()`` or ``PyMem_RawMalloc()``. If the allocator of a
library can only be replaced globally (rather than on an object-by-object
basis), it shouldn't be replaced when Python is embedded in an application.
For the "track memory usage" use case, it is important to track memory
allocated in external libraries to have accurate reports, because these
allocations can be large (e.g. they can raise a ``MemoryError`` exception)
and would otherwise be missed in memory usage reports.
Examples
========
Use case 1: Replace Memory Allocators, keep pymalloc
----------------------------------------------------
Dummy example wasting 2 bytes per memory block,
and 10 bytes per *pymalloc* arena::
#include <Python.h>
#include <stdlib.h>
size_t alloc_padding = 2;
size_t arena_padding = 10;
void* my_malloc(void *ctx, size_t size)
{
size_t padding = *(size_t *)ctx;
return malloc(size + padding);
}
void* my_realloc(void *ctx, void *ptr, size_t new_size)
{
size_t padding = *(size_t *)ctx;
return realloc(ptr, new_size + padding);
}
void my_free(void *ctx, void *ptr)
{
free(ptr);
}
void* my_alloc_arena(void *ctx, size_t size)
{
size_t padding = *(size_t *)ctx;
return malloc(size + padding);
}
void my_free_arena(void *ctx, void *ptr, size_t size)
{
free(ptr);
}
void setup_custom_allocator(void)
{
PyMemAllocator alloc;
PyObjectArenaAllocator arena;
alloc.ctx = &alloc_padding;
alloc.malloc = my_malloc;
alloc.realloc = my_realloc;
alloc.free = my_free;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
/* leave PYMEM_DOMAIN_OBJ unchanged, use pymalloc */
arena.ctx = &arena_padding;
arena.alloc = my_alloc_arena;
arena.free = my_free_arena;
PyObject_SetArenaAllocator(&arena);
PyMem_SetupDebugHooks();
}
Use case 2: Replace Memory Allocators, override pymalloc
--------------------------------------------------------
If you have a dedicated allocator optimized for allocations of objects
smaller than 512 bytes with a short lifetime, pymalloc can be overridden
(replace ``PyObject_Malloc()``).
Dummy example wasting 2 bytes per memory block::
#include <Python.h>
#include <stdlib.h>
size_t padding = 2;
void* my_malloc(void *ctx, size_t size)
{
size_t padding = *(size_t *)ctx;
return malloc(size + padding);
}
void* my_realloc(void *ctx, void *ptr, size_t new_size)
{
size_t padding = *(size_t *)ctx;
return realloc(ptr, new_size + padding);
}
void my_free(void *ctx, void *ptr)
{
free(ptr);
}
void setup_custom_allocator(void)
{
PyMemAllocator alloc;
alloc.ctx = &padding;
alloc.malloc = my_malloc;
alloc.realloc = my_realloc;
alloc.free = my_free;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
PyMem_SetupDebugHooks();
}
The *pymalloc* arena does not need to be replaced, because it is no
longer used by the new allocator.
Use case 3: Setup Hooks On Memory Block Allocators
--------------------------------------------------
Example of setting up hooks on all memory block allocators::
#include <Python.h>

struct {
PyMemAllocator raw;
PyMemAllocator mem;
PyMemAllocator obj;
/* ... */
} hook;
static void* hook_malloc(void *ctx, size_t size)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
void *ptr;
/* ... */
ptr = alloc->malloc(alloc->ctx, size);
/* ... */
return ptr;
}
static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
void *ptr2;
/* ... */
ptr2 = alloc->realloc(alloc->ctx, ptr, new_size);
/* ... */
return ptr2;
}
static void hook_free(void *ctx, void *ptr)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
/* ... */
alloc->free(alloc->ctx, ptr);
/* ... */
}
void setup_hooks(void)
{
PyMemAllocator alloc;
static int installed = 0;
if (installed)
return;
installed = 1;
alloc.malloc = hook_malloc;
alloc.realloc = hook_realloc;
alloc.free = hook_free;
PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &hook.raw);
PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &hook.mem);
PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &hook.obj);
alloc.ctx = &hook.raw;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
alloc.ctx = &hook.mem;
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
alloc.ctx = &hook.obj;
PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
}
.. note::
``PyMem_SetupDebugHooks()`` does not need to be called because
memory allocators are not replaced: the debug checks on memory
block allocators are installed automatically at startup.
Performance
============
The implementation of this PEP (issue #3329) has no visible overhead on
the Python benchmark suite.
Results of the `Python benchmarks suite
<http://hg.python.org/benchmarks>`_ (-b 2n3): some tests are 1.04x
faster, some tests are 1.04x slower. Results of the pybench microbenchmark:
"+0.1%" slower globally (diffs ranging between -4.9% and +5.6%).
The full output of the benchmarks is attached to issue #3329.
Rejected Alternatives
=====================
More specific functions to get/set memory allocators
----------------------------------------------------
A larger set of C API functions was originally proposed, with one pair
of functions for each allocator domain:
* ``void PyMem_GetRawAllocator(PyMemAllocator *allocator)``
* ``void PyMem_GetAllocator(PyMemAllocator *allocator)``
* ``void PyObject_GetAllocator(PyMemAllocator *allocator)``
* ``void PyMem_SetRawAllocator(PyMemAllocator *allocator)``
* ``void PyMem_SetAllocator(PyMemAllocator *allocator)``
* ``void PyObject_SetAllocator(PyMemAllocator *allocator)``
This alternative was rejected because it is not possible to write
generic code with more specific functions: code must be duplicated for
each memory allocator domain.
Make PyMem_Malloc() reuse PyMem_RawMalloc() by default
------------------------------------------------------
If ``PyMem_Malloc()`` called ``PyMem_RawMalloc()`` by default,
calling ``PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc)`` would also
patch ``PyMem_Malloc()`` indirectly.
This alternative was rejected because ``PyMem_SetAllocator()`` would
have a different behaviour depending on the domain. Always having the
same behaviour is less error-prone.
Add a new PYDEBUGMALLOC environment variable
--------------------------------------------
It was proposed to add a new ``PYDEBUGMALLOC`` environment variable to
enable debug checks on memory block allocators. It would have had the same
effect as calling ``PyMem_SetupDebugHooks()``, without the need to
write any C code. Another advantage would be the ability to enable debug
checks even in release mode: debug checks would always be compiled in, but only
enabled when the environment variable is present and non-empty.
This alternative was rejected because a new environment variable would
make Python initialization even more complex. `PEP 432
<http://www.python.org/dev/peps/pep-0432/>`_ tries to simplify the
CPython startup sequence.
Use macros to get customizable allocators
-----------------------------------------
To have no overhead in the default configuration, customizable
allocators would be an optional feature enabled by a configuration
option or by macros.
This alternative was rejected because the use of macros implies having
to recompile extension modules to use the new allocator and allocator
hooks. Not having to recompile either Python or extension modules makes debug
hooks easier to use in practice.
Pass the C filename and line number
-----------------------------------
Define allocator functions as macros using ``__FILE__`` and ``__LINE__``
to get the C filename and line number of a memory allocation.
Example of ``PyMem_Malloc`` macro with the modified
``PyMemAllocator`` structure::
typedef struct {
/* user context passed as the first argument
to the 3 functions */
void *ctx;
/* allocate a memory block */
void* (*malloc) (void *ctx, const char *filename, int lineno,
size_t size);
/* allocate or resize a memory block */
void* (*realloc) (void *ctx, const char *filename, int lineno,
void *ptr, size_t new_size);
/* release a memory block */
void (*free) (void *ctx, const char *filename, int lineno,
void *ptr);
} PyMemAllocator;
void* _PyMem_MallocTrace(const char *filename, int lineno,
size_t size);
/* the function is still needed for the Python stable ABI */
void* PyMem_Malloc(size_t size);
#define PyMem_Malloc(size) \
_PyMem_MallocTrace(__FILE__, __LINE__, size)
The GC allocator functions would also have to be patched. For example,
``_PyObject_GC_Malloc()`` is used in many C functions and so objects of
different types would have the same allocation location.
This alternative was rejected because passing a filename and a line
number to each allocator makes the API more complex: three new
arguments (ctx, filename, lineno) would be passed to each allocator function, instead of
just a context argument (ctx). Having to also modify GC allocator
functions adds too much complexity for a little gain.
GIL-free PyMem_Malloc()
-----------------------
In Python 3.3, when Python is compiled in debug mode, ``PyMem_Malloc()``
indirectly calls ``PyObject_Malloc()`` which requires the GIL to be
held (it isn't thread-safe). That's why ``PyMem_Malloc()`` must be called
with the GIL held.
This PEP changes ``PyMem_Malloc()``: it now always calls ``malloc()``
rather than ``PyObject_Malloc()``. The "GIL must be held" restriction
could therefore be removed from ``PyMem_Malloc()``.
This alternative was rejected because allowing ``PyMem_Malloc()`` to be
called without holding the GIL can break applications which set up their
own allocators or allocator hooks. Holding the GIL is
convenient to develop a custom allocator: no need to care about other
threads. It is also convenient for a debug allocator hook: Python
objects can be safely inspected, and the C API may be used for reporting.
Moreover, calling ``PyGILState_Ensure()`` in a memory allocator has
unexpected behaviour, especially at Python startup and when creating a
new Python thread state. It is better to free custom allocators of
the responsibility of acquiring the GIL.
Don't add PyMem_RawMalloc()
---------------------------
Replace ``malloc()`` with ``PyMem_Malloc()``, but only if the GIL is
held. Otherwise, keep ``malloc()`` unchanged.
``PyMem_Malloc()`` is used without the GIL held in some Python
functions. For example, the ``main()`` and ``Py_Main()`` functions of
Python call ``PyMem_Malloc()`` while the GIL does not exist yet. In this
case, ``PyMem_Malloc()`` would be replaced with ``malloc()`` (or
``PyMem_RawMalloc()``).
This alternative was rejected because ``PyMem_RawMalloc()`` is required
for accurate reports of the memory usage. When a debug hook is used to
track the memory usage, the memory allocated by direct calls to
``malloc()`` cannot be tracked. ``PyMem_RawMalloc()`` can be hooked and
so all the memory allocated by Python can be tracked, including
memory allocated without holding the GIL.
Use existing debug tools to analyze memory use
----------------------------------------------
There are many existing debug tools to analyze memory use. Some
examples: `Valgrind <http://valgrind.org/>`_, `Purify
<http://ibm.com/software/awdtools/purify/>`_, `Clang AddressSanitizer
<http://code.google.com/p/address-sanitizer/>`_, `failmalloc
<http://www.nongnu.org/failmalloc/>`_, etc.
The problem is to retrieve the Python object related to a memory pointer
to read its type and/or its content. Another issue is to retrieve the
source of the memory allocation: the C backtrace is usually useless
(the same reasoning as for macros using ``__FILE__`` and ``__LINE__``, see
`Pass the C filename and line number`_); the Python filename and line
number (or even the Python traceback) is more useful.
This alternative was rejected because classic tools are unable to
introspect Python internals to collect such information. Being able to
set up a hook on allocators called with the GIL held makes it possible to
collect a lot of useful data from Python internals.
Add a msize() function
----------------------
Add another function to ``PyMemAllocator`` and
``PyObjectArenaAllocator`` structures::
size_t msize(void *ptr);
This function returns the size of a memory block or a memory mapping.
It returns ``(size_t)-1`` if the function is not implemented or if the
pointer is unknown (e.g. a *NULL* pointer).
On Windows, this function can be implemented using ``_msize()`` and
``VirtualQuery()``.
The function can be used to implement a hook tracking the memory usage.
The ``free()`` method of an allocator only gets the address of a memory
block, whereas the size of the memory block is required to update the
memory usage.
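
A tracking hook therefore has to record the size of each live block
itself; roughly (a language-agnostic sketch, written here in Python)::

    allocated = {}   # address -> size of each live memory block
    total = 0

    def on_malloc(address, size):
        global total
        allocated[address] = size
        total += size

    def on_free(address):
        # free() only supplies the address, so the size must come from
        # the hook's own records.
        global total
        total -= allocated.pop(address, 0)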
The additional ``msize()`` function was rejected because only a few
platforms implement it. For example, Linux with the GNU libc does not
provide a function to get the size of a memory block. ``msize()`` is not
currently used in the Python source code. The function would only be
used to track memory use, and would make the API more complex. A debug hook
can implement the function internally; there is no need to add it to
``PyMemAllocator`` and ``PyObjectArenaAllocator`` structures.
No context argument
-------------------
Simplify the signature of allocator functions, remove the context
argument:
* ``void* malloc(size_t size)``
* ``void* realloc(void *ptr, size_t new_size)``
* ``void free(void *ptr)``
It is likely for an allocator hook to be reused for
``PyMem_SetAllocator()`` and ``PyObject_SetAllocator()``, or even
``PyMem_SetRawAllocator()``, but the hook must call a different function
depending on the allocator. The context is a convenient way to reuse the
same custom allocator or hook for different Python allocators.
In C++, the context can be used to pass *this*.
External Libraries
==================
Examples of APIs used to customize memory allocators.
Libraries used by Python:
* OpenSSL: `CRYPTO_set_mem_functions()
<http://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=crypto/mem.c;h=f7984fa958eb1edd6c61f6667f3f2b29753be662;hb=HEAD#l124>`_
to set memory management functions globally
* expat: `parserCreate()
<http://hg.python.org/cpython/file/cc27d50bd91a/Modules/expat/xmlparse.c#l724>`_
has a per-instance memory handler
* zlib: `zlib 1.2.8 Manual <http://www.zlib.net/manual.html#Usage>`_,
pass an opaque pointer
* bz2: `bzip2 and libbzip2, version 1.0.5
<http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html>`_,
pass an opaque pointer
* lzma: `LZMA SDK - How to Use
<http://www.asawicki.info/news_1368_lzma_sdk_-_how_to_use.html>`_,
pass an opaque pointer
* libmpdec: no opaque pointer (classic malloc API)
Other libraries:
* glib: `g_mem_set_vtable()
<http://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-mem-set-vtable>`_
* libxml2:
`xmlGcMemSetup() <http://xmlsoft.org/html/libxml-xmlmemory.html>`_,
global
* Oracle's OCI: `Oracle Call Interface Programmer's Guide,
Release 2 (9.2)
<http://docs.oracle.com/cd/B10501_01/appdev.920/a96584/oci15re4.htm>`_,
pass an opaque pointer
The new *ctx* parameter of this PEP was inspired by the API of zlib and
Oracle's OCI libraries.
See also the `GNU libc: Memory Allocation Hooks
<http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html>`_
which uses a different approach to hook memory allocators.
Memory Allocators
=================
The C standard library provides the well known ``malloc()`` function.
Its implementation depends on the platform and on the C library. The GNU
C library uses a modified ptmalloc2, based on "Doug Lea's Malloc"
(dlmalloc). FreeBSD uses `jemalloc
<http://www.canonware.com/jemalloc/>`_. Google provides *tcmalloc* which
is part of `gperftools <http://code.google.com/p/gperftools/>`_.
``malloc()`` uses two kinds of memory: heap and memory mappings. Memory
mappings are usually used for large allocations (e.g. larger than 256
KB), whereas the heap is used for small allocations.
On UNIX, the heap is handled by ``brk()`` and ``sbrk()`` system calls,
and it is contiguous. On Windows, the heap is handled by
``HeapAlloc()`` and can be discontiguous. Memory mappings are handled by
``mmap()`` on UNIX and ``VirtualAlloc()`` on Windows; they can be
discontiguous.
Releasing a memory mapping immediately gives the memory back to the
system. On UNIX, the heap memory is only given back to the system if the
released block is located at the end of the heap. Otherwise, the memory
will only be given back to the system when all the memory located after
the released memory is also released.
To allocate memory on the heap, an allocator tries to reuse free space.
If there is no contiguous space big enough, the heap must be enlarged,
even if there is more free space than the requested size. This issue is
called "memory fragmentation": the memory usage seen by the system
is higher than the real usage. On Windows, ``HeapAlloc()`` creates
a new memory mapping with ``VirtualAlloc()`` if there is not enough free
contiguous memory.
CPython has a *pymalloc* allocator for allocations smaller than 512
bytes. This allocator is optimized for small objects with a short
lifetime. It uses memory mappings called "arenas" with a fixed size of
256 KB.
Other allocators:
* Windows provides a `Low-fragmentation Heap
<http://msdn.microsoft.com/en-us/library/windows/desktop/aa366750%28v=vs.85%29.aspx>`_.
* The Linux kernel uses `slab allocation
<http://en.wikipedia.org/wiki/Slab_allocation>`_.
* The glib library has a `Memory Slice API
<https://developer.gnome.org/glib/unstable/glib-Memory-Slices.html>`_:
an efficient way to allocate groups of equal-sized chunks of memory
This PEP makes it possible to choose exactly which memory allocator is used
for your application, depending on how it uses memory (number of allocations,
size of allocations, lifetime of objects, etc.).
Links
=====
CPython issues related to memory allocation:
* `Issue #3329: Add new APIs to customize memory allocators
<http://bugs.python.org/issue3329>`_
* `Issue #13483: Use VirtualAlloc to allocate memory arenas
<http://bugs.python.org/issue13483>`_
* `Issue #16742: PyOS_Readline drops GIL and calls PyOS_StdioReadline,
which isn't thread safe <http://bugs.python.org/issue16742>`_
* `Issue #18203: Replace calls to malloc() with PyMem_Malloc() or
PyMem_RawMalloc() <http://bugs.python.org/issue18203>`_
* `Issue #18227: Use Python memory allocators in external libraries like
zlib or OpenSSL <http://bugs.python.org/issue18227>`_
Projects analyzing the memory usage of Python applications:
* `pytracemalloc
<https://pypi.python.org/pypi/pytracemalloc>`_
* `Meliae: Python Memory Usage Analyzer
<https://pypi.python.org/pypi/meliae>`_
* `Guppy-PE: umbrella package combining Heapy and GSL
<http://guppy-pe.sourceforge.net/>`_
* `PySizer (developed for Python 2.4)
<http://pysizer.8325.org/>`_
Copyright
=========
This document has been placed into the public domain.

248
pep-0446.txt Normal file
View File

@ -0,0 +1,248 @@
PEP: 446
Title: Add new parameters to configure the inheritance of files and for non-blocking sockets
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-July-2013
Python-Version: 3.4
Abstract
========
This PEP proposes new portable parameters and functions to configure the
inheritance of file descriptors and the non-blocking flag of sockets.
Rationale
=========
Inheritance of file descriptors
-------------------------------
The inheritance of file descriptors in child processes can be configured
on each file descriptor using a *close-on-exec* flag. By default, the
close-on-exec flag is not set.
On Windows, the close-on-exec flag is the inverse of ``HANDLE_FLAG_INHERIT``. File
descriptors are not inherited if the ``bInheritHandles`` parameter of
the ``CreateProcess()`` function is ``FALSE``, even if the
``HANDLE_FLAG_INHERIT`` flag is set. If ``bInheritHandles`` is ``TRUE``,
only file descriptors with ``HANDLE_FLAG_INHERIT`` flag set are
inherited, others are not.
On UNIX, the close-on-exec flag is ``O_CLOEXEC``. File descriptors with
the ``O_CLOEXEC`` flag set are closed at the execution of a new program
(ex: when calling ``execv()``).
The ``O_CLOEXEC`` flag has no effect on ``fork()``: all file descriptors
are inherited by the child process. Furthermore, most properties of file
descriptors are shared between the parent and the child processes,
except file attributes, which are duplicated (``O_CLOEXEC`` is the only
file attribute). Setting the ``O_CLOEXEC`` flag of a file descriptor in the
child process does not change the ``O_CLOEXEC`` flag of the file
descriptor in the parent process.
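
For example, making a file descriptor non-inheritable on UNIX currently
takes two extra system calls after its creation::

    import fcntl
    import os

    rfd, wfd = os.pipe()
    flags = fcntl.fcntl(rfd, fcntl.F_GETFD)
    fcntl.fcntl(rfd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)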
Issues of the inheritance of file descriptors
---------------------------------------------
Inheritance of file descriptors causes issues. For example, closing a
file descriptor in the parent process does not release the resource
(file, socket, ...), because the file descriptor is still open in the
child process.
Leaking file descriptors is also a major security vulnerability. An
untrusted child process can read sensitive data like passwords and take
control of the parent process through leaked file descriptors. For
example, a leaked file descriptor is a known way to escape from a chroot.
Non-blocking sockets
--------------------
To handle multiple network clients in a single thread, a multiplexing
function like ``select()`` can be used. For best performance, sockets
must be configured as non-blocking. Operations like ``send()`` and
``recv()`` return an ``EAGAIN`` or ``EWOULDBLOCK`` error if the
operation would block.
By default, newly created sockets are blocking. Setting the non-blocking
mode requires additional system calls.
On UNIX, the blocking flag is ``O_NONBLOCK``: a pipe and a socket are
non-blocking if the ``O_NONBLOCK`` flag is set.
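
For example, a socket is created blocking and has to be switched
afterwards, costing an extra system call::

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)   # sets O_NONBLOCK on UNIX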
Setting flags at the creation of the file descriptor
----------------------------------------------------
Windows and recent versions of other operating systems like Linux
support setting the close-on-exec flag directly at the creation of file
descriptors, and close-on-exec and blocking flags at the creation of
sockets.
Setting these flags at creation time is atomic and avoids additional
system calls.
Proposal
========
New cloexec and blocking Parameters
-----------------------------------
Add a new optional *cloexec* parameter to functions creating file descriptors:
* ``io.FileIO``
* ``io.open()``
* ``open()``
* ``os.dup()``
* ``os.dup2()``
* ``os.fdopen()``
* ``os.open()``
* ``os.openpty()``
* ``os.pipe()``
* ``select.devpoll()``
* ``select.epoll()``
* ``select.kqueue()``
Add new optional *cloexec* and *blocking* parameters to functions
creating sockets:
* ``asyncore.dispatcher.create_socket()``
* ``socket.socket()``
* ``socket.socket.accept()``
* ``socket.socket.dup()``
* ``socket.socket.fromfd()``
* ``socket.socketpair()``
The default value of *cloexec* is ``False`` and the default value of
*blocking* is ``True``.
Atomicity is not guaranteed. If the platform does not support
setting close-on-exec and blocking flags at the creation of the file
descriptor or socket, the flags are set using additional system calls.
New Functions
-------------
Add new functions to get and set the close-on-exec flag of a file
descriptor, available on all platforms:
* ``os.get_cloexec(fd:int) -> bool``
* ``os.set_cloexec(fd:int, cloexec: bool)``
Add new functions to get and set the blocking flag of a file
descriptor, only available on UNIX:
* ``os.get_blocking(fd:int) -> bool``
* ``os.set_blocking(fd:int, blocking: bool)``
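
On UNIX, these functions could be emulated today with the ``fcntl``
module; a rough sketch, not the proposed implementation::

    import fcntl
    import os

    def get_blocking(fd):
        return not (fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)

    def set_blocking(fd, blocking):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        if blocking:
            flags &= ~os.O_NONBLOCK
        else:
            flags |= os.O_NONBLOCK
        fcntl.fcntl(fd, fcntl.F_SETFL, flags)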
Other Changes
-------------
The ``subprocess.Popen`` class must clear the close-on-exec flag of file
descriptors of the ``pass_fds`` parameter. The flag is cleared in the
child process before executing the program; this does not change
the flag in the parent process.
The close-on-exec flag must also be set on private file descriptors and
sockets in the Python standard library. For example, on UNIX,
``os.urandom()`` opens ``/dev/urandom`` to read some random bytes and the
file descriptor is closed at function exit. The file descriptor is not
expected to be inherited by child processes.
Rejected Alternatives
=====================
PEP 433
-------
The PEP 433 entitled "Easier suppression of file descriptor inheritance"
is a previous attempt proposing various other alternatives, but no
consensus could be reached.
This PEP has a well-defined behaviour (the default value of the new
*cloexec* parameter is not configurable), is more conservative (no
backward compatibility issue), and is much simpler.
Add blocking parameter for file descriptors and use Windows overlapped I/O
--------------------------------------------------------------------------
Windows supports non-blocking operations on files using an extension of
the Windows API called "Overlapped I/O". Using this extension requires
modifying the Python standard library and applications to pass an
``OVERLAPPED`` structure and an event loop to wait for the completion of
operations.
This PEP only tries to expose portable flags on file descriptors and
sockets. Supporting overlapped I/O requires an abstraction providing a
high-level and portable API for asynchronous operations on files and
sockets. Overlapped I/O is out of the scope of this PEP.
UNIX supports non-blocking files; moreover, recent versions of operating
systems support setting the non-blocking flag at the creation of a file
descriptor. It would be possible to add a new optional *blocking*
parameter to Python functions creating file descriptors. On Windows,
creating a file descriptor with ``blocking=False`` would raise a
``NotImplementedError``. This behaviour is not acceptable for the ``os``
module which is designed as a thin wrapper on the C functions of the
operating system. If a platform does not support a function, the
function should not be available on the platform. For example,
the ``os.fork()`` function is not available on Windows.
UNIX has more flags on file descriptors: ``O_DSYNC``, ``O_SYNC``,
``O_DIRECT``, etc. Adding all these flags complicates the signature and
the implementation of functions creating file descriptors, like ``open()``.
Moreover, these flags do not work for all file types, and are not
portable.
For all these reasons, this alternative was rejected. PEP 3156
proposes an abstraction for asynchronous I/O supporting non-blocking
files on Windows.
Links
=====
Python issues:
* `#10115: Support accept4() for atomic setting of flags at socket
creation <http://bugs.python.org/issue10115>`_
* `#12105: open() does not able to set flags, such as O_CLOEXEC
<http://bugs.python.org/issue12105>`_
* `#12107: TCP listening sockets created without FD_CLOEXEC flag
<http://bugs.python.org/issue12107>`_
* `#16850: Add "e" mode to open(): close-and-exec
(O_CLOEXEC) / O_NOINHERIT <http://bugs.python.org/issue16850>`_
* `#16860: Use O_CLOEXEC in the tempfile module
<http://bugs.python.org/issue16860>`_
* `#16946: subprocess: _close_open_fd_range_safe() does not set
close-on-exec flag on Linux < 2.6.23 if O_CLOEXEC is defined
<http://bugs.python.org/issue16946>`_
* `#17070: Use the new cloexec to improve security and avoid bugs
<http://bugs.python.org/issue17070>`_
Other links:
* `Secure File Descriptor Handling
<http://udrepper.livejournal.com/20407.html>`_ (Ulrich Drepper,
2008)
* `Ghosts of Unix past, part 2: Conflated designs
<http://lwn.net/Articles/412131/>`_ (Neil Brown, 2010) explains the
history of ``O_CLOEXEC`` and ``O_NONBLOCK`` flags
Copyright
=========
This document has been placed into the public domain.

408
pep-0447.txt Normal file
View File

@@ -0,0 +1,408 @@
PEP: 447
Title: Add __locallookup__ method to metaclass
Version: $Revision$
Last-Modified: $Date$
Author: Ronald Oussoren <ronaldoussoren@mac.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jun-2013
Post-History: 2-Jul-2013, 15-Jul-2013, 29-Jul-2013
Abstract
========
Currently ``object.__getattribute__`` and ``super.__getattribute__`` peek
in the ``__dict__`` of classes on the MRO for a class when looking for
an attribute. This PEP adds an optional ``__locallookup__`` method to
a metaclass that can be used to override this behavior.
Rationale
=========
It is currently not possible to influence how the `super class`_ looks
up attributes (that is, ``super.__getattribute__`` unconditionally
peeks in the class ``__dict__``), and that can be problematic for
dynamic classes that can grow new methods on demand.
The ``__locallookup__`` method makes it possible to dynamically add
attributes even when looking them up using the `super class`_.
The new method affects ``object.__getattribute__`` (and
`PyObject_GenericGetAttr`_) as well for consistency.
Background
----------
The current behavior of ``super.__getattribute__`` causes problems for
classes that are dynamic proxies for other (non-Python) classes or types,
an example of which is `PyObjC`_. PyObjC creates a Python class for every
class in the Objective-C runtime, and looks up methods in the Objective-C
runtime when they are used. This works fine for normal access, but doesn't
work for access with ``super`` objects. Because of this, PyObjC
currently includes a custom ``super`` that must be used with its
classes.
The API in this PEP makes it possible to remove the custom ``super`` and
simplifies the implementation because the custom lookup behavior can be
added in a central location.
The superclass attribute lookup hook
====================================
Both ``super.__getattribute__`` and ``object.__getattribute__`` (or
`PyObject_GenericGetAttr`_ in C code) walk an object's MRO and peek in
the class ``__dict__`` to look up attributes. A way to affect this
lookup is to use a method on the metaclass of the type that, by default,
looks up the name in the class ``__dict__``.
In Python code
--------------
A metatype can define a method ``__locallookup__`` that is called during
attribute resolution by both ``super.__getattribute__`` and
``object.__getattribute__``::
class MetaType(type):
def __locallookup__(cls, name):
try:
return cls.__dict__[name]
except KeyError:
raise AttributeError(name) from None
The ``__locallookup__`` method takes two arguments: a class, and the
name of the attribute being looked up. It should return the value of
the attribute without invoking descriptors, or raise `AttributeError`_
when the name cannot be found.
The `type`_ class provides a default implementation for ``__locallookup__``, that
looks up the name in the class dictionary.
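
To illustrate, attribute lookup along the MRO could then be approximated
in pure Python as follows (a simplified sketch that ignores descriptors
and the instance ``__dict__``; the name ``_lookup_in_mro`` is
illustrative, not part of the proposal)::

    def _lookup_in_mro(cls, name):
        # Ask each class's metaclass for the attribute instead of
        # peeking in the class __dict__ directly.
        for klass in cls.__mro__:
            try:
                return type(klass).__locallookup__(klass, name)
            except AttributeError:
                continue
        raise AttributeError(name)
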
Example usage
.............
The code below implements a silly metaclass that redirects attribute lookup to uppercase
versions of names::
    class UpperCaseAccess (type):
        def __locallookup__(cls, name):
            try:
                return cls.__dict__[name.upper()]
            except KeyError:
                raise AttributeError(name) from None

    class SillyObject (metaclass=UpperCaseAccess):
        def m(self):
            return 42

        def M(self):
            return "fortytwo"

    obj = SillyObject()
    assert obj.m() == "fortytwo"
In C code
---------
A new slot ``tp_locallookup`` is added to the ``PyTypeObject`` struct;
this slot corresponds to the ``__locallookup__`` method on `type`_.
The slot has the following prototype::
PyObject* (*locallookupfunc)(PyTypeObject* cls, PyObject* name);
This method should look up *name* in the namespace of *cls*, without
looking at superclasses, and should not invoke descriptors. The method
returns ``NULL`` without setting an exception when *name* cannot be
found, and otherwise returns a new reference (not a borrowed reference).
Use of this hook by the interpreter
-----------------------------------
The new method is required for metatypes and as such is defined on
`type`_. Both ``super.__getattribute__`` and
``object.__getattribute__``/`PyObject_GenericGetAttr`_ (through
``_PyType_Lookup``) use this ``__locallookup__`` method when walking
the MRO.
Other changes to the implementation
-----------------------------------
The change for `PyObject_GenericGetAttr`_ will be done by changing the
private function ``_PyType_Lookup``. This function currently returns a
borrowed reference, but must return a new reference when the
``__locallookup__`` method is present. Because of this,
``_PyType_Lookup`` will be renamed to ``_PyType_LookupName``; the rename
will cause compile-time errors for all out-of-tree users of this private
API.
The attribute lookup cache in ``Objects/typeobject.c`` is disabled for classes that have a
metaclass that overrides ``__locallookup__``, because using the cache might not be valid
for such classes.
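
For example, a metaclass along the following lines would produce a fresh
value on every lookup, so any previously cached result would immediately
be stale (an illustrative sketch)::

    import time

    class Dynamic(type):
        def __locallookup__(cls, name):
            if name == "now":
                # A new function object on every lookup: a cached
                # result from an earlier lookup would be incorrect.
                return lambda self: time.time()
            try:
                return cls.__dict__[name]
            except KeyError:
                raise AttributeError(name) from None
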
Performance impact
------------------
The pybench output below compares an implementation of this PEP with the
regular source tree, both based on changeset a5681f50bae2, run on an
otherwise idle machine with a Core i7 processor running CentOS 6.4.
Even though the machine was idle, there were clear differences between
runs; I've seen the difference in "minimum time" vary from -0.1% to
+1.5%, with similar (but slightly smaller) differences in "average
time".
::
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.4.0a0 (default, Jul 29 2013, 13:01:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.perf_counter
* timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)
-------------------------------------------------------------------------------
Benchmark: pep447.pybench
-------------------------------------------------------------------------------
Rounds: 10
Warp: 10
Timer: time.perf_counter
Machine Details:
Platform ID: Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
Processor: x86_64
Python:
Implementation: CPython
Executable: /tmp/default-pep447/bin/python3
Version: 3.4.0a0
Compiler: GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
Bits: 64bit
Build: Jul 29 2013 14:09:12 (#default)
Unicode: UCS4
-------------------------------------------------------------------------------
Comparing with: default.pybench
-------------------------------------------------------------------------------
Rounds: 10
Warp: 10
Timer: time.perf_counter
Machine Details:
Platform ID: Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
Processor: x86_64
Python:
Implementation: CPython
Executable: /tmp/default/bin/python3
Version: 3.4.0a0
Compiler: GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
Bits: 64bit
Build: Jul 29 2013 13:01:34 (#default)
Unicode: UCS4
Test minimum run-time average run-time
this other diff this other diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls: 45ms 44ms +1.3% 45ms 44ms +1.3%
BuiltinMethodLookup: 26ms 27ms -2.4% 27ms 27ms -2.2%
CompareFloats: 33ms 34ms -0.7% 33ms 34ms -1.1%
CompareFloatsIntegers: 66ms 67ms -0.9% 66ms 67ms -0.8%
CompareIntegers: 51ms 50ms +0.9% 51ms 50ms +0.8%
CompareInternedStrings: 34ms 33ms +0.4% 34ms 34ms -0.4%
CompareLongs: 29ms 29ms -0.1% 29ms 29ms -0.0%
CompareStrings: 43ms 44ms -1.8% 44ms 44ms -1.8%
ComplexPythonFunctionCalls: 44ms 42ms +3.9% 44ms 42ms +4.1%
ConcatStrings: 33ms 33ms -0.4% 33ms 33ms -1.0%
CreateInstances: 47ms 48ms -2.9% 47ms 49ms -3.4%
CreateNewInstances: 35ms 36ms -2.5% 36ms 36ms -2.5%
CreateStringsWithConcat: 69ms 70ms -0.7% 69ms 70ms -0.9%
DictCreation: 52ms 50ms +3.1% 52ms 50ms +3.0%
DictWithFloatKeys: 40ms 44ms -10.1% 43ms 45ms -5.8%
DictWithIntegerKeys: 32ms 36ms -11.2% 35ms 37ms -4.6%
DictWithStringKeys: 29ms 34ms -15.7% 35ms 40ms -11.0%
ForLoops: 30ms 29ms +2.2% 30ms 29ms +2.2%
IfThenElse: 38ms 41ms -6.7% 38ms 41ms -6.9%
ListSlicing: 36ms 36ms -0.7% 36ms 37ms -1.3%
NestedForLoops: 43ms 45ms -3.1% 43ms 45ms -3.2%
NestedListComprehensions: 39ms 40ms -1.7% 39ms 40ms -2.1%
NormalClassAttribute: 86ms 82ms +5.1% 86ms 82ms +5.0%
NormalInstanceAttribute: 42ms 42ms +0.3% 42ms 42ms +0.0%
PythonFunctionCalls: 39ms 38ms +3.5% 39ms 38ms +2.8%
PythonMethodCalls: 51ms 49ms +3.0% 51ms 50ms +2.8%
Recursion: 67ms 68ms -1.4% 67ms 68ms -1.4%
SecondImport: 41ms 36ms +12.5% 41ms 36ms +12.6%
SecondPackageImport: 45ms 40ms +13.1% 45ms 40ms +13.2%
SecondSubmoduleImport: 92ms 95ms -2.4% 95ms 98ms -3.6%
SimpleComplexArithmetic: 28ms 28ms -0.1% 28ms 28ms -0.2%
SimpleDictManipulation: 57ms 57ms -1.0% 57ms 58ms -1.0%
SimpleFloatArithmetic: 29ms 28ms +4.7% 29ms 28ms +4.9%
SimpleIntFloatArithmetic: 37ms 41ms -8.5% 37ms 41ms -8.7%
SimpleIntegerArithmetic: 37ms 41ms -9.4% 37ms 42ms -10.2%
SimpleListComprehensions: 33ms 33ms -1.9% 33ms 34ms -2.9%
SimpleListManipulation: 28ms 30ms -4.3% 29ms 30ms -4.1%
SimpleLongArithmetic: 26ms 26ms +0.5% 26ms 26ms +0.5%
SmallLists: 40ms 40ms +0.1% 40ms 40ms +0.1%
SmallTuples: 46ms 47ms -2.4% 46ms 48ms -3.0%
SpecialClassAttribute: 126ms 120ms +4.7% 126ms 121ms +4.4%
SpecialInstanceAttribute: 42ms 42ms +0.6% 42ms 42ms +0.8%
StringMappings: 94ms 91ms +3.9% 94ms 91ms +3.8%
StringPredicates: 48ms 49ms -1.7% 48ms 49ms -2.1%
StringSlicing: 45ms 45ms +1.4% 46ms 45ms +1.5%
TryExcept: 23ms 22ms +4.9% 23ms 22ms +4.8%
TryFinally: 32ms 32ms -0.1% 32ms 32ms +0.1%
TryRaiseExcept: 17ms 17ms +0.9% 17ms 17ms +0.5%
TupleSlicing: 49ms 48ms +1.1% 49ms 49ms +1.0%
WithFinally: 48ms 47ms +2.3% 48ms 47ms +2.4%
WithRaiseExcept: 45ms 44ms +0.8% 45ms 45ms +0.5%
-------------------------------------------------------------------------------
Totals: 2284ms 2287ms -0.1% 2306ms 2308ms -0.1%
(this=pep447.pybench, other=default.pybench)
A run of the benchmark suite (with option "-b 2n3") also seems to indicate that
the performance impact is minimal::
Report on Linux fangorn.local 2.6.32-358.114.1.openstack.el6.x86_64 #1 SMP Wed Jul 3 02:11:25 EDT 2013 x86_64 x86_64
Total CPU cores: 8
### call_method_slots ###
Min: 0.304120 -> 0.282791: 1.08x faster
Avg: 0.304394 -> 0.282906: 1.08x faster
Significant (t=2329.92)
Stddev: 0.00016 -> 0.00004: 4.1814x smaller
### call_simple ###
Min: 0.249268 -> 0.221175: 1.13x faster
Avg: 0.249789 -> 0.221387: 1.13x faster
Significant (t=2770.11)
Stddev: 0.00012 -> 0.00013: 1.1101x larger
### django_v2 ###
Min: 0.632590 -> 0.601519: 1.05x faster
Avg: 0.635085 -> 0.602653: 1.05x faster
Significant (t=321.32)
Stddev: 0.00087 -> 0.00051: 1.6933x smaller
### fannkuch ###
Min: 1.033181 -> 0.999779: 1.03x faster
Avg: 1.036457 -> 1.001840: 1.03x faster
Significant (t=260.31)
Stddev: 0.00113 -> 0.00070: 1.6112x smaller
### go ###
Min: 0.526714 -> 0.544428: 1.03x slower
Avg: 0.529649 -> 0.547626: 1.03x slower
Significant (t=-93.32)
Stddev: 0.00136 -> 0.00136: 1.0028x smaller
### iterative_count ###
Min: 0.109748 -> 0.116513: 1.06x slower
Avg: 0.109816 -> 0.117202: 1.07x slower
Significant (t=-357.08)
Stddev: 0.00008 -> 0.00019: 2.3664x larger
### json_dump_v2 ###
Min: 2.554462 -> 2.609141: 1.02x slower
Avg: 2.564472 -> 2.620013: 1.02x slower
Significant (t=-76.93)
Stddev: 0.00538 -> 0.00481: 1.1194x smaller
### meteor_contest ###
Min: 0.196336 -> 0.191925: 1.02x faster
Avg: 0.196878 -> 0.192698: 1.02x faster
Significant (t=61.86)
Stddev: 0.00053 -> 0.00041: 1.2925x smaller
### nbody ###
Min: 0.228039 -> 0.235551: 1.03x slower
Avg: 0.228857 -> 0.236052: 1.03x slower
Significant (t=-54.15)
Stddev: 0.00130 -> 0.00029: 4.4810x smaller
### pathlib ###
Min: 0.108501 -> 0.105339: 1.03x faster
Avg: 0.109084 -> 0.105619: 1.03x faster
Significant (t=311.08)
Stddev: 0.00022 -> 0.00011: 1.9314x smaller
### regex_effbot ###
Min: 0.057905 -> 0.056447: 1.03x faster
Avg: 0.058055 -> 0.056760: 1.02x faster
Significant (t=79.22)
Stddev: 0.00006 -> 0.00015: 2.7741x larger
### silent_logging ###
Min: 0.070810 -> 0.072436: 1.02x slower
Avg: 0.070899 -> 0.072609: 1.02x slower
Significant (t=-191.59)
Stddev: 0.00004 -> 0.00008: 2.2640x larger
### spectral_norm ###
Min: 0.290255 -> 0.299286: 1.03x slower
Avg: 0.290335 -> 0.299541: 1.03x slower
Significant (t=-572.10)
Stddev: 0.00005 -> 0.00015: 2.8547x larger
### threaded_count ###
Min: 0.107215 -> 0.115206: 1.07x slower
Avg: 0.107488 -> 0.115996: 1.08x slower
Significant (t=-109.39)
Stddev: 0.00016 -> 0.00076: 4.8665x larger
The following not significant results are hidden, use -v to show them:
call_method, call_method_unknown, chaos, fastpickle, fastunpickle, float, formatted_logging, hexiom2, json_load, normal_startup, nqueens, pidigits, raytrace, regex_compile, regex_v8, richards, simple_logging, startup_nosite, telco, unpack_sequence.
Alternative proposals
---------------------
``__getattribute_super__``
..........................
An earlier version of this PEP used the following static method on classes::
def __getattribute_super__(cls, name, object, owner): pass
This method performed the name lookup and also invoked descriptors, and
was necessarily limited to working only with ``super.__getattribute__``.
Reuse ``tp_getattro``
.....................
It would be nice to avoid adding a new slot, thus keeping the API
simpler and easier to understand. A comment on `Issue 18181`_ asked
about reusing the ``tp_getattro`` slot: that is, ``super`` could call
the ``tp_getattro`` slot of all classes along the MRO.
That won't work because ``tp_getattro`` will look in the instance
``__dict__`` before it tries to resolve attributes using the classes in
the MRO. This means that using ``tp_getattro`` instead of peeking in
the class dictionaries would change the semantics of the `super class`_.
References
==========
* `Issue 18181`_ contains a prototype implementation
Copyright
=========
This document has been placed in the public domain.
.. _`Issue 18181`: http://bugs.python.org/issue18181
.. _`super class`: http://docs.python.org/3/library/functions.html#super
.. _`NotImplemented`: http://docs.python.org/3/library/constants.html#NotImplemented
.. _`PyObject_GenericGetAttr`: http://docs.python.org/3/c-api/object.html#PyObject_GenericGetAttr
.. _`type`: http://docs.python.org/3/library/functions.html#type
.. _`AttributeError`: http://docs.python.org/3/library/exceptions.html#AttributeError
.. _`PyObjC`: http://pyobjc.sourceforge.net/
.. _`classmethod`: http://docs.python.org/3/library/functions.html#classmethod

247
pep-0448.txt Normal file
View File

@@ -0,0 +1,247 @@
PEP: 448
Title: Additional Unpacking Generalizations
Version: $Revision$
Last-Modified: $Date$
Author: Joshua Landau <joshua@landau.ws>
Discussions-To: python-ideas@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jun-2013
Python-Version: 3.4
Post-History:
Abstract
========
This PEP proposes extended usages of the ``*`` iterable unpacking
operator to allow unpacking in more positions, an arbitrary number of
times, and in several additional circumstances.
Specifically:
Arbitrarily positioned unpacking operators::
>>> print(*[1], *[2], 3)
1 2 3
    >>> dict(**{'x': 1}, y=2, **{'z': 3})
    {'x': 1, 'y': 2, 'z': 3}
Function calls currently have the restriction that keyword arguments
must follow positional arguments and ``**`` unpackings must additionally
follow ``*`` unpackings. Because of the new levity for ``*`` and ``**``
unpackings, it may be advisable to lift some or all of these
restrictions.
As in current Python, if an argument is given multiple times -- such as
a positional argument given both positionally and by keyword -- a
TypeError is raised.
Unpacking is proposed to be allowed inside tuples, lists, sets,
dictionaries and comprehensions::
>>> *range(4), 4
(0, 1, 2, 3, 4)
>>> [*range(4), 4]
[0, 1, 2, 3, 4]
>>> {*range(4), 4}
{0, 1, 2, 3, 4}
>>> {'x': 1, **{'y': 2}}
{'x': 1, 'y': 2}
>>> ranges = [range(i) for i in range(5)]
>>> [*item for item in ranges]
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
Rationale
=========
Current usage of the ``*`` iterable unpacking operator features
unnecessary restrictions that can harm readability.
Unpacking multiple times has an obvious rationale. When you want to
unpack several iterables into a function call, or follow an unpack with
more positional arguments, the most natural way is to write::
function(**kw_arguments, **more_arguments)
function(*arguments, argument)
Simple examples where this is useful are ``print`` and ``str.format``.
Instead, you could be forced to write::
kwargs = dict(kw_arguments)
kwargs.update(more_arguments)
function(**kwargs)
args = list(arguments)
    args.append(argument)
function(*args)
or, if you know to do so::
from collections import ChainMap
    function(**ChainMap(more_arguments, kw_arguments))
from itertools import chain
    function(*chain(arguments, [argument]))
which add unnecessary line noise and, with the first approach, duplicate
work.
There are two primary rationales for unpacking inside of containers.
Firstly there is a symmetry of assignment, where ``fst, *other, lst =
elems`` and ``elems = fst, *other, lst`` are approximate inverses,
ignoring the specifics of types. This, in effect, simplifies the
language by removing special cases.
Secondly, it vastly simplifies types of "addition" such as combining
dictionaries, and does so in an unambiguous and well-defined way::
combination = {**first_dictionary, "x": 1, "y": 2}
instead of::
combination = first_dictionary.copy()
combination.update({"x": 1, "y": 2})
which is especially important in contexts where expressions are
preferred. This is also useful as a more readable way of summing
iterables into a list, such as ``my_list + list(my_tuple) +
list(my_range)`` which is now equivalent to just ``[*my_list,
*my_tuple, *my_range]``.
The addition of unpacking to comprehensions is a logical extension.
Its usage will primarily be a neat replacement for ``[i for j in
2D_list for i in j]``, as the more readable ``[*l for l in 2D_list]``.
Other uses are possible, but expected to occur rarely.
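
For comparison, the same flattening can be written in current Python
with ``itertools.chain`` (a runnable sketch; ``two_d`` is an
illustrative name)::

    from itertools import chain

    two_d = [[1, 2], [3], [4, 5, 6]]
    flat = [i for j in two_d for i in j]
    assert flat == list(chain.from_iterable(two_d))
    # Under this proposal, [*l for l in two_d] would be equivalent.
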
Specification
=============
Function calls may accept an unbounded number of ``*`` and ``**``
unpackings. There will be no restriction of the order of positional
arguments with relation to ``*`` unpackings, nor any restriction of the
order of keyword arguments with relation to ``**`` unpackings.
Function calls currently have the restriction that keyword arguments
must follow positional arguments and ``**`` unpackings must additionally
follow ``*`` unpackings. Because of the new levity for ``*`` and ``**``
unpackings, it may be advisable to lift some or all of these
restrictions.
As in current Python, if an argument is given multiple times -- such as
a positional argument given both positionally and by keyword -- a
TypeError is raised.
If the restrictions are kept, a function call will look like this::
function(
argument or *args, argument or *args, ...,
kwargument or *args, kwargument or *args, ...,
kwargument or **kwargs, kwargument or **kwargs, ...
)
If they are removed completely, a function call will look like this::
function(
argument or keyword_argument or *args or **kwargs,
argument or keyword_argument or *args or **kwargs,
...
)
Tuples, lists, sets and dictionaries will allow unpacking. This will
act as if the elements from unpacked items were inserted in order at
the site of unpacking, much as happens in unpacking in a function call.
Dictionaries require ``**`` unpacking; all the others require ``*``
unpacking.

A dictionary's keys remain in a right-to-left priority order, so
``{**{'a': 1}, 'a': 2, **{'a': 3}}`` evaluates to ``{'a': 3}``. There
is no restriction on the number or position of unpackings.
Comprehensions, by simple extension, will support unpacking. As before,
dictionaries require ``**`` unpacking, all the others require ``*``
unpacking, and key priorities are unchanged.
Examples include::
{*[1, 2, 3], 4, 5, *{6, 7, 8}}
(*e for e in [[1], [3, 4, 5], [2]])
{**dictionary for dictionary in (globals(), locals())}
{**locals(), "override": None}
Disadvantages
=============
If the current restrictions for function call arguments (keyword
arguments must follow positional arguments and ``**`` unpackings must
additionally follow ``*`` unpackings) are kept, the allowable orders
for arguments in a function call are more complicated than before.
The simplest explanation for the rules may be "positional arguments
come first and keyword arguments follow, but ``*`` unpackings are
allowed after keyword arguments".
If the current restrictions are lifted, there are no obvious gains to
code, as the only new orders that are allowed look silly: ``f(a, e=e,
d=d, b, c)`` being one of the simpler examples.
Whilst ``*elements, = iterable`` causes ``elements`` to be a list,
``elements = *iterable,`` causes ``elements`` to be a tuple. The
reason for this may not be obvious at first glance and may confuse
people unfamiliar with the construct.
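
A short demonstration of that asymmetry (the second form assumes this
proposal is available; the first works in current Python)::

    *elements, = range(3)
    assert elements == [0, 1, 2]      # assignment target: a list

    elements = *range(3),
    assert elements == (0, 1, 2)      # tuple display: a tuple
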
Implementation
==============
An implementation for an old version of Python 3 can be found at Issue
2292 on the bug tracker [1]_, although several changes should be made:
- It has yet to be updated to the most recent Python version
- It features a now-redundant replacement for "yield from" which
  should be removed
- It also loses support for calling functions with keyword arguments
  before positional arguments, which is an unnecessary
  backwards-incompatible change
- If the restrictions on the order of arguments in a function call are
partially or fully lifted, they would need to be included
References
==========
.. [1] Issue 2292, "Missing `*`-unpacking generalizations", Thomas Wouters
(http://bugs.python.org/issue2292)
.. [2] Discussion on Python-ideas list,
"list / array comprehensions extension", Alexander Heger
(http://mail.python.org/pipermail/python-ideas/2011-December/013097.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

50
pep-0466/test_cloexec.py Normal file
View File

@@ -0,0 +1,50 @@
import os, fcntl, sys, errno
def get_cloexec(fd):
try:
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
return bool(flags & fcntl.FD_CLOEXEC)
except IOError as err:
if err.errno == errno.EBADF:
return '<invalid file descriptor>'
else:
return str(err)
def set_cloexec(fd):
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
flags |= fcntl.FD_CLOEXEC
fcntl.fcntl(fd, fcntl.F_SETFD, flags)
def main():
f = open(__file__, "rb")
fd = f.fileno()
print("initial state: fd=%s, cloexec=%s" % (fd, get_cloexec(fd)))
pid = os.fork()
if not pid:
set_cloexec(fd)
print("child process after fork, set cloexec: cloexec=%s" % get_cloexec(fd))
child_argv = [sys.executable, __file__, str(fd),
'child process after exec']
os.execv(child_argv[0], child_argv)
os.waitpid(pid, 0)
print("parent process after fork: cloexec=%s" % get_cloexec(fd))
child_argv = [sys.executable, __file__, str(fd),
'parent process after exec']
os.execv(child_argv[0], child_argv)
def after_exec():
fd = int(sys.argv[1])
name = sys.argv[2]
print("%s: fd=%s, cloexec=%s"
% (name, fd, get_cloexec(fd)))
sys.exit()
if __name__ == "__main__":
if len(sys.argv) == 1:
main()
else:
after_exec()

View File

@@ -19,9 +19,11 @@ This PEP proposes the addition of an optional ``given`` clause to several
Python statements that do not currently have an associated code suite. This
clause will create a statement local namespace for additional names that are
accessible in the associated statement, but do not become part of the
containing namespace. To permit a sane implementation strategy, forward
references to names from the ``given`` clause will need to be marked
explicitly.
containing namespace.
Adoption of a new symbol, ``?``, is proposed to denote a forward reference
to the namespace created by running the associated code suite. It will be
a reference to a ``types.SimpleNamespace`` object.
The primary motivation is to enable a more declarative style of programming,
where the operation to be performed is presented to the reader first, and the
@@ -72,12 +74,16 @@ The ``given`` clause would allow subexpressions to be referenced by
name in the header line, with the actual definitions following in
the indented clause. As a simple example::
sorted_data = sorted(data, key=.sort_key) given:
sorted_data = sorted(data, key=?.sort_key) given:
def sort_key(item):
return item.attr1, item.attr2
The leading ``.`` on ``.sort_key`` indicates to the compiler that this
is a forward reference to a name defined in the ``given`` clause.
The new symbol ``?`` is used to refer to the given namespace. It would be a
``types.SimpleNamespace`` instance, so ``?.sort_key`` functions as
a forward reference to a name defined in the ``given`` clause.
A docstring would be permitted in the given clause, and would be attached
to the result namespace as its ``__doc__`` attribute.
The ``pass`` statement is included to provide a consistent way to skip
inclusion of a meaningful expression in the header line. While this is not
@@ -94,7 +100,7 @@ binding operations in the header line::
# Explicit early binding via given clause
seq = []
for i in range(10):
seq.append(.f) given i=i:
        seq.append(?.f) given i=i in:
def f():
return i
assert [f() for f in seq] == list(range(10))
@@ -105,7 +111,7 @@ Semantics
The following statement::
op(.f, .g) given bound_a=a, bound_b=b:
op(?.f, ?.g) given bound_a=a, bound_b=b in:
def f():
return bound_a + bound_b
def g():
@@ -121,9 +127,10 @@ hidden compiler variable or simply an entry on the interpreter stack)::
return bound_a + bound_b
def g():
return bound_a - bound_b
return f, g
__ref1, __ref2 = __scope(__arg1)
op(__ref1, __ref2)
return types.SimpleNamespace(**locals())
__ref = __scope(__arg1, __arg2)
__ref.__doc__ = __scope.__doc__
op(__ref.f, __ref.g)
A ``given`` clause is essentially a nested function which is created and
then immediately executed. Unless explicitly passed in, names are looked
@@ -158,7 +165,7 @@ New::
yield_stmt: yield_expr [given_clause]
raise_stmt: 'raise' [test ['from' test]] [given_clause]
assert_stmt: 'assert' test [',' test] [given_clause]
given_clause: "given" (NAME '=' test)* ":" suite
    given_clause: "given" [(NAME '=' test)+ "in"] ":" suite
(Note that ``expr_stmt`` in the grammar is a slight misnomer, as it covers
assignment and augmented assignment in addition to simple expression
@@ -207,7 +214,7 @@ For reference, here are the current definitions at that level::
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
In addition to the above changes, the definition of ``atom`` would be changed
to also allow ``"." NAME``. The restriction of this usage to statements with
to also allow ``?``. The restriction of this usage to statements with
an associated ``given`` clause would be handled by a later stage of the
compilation process (likely AST construction, which already enforces
other restrictions where the grammar is overly permissive in order to
@@ -277,13 +284,14 @@ without reading the body of the suite.
However, while they are the initial motivating use case, limiting this
feature solely to simple assignments would be overly restrictive. Once the
feature is defined at all, it would be quite arbitrary to prevent its use
for augmented assignments, return statements, yield expressions and
arbitrary expressions that may modify the application state.
for augmented assignments, return statements, yield expressions,
comprehensions and arbitrary expressions that may modify the
application state.
The ``given`` clause may also function as a more readable
alternative to some uses of lambda expressions and similar
constructs when passing one-off functions to operations
like ``sorted()``.
like ``sorted()`` or in callback-based event-driven programming.
In module and class level code, the ``given`` clause will serve as a
clear and reliable replacement for usage of the ``del`` statement to keep
@@ -350,7 +358,7 @@ container comprehensions::
# would be equivalent to
seq2 = .result given seq=seq:
seq2 = ?.result given seq=seq:
result = []
for y in seq:
if p(y):
Note that, unlike PEP 403, the current version of this PEP *cannot*
provide a precisely equivalent expansion for a generator expression. The
closest it can get is to define an additional level of scoping::
seq2 = .g(seq) given:
seq2 = ?.g(seq) given:
def g(seq):
for y in seq:
if p(y):
@@ -375,6 +383,22 @@ closest it can get is to define an additional level of scoping::
if q(x):
yield x
This limitation could be remedied by permitting the given clause to be
a generator function, in which case ? would refer to a generator-iterator
object rather than a simple namespace::
seq2 = ? given seq=seq in:
for y in seq:
if p(y):
for x in y:
if q(x):
yield x
However, this would make the meaning of "?" quite ambiguous, even more
so than is already the case for ``def`` statements (which will usually
have a docstring indicating whether or not a function definition is
actually a generator).
Explaining Decorator Clause Evaluation and Application
------------------------------------------------------
@@ -477,13 +501,18 @@ what the language allows them to express.
I believe the proposal in this PEP would finally let Python get close to the
"executable pseudocode" bar for the kind of thought expressed above::
sorted_list = sorted(original, key=.sort_key) given:
def sort_key(item):
sorted_list = sorted(original, key=?.key) given:
def key(item):
return item.attr1, item.attr2
Everything is in the same order as it was in the user's original thought, the
only addition they have to make is to give the sorting criteria a name so that
the usage can be linked up to the subsequent definition.
Everything is in the same order as it was in the user's original thought, and
they don't even need to come up with a name for the sorting criteria: it is
possible to reuse the keyword argument name directly.
A possible enhancement to this proposal would be to provide a convenient
shorthand syntax to say "use the given clause contents as keyword
arguments". Even without dedicated syntax, that can be written simply as
``**vars(?)``.
Harmful to Introspection
@@ -516,7 +545,7 @@ world code is genuinely enhanced.
This is more of a deficiency in the PEP rather than the idea, though. If
it wasn't a real world problem, we wouldn't get so many complaints about
the lack of multi-line lambda support and Ruby's block construct
probaly wouldn't be quite so popular.
probably wouldn't be quite so popular.
Open Questions
@@ -525,9 +554,12 @@ Open Questions
Syntax for Forward References
-----------------------------
The leading ``.`` arguably fails the "syntax shall not look like grit on
Uncle Tim's monitor" test. However, it does have the advantages of being
easy to type and already having an association with namespaces.
The ``?`` symbol is proposed for forward references to the given namespace
as it is short, currently unused and suggests "there's something missing
here that will be filled in later".
The proposal in the PEP doesn't neatly parallel any existing Python feature,
so reusing an already used symbol has been deliberately avoided.
Handling of ``nonlocal`` and ``global``
@@ -541,8 +573,8 @@ Alternatively, they could be defined as operating as if the anonymous
functions were defined as in the expansion above.
Detailed Semantics #3: Handling of ``break`` and ``continue``
-------------------------------------------------------------
Handling of ``break`` and ``continue``
--------------------------------------
``break`` and ``continue`` will operate as if the anonymous functions were
defined as in the expansion above. They will be syntax errors if they occur
@@ -561,6 +593,25 @@ they appear within a ``def`` statement within that suite.
Examples
========
Defining callbacks for event driven programming::
# Current Python (definition before use)
def cb(sock):
# Do something with socket
def eb(exc):
logging.exception(
"Failed connecting to %s:%s", host, port)
    loop.create_connection((host, port), cb, eb)
# Becomes:
loop.create_connection((host, port), ?.cb, ?.eb) given:
def cb(sock):
# Do something with socket
def eb(exc):
logging.exception(
"Failed connecting to %s:%s", host, port)
Defining "one-off" classes which typically only have a single instance::
# Current Python (instantiation after definition)
@@ -579,7 +630,7 @@ Defining "one-off" classes which typically only have a single instance::
... # However many lines
# Becomes:
public_name = .MeaningfulClassName(*params) given:
public_name = ?.MeaningfulClassName(*params) given:
class MeaningfulClassName():
... # Should trawl the stdlib for an example of doing this
@@ -593,7 +644,7 @@ Calculating attributes without polluting the local namespace (from os.py)::
del _createenviron
# Becomes:
environ = ._createenviron() given:
environ = ?._createenviron() given:
def _createenviron():
... # 27 line function
@@ -606,7 +657,7 @@ Replacing default argument hack (from functools.lru_cache)::
return decorating_function
# Becomes:
return .decorating_function given:
return ?.decorating_function given:
# Cell variables rather than locals, but should give similar speedup
tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
def decorating_function(user_function):
@@ -701,6 +752,9 @@ References
.. [9] Possible PEP 3150 style guidelines (#2):
http://mail.python.org/pipermail/python-ideas/2011-October/012341.html
.. [10] Multi-line lambdas (again!)
http://mail.python.org/pipermail/python-ideas/2013-August/022526.html
Copyright
=========

View File

@@ -846,6 +846,12 @@ public API is as follows, indicating the differences with PEP 3148:
convention from the section "Callback Style" below) is always called
with a single argument, the Future object.
- ``remove_done_callback(fn)``. Remove the argument from the list of
callbacks. This method is not defined by PEP 3148. The argument
must be equal (using ``==``) to the argument passed to
``add_done_callback()``. Returns the number of times the callback
was removed.
- ``set_result(result)``. The Future must not be done (nor cancelled)
already. This makes the Future done and schedules the callbacks.
Difference with PEP 3148: This is a public API.
@@ -1302,25 +1308,25 @@ package are provided:
- ``FIRST_EXCEPTION``: Wait until at least one Future is done (not
cancelled) with an exception set. (The exclusion of cancelled
Futures from the filter is surprising, but PEP 3148 does it this
way.)
Futures from the condition is surprising, but PEP 3148 does it
this way.)
- ``tulip.as_completed(fs, timeout=None)``. Returns an iterator whose
values are Futures; waiting for successive values waits until the
next Future or coroutine from the set ``fs`` completes, and returns
its result (or raises its exception). The optional argument
``timeout`` has the same meaning and default as it does for
``concurrent.futures.wait()``: when the timeout occurs, the next
Future returned by the iterator will raise ``TimeoutError`` when
waited for. Example of use::
values are Futures or coroutines; waiting for successive values
waits until the next Future or coroutine from the set ``fs``
completes, and returns its result (or raises its exception). The
optional argument ``timeout`` has the same meaning and default as it
does for ``concurrent.futures.wait()``: when the timeout occurs, the
next Future returned by the iterator will raise ``TimeoutError``
when waited for. Example of use::
for f in as_completed(fs):
result = yield from f # May raise an exception.
# Use result.
Note: if you do not wait for the futures as they are produced by the
iterator, your ``for`` loop may not make progress (since you are not
allowing other tasks to run).
Note: if you do not wait for the values produced by the iterator,
your ``for`` loop may not make progress (since you are not allowing
other tasks to run).
Sleeping
--------

View File

@@ -202,7 +202,7 @@ def fixfile(inpath, input_lines, outfile):
print >> outfile, '</td></tr></table>'
print >> outfile, '<div class="header">\n<table border="0">'
for k, v in header:
if k.lower() in ('author', 'discussions-to'):
if k.lower() in ('author', 'bdfl-delegate', 'discussions-to'):
mailtos = []
for part in re.split(',\s*', v):
if '@' in part: