This commit is contained in:
Richard Jones 2013-07-09 12:29:13 +10:00
commit 1ed9a8a185
10 changed files with 2683 additions and 882 deletions

View File

@ -158,9 +158,21 @@ The preferred way of wrapping long lines is by using Python's implied
line continuation inside parentheses, brackets and braces. Long lines
can be broken over multiple lines by wrapping expressions in
parentheses. These should be used in preference to using a backslash
for line continuation. Make sure to indent the continued line
appropriately. The preferred place to break around a binary operator
is *after* the operator, not before it. Some examples::
for line continuation.
Backslashes may still be appropriate at times. For example, long,
multiple ``with``-statements cannot use implicit continuation, so
backslashes are acceptable::
with open('/path/to/some/file/you/want/to/read') as file_1, \
open('/path/to/some/file/being/written', 'w') as file_2:
file_2.write(file_1.read())
Another such case is with ``assert`` statements.
Make sure to indent the continued line appropriately. The preferred
place to break around a binary operator is *after* the operator, not
before it. Some examples::
class Rectangle(Blob):

View File

@ -4,7 +4,7 @@ Version: $Revision$
Last-Modified: $Date$
Author: Raymond Hettinger <python@rcn.com>
W Isaac Carroll <icarroll@pobox.com>
Status: Deferred
Status: Rejected
Type: Standards Track
Content-Type: text/plain
Created: 25-Apr-2003
@ -21,19 +21,32 @@ Abstract
Notice
Deferred; see
Rejected; see
http://mail.python.org/pipermail/python-ideas/2013-June/021610.html
This PEP has been deferred since 2006; see
http://mail.python.org/pipermail/python-dev/2006-February/060718.html
Subsequent efforts to revive the PEP in April 2009 did not
meet with success because no syntax emerged that could
compete with a while-True and an inner if-break.
compete with the following form:
A syntax was found for a basic do-while loop but it found
had little support because the condition was at the top:
while True:
<setup code>
if not <condition>:
break
<loop body>
A syntax alternative to the one proposed in the PEP was found for
a basic do-while loop but it gained little support because the
condition was at the top:
do ... while <cond>:
<loop body>
Users of the language are advised to use the while-True form with
an inner if-break when a do-while loop would have been appropriate.
Motivation

File diff suppressed because it is too large Load Diff

249
pep-0426/pymeta-schema.json Normal file
View File

@ -0,0 +1,249 @@
{
"id": "http://www.python.org/dev/peps/pep-0426/",
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Metadata for Python Software Packages 2.0",
"type": "object",
"properties": {
"metadata_version": {
"description": "Version of the file format",
"type": "string",
"pattern": "^(\\d+(\\.\\d+)*)$"
},
"generator": {
"description": "Name and version of the program that produced this file.",
"type": "string",
"pattern": "^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])( \\((\\d+(\\.\\d+)*)((a|b|c|rc)(\\d+))?(\\.(post)(\\d+))?(\\.(dev)(\\d+))\\))?$"
},
"name": {
"description": "The name of the distribution.",
"type": "string",
"pattern": "^[0-9A-Za-z]([0-9A-Za-z_.-]*[0-9A-Za-z])?$"
},
"version": {
"description": "The distribution's public version identifier",
"type": "string",
"pattern": "^(\\d+(\\.\\d+)*)((a|b|c|rc)(\\d+))?(\\.(post)(\\d+))?(\\.(dev)(\\d+))?$"
},
"source_label": {
"description": "A constrained identifying text string",
"type": "string",
"pattern": "^[0-9a-z_.-+]+$"
},
"source_url": {
"description": "A string containing a full URL where the source for this specific version of the distribution can be downloaded.",
"type": "string",
"format": "uri"
},
"summary": {
"description": "A one-line summary of what the distribution does.",
"type": "string"
},
"document_names": {
"description": "Names of supporting metadata documents",
"type": "object",
"properties": {
"description": {
"type": "string",
"$ref": "#/definitions/document_name"
},
"changelog": {
"type": "string",
"$ref": "#/definitions/document_name"
},
"license": {
"type": "string",
"$ref": "#/definitions/document_name"
}
},
"additionalProperties": false
},
"keywords": {
"description": "A list of additional keywords to be used to assist searching for the distribution in a larger catalog.",
"type": "array",
"items": {
"type": "string"
}
},
"license": {
"description": "A string indicating the license covering the distribution.",
"type": "string"
},
"classifiers": {
"description": "A list of strings, with each giving a single classification value for the distribution.",
"type": "array",
"items": {
"type": "string"
}
},
"contacts": {
"description": "A list of contributor entries giving the recommended contact points for getting more information about the project.",
"type": "array",
"items": {
"type": "object",
"$ref": "#/definitions/contact"
}
},
"contributors": {
"description": "A list of contributor entries for other contributors not already listed as current project points of contact.",
"type": "array",
"items": {
"type": "object",
"$ref": "#/definitions/contact"
}
},
"project_urls": {
"description": "A mapping of arbitrary text labels to additional URLs relevant to the project.",
"type": "object"
},
"extras": {
"description": "A list of optional sets of dependencies that may be used to define conditional dependencies in \"may_require\" and similar fields.",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/extra_name"
}
},
"distributes": {
"description": "A list of subdistributions made available through this metadistribution.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"may_distribute": {
"description": "A list of subdistributions that may be made available through this metadistribution, based on the extras requested and the target deployment environment.",
"$ref": "#/definitions/conditional_dependencies"
},
"run_requires": {
"description": "A list of other distributions needed when to run this distribution.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"run_may_require": {
"description": "A list of other distributions that may be needed when this distribution is deployed, based on the extras requested and the target deployment environment.",
"$ref": "#/definitions/conditional_dependencies"
},
"test_requires": {
"description": "A list of other distributions needed when this distribution is tested.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"test_may_require": {
"description": "A list of other distributions that may be needed when this distribution is tested, based on the extras requested and the target deployment environment.",
"type": "array",
"$ref": "#/definitions/conditional_dependencies"
},
"build_requires": {
"description": "A list of other distributions needed when this distribution is built.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"build_may_require": {
"description": "A list of other distributions that may be needed when this distribution is built, based on the extras requested and the target deployment environment.",
"type": "array",
"$ref": "#/definitions/conditional_dependencies"
},
"dev_requires": {
"description": "A list of other distributions needed when this distribution is developed.",
"type": "array",
"$ref": "#/definitions/dependencies"
},
"dev_may_require": {
"description": "A list of other distributions that may be needed when this distribution is developed, based on the extras requested and the target deployment environment.",
"type": "array",
"$ref": "#/definitions/conditional_dependencies"
},
"provides": {
"description": "A list of strings naming additional dependency requirements that are satisfied by installing this distribution. These strings must be of the form Name or Name (Version), as for the requires field.",
"type": "array",
"items": {
"type": "string"
}
},
"obsoleted_by": {
"description": "A string that indicates that this project is no longer being developed. The named project provides a substitute or replacement.",
"type": "string",
"$ref": "#/definitions/version_specifier"
},
"supports_environments": {
"description": "A list of strings specifying the environments that the distribution explicitly supports.",
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/environment_marker"
}
},
"metabuild_hooks": {
"description": "The metabuild_hooks field is used to define various operations that may be invoked on a distribution in a platform independent manner.",
"type": "object"
},
"extensions": {
"description": "Extensions to the metadata may be present in a mapping under the 'extensions' key.",
"type": "object"
}
},
"required": ["metadata_version", "name", "version"],
"additionalProperties": false,
"definitions": {
"contact": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"email": {
"type": "string"
},
"url": {
"type": "string"
},
"role": {
"type": "string"
}
},
"required": ["name"],
"additionalProperties": false
},
"dependencies": {
"type": "array",
"items": {
"type": "string",
"$ref": "#/definitions/version_specifier"
}
},
"conditional_dependencies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"extra": {
"type": "string",
"$ref": "#/definitions/extra_name"
},
"environment": {
"type": "string",
"$ref": "#/definitions/environment_marker"
},
"dependencies": {
"type": "array",
"$ref": "#/definitions/dependencies"
}
},
"required": ["dependencies"],
"additionalProperties": false
}
},
"version_specifier": {
"type": "string"
},
"extra_name": {
"type": "string"
},
"environment_marker": {
"type": "string"
},
"document_name": {
"type": "string"
}
}
}

View File

@ -5,7 +5,7 @@ Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>,
Eli Bendersky <eliben@gmail.com>,
Ethan Furman <ethan@stoneleaf.us>
Status: Accepted
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-02-23
@ -467,6 +467,10 @@ assignment to ``Animal`` is equivalent to::
... cat = 3
... dog = 4
The reason for defaulting to ``1`` as the starting number and not ``0`` is
that ``0`` is ``False`` in a boolean sense, but enum members all evaluate
to ``True``.
Proposed variations
===================

View File

@ -9,7 +9,7 @@ Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18 Mar 2013
Post-History: 30 Mar 2013, 27-May-2013
Post-History: 30 Mar 2013, 27 May 2013, 20 Jun 2013
Replaces: 386
@ -27,7 +27,7 @@ standardised approach to versioning, as described in PEP 345 and PEP 386.
This PEP was broken out of the metadata 2.0 specification in PEP 426.
Unlike PEP 426, the notes that remain in this document are intended as
part of the final specification.
part of the final specification (except for this one).
Definitions
@ -40,7 +40,7 @@ document are to be interpreted as described in RFC 2119.
The following terms are to be interpreted as described in PEP 426:
* "Distributions"
* "Versions"
* "Releases"
* "Build tools"
* "Index servers"
* "Publication tools"
@ -52,9 +52,13 @@ The following terms are to be interpreted as described in PEP 426:
Version scheme
==============
Distribution versions are identified by both a public version identifier,
which supports all defined version comparison operations, and a build
label, which supports only strict equality comparisons.
Distributions are identified by a public version identifier which
supports all defined version comparison operations
Distributions may also define a source label, which is not used by
automated tools. Source labels are useful when a project internal
versioning scheme requires translation to create a compliant public
version identifier.
The version scheme is used both to describe the distribution version
provided by a particular distribution archive, as well as to place
@ -84,7 +88,7 @@ Public version identifiers are separated into up to four segments:
* Post-release segment: ``.postN``
* Development release segment: ``.devN``
Any given version will be a "release", "pre-release", "post-release" or
Any given release will be a "final release", "pre-release", "post-release" or
"developmental release" as defined in the following sections.
.. note::
@ -105,28 +109,37 @@ Source labels
Source labels are text strings with minimal defined semantics.
To ensure source labels can be readily incorporated as part of file names
and URLs, they MUST be comprised of only ASCII alphanumerics, plus signs,
periods and hyphens.
and URLs, and to avoid formatting inconsistences in hexadecimal hash
representations they MUST be limited to the following set of permitted
characters:
In addition, source labels MUST be unique within a given distribution.
* Lowercase ASCII letters (``[a-z]``)
* ASCII digits (``[0-9]``)
* underscores (``_``)
* hyphens (``-``)
* periods (``.``)
* plus signs (``+``)
As with distribution names, all comparisons of source labels MUST be case
insensitive.
Source labels MUST start and end with an ASCII letter or digit.
Source labels MUST be unique within each project and MUST NOT match any
defined version for the project.
Releases
--------
Final releases
--------------
A version identifier that consists solely of a release segment is termed
a "release".
A version identifier that consists solely of a release segment is
termed a "final release".
The release segment consists of one or more non-negative integer values,
separated by dots::
The release segment consists of one or more non-negative integer
values, separated by dots::
N[.N]+
Releases within a project will typically be numbered in a consistently
increasing fashion.
Final releases within a project MUST be numbered in a consistently
increasing fashion, otherwise automated tools will not be able to upgrade
them correctly.
Comparison and ordering of release segments considers the numeric value
of each component of the release segment in turn. When comparing release
@ -157,8 +170,8 @@ For example::
2.0
2.0.1
A release series is any set of release numbers that start with a common
prefix. For example, ``3.3.1``, ``3.3.5`` and ``3.3.9.45`` are all
A release series is any set of final release numbers that start with a
common prefix. For example, ``3.3.1``, ``3.3.5`` and ``3.3.9.45`` are all
part of the ``3.3`` release series.
.. note::
@ -206,8 +219,8 @@ of both ``c`` and ``rc`` releases for a common release segment.
Post-releases
-------------
Some projects use post-releases to address minor errors in a release that
do not affect the distributed software (for example, correcting an error
Some projects use post-releases to address minor errors in a final release
that do not affect the distributed software (for example, correcting an error
in the release notes).
If used as part of a project's development cycle, these post-releases are
@ -371,7 +384,7 @@ are permitted and MUST be ordered as shown::
.devN, aN, bN, cN, rcN, <no suffix>, .postN
Note that `rc` will always sort after `c` (regardless of the numeric
component) although they are semantically equivalent. Tools are free to
component) although they are semantically equivalent. Tools MAY
reject this case as ambiguous and remain in compliance with the PEP.
Within an alpha (``1.0a1``), beta (``1.0b1``), or release candidate
@ -506,6 +519,22 @@ numbering based on API compatibility, as well as triggering more appropriate
version comparison semantics.
Olson database versioning
~~~~~~~~~~~~~~~~~~~~~~~~~
The ``pytz`` project inherits its versioning scheme from the corresponding
Olson timezone database versioning scheme: the year followed by a lowercase
character indicating the version of the database within that year.
This can be translated to a compliant 3-part version identifier as
``0.<year>.<serial>``, where the serial starts at zero (for the '<year>a'
release) and is incremented with each subsequent database update within the
year.
As with other translated version identifiers, the corresponding Olson
database version would be recorded in the source label field.
Version specifiers
==================
@ -521,7 +550,6 @@ clause:
* ``~=``: `Compatible release`_ clause
* ``==``: `Version matching`_ clause
* ``!=``: `Version exclusion`_ clause
* ``is``: `Source reference`_ clause
* ``<=``, ``>=``: `Inclusive ordered comparison`_ clause
* ``<``, ``>``: `Exclusive ordered comparison`_ clause
@ -605,6 +633,11 @@ version. The *only* substitution performed is the zero padding of the
release segment to ensure the release segments are compared with the same
length.
Whether or not strict version matching is appropriate depends on the specific
use case for the version specifier. Automated tools SHOULD at least issue
warnings and MAY reject them entirely when strict version matches are used
inappropriately.
Prefix matching may be requested instead of strict comparison, by appending
a trailing ``.*`` to the version identifier in the version matching clause.
This means that additional trailing segments will be ignored when
@ -645,75 +678,6 @@ match or not as shown::
!= 1.1.* # Same prefix, so 1.1.post1 does not match clause
Source reference
----------------
A source reference includes the source reference operator ``is`` and
a source label or a source URL.
Installation tools MAY also permit direct references to a platform
appropriate binary archive in a source reference clause.
Publication tools and public index servers SHOULD NOT permit direct
references to a platform appropriate binary archive in a source
reference clause.
Source label matching works solely on strict equality comparisons: the
candidate source label must be exactly the same as the source label in the
version clause for the clause to match the candidate distribution.
For example, a source reference could be used to depend directly on a
version control hash based identifier rather than the translated public
version::
exact-dependency (is 1.3.7+build.11.e0f985a)
A source URL is distinguished from a source label by the presence of
``:`` and ``/`` characters in the source reference. As these characters
are not permitted in source labels, they indicate that the reference uses
a source URL.
Some appropriate targets for a source URL are a source tarball, an sdist
archive or a direct reference to a tag or specific commit in an online
version control system. The exact URLs and
targets supported will be installation tool specific.
For example, a local source archive may be referenced directly::
pip (is file:///localbuilds/pip-1.3.1.zip)
All source URL references SHOULD either specify a local file URL, a secure
transport mechanism (such as ``https``) or else include an expected hash
value in the URL for verification purposes. If an insecure network
transport is specified without any hash information (or with hash
information that the tool doesn't understand), automated tools SHOULD
at least emit a warning and MAY refuse to rely on the URL.
It is RECOMMENDED that only hashes which are unconditionally provided by
the latest version of the standard library's ``hashlib`` module be used
for source archive hashes. At time of writing, that list consists of
``'md5'``, ``'sha1'``, ``'sha224'``, ``'sha256'``, ``'sha384'``, and
``'sha512'``.
For source archive references, an expected hash value may be
specified by including a ``<hash-algorithm>=<expected-hash>`` as part of
the URL fragment.
For version control references, the ``VCS+protocol`` scheme SHOULD be
used to identify both the version control system and the secure transport.
To support version control systems that do not support including commit or
tag references directly in the URL, that information may be appended to the
end of the URL using the ``@<tag>`` notation.
The use of ``is`` when defining dependencies for published distributions
is strongly discouraged as it greatly complicates the deployment of
security fixes. The source label matching operator is intended primarily
for use when defining dependencies for repeatable *deployments of
applications* while using a shared distribution index, as well as to
reference dependencies which are not published through an index server.
Inclusive ordered comparison
----------------------------
@ -752,62 +716,108 @@ Handling of pre-releases
------------------------
Pre-releases of any kind, including developmental releases, are implicitly
excluded from all version specifiers, *unless* a pre-release or developmental
release is explicitly mentioned in one of the clauses. For example, these
specifiers implicitly exclude all pre-releases and development
releases of later versions::
2.2
>= 1.0
While these specifiers would include at least some of them::
2.2.dev0
2.2, != 2.3b2
>= 1.0a1
>= 1.0c1
>= 1.0, != 1.0b2
>= 1.0, < 2.0.dev123
excluded from all version specifiers, *unless* they are already present
on the system, explicitly requested by the user, or if the only available
version that satisfies the version specifier is a pre-release.
By default, dependency resolution tools SHOULD:
* accept already installed pre-releases for all version specifiers
* accept remotely available pre-releases for version specifiers which
include at least one version clauses that references a pre-release
* accept remotely available pre-releases for version specifiers where
there is no final or post release that satisfies the version specifier
* exclude all other pre-releases from consideration
Dependency resolution tools MAY issue a warning if a pre-release is needed
to satisfy a version specifier.
Dependency resolution tools SHOULD also allow users to request the
following alternative behaviours:
* accepting pre-releases for all version specifiers
* excluding pre-releases for all version specifiers (reporting an error or
warning if a pre-release is already installed locally)
warning if a pre-release is already installed locally, or if a
pre-release is the only way to satisfy a particular specifier)
Dependency resolution tools MAY also allow the above behaviour to be
controlled on a per-distribution basis.
Post-releases and purely numeric releases receive no special treatment in
version specifiers - they are always included unless explicitly excluded.
Post-releases and final releases receive no special treatment in version
specifiers - they are always included unless explicitly excluded.
Examples
--------
* ``3.1``: version 3.1 or later, but not
version 4.0 or later. Excludes pre-releases and developmental releases.
* ``3.1.2``: version 3.1.2 or later, but not
version 3.2.0 or later. Excludes pre-releases and developmental releases.
* ``3.1a1``: version 3.1a1 or later, but not
version 4.0 or later. Allows pre-releases like 3.2a4 and developmental
releases like 3.2.dev1.
* ``3.1``: version 3.1 or later, but not version 4.0 or later.
* ``3.1.2``: version 3.1.2 or later, but not version 3.2.0 or later.
* ``3.1a1``: version 3.1a1 or later, but not version 4.0 or later.
* ``== 3.1``: specifically version 3.1 (or 3.1.0), excludes all pre-releases,
post releases, developmental releases and any 3.1.x maintenance releases.
* ``== 3.1.*``: any version that starts with 3.1, excluding pre-releases and
developmental releases. Equivalent to the ``3.1.0`` compatible release
clause.
* ``== 3.1.*``: any version that starts with 3.1. Equivalent to the
``3.1.0`` compatible release clause.
* ``3.1.0, != 3.1.3``: version 3.1.0 or later, but not version 3.1.3 and
not version 3.2.0 or later. Excludes pre-releases and developmental
releases.
not version 3.2.0 or later.
Direct references
=================
Some automated tools may permit the use of a direct reference as an
alternative to a normal version specifier. A direct reference consists of
the word ``from`` and an explicit URL.
Whether or not direct references are appropriate depends on the specific
use case for the version specifier. Automated tools SHOULD at least issue
warnings and MAY reject them entirely when direct references are used
inappropriately.
Public index servers SHOULD NOT allow the use of direct references in
uploaded distributions. Direct references are intended as a tool for
software integrators rather than publishers.
Depending on the use case, some appropriate targets for a direct URL
reference may be a valid ``source_url`` entry (see PEP 426), an sdist, or
a wheel binary archive. The exact URLs and targets supported will be tool
dependent.
For example, a local source archive may be referenced directly::
pip (from file:///localbuilds/pip-1.3.1.zip)
Alternatively, a prebuilt archive may also be referenced::
pip (from file:///localbuilds/pip-1.3.1-py33-none-any.whl)
All direct references that do not refer to a local file URL SHOULD
specify a secure transport mechanism (such as ``https``), include an
expected hash value in the URL for verification purposes, or both. If an
insecure transport is specified without any hash information, with hash
information that the tool doesn't understand, or with a selected hash
algorithm that the tool considers too weak to trust, automated tools
SHOULD at least emit a warning and MAY refuse to rely on the URL.
It is RECOMMENDED that only hashes which are unconditionally provided by
the latest version of the standard library's ``hashlib`` module be used
for source archive hashes. At time of writing, that list consists of
``'md5'``, ``'sha1'``, ``'sha224'``, ``'sha256'``, ``'sha384'``, and
``'sha512'``.
For source archive and wheel references, an expected hash value may be
specified by including a ``<hash-algorithm>=<expected-hash>`` entry as
part of the URL fragment.
Version control references, the ``VCS+protocol`` scheme SHOULD be
used to identify both the version control system and the secure transport.
To support version control systems that do not support including commit or
tag references directly in the URL, that information may be appended to the
end of the URL using the ``@<tag>`` notation.
Remote URL examples::
pip (from https://github.com/pypa/pip/archive/1.3.1.zip)
pip (from http://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686)
pip (from git+https://github.com/pypa/pip.git@1.3.1)
Updating the versioning specification
@ -825,28 +835,30 @@ Summary of differences from \PEP 386
* Moved the description of version specifiers into the versioning PEP
* added the "source label" concept to better handle projects that wish to
* Added the "source label" concept to better handle projects that wish to
use a non-compliant versioning scheme internally, especially those based
on DVCS hashes
* added the "compatible release" clause
* added the "source reference" clause
* Added the "direct reference" concept as a standard notation for direct
references to resources (rather than each tool needing to invents its own)
* added the trailing wildcard syntax for prefix based version matching
* Added the "compatible release" clause
* Added the trailing wildcard syntax for prefix based version matching
and exclusion
* changed the top level sort position of the ``.devN`` suffix
* Changed the top level sort position of the ``.devN`` suffix
* allowed single value version numbers
* Allowed single value version numbers
* explicit exclusion of leading or trailing whitespace
* Explicit exclusion of leading or trailing whitespace
* explicit criterion for the exclusion of date based versions
* Explicit criterion for the exclusion of date based versions
* implicitly exclude pre-releases unless explicitly requested
* Implicitly exclude pre-releases unless they're already present or
needed to satisfy a dependency
* treat post releases the same way as unqualified releases
* Treat post releases the same way as unqualified releases
* Discuss ordering and dependencies across metadata versions
@ -995,11 +1007,12 @@ The previous interpretation also excluded post-releases from some version
specifiers for no adequately justified reason.
The updated interpretation is intended to make it difficult to accidentally
accept a pre-release version as satisfying a dependency, while allowing
pre-release versions to be explicitly requested when needed.
accept a pre-release version as satisfying a dependency, while still
allowing pre-release versions to be retrieved automatically when that's the
only way to satisfy a dependency.
The "some forward compatibility assumed" default version constraint is
taken directly from the Ruby community's "pessimistic version constraint"
derived from the Ruby community's "pessimistic version constraint"
operator [2]_ to allow projects to take a cautious approach to forward
compatibility promises, while still easily setting a minimum required
version for their dependencies. It is made the default behaviour rather
@ -1022,16 +1035,26 @@ improved tools for dynamic path manipulation.
The trailing wildcard syntax to request prefix based version matching was
added to make it possible to sensibly define both compatible release clauses
and the desired pre-release handling semantics for ``<`` and ``>`` ordered
comparison clauses.
and the desired pre- and post-release handling semantics for ``<`` and ``>``
ordered comparison clauses.
Source references are added for two purposes. In conjunction with source
labels, they allow hash based references to exact versions that aren't
compliant with the fully ordered public version scheme, such as those
generated from version control. In combination with source URLs, they
also allow the new metadata standard to natively support an existing
feature of ``pip``, which allows arbitrary URLs like
``file:///localbuilds/exampledist-1.0-py33-none-any.whl``.
Adding direct references
------------------------
Direct references are added as an "escape clause" to handle messy real
world situations that don't map neatly to the standard distribution model.
This includes dependencies on unpublished software for internal use, as well
as handling the more complex compatibility issues that may arise when
wrapping third party libraries as C extensions (this is of especial concern
to the scientific community).
Index servers are deliberately given a lot of freedom to disallow direct
references, since they're intended primarily as a tool for integrators
rather than publishers. PyPI in particular is currently going through the
process of *eliminating* dependencies on external references, as unreliable
external services have the effect of slowing down installation operations,
as well as reducing PyPI's own apparent reliability.
References

View File

@ -4,13 +4,13 @@ Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
BDFL-Delegate: Benjamin Peterson <benjamin@python.org>
Status: Draft
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-05-18
Python-Version: 3.4
Post-History: 2013-05-18
Resolution: TBD
Resolution: http://mail.python.org/pipermail/python-dev/2013-June/126746.html
Abstract
@ -201,8 +201,7 @@ Predictability
--------------
Following this scheme, an object's finalizer is always called exactly
once. The only exception is if an object is resurrected: the finalizer
will be called again when the object becomes unreachable again.
once, even if it was resurrected afterwards.
For CI objects, the order in which finalizers are called (step 2 above)
is undefined.

View File

@ -4,7 +4,7 @@ Version: $Revision$
Last-Modified: $Date$
Author: Łukasz Langa <lukasz@langa.pl>
Discussions-To: Python-Dev <python-dev@python.org>
Status: Accepted
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 22-May-2013
@ -193,48 +193,37 @@ handling of old-style classes and Zope's ExtensionClasses. More
importantly, it introduces support for Abstract Base Classes (ABC).
When a generic function implementation is registered for an ABC, the
dispatch algorithm switches to a mode of MRO calculation for the
provided argument which includes the relevant ABCs. The algorithm is as
follows::
dispatch algorithm switches to an extended form of C3 linearization,
which includes the relevant ABCs in the MRO of the provided argument.
The algorithm inserts ABCs where their functionality is introduced, i.e.
``issubclass(cls, abc)`` returns ``True`` for the class itself but
returns ``False`` for all its direct base classes. Implicit ABCs for
a given class (either registered or inferred from the presence of
a special method like ``__len__()``) are inserted directly after the
last ABC explicitly listed in the MRO of said class.
def _compose_mro(cls, haystack):
"""Calculates the MRO for a given class `cls`, including relevant
abstract base classes from `haystack`."""
bases = set(cls.__mro__)
mro = list(cls.__mro__)
for regcls in haystack:
if regcls in bases or not issubclass(cls, regcls):
continue # either present in the __mro__ or unrelated
for index, base in enumerate(mro):
if not issubclass(base, regcls):
break
if base in bases and not issubclass(regcls, base):
# Conflict resolution: put classes present in __mro__
# and their subclasses first.
index += 1
mro.insert(index, regcls)
return mro
In its most basic form, it returns the MRO for the given type::
In its most basic form, this linearization returns the MRO for the given
type::
>>> _compose_mro(dict, [])
[<class 'dict'>, <class 'object'>]
When the haystack consists of ABCs that the specified type is a subclass
of, they are inserted in a predictable order::
When the second argument contains ABCs that the specified type is
a subclass of, they are inserted in a predictable order::
>>> _compose_mro(dict, [Sized, MutableMapping, str,
... Sequence, Iterable])
[<class 'dict'>, <class 'collections.abc.MutableMapping'>,
<class 'collections.abc.Iterable'>, <class 'collections.abc.Sized'>,
<class 'collections.abc.Mapping'>, <class 'collections.abc.Sized'>,
<class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>,
<class 'object'>]
While this mode of operation is significantly slower, all dispatch
decisions are cached. The cache is invalidated on registering new
implementations on the generic function or when user code calls
``register()`` on an ABC to register a new virtual subclass. In the
latter case, it is possible to create a situation with ambiguous
dispatch, for instance::
``register()`` on an ABC to implicitly subclass it. In the latter case,
it is possible to create a situation with ambiguous dispatch, for
instance::
>>> from collections import Iterable, Container
>>> class P:
@ -261,20 +250,38 @@ guess::
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>
Note that this exception would not be raised if ``Iterable`` and
``Container`` had been provided as base classes during class definition.
In this case dispatch happens in the MRO order::
Note that this exception would not be raised if one or more ABCs had
been provided explicitly as base classes during class definition. In
this case dispatch happens in the MRO order::
>>> class Ten(Iterable, Container):
... def __iter__(self):
... for i in range(10):
... yield i
... def __contains__(self, value):
... return value in range(10)
... return value in range(10)
...
>>> g(Ten())
'iterable'
A similar conflict arises when subclassing an ABC is inferred from the
presence of a special method like ``__len__()`` or ``__contains__()``::
>>> class Q:
... def __contains__(self, value):
... return False
...
>>> issubclass(Q, Container)
True
>>> Iterable.register(Q)
>>> g(Q())
Traceback (most recent call last):
...
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>
An early version of the PEP contained a custom approach that was simpler
but created a number of edge cases with surprising results [#why-c3]_.
Usage Patterns
==============
@ -378,6 +385,8 @@ References
a particular annotation style".
(http://www.python.org/dev/peps/pep-0008)
.. [#why-c3] http://bugs.python.org/issue18244
.. [#pep-3124] http://www.python.org/dev/peps/pep-3124/
.. [#peak-rules] http://peak.telecommunity.com/DevCenter/PEAK_2dRules

773
pep-0445.txt Normal file
View File

@ -0,0 +1,773 @@
PEP: 445
Title: Add new APIs to customize Python memory allocators
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
BDFL-Delegate: Antoine Pitrou <solipsis@pitrou.net>
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 15-june-2013
Python-Version: 3.4
Resolution: http://mail.python.org/pipermail/python-dev/2013-July/127222.html
Abstract
========
This PEP proposes new Application Programming Interfaces (API) to customize
Python memory allocators. The only implementation required to conform to
this PEP is CPython, but other implementations may choose to be compatible,
or to re-use a similar scheme.
Rationale
=========
Use cases:
* Applications embedding Python which want to isolate Python memory from
the memory of the application, or want to use a different memory
allocator optimized for its Python usage
* Python running on embedded devices with low memory and slow CPU.
A custom memory allocator can be used for efficiency and/or to get
access all the memory of the device.
* Debug tools for memory allocators:
- track the memory usage (find memory leaks)
- get the location of a memory allocation: Python filename and line
number, and the size of a memory block
- detect buffer underflow, buffer overflow and misuse of Python
allocator APIs (see `Redesign Debug Checks on Memory Block
Allocators as Hooks`_)
- force memory allocations to fail to test handling of the
``MemoryError`` exception
Proposal
========
New Functions and Structures
----------------------------
* Add a new GIL-free (no need to hold the GIL) memory allocator:
- ``void* PyMem_RawMalloc(size_t size)``
- ``void* PyMem_RawRealloc(void *ptr, size_t new_size)``
- ``void PyMem_RawFree(void *ptr)``
- The newly allocated memory will not have been initialized in any
way.
- Requesting zero bytes returns a distinct non-*NULL* pointer if
possible, as if ``PyMem_Malloc(1)`` had been called instead.
* Add a new ``PyMemAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 3 functions */
void *ctx;
/* allocate a memory block */
void* (*malloc) (void *ctx, size_t size);
/* allocate or resize a memory block */
void* (*realloc) (void *ctx, void *ptr, size_t new_size);
/* release a memory block */
void (*free) (void *ctx, void *ptr);
} PyMemAllocator;
* Add a new ``PyMemAllocatorDomain`` enum to choose the Python
allocator domain. Domains:
- ``PYMEM_DOMAIN_RAW``: ``PyMem_RawMalloc()``, ``PyMem_RawRealloc()``
and ``PyMem_RawFree()``
- ``PYMEM_DOMAIN_MEM``: ``PyMem_Malloc()``, ``PyMem_Realloc()`` and
``PyMem_Free()``
- ``PYMEM_DOMAIN_OBJ``: ``PyObject_Malloc()``, ``PyObject_Realloc()``
and ``PyObject_Free()``
* Add new functions to get and set memory block allocators:
- ``void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
- ``void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)``
- The new allocator must return a distinct non-*NULL* pointer when
requesting zero bytes
- For the ``PYMEM_DOMAIN_RAW`` domain, the allocator must be
thread-safe: the GIL is not held when the allocator is called.
* Add a new ``PyObjectArenaAllocator`` structure::
typedef struct {
/* user context passed as the first argument to the 2 functions */
void *ctx;
/* allocate an arena */
void* (*alloc) (void *ctx, size_t size);
/* release an arena */
void (*free) (void *ctx, void *ptr, size_t size);
} PyObjectArenaAllocator;
* Add new functions to get and set the arena allocator used by
*pymalloc*:
- ``void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator)``
- ``void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator)``
* Add a new function to reinstall the debug checks on memory allocators when
a memory allocator is replaced with ``PyMem_SetAllocator()``:
- ``void PyMem_SetupDebugHooks(void)``
- Install the debug hooks on all memory block allocators. The function can be
called more than once, hooks are only installed once.
- The function does nothing is Python is not compiled in debug mode.
* Memory block allocators always return *NULL* if *size* is greater than
``PY_SSIZE_T_MAX``. The check is done before calling the inner
function.
.. note::
The *pymalloc* allocator is optimized for objects smaller than 512 bytes
with a short lifetime. It uses memory mappings with a fixed size of 256
KB called "arenas".
Here is how the allocators are set up by default:
* ``PYMEM_DOMAIN_RAW``, ``PYMEM_DOMAIN_MEM``: ``malloc()``,
``realloc()`` and ``free()``; call ``malloc(1)`` when requesting zero
bytes
* ``PYMEM_DOMAIN_OBJ``: *pymalloc* allocator which falls back on
``PyMem_Malloc()`` for allocations larger than 512 bytes
* *pymalloc* arena allocator: ``VirtualAlloc()`` and ``VirtualFree()`` on
Windows, ``mmap()`` and ``munmap()`` when available, or ``malloc()``
and ``free()``
Redesign Debug Checks on Memory Block Allocators as Hooks
---------------------------------------------------------
Since Python 2.3, Python implements different checks on memory
allocators in debug mode:
* Newly allocated memory is filled with the byte ``0xCB``, freed memory
is filled with the byte ``0xDB``.
* Detect API violations, ex: ``PyObject_Free()`` called on a memory
block allocated by ``PyMem_Malloc()``
* Detect write before the start of the buffer (buffer underflow)
* Detect write after the end of the buffer (buffer overflow)
In Python 3.3, the checks are installed by replacing ``PyMem_Malloc()``,
``PyMem_Realloc()``, ``PyMem_Free()``, ``PyObject_Malloc()``,
``PyObject_Realloc()`` and ``PyObject_Free()`` using macros. The new
allocator allocates a larger buffer and writes a pattern to detect buffer
underflow, buffer overflow and use after free (by filling the buffer with
the byte ``0xDB``). It uses the original ``PyObject_Malloc()``
function to allocate memory. So ``PyMem_Malloc()`` and
``PyMem_Realloc()`` indirectly call``PyObject_Malloc()`` and
``PyObject_Realloc()``.
This PEP redesigns the debug checks as hooks on the existing allocators
in debug mode. Examples of call traces without the hooks:
* ``PyMem_RawMalloc()`` => ``_PyMem_RawMalloc()`` => ``malloc()``
* ``PyMem_Realloc()`` => ``_PyMem_RawRealloc()`` => ``realloc()``
* ``PyObject_Free()`` => ``_PyObject_Free()``
Call traces when the hooks are installed (debug mode):
* ``PyMem_RawMalloc()`` => ``_PyMem_DebugMalloc()``
=> ``_PyMem_RawMalloc()`` => ``malloc()``
* ``PyMem_Realloc()`` => ``_PyMem_DebugRealloc()``
=> ``_PyMem_RawRealloc()`` => ``realloc()``
* ``PyObject_Free()`` => ``_PyMem_DebugFree()``
=> ``_PyObject_Free()``
As a result, ``PyMem_Malloc()`` and ``PyMem_Realloc()`` now call
``malloc()`` and ``realloc()`` in both release mode and debug mode,
instead of calling ``PyObject_Malloc()`` and ``PyObject_Realloc()`` in
debug mode.
When at least one memory allocator is replaced with
``PyMem_SetAllocator()``, the ``PyMem_SetupDebugHooks()`` function must
be called to reinstall the debug hooks on top on the new allocator.
Don't call malloc() directly anymore
------------------------------------
``PyObject_Malloc()`` falls back on ``PyMem_Malloc()`` instead of
``malloc()`` if size is greater or equal than 512 bytes, and
``PyObject_Realloc()`` falls back on ``PyMem_Realloc()`` instead of
``realloc()``
Direct calls to ``malloc()`` are replaced with ``PyMem_Malloc()``, or
``PyMem_RawMalloc()`` if the GIL is not held.
External libraries like zlib or OpenSSL can be configured to allocate memory
using ``PyMem_Malloc()`` or ``PyMem_RawMalloc()``. If the allocator of a
library can only be replaced globally (rather than on an object-by-object
basis), it shouldn't be replaced when Python is embedded in an application.
For the "track memory usage" use case, it is important to track memory
allocated in external libraries to have accurate reports, because these
allocations can be large (e.g. they can raise a ``MemoryError`` exception)
and would otherwise be missed in memory usage reports.
Examples
========
Use case 1: Replace Memory Allocators, keep pymalloc
----------------------------------------------------
Dummy example wasting 2 bytes per memory block,
and 10 bytes per *pymalloc* arena::
#include <stdlib.h>
size_t alloc_padding = 2;
size_t arena_padding = 10;
void* my_malloc(void *ctx, size_t size)
{
int padding = *(int *)ctx;
return malloc(size + padding);
}
void* my_realloc(void *ctx, void *ptr, size_t new_size)
{
int padding = *(int *)ctx;
return realloc(ptr, new_size + padding);
}
void my_free(void *ctx, void *ptr)
{
free(ptr);
}
void* my_alloc_arena(void *ctx, size_t size)
{
int padding = *(int *)ctx;
return malloc(size + padding);
}
void my_free_arena(void *ctx, void *ptr, size_t size)
{
free(ptr);
}
void setup_custom_allocator(void)
{
PyMemAllocator alloc;
PyObjectArenaAllocator arena;
alloc.ctx = &alloc_padding;
alloc.malloc = my_malloc;
alloc.realloc = my_realloc;
alloc.free = my_free;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
/* leave PYMEM_DOMAIN_OBJ unchanged, use pymalloc */
arena.ctx = &arena_padding;
arena.alloc = my_alloc_arena;
arena.free = my_free_arena;
PyObject_SetArenaAllocator(&arena);
PyMem_SetupDebugHooks();
}
Use case 2: Replace Memory Allocators, override pymalloc
--------------------------------------------------------
If you have a dedicated allocator optimized for allocations of objects
smaller than 512 bytes with a short lifetime, pymalloc can be overriden
(replace ``PyObject_Malloc()``).
Dummy example wasting 2 bytes per memory block::
#include <stdlib.h>
size_t padding = 2;
void* my_malloc(void *ctx, size_t size)
{
int padding = *(int *)ctx;
return malloc(size + padding);
}
void* my_realloc(void *ctx, void *ptr, size_t new_size)
{
int padding = *(int *)ctx;
return realloc(ptr, new_size + padding);
}
void my_free(void *ctx, void *ptr)
{
free(ptr);
}
void setup_custom_allocator(void)
{
PyMemAllocator alloc;
alloc.ctx = &padding;
alloc.malloc = my_malloc;
alloc.realloc = my_realloc;
alloc.free = my_free;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
PyMem_SetupDebugHooks();
}
The *pymalloc* arena does not need to be replaced, because it is no more
used by the new allocator.
Use case 3: Setup Hooks On Memory Block Allocators
--------------------------------------------------
Example to setup hooks on all memory block allocators::
struct {
PyMemAllocator raw;
PyMemAllocator mem;
PyMemAllocator obj;
/* ... */
} hook;
static void* hook_malloc(void *ctx, size_t size)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
void *ptr;
/* ... */
ptr = alloc->malloc(alloc->ctx, size);
/* ... */
return ptr;
}
static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
void *ptr2;
/* ... */
ptr2 = alloc->realloc(alloc->ctx, ptr, new_size);
/* ... */
return ptr2;
}
static void hook_free(void *ctx, void *ptr)
{
PyMemAllocator *alloc = (PyMemAllocator *)ctx;
/* ... */
alloc->free(alloc->ctx, ptr);
/* ... */
}
void setup_hooks(void)
{
PyMemAllocator alloc;
static int installed = 0;
if (installed)
return;
installed = 1;
alloc.malloc = hook_malloc;
alloc.realloc = hook_realloc;
alloc.free = hook_free;
PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &hook.raw);
PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &hook.mem);
PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &hook.obj);
alloc.ctx = &hook.raw;
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
alloc.ctx = &hook.mem;
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
alloc.ctx = &hook.obj;
PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
}
.. note::
``PyMem_SetupDebugHooks()`` does not need to be called because
memory allocator are not replaced: the debug checks on memory
block allocators are installed automatically at startup.
Performances
============
The implementation of this PEP (issue #3329) has no visible overhead on
the Python benchmark suite.
Results of the `Python benchmarks suite
<http://hg.python.org/benchmarks>`_ (-b 2n3): some tests are 1.04x
faster, some tests are 1.04 slower. Results of pybench microbenchmark:
"+0.1%" slower globally (diff between -4.9% and +5.6%).
The full output of benchmarks is attached to the issue #3329.
Rejected Alternatives
=====================
More specific functions to get/set memory allocators
----------------------------------------------------
It was originally proposed a larger set of C API functions, with one pair
of functions for each allocator domain:
* ``void PyMem_GetRawAllocator(PyMemAllocator *allocator)``
* ``void PyMem_GetAllocator(PyMemAllocator *allocator)``
* ``void PyObject_GetAllocator(PyMemAllocator *allocator)``
* ``void PyMem_SetRawAllocator(PyMemAllocator *allocator)``
* ``void PyMem_SetAllocator(PyMemAllocator *allocator)``
* ``void PyObject_SetAllocator(PyMemAllocator *allocator)``
This alternative was rejected because it is not possible to write
generic code with more specific functions: code must be duplicated for
each memory allocator domain.
Make PyMem_Malloc() reuse PyMem_RawMalloc() by default
------------------------------------------------------
If ``PyMem_Malloc()`` called ``PyMem_RawMalloc()`` by default,
calling ``PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc)`` would also
patch ``PyMem_Malloc()`` indirectly.
This alternative was rejected because ``PyMem_SetAllocator()`` would
have a different behaviour depending on the domain. Always having the
same behaviour is less error-prone.
Add a new PYDEBUGMALLOC environment variable
--------------------------------------------
It was proposed to add a new ``PYDEBUGMALLOC`` environment variable to
enable debug checks on memory block allocators. It would have had the same
effect as calling the ``PyMem_SetupDebugHooks()``, without the need
to write any C code. Another advantage is to allow to enable debug checks
even in release mode: debug checks would always be compiled in, but only
enabled when the environment variable is present and non-empty.
This alternative was rejected because a new environment variable would
make Python initialization even more complex. `PEP 432
<http://www.python.org/dev/peps/pep-0432/>`_ tries to simplify the
CPython startup sequence.
Use macros to get customizable allocators
-----------------------------------------
To have no overhead in the default configuration, customizable
allocators would be an optional feature enabled by a configuration
option or by macros.
This alternative was rejected because the use of macros implies having
to recompile extensions modules to use the new allocator and allocator
hooks. Not having to recompile Python nor extension modules makes debug
hooks easier to use in practice.
Pass the C filename and line number
-----------------------------------
Define allocator functions as macros using ``__FILE__`` and ``__LINE__``
to get the C filename and line number of a memory allocation.
Example of ``PyMem_Malloc`` macro with the modified
``PyMemAllocator`` structure::
typedef struct {
/* user context passed as the first argument
to the 3 functions */
void *ctx;
/* allocate a memory block */
void* (*malloc) (void *ctx, const char *filename, int lineno,
size_t size);
/* allocate or resize a memory block */
void* (*realloc) (void *ctx, const char *filename, int lineno,
void *ptr, size_t new_size);
/* release a memory block */
void (*free) (void *ctx, const char *filename, int lineno,
void *ptr);
} PyMemAllocator;
void* _PyMem_MallocTrace(const char *filename, int lineno,
size_t size);
/* the function is still needed for the Python stable ABI */
void* PyMem_Malloc(size_t size);
#define PyMem_Malloc(size) \
_PyMem_MallocTrace(__FILE__, __LINE__, size)
The GC allocator functions would also have to be patched. For example,
``_PyObject_GC_Malloc()`` is used in many C functions and so objects of
different types would have the same allocation location.
This alternative was rejected because passing a filename and a line
number to each allocator makes the API more complex: pass 3 new
arguments (ctx, filename, lineno) to each allocator function, instead of
just a context argument (ctx). Having to also modify GC allocator
functions adds too much complexity for a little gain.
GIL-free PyMem_Malloc()
-----------------------
In Python 3.3, when Python is compiled in debug mode, ``PyMem_Malloc()``
indirectly calls ``PyObject_Malloc()`` which requires the GIL to be
held (it isn't thread-safe). That's why ``PyMem_Malloc()`` must be called
with the GIL held.
This PEP changes ``PyMem_Malloc()``: it now always calls ``malloc()``
rather than ``PyObject_Malloc()``. The "GIL must be held" restriction
could therefore be removed from ``PyMem_Malloc()``.
This alternative was rejected because allowing to call
``PyMem_Malloc()`` without holding the GIL can break applications
which setup their own allocators or allocator hooks. Holding the GIL is
convenient to develop a custom allocator: no need to care about other
threads. It is also convenient for a debug allocator hook: Python
objects can be safely inspected, and the C API may be used for reporting.
Moreover, calling ``PyGILState_Ensure()`` in a memory allocator has
unexpected behaviour, especially at Python startup and when creating of a
new Python thread state. It is better to free custom allocators of
the responsibility of acquiring the GIL.
Don't add PyMem_RawMalloc()
---------------------------
Replace ``malloc()`` with ``PyMem_Malloc()``, but only if the GIL is
held. Otherwise, keep ``malloc()`` unchanged.
The ``PyMem_Malloc()`` is used without the GIL held in some Python
functions. For example, the ``main()`` and ``Py_Main()`` functions of
Python call ``PyMem_Malloc()`` whereas the GIL do not exist yet. In this
case, ``PyMem_Malloc()`` would be replaced with ``malloc()`` (or
``PyMem_RawMalloc()``).
This alternative was rejected because ``PyMem_RawMalloc()`` is required
for accurate reports of the memory usage. When a debug hook is used to
track the memory usage, the memory allocated by direct calls to
``malloc()`` cannot be tracked. ``PyMem_RawMalloc()`` can be hooked and
so all the memory allocated by Python can be tracked, including
memory allocated without holding the GIL.
Use existing debug tools to analyze memory use
----------------------------------------------
There are many existing debug tools to analyze memory use. Some
examples: `Valgrind <http://valgrind.org/>`_, `Purify
<http://ibm.com/software/awdtools/purify/>`_, `Clang AddressSanitizer
<http://code.google.com/p/address-sanitizer/>`_, `failmalloc
<http://www.nongnu.org/failmalloc/>`_, etc.
The problem is to retrieve the Python object related to a memory pointer
to read its type and/or its content. Another issue is to retrieve the
source of the memory allocation: the C backtrace is usually useless
(same reasoning than macros using ``__FILE__`` and ``__LINE__``, see
`Pass the C filename and line number`_), the Python filename and line
number (or even the Python traceback) is more useful.
This alternative was rejected because classic tools are unable to
introspect Python internals to collect such information. Being able to
setup a hook on allocators called with the GIL held allows to collect a
lot of useful data from Python internals.
Add a msize() function
----------------------
Add another function to ``PyMemAllocator`` and
``PyObjectArenaAllocator`` structures::
size_t msize(void *ptr);
This function returns the size of a memory block or a memory mapping.
Return (size_t)-1 if the function is not implemented or if the pointer
is unknown (ex: NULL pointer).
On Windows, this function can be implemented using ``_msize()`` and
``VirtualQuery()``.
The function can be used to implement a hook tracking the memory usage.
The ``free()`` method of an allocator only gets the address of a memory
block, whereas the size of the memory block is required to update the
memory usage.
The additional ``msize()`` function was rejected because only few
platforms implement it. For example, Linux with the GNU libc does not
provide a function to get the size of a memory block. ``msize()`` is not
currently used in the Python source code. The function would only be
used to track memory use, and make the API more complex. A debug hook
can implement the function internally, there is no need to add it to
``PyMemAllocator`` and ``PyObjectArenaAllocator`` structures.
No context argument
-------------------
Simplify the signature of allocator functions, remove the context
argument:
* ``void* malloc(size_t size)``
* ``void* realloc(void *ptr, size_t new_size)``
* ``void free(void *ptr)``
It is likely for an allocator hook to be reused for
``PyMem_SetAllocator()`` and ``PyObject_SetAllocator()``, or even
``PyMem_SetRawAllocator()``, but the hook must call a different function
depending on the allocator. The context is a convenient way to reuse the
same custom allocator or hook for different Python allocators.
In C++, the context can be used to pass *this*.
External Libraries
==================
Examples of API used to customize memory allocators.
Libraries used by Python:
* OpenSSL: `CRYPTO_set_mem_functions()
<http://git.openssl.org/gitweb/?p=openssl.git;a=blob;f=crypto/mem.c;h=f7984fa958eb1edd6c61f6667f3f2b29753be662;hb=HEAD#l124>`_
to set memory management functions globally
* expat: `parserCreate()
<http://hg.python.org/cpython/file/cc27d50bd91a/Modules/expat/xmlparse.c#l724>`_
has a per-instance memory handler
* zlib: `zlib 1.2.8 Manual <http://www.zlib.net/manual.html#Usage>`_,
pass an opaque pointer
* bz2: `bzip2 and libbzip2, version 1.0.5
<http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html>`_,
pass an opaque pointer
* lzma: `LZMA SDK - How to Use
<http://www.asawicki.info/news_1368_lzma_sdk_-_how_to_use.html>`_,
pass an opaque pointer
* lipmpdec: no opaque pointer (classic malloc API)
Other libraries:
* glib: `g_mem_set_vtable()
<http://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-mem-set-vtable>`_
* libxml2:
`xmlGcMemSetup() <http://xmlsoft.org/html/libxml-xmlmemory.html>`_,
global
* Oracle's OCI: `Oracle Call Interface Programmer's Guide,
Release 2 (9.2)
<http://docs.oracle.com/cd/B10501_01/appdev.920/a96584/oci15re4.htm>`_,
pass an opaque pointer
The new *ctx* parameter of this PEP was inspired by the API of zlib and
Oracle's OCI libraries.
See also the `GNU libc: Memory Allocation Hooks
<http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html>`_
which uses a different approach to hook memory allocators.
Memory Allocators
=================
The C standard library provides the well known ``malloc()`` function.
Its implementation depends on the platform and of the C library. The GNU
C library uses a modified ptmalloc2, based on "Doug Lea's Malloc"
(dlmalloc). FreeBSD uses `jemalloc
<http://www.canonware.com/jemalloc/>`_. Google provides *tcmalloc* which
is part of `gperftools <http://code.google.com/p/gperftools/>`_.
``malloc()`` uses two kinds of memory: heap and memory mappings. Memory
mappings are usually used for large allocations (ex: larger than 256
KB), whereas the heap is used for small allocations.
On UNIX, the heap is handled by ``brk()`` and ``sbrk()`` system calls,
and it is contiguous. On Windows, the heap is handled by
``HeapAlloc()`` and can be discontiguous. Memory mappings are handled by
``mmap()`` on UNIX and ``VirtualAlloc()`` on Windows, they can be
discontiguous.
Releasing a memory mapping gives back immediatly the memory to the
system. On UNIX, the heap memory is only given back to the system if the
released block is located at the end of the heap. Otherwise, the memory
will only be given back to the system when all the memory located after
the released memory is also released.
To allocate memory on the heap, an allocator tries to reuse free space.
If there is no contiguous space big enough, the heap must be enlarged,
even if there is more free space than required size. This issue is
called the "memory fragmentation": the memory usage seen by the system
is higher than real usage. On Windows, ``HeapAlloc()`` creates
a new memory mapping with ``VirtualAlloc()`` if there is not enough free
contiguous memory.
CPython has a *pymalloc* allocator for allocations smaller than 512
bytes. This allocator is optimized for small objects with a short
lifetime. It uses memory mappings called "arenas" with a fixed size of
256 KB.
Other allocators:
* Windows provides a `Low-fragmentation Heap
<http://msdn.microsoft.com/en-us/library/windows/desktop/aa366750%28v=vs.85%29.aspx>`_.
* The Linux kernel uses `slab allocation
<http://en.wikipedia.org/wiki/Slab_allocation>`_.
* The glib library has a `Memory Slice API
<https://developer.gnome.org/glib/unstable/glib-Memory-Slices.html>`_:
efficient way to allocate groups of equal-sized chunks of memory
This PEP allows to choose exactly which memory allocator is used for your
application depending on its usage of the memory (number of allocations,
size of allocations, lifetime of objects, etc.).
Links
=====
CPython issues related to memory allocation:
* `Issue #3329: Add new APIs to customize memory allocators
<http://bugs.python.org/issue3329>`_
* `Issue #13483: Use VirtualAlloc to allocate memory arenas
<http://bugs.python.org/issue13483>`_
* `Issue #16742: PyOS_Readline drops GIL and calls PyOS_StdioReadline,
which isn't thread safe <http://bugs.python.org/issue16742>`_
* `Issue #18203: Replace calls to malloc() with PyMem_Malloc() or
PyMem_RawMalloc() <http://bugs.python.org/issue18203>`_
* `Issue #18227: Use Python memory allocators in external libraries like
zlib or OpenSSL <http://bugs.python.org/issue18227>`_
Projects analyzing the memory usage of Python applications:
* `pytracemalloc
<https://pypi.python.org/pypi/pytracemalloc>`_
* `Meliae: Python Memory Usage Analyzer
<https://pypi.python.org/pypi/meliae>`_
* `Guppy-PE: umbrella package combining Heapy and GSL
<http://guppy-pe.sourceforge.net/>`_
* `PySizer (developed for Python 2.4)
<http://pysizer.8325.org/>`_
Copyright
=========
This document has been placed into the public domain.

242
pep-0446.txt Normal file
View File

@ -0,0 +1,242 @@
PEP: 446
Title: Add new parameters to configure the inheritance of files and for non-blocking sockets
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-July-2013
Python-Version: 3.4
Abstract
========
This PEP proposes new portable parameters and functions to configure the
inheritance of file descriptors and the non-blocking flag of sockets.
Rationale
=========
Inheritance of file descriptors
-------------------------------
The inheritance of file descriptors in child processes can be configured
on each file descriptor using a *close-on-exec* flag. By default, the
close-on-exec flag is not set.
On Windows, the close-on-exec flag is ``HANDLE_FLAG_INHERIT``. File
descriptors are not inherited if the ``bInheritHandles`` parameter of
the ``CreateProcess()`` function is ``FALSE``, even if the
``HANDLE_FLAG_INHERIT`` flag is set. If ``bInheritHandles`` is ``TRUE``,
only file descriptors with ``HANDLE_FLAG_INHERIT`` flag set are
inherited, others are not.
On UNIX, the close-on-exec flag is ``O_CLOEXEC``. File descriptors with
the ``O_CLOEXEC`` flag set are closed at the execution of a new program
(ex: when calling ``execv()``).
The ``O_CLOEXEC`` flag has no effect on ``fork()``, all file descriptors
are inherited by the child process. Futhermore, most properties file
descriptors are shared between the parent and the child processes,
except file attributes which are duplicated (``O_CLOEXEC`` is the only
file attribute). Setting ``O_CLOEXEC`` flag of a file descriptor in the
child process does not change the ``O_CLOEXEC`` flag of the file
descriptor in the parent process.
Issues of the inheritance of file descriptors
---------------------------------------------
Inheritance of file descriptors causes issues. For example, closing a
file descriptor in the parent process does not release the resource
(file, socket, ...), because the file descriptor is still open in the
child process.
Leaking file descriptors is also a major security vulnerability. An
untrusted child process can read sensitive data like passwords and take
control of the parent process though leaked file descriptors. It is for
example a known vulnerability to escape from a chroot.
Non-blocking sockets
--------------------
To handle multiple network clients in a single thread, a multiplexing
function like ``select()`` can be used. For best performances, sockets
must be configured as non-blocking. Operations like ``send()`` and
``recv()`` return an ``EAGAIN`` or ``EWOULDBLOCK`` error if the
operation would block.
By default, newly created sockets are blocking. Setting the non-blocking
mode requires additional system calls.
On UNIX, the blocking flag is ``O_NONBLOCK``: a pipe and a socket are
non-blocking if the ``O_NONBLOCK`` flag is set.
Setting flags at the creation of the file descriptor
----------------------------------------------------
Windows and recent versions of other operating systems like Linux
support setting the close-on-exec flag directly at the creation of file
descriptors, and close-on-exec and blocking flags at the creation of
sockets.
Setting these flags at the creation is atomic and avoids additional
system calls.
Proposal
========
New cloexec And blocking Parameters
-----------------------------------
Add a new optional *cloexec* on functions creating file descriptors:
* ``io.FileIO``
* ``io.open()``
* ``open()``
* ``os.dup()``
* ``os.dup2()``
* ``os.fdopen()``
* ``os.open()``
* ``os.openpty()``
* ``os.pipe()``
* ``select.devpoll()``
* ``select.epoll()``
* ``select.kqueue()``
Add new optional *cloexec* and *blocking* parameters to functions
creating sockets:
* ``asyncore.dispatcher.create_socket()``
* ``socket.socket()``
* ``socket.socket.accept()``
* ``socket.socket.dup()``
* ``socket.socket.fromfd``
* ``socket.socketpair()``
The default value of *cloexec* is ``False`` and the default value of
*blocking* is ``True``.
The atomicity is not guaranteed. If the platform does not support
setting close-on-exec and blocking flags at the creation of the file
descriptor or socket, the flags are set using additional system calls.
New Functions
-------------
Add new functions the get and set the close-on-exec flag of a file
descriptor, available on all platforms:
* ``os.get_cloexec(fd:int) -> bool``
* ``os.set_cloexec(fd:int, cloexec: bool)``
Add new functions the get and set the blocking flag of a file
descriptor, only available on UNIX:
* ``os.get_blocking(fd:int) -> bool``
* ``os.set_blocking(fd:int, blocking: bool)``
Other Changes
-------------
The ``subprocess.Popen`` class must clear the close-on-exec flag of file
descriptors of the ``pass_fds`` parameter. The flag is cleared in the
child process before executing the program, the change does not change
the flag in the parent process.
The close-on-exec flag must also be set on private file descriptors and
sockets in the Python standard library. For example, on UNIX,
os.urandom() opens ``/dev/urandom`` to read some random bytes and the
file descriptor is closed at function exit. The file descriptor is not
expected to be inherited by child processes.
Rejected Alternatives
=====================
PEP 433
-------
The PEP 433 entitled "Easier suppression of file descriptor inheritance"
is a previous attempt proposing various other alternatives, but no
consensus could be reached.
This PEP has a well defined behaviour (the default value of the new
*cloexec* parameter is not configurable), is more conservative (no
backward compatibility issue), and is much simpler.
Add blocking parameter for file descriptors and use Windows overlapped I/O
--------------------------------------------------------------------------
Windows supports non-blocking operations on files using an extension of
the Windows API called "Overlapped I/O". Using this extension requires
to modify the Python standard library and applications to pass a
``OVERLAPPED`` structure and an event loop to wait for the completion of
operations.
This PEP only tries to expose portable flags on file descriptors and
sockets. Supporting overlapped I/O requires an abstraction providing a
high-level and portable API for asynchronous operations on files and
sockets. Overlapped I/O are out of the scope of this PEP.
UNIX supports non-blocking files, moreover recent versions of operating
systems support setting the non-blocking flag at the creation of a file
descriptor. It would be possible to add a new optional *blocking*
parameter to Python functions creating file descriptors. On Windows,
creating a file descriptor with ``blocking=False`` would raise a
``NotImplementedError``. This behaviour is not acceptable for the ``os``
module which is designed as a thin wrapper on the C functions of the
operating system. If a platform does not support a function, the
function should not be available on the platform. For example,
the ``os.fork()`` function is not available on Windows.
For all these reasons, this alternative was rejected. The PEP 3156
proposes an abstraction for asynchronous I/O supporting non-blocking
files on Windows.
Links
=====
Python issues:
* `#10115: Support accept4() for atomic setting of flags at socket
creation <http://bugs.python.org/issue10115>`_
* `#12105: open() does not able to set flags, such as O_CLOEXEC
<http://bugs.python.org/issue12105>`_
* `#12107: TCP listening sockets created without FD_CLOEXEC flag
<http://bugs.python.org/issue12107>`_
* `#16850: Add "e" mode to open(): close-and-exec
(O_CLOEXEC) / O_NOINHERIT <http://bugs.python.org/issue16850>`_
* `#16860: Use O_CLOEXEC in the tempfile module
<http://bugs.python.org/issue16860>`_
* `#16946: subprocess: _close_open_fd_range_safe() does not set
close-on-exec flag on Linux < 2.6.23 if O_CLOEXEC is defined
<http://bugs.python.org/issue16946>`_
* `#17070: Use the new cloexec to improve security and avoid bugs
<http://bugs.python.org/issue17070>`_
Other links:
* `Secure File Descriptor Handling
<http://udrepper.livejournal.com/20407.html>`_ (Ulrich Drepper,
2008)
* `Ghosts of Unix past, part 2: Conflated designs
<http://lwn.net/Articles/412131/>`_ (Neil Brown, 2010) explains the
history of ``O_CLOEXEC`` and ``O_NONBLOCK`` flags
Copyright
=========
This document has been placed into the public domain.