This commit is contained in:
Brett Cannon 2015-10-16 10:43:49 -07:00
commit 679cfd5aba
23 changed files with 3323 additions and 462 deletions


@@ -424,11 +424,12 @@ How to Make A Release
that directory. Note though that if you're releasing a maintenance
release for an older version, don't change the current link.
___ If this is a final release (even a maintenance release), also unpack
the HTML docs to /srv/docs.python.org/release/X.Y.Z on
docs.iad1.psf.io. Make sure the files are in group "docs". If it is a
release of a security-fix-only version, tell the DE to build a version
with the "version switcher" and put it there.
___ If this is a final release (even a maintenance release), also
unpack the HTML docs to /srv/docs.python.org/release/X.Y.Z on
docs.iad1.psf.io. Make sure the files are in group "docs" and are
group-writeable. If it is a release of a security-fix-only version,
tell the DE to build a version with the "version switcher"
and put it there.
___ Let the DE check if the docs are built and work all right.
@@ -484,6 +485,10 @@ How to Make A Release
Note that the easiest thing is probably to copy fields from
an existing Python release "page", editing as you go.
There should only be one "page" for a release (e.g. 3.5.0, 3.5.1).
Reuse the same page for all pre-releases, changing the version
number and the documentation as you go.
___ If this isn't the first release for a version, open the existing
"page" for editing and update it to the new release. Don't save yet!

951
pep-0103.txt Normal file

@@ -0,0 +1,951 @@
PEP: 103
Title: Collecting information about git
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytman <phd@phdru.name>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 01-Jun-2015
Post-History: 12-Sep-2015
Abstract
========
This Informational PEP collects information about git. There is, of
course, a lot of documentation for git, so the PEP concentrates on
more complex (and more related to Python development) issues,
scenarios and examples.
The plan is to extend the PEP in the future collecting information
about equivalence of Mercurial and git scenarios to help migrating
Python development from Mercurial to git.
The author of the PEP doesn't currently plan to write a Process PEP on
migrating Python development from Mercurial to git.
Documentation
=============
Git is accompanied by a lot of documentation, both online and
offline.
Documentation for starters
--------------------------
Git Tutorial: `part 1
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial.html>`_,
`part 2
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial-2.html>`_.
`Git User's manual
<https://www.kernel.org/pub/software/scm/git/docs/user-manual.html>`_.
`Everyday GIT With 20 Commands Or So
<https://www.kernel.org/pub/software/scm/git/docs/giteveryday.html>`_.
`Git workflows
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_.
Advanced documentation
----------------------
`Git Magic
<http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html>`_,
with a number of translations.
`Pro Git <https://git-scm.com/book>`_. The Book about git. Buy it at
Amazon or download in PDF, mobi, or ePub form. It has translations into
many different languages. Download the Russian translation from `GArik
<https://github.com/GArik/progit/wiki>`_.
`Git Wiki <https://git.wiki.kernel.org/index.php/Main_Page>`_.
Offline documentation
---------------------
Git has builtin help: run ``git help $TOPIC``. For example, run
``git help git`` or ``git help help``.
Quick start
===========
Download and installation
-------------------------
Unix users: `download and install using your package manager
<https://git-scm.com/download/linux>`_.
Microsoft Windows: download `git-for-windows
<https://github.com/git-for-windows/git/releases>`_ or `msysGit
<https://github.com/msysgit/msysgit/releases>`_.
MacOS X: use git installed with `XCode
<https://developer.apple.com/xcode/downloads/>`_ or download from
`MacPorts <https://www.macports.org/ports.php?by=name&substr=git>`_ or
`git-osx-installer
<http://sourceforge.net/projects/git-osx-installer/files/>`_ or
install git with `Homebrew <http://brew.sh/>`_: ``brew install git``.
`git-cola <https://git-cola.github.io/index.html>`_ is a Git GUI
written in Python and GPL licensed. Linux, Windows, MacOS X.
`TortoiseGit <https://tortoisegit.org/>`_ is a Windows Shell Interface
to Git based on TortoiseSVN; open source.
Initial configuration
---------------------
This simple code often appears in documentation, but it is
important, so let's repeat it here. Git stores author and committer
names/emails in every commit, so configure your real name and
preferred email::
$ git config --global user.name "User Name"
$ git config --global user.email user.name@example.org
Examples in this PEP
====================
Examples of git commands in this PEP use the following approach. It is
supposed that you, the user, work with a local repository named
``python`` that has an upstream remote repo named ``origin``. Your
local repo has two branches ``v1`` and ``master``. For most examples
the currently checked out branch is ``master``. That is, it's assumed
you have done something like this::
$ git clone https://git.python.org/python.git
$ cd python
$ git branch v1 origin/v1
The first command clones the remote repository into the local directory
``python``, creates a new local branch master, sets
remotes/origin/master as its upstream remote-tracking branch and
checks it out into the working directory.
The last command creates a new local branch v1 and sets
remotes/origin/v1 as its upstream remote-tracking branch.
The same result can be achieved with commands::
$ git clone -b v1 https://git.python.org/python.git
$ cd python
$ git checkout --track origin/master
The last command creates a new local branch master, sets
remotes/origin/master as its upstream remote-tracking branch and
checks it out into the working directory.
Branches and branches
=====================
Git terminology can be a bit misleading. Take, for example, the term
"branch". In git it has two meanings. A branch is a directed line of
commits (possibly with merges). And a branch is a label or a pointer
assigned to a line of commits. It is important to distinguish when you
talk about commits and when about their labels. Lines of commits are
by themselves unnamed and are usually only lengthening and merging.
Labels, on the other hand, can be created, moved, renamed and deleted
freely.
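The distinction can be seen in a small sketch. It works in a throwaway repository created with ``mktemp``; the branch names ``topic`` and ``renamed`` and the demo identity are hypothetical, chosen only for illustration. The label is created, moved, renamed and deleted while the line of commits on ``master`` stays as it was.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory for the demo
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of local defaults
git config user.name "Demo User"           # hypothetical identity, local to this repo
git config user.email demo@example.org
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

before=$(git rev-parse HEAD)

git branch topic                 # create a label at the current commit
git branch -f topic HEAD~        # move the label one commit back
git branch -m topic renamed      # rename the label
git branch -q -D renamed         # delete the label

after=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)
echo "master still has $count commits"
```

None of the label operations touched the commits themselves; only the pointers changed.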
Remote repositories and remote branches
=======================================
Remote-tracking branches are branches (pointers to commits) in your
local repository. They are there for git (and for you) to remember
what branches and commits have been pulled from and pushed to what
remote repos (you can pull from and push to many remotes).
Remote-tracking branches live under ``remotes/$REMOTE`` namespaces,
e.g. ``remotes/origin/master``.
To see the status of remote-tracking branches run::
$ git branch -rv
To see local and remote-tracking branches (and tags) pointing to
commits::
$ git log --decorate
You never do your own development on remote-tracking branches. You
create a local branch that has a remote branch as upstream and do
development on that local branch. On push git pushes commits to the
remote repo and updates remote-tracking branches, on pull git fetches
commits from the remote repo, updates remote-tracking branches and
fast-forwards, merges or rebases local branches.
When you do an initial clone like this::
$ git clone -b v1 https://git.python.org/python.git
git clones remote repository ``https://git.python.org/python.git`` to
directory ``python``, creates a remote named ``origin``, creates
remote-tracking branches, creates a local branch ``v1``, configures it
to track upstream remotes/origin/v1 branch and checks out ``v1`` into
the working directory.
Updating local and remote-tracking branches
-------------------------------------------
There is a major difference between
::
$ git fetch $REMOTE $BRANCH
and
::
$ git fetch $REMOTE $BRANCH:$BRANCH
The first command fetches commits from the named $BRANCH in the
$REMOTE repository that are not in your repository, updates
remote-tracking branch and leaves the id (the hash) of the head commit
in file .git/FETCH_HEAD.
The second command fetches commits from the named $BRANCH in the
$REMOTE repository that are not in your repository and updates both
the local branch $BRANCH and its upstream remote-tracking branch. But
it refuses to update branches in case of non-fast-forward. And it
refuses to update the current branch (currently checked out branch,
where HEAD is pointing to).
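The difference can be observed with two local repositories. This is a minimal sketch, assuming a reasonably recent git (older versions did not update the remote-tracking branch in the first form); the directory names ``upstream`` and ``python`` and the demo identity are made up for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A stand-in for the remote repository, with branches master and v1.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org
git commit -q --allow-empty -m "initial"
git branch v1
cd ..

# The local clone; master is checked out, v1 tracks origin/v1.
git clone -q upstream python
cd python
git branch -q v1 origin/v1

# Somebody publishes a new commit on v1 in the remote repository.
(cd ../upstream && git checkout -q v1 && git commit -q --allow-empty -m "v1 fix")

# Form 1: updates origin/v1 and FETCH_HEAD, leaves local v1 alone.
git fetch -q origin v1
tracking=$(git rev-parse refs/remotes/origin/v1)
fetch_head=$(git rev-parse FETCH_HEAD)
local_before=$(git rev-parse refs/heads/v1)

# Form 2: also fast-forwards the (non-current) local branch v1.
git fetch -q origin v1:v1
local_after=$(git rev-parse refs/heads/v1)
```

After the first fetch the local ``v1`` still points to the old commit; only the second form moves it.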
The first command is used internally by ``git pull``.
::
$ git pull $REMOTE $BRANCH
is equivalent to
::
$ git fetch $REMOTE $BRANCH
$ git merge FETCH_HEAD
Certainly, $BRANCH in that case should be your current branch. If you
want to merge a different branch into your current branch first update
that non-current branch and then merge::
$ git fetch origin v1:v1 # Update v1
$ git pull --rebase origin master # Update the current branch master
# using rebase instead of merge
$ git merge v1
If you have not yet pushed commits on ``v1``, though, the scenario has
to become a bit more complex. Git refuses to update a
non-fast-forwardable branch, and you don't want to do a force-pull
because that would remove your non-pushed commits and you would need
to recover. So you want to rebase ``v1`` but you cannot rebase a
non-current branch. Hence, check out ``v1`` and rebase it before
merging::
$ git checkout v1
$ git pull --rebase origin v1
$ git checkout master
$ git pull --rebase origin master
$ git merge v1
It is possible to configure git to make it fetch/pull a few branches
or all branches at once, so you can simply run
::
$ git pull origin
or even
::
$ git pull
Default remote repository for fetching/pulling is ``origin``. Default
set of references to fetch is calculated using matching algorithm: git
fetches all branches having the same name on both ends.
Push
''''
Pushing is a bit simpler. There is only one command ``push``. When you
run
::
$ git push origin v1 master
git pushes local v1 to remote v1 and local master to remote master.
The same as::
$ git push origin v1:v1 master:master
Git pushes commits to the remote repo and updates remote-tracking
branches. Git refuses to push commits that aren't fast-forwardable.
You can force-push anyway, but please remember - you can force-push to
your own repositories but don't force-push to public or shared repos.
If you find git refuses to push commits that aren't fast-forwardable,
better fetch and merge commits from the remote repo (or rebase your
commits on top of the fetched commits), then push. Only force-push if
you know what you are doing and why. See the section `Commit
editing and caveats`_ below.
It is possible to configure git to make it push a few branches or all
branches at once, so you can simply run
::
$ git push origin
or even
::
$ git push
Default remote repository for pushing is ``origin``. Default set of
references to push in git before 2.0 is calculated using matching
algorithm: git pushes all branches having the same name on both ends.
Default set of references to push in git 2.0+ is calculated using
simple algorithm: git pushes the current branch back to its
@{upstream}.
To configure git before 2.0 to the new behaviour run::
$ git config push.default simple
To configure git 2.0+ to the old behaviour run::
$ git config push.default matching
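Note that ``git config`` without ``--global`` changes the setting only for the current repository. A small sketch, run in a throwaway repository, that sets the value and reads it back with ``git config --get``:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository for the demo
cd "$repo"
git init -q .

git config push.default simple          # stored in .git/config of this repo only
mode=$(git config --get push.default)   # read the effective value back
echo "push.default = $mode"
```

Add ``--global`` to the first command to make the choice apply to all your repositories.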
Git doesn't allow pushing to a branch if it's the current branch in the
remote non-bare repository: git refuses to update the remote working
directory. You really should push only to bare repositories. For
non-bare repositories git prefers a pull-based workflow.
When you want to deploy code on a remote host and can only use push
(because your workstation is behind a firewall and you cannot pull
from it) you do that in two steps using two repositories: you push
from the workstation to a bare repo on the remote host, ssh to the
remote host and pull from the bare repo to a non-bare deployment repo.
That changed in git 2.3, but see `the blog post
<https://github.com/blog/1957-git-2-3-has-been-released#push-to-deploy>`_
for caveats; in 2.4 the push-to-deploy feature was `further improved
<https://github.com/blog/1994-git-2-4-atomic-pushes-push-to-deploy-and-more#push-to-deploy-improvements>`_.
Tags
''''
Git automatically fetches tags that point to commits being fetched
during fetch/pull. To fetch all tags (and commits they point to) run
``git fetch --tags origin``. To fetch some specific tags fetch them
explicitly::
$ git fetch origin tag $TAG1 tag $TAG2...
For example::
$ git fetch origin tag 1.4.2
$ git fetch origin v1:v1 tag 2.1.7
Git doesn't automatically push tags. That allows you to have private
tags. To push tags list them explicitly::
$ git push origin tag 1.4.2
$ git push origin v1 master tag 2.1.7
Or push all tags at once::
$ git push --tags origin
Don't move tags with ``git tag -f`` or remove tags with ``git tag -d``
after they have been published.
Private information
'''''''''''''''''''
When cloning/fetching/pulling/pushing git copies only database objects
(commits, trees, files and tags) and symbolic references (branches and
lightweight tags). Everything else is private to the repository and
never cloned, updated or pushed. It's your config, your hooks, your
private exclude file.
If you want to distribute hooks, copy them to the working tree, add,
commit, push and instruct the team to update and install the hooks
manually.
Commit editing and caveats
==========================
A warning not to edit published (pushed) commits also appears in
documentation but it's repeated here anyway as it's very important.
It is possible to recover from a forced push but it's a PITA for the
entire team. Please avoid it.
To see what commits have not been published yet compare the head of the
branch with its upstream remote-tracking branch::
$ git log origin/master.. # from origin/master to HEAD (of master)
$ git log origin/v1..v1 # from origin/v1 to the head of v1
For every branch that has an upstream remote-tracking branch git
maintains an alias @{upstream} (short version @{u}), so the commands
above can be given as::
$ git log @{u}..
$ git log v1@{u}..v1
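The same range can be used to count unpublished commits before deciding whether it is safe to edit them. A minimal sketch with two throwaway repositories (the names ``upstream`` and ``python`` and the demo identity are assumptions of the example):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A stand-in for the remote repository with one published commit.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org
git commit -q --allow-empty -m "published"
cd ..

# Clone it; local master gets origin/master as its upstream.
git clone -q upstream python
cd python
git config user.name "Demo User"
git config user.email demo@example.org
git commit -q --allow-empty -m "local work 1"
git commit -q --allow-empty -m "local work 2"

# Commits reachable from HEAD but not from the upstream branch.
unpublished=$(git rev-list --count '@{u}..')
git log --oneline '@{u}..'
echo "$unpublished commits have not been pushed yet"
```

The quotes around ``@{u}..`` are not required by git; they only protect the braces from the shell.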
To see the status of all branches::
$ git branch -avv
To compare the status of local branches with a remote repo::
$ git remote show origin
Read `how to recover from upstream rebase
<https://git-scm.com/docs/git-rebase#_recovering_from_upstream_rebase>`_.
It is in ``git help rebase``.
On the other hand don't be too afraid about commit editing. You can
safely edit, reorder, remove, combine and split commits that haven't
been pushed yet. You can even push commits to your own (backup) repo,
edit them later and force-push edited commits to replace what has
already been pushed. This is not a problem until commits are in a public
or shared repository.
Undo
====
Whatever you do, don't panic. Almost anything in git can be undone.
git checkout: restore file's content
------------------------------------
``git checkout``, for example, can be used to restore the content of
file(s) to that one of a commit. Like this::
$ git checkout HEAD~ README
The command restores the contents of the README file to the last but one
commit in the current branch. Without a commit ID the file is restored
from the index; i.e. ``git checkout README`` restores README to its
staged version, which is the latest commit if nothing has been staged.
(Do not use ``git checkout`` to view the content of a file in a commit,
use ``git cat-file -p``; e.g. ``git cat-file -p HEAD~:path/to/README``).
git reset: remove (non-pushed) commits
--------------------------------------
``git reset`` moves the head of the current branch. The head can be
moved to point to any commit but it's often used to remove a commit or
a few (preferably, non-pushed ones) from the top of the branch - that
is, to move the branch backward in order to undo a few (non-pushed)
commits.
``git reset`` has three modes of operation - soft, hard and mixed.
Default is mixed. ProGit `explains
<https://git-scm.com/book/en/Git-Tools-Reset-Demystified>`_ the
difference very clearly. Bare repositories don't have indices or
working trees so in a bare repo only soft reset is possible.
Unstaging
'''''''''
Mixed mode reset with a path or paths can be used to unstage changes -
that is, to remove from index changes added with ``git add`` for
committing. See `The Book
<https://git-scm.com/book/en/Git-Basics-Undoing-Things>`_ for details
about unstaging and other undo tricks.
git reflog: reference log
-------------------------
Removing commits with ``git reset`` or moving the head of a branch
sounds dangerous and it is. But there is a way to undo: another
reset back to the original commit. Git doesn't remove commits
immediately; unreferenced commits (in git terminology they are called
"dangling commits") stay in the database for some time (default is two
weeks) so you can reset back to the original commit or create a new
branch pointing to it.
For every move of a branch's head - with ``git commit``, ``git
checkout``, ``git fetch``, ``git pull``, ``git rebase``, ``git reset``
and so on - git stores a reference log (reflog for short). For every
move git stores where the head was. Command ``git reflog`` can be used
to view (and manipulate) the log.
In addition to the moves of the head of every branch git stores the
moves of the HEAD - a symbolic reference that (usually) names the
current branch. HEAD is changed with ``git checkout $BRANCH``.
By default ``git reflog`` shows the moves of the HEAD, i.e. the
command is equivalent to ``git reflog HEAD``. To show the moves of the
head of a branch use the command ``git reflog $BRANCH``.
So to undo a ``git reset`` look up the original commit in ``git
reflog``, verify it with ``git show`` or ``git log`` and run ``git
reset $COMMIT_ID``. Git stores the move of the branch's head in
reflog, so you can undo that undo later again.
In a more complex situation you'd want to move some commits along with
resetting the head of the branch. Cherry-pick them to the new branch.
For example, if you want to reset the branch ``master`` back to the
original commit but preserve two commits created in the current branch
do something like::
$ git branch save-master # create a new branch saving master
$ git reflog # find the original place of master
$ git reset $COMMIT_ID
$ git cherry-pick save-master~ save-master
$ git branch -D save-master # remove temporary branch
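The simpler case, undoing a reset with the help of the reflog, can be sketched as follows. The example runs in a throwaway repository with a made-up identity; ``master@{1}`` is the reflog notation for the previous position of ``master``.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git commit -q --allow-empty -m "three"

tip=$(git rev-parse HEAD)

git reset -q HEAD~2                        # move master two commits back
git reflog master                          # the log still lists the old position

original=$(git rev-parse 'master@{1}')     # where master was before the reset
git reset -q "$original"                   # undo the reset

restored=$(git rev-parse HEAD)
```

In real work, inspect the candidate with ``git show`` or ``git log`` before resetting to it.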
git revert: revert a commit
---------------------------
``git revert`` reverts a commit or commits, that is, it creates a new
commit or commits that revert(s) the effects of the given commits.
It's the only way to undo published commits (``git commit --amend``,
``git rebase`` and ``git reset`` change the branch in
non-fast-forwardable ways so they should only be used for non-pushed
commits.)
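A minimal sketch of a revert, assuming a throwaway repository and a hypothetical ``README`` file: the faulty commit stays in the history and a new commit restores the previous content.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

echo "good" > README
git add README
git commit -q -m "add README"

echo "bad" > README
git commit -q -a -m "break README"

# Create a new commit that undoes the last one; --no-edit keeps the default message.
git revert --no-edit HEAD > /dev/null

content=$(cat README)
count=$(git rev-list --count HEAD)
echo "README contains '$content', history has $count commits"
```

The branch only grows, so the result can be pushed as an ordinary fast-forward.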
There is a problem with reverting a merge commit. ``git revert`` can
undo the code created by the merge commit but it cannot undo the fact
of merge. See the discussion `How to revert a faulty merge
<https://www.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.html>`_.
One thing that cannot be undone
-------------------------------
Whatever you undo, there is one thing that cannot be undone -
overwritten uncommitted changes. Uncommitted changes don't belong to
git so git cannot help preserving them.
Most of the time git warns you when you're going to execute a command
that overwrites uncommitted changes. Git doesn't allow you to switch
branches with ``git checkout`` if that would overwrite them. It stops
you when you're going to rebase with a non-clean working tree. It
refuses to pull new commits over non-committed files.
But there are commands that do exactly that - overwrite files in the
working tree. Commands like ``git checkout $PATHs`` or ``git reset
--hard`` silently overwrite files including your uncommitted changes.
With that in mind you can understand the stance "commit early, commit
often". Commit as often as possible. Commit on every save in your
editor or IDE. You can edit your commits before pushing - edit commit
messages, change commits, reorder, combine, split, remove. But save
your changes in git database, either commit changes or at least stash
them with ``git stash``.
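A short sketch of the stash variant, in a throwaway repository with a hypothetical file ``notes.txt``: the uncommitted change is saved in the git database, the working tree becomes clean, and the change is brought back later.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "work in progress" >> notes.txt   # an uncommitted change to a tracked file

git stash -q                           # save the change in the git database
clean=$(git status --porcelain)        # empty output means a clean working tree

git stash pop -q                       # bring the change back
last_line=$(tail -n 1 notes.txt)
```

By default only changes to tracked files are stashed; new untracked files stay where they are.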
Merge or rebase?
================
The Internet is full of heated discussions on the topic: "merge or
rebase?" Most of them are meaningless. When a DVCS is being used in a
big team with a big and complex project with many branches there is
simply no way to avoid merges. So the question's diminished to
"whether to use rebase, and if yes - when to use rebase?" Considering
that it is very much recommended not to rebase published commits the
question's diminished even further: "whether to use rebase on
non-pushed commits?"
That small question is for the team to decide. The author of the PEP
recommends using rebase when pulling, i.e. always do ``git pull
--rebase`` or even configure automatic setup of rebase for every new
branch::
$ git config branch.autosetuprebase always
and configure rebase for existing branches::
$ git config branch.$NAME.rebase true
For example::
$ git config branch.v1.rebase true
$ git config branch.master.rebase true
After that ``git pull origin master`` becomes equivalent to ``git pull
--rebase origin master``.
It is recommended to create new commits in a separate feature or topic
branch while using rebase to update the mainline branch. When the
topic branch is ready merge it into mainline. To avoid a tedious task
of resolving a large number of conflicts at once you can merge the topic
branch to the mainline from time to time and switch back to the topic
branch to continue working on it. The entire workflow would be
something like::
$ git checkout -b issue-42 # create a new issue branch and switch to it
...edit/test/commit...
$ git checkout master
$ git pull --rebase origin master # update master from the upstream
$ git merge issue-42
$ git branch -d issue-42 # delete the topic branch
$ git push origin master
When the topic branch is deleted only the label is removed; the commits
stay in the database, they are now merged into master::
o--o--o--o--o--M--< master - the mainline branch
    \         /
     --*--*--*             - the topic branch, now unnamed
The topic branch is deleted to avoid cluttering branch namespace with
small topic branches. Information on what issue was fixed or what
feature was implemented should be in the commit messages.
Null-merges
===========
Git has a builtin merge strategy for what Python core developers call
"null-merge"::
$ git merge -s ours v1 # null-merge v1 into master
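What the strategy does can be checked in a throwaway repository. In this sketch (file name and identity are hypothetical) ``master`` and ``v1`` diverge, then ``v1`` is null-merged: the merge is recorded but the tree of ``master`` does not change.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git branch v1

echo "master change" > file.txt
git commit -q -a -m "change on master"

git checkout -q v1
echo "v1-only fix" > file.txt
git commit -q -a -m "fix on v1"

git checkout -q master
tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "Null-merge v1" v1    # record the merge, keep our content
tree_after=$(git rev-parse 'HEAD^{tree}')

# v1 is now an ancestor of master although none of its changes were taken.
if git merge-base --is-ancestor v1 HEAD; then merged=yes; else merged=no; fi
```

Later merges of ``v1`` into ``master`` will skip the commits that were null-merged.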
Branching models
================
Git doesn't assume any particular development model regarding
branching and merging. Some projects prefer to graduate patches from
the oldest branch to the newest, some prefer to cherry-pick commits
backwards, some use squashing (combining a number of commits into
one). Anything is possible.
There are a few examples to start with. `git help workflows
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_
describes how the very git authors develop git.
ProGit book has a few chapters devoted to branch management in
different projects: `Git Branching - Branching Workflows
<https://git-scm.com/book/en/Git-Branching-Branching-Workflows>`_ and
`Distributed Git - Contributing to a Project
<https://git-scm.com/book/en/Distributed-Git-Contributing-to-a-Project>`_.
There is also a well-known article `A successful Git branching model
<http://nvie.com/posts/a-successful-git-branching-model/>`_ by Vincent
Driessen. It recommends a set of very detailed rules on creating and
managing mainline, topic and bugfix branches. To support the model the
author implemented `git flow <https://github.com/nvie/gitflow>`_
extension.
Advanced configuration
======================
Line endings
------------
Git has builtin mechanisms to handle line endings between platforms
with different end-of-line styles. To allow git to do CRLF conversion
assign ``text`` attribute to files using `.gitattributes
<https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html>`_.
For files that have to have specific line endings assign ``eol``
attribute. For binary files the attribute is, naturally, ``binary``.
For example::
$ cat .gitattributes
*.py text
*.txt text
*.png binary
/readme.txt eol=crlf
To check what attributes git uses for files use ``git check-attr``
command. For example::
$ git check-attr -a -- \*.py
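The files do not have to exist for ``git check-attr`` to report their attributes. A minimal sketch in a throwaway repository; the paths ``module.py`` and ``logo.png`` are hypothetical, and the ``eol`` value is written in lowercase, as git expects.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Write a .gitattributes file similar to the one shown above.
printf '%s\n' '*.py text' '*.txt text' '*.png binary' '/readme.txt eol=crlf' > .gitattributes

py_text=$(git check-attr text -- module.py)
png_text=$(git check-attr text -- logo.png)
readme_eol=$(git check-attr eol -- readme.txt)

echo "$py_text"       # module.py: text: set
echo "$png_text"      # logo.png: text: unset
echo "$readme_eol"    # readme.txt: eol: crlf
```

``binary`` is a macro attribute, which is why ``logo.png`` reports ``text`` as unset.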
Advanced topics
===============
Staging area
------------
Staging area aka index aka cache is a distinguishing feature of git.
Staging area is where git collects patches before committing them.
Separation between collecting patches and commit phases provides a
very useful feature of git: you can review collected patches before
commit and even edit them - remove some hunks, add new hunks and
review again.
To add files to the index use ``git add``. Collecting patches before
committing means you need to do that for every change, not only to add
new (untracked) files. To simplify committing in case you just want to
commit everything without reviewing run ``git commit --all`` (or just
``-a``) - the command adds every changed tracked file to the index and
then commits. To commit a file or files regardless of patches collected
in the index run ``git commit [--only|-o] -- $FILE...``.
To add hunks of patches to the index use ``git add --patch`` (or just
``-p``). To remove collected files from the index use ``git reset HEAD
-- $FILE...`` To add/inspect/remove collected hunks use ``git add
--interactive`` (``-i``).
To see the diff between the index and the last commit (i.e., collected
patches) use ``git diff --cached``. To see the diff between the
working tree and the index (i.e., uncollected patches) use just ``git
diff``. To see the diff between the working tree and the last commit
(i.e., both collected and uncollected patches) run ``git diff HEAD``.
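The three diffs can be compared side by side. In this sketch (throwaway repository, hypothetical file ``list.txt``) one added line is staged and another one is not; ``--numstat`` prints the number of added lines in its first column.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

echo "one" > list.txt
git add list.txt
git commit -q -m "add list"

echo "two" >> list.txt
git add list.txt              # "two" is collected in the index
echo "three" >> list.txt      # "three" stays in the working tree only

staged=$(git diff --cached --numstat | cut -f1)   # index vs last commit
unstaged=$(git diff --numstat | cut -f1)          # working tree vs index
both=$(git diff HEAD --numstat | cut -f1)         # working tree vs last commit
echo "staged: $staged, unstaged: $unstaged, both: $both"
```

A plain ``git commit`` at this point would record only the staged line.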
See `WhatIsTheIndex
<https://git.wiki.kernel.org/index.php/WhatIsTheIndex>`_ and
`IndexCommandQuickref
<https://git.wiki.kernel.org/index.php/IndexCommandQuickref>`_ in Git
Wiki.
ReReRe
======
Rerere is a mechanism that helps to resolve repeated merge conflicts.
The most frequent source of recurring merge conflicts are topic
branches that are merged into mainline and then the merge commits are
removed; that's often performed to test the topic branches and train
rerere; merge commits are removed to have clean linear history and
finish the topic branch with only one last merge commit.
Rerere works by remembering the states of tree before and after a
successful commit. That way rerere can automatically resolve conflicts
if they appear in the same files.
Rerere can be used manually with ``git rerere`` command but most often
it's used automatically. Enable rerere with these commands in a
working tree::
$ git config rerere.enabled true
$ git config rerere.autoupdate true
You don't need to turn rerere on globally - you don't want rerere in
bare repositories or single-branch repositories; you only need rerere
in repos where you often perform merges and resolve merge conflicts.
See `Rerere <https://git-scm.com/book/en/Git-Tools-Rerere>`_ in The
Book.
Database maintenance
====================
Git object database and other files/directories under ``.git`` require
periodic maintenance and cleanup. For example, commit editing leaves
unreferenced objects (dangling objects, in git terminology) and these
objects should be pruned to avoid collecting cruft in the DB. The
command ``git gc`` is used for maintenance. Git automatically runs
``git gc --auto`` as a part of some commands to do quick maintenance.
Users are recommended to run ``git gc --aggressive`` from time to
time; ``git help gc`` recommends running it every few hundred
changesets; for more intensive projects it should be something like
once a week and less frequently (biweekly or monthly) for less
active projects.
``git gc --aggressive`` not only removes dangling objects, it also
repacks object database into indexed and better optimized pack(s); it
also packs symbolic references (branches and tags). Another way to do
it is to run ``git repack``.
There is a well-known `message
<https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html>`_ from Linus
Torvalds regarding "stupidity" of ``git gc --aggressive``. The message
can safely be ignored now. It is old and outdated; ``git gc
--aggressive`` has become much better since that time.
For those who still prefer ``git repack`` over ``git gc --aggressive``
the recommended parameters are ``git repack -a -d -f --depth=20
--window=250``. See `this detailed experiment
<http://vcscompare.blogspot.ru/2008/06/git-repack-parameters.html>`_
for explanation of the effects of these parameters.
From time to time run ``git fsck [--strict]`` to verify integrity of
the database. ``git fsck`` may produce a list of dangling objects;
that's not an error, just a reminder to perform regular maintenance.
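The effect of maintenance can be observed with ``git count-objects -v``, which reports loose objects (``count``) and packed objects (``in-pack``). A small sketch in a throwaway repository with a hypothetical file; it uses a plain ``git gc`` because the repository is tiny.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email demo@example.org

# Create a few commits; every commit adds loose objects to the database.
for n in 1 2 3; do
    echo "revision $n" > data.txt
    git add data.txt
    git commit -q -m "revision $n"
done

git gc -q                                   # repack and clean up
loose=$(git count-objects -v | sed -n 's/^count: //p')
packed=$(git count-objects -v | sed -n 's/^in-pack: //p')
echo "loose objects: $loose, packed objects: $packed"

git fsck > /dev/null 2>&1                   # verify integrity; a non-zero exit aborts the script
```

Running ``git count-objects -v`` before ``git gc`` would show the same objects as loose ones.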
Tips and tricks
===============
Command-line options and arguments
----------------------------------
`git help cli
<https://www.kernel.org/pub/software/scm/git/docs/gitcli.html>`_
recommends not to combine short options/flags. Most of the time
combining works: ``git commit -av`` works perfectly, but there are
situations when it doesn't. E.g., ``git log -p -5`` cannot be combined
as ``git log -p5``.
Some options have arguments, some even have default arguments. In that
case the argument for such option must be spelled in a sticky way:
``-Oarg``, never ``-O arg`` because for an option that has a default
argument the latter means "use default value for option ``-O`` and
pass ``arg`` further to the option parser". For example, ``git grep``
has an option ``-O`` that passes a list of names of the found files to
a program; default program for ``-O`` is a pager (usually ``less``),
but you can use your editor::
$ git grep -Ovim # but not -O vim
BTW, if git is instructed to use ``less`` as the pager (i.e., if pager
is not configured in git at all it uses ``less`` by default, or if it
gets ``less`` from GIT_PAGER or PAGER environment variables, or if it
was configured with ``git config --global core.pager less``, or
``less`` is used in the command ``git grep -Oless``) ``git grep``
passes ``+/$pattern`` option to ``less`` which is quite convenient.
Unfortunately, ``git grep`` doesn't pass the pattern if the pager is
not exactly ``less``, even if it's ``less`` with parameters (something
like ``git config --global core.pager "less -FRSXgimq"``); fortunately,
``git grep -Oless`` always passes the pattern.
bash/zsh completion
-------------------
It's a bit hard to type ``git rebase --interactive --preserve-merges
HEAD~5`` manually even for those who are happy to use command-line,
and this is where shell completion is of great help. Bash/zsh come
with programmable completion, often automatically installed and
enabled, so if you have bash/zsh and git installed, chances are you
are already done - just go and use it at the command-line.
If you don't have the necessary bits installed, install and enable the
bash_completion package. If you want to upgrade your git completion to
the latest and greatest download the necessary file from `git contrib
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion>`_.
Git-for-windows comes with git-bash for which bash completion is
installed and enabled.
bash/zsh prompt
---------------
For command-line lovers shell prompt can carry a lot of useful
information. To include git information in the prompt use
`git-prompt.sh
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion/git-prompt.sh>`_.
Read the detailed instructions in the file.
Search the Net for "git prompt" to find other prompt variants.
git on server
=============
The simplest way to publish a repository or a group of repositories is
``git daemon``. The daemon provides anonymous access, which is
read-only by default. The repositories are accessible via the git
protocol (git:// URLs). Write access can be enabled, but the protocol
lacks any means of authentication, so it should be enabled only within
a trusted LAN. See ``git help daemon`` for details.
Git over ssh provides authentication and repo-level authorisation as
repositories can be made user- or group-writeable (see parameter
``core.sharedRepository`` in ``git help config``). If that's too
permissive or too restrictive for some project's needs there is a
wrapper `gitolite <http://gitolite.com/gitolite/index.html>`_ that can
be configured to allow access with great granularity; gitolite is
written in Perl and has a lot of documentation.
A web interface to browse repositories can be created using `gitweb
<https://git.kernel.org/cgit/git/git.git/tree/gitweb>`_ or `cgit
<http://git.zx2c4.com/cgit/about/>`_. Both are CGI scripts (written in
Perl and C, respectively). In addition to the web interface, both
provide read-only dumb http access for git (http(s):// URLs).
There are also more advanced web-based development environments that
include the ability to manage users, groups and projects; private,
group-accessible and public repositories; they often include issue
trackers, wiki pages, pull requests and other tools for development
and communication. Among these environments are `Kallithea
<https://kallithea-scm.org/>`_ and `pagure <https://pagure.io/>`_,
both written in Python; pagure was written by Fedora developers
and is being used to develop some Fedora projects. `Gogs
<http://gogs.io/>`_ is written in Go; there is also a fork, `Gitea
<http://gitea.io/>`_.

And last but not least, there is `GitLab <https://about.gitlab.com/>`_.
It's perhaps the most advanced web-based development environment for
git. It is written in Ruby, and its community edition is free and open
source (MIT license).
From Mercurial to git
=====================
There are many tools to convert Mercurial repositories to git. The
most famous are, probably, `hg-git <https://hg-git.github.io/>`_ and
`fast-export <http://repo.or.cz/w/fast-export.git>`_ (many years ago
it was known under the name ``hg2git``).
But a better tool, perhaps the best, is `git-remote-hg
<https://github.com/felipec/git-remote-hg>`_. It provides transparent
bidirectional (pull and push) access to Mercurial repositories from
git. Its author wrote a `comparison of alternatives
<https://github.com/felipec/git/wiki/Comparison-of-git-remote-hg-alternatives>`_
that seems to be mostly objective.
To use git-remote-hg, install or clone it, add it to your PATH (or
copy the script ``git-remote-hg`` to a directory that's already in PATH)
and prepend ``hg::`` to Mercurial URLs. For example::

    $ git clone https://github.com/felipec/git-remote-hg.git
    $ PATH=$PATH:"`pwd`"/git-remote-hg
    $ git clone hg::https://hg.python.org/peps/ PEPs

To work with the repository, just use regular git commands, including
``git fetch/pull/push``.
To start converting your Mercurial habits to git, see the page
`Mercurial for Git users
<https://mercurial.selenic.com/wiki/GitConcepts>`_ on the Mercurial wiki.
The second half of the page has a table that lists corresponding
Mercurial and git commands. It should work perfectly in both
directions.

The Python Developer's Guide also has a chapter, `Mercurial for git
developers <https://docs.python.org/devguide/gitdevs.html>`_, that
documents a few differences between git and hg.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
vim: set fenc=us-ascii tw=70 :

View File

@ -3,7 +3,7 @@ Title: Function Signature Object
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>, Jiwon Seo <seojiwon@gmail.com>,
Yury Selivanov <yselivanov@sprymix.com>, Larry Hastings <larry@hastings.org>
Yury Selivanov <yury@magic.io>, Larry Hastings <larry@hastings.org>
Status: Final
Type: Standards Track
Content-Type: text/x-rst

View File

@ -3,7 +3,7 @@ Title: Core development workflow automation for CPython
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Deferred
Status: Withdrawn
Type: Process
Content-Type: text/x-rst
Requires: 474
@ -23,11 +23,15 @@ experience for other contributors that are reliant on the core team to get
their changes incorporated.
PEP Deferral
============
PEP Withdrawal
==============
This PEP is currently deferred pending acceptance or rejection of the
Kallithea-based forge.python.org proposal in PEP 474.
This PEP has been `withdrawn by the author
<https://mail.python.org/pipermail/core-workflow/2015-October/000227.html>`_
in favour of the GitLab based proposal in PEP 507.
If anyone else would like to take over championing this PEP, contact the
`core-workflow mailing list <https://mail.python.org/mailman/listinfo/core-workflow>`_.
Rationale for changes to the core development workflow

View File

@ -3,7 +3,7 @@ Title: Creating forge.python.org
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Draft
Status: Withdrawn
Type: Process
Content-Type: text/x-rst
Created: 19-Jul-2014
@ -23,6 +23,17 @@ This PEP does *not* propose any changes to the core development workflow
for CPython itself (see PEP 462 in relation to that).
PEP Withdrawal
==============
This PEP has been `withdrawn by the author
<https://mail.python.org/pipermail/core-workflow/2015-October/000227.html>`_
in favour of the GitLab based proposal in PEP 507.
If anyone else would like to take over championing this PEP, contact the
`core-workflow mailing list <https://mail.python.org/mailman/listinfo/core-workflow>`_.
Proposal
========

View File

@ -7,6 +7,7 @@ Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 28-August-2014
Resolution: https://mail.python.org/pipermail/python-dev/2014-October/136676.html
Abstract
========

View File

@ -66,7 +66,7 @@ Features for 3.5
* PEP 479, change StopIteration handling inside generators
* PEP 484, the typing module, a new standard for type annotations
* PEP 485, math.isclose(), a function for testing approximate equality
* PEP 486, making the Widnows Python launcher aware of virtual environments
* PEP 486, making the Windows Python launcher aware of virtual environments
* PEP 488, eliminating .pyo files
* PEP 489, a new and improved mechanism for loading extension modules
* PEP 492, coroutines with async and await syntax

View File

@ -2,7 +2,7 @@ PEP: 492
Title: Coroutines with async and await syntax
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yselivanov@sprymix.com>
Author: Yury Selivanov <yury@magic.io>
Discussions-To: <python-dev@python.org>
Status: Final
Type: Standards Track
@ -125,7 +125,7 @@ Key properties of *coroutines*:
* Internally, two new code object flags were introduced:
- ``CO_COROUTINE`` is used to mark *native coroutines*
(defined with new syntax.)
(defined with new syntax).
- ``CO_ITERABLE_COROUTINE`` is used to make *generator-based
coroutines* compatible with *native coroutines* (set by
@ -139,7 +139,7 @@ Key properties of *coroutines*:
such behavior requires a future import (see PEP 479).
* When a *coroutine* is garbage collected, a ``RuntimeWarning`` is
raised if it was never awaited on (see also `Debugging Features`_.)
raised if it was never awaited on (see also `Debugging Features`_).
* See also `Coroutine objects`_ section.
@ -199,7 +199,7 @@ can be one of:
internally, coroutines are a special kind of generators, every
``await`` is suspended by a ``yield`` somewhere down the chain of
``await`` calls (please refer to PEP 3156 for a detailed
explanation.)
explanation).
To enable this behavior for coroutines, a new magic method called
``__await__`` is added. In asyncio, for instance, to enable *Future*
@ -222,7 +222,7 @@ can be one of:
It is a ``SyntaxError`` to use ``await`` outside of an ``async def``
function (like it is a ``SyntaxError`` to use ``yield`` outside of
``def`` function.)
``def`` function).
It is a ``TypeError`` to pass anything other than an *awaitable* object
to an ``await`` expression.
@ -918,7 +918,7 @@ There is no use of ``await`` names in CPython.
``async`` is mostly used by asyncio. We are addressing this by
renaming ``async()`` function to ``ensure_future()`` (see `asyncio`_
section for details.)
section for details).
Another use of ``async`` keyword is in ``Lib/xml/dom/xmlbuilder.py``,
to define an ``async = False`` attribute for ``DocumentLS`` class.
@ -970,7 +970,7 @@ PEP 3152 by Gregory Ewing proposes a different mechanism for coroutines
2. A new keyword ``cocall`` to call a *cofunction*. Can only be used
inside a *cofunction*. Maps to ``await`` in this proposal (with
some differences, see below.)
some differences, see below).
3. It is not possible to call a *cofunction* without a ``cocall``
keyword.

View File

@ -19,7 +19,7 @@ items.
.. Small features may be added up to the first beta
release. Bugs may be fixed until the final release,
which is planned for September 2015.
which is planned for December 2016.
Release Manager and Crew
@ -31,17 +31,37 @@ Release Manager and Crew
- Documentation: Georg Brandl
3.6 Lifespan
============
3.6 will receive bugfix updates approximately every 3-6 months for
approximately 18 months. After the release of 3.7.0 final, a final
3.6 bugfix update will be released. After that, it is expected that
security updates (source only) will be released until 5 years after
the release of 3.6 final, so until approximately December 2021.
Release Schedule
================
The releases:
3.6.0 schedule
--------------
- 3.6.0 alpha 1: TBD
- 3.6.0 beta 1: TBD
- 3.6.0 candidate 1: TBD
- 3.6.0 final: TBD (late 2016?)
- 3.6 development begins: 2015-05-24
- 3.6.0 alpha 1: 2016-05-15
- 3.6.0 alpha 2: 2016-06-12
- 3.6.0 alpha 3: 2016-07-10
- 3.6.0 alpha 4: 2016-08-07
- 3.6.0 beta 1: 2016-09-07
(Beta 1 is also "feature freeze"--no new features beyond this point.)
(No new features beyond this point.)
- 3.6.0 beta 2: 2016-10-02
- 3.6.0 beta 3: 2016-10-30
- 3.6.0 beta 4: 2016-11-20
- 3.6.0 candidate 1: 2016-12-04
- 3.6.0 candidate 2 (if needed): 2016-12-11
- 3.6.0 final: 2016-12-16
Features for 3.6

BIN
pep-0495-fold-2.png Normal file

Binary file not shown (size after: 13 KiB).

Binary file not shown (size before: 10 KiB).

BIN
pep-0495-gap.png Normal file

Binary file not shown (size after: 16 KiB).

437
pep-0495-gap.svg Normal file
View File

@ -0,0 +1,437 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="150mm"
height="140mm"
viewBox="0 0 531.49606 496.06299"
id="svg14800"
version="1.1"
inkscape:version="0.91 r13725"
sodipodi:docname="pep-0495-gap.svg"
inkscape:export-filename="/Users/a/Work/peps/pep-0495-fold.png"
inkscape:export-xdpi="90"
inkscape:export-ydpi="90">
<defs
id="defs14802">
<marker
inkscape:stockid="DotM"
orient="auto"
refY="0"
refX="0"
id="DotM"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path6980"
d="m -2.5,-1 c 0,2.76 -2.24,5 -5,5 -2.76,0 -5,-2.24 -5,-5 0,-2.76 2.24,-5 5,-5 2.76,0 5,2.24 5,5 z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
transform="matrix(0.4,0,0,0.4,2.96,0.4)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="DiamondSstart"
orient="auto"
refY="0"
refX="0"
id="DiamondSstart"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path7010"
d="M 0,-7.0710768 -7.0710894,0 0,7.0710589 7.0710462,0 0,-7.0710768 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
transform="matrix(0.2,0,0,0.2,1.2,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow2Mend"
orient="auto"
refY="0"
refX="0"
id="Arrow2Mend"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path6943"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:0.625;stroke-linejoin:round;stroke-opacity:1"
d="M 8.7185878,4.0337352 -2.2072895,0.01601326 8.7185884,-4.0017078 c -1.7454984,2.3720609 -1.7354408,5.6174519 -6e-7,8.035443 z"
transform="scale(-0.6,-0.6)"
inkscape:connector-curvature="0" />
</marker>
<pattern
inkscape:collect="always"
xlink:href="#pattern15623"
id="pattern15646"
patternTransform="translate(0,2.8515625e-5)" />
<pattern
inkscape:collect="always"
xlink:href="#Strips1_1"
id="pattern15599"
patternTransform="matrix(10,0,0,10,424.80508,-468.3217)" />
<pattern
inkscape:isstock="true"
inkscape:stockid="Stripes 1:1"
id="Strips1_1"
patternTransform="translate(0,0) scale(10,10)"
height="1"
width="2"
patternUnits="userSpaceOnUse"
inkscape:collect="always">
<rect
id="rect6108"
height="2"
width="1"
y="-0.5"
x="0"
style="fill:black;stroke:none" />
</pattern>
<marker
inkscape:stockid="Arrow1Lstart"
orient="auto"
refY="0"
refX="0"
id="Arrow1Lstart"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path6916"
d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
transform="matrix(0.8,0,0,0.8,10,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Lend"
orient="auto"
refY="0"
refX="0"
id="Arrow1Lend"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path6919"
d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
transform="matrix(-0.8,0,0,-0.8,-10,0)"
inkscape:connector-curvature="0" />
</marker>
<marker
inkscape:stockid="Arrow1Mend"
orient="auto"
refY="0"
refX="0"
id="Arrow1Mend"
style="overflow:visible"
inkscape:isstock="true">
<path
id="path6925"
d="M 0,0 5,-5 -12.5,0 5,5 0,0 Z"
style="fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:#000000;stroke-width:1pt;stroke-opacity:1"
transform="matrix(-0.4,0,0,-0.4,-4,0)"
inkscape:connector-curvature="0" />
</marker>
<pattern
patternUnits="userSpaceOnUse"
width="265.19116"
height="51.983494"
patternTransform="translate(-424.80508,468.3217)"
id="pattern15596">
<path
inkscape:connector-curvature="0"
id="path15588"
d="m 0.376692,25.991752 0,-25.61506 132.218888,0 132.21889,0 0,25.61506 0,25.61505 -132.21889,0 -132.218888,0 0,-25.61505 z"
style="opacity:0.5;fill:url(#pattern15599);fill-opacity:1;stroke:#ffd640;stroke-width:0.75338399;stroke-linecap:round;stroke-miterlimit:4;stroke-dasharray:0.75338398, 0.75338398;stroke-dashoffset:0;stroke-opacity:1" />
</pattern>
<pattern
patternUnits="userSpaceOnUse"
width="213.59843"
height="36.4331"
patternTransform="translate(-0.5,1122.7283)"
id="pattern15623">
<path
inkscape:connector-curvature="0"
id="path15613"
d="m 0.5,0.5 212.59843,0 0,17.7166 -212.59843,0 z"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:10;stroke-opacity:1" />
<path
inkscape:connector-curvature="0"
id="path15615"
d="m 0.5,18.2166 0,17.7165 212.59843,0 0,-17.7165"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:10;stroke-opacity:1" />
<path
inkscape:connector-curvature="0"
id="path15617"
d="m 0.98017929,9.3247 0,-7.9105 105.47376071,0 105.47375,0 0,7.9105 0,7.9105 -105.47375,0 -105.47376071,0 0,-7.9105 z"
style="opacity:0.5;fill:#ffd744;fill-opacity:1;stroke:#ffd744;stroke-width:0.75338399;stroke-linecap:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:7.5338397;stroke-opacity:0.50196078" />
<path
inkscape:connector-curvature="0"
id="path15621"
d="m 0.98017929,27.0292 0,-8.2872 105.47376071,0 105.47375,0 0,8.2872 0,8.2872 -105.47375,0 -105.47376071,0 0,-8.2872 z"
style="opacity:0.5;fill:#326c9c;fill-opacity:1;stroke:#326c9b;stroke-width:0.75338399;stroke-linecap:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:7.5338397;stroke-opacity:0.50196078" />
</pattern>
<pattern
patternUnits="userSpaceOnUse"
width="213.59843"
height="36.433102"
patternTransform="translate(-0.5,1122.7283)"
id="pattern15643">
<rect
id="rect15629"
y="0"
x="0"
height="36.433102"
width="213.59843"
style="fill:url(#pattern15646);stroke:none" />
</pattern>
</defs>
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="2.8284272"
inkscape:cx="215.26543"
inkscape:cy="232.89973"
inkscape:document-units="mm"
inkscape:current-layer="layer2"
showgrid="true"
inkscape:window-width="2556"
inkscape:window-height="1555"
inkscape:window-x="1"
inkscape:window-y="0"
inkscape:window-maximized="0"
objecttolerance="10000"
showborder="false"
fit-margin-top="0"
fit-margin-left="0"
fit-margin-right="0"
fit-margin-bottom="0">
<inkscape:grid
type="xygrid"
id="grid14808"
originx="37.568003"
spacingx="17.716536"
spacingy="17.716536"
empspacing="3"
originy="-71.39131" />
</sodipodi:namedview>
<metadata
id="metadata14805">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(37.568003,-484.90789)">
<path
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.99921262;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Lstart);marker-end:url(#Arrow1Lend)"
d="M 476.5503,945.88825 0,946.42873 0,521.76422"
id="path14810"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
<flowRoot
xml:space="preserve"
id="flowRoot15458"
style="font-style:normal;font-weight:normal;font-size:40px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
id="flowRegion15460"><rect
id="rect15462"
width="159.44882"
height="106.29922"
x="-425.19687"
y="946.06299" /></flowRegion><flowPara
id="flowPara15464" /></flowRoot> <flowRoot
xml:space="preserve"
id="flowRoot15466"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:40px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';text-align:center;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
id="flowRegion15468"><rect
id="rect15470"
width="159.44882"
height="88.58268"
x="212.59843"
y="1070.0787"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:40px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';text-align:center;writing-mode:lr-tb;text-anchor:middle" /></flowRegion><flowPara
id="flowPara15474" /></flowRoot> <flowRoot
xml:space="preserve"
id="flowRoot15480"
style="font-style:normal;font-weight:normal;font-size:40px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"><flowRegion
id="flowRegion15482"><rect
id="rect15484"
width="70.866142"
height="53.149609"
x="212.59843"
y="1105.5118" /></flowRegion><flowPara
id="flowPara15486" /></flowRoot> <flowRoot
xml:space="preserve"
id="flowRoot15488"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.5px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
transform="translate(270.90867,-112.71393)"><flowRegion
id="flowRegion15490"><rect
id="rect15492"
width="265.74805"
height="88.58268"
x="159.44882"
y="1070.0787"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.5px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';text-align:start;writing-mode:lr-tb;text-anchor:start" /></flowRegion><flowPara
id="flowPara15496">UTC</flowPara></flowRoot> <text
xml:space="preserve"
style="font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:22.5px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Normal';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
x="-570.61304"
y="-20.473276"
id="text15498"
sodipodi:linespacing="125%"
transform="matrix(0,-1,1,0,0,0)"><tspan
sodipodi:role="line"
id="tspan15500"
x="-570.61304"
y="-20.473276">local</tspan></text>
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:2.12598419;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="M 52.152923,893.91006 266.74473,679.31828"
id="path15502"
inkscape:connector-curvature="0" />
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:2.12598419;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="M 265.74804,733.46456 425.19686,574.01574"
id="path15504"
inkscape:connector-curvature="0" />
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1, 12;stroke-dashoffset:0;stroke-opacity:1"
d="m 265.74804,680.31496 0,53.1496 z"
id="path15678"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
<text
xml:space="preserve"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:20px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Italic';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
x="-17.04035"
y="703.841"
id="text16422"
sodipodi:linespacing="125%"><tspan
sodipodi:role="line"
id="tspan16424"
x="-17.04035"
y="703.841">t</tspan></text>
<text
xml:space="preserve"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:20px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Italic';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
x="240.81497"
y="962.27954"
id="text16438"
sodipodi:linespacing="125%"><tspan
sodipodi:role="line"
id="tspan16440"
x="240.81497"
y="962.27954">u<tspan
style="font-size:64.99999762%;baseline-shift:sub"
id="tspan16442">0</tspan></tspan></text>
<text
xml:space="preserve"
style="font-style:italic;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:20px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Italic';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
x="294.96457"
y="963.77954"
id="text16444"
sodipodi:linespacing="125%"><tspan
sodipodi:role="line"
id="tspan16446"
x="294.96457"
y="963.77954">u<tspan
style="font-size:64.99999762%;baseline-shift:sub"
id="tspan16448">1</tspan></tspan></text>
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:7.08661413;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="m 212.59843,941.81299 53.14961,0"
id="path16450"
inkscape:connector-curvature="0" />
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:7.08661413;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="m 4.2499999,733.46456 0,-53.1496"
id="path16452"
inkscape:connector-curvature="0" />
<path
style="fill:none;fill-rule:evenodd;stroke:#ffd847;stroke-width:7.08661413;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="m 265.74804,941.81299 53.14961,0"
id="path16454"
inkscape:connector-curvature="0" />
<text
xml:space="preserve"
style="font-style:italic;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:20px;line-height:125%;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold Italic';text-align:start;letter-spacing:0px;word-spacing:0px;writing-mode:lr-tb;text-anchor:start;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
x="343.96481"
y="712.6087"
id="text16458"
sodipodi:linespacing="125%"><tspan
sodipodi:role="line"
id="tspan16460"
x="343.96481"
y="712.6087">Fold</tspan></text>
<path
style="fill:none;fill-rule:evenodd;stroke:#336d9c;stroke-width:2.12598425;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;stroke-miterlimit:4;stroke-dasharray:none"
d="m 265.74804,680.31492 0,53.14961"
id="path16481"
inkscape:connector-curvature="0" />
<path
style="fill:none;fill-rule:evenodd;stroke:#ffd847;stroke-width:7.08661413;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
d="m 11.716536,733.46456 0,-53.1496"
id="path16456"
inkscape:connector-curvature="0" />
</g>
<g
inkscape:groupmode="layer"
id="layer2"
inkscape:label="Layer 2">
<path
transform="translate(37.568003,-484.90789)"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:6, 2;stroke-dashoffset:0;stroke-opacity:1"
d="m 0,698.03149 248.0315,0 0,247.85676"
id="path15680"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
<path
transform="translate(37.568003,-484.90789)"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:4, 2;stroke-dashoffset:0;stroke-opacity:1"
d="m 248.0315,698.03149 53.14961,0 0,247.85676"
id="path15682"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
<path
transform="translate(37.568003,-484.90789)"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.99921262;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:0.9992126, 11.99055118000000064;stroke-dashoffset:0;stroke-opacity:1"
d="m 0,680.31496 318.89765,0 0,265.57329"
id="path15566"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
<path
transform="translate(37.568003,-484.90789)"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:0.99921262;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:0.99921262, 7.99370097999999984;stroke-dashoffset:0;stroke-opacity:1"
d="m 212.59843,733.46456 0,212.42369"
id="path15676"
inkscape:connector-curvature="0"
sodipodi:nodetypes="cc" />
<path
transform="translate(37.568003,-484.90789)"
style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:1, 12;stroke-dashoffset:0;stroke-opacity:1"
d="m 0,733.46456 265.74804,0 0,211.88321"
id="path15552"
inkscape:connector-curvature="0"
sodipodi:nodetypes="ccc" />
</g>
</svg>

After

Width:  |  Height:  |  Size: 20 KiB

View File

@ -4,34 +4,29 @@ Version: $Revision$
Last-Modified: $Date$
Author: Alexander Belopolsky <alexander.belopolsky@gmail.com>, Tim Peters <tim.peters@gmail.com>
Discussions-To: Datetime-SIG <datetime-sig@python.org>
Status: Draft
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Aug-2015
Python-Version: 3.6
Resolution: https://mail.python.org/pipermail/datetime-sig/2015-September/000900.html
Abstract
========
This PEP adds a new attribute ``fold`` to the instances of
This PEP adds a new attribute ``fold`` to instances of the
``datetime.time`` and ``datetime.datetime`` classes that can be used
to differentiate between two moments in time for which local times are
the same. The allowed values for the `fold` attribute will be 0 and 1
the same. The allowed values for the ``fold`` attribute will be 0 and 1
with 0 corresponding to the earlier and 1 to the later of the two
possible readings of an ambiguous local time.
.. sidebar:: US public service advertisement
.. image:: pep-0495-daylightsavings.png
:align: center
:width: 95%
Rationale
=========
In the most world locations there have been and will be times when
In most world locations, there have been and will be times when
local clocks are moved back. [#]_ In those times, intervals are
introduced in which local clocks show the same time twice in the same
day. In these situations, the information displayed on a local clock
@ -40,9 +35,14 @@ a particular moment in time. The proposed solution is to add an
attribute to the ``datetime`` instances taking values of 0 and 1 that
will enumerate the two ambiguous times.
.. image:: pep-0495-daylightsavings.png
:align: center
:width: 30%
.. [#] People who live in locations observing Daylight Saving
Time (DST) move their clocks back (usually by one hour) every fall.
It is less common, but occasionally clocks can be moved back for
other reasons. For example, Ukraine skipped the spring-forward
transition in March 1990 and instead, moved their clocks back on
@ -76,11 +76,11 @@ Proposal
The "fold" attribute
--------------------
We propose adding an attribute called ``fold`` to the instances
of ``datetime.time`` and ``datetime.datetime`` classes. This attribute
should have the value 0 for all instances except those that
represent the second (chronologically) moment in time in an ambiguous
case. For those instances, the value will be 1. [#]_
We propose adding an attribute called ``fold`` to instances of the
``datetime.time`` and ``datetime.datetime`` classes. This attribute
should have the value 0 for all instances except those that represent
the second (chronologically) moment in time in an ambiguous case. For
those instances, the value will be 1. [#]_
.. [#] An instance that has ``fold=1`` in a non-ambiguous case is
said to represent an invalid time (or is invalid for short), but
@ -93,10 +93,6 @@ case. For those instances, the value will be 1. [#]_
this PEP specifies how various functions should behave when given an
invalid instance.
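
The attribute proposed above can be tried out with a short sketch. It
assumes Python 3.6 or later, where ``fold`` is available; the date and
time used are arbitrary illustrative values, not taken from this PEP.

```python
from datetime import datetime, time

# An arbitrary wall-clock reading, chosen only for illustration.
first = datetime(2014, 11, 2, 1, 30)            # fold defaults to 0
second = datetime(2014, 11, 2, 1, 30, fold=1)   # the later of two readings

# datetime.time instances carry the same attribute.
later_time = time(1, 30, fold=1)

print(first.fold, second.fold, later_time.fold)

# Naive instances that differ only in fold still compare equal.
print(first == second)
```

Nothing here checks whether the reading is actually ambiguous; naive
instances accept ``fold=1`` regardless.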
.. image:: pep-0495-fold.png
:align: center
:width: 60%
Affected APIs
-------------
@ -121,15 +117,23 @@ Methods
The ``replace()`` methods of the ``datetime.time`` and
``datetime.datetime`` classes will get a new keyword-only argument
called ``fold``. It will
behave similarly to the other ``replace()`` arguments: if the ``fold``
argument is specified and given a value 0 or 1, the new instance
returned by ``replace()`` will have its ``fold`` attribute set
to that value. In CPython, any non-integer value of ``fold`` will
raise a ``TypeError``, but other implementations may allow the value
``None`` to behave the same as when ``fold`` is not given. If the
``fold`` argument is not specified, the original value of the ``fold``
attribute is copied to the result.
called ``fold``. It will behave similarly to the other ``replace()``
arguments: if the ``fold`` argument is specified and given a value 0
or 1, the new instance returned by ``replace()`` will have its
``fold`` attribute set to that value. In CPython, any non-integer
value of ``fold`` will raise a ``TypeError``, but other
implementations may allow the value ``None`` to behave the same as
when ``fold`` is not given. [#]_ (This is
a nod to the existing difference in treatment of ``None`` arguments
in other positions of this method across Python implementations;
it is not intended to leave the door open for future alternative
interpretation of ``fold=None``.) If the ``fold`` argument is not
specified, the original value of the ``fold`` attribute is copied to
the result.
.. [#] PyPy and the pure Python implementation distributed with CPython
already allow ``None`` to mean "no change to existing
attribute" for all other attributes in ``replace()``.
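As a rough illustration of the ``replace()`` behavior described above,
the following sketch assumes a post-PEP ``datetime`` (Python 3.6 or
later). The variable names are ours and are not part of the proposal.

```python
from datetime import datetime

# 01:30 on the night of a US fall-back transition; fold defaults to 0.
first = datetime(2014, 11, 2, 1, 30)

# An explicitly given fold is stored on the new instance.
second = first.replace(fold=1)

# When fold is not given, the existing value is copied to the result.
shifted = second.replace(minute=45)

print(first.fold, second.fold, shifted.fold)
```

Note that ``first`` itself is left unchanged, since datetime instances
are immutable.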
C-API
.....
@ -137,14 +141,14 @@ C-API
Access macros will be defined to extract the value of ``fold`` from
``PyDateTime_DateTime`` and ``PyDateTime_Time`` objects.
.. code::
.. code::
int PyDateTime_GET_FOLD(PyDateTime_DateTime *o)
Return the value of ``fold`` as a C ``int``.
.. code::
.. code::
int PyDateTime_TIME_GET_FOLD(PyDateTime_Time *o)
Return the value of ``fold`` as a C ``int``.
@ -155,14 +159,17 @@ instance:
.. code::
PyObject* PyDateTime_FromDateAndTimeAndFold(int year, int month, int day, int hour, int minute, int second, int usecond, int fold)
PyObject* PyDateTime_FromDateAndTimeAndFold(
int year, int month, int day, int hour, int minute,
int second, int usecond, int fold)
Return a ``datetime.datetime`` object with the specified year, month,
day, hour, minute, second, microsecond and fold.
.. code::
PyObject* PyTime_FromTimeAndFold(int hour, int minute, int second, int usecond, int fold)
PyObject* PyTime_FromTimeAndFold(
int hour, int minute, int second, int usecond, int fold)
Return a ``datetime.time`` object with the specified hour, minute,
second, microsecond and fold.
@ -174,18 +181,23 @@ Affected Behaviors
What time is it?
................
The ``datetime.now()`` method called with no arguments, will set
The ``datetime.now()`` method called without arguments will set
``fold=1`` when returning the second of the two ambiguous times in a
system local time fold. When called with a ``tzinfo`` argument, the
value of the ``fold`` will be determined by the ``tzinfo.fromutc()``
implementation. If an instance of the ``datetime.timezone`` class
(*e.g.* ``datetime.timezone.utc``) is passed as ``tzinfo``, the
implementation. When an instance of the ``datetime.timezone`` class
(the stdlib's fixed-offset ``tzinfo`` subclass,
*e.g.* ``datetime.timezone.utc``) is passed as ``tzinfo``, the
returned datetime instance will always have ``fold=0``.
The ``datetime.utcnow()`` method is unaffected.
Conversion from naive to aware
..............................
A new feature is proposed to facilitate conversion from naive datetime
instances to aware.
The ``astimezone()`` method will now work for naive ``self``. The
system local timezone will be assumed in this case and the ``fold``
flag will be used to determine which local timezone is in effect
@ -199,6 +211,11 @@ For example, on a system set to US/Eastern timezone::
>>> dt.replace(fold=1).astimezone().strftime('%D %T %Z%z')
'11/02/14 01:30:00 EST-0500'
An implication is that ``datetime.now(tz)`` is fully equivalent to
``datetime.now().astimezone(tz)`` (assuming ``tz`` is an instance of a
post-PEP ``tzinfo`` implementation, i.e. one that correctly handles
and sets ``fold``).
Conversion from POSIX seconds from EPOCH
........................................
@ -227,8 +244,10 @@ time, there are two values ``s0`` and ``s1`` such that::
datetime.fromtimestamp(s0) == datetime.fromtimestamp(s1) == dt
(This is because ``==`` disregards the value of fold -- see below.)
In this case, ``dt.timestamp()`` will return the smaller of ``s0``
and ``s1`` values if ``dt.fold == True`` and the larger otherwise.
and ``s1`` values if ``dt.fold == 0`` and the larger otherwise.
For example, on a system set to US/Eastern timezone::
@ -238,7 +257,6 @@ For example, on a system set to US/Eastern timezone::
>>> datetime(2014, 11, 2, 1, 30, fold=1).timestamp()
1414909800.0
When a ``datetime.datetime`` instance ``dt`` represents a missing
time, there is no value ``s`` for which::
@ -254,6 +272,8 @@ is always the same as the offset right after the gap.
The value returned by ``dt.timestamp()`` given a missing
``dt`` will be the greater of the two "nice to know" values
if ``dt.fold == 0`` and the smaller otherwise.
(This is not a typo -- it's intentionally backwards from the rule for
ambiguous times.)
For example, on a system set to US/Eastern timezone::
@ -270,13 +290,14 @@ Users of pre-PEP implementations of ``tzinfo`` will not see any
changes in the behavior of their aware datetime instances. Two such
instances that differ only by the value of the ``fold`` attribute will
not be distinguishable by any means other than an explicit access to
the ``fold`` value.
the ``fold`` value. (This is because these pre-PEP implementations
are not using the ``fold`` attribute.)
On the other hand, if object's ``tzinfo`` is set to a fold-aware
implementation, then the value of ``fold`` will affect the result of
several methods but only if the corresponding time is in a fold or in
a gap: ``utcoffset()``, ``dst()``, ``tzname()``, ``astimezone()``,
``strftime()`` (if "%Z" or "%z" directive is used in the format
On the other hand, if an object's ``tzinfo`` is set to a fold-aware
implementation, then in a fold or gap the value of ``fold`` will
affect the result of several methods:
``utcoffset()``, ``dst()``, ``tzname()``, ``astimezone()``,
``strftime()`` (if the "%Z" or "%z" directive is used in the format
specification), ``isoformat()``, and ``timetuple()``.
@ -293,16 +314,21 @@ The ``datetime.datetime.time()`` method will copy the value of the
Pickles
.......
The value of the fold attribute will only be saved in pickles created
with protocol version 4 (introduced in Python 3.4) or greater.
Pickle sizes for the ``datetime.datetime`` and ``datetime.time``
objects will not change. The ``fold`` value will be encoded in the
first bit of the 5th byte of the ``datetime.datetime`` pickle payload
or the 2nd byte of the datetime.time. In the `current implementation`_
these bytes are used to store minute value (0-59) and the first bit is
always 0. (This change only affects pickle format. In the C
implementation, the ``fold`` attribute will get a full byte to store its
value.)
first bit of the 3rd byte of the ``datetime.datetime``
pickle payload; and in the first bit of the 1st byte of the
``datetime.time`` payload. In the `current implementation`_
these bytes are used to store the month (1-12) and hour (0-23) values
and the first bit is always 0. We picked these bytes because they are
the only bytes that are checked by the current unpickle code. Thus
loading post-PEP ``fold=1`` pickles in a pre-PEP Python will result in
an exception rather than an instance with out of range components.
.. _current implementation: https://hg.python.org/cpython/file/d3b20bff9c5d/Include/datetime.h#l17
.. _current implementation: https://hg.python.org/cpython/file/v3.5.0/Include/datetime.h#l10
Implementations of tzinfo in the Standard Library
@ -312,13 +338,16 @@ No new implementations of ``datetime.tzinfo`` abstract class are
proposed in this PEP. The existing (fixed offset) timezones do
not introduce ambiguous local times and their ``utcoffset()``
implementation will return the same constant value as they do now
regardless of the value of ``fold``.
regardless of the value of ``fold``.
The basic implementation of ``fromutc()`` in the abstract
``datetime.tzinfo`` class will not change. It is currently not
used anywhere in the stdlib because the only included ``tzinfo``
implementation (the ``datetime.timzeone`` class implementing fixed
offset timezones) override ``fromutc()``.
``datetime.tzinfo`` class will not change. It is currently not used
anywhere in the stdlib because the only included ``tzinfo``
implementation (the ``datetime.timezone`` class implementing fixed
offset timezones) overrides ``fromutc()``. Keeping the default
implementation unchanged has the benefit that pre-PEP 3rd party
implementations that inherit the default ``fromutc()`` are not
accidentally affected.
Guidelines for New tzinfo Implementations
@ -337,16 +366,102 @@ methods should ignore the value of ``fold`` unless they are called on
the ambiguous or missing times.
In the DST Fold
---------------
In the Fold
-----------
New subclasses should override the base-class ``fromutc()`` method and
implement it so that in all cases where two UTC times ``u1`` and
``u2`` (``u1`` <``u2``) correspond to the same local time
``fromutc(u1)`` will return an instance with ``fold=0`` and
``fromutc(u2)`` will return an instance with ``fold=1``. In all
implement it so that in all cases where two different UTC times ``u0`` and
``u1`` (``u0`` < ``u1``) correspond to the same local time ``t``,
``fromutc(u0)`` will return an instance with ``fold=0`` and
``fromutc(u1)`` will return an instance with ``fold=1``. In all
other cases the returned instance should have ``fold=0``.
The ``utcoffset()``, ``tzname()`` and ``dst()`` methods should use the
value of the fold attribute to determine whether an otherwise
ambiguous time ``t`` corresponds to the time before or after the
transition. By definition, ``utcoffset()`` is greater before and
smaller after any transition that creates a fold. The values returned
by ``tzname()`` and ``dst()`` may or may not depend on the value of
the ``fold`` attribute depending on the kind of the transition.
.. image:: pep-0495-fold-2.png
:align: center
:width: 60%
The sketch above illustrates the relationship between the UTC and
local time around a fall-back transition. The zig-zag line is a graph
of the function implemented by ``fromutc()``. Two intervals on the
UTC axis adjacent to the transition point and having the size of the
time shift at the transition are mapped to the same interval on the
local axis. New implementations of ``fromutc()`` method should set
the fold attribute to 1 when ``self`` is in the region marked in
yellow on the UTC axis. (All intervals should be treated as closed on
the left and open on the right.)
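The guidelines above can be sketched with a deliberately tiny
``tzinfo`` subclass. This is only an illustration under stated
assumptions: the class name ``FallBackOnly`` and the module-level
constants are hypothetical, and the class models a single fall-back
transition (US/Eastern, 2 November 2014) rather than a full timezone
database.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EDT = timedelta(hours=-4)                 # offset before the transition
EST = timedelta(hours=-5)                 # offset after the transition
TRANSITION_UTC = datetime(2014, 11, 2, 6)  # naive UTC moment of the change
FOLD_START = datetime(2014, 11, 2, 1)      # ambiguous local interval,
FOLD_END = datetime(2014, 11, 2, 2)        # closed on the left, open on the right


class FallBackOnly(tzinfo):
    """Hypothetical fold-aware tzinfo knowing one fall-back transition."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < FOLD_START:
            return EDT
        if wall >= FOLD_END:
            return EST
        # Ambiguous time: fold picks the side of the transition.
        return EST if dt.fold else EDT

    def dst(self, dt):
        return timedelta(hours=1) if self.utcoffset(dt) == EDT else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.utcoffset(dt) == EDT else "EST"

    def fromutc(self, dt):
        # dt carries UTC field values with tzinfo set to self.
        u = dt.replace(tzinfo=None)
        if u < TRANSITION_UTC:
            return (u + EDT).replace(tzinfo=self)
        local = (u + EST).replace(tzinfo=self)
        # The first hour after the transition repeats local times: fold=1.
        in_fold = u < TRANSITION_UTC + (EDT - EST)
        return local.replace(fold=1) if in_fold else local


tz = FallBackOnly()
t0 = datetime(2014, 11, 2, 5, 30, tzinfo=timezone.utc).astimezone(tz)
t1 = datetime(2014, 11, 2, 6, 30, tzinfo=timezone.utc).astimezone(tz)
print(t0.time(), t0.fold, t0.tzname())
print(t1.time(), t1.fold, t1.tzname())
```

Both UTC times map to the local time 01:30, and only the second one
comes back with ``fold=1``, which is the property ``fromutc()`` is
asked to provide.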
Mind the Gap
------------
The ``fromutc()`` method should never produce a time in the gap.
If the ``utcoffset()``, ``tzname()`` or ``dst()`` method is called on a
local time that falls in a gap, the rules in effect before the
transition should be used if ``fold=0``. Otherwise, the rules in
effect after the transition should be used.
.. image:: pep-0495-gap.png
:align: center
:width: 60%
The sketch above illustrates the relationship between the UTC and
local time around a spring-forward transition. At the transition, the
local clock is advanced skipping the times in the gap. For the
purposes of determining the values of ``utcoffset()``, ``tzname()``
and ``dst()``, the line before the transition is extended forward to
find the UTC time corresponding to the time in the gap with ``fold=0``
and for instances with ``fold=1``, the line after the transition is
extended back.
Summary of Rules at a Transition
--------------------------------
On ambiguous/missing times ``utcoffset()`` should return values
according to the following table:
+-----------------+----------------+-----------------------------+
| | fold=0 | fold=1 |
+=================+================+=============================+
| Fold | oldoff | newoff = oldoff - delta |
+-----------------+----------------+-----------------------------+
| Gap | oldoff | newoff = oldoff + delta |
+-----------------+----------------+-----------------------------+
where ``oldoff`` (``newoff``) is the UTC offset before (after) the
transition and ``delta`` is the absolute size of the fold or the gap.
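The table can also be read as a small function. The sketch below is
only a restatement of the table; the function name
``transition_utcoffset`` and its ``kind`` argument are hypothetical
and do not appear in the proposal.

```python
from datetime import timedelta


def transition_utcoffset(kind, oldoff, delta, fold):
    """Return the UTC offset for an ambiguous ("fold") or missing ("gap") time.

    oldoff is the offset in effect before the transition and delta is
    the absolute size of the fold or the gap.
    """
    if kind not in ("fold", "gap"):
        raise ValueError("kind must be 'fold' or 'gap'")
    if fold == 0:
        # fold=0 always uses the rules in effect before the transition.
        return oldoff
    # fold=1 uses the rules in effect after the transition.
    return oldoff - delta if kind == "fold" else oldoff + delta


hour = timedelta(hours=1)
# End of DST in US/Eastern: EDT (-4h) before, EST (-5h) after.
print(transition_utcoffset("fold", timedelta(hours=-4), hour, fold=1))
# Start of DST in US/Eastern: EST (-5h) before, EDT (-4h) after.
print(transition_utcoffset("gap", timedelta(hours=-5), hour, fold=1))
```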
Note that the interpretation of the fold attribute is consistent in
the fold and gap cases. In both cases, ``fold=0`` (``fold=1``) means
use ``fromutc()`` line before (after) the transition to find the UTC
time. Only in the "Fold" case, the UTC times ``u0`` and ``u1`` are
"real" solutions for the equation ``fromutc(u) == t``, while in the
"Gap" case they are "imaginary" solutions.
The DST Transitions
-------------------
On a missing time introduced at the start of DST, the values returned
by ``utcoffset()`` and ``dst()`` methods should be as follows:
+-----------------+----------------+------------------+
| | fold=0 | fold=1 |
+=================+================+==================+
| utcoffset() | stdoff | stdoff + dstoff |
+-----------------+----------------+------------------+
| dst() | zero | dstoff |
+-----------------+----------------+------------------+
On an ambiguous time introduced at the end of DST, the values returned
by ``utcoffset()`` and ``dst()`` methods should be as follows:
@ -363,61 +478,101 @@ DST correction (typically ``dstoff = timedelta(hours=1)``) and ``zero
= timedelta(0)``.
Mind the DST Gap
----------------
Temporal Arithmetic and Comparison Operators
============================================
On a missing time introduced at the start of DST, the values returned
by ``utcoffset()`` and ``dst()`` methods should be as follows
.. epigraph::
+-----------------+----------------+------------------+
| | fold=0 | fold=1 |
+=================+================+==================+
| utcoffset() | stdoff | stdoff + dstoff |
+-----------------+----------------+------------------+
| dst() | zero | dstoff |
+-----------------+----------------+------------------+
| In *mathematicks* he was greater
| Than Tycho Brahe, or Erra Pater:
| For he, by geometric scale,
| Could take the size of pots of ale;
| Resolve, by sines and tangents straight,
| If bread or butter wanted weight,
| And wisely tell what hour o' th' day
| The clock does strike by algebra.
-- "Hudibras" by Samuel Butler
Non-DST Folds and Gaps
----------------------
On ambiguous/missing times introduced by the change in the standard time
offset, the ``dst()`` method should return the same value regardless of
the value of ``fold`` and the ``utcoffset()`` should return values
according to the following table:
+-----------------+----------------+-----------------------------+
| | fold=0 | fold=1 |
+=================+================+=============================+
| ambiguous | oldoff | newoff = oldoff - delta |
+-----------------+----------------+-----------------------------+
| missing | oldoff | newoff = oldoff + delta |
+-----------------+----------------+-----------------------------+
where ``delta`` is the size of the fold or the gap.
Temporal Arithmetic
===================
The value of "fold" will be ignored in all operations except those
that involve conversion between timezones. [#]_ As a consequence,
The value of the ``fold`` attribute will be ignored in all operations
with naive datetime instances. As a consequence, naive
``datetime.datetime`` or ``datetime.time`` instances that differ only
by the value of ``fold`` will compare as equal. Applications that
need to differentiate between such instances should check the value of
``fold`` or convert them to a timezone that does not have ambiguous
times.
``fold`` explicitly or convert those instances to a timezone that does
not have ambiguous times (such as UTC).
The result of addition (subtraction) of a timedelta to (from) a
datetime will always have ``fold`` set to 0 even if the
The value of ``fold`` will also be ignored whenever a timedelta is
added to or subtracted from a datetime instance which may be either
aware or naive. The result of addition (subtraction) of a timedelta
to (from) a datetime will always have ``fold`` set to 0 even if the
original datetime instance had ``fold=1``.
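A short sketch of these rules for naive instances, assuming a post-PEP
``datetime`` (Python 3.6 or later); the variable names are chosen here
for illustration only.

```python
from datetime import datetime, timedelta

early = datetime(2014, 11, 2, 1, 30, fold=0)
late = datetime(2014, 11, 2, 1, 30, fold=1)

# Naive comparison ignores fold, so the two instances compare equal.
same = early == late

# Adding a timedelta always yields fold=0, even when starting from fold=1.
moved = late + timedelta(minutes=10)

# Naive subtraction ignores fold as well.
diff = late - early

print(same, moved.fold, diff)
```

An application that needs to tell ``early`` and ``late`` apart has to
inspect ``fold`` explicitly, as noted above.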
.. [#] Computing a difference between two aware datetime instances
with different values of ``tzinfo`` involves an implicit timezone
conversion. In this case, the result may depend on the value of
the ``fold`` attribute in either of the instances, but only if the
instance has ``tzinfo`` that accounts for the value of ``fold``
in its ``utcoffset()`` method.
No changes are proposed to the way the difference ``t - s`` is
computed for datetime instances ``t`` and ``s``. If both instances
are naive or ``t.tzinfo`` is the same instance as ``s.tzinfo``
(``t.tzinfo is s.tzinfo`` evaluates to ``True``) then ``t - s`` is a
timedelta ``d`` such that ``s + d == t``. As explained in the
previous paragraph, timedelta addition ignores both ``fold`` and
``tzinfo`` attributes and so does intra-zone or naive datetime
subtraction.
Naive and intra-zone comparisons will ignore the value of ``fold`` and
return the same results as they do now. (This is the only way to
preserve backward compatibility. If you need an aware intra-zone
comparison that uses the fold, convert both sides to UTC first.)
The inter-zone subtraction will be defined as it is now: ``t - s`` is
computed as ``(t - t.utcoffset()) - (s -
s.utcoffset()).replace(tzinfo=t.tzinfo)``, but the result will
depend on the values of ``t.fold`` and ``s.fold`` when either
``t.tzinfo`` or ``s.tzinfo`` is post-PEP. [#]_
.. [#] Note that the new rules may result in a paradoxical situation
when ``s == t`` but ``s - u != t - u``. Such paradoxes are
not really new and are inherent in the overloading of the minus
operator differently for intra- and inter-zone operations. For
example, one can easily construct datetime instances ``t`` and ``s``
with some variable offset ``tzinfo`` and a datetime ``u`` with
``tzinfo=timezone.utc`` such that ``(t - u) - (s - u) != t - s``.
The explanation for this paradox is that the minuses inside the
parentheses and the two other minuses are really three different
operations: inter-zone datetime subtraction, timedelta subtraction,
and intra-zone datetime subtraction, which each have the mathematical
properties of subtraction separately, but not when combined in a
single expression.
Aware datetime Equality Comparison
----------------------------------
The aware datetime comparison operators will work the same as they do
now, with results indirectly affected by the value of ``fold`` whenever
the ``utcoffset()`` value of one of the operands depends on it, with one
exception. Whenever one or both of the operands in an inter-zone comparison is
such that its ``utcoffset()`` depends on the value of its ``fold``
attribute, the result is ``False``. [#]_
.. [#] This exception is designed to preserve the hash and equivalence
invariants in the face of paradoxes of inter-zone arithmetic.
Formally, ``t == s`` when ``t.tzinfo is s.tzinfo`` evaluates to
``False`` can be defined as follows. Let ``toutc(t, fold)`` be a
function that takes an aware datetime instance ``t`` and returns a
naive instance representing the same time in UTC assuming a given
value of ``fold``:
.. code::
def toutc(t, fold):
u = t - t.replace(fold=fold).utcoffset()
return u.replace(tzinfo=None)
Then ``t == s`` is equivalent to
.. code::
toutc(t, fold=0) == toutc(t, fold=1) == toutc(s, fold=0) == toutc(s, fold=1)
Backward and Forward Compatibility
@ -467,7 +622,7 @@ A non-technical answer
between fold=0 and fold=1 when I set it for tomorrow 01:30 AM.
What should I do?
* Alice: I've never heard of a Py-O-Clock, but I guess fold=0 is
the first 01:30 AM and fold=1 is the second.
the first 01:30 AM and fold=1 is the second.
A technical reason
@ -538,13 +693,12 @@ The following alternative names have also been considered:
**repeated**
Did not receive any support on the mailing list.
**ltdf**
(Local Time Disambiguation Flag) - short and no-one will attempt
to guess what it means without reading the docs. (Feel free to
use it in discussions with the meaning ltdf=False is the
earlier if you don't want to endorse any of the alternatives
above.)
to guess what it means without reading the docs. (This abbreviation
was used in PEP discussions with the meaning ``ltdf=False`` is the
earlier by those who didn't want to endorse any of the alternatives.)
.. _original: https://mail.python.org/pipermail/python-dev/2015-April/139099.html
.. _independently proposed: https://mail.python.org/pipermail/datetime-sig/2015-August/000479.html
@ -585,7 +739,7 @@ such program because ``astimezone()`` does not currently work with
naive datetimes.
This leaves us with only one situation where an existing program can
start producing diferent results after the implementation of this PEP:
start producing different results after the implementation of this PEP:
when a ``datetime.timestamp()`` method is called on a naive datetime
instance that happens to be in the fold or the gap. In the current
implementation, the result is undefined. Depending on the system
@ -638,13 +792,13 @@ hemisphere (where DST is in effect in June) one can get
Note that 12:00 was interpreted as 13:00 by ``mktime``. With the
``datetime.timestamp``, ``datetime.fromtimestamp``, it is currently
guaranteed that
guaranteed that
.. code::
>>> t = datetime.datetime(2015, 6, 1, 12).timestamp()
>>> datetime.datetime.fromtimestamp(t)
datetime.datetime(2015, 6, 1, 12, 0)
datetime.datetime(2015, 6, 1, 12, 0)
This PEP extends the same guarantee to both values of ``fold``:
@ -652,13 +806,13 @@ This PEP extends the same guarantee to both values of ``fold``:
>>> t = datetime.datetime(2015, 6, 1, 12, fold=0).timestamp()
>>> datetime.datetime.fromtimestamp(t)
datetime.datetime(2015, 6, 1, 12, 0)
datetime.datetime(2015, 6, 1, 12, 0)
.. code::
>>> t = datetime.datetime(2015, 6, 1, 12, fold=1).timestamp()
>>> datetime.datetime.fromtimestamp(t)
datetime.datetime(2015, 6, 1, 12, 0)
datetime.datetime(2015, 6, 1, 12, 0)
Thus one of the suggested uses for ``fold=-1`` -- to match the legacy
behavior -- is not needed. Either choice of ``fold`` will match the
@ -708,7 +862,7 @@ implement any desired behavior in a few lines of code.
Implementation
==============
* Github fork: https://github.com/abalkin/cpython
* Github fork: https://github.com/abalkin/cpython/tree/issue24773-s3
* Tracker issue: http://bugs.python.org/issue24773


@ -8,7 +8,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 01-Aug-2015
Python-Version: 3.6
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015
Resolution: https://mail.python.org/pipermail/python-dev/2015-September/141526.html
Abstract
@ -173,8 +173,7 @@ In source code, f-strings are string literals that are prefixed by the
letter 'f' or 'F'. Everywhere this PEP uses 'f', 'F' may also be
used. 'f' may be combined with 'r', in either order, to produce raw
f-string literals. 'f' may not be combined with 'b': this PEP does not
propose to add binary f-strings. 'f' may also be combined with 'u', in
either order, although adding 'u' has no effect.
propose to add binary f-strings. 'f' may not be combined with 'u'.
When tokenizing source files, f-strings use the same rules as normal
strings, raw strings, binary strings, and triple quoted strings. That
@ -198,9 +197,14 @@ expressions. Expressions appear within curly braces ``'{'`` and
expressions are evaluated, formatted with the existing __format__
protocol, then the results are concatenated together with the string
literals. While scanning the string for expressions, any doubled
braces ``'{{'`` or ``'}}'`` are replaced by the corresponding single
brace. Doubled opening braces do not signify the start of an
expression.
braces ``'{{'`` or ``'}}'`` inside literal portions of an f-string are
replaced by the corresponding single brace. Doubled opening braces do
not signify the start of an expression.
Note that ``__format__()`` is not called directly on each value. The
actual code uses the equivalent of ``type(value).__format__(value,
format_spec)``, or ``format(value, format_spec)``. See the
documentation of the builtin ``format()`` function for more details.
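The difference between calling ``__format__()`` on the value and
looking it up on the type can be seen in a small sketch. The class
name ``Loud`` is hypothetical, and the example assumes an interpreter
that supports f-strings (Python 3.6 or later).

```python
class Loud:
    def __format__(self, format_spec):
        return "class-level"


obj = Loud()
# An attribute on the instance does not take part in the lookup,
# because the method is fetched from type(obj).
obj.__format__ = lambda format_spec: "instance-level"

from_fstring = f"{obj}"
from_builtin = format(obj)
from_direct_call = obj.__format__("")

print(from_fstring, from_builtin, from_direct_call)
```

Only the direct attribute call sees the instance-level override; the
f-string and ``format()`` agree with each other.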
Comments, using the ``'#'`` character, are not allowed inside an
expression.
@ -210,7 +214,7 @@ specified. The allowed conversions are ``'!s'``, ``'!r'``, or
``'!a'``. These are treated the same as in ``str.format()``: ``'!s'``
calls ``str()`` on the expression, ``'!r'`` calls ``repr()`` on the
expression, and ``'!a'`` calls ``ascii()`` on the expression. These
conversions are applied before the call to ``__format__``. The only
conversions are applied before the call to ``format()``. The only
reason to use ``'!s'`` is if you want to specify a format specifier
that applies to ``str``, not to the type of the expression.
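As a sketch of why ``'!s'`` can matter, the same format specifier may
mean different things for the original type and for ``str``. The
variable names below are ours; the example assumes Python 3.6 or
later.

```python
value = 3.14159

# Without a conversion, '.3' is interpreted by float: three significant digits.
as_float = f"{value:.3}"

# With '!s', str(value) is formatted, so '.3' truncates to three characters.
as_text = f"{value!s:.3}"

# Conversions run first; the specifier is then applied to the converted string.
as_repr = f"{'hi'!r:^8}"
as_ascii = f"{'café'!a}"

print(as_float, as_text, as_repr, as_ascii)
```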
@ -221,11 +225,11 @@ not provided, an empty string is used.
So, an f-string looks like::
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } text ... '
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
The resulting expression's ``__format__`` method is called with the
format specifier. The resulting value is used when building the value
of the f-string.
The expression is then formatted using the ``__format__`` protocol,
using the format specifier as an argument. The resulting value is
used when building the value of the f-string.
Expressions cannot contain ``':'`` or ``'!'`` outside of strings or
parentheses, brackets, or braces. The exception is that the ``'!='``
@ -290,11 +294,11 @@ mechanism that ``str.format()`` uses to convert values to strings.
For example, this code::
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3:!s}ghi'
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3}ghi'
Might be evaluated as::
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(spec3).__format__('') + 'ghi'
'abc' + format(expr1, spec1) + format(repr(expr2), spec2) + 'def' + format(expr3) + 'ghi'
Expression evaluation
---------------------
@ -372,7 +376,15 @@ yields the value::
While the exact method of this run time concatenation is unspecified,
the above code might evaluate to::
'ab' + x.__format__('') + '{c}' + 'str<' + y.__format__('^4') + 'de'
'ab' + format(x) + '{c}' + 'str<' + format(y, '^4') + '>de'
Each f-string is entirely evaluated before being concatenated to
adjacent f-strings. That means that this::
>>> f'{x' f'}'
Is a syntax error, because the first f-string does not contain a
closing brace.
Error handling
--------------
@ -386,15 +398,13 @@ Unmatched braces::
>>> f'x={x'
File "<stdin>", line 1
SyntaxError: missing '}' in format string expression
SyntaxError: f-string: expecting '}'
Invalid expressions::
>>> f'x={!x}'
File "<fstring>", line 1
!x
^
SyntaxError: invalid syntax
File "<stdin>", line 1
SyntaxError: f-string: empty expression not allowed
Run time errors occur when evaluating the expressions inside an
f-string. Note that an f-string can be evaluated multiple times, and
@ -425,7 +435,8 @@ Leading and trailing whitespace in expressions is ignored
---------------------------------------------------------
For ease of readability, leading and trailing whitespace in
expressions is ignored.
expressions is ignored. This is a by-product of enclosing the
expression in parentheses before evaluation.
Evaluation order of expressions
-------------------------------
@ -577,8 +588,8 @@ Triple-quoted f-strings
Triple quoted f-strings are allowed. These strings are parsed just as
normal triple-quoted strings are. After parsing and decoding, the
normal f-string logic is applied, and ``__format__()`` on each value
is called.
normal f-string logic is applied, and ``__format__()`` is called on
each value.
Raw f-strings
-------------
@ -653,6 +664,14 @@ If you feel you must use lambdas, they may be used inside of parentheses::
>>> f'{(lambda x: x*2)(3)}'
'6'
Can't combine with 'u'
--------------------------
The 'u' prefix was added to Python 3.3 in PEP 414 as a means to ease
source compatibility with Python 2.7. Because Python 2.7 will never
support f-strings, there is nothing to be gained by being able to
combine the 'f' prefix with 'u'.
Examples from Python's source code
==================================


@ -5,12 +5,12 @@ Version: $Revision$
Last-Modified: $Date$
Author: Alexander Belopolsky <alexander.belopolsky@gmail.com>, Tim Peters <tim.peters@gmail.com>
Discussions-To: Datetime-SIG <datetime-sig@python.org>
Status: Draft
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Requires: 495
Created: 08-Aug-2015
Resolution: https://mail.python.org/pipermail/datetime-sig/2015-August/000354.html
Abstract
========


@ -1,44 +1,46 @@
PEP: 502
Title: String Interpolation Redux
Title: String Interpolation - Extended Discussion
Version: $Revision$
Last-Modified: $Date$
Author: Mike G. Miller
Status: Draft
Type: Standards Track
Type: Informational
Content-Type: text/x-rst
Created: 10-Aug-2015
Python-Version: 3.6
Note: Open issues below are stated with a question mark (?),
and are therefore searchable.
Abstract
========
This proposal describes a new string interpolation feature for Python,
called an *expression-string*,
that is both concise and powerful,
improves readability in most cases,
yet does not conflict with existing code.
PEP 498: *Literal String Interpolation*, which proposed "formatted strings", was
accepted September 9th, 2015.
Additional background and rationale given during its design phase is detailed
below.
To recap that PEP,
a string prefix was introduced that marks the string as a template to be
rendered.
These formatted strings may contain one or more expressions
built on `the existing syntax`_ of ``str.format()``.
The formatted string expands at compile-time into a conventional string format
operation,
with the given expressions from its text extracted and passed instead as
positional arguments.
To achieve this end,
a new string prefix is introduced,
which expands at compile-time into an equivalent expression-string object,
with requested variables from its context passed as keyword arguments.
At runtime,
the new object uses these passed values to render a string to given
specifications, building on `the existing syntax`_ of ``str.format()``::
the resulting expressions are evaluated to render a string to given
specifications::
>>> location = 'World'
>>> e'Hello, {location} !' # new prefix: e''
'Hello, World !' # interpolated result
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
Format-strings may be thought of as merely syntactic sugar to simplify traditional
calls to ``str.format()``.
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
This PEP does not recommend removing or deprecating any of the existing string
formatting mechanisms.
Motivation
==========
@ -50,12 +52,16 @@ In comparison to other dynamic scripting languages
with similar use cases,
the amount of code necessary to build similar strings is substantially higher,
while at times offering lower readability due to verbosity, dense syntax,
or identifier duplication. [1]_
or identifier duplication.
These difficulties are described at moderate length in the original
`post to python-ideas`_
that started the snowball (that became PEP 498) rolling. [1]_
Furthermore, replacement of the print statement with the more consistent print
function of Python 3 (PEP 3105) has added one additional minor burden,
an additional set of parentheses to type and read.
Combined with the verbosity of current formatting solutions,
Combined with the verbosity of current string formatting solutions,
this puts an otherwise simple language at an unfortunate disadvantage to its
peers::
@ -66,7 +72,7 @@ peers::
# Python 3, str.format with named parameters
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
# Python 3, variation B, worst case
# Python 3, worst case
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
id=id,
hostname=
@ -74,7 +80,7 @@ peers::
In Python, the formatting and printing of a string with multiple variables in a
single line of code of standard width is noticeably harder and more verbose,
indentation often exacerbating the issue.
with indentation exacerbating the issue.
For use cases such as smaller projects, systems programming,
shell script replacements, and even one-liners,
@ -82,36 +88,17 @@ where message formatting complexity has yet to be encapsulated,
this verbosity has likely led a significant number of developers and
administrators to choose other languages over the years.
.. _post to python-ideas: https://mail.python.org/pipermail/python-ideas/2015-July/034659.html
Rationale
=========
Naming
------
The term expression-string was chosen because other applicable terms,
such as format-string and template are already well used in the Python standard
library.
The string prefix itself, ``e''`` was chosen to demonstrate that the
specification enables expressions,
is not limited to ``str.format()`` syntax,
and also does not lend itself to `the shorthand term`_ "f-string".
It is also slightly easier to type than other choices such as ``_''`` and
``i''``,
while perhaps `less odd-looking`_ to C-developers.
``printf('')`` vs. ``print(f'')``.
.. _the shorthand term: reference_needed
.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html
Goals
-------------
The design goals of expression-strings are as follows:
The design goals of format strings are as follows:
#. Eliminate need to pass variables manually.
#. Eliminate repetition of identifiers and redundant parentheses.
@ -133,40 +120,44 @@ Python specified both single (``'``) and double (``"``) ASCII quote
characters to enclose strings.
It is not reasonable to choose one of them now to enable interpolation,
while leaving the other for uninterpolated strings.
"Backtick" characters (`````) are also `constrained by history`_ as a shortcut
for ``repr()``.
Other characters,
such as the "Backtick" (or grave accent `````) are also
`constrained by history`_
as a shortcut for ``repr()``.
This leaves a few remaining options for the design of such a feature:
* An operator, as in printf-style string formatting via ``%``.
* A class, such as ``string.Template()``.
* A function, such as ``str.format()``.
* New syntax
* A method or function, such as ``str.format()``.
* New syntax, or
* A new string prefix marker, such as the well-known ``r''`` or ``u''``.
The first three options above currently work well.
The first three options above are mature.
Each has specific use cases and drawbacks,
yet all also suffer from the verbosity and visual noise mentioned previously.
All are discussed in the next section.
All options are discussed in the next sections.
.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
Background
-------------
This proposal builds on several existing techniques and proposals and what
Formatted strings build on several existing techniques and proposals and what
we've collectively learned from them.
In keeping with the design goals of readability and error-prevention,
the following examples therefore use named,
not positional arguments.
The following examples focus on the design goals of readability and
error-prevention using named parameters.
Let's assume we have the following dictionary,
and would like to print out its items as an informative string for end users::
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
Printf-style formatting
'''''''''''''''''''''''
Printf-style formatting, via operator
'''''''''''''''''''''''''''''''''''''
This `venerable technique`_ continues to have its uses,
such as with byte-based protocols,
@ -178,7 +169,7 @@ and familiarity to many programmers::
In this form, considering the prerequisite dictionary creation,
the technique is verbose, a tad noisy,
and relatively readable.
yet relatively readable.
Additional issues are that an operator can only take one argument besides the
original string,
meaning multiple parameters must be passed in a tuple or dictionary.
@ -190,8 +181,8 @@ or forget the trailing type, e.g. (``s`` or ``d``).
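The single-argument limitation of the operator can be illustrated with a short sketch. The ``point`` value is a hypothetical example; the ``params`` dictionary is the one assumed earlier in this section.

```python
params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}

# A dictionary on the right-hand side allows named placeholders.
by_name = 'Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s' % params

# A tuple on the right-hand side is unpacked into separate arguments,
# so a value that is itself a tuple must be wrapped in a one-element tuple.
point = (3, 4)
wrapped = 'point: %s' % (point,)

try:
    'point: %s' % point   # two arguments supplied for one placeholder
except TypeError as exc:
    error = str(exc)
```

The final statement fails because the operator sees two values for a single ``%s``, which is one of the easy mistakes this technique invites.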
.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
string.Template
'''''''''''''''
string.Template Class
'''''''''''''''''''''
The ``string.Template`` `class from`_ PEP 292
(Simpler String Substitutions)
@ -202,7 +193,7 @@ that finds its main use cases in shell and internationalization tools::
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
Also verbose, however the string itself is readable.
While also verbose, the string itself is readable.
Though functionality is limited,
it meets its requirements well.
It isn't powerful enough for many cases,
@ -232,8 +223,8 @@ and likely contributed to the PEP's lack of acceptance.
It was superseded by the following proposal.
str.format()
''''''''''''
str.format() Method
'''''''''''''''''''
The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the
existing options.
@ -253,36 +244,32 @@ string literals::
host=hostname)
'Hello, user: nobody, id: 9, on host: darkstar'
The verbosity of the method-based approach is illustrated here.
.. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax
PEP 498 -- Literal String Formatting
''''''''''''''''''''''''''''''''''''
PEP 498 discusses and delves partially into implementation details of
expression-strings,
which it calls f-strings,
the idea and syntax
(with exception of the prefix letter)
of which is identical to that discussed here.
The resulting compile-time transformation however
returns a string joined from parts at runtime,
rather than an object.
It also, somewhat controversially to those first exposed to it,
introduces the idea that these strings shall be augmented with support for
arbitrary expressions,
which is discussed further in the following sections.
PEP 498 defines and discusses format strings,
as also described in the `Abstract`_ above.
It also, somewhat controversially to those first exposed,
introduces the idea that format-strings shall be augmented with support for
arbitrary expressions.
This is discussed further in the
Restricting Syntax section under
`Rejected Ideas`_.
PEP 501 -- Translation ready string interpolation
'''''''''''''''''''''''''''''''''''''''''''''''''
The complementary PEP 501 brings internationalization into the discussion as a
first-class concern, with its proposal of i-strings,
first-class concern, with its proposal of the i-prefix,
``string.Template`` syntax integration compatible with ES6 (Javascript),
deferred rendering,
and a similar object return value.
and an object return value.
Implementations in Other Languages
@ -374,7 +361,8 @@ ES6 (Javascript)
Designers of `Template strings`_ faced the same issue as Python where single
and double quotes were taken.
Unlike Python however, "backticks" were not.
They were chosen as part of the ECMAScript 2015 (ES6) standard::
Despite `their issues`_,
they were chosen as part of the ECMAScript 2015 (ES6) standard::
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
@ -391,8 +379,10 @@ as the tag::
* User implemented prefixes supported.
* Arbitrary expressions are supported.
.. _their issues: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
C#, Version 6
'''''''''''''
@ -428,13 +418,14 @@ Arbitrary `interpolation under Swift`_ is available on all strings::
Additional examples
'''''''''''''''''''
A number of additional examples may be `found at Wikipedia`_.
A number of additional examples of string interpolation may be
`found at Wikipedia`_.
Now that background and history have been covered,
let's continue on for a solution.
.. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
Now that background and implementation history have been covered,
let's continue on for a solution.
New Syntax
----------
@ -442,178 +433,47 @@ New Syntax
This should be an option of last resort,
as every new syntax feature has a cost in terms of real-estate in a brain it
inhabits.
There is one alternative left on our list of possibilities,
There is however one alternative left on our list of possibilities,
which follows.
New String Prefix
-----------------
Given the history of string formatting in Python,
backwards-compatibility,
Given the history of string formatting in Python and backwards-compatibility,
implementations in other languages,
and the avoidance of new syntax unless necessary,
avoidance of new syntax unless necessary,
an acceptable design is reached through elimination
rather than unique insight.
Therefore, we choose to explicitly mark interpolated string literals with a
string prefix.
Therefore, marking interpolated string literals with a string prefix is chosen.
We also choose an expression syntax that reuses and builds on the strongest of
We also choose an expression syntax that reuses and builds on the strongest of
the existing choices,
``str.format()`` to avoid further duplication.
Specification
=============
String literals with the prefix of ``e`` shall be converted at compile-time to
the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object.
Strings and values are parsed from the literal and passed as tuples to the
constructor::
``str.format()`` to avoid further duplication of functionality::
>>> location = 'World'
>>> e'Hello, {location} !'
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
# becomes
# estr('Hello, {location} !', # template
('Hello, ', ' !'), # string fragments
('location',), # expressions
('World',), # values
)
The object interpolates its result immediately at run-time::
'Hello, World !'
PEP 498 -- Literal String Formatting, delves into the mechanics and
implementation of this design.
ExpressionString Objects
------------------------
The ExpressionString object supports both immediate and deferred rendering of
its given template and parameters.
It does this by immediately rendering its inputs to its internal string and
``.rendered`` string member (still necessary?),
useful in the majority of use cases.
To allow for deferred rendering and caller-specified escaping,
all inputs are saved for later inspection,
with convenience methods available.
Notes:
* Inputs are saved to the object as ``.template`` and ``.context`` members
for later use.
* No explicit ``str(estr)`` call is necessary to render the result,
though doing so might be desired to free resources if significant.
* Additional or deferred rendering is available through the ``.render()``
method, which allows template and context to be overridden for flexibility.
* Manual escaping of potentially dangerous input is available through the
``.escape(escape_function)`` method,
the rules of which may therefore be specified by the caller.
The given function should both accept and return a single modified string.
* A sample Python implementation can be `found at Bitbucket`_:
.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py
Inherits From ``str`` Type
'''''''''''''''''''''''''''
Inheriting from the ``str`` class is one of the techniques available to improve
compatibility with code expecting a string object,
as it will pass an ``isinstance(obj, str)`` test.
ExpressionString implements this and also renders its result into the "raw"
string of its string superclass,
providing compatibility with a majority of code.
Interpolation Syntax
--------------------
The strongest of the existing string formatting syntaxes is chosen,
``str.format()`` as a base to build on. [10]_ [11]_
..
* Additionally, single arbitrary expressions shall also be supported inside
braces as an extension::
>>> e'My age is {age + 1} years.'
See below for section on safety.
* Triple quoted strings with multiple lines shall be supported::
>>> e'''Hello,
{location} !'''
'Hello,\n World !'
* Adjacent implicit concatenation shall be supported;
interpolation does `not bleed into`_ other strings::
>>> 'Hello {1, 2, 3} ' e'{location} !'
'Hello {1, 2, 3} World !'
* Additional implementation details,
for example expression and error-handling,
are specified in the compatible PEP 498.
.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html
Composition with Other Prefixes
-------------------------------
* Expression-strings apply to unicode objects only,
therefore ``u''`` is never needed.
Should it be prevented?
* Bytes objects are not included here and do not compose with e'' as they
do not support ``__format__()``.
* Complementary to raw strings,
backslash codes shall not be converted in the expression-string,
when combined with ``r''`` as ``re''``.
Examples
--------
A more complicated example follows::
n = 5; # t0, t1 = … TODO
a = e"Sliced {n} onions in {t1-t0:.3f} seconds."
# returns the equivalent of
estr("Sliced {n} onions in {t1-t0:.3f} seconds", # template
('Sliced ', ' onions in ', ' seconds'), # strings
('n', 't1-t0:.3f'), # expressions
(5, 0.555555) # values
)
With expressions only::
b = e"Three random numbers: {rand()}, {rand()}, {rand()}."
# returns the equivalent of
estr("Three random numbers: {rand():f}, {rand():f}, {rand():}.", # template
('Three random numbers: ', ', ', ', ', '.'), # strings
('rand():f', 'rand():f', 'rand():f'), # expressions
(rand(), rand(), rand()) # values
)
Additional Topics
=================
Safety
-----------
In this section we will describe the safety situation and precautions taken
in support of expression-strings.
in support of format-strings.
#. Only string literals shall be considered here,
#. Only string literals have been considered for format-strings,
not variables to be taken as input or passed around,
making external attacks difficult to accomplish.
* ``str.format()`` `already handles`_ this use-case.
* Direct instantiation of the ExpressionString object with non-literal input
shall not be allowed. (Practicality?)
``str.format()`` and alternatives `already handle`_ this use-case.
#. Neither ``locals()`` nor ``globals()`` are necessary nor used during the
transformation,
@ -622,37 +482,72 @@ in support of expression-strings.
#. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion
depth, recursive interpolation is not supported.
#. Restricted characters or expression classes?, such as ``=`` for assignment.
However,
mistakes or malicious code could be missed inside string literals.
Though that can be said of code in general,
that these expressions are inside strings means they are a bit more likely
to be obscured.
.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
.. _already handle: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
Mitigation via tools
Mitigation via Tools
''''''''''''''''''''
The idea is that tools or linters such as pyflakes, pylint, or Pycharm,
could check inside strings for constructs that exceed project policy.
As this is a common task with languages these days,
tools won't have to implement this feature solely for Python,
may check inside strings with expressions and mark them up appropriately.
As this is a common task with programming languages today,
multi-language tools won't have to implement this feature solely for Python,
significantly shortening time to implementation.
Additionally the Python interpreter could check(?) and warn with appropriate
command-line parameters passed.
Farther in the future,
strings might also be checked for constructs that exceed the safety policy of
a project.
Style Guide/Precautions
-----------------------
As arbitrary expressions may accomplish anything a Python expression is
able to,
it is highly recommended to avoid constructs inside format-strings that could
cause side effects.
Further guidelines may be written once usage patterns and true problems are
known.
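A minimal sketch of the recommendation follows, assuming an interpreter with the ``f''`` prefix of PEP 498 (Python 3.6 or later). The ``record`` helper and the ``log`` list are hypothetical and exist only to make the side effect visible.

```python
log = []

def record(value):
    """Hypothetical helper with a side effect: it appends to ``log``."""
    log.append(value)
    return value

# Discouraged: merely rendering the string changes program state.
discouraged = f'Recorded {record(42)}.'

# Preferred: perform the side effect first, then interpolate a plain name.
value = record(43)
preferred = f'Recorded {value}.'
```

Both strings render the same way, but in the second form the state change is visible in ordinary code rather than hidden inside a literal.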
Reference Implementation(s)
---------------------------
The `say module on PyPI`_ implements string interpolation as described here
with the small burden of a callable interface::
pip install say
from say import say
nums = list(range(4))
say("Nums has {len(nums)} items: {nums}")
A Python implementation of Ruby interpolation `is also available`_.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _say module on PyPI: https://pypi.python.org/pypi/say/
.. _is also available: https://github.com/syrusakbary/interpy
Backwards Compatibility
-----------------------
By using existing syntax and avoiding use of current or historical features,
expression-strings (and any associated sub-features),
were designed so as to not interfere with existing code and is not expected
to cause any issues.
By using existing syntax and avoiding current or historical features,
format strings were designed so as to not interfere with existing code and are
not expected to cause any issues.
Postponed Ideas
@ -666,20 +561,12 @@ Though it was highly desired to integrate internationalization support,
the finer details diverge at almost every point,
making a common solution unlikely: [15]_
* Use-cases
* Compile and run-time tasks
* Interpolation Syntax
* Use-cases differ
* Compile vs. run-time tasks
* Interpolation syntax needs
* Intended audience
* Security policy
Rather than try to fit a "square peg in a round hole,"
this PEP attempts to allow internationalization to be supported in the future
by not preventing it.
In this proposal,
expression-string inputs are saved for inspection and re-rendering at a later
time,
allowing for their use by an external library of any sort.
Rejected Ideas
--------------
@ -687,18 +574,25 @@ Rejected Ideas
Restricting Syntax to ``str.format()`` Only
'''''''''''''''''''''''''''''''''''''''''''
This was deemed not enough of a solution to the problem.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
The common `arguments against`_ support of arbitrary expressions were:
The common `arguments against`_ arbitrary expressions were:
#. YAGNI, "You ain't gonna need it."
#. The change is not congruent with historical Python conservatism.
#. `YAGNI`_, "You aren't gonna need it."
#. The feature is not congruent with historical Python conservatism.
#. Postpone - can implement in a future version if need is demonstrated.
.. _YAGNI: https://en.wikipedia.org/wiki/You_aren't_gonna_need_it
.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html
Support of only ``str.format()`` syntax however,
was deemed not enough of a solution to the problem.
Often a simple length or increment of an object, for example,
is desired before printing.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
String interpolation with arbitrary expressions is becoming an industry
standard in modern languages due to its utility.
Additional/Custom String-Prefixes
'''''''''''''''''''''''''''''''''
@ -720,7 +614,7 @@ this was thought to create too much uncertainty of when and where string
expressions could be used safely or not.
The concept was also difficult to describe to others. [12]_
Always consider expression-string variables to be unescaped,
Always consider format string variables to be unescaped,
unless the developer has explicitly escaped them.
@ -735,33 +629,13 @@ and looking too much like bash/perl,
which could encourage bad habits. [13]_
Reference Implementation(s)
===========================
An expression-string implementation is currently attached to PEP 498,
under the ``f''`` prefix,
and may be available in nightly builds.
A Python implementation of Ruby interpolation `is also available`_,
which is similar to this proposal.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _is also available: https://github.com/syrusakbary/interpy
Acknowledgements
================
* Eric V. Smith for providing invaluable implementation work and design
opinions, helping to focus this PEP.
* Others on the python-ideas mailing list for rejecting the craziest of ideas,
also helping to achieve focus.
* Eric V. Smith for the authoring and implementation of PEP 498.
* Everyone on the python-ideas mailing list for rejecting the various crazy
ideas that came up,
helping to keep the final design in focus.
References
@ -771,7 +645,6 @@ References
(https://mail.python.org/pipermail/python-ideas/2015-July/034659.html)
.. [2] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034669.html)


@ -5,11 +5,12 @@ Last-Modified: $Date$
Author: Donald Stufft <donald@stufft.io>
BDFL-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: distutils-sig@python.org
Status: Draft
Status: Accepted
Type: Informational
Content-Type: text/x-rst
Created: 04-Sep-2015
Post-History: 04-Sep-2015
Resolution: https://mail.python.org/pipermail/distutils-sig/2015-September/026899.html
Abstract
@ -91,6 +92,10 @@ In addition to the above, the following constraints are placed on the API:
associated signature, the signature would be located at
``/packages/HolyGrail-1.0.tar.gz.asc``.
* A repository **MAY** include a ``data-gpg-sig`` attribute on a file link with
a value of either ``true`` or ``false`` to indicate whether or not there is a
GPG signature. Repositories that do this **SHOULD** include it on every link.
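As an illustration only, a client might read the attribute along the following lines. The ``LinkCollector`` class, the sample page and the second file name are hypothetical; the sketch uses the standard library ``html.parser`` module and treats a missing attribute as "unknown".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, has_signature) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attributes = dict(attrs)
        flag = attributes.get('data-gpg-sig')
        # None means the repository did not state either way.
        has_signature = None if flag is None else (flag == 'true')
        self.links.append((attributes.get('href'), has_signature))


page = (
    '<a href="/packages/HolyGrail-1.0.tar.gz" data-gpg-sig="true">'
    'HolyGrail-1.0.tar.gz</a>'
    '<a href="/packages/HolyGrail-0.9.tar.gz">HolyGrail-0.9.tar.gz</a>'
)

collector = LinkCollector()
collector.feed(page)
links = collector.links
```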
Normalized Names
----------------

396
pep-0504.txt Normal file

@ -0,0 +1,396 @@
PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015
Abstract
========
Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.
Unfortunately, this approach has resulted in a situation where developers that
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.
This isn't an acute problem, but it is a chronic one, and the often long
delays between the introduction of security flaws and their exploitation means
that it is difficult for developers to naturally learn from experience.
In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt-in to using the
deterministic random number generator process wide either by using a new
``random.ensure_repeatable()`` API, or by explicitly creating their own
``random.Random()`` instance.
To minimise the impact on existing code, module level APIs that require
determinism will implicitly switch to the deterministic PRNG.
PEP Withdrawal
==============
During discussion of this PEP, Steven D'Aprano proposed the simpler alternative
of offering a standardised ``secrets`` module that provides "one obvious way"
to handle security sensitive tasks like generating default passwords and other
tokens.
Steven's proposal has the desired effect of aligning the easy way to generate
such tokens and the right way to generate them, without introducing any
compatibility risks for the existing ``random`` module API, so this PEP has
been withdrawn in favour of further work on refining Steven's proposal as
PEP 506.
Proposal
========
Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.ensure_repeatable()`` is ever called (directly or
indirectly) in that process.
To achieve this, rather than being bound methods of a ``random.Random``
instance as they are today, the module level callables in ``random`` would
change to be functions that delegate to the corresponding method of the
existing ``random._inst`` module attribute.
By default, this attribute will be bound to a ``random.SystemRandom`` instance.
A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
attribute to a ``random.Random`` instance, restoring the same module level
API behaviour as existed in previous Python versions (aside from the
additional level of indirection)::
    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs

        This switches the default RNG instance from the cryptographically
        secure random.SystemRandom() to the deterministic random.Random(),
        enabling the seed(), getstate() and setstate() operations. This means
        a particular random scenario can be replayed later by providing the
        same seed value or restoring a previously saved state.

        NOTE: Libraries implementing security sensitive operations should
        always explicitly use random.SystemRandom() or os.urandom in order to
        correctly handle applications that call this function.
        """
        global _inst
        # SystemRandom subclasses Random, so test for SystemRandom directly
        if isinstance(_inst, SystemRandom):
            _inst = Random()
To minimise the impact on existing code, calling any of the following module
level functions will implicitly call ``random.ensure_repeatable()``:
* ``random.seed``
* ``random.getstate``
* ``random.setstate``
There are no changes proposed to the ``random.Random`` or
``random.SystemRandom`` class APIs - applications that explicitly instantiate
their own random number generators will be entirely unaffected by this
proposal.
Warning on implicit opt-in
--------------------------
In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
emit a deprecation warning using the following check::
    if isinstance(_inst, SystemRandom):
        warnings.warn("Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details",
                      DeprecationWarning)
        ensure_repeatable()
The specific wording of the warning should have a suitable answer added to
Stack Overflow as was done for the custom error message that was added for
missing parentheses in a call to print [#print]_.
In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.
This PEP does *not* propose ever removing the ability to ensure the default RNG
used process wide is a deterministic PRNG that will produce the same series of
outputs given a specific seed. That capability is widely used in modelling
and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
either directly or indirectly is a sufficient enhancement to address the cases
where the module level random API is used for security sensitive tasks in web
applications without due consideration for the potential security implications
of using a deterministic PRNG.
Performance impact
------------------
Due to the large performance difference between ``random.Random`` and
``random.SystemRandom``, applications ported to Python 3.6 will encounter a
significant performance regression in cases where:
* the application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly call ``random.ensure_repeatable``
This would be noted in the Porting section of the Python 3.6 What's New guide,
with the recommendation to include the following code in the ``__main__``
module of affected applications::
    if hasattr(random, "ensure_repeatable"):
        random.ensure_repeatable()
Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, so in those
cases the change proposed in this PEP will fix a previously latent security
defect.
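The size of the difference depends on the platform, so it is best measured rather than assumed. A rough way to do so, using only the existing ``random`` and ``timeit`` modules (the call count is an arbitrary choice for the sketch):

```python
import random
import timeit

deterministic = random.Random()
system = random.SystemRandom()

calls = 10000  # arbitrary sample size for the comparison

# Time the same method on both generator classes.
t_deterministic = timeit.timeit(deterministic.random, number=calls)
t_system = timeit.timeit(system.random, number=calls)

print('random.Random:       %.4fs for %d calls' % (t_deterministic, calls))
print('random.SystemRandom: %.4fs for %d calls' % (t_system, calls))
```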
Documentation changes
---------------------
The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the documentation of the new ``ensure_repeatable`` function and the
associated security warning.
That section of the module documentation would also gain a discussion of the
respective use cases for the deterministic PRNG enabled by
``ensure_repeatable`` (games, modelling & simulation, software testing) and the
system RNG that is used by default (cryptography, security token generation).
This discussion will also recommend the use of third party security libraries
for the latter task.
Rationale
=========
Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in regular notifications of data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. It's also the case that a lot of
the programming advice readily available on the internet [#search]_ simply
doesn't take the mathematical arcana of computer security into account.
Compounding these issues is the fact that defenders have to cover *all* of
their potential vulnerabilities, as a single mistake can make it possible to
subvert other defences [#bcrypt]_.
One of the factors that contributes to making this last aspect particularly
difficult is APIs where using them inappropriately creates a *silent* security
failure - one where the only way to find out that what you're doing is
incorrect is for someone reviewing your code to say "that's a potential
security problem", or for a system you're responsible for to be compromised
through such an oversight (and you're not only still responsible for that
system when it is compromised, but your intrusion detection and auditing
mechanisms are good enough for you to be able to figure out after the event
how the compromise took place).
This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".
As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.
Discussion
==========
Why "ensure_repeatable" over "ensure_deterministic"?
----------------------------------------------------
This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.
From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.
The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that are familiar with the conventional meaning, but aren't familiar with
the additional qualifiers on the technical meaning.
A second problem with "deterministic" as a description for the traditional RNG
is that it doesn't really tell you what you can *do* with the traditional RNG
that you can't do with the system one.
"ensure_repeatable" aims to address both of those problems, as its common
meaning accurately describes the main reason for preferring the deterministic
PRNG over the system RNG: ensuring you can repeat the same series of outputs
by providing the same seed value, or by restoring a previously saved PRNG state.
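What "repeatable" means in practice can be shown with the existing ``random.Random`` and ``random.SystemRandom`` classes. The seed value below is arbitrary.

```python
import random

rng = random.Random(2015)
first_run = [rng.random() for _ in range(3)]

# Providing the same seed value replays the same series of outputs.
rng.seed(2015)
second_run = [rng.random() for _ in range(3)]

# Restoring a previously saved state has the same effect.
state = rng.getstate()
before = rng.random()
rng.setstate(state)
after = rng.random()

# The system RNG offers no state to save or restore.
system = random.SystemRandom()
try:
    system.getstate()
except NotImplementedError:
    system_supports_state = False
else:
    system_supports_state = True
```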
Only changing the default for Python 3.6+
-----------------------------------------
Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.
The difference in this case is one of degree - the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify either the additional effort
or the stability risks involved in making such an intrusive change in a
maintenance release.
Keeping the module level functions
----------------------------------
In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the availability
of the current ``random`` module API. Accordingly, this proposal ensures that
most of the public API can continue to be used not only without modification,
but without generating any new warnings.
Warning when implicitly opting in to the deterministic RNG
----------------------------------------------------------
It's necessary to implicitly opt in to the deterministic PRNG as Python is
widely used for modelling and simulation purposes where this is the right
thing to do, and in many cases, these software models won't have a dedicated
maintenance team tasked with ensuring they keep working on the latest versions
of Python.
Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
is also a mistake that appears in a number of the flawed "how to generate a
security token in Python" guides readily available online.
Using first DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG aims to
nudge future users that need a cryptographically secure RNG away from
calling ``random.seed()`` and those that genuinely need a deterministic
generator towards explicitly calling ``random.ensure_repeatable()``.
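As a rough illustration of the mistake described above, the sketch below contrasts seeding the deterministic PRNG from ``os.urandom`` with drawing from the system RNG directly. The function names ``flawed_token`` and ``system_token`` and the ``ALPHABET`` constant are hypothetical, chosen only for this example:

```python
import os
import random
import string

ALPHABET = string.ascii_letters + string.digits

def flawed_token(length=16):
    # Anti-pattern: the seed is unpredictable, but every character still
    # comes from the deterministic Mersenne Twister generator.
    rng = random.Random()
    rng.seed(os.urandom(16))
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def system_token(length=16):
    # Each choice draws fresh randomness from the operating system.
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))

print(flawed_token())
print(system_token())
```

Both functions return plausible-looking tokens, which is part of why the first pattern survives in online guides: nothing about the output hints at the difference.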
Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------
The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.
The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.
Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations.
Isn't the deterministic PRNG "secure enough"?
---------------------------------------------
In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.
However, one of the rules of secure software development is that "attacks only
get better, never worse", so it may be that by the time Python 3.6 is released
we will actually see a practical attack on Python's deterministic PRNG publicly
documented.
Security fatigue in the Python ecosystem
----------------------------------------
Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As one of the most widely used
programming languages for network service development (including the OpenStack
Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.
This consideration is one of the primary factors driving the substantial
backwards compatibility improvements in this proposal relative to the initial
draft concept posted to python-ideas [#draft]_.
Acknowledgements
================
* Theo de Raadt, for making the suggestion to Guido van Rossum that we
seriously consider defaulting to a cryptographically secure random number
generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
python-ideas threads that suggested the approach of transparently switching
to the ``random.Random`` implementation when any of the functions that only
make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
experts that suggested the introduction of a userspace CSPRNG would mean
additional complexity for insufficient gain relative to just using the
system RNG directly
* Paul Moore for eloquently making the case for the current level of security
fatigue in the Python ecosystem
References
==========
.. [#breaches] Visualization of data breaches involving more than 30k records (each)
(http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)
.. [#uconnect] Remote UConnect hack for Jeep Cherokee
(http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)
.. [#php] PRNG based attack against password reset tokens in PHP applications
(https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)
.. [#search] Search link for "python password generator"
(https://www.google.com.au/search?q=python+password+generator)
.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
(https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)
.. [#draft] Initial draft concept that eventually became this PEP
(https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)
.. [#nocsprng] Safely generating random numbers
(http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)
.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
(http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)
.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
(https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)
.. [#print] Stack Overflow answer for missing parentheses in call to print
(http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)
.. [#bcrypt] Bypassing bcrypt through an insecure data cache
(http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

pep-0505.txt Normal file

@ -0,0 +1,205 @@
PEP: 505
Title: None coalescing operators
Version: $Revision$
Last-Modified: $Date$
Author: Mark E. Haase <mehaase@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Sep-2015
Python-Version: 3.6
Abstract
========
Several modern programming languages have so-called "null coalescing" or
"null aware" operators, including C#, Dart, Perl, Swift, and PHP (starting in
version 7). These operators provide syntactic sugar for common patterns
involving null references. [1]_ [2]_
* The "null coalescing" operator is a binary operator that returns its
first non-null operand.
* The "null aware member access" operator is a binary operator that accesses
an instance member only if that instance is non-null. It returns null
otherwise.
* The "null aware index access" operator is a binary operator that accesses a
member of a collection only if that collection is non-null. It returns null
otherwise.
Python does not have any directly equivalent syntax. The ``or`` operator can
be used to similar effect but checks for a truthy value, not ``None``
specifically. The ternary operator ``... if ... else ...`` can be used for
explicit null checks but is more verbose and typically duplicates part of the
expression in between ``if`` and ``else``. The proposed ``None`` coalescing
and ``None`` aware operators offer an alternative syntax that is more
intuitive and concise.
Rationale
=========
Null Coalescing Operator
------------------------
The following code illustrates how the ``None`` coalescing operators would
work in Python::
>>> title = 'My Title'
>>> title ?? 'Default Title'
'My Title'
>>> title = None
>>> title ?? 'Default Title'
'Default Title'
Similar behavior can be achieved with the ``or`` operator, but ``or`` checks
whether its left operand is false-y, not specifically ``None``. This can lead
to surprising behavior. Consider the scenario of computing the price of some
products a customer has in their shopping cart::
>>> price = 100
>>> requested_quantity = 5
>>> default_quantity = 1
>>> (requested_quantity or default_quantity) * price
500
>>> requested_quantity = None
>>> (requested_quantity or default_quantity) * price
100
>>> requested_quantity = 0
>>> (requested_quantity or default_quantity) * price # oops!
100
This type of bug is not possible with the ``None`` coalescing operator,
because there is no implicit type coercion to ``bool``::
>>> price = 100
>>> requested_quantity = 0
>>> default_quantity = 1
>>> (requested_quantity ?? default_quantity) * price
0
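Until such an operator exists, the same result can be approximated in current Python with a small helper. The sketch below uses a hypothetical function named ``coalesce`` (it is not part of this proposal or of the standard library). Unlike the proposed operator, it evaluates all of its arguments eagerly:

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None

price = 100
default_quantity = 1

# Compare `or` with the None-only check for each requested quantity.
results = {}
for requested_quantity in (5, None, 0):
    with_or = (requested_quantity or default_quantity) * price
    with_coalesce = coalesce(requested_quantity, default_quantity) * price
    results[requested_quantity] = (with_or, with_coalesce)

print(results)
```

Only the ``0`` case differs: ``or`` silently replaces the zero quantity with the default, while the ``None``-only check preserves it.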
The same correct behavior can be achieved with the ternary operator. Here is
an excerpt from the popular Requests package::
data = [] if data is None else data
files = [] if files is None else files
headers = {} if headers is None else headers
params = {} if params is None else params
hooks = {} if hooks is None else hooks
This particular formulation has the undesirable effect of putting the operands
in an unintuitive order: the brain thinks, "use ``data`` if possible and use
``[]`` as a fallback," but the code puts the fallback *before* the preferred
value.
The author of this package could have written it like this instead::
data = data if data is not None else []
files = files if files is not None else []
headers = headers if headers is not None else {}
params = params if params is not None else {}
hooks = hooks if hooks is not None else {}
This ordering of the operands is more intuitive, but it requires 4 extra
characters (for "not "). It also highlights the repetition of identifiers:
``data if data``, ``files if files``, etc. The ``None`` coalescing operator
improves readability::
data = data ?? []
files = files ?? []
headers = headers ?? {}
params = params ?? {}
hooks = hooks ?? {}
The ``None`` coalescing operator also has a corresponding assignment shortcut.
::
data ?= []
files ?= []
headers ?= {}
params ?= {}
hooks ?= {}
The ``None`` coalescing operator is left-associative, which allows for easy
chaining::
>>> user_title = None
>>> local_default_title = None
>>> global_default_title = 'Global Default Title'
>>> title = user_title ?? local_default_title ?? global_default_title
>>> title
'Global Default Title'
The direction of associativity is important because the ``None`` coalescing
operator short circuits: if its left operand is non-null, then the right
operand is not evaluated.
::
>>> def get_default(): raise Exception()
>>> 'My Title' ?? get_default()
'My Title'
Null-Aware Member Access Operator
---------------------------------
::
>>> title = 'My Title'
>>> title.upper()
'MY TITLE'
>>> title = None
>>> title.upper()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'upper'
>>> print(title?.upper())
None
Null-Aware Index Access Operator
---------------------------------
::
>>> person = {'name': 'Mark', 'age': 32}
>>> person['name']
'Mark'
>>> person = None
>>> person['name']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not subscriptable
>>> print(person?['name'])
None
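For comparison, the two ``None`` aware access examples above can be written in current Python with conditional expressions. This is a sketch only; the helper names ``upper_or_none`` and ``name_or_none`` are hypothetical and exist just to make the snippet runnable:

```python
def upper_or_none(title):
    # Spelled `title?.upper()` under the proposed syntax.
    return title.upper() if title is not None else None

def name_or_none(person):
    # Spelled `person?['name']` under the proposed syntax.
    return person['name'] if person is not None else None

print(upper_or_none('My Title'))
print(upper_or_none(None))
print(name_or_none({'name': 'Mark', 'age': 32}))
print(name_or_none(None))
```

As with the coalescing examples, the explicit form repeats the identifier being tested, which is the duplication the proposed operators aim to remove.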
Specification
=============
References
==========
.. [1] Wikipedia: Null coalescing operator
(https://en.wikipedia.org/wiki/Null_coalescing_operator)
.. [2] Seth Ladd's Blog: Null-aware operators in Dart
(http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

pep-0506.txt Normal file

@ -0,0 +1,449 @@
PEP: 506
Title: Adding A Secrets Module To The Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Sep-2015
Python-Version: 3.6
Post-History:
Abstract
========
This PEP proposes the addition of a module for common security-related
functions such as generating tokens to the Python standard library.
Definitions
===========
Some common abbreviations used in this proposal:
* PRNG:
Pseudo Random Number Generator. A deterministic algorithm used
to produce random-looking numbers with certain desirable
statistical properties.
* CSPRNG:
Cryptographically Strong Pseudo Random Number Generator. An
algorithm used to produce random-looking numbers which are
resistant to prediction.
* MT:
Mersenne Twister. An extensively studied PRNG which is currently
used by the ``random`` module as the default.
Rationale
=========
This proposal is motivated by concerns that Python's standard library
makes it too easy for developers to inadvertently make serious security
errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum
and expressed some concern [#]_ about the use of MT for generating sensitive
information such as passwords, secure tokens, session keys and similar.
Although the documentation for the ``random`` module explicitly states that
the default is not suitable for security purposes [#]_, it is strongly
believed that this warning may be missed, ignored or misunderstood by
many Python developers. In particular:
* developers may not have read the documentation and consequently
not seen the warning;
* they may not realise that their specific use of the module has security
implications; or
* not realising that there could be a problem, they have copied code
(or learned techniques) from websites which don't offer best
practices.
The first [#]_ hit when searching for "python how to generate passwords" on
Google is a tutorial that uses the default functions from the ``random``
module [#]_. Although it is not intended for use in web applications, it is
likely that similar techniques find themselves used in that situation.
The second hit is to a StackOverflow question about generating
passwords [#]_. Most of the answers given, including the accepted one, use
the default functions. When one user warned that the default could be
easily compromised, they were told "I think you worry too much." [#]_
This strongly suggests that the existing ``random`` module is an attractive
nuisance when it comes to generating (for example) passwords or secure
tokens.
Additional motivation (of a more philosophical bent) can be found in the
post which first proposed this idea [#]_.
Proposal
========
Alternative proposals have focused on the default PRNG in the ``random``
module, with the aim of providing "secure by default" cryptographically
strong primitives that developers can build upon without thinking about
security. (See Alternatives below.) This proposes a different approach:
* The standard library already provides cryptographically strong
primitives, but many users don't know they exist or when to use them.
* Instead of requiring crypto-naive users to write secure code, the
standard library should include a set of ready-to-use "batteries" for
the most common needs, such as generating secure tokens. This code
will both directly satisfy a need ("How do I generate a password reset
token?"), and act as an example of acceptable practices which
developers can learn from [#]_.
To do this, this PEP proposes that we add a new module to the standard
library, with the suggested name ``secrets``. This module will contain a
set of ready-to-use functions for common activities with security
implications, together with some lower-level primitives.
The suggestion is that ``secrets`` becomes the go-to module for dealing
with anything which should remain secret (passwords, tokens, etc.)
while the ``random`` module remains backward-compatible.
API and Implementation
======================
The contents of the ``secrets`` module are expected to evolve over time, and
likely will evolve between the time of writing this PEP and actual release
in the standard library [#]_. At the time of writing, the following functions
have been suggested:
* A high-level function for generating secure tokens suitable for use
in (e.g.) password recovery, as session keys, etc.
* A limited interface to the system CSPRNG, using either ``os.urandom``
directly or ``random.SystemRandom``. Unlike the ``random`` module, this
does not need to provide methods for seeding, getting or setting the
state, or any non-uniform distributions. It should provide the
following:
- A function for choosing items from a sequence, ``secrets.choice``.
- A function for generating an integer within some range, such as
``secrets.randrange`` or ``secrets.randint``.
- A function for generating a given number of random bits and/or bytes
as an integer.
- A similar function which returns the value as a hex digit string.
* ``hmac.compare_digest`` under the name ``equal``.
The consensus appears to be that there is no need to add a new CSPRNG to
the ``random`` module to support these uses; ``SystemRandom`` will be
sufficient.
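As an illustration of the last item in the list above, the sketch below shows how ``hmac.compare_digest`` might be used to check a supplied token against the expected one. The wrapper name ``token_matches`` is hypothetical; only ``hmac.compare_digest`` itself is an existing standard library function:

```python
import hmac

def token_matches(supplied_token, expected_token):
    """Compare two text tokens using a timing-attack-resistant check."""
    return hmac.compare_digest(
        supplied_token.encode("utf-8"),
        expected_token.encode("utf-8"),
    )

print(token_matches("s3cr3t-token", "s3cr3t-token"))
print(token_matches("s3cr3t-token", "s3cr3t-tokem"))
```

An ordinary ``==`` comparison can return as soon as the first differing character is found, which may leak information through timing; ``compare_digest`` is designed to reduce that leak.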
Some illustrative implementations have been given by Nick Coghlan [#]_
and a minimalist API by Tim Peters [#]_. This idea has also been discussed
on the issue tracker for the "cryptography" module [#]_. The following
pseudo-code can be taken as a possible starting point for the real
implementation::

    import base64
    import binascii
    import os

    from random import SystemRandom
    from hmac import compare_digest as equal

    # A single private instance backed by the system CSPRNG.
    _sysrand = SystemRandom()

    randrange = _sysrand.randrange
    randint = _sysrand.randint
    randbits = _sysrand.getrandbits
    choice = _sysrand.choice

    def randbelow(exclusive_upper_bound):
        return _sysrand._randbelow(exclusive_upper_bound)

    DEFAULT_ENTROPY = 32  # bytes

    def token_bytes(nbytes=None):
        if nbytes is None:
            nbytes = DEFAULT_ENTROPY
        return os.urandom(nbytes)

    def token_hex(nbytes=None):
        return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

    def token_url(nbytes=None):
        tok = token_bytes(nbytes)
        return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')

The ``secrets`` module itself will be pure Python, and other Python
implementations can easily make use of it unchanged, or adapt it as
necessary.
Default arguments
~~~~~~~~~~~~~~~~~
One difficult question is "How many bytes should my token be?". We can
help with this question by providing a default amount of entropy for the
"token_*" functions. If the ``nbytes`` argument is None or not given, the
default entropy will be used. This default value should be large enough
to be expected to be secure for medium-security uses, but is expected to
change in the future, possibly even in a maintenance release [#]_.
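To make the effect of the default concrete, the sketch below repeats the ``token_*`` pseudo-code from this PEP and reports the length of each token form. It assumes the 32 byte default shown in that pseudo-code, which this PEP says may change:

```python
import base64
import binascii
import os

DEFAULT_ENTROPY = 32  # bytes; the value assumed in the pseudo-code

def token_bytes(nbytes=None):
    if nbytes is None:
        nbytes = DEFAULT_ENTROPY
    return os.urandom(nbytes)

def token_hex(nbytes=None):
    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

def token_url(nbytes=None):
    tok = token_bytes(nbytes)
    return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')

print(len(token_bytes()))   # 32 raw bytes
print(len(token_hex()))     # 64 hex digits, two per byte
print(len(token_url()))     # 43 URL-safe characters once padding is stripped
print(len(token_hex(16)))   # 32 hex digits when 16 bytes are requested
```

Callers that rely on the default get whatever entropy the module currently considers adequate, while callers that pass ``nbytes`` explicitly keep a fixed size.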
Naming conventions
~~~~~~~~~~~~~~~~~~
One question is the naming conventions used in the module [#]_, whether to
use C-like naming conventions such as "randrange" or more Pythonic names
such as "random_range".
Functions which are simply bound methods of the private ``SystemRandom``
instance (e.g. ``randrange``), or a thin wrapper around such, should keep
the familiar names. Those which are something new (such as the various
``token_*`` functions) will use more Pythonic names.
Alternatives
============
One alternative is to change the default PRNG provided by the ``random``
module [#]_. This received considerable scepticism and outright opposition:
* There is fear that a CSPRNG may be slower than the current PRNG (which
in the case of MT is already quite slow).
* Some applications (such as scientific simulations, and replaying
gameplay) require the ability to seed the PRNG into a known state,
which a CSPRNG lacks by design.
* Another major use of the ``random`` module is for simple "guess a number"
games written by beginners, and many people are loath to make any
change to the ``random`` module which may make that harder.
* Although there is no proposal to remove MT from the ``random`` module,
there was considerable hostility to the idea of having to opt-in to
a non-CSPRNG or any backwards-incompatible changes.
* Demonstrated attacks against MT are typically against PHP applications.
It is believed that PHP's version of MT is a significantly softer target
than Python's version, due to a poor seeding technique [#]_. Consequently,
without a proven attack against Python applications, many people object
to a backwards-incompatible change.
Nick Coghlan made an earlier suggestion for a globally configurable PRNG
which uses the system CSPRNG by default [#]_, but has since withdrawn it
in favour of this proposal.
Comparison To Other Languages
=============================
* PHP
PHP includes a function ``uniqid`` [#]_ which by default returns a
thirteen character string based on the current time in microseconds.
Translated into Python syntax, it has the following signature::
def uniqid(prefix='', more_entropy=False)->str
The PHP documentation warns that this function is not suitable for
security purposes. Nevertheless, various mature, well-known PHP
applications use it for that purpose (citation needed).
PHP 5.3 and later also includes a function ``openssl_random_pseudo_bytes``
[#]_. Translated into Python syntax, it has roughly the following
signature::
def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]
This function returns a pseudo-random string of bytes of the given
length, and a boolean flag giving whether the string is considered
cryptographically strong. The PHP manual suggests that returning
anything but True should be rare except for old or broken platforms.
* JavaScript
Based on a rather cursory search [#]_, there do not appear to be any
well-known standard functions for producing strong random values in
JavaScript. ``Math.random`` is often used, despite serious weaknesses
making it unsuitable for cryptographic purposes [#]_. In recent years
the majority of browsers have gained support for ``window.crypto.getRandomValues`` [#]_.
Node.js offers a rich cryptographic module, ``crypto`` [#]_, most of
which is beyond the scope of this PEP. It does include a single function
for generating random bytes, ``crypto.randomBytes``.
* Ruby
The Ruby standard library includes a module ``SecureRandom`` [#]_
which includes the following methods:
* base64 - returns a Base64 encoded random string.
* hex - returns a random hexadecimal string.
* random_bytes - returns a random byte string.
* random_number - depending on the argument, returns either a random
integer in the range(0, n), or a random float between 0.0 and 1.0.
* urlsafe_base64 - returns a random URL-safe Base64 encoded string.
* uuid - returns a version 4 random Universally Unique IDentifier.
What Should Be The Name Of The Module?
======================================
There was a proposal to add a "random.safe" submodule, quoting the Zen
of Python "Namespaces are one honking great idea" koan. However, the
author of the Zen, Tim Peters, has come out against this idea [#]_, and
recommends a top-level module.
In discussion on the python-ideas mailing list so far, the name "secrets"
has received some approval, and no strong opposition.
There is already an existing third-party module with the same name [#]_,
but it appears to be unused and abandoned.
Frequently Asked Questions
==========================
* Q: Is this a real problem? Surely MT is random enough that nobody can
predict its output.
A: The consensus among security professionals is that MT is not safe
in security contexts. It is not difficult to reconstruct the internal
state of MT [#]_ [#]_ and so predict all past and future values. There
are a number of known, practical attacks on systems using MT for
randomness [#]_.
While there are currently no known direct attacks on applications
written in Python due to the use of MT, there is widespread agreement
that such usage is unsafe.
* Q: Is this an alternative to specialist cryptographic software such as SSL?
A: No. This is a "batteries included" solution, not a full-featured
"nuclear reactor". It is intended to mitigate some basic
security errors, not be a solution to all security-related issues. To
quote Nick Coghlan referring to his earlier proposal [#]_::
"...folks really are better off learning to use things like
cryptography.io for security sensitive software, so this change
is just about harm mitigation given that it's inevitable that a
non-trivial proportion of the millions of current and future
Python developers won't do that."
* Q: What about a password generator?
A: The consensus is that the requirements for password generators are too
variable for it to be a good match for the standard library [#]_. No
password generator will be included in the initial release of the
module; instead, it will be given in the documentation as a recipe (à la
the recipes in the ``itertools`` module) [#]_.
* Q: Will ``secrets`` use /dev/random (which blocks) or /dev/urandom (which
doesn't block) on Linux? What about other platforms?
A: ``secrets`` will be based on ``os.urandom`` and ``random.SystemRandom``,
which are interfaces to your operating system's best source of
cryptographic randomness. On Linux, that may be ``/dev/urandom`` [#]_,
on Windows it may be ``CryptGenRandom()``, but see the documentation
and/or source code for the implementation details.
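The first answer above states that the internal state of MT is not difficult to reconstruct. The sketch below illustrates the commonly described technique against Python's own ``random.Random``: it inverts the MT19937 output tempering for 624 consecutive 32-bit outputs and loads the result into a second generator. The helper names are hypothetical, and the example assumes the observer sees complete 32-bit outputs, which real applications rarely expose so directly:

```python
import random

def _undo_right_shift_xor(value, shift):
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ (result >> shift)
    return result

def _undo_left_shift_xor(value, shift, mask):
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ ((result << shift) & mask)
    return result & 0xFFFFFFFF

def untemper(output):
    """Recover the raw MT19937 state word behind one 32-bit output."""
    word = _undo_right_shift_xor(output, 18)
    word = _undo_left_shift_xor(word, 15, 0xEFC60000)
    word = _undo_left_shift_xor(word, 7, 0x9D2C5680)
    word = _undo_right_shift_xor(word, 11)
    return word

def clone_from_outputs(outputs):
    """Build a random.Random that continues the observed sequence."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    words = tuple(untemper(value) for value in outputs)
    clone = random.Random()
    # Version 3 state layout: 624 state words followed by the index.
    clone.setstate((3, words + (624,), None))
    return clone

victim = random.Random()  # seeded by the OS; the seed is never inspected
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_from_outputs(observed)

predicted = [clone.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
print(predicted == actual)  # True
```

Note that the quality of the seed does not matter here: the clone is built purely from observed outputs, which is why good seeding alone does not make MT suitable for security purposes.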
References
==========
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html
.. [#] https://docs.python.org/3/library/random.html
.. [#] As of the date of writing. Also, as Google search terms may be
automatically customised for the user without their knowledge, some
readers may see different results.
.. [#] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html
.. [#] http://stackoverflow.com/questions/3854692/generate-password-in-python
.. [#] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html
.. [#] At least those who are motivated to read the source code and documentation.
.. [#] Tim Peters suggests that bike-shedding the contents of the module will
be 10000 times more time consuming than actually implementing the
module. Words do not begin to express how much I am looking forward to
this.
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036350.html
.. [#] https://github.com/pyca/cryptography/issues/2347
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036517.html
https://mail.python.org/pipermail/python-ideas/2015-September/036515.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036474.html
.. [#] Link needed.
.. [#] By default PHP seeds the MT PRNG with the time (citation needed),
which is exploitable by attackers, while Python seeds the PRNG with
output from the system CSPRNG, which is believed to be much harder to
exploit.
.. [#] http://legacy.python.org/dev/peps/pep-0504/
.. [#] http://php.net/manual/en/function.uniqid.php
.. [#] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php
.. [#] Volunteers and patches are welcome.
.. [#] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html
.. [#] https://developer.mozilla.org/en-US/docs/Web/API/RandomSource/getRandomValues
.. [#] https://nodejs.org/api/crypto.html
.. [#] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html
.. [#] https://pypi.python.org/pypi/secrets
.. [#] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html
.. [#] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036476.html
https://mail.python.org/pipermail/python-ideas/2015-September/036478.html
.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036488.html
.. [#] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
http://www.2uo.de/myths-about-urandom/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

pep-0507.txt Normal file

@ -0,0 +1,331 @@
PEP: 507
Title: Migrate CPython to Git and GitLab
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 2015-09-30
Post-History:
Abstract
========
This PEP proposes migrating the repository hosting of CPython and the
supporting repositories to Git. Further, it proposes adopting a
hosted GitLab instance as the primary way of handling merge requests,
code reviews, and code hosting. It is similar in intent to PEP 481
but proposes an open source alternative to GitHub and omits the
proposal to run Phabricator. As with PEP 481, this particular PEP is
offered as an alternative to PEP 474 and PEP 462.
Rationale
=========
CPython is an open source project which relies on a number of
volunteers donating their time. As with any healthy, vibrant open
source project, it relies on attracting new volunteers as well as
retaining existing developers. Given that volunteer time is the most
scarce resource, providing a process that maximizes the efficiency of
contributors and reduces the friction for contributions is of vital
importance for the long-term health of the project.
The current tool chain of the CPython project is a custom and unique
combination of tools. This has two critical implications:
* The unique nature of the tool chain means that contributors must
remember or relearn the process, workflow, and tools whenever they
contribute to CPython, without the advantage of leveraging long-term
memory and familiarity they retain by working with other projects in
the FLOSS ecosystem. The knowledge they gain in working with
CPython is unlikely to be applicable to other projects.
* The burden on the Python/PSF infrastructure team is much greater in
order to continue to maintain custom tools, improve them over time,
fix bugs, address security issues, and more generally adapt to new
standards in online software development with global collaboration.
These limitations act as a barrier to contribution both for highly
engaged contributors (e.g. core Python developers) and especially for
more casual "drive-by" contributors, who care more about getting their
bug fixed than learning a new suite of tools and workflows.
By proposing the adoption of both a different version control system
and a modern, well-maintained hosting solution, this PEP addresses
these limitations. It aims to enable a modern, well-understood
process that will carry CPython development for many years.
Version Control System
----------------------
Currently the CPython and supporting repositories use Mercurial. As a
modern distributed version control system, it has served us well since
the migration from Subversion. However, when evaluating the VCS we
must consider the capabilities of the VCS itself as well as the
network effect and mindshare of the community around that VCS.
There are really only two options for this: Mercurial and Git.
The technical capabilities of the two systems are largely equivalent,
so this PEP instead focuses on their social aspects.
It is not possible to get exact numbers for the projects or people
using a particular VCS; however, we can estimate relative popularity
by looking at several sources of information on which VCS projects
use.
The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that
37% of the repositories indexed by The Open Hub are using Git (second
only to Subversion which has 48%) while Mercurial has just 2%, beating
only Bazaar which has 1%.  This makes Git just over 18 times as
popular as Mercurial on The Open Hub.
Another source of information on VCS popularity is PyPI itself. This
source is more targeted at the Python community itself since it
represents projects developed for Python. Unfortunately PyPI does not
have a standard location for representing this information, so this
requires manual processing. If we limit our search to the top 100
projects on PyPI (ordered by download counts) we can see that 62% of
them use Git, while 22% of them use Mercurial, and 13% use something
else.  This makes Git just under 3 times as popular as Mercurial
for the top 100 projects on PyPI.
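The two ratios quoted above follow directly from the percentages.  A
minimal sketch of the arithmetic, using only the figures given in the
text (the dictionary and function names are illustrative, not part of
any tool):

```python
# Percentages quoted in the text above (not re-measured here).
open_hub = {"Git": 37, "Subversion": 48, "Mercurial": 2, "Bazaar": 1}
pypi_top_100 = {"Git": 62, "Mercurial": 22, "other": 13}


def popularity_ratio(shares, vcs="Git", baseline="Mercurial"):
    """Return how many times more popular `vcs` is than `baseline`."""
    return shares[vcs] / shares[baseline]


open_hub_ratio = popularity_ratio(open_hub)
pypi_ratio = popularity_ratio(pypi_top_100)

print(f"Open Hub:     {open_hub_ratio:.1f}x")  # 18.5x, "just over 18 times"
print(f"PyPI top 100: {pypi_ratio:.1f}x")      # 2.8x, "just under 3 times"
```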
These numbers back up the anecdotal evidence for Git as the far more
popular DVCS for open source projects.  Choosing the more popular VCS
has a number of benefits.
For new contributors it increases the likelihood that they will have already
learned the basics of Git as part of working with another project or, if they
are just now learning Git, that they'll be able to take that knowledge and
apply it to other projects.  Additionally, a larger community means more people
writing how-to guides, answering questions, and writing articles about Git,
which makes it easier for a new user to find answers and information about the
tool they are trying to learn and use.  Given its popularity, there may also
be more auxiliary tooling written *around* Git.  This increases options for
everything from GUI clients and helper scripts to repository hosting.
Further, the adoption of Git as the proposed back-end repository
format doesn't prohibit the use of Mercurial by fans of that VCS!
Mercurial users have the hg-git [#hg-git]_ plugin which allows them to push
and pull from a Git server using the Mercurial front-end. It's a
well-maintained and highly functional plugin that seems to be
well-liked by Mercurial users.
Repository Hosting
------------------
Where and how the official repositories for CPython are hosted is in
some ways determined by the choice of VCS.  With Git there are several
options. In fact, once the repository is hosted in Git, branches can
be mirrored in many locations, within many free, open, and proprietary
code hosting sites.
It's still important for CPython to adopt a single, official
repository, with a web front-end that allows for many convenient and
common interactions entirely through the web, without always requiring
local VCS manipulations.  These interactions include, at a minimum,
code review with inline comments, branch diffing, CI integration, and
auto-merging.
This PEP proposes to adopt a GitLab [#GitLab]_ instance, run within the
python.org domain, accessible to and with ultimate control from the
PSF and the Python infrastructure team, but donated, hosted, and
primarily maintained by GitLab, Inc.
Why GitLab?  Because it is a fully functional Git hosting system that
sports modern web interactions, software workflows, and CI
integration. GitLab's Community Edition (CE) is open source software,
and thus is closely aligned with the principles of the CPython
community.
Code Review
-----------
Currently CPython uses a custom fork of Rietveld, modified to not run
on Google App Engine, which is maintained by essentially one
person.  It is missing common features present in many modern code
review tools.
This PEP proposes to utilize GitLab's built-in merge requests and
online code review features to facilitate reviews of all proposed
changes.
GitLab merge requests
---------------------
The normal workflow for a GitLab hosted project is to submit a *merge request*
asking that a feature or bug fix branch be merged into a target branch,
usually one or more of the stable maintenance branches or the next-version
master branch for new features. GitLab's merge requests are similar in form
and function to GitHub's pull requests, so anybody who is already familiar
with the latter should be able to immediately utilize the former.
Once submitted, a conversation about the change can be had between the
submitter and reviewer.  This includes both general comments and inline
comments attached to a particular line of the diff between the source and
target branches. Projects can also be configured to automatically run
continuous integration on the submitted branch, the results of which are
readily visible from the merge request page. Thus both the reviewer and
submitter can immediately see the results of the tests, making it much easier
to only land branches with passing tests. Each new push to the source branch
(e.g. to respond to a commenter's feedback or to fix a failing test) results
in a new run of the CI, so that the state of the request always reflects the
latest commit.
Merge requests have a fairly major advantage over the older "submit a patch to
a bug tracker" model. They allow developers to work completely within the VCS
using standard VCS tooling, without requiring the creation of a patch file or
figuring out the right location to upload the patch to. This lowers the
barrier for sending a change to be reviewed.
Merge requests are also far easier to review.  For example, they provide nice
syntax-highlighted diffs which can operate in either unified or side-by-side
views.  They allow commenting inline and on the merge request as a whole, and
they present those comments in a unified way, hiding comments which no
longer apply.  Comments can be hidden and revealed.
Actually merging a merge request is quite simple if the source branch applies
cleanly to the target branch. A core reviewer simply needs to press the
"Merge" button for GitLab to automatically perform the merge. The source
branch can be optionally rebased, and once the merge is completed, the source
branch can be automatically deleted.
GitLab also has a good workflow for submitting merge requests to a project
completely through its web interface.  This would enable the Python
documentation to have "Edit on GitLab" buttons on every page.  People who
discover typos or inaccuracies, or who just want to make improvements to
the docs they are currently reading, can simply hit that button and get
an in-browser editor that will let them make changes and submit a merge
request all from the comfort of their browser.
Criticism
=========
X is not written in Python
--------------------------
One feature that the current tooling (Mercurial, Rietveld) has is that all of
its pieces are written primarily in Python.  This PEP
focuses more on the *best* tools for the job and not necessarily on the *best*
tools that happen to be written in Python. Volunteer time is the most
precious resource for any open source project and we can best respect and
utilize that time by focusing on the benefits and downsides of the tools
themselves rather than what language their authors happened to write them in.
One concern is the ability to modify tools to work for us; however, one of the
goals here is to *not* modify software to work for us and instead adapt
ourselves to a more standardized workflow.  This standardization pays off in
the ability to re-use tools out of the box, freeing up developer time to
actually work on Python itself, as well as enabling knowledge sharing between
projects.
However, if we do need to modify the tooling, Git itself is largely written in
C, the same as CPython.  It can also have commands written for it using
any language, including Python.  GitLab itself is largely written in Ruby and,
since it is open source software, we would have the ability to submit merge
requests to the upstream Community Edition, albeit in a language potentially
unfamiliar to most Python programmers.
Mercurial is better than Git
----------------------------
Whether Mercurial or Git is better on a technical level is a highly subjective
opinion. This PEP does not state whether the mechanics of Git or Mercurial
are better, and instead focuses on the network effect that is available for
either option. While this PEP proposes switching to Git, Mercurial users are
not left completely out of the loop. By using the hg-git extension for
Mercurial, working with server-side Git repositories is fairly easy and
straightforward.
CPython Workflow is too Complicated
-----------------------------------
One sentiment that came out of previous discussions was that the multi-branch
model of CPython was too complicated for GitLab style merge requests. This
PEP disagrees with that sentiment.
Currently any particular change requires manually creating a patch for 2.7 and
3.x, which won't change at all in this regard.
If someone submits a fix for the current stable branch (e.g. 3.5) the merge
request workflow can be used to create a request to merge the current stable
branch into the master branch, assuming there are no merge conflicts.  As
always, merge conflicts must be manually and locally resolved. Because
developers also have the *option* of performing the merge locally, this
provides an improvement over the current situation where the merge *must*
always happen locally.
For fixes in the current development branch that must also be applied to
stable release branches, it is possible in many situations to cherry-pick the
change locally and apply it to other branches, with merge requests submitted
for each stable branch.  It is also possible to just cherry-pick and complete
the merge locally.  These are all accomplished with standard Git commands and
techniques, with the advantage that all such changes can go through the review
and CI test workflows, even for merges to stable branches. Minor changes may
be easily accomplished in the GitLab web editor.
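As a rough illustration of the cherry-pick approach described above, the
following sketch only *builds* the Git command lines for each stable
branch; it does not run them.  The helper name, the ``origin`` remote, and
the ``backport-<sha>-<branch>`` topic branch naming are assumptions made
for this example and are not prescribed by this PEP.

```python
def backport_commands(commit, stable_branches, remote="origin"):
    """Return the git invocations needed to propose `commit` for each branch.

    The topic branch naming scheme is hypothetical, chosen for illustration.
    """
    commands = []
    for branch in stable_branches:
        topic = f"backport-{commit[:7]}-{branch}"
        commands.extend([
            # Start a topic branch from the tip of the stable branch.
            ["git", "checkout", "-b", topic, f"{remote}/{branch}"],
            # Re-apply the change; -x records the original commit id.
            ["git", "cherry-pick", "-x", commit],
            # Publish the topic branch so a merge request can be opened.
            ["git", "push", remote, topic],
        ])
    return commands


plan = backport_commands("0123abc4567", ["3.5", "2.7"])
for argv in plan:
    print(" ".join(argv))
```

Each list could be passed to ``subprocess.run(argv, check=True)`` to execute
it.  A cherry-pick that hits a conflict would stop there and, as noted above,
must be resolved manually and locally.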
No system can hide all the complexities involved in maintaining several
long-lived branches.  The only thing that the tooling can do is make it as
easy as possible to submit and commit changes.
Open issues
===========
* What level of hosted support will GitLab offer? The PEP author has been in
contact with the GitLab CEO, with positive interest on their part. The
details of the hosting offer would have to be discussed.
* What happens to Roundup and do we switch to the GitLab issue tracker?
Currently, this PEP is *not* suggesting we move from Roundup to GitLab
issues. We have way too much invested in Roundup right now and migrating
the data would be a huge effort. GitLab does support webhooks, so we will
probably want to use webhooks to integrate merges and other events with
updates to Roundup (e.g. to include pointers to commits, close issues,
etc. similar to what is currently done).
* What happens to wiki.python.org? Nothing! While GitLab does support wikis
in repositories, there's no reason for us to migrate our Moin wikis.
* What happens to the existing GitHub mirrors? We'd probably want to
regenerate them once the official upstream branches are natively hosted in
Git. This may change commit ids, but after that, it should be easy to
mirror the official Git branches and repositories far and wide.
* Where would the GitLab instance live? Physically, in whatever hosting
provider GitLab chooses. We would point gitlab.python.org (or
git.python.org?) to this host.
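The webhook-based Roundup integration mentioned in the open issues above
could start from something like the following sketch.  It assumes that the
push event payload carries a ``commits`` list whose entries have ``message``
and ``url`` keys, and that commit messages reference tracker issues as
"Issue #NNNNN"; both should be checked against the GitLab documentation and
the project's actual conventions.  The host name and the helper name are
hypothetical.

```python
import json
import re

# Assumed convention: commit messages mention tracker issues as "Issue #12345".
ISSUE_RE = re.compile(r"[Ii]ssue\s+#(\d+)")


def issue_links(payload):
    """Map Roundup issue numbers to the commit URLs that mention them.

    `payload` is the decoded JSON body of a push event; the keys used here
    ("commits", "message", "url") are assumptions about that payload.
    """
    links = {}
    for commit in payload.get("commits", []):
        for number in ISSUE_RE.findall(commit.get("message", "")):
            links.setdefault(int(number), []).append(commit["url"])
    return links


# Hypothetical payload, trimmed to the fields this sketch reads.
sample = json.loads("""
{
  "object_kind": "push",
  "ref": "refs/heads/master",
  "commits": [
    {"id": "a1", "message": "Issue #25000: Fix typo in docs.",
     "url": "https://gitlab.example.org/cpython/commit/a1"},
    {"id": "b2", "message": "Whitespace cleanup.",
     "url": "https://gitlab.example.org/cpython/commit/b2"},
    {"id": "c3", "message": "Issue #25000, issue #25001: Add tests.",
     "url": "https://gitlab.example.org/cpython/commit/c3"}
  ]
}
""")

links = issue_links(sample)
for number, urls in sorted(links.items()):
    print(number, urls)
```

A real integration would then post each commit URL to the matching Roundup
issue; that step is omitted here because it depends on how the tracker is
deployed.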
References
==========
.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
.. [#hg-git] `Hg-Git mercurial plugin <https://hg-git.github.io/>`_
.. [#GitLab] https://about.gitlab.com/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:


@ -2,7 +2,7 @@ PEP: 3140
Title: str(container) should call str(item), not repr(item)
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytmann <phd@phd.pp.ru>,
Author: Oleg Broytman <phd@phdru.name>,
Jim J. Jewett <jimjjewett@gmail.com>
Discussions-To: python-3000@python.org
Status: Rejected