Alexander Belopolsky 2015-09-20 20:25:44 -04:00
commit 3cdb11485b
10 changed files with 2106 additions and 302 deletions

@ -424,11 +424,12 @@ How to Make A Release
that directory. Note though that if you're releasing a maintenance
release for an older version, don't change the current link.
___ If this is a final release (even a maintenance release), also unpack
the HTML docs to /srv/docs.python.org/release/X.Y.Z on
docs.iad1.psf.io. Make sure the files are in group "docs". If it is a
release of a security-fix-only version, tell the DE to build a version
with the "version switcher" and put it there.
___ If this is a final release (even a maintenance release), also
unpack the HTML docs to /srv/docs.python.org/release/X.Y.Z on
docs.iad1.psf.io. Make sure the files are in group "docs" and are
group-writeable. If it is a release of a security-fix-only version,
tell the DE to build a version with the "version switcher"
and put it there.
___ Let the DE check if the docs are built and work all right.
@ -484,6 +485,10 @@ How to Make A Release
Note that the easiest thing is probably to copy fields from
an existing Python release "page", editing as you go.
There should only be one "page" for a release (e.g. 3.5.0, 3.5.1).
Reuse the same page for all pre-releases, changing the version
number and the documentation as you go.
___ If this isn't the first release for a version, open the existing
"page" for editing and update it to the new release. Don't save yet!

pep-0103.txt Normal file

@ -0,0 +1,951 @@
PEP: 103
Title: Collecting information about git
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytman <phd@phdru.name>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 01-Jun-2015
Post-History: 12-Sep-2015
Abstract
========
This Informational PEP collects information about git. There is, of
course, a lot of documentation for git, so the PEP concentrates on
more complex (and more related to Python development) issues,
scenarios and examples.
The plan is to extend the PEP in the future collecting information
about equivalence of Mercurial and git scenarios to help migrating
Python development from Mercurial to git.
The author of the PEP doesn't currently plan to write a Process PEP on
migrating Python development from Mercurial to git.
Documentation
=============
Git is accompanied by a lot of documentation, both online and
offline.
Documentation for starters
--------------------------
Git Tutorial: `part 1
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial.html>`_,
`part 2
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial-2.html>`_.
`Git User's manual
<https://www.kernel.org/pub/software/scm/git/docs/user-manual.html>`_.
`Everyday GIT With 20 Commands Or So
<https://www.kernel.org/pub/software/scm/git/docs/giteveryday.html>`_.
`Git workflows
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_.
Advanced documentation
----------------------
`Git Magic
<http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html>`_,
with a number of translations.
`Pro Git <https://git-scm.com/book>`_. The Book about git. Buy it at
Amazon or download in PDF, mobi, or ePub form. It has translations to
many different languages. Download Russian translation from `GArik
<https://github.com/GArik/progit/wiki>`_.
`Git Wiki <https://git.wiki.kernel.org/index.php/Main_Page>`_.
Offline documentation
---------------------
Git has builtin help: run ``git help $TOPIC``. For example, run
``git help git`` or ``git help help``.
Quick start
===========
Download and installation
-------------------------
Unix users: `download and install using your package manager
<https://git-scm.com/download/linux>`_.
Microsoft Windows: download `git-for-windows
<https://github.com/git-for-windows/git/releases>`_ or `msysGit
<https://github.com/msysgit/msysgit/releases>`_.
MacOS X: use git installed with `XCode
<https://developer.apple.com/xcode/downloads/>`_ or download from
`MacPorts <https://www.macports.org/ports.php?by=name&substr=git>`_ or
`git-osx-installer
<http://sourceforge.net/projects/git-osx-installer/files/>`_ or
install git with `Homebrew <http://brew.sh/>`_: ``brew install git``.
`git-cola <https://git-cola.github.io/index.html>`_ is a Git GUI
written in Python and GPL licensed. Linux, Windows, MacOS X.
`TortoiseGit <https://tortoisegit.org/>`_ is a Windows Shell Interface
to Git based on TortoiseSVN; open source.
Initial configuration
---------------------
This simple code often appears in documentation, but it is important,
so let us repeat it here. Git stores author and committer
names/emails in every commit, so configure your real name and
preferred email::
$ git config --global user.name "User Name"
$ git config --global user.email user.name@example.org
Examples in this PEP
====================
Examples of git commands in this PEP use the following approach. It is
supposed that you, the user, work with a local repository named
``python`` that has an upstream remote repo named ``origin``. Your
local repo has two branches, ``v1`` and ``master``. For most examples
the currently checked out branch is ``master``. That is, it's assumed
you have done something like this::
$ git clone https://git.python.org/python.git
$ cd python
$ git branch v1 origin/v1
The first command clones remote repository into local directory
``python``, creates a new local branch master, sets
remotes/origin/master as its upstream remote-tracking branch and
checks it out into the working directory.
The last command creates a new local branch v1 and sets
remotes/origin/v1 as its upstream remote-tracking branch.
The same result can be achieved with commands::
$ git clone -b v1 https://git.python.org/python.git
$ cd python
$ git checkout --track origin/master
The last command creates a new local branch master, sets
remotes/origin/master as its upstream remote-tracking branch and
checks it out into the working directory.
Branches and branches
=====================
Git terminology can be a bit misleading. Take, for example, the term
"branch". In git it has two meanings. A branch is a directed line of
commits (possibly with merges). And a branch is a label or a pointer
assigned to a line of commits. It is important to distinguish when you
talk about commits and when about their labels. Lines of commits are
by themselves unnamed and usually only grow longer and merge.
Labels, on the other hand, can be created, moved, renamed and deleted
freely.
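The difference between labels and commits can be sketched in a scratch
repository. This is only an illustration: the directory ``demo`` and the
branch names ``topic`` and ``feature`` are made up for the example.

```shell
set -eu
# Scratch repository; all names here (demo, topic, feature) are illustrative.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
git config user.name "User Name"
git config user.email user.name@example.org

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
head_id=$(git rev-parse HEAD)

git branch topic              # create a second label on the same commit
git branch -m topic feature   # rename the label
git branch -f feature HEAD~   # move the label one commit back
moved_id=$(git rev-parse feature)
git branch -D feature         # delete the label

# The line of commits itself is untouched by all of the above.
after_id=$(git rev-parse master)
echo "master before: $head_id, after: $after_id"
```

Creating, renaming, moving and deleting the label never changed the
commits that ``master`` points to.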
Remote repositories and remote branches
=======================================
Remote-tracking branches are branches (pointers to commits) in your
local repository. They are there for git (and for you) to remember
what branches and commits have been pulled from and pushed to what
remote repos (you can pull from and push to many remotes).
Remote-tracking branches live under ``remotes/$REMOTE`` namespaces,
e.g. ``remotes/origin/master``.
To see the status of remote-tracking branches run::
$ git branch -rv
To see local and remote-tracking branches (and tags) pointing to
commits::
$ git log --decorate
You never do your own development on remote-tracking branches. You
create a local branch that has a remote branch as upstream and do
development on that local branch. On push git pushes commits to the
remote repo and updates remote-tracking branches, on pull git fetches
commits from the remote repo, updates remote-tracking branches and
fast-forwards, merges or rebases local branches.
When you do an initial clone like this::
$ git clone -b v1 https://git.python.org/python.git
git clones remote repository ``https://git.python.org/python.git`` to
directory ``python``, creates a remote named ``origin``, creates
remote-tracking branches, creates a local branch ``v1``, configures it
to track the upstream remotes/origin/v1 branch and checks out ``v1`` into
the working directory.
Updating local and remote-tracking branches
-------------------------------------------
There is a major difference between
::
$ git fetch $REMOTE $BRANCH
and
::
$ git fetch $REMOTE $BRANCH:$BRANCH
The first command fetches commits from the named $BRANCH in the
$REMOTE repository that are not in your repository, updates
remote-tracking branch and leaves the id (the hash) of the head commit
in file .git/FETCH_HEAD.
The second command fetches commits from the named $BRANCH in the
$REMOTE repository that are not in your repository and updates both
the local branch $BRANCH and its upstream remote-tracking branch. But
it refuses to update branches in case of non-fast-forward. And it
refuses to update the current branch (currently checked out branch,
where HEAD is pointing to).
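A minimal sketch of the difference, using a local directory in place of
a real server. The directory names ``upstream`` and ``python`` and the
commit messages are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# "upstream" stands in for the remote server; paths and names are illustrative.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git commit -q --allow-empty -m "initial"
git branch v1
cd "$work"
git clone -q upstream python
cd python
git branch -q v1 origin/v1

# A new commit appears on the remote v1.
git -C "$work/upstream" checkout -q v1
git -C "$work/upstream" commit -q --allow-empty -m "v1 fix"
remote_v1=$(git -C "$work/upstream" rev-parse v1)

# Form 1: FETCH_HEAD records the fetched commit; local v1 does not move.
git fetch -q origin v1
fetch_head=$(git rev-parse FETCH_HEAD)
local_v1_after_first=$(git rev-parse v1)

# Form 2: the local (non-current) branch v1 is fast-forwarded as well.
git fetch -q origin v1:v1
local_v1_after_second=$(git rev-parse v1)
echo "FETCH_HEAD=$fetch_head v1=$local_v1_after_second"
```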
The first command is used internally by ``git pull``.
::
$ git pull $REMOTE $BRANCH
is equivalent to
::
$ git fetch $REMOTE $BRANCH
$ git merge FETCH_HEAD
Certainly, $BRANCH in that case should be your current branch. If you
want to merge a different branch into your current branch first update
that non-current branch and then merge::
$ git fetch origin v1:v1 # Update v1
$ git pull --rebase origin master # Update the current branch master
# using rebase instead of merge
$ git merge v1
If you have not yet pushed commits on ``v1``, though, the scenario
becomes a bit more complex. Git refuses to update a
non-fast-forwardable branch, and you don't want to do a force-pull
because that would remove your non-pushed commits and you would need
to recover them. So you want to rebase ``v1``, but you cannot rebase a
non-current branch. Hence, check out ``v1`` and rebase it before
merging::
$ git checkout v1
$ git pull --rebase origin v1
$ git checkout master
$ git pull --rebase origin master
$ git merge v1
It is possible to configure git to make it fetch/pull a few branches
or all branches at once, so you can simply run
::
$ git pull origin
or even
::
$ git pull
Default remote repository for fetching/pulling is ``origin``. Default
set of references to fetch is calculated using matching algorithm: git
fetches all branches having the same name on both ends.
Push
''''
Pushing is a bit simpler. There is only one command ``push``. When you
run
::
$ git push origin v1 master
git pushes local v1 to remote v1 and local master to remote master.
The same as::
$ git push origin v1:v1 master:master
Git pushes commits to the remote repo and updates remote-tracking
branches. Git refuses to push commits that aren't fast-forwardable.
You can force-push anyway, but please remember: you can force-push to
your own repositories but don't force-push to public or shared repos.
If you find git refuses to push commits that aren't fast-forwardable,
better fetch and merge commits from the remote repo (or rebase your
commits on top of the fetched commits), then push. Only force-push if
you know what you are doing and why. See the section `Commit
editing and caveats`_ below.
It is possible to configure git to make it push a few branches or all
branches at once, so you can simply run
::
$ git push origin
or even
::
$ git push
Default remote repository for pushing is ``origin``. Default set of
references to push in git before 2.0 is calculated using matching
algorithm: git pushes all branches having the same name on both ends.
Default set of references to push in git 2.0+ is calculated using
simple algorithm: git pushes the current branch back to its
@{upstream}.
To configure git before 2.0 to the new behaviour run::
$ git config push.default simple
To configure git 2.0+ to the old behaviour run::
$ git config push.default matching
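The two settings can be compared in a scratch setup. This is a sketch
only; a bare repository in a temporary directory (``origin.git``, a
made-up name) plays the role of the remote.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git          # stands in for the remote server
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "initial"
git branch v1
git push -q -u origin master v1        # publish both and set their upstreams

# New local commits on both branches; master stays checked out.
git commit -q --allow-empty -m "master work"
git checkout -q v1
git commit -q --allow-empty -m "v1 work"
git checkout -q master
local_v1=$(git rev-parse v1)

git config push.default simple
git push -q                            # only the current branch is pushed
simple_v1=$(git ls-remote origin refs/heads/v1 | cut -f1)

git config push.default matching
git push -q                            # now v1 is pushed as well
matching_v1=$(git ls-remote origin refs/heads/v1 | cut -f1)
remote_master=$(git ls-remote origin refs/heads/master | cut -f1)
```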
Git doesn't allow pushing to a branch if it's the current branch in a
remote non-bare repository: git refuses to update the remote working
directory. You really should push only to bare repositories. For
non-bare repositories git prefers a pull-based workflow.
When you want to deploy code on a remote host and can only use push
(because your workstation is behind a firewall and you cannot pull
from it) you do that in two steps using two repositories: you push
from the workstation to a bare repo on the remote host, ssh to the
remote host and pull from the bare repo to a non-bare deployment repo.
That changed in git 2.3, but see `the blog post
<https://github.com/blog/1957-git-2-3-has-been-released#push-to-deploy>`_
for caveats; in 2.4 the push-to-deploy feature was `further improved
<https://github.com/blog/1994-git-2-4-atomic-pushes-push-to-deploy-and-more#push-to-deploy-improvements>`_.
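The two-step deployment can be sketched with local directories standing
in for the remote host. The layout (``workstation``, ``server/site.git``,
``server/deploy``) and the file ``index.html`` are hypothetical; on a
real host the pull would be run over ssh.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir server
git init -q --bare server/site.git     # bare repo on the "remote host"
git init -q workstation
cd workstation
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "version 1" > index.html
git add index.html
git commit -q -m "release 1"

# Step 1: push from the workstation to the bare repo.
git push -q "$work/server/site.git" master
# One-time setup on the remote host: a non-bare deployment repo.
git clone -q -b master "$work/server/site.git" "$work/server/deploy"

# A later release: push again, then pull on the remote host.
echo "version 2" > index.html
git commit -q -a -m "release 2"
git push -q "$work/server/site.git" master
# Step 2: pull from the bare repo into the deployment repo.
git -C "$work/server/deploy" pull -q --ff-only origin master
deployed=$(cat "$work/server/deploy/index.html")
echo "deployed: $deployed"
```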
Tags
''''
Git automatically fetches tags that point to commits being fetched
during fetch/pull. To fetch all tags (and commits they point to) run
``git fetch --tags origin``. To fetch some specific tags fetch them
explicitly::
$ git fetch origin tag $TAG1 tag $TAG2...
For example::
$ git fetch origin tag 1.4.2
$ git fetch origin v1:v1 tag 2.1.7
Git doesn't automatically push tags. That allows you to have private
tags. To push tags, list them explicitly::
$ git push origin tag 1.4.2
$ git push origin v1 master tag 2.1.7
Or push all tags at once::
$ git push --tags origin
Don't move tags with ``git tag -f`` or remove tags with ``git tag -d``
after they have been published.
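A small sketch of how private tags stay private. The tag name
``private-note`` and the bare repository ``origin.git`` are invented for
the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git          # stands in for the remote server
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "release"
git tag 1.4.2
git tag private-note                   # hypothetical tag that should stay local

git push -q origin master              # pushing a branch sends no tags
tags_after_branch_push=$(git ls-remote --tags origin | wc -l | tr -d ' ')

git push -q origin tag 1.4.2           # publish one tag explicitly
tags_after_tag_push=$(git ls-remote --tags origin | cut -f2)
echo "remote tags: $tags_after_tag_push"
```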
Private information
'''''''''''''''''''
When cloning/fetching/pulling/pushing git copies only database objects
(commits, trees, files and tags) and symbolic references (branches and
lightweight tags). Everything else is private to the repository and
never cloned, updated or pushed. It's your config, your hooks, your
private exclude file.
If you want to distribute hooks, copy them to the working tree, add,
commit, push and instruct the team to update and install the hooks
manually.
Commit editing and caveats
==========================
A warning not to edit published (pushed) commits also appears in
documentation but it's repeated here anyway as it's very important.
It is possible to recover from a forced push but it's a PITA for the
entire team. Please avoid it.
To see what commits have not been published yet compare the head of the
branch with its upstream remote-tracking branch::
$ git log origin/master.. # from origin/master to HEAD (of master)
$ git log origin/v1..v1 # from origin/v1 to the head of v1
For every branch that has an upstream remote-tracking branch git
maintains an alias @{upstream} (short version @{u}), so the commands
above can be given as::
$ git log @{u}..
$ git log v1@{u}..v1
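As a sketch, the same range can be counted with ``git rev-list --count``
before deciding whether commits are still safe to edit. The repository
layout below is a throwaway made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git          # stands in for the remote server
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "published"
git push -q -u origin master

git commit -q --allow-empty -m "local one"
git commit -q --allow-empty -m "local two"

# Commits on the current branch that its upstream does not have yet.
unpublished=$(git rev-list --count '@{u}..')
git log --oneline '@{u}..'

git push -q origin master
remaining=$(git rev-list --count '@{u}..')
echo "before push: $unpublished, after push: $remaining"
```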
To see the status of all branches::
$ git branch -avv
To compare the status of local branches with a remote repo::
$ git remote show origin
Read `how to recover from upstream rebase
<https://git-scm.com/docs/git-rebase#_recovering_from_upstream_rebase>`_.
It is in ``git help rebase``.
On the other hand don't be too afraid about commit editing. You can
safely edit, reorder, remove, combine and split commits that haven't
been pushed yet. You can even push commits to your own (backup) repo,
edit them later and force-push edited commits to replace what has
already been pushed. That is not a problem until commits are in a public
or shared repository.
Undo
====
Whatever you do, don't panic. Almost anything in git can be undone.
git checkout: restore file's content
------------------------------------
``git checkout``, for example, can be used to restore the content of
file(s) to that of a commit. Like this::
$ git checkout HEAD~ README
The command restores the contents of the README file to those of the
last but one commit in the current branch. Without a commit ID
``git checkout README`` restores README from the index, which matches
the latest commit (HEAD) unless changes to the file have been staged.
(Do not use ``git checkout`` to view the content of a file in a commit,
use ``git cat-file -p``; e.g. ``git cat-file -p HEAD~:path/to/README``).
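The following sketch walks through both forms in a scratch repository
(the directory name and file contents are made up). Note that restoring
a path overwrites uncommitted edits to that file.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "first" > README
git add README
git commit -q -m "one"
echo "second" > README
git commit -q -a -m "two"

echo "scratch edit" > README           # uncommitted change
git checkout -q README                 # restore from the index (same as HEAD here)
restored_head=$(cat README)

git checkout -q HEAD~ README           # restore the last but one version
restored_prev=$(cat README)

# Viewing a version without touching the working tree.
shown=$(git cat-file -p HEAD:README)
echo "working tree: $restored_prev, HEAD: $shown"
```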
git reset: remove (non-pushed) commits
--------------------------------------
``git reset`` moves the head of the current branch. The head can be
moved to point to any commit but it's often used to remove a commit or
a few (preferably, non-pushed ones) from the top of the branch - that
is, to move the branch backward in order to undo a few (non-pushed)
commits.
``git reset`` has three modes of operation - soft, hard and mixed.
Default is mixed. ProGit `explains
<https://git-scm.com/book/en/Git-Tools-Reset-Demystified>`_ the
difference very clearly. Bare repositories don't have indices or
working trees so in a bare repo only soft reset is possible.
Unstaging
'''''''''
Mixed mode reset with a path or paths can be used to unstage changes -
that is, to remove from index changes added with ``git add`` for
committing. See `The Book
<https://git-scm.com/book/en/Git-Basics-Undoing-Things>`_ for details
about unstaging and other undo tricks.
git reflog: reference log
-------------------------
Removing commits with ``git reset`` or moving the head of a branch
sounds dangerous and it is. But there is a way to undo: another
reset back to the original commit. Git doesn't remove commits
immediately; unreferenced commits (in git terminology they are called
"dangling commits") stay in the database for some time (default is two
weeks) so you can reset back to it or create a new branch pointing to
the original commit.
For every move of a branch's head - with ``git commit``, ``git
checkout``, ``git fetch``, ``git pull``, ``git rebase``, ``git reset``
and so on - git stores a reference log (reflog for short). For every
move git stores where the head was. Command ``git reflog`` can be used
to view (and manipulate) the log.
In addition to the moves of the head of every branch git stores the
moves of the HEAD - a symbolic reference that (usually) names the
current branch. HEAD is changed with ``git checkout $BRANCH``.
By default ``git reflog`` shows the moves of the HEAD, i.e. the
command is equivalent to ``git reflog HEAD``. To show the moves of the
head of a branch use the command ``git reflog $BRANCH``.
So to undo a ``git reset``, look up the original commit in ``git
reflog``, verify it with ``git show`` or ``git log`` and run ``git
reset $COMMIT_ID``. Git stores the move of the branch's head in the
reflog, so you can undo that undo later again.
In a more complex situation you'd want to move some commits along with
resetting the head of the branch. Cherry-pick them to the new branch.
For example, if you want to reset the branch ``master`` back to the
original commit but preserve two commits created in the current branch
do something like::
$ git branch save-master # create a new branch saving master
$ git reflog # find the original place of master
$ git reset $COMMIT_ID
$ git cherry-pick save-master~ save-master
$ git branch -D save-master # remove temporary branch
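The simple case, undoing a reset with the help of the reflog, can be
sketched as follows. The scratch repository is made up; ``master@{1}``
is the reflog notation for the previous position of ``master``.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git commit -q --allow-empty -m "three"
original=$(git rev-parse HEAD)

git reset -q --hard HEAD~2             # "remove" the two latest commits
after_reset=$(git rev-parse HEAD)

git reflog master                      # inspect where master has been
previous=$(git rev-parse 'master@{1}') # position of master before the reset
git reset -q --hard "$previous"        # undo the reset
restored=$(git rev-parse HEAD)
```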
git revert: revert a commit
---------------------------
``git revert`` reverts a commit or commits, that is, it creates a new
commit or commits that revert(s) the effects of the given commits.
It's the only way to undo published commits (``git commit --amend``,
``git rebase`` and ``git reset`` change the branch in
non-fast-forwardable ways so they should only be used for non-pushed
commits.)
There is a problem with reverting a merge commit. ``git revert`` can
undo the code created by the merge commit but it cannot undo the fact
of merge. See the discussion `How to revert a faulty merge
<https://www.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.html>`_.
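For an ordinary (non-merge) commit the effect can be sketched like this.
The file name and messages are invented for the example.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "good" > file.txt
git add file.txt
git commit -q -m "good change"
echo "bad" > file.txt
git commit -q -a -m "faulty change"
faulty=$(git rev-parse HEAD)

# Create a new commit that undoes the faulty one; history only grows.
git revert --no-edit HEAD >/dev/null
content=$(cat file.txt)
commits=$(git rev-list --count HEAD)
echo "content: $content, commits: $commits"
```

The faulty commit is still part of the history, so the branch remains
fast-forwardable for everyone who has already pulled it.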
One thing that cannot be undone
-------------------------------
Whatever you undo, there is one thing that cannot be undone -
overwritten uncommitted changes. Uncommitted changes don't belong to
git so git cannot help preserving them.
Most of the time git warns you when you're going to execute a command
that overwrites uncommitted changes. Git doesn't allow you to switch
branches with ``git checkout`` if that would overwrite uncommitted
changes. It stops you when you're going to rebase with a non-clean
working tree. It refuses to pull new commits over non-committed files.
But there are commands that do exactly that - overwrite files in the
working tree. Commands like ``git checkout $PATHs`` or ``git reset
--hard`` silently overwrite files including your uncommitted changes.
With that in mind you can understand the stance "commit early, commit
often". Commit as often as possible. Commit on every save in your
editor or IDE. You can edit your commits before pushing - edit commit
messages, change commits, reorder, combine, split, remove. But save
your changes in git database, either commit changes or at least stash
them with ``git stash``.
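A minimal sketch of stashing work in progress and getting it back (the
file name ``notes.txt`` is hypothetical):

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"

echo "work in progress" > notes.txt    # uncommitted change
git stash -q                           # save it in the git database
clean=$(cat notes.txt)                 # working tree is back at HEAD

git stash pop -q                       # bring the change back
restored=$(cat notes.txt)
echo "stashed: $clean, restored: $restored"
```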
Merge or rebase?
================
The Internet is full of heated discussions on the topic: "merge or
rebase?" Most of them are meaningless. When a DVCS is being used in a
big team with a big and complex project with many branches there is
simply no way to avoid merges. So the question's diminished to
"whether to use rebase, and if yes - when to use rebase?" Considering
that it is very much recommended not to rebase published commits the
question's diminished even further: "whether to use rebase on
non-pushed commits?"
That small question is for the team to decide. The author of the PEP
recommends using rebase when pulling, i.e. always do ``git pull
--rebase`` or even configure automatic setup of rebase for every new
branch::
$ git config branch.autosetuprebase always
and configure rebase for existing branches::
$ git config branch.$NAME.rebase true
For example::
$ git config branch.v1.rebase true
$ git config branch.master.rebase true
After that ``git pull origin master`` becomes equivalent to ``git pull
--rebase origin master``.
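The effect of the setting can be sketched with two local clones of a
throwaway bare repository. The names ``origin.git``, ``python`` and
``other`` and the second user are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git          # stands in for the remote server
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "initial"
git push -q -u origin master

# Somebody else publishes a commit in the meantime.
git clone -q -b master "$work/origin.git" "$work/other"
cd "$work/other"
git config user.name "Other User"
git config user.email other.user@example.org
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"
git push -q origin master

# Local work that has not been pushed yet.
cd "$work/python"
echo "local" > local.txt
git add local.txt
git commit -q -m "local work"

git config branch.master.rebase true
git pull -q origin master              # rebases instead of merging
merges=$(git rev-list --merges --count HEAD)
total=$(git rev-list --count HEAD)
top=$(git log -1 --format=%s)
echo "merge commits: $merges, total: $total, top: $top"
```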
It is recommended to create new commits in a separate feature or topic
branch while using rebase to update the mainline branch. When the
topic branch is ready merge it into mainline. To avoid a tedious task
of resolving large number of conflicts at once you can merge the topic
branch to the mainline from time to time and switch back to the topic
branch to continue working on it. The entire workflow would be
something like::
$ git checkout -b issue-42 # create a new issue branch and switch to it
...edit/test/commit...
$ git checkout master
$ git pull --rebase origin master # update master from the upstream
$ git merge issue-42
$ git branch -d issue-42 # delete the topic branch
$ git push origin master
When the topic branch is deleted only the label is removed; the commits
stay in the database, and they are now merged into master::
o--o--o--o--o--M--< master - the mainline branch
    \         /
     --*--*--*             - the topic branch, now unnamed
The topic branch is deleted to avoid cluttering branch namespace with
small topic branches. Information on what issue was fixed or what
feature was implemented should be in the commit messages.
Null-merges
===========
Git has a builtin merge strategy for what Python core developers call
"null-merge"::
$ git merge -s ours v1 # null-merge v1 into master
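A sketch of what the null-merge does, in a scratch repository with
made-up file names: the merge is recorded in history, but the tree of
``master`` is left exactly as it was.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "shared" > file.txt
git add file.txt
git commit -q -m "initial"
git branch v1
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "master work"
git checkout -q v1
echo "v1-only fix" > file.txt
git commit -q -a -m "v1 fix"
git checkout -q master

tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "null-merge v1" v1
tree_after=$(git rev-parse 'HEAD^{tree}')
content=$(cat file.txt)
echo "file.txt after null-merge: $content"
```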
Branching models
================
Git doesn't assume any particular development model regarding
branching and merging. Some projects prefer to graduate patches from
the oldest branch to the newest, some prefer to cherry-pick commits
backwards, some use squashing (combining a number of commits into
one). Anything is possible.
There are a few examples to start with. `git help workflows
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_
describes how the very git authors develop git.
ProGit book has a few chapters devoted to branch management in
different projects: `Git Branching - Branching Workflows
<https://git-scm.com/book/en/Git-Branching-Branching-Workflows>`_ and
`Distributed Git - Contributing to a Project
<https://git-scm.com/book/en/Distributed-Git-Contributing-to-a-Project>`_.
There is also a well-known article `A successful Git branching model
<http://nvie.com/posts/a-successful-git-branching-model/>`_ by Vincent
Driessen. It recommends a set of very detailed rules on creating and
managing mainline, topic and bugfix branches. To support the model the
author implemented `git flow <https://github.com/nvie/gitflow>`_
extension.
Advanced configuration
======================
Line endings
------------
Git has builtin mechanisms to handle line endings between platforms
with different end-of-line styles. To allow git to do CRLF conversion
assign ``text`` attribute to files using `.gitattributes
<https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html>`_.
For files that have to have specific line endings assign ``eol``
attribute. For binary files the attribute is, naturally, ``binary``.
For example::
$ cat .gitattributes
*.py text
*.txt text
*.png binary
/readme.txt eol=crlf
To check what attributes git uses for files use ``git check-attr``
command. For example::
$ git check-attr -a -- \*.py
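A sketch of querying individual attributes. The file names
(``setup.py``, ``logo.png``) are hypothetical and do not need to exist
for ``git check-attr`` to answer; the ``eol`` value is written in lower
case here.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
# Attributes similar to the example above.
printf '%s\n' '*.py text' '*.txt text' '*.png binary' '/readme.txt eol=crlf' \
    > .gitattributes

py_text=$(git check-attr text -- setup.py)
png_text=$(git check-attr text -- logo.png)   # "binary" is a macro that unsets text
readme_eol=$(git check-attr eol -- readme.txt)
echo "$py_text"
echo "$png_text"
echo "$readme_eol"
```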
Advanced topics
===============
Staging area
------------
Staging area aka index aka cache is a distinguishing feature of git.
Staging area is where git collects patches before committing them.
Separation between collecting patches and commit phases provides a
very useful feature of git: you can review collected patches before
commit and even edit them - remove some hunks, add new hunks and
review again.
To add files to the index use ``git add``. Collecting patches before
committing means you need to do that for every change, not only to add
new (untracked) files. To simplify committing in case you just want to
commit everything without reviewing run ``git commit --all`` (or just
``-a``) - the command adds every changed tracked file to the index and
then commits. To commit a file or files regardless of patches collected
in the index run ``git commit [--only|-o] -- $FILE...``.
To add hunks of patches to the index use ``git add --patch`` (or just
``-p``). To remove collected files from the index use ``git reset HEAD
-- $FILE...`` To add/inspect/remove collected hunks use ``git add
--interactive`` (``-i``).
To see the diff between the index and the last commit (i.e., collected
patches) use ``git diff --cached``. To see the diff between the
working tree and the index (i.e., uncollected patches) use just ``git
diff``. To see the diff between the working tree and the last commit
(i.e., both collected and uncollected patches) run ``git diff HEAD``.
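The three comparisons can be sketched with one file that has a
different content in the last commit, in the index and in the working
tree. The helper ``changes`` is a made-up function that keeps only the
changed lines of a diff.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo "one" > file.txt
git add file.txt
git commit -q -m "one"
echo "two" > file.txt
git add file.txt                       # collected: "two" is in the index
echo "three" > file.txt                # uncollected: only in the working tree

# Hypothetical helper: keep only added/removed content lines of a diff.
changes() { grep '^[-+][a-z]' | tr '\n' ' '; }

cached=$(git diff --cached | changes)  # index vs. last commit
working=$(git diff | changes)          # working tree vs. index
head=$(git diff HEAD | changes)        # working tree vs. last commit
echo "cached: $cached"
echo "working: $working"
echo "HEAD: $head"
```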
See `WhatIsTheIndex
<https://git.wiki.kernel.org/index.php/WhatIsTheIndex>`_ and
`IndexCommandQuickref
<https://git.wiki.kernel.org/index.php/IndexCommandQuickref>`_ in Git
Wiki.
ReReRe
======
Rerere is a mechanism that helps to resolve repeated merge conflicts.
The most frequent source of recurring merge conflicts is topic
branches that are merged into mainline and then have the merge commits
removed. That's often done to test the topic branches and train
rerere; the merge commits are removed to keep a clean linear history and
finish the topic branch with only one last merge commit.
Rerere works by remembering the states of tree before and after a
successful commit. That way rerere can automatically resolve conflicts
if they appear in the same files.
Rerere can be used manually with ``git rerere`` command but most often
it's used automatically. Enable rerere with these commands in a
working tree::
$ git config rerere.enabled true
$ git config rerere.autoupdate true
You don't need to turn rerere on globally - you don't want rerere in
bare repositories or single-branch repositories; you only need rerere
in repos where you often perform merges and resolve merge conflicts.
See `Rerere <https://git-scm.com/book/en/Git-Tools-Rerere>`_ in The
Book.
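The train-and-replay cycle can be sketched in a scratch repository. All
names and file contents below are invented; the conflicting merges are
expected to stop with an error, hence the ``|| true``.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git config rerere.enabled true
git config rerere.autoupdate true
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b topic
echo "topic" > file.txt
git commit -q -a -m "topic change"
git checkout -q master
echo "master" > file.txt
git commit -q -a -m "master change"

# Test merge: it conflicts, so resolve by hand; rerere records the resolution.
git merge -q topic >/dev/null 2>&1 || true
echo "resolved" > file.txt
git add file.txt
git commit -q -m "test merge"

# Remove the test merge and merge again: the same conflict reappears
# and rerere replays the recorded resolution.
git reset -q --hard HEAD~
git merge -q topic >/dev/null 2>&1 || true
replayed=$(cat file.txt)
echo "file.txt after the second merge: $replayed"
```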
Database maintenance
====================
The git object database and other files/directories under ``.git`` require
periodic maintenance and cleanup. For example, commit editing leaves
unreferenced objects (dangling objects, in git terminology) and these
objects should be pruned to avoid collecting cruft in the DB. The
command ``git gc`` is used for maintenance. Git automatically runs
``git gc --auto`` as a part of some commands to do quick maintenance.
Users are recommended to run ``git gc --aggressive`` from time to
time; ``git help gc`` recommends running it every few hundred
changesets; for more intensive projects it should be something like
once a week and less frequently (biweekly or monthly) for less
active projects.
``git gc --aggressive`` not only removes dangling objects, it also
repacks object database into indexed and better optimized pack(s); it
also packs symbolic references (branches and tags). Another way to do
it is to run ``git repack``.
There is a well-known `message
<https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html>`_ from Linus
Torvalds regarding "stupidity" of ``git gc --aggressive``. The message
can safely be ignored now. It is old and outdated; ``git gc
--aggressive`` has become much better since that time.
For those who still prefer ``git repack`` over ``git gc --aggressive``
the recommended parameters are ``git repack -a -d -f --depth=20
--window=250``. See `this detailed experiment
<http://vcscompare.blogspot.ru/2008/06/git-repack-parameters.html>`_
for explanation of the effects of these parameters.
From time to time run ``git fsck [--strict]`` to verify integrity of
the database. ``git fsck`` may produce a list of dangling objects;
that's not an error, just a reminder to perform regular maintenance.
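A sketch of a maintenance run in a scratch repository. It uses
``git count-objects -v`` to show that loose objects end up in a pack;
the repository and its content are made up.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "revision $i"
done

# Number of loose (unpacked) objects before and after maintenance.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

# Verify the integrity of the database; a healthy repo exits with 0.
git fsck --strict >/dev/null 2>&1
fsck_status=$?
echo "loose objects: $loose_before -> $loose_after, packs: $packs"
```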
Tips and tricks
===============
Command-line options and arguments
----------------------------------
`git help cli
<https://www.kernel.org/pub/software/scm/git/docs/gitcli.html>`_
recommends not to combine short options/flags. Most of the time
combining works: ``git commit -av`` works perfectly, but there are
situations when it doesn't. E.g., ``git log -p -5`` cannot be combined
as ``git log -p5``.
Some options have arguments, some even have default arguments. In that
case the argument for such option must be spelled in a sticky way:
``-Oarg``, never ``-O arg`` because for an option that has a default
argument the latter means "use default value for option ``-O`` and
pass ``arg`` further to the option parser". For example, ``git grep``
has an option ``-O`` that passes a list of names of the found files to
a program; default program for ``-O`` is a pager (usually ``less``),
but you can use your editor::
$ git grep -Ovim # but not -O vim
BTW, if git is instructed to use ``less`` as the pager (i.e., if pager
is not configured in git at all it uses ``less`` by default, or if it
gets ``less`` from GIT_PAGER or PAGER environment variables, or if it
was configured with ``git config --global core.pager less``, or
``less`` is used in the command ``git grep -Oless``) ``git grep``
passes ``+/$pattern`` option to ``less`` which is quite convenient.
Unfortunately, ``git grep`` doesn't pass the pattern if the pager is
not exactly ``less``, even if it's ``less`` with parameters (something
like ``git config --global core.pager 'less -FRSXgimq'``); fortunately,
``git grep -Oless`` always passes the pattern.
bash/zsh completion
-------------------
It's a bit hard to type ``git rebase --interactive --preserve-merges
HEAD~5`` manually even for those who are happy to use command-line,
and this is where shell completion is of great help. Bash/zsh come
with programmable completion, often automatically installed and
enabled, so if you have bash/zsh and git installed, chances are you
are already done - just go and use it at the command-line.
If you don't have the necessary bits installed, install and enable the
bash_completion package. If you want to upgrade your git completion to
the latest and greatest, download the necessary file from `git contrib
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion>`_.
Git-for-windows comes with git-bash for which bash completion is
installed and enabled.
bash/zsh prompt
---------------
For command-line lovers shell prompt can carry a lot of useful
information. To include git information in the prompt use
`git-prompt.sh
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion/git-prompt.sh>`_.
Read the detailed instructions in the file.
Search the Net for "git prompt" to find other prompt variants.
git on server
=============
The simplest way to publish a repository or a group of repositories is
``git daemon``. The daemon provides anonymous access; by default it is
read-only. The repositories are accessible by git protocol (git://
URLs). Write access can be enabled but the protocol lacks any
authentication means, so it should be enabled only within a trusted
LAN. See ``git help daemon`` for details.
Git over ssh provides authentication and repo-level authorisation, as
repositories can be made user- or group-writeable (see parameter
``core.sharedRepository`` in ``git help config``). If that's too
permissive or too restrictive for some project's needs, there is a
wrapper `gitolite <http://gitolite.com/gitolite/index.html>`_ that can
be configured to allow access with great granularity; gitolite is
written in Perl and has a lot of documentation.
A web interface to browse repositories can be created using `gitweb
<https://git.kernel.org/cgit/git/git.git/tree/gitweb>`_ or `cgit
<http://git.zx2c4.com/cgit/about/>`_. Both are CGI scripts (written in
Perl and C, respectively). In addition to the web interface, both
provide read-only dumb http access for git (http(s):// URLs).
There are also more advanced web-based development environments that
include the ability to manage users, groups and projects; private,
group-accessible and public repositories; they often include issue
trackers, wiki pages, pull requests and other tools for development
and communication. Among these environments are `Kallithea
<https://kallithea-scm.org/>`_ and `pagure <https://pagure.io/>`_,
both written in Python; pagure was written by Fedora developers
and is being used to develop some Fedora projects. `Gogs
<http://gogs.io/>`_ is written in Go; there is a fork `Gitea
<http://gitea.io/>`_.
And last but not least, `Gitlab <https://about.gitlab.com/>`_. It's
perhaps the most advanced web-based development environment for git.
It is written in Ruby; the community edition is free and open source
(MIT license).
From Mercurial to git
=====================
There are many tools to convert Mercurial repositories to git. The
most famous are, probably, `hg-git <https://hg-git.github.io/>`_ and
`fast-export <http://repo.or.cz/w/fast-export.git>`_ (many years ago
it was known under the name ``hg2git``).
But a better tool, perhaps the best, is `git-remote-hg
<https://github.com/felipec/git-remote-hg>`_. It provides transparent
bidirectional (pull and push) access to Mercurial repositories from
git. Its author wrote a `comparison of alternatives
<https://github.com/felipec/git/wiki/Comparison-of-git-remote-hg-alternatives>`_
that seems to be mostly objective.
To use git-remote-hg, install or clone it, add it to your PATH (or
copy the script ``git-remote-hg`` to a directory that's already in
PATH) and prepend ``hg::`` to Mercurial URLs. For example::
$ git clone https://github.com/felipec/git-remote-hg.git
$ PATH=$PATH:"`pwd`"/git-remote-hg
$ git clone hg::https://hg.python.org/peps/ PEPs
To work with the repository just use regular git commands including
``git fetch/pull/push``.
To start converting your Mercurial habits to git, see the page
`Mercurial for Git users
<https://mercurial.selenic.com/wiki/GitConcepts>`_ on the Mercurial
wiki. The second half of the page has a table that lists
corresponding Mercurial and git commands. The table should work in
both directions.
The Python Developer's Guide also has a chapter `Mercurial for git
developers <https://docs.python.org/devguide/gitdevs.html>`_ that
documents a few differences between git and hg.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
vim: set fenc=us-ascii tw=70 :


@ -66,7 +66,7 @@ Features for 3.5
* PEP 479, change StopIteration handling inside generators
* PEP 484, the typing module, a new standard for type annotations
* PEP 485, math.isclose(), a function for testing approximate equality
* PEP 486, making the Widnows Python launcher aware of virtual environments
* PEP 486, making the Windows Python launcher aware of virtual environments
* PEP 488, eliminating .pyo files
* PEP 489, a new and improved mechanism for loading extension modules
* PEP 492, coroutines with async and await syntax


@ -404,6 +404,19 @@ where ``delta`` is the size of the fold or the gap.
Temporal Arithmetic and Comparison Operators
============================================
.. epigraph::
| In *mathematicks* he was greater
| Than Tycho Brahe, or Erra Pater:
| For he, by geometric scale,
| Could take the size of pots of ale;
| Resolve, by sines and tangents straight,
| If bread or butter wanted weight,
| And wisely tell what hour o' th' day
| The clock does strike by algebra.
-- "Hudibras" by Samuel Butler
The value of the ``fold`` attribute will be ignored in all operations
with naive datetime instances. As a consequence, naive
``datetime.datetime`` or ``datetime.time`` instances that differ only


@ -8,7 +8,7 @@ Type: Standards Track
Content-Type: text/x-rst
Created: 01-Aug-2015
Python-Version: 3.6
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015
Resolution: https://mail.python.org/pipermail/python-dev/2015-September/141526.html
Abstract
@ -201,6 +201,11 @@ braces ``'{{'`` or ``'}}'`` inside literal portions of an f-string are
replaced by the corresponding single brace. Doubled opening braces do
not signify the start of an expression.
Note that ``__format__()`` is not called directly on each value. The
actual code uses the equivalent of ``type(value).__format__(value,
format_spec)``, or ``format(value, format_spec)``. See the
documentation of the builtin ``format()`` function for more details.
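The difference is visible only when an instance carries its own
``__format__`` attribute. A minimal sketch, assuming a class named
``Plain`` that is invented purely for illustration, suggests that
``format()`` looks the method up on the type:

```python
class Plain:
    """Hypothetical class used only for this illustration."""

    def __format__(self, format_spec):
        return "from the type"


value = Plain()
# Shadow the method on the instance; format() should ignore this attribute.
value.__format__ = lambda format_spec: "from the instance"

via_builtin = format(value, "")                # lookup on type(value)
via_type = type(value).__format__(value, "")   # the explicit equivalent
via_instance = value.__format__("")            # direct call on the instance
print(via_builtin, via_type, via_instance, sep=" | ")
```

Only the direct call on the instance picks up the shadowing attribute.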
Comments, using the ``'#'`` character, are not allowed inside an
expression.
@ -209,7 +214,7 @@ specified. The allowed conversions are ``'!s'``, ``'!r'``, or
``'!a'``. These are treated the same as in ``str.format()``: ``'!s'``
calls ``str()`` on the expression, ``'!r'`` calls ``repr()`` on the
expression, and ``'!a'`` calls ``ascii()`` on the expression. These
conversions are applied before the call to ``__format__``. The only
conversions are applied before the call to ``format()``. The only
reason to use ``'!s'`` is if you want to specify a format specifier
that applies to ``str``, not to the type of the expression.
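As a rough illustration of the conversions, the sketch below uses an
arbitrary ``datetime.date`` and a sample word (both chosen only for
this example). Without ``'!s'``, the format specifier would go to
``date.__format__``, which treats it as a ``strftime`` pattern rather
than as an alignment request:

```python
import datetime

release_day = datetime.date(2015, 9, 20)  # arbitrary sample value
word = "caf\u00e9"                        # sample text with a non-ASCII letter

# With '!s' the specifier applies to the str() result, so '>12' right-aligns it.
aligned = f"{release_day!s:>12}"
# The same thing spelled out with the builtins.
spelled_out = format(str(release_day), ">12")

# '!r' and '!a' behave the same way with repr() and ascii().
quoted = f"{word!r:<8}|"
escaped = f"{word!a}"
print(aligned, spelled_out, quoted, escaped, sep="\n")
```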
@ -222,9 +227,9 @@ So, an f-string looks like::
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
The resulting expression's ``__format__`` method is called with the
format specifier as an argument. The resulting value is used when
building the value of the f-string.
The expression is then formatted using the ``__format__`` protocol,
using the format specifier as an argument. The resulting value is
used when building the value of the f-string.
Expressions cannot contain ``':'`` or ``'!'`` outside of strings or
parentheses, brackets, or braces. The exception is that the ``'!='``
@ -293,7 +298,7 @@ For example, this code::
Might be evaluated as::
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
'abc' + format(expr1, spec1) + format(repr(expr2)) + 'def' + format(str(expr3)) + 'ghi'
Expression evaluation
---------------------
@ -371,7 +376,7 @@ yields the value::
While the exact method of this run time concatenation is unspecified,
the above code might evaluate to::
'ab' + x.__format__('') + '{c}' + 'str<' + y.__format__('^4') + '>de'
'ab' + format(x) + '{c}' + 'str<' + format(y, '^4') + '>de'
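The equivalence can be checked with a short sketch. The values of
``x`` and ``y`` are sample assumptions, and the hand-written
concatenation is only one possible evaluation, not necessarily what an
implementation does:

```python
x = 10      # sample value
y = "hi"    # sample value

# Adjacent literals: only the pieces carrying the f prefix are interpolated,
# so the plain '{c}' literal keeps its braces.
joined = "a" "b" f"{x}" "{c}" f"str<{y:^4}>" "d" "e"

# One way the same value could be assembled by hand with format().
by_hand = "ab" + format(x) + "{c}" + "str<" + format(y, "^4") + ">de"
print(joined)
```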
Each f-string is entirely evaluated before being concatenated to
adjacent f-strings. That means that this::


@ -1,44 +1,46 @@
PEP: 502
Title: String Interpolation Redux
Title: String Interpolation - Extended Discussion
Version: $Revision$
Last-Modified: $Date$
Author: Mike G. Miller
Status: Draft
Type: Standards Track
Type: Informational
Content-Type: text/x-rst
Created: 10-Aug-2015
Python-Version: 3.6
Note: Open issues below are stated with a question mark (?),
and are therefore searchable.
Abstract
========
This proposal describes a new string interpolation feature for Python,
called an *expression-string*,
that is both concise and powerful,
improves readability in most cases,
yet does not conflict with existing code.
PEP 498: *Literal String Interpolation*, which proposed "formatted strings", was
accepted September 9th, 2015.
Additional background and rationale given during its design phase is detailed
below.
To recap that PEP,
a string prefix was introduced that marks the string as a template to be
rendered.
These formatted strings may contain one or more expressions
built on `the existing syntax`_ of ``str.format()``.
The formatted string expands at compile-time into a conventional string format
operation,
with the given expressions from its text extracted and passed instead as
positional arguments.
To achieve this end,
a new string prefix is introduced,
which expands at compile-time into an equivalent expression-string object,
with requested variables from its context passed as keyword arguments.
At runtime,
the new object uses these passed values to render a string to given
specifications, building on `the existing syntax`_ of ``str.format()``::
the resulting expressions are evaluated to render a string to given
specifications::
>>> location = 'World'
>>> e'Hello, {location} !' # new prefix: e''
'Hello, World !' # interpolated result
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
Format-strings may be thought of as merely syntactic sugar to simplify traditional
calls to ``str.format()``.
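A small comparison, offered as a sketch rather than as the actual
expansion performed by the compiler, shows the formatted string next
to the traditional calls it stands in for:

```python
location = "World"

# The formatted string, with the expression written in place.
sugar = f"Hello, {location} !"

# Roughly equivalent traditional calls: the expression is passed separately.
positional = "Hello, {} !".format(location)
by_name = "Hello, {location} !".format(location=location)
print(sugar)
```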
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
This PEP does not recommend removing or deprecating any of the existing string
formatting mechanisms.
Motivation
==========
@ -50,12 +52,16 @@ In comparison to other dynamic scripting languages
with similar use cases,
the amount of code necessary to build similar strings is substantially higher,
while at times offering lower readability due to verbosity, dense syntax,
or identifier duplication. [1]_
or identifier duplication.
These difficulties are described at moderate length in the original
`post to python-ideas`_
that started the snowball (that became PEP 498) rolling. [1]_
Furthermore, replacement of the print statement with the more consistent print
function of Python 3 (PEP 3105) has added one additional minor burden,
an additional set of parentheses to type and read.
Combined with the verbosity of current formatting solutions,
Combined with the verbosity of current string formatting solutions,
this puts an otherwise simple language at an unfortunate disadvantage to its
peers::
@ -66,7 +72,7 @@ peers::
# Python 3, str.format with named parameters
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
# Python 3, variation B, worst case
# Python 3, worst case
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
id=id,
hostname=
@ -74,7 +80,7 @@ peers::
In Python, the formatting and printing of a string with multiple variables in a
single line of code of standard width is noticeably harder and more verbose,
indentation often exacerbating the issue.
with indentation exacerbating the issue.
For use cases such as smaller projects, systems programming,
shell script replacements, and even one-liners,
@ -82,36 +88,17 @@ where message formatting complexity has yet to be encapsulated,
this verbosity has likely led a significant number of developers and
administrators to choose other languages over the years.
.. _post to python-ideas: https://mail.python.org/pipermail/python-ideas/2015-July/034659.html
Rationale
=========
Naming
------
The term expression-string was chosen because other applicable terms,
such as format-string and template are already well used in the Python standard
library.
The string prefix itself, ``e''`` was chosen to demonstrate that the
specification enables expressions,
is not limited to ``str.format()`` syntax,
and also does not lend itself to `the shorthand term`_ "f-string".
It is also slightly easier to type than other choices such as ``_''`` and
``i''``,
while perhaps `less odd-looking`_ to C-developers.
``printf('')`` vs. ``print(f'')``.
.. _the shorthand term: reference_needed
.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html
Goals
-------------
The design goals of expression-strings are as follows:
The design goals of format strings are as follows:
#. Eliminate need to pass variables manually.
#. Eliminate repetition of identifiers and redundant parentheses.
@ -133,40 +120,44 @@ Python specified both single (``'``) and double (``"``) ASCII quote
characters to enclose strings.
It is not reasonable to choose one of them now to enable interpolation,
while leaving the other for uninterpolated strings.
"Backtick" characters (`````) are also `constrained by history`_ as a shortcut
for ``repr()``.
Other characters,
such as the "Backtick" (or grave accent `````) are also
`constrained by history`_
as a shortcut for ``repr()``.
This leaves a few remaining options for the design of such a feature:
* An operator, as in printf-style string formatting via ``%``.
* A class, such as ``string.Template()``.
* A function, such as ``str.format()``.
* New syntax
* A method or function, such as ``str.format()``.
* New syntax, or
* A new string prefix marker, such as the well-known ``r''`` or ``u''``.
The first three options above currently work well.
The first three options above are mature.
Each has specific use cases and drawbacks,
yet also suffers from the verbosity and visual noise mentioned previously.
All are discussed in the next section.
All options are discussed in the next sections.
.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
Background
-------------
This proposal builds on several existing techniques and proposals and what
Formatted strings build on several existing techniques and proposals and what
we've collectively learned from them.
In keeping with the design goals of readability and error-prevention,
the following examples therefore use named,
not positional arguments.
The following examples focus on the design goals of readability and
error-prevention using named parameters.
Let's assume we have the following dictionary,
and would like to print out its items as an informative string for end users::
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
Printf-style formatting
'''''''''''''''''''''''
Printf-style formatting, via operator
'''''''''''''''''''''''''''''''''''''
This `venerable technique`_ continues to have its uses,
such as with byte-based protocols,
@ -178,7 +169,7 @@ and familiarity to many programmers::
In this form, considering the prerequisite dictionary creation,
the technique is verbose, a tad noisy,
and relatively readable.
yet relatively readable.
Additional issues are that an operator can only take one argument besides the
original string,
meaning multiple parameters must be passed in a tuple or dictionary.
@ -190,8 +181,8 @@ or forget the trailing type, e.g. (``s`` or ``d``).
.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
string.Template
'''''''''''''''
string.Template Class
'''''''''''''''''''''
The ``string.Template`` `class from`_ PEP 292
(Simpler String Substitutions)
@ -202,7 +193,7 @@ that finds its main use cases in shell and internationalization tools::
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
Also verbose, however the string itself is readable.
While also verbose, the string itself is readable.
Though functionality is limited,
it meets its requirements well.
It isn't powerful enough for many cases,
@ -232,8 +223,8 @@ and likely contributed to the PEP's lack of acceptance.
It was superseded by the following proposal.
str.format()
''''''''''''
str.format() Method
'''''''''''''''''''
The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the
existing options.
@ -253,36 +244,32 @@ string literals::
host=hostname)
'Hello, user: nobody, id: 9, on host: darkstar'
The verbosity of the method-based approach is illustrated here.
.. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax
PEP 498 -- Literal String Formatting
''''''''''''''''''''''''''''''''''''
PEP 498 discusses and delves partially into implementation details of
expression-strings,
which it calls f-strings,
the idea and syntax
(with exception of the prefix letter)
of which is identical to that discussed here.
The resulting compile-time transformation however
returns a string joined from parts at runtime,
rather than an object.
It also, somewhat controversially to those first exposed to it,
introduces the idea that these strings shall be augmented with support for
arbitrary expressions,
which is discussed further in the following sections.
PEP 498 defines and discusses format strings,
as also described in the `Abstract`_ above.
It also, somewhat controversially to those first exposed,
introduces the idea that format-strings shall be augmented with support for
arbitrary expressions.
This is discussed further in the
Restricting Syntax section under
`Rejected Ideas`_.
PEP 501 -- Translation ready string interpolation
'''''''''''''''''''''''''''''''''''''''''''''''''
The complementary PEP 501 brings internationalization into the discussion as a
first-class concern, with its proposal of i-strings,
first-class concern, with its proposal of the i-prefix,
``string.Template`` syntax integration compatible with ES6 (Javascript),
deferred rendering,
and a similar object return value.
and an object return value.
Implementations in Other Languages
@ -374,7 +361,8 @@ ES6 (Javascript)
Designers of `Template strings`_ faced the same issue as Python where single
and double quotes were taken.
Unlike Python however, "backticks" were not.
They were chosen as part of the ECMAScript 2015 (ES6) standard::
Despite `their issues`_,
they were chosen as part of the ECMAScript 2015 (ES6) standard::
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
@ -391,8 +379,10 @@ as the tag::
* User implemented prefixes supported.
* Arbitrary expressions are supported.
.. _their issues: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
C#, Version 6
'''''''''''''
@ -428,13 +418,14 @@ Arbitrary `interpolation under Swift`_ is available on all strings::
Additional examples
'''''''''''''''''''
A number of additional examples may be `found at Wikipedia`_.
A number of additional examples of string interpolation may be
`found at Wikipedia`_.
Now that background and history have been covered,
let's continue on for a solution.
.. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
Now that background and implementation history have been covered,
let's continue on for a solution.
New Syntax
----------
@ -442,178 +433,47 @@ New Syntax
This should be an option of last resort,
as every new syntax feature has a cost in terms of real-estate in a brain it
inhabits.
There is one alternative left on our list of possibilities,
There is however one alternative left on our list of possibilities,
which follows.
New String Prefix
-----------------
Given the history of string formatting in Python,
backwards-compatibility,
Given the history of string formatting in Python and backwards-compatibility,
implementations in other languages,
and the avoidance of new syntax unless necessary,
avoidance of new syntax unless necessary,
an acceptable design is reached through elimination
rather than unique insight.
Therefore, we choose to explicitly mark interpolated string literals with a
string prefix.
Therefore, marking interpolated string literals with a string prefix is chosen.
We also choose an expression syntax that reuses and builds on the strongest of
We also choose an expression syntax that reuses and builds on the strongest of
the existing choices,
``str.format()`` to avoid further duplication.
Specification
=============
String literals with the prefix of ``e`` shall be converted at compile-time to
the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object.
Strings and values are parsed from the literal and passed as tuples to the
constructor::
``str.format()`` to avoid further duplication of functionality::
>>> location = 'World'
>>> e'Hello, {location} !'
>>> f'Hello, {location} !' # new prefix: f''
'Hello, World !' # interpolated result
# becomes
# estr('Hello, {location} !', # template
('Hello, ', ' !'), # string fragments
('location',), # expressions
('World',), # values
)
The object interpolates its result immediately at run-time::
'Hello, World !'
PEP 498 -- Literal String Formatting, delves into the mechanics and
implementation of this design.
ExpressionString Objects
------------------------
The ExpressionString object supports both immediate and deferred rendering of
its given template and parameters.
It does this by immediately rendering its inputs to its internal string and
``.rendered`` string member (still necessary?),
useful in the majority of use cases.
To allow for deferred rendering and caller-specified escaping,
all inputs are saved for later inspection,
with convenience methods available.
Notes:
* Inputs are saved to the object as ``.template`` and ``.context`` members
for later use.
* No explicit ``str(estr)`` call is necessary to render the result,
though doing so might be desired to free resources if significant.
* Additional or deferred rendering is available through the ``.render()``
method, which allows template and context to be overriden for flexibility.
* Manual escaping of potentially dangerous input is available through the
``.escape(escape_function)`` method,
the rules of which may therefore be specified by the caller.
The given function should both accept and return a single modified string.
* A sample Python implementation can `found at Bitbucket`_:
.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py
Inherits From ``str`` Type
'''''''''''''''''''''''''''
Inheriting from the ``str`` class is one of the techniques available to improve
compatibility with code expecting a string object,
as it will pass an ``isinstance(obj, str)`` test.
ExpressionString implements this and also renders its result into the "raw"
string of its string superclass,
providing compatibility with a majority of code.
Interpolation Syntax
--------------------
The strongest of the existing string formatting syntaxes is chosen,
``str.format()`` as a base to build on. [10]_ [11]_
..
* Additionally, single arbitrary expressions shall also be supported inside
braces as an extension::
>>> e'My age is {age + 1} years.'
See below for section on safety.
* Triple quoted strings with multiple lines shall be supported::
>>> e'''Hello,
{location} !'''
'Hello,\n World !'
* Adjacent implicit concatenation shall be supported;
interpolation does not `not bleed into`_ other strings::
>>> 'Hello {1, 2, 3} ' e'{location} !'
'Hello {1, 2, 3} World !'
* Additional implementation details,
for example expression and error-handling,
are specified in the compatible PEP 498.
.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html
Composition with Other Prefixes
-------------------------------
* Expression-strings apply to unicode objects only,
therefore ``u''`` is never needed.
Should it be prevented?
* Bytes objects are not included here and do not compose with e'' as they
do not support ``__format__()``.
* Complimentary to raw strings,
backslash codes shall not be converted in the expression-string,
when combined with ``r''`` as ``re''``.
Examples
--------
A more complicated example follows::
n = 5; # t0, t1 = … TODO
a = e"Sliced {n} onions in {t1-t0:.3f} seconds."
# returns the equvalent of
estr("Sliced {n} onions in {t1-t0:.3f} seconds", # template
('Sliced ', ' onions in ', ' seconds'), # strings
('n', 't1-t0:.3f'), # expressions
(5, 0.555555) # values
)
With expressions only::
b = e"Three random numbers: {rand()}, {rand()}, {rand()}."
# returns the equvalent of
estr("Three random numbers: {rand():f}, {rand():f}, {rand():}.", # template
('Three random numbers: ', ', ', ', ', '.'), # strings
('rand():f', 'rand():f', 'rand():f'), # expressions
(rand(), rand(), rand()) # values
)
Additional Topics
=================
Safety
-----------
In this section we will describe the safety situation and precautions taken
in support of expression-strings.
in support of format-strings.
#. Only string literals shall be considered here,
#. Only string literals have been considered for format-strings,
not variables to be taken as input or passed around,
making external attacks difficult to accomplish.
* ``str.format()`` `already handles`_ this use-case.
* Direct instantiation of the ExpressionString object with non-literal input
shall not be allowed. (Practicality?)
``str.format()`` and alternatives `already handle`_ this use-case.
#. Neither ``locals()`` nor ``globals()`` are necessary nor used during the
transformation,
@ -622,37 +482,72 @@ in support of expression-strings.
#. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion
depth, recursive interpolation is not supported.
#. Restricted characters or expression classes?, such as ``=`` for assignment.
However,
mistakes or malicious code could be missed inside string literals.
Though that can be said of code in general,
that these expressions are inside strings means they are a bit more likely
to be obscured.
.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
.. _already handle: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
Mitigation via tools
Mitigation via Tools
''''''''''''''''''''
The idea is that tools or linters such as pyflakes, pylint, or Pycharm,
could check inside strings for constructs that exceed project policy.
As this is a common task with languages these days,
tools won't have to implement this feature solely for Python,
may check inside strings with expressions and mark them up appropriately.
As this is a common task with programming languages today,
multi-language tools won't have to implement this feature solely for Python,
significantly shortening time to implementation.
Additionally the Python interpreter could check(?) and warn with appropriate
command-line parameters passed.
Farther in the future,
strings might also be checked for constructs that exceed the safety policy of
a project.
Style Guide/Precautions
-----------------------
As arbitrary expressions may accomplish anything a Python expression is
able to,
it is highly recommended to avoid constructs inside format-strings that could
cause side effects.
Further guidelines may be written once usage patterns and true problems are
known.
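To make the concern concrete, here is a hedged sketch with invented
names (``queue``, ``pending``) in which rendering a message also
mutates a list, followed by a form that keeps the mutation in its own
statement:

```python
queue = ["first", "second"]

# Discouraged: rendering the message also removes an item from the list.
message = f"Handling {queue.pop(0)}, {len(queue)} left"

# Clearer: mutate in a statement, then format plain names.
pending = ["first", "second"]
current = pending.pop(0)
clear_message = f"Handling {current}, {len(pending)} left"
print(message)
```

Both spellings produce the same text, but only the second makes the
change to the list visible at a glance.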
Reference Implementation(s)
---------------------------
The `say module on PyPI`_ implements string interpolation as described here
with the small burden of a callable interface::
pip install say
from say import say
nums = list(range(4))
say("Nums has {len(nums)} items: {nums}")
A Python implementation of Ruby interpolation `is also available`_.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _say module on PyPI: https://pypi.python.org/pypi/say/
.. _is also available: https://github.com/syrusakbary/interpy
Backwards Compatibility
-----------------------
By using existing syntax and avoiding use of current or historical features,
expression-strings (and any associated sub-features),
were designed so as to not interfere with existing code and is not expected
to cause any issues.
By using existing syntax and avoiding current or historical features,
format strings were designed so as to not interfere with existing code and are
not expected to cause any issues.
Postponed Ideas
@ -666,20 +561,12 @@ Though it was highly desired to integrate internationalization support,
the finer details diverge at almost every point,
making a common solution unlikely: [15]_
* Use-cases
* Compile and run-time tasks
* Interpolation Syntax
* Use-cases differ
* Compile vs. run-time tasks
* Interpolation syntax needs
* Intended audience
* Security policy
Rather than try to fit a "square peg in a round hole,"
this PEP attempts to allow internationalization to be supported in the future
by not preventing it.
In this proposal,
expression-string inputs are saved for inspection and re-rendering at a later
time,
allowing for their use by an external library of any sort.
Rejected Ideas
--------------
@ -687,18 +574,25 @@ Rejected Ideas
Restricting Syntax to ``str.format()`` Only
'''''''''''''''''''''''''''''''''''''''''''
This was deemed not enough of a solution to the problem.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
The common `arguments against`_ support of arbitrary expressions were:
The common `arguments against`_ arbitrary expressions were:
#. YAGNI, "You ain't gonna need it."
#. The change is not congruent with historical Python conservatism.
#. `YAGNI`_, "You aren't gonna need it."
#. The feature is not congruent with historical Python conservatism.
#. Postpone - can implement in a future version if need is demonstrated.
.. _YAGNI: https://en.wikipedia.org/wiki/You_aren't_gonna_need_it
.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html
Support of only ``str.format()`` syntax however,
was deemed not enough of a solution to the problem.
Often a simple length or increment of an object, for example,
is desired before printing.
It can be seen in the `Implementations in Other Languages`_ section that the
developer community at large tends to agree.
String interpolation with arbitrary expressions is becoming an industry
standard in modern languages due to its utility.
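The "length or increment" case can be sketched as follows; the names
``nums`` and ``count`` are sample assumptions:

```python
nums = list(range(4))
count = 41

# With str.format() syntax only, computed values must be passed in explicitly.
explicit = "Nums has {} items; next count is {}".format(len(nums), count + 1)

# With arbitrary expressions, the computation sits inside the braces.
inline = f"Nums has {len(nums)} items; next count is {count + 1}"
print(inline)
```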
Additional/Custom String-Prefixes
'''''''''''''''''''''''''''''''''
@ -720,7 +614,7 @@ this was thought to create too much uncertainty of when and where string
expressions could be used safely or not.
The concept was also difficult to describe to others. [12]_
Always consider expression-string variables to be unescaped,
Always consider format string variables to be unescaped,
unless the developer has explicitly escaped them.
@ -735,33 +629,13 @@ and looking too much like bash/perl,
which could encourage bad habits. [13]_
Reference Implementation(s)
===========================
An expression-string implementation is currently attached to PEP 498,
under the ``f''`` prefix,
and may be available in nightly builds.
A Python implementation of Ruby interpolation `is also available`_,
which is similar to this proposal.
It uses the codecs module to do its work::
pip install interpy
# coding: interpy
location = 'World'
print("Hello #{location}.")
.. _is also available: https://github.com/syrusakbary/interpy
Acknowledgements
================
* Eric V. Smith for providing invaluable implementation work and design
opinions, helping to focus this PEP.
* Others on the python-ideas mailing list for rejecting the craziest of ideas,
also helping to achieve focus.
* Eric V. Smith for the authoring and implementation of PEP 498.
* Everyone on the python-ideas mailing list for rejecting the various crazy
ideas that came up,
helping to keep the final design in focus.
References
@ -771,7 +645,6 @@ References
(https://mail.python.org/pipermail/python-ideas/2015-July/034659.html)
.. [2] Briefer String Format
(https://mail.python.org/pipermail/python-ideas/2015-July/034669.html)

396
pep-0504.txt Normal file

@ -0,0 +1,396 @@
PEP: 504
Title: Using the System RNG by default
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>
Status: Withdrawn
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Sep-2015
Python-Version: 3.6
Post-History: 15-Sep-2015
Abstract
========
Python currently defaults to using the deterministic Mersenne Twister random
number generator for the module level APIs in the ``random`` module, requiring
users to know that when they're performing "security sensitive" work, they
should instead switch to using the cryptographically secure ``os.urandom`` or
``random.SystemRandom`` interfaces or a third party library like
``cryptography``.
Unfortunately, this approach has resulted in a situation where developers that
aren't aware that they're doing security sensitive work use the default module
level APIs, and thus expose their users to unnecessary risks.
This isn't an acute problem, but it is a chronic one, and the often long
delays between the introduction of security flaws and their exploitation means
that it is difficult for developers to naturally learn from experience.
In order to provide an eventually pervasive solution to the problem, this PEP
proposes that Python switch to using the system random number generator by
default in Python 3.6, and require developers to opt-in to using the
deterministic random number generator process wide either by using a new
``random.ensure_repeatable()`` API, or by explicitly creating their own
``random.Random()`` instance.
To minimise the impact on existing code, module level APIs that require
determinism will implicitly switch to the deterministic PRNG.
PEP Withdrawal
==============
During discussion of this PEP, Steven D'Aprano proposed the simpler alternative
of offering a standardised ``secrets`` module that provides "one obvious way"
to handle security sensitive tasks like generating default passwords and other
tokens.
Steven's proposal has the desired effect of aligning the easy way to generate
such tokens and the right way to generate them, without introducing any
compatibility risks for the existing ``random`` module API, so this PEP has
been withdrawn in favour of further work on refining Steven's proposal as
PEP 506.
Proposal
========
Currently, it is never correct to use the module level functions in the
``random`` module for security sensitive applications. This PEP proposes to
change that admonition in Python 3.6+ to instead be that it is not correct to
use the module level functions in the ``random`` module for security sensitive
applications if ``random.ensure_repeatable()`` is ever called (directly or
indirectly) in that process.
To achieve this, rather than being bound methods of a ``random.Random``
instance as they are today, the module level callables in ``random`` would
change to be functions that delegate to the corresponding method of the
existing ``random._inst`` module attribute.
By default, this attribute will be bound to a ``random.SystemRandom`` instance.
A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
attribute to a ``random.Random`` instance, restoring the same module level
API behaviour as existed in previous Python versions (aside from the
additional level of indirection)::
    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs

        This switches the default RNG instance from the cryptographically
        secure random.SystemRandom() to the deterministic random.Random(),
        enabling the seed(), getstate() and setstate() operations. This means
        a particular random scenario can be replayed later by providing the
        same seed value or restoring a previously saved state.

        NOTE: Libraries implementing security sensitive operations should
        always explicitly use random.SystemRandom() or os.urandom in order to
        correctly handle applications that call this function.
        """
        global _inst  # rebind the module attribute, not a local name
        # SystemRandom subclasses Random, so check for SystemRandom directly
        if isinstance(_inst, SystemRandom):
            _inst = Random()
To minimise the impact on existing code, calling any of the following module
level functions will implicitly call ``random.ensure_repeatable()``:
* ``random.seed``
* ``random.getstate``
* ``random.setstate``
There are no changes proposed to the ``random.Random`` or
``random.SystemRandom`` class APIs - applications that explicitly instantiate
their own random number generators will be entirely unaffected by this
proposal.
Warning on implicit opt-in
--------------------------
In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
emit a deprecation warning using the following check::
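    # SystemRandom subclasses Random, so check for SystemRandom directly
    if isinstance(_inst, SystemRandom):
        # warnings.warn() takes the message first, then the category
        warnings.warn("Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details",
                      DeprecationWarning)
        ensure_repeatable()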
The specific wording of the warning should have a suitable answer added to
Stack Overflow as was done for the custom error message that was added for
missing parentheses in a call to print [#print]_.
In the first Python 3 release after Python 2.7 switches to security fix only
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
visible by default.
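The difference between the two warning categories can be explored with the ``warnings`` module. The sketch below is illustrative only: ``implicit_opt_in`` and ``capture`` are hypothetical helpers, and the filter is forced to ``"always"`` so that both categories are recorded regardless of the interpreter's default filters (which hide ``DeprecationWarning`` in most contexts but show ``RuntimeWarning``).

```python
import warnings


def implicit_opt_in(category=DeprecationWarning):
    """Hypothetical helper: emit the warning text proposed above."""
    warnings.warn(
        "Implicitly ensuring repeatability. "
        "See help(random.ensure_repeatable) for details",
        category,
        stacklevel=2,  # attribute the warning to the caller
    )


def capture(category):
    """Return the warnings recorded while calling implicit_opt_in()."""
    with warnings.catch_warnings(record=True) as caught:
        # Record everything, including categories hidden by default.
        warnings.simplefilter("always")
        implicit_opt_in(category)
    return caught
```

A test suite could use the same pattern to assert that a code path does (or does not) trigger the implicit opt-in.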
This PEP does *not* propose ever removing the ability to ensure the default RNG
used process wide is a deterministic PRNG that will produce the same series of
outputs given a specific seed. That capability is widely used in modelling
and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
either directly or indirectly is a sufficient enhancement to address the cases
where the module level random API is used for security sensitive tasks in web
applications without due consideration for the potential security implications
of using a deterministic PRNG.
Performance impact
------------------
Due to the large performance difference between ``random.Random`` and
``random.SystemRandom``, applications ported to Python 3.6 will encounter a
significant performance regression in cases where:
* the application is using the module level random API
* cryptographic quality randomness isn't needed
* the application doesn't already implicitly opt back in to the deterministic
PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
* the application isn't updated to explicitly call ``random.ensure_repeatable``
This would be noted in the Porting section of the Python 3.6 What's New guide,
with the recommendation to include the following code in the ``__main__``
module of affected applications::
    import random
    if hasattr(random, "ensure_repeatable"):
        random.ensure_repeatable()
Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations, so in those
cases the change proposed in this PEP will fix a previously latent security
defect.
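The size of the performance gap depends on the platform, so it is worth measuring rather than assuming. A minimal sketch using ``timeit`` follows; the helper name ``time_calls`` and the call count are arbitrary choices for illustration, and no particular ratio is claimed.

```python
import random
import timeit

deterministic = random.Random()
system = random.SystemRandom()


def time_calls(rng, number=10000):
    """Seconds taken by `number` calls to rng.random()."""
    return timeit.timeit(rng.random, number=number)


mt_seconds = time_calls(deterministic)
system_seconds = time_calls(system)
```

Comparing ``system_seconds / mt_seconds`` on the target platform gives a rough idea of the regression an unported application would see.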
Documentation changes
---------------------
The ``random`` module documentation would be updated to move the documentation
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
along with the documentation of the new ``ensure_repeatable`` function and the
associated security warning.
That section of the module documentation would also gain a discussion of the
respective use cases for the deterministic PRNG enabled by
``ensure_repeatable`` (games, modelling & simulation, software testing) and the
system RNG that is used by default (cryptography, security token generation).
This discussion will also recommend the use of third party security libraries
for the latter task.
Rationale
=========
Writing secure software under deadline and budget pressures is a hard problem.
This is reflected in regular notifications of data breaches involving personally
identifiable information [#breaches]_, as well as with failures to take
security considerations into account when new systems, like motor vehicles
[#uconnect]_, are connected to the internet. It's also the case that a lot of
the programming advice readily available on the internet [#search]_ simply
doesn't take the mathematical arcana of computer security into account.
Compounding these issues is the fact that defenders have to cover *all* of
their potential vulnerabilities, as a single mistake can make it possible to
subvert other defences [#bcrypt]_.
One of the factors that contributes to making this last aspect particularly
difficult is APIs where using them inappropriately creates a *silent* security
failure - one where the only way to find out that what you're doing is
incorrect is for someone reviewing your code to say "that's a potential
security problem", or for a system you're responsible for to be compromised
through such an oversight (and you're not only still responsible for that
system when it is compromised, but your intrusion detection and auditing
mechanisms are good enough for you to be able to figure out after the event
how the compromise took place).
This kind of situation is a significant contributor to "security fatigue",
where developers (often rightly [#owasptopten]_) feel that security engineers
spend all their time saying "don't do that the easy way, it creates a
security vulnerability".
As the designers of one of the world's most popular languages [#ieeetopten]_,
we can help reduce that problem by making the easy way the right way (or at
least the "not wrong" way) in more circumstances, so developers and security
engineers can spend more time worrying about mitigating actually interesting
threats, and less time fighting with default language behaviours.
Discussion
==========
Why "ensure_repeatable" over "ensure_deterministic"?
----------------------------------------------------
This is a case where the meaning of a word as specialist jargon conflicts with
the typical meaning of the word, even though it's *technically* the same.
From a technical perspective, a "deterministic RNG" means that given knowledge
of the algorithm and the current state, you can reliably compute arbitrary
future states.
The problem is that "deterministic" on its own doesn't convey those qualifiers,
so it's likely to instead be interpreted as "predictable" or "not random" by
folks that are familiar with the conventional meaning, but aren't familiar with
the additional qualifiers on the technical meaning.
A second problem with "deterministic" as a description for the traditional RNG
is that it doesn't really tell you what you can *do* with the traditional RNG
that you can't do with the system one.
"ensure_repeatable" aims to address both of those problems, as its common
meaning accurately describes the main reason for preferring the deterministic
PRNG over the system RNG: ensuring you can repeat the same series of outputs
by providing the same seed value, or by restoring a previously saved PRNG state.
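Both forms of repeatability are available today on an explicit ``random.Random`` instance, which makes the distinction easy to demonstrate. The seed value and variable names below are arbitrary; the example assumes only the documented ``seed``, ``getstate`` and ``setstate`` methods.

```python
import random

rng = random.Random(2015)            # same seed -> same series of outputs
first_run = [rng.random() for _ in range(3)]

rng.seed(2015)                       # replay by providing the same seed
replay_from_seed = [rng.random() for _ in range(3)]

saved = rng.getstate()               # snapshot the generator mid-stream
expected_next = [rng.random() for _ in range(3)]
rng.setstate(saved)                  # replay by restoring the saved state
replay_from_state = [rng.random() for _ in range(3)]

# The system RNG has no state to save, so getstate() is not implemented.
try:
    random.SystemRandom().getstate()
    system_rng_has_state = True
except NotImplementedError:
    system_rng_has_state = False
```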
Only changing the default for Python 3.6+
-----------------------------------------
Some other recent security changes, such as upgrading the capabilities of the
``ssl`` module and switching to properly verifying HTTPS certificates by
default, have been considered critical enough to justify backporting the
change to all currently supported versions of Python.
The difference in this case is one of degree - the additional benefits from
rolling out this particular change a couple of years earlier than will
otherwise be the case aren't sufficient to justify either the additional effort
or the stability risks involved in making such an intrusive change in a
maintenance release.
Keeping the module level functions
----------------------------------
In addition to general backwards compatibility considerations, Python is
widely used for educational purposes, and we specifically don't want to
invalidate the wide array of educational material that assumes the availability
of the current ``random`` module API. Accordingly, this proposal ensures that
most of the public API can continue to be used not only without modification,
but without generating any new warnings.
Warning when implicitly opting in to the deterministic RNG
----------------------------------------------------------
It's necessary to implicitly opt in to the deterministic PRNG as Python is
widely used for modelling and simulation purposes where this is the right
thing to do, and in many cases, these software models won't have a dedicated
maintenance team tasked with ensuring they keep working on the latest versions
of Python.
Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
is also a mistake that appears in a number of the flawed "how to generate a
security token in Python" guides readily available online.
Using first DeprecationWarning, and then eventually a RuntimeWarning, to
advise against implicitly switching to the deterministic PRNG aims to
nudge future users that need a cryptographically secure RNG away from
calling ``random.seed()`` and those that genuinely need a deterministic
generator towards explicitly calling ``random.ensure_repeatable()``.
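To see why seeding from ``os.urandom`` does not help, note that the output still comes from the deterministic generator: anyone who obtains or reconstructs its state can reproduce the "token". The sketch below uses a private ``random.Random`` instance rather than the module level functions to avoid side effects; the 128-bit token size is an arbitrary choice for illustration.

```python
import os
import random

rng = random.Random()
rng.seed(os.urandom(16))             # the flawed pattern: still Mersenne Twister

snapshot = rng.getstate()            # whoever holds the state...
token = "%032x" % rng.getrandbits(128)

rng.setstate(snapshot)               # ...can regenerate the same token
replayed = "%032x" % rng.getrandbits(128)

# Drawing directly from the system RNG leaves no replayable state behind.
safer_token = "%032x" % random.SystemRandom().getrandbits(128)
```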
Avoiding the introduction of a userspace CSPRNG
-----------------------------------------------
The original discussion of this proposal on python-ideas [#csprng]_ suggested
introducing a cryptographically secure pseudo-random number generator and using
that by default, rather than defaulting to the relatively slow system random
number generator.
The problem [#nocsprng]_ with this approach is that it introduces an additional
point of failure in security sensitive situations, for the sake of applications
where the random number generation may not even be on a critical performance
path.
Applications that do need cryptographic quality randomness should be using the
system random number generator regardless of speed considerations.
Isn't the deterministic PRNG "secure enough"?
---------------------------------------------
In a word, "No" - that's why there's a warning in the module documentation
that says not to use it for security sensitive purposes. While we're not
currently aware of any studies of Python's random number generator specifically,
studies of PHP's random number generator [#php]_ have demonstrated the ability
to use weaknesses in that subsystem to facilitate a practical attack on
password recovery tokens in popular PHP web applications.
However, one of the rules of secure software development is that "attacks only
get better, never worse", so it may be that by the time Python 3.6 is released
we will actually see a practical attack on Python's deterministic PRNG publicly
documented.
Security fatigue in the Python ecosystem
----------------------------------------
Over the past few years, the computing industry as a whole has been
making a concerted effort to upgrade the shared network infrastructure we all
depend on to a "secure by default" stance. As one of the most widely used
programming languages for network service development (including the OpenStack
Infrastructure-as-a-Service platform) and for systems administration
on Linux systems in general, a fair share of that burden has fallen on the
Python ecosystem, which is understandably frustrating for Pythonistas using
Python in other contexts where these issues aren't of as great a concern.
This consideration is one of the primary factors driving the substantial
backwards compatibility improvements in this proposal relative to the initial
draft concept posted to python-ideas [#draft]_.
Acknowledgements
================
* Theo de Raadt, for making the suggestion to Guido van Rossum that we
seriously consider defaulting to a cryptographically secure random number
generator
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
python-ideas threads that suggested the approach of transparently switching
to the ``random.Random`` implementation when any of the functions that only
make sense for a deterministic RNG are called
* Nathaniel Smith for providing the reference on practical attacks against
PHP's random number generator when used to generate password reset tokens
* Donald Stufft for pursuing additional discussions with network security
experts that suggested the introduction of a userspace CSPRNG would mean
additional complexity for insufficient gain relative to just using the
system RNG directly
* Paul Moore for eloquently making the case for the current level of security
fatigue in the Python ecosystem
References
==========
.. [#breaches] Visualization of data breaches involving more than 30k records (each)
(http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)
.. [#uconnect] Remote UConnect hack for Jeep Cherokee
(http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)
.. [#php] PRNG based attack against password reset tokens in PHP applications
(https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)
.. [#search] Search link for "python password generator"
(https://www.google.com.au/search?q=python+password+generator)
.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
(https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)
.. [#draft] Initial draft concept that eventually became this PEP
(https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)
.. [#nocsprng] Safely generating random numbers
(http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)
.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
(http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)
.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
(https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)
.. [#print] Stack Overflow answer for missing parentheses in call to print
(http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)
.. [#bcrypt] Bypassing bcrypt through an insecure data cache
(http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

205
pep-0505.txt Normal file
View File

@ -0,0 +1,205 @@
PEP: 505
Title: None coalescing operators
Version: $Revision$
Last-Modified: $Date$
Author: Mark E. Haase <mehaase@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Sep-2015
Python-Version: 3.6
Abstract
========
Several modern programming languages have so-called "null coalescing" or
"null aware" operators, including C#, Dart, Perl, Swift, and PHP (starting in
version 7). These operators provide syntactic sugar for common patterns
involving null references. [1]_ [2]_
* The "null coalescing" operator is a binary operator that returns its
first non-null operand.
* The "null aware member access" operator is a binary operator that accesses
an instance member only if that instance is non-null. It returns null
otherwise.
* The "null aware index access" operator is a binary operator that accesses a
member of a collection only if that collection is non-null. It returns null
otherwise.
Python does not have any directly equivalent syntax. The ``or`` operator can
be used to similar effect but checks for a truthy value, not ``None``
specifically. The ternary operator ``... if ... else ...`` can be used for
explicit null checks but is more verbose and typically duplicates part of the
expression in between ``if`` and ``else``. The proposed ``None`` coalescing
and ``None`` aware operators offer an alternative syntax that is more
intuitive and concise.
Rationale
=========
Null Coalescing Operator
------------------------
The following code illustrates how the ``None`` coalescing operators would
work in Python::
>>> title = 'My Title'
>>> title ?? 'Default Title'
'My Title'
>>> title = None
>>> title ?? 'Default Title'
'Default Title'
Similar behavior can be achieved with the ``or`` operator, but ``or`` checks
whether its left operand is false-y, not specifically ``None``. This can lead
to surprising behavior. Consider the scenario of computing the price of some
products a customer has in his/her shopping cart::
>>> price = 100
>>> requested_quantity = 5
>>> default_quantity = 1
>>> (requested_quantity or default_quantity) * price
500
>>> requested_quantity = None
>>> (requested_quantity or default_quantity) * price
100
>>> requested_quantity = 0
>>> (requested_quantity or default_quantity) * price # oops!
100
This type of bug is not possible with the ``None`` coalescing operator,
because there is no implicit type coercion to ``bool``::
>>> price = 100
>>> requested_quantity = 0
>>> default_quantity = 1
>>> (requested_quantity ?? default_quantity) * price
0
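Since ``??`` is only proposed syntax, the same ``None``-specific check can be approximated in current Python with a small helper. The function name ``coalesce`` is hypothetical, and unlike an operator it evaluates both arguments before the call.

```python
def coalesce(value, default):
    """Return `value` unless it is None, in which case return `default`."""
    return default if value is None else value


price = 100
default_quantity = 1

# Totals for a requested quantity of 5, None, and 0 respectively.
totals = {q: coalesce(q, default_quantity) * price for q in (5, None, 0)}
```

Here a requested quantity of ``0`` is preserved rather than being replaced by the default, which is the behaviour ``or`` gets wrong.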
The same correct behavior can be achieved with the ternary operator. Here is
an excerpt from the popular Requests package::
data = [] if data is None else data
files = [] if files is None else files
headers = {} if headers is None else headers
params = {} if params is None else params
hooks = {} if hooks is None else hooks
This particular formulation has the undesirable effect of putting the operands
in an unintuitive order: the brain thinks, "use ``data`` if possible and use
``[]`` as a fallback," but the code puts the fallback *before* the preferred
value.
The author of this package could have written it like this instead::
data = data if data is not None else []
files = files if files is not None else []
headers = headers if headers is not None else {}
params = params if params is not None else {}
hooks = hooks if hooks is not None else {}
This ordering of the operands is more intuitive, but it requires 4 extra
characters (for "not "). It also highlights the repetition of identifiers:
``data if data``, ``files if files``, etc. The ``None`` coalescing operator
improves readability::
data = data ?? []
files = files ?? []
headers = headers ?? {}
params = params ?? {}
hooks = hooks ?? {}
The ``None`` coalescing operator also has a corresponding assignment shortcut.
::
data ?= []
files ?= []
headers ?= {}
params ?= {}
hooks ?= {}
The ``None`` coalescing operator is left-associative, which allows for easy
chaining::
>>> user_title = None
>>> local_default_title = None
>>> global_default_title = 'Global Default Title'
>>> title = user_title ?? local_default_title ?? global_default_title
>>> title
'Global Default Title'
The direction of associativity is important because the ``None`` coalescing
operator short circuits: if its left operand is non-null, then the right
operand is not evaluated.
::
>>> def get_default(): raise Exception()
>>> 'My Title' ?? get_default()
'My Title'
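The short-circuit behaviour can be imitated today by passing the fallback as a callable, so that it runs only when needed. This is a sketch of the semantics, not of the proposed implementation; ``first_not_none`` and the ``calls`` list are illustrative names.

```python
calls = []


def get_default():
    """Record each invocation so the laziness is observable."""
    calls.append("called")
    return "Default Title"


def first_not_none(value, default_factory):
    """Evaluate `default_factory` only when `value` is None."""
    return value if value is not None else default_factory()


kept = first_not_none("My Title", get_default)   # factory is not called
fallback = first_not_none(None, get_default)     # factory is called once
```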
Null-Aware Member Access Operator
---------------------------------
::
>>> title = 'My Title'
>>> title.upper()
'MY TITLE'
>>> title = None
>>> title.upper()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'upper'
>>> title?.upper()
None
Null-Aware Index Access Operator
---------------------------------
::
>>> person = {'name': 'Mark', 'age': 32}
>>> person['name']
'Mark'
>>> person = None
>>> person['name']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not subscriptable
>>> person?['name']
None
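For comparison, the two sessions above (member access and index access) correspond to explicit ``is not None`` checks in current Python. The helper names are hypothetical and exist only to make the equivalents testable.

```python
def upper_or_none(title):
    """Current-Python spelling of the proposed ``title?.upper()``."""
    return title.upper() if title is not None else None


def name_or_none(person):
    """Current-Python spelling of the proposed ``person?['name']``."""
    return person['name'] if person is not None else None
```

Both spellings repeat the operand (``title``, ``person``), which is the duplication the proposed operators aim to remove.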
Specification
=============
References
==========
.. [1] Wikipedia: Null coalescing operator
(https://en.wikipedia.org/wiki/Null_coalescing_operator)
.. [2] Seth Ladd's Blog: Null-aware operators in Dart
(http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html)
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

356
pep-0506.txt Normal file
View File

@ -0,0 +1,356 @@
PEP: 506
Title: Adding A Secrets Module To The Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Steven D'Aprano <steve@pearwood.info>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 19-Sep-2015
Python-Version: 3.6
Post-History:
Abstract
========
This PEP proposes the addition of a module for common security-related
functions such as generating tokens to the Python standard library.
Definitions
===========
Some common abbreviations used in this proposal:
* PRNG:
Pseudo Random Number Generator. A deterministic algorithm used
to produce random-looking numbers with certain desirable
statistical properties.
* CSPRNG:
Cryptographically Strong Pseudo Random Number Generator. An
algorithm used to produce random-looking numbers which are
resistant to prediction.
* MT:
Mersenne Twister. An extensively studied PRNG which is currently
used by the ``random`` module as the default.
Rationale
=========
This proposal is motivated by concerns that Python's standard library
makes it too easy for developers to inadvertently make serious security
errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum
and expressed some concern [1]_ about the use of MT for generating sensitive
information such as passwords, secure tokens, session keys and similar.
Although the documentation for the random module explicitly states that
the default is not suitable for security purposes [2]_, it is strongly
believed that this warning may be missed, ignored or misunderstood by
many Python developers. In particular:
* developers may not have read the documentation and consequently
not seen the warning;
* they may not realise that their specific use of it has security
implications; or
* not realising that there could be a problem, they have copied code
(or learned techniques) from websites which don't offer best
practices.
The first [3]_ hit when searching for "python how to generate passwords" on
Google is a tutorial that uses the default functions from the ``random``
module [4]_. Although it is not intended for use in web applications, it is
likely that similar techniques find themselves used in that situation.
The second hit is to a StackOverflow question about generating
passwords [5]_. Most of the answers given, including the accepted one, use
the default functions. When one user warned that the default could be
easily compromised, they were told "I think you worry too much." [6]_
This strongly suggests that the existing ``random`` module is an attractive
nuisance when it comes to generating (for example) passwords or secure
tokens.
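For contrast with the tutorials mentioned above, a password generator can draw from the system RNG with almost identical code. This is a minimal sketch, not a recommendation of a particular password policy: the function name, alphabet and default length are arbitrary assumptions.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def make_password(length=12, rng=None):
    """Build a password from ALPHABET using the system RNG by default."""
    if rng is None:
        rng = random.SystemRandom()   # backed by os.urandom, not MT
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

The only change from the flawed versions is the explicit ``random.SystemRandom()`` instance in place of the module level ``random.choice``.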
Additional motivation (of a more philosophical bent) can be found in the
post which first proposed this idea [7]_.
Proposal
========
Alternative proposals have focused on the default PRNG in the ``random``
module, with the aim of providing "secure by default" cryptographically
strong primitives that developers can build upon without thinking about
security. (See Alternatives below.) This proposes a different approach:
* The standard library already provides cryptographically strong
primitives, but many users don't know they exist or when to use them.
* Instead of requiring crypto-naive users to write secure code, the
standard library should include a set of ready-to-use "batteries" for
the most common needs, such as generating secure tokens. This code
will both directly satisfy a need ("How do I generate a password reset
token?"), and act as an example of acceptable practices which
developers can learn from [8]_.
To do this, this PEP proposes that we add a new module to the standard
library, with the suggested name ``secrets``. This module will contain a
set of ready-to-use functions for common activities with security
implications, together with some lower-level primitives.
The suggestion is that ``secrets`` becomes the go-to module for dealing
with anything which should remain secret (passwords, tokens, etc.)
while the ``random`` module remains backward-compatible.
API and Implementation
======================
The contents of the ``secrets`` module are expected to evolve over time, and
likely will evolve between the time of writing this PEP and actual release
in the standard library [9]_. At the time of writing, the following functions
have been suggested:
* A high-level function for generating secure tokens suitable for use
in (e.g.) password recovery, as session keys, etc.
* A limited interface to the system CSPRNG, using either ``os.urandom``
directly or ``random.SystemRandom``. Unlike the ``random`` module, this
does not need to provide methods for seeding, getting or setting the
state, or any non-uniform distributions. It should provide the
following:
- A function for choosing items from a sequence, ``secrets.choice``.
- A function for generating an integer within some range, such as
``secrets.randrange`` or ``secrets.randint``.
- A function for generating a given number of random bits and/or bytes
as an integer.
- A similar function which returns the value as a hex digit string.
* ``hmac.compare_digest`` under the name ``equal``.
The consensus appears to be that there is no need to add a new CSPRNG to
the ``random`` module to support these uses; ``SystemRandom`` will be
sufficient.
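To make the suggested functions concrete, here is one way they might be built on existing standard library pieces. This is a sketch under assumptions, not the proposed API: apart from ``choice``, ``randrange`` and ``equal``, which the list above names, the function names (``random_int``, ``random_hex``) and default sizes are invented for illustration.

```python
import binascii
import hmac
import os
import random

_sysrand = random.SystemRandom()

# Thin wrappers over the system CSPRNG.
choice = _sysrand.choice
randrange = _sysrand.randrange


def random_int(nbits):
    """Return a non-negative integer built from `nbits` random bits."""
    return _sysrand.getrandbits(nbits)


def random_hex(nbytes=32):
    """Return `nbytes` random bytes as a hex digit string."""
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")


def equal(a, b):
    """Compare two digests using hmac.compare_digest."""
    return hmac.compare_digest(a, b)
```

Note that nothing here needs seeding or state management, which matches the "limited interface" described in the list.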
Some illustrative implementations have been given by Nick Coghlan [10]_.
This idea has also been discussed on the issue tracker for the
"cryptography" module [11]_.
The ``secrets`` module itself will be pure Python, and other Python
implementations can easily make use of it unchanged, or adapt it as
necessary.
Alternatives
============
One alternative is to change the default PRNG provided by the ``random``
module [12]_. This received considerable scepticism and outright opposition:
* There is fear that a CSPRNG may be slower than the current PRNG (which
in the case of MT is already quite slow).
* Some applications (such as scientific simulations, and replaying
gameplay) require the ability to seed the PRNG into a known state,
which a CSPRNG lacks by design.
* Another major use of the ``random`` module is for simple "guess a number"
games written by beginners, and many people are loath to make any
change to the ``random`` module which may make that harder.
* Although there is no proposal to remove MT from the ``random`` module,
there was considerable hostility to the idea of having to opt-in to
a non-CSPRNG or any backwards-incompatible changes.
* Demonstrated attacks against MT are typically against PHP applications.
It is believed that PHP's version of MT is a significantly softer target
than Python's version, due to a poor seeding technique [13]_. Consequently,
without a proven attack against Python applications, many people object
to a backwards-incompatible change.
Nick Coghlan made an earlier suggestion for a globally configurable PRNG
which uses the system CSPRNG by default [14]_, but has since hinted that he
may withdraw it in favour of this proposal [15]_.
Comparison To Other Languages
=============================
* PHP
PHP includes a function ``uniqid`` [16]_ which by default returns a
thirteen character string based on the current time in microseconds.
Translated into Python syntax, it has the following signature::
def uniqid(prefix='', more_entropy=False)->str
The PHP documentation warns that this function is not suitable for
security purposes. Nevertheless, various mature, well-known PHP
applications use it for that purpose (citation needed).
PHP 5.3 and better also includes a function ``openssl_random_pseudo_bytes``
[17]_. Translated into Python syntax, it has roughly the following
signature::
def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]
This function returns a pseudo-random string of bytes of the given
length, and a boolean flag giving whether the string is considered
cryptographically strong. The PHP manual suggests that returning
anything but True should be rare except for old or broken platforms.
* Javascript
Based on a rather cursory search [18]_, there don't appear to be any
well-known standard functions for producing strong random values in
Javascript, although there may be good quality third-party libraries.
Standard Javascript doesn't seem to include an interface to the
system CSPRNG either, and people have extensively written about the
weaknesses of Javascript's ``Math.random`` [19]_.
* Ruby
The Ruby standard library includes a module ``SecureRandom`` [20]_
which includes the following methods:
* base64 - returns a Base64 encoded random string.
* hex - returns a random hexadecimal string.
* random_bytes - returns a random byte string.
* random_number - depending on the argument, returns either a random
integer in ``range(0, n)``, or a random float between 0.0 and 1.0.
* urlsafe_base64 - returns a random URL-safe Base64 encoded string.
* uuid - returns a version 4 random Universally Unique IDentifier.
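For comparison, the Ruby methods listed above can be approximated today
using only the existing Python standard library. The following is a
minimal sketch, not the API proposed by this PEP: every function name
below is hypothetical and chosen only to mirror the Ruby list, and the
default of 16 bytes is an arbitrary assumption.

```python
import base64
import binascii
import os
import random
import uuid

# SystemRandom draws from os.urandom rather than the Mersenne Twister.
_sysrand = random.SystemRandom()


def random_bytes(nbytes=16):
    """Return nbytes random bytes from the operating system's CSPRNG."""
    return os.urandom(nbytes)


def hex_token(nbytes=16):
    """Return a hexadecimal string encoding nbytes random bytes."""
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")


def base64_token(nbytes=16):
    """Return a Base64 string encoding nbytes random bytes."""
    return base64.b64encode(os.urandom(nbytes)).decode("ascii")


def urlsafe_base64_token(nbytes=16):
    """Return a URL-safe Base64 string; padding is stripped (an assumption)."""
    raw = base64.urlsafe_b64encode(os.urandom(nbytes))
    return raw.rstrip(b"=").decode("ascii")


def random_number(n=0):
    """Return an int in range(0, n) if n > 0, else a float in [0.0, 1.0)."""
    if n > 0:
        return _sysrand.randrange(n)
    return _sysrand.random()


def uuid4_string():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())
```

None of these helpers touch the default ``random`` module functions, so
they are unaffected by ``random.seed()``.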
What Should Be The Name Of The Module?
======================================
There was a proposal to add a "random.safe" submodule, quoting the Zen
of Python "Namespaces are one honking great idea" koan. However, the
author of the Zen, Tim Peters, has come out against this idea [21]_, and
recommends a top-level module.
In discussion on the python-ideas mailing list so far, the name "secrets"
has received some approval, and no strong opposition.
Frequently Asked Questions
==========================
* Q: Is this a real problem? Surely MT is random enough that nobody can
predict its output.
A: The consensus among security professionals is that MT is not safe
in security contexts. It is not difficult to reconstruct the internal
state of MT [22]_ [23]_ and so predict all past and future values. There
are a number of known, practical attacks on systems using MT for
randomness [24]_.
While there are currently no known direct attacks on applications
written in Python due to the use of MT, there is widespread agreement
that such usage is unsafe.
* Q: Is this an alternative to specialist cryptographic software such as SSL?
A: No. This is a "batteries included" solution, not a full-featured
"nuclear reactor". It is intended to mitigate some basic
security errors, not be a solution to all security-related issues. To
quote Nick Coghlan referring to his earlier proposal [25]_::
"...folks really are better off learning to use things like
cryptography.io for security sensitive software, so this change
is just about harm mitigation given that it's inevitable that a
non-trivial proportion of the millions of current and future
Python developers won't do that."
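To make the first answer above concrete, here is a minimal sketch of the
state-reconstruction idea, under a generous assumption: the attacker
observes 624 consecutive, untruncated 32-bit outputs. Real attacks on
truncated or indirect outputs need more work. The helper names
(``untemper``, ``clone_mt``) are hypothetical; the shift and mask
constants are those of the standard MT19937 tempering step.

```python
import random


def _undo_right_shift_xor(value, shift):
    """Invert ``y ^= y >> shift`` for a 32-bit word."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ (result >> shift)
    return result


def _undo_left_shift_xor(value, shift, mask):
    """Invert ``y ^= (y << shift) & mask`` for a 32-bit word."""
    result = value
    for _ in range(32 // shift + 1):
        result = value ^ ((result << shift) & mask)
    return result


def untemper(output):
    """Recover the internal state word behind one 32-bit MT output."""
    y = _undo_right_shift_xor(output, 18)
    y = _undo_left_shift_xor(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y


def clone_mt(observed):
    """Build a random.Random that continues the observed output stream."""
    if len(observed) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    state = tuple(untemper(word) for word in observed)
    clone = random.Random()
    # Index 624 forces a state regeneration on the next call.
    clone.setstate((3, state + (624,), None))
    return clone


# Demonstration: the "victim" generator is seeded from the system CSPRNG,
# yet its future output is predictable once 624 words have been seen.
victim = random.Random()
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_mt(observed)
assert clone.getrandbits(32) == victim.getrandbits(32)
assert clone.random() == victim.random()
```

The sketch only predicts future values; the references cited in the
answer describe the general technique in more detail.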
References
==========
.. [1] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html
.. [2] https://docs.python.org/3/library/random.html
.. [3] As of the date of writing. Also, as Google search terms may be
automatically customised for the user without their knowledge, some
readers may see different results.
.. [4] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html
.. [5] http://stackoverflow.com/questions/3854692/generate-password-in-python
.. [6] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766
.. [7] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html
.. [8] At least those who are motivated to read the source code and documentation.
.. [9] Tim Peters suggests that bike-shedding the contents of the module will
be 10000 times more time consuming than actually implementing the
module. Words do not begin to express how much I am looking forward to
this.
.. [10] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html
.. [11] https://github.com/pyca/cryptography/issues/2347
.. [12] Link needed.
.. [13] By default PHP seeds the MT PRNG with the time (citation needed),
which is exploitable by attackers, while Python seeds the PRNG with
output from the system CSPRNG, which is believed to be much harder to
exploit.
.. [14] http://legacy.python.org/dev/peps/pep-0504/
.. [15] https://mail.python.org/pipermail/python-ideas/2015-September/036243.html
.. [16] http://php.net/manual/en/function.uniqid.php
.. [17] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php
.. [18] Volunteers and patches are welcome.
.. [19] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html
.. [20] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html
.. [21] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html
.. [22] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html
.. [23] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html
.. [24] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf
.. [25] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:


@ -2,7 +2,7 @@ PEP: 3140
Title: str(container) should call str(item), not repr(item)
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytmann <phd@phd.pp.ru>,
Author: Oleg Broytman <phd@phdru.name>,
Jim J. Jewett <jimjjewett@gmail.com>
Discussions-To: python-3000@python.org
Status: Rejected