merge
This commit is contained in:
commit
3cdb11485b
15
pep-0101.txt
|
@ -424,11 +424,12 @@ How to Make A Release
|
|||
that directory. Note though that if you're releasing a maintenance
|
||||
release for an older version, don't change the current link.
|
||||
|
||||
___ If this is a final release (even a maintenance release), also unpack
|
||||
the HTML docs to /srv/docs.python.org/release/X.Y.Z on
|
||||
docs.iad1.psf.io. Make sure the files are in group "docs". If it is a
|
||||
release of a security-fix-only version, tell the DE to build a version
|
||||
with the "version switcher" and put it there.
|
||||
___ If this is a final release (even a maintenance release), also
|
||||
unpack the HTML docs to /srv/docs.python.org/release/X.Y.Z on
|
||||
docs.iad1.psf.io. Make sure the files are in group "docs" and are
|
||||
group-writeable. If it is a release of a security-fix-only version,
|
||||
tell the DE to build a version with the "version switcher"
|
||||
and put it there.
|
||||
|
||||
___ Let the DE check if the docs are built and work all right.
|
||||
|
||||
|
@ -484,6 +485,10 @@ How to Make A Release
|
|||
Note that the easiest thing is probably to copy fields from
|
||||
an existing Python release "page", editing as you go.
|
||||
|
||||
There should only be one "page" for a release (e.g. 3.5.0, 3.5.1).
|
||||
Reuse the same page for all pre-releases, changing the version
|
||||
number and the documentation as you go.
|
||||
|
||||
___ If this isn't the first release for a version, open the existing
|
||||
"page" for editing and update it to the new release. Don't save yet!
|
||||
|
||||
|
|
|
@ -0,0 +1,951 @@
|
|||
PEP: 103
|
||||
Title: Collecting information about git
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Oleg Broytman <phd@phdru.name>
|
||||
Status: Draft
|
||||
Type: Informational
|
||||
Content-Type: text/x-rst
|
||||
Created: 01-Jun-2015
|
||||
Post-History: 12-Sep-2015
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This Informational PEP collects information about git. There is, of
|
||||
course, a lot of documentation for git, so the PEP concentrates on
|
||||
more complex (and more related to Python development) issues,
|
||||
scenarios and examples.
|
||||
|
||||
The plan is to extend the PEP in the future collecting information
|
||||
about equivalence of Mercurial and git scenarios to help migrating
|
||||
Python development from Mercurial to git.
|
||||
|
||||
The author of the PEP doesn't currently plan to write a Process PEP on
|
||||
migrating Python development from Mercurial to git.
|
||||
|
||||
|
||||
Documentation
|
||||
=============
|
||||
|
||||
Git is accompanied by a lot of documentation, both online and
|
||||
offline.
|
||||
|
||||
|
||||
Documentation for starters
|
||||
--------------------------
|
||||
|
||||
Git Tutorial: `part 1
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial.html>`_,
|
||||
`part 2
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gittutorial-2.html>`_.
|
||||
|
||||
`Git User's manual
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/user-manual.html>`_.
|
||||
`Everyday GIT With 20 Commands Or So
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/giteveryday.html>`_.
|
||||
`Git workflows
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_.
|
||||
|
||||
|
||||
Advanced documentation
|
||||
----------------------
|
||||
|
||||
`Git Magic
|
||||
<http://www-cs-students.stanford.edu/~blynn/gitmagic/index.html>`_,
|
||||
with a number of translations.
|
||||
|
||||
`Pro Git <https://git-scm.com/book>`_. The Book about git. Buy it at
|
||||
Amazon or download in PDF, mobi, or ePub form. It has translations to
|
||||
many different languages. Download Russian translation from `GArik
|
||||
<https://github.com/GArik/progit/wiki>`_.
|
||||
|
||||
`Git Wiki <https://git.wiki.kernel.org/index.php/Main_Page>`_.
|
||||
|
||||
|
||||
Offline documentation
|
||||
---------------------
|
||||
|
||||
Git has builtin help: run ``git help $TOPIC``. For example, run
|
||||
``git help git`` or ``git help help``.
|
||||
|
||||
|
||||
Quick start
|
||||
===========
|
||||
|
||||
Download and installation
|
||||
-------------------------
|
||||
|
||||
Unix users: `download and install using your package manager
|
||||
<https://git-scm.com/download/linux>`_.
|
||||
|
||||
Microsoft Windows: download `git-for-windows
|
||||
<https://github.com/git-for-windows/git/releases>`_ or `msysGit
|
||||
<https://github.com/msysgit/msysgit/releases>`_.
|
||||
|
||||
MacOS X: use git installed with `XCode
|
||||
<https://developer.apple.com/xcode/downloads/>`_ or download from
|
||||
`MacPorts <https://www.macports.org/ports.php?by=name&substr=git>`_ or
|
||||
`git-osx-installer
|
||||
<http://sourceforge.net/projects/git-osx-installer/files/>`_ or
|
||||
install git with `Homebrew <http://brew.sh/>`_: ``brew install git``.
|
||||
|
||||
`git-cola <https://git-cola.github.io/index.html>`_ is a Git GUI
|
||||
written in Python and GPL licensed. Linux, Windows, MacOS X.
|
||||
|
||||
`TortoiseGit <https://tortoisegit.org/>`_ is a Windows Shell Interface
|
||||
to Git based on TortoiseSVN; open source.
|
||||
|
||||
|
||||
Initial configuration
|
||||
---------------------
|
||||
|
||||
This simple code often appears in documentation, but it is
|
||||
important, so let's repeat it here. Git stores author and committer
|
||||
names/emails in every commit, so configure your real name and
|
||||
preferred email::
|
||||
|
||||
$ git config --global user.name "User Name"
|
||||
$ git config --global user.email user.name@example.org
|
||||
|
||||
|
||||
Examples in this PEP
|
||||
====================
|
||||
|
||||
Examples of git commands in this PEP use the following approach. It is
|
||||
assumed that you, the user, work with a local repository named
|
||||
``python`` that has an upstream remote repo named ``origin``. Your
|
||||
local repo has two branches ``v1`` and ``master``. For most examples
|
||||
the currently checked out branch is ``master``. That is, it's assumed
|
||||
you have done something like this::
|
||||
|
||||
$ git clone https://git.python.org/python.git
|
||||
$ cd python
|
||||
$ git branch v1 origin/v1
|
||||
|
||||
The first command clones the remote repository into the local directory
|
||||
``python``, creates a new local branch master, sets
|
||||
remotes/origin/master as its upstream remote-tracking branch and
|
||||
checks it out into the working directory.
|
||||
|
||||
The last command creates a new local branch v1 and sets
|
||||
remotes/origin/v1 as its upstream remote-tracking branch.
|
||||
|
||||
The same result can be achieved with the commands::
|
||||
|
||||
$ git clone -b v1 https://git.python.org/python.git
|
||||
$ cd python
|
||||
$ git checkout --track origin/master
|
||||
|
||||
The last command creates a new local branch master, sets
|
||||
remotes/origin/master as its upstream remote-tracking branch and
|
||||
checks it out into the working directory.
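
The setup described in this section can be reproduced offline. The
following is a minimal sketch, not the real Python repository: it
assumes git is installed, and the scratch directory, the ``seed``
repository and the bare ``upstream.git`` are hypothetical stand-ins for
``https://git.python.org/python.git``. It checks which remote-tracking
branch each local branch follows.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"

# Build a stand-in upstream that has the branches master and v1.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo hello > README
git add README
git commit -q -m "Initial commit"
git branch v1
cd ..
git clone -q --bare seed upstream.git

# The commands from the text, pointed at the local stand-in.
git clone -q upstream.git python
cd python
git branch v1 origin/v1 >/dev/null 2>&1

# Ask git which upstream each local branch is configured to track.
master_upstream=$(git rev-parse --abbrev-ref "master@{upstream}")
v1_upstream=$(git rev-parse --abbrev-ref "v1@{upstream}")
echo "master tracks $master_upstream"
echo "v1 tracks $v1_upstream"
```

``git rev-parse --abbrev-ref`` prints the short name of the upstream,
which is the same name ``git branch -vv`` shows in square brackets.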
|
||||
|
||||
|
||||
Branches and branches
|
||||
=====================
|
||||
|
||||
Git terminology can be a bit misleading. Take, for example, the term
|
||||
"branch". In git it has two meanings. A branch is a directed line of
|
||||
commits (possibly with merges). And a branch is a label or a pointer
|
||||
assigned to a line of commits. It is important to distinguish when you
|
||||
talk about commits and when about their labels. Lines of commits are
|
||||
by themselves unnamed and usually only lengthen and merge.
|
||||
Labels, on the other hand, can be created, moved, renamed and deleted
|
||||
freely.
|
||||
|
||||
|
||||
Remote repositories and remote branches
|
||||
=======================================
|
||||
|
||||
Remote-tracking branches are branches (pointers to commits) in your
|
||||
local repository. They are there for git (and for you) to remember
|
||||
what branches and commits have been pulled from and pushed to what
|
||||
remote repos (you can pull from and push to many remotes).
|
||||
Remote-tracking branches live under ``remotes/$REMOTE`` namespaces,
|
||||
e.g. ``remotes/origin/master``.
|
||||
|
||||
To see the status of remote-tracking branches run::
|
||||
|
||||
$ git branch -rv
|
||||
|
||||
To see local and remote-tracking branches (and tags) pointing to
|
||||
commits::
|
||||
|
||||
$ git log --decorate
|
||||
|
||||
You never do your own development on remote-tracking branches. You
|
||||
create a local branch that has a remote branch as upstream and do
|
||||
development on that local branch. On push git pushes commits to the
|
||||
remote repo and updates remote-tracking branches, on pull git fetches
|
||||
commits from the remote repo, updates remote-tracking branches and
|
||||
fast-forwards, merges or rebases local branches.
|
||||
|
||||
When you do an initial clone like this::
|
||||
|
||||
$ git clone -b v1 https://git.python.org/python.git
|
||||
|
||||
git clones remote repository ``https://git.python.org/python.git`` to
|
||||
directory ``python``, creates a remote named ``origin``, creates
|
||||
remote-tracking branches, creates a local branch ``v1``, configures it
|
||||
to track upstream remotes/origin/v1 branch and checks out ``v1`` into
|
||||
the working directory.
|
||||
|
||||
|
||||
Updating local and remote-tracking branches
|
||||
-------------------------------------------
|
||||
|
||||
There is a major difference between
|
||||
|
||||
::
|
||||
|
||||
$ git fetch $REMOTE $BRANCH
|
||||
|
||||
and
|
||||
|
||||
::
|
||||
|
||||
$ git fetch $REMOTE $BRANCH:$BRANCH
|
||||
|
||||
The first command fetches commits from the named $BRANCH in the
|
||||
$REMOTE repository that are not in your repository, updates
|
||||
remote-tracking branch and leaves the id (the hash) of the head commit
|
||||
in file .git/FETCH_HEAD.
|
||||
|
||||
The second command fetches commits from the named $BRANCH in the
|
||||
$REMOTE repository that are not in your repository and updates both
|
||||
the local branch $BRANCH and its upstream remote-tracking branch. But
|
||||
it refuses to update branches in case of non-fast-forward. And it
|
||||
refuses to update the current branch (currently checked out branch,
|
||||
where HEAD is pointing to).
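
The difference between the two forms of ``git fetch`` can be observed in
a throwaway setup. This is a sketch under stated assumptions: git is
installed, and ``seed``, ``upstream.git`` and the scratch directory are
hypothetical local stand-ins for a real remote. The current branch in
the clone is ``master``, so the local ``v1`` may be updated by the
second form.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"

# Stand-in upstream with branches master and v1.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
echo one > README
git add README
git commit -q -m "Initial commit"
git branch v1
cd ..
git clone -q --bare seed upstream.git
git clone -q upstream.git python
(cd python && git branch v1 origin/v1 >/dev/null 2>&1)

# Publish a new commit on the upstream v1.
cd seed
git checkout -q v1
echo two >> README
git commit -q -a -m "Second commit on v1"
git push -q ../upstream.git v1
new_head=$(git rev-parse v1)

cd ../python
old_local=$(git rev-parse v1)

# Form 1: commits arrive and FETCH_HEAD is set; local v1 does not move.
git fetch -q origin v1
fetch_head=$(git rev-parse FETCH_HEAD)
local_after_plain=$(git rev-parse v1)

# Form 2: the local branch v1 is fast-forwarded as well.
git fetch -q origin v1:v1
local_after_refspec=$(git rev-parse v1)

echo "after plain fetch:   v1 = $local_after_plain"
echo "after refspec fetch: v1 = $local_after_refspec"
```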
|
||||
|
||||
The first command is used internally by ``git pull``.
|
||||
|
||||
::
|
||||
|
||||
$ git pull $REMOTE $BRANCH
|
||||
|
||||
is equivalent to
|
||||
|
||||
::
|
||||
|
||||
$ git fetch $REMOTE $BRANCH
|
||||
$ git merge FETCH_HEAD
|
||||
|
||||
Certainly, $BRANCH in that case should be your current branch. If you
|
||||
want to merge a different branch into your current branch first update
|
||||
that non-current branch and then merge::
|
||||
|
||||
$ git fetch origin v1:v1 # Update v1
|
||||
$ git pull --rebase origin master # Update the current branch master
|
||||
# using rebase instead of merge
|
||||
$ git merge v1
|
||||
|
||||
If you have not yet pushed commits on ``v1``, though, the scenario has
|
||||
to become a bit more complex. Git refuses to update
|
||||
a non-fast-forwardable branch, and you don't want to do a force-pull
|
||||
because that would remove your non-pushed commits and you would need
|
||||
to recover. So you want to rebase ``v1`` but you cannot rebase
|
||||
a non-current branch. Hence, check out ``v1`` and rebase it before
|
||||
merging::
|
||||
|
||||
$ git checkout v1
|
||||
$ git pull --rebase origin v1
|
||||
$ git checkout master
|
||||
$ git pull --rebase origin master
|
||||
$ git merge v1
|
||||
|
||||
It is possible to configure git to make it fetch/pull a few branches
|
||||
or all branches at once, so you can simply run
|
||||
|
||||
::
|
||||
|
||||
$ git pull origin
|
||||
|
||||
or even
|
||||
|
||||
::
|
||||
|
||||
$ git pull
|
||||
|
||||
The default remote repository for fetching/pulling is ``origin``. The default
|
||||
set of references to fetch is calculated using matching algorithm: git
|
||||
fetches all branches having the same name on both ends.
|
||||
|
||||
|
||||
Push
|
||||
''''
|
||||
|
||||
Pushing is a bit simpler. There is only one command ``push``. When you
|
||||
run
|
||||
|
||||
::
|
||||
|
||||
$ git push origin v1 master
|
||||
|
||||
git pushes local v1 to remote v1 and local master to remote master.
|
||||
The same as::
|
||||
|
||||
$ git push origin v1:v1 master:master
|
||||
|
||||
Git pushes commits to the remote repo and updates remote-tracking
|
||||
branches. Git refuses to push commits that aren't fast-forwardable.
|
||||
You can force-push anyway, but please remember - you can force-push to
|
||||
your own repositories but don't force-push to public or shared repos.
|
||||
If you find git refuses to push commits that aren't fast-forwardable,
|
||||
better fetch and merge commits from the remote repo (or rebase your
|
||||
commits on top of the fetched commits), then push. Only force-push if
|
||||
you know what you are doing and why. See the section `Commit
|
||||
editing and caveats`_ below.
|
||||
|
||||
It is possible to configure git to make it push a few branches or all
|
||||
branches at once, so you can simply run
|
||||
|
||||
::
|
||||
|
||||
$ git push origin
|
||||
|
||||
or even
|
||||
|
||||
::
|
||||
|
||||
$ git push
|
||||
|
||||
The default remote repository for pushing is ``origin``. The default set of
|
||||
references to push in git before 2.0 is calculated using matching
|
||||
algorithm: git pushes all branches having the same name on both ends.
|
||||
Default set of references to push in git 2.0+ is calculated using
|
||||
simple algorithm: git pushes the current branch back to its
|
||||
@{upstream}.
|
||||
|
||||
To configure git before 2.0 to the new behaviour run::
|
||||
|
||||
$ git config push.default simple
|
||||
|
||||
To configure git 2.0+ to the old behaviour run::
|
||||
|
||||
$ git config push.default matching
|
||||
|
||||
Git doesn't allow you to push a branch if it's the current branch in the
|
||||
remote non-bare repository: git refuses to update remote working
|
||||
directory. You really should push only to bare repositories. For
|
||||
non-bare repositories git prefers a pull-based workflow.
|
||||
|
||||
When you want to deploy code on a remote host and can only use push
|
||||
(because your workstation is behind a firewall and you cannot pull
|
||||
from it) you do that in two steps using two repositories: you push
|
||||
from the workstation to a bare repo on the remote host, ssh to the
|
||||
remote host and pull from the bare repo to a non-bare deployment repo.
|
||||
|
||||
That changed in git 2.3, but see `the blog post
|
||||
<https://github.com/blog/1957-git-2-3-has-been-released#push-to-deploy>`_
|
||||
for caveats; in 2.4 the push-to-deploy feature was `further improved
|
||||
<https://github.com/blog/1994-git-2-4-atomic-pushes-push-to-deploy-and-more#push-to-deploy-improvements>`_.
|
||||
|
||||
|
||||
Tags
|
||||
''''
|
||||
|
||||
Git automatically fetches tags that point to commits being fetched
|
||||
during fetch/pull. To fetch all tags (and commits they point to) run
|
||||
``git fetch --tags origin``. To fetch some specific tags fetch them
|
||||
explicitly::
|
||||
|
||||
$ git fetch origin tag $TAG1 tag $TAG2...
|
||||
|
||||
For example::
|
||||
|
||||
$ git fetch origin tag 1.4.2
|
||||
$ git fetch origin v1:v1 tag 2.1.7
|
||||
|
||||
Git doesn't automatically push tags. That allows you to have private
|
||||
tags. To push tags list them explicitly::
|
||||
|
||||
$ git push origin tag 1.4.2
|
||||
$ git push origin v1 master tag 2.1.7
|
||||
|
||||
Or push all tags at once::
|
||||
|
||||
$ git push --tags origin
|
||||
|
||||
Don't move tags with ``git tag -f`` or remove tags with ``git tag -d``
|
||||
after they have been published.
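
That tags stay private until they are pushed explicitly can be checked
with ``git ls-remote``. The sketch below assumes git is installed; the
scratch directory and the bare repository ``upstream.git`` are
hypothetical stand-ins for a real remote.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q --bare upstream.git        # stand-in for the remote repo
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/upstream.git"

echo one > README
git add README
git commit -q -m "Initial commit"
git tag 1.4.2                          # a local (so far private) tag

# Pushing the branch does not publish the tag.
git push -q origin master
tags_after_branch_push=$(git ls-remote --tags origin)

# Pushing the tag explicitly does.
git push -q origin tag 1.4.2
tags_after_tag_push=$(git ls-remote --tags origin | cut -f 2)

echo "remote tags after branch push: '$tags_after_branch_push'"
echo "remote tags after tag push:    '$tags_after_tag_push'"
```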
|
||||
|
||||
|
||||
Private information
|
||||
'''''''''''''''''''
|
||||
|
||||
When cloning/fetching/pulling/pushing git copies only database objects
|
||||
(commits, trees, files and tags) and symbolic references (branches and
|
||||
lightweight tags). Everything else is private to the repository and
|
||||
never cloned, updated or pushed. It's your config, your hooks, your
|
||||
private exclude file.
|
||||
|
||||
If you want to distribute hooks, copy them to the working tree, add,
|
||||
commit, push and instruct the team to update and install the hooks
|
||||
manually.
|
||||
|
||||
|
||||
Commit editing and caveats
|
||||
==========================
|
||||
|
||||
A warning not to edit published (pushed) commits also appears in
|
||||
documentation but it's repeated here anyway as it's very important.
|
||||
|
||||
It is possible to recover from a forced push but it's a PITA for the
|
||||
entire team. Please avoid it.
|
||||
|
||||
To see what commits have not been published yet compare the head of the
|
||||
branch with its upstream remote-tracking branch::
|
||||
|
||||
$ git log origin/master.. # from origin/master to HEAD (of master)
|
||||
$ git log origin/v1..v1 # from origin/v1 to the head of v1
|
||||
|
||||
For every branch that has an upstream remote-tracking branch git
|
||||
maintains an alias @{upstream} (short version @{u}), so the commands
|
||||
above can be given as::
|
||||
|
||||
$ git log @{u}..
|
||||
$ git log v1@{u}..v1
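
The same range can be counted instead of listed, which is handy in
scripts. A minimal sketch, assuming git is installed; the scratch
directory and the bare ``upstream.git`` are hypothetical stand-ins for
the real ``origin``.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q --bare upstream.git        # stand-in for the remote repo
git init -q python
cd python
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git remote add origin "$work/upstream.git"

echo one > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master           # -u records origin/master as upstream

# Create a commit that is not published yet.
echo two >> README
git commit -q -a -m "Local, unpublished commit"

git log --oneline "@{u}.."             # list the unpublished commits
unpublished=$(git rev-list --count "@{u}..")

git push -q origin master
after_push=$(git rev-list --count "@{u}..")

echo "unpublished before push: $unpublished, after push: $after_push"
```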
|
||||
|
||||
To see the status of all branches::
|
||||
|
||||
$ git branch -avv
|
||||
|
||||
To compare the status of local branches with a remote repo::
|
||||
|
||||
$ git remote show origin
|
||||
|
||||
Read `how to recover from upstream rebase
|
||||
<https://git-scm.com/docs/git-rebase#_recovering_from_upstream_rebase>`_.
|
||||
It is in ``git help rebase``.
|
||||
|
||||
On the other hand don't be too afraid about commit editing. You can
|
||||
safely edit, reorder, remove, combine and split commits that haven't
|
||||
been pushed yet. You can even push commits to your own (backup) repo,
|
||||
edit them later and force-push edited commits to replace what has
|
||||
already been pushed. Not a problem until commits are in a public
|
||||
or shared repository.
|
||||
|
||||
|
||||
Undo
|
||||
====
|
||||
|
||||
Whatever you do, don't panic. Almost anything in git can be undone.
|
||||
|
||||
|
||||
git checkout: restore file's content
|
||||
------------------------------------
|
||||
|
||||
``git checkout``, for example, can be used to restore the content of
|
||||
file(s) to that of a given commit. Like this::
|
||||
|
||||
$ git checkout HEAD~ README
|
||||
|
||||
The command restores the contents of the README file to the last but one
|
||||
commit in the current branch. By default the commit ID is simply HEAD;
|
||||
i.e. ``git checkout README`` restores README to the latest commit.
|
||||
|
||||
(Do not use ``git checkout`` to view the content of a file in a commit,
|
||||
use ``git cat-file -p``; e.g. ``git cat-file -p HEAD~:path/to/README``).
|
||||
|
||||
|
||||
git reset: remove (non-pushed) commits
|
||||
--------------------------------------
|
||||
|
||||
``git reset`` moves the head of the current branch. The head can be
|
||||
moved to point to any commit but it's often used to remove a commit or
|
||||
a few (preferably, non-pushed ones) from the top of the branch - that
|
||||
is, to move the branch backward in order to undo a few (non-pushed)
|
||||
commits.
|
||||
|
||||
``git reset`` has three modes of operation - soft, hard and mixed.
|
||||
Default is mixed. ProGit `explains
|
||||
<https://git-scm.com/book/en/Git-Tools-Reset-Demystified>`_ the
|
||||
difference very clearly. Bare repositories don't have indices or
|
||||
working trees so in a bare repo only soft reset is possible.
|
||||
|
||||
|
||||
Unstaging
|
||||
'''''''''
|
||||
|
||||
Mixed mode reset with a path or paths can be used to unstage changes -
|
||||
that is, to remove from index changes added with ``git add`` for
|
||||
committing. See `The Book
|
||||
<https://git-scm.com/book/en/Git-Basics-Undoing-Things>`_ for details
|
||||
about unstaging and other undo tricks.
|
||||
|
||||
|
||||
git reflog: reference log
|
||||
-------------------------
|
||||
|
||||
Removing commits with ``git reset`` or moving the head of a branch
|
||||
sounds dangerous and it is. But there is a way to undo: another
|
||||
reset back to the original commit. Git doesn't remove commits
|
||||
immediately; unreferenced commits (in git terminology they are called
|
||||
"dangling commits") stay in the database for some time (default is two
|
||||
weeks) so you can reset back to it or create a new branch pointing to
|
||||
the original commit.
|
||||
|
||||
For every move of a branch's head - with ``git commit``, ``git
|
||||
checkout``, ``git fetch``, ``git pull``, ``git rebase``, ``git reset``
|
||||
and so on - git stores a reference log (reflog for short). For every
|
||||
move git stores where the head was. Command ``git reflog`` can be used
|
||||
to view (and manipulate) the log.
|
||||
|
||||
In addition to the moves of the head of every branch git stores the
|
||||
moves of the HEAD - a symbolic reference that (usually) names the
|
||||
current branch. HEAD is changed with ``git checkout $BRANCH``.
|
||||
|
||||
By default ``git reflog`` shows the moves of the HEAD, i.e. the
|
||||
command is equivalent to ``git reflog HEAD``. To show the moves of the
|
||||
head of a branch use the command ``git reflog $BRANCH``.
|
||||
|
||||
So to undo a ``git reset`` look up the original commit in ``git
|
||||
reflog``, verify it with ``git show`` or ``git log`` and run ``git
|
||||
reset $COMMIT_ID``. Git stores the move of the branch's head in
|
||||
reflog, so you can undo that undo later again.
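
The undo described above can be tried safely in a throwaway repository.
This is only a sketch: it assumes git is installed, the scratch
directory and file names are hypothetical, and it uses the reflog
shorthand ``master@{1}`` (the previous position of the head of
``master``) in place of a commit ID looked up by hand.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org

echo one > README
git add README
git commit -q -m "First commit"
echo two >> README
git commit -q -a -m "Second commit"
original=$(git rev-parse HEAD)

# Move the head of master one commit back (mixed reset, the default).
git reset -q HEAD~
after_reset=$(git rev-parse HEAD)

# The reflog of the branch still remembers where the head was.
git reflog master

# Undo the reset: move the head back to its previous position.
git reset -q "master@{1}"
restored=$(git rev-parse HEAD)
status_after=$(git status --porcelain)

echo "original=$original restored=$restored"
```

Because both resets are mixed, the working tree is never touched, so
after the second reset the repository is clean again.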
|
||||
|
||||
In a more complex situation you'd want to move some commits along with
|
||||
resetting the head of the branch. Cherry-pick them to the new branch.
|
||||
For example, if you want to reset the branch ``master`` back to the
|
||||
original commit but preserve two commits created in the current branch
|
||||
do something like::
|
||||
|
||||
$ git branch save-master # create a new branch saving master
|
||||
$ git reflog # find the original place of master
|
||||
$ git reset $COMMIT_ID
|
||||
$ git cherry-pick save-master~ save-master
|
||||
$ git branch -D save-master # remove temporary branch
|
||||
|
||||
|
||||
git revert: revert a commit
|
||||
---------------------------
|
||||
|
||||
``git revert`` reverts a commit or commits, that is, it creates a new
|
||||
commit or commits that revert(s) the effects of the given commits.
|
||||
It's the only way to undo published commits (``git commit --amend``,
|
||||
``git rebase`` and ``git reset`` change the branch in
|
||||
non-fast-forwardable ways so they should only be used for non-pushed
|
||||
commits.)
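
A small sketch of the effect, assuming git is installed (the scratch
directory and file contents are hypothetical): the reverted commit stays
in the history, and a new commit on top of it restores the old content.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org

echo one > README
git add README
git commit -q -m "First commit"
echo two >> README
git commit -q -a -m "Second commit"

# Create a third commit that undoes the second one.
git revert --no-edit HEAD >/dev/null

commits=$(git rev-list --count HEAD)   # history only grows
content=$(cat README)                  # content is back to the first commit

git log --oneline
```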
|
||||
|
||||
There is a problem with reverting a merge commit. ``git revert`` can
|
||||
undo the code created by the merge commit but it cannot undo the fact
|
||||
of merge. See the discussion `How to revert a faulty merge
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/howto/revert-a-faulty-merge.html>`_.
|
||||
|
||||
|
||||
One thing that cannot be undone
|
||||
-------------------------------
|
||||
|
||||
Whatever you undo, there is one thing that cannot be undone -
|
||||
overwritten uncommitted changes. Uncommitted changes don't belong to
|
||||
git so git cannot help preserve them.
|
||||
|
||||
Most of the time git warns you when you're going to execute a command
|
||||
that overwrites uncommitted changes. Git doesn't allow you to switch
|
||||
branches with ``git checkout`` when that would overwrite uncommitted
changes. It stops you when you're going to
|
||||
rebase with a non-clean working tree. It refuses to pull new commits
|
||||
over non-committed files.
|
||||
|
||||
But there are commands that do exactly that - overwrite files in the
|
||||
working tree. Commands like ``git checkout $PATHs`` or ``git reset
|
||||
--hard`` silently overwrite files including your uncommitted changes.
|
||||
|
||||
With that in mind you can understand the stance "commit early, commit
|
||||
often". Commit as often as possible. Commit on every save in your
|
||||
editor or IDE. You can edit your commits before pushing - edit commit
|
||||
messages, change commits, reorder, combine, split, remove. But save
|
||||
your changes in git database, either commit changes or at least stash
|
||||
them with ``git stash``.
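
The sketch below shows why stashing helps. It assumes git is installed,
and the scratch directory and file contents are hypothetical. The
uncommitted line survives a ``git reset --hard`` only because it was
stored in the git database first.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org

echo one > README
git add README
git commit -q -m "First commit"

echo "work in progress" >> README      # an uncommitted change

git stash -q                           # save it in the git database
git reset -q --hard                    # would silently destroy unsaved changes
git stash pop -q                       # bring the saved change back

last_line=$(tail -n 1 README)
echo "last line of README: $last_line"
```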
|
||||
|
||||
|
||||
Merge or rebase?
|
||||
================
|
||||
|
||||
The Internet is full of heated discussions on the topic: "merge or
|
||||
rebase?" Most of them are meaningless. When a DVCS is being used in a
|
||||
big team with a big and complex project with many branches there is
|
||||
simply no way to avoid merges. So the question is reduced to
|
||||
"whether to use rebase, and if yes - when to use rebase?" Considering
|
||||
that it is very much recommended not to rebase published commits the
|
||||
question is reduced even further: "whether to use rebase on
|
||||
non-pushed commits?"
|
||||
|
||||
That small question is for the team to decide. The author of the PEP
|
||||
recommends using rebase when pulling, i.e. always do ``git pull
|
||||
--rebase`` or even configure automatic setup of rebase for every new
|
||||
branch::
|
||||
|
||||
$ git config branch.autosetuprebase always
|
||||
|
||||
and configure rebase for existing branches::
|
||||
|
||||
$ git config branch.$NAME.rebase true
|
||||
|
||||
For example::
|
||||
|
||||
$ git config branch.v1.rebase true
|
||||
$ git config branch.master.rebase true
|
||||
|
||||
After that ``git pull origin master`` becomes equivalent to ``git pull
|
||||
--rebase origin master``.
|
||||
|
||||
It is recommended to create new commits in a separate feature or topic
|
||||
branch while using rebase to update the mainline branch. When the
|
||||
topic branch is ready merge it into mainline. To avoid a tedious task
|
||||
of resolving large number of conflicts at once you can merge the topic
|
||||
branch to the mainline from time to time and switch back to the topic
|
||||
branch to continue working on it. The entire workflow would be
|
||||
something like::
|
||||
|
||||
$ git checkout -b issue-42 # create a new issue branch and switch to it
|
||||
...edit/test/commit...
|
||||
$ git checkout master
|
||||
$ git pull --rebase origin master # update master from the upstream
|
||||
$ git merge issue-42
|
||||
$ git branch -d issue-42 # delete the topic branch
|
||||
$ git push origin master
|
||||
|
||||
When the topic branch is deleted only the label is removed, commits
|
||||
stay in the database and are now merged into master::
|
||||
|
||||
    o--o--o--o--o--M--< master - the mainline branch
        \         /
         --*--*--*             - the topic branch, now unnamed
|
||||
|
||||
The topic branch is deleted to avoid cluttering branch namespace with
|
||||
small topic branches. Information on what issue was fixed or what
|
||||
feature was implemented should be in the commit messages.
|
||||
|
||||
|
||||
Null-merges
|
||||
===========
|
||||
|
||||
Git has a builtin merge strategy for what Python core developers call
|
||||
"null-merge"::
|
||||
|
||||
$ git merge -s ours v1 # null-merge v1 into master
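
What the ``ours`` strategy does can be verified by comparing trees. A
minimal sketch, assuming git is installed; the scratch directory, file
names and commit messages are hypothetical. The merge commit records
``v1`` as a parent, but the content of ``master`` does not change.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org

echo base > README
git add README
git commit -q -m "Initial commit"

# A change that belongs to v1 only.
git checkout -q -b v1
echo "fix for v1 only" > fix.txt
git add fix.txt
git commit -q -m "Fix that must not reach master"

# Independent work on master.
git checkout -q master
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Work on master"

tree_before=$(git rev-parse "HEAD^{tree}")
git merge -q -s ours -m "Null-merge v1 into master" v1
tree_after=$(git rev-parse "HEAD^{tree}")

second_parent=$(git rev-parse "HEAD^2")
v1_head=$(git rev-parse v1)

echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```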
|
||||
|
||||
|
||||
Branching models
|
||||
================
|
||||
|
||||
Git doesn't assume any particular development model regarding
|
||||
branching and merging. Some projects prefer to graduate patches from
|
||||
the oldest branch to the newest, some prefer to cherry-pick commits
|
||||
backwards, some use squashing (combining a number of commits into
|
||||
one). Anything is possible.
|
||||
|
||||
There are a few examples to start with. `git help workflows
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html>`_
|
||||
describes how the git authors themselves develop git.
|
||||
|
||||
The ProGit book has a few chapters devoted to branch management in
|
||||
different projects: `Git Branching - Branching Workflows
|
||||
<https://git-scm.com/book/en/Git-Branching-Branching-Workflows>`_ and
|
||||
`Distributed Git - Contributing to a Project
|
||||
<https://git-scm.com/book/en/Distributed-Git-Contributing-to-a-Project>`_.
|
||||
|
||||
There is also a well-known article `A successful Git branching model
|
||||
<http://nvie.com/posts/a-successful-git-branching-model/>`_ by Vincent
|
||||
Driessen. It recommends a set of very detailed rules on creating and
|
||||
managing mainline, topic and bugfix branches. To support the model the
|
||||
author implemented the `git flow <https://github.com/nvie/gitflow>`_
|
||||
extension.
|
||||
|
||||
|
||||
Advanced configuration
|
||||
======================
|
||||
|
||||
Line endings
|
||||
------------
|
||||
|
||||
Git has builtin mechanisms to handle line endings between platforms
|
||||
with different end-of-line styles. To allow git to do CRLF conversion
|
||||
assign ``text`` attribute to files using `.gitattributes
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html>`_.
|
||||
For files that have to have specific line endings assign ``eol``
|
||||
attribute. For binary files the attribute is, naturally, ``binary``.
|
||||
|
||||
For example::
|
||||
|
||||
$ cat .gitattributes
|
||||
*.py text
|
||||
*.txt text
|
||||
*.png binary
|
||||
/readme.txt eol=crlf
|
||||
|
||||
To check what attributes git uses for files use ``git check-attr``
|
||||
command. For example::
|
||||
|
||||
$ git check-attr -a -- \*.py
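
``git check-attr`` also works for paths that don't exist yet, so an
attributes file can be tested before any files are added. The sketch
below assumes git is installed; the scratch directory and the file names
``example.py`` and ``logo.png`` are hypothetical. The ``eol`` value is
written in lowercase, the form git recognizes.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo

cat > .gitattributes <<'EOF'
*.py text
*.txt text
*.png binary
/readme.txt eol=crlf
EOF

# Query single attributes for paths matched by the patterns.
py_text=$(git check-attr text -- example.py)
readme_eol=$(git check-attr eol -- readme.txt)

# List all attributes that apply to a binary file.
git check-attr -a -- logo.png

echo "$py_text"
echo "$readme_eol"
```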
|
||||
|
||||
|
||||
Advanced topics
|
||||
===============
|
||||
|
||||
Staging area
|
||||
------------
|
||||
|
||||
Staging area aka index aka cache is a distinguishing feature of git.
|
||||
Staging area is where git collects patches before committing them.
|
||||
Separation between collecting patches and commit phases provides a
|
||||
very useful feature of git: you can review collected patches before
|
||||
commit and even edit them - remove some hunks, add new hunks and
|
||||
review again.
|
||||
|
||||
To add files to the index use ``git add``. Collecting patches before
|
||||
committing means you need to do that for every change, not only to add
|
||||
new (untracked) files. To simplify committing in case you just want to
|
||||
commit everything without reviewing run ``git commit --all`` (or just
|
||||
``-a``) - the command adds every changed tracked file to the index and
|
||||
then commits. To commit a file or files regardless of patches collected
|
||||
in the index run ``git commit [--only|-o] -- $FILE...``.
|
||||
|
||||
To add hunks of patches to the index use ``git add --patch`` (or just
|
||||
``-p``). To remove collected files from the index use ``git reset HEAD
|
||||
-- $FILE...``. To add/inspect/remove collected hunks use ``git add
|
||||
--interactive`` (``-i``).
|
||||
|
||||
To see the diff between the index and the last commit (i.e., collected
|
||||
patches) use ``git diff --cached``. To see the diff between the
|
||||
working tree and the index (i.e., uncollected patches) use just ``git
|
||||
diff``. To see the diff between the working tree and the last commit
|
||||
(i.e., both collected and uncollected patches) run ``git diff HEAD``.
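
The three diffs can be told apart with one staged and one unstaged
change to the same file. A minimal sketch, assuming git is installed;
the scratch directory and file contents are hypothetical. Added lines
are counted with ``grep``, skipping the ``+++`` file header.

```shell
set -e
work=$(mktemp -d)                      # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org

echo one > README
git add README
git commit -q -m "First commit"

echo staged >> README
git add README                         # collected in the index
echo unstaged >> README                # left in the working tree only

# Count added lines in each diff.
in_index=$(git diff --cached | grep -c '^+[^+]')
in_worktree=$(git diff | grep -c '^+[^+]')
in_both=$(git diff HEAD | grep -c '^+[^+]')

echo "index vs last commit:        $in_index added line(s)"
echo "working tree vs index:       $in_worktree added line(s)"
echo "working tree vs last commit: $in_both added line(s)"
```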
|
||||
|
||||
See `WhatIsTheIndex
|
||||
<https://git.wiki.kernel.org/index.php/WhatIsTheIndex>`_ and
|
||||
`IndexCommandQuickref
|
||||
<https://git.wiki.kernel.org/index.php/IndexCommandQuickref>`_ in Git
|
||||
Wiki.
|
||||
|
||||
|
||||
ReReRe
|
||||
======
|
||||
|
||||
Rerere is a mechanism that helps to resolve repeated merge conflicts.
|
||||
The most frequent source of recurring merge conflicts is topic
|
||||
branches that are merged into mainline and then the merge commits are
|
||||
removed; that's often performed to test the topic branches and train
|
||||
rerere; merge commits are removed to have clean linear history and
|
||||
finish the topic branch with only one last merge commit.
|
||||
|
||||
Rerere works by remembering the states of the tree before and after a
|
||||
successful commit. That way rerere can automatically resolve conflicts
|
||||
if they appear in the same files.
|
||||
|
||||
Rerere can be used manually with ``git rerere`` command but most often
|
||||
it's used automatically. Enable rerere with these commands in a
|
||||
working tree::
|
||||
|
||||
$ git config rerere.enabled true
|
||||
$ git config rerere.autoupdate true
|
||||
|
||||
You don't need to turn rerere on globally - you don't want rerere in
|
||||
bare repositories or single-branche repositories; you only need rerere
|
||||
in repos where you often perform merges and resolve merge conflicts.
|
||||
|
||||
See `Rerere <https://git-scm.com/book/en/Git-Tools-Rerere>`_ in The
|
||||
Book.
|
||||
|
||||
|
||||
Database maintenance
|
||||
====================
|
||||
|
||||
Git object database and other files/directories under ``.git`` require
|
||||
periodic maintenance and cleanup. For example, commit editing left
|
||||
unreferenced objects (dangling objects, in git terminology) and these
|
||||
objects should be pruned to avoid collecting cruft in the DB. The
|
||||
command ``git gc`` is used for maintenance. Git automatically runs
|
||||
``git gc --auto`` as a part of some commands to do quick maintenance.
|
||||
Users are recommended to run ``git gc --aggressive`` from time to
|
||||
time; ``git help gc`` recommends to run it every few hundred
|
||||
changesets; for more intensive projects it should be something like
|
||||
once a week and less frequently (biweekly or monthly) for lesser
|
||||
active projects.
|
||||
|
||||
``git gc --aggressive`` not only removes dangling objects, it also
|
||||
repacks object database into indexed and better optimized pack(s); it
|
||||
also packs symbolic references (branches and tags). Another way to do
|
||||
it is to run ``git repack``.
|
||||
|
||||
There is a well-known `message
|
||||
<https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html>`_ from Linus
|
||||
Torvalds regarding "stupidity" of ``git gc --aggressive``. The message
|
||||
can safely be ignored now. It is old and outdated, ``git gc
|
||||
--aggressive`` became much better since that time.
|
||||
|
||||
For those who still prefer ``git repack`` over ``git gc --aggressive``
|
||||
the recommended parameters are ``git repack -a -d -f --depth=20
|
||||
--window=250``. See `this detailed experiment
|
||||
<http://vcscompare.blogspot.ru/2008/06/git-repack-parameters.html>`_
|
||||
for explanation of the effects of these parameters.
|
||||
|
||||
From time to time run ``git fsck [--strict]`` to verify integrity of
|
||||
the database. ``git fsck`` may produce a list of dangling objects;
|
||||
that's not an error, just a reminder to perform regular maintenance.
|
||||
|
||||
|
||||
Tips and tricks
|
||||
===============
|
||||
|
||||
Command-line options and arguments
|
||||
----------------------------------
|
||||
|
||||
`git help cli
|
||||
<https://www.kernel.org/pub/software/scm/git/docs/gitcli.html>`_
|
||||
recommends not to combine short options/flags. Most of the times
|
||||
combining works: ``git commit -av`` works perfectly, but there are
|
||||
situations when it doesn't. E.g., ``git log -p -5`` cannot be combined
|
||||
as ``git log -p5``.
|
||||
|
||||
Some options have arguments, some even have default arguments. In that
|
||||
case the argument for such option must be spelled in a sticky way:
|
||||
``-Oarg``, never ``-O arg`` because for an option that has a default
|
||||
argument the latter means "use default value for option ``-O`` and
|
||||
pass ``arg`` further to the option parser". For example, ``git grep``
|
||||
has an option ``-O`` that passes a list of names of the found files to
|
||||
a program; default program for ``-O`` is a pager (usually ``less``),
|
||||
but you can use your editor::
|
||||
|
||||
$ git grep -Ovim # but not -O vim
|
||||
|
||||
BTW, if git is instructed to use ``less`` as the pager (i.e., if pager
|
||||
is not configured in git at all it uses ``less`` by default, or if it
|
||||
gets ``less`` from GIT_PAGER or PAGER environment variables, or if it
|
||||
was configured with ``git config --global core.pager less``, or
|
||||
``less`` is used in the command ``git grep -Oless``) ``git grep``
|
||||
passes ``+/$pattern`` option to ``less`` which is quite convenient.
|
||||
Unfortunately, ``git grep`` doesn't pass the pattern if the pager is
|
||||
not exactly ``less``, even if it's ``less`` with parameters (something
|
||||
like ``git config --global core.pager less -FRSXgimq``); fortunately,
|
||||
``git grep -Oless`` always passes the pattern.
|
||||
|
||||
|
||||
bash/zsh completion
|
||||
-------------------
|
||||
|
||||
It's a bit hard to type ``git rebase --interactive --preserve-merges
|
||||
HEAD~5`` manually even for those who are happy to use command-line,
|
||||
and this is where shell completion is of great help. Bash/zsh come
|
||||
with programmable completion, often automatically installed and
|
||||
enabled, so if you have bash/zsh and git installed, chances are you
|
||||
are already done - just go and use it at the command-line.
|
||||
|
||||
If you don't have necessary bits installed, install and enable
|
||||
bash_completion package. If you want to upgrade your git completion to
|
||||
the latest and greatest download necessary file from `git contrib
|
||||
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion>`_.
|
||||
|
||||
Git-for-windows comes with git-bash for which bash completion is
|
||||
installed and enabled.
|
||||
|
||||
|
||||
bash/zsh prompt
|
||||
---------------
|
||||
|
||||
For command-line lovers shell prompt can carry a lot of useful
|
||||
information. To include git information in the prompt use
|
||||
`git-prompt.sh
|
||||
<https://git.kernel.org/cgit/git/git.git/tree/contrib/completion/git-prompt.sh>`_.
|
||||
Read the detailed instructions in the file.
|
||||
|
||||
Search the Net for "git prompt" to find other prompt variants.
|
||||
|
||||
|
||||
git on server
|
||||
=============
|
||||
|
||||
The simplest way to publish a repository or a group of repositories is
|
||||
``git daemon``. The daemon provides anonymous access, by default it is
|
||||
read-only. The repositories are accessible by git protocol (git://
|
||||
URLs). Write access can be enabled but the protocol lacks any
|
||||
authentication means, so it should be enabled only within a trusted
|
||||
LAN. See ``git help daemon`` for details.
|
||||
|
||||
Git over ssh provides authentication and repo-level authorisation as
|
||||
repositories can be made user- or group-writeable (see parameter
|
||||
``core.sharedRepository`` in ``git help config``). If that's too
|
||||
permissive or too restrictive for some project's needs there is a
|
||||
wrapper `gitolite <http://gitolite.com/gitolite/index.html>`_ that can
|
||||
be configured to allow access with great granularity; gitolite is
|
||||
written in Perl and has a lot of documentation.
|
||||
|
||||
Web interface to browse repositories can be created using `gitweb
|
||||
<https://git.kernel.org/cgit/git/git.git/tree/gitweb>`_ or `cgit
|
||||
<http://git.zx2c4.com/cgit/about/>`_. Both are CGI scripts (written in
|
||||
Perl and C). In addition to web interface both provide read-only dumb
|
||||
http access for git (http(s):// URLs).
|
||||
|
||||
There are also more advanced web-based development environments that
|
||||
include ability to manage users, groups and projects; private,
|
||||
group-accessible and public repositories; they often include issue
|
||||
trackers, wiki pages, pull requests and other tools for development
|
||||
and communication. Among these environments are `Kallithea
|
||||
<https://kallithea-scm.org/>`_ and `pagure <https://pagure.io/>`_,
|
||||
both are written in Python; pagure was written by Fedora developers
|
||||
and is being used to develop some Fedora projects. `Gogs
|
||||
<http://gogs.io/>`_ is written in Go; there is a fork `Gitea
|
||||
<http://gitea.io/>`_.
|
||||
|
||||
And last but not least, `Gitlab <https://about.gitlab.com/>`_. It's
|
||||
perhaps the most advanced web-based development environment for git.
|
||||
Written in Ruby, community edition is free and open source (MIT
|
||||
license).
|
||||
|
||||
|
||||
From Mercurial to git
|
||||
=====================
|
||||
|
||||
There are many tools to convert Mercurial repositories to git. The
|
||||
most famous are, probably, `hg-git <https://hg-git.github.io/>`_ and
|
||||
`fast-export <http://repo.or.cz/w/fast-export.git>`_ (many years ago
|
||||
it was known under the name ``hg2git``).
|
||||
|
||||
But a better tool, perhaps the best, is `git-remote-hg
|
||||
<https://github.com/felipec/git-remote-hg>`_. It provides transparent
|
||||
bidirectional (pull and push) access to Mercurial repositories from
|
||||
git. Its author wrote a `comparison of alternatives
|
||||
<https://github.com/felipec/git/wiki/Comparison-of-git-remote-hg-alternatives>`_
|
||||
that seems to be mostly objective.
|
||||
|
||||
To use git-remote-hg, install or clone it, add to your PATH (or copy
|
||||
script ``git-remote-hg`` to a directory that's already in PATH) and
|
||||
prepend ``hg::`` to Mercurial URLs. For example::
|
||||
|
||||
$ git clone https://github.com/felipec/git-remote-hg.git
|
||||
$ PATH=$PATH:"`pwd`"/git-remote-hg
|
||||
$ git clone hg::https://hg.python.org/peps/ PEPs
|
||||
|
||||
To work with the repository just use regular git commands including
|
||||
``git fetch/pull/push``.
|
||||
|
||||
To start converting your Mercurial habits to git see the page
|
||||
`Mercurial for Git users
|
||||
<https://mercurial.selenic.com/wiki/GitConcepts>`_ at Mercurial wiki.
|
||||
At the second half of the page there is a table that lists
|
||||
corresponding Mercurial and git commands. Should work perfectly in
|
||||
both directions.
|
||||
|
||||
Python Developer's Guide also has a chapter `Mercurial for git
|
||||
developers <https://docs.python.org/devguide/gitdevs.html>`_ that
|
||||
documents a few differences between git and hg.
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
||||
vim: set fenc=us-ascii tw=70 :
|
|
@ -66,7 +66,7 @@ Features for 3.5
|
|||
* PEP 479, change StopIteration handling inside generators
|
||||
* PEP 484, the typing module, a new standard for type annotations
|
||||
* PEP 485, math.isclose(), a function for testing approximate equality
|
||||
* PEP 486, making the Widnows Python launcher aware of virtual environments
|
||||
* PEP 486, making the Windows Python launcher aware of virtual environments
|
||||
* PEP 488, eliminating .pyo files
|
||||
* PEP 489, a new and improved mechanism for loading extension modules
|
||||
* PEP 492, coroutines with async and await syntax
|
||||
|
|
13
pep-0495.txt
13
pep-0495.txt
|
@ -404,6 +404,19 @@ where ``delta`` is the size of the fold or the gap.
|
|||
Temporal Arithmetic and Comparison Operators
|
||||
============================================
|
||||
|
||||
.. epigraph::
|
||||
|
||||
| In *mathematicks* he was greater
|
||||
| Than Tycho Brahe, or Erra Pater:
|
||||
| For he, by geometric scale,
|
||||
| Could take the size of pots of ale;
|
||||
| Resolve, by sines and tangents straight,
|
||||
| If bread or butter wanted weight,
|
||||
| And wisely tell what hour o' th' day
|
||||
| The clock does strike by algebra.
|
||||
|
||||
-- "Hudibras" by Samuel Butler
|
||||
|
||||
The value of the ``fold`` attribute will be ignored in all operations
|
||||
with naive datetime instances. As a consequence, naive
|
||||
``datetime.datetime`` or ``datetime.time`` instances that differ only
|
||||
|
|
19
pep-0498.txt
19
pep-0498.txt
|
@ -8,7 +8,7 @@ Type: Standards Track
|
|||
Content-Type: text/x-rst
|
||||
Created: 01-Aug-2015
|
||||
Python-Version: 3.6
|
||||
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015
|
||||
Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015
|
||||
Resolution: https://mail.python.org/pipermail/python-dev/2015-September/141526.html
|
||||
|
||||
Abstract
|
||||
|
@ -201,6 +201,11 @@ braces ``'{{'`` or ``'}}'`` inside literal portions of an f-string are
|
|||
replaced by the corresponding single brace. Doubled opening braces do
|
||||
not signify the start of an expression.
|
||||
|
||||
Note that ``__format__()`` is not called directly on each value. The
|
||||
actual code uses the equivalent of ``type(value).__format__(value,
|
||||
format_spec)``, or ``format(value, format_spec)``. See the
|
||||
documentation of the builtin ``format()`` function for more details.
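A small sketch may make the distinction concrete. The ``Celsius``
class below is a hypothetical example, not taken from the PEP; it only
illustrates that ``format()`` and f-strings look up ``__format__`` on
the type, so an attribute with the same name set on the instance is
ignored.

```python
class Celsius:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Delegate the numeric part of the spec to float formatting.
        return format(self.degrees, format_spec) + " C"


temp = Celsius(21.5)

# Shadow __format__ on the instance. format() and f-strings ignore it,
# because special methods are looked up on the type.
temp.__format__ = lambda format_spec: "instance override"

via_builtin = format(temp, ".1f")
via_type = type(temp).__format__(temp, ".1f")
via_fstring = f"{temp:.1f}"
via_instance = temp.__format__(".1f")  # direct call sees the override

print(via_builtin, via_type, via_fstring, via_instance, sep=" | ")
```

Only the direct attribute call picks up the instance-level override;
the other three spellings agree with each other.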
|
||||
|
||||
Comments, using the ``'#'`` character, are not allowed inside an
|
||||
expression.
|
||||
|
||||
|
@ -209,7 +214,7 @@ specified. The allowed conversions are ``'!s'``, ``'!r'``, or
|
|||
``'!a'``. These are treated the same as in ``str.format()``: ``'!s'``
|
||||
calls ``str()`` on the expression, ``'!r'`` calls ``repr()`` on the
|
||||
expression, and ``'!a'`` calls ``ascii()`` on the expression. These
|
||||
conversions are applied before the call to ``__format__``. The only
|
||||
conversions are applied before the call to ``format()``. The only
|
||||
reason to use ``'!s'`` is if you want to specify a format specifier
|
||||
that applies to ``str``, not to the type of the expression.
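The ordering can be checked with a short sketch. The values used here
are arbitrary illustrations; ``None`` is chosen because its type
rejects a non-empty format specifier, which is the situation where
``'!s'`` becomes useful.

```python
value = "caf\u00e9"

# '!r' applies repr() first, then the format spec pads the result.
with_repr = f"{value!r:>10}"

# '!a' applies ascii() first, escaping the non-ASCII character.
with_ascii = f"{value!a}"

# '!s' applies str() first, so a str format spec works on any type.
with_str = f"{None!s:>6}"

# Without '!s' the spec goes to the type of the expression itself,
# and NoneType does not accept a non-empty format specifier.
try:
    f"{None:>6}"
    rejected = False
except TypeError:
    rejected = True

print(repr(with_repr), repr(with_ascii), repr(with_str), rejected)
```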
|
||||
|
||||
|
@ -222,9 +227,9 @@ So, an f-string looks like::
|
|||
|
||||
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
|
||||
|
||||
The resulting expression's ``__format__`` method is called with the
|
||||
format specifier as an argument. The resulting value is used when
|
||||
building the value of the f-string.
|
||||
The expression is then formatted using the ``__format__`` protocol,
|
||||
using the format specifier as an argument. The resulting value is
|
||||
used when building the value of the f-string.
|
||||
|
||||
Expressions cannot contain ``':'`` or ``'!'`` outside of strings or
|
||||
parentheses, brackets, or braces. The exception is that the ``'!='``
|
||||
|
@ -293,7 +298,7 @@ For example, this code::
|
|||
|
||||
Might be evaluated as::
|
||||
|
||||
'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(expr3).__format__('') + 'ghi'
|
||||
'abc' + format(expr1, spec1) + format(repr(expr2)) + 'def' + format(str(expr3)) + 'ghi'
|
||||
|
||||
Expression evaluation
|
||||
---------------------
|
||||
|
@ -371,7 +376,7 @@ yields the value::
|
|||
While the exact method of this run time concatenation is unspecified,
|
||||
the above code might evaluate to::
|
||||
|
||||
'ab' + x.__format__('') + '{c}' + 'str<' + y.__format__('^4') + '>de'
|
||||
'ab' + format(x) + '{c}' + 'str<' + format(y, '^4') + '>de'
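As a runnable sketch of this expansion, assume ``x = 10`` and
``y = 'hi'`` (values picked here for illustration). Only the literals
carrying the ``f`` prefix are interpolated; the plain ``'{c}'`` literal
keeps its braces.

```python
x = 10
y = "hi"

# Adjacent literals: plain strings and f-strings are concatenated,
# but only the f-prefixed pieces are interpolated.
joined = "ab" f"{x}" "{c}" f"str<{y:^4}>" "de"

# One possible run-time equivalent, mirroring the expansion above.
expanded = "ab" + format(x) + "{c}" + "str<" + format(y, "^4") + ">de"

print(joined)
print(joined == expanded)
```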
|
||||
|
||||
Each f-string is entirely evaluated before being concatenated to
|
||||
adjacent f-strings. That means that this::
|
||||
|
|
445
pep-0502.txt
445
pep-0502.txt
|
@ -1,43 +1,45 @@
|
|||
PEP: 502
|
||||
Title: String Interpolation Redux
|
||||
Title: String Interpolation - Extended Discussion
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Mike G. Miller
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Type: Informational
|
||||
Content-Type: text/x-rst
|
||||
Created: 10-Aug-2015
|
||||
Python-Version: 3.6
|
||||
|
||||
Note: Open issues below are stated with a question mark (?),
|
||||
and are therefore searchable.
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This proposal describes a new string interpolation feature for Python,
|
||||
called an *expression-string*,
|
||||
that is both concise and powerful,
|
||||
improves readability in most cases,
|
||||
yet does not conflict with existing code.
|
||||
PEP 498: *Literal String Interpolation*, which proposed "formatted strings", was
|
||||
accepted September 9th, 2015.
|
||||
Additional background and rationale given during its design phase is detailed
|
||||
below.
|
||||
|
||||
To recap that PEP,
|
||||
a string prefix was introduced that marks the string as a template to be
|
||||
rendered.
|
||||
These formatted strings may contain one or more expressions
|
||||
built on `the existing syntax`_ of ``str.format()``.
|
||||
The formatted string expands at compile-time into a conventional string format
|
||||
operation,
|
||||
with the given expressions from its text extracted and passed instead as
|
||||
positional arguments.
|
||||
|
||||
To achieve this end,
|
||||
a new string prefix is introduced,
|
||||
which expands at compile-time into an equivalent expression-string object,
|
||||
with requested variables from its context passed as keyword arguments.
|
||||
At runtime,
|
||||
the new object uses these passed values to render a string to given
|
||||
specifications, building on `the existing syntax`_ of ``str.format()``::
|
||||
the resulting expressions are evaluated to render a string to given
|
||||
specifications::
|
||||
|
||||
>>> location = 'World'
|
||||
>>> e'Hello, {location} !' # new prefix: e''
|
||||
>>> f'Hello, {location} !' # new prefix: f''
|
||||
'Hello, World !' # interpolated result
|
||||
|
||||
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
|
||||
Format-strings may be thought of as merely syntactic sugar to simplify traditional
|
||||
calls to ``str.format()``.
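A minimal sketch of that equivalence, reusing the ``location``
variable from the example above (the other names are introduced here
only for comparison):

```python
location = "World"

# The f-string picks the variable up from the enclosing scope.
via_fstring = f"Hello, {location} !"

# The traditional spellings have to pass the value explicitly.
via_keyword = "Hello, {location} !".format(location=location)
via_positional = "Hello, {} !".format(location)

print(via_fstring)
print(via_fstring == via_keyword == via_positional)
```

The three spellings render the same text; the f-string simply avoids
repeating the identifier.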
|
||||
|
||||
This PEP does not recommend to remove or deprecate any of the existing string
|
||||
formatting mechanisms.
|
||||
.. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax
|
||||
|
||||
|
||||
Motivation
|
||||
|
@ -50,12 +52,16 @@ In comparison to other dynamic scripting languages
|
|||
with similar use cases,
|
||||
the amount of code necessary to build similar strings is substantially higher,
|
||||
while at times offering lower readability due to verbosity, dense syntax,
|
||||
or identifier duplication. [1]_
|
||||
or identifier duplication.
|
||||
|
||||
These difficulties are described at moderate length in the original
|
||||
`post to python-ideas`_
|
||||
that started the snowball (that became PEP 498) rolling. [1]_
|
||||
|
||||
Furthermore, replacement of the print statement with the more consistent print
|
||||
function of Python 3 (PEP 3105) has added one additional minor burden,
|
||||
an additional set of parentheses to type and read.
|
||||
Combined with the verbosity of current formatting solutions,
|
||||
Combined with the verbosity of current string formatting solutions,
|
||||
this puts an otherwise simple language at an unfortunate disadvantage to its
|
||||
peers::
|
||||
|
||||
|
@ -66,7 +72,7 @@ peers::
|
|||
# Python 3, str.format with named parameters
|
||||
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
|
||||
|
||||
# Python 3, variation B, worst case
|
||||
# Python 3, worst case
|
||||
print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
|
||||
id=id,
|
||||
hostname=
|
||||
|
@ -74,7 +80,7 @@ peers::
|
|||
|
||||
In Python, the formatting and printing of a string with multiple variables in a
|
||||
single line of code of standard width is noticeably harder and more verbose,
|
||||
indentation often exacerbating the issue.
|
||||
with indentation exacerbating the issue.
|
||||
|
||||
For use cases such as smaller projects, systems programming,
|
||||
shell script replacements, and even one-liners,
|
||||
|
@ -82,36 +88,17 @@ where message formatting complexity has yet to be encapsulated,
|
|||
this verbosity has likely led a significant number of developers and
|
||||
administrators to choose other languages over the years.
|
||||
|
||||
.. _post to python-ideas: https://mail.python.org/pipermail/python-ideas/2015-July/034659.html
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
|
||||
Naming
|
||||
------
|
||||
|
||||
The term expression-string was chosen because other applicable terms,
|
||||
such as format-string and template are already well used in the Python standard
|
||||
library.
|
||||
|
||||
The string prefix itself, ``e''`` was chosen to demonstrate that the
|
||||
specification enables expressions,
|
||||
is not limited to ``str.format()`` syntax,
|
||||
and also does not lend itself to `the shorthand term`_ "f-string".
|
||||
It is also slightly easier to type than other choices such as ``_''`` and
|
||||
``i''``,
|
||||
while perhaps `less odd-looking`_ to C-developers.
|
||||
``printf('')`` vs. ``print(f'')``.
|
||||
|
||||
.. _the shorthand term: reference_needed
|
||||
.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html
|
||||
|
||||
|
||||
|
||||
Goals
|
||||
-------------
|
||||
|
||||
The design goals of expression-strings are as follows:
|
||||
The design goals of format strings are as follows:
|
||||
|
||||
#. Eliminate need to pass variables manually.
|
||||
#. Eliminate repetition of identifiers and redundant parentheses.
|
||||
|
@ -133,40 +120,44 @@ Python specified both single (``'``) and double (``"``) ASCII quote
|
|||
characters to enclose strings.
|
||||
It is not reasonable to choose one of them now to enable interpolation,
|
||||
while leaving the other for uninterpolated strings.
|
||||
"Backtick" characters (`````) are also `constrained by history`_ as a shortcut
|
||||
for ``repr()``.
|
||||
Other characters,
|
||||
such as the "Backtick" (or grave accent `````) are also
|
||||
`constrained by history`_
|
||||
as a shortcut for ``repr()``.
|
||||
|
||||
This leaves a few remaining options for the design of such a feature:
|
||||
|
||||
* An operator, as in printf-style string formatting via ``%``.
|
||||
* A class, such as ``string.Template()``.
|
||||
* A function, such as ``str.format()``.
|
||||
* New syntax
|
||||
* A method or function, such as ``str.format()``.
|
||||
* New syntax, or
|
||||
* A new string prefix marker, such as the well-known ``r''`` or ``u''``.
|
||||
|
||||
The first three options above currently work well.
|
||||
The first three options above are mature.
|
||||
Each has specific use cases and drawbacks,
|
||||
yet also suffers from the verbosity and visual noise mentioned previously.
|
||||
All are discussed in the next section.
|
||||
All options are discussed in the next sections.
|
||||
|
||||
.. _constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
|
||||
|
||||
|
||||
Background
|
||||
-------------
|
||||
|
||||
This proposal builds on several existing techniques and proposals and what
|
||||
Formatted strings build on several existing techniques and proposals and what
|
||||
we've collectively learned from them.
|
||||
In keeping with the design goals of readability and error-prevention,
|
||||
the following examples therefore use named,
|
||||
not positional arguments.
|
||||
|
||||
The following examples focus on the design goals of readability and
|
||||
error-prevention using named parameters.
|
||||
Let's assume we have the following dictionary,
|
||||
and would like to print out its items as an informative string for end users::
|
||||
|
||||
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
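For orientation, the mature techniques compared in the subsections
that follow can be run side by side against this dictionary. This is
only a sketch; the variable names on the left are chosen here for
illustration, and the last line shows the formatted-string spelling
for comparison.

```python
from string import Template

params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}

# Operator: printf-style formatting with a mapping on the right.
printf_style = ('Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s'
                % params)

# Class: string.Template with $-placeholders.
template = Template(
    'Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)

# Method: str.format() with named fields.
str_format = ('Hello, user: {user}, id: {id}, on host: {hostname}'
              .format(**params))

# Formatted string: expressions are written inline.
f_string = (f"Hello, user: {params['user']}, id: {params['id']}, "
            f"on host: {params['hostname']}")

for rendered in (printf_style, template, str_format, f_string):
    print(rendered)
```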
|
||||
|
||||
|
||||
Printf-style formatting
|
||||
'''''''''''''''''''''''
|
||||
Printf-style formatting, via operator
|
||||
'''''''''''''''''''''''''''''''''''''
|
||||
|
||||
This `venerable technique`_ continues to have its uses,
|
||||
such as with byte-based protocols,
|
||||
|
@ -178,7 +169,7 @@ and familiarity to many programmers::
|
|||
|
||||
In this form, considering the prerequisite dictionary creation,
|
||||
the technique is verbose, a tad noisy,
|
||||
and relatively readable.
|
||||
yet relatively readable.
|
||||
Additional issues are that an operator can only take one argument besides the
|
||||
original string,
|
||||
meaning multiple parameters must be passed in a tuple or dictionary.
|
||||
|
@ -190,8 +181,8 @@ or forget the trailing type, e.g. (``s`` or ``d``).
|
|||
.. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting
|
||||
|
||||
|
||||
string.Template
|
||||
'''''''''''''''
|
||||
string.Template Class
|
||||
'''''''''''''''''''''
|
||||
|
||||
The ``string.Template`` `class from`_ PEP 292
|
||||
(Simpler String Substitutions)
|
||||
|
@ -202,7 +193,7 @@ that finds its main use cases in shell and internationalization tools::
|
|||
|
||||
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
|
||||
|
||||
Also verbose, however the string itself is readable.
|
||||
While also verbose, the string itself is readable.
|
||||
Though functionality is limited,
|
||||
it meets its requirements well.
|
||||
It isn't powerful enough for many cases,
|
||||
|
@ -232,8 +223,8 @@ and likely contributed to the PEP's lack of acceptance.
|
|||
It was superseded by the following proposal.
|
||||
|
||||
|
||||
str.format()
|
||||
''''''''''''
|
||||
str.format() Method
|
||||
'''''''''''''''''''
|
||||
|
||||
The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the
|
||||
existing options.
|
||||
|
@ -253,36 +244,32 @@ string literals::
|
|||
host=hostname)
|
||||
'Hello, user: nobody, id: 9, on host: darkstar'
|
||||
|
||||
The verbosity of the method-based approach is illustrated here.
|
||||
|
||||
.. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax
|
||||
|
||||
|
||||
PEP 498 -- Literal String Formatting
|
||||
''''''''''''''''''''''''''''''''''''
|
||||
|
||||
PEP 498 discusses and delves partially into implementation details of
|
||||
expression-strings,
|
||||
which it calls f-strings,
|
||||
the idea and syntax
|
||||
(with exception of the prefix letter)
|
||||
of which is identical to that discussed here.
|
||||
The resulting compile-time transformation however
|
||||
returns a string joined from parts at runtime,
|
||||
rather than an object.
|
||||
|
||||
It also, somewhat controversially to those first exposed to it,
|
||||
introduces the idea that these strings shall be augmented with support for
|
||||
arbitrary expressions,
|
||||
which is discussed further in the following sections.
|
||||
PEP 498 defines and discusses format strings,
|
||||
as also described in the `Abstract`_ above.
|
||||
|
||||
It also, somewhat controversially to those first exposed,
|
||||
introduces the idea that format-strings shall be augmented with support for
|
||||
arbitrary expressions.
|
||||
This is discussed further in the
|
||||
Restricting Syntax section under
|
||||
`Rejected Ideas`_.
|
||||
|
||||
PEP 501 -- Translation ready string interpolation
|
||||
'''''''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
The complementary PEP 501 brings internationalization into the discussion as a
|
||||
first-class concern, with its proposal of i-strings,
|
||||
first-class concern, with its proposal of the i-prefix,
|
||||
``string.Template`` syntax integration compatible with ES6 (Javascript),
|
||||
deferred rendering,
|
||||
and a similar object return value.
|
||||
and an object return value.
|
||||
|
||||
|
||||
Implementations in Other Languages
|
||||
|
@ -374,7 +361,8 @@ ES6 (Javascript)
|
|||
Designers of `Template strings`_ faced the same issue as Python where single
|
||||
and double quotes were taken.
|
||||
Unlike Python however, "backticks" were not.
|
||||
They were chosen as part of the ECMAScript 2015 (ES6) standard::
|
||||
Despite `their issues`_,
|
||||
they were chosen as part of the ECMAScript 2015 (ES6) standard::
|
||||
|
||||
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
|
||||
|
||||
|
@ -391,8 +379,10 @@ as the tag::
|
|||
* User implemented prefixes supported.
|
||||
* Arbitrary expressions are supported.
|
||||
|
||||
.. _their issues: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html
|
||||
.. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
|
||||
|
||||
|
||||
C#, Version 6
|
||||
'''''''''''''
|
||||
|
||||
|
@ -428,13 +418,14 @@ Arbitrary `interpolation under Swift`_ is available on all strings::
|
|||
Additional examples
|
||||
'''''''''''''''''''
|
||||
|
||||
A number of additional examples may be `found at Wikipedia`_.
|
||||
A number of additional examples of string interpolation may be
|
||||
`found at Wikipedia`_.
|
||||
|
||||
Now that background and history have been covered,
|
||||
let's continue on for a solution.
|
||||
|
||||
.. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
|
||||
|
||||
Now that background and implementation history have been covered,
|
||||
let's continue on for a solution.
|
||||
|
||||
|
||||
New Syntax
|
||||
----------
|
||||
|
@ -442,178 +433,47 @@ New Syntax
|
|||
This should be an option of last resort,
|
||||
as every new syntax feature has a cost in terms of real-estate in a brain it
|
||||
inhabits.
|
||||
There is one alternative left on our list of possibilities,
|
||||
There is however one alternative left on our list of possibilities,
|
||||
which follows.
|
||||
|
||||
|
||||
New String Prefix
|
||||
-----------------
|
||||
|
||||
Given the history of string formatting in Python,
|
||||
backwards-compatibility,
|
||||
Given the history of string formatting in Python and backwards-compatibility,
|
||||
implementations in other languages,
|
||||
and the avoidance of new syntax unless necessary,
|
||||
avoidance of new syntax unless necessary,
|
||||
an acceptable design is reached through elimination
|
||||
rather than unique insight.
|
||||
Therefore, we choose to explicitly mark interpolated string literals with a
|
||||
string prefix.
|
||||
Therefore, marking interpolated string literals with a string prefix is chosen.
|
||||
|
||||
We also choose an expression syntax that reuses and builds on the strongest of
|
||||
the existing choices,
|
||||
``str.format()`` to avoid further duplication.
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
String literals with the prefix of ``e`` shall be converted at compile-time to
|
||||
the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object.
|
||||
Strings and values are parsed from the literal and passed as tuples to the
|
||||
constructor::
|
||||
``str.format()`` to avoid further duplication of functionality::
|
||||
|
||||
>>> location = 'World'
|
||||
>>> e'Hello, {location} !'
|
||||
>>> f'Hello, {location} !' # new prefix: f''
|
||||
'Hello, World !' # interpolated result
|
||||
|
||||
# becomes
|
||||
# estr('Hello, {location} !', # template
|
||||
('Hello, ', ' !'), # string fragments
|
||||
('location',), # expressions
|
||||
('World',), # values
|
||||
)
|
||||
|
||||
The object interpolates its result immediately at run-time::
|
||||
|
||||
'Hello, World !'
|
||||
PEP 498 -- Literal String Formatting, delves into the mechanics and
|
||||
implementation of this design.
|
||||
|
||||
|
||||
ExpressionString Objects
|
||||
------------------------
|
||||
|
||||
The ExpressionString object supports both immediate and deferred rendering of
|
||||
its given template and parameters.
|
||||
It does this by immediately rendering its inputs to its internal string and
|
||||
``.rendered`` string member (still necessary?),
|
||||
useful in the majority of use cases.
|
||||
To allow for deferred rendering and caller-specified escaping,
|
||||
all inputs are saved for later inspection,
|
||||
with convenience methods available.
|
||||
|
||||
Notes:
|
||||
|
||||
* Inputs are saved to the object as ``.template`` and ``.context`` members
|
||||
for later use.
|
||||
* No explicit ``str(estr)`` call is necessary to render the result,
|
||||
though doing so might be desired to free resources if significant.
|
||||
* Additional or deferred rendering is available through the ``.render()``
|
||||
method, which allows template and context to be overridden for flexibility.
|
||||
* Manual escaping of potentially dangerous input is available through the
|
||||
``.escape(escape_function)`` method,
|
||||
the rules of which may therefore be specified by the caller.
|
||||
The given function should both accept and return a single modified string.
|
||||
|
||||
* A sample Python implementation can be `found at Bitbucket`_:
|
||||
|
||||
.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py
|
||||
|
||||
|
||||
Inherits From ``str`` Type
|
||||
'''''''''''''''''''''''''''
|
||||
|
||||
Inheriting from the ``str`` class is one of the techniques available to improve
|
||||
compatibility with code expecting a string object,
|
||||
as it will pass an ``isinstance(obj, str)`` test.
|
||||
ExpressionString implements this and also renders its result into the "raw"
|
||||
string of its string superclass,
|
||||
providing compatibility with a majority of code.
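
As a rough illustration of the subclassing technique described above, the
following sketch shows a ``str`` subclass that renders immediately and keeps
its inputs for later use. The class name ``ExpressionStringSketch``, its
constructor signature, and the use of ``str.format()`` for rendering are
assumptions made for this example only; it handles plain names rather than
arbitrary expressions and is not the proposed implementation.

```python
import html


class ExpressionStringSketch(str):
    """Hypothetical str subclass; not the PEP's reference implementation."""

    def __new__(cls, template, **context):
        # Render immediately so the instance's "raw" str value is the result.
        self = super().__new__(cls, template.format(**context))
        # Keep the inputs for deferred rendering or caller-specified escaping.
        self.template = template
        self.context = context
        return self

    def render(self, template=None, context=None):
        """Re-render, optionally overriding the template or the context."""
        template = self.template if template is None else template
        context = self.context if context is None else context
        return template.format(**context)

    def escape(self, escape_function):
        """Render with escape_function applied to every interpolated value."""
        escaped = {key: escape_function(str(value))
                   for key, value in self.context.items()}
        return self.template.format(**escaped)


greeting = ExpressionStringSketch('Hello, {location} !', location='<World>')
escaped_greeting = greeting.escape(html.escape)
```

Because the instance is a real ``str``, code that only checks
``isinstance(obj, str)`` keeps working, while callers that know about the
extra members can still re-render or escape the saved inputs.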
|
||||
|
||||
|
||||
Interpolation Syntax
|
||||
--------------------
|
||||
|
||||
The strongest of the existing string formatting syntaxes is chosen,
|
||||
``str.format()`` as a base to build on. [10]_ [11]_
|
||||
|
||||
..
|
||||
|
||||
* Additionally, single arbitrary expressions shall also be supported inside
|
||||
braces as an extension::
|
||||
|
||||
>>> e'My age is {age + 1} years.'
|
||||
|
||||
See below for section on safety.
|
||||
|
||||
* Triple quoted strings with multiple lines shall be supported::
|
||||
|
||||
>>> e'''Hello,
|
||||
{location} !'''
|
||||
'Hello,\n World !'
|
||||
|
||||
* Adjacent implicit concatenation shall be supported;
|
||||
interpolation does `not bleed into`_ other strings::
|
||||
|
||||
>>> 'Hello {1, 2, 3} ' e'{location} !'
|
||||
'Hello {1, 2, 3} World !'
|
||||
|
||||
* Additional implementation details,
|
||||
for example expression and error-handling,
|
||||
are specified in the compatible PEP 498.
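
The three behaviours listed above can be tried with the ``f''`` prefix from
PEP 498 as shipped in Python 3.6 and later, standing in here for the ``e''``
prefix discussed in this document. The variable names and values are
assumptions chosen for the example.

```python
location = 'World'
age = 41

# A single arbitrary expression inside braces.
age_message = f'My age is {age + 1} years.'

# A triple-quoted literal spanning several lines.
multi_line = f'''Hello,
  {location} !'''

# Only the f-prefixed literal is interpolated; the plain one keeps its braces.
concatenated = 'Hello {1, 2, 3} ' f'{location} !'
```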
|
||||
|
||||
.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html
|
||||
|
||||
|
||||
Composition with Other Prefixes
|
||||
-------------------------------
|
||||
|
||||
* Expression-strings apply to unicode objects only,
|
||||
therefore ``u''`` is never needed.
|
||||
Should it be prevented?
|
||||
|
||||
* Bytes objects are not included here and do not compose with e'' as they
|
||||
do not support ``__format__()``.
|
||||
|
||||
* Complementary to raw strings,
|
||||
backslash codes shall not be converted in the expression-string,
|
||||
when combined with ``r''`` as ``re''``.
|
||||
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
A more complicated example follows::
|
||||
|
||||
n = 5; # t0, t1 = … TODO
|
||||
a = e"Sliced {n} onions in {t1-t0:.3f} seconds."
|
||||
# returns the equivalent of
|
||||
estr("Sliced {n} onions in {t1-t0:.3f} seconds.", # template
|
||||
('Sliced ', ' onions in ', ' seconds'), # strings
|
||||
('n', 't1-t0:.3f'), # expressions
|
||||
(5, 0.555555) # values
|
||||
)
|
||||
|
||||
With expressions only::
|
||||
|
||||
b = e"Three random numbers: {rand():f}, {rand():f}, {rand():f}."
|
||||
# returns the equivalent of
|
||||
estr("Three random numbers: {rand():f}, {rand():f}, {rand():f}.", # template
|
||||
('Three random numbers: ', ', ', ', ', '.'), # strings
|
||||
('rand():f', 'rand():f', 'rand():f'), # expressions
|
||||
(rand(), rand(), rand()) # values
|
||||
)
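
To make the relationship between the saved tuples and the rendered result
concrete, the sketch below interleaves the string fragments with the formatted
values. The helper name ``render_parts`` is hypothetical, and splitting each
expression at its first ``:`` to find the format spec is a simplification
assumed for this example (it would mishandle expressions that themselves
contain a colon).

```python
def render_parts(strings, expressions, values):
    """Interleave literal fragments with formatted values (hypothetical)."""
    pieces = []
    for index, fragment in enumerate(strings):
        pieces.append(fragment)
        if index < len(values):
            # Naive split: everything after the first ':' is the format spec.
            _, _, spec = expressions[index].partition(':')
            pieces.append(format(values[index], spec))
    return ''.join(pieces)


result = render_parts(
    ('Sliced ', ' onions in ', ' seconds.'),  # strings
    ('n', 't1-t0:.3f'),                       # expressions
    (5, 0.555555),                            # values
)
```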
|
||||
Additional Topics
|
||||
=================
|
||||
|
||||
|
||||
Safety
|
||||
-----------
|
||||
|
||||
In this section we will describe the safety situation and precautions taken
|
||||
in support of expression-strings.
|
||||
in support of format-strings.
|
||||
|
||||
#. Only string literals shall be considered here,
|
||||
#. Only string literals have been considered for format-strings,
|
||||
not variables to be taken as input or passed around,
|
||||
making external attacks difficult to accomplish.
|
||||
|
||||
* ``str.format()`` `already handles`_ this use-case.
|
||||
* Direct instantiation of the ExpressionString object with non-literal input
|
||||
shall not be allowed. (Practicality?)
|
||||
``str.format()`` and alternatives `already handle`_ this use-case.
|
||||
|
||||
#. Neither ``locals()`` nor ``globals()`` are necessary nor used during the
|
||||
transformation,
|
||||
|
@ -622,37 +482,72 @@ in support of expression-strings.
|
|||
#. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion
|
||||
depth, recursive interpolation is not supported.
|
||||
|
||||
#. Restricted characters or expression classes?, such as ``=`` for assignment.
|
||||
|
||||
However,
|
||||
mistakes or malicious code could be missed inside string literals.
|
||||
Though that can be said of code in general,
|
||||
that these expressions are inside strings means they are a bit more likely
|
||||
to be obscured.
|
||||
|
||||
.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
|
||||
.. _already handle: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html
|
||||
|
||||
|
||||
Mitigation via tools
|
||||
Mitigation via Tools
|
||||
''''''''''''''''''''
|
||||
|
||||
The idea is that tools or linters such as pyflakes, pylint, or Pycharm,
|
||||
could check inside strings for constructs that exceed project policy.
|
||||
As this is a common task with languages these days,
|
||||
tools won't have to implement this feature solely for Python,
|
||||
may check inside strings with expressions and mark them up appropriately.
|
||||
As this is a common task with programming languages today,
|
||||
multi-language tools won't have to implement this feature solely for Python,
|
||||
significantly shortening time to implementation.
|
||||
|
||||
Additionally the Python interpreter could check(?) and warn with appropriate
|
||||
command-line parameters passed.
|
||||
Farther in the future,
|
||||
strings might also be checked for constructs that exceed the safety policy of
|
||||
a project.
|
||||
|
||||
|
||||
Style Guide/Precautions
|
||||
-----------------------
|
||||
|
||||
As arbitrary expressions may accomplish anything a Python expression is
|
||||
able to,
|
||||
it is highly recommended to avoid constructs inside format-strings that could
|
||||
cause side effects.
|
||||
|
||||
Further guidelines may be written once usage patterns and true problems are
|
||||
known.
|
||||
|
||||
|
||||
Reference Implementation(s)
|
||||
---------------------------
|
||||
|
||||
The `say module on PyPI`_ implements string interpolation as described here
|
||||
with the small burden of a callable interface::
|
||||
|
||||
> pip install say
|
||||
|
||||
from say import say
|
||||
nums = list(range(4))
|
||||
say("Nums has {len(nums)} items: {nums}")
|
||||
|
||||
A Python implementation of Ruby interpolation `is also available`_.
|
||||
It uses the codecs module to do its work::
|
||||
|
||||
> pip install interpy
|
||||
|
||||
# coding: interpy
|
||||
location = 'World'
|
||||
print("Hello #{location}.")
|
||||
|
||||
.. _say module on PyPI: https://pypi.python.org/pypi/say/
|
||||
.. _is also available: https://github.com/syrusakbary/interpy
|
||||
|
||||
|
||||
Backwards Compatibility
|
||||
-----------------------
|
||||
|
||||
By using existing syntax and avoiding use of current or historical features,
|
||||
expression-strings (and any associated sub-features),
|
||||
were designed so as to not interfere with existing code and is not expected
|
||||
to cause any issues.
|
||||
By using existing syntax and avoiding current or historical features,
|
||||
format strings were designed so as to not interfere with existing code and are
|
||||
not expected to cause any issues.
|
||||
|
||||
|
||||
Postponed Ideas
|
||||
|
@ -666,20 +561,12 @@ Though it was highly desired to integrate internationalization support,
|
|||
the finer details diverge at almost every point,
|
||||
making a common solution unlikely: [15]_
|
||||
|
||||
* Use-cases
|
||||
* Compile and run-time tasks
|
||||
* Interpolation Syntax
|
||||
* Use-cases differ
|
||||
* Compile vs. run-time tasks
|
||||
* Interpolation syntax needs
|
||||
* Intended audience
|
||||
* Security policy
|
||||
|
||||
Rather than try to fit a "square peg in a round hole,"
|
||||
this PEP attempts to allow internationalization to be supported in the future
|
||||
by not preventing it.
|
||||
In this proposal,
|
||||
expression-string inputs are saved for inspection and re-rendering at a later
|
||||
time,
|
||||
allowing for their use by an external library of any sort.
|
||||
|
||||
|
||||
Rejected Ideas
|
||||
--------------
|
||||
|
@ -687,18 +574,25 @@ Rejected Ideas
|
|||
Restricting Syntax to ``str.format()`` Only
|
||||
'''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
This was deemed not enough of a solution to the problem.
|
||||
It can be seen in the `Implementations in Other Languages`_ section that the
|
||||
developer community at large tends to agree.
|
||||
The common `arguments against`_ support of arbitrary expressions were:
|
||||
|
||||
The common `arguments against`_ arbitrary expressions were:
|
||||
|
||||
#. YAGNI, "You ain't gonna need it."
|
||||
#. The change is not congruent with historical Python conservatism.
|
||||
#. `YAGNI`_, "You aren't gonna need it."
|
||||
#. The feature is not congruent with historical Python conservatism.
|
||||
#. Postpone - can implement in a future version if need is demonstrated.
|
||||
|
||||
.. _YAGNI: https://en.wikipedia.org/wiki/You_aren't_gonna_need_it
|
||||
.. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html
|
||||
|
||||
Support of only ``str.format()`` syntax however,
|
||||
was deemed not enough of a solution to the problem.
|
||||
Often a simple length or increment of an object, for example,
|
||||
is desired before printing.
|
||||
|
||||
It can be seen in the `Implementations in Other Languages`_ section that the
|
||||
developer community at large tends to agree.
|
||||
String interpolation with arbitrary expressions is becoming an industry
|
||||
standard in modern languages due to its utility.
|
||||
|
||||
|
||||
Additional/Custom String-Prefixes
|
||||
'''''''''''''''''''''''''''''''''
|
||||
|
@ -720,7 +614,7 @@ this was thought to create too much uncertainty of when and where string
|
|||
expressions could be used safely or not.
|
||||
The concept was also difficult to describe to others. [12]_
|
||||
|
||||
Always consider expression-string variables to be unescaped,
|
||||
Always consider format string variables to be unescaped,
|
||||
unless the developer has explicitly escaped them.
|
||||
|
||||
|
||||
|
@ -735,33 +629,13 @@ and looking too much like bash/perl,
|
|||
which could encourage bad habits. [13]_
|
||||
|
||||
|
||||
Reference Implementation(s)
|
||||
===========================
|
||||
|
||||
An expression-string implementation is currently attached to PEP 498,
|
||||
under the ``f''`` prefix,
|
||||
and may be available in nightly builds.
|
||||
|
||||
A Python implementation of Ruby interpolation `is also available`_,
|
||||
which is similar to this proposal.
|
||||
It uses the codecs module to do its work::
|
||||
|
||||
> pip install interpy
|
||||
|
||||
# coding: interpy
|
||||
location = 'World'
|
||||
print("Hello #{location}.")
|
||||
|
||||
.. _is also available: https://github.com/syrusakbary/interpy
|
||||
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
* Eric V. Smith for providing invaluable implementation work and design
|
||||
opinions, helping to focus this PEP.
|
||||
* Others on the python-ideas mailing list for rejecting the craziest of ideas,
|
||||
also helping to achieve focus.
|
||||
* Eric V. Smith for the authoring and implementation of PEP 498.
|
||||
* Everyone on the python-ideas mailing list for rejecting the various crazy
|
||||
ideas that came up,
|
||||
helping to keep the final design in focus.
|
||||
|
||||
|
||||
References
|
||||
|
@ -771,7 +645,6 @@ References
|
|||
|
||||
(https://mail.python.org/pipermail/python-ideas/2015-July/034659.html)
|
||||
|
||||
|
||||
.. [2] Briefer String Format
|
||||
|
||||
(https://mail.python.org/pipermail/python-ideas/2015-July/034669.html)
|
||||
|
|
|
@ -0,0 +1,396 @@
|
|||
PEP: 504
|
||||
Title: Using the System RNG by default
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Nick Coghlan <ncoghlan@gmail.com>
|
||||
Status: Withdrawn
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 15-Sep-2015
|
||||
Python-Version: 3.6
|
||||
Post-History: 15-Sep-2015
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
Python currently defaults to using the deterministic Mersenne Twister random
|
||||
number generator for the module level APIs in the ``random`` module, requiring
|
||||
users to know that when they're performing "security sensitive" work, they
|
||||
should instead switch to using the cryptographically secure ``os.urandom`` or
|
||||
``random.SystemRandom`` interfaces or a third party library like
|
||||
``cryptography``.
|
||||
|
||||
Unfortunately, this approach has resulted in a situation where developers that
|
||||
aren't aware that they're doing security sensitive work use the default module
|
||||
level APIs, and thus expose their users to unnecessary risks.
|
||||
|
||||
This isn't an acute problem, but it is a chronic one, and the often long
|
||||
delays between the introduction of security flaws and their exploitation means
|
||||
that it is difficult for developers to naturally learn from experience.
|
||||
|
||||
In order to provide an eventually pervasive solution to the problem, this PEP
|
||||
proposes that Python switch to using the system random number generator by
|
||||
default in Python 3.6, and require developers to opt-in to using the
|
||||
deterministic random number generator process wide either by using a new
|
||||
``random.ensure_repeatable()`` API, or by explicitly creating their own
|
||||
``random.Random()`` instance.
|
||||
|
||||
To minimise the impact on existing code, module level APIs that require
|
||||
determinism will implicitly switch to the deterministic PRNG.
|
||||
|
||||
PEP Withdrawal
|
||||
==============
|
||||
|
||||
During discussion of this PEP, Steven D'Aprano proposed the simpler alternative
|
||||
of offering a standardised ``secrets`` module that provides "one obvious way"
|
||||
to handle security sensitive tasks like generating default passwords and other
|
||||
tokens.
|
||||
|
||||
Steven's proposal has the desired effect of aligning the easy way to generate
|
||||
such tokens and the right way to generate them, without introducing any
|
||||
compatibility risks for the existing ``random`` module API, so this PEP has
|
||||
been withdrawn in favour of further work on refining Steven's proposal as
|
||||
PEP 506.
|
||||
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
||||
Currently, it is never correct to use the module level functions in the
|
||||
``random`` module for security sensitive applications. This PEP proposes to
|
||||
change that admonition in Python 3.6+ to instead be that it is not correct to
|
||||
use the module level functions in the ``random`` module for security sensitive
|
||||
applications if ``random.ensure_repeatable()`` is ever called (directly or
|
||||
indirectly) in that process.
|
||||
|
||||
To achieve this, rather than being bound methods of a ``random.Random``
|
||||
instance as they are today, the module level callables in ``random`` would
|
||||
change to be functions that delegate to the corresponding method of the
|
||||
existing ``random._inst`` module attribute.
|
||||
|
||||
By default, this attribute will be bound to a ``random.SystemRandom`` instance.
|
||||
|
||||
A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst``
|
||||
attribute to a ``random.Random`` instance, restoring the same module level
|
||||
API behaviour as existed in previous Python versions (aside from the
|
||||
additional level of indirection)::
|
||||
|
||||
    def ensure_repeatable():
        """Switch to using random.Random() for the module level APIs

        This switches the default RNG instance from the cryptographically
        secure random.SystemRandom() to the deterministic random.Random(),
        enabling the seed(), getstate() and setstate() operations. This means
        a particular random scenario can be replayed later by providing the
        same seed value or restoring a previously saved state.

        NOTE: Libraries implementing security sensitive operations should
        always explicitly use random.SystemRandom() or os.urandom in order to
        correctly handle applications that call this function.
        """
        global _inst  # rebind the module level attribute, not a local
        # SystemRandom subclasses Random, so check for the system RNG itself
        if isinstance(_inst, SystemRandom):
            _inst = Random()
|
||||
|
||||
To minimise the impact on existing code, calling any of the following module
|
||||
level functions will implicitly call ``random.ensure_repeatable()``:
|
||||
|
||||
* ``random.seed``
|
||||
* ``random.getstate``
|
||||
* ``random.setstate``
|
||||
|
||||
There are no changes proposed to the ``random.Random`` or
|
||||
``random.SystemRandom`` class APIs - applications that explicitly instantiate
|
||||
their own random number generators will be entirely unaffected by this
|
||||
proposal.
|
||||
|
||||
Warning on implicit opt-in
|
||||
--------------------------
|
||||
|
||||
In Python 3.6, implicitly opting in to the use of the deterministic PRNG will
|
||||
emit a deprecation warning using the following check::
|
||||
|
||||
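    # SystemRandom subclasses Random, so check for the system RNG itself
    if isinstance(_inst, SystemRandom):
        # warnings.warn() takes the message first, then the category
        warnings.warn("Implicitly ensuring repeatability. "
                      "See help(random.ensure_repeatable) for details",
                      DeprecationWarning)
        ensure_repeatable()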
|
||||
|
||||
The specific wording of the warning should have a suitable answer added to
|
||||
Stack Overflow as was done for the custom error message that was added for
|
||||
missing parentheses in a call to print [#print]_.
|
||||
|
||||
In the first Python 3 release after Python 2.7 switches to security fix only
|
||||
mode, the deprecation warning will be upgraded to a RuntimeWarning so it is
|
||||
visible by default.
|
||||
|
||||
This PEP does *not* propose ever removing the ability to ensure the default RNG
|
||||
used process wide is a deterministic PRNG that will produce the same series of
|
||||
outputs given a specific seed. That capability is widely used in modelling
|
||||
and simulation scenarios, and requiring that ``ensure_repeatable()`` be called
|
||||
either directly or indirectly is a sufficient enhancement to address the cases
|
||||
where the module level random API is used for security sensitive tasks in web
|
||||
applications without due consideration for the potential security implications
|
||||
of using a deterministic PRNG.
|
||||
|
||||
Performance impact
|
||||
------------------
|
||||
|
||||
Due to the large performance difference between ``random.Random`` and
|
||||
``random.SystemRandom``, applications ported to Python 3.6 will encounter a
|
||||
significant performance regression in cases where:
|
||||
|
||||
* the application is using the module level random API
|
||||
* cryptographic quality randomness isn't needed
|
||||
* the application doesn't already implicitly opt back in to the deterministic
|
||||
PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate``
|
||||
* the application isn't updated to explicitly call ``random.ensure_repeatable``
|
||||
|
||||
This would be noted in the Porting section of the Python 3.6 What's New guide,
|
||||
with the recommendation to include the following code in the ``__main__``
|
||||
module of affected applications::
|
||||
|
||||
if hasattr(random, "ensure_repeatable"):
|
||||
random.ensure_repeatable()
|
||||
|
||||
Applications that do need cryptographic quality randomness should be using the
|
||||
system random number generator regardless of speed considerations, so in those
|
||||
cases the change proposed in this PEP will fix a previously latent security
|
||||
defect.
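
The size of the performance difference depends on the platform, so it is
worth measuring rather than assuming. The following sketch times both
generators with ``timeit``; the call count is an arbitrary choice for this
example and no particular ratio is implied.

```python
import random
import timeit

deterministic = random.Random()
system = random.SystemRandom()

calls = 10000
deterministic_seconds = timeit.timeit(deterministic.random, number=calls)
system_seconds = timeit.timeit(system.random, number=calls)

print("random.Random:       %.4f s for %d calls" % (deterministic_seconds, calls))
print("random.SystemRandom: %.4f s for %d calls" % (system_seconds, calls))
```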
|
||||
|
||||
Documentation changes
|
||||
---------------------
|
||||
|
||||
The ``random`` module documentation would be updated to move the documentation
|
||||
of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module,
|
||||
along with the documentation of the new ``ensure_repeatable`` function and the
|
||||
associated security warning.
|
||||
|
||||
That section of the module documentation would also gain a discussion of the
|
||||
respective use cases for the deterministic PRNG enabled by
|
||||
``ensure_repeatable`` (games, modelling & simulation, software testing) and the
|
||||
system RNG that is used by default (cryptography, security token generation).
|
||||
This discussion will also recommend the use of third party security libraries
|
||||
for the latter task.
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Writing secure software under deadline and budget pressures is a hard problem.
|
||||
This is reflected in regular notifications of data breaches involving personally
|
||||
identifiable information [#breaches]_, as well as with failures to take
|
||||
security considerations into account when new systems, like motor vehicles
|
||||
[#uconnect]_, are connected to the internet. It's also the case that a lot of
|
||||
the programming advice readily available on the internet [#search]_ simply
|
||||
doesn't take the mathematical arcana of computer security into account.
|
||||
Compounding these issues is the fact that defenders have to cover *all* of
|
||||
their potential vulnerabilities, as a single mistake can make it possible to
|
||||
subvert other defences [#bcrypt]_.
|
||||
|
||||
One of the factors that contributes to making this last aspect particularly
|
||||
difficult is APIs where using them inappropriately creates a *silent* security
|
||||
failure - one where the only way to find out that what you're doing is
|
||||
incorrect is for someone reviewing your code to say "that's a potential
|
||||
security problem", or for a system you're responsible for to be compromised
|
||||
through such an oversight (and you're not only still responsible for that
|
||||
system when it is compromised, but your intrusion detection and auditing
|
||||
mechanisms are good enough for you to be able to figure out after the event
|
||||
how the compromise took place).
|
||||
|
||||
This kind of situation is a significant contributor to "security fatigue",
|
||||
where developers (often rightly [#owasptopten]_) feel that security engineers
|
||||
spend all their time saying "don't do that the easy way, it creates a
|
||||
security vulnerability".
|
||||
|
||||
As the designers of one of the world's most popular languages [#ieeetopten]_,
|
||||
we can help reduce that problem by making the easy way the right way (or at
|
||||
least the "not wrong" way) in more circumstances, so developers and security
|
||||
engineers can spend more time worrying about mitigating actually interesting
|
||||
threats, and less time fighting with default language behaviours.
|
||||
|
||||
Discussion
|
||||
==========
|
||||
|
||||
Why "ensure_repeatable" over "ensure_deterministic"?
|
||||
----------------------------------------------------
|
||||
|
||||
This is a case where the meaning of a word as specialist jargon conflicts with
|
||||
the typical meaning of the word, even though it's *technically* the same.
|
||||
|
||||
From a technical perspective, a "deterministic RNG" means that given knowledge
|
||||
of the algorithm and the current state, you can reliably compute arbitrary
|
||||
future states.
|
||||
|
||||
The problem is that "deterministic" on its own doesn't convey those qualifiers,
|
||||
so it's likely to instead be interpreted as "predictable" or "not random" by
|
||||
folks that are familiar with the conventional meaning, but aren't familiar with
|
||||
the additional qualifiers on the technical meaning.
|
||||
|
||||
A second problem with "deterministic" as a description for the traditional RNG
|
||||
is that it doesn't really tell you what you can *do* with the traditional RNG
|
||||
that you can't do with the system one.
|
||||
|
||||
"ensure_repeatable" aims to address both of those problems, as its common
|
||||
meaning accurately describes the main reason for preferring the deterministic
|
||||
PRNG over the system RNG: ensuring you can repeat the same series of outputs
|
||||
by providing the same seed value, or by restoring a previously saved PRNG state.
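
What "repeatable" means in practice can be shown with the existing
``random.Random`` and ``random.SystemRandom`` classes. The seed value is an
arbitrary choice for this example.

```python
import random

prng = random.Random()

# The same seed replays the same series of outputs.
prng.seed(2015)
first_run = [prng.random() for _ in range(3)]
prng.seed(2015)
second_run = [prng.random() for _ in range(3)]

# Restoring a saved state replays the next output.
saved = prng.getstate()
before = prng.random()
prng.setstate(saved)
after = prng.random()

# The system RNG has no state that can be saved or restored.
system = random.SystemRandom()
try:
    system.getstate()
except NotImplementedError:
    system_state_available = False
else:
    system_state_available = True
```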
|
||||
|
||||
Only changing the default for Python 3.6+
|
||||
-----------------------------------------
|
||||
|
||||
Some other recent security changes, such as upgrading the capabilities of the
|
||||
``ssl`` module and switching to properly verifying HTTPS certificates by
|
||||
default, have been considered critical enough to justify backporting the
|
||||
change to all currently supported versions of Python.
|
||||
|
||||
The difference in this case is one of degree - the additional benefits from
|
||||
rolling out this particular change a couple of years earlier than will
|
||||
otherwise be the case aren't sufficient to justify either the additional effort
|
||||
or the stability risks involved in making such an intrusive change in a
|
||||
maintenance release.
|
||||
|
||||
Keeping the module level functions
|
||||
----------------------------------
|
||||
|
||||
In addition to general backwards compatibility considerations, Python is
|
||||
widely used for educational purposes, and we specifically don't want to
|
||||
invalidate the wide array of educational material that assumes the availability
|
||||
of the current ``random`` module API. Accordingly, this proposal ensures that
|
||||
most of the public API can continue to be used not only without modification,
|
||||
but without generating any new warnings.
|
||||
|
||||
Warning when implicitly opting in to the deterministic RNG
|
||||
----------------------------------------------------------
|
||||
|
||||
It's necessary to implicitly opt in to the deterministic PRNG as Python is
|
||||
widely used for modelling and simulation purposes where this is the right
|
||||
thing to do, and in many cases, these software models won't have a dedicated
|
||||
maintenance team tasked with ensuring they keep working on the latest versions
|
||||
of Python.
|
||||
|
||||
Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom``
|
||||
is also a mistake that appears in a number of the flawed "how to generate a
|
||||
security token in Python" guides readily available online.
|
||||
|
||||
Using first DeprecationWarning, and then eventually a RuntimeWarning, to
|
||||
advise against implicitly switching to the deterministic PRNG aims to
|
||||
nudge future users that need a cryptographically secure RNG away from
|
||||
calling ``random.seed()`` and those that genuinely need a deterministic
|
||||
generator towards explicitly calling ``random.ensure_repeatable()``.
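
For contrast with the flawed "seed from ``os.urandom``" pattern, the sketch
below draws security tokens directly from the system RNG using only existing
standard library APIs. The token length, password length and alphabet are
assumptions chosen for this example, not recommendations from this PEP.

```python
import binascii
import os
import random

# 16 random bytes from the operating system, rendered as 32 hex digits.
token = binascii.hexlify(os.urandom(16)).decode('ascii')

# Every character is drawn from the system RNG; no seed() call is involved.
alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
system = random.SystemRandom()
password = ''.join(system.choice(alphabet) for _ in range(12))
```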
|
||||
|
||||
Avoiding the introduction of a userspace CSPRNG
|
||||
-----------------------------------------------
|
||||
|
||||
The original discussion of this proposal on python-ideas [#csprng]_ suggested
|
||||
introducing a cryptographically secure pseudo-random number generator and using
|
||||
that by default, rather than defaulting to the relatively slow system random
|
||||
number generator.
|
||||
|
||||
The problem [#nocsprng]_ with this approach is that it introduces an additional
|
||||
point of failure in security sensitive situations, for the sake of applications
|
||||
where the random number generation may not even be on a critical performance
|
||||
path.
|
||||
|
||||
Applications that do need cryptographic quality randomness should be using the
|
||||
system random number generator regardless of speed considerations, so in those
|
||||
cases.
|
||||
|
||||
Isn't the deterministic PRNG "secure enough"?
|
||||
---------------------------------------------
|
||||
|
||||
In a word, "No" - that's why there's a warning in the module documentation
|
||||
that says not to use it for security sensitive purposes. While we're not
|
||||
currently aware of any studies of Python's random number generator specifically,
|
||||
studies of PHP's random number generator [#php]_ have demonstrated the ability
|
||||
to use weaknesses in that subsystem to facilitate a practical attack on
|
||||
password recovery tokens in popular PHP web applications.
|
||||
|
||||
However, one of the rules of secure software development is that "attacks only
|
||||
get better, never worse", so it may be that by the time Python 3.6 is released
|
||||
we will actually see a practical attack on Python's deterministic PRNG publicly
|
||||
documented.
|
||||
|
||||
Security fatigue in the Python ecosystem
|
||||
----------------------------------------
|
||||
|
||||
Over the past few years, the computing industry as a whole has been
|
||||
making a concerted effort to upgrade the shared network infrastructure we all
|
||||
depend on to a "secure by default" stance. As one of the most widely used
|
||||
programming languages for network service development (including the OpenStack
|
||||
Infrastructure-as-a-Service platform) and for systems administration
|
||||
on Linux systems in general, a fair share of that burden has fallen on the
|
||||
Python ecosystem, which is understandably frustrating for Pythonistas using
|
||||
Python in other contexts where these issues aren't of as great a concern.
|
||||
|
||||
This consideration is one of the primary factors driving the substantial
|
||||
backwards compatibility improvements in this proposal relative to the initial
|
||||
draft concept posted to python-ideas [#draft]_.
|
||||
|
||||
Acknowledgements
|
||||
================
|
||||
|
||||
* Theo de Raadt, for making the suggestion to Guido van Rossum that we
|
||||
seriously consider defaulting to a cryptographically secure random number
|
||||
generator
|
||||
* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the
|
||||
python-ideas threads that suggested the approach of transparently switching
|
||||
to the ``random.Random`` implementation when any of the functions that only
|
||||
make sense for a deterministic RNG are called
|
||||
* Nathaniel Smith for providing the reference on practical attacks against
|
||||
PHP's random number generator when used to generate password reset tokens
|
||||
* Donald Stufft for pursuing additional discussions with network security
|
||||
experts that suggested the introduction of a userspace CSPRNG would mean
|
||||
additional complexity for insufficient gain relative to just using the
|
||||
system RNG directly
|
||||
* Paul Moore for eloquently making the case for the current level of security
|
||||
fatigue in the Python ecosystem
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [#breaches] Visualization of data breaches involving more than 30k records (each)
|
||||
(http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/)
|
||||
|
||||
.. [#uconnect] Remote UConnect hack for Jeep Cherokee
|
||||
(http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/)
|
||||
|
||||
.. [#php] PRNG based attack against password reset tokens in PHP applications
|
||||
(https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf)
|
||||
|
||||
.. [#search] Search link for "python password generator"
|
||||
(https://www.google.com.au/search?q=python+password+generator)
|
||||
|
||||
.. [#csprng] python-ideas thread discussing using a userspace CSPRNG
|
||||
(https://mail.python.org/pipermail/python-ideas/2015-September/035886.html)
|
||||
|
||||
.. [#draft] Initial draft concept that eventually became this PEP
|
||||
(https://mail.python.org/pipermail/python-ideas/2015-September/036095.html)
|
||||
|
||||
.. [#nocsprng] Safely generating random numbers
|
||||
(http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/)
|
||||
|
||||
.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages
|
||||
(http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages)
|
||||
|
||||
.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013
|
||||
(https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013)
|
||||
|
||||
.. [#print] Stack Overflow answer for missing parentheses in call to print
|
||||
(http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440)
|
||||
|
||||
.. [#bcrypt] Bypassing bcrypt through an insecure data cache
|
||||
(http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/)
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
|
@ -0,0 +1,205 @@
|
|||
PEP: 505
|
||||
Title: None coalescing operators
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Mark E. Haase <mehaase@gmail.com>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 18-Sep-2015
|
||||
Python-Version: 3.6
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
Several modern programming languages have so-called "null coalescing" or
|
||||
"null aware" operators, including C#, Dart, Perl, Swift, and PHP (starting in
|
||||
version 7). These operators provide syntactic sugar for common patterns
|
||||
involving null references. [1]_ [2]_
|
||||
|
||||
* The "null coalescing" operator is a binary operator that returns its first
|
||||
  non-null operand.
|
||||
* The "null aware member access" operator is a binary operator that accesses
|
||||
an instance member only if that instance is non-null. It returns null
|
||||
otherwise.
|
||||
* The "null aware index access" operator is a binary operator that accesses a
|
||||
member of a collection only if that collection is non-null. It returns null
|
||||
otherwise.
|
||||
|
||||
Python does not have any directly equivalent syntax. The ``or`` operator can
|
||||
be used to similar effect but checks for a truthy value, not ``None``
|
||||
specifically. The ternary operator ``... if ... else ...`` can be used for
|
||||
explicit null checks but is more verbose and typically duplicates part of the
|
||||
expression in between ``if`` and ``else``. The proposed ``None`` coalescing
|
||||
and ``None`` aware operators offer an alternative syntax that is more
|
||||
intuitive and concise.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
Null Coalescing Operator
|
||||
------------------------
|
||||
|
||||
The following code illustrates how the ``None`` coalescing operators would
|
||||
work in Python::
|
||||
|
||||
>>> title = 'My Title'
|
||||
>>> title ?? 'Default Title'
|
||||
'My Title'
|
||||
>>> title = None
|
||||
>>> title ?? 'Default Title'
|
||||
'Default Title'
|
||||
|
||||
Similar behavior can be achieved with the ``or`` operator, but ``or`` checks
|
||||
whether its left operand is false-y, not specifically ``None``. This can lead
|
||||
to surprising behavior. Consider the scenario of computing the price of some
|
||||
products a customer has in his/her shopping cart::
|
||||
|
||||
>>> price = 100
|
||||
>>> requested_quantity = 5
|
||||
>>> default_quantity = 1
|
||||
>>> (requested_quantity or default_quantity) * price
|
||||
500
|
||||
>>> requested_quantity = None
|
||||
>>> (requested_quantity or default_quantity) * price
|
||||
100
|
||||
>>> requested_quantity = 0
|
||||
>>> (requested_quantity or default_quantity) * price # oops!
|
||||
100
|
||||
|
||||
This type of bug is not possible with the ``None`` coalescing operator,
|
||||
because there is no implicit type coercion to ``bool``::
|
||||
|
||||
>>> price = 100
|
||||
>>> requested_quantity = 0
|
||||
>>> default_quantity = 1
|
||||
>>> (requested_quantity ?? default_quantity) * price
|
||||
0
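
Until such an operator exists, the same check can be approximated in current
Python with a small helper. The following is a minimal sketch, not part of the
proposal: the name ``coalesce`` is hypothetical, and unlike the proposed
operator it evaluates both arguments eagerly.

```python
def coalesce(value, default):
    """Return value unless it is None, in which case return default."""
    return default if value is None else value


price = 100
default_quantity = 1

# 0 is a legitimate quantity, so it must not be replaced by the default.
total_with_zero = coalesce(0, default_quantity) * price

# None means "not specified", so the default applies.
total_with_none = coalesce(None, default_quantity) * price

# For comparison, ``or`` treats 0 as missing and silently uses the default.
total_with_or = (0 or default_quantity) * price
```

Only the ``or`` version charges for one item when the customer asked for none.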
|
||||
|
||||
The same correct behavior can be achieved with the ternary operator. Here is
|
||||
an excerpt from the popular Requests package::
|
||||
|
||||
data = [] if data is None else data
|
||||
files = [] if files is None else files
|
||||
headers = {} if headers is None else headers
|
||||
params = {} if params is None else params
|
||||
hooks = {} if hooks is None else hooks
|
||||
|
||||
This particular formulation has the undesirable effect of putting the operands
|
||||
in an unintuitive order: the brain thinks, "use ``data`` if possible and use
|
||||
``[]`` as a fallback," but the code puts the fallback *before* the preferred
|
||||
value.
|
||||
|
||||
The author of this package could have written it like this instead::
|
||||
|
||||
data = data if data is not None else []
|
||||
files = files if files is not None else []
|
||||
headers = headers if headers is not None else {}
|
||||
params = params if params is not None else {}
|
||||
hooks = hooks if hooks is not None else {}
|
||||
|
||||
This ordering of the operands is more intuitive, but it requires 4 extra
|
||||
characters (for "not "). It also highlights the repetition of identifiers:
|
||||
``data if data``, ``files if files``, etc. The ``None`` coalescing operator
|
||||
improves readability::
|
||||
|
||||
data = data ?? []
|
||||
files = files ?? []
|
||||
headers = headers ?? {}
|
||||
params = params ?? {}
|
||||
hooks = hooks ?? {}
|
||||
|
||||
The ``None`` coalescing operator also has a corresponding assignment shortcut.
|
||||
|
||||
::
|
||||
|
||||
data ?= []
|
||||
files ?= []
|
||||
headers ?= {}
|
||||
params ?= {}
|
||||
hooks ?= {}
|
||||
|
||||
The ``None`` coalescing operator is left-associative, which allows for easy
|
||||
chaining::
|
||||
|
||||
>>> user_title = None
|
||||
>>> local_default_title = None
|
||||
>>> global_default_title = 'Global Default Title'
|
||||
    >>> user_title ?? local_default_title ?? global_default_title
|
||||
'Global Default Title'
|
||||
|
||||
The direction of associativity is important because the ``None`` coalescing
|
||||
operator short circuits: if its left operand is non-null, then the right
|
||||
operand is not evaluated.
|
||||
|
||||
::
|
||||
|
||||
>>> def get_default(): raise Exception()
|
||||
>>> 'My Title' ?? get_default()
|
||||
'My Title'
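
Short-circuiting and chaining can be imitated today by passing callables
instead of values. This is only a sketch under that assumption; the helper
name ``coalesce_lazy`` is hypothetical and does not appear in the proposal.

```python
def coalesce_lazy(*thunks):
    """Call each zero-argument callable in order and return the first
    result that is not None. Later callables are never invoked."""
    for thunk in thunks:
        result = thunk()
        if result is not None:
            return result
    return None


def get_default():
    raise Exception("get_default() should not have been called")


# The first operand is non-None, so get_default is never called.
title = coalesce_lazy(lambda: 'My Title', get_default)

# Chaining: operands are tried left to right until one is not None.
chained = coalesce_lazy(
    lambda: None,
    lambda: None,
    lambda: 'Global Default Title',
)
```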
|
||||
|
||||
|
||||
Null-Aware Member Access Operator
|
||||
---------------------------------
|
||||
|
||||
::
|
||||
|
||||
>>> title = 'My Title'
|
||||
>>> title.upper()
|
||||
'MY TITLE'
|
||||
>>> title = None
|
||||
>>> title.upper()
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
AttributeError: 'NoneType' object has no attribute 'upper'
|
||||
    >>> print(title?.upper())
|
||||
None
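
In current Python the equivalent guard has to be spelled out. A minimal
sketch, using a hypothetical helper name ``maybe_upper``:

```python
def maybe_upper(title):
    """Return title.upper(), or None when title itself is None."""
    return None if title is None else title.upper()


shouted = maybe_upper('My Title')
missing = maybe_upper(None)
```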
|
||||
|
||||
|
||||
Null-Aware Index Access Operator
|
||||
---------------------------------
|
||||
|
||||
::
|
||||
|
||||
>>> person = {'name': 'Mark', 'age': 32}
|
||||
>>> person['name']
|
||||
'Mark'
|
||||
>>> person = None
|
||||
>>> person['name']
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
TypeError: 'NoneType' object is not subscriptable
|
||||
    >>> print(person?['name'])
|
||||
None
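
The same guard for subscripting can be written today as follows. This is a
sketch only; ``maybe_getitem`` is a hypothetical name, and a missing key in a
non-``None`` container still raises ``KeyError`` as usual.

```python
def maybe_getitem(container, key):
    """Return container[key], or None when the container itself is None."""
    return None if container is None else container[key]


person = {'name': 'Mark', 'age': 32}
name = maybe_getitem(person, 'name')
no_name = maybe_getitem(None, 'name')
```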
|
||||
|
||||
|
||||
Specification
|
||||
=============
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [1] Wikipedia: Null coalescing operator
|
||||
(https://en.wikipedia.org/wiki/Null_coalescing_operator)
|
||||
|
||||
.. [2] Seth Ladd's Blog: Null-aware operators in Dart
|
||||
(http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html)
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
|
@ -0,0 +1,356 @@
|
|||
PEP: 506
|
||||
Title: Adding A Secrets Module To The Standard Library
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Steven D'Aprano <steve@pearwood.info>
|
||||
Status: Draft
|
||||
Type: Standards Track
|
||||
Content-Type: text/x-rst
|
||||
Created: 19-Sep-2015
|
||||
Python-Version: 3.6
|
||||
Post-History:
|
||||
|
||||
|
||||
Abstract
|
||||
========
|
||||
|
||||
This PEP proposes the addition of a module for common security-related
|
||||
functions such as generating tokens to the Python standard library.
|
||||
|
||||
|
||||
Definitions
|
||||
===========
|
||||
|
||||
Some common abbreviations used in this proposal:
|
||||
|
||||
* PRNG:
|
||||
|
||||
Pseudo Random Number Generator. A deterministic algorithm used
|
||||
to produce random-looking numbers with certain desirable
|
||||
statistical properties.
|
||||
|
||||
* CSPRNG:
|
||||
|
||||
Cryptographically Strong Pseudo Random Number Generator. An
|
||||
algorithm used to produce random-looking numbers which are
|
||||
resistant to prediction.
|
||||
|
||||
* MT:
|
||||
|
||||
Mersenne Twister. An extensively studied PRNG which is currently
|
||||
used by the ``random`` module as the default.
|
||||
|
||||
|
||||
Rationale
|
||||
=========
|
||||
|
||||
This proposal is motivated by concerns that Python's standard library
|
||||
makes it too easy for developers to inadvertently make serious security
|
||||
errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum
|
||||
and expressed some concern [1]_ about the use of MT for generating sensitive
|
||||
information such as passwords, secure tokens, session keys and similar.
|
||||
|
||||
Although the documentation for the random module explicitly states that
|
||||
the default is not suitable for security purposes [2]_, it is strongly
|
||||
believed that this warning may be missed, ignored or misunderstood by
|
||||
many Python developers. In particular:
|
||||
|
||||
* developers may not have read the documentation and consequently
|
||||
not seen the warning;
|
||||
|
||||
* they may not realise that their specific use of it has security
|
||||
implications; or
|
||||
|
||||
* not realising that there could be a problem, they have copied code
|
||||
(or learned techniques) from websites which don't offer best
|
||||
  practices.
|
||||
|
||||
The first [3]_ hit when searching for "python how to generate passwords" on
|
||||
Google is a tutorial that uses the default functions from the ``random``
|
||||
module [4]_. Although it is not intended for use in web applications, it is
|
||||
likely that similar techniques find themselves used in that situation.
|
||||
The second hit is to a StackOverflow question about generating
|
||||
passwords [5]_. Most of the answers given, including the accepted one, use
|
||||
the default functions. When one user warned that the default could be
|
||||
easily compromised, they were told "I think you worry too much." [6]_
|
||||
|
||||
This strongly suggests that the existing ``random`` module is an attractive
|
||||
nuisance when it comes to generating (for example) passwords or secure
|
||||
tokens.
|
||||
|
||||
Additional motivation (of a more philosophical bent) can be found in the
|
||||
post which first proposed this idea [7]_.
|
||||
|
||||
|
||||
Proposal
|
||||
========
|
||||
|
||||
Alternative proposals have focused on the default PRNG in the ``random``
|
||||
module, with the aim of providing "secure by default" cryptographically
|
||||
strong primitives that developers can build upon without thinking about
|
||||
security. (See Alternatives below.) This proposes a different approach:
|
||||
|
||||
* The standard library already provides cryptographically strong
|
||||
primitives, but many users don't know they exist or when to use them.
|
||||
|
||||
* Instead of requiring crypto-naive users to write secure code, the
|
||||
standard library should include a set of ready-to-use "batteries" for
|
||||
the most common needs, such as generating secure tokens. This code
|
||||
will both directly satisfy a need ("How do I generate a password reset
|
||||
  token?"), and act as an example of acceptable practices which
|
||||
developers can learn from [8]_.
|
||||
|
||||
To do this, this PEP proposes that we add a new module to the standard
|
||||
library, with the suggested name ``secrets``. This module will contain a
|
||||
set of ready-to-use functions for common activities with security
|
||||
implications, together with some lower-level primitives.
|
||||
|
||||
The suggestion is that ``secrets`` becomes the go-to module for dealing
|
||||
with anything which should remain secret (passwords, tokens, etc.)
|
||||
while the ``random`` module remains backward-compatible.
|
||||
|
||||
|
||||
API and Implementation
|
||||
======================
|
||||
|
||||
The contents of the ``secrets`` module are expected to evolve over time, and
|
||||
likely will evolve between the time of writing this PEP and actual release
|
||||
in the standard library [9]_. At the time of writing, the following functions
|
||||
have been suggested:
|
||||
|
||||
* A high-level function for generating secure tokens suitable for use
|
||||
in (e.g.) password recovery, as session keys, etc.
|
||||
|
||||
* A limited interface to the system CSPRNG, using either ``os.urandom``
|
||||
directly or ``random.SystemRandom``. Unlike the ``random`` module, this
|
||||
does not need to provide methods for seeding, getting or setting the
|
||||
state, or any non-uniform distributions. It should provide the
|
||||
following:
|
||||
|
||||
- A function for choosing items from a sequence, ``secrets.choice``.
|
||||
- A function for generating an integer within some range, such as
|
||||
``secrets.randrange`` or ``secrets.randint``.
|
||||
- A function for generating a given number of random bits and/or bytes
|
||||
as an integer.
|
||||
- A similar function which returns the value as a hex digit string.
|
||||
|
||||
* ``hmac.compare_digest`` under the name ``equal``.
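
As a rough illustration of how little code the list above requires, the
sketch below builds each item on existing standard library pieces. It is not
the proposed implementation: ``choice``, ``randrange``, ``randint`` and
``equal`` are the names suggested above, while ``randbits`` and
``randbits_hex`` are hypothetical names chosen here for the two unnamed
functions.

```python
import binascii
import hmac
import os
import random

# SystemRandom draws from the operating system's CSPRNG via os.urandom.
_sysrand = random.SystemRandom()

choice = _sysrand.choice        # pick one item from a non-empty sequence
randrange = _sysrand.randrange  # integer from range(start, stop[, step])
randint = _sysrand.randint      # integer N with a <= N <= b
randbits = _sysrand.getrandbits  # hypothetical name: k random bits as an int


def randbits_hex(nbytes=32):
    """Hypothetical name: nbytes random bytes as a hex digit string."""
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')


def equal(a, b):
    """Compare two strings or bytes objects in a way that resists
    timing analysis."""
    return hmac.compare_digest(a, b)


token = randbits_hex(16)
```

A password reset handler could then issue ``token`` to the user and later
check the submitted value with ``equal`` rather than ``==``.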
|
||||
|
||||
The consensus appears to be that there is no need to add a new CSPRNG to
|
||||
the ``random`` module to support these uses; ``SystemRandom`` will be
|
||||
sufficient.
|
||||
|
||||
Some illustrative implementations have been given by Nick Coghlan [10]_.
|
||||
This idea has also been discussed on the issue tracker for the
|
||||
"cryptography" module [11]_.
|
||||
|
||||
The ``secrets`` module itself will be pure Python, and other Python
|
||||
implementations can easily make use of it unchanged, or adapt it as
|
||||
necessary.
|
||||
|
||||
|
||||
Alternatives
|
||||
============
|
||||
|
||||
One alternative is to change the default PRNG provided by the ``random``
|
||||
module [12]_. This received considerable scepticism and outright opposition:
|
||||
|
||||
* There is fear that a CSPRNG may be slower than the current PRNG (which
|
||||
in the case of MT is already quite slow).
|
||||
|
||||
* Some applications (such as scientific simulations, and replaying
|
||||
gameplay) require the ability to seed the PRNG into a known state,
|
||||
which a CSPRNG lacks by design.
|
||||
|
||||
* Another major use of the ``random`` module is for simple "guess a number"
|
||||
games written by beginners, and many people are loath to make any
|
||||
change to the ``random`` module which may make that harder.
|
||||
|
||||
* Although there is no proposal to remove MT from the ``random`` module,
|
||||
there was considerable hostility to the idea of having to opt-in to
|
||||
a non-CSPRNG or any backwards-incompatible changes.
|
||||
|
||||
* Demonstrated attacks against MT are typically against PHP applications.
|
||||
It is believed that PHP's version of MT is a significantly softer target
|
||||
than Python's version, due to a poor seeding technique [13]_. Consequently,
|
||||
without a proven attack against Python applications, many people object
|
||||
to a backwards-incompatible change.
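
The seeding objection can be seen directly with the existing classes. The
sketch below only uses ``random.Random`` and ``random.SystemRandom`` as they
exist today; the seed value is arbitrary.

```python
import random

# Two Mersenne Twister instances with the same seed replay the same sequence,
# which is what simulations and game replays rely on.
mt_a = random.Random(12345)
mt_b = random.Random(12345)
same_sequence = (
    [mt_a.random() for _ in range(3)] == [mt_b.random() for _ in range(3)]
)

# The system CSPRNG accepts a seed() call but ignores it, and it has no
# state to save or restore.
sys_rng = random.SystemRandom()
sys_rng.seed(12345)
try:
    sys_rng.getstate()
    state_available = True
except NotImplementedError:
    state_available = False
```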
|
||||
|
||||
Nick Coghlan made an earlier suggestion for a globally configurable PRNG
|
||||
which uses the system CSPRNG by default [14]_, but has since hinted that he
|
||||
may withdraw it in favour of this proposal [15]_.
|
||||
|
||||
|
||||
Comparison To Other Languages
|
||||
=============================
|
||||
|
||||
* PHP
|
||||
|
||||
PHP includes a function ``uniqid`` [16]_ which by default returns a
|
||||
thirteen character string based on the current time in microseconds.
|
||||
Translated into Python syntax, it has the following signature::
|
||||
|
||||
def uniqid(prefix='', more_entropy=False)->str
|
||||
|
||||
The PHP documentation warns that this function is not suitable for
|
||||
security purposes. Nevertheless, various mature, well-known PHP
|
||||
applications use it for that purpose (citation needed).
|
||||
|
||||
  PHP 5.3 and later also includes a function ``openssl_random_pseudo_bytes``
|
||||
[17]_. Translated into Python syntax, it has roughly the following
|
||||
signature::
|
||||
|
||||
def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]
|
||||
|
||||
This function returns a pseudo-random string of bytes of the given
|
||||
  length, and a boolean flag giving whether the string is considered
|
||||
cryptographically strong. The PHP manual suggests that returning
|
||||
anything but True should be rare except for old or broken platforms.
|
||||
|
||||
* Javascript
|
||||
|
||||
  Based on a rather cursory search [18]_, there don't appear to be any
|
||||
well-known standard functions for producing strong random values in
|
||||
Javascript, although there may be good quality third-party libraries.
|
||||
Standard Javascript doesn't seem to include an interface to the
|
||||
system CSPRNG either, and people have extensively written about the
|
||||
weaknesses of Javascript's ``Math.random`` [19]_.
|
||||
|
||||
* Ruby
|
||||
|
||||
The Ruby standard library includes a module ``SecureRandom`` [20]_
|
||||
which includes the following methods:
|
||||
|
||||
* base64 - returns a Base64 encoded random string.
|
||||
|
||||
* hex - returns a random hexadecimal string.
|
||||
|
||||
* random_bytes - returns a random byte string.
|
||||
|
||||
* random_number - depending on the argument, returns either a random
|
||||
integer in the range(0, n), or a random float between 0.0 and 1.0.
|
||||
|
||||
* urlsafe_base64 - returns a random URL-safe Base64 encoded string.
|
||||
|
||||
  * uuid - returns a version 4 random Universally Unique IDentifier.
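
For comparison, rough Python counterparts of these Ruby methods can already
be assembled from the standard library. This is a sketch for orientation
only: the function names below are hypothetical, the default sizes are
arbitrary, and only the integer form of ``random_number`` is shown.

```python
import base64
import binascii
import os
import random
import uuid


def random_bytes(n=16):
    """n random bytes from the system CSPRNG."""
    return os.urandom(n)


def hex_string(n=16):
    """n random bytes as a hexadecimal string of length 2 * n."""
    return binascii.hexlify(os.urandom(n)).decode('ascii')


def base64_string(n=16):
    """n random bytes, Base64 encoded."""
    return base64.b64encode(os.urandom(n)).decode('ascii')


def urlsafe_base64_string(n=16):
    """n random bytes, encoded with the URL-safe Base64 alphabet."""
    return base64.urlsafe_b64encode(os.urandom(n)).decode('ascii')


def random_number(n):
    """A random integer in range(0, n)."""
    return random.SystemRandom().randrange(n)


def random_uuid():
    """A version 4 (random) UUID as a string."""
    return str(uuid.uuid4())
```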
|
||||
|
||||
|
||||
What Should Be The Name Of The Module?
|
||||
======================================
|
||||
|
||||
There was a proposal to add a "random.safe" submodule, quoting the Zen
|
||||
of Python "Namespaces are one honking great idea" koan. However, the
|
||||
author of the Zen, Tim Peters, has come out against this idea [21]_, and
|
||||
recommends a top-level module.
|
||||
|
||||
In discussion on the python-ideas mailing list so far, the name "secrets"
|
||||
has received some approval, and no strong opposition.
|
||||
|
||||
|
||||
Frequently Asked Questions
|
||||
==========================
|
||||
|
||||
* Q: Is this a real problem? Surely MT is random enough that nobody can
|
||||
predict its output.
|
||||
|
||||
A: The consensus among security professionals is that MT is not safe
|
||||
in security contexts. It is not difficult to reconstruct the internal
|
||||
state of MT [22]_ [23]_ and so predict all past and future values. There
|
||||
are a number of known, practical attacks on systems using MT for
|
||||
randomness [24]_.
|
||||
|
||||
While there are currently no known direct attacks on applications
|
||||
written in Python due to the use of MT, there is widespread agreement
|
||||
that such usage is unsafe.
|
||||
|
||||
* Q: Is this an alternative to specialised cryptographic software such as SSL?
|
||||
|
||||
A: No. This is a "batteries included" solution, not a full-featured
|
||||
"nuclear reactor". It is intended to mitigate against some basic
|
||||
security errors, not be a solution to all security-related issues. To
|
||||
quote Nick Coghlan referring to his earlier proposal [25]_::
|
||||
|
||||
"...folks really are better off learning to use things like
|
||||
cryptography.io for security sensitive software, so this change
|
||||
is just about harm mitigation given that it's inevitable that a
|
||||
non-trivial proportion of the millions of current and future
|
||||
Python developers won't do that."
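
To make the first answer above concrete: once the internal state of MT is
known, every later output follows from it. The sketch below does not perform
the state reconstruction described in the references; it assumes the attacker
has already recovered the state and uses ``getstate()`` as a stand-in for
that step.

```python
import random

# The application's generator, seeded from the OS; the seed is unknown.
victim = random.Random()

# Stand-in for the attack: assume the internal state has been recovered
# from observed outputs.
recovered_state = victim.getstate()

# The attacker loads that state into their own generator.
attacker = random.Random()
attacker.setstate(recovered_state)

# From here on the attacker can predict every value the victim produces.
predicted = [attacker.getrandbits(32) for _ in range(5)]
actual = [victim.getrandbits(32) for _ in range(5)]
```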
|
||||
|
||||
|
||||
References
|
||||
==========
|
||||
|
||||
.. [1] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html
|
||||
|
||||
.. [2] https://docs.python.org/3/library/random.html
|
||||
|
||||
.. [3] As of the date of writing. Also, as Google search terms may be
|
||||
automatically customised for the user without their knowledge, some
|
||||
readers may see different results.
|
||||
|
||||
.. [4] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html
|
||||
|
||||
.. [5] http://stackoverflow.com/questions/3854692/generate-password-in-python
|
||||
|
||||
.. [6] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766
|
||||
|
||||
.. [7] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html
|
||||
|
||||
.. [8] At least those who are motivated to read the source code and documentation.
|
||||
|
||||
.. [9] Tim Peters suggests that bike-shedding the contents of the module will
|
||||
be 10000 times more time consuming than actually implementing the
|
||||
module. Words do not begin to express how much I am looking forward to
|
||||
this.
|
||||
|
||||
.. [10] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html
|
||||
|
||||
.. [11] https://github.com/pyca/cryptography/issues/2347
|
||||
|
||||
.. [12] Link needed.
|
||||
|
||||
.. [13] By default PHP seeds the MT PRNG with the time (citation needed),
|
||||
which is exploitable by attackers, while Python seeds the PRNG with
|
||||
output from the system CSPRNG, which is believed to be much harder to
|
||||
exploit.
|
||||
|
||||
.. [14] http://legacy.python.org/dev/peps/pep-0504/
|
||||
|
||||
.. [15] https://mail.python.org/pipermail/python-ideas/2015-September/036243.html
|
||||
|
||||
.. [16] http://php.net/manual/en/function.uniqid.php
|
||||
|
||||
.. [17] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php
|
||||
|
||||
.. [18] Volunteers and patches are welcome.
|
||||
|
||||
.. [19] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html
|
||||
|
||||
.. [20] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html
|
||||
|
||||
.. [21] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html
|
||||
|
||||
.. [22] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html
|
||||
|
||||
.. [23] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html
|
||||
|
||||
.. [24] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf
|
||||
|
||||
.. [25] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html
|
||||
|
||||
|
||||
Copyright
|
||||
=========
|
||||
|
||||
This document has been placed in the public domain.
|
||||
|
||||
|
||||
|
||||
..
|
||||
Local Variables:
|
||||
mode: indented-text
|
||||
indent-tabs-mode: nil
|
||||
sentence-end-double-space: t
|
||||
fill-column: 70
|
||||
coding: utf-8
|
||||
End:
|
|
@ -2,7 +2,7 @@ PEP: 3140
|
|||
Title: str(container) should call str(item), not repr(item)
|
||||
Version: $Revision$
|
||||
Last-Modified: $Date$
|
||||
Author: Oleg Broytmann <phd@phd.pp.ru>,
|
||||
Author: Oleg Broytman <phd@phdru.name>,
|
||||
Jim J. Jewett <jimjjewett@gmail.com>
|
||||
Discussions-To: python-3000@python.org
|
||||
Status: Rejected
|
||||
|
|