diff --git a/pep-0101.txt b/pep-0101.txt index 917015c73..91f238cd4 100644 --- a/pep-0101.txt +++ b/pep-0101.txt @@ -424,11 +424,12 @@ How to Make A Release that directory. Note though that if you're releasing a maintenance release for an older version, don't change the current link. - ___ If this is a final release (even a maintenance release), also unpack - the HTML docs to /srv/docs.python.org/release/X.Y.Z on - docs.iad1.psf.io. Make sure the files are in group "docs". If it is a - release of a security-fix-only version, tell the DE to build a version - with the "version switcher" and put it there. + ___ If this is a final release (even a maintenance release), also + unpack the HTML docs to /srv/docs.python.org/release/X.Y.Z on + docs.iad1.psf.io. Make sure the files are in group "docs" and are + group-writeable. If it is a release of a security-fix-only version, + tell the DE to build a version with the "version switcher" + and put it there. ___ Let the DE check if the docs are built and work all right. @@ -484,6 +485,10 @@ How to Make A Release Note that the easiest thing is probably to copy fields from an existing Python release "page", editing as you go. + There should only be one "page" for a release (e.g. 3.5.0, 3.5.1). + Reuse the same page for all pre-releases, changing the version + number and the documentation as you go. + ___ If this isn't the first release for a version, open the existing "page" for editing and update it to the new release. Don't save yet! diff --git a/pep-0103.txt b/pep-0103.txt new file mode 100644 index 000000000..4b5154ae5 --- /dev/null +++ b/pep-0103.txt @@ -0,0 +1,951 @@ +PEP: 103 +Title: Collecting information about git +Version: $Revision$ +Last-Modified: $Date$ +Author: Oleg Broytman +Status: Draft +Type: Informational +Content-Type: text/x-rst +Created: 01-Jun-2015 +Post-History: 12-Sep-2015 + +Abstract +======== + +This Informational PEP collects information about git. 
There is, of +course, a lot of documentation for git, so the PEP concentrates on +more complex (and more related to Python development) issues, +scenarios and examples. + +The plan is to extend the PEP in the future collecting information +about equivalence of Mercurial and git scenarios to help migrating +Python development from Mercurial to git. + +The author of the PEP doesn't currently plan to write a Process PEP on +migration Python development from Mercurial to git. + + +Documentation +============= + +Git is accompanied with a lot of documentation, both online and +offline. + + +Documentation for starters +-------------------------- + +Git Tutorial: `part 1 +`_, +`part 2 +`_. + +`Git User's manual +`_. +`Everyday GIT With 20 Commands Or So +`_. +`Git workflows +`_. + + +Advanced documentation +---------------------- + +`Git Magic +`_, +with a number of translations. + +`Pro Git `_. The Book about git. Buy it at +Amazon or download in PDF, mobi, or ePub form. It has translations to +many different languages. Download Russian translation from `GArik +`_. + +`Git Wiki `_. + + +Offline documentation +--------------------- + +Git has builtin help: run ``git help $TOPIC``. For example, run +``git help git`` or ``git help help``. + + +Quick start +=========== + +Download and installation +------------------------- + +Unix users: `download and install using your package manager +`_. + +Microsoft Windows: download `git-for-windows +`_ or `msysGit +`_. + +MacOS X: use git installed with `XCode +`_ or download from +`MacPorts `_ or +`git-osx-installer +`_ or +install git with `Homebrew `_: ``brew install git``. + +`git-cola `_ is a Git GUI +written in Python and GPL licensed. Linux, Windows, MacOS X. + +`TortoiseGit `_ is a Windows Shell Interface +to Git based on TortoiseSVN; open source. + + +Initial configuration +--------------------- + +This simple code is often appears in documentation, but it is +important so let repeat it here. 
Git stores author and committer +names/emails in every commit, so configure your real name and +preferred email:: + + $ git config --global user.name "User Name" + $ git config --global user.email user.name@example.org + + +Examples in this PEP +==================== + +Examples of git commands in this PEP use the following approach. It is +assumed that you, the user, work with a local repository named +``python`` that has an upstream remote repo named ``origin``. Your +local repo has two branches ``v1`` and ``master``. For most examples +the currently checked out branch is ``master``. That is, it's assumed +you have done something like this:: + + $ git clone https://git.python.org/python.git + $ cd python + $ git branch v1 origin/v1 + +The first command clones the remote repository into the local directory +``python``, creates a new local branch master, sets +remotes/origin/master as its upstream remote-tracking branch and +checks it out into the working directory. + +The last command creates a new local branch v1 and sets +remotes/origin/v1 as its upstream remote-tracking branch. + +The same result can be achieved with commands:: + + $ git clone -b v1 https://git.python.org/python.git + $ cd python + $ git checkout --track origin/master + +The last command creates a new local branch master, sets +remotes/origin/master as its upstream remote-tracking branch and +checks it out into the working directory. + + +Branches and branches +===================== + +Git terminology can be a bit misleading. Take, for example, the term +"branch". In git it has two meanings. A branch is a directed line of +commits (possibly with merges). And a branch is a label or a pointer +assigned to a line of commits. It is important to distinguish when you +talk about commits and when about their labels. Lines of commits are +themselves unnamed and usually only grow and merge. +Labels, on the other hand, can be created, moved, renamed and deleted +freely. 
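The distinction between a line of commits and its label can be tried out in a throwaway repository. The following is only a minimal sketch: the temporary directory, the branch names ``topic`` and ``renamed`` and the empty commits are assumptions made for illustration and are not part of the ``python`` repository used elsewhere in this PEP.

```shell
set -e
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "User Name"
git config user.email user.name@example.org
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
tip=$(git rev-parse HEAD)

git branch topic HEAD~        # create a label on the first commit
git branch -f topic HEAD      # move the label to the second commit
git branch -m topic renamed   # rename the label
git branch -d renamed         # delete the label

# The commits themselves are untouched by all of the above.
git cat-file -t "$tip"
```

Only the labels were created, moved, renamed and deleted; the line of commits reachable from ``master`` is the same before and after.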
+ + +Remote repositories and remote branches +======================================= + +Remote-tracking branches are branches (pointers to commits) in your +local repository. They are there for git (and for you) to remember +what branches and commits have been pulled from and pushed to what +remote repos (you can pull from and push to many remotes). +Remote-tracking branches live under ``remotes/$REMOTE`` namespaces, +e.g. ``remotes/origin/master``. + +To see the status of remote-tracking branches run:: + + $ git branch -rv + +To see local and remote-tracking branches (and tags) pointing to +commits:: + + $ git log --decorate + +You never do your own development on remote-tracking branches. You +create a local branch that has a remote branch as upstream and do +development on that local branch. On push git pushes commits to the +remote repo and updates remote-tracking branches; on pull git fetches +commits from the remote repo, updates remote-tracking branches and +fast-forwards, merges or rebases local branches. + +When you do an initial clone like this:: + + $ git clone -b v1 https://git.python.org/python.git + +git clones the remote repository ``https://git.python.org/python.git`` to +directory ``python``, creates a remote named ``origin``, creates +remote-tracking branches, creates a local branch ``v1``, configures it +to track the upstream remotes/origin/v1 branch and checks out ``v1`` into +the working directory. + + +Updating local and remote-tracking branches +------------------------------------------- + +There is a major difference between + +:: + + $ git fetch $REMOTE $BRANCH + +and + +:: + + $ git fetch $REMOTE $BRANCH:$BRANCH + +The first command fetches commits from the named $BRANCH in the +$REMOTE repository that are not in your repository, updates the +remote-tracking branch and leaves the id (the hash) of the head commit +in file .git/FETCH_HEAD. 
+ +The second command fetches commits from the named $BRANCH in the +$REMOTE repository that are not in your repository and updates both +the local branch $BRANCH and its upstream remote-tracking branch. But +it refuses to update branches in case of non-fast-forward. And it +refuses to update the current branch (currently checked out branch, +where HEAD is pointing to). + +The first command is used internally by ``git pull``. + +:: + + $ git pull $REMOTE $BRANCH + +is equivalent to + +:: + + $ git fetch $REMOTE $BRANCH + $ git merge FETCH_HEAD + +Certainly, $BRANCH in that case should be your current branch. If you +want to merge a different branch into your current branch first update +that non-current branch and then merge:: + + $ git fetch origin v1:v1 # Update v1 + $ git pull --rebase origin master # Update the current branch master + # using rebase instead of merge + $ git merge v1 + +If you have not yet pushed commits on ``v1``, though, the scenario has +to become a bit more complex. Git refuses to update +non-fast-forwardable branch, and you don't want to do force-pull +because that would remove your non-pushed commits and you would need +to recover. So you want to rebase ``v1`` but you cannot rebase +non-current branch. Hence, checkout ``v1`` and rebase it before +merging:: + + $ git checkout v1 + $ git pull --rebase origin v1 + $ git checkout master + $ git pull --rebase origin master + $ git merge v1 + +It is possible to configure git to make it fetch/pull a few branches +or all branches at once, so you can simply run + +:: + + $ git pull origin + +or even + +:: + + $ git pull + +Default remote repository for fetching/pulling is ``origin``. Default +set of references to fetch is calculated using matching algorithm: git +fetches all branches having the same name on both ends. + + +Push +'''' + +Pushing is a bit simpler. There is only one command ``push``. 
When you +run + +:: + + $ git push origin v1 master + +git pushes local v1 to remote v1 and local master to remote master. +The same as:: + + $ git push origin v1:v1 master:master + +Git pushes commits to the remote repo and updates remote-tracking +branches. Git refuses to push commits that aren't fast-forwardable. +You can force-push anyway, but please remember - you can force-push to +your own repositories but don't force-push to public or shared repos. +If you find git refuses to push commits that aren't fast-forwardable, +better fetch and merge commits from the remote repo (or rebase your +commits on top of the fetched commits), then push. Only force-push if +you know what you are doing and why you are doing it. See the section `Commit +editing and caveats`_ below. + +It is possible to configure git to make it push a few branches or all +branches at once, so you can simply run + +:: + + $ git push origin + +or even + +:: + + $ git push + +The default remote repository for pushing is ``origin``. The default set of +references to push in git before 2.0 is calculated using the matching +algorithm: git pushes all branches having the same name on both ends. +The default set of references to push in git 2.0+ is calculated using +the simple algorithm: git pushes the current branch back to its +@{upstream}. + +To configure git before 2.0 to the new behaviour run:: + + $ git config push.default simple + +To configure git 2.0+ to the old behaviour run:: + + $ git config push.default matching + +Git doesn't allow pushing a branch if it's the current branch in the +remote non-bare repository: git refuses to update the remote working +directory. You really should push only to bare repositories. For +non-bare repositories git prefers a pull-based workflow. 
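The difference between pushing to a bare and to a non-bare repository can be reproduced locally. This is a sketch under stated assumptions: the scratch directory and the repository names ``hub.git``, ``dev`` and ``deploy`` are hypothetical, and the refusal relies on git's default setting for pushes into a checked-out branch not having been overridden in your configuration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository: no working tree, safe to push to.
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master

# A developer repository with one commit on master.
git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git commit -q --allow-empty -m "first"
first=$(git rev-parse HEAD)
git push -q "$work/hub.git" master        # accepted by the bare repo

# A non-bare clone that has master checked out.
git clone -q "$work/hub.git" "$work/deploy"

# Pushing into the checked-out branch of the non-bare repo is refused.
git commit -q --allow-empty -m "second"
if git push -q "$work/deploy" master 2>/dev/null; then
    pushed=yes
else
    pushed=no
fi
echo "push to non-bare repo accepted: $pushed"
```

The non-bare ``deploy`` repository would instead be updated by running ``git pull`` inside it.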
+ +When you want to deploy code on a remote host and can only use push +(because your workstation is behind a firewall and you cannot pull +from it) you do that in two steps using two repositories: you push +from the workstation to a bare repo on the remote host, ssh to the +remote host and pull from the bare repo to a non-bare deployment repo. + +That changed in git 2.3, but see `the blog post +`_ +for caveats; in 2.4 the push-to-deploy feature was `further improved +`_. + + +Tags +'''' + +Git automatically fetches tags that point to commits being fetched +during fetch/pull. To fetch all tags (and commits they point to) run +``git fetch --tags origin``. To fetch some specific tags fetch them +explicitly:: + + $ git fetch origin tag $TAG1 tag $TAG2... + +For example:: + + $ git fetch origin tag 1.4.2 + $ git fetch origin v1:v1 tag 2.1.7 + +Git doesn't automatically push tags. That allows you to have private +tags. To push tags list them explicitly:: + + $ git push origin tag 1.4.2 + $ git push origin v1 master tag 2.1.7 + +Or push all tags at once:: + + $ git push --tags origin + +Don't move tags with ``git tag -f`` or remove tags with ``git tag -d`` +after they have been published. + + +Private information +''''''''''''''''''' + +When cloning/fetching/pulling/pushing git copies only database objects +(commits, trees, files and tags) and symbolic references (branches and +lightweight tags). Everything else is private to the repository and +never cloned, updated or pushed. It's your config, your hooks, your +private exclude file. + +If you want to distribute hooks, copy them to the working tree, add, +commit, push and instruct the team to update and install the hooks +manually. + + +Commit editing and caveats +========================== + +A warning not to edit published (pushed) commits also appears in +documentation but it's repeated here anyway as it's very important. + +It is possible to recover from a forced push but it's a pain for the +entire team. 
Please avoid it. + +To see what commits have not been published yet compare the head of the +branch with its upstream remote-tracking branch:: + + $ git log origin/master.. # from origin/master to HEAD (of master) + $ git log origin/v1..v1 # from origin/v1 to the head of v1 + +For every branch that has an upstream remote-tracking branch git +maintains an alias @{upstream} (short version @{u}), so the commands +above can be given as:: + + $ git log @{u}.. + $ git log v1@{u}..v1 + +To see the status of all branches:: + + $ git branch -avv + +To compare the status of local branches with a remote repo:: + + $ git remote show origin + +Read `how to recover from upstream rebase +`_. +It is in ``git help rebase``. + +On the other hand don't be too afraid of commit editing. You can +safely edit, reorder, remove, combine and split commits that haven't +been pushed yet. You can even push commits to your own (backup) repo, +edit them later and force-push edited commits to replace what has +already been pushed. Not a problem until commits are in a public +or shared repository. + + +Undo +==== + +Whatever you do, don't panic. Almost anything in git can be undone. + + +git checkout: restore file's content +------------------------------------ + +``git checkout``, for example, can be used to restore the content of +file(s) to their state in a given commit. Like this:: + + $ git checkout HEAD~ README + +The command restores the contents of the README file to the last but one +commit in the current branch. By default the commit ID is simply HEAD; +i.e. ``git checkout README`` restores README to the latest commit. + +(Do not use ``git checkout`` to view the content of a file in a commit, +use ``git cat-file -p``; e.g. ``git cat-file -p HEAD~:path/to/README``). + + +git reset: remove (non-pushed) commits +-------------------------------------- + +``git reset`` moves the head of the current branch. 
The head can be +moved to point to any commit but it's often used to remove a commit or +a few (preferably, non-pushed ones) from the top of the branch - that +is, to move the branch backward in order to undo a few (non-pushed) +commits. + +``git reset`` has three modes of operation - soft, hard and mixed. +Default is mixed. ProGit `explains +`_ the +difference very clearly. Bare repositories don't have indices or +working trees so in a bare repo only soft reset is possible. + + +Unstaging +''''''''' + +Mixed mode reset with a path or paths can be used to unstage changes - +that is, to remove from index changes added with ``git add`` for +committing. See `The Book +`_ for details +about unstaging and other undo tricks. + + +git reflog: reference log +------------------------- + +Removing commits with ``git reset`` or moving the head of a branch +sounds dangerous and it is. But there is a way to undo: another +reset back to the original commit. Git doesn't remove commits +immediately; unreferenced commits (in git terminology they are called +"dangling commits") stay in the database for some time (default is two +weeks) so you can reset back to it or create a new branch pointing to +the original commit. + +For every move of a branch's head - with ``git commit``, ``git +checkout``, ``git fetch``, ``git pull``, ``git rebase``, ``git reset`` +and so on - git stores a reference log (reflog for short). For every +move git stores where the head was. Command ``git reflog`` can be used +to view (and manipulate) the log. + +In addition to the moves of the head of every branch git stores the +moves of the HEAD - a symbolic reference that (usually) names the +current branch. HEAD is changed with ``git checkout $BRANCH``. + +By default ``git reflog`` shows the moves of the HEAD, i.e. the +command is equivalent to ``git reflog HEAD``. To show the moves of the +head of a branch use the command ``git reflog $BRANCH``. 
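The difference between the reflog of HEAD and the reflog of a branch can be observed in a scratch repository. The sketch below is illustrative only: the temporary directory, the ``topic`` branch and the empty commits are assumptions, and it relies on reflogs being enabled, which is git's default for non-bare repositories.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "User Name"
git config user.email user.name@example.org
git commit -q --allow-empty -m "first"    # moves HEAD and the head of master
git commit -q --allow-empty -m "second"   # moves HEAD and the head of master
git checkout -q -b topic                  # moves HEAD only
git checkout -q master                    # moves HEAD only

git reflog            # same as "git reflog HEAD": commits and checkouts
git reflog master     # only the moves of the head of master

# Count the recorded moves in each log.
head_moves=$(git reflog HEAD | wc -l | tr -d ' ')
master_moves=$(git reflog master | wc -l | tr -d ' ')
echo "HEAD moved $head_moves times, master moved $master_moves times"
```

Switching branches is recorded only in the reflog of HEAD, so that log is longer than the one for ``master``.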
+ +So to undo a ``git reset`` lookup the original commit in ``git +reflog``, verify it with ``git show`` or ``git log`` and run ``git +reset $COMMIT_ID``. Git stores the move of the branch's head in +reflog, so you can undo that undo later again. + +In a more complex situation you'd want to move some commits along with +resetting the head of the branch. Cherry-pick them to the new branch. +For example, if you want to reset the branch ``master`` back to the +original commit but preserve two commits created in the current branch +do something like:: + + $ git branch save-master # create a new branch saving master + $ git reflog # find the original place of master + $ git reset $COMMIT_ID + $ git cherry-pick save-master~ save-master + $ git branch -D save-master # remove temporary branch + + +git revert: revert a commit +--------------------------- + +``git revert`` reverts a commit or commits, that is, it creates a new +commit or commits that revert(s) the effects of the given commits. +It's the only way to undo published commits (``git commit --amend``, +``git rebase`` and ``git reset`` change the branch in +non-fast-forwardable ways so they should only be used for non-pushed +commits.) + +There is a problem with reverting a merge commit. ``git revert`` can +undo the code created by the merge commit but it cannot undo the fact +of merge. See the discussion `How to revert a faulty merge +`_. + + +One thing that cannot be undone +------------------------------- + +Whatever you undo, there is one thing that cannot be undone - +overwritten uncommitted changes. Uncommitted changes don't belong to +git so git cannot help preserving them. + +Most of the time git warns you when you're going to execute a command +that overwrites uncommitted changes. Git doesn't allow you to switch +branches with ``git checkout``. It stops you when you're going to +rebase with non-clean working tree. It refuses to pull new commits +over non-committed files. 
+ +But there are commands that do exactly that - overwrite files in the +working tree. Commands like ``git checkout $PATHs`` or ``git reset +--hard`` silently overwrite files including your uncommitted changes. + +With that in mind you can understand the stance "commit early, commit +often". Commit as often as possible. Commit on every save in your +editor or IDE. You can edit your commits before pushing - edit commit +messages, change commits, reorder, combine, split, remove. But save +your changes in the git database, either commit changes or at least stash +them with ``git stash``. + + +Merge or rebase? +================ + +The Internet is full of heated discussions on the topic: "merge or +rebase?" Most of them are meaningless. When a DVCS is being used in a +big team with a big and complex project with many branches there is +simply no way to avoid merges. So the question's diminished to +"whether to use rebase, and if yes - when to use rebase?" Considering +that it is very much recommended not to rebase published commits the +question's diminished even further: "whether to use rebase on +non-pushed commits?" + +That small question is for the team to decide. The author of the PEP +recommends using rebase when pulling, i.e. always do ``git pull +--rebase`` or even configure automatic setup of rebase for every new +branch:: + + $ git config branch.autosetuprebase always + +and configure rebase for existing branches:: + + $ git config branch.$NAME.rebase true + +For example:: + + $ git config branch.v1.rebase true + $ git config branch.master.rebase true + +After that ``git pull origin master`` becomes equivalent to ``git pull +--rebase origin master``. + +It is recommended to create new commits in a separate feature or topic +branch while using rebase to update the mainline branch. When the +topic branch is ready merge it into mainline. 
To avoid a tedious task +of resolving large number of conflicts at once you can merge the topic +branch to the mainline from time to time and switch back to the topic +branch to continue working on it. The entire workflow would be +something like:: + + $ git checkout -b issue-42 # create a new issue branch and switch to it + ...edit/test/commit... + $ git checkout master + $ git pull --rebase origin master # update master from the upstream + $ git merge issue-42 + $ git branch -d issue-42 # delete the topic branch + $ git push origin master + +When the topic branch is deleted only the label is removed, commits +are stayed in the database, they are now merged into master:: + + o--o--o--o--o--M--< master - the mainline branch + \ / + --*--*--* - the topic branch, now unnamed + +The topic branch is deleted to avoid cluttering branch namespace with +small topic branches. Information on what issue was fixed or what +feature was implemented should be in the commit messages. + + +Null-merges +=========== + +Git has a builtin merge strategy for what Python core developers call +"null-merge":: + + $ git merge -s ours v1 # null-merge v1 into master + + +Branching models +================ + +Git doesn't assume any particular development model regarding +branching and merging. Some projects prefer to graduate patches from +the oldest branch to the newest, some prefer to cherry-pick commits +backwards, some use squashing (combining a number of commits into +one). Anything is possible. + +There are a few examples to start with. `git help workflows +`_ +describes how the very git authors develop git. + +ProGit book has a few chapters devoted to branch management in +different projects: `Git Branching - Branching Workflows +`_ and +`Distributed Git - Contributing to a Project +`_. + +There is also a well-known article `A successful Git branching model +`_ by Vincent +Driessen. It recommends a set of very detailed rules on creating and +managing mainline, topic and bugfix branches. 
To support the model the +author implemented `git flow `_ +extension. + + +Advanced configuration +====================== + +Line endings +------------ + +Git has builtin mechanisms to handle line endings between platforms +with different end-of-line styles. To allow git to do CRLF conversion +assign ``text`` attribute to files using `.gitattributes +`_. +For files that have to have specific line endings assign ``eol`` +attribute. For binary files the attribute is, naturally, ``binary``. + +For example:: + + $ cat .gitattributes + *.py text + *.txt text + *.png binary + /readme.txt eol=crlf + +To check what attributes git uses for files use ``git check-attr`` +command. For example:: + + $ git check-attr -a -- \*.py + + +Advanced topics +=============== + +Staging area +------------ + +Staging area aka index aka cache is a distinguishing feature of git. +Staging area is where git collects patches before committing them. +Separation between collecting patches and commit phases provides a +very useful feature of git: you can review collected patches before +commit and even edit them - remove some hunks, add new hunks and +review again. + +To add files to the index use ``git add``. Collecting patches before +committing means you need to do that for every change, not only to add +new (untracked) files. To simplify committing in case you just want to +commit everything without reviewing run ``git commit --all`` (or just +``-a``) - the command adds every changed tracked file to the index and +then commits. To commit a file or files regardless of patches collected +in the index run ``git commit [--only|-o] -- $FILE...``. + +To add hunks of patches to the index use ``git add --patch`` (or just +``-p``). To remove collected files from the index use ``git reset HEAD +-- $FILE...`` To add/inspect/remove collected hunks use ``git add +--interactive`` (``-i``). + +To see the diff between the index and the last commit (i.e., collected +patches) use ``git diff --cached``. 
To see the diff between the +working tree and the index (i.e., uncollected patches) use just ``git +diff``. To see the diff between the working tree and the last commit +(i.e., both collected and uncollected patches) run ``git diff HEAD``. + +See `WhatIsTheIndex +`_ and +`IndexCommandQuickref +`_ in Git +Wiki. + + +ReReRe +====== + +Rerere is a mechanism that helps to resolve repeated merge conflicts. +The most frequent source of recurring merge conflicts is topic +branches that are merged into mainline and then the merge commits are +removed; that's often performed to test the topic branches and train +rerere; merge commits are removed to have clean linear history and +finish the topic branch with only one last merge commit. + +Rerere works by remembering the states of the tree before and after a +successful commit. That way rerere can automatically resolve conflicts +if they appear in the same files. + +Rerere can be used manually with ``git rerere`` command but most often +it's used automatically. Enable rerere with these commands in a +working tree:: + + $ git config rerere.enabled true + $ git config rerere.autoupdate true + +You don't need to turn rerere on globally - you don't want rerere in +bare repositories or single-branch repositories; you only need rerere +in repos where you often perform merges and resolve merge conflicts. + +See `Rerere `_ in The +Book. + + +Database maintenance +==================== + +Git object database and other files/directories under ``.git`` require +periodic maintenance and cleanup. For example, commit editing leaves +unreferenced objects (dangling objects, in git terminology) and these +objects should be pruned to avoid collecting cruft in the DB. The +command ``git gc`` is used for maintenance. Git automatically runs +``git gc --auto`` as a part of some commands to do quick maintenance. 
+Users are recommended to run ``git gc --aggressive`` from time to +time; ``git help gc`` recommends to run it every few hundred +changesets; for more intensive projects it should be something like +once a week and less frequently (biweekly or monthly) for lesser +active projects. + +``git gc --aggressive`` not only removes dangling objects, it also +repacks object database into indexed and better optimized pack(s); it +also packs symbolic references (branches and tags). Another way to do +it is to run ``git repack``. + +There is a well-known `message +`_ from Linus +Torvalds regarding "stupidity" of ``git gc --aggressive``. The message +can safely be ignored now. It is old and outdated, ``git gc +--aggressive`` became much better since that time. + +For those who still prefer ``git repack`` over ``git gc --aggressive`` +the recommended parameters are ``git repack -a -d -f --depth=20 +--window=250``. See `this detailed experiment +`_ +for explanation of the effects of these parameters. + +From time to time run ``git fsck [--strict]`` to verify integrity of +the database. ``git fsck`` may produce a list of dangling objects; +that's not an error, just a reminder to perform regular maintenance. + + +Tips and tricks +=============== + +Command-line options and arguments +---------------------------------- + +`git help cli +`_ +recommends not to combine short options/flags. Most of the times +combining works: ``git commit -av`` works perfectly, but there are +situations when it doesn't. E.g., ``git log -p -5`` cannot be combined +as ``git log -p5``. + +Some options have arguments, some even have default arguments. In that +case the argument for such option must be spelled in a sticky way: +``-Oarg``, never ``-O arg`` because for an option that has a default +argument the latter means "use default value for option ``-O`` and +pass ``arg`` further to the option parser". 
For example, ``git grep`` +has an option ``-O`` that passes a list of names of the found files to +a program; default program for ``-O`` is a pager (usually ``less``), +but you can use your editor:: + + $ git grep -Ovim # but not -O vim + +BTW, if git is instructed to use ``less`` as the pager (i.e., if pager +is not configured in git at all it uses ``less`` by default, or if it +gets ``less`` from GIT_PAGER or PAGER environment variables, or if it +was configured with ``git config --global core.pager less``, or +``less`` is used in the command ``git grep -Oless``) ``git grep`` +passes ``+/$pattern`` option to ``less`` which is quite convenient. +Unfortunately, ``git grep`` doesn't pass the pattern if the pager is +not exactly ``less``, even if it's ``less`` with parameters (something +like ``git config --global core.pager less -FRSXgimq``); fortunately, +``git grep -Oless`` always passes the pattern. + + +bash/zsh completion +------------------- + +It's a bit hard to type ``git rebase --interactive --preserve-merges +HEAD~5`` manually even for those who are happy to use command-line, +and this is where shell completion is of great help. Bash/zsh come +with programmable completion, often automatically installed and +enabled, so if you have bash/zsh and git installed, chances are you +are already done - just go and use it at the command-line. + +If you don't have necessary bits installed, install and enable +bash_completion package. If you want to upgrade your git completion to +the latest and greatest download necessary file from `git contrib +`_. + +Git-for-windows comes with git-bash for which bash completion is +installed and enabled. + + +bash/zsh prompt +--------------- + +For command-line lovers shell prompt can carry a lot of useful +information. To include git information in the prompt use +`git-prompt.sh +`_. +Read the detailed instructions in the file. + +Search the Net for "git prompt" to find other prompt variants. 
+ + +git on server +============= + +The simplest way to publish a repository or a group of repositories is +``git daemon``. The daemon provides anonymous access, by default it is +read-only. The repositories are accessible by git protocol (git:// +URLs). Write access can be enabled but the protocol lacks any +authentication means, so it should be enabled only within a trusted +LAN. See ``git help daemon`` for details. + +Git over ssh provides authentication and repo-level authorisation as +repositories can be made user- or group-writeable (see parameter +``core.sharedRepository`` in ``git help config``). If that's too +permissive or too restrictive for some project's needs there is a +wrapper `gitolite `_ that can +be configured to allow access with great granularity; gitolite is +written in Perl and has a lot of documentation. + +Web interface to browse repositories can be created using `gitweb +`_ or `cgit +`_. Both are CGI scripts (written in +Perl and C). In addition to web interface both provide read-only dumb +http access for git (http(s):// URLs). + +There are also more advanced web-based development environments that +include ability to manage users, groups and projects; private, +group-accessible and public repositories; they often include issue +trackers, wiki pages, pull requests and other tools for development +and communication. Among these environments are `Kallithea +`_ and `pagure `_, +both are written in Python; pagure was written by Fedora developers +and is being used to develop some Fedora projects. `Gogs +`_ is written in Go; there is a fork `Gitea +`_. + +And last but not least, `Gitlab `_. It's +perhaps the most advanced web-based development environment for git. +Written in Ruby, community edition is free and open source (MIT +license). + + +From Mercurial to git +===================== + +There are many tools to convert Mercurial repositories to git. 
The +most famous are, probably, `hg-git `_ and +`fast-export `_ (many years ago +it was known under the name ``hg2git``). + +But a better tool, perhaps the best, is `git-remote-hg +`_. It provides transparent +bidirectional (pull and push) access to Mercurial repositories from +git. Its author wrote a `comparison of alternatives +`_ +that seems to be mostly objective. + +To use git-remote-hg, install or clone it, add to your PATH (or copy +script ``git-remote-hg`` to a directory that's already in PATH) and +prepend ``hg::`` to Mercurial URLs. For example:: + + $ git clone https://github.com/felipec/git-remote-hg.git + $ PATH=$PATH:"`pwd`"/git-remote-hg + $ git clone hg::https://hg.python.org/peps/ PEPs + +To work with the repository just use regular git commands including +``git fetch/pull/push``. + +To start converting your Mercurial habits to git see the page +`Mercurial for Git users +`_ at Mercurial wiki. +At the second half of the page there is a table that lists +corresponding Mercurial and git commands. Should work perfectly in +both directions. + +Python Developer's Guide also has a chapter `Mercurial for git +developers `_ that +documents a few differences between git and hg. + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: + vim: set fenc=us-ascii tw=70 : diff --git a/pep-0362.txt b/pep-0362.txt index 227666e7b..6e9d0427f 100644 --- a/pep-0362.txt +++ b/pep-0362.txt @@ -3,7 +3,7 @@ Title: Function Signature Object Version: $Revision$ Last-Modified: $Date$ Author: Brett Cannon , Jiwon Seo , - Yury Selivanov , Larry Hastings + Yury Selivanov , Larry Hastings Status: Final Type: Standards Track Content-Type: text/x-rst diff --git a/pep-0462.txt b/pep-0462.txt index 1ad00359c..61d6c77e0 100644 --- a/pep-0462.txt +++ b/pep-0462.txt @@ -3,7 +3,7 @@ Title: Core development workflow automation for CPython Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan -Status: Deferred +Status: Withdrawn Type: Process Content-Type: text/x-rst Requires: 474 @@ -23,11 +23,15 @@ experience for other contributors that are reliant on the core team to get their changes incorporated. -PEP Deferral -============ +PEP Withdrawal +============== -This PEP is currently deferred pending acceptance or rejection of the -Kallithea-based forge.python.org proposal in PEP 474. +This PEP has been `withdrawn by the author +`_ +in favour of the GitLab based proposal in PEP 507. + +If anyone else would like to take over championing this PEP, contact the +`core-workflow mailing list `_ Rationale for changes to the core development workflow diff --git a/pep-0474.txt b/pep-0474.txt index 38af089d5..b98e1bb9a 100644 --- a/pep-0474.txt +++ b/pep-0474.txt @@ -3,7 +3,7 @@ Title: Creating forge.python.org Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan -Status: Draft +Status: Withdrawn Type: Process Content-Type: text/x-rst Created: 19-Jul-2014 @@ -23,6 +23,17 @@ This PEP does *not* propose any changes to the core development workflow for CPython itself (see PEP 462 in relation to that). 
+PEP Withdrawal +============== + +This PEP has been `withdrawn by the author +`_ +in favour of the GitLab based proposal in PEP 507. + +If anyone else would like to take over championing this PEP, contact the +`core-workflow mailing list `_ + + Proposal ======== diff --git a/pep-0476.txt b/pep-0476.txt index 89f2b0e2b..ffcf6958d 100644 --- a/pep-0476.txt +++ b/pep-0476.txt @@ -7,6 +7,7 @@ Status: Final Type: Standards Track Content-Type: text/x-rst Created: 28-August-2014 +Resolution: https://mail.python.org/pipermail/python-dev/2014-October/136676.html Abstract ======== diff --git a/pep-0478.txt b/pep-0478.txt index d4f72e9eb..b6f8bf428 100644 --- a/pep-0478.txt +++ b/pep-0478.txt @@ -66,7 +66,7 @@ Features for 3.5 * PEP 479, change StopIteration handling inside generators * PEP 484, the typing module, a new standard for type annotations * PEP 485, math.isclose(), a function for testing approximate equality -* PEP 486, making the Widnows Python launcher aware of virtual environments +* PEP 486, making the Windows Python launcher aware of virtual environments * PEP 488, eliminating .pyo files * PEP 489, a new and improved mechanism for loading extension modules * PEP 492, coroutines with async and await syntax diff --git a/pep-0492.txt b/pep-0492.txt index 168a258ae..2d5c98804 100644 --- a/pep-0492.txt +++ b/pep-0492.txt @@ -2,7 +2,7 @@ PEP: 492 Title: Coroutines with async and await syntax Version: $Revision$ Last-Modified: $Date$ -Author: Yury Selivanov +Author: Yury Selivanov Discussions-To: Status: Final Type: Standards Track @@ -125,7 +125,7 @@ Key properties of *coroutines*: * Internally, two new code object flags were introduced: - ``CO_COROUTINE`` is used to mark *native coroutines* - (defined with new syntax.) + (defined with new syntax). 
- ``CO_ITERABLE_COROUTINE`` is used to make *generator-based coroutines* compatible with *native coroutines* (set by @@ -139,7 +139,7 @@ Key properties of *coroutines*: such behavior requires a future import (see PEP 479). * When a *coroutine* is garbage collected, a ``RuntimeWarning`` is - raised if it was never awaited on (see also `Debugging Features`_.) + raised if it was never awaited on (see also `Debugging Features`_). * See also `Coroutine objects`_ section. @@ -199,7 +199,7 @@ can be one of: internally, coroutines are a special kind of generators, every ``await`` is suspended by a ``yield`` somewhere down the chain of ``await`` calls (please refer to PEP 3156 for a detailed - explanation.) + explanation). To enable this behavior for coroutines, a new magic method called ``__await__`` is added. In asyncio, for instance, to enable *Future* @@ -222,7 +222,7 @@ can be one of: It is a ``SyntaxError`` to use ``await`` outside of an ``async def`` function (like it is a ``SyntaxError`` to use ``yield`` outside of -``def`` function.) +``def`` function). It is a ``TypeError`` to pass anything other than an *awaitable* object to an ``await`` expression. @@ -918,7 +918,7 @@ There is no use of ``await`` names in CPython. ``async`` is mostly used by asyncio. We are addressing this by renaming ``async()`` function to ``ensure_future()`` (see `asyncio`_ -section for details.) +section for details). Another use of ``async`` keyword is in ``Lib/xml/dom/xmlbuilder.py``, to define an ``async = False`` attribute for ``DocumentLS`` class. @@ -970,7 +970,7 @@ PEP 3152 by Gregory Ewing proposes a different mechanism for coroutines 2. A new keyword ``cocall`` to call a *cofunction*. Can only be used inside a *cofunction*. Maps to ``await`` in this proposal (with - some differences, see below.) + some differences, see below). 3. It is not possible to call a *cofunction* without a ``cocall`` keyword. 
diff --git a/pep-0494.txt b/pep-0494.txt index 8160a23c9..203cb0584 100644 --- a/pep-0494.txt +++ b/pep-0494.txt @@ -19,7 +19,7 @@ items. .. Small features may be added up to the first beta release. Bugs may be fixed until the final release, - which is planned for September 2015. + which is planned for December 2016. Release Manager and Crew @@ -31,17 +31,37 @@ Release Manager and Crew - Documentation: Georg Brandl +3.6 Lifespan +============ + +3.6 will receive bugfix updates approximately every 3-6 months for +approximately 18 months. After the release of 3.7.0 final, a final +3.6 bugfix update will be released. After that, it is expected that +security updates (source only) will be released until 5 years after +the release of 3.6 final, so until approximately December 2021. + + Release Schedule ================ -The releases: +3.6.0 schedule +-------------- -- 3.6.0 alpha 1: TBD -- 3.6.0 beta 1: TBD -- 3.6.0 candidate 1: TBD -- 3.6.0 final: TBD (late 2016?) +- 3.6 development begins: 2015-05-24 +- 3.6.0 alpha 1: 2016-05-15 +- 3.6.0 alpha 2: 2016-06-12 +- 3.6.0 alpha 3: 2016-07-10 +- 3.6.0 alpha 4: 2016-08-07 +- 3.6.0 beta 1: 2016-09-07 -(Beta 1 is also "feature freeze"--no new features beyond this point.) +(No new features beyond this point.) 
+ +- 3.6.0 beta 2: 2016-10-02 +- 3.6.0 beta 3: 2016-10-30 +- 3.6.0 beta 4: 2016-11-20 +- 3.6.0 candidate 1: 2016-12-04 +- 3.6.0 candidate 2 (if needed): 2016-12-11 +- 3.6.0 final: 2016-12-16 Features for 3.6 diff --git a/pep-0495-fold-2.png b/pep-0495-fold-2.png new file mode 100644 index 000000000..d09eb41f7 Binary files /dev/null and b/pep-0495-fold-2.png differ diff --git a/pep-0495-fold.png b/pep-0495-fold.png deleted file mode 100644 index d9fe8b6ee..000000000 Binary files a/pep-0495-fold.png and /dev/null differ diff --git a/pep-0495-gap.png b/pep-0495-gap.png new file mode 100644 index 000000000..e3ba3cb77 Binary files /dev/null and b/pep-0495-gap.png differ diff --git a/pep-0495-gap.svg b/pep-0495-gap.svg new file mode 100644 index 000000000..658d9df55 --- /dev/null +++ b/pep-0495-gap.svg @@ -0,0 +1,437 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + UTC local + + + + t + u0 + u1 + + + + Fold + + + + + + + + + + + diff --git a/pep-0495.txt b/pep-0495.txt index 6303f4d97..61becbd74 100644 --- a/pep-0495.txt +++ b/pep-0495.txt @@ -4,34 +4,29 @@ Version: $Revision$ Last-Modified: $Date$ Author: Alexander Belopolsky , Tim Peters Discussions-To: Datetime-SIG -Status: Draft +Status: Accepted Type: Standards Track Content-Type: text/x-rst Created: 02-Aug-2015 - +Python-Version: 3.6 +Resolution: https://mail.python.org/pipermail/datetime-sig/2015-September/000900.html Abstract ======== -This PEP adds a new attribute ``fold`` to the instances of +This PEP adds a new attribute ``fold`` to instances of the ``datetime.time`` and ``datetime.datetime`` classes that can be used to differentiate between two moments in time for which local times are -the same. The allowed values for the `fold` attribute will be 0 and 1 +the same. 
The allowed values for the ``fold`` attribute will be 0 and 1 with 0 corresponding to the earlier and 1 to the later of the two possible readings of an ambiguous local time. -.. sidebar:: US public service advertisement - - .. image:: pep-0495-daylightsavings.png - :align: center - :width: 95% - Rationale ========= -In the most world locations there have been and will be times when +In most world locations, there have been and will be times when local clocks are moved back. [#]_ In those times, intervals are introduced in which local clocks show the same time twice in the same day. In these situations, the information displayed on a local clock @@ -40,9 +35,14 @@ a particular moment in time. The proposed solution is to add an attribute to the ``datetime`` instances taking values of 0 and 1 that will enumerate the two ambiguous times. +.. image:: pep-0495-daylightsavings.png + :align: center + :width: 30% + + .. [#] People who live in locations observing the Daylight Saving Time (DST) move their clocks back (usually one hour) every Fall. - + It is less common, but occasionally clocks can be moved back for other reasons. For example, Ukraine skipped the spring-forward transition in March 1990 and instead, moved their clocks back on @@ -76,11 +76,11 @@ Proposal The "fold" attribute -------------------- -We propose adding an attribute called ``fold`` to the instances -of ``datetime.time`` and ``datetime.datetime`` classes. This attribute -should have the value 0 for all instances except those that -represent the second (chronologically) moment in time in an ambiguous -case. For those instances, the value will be 1. [#]_ +We propose adding an attribute called ``fold`` to instances of the +``datetime.time`` and ``datetime.datetime`` classes. This attribute +should have the value 0 for all instances except those that represent +the second (chronologically) moment in time in an ambiguous case. For +those instances, the value will be 1. [#]_ .. 
[#] An instance that has ``fold=1`` in a non-ambiguous case is said to represent an invalid time (or is invalid for short), but @@ -93,10 +93,6 @@ case. For those instances, the value will be 1. [#]_ this PEP specifies how various functions should behave when given an invalid instance. -.. image:: pep-0495-fold.png - :align: center - :width: 60% - Affected APIs ------------- @@ -121,15 +117,23 @@ Methods The ``replace()`` methods of the ``datetime.time`` and ``datetime.datetime`` classes will get a new keyword-only argument -called ``fold``. It will -behave similarly to the other ``replace()`` arguments: if the ``fold`` -argument is specified and given a value 0 or 1, the new instance -returned by ``replace()`` will have its ``fold`` attribute set -to that value. In CPython, any non-integer value of ``fold`` will -raise a ``TypeError``, but other implementations may allow the value -``None`` to behave the same as when ``fold`` is not given. If the -``fold`` argument is not specified, the original value of the ``fold`` -attribute is copied to the result. +called ``fold``. It will behave similarly to the other ``replace()`` +arguments: if the ``fold`` argument is specified and given a value 0 +or 1, the new instance returned by ``replace()`` will have its +``fold`` attribute set to that value. In CPython, any non-integer +value of ``fold`` will raise a ``TypeError``, but other +implementations may allow the value ``None`` to behave the same as +when ``fold`` is not given. [#]_ (This is +a nod to the existing difference in treatment of ``None`` arguments +in other positions of this method across Python implementations; +it is not intended to leave the door open for future alternative +interpretation of ``fold=None``.) If the ``fold`` argument is not +specified, the original value of the ``fold`` attribute is copied to +the result. + +.. 
[#] PyPy and pure Python implementation distributed with CPython + already allow ``None`` to mean "no change to existing + attribute" for all other attributes in ``replace()``. C-API ..... @@ -137,14 +141,14 @@ C-API Access macros will be defined to extract the value of ``fold`` from ``PyDateTime_DateTime`` and ``PyDateTime_Time`` objects. -.. code:: +.. code:: int PyDateTime_GET_FOLD(PyDateTime_DateTime *o) Return the value of ``fold`` as a C ``int``. -.. code:: - +.. code:: + int PyDateTime_TIME_GET_FOLD(PyDateTime_Time *o) Return the value of ``fold`` as a C ``int``. @@ -155,14 +159,17 @@ instance: .. code:: - PyObject* PyDateTime_FromDateAndTimeAndFold(int year, int month, int day, int hour, int minute, int second, int usecond, int fold) + PyObject* PyDateTime_FromDateAndTimeAndFold( + int year, int month, int day, int hour, int minute, + int second, int usecond, int fold) Return a ``datetime.datetime`` object with the specified year, month, day, hour, minute, second, microsecond and fold. .. code:: - PyObject* PyTime_FromTimeAndFold(int hour, int minute, int second, int usecond, int fold) + PyObject* PyTime_FromTimeAndFold( + int hour, int minute, int second, int usecond, int fold) Return a ``datetime.time`` object with the specified hour, minute, second, microsecond and fold. @@ -174,18 +181,23 @@ Affected Behaviors What time is it? ................ -The ``datetime.now()`` method called with no arguments, will set +The ``datetime.now()`` method called without arguments will set ``fold=1`` when returning the second of the two ambiguous times in a system local time fold. When called with a ``tzinfo`` argument, the value of the ``fold`` will be determined by the ``tzinfo.fromutc()`` -implementation. If an instance of the ``datetime.timezone`` class -(*e.g.* ``datetime.timezone.utc``) is passed as ``tzinfo``, the +implementation. 
When an instance of the ``datetime.timezone`` class +(the stdlib's fixed-offset ``tzinfo`` subclass, +*e.g.* ``datetime.timezone.utc``) is passed as ``tzinfo``, the returned datetime instance will always have ``fold=0``. +The ``datetime.utcnow()`` method is unaffected. Conversion from naive to aware .............................. +A new feature is proposed to facilitate conversion from naive datetime +instances to aware. + The ``astimezone()`` method will now work for naive ``self``. The system local timezone will be assumed in this case and the ``fold`` flag will be used to determine which local timezone is in effect @@ -199,6 +211,11 @@ For example, on a system set to US/Eastern timezone:: >>> dt.replace(fold=1).astimezone().strftime('%D %T %Z%z') '11/02/14 01:30:00 EST-0500' +An implication is that ``datetime.now(tz)`` is fully equivalent to +``datetime.now().astimezone(tz)`` (assuming ``tz`` is an instance of a +post-PEP ``tzinfo`` implementation, i.e. one that correctly handles +and sets ``fold``). + Conversion from POSIX seconds from EPOCH ........................................ @@ -227,8 +244,10 @@ time, there are two values ``s0`` and ``s1`` such that:: datetime.fromtimestamp(s0) == datetime.fromtimestamp(s1) == dt +(This is because ``==`` disregards the value of fold -- see below.) + In this case, ``dt.timestamp()`` will return the smaller of ``s0`` -and ``s1`` values if ``dt.fold == True`` and the larger otherwise. +and ``s1`` values if ``dt.fold == 0`` and the larger otherwise. For example, on a system set to US/Eastern timezone:: @@ -238,7 +257,6 @@ For example, on a system set to US/Eastern timezone:: >>> datetime(2014, 11, 2, 1, 30, fold=1).timestamp() 1414909800.0 - When a ``datetime.datetime`` instance ``dt`` represents a missing time, there is no value ``s`` for which:: @@ -254,6 +272,8 @@ is always the same as the offset right after the gap. 
The value returned by ``dt.timestamp()`` given a missing ``dt`` will be the greater of the two "nice to know" values if ``dt.fold == 0`` and the smaller otherwise. +(This is not a typo -- it's intentionally backwards from the rule for +ambiguous times.) For example, on a system set to US/Eastern timezone:: @@ -270,13 +290,14 @@ Users of pre-PEP implementations of ``tzinfo`` will not see any changes in the behavior of their aware datetime instances. Two such instances that differ only by the value of the ``fold`` attribute will not be distinguishable by any means other than an explicit access to -the ``fold`` value. +the ``fold`` value. (This is because these pre-PEP implementations +are not using the ``fold`` attribute.) -On the other hand, if object's ``tzinfo`` is set to a fold-aware -implementation, then the value of ``fold`` will affect the result of -several methods but only if the corresponding time is in a fold or in -a gap: ``utcoffset()``, ``dst()``, ``tzname()``, ``astimezone()``, -``strftime()`` (if "%Z" or "%z" directive is used in the format +On the other hand, if an object's ``tzinfo`` is set to a fold-aware +implementation, then in a fold or gap the value of ``fold`` will +affect the result of several methods: +``utcoffset()``, ``dst()``, ``tzname()``, ``astimezone()``, +``strftime()`` (if the "%Z" or "%z" directive is used in the format specification), ``isoformat()``, and ``timetuple()``. @@ -293,16 +314,21 @@ The ``datetime.datetime.time()`` method will copy the value of the Pickles ....... +The value of the fold attribute will only be saved in pickles created +with protocol version 4 (introduced in Python 3.4) or greater. + Pickle sizes for the ``datetime.datetime`` and ``datetime.time`` objects will not change. The ``fold`` value will be encoded in the -first bit of the 5th byte of the ``datetime.datetime`` pickle payload -or the 2nd byte of the datetime.time. 
In the `current implementation`_ -these bytes are used to store minute value (0-59) and the first bit is -always 0. (This change only affects pickle format. In the C -implementation, the ``fold`` attribute will get a full byte to store its -value.) +first bit of the 3rd byte of the ``datetime.datetime`` +pickle payload; and in the first bit of the 1st byte of the +``datetime.time`` payload. In the `current implementation`_ +these bytes are used to store the month (1-12) and hour (0-23) values +and the first bit is always 0. We picked these bytes because they are +the only bytes that are checked by the current unpickle code. Thus +loading post-PEP ``fold=1`` pickles in a pre-PEP Python will result in +an exception rather than an instance with out of range components. -.. _current implementation: https://hg.python.org/cpython/file/d3b20bff9c5d/Include/datetime.h#l17 +.. _current implementation: https://hg.python.org/cpython/file/v3.5.0/Include/datetime.h#l10 Implementations of tzinfo in the Standard Library @@ -312,13 +338,16 @@ No new implementations of ``datetime.tzinfo`` abstract class are proposed in this PEP. The existing (fixed offset) timezones do not introduce ambiguous local times and their ``utcoffset()`` implementation will return the same constant value as they do now -regardless of the value of ``fold``. +regardless of the value of ``fold``. The basic implementation of ``fromutc()`` in the abstract -``datetime.tzinfo`` class will not change. It is currently not -used anywhere in the stdlib because the only included ``tzinfo`` -implementation (the ``datetime.timzeone`` class implementing fixed -offset timezones) override ``fromutc()``. +``datetime.tzinfo`` class will not change. It is currently not used +anywhere in the stdlib because the only included ``tzinfo`` +implementation (the ``datetime.timezone`` class implementing fixed +offset timezones) overrides ``fromutc()``. 
Keeping the default +implementation unchanged has the benefit that pre-PEP 3rd party +implementations that inherit the default ``fromutc()`` are not +accidentally affected. Guidelines for New tzinfo Implementations @@ -337,16 +366,102 @@ methods should ignore the value of ``fold`` unless they are called on the ambiguous or missing times. -In the DST Fold ---------------- +In the Fold +----------- New subclasses should override the base-class ``fromutc()`` method and -implement it so that in all cases where two UTC times ``u1`` and -``u2`` (``u1`` <``u2``) correspond to the same local time -``fromutc(u1)`` will return an instance with ``fold=0`` and -``fromutc(u2)`` will return an instance with ``fold=1``. In all +implement it so that in all cases where two different UTC times ``u0`` and +``u1`` (``u0`` <``u1``) correspond to the same local time ``t``, +``fromutc(u0)`` will return an instance with ``fold=0`` and +``fromutc(u1)`` will return an instance with ``fold=1``. In all other cases the returned instance should have ``fold=0``. +The ``utcoffset()``, ``tzname()`` and ``dst()`` methods should use the +value of the fold attribute to determine whether an otherwise +ambiguous time ``t`` corresponds to the time before or after the +transition. By definition, ``utcoffset()`` is greater before and +smaller after any transition that creates a fold. The values returned +by ``tzname()`` and ``dst()`` may or may not depend on the value of +the ``fold`` attribute depending on the kind of the transition. + +.. image:: pep-0495-fold-2.png + :align: center + :width: 60% + +The sketch above illustrates the relationship between the UTC and +local time around a fall-back transition. The zig-zag line is a graph +of the function implemented by ``fromutc()``. Two intervals on the +UTC axis adjacent to the transition point and having the size of the +time shift at the transition are mapped to the same interval on the +local axis. 
New implementations of ``fromutc()`` method should set +the fold attribute to 1 when ``self`` is in the region marked in +yellow on the UTC axis. (All intervals should be treated as closed on +the left and open on the right.) + + +Mind the Gap +------------ + +The ``fromutc()`` method should never produce a time in the gap. + +If the ``utcoffset()``, ``tzname()`` or ``dst()`` method is called on a +local time that falls in a gap, the rules in effect before the +transition should be used if ``fold=0``. Otherwise, the rules in +effect after the transition should be used. + +.. image:: pep-0495-gap.png + :align: center + :width: 60% + +The sketch above illustrates the relationship between the UTC and +local time around a spring-forward transition. At the transition, the +local clock is advanced skipping the times in the gap. For the +purposes of determining the values of ``utcoffset()``, ``tzname()`` +and ``dst()``, the line before the transition is extended forward to +find the UTC time corresponding to the time in the gap with ``fold=0`` +and for instances with ``fold=1``, the line after the transition is +extended back. + +Summary of Rules at a Transition +-------------------------------- + +On ambiguous/missing times ``utcoffset()`` should return values +according to the following table: + ++-----------------+----------------+-----------------------------+ +| | fold=0 | fold=1 | ++=================+================+=============================+ +| Fold | oldoff | newoff = oldoff - delta | ++-----------------+----------------+-----------------------------+ +| Gap | oldoff | newoff = oldoff + delta | ++-----------------+----------------+-----------------------------+ + +where ``oldoff`` (``newoff``) is the UTC offset before (after) the +transition and ``delta`` is the absolute size of the fold or the gap. + +Note that the interpretation of the fold attribute is consistent in +the fold and gap cases. 
In both cases, ``fold=0`` (``fold=1``) means +use ``fromutc()`` line before (after) the transition to find the UTC +time. Only in the "Fold" case, the UTC times ``u0`` and ``u1`` are +"real" solutions for the equation ``fromutc(u) == t``, while in the +"Gap" case they are "imaginary" solutions. + + +The DST Transitions +------------------- + +On a missing time introduced at the start of DST, the values returned +by ``utcoffset()`` and ``dst()`` methods should be as follows + ++-----------------+----------------+------------------+ +| | fold=0 | fold=1 | ++=================+================+==================+ +| utcoffset() | stdoff | stdoff + dstoff | ++-----------------+----------------+------------------+ +| dst() | zero | dstoff | ++-----------------+----------------+------------------+ + + On an ambiguous time introduced at the end of DST, the values returned by ``utcoffset()`` and ``dst()`` methods should be as follows @@ -363,61 +478,101 @@ DST correction (typically ``dstoff = timedelta(hours=1)``) and ``zero = timedelta(0)``. -Mind the DST Gap ----------------- +Temporal Arithmetic and Comparison Operators +============================================ -On a missing time introduced at the start of DST, the values returned -by ``utcoffset()`` and ``dst()`` methods should be as follows +.. epigraph:: -+-----------------+----------------+------------------+ -| | fold=0 | fold=1 | -+=================+================+==================+ -| utcoffset() | stdoff | stdoff + dstoff | -+-----------------+----------------+------------------+ -| dst() | zero | dstoff | -+-----------------+----------------+------------------+ + | In *mathematicks* he was greater + | Than Tycho Brahe, or Erra Pater: + | For he, by geometric scale, + | Could take the size of pots of ale; + | Resolve, by sines and tangents straight, + | If bread or butter wanted weight, + | And wisely tell what hour o' th' day + | The clock does strike by algebra. 
+ -- "Hudibras" by Samuel Butler -Non-DST Folds and Gaps ----------------------- - -On ambiguous/missing times introduced by the change in the standard time -offset, the ``dst()`` method should return the same value regardless of -the value of ``fold`` and the ``utcoffset()`` should return values -according to the following table: - -+-----------------+----------------+-----------------------------+ -| | fold=0 | fold=1 | -+=================+================+=============================+ -| ambiguous | oldoff | newoff = oldoff - delta | -+-----------------+----------------+-----------------------------+ -| missing | oldoff | newoff = oldoff + delta | -+-----------------+----------------+-----------------------------+ - -where ``delta`` is the size of the fold or the gap. - - -Temporal Arithmetic -=================== - -The value of "fold" will be ignored in all operations except those -that involve conversion between timezones. [#]_ As a consequence, +The value of the ``fold`` attribute will be ignored in all operations +with naive datetime instances. As a consequence, naive ``datetime.datetime`` or ``datetime.time`` instances that differ only by the value of ``fold`` will compare as equal. Applications that need to differentiate between such instances should check the value of -``fold`` or convert them to a timezone that does not have ambiguous -times. +``fold`` explicitly or convert those instances to a timezone that does +not have ambiguous times (such as UTC). -The result of addition (subtraction) of a timedelta to (from) a -datetime will always have ``fold`` set to 0 even if the +The value of ``fold`` will also be ignored whenever a timedelta is +added to or subtracted from a datetime instance which may be either +aware or naive. The result of addition (subtraction) of a timedelta +to (from) a datetime will always have ``fold`` set to 0 even if the original datetime instance had ``fold=1``. -.. 
[#] Computing a difference between two aware datetime instances - with different values of ``tzinfo`` involves an implicit timezone - conversion. In this case, the result may depend on the value of - the ``fold`` attribute in either of the instances, but only if the - instance has ``tzinfo`` that accounts for the value of ``fold`` - in its ``utcoffset()`` method. +No changes are proposed to the way the difference ``t - s`` is +computed for datetime instances ``t`` and ``s``. If both instances +are naive or ``t.tzinfo`` is the same instance as ``s.tzinfo`` +(``t.tzinfo is s.tzinfo`` evaluates to ``True``) then ``t - s`` is a +timedelta ``d`` such that ``s + d == t``. As explained in the +previous paragraph, timedelta addition ignores both ``fold`` and +``tzinfo`` attributes and so does intra-zone or naive datetime +subtraction. + +Naive and intra-zone comparisons will ignore the value of ``fold`` and +return the same results as they do now. (This is the only way to +preserve backward compatibility. If you need an aware intra-zone +comparison that uses the fold, convert both sides to UTC first.) + +The inter-zone subtraction will be defined as it is now: ``t - s`` is +computed as ``(t - t.utcoffset()) - (s - +s.utcoffset()).replace(tzinfo=t.tzinfo)``, but the result will +depend on the values of ``t.fold`` and ``s.fold`` when either +``t.tzinfo`` or ``s.tzinfo`` is post-PEP. [#]_ + +.. [#] Note that the new rules may result in a paradoxical situation + when ``s == t`` but ``s - u != t - u``. Such paradoxes are + not really new and are inherent in the overloading of the minus + operator differently for intra- and inter-zone operations. For + example, one can easily construct datetime instances ``t`` and ``s`` + with some variable offset ``tzinfo`` and a datetime ``u`` with + ``tzinfo=timezone.utc`` such that ``(t - u) - (s - u) != t - s``. 
+ The explanation for this paradox is that the minuses inside the + parentheses and the two other minuses are really three different + operations: inter-zone datetime subtraction, timedelta subtraction, + and intra-zone datetime subtraction, which each have the mathematical + properties of subtraction separately, but not when combined in a + single expression. + + +Aware datetime Equality Comparison +---------------------------------- + +The aware datetime comparison operators will work the same as they do +now, with results indirectly affected by the value of ``fold`` whenever +the ``utcoffset()`` value of one of the operands depends on it, with one +exception. Whenever one or both of the operands in inter-zone comparison is +such that its ``utcoffset()`` depends on the value of its ``fold`` +fold attribute, the result is ``False``. [#]_ + +.. [#] This exception is designed to preserve the hash and equivalence + invariants in the face of paradoxes of inter-zone arithmetic. + +Formally, ``t == s`` when ``t.tzinfo is s.tzinfo`` evaluates to +``False`` can be defined as follows. Let ``toutc(t, fold)`` be a +function that takes an aware datetime instance ``t`` and returns a +naive instance representing the same time in UTC assuming a given +value of ``fold``: + +.. code:: + + def toutc(t, fold): + u = t - t.replace(fold=fold).utcoffset() + return u.replace(tzinfo=None) + +Then ``t == s`` is equivalent to + +.. code:: + + toutc(t, fold=0) == toutc(t, fold=1) == toutc(s, fold=0) == toutc(s, fold=1) Backward and Forward Compatibility @@ -467,7 +622,7 @@ A non-technical answer between fold=0 and fold=1 when I set it for tomorrow 01:30 AM. What should I do? * Alice: I've never hear of a Py-O-Clock, but I guess fold=0 is - the first 01:30 AM and fold=1 is the second. + the first 01:30 AM and fold=1 is the second. A technical reason @@ -538,13 +693,12 @@ The following alternative names have also been considered: **repeated** Did not receive any support on the mailing list. 
- + **ltdf** (Local Time Disambiguation Flag) - short and no-one will attempt - to guess what it means without reading the docs. (Feel free to - use it in discussions with the meaning ltdf=False is the - earlier if you don't want to endorse any of the alternatives - above.) + to guess what it means without reading the docs. (This abbreviation + was used in PEP discussions with the meaning ``ltdf=False`` is the + earlier by those who didn't want to endorse any of the alternatives.) .. _original: https://mail.python.org/pipermail/python-dev/2015-April/139099.html .. _independently proposed: https://mail.python.org/pipermail/datetime-sig/2015-August/000479.html @@ -585,7 +739,7 @@ such program because ``astimezone()`` does not currently work with naive datetimes. This leaves us with only one situation where an existing program can -start producing diferent results after the implementation of this PEP: +start producing different results after the implementation of this PEP: when a ``datetime.timestamp()`` method is called on a naive datetime instance that happen to be in the fold or the gap. In the current implementation, the result is undefined. Depending on the system @@ -638,13 +792,13 @@ hemisphere (where DST is in effect in June) one can get Note that 12:00 was interpreted as 13:00 by ``mktime``. With the ``datetime.timestamp``, ``datetime.fromtimestamp``, it is currently -guaranteed that +guaranteed that .. code:: >>> t = datetime.datetime(2015, 6, 1, 12).timestamp() >>> datetime.datetime.fromtimestamp(t) - datetime.datetime(2015, 6, 1, 12, 0) + datetime.datetime(2015, 6, 1, 12, 0) This PEP extends the same guarantee to both values of ``fold``: @@ -652,13 +806,13 @@ This PEP extends the same guarantee to both values of ``fold``: >>> t = datetime.datetime(2015, 6, 1, 12, fold=0).timestamp() >>> datetime.datetime.fromtimestamp(t) - datetime.datetime(2015, 6, 1, 12, 0) + datetime.datetime(2015, 6, 1, 12, 0) .. 
code:: >>> t = datetime.datetime(2015, 6, 1, 12, fold=1).timestamp() >>> datetime.datetime.fromtimestamp(t) - datetime.datetime(2015, 6, 1, 12, 0) + datetime.datetime(2015, 6, 1, 12, 0) Thus one of the suggested uses for ``fold=-1`` -- to match the legacy behavior -- is not needed. Either choice of ``fold`` will match the @@ -708,7 +862,7 @@ implement any desired behavior in a few lines of code. Implementation ============== -* Github fork: https://github.com/abalkin/cpython +* Github fork: https://github.com/abalkin/cpython/tree/issue24773-s3 * Tracker issue: http://bugs.python.org/issue24773 diff --git a/pep-0498.txt b/pep-0498.txt index 9504e8780..4dba3e02a 100644 --- a/pep-0498.txt +++ b/pep-0498.txt @@ -8,7 +8,7 @@ Type: Standards Track Content-Type: text/x-rst Created: 01-Aug-2015 Python-Version: 3.6 -Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015 +Post-History: 07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015 Resolution: https://mail.python.org/pipermail/python-dev/2015-September/141526.html Abstract @@ -173,8 +173,7 @@ In source code, f-strings are string literals that are prefixed by the letter 'f' or 'F'. Everywhere this PEP uses 'f', 'F' may also be used. 'f' may be combined with 'r', in either order, to produce raw f-string literals. 'f' may not be combined with 'b': this PEP does not -propose to add binary f-strings. 'f' may also be combined with 'u', in -either order, although adding 'u' has no effect. +propose to add binary f-strings. 'f' may not be combined with 'u'. When tokenizing source files, f-strings use the same rules as normal strings, raw strings, binary strings, and triple quoted strings. That @@ -198,9 +197,14 @@ expressions. Expressions appear within curly braces ``'{'`` and expressions are evaluated, formatted with the existing __format__ protocol, then the results are concatenated together with the string literals. 
While scanning the string for expressions, any doubled -braces ``'{{'`` or ``'}}'`` are replaced by the corresponding single -brace. Doubled opening braces do not signify the start of an -expression. +braces ``'{{'`` or ``'}}'`` inside literal portions of an f-string are +replaced by the corresponding single brace. Doubled opening braces do +not signify the start of an expression. + +Note that ``__format__()`` is not called directly on each value. The +actual code uses the equivalent of ``type(value).__format__(value, +format_spec)``, or ``format(value, format_spec)``. See the +documentation of the builtin ``format()`` function for more details. Comments, using the ``'#'`` character, are not allowed inside an expression. @@ -210,7 +214,7 @@ specified. The allowed conversions are ``'!s'``, ``'!r'``, or ``'!a'``. These are treated the same as in ``str.format()``: ``'!s'`` calls ``str()`` on the expression, ``'!r'`` calls ``repr()`` on the expression, and ``'!a'`` calls ``ascii()`` on the expression. These -conversions are applied before the call to ``__format__``. The only +conversions are applied before the call to ``format()``. The only reason to use ``'!s'`` is if you want to specify a format specifier that applies to ``str``, not to the type of the expression. @@ -221,11 +225,11 @@ not provided, an empty string is used. So, an f-string looks like:: - f ' { } text ... ' + f ' { } ... ' -The resulting expression's ``__format__`` method is called with the -format specifier. The resulting value is used when building the value -of the f-string. +The expression is then formatted using the ``__format__`` protocol, +using the format specifier as an argument. The resulting value is +used when building the value of the f-string. Expressions cannot contain ``':'`` or ``'!'`` outside of strings or parentheses, brackets, or braces. The exception is that the ``'!='`` @@ -290,11 +294,11 @@ mechanism that ``str.format()`` uses to convert values to strings. 
For example, this code:: - f'abc{expr1:spec1}{expr2!r:spec2}def{expr3:!s}ghi' + f'abc{expr1:spec1}{expr2!r:spec2}def{expr3}ghi' Might be be evaluated as:: - 'abc' + expr1.__format__(spec1) + repr(expr2).__format__(spec2) + 'def' + str(spec3).__format__('') + 'ghi' + 'abc' + format(expr1, spec1) + format(repr(expr2), spec2) + 'def' + format(expr3) + 'ghi' Expression evaluation --------------------- @@ -372,7 +376,15 @@ yields the value:: While the exact method of this run time concatenation is unspecified, the above code might evaluate to:: - 'ab' + x.__format__('') + '{c}' + 'str<' + y.__format__('^4') + 'de' + 'ab' + format(x) + '{c}' + 'str<' + format(y, '^4') + '>de' + +Each f-string is entirely evaluated before being concatenated to +adjacent f-strings. That means that this:: + + >>> f'{x' f'}' + +Is a syntax error, because the first f-string does not contain a +closing brace. Error handling -------------- @@ -386,15 +398,13 @@ Unmatched braces:: >>> f'x={x' File "", line 1 - SyntaxError: missing '}' in format string expression + SyntaxError: f-string: expecting '}' Invalid expressions:: >>> f'x={!x}' - File "", line 1 - !x - ^ - SyntaxError: invalid syntax + File "", line 1 + SyntaxError: f-string: empty expression not allowed Run time errors occur when evaluating the expressions inside an f-string. Note that an f-string can be evaluated multiple times, and @@ -425,7 +435,8 @@ Leading and trailing whitespace in expressions is ignored --------------------------------------------------------- For ease of readability, leading and trailing whitespace in -expressions is ignored. +expressions is ignored. This is a by-product of enclosing the +expression in parentheses before evaluation. Evaluation order of expressions ------------------------------- @@ -577,8 +588,8 @@ Triple-quoted f-strings Triple quoted f-strings are allowed. These strings are parsed just as normal triple-quoted strings are. 
After parsing and decoding, the -normal f-string logic is applied, and ``__format__()`` on each value -is called. +normal f-string logic is applied, and ``__format__()`` is called on +each value. Raw f-strings ------------- @@ -653,6 +664,14 @@ If you feel you must use lambdas, they may be used inside of parentheses:: >>> f'{(lambda x: x*2)(3)}' '6' +Can't combine with 'u' +-------------------------- + +The 'u' prefix was added to Python 3.3 in PEP 414 as a means to ease +source compatibility with Python 2.7. Because Python 2.7 will never +support f-strings, there is nothing to be gained by being able to +combine the 'f' prefix with 'u'. + Examples from Python's source code ================================== diff --git a/pep-0500.txt b/pep-0500.txt index c7b0049d7..a1c64eb40 100644 --- a/pep-0500.txt +++ b/pep-0500.txt @@ -5,12 +5,12 @@ Version: $Revision$ Last-Modified: $Date$ Author: Alexander Belopolsky , Tim Peters Discussions-To: Datetime-SIG -Status: Draft +Status: Rejected Type: Standards Track Content-Type: text/x-rst Requires: 495 Created: 08-Aug-2015 - +Resolution: https://mail.python.org/pipermail/datetime-sig/2015-August/000354.html Abstract ======== diff --git a/pep-0502.txt b/pep-0502.txt index a51b7eba6..dbb1db34c 100644 --- a/pep-0502.txt +++ b/pep-0502.txt @@ -1,44 +1,46 @@ PEP: 502 -Title: String Interpolation Redux +Title: String Interpolation - Extended Discussion Version: $Revision$ Last-Modified: $Date$ Author: Mike G. Miller Status: Draft -Type: Standards Track +Type: Informational Content-Type: text/x-rst Created: 10-Aug-2015 Python-Version: 3.6 -Note: Open issues below are stated with a question mark (?), -and are therefore searchable. - Abstract ======== -This proposal describes a new string interpolation feature for Python, -called an *expression-string*, -that is both concise and powerful, -improves readability in most cases, -yet does not conflict with existing code. 
+PEP 498: *Literal String Interpolation*, which proposed "formatted strings" was +accepted September 9th, 2015. +Additional background and rationale given during its design phase is detailed +below. + +To recap that PEP, +a string prefix was introduced that marks the string as a template to be +rendered. +These formatted strings may contain one or more expressions +built on `the existing syntax`_ of ``str.format()``. +The formatted string expands at compile-time into a conventional string format +operation, +with the given expressions from its text extracted and passed instead as +positional arguments. -To achieve this end, -a new string prefix is introduced, -which expands at compile-time into an equivalent expression-string object, -with requested variables from its context passed as keyword arguments. At runtime, -the new object uses these passed values to render a string to given -specifications, building on `the existing syntax`_ of ``str.format()``:: +the resulting expressions are evaluated to render a string to given +specifications:: >>> location = 'World' - >>> e'Hello, {location} !' # new prefix: e'' - 'Hello, World !' # interpolated result + >>> f'Hello, {location} !' # new prefix: f'' + 'Hello, World !' # interpolated result + +Format-strings may be thought of as merely syntactic sugar to simplify traditional +calls to ``str.format()``. .. _the existing syntax: https://docs.python.org/3/library/string.html#format-string-syntax -This PEP does not recommend to remove or deprecate any of the existing string -formatting mechanisms. - Motivation ========== @@ -50,12 +52,16 @@ In comparison to other dynamic scripting languages with similar use cases, the amount of code necessary to build similar strings is substantially higher, while at times offering lower readability due to verbosity, dense syntax, -or identifier duplication. [1]_ +or identifier duplication. 
+ +These difficulties are described at moderate length in the original +`post to python-ideas`_ +that started the snowball (that became PEP 498) rolling. [1]_ Furthermore, replacement of the print statement with the more consistent print function of Python 3 (PEP 3105) has added one additional minor burden, an additional set of parentheses to type and read. -Combined with the verbosity of current formatting solutions, +Combined with the verbosity of current string formatting solutions, this puts an otherwise simple language at an unfortunate disadvantage to its peers:: @@ -66,7 +72,7 @@ peers:: # Python 3, str.format with named parameters print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals())) - # Python 3, variation B, worst case + # Python 3, worst case print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user, id=id, hostname= @@ -74,7 +80,7 @@ peers:: In Python, the formatting and printing of a string with multiple variables in a single line of code of standard width is noticeably harder and more verbose, -indentation often exacerbating the issue. +with indentation exacerbating the issue. For use cases such as smaller projects, systems programming, shell script replacements, and even one-liners, @@ -82,36 +88,17 @@ where message formatting complexity has yet to be encapsulated, this verbosity has likely lead a significant number of developers and administrators to choose other languages over the years. +.. _post to python-ideas: https://mail.python.org/pipermail/python-ideas/2015-July/034659.html + Rationale ========= -Naming ------- - -The term expression-string was chosen because other applicable terms, -such as format-string and template are already well used in the Python standard -library. - -The string prefix itself, ``e''`` was chosen to demonstrate that the -specification enables expressions, -is not limited to ``str.format()`` syntax, -and also does not lend itself to `the shorthand term`_ "f-string". 
-It is also slightly easier to type than other choices such as ``_''`` and -``i''``, -while perhaps `less odd-looking`_ to C-developers. -``printf('')`` vs. ``print(f'')``. - -.. _the shorthand term: reference_needed -.. _less odd-looking: https://mail.python.org/pipermail/python-dev/2015-August/141147.html - - - Goals ------------- -The design goals of expression-strings are as follows: +The design goals of format strings are as follows: #. Eliminate need to pass variables manually. #. Eliminate repetition of identifiers and redundant parentheses. @@ -133,40 +120,44 @@ Python specified both single (``'``) and double (``"``) ASCII quote characters to enclose strings. It is not reasonable to choose one of them now to enable interpolation, while leaving the other for uninterpolated strings. -"Backtick" characters (`````) are also `constrained by history`_ as a shortcut -for ``repr()``. +Other characters, +such as the "Backtick" (or grave accent `````) are also +`constrained by history`_ +as a shortcut for ``repr()``. This leaves a few remaining options for the design of such a feature: * An operator, as in printf-style string formatting via ``%``. * A class, such as ``string.Template()``. -* A function, such as ``str.format()``. -* New syntax +* A method or function, such as ``str.format()``. +* New syntax, or * A new string prefix marker, such as the well-known ``r''`` or ``u''``. -The first three options above currently work well. +The first three options above are mature. Each has specific use cases and drawbacks, yet also suffer from the verbosity and visual noise mentioned previously. -All are discussed in the next section. +All options are discussed in the next sections. .. 
_constrained by history: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html + Background ------------- -This proposal builds on several existing techniques and proposals and what +Formatted strings build on several existing techniques and proposals and what we've collectively learned from them. +In keeping with the design goals of readability and error-prevention, +the following examples therefore use named, +not positional arguments. -The following examples focus on the design goals of readability and -error-prevention using named parameters. Let's assume we have the following dictionary, and would like to print out its items as an informative string for end users:: >>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'} -Printf-style formatting -''''''''''''''''''''''' +Printf-style formatting, via operator +''''''''''''''''''''''''''''''''''''' This `venerable technique`_ continues to have its uses, such as with byte-based protocols, @@ -178,7 +169,7 @@ and familiarity to many programmers:: In this form, considering the prerequisite dictionary creation, the technique is verbose, a tad noisy, -and relatively readable. +yet relatively readable. Additional issues are that an operator can only take one argument besides the original string, meaning multiple parameters must be passed in a tuple or dictionary. @@ -190,8 +181,8 @@ or forget the trailing type, e.g. (``s`` or ``d``). .. _venerable technique: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting -string.Template -''''''''''''''' +string.Template Class +''''''''''''''''''''' The ``string.Template`` `class from`_ PEP 292 (Simpler String Substitutions) @@ -202,7 +193,7 @@ that finds its main use cases in shell and internationalization tools:: Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params) -Also verbose, however the string itself is readable. +While also verbose, the string itself is readable. 
Though functionality is limited, it meets its requirements well. It isn't powerful enough for many cases, @@ -232,8 +223,8 @@ and likely contributed to the PEP's lack of acceptance. It was superseded by the following proposal. -str.format() -'''''''''''' +str.format() Method +''''''''''''''''''' The ``str.format()`` `syntax of`_ PEP 3101 is the most recent and modern of the existing options. @@ -253,36 +244,32 @@ string literals:: host=hostname) 'Hello, user: nobody, id: 9, on host: darkstar' +The verbosity of the method-based approach is illustrated here. + .. _syntax of: https://docs.python.org/3/library/string.html#format-string-syntax PEP 498 -- Literal String Formatting '''''''''''''''''''''''''''''''''''' -PEP 498 discusses and delves partially into implementation details of -expression-strings, -which it calls f-strings, -the idea and syntax -(with exception of the prefix letter) -of which is identical to that discussed here. -The resulting compile-time transformation however -returns a string joined from parts at runtime, -rather than an object. - -It also, somewhat controversially to those first exposed to it, -introduces the idea that these strings shall be augmented with support for -arbitrary expressions, -which is discussed further in the following sections. +PEP 498 defines and discusses format strings, +as also described in the `Abstract`_ above. +It also, somewhat controversially to those first exposed, +introduces the idea that format-strings shall be augmented with support for +arbitrary expressions. +This is discussed further in the +Restricting Syntax section under +`Rejected Ideas`_. 
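
As a rough illustration of the arbitrary-expression support mentioned above, the sketch below renders the same message with ``str.format()`` and with a format string. The ``params`` dictionary is the one from the Background section; the other variable names are only illustrative, and Python 3.6 or later is assumed.

```python
params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
user, id_, hostname = params['user'], params['id'], params['hostname']

# str.format(): every value has to be passed in explicitly.
via_format = 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**params)

# Format string: names are resolved in the enclosing scope.
via_fstring = f'Hello, user: {user}, id: {id_}, on host: {hostname}'

# Arbitrary expressions (calls, arithmetic, conversions) may appear in braces.
with_expr = f'{user!r} has a {len(user)}-character name; next id is {id_ + 1}'

print(via_fstring)
print(with_expr)
```

The first two strings are identical; the third shows the kind of "simple length or increment" expression that plain ``str.format()`` syntax cannot express inline.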
PEP 501 -- Translation ready string interpolation ''''''''''''''''''''''''''''''''''''''''''''''''' The complimentary PEP 501 brings internationalization into the discussion as a -first-class concern, with its proposal of i-strings, +first-class concern, with its proposal of the i-prefix, ``string.Template`` syntax integration compatible with ES6 (Javascript), deferred rendering, -and a similar object return value. +and an object return value. Implementations in Other Languages @@ -374,7 +361,8 @@ ES6 (Javascript) Designers of `Template strings`_ faced the same issue as Python where single and double quotes were taken. Unlike Python however, "backticks" were not. -They were chosen as part of the ECMAScript 2015 (ES6) standard:: +Despite `their issues`_, +they were chosen as part of the ECMAScript 2015 (ES6) standard:: console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`); @@ -391,8 +379,10 @@ as the tag:: * User implemented prefixes supported. * Arbitrary expressions are supported. +.. _their issues: https://mail.python.org/pipermail/python-ideas/2007-January/000054.html .. _Template strings: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings + C#, Version 6 ''''''''''''' @@ -428,13 +418,14 @@ Arbitrary `interpolation under Swift`_ is available on all strings:: Additional examples ''''''''''''''''''' -A number of additional examples may be `found at Wikipedia`_. +A number of additional examples of string interpolation may be +`found at Wikipedia`_. + +Now that background and history have been covered, +let's continue on for a solution. .. _found at Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples -Now that background and imlementation history have been covered, -let's continue on for a solution. - New Syntax ---------- @@ -442,178 +433,47 @@ New Syntax This should be an option of last resort, as every new syntax feature has a cost in terms of real-estate in a brain it inhabits. 
-There is one alternative left on our list of possibilities, +There is however one alternative left on our list of possibilities, which follows. New String Prefix ----------------- -Given the history of string formatting in Python, -backwards-compatibility, +Given the history of string formatting in Python and backwards-compatibility, implementations in other languages, -and the avoidance of new syntax unless necessary, +avoidance of new syntax unless necessary, an acceptable design is reached through elimination rather than unique insight. -Therefore, we choose to explicitly mark interpolated string literals with a -string prefix. +Therefore, marking interpolated string literals with a string prefix is chosen. -We also choose an expression syntax that reuses and builds on the strongest of +We also choose an expression syntax that reuses and builds on the strongest of the existing choices, -``str.format()`` to avoid further duplication. - - -Specification -============= - -String literals with the prefix of ``e`` shall be converted at compile-time to -the construction of an ``estr`` (perhaps ``types.ExpressionString``?) object. -Strings and values are parsed from the literal and passed as tuples to the -constructor:: +``str.format()`` to avoid further duplication of functionality:: >>> location = 'World' - >>> e'Hello, {location} !' + >>> f'Hello, {location} !' # new prefix: f'' + 'Hello, World !' # interpolated result - # becomes - # estr('Hello, {location} !', # template - ('Hello, ', ' !'), # string fragments - ('location',), # expressions - ('World',), # values - ) - -The object interpolates its result immediately at run-time:: - - 'Hello, World !' +PEP 498 -- Literal String Formatting, delves into the mechanics and +implementation of this design. -ExpressionString Objects ------------------------- - -The ExpressionString object supports both immediate and deferred rendering of -its given template and parameters. 
-It does this by immediately rendering its inputs to its internal string and -``.rendered`` string member (still necessary?), -useful in the majority of use cases. -To allow for deferred rendering and caller-specified escaping, -all inputs are saved for later inspection, -with convenience methods available. - -Notes: - -* Inputs are saved to the object as ``.template`` and ``.context`` members - for later use. -* No explicit ``str(estr)`` call is necessary to render the result, - though doing so might be desired to free resources if significant. -* Additional or deferred rendering is available through the ``.render()`` - method, which allows template and context to be overriden for flexibility. -* Manual escaping of potentially dangerous input is available through the - ``.escape(escape_function)`` method, - the rules of which may therefore be specified by the caller. - The given function should both accept and return a single modified string. - -* A sample Python implementation can `found at Bitbucket`_: - -.. _found at Bitbucket: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_demo.py - - -Inherits From ``str`` Type -''''''''''''''''''''''''''' - -Inheriting from the ``str`` class is one of the techniques available to improve -compatibility with code expecting a string object, -as it will pass an ``isinstance(obj, str)`` test. -ExpressionString implements this and also renders its result into the "raw" -string of its string superclass, -providing compatibility with a majority of code. - - -Interpolation Syntax --------------------- - -The strongest of the existing string formatting syntaxes is chosen, -``str.format()`` as a base to build on. [10]_ [11]_ - -.. - -* Additionally, single arbitrary expressions shall also be supported inside - braces as an extension:: - - >>> e'My age is {age + 1} years.' - - See below for section on safety. 
- -* Triple quoted strings with multiple lines shall be supported:: - - >>> e'''Hello, - {location} !''' - 'Hello,\n World !' - -* Adjacent implicit concatenation shall be supported; - interpolation does not `not bleed into`_ other strings:: - - >>> 'Hello {1, 2, 3} ' e'{location} !' - 'Hello {1, 2, 3} World !' - -* Additional implementation details, - for example expression and error-handling, - are specified in the compatible PEP 498. - -.. _not bleed into: https://mail.python.org/pipermail/python-ideas/2015-July/034763.html - - -Composition with Other Prefixes -------------------------------- - -* Expression-strings apply to unicode objects only, - therefore ``u''`` is never needed. - Should it be prevented? - -* Bytes objects are not included here and do not compose with e'' as they - do not support ``__format__()``. - -* Complimentary to raw strings, - backslash codes shall not be converted in the expression-string, - when combined with ``r''`` as ``re''``. - - -Examples --------- - -A more complicated example follows:: - - n = 5; # t0, t1 = … TODO - a = e"Sliced {n} onions in {t1-t0:.3f} seconds." - # returns the equvalent of - estr("Sliced {n} onions in {t1-t0:.3f} seconds", # template - ('Sliced ', ' onions in ', ' seconds'), # strings - ('n', 't1-t0:.3f'), # expressions - (5, 0.555555) # values - ) - -With expressions only:: - - b = e"Three random numbers: {rand()}, {rand()}, {rand()}." - # returns the equvalent of - estr("Three random numbers: {rand():f}, {rand():f}, {rand():}.", # template - ('Three random numbers: ', ', ', ', ', '.'), # strings - ('rand():f', 'rand():f', 'rand():f'), # expressions - (rand(), rand(), rand()) # values - ) +Additional Topics +================= Safety ----------- In this section we will describe the safety situation and precautions taken -in support of expression-strings. +in support of format-strings. -#. Only string literals shall be considered here, +#. 
Only string literals have been considered for format-strings, not variables to be taken as input or passed around, making external attacks difficult to accomplish. - * ``str.format()`` `already handles`_ this use-case. - * Direct instantiation of the ExpressionString object with non-literal input - shall not be allowed. (Practicality?) + ``str.format()`` and alternatives `already handle`_ this use-case. #. Neither ``locals()`` nor ``globals()`` are necessary nor used during the transformation, @@ -622,37 +482,72 @@ in support of expression-strings. #. To eliminate complexity as well as ``RuntimeError`` (s) due to recursion depth, recursive interpolation is not supported. -#. Restricted characters or expression classes?, such as ``=`` for assignment. - However, mistakes or malicious code could be missed inside string literals. Though that can be said of code in general, that these expressions are inside strings means they are a bit more likely to be obscured. -.. _already handles: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html +.. _already handle: https://mail.python.org/pipermail/python-ideas/2015-July/034729.html -Mitigation via tools +Mitigation via Tools '''''''''''''''''''' The idea is that tools or linters such as pyflakes, pylint, or Pycharm, -could check inside strings for constructs that exceed project policy. -As this is a common task with languages these days, -tools won't have to implement this feature solely for Python, +may check inside strings with expressions and mark them up appropriately. +As this is a common task with programming languages today, +multi-language tools won't have to implement this feature solely for Python, significantly shortening time to implementation. -Additionally the Python interpreter could check(?) and warn with appropriate -command-line parameters passed. +Farther in the future, +strings might also be checked for constructs that exceed the safety policy of +a project. 
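
The kind of check described above can be sketched with the standard ``ast`` module. This is only an illustration of what a tool might do, not how pyflakes, pylint, or PyCharm actually work; the function name ``calls_inside_fstrings`` is hypothetical, and ``ast.unparse`` assumes Python 3.9 or later.

```python
import ast


def calls_inside_fstrings(source):
    """Return the source text of every call found inside a format string."""
    found = []
    seen = set()  # guards against reporting a call twice for nested f-strings
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):
            continue  # only format strings are of interest
        for part in node.values:
            if not isinstance(part, ast.FormattedValue):
                continue  # literal text between the braces
            for inner in ast.walk(part.value):
                if isinstance(inner, ast.Call) and id(inner) not in seen:
                    seen.add(id(inner))
                    found.append(ast.unparse(inner))
    return found


# The sample is only parsed, never executed, so 'os' need not be imported.
sample = 'name = "x"\nmsg = f"{name} has {len(name)} chars in {os.getcwd()}"\n'
flagged = calls_inside_fstrings(sample)
print(flagged)
```

A real project policy would decide which of the reported calls are acceptable; the sketch only shows that the expressions are reachable by ordinary static analysis.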
+ + +Style Guide/Precautions +----------------------- + +As arbitrary expressions may accomplish anything a Python expression is +able to, +it is highly recommended to avoid constructs inside format-strings that could +cause side effects. + +Further guidelines may be written once usage patterns and true problems are +known. + + +Reference Implementation(s) +--------------------------- + +The `say module on PyPI`_ implements string interpolation as described here +with the small burden of a callable interface:: + + > pip install say + + from say import say + nums = list(range(4)) + say("Nums has {len(nums)} items: {nums}") + +A Python implementation of Ruby interpolation `is also available`_. +It uses the codecs module to do its work:: + + > pip install interpy + + # coding: interpy + location = 'World' + print("Hello #{location}.") + +.. _say module on PyPI: https://pypi.python.org/pypi/say/ +.. _is also available: https://github.com/syrusakbary/interpy Backwards Compatibility ----------------------- -By using existing syntax and avoiding use of current or historical features, -expression-strings (and any associated sub-features), -were designed so as to not interfere with existing code and is not expected -to cause any issues. +By using existing syntax and avoiding current or historical features, +format strings were designed so as to not interfere with existing code and are +not expected to cause any issues. Postponed Ideas @@ -666,20 +561,12 @@ Though it was highly desired to integrate internationalization support, the finer details diverge at almost every point, making a common solution unlikely: [15]_ -* Use-cases -* Compile and run-time tasks -* Interpolation Syntax +* Use-cases differ +* Compile vs. run-time tasks +* Interpolation syntax needs * Intended audience * Security policy -Rather than try to fit a "square peg in a round hole," -this PEP attempts to allow internationalization to be supported in the future -by not preventing it. 
-In this proposal, -expression-string inputs are saved for inspection and re-rendering at a later -time, -allowing for their use by an external library of any sort. - Rejected Ideas -------------- @@ -687,18 +574,25 @@ Rejected Ideas Restricting Syntax to ``str.format()`` Only ''''''''''''''''''''''''''''''''''''''''''' -This was deemed not enough of a solution to the problem. -It can be seen in the `Implementations in Other Languages`_ section that the -developer community at large tends to agree. +The common `arguments against`_ support of arbitrary expresssions were: -The common `arguments against`_ arbitrary expresssions were: - -#. YAGNI, "You ain't gonna need it." -#. The change is not congruent with historical Python conservatism. +#. `YAGNI`_, "You aren't gonna need it." +#. The feature is not congruent with historical Python conservatism. #. Postpone - can implement in a future version if need is demonstrated. +.. _YAGNI: https://en.wikipedia.org/wiki/You_aren't_gonna_need_it .. _arguments against: https://mail.python.org/pipermail/python-ideas/2015-August/034913.html +Support of only ``str.format()`` syntax however, +was deemed not enough of a solution to the problem. +Often a simple length or increment of an object, for example, +is desired before printing. + +It can be seen in the `Implementations in Other Languages`_ section that the +developer community at large tends to agree. +String interpolation with arbitrary expresssions is becoming an industry +standard in modern languages due to its utility. + Additional/Custom String-Prefixes ''''''''''''''''''''''''''''''''' @@ -720,7 +614,7 @@ this was thought to create too much uncertainty of when and where string expressions could be used safely or not. The concept was also difficult to describe to others. [12]_ -Always consider expression-string variables to be unescaped, +Always consider format string variables to be unescaped, unless the developer has explicitly escaped them. 
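
A minimal sketch of what explicit escaping looks like in practice, assuming HTML output and the standard library's ``html.escape``; the variable names are illustrative.

```python
import html

user_input = '<script>alert(1)</script>'

# Interpolated as-is: the markup reaches the output unchanged.
unsafe = f'<p>{user_input}</p>'

# Escaped explicitly by the developer inside the expression.
safe = f'<p>{html.escape(user_input)}</p>'

print(unsafe)
print(safe)
```

The format string itself applies no escaping; which function is appropriate depends on where the rendered string ends up.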
@@ -735,33 +629,13 @@ and looking too much like bash/perl, which could encourage bad habits. [13]_ -Reference Implementation(s) -=========================== - -An expression-string implementation is currently attached to PEP 498, -under the ``f''`` prefix, -and may be available in nightly builds. - -A Python implementation of Ruby interpolation `is also available`_, -which is similar to this proposal. -It uses the codecs module to do its work:: - - > pip install interpy - - # coding: interpy - location = 'World' - print("Hello #{location}.") - -.. _is also available: https://github.com/syrusakbary/interpy - - Acknowledgements ================ -* Eric V. Smith for providing invaluable implementation work and design - opinions, helping to focus this PEP. -* Others on the python-ideas mailing list for rejecting the craziest of ideas, - also helping to achieve focus. +* Eric V. Smith for the authoring and implementation of PEP 498. +* Everyone on the python-ideas mailing list for rejecting the various crazy + ideas that came up, + helping to keep the final design in focus. References @@ -771,7 +645,6 @@ References (https://mail.python.org/pipermail/python-ideas/2015-July/034659.html) - .. [2] Briefer String Format (https://mail.python.org/pipermail/python-ideas/2015-July/034669.html) diff --git a/pep-0503.txt b/pep-0503.txt index 30ad90462..b98601e86 100644 --- a/pep-0503.txt +++ b/pep-0503.txt @@ -5,11 +5,12 @@ Last-Modified: $Date$ Author: Donald Stufft BDFL-Delegate: Donald Stufft Discussions-To: distutils-sig@python.org -Status: Draft +Status: Accepted Type: Informational Content-Type: text/x-rst Created: 04-Sep-2015 Post-History: 04-Sep-2015 +Resolution: https://mail.python.org/pipermail/distutils-sig/2015-September/026899.html Abstract @@ -91,6 +92,10 @@ In addition to the above, the following constraints are placed on the API: associated signature, the signature would be located at ``/packages/HolyGrail-1.0.tar.gz.asc``. 
+* A repository **MAY** include a ``data-gpg-sig`` attribute on a file link with + a value of either ``true`` or ``false`` to indicate whether or not there is a + GPG signature. Repositories that do this **SHOULD** include it on every link. + Normalized Names ---------------- diff --git a/pep-0504.txt b/pep-0504.txt new file mode 100644 index 000000000..7f7c3b0ca --- /dev/null +++ b/pep-0504.txt @@ -0,0 +1,396 @@ +PEP: 504 +Title: Using the System RNG by default +Version: $Revision$ +Last-Modified: $Date$ +Author: Nick Coghlan +Status: Withdrawn +Type: Standards Track +Content-Type: text/x-rst +Created: 15-Sep-2015 +Python-Version: 3.6 +Post-History: 15-Sep-2015 + +Abstract +======== + +Python currently defaults to using the deterministic Mersenne Twister random +number generator for the module level APIs in the ``random`` module, requiring +users to know that when they're performing "security sensitive" work, they +should instead switch to using the cryptographically secure ``os.urandom`` or +``random.SystemRandom`` interfaces or a third party library like +``cryptography``. + +Unfortunately, this approach has resulted in a situation where developers that +aren't aware that they're doing security sensitive work use the default module +level APIs, and thus expose their users to unnecessary risks. + +This isn't an acute problem, but it is a chronic one, and the often long +delays between the introduction of security flaws and their exploitation means +that it is difficult for developers to naturally learn from experience. + +In order to provide an eventually pervasive solution to the problem, this PEP +proposes that Python switch to using the system random number generator by +default in Python 3.6, and require developers to opt-in to using the +deterministic random number generator process wide either by using a new +``random.ensure_repeatable()`` API, or by explicitly creating their own +``random.Random()`` instance. 
+ +To minimise the impact on existing code, module level APIs that require +determinism will implicitly switch to the deterministic PRNG. + +PEP Withdrawal +============== + +During discussion of this PEP, Steven D'Aprano proposed the simpler alternative +of offering a standardised ``secrets`` module that provides "one obvious way" +to handle security sensitive tasks like generating default passwords and other +tokens. + +Steven's proposal has the desired effect of aligning the easy way to generate +such tokens and the right way to generate them, without introducing any +compatibility risks for the existing ``random`` module API, so this PEP has +been withdrawn in favour of further work on refining Steven's proposal as +PEP 506. + + +Proposal +======== + +Currently, it is never correct to use the module level functions in the +``random`` module for security sensitive applications. This PEP proposes to +change that admonition in Python 3.6+ to instead be that it is not correct to +use the module level functions in the ``random`` module for security sensitive +applications if ``random.ensure_repeatable()`` is ever called (directly or +indirectly) in that process. + +To achieve this, rather than being bound methods of a ``random.Random`` +instance as they are today, the module level callables in ``random`` would +change to be functions that delegate to the corresponding method of the +existing ``random._inst`` module attribute. + +By default, this attribute will be bound to a ``random.SystemRandom`` instance. 
+ +A new ``random.ensure_repeatable()`` API will then rebind the ``random._inst`` +attribute to a ``random.Random`` instance, restoring the same module level +API behaviour as existed in previous Python versions (aside from the +additional level of indirection):: + + def ensure_repeatable(): + """Switch to using random.Random() for the module level APIs + + This switches the default RNG instance from the cryptographically + secure random.SystemRandom() to the deterministic random.Random(), + enabling the seed(), getstate() and setstate() operations. This means + a particular random scenario can be replayed later by providing the + same seed value or restoring a previously saved state. + + NOTE: Libraries implementing security sensitive operations should + always explicitly use random.SystemRandom() or os.urandom in order to + correctly handle applications that call this function. + """ + global _inst + # SystemRandom subclasses Random, so test for SystemRandom itself + if isinstance(_inst, SystemRandom): + _inst = Random() + +To minimise the impact on existing code, calling any of the following module +level functions will implicitly call ``random.ensure_repeatable()``: + +* ``random.seed`` +* ``random.getstate`` +* ``random.setstate`` + +There are no changes proposed to the ``random.Random`` or +``random.SystemRandom`` class APIs - applications that explicitly instantiate +their own random number generators will be entirely unaffected by this +proposal. + +Warning on implicit opt-in +-------------------------- + +In Python 3.6, implicitly opting in to the use of the deterministic PRNG will +emit a deprecation warning using the following check:: + + if isinstance(_inst, SystemRandom): + warnings.warn("Implicitly ensuring repeatability. " + "See help(random.ensure_repeatable) for details", + DeprecationWarning) + ensure_repeatable() + +The specific wording of the warning should have a suitable answer added to +Stack Overflow as was done for the custom error message that was added for +missing parentheses in a call to print [#print]_.
+ +In the first Python 3 release after Python 2.7 switches to security fix only +mode, the deprecation warning will be upgraded to a RuntimeWarning so it is +visible by default. + +This PEP does *not* propose ever removing the ability to ensure the default RNG +used process wide is a deterministic PRNG that will produce the same series of +outputs given a specific seed. That capability is widely used in modelling +and simulation scenarios, and requiring that ``ensure_repeatable()`` be called +either directly or indirectly is a sufficient enhancement to address the cases +where the module level random API is used for security sensitive tasks in web +applications without due consideration for the potential security implications +of using a deterministic PRNG. + +Performance impact +------------------ + +Due to the large performance difference between ``random.Random`` and +``random.SystemRandom``, applications ported to Python 3.6 will encounter a +significant performance regression in cases where: + +* the application is using the module level random API +* cryptographic quality randomness isn't needed +* the application doesn't already implicitly opt back in to the deterministic + PRNG by calling ``random.seed``, ``random.getstate``, or ``random.setstate`` +* the application isn't updated to explicitly call ``random.ensure_repeatable`` + +This would be noted in the Porting section of the Python 3.6 What's New guide, +with the recommendation to include the following code in the ``__main__`` +module of affected applications:: + + if hasattr(random, "ensure_repeatable"): + random.ensure_repeatable() + +Applications that do need cryptographic quality randomness should be using the +system random number generator regardless of speed considerations, so in those +cases the change proposed in this PEP will fix a previously latent security +defect. 
+ +Documentation changes +--------------------- + +The ``random`` module documentation would be updated to move the documentation +of the ``seed``, ``getstate`` and ``setstate`` interfaces later in the module, +along with the documentation of the new ``ensure_repeatable`` function and the +associated security warning. + +That section of the module documentation would also gain a discussion of the +respective use cases for the deterministic PRNG enabled by +``ensure_repeatable`` (games, modelling & simulation, software testing) and the +system RNG that is used by default (cryptography, security token generation). +This discussion will also recommend the use of third party security libraries +for the latter task. + +Rationale +========= + +Writing secure software under deadline and budget pressures is a hard problem. +This is reflected in regular notifications of data breaches involving personally +identifiable information [#breaches]_, as well as with failures to take +security considerations into account when new systems, like motor vehicles +[#uconnect]_, are connected to the internet. It's also the case that a lot of +the programming advice readily available on the internet [#search]_ simply +doesn't take the mathematical arcana of computer security into account. +Compounding these issues is the fact that defenders have to cover *all* of +their potential vulnerabilities, as a single mistake can make it possible to +subvert other defences [#bcrypt]_.
+ +One of the factors that contributes to making this last aspect particularly +difficult is APIs where using them inappropriately creates a *silent* security +failure - one where the only way to find out that what you're doing is +incorrect is for someone reviewing your code to say "that's a potential +security problem", or for a system you're responsible for to be compromised +through such an oversight (and you're not only still responsible for that +system when it is compromised, but your intrusion detection and auditing +mechanisms are good enough for you to be able to figure out after the event +how the compromise took place). + +This kind of situation is a significant contributor to "security fatigue", +where developers (often rightly [#owasptopten]_) feel that security engineers +spend all their time saying "don't do that the easy way, it creates a +security vulnerability". + +As the designers of one of the world's most popular languages [#ieeetopten]_, +we can help reduce that problem by making the easy way the right way (or at +least the "not wrong" way) in more circumstances, so developers and security +engineers can spend more time worrying about mitigating actually interesting +threats, and less time fighting with default language behaviours. + +Discussion +========== + +Why "ensure_repeatable" over "ensure_deterministic"? +---------------------------------------------------- + +This is a case where the meaning of a word as specialist jargon conflicts with +the typical meaning of the word, even though it's *technically* the same. + +From a technical perspective, a "deterministic RNG" means that given knowledge +of the algorithm and the current state, you can reliably compute arbitrary +future states. 
+ +The problem is that "deterministic" on its own doesn't convey those qualifiers, +so it's likely to instead be interpreted as "predictable" or "not random" by +folks that are familiar with the conventional meaning, but aren't familiar with +the additional qualifiers on the technical meaning. + +A second problem with "deterministic" as a description for the traditional RNG +is that it doesn't really tell you what you can *do* with the traditional RNG +that you can't do with the system one. + +"ensure_repeatable" aims to address both of those problems, as its common +meaning accurately describes the main reason for preferring the deterministic +PRNG over the system RNG: ensuring you can repeat the same series of outputs +by providing the same seed value, or by restoring a previously saved PRNG state. + +Only changing the default for Python 3.6+ +----------------------------------------- + +Some other recent security changes, such as upgrading the capabilities of the +``ssl`` module and switching to properly verifying HTTPS certificates by +default, have been considered critical enough to justify backporting the +change to all currently supported versions of Python. + +The difference in this case is one of degree - the additional benefits from +rolling out this particular change a couple of years earlier than will +otherwise be the case aren't sufficient to justify either the additional effort +or the stability risks involved in making such an intrusive change in a +maintenance release. + +Keeping the module level functions +---------------------------------- + +In addition to general backwards compatibility considerations, Python is +widely used for educational purposes, and we specifically don't want to +invalidate the wide array of educational material that assumes the availability +of the current ``random`` module API.
Accordingly, this proposal ensures that +most of the public API can continue to be used not only without modification, +but without generating any new warnings. + +Warning when implicitly opting in to the deterministic RNG +---------------------------------------------------------- + +It's necessary to implicitly opt in to the deterministic PRNG as Python is +widely used for modelling and simulation purposes where this is the right +thing to do, and in many cases, these software models won't have a dedicated +maintenance team tasked with ensuring they keep working on the latest versions +of Python. + +Unfortunately, explicitly calling ``random.seed`` with data from ``os.urandom`` +is also a mistake that appears in a number of the flawed "how to generate a +security token in Python" guides readily available online. + +Using first DeprecationWarning, and then eventually a RuntimeWarning, to +advise against implicitly switching to the deterministic PRNG aims to +nudge future users that need a cryptographically secure RNG away from +calling ``random.seed()`` and those that genuinely need a deterministic +generator towards explicitly calling ``random.ensure_repeatable()``. + +Avoiding the introduction of a userspace CSPRNG +----------------------------------------------- + +The original discussion of this proposal on python-ideas [#csprng]_ suggested +introducing a cryptographically secure pseudo-random number generator and using +that by default, rather than defaulting to the relatively slow system random +number generator. + +The problem [#nocsprng]_ with this approach is that it introduces an additional +point of failure in security sensitive situations, for the sake of applications +where the random number generation may not even be on a critical performance +path. + +Applications that do need cryptographic quality randomness should be using the +system random number generator regardless of speed considerations, so in those +cases.
+ +Isn't the deterministic PRNG "secure enough"? +--------------------------------------------- + +In a word, "No" - that's why there's a warning in the module documentation +that says not to use it for security sensitive purposes. While we're not +currently aware of any studies of Python's random number generator specifically, +studies of PHP's random number generator [#php]_ have demonstrated the ability +to use weaknesses in that subsystem to facilitate a practical attack on +password recovery tokens in popular PHP web applications. + +However, one of the rules of secure software development is that "attacks only +get better, never worse", so it may be that by the time Python 3.6 is released +we will actually see a practical attack on Python's deterministic PRNG publicly +documented. + +Security fatigue in the Python ecosystem +---------------------------------------- + +Over the past few years, the computing industry as a whole has been +making a concerted effort to upgrade the shared network infrastructure we all +depend on to a "secure by default" stance. As one of the most widely used +programming languages for network service development (including the OpenStack +Infrastructure-as-a-Service platform) and for systems administration +on Linux systems in general, a fair share of that burden has fallen on the +Python ecosystem, which is understandably frustrating for Pythonistas using +Python in other contexts where these issues aren't of as great a concern. + +This consideration is one of the primary factors driving the substantial +backwards compatibility improvements in this proposal relative to the initial +draft concept posted to python-ideas [#draft]_. 
+ +Acknowledgements +================ + +* Theo de Raadt, for making the suggestion to Guido van Rossum that we + seriously consider defaulting to a cryptographically secure random number + generator +* Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the + python-ideas threads that suggested the approach of transparently switching + to the ``random.Random`` implementation when any of the functions that only + make sense for a deterministic RNG are called +* Nathaniel Smith for providing the reference on practical attacks against + PHP's random number generator when used to generate password reset tokens +* Donald Stufft for pursuing additional discussions with network security + experts that suggested the introduction of a userspace CSPRNG would mean + additional complexity for insufficient gain relative to just using the + system RNG directly +* Paul Moore for eloquently making the case for the current level of security + fatigue in the Python ecosystem + +References +========== + +.. [#breaches] Visualization of data breaches involving more than 30k records (each) + (http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/) + +.. [#uconnect] Remote UConnect hack for Jeep Cherokee + (http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/) + +.. [#php] PRNG based attack against password reset tokens in PHP applications + (https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf) + +.. [#search] Search link for "python password generator" + (https://www.google.com.au/search?q=python+password+generator) + +.. [#csprng] python-ideas thread discussing using a userspace CSPRNG + (https://mail.python.org/pipermail/python-ideas/2015-September/035886.html) + +.. [#draft] Initial draft concept that eventually became this PEP + (https://mail.python.org/pipermail/python-ideas/2015-September/036095.html) + +.. 
[#nocsprng] Safely generating random numbers + (http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/) + +.. [#ieeetopten] IEEE Spectrum 2015 Top Ten Programming Languages + (http://spectrum.ieee.org/computing/software/the-2015-top-ten-programming-languages) + +.. [#owasptopten] OWASP Top Ten Web Security Issues for 2013 + (https://www.owasp.org/index.php/OWASP_Top_Ten_Project#tab=OWASP_Top_10_for_2013) + +.. [#print] Stack Overflow answer for missing parentheses in call to print + (http://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/25445440#25445440) + +.. [#bcrypt] Bypassing bcrypt through an insecure data cache + (http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/) + +Copyright +========= + +This document has been placed in the public domain. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0505.txt b/pep-0505.txt new file mode 100644 index 000000000..a855a07f2 --- /dev/null +++ b/pep-0505.txt @@ -0,0 +1,205 @@ +PEP: 505 +Title: None coalescing operators +Version: $Revision$ +Last-Modified: $Date$ +Author: Mark E. Haase +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 18-Sep-2015 +Python-Version: 3.6 + +Abstract +======== + +Several modern programming languages have so-called "null coalescing" or +"null aware" operators, including C#, Dart, Perl, Swift, and PHP (starting in +version 7). These operators provide syntactic sugar for common patterns +involving null references. [1]_ [2]_ + +* The "null coalescing" operator is a binary operator that returns its + first non-null operand. +* The "null aware member access" operator is a binary operator that accesses + an instance member only if that instance is non-null. It returns null + otherwise.
+* The "null aware index access" operator is a binary operator that accesses a + member of a collection only if that collection is non-null. It returns null + otherwise. + +Python does not have any directly equivalent syntax. The ``or`` operator can +be used to similar effect but checks for a truthy value, not ``None`` +specifically. The ternary operator ``... if ... else ...`` can be used for +explicit null checks but is more verbose and typically duplicates part of the +expression in between ``if`` and ``else``. The proposed ``None`` coalescing +and ``None`` aware operators offer an alternative syntax that is more +intuitive and concise. + + +Rationale +========= + +Null Coalescing Operator +------------------------ + +The following code illustrates how the ``None`` coalescing operators would +work in Python:: + + >>> title = 'My Title' + >>> title ?? 'Default Title' + 'My Title' + >>> title = None + >>> title ?? 'Default Title' + 'Default Title' + +Similar behavior can be achieved with the ``or`` operator, but ``or`` checks +whether its left operand is false-y, not specifically ``None``. This can lead +to surprising behavior. Consider the scenario of computing the price of some +products a customer has in his/her shopping cart:: + + >>> price = 100 + >>> requested_quantity = 5 + >>> default_quantity = 1 + >>> (requested_quantity or default_quantity) * price + 500 + >>> requested_quantity = None + >>> (requested_quantity or default_quantity) * price + 100 + >>> requested_quantity = 0 + >>> (requested_quantity or default_quantity) * price # oops! + 100 + +This type of bug is not possible with the ``None`` coalescing operator, +because there is no implicit type coercion to ``bool``:: + + >>> price = 100 + >>> requested_quantity = 0 + >>> default_quantity = 1 + >>> (requested_quantity ?? default_quantity) * price + 0 + +The same correct behavior can be achieved with the ternary operator.
Here is +an excerpt from the popular Requests package:: + + data = [] if data is None else data + files = [] if files is None else files + headers = {} if headers is None else headers + params = {} if params is None else params + hooks = {} if hooks is None else hooks + +This particular formulation has the undesirable effect of putting the operands +in an unintuitive order: the brain thinks, "use ``data`` if possible and use +``[]`` as a fallback," but the code puts the fallback *before* the preferred +value. + +The author of this package could have written it like this instead:: + + data = data if data is not None else [] + files = files if files is not None else [] + headers = headers if headers is not None else {} + params = params if params is not None else {} + hooks = hooks if hooks is not None else {} + +This ordering of the operands is more intuitive, but it requires 4 extra +characters (for "not "). It also highlights the repetition of identifiers: +``data if data``, ``files if files``, etc. The ``None`` coalescing operator +improves readability:: + + data = data ?? [] + files = files ?? [] + headers = headers ?? {} + params = params ?? {} + hooks = hooks ?? {} + +The ``None`` coalescing operator also has a corresponding assignment shortcut. + +:: + + data ?= [] + files ?= [] + headers ?= {} + params ?= {} + hooks ?= {} + +The ``None`` coalescing operator is left-associative, which allows for easy +chaining:: + + >>> user_title = None + >>> local_default_title = None + >>> global_default_title = 'Global Default Title' + >>> title = user_title ?? local_default_title ?? global_default_title + 'Global Default Title' + +The direction of associativity is important because the ``None`` coalescing +operator short circuits: if its left operand is non-null, then the right +operand is not evaluated. + +:: + + >>> def get_default(): raise Exception() + >>> 'My Title' ?? 
get_default() + 'My Title' + + +Null-Aware Member Access Operator +--------------------------------- + +:: + + >>> title = 'My Title' + >>> title.upper() + 'MY TITLE' + >>> title = None + >>> title.upper() + Traceback (most recent call last): + File "", line 1, in + AttributeError: 'NoneType' object has no attribute 'upper' + >>> title?.upper() + None + + +Null-Aware Index Access Operator +--------------------------------- + +:: + + >>> person = {'name': 'Mark', 'age': 32} + >>> person['name'] + 'Mark' + >>> person = None + >>> person['name'] + Traceback (most recent call last): + File "", line 1, in + TypeError: 'NoneType' object is not subscriptable + >>> person?['name'] + None + + +Specification +============= + + +References +========== + +.. [1] Wikipedia: Null coalescing operator + (https://en.wikipedia.org/wiki/Null_coalescing_operator) + +.. [2] Seth Ladd's Blog: Null-aware operators in Dart + (http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html) + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0506.txt b/pep-0506.txt new file mode 100644 index 000000000..9bd7a2dfd --- /dev/null +++ b/pep-0506.txt @@ -0,0 +1,449 @@ +PEP: 506 +Title: Adding A Secrets Module To The Standard Library +Version: $Revision$ +Last-Modified: $Date$ +Author: Steven D'Aprano +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 19-Sep-2015 +Python-Version: 3.6 +Post-History: + + +Abstract +======== + +This PEP proposes the addition of a module for common security-related +functions such as generating tokens to the Python standard library. + + +Definitions +=========== + +Some common abbreviations used in this proposal: + +* PRNG: + + Pseudo Random Number Generator. 
A deterministic algorithm used + to produce random-looking numbers with certain desirable + statistical properties. + +* CSPRNG: + + Cryptographically Strong Pseudo Random Number Generator. An + algorithm used to produce random-looking numbers which are + resistant to prediction. + +* MT: + + Mersenne Twister. An extensively studied PRNG which is currently + used by the ``random`` module as the default. + + +Rationale +========= + +This proposal is motivated by concerns that Python's standard library +makes it too easy for developers to inadvertently make serious security +errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum +and expressed some concern [#]_ about the use of MT for generating sensitive +information such as passwords, secure tokens, session keys and similar. + +Although the documentation for the ``random`` module explicitly states that +the default is not suitable for security purposes [#]_, it is strongly +believed that this warning may be missed, ignored or misunderstood by +many Python developers. In particular: + +* developers may not have read the documentation and consequently + not seen the warning; + +* they may not realise that their specific use of the module has security + implications; or + +* not realising that there could be a problem, they have copied code + (or learned techniques) from websites which don't offer best + practises. + +The first [#]_ hit when searching for "python how to generate passwords" on +Google is a tutorial that uses the default functions from the ``random`` +module [#]_. Although it is not intended for use in web applications, it is +likely that similar techniques find themselves used in that situation. +The second hit is to a StackOverflow question about generating +passwords [#]_. Most of the answers given, including the accepted one, use +the default functions. When one user warned that the default could be +easily compromised, they were told "I think you worry too much." 
[#]_ + +This strongly suggests that the existing ``random`` module is an attractive +nuisance when it comes to generating (for example) passwords or secure +tokens. + +Additional motivation (of a more philosophical bent) can be found in the +post which first proposed this idea [#]_. + + +Proposal +======== + +Alternative proposals have focused on the default PRNG in the ``random`` +module, with the aim of providing "secure by default" cryptographically +strong primitives that developers can build upon without thinking about +security. (See Alternatives below.) This proposes a different approach: + +* The standard library already provides cryptographically strong + primitives, but many users don't know they exist or when to use them. + +* Instead of requiring crypto-naive users to write secure code, the + standard library should include a set of ready-to-use "batteries" for + the most common needs, such as generating secure tokens. This code + will both directly satisfy a need ("How do I generate a password reset + token?"), and act as an example of acceptable practises which + developers can learn from [#]_. + +To do this, this PEP proposes that we add a new module to the standard +library, with the suggested name ``secrets``. This module will contain a +set of ready-to-use functions for common activities with security +implications, together with some lower-level primitives. + +The suggestion is that ``secrets`` becomes the go-to module for dealing +with anything which should remain secret (passwords, tokens, etc.) +while the ``random`` module remains backward-compatible. + + +API and Implementation +====================== + +The contents of the ``secrets`` module is expected to evolve over time, and +likely will evolve between the time of writing this PEP and actual release +in the standard library [#]_. At the time of writing, the following functions +have been suggested: + +* A high-level function for generating secure tokens suitable for use + in (e.g.) 
password recovery, as session keys, etc. + +* A limited interface to the system CSPRNG, using either ``os.urandom`` + directly or ``random.SystemRandom``. Unlike the ``random`` module, this + does not need to provide methods for seeding, getting or setting the + state, or any non-uniform distributions. It should provide the + following: + + - A function for choosing items from a sequence, ``secrets.choice``. + - A function for generating an integer within some range, such as + ``secrets.randrange`` or ``secrets.randint``. + - A function for generating a given number of random bits and/or bytes + as an integer. + - A similar function which returns the value as a hex digit string. + +* ``hmac.compare_digest`` under the name ``equal``. + +The consensus appears to be that there is no need to add a new CSPRNG to +the ``random`` module to support these uses, ``SystemRandom`` will be +sufficient. + +Some illustrative implementations have been given by Nick Coghlan [#]_ +and a minimalist API by Tim Peters [#]_. This idea has also been discussed +on the issue tracker for the "cryptography" module [#]_. 
The following
+pseudo-code can be taken as a possible starting point for the real
+implementation::
+
+    import base64
+    import binascii
+    import os
+    from random import SystemRandom
+    from hmac import compare_digest as equal
+
+    # All random values come from the operating system's CSPRNG.
+    _sysrand = SystemRandom()
+
+    randrange = _sysrand.randrange
+    randint = _sysrand.randint
+    randbits = _sysrand.getrandbits
+    choice = _sysrand.choice
+
+    def randbelow(exclusive_upper_bound):
+        # Relies on a private method of random.Random.
+        return _sysrand._randbelow(exclusive_upper_bound)
+
+    DEFAULT_ENTROPY = 32  # bytes
+
+    def token_bytes(nbytes=None):
+        if nbytes is None:
+            nbytes = DEFAULT_ENTROPY
+        return os.urandom(nbytes)
+
+    def token_hex(nbytes=None):
+        return binascii.hexlify(token_bytes(nbytes)).decode('ascii')
+
+    def token_url(nbytes=None):
+        tok = token_bytes(nbytes)
+        return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')
+
+
+The ``secrets`` module itself will be pure Python, and other Python
+implementations can easily make use of it unchanged, or adapt it as
+necessary.
+
+Default arguments
+~~~~~~~~~~~~~~~~~
+
+One difficult question is "How many bytes should my token be?". We can
+help with this question by providing a default amount of entropy for the
+"token_*" functions. If the ``nbytes`` argument is None or not given, the
+default entropy will be used. This default value should be large enough
+to be expected to be secure for medium-security uses, but is expected to
+change in the future, possibly even in a maintenance release [#]_.
+
+Naming conventions
+~~~~~~~~~~~~~~~~~~
+
+One question is the naming conventions used in the module [#]_, whether to
+use C-like naming conventions such as "randrange" or more Pythonic names
+such as "random_range".
+
+Functions which are simply bound methods of the private ``SystemRandom``
+instance (e.g. ``randrange``), or a thin wrapper around such, should keep
+the familiar names. Those which are something new (such as the various
+``token_*`` functions) will use more Pythonic names.
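
As an illustration of how the primitives discussed above might be combined, here is a minimal sketch that uses only existing standard library calls (``os.urandom``, ``random.SystemRandom`` and ``hmac.compare_digest``). The helper names ``make_reset_token``, ``make_passphrase`` and ``tokens_match`` are hypothetical and are not part of the proposed ``secrets`` API; the default sizes are assumptions chosen for the example.

```python
import hmac
import os
import string
from random import SystemRandom

# Hypothetical helper names, for illustration only; they are not part
# of the proposed ``secrets`` API.
_sysrand = SystemRandom()


def make_reset_token(nbytes=32):
    """Return a hex string built from nbytes read from the OS CSPRNG."""
    return os.urandom(nbytes).hex()


def make_passphrase(length=12, alphabet=string.ascii_letters + string.digits):
    """Choose each character independently with SystemRandom.choice."""
    return ''.join(_sysrand.choice(alphabet) for _ in range(length))


def tokens_match(expected, supplied):
    """Compare two ASCII tokens in a way that resists timing analysis."""
    return hmac.compare_digest(expected, supplied)


token = make_reset_token()
print(len(token))                  # two hex digits per byte
print(tokens_match(token, token))
```

The point of the sketch is that none of these helpers touches the default Mersenne Twister generator: every random value comes from the system CSPRNG, and the comparison goes through ``hmac.compare_digest`` rather than ``==``.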
+ +Alternatives +============ + +One alternative is to change the default PRNG provided by the ``random`` +module [#]_. This received considerable scepticism and outright opposition: + +* There is fear that a CSPRNG may be slower than the current PRNG (which + in the case of MT is already quite slow). + +* Some applications (such as scientific simulations, and replaying + gameplay) require the ability to seed the PRNG into a known state, + which a CSPRNG lacks by design. + +* Another major use of the ``random`` module is for simple "guess a number" + games written by beginners, and many people are loath to make any + change to the ``random`` module which may make that harder. + +* Although there is no proposal to remove MT from the ``random`` module, + there was considerable hostility to the idea of having to opt-in to + a non-CSPRNG or any backwards-incompatible changes. + +* Demonstrated attacks against MT are typically against PHP applications. + It is believed that PHP's version of MT is a significantly softer target + than Python's version, due to a poor seeding technique [#]_. Consequently, + without a proven attack against Python applications, many people object + to a backwards-incompatible change. + +Nick Coghlan made an earlier suggestion for a globally configurable PRNG +which uses the system CSPRNG by default [#]_, but has since withdrawn it +in favour of this proposal. + + +Comparison To Other Languages +============================= + +* PHP + + PHP includes a function ``uniqid`` [#]_ which by default returns a + thirteen character string based on the current time in microseconds. + Translated into Python syntax, it has the following signature:: + + def uniqid(prefix='', more_entropy=False)->str + + The PHP documentation warns that this function is not suitable for + security purposes. Nevertheless, various mature, well-known PHP + applications use it for that purpose (citation needed). 
+
+  PHP 5.3 and later also includes a function ``openssl_random_pseudo_bytes``
+  [#]_. Translated into Python syntax, it has roughly the following
+  signature::
+
+      def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]
+
+  This function returns a pseudo-random string of bytes of the given
+  length, and a boolean flag giving whether the string is considered
+  cryptographically strong. The PHP manual suggests that returning
+  anything but True should be rare except for old or broken platforms.
+
+* JavaScript
+
+  Based on a rather cursory search [#]_, there do not appear to be any
+  well-known standard functions for producing strong random values in
+  JavaScript. ``Math.random`` is often used, despite serious weaknesses
+  making it unsuitable for cryptographic purposes [#]_. In recent years
+  the majority of browsers have gained support for ``window.crypto.getRandomValues`` [#]_.
+
+  Node.js offers a rich cryptographic module, ``crypto`` [#]_, most of
+  which is beyond the scope of this PEP. It does include a single function
+  for generating random bytes, ``crypto.randomBytes``.
+
+* Ruby
+
+  The Ruby standard library includes a module ``SecureRandom`` [#]_
+  which includes the following methods:
+
+  * base64 - returns a Base64 encoded random string.
+
+  * hex - returns a random hexadecimal string.
+
+  * random_bytes - returns a random byte string.
+
+  * random_number - depending on the argument, returns either a random
+    integer in the range(0, n), or a random float between 0.0 and 1.0.
+
+  * urlsafe_base64 - returns a random URL-safe Base64 encoded string.
+
+  * uuid - returns a version 4 random Universally Unique IDentifier.
+
+
+What Should Be The Name Of The Module?
+======================================
+
+There was a proposal to add a "random.safe" submodule, quoting the Zen
+of Python "Namespaces are one honking great idea" koan. However, the
+author of the Zen, Tim Peters, has come out against this idea [#]_, and
+recommends a top-level module.
+
+In discussion on the python-ideas mailing list so far, the name "secrets"
+has received some approval, and no strong opposition.
+
+There is already an existing third-party module with the same name [#]_,
+but it appears to be unused and abandoned.
+
+
+Frequently Asked Questions
+==========================
+
+* Q: Is this a real problem? Surely MT is random enough that nobody can
+  predict its output.
+
+  A: The consensus among security professionals is that MT is not safe
+  in security contexts. It is not difficult to reconstruct the internal
+  state of MT [#]_ [#]_ and so predict all past and future values. There
+  are a number of known, practical attacks on systems using MT for
+  randomness [#]_.
+
+  While there are currently no known direct attacks on applications
+  written in Python due to the use of MT, there is widespread agreement
+  that such usage is unsafe.
+
+* Q: Is this an alternative to specialised cryptographic software such as SSL?
+
+  A: No. This is a "batteries included" solution, not a full-featured
+  "nuclear reactor". It is intended to mitigate some basic
+  security errors, not be a solution to all security-related issues. To
+  quote Nick Coghlan referring to his earlier proposal [#]_::
+
+      "...folks really are better off learning to use things like
+      cryptography.io for security sensitive software, so this change
+      is just about harm mitigation given that it's inevitable that a
+      non-trivial proportion of the millions of current and future
+      Python developers won't do that."
+
+* Q: What about a password generator?
+
+  A: The consensus is that the requirements for password generators are too
+  variable for it to be a good match for the standard library [#]_. No
+  password generator will be included in the initial release of the
+  module; instead it will be given in the documentation as a recipe (à la
+  the recipes in the ``itertools`` module) [#]_.
+ +* Q: Will ``secrets`` use /dev/random (which blocks) or /dev/urandom (which + doesn't block) on Linux? What about other platforms? + + A: ``secrets`` will be based on ``os.urandom`` and ``random.SystemRandom``, + which are interfaces to your operating system's best source of + cryptographic randomness. On Linux, that may be ``/dev/urandom`` [#]_, + on Windows it may be ``CryptGenRandom()``, but see the documentation + and/or source code for the detailed implementation details. + + +References +========== + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html + +.. [#] https://docs.python.org/3/library/random.html + +.. [#] As of the date of writing. Also, as Google search terms may be + automatically customised for the user without their knowledge, some + readers may see different results. + +.. [#] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html + +.. [#] http://stackoverflow.com/questions/3854692/generate-password-in-python + +.. [#] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766 + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html + +.. [#] At least those who are motivated to read the source code and documentation. + +.. [#] Tim Peters suggests that bike-shedding the contents of the module will + be 10000 times more time consuming than actually implementing the + module. Words do not begin to express how much I am looking forward to + this. + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036350.html + +.. [#] https://github.com/pyca/cryptography/issues/2347 + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036517.html + https://mail.python.org/pipermail/python-ideas/2015-September/036515.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036474.html + +.. [#] Link needed. + +.. 
[#] By default PHP seeds the MT PRNG with the time (citation needed), + which is exploitable by attackers, while Python seeds the PRNG with + output from the system CSPRNG, which is believed to be much harder to + exploit. + +.. [#] http://legacy.python.org/dev/peps/pep-0504/ + +.. [#] http://php.net/manual/en/function.uniqid.php + +.. [#] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php + +.. [#] Volunteers and patches are welcome. + +.. [#] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html + +.. [#] https://developer.mozilla.org/en-US/docs/Web/API/RandomSource/getRandomValues + +.. [#] https://nodejs.org/api/crypto.html + +.. [#] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html + +.. [#] https://pypi.python.org/pypi/secrets + +.. [#] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html + +.. [#] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036476.html + https://mail.python.org/pipermail/python-ideas/2015-September/036478.html + +.. [#] https://mail.python.org/pipermail/python-ideas/2015-September/036488.html + +.. [#] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/ + http://www.2uo.de/myths-about-urandom/ + + +Copyright +========= + +This document has been placed in the public domain. + + + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0507.txt b/pep-0507.txt new file mode 100644 index 000000000..b899a8c38 --- /dev/null +++ b/pep-0507.txt @@ -0,0 +1,331 @@ +PEP: 507 +Title: Migrate CPython to Git and GitLab +Version: $Revision$ +Last-Modified: $Date$ +Author: Barry Warsaw +Status: Draft +Type: Process +Content-Type: text/x-rst +Created: 2015-09-30 +Post-History: + + +Abstract +======== + +This PEP proposes migrating the repository hosting of CPython and the +supporting repositories to Git. Further, it proposes adopting a +hosted GitLab instance as the primary way of handling merge requests, +code reviews, and code hosting. It is similar in intent to PEP 481 +but proposes an open source alternative to GitHub and omits the +proposal to run Phabricator. As with PEP 481, this particular PEP is +offered as an alternative to PEP 474 and PEP 462. + + +Rationale +========= + +CPython is an open source project which relies on a number of +volunteers donating their time. As with any healthy, vibrant open +source project, it relies on attracting new volunteers as well as +retaining existing developers. Given that volunteer time is the most +scarce resource, providing a process that maximizes the efficiency of +contributors and reduces the friction for contributions, is of vital +importance for the long-term health of the project. + +The current tool chain of the CPython project is a custom and unique +combination of tools. This has two critical implications: + +* The unique nature of the tool chain means that contributors must + remember or relearn, the process, workflow, and tools whenever they + contribute to CPython, without the advantage of leveraging long-term + memory and familiarity they retain by working with other projects in + the FLOSS ecosystem. The knowledge they gain in working with + CPython is unlikely to be applicable to other projects. 
+ +* The burden on the Python/PSF infrastructure team is much greater in + order to continue to maintain custom tools, improve them over time, + fix bugs, address security issues, and more generally adapt to new + standards in online software development with global collaboration. + +These limitations act as a barrier to contribution both for highly +engaged contributors (e.g. core Python developers) and especially for +more casual "drive-by" contributors, who care more about getting their +bug fix than learning a new suite of tools and workflows. + +By proposing the adoption of both a different version control system +and a modern, well-maintained hosting solution, this PEP addresses +these limitations. It aims to enable a modern, well-understood +process that will carry CPython development for many years. + + +Version Control System +---------------------- + +Currently the CPython and supporting repositories use Mercurial. As a +modern distributed version control system, it has served us well since +the migration from Subversion. However, when evaluating the VCS we +must consider the capabilities of the VCS itself as well as the +network effect and mindshare of the community around that VCS. + +There are really only two real options for this, Mercurial and Git. +The technical capabilities of the two systems are largely equivalent, +therefore this PEP instead focuses on their social aspects. + +It is not possible to get exact numbers for the number of projects or +people which are using a particular VCS, however we can infer this by +looking at several sources of information for what VCS projects are +using. + +The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that +37% of the repositories indexed by The Open Hub are using Git (second +only to Subversion which has 48%) while Mercurial has just 2%, beating +only Bazaar which has 1%. This has Git being just over 18 times as +popular as Mercurial on The Open Hub. 
+ +Another source of information on VCS popularity is PyPI itself. This +source is more targeted at the Python community itself since it +represents projects developed for Python. Unfortunately PyPI does not +have a standard location for representing this information, so this +requires manual processing. If we limit our search to the top 100 +projects on PyPI (ordered by download counts) we can see that 62% of +them use Git, while 22% of them use Mercurial, and 13% use something +else. This has Git being just under 3 times as popular as Mercurial +for the top 100 projects on PyPI. + +These numbers back up the anecdotal evidence for Git as the far more +popular DVCS for open source projects. Choosing the more popular VCS +has a number of positive benefits. + +For new contributors it increases the likelihood that they will have already +learned the basics of Git as part of working with another project or if they +are just now learning Git, that they'll be able to take that knowledge and +apply it to other projects. Additionally a larger community means more people +writing how to guides, answering questions, and writing articles about Git +which makes it easier for a new user to find answers and information about the +tool they are trying to learn and use. Given its popularity, there may also +be more auxiliary tooling written *around* Git. This increases options for +everything from GUI clients, helper scripts, repository hosting, etc. + +Further, the adoption of Git as the proposed back-end repository +format doesn't prohibit the use of Mercurial by fans of that VCS! +Mercurial users have the [#hg-git]_ plugin which allows them to push +and pull from a Git server using the Mercurial front-end. It's a +well-maintained and highly functional plugin that seems to be +well-liked by Mercurial users. + + +Repository Hosting +------------------ + +Where and how the official repositories for CPython are hosted is in +someways determined by the choice of VCS. 
With Git there are several +options. In fact, once the repository is hosted in Git, branches can +be mirrored in many locations, within many free, open, and proprietary +code hosting sites. + +It's still important for CPython to adopt a single, official +repository, with a web front-end that allows for many convenient and +common interactions entirely through the web, without always requiring +local VCS manipulations. These interactions include as a minimum, +code review with inline comments, branch diffing, CI integration, and +auto-merging. + +This PEP proposes to adopt a [#GitLab]_ instance, run within the +python.org domain, accessible to and with ultimate control from the +PSF and the Python infrastructure team, but donated, hosted, and +primarily maintained by GitLab, Inc. + +Why GitLab? Because it is a fully functional Git hosting system, that +sports modern web interactions, software workflows, and CI +integration. GitLab's Community Edition (CE) is open source software, +and thus is closely aligned with the principles of the CPython +community. + + +Code Review +----------- + +Currently CPython uses a custom fork of Rietveld modified to not run +on Google App Engine and which is currently only really maintained by +one person. It is missing common features present in many modern code +review tools. + +This PEP proposes to utilize GitLab's built-in merge requests and +online code review features to facilitate reviews of all proposed +changes. + + +GitLab merge requests +--------------------- + +The normal workflow for a GitLab hosted project is to submit a *merge request* +asking that a feature or bug fix branch be merged into a target branch, +usually one or more of the stable maintenance branches or the next-version +master branch for new features. GitLab's merge requests are similar in form +and function to GitHub's pull requests, so anybody who is already familiar +with the latter should be able to immediately utilize the former. 
+ +Once submitted, a conversation about the change can be had between the +submitter and reviewer. This includes both general comments, and inline +comments attached to a particular line of the diff between the source and +target branches. Projects can also be configured to automatically run +continuous integration on the submitted branch, the results of which are +readily visible from the merge request page. Thus both the reviewer and +submitter can immediately see the results of the tests, making it much easier +to only land branches with passing tests. Each new push to the source branch +(e.g. to respond to a commenter's feedback or to fix a failing test) results +in a new run of the CI, so that the state of the request always reflects the +latest commit. + +Merge requests have a fairly major advantage over the older "submit a patch to +a bug tracker" model. They allow developers to work completely within the VCS +using standard VCS tooling, without requiring the creation of a patch file or +figuring out the right location to upload the patch to. This lowers the +barrier for sending a change to be reviewed. + +Merge requests are far easier to review. For example, they provide nice +syntax highlighted diffs which can operate in either unified or side by side +views. They allow commenting inline and on the merge request as a whole and +they present that in a nice unified way which will also hide comments which no +longer apply. Comments can be hidden and revealed. + +Actually merging a merge request is quite simple, if the source branch applies +cleanly to the target branch. A core reviewer simply needs to press the +"Merge" button for GitLab to automatically perform the merge. The source +branch can be optionally rebased, and once the merge is completed, the source +branch can be automatically deleted. + +GitLab also has a good workflow for submitting pull requests to a project +completely through their web interface. 
This would enable the Python
+documentation to have "Edit on GitLab" buttons on every page. People who
+discover typos or inaccuracies, or who just want to make improvements to
+the docs they are currently reading, can simply hit that button and get
+an in-browser editor that will let them make changes and submit a merge
+request, all from the comfort of their browser.
+
+
+Criticism
+=========
+
+X is not written in Python
+--------------------------
+
+One feature that the current tooling (Mercurial, Rietveld) has is that
+all of the pieces are written in Python. This PEP
+focuses more on the *best* tools for the job and not necessarily on the *best*
+tools that happen to be written in Python. Volunteer time is the most
+precious resource for any open source project and we can best respect and
+utilize that time by focusing on the benefits and downsides of the tools
+themselves rather than what language their authors happened to write them in.
+
+One concern is the ability to modify tools to work for us; however, one of the
+goals here is to *not* modify software to work for us and instead adapt
+ourselves to a more standardized workflow. This standardization pays off in
+the ability to re-use tools out of the box, freeing up developer time to
+actually work on Python itself as well as enabling knowledge sharing between
+projects.
+
+However, if we do need to modify the tooling, Git itself is largely written in
+C, the same as CPython. It can also have commands written for it using
+any language, including Python. GitLab itself is largely written in Ruby and
+since it is Open Source software, we would have the ability to submit merge
+requests to the upstream Community Edition, albeit in a language potentially
+unfamiliar to most Python programmers.
+
+
+Mercurial is better than Git
+----------------------------
+
+Whether Mercurial or Git is better on a technical level is a highly subjective
+opinion.
This PEP does not state whether the mechanics of Git or Mercurial
+are better, and instead focuses on the network effect that is available for
+either option. While this PEP proposes switching to Git, Mercurial users are
+not left completely out of the loop. By using the hg-git extension for
+Mercurial, working with server-side Git repositories is fairly easy and
+straightforward.
+
+
+CPython Workflow is too Complicated
+-----------------------------------
+
+One sentiment that came out of previous discussions was that the multi-branch
+model of CPython was too complicated for GitLab style merge requests. This
+PEP disagrees with that sentiment.
+
+Currently, any particular change requires manually creating a patch for 2.7 and
+3.x, which won't change at all in this regard.
+
+If someone submits a fix for the current stable branch (e.g. 3.5) the merge
+request workflow can be used to create a request to merge the current stable
+branch into the master branch, assuming there are no merge conflicts. As
+always, merge conflicts must be manually and locally resolved. Because
+developers also have the *option* of performing the merge locally, this
+provides an improvement over the current situation where the merge *must*
+always happen locally.
+
+For fixes in the current development branch that must also be applied to
+stable release branches, it is possible in many situations to locally cherry
+pick and apply the change to other branches, with merge requests submitted for
+each stable branch. It is also possible to just cherry pick and complete the
+merge locally. These are all accomplished with standard Git commands and
+techniques, with the advantage that all such changes can go through the review
+and CI test workflows, even for merges to stable branches. Minor changes may
+be easily accomplished in the GitLab web editor.
+
+No system can hide all the complexities involved in maintaining several
+long-lived branches.
The only thing that the tooling can do is make it as easy as
+possible to submit and commit changes.
+
+
+Open issues
+===========
+
+* What level of hosted support will GitLab offer? The PEP author has been in
+  contact with the GitLab CEO, with positive interest on their part. The
+  details of the hosting offer would have to be discussed.
+
+* What happens to Roundup and do we switch to the GitLab issue tracker?
+  Currently, this PEP is *not* suggesting we move from Roundup to GitLab
+  issues. We have way too much invested in Roundup right now and migrating
+  the data would be a huge effort. GitLab does support webhooks, so we will
+  probably want to use webhooks to integrate merges and other events with
+  updates to Roundup (e.g. to include pointers to commits, close issues,
+  etc. similar to what is currently done).
+
+* What happens to wiki.python.org? Nothing! While GitLab does support wikis
+  in repositories, there's no reason for us to migrate our Moin wikis.
+
+* What happens to the existing GitHub mirrors? We'd probably want to
+  regenerate them once the official upstream branches are natively hosted in
+  Git. This may change commit ids, but after that, it should be easy to
+  mirror the official Git branches and repositories far and wide.
+
+* Where would the GitLab instance live? Physically, in whatever hosting
+  provider GitLab chooses. We would point gitlab.python.org (or
+  git.python.org?) to this host.
+
+
+References
+==========
+
+.. [#openhub-stats] `Open Hub Statistics `
+.. [#hg-git] `Hg-Git mercurial plugin `
+.. [#GitLab] `https://about.gitlab.com/`
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-3140.txt b/pep-3140.txt index aba08e1b9..55b51ab56 100644 --- a/pep-3140.txt +++ b/pep-3140.txt @@ -2,7 +2,7 @@ PEP: 3140 Title: str(container) should call str(item), not repr(item) Version: $Revision$ Last-Modified: $Date$ -Author: Oleg Broytmann , +Author: Oleg Broytman , Jim J. Jewett Discussions-To: python-3000@python.org Status: Rejected