PEP: 458
Title: Surviving a Compromise of PyPI
Version: $Revision$
Last-Modified: $Date$
Author: Trishank Karthik Kuppusamy <tk47@students.poly.edu>,
        Donald Stufft <donald@stufft.io>,
        Justin Cappos <jcappos@poly.edu>
Discussions-To: Distutils SIG <distutils-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Sep-2013

Abstract
========

This PEP describes how the Python Package Index (PyPI [1]_) may be integrated
with The Update Framework [2]_ (TUF). TUF was designed to be a plug-and-play
security add-on to a software updater or package manager. TUF provides
end-to-end security like SSL, but for software updates instead of HTTP
connections. The framework integrates best security practices such as
separating responsibilities, adopting the many-man rule for signing packages,
keeping signing keys offline, and revocation of expired or compromised signing
keys.

The proposed integration will render modern package managers such as pip [3]_
more secure against various types of security attacks on PyPI and protect users
against them. Even in the worst case where an attacker manages to compromise
PyPI itself, the damage is controlled in scope and limited in duration.

Specifically, this PEP will describe how PyPI processes should be adapted to
incorporate TUF metadata. It will not prescribe how package managers such as
pip should be adapted to install or update projects from PyPI using TUF
metadata.

Rationale
=========

In January 2013, the Python Software Foundation (PSF) announced [4]_ that the
python.org wikis for Python, Jython, and the PSF were subjected to a security
breach which caused all of the wiki data to be destroyed on January 5, 2013.
Fortunately, the PyPI infrastructure was not affected by this security breach.
However, the incident is a reminder that PyPI should take defensive steps to
protect users as much as possible in the event of a compromise. Attacks on
software repositories happen all the time [5]_. We must accept the possibility
of security breaches and prepare PyPI accordingly because it is a valuable
target used by thousands, if not millions, of people.

Before the wiki attack, PyPI used MD5 hashes to tell package managers such as
pip whether or not a package was corrupted in transit. However, the absence of
SSL made it hard for package managers to verify transport integrity to PyPI.
It was easy to launch a man-in-the-middle attack between pip and PyPI to change
package contents arbitrarily. Such an attack could be used to trick users into
installing malicious packages. After the wiki attack, several steps were
proposed (some of which were implemented) to deliver a much higher level of
security than was previously the case: requiring SSL to communicate with
PyPI [6]_, restricting project names [7]_, and migrating from MD5 to SHA-2
hashes [8]_.

These steps, though necessary, are insufficient because attacks are still
possible through other avenues. For example, a public mirror is trusted to
honestly mirror PyPI, but some mirrors may misbehave due to malice or accident.
Package managers such as pip are supposed to use signatures from PyPI to verify
packages downloaded from a public mirror [9]_, but none are known to actually
do so [10]_. Therefore, it is also wise to add more security measures to
detect attacks from public mirrors or content delivery networks [11]_ (CDNs).

Even though official mirrors are being deprecated on PyPI [12]_, there remain a
wide variety of other attack vectors on package managers [13]_. Among other
things, these attacks can crash client systems, cause obsolete packages to be
installed, or even allow an attacker to execute arbitrary code. In September
2013, we showed how the then-latest version of pip was susceptible to these
attacks and how TUF could protect users against them [14]_.

Finally, PyPI allows for packages to be signed with GPG keys [15]_, although no
package manager is known to verify those signatures, thus negating much of the
benefit of having those signatures at all. Validating integrity through
cryptography is important, but issues such as immediate and secure key
revocation or specifying a required threshold number of signatures still
remain. Furthermore, GPG by itself does not immediately address the attacks
mentioned above.

In order to protect PyPI against infrastructure compromises, we propose
integrating PyPI with The Update Framework [2]_ (TUF).

Definitions
===========

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119__.

__ http://www.ietf.org/rfc/rfc2119.txt

In order to keep this PEP focused solely on the application of TUF on PyPI, the
reader is assumed to already be familiar with the design principles of
TUF [2]_. It is also strongly RECOMMENDED that the reader be familiar with the
TUF specification [16]_.

* Projects: Projects are software components that are made available for
  integration. Projects include Python libraries, frameworks, scripts, plugins,
  applications, collections of data or other resources, and various
  combinations thereof. Public Python projects are typically registered on the
  Python Package Index [17]_.

* Releases: Releases are uniquely identified snapshots of a project [17]_.

* Distributions: Distributions are the packaged files which are used to publish
  and distribute a release [17]_.

* Simple index: The HTML page which contains internal links to the
  distributions of a project [17]_.

* Consistent snapshot: A set of TUF metadata and PyPI targets that capture the
  complete state of all projects on PyPI as they were at some fixed point in
  time.

* The *consistent-snapshot* (*release*) role: In order to prevent confusion due
  to the different meanings of the term "release" as employed by PEP 426 [17]_
  and the TUF specification [16]_, we rename the *release* role as the
  *consistent-snapshot* role.

* Continuous delivery: A set of processes with which PyPI produces consistent
  snapshots that can safely coexist and be deleted independently [18]_.

* Developer: Either the owner or maintainer of a project who is allowed to
  update the TUF metadata as well as distribution metadata and data for the
  project.

* Online key: A key that MUST be stored on the PyPI server infrastructure.
  This is usually to allow automated signing with the key. However, this means
  that an attacker who compromises PyPI infrastructure will be able to read
  these keys.

* Offline key: A key that MUST be stored off the PyPI infrastructure. This
  prevents automated signing with the key. This means that an attacker who
  compromises PyPI infrastructure will not be able to immediately read these
  keys.

* Developer key: A private key for which its corresponding public key is
  registered with PyPI to say that it is responsible for directly signing for
  or delegating the distributions belonging to a project. For the purposes of
  this PEP, it is offline in the sense that the private key MUST NOT be stored
  on PyPI. However, the project is free to require certain developer keys to
  be online on its own infrastructure.

* Threshold signature scheme: A role could increase its resilience to key
  compromises by requiring that at least t out of n keys are REQUIRED to sign
  its metadata. This means that a compromise of t-1 keys is insufficient to
  compromise the role itself. We denote this property by saying that the role
  requires (t, n) keys.

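For concreteness, the (t, n) property can be sketched as a signature-counting
check. The function and parameter names below are illustrative only and are
not part of TUF or PyPI; a TUF client or repository tool performs the
equivalent verification internally::

    def role_metadata_is_valid(signed_bytes, signatures, registered_keys,
                               threshold):
        # signatures: iterable of (keyid, signature) pairs.
        # registered_keys: dict mapping keyid to a verification function
        # verify(signed_bytes, signature) -> bool.
        valid_keyids = set()
        for keyid, signature in signatures:
            verify = registered_keys.get(keyid)
            if verify is not None and verify(signed_bytes, signature):
                valid_keyids.add(keyid)  # each key counts at most once
        return len(valid_keyids) >= threshold
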
Overview
========

.. image:: https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/figure1.png

Figure 1: A simplified overview of the roles in PyPI with TUF

Figure 1 shows a simplified overview of the roles that TUF metadata assume on
PyPI. The top-level *root* role signs for the keys of the top-level
*timestamp*, *consistent-snapshot*, *targets* and *root* roles. The
*timestamp* role signs for a new and consistent snapshot. The
*consistent-snapshot* role signs for the *root*, *targets* and all delegated
targets metadata. The *claimed* role signs for all projects that have
registered their own developer keys with PyPI. The *recently-claimed* role
signs for all projects that recently registered their own developer keys with
PyPI. Finally, the *unclaimed* role signs for all projects that have not
registered developer keys with PyPI. The *claimed*, *recently-claimed* and
*unclaimed* roles are numbered 1, 2, 3 respectively because a project will be
searched for in each of those roles in that order: first in *claimed*, then in
*recently-claimed* if necessary, and finally in *unclaimed* if necessary.

Every year, PyPI administrators are going to sign for *root* role keys. After
that, automation will continuously sign for a timestamped, consistent snapshot
of all projects. Every few months, PyPI administrators will move projects with
vetted developer keys from the *recently-claimed* role to the *claimed* role.
As we will soon see, they will sign the *claimed* role metadata with offline
keys.

This PEP does not require project developers to use TUF to secure their
packages from attacks on PyPI. By default, all projects will be signed for by
the *unclaimed* role. If a project wishes stronger security guarantees, then
the project is strongly RECOMMENDED to register developer keys with PyPI so
that it may sign for its own distributions. After doing so, the project will
remain a *recently-claimed* project until PyPI administrators have had an
opportunity to vet the developer keys of the project, after which the project
will be moved to the *claimed* role.

This PEP has **not** been designed to be backward-compatible for package
managers that do not use the TUF security protocol to install or update a
project from the PyPI described here. Instead, it is RECOMMENDED that PyPI
maintain a backward-compatible API of itself that does NOT offer TUF so that
older package managers that do not use TUF will be able to install or update
projects from PyPI as usual but without any of the security offered by TUF.
For the rest of this PEP, we will assume that PyPI will simultaneously maintain
a backward-incompatible API of itself for package managers that MUST use TUF to
securely install or update projects. We think that this approach represents a
reasonable trade-off: older package managers that do not use TUF will still be
able to install or update projects without any TUF security from PyPI, and
newer package managers that do use TUF will be able to securely install or
update projects. At some point in the future, PyPI administrators MAY choose
to permanently deprecate the backward-compatible version of itself that does
not offer TUF metadata.

Unless a mirror, CDN or the PyPI repository has been compromised, the end-user
will not be able to discern whether or not a package manager is using TUF to
install or update a project from PyPI.

Responsibility Separation
=========================

Recall that TUF requires four top-level roles: *root*, *timestamp*,
*consistent-snapshot* and *targets*. The *root* role specifies the keys of all
the top-level roles (including itself). The *timestamp* role specifies the
latest consistent snapshot. The *consistent-snapshot* role specifies the
latest versions of all TUF metadata files (other than *timestamp*). The
*targets* role specifies available target files (in our case, it will be all
files on PyPI under the /simple and /packages directories). In this PEP, each
of these roles will serve their responsibilities without exception.

Our proposal offers two levels of security to developers. If developers opt in
to secure their projects with their own developer keys, then their projects
will be very secure. Otherwise, TUF will still protect them in many cases:

1. Minimum security (no action by a developer): protects *unclaimed* and
   *recently-claimed* projects without developer keys from CDNs [19]_ or public
   mirrors, but not from some PyPI compromises. This is because continuous
   delivery requires some keys to be online. This level of security protects
   projects from being accidentally or deliberately tampered with by a mirror
   or a CDN because the mirror or CDN will not have any of the PyPI or
   developer keys required to sign for projects. However, it would not protect
   projects from attackers who have compromised PyPI because they will be able
   to manipulate the TUF metadata for *unclaimed* projects with the appropriate
   online keys.

2. Maximum security (developer signs their project): protects projects with
   developer keys not only from CDNs or public mirrors, but also from some PyPI
   compromises. This is because many important keys will be offline. This
   level of security protects projects from being accidentally or deliberately
   tampered with by a mirror or a CDN for reasons identical to the minimum
   security level. It will also protect projects (or at least mitigate
   damages) from the most likely attacks on PyPI. For example: given access to
   online keys after a PyPI compromise, attackers will be able to freeze the
   distributions for these projects, but they will not be able to serve
   malicious distributions for these projects (not without compromising other
   offline keys, which would entail more risk, time and energy). Details of
   the exact level of security offered are discussed in the section on key
   management.

In order to complete support for continuous delivery, we propose three
delegated targets roles:

1. *claimed*: Signs for the delegation of PyPI projects to their respective
   developer keys.

2. *recently-claimed*: This role is almost identical to the *claimed* role and
   could technically be performed by the *unclaimed* role, but there are two
   important reasons why it exists independently: the first reason is to
   improve the performance of looking up projects in the *unclaimed* role (by
   moving metadata to the *recently-claimed* role instead), and the second
   reason is to make it easier for PyPI administrators to move
   *recently-claimed* projects to the *claimed* role.

3. *unclaimed*: Signs for PyPI projects without developer keys.

The *targets* role MUST delegate all PyPI projects to the three delegated
targets roles in the order of appearance listed above. This means that when
pip uses TUF to download a distribution of a project from PyPI, it will first
consult the *claimed* role about it. If the *claimed* role has delegated the
project, then pip will trust the project developers (in order of delegation)
about the TUF metadata for the project. Otherwise, pip will consult the
*recently-claimed* role about the project. If the *recently-claimed* role has
delegated the project, then pip will trust the project developers (in order of
delegation) about the TUF metadata for the project. Otherwise, pip will
consult the *unclaimed* role about the TUF metadata for the project. If the
*unclaimed* role has not delegated the project, then the project is considered
to be non-existent on PyPI.

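The search order can be sketched as follows. The role objects and their
``delegates()`` and ``lookup()`` methods are hypothetical and exist only to
illustrate the resolution order; a real TUF client performs this walk while
verifying signatures at every step::

    def find_project_metadata(project_name, claimed, recently_claimed,
                              unclaimed):
        # Consult the delegated targets roles in the mandated order.
        for role in (claimed, recently_claimed, unclaimed):
            if role.delegates(project_name):
                return role.lookup(project_name)
        # No role delegates the project: it does not exist on PyPI.
        return None
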
A PyPI project MAY begin without registering a developer key. Therefore, the
project will be signed for by the *unclaimed* role. After registering
developer keys, the project will be removed from the *unclaimed* role and
delegated to the *recently-claimed* role. After a probation period and a
vetting process to verify the developer keys of the project, the project will
be removed from the *recently-claimed* role and delegated to the *claimed*
role.

The *claimed* role offers maximum security, whereas the *recently-claimed* and
*unclaimed* roles offer minimum security. All three roles support continuous
delivery of PyPI projects.

The *unclaimed* role offers minimum security because PyPI will sign for
projects without developer keys with an online key in order to permit
continuous delivery.

The *recently-claimed* role offers minimum security because while the project
developers will sign for their own distributions with offline developer keys,
PyPI will sign with an online key the delegation of the project to those
offline developer keys. The signing of the delegation with an online key
allows PyPI administrators to continuously deliver projects without having to
continuously sign the delegation whenever one of those projects registers
developer keys.

Finally, the *claimed* role offers maximum security because PyPI will sign with
offline keys the delegation of a project to its offline developer keys. This
means that every now and then, PyPI administrators will vet developer keys and
sign the delegation of a project to those developer keys after being reasonably
sure about the ownership of the developer keys. The process for vetting
developer keys is out of the scope of this PEP.

Metadata Management
===================

In this section, we examine the TUF metadata that PyPI must manage by itself,
and other TUF metadata that must be safely delegated to projects. Examples of
the metadata described here may be seen at our testbed mirror of
`PyPI-with-TUF`__.

__ http://mirror1.poly.edu/

The metadata files that change most frequently will be *timestamp*,
*consistent-snapshot* and delegated targets (*claimed*, *recently-claimed*,
*unclaimed*, project) metadata. The *timestamp* and *consistent-snapshot*
metadata MUST be updated whenever *root*, *targets* or delegated targets
metadata are updated. Observe, though, that *root* and *targets* metadata are
much less likely to be updated as often as delegated targets metadata.
Therefore, *timestamp* and *consistent-snapshot* metadata will most likely be
updated frequently (possibly every minute) due to delegated targets metadata
being updated frequently in order to drive continuous delivery of projects.

Consequently, the processes with which PyPI updates projects will have to be
updated accordingly, the details of which are explained in the following
subsections.

Why Do We Need Consistent Snapshots?
------------------------------------

In an ideal world, metadata and data should be immediately updated and
presented whenever a project is updated. In practice, there will be problems
when there are many readers and writers who access the same metadata or data at
the same time.

An important example at the time of writing is that mirrors are very likely,
as far as we can tell, to update in an inconsistent manner from PyPI as it is
without TUF. Specifically, a mirror would update itself in such a way that
project A would be from time T, whereas project B would be from time T+5,
project C would be from time T+3, and so on, where T is the time that the
mirror first began updating itself. There is no known way for a mirror to
update itself such that it captures the state of all projects as they were at
time T.

Adding TUF to PyPI will not automatically solve the problem. Consider what we
call the `"inverse replay" or "fast-forward" problem`__. Suppose that PyPI has
timestamped a consistent snapshot at version 1. A mirror is later in the
middle of copying PyPI at this snapshot. While the mirror is copying PyPI at
this snapshot, PyPI timestamps a new snapshot at, say, version 2. Without
accounting for consistency, the mirror would then find itself with a copy of
PyPI in an inconsistent state which is indistinguishable from arbitrary
metadata or target attacks. The problem would also apply when the mirror is
substituted with a pip user.

__ https://groups.google.com/forum/#!topic/theupdateframework/8mkR9iqivQA

Therefore, the problem can be summarized as follows: there are problems of
consistency on PyPI with or without TUF. TUF requires its metadata to be
consistent with the data, but how would the metadata be kept consistent with
projects that change all the time?

As a result, we will solve for PyPI the problem of producing a consistent
snapshot that captures the state of all known projects at a given time. Each
consistent snapshot can safely coexist with any other consistent snapshot and
be deleted independently without affecting any other consistent snapshot.

The gist of the solution is that every metadata or data file written to disk
MUST include in its filename the `cryptographic hash`__ of the file. How would
this help clients which use the TUF protocol to securely and consistently
install or update a project from PyPI?

__ https://en.wikipedia.org/wiki/Cryptographic_hash_function

Recall that the first step in the TUF protocol requires the client to download
the latest *timestamp* metadata. However, the client would not know in advance
the hash of the *timestamp* metadata file from the latest consistent snapshot.
Therefore, PyPI MUST redirect all HTTP GET requests for *timestamp* metadata to
the *timestamp* metadata file from the latest consistent snapshot. Since the
*timestamp* metadata is the root of a tree of cryptographic hashes pointing to
every other metadata or target file that is meant to exist together for
consistency, the client is then able to retrieve any file from this consistent
snapshot by deterministically including, in the request for the file, the hash
of the file in the filename. Assuming infinite disk space and no `hash
collisions`__, a client may safely read from one consistent snapshot while PyPI
produces another consistent snapshot.

__ https://en.wikipedia.org/wiki/Collision_(computer_science)

In this simple but effective manner, we are able to capture a consistent
snapshot of all projects and the associated metadata at a given time. The next
subsection will explicate the implementation details of this idea.

Producing Consistent Snapshots
------------------------------

Given a project, PyPI is responsible for updating, depending on the project,
either the *claimed*, *recently-claimed* or *unclaimed* metadata as well as
associated delegated targets metadata. Every project MUST upload its set of
metadata and targets in a single transaction. We will call this set of files
the project transaction. We will discuss later how PyPI MAY validate the files
in a project transaction. For now, let us focus on how PyPI will respond to a
project transaction. We will call this response the project transaction
process. There will also be a consistent snapshot process that we will define
momentarily; for now, it suffices to know that project transaction processes
and the consistent snapshot process must coordinate with each other.

Also, every metadata and target file MUST include in its filename the `hex
digest`__ of its `SHA-256`__ hash. For this PEP, it is RECOMMENDED that PyPI
adopt a simple convention of the form filename.digest.ext, where filename is
the original filename without a copy of the hash, digest is the hex digest of
the hash, and ext is the filename extension.

__ http://docs.python.org/2/library/hashlib.html#hashlib.hash.hexdigest
__ https://en.wikipedia.org/wiki/SHA-2

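A minimal sketch of the RECOMMENDED filename.digest.ext convention follows.
The helper function is illustrative only, although hashlib is the standard
library module referenced above::

    import hashlib
    import os

    def consistent_filename(path):
        # Embed the SHA-256 hex digest of the file's contents in its name,
        # e.g. "claimed.txt" -> "claimed.<64-hex-digit-digest>.txt".
        with open(path, "rb") as target_file:
            digest = hashlib.sha256(target_file.read()).hexdigest()
        root, ext = os.path.splitext(path)
        return "{0}.{1}{2}".format(root, digest, ext)
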
When an *unclaimed* project uploads a new transaction, a project transaction
process MUST add all new targets and relevant delegated *unclaimed* metadata.
(We will see later in this section why the *unclaimed* role will delegate
targets to a number of delegated *unclaimed* roles.) Finally, the project
transaction process MUST inform the consistent snapshot process about new
delegated *unclaimed* metadata.

When a *recently-claimed* project uploads a new transaction, a project
transaction process MUST add all new targets and delegated targets metadata for
the project. If the project is new, then the project transaction process MUST
also add new *recently-claimed* metadata with public keys and threshold number
(which MUST be part of the transaction) for the project. Finally, the project
transaction process MUST inform the consistent snapshot process about new
*recently-claimed* metadata as well as the current set of delegated targets
metadata for the project.

The process for a *claimed* project is slightly different. The difference is
that PyPI administrators will choose to move the project from the
*recently-claimed* role to the *claimed* role. A project transaction process
MUST then add new *recently-claimed* and *claimed* metadata to reflect this
migration. As is the case for a *recently-claimed* project, the project
transaction process MUST always add all new targets and delegated targets
metadata for the *claimed* project. Finally, the project transaction process
MUST inform the consistent snapshot process about new *recently-claimed* or
*claimed* metadata as well as the current set of delegated targets metadata for
the project.

Project transaction processes SHOULD be automated, except when PyPI
administrators move a project from the *recently-claimed* role to the *claimed*
role. Project transaction processes MUST also be applied atomically: either
all metadata and targets, or none of them, are added. The project transaction
processes and consistent snapshot process SHOULD work concurrently. Finally,
project transaction processes SHOULD keep in memory the latest *claimed*,
*recently-claimed* and *unclaimed* metadata so that they will be correctly
updated in new consistent snapshots.

All project transactions MAY be placed in a single queue and processed
serially. Alternatively, the queue MAY be processed concurrently in order of
appearance provided that the following rules are observed:

1. No pair of project transaction processes must concurrently work on the same
   project.

2. No pair of project transaction processes must concurrently work on
   *unclaimed* projects that belong to the same delegated *unclaimed* targets
   role.

3. No pair of project transaction processes must concurrently work on new
   *recently-claimed* projects.

4. No pair of project transaction processes must concurrently work on new
   *claimed* projects.

5. No project transaction process must work on a new *claimed* project while
   another project transaction process is working on a new *recently-claimed*
   project and vice versa.

These rules MUST be observed so that metadata is not read from or written to
inconsistently.

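One way to honor these rules is sketched below with per-project and per-bin
locks. The names are purely illustrative and do not exist in PyPI; any scheme
that provides the same mutual exclusion would do::

    import threading
    from collections import defaultdict

    project_locks = defaultdict(threading.Lock)          # rule 1
    unclaimed_bin_locks = defaultdict(threading.Lock)    # rule 2
    claim_status_lock = threading.Lock()                 # rules 3, 4 and 5

    def run_unclaimed_transaction(project, bin_index, work):
        # Serialize work on the same project and on the same delegated
        # *unclaimed* ("bin") role.
        with project_locks[project], unclaimed_bin_locks[bin_index]:
            work()

    def run_newly_claimed_transaction(project, work):
        # Covers new *recently-claimed* and new *claimed* projects alike, so
        # rules 3-5 reduce to one lock in this sketch.
        with project_locks[project], claim_status_lock:
            work()
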
The consistent snapshot process is fairly simple and SHOULD be automated. The
consistent snapshot process MUST keep in memory the latest working set of
*root*, *targets* and delegated targets metadata. Every minute or so, the
consistent snapshot process will sign for this latest working set. (Recall
that project transaction processes continuously inform the consistent snapshot
process about the latest delegated targets metadata in a concurrency-safe
manner. The consistent snapshot process will actually sign for a copy of the
latest working set while the actual latest working set in memory will be
updated with information continuously communicated by project transaction
processes.) Next, the consistent snapshot process MUST generate and sign new
*timestamp* metadata that will vouch for the *consistent-snapshot* metadata
generated in the previous step. Finally, the consistent snapshot process MUST
add new *timestamp* and *consistent-snapshot* metadata representing the latest
consistent snapshot.

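The ordering described above can be summarized in a short sketch. The signing
and publishing helpers are placeholders rather than real PyPI or TUF APIs; the
essential point is that the working set is captured first, then the
*consistent-snapshot* metadata is signed, then the *timestamp* metadata that
vouches for it::

    import time

    def consistent_snapshot_loop(working_set, sign_role, publish, interval=60):
        while True:
            # Sign a copy; transaction processes keep updating working_set.
            captured_set = dict(working_set)
            snapshot_metadata = sign_role("consistent-snapshot", captured_set)
            timestamp_metadata = sign_role(
                "timestamp", {"consistent-snapshot": snapshot_metadata})
            publish(snapshot_metadata, timestamp_metadata)
            time.sleep(interval)
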
A few implementation notes are now in order. So far, we have seen only that
new metadata and targets are added, but not that old metadata and targets are
removed. Practical constraints are such that eventually PyPI will run out of
disk space to produce a new consistent snapshot. In that case, PyPI MAY then
use something like a "mark-and-sweep" algorithm to delete sufficiently old
consistent snapshots: in order to preserve the latest consistent snapshot, PyPI
would walk objects beginning from the root (*timestamp*) of the latest
consistent snapshot, mark all visited objects, and delete all unmarked
objects. The last few consistent snapshots may be preserved in a similar
fashion. Deleting a consistent snapshot will cause clients to see nothing but
HTTP 404 responses to any request for a file in that consistent snapshot
thereafter. Clients SHOULD then retry their requests with the latest
consistent snapshot.

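A sketch of such a mark-and-sweep pass is shown below. The
``referenced_files`` helper, which would parse a metadata file and return the
full paths of the hashed files it points to, is hypothetical::

    import os

    def sweep(repository_dir, preserved_timestamp_paths, referenced_files):
        marked = set()
        stack = list(preserved_timestamp_paths)
        while stack:
            path = stack.pop()
            if path in marked:
                continue
            marked.add(path)
            # Metadata files point to other metadata and target files.
            stack.extend(referenced_files(path))
        for dirpath, _, filenames in os.walk(repository_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if path not in marked:
                    os.remove(path)  # unreachable from preserved snapshots
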
We do **not** consider updates to any consistent snapshot because `hash
collisions`__ are out of the scope of this PEP. In case a hash collision is
observed, PyPI MAY wish to check that the file being added is identical to the
file already stored. (Should a hash collision be observed, it is far more
likely the case that the file is identical rather than being a genuine
`collision attack`__.) Otherwise, PyPI MAY either overwrite the existing file
or ignore any write operation to an existing file.

__ https://en.wikipedia.org/wiki/Collision_(computer_science)
__ https://en.wikipedia.org/wiki/Collision_attack

All clients, such as pip using the TUF protocol, MUST be modified to download
every metadata and target file (except for *timestamp* metadata) by including,
in the request for the file, the hash of the file in the filename. Following
the filename convention recommended earlier, a request for the file at
filename.ext will be transformed to the equivalent request for the file at
filename.digest.ext.

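On the client side, the digest is already known from the signed metadata that
lists the target, so the request path can be rewritten without reading the
file. The helper below is illustrative and not part of pip or TUF::

    import posixpath

    def consistent_request_path(logical_path, sha256_hexdigest):
        # "packages/source/f/foo/foo-1.0.tar.gz" ->
        # "packages/source/f/foo/foo-1.0.<digest>.tar.gz"
        directory, filename = posixpath.split(logical_path)
        root, ext = posixpath.splitext(filename)
        return posixpath.join(
            directory, "{0}.{1}{2}".format(root, sha256_hexdigest, ext))
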
Finally, PyPI SHOULD use a `transaction log`__ to record project transaction
processes and queues so that it will be easier to recover from errors after a
server failure.

__ https://en.wikipedia.org/wiki/Transaction_log

Metadata Validation
-------------------

A *claimed* or *recently-claimed* project will need to upload in its
transaction to PyPI not just targets (a simple index as well as distributions)
but also TUF metadata. The project MAY do so by uploading a ZIP file
containing two directories, /metadata/ (containing delegated targets metadata
files) and /targets/ (containing targets such as the project simple index and
distributions which are signed for by the delegated targets metadata).

Whenever the project uploads metadata or targets to PyPI, PyPI SHOULD check the
project TUF metadata for at least the following properties:

* A threshold number of the developer keys registered with PyPI by that
  project MUST have signed for the delegated targets metadata file that
  represents the "root" of targets for that project (e.g.
  metadata/targets/project.txt).

* The signatures of delegated targets metadata files MUST be valid.

* The delegated targets metadata files MUST NOT be expired.

* The delegated targets metadata MUST be consistent with the targets.

* A delegator MUST NOT delegate targets that were not delegated to itself by
  another delegator.

* A delegatee MUST NOT sign for targets that were not delegated to itself by a
  delegator.

* Every file MUST contain a unique copy of its hash in its filename following
  the filename.digest.ext convention recommended earlier.

If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to
reject publishing any set of metadata or targets that do not meet these
requirements.

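The following sketch illustrates a few of these checks (threshold signatures,
expiry, and consistency between the metadata and the uploaded targets). All
helper names and the assumed metadata layout are hypothetical; in practice the
checks would be performed with a TUF library::

    import time

    def project_upload_is_acceptable(project_metadata, registered_keys,
                                     threshold, uploaded_targets,
                                     verify_signature):
        valid_keyids = set()
        for keyid, signature in project_metadata["signatures"]:
            if keyid in registered_keys and verify_signature(
                    registered_keys[keyid],
                    project_metadata["signed_bytes"], signature):
                valid_keyids.add(keyid)
        if len(valid_keyids) < threshold:
            return False                      # not enough developer keys
        if project_metadata["expires"] <= time.time():
            return False                      # metadata MUST NOT be expired
        # The metadata MUST be consistent with the uploaded targets.
        return set(project_metadata["targets"]) == set(uploaded_targets)
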
PyPI MUST enforce access control by ensuring that each project can only write
to the TUF metadata for which it is responsible. It MUST do so by ensuring
that project transaction processes write to the correct metadata as well as
correct locations within those metadata. For example, a project transaction
process for an *unclaimed* project MUST write to the correct target paths in
the correct delegated *unclaimed* metadata for the targets of the project.

On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects
in a backward-incompatible manner. Note that PyPI will NOT be able to
automatically rewrite existing TUF metadata on behalf of projects in order to
upgrade the metadata to the new backward-incompatible format because this would
invalidate the signatures of the metadata as signed by developer keys.
Instead, package managers SHOULD be written to recognize and handle multiple
incompatible versions of TUF metadata so that *claimed* and *recently-claimed*
projects could be offered a reasonable time to migrate their metadata to newer
but backward-incompatible formats.

The details of how each project manages its TUF metadata are beyond the scope
of this PEP.

Mirroring Protocol
------------------

The mirroring protocol as described in PEP 381 [9]_ SHOULD change to mirror
PyPI with TUF.

A mirror SHOULD maintain for its clients only one consistent snapshot, which
would represent the latest consistent snapshot from PyPI known to the mirror.
The mirror would then serve all HTTP requests for metadata or targets by simply
reading directly from this consistent snapshot directory.

The mirroring protocol itself is fairly simple. The mirror would ask PyPI for
*timestamp* metadata from the latest consistent snapshot and proceed to copy
the entire consistent snapshot from the *timestamp* metadata onwards. If the
mirror encounters a failure to copy any metadata or target file while copying
the consistent snapshot, it SHOULD retry resuming the copy of that particular
consistent snapshot. If PyPI has deleted that consistent snapshot, then the
mirror SHOULD delete the failed consistent snapshot and try downloading the
latest consistent snapshot instead.

The mirror SHOULD point users to a previous consistent snapshot directory while
it is copying the latest consistent snapshot from PyPI. Only after the latest
consistent snapshot has been completely copied SHOULD the mirror switch clients
to the latest consistent snapshot. The mirror MAY then delete the previous
consistent snapshot once it finds that no client is reading from the previous
consistent snapshot.

The mirror MAY use extant file transfer software such as rsync__ to mirror
PyPI. In that case, the mirror MUST first obtain the latest known timestamp
metadata from PyPI. The mirror MUST NOT immediately publish the latest known
timestamp metadata from PyPI. Instead, the mirror MUST first iteratively
transfer all new files from PyPI until there are no new files left to transfer.
Finally, the mirror MUST publish the latest known timestamp it fetched from
PyPI so that package managers such as pip may be directed to the latest
consistent snapshot known to the mirror.

__ https://rsync.samba.org/

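The ordering constraint above amounts to the following sketch, in which every
function is a placeholder for whatever tooling a mirror operator actually
uses::

    def mirror_once(fetch_latest_timestamp, transfer_new_files,
                    publish_timestamp):
        # Fetch, but do not yet publish, the latest timestamp metadata.
        timestamp_metadata = fetch_latest_timestamp()
        # Transfer files until nothing new is left (returns False when done).
        while transfer_new_files():
            pass
        # Only now are clients directed to the new consistent snapshot.
        publish_timestamp(timestamp_metadata)
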
Backup Process
--------------

In order to be able to safely restore from static snapshots later in the event
of a compromise, PyPI SHOULD maintain a small number of its own mirrors to copy
PyPI consistent snapshots according to some schedule. The mirroring protocol
can be used immediately for this purpose. The mirrors must be secured and
isolated such that they are responsible only for mirroring PyPI. The mirrors
can be checked against one another to detect accidental or malicious failures.

Metadata Expiry Times
---------------------

The *root* and *targets* role metadata SHOULD expire in a year, because these
metadata files are expected to change very rarely.

The *claimed* role metadata SHOULD expire in three to six months, because this
metadata is expected to be refreshed in that time frame. This time frame was
chosen to induce an easier administration process for PyPI.

The *timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed* role
metadata SHOULD expire in a day because a CDN or mirror SHOULD synchronize
itself with PyPI every day. Furthermore, this generous time frame also takes
into account client clocks that are highly skewed or adrift.

The expiry times for the delegated targets metadata of a project are beyond the
scope of this PEP.

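Written out as one possible configuration for the tools that (re)generate
metadata, these expiry times might look as follows; the dictionary is
illustrative, since this PEP only suggests the durations, not how they are
encoded::

    from datetime import timedelta

    METADATA_EXPIRY = {
        "root": timedelta(days=365),
        "targets": timedelta(days=365),
        "claimed": timedelta(days=90),            # three to six months
        "timestamp": timedelta(days=1),
        "consistent-snapshot": timedelta(days=1),
        "recently-claimed": timedelta(days=1),
        "unclaimed": timedelta(days=1),
    }
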
Metadata Scalability
--------------------

Due to the growing number of projects and distributions, the TUF metadata will
also grow correspondingly.

For example, consider the *unclaimed* role. In August 2013, we found that the
size of the *unclaimed* role metadata was about 42MB if the *unclaimed* role
itself signed for about 220K PyPI targets (which are simple indices and
distributions). We will not delve into details in this PEP, but TUF features a
so-called "`lazy bin walk`__" scheme which splits a large targets or delegated
targets metadata file into many small ones. This allows a TUF client updater
to intelligently download only a small number of TUF metadata files in order to
update any project signed for by the *unclaimed* role. For example, applying
this scheme to the previous repository resulted in pip downloading between
1.3KB and 111KB to install or upgrade a PyPI project via TUF.

__ https://github.com/theupdateframework/tuf/issues/39

From our findings as of the time of writing, PyPI SHOULD split all targets in
the *unclaimed* role by delegating them to 1024 delegated targets roles, each
of which would sign for PyPI targets whose hashes fall into that "bin" or
delegated targets role. We found that 1024 bins would result in the
*unclaimed* role metadata and each of its binned delegated targets role
metadata being about the same size (40-50KB) for about 220K PyPI targets
(simple indices and distributions).

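The assignment of a target to one of the 1024 "bins" can be sketched as
hashing its path and reducing the digest modulo the number of bins. The exact
mapping used by the lazy bin walk is an implementation detail of the TUF
repository tools; the following only illustrates the idea::

    import hashlib

    NUMBER_OF_BINS = 1024  # each bin is a delegated targets role

    def bin_for_target(target_path):
        digest = hashlib.sha256(target_path.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUMBER_OF_BINS

    # All targets with the same bin_for_target() value are signed for by the
    # same binned delegated targets role.
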
It is possible to make the TUF metadata more compact by representing it in a
binary format as opposed to the JSON text format. Nevertheless, we believe
that a sufficiently large number of projects and distributions will induce
scalability challenges at some point, and therefore the *unclaimed* role will
then still need delegations in order to address the problem. Furthermore, the
JSON format is an open and well-known standard for data interchange.

Due to the large number of delegated targets metadata files, compressed
versions of *consistent-snapshot* metadata SHOULD also be made available.

Key Management
==============

In this section, we examine the kind of keys required to sign for TUF roles on
PyPI. TUF is agnostic with respect to choices of digital signature algorithms.
For the purpose of discussion, we will assume that most digital signatures will
be produced with the well-tested and tried RSA algorithm [20]_. Nevertheless,
we do NOT recommend any particular digital signature algorithm in this PEP
because there are a few important constraints: firstly, cryptography changes
over time; secondly, package managers such as pip may wish to perform signature
verification in Python, without resorting to a compiled C library, in order to
be able to run on as many systems as Python supports; finally, TUF recommends
diversity of keys for certain applications, and we will soon discuss these
exceptions.

Number Of Keys
--------------

The *timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed*
roles will need to support continuous delivery. Even though their respective
keys will then need to be online, we will require that the keys be independent
of each other. This allows for each of the keys to be placed on separate
servers if need be, and prevents side channel attacks that compromise one key
from automatically compromising the rest of the keys. Therefore, each of the
*timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed* roles
MUST require (1, 1) keys.

The *unclaimed* role MAY delegate targets in an automated manner to a number of
roles called "bins", as we discussed in the previous section. Each of the
"bin" roles SHOULD share the same key as the *unclaimed* role, both for the
space efficiency of metadata and because there is no security advantage in
requiring separate keys.

The *root* role is critical for security and should very rarely be used. It is
primarily used for key revocation, and it is the root of trust for all of PyPI.
The *root* role signs for the keys that are authorized for each of the
top-level roles (including itself). The keys belonging to the *root* role are
intended to be very well-protected and used with the least frequency of all
keys. We propose that every PSF board member own a (strong) root key. A
majority of them can then constitute the quorum to revoke or endow trust in all
top-level keys. Alternatively, the system administrators of PyPI (instead of
PSF board members) could be responsible for signing for the *root* role.
Therefore, the *root* role SHOULD require (t, n) keys, where n is the number of
either all PyPI administrators or all PSF board members, and t > 1 (so that at
least two members must sign the *root* role).

The *targets* role will be used only to sign for the static delegation of all
targets to the *claimed*, *recently-claimed* and *unclaimed* roles. Since
these target delegations must be secured against attacks in the event of a
compromise, the keys for the *targets* role MUST be offline and independent
from other keys. For simplicity of key management without sacrificing
security, it is RECOMMENDED that the keys of the *targets* role are permanently
discarded as soon as they have been created and used to sign for the role.
Therefore, the *targets* role SHOULD require (1, 1) keys. Again, this is
because the keys are going to be permanently discarded, and more offline keys
will not help against key recovery attacks [21]_ unless diversity of keys is
maintained.

Similarly, the *claimed* role will be used only to sign for the dynamic
delegation of projects to their respective developer keys. Since these target
delegations must be secured against attacks in the event of a compromise, the
keys for the *claimed* role MUST be offline and independent from other keys.
Therefore, the *claimed* role SHOULD require (t, n) keys, where n is the number
of all PyPI administrators (in order to keep it manageable), and t ≥ 1 (so that
at least one member MUST sign the *claimed* role). While a stronger threshold
would indeed render the role more robust against a compromise of the *claimed*
keys (which is highly unlikely assuming that the keys are independent and
securely kept offline), we think that this trade-off is acceptable for the
important purpose of keeping the maintenance overhead for PyPI administrators
as low as possible. At the time of writing, we are keeping this point open for
discussion by the distutils-sig community.

The number of developer keys is project-specific and thus beyond the scope of
this PEP.

Online and Offline Keys
-----------------------

In order to support continuous delivery, the *timestamp*,
*consistent-snapshot*, *recently-claimed* and *unclaimed* role keys MUST be
online.

As explained in the previous section, the *root*, *targets* and *claimed* role
keys MUST be offline for maximum security. Developer keys will be offline in
the sense that the private keys MUST NOT be stored on PyPI, though some of them
may be online on the private infrastructure of the project.

Key Strength
------------

At the time of writing, we recommend that all RSA keys (both offline and
online) SHOULD have a minimum key size of 3072 bits for data-protection
lifetimes beyond 2030 [22]_.

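As an example of the recommended key size only (this PEP does not prescribe a
particular library or signature algorithm), a 3072-bit RSA key could be
generated with the ``cryptography`` package as follows::

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Generate a 3072-bit RSA key pair.
    private_key = rsa.generate_private_key(public_exponent=65537,
                                           key_size=3072,
                                           backend=default_backend())
    public_key = private_key.public_key()
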
Diversity Of Keys
-----------------

Due to the threats of weak key generation and implementation weaknesses [2]_,
the types of keys as well as the libraries used to generate them should vary
within TUF on PyPI. Our current implementation of TUF supports multiple
digital signature algorithms such as RSA (with OpenSSL [23]_ or PyCrypto [24]_)
and ed25519 [25]_. Furthermore, TUF supports the binding of other
cryptographic libraries that it does not immediately support "out of the box",
and so one MAY generate keys using other cryptographic libraries and use them
for TUF on PyPI.

As such, the root role keys SHOULD be generated by a variety of digital
signature algorithms as implemented by different cryptographic libraries.

Key Compromise Analysis
-----------------------

.. image:: https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/table1.png

Table 1: Attacks possible by compromising certain combinations of role keys

Table 1 summarizes the kinds of attacks rendered possible by compromising a
threshold number of keys belonging to the TUF roles on PyPI. Except for the
*timestamp* and *consistent-snapshot* roles, the pairwise interaction of role
compromises may be found by taking the union of both rows.

In September 2013, we showed how the then-latest version of pip was susceptible
to these attacks and how TUF could protect users against them [14]_.

An attacker who compromises developer keys for a project and who is able to
somehow upload malicious metadata and targets to PyPI will be able to serve
malicious updates to users of that project (and that project alone). Note that
compromising *targets* or any delegated targets role (except for project
targets metadata) does not immediately endow the attacker with the ability to
serve malicious updates. The attacker must also compromise the *timestamp* and
*consistent-snapshot* roles (which are both online and therefore more likely to
be compromised). This means that in order to launch any attack, one must not
only be able to act as a man-in-the-middle but also compromise the *timestamp*
key (or the *root* keys and sign a new *timestamp* key). To launch any attack
other than a freeze attack, one must also compromise the *consistent-snapshot*
key.

Finally, a compromise of the PyPI infrastructure MAY introduce malicious
updates to *recently-claimed* and *unclaimed* projects because the keys for
those roles are online. However, attackers cannot modify *claimed* projects in
such an event because *targets* and *claimed* metadata have been signed with
offline keys. Therefore, it is RECOMMENDED that high-value projects register
their developer keys with PyPI and sign for their own distributions.

In the Event of a Key Compromise
--------------------------------

By a key compromise, we mean that the key as well as PyPI infrastructure has
been compromised and used to sign new metadata on PyPI.

If a threshold number of developer keys of a project have been compromised,
then the project MUST take the following steps:

1. The project metadata and targets MUST be restored to the last known good
   consistent snapshot where the project was not known to be compromised. This
   can be done by the developers repackaging and resigning all targets with the
   new keys.

2. The project delegated targets metadata MUST have their version numbers
   incremented, expiry times suitably extended and signatures renewed.

PyPI, in turn, MUST take the following steps:

1. Revoke the compromised developer keys from the delegation to the project by
   the *recently-claimed* or *claimed* role. This is done by replacing the
   compromised developer keys with newly issued developer keys.

2. A new timestamped consistent snapshot MUST be issued.

If a threshold number of *timestamp*, *consistent-snapshot*, *recently-claimed*
or *unclaimed* keys have been compromised, then PyPI MUST take the following
steps:

1. Revoke the *timestamp*, *consistent-snapshot* and *targets* role keys from
   the *root* role. This is done by replacing the compromised *timestamp*,
   *consistent-snapshot* and *targets* keys with newly issued keys.

2. Revoke the *recently-claimed* and *unclaimed* keys from the *targets* role
   by replacing their keys with newly issued keys. Sign the new *targets* role
   metadata and discard the new keys (because, as we explained earlier, this
   increases the security of *targets* metadata).

3. Clear all targets or delegations in the *recently-claimed* role and delete
   all associated delegated targets metadata. Recently registered projects
   SHOULD register their developer keys again with PyPI.

4. All targets of the *recently-claimed* and *unclaimed* roles SHOULD be
   compared with the last known good consistent snapshot where none of the
   *timestamp*, *consistent-snapshot*, *recently-claimed* or *unclaimed* keys
   were known to have been compromised. Added, updated or deleted targets in
   the compromised consistent snapshot that do not match the last known good
   consistent snapshot MAY be restored to their previous versions. After
   ensuring the integrity of all *unclaimed* targets, the *unclaimed* metadata
   MUST be regenerated.

5. The *recently-claimed* and *unclaimed* metadata MUST have their version
   numbers incremented, expiry times suitably extended and signatures renewed.

6. A new timestamped consistent snapshot MUST be issued.

This would preemptively protect all of these roles even though only one of them
may have been compromised.

If a threshold number of the *targets* or *claimed* keys have been compromised,
then there is little that an attacker could do without the *timestamp* and
*consistent-snapshot* keys. In this case, PyPI MUST simply revoke the
compromised *targets* or *claimed* keys by replacing them with new keys in the
*root* and *targets* roles respectively.

If a threshold number of the *timestamp*, *consistent-snapshot* and *claimed*
keys have been compromised, then PyPI MUST take the following steps in addition
to the steps taken when either the *timestamp* or *consistent-snapshot* keys
are compromised:

1. Revoke the *claimed* role keys from the *targets* role and replace them with
   newly issued keys.

2. All project targets of the *claimed* roles SHOULD be compared with the last
   known good consistent snapshot where none of the *timestamp*,
   *consistent-snapshot* or *claimed* keys were known to have been compromised.
   Added, updated or deleted targets in the compromised consistent snapshot
   that do not match the last known good consistent snapshot MAY be restored to
   their previous versions. After ensuring the integrity of all *claimed*
   project targets, the *claimed* metadata MUST be regenerated.

3. The *claimed* metadata MUST have their version numbers incremented, expiry
   times suitably extended and signatures renewed.

If a threshold number of the *timestamp*, *consistent-snapshot* and *targets*
keys have been compromised, then PyPI MUST take the union of the steps taken
when the *claimed*, *recently-claimed* and *unclaimed* keys have been
compromised.

If a threshold number of the *root* keys have been compromised, then PyPI MUST
take the steps taken when the *targets* role has been compromised as well as
replace all of the *root* keys.

|
|
|
It is also RECOMMENDED that PyPI sufficiently document compromises with
|
|
|
|
security bulletins. These security bulletins will be most informative when
|
|
|
|
users of pip with TUF are unable to install or update a project because the
|
|
|
|
keys for the *timestamp*, *consistent-snapshot* or *root* roles are no longer
|
|
|
|
valid. They could then visit the PyPI web site to consult security bulletins
|
|
|
|
that would help to explain why they are no longer able to install or update,
|
|
|
|
and then take action accordingly. When a threshold number of *root* keys have
|
|
|
|
not been revoked due to a compromise, then new *root* metadata may be safely
|
|
|
|
updated because a threshold number of existing *root* keys will be used to sign
|
|
|
|
for the integrity of the new *root* metadata so that TUF clients will be able
|
|
|
|
to verify the integrity of the new *root* metadata with a threshold number of
|
|
|
|
previously known *root* keys. This will be the common case. Otherwise, in the
|
|
|
|
worst case where a threshold number of *root* keys have been revoked due to a
|
|
|
|
compromise, an end-user may choose to update new *root* metadata with
|
|
|
|
`out-of-band`__ mechanisms.
|
|
|
|
|
|
|
|
__ https://en.wikipedia.org/wiki/Out-of-band#Authentication
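
The threshold rule described above for accepting new *root* metadata could be
sketched as follows. The metadata layout and the ``verify_sig`` callable are
hypothetical simplifications, shown only to illustrate the rule that a
threshold of previously known *root* keys must vouch for the new metadata::

    def trusts_new_root(trusted_root, new_root_bytes, signatures, verify_sig):
        """Return True if a threshold of previously known root keys signed
        the new root metadata (serialized as new_root_bytes)."""
        known_keys = trusted_root["keys"]          # keyid -> public key
        threshold = trusted_root["roles"]["root"]["threshold"]
        valid_keyids = set()
        for signature in signatures:
            keyid = signature["keyid"]
            if keyid not in known_keys:
                continue                           # not a previously known key
            sig = signature["sig"]
            if verify_sig(known_keys[keyid], new_root_bytes, sig):
                valid_keyids.add(keyid)
        return len(valid_keyids) >= threshold

When this check cannot succeed because a threshold of the old *root* keys has
been revoked, the client falls back to the out-of-band update described above.
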
Appendix: Rejected Proposals
============================


Alternative Proposals for Producing Consistent Snapshots
--------------------------------------------------------

The complete file snapshot (CFS) scheme uses file system directories to store
efficient consistent snapshots over time. In this scheme, every consistent
snapshot will be stored in a separate directory, wherein files that are shared
with previous consistent snapshots will be `hard links`__ instead of copies.

__ https://en.wikipedia.org/wiki/Hard_link
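
A consistent snapshot under the CFS scheme could be produced roughly as
follows. The directory layout and function name are hypothetical and only
illustrate the idea; a production implementation would more likely rely on a
tool such as ``cp -lr``::

    import os

    def make_cfs_snapshot(previous_dir, new_dir, changed_files,
                          deleted_files=()):
        """Build a new consistent snapshot directory under the CFS scheme.

        changed_files maps relative paths to new file contents (bytes);
        deleted_files is a collection of relative paths to drop.  Every
        other file is hard-linked to the previous snapshot, not copied.
        """
        for dirpath, _, filenames in os.walk(previous_dir):
            for name in filenames:
                src = os.path.join(dirpath, name)
                rel = os.path.relpath(src, previous_dir)
                if rel in changed_files or rel in deleted_files:
                    continue
                dst = os.path.join(new_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                os.link(src, dst)                  # shared, not copied
        for rel, contents in changed_files.items():  # new and updated files
            dst = os.path.join(new_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(dst, "wb") as new_file:
                new_file.write(contents)
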
The `differential file`__ snapshot (DFS) scheme is a variant of the CFS scheme,
wherein the next consistent snapshot directory will contain only the additions
of new files and updates to existing files of the previous consistent snapshot.
(The first consistent snapshot will contain a complete set of files known
then.) Deleted files will be marked as such in the next consistent snapshot
directory. This means that files will be resolved in this manner: First, set
the current consistent snapshot directory to be the latest consistent snapshot
directory. Then, any requested file will be sought in the current consistent
snapshot directory. If the file exists in the current consistent snapshot
directory, then that file will be returned. If it has been marked as deleted
in the current consistent snapshot directory, then that file will be reported
as missing. Otherwise, the current consistent snapshot directory will be set
to the preceding consistent snapshot directory and the previous few steps will
be repeated until there is no preceding consistent snapshot to be considered,
at which point the file will be reported as missing.

__ http://dl.acm.org/citation.cfm?id=320484
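
The "daisy chain" resolution just described could be implemented along these
lines. The snapshot directory layout and the ``.deleted`` marker convention
are hypothetical details chosen only to make the sketch concrete::

    import os

    def resolve(filename, snapshot_dirs):
        """Resolve filename against DFS snapshots, newest directory first.

        snapshot_dirs is ordered from the first (complete) snapshot to the
        latest (differential) one.  Returns the path of the file to serve,
        or None if the file is missing or has been marked as deleted.
        """
        for snapshot_dir in reversed(snapshot_dirs):
            candidate = os.path.join(snapshot_dir, filename)
            if os.path.exists(candidate):
                return candidate                   # found in this snapshot
            if os.path.exists(candidate + ".deleted"):
                return None                        # explicitly deleted
            # Not mentioned at all here: try the preceding snapshot.
        return None                                # never existed

A web server would have to perform this resolution for every request, which is
one of the trade-offs noted below.
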
With the CFS scheme, the trade-off is the I/O costs of producing a consistent
snapshot with the file system. As of October 2013, we found that a fairly
modern computer with a 7200RPM hard disk drive required at least three minutes
to produce a consistent snapshot with the "cp -lr" command on the ext3__ file
system. Perhaps the I/O costs of this scheme may be ameliorated with advanced
tools or file systems such as ZFS__ or btrfs__.

__ https://en.wikipedia.org/wiki/Ext3
__ https://en.wikipedia.org/wiki/ZFS
__ https://en.wikipedia.org/wiki/Btrfs

While the DFS scheme improves upon the CFS scheme in terms of producing faster
consistent snapshots, there are at least two trade-offs. The first is that a
web server will need to be modified to perform the "daisy chain" resolution of
a file. The second is that every now and then, the differential snapshots will
need to be "squashed" or merged together with the first consistent snapshot to
produce a new first consistent snapshot with the latest and complete set of
files. Although the merge cost may be amortized over time, this scheme is not
conceptually simpler than the CFS scheme.


References
==========

.. [1] https://pypi.python.org
.. [2] https://isis.poly.edu/~jcappos/papers/samuel_tuf_ccs_2010.pdf
.. [3] http://www.pip-installer.org
.. [4] https://wiki.python.org/moin/WikiAttack2013
.. [5] https://github.com/theupdateframework/pip/wiki/Attacks-on-software-repositories
.. [6] https://mail.python.org/pipermail/distutils-sig/2013-April/020596.html
.. [7] https://mail.python.org/pipermail/distutils-sig/2013-May/020701.html
.. [8] https://mail.python.org/pipermail/distutils-sig/2013-July/022008.html
.. [9] PEP 381, Mirroring infrastructure for PyPI, Ziadé, Löwis
       http://www.python.org/dev/peps/pep-0381/
.. [10] https://mail.python.org/pipermail/distutils-sig/2013-September/022773.html
.. [11] https://mail.python.org/pipermail/distutils-sig/2013-May/020848.html
.. [12] PEP 449, Removal of the PyPI Mirror Auto Discovery and Naming Scheme, Stufft
        http://www.python.org/dev/peps/pep-0449/
.. [13] https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.pdf
.. [14] https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html
.. [15] https://pypi.python.org/security
.. [16] https://github.com/theupdateframework/tuf/blob/develop/docs/tuf-spec.txt
.. [17] PEP 426, Metadata for Python Software Packages 2.0, Coghlan, Holth, Stufft
        http://www.python.org/dev/peps/pep-0426/
.. [18] https://en.wikipedia.org/wiki/Continuous_delivery
.. [19] https://mail.python.org/pipermail/distutils-sig/2013-August/022154.html
.. [20] https://en.wikipedia.org/wiki/RSA_%28algorithm%29
.. [21] https://en.wikipedia.org/wiki/Key-recovery_attack
.. [22] http://csrc.nist.gov/publications/nistpubs/800-57/SP800-57-Part1.pdf
.. [23] https://www.openssl.org/
.. [24] https://pypi.python.org/pypi/pycrypto
.. [25] http://ed25519.cr.yp.to/


Acknowledgements
================

Nick Coghlan, Daniel Holth and the distutils-sig community in general for
helping us to think about how to usably and efficiently integrate TUF with
PyPI.

Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin
Samuel for helping us to design TUF from its predecessor Thandy of the Tor
project.

Konstantin Andrianov, Geremy Condra, Vladimir Diaz, Zane Fisher, Justin Samuel,
Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng for helping us to develop
TUF.

Vladimir Diaz, Monzur Muhammad and Sai Teja Peddinti for helping us to review
this PEP.

Zane Fisher for helping us to review and transcribe this PEP.


Copyright
=========

This document has been placed in the public domain.