diff --git a/pep-0458.txt b/pep-0458.txt index 738cd0eed..780190304 100644 --- a/pep-0458.txt +++ b/pep-0458.txt @@ -2,10 +2,11 @@ PEP: 458 Title: Surviving a Compromise of PyPI Version: $Revision$ Last-Modified: $Date$ -Author: Trishank Karthik Kuppusamy , - Donald Stufft , - Justin Cappos -Discussions-To: Distutils SIG +Author: Trishank Karthik Kuppusamy , + Vladimir Diaz , + Donald Stufft , Justin Cappos +BDFL-Delegate: Richard Jones +Discussions-To: DistUtils mailing list Status: Draft Type: Standards Track Content-Type: text/x-rst @@ -15,491 +16,632 @@ Created: 27-Sep-2013 Abstract ======== -This PEP describes how the Python Package Index (PyPI [1]_) may be integrated -with The Update Framework [2]_ (TUF). TUF was designed to be a plug-and-play -security add-on to a software updater or package manager. TUF provides -end-to-end security like SSL, but for software updates instead of HTTP -connections. The framework integrates best security practices such as -separating responsibilities, adopting the many-man rule for signing packages, -keeping signing keys offline, and revocation of expired or compromised signing -keys. +This PEP proposes how the Python Package Index (PyPI [1]_) should be integrated +with The Update Framework [2]_ (TUF). TUF was designed to be a flexible +security add-on to a software updater or package manager. The framework +integrates best security practices such as separating role responsibilities, +adopting the many-man rule for signing packages, keeping signing keys offline, +and revocation of expired or compromised signing keys. For example, attackers +would have to steal multiple signing keys stored independently to compromise +a role responsible for specifying a repository's available files. Another role +responsible for indicating the latest snapshot of the repository may have to be +similarly compromised, and independent of the first compromised role. 
-The proposed integration will render modern package managers such as pip [3]_ -more secure against various types of security attacks on PyPI and protect users -against them. Even in the worst case where an attacker manages to compromise -PyPI itself, the damage is controlled in scope and limited in duration. +The proposed integration will allow modern package managers such as pip [3]_ to +be more secure against various types of security attacks on PyPI and protect +users from such attacks. Specifically, this PEP describes how PyPI processes +should be adapted to generate and incorporate TUF metadata (i.e., the minimum +security model). The minimum security model supports verification of PyPI +distributions that are signed with keys stored on PyPI: distributions uploaded +by developers are signed by PyPI, require no action from developers (other than +uploading the distribution), and are immediately available for download. The +minimum security model also minimizes PyPI administrative responsibilities by +automating much of the signing process. -Specifically, this PEP will describe how PyPI processes should be adapted to -incorporate TUF metadata. It will not prescribe how package managers such as -pip should be adapted to install or update with TUF metadata projects from -PyPI. +This PEP does not prescribe how package managers such as pip should be adapted +to install or update projects from PyPI with TUF metadata. Package managers +interested in adopting TUF on the client side may consult TUF's `library +documentation`__, which exists for this purpose. Support for project +distributions that are signed by developers (maximum security model) is also +not discussed in this PEP, but is outlined in the appendix as a possible future +extension and covered in detail in PEP X [VD: Link to PEP once it is +completed]. 
The PEP X extension focuses on the maximum security model, which +requires more PyPI administrative work (none by clients), but it also proposes +an easy-to-use key management solution for developers, how to interface with a +potential future build farm on PyPI infrastructure, and discusses the +feasibility of end-to-end signing. + +__ https://github.com/theupdateframework/tuf/tree/develop/tuf/client#updaterpy -Rationale -========= +Motivation +========== In January 2013, the Python Software Foundation (PSF) announced [4]_ that the python.org wikis for Python, Jython, and the PSF were subjected to a security -breach which caused all of the wiki data to be destroyed on January 5 2013. +breach that caused all of the wiki data to be destroyed on January 5, 2013. Fortunately, the PyPI infrastructure was not affected by this security breach. However, the incident is a reminder that PyPI should take defensive steps to protect users as much as possible in the event of a compromise. Attacks on -software repositories happen all the time [5]_. We must accept the possibility -of security breaches and prepare PyPI accordingly because it is a valuable -target used by thousands, if not millions, of people. +software repositories happen all the time [5]_. The PSF must accept the +possibility of security breaches and prepare PyPI accordingly because it is a +valuable resource used by thousands, if not millions, of people. -Before the wiki attack, PyPI used MD5 hashes to tell package managers such as -pip whether or not a package was corrupted in transit. However, the absence of -SSL made it hard for package managers to verify transport integrity to PyPI. -It was easy to launch a man-in-the-middle attack between pip and PyPI to change -package contents arbitrarily. This can be used to trick users into installing -malicious packages. 
After the wiki attack, several steps were proposed (some -of which were implemented) to deliver a much higher level of security than was -previously the case: requiring SSL to communicate with PyPI [6]_, restricting -project names [7]_, and migrating from MD5 to SHA-2 hashes [8]_. +Before the wiki attack, PyPI used MD5 hashes to tell package managers, such as +pip, whether or not a package was corrupted in transit. However, the absence +of SSL made it hard for package managers to verify transport integrity to PyPI. +It was therefore easy to launch a man-in-the-middle attack between pip and +PyPI, and change package content arbitrarily. Users could be tricked into +installing malicious packages with man-in-the-middle attacks. After the wiki +attack, several steps were proposed (some of which were implemented) to deliver +a much higher level of security than was previously the case: requiring SSL to +communicate with PyPI [6]_, restricting project names [7]_, and migrating from +MD5 to SHA-2 hashes [8]_. These steps, though necessary, are insufficient because attacks are still possible through other avenues. For example, a public mirror is trusted to honestly mirror PyPI, but some mirrors may misbehave due to malice or accident. Package managers such as pip are supposed to use signatures from PyPI to verify packages downloaded from a public mirror [9]_, but none are known to actually -do so [10]_. Therefore, it is also wise to add more security measures to +do so [10]_. Therefore, it would be wise to add more security measures to detect attacks from public mirrors or content delivery networks [11]_ (CDNs). Even though official mirrors are being deprecated on PyPI [12]_, there remain a -wide variety of other attack vectors on package managers [13]_. Among other -things, these attacks can crash client systems, cause obsolete packages to be -installed, or even allow an attacker to execute arbitrary code. 
In September -2013, we showed how the latest version of pip then was susceptible to these -attacks and how TUF could protect users against them [14]_. +wide variety of other attack vectors on package managers [13]_. These attacks +can crash client systems, cause obsolete packages to be installed, or even +allow an attacker to execute arbitrary code. In `September 2013`__, a post was +made to the Distutils mailing list showing that the latest version of pip (at +the time) was susceptible to such attacks, and how TUF could protect users +against them [14]_. Specifically, testing was done to see how pip would +respond to these attacks with and without TUF. Attacks tested included replay +and freeze, arbitrary packages, slow retrieval, and endless data. The post +also included a demonstration of how pip would respond if PyPI were +compromised. -Finally, PyPI allows for packages to be signed with GPG keys [15]_, although no -package manager is known to verify those signatures, thus negating much of the -benefits of having those signatures at all. Validating integrity through -cryptography is important, but issues such as immediate and secure key -revocation or specifying a required threshold number of signatures still -remain. Furthermore, GPG by itself does not immediately address the attacks -mentioned above. +__ https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html -In order to protect PyPI against infrastructure compromises, we propose -integrating PyPI with The Update Framework [2]_ (TUF). +With the intent to protect PyPI against infrastructure compromises, this PEP +proposes integrating PyPI with The Update Framework [2]_ (TUF). TUF helps +secure new or existing software update systems. Software update systems are +vulnerable to many known attacks, including those that can result in clients +being compromised or crashed. TUF solves these problems by providing a flexible +security framework that can be added to software updaters. 
+ + +Threat Model +============ + +The threat model assumes the following: + +* Offline keys are safe and securely stored. + +* Attackers can compromise at least one of PyPI's trusted keys stored online, + and may do so at once or over a period of time. + +* Attackers can respond to client requests. + +An attacker is considered successful if they can cause a client to install (or +leave installed) something other than the most up-to-date version of the +software the client is updating. If the attacker is preventing the installation +of updates, they want clients to not realize there is anything wrong. Definitions =========== -The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119__. __ http://www.ietf.org/rfc/rfc2119.txt -In order to keep this PEP focused solely on the application of TUF on PyPI, the -reader is assumed to already be familiar with the design principles of -TUF [2]_. It is also strongly RECOMMENDED that the reader be familiar with the -TUF specification [16]_. +This PEP focuses on integrating TUF with PyPI; however, the reader is +encouraged to read about TUF's design principles [2]_. It is also RECOMMENDED +that the reader be familiar with the TUF specification [16]_. + +Terms used in this PEP are defined as follows: * Projects: Projects are software components that are made available for - integration. Projects include Python libraries, frameworks, scripts, plugins, - applications, collections of data or other resources, and various + integration. Projects include Python libraries, frameworks, scripts, + plugins, applications, collections of data or other resources, and various combinations thereof. Public Python projects are typically registered on the Python Package Index [17]_. 
* Releases: Releases are uniquely identified snapshots of a project [17]_. -* Distributions: Distributions are the packaged files which are used to publish +* Distributions: Distributions are the packaged files that are used to publish and distribute a release [17]_. -* Simple index: The HTML page which contains internal links to the +* Simple index: The HTML page that contains internal links to the distributions of a project [17]_. + +* Roles: There is one *root* role in PyPI. There are multiple roles whose + responsibilities are delegated to them directly or indirectly by the *root* + role. The term top-level role refers to the *root* role and any role + delegated by the *root* role. Each role has a single metadata file that it is + trusted to provide. + +* Metadata: Metadata are signed files that describe roles, other metadata, and + target files. + +* Repository: A repository is a resource comprised of named metadata and + target files. Clients request metadata and target files stored on a + repository. * Consistent snapshot: A set of TUF metadata and PyPI targets that capture the - complete state of all projects on PyPI as they were at some fixed point in + complete state of all projects on PyPI as they existed at some fixed point in time. -* The *consistent-snapshot* (*release*) role: In order to prevent confusion due - to the different meanings of the term "release" as employed by PEP 426 [17]_ - and the TUF specification [16]_, we rename the *release* role as the - *consistent-snapshot* role. - -* Continuous delivery: A set of processes with which PyPI produces consistent - snapshots that can safely coexist and deleted independently [18]_. - +* The *snapshot* (*release*) role: In order to prevent confusion due to the + different meanings of the term "release" used in PEP 426 [17]_ and the TUF + specification [16]_, the *release* role is renamed as the *snapshot* role. 
+ * Developer: Either the owner or maintainer of a project who is allowed to - update the TUF metadata as well as distribution metadata and data for the + update the TUF metadata as well as distribution metadata and files for the project. -* Online key: A key that MUST be stored on the PyPI server infrastructure. - This is usually to allow automated signing with the key. However, this means - that an attacker who compromises PyPI infrastructure will be able to read - these keys. +* Online key: A private cryptographic key that MUST be stored on the PyPI + server infrastructure. This is usually to allow automated signing with the + key. However, an attacker who compromises the PyPI infrastructure will be + able to read these keys. -* Offline key: A key that MUST be stored off the PyPI infrastructure. This - prevents automated signing with the key. This means that an attacker who - compromises PyPI infrastructure will not be able to immediately read these - keys. +* Offline key: A private cryptographic key that MUST be stored independent of + the PyPI server infrastructure. This prevents automated signing with the + key. An attacker who compromises the PyPI infrastructure will not be able to + immediately read these keys. -* Developer key: A private key for which its corresponding public key is - registered with PyPI to say that it is responsible for directly signing for - or delegating the distributions belonging to a project. For the purposes of - this PEP, it is offline in the sense that the private key MUST not be stored - on PyPI. However, the project is free to require certain developer keys to - be online on its own infrastructure. - -* Threshold signature scheme: A role could increase its resilience to key - compromises by requiring that at least t out of n keys are REQUIRED to sign - its metadata. This means that a compromise of t-1 keys is insufficient to - compromise the role itself. We denote this property by saying that the role - requires (t, n) keys. 
+* Threshold signature scheme: A role can increase its resilience to key + compromises by specifying that at least t out of n keys are REQUIRED to sign + its metadata. A compromise of t-1 keys is insufficient to compromise the + role itself. Saying that a role requires (t, n) keys denotes the threshold + signature property. -Overview -======== +Overview of TUF +=============== -.. image:: https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/figure1.png +At its highest level, TUF provides applications with a secure method of +obtaining files and knowing when new versions of files are available. On the +surface, this all sounds simple. The basic steps for updating applications are: -Figure 1: A simplified overview of the roles in PyPI with TUF +* Knowing when an update exists. -Figure 1 shows a simplified overview of the roles that TUF metadata assume on -PyPI. The top-level *root* role signs for the keys of the top-level -*timestamp*, *consistent-snapshot*, *targets* and *root* roles. The -*timestamp* role signs for a new and consistent snapshot. The *consistent- -snapshot* role signs for the *root*, *targets* and all delegated targets -metadata. The *claimed* role signs for all projects that have registered their -own developer keys with PyPI. The *recently-claimed* role signs for all -projects that recently registered their own developer keys with PyPI. Finally, -the *unclaimed* role signs for all projects that have not registered developer -keys with PyPI. The *claimed*, *recently-claimed* and *unclaimed* roles are -numbered 1, 2, 3 respectively because a project will be searched for in each of -those roles in that descending order: first in *claimed*, then in -*recently-claimed* if necessary, and finally in *unclaimed* if necessary. +* Downloading a correct copy of the latest version of an updated file. -Every year, PyPI administrators are going to sign for *root* role keys. 
After -that, automation will continuously sign for a timestamped, consistent snapshot -of all projects. Every few months, PyPI administrators will move projects with -vetted developer keys from the *recently-claimed* role to the *claimed* role. -As we will soon see, they will sign for *claimed* with projects with offline -keys. +The problem is that updating applications is only simple when there are no +malicious activities in the picture. If an attacker is trying to interfere with +these seemingly simple steps, there is plenty they can do. -This PEP does not require project developers to use TUF to secure their -packages from attacks on PyPI. By default, all projects will be signed for by -the *unclaimed* role. If a project wishes stronger security guarantees, then -the project is strongly RECOMMENDED to register developer keys with PyPI so -that it may sign for its own distributions. By doing so, the project must -remain as a *recently-claimed* project until PyPI administrators have had an -opportunity to vet the developer keys of the project, after which the project -will be moved to the *claimed* role. +Assume a software updater takes the approach of most systems (at least the ones +that try to be secure). It downloads both the file it wants and a cryptographic +signature of the file. The software updater already knows which key it trusts +to make the signature. It checks that the signature is correct and was made by +this trusted key. Unfortunately, the software updater is still at risk in many +ways, including: -This PEP has **not** been designed to be backward-compatible for package -managers that do not use the TUF security protocol to install or update a -project from the PyPI described here. 
Instead, it is RECOMMENDED that PyPI -maintain a backward-compatible API of itself that does NOT offer TUF so that -older package managers that do not use TUF will be able to install or update -projects from PyPI as usual but without any of the security offered by TUF. -For the rest of this PEP, we will assume that PyPI will simultaneously maintain -a backward-incompatible API of itself for package managers that MUST use TUF to -securely install or update projects. We think that this approach represents a -reasonable trade-off: older package managers that do not TUF will still be able -to install or update projects without any TUF security from PyPI, and newer -package managers that do use TUF will be able to securely install or update -projects. At some point in the future, PyPI administrators MAY choose to -permanently deprecate the backward-compatible version of itself that does not -offer TUF metadata. +* An attacker keeps giving the software updater the same update file, so it + never realizes there is an update. -Unless a mirror, CDN or the PyPI repository has been compromised, the end-user -will not be able to discern whether or not a package manager is using TUF to -install or update a project from PyPI. +* An attacker gives the software updater an older, insecure version of a file + that it already has, so it downloads that one and blindly uses it thinking it + is newer. + +* An attacker gives the software updater a newer version of a file it has but + it is not the newest one. The file is newer to the software updater, but it + may be insecure and exploitable by the attacker. + +* An attacker compromises the key used to sign these files and now the software + updater downloads a malicious file that is properly signed. + +TUF is designed to address these attacks, and others, by adding signed metadata +(text files that describe the repository's files) to the repository and +referencing the metadata files during the update procedure. 
Repository files +are verified against the information included in the metadata before they are +handed off to the software update system. The framework also provides +multi-signature trust, explicit and implicit revocation of cryptographic keys, +and responsibility separation of the metadata, and it minimizes key risk. For a +full list and outline of the repository attacks and software updater weaknesses +addressed by TUF, see Appendix A. -Responsibility Separation +Integrating TUF with PyPI ========================= -Recall that TUF requires four top-level roles: *root*, *timestamp*, -*consistent-snapshot* and *targets*. The *root* role specifies the keys of all -the top-level roles (including itself). The *timestamp* role specifies the -latest consistent snapshot. The *consistent-snapshot* role specifies the -latest versions of all TUF metadata files (other than *timestamp*). The -*targets* role specifies available target files (in our case, it will be all -files on PyPI under the /simple and /packages directories). In this PEP, each -of these roles will serve their responsibilities without exception. - -Our proposal offers two levels of security to developers. If developers opt in -to secure their projects with their own developer keys, then their projects -will be very secure. Otherwise, TUF will still protect them in many cases: - -1. Minimum security (no action by a developer): protects *unclaimed* and - *recently-claimed* projects without developer keys from CDNs [19]_ or public - mirrors, but not from some PyPI compromises. This is because continuous - delivery requires some keys to be online. This level of security protects - projects from being accidentally or deliberately tampered with by a mirror - or a CDN because the mirror or CDN will not have any of the PyPI or - developer keys required to sign for projects. 
However, it would not protect - projects from attackers who have compromised PyPI because they will be able - to manipulate the TUF metadata for *unclaimed* projects with the appropriate - online keys. - -2. Maximum security (developer signs their project): protects projects with - developer keys not only from CDNs or public mirrors, but also from some PyPI - compromises. This is because many important keys will be offline. This - level of security protects projects from being accidentally or deliberately - tampered with by a mirror or a CDN for reasons identical to the minimum - security level. It will also protect projects (or at least mitigate - damages) from the most likely attacks on PyPI. For example: given access to - online keys after a PyPI compromise, attackers will be able to freeze the - distributions for these projects, but they will not be able to serve - malicious distributions for these projects (not without compromising other - offline keys which would entail more risk, time and energy). Details for - the exact level of security offered is discussed in the section on key - management. - -In order to complete support for continuous delivery, we propose three -delegated targets roles: - -1. *claimed*: Signs for the delegation of PyPI projects to their respective - developer keys. - -2. *recently-claimed*: This role is almost identical to the *claimed* role and - could technically be performed by the *unclaimed* role, but there are two - important reasons why it exists independently: the first reason is to - improve the performance of looking up projects in the *unclaimed* role (by - moving metadata to the *recently-claimed* role instead), and the second - reason is to make it easier for PyPI administrators to move - *recently-claimed* projects to the *claimed* role. - -3. *unclaimed*: Signs for PyPI projects without developer keys. 
- -The *targets* role MUST delegate all PyPI projects to the three delegated -targets roles in the order of appearance listed above. This means that when -pip downloads with TUF a distribution from a project on PyPI, it will first -consult the *claimed* role about it. If the *claimed* role has delegated the -project, then pip will trust the project developers (in order of delegation) -about the TUF metadata for the project. Otherwise, pip will consult the -*recently-claimed* role about the project. If the *recently-claimed* role has -delegated the project, then pip will trust the project developers (in order of -delegation) about the TUF metadata for the project. Otherwise, pip will -consult the *unclaimed* role about the TUF metadata for the project. If the -*unclaimed* role has not delegated the project, then the project is considered -to be non-existent on PyPI. - -A PyPI project MAY begin without registering a developer key. Therefore, the -project will be signed for by the *unclaimed* role. After registering -developer keys, the project will be removed from the *unclaimed* role and -delegated to the *recently-claimed* role. After a probation period and a -vetting process to verify the developer keys of the project, the project will -be removed from the *recently-claimed* role and delegated to the *claimed* -role. - -The *claimed* role offers maximum security, whereas the *recently-claimed* and -*unclaimed* role offer minimum security. All three roles support continuous -delivery of PyPI projects. - -The *unclaimed* role offers minimum security because PyPI will sign for -projects without developer keys with an online key in order to permit -continuous delivery. - -The *recently-claimed* role offers minimum security because while the project -developers will sign for their own distributions with offline developer keys, -PyPI will sign with an online key the delegation of the project to those -offline developer keys. 
The signing of the delegation with an online key -allows PyPI administrators to continuously deliver projects without having to -continuously sign the delegation whenever one of those projects registers -developer keys. - -Finally, the *claimed* role offers maximum security because PyPI will sign with -offline keys the delegation of a project to its offline developer keys. This -means that every now and then, PyPI administrators will vet developer keys and -sign the delegation of a project to those developer keys after being reasonably -sure about the ownership of the developer keys. The process for vetting -developer keys is out of the scope of this PEP. +A software update system must complete two main tasks to integrate with TUF. +First, it must add the framework to the client side of the update system. For +example, TUF MAY be integrated with the pip package manager. Second, the +repository on the server side MUST be modified to provide signed TUF metadata. +This PEP is concerned with the second part of the integration, and the changes +required on PyPI to support software updates with TUF. -Metadata Management -=================== +What Additional Repository Files are Required on PyPI? +------------------------------------------------------ -In this section, we examine the TUF metadata that PyPI must manage by itself, -and other TUF metadata that must be safely delegated to projects. Examples of -the metadata described here may be seen at our testbed mirror of -`PyPI-with-TUF`__. +In order for package managers like pip to download and verify packages with +TUF, a few extra files MUST exist on PyPI. These extra repository files are +called TUF metadata. TUF metadata contains information such as which keys are +trustable, the cryptographic hashes of files, signatures to the metadata, +metadata version numbers, and the date after which the metadata should be +considered expired. 
-__ http://mirror1.poly.edu/ +When a package manager wants to check for updates, it asks TUF to do the work. +That is, a package manager never has to deal with this additional metadata or +understand what's going on underneath. If TUF reports back that there are +updates available, a package manager can then ask TUF to download these files +from PyPI. TUF downloads them and checks them against the TUF metadata that it +also downloads from the repository. If the downloaded target files are +trustworthy, TUF then hands them over to the package manager. -The metadata files that change most frequently will be *timestamp*, -*consistent-snapshot* and delegated targets (*claimed*, *recently-claimed*, -*unclaimed*, project) metadata. The *timestamp* and *consistent-snapshot* -metadata MUST be updated whenever *root*, *targets* or delegated targets -metadata are updated. Observe, though, that *root* and *targets* metadata are -much less likely to be updated as often as delegated targets metadata. -Therefore, *timestamp* and *consistent-snapshot* metadata will most likely be -updated frequently (possibly every minute) due to delegated targets metadata -being updated frequently in order to drive continuous delivery of projects. +The `Metadata`__ document provides information about each of the required +metadata and their expected content. The next section covers the different +kinds of metadata RECOMMENDED for PyPI. -Consequently, the processes with which PyPI updates projects will have to be -updated accordingly, the details of which are explained in the following -subsections. +__ https://github.com/theupdateframework/tuf/blob/develop/METADATA.md -Why Do We Need Consistent Snapshots? ------------------------------------- +PyPI and TUF Metadata +===================== -In an ideal world, metadata and data should be immediately updated and -presented whenever a project is updated. 
In practice, there will be problems -when there are many readers and writers who access the same metadata or data at -the same time. +TUF metadata provides information that clients can use to make update +decisions. For example, a *targets* metadata lists the available distributions +on PyPI and includes the distribution's signatures, cryptographic hashes, and +file sizes. Different metadata files provide different information. The +various metadata files are signed by different roles, which are indicated by +the *root* role. The concept of roles allows TUF to delegate responsibilities +to multiple roles and minimizes the impact of a compromised role. -An important example at the time of writing is that, mirrors are very likely, -as far as we can tell, to update in an inconsistent manner from PyPI as it is -without TUF. Specifically, a mirror would update itself in such a way that -project A would be from time T, whereas project B would be from time T+5, -project C would be from time T+3, and so on where T is the time that the mirror -first begun updating itself. There is no known way for a mirror to update -itself such that it captures the state of all projects as they were at time T. +TUF requires four top-level roles. These are *root*, *timestamp*, *snapshot*, +and *targets*. The *root* role specifies the public cryptographic keys of the +top-level roles (including its own). The *timestamp* role references the +latest *snapshot* and can signify when a new snapshot of the repository is +available. The *snapshot* role indicates the latest version of all the TUF +metadata files (other than *timestamp*). The *targets* role lists the +available target files (in our case, it will be all files on PyPI under the +/simple and /packages directories). Each top-level role will serve its +responsibilities without exception. Figure 1 provides a table of the roles +used in TUF. -Adding TUF to PyPI will not automatically solve the problem. 
Consider what we -call the `"inverse replay" or "fast-forward" problem`__. Suppose that PyPI has -timestamped a consistent snapshot at version 1. A mirror is later in the -middle of copying PyPI at this snapshot. While the mirror is copying PyPI at -this snapshot, PyPI timestamps a new snapshot at, say, version 2. Without -accounting for consistency, the mirror would then find itself with a copy of -PyPI in an inconsistent state which is indistinguishable from arbitrary -metadata or target attacks. The problem would also apply when the mirror is -substituted with a pip user. +.. image:: figure1.png -__ https://groups.google.com/forum/#!topic/theupdateframework/8mkR9iqivQA +Figure 1: An overview of the TUF roles. -Therefore, the problem can be summarized as such: there are problems of -consistency on PyPI with or without TUF. TUF requires its metadata to be -consistent with the data, but how would the metadata be kept consistent with -projects that change all the time? -As a result, we will solve for PyPI the problem of producing a consistent +Signing Metadata and Repository Management +------------------------------------------ + +The top-level *root* role signs for the keys of the top-level *timestamp*, +*snapshot*, *targets*, and *root* roles. The *timestamp* role signs for every +new snapshot of the repository metadata. The *snapshot* role signs for *root*, +*targets*, and all delegated roles. The *bins* roles (delegated roles) sign +for all distributions belonging to registered PyPI projects. + +Figure 2 provides an overview of the roles available within PyPI, which +includes the top-level roles and the roles delegated by *targets*. The figure +also indicates the types of keys used to sign each role and which roles are +trusted to sign for files available on PyPI. The next two sections cover the +details of signing repository files and the types of keys used for each role. + +.. 
image:: figure2.png + +Figure 2: An overview of the role metadata available on PyPI. + +The roles that change most frequently are *timestamp*, *snapshot*, and delegated +roles (*bins* and its delegated roles). The *timestamp* and *snapshot* +metadata MUST be updated whenever *root*, *targets*, or delegated metadata are +updated. Observe, though, that *root* and *targets* metadata are likely to be +updated much less often than delegated metadata. Therefore, *timestamp* +and *snapshot* metadata will most likely be updated frequently (possibly every +minute) because delegated metadata are updated frequently in order to support +continuous delivery of projects. Continuous delivery is a set of processes +that PyPI uses to produce snapshots that can safely coexist and be deleted +independently of other snapshots [18]_. + +Every year, PyPI administrators SHOULD sign for *root* and *targets* role keys. +Automation will continuously sign for a timestamped snapshot of all projects. +A `repository management`__ tool is available that can sign metadata files, +generate cryptographic keys, and manage a TUF repository. + +__ https://github.com/theupdateframework/tuf/tree/develop/tuf#repository-management + + +How to Establish Initial Trust in the PyPI Root Keys +---------------------------------------------------- + +Package managers like pip need to ship a file called "root.json" with the +installation files that users initially download. This file includes information +about the keys trusted for certain roles, as well as the root keys themselves. +Any new version of "root.json" that clients may download is verified against +the root keys that clients initially trust. If a root key is compromised, but +a threshold of keys is still secure, the PyPI administrator MUST push a new +release that revokes trust in the compromised keys.
If a threshold of root keys +is compromised, then "root.json" should be updated out-of-band; however, the +threshold should be chosen so that this is extremely unlikely. The TUF client +library does not require manual intervention if root keys are revoked or added: +the update process handles the cases where "root.json" has changed. + +To bundle the software, "root.json" MUST be included in the version of pip +shipped with CPython (via ensurepip). The TUF client library then loads the +root metadata and downloads the rest of the roles, including updating +"root.json" if it has changed. An `outline of the update process`__ is +available. + +__ https://github.com/theupdateframework/tuf/tree/develop/tuf/client#overview-of-the-update-process + + +Minimum Security Model +---------------------- + +There are two security models to consider when integrating TUF with PyPI. The +one proposed in this PEP is the minimum security model, which supports +verification of PyPI distributions that are signed with private cryptographic +keys stored on PyPI. Distributions uploaded by developers are signed by PyPI +and immediately available for download. A possible future extension to this +PEP, discussed in Appendix B, proposes the maximum security model and allows a +developer to sign for his/her project. Developer keys are not stored online: +therefore, projects are safe from PyPI compromises. + +The minimum security model requires no action from a developer and protects +against malicious CDNs [19]_ and public mirrors. To support continuous +delivery of uploaded packages, PyPI signs for projects with an online key. +This level of security prevents projects from being accidentally or +deliberately tampered with by a mirror or a CDN because the mirror or CDN will +not have any of the keys required to sign for projects. However, it does not +protect projects from attackers who have compromised PyPI, since attackers can +manipulate TUF metadata using the keys stored online. +
+ +This PEP proposes that the *bins* role (and its delegated roles) sign for all +PyPI projects with an online key. The *targets* role, which only signs with an +offline key, MUST delegate all PyPI projects to the *bins* role. This means +that when a package manager such as pip (i.e., using TUF) downloads a +distribution from a project on PyPI, it will consult the *bins* role about the +TUF metadata for the project. If no bin roles delegated by *bins* specify the +project's distribution, then the project is considered to be non-existent on +PyPI. + + +Metadata Expiry Times +--------------------- + +The *root* and *targets* role metadata SHOULD expire in one year, because these +two metadata files are expected to change very rarely. + +The *timestamp*, *snapshot*, and *bins* metadata SHOULD expire in one day +because a CDN or mirror SHOULD synchronize itself with PyPI every day. +Furthermore, this generous time frame also takes into account client clocks +that are highly skewed or adrift. + + +Metadata Scalability +-------------------- + +Due to the growing number of projects and distributions, TUF metadata will also +grow correspondingly. For example, consider the *bins* role. In August 2013, +it was found that the size of the *bins* metadata was about 42MB if the *bins* +role itself signed for about 220K PyPI targets (which are simple indices and +distributions). This PEP does not delve into the details, but TUF features a +so-called "`lazy bin walk`__" scheme that splits a large targets' metadata file +into many small ones. This allows a TUF client updater to intelligently +download only a small number of TUF metadata files in order to update any +project signed for by the *bins* role. For example, applying this scheme to +the previous repository resulted in pip downloading between 1.3KB and 111KB to +install or upgrade a PyPI project via TUF. 
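As a rough illustration of how targets could be spread across hashed bins, the sketch below maps a target path to one of 1024 bins, the number this PEP recommends for the *bins* role. It assumes that the SHA-256 hash of the target path selects the bin; the function names and the "bins/NNNN" role naming are hypothetical and are not taken from TUF or from this PEP.

```python
import hashlib

NUM_BINS = 1024  # number of bins this PEP recommends for the *bins* role


def bin_index(target_path: str, num_bins: int = NUM_BINS) -> int:
    """Map a target path to a bin number in the range [0, num_bins).

    Assumption: the SHA-256 hash of the target *path* selects the bin.
    """
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # to a bin number.
    return int.from_bytes(digest[:8], "big") % num_bins


def bin_role_name(target_path: str) -> str:
    """Hypothetical name of the delegated role that signs for target_path."""
    return f"bins/{bin_index(target_path):04d}"
```

Under this sketch, a client that wants one distribution would fetch the *bins* metadata plus the single bin role returned by `bin_role_name`, rather than one large targets file.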
+ +__ https://github.com/theupdateframework/tuf/issues/39 + +Based on our findings as of the time of writing, PyPI SHOULD split all targets +in the *bins* role by delegating them to 1024 delegated roles, each of which +would sign for PyPI targets whose hashes fall into that "bin" or delegated role +(see Figure 2). It was found that 1024 bins would result in the *bins* +metadata, and each of its delegated roles, being about the same size (40-50KB) +for about 220K PyPI targets (simple indices and distributions). + +It is possible to make TUF metadata more compact by representing it in a binary +format as opposed to the JSON text format. Nevertheless, a sufficiently large +number of projects and distributions will introduce scalability challenges at +some point, and therefore the *bins* role will still need delegations (as +outlined in figure 2) in order to address the problem. Furthermore, the JSON +format is an open and well-known standard for data interchange. Due to the +large number of delegated metadata, compressed versions of *snapshot* metadata +SHOULD also be made available to clients. + + +PyPI and Key Requirements +========================= + +In this section, the kinds of keys required to sign for TUF roles on PyPI are +examined. TUF is agnostic with respect to choices of digital signature +algorithms. For the purpose of discussion, it is assumed that most digital +signatures will be produced with the well-tested and tried RSA algorithm [20]_. +Nevertheless, we do NOT recommend any particular digital signature algorithm in +this PEP because there are a few important constraints: first, cryptography +changes over time; second, package managers such as pip may wish to perform +signature verification in Python, without resorting to a compiled C library, in +order to be able to run on as many systems as Python supports; and third, TUF +recommends diversity of keys for certain applications. 
+ + +Number Of Keys Recommended +-------------------------- + +The *timestamp*, *snapshot*, and *bins* roles require continuous delivery. +Even though their respective keys MUST be online, this PEP requires that the +keys be independent of each other. Different keys for online roles allow for +each of the keys to be placed on separate servers if need be, and prevents side +channel attacks that compromise one key from automatically compromising the +rest of the keys. Therefore, each of the *timestamp*, *snapshot*, and *bins* +roles MUST require (1, 1) keys. + +The *bins* role MAY delegate targets in an automated manner to a number of +roles called "bins", as discussed in the previous section. Each of the "bin" +roles SHOULD share the same key as the *bins* role, due to space efficiency, +and because there is no security advantage to requiring separate keys. + +The *root* role key is critical for security and should very rarely be used. +It is primarily used for key revocation, and it is the locus of trust for all +of PyPI. The *root* role signs for the keys that are authorized for each of +the top-level roles (including its own). Keys belonging to the *root* role are +intended to be very well-protected and used with the least frequency of all +keys. It is RECOMMENDED that every PSF board member own a (strong) root key. +A majority of them can then constitute a quorum to revoke or endow trust in all +top-level keys. Alternatively, the system administrators of PyPI could be +given responsibility for signing for the *root* role. Therefore, the *root* +role SHOULD require (t, n) keys, where n is the number of either all PyPI +administrators or all PSF board members, and t > 1 (so that at least two +members must sign the *root* role). + +The *targets* role will be used only to sign for the static delegation of all +targets to the *bins* role. 
Since these target delegations must be secured +against attacks in the event of a compromise, the keys for the *targets* role +MUST be offline and independent of other keys. For simplicity of key +management, without sacrificing security, it is RECOMMENDED that the keys of +the *targets* role be permanently discarded as soon as they have been created +and used to sign for the role. Therefore, the *targets* role SHOULD require +(1, 1) keys. Again, this is because the keys are going to be permanently +discarded and more offline keys will not help resist key recovery attacks [21]_ +unless diversity of keys is maintained. + + +Online and Offline Keys Recommended for Each Role +------------------------------------------------- + +In order to support continuous delivery, the *timestamp*, *snapshot*, and *bins* +role keys MUST be online. + +As explained in the previous section, the *root* and *targets* role keys MUST +be offline for maximum security: these keys will be offline in the sense that +their private keys MUST NOT be stored on PyPI, though some of them MAY be +online in the private infrastructure of the project. + + +How Should Metadata be Generated? +================================= + +Project developers expect the distributions they upload to PyPI to be +immediately available for download. Unfortunately, there will be problems when +many readers and writers simultaneously access the same metadata and +distributions. That is, there needs to be a way to ensure consistency of +metadata and repository files when multiple developers simultaneously change the +same metadata or distributions. There are also issues with consistency on PyPI +without TUF, but the problem is more severe with signed metadata that MUST keep +track of the files available on PyPI in real time. + +Suppose that PyPI generates a *snapshot*, which indicates the latest version of +every metadata file except *timestamp*, at version 1 and a client requests this +*snapshot* from PyPI.
While the client is busy downloading this *snapshot*, +PyPI then timestamps a new snapshot at, say, version 2. Without ensuring +consistency of metadata, the client would find itself with a copy of *snapshot* +that disagrees with what is available on PyPI, which is indistinguishable from +arbitrary metadata injected by an attacker. The problem would also occur for +mirrors attempting to sync with PyPI. + + +Consistent Snapshots +-------------------- + +There are problems with consistency on PyPI with or without TUF. TUF requires +that its metadata be consistent with the repository files, but how would the +metadata be kept consistent with projects that change all the time? As a +result, this proposal MUST address the problem of producing a consistent snapshot that captures the state of all known projects at a given time. Each -consistent snapshot can safely coexist with any other consistent snapshot and -deleted independently without affecting any other consistent snapshot. +snapshot should safely coexist with any other snapshot, and be able to be +deleted independently, without affecting any other snapshot. -The gist of the solution is that every metadata or data file written to disk -MUST include in its filename the `cryptographic hash`__ of the file. How would -this help clients which use the TUF protocol to securely and consistently -install or update a project from PyPI? +The solution presented in this PEP is that every metadata or data file managed +by PyPI and written to disk MUST include in its filename the `cryptographic +hash`__ of the file. How would this help clients that use the TUF protocol to +securely and consistently install or update a project from PyPI? __ https://en.wikipedia.org/wiki/Cryptographic_hash_function -Recall that the first step in the TUF protocol requires the client to download -the latest *timestamp* metadata. 
However, the client would not know in advance -the hash of the *timestamp* metadata file from the latest consistent snapshot. -Therefore, PyPI MUST redirect all HTTP GET requests for *timestamp* metadata to -the *timestamp* metadata file from the latest consistent snapshot. Since the -*timestamp* metadata is the root of a tree of cryptographic hashes pointing to -every other metadata or target file that are meant to exist together for -consistency, the client is then able to retrieve any file from this consistent -snapshot by deterministically including, in the request for the file, the hash -of the file in the filename. Assuming infinite disk space and no `hash -collisions`__, a client may safely read from one consistent snapshot while PyPI -produces another consistent snapshot. +The first step in the TUF protocol requires the client to download the latest +*timestamp* metadata. However, the client would not know in advance the hash +of the *timestamp* associated with the latest snapshot. Therefore, PyPI MUST +redirect all HTTP GET requests for *timestamp* to the *timestamp* referenced in +the latest snapshot. The *timestamp* role is the root of a tree of +cryptographic hashes that points to every other metadata that is meant to exist +together (i.e., clients request metadata in timestamp -> snapshot -> root -> +targets order). Clients are able to retrieve any file from this snapshot +by deterministically including, in the request for the file, the hash of the +file in the filename. Assuming infinite disk space and no `hash collisions`__, +a client may safely read from one snapshot while PyPI produces another +snapshot. __ https://en.wikipedia.org/wiki/Collision_(computer_science) -In this simple but effective manner, we are able to capture a consistent +In this simple but effective manner, PyPI is able to capture a consistent snapshot of all projects and the associated metadata at a given time. 
The next -subsection will explicate the implementation details of this idea. +subsection provides implementation details of this idea. + +Note: This PEP does not prohibit using advanced file systems or tools to +produce consistent snapshots. There are two important reasons for why this PEP +proposes the simple solution. First, the solution does not mandate that PyPI +use any particular file system or tool. Second, the generic file-system based +approach allows mirrors to use extant file transfer tools such as rsync to +efficiently transfer consistent snapshots from PyPI. Producing Consistent Snapshots ------------------------------ -Given a project, PyPI is responsible for updating, depending on the project, -either the *claimed*, *recently-claimed* or *unclaimed* metadata as well as -associated delegated targets metadata. Every project MUST upload its set of -metadata and targets in a single transaction. We will call this set of files -the project transaction. We will discuss later how PyPI MAY validate the files -in a project transaction. For now, let us focus on how PyPI will respond to a -project transaction. We will call this response the project transaction -process. There will also be a consistent snapshot process that we will define -momentarily; for now, it suffices to know that project transaction processes -and the consistent snapshot process must coordinate with each other. +Given a project, PyPI is responsible for updating the *bins* metadata (roles +delegated by the *bins* role and signed with an online key). Every project +MUST upload its release in a single transaction. The uploaded set of files is +called the "project transaction". How PyPI MAY validate the files in a project +transaction is discussed in a later section. For now, the focus is on how PyPI +will respond to a project transaction. -Also, every metadata and target file MUST include in its filename the `hex -digest`__ of its `SHA-256`__ hash. 
For this PEP, it is RECOMMENDED that PyPI -adopt a simple convention of the form filename.digest.ext, where filename is -the original filename without a copy of the hash, digest is the hex digest of -the hash, and ext is the filename extension. +Every metadata and target file MUST include in its filename the `hex digest`__ +of its `SHA-256`__ hash. For this PEP, it is RECOMMENDED that PyPI adopt a +simple convention of the form: digest.filename, where filename is the original +filename without a copy of the hash, and digest is the hex digest of the hash. __ http://docs.python.org/2/library/hashlib.html#hashlib.hash.hexdigest __ https://en.wikipedia.org/wiki/SHA-2 -When an *unclaimed* project uploads a new transaction, a project transaction -process MUST add all new targets and relevant delegated *unclaimed* metadata. -(We will see later in this section why the *unclaimed* role will delegate -targets to a number of delegated *unclaimed* roles.) Finally, the project -transaction process MUST inform the consistent snapshot process about new -delegated *unclaimed* metadata. +When a project uploads a new transaction, the project transaction process MUST +add all new targets and relevant delegated *bins* metadata. (It is shown later +in this section why the *bins* role will delegate targets to a number of +delegated *bins* roles.) Finally, the project transaction process MUST inform +the snapshot process about new delegated *bins* metadata. -When a *recently-claimed* project uploads a new a transaction, a project -transaction process MUST add all new targets and delegated targets metadata for -the project. If the project is new, then the project transaction process MUST -also add new *recently-claimed* metadata with public keys and threshold number -(which MUST be part of the transaction) for the project. 
Finally, the project -transaction process MUST inform the consistent snapshot process about new -*recently-claimed* metadata as well as the current set of delegated targets -metadata for the project. - -The process for a *claimed* project is slightly different. The difference is -that PyPI administrators will choose to move the project from the -*recently-claimed* role to the *claimed* role. A project transaction process -MUST then add new *recently-claimed* and *claimed* metadata to reflect this -migration. As is the case for a *recently-claimed* project, the project -transaction process MUST always add all new targets and delegated targets -metadata for the *claimed* project. Finally, the project transaction process -MUST inform the consistent snapshot process about new *recently-claimed* or -*claimed* metadata as well as the current set of delegated targets metadata for -the project. - -Project transaction processes SHOULD be automated, except when PyPI -administrators move a project from the *recently-claimed* role to the *claimed* -role. Project transaction processes MUST also be applied atomically: either -all metadata and targets, or none of them, are added. The project transaction -processes and consistent snapshot process SHOULD work concurrently. Finally, -project transaction processes SHOULD keep in memory the latest *claimed*, -*recently-claimed* and *unclaimed* metadata so that they will be correctly -updated in new consistent snapshots. +Project transaction processes SHOULD be automated and MUST also be applied +atomically: either all metadata and targets -- or none of them -- are added. +The project transaction and snapshot processes SHOULD work concurrently. +Finally, project transaction processes SHOULD keep in memory the latest *bins* +metadata so that they will be correctly updated in new consistent snapshots. All project transactions MAY be placed in a single queue and processed serially. 
Alternatively, the queue MAY be processed concurrently in order of -appearance provided that the following rules are observed: +appearance, provided that the following rules are observed: 1. No pair of project transaction processes must concurrently work on the same project. 2. No pair of project transaction processes must concurrently work on - *unclaimed* projects that belong to the same delegated *unclaimed* targets + *bins* projects that belong to the same delegated *bins* targets role. -3. No pair of project transaction processes must concurrently work on new - *recently-claimed* projects. - -4. No pair of project transaction processes must concurrently work on new - *claimed* projects. - -5. No project transaction process must work on a new *claimed* project while - another project transaction process is working on a new *recently-claimed* - project and vice versa. - These rules MUST be observed so that metadata is not read from or written to inconsistently. -The consistent snapshot process is fairly simple and SHOULD be automated. The -consistent snapshot process MUST keep in memory the latest working set of -*root*, *targets* and delegated targets metadata. Every minute or so, the -consistent snapshot process will sign for this latest working set. (Recall -that project transaction processes continuously inform the consistent snapshot -process about the latest delegated targets metadata in a concurrency-safe -manner. The consistent snapshot process will actually sign for a copy of the -latest working set while the actual latest working set in memory will be -updated with information continuously communicated by project transaction -processes.) Next, the consistent snapshot process MUST generate and sign new -*timestamp* metadata that will vouch for the *consistent-snapshot* metadata -generated in the previous step. 
Finally, the consistent snapshot process MUST -add new *timestamp* and *consistent-snapshot* metadata representing the latest -consistent snapshot. + +Snapshot Process +---------------- + +The snapshot process is fairly simple and SHOULD be automated. The snapshot +process MUST keep in memory the latest working set of *root*, *targets*, and +delegated roles. Every minute or so, the snapshot process will sign for this +latest working set. (Recall that project transaction processes continuously +inform the snapshot process about the latest delegated metadata in a +concurrency-safe manner. The snapshot process will actually sign for a copy of +the latest working set while the latest working set in memory will be updated +with information that is continuously communicated by the project transaction +processes.) The snapshot process MUST generate and sign new *timestamp* +metadata that will vouch for the metadata (*root*, *targets*, and delegated +roles) generated in the previous step. Finally, the snapshot process MUST make +available to clients the new *timestamp* and *snapshot* metadata representing +the latest snapshot. A few implementation notes are now in order. So far, we have seen only that new metadata and targets are added, but not that old metadata and targets are @@ -508,30 +650,19 @@ disk space to produce a new consistent snapshot. In that case, PyPI MAY then use something like a "mark-and-sweep" algorithm to delete sufficiently old consistent snapshots: in order to preserve the latest consistent snapshot, PyPI would walk objects beginning from the root (*timestamp*) of the latest -consistent snapshot, mark all visited objects, and delete all unmarked -objects. The last few consistent snapshots may be preserved in a similar -fashion. Deleting a consistent snapshot will cause clients to see nothing -thereafter but HTTP 404 responses to any request for a file in that consistent -snapshot. 
Clients SHOULD then retry their requests with the latest consistent +consistent snapshot, mark all visited objects, and delete all unmarked objects. +The last few consistent snapshots may be preserved in a similar fashion. +Deleting a consistent snapshot will cause clients to see nothing except HTTP +404 responses to any request for a file within that consistent snapshot. +Clients SHOULD then retry (as before) their requests with the latest consistent snapshot. -We do **not** consider updates to any consistent snapshot because `hash -collisions`__ are out of the scope of this PEP. In case a hash collision is -observed, PyPI MAY wish to check that the file being added is identical to the -file already stored. (Should a hash collision be observed, it is far more -likely the case that the file is identical rather than being a genuine -`collision attack`__.) Otherwise, PyPI MAY either overwrite the existing file -or ignore any write operation to an existing file. - -__ https://en.wikipedia.org/wiki/Collision_(computer_science) -__ https://en.wikipedia.org/wiki/Collision_attack - All clients, such as pip using the TUF protocol, MUST be modified to download every metadata and target file (except for *timestamp* metadata) by including, -in the request for the file, the hash of the file in the filename. Following -the filename convention recommended earlier, a request for the file at -filename.ext will be transformed to the equivalent request for the file at -filename.digest.ext. +in the request for the file, the cryptographic hash of the file in the +filename. Following the filename convention recommended earlier, a request for +the file at filename.ext will be transformed to the equivalent request for the +file at digest.filename. Finally, PyPI SHOULD use a `transaction log`__ to record project transaction processes and queues so that it will be easier to recover from errors after a @@ -540,487 +671,367 @@ server failure. 
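The digest.filename convention recommended earlier could be sketched as follows. This is a minimal illustration, not part of PyPI or the TUF library: the helper name `consistent_path` is hypothetical, and it assumes the hash is the SHA-256 hex digest of the file's content and that only the final path component is renamed.

```python
import hashlib
import posixpath


def consistent_path(path: str, content: bytes) -> str:
    """Return the path with the SHA-256 hex digest prefixed to the filename.

    Hypothetical helper: "dir/filename" becomes "dir/<digest>.filename".
    """
    digest = hashlib.sha256(content).hexdigest()
    directory, filename = posixpath.split(path)
    # Only the last path component changes; the directory is kept as-is.
    return posixpath.join(directory, f"{digest}.{filename}")
```

A client that has read the expected hash of a file from signed metadata could build the request path the same way, using the digest from the metadata instead of hashing the content itself.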
__ https://en.wikipedia.org/wiki/Transaction_log -Metadata Validation -------------------- - -A *claimed* or *recently-claimed* project will need to upload in its -transaction to PyPI not just targets (a simple index as well as distributions) -but also TUF metadata. The project MAY do so by uploading a ZIP file -containing two directories, /metadata/ (containing delegated targets metadata -files) and /targets/ (containing targets such as the project simple index and -distributions which are signed for by the delegated targets metadata). - -Whenever the project uploads metadata or targets to PyPI, PyPI SHOULD check the -project TUF metadata for at least the following properties: - -* A threshold number of the developers keys registered with PyPI by that - project MUST have signed for the delegated targets metadata file that - represents the "root" of targets for that project (e.g. metadata/targets/ - project.txt). - -* The signatures of delegated targets metadata files MUST be valid. - -* The delegated targets metadata files MUST NOT be expired. - -* The delegated targets metadata MUST be consistent with the targets. - -* A delegator MUST NOT delegate targets that were not delegated to itself by - another delegator. - -* A delegatee MUST NOT sign for targets that were not delegated to itself by a - delegator. - -* Every file MUST contain a unique copy of its hash in its filename following - the filename.digest.ext convention recommended earlier. - -If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to -reject publishing any set of metadata or targets that do not meet these -requirements. - -PyPI MUST enforce access control by ensuring that each project can only write -to the TUF metadata for which it is responsible. It MUST do so by ensuring -that project transaction processes write to the correct metadata as well as -correct locations within those metadata. 
For example, a project transaction -process for an *unclaimed* project MUST write to the correct target paths in -the correct delegated *unclaimed* metadata for the targets of the project. - -On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects -in a backward-incompatible manner. Note that PyPI will NOT be able to -automatically rewrite existing TUF metadata on behalf of projects in order to -upgrade the metadata to the new backward-incompatible format because this would -invalidate the signatures of the metadata as signed by developer keys. -Instead, package managers SHOULD be written to recognize and handle multiple -incompatible versions of TUF metadata so that *claimed* and *recently-claimed* -projects could be offered a reasonable time to migrate their metadata to newer -but backward-incompatible formats. - -The details of how each project manages its TUF metadata is beyond the scope of -this PEP. - - -Mirroring Protocol ------------------- - -The mirroring protocol as described in PEP 381 [9]_ SHOULD change to mirror -PyPI with TUF. - -A mirror SHOULD have to maintain for its clients only one consistent snapshot -which would represent the latest consistent snapshot from PyPI known to the -mirror. The mirror would then serve all HTTP requests for metadata or targets -by simply reading directly from this consistent snapshot directory. - -The mirroring protocol itself is fairly simple. The mirror would ask PyPI for -*timestamp* metadata from the latest consistent snapshot and proceed to copy -the entire consistent snapshot from the *timestamp* metadata onwards. If the -mirror encounters a failure to copy any metadata or target file while copying -the consistent snapshot, it SHOULD retrying resuming the copy of that -particular consistent snapshot. If PyPI has deleted that consistent snapshot, -then the mirror SHOULD delete the failed consistent snapshot and try -downloading the latest consistent snapshot instead. 
- -The mirror SHOULD point users to a previous consistent snapshot directory while -it is copying the latest consistent snapshot from PyPI. Only after the latest -consistent snapshot has been completely copied SHOULD the mirror switch clients -to the latest consistent snapshot. The mirror MAY then delete the previous -consistent snapshot once it finds that no client is reading from the previous -consistent snapshot. - -The mirror MAY use extant file transfer software such as rsync__ to mirror -PyPI. In that case, the mirror MUST first obtain the latest known timestamp -metadata from PyPI. The mirror MUST NOT immediately publish the latest known -timestamp metadata from PyPI. Instead, the mirror MUST first iteratively -transfer all new files from PyPI until there are no new files left to transfer. -Finally, the mirror MUST publish the latest known timestamp it fetched from -PyPI so that package managers such as pip may be directed to the latest -consistent snapshot known to the mirror. - -__ https://rsync.samba.org/ - - -Backup Process --------------- - -In order to be able to safely restore from static snapshots later in the event -of a compromise, PyPI SHOULD maintain a small number of its own mirrors to copy -PyPI consistent snapshots according to some schedule. The mirroring protocol -can be used immediately for this purpose. The mirrors must be secured and -isolated such that they are responsible only for mirroring PyPI. The mirrors -can be checked against one another to detect accidental or malicious failures. - - -Metadata Expiry Times ---------------------- - -The *root* and *targets* role metadata SHOULD expire in a year, because these -metadata files are expected to change very rarely. - -The *claimed* role metadata SHOULD expire in three to six months, because this -metadata is expected to be refreshed in that time frame. This time frame was -chosen to induce an easier administration process for PyPI. 
- -The *timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed* role -metadata SHOULD expire in a day because a CDN or mirror SHOULD synchronize -itself with PyPI every day. Furthermore, this generous time frame also takes -into account client clocks that are highly skewed or adrift. - -The expiry times for the delegated targets metadata of a project is beyond the -scope of this PEP. - - -Metadata Scalability --------------------- - -Due to the growing number of projects and distributions, the TUF metadata will -also grow correspondingly. - -For example, consider the *unclaimed* role. In August 2013, we found that the -size of the *unclaimed* role metadata was about 42MB if the *unclaimed* role -itself signed for about 220K PyPI targets (which are simple indices and -distributions). We will not delve into details in this PEP, but TUF features a -so-called "`lazy bin walk`__" scheme which splits a large targets or delegated -targets metadata file into many small ones. This allows a TUF client updater -to intelligently download only a small number of TUF metadata files in order to -update any project signed for by the *unclaimed* role. For example, applying -this scheme to the previous repository resulted in pip downloading between -1.3KB and 111KB to install or upgrade a PyPI project via TUF. - -__ https://github.com/theupdateframework/tuf/issues/39 - -From our findings as of the time of writing, PyPI SHOULD split all targets in -the *unclaimed* role by delegating it to 1024 delegated targets role, each of -which would sign for PyPI targets whose hashes fall into that "bin" or -delegated targets role. We found that 1024 bins would result in the -*unclaimed* role metadata and each of its binned delegated targets role -metadata to be about the same size (40-50KB) for about 220K PyPI targets -(simple indices and distributions). 
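As a rough illustration of the binning idea described above, the sketch below assigns each target to one of 1024 bins by hashing its path. It is only a sketch under stated assumptions: the use of SHA-256 over the target path, the modulo mapping, and the names ``bin_index`` and ``assign_bins`` are hypothetical and are not taken from TUF's implementation or from this PEP.

```python
import hashlib

# Number of bins suggested in the text above; everything else is assumed.
NUMBER_OF_BINS = 1024


def bin_index(target_path, number_of_bins=NUMBER_OF_BINS):
    """Map a target path to a bin number (hypothetical helper).

    Assumption: the bin is chosen from the SHA-256 digest of the target
    path, reduced modulo the number of bins.
    """
    digest = hashlib.sha256(target_path.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % number_of_bins


def assign_bins(target_paths, number_of_bins=NUMBER_OF_BINS):
    """Group target paths by bin so each bin role signs only its own share."""
    bins = {}
    for path in target_paths:
        bins.setdefault(bin_index(path, number_of_bins), []).append(path)
    return bins


if __name__ == "__main__":
    example = ["simple/foo/index.html", "packages/foo-1.0.tar.gz"]
    for index, paths in sorted(assign_bins(example).items()):
        print(index, paths)
```

Because the mapping depends only on the path, a client could compute the same bin number locally and fetch just that one delegated metadata file, which is the property the size figures above rely on.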
- -It is possible to make the TUF metadata more compact by representing it in a -binary format as opposed to the JSON text format. Nevertheless, we believe -that a sufficiently large number of project and distributions will induce -scalability challenges at some point, and therefore the *unclaimed* role will -then still need delegations in order to address the problem. Furthermore, the -JSON format is an open and well-known standard for data interchange. - -Due to the large number of delegated target metadata files, compressed versions -of *consistent-snapshot* metadata SHOULD also be made available. - - -Key Management -============== - -In this section, we examine the kind of keys required to sign for TUF roles on -PyPI. TUF is agnostic with respect to choices of digital signature algorithms. -For the purpose of discussion, we will assume that most digital signatures will -be produced with the well-tested and tried RSA algorithm [20]_. Nevertheless, -we do NOT recommend any particular digital signature algorithm in this PEP -because there are a few important constraints: firstly, cryptography changes -over time; secondly, package managers such as pip may wish to perform signature -verification in Python, without resorting to a compiled C library, in order to -be able to run on as many systems as Python supports; finally, TUF recommends -diversity of keys for certain applications, and we will soon discuss these -exceptions. - - -Number Of Keys --------------- - -The *timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed* -roles will need to support continuous delivery. Even though their respective -keys will then need to be online, we will require that the keys be independent -of each other. This allows for each of the keys to be placed on separate -servers if need be, and prevents side channel attacks that compromise one key -from automatically compromising the rest of the keys. 
Therefore, each of the -*timestamp*, *consistent-snapshot*, *recently-claimed* and *unclaimed* roles -MUST require (1, 1) keys. - -The *unclaimed* role MAY delegate targets in an automated manner to a number of -roles called "bins", as we discussed in the previous section. Each of the -"bin" roles SHOULD share the same key as the *unclaimed* role, due -simultaneously to space efficiency of metadata and because there is no security -advantage in requiring separate keys. - -The *root* role is critical for security and should very rarely be used. It is -primarily used for key revocation, and it is the root of trust for all of PyPI. -The *root* role signs for the keys that are authorized for each of the -top-level roles (including itself). The keys belonging to the *root* role are -intended to be very well-protected and used with the least frequency of all -keys. We propose that every PSF board member own a (strong) root key. A -majority of them can then constitute the quorum to revoke or endow trust in all -top-level keys. Alternatively, the system administrators of PyPI (instead of -PSF board members) could be responsible for signing for the *root* role. -Therefore, the *root* role SHOULD require (t, n) keys, where n is the number of -either all PyPI administrators or all PSF board members, and t > 1 (so that at -least two members must sign the *root* role). - -The *targets* role will be used only to sign for the static delegation of all -targets to the *claimed*, *recently-claimed* and *unclaimed* roles. Since -these target delegations must be secured against attacks in the event of a -compromise, the keys for the *targets* role MUST be offline and independent -from other keys. For simplicity of key management without sacrificing -security, it is RECOMMENDED that the keys of the *targets* role are permanently -discarded as soon as they have been created and used to sign for the role. -Therefore, the *targets* role SHOULD require (1, 1) keys. 
Again, this is -because the keys are going to be permanently discarded, and more offline keys -will not help against key recovery attacks [21]_ unless diversity of keys is -maintained. - -Similarly, the *claimed* role will be used only to sign for the dynamic -delegation of projects to their respective developer keys. Since these target -delegations must be secured against attacks in the event of a compromise, the -keys for the *claimed* role MUST be offline and independent from other keys. -Therefore, the *claimed* role SHOULD require (t, n) keys, where n is the number -of all PyPI administrators (in order to keep it manageable), and t ≥ 1 (so that -at least one member MUST sign the *claimed* role). While a stronger threshold -would indeed render the role more robust against a compromise of the *claimed* -keys (which is highly unlikely assuming that the keys are independent and -securely kept offline), we think that this trade-off is acceptable for the -important purpose of keeping the maintenance overhead for PyPI administrators -as little as possible. At the time of writing, we are keeping this point open -for discussion by the distutils-sig community. - -The number of developer keys is project-specific and thus beyond the scope of -this PEP. - - -Online and Offline Keys ------------------------ - -In order to support continuous delivery, the *timestamp*, -*consistent-snapshot*, *recently-claimed* and *unclaimed* role keys MUST be -online. - -As explained in the previous section, the *root*, *targets* and *claimed* role -keys MUST be offline for maximum security. Developers keys will be offline in -the sense that the private keys MUST NOT be stored on PyPI, though some of them -may be online on the private infrastructure of the project. - - -Key Strength ------------- - -At the time of writing, we recommend that all RSA keys (both offline and -online) SHOULD have a minimum key size of 3072 bits for data-protection -lifetimes beyond 2030 [22]_. 
- - -Diversity Of Keys ------------------ - -Due to the threats of weak key generation and implementation weaknesses [2]_, -the types of keys as well as the libraries used to generate them should vary -within TUF on PyPI. Our current implementation of TUF supports multiple -digital signature algorithms such as RSA (with OpenSSL [23]_ or PyCrypto [24]_) -and ed25519 [25]_. Furthermore, TUF supports the binding of other -cryptographic libraries that it does not immediately support "out of the box", -and so one MAY generate keys using other cryptographic libraries and use them -for TUF on PyPI. - -As such, the root role keys SHOULD be generated by a variety of digital -signature algorithms as implemented by different cryptographic libraries. - - Key Compromise Analysis ------------------------ +======================= -.. image:: https://raw.github.com/theupdateframework/pep-on-pypi-with-tuf/master/table1.png +This PEP has covered the minimum security model, the TUF roles that should be +added to support continuous delivery of distributions, and how to generate and +sign the metadata of each role. The remaining sections discuss how PyPI +SHOULD audit repository metadata, and the methods PyPI can use to detect and +recover from a PyPI compromise. -Table 1: Attacks possible by compromising certain combinations of role keys +Table 1 summarizes a few of the attacks possible when a threshold number of +private cryptographic keys (belonging to any of the PyPI roles) are +compromised. The leftmost column lists the roles (or a combination of roles) +that have been compromised, and the columns to its right show whether the +compromised roles leave clients susceptible to malicious updates, a freeze +attack, or metadata inconsistency attacks. 
++-----------------+-------------------+----------------+--------------------------------+
+| Role Compromise | Malicious Updates | Freeze Attack  | Metadata Inconsistency Attacks |
++=================+===================+================+================================+
+| timestamp       | NO                | YES            | NO                             |
+|                 | snapshot and      | limited by     | snapshot needs to cooperate    |
+|                 | targets or any    | earliest root, |                                |
+|                 | of the bins need  | targets, or    |                                |
+|                 | to cooperate      | bin expiry     |                                |
+|                 |                   | time           |                                |
++-----------------+-------------------+----------------+--------------------------------+
+| snapshot        | NO                | NO             | NO                             |
+|                 | timestamp and     | timestamp      | timestamp needs to cooperate   |
+|                 | targets or any of | needs to       |                                |
+|                 | the bins need to  | cooperate      |                                |
+|                 | cooperate         |                |                                |
++-----------------+-------------------+----------------+--------------------------------+
+| timestamp       | NO                | YES            | YES                            |
+| **AND**         | targets or any    | limited by     | limited by earliest root,      |
+| snapshot        | of the bins need  | earliest root, | targets, or bin metadata       |
+|                 | to cooperate      | targets, or    | expiry time                    |
+|                 |                   | bin metadata   |                                |
+|                 |                   | expiry time    |                                |
++-----------------+-------------------+----------------+--------------------------------+
+| targets         | NO                | NOT APPLICABLE | NOT APPLICABLE                 |
+| **OR**          | timestamp and     | need timestamp | need timestamp and snapshot    |
+| bin             | snapshot need to  | and snapshot   |                                |
+|                 | cooperate         |                |                                |
++-----------------+-------------------+----------------+--------------------------------+
+| timestamp       | YES               | YES            | YES                            |
+| **AND**         |                   | limited by     | limited by earliest root,      |
+| snapshot        |                   | earliest root, | targets, or bin metadata       |
+| **AND**         |                   | targets, or    | expiry time                    |
+| bin             |                   | bin metadata   |                                |
+|                 |                   | expiry time    |                                |
++-----------------+-------------------+----------------+--------------------------------+
+| root            | YES               | YES            | YES                            |
++-----------------+-------------------+----------------+--------------------------------+
-Table 1 summarizes the
kinds of attacks rendered possible by compromising a -threshold number of keys belonging to the TUF roles on PyPI. Except for the -*timestamp* and *consistent-snapshot* roles, the pairwise interaction of role -compromises may be found by taking the union of both rows. +Table 1: Attacks possible by compromising certain combinations of role keys. +In `September 2013`__, it was shown how the latest version (at the time) of pip +was susceptible to these attacks and how TUF could protect users against them +[14]_. -In September 2013, we showed how the latest version of pip then was susceptible -to these attacks and how TUF could protect users against them [14]_. +__ https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html -An attacker who compromises developer keys for a project and who is able to -somehow upload malicious metadata and targets to PyPI will be able to serve -malicious updates to users of that project (and that project alone). Note that -compromising *targets* or any delegated targets role (except for project -targets metadata) does not immediately endow the attacker with the ability to -serve malicious updates. The attacker must also compromise the *timestamp* and -*consistent-snapshot* roles (which are both online and therefore more likely to -be compromised). This means that in order to launch any attack, one must be -not only be able to act as a man-in-the-middle but also compromise the -*timestamp* key (or the *root* keys and sign a new *timestamp* key). To launch -any attack other than a freeze attack, one must also compromise the -*consistent-snapshot* key. +Note that compromising *targets* or any delegated role (except for project +targets metadata) does not immediately allow an attacker to serve malicious +updates. The attacker must also compromise the *timestamp* and *snapshot* +roles (which are both online and therefore more likely to be compromised). 
+This means that in order to launch any attack, one must not only be able to +act as a man-in-the-middle but also compromise the *timestamp* key (or +compromise the *root* keys and sign a new *timestamp* key). To launch any +attack other than a freeze attack, one must also compromise the *snapshot* key. Finally, a compromise of the PyPI infrastructure MAY introduce malicious -updates to *recently-claimed* and *unclaimed* projects because the keys for -those roles are online. However, attackers cannot modify *claimed* projects in -such an event because *targets* and *claimed* metadata have been signed with -offline keys. Therefore, it is RECOMMENDED that high-value projects register -their developer keys with PyPI and sign for their own distributions. +updates to *bins* projects because the keys for these roles are online. The +maximum security model discussed in the appendix addresses this issue. PEP X +[VD: Link to PEP once it is completed] also covers the maximum security model +and goes into more detail on generating developer keys and signing uploaded +distributions. In the Event of a Key Compromise -------------------------------- -By a key compromise, we mean that the key as well as PyPI infrastructure has -been compromised and used to sign new metadata on PyPI. +A key compromise means that a threshold of keys (belonging to the metadata +roles on PyPI), as well as the PyPI infrastructure, have been compromised and +used to sign new metadata on PyPI. -If a threshold number of developer keys of a project have been compromised, -then the project MUST take the following steps: +If a threshold number of *timestamp*, *snapshot*, or *bins* keys have +been compromised, then PyPI MUST take the following steps: -1. The project metadata and targets MUST be restored to the last known good - consistent snapshot where the project was not known to be compromised. This - can be done by the developers repackaging and resigning all targets with the - new keys. - -2. 
The project delegated targets metadata MUST have their version numbers - incremented, expiry times suitably extended and signatures renewed. - -Whereas PyPI MUST take the following steps: - -1. Revoke the compromised developer keys from the delegation to the project by - the *recently-claimed* or *claimed* role. This is done by replacing the - compromised developer keys with newly issued developer keys. - -2. A new timestamped consistent snapshot MUST be issued. - -If a threshold number of *timestamp*, *consistent-snapshot*, *recently-claimed* -or *unclaimed* keys have been compromised, then PyPI MUST take the following -steps: - -1. Revoke the *timestamp*, *consistent-snapshot* and *targets* role keys from +1. Revoke the *timestamp*, *snapshot* and *targets* role keys from the *root* role. This is done by replacing the compromised *timestamp*, - *consistent-snapshot* and *targets* keys with newly issued keys. + *snapshot* and *targets* keys with newly issued keys. -2. Revoke the *recently-claimed* and *unclaimed* keys from the *targets* role - by replacing their keys with newly issued keys. Sign the new *targets* role - metadata and discard the new keys (because, as we explained earlier, this - increases the security of *targets* metadata). +2. Revoke the *bins* keys from the *targets* role by replacing their keys with + newly issued keys. Sign the new *targets* role metadata and discard the new + keys (because, as explained earlier, this increases the security of + *targets* metadata). -3. Clear all targets or delegations in the *recently-claimed* role and delete - all associated delegated targets metadata. Recently registered projects - SHOULD register their developer keys again with PyPI. - -4. All targets of the *recently-claimed* and *unclaimed* roles SHOULD be - compared with the last known good consistent snapshot where none of the - *timestamp*, *consistent-snapshot*, *recently-claimed* or *unclaimed* keys +3. 
All targets of the *bins* roles SHOULD be compared with the last known + good consistent snapshot where none of the *timestamp*, *snapshot*, or + *bins* keys were known to have been compromised. Added, updated or deleted targets in the compromised consistent snapshot that do not match the last known good consistent snapshot MAY be restored to their previous versions. After - ensuring the integrity of all *unclaimed* targets, the *unclaimed* metadata + ensuring the integrity of all *bins* targets, the *bins* metadata MUST be regenerated. -5. The *recently-claimed* and *unclaimed* metadata MUST have their version - numbers incremented, expiry times suitably extended and signatures renewed. +4. The *bins* metadata MUST have their version numbers incremented, expiry + times suitably extended, and signatures renewed. -6. A new timestamped consistent snapshot MUST be issued. +5. A new timestamped consistent snapshot MUST be issued. -This would preemptively protect all of these roles even though only one of them -may have been compromised. +Following these steps would preemptively protect all of these roles even though +only one of them may have been compromised. -If a threshold number of the *targets* or *claimed* keys have been compromised, -then there is little that an attacker could do without the *timestamp* and -*consistent-snapshot* keys. In this case, PyPI MUST simply revoke the -compromised *targets* or *claimed* keys by replacing them with new keys in the -*root* and *targets* roles respectively. - -If a threshold number of the *timestamp*, *consistent-snapshot* and *claimed* -keys have been compromised, then PyPI MUST take the following steps in addition -to the steps taken when either the *timestamp* or *consistent-snapshot* keys -are compromised: - -1. Revoke the *claimed* role keys from the *targets* role and replace them with - newly issued keys. - -2. 
All project targets of the *claimed* roles SHOULD be compared with the last - known good consistent snapshot where none of the *timestamp*, - *consistent-snapshot* or *claimed* keys were known to have been compromised. - Added, updated or deleted targets in the compromised consistent snapshot - that do not match the last known good consistent snapshot MAY be restored to - their previous versions. After ensuring the integrity of all *claimed* - project targets, the *claimed* metadata MUST be regenerated. - -3. The *claimed* metadata MUST have their version numbers incremented, expiry - times suitably extended and signatures renewed. - -If a threshold number of the *timestamp*, *consistent-snapshot* and *targets* -keys have been compromised, then PyPI MUST take the union of the steps taken -when the *claimed*, *recently-claimed* and *unclaimed* keys have been -compromised. - -If a threshold number of the *root* keys have been compromised, then PyPI MUST -take the steps taken when the *targets* role has been compromised as well as -replace all of the *root* keys. +If a threshold number of *root* keys have been compromised, then PyPI MUST take +the steps taken when the *targets* role has been compromised. All of the +*root* keys must also be replaced. It is also RECOMMENDED that PyPI sufficiently document compromises with security bulletins. These security bulletins will be most informative when -users of pip with TUF are unable to install or update a project because the -keys for the *timestamp*, *consistent-snapshot* or *root* roles are no longer -valid. They could then visit the PyPI web site to consult security bulletins -that would help to explain why they are no longer able to install or update, -and then take action accordingly. 
When a threshold number of *root* keys have -not been revoked due to a compromise, then new *root* metadata may be safely -updated because a threshold number of existing *root* keys will be used to sign -for the integrity of the new *root* metadata so that TUF clients will be able -to verify the integrity of the new *root* metadata with a threshold number of -previously known *root* keys. This will be the common case. Otherwise, in the -worst case where a threshold number of *root* keys have been revoked due to a +users of pip-with-TUF are unable to install or update a project because the +keys for the *timestamp*, *snapshot* or *root* roles are no longer valid. They +could then visit the PyPI web site to consult security bulletins that would +help to explain why they are no longer able to install or update, and then take +action accordingly. When a threshold number of *root* keys have not been +revoked due to a compromise, then new *root* metadata may be safely updated +because a threshold number of existing *root* keys will be used to sign for the +integrity of the new *root* metadata. TUF clients will be able to verify the +integrity of the new *root* metadata with a threshold number of previously +known *root* keys. This will be the common case. Otherwise, in the worst +case, where a threshold number of *root* keys have been revoked due to a compromise, an end-user may choose to update new *root* metadata with `out-of-band`__ mechanisms. __ https://en.wikipedia.org/wiki/Out-of-band#Authentication -Appendix: Rejected Proposals -============================ +Auditing Snapshots +------------------ + +If a malicious party compromises PyPI, they can sign arbitrary files with any +of the online keys. The roles with offline keys (i.e., *root* and *targets*) +are still protected. To safely recover from a repository compromise, snapshots +should be audited to ensure files are only restored to trusted versions. 
+ +When a repository compromise has been detected, the integrity of three types of +information must be validated: + +1. If the online keys of the repository have been compromised, they can be + revoked by having the *targets* role sign new metadata delegating to a new + key. + +2. If the role metadata on the repository has been changed, this would impact + the metadata that is signed by online keys. Any role information created + since the last period should be discarded. As a result, developers of new + projects will need to re-register their projects. + +3. If the packages themselves may have been tampered with, they can be + validated using the stored hash information for packages that existed at the + time of the last period. + +In order to safely restore snapshots in the event of a compromise, PyPI SHOULD +maintain a small number of its own mirrors to copy PyPI snapshots according to +some schedule. The mirroring protocol can be used immediately for this +purpose. The mirrors must be secured and isolated such that they are +responsible only for mirroring PyPI. The mirrors can be checked against one +another to detect accidental or malicious failures. + +Another approach is to generate the cryptographic hash of *snapshot* +periodically and tweet it. If a user later comes forward with the actual +metadata, the repository maintainers can verify it against the published +cryptographic hash. Alternatively, PyPI may periodically archive its own +versions of *snapshot* rather than rely on externally provided metadata. In +this case, PyPI SHOULD take the cryptographic hash of every package on the +repository and store this data on an offline device. If any package hash has +changed, this indicates an attack. + +Attacks that serve different versions of metadata, or that freeze a package at +a specific version, can be handled by TUF with techniques like implicit key +revocation and metadata mismatch detection [81]. 
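The package audit described in this section (comparing current packages against hashes stored at the time of the last trusted snapshot) could be sketched roughly as follows. This is a minimal illustration, not part of the PEP: the function names, the use of SHA-256, and the in-memory dictionaries standing in for the offline hash store and the repository contents are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a package's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_packages(
    trusted_hashes: Dict[str, str],
    current_files: Dict[str, bytes],
) -> Dict[str, List[str]]:
    """Compare current packages with hashes recorded at the last trusted snapshot.

    trusted_hashes: package name -> hex digest kept offline (assumed store).
    current_files:  package name -> package bytes now on the repository.
    """
    report = {"modified": [], "added": [], "deleted": []}
    for name, data in current_files.items():
        expected = trusted_hashes.get(name)
        if expected is None:
            # Not present in the trusted snapshot: must be re-registered.
            report["added"].append(name)
        elif sha256_hex(data) != expected:
            # Hash mismatch: the package may have been tampered with.
            report["modified"].append(name)
    for name in trusted_hashes:
        if name not in current_files:
            report["deleted"].append(name)
    for names in report.values():
        names.sort()
    return report


if __name__ == "__main__":
    trusted = {"foo-1.0.tar.gz": sha256_hex(b"original foo")}
    current = {"foo-1.0.tar.gz": b"tampered foo", "bar-2.0.tar.gz": b"new bar"}
    print(audit_packages(trusted, current))
```

Packages reported as modified or deleted would be candidates for restoration from the trusted snapshot, while added packages correspond to the projects that would need to be re-registered.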
-Alternative Proposals for Producing Consistent Snapshots --------------------------------------------------------- +Appendix A: Repository Attacks Prevented by TUF +=============================================== -The complete file snapshot (CFS) scheme uses file system directories to store -efficient consistent snapshots over time. In this scheme, every consistent -snapshot will be stored in a separate directory, wherein files that are shared -with previous consistent snapshots will be `hard links`__ instead of copies. +* **Arbitrary software installation**: An attacker installs anything they want + on the client system. That is, an attacker can provide arbitrary files in + response to download requests and the files will not be detected as + illegitimate. -__ https://en.wikipedia.org/wiki/Hard_link +* **Rollback attacks**: An attacker presents a software update system with + older files than those the client has already seen, causing the client to use + files older than those the client knows about. -The `differential file`__ snapshot (DFS) scheme is a variant of the CFS scheme, -wherein the next consistent snapshot directory will contain only the additions -of new files and updates to existing files of the previous consistent snapshot. -(The first consistent snapshot will contain a complete set of files known -then.) Deleted files will be marked as such in the next consistent snapshot -directory. This means that files will be resolved in this manner: First, set -the current consistent snapshot directory to be the latest consistent snapshot -directory. Then, any requested file will be seeked in the current consistent -snapshot directory. If the file exists in the current consistent snapshot -directory, then that file will be returned. If it has been marked as deleted -in the current consistent snapshot directory, then that file will be reported -as missing. 
Otherwise, the current consistent snapshot directory will be set -to the preceding consistent snapshot directory and the previous few steps will -be iterated until there is no preceding consistent snapshot to be considered, -at which point the file will be reported as missing. +* **Indefinite freeze attacks**: An attacker continues to present a software + update system with the same files the client has already seen. The result is + that the client does not know that new files are available. -__ http://dl.acm.org/citation.cfm?id=320484 +* **Endless data attacks**: An attacker responds to a file download request + with an endless stream of data, causing harm to clients (e.g., a disk + partition filling up or memory exhaustion). -With the CFS scheme, the trade-off is the I/O costs of producing a consistent -snapshot with the file system. As of October 2013, we found that a fairly -modern computer with a 7200RPM hard disk drive required at least three minutes -to produce a consistent snapshot with the "cp -lr" command on the ext3__ file -system. Perhaps the I/O costs of this scheme may be ameliorated with advanced -tools or file systems such as ZFS__ or btrfs__. +* **Slow retrieval attacks**: An attacker responds to clients with a very slow + stream of data that essentially results in the client never continuing the + update process. -__ https://en.wikipedia.org/wiki/Ext3 -__ https://en.wikipedia.org/wiki/ZFS -__ https://en.wikipedia.org/wiki/Btrfs +* **Extraneous dependencies attacks**: An attacker indicates to clients that in + order to install the software they wanted, they also need to install + unrelated software. This unrelated software can be from a trusted source + but may have known vulnerabilities that are exploitable by the attacker. -While the DFS scheme improves upon the CFS scheme in terms of producing faster -consistent snapshots, there are at least two trade-offs. 
The first is that a -web server will need to be modified to perform the "daisy chain" resolution of -a file. The second is that every now and then, the differential snapshots will -need to be "squashed" or merged together with the first consistent snapshot to -produce a new first consistent snapshot with the latest and complete set of -files. Although the merge cost may be amortized over time, this scheme is not -conceptually si +* **Mix-and-match attacks**: An attacker presents clients with a view of a + repository that includes files that never existed together on the repository + at the same time. This can result in, for example, outdated versions of + dependencies being installed. + +* **Wrong software installation**: An attacker provides a client with a trusted + file that is not the one the client wanted. + +* **Malicious mirrors preventing updates**: An attacker in control of one + repository mirror is able to prevent users from obtaining updates from + other, good mirrors. + +* **Vulnerability to key compromises**: An attacker who is able to compromise a + single key or less than a given threshold of keys can compromise clients. + This includes relying on a single online key (such as only being protected + by SSL) or a single offline key (such as most software update systems use + to sign files). +Appendix B: Extension to the Minimum Security Model +=================================================== + +The maximum security model and end-to-end signing have been intentionally +excluded from this PEP. Although both improve PyPI's ability to survive a +repository compromise and allow developers to sign their distributions, they +have been postponed for review as a potential future extension to PEP 458. PEP +X [VD: Link to PEP once it is completed], which discusses the extension in +detail, is available for review to those developers interested in the +end-to-end signing option. 
The maximum security model and end-to-end signing +are briefly covered in subsections that follow. + +There are several reasons for not initially supporting the features discussed +in this section: + +1. A build farm (distribution wheels on supported platforms are generated for + each project on PyPI infrastructure) may possibly complicate matters. PyPI + wants to support a build farm in the future. Unfortunately, if wheels are + auto-generated externally, developer signatures for these wheels are + unlikely. However, there might still be a benefit to generating wheels from + source distributions that are signed by developers (provided that + reproducible wheels are possible). Another possibility is to optionally + delegate trust of these wheels to an online role. + +2. An easy-to-use key management solution is needed for developers. + `miniLock`__ is one likely candidate for management and generation of keys. + Although developer signatures can remain optional, this approach may be + inadequate due to the great number of potentially unsigned dependencies each + distribution may have. If any one of these dependencies is unsigned, it + negates any benefit the project gains from signing its own distribution + (i.e., attackers would only need to compromise one of the unsigned + dependencies to attack end-users). Requiring developers to manually sign + distributions and manage keys is expected to render key signing an unused + feature. + + __ https://minilock.io/ + +3. A two-phase approach, where the minimum security model is implemented first + followed by the maximum security model, can simplify matters and give PyPI + administrators time to review the feasibility of end-to-end signing. + + +Maximum Security Model +---------------------- + +The maximum security model relies on developers signing their projects and +uploading signed metadata to PyPI. 
If the PyPI infrastructure were to be +compromised, attackers would be unable to serve malicious versions of claimed +projects without access to the project's developer key. Figure 3 depicts the +changes made to figure 2, namely that developer roles are now supported and +that three new delegated roles exist: *claimed*, *recently-claimed*, and +*unclaimed*. The *bins* role has been renamed *unclaimed* and can contain any +projects that have not been added to *claimed*. The strength of this model +(over the minimum security model) is in the offline keys provided by +developers. Although the minimum security model supports continuous delivery, +all of the projects are signed by an online key. An attacker can corrupt +packages in the minimum security model, but not in the maximum model without +also compromising a developer's key. + +.. image:: figure3.png + +Figure 3: An overview of the metadata layout in the maximum security model. +The maximum security model supports continuous delivery and survivable key +compromise. + + +End-to-End Signing +------------------ + +End-to-End signing allows both PyPI and developers to sign for the metadata +downloaded by clients. PyPI is trusted to make uploaded projects available to +clients (they sign the metadata for this part of the process), and developers +can sign the distributions that they upload. + +PEP X [VD: Link to PEP once it is completed] discusses the tools available to +developers who sign the distributions that they upload to PyPI. To summarize +PEP X, developers generate cryptographic keys and sign metadata in some +automated fashion, where the metadata includes the information required to +verify the authenticity of the distribution. The metadata is then uploaded to +PyPI by the client, where it will be available for download by package managers +such as pip (i.e., package managers that support TUF metadata). 
The entire +process is transparent to clients (using a package manager that supports TUF) +who download distributions from PyPI. + + +Appendix C: PEP 470 and Projects Hosted Externally +================================================== + +How should TUF handle distributions that are not hosted on PyPI? According to +`PEP 470`__, projects may opt to host their distributions externally and are +only required to provide PyPI a link to their external index, which package +managers like pip can use to find the project's distributions. PEP 470 does +not mention whether externally hosted projects are considered unverified by +default, as projects that use this option are not required to submit any +information about their distributions (e.g., file size and cryptographic hash) +when the project is registered, nor include a cryptographic hash of the file +in download links. + +__ http://www.python.org/dev/peps/pep-0470/ + +Potential approaches that PyPI administrators MAY consider to handle +projects hosted externally: + +1. Download external distributions but do not verify them. The targets + metadata will not include information for externally hosted projects. + +2. PyPI will periodically download information from the external index. PyPI + will gather the external distribution's file size and hashes and generate + appropriate TUF metadata. + +3. External projects MUST submit to PyPI the file size and cryptographic hash + for a distribution. + +4. External projects MUST upload to PyPI a developer public key for the + index. The distribution MUST create TUF metadata that is stored at the + index, and signed with the developer's corresponding private key. The + client will fetch the external TUF metadata as part of the package + update process. + +5. External projects MUST upload to PyPI signed TUF metadata (as allowed by + the maximum security model) about the distributions that they host + externally, and a developer public key.
Package managers verify + distributions by consulting the signed metadata uploaded to PyPI. + +Only one of the options listed above should be implemented on PyPI. Option +(4) or (5) is RECOMMENDED because external distributions are signed by +developers. External distributions that are forged (due to a compromised +PyPI account or external host) may be detected if external developers are +required to sign metadata, although this requirement is likely only practical +if an easy-to-use key management solution and developer scripts are provided +by PyPI. References @@ -1055,27 +1066,30 @@ References .. [24] https://pypi.python.org/pypi/pycrypto .. [25] http://ed25519.cr.yp.to/ - Acknowledgements ================ -Nick Coghlan, Daniel Holth and the distutils-sig community in general for -helping us to think about how to usably and efficiently integrate TUF with +This material is based upon work supported by the National Science Foundation +under Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, and +conclusions or recommendations expressed in this material are those of the +author(s) and do not necessarily reflect the views of the National Science +Foundation. + +We thank Nick Coghlan, Daniel Holth and the distutils-sig community in general +for helping us to think about how to usably and efficiently integrate TUF with PyPI. -Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin -Samuel for helping us to design TUF from its predecessor Thandy of the Tor -project. +Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin Samuel +helped us to design TUF from its predecessor Thandy of the Tor project. -Konstantin Andrianov, Geremy Condra, Vladimir Diaz, Zane Fisher, Justin Samuel, -Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng for helping us to develop -TUF. 
+We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher, +Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng to +develop TUF. -Vladimir Diaz, Monzur Muhammad and Sai Teja Peddinti for helping us to review -this PEP. - -Zane Fisher for helping us to review and transcribe this PEP. +Vladimir Diaz, Monzur Muhammad and Sai Teja Peddinti helped us to review this +PEP. +Zane Fisher helped us to review and transcribe this PEP. Copyright ========= diff --git a/pep-0480.txt b/pep-0480.txt new file mode 100644 index 000000000..f2f77ccf2 --- /dev/null +++ b/pep-0480.txt @@ -0,0 +1,890 @@ +PEP: 480 +Title: Surviving a Compromise of PyPI: The Maximum Security Model +Version: $Revision$ +Last-Modified: $Date$ +Author: Trishank Karthik Kuppusamy , + Vladimir Diaz , Donald Stufft , + Justin Cappos +BDFL-Delegate: Richard Jones +Discussions-To: DistUtils mailing list +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Requires: 458 +Created: 8-Oct-2014 + + +Abstract +======== + +Proposed is an extension to PEP 458 that adds support for end-to-end signing +and the maximum security model. End-to-end signing allows both PyPI and +developers to sign for the distributions that are downloaded by clients. The +minimum security model proposed by PEP 458 supports continuous delivery of +distributions (because they are signed by online keys), but that model does not +protect distributions in the event that PyPI is compromised. In the minimum +security model, attackers may sign for malicious distributions by compromising +the signing keys stored on PyPI infrastructure. The maximum security model, +described in this PEP, retains the benefits of PEP 458 (e.g., immediate +availability of distributions that are uploaded to PyPI), but additionally +ensures that end-users are not at risk of installing forged software if PyPI is +compromised.
+ +This PEP discusses the changes made to PEP 458 but excludes its informational +elements to primarily focus on the maximum security model. For example, an +overview of The Update Framework or the basic mechanisms in PEP 458 are not +covered here. The changes to PEP 458 include modifications to the snapshot +process, key compromise analysis, auditing snapshots, and the steps that should +be taken in the event of a PyPI compromise. The signing and key management +process that PyPI MAY RECOMMEND is discussed but not strictly defined. How the +release process should be implemented to manage keys and metadata is left to +the implementors of the signing tools. That is, this PEP delineates the +expected cryptographic key type and signature format included in metadata that +MUST be uploaded by developers in order to support end-to-end verification of +distributions. + + +Rationale +========= + +PEP 458 [1]_ proposes how PyPI should be integrated with The Update Framework +(TUF) [2]_. It explains how modern package managers like pip can be made more +secure, and the types of attacks that can be prevented if PyPI is modified on +the server side to include TUF metadata. Package managers can reference the +TUF metadata available on PyPI to download distributions more securely. + +PEP 458 also describes the metadata layout of the PyPI repository and employs +the minimum security model, which supports continuous delivery of projects and +uses online cryptographic keys to sign the distributions uploaded by +developers. Although the minimum security model guards against most attacks on +software updaters [5]_ [7]_, such as mix-and-match and extraneous dependencies +attacks, it can be improved to support end-to-end signing and to prohibit +forged distributions in the event that PyPI is compromised. 
+ +The main strength of PEP 458 and the minimum security model is the automated +and simplified release process: developers may upload distributions and then +have PyPI sign for their distributions. Much of the release process is handled +in an automated fashion by online roles and this approach requires storing +cryptographic signing keys on the PyPI infrastructure. Unfortunately, +cryptographic keys that are stored online are vulnerable to theft. The maximum +security model, proposed in this PEP, permits developers to sign for the +distributions that they make available to PyPI users, and does not put +end-users at risk of downloading malicious distributions if the online keys +stored on PyPI infrastructure are compromised. + + +Threat Model +============ + +The threat model assumes the following: + +* Offline keys are safe and securely stored. + +* Attackers can compromise at least one of PyPI's trusted keys that are stored + online, and may do so at once or over a period of time. + +* Attackers can respond to client requests. + +* Attackers may control any number of developer keys for projects a client does + not want to install. + +Attackers are considered successful if they can cause a client to install (or +leave installed) something other than the most up-to-date version of the +software the client is updating. When an attacker is preventing the +installation of updates, the attacker's goal is that clients not realize that +anything is wrong. + + +Definitions +=========== + +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", +"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be +interpreted as described in RFC `2119`__. + +__ http://www.ietf.org/rfc/rfc2119.txt + +This PEP focuses on integrating TUF with PyPI; however, the reader is +encouraged to read about TUF's design principles [2]_. 
It is also RECOMMENDED +that the reader be familiar with the TUF specification [3]_, and PEP 458 [1]_ +(which this PEP is extending). + +Terms used in this PEP are defined as follows: + +* Projects: Projects are software components that are made available for + integration. Projects include Python libraries, frameworks, scripts, + plugins, applications, collections of data or other resources, and various + combinations thereof. Public Python projects are typically registered on the + Python Package Index [4]_. + +* Releases: Releases are uniquely identified snapshots of a project [4]_. + +* Distributions: Distributions are the packaged files that are used to publish + and distribute a release. + +* Simple index: The HTML page that contains internal links to the + distributions of a project [4]_. + +* Roles: There is one *root* role in PyPI. There are multiple roles whose + responsibilities are delegated to them directly or indirectly by the *root* + role. The term "top-level role" refers to the *root* role and any role + delegated by the *root* role. Each role has a single metadata file that it is + trusted to provide. + +* Metadata: Metadata are files that describe roles, other metadata, and target + files. + +* Repository: A repository is a resource comprised of named metadata and target + files. Clients request metadata and target files stored on a repository. + +* Consistent snapshot: A set of TUF metadata and PyPI targets that capture the + complete state of all projects on PyPI as they existed at some fixed point in + time. + +* The *snapshot* (*release*) role: In order to prevent confusion due to the + different meanings of the term "release" used in PEP 426 [1]_ and the TUF + specification [3]_, the *release* role is renamed to the *snapshot* role. + +* Developer: Either the owner or maintainer of a project who is allowed to + update TUF metadata, as well as distribution metadata and files for a given + project. 
+ +* Online key: A private cryptographic key that MUST be stored on the PyPI + server infrastructure. This usually allows automated signing with the key. + An attacker who compromises the PyPI infrastructure will be able to + immediately read these keys. + +* Offline key: A private cryptographic key that MUST be stored independent of + the PyPI server infrastructure. This prevents automated signing with the + key. An attacker who compromises the PyPI infrastructure will not be able to + immediately read these keys. + +* Threshold signature scheme: A role can increase its resilience to key + compromises by specifying that at least t out of n keys are REQUIRED to sign + its metadata. A compromise of t-1 keys is insufficient to compromise the + role itself. Saying that a role requires (t, n) keys denotes the threshold + signature property. + + +Maximum Security Model +====================== + +The maximum security model permits developers to sign their projects and to +upload signed metadata to PyPI. If the PyPI infrastructure were compromised, +attackers would be unable to serve malicious versions of a *claimed* project +without having access to that project's developer key. Figure 1 depicts the +changes made to the metadata layout of the minimum security model, namely that +developer roles are now supported and that three new delegated roles exist: +*claimed*, *recently-claimed*, and *unclaimed*. The *bins* role from the +minimum security model has been renamed *unclaimed* and can contain any +projects that have not been added to *claimed*. The *unclaimed* role functions +just as before (i.e., as explained in PEP 458, projects added to this role are +signed by PyPI with an online key). Offline keys provided by developers ensure +the strength of the maximum security model over the minimum model. Although +the minimum security model supports continuous delivery of projects, all +projects are signed by an online key. 
That is, an attacker is able to corrupt +packages in the minimum security model, but not in the maximum model, without +also compromising a developer's key. + +.. image:: figure1.png + +Figure 1: An overview of the metadata layout in the maximum security model. +The maximum security model supports continuous delivery and survivable key +compromise. + +Projects that are signed by developers and uploaded to PyPI for the first time +are added to the *recently-claimed* role. The *recently-claimed* role uses an +online key, so projects uploaded for the first time are immediately available +to clients. After some time has passed, PyPI administrators MAY periodically +move (e.g., every month) projects listed in *recently-claimed* to the *claimed* +role for maximum security. The *claimed* role uses an offline key, thus +projects added to this role cannot be easily forged if PyPI is compromised. + +The *recently-claimed* role is separate from the *unclaimed* role for usability +and efficiency, not security. If new project delegations were prepended to +*unclaimed* metadata, *unclaimed* would need to be re-downloaded every time a +project obtained a key. By separating out new projects, the amount of data +retrieved is reduced. From a usability standpoint, it also makes it easier for +administrators to see which projects are now claimed. This information is +needed when moving keys from *recently-claimed* to *claimed*, which is +discussed in more detail in the "Producing Consistent Snapshots" section. + + +End-to-End Signing +================== + +End-to-end signing allows both PyPI and developers to sign for the metadata +downloaded by clients. PyPI is trusted to make uploaded projects available to +clients (PyPI signs the metadata for this part of the process), and developers +sign the distributions that they upload to PyPI. + +In order to delegate trust to a project, developers are required to submit a +public key to PyPI. 
PyPI takes the project's public key and adds it to parent +metadata that PyPI then signs. After the initial trust is established, +developers are required to sign distributions that they upload to PyPI using +the public key's corresponding private key. The signed TUF metadata that +developers upload to PyPI includes information like the distribution's file +size and hash, which package managers use to verify distributions that are +downloaded. + +The practical implications of end-to-end signing are the extra administrative +work needed to delegate trust to a project, and the signed metadata that +developers MUST upload to PyPI along with the distribution. Specifically, PyPI +is expected to periodically sign metadata with an offline key by adding +projects to the *claimed* metadata file and signing it. In contrast, projects +are only ever signed with an online key in the minimum security model. +End-to-end signing does require manual intervention to delegate trust (i.e., to +sign metadata with an offline key), but this is a one-time cost and projects +have stronger protections against PyPI compromises thereafter. + + +Metadata Signatures, Key Management, and Signing Distributions +============================================================== + +This section discusses the tools, signature scheme, and signing methods that +PyPI MAY recommend to implementors of the signing tools. Developers are +expected to use these tools to sign and upload distributions to PyPI. To +summarize the RECOMMENDED tools and schemes discussed in the subsections below, +developers MAY generate cryptographic keys and sign metadata (with the Ed25519 +signature scheme) in some automated fashion, where the metadata includes the +information required to verify the authenticity of the distribution. +Developers then upload metadata to PyPI, where it will be available for +download by package managers such as pip (i.e., package managers that support +TUF metadata).
The entire process is transparent to the end-users (using a +package manager that supports TUF) that download distributions from PyPI. + +The first three subsections (Cryptographic Signature Scheme, Cryptographic Key +Files, and Key Management) cover the cryptographic components of the developer +release process. That is, which key type PyPI supports, how keys may be +stored, and how keys may be generated. The two subsections that follow the +first three discuss the PyPI modules that SHOULD be modified to support TUF +metadata. For example, Twine and Distutils are two projects that SHOULD be +modified. Finally, the last subsection goes over the automated key management +and signing solution that is RECOMMENDED for the signing tools. + +TUF's design is flexible with respect to cryptographic key types, signatures, +and signing methods. The tools, modification, and methods discussed in the +following sections are RECOMMENDATIONS for the implementors of the signing +tools. + + +Cryptographic Signature Scheme: Ed25519 +--------------------------------------- + +The package manager (pip) shipped with CPython MUST work on non-CPython +interpreters and cannot have dependencies that have to be compiled (i.e., the +PyPI+TUF integration MUST NOT require compilation of C extensions in order to +verify cryptographic signatures). Verification of signatures MUST be done in +Python, and verifying RSA [11]_ signatures in pure-Python may be impractical due +to speed. Therefore, PyPI MAY use the `Ed25519`__ signature scheme. + +__ http://ed25519.cr.yp.to/ + +Ed25519 [12]_ is a public-key signature system that uses small cryptographic +signatures and keys. A `pure-Python implementation`__ of the Ed25519 signature +scheme is available. Verification of Ed25519 signatures is fast even when +performed in Python. 
+ +__ https://github.com/pyca/ed25519 + + +Cryptographic Key Files +----------------------- + +The implementation MAY encrypt key files with AES-256-CTR-Mode and strengthen +passwords with PBKDF2-HMAC-SHA256 (100K iterations by default, but this may be +overridden by the developer). The current Python implementation of TUF can use +any cryptographic library (support for PyCA cryptography will be added in the +future), allows the default number of PBKDF2 iterations to be overridden, and +allows the KDF to be tweaked to taste. + + +Key Management: miniLock +------------------------ + +An easy-to-use key management solution is needed. One solution is to derive a +private key from a password so that developers do not have to manage +cryptographic key files across multiple computers. `miniLock`__ is an example +of how this can be done. Developers may view the cryptographic key as a +secondary password. miniLock also works well with a signature scheme like +Ed25519, which only needs a very small key. + +__ https://github.com/kaepora/miniLock#-minilock + + +Third-party Upload Tools: Twine +------------------------------- + +Third-party tools like `Twine`__ MAY be modified (if they wish to support +distributions that include TUF metadata) to sign and upload developer projects +to PyPI. Twine is a utility for interacting with PyPI that uses TLS to upload +distributions, and prevents MITM attacks on usernames and passwords. + +__ https://github.com/pypa/twine + + +Distutils +--------- + +`Distutils`__ MAY be modified to sign metadata and to upload signed distributions +to PyPI. Distutils comes packaged with CPython and is the most widely-used +tool for uploading distributions to PyPI. + +__ https://docs.python.org/2/distutils/index.html#distutils-index + + +Automated Signing Solution +-------------------------- + +An easy-to-use key management solution is RECOMMENDED for developers. One +approach is to generate a cryptographic private key from a user password, akin +to miniLock.
Although developer signatures can remain optional, this approach +may be inadequate due to the great number of potentially unsigned dependencies +each distribution may have. If any one of these dependencies is unsigned, it +negates any benefit the project gains from signing its own distribution (i.e., +attackers would only need to compromise one of the unsigned dependencies to +attack end-users). Requiring developers to manually sign distributions and +manage keys is expected to render key signing an unused feature. + +A default, PyPI-mediated key management and package signing solution that is +`transparent`__ to developers and does not require a key escrow (sharing of +encrypted private keys with PyPI) is RECOMMENDED for the signing tools. +Additionally, the signing tools SHOULD circumvent the sharing of private keys +across multiple machines of each developer. + +__ https://en.wikipedia.org/wiki/Transparency_%28human%E2%80%93computer_interaction%29 + +The following outlines an automated signing solution that a new developer MAY +follow to upload a distribution to PyPI: + +1. Register a PyPI project. +2. Enter a secondary password (independent of the PyPI user account password). +3. Optional: Add a new identity to the developer's PyPI user account from a + second machine (after a password prompt). +4. Upload project. + +Step 1 is the normal procedure followed by developers to `register a PyPI +project`__. + +__ https://pypi.python.org/pypi?:action=register_form + +Step 2 generates an encrypted key file (private), uploads an Ed25519 public key +to PyPI, and signs the TUF metadata that is generated for the distribution. + +Optionally adding a new identity from a second machine, by simply entering a +password, in step 3 also generates an encrypted private key file and uploads an +Ed25519 public key to PyPI. Separate identities MAY be created to allow a +developer, or other project maintainers, to sign releases on multiple machines. 
+An existing verified identity (its public key is contained in project metadata +or has been uploaded to PyPI) signs for new identities. By default, project +metadata has a signature threshold of "1" and other verified identities may +create new releases to satisfy the threshold. + +Step 4 uploads the distribution file and TUF metadata to PyPI. The "Snapshot +Process" section discusses in detail the procedure followed by developers to +upload a distribution to PyPI. + +Generation of cryptographic files and signatures is transparent to the +developers in the default case: developers need not be aware that packages are +automatically signed. However, the signing tools should be flexible; a single +project key may also be shared between multiple machines if manual key +management is preferred (e.g., ssh-copy-id). + +The `repository`__ and `developer`__ TUF tools currently support all of the +recommendations previously mentioned, except for the automated signing +solution, which SHOULD be added to Distutils, Twine, and other third-party +signing tools. The automated signing solution calls available repository tool +functions to sign metadata and to generate the cryptographic key files. + +__ https://github.com/theupdateframework/tuf/blob/develop/tuf/README.md +__ https://github.com/theupdateframework/tuf/blob/develop/tuf/README-developer-tools.md + + +Snapshot Process +---------------- + +The snapshot process is fairly simple and SHOULD be automated. The snapshot +process MUST keep in memory the latest working set of *root*, *targets*, and +delegated roles. Every minute or so the snapshot process will sign for this +latest working set. (Recall that project transaction processes continuously +inform the snapshot process about the latest delegated metadata in a +concurrency-safe manner. 
The snapshot process will actually sign for a copy of +the latest working set while the latest working set in memory will be updated +with information that is continuously communicated by the project transaction +processes.) The snapshot process MUST generate and sign new *timestamp* +metadata that will vouch for the metadata (*root*, *targets*, and delegated +roles) generated in the previous step. Finally, the snapshot process MUST make +available to clients the new *timestamp* and *snapshot* metadata representing +the latest snapshot. + +A *claimed* or *recently-claimed* project will need to upload in its +transaction to PyPI not just targets (a simple index as well as distributions) +but also TUF metadata. The project MAY do so by uploading a ZIP file containing +two directories, /metadata/ (containing delegated targets metadata files) and +/targets/ (containing targets such as the project simple index and +distributions that are signed by the delegated targets metadata). + +Whenever the project uploads metadata or targets to PyPI, PyPI SHOULD check the +project TUF metadata for at least the following properties: + +* A threshold number of the developer keys registered with PyPI by that + project MUST have signed for the delegated targets metadata file that + represents the "root" of targets for that project (e.g., metadata/targets/ + project.txt). +* The signatures of delegated targets metadata files MUST be valid. +* The delegated targets metadata files MUST NOT have expired. +* The delegated targets metadata MUST be consistent with the targets. +* A delegator MUST NOT delegate targets that were not delegated to itself by + another delegator. +* A delegatee MUST NOT sign for targets that were not delegated to itself by a + delegator. + +If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to +reject publishing any set of metadata or targets that does not meet these +requirements.
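The first four checks in the list above could be sketched as follows. This is a minimal illustration under stated assumptions, not the PyPI or TUF implementation: the field names (``signed``, ``signatures``, ``expires``, ``targets``, ``length``, ``sha256``), the sorted-key JSON encoding, and the ``verify_signature`` callable are all hypothetical, and the two delegation checks are not covered. A real deployment would presumably pass an Ed25519 verifier as ``verify_signature``.

```python
import hashlib
import json
from datetime import datetime, timezone


def canonical_bytes(signed):
    # Assumption: sorted-key compact JSON stands in for TUF's canonical encoding.
    return json.dumps(signed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def check_project_metadata(metadata, targets, registered_keys, threshold,
                           verify_signature, now=None):
    """Return a list of problems; an empty list means every check passed.

    metadata:         {"signed": {...}, "signatures": [{"keyid", "sig"}, ...]}
    targets:          {path: file contents as bytes}
    registered_keys:  {keyid: public key} as registered with PyPI
    verify_signature: hypothetical callable(public_key, sig, payload) -> bool
    """
    problems = []
    signed = metadata["signed"]
    payload = canonical_bytes(signed)

    # Threshold and validity: count distinct registered keys with a valid signature.
    valid_keyids = set()
    for entry in metadata["signatures"]:
        public_key = registered_keys.get(entry["keyid"])
        if public_key is not None and verify_signature(public_key, entry["sig"], payload):
            valid_keyids.add(entry["keyid"])
    if len(valid_keyids) < threshold:
        problems.append("fewer than %d valid signatures" % threshold)

    # Expiry: "expires" is assumed to be an ISO 8601 timestamp with a UTC offset.
    now = now or datetime.now(timezone.utc)
    if datetime.fromisoformat(signed["expires"]) <= now:
        problems.append("metadata has expired")

    # Consistency: every listed target must match the uploaded file, and vice versa.
    listed = signed["targets"]
    for path, info in listed.items():
        data = targets.get(path)
        if data is None:
            problems.append("missing target: " + path)
        elif (len(data) != info["length"]
              or hashlib.sha256(data).hexdigest() != info["sha256"]):
            problems.append("target does not match metadata: " + path)
    for path in targets:
        if path not in listed:
            problems.append("unlisted target: " + path)
    return problems
```

Returning a list of problems rather than raising on the first failure lets PyPI report every reason for rejecting a transaction at once, which fits the "MAY choose to reject" wording above.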
+ +PyPI MUST enforce access control by ensuring that each project can only write +to the TUF metadata for which it is responsible. It MUST do so by ensuring that +project transaction processes write to the correct metadata as well as correct +locations within those metadata. For example, a project transaction process for +an unclaimed project MUST write to the correct target paths in the correct +delegated unclaimed metadata for the targets of the project. + +On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects +in a backward-incompatible manner. Note that PyPI will NOT be able to +automatically rewrite existing TUF metadata on behalf of projects in order to +upgrade the metadata to the new backward-incompatible format because this would +invalidate the signatures of the metadata as signed by developer keys. +Instead, package managers SHOULD be written to recognize and handle multiple +incompatible versions of TUF metadata so that claimed and recently-claimed +projects could be offered a reasonable time to migrate their metadata to newer +but backward-incompatible formats. + +If PyPI eventually runs out of disk space to produce a new consistent snapshot, +then PyPI MAY use something like a "mark-and-sweep" algorithm to delete +sufficiently outdated consistent snapshots. That is, only outdated metadata +like *timestamp* and *snapshot* that are no longer used are deleted. +Specifically, in order to preserve the latest consistent snapshot, PyPI would +walk objects -- beginning from the root (*timestamp*) -- of the latest +consistent snapshot, mark all visited objects, and delete all unmarked objects. +The last few consistent snapshots may be preserved in a similar fashion. +Deleting a consistent snapshot will cause clients to see nothing except HTTP +404 responses to any request for a target of the deleted consistent snapshot. +Clients SHOULD then retry (as before) their requests with the latest consistent +snapshot.
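The mark-and-sweep pass described above could look roughly like the sketch below. It assumes, purely for illustration, that the repository is modelled as a dictionary mapping each stored object name to the names it references (for example, *timestamp* references *snapshot*, which references targets metadata, which references distributions). The function name and the data model are hypothetical.

```python
def sweep_outdated(objects, roots):
    """Delete every object not reachable from the given roots.

    objects: {name: [names referenced by that object]} (hypothetical model)
    roots:   names of the *timestamp* files of the snapshots to preserve
    Returns the set of deleted names; `objects` is modified in place.
    """
    # Mark phase: walk the reference graph starting from each preserved root.
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked or name not in objects:
            continue
        marked.add(name)
        stack.extend(objects[name])

    # Sweep phase: remove everything that was never visited.
    deleted = {name for name in objects if name not in marked}
    for name in deleted:
        del objects[name]
    return deleted
```

Passing the *timestamp* files of the last few snapshots as ``roots`` preserves those snapshots as well, since any object shared between them is marked once and kept.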
+ +All package managers that support TUF metadata MUST be modified to download +every metadata and target file (except for *timestamp* metadata) by including, +in the request for the file, the cryptographic hash of the file in the +filename. Following the filename convention RECOMMENDED in the next +subsection, a request for the file at filename.ext will be transformed to the +equivalent request for the file at digest.filename. + +Finally, PyPI SHOULD use a `transaction log`__ to record project transaction +processes and queues so that it will be easier to recover from errors after a +server failure. + +__ https://en.wikipedia.org/wiki/Transaction_log + + +Producing Consistent Snapshots +------------------------------ + +PyPI is responsible for updating, depending on the project, either the +*claimed*, *recently-claimed*, or *unclaimed* metadata and associated delegated +metadata. Every project MUST upload its set of metadata and targets in a single +transaction. The uploaded set of files is called the "project transaction." +How PyPI MAY validate files in a project transaction is discussed in a later +section. The focus of this section is on how PyPI will respond to a project +transaction. + +Every metadata and target file MUST include in its filename the `hex digest`__ +of its `SHA-256`__ hash, which PyPI may prepend to filenames after the files +have been uploaded. For this PEP, it is RECOMMENDED that PyPI adopt a simple +convention of the form: *digest.filename*, where filename is the original +filename without a copy of the hash, and digest is the hex digest of the hash. + +__ http://docs.python.org/2/library/hashlib.html#hashlib.hash.hexdigest +__ https://en.wikipedia.org/wiki/SHA-2 + +When an unclaimed project uploads a new transaction, a project transaction +process MUST add all new targets and relevant delegated unclaimed metadata. +The project transaction process MUST inform the snapshot process about new +delegated unclaimed metadata. 
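The *digest.filename* convention recommended in this subsection can be sketched as follows. This is a small illustration under assumptions: the function name ``digest_path`` and the example path are hypothetical, and only the standard ``hashlib`` and ``posixpath`` modules are used.

```python
import hashlib
import posixpath


def digest_path(path, content):
    """Prepend the SHA-256 hex digest of content to the filename in path."""
    digest = hashlib.sha256(content).hexdigest()
    directory, filename = posixpath.split(path)
    # Only the last path component changes: dir/filename -> dir/digest.filename
    return posixpath.join(directory, digest + "." + filename)


# Hypothetical example: where a simple index page would be stored.
stored_at = digest_path("simple/example/index.html", b"<html></html>")
```

PyPI could apply such a function when storing an uploaded file, and a package manager could apply the same transformation to a request path once it has learned the expected hash from trusted metadata.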
+ +When a *recently-claimed* project uploads a new transaction, a project +transaction process MUST add all new targets and delegated targets metadata for +the project. If the project is new, then the project transaction process MUST +also add new *recently-claimed* metadata with the public keys (which MUST be +part of the transaction) for the project. *recently-claimed* projects have a +threshold value of "1" set by the transaction process. Finally, the project +transaction process MUST inform the snapshot process about new +*recently-claimed* metadata, as well as the current set of delegated targets +metadata for the project. + +The transaction process for a claimed project is slightly different in that +PyPI administrators periodically move (a manual process that MAY occur every +two weeks to a month) projects from the *recently-claimed* role to the +*claimed* role. (Moving a project from *recently-claimed* to *claimed* is a +manual process because PyPI administrators have to use an offline key to sign +the claimed project's distribution.) A project transaction process MUST then +add new *recently-claimed* and *claimed* metadata to reflect this migration. As +is the case for a *recently-claimed* project, the project transaction process +MUST always add all new targets and delegated targets metadata for the claimed +project. Finally, the project transaction process MUST inform the consistent +snapshot process about new *recently-claimed* or *claimed* metadata, as well as +the current set of delegated targets metadata for the project. + +Project transaction processes SHOULD be automated, except when PyPI +administrators move a project from the *recently-claimed* role to the *claimed* +role. Project transaction processes MUST also be applied atomically: either all +metadata and targets -- or none of them -- are added. The project transaction +processes and snapshot process SHOULD work concurrently. 
Finally, project +transaction processes SHOULD keep in memory the latest *claimed*, +*recently-claimed*, and *unclaimed* metadata so that they will be correctly +updated in new consistent snapshots. + +The queue MAY be processed concurrently in order of appearance, provided that +the following rules are observed: + +1. No pair of project transaction processes may concurrently work on the same + project. + +2. No pair of project transaction processes may concurrently work on + *unclaimed* projects that belong to the same delegated *unclaimed* role. + +3. No pair of project transaction processes may concurrently work on new + recently-claimed projects. + +4. No pair of project transaction processes may concurrently work on new + claimed projects. + +5. No project transaction process may work on a new claimed project while + another project transaction process is working on a new recently-claimed + project and vice versa. + +These rules MUST be observed to ensure that metadata is not read from or +written to inconsistently. + + +Auditing Snapshots +------------------ + +If a malicious party compromises PyPI, they can sign arbitrary files with any +of the online keys. The roles with offline keys (i.e., *root* and *targets*) +are still protected. To safely recover from a repository compromise, snapshots +should be audited to ensure that files are only restored to trusted versions. + +When a repository compromise has been detected, the integrity of three types of +information must be validated: + +1. If the online keys of the repository have been compromised, they can be + revoked by having the *targets* role sign new metadata, delegated to a new + key. + +2. If the role metadata on the repository has been changed, this will impact + the metadata that is signed by online keys. Any role information created + since the compromise should be discarded. As a result, developers of new + projects will need to re-register their projects. + +3. 
If the packages themselves may have been tampered with, they can be + validated using the stored hash information for packages that existed in + trusted metadata before the compromise. Also, new distributions that are + signed by developers in the *claimed* role may be safely retained. However, + any distributions signed by developers in the *recently-claimed* or + *unclaimed* roles should be discarded. + +In order to safely restore snapshots in the event of a compromise, PyPI SHOULD +maintain a small number of its own mirrors to copy PyPI snapshots according to +some schedule. The mirroring protocol can be used immediately for this +purpose. The mirrors must be secured and isolated such that they are +responsible only for mirroring PyPI. The mirrors can be checked against one +another to detect accidental or malicious failures. + +Another approach is to periodically generate the cryptographic hash of +*snapshot* and tweet it. For example, upon receiving the tweet, a user comes +forward with the actual metadata and the repository maintainers are then able +to verify the metadata's cryptographic hash. Alternatively, PyPI may periodically +archive its own versions of *snapshot* rather than rely on externally provided +metadata. In this case, PyPI SHOULD take the cryptographic hash of every +package on the repository and store this data on an offline device. If any +package hash has changed, this indicates an attack has occurred. + +Attacks that serve different versions of metadata or that freeze a version of a +package at a specific version can be handled by TUF with techniques such as +implicit key revocation and metadata mismatch detection [2]_. + + +Key Compromise Analysis +======================= + +This PEP has covered the maximum security model, the TUF roles that should be +added to support continuous delivery of distributions, how to generate and sign +the metadata of each role, and how to support distributions that have been +signed by developers.
The remaining sections discuss how PyPI SHOULD audit +repository metadata, and the methods PyPI can use to detect and recover from a +PyPI compromise. + +Table 1 summarizes a few of the attacks possible when a threshold number of +private cryptographic keys (belonging to any of the PyPI roles) are +compromised. The leftmost column lists the roles (or a combination of roles) +that have been compromised, and the columns to the right show whether the +compromised roles leave clients susceptible to malicious updates, freeze +attacks, or metadata inconsistency attacks. + ++-------------------+-------------------+-----------------------+-----------------------+ +| Role Compromise | Malicious Updates | Freeze Attack | Metadata Inconsistency| +| | | | Attacks | ++===================+===================+=======================+=======================+ +| timestamp | NO | YES | NO | +| | snapshot and | limited by earliest | snapshot needs to | +| | targets or any | root, targets, or bin | cooperate | +| | of the delegated | metadata expiry time | | +| | roles need to | | | +| | cooperate | | | ++-------------------+-------------------+-----------------------+-----------------------+ +| snapshot | NO | NO | NO | +| | timestamp and | timestamp needs to | timestamp needs to | +| | targets or any of | cooperate | cooperate | +| | the delegated | | | +| | roles need to | | | +| | cooperate | | | ++-------------------+-------------------+-----------------------+-----------------------+ +| timestamp | NO | YES | YES | +| *AND* | targets or any | limited by earliest | limited by earliest | +| snapshot | of the delegated | root, targets, or bin | root, targets, or bin | +| | roles need to | metadata expiry time | metadata expiry time | +| | cooperate | | | +| | | | | ++-------------------+-------------------+-----------------------+-----------------------+ +| targets | NO | NOT APPLICABLE | NOT APPLICABLE | +| *OR* | timestamp and | need timestamp and | need timestamp | +|
**claimed** | snapshot need to | snapshot | and snapshot | +| *OR* | cooperate | | | +| recently-claimed | | | | +| *OR* | | | | +| unclaimed | | | | +| *OR* | | | | +| **project** | | | | ++-------------------+-------------------+-----------------------+-----------------------+ +| (timestamp | YES | YES | YES | +| *AND* | | limited by earliest | limited by earliest | +| snapshot) | | root, targets, or bin | root, targets, or bin | +| *AND* | | metadata expiry time | metadata expiry time | +| **project** | | | | +| | | | | ++-------------------+-------------------+-----------------------+-----------------------+ +| (timestamp | YES | YES | YES | +| *AND* | but only of | limited by earliest | limited by earliest | +| snapshot) | projects not | root, targets, | root, targets, | +| *AND* | delegated by | claimed, | claimed, | +| (recently-claimed | claimed | recently-claimed, | recently-claimed, | +| *OR* | | project, or unclaimed | project, or unclaimed | +| unclaimed) | | metadata expiry time | metadata expiry time | ++-------------------+-------------------+-----------------------+-----------------------+ +| (timestamp | | YES | YES | +| *AND* | | limited by earliest | limited by earliest | +| snapshot) | | root, targets, | root, targets, | +| *AND* | YES | claimed, | claimed, | +| (targets *OR* | | recently-claimed, | recently-claimed, | +| **claimed**) | | project, or unclaimed | project, or unclaimed | +| | | metadata expiry time | metadata expiry time | ++-------------------+-------------------+-----------------------+-----------------------+ +| root | YES | YES | YES | ++-------------------+-------------------+-----------------------+-----------------------+ + +Table 1: Attacks that are possible by compromising certain combinations of role +keys. In `September 2013`__, it was shown how the latest version (at the time) +of pip was susceptible to these attacks and how TUF could protect users against +them [8]_. Roles signed by offline keys are in **bold**. 
+ +__ https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html + +Note that compromising *targets* or any delegated role (except for project +targets metadata) does not immediately allow an attacker to serve malicious +updates. The attacker must also compromise the *timestamp* and *snapshot* +roles (which are both online and therefore more likely to be compromised). +This means that in order to launch any attack, one must not only be able to act +as a man-in-the-middle, but also compromise the *timestamp* key (or compromise +the *root* keys and sign a new *timestamp* key). To launch any attack other +than a freeze attack, one must also compromise the *snapshot* key. Finally, a +compromise of the PyPI infrastructure MAY introduce malicious updates to +*recently-claimed* projects because the keys for these roles are online. + + +In the Event of a Key Compromise +-------------------------------- + +A key compromise means that a threshold of keys belonging to developers or the +roles on PyPI, as well as the PyPI infrastructure, have been compromised and +used to sign new metadata on PyPI. + +If a threshold number of developer keys of a project have been compromised, +the project MUST take the following steps: + +1. The project metadata and targets MUST be restored to the last known good + consistent snapshot where the project was not known to be compromised. This + can be done by developers repackaging and re-signing all targets with + the new keys. + +2. The project's metadata MUST have its version numbers incremented, expiry + times suitably extended, and signatures renewed. + +PyPI, in turn, MUST take the following steps: + +1. Revoke the compromised developer keys from the *recently-claimed* or + *claimed* role. This is done by replacing the compromised developer keys + with newly issued developer keys. + +2. A new timestamped consistent snapshot MUST be issued.
+ +If a threshold number of *timestamp*, *snapshot*, *recently-claimed*, or +*unclaimed* keys have been compromised, then PyPI MUST take the following +steps: + +1. Revoke the *timestamp*, *snapshot*, and *targets* role keys from the + *root* role. This is done by replacing the compromised *timestamp*, + *snapshot*, and *targets* keys with newly issued keys. + +2. Revoke the *recently-claimed* and *unclaimed* keys from the *targets* role + by replacing their keys with newly issued keys. Sign the new targets role + metadata and discard the new keys (because, as we explained earlier, this + increases the security of targets metadata). + +3. Clear all targets or delegations in the *recently-claimed* role and delete + all associated delegated targets metadata. Recently registered projects + SHOULD register their developer keys again with PyPI. + +4. All targets of the *recently-claimed* and *unclaimed* roles SHOULD be + compared with the last known good consistent snapshot where none of the + timestamp, snapshot, recently-claimed, or unclaimed keys were known to have + been compromised. Added, updated, or deleted targets in the compromised + consistent snapshot that do not match the last known good consistent + snapshot SHOULD be restored to their previous versions. After ensuring the + integrity of all unclaimed targets, the unclaimed metadata MUST be + regenerated. + +5. The *recently-claimed* and *unclaimed* metadata MUST have their version + numbers incremented, expiry times suitably extended, and signatures + renewed. + +6. A new timestamped consistent snapshot MUST be issued. + +This would preemptively protect all of these roles even though only one of them +may have been compromised. + +If a threshold number of the *targets* or *claimed* keys have been compromised, +then there is little that an attacker would be able to do without the *timestamp* +and *snapshot* keys.
In this case, PyPI MUST simply revoke the compromised +*targets* or *claimed* keys by replacing them with new keys in the *root* and +*targets* roles, respectively. + +If a threshold number of the *timestamp*, *snapshot*, and *claimed* keys have +been compromised, then PyPI MUST take the following steps in addition to the +steps taken when either the *timestamp* or *snapshot* keys are compromised: + +1. Revoke the *claimed* role keys from the targets role and replace them with + newly issued keys. + +2. All project targets of the claimed roles SHOULD be compared with the last + known good consistent snapshot where none of the *timestamp*, *snapshot*, + or *claimed* keys were known to have been compromised. Added, updated, or + deleted targets in the compromised consistent snapshot that do not match + the last known good consistent snapshot MAY be restored to their previous + versions. After ensuring the integrity of all claimed project targets, the + *claimed* metadata MUST be regenerated. + +3. The claimed metadata MUST have their version numbers incremented, expiry + times suitably extended, and signatures renewed. + +Following these steps would preemptively protect all of these roles even though +only one of them may have been compromised. + +If a threshold number of *root* keys have been compromised, then PyPI MUST take +the steps taken when the *targets* role has been compromised. All of the +*root* keys must also be replaced. + +It is also RECOMMENDED that PyPI sufficiently document compromises with +security bulletins. These security bulletins will be most informative when +users of pip-with-TUF are unable to install or update a project because the +keys for the *timestamp*, *snapshot*, or *root* roles are no longer valid. +Users could then visit the PyPI web site to consult security bulletins that +would help to explain why users are no longer able to install or update, and +then take action accordingly. 
When a threshold number of *root* keys have not +been revoked due to a compromise, then new *root* metadata may be safely +updated because a threshold number of existing *root* keys will be used to sign +for the integrity of the new *root* metadata. TUF clients will be able to +verify the integrity of the new *root* metadata with a threshold number of +previously known *root* keys. This will be the common case. In the worst +case, where a threshold number of *root* keys have been revoked due to a +compromise, an end-user may choose to update new *root* metadata with +`out-of-band`__ mechanisms. + +__ https://en.wikipedia.org/wiki/Out-of-band#Authentication + + +Appendix A: PyPI Build Farm and End-to-End Signing +================================================== + +PyPI administrators intend to support a central build farm. The PyPI build +farm will auto-generate a `Wheel`__, for each distribution that is uploaded by +developers, on PyPI infrastructure and on supported platforms. Package +managers will likely install projects by downloading these PyPI Wheels (which +can be installed much faster than source distributions) rather than the source +distributions signed by developers. The implications of having a central build +farm with end-to-end signing SHOULD be investigated before the maximum security +model is implemented. + +__ http://wheel.readthedocs.org/en/latest/ + +An issue with a central build farm and end-to-end signing is that developers +are unlikely to sign Wheel distributions once they have been generated on PyPI +infrastructure. However, generating wheels from source distributions that are +signed by developers can still be beneficial, provided that building Wheels is +a deterministic process. If deterministic builds are infeasible, developers +may delegate trust of these wheels to a PyPI role that signs for wheels with +an online key. + + +References +========== + +.. [1] https://www.python.org/dev/peps/pep-0458/ +.. 
[2] https://isis.poly.edu/~jcappos/papers/samuel_tuf_ccs_2010.pdf +.. [3] https://github.com/theupdateframework/tuf/blob/develop/docs/tuf-spec.txt +.. [4] http://www.python.org/dev/peps/pep-0426/ +.. [5] https://github.com/theupdateframework/pip/wiki/Attacks-on-software-repositories +.. [6] https://mail.python.org/pipermail/distutils-sig/2013-September/022773.html +.. [7] https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.pdf +.. [8] https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html +.. [9] https://pypi.python.org/security +.. [10] https://mail.python.org/pipermail/distutils-sig/2013-August/022154.html +.. [11] https://en.wikipedia.org/wiki/RSA_%28algorithm%29 +.. [12] http://ed25519.cr.yp.to/ + + +Acknowledgements +================ + +This material is based upon work supported by the National Science Foundation +under Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, and +conclusions or recommendations expressed in this material are those of the +author(s) and do not necessarily reflect the views of the National Science +Foundation. + +We thank Nick Coghlan, Daniel Holth and the distutils-sig community in general +for helping us to think about how to usably and efficiently integrate TUF with +PyPI. + +Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin +Samuel helped us to design TUF from its predecessor Thandy of the Tor project. + +We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher, +Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng to develop +TUF. + + +Copyright +========= + +This document has been placed in the public domain.