python-peps/peps/pep-0740.rst

727 lines
28 KiB
ReStructuredText

PEP: 740
Title: Index support for digital attestations
Author: William Woodruff <william@yossarian.net>,
Facundo Tuesca <facundo.tuesca@trailofbits.com>,
Dustin Ingram <di@python.org>
Sponsor: Donald Stufft <donald@stufft.io>
PEP-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498
Status: Provisional
Type: Standards Track
Topic: Packaging
Created: 08-Jan-2024
Post-History: `02-Jan-2024 <https://discuss.python.org/t/pre-pep-exposing-trusted-publisher-provenance-on-pypi/42337>`__,
`29-Jan-2024 <https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498>`__
Resolution: https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498/26
Abstract
========
This PEP proposes a collection of changes related to the upload and distribution
of digitally signed attestations and metadata used to verify them on a Python
package repository, such as PyPI.
These changes have two subcomponents:
* Changes to the currently unstandardized PyPI upload API, allowing clients
to upload digital attestations as :ref:`attestation objects <attestation-object>`;
* Changes to the :ref:`HTML and JSON "simple" APIs <packaging:simple-repository-api>`,
allowing clients to retrieve both digital attestations and
`Trusted Publishing <https://docs.pypi.org/trusted-publishers/>`_ metadata
for individual release files as :ref:`provenance objects <provenance-object>`.
This PEP does not make a policy recommendation around mandatory digital
attestations on release uploads or their subsequent verification by installing
clients like ``pip``.
Rationale and Motivation
========================
Desire for digital signatures on Python packages has been repeatedly
expressed by both package maintainers and downstream users:
* Maintainers wish to demonstrate the integrity and authenticity of their
package uploads;
* Individual downstream users wish to verify package integrity and authenticity
without placing additional trust in their index's honesty;
* "Bulk" downstream users (such as Operating System distributions) wish to
perform similar verifications and potentially re-expose or countersign
for their own downstream packaging ecosystems.
This proposal seeks to accommodate each of the above use cases.
Additionally, this proposal identifies the following motivations:
* Verifiable provenance for Python package distributions: many Python
packages currently contain *unauthenticated* provenance metadata, such
as URLs for source hosts. A cryptographic attestation format could enable
strong *authenticated* links between these packages and their source hosts,
allowing both the index and downstream users to cryptographically verify that
a package originates from its claimed source repository.
* Raising attacker requirements: an attacker who seeks to take
over a Python package can be described along *sophistication*
(unsophisticated to sophisticated) and *targeting* dimensions
(opportunistic to targeted).
Digital attestations impose additional sophistication requirements: the
attacker must be sufficiently sophisticated to access private signing material
(or signing identities).
* Index verifiability: in the status quo, the only attestation provided by the
index is an optional PGP signature per release file
(see :ref:`PGP signatures <pgp-signatures>`). These signatures are not
(and cannot be) checked by the index either for well-formedness or for
validity, since the index has no mechanism for identifying the right public
key for the signature. This PEP overcomes this limitation
by ensuring that :ref:`provenance objects <provenance-object>` contain all
of the metadata needed by the index to verify an attestation's validity.
This PEP proposes a generic attestation format, containing an
:ref:`attestation statement for signature generation <payload-and-signature-generation>`,
with the expectation that index providers adopt the
format with a suitable source of identity for signature verification, such as
Trusted Publishing.
Design Considerations
---------------------
This PEP identifies the following design considerations when evaluating
both its own proposed changes and previous work in the same or adjacent
areas of Python packaging:
1. Index accessibility: digital attestations for Python packages
are ideally retrievable directly from the index itself, as "detached"
resources.
This both simplifies some compatibility concerns (by avoiding
the need to modify the distribution formats themselves) and also simplifies
the behavior of potential installing clients (by allowing them to
retrieve each attestation before its corresponding package without needing
to do streaming decompression).
2. Verification by the index itself: in addition to enabling verification
by installing clients, each digital attestation is *ideally* verifiable
in some form by the index itself.
This both increases the overall quality
of attestations uploaded to the index (preventing, for example, users
from accidentally uploading incorrect or invalid attestations) and also
enables UI and UX refinements on the index itself (such as a "provenance"
view for each uploaded package).
3. General applicability: digital attestations should be applicable to
*any and every* package uploaded to the index, regardless of its format
(sdist or wheel) or interior contents.
4. Metadata support: this PEP refers to "digital attestations" rather than
just "digital signatures" to emphasize the ideal presence of additional
metadata within the cryptographic envelope.
For example, to prevent domain separation between a distribution's name and
its contents, this PEP uses '`Statements <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__'
from the `in-toto project <https://in-toto.io/>`__ to bind the distribution's
contents (via SHA-256 digest) to its filename.
Previous Work
-------------
.. _pgp-signatures:
PGP signatures
^^^^^^^^^^^^^^
PyPI and other indices have historically supported PGP signatures on uploaded
distributions. These could be supplied during upload, and could be retrieved
by installing clients via the ``data-gpg-sig`` attribute in the :pep:`503`
API, the ``gpg-sig`` key on the :pep:`691` API, or via an adjacent
``.asc``-suffixed URL.
PGP signature uploads have been disabled on PyPI since
`May 2023 <https://blog.pypi.org/posts/2023-05-23-removing-pgp/>`_, after
`an investigation <https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI-worse-than-useless>`_
determined that the majority of signatures (which, themselves, constituted a
tiny percentage of overall uploads) could not be associated with a public key or
otherwise meaningfully verified.
In their previously supported form on PyPI, PGP signatures satisfied
considerations (1) and (3) above but not (2) (owing to the need for external
keyservers and key distribution) or (4) (due to PGP signatures typically being
constructed over just an input file, without any associated signed metadata).
Wheel signatures
^^^^^^^^^^^^^^^^
:pep:`427` (and its :ref:`living PyPA counterpart <packaging:binary-distribution-format>`)
specify the :term:`wheel format <packaging:Wheel>`.
This format includes accommodations for digital signatures embedded directly
into the wheel, in either JWS or S/MIME format. These signatures are specified
over a :pep:`376` RECORD, which is modified to include a cryptographic digest
for each recorded file in the wheel.
While wheel signatures are fully specified, they do not appear to be broadly
used; the official `wheel tooling <https://github.com/pypa/wheel>`_ deprecated
signature generation and verification support
`in 0.32.0 <https://wheel.readthedocs.io/en/stable/news.html>`_, which was
released in 2018.
Additionally, wheel signatures do not satisfy any of
the above considerations (due to the "attached" nature of the signatures,
non-verifiability on the index itself, and support for wheels only).
Specification
=============
.. _upload-endpoint:
Upload endpoint changes
-----------------------
The current upload API is not standardized. However, we propose the following
changes to it:
* In addition to the current top-level ``content`` and ``gpg_signature`` fields,
the index **SHALL** accept ``attestations`` as an additional multipart form
field.
* The new ``attestations`` field **SHALL** be a JSON array.
* The ``attestations`` array **SHALL** have one or more items, each a JSON object
representing an individual attestation.
* Each attestation object **MUST** be verifiable by the index. If the index fails
to verify any attestation in ``attestations``, it **MUST** reject the upload.
The format of attestation objects is defined under :ref:`attestation-object`
and the process for verifying attestations is defined under
:ref:`attestation-verification`.
Index changes
-------------
Simple Index
^^^^^^^^^^^^
The following changes are made to the
:ref:`simple repository API <packaging:simple-repository-api-base>`:
* When an uploaded file has one or more attestations, the index **MAY**
provide a provenance file containing attestations associated with
a given distribution. The format of the provenance file
**SHALL** be a JSON-encoded :ref:`provenance object <provenance-object>`,
which **SHALL** contain the file's attestations.
The location of the provenance file is signaled by the index via
the ``data-provenance`` attribute.
* When a provenance file is present, the index **MAY** include a
``data-provenance`` attribute on its file link. The value of the
``data-provenance`` attribute **SHALL** be a fully qualified URL,
signaling the the file's provenance can be found
at that URL. This URL **MUST** represent a
`secure origin <https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features/>`_.
The following table provides examples of release file URLs, ``data-provenance``
values, and their resulting provenance file URLs.
.. csv-table::
:header: "File URL", "``data-provenance``", "Provenance URL"
"https://example.com/sampleproject-1.2.3.tar.gz", "``https://example.com/sampleproject-1.2.3.tar.gz.provenance``", "https://example.com/sampleproject-1.2.3.tar.gz.provenance"
"https://example.com/sampleproject-1.2.3.tar.gz", "``https://other.example.com/sampleproject-1.2.3.tar.gz/provenance``", "https://other.example.com/sampleproject-1.2.3.tar.gz/provenance"
"https://example.com/sampleproject-1.2.3.tar.gz", "``../relative``", "*(invalid: not a fully qualified URL)*"
"https://example.com/sampleproject-1.2.3.tar.gz", "``http://unencrypted.example.com/provenance``", "*(invalid: not a secure origin)*"
* The index **MAY** choose to modify the provenance file. For example,
the index **MAY** permit adding additional attestations and verification
materials, such as attestations from third-party auditors or other services.
See :ref:`changes-to-provenance-objects` for an additional discussion of
reasons why a file's provenance may change.
JSON-based Simple API
^^^^^^^^^^^^^^^^^^^^^
The following changes are made to the
:ref:`JSON simple API <packaging:simple-repository-api-json>`:
* When an uploaded file has one or more attestations, the index **MAY**
include a ``provenance`` key in the ``file`` dictionary for that file.
The value of the ``provenance`` key **SHALL** be either a JSON string
or ``null``. If ``provenance`` is not ``null``, it **SHALL** be a URL
to the associated provenance file.
See :ref:`appendix-3` for an explanation of the technical decision to
embed the SHA-256 digest in the JSON API, rather than the full
:ref:`provenance object <provenance-object>`.
These changes require a version change to the JSON API:
* The ``api-version`` **SHALL** specify version 1.3 or later.
.. _attestation-object:
Attestation objects
-------------------
An attestation object is a JSON object with several required keys; applications
or signers may include additional keys so long as all explicitly
listed keys are provided. The required layout of an attestation
object is provided as pseudocode below.
.. code-block:: python
@dataclass
class Attestation:
version: Literal[1]
"""
The attestation object's version, which is always 1.
"""
verification_material: VerificationMaterial
"""
Cryptographic materials used to verify `envelope`.
"""
envelope: Envelope
"""
The enveloped attestation statement and signature.
"""
@dataclass
class Envelope:
statement: bytes
"""
The attestation statement.
This is represented as opaque bytes on the wire (encoded as base64),
but it MUST be an JSON in-toto v1 Statement.
"""
signature: bytes
"""
A signature for the above statement, encoded as base64.
"""
@dataclass
class VerificationMaterial:
certificate: str
"""
The signing certificate, as `base64(DER(cert))`.
"""
transparency_entries: list[object]
"""
One or more transparency log entries for this attestation's signature
and certificate.
"""
A full data model for each object in ``transparency_entries`` is provided in
:ref:`appendix-2`. Attestation objects **SHOULD** include one or more
transparency log entries, and **MAY** include additional keys for other
sources of signed time (such as an :rfc:`3161` Time Stamping Authority or a
`Roughtime <https://blog.cloudflare.com/roughtime>`__ server).
Attestation objects are versioned; this PEP specifies version 1. Each version
is tied to a single cryptographic suite to minimize unnecessary cryptographic
agility. In version 1, the suite is as follows:
* Certificates are specified as X.509 certificates, and comply with the
profile in :rfc:`5280`.
* The message signature algorithm is ECDSA, with the P-256 curve for public keys
and SHA-256 as the cryptographic digest function.
Future PEPs may change this suite (and the overall shape of the attestation
object) by selecting a new version number.
.. _payload-and-signature-generation:
Attestation statement and signature generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The *attestation statement* is the actual claim that is cryptographically signed
over within the attestation object (i.e., the ``envelope.statement``).
The attestation statement is encoded as a
`v1 in-toto Statement object <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__,
in JSON form. When serialized the statement is treated as an opaque binary blob,
avoiding the need for canonicalization. An example JSON-encoded statement is
provided in :ref:`appendix-4`.
In addition to being a v1 in-toto Statement, the attestation statement is constrained
in the following ways:
* The in-toto ``subject`` **MUST** contain only a single subject.
* ``subject[0].name`` is the distribution's filename, which **MUST** be
a valid :ref:`source distribution <packaging:source-distribution-format>` or
:ref:`wheel distribution <packaging:binary-distribution-format>` filename.
* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests
**MAY** be present. The digests **MUST** be represented as hexadecimal strings.
* The following ``predicateType`` values are supported:
* `SLSA Provenance <https://slsa.dev/provenance/v1>`__: ``https://slsa.dev/provenance/v1``
* `PyPI Publish Attestation <https://docs.pypi.org/attestations/publish/v1>`__: ``https://docs.pypi.org/attestations/publish/v1``
The signature over this statement is constructed using the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__,
with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded
statement above. No other ``PAYLOAD_TYPE`` is permitted.
.. _provenance-object:
Provenance objects
------------------
The index will serve uploaded attestations along with metadata that can assist
in verifying them in the form of JSON serialized objects.
These *provenance objects* will be available via both the Simple Index
and JSON-based Simple API as described above, and will have the following layout:
.. code-block:: json
{
"version": 1,
"attestation_bundles": [
{
"publisher": {
"kind": "important-ci-service",
"claims": {},
"vendor-property": "foo",
"another-property": 123
},
"attestations": [
{ /* attestation 1 ... */ },
{ /* attestation 2 ... */ }
]
}
]
}
or, as pseudocode:
.. code-block:: python
@dataclass
class Publisher:
kind: string
"""
The kind of Trusted Publisher.
"""
claims: object | None
"""
Any context-specific claims retained by the index during Trusted Publisher
authentication.
"""
_rest: object
"""
Each publisher object is open-ended, meaning that it MAY contain additional
fields beyond the ones specified explicitly above. This field signals that,
but is not itself present.
"""
@dataclass
class AttestationBundle:
publisher: Publisher
"""
The publisher associated with this set of attestations.
"""
attestations: list[Attestation]
"""
The set of attestations included in this bundle.
"""
@dataclass
class Provenance:
version: Literal[1]
"""
The provenance object's version, which is always 1.
"""
attestation_bundles: list[AttestationBundle]
"""
One or more attestation "bundles".
"""
* ``version`` is ``1``. Like attestation objects, provenance objects are
versioned, and this PEP only defines version ``1``.
* ``attestation_bundles`` is a **required** JSON array, containing one
or more "bundles" of attestations. Each bundle corresponds to a
signing identity (such as a Trusted Publishing identity), and contains
one or more attestation objects.
As noted in the ``Publisher`` model,
each ``AttestationBundle.publisher`` object is specific to its Trusted Publisher
but must include at minimum:
* A ``kind`` key, which **MUST** be a JSON string that uniquely identifies the
kind of Trusted Publisher.
* A ``claims`` key, which **MUST** be a JSON object containing any context-specific
claims retained by the index during Trusted Publisher authentication.
All other keys in the publisher object are publisher-specific. A full
illustrative example of a publisher object is provided in :ref:`appendix-1`.
Each array of attestation objects is a superset of the ``attestations``
array supplied by the uploaded through the ``attestations`` field at upload
time, as described in :ref:`upload-endpoint` and
:ref:`changes-to-provenance-objects`.
.. _changes-to-provenance-objects:
Changes to provenance objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Provenance objects are *not* immutable, and may change over time. Reasons
for changes to the provenance object include but are not limited to:
* Addition of new attestations for a pre-existing signing identity: the index
**MAY** choose to allow additional attestations by pre-existing signing
identities, such as newer attestation versions for already uploaded
files.
* Addition of new signing identities and associated attestations: the index
**MAY** choose to support attestations from sources other than the file's
uploader, such as third-party auditors or the index itself. These attestations
may be performed asynchronously, requiring the index to insert them into
the provenance object *post facto*.
.. _attestation-verification:
Attestation verification
------------------------
Verifying an attestation object against a distribution file requires verification of each of the
following:
* ``version`` is ``1``. The verifier **MUST** reject any other version.
* ``verification_material.certificate`` is a valid signing certificate, as
issued by an *a priori* trusted authority (such as a root of trust already
present within the verifying client).
* ``verification_material.certificate`` identifies an appropriate signing
subject, such as the machine identity of the Trusted Publisher that published
the package.
* ``envelope.statement`` is a valid in-toto v1 Statement, with a subject
and digest that **MUST** match the distribution's filename and contents.
For the distribution's filename, matching **MUST** be performed by parsing
using the appropriate source distribution or wheel filename format, as
the statement's subject may be equivalent but normalized.
* ``envelope.signature`` is a valid signature for ``envelope.statement``
corresponding to ``verification_material.certificate``,
as reconstituted via the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__.
In addition to the above required steps, a verifier **MAY** additionally verify
``verification_material.transparency_entries`` on a policy basis, e.g. requiring
at least one transparency log entry or a threshold of entries. When verifying
transparency entries, the verifier **MUST** confirm that the inclusion time for
each entry lies within the signing certificate's validity period.
Security Implications
=====================
This PEP is primarily "mechanical" in nature; it provides layouts for
structuring and serving verifiable digital attestations without specifying
higher level security "policies" around attestation validity, thresholds
between attestations, and so forth.
Cryptographic agility in attestations
-------------------------------------
Algorithmic agility is a common source of exploitable vulnerabilities
in cryptographic schemes. This PEP limits algorithmic agility in two ways:
* All algorithms are specified in a single suite, rather than a geometric
collection of parameters. This makes it impossible (for example) for an
attacker to select a strong signature algorithm with a weak hash function,
compromising the scheme as a whole.
* Attestation objects are versioned, and may only contain the algorithmic
suite specified for their version. If a specific suite
is considered insecure in the future, clients may choose to blanket reject
or qualify verifications of attestations that contain that suite.
Index trust
-----------
This PEP does **not** increase (or decrease) trust in the index itself:
the index is still effectively trusted to honestly deliver unmodified package
distributions, since a dishonest index capable of modifying package
contents could also dishonestly modify or omit package attestations.
As a result, this PEP's presumption of index trust is equivalent to the
unstated presumption with earlier mechanisms, like PGP and wheel signatures.
This PEP does not preclude or exclude future index trust mechanisms, such
as :pep:`458` and/or :pep:`480`.
Recommendations
===============
This PEP recommends, but does not mandate, that attestation objects
contain one or more verifiable sources of signed time that corroborate the
signing certificate's claimed validity period. Indices that implement this
PEP may choose to strictly enforce this requirement.
.. _appendix-1:
Appendix 1: Example Trusted Publisher Representation
====================================================
This appendix provides a fictional example of a ``publisher`` key within
a simple JSON API ``project.files[].provenance`` listing:
.. code-block:: json
"publisher": {
"kind": "GitHub",
"claims": {
"ref": "refs/tags/v1.0.0",
"sha": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
},
"repository_name": "HolyGrail",
"repository_owner": "octocat",
"repository_owner_id": "1",
"workflow_filename": "publish.yml",
"environment": null
}
.. _appendix-2:
Appendix 2: Data models for Transparency Log Entries
====================================================
This appendix contains pseudocoded data models for transparency log entries
in attestation objects. Each transparency log entry serves as a source
of signed inclusion time, and can be verified either online or offline.
.. code-block:: python
@dataclass
class TransparencyLogEntry:
log_index: int
"""
The global index of the log entry, used when querying the log.
"""
log_id: str
"""
An opaque, unique identifier for the log.
"""
entry_kind: str
"""
The kind (type) of log entry.
"""
entry_version: str
"""
The version of the log entry's submitted format.
"""
integrated_time: int
"""
The UNIX timestamp from the log from when the entry was persisted.
"""
inclusion_proof: InclusionProof
"""
The actual inclusion proof of the log entry.
"""
@dataclass
class InclusionProof:
log_index: int
"""
The index of the entry in the tree it was written to.
"""
root_hash: str
"""
The digest stored at the root of the Merkle tree at the time of proof
generation.
"""
tree_size: int
"""
The size of the Merkle tree at the time of proof generation.
"""
hashes: list[str]
"""
A list of hashes required to complete the inclusion proof, sorted
in order from leaf to root. The leaf and root hashes are not themselves
included in this list; the root is supplied via `root_hash` and the client
must calculate the leaf hash.
"""
checkpoint: str
"""
The signed tree head's signature, at the time of proof generation.
"""
cosigned_checkpoints: list[str]
"""
Cosigned checkpoints from zero or more log witnesses.
"""
.. _appendix-3:
Appendix 3: Simple JSON API size considerations
===============================================
A previous draft of this PEP required embedding each
:ref:`provenance object <provenance-object>` directly into its appropriate part
of the JSON Simple API.
The current version of this PEP embeds the SHA-256 digest of the provenance
object instead. This is done for size and network bandwidth consideration
reasons:
1. We estimate the typical size of an attestation object to be approximately
5.3 KB of JSON.
2. We conservatively estimate that indices eventually host around 3 attestations
per release file, or approximately 15.9 KB of JSON per combined provenance
object.
3. As of May 2024, the average project on PyPI has approximately 21 release
files. We conservatively expect this average to increase over time.
4. Combined, these numbers imply that a typical project might expect to host
between 60 and 70 attestations, or approximately 339 KB of additional JSON
in its "project detail" endpoint.
These numbers are significantly worse in "pathological" cases, where projects
have hundreds or thousands of releases and/or dozens of files per release.
.. _appendix-4:
Appendix 4: Example attestation statement
=========================================
Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256
digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``,
the following is an appropriate in-toto Statement, as a JSON object:
.. code-block:: json
{
"_type": "https://in-toto.io/Statement/v1",
"subject": [
{
"name": "sampleproject-1.2.3.tar.gz",
"digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
}
],
"predicateType": "https://some-arbitrary-predicate.example.com/v1",
"predicate": {
"something-else": "foo"
}
}
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.