diff --git a/peps/pep-0740.rst b/peps/pep-0740.rst index 8fe927207..ea2658547 100644 --- a/peps/pep-0740.rst +++ b/peps/pep-0740.rst @@ -1,7 +1,8 @@ PEP: 740 Title: Index support for digital attestations Author: William Woodruff , - Facundo Tuesca + Facundo Tuesca , + Dustin Ingram Sponsor: Donald Stufft PEP-Delegate: Donald Stufft Discussions-To: https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498 @@ -22,16 +23,15 @@ package repository, such as PyPI. These changes have two subcomponents: * Changes to the currently unstandardized PyPI upload API, allowing clients - to upload digital attestations; + to upload digital attestations as :ref:`attestation objects `; * Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients to retrieve both digital attestations and `Trusted Publishing `_ metadata - for individual release files. + for individual release files as :ref:`provenance objects `. -This PEP does not recommend a specific digital attestation format, nor does -it make a policy recommendation around mandatory digital attestations on -release uploads or their subsequent verification by installing clients like -``pip``. +This PEP does not make a policy recommendation around mandatory digital +attestations on release uploads or their subsequent verification by installing +clients like ``pip``. Rationale and Motivation ======================== @@ -64,31 +64,21 @@ Additionally, this proposal identifies the following motivations: Digital attestations impose additional sophistication requirements: the attacker must be sufficiently sophisticated to access private signing material - (or signing identities). They also impose additional targeting requirements: - the release consistency requirement (mentioned below) means that the attacker - cannot upload *any* attestation, but only one of a type already seen for a - particular release. 
In the future, this could be further "ratcheted" down - by allowing project maintainers to disable releases without attestations - entirely. -* Release consistency: in the status quo, the only attestation provided by the + (or signing identities). +* Index verifiability: in the status quo, the only attestation provided by the index is an optional PGP signature per release file (see :ref:`PGP signatures `). These signatures are not - checked by the index either for well-formedness or for validity, since - the index has no mechanism for identifying the right public key for the - signature. + (and cannot be) checked by the index either for well-formedness or for + validity, since the index has no mechanism for identifying the right public + key for the signature. This PEP overcomes this limitation + by ensuring that :ref:`provenance objects ` contain all + of the metadata needed by the index to verify an attestation's validity. - Additionally, the index does not have an "all or none" requirement - for PGP signatures, meaning that there is no consistency requirement - between distributions within a release (and that maintainers may - accidentally forget to upload signatures when adding additional - release distributions). - -While this PEP does not recommend a specific digital attestation format, -it does recognize the utility of Trusted Publishing as a pre-existing, -"zero-configuration" source of strong provenance for Python packages. -Consequently this PEP includes a proposed scheme for exposing each release -file's Trusted Publisher metadata, with the expectation that a future digital -attestation format will likely make use of it. +This PEP proposes a generic attestation format, containing an +:ref:`attestation payload for signature generation `, +with the expectation that index providers adopt the +format with a suitable source of identity for signature verification, such as +Trusted Publishing. 
Design Considerations --------------------- @@ -126,19 +116,9 @@ areas of Python packaging: metadata within the cryptographic envelope. For example, to prevent domain separation between a distribution's name and - its contents, the digital attestation could be performed over + its contents, this PEP proposes that digital attestations be performed over ``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``. -5. Consistent release attestations: if a file belonging to a release has a - set of digital attestations, then all of the other files belonging to that - release should also have the same types of attestations. - - This simplifies the downstream use story for digital attestations, and - prevents potentially vulnerable "swiss cheese" release patterns (where - a verifier checks for a valid attestation on ``HolyGrail-1.0.tar.gz`` - but their installing client actually resolves an attacker-controlled, - platform-specific ``.whl`` instead). - Previous Work ------------- @@ -165,10 +145,6 @@ In their previously supported form on PyPI, PGP signatures satisfied considerations (1) and (3) above but not (2) (owing to the need for external keyservers and key distribution) or (4) (due to PGP signatures typically being constructed over just an input file, without any associated signed metadata). -Similarly, PyPI's historical implementation of PGP did not satisfy consideration -(5), due to a lack of consistency checks between different release files -(and an inability to perform those checks due to no access to the signer's -public key). Wheel signatures ^^^^^^^^^^^^^^^^ @@ -205,110 +181,354 @@ changes to it: * In addition to the current top-level ``content`` and ``gpg_signature`` fields, the index **SHALL** accept ``attestations`` as an additional multipart form field. -* The new ``attestations`` field **SHALL** be a JSON object. -* The JSON object **SHALL** have one or more keys, each identifying an - attestation format known to the index. 
If any key does not identify an - attestation format known to the index, the index **MUST** reject the upload. -* The value associated with each well-known key **SHALL** be a JSON object. -* Each attestation value **MUST** be verifiable by the index. If the index fails +* The new ``attestations`` field **SHALL** be a JSON array. +* The ``attestations`` array **SHALL** have one or more items, each a JSON object + representing an individual attestation. +* Each attestation object **MUST** be verifiable by the index. If the index fails to verify any attestation in ``attestations``, it **MUST** reject the upload. - -In addition to the above, the index **SHALL** enforce a consistency -policy for release attestations via the following: - -* If the first file under a new release is supplied with ``attestations``, - then all subsequently uploaded files under the same release **MUST** also - have ``attestations``. Conversely, if the first file under a new release - does not have any ``attestations``, then all subsequent uploads under the - same release **MUST NOT** have ``attestations``. -* All files under the same release **MUST** have the same set of well-known - attestation format keys. - -The index **MUST** reject any file upload that does not satisfy these -consistency properties. + The format of attestation objects is defined under :ref:`attestation-object` + and the process for verifying attestations is defined under + :ref:`attestation-verification`. Index changes ------------- -.. _provenance-object: - -Provenance objects -^^^^^^^^^^^^^^^^^^ - -The index will serve uploaded attestations along with metadata that can assist -in verifying them in the form of JSON serialized objects. - -These "provenance objects" will be available via both the :pep:`503` Simple Index -and :pep:`691` JSON-based Simple API as described below, and will have the -following structure: - -.. 
code-block:: json - - { - "publisher": { - "type": "important-ci-service", - "claims": {}, - "vendor-property": "foo", - "another-property": 123 - }, - "attestations": { - "some-attestation": {/* ... */}, - "another-attestation": {/* ... */} - } - } - -* ``publisher`` is an **optional** JSON object, containing a - representation of the file's Trusted Publisher configuration at the time - the file was uploaded to the package index. The keys within the ``publisher`` - object are specific to each Trusted Publisher but include, at minimum: - - * A ``type`` key, which **MUST** be a JSON string that uniquely identifies the - kind of Trusted Publisher. - * A ``claims`` key, which **MUST** be a JSON object containing any context-specific - claims retained by the index during Trusted Publisher authentication. - - All other keys in the ``publisher`` object are publisher-specific. A full - illustrative example of a ``publisher`` object is provided in :ref:`appendix-2`. -* ``attestations`` is a **required** JSON object, containing one or - more attestation objects as identified by their keys. This object is - a superset of ``attestations`` object supplied by the uploader through the - ``attestations`` field, as described in :ref:`upload-endpoint`. - - Because ``attestations`` is a superset of the file's original uploaded attestations, - the index **MAY** chose to embed additional attestations of its own. - Simple Index ^^^^^^^^^^^^ -* When an uploaded file has one or more attestations, the index **MAY** include a - ``data-provenance`` attribute on its file link, with a value of either - ``true`` or ``false``. -* When ``data-provenance`` is ``true``, the index **MUST** serve a - :ref:`provenance object ` at the same URL, but with - ``.provenance`` appended to it. For example, if ``HolyGrail-1.0.tar.gz`` - exists and has associated attestations, those attestations would be located - within the provenance object hosted at ``HolyGrail-1.0.tar.gz.provenance``. 
+* When an uploaded file has one or more attestations, the index **MAY** + provide a ``.provenance`` file adjacent to the hosted distribution. + The format of the ``.provenance`` file **SHALL** be a JSON-encoded + :ref:`provenance object <provenance-object>`, which **SHALL** contain + the file's attestations. + + For example, if an uploaded file is hosted at + the URL ``https://example.com/sampleproject-1.2.3.tar.gz``, the provenance + URL would be ``https://example.com/sampleproject-1.2.3.tar.gz.provenance``. + +* When a ``.provenance`` file is present, the index **MAY** include a + ``data-provenance`` attribute on its file link. The value of the + ``data-provenance`` attribute **SHALL** be the SHA256 digest of the + associated ``.provenance`` file. + +* The index **MAY** choose to modify the ``.provenance`` file. For example, + the index **MAY** permit adding additional attestations and verification + materials, such as attestations from third-party auditors or other services. + When the index modifies the ``.provenance`` file, it **MUST** also update the + ``data-provenance`` attribute's value to the new SHA256 digest. + + See :ref:`changes-to-provenance-objects` for an additional discussion of + reasons why a file's provenance may change. JSON-based Simple API ^^^^^^^^^^^^^^^^^^^^^ -* When an uploaded file has one or more attestations, the index **MAY** include a - ``provenance`` object in the ``file`` dictionary for that file. -* ``provenance``, when present, **MUST** be a :ref:`provenance object <provenance-object>`. +* When an uploaded file has one or more attestations, the index **MAY** + include a ``provenance`` object in the ``file`` dictionary for that file. + The format of the ``provenance`` object **SHALL** be a JSON-encoded + :ref:`provenance object <provenance-object>`, which **SHALL** contain + the file's attestations. + +* The index **MAY** choose to modify the ``provenance`` object, under the same + conditions as the ``.provenance`` file specified above. 
+ + See :ref:`changes-to-provenance-objects` for an additional discussion of + reasons why a file's provenance may change. These changes require a version change to the JSON API: -* The ``api-version`` must specify version 1.2 or later. +* The ``api-version`` **SHALL** specify version 1.2 or later. + +.. _attestation-object: + +Attestation objects +------------------- + +An attestation object is a JSON object with several required keys; applications +or signers may include additional keys so long as all explicitly +listed keys are provided. The required layout of an attestation +object is provided as pseudocode below. + +.. code-block:: python + + @dataclass + class Attestation: + version: Literal[1] + """ + The attestation object's version, which is always 1. + """ + + verification_material: VerificationMaterial + """ + Cryptographic materials used to verify `message_signature`. + """ + + message_signature: str + """ + The attestation's signature, as `base64(raw-sig)`, where `raw-sig` + is the raw bytes of the signing operation over the attestation payload. + """ + + @dataclass + class VerificationMaterial: + certificate: str + """ + The signing certificate, as `base64(DER(cert))`. + """ + + transparency_entries: list[object] + """ + One or more transparency log entries for this attestation's signature + and certificate. + """ + +A full data model for each object in ``transparency_entries`` is provided in +:ref:`appendix-2`. Attestation objects **SHOULD** include one or more +transparency log entries, and **MAY** include additional keys for other +sources of signed time (such as an :rfc:`3161` Time Stamping Authority or a +`Roughtime `__ server). + +Attestation objects are versioned; this PEP specifies version 1. Each version +is tied to a single cryptographic suite to minimize unnecessary cryptographic +agility. In version 1, the suite is as follows: + +* Certificates are specified as X.509 certificates, and comply with the + profile in :rfc:`5280`. 
+* The message signature algorithm is ECDSA, with the P-256 curve for public keys + and SHA-256 as the cryptographic digest function. + +Future PEPs may change this suite (and the overall shape of the attestation +object) by selecting a new version number. + +.. _payload-and-signature-generation: + +Attestation payload and signature generation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The *attestation payload* is the actual claim that is cryptographically signed +over within the attestation object (as the ``message_signature``). + +The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object, +with the following pseudocode layout: + +.. code-block:: python + + @dataclass + class AttestationPayload: + distribution: str + """ + The file name of the Python package distribution. + """ + + digest: str + """ + The SHA-256 digest of the distribution's contents, as a hexadecimal string. + """ + +The value of ``distribution`` is the same distribution filename that appears +in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be +``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index +entry: + +.. code-block:: html + + sampleproject-1.2.0-py2.py3-none-any.whl
+ +In practice, this means that ``distribution`` is defined by the PyPA's +living specifications for +:ref:`binary distributions ` and +:ref:`source distributions `, although +non-conforming distributions may be hosted by the index. + +The following pseudocode demonstrates the construction of an attestation +payload and its signature: + +.. code-block:: python + + def build_payload(dist: Path) -> AttestationPayload: + return AttestationPayload( + distribution=dist.name, + digest=sha256(dist.read_bytes()).hexdigest(), + ) + + attestation_payload = build_payload(Path("sampleproject-1.2.0-py2.py3-none-any.whl")) + + # canonical_json is a fictitious module that performs RFC 8785 canonical + # JSON serialization. + encoded_payload = canonical_json.dumps(asdict(attestation_payload)) + + raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256())) + message_signature = b64encode(raw_signature) + +.. _provenance-object: + +Provenance objects +------------------ + +The index will serve uploaded attestations along with metadata that can assist +in verifying them in the form of JSON serialized objects. + +These *provenance objects* will be available via both the :pep:`503` Simple Index +and :pep:`691` JSON-based Simple API as described above, and will have the +following layout: + +.. code-block:: json + + { + "version": 1, + "attestation_bundles": [ + { + "publisher": { + "kind": "important-ci-service", + "claims": {}, + "vendor-property": "foo", + "another-property": 123 + }, + "attestations": [ + { /* attestation 1 ... */ }, + { /* attestation 2 ... */ } + ] + } + ] + } + +or, as pseudocode: + +.. code-block:: python + + @dataclass + class Publisher: + kind: str + """ + The kind of Trusted Publisher. + """ + + claims: object | None + """ + Any context-specific claims retained by the index during Trusted Publisher + authentication. 
+ """ + + _rest: object + """ + Each publisher object is open-ended, meaning that it MAY contain additional + fields beyond the ones specified explicitly above. This field signals that, + but is not itself present. + """ + + @dataclass + class AttestationBundle: + publisher: Publisher + """ + The publisher associated with this set of attestations. + """ + + attestations: list[Attestation] + """ + The set of attestations included in this bundle. + """ + + @dataclass + class Provenance: + version: Literal[1] + """ + The provenance object's version, which is always 1. + """ + + attestation_bundles: list[AttestationBundle] + """ + One or more attestation "bundles". + """ + +* ``version`` is ``1``. Like attestation objects, provenance objects are + versioned, and this PEP only defines version ``1``. +* ``attestation_bundles`` is a **required** JSON array, containing one + or more "bundles" of attestations. Each bundle corresponds to a + signing identity (such as a Trusted Publishing identity), and contains + one or more attestation objects. + + As noted in the ``Publisher`` model, + each ``AttestationBundle.publisher`` object is specific to its Trusted Publisher + but must include at minimum: + + * A ``kind`` key, which **MUST** be a JSON string that uniquely identifies the + kind of Trusted Publisher. + * A ``claims`` key, which **MUST** be a JSON object containing any context-specific + claims retained by the index during Trusted Publisher authentication. + + All other keys in the publisher object are publisher-specific. A full + illustrative example of a publisher object is provided in :ref:`appendix-1`. + + Each array of attestation objects is a superset of the ``attestations`` + array supplied by the uploader through the ``attestations`` field at upload + time, as described in :ref:`upload-endpoint` and + :ref:`changes-to-provenance-objects`. + +.. 
_changes-to-provenance-objects: + +Changes to provenance objects +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Provenance objects are *not* immutable, and may change over time. Reasons +for changes to the provenance object include but are not limited to: + +* Addition of new attestations for a pre-existing signing identity: the index + **MAY** choose to allow additional attestations by pre-existing signing + identities, such as newer attestation versions for already uploaded + files. + +* Addition of new signing identities and associated attestations: the index + **MAY** choose to support attestations from sources other than the file's + uploader, such as third-party auditors or the index itself. These attestations + may be performed asynchronously, requiring the index to insert them into + the provenance object *post facto*. + +.. _attestation-verification: + +Attestation verification +------------------------ + +Verifying an attestation object requires verification of each of the following: + +* ``version`` is ``1``. The verifier **MUST** reject any other version. +* ``verification_material.certificate`` is a valid signing certificate, as + issued by an *a priori* trusted authority (such as a root of trust already + present within the verifying client). +* ``verification_material.certificate`` identifies an appropriate signing + subject, such as the machine identity of the Trusted Publisher that published + the package. +* ``message_signature`` can be verified by ``verification_material.certificate``, + using the reconstructed attestation payload as the cleartext input. The + verifier **MUST** reconstruct the attestation payload itself. + +In addition to the above required steps, a verifier **MAY** additionally verify +``verification_material.transparency_entries`` on a policy basis, e.g. requiring +at least one transparency log entry or a threshold of entries. 
When verifying +transparency entries, the verifier **MUST** confirm that the inclusion time for +each entry lies within the signing certificate's validity period. Security Implications ===================== -This PEP is "mechanical" in nature; it provides only the plumbing for future -digital attestations on package indices, without specifying their concrete -cryptographic details. +This PEP is primarily "mechanical" in nature; it provides layouts for +structuring and serving verifiable digital attestations without specifying +higher level security "policies" around attestation validity, thresholds +between attestations, and so forth. -As such, we do not identify any positive or negative security implications -for this PEP. +Cryptographic agility in attestations +------------------------------------- + +Algorithmic agility is a common source of exploitable vulnerabilities +in cryptographic schemes. This PEP limits algorithmic agility in two ways: + +* All algorithms are specified in a single suite, rather than a geometric + collection of parameters. This makes it impossible (for example) for an + attacker to select a strong signature algorithm with a weak hash function, + compromising the scheme as a whole. +* Attestation objects are versioned, and may only contain the algorithmic + suite specified for their version. If a specific suite + is considered insecure in the future, clients may choose to blanket reject + or qualify verifications of attestations that contain that suite. Index trust ----------- @@ -318,82 +538,44 @@ the index is still effectively trusted to honestly deliver unmodified package distributions, since a dishonest index capable of modifying package contents could also dishonestly modify or omit package attestations. As a result, this PEP's presumption of index trust is equivalent to the -unstated presumption with earlier mechanisms, like PGP and Wheel signatures. +unstated presumption with earlier mechanisms, like PGP and wheel signatures. 
This PEP does not preclude or exclude future index trust mechanisms, such as :pep:`458` and/or :pep:`480`. +Flexible attestations +--------------------- + +This PEP specifies a fixed attestation payload (defined in +:ref:`payload-and-signature-generation`), binding the contents of each uploaded +file to its public name on the index. This payload format is fixed and +inflexible to ease implementation, and to minimize additional mechanical +changes to the index itself (e.g., needing to store and service detached +attestation documents). + +This PEP does not preclude or exclude future more flexible attestation payload +formats, such as ones built on `in-toto `__. + Recommendations =============== -This PEP does not recommend specific attestation formats. It does, -however, make the following recommendations to package indices seeking -to create new or implement pre-existing attestation formats: - -1. Consult the :ref:`living PyPA specifications ` - first, and determine if any currently defined attestation formats suit - your purpose. -2. If no suitable attestation format is defined under the PyPA specifications, - consider submitting it to the PyPA specifications for longevity and reuse - purposes. - -When designing a new attestation format, we make the following recommendations: - -1. Pick a short, but unique name for your attestation format; this name will - serve as the attestation's identifier in the upload and index APIs. - - When appropriate for an attestation format, we recommend using ``:`` as a - domain separator. For example, an attestation format that provides publish - provenance using `Sigstore `_ might have the - name ``sigstore:publish``. -2. 
Prefer parsimony in your format: avoid optional fields and functionality, - avoid unnecessary cryptographic agility and message malleability, and ensure - that verifying the attestation communicates something meaningful beyond a - basic integrity check (since the index itself already supplies cryptographic - digests for this purpose). +This PEP recommends, but does not mandate, that attestation objects +contain one or more verifiable sources of signed time that corroborate the +signing certificate's claimed validity period. Indices that implement this +PEP may choose to strictly enforce this requirement. .. _appendix-1: -Appendix 1: Example Uploaded Attestations -========================================= - -This appendix provides a fictional example of the ``attestations`` field -submitted on file upload, with two fictional attestations (``publish`` and -``timestamp``): - -.. code-block:: json - - { - "publish": { - "mediaType": "application/vnd.dev.sigstore.bundle+json;version=0.2", - "verificationMaterial": { /* omitted for brevity */ }, - "messageSignature": { - "messageDigest": { - "algorithm": "some-hash-algo", - "digest": "digest-here" - }, - "signature": "signature-here" - } - }, - "timestamp": { - "cms": "some-long-blob-here" - } - } - -The payloads of these fictional attestations are purely illustrative. - -.. _appendix-2: - -Appendix 2: Example Trusted Publisher Representation +Appendix 1: Example Trusted Publisher Representation ==================================================== This appendix provides a fictional example of a ``publisher`` key within -a :pep:`691` ``project.files[].provenance`` listing: +a simple JSON API ``project.files[].provenance`` listing: .. code-block:: json "publisher": { - "type": "GitHub", + "kind": "GitHub", "claims": { "ref": "refs/tags/v1.0.0", "sha": "da39a3ee5e6b4b0d3255bfef95601890afd80709" @@ -405,6 +587,87 @@ a :pep:`691` ``project.files[].provenance`` listing: "environment": null } + +.. 
_appendix-2: + +Appendix 2: Data models for Transparency Log Entries +==================================================== + +This appendix contains pseudocoded data models for transparency log entries +in attestation objects. Each transparency log entry serves as a source +of signed inclusion time, and can be verified either online or offline. + +.. code-block:: python + + @dataclass + class TransparencyLogEntry: + log_index: int + """ + The global index of the log entry, used when querying the log. + """ + + log_id: str + """ + An opaque, unique identifier for the log. + """ + + entry_kind: str + """ + The kind (type) of log entry. + """ + + entry_version: str + """ + The version of the log entry's submitted format. + """ + + integrated_time: int + """ + The UNIX timestamp from the log from when the entry was persisted. + """ + + inclusion_proof: InclusionProof + """ + The actual inclusion proof of the log entry. + """ + + + @dataclass + class InclusionProof: + log_index: int + """ + The index of the entry in the tree it was written to. + """ + + root_hash: str + """ + The digest stored at the root of the Merkle tree at the time of proof + generation. + """ + + tree_size: int + """ + The size of the Merkle tree at the time of proof generation. + """ + + hashes: list[str] + """ + A list of hashes required to complete the inclusion proof, sorted + in order from leaf to root. The leaf and root hashes are not themselves + included in this list; the root is supplied via `root_hash` and the client + must calculate the leaf hash. + """ + + checkpoint: str + """ + The signed tree head's signature, at the time of proof generation. + """ + + cosigned_checkpoints: list[str] + """ + Cosigned checkpoints from zero or more log witnesses. + """ + Copyright =========