diff --git a/peps/pep-0740.rst b/peps/pep-0740.rst index ea2658547..28871fc87 100644 --- a/peps/pep-0740.rst +++ b/peps/pep-0740.rst @@ -24,8 +24,8 @@ These changes have two subcomponents: * Changes to the currently unstandardized PyPI upload API, allowing clients to upload digital attestations as :ref:`attestation objects `; -* Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients - to retrieve both digital attestations and +* Changes to the :ref:`HTML and JSON "simple" APIs `, + allowing clients to retrieve both digital attestations and `Trusted Publishing `_ metadata for individual release files as :ref:`provenance objects `. @@ -75,7 +75,7 @@ Additionally, this proposal identifies the following motivations: of the metadata needed by the index to verify an attestation's validity. This PEP proposes a generic attestation format, containing an -:ref:`attestation payload for signature generation `, +:ref:`attestation statement for signature generation `, with the expectation that index providers adopt the format with a suitable source of identity for signature verification, such as Trusted Publishing. @@ -116,8 +116,9 @@ areas of Python packaging: metadata within the cryptographic envelope. For example, to prevent domain separation between a distribution's name and - its contents, this PEP proposes that digital attestations be performed over - ``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``. + its contents, this PEP uses '`Statements `__' + from the `in-toto project `__ to bind the distribution's + contents (via SHA-256 digest) to its filename. Previous Work @@ -196,6 +197,9 @@ Index changes Simple Index ^^^^^^^^^^^^ +The following changes are made to the +:ref:`simple repository API `: + * When an uploaded file has one or more attestations, the index **MAY** provide a ``.provenance`` file adjacent to the hosted distribution. 
The format of the ``.provenance`` file **SHALL** be a JSON-encoded @@ -208,14 +212,14 @@ Simple Index * When a ``.provenance`` file is present, the index **MAY** include a ``data-provenance`` attribute on its file link. The value of the - ``data-provenance`` attribute **SHALL** be the SHA256 digest of the + ``data-provenance`` attribute **SHALL** be the SHA-256 digest of the associated ``.provenance`` file. * The index **MAY** choose to modify the ``.provenance`` file. For example, the index **MAY** permit adding additional attestations and verification materials, such as attestations from third-party auditors or other services. When the index modifies the ``.provenance`` file, it **MUST** also update the - ``data-provenance`` attribute's value to the new SHA256 digest. + ``data-provenance`` attribute's value to the new SHA-256 digest. See :ref:`changes-to-provenance-objects` for an additional discussion of reasons why a file's provenance may change. @@ -223,17 +227,19 @@ Simple Index JSON-based Simple API ^^^^^^^^^^^^^^^^^^^^^ +The following changes are made to the +:ref:`JSON simple API `: + * When an uploaded file has one or more attestations, the index **MAY** - include a ``provenance`` object in the ``file`` dictionary for that file. - The format of the ``provenance`` object **SHALL** be a JSON-encoded - :ref:`provenance object `, which **SHALL** contain - the file's attestations. + include a ``provenance`` key in the ``file`` dictionary for that file. -* The index **MAY** choose to modify the ``provenance`` object, under the same - conditions as the ``.provenance`` file specified above. + The value of the ``provenance`` key **SHALL** be a JSON string, which + **SHALL** be the SHA-256 digest of the associated ``.provenance`` file, + as in the Simple Index. - See :ref:`changes-to-provenance-objects` for an additional discussion of - reasons why a file's provenance may change. 
+ See :ref:`appendix-3` for an explanation of the technical decision to + embed the SHA-256 digest in the JSON API, rather than the full + :ref:`provenance object `. These changes require a version change to the JSON API: @@ -260,13 +266,28 @@ object is provided as pseudocode below. verification_material: VerificationMaterial """ - Cryptographic materials used to verify `message_signature`. + Cryptographic materials used to verify `envelope`. """ - message_signature: str + envelope: Envelope """ - The attestation's signature, as `base64(raw-sig)`, where `raw-sig` - is the raw bytes of the signing operation over the attestation payload. + The enveloped attestation statement and signature. + """ + + + @dataclass + class Envelope: + statement: bytes + """ + The attestation statement. + + This is represented as opaque bytes on the wire (encoded as base64), + but it MUST be a JSON in-toto v1 Statement. + """ + + signature: bytes + """ + A signature for the above statement, encoded as base64. """ @dataclass @@ -302,63 +323,36 @@ object) by selecting a new version number. .. _payload-and-signature-generation: -Attestation payload and signature generation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Attestation statement and signature generation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The *attestation payload* is the actual claim that is cryptographically signed -over within the attestation object (as the ``message_signature``). +The *attestation statement* is the actual claim that is cryptographically signed +over within the attestation object (i.e., the ``envelope.statement``). -The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object, -with the following pseudocode layout: +The attestation statement is encoded as a +`v1 in-toto Statement object `__, +in JSON form. When serialized, the statement is treated as an opaque binary blob, +avoiding the need for canonicalization. 
An example JSON-encoded statement is +provided in :ref:`appendix-4`. -.. code-block:: python +In addition to being a v1 in-toto Statement, the attestation statement is constrained +in the following ways: - @dataclass - class AttestationPayload: - distribution: str - """ - The file name of the Python package distribution. - """ +* The in-toto ``subject`` **MUST** contain only a single subject. +* ``subject[0].name`` is the distribution's filename, which **MUST** be + a valid :ref:`source distribution ` or + :ref:`wheel distribution ` filename. +* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests + **MAY** be present. The digests **MUST** be represented as hexadecimal strings. +* The following ``predicateType`` values are supported: - digest: str - """ - The SHA-256 digest of the distribution's contents, as a hexadecimal string. - """ + * `SLSA Provenance `__: ``https://slsa.dev/provenance/v1`` + * `PyPI Publish Attestation `__: ``https://docs.pypi.org/attestations/publish/v1`` -The value of ``distribution`` is the same distribution filename that appears -in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be -``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index -entry: - -.. code-block:: html - - sampleproject-1.2.0-py2.py3-none-any.whl
- -In practice, this means that ``distribution`` is defined by the PyPA's -living specifications for -:ref:`binary distributions ` and -:ref:`source distributions `, although -non-conforming distributions may be hosted by the index. - -The following pseudocode demonstrates the construction of an attestation -payload and its signature: - -.. code-block:: python - - def build_payload(dist: Path) -> AttestationPayload: - return AttestationPayload( - distribution=dist.name, - digest=sha256(dist.read_bytes()).hexdigest, - ) - - attestation_payload = build_payload("sampleproject-1.2.0-py2.py3-none-any.whl") - - # canonical_json is a fictitious module that performs RFC 8785 canonical - # JSON serialization. - encoded_payload = canonical_json.dumps(asdict(attestation_payload)) - - raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256())) - message_signature = b64encode(raw_signature) +The signature over this statement is constructed using the +`v1 DSSE signature protocol `__, +with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded +statement above. No other ``PAYLOAD_TYPE`` is permitted. .. _provenance-object: @@ -368,9 +362,8 @@ Provenance objects The index will serve uploaded attestations along with metadata that can assist in verifying them in the form of JSON serialized objects. -These *provenance objects* will be available via both the :pep:`503` Simple Index -and :pep:`691` JSON-based Simple API as described above, and will have the -following layout: +These *provenance objects* will be available via both the Simple Index +and JSON-based Simple API as described above, and will have the following layout: .. 
code-block:: json @@ -488,7 +481,8 @@ for changes to the provenance object include but are not limited to: Attestation verification ------------------------ -Verifying an attestation object requires verification of each of the following: +Verifying an attestation object against a distribution file requires verification of each of the +following: * ``version`` is ``1``. The verifier **MUST** reject any other version. * ``verification_material.certificate`` is a valid signing certificate, as @@ -497,9 +491,15 @@ Verifying an attestation object requires verification of each of the following: * ``verification_material.certificate`` identifies an appropriate signing subject, such as the machine identity of the Trusted Publisher that published the package. -* ``message_signature`` can be verified by ``verification_material.certificate``, - using the reconstructed attestation payload as the cleartext input. The - verifier **MUST** reconstruct the attestation payload itself. +* ``envelope.statement`` is a valid in-toto v1 Statement, with a subject + and digest that **MUST** match the distribution's filename and contents. + For the distribution's filename, matching **MUST** be performed by parsing + using the appropriate source distribution or wheel filename format, as + the statement's subject may be equivalent but normalized. +* ``envelope.signature`` is a valid signature for ``envelope.statement`` + corresponding to ``verification_material.certificate``, + as reconstituted via the + `v1 DSSE signature protocol `__. In addition to the above required steps, a verifier **MAY** additionally verify ``verification_material.transparency_entries`` on a policy basis, e.g. requiring @@ -543,19 +543,6 @@ unstated presumption with earlier mechanisms, like PGP and wheel signatures. This PEP does not preclude or exclude future index trust mechanisms, such as :pep:`458` and/or :pep:`480`. 
-Flexible attestations --------------------- - -This PEP specifies a fixed attestation payload (defined in -:ref:`payload-and-signature-generation`), binding the contents of each uploaded -file to its public name on the index. This payload format is fixed and -inflexible to ease implementation, and to minimize additional mechanical -changes to the index itself (e.g., needing to store and service detached -attestation documents). - -This PEP does not preclude or exclude future more flexible attestation payload -formats, such as ones built on `in-toto `__. - Recommendations =============== @@ -628,7 +615,7 @@ of signed inclusion time, and can be verified either online or offline. inclusion_proof: InclusionProof """ - The actual inclusion proof the the log entry. + The actual inclusion proof of the log entry. """ @@ -668,6 +655,58 @@ of signed inclusion time, and can be verified either online or offline. Cosigned checkpoints from zero or more log witnesses. """ +.. _appendix-3: + +Appendix 3: Simple JSON API size considerations +=============================================== + +A previous draft of this PEP required embedding each +:ref:`provenance object ` directly into its appropriate part +of the JSON Simple API. + +The current version of this PEP embeds the SHA-256 digest of the provenance +object instead. This is done for size and network bandwidth +reasons: + +1. We estimate the typical size of an attestation object to be approximately + 5.3 KB of JSON. +2. We conservatively estimate that indices eventually host around 3 attestations + per release file, or approximately 15.9 KB of JSON per combined provenance + object. +3. As of May 2024, the average project on PyPI has approximately 21 release + files. We conservatively expect this average to increase over time. +4. 
Combined, these numbers imply that a typical project might expect to host + between 60 and 70 attestations, or approximately 339 KB of additional JSON + in its "project detail" endpoint. + +These numbers are significantly worse in "pathological" cases, where projects +have hundreds or thousands of releases and/or dozens of files per release. + +.. _appendix-4: + +Appendix 4: Example attestation statement +========================================= + +Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256 +digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``, +the following is an appropriate in-toto Statement, as a JSON object: + +.. code-block:: json + + { + "_type": "https://in-toto.io/Statement/v1", + "subject": [ + { + "name": "sampleproject-1.2.3.tar.gz", + "digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"} + } + ], + "predicateType": "https://some-arbitrary-predicate.example.com/v1", + "predicate": { + "something-else": "foo" + } + } + Copyright =========