PEP 740: tweak JSON simple API prescriptions (#3768)

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Facundo Tuesca <facundo.tuesca@trailofbits.com>
William Woodruff 2024-06-12 16:08:35 -04:00 committed by GitHub
parent 764f563338
commit 67631c3428
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed files with 130 additions and 91 deletions


@@ -24,8 +24,8 @@ These changes have two subcomponents:
 * Changes to the currently unstandardized PyPI upload API, allowing clients
 to upload digital attestations as :ref:`attestation objects <attestation-object>`;
-* Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients
-to retrieve both digital attestations and
+* Changes to the :ref:`HTML and JSON "simple" APIs <packaging:simple-repository-api>`,
+allowing clients to retrieve both digital attestations and
 `Trusted Publishing <https://docs.pypi.org/trusted-publishers/>`_ metadata
 for individual release files as :ref:`provenance objects <provenance-object>`.
@@ -75,7 +75,7 @@ Additionally, this proposal identifies the following motivations:
 of the metadata needed by the index to verify an attestation's validity.
 This PEP proposes a generic attestation format, containing an
-:ref:`attestation payload for signature generation <payload-and-signature-generation>`,
+:ref:`attestation statement for signature generation <payload-and-signature-generation>`,
 with the expectation that index providers adopt the
 format with a suitable source of identity for signature verification, such as
 Trusted Publishing.
@@ -116,8 +116,9 @@ areas of Python packaging:
 metadata within the cryptographic envelope.
 For example, to prevent domain separation between a distribution's name and
-its contents, this PEP proposes that digital attestations be performed over
-``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``.
+its contents, this PEP uses '`Statements <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__'
+from the `in-toto project <https://in-toto.io/>`__ to bind the distribution's
+contents (via SHA-256 digest) to its filename.
 Previous Work
@@ -196,6 +197,9 @@ Index changes
 Simple Index
 ^^^^^^^^^^^^
+The following changes are made to the
+:ref:`simple repository API <packaging:simple-repository-api-base>`:
 * When an uploaded file has one or more attestations, the index **MAY**
 provide a ``.provenance`` file adjacent to the hosted distribution.
 The format of the ``.provenance`` file **SHALL** be a JSON-encoded
@@ -208,14 +212,14 @@ Simple Index
 * When a ``.provenance`` file is present, the index **MAY** include a
 ``data-provenance`` attribute on its file link. The value of the
-``data-provenance`` attribute **SHALL** be the SHA256 digest of the
+``data-provenance`` attribute **SHALL** be the SHA-256 digest of the
 associated ``.provenance`` file.
 * The index **MAY** choose to modify the ``.provenance`` file. For example,
 the index **MAY** permit adding additional attestations and verification
 materials, such as attestations from third-party auditors or other services.
 When the index modifies the ``.provenance`` file, it **MUST** also update the
-``data-provenance`` attribute's value to the new SHA256 digest.
+``data-provenance`` attribute's value to the new SHA-256 digest.
 See :ref:`changes-to-provenance-objects` for an additional discussion of
 reasons why a file's provenance may change.
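Seen from the index's side, the rule that ``data-provenance`` must track the current ``.provenance`` file can be sketched as follows. This is a minimal illustration, not PyPI's implementation. The helper names are hypothetical, the provenance bytes are placeholders, and the sketch assumes the digest is written as a lowercase hexadecimal string, which these hunks do not specify.

```python
import hashlib
import html
from typing import Optional


def provenance_digest(provenance_bytes: bytes) -> str:
    """SHA-256 of the .provenance file's exact bytes (hex encoding assumed)."""
    return hashlib.sha256(provenance_bytes).hexdigest()


def render_file_link(url: str, filename: str, provenance_bytes: Optional[bytes]) -> str:
    """Render one Simple Index file link (hypothetical helper).

    The attribute is recomputed from the current .provenance bytes on every
    render, so modifying the file (e.g. adding a third-party attestation)
    cannot leave a stale digest behind.
    """
    attrs = f'href="{html.escape(url)}"'
    if provenance_bytes is not None:
        attrs += f' data-provenance="{provenance_digest(provenance_bytes)}"'
    return f"<a {attrs}>{html.escape(filename)}</a>"


url = "https://example.com/sampleproject-1.2.0.tar.gz"
before = render_file_link(url, "sampleproject-1.2.0.tar.gz", b"first provenance document")
after = render_file_link(url, "sampleproject-1.2.0.tar.gz", b"modified provenance document")
```

Because the attribute is derived rather than stored, the **MUST** in the second bullet holds by construction.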
@@ -223,17 +227,19 @@ Simple Index
 JSON-based Simple API
 ^^^^^^^^^^^^^^^^^^^^^
+The following changes are made to the
+:ref:`JSON simple API <packaging:simple-repository-api-json>`:
 * When an uploaded file has one or more attestations, the index **MAY**
-include a ``provenance`` object in the ``file`` dictionary for that file.
-The format of the ``provenance`` object **SHALL** be a JSON-encoded
-:ref:`provenance object <provenance-object>`, which **SHALL** contain
-the file's attestations.
-* The index **MAY** choose to modify the ``provenance`` object, under the same
-conditions as the ``.provenance`` file specified above.
+include a ``provenance`` key in the ``file`` dictionary for that file.
+The value of the ``provenance`` key **SHALL** be a JSON string, which
+**SHALL** be the SHA-256 digest of the associated ``.provenance`` file,
+as in the Simple Index.
-See :ref:`changes-to-provenance-objects` for an additional discussion of
-reasons why a file's provenance may change.
+See :ref:`appendix-3` for an explanation of the technical decision to
+embed the SHA-256 digest in the JSON API, rather than the full
+:ref:`provenance object <provenance-object>`.
 These changes require a version change to the JSON API:
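The new ``provenance`` key carries only a digest, so a client has to fetch the ``.provenance`` file separately and check it against that digest. The sketch below makes several assumptions:

* The ``files`` entry is PEP 691-style, with ``url`` and ``filename`` keys.
* The ``.provenance`` file lives at the distribution URL with ``.provenance`` appended, which is one reading of "adjacent".
* The digest is hex-encoded.

The helper names are hypothetical, and the network fetch is left to the caller.

```python
import hashlib
import json
from typing import Optional


def provenance_url(file_entry: dict) -> Optional[str]:
    """Return the assumed .provenance URL, or None if no digest is advertised."""
    if file_entry.get("provenance") is None:
        return None
    return file_entry["url"] + ".provenance"


def check_provenance(file_entry: dict, provenance_bytes: bytes) -> dict:
    """Verify fetched .provenance bytes against the JSON API digest, then parse them."""
    expected = file_entry["provenance"]
    actual = hashlib.sha256(provenance_bytes).hexdigest()
    if actual != expected:
        raise ValueError(f"provenance digest mismatch: {actual} != {expected}")
    return json.loads(provenance_bytes)
```

A client that finds ``provenance`` set would call ``provenance_url``, download the file, and pass the bytes to ``check_provenance``. Keeping only the digest in the project detail response is what keeps that response small.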
@@ -260,13 +266,28 @@ object is provided as pseudocode below.
         verification_material: VerificationMaterial
         """
-        Cryptographic materials used to verify `message_signature`.
+        Cryptographic materials used to verify `envelope`.
         """
-        message_signature: str
+        envelope: Envelope
         """
-        The attestation's signature, as `base64(raw-sig)`, where `raw-sig`
-        is the raw bytes of the signing operation over the attestation payload.
+        The enveloped attestation statement and signature.
+        """
+    @dataclass
+    class Envelope:
+        statement: bytes
+        """
+        The attestation statement.
+        This is represented as opaque bytes on the wire (encoded as base64),
+        but it MUST be an JSON in-toto v1 Statement.
+        """
+        signature: bytes
+        """
+        A signature for the above statement, encoded as base64.
         """
     @dataclass
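The new ``Envelope`` type holds raw bytes that travel as base64. Below is a minimal round-trip sketch that extends the pseudocode with hypothetical ``to_wire``, ``from_wire`` and ``parse_statement`` helpers. It assumes the wire form is a JSON object whose ``statement`` and ``signature`` members are standard, padded base64 strings; the hunk does not say which base64 alphabet is used.

```python
import base64
import json
from dataclasses import dataclass

IN_TOTO_STATEMENT_V1 = "https://in-toto.io/Statement/v1"


@dataclass
class Envelope:
    statement: bytes  # opaque bytes; MUST be a JSON in-toto v1 Statement
    signature: bytes  # signature over the statement

    def to_wire(self) -> dict:
        """Encode both byte fields as base64 text for JSON serialization."""
        return {
            "statement": base64.b64encode(self.statement).decode("ascii"),
            "signature": base64.b64encode(self.signature).decode("ascii"),
        }

    @classmethod
    def from_wire(cls, obj: dict) -> "Envelope":
        """Decode base64 fields back into raw bytes, rejecting malformed input."""
        return cls(
            statement=base64.b64decode(obj["statement"], validate=True),
            signature=base64.b64decode(obj["signature"], validate=True),
        )

    def parse_statement(self) -> dict:
        """Read the opaque bytes as JSON; no canonicalization step is involved."""
        statement = json.loads(self.statement)
        if statement.get("_type") != IN_TOTO_STATEMENT_V1:
            raise ValueError("statement is not an in-toto v1 Statement")
        return statement
```

The signature is computed over these exact bytes, never over a re-serialized form, so keeping ``statement`` as opaque bytes is what removes the need for RFC 8785 canonical JSON.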
@@ -302,63 +323,36 @@ object) by selecting a new version number.
 .. _payload-and-signature-generation:
-Attestation payload and signature generation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Attestation statement and signature generation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The *attestation payload* is the actual claim that is cryptographically signed
-over within the attestation object (as the ``message_signature``).
-The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object,
-with the following pseudocode layout:
-.. code-block:: python
-    @dataclass
-    class AttestationPayload:
-        distribution: str
-        """
-        The file name of the Python package distribution.
-        """
-        digest: str
-        """
-        The SHA-256 digest of the distribution's contents, as a hexadecimal string.
-        """
-The value of ``distribution`` is the same distribution filename that appears
-in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be
-``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index
-entry:
-.. code-block:: html
-    <a href="https://example.com/...">sampleproject-1.2.0-py2.py3-none-any.whl</a><br/>
-In practice, this means that ``distribution`` is defined by the PyPA's
-living specifications for
-:ref:`binary distributions <packaging:binary-distribution-format>` and
-:ref:`source distributions <packaging:source-distribution-format>`, although
-non-conforming distributions may be hosted by the index.
-The following pseudocode demonstrates the construction of an attestation
-payload and its signature:
-.. code-block:: python
-    def build_payload(dist: Path) -> AttestationPayload:
-        return AttestationPayload(
-            distribution=dist.name,
-            digest=sha256(dist.read_bytes()).hexdigest,
-        )
-    attestation_payload = build_payload("sampleproject-1.2.0-py2.py3-none-any.whl")
-    # canonical_json is a fictitious module that performs RFC 8785 canonical
-    # JSON serialization.
-    encoded_payload = canonical_json.dumps(asdict(attestation_payload))
-    raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256()))
-    message_signature = b64encode(raw_signature)
+The *attestation statement* is the actual claim that is cryptographically signed
+over within the attestation object (i.e., the ``envelope.statement``).
+The attestation statement is encoded as a
+`v1 in-toto Statement object <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__,
+in JSON form. When serialized the statement is treated as an opaque binary blob,
+avoiding the need for canonicalization. An example JSON-encoded statement is
+provided in :ref:`appendix-4`.
+In addition to being a v1 in-toto Statement, the attestation statement is constrained
+in the following ways:
+* The in-toto ``subject`` **MUST** contain only a single subject.
+* ``subject[0].name`` is the distribution's filename, which **MUST** be
+a valid :ref:`source distribution <packaging:source-distribution-format>` or
+:ref:`wheel distribution <packaging:binary-distribution-format>` filename.
+* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests
+**MAY** be present. The digests **MUST** be represented as hexadecimal strings.
+* The following ``predicateType`` values are supported:
+  * `SLSA Provenance <https://slsa.dev/provenance/v1>`__: ``https://slsa.dev/provenance/v1``
+  * `PyPI Publish Attestation <https://docs.pypi.org/attestations/publish/v1>`__: ``https://docs.pypi.org/attestations/publish/v1``
+The signature over this statement is constructed using the
+`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__,
+with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded
+statement above. No other ``PAYLOAD_TYPE`` is permitted.
 .. _provenance-object:
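The new statement and signing rules can be made concrete with a short sketch. It builds a single-subject in-toto v1 Statement for a distribution and computes the DSSE v1 pre-authentication encoding (PAE), which is the byte string a signer actually signs. The function names and the use of an empty placeholder predicate are illustrative assumptions. Producing the signature itself, for example with a Sigstore-issued key, is out of scope.

```python
import hashlib
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"
SUPPORTED_PREDICATE_TYPES = {
    "https://slsa.dev/provenance/v1",
    "https://docs.pypi.org/attestations/publish/v1",
}


def build_statement(filename: str, contents: bytes, predicate_type: str, predicate: dict) -> bytes:
    """Build a JSON in-toto v1 Statement with exactly one subject."""
    if predicate_type not in SUPPORTED_PREDICATE_TYPES:
        raise ValueError(f"unsupported predicateType: {predicate_type}")
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {"name": filename, "digest": {"sha256": hashlib.sha256(contents).hexdigest()}}
        ],
        "predicateType": predicate_type,
        "predicate": predicate,
    }
    # The bytes are treated as an opaque blob, so any valid JSON encoding
    # works; no RFC 8785 canonicalization is required.
    return json.dumps(statement).encode("utf-8")


def dsse_pae(payload_type: str, payload_body: bytes) -> bytes:
    """DSSE v1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join(
        [
            b"DSSEv1",
            str(len(type_bytes)).encode("ascii"),  # decimal byte length
            type_bytes,
            str(len(payload_body)).encode("ascii"),
            payload_body,
        ]
    )


# Placeholder inputs: empty file contents and an empty predicate.
statement_bytes = build_statement(
    "sampleproject-1.2.3.tar.gz", b"", "https://docs.pypi.org/attestations/publish/v1", {}
)
to_sign = dsse_pae(PAYLOAD_TYPE, statement_bytes)
```

After signing, ``statement_bytes`` becomes ``envelope.statement`` and the signature over ``to_sign`` becomes ``envelope.signature``. A verifier recomputes the PAE from the received statement bytes rather than trusting any re-encoding.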
@@ -368,9 +362,8 @@ Provenance objects
 The index will serve uploaded attestations along with metadata that can assist
 in verifying them in the form of JSON serialized objects.
-These *provenance objects* will be available via both the :pep:`503` Simple Index
-and :pep:`691` JSON-based Simple API as described above, and will have the
-following layout:
+These *provenance objects* will be available via both the Simple Index
+and JSON-based Simple API as described above, and will have the following layout:
 .. code-block:: json
@@ -488,7 +481,8 @@ for changes to the provenance object include but are not limited to:
 Attestation verification
 ------------------------
-Verifying an attestation object requires verification of each of the following:
+Verifying an attestation object against a distribution file requires verification of each of the
+following:
 * ``version`` is ``1``. The verifier **MUST** reject any other version.
 * ``verification_material.certificate`` is a valid signing certificate, as
@@ -497,9 +491,15 @@ Verifying an attestation object requires verification of each of the following:
 * ``verification_material.certificate`` identifies an appropriate signing
 subject, such as the machine identity of the Trusted Publisher that published
 the package.
-* ``message_signature`` can be verified by ``verification_material.certificate``,
-using the reconstructed attestation payload as the cleartext input. The
-verifier **MUST** reconstruct the attestation payload itself.
+* ``envelope.statement`` is a valid in-toto v1 Statement, with a subject
+and digest that **MUST** match the distribution's filename and contents.
+For the distribution's filename, matching **MUST** be performed by parsing
+using the appropriate source distribution or wheel filename format, as
+the statement's subject may be equivalent but normalized.
+* ``envelope.signature`` is a valid signature for ``envelope.statement``
+corresponding to ``verification_material.certificate``,
+as reconstituted via the
+`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__.
 In addition to the above required steps, a verifier **MAY** additionally verify
 ``verification_material.transparency_entries`` on a policy basis, e.g. requiring
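The subject-matching step can be sketched independently of certificate and DSSE signature checks, both of which are omitted here. This is a simplified, hypothetical helper. It normalizes only the project-name component, using the PEP 503 normalization rule, and compares version and tag components literally. A real verifier should use a complete wheel and sdist filename parser, such as the one in the ``packaging`` library.

```python
import hashlib
import json
import re

IN_TOTO_STATEMENT_V1 = "https://in-toto.io/Statement/v1"


def normalize_name(name: str) -> str:
    """PEP 503 project-name normalization (runs of -, _, . become one -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_key(filename: str) -> tuple:
    """Much-simplified split of a wheel or sdist filename into comparable parts."""
    if filename.endswith(".whl"):
        name, *rest = filename[: -len(".whl")].split("-")
        return ("wheel", normalize_name(name), tuple(rest))
    if filename.endswith(".tar.gz"):
        name, sep, version = filename[: -len(".tar.gz")].rpartition("-")
        if sep:
            return ("sdist", normalize_name(name), (version,))
    raise ValueError(f"unrecognized distribution filename: {filename!r}")


def check_subject(statement_bytes: bytes, filename: str, contents: bytes) -> None:
    """Raise ValueError unless the statement's only subject matches the file."""
    statement = json.loads(statement_bytes)
    if statement.get("_type") != IN_TOTO_STATEMENT_V1:
        raise ValueError("not an in-toto v1 Statement")
    subjects = statement.get("subject", [])
    if len(subjects) != 1:
        raise ValueError("expected exactly one subject")
    digest = subjects[0].get("digest", {}).get("sha256")
    if digest != hashlib.sha256(contents).hexdigest():
        raise ValueError("SHA-256 digest does not match the distribution contents")
    if filename_key(subjects[0]["name"]) != filename_key(filename):
        raise ValueError("subject name does not match the distribution filename")
```

Parsing before comparing is what lets a subject named ``Sample_Project-1.2.3.tar.gz`` match a hosted ``sample-project-1.2.3.tar.gz``. A plain string comparison would reject that pair.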
@@ -543,19 +543,6 @@ unstated presumption with earlier mechanisms, like PGP and wheel signatures.
 This PEP does not preclude or exclude future index trust mechanisms, such
 as :pep:`458` and/or :pep:`480`.
-Flexible attestations
----------------------
-This PEP specifies a fixed attestation payload (defined in
-:ref:`payload-and-signature-generation`), binding the contents of each uploaded
-file to its public name on the index. This payload format is fixed and
-inflexible to ease implementation, and to minimize additional mechanical
-changes to the index itself (e.g., needing to store and service detached
-attestation documents).
-This PEP does not preclude or exclude future more flexible attestation payload
-formats, such as ones built on `in-toto <https://in-toto.io/>`__.
 Recommendations
 ===============
@@ -628,7 +615,7 @@ of signed inclusion time, and can be verified either online or offline.
         inclusion_proof: InclusionProof
         """
-        The actual inclusion proof the the log entry.
+        The actual inclusion proof of the log entry.
         """
@@ -668,6 +655,58 @@ of signed inclusion time, and can be verified either online or offline.
         Cosigned checkpoints from zero or more log witnesses.
         """
+.. _appendix-3:
+Appendix 3: Simple JSON API size considerations
+===============================================
+A previous draft of this PEP required embedding each
+:ref:`provenance object <provenance-object>` directly into its appropriate part
+of the JSON Simple API.
+The current version of this PEP embeds the SHA-256 digest of the provenance
+object instead. This is done for size and network bandwidth consideration
+reasons:
+1. We estimate the typical size of an attestation object to be approximately
+5.3 KB of JSON.
+2. We conservatively estimate that indices eventually host around 3 attestations
+per release file, or approximately 15.9 KB of JSON per combined provenance
+object.
+3. As of May 2024, the average project on PyPI has approximately 21 release
+files. We conservatively expect this average to increase over time.
+4. Combined, these numbers imply that a typical project might expect to host
+between 60 and 70 attestations, or approximately 339 KB of additional JSON
+in its "project detail" endpoint.
+These numbers are significantly worse in "pathological" cases, where projects
+have hundreds or thousands of releases and/or dozens of files per release.
+.. _appendix-4:
+Appendix 4: Example attestation statement
+=========================================
+Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256
+digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``,
+the following is an appropriate in-toto Statement, as a JSON object:
+.. code-block:: json
+    {
+      "_type": "https://in-toto.io/Statement/v1",
+      "subject": [
+        {
+          "name": "sampleproject-1.2.3.tar.gz",
+          "digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
+        }
+      ],
+      "predicateType": "https://some-arbitrary-predicate.example.com/v1",
+      "predicate": {
+        "something-else": "foo"
+      }
+    }
 Copyright
 =========
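The size estimate added in Appendix 3 can be re-derived from its own stated inputs: 5.3 KB per attestation, 3 attestations per file, and about 21 files per project. This is plain arithmetic over the appendix's figures. It uses ``Decimal`` to avoid floating-point noise and introduces no new data.

```python
from decimal import Decimal

KB_PER_ATTESTATION = Decimal("5.3")  # estimated JSON size of one attestation object
ATTESTATIONS_PER_FILE = 3            # conservative per-file estimate
FILES_PER_PROJECT = 21               # average release files per project (May 2024)

per_file_kb = KB_PER_ATTESTATION * ATTESTATIONS_PER_FILE
attestations = ATTESTATIONS_PER_FILE * FILES_PER_PROJECT
project_kb = per_file_kb * FILES_PER_PROJECT

# The appendix quotes a range of 60 to 70 attestations; bound the added JSON accordingly.
low_kb, high_kb = KB_PER_ATTESTATION * 60, KB_PER_ATTESTATION * 70

print(per_file_kb, attestations, project_kb, low_kb, high_kb)
# prints: 15.9 63 333.9 318.0 371.0
```

At exactly 63 attestations the total is about 334 KB. The appendix's figure of approximately 339 KB falls within the 318 to 371 KB band implied by its 60 to 70 attestation range.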