PEP 740: tweak JSON simple API prescriptions (#3768)

Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Facundo Tuesca <facundo.tuesca@trailofbits.com>
William Woodruff 2024-06-12 16:08:35 -04:00 committed by GitHub
parent 764f563338
commit 67631c3428
1 changed files with 130 additions and 91 deletions


@ -24,8 +24,8 @@ These changes have two subcomponents:
* Changes to the currently unstandardized PyPI upload API, allowing clients
to upload digital attestations as :ref:`attestation objects <attestation-object>`;
* Changes to the :pep:`503` and :pep:`691` "simple" APIs, allowing clients
to retrieve both digital attestations and
* Changes to the :ref:`HTML and JSON "simple" APIs <packaging:simple-repository-api>`,
allowing clients to retrieve both digital attestations and
`Trusted Publishing <https://docs.pypi.org/trusted-publishers/>`_ metadata
for individual release files as :ref:`provenance objects <provenance-object>`.
@ -75,7 +75,7 @@ Additionally, this proposal identifies the following motivations:
of the metadata needed by the index to verify an attestation's validity.
This PEP proposes a generic attestation format, containing an
:ref:`attestation payload for signature generation <payload-and-signature-generation>`,
:ref:`attestation statement for signature generation <payload-and-signature-generation>`,
with the expectation that index providers adopt the
format with a suitable source of identity for signature verification, such as
Trusted Publishing.
@ -116,8 +116,9 @@ areas of Python packaging:
metadata within the cryptographic envelope.
For example, to prevent domain separation between a distribution's name and
its contents, this PEP proposes that digital attestations be performed over
``HASH(name || HASH(contents))`` rather than just ``HASH(contents)``.
its contents, this PEP uses '`Statements <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__'
from the `in-toto project <https://in-toto.io/>`__ to bind the distribution's
contents (via SHA-256 digest) to its filename.
Previous Work
@ -196,6 +197,9 @@ Index changes
Simple Index
^^^^^^^^^^^^
The following changes are made to the
:ref:`simple repository API <packaging:simple-repository-api-base>`:
* When an uploaded file has one or more attestations, the index **MAY**
provide a ``.provenance`` file adjacent to the hosted distribution.
The format of the ``.provenance`` file **SHALL** be a JSON-encoded
@ -208,14 +212,14 @@ Simple Index
* When a ``.provenance`` file is present, the index **MAY** include a
``data-provenance`` attribute on its file link. The value of the
``data-provenance`` attribute **SHALL** be the SHA256 digest of the
``data-provenance`` attribute **SHALL** be the SHA-256 digest of the
associated ``.provenance`` file.
* The index **MAY** choose to modify the ``.provenance`` file. For example,
the index **MAY** permit adding additional attestations and verification
materials, such as attestations from third-party auditors or other services.
When the index modifies the ``.provenance`` file, it **MUST** also update the
``data-provenance`` attribute's value to the new SHA256 digest.
``data-provenance`` attribute's value to the new SHA-256 digest.
See :ref:`changes-to-provenance-objects` for an additional discussion of
reasons why a file's provenance may change.
@ -223,17 +227,19 @@ Simple Index
JSON-based Simple API
^^^^^^^^^^^^^^^^^^^^^
The following changes are made to the
:ref:`JSON simple API <packaging:simple-repository-api-json>`:
* When an uploaded file has one or more attestations, the index **MAY**
include a ``provenance`` object in the ``file`` dictionary for that file.
The format of the ``provenance`` object **SHALL** be a JSON-encoded
:ref:`provenance object <provenance-object>`, which **SHALL** contain
the file's attestations.
include a ``provenance`` key in the ``file`` dictionary for that file.
* The index **MAY** choose to modify the ``provenance`` object, under the same
conditions as the ``.provenance`` file specified above.
The value of the ``provenance`` key **SHALL** be a JSON string, which
**SHALL** be the SHA-256 digest of the associated ``.provenance`` file,
as in the Simple Index.
See :ref:`changes-to-provenance-objects` for an additional discussion of
reasons why a file's provenance may change.
See :ref:`appendix-3` for an explanation of the technical decision to
embed the SHA-256 digest in the JSON API, rather than the full
:ref:`provenance object <provenance-object>`.
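As an illustration of the digest relationship described above, the following minimal sketch checks a downloaded ``.provenance`` file against the ``provenance`` value from a file's entry in the JSON API. The function name ``provenance_matches`` and the shape of ``file_entry`` are hypothetical, and the hexadecimal encoding of the digest is an assumption, since the text above only says "SHA-256 digest".

```python
import hashlib


def provenance_matches(file_entry: dict, provenance_bytes: bytes) -> bool:
    """Return True if the entry's `provenance` digest matches the given bytes."""
    expected = file_entry.get("provenance")
    if not isinstance(expected, str):
        # The key is optional: the index MAY provide it.
        return False
    # Assumption: the digest is encoded as a hexadecimal string.
    actual = hashlib.sha256(provenance_bytes).hexdigest()
    return actual == expected.lower()
```

A client following this approach would fetch the ``.provenance`` file only when it actually needs the attestations, and use the digest to detect a stale or modified copy.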
These changes require a version change to the JSON API:
@ -260,13 +266,28 @@ object is provided as pseudocode below.
verification_material: VerificationMaterial
"""
Cryptographic materials used to verify `message_signature`.
Cryptographic materials used to verify `envelope`.
"""
message_signature: str
envelope: Envelope
"""
The attestation's signature, as `base64(raw-sig)`, where `raw-sig`
is the raw bytes of the signing operation over the attestation payload.
The enveloped attestation statement and signature.
"""
@dataclass
class Envelope:
statement: bytes
"""
The attestation statement.
This is represented as opaque bytes on the wire (encoded as base64),
but it MUST be a JSON in-toto v1 Statement.
"""
signature: bytes
"""
A signature for the above statement, encoded as base64.
"""
@dataclass
@ -302,63 +323,36 @@ object) by selecting a new version number.
.. _payload-and-signature-generation:
Attestation payload and signature generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Attestation statement and signature generation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The *attestation payload* is the actual claim that is cryptographically signed
over within the attestation object (as the ``message_signature``).
The *attestation statement* is the actual claim that is cryptographically signed
over within the attestation object (i.e., the ``envelope.statement``).
The attestation payload is encoded as an :rfc:`8785` canonicalized JSON object,
with the following pseudocode layout:
The attestation statement is encoded as a
`v1 in-toto Statement object <https://github.com/in-toto/attestation/blob/v1.0/spec/v1.0/statement.md>`__,
in JSON form. When serialized, the statement is treated as an opaque binary blob,
avoiding the need for canonicalization. An example JSON-encoded statement is
provided in :ref:`appendix-4`.
.. code-block:: python
In addition to being a v1 in-toto Statement, the attestation statement is constrained
in the following ways:
@dataclass
class AttestationPayload:
distribution: str
"""
The file name of the Python package distribution.
"""
* The in-toto ``subject`` **MUST** contain only a single subject.
* ``subject[0].name`` is the distribution's filename, which **MUST** be
a valid :ref:`source distribution <packaging:source-distribution-format>` or
:ref:`wheel distribution <packaging:binary-distribution-format>` filename.
* ``subject[0].digest`` **MUST** contain a SHA-256 digest. Other digests
**MAY** be present. The digests **MUST** be represented as hexadecimal strings.
* The following ``predicateType`` values are supported:
digest: str
"""
The SHA-256 digest of the distribution's contents, as a hexadecimal string.
"""
* `SLSA Provenance <https://slsa.dev/provenance/v1>`__: ``https://slsa.dev/provenance/v1``
* `PyPI Publish Attestation <https://docs.pypi.org/attestations/publish/v1>`__: ``https://docs.pypi.org/attestations/publish/v1``
The value of ``distribution`` is the same distribution filename that appears
in the :pep:`503` and :pep:`691` APIs. For example, ``distribution`` would be
``sampleproject-1.2.0-py2.py3-none-any.whl`` for the following simple index
entry:
.. code-block:: html
<a href="https://example.com/...">sampleproject-1.2.0-py2.py3-none-any.whl</a><br/>
In practice, this means that ``distribution`` is defined by the PyPA's
living specifications for
:ref:`binary distributions <packaging:binary-distribution-format>` and
:ref:`source distributions <packaging:source-distribution-format>`, although
non-conforming distributions may be hosted by the index.
The following pseudocode demonstrates the construction of an attestation
payload and its signature:
.. code-block:: python
def build_payload(dist: Path) -> AttestationPayload:
return AttestationPayload(
distribution=dist.name,
digest=sha256(dist.read_bytes()).hexdigest,
)
attestation_payload = build_payload("sampleproject-1.2.0-py2.py3-none-any.whl")
# canonical_json is a fictitious module that performs RFC 8785 canonical
# JSON serialization.
encoded_payload = canonical_json.dumps(asdict(attestation_payload))
raw_signature = signing_key.sign(encoded_payload, ECDSA(SHA2_256()))
message_signature = b64encode(raw_signature)
The signature over this statement is constructed using the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__,
with a ``PAYLOAD_TYPE`` of ``application/vnd.in-toto+json`` and a ``PAYLOAD_BODY`` of the JSON-encoded
statement above. No other ``PAYLOAD_TYPE`` is permitted.
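For readers unfamiliar with DSSE, the bytes that are actually signed are the protocol's "pre-authentication encoding" (PAE) of the payload type and body, not the bare statement. The sketch below shows that encoding as described in the v1 DSSE protocol document; the function name ``dsse_pae`` is hypothetical, and the signing operation itself is deliberately left out.

```python
PAYLOAD_TYPE = "application/vnd.in-toto+json"


def dsse_pae(payload_type: str, payload_body: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding (PAE) of a typed payload."""
    type_bytes = payload_type.encode("utf-8")
    # Layout: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    # where LEN is the ASCII decimal byte length.
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload_body)).encode("ascii"),
        payload_body,
    ])
```

Because the statement is length-prefixed and signed as opaque bytes, the signer and verifier never need to agree on a canonical JSON form.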
.. _provenance-object:
@ -368,9 +362,8 @@ Provenance objects
The index will serve uploaded attestations along with metadata that can assist
in verifying them in the form of JSON serialized objects.
These *provenance objects* will be available via both the :pep:`503` Simple Index
and :pep:`691` JSON-based Simple API as described above, and will have the
following layout:
These *provenance objects* will be available via both the Simple Index
and JSON-based Simple API as described above, and will have the following layout:
.. code-block:: json
@ -488,7 +481,8 @@ for changes to the provenance object include but are not limited to:
Attestation verification
------------------------
Verifying an attestation object requires verification of each of the following:
Verifying an attestation object against a distribution file requires verification of each of the
following:
* ``version`` is ``1``. The verifier **MUST** reject any other version.
* ``verification_material.certificate`` is a valid signing certificate, as
@ -497,9 +491,15 @@ Verifying an attestation object requires verification of each of the following:
* ``verification_material.certificate`` identifies an appropriate signing
subject, such as the machine identity of the Trusted Publisher that published
the package.
* ``message_signature`` can be verified by ``verification_material.certificate``,
using the reconstructed attestation payload as the cleartext input. The
verifier **MUST** reconstruct the attestation payload itself.
* ``envelope.statement`` is a valid in-toto v1 Statement, with a subject
and digest that **MUST** match the distribution's filename and contents.
For the distribution's filename, matching **MUST** be performed by parsing
using the appropriate source distribution or wheel filename format, as
the statement's subject may be equivalent but normalized.
* ``envelope.signature`` is a valid signature for ``envelope.statement``
corresponding to ``verification_material.certificate``,
as reconstituted via the
`v1 DSSE signature protocol <https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md>`__.
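The digest part of the statement check can be sketched as follows. This is only a partial illustration under stated assumptions: the function name is hypothetical, the statement is assumed to arrive base64-encoded, and neither filename matching (which requires proper source distribution or wheel filename parsing) nor certificate and signature verification is shown.

```python
import base64
import hashlib
import json


def statement_matches_digest(statement_b64: str, dist_bytes: bytes) -> bool:
    """Check the in-toto subject digest against a distribution's contents."""
    statement = json.loads(base64.b64decode(statement_b64))
    if statement.get("_type") != "https://in-toto.io/Statement/v1":
        return False
    subjects = statement.get("subject", [])
    # The statement must contain exactly one subject.
    if len(subjects) != 1:
        return False
    claimed = subjects[0].get("digest", {}).get("sha256")
    # Compare the claimed hex digest with one computed locally.
    return claimed == hashlib.sha256(dist_bytes).hexdigest()
```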
In addition to the above required steps, a verifier **MAY** additionally verify
``verification_material.transparency_entries`` on a policy basis, e.g. requiring
@ -543,19 +543,6 @@ unstated presumption with earlier mechanisms, like PGP and wheel signatures.
This PEP does not preclude or exclude future index trust mechanisms, such
as :pep:`458` and/or :pep:`480`.
Flexible attestations
---------------------
This PEP specifies a fixed attestation payload (defined in
:ref:`payload-and-signature-generation`), binding the contents of each uploaded
file to its public name on the index. This payload format is fixed and
inflexible to ease implementation, and to minimize additional mechanical
changes to the index itself (e.g., needing to store and service detached
attestation documents).
This PEP does not preclude or exclude future more flexible attestation payload
formats, such as ones built on `in-toto <https://in-toto.io/>`__.
Recommendations
===============
@ -628,7 +615,7 @@ of signed inclusion time, and can be verified either online or offline.
inclusion_proof: InclusionProof
"""
The actual inclusion proof the the log entry.
The actual inclusion proof of the log entry.
"""
@ -668,6 +655,58 @@ of signed inclusion time, and can be verified either online or offline.
Cosigned checkpoints from zero or more log witnesses.
"""
.. _appendix-3:
Appendix 3: Simple JSON API size considerations
===============================================
A previous draft of this PEP required embedding each
:ref:`provenance object <provenance-object>` directly into its appropriate part
of the JSON Simple API.
The current version of this PEP embeds the SHA-256 digest of the provenance
object instead. This is done to limit response size and network bandwidth:
1. We estimate the typical size of an attestation object to be approximately
5.3 KB of JSON.
2. We conservatively estimate that indices eventually host around 3 attestations
per release file, or approximately 15.9 KB of JSON per combined provenance
object.
3. As of May 2024, the average project on PyPI has approximately 21 release
files. We conservatively expect this average to increase over time.
4. Combined, these numbers imply that a typical project might expect to host
between 60 and 70 attestations, or approximately 339 KB of additional JSON
in its "project detail" endpoint.
These numbers are significantly worse in "pathological" cases, where projects
have hundreds or thousands of releases and/or dozens of files per release.
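The estimate above can be reproduced with simple arithmetic. The sketch below assumes 1 KB = 1000 bytes and uses only the inputs stated in the list; multiplying them out gives 63 attestations and roughly 334 KB per project, which falls within the 60 to 70 range and is of the same order as the approximately 339 KB quoted above.

```python
ATTESTATION_BYTES = 5_300      # ~5.3 KB per attestation (from the text)
ATTESTATIONS_PER_FILE = 3      # conservative estimate (from the text)
FILES_PER_PROJECT = 21         # PyPI average as of May 2024 (from the text)

# Size of one combined provenance object, in bytes.
provenance_bytes = ATTESTATION_BYTES * ATTESTATIONS_PER_FILE
# Number of attestations hosted for an average project.
attestations_per_project = ATTESTATIONS_PER_FILE * FILES_PER_PROJECT
# Extra JSON in the "project detail" endpoint if objects were embedded.
project_bytes = provenance_bytes * FILES_PER_PROJECT

print(provenance_bytes / 1000, "KB per provenance object")
print(attestations_per_project, "attestations per project")
print(project_bytes / 1000, "KB of additional JSON per project")
```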
.. _appendix-4:
Appendix 4: Example attestation statement
=========================================
Given a source distribution ``sampleproject-1.2.3.tar.gz`` with a SHA-256
digest of ``e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855``,
the following is an appropriate in-toto Statement, as a JSON object:
.. code-block:: json
{
"_type": "https://in-toto.io/Statement/v1",
"subject": [
{
"name": "sampleproject-1.2.3.tar.gz",
"digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
}
],
"predicateType": "https://some-arbitrary-predicate.example.com/v1",
"predicate": {
"something-else": "foo"
}
}
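A statement of this shape could be produced with a few lines of Python. This is a minimal sketch rather than a reference implementation: ``build_statement`` is a hypothetical helper, and the predicate values are the placeholder ones from the example. The example digest is the SHA-256 of empty input, so passing empty contents reproduces it.

```python
import hashlib
import json


def build_statement(filename: str, contents: bytes,
                    predicate_type: str, predicate: dict) -> bytes:
    """Serialize an in-toto v1 Statement binding a filename to its contents."""
    statement = {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [
            {
                "name": filename,
                # Hex-encoded SHA-256 digest of the distribution's contents.
                "digest": {"sha256": hashlib.sha256(contents).hexdigest()},
            }
        ],
        "predicateType": predicate_type,
        "predicate": predicate,
    }
    # The result is treated as opaque bytes, so no canonicalization is applied.
    return json.dumps(statement).encode("utf-8")


statement_bytes = build_statement(
    "sampleproject-1.2.3.tar.gz",
    b"",
    "https://some-arbitrary-predicate.example.com/v1",
    {"something-else": "foo"},
)
```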
Copyright
=========