PEP 691: Refactor based on Feedback (#2592)

* Describe JSON format first and more clearly
* Clarify versioning
* Move content types down and flesh them out more
* Expand the conneg section
* Expand upon the impact to PEP 458
* Add alternative mechanisms for conneg
* Provide recommendations to clients and servers
* Add a FAQ for implications for static file serving
* Add a FAQ to clarify that TUF targets != URLs
* Add a FAQ for application/json
* Add a FAQ about PyPI
* Add support for PEP 629 in the JSON response body
* Add PyPI to the appendix
* Recommend ;q=0 on text/html
* Rename the dist-info-metadata-available field
* Update the PEP-Delegate to Brett Cannon
Donald Stufft 2022-05-08 17:00:55 -04:00 committed by GitHub
parent 7557f1959f
commit 178afaf170
1 changed files with 535 additions and 109 deletions


@ -7,7 +7,7 @@ Author: Donald Stufft <donald@stufft.io>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
BDFL-Delegate: Donald Stufft <donald@stufft.io>
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553
Created: 04-May-2022
Post-History: `05-May-2022 <https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553>`__
@ -53,10 +53,11 @@ that effort has not gained much if any traction beyond people thinking that it
would be nice to do it.
This PEP attempts to take a different route. It doesn't fundamentally change
the overall API structure, but instead specifies a new representation of the
the overall API structure, but instead specifies a new serialization of the
existing data contained in existing :pep:`503` responses in a format that is
easier for software to parse rather than using a human centric document format.
Goals
=====
@ -98,109 +99,111 @@ Specification
To enable parsing responses with only the standard library, this PEP specifies that
all responses (besides the files themselves, and the HTML responses from
:pep:`503`) should be encoded using `JSON <https://www.json.org/>`_.
:pep:`503`) should be serialized using `JSON <https://www.json.org/>`_.
To enable zero configuration discovery and to minimize the amount of additional HTTP
requests, this PEP extends :pep:`503` such that all of the API endpoints (other than the
files themselves) will utilize HTTP content negotiation to allow client and server to
select the correct format to serve, i.e. either HTML or JSON.
select the correct serialization format to serve, i.e. either HTML or JSON.
Format Selection
----------------
A HTML response will be the default when requesting in version 1.0:
- ``/simple/``
- ``/simple/foo/``
- Like :pep:`503`, the trailing ``/`` is expected
To request a JSON response, the ``Accept`` header will need to be added to the
request to specify the response type and version. For version 1.0 this will look like:
``Accept: application/vnd.pypi.simple.v1+json``
The version is also optional and will then always return the latest version:
``Accept: application/vnd.pypi.simple+json``
This is for clients who always want latest and should expect potential
breakages. Additionally, it is a potentially useful way to run integration tests
against a possibly breaking version.
Specifying HTML is also allowed so clients can be explicit to backends (e.g if we
switch to JSON default in the future):
``Accept: application/vnd.pypi.simple.v1+html``
Using ``text/html`` will also work, which will serve the latest API version. To
be explicit, clients should use specific HTML ``Accept``. If no
``Accept`` is specified, the latest HTML version will be returned unless
the backend *only* supports JSON. Backends may default to returning JSON in the
future.
The ``Accept:`` header also allows you to say that you prefer the V1 Simple JSON API,
if that's not available then you prefer the V1 HTML API, and if that's not available,
just ``text/html``. To do this would look like:
``Accept: application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html, text/html``
Versioning
----------
Versioning will adhere to :pep:`629` format (``Major.Minor``) and will be
included in the ``Accept`` request that clients add to obtain a JSON
response. We don't foresee the use of *Minor* versioning but will support it if
the need does arise.
Versioning will adhere to :pep:`629` format (``Major.Minor``), which has defined the
existing HTML responses to be ``1.0``. Since this PEP does not introduce new features
into the API, rather it describes a different serialization format for the existing
features, this PEP does not change the existing ``1.0`` version, and instead just
describes how to serialize that into JSON.
The header for clients accessing version 1.0 of the API will be:
Similarly to :pep:`629`, the major version number **MUST** be incremented if any
changes to the new format would result in no longer being able to expect existing
clients to meaningfully understand the format.
``application/vnd.pypi.simple.index.v1+json``
Likewise, the minor version **MUST** be incremented if features are
added or removed from the format, but existing clients would be expected to continue
to meaningfully understand the format.
An example of the Accept values that newer APIs could support **would** look like:
Changes that would not result in existing clients being unable to meaningfully
understand the format and which do not represent features being added or removed
may occur without changing the version number.
``application/vnd.pypi.simple.index.v2+json``
This is intentionally vague, as this PEP believes it is best left up to future PEPs
that make any changes to the API to investigate and decide whether or not that
change should increment the major or minor version.
If a version that does not exist is requested, the server will explicitly return a
`406 Not Acceptable
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406>`_ HTTP status
code. The response will also indicate available API versions and links to
version formats.
Future versions of the API may add things that can only be represented in a subset
of the available serializations of that version. The version numbers of all serializations
**SHOULD** be kept in sync, but the specifics of how a feature serializes into each
format may differ, including whether or not that feature is present at all.
It is the intent of this PEP that the API should be thought of as URL endpoints that
return data, whose interpretation is defined by the version of that data, and then
serialized into the target serialization format.
TUF Support - PEP 458
---------------------
JSON Serialization
------------------
:pep:`458` states that the "Simple Index" needs to be hashable. To adhere to the TUF
standard, we will need a target for each response, i.e. the HTML and JSON (plus any
future type) response. To provide this we could have two targets per API endpoint:
The URL structure from :pep:`503` still applies, as this PEP only adds an additional
serialization format for the already existing API.
- ``/simple/foo/vnd.pypi.simple.v1.html``
- ``/simple/foo/vnd.pypi.simple.v1.json``
The following constraints apply to all JSON serialized responses described in this
PEP:
Additionally, when calculating the digest of a JSON response, indices should
use the `Canonical JSON <https://wiki.laptop.org/go/Canonical_JSON>`_ format.
* All JSON responses will *always* be a JSON object rather than an array or other
type.
* While JSON doesn't natively support an URL type, any value that represents an
URL in this API may be either absolute or relative as long as they point to
the correct location. If relative, they are relative to the current URL as if
it were HTML.
* Additional keys may be added to any dictionary objects in the API responses
and clients **MUST** ignore keys that they don't understand.
* All JSON responses will have a ``meta`` key, which contains information related to
the response itself, rather than the content of the response.
* All JSON responses will have a ``meta.api-version`` key, which will be a string that
contains the :pep:`629` ``Major.Minor`` version number, with the same fail/warn
semantics as in :pep:`629`.
* All requirements of :pep:`503` that are not HTML specific still apply.
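As a rough illustration of the constraints above, a client might validate a decoded
response along the following lines. This is a minimal sketch rather than part of the
specification: the ``check_response`` helper and the ``SUPPORTED_MAJOR`` and
``SUPPORTED_MINOR`` constants are hypothetical names, and the fail/warn behaviour is
an assumption based on the :pep:`629` semantics (fail on a greater major version,
warn on a greater minor version).

```python
import json
import warnings

# Hypothetical: the highest API version this client understands.
SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0


def check_response(body: str) -> dict:
    """Decode a JSON Simple API response and apply the basic constraints."""
    data = json.loads(body)
    # Every response is a JSON object, never an array or other type.
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    meta = data.get("meta")
    if not isinstance(meta, dict) or not isinstance(meta.get("api-version"), str):
        raise ValueError("response is missing meta.api-version")
    major, minor = (int(part) for part in meta["api-version"].split("."))
    if major > SUPPORTED_MAJOR:
        # A greater major version cannot be meaningfully understood.
        raise ValueError(f"unsupported API version {meta['api-version']}")
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        warnings.warn(f"newer API minor version {meta['api-version']}")
    # Unknown keys are left untouched; callers read only the keys they know.
    return data
```

Keys the client does not recognise are never rejected here, which is how the
"ignore keys that they don't understand" requirement is honoured in this sketch.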
Root URL
--------
Project List
~~~~~~~~~~~~
The root URL ``/`` for this PEP (which represents the base URL) will be a JSON encoded
dictionary where each key is a string of the normalized project name, and the value is
a dictionary with a single key, ``url``, which represents the URL that the project can
be fetched from. As an example::
dictionary which has a single key, ``projects``, which is itself a dictionary where each
key is a string of the normalized project name, and the value is a dictionary with a
single key, ``url``, which represents the URL that the project can be fetched from. As
an example:
.. code-block:: json
{
"meta": {
"api-version": "1.0"
},
"projects": {
"frob": {"url": "/frob/"},
"spamspamspam": {"url": "/spamspamspam/"}
}
}
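Because the ``url`` values may be relative, a client has to resolve them against the
URL it fetched. A minimal sketch of reading the project list, assuming a hypothetical
``project_urls`` helper and an ``https://example.com/`` repository root:

```python
import json
from urllib.parse import urljoin


def project_urls(index_url: str, body: str) -> dict:
    """Map each normalized project name to an absolute URL."""
    data = json.loads(body)
    return {
        # urljoin resolves relative URLs the same way a browser would for HTML.
        name: urljoin(index_url, entry["url"])
        for name, entry in data["projects"].items()
    }


example = """
{
  "meta": {"api-version": "1.0"},
  "projects": {
    "frob": {"url": "/frob/"},
    "spamspamspam": {"url": "/spamspamspam/"}
  }
}
"""
urls = project_urls("https://example.com/", example)
```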
Below the root URL is another URL for each individual project contained within
a repository. The format of this URL is ``/<project>/`` where the ``<project>``
is replaced by the :pep:`503`-canonicalized name for that project, so a project named
"Holy_Grail" would have a URL like ``/holy-grail/``. This URL must respond with a
JSON encoded dictionary that has two keys, ``name``, which represents the normalized
name of the project and ``files``. The ``files`` key is a list of dictionaries,
each one representing an individual file.
Project Detail
~~~~~~~~~~~~~~
The format of this URL is ``/<project>/`` where the ``<project>`` is replaced by the
:pep:`503`-canonicalized name for that project, so a project named "Holy_Grail" would
have a URL like ``/holy-grail/``.
This URL must respond with a JSON encoded dictionary that has two keys, ``name``, which
represents the normalized name of the project and ``files``. The ``files`` key is a
list of dictionaries, each one representing an individual file.
Each individual file dictionary has the following keys:
@ -214,25 +217,52 @@ Each individual file dictionary has the following keys:
The ``hashes`` dictionary **MUST** be present, even if no hashes are available
for the file, however it is **HIGHLY** recommended that at least one secure,
guaranteed to be available hash is always included.
By default, any hash algorithm available via `hashlib
<https://docs.python.org/3/library/hashlib.html>`_ (specifically any that can
be passed to ``hashlib.new()`` and do not require additional parameters) can
be used as a key for the hashes dictionary. At least one secure algorithm from
``hashlib.algorithms_guaranteed`` **SHOULD** always be included. At the time
of this PEP, ``sha256`` specifically is recommended.
- ``requires-python``: An **optional** key that exposes the *Requires-Python*
metadata field, specified in :pep:`345`. Where this is present, installer tools
**SHOULD** ignore the download when installing to a Python version that
doesn't satisfy the requirement.
- ``dist-info-metadata-available``: An **optional** key that indicates
Unlike ``data-requires-python`` in :pep:`503`, the ``requires-python`` key does not
require any special escaping other than anything JSON does naturally.
- ``dist-info-metadata``: An **optional** key that indicates
that metadata for this file is available, via the same location as specified in
:pep:`658` (`{file_url}.metadata`). Where this is present, it **MUST** be true,
or a dictionary mapping a hash name to a hex encoded digest of the metadata hash.
:pep:`658` (``{file_url}.metadata``). Where this is present, it **MUST** be either a
boolean to indicate if the file has an associated metadata file, or a dictionary
mapping hash names to a hex encoded digest of the metadata's hash.
When this is a dictionary of hashes, then all the same requirements and
recommendations as the ``hashes`` key hold true for this key as well.
If this key is missing then the metadata file may or may not exist. If the key
value is truthy, then the metadata file is present, and if it is falsey then it
is not.
It is recommended that servers make the hashes of the metadata file available if
possible.
- ``gpg-sig``: An **optional** key that acts as a boolean to indicate if the file has
an associated GPG signature or not. If this key does not exist, then the signature
may or may not exist.
- ``yanked``: An **optional** key which may have no value, or may have an
arbitrary string as a value. The presence of a ``yanked`` key SHOULD
be interpreted as indicating that the file pointed to by the ``url`` field
has been "Yanked" as per :pep:`592`.
- ``yanked``: An **optional** key which may be a boolean to indicate if the file
has been yanked, or a non empty, but otherwise arbitrary, string to indicate that
a file has been yanked with a specific reason. If the ``yanked`` key is present
and is a truthy value, then it **SHOULD** be interpreted as indicating that the
file pointed to by the ``url`` field has been "Yanked" as per :pep:`592`.
As an example::
As an example:
.. code-block:: json
{
"meta": {
"api-version": "1.0"
},
"name": "holygrail",
"files": [
{
@ -247,42 +277,358 @@ As an example::
"url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
"hashes": {"sha256": "...", "blake2b": "..."},
"requires-python": ">=3.7",
"dist-info-metadata-available": true
},
"dist-info-metadata": true
}
]
}
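To make the per-file keys more concrete, the following sketch shows one way a client
could interpret a single entry from ``files``. All helper names (``choose_hash``,
``verify``, ``is_yanked``, ``metadata_state``) are hypothetical, and preferring
``sha256`` is simply this sketch's reading of the recommendation above.

```python
import hashlib
from typing import Optional, Tuple


def choose_hash(hashes: dict) -> Optional[Tuple[str, str]]:
    """Pick a digest the client can verify, preferring sha256."""
    usable = {
        name: digest
        for name, digest in hashes.items()
        # Skip algorithms hashlib cannot construct, and the variable-length
        # SHAKE algorithms, which need an extra length parameter.
        if name in hashlib.algorithms_available and not name.startswith("shake_")
    }
    if "sha256" in usable:
        return "sha256", usable["sha256"]
    return next(iter(usable.items()), None)


def verify(content: bytes, hashes: dict) -> bool:
    """Check downloaded bytes against the entry's ``hashes`` dictionary."""
    chosen = choose_hash(hashes)
    if chosen is None:
        return False  # the dictionary may legitimately be empty
    name, expected = chosen
    return hashlib.new(name, content).hexdigest() == expected


def is_yanked(file: dict) -> bool:
    # A missing key or a falsey value both mean "not yanked"; a non-empty
    # string is truthy and carries the reason.
    return bool(file.get("yanked"))


def metadata_state(file: dict) -> Optional[bool]:
    """True or False when the server says so, None when the key is absent."""
    value = file.get("dist-info-metadata")
    return None if value is None else bool(value)
```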
In addition to the above, the following constraints are placed on the API:
* While JSON doesn't natively support an URL type, any value that represents an
URL in this API may be either absolute or relative as long as they point to
the correct location. If relative, they are relative to the current URL as if
it were HTML.
Content-Types
-------------
* Additional keys may be added to any dictionary objects in the API responses
and clients **MUST** ignore keys that they don't understand.
This PEP proposes that all responses from the Simple API will have a standard
content type that describes what the response is (a simple api response), what
version of the API it represents, and what serialization format has been used.
* By default, any hash algorithm available via `hashlib
<https://docs.python.org/3/library/hashlib.html>`_ (specifically any that can
be passed to ``hashlib.new()`` and do not require additional parameters) can
be used as a key for the hashes dictionary. At least one secure algorithm from
``hashlib.algorithms_guaranteed`` **SHOULD** always be included. At the time
of this PEP, ``sha256`` specifically is recommended.
The structure of this content type will be::
* Unlike ``data-requires-python`` in :pep:`503`, the ``requires-python`` key does not
require any special escaping other than anything JSON does naturally.
application/vnd.pypi.simple.$version+format
* Future features **MAY** be implemented or only supported when operating under JSON.
This would be decided on a case by case basis depending on how important the feature
is, how widely used HTML is at that point, and how difficult representing the feature
in HTML would be.
Since only major versions should be disruptive to clients attempting to
understand one of these API responses, only the major version will be included
in the content type, and will be prefixed with a ``v`` to clarify that it is a
version number.
* All requirements of :pep:`503` that are not HTML specific still apply.
Which means that for the existing 1.0 API, the content types would be:
- **JSON:** ``application/vnd.pypi.simple.v1+json``
- **HTML:** ``application/vnd.pypi.simple.v1+html``
In addition to the above, a special "meta" version is supported named ``latest``,
whose purpose is to allow clients to request the absolute latest version, without
having to know ahead of time what that version is. It is recommended however,
that clients be explicit about what versions they support.
To support existing clients which expect the existing :pep:`503` API responses to
use the ``text/html`` content type, this PEP further defines ``text/html`` as an alias
for the ``application/vnd.pypi.simple.v1+html`` content type.
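The structure described above is regular enough to be taken apart mechanically. A
minimal sketch, assuming a hypothetical ``parse_simple_content_type`` helper that
returns the version and format, and that treats ``text/html`` as the v1 HTML alias:

```python
import re

_SIMPLE_TYPE = re.compile(
    r"application/vnd\.pypi\.simple\.(?P<version>v[0-9]+|latest)\+(?P<format>json|html)"
)


def parse_simple_content_type(value: str) -> tuple:
    """Return (version, format) for a Simple API content type."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = value.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # Legacy alias for the v1 HTML serialization.
        return "v1", "html"
    match = _SIMPLE_TYPE.fullmatch(media_type)
    if match is None:
        raise ValueError(f"not a Simple API content type: {value!r}")
    return match["version"], match["format"]
```

Only the major version appears in the content type; the minor version has to be read
from the response body.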
Version + Format Selection
--------------------------
Now that there are multiple possible serializations, we need a mechanism to allow
clients to indicate which serialization formats they're able to understand. In
addition, it would be a benefit if any possible new major version to the API can
be added without disrupting existing clients expecting the previous API version.
To enable this, this PEP standardizes on the use of HTTP's
`Server-Driven Content Negotiation <https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation>`_.
While this PEP won't fully describe the entirety of server-driven content
negotiation, the flow is roughly:
1. The client makes an HTTP request containing an ``Accept`` header listing all
of the version+format content types that they are able to understand.
2. The server inspects that header, selects one of the listed content types,
then returns a response using that content type.
3. If the server does not support any of the content types in the ``Accept``
header or if the client did not provide an ``Accept`` header at all, then
it is able to choose between 3 different options for how to respond:
a. Select a default content type other than what the client has requested
and return a response with that.
b. Return an HTTP ``406 Not Acceptable`` response to indicate that none of
the requested content types were available, and the server was unable
or unwilling to select a default content type to respond with.
c. Return an HTTP ``300 Multiple Choices`` response that contains a list of
all of the possible responses that could have been chosen.
4. The client interprets the response, handling the different types of responses
that the server may have responded with.
This PEP does not specify which choices the server makes in regards to handling
a content type that it isn't able to return, and clients **SHOULD** be prepared
to handle all of the possible responses in whatever way makes the most sense for
that client.
However, as there is no standard format for how a ``300 Multiple Choices``
response can be interpreted, this PEP highly discourages servers from utilizing
that option, as clients will have no way to understand and select a different
content-type to request. In addition, it's unlikely that the client *could*
understand a different content type anyways, so at best this response would
likely just be treated the same as a ``406 Not Acceptable`` error.
This PEP **does** require that if the meta version ``latest`` is being used, the
server **MUST** respond with the content type for the actual version that is
contained in the response
(i.e. an ``Accept: application/vnd.pypi.simple.latest+json`` request that returns
a v1.x response should have a ``Content-Type`` of
``application/vnd.pypi.simple.v1+json``).
The ``Accept`` header is a comma separated list of content types that the client
understands and is able to process. It supports three different formats for each
content type that is being requested:
- ``$type/$subtype``
- ``$type/*``
- ``*/*``
For the use of selecting a version+format, the most useful of these is
``$type/$subtype``, as that is the only way to actually specify the version
and format you want.
The order of the content types listed in the ``Accept`` header does not have any
specific meaning, and the server **SHOULD** consider all of them to be equally
valid to respond with. If a client wishes to specify that they prefer a specific
content type over another, they may use the ``Accept`` header's
`quality value <https://developer.mozilla.org/en-US/docs/Glossary/Quality_values>`_
syntax.
This allows a client to specify a priority for a specific entry in their
``Accept`` header, by appending a ``;q=`` followed by a value between ``0`` and
``1`` inclusive, with up to 3 decimal digits. When interpreting this value,
an entry with a higher quality has priority over an entry with a lower quality,
and any entry without a quality present will default to a quality of ``1``.
However, clients should keep in mind that a server is free to select **any** of
the content types they've asked for, regardless of their requested priority, and
it may even return a content type that they did **not** ask for.
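On the server side, the quality value rules above could be applied roughly as
follows. This is a simplified sketch under stated assumptions: ``parse_accept`` and
``select`` are hypothetical helpers, wildcard entries such as ``*/*`` are ignored,
entries are ordered purely by quality with no special treatment of ``q=0``, and ties
are broken by the server's own preference order.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, quality) pairs."""
    entries = []
    for item in header.split(","):
        media_type, *params = (part.strip() for part in item.split(";"))
        if not media_type:
            continue
        quality = 1.0  # entries without a quality default to 1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(value)
        entries.append((media_type.lower(), quality))
    return entries


def select(header: str, supported: List[str]) -> Optional[str]:
    """Pick the supported type with the highest quality, or None."""
    candidates = [
        (quality, media_type)
        for media_type, quality in parse_accept(header)
        if media_type in supported
    ]
    if not candidates:
        return None  # the server may now pick a default or return 406
    best = max(quality for quality, _ in candidates)
    # Among equally preferred types, fall back to the server's own order.
    return next(t for t in supported if (best, t) in candidates)
```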
To aid clients in determining the content type of the response that they have
received from an API request, this PEP requires that servers always include a
``Content-Type`` header indicating the content type of the response. This is
technically a backwards incompatible change, however in practice
`pip has been enforcing this requirement <https://github.com/pypa/pip/blob/cf3696a81b341925f82f20cb527e656176987565/src/pip/_internal/index/collector.py#L123-L150>`_
so the risk of actual breakage is low.
An example of how a client can operate would look like:
.. code-block:: python3

    import requests

    # Construct our list of acceptable content types, we want to prefer
    # that we get a v1 response serialized using JSON, however we also
    # can support a v1 response serialized using HTML. For compatibility
    # we also request text/html, but we prefer it least of all since we
    # don't know if it's actually a Simple API response, or just some
    # random HTML page that we've gotten due to a misconfiguration.
    CONTENT_TYPES = [
        "application/vnd.pypi.simple.v1+json",
        "application/vnd.pypi.simple.v1+html",
        "text/html;q=0",  # For legacy compatibility
    ]
    ACCEPT = ", ".join(CONTENT_TYPES)

    # Actually make our request to the API, requesting all of the content
    # types that we find acceptable, and letting the server select one of
    # them out of the list.
    resp = requests.get("https://pypi.org/simple/", headers={"Accept": ACCEPT})

    # If the server does not support any of the content types you requested,
    # AND it has chosen to return a HTTP 406 error instead of a default
    # response then this will raise an exception for the 406 error.
    resp.raise_for_status()

    # Determine what kind of response we've gotten to ensure that it is one
    # that we can support, and if it is, dispatch to a function that will
    # understand how to interpret that particular version+serialization. If
    # we don't understand the content type we've gotten, then we'll raise
    # an exception.
    #
    # The header is split by hand because the stdlib ``cgi`` module, which
    # provided ``parse_header``, was removed in Python 3.13.
    content_type = resp.headers.get("content-type", "").split(";", 1)[0].strip().lower()
    match content_type:
        case "application/vnd.pypi.simple.v1+json":
            handle_v1_json(resp)
        case "application/vnd.pypi.simple.v1+html" | "text/html":
            handle_v1_html(resp)
        case _:
            raise Exception(f"Unknown content type: {content_type}")

If a client wishes to only support HTML or only support JSON, then they would
just remove the content types that they do not want from the ``Accept`` header,
and turn receiving them into an error.
Alternative Negotiation Mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While using HTTP's Content negotiation is considered the standard way for a client
and server to coordinate to ensure that the client is getting an HTTP response that
it is able to understand, there are situations where that mechanism may not be
sufficient. For those cases this PEP has alternative negotiation mechanisms that
may *optionally* be used instead.
URL Parameter
^^^^^^^^^^^^^
Servers that implement the Simple API may choose to support an URL parameter named
``format`` to allow the clients to request a specific version of the URL.
The value of the ``format`` parameter should be **one** of the valid content types.
Passing multiple content types, wild cards, quality values, etc. is **not** supported.
Supporting this parameter is optional, and clients **SHOULD NOT** rely on it for
interacting with the API. This negotiation mechanism is intended to allow for easier
human based exploration of the API within a browser, or to allow documentation or
notes to link to a specific version+format.
Servers that do not support this parameter may choose to return an error when it is
present, or they may simply ignore its presence.
When a server does implement this parameter, it **SHOULD** take precedence over any
values in the client's ``Accept`` header, and if the server does not support the
requested format, it may choose to fall back to the ``Accept`` header, or choose any
of the error conditions that standard server-driven content negotiation typically
has (e.g. ``406 Not Acceptable``, ``300 Multiple Choices``, or selecting a default
type to return).
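When building such a URL, note that the ``+`` in the content type has to be
percent-encoded, otherwise it would be decoded as a space. A minimal sketch, assuming
a hypothetical ``with_format`` helper and an example repository URL:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(url: str, content_type: str) -> str:
    """Return url with a ``format`` query parameter naming one content type."""
    parts = urlsplit(url)
    # Replace any existing format parameter, since only one value is allowed.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "format"]
    query.append(("format", content_type))
    # urlencode percent-encodes both "/" and "+" in the value.
    return urlunsplit(parts._replace(query=urlencode(query)))


link = with_format(
    "https://example.com/simple/holygrail/",
    "application/vnd.pypi.simple.v1+json",
)
```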
Endpoint Configuration
^^^^^^^^^^^^^^^^^^^^^^
This option technically is not a special option at all, it is just a natural
consequence of using content negotiation and allowing servers to select which of the
available content types is their default.
If a server is unwilling or unable to implement the server-driven content negotiation,
and would instead rather require users to explicitly configure their client to select
the version they want, then that is a supported configuration.
To enable this, a server should make multiple endpoints (for instance,
``/simple/v1+html/`` and/or ``/simple/v1+json/``) for each version+format that they
wish to support. Under that endpoint, they can host a copy of their repository that
only supports one (or a subset) of the content-types. When a client makes a request
using the ``Accept`` header, the server can ignore it and return the content type
that corresponds to that endpoint.
For clients that wish to require specific configuration, they can keep track of
which version+format a specific repository url was configured for, and when making
a request to that server, emit an ``Accept`` header that *only* includes the correct
content type.
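On the client side, this kind of explicit configuration could be as small as a
mapping from repository URL to content type. The mapping, the URLs, and the helper
names below are all hypothetical and only illustrate the idea:

```python
# Hypothetical per-repository configuration: each repository URL is pinned to
# the single version+format that its endpoint serves.
REPOSITORY_FORMATS = {
    "https://example.com/simple/v1+json/": "application/vnd.pypi.simple.v1+json",
    "https://example.com/simple/v1+html/": "application/vnd.pypi.simple.v1+html",
}


def accept_header_for(repository_url: str) -> dict:
    """Build request headers that name only the configured content type."""
    return {"Accept": REPOSITORY_FORMATS[repository_url]}


def response_matches(repository_url: str, content_type_header: str) -> bool:
    """Check that a response used the content type configured for the endpoint."""
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == REPOSITORY_FORMATS[repository_url]
```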
TUF Support - PEP 458
---------------------
:pep:`458` requires that all API responses are hashable and that they can be uniquely
identified by a path relative to the repository root. For a Simple API repository, the
target path is the Root of our API (e.g. ``/simple/`` on PyPI). This creates
challenges when accessing the API using a TUF client instead of directly using a
standard HTTP client, as the TUF client cannot handle the fact that a target could
have multiple different representations that all hash differently.
:pep:`458` does not specify what the target path should be for the Simple API, but I
believe that TUF requires that the target paths be "file-like", in other words, a
path like ``simple/PROJECT/`` is not acceptable, because it technically points to a
directory.
The saving grace is that the target path does not *have* to actually match the URL
being fetched from the Simple API, and it can just be a sigil that the fetching code
knows how to transform into the actual URL that needs to be fetched. This same thing
can hold true for other aspects of the actual HTTP request, such as the ``Accept``
header.
Ultimately figuring out how to map a directory to a filename is out of scope for this
PEP (but it would be in scope for :pep:`458`), and this PEP defers making a decision
about how exactly to represent this inside of :pep:`458` metadata.
However, it appears that the current WIP branch against pip that attempts to implement
:pep:`458` is using a target path like ``simple/PROJECT/index.html``. This could be
modified to include the API version and serialization format using something like
``simple/PROJECT/vnd.pypi.simple.vN.FORMAT``. So the v1 HTML format would be
``simple/PROJECT/vnd.pypi.simple.v1.html`` and the v1 JSON format would be
``simple/PROJECT/vnd.pypi.simple.v1.json``.
In this case, since ``text/html`` is an alias to ``application/vnd.pypi.simple.v1+html``
when interacting through TUF, likely it will make the most sense to normalize to the
more explicit name.
Likewise the ``latest`` metaversion should not be included in the targets, only
explicitly declared versions should be supported.
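The translation from a target path to an HTTP request could be sketched as below.
This is only an illustration of the naming idea floated above, not a decision about
:pep:`458` metadata: the ``target_to_request`` helper and the ``https://example.com/``
base URL are assumptions, and the ``latest`` meta version is deliberately rejected.

```python
import re

_TARGET = re.compile(
    r"simple/(?P<project>[^/]+)/"
    r"vnd\.pypi\.simple\.(?P<version>v[0-9]+)\.(?P<format>html|json)"
)


def target_to_request(target: str, base_url: str = "https://example.com/") -> tuple:
    """Translate a TUF target path into (URL, headers) for the Simple API."""
    match = _TARGET.fullmatch(target)
    if match is None:
        # Covers unknown formats as well as the "latest" meta version.
        raise ValueError(f"unsupported target path: {target!r}")
    url = f"{base_url}simple/{match['project']}/"
    accept = f"application/vnd.pypi.simple.{match['version']}+{match['format']}"
    return url, {"Accept": accept}
```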
Recommendations
===============
This section is non-normative, and represents what the PEP authors believe to be
the best default implementation decisions for something implementing this PEP, but
it does **not** represent any sort of requirement to match these decisions.
These decisions have been chosen to maximize the number of requests that can be
moved onto the newest version of an API, while maintaining the greatest amount
of compatibility. In addition, they've also tried to make using the API provide
guardrails that attempt to push clients into making the best choices they can.
It is recommended that servers:
- Support all 3 content types described in this PEP, using server-driven
content negotiation, for as long as they reasonably can, or at least as
long as they're receiving non trivial traffic that uses the HTML responses.
- When encountering an ``Accept`` header that does not contain any content types
that it knows how to work with, should not ever return a ``300 Multiple Choices``
response, and it should be preferred to return a ``406 Not Acceptable`` response.
- However, if choosing to use the endpoint configuration, you should prefer to
return a ``200 OK`` response in the expected content type for that endpoint.
- When selecting an acceptable version, should choose the highest version that
the client supports, with the most expressive/featureful serialization format,
taking into account the specificity of the client requests as well as any
quality priority values they have expressed, and it should only use the
``text/html`` content type as a last resort.
It is recommended that clients:
- Support all 3 content types described in this PEP, using server-driven
content negotiation, for as long as they reasonably can.
- When constructing an ``Accept`` header, include all of the content types
that you support.
You should generally *not* include a quality priority value for your content
types, unless you have implementation specific reasons that you want the
server to take into account (for example, if you're using the stdlib html
parser and you're worried that there may be some kinds of HTML responses that
you're unable to parse in some edge cases).
The one exception to this recommendation is that it is recommended that you
*should* include a ``;q=0`` value on the legacy ``text/html`` content type,
unless it is the only content type that you are requesting.
- Explicitly select what versions they are looking for, rather than using the
``latest`` meta version during normal operation.
- Check the ``Content-Type`` of the response and ensure it matches something
that you were expecting.
FAQ
===
Does this mean PyPI is planning to drop support for HTML/PEP 503?
-----------------------------------------------------------------
No, PyPI has no plans at this time to drop support for :pep:`503` or HTML
responses.
While this PEP does give repositories the flexibility to do that, that largely
exists to ensure that things like the Endpoint Configuration mechanism are
able to work, and to ensure that clients do not make any assumptions that would
prevent, at some point in the future, gracefully dropping support for HTML.
The existing HTML responses incur almost no maintenance burden on PyPI and
there is no pressing need to remove them. The only real benefit to dropping them
would be to reduce the number of items cached in our CDN.
If in the future PyPI *does* wish to drop support for them, doing so would
almost certainly be the topic of a PEP, or at a minimum a public, open, discussion
and would be informed by metrics showing any impact to end users.
Why JSON instead of X format?
-----------------------------
@ -401,10 +747,90 @@ using separate API routes a less desirable solution than relying on content
negotiation to select the most ideal representation of the data.
Does this mean that static servers are no longer supported?
-----------------------------------------------------------
In short, no, static servers are still (almost) fully supported by this PEP.
The specifics of how they are supported will depend on the static server in
question. For example:
- **S3:** S3 fully supports custom content types, however it does not support
any form of content negotiation. In order to have a server hosted on S3, you
would have to use the "Endpoint configuration" style of negotiation, and
users would have to configure their clients explicitly.
- **GitHub Pages:** GitHub Pages does not support custom content types, so the
S3 solution is not currently workable, which means that only ``text/html``
repositories would function.
- **Apache:** Apache fully supports server-driven content negotiation, and would
just need to be configured to map the custom content types to specific file
extensions.
Doesn't TUF support require having different URLs for each representation?
--------------------------------------------------------------------------
While in TUF each target can only have a single representation, and by default
that is assumed to map exactly to the target path that is being referenced
within TUF, there is actually no requirement that the target path be the same
as the server path, nor anything that prevents the same data from being
represented by multiple targets.
In fact, TUF doesn't support the Simple API URLs as they are already, because
TUF assumes that a target points to a filename, but all of the Simple API URLs
are directories. Thus regardless of this PEP, there is going to have to be
something that translates between the naming of the targets within the TUF
metadata, and the actual requests being made to the server.
Currently the WIP TUF implementation for pip maps a target like
``simple/PROJECT/index.html`` to an HTTP request to fetch ``/simple/PROJECT/``.
However there is no reason that it could not be extended to map a target
like ``/simple/PROJECT/vnd.pypi.simple.v1.html`` to an HTTP request to
fetch ``/simple/PROJECT/`` with an ``Accept`` header of
``application/vnd.pypi.simple.v1+html``.
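The translation described above could be sketched roughly as follows. This is
not the WIP pip implementation: the function ``target_to_request`` and the
lookup table are hypothetical, and the ``Accept`` value used for ``index.html``
and the JSON entry are assumptions added for illustration.

```python
import posixpath

# Hypothetical table mapping a TUF target file name to an Accept header.
# Only the vnd.pypi.simple.v1.html mapping is taken from the text above;
# the other two entries are illustrative assumptions.
ACCEPT_BY_FILENAME = {
    "index.html": "text/html",
    "vnd.pypi.simple.v1.html": "application/vnd.pypi.simple.v1+html",
    "vnd.pypi.simple.v1.json": "application/vnd.pypi.simple.v1+json",
}


def target_to_request(target):
    """Translate a TUF target path into (request path, Accept header)."""
    directory, filename = posixpath.split(target)
    accept = ACCEPT_BY_FILENAME[filename]
    # Simple API URLs are directories, so the file name is dropped and
    # the request path always ends with a slash.
    request_path = "/" + directory.strip("/") + "/"
    return request_path, accept
```

With this shape, several targets can resolve to the same server path while
still selecting different representations through content negotiation.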
Why not add an ``application/json`` alias like ``text/html``?
-------------------------------------------------------------
This PEP believes that it is best for both clients and servers to be explicit
about the types of the API responses that are being used, and a content type
like ``application/json`` is the exact opposite of explicit.
The ``text/html`` alias exists as a compromise, primarily to
ensure that existing consumers of the API continue to function as they already
do. There is no such expectation of existing clients using the Simple API with
an ``application/json`` content type.
In addition, ``application/json`` has no versioning in it, which means that
if there is ever a ``2.0`` version of the Simple API, we will be forced to make
a decision. Should ``application/json`` preserve backwards compatibility and
continue to be an alias for ``application/vnd.pypi.simple.v1+json``, or should
it be updated to be an alias for ``application/vnd.pypi.simple.v2+json``?
This problem doesn't exist for ``text/html``, because the assumption is that
HTML will remain a legacy format, and will likely not gain *any* new features,
much less features that require breaking compatibility. So having it be an
alias for ``application/vnd.pypi.simple.v1+html`` is effectively the same as
having it be an alias for ``application/vnd.pypi.simple.latest+html``, since
``1.0`` will likely be the only HTML version to exist.
The largest benefit to adding the ``application/json`` content type is that
there are services that do not allow you to have custom content types, and require
you to select one of their preset content types. The main example of this is
GitHub Pages: the lack of ``application/json`` support in this PEP means
that static repositories will no longer be able to be hosted on GitHub Pages
unless GitHub adds the ``application/vnd.pypi.simple.v1+json`` content type.
This PEP believes that the benefits are not large enough to add that content
type alias at this time, and that its inclusion would likely be a footgun
waiting for unsuspecting people to accidentally pick it up. Especially given
that we can always add it in the future, but removing things is a lot harder
to do.
Appendix 1: Survey of use cases to cover
========================================
This was done through a discussion between ``pip``, ``PyPI``, and ``bandersnatch``
maintainers, who are the first three potential users for the new API. This is
how they use the Simple + JSON APIs today: