446 lines
25 KiB
ReStructuredText
446 lines
25 KiB
ReStructuredText
PEP: 766
|
||
Title: Explicit Priority Choices Among Multiple Indexes
|
||
Author: Michael Sarahan <msarahan@gmail.com>
|
||
Sponsor: Barry Warsaw <barry@python.org>
|
||
PEP-Delegate: Paul Moore <p.f.moore@gmail.com>
|
||
Discussions-To: https://discuss.python.org/t/pep-for-handling-multiple-indexes-index-priority/71589
|
||
Status: Draft
|
||
Type: Informational
|
||
Topic: Packaging
|
||
Created: 18-Nov-2024
|
||
Post-History: `18-Nov-2024 <https://discuss.python.org/t/pep-for-handling-multiple-indexes-index-priority/71589>`__,
|
||
|
||
Abstract
|
||
========
|
||
|
||
Package resolution is a key part of the Python user experience as the means of
|
||
extending Python's core functionality. The experience of package resolution is
|
||
mostly taken for granted until someone encounters a situation where the package
|
||
installer does something they don't expect. The installer behavior with
|
||
multiple indexes has been `a common source of unexpected behavior
|
||
<https://github.com/pypa/pip/issues/8606>`__. Through its ubiquity, pip has
|
||
long defined the standard expected behavior across other tools in the ecosystem,
|
||
but Python installers are diverging with respect to how they handle multiple
|
||
indexes. At the core of this divergence is whether index contents are combined
|
||
before resolving distributions, or each index is handled individually in order.
|
||
pip merges all indexes before matching distributions, while uv matches
|
||
distributions on one index before moving on to the next. Each approach has
|
||
advantages and disadvantages. This PEP aims to describe each of these
|
||
behaviors, which are referred to as “version priority” and “index priority”
|
||
respectively, so that community discussions and troubleshooting can share a
|
||
common vocabulary, and so that tools can implement predictable behavior based on
|
||
these descriptions.
|
||
|
||
Motivation
|
||
==========
|
||
|
||
Python package users frequently find themselves in need of specifying an index
|
||
or package source other than PyPI. There are many reasons for external indexes
|
||
to exist:
|
||
|
||
- File size/quota limitations on PyPI
|
||
- Implementation variants, such as `different GPU library builds in PyTorch <https://pytorch.org/get-started/locally/>`__
|
||
- `Local builds of packages shared internally at an organization <https://github.com/pypa/pip/issues/8606>`__
|
||
- `Situations where a local package has remote dependencies
|
||
<https://github.com/pypa/pip/issues/11624>`__, and the user wishes to prioritize
|
||
local packages over remote dependencies, while still falling back to remote
|
||
dependencies where needed
|
||
|
||
In most of these cases, it is not desirable to completely forego PyPI. Instead,
|
||
users generally want PyPI to still be a source of packages, but a lower priority
|
||
source. Unfortunately, `pip's current design precludes this concept of priority <https://github.com/pypa/pip/issues/8606>`__.
|
||
Some Python installer tools have developed alternative ways to handle multiple
|
||
indexes that incorporate mechanisms to express index priority, such as `uv
|
||
<https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes>`__
|
||
and `PDM
|
||
<https://pdm-project.org/latest/usage/config/#respect-the-order-of-the-sources>`__.
|
||
|
||
The innovation and the potential for customization is exciting, but it comes at
|
||
the risk of further fragmenting the python packaging ecosystem, which is already
|
||
perceived as one of Python's weak points. The motivation of this PEP is to encourage
|
||
installers to provide more insight into how they handle multiple indexes, and to
|
||
provide a vocabulary that can be common to the broader community.
|
||
|
||
Specification
|
||
=============
|
||
|
||
“Version priority”
|
||
------------------
|
||
|
||
This behavior is characterized by the installer always getting the
|
||
"best" version of a package, regardless of the index that it comes
|
||
from. "Best" is defined by the installer's algorithm for optimizing
|
||
the various traits of a package, also factoring in user input (such as
|
||
preferring only binaries, or no binaries). While installers may differ
|
||
in their optimization criteria and user options, the general trait that
|
||
all version priority installers share is that the index
|
||
contents are collated prior to candidate selection.
|
||
|
||
Version priority is most useful when all configured indexes are equally trusted
|
||
and well-behaved regarding the distribution interchangeability assumption.
|
||
Mirrors are especially well-behaved in this regard. That interchangeability
|
||
assumption is what makes comparing distributions of a given package meaningful.
|
||
Without it, the installer is no longer comparing “apples to apples.” In
|
||
practice, it is common for different indexes to have files that have different
|
||
contents than other indexes, such as builds for special hardware, or differing
|
||
metadata for the same package. Version priority behavior can lead to
|
||
undesirable, unexpected outcomes in these cases, and this is where `users
|
||
generally look for some kind of index priority
|
||
<https://github.com/pypa/pip/issues/8606>`__. Additionally, when there is a
|
||
difference in trust among indexes, version priority does not provide a way to
|
||
prefer more trusted indexes over less trusted indexes. This has been exploited by
|
||
dependency confusion attacks, and :pep:`708` was proposed as a way of
|
||
hard-coding a notion of trusted external indexes into the index.
|
||
|
||
The "version priority" name is new, and introduction of new terms should always
|
||
be minimized. This PEP looks toward the uv project, which refers to `its implementation of the version priority
|
||
behavior <https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes>`__
|
||
as “``unsafe-best-match``.” Naming is really hard here. On one hand, it
|
||
isn’t accurate to call pip’s default behavior intrinsically “unsafe.”
|
||
The addition of possibly malicious indexes is what
|
||
introduces concern with this behavior. :pep:`708` added a way to restrict
|
||
installers from drawing packages from unexpected, potentially insecure
|
||
indexes. On the other hand, the term “best-match” is technically
|
||
correct, but also misleading. The “best match” varies by user and by
|
||
application. “Best” is technically correct in the sense that it is a
|
||
global optimum according to the match criteria specified above, but that
|
||
is not necessarily what is “best” in a user’s eyes. “Version priority”
|
||
is a proposed term that avoids the concerns with the uv terminology,
|
||
while approximating the behavior in the most user-identifiable way that
|
||
packages are compared.
|
||
|
||
“Index priority”
|
||
----------------
|
||
|
||
In index priority, the resolver finds candidates for each index, one at a time.
|
||
The resolver proceeds to subsequent indexes only if the current package request
|
||
has no viable candidates. Index priority does not combine indexes into one
|
||
global, flat namespace. Because indexes are searched in order, the package from
|
||
an earlier index will be preferred over a package from a later index,
|
||
regardless of whether the later index had a better match with the installer's
|
||
optimization criteria. For a given installer, the optimization criteria and
|
||
selection algorithm should be the same for both index priority and version
|
||
priority. It is only the treatment of multiple indexes that differs: all
|
||
together for version priority, and individually for index priority.
|
||
|
||
The order of specification of indexes determines their priority in the
|
||
finding process. As a result, the way that installers load the index
|
||
configuration must be predictable and reproducible. This PEP does not prescribe
|
||
any particular mechanism, other than to say that installers should provide
|
||
a way of ordering their collection of sources. Installers should also
|
||
ideally provide optional debugging output that provides insight into
|
||
which index is being considered.
|
||
|
||
Each package’s finder should start at the beginning of the list of indexes, so each
|
||
package starts over with the index list. In other words, if one package has no
|
||
valid candidates on the first index, but finds a hit on the second index,
|
||
subsequent packages should still start their search on the first index, rather than
|
||
starting on the second.
|
||
|
||
One desirable behavior that the index priority strategy implies is that
|
||
there are no “surprise” updates, where a version bump on a
|
||
lower-priority index wins out over a curated, approved higher-priority
|
||
index. This is related to the security improvement of :pep:`708`, where
|
||
packages can restrict the external indexes that distributions can come
|
||
from, but index priority is more configurable by end users. The package installs are
|
||
only expected to change when either the higher-priority index or the
|
||
index priority configuration change. This stability and predictability
|
||
makes it more viable to configure indexes as a more persistent property of an
|
||
environment, rather than a one-off argument for one install command.
|
||
|
||
Cache keys
|
||
~~~~~~~~~~
|
||
|
||
Because index priority is acknowledging the possibility that different indexes
|
||
may have different content for a given package, caching and lockfiles should now
|
||
include the index from which distributions were downloaded. Without this
|
||
aspect, it is possible that after changing the list of configured indexes, the
|
||
cache or lockfile could provide a similarly-named distribution from a
|
||
lower-priority index. If every index follows the recommended behavior of
|
||
providing identical files across indexes for a given filename, this is not an
|
||
issue. However, that recommendation is not readily enforceable, and augmenting
|
||
the cache key with origin index would be a wise defensive change.
|
||
|
||
Ways that a request falls through to a lower priority index
|
||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
- Package name is not present at all in higher priority index
|
||
- All distributions from higher priority index filtered out due to
|
||
version specifier, compatible Python version, platform tag, yanking or otherwise
|
||
- A denylist configuration for the installer specifies that a particular package
|
||
name should be ignored on a given index
|
||
- A higher priority index is unreachable (e.g. blocked by firewall
|
||
rules, temporarily unavailable due to maintenance, other miscellaneous
|
||
and temporary networking issues). This is a less clear-cut detail that
|
||
should be controllable by users. On one hand, this behavior would lead
|
||
to less predictable, likely unreproducible results by unexpectedly
|
||
falling through to lower priority indexes. On the other hand, graceful
|
||
fallback may be more valuable to some users, especially if they can
|
||
safely assume that all of their indexes are equally trusted. pip’s
|
||
behavior today is graceful fallback: you see warnings if an index is
|
||
having connection issues, but the installation will proceed with any
|
||
other available indexes. Because index priority can convey different trust
|
||
levels between indexes, installers that implement index priority should
|
||
default to raising errors and aborting on network issues. Installers may
|
||
choose to provide a flag to allow fall-through to lower-priority indexes in
|
||
case of network error.
|
||
|
||
Treatment within a given index follows existing behavior, but stops at
|
||
the bounds of one index and moves on to the next index only after all
|
||
priority preferences within the one index are exhausted. This means that
|
||
existing priorities among the unified collection of packages apply to
|
||
each index individually before falling through to a lower priority
|
||
index.
|
||
|
||
There are tradeoffs to make at every level of the optimization criteria:
|
||
|
||
- version: index priority will use an older version from a higher-priority index
|
||
even if a newer version is available on another index.
|
||
- wheel vs sdist: Should the installer use an sdist from a higher-priority
|
||
index before trying a wheel from a lower-priority index?
|
||
- more platform-specific wheels before less specific ones: Should the
|
||
installer use less specific wheels from higher-priority indexes
|
||
before using more specific wheels from lower priority indexes?
|
||
- flags such as pip's ``--prefer-binary``: Should the installer use an sdist from a higher
|
||
priority index before considering wheels on a lower priority index?
|
||
|
||
Installers are free to implement these priorities in different ways for
|
||
themselves, but they should document their optimization criteria and how they
|
||
handle fall-through to lower-priority indexes. For example, an installer could
|
||
say that ``--prefer-binary`` should not install an sdist unless it had iterated
|
||
through all configured indexes and found no installable binary candidates.
|
||
|
||
Mirroring
|
||
~~~~~~~~~
|
||
|
||
As described thus far, the index priority scheme breaks the use case of more
|
||
than one index url serving the same content. Such mirrors may be used with the
|
||
intent of ameliorating network issues or otherwise improving reliability. One
|
||
approach that installers could take to preserve mirroring functionality while
|
||
adding index priority would be to add a notion of user-definable index groups,
|
||
where each index in the group is assumed to be equivalent. This is related to
|
||
`Poetry's notion of package sources
|
||
<https://python-poetry.org/docs/repositories/>`__, except that this would allow
|
||
arbitrary numbers of prioritizable groups, and that this would assume members of
|
||
a group to be mirrors. Within each group, content could be combined, or each
|
||
member could be fetched concurrently. The fastest responding index would then
|
||
represent the group.
|
||
|
||
Backwards Compatibility
|
||
=======================
|
||
|
||
This PEP does not prescribe any changes as mandatory for any installer,
|
||
so it only introduces compatibility concerns if tools choose to adopt an
|
||
index behavior other than the behavior(s) they currently implement.
|
||
|
||
This PEP’s language does not quite align with existing tools, including
|
||
pip and uv. Either this PEP’s language can change during review of this PEP, or if
|
||
this PEP’s language is preferred, other projects could conform to it.
|
||
The only goal of proposing these terms is to create a central, common vocabulary
|
||
that makes it easier for users to learn about other installers.
|
||
|
||
As some tools rely on one or the other behavior, there are some possible
|
||
issues that may emerge, where tailoring available resources/packages for
|
||
a particular behavior may detract from the user experience for people
|
||
who rely on the other behavior.
|
||
|
||
- Different indexes may have different metadata. For example, one cannot assume
|
||
that the metadata for package “something” on index “A” has the same dependencies
|
||
as “something” on index “B”. This breaks fundamental assumptions of version
|
||
priority, but index priority can handle this. When an installer falls through to a
|
||
lower-priority index in the search order, it implies refreshing the package metadata
|
||
from the new index. This is both an improvement and a complication. It is a
|
||
complication in the sense that a cached metadata entry must be keyed by both
|
||
package name and index url, instead of just package name. It is a potential
|
||
improvement in that different implementation variants of a package can differ in
|
||
dependencies as long as their distributions are separated into different indexes.
|
||
|
||
- Users may not get updates as they expect when using index priority, because some higher priority
|
||
index has not updated/synchronized with PyPI to get the latest
|
||
packages. If the higher priority index has a valid candidate, newer
|
||
packages will not be found. This will need to be communicated
|
||
verbosely, because it is counter to pip’s well-established behavior.
|
||
|
||
- By adding index priority, an installer will improve the predictability of
|
||
which index will be selected, and index hosts may abuse this as a way of having
|
||
similarly named files that have different contents. With version priority,
|
||
this violates the key package interchangeability assumption, and insanity will ensue.
|
||
Index priority would be more workable, but the situation still
|
||
has great potential for confusion. It would be helpful to develop tools that
|
||
support installers in identifying these confusing issues. These tools could
|
||
operate independently of the installer process, as a means of validating the
|
||
sanity of a set of indexes. Depending on the time cost of these tools, the
|
||
installers could run them as part of their process. Users could, of course,
|
||
ignore the recommendations at their own risk.
|
||
|
||
Security Implications
|
||
=====================
|
||
|
||
Index priority creates a mechanism for users to explicitly specify a trust
|
||
hierarchy among their indexes. As such, it limits the potential for dependency
|
||
confusion attacks. Index priority was rejected by :pep:`708` as a solution for
|
||
dependency confusion attacks. This PEP requests that the rejection be
|
||
reconsidered, with index priority serving a different purpose. This PEP is
|
||
primarily motivated by the desire to support implementation variants, which is
|
||
the subject of `another discussion that hopefully leads to a PEP
|
||
<https://discuss.python.org/t/selecting-variant-wheels-according-to-a-semi-static-specification/53446>`__.
|
||
It is not mutually exclusive with :pep:`708`, nor does it suggest reverting or
|
||
withdrawing :pep:`708`. It is an answer to `how we could allow users to choose
|
||
which index to use at a more fine grained level than “per install”.
|
||
<https://github.com/astral-sh/uv/issues/171#issuecomment-1952291242>`__
|
||
|
||
For a more thorough discussion of the :pep:`708` rejection of index
|
||
priority, please see the `discuss.python.org thread for this PEP
|
||
<https://discuss.python.org/t/pep-766-handling-multiple-indexes-index-priority/71589>`__.
|
||
|
||
How to Teach This
|
||
=================
|
||
|
||
At the outset, the goal is not to convert pip or any other tool to
|
||
change its default priority behavior. The best way to teach is perhaps
|
||
to watch message boards, GitHub issue trackers and chat channels,
|
||
keeping an eye out for problems that index priority could help solve.
|
||
There are `several <https://github.com/pypa/pip/issues/8606>`__
|
||
`long-standing <https://stackoverflow.com/questions/67253141/python-pip-priority-order-with-index-url-and-extra-index-url>`__
|
||
`discussions <https://github.com/pypa/pip/issues/5045>`__
|
||
`that <https://discuss.python.org/t/dependency-notation-including-the-index-url/5659>`__
|
||
`would <https://github.com/pypa/pip/issues/9612>`__ be good places to
|
||
start advertising the concepts. The topics of the two officially
|
||
supported behaviors need documentation, and we, the authors of this
|
||
PEP, would develop these as part of the review period of this PEP.
|
||
These docs would likely consist of additions across several
|
||
indexes, cross-linking the concepts between installers. At a
|
||
minimum, we expect to add to the
|
||
`PyPUG <https://packaging.python.org/en/latest/>`__ and to `pip’s
|
||
documentation <https://pip.pypa.io/en/stable/cli/pip_install/>`__.
|
||
|
||
It will be important for installers to advertise the active behavior, especially in
|
||
error messaging, and that will provide ways to provide resources to
|
||
users about these behaviors.
|
||
|
||
uv users are already experiencing index priority. uv `documents this
|
||
behavior <https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes>`__
|
||
well, but it is always possible to `improve the
|
||
discoverability <https://github.com/astral-sh/uv/issues/4389>`__ of that
|
||
documentation from the command line, `where users will actually
|
||
encounter the unexpected
|
||
behavior <https://github.com/astral-sh/uv/issues/5146>`__.
|
||
|
||
Reference Implementation
|
||
========================
|
||
|
||
The uv project demonstrates index priority with its default behavior. uv
|
||
is implemented in Rust, though, so if a reference implementation to a Python-based tool
|
||
is necessary, we, the authors of this PEP, will provide one. For pip in
|
||
particular, we see the implementation plan as something like:
|
||
|
||
- For users who don’t use ``--extra-index-url`` or ``--find-links``,
|
||
there will be no change, and no migration is necessary.
|
||
- pip users would be able opt in to the index priority behavior with a
|
||
new config setting in the CLI and in ``pip.conf``. This proposal does not
|
||
recommend any strategy as the default for any installer. It only
|
||
recommends documenting the strategies that a tool provides.
|
||
- Enable extra info-level output for any pip operation where more than
|
||
one index is used. In this output, state the current strategy setting,
|
||
and a terse summary of implied behavior, as well as a link to docs
|
||
that describe the different options
|
||
- Add debugging output that verbosely identifies the index being used at
|
||
each step, including where the file is in the configuration hierarchy,
|
||
and where it is being included (via config file, env var, or CLI
|
||
flag).
|
||
- Plumb tracking of which index gets used for which
|
||
package/distribution through the entire pip install process. Store
|
||
this information so that it is available to tools like ``pip freeze``
|
||
- Supplement :pep:`751` (lockfiles) with capture of index where a
|
||
package/distribution came from
|
||
|
||
Rejected Ideas
|
||
==============
|
||
|
||
- Tell users to set up a proxy/mirror, such as `devpi <https://github.com/devpi/devpi>`__
|
||
or `Artifactory <https://jfrog.com/help/r/jfrog-artifactory-documentation/pypi-repositories>`__ that
|
||
serves local files if present, and forwards to another server (PyPI)
|
||
if no local files match
|
||
|
||
This matches the behavior of this proposal very closely, except that
|
||
this method requires hosting some server, and may be inaccessible or
|
||
not configurable to users in some environments. It is also important
|
||
to consider that for an organization that operates its own index
|
||
(for overcoming PyPI size restrictions, for example), this does not
|
||
solve the need for ``--extra-index-url`` or proxy/mirror for end
|
||
users. That is, organizations get no improvement from this approach
|
||
unless they proxy/mirror PyPI as a whole, and get users to configure
|
||
their proxy/mirror as their sole index.
|
||
|
||
- Are build tags and/or local version specifiers enough?
|
||
|
||
Build tags and local version specifiers will take precedence over
|
||
packages without those tags and/or local version specifiers. In a pool
|
||
of packages, builds that have these additions hosted on a server other
|
||
than PyPI will take priority over packages on PyPI, which rarely use
|
||
build tags, and forbid local version specifiers. This approach is
|
||
viable when package providers want to provide their own local
|
||
override, such as `HPC maintainers who provide optimized builds for
|
||
their
|
||
users <https://github.com/ComputeCanada/software-stack/blob/main/pip-which-version.md>`__.
|
||
It is less viable in some ways, such as build tags not showing up in
|
||
``pip freeze`` metadata, and `local version specifiers not being
|
||
allowed on
|
||
PyPI <https://discuss.python.org/t/lets-permit-local-version-label-in-version-specifiers/22781>`__.
|
||
There is also significant work entailed in building and maintaining
|
||
package collections with local build tag variants.
|
||
|
||
https://discuss.python.org/t/dependency-notation-including-the-index-url/5659/21
|
||
|
||
- What about :pep:`708`? Isn’t that
|
||
enough?
|
||
|
||
:pep:`708` is aimed specifically at addressing dependency confusion
|
||
attacks, and doesn’t address the potential for implementation variants
|
||
among indexes. It is a way of filtering external URLs and encoding an
|
||
allow-list for external indexes in index metadata. It does not change
|
||
the lack of priority or preference among channels that currently
|
||
exists.
|
||
|
||
- `Namespacing <https://discuss.python.org/t/dependency-notation-including-the-index-url/5659>`__
|
||
|
||
Namespacing is a means of specifying a package such that the Python
|
||
usage of the package does not change, but the package installation
|
||
restricts where the package comes from. :pep:`752` recently proposed a way to
|
||
multiplex a package’s owners in a flat package namespace (e.g.
|
||
PyPI) by reserving prefixes as grouping elements. `NPM’s concept
|
||
of “scopes” <https://docs.npmjs.com/cli/v10/using-npm/scope>`__ has
|
||
been raised as another good example of how this might look. This PEP
|
||
differs in that it is targeted to multiple index, not a flat package
|
||
namespace. The net effect is roughly the same in terms of predictably
|
||
choosing a particular package source, except that the namespacing
|
||
approach relies more on naming packages with these namespace prefixes,
|
||
whereas this PEP would be less granular, pulling in packages on
|
||
whatever higher-priority index the user specifies. The namespacing
|
||
approach relies on all configured indexes treating a given namespace
|
||
similarly, which leaves the usual concern that not all configured
|
||
indexes are trusted equally. The namespace idea is not incompatible
|
||
with this PEP, but it also does not improve expression of trust of
|
||
indexes in the way that this PEP does.
|
||
|
||
Open Issues
|
||
===========
|
||
|
||
[Any points that are still being decided/discussed.]
|
||
|
||
Acknowledgements
|
||
================
|
||
|
||
This work was supported financially by NVIDIA through employment of the author.
|
||
NVIDIA teammates dramatically improved this PEP with their
|
||
input. Astral Software pioneered the behaviors of index priority and thus laid the
|
||
foundation of this document. The pip authors deserve great praise for their
|
||
consistent direction and patient communication of the version priority behavior,
|
||
especially in the face of contentious security concerns.
|
||
|
||
Copyright
|
||
=========
|
||
|
||
This document is placed in the public domain or under the
|
||
CC0-1.0-Universal license, whichever is more permissive.
|