PEP: 752
Title: Implicit namespaces for package repositories
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Barry Warsaw <barry@python.org>
PEP-Delegate: Dustin Ingram <di@python.org>
Discussions-To: https://discuss.python.org/t/61227
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 13-Aug-2024
Post-History: `18-Aug-2024 <https://discuss.python.org/t/61227>`__,

Abstract
========

This PEP specifies a way for organizations to reserve package name prefixes
for future uploads.

    "Namespaces are one honking great idea -- let's do more of
    those!" - :pep:`20`

Motivation
==========

The current ecosystem lacks a way for projects with many packages to signal a
verified pattern of ownership. Some examples:

* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
  maintain type stubs for various packages. The stub packages they maintain
  mirror the package name they target and are prefixed by ``types-``. For
  example, the package ``requests`` has a stub that users would depend on
  called ``types-requests``.
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
  for each feature's corresponding package [1]_. For example, most of Google's
  packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
  `using virtual machines <https://cloud.google.com/products/compute>`__.
* Many projects [2]_ support a model where some packages are officially
  maintained and third-party developers are encouraged to participate by
  creating their own. For example, `Datadog <https://www.datadoghq.com>`__
  offers observability as a service for organizations at any scale. The
  `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
  with
  `official integrations <https://github.com/DataDog/integrations-core>`__
  for many products, like various databases and web servers, which are
  distributed as Python packages that are prefixed by ``datadog-``. There is
  support for creating `third-party integrations`__ which customers may run.

__ https://docs.datadoghq.com/developers/integrations/agent_integration/

Such projects are uniquely vulnerable to name-squatting attacks
which can ultimately result in `dependency confusion`__.

__ https://www.activestate.com/resources/quick-reads/dependency-confusion/

For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that Datadog would eventually
support it as an official integration. It takes a nontrivial amount of time to
deliver such an integration due to roadmap prioritization and the time required
for implementation. It would be impossible to reserve the name of every
potential package, so in the interim an attacker may create a package that
appears legitimate but executes malicious code at runtime. Not only are
users more likely to install such packages, but doing so taints the perception
of the entire project.

Although :pep:`708` attempts to address this attack vector, it is specifically
about the case of multiple repositories being considered during dependency
resolution and does not offer any protection to the aforementioned use cases.

Namespacing would also drastically reduce the incidence of
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
because typos would have to be in the prefix itself, which is
`normalized <naming_>`_ and likely to be a short, well-known identifier like
``aws-``.

Rationale
=========

Other package ecosystems have generally solved this problem by taking one of
two approaches: either minimizing or maximizing backwards compatibility.

* `NPM <https://www.npmjs.com>`__ has the concept of
  `scoped packages <https://docs.npmjs.com/about-scopes>`__ which were
  `introduced`__ primarily to combat a dearth of good available
  package names (whether a real or perceived phenomenon). When a user or
  organization signs up they are given a scope that matches their name. For
  example, the
  `package <https://www.npmjs.com/package/@google-cloud/storage>`__ for using
  Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is
  the scope. Regular user accounts (non-organization) may publish `unscoped`__
  packages for public use.
  This approach is the least backwards compatible because every
  installer and tool has to be modified to account for scopes.
* `NuGet <https://www.nuget.org>`__ has the concept of
  `package ID prefix reservation`__ which was
  `introduced`__ primarily to satisfy users wishing to know where a package
  came from. A package name prefix may be reserved for use by one or more
  owners. Every reserved package has a special indication
  `on its page <https://www.nuget.org/packages/Google.Cloud.Storage.V1>`__ to
  communicate this. After reservation, any upload with a reserved prefix will
  fail if the user is not an owner of the prefix. Existing packages that have a
  prefix that is owned may continue to release as usual. This approach is the
  most backwards compatible because only modifications to
  indices like PyPI are required and installers do not need to change.

__ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
__ https://docs.npmjs.com/package-scope-access-level-and-visibility
__ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation
__ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/

This PEP specifies the NuGet approach of authorized reservation across a flat
namespace. Any solution that requires new package syntax must be built atop the
existing flat namespace and therefore implicit namespaces acquired via a
reservation mechanism would be a prerequisite to such explicit namespaces.

Terminology
===========

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in :rfc:`2119`.

Organization
    `Organizations <orgs_>`_ are entities that own projects and have various
    users associated with them.
Grant
    A grant is a reservation of a namespace for a package repository.
Open Namespace
    An `open <open-namespaces_>`_ namespace allows for uploads from any project
    owner.
Restricted Namespace
    A restricted namespace only allows uploads from an owner of the namespace.
Parent Namespace
    A namespace's parent refers to the namespace without the trailing
    hyphenated component e.g. the parent of ``foo-bar`` is ``foo``.
Child Namespace
    A namespace's child refers to the namespace with an additional trailing
    hyphenated component e.g. the child of ``foo`` is ``foo-bar``.

Specification
=============

.. _orgs:

Organizations
-------------

Any package repository that allows for the creation of projects (e.g.
non-mirrors) MAY offer the concept of `organizations`__. Organizations
are entities that own projects and have various users associated with them.

__ https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/

Organizations MAY reserve one or more namespaces. Such reservations neither
confer ownership nor grant special privileges to existing projects.

.. _naming:

Naming
------

A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g.
``foo.bar`` would become ``foo-bar``.

__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format
__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization

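As a minimal sketch of the rule above, the following validates and normalizes
a namespace using the regular expressions published in the name normalization
specification. The function name ``normalize_namespace`` is hypothetical and
is not part of any existing library.

```python
import re

# Name format from the PyPA name normalization specification:
# ASCII letters and digits, with '.', '_' and '-' allowed only in the middle.
_VALID_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE
)


def normalize_namespace(namespace: str) -> str:
    """Return the normalized form of a namespace (hypothetical helper)."""
    if not _VALID_NAME.match(namespace):
        raise ValueError(f"not a valid project name: {namespace!r}")
    # Runs of '-', '_' and '.' collapse into one hyphen; the result is lowercase.
    return re.sub(r"[-_.]+", "-", namespace).lower()


print(normalize_namespace("foo.bar"))
```

A repository would presumably store only the normalized form so that later
comparisons never depend on how the namespace was originally spelled.
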
Semantics
---------

A namespace grant bestows ownership over the following:

1. A project matching the namespace itself such as the placeholder package
   `microsoft <https://pypi.org/project/microsoft/>`__.
2. Projects that start with the namespace followed by a hyphen. For example,
   the namespace ``foo`` would match the normalized project name ``foo-bar``
   but not the project name ``foobar``.

Package name matching acts upon the `normalized <naming_>`_ namespace.

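The two matching rules above can be sketched as follows. The helper names
``normalize`` and ``matches_namespace`` are assumptions made for illustration;
the PEP does not prescribe an implementation.

```python
import re


def normalize(name: str) -> str:
    """Normalize a name as described in the name normalization spec."""
    return re.sub(r"[-_.]+", "-", name).lower()


def matches_namespace(project_name: str, namespace: str) -> bool:
    """Return True if the project falls under the namespace (hypothetical)."""
    project = normalize(project_name)
    prefix = normalize(namespace)
    # Rule 1: the project is the namespace itself.
    # Rule 2: the project starts with the namespace followed by a hyphen.
    return project == prefix or project.startswith(prefix + "-")


print(matches_namespace("foo-bar", "foo"))
print(matches_namespace("foobar", "foo"))
```

Appending the hyphen before the prefix test is what keeps ``foobar`` outside
of the ``foo`` namespace.
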
Namespaces are per-package repository and SHALL NOT be shared between
repositories. For example, if PyPI has a namespace ``microsoft`` that is owned
by the company Microsoft, packages starting with ``microsoft-`` that come from
other non-PyPI mirror repositories do not confer the same level of trust.

Grants MUST NOT overlap. For example, if there is an existing grant
for ``foo-bar`` then a new grant for ``foo`` would be forbidden. An overlap is
determined by comparing the `normalized <naming_>`_ proposed namespace with the
normalized namespace of every existing root grant. Every comparison must append
a hyphen to the end of the proposed and existing namespace. An overlap is
detected when any existing namespace starts with the proposed namespace.

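The overlap check described above might look like the following sketch. It
implements only the comparison stated in the text (an existing namespace
starting with the proposed one) and assumes that existing grants are available
as a plain list of namespaces; ``has_overlap`` is a hypothetical name.

```python
import re


def normalize(name: str) -> str:
    """Normalize a name as described in the name normalization spec."""
    return re.sub(r"[-_.]+", "-", name).lower()


def has_overlap(proposed: str, existing_grants: list[str]) -> bool:
    """Return True if the proposed namespace overlaps an existing grant."""
    # A hyphen is appended to both sides so that ``foo`` does not
    # overlap with an unrelated namespace such as ``foobar``.
    candidate = normalize(proposed) + "-"
    return any(
        (normalize(existing) + "-").startswith(candidate)
        for existing in existing_grants
    )


print(has_overlap("foo", ["foo-bar"]))
print(has_overlap("foo", ["foobar"]))
```
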
.. _uploads:

Uploads
-------

If the following criteria are all true for a given upload:

1. The project does not yet exist.
2. The name matches a reserved namespace.
3. The project is not owned by an organization with an active grant for the
   namespace.

Then the upload MUST fail with a 403 HTTP status code.

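The three criteria above can be sketched as a single predicate. The ``Grant``
data class, its fields, and the function ``must_reject_upload`` are all
hypothetical, and the sketch deliberately leaves out open namespaces; a
repository would answer with a 403 status code whenever the predicate is true.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of an active namespace grant."""

    namespace: str  # stored in normalized form
    owners: frozenset[str]  # names of the owning organizations


def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def must_reject_upload(
    project_name: str,
    project_exists: bool,
    owner_org: str | None,
    grants: list[Grant],
) -> bool:
    """Return True when all three criteria hold for the upload."""
    if project_exists:  # criterion 1: only new projects are affected
        return False
    project = normalize(project_name)
    matching = [
        g for g in grants
        if project == g.namespace or project.startswith(g.namespace + "-")
    ]
    if not matching:  # criterion 2: the name must match a reserved namespace
        return False
    # Criterion 3: the owner holds no active grant for the namespace.
    return not any(owner_org in g.owners for g in matching)


grants = [Grant("foo", frozenset({"foo-corp"}))]
print(must_reject_upload("foo-bar", False, "someone-else", grants))
```
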
.. _open-namespaces:

Open Namespaces
---------------

The owner of a grant may choose to allow others the ability to release new
projects with the associated namespace. Doing so MUST allow
`uploads <uploads_>`_ for new projects matching the namespace from any user.

It is possible for the owner of a namespace to both make it open and allow
other organizations to use the grant. In this case, the authorized
organizations have no special permissions and are equivalent to an open grant
without ownership.

.. _repository-metadata:

Repository Metadata
-------------------

The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``.
The following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement these
changes so that consumers of the API are able to determine whether the
repository supports this PEP.

.. _project-detail:

Project Detail
''''''''''''''

The :pep:`project detail <691#project-detail>` response will be modified as
follows.

The ``namespace`` key MUST be ``null`` if the project does not match an active
namespace grant. If the project does match a namespace grant, the value MUST be
a mapping with the following keys:

* ``prefix``: This is the associated `normalized <naming_>`_ namespace e.g.
  ``foo-bar``. If the owner of the project owns multiple matching grants then
  this MUST be the namespace with the greatest number of characters. For
  example, if the project name matched both ``foo-bar`` and ``foo-bar-baz``
  then this key would be the latter.
* ``authorized``: This is a boolean and will be true if the project owner
  is an organization and is one of the current owners of the grant. This is
  useful for tools that wish to make a distinction between official and
  community packages.
* ``open``: This is a boolean indicating whether the namespace is
  `open <open-namespaces_>`_.

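A consumer of the API could use these keys roughly as sketched below. The
sample payload, the placement of ``namespace`` at the top level of the
response, and the labels returned by the hypothetical ``classify`` function
are assumptions made for illustration; only ``meta.api-version`` comes from
:pep:`691` and only the three keys above come from this PEP.

```python
import json

# Hypothetical project detail response, trimmed to the relevant keys.
SAMPLE = json.loads("""
{
  "meta": {"api-version": "1.1"},
  "name": "foo-bar-baz",
  "namespace": {"prefix": "foo-bar", "authorized": true, "open": false}
}
""")


def classify(detail: dict) -> str:
    """Label a project using the ``namespace`` key (labels are hypothetical)."""
    version = detail.get("meta", {}).get("api-version", "1.0")
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (1, 1):
        # Older repositories do not expose namespace metadata at all.
        return "unknown"
    namespace = detail.get("namespace")
    if namespace is None:
        return "unreserved"
    return "official" if namespace["authorized"] else "community"


print(classify(SAMPLE))
```

Checking the API version first lets a tool tell "no grant matches" apart from
"this repository does not implement the PEP".
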
Namespace Detail
''''''''''''''''

The format of this URL is ``/namespace/<namespace>`` where ``<namespace>`` is
the `normalized <naming_>`_ namespace. For example, the URL for the namespace
``foo.bar`` would be ``/namespace/foo-bar``.

The response will be a mapping with the following keys:

* ``prefix``: This is the `normalized <naming_>`_ version of the namespace e.g.
  ``foo-bar``.
* ``owner``: This is the organization that is responsible for the namespace.
* ``open``: This is a boolean indicating whether the namespace is
  `open <open-namespaces_>`_.
* ``parent``: This is the parent namespace if it exists. For example, if the
  namespace is ``foo-bar`` and there is an active grant for ``foo``, then this
  would be ``"foo"``. If there is no parent then this key will be ``null``.
* ``children``: This is an array of any child namespaces. For example, if the
  namespace is ``foo`` and there are active grants for ``foo-bar`` and
  ``foo-bar-baz`` then this would be ``["foo-bar", "foo-bar-baz"]``.

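One way a repository might assemble this response is sketched below. The
in-memory ``grants`` mapping and the function ``namespace_detail`` are
hypothetical. The sketch also assumes that ``parent`` is the nearest shorter
namespace with an active grant and that ``children`` lists every longer
namespace with an active grant, which is one reading of the examples above.

```python
import re


def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def namespace_detail(namespace: str, grants: dict[str, tuple[str, bool]]) -> dict:
    """Build the response; ``grants`` maps namespace -> (owner, is_open)."""
    prefix = normalize(namespace)
    owner, is_open = grants[prefix]
    # Walk up the hyphenated components to find the nearest granted ancestor.
    parts = prefix.split("-")
    parent = None
    for end in range(len(parts) - 1, 0, -1):
        candidate = "-".join(parts[:end])
        if candidate in grants:
            parent = candidate
            break
    children = sorted(g for g in grants if g.startswith(prefix + "-"))
    return {
        "prefix": prefix,
        "owner": owner,
        "open": is_open,
        "parent": parent,
        "children": children,
    }


grants = {
    "foo": ("foo-corp", False),
    "foo-bar": ("foo-corp", True),
    "foo-bar-baz": ("foo-corp", False),
}
print(namespace_detail("foo.bar", grants))
```
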
Grant Removal
-------------

When a reserved namespace becomes unclaimed, repositories MUST set the
``namespace`` key to ``null`` in the `API <project-detail_>`_.

Namespaces that were previously claimed but are now not SHOULD be eligible for
claiming again by any organization.

Backwards Compatibility
=======================

There are no intrinsic concerns because there is still a flat namespace and
installers need no modification. Additionally, many projects have already
chosen to signal a shared purpose with a prefix like `typeshed has done`__.

__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045

Security Implications
=====================

* There is an opportunity to build on top of :pep:`740` and :pep:`480` so that
  one could prove cryptographically that a specific release came from an owner
  of the associated namespace. This PEP does not describe how this will
  happen, other than to note that such work is planned for the future.

How to Teach This
=================

For consumers of packages, we will document how metadata is exposed in the
`API <repository-metadata_>`_ and, in the future, potentially note tooling
that uses namespaces to provide extra security guarantees during
installation.

Reference Implementation
========================

None at this time.

Rejected Ideas
==============

Organization Scoping
--------------------

The primary motivation for this PEP is to reduce dependency confusion attacks,
and NPM-style scoping with an allowance of the legacy flat namespace would
increase the risk. If documentation instructed a user to install ``bar`` in the
namespace ``foo`` then the user must be careful to install ``@foo/bar`` and not
``foo-bar``, or vice versa. The Python packaging ecosystem has normalization
rules for names in order to maximize the ease of communication and this would
be a regression.

The runtime environment of Python is also not conducive to scoping. Whereas
multiple versions of the same JavaScript package may coexist, Python only
allows a single global namespace. Barring major changes to the language itself,
this is nearly impossible to change. Additionally, users have come to expect
that the package name is usually the same as what they would import and
eliminating the flat namespace would do away with that convention.

Scoping would be particularly affected by organization changes which are bound
to happen over time. An organization may change their name due to internal
shuffling, an acquisition, or any other reason. Whenever this happens every
project they own would in effect be renamed, which would frequently cause
unnecessary confusion for users.

Finally, the disruption to the community would be massive because it would
require an update from every package manager, security scanner, IDE, etc. New
packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.

Open Issues
===========

None at this time.

Footnotes
=========

.. [1] The following shows the package prefixes for the major cloud providers:

   - Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
   - Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
     and others based on ``google-``
   - Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__

.. [2] Some examples of projects that have many packages with a common prefix:

   - `Django <https://www.djangoproject.com>`__ is one of the most widely used
     web frameworks in existence. They have the concept of `reusable apps`__,
     which are commonly installed via
     `third-party packages <https://djangopackages.org>`__ that implement a
     subset of functionality to extend Django-based websites. These packages
     are by convention prefixed by ``django-`` or ``dj-``.
   - `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
     tooling for sharing interactive documents. They support `extensions`__
     which in most cases (and in all cases for officially maintained
     extensions) are prefixed by ``jupyter-``.
   - `pytest <https://docs.pytest.org>`__ is Python's most popular testing
     framework. They have the concept of `plugins`__ which may be developed by
     anyone and by convention are prefixed by ``pytest-``.
   - `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
     Markdown files. They also have the concept of
     `plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
     developed by anyone and are usually prefixed by ``mkdocs-``.
   - `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
     popular for large technical projects such as
     `Swift <https://www.swift.org>`__ and Python itself. They have
     the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
     many of which are maintained within a
     `dedicated organization <https://github.com/sphinx-contrib>`__.
   - `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
     observability with `official packages`__ for the core APIs and SDK with
     `third-party packages`__ to collect data from various sources. All
     packages are prefixed by ``opentelemetry-`` with child prefixes in the
     form ``opentelemetry-<component>-<name>-``.
   - `Apache Airflow <https://airflow.apache.org>`__ is a platform to
     programmatically orchestrate tasks as directed acyclic graphs (DAGs).
     They have the concept of `plugins`__, and also `providers`__ which are
     prefixed by ``apache-airflow-providers-``.

   __ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
   __ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
   __ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
   __ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
   __ https://github.com/open-telemetry/opentelemetry-python
   __ https://github.com/open-telemetry/opentelemetry-python-contrib
   __ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
   __ https://airflow.apache.org/docs/apache-airflow-providers/index.html

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.