python-peps/peps/pep-0752.rst

537 lines
23 KiB
ReStructuredText
Raw Normal View History

PEP: 752
Title: Package repository namespaces
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Barry Warsaw <barry@python.org>
PEP-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: https://discuss.python.org/t/61227
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 13-Aug-2024
Post-History: `18-Aug-2024 <https://discuss.python.org/t/61227>`__,
Abstract
========
This PEP specifies a way for organizations to reserve package name prefixes
for future uploads.
"Namespaces are one honking great idea -- let's do more of
those!" - :pep:`20`
Motivation
==========
The current ecosystem lacks a way for projects with many packages to signal a
verified pattern of ownership. Some examples:
* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
maintain type stubs for various packages. The stub packages they maintain
mirror the package name they target and are prefixed by ``types-``. For
example, the package ``requests`` has a stub that users would depend on
called ``types-requests``.
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
for each feature's corresponding package [1]_. For example, most of Google's
packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
`using virtual machines <https://cloud.google.com/products/compute>`__.
* Many projects [2]_ support a model where some packages are officially
maintained and third-party developers are encouraged to participate by
creating their own. For example, `Datadog <https://www.datadoghq.com>`__
offers observability as a service for organizations at any scale. The
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
with
`official integrations <https://github.com/DataDog/integrations-core>`__
for many products, like various databases and web servers, which are
distributed as Python packages that are prefixed by ``datadog-``. There is
support for creating `third-party integrations`__ which customers may run.
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
Such projects are uniquely vulnerable to `dependency confusion`__ attacks.
For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that Datadog would eventually
support it as an official integration. It takes a nontrivial amount of time to
deliver such an integration due to roadmap prioritization and the time required
for implementation. It would be impossible to reserve the name of every
potential package so in the interim an attacker may create a package that
appears legitimate which would execute malicious code at runtime. Not only are
users more likely to install such packages but doing so taints the perception
of the entire project.
__ https://www.activestate.com/resources/quick-reads/dependency-confusion/
Although :pep:`708` attempts to address this attack vector, it is specifically
about the case of multiple repositories being considered during dependency
resolution and does not offer any protection to the aforementioned use cases.
Namespacing also would drastically reduce the incidence of
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
because typos would have to be in the prefix itself which is
`normalized <naming_>`_ and likely to be a short, well-known identifier like
``aws-``.
Rationale
=========
Tolerance for Disruption
------------------------
Other package ecosystems have generally solved this problem by taking one of
two approaches: either minimizing or maximizing backwards compatibility.
* `NPM <https://www.npmjs.com>`__ has the concept of
`scoped packages <https://docs.npmjs.com/about-scopes>`__ which were
`introduced`__ primarily to combat there being a dearth of available good
package names (whether a real or perceived phenomenon). When a user or
organization signs up they are given a scope that matches their name. For
example, the
`package <https://www.npmjs.com/package/@google-cloud/storage>`__ for using
Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is
the scope. Regular user accounts (non-organization) may publish `unscoped`__
packages for public use.
This approach has the lowest amount of backwards compatibility because every
installer and tool has to be modified to account for scopes.
* `NuGet <https://www.nuget.org>`__ has the concept of
`package ID prefix reservation`__ which was
`introduced`__ primarily to satisfy users wishing to know where a package
came from. A package name prefix may be reserved for use by one or more
owners. Every reserved package has a special indication
`on its page <https://www.nuget.org/packages/Google.Cloud.Storage.V1>`__ to
communicate this. After reservation, any upload with a reserved prefix will
fail if the user is not an owner of the prefix. Existing packages that have a
prefix that is owned may continue to release as usual. This approach has the
highest amount of backwards compatibility because only modifications to
indices like PyPI are required and installers do not need to change.
__ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
__ https://docs.npmjs.com/package-scope-access-level-and-visibility
__ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation
__ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/
This PEP specifies the NuGet approach of authorized reservation across a flat
namespace for the following reasons:
* Causing churn for the community is a hard blocker.
* The NPM approach has the potential to cause confusion for users if we allow
unscoped names. Our community has chosen to normalize separator characters
and so ``@aws/s3`` would likely be confused with ``@aws-s3``.
Approval Process
----------------
PyPI has been understaffed, receiving the first `dedicated specialist`__ in
July 2024. Due to lack of resources, user support has been lacking for
`package name claims <https://discuss.python.org/t/27436/19>`__,
`organization requests <https://discuss.python.org/t/33764/15>`__,
`storage limit increases <https://discuss.python.org/t/54035>`__,
and even `account recovery <https://discuss.python.org/t/43422/122>`__.
__ https://pyfound.blogspot.com/2024/07/announcing-our-new-pypi-support.html
The `default policy <grant-approval-criteria_>`_ of only allowing
`corporate organizations <corp-orgs_>`_ to reserve namespaces (except in
specific scenarios) provides the following benefits:
* PyPI would have a constant source of funding for support specialists,
infrastructure maintenance and new features.
* Although each application would require independent review, less human
feedback would be required because the process to approve a paid organization
already bestows a certain amount of trust.
Terminology
===========
The keywords "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as
described in :rfc:`2119`.
Organization
`Organizations <orgs_>` are entities that own projects and have various
users associated with them. Only organizations have access to the namespace
application form.
Corporate Organization
`Corporate organizations <corp-orgs_>` are organizations that have paid for
special functionality on PyPI.
Public Namespace
A `public <public-namespaces_>`_ namespace allows for uploads from any
project owner.
Private Namespace
A private namespace only allows uploads from an owner of the namespace.
Specification
=============
`Organizations <orgs_>`_ (NOT regular users) MAY reserve one or more
namespaces. Such reservations neither confer ownership nor grant special
privileges to existing projects.
.. _naming:
Naming
------
A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g.
``foo.bar`` would become ``foo-bar``. The user facing namespace (e.g. in UI
tooltips) MUST preserve the original pre-normalized text as defined during
reservation.
__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format
__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
Grant Semantics
---------------
A namespace grant bestows ownership over the following:
1. A project matching the namespace itself such as the placeholder package
`microsoft <https://pypi.org/project/microsoft/>`__.
2. Projects that start with the namespace followed by a hyphen. For example,
the namespace ``foo`` would match the normalized project name ``foo-bar``
but not the project name ``foobar``.
Package name matching acts upon the `normalized <naming_>`_ namespace.
Namespaces are per-package repository and SHALL NOT be shared between
repositories.
Grant Types
-----------
There are two types of grants.
.. _root-grant:
Root Grant
''''''''''
Only `organizations <orgs_>`_ have the ability to submit requests for namespace
grants. An organization gets a root grant for every accepted request. This
grant may produce any number of `child grants <child-grant_>`_.
.. _child-grant:
Child Grant
'''''''''''
A child grant is created by the owner of a `root grant <root-grant_>`_. The
child namespace MUST be prefixed by the root grant namespace followed by a
hyphen. For example, ``google-cloud`` would be a valid child of the root
namespace ``google``.
Child grants cannot have their own child grants.
.. _grant-ownership:
Grant Ownership
---------------
The owner of a grant may allow any number of other organizations to use the
grant. The grants behave as if they were owned by the organization. The owner
may revoke this permission at any time.
The owner may transfer ownership to another organization. If the organization
is a corporate organization, the target for transfer must also be. Settings for
permitted organizations are transferred as well.
.. _uploads:
Uploads
-------
If the following criteria are all true for a given upload:
1. The project does not yet exist.
2. The name matches a reserved namespace.
3. The project is not owned by an organization with an active grant for the
namespace.
Then the upload MUST fail with a 403 HTTP status code.
.. _user-interface:
User Interface
--------------
Every page for a particular release
(`example <https://pypi.org/project/google-cloud-compute/1.19.2/>`__)
that both matches an active namespace grant and is tied to an
`owner <grant-ownership_>`_
MUST receive a special indicator that signifies this tie.
The UI also MUST indicate what the prefix is (NuGet does not do this) and this
value MUST match the ``namespace`` key in the `API <repository-metadata_>`_.
Repositories SHOULD have a dedicated page that enumerates every active
namespace grant and which organization(s) own it.
.. _public-namespaces:
Public Namespaces
-----------------
The owner of a grant may choose to allow others the ability to release new
projects with the associated namespace. Doing so MUST allow
`uploads <uploads_>`_ for new projects matching the namespace from any user
but such releases MUST NOT have the `visual indicator <user-interface_>`_.
It is possible for the `owner <grant-ownership_>`_ of a namespace to both make
it public and allow other organizations to use it. In this case, the permitted
organizations have no special permissions and are essentially only public.
Root grants given to `community projects <grant-approval-criteria_>`_ SHALL
always be public.
When a `child grant <child-grant_>`_ is created, its public status SHALL be
inherited from the `root grant <root-grant_>`_. Owners of child grants MAY
make them public at any time. If a grant is public, it MUST NOT be made private
unless the owner of the grant is the owner of every project that matches the
namespace.
.. _repository-metadata:
Repository Metadata
-------------------
To allow installers and other tooling insight into this project-level metadata
of a namespaced project, the :pep:`JSON API <691>` version will be incremented
and support new keys for the project endpoint.
The ``owner`` key SHOULD be added and refer to the owner of the project,
whether an organization or a user.
The ``namespace`` key MAY be added and MUST be ``null`` if the project does not
match an active namespace grant. If the project does match a namespace grant,
the value MUST be a mapping with the following keys:
* ``name``: This is the associated `normalized <naming_>`_ namespace e.g.
``foo-bar``. If the owner of the project owns multiple matching grants then
this MUST be the namespace with the most number of characters. For example,
if the project name matched both ``foo-bar`` and ``foo-bar-baz`` then this
key would be the latter.
* ``owners``: This is an array of organizations that
`own <grant-ownership_>`_ the grant. This is useful for tools that wish to
make a distinction between official and community packages by checking if
the array contains the project ``owner``.
* ``public``: This is a boolean indicating whether the namespace is
`public <public-namespaces_>`_.
The presence of the ``namespace`` key indicates support for this PEP.
Grant Removal
-------------
If a grant is shared with other organizations, the owner organization MUST
initiate a transfer as a prerequisite for organization deletion.
If a grant is not shared, the owner may unclaim the namespace in either of the
following circumstances:
* The organization manually removes themselves as the owner.
* The organization is deleted.
When a reserved namespace becomes unclaimed, repositories:
1. MUST remove the `visual indicator <user-interface_>`_
2. MUST remove the ``namespace`` key in the `API <repository-metadata_>`_
Grant Applications
------------------
Submission
''''''''''
Only `organizations <orgs_>`_ have access to the page for submitting grant
applications. Reviews of `corporate organizations <corp-orgs_>`_ applications
are prioritized.
.. _grant-approval-criteria:
Approval Criteria
'''''''''''''''''
1. The namespace MUST NOT be something common like ``tool`` or ``apps``.
2. The namespace SHOULD be greater than three characters.
3. The namespace SHOULD properly and clearly identify the reservation owner.
4. The organization SHOULD be actively using the namespace.
5. There SHOULD be evidence that *not* reserving the namespace may cause
ambiguity, confusion, or other harm to the community.
Organizations that are not `corporate organizations <corp-orgs_>`_ MUST
represent one of the following:
* Large, popular open-source projects with many packages [2]_
* Universities that actively publish packages
* Government organizations that actively publish packages
* NPOs/NGOs that actively publish packages like
`Our World in Data <https://github.com/owid>`__
Backwards Compatibility
=======================
There are no intrinsic concerns because there is still a flat namespace and
installers need no modification. Additionally, many projects have already
chosen to signal a shared purpose with a prefix like `typeshed has done`__.
__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045
Security Implications
=====================
* Although users will no longer see the visual indicator when a namespace
becomes unclaimed, external consumers of metadata may have difficulty
scraping the user facing
`enumeration <user-interface_>`_ of grants to verify current ownership.
* There is an opportunity to build on top of :pep:`740` and :pep:`480` so that
one could prove cryptographically that a specific release came from an owner
of the associated namespace. This PEP makes no effort to describe how this
will happen other than that work is planned for the future.
How to Teach This
=================
For organizations, we will document how to reserve namespaces, what the
benefits are and pricing.
For consumers of packages we will document the indicator on release pages, how
metadata is exposed in the `API <repository-metadata_>`_ and potentially in
future note tooling that supports utilizing namespaces to provide extra
security guarantees during installation.
Reference Implementation
========================
None at this time.
Rejected Ideas
==============
Organization Scoping
--------------------
The primary motivation for this PEP is to reduce dependency confusion attacks
and NPM-style scoping with an allowance of the legacy flat namespace would
increase the risk. If documentation instructed a user to install ``bar`` in the
namespace ``foo`` then the user must be careful to install ``@foo/bar`` and not
``foo-bar``, or vice versa. The Python packaging ecosystem has normalization
rules for names in order to maximize the ease of communication and this would
be a regression.
The runtime environment of Python is also not conducive to scoping. Whereas
multiple versions of the same JavaScript package may coexist, Python only
allows a single global namespace. Barring major changes to the language itself,
this is nearly impossible to change. Additionally, users have come to expect
that the package name is usually the same as what they would import and
eliminating the flat namespace would do away with that convention.
Scoping would be particularly affected by organization changes which are bound
to happen over time. An organization may change their name due to internal
shuffling, an acquisition, or any other reason. Whenever this happens every
project they own would in effect be renamed which would cause unnecessary
confusion for users, frequently.
Users have come to expect that package names may be typed without worry of
conflicting shell syntax and any namespace solution would pose challenges:
* Copying NPM's syntax (e.g. ``@foo/bar``) would alienate a large number of
Windows users because the ``@`` character is considered special in
`PowerShell`__.
* Starting names with a ``/`` would conflict with the common installer
capability of accepting paths without URI ``file://`` syntax.
* Starting names with a ``//`` like Bazel
`target patterns <https://bazel.build/run/build#specifying-build-targets>`__
would be confusing to users because the current normalization standard
eliminates consecutive separator characters.
__ https://learn.microsoft.com/en-us/powershell/scripting/lang-spec/chapter-07?view=powershell-7.4#717--operator
Finally, the disruption to the community would be massive because it would
require an update from every package manager, security scanner, IDE, etc. New
packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.
Allow Non-Public Namespaces for Community Projects
--------------------------------------------------
This PEP enforces that the discretionary namespace grants for community
projects are `public <public-namespaces_>`_. This is almost always desired by
such projects and prevents the following situations:
* A perceived reduction in openness of community projects, for example if a
project was taken over by a business entity there may be a desire for it to
prevent the creation of new packages matching the namespace.
* When an existing community project with plugins (such as MkDocs) chooses to
reserve a namespace, future plugins that are officially adopted would have to
change their name. This would cause a massive disruption to users and reset
usage statistics. The workaround is to have a new package that is advertised
which would depend on the real package but this is suboptimal.
Open Issues
===========
None at this time.
Footnotes
=========
.. [1] The following shows the package prefixes for the major cloud providers:
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
and others based on ``google-``
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
.. [2] Some examples of projects that have many packages with a common prefix:
2024-08-19 20:58:32 -04:00
- `Django <https://www.djangoproject.com>`__ is one of the most widely used
web frameworks in existence. They have the concept of `reusable apps`__,
which are commonly installed via
`third-party packages <https://djangopackages.org>`__ that implement a
subset of functionality to extend Django-based websites. These packages
are by convention prefixed by ``django-`` or ``dj-``.
- `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
tooling for sharing interactive documents. They support `extensions`__
2024-08-19 20:58:32 -04:00
which in most cases (and in all cases for officially maintained
extensions) are prefixed by ``jupyter-``.
- `pytest <https://docs.pytest.org>`__ is Python's most popular testing
framework. They have the concept of `plugins`__ which may be developed by
anyone and by convention are prefixed by ``pytest-``.
- `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
Markdown files. They also have the concept of
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
developed by anyone and are usually prefixed by ``mkdocs-``.
- `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
popular for large technical projects such as
`Swift <https://www.swift.org>`__ and Python itself. They have
the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
many of which are maintained within a
`dedicated organization <https://github.com/sphinx-contrib>`__.
- `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
observability with `official packages`__ for the core APIs and SDK with
`third-party packages`__ to collect data from various sources. All
packages are prefixed by ``opentelemetry-`` with child prefixes in the
form ``opentelemetry-<component>-<name>-``.
2024-08-19 20:58:32 -04:00
- `Apache Airflow <https://airflow.apache.org>`__ is a platform to
programmatically orchestrate tasks as directed acyclic graphs (DAGs).
They have the concept of `plugins`__, and also `providers`__ which are
prefixed by ``apache-airflow-providers-``.
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
2024-08-19 20:58:32 -04:00
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib
2024-08-19 20:58:32 -04:00
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
.. _orgs: https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/
.. _corp-orgs: https://docs.pypi.org/organization-accounts/pricing-and-payments/#corporate-organizations
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.