PEP: 752 Title: Package repository namespaces Author: Ofek Lev Sponsor: Barry Warsaw PEP-Delegate: Donald Stufft Discussions-To: https://discuss.python.org/t/61227 Status: Draft Type: Standards Track Topic: Packaging Created: 13-Aug-2024 Post-History: `18-Aug-2024 `__, Abstract ======== This PEP specifies a way for organizations to reserve package name prefixes for future uploads. "Namespaces are one honking great idea -- let's do more of those!" - :pep:`20` Motivation ========== The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Some examples: * `Typeshed `__ is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by ``types-``. For example, the package ``requests`` has a stub that users would depend on called ``types-requests``. * Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature's corresponding package [1]_. For example, most of Google's packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for `using virtual machines `__. * Many projects [2]_ support a model where some packages are officially maintained and third-party developers are encouraged to participate by creating their own. For example, `Datadog `__ offers observability as a service for organizations at any scale. The `Datadog Agent `__ ships out-of-the-box with `official integrations `__ for many products, like various databases and web servers, which are distributed as Python packages that are prefixed by ``datadog-``. There is support for creating `third-party integrations`__ which customers may run. __ https://docs.datadoghq.com/developers/integrations/agent_integration/ Such projects are uniquely vulnerable to attacks stemming from malicious actors squatting anticipated package names. For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package so in the interim an attacker may create a legitimate-appearing package which would execute malicious code at runtime. Not only are users more likely to install such packages but doing so taints the perception of the entire project. Namespacing also would drastically reduce the incidence of `typosquatting `__ because typos would have to be in the prefix itself which is `normalized `_ and likely to be a short, well-known identifier like ``aws-``. Rationale ========= Tolerance for Disruption ------------------------ Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility. * `NPM `__ has the concept of `scoped packages `__ which were `introduced`__ primarily to combat there being a dearth of available good package names (whether a real or perceived phenomenon). When a user or organization signs up they are given a scope that matches their name. For example, the `package `__ for using Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is the scope. Regular user accounts (non-organization) may publish `unscoped`__ packages for public use. This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes. * `NuGet `__ has the concept of `package ID prefix reservation`__ which was `introduced`__ primarily to satisfy users wishing to know where a package came from. A package name prefix may be reserved for use by one or more owners. Every reserved package has a special indication `on its page `__ to communicate this. After reservation, any upload with a reserved prefix will fail if the user is not an owner of the prefix. Existing packages that have a prefix that is owned may continue to release as usual. This approach has the highest amount of backwards compatibility because only modifications to indices like PyPI are required and installers do not need to change. __ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages __ https://docs.npmjs.com/package-scope-access-level-and-visibility __ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation __ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/ This PEP specifies the NuGet approach of authorized reservation across a flat namespace for the following reasons: * Causing churn for the community is a hard blocker. * The NPM approach has the potential to cause confusion for users if we allow unscoped names. Our community has chosen to normalize separator characters and so ``@aws/s3`` would likely be confused with ``@aws-s3``. Approval Process ---------------- PyPI has been understaffed, receiving the first `dedicated specialist`__ in July 2024. Due to lack of resources, user support has been lacking for `package name claims `__, `organization requests `__, `storage limit increases `__, and even `account recovery `__. __ https://pyfound.blogspot.com/2024/07/announcing-our-new-pypi-support.html The `default policy `_ of only allowing `corporate organizations `_ to reserve namespaces (except in specific scenarios) provides the following benefits: * PyPI would have a constant source of funding for support specialists, infrastructure maintenance and new features. * Although each application would require independent review, less human feedback would be required because the process to approve a paid organization already bestows a certain amount of trust. Specification ============= `Organizations `_ (NOT regular users) MAY reserve one or more namespaces. Such reservations neither confer ownership nor grant special privileges to existing packages. .. _naming: Naming ------ A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g. ``foo.bar`` would become ``foo-bar``. The user facing namespace (e.g. in UI tooltips) MUST preserve the original pre-normalized text as defined during reservation. __ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format __ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization Grant Semantics --------------- A namespace grant bestows ownership over the following: 1. A package matching the namespace itself such as the placeholder package `microsoft `__. 2. Packages that start with the namespace followed by a hyphen. For example, the namespace ``foo`` would match the package ``foo-bar`` but not the package ``foobar``. Package name matching acts upon the `normalized `_ namespace. Namespaces are per-repository and MUST NOT be shared between repositories. Grant Types ----------- There are two types of grants. .. _root-grant: Root Grant '''''''''' Only `organizations `_ have the ability to submit requests for namespace grants. An organization gets a root grant for every accepted request. This grant may produce any number of `child grants `_. .. _child-grant: Child Grant ''''''''''' A child grant is created by the owner of a `root grant `_. The child namespace MUST be prefixed by the root grant namespace followed by a hyphen. For example, ``google-cloud`` would be a valid child of the root namespace ``google``. Child grants cannot have their own child grants. .. _grant-ownership: Grant Ownership --------------- The owner of a grant may allow any number of other organizations to use the grant. The grants behave as if they were owned by the organization. The owner may revoke this permission at any time. The owner may transfer ownership to another organization. If the organization is a corporate organization, the target for transfer must also be. Settings for permitted organizations are transferred as well. .. _uploads: Uploads ------- If the following criteria are all true for a given upload: 1. The package does not yet exist. 2. The name matches a reserved namespace. 3. The user is not authorized to use the namespace by the owner of the namespace. Then the upload MUST fail with a 403 HTTP status code. .. _user-interface: User Interface -------------- Every page for a particular release (`example `__) that both matches an active namespace grant and is tied to an `owner `_ MUST receive a special indicator that signifies this tie. The UI also MUST indicate what the prefix is (NuGet does not do this) and this value MUST match the ``namespace`` key in the `API `_. Repositories SHOULD have a dedicated page that enumerates every active namespace grant and which organization(s) own it. .. _public-namespaces: Public Namespaces ----------------- The owner of a grant may choose to allow others the ability to release new packages with the associated namespace. Doing so MUST allow `uploads `_ for new packages matching the namespace from any user but such releases MUST NOT have the `visual indicator `_. It is possible for the `owner `_ of a namespace to both make it public and allow other organizations to use it. In this case, the permitted organizations have no special permissions and are essentially only public. Root grants given to `community projects `_ SHALL always be public. .. _repository-metadata: Repository Metadata ------------------- To allow installers and other tooling insight into this metadata for a given artifact upload of a namespaced package, the :pep:`JSON API <691>` MUST include the following keys: * ``namespace``: This is the associated `normalized `_ namespace e.g. ``foo-bar``. If the namespace matches a child grant and the user happens to be authorized for both the child and the root grant, this MUST be the namespace associated with the child grant. * ``owner``: This is the organization with which the user is associated and owner of the grant. If the namespace is `public `_ and the user is not part of a `permitted `_ organization, this key MUST be set to ``__public__``. This is useful for tools that wish to make a distinction between official and community packages. The `Simple API`__ MAY include the aforementioned keys as attributes, for example: __ https://packaging.python.org/en/latest/specifications/simple-repository-api/#base-html-api .. code-block:: html ... Grant Removal ------------- If a grant is shared with other organizations, the owner organization MUST initiate a transfer as a prerequisite for organization deletion. If a grant is not shared, the owner may unclaim the namespace in either of the following circumstances: * The organization manually removes themselves as the owner. * The organization is deleted. When a reserved namespace becomes unclaimed, repositories: 1. MUST remove the `visual indicator `_ 2. MUST NOT modify past `release metadata `_ Grant Applications ------------------ Submission '''''''''' Only `organizations `_ have access to the page for submitting grant applications. Reviews of `corporate organizations `_ applications are prioritized. .. _grant-approval-criteria: Approval Criteria ''''''''''''''''' 1. The namespace MUST NOT be something common like ``tool`` or ``apps``. 2. The namespace SHOULD be greater than three characters. 3. The namespace SHOULD properly and clearly identify the reservation owner. 4. The organization SHOULD be actively using the namespace. 5. There SHOULD be evidence that *not* reserving the namespace may cause ambiguity, confusion, or other harm to the community. Organizations that are not `corporate organizations `_ MUST represent one of the following: * Large, popular open-source projects with many packages [2]_ * Universities that actively publish packages * Government organizations that actively publish packages * NPOs/NGOs that actively publish packages like `Our World in Data `__ Backwards Compatibility ======================= There are no intrinsic concerns because there is still a flat namespace and installers need no modification. Additionally, many projects have already chosen to signal a shared purpose with a prefix like `typeshed has done`__. __ https://github.com/python/typeshed/issues/2491#issuecomment-578456045 Security Implications ===================== * Although users will no longer see the visual indicator when a namespace becomes unclaimed, external consumers of metadata may have difficulty scraping the user facing `enumeration `_ of grants to verify current ownership. * There is an opportunity to build on top of :pep:`740` and :pep:`480` so that one could prove cryptographically that a specific release came from an owner of the associated namespace. This PEP makes no effort to describe how this will happen other than that work is planned for the future. How to Teach This ================= For organizations, we will document how to reserve namespaces, what the benefits are and pricing. For consumers of packages we will document the indicator on release pages, how metadata is exposed in the `API `_ and potentially in future note tooling that supports utilizing namespaces to provide extra security guarantees during installation. Reference Implementation ======================== None at this time. Rejected Ideas ============== Allow Non-Public Namespaces for Community Projects -------------------------------------------------- This PEP enforces that the discretionary namespace grants for community projects are `public `_. This is almost always desired by such projects and prevents the following situations: * A perceived reduction in openness of community projects, for example if a project was taken over by a business entity there may be a desire for it to prevent the creation of new packages matching the namespace. * When an existing community project with plugins (such as MkDocs) chooses to reserve a namespace, future plugins that are officially adopted would have to change their name. This would cause a massive disruption to users and reset usage statistics. The workaround is to have a new package that is advertised which would depend on the real package but this is suboptimal. Open Issues =========== None at this time. Footnotes ========= .. [1] The following shows the package prefixes for the major cloud providers: - Amazon: `aws-cdk- `__ - Google: `google-cloud- `__ and others based on ``google-`` - Microsoft: `azure- `__ .. [2] Some examples of projects that have many packages with a common prefix: - `MkDocs `__ is a documentation framework based on Markdown files. They have the concept of `plugins `__ which may be developed by anyone and by convention are prefixed by ``mkdocs-``. - `Project Jupyter `__ is devoted to the development of tooling for sharing interactive documents. They support `extensions`__ which in most cases (and in all cases for officially maintained extensions) are prefixed by ``jupyter-``. - `OpenTelemetry `__ is an open standard for observability with `official packages`__ for the core APIs and SDK with `third-party packages`__ to collect data from various sources. All packages are prefixed by ``opentelemetry-`` with child prefixes in the form ``opentelemetry---``. __ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html __ https://github.com/open-telemetry/opentelemetry-python __ https://github.com/open-telemetry/opentelemetry-python-contrib .. _orgs: https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/ .. _corp-orgs: https://docs.pypi.org/organization-accounts/pricing-and-payments/#corporate-organizations Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.