diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index f8888c6a5..89aca1b8b 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -631,6 +631,7 @@ peps/pep-0749.rst @JelleZijlstra
 # ...
 peps/pep-0750.rst @gvanrossum @lysnikolaou
 peps/pep-0751.rst @brettcannon
+peps/pep-0752.rst @warsaw
 # ...
 # peps/pep-0754.rst
 # ...
diff --git a/peps/pep-0752.rst b/peps/pep-0752.rst
new file mode 100644
index 000000000..fa4a0914e
--- /dev/null
+++ b/peps/pep-0752.rst
@@ -0,0 +1,429 @@
+PEP: 752
+Title: Package repository namespaces
+Author: Ofek Lev
+Sponsor: Barry Warsaw
+PEP-Delegate: Donald Stufft
+Status: Draft
+Type: Standards Track
+Topic: Packaging
+Created: 13-Aug-2024
+
+Abstract
+========
+
+This PEP specifies a way for organizations to reserve package name prefixes
+for future uploads.
+
+    "Namespaces are one honking great idea -- let's do more of
+    those!" - :pep:`20`
+
+Motivation
+==========
+
+The current ecosystem lacks a way for projects with many packages to signal a
+verified pattern of ownership. Some examples:
+
+* `Typeshed `__ is a community effort to
+  maintain type stubs for various packages. The stub packages they maintain
+  mirror the package name they target and are prefixed by ``types-``. For
+  example, the package ``requests`` has a stub that users would depend on
+  called ``types-requests``.
+* Major cloud providers like Amazon, Google and Microsoft have a common prefix
+  for each feature's corresponding package [1]_. For example, most of Google's
+  packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
+  `using virtual machines `__.
+* Many projects [2]_ support a model where some packages are officially
+  maintained and third-party developers are encouraged to participate by
+  creating their own. For example, `Datadog `__
+  offers observability as a service for organizations at any scale. The
+  `Datadog Agent `__ ships out-of-the-box
+  with
+  `official integrations `__
+  for many products, like various databases and web servers, which are
+  distributed as Python packages that are prefixed by ``datadog-``. There is
+  support for creating `third-party integrations`__ which customers may run.
+
+__ https://docs.datadoghq.com/developers/integrations/agent_integration/
+
+Such projects are uniquely vulnerable to attacks stemming from malicious actors
+squatting anticipated package names. For example, say a new product is released
+for which monitoring would be valuable. It would be reasonable to assume that
+Datadog would eventually support it as an official integration. It takes a
+nontrivial amount of time to deliver such an integration due to roadmap
+prioritization and the time required for implementation. It would be impossible
+to reserve the name of every potential package so in the interim an attacker
+may create a legitimate-appearing package which would execute malicious code at
+runtime. Not only are users more likely to install such packages but doing so
+taints the perception of the entire project.
+
+Namespacing would also drastically reduce the incidence of
+`typosquatting `__
+because typos would have to be in the prefix itself which is
+`normalized `_ and likely to be a short, well-known identifier like
+``aws-``.
+
+Rationale
+=========
+
+Tolerance for Disruption
+------------------------
+
+Other package ecosystems have generally solved this problem by taking one of
+two approaches: either minimizing or maximizing backwards compatibility.
+
+* `NPM `__ has the concept of
+  `scoped packages `__ which were
+  `introduced`__ primarily to combat a dearth of good available
+  package names (whether a real or perceived phenomenon). When a user or
+  organization signs up they are given a scope that matches their name. For
+  example, the
+  `package `__ for using
+  Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is
+  the scope. Regular user accounts (non-organization) may publish `unscoped`__
+  packages for public use.
+  This approach has the lowest amount of backwards compatibility because every
+  installer and tool has to be modified to account for scopes.
+* `NuGet `__ has the concept of
+  `package ID prefix reservation`__ which was
+  `introduced`__ primarily to satisfy users wishing to know where a package
+  came from. A package name prefix may be reserved for use by one or more
+  owners. Every reserved package has a special indication
+  `on its page `__ to
+  communicate this. After reservation, any upload with a reserved prefix will
+  fail if the user is not an owner of the prefix. Existing packages that have a
+  prefix that is owned may continue to release as usual. This approach has the
+  highest amount of backwards compatibility because only modifications to
+  indices like PyPI are required and installers do not need to change.
+
+__ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
+__ https://docs.npmjs.com/package-scope-access-level-and-visibility
+__ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation
+__ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/
+
+This PEP specifies the NuGet approach of authorized reservation across a flat
+namespace for the following reasons:
+
+* Causing churn for the community is a hard blocker.
+* The NPM approach has the potential to cause confusion for users if we allow
+  unscoped names. Our community has chosen to normalize separator characters
+  and so ``@aws/s3`` would likely be confused with ``@aws-s3``.
+
+Approval Process
+----------------
+
+PyPI has been understaffed, receiving the first `dedicated specialist`__ in
+July 2024. Due to lack of resources, user support has been lacking for
+`package name claims `__,
+`organization requests `__,
+`storage limit increases `__,
+and even `account recovery `__.
+
+__ https://pyfound.blogspot.com/2024/07/announcing-our-new-pypi-support.html
+
+The `default policy `_ of only allowing
+`corporate organizations `_ to reserve namespaces (except in
+specific scenarios) provides the following benefits:
+
+* PyPI would have a constant source of funding for support specialists,
+  infrastructure maintenance and new features.
+* Although each application would require independent review, less human
+  feedback would be required because the process to approve a paid organization
+  already bestows a certain amount of trust.
+
+Specification
+=============
+
+`Organizations `_ (NOT regular users) MAY reserve one or more
+namespaces. Such reservations neither confer ownership nor grant special
+privileges to existing packages.
+
+.. _naming:
+
+Naming
+------
+
+A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g.
+``foo.bar`` would become ``foo-bar``. The user-facing namespace (e.g. in UI
+tooltips) MUST preserve the original pre-normalized text as defined during
+reservation.
+
+__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format
+__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
+
+Grant Semantics
+---------------
+
+A namespace grant bestows ownership over the following:
+
+1. A package matching the namespace itself such as the placeholder package
+   `microsoft `__.
+2. Packages that start with the namespace followed by a hyphen. For example,
+   the namespace ``foo`` would match the package ``foo-bar`` but not the
+   package ``foobar``.
+
+Package name matching acts upon the `normalized `_ namespace.
+
+Namespaces are per-repository and MUST NOT be shared between repositories.
+
+Grant Types
+-----------
+
+There are two types of grants.
+
+.. _root-grant:
+
+Root Grant
+''''''''''
+
+Only `organizations `_ have the ability to submit requests for namespace
+grants. An organization gets a root grant for every accepted request. This
+grant may produce any number of `child grants `_.
+
+.. _child-grant:
+
+Child Grant
+'''''''''''
+
+A child grant is created by the owner of a `root grant `_. The
+child namespace MUST be prefixed by the root grant namespace followed by a
+hyphen. For example, ``google-cloud`` would be a valid child of the root
+namespace ``google``.
+
+Child grants cannot have their own child grants.
+
+.. _grant-ownership:
+
+Grant Ownership
+---------------
+
+The owner of a grant may allow any number of other organizations to use the
+grant. The grants behave as if they were owned by the organization. The owner
+may revoke this permission at any time.
+
+The owner may transfer ownership to another organization. If the organization
+is a corporate organization, the target for transfer must also be. Settings for
+permitted organizations are transferred as well.
+
+.. _uploads:
+
+Uploads
+-------
+
+If the following criteria are all true for a given upload:
+
+1. The package does not yet exist.
+2. The name matches a reserved namespace.
+3. The user is not authorized to use the namespace by the owner of the
+   namespace.
+
+Then the upload MUST fail with a 403 HTTP status code.
+
+.. _user-interface:
+
+User Interface
+--------------
+
+Every page for a particular release
+(`example `__)
+that both matches an active namespace grant and is tied to an
+`owner `_
+MUST receive a special indicator that signifies this tie.
+
+The UI also MUST indicate what the prefix is (NuGet does not do this) and this
+value MUST match the ``namespace`` key in the `API `_.
+
+Repositories SHOULD have a dedicated page that enumerates every active
+namespace grant and which organization(s) own it.
+
+.. _public-namespaces:
+
+Public Namespaces
+-----------------
+
+The owner of a grant may choose to allow others the ability to release new
+packages with the associated namespace. Doing so MUST allow
+`uploads `_ for new packages matching the namespace from any user
+but such releases MUST NOT have the `visual indicator `_.
+
+It is possible for the `owner `_ of a namespace to both make
+it public and allow other organizations to use it. In this case, the permitted
+organizations have no special permissions and are essentially only public.
+
+Root grants given to `community projects `_ SHALL
+always be public.
+
+.. _repository-metadata:
+
+Repository Metadata
+-------------------
+
+To allow installers and other tooling insight into this metadata for a given
+artifact upload of a namespaced package, the :pep:`JSON API <691>` MUST include
+the following keys:
+
+* ``namespace``: This is the associated `normalized `_
+  namespace e.g. ``foo-bar``. If the namespace matches a child grant and the
+  user happens to be authorized for both the child and the root grant, this
+  MUST be the namespace associated with the child grant.
+* ``owner``: This is the organization with which the user is associated and
+  owner of the grant. If the namespace is `public `_ and
+  the user is not part of a `permitted `_ organization, this
+  key MUST be set to ``__public__``. This is useful for tools that wish to make
+  a distinction between official and community packages.
+
+The `Simple API`__ MAY include the aforementioned keys as attributes, for
+example:
+
+__ https://packaging.python.org/en/latest/specifications/simple-repository-api/#base-html-api
+
+.. code-block:: html
+
+    ...
+
+Grant Removal
+-------------
+
+If a grant is shared with other organizations, the owner organization MUST
+initiate a transfer as a prerequisite for organization deletion.
+
+If a grant is not shared, the owner may unclaim the namespace in either of the
+following circumstances:
+
+* The organization manually removes itself as the owner.
+* The organization is deleted.
+
+When a reserved namespace becomes unclaimed, repositories:
+
+1. MUST remove the `visual indicator `_
+2. MUST NOT modify past `release metadata `_
+
+Grant Applications
+------------------
+
+Submission
+''''''''''
+
+Only `organizations `_ have access to the page for submitting grant
+applications. Reviews of applications from `corporate organizations `_
+are prioritized.
+
+.. _grant-approval-criteria:
+
+Approval Criteria
+'''''''''''''''''
+
+1. The namespace MUST NOT be something common like ``tool`` or ``apps``.
+2. The namespace SHOULD be longer than three characters.
+3. The namespace SHOULD properly and clearly identify the reservation owner.
+4. The organization SHOULD be actively using the namespace.
+5. There SHOULD be evidence that *not* reserving the namespace may cause
+   ambiguity, confusion, or other harm to the community.
+
+Organizations that are not `corporate organizations `_ MUST
+represent one of the following:
+
+* Large, popular open-source projects with many packages [2]_
+* Universities that actively publish packages
+* Government organizations that actively publish packages
+* NPOs/NGOs that actively publish packages like
+  `Our World in Data `__
+
+Backwards Compatibility
+=======================
+
+There are no intrinsic concerns because there is still a flat namespace and
+installers need no modification. Additionally, many projects have already
+chosen to signal a shared purpose with a prefix like `typeshed has done`__.
+
+__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045
+
+Security Implications
+=====================
+
+* Although users will no longer see the visual indicator when a namespace
+  becomes unclaimed, external consumers of metadata may have difficulty
+  scraping the user-facing
+  `enumeration `_ of grants to verify current ownership.
+* There is an opportunity to build on top of :pep:`740` and :pep:`480` so that
+  one could prove cryptographically that a specific release came from an owner
+  of the associated namespace. This PEP makes no effort to describe how this
+  will happen other than that work is planned for the future.
+
+How to Teach This
+=================
+
+For organizations, we will document how to reserve namespaces, what the
+benefits are, and the pricing.
+
+For consumers of packages we will document the indicator on release pages, how
+metadata is exposed in the `API `_ and potentially, in the
+future, note tooling that supports utilizing namespaces to provide extra
+security guarantees during installation.
+
+Reference Implementation
+========================
+
+None at this time.
+
+Rejected Ideas
+==============
+
+Allow Non-Public Namespaces for Community Projects
+--------------------------------------------------
+
+This PEP enforces that the discretionary namespace grants for community
+projects are `public `_. This is almost always desired by
+such projects and prevents the following situations:
+
+* A perceived reduction in openness of community projects, for example if a
+  project was taken over by a business entity there may be a desire for it to
+  prevent the creation of new packages matching the namespace.
+* When an existing community project with plugins (such as MkDocs) chooses to
+  reserve a namespace, future plugins that are officially adopted would have to
+  change their name. This would cause a massive disruption to users and reset
+  usage statistics. The workaround is to have a new package that is advertised
+  which would depend on the real package but this is suboptimal.
+
+Open Issues
+===========
+
+None at this time.
+
+Footnotes
+=========
+
+.. [1] The following shows the package prefixes for the major cloud providers:
+
+   - Amazon: `aws-cdk- `__
+   - Google: `google-cloud- `__
+     and others based on ``google-``
+   - Microsoft: `azure- `__
+
+.. [2] Some examples of projects that have many packages with a common prefix:
+
+   - `MkDocs `__ is a documentation framework
+     based on Markdown files. They have the concept of
+     `plugins `__ which may be
+     developed by anyone and by convention are prefixed by ``mkdocs-``.
+   - `Project Jupyter `__ is devoted to the development of
+     tooling for sharing interactive documents. They support `extensions`__
+     which in most cases (and in all cases for officially maintained extensions)
+     are prefixed by ``jupyter-``.
+   - `OpenTelemetry `__ is an open standard for
+     observability with `official packages`__ for the core APIs and SDK with
+     `third-party packages`__ to collect data from various sources. All
+     packages are prefixed by ``opentelemetry-`` with child prefixes in the
+     form ``opentelemetry---``.
+
+__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
+__ https://github.com/open-telemetry/opentelemetry-python
+__ https://github.com/open-telemetry/opentelemetry-python-contrib
+
+.. _orgs: https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/
+.. _corp-orgs: https://docs.pypi.org/organization-accounts/pricing-and-payments/#corporate-organizations
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.