PEP: 752 Title: Implicit namespaces for package repositories Author: Ofek Lev Sponsor: Barry Warsaw PEP-Delegate: Dustin Ingram Discussions-To: https://discuss.python.org/t/61227 Status: Draft Type: Standards Track Topic: Packaging Created: 13-Aug-2024 Post-History: `18-Aug-2024 `__, Abstract ======== This PEP specifies a way for organizations to reserve package name prefixes for future uploads. "Namespaces are one honking great idea -- let's do more of those!" - :pep:`20` Motivation ========== The current ecosystem lacks a way for projects with many packages to signal a verified pattern of ownership. Some examples: * `Typeshed `__ is a community effort to maintain type stubs for various packages. The stub packages they maintain mirror the package name they target and are prefixed by ``types-``. For example, the package ``requests`` has a stub that users would depend on called ``types-requests``. * Major cloud providers like Amazon, Google and Microsoft have a common prefix for each feature's corresponding package [1]_. For example, most of Google's packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for `using virtual machines `__. * Many projects [2]_ support a model where some packages are officially maintained and third-party developers are encouraged to participate by creating their own. For example, `Datadog `__ offers observability as a service for organizations at any scale. The `Datadog Agent `__ ships out-of-the-box with `official integrations `__ for many products, like various databases and web servers, which are distributed as Python packages that are prefixed by ``datadog-``. There is support for creating `third-party integrations`__ which customers may run. __ https://docs.datadoghq.com/developers/integrations/agent_integration/ Such projects are uniquely vulnerable to name-squatting attacks which can ultimately result in `dependency confusion`__. __ https://www.activestate.com/resources/quick-reads/dependency-confusion/ For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package so in the interim an attacker may create a package that appears legitimate which would execute malicious code at runtime. Not only are users more likely to install such packages but doing so taints the perception of the entire project. Although :pep:`708` attempts to address this attack vector, it is specifically about the case of multiple repositories being considered during dependency resolution and does not offer any protection to the aforementioned use cases. Namespacing also would drastically reduce the incidence of `typosquatting `__ because typos would have to be in the prefix itself which is `normalized `_ and likely to be a short, well-known identifier like ``aws-``. Rationale ========= Other package ecosystems have generally solved this problem by taking one of two approaches: either minimizing or maximizing backwards compatibility. * `NPM `__ has the concept of `scoped packages `__ which were `introduced`__ primarily to combat there being a dearth of available good package names (whether a real or perceived phenomenon). When a user or organization signs up they are given a scope that matches their name. For example, the `package `__ for using Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is the scope. Regular user accounts (non-organization) may publish `unscoped`__ packages for public use. This approach has the lowest amount of backwards compatibility because every installer and tool has to be modified to account for scopes. * `NuGet `__ has the concept of `package ID prefix reservation`__ which was `introduced`__ primarily to satisfy users wishing to know where a package came from. A package name prefix may be reserved for use by one or more owners. Every reserved package has a special indication `on its page `__ to communicate this. After reservation, any upload with a reserved prefix will fail if the user is not an owner of the prefix. Existing packages that have a prefix that is owned may continue to release as usual. This approach has the highest amount of backwards compatibility because only modifications to indices like PyPI are required and installers do not need to change. __ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages __ https://docs.npmjs.com/package-scope-access-level-and-visibility __ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation __ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/ This PEP specifies the NuGet approach of authorized reservation across a flat namespace. Any solution that requires new package syntax must be built atop the existing flat namespace and therefore implicit namespaces acquired via a reservation mechanism would be a prerequisite to such explicit namespaces. Terminology =========== The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in :rfc:`2119`. Organization `Organizations `_ are entities that own projects and have various users associated with them. Grant A grant is a reservation of a namespace for a package repository. Open Namespace An `open `_ namespace allows for uploads from any project owner. Restricted Namespace A restricted namespace only allows uploads from an owner of the namespace. Parent Namespace A namespace's parent refers to the namespace without the trailing hyphenated component e.g. the parent of ``foo-bar`` is ``foo``. Child Namespace A namespace's child refers to the namespace with an additional trailing hyphenated component e.g. the child of ``foo`` is ``foo-bar``. Specification ============= .. _orgs: Organizations ------------- Any package repository that allows for the creation of projects (e.g. non-mirrors) MAY offer the concept of `organizations`__. Organizations are entities that own projects and have various users associated with them. __ https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/ Organizations MAY reserve one or more namespaces. Such reservations neither confer ownership nor grant special privileges to existing projects. .. _naming: Naming ------ A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g. ``foo.bar`` would become ``foo-bar``. __ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format __ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization Semantics --------- A namespace grant bestows ownership over the following: 1. A project matching the namespace itself such as the placeholder package `microsoft `__. 2. Projects that start with the namespace followed by a hyphen. For example, the namespace ``foo`` would match the normalized project name ``foo-bar`` but not the project name ``foobar``. Package name matching acts upon the `normalized `_ namespace. Namespaces are per-package repository and SHALL NOT be shared between repositories. For example, if PyPI has a namespace ``microsoft`` that is owned by the company Microsoft, packages starting with ``microsoft-`` that come from other non-PyPI mirror repositories do not confer the same level of trust. Grants MUST NOT overlap. For example, if there is an existing grant for ``foo-bar`` then a new grant for ``foo`` would be forbidden. An overlap is determined by comparing the `normalized `_ proposed namespace with the normalized namespace of every existing root grant. Every comparison must append a hyphen to the end of the proposed and existing namespace. An overlap is detected when any existing namespace starts with the proposed namespace. .. _uploads: Uploads ------- If the following criteria are all true for a given upload: 1. The project does not yet exist. 2. The name matches a reserved namespace. 3. The project is not owned by an organization with an active grant for the namespace. Then the upload MUST fail with a 403 HTTP status code. .. _open-namespaces: Open Namespaces ----------------- The owner of a grant may choose to allow others the ability to release new projects with the associated namespace. Doing so MUST allow `uploads `_ for new projects matching the namespace from any user. It is possible for the owner of a namespace to both make it open and allow other organizations to use the grant. In this case, the authorized organizations have no special permissions and are equivalent to an open grant without ownership. .. _repository-metadata: Repository Metadata ------------------- The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``. The following API changes MUST be implemented by repositories that support this PEP. Repositories that do not support this PEP MUST NOT implement these changes so that consumers of the API are able to determine whether the repository supports this PEP. .. _project-detail: Project Detail '''''''''''''' The :pep:`project detail <691#project-detail>` response will be modified as follows. The ``namespace`` key MUST be ``null`` if the project does not match an active namespace grant. If the project does match a namespace grant, the value MUST be a mapping with the following keys: * ``prefix``: This is the associated `normalized `_ namespace e.g. ``foo-bar``. If the owner of the project owns multiple matching grants then this MUST be the namespace with the most number of characters. For example, if the project name matched both ``foo-bar`` and ``foo-bar-baz`` then this key would be the latter. * ``authorized``: This is a boolean and will be true if the project owner is an organization and is one of the current owners of the grant. This is useful for tools that wish to make a distinction between official and community packages. * ``open``: This is a boolean indicating whether the namespace is `open `_. Namespace Detail '''''''''''''''' The format of this URL is ``/namespace/`` where ```` is the `normalized `_ namespace. For example, the URL for the namespace ``foo.bar`` would be ``/namespace/foo-bar``. The response will be a mapping with the following keys: * ``prefix``: This is the `normalized `_ version of the namespace e.g. ``foo-bar``. * ``owner``: This is the organization that is responsible for the namespace. * ``open``: This is a boolean indicating whether the namespace is `open `_. * ``parent``: This is the parent namespace if it exists. For example, if the namespace is ``foo-bar`` and there is an active grant for ``foo``, then this would be ``"foo"``. If there is no parent then this key will be ``null``. * ``children``: This is an array of any child namespaces. For example, if the namespace is ``foo`` and there are active grants for ``foo-bar`` and ``foo-bar-baz`` then this would be ``["foo-bar", "foo-bar-baz"]``. Grant Removal ------------- When a reserved namespace becomes unclaimed, repositories MUST set the ``namespace`` key to ``null`` in the `API `_. Namespaces that were previously claimed but are now not SHOULD be eligible for claiming again by any organization. Backwards Compatibility ======================= There are no intrinsic concerns because there is still a flat namespace and installers need no modification. Additionally, many projects have already chosen to signal a shared purpose with a prefix like `typeshed has done`__. __ https://github.com/python/typeshed/issues/2491#issuecomment-578456045 Security Implications ===================== * There is an opportunity to build on top of :pep:`740` and :pep:`480` so that one could prove cryptographically that a specific release came from an owner of the associated namespace. This PEP makes no effort to describe how this will happen other than that work is planned for the future. How to Teach This ================= For consumers of packages we will document how metadata is exposed in the `API `_ and potentially in future note tooling that supports utilizing namespaces to provide extra security guarantees during installation. Reference Implementation ======================== None at this time. Rejected Ideas ============== Organization Scoping -------------------- The primary motivation for this PEP is to reduce dependency confusion attacks and NPM-style scoping with an allowance of the legacy flat namespace would increase the risk. If documentation instructed a user to install ``bar`` in the namespace ``foo`` then the user must be careful to install ``@foo/bar`` and not ``foo-bar``, or vice versa. The Python packaging ecosystem has normalization rules for names in order to maximize the ease of communication and this would be a regression. The runtime environment of Python is also not conducive to scoping. Whereas multiple versions of the same JavaScript package may coexist, Python only allows a single global namespace. Barring major changes to the language itself, this is nearly impossible to change. Additionally, users have come to expect that the package name is usually the same as what they would import and eliminating the flat namespace would do away with that convention. Scoping would be particularly affected by organization changes which are bound to happen over time. An organization may change their name due to internal shuffling, an acquisition, or any other reason. Whenever this happens every project they own would in effect be renamed which would cause unnecessary confusion for users, frequently. Finally, the disruption to the community would be massive because it would require an update from every package manager, security scanner, IDE, etc. New packages released with the scoping would be incompatible with older tools and would cause confusion for users along with frustration from maintainers having to triage such complaints. Open Issues =========== None at this time. Footnotes ========= .. [1] The following shows the package prefixes for the major cloud providers: - Amazon: `aws-cdk- `__ - Google: `google-cloud- `__ and others based on ``google-`` - Microsoft: `azure- `__ .. [2] Some examples of projects that have many packages with a common prefix: - `Django `__ is one of the most widely used web frameworks in existence. They have the concept of `reusable apps`__, which are commonly installed via `third-party packages `__ that implement a subset of functionality to extend Django-based websites. These packages are by convention prefixed by ``django-`` or ``dj-``. - `Project Jupyter `__ is devoted to the development of tooling for sharing interactive documents. They support `extensions`__ which in most cases (and in all cases for officially maintained extensions) are prefixed by ``jupyter-``. - `pytest `__ is Python's most popular testing framework. They have the concept of `plugins`__ which may be developed by anyone and by convention are prefixed by ``pytest-``. - `MkDocs `__ is a documentation framework based on Markdown files. They also have the concept of `plugins `__ which may be developed by anyone and are usually prefixed by ``mkdocs-``. - `Sphinx `__ is a documentation framework popular for large technical projects such as `Swift `__ and Python itself. They have the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``, many of which are maintained within a `dedicated organization `__. - `OpenTelemetry `__ is an open standard for observability with `official packages`__ for the core APIs and SDK with `third-party packages`__ to collect data from various sources. All packages are prefixed by ``opentelemetry-`` with child prefixes in the form ``opentelemetry---``. - `Apache Airflow `__ is a platform to programmatically orchestrate tasks as directed acyclic graphs (DAGs). They have the concept of `plugins`__, and also `providers`__ which are prefixed by ``apache-airflow-providers-``. __ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/ __ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html __ https://docs.pytest.org/en/stable/how-to/writing_plugins.html __ https://www.sphinx-doc.org/en/master/usage/extensions/index.html __ https://github.com/open-telemetry/opentelemetry-python __ https://github.com/open-telemetry/opentelemetry-python-contrib __ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html __ https://airflow.apache.org/docs/apache-airflow-providers/index.html Copyright ========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.