python-peps/peps/pep-0752.rst

756 lines
35 KiB
ReStructuredText

PEP: 752
Title: Implicit namespaces for package repositories
Author: Ofek Lev <ofekmeister@gmail.com>
Sponsor: Barry Warsaw <barry@python.org>
PEP-Delegate: Dustin Ingram <di@python.org>
Discussions-To: https://discuss.python.org/t/63192
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 13-Aug-2024
Post-History: `18-Aug-2024 <https://discuss.python.org/t/61227>`__,
`07-Sep-2024 <https://discuss.python.org/t/63192>`__,
Abstract
========
This PEP specifies a way for organizations to reserve package name prefixes
for future uploads.
"Namespaces are one honking great idea -- let's do more of
those!" - :pep:`20`
Motivation
==========
The current ecosystem lacks a way for projects with many packages to signal a
verified pattern of ownership. Such projects fall into two categories.
The first category is projects [1]_ that want complete control over their
namespace. A few examples:
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
for each feature's corresponding package [3]_. For example, most of Google's
packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
`using virtual machines <https://cloud.google.com/products/compute>`__.
* `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
observability with `official packages`__ for the core APIs and SDK with
`contrib packages`__ to collect data from various sources. All packages
are prefixed by ``opentelemetry-`` with child prefixes in the form
``opentelemetry-<component>-<name>-``. The contrib packages live in a
central repository and they are the only ones with the ability to publish.
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib
The second category is projects [2]_ that want to share their namespace such
that some packages are officially maintained and third-party developers are
encouraged to participate by publishing their own. Some examples:
* `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
tooling for sharing interactive documents. They support `extensions`__
which in most cases (and in all cases for officially maintained
extensions) are prefixed by ``jupyter-``.
* `Django <https://www.djangoproject.com>`__ is one of the most widely used web
frameworks in existence. They have the concept of `reusable apps`__, which
are commonly installed via
`third-party packages <https://djangopackages.org>`__ that implement a subset
of functionality to extend Django-based websites. These packages are by
convention prefixed by ``django-`` or ``dj-``.
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
Such projects are uniquely vulnerable to name-squatting attacks
which can ultimately result in `dependency confusion`__.
__ https://www.activestate.com/resources/quick-reads/dependency-confusion/
For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that
`Datadog <https://www.datadoghq.com>`__ would eventually support it as an
official integration. It takes a nontrivial amount of time to deliver such an
integration due to roadmap prioritization and the time required for
implementation. It would be impossible to reserve the name of every potential
package so in the interim an attacker may create a package that appears
legitimate which would execute malicious code at runtime. Not only are users
more likely to install such packages but doing so taints the perception of the
entire project.
Although :pep:`708` attempts to address this attack vector, it is specifically
about the case of multiple repositories being considered during dependency
resolution and does not offer any protection to the aforementioned use cases.
Namespacing also would drastically reduce the incidence of
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
because typos would have to be in the prefix itself which is
`normalized <naming_>`_ and likely to be a short, well-known identifier like
``aws-``. In recent years, typosquatting has become a popular attack vector
[4]_.
The `current protection`__ against typosquatting used by PyPI is to normalize
similar characters but that is insufficient for these use cases.
__ https://github.com/pypi/warehouse/blob/8615326918a180eb2652753743eac8e74f96a90b/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42
Rationale
=========
Other package ecosystems have generally solved this problem by taking one of
two approaches: either minimizing or maximizing backwards compatibility.
* `NPM <https://www.npmjs.com>`__ has the concept of
`scoped packages <https://docs.npmjs.com/about-scopes>`__ which were
`introduced`__ primarily to combat there being a dearth of available good
package names (whether a real or perceived phenomenon). When a user or
organization signs up they are given a scope that matches their name. For
example, the
`package <https://www.npmjs.com/package/@google-cloud/storage>`__ for using
Google Cloud Storage is ``@google-cloud/storage`` where ``@google-cloud/`` is
the scope. Regular user accounts (non-organization) may publish `unscoped`__
packages for public use.
This approach has the lowest amount of backwards compatibility because every
installer and tool has to be modified to account for scopes.
* `NuGet <https://www.nuget.org>`__ has the concept of
`package ID prefix reservation`__ which was
`introduced`__ primarily to satisfy users wishing to know where a package
came from. A package name prefix may be reserved for use by one or more
owners. Every reserved package has a special indication
`on its page <https://www.nuget.org/packages/Google.Cloud.Storage.V1>`__ to
communicate this. After reservation, any upload with a reserved prefix will
fail if the user is not an owner of the prefix. Existing packages that have a
prefix that is owned may continue to release as usual. This approach has the
highest amount of backwards compatibility because only modifications to
indices like PyPI are required and installers do not need to change.
__ https://blog.npmjs.org/post/116936804365/solving-npms-hard-problem-naming-packages
__ https://docs.npmjs.com/package-scope-access-level-and-visibility
__ https://learn.microsoft.com/en-us/nuget/nuget-org/id-prefix-reservation
__ https://devblogs.microsoft.com/nuget/Package-identity-and-trust/
This PEP specifies the NuGet approach of authorized reservation across a flat
namespace. Any solution that requires new package syntax must be built atop the
existing flat namespace and therefore implicit namespaces acquired via a
reservation mechanism would be a prerequisite to such explicit namespaces.
Although existing packages matching a reserved namespace would be untouched,
preventing future unauthorized uploads and strategically applying :pep:`541`
takedown requests for malicious cases would reduce risks to users to a
negligible level.
Terminology
===========
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in :rfc:`2119`.
Organization
`Organizations <orgs_>`_ are entities that own projects and have various
users associated with them.
Grant
A grant is a reservation of a namespace for a package repository.
Open Namespace
An `open <open-namespaces_>`_ namespace allows for uploads from any project
owner.
Restricted Namespace
A restricted namespace only allows uploads from an owner of the namespace.
Parent Namespace
A namespace's parent refers to the namespace without the trailing
hyphenated component e.g. the parent of ``foo-bar`` is ``foo``.
Child Namespace
A namespace's child refers to the namespace with additional trailing
hyphenated components e.g. ``foo-bar`` is a valid child of ``foo`` as is
``foo-bar-baz``.
Specification
=============
.. _orgs:
Organizations
-------------
Any package repository that allows for the creation of projects (e.g.
non-mirrors) MAY offer the concept of `organizations`__. Organizations
are entities that own projects and have various users associated with them.
__ https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/
Organizations MAY reserve one or more namespaces. Such reservations neither
confer ownership nor grant special privileges to existing projects.
.. _naming:
Naming
------
A namespace MUST be a `valid`__ project name and `normalized`__ internally e.g.
``foo.bar`` would become ``foo-bar``.
__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-format
__ https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
Semantics
---------
A namespace grant bestows ownership over the following:
1. A project matching the namespace itself such as the placeholder package
`microsoft <https://pypi.org/project/microsoft/>`__.
2. Projects that start with the namespace followed by a hyphen. For example,
the namespace ``foo`` would match the normalized project name ``foo-bar``
but not the project name ``foobar``.
Package name matching acts upon the `normalized <naming_>`_ namespace.
Namespaces are per-package repository and SHALL NOT be shared between
repositories. For example, if PyPI has a namespace ``microsoft`` that is owned
by the company Microsoft, packages starting with ``microsoft-`` that come from
other non-PyPI mirror repositories do not confer the same level of trust.
Grants MUST NOT overlap. For example, if there is an existing grant
for ``foo-bar`` then a new grant for ``foo`` would be forbidden. An overlap is
determined by comparing the `normalized <naming_>`_ proposed namespace with the
normalized namespace of every existing root grant. Every comparison must append
a hyphen to the end of the proposed and existing namespace. An overlap is
detected when any existing namespace starts with the proposed namespace.
.. _uploads:
Uploads
-------
If the following criteria are all true for a given upload:
1. The project does not yet exist.
2. The name matches a reserved namespace.
3. The project is not owned by an organization with an active grant for the
namespace.
Then the upload MUST fail with a 403 HTTP status code.
.. _open-namespaces:
Open Namespaces
-----------------
The owner of a grant may choose to allow others the ability to release new
projects with the associated namespace. Doing so MUST allow
`uploads <uploads_>`_ for new projects matching the namespace from any user.
It is possible for the owner of a namespace to both make it open and allow
other organizations to use the grant. In this case, the authorized
organizations have no special permissions and are equivalent to an open grant
without ownership.
.. _hidden-grants:
Hidden Grants
-------------
Repositories MAY create hidden grants that are not visible to the public which
prevent their namespaces from being claimed by others. Such grants MUST NOT be
`open <open-namespaces_>`_ and SHOULD NOT be exposed in the
`API <repository-metadata_>`_.
Hidden grants are useful for repositories that wish to enforce upload
restrictions without the need to expose the namespace to the public.
.. _repository-metadata:
Repository Metadata
-------------------
The :pep:`JSON API <691>` version will be incremented from ``1.2`` to ``1.3``.
The following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement these
changes so that consumers of the API are able to determine whether the
repository supports this PEP.
.. _project-detail:
Project Detail
''''''''''''''
The :pep:`project detail <691#project-detail>` response will be modified as
follows.
The ``namespace`` key MUST be ``null`` if the project does not match an active
namespace grant. If the project does match a namespace grant, the value MUST be
a mapping with the following keys:
* ``prefix``: This is the associated `normalized <naming_>`_ namespace e.g.
``foo-bar``. If the owner of the project owns multiple matching grants then
this MUST be the namespace with the most number of characters. For example,
if the project name matched both ``foo-bar`` and ``foo-bar-baz`` then this
key would be the latter.
* ``authorized``: This is a boolean and will be true if the project owner
is an organization and is one of the current owners of the grant. This is
useful for tools that wish to make a distinction between official and
community packages.
* ``open``: This is a boolean indicating whether the namespace is
`open <open-namespaces_>`_.
Namespace Detail
''''''''''''''''
The format of this URL is ``/namespace/<namespace>`` where ``<namespace>`` is
the `normalized <naming_>`_ namespace. For example, the URL for the namespace
``foo.bar`` would be ``/namespace/foo-bar``.
The response will be a mapping with the following keys:
* ``prefix``: This is the `normalized <naming_>`_ version of the namespace e.g.
``foo-bar``.
* ``owner``: This is the organization that is responsible for the namespace.
* ``open``: This is a boolean indicating whether the namespace is
`open <open-namespaces_>`_.
* ``parent``: This is the parent namespace if it exists. For example, if the
namespace is ``foo-bar`` and there is an active grant for ``foo``, then this
would be ``"foo"``. If there is no parent then this key will be ``null``.
* ``children``: This is an array of any child namespaces. For example, if the
namespace is ``foo`` and there are active grants for ``foo-bar`` and
``foo-bar-baz`` then this would be ``["foo-bar", "foo-bar-baz"]``.
Grant Removal
-------------
When a reserved namespace becomes unclaimed, repositories MUST set the
``namespace`` key to ``null`` in the `API <project-detail_>`_.
Namespaces that were previously claimed but are now not SHOULD be eligible for
claiming again by any organization.
Community Buy-in
================
Representatives from the following organizations have expressed support for
this PEP (with a link to the discussion):
* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
(`expanded <https://discuss.python.org/t/63191/75>`__)
* `pytest <https://discuss.python.org/t/63192/68>`__
* `Typeshed <https://discuss.python.org/t/1609/37>`__
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
(`expanded <https://discuss.python.org/t/61227/48>`__)
* `Microsoft <https://discuss.python.org/t/63191/40>`__
* `Sentry <https://discuss.python.org/t/63192/67>`__
(in favor of the NuGet approach over others but not negatively impacted
by the current lack of capability)
* `DataDog <https://discuss.python.org/t/63191/53>`__
Backwards Compatibility
=======================
There are no intrinsic concerns because there is still a flat namespace and
installers need no modification. Additionally, many projects have already
chosen to signal a shared purpose with a prefix like `typeshed has done`__.
__ https://github.com/python/typeshed/issues/2491#issuecomment-578456045
.. _security-implications:
Security Implications
=====================
* There is an opportunity to build on top of :pep:`740` and :pep:`480` so that
one could prove cryptographically that a specific release came from an owner
of the associated namespace. This PEP makes no effort to describe how this
will happen other than that work is planned for the future.
How to Teach This
=================
For consumers of packages we will document how metadata is exposed in the
`API <repository-metadata_>`_ and potentially in future note tooling that
supports utilizing namespaces to provide extra security guarantees during
installation.
Reference Implementation
========================
None at this time.
Rejected Ideas
==============
.. _artifact-level-association:
Artifact-level Namespace Association
------------------------------------
An earlier version of this PEP proposed that metadata be associated with
individual artifacts at the point of release. This was rejected because it
had the potential to cause confusion for users who would expect the namespace
authorization guarantee to be at the project level based on current grants
rather than the time at which a given release occurred.
.. _organization-scoping:
Organization Scoping
--------------------
The primary motivation for this PEP is to reduce dependency confusion attacks
and NPM-style scoping with an allowance of the legacy flat namespace would
increase the risk. If documentation instructed a user to install ``bar`` in the
namespace ``foo`` then the user must be careful to install ``@foo/bar`` and not
``foo-bar``, or vice versa. The Python packaging ecosystem has normalization
rules for names in order to maximize the ease of communication and this would
be a regression.
The runtime environment of Python is also not conducive to scoping. Whereas
multiple versions of the same JavaScript package may coexist, Python only
allows a single global namespace. Barring major changes to the language itself,
this is nearly impossible to change. Additionally, users have come to expect
that the package name is usually the same as what they would import and
eliminating the flat namespace would do away with that convention.
Scoping would be particularly affected by organization changes which are bound
to happen over time. An organization may change their name due to internal
shuffling, an acquisition, or any other reason. Whenever this happens every
project they own would in effect be renamed which would cause unnecessary
confusion for users, frequently.
Finally, the disruption to the community would be massive because it would
require an update from every package manager, security scanner, IDE, etc. New
packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.
.. _dedicated-repositories:
Encourage Dedicated Package Repositories
----------------------------------------
Critically, this imposes a burden on projects to maintain their own infra. This
is an unrealistic expectation for the vast majority of companies and a complete
non-starter for community projects.
This does not help in most cases because the default behavior of most package
managers is to use PyPI so users attempting to perform a simple ``pip install``
would already be vulnerable to malicious packages.
In this theoretical future every project must document how to add their
repository to dependency resolution, which would be different for each package
manager. Few package managers are able to download specific dependencies from
specific repositories and would require users to use verbose configuration in
the common case.
The ones that do not support this would instead find a given package using an
ordered enumeration of repositories, leading to dependency confusion.
For example, say a user wants two packages from two custom repositories ``X``
and ``Y``. If each repository has both packages but one is malicious on ``X``
and the other is malicious on ``Y`` then the user would be unable to satisfy
their requirements without encountering a malicious package.
.. _provenance-assertions:
Exclusive Reliance on Provenance Assertions
-------------------------------------------
The idea here [5]_ would be to design a general purpose way for clients to make
provenance assertions to verify certain properties of dependencies, each with
custom syntax. Some examples:
* The package was uploaded by a specific organization or user name e.g.
``pip install "azure-loganalytics from microsoft"``
* The package was uploaded by an owner of a specific domain name e.g.
``pip install "google-cloud-compute from cloud.google.com"``
* The package was uploaded by a user with a specific email address e.g.
``pip install "aws-cdk-lib from contact@amazon.com"``
* The package matching a namespace was uploaded by an authorized party (this
PEP)
A fundamental downside is that it doesn't play well with multiple
repositories. For example, say a user wants the ``azure-loganalytics`` package
and wants to ensure it comes from the organization named ``microsoft``. If
Microsoft's organization name on PyPI is ``microsoft`` then a package manager
that defaults to PyPI could accept ``azure-loganalytics from microsoft``.
However, if multiple repositories are used for dependency resolution then the
user would have to specify the repository as part of the definition which is
unrealistic for reasons outlined in the dedicated section on
`asserting package owner names <asserting-package-owner-names_>`_.
Another general weakness with this approach is that a user attempting to
perform a simple ``pip install`` without special syntax, which is the most
common scenario, would already be vulnerable to malicious packages. In order to
overcome this there would have to be some default trust mechanism, which in all
cases would impose certain UX or resolver logic upon every tool.
For example, package managers could be changed such that the first time a
package is installed the user would receive a confirmation prompt displaying
the provenance details. This would be very confusing and noisy, especially for
new users, and would be a breaking UX change for existing users. Many methods
of installation wouldn't work for this scenario such as running in CI or
installing from a requirements file where the user would potentially be getting
hundreds of prompts.
One solution to make this less disruptive for users would be to manually
maintain a list of trustworthy details (organization/user names, domain names,
email addresses, etc.). This could be discoverable by packages providing
`entry points`__ which package managers could learn to detect and which
corporate environments could install by default. This has the major downside of
not providing automatic guarantees which would limit the usefulness for the
average user who is more likely to be affected.
__ https://packaging.python.org/en/latest/specifications/entry-points/
There are two ideas that could be used to provide automatic protection, which
could be based on :pep:`740` attestations or a new mechanism for utilizing
third-party APIs that host the metadata.
First, each repository could offer a service that verifies the owner of a
package using whatever criteria they deem appropriate. After verification, the
repository would add the details to a dedicated package that would be installed
by default.
This would require dedicated maintenance which is unrealistic for most
repositories, even PyPI currently. It's unclear how community projects without
the resources for something like a domain name would be supported. Critically,
this solution would cause extra confusion for users in the case of multiple
repositories as each might have their own verification processes, attestation
criteria and default package containing the verified details. It would be
challenging to get community buy-in of every package manager to be aware of
each repositories' chosen verification package and install that by default
before dependency resolution.
Should digital attestations become the chosen mechanism, a downside is that
implementing this in custom package repositories would require a significant
amount of work. In the case of PyPI, the prerequisite work on
`Trusted Publishing`__ and then the `PEP 740 implementation`__ itself took the
equivalent of a full-time engineer one year whose time was paid for by a
corporate sponsor. Other organizations are unlikely to implement similar work
because simpler mechanisms make it possible to implement reproducible builds.
When everything is internally managed, attestations are also not very useful.
Community projects are unlikely to undertake this effort because they would
likely lack the resources to maintain the necessary infrastructure themselves
and moreover there are significant downsides to
`encouraging dedicated package repositories <dedicated-repositories_>`_.
__ https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/#acknowledgements
__ https://blog.trailofbits.com/2024/10/01/securing-the-software-supply-chain-with-the-slsa-framework/
The other idea would be to host provenance assertions externally and push more
logic client-side. A possible implementation might be to specify a provenance
API that could be hosted at a designated relative path like
``/provenance``. Projects on each repository could then be configured to point
to a particular domain and this information would be passed on to clients
during installation.
While this distributed approach does impose less of an infrastructure burden on
repositories, it has the potential to be a security risk. If an external
provenance API is compromised, it could lead to malicious packages being
installed. If an external API is down, it could lead to package installation
failing or package managers might only emit warnings in which case there is no
security benefit.
Additionally, this disadvantages community projects that do not have the
resources to maintain such an API. They could use free hosting solutions such
as what many do for documentation but they do not technically own the
infrastructure and they would be compromised should the generous offerings be
restricted.
Finally, while both of these theoretical approaches are not yet prescriptive,
they imply assertions at the artifact level which was already a
`rejected idea <artifact-level-association_>`_.
.. _asserting-package-owner-names:
Asserting Package Owner Names
-----------------------------
This is about asserting that the package came from a specific organization or
user name. It's quite similar to the
`organization scoping <organization-scoping_>`_ idea except that a flat
namespace is the base assumption.
This would require modifications to the :pep:`JSON API <691>` of each supported
repository and could be implemented by exposing extra metadata or as proper
`provenance assertions <provenance-assertions_>`_.
As with the organization scoping idea, a new `syntax`__ would be required like
``microsoft::azure-loganalytics`` where ``microsoft`` is the organization and
``azure-loganalytics`` is the package. Although this plays well with the
existing flat namespace in comparison, it retains the critical downside of
being a disruption for the community with the number of changes required.
__ https://packaging.python.org/en/latest/specifications/dependency-specifiers/
A unique downside is that names are an implementation detail of repositories.
On PyPI, the names of organizations are separate from user names so there is
potential for conflicts. In the case of multiple repositories, users might run
into cases of dependency confusion similar to the one at the end of the
`Encourage Dedicated Package Repositories <dedicated-repositories_>`_
rejected idea.
To ameliorate this, it was suggested that the syntax be expanded to also
include the expected repository URL like
``microsoft@pypi.org::azure-loganalytics``. This syntax or something like it
is so verbose that it could lead to user confusion, and even worse, frustration
should it gain increased adoption among those able to maintain dedicated
infrastructure (community projects would not benefit).
The expanded syntax is an attempt to standardize resolver behavior and
configuration within dependency specifiers. Not only would this be mandating
the UX of tools, it lacks precedent in package managers for language ecosystems
with or without the concept of package repositories. In such cases, the
resolver configuration is separate from the dependency definition.
======== ======== =============================================================
Language Tool Resolution behavior
======== ======== =============================================================
Rust Cargo Dependency resolution can be `modified`__ within
``Cargo.toml`` using the the ``[patch]`` table.
JS Yarn Although they have the concept of `protocols`__ (which are
similar to the URL schemes of our `direct references`__),
users configure the `resolutions`__ field in the
``package.json`` file.
JS npm Users can configure the `overrides`__ field in the
``package.json`` file.
Ruby Bundler The ``Gemfile`` allows for specifying an
`explicit source`__ for a gem.
C# NuGet It's possible to `override package versions`__ by configuring
the ``Directory.Packages.props`` file.
PHP Composer The ``composer.json`` file allows for specifying
`repository`__ sources for specific packages.
Go go The ``go.mod`` file allows for specifying a `replace`__
directive. Note that this is used for direct dependencies
as well as transitive dependencies.
======== ======== =============================================================
__ https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html
__ https://yarnpkg.com/protocols
__ https://packaging.python.org/en/latest/specifications/version-specifiers/#direct-references
__ https://yarnpkg.com/configuration/manifest#resolutions
__ https://docs.npmjs.com/cli/v10/configuring-npm/package-json#overrides
__ https://bundler.io/v2.5/man/gemfile.5.html#SOURCE-PRIORITY
__ https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#overriding-package-versions
__ https://getcomposer.org/doc/articles/repository-priorities.md#filtering-packages
__ https://go.dev/ref/mod#go-mod-file-replace
Use Fixed Prefixes
------------------
The idea here would be to have one or more top-level fixed prefixes that are
used for namespace reservations:
* ``com-``: Reserved for corporate organizations.
* ``org-``: Reserved for community organizations.
Organizations would then apply for a namespace prefixed by the type of their
organization.
This would cause perpetual disruption because when projects begin it is unknown
whether a user base will be large enough to warrant a namespace reservation.
Whenever that happens the project would have to be renamed which would put a
high maintenance burden on the project maintainers and would cause confusion
for users who have to learn a new way to reference the project's packages.
The potential for this deterring projects from reserving namespaces at all is
high.
Another issue with this approach is that projects often have branding in mind
(`example`__) and would be reluctant to change their package names.
__ https://github.com/apache/airflow/discussions/41657#discussioncomment-10417439
It's unrealistic to expect every company and project to voluntarily change
their existing and future package names.
Use DNS
-------
The `idea <https://discuss.python.org/t/63455>`__ here is to add a new
metadata field to projects in the API called ``domain-authority``. Repositories
would support a new endpoint for verifying the domain via HTTPS. Clients would
then support options to allow certain domains.
This does not solve the problem for the target audience who do not check where
their packages are coming from and is more about checking for the integrity of
uploads which is already supported in a more secure way by :pep:`740`.
Most projects do not have a domain and could not benefit from this, unfairly
favoring organizations that have the financial means to acquire one.
Open Issues
===========
None at this time.
Footnotes
=========
.. [1] Additional examples of projects with restricted namespaces:
- `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
maintain type stubs for various packages. The stub packages they maintain
mirror the package name they target and are prefixed by ``types-``. For
example, the package ``requests`` has a stub that users would depend on
called ``types-requests``. Unofficial stubs are not supposed to use the
``types-`` prefix and are expected to use a ``-stubs`` suffix instead.
- `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
popular for large technical projects such as
`Swift <https://www.swift.org>`__ and Python itself. They have
the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
many of which are maintained within a
`dedicated organization <https://github.com/sphinx-contrib>`__.
- `Apache Airflow <https://airflow.apache.org>`__ is a platform to
programmatically orchestrate tasks as directed acyclic graphs (DAGs).
They have the concept of `plugins`__, and also `providers`__ which are
prefixed by ``apache-airflow-providers-``.
.. [2] Additional examples of projects with open namespaces:
- `pytest <https://docs.pytest.org>`__ is Python's most popular testing
framework. They have the concept of `plugins`__ which may be developed by
anyone and by convention are prefixed by ``pytest-``.
- `MkDocs <https://www.mkdocs.org>`__ is a documentation framework based on
Markdown files. They also have the concept of
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
developed by anyone and are usually prefixed by ``mkdocs-``.
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service.
The `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships
out-of-the-box with
`official integrations <https://github.com/DataDog/integrations-core>`__
for many products, like various databases and web servers, which are
distributed as Python packages that are prefixed by ``datadog-``. There is
support for creating `third-party integrations`__ which customers may run.
.. [3] The following shows the package prefixes for the major cloud providers:
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
and others based on ``google-``
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
.. [4] Examples of typosquatting attacks targeting Python users:
- ``django-`` namespace was squatted, among other packages, leading to
a `postmortem <https://mail.python.org/pipermail/security-announce/2017-September/000000.html>`__
by PyPI.
- ``cupy-`` namespace was
`squatted <https://github.com/cupy/cupy/issues/4787>`__ by a malicious
actor thousands of times.
- ``scikit-`` namespace was
`squatted <https://blog.phylum.io/a-pypi-typosquatting-campaign-post-mortem/>`__,
among other packages. Notice how packages with a known prefix are much
more prone to successful attacks.
- ``typing-`` namespace was
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
.. [5] `Detailed write-up <https://discuss.python.org/t/64679>`__ of the
potential for provenance assertions.
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.