PEP 752: Address feedback, round 4 (#3955)

Ofek Lev 2024-09-16 16:28:38 -04:00 committed by GitHub
parent 4878da5bd3
commit cff867a298
1 changed files with 186 additions and 64 deletions


@ -24,29 +24,42 @@ Motivation
==========
The current ecosystem lacks a way for projects with many packages to signal a
verified pattern of ownership. Some examples:
verified pattern of ownership. Such projects fall into two categories.
The first category is projects [1]_ that want complete control over their
namespace. A few examples:
* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
maintain type stubs for various packages. The stub packages they maintain
mirror the package name they target and are prefixed by ``types-``. For
example, the package ``requests`` has a stub that users would depend on
called ``types-requests``.
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
for each feature's corresponding package [1]_. For example, most of Google's
for each feature's corresponding package [3]_. For example, most of Google's
packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
`using virtual machines <https://cloud.google.com/products/compute>`__.
* Many projects [2]_ support a model where some packages are officially
maintained and third-party developers are encouraged to participate by
creating their own. For example, `Datadog <https://www.datadoghq.com>`__
offers observability as a service for organizations at any scale. The
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
with
`official integrations <https://github.com/DataDog/integrations-core>`__
for many products, like various databases and web servers, which are
distributed as Python packages that are prefixed by ``datadog-``. There is
support for creating `third-party integrations`__ which customers may run.
* `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
observability with `official packages`__ for the core APIs and SDK with
`contrib packages`__ to collect data from various sources. All packages
are prefixed by ``opentelemetry-`` with child prefixes in the form
``opentelemetry-<component>-<name>-``. The contrib packages live in a
central repository and they are the only ones with the ability to publish.
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib
The second category is projects [2]_ that want to share their namespace such
that some packages are officially maintained and third-party developers are
encouraged to participate by publishing their own. Some examples:
* `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
tooling for sharing interactive documents. They support `extensions`__
which in most cases (and in all cases for officially maintained
extensions) are prefixed by ``jupyter-``.
* `Django <https://www.djangoproject.com>`__ is one of the most widely used web
frameworks in existence. They have the concept of `reusable apps`__, which
are commonly installed via
`third-party packages <https://djangopackages.org>`__ that implement a subset
of functionality to extend Django-based websites. These packages are by
convention prefixed by ``django-`` or ``dj-``.
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
Such projects are uniquely vulnerable to name-squatting attacks
which can ultimately result in `dependency confusion`__.
@ -54,14 +67,15 @@ which can ultimately result in `dependency confusion`__.
__ https://www.activestate.com/resources/quick-reads/dependency-confusion/
For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that Datadog would eventually
support it as an official integration. It takes a nontrivial amount of time to
deliver such an integration due to roadmap prioritization and the time required
for implementation. It would be impossible to reserve the name of every
potential package so in the interim an attacker may create a package that
appears legitimate which would execute malicious code at runtime. Not only are
users more likely to install such packages but doing so taints the perception
of the entire project.
valuable. It would be reasonable to assume that
`Datadog <https://www.datadoghq.com>`__ would eventually support it as an
official integration. It takes a nontrivial amount of time to deliver such an
integration due to roadmap prioritization and the time required for
implementation. It would be impossible to reserve the name of every potential
package so in the interim an attacker may create a package that appears
legitimate which would execute malicious code at runtime. Not only are users
more likely to install such packages but doing so taints the perception of the
entire project.
Although :pep:`708` attempts to address this attack vector, it is specifically
about the case of multiple repositories being considered during dependency
@ -71,7 +85,13 @@ Namespacing also would drastically reduce the incidence of
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
because typos would have to be in the prefix itself which is
`normalized <naming_>`_ and likely to be a short, well-known identifier like
``aws-``.
``aws-``. In recent years, typosquatting has become a popular attack vector
[4]_.
The `current protection`__ against typosquatting used by PyPI is to normalize
similar characters but that is insufficient for these use cases.
__ https://github.com/pypi/warehouse/blob/8615326918a180eb2652753743eac8e74f96a90b/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42
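The effect of a normalized prefix on typosquatting can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: it assumes name normalization follows the :pep:`503` rule, and ``in_namespace`` is a hypothetical helper that treats a namespace as a hyphen-terminated prefix like ``aws-``.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' collapse to one hyphen,
    and the result is lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def in_namespace(package: str, prefix: str) -> bool:
    """Hypothetical check: does the normalized package name start with the
    normalized, hyphen-terminated namespace prefix?"""
    return normalize(package).startswith(normalize(prefix))


if __name__ == "__main__":
    # Spelling variants of the same name fall inside the namespace, while
    # a typo in the prefix itself produces a name outside of it.
    for candidate in ("aws-cdk-lib", "AWS_CDK.Lib", "awss-cdk-lib", "aw5-cdk-lib"):
        print(candidate, in_namespace(candidate, "aws-"))
```

Under these assumptions, a typo after the prefix still lands inside the reserved namespace, where uploads are restricted, so an attacker is left with typos in the short prefix itself.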
Rationale
=========
@ -113,6 +133,11 @@ namespace. Any solution that requires new package syntax must be built atop the
existing flat namespace and therefore implicit namespaces acquired via a
reservation mechanism would be a prerequisite to such explicit namespaces.
Although existing packages matching a reserved namespace would be untouched,
preventing future unauthorized uploads and strategically applying :pep:`541`
takedown requests for malicious cases would reduce risks to users to a
negligible level.
Terminology
===========
@ -219,6 +244,8 @@ other organizations to use the grant. In this case, the authorized
organizations have no special permissions and are equivalent to an open grant
without ownership.
.. _hidden-grants:
Hidden Grants
-------------
@ -235,7 +262,7 @@ restrictions without the need to expose the namespace to the public.
Repository Metadata
-------------------
The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``.
The :pep:`JSON API <691>` version will be incremented from ``1.2`` to ``1.3``.
The following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement these
changes so that consumers of the API are able to determine whether the
@ -295,6 +322,19 @@ When a reserved namespace becomes unclaimed, repositories MUST set the
Namespaces that were previously claimed but are no longer claimed SHOULD be
eligible for claiming again by any organization.
Community Buy-in
================
Representatives from the following organizations have expressed support for
this PEP (with a link to the discussion):
* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
* `Typeshed <https://discuss.python.org/t/1609/37>`__
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
(`expanded <https://discuss.python.org/t/61227/48>`__)
* `Microsoft <https://discuss.python.org/t/63191/40>`__
* `Datadog <https://discuss.python.org/t/63191/53>`__
Backwards Compatibility
=======================
@ -358,6 +398,73 @@ packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.
Encourage Dedicated Package Repositories
----------------------------------------
Critically, this imposes a burden on projects to maintain their own
infrastructure. This is an unrealistic expectation for the vast majority of
companies and a complete non-starter for community projects.
This does not help in most cases because the default behavior of most package
managers is to use PyPI so users attempting to perform a simple ``pip install``
would already be vulnerable to malicious packages.
In this theoretical future every project must document how to add their
repository to dependency resolution, which would be different for each package
manager. Few package managers are able to download specific dependencies from
specific repositories, and those that can would require users to write verbose
configuration in the common case.
The ones that do not support this would instead find a given package using an
ordered enumeration of repositories, leading to dependency confusion.
For example, say a user wants two packages from two custom repositories ``X``
and ``Y``. If each repository has both packages but one is malicious on ``X``
and the other is malicious on ``Y`` then the user would be unable to satisfy
their requirements without encountering a malicious package.
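The scenario above can be simulated with a short sketch. The repository contents, the package names ``pkg-a`` and ``pkg-b``, and the first-match ``resolve`` function are all hypothetical; they model a package manager that searches repositories in a fixed order, not any specific tool.

```python
from itertools import permutations

# Hypothetical contents: each repository hosts both packages, and each
# repository carries a malicious copy of a different one.
REPOSITORIES = {
    "X": {"pkg-a": "malicious", "pkg-b": "legitimate"},
    "Y": {"pkg-a": "legitimate", "pkg-b": "malicious"},
}


def resolve(package, order):
    """Return (repository, status) for the first repository in `order`
    that hosts `package`, mimicking an ordered enumeration."""
    for repo in order:
        if package in REPOSITORIES[repo]:
            return repo, REPOSITORIES[repo][package]
    raise LookupError(package)


def malicious_hits(order):
    """List the requested packages that resolve to a malicious copy."""
    return [p for p in ("pkg-a", "pkg-b") if resolve(p, order)[1] == "malicious"]


if __name__ == "__main__":
    # Try every possible repository ordering.
    for order in permutations(REPOSITORIES):
        print(order, malicious_hits(order))
```

In this model every ordering resolves at least one package to its malicious copy, because the first repository listed always wins for both packages.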
Use Fixed Prefixes
------------------
The idea here would be to have one or more top-level fixed prefixes that are
used for namespace reservations:
* ``com-``: Reserved for corporate organizations.
* ``org-``: Reserved for community organizations.
Organizations would then apply for a namespace prefixed by the type of their
organization.
This would cause perpetual disruption because when projects begin it is unknown
whether a user base will be large enough to warrant a namespace reservation.
Whenever that happens the project would have to be renamed which would put a
high maintenance burden on the project maintainers and would cause confusion
for users who have to learn a new way to reference the project's packages.
The potential for this deterring projects from reserving namespaces at all is
high.
Another issue with this approach is that projects often have branding in mind
(`example`__) and would be reluctant to change their package names.
__ https://github.com/apache/airflow/discussions/41657#discussioncomment-10417439
It's unrealistic to expect every company and project to voluntarily change
their existing and future package names.
Use DNS
-------
The `idea <https://discuss.python.org/t/63455>`__ here is to add a new
metadata field to projects in the API called ``domain-authority``. Repositories
would support a new endpoint for verifying the domain via HTTPS. Clients would
then support options to allow certain domains.
This does not solve the problem for the target audience, who do not check where
their packages come from. It is more about checking the integrity of uploads,
which is already supported in a more secure way by :pep:`740`.
Most projects do not have a domain and could not benefit from this, unfairly
favoring organizations that have the financial means to acquire one.
Open Issues
===========
@ -366,25 +473,27 @@ None at this time.
Footnotes
=========
.. [1] The following shows the package prefixes for the major cloud providers:
.. [1] Additional examples of projects with restricted namespaces:
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
and others based on ``google-``
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
- `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
maintain type stubs for various packages. The stub packages they maintain
mirror the package name they target and are prefixed by ``types-``. For
example, the package ``requests`` has a stub that users would depend on
called ``types-requests``. Unofficial stubs are not supposed to use the
``types-`` prefix and are expected to use a ``-stubs`` suffix instead.
- `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
popular for large technical projects such as
`Swift <https://www.swift.org>`__ and Python itself. They have
the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
many of which are maintained within a
`dedicated organization <https://github.com/sphinx-contrib>`__.
- `Apache Airflow <https://airflow.apache.org>`__ is a platform to
programmatically orchestrate tasks as directed acyclic graphs (DAGs).
They have the concept of `plugins`__, and also `providers`__ which are
prefixed by ``apache-airflow-providers-``.
.. [2] Some examples of projects that have many packages with a common prefix:
.. [2] Additional examples of projects with open namespaces:
- `Django <https://www.djangoproject.com>`__ is one of the most widely used
web frameworks in existence. They have the concept of `reusable apps`__,
which are commonly installed via
`third-party packages <https://djangopackages.org>`__ that implement a
subset of functionality to extend Django-based websites. These packages
are by convention prefixed by ``django-`` or ``dj-``.
- `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
tooling for sharing interactive documents. They support `extensions`__
which in most cases (and in all cases for officially maintained
extensions) are prefixed by ``jupyter-``.
- `pytest <https://docs.pytest.org>`__ is Python's most popular testing
framework. They have the concept of `plugins`__ which may be developed by
anyone and by convention are prefixed by ``pytest-``.
@ -392,30 +501,43 @@ Footnotes
Markdown files. They also have the concept of
`plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
developed by anyone and are usually prefixed by ``mkdocs-``.
- `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
popular for large technical projects such as
`Swift <https://www.swift.org>`__ and Python itself. They have
the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
many of which are maintained within a
`dedicated organization <https://github.com/sphinx-contrib>`__.
- `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
observability with `official packages`__ for the core APIs and SDK with
`third-party packages`__ to collect data from various sources. All
packages are prefixed by ``opentelemetry-`` with child prefixes in the
form ``opentelemetry-<component>-<name>-``.
- `Apache Airflow <https://airflow.apache.org>`__ is a platform to
programmatically orchestrate tasks as directed acyclic graphs (DAGs).
They have the concept of `plugins`__, and also `providers`__ which are
prefixed by ``apache-airflow-providers-``.
- `Datadog <https://www.datadoghq.com>`__ offers observability as a service
for organizations at any scale. The
`Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
with
`official integrations <https://github.com/DataDog/integrations-core>`__
for many products, like various databases and web servers, which are
distributed as Python packages that are prefixed by ``datadog-``. There is
support for creating `third-party integrations`__ which customers may run.
.. [3] The following shows the package prefixes for the major cloud providers:
- Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
- Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
and others based on ``google-``
- Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
.. [4] Examples of typosquatting attacks targeting Python users:
- ``django-`` namespace was squatted, among other packages, leading to
a `postmortem <https://mail.python.org/pipermail/security-announce/2017-September/000000.html>`__
by PyPI.
- ``cupy-`` namespace was
`squatted <https://github.com/cupy/cupy/issues/4787>`__ by a malicious
actor thousands of times.
- ``scikit-`` namespace was
`squatted <https://blog.phylum.io/a-pypi-typosquatting-campaign-post-mortem/>`__,
among other packages. Notice how packages with a known prefix are much
more prone to successful attacks.
- ``typing-`` namespace was
`squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://docs.datadoghq.com/developers/integrations/agent_integration/
Copyright
=========