PEP 752: Address feedback, round 4 (#3955)
parent 4878da5bd3
commit cff867a298
@ -24,29 +24,42 @@ Motivation
==========

The current ecosystem lacks a way for projects with many packages to signal a
verified pattern of ownership. Some examples:
verified pattern of ownership. Such projects fall into two categories.

The first category is projects [1]_ that want complete control over their
namespace. A few examples:

* `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
  maintain type stubs for various packages. The stub packages they maintain
  mirror the package name they target and are prefixed by ``types-``. For
  example, the package ``requests`` has a stub that users would depend on
  called ``types-requests``.
* Major cloud providers like Amazon, Google and Microsoft have a common prefix
  for each feature's corresponding package [1]_. For example, most of Google's
  for each feature's corresponding package [3]_. For example, most of Google's
  packages are prefixed by ``google-cloud-`` e.g. ``google-cloud-compute`` for
  `using virtual machines <https://cloud.google.com/products/compute>`__.
* Many projects [2]_ support a model where some packages are officially
  maintained and third-party developers are encouraged to participate by
  creating their own. For example, `Datadog <https://www.datadoghq.com>`__
  offers observability as a service for organizations at any scale. The
  `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
  with
  `official integrations <https://github.com/DataDog/integrations-core>`__
  for many products, like various databases and web servers, which are
  distributed as Python packages that are prefixed by ``datadog-``. There is
  support for creating `third-party integrations`__ which customers may run.
* `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
  observability with `official packages`__ for the core APIs and SDK with
  `contrib packages`__ to collect data from various sources. All packages
  are prefixed by ``opentelemetry-`` with child prefixes in the form
  ``opentelemetry-<component>-<name>-``. The contrib packages live in a
  central repository and they are the only ones with the ability to publish.

__ https://docs.datadoghq.com/developers/integrations/agent_integration/
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib

The second category is projects [2]_ that want to share their namespace such
that some packages are officially maintained and third-party developers are
encouraged to participate by publishing their own. Some examples:

* `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
  tooling for sharing interactive documents. They support `extensions`__
  which in most cases (and in all cases for officially maintained
  extensions) are prefixed by ``jupyter-``.
* `Django <https://www.djangoproject.com>`__ is one of the most widely used web
  frameworks in existence. They have the concept of `reusable apps`__, which
  are commonly installed via
  `third-party packages <https://djangopackages.org>`__ that implement a subset
  of functionality to extend Django-based websites. These packages are by
  convention prefixed by ``django-`` or ``dj-``.

__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/

Such projects are uniquely vulnerable to name-squatting attacks
which can ultimately result in `dependency confusion`__.
@ -54,14 +67,15 @@ which can ultimately result in `dependency confusion`__.
__ https://www.activestate.com/resources/quick-reads/dependency-confusion/

For example, say a new product is released for which monitoring would be
valuable. It would be reasonable to assume that Datadog would eventually
support it as an official integration. It takes a nontrivial amount of time to
deliver such an integration due to roadmap prioritization and the time required
for implementation. It would be impossible to reserve the name of every
potential package so in the interim an attacker may create a package that
appears legitimate which would execute malicious code at runtime. Not only are
users more likely to install such packages but doing so taints the perception
of the entire project.
valuable. It would be reasonable to assume that
`Datadog <https://www.datadoghq.com>`__ would eventually support it as an
official integration. It takes a nontrivial amount of time to deliver such an
integration due to roadmap prioritization and the time required for
implementation. It would be impossible to reserve the name of every potential
package so in the interim an attacker may create a package that appears
legitimate which would execute malicious code at runtime. Not only are users
more likely to install such packages but doing so taints the perception of the
entire project.

Although :pep:`708` attempts to address this attack vector, it is specifically
about the case of multiple repositories being considered during dependency
@ -71,7 +85,13 @@ Namespacing also would drastically reduce the incidence of
`typosquatting <https://en.wikipedia.org/wiki/Typosquatting>`__
because typos would have to be in the prefix itself which is
`normalized <naming_>`_ and likely to be a short, well-known identifier like
``aws-``.
``aws-``. In recent years, typosquatting has become a popular attack vector
[4]_.
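
As a rough illustration of why a typo would have to land in the prefix itself, the sketch below checks whether a project name falls under a reserved prefix after name normalization. It assumes the standard project name normalization (runs of ``-``, ``_`` and ``.`` collapsed into a single ``-``, then lowercased). The helper names ``normalize`` and ``in_namespace`` are hypothetical and are not part of this PEP or of any repository API.

```python
import re


def normalize(name: str) -> str:
    """Collapse runs of '-', '_' and '.' into '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def in_namespace(project: str, prefix: str) -> bool:
    """Hypothetical check: does the normalized project name start with the prefix?"""
    return normalize(project).startswith(normalize(prefix))


# Illustrative names only; none of these are claims about real packages.
for candidate in ("AWS_CDK.Core", "aws-cdk-cor", "awss-cdk-core"):
    print(candidate, in_namespace(candidate, "aws-"))
```

Under these assumptions a misspelling after the prefix (``aws-cdk-cor``) still falls inside the reserved namespace and would be covered by the reservation, while only a misspelling of the short prefix itself (``awss-cdk-core``) falls outside it.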

The `current protection`__ against typosquatting used by PyPI is to normalize
similar characters but that is insufficient for these use cases.

__ https://github.com/pypi/warehouse/blob/8615326918a180eb2652753743eac8e74f96a90b/warehouse/migrations/versions/d18d443f89f0_ultranormalize_name_function.py#L29-L42

Rationale
=========
@ -113,6 +133,11 @@ namespace. Any solution that requires new package syntax must be built atop the
existing flat namespace and therefore implicit namespaces acquired via a
reservation mechanism would be a prerequisite to such explicit namespaces.

Although existing packages matching a reserved namespace would be untouched,
preventing future unauthorized uploads and strategically applying :pep:`541`
takedown requests for malicious cases would reduce risks to users to a
negligible level.

Terminology
===========

@ -219,6 +244,8 @@ other organizations to use the grant. In this case, the authorized
organizations have no special permissions and are equivalent to an open grant
without ownership.

.. _hidden-grants:

Hidden Grants
-------------

@ -235,7 +262,7 @@ restrictions without the need to expose the namespace to the public.
Repository Metadata
-------------------

The :pep:`JSON API <691>` version will be incremented from ``1.0`` to ``1.1``.
The :pep:`JSON API <691>` version will be incremented from ``1.2`` to ``1.3``.
The following API changes MUST be implemented by repositories that support
this PEP. Repositories that do not support this PEP MUST NOT implement these
changes so that consumers of the API are able to determine whether the
@ -295,6 +322,19 @@ When a reserved namespace becomes unclaimed, repositories MUST set the
Namespaces that were previously claimed but are now not SHOULD be eligible for
claiming again by any organization.

Community Buy-in
================

Representatives from the following organizations have expressed support for
this PEP (with a link to the discussion):

* `Apache Airflow <https://github.com/apache/airflow/discussions/41657#discussioncomment-10412999>`__
* `Typeshed <https://discuss.python.org/t/1609/37>`__
* `Project Jupyter <https://discuss.python.org/t/61227/16>`__
  (`expanded <https://discuss.python.org/t/61227/48>`__)
* `Microsoft <https://discuss.python.org/t/63191/40>`__
* `DataDog <https://discuss.python.org/t/63191/53>`__

Backwards Compatibility
=======================

@ -358,6 +398,73 @@ packages released with the scoping would be incompatible with older tools and
would cause confusion for users along with frustration from maintainers having
to triage such complaints.

Encourage Dedicated Package Repositories
----------------------------------------

Critically, this imposes a burden on projects to maintain their own infra. This
is an unrealistic expectation for the vast majority of companies and a complete
non-starter for community projects.

This does not help in most cases because the default behavior of most package
managers is to use PyPI so users attempting to perform a simple ``pip install``
would already be vulnerable to malicious packages.

In this theoretical future every project must document how to add their
repository to dependency resolution, which would be different for each package
manager. Few package managers are able to download specific dependencies from
specific repositories and would require users to use verbose configuration in
the common case.

The ones that do not support this would instead find a given package using an
ordered enumeration of repositories, leading to dependency confusion.
For example, say a user wants two packages from two custom repositories ``X``
and ``Y``. If each repository has both packages but one is malicious on ``X``
and the other is malicious on ``Y`` then the user would be unable to satisfy
their requirements without encountering a malicious package.
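
The ``X`` and ``Y`` scenario above can be sketched as a small simulation. This is a minimal model, assuming a resolver that takes each package from the first repository in a fixed search order that hosts it; the package names ``pkg-a`` and ``pkg-b``, the ``REPOSITORIES`` table and both helper functions are hypothetical and do not describe any particular package manager.

```python
from itertools import permutations

# Hypothetical repository contents: True marks the copy hosted there as malicious.
REPOSITORIES = {
    "X": {"pkg-a": True, "pkg-b": False},
    "Y": {"pkg-a": False, "pkg-b": True},
}


def resolve(package, search_order):
    """Return the first repository in search_order that hosts the package."""
    for repo in search_order:
        if package in REPOSITORIES[repo]:
            return repo
    raise LookupError(package)


def malicious_picks(search_order):
    """List the (package, repository) pairs that resolve to a malicious copy."""
    picks = []
    for package in ("pkg-a", "pkg-b"):
        repo = resolve(package, search_order)
        if REPOSITORIES[repo][package]:
            picks.append((package, repo))
    return picks


# Try every possible ordering of the two repositories.
for order in permutations(REPOSITORIES):
    print(order, malicious_picks(order))
```

In this model every ordering yields at least one malicious pick, which is the situation the paragraph describes: with ordered enumeration alone there is no configuration that installs both legitimate packages.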

Use Fixed Prefixes
------------------

The idea here would be to have one or more top-level fixed prefixes that are
used for namespace reservations:

* ``com-``: Reserved for corporate organizations.
* ``org-``: Reserved for community organizations.

Organizations would then apply for a namespace prefixed by the type of their
organization.

This would cause perpetual disruption because when projects begin it is unknown
whether a user base will be large enough to warrant a namespace reservation.
Whenever that happens the project would have to be renamed which would put a
high maintenance burden on the project maintainers and would cause confusion
for users who have to learn a new way to reference the project's packages.
The potential for this deterring projects from reserving namespaces at all is
high.

Another issue with this approach is that projects often have branding in mind
(`example`__) and would be reluctant to change their package names.

__ https://github.com/apache/airflow/discussions/41657#discussioncomment-10417439

It's unrealistic to expect every company and project to voluntarily change
their existing and future package names.

Use DNS
-------

The `idea <https://discuss.python.org/t/63455>`__ here is to add a new
metadata field to projects in the API called ``domain-authority``. Repositories
would support a new endpoint for verifying the domain via HTTPS. Clients would
then support options to allow certain domains.

This does not solve the problem for the target audience who do not check where
their packages are coming from and is more about checking for the integrity of
uploads which is already supported in a more secure way by :pep:`740`.

Most projects do not have a domain and could not benefit from this, unfairly
favoring organizations that have the financial means to acquire one.

Open Issues
===========

@ -366,25 +473,27 @@ None at this time.
Footnotes
=========

.. [1] The following shows the package prefixes for the major cloud providers:
.. [1] Additional examples of projects with restricted namespaces:

   - Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
   - Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
     and others based on ``google-``
   - Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__
   - `Typeshed <https://github.com/python/typeshed>`__ is a community effort to
     maintain type stubs for various packages. The stub packages they maintain
     mirror the package name they target and are prefixed by ``types-``. For
     example, the package ``requests`` has a stub that users would depend on
     called ``types-requests``. Unofficial stubs are not supposed to use the
     ``types-`` prefix and are expected to use a ``-stubs`` suffix instead.
   - `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
     popular for large technical projects such as
     `Swift <https://www.swift.org>`__ and Python itself. They have
     the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
     many of which are maintained within a
     `dedicated organization <https://github.com/sphinx-contrib>`__.
   - `Apache Airflow <https://airflow.apache.org>`__ is a platform to
     programmatically orchestrate tasks as directed acyclic graphs (DAGs).
     They have the concept of `plugins`__, and also `providers`__ which are
     prefixed by ``apache-airflow-providers-``.

.. [2] Some examples of projects that have many packages with a common prefix:
.. [2] Additional examples of projects with open namespaces:

   - `Django <https://www.djangoproject.com>`__ is one of the most widely used
     web frameworks in existence. They have the concept of `reusable apps`__,
     which are commonly installed via
     `third-party packages <https://djangopackages.org>`__ that implement a
     subset of functionality to extend Django-based websites. These packages
     are by convention prefixed by ``django-`` or ``dj-``.
   - `Project Jupyter <https://jupyter.org>`__ is devoted to the development of
     tooling for sharing interactive documents. They support `extensions`__
     which in most cases (and in all cases for officially maintained
     extensions) are prefixed by ``jupyter-``.
   - `pytest <https://docs.pytest.org>`__ is Python's most popular testing
     framework. They have the concept of `plugins`__ which may be developed by
     anyone and by convention are prefixed by ``pytest-``.
@ -392,30 +501,43 @@ Footnotes
     Markdown files. They also have the concept of
     `plugins <https://www.mkdocs.org/dev-guide/plugins/>`__ which may be
     developed by anyone and are usually prefixed by ``mkdocs-``.
   - `Sphinx <https://www.sphinx-doc.org>`__ is a documentation framework
     popular for large technical projects such as
     `Swift <https://www.swift.org>`__ and Python itself. They have
     the concept of `extensions`__ which are prefixed by ``sphinxcontrib-``,
     many of which are maintained within a
     `dedicated organization <https://github.com/sphinx-contrib>`__.
   - `OpenTelemetry <https://opentelemetry.io>`__ is an open standard for
     observability with `official packages`__ for the core APIs and SDK with
     `third-party packages`__ to collect data from various sources. All
     packages are prefixed by ``opentelemetry-`` with child prefixes in the
     form ``opentelemetry-<component>-<name>-``.
   - `Apache Airflow <https://airflow.apache.org>`__ is a platform to
     programmatically orchestrate tasks as directed acyclic graphs (DAGs).
     They have the concept of `plugins`__, and also `providers`__ which are
     prefixed by ``apache-airflow-providers-``.
   - `Datadog <https://www.datadoghq.com>`__ offers observability as a service
     for organizations at any scale. The
     `Datadog Agent <https://docs.datadoghq.com/agent/>`__ ships out-of-the-box
     with
     `official integrations <https://github.com/DataDog/integrations-core>`__
     for many products, like various databases and web servers, which are
     distributed as Python packages that are prefixed by ``datadog-``. There is
     support for creating `third-party integrations`__ which customers may run.

.. [3] The following shows the package prefixes for the major cloud providers:

   - Amazon: `aws-cdk- <https://docs.aws.amazon.com/cdk/api/v2/python/>`__
   - Google: `google-cloud- <https://github.com/googleapis/google-cloud-python/tree/main/packages>`__
     and others based on ``google-``
   - Microsoft: `azure- <https://github.com/Azure/azure-sdk-for-python/tree/main/sdk>`__

.. [4] Examples of typosquatting attacks targeting Python users:

   - ``django-`` namespace was squatted, among other packages, leading to
     a `postmortem <https://mail.python.org/pipermail/security-announce/2017-September/000000.html>`__
     by PyPI.
   - ``cupy-`` namespace was
     `squatted <https://github.com/cupy/cupy/issues/4787>`__ by a malicious
     actor thousands of times.
   - ``scikit-`` namespace was
     `squatted <https://blog.phylum.io/a-pypi-typosquatting-campaign-post-mortem/>`__,
     among other packages. Notice how packages with a known prefix are much
     more prone to successful attacks.
   - ``typing-`` namespace was
     `squatted <https://zero.checkmarx.com/malicious-pypi-user-strikes-again-with-typosquatting-starjacking-and-unpacks-tailor-made-malware-b12669cefaa5>`__
     and this would be useful to prevent as a `hidden grant <hidden-grants_>`__.

__ https://docs.djangoproject.com/en/5.1/intro/reusable-apps/
__ https://jupyterlab.readthedocs.io/en/stable/user/extensions.html
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://www.sphinx-doc.org/en/master/usage/extensions/index.html
__ https://github.com/open-telemetry/opentelemetry-python
__ https://github.com/open-telemetry/opentelemetry-python-contrib
__ https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/plugins.html
__ https://airflow.apache.org/docs/apache-airflow-providers/index.html
__ https://docs.pytest.org/en/stable/how-to/writing_plugins.html
__ https://docs.datadoghq.com/developers/integrations/agent_integration/

Copyright
=========