python-peps/peps/pep-0582.rst

352 lines
16 KiB
ReStructuredText

PEP: 582
Title: Python local packages directory
Version: $Revision$
Last-Modified: $Date$
Author: Kushal Das <mail@kushaldas.in>, Steve Dower <steve.dower@python.org>,
Donald Stufft <donald@stufft.io>, Alyssa Coghlan <ncoghlan@gmail.com>
Discussions-To: https://discuss.python.org/t/pep-582-python-local-packages-directory/963/
Status: Rejected
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 16-May-2018
Python-Version: 3.12
Post-History: `01-Mar-2019 <https://discuss.python.org/t/pep-582-python-local-packages-directory/963>`__,
Resolution: https://discuss.python.org/t/pep-582-python-local-packages-directory/963/430
Abstract
========
This PEP proposes extending the existing mechanism for setting up ``sys.path``
to include a new ``__pypackages__`` directory, in addition to the existing
locations. The new directory will be added at the start of ``sys.path``, after
the current working directory and just before the system site-packages, to give
packages installed there priority over other locations.
This is similar to the existing mechanism of adding the current directory (or
the directory the script is located in), but by using a subdirectory,
additional libraries are kept separate from the user's work.
Motivation
==========
New Python programmers can benefit from being taught the value of isolating an
individual project's dependencies from their system environment. However, the
existing mechanism for doing this, virtual environments, is known to be complex
and error-prone for beginners to understand. Explaining virtual environments is
often a distraction when trying to get a group of beginners set up - differences
in platform and shell environments require individual assistance, and the need
for activation in every new shell session makes it easy for students to make
mistakes when coming back to work after a break. This proposal offers a lightweight
solution that gives isolation without the user needing to understand more
advanced concepts.
Furthermore, standalone Python applications usually need 3rd party libraries to
function. Typically, they are either designed to be run from a virtual environment,
where the dependencies are installed into the environment alongside the application,
or they bundle their dependencies in a subdirectory, and modify ``sys.path`` at
application startup. Virtual environments, while a common and effective solution
(used, for example, by the ``pipx`` tool), are somewhat awkward to set up and manage,
and are not relocatable. On the other hand, manual manipulation of ``sys.path`` is
boilerplate that developers need to get right, and (being a runtime behaviour)
it is not understood by tools like linters and type checkers. The ``__pypackages__``
proposal formalises the idea of a "bundled dependencies" location, avoiding the
boilerplate and providing a standard location that development tools can be taught
to recognise.
It should be noted that in general, Python libraries cannot be simply copied
between machines, platforms, or even necessarily between Python versions. This
proposal does nothing to change that fact, and while it is tempting to assume
that bundling a script and its ``__pypackages__`` is a mechanism for
distributing applications, this is explicitly *not* a goal of this proposal.
Developers remain responsible for the portability of their code.
Rationale
=========
While ``sys.path`` can be manipulated at runtime, the default value is important, as
it establishes a common baseline that users and tools can agree on. The current default
does not include a location that could be viewed as "private to the current project",
and yet that is a useful concept.
This is similar to the npm ``node_modules`` directory, which is popular in the
Javascript community, and something that developers familiar with that
ecosystem often ask for from Python.
Specification
=============
This PEP proposes to add a new step in the process of calculating ``sys.path`` at
startup.
When the interactive interpreter starts, if a ``__pypackages__`` directory is
found in the current working directory, then it will be included in
``sys.path`` after the entry for current working directory and just before the
system site-packages.
When the interpreter runs a script, Python will try to find ``__pypackages__``
in the same directory as the script. If found (along with the current Python
version directory inside), then it will be used, otherwise Python will behave
as it does currently.
The behaviour should work exactly the same as the way the existing mechanism
for adding the current working directory or script directory to ``sys.path``
works. For example, ``__pypackages__`` will be ignored if the ``-P`` option or
the ``PYTHONSAFEPATH`` environment variable is set.
In order to be recognised, the ``__pypackages__`` directory must be laid out
according to a new ``localpackages`` scheme in the sysconfig module.
Specifically, both of the ``purelib`` and ``platlib`` directories must be
present, using the following code to determine the locations of those
directories::
scheme = "localpackages"
purelib = sysconfig.get_path("purelib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
platlib = sysconfig.get_path("platlib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
These two locations will be added to ``sys.path``, other directories or
files in the ``__pypackages__`` directory will be silently ignored. The
paths will be based on Python versions.
.. note:: There is a possible option of having a separate new API, it is documented at `issue #3013 <https://github.com/python/peps/issues/3013>`_.
Example
-------
The following shows an example project directory structure, and different ways
the Python executable and any script will behave. The example is for Unix-like
systems - on Windows the subdirectories will be different.
::
foo
__pypackages__
lib
python3.10
site-packages
bottle
myscript.py
/> python foo/myscript.py
sys.path[0] == 'foo'
sys.path[1] == 'foo/__pypackages__/lib/python3.10/site-packages/'
cd foo
foo> /usr/bin/ansible
#! /usr/bin/env python3
foo> python /usr/bin/ansible
foo> python myscript.py
foo> python
sys.path[0] == '.'
sys.path[1] == './__pypackages__/lib/python3.10/site-packages'
foo> python -m bottle
We have a project directory called ``foo`` and it has a ``__pypackages__``
inside of it. We have ``bottle`` installed in that
``__pypackages__/lib/python3.10/site-packages/``, and have a ``myscript.py``
file inside of the project directory. We have used whatever tool we generally
use to install ``bottle`` in that location.
For invoking a script, Python will try to find a ``__pypackages__`` inside of
the directory that the script resides [1]_, ``/usr/bin``. The same will happen
in case of the last example, where we are executing ``/usr/bin/ansible`` from
inside of the ``foo`` directory. In both cases, it will **not** use the
``__pypackages__`` in the current working directory.
Similarly, if we invoke ``myscript.py`` from the first example, it will use the
``__pypackages__`` directory that was in the ``foo`` directory.
If we go inside of the ``foo`` directory and start the Python executable (the
interpreter), it will find the ``__pypackages__`` directory inside of the
current working directory and use it in the ``sys.path``. The same happens if we
try to use the ``-m`` and use a module. In our example, ``bottle`` module will
be found inside of the ``__pypackages__`` directory.
The above two examples are only cases where ``__pypackages__`` from current
working directory is used.
In another example scenario, a trainer of a Python class can say "Today we are
going to learn how to use Twisted! To start, please checkout our example
project, go to that directory, and then run a given command to install Twisted."
That will install Twisted into a directory separate from ``python3``. There's no
need to discuss virtual environments, global versus user installs, etc. as the
install will be local by default. The trainer can then just keep telling them to
use ``python3`` without any activation step, etc.
.. [1] In the case of symlinks, it is the directory where the actual script
resides, not the symlink pointing to the script
Relationship to virtual environments
====================================
At its heart, this proposal is simply to modify the calculation of the default
value of ``sys.path``, and does not relate at all to the virtual environment
mechanism. However, ``__pypackages__`` can be viewed as providing an isolation
capability, and in that sense, it "competes" with virtual environments.
However, there are significant differences:
* Virtual environments are isolated from the system environment, whereas
``__pypackages__`` simply adds to the system environment.
* Virtual environments include a full "installation scheme", with directories
for binaries, C header files, etc., whereas ``__pypackages__`` is solely
for Python library code.
* Virtual environments work most smoothly when "activated". This proposal
needs no activation.
This proposal should be seen as independent of virtual environments, not competing
with them. At best, some use cases currently only served by virtual environments
can also be served (possibly better) by ``__pypackages__``.
It should be noted that libraries installed in ``__pypackages__`` will be visible
in a virtual environment. This arguably breaks the isolation of virtual environments,
but it is no different in principle to the presence of the current directory on
``sys.path`` (or mechanisms like the ``PYTHONPATH`` environment variable). The only
difference is in degree, as the expectation is that people will more commonly install
packages in ``__pypackages__``. The alternative would be to explicitly detect virtual
environments and disable ``__pypackages__`` in that case - however that would break
scripts with bundled dependencies. The PEP authors believe that developers using
virtual environments should be experienced enough to understand the issue and
anticipate and avoid any problems.
Security Considerations
=======================
In theory, it is possible to add a library to the ``__pypackages__`` directory
that overrides a stdlib module or an installed 3rd party library. For the
``__pypackages__`` associated with a script, this is assumed not to be a
significant issue, as it is unlikely that anyone would be able to write to
``__pypackages__`` unless they also had the ability to write to the script itself.
For a ``__pypackages__`` directory in the current working directory, the
interactive interpreter could be affected. However, this is not significantly
different than the existing issue of someone having a ``math.py`` mdule in their
current directory, and while (just like that case) it can cause user confusion,
it does not introduce any new security implications.
When running a script, any ``__pypackages__`` directory in the current working
directory is ignored. This is the same approach Python uses for adding the
current working directory to ``sys.path`` and ensures that it is not possible
to change the behaviour of a script by modifying files in the current
directory.
Also, a ``__pypackages__`` directory is only recognised in the current (or
script) directory. The interpreter will *not* scan for ``__pypackages__`` in
parent directories. Doing so would open up the risk of security issues if
directory permissions on parents differ. In particular, scripts in the ``bin``
directory or ``__pypackages__`` (the ``scripts`` location in ``sysconfig``
terms) have no special access to the libraries installed in ``__pypackages__``.
Putting executable scripts in a ``bin`` directory is not supported by this
proposal.
How to Teach This
=================
The original motivation for this proposal was to make it easier to teach Python
to beginners. To that end, it needs to be easy to explain, and simple to use.
At the most basic level, this is similar to the existing mechanism where the
script directory is added to ``sys.path`` and can be taught in a similar manner.
However, for its intended use of "lightweight isolation", it would likely be taught
in terms of "things you put in a ``__pypackages__`` directory are private to your
script". The experience of the PEP authors suggests that this would be significantly
easier to teach than the current alternative of introducing virtual environments.
Impact on Tools
===============
As the intended use of the feature is to install 3rd party libraries in the new
directory, it is important that tools, particularly installers, understand how to
manage ``__pypackages__``.
It is hoped that tools will introduce a dedicated "pypackages" installation
mode that *is* guaranteed to match the expected layout in all cases. However,
the question of how best to support the ``__pypackages__`` layout is ultimately
left to individual tool maintainers to consider and decide on.
Tools that locate packages without actually running Python code (IDEs, linters,
type checkers, etc.) would need updating to recognise ``__pypackages__``. In the
absence of such updates, the ``__pypackages__`` directory would work similarly
to directories currently added to ``sys.path`` at runtime (i.e., the tool would
probably ignore it).
Backwards Compatibility
=======================
The directory name ``__pypackages__`` was chosen because it is unlikely to be in
common use. It is true that users who have chosen to use that name for their own
purposes will be impacted, but at the time this PEP was written, this was viewed
as a relatively low risk.
Unfortunately, in the time this PEP has been under discussion, a number of tools
have chosen to implement variations on what is being proposed here, which are not
all compatible with the final form of the PEP. As a result, the risk of clashes is
now higher than originally anticipated.
It would be possible to mitigate this by choosing a *different* name, hopefully as
uncommon as ``__pypackages__`` originally was. But realistically, any compatibility
issues can be viewed as simply the consequences of people trying to implement
draft proposals, without making the effort to track changes in the proposal. As such,
it seems reasonable to retain the ``__pypackages__`` name, and put the burden of
addressing the compatibility issue on the tools that implemented the draft version.
Impact on other Python implementations
--------------------------------------
Other Python implementations will need to replicate the new behavior of the
interpreter bootstrap, including locating the ``__pypackages__`` directory and
adding it the ``sys.path`` just before site packages, if it is present. This is
no different to any other Python change.
Reference Implementation
========================
`Here <https://github.com/kushaldas/pep582>`_ is a small script which will
enable the implementation for ``Cpython`` & in ``PyPy``.
Rejected Ideas
==============
* Alternative names, such as ``__pylocal__`` and ``python_modules``. Ultimately, the name is arbitrary and the chosen name is good enough.
* Additional features of virtual environments. This proposal is not a replacement for virtual environments, and such features are therefore out of scope.
* We will not scan any parent directory to find ``__pypackages__``. If we want to execute scripts inside of the ``~/bin/`` directory, then the ``__pypackages__`` directory must be inside of the ``~/bin/`` directory. Doing any such scan for ``__pypackages__`` (for the interpreter or a script) will have security implications and also increase startup time.
* Raise an error if unexpected files or directories are present in ``__pypackages__``. This is considered too strict, particularly as transitional approaches like ``pip install --prefix`` can create additional files in ``__pypackages__``.
* Using a different ``sysconfig`` scheme, or a dedicated ``pypackages`` scheme. While this is attractive in theory, it makes transition harder, as there will be no readily-available way of installing to ``__pypackages__`` until tools implement explicit support. And while the PEP authors hope and assume that such support would be added, having the proposal dependent on such support in order to be usable seems like an unacceptable risk.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 80
coding: utf-8
End: