PEP 615: Add PEP proposing zoneinfo module (#1316)

* Add IANA Time Zone Support PEP

* Add section rejecting Windows native time zones

* Normalize code-block and drop python3

Apparently this is rendered without pygments support, which is causing
the build to fail.

* Add justification for a separate module

* Clean up and reorganize footnotes and citations

This adds headlines / citations for each footnote, rather than bare
links, and tries to put them in a more logical grouping, and makes
them all render without issues.

* Add / adjust proposal targets

* Add justification for set_tzpath

Basically, "people are going to do this anyway, might as well have them
use a function that we can at least deprecate if it causes major
problems."

* Add citation for rearguard format

* Fix typos and formatting in abstract

Fixes a few typos and reformats the abstract using semantic newlines.

* Update location of reference implementation

* Tweaks to motivation section

* Clarify wording about PEP organization

* Tweaks to section on the ZoneInfo class

This adjusts the wording and fixes typos in the constructors and pickle
sections.

* Tweaks 'Sources for time zone data'

These are typos, clarifications and wording adjustments in the section
on time zone data sources.

* Use semantic newlines in 'Security implications'

* Clarify what 'no special verification' means

* Add clarification to custom compiler section

* Tweaks to wording on 'tzdata' in the standard library

* Tweaks to Windows section

* Adjust the justification for rejecting pytz

* Add quotation marks around blog titles

* Abandon semantic newlines

* Rename title

* Move module name into 'Open issues'

* Comment out temporarily unused headers

* Add open issue about TZPATH structure

* Remove erroneous blockquotes

* Add discourse link and Post-History
This commit is contained in:
Paul Ganssle 2020-02-25 10:56:05 -05:00 committed by GitHub
parent 2e5c584c6b
commit b674d3ae26
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 800 additions and 0 deletions

800
pep-0615.rst Normal file
View File

@ -0,0 +1,800 @@
PEP: 615
Title: Support for the IANA Time Zone Database in the Standard Library
Author: Paul Ganssle <paul at ganssle.io>
Discussions-To: https://discuss.python.org/t/3468
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2020-02-22
Python-Version: 3.9
Post-History: 2020-02-25
Replaces: 431
Abstract
========
This proposes adding a module, ``zoneinfo``, to provide a concrete time zone
implementation supporting the IANA time zone database. By default,
``zoneinfo`` will use the system's time zone data if available; if no system
time zone data is available, the library will fall back to using the
first-party package ``tzdata``, deployed on PyPI.
Motivation
==========
The ``datetime`` library uses a flexible mechanism to handle time zones: all
conversions and time zone information queries are delegated to an instance of a
subclass of the abstract ``datetime.tzinfo`` base class. [#tzinfo]_ This allows
users to implement arbitrarily complex time zone rules, but in practice the
majority of users want support for just three types of time zone: [a]_
1. UTC and fixed offsets thereof
2. The system local time zone
3. IANA time zones
In Python 3.2, the ``datetime.timezone`` class was introduced to support the
first class of time zone (with a special ``datetime.timezone.utc`` singleton
for UTC).
While there is still no "local" time zone, in Python 3.0 the semantics of naïve
time zones was changed to support many "local time" operations, and it is now
possible to get a fixed time zone offset from a local time::
>>> print(datetime(2020, 2, 22, 12, 0).astimezone())
2020-02-22 12:00:00-05:00
>>> print(datetime(2020, 2, 22, 12, 0).astimezone()
... .strftime("%Y-%m-%d %H:%M:%S %Z"))
2020-02-22 12:00:00 EST
>>> print(datetime(2020, 2, 22, 12, 0).astimezone(timezone.utc))
However, there is still no support for the time zones described in the IANA
time zone database (also called the "tz" database or the Olson database
[#tzdb-wiki]_). The time zone database is in the public domain and is widely
distributed — it is present by default on many Unix-like operating systems.
Great care goes into the stability of the database: there are IETF RFCs both
for the maintenance procedures (RFC 6557 [#rfc6557]_) and for the compiled
binary (TZif) format (RFC 8636 [#rfc8536]_). As such, it is likely that adding
support for the compiled outputs of the IANA database will add great value to
end users even with the relatively long cadence of standard library releases.
Proposal
========
This PEP has three main concerns:
1. The semantics of the ``zoneinfo.ZoneInfo`` class (zoneinfo-class_)
2. Time zone data sources used (data-sources_)
3. Options for configuration of the time zone search path (search-path-config_)
Because of the complexity of the proposal, rather than having separate
"specification" and "rationale" sections the design decisions and rationales
are grouped together by subject.
.. _zoneinfo-class:
The ``zoneinfo.ZoneInfo`` class
-------------------------------
.. _Constructors:
Constructors
############
The initial design of the ``zoneinfo.ZoneInfo`` class has several constructors.
.. code-block::
ZoneInfo(key: str)
The primary constructor takes a single argument, ``key``, which is a string
indicating the name of a zone file in the system time zone database (e.g.
``"America/New_York"``, ``"Europe/London"``), and returns a ``ZoneInfo``
constructed from the first matching TZif file on the search path (see the
data-sources_ section for more details).
One somewhat unusual guarantee made by this constructor is that calls with
identical arguments must return *identical* objects. Specifically, for all
values of ``key``, the following assertion must always be valid [b]_::
a = ZoneInfo(key)
b = ZoneInfo(key)
assert a is b
The reason for this comes from the fact that the semantics of datetime
operations (e.g. comparison, arithmetic) depend on whether the datetimes
involved represent the same or different zones; two datetimes are in the same
zone only if ``dt1.tzinfo is dt2.tzinfo``. [#nontransitive_comp]_ In addition
to the modest performance benefit from avoiding unnecessary proliferation of
``ZoneInfo`` objects, providing this guarantee should minimize surprising
behavior for end users.
|dateutil.tz.gettz| has provided a similar guarantee since version 2.7.0
(release March 2018). [#dateutil-tz]_
.. |dateutil.tz.gettz| replace:: ``dateutil.tz.gettz``
.. _dateutil.tz.gettz: https://dateutil.readthedocs.io/en/stable/tz.html#dateutil.tz.gettz
.. note::
The implementation may decide how to implement the cache behavior, but the
guarantee made here only requires that as long as two references exist to
the result of identical constructor calls, they must be references to the
same object. This is consistent with a reference counted cache where
``ZoneInfo`` objects are ejected when no references to them exist — it is
allowed but not required or recommended to implement this with a "strong"
cache, where all ``ZoneInfo`` files are kept alive indefinitely.
.. code-block::
ZoneInfo.nocache(key: str)
This is an alternate constructor that bypasses the constructor's cache. It is
identical to the primary constructor, but returns a new object on each call.
This is likely most useful for testing purposes, or to deliberately induce
"different zone" semantics between datetimes with the same nominal time zone.
.. code-block::
ZoneInfo.from_file(fobj: IO[bytes], /, key: str = None)
This is an alternate constructor that allows the construction of a ``ZoneInfo``
object from any TZif byte stream. This constructor takes an optional
parameter, ``key``, which sets the name of the zone, for the purposes of
``__str__`` and ``__repr__`` (see Representations_).
Unlike the primary constructor, this always constructs a new object. There are
two reasons that this deviates from the primary constructor's caching behavior:
stream objects have mutable state and so determining whether two inputs are
identical is difficult or impossible, and it is likely that users constructing
from a file specifically want to load from that file and not a cache.
.. _Representations:
String representation
#####################
The ``ZoneInfo`` class's ``__str__`` representation will be drawn from the
``key`` parameter. This is partially because the ``key`` represents a
human-readable "name" of the string, but also because it is a useful parameter
that users will want exposed. It is necessary to provide a mechanism to expose
the key for serialization between languages and because it is also a primary
key for localization projects like CLDR (the Unicode Common Locale Data
Repository [#cldr]_).
An example:
.. code-block::
>>> zone = ZoneInfo("Pacific/Kwajalein")
>>> str(zone)
'Pacific/Kwajalein'
When a ``key`` is not specified, the ``str`` operation should not fail, but
should return the empty string::
>>> with open("/dev/null", "w") as f:
... zone = ZoneInfo.from_file(f)
>>> str(zone)
''
Pickle serialization
####################
There are two reasonable options for the pickling behavior of ``ZoneInfo``
files: serialize the key when available and reconstruct the object from from
the files on disk during deserialization, or serialize all the data in the
object (including all transitions). This PEP proposes to choose the *second*
behavior, and unconditionally serialize all transition data.
The first behavior makes for much smaller files, but may result in different
behavior if the object is unpickled in an environment with a different version
of the time zone database. For example, a pickle for
``ZoneFile("Asia/Qostanay")`` generated from version 2019c of the database
would fail to deserialize in an environment with version 2018a, since the
``"Asia/Qostanay"`` zone was added in 2018h. More subtle failures are also
possible if offsets or the timing of offset changes has changed between the two
versions.
Serializing only the key would also fail for objects created from a file
without specifying a key, and so a fallback mechanism serializing all
transitions would need to be provided anyway, bringing additional maintenance
burdens.
There are many other failures that can occur when using ``pickle`` to send
objects between non-identical environments, but nevertheless it is still
commonly done, and so it seems that the benefit of smaller file sizes is likely
outweighed by the costs.
.. _data-sources:
Sources for time zone data
--------------------------
One of the hardest challenges for IANA time zone support is keeping the data up
to date; between 1997 and 2020, there have been between 3 and 21 releases per
year, often in response to changes in time zone rules with little to no notice
(see [#timing-of-tz-changes]_ for more details). In order to keep up to date,
and to give the system administrator control over the data source, we propose
to use system-deployed time zone data wherever possible. However, not all
systems ship a publicly accessible time zone database — notably Windows uses a
different system for managing time zones — and so if available ``zoneinfo``
falls back to an installable first-party package, ``tzdata``, available on
PyPI. If no system zoneinfo files are found but ``tzdata`` is installed, the
primary ``ZoneInfo`` constructor will use ``tzdata`` as the time zone source.
System time zone information
############################
Many Unix-like systems deploy time zone data by default, or provide a canonical
time zone data package (often called ``tzdata``, as it is on Arch Linux, RedHat
and Debian). Whenever possible, it would be preferable to defer to the system
time zone information, because this allows time zone information for all
language stacks to be updated and maintained in one place. Python distributors
are encouraged to ensure that time zone data is installed alongside Python
whenever possible (e.g. by declaring ``tzdata`` as a dependency for the
``python`` package).
The ``zoneinfo`` module will use a "search path" strategy analogous to the
``PATH`` environment variable or the ``sys.path`` variable in Python; the
``zoneinfo.TZPATH`` variable will be read-only (see search-path-config_ for
more details), ordered list of time zone data locations to search. When
creating a ``ZoneInfo`` instance from a key, the zone file will be constructed
from the first data source on the path in which the key exists, so for example,
if ``TZPATH`` were::
TZPATH = (
"/usr/share/zoneinfo",
"/etc/zoneinfo"
)
and (although this would be very unusual) ``/usr/share/zoneinfo`` contained
only ``America/New_York`` and ``/etc/zoneinfo`` contained both
``America/New_York`` and ``Europe/Moscow``, then
``ZoneInfo("America/New_York")`` would be satisfied by
``/usr/share/zoneinfo/America/New_York``, while ``ZoneInfo("Europe/Moscow")``
would be satisfied by ``/etc/zoneinfo/Europe/Moscow``.
At the moment, on Windows systems, the search path will default to empty,
because Windows does not officially ship a copy of the time zone database. On
non-Windows systems, the search path will default to a list of the most
commonly observed search paths. Although this is subject to change in future
versions, at launch the default search path will be::
TZPATH = (
"/usr/share/zoneinfo",
"/usr/lib/zoneinfo",
"/usr/share/lib/zoneinfo",
"/etc/zoneinfo",
)
This may be configured both at compile time or at runtime; more information on
configuration options at search-path-config_.
The ``tzdata`` Python package
#############################
In order to ensure easy access to time zone data for all end users, this PEP
proposes to create a data-only package ``tzdata`` as a fallback for when system
data is not available. The ``tzdata`` package would be distributed on PyPI as
a "first party" package, maintained by the CPython development team.
The ``tzdata`` package contains only data and metadata, with no public-facing
functions or classes. It will be designed to be compatible with both newer
``importlib.resources`` [#importlib_resources]_ access patterns and older
access patterns like ``pkgutil.get_data`` [#pkgutil_data]_ .
While it is designed explicitly for the use of CPython, the ``tzdata`` package
is intended as a public package in its own right, and it may be used as an
"official" source of time zone data for third party Python packages.
.. _search-path-config:
Search path configuration
-------------------------
The time zone search path is very system-dependent, and sometimes even
application-dependent, and as such it makes sense to provide options to
customize it. This PEP provides for three such avenues for customization:
1. Global configuration via a compile-time option
2. Per-run configuration via environment variables
3. Runtime configuration change via a ``set_tzpath`` function
Compile-time options
####################
It is most likely that downstream distributors will know exactly where their
system time zone data is deployed, and so a compile-time option
``PYTHONTZPATH`` will be provided to set the default search path.
The ``PYTHONTZPATH`` option should be a string delimited by ``os.pathsep``,
listing possible locations for the time zone data to be deployed (e.g.
``/usr/share/zoneinfo``).
Environment variables
#####################
When initializing ``TZPATH`` (and whenever ``set_tzpath`` is called with no
arguments), the ``zoneinfo`` module will use two environment variables,
``PYTHONTZPATH`` and ``PYTHONTZPATH_APPEND``, if they exist, to set the search
path.
Both are ``os.pathsep`` delimited strings. ``PYTHONTZPATH`` *replaces* the
default time zone path, whereas ``PYTHONTZPATH_APPEND`` appends to the end of
the time zone path.
Some examples of the proposed semantics::
$ python print_tzpath.py
("/usr/share/zoneinfo",
"/usr/lib/zoneinfo",
"/usr/share/lib/zoneinfo",
"/etc/zoneinfo")
$ PYTHONTZPATH="/etc/zoneinfo:/usr/share/zoneinfo" python print_tzpath.py
("/etc/zoneinfo",
"/usr/share/zoneinfo")
$ PYTHONTZPATH="" python print_tzpath.py
()
$ PYTHONTZPATH_APPEND="/my/directory" python print_tzpath.py
("/usr/share/zoneinfo",
"/usr/lib/zoneinfo",
"/usr/share/lib/zoneinfo",
"/etc/zoneinfo",
"/my/directory")
``set_tzpath`` function
#######################
``zoneinfo`` provides a ``set_tzpath`` function that allows for changing the
search path at runtime.
.. code-block::
def set_tzpath(tzpaths: Optional[Sequence[Union[str, Pathlike]]]) -> None:
...
When called with a sequence of paths, this function sets ``zoneinfo.TZPATH`` to
a tuple constructed from the desired value. When called with no arguments or
``None``, this function resets ``zoneinfo.TZPATH`` to the default
configuration.
This is likely to be primarily useful for (permanently or temporarily)
disabling the use of system time zone paths and forcing the module to use the
``tzdata`` package. It is not likely that ``set_tzpath`` will be a common
operation, save perhaps in test functions sensitive to time zone configuration,
but it seems preferable to provide an official mechanism for changing this
rather than allowing a proliferation of hacks around the immutability of
``TZPATH``.
.. caution::
Although changing ``TZPATH`` during a run is a supported operation, users
should be advised that doing so may occasionally lead to unusual semantics,
and when making design trade-offs greater weight will be afforded to using
a static ``TZPATH``, which is the much more common use case.
As noted in Constructors_, the primary ``ZoneInfo`` constructor employs a cache
to ensure that two identically-constructed ``ZoneInfo`` objects always compare
as identical (i.e. ``ZoneInfo(key) is ZoneInfo(key)``), and the nature of this
cache is implementation-defined. This means that the behavior of the
``ZoneInfo`` constructor may be unpredictably inconsistent in some situations
when used with the same ``key`` under different values of ``TZPATH``. For
example::
>>> set_tzpath(["/my/custom/tzdb"])
>>> a = ZoneInfo("My/Custom/Zone")
>>> set_tzpath()
>>> b = ZoneInfo("My/Custom/Zone")
>>> del a
>>> del b
>>> c = ZoneInfo("My/Custom/Zone")
In this example, ``My/Custom/Zone`` exists only in the ``/my/custom/tzdb`` and
not on the default search path. In all implementations the constructor for
``a`` must succeed. It is implementation-defined whether the constructor for
``b`` succeeds, but if it does, it must be true that ``a is b``, because both
``a`` and ``b`` are references to the same key. It is also
implementation-defined whether the constructor for ``c`` succeeds.
Implementations of ``zoneinfo`` *may* return the object constructed in previous
constructor calls, or they may fail with an exception.
Backwards Compatibility
=======================
This will have no backwards compatibility issues as it will create a new API.
With only minor modification, a backport with support for Python 3.6+ of the
``zoneinfo`` module could be created.
The ``tzdata`` package is designed to be "data only", and should support any
version of Python that it can be built for (including Python 2.7).
Security Implications
=====================
This will require parsing zoneinfo data from disk, mostly from system locations
but potentially from user-supplied data. Errors in the implementation
(particularly the C code) could cause potential security issues, but there is
no special risk relative to parsing other file types.
Reference Implementation
========================
An initial reference implementation is available at
https://github.com/pganssle/zoneinfo
This may eventually be converted into a backport for 3.6+.
Rejected Ideas
==============
Building a custom tzdb compiler
-------------------------------
One major concern with the use of the TZif format is that it does not actually
contain enough information to always correctly determine the value to return
for ``tzinfo.dst()``. This is because for any given time zone offset, TZif
only marks the UTC offset and whether or not it represents a DST offset, but
``tzinfo.dst()`` returns the total amount of the DST shift, so that the
"standard" offset can be reconstructed from ``datetime.utcoffset() -
datetime.dst()``. The value to use for ``dst()`` can be determined by finding
the equivalent STD offset and calculating the difference, but the TZif format
does not specify which offsets form STD/DST pairs, and so heuristics must be
used to determine this.
One common heuristic — looking at the most recent standard offset — notably
fails in the case of the time zone changes in Portugal in 1992 and 1996, where
the "standard" offset was shifted by 1 hour during a DST transition, leading to
a transition from STD to DST status with no change in offset. In fact, it is
possible (though it has never happened) for a time zone to be created that is
permanently DST and has no standard offsets.
Although this information is missing in the compiled TZif binaries, it is
present in the raw tzdb files, and it would be possible to parse this
information ourselves and create a more suitable binary format.
This idea was rejected for several reasons:
1. It precludes the use of any system-deployed time zone information, which is
usually present only in TZif format.
2. The raw tzdb format, while stable, is *less* stable than the TZif format;
some downstream tzdb parsers have already run into problems with old
deployments of their custom parsers becoming incompatible with recent tzdb
releases, leading to the creation of a "rearguard" format to ease the
transition. [#rearguard]_
3. Heuristics currently suffice in ``dateutil`` and ``pytz`` for all known time
zones, historical and present, and it is not very likely that new time zones
will appear that cannot be captured by heuristics — though it is somewhat
more likely that new rules that are not captured by the *current* generation
of heuristics will appear; in that case, bugfixes would be required to
accommodate the changed situation.
4. The ``dst()`` method's utility (and in fact the ``isdst`` parameter in TZif)
is somewhat questionable to start with, as almost all the useful information
is contained in the ``utcoffset()`` and ``tzname()`` methods, which are not
subject to the same problems.
In short, maintaining a custom tzdb compiler or compiled package adds
maintenance burdens to both the CPython dev team and system administrators, and
its main benefit is to address a hypothetical failure that would likely have
minimal real world effects were it to occur.
.. _why-no-default-tzdata:
Including ``tzdata`` in the standard library by default
-------------------------------------------------------
Although PEP 453 [#pep453-ensurepip]_, which introduced the ``ensurepip``
mechanism to CPython, provides a convenient template for a standard library
module maintained on PyPI, a potentially similar ``ensuretzdata`` mechanism is
somewhat less necessary, and would be complicated enough that it is considered
out of scope for this PEP.
Because the ``zoneinfo`` module is designed to use the system time zone data
wherever possible, the ``tzdata`` package is unnecessary (and may be
undesirable) on systems that deploy time zone data, and so it does not seem
critical to ship ``tzdata`` with CPython.
It is also not yet clear how these hybrid standard library / PyPI modules
should be updated, (other than ``pip``, which has a natural mechanism for
updates and notifications) and since it is not critical to the operation of the
module, it seems prudent to defer any such proposal.
Incorporating Windows' native time zone support
-----------------------------------------------
Windows has a non-IANA source of time zone information, along with public APIs
for accessing the data. Theoretically these could be supported in the
``zoneinfo`` module, but in practice they would not map cleanly enough to TZif
files to provide a good platform-independent experience, and a specialized API
supporting Windows time zones is a niche enough concern that it would be better
provided by a third party package.
The current Windows system time zones are provided by ``tzres.dll``, which
contains a list of simple rules for either fixed offsets or time zones with 2
DST transitions per year (DST start and DST end). The rules use
Windows-specific names such as "Eastern Standard Time" as opposed to
"America/New_York", and they contain no historical data.
Even if it were simple to unambiguously map IANA time zones to a
Windows-specific time zone name, the lack of historical data makes
Windows-style time zones sufficiently different that they cannot be used as a
drop-in replacement for the IANA database. They are also restricted to either
0 or 2 DST transitions per year, occurring on a regular schedule. This means
that, for example, the "Africa/Casablanca" time zone cannot be accurately
represented using its Windows equivalent, because for many years Morocco has
observed Daylight Saving Time during the summer months *except* during Ramadan,
and thus has 4 transitions per year in years where Ramadan overlaps with the
DST period.
Considering there is no easy way to use Microsoft's preferred APIs to emulate
IANA time zone support, it is best left to third parties (or at least a
different PEP) to provide a dedicated Windows time zone support library. In
fact, the ``dateutil`` package already provides ``dateutil.tz.win``
[#dateutil-tzwin]_, which contains ``tzinfo`` classes utilizing Windows system
time zone data.
If Microsoft were to provide a public system for accessing IANA time zone data,
even if it were somewhat unusual compared to access patterns on Unix-like
systems, the ``zoneinfo`` module should add support for it.
Using a ``pytz``-like interface
-------------------------------
A ``pytz``-like ([#pytz]_) interface was proposed in PEP 431 [#pep431]_, but
was ultimately withdrawn / rejected for lack of ambiguous datetime support.
PEP 495 [#pep495]_ added the ``fold`` attribute to address this problem, but
``fold`` obviates the need for ``pytz``'s non-standard ``tzinfo`` classes, and
so a ``pytz``-like interface is no longer necessary. [#fastest-footgun]_
The ``zoneinfo`` approach is more closely based on ``dateutil.tz``, which
implemented support for ``fold`` (including a backport to older versions) just
before the release of Python 3.6.
Open Issues
===========
Using the ``datetime`` module
-----------------------------
One possible idea would be to add ``ZoneInfo`` to the ``datetime`` module,
rather than giving it its own separate module. In the current version of the
PEP, this has been resolved in favor of using a separate module, for the
reasons detailed below, but the use of a nested submodule ``datetime.zoneinfo``
is also under consideration.
Arguments against putting ``ZoneInfo`` directly into ``datetime``
#################################################################
The ``datetime`` module is already somewhat crowded, as it has many classes
with somewhat complex behavior — ``datetime.datetime``, ``datetime.date``,
``datetime.time``, ``datetime.timedelta``, ``datetime.timezone`` and
``datetime.tzinfo``. The module's implementation and documentation are already
quite complicated, and it is probably beneficial to try to not to compound the
problem if it can be helped.
The ``ZoneInfo`` class is also in some ways different from all the other
classes provided by ``datetime``; the other classes are all intended to be
lean, simple data types, whereas the ``ZoneInfo`` class is more complex: it is
a parser for a specific format (TZif), a representation for the information
stored in that format and a mechanism to look up the information in well-known
locations in the system.
Finally, while it is true that someone who needs the ``zoneinfo`` module also
needs the ``datetime`` module, the reverse is not necessarily true: many people
will want to use ``datetime`` without ``zoneinfo``. Considering that
``zoneinfo`` will likely pull in additional, possibly more heavy-weight
standard library modules, it would be preferable to allow the two to be
imported separately — particularly if potential "tree shaking" distributions
are in Python's future. [#tree-shaking]_
In the final analysis, it makes sense to keep ``zoneinfo`` a separate module
with a separate documentation page rather than to put its classes and functions
directly into ``datetime``.
Using ``datetime.zoneinfo`` instead of ``zoneinfo``
###################################################
A more palatable configuration may be to nest ``zoneinfo`` as a module under
``datetime``, as ``datetime.zoneinfo``.
Arguments in favor of this:
1. It neatly namespaces ``zoneinfo`` together with ``datetime``
2. The ``timezone`` class is already in ``datetime``, and it may seem strange
that some time zones are in ``datetime`` and others are in a top-level
module.
3. As mentioned earlier, importing ``zoneinfo`` necessarily requires importing
``datetime``, so it is no imposition to require importing the parent module.
Arguments against this:
1. In order to avoid forcing all ``datetime`` users to import ``zoneinfo``, the
``zoneinfo`` module would need to be lazily imported, which means that
end-users would need to explicitly import ``datetime.zoneinfo`` (as opposed
to importing ``datetime`` and accessing the ``zoneinfo`` attribute on the
module). This is the way ``dateutil`` works (all submodules are lazily
imported), and it is a perennial source of confusion for end users.
This confusing requirement from end-users can be avoided using a
module-level ``__getattr__`` and ``__dir__`` per PEP 562, but this would
add some complexity to the implementation of the ``datetime`` module. This
sort of behavior in modules or classes tends to confuse static analysis
tools, which may not be desirable for a library as widely-used and critical
as ``datetime``.
2. Nesting the implementation under ``datetime`` would likely require
``datetime`` to be reorganized from a single-file module (``datetime.py``)
to a directory with an ``__init__.py``. This is a minor concern, but the
structure of the ``datetime`` module has been stable for many years, and it
would be preferable to avoid churn if possible.
This concern *could* be alleviated by implementing ``zoneinfo`` as
``_zoneinfo.py`` and importing it as ``zoneinfo`` from within ``datetime``,
but this does not seem desirable from an aesthetic or code organization
standpoint, and it would preclude the version of nesting where end users are
required to explicitly import ``datetime.zoneinfo``.
This PEP currently takes the position that on balance it would be best to use a
separate top-level ``zoneinfo`` module because the benefits of nesting are not
so great that it overwhelms the practical implementation concerns, but this
still requires some discussion.
Structure of the ``PYTHON_TZPATH`` environment variables
========================================================
This PEP proposes two variables to set the time zone path: ``PYTHONTZPATH`` and
``PYTHONTZPATH_APPEND``. This is based on the assumption that the majority of
users who would want to manipulate the time zone path would want to fully
replace it (e.g. "I know exactly where my time zone data is"), and a smaller
fraction would want to use the standard time zone paths wherever possible, but
add additional locations (possibly containing custom time zones).
There are several other schemes that were considered and weakly rejected:
1. Separate these into a ``DEFAULT_PYTHONTZPATH`` and ``PYTHONTZPATH``
variable, where ``PYTHONTZPATH`` would contain values to append (or prepend)
to the default time zone path, and ``DEFAULT_PYTHONTZPATH`` would *replace*
the default time zone path. This was rejected because it would likely lead
to user confusion if the primary use case is to replace rather than augment.
2. Use *only* the ``PYTHONTZPATH`` variable, but provide a custom special value
that represents the default time zone path, e.g. ``<<DEFAULT_TZPATH>>``, so
users could append to the time zone path with, e.g.
``PYTHONTZPATH=<<DEFAULT_TZPATH>>:/my/path`` could be used to append
``/my/path`` to the end of the time zone path.
This was rejected mainly because these sort of special values are not
usually found in ``PATH``-like variables, and it would be hard to discover
mistakes in your implementation.
Footnotes
=========
.. [a]
The claim that the vast majority of users only want a few types of time
zone is based on anecdotal impressions rather than anything remotely
scientific. As one data point, ``dateutil`` provides many time zone types,
but user support mostly focuses on these three types.
.. [b]
The statement that identically constructed ``ZoneInfo`` files should be
identical objects may be violated if the user deliberately clears the time
zone cache.
References
==========
.. [#rfc6557]
RFC 6557: Procedures for Maintaining the Time Zone Database
https://tools.ietf.org/html/rfc6557
.. [#rfc8536]
RFC 8636: The Time Zone Information Format (TZif)
https://tools.ietf.org/html/rfc8536
.. [#nontransitive_comp]
Paul Ganssle: "A curious case of non-transitive datetime comparison"
(Published 15 February 2018)
https://blog.ganssle.io/articles/2018/02/a-curious-case-datetimes.html
.. [#fastest-footgun]
Paul Ganssle: "pytz: The Fastest Footgun in the West" (Published 19 March
2018) https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html
.. [#cldr]
CLDR: Unicode Common Locale Data Repository
http://cldr.unicode.org/#TOC-How-to-Use-
.. [#tzdb-wiki]
Wikipedia page for Tz database:
https://en.wikipedia.org/wiki/Tz_database
.. [#timing-of-tz-changes]
Code of Matt: "On the Timing of Time Zone Changes" (Matt Johnson-Pint, 23
April 2016) https://codeofmatt.com/on-the-timing-of-time-zone-changes/
.. [#rearguard]
tz mailing list: [PROPOSED] Support zi parsers that mishandle negative DST
offsets (Paul Eggert, 23 April 2018)
https://mm.icann.org/pipermail/tz/2018-April/026421.html
.. [#tree-shaking]
"Russell Keith-Magee: Python On Other Platforms" (15 May 2019, Jesse Jiryu
Davis)
https://pyfound.blogspot.com/2019/05/russell-keith-magee-python-on-other.html
.. [#pep453-ensurepip]
PEP 453: Explicit bootstrapping of pip in Python installations
https://www.python.org/dev/peps/pep-0453/
.. [#pep431]
PEP 431: Time zone support improvements
https://www.python.org/dev/peps/pep-0431/
.. [#pep495]
PEP 495: Local Time Disambiguation
https://www.python.org/dev/peps/pep-0495/
.. [#tzinfo]
``datetime.tzinfo`` documentation
https://docs.python.org/3/library/datetime.html#datetime.tzinfo
.. [#importlib_resources]
``importlib.resources`` documentation
https://docs.python.org/3/library/importlib.html#module-importlib.resources
.. [#pkgutil_data]
``pkgutil.get_data`` documentation
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
Other time zone implementations:
--------------------------------
.. [#dateutil-tz]
``dateutil.tz``
https://dateutil.readthedocs.io/en/stable/tz.html
.. [#dateutil-tzwin]
``dateutil.tz.win``: Concreate time zone implementations wrapping Windows
time zones
https://dateutil.readthedocs.io/en/stable/tzwin.html
.. [#pytz]
``pytz``
http://pytz.sourceforge.net/
Copyright
=========
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: