PEP 615: Add PEP proposing zoneinfo module (#1316)
* Add IANA Time Zone Support PEP
* Add section rejecting Windows native time zones
* Normalize code-block and drop python3

  Apparently this is rendered without pygments support, which is causing the build to fail.
* Add justification for a separate module
* Clean up and reorganize footnotes and citations

  This adds headlines / citations for each footnote, rather than bare links, and tries to put them in a more logical grouping, and makes them all render without issues.
* Add / adjust proposal targets
* Add justification for set_tzpath

  Basically, "people are going to do this anyway, might as well have them use a function that we can at least deprecate if it causes major problems."
* Add citation for rearguard format
* Fix typos and formatting in abstract

  Fixes a few typos and reformats the abstract using semantic newlines.
* Update location of reference implementation
* Tweaks to motivation section
* Clarify wording about PEP organization
* Tweaks to section on the ZoneInfo class

  This adjusts the wording and fixes typos in the constructors and pickle sections.
* Tweaks 'Sources for time zone data'

  These are typos, clarifications and wording adjustments in the section on time zone data sources.
* Use semantic newlines in 'Security implications'
* Clarify what 'no special verification' means
* Add clarification to custom compiler section
* Tweaks to wording on 'tzdata' in the standard library
* Tweaks to Windows section
* Adjust the justification for rejecting pytz
* Add quotation marks around blog titles
* Abandon semantic newlines
* Rename title
* Move module name into 'Open issues'
* Comment out temporarily unused headers
* Add open issue about TZPATH structure
* Remove erroneous blockquotes
* Add discourse link and Post-History
parent 2e5c584c6b
commit b674d3ae26
|
@ -0,0 +1,800 @@
|
||||||
|
PEP: 615
|
||||||
|
Title: Support for the IANA Time Zone Database in the Standard Library
|
||||||
|
Author: Paul Ganssle <paul at ganssle.io>
|
||||||
|
Discussions-To: https://discuss.python.org/t/3468
|
||||||
|
Status: Draft
|
||||||
|
Type: Standards Track
|
||||||
|
Content-Type: text/x-rst
|
||||||
|
Created: 2020-02-22
|
||||||
|
Python-Version: 3.9
|
||||||
|
Post-History: 2020-02-25
|
||||||
|
Replaces: 431
|
||||||
|
|
||||||
|
|
||||||
|
Abstract
|
||||||
|
========
|
||||||
|
|
||||||
|
This proposes adding a module, ``zoneinfo``, to provide a concrete time zone
|
||||||
|
implementation supporting the IANA time zone database. By default,
|
||||||
|
``zoneinfo`` will use the system's time zone data if available; if no system
|
||||||
|
time zone data is available, the library will fall back to using the
|
||||||
|
first-party package ``tzdata``, deployed on PyPI.
|
||||||
|
|
||||||
|
Motivation
|
||||||
|
==========
|
||||||
|
|
||||||
|
The ``datetime`` library uses a flexible mechanism to handle time zones: all
|
||||||
|
conversions and time zone information queries are delegated to an instance of a
|
||||||
|
subclass of the abstract ``datetime.tzinfo`` base class. [#tzinfo]_ This allows
|
||||||
|
users to implement arbitrarily complex time zone rules, but in practice the
|
||||||
|
majority of users want support for just three types of time zone: [a]_
|
||||||
|
|
||||||
|
1. UTC and fixed offsets thereof
|
||||||
|
2. The system local time zone
|
||||||
|
3. IANA time zones
|
||||||
|
|
||||||
|
In Python 3.2, the ``datetime.timezone`` class was introduced to support the
|
||||||
|
first class of time zone (with a special ``datetime.timezone.utc`` singleton
|
||||||
|
for UTC).
|
||||||
|
|
||||||
|
While there is still no "local" time zone, in Python 3.0 the semantics of naïve
|
||||||
|
time zones were changed to support many "local time" operations, and it is now
|
||||||
|
possible to get a fixed time zone offset from a local time::
|
||||||
|
|
||||||
|
>>> print(datetime(2020, 2, 22, 12, 0).astimezone())
|
||||||
|
2020-02-22 12:00:00-05:00
|
||||||
|
>>> print(datetime(2020, 2, 22, 12, 0).astimezone()
|
||||||
|
... .strftime("%Y-%m-%d %H:%M:%S %Z"))
|
||||||
|
2020-02-22 12:00:00 EST
|
||||||
|
>>> print(datetime(2020, 2, 22, 12, 0).astimezone(timezone.utc))
2020-02-22 17:00:00+00:00
|
||||||
|
|
||||||
|
However, there is still no support for the time zones described in the IANA
|
||||||
|
time zone database (also called the "tz" database or the Olson database
|
||||||
|
[#tzdb-wiki]_). The time zone database is in the public domain and is widely
|
||||||
|
distributed — it is present by default on many Unix-like operating systems.
|
||||||
|
Great care goes into the stability of the database: there are IETF RFCs both
|
||||||
|
for the maintenance procedures (RFC 6557 [#rfc6557]_) and for the compiled
|
||||||
|
binary (TZif) format (RFC 8536 [#rfc8536]_). As such, it is likely that adding
|
||||||
|
support for the compiled outputs of the IANA database will add great value to
|
||||||
|
end users even with the relatively long cadence of standard library releases.
|
||||||
|
|
||||||
|
|
||||||
|
Proposal
|
||||||
|
========
|
||||||
|
|
||||||
|
This PEP has three main concerns:
|
||||||
|
|
||||||
|
1. The semantics of the ``zoneinfo.ZoneInfo`` class (zoneinfo-class_)
|
||||||
|
2. Time zone data sources used (data-sources_)
|
||||||
|
3. Options for configuration of the time zone search path (search-path-config_)
|
||||||
|
|
||||||
|
Because of the complexity of the proposal, rather than having separate
|
||||||
|
"specification" and "rationale" sections, the design decisions and rationales
|
||||||
|
are grouped together by subject.
|
||||||
|
|
||||||
|
.. _zoneinfo-class:
|
||||||
|
|
||||||
|
The ``zoneinfo.ZoneInfo`` class
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
.. _Constructors:
|
||||||
|
|
||||||
|
Constructors
|
||||||
|
############
|
||||||
|
|
||||||
|
The initial design of the ``zoneinfo.ZoneInfo`` class has several constructors.
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
ZoneInfo(key: str)
|
||||||
|
|
||||||
|
The primary constructor takes a single argument, ``key``, which is a string
|
||||||
|
indicating the name of a zone file in the system time zone database (e.g.
|
||||||
|
``"America/New_York"``, ``"Europe/London"``), and returns a ``ZoneInfo``
|
||||||
|
constructed from the first matching TZif file on the search path (see the
|
||||||
|
data-sources_ section for more details).
|
||||||
|
|
||||||
|
One somewhat unusual guarantee made by this constructor is that calls with
|
||||||
|
identical arguments must return *identical* objects. Specifically, for all
|
||||||
|
values of ``key``, the following assertion must always be valid [b]_::
|
||||||
|
|
||||||
|
a = ZoneInfo(key)
|
||||||
|
b = ZoneInfo(key)
|
||||||
|
assert a is b
|
||||||
|
|
||||||
|
The reason for this comes from the fact that the semantics of datetime
|
||||||
|
operations (e.g. comparison, arithmetic) depend on whether the datetimes
|
||||||
|
involved represent the same or different zones; two datetimes are in the same
|
||||||
|
zone only if ``dt1.tzinfo is dt2.tzinfo``. [#nontransitive_comp]_ In addition
|
||||||
|
to the modest performance benefit from avoiding unnecessary proliferation of
|
||||||
|
``ZoneInfo`` objects, providing this guarantee should minimize surprising
|
||||||
|
behavior for end users.
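The difference between "same zone" and "different zone" semantics can be seen without any time zone data. The following is a minimal sketch using a hypothetical ``ToyZone`` class (an assumption for illustration, not part of this proposal) whose offset changes partway through the year. Two instances with identical rules still count as different zones, because ``datetime`` checks object identity.

```python
from datetime import datetime, timedelta, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone: UTC+1 before April, UTC+2 from April onwards."""

    def utcoffset(self, dt):
        return timedelta(hours=2 if dt.month >= 4 else 1)

    def dst(self, dt):
        return timedelta(hours=1 if dt.month >= 4 else 0)

    def tzname(self, dt):
        return "TOY"


zone_a = ToyZone()
zone_b = ToyZone()  # same rules, but a distinct object

winter = datetime(2020, 1, 1, tzinfo=zone_a)
summer_same = datetime(2020, 7, 1, tzinfo=zone_a)
summer_other = datetime(2020, 7, 1, tzinfo=zone_b)

# Same tzinfo object: offsets are ignored and wall-clock times are subtracted.
same_zone_delta = summer_same - winter

# Different tzinfo objects: each operand's UTC offset is taken into account.
inter_zone_delta = summer_other - winter
```

Here ``same_zone_delta`` is exactly 182 days, while ``inter_zone_delta`` is one hour shorter, even though both zones follow the same rules. Returning the same object for the same ``key`` avoids this kind of surprise.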
|
||||||
|
|
||||||
|
|dateutil.tz.gettz|_ has provided a similar guarantee since version 2.7.0
|
||||||
|
(released March 2018). [#dateutil-tz]_
|
||||||
|
|
||||||
|
.. |dateutil.tz.gettz| replace:: ``dateutil.tz.gettz``
|
||||||
|
.. _dateutil.tz.gettz: https://dateutil.readthedocs.io/en/stable/tz.html#dateutil.tz.gettz
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
The implementation may decide how to implement the cache behavior, but the
|
||||||
|
guarantee made here only requires that as long as two references exist to
|
||||||
|
the result of identical constructor calls, they must be references to the
|
||||||
|
same object. This is consistent with a reference counted cache where
|
||||||
|
``ZoneInfo`` objects are ejected when no references to them exist — it is
|
||||||
|
allowed but not required or recommended to implement this with a "strong"
|
||||||
|
cache, where all ``ZoneInfo`` objects are kept alive indefinitely.
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
ZoneInfo.nocache(key: str)
|
||||||
|
|
||||||
|
This is an alternate constructor that bypasses the constructor's cache. It is
|
||||||
|
identical to the primary constructor, but returns a new object on each call.
|
||||||
|
This is likely most useful for testing purposes, or to deliberately induce
|
||||||
|
"different zone" semantics between datetimes with the same nominal time zone.
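One way the caching guarantee and the cache-bypassing constructor could fit together is sketched below. ``CachedZone`` is a hypothetical stand-in class, and the use of ``weakref.WeakValueDictionary`` is only one possible strategy, assumed here for illustration; the PEP leaves the cache implementation open.

```python
import weakref


class CachedZone:
    """Hypothetical stand-in for a class with ZoneInfo-style caching."""

    # Entries disappear automatically once no strong references remain.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, key):
        instance = cls._cache.get(key)
        if instance is None:
            instance = cls.nocache(key)
            cls._cache[key] = instance
        return instance

    @classmethod
    def nocache(cls, key):
        # Always build a fresh object, bypassing the cache entirely.
        self = super().__new__(cls)
        self.key = key
        return self


a = CachedZone("America/New_York")
b = CachedZone("America/New_York")
c = CachedZone.nocache("America/New_York")
```

In this sketch ``a is b`` holds, while ``c`` is a distinct object that carries the same key.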
|
||||||
|
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
ZoneInfo.from_file(fobj: IO[bytes], /, key: str = None)
|
||||||
|
|
||||||
|
This is an alternate constructor that allows the construction of a ``ZoneInfo``
|
||||||
|
object from any TZif byte stream. This constructor takes an optional
|
||||||
|
parameter, ``key``, which sets the name of the zone, for the purposes of
|
||||||
|
``__str__`` and ``__repr__`` (see Representations_).
|
||||||
|
|
||||||
|
Unlike the primary constructor, this always constructs a new object. There are
|
||||||
|
two reasons that this deviates from the primary constructor's caching behavior:
|
||||||
|
stream objects have mutable state and so determining whether two inputs are
|
||||||
|
identical is difficult or impossible, and it is likely that users constructing
|
||||||
|
from a file specifically want to load from that file and not a cache.
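For orientation, a TZif byte stream starts with a fixed 44-byte header, as described in RFC 8536. The sketch below reads only that header from any binary file object. The helper name ``read_tzif_header`` and the hand-built sample bytes are assumptions made for illustration, and are not part of the proposed API.

```python
import io
import struct


def read_tzif_header(fobj):
    """Read the fixed-size TZif header from a binary stream."""
    header = fobj.read(44)
    if len(header) != 44 or header[:4] != b"TZif":
        raise ValueError("not a TZif stream")
    version = header[4:5]  # NUL byte for version 1, otherwise an ASCII digit
    # Bytes 5-19 are reserved; six big-endian 32-bit counts follow.
    counts = struct.unpack(">6L", header[20:44])
    names = ("isutcnt", "isstdcnt", "leapcnt", "timecnt", "typecnt", "charcnt")
    return version, dict(zip(names, counts))


# Hand-built header only (no data block), purely for demonstration.
sample = b"TZif" + b"2" + bytes(15) + struct.pack(">6L", 0, 0, 0, 0, 1, 4)
version, counts = read_tzif_header(io.BytesIO(sample))
```

Because the input is just a stream of bytes, the same approach works for files on disk, in-memory buffers or package resources.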
|
||||||
|
|
||||||
|
|
||||||
|
.. _Representations:
|
||||||
|
|
||||||
|
String representation
|
||||||
|
#####################
|
||||||
|
|
||||||
|
The ``ZoneInfo`` class's ``__str__`` representation will be drawn from the
|
||||||
|
``key`` parameter. This is partially because the ``key`` represents a
|
||||||
|
human-readable "name" of the zone, but also because it is a useful parameter
|
||||||
|
that users will want exposed. It is necessary to provide a mechanism to expose
|
||||||
|
the key for serialization between languages and because it is also a primary
|
||||||
|
key for localization projects like CLDR (the Unicode Common Locale Data
|
||||||
|
Repository [#cldr]_).
|
||||||
|
|
||||||
|
An example:
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
>>> zone = ZoneInfo("Pacific/Kwajalein")
|
||||||
|
>>> str(zone)
|
||||||
|
'Pacific/Kwajalein'
|
||||||
|
|
||||||
|
When a ``key`` is not specified, the ``str`` operation should not fail, but
|
||||||
|
should return the empty string::
|
||||||
|
|
||||||
|
>>> with open("/usr/share/zoneinfo/America/New_York", "rb") as f:
|
||||||
|
... zone = ZoneInfo.from_file(f)
|
||||||
|
|
||||||
|
>>> str(zone)
|
||||||
|
''
|
||||||
|
|
||||||
|
Pickle serialization
|
||||||
|
####################
|
||||||
|
|
||||||
|
There are two reasonable options for the pickling behavior of ``ZoneInfo``
|
||||||
|
objects: serialize the key when available and reconstruct the object from
|
||||||
|
the files on disk during deserialization, or serialize all the data in the
|
||||||
|
object (including all transitions). This PEP proposes to choose the *second*
|
||||||
|
behavior, and unconditionally serialize all transition data.
|
||||||
|
|
||||||
|
The first behavior makes for much smaller files, but may result in different
|
||||||
|
behavior if the object is unpickled in an environment with a different version
|
||||||
|
of the time zone database. For example, a pickle for
|
||||||
|
``ZoneInfo("Asia/Qostanay")`` generated from version 2019c of the database
|
||||||
|
would fail to deserialize in an environment with version 2018a, since the
|
||||||
|
``"Asia/Qostanay"`` zone was added in 2018h. More subtle failures are also
|
||||||
|
possible if offsets or the timing of offset changes has changed between the two
|
||||||
|
versions.
|
||||||
|
|
||||||
|
Serializing only the key would also fail for objects created from a file
|
||||||
|
without specifying a key, and so a fallback mechanism serializing all
|
||||||
|
transitions would need to be provided anyway, bringing additional maintenance
|
||||||
|
burdens.
|
||||||
|
|
||||||
|
There are many other failures that can occur when using ``pickle`` to send
|
||||||
|
objects between non-identical environments, but nevertheless it is still
|
||||||
|
commonly done, and so it seems that the benefit of smaller file sizes is likely
|
||||||
|
outweighed by the costs.
|
||||||
|
|
||||||
|
|
||||||
|
.. _data-sources:
|
||||||
|
|
||||||
|
Sources for time zone data
|
||||||
|
--------------------------
|
||||||
|
|
||||||
|
One of the hardest challenges for IANA time zone support is keeping the data up
|
||||||
|
to date; between 1997 and 2020, there have been between 3 and 21 releases per
|
||||||
|
year, often in response to changes in time zone rules with little to no notice
|
||||||
|
(see [#timing-of-tz-changes]_ for more details). In order to keep up to date,
|
||||||
|
and to give the system administrator control over the data source, we propose
|
||||||
|
to use system-deployed time zone data wherever possible. However, not all
|
||||||
|
systems ship a publicly accessible time zone database (notably, Windows uses a
different system for managing time zones), and so ``zoneinfo`` can fall back to
an installable first-party package, ``tzdata``, available on
|
||||||
|
PyPI. If no system zoneinfo files are found but ``tzdata`` is installed, the
|
||||||
|
primary ``ZoneInfo`` constructor will use ``tzdata`` as the time zone source.
|
||||||
|
|
||||||
|
System time zone information
|
||||||
|
############################
|
||||||
|
|
||||||
|
Many Unix-like systems deploy time zone data by default, or provide a canonical
|
||||||
|
time zone data package (often called ``tzdata``, as it is on Arch Linux, RedHat
|
||||||
|
and Debian). Whenever possible, it would be preferable to defer to the system
|
||||||
|
time zone information, because this allows time zone information for all
|
||||||
|
language stacks to be updated and maintained in one place. Python distributors
|
||||||
|
are encouraged to ensure that time zone data is installed alongside Python
|
||||||
|
whenever possible (e.g. by declaring ``tzdata`` as a dependency for the
|
||||||
|
``python`` package).
|
||||||
|
|
||||||
|
The ``zoneinfo`` module will use a "search path" strategy analogous to the
|
||||||
|
``PATH`` environment variable or the ``sys.path`` variable in Python; the
|
||||||
|
``zoneinfo.TZPATH`` variable will be a read-only (see search-path-config_ for
|
||||||
|
more details), ordered list of time zone data locations to search. When
|
||||||
|
creating a ``ZoneInfo`` instance from a key, the zone file will be constructed
|
||||||
|
from the first data source on the path in which the key exists, so for example,
|
||||||
|
if ``TZPATH`` were::
|
||||||
|
|
||||||
|
TZPATH = (
|
||||||
|
"/usr/share/zoneinfo",
|
||||||
|
"/etc/zoneinfo"
|
||||||
|
)
|
||||||
|
|
||||||
|
and (although this would be very unusual) ``/usr/share/zoneinfo`` contained
|
||||||
|
only ``America/New_York`` and ``/etc/zoneinfo`` contained both
|
||||||
|
``America/New_York`` and ``Europe/Moscow``, then
|
||||||
|
``ZoneInfo("America/New_York")`` would be satisfied by
|
||||||
|
``/usr/share/zoneinfo/America/New_York``, while ``ZoneInfo("Europe/Moscow")``
|
||||||
|
would be satisfied by ``/etc/zoneinfo/Europe/Moscow``.
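The "first match wins" lookup described above can be sketched as follows. The function name ``find_tzfile`` is hypothetical, and the directory layout is recreated under a temporary directory so that the example does not depend on any real system paths. The sketch also skips any validation of ``key``, which a real implementation would need.

```python
import os
import tempfile


def find_tzfile(key, tzpath):
    """Return the first existing file for ``key`` on ``tzpath``, else None."""
    for root in tzpath:
        candidate = os.path.join(root, *key.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    share = os.path.join(tmp, "usr", "share", "zoneinfo")
    etc = os.path.join(tmp, "etc", "zoneinfo")
    layout = (
        (share, ["America/New_York"]),
        (etc, ["America/New_York", "Europe/Moscow"]),
    )
    # Create empty placeholder files mirroring the example in the text.
    for root, keys in layout:
        for key in keys:
            path = os.path.join(root, *key.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "wb").close()

    tzpath = (share, etc)
    new_york = find_tzfile("America/New_York", tzpath)
    moscow = find_tzfile("Europe/Moscow", tzpath)
    missing = find_tzfile("Mars/Olympus_Mons", tzpath)
```

``new_york`` resolves under the first directory, ``moscow`` under the second, and ``missing`` is ``None`` because the key is on neither path.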
|
||||||
|
|
||||||
|
At the moment, on Windows systems, the search path will default to empty,
|
||||||
|
because Windows does not officially ship a copy of the time zone database. On
|
||||||
|
non-Windows systems, the search path will default to a list of the most
|
||||||
|
commonly observed search paths. Although this is subject to change in future
|
||||||
|
versions, at launch the default search path will be::
|
||||||
|
|
||||||
|
TZPATH = (
|
||||||
|
"/usr/share/zoneinfo",
|
||||||
|
"/usr/lib/zoneinfo",
|
||||||
|
"/usr/share/lib/zoneinfo",
|
||||||
|
"/etc/zoneinfo",
|
||||||
|
)
|
||||||
|
|
||||||
|
This may be configured either at compile time or at runtime; more information on
configuration options is available in search-path-config_.
|
||||||
|
|
||||||
|
The ``tzdata`` Python package
|
||||||
|
#############################
|
||||||
|
|
||||||
|
In order to ensure easy access to time zone data for all end users, this PEP
|
||||||
|
proposes to create a data-only package ``tzdata`` as a fallback for when system
|
||||||
|
data is not available. The ``tzdata`` package would be distributed on PyPI as
|
||||||
|
a "first party" package, maintained by the CPython development team.
|
||||||
|
|
||||||
|
The ``tzdata`` package contains only data and metadata, with no public-facing
|
||||||
|
functions or classes. It will be designed to be compatible with both newer
|
||||||
|
``importlib.resources`` [#importlib_resources]_ access patterns and older
|
||||||
|
access patterns like ``pkgutil.get_data`` [#pkgutil_data]_.
|
||||||
|
|
||||||
|
While it is designed explicitly for the use of CPython, the ``tzdata`` package
|
||||||
|
is intended as a public package in its own right, and it may be used as an
|
||||||
|
"official" source of time zone data for third party Python packages.
|
||||||
|
|
||||||
|
.. _search-path-config:
|
||||||
|
|
||||||
|
Search path configuration
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
The time zone search path is very system-dependent, and sometimes even
|
||||||
|
application-dependent, and as such it makes sense to provide options to
|
||||||
|
customize it. This PEP provides for three such avenues for customization:
|
||||||
|
|
||||||
|
1. Global configuration via a compile-time option
|
||||||
|
2. Per-run configuration via environment variables
|
||||||
|
3. Runtime configuration change via a ``set_tzpath`` function
|
||||||
|
|
||||||
|
Compile-time options
|
||||||
|
####################
|
||||||
|
|
||||||
|
It is most likely that downstream distributors will know exactly where their
|
||||||
|
system time zone data is deployed, and so a compile-time option
|
||||||
|
``PYTHONTZPATH`` will be provided to set the default search path.
|
||||||
|
|
||||||
|
The ``PYTHONTZPATH`` option should be a string delimited by ``os.pathsep``,
|
||||||
|
listing possible locations for the time zone data to be deployed (e.g.
|
||||||
|
``/usr/share/zoneinfo``).
|
||||||
|
|
||||||
|
Environment variables
|
||||||
|
#####################
|
||||||
|
|
||||||
|
When initializing ``TZPATH`` (and whenever ``set_tzpath`` is called with no
|
||||||
|
arguments), the ``zoneinfo`` module will use two environment variables,
|
||||||
|
``PYTHONTZPATH`` and ``PYTHONTZPATH_APPEND``, if they exist, to set the search
|
||||||
|
path.
|
||||||
|
|
||||||
|
Both are ``os.pathsep`` delimited strings. ``PYTHONTZPATH`` *replaces* the
|
||||||
|
default time zone path, whereas ``PYTHONTZPATH_APPEND`` appends to the end of
|
||||||
|
the time zone path.
|
||||||
|
|
||||||
|
Some examples of the proposed semantics::
|
||||||
|
|
||||||
|
$ python print_tzpath.py
|
||||||
|
("/usr/share/zoneinfo",
|
||||||
|
"/usr/lib/zoneinfo",
|
||||||
|
"/usr/share/lib/zoneinfo",
|
||||||
|
"/etc/zoneinfo")
|
||||||
|
|
||||||
|
$ PYTHONTZPATH="/etc/zoneinfo:/usr/share/zoneinfo" python print_tzpath.py
|
||||||
|
("/etc/zoneinfo",
|
||||||
|
"/usr/share/zoneinfo")
|
||||||
|
|
||||||
|
$ PYTHONTZPATH="" python print_tzpath.py
|
||||||
|
()
|
||||||
|
|
||||||
|
$ PYTHONTZPATH_APPEND="/my/directory" python print_tzpath.py
|
||||||
|
("/usr/share/zoneinfo",
|
||||||
|
"/usr/lib/zoneinfo",
|
||||||
|
"/usr/share/lib/zoneinfo",
|
||||||
|
"/etc/zoneinfo",
|
||||||
|
"/my/directory")
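The semantics shown in these examples can be sketched in a few lines of Python. The function name ``compute_tzpath`` is hypothetical, and the handling of the case where both variables are set at once (the appended entries follow the replaced path) is an assumption, since the examples above do not cover it.

```python
import os

DEFAULT_TZPATH = (
    "/usr/share/zoneinfo",
    "/usr/lib/zoneinfo",
    "/usr/share/lib/zoneinfo",
    "/etc/zoneinfo",
)


def compute_tzpath(environ, default=DEFAULT_TZPATH, sep=os.pathsep):
    """Build a search path tuple from a mapping of environment variables."""
    base = environ.get("PYTHONTZPATH")
    if base is None:
        paths = list(default)
    else:
        # An empty string replaces the default with an empty search path.
        paths = [p for p in base.split(sep) if p]
    extra = environ.get("PYTHONTZPATH_APPEND", "")
    paths.extend(p for p in extra.split(sep) if p)
    return tuple(paths)


unset = compute_tzpath({})
replaced = compute_tzpath(
    {"PYTHONTZPATH": os.pathsep.join(["/etc/zoneinfo", "/usr/share/zoneinfo"])}
)
emptied = compute_tzpath({"PYTHONTZPATH": ""})
appended = compute_tzpath({"PYTHONTZPATH_APPEND": "/my/directory"})
```

Passing the environment in as a mapping, rather than reading ``os.environ`` directly, keeps the sketch easy to test.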
|
||||||
|
|
||||||
|
``set_tzpath`` function
|
||||||
|
#######################
|
||||||
|
|
||||||
|
``zoneinfo`` provides a ``set_tzpath`` function that allows for changing the
|
||||||
|
search path at runtime.
|
||||||
|
|
||||||
|
.. code-block::
|
||||||
|
|
||||||
|
def set_tzpath(tzpaths: Optional[Sequence[Union[str, os.PathLike]]] = None) -> None:
|
||||||
|
...
|
||||||
|
|
||||||
|
When called with a sequence of paths, this function sets ``zoneinfo.TZPATH`` to
|
||||||
|
a tuple constructed from the desired value. When called with no arguments or
|
||||||
|
``None``, this function resets ``zoneinfo.TZPATH`` to the default
|
||||||
|
configuration.
|
||||||
|
|
||||||
|
This is likely to be primarily useful for (permanently or temporarily)
|
||||||
|
disabling the use of system time zone paths and forcing the module to use the
|
||||||
|
``tzdata`` package. It is not likely that ``set_tzpath`` will be a common
|
||||||
|
operation, save perhaps in test functions sensitive to time zone configuration,
|
||||||
|
but it seems preferable to provide an official mechanism for changing this
|
||||||
|
rather than allowing a proliferation of hacks around the immutability of
|
||||||
|
``TZPATH``.
|
||||||
|
|
||||||
|
.. caution::
|
||||||
|
|
||||||
|
Although changing ``TZPATH`` during a run is a supported operation, users
|
||||||
|
should be advised that doing so may occasionally lead to unusual semantics,
|
||||||
|
and when making design trade-offs greater weight will be afforded to using
|
||||||
|
a static ``TZPATH``, which is the much more common use case.
|
||||||
|
|
||||||
|
As noted in Constructors_, the primary ``ZoneInfo`` constructor employs a cache
|
||||||
|
to ensure that two identically-constructed ``ZoneInfo`` objects always compare
|
||||||
|
as identical (i.e. ``ZoneInfo(key) is ZoneInfo(key)``), and the nature of this
|
||||||
|
cache is implementation-defined. This means that the behavior of the
|
||||||
|
``ZoneInfo`` constructor may be unpredictably inconsistent in some situations
|
||||||
|
when used with the same ``key`` under different values of ``TZPATH``. For
|
||||||
|
example::
|
||||||
|
|
||||||
|
>>> set_tzpath(["/my/custom/tzdb"])
|
||||||
|
>>> a = ZoneInfo("My/Custom/Zone")
|
||||||
|
>>> set_tzpath()
|
||||||
|
>>> b = ZoneInfo("My/Custom/Zone")
|
||||||
|
>>> del a
|
||||||
|
>>> del b
|
||||||
|
>>> c = ZoneInfo("My/Custom/Zone")
|
||||||
|
|
||||||
|
In this example, ``My/Custom/Zone`` exists only in ``/my/custom/tzdb`` and
|
||||||
|
not on the default search path. In all implementations the constructor for
|
||||||
|
``a`` must succeed. It is implementation-defined whether the constructor for
|
||||||
|
``b`` succeeds, but if it does, it must be true that ``a is b``, because both
|
||||||
|
``a`` and ``b`` are references to the same key. It is also
|
||||||
|
implementation-defined whether the constructor for ``c`` succeeds.
|
||||||
|
Implementations of ``zoneinfo`` *may* return the object constructed in previous
|
||||||
|
constructor calls, or they may fail with an exception.
|
||||||
|
|
||||||
|
Backwards Compatibility
|
||||||
|
=======================
|
||||||
|
|
||||||
|
This will have no backwards compatibility issues as it will create a new API.
|
||||||
|
|
||||||
|
With only minor modification, a backport of the ``zoneinfo`` module with
support for Python 3.6+ could be created.
|
||||||
|
|
||||||
|
The ``tzdata`` package is designed to be "data only", and should support any
|
||||||
|
version of Python that it can be built for (including Python 2.7).
|
||||||
|
|
||||||
|
|
||||||
|
Security Implications
|
||||||
|
=====================
|
||||||
|
|
||||||
|
This will require parsing zoneinfo data from disk, mostly from system locations
|
||||||
|
but potentially from user-supplied data. Errors in the implementation
|
||||||
|
(particularly the C code) could cause security issues, but there is
|
||||||
|
no special risk relative to parsing other file types.
|
||||||
|
|
||||||
|
Reference Implementation
|
||||||
|
========================
|
||||||
|
|
||||||
|
An initial reference implementation is available at
|
||||||
|
https://github.com/pganssle/zoneinfo
|
||||||
|
|
||||||
|
This may eventually be converted into a backport for 3.6+.
|
||||||
|
|
||||||
|
Rejected Ideas
|
||||||
|
==============
|
||||||
|
|
||||||
|
Building a custom tzdb compiler
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
One major concern with the use of the TZif format is that it does not actually
|
||||||
|
contain enough information to always correctly determine the value to return
|
||||||
|
for ``tzinfo.dst()``. This is because for any given time zone offset, TZif
|
||||||
|
only marks the UTC offset and whether or not it represents a DST offset, but
|
||||||
|
``tzinfo.dst()`` returns the total amount of the DST shift, so that the
|
||||||
|
"standard" offset can be reconstructed from ``datetime.utcoffset() -
|
||||||
|
datetime.dst()``. The value to use for ``dst()`` can be determined by finding
|
||||||
|
the equivalent STD offset and calculating the difference, but the TZif format
|
||||||
|
does not specify which offsets form STD/DST pairs, and so heuristics must be
|
||||||
|
used to determine this.
|
||||||
|
|
||||||
|
One common heuristic — looking at the most recent standard offset — notably
|
||||||
|
fails in the case of the time zone changes in Portugal in 1992 and 1996, where
|
||||||
|
the "standard" offset was shifted by 1 hour during a DST transition, leading to
|
||||||
|
a transition from STD to DST status with no change in offset. In fact, it is
|
||||||
|
possible (though it has never happened) for a time zone to be created that is
|
||||||
|
permanently DST and has no standard offsets.
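The "most recent standard offset" heuristic, and the way it breaks down, can be sketched as follows. The function name ``guess_dst_offsets`` and the simplified input, a chronological list of ``(utcoffset_seconds, isdst)`` pairs, are assumptions made for illustration; this is not how any particular library implements it.

```python
from datetime import timedelta


def guess_dst_offsets(periods):
    """Guess dst() for each period from the most recent standard offset.

    ``periods`` is a chronological list of (utcoffset_seconds, isdst) pairs.
    """
    guesses = []
    last_std = None
    for utcoff, isdst in periods:
        if not isdst:
            last_std = utcoff
            guesses.append(timedelta(0))
        elif last_std is None:
            guesses.append(None)  # no standard offset seen yet
        else:
            guesses.append(timedelta(seconds=utcoff - last_std))
    return guesses


# A regular pattern: standard time, DST one hour ahead, standard time again.
regular = [(-18000, False), (-14400, True), (-18000, False)]
regular_guess = guess_dst_offsets(regular)

# A pattern shaped like the case described above: STD to DST with no change
# in UTC offset, because the standard offset itself moved at the same time.
shifted = [(3600, False), (3600, True), (0, False)]
shifted_guess = guess_dst_offsets(shifted)
```

For the regular pattern the guess for the DST period is one hour. For the shifted pattern the DST period is assigned a shift of zero, which is the kind of failure the text describes.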
|
||||||
|
|
||||||
|
Although this information is missing in the compiled TZif binaries, it is
|
||||||
|
present in the raw tzdb files, and it would be possible to parse this
|
||||||
|
information ourselves and create a more suitable binary format.
|
||||||
|
|
||||||
|
This idea was rejected for several reasons:
|
||||||
|
|
||||||
|
1. It precludes the use of any system-deployed time zone information, which is
|
||||||
|
usually present only in TZif format.
|
||||||
|
|
||||||
|
2. The raw tzdb format, while stable, is *less* stable than the TZif format;
|
||||||
|
some downstream tzdb parsers have already run into problems with old
|
||||||
|
deployments of their custom parsers becoming incompatible with recent tzdb
|
||||||
|
releases, leading to the creation of a "rearguard" format to ease the
|
||||||
|
transition. [#rearguard]_
|
||||||
|
|
||||||
|
3. Heuristics currently suffice in ``dateutil`` and ``pytz`` for all known time
|
||||||
|
zones, historical and present, and it is not very likely that new time zones
|
||||||
|
will appear that cannot be captured by heuristics — though it is somewhat
|
||||||
|
more likely that new rules that are not captured by the *current* generation
|
||||||
|
of heuristics will appear; in that case, bugfixes would be required to
|
||||||
|
accommodate the changed situation.
|
||||||
|
|
||||||
|
4. The ``dst()`` method's utility (and in fact the ``isdst`` parameter in TZif)
|
||||||
|
is somewhat questionable to start with, as almost all the useful information
|
||||||
|
is contained in the ``utcoffset()`` and ``tzname()`` methods, which are not
|
||||||
|
subject to the same problems.
|
||||||
|
|
||||||
|
In short, maintaining a custom tzdb compiler or compiled package adds
|
||||||
|
maintenance burdens to both the CPython dev team and system administrators, and
|
||||||
|
its main benefit is to address a hypothetical failure that would likely have
|
||||||
|
minimal real world effects were it to occur.
|
||||||
|
|
||||||
|
.. _why-no-default-tzdata:
|
||||||
|
|
||||||
|
Including ``tzdata`` in the standard library by default
|
||||||
|
-------------------------------------------------------
|
||||||
|
|
||||||
|
Although PEP 453 [#pep453-ensurepip]_, which introduced the ``ensurepip``
|
||||||
|
mechanism to CPython, provides a convenient template for a standard library
|
||||||
|
module maintained on PyPI, a potentially similar ``ensuretzdata`` mechanism is
|
||||||
|
somewhat less necessary, and would be complicated enough that it is considered
|
||||||
|
out of scope for this PEP.
|
||||||
|
|
||||||
|
Because the ``zoneinfo`` module is designed to use the system time zone data
|
||||||
|
wherever possible, the ``tzdata`` package is unnecessary (and may be
|
||||||
|
undesirable) on systems that deploy time zone data, and so it does not seem
|
||||||
|
critical to ship ``tzdata`` with CPython.
|
||||||
|
|
||||||
|
It is also not yet clear how these hybrid standard library / PyPI modules
|
||||||
|
should be updated (other than ``pip``, which has a natural mechanism for
updates and notifications), and since it is not critical to the operation of the
|
||||||
|
module, it seems prudent to defer any such proposal.
|
||||||
|
|
||||||
|
Incorporating Windows' native time zone support
|
||||||
|
-----------------------------------------------
|
||||||
|
|
||||||
|
Windows has a non-IANA source of time zone information, along with public APIs
|
||||||
|
for accessing the data. Theoretically these could be supported in the
|
||||||
|
``zoneinfo`` module, but in practice they would not map cleanly enough to TZif
|
||||||
|
files to provide a good platform-independent experience, and a specialized API
|
||||||
|
supporting Windows time zones is a niche enough concern that it would be better
|
||||||
|
provided by a third party package.
|
||||||
|
|
||||||
|
The current Windows system time zones are provided by ``tzres.dll``, which
|
||||||
|
contains a list of simple rules for either fixed offsets or time zones with 2
|
||||||
|
DST transitions per year (DST start and DST end). The rules use
|
||||||
|
Windows-specific names such as "Eastern Standard Time" as opposed to
|
||||||
|
"America/New_York", and they contain no historical data.
|
||||||
|
|
||||||
|
Even if it were simple to unambiguously map IANA time zones to a
|
||||||
|
Windows-specific time zone name, the lack of historical data makes
|
||||||
|
Windows-style time zones sufficiently different that they cannot be used as a
|
||||||
|
drop-in replacement for the IANA database. They are also restricted to either
|
||||||
|
0 or 2 DST transitions per year, occurring on a regular schedule. This means
|
||||||
|
that, for example, the "Africa/Casablanca" time zone cannot be accurately
|
||||||
|
represented using its Windows equivalent, because for many years Morocco has
|
||||||
|
observed Daylight Saving Time during the summer months *except* during Ramadan,
|
||||||
|
and thus has 4 transitions per year in years where Ramadan overlaps with the
|
||||||
|
DST period.
|
||||||
|
|
||||||
|
Considering there is no easy way to use Microsoft's preferred APIs to emulate
|
||||||
|
IANA time zone support, it is best left to third parties (or at least a
|
||||||
|
different PEP) to provide a dedicated Windows time zone support library. In
|
||||||
|
fact, the ``dateutil`` package already provides ``dateutil.tz.win``
|
||||||
|
[#dateutil-tzwin]_, which contains ``tzinfo`` classes utilizing Windows system
|
||||||
|
time zone data.
|
||||||
|
|
||||||
|
If Microsoft were to provide a public system for accessing IANA time zone data,
|
||||||
|
even if it were somewhat unusual compared to access patterns on Unix-like
|
||||||
|
systems, the ``zoneinfo`` module should add support for it.
|
||||||
|
|
||||||
|
Using a ``pytz``-like interface
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
A ``pytz``-like ([#pytz]_) interface was proposed in PEP 431 [#pep431]_, but
|
||||||
|
was ultimately withdrawn / rejected for lack of ambiguous datetime support.
|
||||||
|
PEP 495 [#pep495]_ added the ``fold`` attribute to address this problem, but
|
||||||
|
``fold`` obviates the need for ``pytz``'s non-standard ``tzinfo`` classes, and
|
||||||
|
so a ``pytz``-like interface is no longer necessary. [#fastest-footgun]_
|
||||||
|
|
||||||
|
The ``zoneinfo`` approach is more closely based on ``dateutil.tz``, which
|
||||||
|
implemented support for ``fold`` (including a backport to older versions) just
|
||||||
|
before the release of Python 3.6.
|
||||||
|
|
||||||
|
|
||||||
|
Open Issues
|
||||||
|
===========
|
||||||
|
|
||||||
|
Using the ``datetime`` module
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
One possible idea would be to add ``ZoneInfo`` to the ``datetime`` module,
|
||||||
|
rather than giving it its own separate module. In the current version of the
|
||||||
|
PEP, this has been resolved in favor of using a separate module, for the
|
||||||
|
reasons detailed below, but the use of a nested submodule ``datetime.zoneinfo``
|
||||||
|
is also under consideration.
|
||||||
|
|
||||||
|
Arguments against putting ``ZoneInfo`` directly into ``datetime``
|
||||||
|
#################################################################
|
||||||
|
|
||||||
|
The ``datetime`` module is already somewhat crowded, as it has many classes
|
||||||
|
with somewhat complex behavior — ``datetime.datetime``, ``datetime.date``,
|
||||||
|
``datetime.time``, ``datetime.timedelta``, ``datetime.timezone`` and
|
||||||
|
``datetime.tzinfo``. The module's implementation and documentation are already
|
||||||
|
quite complicated, and it is probably beneficial to try not to compound the
|
||||||
|
problem if it can be helped.
|
||||||
|
|
||||||
|
The ``ZoneInfo`` class is also in some ways different from all the other
|
||||||
|
classes provided by ``datetime``; the other classes are all intended to be
|
||||||
|
lean, simple data types, whereas the ``ZoneInfo`` class is more complex: it is
|
||||||
|
a parser for a specific format (TZif), a representation for the information
|
||||||
|
stored in that format and a mechanism to look up the information in well-known
|
||||||
|
locations in the system.
|
||||||
|
|
||||||
|
Finally, while it is true that someone who needs the ``zoneinfo`` module also
|
||||||
|
needs the ``datetime`` module, the reverse is not necessarily true: many people
|
||||||
|
will want to use ``datetime`` without ``zoneinfo``. Considering that
|
||||||
|
``zoneinfo`` will likely pull in additional, possibly more heavy-weight
|
||||||
|
standard library modules, it would be preferable to allow the two to be
|
||||||
|
imported separately — particularly if potential "tree shaking" distributions
|
||||||
|
are in Python's future. [#tree-shaking]_
|
||||||
|
|
||||||
|
In the final analysis, it makes sense to keep ``zoneinfo`` a separate module
|
||||||
|
with a separate documentation page rather than to put its classes and functions
|
||||||
|
directly into ``datetime``.
|
||||||
|
|
||||||
|
Using ``datetime.zoneinfo`` instead of ``zoneinfo``
|
||||||
|
###################################################
|
||||||
|
|
||||||
|
A more palatable configuration may be to nest ``zoneinfo`` as a module under
|
||||||
|
``datetime``, as ``datetime.zoneinfo``.
|
||||||
|
|
||||||
|
Arguments in favor of this:
|
||||||
|
|
||||||
|
1. It neatly namespaces ``zoneinfo`` together with ``datetime``
|
||||||
|
|
||||||
|
2. The ``timezone`` class is already in ``datetime``, and it may seem strange
|
||||||
|
that some time zones are in ``datetime`` and others are in a top-level
|
||||||
|
module.
|
||||||
|
|
||||||
|
3. As mentioned earlier, importing ``zoneinfo`` necessarily requires importing
|
||||||
|
``datetime``, so it is no imposition to require importing the parent module.
|
||||||
|
|
||||||
|
Arguments against this:
|
||||||
|
|
||||||
|
1. In order to avoid forcing all ``datetime`` users to import ``zoneinfo``, the
|
||||||
|
``zoneinfo`` module would need to be lazily imported, which means that
|
||||||
|
end-users would need to explicitly import ``datetime.zoneinfo`` (as opposed
|
||||||
|
to importing ``datetime`` and accessing the ``zoneinfo`` attribute on the
|
||||||
|
module). This is the way ``dateutil`` works (all submodules are lazily
|
||||||
|
imported), and it is a perennial source of confusion for end users.
|
||||||
|
|
||||||
|
This confusing requirement on end users can be avoided using a
|
||||||
|
module-level ``__getattr__`` and ``__dir__`` per PEP 562, but this would
|
||||||
|
add some complexity to the implementation of the ``datetime`` module. This
|
||||||
|
sort of behavior in modules or classes tends to confuse static analysis
|
||||||
|
tools, which may not be desirable for a library as widely used and critical
|
||||||
|
as ``datetime``.
|
||||||
|
|
||||||
|
2. Nesting the implementation under ``datetime`` would likely require
|
||||||
|
``datetime`` to be reorganized from a single-file module (``datetime.py``)
|
||||||
|
to a directory with an ``__init__.py``. This is a minor concern, but the
|
||||||
|
structure of the ``datetime`` module has been stable for many years, and it
|
||||||
|
would be preferable to avoid churn if possible.
|
||||||
|
|
||||||
|
This concern *could* be alleviated by implementing ``zoneinfo`` as
|
||||||
|
``_zoneinfo.py`` and importing it as ``zoneinfo`` from within ``datetime``,
|
||||||
|
but this does not seem desirable from an aesthetic or code organization
|
||||||
|
standpoint, and it would preclude the version of nesting where end users are
|
||||||
|
required to explicitly import ``datetime.zoneinfo``.
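The lazy-import approach mentioned in the first argument above can be sketched roughly as follows. This is an illustration of the PEP 562 mechanism only, not the proposed implementation: ``make_lazy_module`` and ``lazypkg`` are hypothetical names, and the standard library ``json`` module is used purely as a stand-in for the submodule that would be loaded on first access.

```python
import importlib
import types


def make_lazy_module(name, lazy_targets):
    """Build a module whose listed attributes are imported on first access.

    ``lazy_targets`` maps attribute names to importable module names.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Per PEP 562, this is called only when normal lookup fails.
        if attr in lazy_targets:
            loaded = importlib.import_module(lazy_targets[attr])
            setattr(module, attr, loaded)  # cache for later lookups
            return loaded
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    def __dir__():
        # Advertise lazy attributes even before they are loaded.
        return sorted(set(vars(module)) | set(lazy_targets))

    module.__getattr__ = __getattr__
    module.__dir__ = __dir__
    return module


# "json" stands in for the real submodule in this sketch.
lazypkg = make_lazy_module("lazypkg", {"zoneinfo": "json"})
```

Accessing ``lazypkg.zoneinfo`` triggers the import and caches the result, which is the kind of dynamic behavior that static analysis tools may struggle to follow.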
|
||||||
|
|
||||||
|
This PEP currently takes the position that on balance it would be best to use a
|
||||||
|
separate top-level ``zoneinfo`` module because the benefits of nesting are not
|
||||||
|
so great that they overwhelm the practical implementation concerns, but this
|
||||||
|
still requires some discussion.
|
||||||
|
|
||||||
|
|
||||||
|
Structure of the ``PYTHONTZPATH`` environment variables
|
||||||
|
========================================================
|
||||||
|
|
||||||
|
This PEP proposes two variables to set the time zone path: ``PYTHONTZPATH`` and
|
||||||
|
``PYTHONTZPATH_APPEND``. This is based on the assumption that the majority of
|
||||||
|
users who would want to manipulate the time zone path would want to fully
|
||||||
|
replace it (e.g. "I know exactly where my time zone data is"), and a smaller
|
||||||
|
fraction would want to use the standard time zone paths wherever possible, but
|
||||||
|
add additional locations (possibly containing custom time zones).
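A rough sketch of how the two variables might combine is shown below. The ``compute_tzpath`` helper, the placeholder value of ``DEFAULT_TZPATH``, and the behavior when both variables are set are all assumptions made for illustration; none of them is specified in this section.

```python
import os

# Placeholder default search path, used only for this sketch.
DEFAULT_TZPATH = ("/usr/share/zoneinfo", "/usr/lib/zoneinfo")


def compute_tzpath(environ, default=DEFAULT_TZPATH):
    """Return the time zone search path implied by the given environment."""

    def split(value):
        # Split a PATH-like string, ignoring empty entries.
        return tuple(p for p in value.split(os.pathsep) if p)

    replacement = environ.get("PYTHONTZPATH")
    # PYTHONTZPATH, if set, fully replaces the default path.
    base = split(replacement) if replacement is not None else tuple(default)
    # PYTHONTZPATH_APPEND adds locations after the base path.
    # Assumption: if both variables are set, the appended entries follow
    # the replacement path.
    extra = split(environ.get("PYTHONTZPATH_APPEND", ""))
    return base + extra
```

For example, ``compute_tzpath(os.environ)`` would evaluate the current process environment under these assumptions.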
|
||||||
|
|
||||||
|
There are several other schemes that were considered and weakly rejected:
|
||||||
|
|
||||||
|
1. Separate these into a ``DEFAULT_PYTHONTZPATH`` and ``PYTHONTZPATH``
|
||||||
|
variable, where ``PYTHONTZPATH`` would contain values to append (or prepend)
|
||||||
|
to the default time zone path, and ``DEFAULT_PYTHONTZPATH`` would *replace*
|
||||||
|
the default time zone path. This was rejected because it would likely lead
|
||||||
|
to user confusion if the primary use case is to replace rather than augment.
|
||||||
|
|
||||||
|
2. Use *only* the ``PYTHONTZPATH`` variable, but provide a custom special value
|
||||||
|
that represents the default time zone path, e.g. ``<<DEFAULT_TZPATH>>``, so
|
||||||
|
that, for example,
|
||||||
|
``PYTHONTZPATH=<<DEFAULT_TZPATH>>:/my/path`` could be used to append
|
||||||
|
``/my/path`` to the end of the time zone path.
|
||||||
|
|
||||||
|
This was rejected mainly because these sorts of special values are not
|
||||||
|
usually found in ``PATH``-like variables, and it would be hard to discover
|
||||||
|
mistakes in your implementation.
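For comparison, the rejected special-value scheme from the second item above might have been expanded along these lines. The ``expand_tzpath`` function is a hypothetical sketch, and the sentinel spelling is taken from the example in the text.

```python
import os

SENTINEL = "<<DEFAULT_TZPATH>>"


def expand_tzpath(value, default):
    """Expand a PATH-like string, substituting the default path for SENTINEL."""
    result = []
    for entry in value.split(os.pathsep):
        if entry == SENTINEL:
            # Splice the whole default path in at this position.
            result.extend(default)
        elif entry:
            result.append(entry)
    return tuple(result)
```

A misspelled sentinel would silently be treated as an ordinary (nonexistent) directory in this sketch, which illustrates how such mistakes could be hard to discover.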
|
||||||
|
|
||||||
|
Footnotes
|
||||||
|
=========
|
||||||
|
|
||||||
|
.. [a]
|
||||||
|
The claim that the vast majority of users only want a few types of time
|
||||||
|
zone is based on anecdotal impressions rather than anything remotely
|
||||||
|
scientific. As one data point, ``dateutil`` provides many time zone types,
|
||||||
|
but user support mostly focuses on these three types.
|
||||||
|
|
||||||
|
.. [b]
|
||||||
|
The statement that identically constructed ``ZoneInfo`` objects should be
|
||||||
|
identical objects may be violated if the user deliberately clears the time
|
||||||
|
zone cache.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
|
.. [#rfc6557]
|
||||||
|
RFC 6557: Procedures for Maintaining the Time Zone Database
|
||||||
|
https://tools.ietf.org/html/rfc6557
|
||||||
|
|
||||||
|
.. [#rfc8536]
|
||||||
|
RFC 8536: The Time Zone Information Format (TZif)
|
||||||
|
https://tools.ietf.org/html/rfc8536
|
||||||
|
|
||||||
|
.. [#nontransitive_comp]
|
||||||
|
Paul Ganssle: "A curious case of non-transitive datetime comparison"
|
||||||
|
(Published 15 February 2018)
|
||||||
|
https://blog.ganssle.io/articles/2018/02/a-curious-case-datetimes.html
|
||||||
|
|
||||||
|
.. [#fastest-footgun]
|
||||||
|
Paul Ganssle: "pytz: The Fastest Footgun in the West" (Published 19 March
|
||||||
|
2018) https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html
|
||||||
|
|
||||||
|
.. [#cldr]
|
||||||
|
CLDR: Unicode Common Locale Data Repository
|
||||||
|
http://cldr.unicode.org/#TOC-How-to-Use-
|
||||||
|
|
||||||
|
.. [#tzdb-wiki]
|
||||||
|
Wikipedia page for Tz database:
|
||||||
|
https://en.wikipedia.org/wiki/Tz_database
|
||||||
|
|
||||||
|
.. [#timing-of-tz-changes]
|
||||||
|
Code of Matt: "On the Timing of Time Zone Changes" (Matt Johnson-Pint, 23
|
||||||
|
April 2016) https://codeofmatt.com/on-the-timing-of-time-zone-changes/
|
||||||
|
|
||||||
|
.. [#rearguard]
|
||||||
|
tz mailing list: [PROPOSED] Support zi parsers that mishandle negative DST
|
||||||
|
offsets (Paul Eggert, 23 April 2018)
|
||||||
|
https://mm.icann.org/pipermail/tz/2018-April/026421.html
|
||||||
|
|
||||||
|
.. [#tree-shaking]
|
||||||
|
"Russell Keith-Magee: Python On Other Platforms" (15 May 2019, Jesse Jiryu
|
||||||
|
Davis)
|
||||||
|
https://pyfound.blogspot.com/2019/05/russell-keith-magee-python-on-other.html
|
||||||
|
|
||||||
|
.. [#pep453-ensurepip]
|
||||||
|
PEP 453: Explicit bootstrapping of pip in Python installations
|
||||||
|
https://www.python.org/dev/peps/pep-0453/
|
||||||
|
|
||||||
|
.. [#pep431]
|
||||||
|
PEP 431: Time zone support improvements
|
||||||
|
https://www.python.org/dev/peps/pep-0431/
|
||||||
|
|
||||||
|
.. [#pep495]
|
||||||
|
PEP 495: Local Time Disambiguation
|
||||||
|
https://www.python.org/dev/peps/pep-0495/
|
||||||
|
|
||||||
|
.. [#tzinfo]
|
||||||
|
``datetime.tzinfo`` documentation
|
||||||
|
https://docs.python.org/3/library/datetime.html#datetime.tzinfo
|
||||||
|
|
||||||
|
.. [#importlib_resources]
|
||||||
|
``importlib.resources`` documentation
|
||||||
|
https://docs.python.org/3/library/importlib.html#module-importlib.resources
|
||||||
|
|
||||||
|
.. [#pkgutil_data]
|
||||||
|
``pkgutil.get_data`` documentation
|
||||||
|
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
|
||||||
|
|
||||||
|
|
||||||
|
Other time zone implementations:
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
|
.. [#dateutil-tz]
|
||||||
|
``dateutil.tz``
|
||||||
|
https://dateutil.readthedocs.io/en/stable/tz.html
|
||||||
|
|
||||||
|
.. [#dateutil-tzwin]
|
||||||
|
``dateutil.tz.win``: Concrete time zone implementations wrapping Windows
|
||||||
|
time zones
|
||||||
|
https://dateutil.readthedocs.io/en/stable/tzwin.html
|
||||||
|
|
||||||
|
.. [#pytz]
|
||||||
|
``pytz``
|
||||||
|
http://pytz.sourceforge.net/
|
||||||
|
|
||||||
|
|
||||||
|
Copyright
|
||||||
|
=========
|
||||||
|
|
||||||
|
This document is placed in the public domain or under the
|
||||||
|
CC0-1.0-Universal license, whichever is more permissive.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
..
|
||||||
|
Local Variables:
|
||||||
|
mode: indented-text
|
||||||
|
indent-tabs-mode: nil
|
||||||
|
sentence-end-double-space: t
|
||||||
|
fill-column: 70
|
||||||
|
coding: utf-8
|
||||||
|
End:
|