PEP 615: Windows support, address first round of feedback (#1320)

* Address some of Petr's feedback

https://discuss.python.org/t/pep-615-support-for-the-iana-time-zone-database-in-the-standard-library/3468/3

* Update with description of update behavior

This is in response to Paul Eggert's insightful comments:

https://mm.icann.org/pipermail/tz/2020-February/028843.html

* Add note on leap seconds

Leap seconds are out of scope for the proposal, but several people have
asked about them, so it seems prudent to note this.

* Move Windows support to Open Issues

Based on comments sent privately by e-mail from Matt Johnson-Pint, who
suggested this approach.

* Make cache behavior explicit

Per Guido's comments, this was not previously explicit:

https://discuss.python.org/t/pep-615-support-for-the-iana-time-zone-database-in-the-standard-library/3468/10?u=pganssle

* Address Jelle's comment about type

* Update the pickle behavior to serialize by key

Per the discussion here (and the chain of posts that this replies to):
https://discuss.python.org/t/3468/17
Paul Ganssle 2020-02-27 15:16:57 -05:00 committed by GitHub
parent e709487ed6
commit 632742d2d7
1 changed file with 148 additions and 61 deletions

@@ -91,8 +91,11 @@ The initial design of the ``zoneinfo.ZoneInfo`` class has several constructors.
The primary constructor takes a single argument, ``key``, which is a string
indicating the name of a zone file in the system time zone database (e.g.
``"America/New_York"``, ``"Europe/London"``), and returns a ``ZoneInfo``
constructed from the first matching TZif file on the search path (see the
data-sources_ section for more details).
constructed from the first matching data source on the search path (see the
data-sources_ section for more details). All zone information must be eagerly
read from the data source (usually a TZif file) upon construction, and may
not change during the lifetime of the object (this restriction applies to all
``ZoneInfo`` constructors).
One somewhat unusual guarantee made by this constructor is that calls with
identical arguments must return *identical* objects. Specifically, for all
@@ -122,7 +125,8 @@ behavior for end users.
guarantee made here only requires that as long as two references exist to
the result of identical constructor calls, they must be references to the
same object. This is consistent with a reference counted cache where
``ZoneInfo`` objects are ejected when no references to them exist — it is
``ZoneInfo`` objects are ejected when no references to them exist (for
example, a cache implemented with a ``weakref.WeakValueDictionary``) — it is
allowed but not required or recommended to implement this with a "strong"
cache, where all ``ZoneInfo`` objects are kept alive indefinitely.
@@ -135,6 +139,15 @@ identical to the primary constructor, but returns a new object on each call.
This is likely most useful for testing purposes, or to deliberately induce
"different zone" semantics between datetimes with the same nominal time zone.
Even if an object constructed by this method would have been a cache miss, it
must not be entered into the cache; in other words, the following assertion
should always be true:
.. code-block::
>>> a = ZoneInfo.nocache(key)
>>> b = ZoneInfo(key)
>>> a is not b
True
.. code-block::
@@ -151,6 +164,29 @@ stream objects have mutable state and so determining whether two inputs are
identical is difficult or impossible, and it is likely that users constructing
from a file specifically want to load from that file and not a cache.
As with ``ZoneInfo.nocache``, objects constructed by this method must not be
added to the cache.
Behavior during data updates
############################
If a source of time zone data is updated during a run of the interpreter, it
will not invalidate any caches or modify any existing ``ZoneInfo`` objects, but
newly constructed ``ZoneInfo`` objects should come from the updated data
source.
This means that the point at which newly obtained ``ZoneInfo`` objects reflect
updated data depends primarily on the semantics of the caching behavior. The
only guaranteed way to get a ``ZoneInfo`` object from an updated data source is
to induce a cache miss, either by bypassing the cache and using
``ZoneInfo.nocache`` or by clearing the cache.
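The interaction between eager loading, the cache, and an updated data source can be sketched as follows. Everything here is hypothetical. ``DATABASE`` is an in-memory dict standing in for the TZif files on disk, ``ZoneSketch`` stands in for ``ZoneInfo``, and ``clear_cache`` is an illustrative name for whatever cache-clearing mechanism an implementation provides.

```python
import weakref

# Hypothetical in-memory stand-in for the time zone data source on disk.
DATABASE = {"Example/Zone": "transitions from data version 1"}


class ZoneSketch:
    """Hypothetical stand-in for ``ZoneInfo`` with a weak-value cache."""

    _cache = weakref.WeakValueDictionary()

    def __new__(cls, key):
        # Primary constructor: reuse the cached object if one is still alive.
        instance = cls._cache.get(key)
        if instance is None:
            instance = cls.nocache(key)
            cls._cache[key] = instance
        return instance

    @classmethod
    def nocache(cls, key):
        # Cache-bypassing constructor: always builds a new object, never caches it.
        instance = object.__new__(cls)
        instance.key = key
        # All data is read eagerly here and never re-read afterwards.
        instance.data = DATABASE[key]
        return instance

    @classmethod
    def clear_cache(cls):
        # Illustrative cache-clearing hook; existing objects are left untouched.
        cls._cache.clear()
```

Suppose ``DATABASE`` is modified after ``ZoneSketch("Example/Zone")`` has been constructed. That object, and any further cache hits, keep the old data. In contrast, ``ZoneSketch.nocache("Example/Zone")``, or a primary-constructor call made after ``ZoneSketch.clear_cache()``, sees the new data.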
.. note::
The specified cache behavior does not require that the cache be lazily
populated — it is consistent with the specification (though not
recommended) to eagerly pre-populate the cache with time zones that have
never been constructed.
.. _Representations:
@@ -185,31 +221,54 @@ should return the empty string::
Pickle serialization
####################
There are two reasonable options for the pickling behavior of ``ZoneInfo``
files: serialize the key when available and reconstruct the object from from
the files on disk during deserialization, or serialize all the data in the
object (including all transitions). This PEP proposes to choose the *second*
behavior, and unconditionally serialize all transition data.
Rather than serializing all transition data, ``ZoneInfo`` objects will be
serialized by key, and ``ZoneInfo`` objects constructed from raw files (even
those with a value for ``key`` specified) cannot be pickled.
The first behavior makes for much smaller files, but may result in different
behavior if the object is unpickled in an environment with a different version
of the time zone database. For example, a pickle for
``ZoneFile("Asia/Qostanay")`` generated from version 2019c of the database
would fail to deserialize in an environment with version 2018a, since the
``"Asia/Qostanay"`` zone was added in 2018h. More subtle failures are also
possible if offsets or the timing of offset changes has changed between the two
versions.
The pickling behavior of a ``ZoneInfo`` object depends on how it was constructed:
Serializing only the key would also fail for objects created from a file
without specifying a key, and so a fallback mechanism serializing all
transitions would need to be provided anyway, bringing additional maintenance
burdens.
1. ``ZoneInfo(key)``: When constructed with the primary constructor, a
``ZoneInfo`` object will be serialized by key, and when deserialized it will
be reconstructed with the primary constructor. It is thus expected to be the
same object as other references to the same time zone.
For example, if ``europe_berlin_pkl`` is a ``bytes`` object containing a
pickle constructed from ``ZoneInfo("Europe/Berlin")``, one would expect the
following behavior:
There are many other failures that can occur when using ``pickle`` to send
objects between non-identical environments, but nevertheless it is still
commonly done, and so it seems that the benefit of smaller file sizes is likely
outweighed by the costs.
.. code-block::
>>> a = ZoneInfo("Europe/Berlin")
>>> b = pickle.loads(europe_berlin_pkl)
>>> a is b
True
2. ``ZoneInfo.nocache(key)``: When constructed from the cache-bypassing
constructor, the ``ZoneInfo`` object will still be serialized by key, but
when deserialized, it will use the cache-bypassing constructor. If
``europe_berlin_pkl_nc`` is a ``bytes`` object containing a pickle
constructed from ``ZoneInfo.nocache("Europe/Berlin")``, one would expect the
following behavior:
.. code-block::
>>> a = ZoneInfo("Europe/Berlin")
>>> b = pickle.loads(europe_berlin_pkl_nc)
>>> a is b
False
3. ``ZoneInfo.from_file(fobj, /, key=None)``: When constructed from a file, the
``ZoneInfo`` object will raise an exception on pickling. If an end user
wants to pickle a ``ZoneInfo`` constructed from a file, it is recommended
that they use a wrapper type or a custom serialization function: either
serializing by key or storing the contents of the file object and
serializing that.
This method of serialization requires that the time zone data for the required
key be available on both the serializing and deserializing side, similar to the
way that references to classes and functions are expected to exist in both the
serializing and deserializing environments. It also means that no guarantees
are made about the consistency of results when unpickling a ``ZoneInfo``
pickled in an environment with a different version of the time zone data.
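The three pickling behaviors listed above can be sketched with ``__reduce__``. This is a hypothetical ``ZoneSketch`` class, not the reference implementation. It records which constructor built each object, and its ``from_file`` stores the raw bytes rather than parsing TZif data.

```python
import pickle
import weakref


class ZoneSketch:
    """Hypothetical stand-in for ``ZoneInfo`` that pickles by key."""

    _cache = weakref.WeakValueDictionary()

    def __new__(cls, key):
        # Primary constructor backed by a weak-value cache.
        instance = cls._cache.get(key)
        if instance is None:
            instance = cls._build(key, source="cache")
            cls._cache[key] = instance
        return instance

    @classmethod
    def nocache(cls, key):
        return cls._build(key, source="nocache")

    @classmethod
    def from_file(cls, fobj, /, key=None):
        instance = cls._build(key, source="file")
        instance.raw = fobj.read()  # stands in for parsing TZif data
        return instance

    @classmethod
    def _build(cls, key, source):
        instance = object.__new__(cls)
        instance.key = key
        instance._source = source
        return instance

    def __reduce__(self):
        if self._source == "cache":
            # Unpickling calls the primary constructor, so it can hit the cache.
            return (type(self), (self.key,))
        if self._source == "nocache":
            # Unpickling calls the cache-bypassing constructor instead.
            return (type(self).nocache, (self.key,))
        # Objects built from a file object refuse to be pickled.
        raise pickle.PicklingError("cannot pickle a zone constructed from a file")
```

Unpickling re-runs a constructor with the stored key. The deserializing side therefore needs data for that key, which is the trade-off described above.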
.. _data-sources:
@@ -232,7 +291,7 @@ System time zone information
############################
Many Unix-like systems deploy time zone data by default, or provide a canonical
time zone data package (often called ``tzdata``, as it is on Arch Linux, RedHat
time zone data package (often called ``tzdata``, as it is on Arch Linux, Fedora,
and Debian). Whenever possible, it would be preferable to defer to the system
time zone information, because this allows time zone information for all
language stacks to be updated and maintained in one place. Python distributors
@@ -359,7 +418,9 @@ search path at runtime.
.. code-block::
def set_tzpath(tzpaths: Optional[Sequence[Union[str, Pathlike]]]) -> None:
def set_tzpath(
tzpaths: Optional[Sequence[Union[str, os.PathLike]]] = None
) -> None:
...
When called with a sequence of paths, this function sets ``zoneinfo.TZPATH`` to
@@ -512,43 +573,17 @@ should be updated, (other than ``pip``, which has a natural mechanism for
updates and notifications) and since it is not critical to the operation of the
module, it seems prudent to defer any such proposal.
Incorporating Windows' native time zone support
-----------------------------------------------
Support for leap seconds
------------------------
Windows has a non-IANA source of time zone information, along with public APIs
for accessing the data. Theoretically these could be supported in the
``zoneinfo`` module, but in practice they would not map cleanly enough to TZif
files to provide a good platform-independent experience, and a specialized API
supporting Windows time zones is a niche enough concern that it would be better
provided by a third party package.
In addition to time zone offset and name rules, the IANA time zone database
also provides a source of leap second data. This is deemed out of scope because
``datetime.datetime`` currently has no support for leap seconds, and the
question of leap second data can be deferred until leap second support is
added.
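A quick check illustrates the limitation. ``datetime.datetime`` only accepts seconds in the range 0..59, so even the real leap second inserted at the end of 2016 (23:59:60 UTC) cannot be constructed. The helper name below is illustrative.

```python
from datetime import datetime, timezone


def can_represent_leap_second() -> bool:
    """Try to build 2016-12-31T23:59:60Z, an actual leap second."""
    try:
        datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    except ValueError:
        # datetime rejects any seconds value outside 0..59.
        return False
    return True


print(can_represent_leap_second())  # False
```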
The current Windows system time zones are provided by ``tzres.dll``, which
contains a list of simple rules for either fixed offsets or time zones with 2
DST transitions per year (DST start and DST end). The rules use
Windows-specific names such as "Eastern Standard Time" as opposed to
"America/New_York", and they contain no historical data.
Even if it were simple to unambiguously map IANA time zones to a
Windows-specific time zone name, the lack of historical data makes
Windows-style time zones sufficiently different that they cannot be used as a
drop-in replacement for the IANA database. They are also restricted to either
0 or 2 DST transitions per year, occurring on a regular schedule. This means
that, for example, the "Africa/Casablanca" time zone cannot be accurately
represented using its Windows equivalent, because for many years Morocco has
observed Daylight Saving Time during the summer months *except* during Ramadan,
and thus has 4 transitions per year in years where Ramadan overlaps with the
DST period.
Considering there is no easy way to use Microsoft's preferred APIs to emulate
IANA time zone support, it is best left to third parties (or at least a
different PEP) to provide a dedicated Windows time zone support library. In
fact, the ``dateutil`` package already provides ``dateutil.tz.win``
[#dateutil-tzwin]_, which contains ``tzinfo`` classes utilizing Windows system
time zone data.
If Microsoft were to provide a public system for accessing IANA time zone data,
even if it were somewhat unusual compared to access patterns on Unix-like
systems, the ``zoneinfo`` module should add support for it.
The first-party ``tzdata`` package should ship the leap second data, even if it
is not used by the ``zoneinfo`` module.
Using a ``pytz``-like interface
-------------------------------
@@ -684,6 +719,46 @@ There are several other schemes that were considered and weakly rejected:
usually found in ``PATH``-like variables, and it would be hard to discover
mistakes in your implementation.
Windows support via Microsoft's ICU API
=======================================
Windows does not ship the time zone database as TZif files, but as of Windows
10's 2017 Creators Update, Microsoft has provided an API for interacting with
the International Components for Unicode (ICU) project [#icu-project]_
[#ms-icu-documentation]_, which includes an API for accessing time zone data
sourced from the IANA time zone database [#icu-timezone-api]_.
Providing bindings for this would allow for a mostly seamless cross-platform
experience for users on sufficiently recent versions of Windows, even without
falling back to the ``tzdata`` package.
This is a promising area, but it is less mature than the remainder of the
proposal, and so there are several open issues with regard to Windows support:
1. None of the popular third-party time zone libraries provide support for ICU
(``dateutil``'s native Windows time zone support relies on legacy time zones
provided in the Windows Registry [#dateutil-tzwin]_, which would be
unsuitable as a drop-in replacement for TZif files), so this would need to
be developed *de novo* in the standard library, rather than first maturing
in the third-party ecosystem.
2. The most likely implementation for this would be to have ``TZPATH`` default
to empty on Windows and have a search path precedence of ``TZPATH`` > ICU
> ``tzdata``, but this prevents end users from forcing the use of ``tzdata``
by setting an empty ``TZPATH``.
Two possible solutions for this are:
1. Add a mechanism to disable ICU globally independent of setting
``TZPATH``.
2. Add a cross-platform mechanism to give ``tzdata`` the highest
precedence.
3. This is not part of the reference implementation and it is uncertain whether
it can be ready and vetted in time for the Python 3.9 feature freeze. It is
an open question whether a failure to implement native Windows support in
3.9 should defer the release of ``zoneinfo`` or if only the ICU-based
Windows support should be deferred.
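To make the precedence question in item 2 concrete, here is a hypothetical sketch of a lookup that tries ``TZPATH``, then ICU, then ``tzdata``. The loaders are placeholders (no real ICU bindings are assumed), and the ``use_icu`` flag illustrates the first proposed solution: a global switch independent of ``TZPATH``.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical loader type: takes a key, returns raw zone data or None if absent.
Loader = Callable[[str], Optional[bytes]]


def find_source(
    key: str,
    tzpath_loaders: Sequence[Loader],
    icu: Loader,
    tzdata: Loader,
    use_icu: bool = True,
) -> str:
    """Name the first source holding ``key``, in the order TZPATH > ICU > tzdata."""
    candidates: List[Tuple[str, Loader]] = [("TZPATH", load) for load in tzpath_loaders]
    if use_icu:  # hypothetical global switch (solution 1)
        candidates.append(("ICU", icu))
    candidates.append(("tzdata", tzdata))
    for name, load in candidates:
        if load(key) is not None:
            return name
    raise KeyError(key)
```

Consider an empty ``TZPATH`` (as proposed for Windows). ``find_source(key, [], icu, tzdata)`` returns ``"ICU"`` whenever ICU knows the key, so emptying ``TZPATH`` alone cannot force the use of ``tzdata``. Only an additional mechanism, such as ``use_icu=False`` here or giving ``tzdata`` the highest precedence, changes that.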
Footnotes
=========
@@ -764,6 +839,18 @@ References
``pkgutil.get_data`` documentation
https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data
.. [#icu-project]
ICU TimeZone classes
http://userguide.icu-project.org/datetime/timezone
.. [#ms-icu-documentation]
Microsoft documentation for International Components for Unicode (ICU)
`https://docs.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu- <https://docs.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu->`_
.. [#icu-timezone-api]
``icu::TimeZone`` class documentation
https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1TimeZone.html
Other time zone implementations:
--------------------------------