PEP: 495 Title: Local Time Disambiguation Version: $Revision$ Last-Modified: $Date$ Author: Alexander Belopolsky , Tim Peters Discussions-To: Datetime-SIG Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 02-Aug-2015 Abstract ======== This PEP adds a boolean member to the instances of ``datetime.time`` and ``datetime.datetime`` classes that can be used to differentiate between two moments in time for which local times are the same. .. sidebar:: US public service advertisement .. image:: pep-0495-daylightsavings.png :align: center :width: 95% Rationale ========= In the most world locations there have been and will be times when local clocks are moved back. [#]_ In those times, intervals are introduced in which local clocks show the same time twice in the same day. In these situations, the information displayed on a local clock (or stored in a Python datetime instance) is insufficient to identify a particular moment in time. The proposed solution is to add a boolean flag to the ``datetime`` instances that will distinguish between the two ambiguous times. .. [#] People who live in locations observing the Daylight Saving Time (DST) move their clocks back (usually one hour) every Fall. It is less common, but occasionally clocks can be moved back for other reasons. For example, Ukraine skipped the spring-forward transition in March 1990 and instead, moved their clocks back on July 1, 1990, switching from Moscow Time to Eastern European Time. In that case, standard (winter) time was in effect before and after the transition. Both DST and standard time changes may result in time shifts other than an hour. Terminology =========== When clocks are moved back, we say that a *fold* is created in time. When the clocks are moved forward, a *gap* is created. A local time that falls in the fold is called *ambiguous*. A local time that falls in the gap is called *missing*. Proposal ======== The "first" flag ---------------- We propose adding a boolean member called ``first`` to the instances of ``datetime.time`` and ``datetime.datetime`` classes. This member should have the value ``True`` for all instances except those that represent the second (chronologically) moment in time in an ambiguous case. [#]_ .. [#] An instance that has ``first=False`` in a non-ambiguous case is said to represent an invalid time (or is invalid for short), but users are not prevented from creating invalid instances by passing ``first=False`` to a constructor or to a ``replace()`` method. This is similar to the current situation with the instances that fall in the spring-forward gap. Such instances don't represent any valid time, but neither the constructors nor the ``replace()`` methods check whether the instances that they produce are valid. Moreover, this PEP specifies how various functions should behave when given an invalid instance. Affected APIs ------------- Attributes .......... Instances of ``datetime.time`` and ``datetime.datetime`` will get a new boolean attribute called "first." Constructors ............ The ``__new__`` methods of the ``datetime.time`` and ``datetime.datetime`` classes will get a new keyword-only argument called ``first`` with the default value ``True``. The value of the ``first`` argument will be used to initialize the value of the ``first`` attribute in the returned instance. Methods ....... The ``replace()`` methods of the ``datetime.time`` and ``datetime.datetime`` classes will get a new keyword-only argument called ``first``. It will behave similarly to the other ``replace()`` arguments: if the ``first`` argument is specified and given a boolean value, the new instance returned by ``replace()`` will have its ``first`` attribute set to that value. In CPython, a non-boolean value of ``first`` will raise a ``TypeError``, but other implementations may allow the value ``None`` to behave the same as when ``first`` is not given. If the ``first`` argument is not specified, the original value of the ``first`` attribute is copied to the result. C-API ..... Access macros will be defined to extract the value of ``first`` from ``PyDateTime_DateTime`` and ``PyDateTime_Time`` objects. .. code:: bool PyDateTime_GET_FIRST(PyDateTime_DateTime *o) Return the value of ``first`` as a C99 ``bool``. .. code:: bool PyDateTime_TIME_GET_FIRST(PyDateTime_Time *o) Return the value of ``first`` as a C99 ``bool``. Additional constructors will be defined that will take an additional boolean argument to specify the value of ``first`` in the created instance: .. code:: PyObject* PyDateTime_FromDateAndTimeAndFirst(int year, int month, int day, int hour, int minute, int second, int usecond, bool first) Return a ``datetime.datetime`` object with the specified year, month, day, hour, minute, second, microsecond and first. .. code:: PyObject* PyTime_FromTimeAndFirst(int hour, int minute, int second, int usecond, bool first) Return a ``datetime.time`` object with the specified hour, minute, second, microsecond and first. Affected Behaviors ------------------ What time is it? ................ The ``datetime.now()`` method called with no arguments, will set ``first=False`` when returning the second of the two ambiguous times in a fold. When called with a ``tzinfo`` argument, the value of the ``first`` will be determined by the ``tzinfo.fromutc()`` implementation. If an instance of the built-in ``datetime.timezone`` is passed as ``tzinfo``, the returned datetime instance will always have ``first=True``. Conversion from naive to aware .............................. The ``astimezone()`` method will now work for naive ``self``. The system local timezone will be assumed in this case and the ``first`` flag will be used to determine which local timezone is in effect in the ambiguous case. For example, on a system set to US/Eastern timezone:: >>> dt = datetime(2014, 11, 2, 1, 30) >>> dt.astimezone().strftime('%D %T %Z%z') '11/02/14 01:30:00 EDT-0400' >>> dt.replace(first=False).astimezone().strftime('%D %T %Z%z') '11/02/14 01:30:00 EST-0500' Conversion from POSIX seconds from EPOCH ........................................ The ``fromtimestamp()`` static method of ``datetime.datetime`` will set the ``first`` attribute appropriately in the returned object. For example, on a system set to US/Eastern timezone:: >>> datetime.fromtimestamp(1414906200) datetime.datetime(2014, 11, 2, 1, 30) >>> datetime.fromtimestamp(1414906200 + 3600) datetime.datetime(2014, 11, 2, 1, 30, first=False) Conversion to POSIX seconds from EPOCH ...................................... The ``timestamp()`` method of ``datetime.datetime`` will return different values for ``datetime.datetime`` instances that differ only by the value of their ``first`` attribute if and only if these instances represent an ambiguous or a missing time. When a ``datetime.datetime`` instance ``dt`` represents an ambiguous time, there are two values ``s0`` and ``s1`` such that:: datetime.fromtimestamp(s0) == datetime.fromtimestamp(s1) == dt In this case, ``dt.timestamp()`` will return the smaller of ``s0`` and ``s1`` values if ``dt.first == True`` and the larger otherwise. For example, on a system set to US/Eastern timezone:: >>> datetime(2014, 11, 2, 1, 30, first=True).timestamp() 1414906200.0 >>> datetime(2014, 11, 2, 1, 30, first=False).timestamp() 1414909800.0 When a ``datetime.datetime`` instance ``dt`` represents a missing time, there is no value ``s`` for which:: datetime.fromtimestamp(s) == dt but we can form two "nice to know" values of ``s`` that differ by the size of the gap in seconds. One is the value of ``s`` that would correspond to ``dt`` in a timezone where the UTC offset is always the same as the offset right before the gap and the other is the similar value but in a timezone the UTC offset is always the same as the offset right after the gap. The value returned by ``dt.timestamp()`` given a missing ``dt`` will be the larger of the two "nice to know" values if ``dt.first == True`` and the smaller otherwise. For example, on a system set to US/Eastern timezone:: >>> datetime(2015, 3, 8, 2, 30, first=True).timestamp() 1425799800.0 >>> datetime(2015, 3, 8, 2, 30, first=False).timestamp() 1425796200.0 Combining and splitting date and time ..................................... The ``datetime.datetime.combine()`` method will copy the value of the ``first`` attribute to the resulting ``datetime.datetime`` instance. The ``datetime.datetime.time()`` method will copy the value of the ``first`` attribute to the resulting ``datetime.time`` instance. Pickles ....... Pickle sizes for the ``datetime.datetime`` and ``datetime.time`` objects will not change. The ``first`` flag will be encoded in the first bit of the 5th byte of the ``datetime.datetime`` pickle payload or the 2nd byte of the datetime.time. In the `current implementation`_ these bytes are used to store minute value (0-59) and the first bit is always 0. Note that ``first=True`` will be encoded as 0 in the first bit and ``first=False`` as 1. (This change only affects pickle format. In the C implementation, the "first" member will get a full byte to store the actual boolean value.) .. _current implementation: https://hg.python.org/cpython/file/d3b20bff9c5d/Include/datetime.h#l17 Implementations of tzinfo in stdlib =================================== No new implementations of ``datetime.tzinfo`` abstract class are proposed in this PEP. The existing (fixed offset) timezones do not introduce ambiguous local times and their ``utcoffset()`` implementation will return the same constant value as they do now regardless of the value of ``first``. The basic implementation of ``fromutc()`` in the abstract ``datetime.tzinfo`` class will not change. It is currently not used anywhere in the stdlib because the only included ``tzinfo`` implementation (the ``datetime.timzeone`` class implementing fixed offset timezones) override ``fromutc()``. Guidelines for New tzinfo Implementations ========================================= Implementors of concrete ``datetime.tzinfo`` subclasses who want to support variable UTC offsets (due to DST and other causes) should follow these guidelines. Ignorance is Bliss ------------------ New implementations of ``utcoffset()``, ``tzname()`` and ``dst()`` methods should ignore the value of ``first`` unless they are called on the ambiguous or missing times. In the DST Fold --------------- New subclasses should override the base-class ``fromutc()`` method and implement it so that in all cases where two UTC times ``u1`` and ``u2`` (``u1`` <``u2``) correspond to the same local time ``fromutc(u1)`` will return an instance with ``first=True`` and ``fromutc(u2)`` will return an instance with ``first=False``. In all other cases the returned instance should have ``first=True``. On an ambiguous time introduced at the end of DST, the values returned by ``utcoffset()`` and ``dst()`` methods should be as follows +-----------------+----------------+------------------+ | | first=True | first=False | +=================+================+==================+ | utcoffset() | stdoff + dstoff| stdoff | +-----------------+----------------+------------------+ | dst() | dstoff | zero | +-----------------+----------------+------------------+ where ``stdoff`` is the standard (non-DST) offset, ``dstoff`` is the DST correction (typically ``dstoff = timedelta(hours=1)``) and ``zero = timedelta(0)``. Mind the DST Gap ---------------- On a missing time introduced at the start of DST, the values returned by ``utcoffset()`` and ``dst()`` methods should be as follows +-----------------+----------------+------------------+ | | first=True | first=False | +=================+================+==================+ | utcoffset() | stdoff | stdoff + dstoff | +-----------------+----------------+------------------+ | dst() | zero | dstoff | +-----------------+----------------+------------------+ Non-DST Folds and Gaps ---------------------- On ambiguous/missing times introduced by the change in the standard time offset, the ``dst()`` method should return the same value regardless of the value of ``first`` and the ``utcoffset()`` should return values according to the following table: +-----------------+----------------+-----------------------------+ | | first=True | first=False | +=================+================+=============================+ | ambiguous | oldoff | newoff = oldoff - delta | +-----------------+----------------+-----------------------------+ | missing | oldoff | newoff = oldoff + delta | +-----------------+----------------+-----------------------------+ where ``delta`` is the size of the fold or the gap. Temporal Arithmetic =================== The value of "first" will be ignored in all operations except those that involve conversion between timezones. [#]_ As a consequence, ``datetime.datetime`` or ``datetime.time`` instances that differ only by the value of ``first`` will compare as equal. Applications that need to differentiate between such instances should check the value of ``first`` or convert them to a timezone that does not have ambiguous times. The result of addition (subtraction) of a timedelta to (from) a datetime will always have ``first`` set to ``True`` even if the original datetime instance had ``first=False``. .. [#] Computing a difference between two aware datetime instances with different values of ``tzinfo`` involves an implicit timezone conversion. In this case, the result may depend on the value of the ``first`` flag in either of the instances, but only if the instance has ``tzinfo`` that accounts for the value of ``first`` in its ``utcoffset()`` method. Backward and Forward Compatibility ================================== This proposal will have little effect on the programs that do not read the ``first`` flag explicitly or use tzinfo implementations that do. The only visible change for such programs will be that conversions to and from POSIX timestamps will now round-trip correctly (up to floating point rounding). Programs that implemented a work-around to the old incorrect behavior may need to be modified. Pickles produced by older programs will remain fully forward compatible. Only datetime/time instances with ``first=False`` pickled in the new versions will become unreadable by the older Python versions. Pickles of instances with ``first=True`` (which is the default) will remain unchanged. Questions and Answers ===================== Why not call the new flag "isdst"? ---------------------------------- A non-technical answer ...................... * Alice: Bob - let's have a stargazing party at 01:30 AM tomorrow! * Bob: Should I presume initially that Daylight Saving Time is or is not in effect for the specified time? * Alice: Huh? ------- * Bob: Alice - let's have a stargazing party at 01:30 AM tomorrow! * Alice: You know, Bob, 01:30 AM will happen twice tomorrow. Which time do you have in mind? * Bob: I did not think about it, but let's pick the first. ------- A technical reason .................. While the ``tm_isdst`` field of the ``time.struct_time`` object can be used to disambiguate local times in the fold, the semantics of such disambiguation are completely different from the proposal in this PEP. The main problem with the ``tm_isdst`` field is that it is impossible to know what value is appropriate for ``tm_isdst`` without knowing the details about the time zone that are only available to the ``tzinfo`` implementation. Thus while ``tm_isdst`` is useful in the *output* of methods such as ``time.localtime``, it is cumbersome as an *input* of methods such as ``time.mktime``. If the programmer misspecified a non-negative value of ``tm_isdst`` to ``time.mktime``, the result will be time that is 1 hour off and since there is rarely a way to know anything about DST *before* a call to ``time.mktime`` is made, the only sane choice is usually ``tm_isdst=-1``. Unlike ``tm_isdst``, the proposed ``first`` flag has no effect on the interpretation of the datetime instance unless without that flag two (or no) interpretations are possible. Since it would be very confusing to have something called ``isdst`` that does not have the same semantics as ``tm_isdst``, we need a different name. Moreover, the ``datetime.datetime`` class already has a method called ``dst()`` and if we called ``first`` "isdst", we would necessarily have situations when "isdst" and ``bool(dst())`` values are different. Why "first"? ------------ This is a working name chosen initially because the obvious alternative ("second") conflicts with the existing attribute. It has since become clear that it is desirable to have a flag with the default value ``False`` and such that chronological ordering of disambiguated (datetime, flag) pairs would match their lexicographical order. The following alternative names have been proposed: **fold** Suggested by Guido van Rossum and favored by one (but disfavored by another) author. Has correct connotations and easy mnemonic rules, but at the same time does not invite unbased assumptions. **later** A close contender to "fold". One author dislikes it because it is confusable with equally fitting "latter," but in the age of auto-completion everywhere this is a small consideration. A stronger objection may be that in the case of missing time, we will have ``later=True`` instance converted to an earlier time by ``.astimezone(timezone.utc)`` that that with ``later=False``. Yet again, this can be interpreted as a desirable indication that the original time is invalid. **repeated** Did not receive any support on the mailing list. **ltdf** (Local Time Disambiguation Flag) - short and no-one will attempt to guess what it means without reading the docs. (Feel free to use it in discussions with the meaning ltdf=False is the earlier if you don't want to endorse any of the alternatives above.) Are two values enough? ---------------------- Several reasons have been raised to allow a ``None`` or -1 value for the ``first`` flag: backward compatibility, analogy with ``tm_isdst`` and strict checking for invalid times. Backward Compatibility ...................... This PEP provides only three ways for a program to discover that two otherwise identical datetime instances have different values of ``first``: (1) an explicit check of the ``first`` attribute; (2) if the instances are naive - conversion to another timezone using the ``astimezone()`` method; and (3) conversion to ``float`` using the ``timestamp()`` method. Since ``first`` is a new attribute, the first option is not available to the existing programs. Note that option (2) only works for naive datetimes that happen to be in a fold or a gap in the system time zone. In all other cases, the value of ``first`` will be ignored in the conversion unless the instances use a ``first``-aware ``tzinfo`` which would not be available in a pre-PEP program. Similarly, the ``astimezone()`` called on a naive instance will not be available in such program because ``astimezone()`` does not currently work with naive datetimes. This leaves us with only one situation where an existing program can start producing diferent results after the implementation of this PEP: when a ``datetime.timestamp()`` method is called on a naive datetime instance that happen to be in the fold or the gap. In the current implementation, the result is undefined. Depending on the system ``mktime`` implementation, the programs can see different results or errors in those cases. With this PEP in place, the value of timestamp will be well-defined in those cases but will depend on the value of the ``first`` flag. We consider the change in ``datetime.timestamp()`` method behavior a bug fix enabled by this PEP. The old behavior can still be emulated by the users who depend on it by writing ``time.mktime(dt.timetuple()) + 1e-6*dt.microsecond`` instead of ``dt.timestamp()``. Analogy with tm_isdst ..................... The ``time.mktime`` interface allows three values for the ``tm_isdst`` flag: -1, 0, and 1. As we explained above, -1 (asking ``mktime`` to determine whether DST is in effect for the given time from the rest of the fields) is the only choice that is useful in practice. With the ``first`` flag, however, ``datetime.timestamp()`` will return the same value as ``mktime`` with ``tm_isdst=-1`` in 99.98% of the time for most time zones with DST transitions. Moreover, ``tm_isdst=-1``-like behavior is specified *regardless* of the value of ``first``. It is only in the 0.02% cases (2 hours per year) that the ``datetime.timestamp()`` and ``mktime`` with ``tm_isdst=-1`` may disagree. However, even in this case, most of the ``mktime`` implementations will return the ``first=True`` or the ``first=False`` value even though relevant standards allow ``mktime`` to return -1 and set an error code in those cases. In other words, ``tm_isdst=-1`` behavior is not missing from this PEP. To the contrary, it is the only behavior provided in two different well-defined flavors. The behavior that is missing is when a given local hour is interpreted as a different local hour because of the misspecified ``tm_isdst``. For example, in the DST-observing time zones in the Northern hemisphere (where DST is in effect in June) one can get .. code:: >>> from time import mktime, localtime >>> t = mktime((2015, 6, 1, 12, 0, 0, -1, -1, 0)) >>> localtime(t)[:] (2015, 6, 1, 13, 0, 0, 0, 152, 1) Note that 12:00 was interpreted as 13:00 by ``mktime``. With the ``datetime.timestamp``, ``datetime.fromtimestamp``, it is currently guaranteed that .. code:: >>> t = datetime.datetime(2015, 6, 1, 12).timestamp() >>> datetime.datetime.fromtimestamp(t) datetime.datetime(2015, 6, 1, 12, 0) This PEP extends the same guarantee to both values of ``first``: .. code:: >>> t = datetime.datetime(2015, 6, 1, 12, first=True).timestamp() >>> datetime.datetime.fromtimestamp(t) datetime.datetime(2015, 6, 1, 12, 0) .. code:: >>> t = datetime.datetime(2015, 6, 1, 12, first=False).timestamp() >>> datetime.datetime.fromtimestamp(t) datetime.datetime(2015, 6, 1, 12, 0) Thus one of the suggested uses for ``first=-1`` -- to match the legacy behavior -- is not needed. Either choice of ``first`` will match the old behavior except in the few cases where the old behavior was undefined. Strict Invalid Time Checking ............................ Another suggestion was to use ``first=-1`` or ``first=None`` to indicate that the program truly has no means to deal with the folds and gaps and ``dt.utcoffset()`` should raise an error whenever ``dt`` represents an ambiguous or missing local time. The main problem with this proposal, is that ``dt.utcoffset()`` is used internally in situations where raising an error is not an option: for example, in dictionary lookups or list/set membership checks. So strict gap/fold checking behavior would need to be controlled by a separate flag, say ``dt.utcoffset(raise_on_gap=True, raise_on_fold=False)``. However, this functionality can be easily implemented in user code: .. code:: def utcoffset(dt, raise_on_gap=True, raise_on_fold=False): u = dt.utcoffset() v = dt.replace(first=not dt.first).utcoffset() if u == v: return u if (u < v) == dt.first: if raise_on_fold: raise AmbiguousTimeError else: if raise_on_gap: raise MissingTimeError return u Moreover, raising an error in the problem cases is only one of many possible solutions. An interactive program can ask the user for additional input, while a server process may log a warning and take an appropriate default action. We cannot possibly provide functions for all possible user requirements, but this PEP provides the means to implement any desired behavior in a few lines of code. Implementation ============== * Github fork: https://github.com/abalkin/cpython * Tracker issue: http://bugs.python.org/issue24773 Copyright ========= This document has been placed in the public domain. Picture Credit ============== This image is a work of a U.S. military or Department of Defense employee, taken or made as part of that person's official duties. As a work of the U.S. federal government, the image is in the public domain.