PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@haypocalc.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3


Abstract
========

Python 3.3 introduced functions supporting nanosecond resolution. Python 3.3
only supports int or float to store timestamps, but these types cannot be used
to store a timestamp with nanosecond resolution.


Motivation
==========

Python 2.3 introduced float timestamps to support subsecond resolutions.
os.stat() uses float timestamps by default since Python 2.5. Python 3.3
introduced functions supporting nanosecond resolutions:

 * os module: futimens(), utimensat()
 * time module: clock_gettime(), clock_getres(), monotonic(), wallclock()

os.stat() reads nanoseconds fields of the stat structure, but returns
timestamps as float.

The Python float type uses binary64 format of the IEEE 754 standard. With a
resolution of 1 nanosecond (10\ :sup:`-9`), float timestamps lose precision for values
bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch timestamp).
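
The rounding can be observed directly. The following is a minimal sketch; the
timestamp value (1 February 2012 UTC plus one nanosecond) is chosen only for
illustration::

    from decimal import Decimal

    # Epoch timestamp with a 1 ns fractional part, written as exact text.
    text = "1328054400.000000001"

    as_float = float(text)
    as_decimal = Decimal(text)

    # binary64 cannot hold the last digit: the nanosecond is rounded away.
    print(as_float == 1328054400.0)
    # Decimal keeps every digit, so the difference is exactly one nanosecond.
    print(as_decimal - Decimal("1328054400") == Decimal("0.000000001"))

Both comparisons print ``True``.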

Nanosecond resolution is required to set the exact modification time on
filesystems supporting nanosecond timestamps (e.g. ext4, btrfs, NTFS, ...).  It
also helps to compare the modification time of two files when checking which
one is newer. Examples: copy a file and its modification time using
shutil.copystat(), create a TAR archive with the tarfile module, manage a
mailbox with the mailbox module, etc.

An arbitrary resolution is preferred over a fixed resolution (like nanosecond)
so that the API does not have to change when a better resolution is required.
For example, the NTP protocol uses fractions of 2\ :sup:`-32` seconds
(approximately 2.3 x 10\ :sup:`-10` second), whereas the NTP protocol version
4 uses fractions of 2\ :sup:`-64` seconds (5.4 x 10\ :sup:`-20` second).

.. note::
   With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose
   precision for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16
   for an Epoch timestamp).

   With a resolution of 100 nanoseconds (10\ :sup:`-7`), float timestamps lose
   precision for values bigger than 2\ :sup:`29` seconds (17 years: 1987-01-05
   for an Epoch timestamp).


Specification
=============

Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, supports arithmetic operations and is comparable.
Functions taking float inputs support Decimal directly: the Decimal is
converted implicitly to float, even if the conversion may lose precision.

Add a *timestamp* optional argument to:

 * os module: fstat(), fstatat(), lstat() and stat()
 * time module: clock(), clock_gettime(), clock_getres(), time() and
   wallclock()

The *timestamp* argument is a type. Two types are supported:

 * float
 * decimal.Decimal

The float type is still used by default for backward compatibility.
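
The following sketch emulates the proposed argument with a hypothetical
helper, ``get_time()``. It assumes time.time_ns() (Python 3.7 and later) as an
exact source of nanoseconds; it is an illustration of the calling convention,
not the proposed implementation::

    import time
    from decimal import Decimal

    def get_time(timestamp=float):
        """Hypothetical stand-in for time.time(timestamp=type)."""
        ns = time.time_ns()  # exact integer count of nanoseconds
        if timestamp is float:
            # Default: same behaviour as today, with possible rounding.
            return ns / 10**9
        if timestamp is Decimal:
            # Shift the decimal point by 9 digits without losing precision.
            return Decimal(ns).scaleb(-9)
        raise ValueError("unsupported timestamp type: %r" % (timestamp,))

    print(get_time())                   # float, as before
    print(get_time(timestamp=Decimal))  # Decimal with 9 fractional digits

Passing the type itself keeps existing callers unchanged, because float
remains the default.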

Support decimal.Decimal (without implicit conversion to float, to avoid loss
of precision) in functions having timestamp arguments:

 * datetime.datetime.fromtimestamp()
 * time.gmtime(), time.localtime()
 * os.utimensat(), os.futimens()

The os.stat_float_times() function is deprecated: use an explicit cast using
int() instead.
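
A short sketch of the explicit cast, using a temporary file created only for
the demonstration::

    import os
    import tempfile

    with tempfile.NamedTemporaryFile() as tmp:
        st = os.stat(tmp.name)

    # Truncate the float timestamp to whole seconds explicitly, instead of
    # switching os.stat() to integer timestamps for the whole process.
    mtime_seconds = int(st.st_mtime)
    print(mtime_seconds)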

.. note::
   The decimal module is implemented in Python and is slow, but there is a C
   reimplementation which is almost ready for inclusion in CPython.


Backwards Compatibility
=======================

The default timestamp type is unchanged, so there is no impact on backward
compatibility, nor on performance. The new timestamp type,
decimal.Decimal, is only used when requested explicitly.


Alternatives: Timestamp types
=============================

To support timestamps with a nanosecond resolution, five types were considered:

 * 128 bits float
 * decimal.Decimal
 * datetime.datetime
 * datetime.timedelta
 * tuple of integers

Criteria:

 * Doing arithmetic on timestamps must be possible.
 * Timestamps must be comparable.
 * The type must have a resolution of at least 1 nanosecond (without losing
   precision) or an arbitrary resolution.

128 bits float
--------------

Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.

128 bits float is supported by GCC (4.3), Clang and ICC compilers. Python must
be portable and so cannot rely on a type only available on some platforms. For
example, Visual C++ 2008 doesn't support 128 bits float, whereas it is used
to build the official Windows executables. Another example: GCC 4.3 does not
support __float128 in 32-bit mode on x86 (but GCC 4.4 does).

Intel CPUs have an FPU (x87) supporting 80-bit floats, but not when using SSE
instructions.  Other CPU vendors don't support this float size.

There is also a license issue: GCC uses the MPFR library for 128 bits float,
a library distributed under the GNU LGPL license. This license is not
compatible with the Python license.

datetime.datetime
-----------------

Except for os.stat(), time.time() and
time.clock_gettime(time.CLOCK_REALTIME), all time functions have an
unspecified starting point and no timezone information, and so their results
cannot be converted to datetime.datetime.

datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond.

datetime.datetime has issues with timezone. For example, a datetime object
without timezone and a datetime with a timezone cannot be compared.
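
A minimal sketch of the comparison problem (the dates are arbitrary)::

    from datetime import datetime, timezone

    naive = datetime(2012, 2, 1, 12, 0)
    aware = datetime(2012, 2, 1, 12, 0, tzinfo=timezone.utc)

    try:
        naive < aware
    except TypeError as exc:
        # Ordering a naive datetime against an aware one is refused.
        comparison_error = exc
        print("cannot compare:", exc)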

datetime.datetime has ordering issues with daylight saving time (DST) in the
duplicate hour of switching from DST to normal time.

datetime.datetime is not as well integrated as Epoch timestamps: there is no
datetime.datetime.totimestamp() function. Most functions expecting timestamps
don't support datetime.datetime. For example, os.utime() expects a tuple of
Epoch timestamps.

datetime.timedelta
------------------

Like datetime.datetime, datetime.timedelta only supports microsecond
resolution, but can be enhanced to support nanosecond.

datetime.timedelta is not as well integrated as Epoch timestamps: some
functions don't accept this type as input. Converting a timedelta object to a
float (number of seconds) requires calling an explicit method,
timedelta.total_seconds(). Supporting timedelta would require changing every
function taking timestamps, whereas all functions supporting float already
accept Decimal because Decimal can be cast to float.
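
The difference can be sketched as follows. The example relies only on the
standard timedelta.total_seconds() method and on the fact that timedelta does
not define a float conversion::

    from datetime import timedelta
    from decimal import Decimal

    delta = timedelta(seconds=1, microseconds=500000)

    # timedelta needs an explicit method call to become a number of seconds.
    print(delta.total_seconds())

    try:
        float(delta)
    except TypeError:
        timedelta_is_castable = False
    else:
        timedelta_is_castable = True

    # Decimal, by contrast, can be passed wherever float() is applied.
    print(float(Decimal("1.5")))

    # Smallest difference a timedelta can represent: one microsecond.
    print(timedelta.resolution == timedelta(microseconds=1))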

.. _tuple-integers:

Tuple of integers
-----------------

Creating a tuple of integers is simple and fast, but arithmetic operations
cannot be done directly on tuple. For example, (2, 1) - (2, 0) fails with a
TypeError.
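
A short sketch, assuming a (seconds, nanoseconds) layout for the tuples::

    start = (2, 0)   # (seconds, nanoseconds)
    end = (2, 1)

    # Tuples compare element by element, so ordering works.
    print(start < end)

    try:
        elapsed = end - start
    except TypeError as exc:
        # Subtraction is not defined for tuples.
        elapsed = None
        print("subtraction failed:", exc)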

An integer fraction can be used to store any number without loss of precision
with any resolution: (numerator: int, denominator: int). The timestamp value
can be computed with a simple division: numerator / denominator.
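
As an illustration, the standard fractions module can stand in for such a
pair. The numerator and denominator below are arbitrary example values::

    from decimal import Decimal
    from fractions import Fraction

    # 1328054400 seconds plus 1 nanosecond, as an integer fraction.
    numerator, denominator = 1328054400000000001, 10**9
    exact = Fraction(numerator, denominator)

    # The same division carried out with each candidate timestamp type.
    as_float = numerator / denominator
    as_decimal = Decimal(numerator) / Decimal(denominator)

    print(Fraction(as_float) == exact)    # False: the float result is rounded
    print(Fraction(as_decimal) == exact)  # True: Decimal keeps the nanosecond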

For the C implementation, a variant can be used to avoid integer overflow
because C types have a fixed size: (intpart: int, numerator: int, denominator:
int), value = intpart + numerator / denominator. Still to avoid integer
overflow in C types, numerator can be bigger than denominator while intpart can
be zero.

Other formats have been proposed:

 * A: (sec, nsec): value = sec + nsec * 10\ :sup:`-9`
 * B: (intpart, floatpart, exponent): value = intpart + floatpart * 10\ :sup:`exponent`
 * C: (intpart, floatpart, base, exponent): value = intpart + floatpart * base\ :sup:`exponent`

Format A only supports nanosecond resolution. Formats A and B lose
precision if the clock period cannot be written exactly in base 10, that is,
if the clock frequency has a prime factor other than 2 and 5.
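
The loss can be sketched with a hypothetical clock frequency of 3,579,545 Hz
(an assumed value, used only because it has prime factors other than 2
and 5)::

    from fractions import Fraction

    frequency = 3579545   # assumed ticks per second
    ticks = 1             # one tick of the clock

    # Exact value of one tick, in seconds.
    exact = Fraction(ticks, frequency)

    # Format A: keep only whole nanoseconds (rounded down).
    nsec = ticks * 10**9 // frequency
    format_a = Fraction(nsec, 10**9)

    print(nsec)               # 279
    print(format_a == exact)  # False: part of the tick is lost

Storing (ticks, frequency) as a fraction avoids the rounding entirely.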

For some clocks, like ``QueryPerformanceCounter()`` on Windows, the frequency
is only known at runtime. The base and exponent have to be computed. If
computing the base and the exponent is too expensive (or not possible, e.g. if
the frequency is a prime number), exponent=1 can be used. The format (C) is
just a fraction if exponent=1.

The only advantage of these formats is a small optimization if the base is 2
for float or if the base is 10 for Decimal. In other cases, frequency = base\
:sup:`exponent` must be computed again to convert a timestamp to float or
Decimal. Storing the frequency directly in the denominator is simpler.

timespec structure
------------------

A resolution of one nanosecond is enough to support all current C functions. A
Timespec type can be added to store a timestamp with a nanosecond resolution.
Basic example supporting addition, subtraction and coercion to float::

    class timespec(tuple):
        def __new__(cls, sec, nsec):
            if not isinstance(sec, int):
                raise TypeError
            if not isinstance(nsec, int):
                raise TypeError
            asec, nsec = divmod(nsec, 10 ** 9)
            sec += asec
            obj = tuple.__new__(cls, (sec, nsec))
            obj.sec = sec
            obj.nsec = nsec
            return obj

        def __float__(self):
            return self.sec + self.nsec * 1e-9

        def total_nanoseconds(self):
            return self.sec * 10 ** 9 + self.nsec

        def __add__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_sum = self.total_nanoseconds() + other.total_nanoseconds()
            return timespec(*divmod(ns_sum, 10 ** 9))

        def __sub__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_diff = self.total_nanoseconds() - other.total_nanoseconds()
            return timespec(*divmod(ns_diff, 10 ** 9))

        def __str__(self):
            if self.sec < 0 and self.nsec:
                sec = abs(1 + self.sec)
                nsec = 10**9 - self.nsec
                return '-%i.%09u' % (sec, nsec)
            else:
                return '%i.%09u' % (self.sec, self.nsec)

        def __repr__(self):
            return '<timespec(%s, %s)>' % (self.sec, self.nsec)

The timespec type is similar to the `tuple of integers, format A
<tuple-integers>`_ type, except that it supports arithmetic.

The timespec type was rejected because it only supports nanosecond resolution.


Alternatives: API design
========================

Add a global flag to change the timestamp type
----------------------------------------------

A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set globally the timestamp type.

A global flag may cause issues with libraries and applications expecting float
instead of Decimal. A float cannot be converted implicitly to Decimal.  The
os.stat_float_times() case is different because an int can be converted
implicitly to float.

Add a protocol to create a timestamp
------------------------------------

Instead of hardcoding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction. time.time(timestamp=type) would call
type.__from_fraction__(numerator, denominator) to create a timestamp object of
the specified type.

If the type doesn't support the protocol, a fallback can be used:
type(numerator) / type(denominator).
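
A rough sketch of the dispatch described above. The helper name
``make_timestamp()`` and the ``ExactTimestamp`` class are hypothetical; only
the ``__from_fraction__`` name comes from the proposal::

    from decimal import Decimal
    from fractions import Fraction

    def make_timestamp(numerator, denominator, timestamp=float):
        """Build a timestamp of the requested type from a fraction."""
        factory = getattr(timestamp, "__from_fraction__", None)
        if factory is not None:
            # The type implements the protocol: let it build the object.
            return factory(numerator, denominator)
        # Fallback for types that do not implement the protocol.
        return timestamp(numerator) / timestamp(denominator)

    class ExactTimestamp(Fraction):
        """Hypothetical type implementing the protocol."""

        @classmethod
        def __from_fraction__(cls, numerator, denominator):
            return cls(numerator, denominator)

    print(make_timestamp(3, 2))                  # fallback with float
    print(make_timestamp(3, 2, Decimal))         # fallback with Decimal
    print(make_timestamp(1, 3, ExactTimestamp))  # protocol method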

A variant is to use a "converter" callback to create a timestamp. Example
creating a float timestamp::

    def timestamp_to_float(numerator, denominator):
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules, or maybe
a specific "hires" module. Users can define their own converters.

Such a protocol has a limitation: the structure of data passed to the protocol
or the callback has to be decided once and cannot be changed later. For
example, adding a timezone or the absolute start of the timestamp (e.g. Epoch
or unspecified start for monotonic clocks) would break the API.

The protocol proposal was considered excessive given the requirements, but
the specific syntax proposed (time.time(timestamp=type)) allows this to be
introduced later if compelling use cases are discovered.

.. note::
   Other formats can also be used instead of a fraction: see the `Tuple of integers
   <tuple-integers>`_ section.

Add new fields to os.stat
-------------------------

To get the creation, modification and access time of a file with a nanosecond
resolution, three fields can be added to os.stat() structure.

The new fields can be timestamps with nanosecond resolution (tuple of integers,
timespec structure, Decimal, etc.) or the nanosecond part of each timestamp.

If the new fields are timestamps with nanosecond resolution, populating the
extra fields would be time consuming. Any call to os.stat() would be slower,
even if os.stat() is only called to check if the file exists. A parameter can
be added to os.stat() to make these fields optional, but a structure with a
variable number of fields can be problematic.

If the new fields only contain the fractional part (nanoseconds), os.stat()
would be efficient. These fields would always be present and so set to zero if
the operating system does not support subsecond resolution. Splitting a
timestamp into two parts, seconds and nanoseconds, is similar to the `timespec
type <timespec>`_ and `tuple of integers <tuple-integers>`_, and so has the
same drawbacks.

Adding new fields to the os.stat() structure does not solve the nanosecond
issue in other modules (e.g. time).
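
For comparison only, Python 3.3 and later expose integer nanosecond fields
such as ``st_mtime_ns`` on the os.stat() result. The sketch below uses that
field to illustrate the "seconds plus nanoseconds" split discussed above; it
is not part of this proposal::

    import os
    import tempfile

    with tempfile.NamedTemporaryFile() as tmp:
        st = os.stat(tmp.name)

    # Split the integer nanosecond timestamp into (seconds, nanoseconds).
    sec, nsec = divmod(st.st_mtime_ns, 10**9)
    print(sec, nsec)

    # Arithmetic on the two parts has to be written by hand, which is the
    # drawback shared with the timespec type and the tuple of integers.
    total_ns = sec * 10**9 + nsec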

Add a boolean argument
----------------------

Because we only need one new type, decimal.Decimal, a simple boolean flag
can be added. For example, time.time(decimal=True) or time.time(hires=True).

Such a flag would require a hidden import, which is considered bad
practice.

The boolean argument API was rejected because it is not "pythonic". Selecting
the return type by passing the type itself is preferred over a boolean
parameter (a flag).

Add new functions
-----------------

Add new functions for each type, examples:

 * time.clock_decimal()
 * time.time_decimal()
 * os.stat_decimal()
 * os.stat_timespec()
 * etc.

Adding a new function for each function creating timestamps duplicates a lot
of code.


Links
=====

 * `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
 * `Issue #13882: Add format argument for time.time(), time.clock(), ... to get a timestamp as a Decimal object <http://bugs.python.org/issue13882>`_
 * `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_
 * `Issue #7652: Merge C version of decimal into py3k <http://bugs.python.org/issue7652>`_ (cdecimal)


Copyright
=========

This document has been placed in the public domain.