PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3


Abstract
========

Python 3.3 introduced functions supporting nanosecond resolutions. Python 3.3
only supports int or float to store timestamps, but these types cannot be
used to store a timestamp with a nanosecond resolution.


Motivation
==========

Python 2.3 introduced float timestamps to support subsecond resolutions.
os.stat() uses float timestamps by default since Python 2.5.

Python 3.3 introduced functions supporting nanosecond resolutions:

* os module: stat(), utimensat(), futimens()
* time module: clock_gettime(), clock_getres(), wallclock()

The Python float type uses the binary64 format of the IEEE 754 standard. With
a resolution of 1 nanosecond (10\ :sup:`-9`), float timestamps lose precision
for values bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch
timestamp).

.. note::
   With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose
   precision for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16
   for an Epoch timestamp).


Specification
=============

Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, supports arithmetic operations and is comparable.
Functions taking float inputs accept Decimal directly: Decimal is converted
implicitly to float, even if the conversion may lose precision.

Add a *timestamp* optional argument to:

* os module: fstat(), fstatat(), lstat() and stat()
* time module: clock(), clock_gettime(), clock_getres(), time() and
  wallclock()

The *timestamp* argument is a type. Three types are supported:

* int
* float
* decimal.Decimal

The float type is still used by default for backward compatibility.
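
As a rough illustration of how the three supported types differ, the sketch
below converts one nanosecond count into an int, a float and a Decimal
timestamp. It is not the proposed implementation: the helper name
``make_timestamp`` and the sample value are hypothetical, and only the
standard decimal module is used::

    from decimal import Decimal

    NS_PER_SEC = 10 ** 9

    def make_timestamp(nanoseconds, timestamp=float):
        """Convert an integer nanosecond count into the requested type.

        Hypothetical helper mimicking the *timestamp* argument; not a real API.
        """
        if timestamp is int:
            # Whole seconds only: the fractional part is dropped.
            return nanoseconds // NS_PER_SEC
        if timestamp is float:
            # binary64 cannot hold all 19 significant digits of the sample.
            return nanoseconds / NS_PER_SEC
        if timestamp is Decimal:
            # Exact for this value under the default 28-digit context.
            return Decimal(nanoseconds) / Decimal(NS_PER_SEC)
        raise ValueError("unsupported timestamp type: %r" % (timestamp,))

    ns = 1700000000123456789  # arbitrary sample value (assumption)
    as_int = make_timestamp(ns, int)
    as_float = make_timestamp(ns, float)
    as_decimal = make_timestamp(ns, Decimal)

    # Converting back to nanoseconds shows which types preserved the value.
    float_roundtrip = round(as_float * NS_PER_SEC)
    decimal_roundtrip = int(as_decimal * NS_PER_SEC)

With this sample value, the Decimal result converts back to the original
nanosecond count, whereas the float result does not.
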
Support decimal.Decimal (without implicit conversion to float, to avoid loss
of precision) in functions having timestamp arguments:

* datetime.datetime.fromtimestamp()
* time.gmtime(), time.localtime()
* os.utimensat(), os.futimens()

The os.stat_float_times() function is deprecated: use the timestamp=int
argument instead.

.. note::
   The decimal module is implemented in Python and is slow, but there is a C
   reimplementation which is almost ready for inclusion in CPython.


Backwards Compatibility
=======================

The default timestamp type is unchanged, so there is no impact on backward
compatibility or on performance. The new timestamp type, decimal.Decimal, is
only used when requested explicitly.


Alternatives: Timestamp types
=============================

To support timestamps with a nanosecond resolution, five types were
considered:

* 128 bits float
* decimal.Decimal
* datetime.datetime
* datetime.timedelta
* tuple of integers

Criteria:

* Doing arithmetic on timestamps must be possible.
* Timestamps must be comparable.
* The type must have a resolution of at least 1 nanosecond (without losing
  precision) or an arbitrary resolution.

128 bits float
--------------

Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.

128 bits float is supported by GCC (4.3), Clang and ICC compilers.

Python must be portable and so cannot rely on a type only available on some
platforms. For example, Visual C++ 2008 doesn't support 128 bits float,
whereas it is used to build the official Windows executables. Another example:
GCC 4.3 does not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).

Intel CPUs have an x87 FPU supporting 80-bit floats, but these are not
available through SSE instructions. Other CPU vendors don't support this
float size.

There is also a license issue: GCC uses the MPFR library for 128 bits float,
a library distributed under the GNU LGPL license.
This license is not compatible with the Python license.

datetime.datetime
-----------------

datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond.

datetime.datetime has issues:

- there is no easy way to convert it into "seconds since the epoch"
- any broken-down time has issues of time stamp ordering in the
  duplicate hour of switching from DST to normal time
- time zone support is flaky-to-nonexistent in the datetime module

datetime.datetime is also more complex than a simple number.

datetime.timedelta
------------------

Like datetime.datetime, datetime.timedelta only supports microsecond
resolution, but can be enhanced to support nanosecond.

Even if datetime.timedelta meets most criteria, it was not selected because
it is more complex than a simple number and is not accepted by functions
taking timestamp inputs.

.. _tuple-integers:

Tuple of integers
-----------------

Creating a tuple of integers is simple and fast, but arithmetic operations
cannot be done directly on tuples. For example, (2, 1) - (2, 0) fails with a
TypeError.

An integer fraction can be used to store any rational number without loss of
precision with any resolution: (numerator: int, denominator: int). The
timestamp value can be computed with a simple division:
numerator / denominator.

For the C implementation, a variant can be used to avoid integer overflow
because C types have a fixed size: (intpart: int, numerator: int,
denominator: int), value = intpart + numerator / denominator. Still to avoid
integer overflow in C types, numerator can be bigger than denominator while
intpart can be zero.

Other formats have been proposed:

* A: (sec, nsec): value = sec + nsec * 10\ :sup:`-9`
* B: (intpart, floatpart, exponent): value = intpart + floatpart *
  10\ :sup:`exponent`
* C: (intpart, floatpart, base, exponent): value = intpart + floatpart *
  base\ :sup:`exponent`

Format A only supports nanosecond resolution.
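
The fraction-based representation can be sketched with Python's standard
``fractions`` module. This is only an illustration: the helper name
``fraction_to_value`` and the sample numbers are assumptions, and the
(intpart, numerator, denominator) variant is shown in Python although it is
meant for fixed-size C types::

    from fractions import Fraction

    # Arithmetic is not defined on plain tuples.
    try:
        (2, 1) - (2, 0)
    except TypeError:
        tuple_subtraction_fails = True
    else:
        tuple_subtraction_fails = False

    def fraction_to_value(numerator, denominator, intpart=0):
        """Exact value of intpart + numerator / denominator.

        Hypothetical helper, used here only for illustration.
        """
        return intpart + Fraction(numerator, denominator)

    # A tick counter with a frequency of 3 ticks per second (made-up numbers).
    ticks, frequency = 10, 3
    exact = fraction_to_value(ticks, frequency)         # 10/3 seconds
    split = fraction_to_value(1, frequency, intpart=3)  # 3 + 1/3 seconds

    # Format A, (sec, nsec), must truncate the value to whole nanoseconds.
    sec, nsec = divmod(ticks * 10 ** 9 // frequency, 10 ** 9)
    format_a = sec + Fraction(nsec, 10 ** 9)

Here ``exact`` and ``split`` denote the same value, while ``format_a``
differs from it because one third of a second is not a whole number of
nanoseconds.
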
Formats A and B lose precision if the clock frequency does not divide a power
of 10, that is, if the frequency has a prime factor other than 2 and 5.

For some clocks, like ``QueryPerformanceCounter()`` on Windows, the frequency
is only known at runtime. The base and exponent have to be computed. If
computing the base and the exponent is too expensive (or not possible, e.g.
if the frequency is a prime number), exponent=1 can be used. Format C is
just a fraction if exponent=1.

The only advantage of these formats is a small optimization if the base is 2
for float or if the base is 10 for Decimal. In other cases, frequency =
base\ :sup:`exponent` must be computed again to convert a timestamp to float
or Decimal. Storing the frequency directly in the denominator is simpler.


Alternatives: API design
========================

Add a global flag to change the timestamp type
----------------------------------------------

A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set the timestamp type globally.

A global flag may cause issues with libraries and applications expecting
float instead of Decimal. A float cannot be converted implicitly to Decimal.
The os.stat_float_times() case is different because an int can be converted
implicitly to float.

Add a protocol to create a timestamp
------------------------------------

Instead of hardcoding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction. time.time(timestamp=type) would call
type.__from_fraction__(numerator, denominator) to create a timestamp object
of the specified type.

If the type doesn't support the protocol, a fallback can be used:
type(numerator) / type(denominator).

A variant is to use a "converter" callback to create a timestamp.
Example creating a float timestamp::

    def timestamp_to_float(numerator, denominator):
        # Convert both parts to float, then divide to get seconds.
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules, or
maybe a specific "hires" module. Users can define their own converters.

Such a protocol has a limitation: the structure of data passed to the
protocol or the callback has to be decided once and cannot be changed later.
For example, adding a timezone or the absolute start of the timestamp (e.g.
Epoch or unspecified start for monotonic clocks) would break the API.

The protocol proposal was regarded as excessive given the requirements, but
the specific syntax proposed (time.time(timestamp=type)) allows it to be
introduced later if compelling use cases are discovered.

.. note::
   Other formats can also be used instead of a fraction: see the
   `Tuple of integers`_ section.

Add new fields to os.stat
-------------------------

It was proposed to add 3 fields to the os.stat() structure to get the
nanoseconds of timestamps.

Populating the extra fields is time consuming. If new fields are available by
default, any call to os.stat() would be slower. If new fields are optional,
the stat structure would have a variable number of fields, which can be
surprising.

Anyway, this approach does not help with the time module.

Add a boolean argument
----------------------

Because we only need one new type, decimal.Decimal, a simple boolean flag
can be added. For example, time.time(decimal=True) or time.time(hires=True).

The boolean argument API was rejected because it is not "pythonic". Changing
the return type with a parameter value is preferred over a boolean parameter
(a flag).

Add new functions
-----------------

Add new functions for each type, examples:

* time.clock_decimal()
* time.time_decimal()
* os.stat_decimal()
* etc.

Adding a new function for each function creating timestamps duplicates a lot
of code.


Links
=====

* Issue #11457: os.stat(): add new fields to get timestamps as Decimal
  objects with nanosecond resolution
* Issue #13882: Add format argument for time.time(), time.clock(), ... to get
  a timestamp as a Decimal object
* [Python-Dev] Store timestamps as decimal.Decimal objects


Copyright
=========

This document has been placed in the public domain.