Add PEP 410: Use decimal.Decimal type for timestamps

2012-02-03 01:06:18 +01:00 · 2012-02-03 01:06:18 +01:00 · 22e5bccdac
parent 634ee68583
commit 22e5bccdac
1 changed files with 260 additions and 0 deletions
--- a/pep-0410.txt
+++ b/pep-0410.txt
@@ -0,0 +1,260 @@
+PEP: 410
+Title: Use decimal.Decimal type for timestamps
+Version: $Revision$
+Last-Modified: $Date$
+Author: Victor Stinner <victor.stinner@haypocalc.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 01-February-2012
+Python-Version: 3.3
+
+
+Abstract
+========
+
+Python 3.3 introduced functions supporting nanosecond resolutions. Python 3.3
+only supports int or float to store timestamps, but these types cannot be used
+to store a timestamp with a nanosecond resolution.
+
+
+Motivation
+==========
+
+Python 2.3 introduced float timestamps to support subsecond resolutions.
+os.stat() uses float timestamps by default since Python 2.5.
+
+Python 3.3 introduced functions supporting nanosecond resolutions:
+
+ * os module: stat(), utimensat(), futimens()
+ * time module: clock_gettime(), clock_getres(), wallclock()
+
+The Python float type uses binary64 format of the IEEE 754 standard. With a
+resolution of 1 nanosecond (10^-9), float timestamps lose precision for values
+bigger than 2^24 seconds (194 days: 1970-07-14 for an Epoch timestamp).
+
+.. note::
+   With a resolution of 1 microsecond (10^-6), float timestamps lose precision
+   for values bigger than 2^33 seconds (272 years: 2242-03-16 for an Epoch
+   timestamp).
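+
+The nanosecond threshold can be checked with a short experiment. This is only
+an illustrative sketch: the variable names are invented for the example, and
+it assumes that float is IEEE 754 binary64, as stated above.
+

```python
from decimal import Decimal

NANOSECOND = 1e-9

# At 2^23 s the spacing between adjacent floats is 2^-29 s (about 1.9 ns),
# so adding one nanosecond still moves the value to the next float.
below = float(2 ** 23)
changed_below = (below + NANOSECOND) != below

# From 2^24 s onward the spacing is at least 2^-28 s (about 3.7 ns), and one
# nanosecond is less than half of it: the addition is rounded away.
at_threshold = float(2 ** 24)
lost_at_threshold = (at_threshold + NANOSECOND) == at_threshold

# A timestamp from early 2012, written with nanosecond digits.
text = "1328227578.123456789"
# Decimal(float) converts the binary64 value exactly, exposing the rounding
# error introduced by float(text).
float_error = abs(Decimal(float(text)) - Decimal(text))

print(changed_below, lost_at_threshold)
print(float_error > Decimal("1e-9"))   # float is off by more than 1 ns
print(str(Decimal(text)) == text)      # Decimal keeps every digit
```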
+
+
+Specification
+=============
+
+Add decimal.Decimal as a new type for timestamps. Add a *timestamp* optional
+argument to:
+
+ * os module: fstat(), fstatat(), lstat() and stat()
+ * time module: clock(), clock_gettime(), clock_getres(), time() and
+   wallclock()
+
+The *timestamp* argument is a type. Three types are supported:
+
+ * int
+ * float
+ * decimal.Decimal
+
+The float type is still used by default for backward compatibility.
+
+Support decimal.Decimal (without implicit conversion to float, to avoid loss
+of precision) in functions having timestamp arguments:
+
+ * datetime.datetime.fromtimestamp()
+ * time.gmtime(), time.localtime()
+ * os.utimensat(), os.futimens()
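+
+The intended behaviour of the *timestamp* argument can be emulated in pure
+Python. This is a sketch under stated assumptions, not the proposed
+implementation: current_time() is a hypothetical helper, and it uses
+time.time_ns() (available since Python 3.7) as its source of integer
+nanoseconds.
+

```python
import time
from decimal import Decimal


def current_time(timestamp=float):
    """Hypothetical stand-in for the proposed time.time(timestamp=type)."""
    ns = time.time_ns()  # integer nanoseconds since the Epoch
    if timestamp is int:
        return ns // 10 ** 9
    if timestamp is float:
        # Default type, kept for backward compatibility; may lose precision.
        return ns / 10 ** 9
    if timestamp is Decimal:
        # Exact: the quotient fits within the default 28-digit context.
        return Decimal(ns) / Decimal(10 ** 9)
    raise ValueError("unsupported timestamp type: %r" % (timestamp,))


print(type(current_time()).__name__)
print(type(current_time(int)).__name__)
print(type(current_time(Decimal)).__name__)
```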
+
+
+Backwards Compatibility
+=======================
+
+The default timestamp type is unchanged, so there is no impact on backward
+compatibility or on performance. The new timestamp type, decimal.Decimal, is
+only used when requested explicitly.
+
+
+Alternatives: Timestamp types
+=============================
+
+To support timestamps with a nanosecond resolution, five types were considered:
+
+ * 128 bits float
+ * decimal.Decimal
+ * datetime.datetime
+ * datetime.timedelta
+ * tuple of integers
+
+Criteria:
+
+ * Doing arithmetic on timestamps must be possible.
+ * Timestamps must be comparable.
+ * The type must have a resolution of at least 1 nanosecond (without losing
+   precision) or an arbitrary resolution.
+
+128 bits float
+--------------
+
+Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
+precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.
+
+128-bit floats are supported by GCC (4.3), Clang and ICC. The problem is that
+Visual C++ 2008 doesn't support them. Python must be portable and so cannot
+rely on a type only available on some platforms. Another example: GCC 4.3 does
+not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).
+
+Intel CPUs have an FPU supporting 80-bit floats, but SSE instructions do not
+support them. Other CPU vendors don't support this float size.
+
+There is also a license issue: GCC uses the MPFR library which is distributed
+under the GNU LGPL license. This license is incompatible with the Python
+license.
+
+datetime.datetime
+-----------------
+
+datetime.datetime only supports microsecond resolution, but can be enhanced
+to support nanosecond resolution.
+
+datetime.datetime has issues:
+
+- there is no easy way to convert it into "seconds since the epoch"
+- any broken-down time has issues of time stamp ordering in the
+  duplicate hour of switching from DST to normal time
+- time zone support is flaky-to-nonexistent in the datetime module
+
+datetime.timedelta
+------------------
+
+Like datetime.datetime, datetime.timedelta only supports microsecond
+resolution, but can be enhanced to support nanosecond resolution.
+
+Although datetime.timedelta meets most criteria, it was not selected because
+it is more complex than a simple number and is not accepted by functions
+taking timestamp inputs.
+
+
+decimal.Decimal
+---------------
+
+Decimal has arbitrary precision, supports arithmetic operations and is
+comparable. Functions taking float inputs accept Decimal directly: the Decimal
+is converted implicitly to float, even though the conversion may lose
+precision.
+
+Using Decimal by default would cause a bootstrap issue because the module is
+implemented in Python, but making Decimal the default was never considered.
+
+The decimal module is implemented in Python and is slow, but there is a C
+reimplementation which is almost ready for inclusion in CPython.
+
+Tuple of integers
+-----------------
+
+Creating a tuple of integers is simple and fast, but arithmetic operations
+cannot be done directly on tuples. For example, (2, 1) - (2, 0) fails with a
+TypeError.
+
+An integer fraction can be used to store any number without loss of precision
+with any resolution: (numerator: int, denominator: int). The timestamp value
+can be computed with a simple division: numerator / denominator.
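+
+The two points above can be illustrated as follows. This is a minimal sketch:
+fractions.Fraction is used here only as a convenient way to check exactness,
+it is not one of the types considered in this PEP, and the sample value is
+arbitrary.
+

```python
from fractions import Fraction

# Plain tuples do not support subtraction.
first, second = (2, 1), (2, 0)
try:
    first - second
except TypeError:
    tuple_arithmetic_failed = True
else:
    tuple_arithmetic_failed = False

# A (numerator, denominator) pair stores the value without loss.
numerator, denominator = 1328227578123456789, 10 ** 9
exact = Fraction(numerator, denominator)

# Exact rational arithmetic preserves a difference of one nanosecond.
later = Fraction(numerator + 1, denominator)
difference = later - exact

# The "simple division" produces a float, which rounds the value.
divided = numerator / denominator

print(tuple_arithmetic_failed)
print(difference == Fraction(1, 10 ** 9))
print(Fraction(divided) == exact)
```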
+
+For the C implementation, a variant can be used to avoid integer overflow
+because C types have a fixed size: (intpart: int, numerator: int, denominator:
+int), value = intpart + numerator / denominator. Still to avoid integer
+overflow in C types, numerator can be bigger than denominator while intpart can
+be zero.
+
+Other formats have been proposed:
+
+ * A: (sec, nsec): value = sec + nsec * 10 ** -9
+ * B: (intpart, floatpart, exponent): value = intpart + floatpart * 10 ** exponent
+ * C: (intpart, floatpart, base, exponent): value = intpart + floatpart * base ** exponent
+
+The format A only supports nanosecond resolution. Formats A and B lose
+precision if the clock frequency is not a power of 10. The format C has a
+similar issue.
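+
+The precision loss of format A can be shown with a clock whose frequency is
+not a power of 10. This is a sketch only: the frequency value and the helper
+ticks_to_sec_nsec() are assumptions made up for the example.
+

```python
from fractions import Fraction

FREQUENCY = 3579545  # hypothetical clock frequency in Hz, not a power of 10


def ticks_to_sec_nsec(ticks, frequency):
    """Convert a tick count to format A, (sec, nsec), truncating to 1 ns."""
    sec, remainder = divmod(ticks, frequency)
    nsec = remainder * 10 ** 9 // frequency
    return sec, nsec


ticks = 1
exact = Fraction(ticks, FREQUENCY)          # true duration of one tick
sec, nsec = ticks_to_sec_nsec(ticks, FREQUENCY)
format_a = sec + Fraction(nsec, 10 ** 9)    # value stored by format A

print(sec, nsec)
print(format_a == exact)          # the stored value differs from the true one
print(float(exact - format_a))    # size of the error, in seconds
```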
+
+
+Alternatives: API design
+========================
+
+Add a global flag to change the timestamp type
+----------------------------------------------
+
+A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
+can be added to set the timestamp type globally.
+
+A global flag may cause issues with libraries and applications expecting float
+instead of Decimal. A float cannot be converted implicitly to Decimal. The
+os.stat_float_times() case is different because an int can be converted
+implicitly to float.
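+
+The difference between the two cases can be seen in a few lines. This is an
+illustrative sketch with arbitrary sample values, showing what code written
+for float timestamps would hit if it suddenly received a Decimal.
+

```python
from decimal import Decimal

# int mixes with float silently: the int is converted implicitly.
int_plus_float = 1328227578 + 0.5

# Decimal refuses to mix with float in arithmetic.
try:
    Decimal("1328227578.123456789") + 0.5
except TypeError:
    decimal_plus_float_failed = True
else:
    decimal_plus_float_failed = False

print(type(int_plus_float).__name__)
print(decimal_plus_float_failed)
```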
+
+Add a protocol to create a timestamp
+------------------------------------
+
+Instead of hardcoding how timestamps are created, a new protocol can be added
+to create a timestamp from a fraction. time.time(timestamp=type) would call
+type.__from_fraction__(numerator, denominator) to create a timestamp object of
+the specified type.
+
+If the type doesn't support the protocol, a fallback can be used:
+type(numerator) / type(denominator).
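+
+A possible shape for this protocol is sketched below. Everything except the
+__from_fraction__ name and the fallback expression is an assumption:
+ExactTimestamp and make_timestamp() are hypothetical names introduced for the
+example.
+

```python
from fractions import Fraction


class ExactTimestamp:
    """Hypothetical timestamp type implementing the protocol."""

    def __init__(self, value):
        self.value = value

    @classmethod
    def __from_fraction__(cls, numerator, denominator):
        return cls(Fraction(numerator, denominator))


def make_timestamp(kind, numerator, denominator):
    """Build a timestamp of type `kind` from a fraction (hypothetical)."""
    factory = getattr(kind, "__from_fraction__", None)
    if factory is not None:
        return factory(numerator, denominator)
    # Fallback for types that do not implement the protocol.
    return kind(numerator) / kind(denominator)


stamp = make_timestamp(ExactTimestamp, 1328227578123456789, 10 ** 9)
print(stamp.value)
print(make_timestamp(float, 3, 2))
```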
+
+A variant is to use a "converter" callback to create a timestamp. Example
+creating a float timestamp::
+
+    def timestamp_to_float(numerator, denominator):
+        return float(numerator) / float(denominator)
+
+Common converters can be provided by time, datetime and other modules, or maybe
+a specific "hires" module. Users can define their own converters.
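+
+Other converters would follow the same pattern. The two functions below are
+hypothetical examples, not converters defined by this PEP. Note that Decimal
+division rounds to the precision of the current decimal context (28
+significant digits by default).
+

```python
from decimal import Decimal


def timestamp_to_decimal(numerator, denominator):
    """Hypothetical converter returning a Decimal timestamp."""
    return Decimal(numerator) / Decimal(denominator)


def timestamp_to_int(numerator, denominator):
    """Hypothetical converter returning whole seconds, rounded down."""
    return numerator // denominator


print(timestamp_to_decimal(1328227578123456789, 10 ** 9))
print(timestamp_to_int(1328227578123456789, 10 ** 9))
```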
+
+Such a protocol has a limitation: the structure of the data passed to the
+protocol or the callback has to be decided once and cannot be changed later.
+For example, adding a timezone or the absolute start of the timestamp (e.g.
+Epoch or unspecified start for monotonic clocks) would break the API.
+
+Add new fields to os.stat
+-------------------------
+
+It was proposed to add 3 fields to the os.stat() structure to get the
+nanoseconds of timestamps.
+
+Populating the extra fields is time consuming. If new fields are available by
+default, any call to os.stat() would be slower. If new fields are optional, the
+stat structure would have a variable number of fields, which can be surprising.
+
+Anyway, this approach does not help with the time module.
+
+Add a boolean argument
+----------------------
+
+Because we only need one new type, decimal.Decimal, a simple boolean flag
+can be added. For example, time.time(decimal=True) or time.time(hires=True).
+
+Add new functions
+-----------------
+
+Add new functions for each type. Examples:
+
+ * time.clock_decimal()
+ * time.time_decimal()
+ * os.stat_decimal()
+ * etc.
+
+Adding a new function for each function creating timestamps duplicates a lot
+of code.
+
+
+Links
+=====
+
+ * `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
+ * `Issue #13882: Add format argument for time.time(), time.clock(), ... to get a timestamp as a Decimal object <http://bugs.python.org/issue13882>`_
+ * `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+