Add PEP 410: Use decimal.Decimal type for timestamps

Parent: 634ee68583
Commit: 22e5bccdac

PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@haypocalc.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3


Abstract
========

Python 3.3 introduced functions supporting nanosecond resolution. Python 3.3
only supports int or float to store timestamps, but these types cannot be used
to store a timestamp with nanosecond resolution.


Motivation
==========

Python 2.3 introduced float timestamps to support subsecond resolutions.
os.stat() has used float timestamps by default since Python 2.5.

Python 3.3 introduced functions supporting nanosecond resolutions:

* os module: stat(), utimensat(), futimens()
* time module: clock_gettime(), clock_getres(), wallclock()

The Python float type uses the binary64 format of the IEEE 754 standard. With a
resolution of 1 nanosecond (10^-9), float timestamps lose precision for values
bigger than 2^24 seconds (194 days: 1970-07-14 for an Epoch timestamp).

.. note::
   With a resolution of 1 microsecond (10^-6), float timestamps lose precision
   for values bigger than 2^33 seconds (272 years: 2242-03-16 for an Epoch
   timestamp).
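
The nanosecond limit can be checked directly. The following is a minimal
sketch, not part of the proposal; the variable names are chosen only for
illustration.

```python
from decimal import Decimal

# Just above 2**24 seconds, adjacent binary64 values are about 3.7 ns apart,
# so adding one nanosecond to a float timestamp leaves it unchanged.
seconds = 2 ** 24
float_result = float(seconds) + 1e-9
float_lost_ns = (float_result == float(seconds))

# Decimal keeps the nanosecond digit.
decimal_result = Decimal(seconds) + Decimal("0.000000001")
decimal_kept_ns = (decimal_result != Decimal(seconds))

print(float_lost_ns, decimal_kept_ns)
```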


Specification
=============

Add decimal.Decimal as a new type for timestamps. Add an optional *timestamp*
argument to:

* os module: fstat(), fstatat(), lstat() and stat()
* time module: clock(), clock_gettime(), clock_getres(), time() and
  wallclock()

The *timestamp* argument is a type. Three types are supported:

* int
* float
* decimal.Decimal

The float type is still used by default for backward compatibility.
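
The behaviour of the *timestamp* argument can be pictured with a small
stand-in. ``current_time()`` below is a hypothetical helper written for this
illustration, not the proposed ``time.time()`` itself. It assumes
``time.time_ns()``, available in Python 3.7 and later, as the source of the
nanosecond value.

```python
import time
from decimal import Decimal


def current_time(timestamp=float):
    """Hypothetical stand-in for the proposed time.time(timestamp=type)."""
    nanoseconds = time.time_ns()  # integer nanoseconds since the Epoch
    if timestamp is int:
        return nanoseconds // 10 ** 9
    if timestamp is float:
        # Default type, kept for backward compatibility
        return nanoseconds / 10 ** 9
    if timestamp is Decimal:
        # Shift the decimal point by nine digits; exact for values this size
        return Decimal(nanoseconds).scaleb(-9)
    raise ValueError("unsupported timestamp type: %r" % (timestamp,))


print(current_time(), current_time(int), current_time(Decimal))
```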

Support decimal.Decimal (without implicit conversion to float, to avoid loss of
precision) in functions having timestamp arguments:

* datetime.datetime.fromtimestamp()
* time.gmtime(), time.localtime()
* os.utimensat(), os.futimens()
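
One way to consume a Decimal timestamp without going through float is to split
it into integer parts first. ``datetime_from_decimal()`` is a hypothetical
helper for illustration only; it is not how the functions listed above are
specified to work, and it truncates to microseconds because datetime stores
nothing finer.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_FLOOR


def datetime_from_decimal(ts):
    """Build a UTC datetime from a Decimal timestamp without using float."""
    seconds = int(ts.to_integral_value(rounding=ROUND_FLOOR))
    fraction = ts - seconds                # exact Decimal in [0, 1)
    microseconds = int(fraction * 10 ** 6)  # digits below 1 us are dropped
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole + timedelta(microseconds=microseconds)


dt = datetime_from_decimal(Decimal("1328000000.123456789"))
print(dt.isoformat())
```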


Backwards Compatibility
=======================

The default timestamp type is unchanged, so there is no impact on backward
compatibility or on performance. The new timestamp type, decimal.Decimal, is
only used when requested explicitly.


Alternatives: Timestamp types
=============================

To support timestamps with a nanosecond resolution, five types were considered:

* 128 bits float
* decimal.Decimal
* datetime.datetime
* datetime.timedelta
* tuple of integers

Criteria:

* Doing arithmetic on timestamps must be possible.
* Timestamps must be comparable.
* The type must have a resolution of at least 1 nanosecond (without losing
  precision) or an arbitrary resolution.

128 bits float
--------------

Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.

128-bit floats are supported by GCC (4.3), Clang and ICC. The problem is that
Visual C++ 2008 doesn't support them. Python must be portable and so cannot
rely on a type only available on some platforms. Another example: GCC 4.3 does
not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).

Intel CPUs have an FPU supporting 80-bit floats, but SSE instructions do not
support them. Other CPU vendors don't support this float size.

There is also a license issue: GCC uses the MPFR library which is distributed
under the GNU LGPL license. This license is incompatible with the Python
license.

datetime.datetime
-----------------

datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond resolution.

datetime.datetime has issues:

- there is no easy way to convert it into "seconds since the epoch"
- any broken-down time has issues of time stamp ordering in the
  duplicate hour of switching from DST to normal time
- time zone support is flaky-to-nonexistent in the datetime module

datetime.timedelta
------------------

Like datetime.datetime, datetime.timedelta only supports microsecond
resolution, but can be enhanced to support nanosecond resolution.
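
The microsecond limit is easy to observe. A minimal sketch, with variable names
chosen only for illustration:

```python
from datetime import timedelta

# The smallest difference a timedelta can represent
smallest_step = timedelta.resolution

# One nanosecond, given as 0.001 microsecond, is rounded away
one_nanosecond = timedelta(microseconds=0.001)

print(smallest_step, one_nanosecond)
```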

Even though datetime.timedelta meets most criteria, it was not selected because
it is more complex than a simple number and is not accepted by functions taking
timestamp inputs.


decimal.Decimal
---------------

Decimal has arbitrary precision, supports arithmetic operations and is
comparable. Functions taking float inputs accept Decimal directly: Decimal is
converted implicitly to float, even if the conversion may lose precision.
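
These properties can be sketched with two timestamps that differ by two
nanoseconds. The values are made up for illustration.

```python
from decimal import Decimal

start = Decimal("1328000000.000000001")
end = Decimal("1328000000.000000003")

elapsed = end - start   # arithmetic keeps every digit
is_later = end > start  # timestamps are comparable

# Converting to float is possible but drops the nanosecond digits
as_float = float(end)
round_trip_exact = (Decimal(as_float) == end)

print(elapsed, is_later, round_trip_exact)
```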

Using Decimal by default would cause a bootstrap issue because the module is
implemented in Python, but using Decimal by default was not considered.

The decimal module is implemented in Python and is slow, but there is a C
reimplementation which is almost ready for inclusion in CPython.

Tuple of integers
-----------------

Creating a tuple of integers is simple and fast, but arithmetic operations
cannot be done directly on tuples. For example, (2, 1) - (2, 0) fails with a
TypeError.
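
The failure, and the manual arithmetic it forces on callers, can be sketched as
follows. The (sec, nsec) layout and the ``subtract_timestamps()`` helper are
assumptions made for this example.

```python
NSEC_PER_SEC = 10 ** 9

try:
    difference = (2, 1) - (2, 0)
except TypeError:
    difference = None  # tuples do not define subtraction


def subtract_timestamps(left, right):
    """Subtract two (sec, nsec) tuples and normalise the result."""
    total = (left[0] - right[0]) * NSEC_PER_SEC + (left[1] - right[1])
    return divmod(total, NSEC_PER_SEC)


print(difference, subtract_timestamps((2, 1), (2, 0)))
```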

An integer fraction can be used to store any number without loss of precision
with any resolution: (numerator: int, denominator: int). The timestamp value
can be computed with a simple division: numerator / denominator.

For the C implementation, a variant can be used to avoid integer overflow
because C types have a fixed size: (intpart: int, numerator: int, denominator:
int), value = intpart + numerator / denominator. Also to avoid integer
overflow in C types, the numerator can be bigger than the denominator while
intpart can be zero.
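
Both layouts describe the same exact value. The sketch below uses
``fractions.Fraction`` as a stand-in for the bare tuples; the helper names are
hypothetical.

```python
from fractions import Fraction


def from_fraction(numerator, denominator):
    """Value of the (numerator, denominator) format."""
    return Fraction(numerator, denominator)


def from_c_variant(intpart, numerator, denominator):
    """Value of the (intpart, numerator, denominator) variant."""
    return intpart + Fraction(numerator, denominator)


# The numerator may exceed the denominator while intpart stays at zero
same_value = from_c_variant(1, 500, 1000) == from_c_variant(0, 1500, 1000)
print(from_fraction(3, 2), same_value)
```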

Other formats have been proposed:

* A: (sec, nsec): value = sec + nsec * 10 ** -9
* B: (intpart, floatpart, exponent): value = intpart + floatpart * 10 ** exponent
* C: (intpart, floatpart, base, exponent): value = intpart + floatpart * base ** exponent

Format A only supports nanosecond resolution. Formats A and B lose
precision if the clock frequency is not a power of 10. Format C has a
similar issue.
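
The precision loss of format A can be sketched with a clock whose frequency is
not a power of 10. The 3 Hz frequency is a made-up value chosen to keep the
numbers small.

```python
from fractions import Fraction

TICKS_PER_SECOND = 3  # hypothetical clock frequency
tick = Fraction(1, TICKS_PER_SECOND)

# Format A: (sec, nsec); the nanosecond count has to be an integer
sec = tick.numerator // tick.denominator
nsec = int((tick - sec) * 10 ** 9)
format_a = (sec, nsec)

# Rebuilding the value shows what was truncated
rebuilt = Fraction(sec) + Fraction(nsec, 10 ** 9)
error = tick - rebuilt

print(format_a, error)
```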


Alternatives: API design
========================

Add a global flag to change the timestamp type
----------------------------------------------

A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set the timestamp type globally.

A global flag may cause issues with libraries and applications expecting float
instead of Decimal. A float cannot be converted implicitly to Decimal. The
os.stat_float_times() case is different because an int can be converted
implicitly to float.
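
The difference between the two cases can be sketched in a few lines. Code that
adds a float offset to a timestamp keeps working when it receives an int, but
not when it receives a Decimal.

```python
from decimal import Decimal

# int and float mix freely: the int is converted implicitly
int_plus_float = 1 + 0.5

# Decimal and float do not mix in arithmetic
try:
    mixed = Decimal("1.5") + 0.5
except TypeError:
    mixed = None

print(int_plus_float, mixed)
```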

Add a protocol to create a timestamp
------------------------------------

Instead of hardcoding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction. time.time(timestamp=type) would call
type.__from_fraction__(numerator, denominator) to create a timestamp object of
the specified type.

If the type doesn't support the protocol, a fallback can be used:
type(numerator) / type(denominator).
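
A minimal sketch of this dispatch, under stated assumptions:
``make_timestamp()`` and ``FractionTimestamp`` are hypothetical names, and no
standard type actually defines ``__from_fraction__``.

```python
from decimal import Decimal
from fractions import Fraction


def make_timestamp(numerator, denominator, timestamp=float):
    """Create a timestamp of the requested type from an integer fraction."""
    factory = getattr(timestamp, "__from_fraction__", None)
    if factory is not None:
        return factory(numerator, denominator)
    # Fallback for types that do not implement the protocol
    return timestamp(numerator) / timestamp(denominator)


class FractionTimestamp(Fraction):
    """Example type implementing the hypothetical protocol."""

    @classmethod
    def __from_fraction__(cls, numerator, denominator):
        return cls(numerator, denominator)


print(make_timestamp(3, 2),
      make_timestamp(3, 2, Decimal),
      make_timestamp(1, 3, FractionTimestamp))
```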

A variant is to use a "converter" callback to create a timestamp. Example
creating a float timestamp::

    def timestamp_to_float(numerator, denominator):
        # Convert both parts to float before dividing
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules, or maybe
a specific "hires" module. Users can define their own converters.

Such a protocol has a limitation: the structure of data passed to the protocol
or the callback has to be decided once and cannot be changed later. For
example, adding a timezone or the absolute start of the timestamp (e.g. Epoch
or unspecified start for monotonic clocks) would break the API.

Add new fields to os.stat
-------------------------

It was proposed to add 3 fields to the os.stat() structure to get the
nanoseconds of timestamps.

Populating the extra fields is time consuming. If new fields are available by
default, any call to os.stat() would be slower. If new fields are optional, the
stat structure would have a variable number of fields, which can be surprising.

Anyway, this approach does not help with the time module.

Add a boolean argument
----------------------

Because we only need one new type, decimal.Decimal, a simple boolean flag
can be added. For example, time.time(decimal=True) or time.time(hires=True).

Add new functions
-----------------

Add new functions for each type, examples:

* time.clock_decimal()
* time.time_decimal()
* os.stat_decimal()
* etc.

Adding a new function for each function creating timestamps duplicates a lot
of code.


Links
=====

* `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
* `Issue #13882: Add format argument for time.time(), time.clock(), ... to get a timestamp as a Decimal object <http://bugs.python.org/issue13882>`_
* `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_


Copyright
=========

This document has been placed in the public domain.