PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@haypocalc.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3
Abstract
========
Python 3.3 introduced functions supporting nanosecond resolutions. Python 3.3
only supports int or float to store timestamps, but these types cannot be used
to store a timestamp with a nanosecond resolution.
Motivation
==========
Python 2.3 introduced float timestamps to support subsecond resolutions.
os.stat() uses float timestamps by default since Python 2.5.
Python 3.3 introduced functions supporting nanosecond resolutions:
* os module: stat(), utimensat(), futimens()
* time module: clock_gettime(), clock_getres(), wallclock()
The Python float type uses the binary64 format of the IEEE 754 standard. With a
resolution of 1 nanosecond (10\ :sup:`-9`), float timestamps lose precision for values
bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch timestamp).
.. note::
   With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose precision
   for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16 for an Epoch
   timestamp).
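The precision loss can be observed directly. The following is only an illustrative sketch: the sample timestamp is arbitrary and is not taken from this PEP, and ``math.ulp()`` requires Python 3.9 or newer.

```python
import math
from decimal import Decimal

# An arbitrary Epoch timestamp (early 2012) carrying nanosecond digits.
text = "1328054400.123456789"

as_float = float(text)
as_decimal = Decimal(text)

# Spacing between adjacent binary64 values near this timestamp.
spacing = math.ulp(as_float)

# Decimal(float) converts exactly, so it exposes what the float really stores.
float_is_exact = Decimal(as_float) == as_decimal

print(spacing > 1e-9)    # True: adjacent floats are more than 1 ns apart
print(float_is_exact)    # False: the nanosecond digits were rounded away
```

For a present-day Epoch value, the gap between two adjacent floats is already far larger than one nanosecond, which is why the float type cannot carry this resolution.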
Specification
=============
Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, supports arithmetic operations and is comparable.
Functions that take float inputs accept Decimal directly: the Decimal is
converted implicitly to float, even if the conversion may lose precision.
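As a rough illustration of these properties, the sketch below uses two made-up timestamps that are 2 nanoseconds apart. The values are assumptions chosen for the example only.

```python
from decimal import Decimal

# Two hypothetical Epoch timestamps, 2 nanoseconds apart.
start = Decimal("1328054400.000000001")
end = Decimal("1328054400.000000003")

# Subtraction is exact: the difference is precisely 2 nanoseconds.
elapsed = end - start

# Decimal timestamps can be ordered like any other number.
is_later = end > start

# Converting to float is allowed but rounds to the nearest binary64 value,
# so these two distinct timestamps collapse to the same float.
collapsed = float(start) == float(end)

print(elapsed)      # 2E-9
print(is_later)     # True
print(collapsed)    # True
```

The last comparison shows the precision loss mentioned above: the conversion to float succeeds silently, but the two timestamps are no longer distinguishable.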
Add a *timestamp* optional argument to:
* os module: fstat(), fstatat(), lstat() and stat()
* time module: clock(), clock_gettime(), clock_getres(), time() and
wallclock()
The *timestamp* argument is a type; three types are supported:
* int
* float
* decimal.Decimal
The float type is still used by default for backward compatibility.
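The proposed argument could behave roughly as sketched below. ``current_time()`` is a hypothetical stand-in written for this example, not an existing function, and it relies on ``time.time_ns()`` (Python 3.7 or newer) as its clock source, which is an assumption of the sketch rather than part of this proposal.

```python
import time
from decimal import Decimal

def current_time(timestamp=float):
    """Hypothetical stand-in for the proposed time.time(timestamp=type)."""
    nanoseconds = time.time_ns()  # integer nanoseconds since the Epoch
    if timestamp is int:
        return nanoseconds // 10**9             # whole seconds
    if timestamp is float:
        return nanoseconds / 10**9              # default type, may lose precision
    if timestamp is Decimal:
        return Decimal(nanoseconds).scaleb(-9)  # exact, nanosecond resolution
    raise ValueError("unsupported timestamp type: %r" % (timestamp,))

print(type(current_time()))                    # <class 'float'>
print(type(current_time(timestamp=int)))       # <class 'int'>
print(type(current_time(timestamp=Decimal)))   # <class 'decimal.Decimal'>
```

Callers that do not pass the argument keep receiving a float, which matches the backward compatibility goal stated above.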
Support decimal.Decimal (without implicit conversion to float, to avoid loss of
precision) in functions having timestamp arguments:
* datetime.datetime.fromtimestamp()
* time.gmtime(), time.localtime()
* os.utimensat(), os.futimens()
os.stat_float_times() is deprecated: use the timestamp=int argument instead.
.. note::
   The decimal module is implemented in Python and is slow, but there is a C
   reimplementation which is almost ready for inclusion in CPython.
Backwards Compatibility
=======================
The default timestamp type is unchanged, so there is no impact on backward
compatibility or on performance. The new timestamp type,
decimal.Decimal, is only used when requested explicitly.
Alternatives: Timestamp types
=============================
To support timestamps with a nanosecond resolution, five types were considered:
* 128 bits float
* decimal.Decimal
* datetime.datetime
* datetime.timedelta
* tuple of integers
Criteria:
* Doing arithmetic on timestamps must be possible.
* Timestamps must be comparable.
* The type must have a resolution of at least 1 nanosecond (without losing
precision) or an arbitrary resolution.
128 bits float
--------------
Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.
128 bits float is supported by GCC (4.3), Clang and ICC compilers. Python must
be portable and so cannot rely on a type only available on some platforms. For
example, Visual C++ 2008 doesn't support 128 bits float, even though it is used
to build the official Windows executables. Another example: GCC 4.3 does not
support __float128 in 32-bit mode on x86 (but GCC 4.4 does).
Intel CPUs have an FPU (x87) supporting 80-bit floats, but SSE instructions do
not support them. Other CPU vendors don't support this float size.
There is also a license issue: GCC uses the MPFR library for 128 bits float, a
library distributed under the GNU LGPL license. This license is not compatible
with the Python license.
datetime.datetime
-----------------
datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond.
datetime.datetime has issues with timezone. For example, a datetime object
without timezone and a datetime with a timezone cannot be compared.
datetime.datetime has ordering issues with daylight saving time (DST) in the
duplicate hour of switching from DST to normal time.
datetime.datetime is not as well integrated as Epoch timestamps: some
functions don't accept this type as input. For example, os.utime() expects a
tuple of Epoch timestamps.
datetime.timedelta
------------------
Like datetime.datetime, datetime.timedelta only supports microsecond resolution,
but can be enhanced to support nanosecond.
datetime.timedelta is not as well integrated as Epoch timestamps: some
functions don't accept this type as input. For example, os.utime() expects a
tuple of Epoch timestamps.
.. _tuple-integers:
Tuple of integers
-----------------
Creating a tuple of integers is simple and fast, but arithmetic operations
cannot be done directly on tuples. For example, (2, 1) - (2, 0) fails with a
TypeError.
An integer fraction can be used to store any number without loss of precision
with any resolution: (numerator: int, denominator: int). The timestamp value
can be computed with a simple division: numerator / denominator.
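Both points can be checked with a short sketch. The numerator and denominator below are made-up values for illustration, and ``fractions.Fraction`` is used here only as a convenient way to show exact arithmetic on such a pair. It is not one of the types considered by this PEP.

```python
from decimal import Decimal
from fractions import Fraction

# Plain tuples do not support subtraction.
try:
    (2, 1) - (2, 0)
    tuple_arithmetic_works = True
except TypeError:
    tuple_arithmetic_works = False

# A hypothetical (numerator, denominator) pair: nanoseconds since the Epoch.
numerator, denominator = 1328054400123456789, 10**9

# The pair stores the value without loss; the target type decides the precision.
as_fraction = Fraction(numerator, denominator)
as_decimal = Decimal(numerator) / Decimal(denominator)
as_float = numerator / denominator  # rounded to binary64

print(tuple_arithmetic_works)   # False
print(as_decimal)               # 1328054400.123456789
```

Note that ``Decimal`` division is exact here only because the result fits in the default context precision of 28 digits.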
For the C implementation, a variant can be used to avoid integer overflow
because C types have a fixed size: (intpart: int, numerator: int, denominator:
int), value = intpart + numerator / denominator. Still to avoid integer
overflow in C types, numerator can be bigger than denominator while intpart can
be zero.
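A minimal sketch of this three-field variant follows. ``to_decimal()`` is a hypothetical helper and the clock values are invented for the example.

```python
from decimal import Decimal

def to_decimal(intpart, numerator, denominator):
    """Compute intpart + numerator / denominator as a Decimal.

    The division is rounded to the current decimal context precision when
    the quotient has no finite decimal expansion.
    """
    return Decimal(intpart) + Decimal(numerator) / Decimal(denominator)

# Hypothetical reading: 1328054400 s plus 5 ticks of a 1 MHz clock.
value = to_decimal(1328054400, 5, 10**6)

# The numerator may exceed the denominator while intpart is zero.
same_value = to_decimal(0, 1328054400 * 10**6 + 5, 10**6)

print(value)                 # 1328054400.000005
print(value == same_value)   # True
```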
Other formats have been proposed:
* A: (sec, nsec): value = sec + nsec * 10\ :sup:`-9`
* B: (intpart, floatpart, exponent): value = intpart + floatpart * 10\ :sup:`exponent`
* C: (intpart, floatpart, base, exponent): value = intpart + floatpart * base\ :sup:`exponent`
Format A only supports nanosecond resolution. Formats A and B lose
precision if the clock period (1 / frequency) cannot be written as a finite
decimal fraction, that is, if the clock frequency has a prime factor other than 2 and 5.
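Whether a given frequency is affected can be tested with integer arithmetic alone, since 1 / frequency has a finite decimal expansion only when the frequency has no prime factor other than 2 and 5. The helper names below are hypothetical and the frequencies are arbitrary examples.

```python
def tick_is_exact_in_decimal(frequency):
    """Return True if 1 / frequency has a finite decimal expansion."""
    if frequency <= 0:
        raise ValueError("frequency must be a positive integer")
    # Strip every factor of 2 and 5; anything left over is another prime factor.
    for prime in (2, 5):
        while frequency % prime == 0:
            frequency //= prime
    return frequency == 1

def tick_is_exact_in_nanoseconds(frequency):
    """Return True if 1 / frequency is a whole number of nanoseconds."""
    return 10**9 % frequency == 0

print(tick_is_exact_in_decimal(10**9))       # True: 1 ns ticks
print(tick_is_exact_in_decimal(1024))        # True: 1/1024 = 0.0009765625
print(tick_is_exact_in_nanoseconds(1024))    # False: needs 10 decimal digits
print(tick_is_exact_in_decimal(3))           # False: 1/3 = 0.333...
```

The 1024 Hz case shows the difference between the two formats: a decimal exponent can be chosen to represent the tick exactly, while a fixed nanosecond field cannot.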
For some clocks, like ``QueryPerformanceCounter()`` on Windows, the frequency
is only known at runtime. The base and exponent have to be computed. If
computing the base and the exponent is too expensive (or not possible, e.g. if
the frequency is a prime number), exponent=1 can be used. Format C is
just a fraction if exponent=1.
The only advantage of these formats is a small optimization if the base is 2
for float or if the base is 10 for Decimal. In other cases, frequency = base\
:sup:`exponent` must be computed again to convert a timestamp to float or
Decimal. Storing the frequency directly in the denominator is simpler.
Alternatives: API design
========================
Add a global flag to change the timestamp type
----------------------------------------------
A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set globally the timestamp type.
A global flag may cause issues with libraries and applications expecting float
instead of Decimal. A float cannot be converted implicitly to Decimal. The
os.stat_float_times() case is different because an int can be converted
implicitly to float.
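The asymmetry between the two cases can be seen in a few lines. This is only a sketch of the existing behaviour of the ``decimal`` module, with arbitrary values:

```python
from decimal import Decimal

# int and float mix freely: the int is converted implicitly.
mixed_int_float = 1 + 0.5

# Decimal and float do not mix: arithmetic raises TypeError.
try:
    Decimal("1.5") + 0.5
    decimal_float_mix_works = True
except TypeError:
    decimal_float_mix_works = False

# The conversion has to be written explicitly by the caller.
explicit = Decimal("1.5") + Decimal(0.5)

print(mixed_int_float)           # 1.5
print(decimal_float_mix_works)   # False
print(explicit)                  # 2.0
```

Code written for float timestamps, such as ``time.time() + 0.5``, would therefore start failing if a global flag switched the return type to Decimal.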
Add a protocol to create a timestamp
------------------------------------
Instead of hardcoding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction. time.time(timestamp=type) would call
type.__from_fraction__(numerator, denominator) to create a timestamp object of
the specified type.
If the type doesn't support the protocol, a fallback can be used:
type(numerator) / type(denominator).
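A possible shape for such a protocol is sketched below. ``make_timestamp()`` and the ``Nanoseconds`` class are hypothetical names introduced for the example. Only the ``__from_fraction__`` method name and the fallback expression come from the text above.

```python
from decimal import Decimal

def make_timestamp(type_, numerator, denominator):
    """Build a timestamp of type_ from a fraction (hypothetical dispatcher)."""
    from_fraction = getattr(type_, "__from_fraction__", None)
    if from_fraction is not None:
        return from_fraction(numerator, denominator)
    # Fallback for types that do not implement the protocol.
    return type_(numerator) / type_(denominator)

class Nanoseconds:
    """Hypothetical type implementing the protocol: integer nanoseconds."""

    def __init__(self, value):
        self.value = value

    @classmethod
    def __from_fraction__(cls, numerator, denominator):
        return cls(numerator * 10**9 // denominator)

print(make_timestamp(float, 3, 2))               # 1.5
print(make_timestamp(Decimal, 3, 2))             # 1.5
print(make_timestamp(Nanoseconds, 3, 2).value)   # 1500000000
```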
A variant is to use a "converter" callback to create a timestamp. Example
creating a float timestamp::

    def timestamp_to_float(numerator, denominator):
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules, or maybe
a specific "hires" module. Users can define their own converters.
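A user-defined converter and a function accepting it might look like the sketch below. ``read_clock()`` is a hypothetical function, and using ``time.time_ns()`` (Python 3.7 or newer) as the underlying clock is an assumption of this example.

```python
import time
from decimal import Decimal

def timestamp_to_decimal(numerator, denominator):
    """Converter producing a Decimal from a fraction."""
    return Decimal(numerator) / Decimal(denominator)

def read_clock(converter):
    """Hypothetical clock reader that delegates the result type to converter."""
    # The clock is read as integer nanoseconds, so the denominator is 10**9.
    return converter(time.time_ns(), 10**9)

now = read_clock(timestamp_to_decimal)
whole_seconds = read_clock(lambda numerator, denominator: numerator // denominator)

print(type(now))             # <class 'decimal.Decimal'>
print(type(whole_seconds))   # <class 'int'>
```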
Such protocol has a limitation: the structure of data passed to the protocol or
the callback has to be decided once and cannot be changed later. For example,
adding a timezone or the absolute start of the timestamp (e.g. Epoch or
unspecified start for monotonic clocks) would break the API.
The protocol proposal was considered excessive given the requirements, but
the specific syntax proposed (time.time(timestamp=type)) allows it to be
introduced later if compelling use cases are discovered.
.. note::
   Other formats can also be used instead of a fraction: see the `Tuple of integers
   <tuple-integers>`_ section.
Add new fields to os.stat
-------------------------
It was proposed to add 3 fields to the os.stat() structure to get nanoseconds of
timestamps.
Populating the extra fields is time consuming. If new fields are available by
default, any call to os.stat() would be slower. If new fields are optional, the
stat structure would have a variable number of fields, which can be surprising.
Anyway, this approach does not help with the time module.
Add a boolean argument
----------------------
Because we only need one new type, decimal.Decimal, a simple boolean flag
can be added. For example, time.time(decimal=True) or time.time(hires=True).
The boolean argument API was rejected because it is not "pythonic". Changing
the return type with a parameter value is preferred over a boolean parameter (a
flag).
Add new functions
-----------------
Add new functions for each type, examples:
* time.clock_decimal()
* time.time_decimal()
* os.stat_decimal()
* etc.
Adding a new function for each function creating timestamps duplicates a lot
of code.
Links
=====
* `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
* `Issue #13882: Add format argument for time.time(), time.clock(), ... to get a timestamp as a Decimal object <http://bugs.python.org/issue13882>`_
* `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_
Copyright
=========
This document has been placed in the public domain.