PEP: 410
Title: Use decimal.Decimal type for timestamps
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@haypocalc.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 01-February-2012
Python-Version: 3.3


Abstract
========

Decimal becomes the official type for high-resolution timestamps to make Python
support new functions using a nanosecond resolution without loss of precision.


Rationale
=========

Python 2.3 introduced float timestamps to support sub-second resolutions.
os.stat() has used float timestamps by default since Python 2.5. Python 3.3
introduced functions supporting nanosecond resolutions:

* os module: futimens(), utimensat()
* time module: clock_gettime(), clock_getres(), monotonic(), wallclock()

os.stat() reads nanosecond timestamps but returns timestamps as float.

The Python float type uses the binary64 format of the IEEE 754 standard. With a
resolution of one nanosecond (10\ :sup:`-9`), float timestamps lose precision
for values bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch
timestamp).

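The loss of precision can be observed directly. The following is a minimal
sketch, not part of the proposal: the timestamp value is an arbitrary example,
and only the standard float and decimal.Decimal types are used::

    from decimal import Decimal

    # Arbitrary example: seconds since the Epoch with nanosecond digits.
    text = "1328054400.123456789"

    as_float = float(text)
    as_decimal = Decimal(text)

    # Decimal(float) converts the stored binary value exactly, which exposes
    # the rounding that happened when the float was created.
    error = abs(Decimal(as_float) - as_decimal)

    print("float  :", Decimal(as_float))
    print("decimal:", as_decimal)
    print("error larger than one nanosecond:", error > Decimal("1e-9"))

The Decimal value keeps all nine fractional digits, whereas the float is
rounded to the nearest value representable in binary64.
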
Nanosecond resolution is required to set the exact modification time on
filesystems supporting nanosecond timestamps (e.g. ext4, btrfs, NTFS, ...). It
also helps when comparing modification times to check whether one file is newer
than another. Use cases: copy the modification time of a file using
shutil.copystat(), create a TAR archive with the tarfile module, manage a
mailbox with the mailbox module, etc.

An arbitrary resolution is preferred over a fixed resolution (like nanosecond)
so that the API does not have to change when a better resolution is required.
For example, the NTP protocol uses units of 2\ :sup:`-32` second
(approximately 2.3 × 10\ :sup:`-10` second), whereas the NTP protocol version
4 uses units of 2\ :sup:`-64` second (5.4 × 10\ :sup:`-20` second).

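The two NTP figures follow from simple arithmetic. A short check, using
variable names chosen only for this illustration::

    from fractions import Fraction

    # Smallest step of a 32-bit and of a 64-bit binary fraction of a second.
    ntp_32bit_step = Fraction(1, 2 ** 32)
    ntp_64bit_step = Fraction(1, 2 ** 64)

    print("%.1e" % float(ntp_32bit_step))   # 2.3e-10
    print("%.1e" % float(ntp_64bit_step))   # 5.4e-20

    # Both steps are finer than one nanosecond, so a fixed nanosecond
    # resolution could not store such values exactly.
    nanosecond = Fraction(1, 10 ** 9)
    print(ntp_32bit_step < nanosecond, ntp_64bit_step < nanosecond)
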
.. note::
   With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose
   precision for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16
   for an Epoch timestamp). With a resolution of 100 nanoseconds
   (10\ :sup:`-7`, resolution used on Windows), float timestamps lose precision
   for values bigger than 2\ :sup:`29` seconds (17 years: 1987-01-05 for an
   Epoch timestamp).

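One way to check these thresholds is to look at the gap between two adjacent
floats at each of them. The sketch below assumes Python 3.9 or newer, where
math.ulp() is available; the ``cases`` table simply restates the figures quoted
in this section::

    import math

    # resolution (seconds) -> threshold (seconds), as quoted in the text
    cases = {
        "1 ns": (1e-9, 2 ** 24),
        "100 ns": (1e-7, 2 ** 29),
        "1 us": (1e-6, 2 ** 33),
    }

    for name, (resolution, threshold) in cases.items():
        # Distance between float(threshold) and the next larger float.
        gap = math.ulp(float(threshold))
        print("%s: gap=%.2e s, resolution=%.0e s, lossy=%s"
              % (name, gap, resolution, gap > resolution))

At each threshold the gap between adjacent floats is already larger than the
requested resolution, so two distinct timestamps can map to the same float.

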
Specification
=============

Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, supports arithmetic operations and is comparable. It is
possible to coerce a Decimal to float, even if the conversion may lose
precision. The clock resolution can also be stored in a Decimal object.

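These properties can be illustrated with the decimal module alone. The two
timestamps below are arbitrary values, one nanosecond apart, chosen for this
sketch::

    from decimal import Decimal

    start = Decimal("1328054400.123456789")
    end = Decimal("1328054400.123456790")

    elapsed = end - start      # arithmetic is exact: one nanosecond
    is_later = end > start     # Decimal values are comparable

    # Coercion to float is possible, but here both values round to the
    # same float, so the nanosecond difference is lost.
    same_float = float(start) == float(end)

    print(elapsed, is_later, same_float)
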
Add an optional *timestamp* argument to:

* os module: fstat(), fstatat(), lstat(), stat() (st_atime,
  st_ctime and st_mtime fields of the stat structure),
  sched_rr_get_interval(), times(), wait3() and wait4()
* resource module: ru_utime and ru_stime fields of getrusage()
* signal module: getitimer(), setitimer()
* time module: clock(), clock_gettime(), clock_getres(),
  monotonic(), time() and wallclock()

The *timestamp* argument value can be float or Decimal; float is still the
default for backward compatibility. The following functions support Decimal as
input:

* datetime module: date.fromtimestamp(), datetime.fromtimestamp() and
  datetime.utcfromtimestamp()
* os module: futimes(), futimesat(), lutimes(), utime()
* select module: epoll.poll(), kqueue.control(), select()
* signal module: setitimer(), sigtimedwait()
* time module: ctime(), gmtime(), localtime(), sleep()

The os.stat_float_times() function is deprecated: use an explicit cast using
int() instead.

.. note::
   The decimal module is implemented in Python and is slower than float, but
   there is a new C implementation which is almost ready for inclusion in
   CPython.


Backwards Compatibility
=======================

The default timestamp type (float) is unchanged, so there is no impact on
backward compatibility or on performance. The new timestamp type,
decimal.Decimal, is only returned when requested explicitly.

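The opt-in behaviour can be sketched in pure Python. ``make_timestamp()`` is a
hypothetical helper written for this illustration; it is not a function of any
module and only mimics how a *timestamp* argument could select the result
type from integer seconds and nanoseconds::

    from decimal import Decimal


    def make_timestamp(sec, nsec, timestamp=float):
        """Hypothetical helper mimicking the proposed *timestamp* argument."""
        if timestamp is float:
            # Default, backward compatible result; may lose precision.
            return sec + nsec * 1e-9
        if timestamp is Decimal:
            # Exact result: shift the nanosecond count by nine decimal places.
            return Decimal(sec) + Decimal(nsec).scaleb(-9)
        raise ValueError("timestamp must be float or Decimal")


    default_value = make_timestamp(1328054400, 123456789)
    exact_value = make_timestamp(1328054400, 123456789, timestamp=Decimal)
    print(type(default_value).__name__, default_value)
    print(type(exact_value).__name__, exact_value)

Existing callers that pass no argument keep receiving a float; only callers
that ask for Decimal pay the cost of the new type.

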
Objection: clocks accuracy
==========================

Computer clocks and operating systems are inaccurate and fail to provide
nanosecond accuracy in practice. A nanosecond is what it takes to execute a
couple of CPU instructions. Even on a real-time operating system, a
nanosecond-precise measurement is already obsolete when it starts being
processed by the higher-level application. A single cache miss in the CPU will
make the precision worthless.

.. note::
   Linux *actually* is able to measure time with nanosecond precision, even
   though it is not able to keep its clock synchronized to UTC with
   nanosecond accuracy.


Alternatives: Timestamp types
=============================

To support timestamps with an arbitrary or nanosecond resolution, the following
types have been considered:

* decimal.Decimal
* number of nanoseconds
* 128-bit float
* datetime.datetime
* datetime.timedelta
* tuple of integers
* timespec structure

Criteria:

* Doing arithmetic on timestamps must be possible
* Timestamps must be comparable
* An arbitrary resolution, or at least a resolution of one nanosecond without
  losing precision
* It should be possible to coerce the new timestamp to float for backward
  compatibility

A resolution of one nanosecond is enough to support all current C functions.

The best resolution used by operating systems is one nanosecond. In practice,
most clock accuracy is closer to microseconds than nanoseconds. So it sounds
reasonable to use a fixed resolution of one nanosecond.


Number of nanoseconds (int)
---------------------------

A nanosecond resolution is enough for all current C functions and so a
timestamp can simply be a number of nanoseconds, an integer, not a float.

The number of nanoseconds format was rejected because it would require adding
new specialized functions for this format: it is not possible to differentiate
a number of nanoseconds from a number of seconds just by checking the object
type.

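A small sketch of this ambiguity follows. ``to_nanoseconds()`` is a
hypothetical helper used only here, and the values are arbitrary::

    def to_nanoseconds(sec, nsec):
        """Pack integer seconds and nanoseconds into one integer."""
        return sec * 10 ** 9 + nsec


    mtime_ns = to_nanoseconds(1328054400, 123456789)   # nanoseconds
    mtime_s = 1328054400                               # seconds

    # The packing is lossless and can be reversed with divmod()...
    sec, nsec = divmod(mtime_ns, 10 ** 9)

    # ...but both values are plain int objects, so a function receiving one
    # of them cannot tell the unit from the type.
    same_type = type(mtime_ns) is type(mtime_s)
    print(mtime_ns, (sec, nsec), same_type)

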
128-bit float
-------------

Add a new IEEE 754-2008 quad-precision binary float type. The IEEE 754-2008
quad-precision float has 1 sign bit, 15 bits of exponent and 112 bits of
mantissa. 128-bit float is supported by the GCC (4.3), Clang and ICC compilers.

Python must be portable and so cannot rely on a type only available on some
platforms. For example, Visual C++ 2008 doesn't support 128-bit float, although
it is used to build the official Windows executables. Another example: GCC 4.3
does not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).

There is also a license issue: GCC uses the MPFR library for 128-bit float, a
library distributed under the GNU LGPL license. This license is not compatible
with the Python license.

.. note::
   The x87 floating point unit of Intel CPUs supports 80-bit floats. This format
   is not supported by the SSE instruction set, which is now preferred over
   x87, especially on x86_64. Other CPU vendors don't support 80-bit float.


datetime.datetime
-----------------

The datetime.datetime type is the natural choice for a timestamp because it is
clear that this type contains a timestamp, whereas int, float and Decimal are
raw numbers. It is an absolute timestamp and so is well defined. It gives
direct access to the year, month, day, hours, minutes and seconds. It has
methods related to time, such as methods to format the timestamp as a string
(e.g. datetime.datetime.strftime).

The major issue is that, except for os.stat(), time.time() and
time.clock_gettime(time.CLOCK_REALTIME), all time functions have an unspecified
starting point and no timezone information, and so cannot be converted to
datetime.datetime.

datetime.datetime also has issues with timezones. For example, a datetime
object without a timezone (naive) and a datetime with a timezone (aware) cannot
be compared. There is also an ordering issue with daylight saving time (DST) in
the duplicate hour of switching from DST to normal time.

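The comparison issue is easy to reproduce with the datetime module. The dates
below are arbitrary and the variable names are chosen for this sketch::

    from datetime import datetime, timezone

    naive = datetime(2012, 2, 1, 12, 0, 0)
    aware = datetime(2012, 2, 1, 12, 0, 0, tzinfo=timezone.utc)

    comparison_failed = False
    try:
        naive < aware
    except TypeError as exc:
        # Ordering a naive and an aware datetime is refused.
        comparison_failed = True
        print("TypeError:", exc)
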
datetime.datetime has been rejected because it cannot be used for functions
using an unspecified starting point like os.times() or time.clock().

For time.time() and time.clock_gettime(time.CLOCK_REALTIME): it is already
possible to get the current time as a datetime.datetime object using::

    datetime.datetime.now(datetime.timezone.utc)

For os.stat(), it is simple to create a datetime.datetime object from a
decimal.Decimal timestamp in the UTC timezone::

    datetime.datetime.fromtimestamp(value, datetime.timezone.utc)

.. note::
   datetime.datetime only supports microsecond resolution, but can be enhanced
   to support nanosecond.

datetime.timedelta
------------------

datetime.timedelta is the natural choice for a relative timestamp because it is
clear that this type contains a timestamp, whereas int, float and Decimal are
raw numbers. It can be used with datetime.datetime to get an absolute timestamp
when the starting point is known.

datetime.timedelta has been rejected because it cannot be coerced to float and
has a fixed resolution. One new standard timestamp type is enough, and Decimal
is preferred over datetime.timedelta. Converting a datetime.timedelta to float
requires an explicit call to the datetime.timedelta.total_seconds() method.

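A short illustration of the coercion issue, using an arbitrary duration::

    from datetime import timedelta

    elapsed = timedelta(seconds=1, microseconds=500000)

    try:
        float(elapsed)
        coerced = True
    except TypeError:
        # timedelta does not implement the float protocol.
        coerced = False

    # The conversion has to be requested explicitly.
    seconds = elapsed.total_seconds()
    print(coerced, seconds)   # False 1.5
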
.. note::
   datetime.timedelta only supports microsecond resolution, but can be enhanced
   to support nanosecond.


.. _tuple:

Tuple of integers
-----------------

To expose C functions in Python, a tuple of integers is the natural choice to
store a timestamp because the C language uses structures with integer fields
(e.g. timeval and timespec structures). Using only integers avoids the loss of
precision (Python supports integers of arbitrary length). Creating and parsing
a tuple of integers is simple and fast.

Depending on the exact format of the tuple, the precision can be arbitrary or
fixed. The precision can be chosen so that the loss of precision is smaller
than an arbitrary limit, such as one nanosecond.

Different formats have been proposed:

* A: (numerator, denominator)

  * value = numerator / denominator
  * resolution = 1 / denominator
  * denominator > 0

* B: (seconds, numerator, denominator)

  * value = seconds + numerator / denominator
  * resolution = 1 / denominator
  * 0 <= numerator < denominator
  * denominator > 0

* C: (intpart, floatpart, base, exponent)

  * value = intpart + floatpart / base\ :sup:`exponent`
  * resolution = 1 / base \ :sup:`exponent`
  * 0 <= floatpart < base \ :sup:`exponent`
  * base > 0
  * exponent >= 0

* D: (intpart, floatpart, exponent)

  * value = intpart + floatpart / 10\ :sup:`exponent`
  * resolution = 1 / 10 \ :sup:`exponent`
  * 0 <= floatpart < 10 \ :sup:`exponent`
  * exponent >= 0

* E: (sec, nsec)

  * value = sec + nsec × 10\ :sup:`-9`
  * resolution = 10 \ :sup:`-9` (nanosecond)
  * 0 <= nsec < 10 \ :sup:`9`

All formats support an arbitrary resolution, except format (E).

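The value formulas of the five formats can be written down as small functions.
This is only a sketch: the ``from_format_*`` names are invented for this
illustration, and fractions.Fraction is used so that every result stays
exact::

    from fractions import Fraction


    def from_format_a(numerator, denominator):
        return Fraction(numerator, denominator)


    def from_format_b(seconds, numerator, denominator):
        return seconds + Fraction(numerator, denominator)


    def from_format_c(intpart, floatpart, base, exponent):
        return intpart + Fraction(floatpart, base ** exponent)


    def from_format_d(intpart, floatpart, exponent):
        # Format (D) is format (C) with the base fixed to 10.
        return from_format_c(intpart, floatpart, 10, exponent)


    def from_format_e(sec, nsec):
        # Format (E) is format (D) with the exponent fixed to 9.
        return from_format_d(sec, nsec, 9)


    # The same value, 1.5 seconds, expressed in each format.
    values = [
        from_format_a(3, 2),
        from_format_b(1, 1, 2),
        from_format_c(1, 1, 2, 1),
        from_format_d(1, 5, 1),
        from_format_e(1, 500000000),
    ]
    print(values)
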
The format (D) may not be able to store the exact value (it may lose precision)
if the clock frequency is arbitrary and cannot be expressed as a power of 10.
The format (C) has a similar issue, but in such a case, it is possible to use
base=frequency and exponent=1.

The formats (C), (D) and (E) allow optimization for conversion to float if the
base is 2 and to decimal.Decimal if the base is 10.

The format (A) is a simple fraction. It supports arbitrary precision, is simple
(only two fields), only requires a simple division to get the floating point
value, and is already used by float.as_integer_ratio().

To simplify the implementation (especially the C implementation to avoid
integer overflow), a numerator bigger than the denominator can be accepted.
The tuple may be normalized later.

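As an illustration, float.as_integer_ratio() already produces a format (A)
pair, and a later normalization step could look like the sketch below. The
normalization shown (carrying whole seconds out of the numerator of a format
(B) tuple) is one possible reading of the paragraph above, not a specified
algorithm::

    # Format (A) pair obtained from an existing float.
    numerator, denominator = (0.75).as_integer_ratio()   # (3, 4)

    # A format (B) tuple whose numerator exceeds its denominator:
    # 1 + 5/2 seconds.
    raw_seconds, raw_numerator, raw_denominator = (1, 5, 2)

    # Normalize: move the whole seconds out of the fraction.
    carry, remainder = divmod(raw_numerator, raw_denominator)
    normalized = (raw_seconds + carry, remainder, raw_denominator)
    print((numerator, denominator), normalized)   # (3, 4) (3, 1, 2)
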
Tuples of integers have been rejected because they don't support arithmetic
operations.

.. note::
   On Windows, the ``QueryPerformanceCounter()`` clock uses the frequency of
   the processor which is an arbitrary number and so may not be a power of 2 or
   10. The frequency can be read using ``QueryPerformanceFrequency()``.


timespec structure
------------------

timespec is the C structure used to store a timestamp with a nanosecond
resolution. Python can use a type with the same structure: (seconds,
nanoseconds). For convenience, arithmetic operations on timespec are supported.

Example of an incomplete timespec type supporting addition, subtraction and
coercion to float::

    class timespec(tuple):
        def __new__(cls, sec, nsec):
            if not isinstance(sec, int):
                raise TypeError
            if not isinstance(nsec, int):
                raise TypeError
            # Normalize so that 0 <= nsec < 10**9
            asec, nsec = divmod(nsec, 10 ** 9)
            sec += asec
            obj = tuple.__new__(cls, (sec, nsec))
            obj.sec = sec
            obj.nsec = nsec
            return obj

        def __float__(self):
            return self.sec + self.nsec * 1e-9

        def total_nanoseconds(self):
            return self.sec * 10 ** 9 + self.nsec

        def __add__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_sum = self.total_nanoseconds() + other.total_nanoseconds()
            return timespec(*divmod(ns_sum, 10 ** 9))

        def __sub__(self, other):
            if not isinstance(other, timespec):
                raise TypeError
            ns_diff = self.total_nanoseconds() - other.total_nanoseconds()
            return timespec(*divmod(ns_diff, 10 ** 9))

        def __str__(self):
            if self.sec < 0 and self.nsec:
                # Negative value stored as (negative sec, positive nsec)
                sec = abs(1 + self.sec)
                nsec = 10**9 - self.nsec
                return '-%i.%09u' % (sec, nsec)
            else:
                return '%i.%09u' % (self.sec, self.nsec)

        def __repr__(self):
            return '<timespec(%s, %s)>' % (self.sec, self.nsec)

The timespec type is similar to the format (E) of tuples of integers, except
that it supports arithmetic and coercion to float.

The timespec type was rejected because it only supports nanosecond resolution
and requires implementing each arithmetic operation, whereas the Decimal type
is already implemented and well tested.


Alternatives: API design
========================

Add a string argument to specify the return type
------------------------------------------------

Add a string argument to functions returning timestamps, for example
time.time(format="datetime"). A string is more extensible than a type: it is
possible to request a format that has no type, like a tuple of integers.

This API was rejected because it would require importing modules implicitly to
instantiate objects (e.g. import datetime to create datetime.datetime).
Importing a module may raise an exception and may be slow; such behaviour is
unexpected and surprising.


Add a global flag to change the timestamp type
----------------------------------------------

A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set the timestamp type globally.

A global flag may cause issues with libraries and applications expecting float
instead of Decimal. Decimal is not fully compatible with float: float+Decimal
raises a TypeError, for example. The os.stat_float_times() case is different
because an int can be coerced to float and int+float gives float.

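The incompatibility is easy to demonstrate; the values below are arbitrary::

    from decimal import Decimal

    try:
        1.5 + Decimal("1.5")
        mixed_ok = True
    except TypeError:
        # float and Decimal cannot be mixed in arithmetic.
        mixed_ok = False

    # int and float mix without any problem and give a float.
    int_plus_float = 1 + 1.5

    print(mixed_ok, int_plus_float)   # False 2.5

Code written for float timestamps would therefore start failing if a global
flag silently switched the return type to Decimal.

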
Add a protocol to create a timestamp
------------------------------------

Instead of hard coding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction.

For example, time.time(timestamp=type) would call the class method
type.__fromfraction__(numerator, denominator) to create a timestamp object of
the specified type. If the type doesn't support the protocol, a fallback is
used: type(numerator) / type(denominator).

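A possible pure Python rendering of this dispatch is sketched below.
``create_timestamp()`` and ``ExactSeconds`` are hypothetical names invented for
this illustration; only the ``__fromfraction__`` method name and the fallback
expression come from the description above::

    from decimal import Decimal
    from fractions import Fraction


    class ExactSeconds(Fraction):
        """Hypothetical timestamp type implementing the protocol."""

        @classmethod
        def __fromfraction__(cls, numerator, denominator):
            return cls(numerator, denominator)


    def create_timestamp(numerator, denominator, timestamp=float):
        """Hypothetical dispatcher, not an existing function."""
        factory = getattr(timestamp, "__fromfraction__", None)
        if factory is not None:
            return factory(numerator, denominator)
        # Fallback for types that do not support the protocol.
        return timestamp(numerator) / timestamp(denominator)


    print(create_timestamp(3, 2))                  # float, via the fallback
    print(create_timestamp(3, 2, Decimal))         # Decimal, via the fallback
    print(create_timestamp(3, 2, ExactSeconds))    # via __fromfraction__
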
A variant is to use a "converter" callback to create a timestamp. Example
creating a float timestamp::

    def timestamp_to_float(numerator, denominator):
        return float(numerator) / float(denominator)

Common converters can be provided by time, datetime and other modules, or maybe
a specific "hires" module. Users can define their own converters.

Such a protocol has a limitation: the timestamp structure has to be decided
once and cannot be changed later. For example, adding a timezone or the
absolute start of the timestamp would break the API.

The protocol proposal was seen as excessive given the requirements, but the
specific syntax proposed (time.time(timestamp=type)) allows it to be
introduced later if compelling use cases are discovered.

.. note::
   Other formats may be used instead of a fraction: see the tuple of integers
   section for example.


Add new fields to os.stat
-------------------------

To get the creation, modification and access time of a file with a nanosecond
resolution, three fields can be added to the os.stat() structure.

The new fields can be timestamps with nanosecond resolution (e.g. Decimal) or
the nanosecond part of each timestamp (int).

If the new fields are timestamps with nanosecond resolution, populating the
extra fields would be time consuming. Any call to os.stat() would be slower,
even if os.stat() is only called to check if a file exists. A parameter can be
added to os.stat() to make these fields optional, but the structure would then
have a variable number of fields.

If the new fields only contain the fractional part (nanoseconds), os.stat()
would be efficient. These fields would always be present and so set to zero if
the operating system does not support sub-second resolution. Splitting a
timestamp in two parts, seconds and nanoseconds, is similar to the timespec
type and tuple of integers, and so has the same drawbacks.

Adding new fields to the os.stat() structure does not solve the nanosecond
issue in other modules (e.g. the time module).


Add a boolean argument
----------------------

Because we only need one new type (Decimal), a simple boolean flag can be
added. Example: time.time(decimal=True) or time.time(hires=True).

Such a flag would require a hidden import, which is considered bad practice.

The boolean argument API was rejected because it is not "pythonic". Changing
the return type with a parameter value is preferred over a boolean parameter (a
flag).


Add new functions
-----------------

Add new functions for each type, examples:

* time.clock_decimal()
* time.time_decimal()
* os.stat_decimal()
* os.stat_timespec()
* etc.

Adding a new function for each function creating timestamps would duplicate a
lot of code and would be a pain to maintain.


Add a new hires module
----------------------

Add a new module called "hires" with the same API as the time module, except
that it would return timestamps with high resolution, e.g. decimal.Decimal.
Adding a new module avoids linking low-level modules like time or os to the
decimal module.

This idea was rejected because it requires duplicating most of the code of the
time module, would be a pain to maintain, and timestamps are used in modules
other than the time module. Examples: signal.sigtimedwait(), select.select(),
resource.getrusage(), os.stat(), etc. Duplicating the code of each module is
not acceptable.


Links
=====

Python:

* `Issue #7652: Merge C version of decimal into py3k <http://bugs.python.org/issue7652>`_ (cdecimal)
* `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
* `Issue #13882: PEP 410: Use decimal.Decimal type for timestamps <http://bugs.python.org/issue13882>`_
* `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_

Other languages:

* Ruby (1.9.3), the `Time class <http://ruby-doc.org/core-1.9.3/Time.html>`_
  supports picosecond (10\ :sup:`-12`)
* .NET framework, `DateTime type <http://msdn.microsoft.com/en-us/library/system.datetime.ticks.aspx>`_:
  number of 100-nanosecond intervals that have elapsed since 12:00:00
  midnight, January 1, 0001. DateTime.Ticks uses a signed 64-bit integer.
* Java (1.5), `System.nanoTime() <http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()>`_:
  wallclock with an unspecified starting point as a number of nanoseconds, uses
  a signed 64-bit integer (long).
* Perl, `Time::HiRes module <http://perldoc.perl.org/Time/HiRes.html>`_:
  uses float, so it has the same loss of precision issue with nanosecond
  resolution as Python float timestamps


Copyright
=========

This document has been placed in the public domain.