* Add objection: clocks accuracy
 * Add Guido's type: number of nanosecond
 * Add time.time(format="decimal") alternative API
 * Add links to what other languages offer
Victor Stinner 2012-02-18 04:03:06 +01:00
parent 6e3f3791ef
commit 102687f9ff
1 changed files with 246 additions and 158 deletions

@ -13,32 +13,31 @@ Python-Version: 3.3
Abstract
========
Python 3.3 introduced functions supporting nanosecond resolutions. Python 3.3
only supports int or float to store timestamps, but these types cannot be used
to store a timestamp with a nanosecond resolution.
Decimal becomes the official type for high-resolution timestamps to make Python
support new functions using a nanosecond resolution without loss of precision.
Motivation
==========
Python 2.3 introduced float timestamps to support subsecond resolutions.
Python 2.3 introduced float timestamps to support sub-second resolutions.
os.stat() uses float timestamps by default since Python 2.5. Python 3.3
introduced functions supporting nanosecond resolutions:
* os module: futimens(), utimensat()
* time module: clock_gettime(), clock_getres(), monotonic(), wallclock()
os.stat() reads nanoseconds fields of the stat structure, but returns
timestamps as float.
os.stat() reads nanosecond timestamps but returns timestamps as float.
The Python float type uses binary64 format of the IEEE 754 standard. With a
resolution of 1 nanosecond (10\ :sup:`-9`), float timestamps lose precision for values
bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch timestamp).
resolution of one nanosecond (10\ :sup:`-9`), float timestamps lose precision
for values bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch
timestamp).
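The loss of precision can be checked directly. The following is a minimal sketch, assuming Python 3.9 or later for ``math.ulp()``; the timestamp value is an arbitrary illustration and is not taken from the PEP:

```python
import math
from decimal import Decimal

# Arbitrary Epoch timestamp (February 2012) plus one nanosecond.
text = "1329534186.000000001"

as_float = float(text)
as_decimal = Decimal(text)

# Gap between two adjacent floats near this value: 2**-22 seconds,
# which is roughly 238 nanoseconds.
spacing = math.ulp(as_float)

# The extra nanosecond is rounded away by float but kept by Decimal.
float_lost_it = (as_float == 1329534186.0)
decimal_kept_it = (as_decimal - Decimal(1329534186) == Decimal("1E-9"))
```

Both ``float_lost_it`` and ``decimal_kept_it`` are true here: the float silently drops the nanosecond while the Decimal value stays exact.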
Nanosecond resolution is required to set the exact modification time on
filesystems supporting nanosecond timestamps (e.g ext4, btrfs, NTFS, ...). It
helps also to compare the modification time of two files when checking which
one is newer. Examples: copy a file and its modification time using
filesystems supporting nanosecond timestamps (e.g. ext4, btrfs, NTFS, ...). It
helps also to compare the modification time to check if a file is newer than
another file. Use cases: copy the modification time of a file using
shutil.copystat(), create a TAR archive with the tarfile module, manage a
mailbox with the mailbox module, etc.
@ -51,64 +50,82 @@ example, the NTP protocol uses fractions of 2\ :sup:`32` seconds
.. note::
With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose
precision for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16
for an Epoch timestamp).
With a resolution of 100 nanoseconds (10\ :sup:`-7`), float timestamps lose
precision for values bigger than 2\ :sup:`29` seconds (17 years: 1987-01-05
for an Epoch timestamp).
for an Epoch timestamp). With a resolution of 100 nanoseconds
(10\ :sup:`-7`, resolution used on Windows), float timestamps lose precision
for values bigger than 2\ :sup:`29` seconds (17 years: 1987-01-05 for an
Epoch timestamp).
Specification
=============
Add decimal.Decimal as a new type for timestamps. Decimal supports any
timestamp resolution, support arithmetic operations and is comparable.
Functions getting float inputs support directly Decimal, Decimal is converted
implicitly to float, even if the conversion may lose precision.
timestamp resolution, supports arithmetic operations and is comparable. It is
possible to coerce a Decimal to float, even if the conversion may lose
precision. The clock resolution can also be stored in a Decimal object.
Add a *timestamp* optional argument to:
Add an optional *timestamp* argument to:
* os module: fstat(), fstatat(), lstat() and stat()
* time module: clock(), clock_gettime(), clock_getres(), time() and
wallclock()
* os module: fstat(), fstatat(), lstat(), stat() (st_atime,
st_ctime and st_mtime fields of the stat structure),
sched_rr_get_interval(), times(), wait3() and wait4()
* resource module: ru_utime and ru_stime fields of getrusage()
* signal module: getitimer(), setitimer()
* time module: clock(), clock_gettime(), clock_getres(),
monotonic(), time() and wallclock()
The *timestamp* argument is a type, there are two supported types:
The *timestamp* argument value can be float or Decimal, float is still the
default for backward compatibility. The following functions support Decimal as
input:
* float
* decimal.Decimal
* datetime module: date.fromtimestamp(), datetime.fromtimestamp() and
datetime.utcfromtimestamp()
* os module: futimes(), futimesat(), lutimes(), utime()
* select module: epoll.poll(), kqueue.control(), select()
* signal module: setitimer(), sigtimedwait()
* time module: ctime(), gmtime(), localtime(), sleep()
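The effect of the proposed *timestamp* argument can be approximated in user code. This is only a sketch: ``current_time()`` and its ``ns`` parameter are hypothetical names introduced for illustration, and it assumes ``time.time_ns()`` (Python 3.7 or later) as the source of integer nanoseconds rather than the mechanism described in this PEP:

```python
import time
from decimal import Decimal

def current_time(timestamp=float, ns=None):
    """Hypothetical helper mimicking the proposed *timestamp* argument."""
    if ns is None:
        ns = time.time_ns()  # integer number of nanoseconds since the Epoch
    if timestamp is float:
        # Default, backward compatible: may lose the last digits.
        return ns / 10**9
    if timestamp is Decimal:
        # scaleb() only shifts the exponent; the result is exact as long
        # as the digits fit in the current decimal context precision.
        return Decimal(ns).scaleb(-9)
    raise ValueError("unsupported timestamp type: %r" % (timestamp,))

sample_ns = 1329534186000000001
as_float = current_time(float, ns=sample_ns)
as_decimal = current_time(Decimal, ns=sample_ns)
```

With the sample value, ``as_decimal`` keeps the trailing nanosecond whereas ``as_float`` does not.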
The float type is still used by default for backward compatibility.
Support decimal.Decimal (without implicit conversion to float to avoid lose of
precision) in functions having timestamp arguments:
* datetime.datetime.fromtimestamp()
* time.gmtime(), time.localtime()
* os.utimensat(), os.futimens()
The os.stat_float_times() is deprecated: use an explicit cast using int()
instead.
The os.stat_float_times() function is deprecated: use an explicit cast using
int() instead.
.. note::
The decimal module is implemented in Python and is slow, but there is a C
reimplementation which is almost ready for inclusion in CPython.
The decimal module is implemented in Python and is slower than float, but
there is a new C implementation which is almost ready for inclusion in
CPython.
Backwards Compatibility
=======================
The default timestamp type is unchanged, so there is no impact of backwad
compatibility, nor impact on performances. The new timestamp type,
decimal.Decimal, is only used when requested explicitly.
The default timestamp type is unchanged, so there is no impact on backward
compatibility nor on performance. The new timestamp type, decimal.Decimal, is
only returned when requested explicitly.
Objection: clocks accuracy
==========================
Computer clocks and operating systems are inaccurate and fail to provide
nanosecond accuracy in practice. A nanosecond is what it takes to execute a
couple of CPU instructions. Even on a real-time operating system, a
nanosecond-precise measurement is already obsolete when it starts being
processed by the higher-level application. A single cache miss in the CPU will
make the precision worthless.
.. note::
Linux *actually* is able to measure time in nanosecond precision, even
though it is not able to keep its clock synchronized to UTC with a
nanosecond accuracy.
Alternatives: Timestamp types
=============================
To support timestamps with a nanosecond resolution, the following types has
been considered:
To support timestamps with an arbitrary or nanosecond resolution, the following
types have been considered:
* 128 bits float
* number of nanoseconds
* 128-bits float
* decimal.Decimal
* datetime.datetime
* datetime.timedelta
@ -119,64 +136,87 @@ Criteria:
* Doing arithmetic on timestamps must be possible
* Timestamps must be comparable
* An arbitrary resolution, or at least a resolution of 1 nanosecond without
* An arbitrary resolution, or at least a resolution of one nanosecond without
losing precision
* Compatibility with the float type
* It should be possible to coerce the new timestamp to float for backward
compatibility
It should be possible to coerce the new timestamp to float for backward
compatibility, although programs should not receive this new type unless they
ask for it explicitly.
128 bits float
A resolution of one nanosecond is enough to support all current C functions.
The best resolution used by operating systems is one nanosecond. In practice,
most clock accuracy is closer to microseconds than nanoseconds. So it sounds
reasonable to use a fixed resolution of one nanosecond.
Number of nanoseconds (int)
---------------------------
A nanosecond resolution is enough for all current C functions and so a
timestamp can simply be a number of nanoseconds, an integer, not a float.
The number of nanoseconds format has been rejected because it would require
adding new specialized functions for this format: it is not possible to
differentiate a number of nanoseconds from a number of seconds just by checking
the object type.
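The ambiguity can be illustrated with a small sketch. ``describe()`` is a hypothetical function written for this example only; it shows that a function dispatching on the argument type has no way to tell the two integer units apart:

```python
def describe(value):
    """Hypothetical consumer that can only inspect the type of its input."""
    if isinstance(value, float):
        return "float: seconds"
    if isinstance(value, int):
        # Nothing in the object says which unit was meant.
        return "int: seconds or nanoseconds?"
    raise TypeError("unsupported type: %r" % type(value))

seconds = 1329534186                 # whole seconds since the Epoch
nanoseconds = 1329534186 * 10**9     # same instant, in nanoseconds

same_answer = (describe(seconds) == describe(nanoseconds))
```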
128-bits float
--------------
Add a new IEEE 754-2008 quad-precision float type. The IEEE 754-2008 quad
precision float has 1 sign bit, 15 bits of exponent and 112 bits of mantissa.
Add a new IEEE 754-2008 quad-precision binary float type. The IEEE 754-2008
quad precision float has 1 sign bit, 15 bits of exponent and 112 bits of
mantissa. 128-bits float is supported by GCC (4.3), Clang and ICC compilers.
128 bits float is supported by GCC (4.3), Clang and ICC compilers. Python must
be portable and so cannot rely on a type only available on some platforms. For
example, Visual C++ 2008 doesn't support it 128 bits float, whereas it is used
to build the official Windows executables. Another example: GCC 4.3 does not
support __float128 in 32-bit mode on x86 (but GCC 4.4 does).
Python must be portable and so cannot rely on a type only available on some
platforms. For example, Visual C++ 2008 doesn't support 128-bits float, whereas
it is used to build the official Windows executables. Another example: GCC 4.3
does not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).
Intel CPUs have FPU (x87) supporting 80-bit floats, but not using SSE
intructions. Other CPU vendors don't support this float size.
There is also a license issue: GCC uses the MPFR library for 128 bits float,
There is also a license issue: GCC uses the MPFR library for 128-bits float,
library distributed under the GNU LGPL license. This license is not compatible
with the Python license.
.. note::
The x87 floating point unit of Intel CPUs supports 80-bit floats. This format
is not supported by the SSE instruction set, which is now preferred over
the x87 unit, especially on x86_64. Other CPU vendors don't support 80-bit float.
datetime.datetime
-----------------
Advantages:
The datetime.datetime type is the natural choice for a timestamp because it is
clear that this type contains a timestamp, whereas int, float and Decimal are
raw numbers. It is an absolute timestamp and so is well defined. It gives
direct access to the year, month, day, hours, minutes and seconds. It has
methods related to time like methods to format the timestamp as string (e.g.
datetime.datetime.strftime).
* datetime.datetime is the natural choice for a timestamp because it is clear
that this type contains a timestamp, whereas int, float and Decimal are
raw numbers.
* datetime.datetime is an absolute timestamp and so is well defined
* datetime.datetime gives direct access to the year, month, day, hours,
minutes and seconds.
* datetime.datetime has methods related to time like methods to format the
timestamp as string (e.g. datetime.datetime.strftime).
The major issue is that, except for os.stat(), time.time() and
time.clock_gettime(time.CLOCK_REALTIME), all time functions have an unspecified
starting point and no timezone information, and so cannot be converted to
datetime.datetime.
Drawbacks:
* Except os.stat(), time.time() and time.clock_gettime(time.CLOCK_GETTIME),
all time functions have an unspecified starting point and no timezone
information, and so cannot be converted to datetime.datetime.
* datetime.datetime has issues with timezone. For example, a datetime object
without timezone and a datetime with a timezone cannot be compared.
* datetime.datetime has ordering issues with daylight saving time (DST) in the
duplicate hour of switching from DST to normal time.
* datetime.datetime is not as well integrated than Epoch timestamps: there is
no datetime.datetime.totimestamp() function. Most functions expecting
tiemstamps don't support datime.datetime. For example, os.utime() expects a
tuple of Epoch timestamps.
datetime.datetime also has issues with timezones. For example, a datetime object
without timezone (naive) and a datetime with a timezone (aware) cannot be
compared. There is also an ordering issue with daylight saving time (DST) in
the duplicate hour of switching from DST to normal time.
datetime.datetime has been rejected because it cannot be used for functions
using an unspecified starting point like os.times() or time.clock().
For time.time() and time.clock_gettime(time.CLOCK_REALTIME): it is already
possible to get the current time as a datetime.datetime object using::
datetime.datetime.now(datetime.timezone.utc)
For os.stat(), it is simple to create a datetime.datetime object from a
decimal.Decimal timestamp in the UTC timezone::
datetime.datetime.fromtimestamp(value, datetime.timezone.utc)
.. note::
datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond.
@ -186,19 +226,20 @@ datetime.timedelta
datetime.timedelta is the natural choice for a relative timestamp because it is
clear that this type contains a timestamp, whereas int, float and Decimal are
raw numbers. It can be used with datetime.datetime to get an absolute
timestamp.
raw numbers. It can be used with datetime.datetime to get an absolute timestamp
when the starting point is known.
datetime.timedelta has been rejected because it is not "compatible" with float,
whereas Decimal can be converted to float, and has a fixed resolution. One new
standard timestamp type is enough, and Decimal is preferred over
datetime.timedelta.
datetime.timedelta has been rejected because it cannot be coerced to float and
has a fixed resolution. One new standard timestamp type is enough, and Decimal is
preferred over datetime.timedelta. Converting a datetime.timedelta to float
requires an explicit call to the datetime.timedelta.total_seconds() method.
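The difference in coercion can be observed with a short sketch; the variable names are chosen for illustration only:

```python
from datetime import timedelta

delta = timedelta(seconds=1, microseconds=500000)

# timedelta does not implement __float__, so float() refuses it.
try:
    float(delta)
    coerced = True
except TypeError:
    coerced = False

# The explicit conversion method is required instead.
seconds = delta.total_seconds()
```

Here ``coerced`` stays false, and ``seconds`` is the float 1.5.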
.. note::
datetime.timedelta only supports microsecond resolution, but can be enhanced
to support nanosecond.
.. _tuple-integers:
.. _tuple:
Tuple of integers
-----------------
@ -206,8 +247,8 @@ Tuple of integers
To expose C functions in Python, a tuple of integers is the natural choice to
store a timestamp because the C language uses structures with integers fields
(e.g. timeval and timespec structures). Using only integers avoids the loss of
precision (Python supports integer of arbitrary length). Creating and parsing a
tuple of integers is simple and fast.
precision (Python supports integers of arbitrary length). Creating and parsing
a tuple of integers is simple and fast.
Depending on the exact format of the tuple, the precision can be arbitrary or
fixed. The precision can be chosen as the loss of precision is smaller than
@ -219,73 +260,72 @@ Different formats has been proposed:
* value = numerator / denominator
* resolution = 1 / denominator
* the numerator is a signed integer and can be bigger than the denominator
* denominator > 0
* B: (seconds, numerator, denominator)
* value = seconds + numerator / denominator
* resolution = 1 / denominator
* seconds is a signed integer
* 0 <= numerator < denominator
* denominator > 0
* C: (intpart, floatpart, base, exponent)
* value = intpart + floatpart × base\ :sup:`exponent`
* resolution = base \ :sup:`-exponent`
* intpart is a signed integer
* value = intpart + floatpart / base\ :sup:`exponent`
* resolution = 1 / base \ :sup:`exponent`
* 0 <= floatpart < base \ :sup:`exponent`
* base > 0
* exponent is a signed integer and should be negative
* exponent >= 0
* D: (intpart, floatpart, exponent)
* value = intpart + floatpart × 10\ :sup:`exponent`
* resolution = 10 \ :sup:`-exponent`
* intpart is a signed integer
* value = intpart + floatpart / 10\ :sup:`exponent`
* resolution = 1 / 10 \ :sup:`exponent`
* 0 <= floatpart < 10 \ :sup:`exponent`
* exponent is a signed integer and should be negative
* exponent >= 0
* E: (sec, nsec)
* value = sec + nsec × 10\ :sup:`-9`
* resolution = 10 \ :sup:`-9` (nanosecond)
* sec is a signed integer
* 0 <= nsec < 10 \ :sup:`9`
All formats support an arbitary resolution, except of the format (E).
All formats support an arbitrary resolution, except the format (E).
The format (D) may loss of precision if the clock frequency is arbitrary and
cannot be expressed as 10 \ :sup:`exponent`. The format (C) has a similar
issue, but in such case, it is possible to use base=frequency and exponent=-1.
The format (D) may not be able to store the exact value (may lose precision)
if the clock frequency is arbitrary and cannot be expressed as a power of 10.
The format (C) has a similar issue, but in such a case, it is possible to use
base=frequency and exponent=1.
The formats (D) and (E) allow optimization for conversion to float if the base
is 2 and to decimal.Decimal if the base is 10.
The formats (C), (D) and (E) allow optimization for conversion to float if the
base is 2 and to decimal.Decimal if the base is 10.
The format (A) supports arbitrary precision, is simple (only two fields), only
requires a simple division to get the floating point value, and is already used
by float.as_integer_ratio().
The format (A) is a simple fraction. It supports arbitrary precision, is simple
(only two fields), only requires a simple division to get the floating point
value, and is already used by float.as_integer_ratio().
To simplify the implementation (especially if implemented in C to avoid integer
overflow), it may be possible to accept numerator bigger than the denominator
(e.g. floatpart bigger than base \ :sup:`exponent` for the format (C)), and
normalize the tuple later.
To simplify the implementation (especially the C implementation to avoid
integer overflow), a numerator bigger than the denominator can be accepted.
The tuple may be normalized later.
Tuples of integers have been rejected because they don't support arithmetic
operations.
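The formats can be compared by converting each of them to an exact fraction. The sketch below is illustrative only: the ``from_format_*`` helpers are hypothetical names, format (D) is read with a non-negative exponent (value = intpart + floatpart / 10\ :sup:`exponent`), and the frequency of 3 ticks per second is a made-up value chosen because it is not a power of 10:

```python
from fractions import Fraction

def from_format_a(numerator, denominator):
    return Fraction(numerator, denominator)

def from_format_b(seconds, numerator, denominator):
    return seconds + Fraction(numerator, denominator)

def from_format_d(intpart, floatpart, exponent):
    return intpart + Fraction(floatpart, 10 ** exponent)

def from_format_e(sec, nsec):
    return sec + Fraction(nsec, 10 ** 9)

# The same value, 1.5 seconds, written in four formats.
a = from_format_a(3, 2)
b = from_format_b(1, 1, 2)
d = from_format_d(1, 5, 1)
e = from_format_e(1, 500000000)

# One tick of a clock running at 3 ticks per second is exact in
# format (A) but can only be approximated with nanoseconds (E).
tick_exact = from_format_a(1, 3)
tick_nsec = from_format_e(0, 333333333)

# Plain tuples do not add up: "+" concatenates instead.
concatenated = (1, 500000000) + (0, 500000000)
```

The last line yields a four-item tuple rather than a timestamp, which is the arithmetic limitation mentioned above.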
.. note::
On Windows, the ``QueryPerformanceCounter()`` clock uses the frequency of
the processor which is an arbitrary number and can be read using
``QueryPerformanceFrequency()``.
the processor which is an arbitrary number and so may not be a power of 2 or
10. The frequency can be read using ``QueryPerformanceFrequency()``.
timespec structure
------------------
A resolution of one nanosecond is enough to support all current C functions. A
Timespec type can be added to store a timestamp with a nanosecond resolution.
Basic example supporting addition, subtraction and coercion to float::
timespec is the C structure used to store a timestamp with a nanosecond
resolution. Python can use a type with the same structure: (seconds,
nanoseconds). For convenience, arithmetic operations on timespec are supported.
Example of an incomplete timespec type supporting addition, subtraction and
coercion to float::
class timespec(tuple):
def __new__(cls, sec, nsec):
@ -329,15 +369,30 @@ Basic example supporting addition, subtraction and coercion to float::
def __repr__(self):
return '<timespec(%s, %s)>' % (self.sec, self.nsec)
The timespec type is similar to the `Tuple of integer, variant (A)
<tuple-integers>`_ type, except that it supports arithmetic.
The timespec type is similar to the format (E) of tuples of integers, except
that it supports arithmetic and coercion to float.
The timespec type was rejected because it only supports nanosecond resolution.
The timespec type was rejected because it only supports nanosecond resolution
and requires implementing each arithmetic operation, whereas the Decimal type
is already implemented and well tested.
Alternatives: API design
========================
Add a string argument to specify the return type
------------------------------------------------
Add a string argument to functions returning timestamps, for example:
time.time(format="datetime"). A string is more extensible than a type: it is
possible to request a format that has no type, like a tuple of integers.
This API was rejected because modules would have to be imported implicitly to
instantiate objects (e.g. import datetime to create datetime.datetime).
Importing a module may raise an exception and may be slow; such behaviour is
unexpected and surprising.
Add a global flag to change the timestamp type
----------------------------------------------
@ -345,20 +400,21 @@ A global flag like os.stat_decimal_times(), similar to os.stat_float_times(),
can be added to set globally the timestamp type.
A global flag may cause issues with libraries and applications expecting float
instead of Decimal. A float cannot be converted implicitly to Decimal. The
os.stat_float_times() case is different because an int can be converted
implictly to float.
instead of Decimal. Decimal is not fully compatible with float. float+Decimal
raises a TypeError for example. The os.stat_float_times() case is different
because an int can be coerced to float and int+float gives float.
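The incompatibility is easy to reproduce. A minimal sketch, with variable names chosen for illustration:

```python
from decimal import Decimal

# Mixing float and Decimal in arithmetic is refused.
try:
    1.5 + Decimal("1.5")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# int mixes silently with both types.
int_plus_float = 1 + 1.5
int_plus_decimal = 1 + Decimal("1.5")
```

Code written for float timestamps would therefore start failing if a global flag switched the returned values to Decimal behind its back.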
Add a protocol to create a timestamp
------------------------------------
Instead of hard coding how timestamps are created, a new protocol can be added
to create a timestamp from a fraction. time.time(timestamp=type) would call
type.__from_fraction__(numerator, denominator) to create a timestamp object of
the specified type.
to create a timestamp from a fraction.
If the type doesn't support the protocol, a fallback can be used:
type(numerator) / type(denominator).
For example, time.time(timestamp=type) would call the class method
type.__fromfraction__(numerator, denominator) to create a timestamp object of
the specified type. If the type doesn't support the protocol, a fallback is
used: type(numerator) / type(denominator).
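The dispatch described above could look like the following sketch. ``make_timestamp()`` and the ``Nanoseconds`` class are hypothetical and exist only for this example; the only name taken from the text is the ``__fromfraction__`` class method:

```python
from decimal import Decimal

class Nanoseconds:
    """Hypothetical timestamp type implementing the protocol."""

    def __init__(self, ns):
        self.ns = ns

    @classmethod
    def __fromfraction__(cls, numerator, denominator):
        # Convert the fraction of seconds to whole nanoseconds.
        return cls(numerator * 10**9 // denominator)

def make_timestamp(numerator, denominator, timestamp=float):
    """Build a timestamp of the requested type from a fraction."""
    factory = getattr(timestamp, "__fromfraction__", None)
    if factory is not None:
        return factory(numerator, denominator)
    # Fallback for types that do not implement the protocol.
    return timestamp(numerator) / timestamp(denominator)

as_float = make_timestamp(3, 2)
as_decimal = make_timestamp(3, 2, Decimal)
as_ns = make_timestamp(3, 2, Nanoseconds)
```

float and Decimal go through the fallback division, while ``Nanoseconds`` is built by its own class method.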
A variant is to use a "converter" callback to create a timestamp. Example
creating a float timestamp:
@ -367,20 +423,20 @@ creating a float timestamp:
return float(numerator) / float(denominator)
Common converters can be provided by time, datetime and other modules, or maybe
a specific "hires" module. Users can defined their own converters.
a specific "hires" module. Users can define their own converters.
Such protocol has a limitation: the structure of data passed to the protocol or
the callback has to be decided once and cannot be changed later. For example,
adding a timezone or the absolution start of the timestamp (e.g. Epoch or
unspecified start for monotonic clocks) would break the API.
Such a protocol has a limitation: the timestamp structure has to be decided once
and cannot be changed later. For example, adding a timezone or the absolute
start of the timestamp would break the API.
The protocol proposal was rejected as being excessive given the requirements,
but the specific syntax proposed (time.time(timestamp=type)) allows this to be
introduced later if compelling use cases are discovered.
.. note::
Other formats can also be used instead of a fraction: see the `Tuple of integers
<tuple-integers>`_ section
Other formats may be used instead of a fraction: see the tuple of integers
section for example.
Add new fields to os.stat
-------------------------
@ -388,30 +444,30 @@ Add new fields to os.stat
To get the creation, modification and access time of a file with a nanosecond
resolution, three fields can be added to os.stat() structure.
The new fields can timestamps with nanosecond resolution (tuple of integers,
timespec structure, Decimal, etc.) or the nanosecond part of each timestamp.
The new fields can be timestamps with nanosecond resolution (e.g. Decimal) or
the nanosecond part of each timestamp (int).
If the new fields are timestamps with nanosecond resolution, populating the
extra fields would be time consuming. Any call to os.stat() would be slower,
even if os.stat() is only called to check if the file exists. A parameter can
be added to os.stat() to make these fields optional, but a structure with a
variable number of fields can be problematic.
even if os.stat() is only called to check if a file exists. A parameter can be
added to os.stat() to make these fields optional, but the structure would then
have a variable number of fields.
If the new fields only contain the fractional part (nanoseconds), os.stat()
would be efficient. These fields would always be present and so set to zero if
the operating system does not support subsecond resolution. Splitting a
timestamp in two parts, seconds and nanoseconds, is similar to the `timespec
type <timespec>`_ and `tuple of integers <tuple-integers>`_, and so have the
same drawbacks.
the operating system does not support sub-second resolution. Splitting a
timestamp into two parts, seconds and nanoseconds, is similar to the timespec
type and tuple of integers, and so has the same drawbacks.
Adding new fields to the os.stat() structure does not solve the nanosecond
issue in other modules (e.g. time).
issue in other modules (e.g. the time module).
Add a boolean argument
----------------------
Because we only need one new type, decimal.Decimal, a simple boolean flag
can be added. For example, time.time(decimal=True) or time.time(hires=True).
Because we only need one new type (Decimal), a simple boolean flag can be
added. Example: time.time(decimal=True) or time.time(hires=True).
Such a flag would require a hidden import, which is considered bad
practice.
@ -420,6 +476,7 @@ The boolean argument API was rejected because it is not "pythonic". Changing
the return type with a parameter value is preferred over a boolean parameter (a
flag).
Add new functions
-----------------
@ -431,17 +488,48 @@ Add new functions for each type, examples:
* os.stat_timespec()
* etc.
Adding a new function for each function creating timestamps duplicate a lot
of code.
Adding a new function for each function creating timestamps would duplicate a
lot of code and would be a pain to maintain.
Add a new hires module
----------------------
Add a new module called "hires" with the same API as the time module, except
that it would return timestamps with high resolution, e.g. decimal.Decimal.
Adding a new module avoids linking low-level modules like time or os to the
decimal module.
This idea was rejected because it requires duplicating most of the code of the
time module, would be a pain to maintain, and timestamps are used in modules
other than the time module. Examples: signal.sigtimedwait(), select.select(),
resource.getrusage(), os.stat(), etc. Duplicating the code of each module is not
acceptable.
Links
=====
* `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
* `Issue #13882: Add format argument for time.time(), time.clock(), ... to get a timestamp as a Decimal object <http://bugs.python.org/issue13882>`_
* `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_
Python:
* `Issue #7652: Merge C version of decimal into py3k <http://bugs.python.org/issue7652>`_ (cdecimal)
* `Issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution <http://bugs.python.org/issue11457>`_
* `Issue #13882: PEP 410: Use decimal.Decimal type for timestamps <http://bugs.python.org/issue13882>`_
* `[Python-Dev] Store timestamps as decimal.Decimal objects <http://mail.python.org/pipermail/python-dev/2012-January/116025.html>`_
Other languages:
* Ruby (1.9.3), the `Time class <http://ruby-doc.org/core-1.9.3/Time.html>`_
supports picosecond (10\ :sup:`-12`)
* .NET framework, `DateTime type <http://msdn.microsoft.com/en-us/library/system.datetime.ticks.aspx>`_:
number of 100-nanosecond intervals that have elapsed since 12:00:00
midnight, January 1, 0001. DateTime.Ticks uses a signed 64-bit integer.
* Java (1.5), `System.nanoTime() <http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#nanoTime()>`_:
wallclock with an unspecified starting point as a number of nanoseconds, uses
a signed 64-bit integer (long).
* Perl, `Time::HiRes module <http://perldoc.perl.org/Time/HiRes.html>`_:
uses float and so has the same loss of precision issue with nanosecond
resolution as Python float timestamps
Copyright