PEP 410: complete motivation section, add timespec type, rephrase os.stat() and

datetime.datetime() sections
This commit is contained in:
Victor Stinner 2012-02-17 02:20:31 +01:00
parent 006728b301
commit 76e6380050
1 changed files with 100 additions and 15 deletions

View File

@ -22,9 +22,8 @@ Motivation
==========
Python 2.3 introduced float timestamps to support subsecond resolutions.
os.stat() uses float timestamps by default since Python 2.5.
Python 3.3 introduced functions supporting nanosecond resolutions:
os.stat() uses float timestamps by default since Python 2.5. Python 3.3
introduced functions supporting nanosecond resolutions:
* os module: stat(), utimensat(), futimens()
* time module: clock_gettime(), clock_getres(), wallclock()
@ -33,10 +32,27 @@ The Python float type uses binary64 format of the IEEE 754 standard. With a
resolution of 1 nanosecond (10\ :sup:`-9`), float timestamps lose precision for values
bigger than 2\ :sup:`24` seconds (194 days: 1970-07-14 for an Epoch timestamp).
Nanosecond resolution is required to set the exact modification time on
filesystems supporting nanosecond timestamps (e.g ext4, btrfs, NTFS, ...). It
helps also to compare the modification time of two files when checking which
one is newer. Examples: copy a file and its modification time using
shutil.copystat(), create a TAR archive with the tarfile module, manage a
mailbox with the mailbox module, etc.
An arbitrary resolution is preferred over a fixed resolution (like nanosecond)
to not have to change the API when a better resolution is required. For
example, the NTP protocol uses fractions of 2\ :sup:`32` seconds
(approximatively 2.3 x 10\ :sup:`-10` second), whereas the NTP protocol version
4 uses fractions of 2\ :sup:`64` seconds (5.4 x 10\ :sup:`-20` second).
.. note::
With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose precision
for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16 for an Epoch
timestamp).
With a resolution of 1 microsecond (10\ :sup:`-6`), float timestamps lose
precision for values bigger than 2\ :sup:`33` seconds (272 years: 2242-03-16
for an Epoch timestamp).
With a resolution of 100 nanoseconds (10\ :sup:`-7`), float timestamps lose
precision for values bigger than 2\ :sup:`29` seconds (17 years: 1987-01-05
for an Epoch timestamp).
Specification
@ -123,8 +139,9 @@ with the Python license.
datetime.datetime
-----------------
Most functions returning timestamps don't have a known starting point or
timezone information, and so cannot be converted to datetime.datetime.
Except os.stat(), time.time() and time.clock_gettime(time.CLOCK_GETTIME), all
time functions have an unspecified starting point and no timezone information,
and so cannot be converted to datetime.datetime.
datetime.datetime only supports microsecond resolution, but can be enhanced
to support nanosecond.
@ -193,6 +210,60 @@ for float or if the base 10 for Decimal. In other cases, frequency = base\
:sup:`exponent` must be computed again to convert a timestamp as float or
Decimal. Storing directly the frequency in the denominator is simpler.
timespec structure
------------------
A resolution of one nanosecond is enough to support all current C functions. A
Timespec type can be added to store a timestamp with a nanosecond resolution.
Basic example supporting addition, subtraction and coercion to float::
class timespec(tuple):
def __new__(cls, sec, nsec):
if not isinstance(sec, int):
raise TypeError
if not isinstance(nsec, int):
raise TypeError
asec, nsec = divmod(nsec, 10 ** 9)
sec += asec
obj = tuple.__new__(cls, (sec, nsec))
obj.sec = sec
obj.nsec = nsec
return obj
def __float__(self):
return self.sec + self.nsec * 1e-9
def total_nanoseconds(self):
return self.sec * 10 ** 9 + self.nsec
def __add__(self, other):
if not isinstance(other, timespec):
raise TypeError
ns_sum = self.total_nanoseconds() + other.total_nanoseconds()
return timespec(*divmod(ns_sum, 10 ** 9))
def __sub__(self, other):
if not isinstance(other, timespec):
raise TypeError
ns_diff = self.total_nanoseconds() - other.total_nanoseconds()
return timespec(*divmod(ns_diff, 10 ** 9))
def __str__(self):
if self.sec < 0 and self.nsec:
sec = abs(1 + self.sec)
nsec = 10**9 - self.nsec
return '-%i.%09u' % (sec, nsec)
else:
return '%i.%09u' % (self.sec, self.nsec)
def __repr__(self):
return '<timespec(%s, %s)>' % (self.sec, self.nsec)
The timespec type is similar to the `Tuple of integer, variant (A)
<tuple-integers>`_ type, except that it supports arithmetic.
The timespec type was rejected because it only supports nanosecond resolution.
Alternatives: API design
========================
@ -244,14 +315,27 @@ introduced later if compelling use cases are discovered.
Add new fields to os.stat
-------------------------
It was proposed to add 3 fields to os.stat() structure to get nanoseconds of
timestamps.
To get the creation, modification and access time of a file with a nanosecond
resolution, three fields can be added to os.stat() structure.
Populating the extra fields is time consuming. If new fields are available by
default, any call to os.stat() would be slower. If new fields are optional, the
stat structure would have a variable number of fields, which can be surprising.
The new fields can timestamps with nanosecond resolution (tuple of integers,
timespec structure, Decimal, etc.) or the nanosecond part of each timestamp.
Anyway, this approach does not help with the time module.
If the new fields are timestamps with nanosecond resolution, populating the
extra fields would be time consuming. Any call to os.stat() would be slower,
even if os.stat() is only called to check if the file exists. A parameter can
be added to os.stat() to make these fields optional, but a structure with a
variable number of fields can be problematic.
If the new fields only contain the fractional part (nanoseconds), os.stat()
would be efficient. These fields would always be present and so set to zero if
the operating system does not support subsecond resolution. Splitting a
timestamp in two parts, seconds and nanoseconds, is similar to the `timespec
type <timespec>`_ and `tuple of integers <tuple-integers>`_, and so have the
same drawbacks.
Adding new fields to the os.stat() structure does not solve the nanosecond
issue in other modules (e.g. time).
Add a boolean argument
----------------------
@ -274,10 +358,11 @@ Add new functions for each type, examples:
* time.clock_decimal()
* time.time_decimal()
* os.stat_decimal()
* os.stat_timespec()
* etc.
Adding a new function for each function creating timestamps duplicate a lot
of time.
of code.
Links