The Unix leap second mess

[This page was originally copied from an entry in my blog.]

Contents

Background on time scales (TAI, UTC, UT1)
What the spec says
What actually happens
The TAI−10 setup
Digression: measuring intervals
Keeping UTC in reality
A proposal: CLOCK_UTC and friends

Background on time scales (TAI, UTC, UT1)

Let me start with some background. I discussed elsewhere the reason why the day is ever so slightly longer than 86400 SI seconds. One of the consequences of this is that there are several different time scales involved in accurate timekeeping (which I discuss in more detail here). One is TAI, which is a purely linear time scale, maintained by several atomic clocks across the globe and counting SI seconds on the geoid; it has been maintained since 1958 (though it can conceptually be extended further back in the past as a variant of Terrestrial Time); it was initially synchronized with universal time and has now drifted away from it, by approximately 34.136 seconds today. Another is Universal time UT1 (there are other variants of universal time, such as UT2, but they will not concern us here): though there are subtleties, this is essentially the real mean solar time on the Greenwich meridian, and it parametrizes the actual orientation of the Earth in space. So, as measured in UT1, one mean solar day is always 86400 seconds: conversely, this means that UT1 does not measure actual SI seconds, but rather some kind of angle parameter. This time scale makes sense arbitrarily far back in the past.

Finally, we have UTC, which is the most important time standard, as it is the basis of civil time. It is a compromise between TAI and UT1, in the following manner: since 1972[#], it always differs from TAI by an integral number of seconds, and it always stays within 1s of UT1; these goals are accomplished by inserting (or possibly deleting) leap seconds in UTC, at the end of June or December of a year (or possibly March or September, but these possibilities have never been used), as decided by the International Earth Rotation Service and published at least three months in advance (for example, this is the decision implying that no leap second will be inserted at the end of this year). Unlike TAI and UT1, the UTC time scale should not be considered as a pure real number (or seconds count): instead, it should be viewed as a broken-down time (year-month-day-hour-minute-second-fraction) in which the number of seconds ranges from 0 to 60 inclusive (there can be 61 or 59 seconds in a minute); during a positive leap second the number of seconds takes the value 60 (while a negative leap second would skip the value 59, but this has never occurred). For example, today at noon civil time in Paris, UTC was (exactly) 2010-12-27T11:00:00.000 while TAI was (exactly) 2010-12-27T11:00:34.000 and UT1 was (approximately) 2010-12-27T10:59:59.863. The last leap second took place at the end of 2008 (and increased the TAI−UTC difference from 33s to 34s): the following six instants, separated by half-second intervals, straddle it:

    UTC                      TAI
    2008-12-31T23:59:59.0    2009-01-01T00:00:32.0
    2008-12-31T23:59:59.5    2009-01-01T00:00:32.5
    2008-12-31T23:59:60.0    2009-01-01T00:00:33.0
    2008-12-31T23:59:60.5    2009-01-01T00:00:33.5
    2009-01-01T00:00:00.0    2009-01-01T00:00:34.0
    2009-01-01T00:00:00.5    2009-01-01T00:00:34.5

(Note that the instant at which the leap second occurs is the same in every time zone, but the local time at which it occurs is not. So the last leap second, 2008-12-31T23:59:60 in UTC, was 2009-01-01T00:59:60 in Paris (+0100), 2008-12-31T18:59:60 in New York (−0500), 2009-01-01T10:59:60 in Sydney (+1100), and so on: Australians get their leap second at the end of the morning of New Year's Day or July 1, and Americans get theirs during the evening of New Year's Eve or June 30.)

If we attempt to condense UTC to a single number (say, the number of seconds since 1970-01-01T00:00:00 or since 1900-01-01T00:00:00, or the number of 86400-second days since 1858-11-17T00:00:00, or something of the sort), we encounter the problem that the same value can refer to two different instants, since the clock has been set back one second (negative leap seconds, of course, would cause no such difficulty). One of the themes of what follows is what Unix functions such as gettimeofday() or clock_gettime() can, should, do or might return at these various points in time.

What the spec says

The earlier versions of the Single Unix specification stated that: The gettimeofday() function obtains the current time, expressed as seconds and microseconds since 00:00 Coordinated Universal Time (UTC), January 1, 1970 (a moment known as the Unix epoch). The meaning of this phrase is a bit obscure because, after all, leap seconds have elapsed just the same as non-leap seconds (so one might interpret this sentence to mean the number of seconds actually elapsed in a linear time scale such as TAI since the Unix epoch). However, the intent is clear, and some documentation specifies that leap seconds are not counted or are ignored; this still leaves some room for doubt as to what happens with the “rubber seconds” used by UTC between 1970 and 1972. To make things even clearer, versions 3 and 4 of the Single Unix specification introduced the following wording:

§4.15. Seconds Since the Epoch

A value that approximates the number of seconds that have elapsed since the Epoch. A Coordinated Universal Time name (specified in terms of seconds (tm_sec), minutes (tm_min), hours (tm_hour), days since January 1 of the year (tm_yday), and calendar year minus 1900 (tm_year)) is related to a time represented as seconds since the Epoch, according to the expression below.

If the year is <1970 or the value is negative, the relationship is undefined. If the year is >=1970 and the value is non-negative, the value is related to a Coordinated Universal Time name according to the C-language expression, where tm_sec, tm_min, tm_hour, tm_yday, and tm_year are all integer types:

tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
    (tm_year-70)*31536000 + ((tm_year-69)/4)*86400 -
    ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400

The relationship between the actual time of day and the current value for seconds since the Epoch is unspecified.

How any changes to the value of seconds since the Epoch are made to align to a desired relationship with the current actual time is implementation-defined. As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds.

—The Open Group, Single Unix Specification, Base definitions (4: General concepts)

The key point here is that, instead of using a vague expression such as number of seconds elapsed since <Epoch> not counting leap seconds, a precise expression is given to convert a broken-down UTC value to a time expressed as seconds since the Epoch. But the meaning is the same. And the consequence is as mentioned above: by reducing UTC to a single number, we create an ambiguity in the actual instant being referred to. For example, if we apply the formula to the same six instants as above, we get:

    UTC                      seconds since the Epoch
    2008-12-31T23:59:59.0    1230767999  [1230767999.0]
    2008-12-31T23:59:59.5    1230767999  [1230767999.5]
    2008-12-31T23:59:60.0    1230768000  [1230768000.0]
    2008-12-31T23:59:60.5    1230768000  [1230768000.5]
    2009-01-01T00:00:00.0    1230768000  [1230768000.0]
    2009-01-01T00:00:00.5    1230768000  [1230768000.5]

So the value 1230768000 can refer to two different consecutive seconds: the leap second itself (that is, the start of the leap second if we indicate a precise instant) and the next second (that is, the end of the leap second); or, if we consider seconds_since_epoch to be a fractional quantity (as indicated within brackets), then every value between 1230768000.0 inclusive and 1230768001.0 exclusive refers to two instants one second apart. So it is not possible to unambiguously break down a seconds_since_epoch value into hours, minutes and seconds (and, from this value alone, one could never infer a time of 23:59:60).
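
To make the collision concrete, here is a minimal sketch that applies the spec's expression to the three distinct broken-down times around the leap second; the helper posix_seconds() is mine (not part of any standard API) and simply transcribes the formula quoted above.

#include <stdio.h>

/* Transcription of the "Seconds Since the Epoch" expression quoted above;
   by construction, leap seconds are not counted. */
static long long
posix_seconds (long long tm_year, long long tm_yday, long long tm_hour,
               long long tm_min, long long tm_sec)
{
  return tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400
    + (tm_year-70)*31536000 + ((tm_year-69)/4)*86400
    - ((tm_year-1)/100)*86400 + ((tm_year+299)/400)*86400;
}

int
main (void)
{
  /* 2008-12-31 is day 365 (zero-based) of the leap year 2008, i.e. tm_year=108. */
  printf ("2008-12-31T23:59:59 -> %lld\n", posix_seconds (108, 365, 23, 59, 59));
  printf ("2008-12-31T23:59:60 -> %lld\n", posix_seconds (108, 365, 23, 59, 60));
  printf ("2009-01-01T00:00:00 -> %lld\n", posix_seconds (109, 0, 0, 0, 0));
  /* The last two lines both print 1230768000: the leap second and the second
     following it become indistinguishable once reduced to a single number. */
  return 0;
}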

This is what the spec says; however, that does not mean that gettimeofday() actually returns these values on real Unix systems. What actually happens depends on the implementation.

What actually happens

The TAI−10 setup

First, there are some people who suggest ignoring the spec and basing gettimeofday() not on UTC as the spec defines it (the number of seconds counted by UTC since 1970-01-01T00:00:00(UTC) excluding leap seconds, see above) but on TAI, as the number of seconds since 1970-01-01T00:00:10(TAI), sometimes summarized as TAI−10. The reason for the 10s offset is that this was the TAI−UTC offset in 1972 when leap seconds were introduced, so that TAI−10 coincides, post-1972, with the number of seconds counted by UTC since 1970-01-01T00:00:00(UTC) including leap seconds. The deduction of leap seconds is then left to the time zone file, i.e., converting to UTC or to any civil time is treated as a time zone shift. This proposal was made by Arthur David Olson (founding author of the timezone database) and, for this reason, a series of time zones, known as the right/ time zones, exists in the database which assumes that the system clock is set to TAI−10. The main advantage of this setup is that it is unambiguous: for the sample six instants, one would have:

    UTC                      TAI−10
    2008-12-31T23:59:59.0    1230768022  [1230768022.0]
    2008-12-31T23:59:59.5    1230768022  [1230768022.5]
    2008-12-31T23:59:60.0    1230768023  [1230768023.0]
    2008-12-31T23:59:60.5    1230768023  [1230768023.5]
    2009-01-01T00:00:00.0    1230768024  [1230768024.0]
    2009-01-01T00:00:00.5    1230768024  [1230768024.5]

The advantage of this scheme is that a TAI−10 value uniquely refers to an instant in time, and the difference between two such values gives the actual interval between these instants (but see the digression below for a related misconception). It is only by setting the system time to TAI−10 and using the right/ time zones that a Unix system can presently display the time correctly during a leap second (as 23:59:60 in the +0000 time zone, say). The disadvantages, however, in my mind outweigh the advantages. Using TAI−10 breaks the spec: it will confuse not only programs that make an explicit assumption about the reference of time (say, ephemeris programs), but also interoperability (timestamps written in many Unix-related filesystems and formats, e.g., tar: the twenty-odd-second difference between UTC and TAI−10 is no longer so small that we can ignore it), and probably a number of programs that rely on the fact that one can convert a time to a date simply by integer division by 86400. Also, there is no reliable way to synchronize clocks to TAI (GPS receivers and NTP servers, for example, only broadcast UTC, not TAI). And even ignoring the fact that we have an explicit commitment to UTC through the spec and through the notion of civil time, using some form of universal time as a time basis is probably a good thing, because we are still more concerned with the position of the Sun in the sky than with some abstract thing like the flow of time measured by atomic clocks on the geoid. Maybe using UT1 as the time basis would have been smarter (as long as PCs don't have atomic clocks built into them[#2]), because it is easier to skew the clock ever so slightly than to insert a full second every year or two, but anyway.
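
To illustrate the point about the right/ time zones, here is a hedged sketch: it assumes a glibc-based system on which the leap-second-aware right/ tz files are installed, and a time_t value interpreted as TAI−10 (1230768023 is then the 2008 leap second, i.e. 1230768000 plus the 23 leap seconds inserted before it).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main (void)
{
  /* Select the leap-second-aware variant of the UTC zone. */
  setenv ("TZ", "right/UTC", 1);
  tzset ();
  time_t t = 1230768023;  /* TAI-10 value of the leap second at the end of 2008 */
  struct tm broken;
  if ( localtime_r (&t, &broken) == NULL ) {
    fprintf (stderr, "localtime_r() failed\n");
    exit (EXIT_FAILURE);
  }
  char buf[64];
  if ( strftime (buf, sizeof(buf), "%FT%T", &broken) == 0 ) {
    fprintf (stderr, "strftime() failed\n");
    exit (EXIT_FAILURE);
  }
  printf ("%s\n", buf);  /* expected output: 2008-12-31T23:59:60 */
  return 0;
}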

Digression: measuring intervals

At this point, I should probably rebut a misconception: the idea that one can or should use gettimeofday() to measure intervals. This is wrong, and not merely because leap seconds can set the system clock back by one second every now and then: there are many other reasons why the system clock can be readjusted after getting out of sync, and one should consequently avoid taking the difference between values of gettimeofday() to measure delays. Instead, use clock_gettime(CLOCK_MONOTONIC,...): the gettimeofday() function should only be used to obtain the current date and time (as a wall clock), not to measure intervals (as a stopwatch), except if those intervals span several months (or, perhaps more to the point, if they are expected to survive a reboot). One should think of gettimeofday() as having an accuracy that is never too good and an error that can vary from one measurement to the next (because the clock can be reset), whereas clock_gettime(CLOCK_MONOTONIC,...) gives you a slowly varying error: it cannot tell you the date, but it is more appropriate for measuring intervals of time.
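
For concreteness, a minimal sketch of the kind of interval measurement I mean (the work being timed is left as a placeholder):

#include <stdio.h>
#include <time.h>

int
main (void)
{
  struct timespec start, stop;
  /* CLOCK_MONOTONIC never jumps backward when the system clock is reset. */
  if ( clock_gettime (CLOCK_MONOTONIC, &start) == -1 ) {
    perror ("clock_gettime() failed");
    return 1;
  }
  /* ... the work to be timed goes here ... */
  if ( clock_gettime (CLOCK_MONOTONIC, &stop) == -1 ) {
    perror ("clock_gettime() failed");
    return 1;
  }
  double elapsed = (stop.tv_sec - start.tv_sec)
    + (stop.tv_nsec - start.tv_nsec) / 1e9;
  printf ("elapsed: %.9f seconds\n", elapsed);
  return 0;
}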

Keeping UTC in reality

Now let us return to gettimeofday(): what if we do follow the spec and use UTC as the time basis for the system clock? If the system is completely unaware of leap seconds, it will tick past the leap second without noticing it and (if synchronized with some external time source like NTP) suddenly realize that it is ticking one second early and attempt to resynchronize. One possibility is that resynchronization will be achieved by slowing the clock down by a few dozen parts per million, thus effectively diluting the error caused by the leap second over a few hours or perhaps a day (spreading one second over six hours, for example, amounts to a rate change of about 46 parts per million). This is a common view of things (and perhaps desirable, see the suggestion about CLOCK_UTS below), but I am uncertain whether any real-life system (typically a Unix system plus an NTP implementation or some other timekeeping device) actually does this. Another possibility is that resynchronization will be performed brutally, by stepping the system clock back one second: this amounts to counting the leap second, except that it is done a few minutes or hours too late and at an unpredictable moment, which is obviously undesirable. Now assume the clock-handling part of the kernel knows about leap seconds and tries to handle them gracefully: what can it actually do?

The NTP protocol attempts to handle things as follows (described here in detail by the inventor of NTP): NTP packets record time as seconds counted by UTC since 1900-01-01T00:00:00(UTC) excluding leap seconds, but that is not really relevant, since the actual handling of the leap second is left to the kernel clock discipline. The idea is that the system clock will be set back by one second during the leap second, but that, for the duration of this second, gettimeofday() will nearly stall: it returns a constant value incremented by the smallest possible amount at each call. (Personally, I don't really see the point of the increment: anyone assuming that gettimeofday() must strictly increase between two subsequent calls is in error anyway.) So if we take our recurring example of the leap second at the end of 2008 and assume gettimeofday() is called at the instants displayed, every half-second, we would get approximately:

    UTC                      gettimeofday()
    2008-12-31T23:59:59.0    1230767999.000000
    2008-12-31T23:59:59.5    1230767999.500000
    2008-12-31T23:59:60.0    1230768000.000000
    2008-12-31T23:59:60.5    1230768000.000001  (stalled)
    2009-01-01T00:00:00.0    1230768000.000002  (stalled)
    2009-01-01T00:00:00.5    1230768000.500000

One could argue that this violates the spec: I don't think it does, because the spec only seems to say something about the integral part of seconds_since_epoch, but in any case that would be reading the spec in a very anal way. However, it is still profoundly unsatisfactory, because we lose all accurate timekeeping during the leap second itself, and it still leaves no way of displaying a time as 23:59:60 (or of distinguishing 23:59:60 from 00:00:00). There is no point in using NTP to synchronize clocks to within a few milliseconds if we have to lower our expectations to one second every time a leap second is inserted. Incidentally, what the actual NTP protocol should show during the leap second itself is very obscure (but that does not matter much, because the servers can always be interrogated again one second later).

A proposal: CLOCK_UTC and friends

What I suggest to fix this mess is to create a new clock specifier for clock_gettime(), called, for example, CLOCK_UTC, which would differ from gettimeofday() (or, equivalently, CLOCK_REALTIME) in only one respect: during a leap second, clock_gettime(CLOCK_UTC,...) would return the same value in tv_sec as during the previous second, and a value of tv_nsec greater than or equal to 1000000000. E.g.:

    UTC                      tv_sec        tv_nsec
    2008-12-31T23:59:59.0    1230767999    0
    2008-12-31T23:59:59.5    1230767999    500000000
    2008-12-31T23:59:60.0    1230767999    1000000000
    2008-12-31T23:59:60.5    1230767999    1500000000
    2009-01-01T00:00:00.0    1230768000    0
    2009-01-01T00:00:00.5    1230768000    500000000

There are a number of nice features in this scheme. Introducing a new clock, CLOCK_UTC, rather than changing gettimeofday() preserves compatibility for those programs which assume that tv_nsec is always less than 1000000000. The values of tv_sec and tv_nsec returned by clock_gettime(CLOCK_UTC,...) uniquely determine the full value of UTC and (with the knowledge of the leap second table) the instant in time. Being inside a leap second is characterized by the simple test tv_nsec>=1000000000L. Day boundaries are simply characterized by tv_sec being a multiple of 86400 (whereas this is not the case with gettimeofday() as described above: it returns a multiple of 86400 at 23:59:60, one second before the new day actually begins), so converting a time to a date remains straightforward. And even displaying the time becomes a simple matter of adding 1 to the tm_sec field of the broken-down time (returned by gmtime() or localtime()) when the tv_nsec value is at least 1000000000, after the time has been broken down:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main (void)
{
  struct timespec t;
  if ( clock_gettime (CLOCK_UTC, &t) == -1 ) {
    perror ("clock_gettime() failed");
    exit (EXIT_FAILURE);
  }
  struct tm broken;
  if ( gmtime_r (&t.tv_sec, &broken) == NULL ) {
    fprintf (stderr, "gmtime_r() failed\n");
    exit (EXIT_FAILURE);
  }
  /* During a leap second, CLOCK_UTC (as proposed) returns tv_nsec >= 1000000000:
     fold the excess back into the broken-down seconds field, which may then
     legitimately reach 60 (or more, should double leap seconds ever exist). */
  while ( t.tv_nsec >= 1000000000L ) {
    t.tv_nsec -= 1000000000L;
    broken.tm_sec++;  /* increment tm_sec rather than t.tv_sec */
  }
  char buf[4096];
  if ( strftime (buf, sizeof(buf), "%FT%T", &broken) == 0 ) {
    fprintf (stderr, "strftime() failed\n");
    exit (EXIT_FAILURE);
  }
  printf ("%s.%03ld\n", buf, t.tv_nsec/1000000L);
  return 0;
}

Assuming clock_gettime(CLOCK_UTC,...) works as I suggest, this would correctly display the time as 23:59:60 (and fraction) during a leap second, and the same trick works for local time (using localtime_r() instead of gmtime_r()) as well as for universal time. The strftime() function is designed to cope with a tm_sec value of 60, because leap seconds can exist (and indeed do exist if one uses the right/ time zones). Using while instead of if in the test t.tv_nsec >= 1000000000L above ensures that the code would still work even if double leap seconds were ever introduced. So this code is very robust. Negative leap seconds, of course, present no difficulty: they are simply skipped.

Similarly, should one wish to convert the return value of clock_gettime(CLOCK_UTC,...) to an accurate value of TAI−10, one would simply consult a table of leap seconds, count those whose time of occurrence is less than tv_sec (subtracting any negative leap seconds whose time of occurrence is similarly less than tv_sec) to compute the (TAI−10)−UTC offset, add this offset to tv_sec, and then, so long as tv_nsec is at least 1000000000, subtract 1000000000 from tv_nsec and increment tv_sec. The point here is that whether or not a given leap second has elapsed (and hence the value of (TAI−10)−UTC) is entirely reflected in the tv_sec value, ignoring tv_nsec. While this computation can be done in user space, it might be argued that the kernel could or should store the TAI offset and provide a clock_gettime(CLOCK_TAIMINUS10,...) as well.
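
A minimal sketch of that user-space conversion (struct leap and utc_to_taiminus10() are hypothetical names of mine; a real implementation would obtain the full leap second table from somewhere such as the right/ tz data, whereas here only the 2008 leap second is listed, so the absolute values are off by the 23 earlier leap seconds, but the mechanism is the same):

#include <stdio.h>
#include <stddef.h>
#include <time.h>

/* One entry per leap second: `when` is the CLOCK_UTC tv_sec value during the
   leap second itself (e.g. 1230767999 for the one at the end of 2008), and
   `sign` is +1 for a positive leap second, -1 for a negative one. */
struct leap { time_t when; int sign; };

static struct timespec
utc_to_taiminus10 (struct timespec utc, const struct leap *table, size_t n)
{
  long offset = 0;  /* (TAI-10) - UTC, in seconds, relative to the table given */
  for ( size_t i = 0; i < n; i++ )
    if ( table[i].when < utc.tv_sec )
      offset += table[i].sign;
  struct timespec tai = utc;
  tai.tv_sec += offset;
  while ( tai.tv_nsec >= 1000000000L ) {  /* unfold a leap second in progress */
    tai.tv_nsec -= 1000000000L;
    tai.tv_sec++;
  }
  return tai;
}

int
main (void)
{
  static const struct leap table[] = { { 1230767999, +1 } };
  struct timespec during = { .tv_sec = 1230767999, .tv_nsec = 1500000000L };  /* 2008-12-31T23:59:60.5 */
  struct timespec after  = { .tv_sec = 1230768000, .tv_nsec = 0 };            /* 2009-01-01T00:00:00.0 */
  struct timespec a = utc_to_taiminus10 (during, table, 1);
  struct timespec b = utc_to_taiminus10 (after, table, 1);
  /* The two results are half a second apart, as the two instants really are. */
  printf ("%lld.%09ld\n%lld.%09ld\n",
          (long long) a.tv_sec, a.tv_nsec, (long long) b.tv_sec, b.tv_nsec);
  return 0;
}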

Besides clock_gettime(CLOCK_UTC,...) and possibly clock_gettime(CLOCK_TAIMINUS10,...) which have been discussed here, it might also be desirable for the kernel to provide a clock_gettime(CLOCK_UTS,...) clock, where UTS stands for Universal Time, Smoothed: this clock would be equal to UTC except for a few hours after a leap second, where it would slowly compensate for the latter by speeding up by a small (but unspecified) factor. This would be for the benefit of programs wishing to “ignore away” leap seconds while keeping a reasonable level of precision when measuring durations.

So far I haven't actually looked into what goes on inside the kernel (say, Linux): the little that I have seen (in kernel/time/timekeeping.c and so on) is messy and undocumented, so it's hard to gauge how hard it would be to actually implement this clock_gettime(CLOCK_UTC,...) proposal; and, of course, it would probably be even harder to get it accepted. Even assuming it can be done, this does not fully solve the leap second mess: some filesystems provide sub-second granularity on timestamps, and it is hard to fix the stat() return structure to provide correct access to times inside a leap second without breaking compatibility with those programs that might have already assumed that the sub-seconds field in struct stat will never reach 1000000000. It is also unclear which clock or which time scale is referred to when a timeout is specified in a function or system call (e.g., select(), pthread_cond_timedwait()…), though the Linux timerfd interface solves this particular problem. And, of course, there are zillions of programs in existence which simply call gettimeofday() and need to be fixed one way or another to work properly inside a leap second; and zillions of programming languages besides C which would need to be given access to some accurate way of measuring time. I find this horribly depressing.

[#] The UTC time scale has existed since 1961, but before 1972, the relation between UTC and TAI was more complex: not only were there discontinuities, but also the length of the second itself was adjusted to keep closer to UT1; and TAI−UTC was not kept integral. For example, the Unix epoch of UTC=1970-01-01T00:00:00 equals TAI=1970-01-01T00:00:08.000082 (exactly).

[#2] A typical PC quartz seems to have an accuracy of about ten to fifty parts per million, i.e., a couple of seconds per day. On the other hand, a clock counting uniform SI seconds without any attempt at correction drifts away from UT1 by only about 2ms per day, i.e., 20 parts per billion, so that's around a thousand times better. Hence, without external assistance (say, from NTP, an atomic clock or a GPS receiver), the typical PC quartz cannot see, or hope to see, the difference between universal time and atomic time.