Timestamps done right (opens in new tab)

(getkerf.wordpress.com)

64 pointsmollmerx10y ago71 comments

71 comments

When you build that syntax into the language / tokenizer, you're making a bunch of choices for the user.

As many have said elsewhere in the thread, what is `m` supposed to mean, minute or month? If the smallest value you can represent with this special syntax is 1s, you're excluding people who work with sub-second timestamps. Unless, you want them to do like 1m0.0000415s or something like that, but now you have the same problem as you started with (long series of digits being hard to read for humans).

More problems: different cultures have different concepts of a "month" (because they have different calendars). Also, the units that you want to use depend heavily on the application. `1y` is different from `52w` is different from `365d`, but all of those concepts are very similar to users. I expect people to create bugs where these tokens are mixed. When you write `now + 1y` do you mean "this time on this date, one year from now" or do you mean "this instant plus 606024*365 seconds" (which is slightly greater/less, I forget which way)? How do you communicate that to the user? If they can't control which sense ("human" time vs "physical" time) the larger units represent, then those people have to calculate the time offsets by hand.

Just pick nanos or micros since some Epoch (the Unix one is one reasonable choice), use them everywhere internally, and let the user pretty-print them. If you want to support high-energy timeseries where the time scale is smaller, either make your nanos fractional or use femtos/attos/zeptos/plancks.

Jabbles10y ago

The ambiguity of "now + 1y" caused an outage for Azure:

https://azure.microsoft.com/en-us/blog/summary-of-windows-az...

ehartsuyker10y ago

> I mean, just try it in Java.

For ages JodaTime actually nailed it, and the Java 8 date API was based off this.

> Not an add on type as in R or Python or Java.

Again, let's talk about the modern version of the language and not act like prior screw ups are the end all for a language.

Also

> 2012.01.01 + 1m1d

How is that more clean than:

> new DateTime(2012, 1, 1).plusMonths(1).plusDays(1)

32bitkid10y ago

Personally, I think there is a more subtle conceptual idea that it glosses over that makes the more explicit Java version more "clean".

> D + 1m1d != D + 1d1m

Mixing time units like days, months, years (where units are intransitive) is, in my opinion, a bad idea.

nickpsecurity10y ago

Should be able to type or sanity check them like anything else. What specific issue are you worried about?

1 more reply

jandrese10y ago

IMHO, working on timestamps in code like this is just asking for edge case to bite you in the ass.

The only sane way I've found to work with time is to convert any timestamp into a seconds-since-the-epoch value when doing any internal work and then covert back to the timestamp format for display. As an added bonus your code won't get super messy when you start getting timestamps from different sources that are formatted differently. Everything gets normalized to the internal representation before you do work on it.

henrikschroder10y ago

You can't perform the operations "add one month" or "add one year" if you're working with seconds since the epoch.

For storing a timestamp, absolutely, you should use an integer-based format, counting discrete somethings since somewhen.

For working with timestamps, you need all the nuance, you need something that can manipulate the different parts of it independent of each other.

And finally, for displaying or reading timestamps, you need all the localization and parsing crap to figure out what "020304" means. Fourth of March 2002? Third of February 2004?

1 more reply

skywhopper10y ago

By my count it's over 30 characters less "clean". The Java syntax obscures some meaning and requires a lot more boilerplate in favor of less magical (and thus complex) syntax.

I agree that this sort of first-class datetime-type representation may not be appropriate for every language, but myself, I find it refreshing and brilliant, and I'd love to see more languages support this sort of syntax instead of overloading strings or using complicated objects or APIs.

It's like comparing the power and ease of using regular expressions in Perl or Ruby versus in Java or Python. In Perl and Ruby, regexes are built-in to the syntax of the language itself. They're a truly first-class type, like strings and integers are in all four languages, and like lists and dicts/hashes/associative arrays are in Perl, Ruby, and Python.

I'd love to see datetime objects promoted to similar first-class native syntax support in this way in more languages. It won't be appropriate everywhere, but in the right language it'd be amazing.

ehartsuyker10y ago

The Java is clean because it's perfectly understandable. I don't have to think about "+ 1m" meaning month, minute, or milli-. Verbose, yes, but the meaning is 100% unambiguous which I think makes it a better API.

> overloading strings

I'm pretty sure writing "+ 1m" is more of an overloading of a string than ".plusMinutes(1)".

stanmancan10y ago

How is plusMonths(1).plusDays(1) obscure in any way? You could show that to my grandparents and they would be able to guess what it does. "+1m1d" on the other hand they wouldn't have a clue.

Length, in either direction, does not correlate to "clean". Clarity and intent does. Clean Coder (http://www.amazon.ca/Clean-Code-Handbook-Software-Craftsmans...) does a fantastic job talking about this, it's worth picking up if you haven't read it in the past.

2 more replies

JadeNB10y ago

> By my count it's over 30 characters less "clean".

If character count were really the ultimate measure of cleanliness, then we'd all be programming pointlessly (https://en.wikipedia.org/wiki/Tacit_programming) and using single-character names for any variables that slipped through. (That's not to say that shorter is never better, but rather that, when it is better, its brevity is not the only reason.)

crispyambulance10y ago

The java "syntax" doesn't matter.

The IDE provides context-sensitive cues as one types so that you don't have the cognitive paper-cut from having to think for even a second if "1m" means "one minute" or "one month".

At a deeper level, chaining method calls to build-up an object is perfectly understandable way to modify something like a date.

xixi7710y ago

They are both about equally clear, but the former syntax will always take less time to process (unless perhaps you are a Java programmer who is familiar with the latter syntax but unfamiliar with the former, in which case it's not a valid comparison anyway). If it's an analysis-oriented package with most of the work done at a REPL, less time to type, too.

nickpsecurity10y ago

Because one is an expression like humans think of it and one is a pile of OOP. The former is more desirable. The data types are also obvious.

ehartsuyker10y ago

How is "1m" more obvious that "plusMonths(1)"? I mean really, the latter is basically and English phrase. I know it's cool to shit all over OOP and Java these days, but if Haskell/Clojure/Whatever had a function (plusMonths dt 1) or something that, you wouldn't call it a pile of FP.

1 more reply

stygiansonic10y ago

This is just an aside, and not a knock on the article, but Java does have a built-in Timestamp type, [1] and has had it for some time.

Maybe it's not "first-class" (not sure what this means in context?), but it's definitely there and not in a third-party JAR or anything.

However, it's bad for other reasons, the first being that it extends java.util.Date (the Javadoc seems to admit this) and combined with the related java.sql.Date (which also extends java.util.Date) makes for a very confusing API.

For this reason, Oracle recommends just using the new Date APIs, [2] and mapping a SQL TIMESTAMP to LocalDateTime.

1. https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp...

2. http://www.oracle.com/technetwork/articles/java/jf14-date-ti...

alblue10y ago

The original Java date and time API was probably one of the worst APIs ever invented, as I describe in my answer on StackOverflow: http://stackoverflow.com/a/1969651

However the new javax.time APIs (created by Stephen Colebourne) are excellent and probably one of the best designed APIs. It's funny what twenty years difference can make :)

jcranmer10y ago

It's worth noting that much of the original java.util.Date class was based on the same misfeatures as POSIX C:

1. Leap seconds? Those don't exist, right? 2. Years date from 1900. 3. January is Month 0 (ignoring the fact that there is already a widespread convention of numbering January 1).

Of course, the "fix" of Calendar didn't attempt to fix any of the POSIX-did-it-first problems but instead mostly limited itself to supporting other locales by allowing non-Gregorian calendars.

stygiansonic10y ago

Totally concur. But date/time is a hard thing to get right!

However, getting it wrong the first time is different than getting wrong again, as you pointed out with the Calendar type. 0 == JANUARY, but most other things are indexed from 1...

Third time's the charm, I guess :) Getting the input from Stephen Colebourne (of Joda-Time fame) was undoubtedly a key point in getting this done.

1 more reply

DiabloD310y ago

Why are we not using unix timestamps (seconds since the epoch) for timestamps on everything that doesn't require sub-second precision?

kabdib10y ago

We're getting into domains where sub-second precision is pretty important. Logs, for instance; on s aystem handling thousands of messages a second, storing log entries at a resolution of one second gets . . . irritating.

I like Windows NT's system of seconds-since-1600-or-so in increments of 100 nanoseconds. That system has served me well. 100ns isn't crazy to read directly and often from a hardware register (talk to a hardware engineer about latching a rapid counter sometime...), it covers a reasonable range for humans (handles everyone currently alive, for their whole lifetimes and most events they care about) and it fits pretty well with events that occur in multiprocessor systems (I'll be saying something different if I ever work on systems with terahertz clock rates, though).

NyxWulf10y ago

They are difficult to read for humans. Without using a computer can you tell me when this timestamp was taken? 1453383978

Ultimately that was the point of the article, but the tone of the article about how they got it right and everyone else is wrong bugged me.

My approach has been to store everything in epoch form, do all of my calculations and manipulations from there, then build tools that make converting back to a human readable representation when and where it is needed. I think the idea that you have a problem completely solved though just prevents the search for any improvements.

JadeNB10y ago

Skimming the article doesn't make it clear whether they've addressed this, but, if you want to add units of time of non-constant size, then you cannot just use time-stamps. For example, `(today + 1 year) - today` is longer than `(today + 2 years) - (today + 1 year)` [0], so that one can't think of them as just `(today.unix + seconds_in_year).human` and `(today.unix + 2*seconds_in_year).human`.

[0] Assuming that today + 1 year is defined to be 2017-01-21, and today + 2 years is defined to be 2018-01-21; and, if not, then one faces other problems with intuition.

1 more reply

jandrese10y ago

Who cares how pretty the internal representation is? Only the processor sees that, you output pretty printed versions of the timestamp as you mentioned. You can even add the necessary bits to your debugger to automatically convert timestamps if necessary.

Dealing with sub-second precision isn't that hard. You just need either a second value that holds the microseconds/nanoseconds or 64 bit time that counts nanoseconds since the epoch instead of seconds.

zimpenfish10y ago

You still need to record the timezone somewhere and you might as well just stick to ISO8601/RFC3339 format then.

NyxWulf10y ago

One approach is to store numeric timestamps in UTC format. Then if you do have data stored in a different timezone you have to include the timezone with it. That is the default mode of operation for many of the standard date formats.

If you get timestamp data from a system outside of your control though you always have to make sure you know what it means. At least half the time it seems like a date without a timezone isn't UTC, but in whatever the originating timezone was but the developers didn't include a timezone offset in the data...Timezones....the bane of my existence.

1 more reply

DiabloD310y ago

No I don't. UTC.

unabashedturtle10y ago

I work with timestamps A LOT. In fact, I often have to deal with streams of 5 million + /second. Even worse, for most of them, microsecond precision is critical.

I'm convinced most languages do timestamps wrong. Specifically, they separate out the "time" component from the "date" component.

What's a more common operation? Counting how many events happened in a span of time? Or shifting every timestamp by 15 days?

Timestamps should generally be designed for extremely fast and lightweight comparison, but keep enough information that a shift is doable. From my experience, all you need is: a unix timestamp, a microsecond (or nanosecond) offset, and the source timezone.

In this case, if you want to find elapsed time between two timestamps you simply subtract the unix timestamps and offsets. Very CPU friendly and easily vectorizable. You can do this even if the timestamps originated at different timezones (since everything is UTC under the hood).

What if you want to shift the date? Or group by date? Turns out computing the date on the fly is a very cheap operation. Easily can do hundreds of millions/second on a single high-end server core.

an use whatever calendar system floats your boat.

Any timestamp system that relies on year/month/day semantics is rarely going to be optimized for the most common operations users do with timestamps. Even worse, for simple comparison you run into all the weird edge-cases that you wouldn't have cared about if you stuck with a unix timestamp under the hood.

de_Selby10y ago

I'm surprised the author is so big on kerf but doesn't mention q/kdb+[1] at all, which kerf took almost all these ideas from and which has a much bigger user base.

1) https://kx.com/

acveilleux10y ago

This should enlighten you why: https://scottlocklin.wordpress.com/2015/12/15/an-introductio...

Basically, he's big on Kerf because he's involved in the development. The above blog post will tie it with Kx and other APLs.

https://getkerf.wordpress.com/ is the official Kerf blog. And Kerf is not meant to be free...

nickpsecurity10y ago

It's definitely an improvement in reader comprehension. I'll admit one of the reason I didn't learn those languages is that they looked like gibberish.

falsedan10y ago

Doing timestamps right: use a library or class to abstract away the way the data is stored from the intent of the actions you want to perform on them.

What the article sez: GET KERF IT'S THE BEST AND ONLY WAY TO SOLVE THIS

MikeTLive10y ago

  Numericaly sortable date [YYYY+][mm][dd].[HH][MM][SS],[NNNNNNNNN]
  date -u +%Y%m%d.%H%M%S,%N
  20160122.044145,052215000

  Absolute dates with nanos as seconds since epoch
  date -u +%s.%N
  1453437715.409682000

  Hyphenated,"chunks", with nanos and TZ offset.
  date -uIns
  2016-01-22T04:41:03,121501000+0000

lostcolony10y ago

This is interesting, and I can see that adding 1d10m is easier than adding 24 * 60 * 60000 + 10 * 60000 to a (millisecond) timestamp. Though I have to guess m is minute, not millisecond or month.

But...the display and storage of a timezone is a datetime, and it doesn't appear to have a timezone attached. Meaning I'd still rather just store/retrieve/work with millis since epoch, since that avoids any ambiguity about what timezone the timestamp takes place in. With just a datetime...I can ~assume~ it's GMT, but...is it? Millis since epoch are the same across all timezones, there is no ambiguity; datetimes vary, and I have to make assumptions, and make sure my library/driver/etc handles conversions correctly.

smartmic10y ago

I wonder if a closed source language for data processing tasks can thrive on the long run. The concept for timestamps is nice though. But I do not want to spend time of learning proprietary languages.

mollmerxOP10y ago

I think that's a very interesting question, to which the answer probably depends on how you define 'thrive'.

It seems unlikely to me that any language could become mainstream without being open source. The expectation that a compiler or interpreter should have its source available is only growing.

At the same time, kx's kdb+ is an example of a product that sits in a niche and has generated significant revenue despite being closed source. kdb+ has achieved this by primarily targeting financial services, which is a sector that's less sensitive to closed source than most, and is able to spend money on whichever product solves their problem.

If kerf manages to gain an edge over its competitors for a particular set of problems then sure, it can thrive in the long run the way kdb+ has.

xixi7710y ago

It could change in the future, but there are a plenty of languages that got pretty mainstream while closed source -- in terms of working with numbers and data, I'd definitely call matlab and sas mainstream, stata and mathematica are nearly mainstream, R of course is open source which is a huge reason for its popularity, but it's a clone of closed-source s+ which was fairly popular already. In the general-purpose world, c# and swift got most of their traction while still closed source, etc. Surely there are more examples in other domains... But yes, there is definitely a very welcome trend to open source stuff.

kevinmgranger10y ago

This post doesn't load at all for me without javascript.

mikegerwitz10y ago

I had a related comment earlier this week:

https://news.ycombinator.com/item?id=10930663

PhantomGremlin10y ago

The post is quite readable in Firefox w/o Javascript once you know the secret, which is

   View -> Page Style -> No Style

anc8410y ago

And it does not even manage to keep its formatting intact if you resize text. Why JS then?!

jedisct110y ago

Regarding timestamps: https://howtologatimestamp.info/

DannoHung10y ago

How's your treatment of timezones?

tempodox10y ago

  date -u +'%FT%H:%M:%S %Z'

Zash10y ago

    date -u +%FT%TZ

zimpenfish10y ago

    date -uIs

Shorter but sadly not RFC3339 compliant although it is ISO8601 compliant.

j / k navigate · click thread line to collapse

71 comments

philsnow10y ago

When you build that syntax into the language / tokenizer, you're making a bunch of choices for the user.

Jabbles10y ago

The ambiguity of "now + 1y" caused an outage for Azure:

https://azure.microsoft.com/en-us/blog/summary-of-windows-az...

ehartsuyker10y ago

> I mean, just try it in Java.

For ages JodaTime actually nailed it, and the Java 8 date API was based off this.

> Not an add on type as in R or Python or Java.

Again, let's talk about the modern version of the language and not act like prior screw ups are the end all for a language.

Also

> 2012.01.01 + 1m1d

How is that more clean than:

> new DateTime(2012, 1, 1).plusMonths(1).plusDays(1)

32bitkid10y ago

Personally, I think there is a more subtle conceptual idea that it glosses over that makes the more explicit Java version more "clean".

> D + 1m1d != D + 1d1m

Mixing time units like days, months, years (where units are intransitive) is, in my opinion, a bad idea.

nickpsecurity10y ago

Should be able to type or sanity check them like anything else. What specific issue are you worried about?

1 more reply

jandrese10y ago

IMHO, working on timestamps in code like this is just asking for edge case to bite you in the ass.

henrikschroder10y ago

You can't perform the operations "add one month" or "add one year" if you're working with seconds since the epoch.

For storing a timestamp, absolutely, you should use an integer-based format, counting discrete somethings since somewhen.

For working with timestamps, you need all the nuance, you need something that can manipulate the different parts of it independent of each other.

And finally, for displaying or reading timestamps, you need all the localization and parsing crap to figure out what "020304" means. Fourth of March 2002? Third of February 2004?

1 more reply

skywhopper10y ago

By my count it's over 30 characters less "clean". The Java syntax obscures some meaning and requires a lot more boilerplate in favor of less magical (and thus complex) syntax.

I'd love to see datetime objects promoted to similar first-class native syntax support in this way in more languages. It won't be appropriate everywhere, but in the right language it'd be amazing.

ehartsuyker10y ago

> overloading strings

I'm pretty sure writing "+ 1m" is more of an overloading of a string than ".plusMinutes(1)".

stanmancan10y ago

How is plusMonths(1).plusDays(1) obscure in any way? You could show that to my grandparents and they would be able to guess what it does. "+1m1d" on the other hand they wouldn't have a clue.

2 more replies

JadeNB10y ago

> By my count it's over 30 characters less "clean".

crispyambulance10y ago

The java "syntax" doesn't matter.

The IDE provides context-sensitive cues as one types so that you don't have the cognitive paper-cut from having to think for even a second if "1m" means "one minute" or "one month".

At a deeper level, chaining method calls to build-up an object is perfectly understandable way to modify something like a date.

xixi7710y ago

nickpsecurity10y ago

Because one is an expression like humans think of it and one is a pile of OOP. The former is more desirable. The data types are also obvious.

ehartsuyker10y ago

1 more reply

stygiansonic10y ago

This is just an aside, and not a knock on the article, but Java does have a built-in Timestamp type, [1] and has had it for some time.

Maybe it's not "first-class" (not sure what this means in context?), but it's definitely there and not in a third-party JAR or anything.

For this reason, Oracle recommends just using the new Date APIs, [2] and mapping a SQL TIMESTAMP to LocalDateTime.

1. https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp...

2. http://www.oracle.com/technetwork/articles/java/jf14-date-ti...

alblue10y ago

The original Java date and time API was probably one of the worst APIs ever invented, as I describe in my answer on StackOverflow: http://stackoverflow.com/a/1969651

However the new javax.time APIs (created by Stephen Colebourne) are excellent and probably one of the best designed APIs. It's funny what twenty years difference can make :)

jcranmer10y ago

It's worth noting that much of the original java.util.Date class was based on the same misfeatures as POSIX C:

1. Leap seconds? Those don't exist, right? 2. Years date from 1900. 3. January is Month 0 (ignoring the fact that there is already a widespread convention of numbering January 1).

Of course, the "fix" of Calendar didn't attempt to fix any of the POSIX-did-it-first problems but instead mostly limited itself to supporting other locales by allowing non-Gregorian calendars.

stygiansonic10y ago

Totally concur. But date/time is a hard thing to get right!

However, getting it wrong the first time is different than getting wrong again, as you pointed out with the Calendar type. 0 == JANUARY, but most other things are indexed from 1...

Third time's the charm, I guess :) Getting the input from Stephen Colebourne (of Joda-Time fame) was undoubtedly a key point in getting this done.

1 more reply

DiabloD310y ago

Why are we not using unix timestamps (seconds since the epoch) for timestamps on everything that doesn't require sub-second precision?

kabdib10y ago

NyxWulf10y ago

They are difficult to read for humans. Without using a computer can you tell me when this timestamp was taken? 1453383978

Ultimately that was the point of the article, but the tone of the article about how they got it right and everyone else is wrong bugged me.

JadeNB10y ago

[0] Assuming that today + 1 year is defined to be 2017-01-21, and today + 2 years is defined to be 2018-01-21; and, if not, then one faces other problems with intuition.

1 more reply

jandrese10y ago

zimpenfish10y ago

You still need to record the timezone somewhere and you might as well just stick to ISO8601/RFC3339 format then.

NyxWulf10y ago

1 more reply

DiabloD310y ago

No I don't. UTC.

unabashedturtle10y ago

I work with timestamps A LOT. In fact, I often have to deal with streams of 5 million + /second. Even worse, for most of them, microsecond precision is critical.

I'm convinced most languages do timestamps wrong. Specifically, they separate out the "time" component from the "date" component.

What's a more common operation? Counting how many events happened in a span of time? Or shifting every timestamp by 15 days?

What if you want to shift the date? Or group by date? Turns out computing the date on the fly is a very cheap operation. Easily can do hundreds of millions/second on a single high-end server core.

an use whatever calendar system floats your boat.

de_Selby10y ago

I'm surprised the author is so big on kerf but doesn't mention q/kdb+[1] at all, which kerf took almost all these ideas from and which has a much bigger user base.

1) https://kx.com/

acveilleux10y ago

This should enlighten you why: https://scottlocklin.wordpress.com/2015/12/15/an-introductio...

Basically, he's big on Kerf because he's involved in the development. The above blog post will tie it with Kx and other APLs.

https://getkerf.wordpress.com/ is the official Kerf blog. And Kerf is not meant to be free...

nickpsecurity10y ago

It's definitely an improvement in reader comprehension. I'll admit one of the reason I didn't learn those languages is that they looked like gibberish.

falsedan10y ago

Doing timestamps right: use a library or class to abstract away the way the data is stored from the intent of the actions you want to perform on them.

What the article sez: GET KERF IT'S THE BEST AND ONLY WAY TO SOLVE THIS

MikeTLive10y ago

  Numericaly sortable date [YYYY+][mm][dd].[HH][MM][SS],[NNNNNNNNN]
  date -u +%Y%m%d.%H%M%S,%N
  20160122.044145,052215000

  Absolute dates with nanos as seconds since epoch
  date -u +%s.%N
  1453437715.409682000

  Hyphenated,"chunks", with nanos and TZ offset.
  date -uIns
  2016-01-22T04:41:03,121501000+0000

lostcolony10y ago

This is interesting, and I can see that adding 1d10m is easier than adding 24 * 60 * 60000 + 10 * 60000 to a (millisecond) timestamp. Though I have to guess m is minute, not millisecond or month.

smartmic10y ago

mollmerxOP10y ago

I think that's a very interesting question, to which the answer probably depends on how you define 'thrive'.

It seems unlikely to me that any language could become mainstream without being open source. The expectation that a compiler or interpreter should have its source available is only growing.

If kerf manages to gain an edge over its competitors for a particular set of problems then sure, it can thrive in the long run the way kdb+ has.

xixi7710y ago

kevinmgranger10y ago

This post doesn't load at all for me without javascript.

mikegerwitz10y ago

I had a related comment earlier this week:

https://news.ycombinator.com/item?id=10930663

PhantomGremlin10y ago

The post is quite readable in Firefox w/o Javascript once you know the secret, which is