Saturday 20 August 2011

What's wrong with DateTime anyway?

A few times after I've tweeted about Noda Time, people have asked why they should use it - they believe that the .NET date and time support is already good enough. Now obviously I haven't seen their code, but I suspect that pretty much any code base doing any work with dates and times will be clearer using Noda Time - and quite possibly more correct, due to the way that Noda Time forces you into making some decisions which are murky in .NET. This post is about the shortcomings of the .NET date and time API. Obviously I'm biased, and I hope this post isn't seen as disrespectful to the BCL team - aside from anything else, they work under a different set of constraints regarding COM interop etc.

What does a DateTime mean?

When there's a Stack Overflow question about DateTime not doing quite what the questioner expected, I often find myself wondering what a particular value is meant to represent, exactly. It sounds simple - it's a date and time, right? But it gets rather more complicated as soon as you start thinking about it more carefully. For example, assuming the clock doesn't tick between the two property invocations, what should the value of "mystery" be in the following snippet?
DateTime utc = DateTime.UtcNow;
DateTime local = DateTime.Now;
bool mystery = local == utc;
I honestly don't know what this will do. There are three options which all make a certain amount of sense:
  • It should always be true: the two values are associated with the same instant in time, it's just that one is expressed locally and one is expressed universally
  • It should always be false: the two values represent different kinds of data, so are automatically unequal
  • It should return true if your local time zone is currently in sync with UTC, i.e. when time zones are disregarded completely, the two values are equal
I don't care much what the actual behaviour is - the fact that the behaviour is unobvious is a symptom of a deeper problem. It all comes back to the DateTime.Kind property which allows a DateTime to represent one of three kinds of value:
  • DateTimeKind.Utc: A UTC date and time
  • DateTimeKind.Local: A date and time which is a local time for the system the code is executing on
  • DateTimeKind.Unspecified: Um, tricky. Depends on what you do with it.
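For what it's worth, the documented behaviour of DateTime equality is that only the tick counts are compared and Kind is ignored, which corresponds to the third option above. Here's a minimal sketch of that; the class and method names are purely illustrative and not part of any library:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class KindEqualityDemo
{
    // Builds the same clock reading twice, once as UTC and once as local,
    // and reports whether == treats the two values as equal.
    public static bool SameReadingDifferentKindIsEqual()
    {
        var utc = new DateTime(2011, 8, 20, 22, 0, 0, DateTimeKind.Utc);
        var local = new DateTime(2011, 8, 20, 22, 0, 0, DateTimeKind.Local);

        // DateTime equality compares Ticks only; Kind plays no part.
        return utc == local;
    }
}
```

Unless your local time zone happens to be in sync with UTC, those two values refer to different instants, yet they compare as equal.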
The value of the property affects various different operations in different ways. For example, if you call ToUniversalTime() on an "unspecified" DateTime, it will assume that you really meant it as a local value before. On the other hand, if you call ToLocalTime() on an "unspecified" DateTime, it will assume that you really meant it as a UTC value before. That's one model of behaviour.
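To make that concrete, here's a small sketch which checks both assumptions without depending on which time zone it runs in. The helper names are made up for this example:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class UnspecifiedConversionDemo
{
    // True if ToUniversalTime() treats an Unspecified value as if it were Local.
    public static bool ToUniversalAssumesLocal(DateTime unspecified)
    {
        var asLocal = DateTime.SpecifyKind(unspecified, DateTimeKind.Local);
        return unspecified.ToUniversalTime() == asLocal.ToUniversalTime();
    }

    // True if ToLocalTime() treats an Unspecified value as if it were UTC.
    public static bool ToLocalAssumesUtc(DateTime unspecified)
    {
        var asUtc = DateTime.SpecifyKind(unspecified, DateTimeKind.Utc);
        return unspecified.ToLocalTime() == asUtc.ToLocalTime();
    }
}
```

Calling both methods with new DateTime(2011, 8, 20, 22, 0, 0), which has a kind of Unspecified, should return true each time: the same value is interpreted in two contradictory ways depending on the direction of the conversion.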
If you construct a DateTimeOffset from a DateTime and a TimeSpan, the behaviour is somewhat different:
  • A UTC value is only valid with an offset of zero: the constructor throws an ArgumentException if you pair a DateTime of kind Utc with any other offset
  • A local value is only sometimes valid: the constructor validates that the offset from UTC at the specified local time in the system default time zone is the same as the offset you've specified.
  • An unspecified value is always valid, and represents the local time in some unspecified time zone, such that the offset is valid at the time.
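A hedged sketch of how the constructor's validation can be probed is below. IsAccepted is a hypothetical helper written for this post. As far as I can tell from the documentation, the constructor throws ArgumentException when a value of kind Utc is paired with a non-zero offset, or when a value of kind Local is paired with an offset which doesn't match the system time zone:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class OffsetConstructionDemo
{
    // Returns true if new DateTimeOffset(value, offset) succeeds,
    // false if the constructor rejects the combination.
    public static bool IsAccepted(DateTime value, TimeSpan offset)
    {
        try
        {
            var result = new DateTimeOffset(value, offset);
            return result.Offset == offset;
        }
        catch (ArgumentException)
        {
            // Thrown for a Utc value with a non-zero offset, or a Local
            // value whose offset disagrees with the system time zone.
            return false;
        }
    }
}
```

With an Unspecified value, IsAccepted returns true for any sensible offset. With a Utc value it returns true only for TimeSpan.Zero. With a Local value the answer depends on the machine the code happens to be running on.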
I don't know about you, but this sort of thing gives me the semantic heebie-jeebies. It's like having a "number" type which has a sequence of digits - but you have to ask another property whether those digits are hex or decimal, and the answer can sometimes be "Well, what do you think?"
Of course, in .NET 1.1, DateTimeKind didn't even exist. This didn't mean the problem didn't exist - it means that the confusing behaviour which tries to make sense of a type which represents different kinds of value couldn't even try to be consistent. It had to be based on the context: it was as if it were permanently Unspecified.

Doesn't DateTimeOffset fix this?

Okay, so now we know we don't like DateTime much. Does DateTimeOffset help us? Well, somewhat. A DateTimeOffset value has a very definite meaning: it's a local date and time with a specific offset from UTC. I should probably take a moment to explain what I mean by "local" date and times - and instants - at this point.
A local date and time isn't tied to any particular time zone. At this moment, is it before or after "10pm on August 20th 2011"? It depends where you are in the world. (I'm leaving aside any non-ISO calendar representations for the moment, by the way.) So a DateTimeOffset contains a time-zone-independent component (that "10pm on ..." part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.
(Still with me? Keep going - I'm hoping that the previous paragraph will end up being the hardest in this post. It's a hugely important conceptual point though.)
So a DateTimeOffset maps to an instant, but also deals with a local date and time. That means it's not really an ideal type if we only want to represent a local date and time - but then neither is DateTime. A DateTime with a kind of DateTimeKind.Local isn't really local in the same sense - it's tied to the default time zone of the system it's running on. A DateTime with a kind of DateTimeKind.Unspecified is closer in some cases - such as when constructing a DateTimeOffset - but the semantics are odd in other cases, as described above. So neither DateTimeOffset nor DateTime are good types to use for genuinely local date and time values.
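The two aspects of a DateTimeOffset can be seen side by side in a short sketch. The field names London and NewYork are just labels I've chosen for two offsets (+01:00 and -04:00); DateTimeOffset itself knows nothing about the places:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class InstantVersusLocalDemo
{
    // 10pm at UTC+1 and 5pm at UTC-4: both are 9pm UTC.
    public static readonly DateTimeOffset London =
        new DateTimeOffset(2011, 8, 20, 22, 0, 0, TimeSpan.FromHours(1));
    public static readonly DateTimeOffset NewYork =
        new DateTimeOffset(2011, 8, 20, 17, 0, 0, TimeSpan.FromHours(-4));

    // == compares the underlying instants (the UTC values).
    public static bool SameInstant()
    {
        return London == NewYork;
    }

    // The local date and time parts are compared without the offsets.
    public static bool SameLocalDateTime()
    {
        return London.DateTime == NewYork.DateTime;
    }

    // EqualsExact requires both the instant and the offset to match.
    public static bool SameInstantAndOffset()
    {
        return London.EqualsExact(NewYork);
    }
}
```

Here SameInstant() returns true, while SameLocalDateTime() and SameInstantAndOffset() both return false: one instant, observed as two different local date and time values.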
DateTimeOffset also isn't a good type to use if you want to tie yourself to a specific time zone, because it has no idea of the time zone which gave the relevant offset in the first place. As of .NET 3.5 there's a pretty reasonable TimeZoneInfo class, but no type which talks about "a local time in a particular time zone". So with DateTimeOffset you know what that particular time is in some unspecified time zone, but you don't know what the local time will be a minute later, as the offset for that time zone could change (usually due to daylight saving time changes).

What about dates and times?

So far I've only been talking about "date and time" values. What about date values and time values - values which only have one component or the other? It's more common to want to represent a date than a time, but both are common enough to be worth considering.
Now yes, you can use a DateTime for a date - heck, there's even the DateTime.Date property which will return the date for a particular date and time... but as another DateTime which happens to be at midnight. That's not at all the same as having a separate type which is readily identifiable as "just a date" (and likewise "just a time of day" - .NET uses TimeSpan for that, which again doesn't really feel quite right to me).
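As a quick illustration of that point (the class name is invented for this example), the "date" you get back is indistinguishable in type from any other DateTime, and the "time of day" is indistinguishable from an elapsed duration:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class DateAndTimePartsDemo
{
    public static readonly DateTime Moment = new DateTime(2011, 8, 20, 22, 15, 0);

    // The "date" is still a DateTime: same day, with the time set to midnight.
    public static DateTime DatePart()
    {
        return Moment.Date;
    }

    // The "time of day" is a TimeSpan, the same type used for durations.
    public static TimeSpan TimePart()
    {
        return Moment.TimeOfDay;
    }
}
```

Nothing in the type system stops you from comparing DatePart() with a full date and time value, or from adding TimePart() to another TimeSpan which represents a genuine duration.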

What about time zones themselves? Surely TimeZoneInfo is fine there.

As I said before, TimeZoneInfo isn't bad. It suffers from two major problems and some minor ones:
First, it's all based on Windows time zone IDs. That's natural enough - but it's not what the rest of the world uses. Every non-Windows system I've ever seen is based on the Olson (aka tz aka zoneinfo) time zone database, and the IDs assigned there. You may have seen IDs such as "Europe/London" or "America/Los_Angeles" - those are Olson identifiers. Talk to a web service offering geo information, chances are it'll talk in Olson identifiers. Interact with another calendaring system, chances are it'll talk in Olson identifiers. Now there are problems there too in terms of identifier stability, which the Unicode Consortium tries to address with CLDR... but at least you've got a good chance. It would be nice if TimeZoneInfo offered some kind of mapping between the two identifier schemes, or somewhere else in .NET did. (Noda Time knows about both sets of identifiers, although the mapping isn't publicly accessible just yet. This will be fixed before release.)
Second, it's based on DateTime and DateTimeOffset, which means you've got to be careful when you use it - if you assume one kind of DateTime when you're actually giving or receiving another kind, you may have problems. It's reasonably well documented, but frankly explaining this sort of thing is intrinsically hard enough without having to put everything in terms which are inconsistent.
Then there are a few issues around ambiguous or invalid local date and time values. These occur due to daylight saving changes: if the clock goes forward (e.g. from 1am to 2am) that introduces some invalid local date and time values (e.g. 1.30am doesn't occur on that day). If the clock goes backward (e.g. from 2am to 1am) that introduces ambiguities: 1.30am occurs twice. You can explicitly ask TimeZoneInfo whether a particular value is invalid or ambiguous, but it's easy to miss that it's even a possibility. If you try to convert a local value to a UTC value via a time zone, it will throw an exception if it's invalid but silently assume standard time (as opposed to daylight saving time) if it's ambiguous. That sort of decision leads developers to not even consider the possibilities involved. Speaking of which...
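Here's a sketch of those edge cases. To avoid depending on the time zones installed on any particular machine, it builds a hypothetical custom zone (standard time equal to UTC, one hour of daylight saving from 1am on the last Sunday of March to 2am on the last Sunday of October). The zone ID, display names and helper methods are all invented for this example:

```csharp
using System;

// Illustrative helper, not part of the BCL or Noda Time.
public static class DstEdgeCaseDemo
{
    // Builds the hypothetical zone described above.
    public static TimeZoneInfo CreateZone()
    {
        var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(
            new DateTime(1, 1, 1, 1, 0, 0), 3, 5, DayOfWeek.Sunday);
        var end = TimeZoneInfo.TransitionTime.CreateFloatingDateRule(
            new DateTime(1, 1, 1, 2, 0, 0), 10, 5, DayOfWeek.Sunday);
        var rule = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule(
            DateTime.MinValue.Date, DateTime.MaxValue.Date,
            TimeSpan.FromHours(1), start, end);
        return TimeZoneInfo.CreateCustomTimeZone(
            "Demo/Zone", TimeSpan.Zero, "Demo zone",
            "Demo standard", "Demo daylight", new[] { rule });
    }

    // Converts a local value in the zone to UTC, returning null
    // if the conversion throws (as it should for a skipped time).
    public static DateTime? TryToUtc(DateTime local, TimeZoneInfo zone)
    {
        try
        {
            return TimeZoneInfo.ConvertTimeToUtc(local, zone);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }
}
```

In 2011 the transitions for this zone fall on March 27th and October 30th. I'd expect IsInvalidTime to return true for 1.30am on March 27th and IsAmbiguousTime to return true for 1.30am on October 30th. TryToUtc should return null for the first value, and for the second it should quietly pick the standard-time interpretation without any indication that there was a choice to make.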

This all sounds too complicated.

You may be thinking at this point, "You're making a big deal out of nothing. I don't want to think about this stuff - why are you trying to make everything so complicated? I've been using the .NET API for years, and not had problems." If so, I suspect there are three broad possibilities:
  • You're far, far smarter than I am, and understood all of these intricacies through intuition. Your code always makes use of the right kind of DateTime, uses DateTimeOffset appropriately, and will always do the right thing with invalid or ambiguous local date and time values. No doubt you also write lock-free multi-threaded code sharing state in a way which is as efficient as possible but still rock solid. What the heck are you doing reading this in the first place?
  • You have run into these issues, but have mostly forgotten them - after all, they've only sucked away 10 minutes of your life at a time, as you experimented to get something that appeared to work (or at least made the unit tests pass; the unit tests which may well be conceptually wrong too). Maybe you've wondered about it, but decided that the problem was with you rather than the API.
  • You've never seen the problems, but only because you don't bother testing your code, which has so far only ever run in a single time zone, on computers which are always turned off at night (thus missing all daylight saving transitions). In some ways you're lucky, but you've got a time zone.
Okay, that was somewhat facetious, but it really is a problem. If you've never really thought about the difference between "local" times and "global" instants before, you should have done. It's an important distinction - similar to the distinction between binary floating point and decimal floating point types. Failures can be subtle, hard to diagnose, hard to explain, hard to correct, pervasive, and easy to reintroduce at another point of the program.
Handling date and time values is intrinsically tricky. There are nasty cases to think about like days which don't start at midnight due to daylight saving changes (for example, Sunday October 17th 2010 in Brazil started at 1am). If you're particularly unlucky you'll have to work with multiple calendar systems (Gregorian, Julian, Coptic, Buddhist etc). If you deal with dates and times around the start of the 20th century you may see some very odd time zone transitions as countries went from strictly-longitudinal offsets to mostly "round" values (e.g. Paris in 1911). You may need to deal with governments changing time zone transitions with only a couple of weeks' notice. You may need to deal with time zone identifiers changing (e.g. Asia/Calcutta to Asia/Kolkata - archived thread).
All of this is on top of the actual business rules you're trying to implement, of course. They may be complicated too. Given all this complexity, you should at least have an API which allows you to express what you mean relatively clearly.

So is Noda Time perfect then?

Of course not. Noda Time suffers several problems:
  • Despite all of the above, I'm a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven't quite implemented it yet. As far as I'm aware, no-one involved in Noda Time is an expert - although Stephen Colebourne, the author of Joda Time and lead of JSR-310, lurks on the mailing list. (Point of trivia: He was present at my first presentation on Noda Time. I asked if anyone happened to know the difference between the Gregorian calendar and the ISO-8601 calendar. He raised his hand and gave the correct answer, obviously. I asked how he happened to know it, and he replied, "I'm Stephen Colebourne." I nearly collapsed.)
  • We haven't finished yet. A beautifully designed API is useless if it isn't implemented.
  • There are bound to be bugs - the BCL team's code is exercised on hundreds of thousands of machines around the world all the time. Errors are likely to be picked up quickly.
  • We don't have any resources - we're a small group of active developers doing this for fun. I'm not saying that for pity (it's great fun) but for the inevitable issues around the amount of time that can be spent on features, documentation etc.
  • We're not part of the BCL. Want to use Noda Time in a LINQ to SQL (or even NHibernate) query? Good luck with that. Even if we succeed beyond my expectations, I'm not expecting other open source projects to take a dependency on us for ages.
Having said that, I am pleased with the overall design. We've tried to keep a balance between flexibility and providing one simple way of achieving any particular goal (with more to do, of course). I'll write another post some time about the design style we've been gradually evolving towards, comparing it with both Joda Time and .NET. The best outcome is the set of types to come out of it, each of which has a reasonably clear role. I won't bore you with all the details here - see other posts, documentation etc.
Ironically, the best outcome for the world would probably be for the BCL team to pick up on this post and decide to overhaul the API radically for .NET 6 (I'm assuming the ship has effectively sailed on .NET 5). While I'm enjoying doing this, I'm sure there are other projects I'd enjoy too - and frankly date and time is too important a concept to rest on my shoulders for the .NET community for long.

Conclusion

I hope I've persuaded you that the .NET API has significant flaws. I may have also persuaded you that Noda Time is worth looking at more closely, but that's a secondary goal really. If you truly understand the flaws in the built-in types - in particular the semantic ambiguity around DateTime - then you're more likely to use those types carefully and accurately in your code. That alone is enough to make me happy.

32 comments:

  1. Excellent post, Jon!

    I have encountered many of the issues you discussed. After diving into DateTime craziness for hours, I gave into a migraine and decided to acknowledge the problem existed and leave it at that. "Let Future Jim worry about it."

    Looks like I'm Future Jim now...

    Replies
    1. I know... I always hate past-me for not taking these issues seriously. :)

  2. "... the best outcome for the world would probably be for the BCL team to pick up on this post..."

    You're THE Jon Skeet. You don't have a red phone with them already? I would have expected you to chat regularly with Anders regarding the design of C#.Next :)

  3. Great post Jon, glad you are doing this for us because I've had so many problems with dates in .NET. Some of them never solved because the underlying API is just wrong on so many levels.

    Thanks for this, really looking forward to using NodaTime.

  4. I'm glad it's not just me who's struggling with the .NET time API. Indeed just recently I spent an hour or so (as I seem to do once every year or so) making sure I understand the bits of it that I'm using. Seems like each time I do this I realize that I know just enough to be dangerous ;-)

    The Noda factoring of this makes a lot of sense to me. Kudos to you guys for taking on this task!

  5. Thanks a lot for this post, Jon.
    By the way, can you share some links that explain the theory of date and time? I'm sure you said you're an amateur in these things just because you're humble

  6. The concept of dealing with datetime and timezones is quite interesting. But, I've not found any examples to use this Noda-Time; almost no samples or examples.

    Please provide the examples if possible.

  7. This is just my two cents.. I haven't given this topic nearly as much thought. Would it be useful to analyze the problem not at the level of class library artifacts but at the level of programming problems and solution design? For instance, to define what classes of functionality require what levels of functionality to operate properly. I feel like time falls into the bucket of "cross cutting concern" although this is an abstract thought - it seems like maybe it's too hard to make a one-size-fits-all solution and ultimately something more like a guidance document would be most useful to programmers at large...

  8. To answer the mystery: Depends on your current time zone. If the time zone is UTC then it's true. Otherwise it's false.

    Replies
    1. "Otherwise it's false" is incorrect; the DateTimeKind is NOT CHECKED when DateTime comparisons are made in DotNet. Try the following Unit Test; it passes in NZ(12-13 hours ahead of UTC)

      Assert.AreEqual(New DateTime(1999, 12, 31, 23, 59, 58, 571, DateTimeKind.Local), New DateTime(1999, 12, 31, 23, 59, 58, 571, DateTimeKind.Utc))

      Assert.AreEqual(New DateTime(1999, 12, 31, 10, 59, 58, 571, DateTimeKind.Utc), New DateTime(1999, 12, 31, 23, 59, 58, 571, DateTimeKind.Local).ToUniversalTime, "13 hours diff")

  9. Great thing going here Jon. Have you thought of how you'll serialize/store this data (Any specific format)?

    If you're not sure why Dates and TimeZones are tough...
    take your Birthdate. If you were born at 1:03am January 1, in a State/Province (or country) that spans more than 2 time zones, A government system might record that you were born on the 31st of December the previous year depending on how they save their dates. Sure they save the TimeZone offset so "Realistically" you were born on Jan 1st. But then that system exports your BirthTime to another application and forgets to put the TimeZone on it because it's a Legacy system... (like a downhill rolling snowball of yellow snow).

  10. I have never had an issue with the DateTime struct in .NET. However, I have always made it a practice to work with UTC time exclusively, and only convert it to local time when displaying information to the user. I also handle offsets and time zone differences myself as well. This seems to avoid almost all localization issues/issues you outlined in your article. However, since I have never used DateTimeOffset or TimeZoneInfo, these issues may present themselves the second I ever try to let .NET manage my times.

    I actually wrote a rather useful class called DateTimeUTC (made this during a project where different times had to be compared at any given point anywhere in the world) which uses DateTime and TimeSpan as its backbone, but it follows my general DateTime practice above, and I haven't had any of the issues outlined in the article. I'll see if I can find the source sometime and post it.

  11. This comment has been removed by the author.

  12. Jon,
    How to store ZonedDateTime values to SQL Server? (Sorry for off-topic)

  13. what's the difference between the Gregorian calendar and the ISO-8601 calendar ?

    Replies
    1. From https://en.wikipedia.org/wiki/List_of_calendars#Calendaring_and_timekeeping_standards

      ISO 8601, standard based on the Gregorian calendar, Coordinated Universal Time and ISO week date, a leap week calendar system used with the Gregorian calendar

    2. ISO-8601 sets up the possibility of a calendar for all time (if in agreement of the partners in information interchange). The Gregorian calendar begins on October 15th, 1582. ISO-8601 proleptically extends the Gregorian calendar back in time, including a year 0, prior to year 1. Year 0 is a leap year. And then negative years prior to year 0: year -1, year -2, etc. Year -4 is a leap year. Year -100 is not a leap year. ISO-8601 year 100 is not a leap year. Year 100 Julian is a leap year.

      Note that negative ISO-8601 years are susceptible to an "off by one" error with respect to B.C. years. Year 0 ISO-8601 is 1 B.C. Year -1 ISO-8601 is 2 B.C.

      My opinion: The ISO-8601 proleptic Gregorian calendar is a very sane system as different locales adopted the Gregorian calendar at different times. For example the US did not switch from Julian to Gregorian until 1752 (170 years after Italy). Specifying a {year, month, day} in a specific calendar (Julian, Gregorian, Islamic, Coptic, whatever) is easier and more precise than specifying {year, month, day} in a specific locale. Make each calendar proleptic, and create conversions between all calendars that can tolerate any date. That eliminates all ambiguities.

  14. The Gregorian calendar was put into effect starting October 15th, 1582. The time span until and including December 31st, 1582 is not included in ISO-8601 and requires proprietary extensions.

  15. an Interface; lets say "IDateTime" for both types "LocalDateTime" and "OffsetDateTime" would bring both worlds a bit closer.

  16. Your link "Asia/Calcutta to Asia/Kolcata" (http://comments.gmane.org/gmane.comp.time.tz/3500) is dead (Error 523 "Origin is unreachable")

    Replies
    1. Thanks - I've updated it to an archived copy of the thread.

  17. > a type which is readily identifiable as "just a date"

    I always had trouble with this notion of a C# DateTime being "just a date". What does that even mean?

    From the perspective of points in time (i.e. "instants") one might infer that "2019-09-17" means the start of the day, which is "2019-09-17 00:00:00". But suddenly timezone information becomes relevant again and we are not talking about "just a date".

    The other option is that the date is supposed to mean the whole day, so we are actually talking about a _timespan_ from "2019-09-17 00:00:00" to "2019-09-17 23:59:59" and timezone information cannot be ignored either.

    Why do developers think that there even exists a sound concept of something that is "just a date"? I can't find any use case where this model is not glossing over important details.

    Replies
    1. Do you give people a different birth date for yourself depending on what time zone you happen to be in at any given time?

    2. All mechanisms for storing a time are actually storing an interval of time at whatever precision the storage works at. If you have a timestamp class with millisecond precision then each possible value represents a millisecond-wide interval of time. A "Date" class, then, would be a timestamp with its precision set to a whole day. Using a DateTime to represent a Date means having to explicitly treat 86,400,000 (give or take 3.6 million depending on daylight savings, or 1000 when there's a leap second) unique DateTime values as equal.

      Possibly what we really need is to embrace the fact that time is continuous and introduce a TimeInterval class instead of trying to capture individual instants. Instead of testing for equality, a TimeInterval would be queried to see if another interval fits inside it, whether they overlap, etc. An "instant" would generally be an interval with a width of zero (or whatever our platform's smallest resolution can be).

      I've only just started thinking about this. Still plenty of ideas to work through about how an instant should be structured... Should it store two timestamps as the start and end times, or a start time and a duration? Maybe an inclusive start time and an exclusive end time (ie, the first timestamp that is NOT part of the range) so that it works like the [0..2] range operator?

  18. DateTimeKind is your leading point of contention with a DateTime? Really?

    DateTimeKind is set when you construct a DateTime object. It is set correctly to one of the other values if you give it something that is specific. If you give it a vague string to parse and you don't set this, OF COURSE it isn't going to know what to do with it. This is on you, and not a problem with DateTime.

    I am not sure that building a library because you don't understand GIGO is a good solution.

    Replies
    1. I understand GIGO perfectly well. That doesn't mean that designing one type that can be used incorrectly *really* easily (and where code that uses it properly needs conditions all over the place) is a good idea.

      9 years on, I'm still *very* happy to have written Noda Time, and pretty much everyone I've spoken to who's tried it has liked it too.

      If you're happy using DateTime in your codebase, that's fine... but I'm pretty certain the equivalent code using NodaTime would be clearer and less error-prone.

  19. I was using NodaTime because for us it was necessary to correctly calculate instant time given local time (which might be on Daylight Saving) and latitude/longitude. No way I could do it without NodaTime.

  20. Let me re-frame my critique for you.

    I have an engineer who sings Noda's praises. I have written code for more than a few decades that needs to be timezone aware, and the only problem I have ever had is inexperienced devs who assume that local time is sufficient for everything. So I need to figure out what Noda is and what problem it solves.

    I come here, and I see your opening argument is that DateTime.Kind is set to a vague value when you initialize it with insufficient data. It seems like a pretty weak argument to encumber my code base with yet another library.

    My comment was not intended to be a 'You suck, turn in your compiler' imperative, but more of a, 'Your arguments about why Noda is essential are a little weak'. (My engineer was a little vague on the problems Noda will prevent, too. That's why I came looking...)

    It is nine plus years in since you wrote this, and I know who Jon Skeet is and that he understands GIGO (Shots fired!). I'm just looking for the best, most up to date argument, that sells someone on why Noda will make times and dates easier than vanilla C# data structures.

    Pun fully intended, Thanks for taking the time to respond...

  21. I've reread the post, and I stand by it. I don't think it's a weak argument - whereas I do think your somewhat ad hominem final paragraph in your first comment before was a disrespectful way of expressing yourself.

    If you really don't think there's anything wrong with a single type expressing the different kinds of date/time - that that's a perfectly fine design - then I think we'll just have to agree to disagree.

  22. "whereas I do think your somewhat ad hominem final paragraph in your first comment before was a disrespectful way of expressing yourself."

    Well then, I do apologize. Text messages are a poor substitute for real conversation. The last line was intended to be read as a more playful jab, given your status on SO as a well known veteran coder, rather than a mean spirited attack.
