U.S.-centric dates and localized languages

Bill whinges a bit about the fact that date literals in VB are expressed using the U.S. date format, which is (of course) different from what most of the world uses. It’s a legitimate gripe, but there’s not much to be done about it. The cardinal rule that we follow with the language is that the current locale settings of a machine should never affect the compiler’s output. Thus, everything expressed in the language that’s processed at compile-time has to be in one canonical format and cannot be locale-dependent. And, of course, somewhere back in the mists of time when the canonical format was chosen, it just happened to be the format of the home country of the software… (This is also why you can’t do string conversions in constant expressions – string conversions are locale sensitive.)
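To make the "one canonical format" rule concrete, here is a minimal sketch in Python (not VB, and not how the compiler is actually implemented). The helper name parse_canonical_literal is hypothetical, and it handles only the date part of a literal with a four-digit year. The point is simply that the month/day/year order is hard-wired and the machine's locale is never consulted.

```python
import re
from datetime import date

# Hypothetical helper for illustration only. It accepts a VB-style date
# literal such as "#1/24/2004#" and always reads it as month/day/year.
_LITERAL = re.compile(r"#\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*#")

def parse_canonical_literal(text: str) -> date:
    match = _LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a canonical date literal: {text!r}")
    month, day, year = (int(group) for group in match.groups())
    # date() raises ValueError for impossible values, e.g. month 24.
    return date(year, month, day)
```

Under these assumptions, "#1/24/2004#" parses to January 24th, 2004 on every machine, while "#24/1/2004#" is rejected everywhere rather than being reinterpreted.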

It’s true, though, that the IDE used to be much more permissive about what it accepted as a date literal that you typed in (and which it would then convert to U.S. format). When we moved to .NET, we stopped using VarBstrToDate to convert strings that were input from the IDE and instead started accepting only legal canonical date strings. I don’t exactly remember why it was we did that, though…

I will add that VarBstrToDate is a very, very, very scary piece of code. As Greg Low notes in the comments on Bill’s entry, the old COM date conversion routines (on which IsDate is based) will do some truly wacky things. During the year and a half that I owned OLE Automation, I had to spend a good bit of time in the code for that routine and the finite state machine that it uses to try and recognize dates. Rather than just accepting the current locale, it has all kinds of weird logic to try and guess what you meant based on what’s possible. So if you call the routine on “24/1/04” using the U.S. locale, it’ll still parse it as January 24th, 2004 because it figures that the “24” is most likely a day value and that you were entering it in dd/mm/yy format. On the other hand, if you call the routine on “1/3/04” using the U.S. locale, you’ll always get January 3rd, 2004. Scary stuff.
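The two examples above can be mimicked with a toy sketch in Python. This is emphatically not the real VarBstrToDate logic (which is a finite state machine with far more cases). The function name guess_date, the locale_order parameter, and the "add 2000 to a two-digit year" rule are all assumptions made for illustration. It tries the locale's field order first and quietly falls back to the other order only when the first is impossible.

```python
from datetime import date

def guess_date(text: str, locale_order: str = "mdy") -> date:
    """Toy heuristic: prefer the locale's order, fall back if it is impossible."""
    first, second, yy = (int(part) for part in text.split("/"))
    # Assumption: a simplistic pivot that maps two-digit years to 20xx.
    year = 2000 + yy if yy < 100 else yy
    if locale_order == "mdy":
        candidates = [(first, second), (second, first)]  # (month, day) pairs
    else:
        candidates = [(second, first), (first, second)]
    for month, day in candidates:
        try:
            return date(year, month, day)
        except ValueError:
            continue  # impossible in this order, so try the other one
    raise ValueError(f"cannot interpret {text!r} as a date")
```

With the default U.S. order, guess_date("24/1/04") falls back to day-first and yields January 24th, 2004, whereas guess_date("1/3/04") is valid as written and yields January 3rd, 2004. The unsettling part is that the same input style gets two different interpretations depending on which values happen to be possible.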

(I also heard tell that at one point back in the mists of Excel 5 days, the VB team experimented with allowing the language syntax to be localizable. So you could say, I don’t know, “Por x = 1 a 5 : Escribe x : Siguiente x”, if you’ll pardon my horrible Spanish. This turned out to be such a horrible idea that they quickly dropped it and never looked back. But supposedly the legacy lives on in OLE Automation and the fact that IDispatch takes a locale ID when converting names to DispIDs (i.e. so that you could have different method names for different languages). I can’t vouch for the absolute truthfulness of this story, but it’s one of those stories that’s “too good to check.”)

8 thoughts on “U.S.-centric dates and localized languages”

  1. If I understand it, the short form of what you are saying is this: "We made a bad problem worse last time, so this time you can just have the bad problem in its simple form, and screw you if you don’t like it."

    In my job, I work with hand written dates which have to be unambiguous internationally. The agreed format we use is dd/mmm/yy or dd/mmm/yyyy (eg 01-Jul-04), with the choice of separator optional. Why can’t the "canonical" format follow this approach?

  2. Not sure about language syntax localization, but I know that there were regional versions of Word prior to 2000 that used localized field codes. So, for instance, instead of a TOC field code in the Spanish version of Word, it was TDC (tabla de contenido, maybe?). Caused me no end of trouble when trying to code company-wide Office solutions for an international client.

  3. Greg: I’m just saying that the canonical format is just that… canonical. The choice was made a long time ago, and changing it at this point would be extremely difficult and disruptive, even if we could all agree on a format that’s better. It’s just the way things are.

    I’m going to check on the IDE question, since it’s possible something can be done there…

  4. How about extending the syntax to permit the ISO date format: yyyy/mm/dd? It was designed specifically for this purpose.

    1. It’s an interesting idea, Eric… If the dates were lexically distinct, it might work, although we’d need to play around with it to make sure it didn’t look too confusing.

  5. … you can verify the truthfulness of the last paragraph very easily by switching your locale setting to some non-usa country. Excel uses commas to separate arguments in formulas but when you have locale set to some european country it uses semicolons (the same english version of excel on the same computer). If this isn’t insane then I don’t know what is. The same goes for exporting a sheet to CSV, commas in english locale and semicolons in others. One would think that people at microsoft know that csv means COMMA separated values.

  6. I guess the canonical format at the moment is such a PIA that the "cure" (a breaking change) may not be worse than the current "disease". If Longhorn is about getting it right and accepting not being backwards compatible, then maybe a new date format could be addressed there.

  7. > I can’t vouch for the absolute truthfulness of this story

    I can. I saw versions of VBA when I was an intern in the early 1990s that had "Fin Sub" on French machines.

    And yes, that is exactly why the dispatch name resolution code takes an lcid — so that you can have one method with multiple names in different locales.

    Even though no one uses this feature, ironically some inconsistencies in the way that the .NET – COM interop layer sets the dispatch id and the way vba set it in Excel have led to a rather complex localization bug in Visual Studio Tools for Office that we need to address. That’s a subject for another day though!
