Parsing Date Strings with Time Zones in C++

codejockey

I am writing an app where I have to parse a date string (in pretty
much any format). The string can include time zones in either numeric
or abbreviated form. The output should be in a standard format like
ISO, or in a numeric format that captures the number of seconds since
1 January 1970 (or preferably 1900).

I am currently using strptime, running it over a number of formats in
turn. But strptime does not seem to have good support for parsing time
zones (especially time zones in abbreviated form).

I have tried Boost, but did not have much luck in getting time zones
parsed.

I did look at ICU, but could not even get it to parse properly. (I
might not have done a thorough job here.)

Can anybody tell me which library would be a good fit for my needs?
 
Öö Tiib

No. A requirement to parse date/time text in pretty much any format,
language, style and local convention takes a human operator or
translator, who may also fail. "Half past 9 in the morning on the fifth
of May last year in London", translated to Turkish and then parsed? No
way. You should tighten the requirements: state exactly which formats
you need to parse and how customizable the parsing should be.
 
codejockey

Ok, let me reduce the requirements.

If I have to parse only the abbreviated time zone strings in C++, how
do I do it?
Preferably, I would like the function to interpret the time zone and
give me at least the offset from GMT.

Or even simpler:
I would like to parse the string below in C++ and get the equivalent
time in GMT.
Tue, 08 May 2007 15:04:23 (IST)
 
James Kanze

On Oct 6, 7:38 am, Öö Tiib wrote:

[...]
If i have to parse only the abbreviated timezone strings in c++ how do
i do it?
Preferrably, i would like the function to interpret the timezone and
provide me the offset to gmt at least.
Or even simpler:
I would like to parse the below string in c++ and get the time
equivalent in gmt.
Tue, 08 May 2007 15:04:23 (IST)

The answer is that it's almost impossible: IST, for example, is
ambiguous, and can be UTC+01 (Irish summer time), UTC+02
(Israeli standard time), UTC+0330 (Iran standard time) or
UTC+0530 (Indian standard time).
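
To make the ambiguity concrete, here is a minimal sketch, assuming a hand-built table. The function name candidate_offsets and the table contents are illustrative only (the four IST rows are the readings listed above), and nothing here comes from a real time zone database. A lookup returns every offset the abbreviation might mean, rather than pretending there is a single answer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative table only: each abbreviation maps to every UTC offset
// (in minutes) it may stand for. The IST rows are the four readings
// named in the post; a real program would have to decide which apply.
std::vector<int> candidate_offsets(const std::string& abbrev) {
    static const std::multimap<std::string, int> table = {
        {"IST", 60},   // Irish summer time
        {"IST", 120},  // Israeli standard time
        {"IST", 210},  // Iran standard time
        {"IST", 330},  // Indian standard time
        {"CET", 60},   // central European time
    };
    std::vector<int> result;
    const auto range = table.equal_range(abbrev);
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;  // empty if the abbreviation is unknown
}
```

A caller that gets more than one candidate back needs extra context (sender, locale, configuration) to choose between them.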

Once you've chosen which short forms you want to support, and
what you want them to mean, you can create some sort of database
to look them up. (I'd suggest a configuration file, with
name and offset in minutes in two columns, and a std::map
internally.) But if you want the actual offset, you'll need
more than that; you'll need some sort of way of determining if
and when summer time applies. (This leads to a four column
table: name, winter time offset, summer time offset, and an
identifier of the rule used to change.) And you still have the
problem that not all jurisdictions using the same timezone use
the same rules for summer time, so the results still might be
ambiguous.
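
A rough sketch of the two-column lookup, applied to the example string from the question. Everything named here (zone_table, days_from_civil, parse_to_utc_seconds) is hypothetical, the table is hard-coded rather than read from a configuration file, and the choice of +0530 for IST is an assumption made purely for illustration. No summer-time handling is attempted.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <map>
#include <string>

// Hypothetical two-column table: abbreviation -> offset from UTC in minutes.
// Which meaning each name gets is an application decision; "IST" is mapped
// to Indian standard time here only as an example.
static const std::map<std::string, int>& zone_table() {
    static const std::map<std::string, int> table = {
        {"UTC", 0}, {"GMT", 0}, {"CET", 60}, {"EST", -300}, {"IST", 330},
    };
    return table;
}

// Days from 1970-01-01 to the given Gregorian date (month 1..12).
static long long days_from_civil(long long y, int m, int d) {
    if (m <= 2) --y;  // count January and February with the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const long long yoe = y - era * 400;                                    // [0, 399]
    const long long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + doe - 719468;
}

// Parses text shaped like "Tue, 08 May 2007 15:04:23 (IST)" and stores the
// seconds since 1970-01-01 00:00:00 UTC. Returns false if the text does not
// match that shape or the abbreviation is not in the table. The weekday
// name is skipped, not checked, and field ranges are not validated.
bool parse_to_utc_seconds(const char* text, long long& seconds_out) {
    char month_name[4] = {0};
    char zone[16] = {0};
    int day = 0, year = 0, hour = 0, minute = 0, second = 0;
    if (std::sscanf(text, "%*3s, %d %3s %d %d:%d:%d (%15[^)])",
                    &day, month_name, &year, &hour, &minute, &second, zone) != 7) {
        return false;
    }
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char* hit =
        std::strlen(month_name) == 3 ? std::strstr(months, month_name) : nullptr;
    if (hit == nullptr || (hit - months) % 3 != 0) return false;
    const int month = static_cast<int>((hit - months) / 3) + 1;

    const auto it = zone_table().find(zone);
    if (it == zone_table().end()) return false;

    const long long local = days_from_civil(year, month, day) * 86400LL
                          + hour * 3600LL + minute * 60LL + second;
    seconds_out = local - it->second * 60LL;  // local time minus offset is UTC
    return true;
}
```

With this table, "Tue, 08 May 2007 15:04:23 (IST)" comes out as 09:34:23 UTC on the same day. Map IST to a different row and the answer changes, which is exactly the ambiguity problem.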
 
Goran

Ok, let me reduce the requirements.

If i have to parse only the abbreviated timezone strings in c++ how do
i do it?
Preferrably, i would like the function to interpret the timezone and
provide me the offset to gmt at least.

You actually need to know UTC from local time, don't you? If so, and
setting aside any parsing considerations...

What do you consider a time zone? If "one of 24 "Earth" time zones",
forget it. You have countries/regions that do not respect "Earth" time
zones (e.g. have an offset of 5h 45min etc). But that's actually the
easy part. You have countries/regions, within the same zone, that use
different daylight saving rules. To get to UTC from local time in a
given time zone, you need to know whether your local time falls into
winter or summer time. On top of everything, winter/summer time
changes over time (you have "rules" of the sort: DST switch occurs on
last Saturday of October from 1955-2005, on first Sunday of November
in 2006, etc).

You can get these rules from the system (e.g. through tzdata under
Linux), but applying them to a given local time is another matter.
Standard library isn't of much help, because AFAIK it's made to work
with one time zone. I don't know whether there's a comprehensive
enough library to help, and I have been looking (I need to do
something similar for work - calculate UTC moments of DST switches for
a given time zone for some years in the future, and I do it "by
hand").
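
As an illustration of the "by hand" part, here is a small sketch that turns a rule such as "last Saturday of October" or "first Sunday of November" into a calendar date for a given year. The function names are made up for this example. It only finds the day of the month: the actual rules, the switch time of day and the offsets would still have to come from tzdata or a similar source.

```cpp
#include <cassert>

// Day of week for a Gregorian date: 0 = Sunday ... 6 = Saturday
// (Sakamoto's method).
int weekday(int y, int m, int d) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3) y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

// Number of days in the given month (1..12), leap years included.
int days_in_month(int y, int m) {
    static const int len[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && leap) ? 29 : len[m - 1];
}

// Day of month of the first occurrence of weekday wday in the month.
int first_weekday_of_month(int y, int m, int wday) {
    return 1 + (wday - weekday(y, m, 1) + 7) % 7;
}

// Day of month of the last occurrence of weekday wday in the month.
int last_weekday_of_month(int y, int m, int wday) {
    const int last = days_in_month(y, m);
    return last - (weekday(y, m, last) - wday + 7) % 7;
}
```

A per-zone rule table would then pick which of these helpers to call, and with which month and weekday, depending on the year being converted.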

IOW, you don't know what you are getting yourself into, and parsing is
the least of your worries ;-).

Goran.
 
Jorgen Grahn

You actually need to know UTC from local time, don't you? If so, and
setting aside any parsing considerations...

What do you consider a time zone? If "one of 24 "Earth" time zones",
forget it. You have countries/regions that do not respect "Earth" time
zones (e.g. have an offset of 5h 45min etc).

I think it's pretty well-known that there are more than 24 zones. At
least the POSIX APIs use a higher resolution than hours, and make a
point of documenting that.

I thought that these zones *did* have standard names. Look at e.g. RFC
2822 to see what internet mail headers support -- the RFCs certainly
have a few sets of standard names, and I don't think IETF invented
them themselves.

But that's actually the
easy part. You have countries/regions, within the same zone, that use
different daylight savings. To get to the UTC from local time in a
given time zone, you need to know whether your local time falls into
winter or summer time. On top of everything, winter/summer time
changes over time (you have "rules" of the sort: DST switch occurs on
last Saturday of October from 1955-2005, on first Sunday of November
in 2006, etc).

You can get these rules from the system (e.g. through tzdata under
Linux), but applying them to a given local time is another matter.
Standard library isn't of much help, because AFAIK it's made to work
with one time zone.

I *think* that on Linux you can at least say "I have this time in my
current time zone; please give me UTC time" and get the right answer
(except for times that happen during the DST-duplicated hour in fall
and the lost hour in spring) by passing -1 as the TZ in some system
call (I forget which, am too lazy to look it up, and too lazy to check
if POSIX makes guarantees here).

/Jorgen
 
James Kanze


[...]
I *think* that on Linux you can at least say "I have this time in my
current time zone; please give me UTC time" and get the right answer
(except for times that happen during the DST-duplicated hour in fall
and the lost hour in spring) by passing -1 as the TZ in some system
call (I forget which, am too lazy to look it up, and too lazy to check
if POSIX makes guarantees here).

As has been pointed out, it's simply not possible, even if the
manual claims to do it. To determine the right answer, you have
to know whether summer time is in effect or not, and to know
that, you have to know the jurisdiction, in addition to the time
zone.
 
Jorgen Grahn


[...]
I *think* that on Linux you can at least say "I have this time in my
current time zone; please give me UTC time" and get the right answer
(except for times that happen during the DST-duplicated hour in fall
and the lost hour in spring) by passing -1 as the TZ in some system
call (I forget which, am too lazy to look it up, and too lazy to check
if POSIX makes guarantees here).

I was thinking of mktime(3), and passing -1 in struct tm::tm_isdst.
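
For reference, a minimal sketch of that call. The wrapper name local_fields_to_time_t is made up for this example. It only covers the process's own current time zone (whatever TZ is set to), so it says nothing about a zone named inside the string being parsed.

```cpp
#include <cassert>
#include <ctime>

// Interprets the fields as local time in the process's current time zone
// and returns the matching time_t, or -1 if the C library cannot
// represent it.
std::time_t local_fields_to_time_t(int year, int month, int day,
                                   int hour, int minute, int second) {
    std::tm fields = std::tm();
    fields.tm_year = year - 1900;  // tm_year counts from 1900
    fields.tm_mon = month - 1;     // tm_mon is 0-based
    fields.tm_mday = day;
    fields.tm_hour = hour;
    fields.tm_min = minute;
    fields.tm_sec = second;
    fields.tm_isdst = -1;  // ask mktime to work out whether DST is in effect
    return std::mktime(&fields);
}
```

Passing the result to std::gmtime gives the UTC fields. Times inside the duplicated or skipped DST hour remain ambiguous, as noted above.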

As has been pointed out, it's simply not possible, even if the
manual claims to do it. To determine the right answer, you have
to know whether summer time is in effect or not, and to know
that, you have to know the jurisdiction, in addition to the time
zone.

I admit I read the thread sloppily ... Are you essentially saying you
cannot convert the brief timezone info in a textual timestamp to
system timezone info, because the former may cover many jurisdictions?

(If you *do* know the jurisdiction you should be OK. Timezone info (on
my machine) *does* have knowledge of plenty of jurisdictions;
/usr/share/zoneinfo is full of (among other things) DST rules for
places like restricted parts of Tasmania, a hut on the Bailey
Peninsula of Antarctica and so on.)

/Jorgen
 
James Kanze

[...]
I *think* that on Linux you can at least say "I have this
time in my current time zone; please give me UTC time" and
get the right answer (except for times that happen during
the DST-duplicated hour in fall and the lost hour in
spring) by passing -1 as the TZ in some system call (I
forget which, am too lazy to look it up, and too lazy to
check if POSIX makes guarantees here).

I was thinking of mktime(3), and passing -1 in struct tm::tm_isdst.

I admit I read the thread sloppily ... Are you essentially
saying you cannot convert the brief timezone info in a textual
timestamp to system timezone info, because the former may
cover many jurisdictions?

Exactly.

(If you *do* know the jurisdiction you should be OK. Timezone info (on
my machine) *does* have knowledge of plenty of jurisdictions;
/usr/share/zoneinfo is full of (among other things) DST rules for
places like restricted parts of Tasmania, a hut on the Bailey
Peninsula of Antarctica and so on.)

If you know the jurisdiction (and have all of the data files
installed), you're OK, but unlike the time zone, the
jurisdiction typically isn't part of the time and date string.

In the US, I believe that it is common to use things like EDT
instead of EST when you're in summer time. This is probably the
best approach (the sender knows whether he is in summer time or
not), but it's far from universal---most of the time, in Europe,
I've seen CET (central European time) used, regardless of
whether we're in summer time or not. (The Wikipedia gives CEST
and CEDT, but I've never seen either in actual use. I have
seen, occasionally, software which claims EET for France when
we're in summer time, but this seems to be the exception rather
than the rule. And leads to further confusion: is this EET
somewhere in eastern Europe, with summer time in effect, so
+0300, or is central Europe faking it, and +0200.)
 
Rui Maciel

codejockey said:
I am writing an app where i have to parse a date/string (pretty much
in any format). The string can include timezones both in numeric or
abbreviated form. The output should be in a standard format like ISO
or in a numeric format which captures no of seconds from 1970 Jan 1st
(or preferrably 1900).

For an ISO standard describing the representation of dates and times you should check out ISO 8601.


Rui Maciel
 
Öö Tiib

For an ISO standard describing the representation of dates and times you should check out ISO 8601.

Note that there are no time zones in ISO 8601 formats; they can only
give a time offset from Zulu time (or the Zulu time suffix "Z").
 
Öö Tiib

(If you *do* know the jurisdiction you should be OK. Timezone info (on
my machine) *does* have knowledge of plenty of jurisdictions;
/usr/share/zoneinfo is full of (among other things) DST rules for
places like restricted parts of Tasmania, a hut on the Bailey
Peninsula of Antarctica and so on.)

No. You are not OK. Jurisdictions change, as do their rules about
DST. Even if you know the jurisdiction and its present DST rules, you
may still fail when parsing a date from 1985 if you don't know what
the DST rules there were at the time.

Overall, all these abbreviations are pointless. "-06" is a lot better
than "CST" or "Central Standard Time", "-05" is a lot better than "CDT"
or "Central Daylight Time", and saying briefly that it was in Chicago
is outright asking for trouble. Even if the jurisdiction of that
village hasn't changed recently, it might in the next ten years (while
the recorded time may still be important).
 
James Kanze

On 9 Oct, 10:32, Jorgen Grahn wrote:

[...]
No. You are not OK. Jurisdictions do change as do their rules about
DST. Even if you know jurisdiction and present rules about DST for
that jurisdiction you may still fail if you lack information what were
DST rules there at 1985 when parsing such a date.

He'll be OK for any date in the present, and the not too distant
future. And as the rules change, his system will automatically
update its database with the new rules (not deleting the old
ones, which will still be used for past dates).

Overall all these abbreviations are pointless. "-06" is lot better
than "CST" or "Central Standard Time", "-05" is lot better than "CDT"
or "Central Daylight Time" and saying briefly that it was in Chicago
is outright asking for trouble. Even if jurisdiction of that village
hasn't changed recently it might in next ten years (but recorded time
may still stay important).

Overall, UTC is really the only thing that works. Otherwise,
yes: [+-]0430 is better than an abbreviation. But even if all
software miraculously changed today, you'd still be stuck with
timestamps generated in the past. Pretending we can ignore
things like CET is just wishful thinking.
 
Jorgen Grahn

No. You are not OK. Jurisdictions do change as do their rules about
DST.

Well, it goes without saying that there's always *some* point where it
breaks down, doesn't it? If they announce tomorrow that .se will have
DST all year around, /someone/ will have to change something in
software.

(What are the alternatives, anyway?)

Even if you know jurisdiction and present rules about DST for
that jurisdiction you may still fail if you lack information what were
DST rules there at 1985 when parsing such a date.

It's feasible for the system to keep track of past rules[1]. I don't know
if they typically do, and frankly I don't care enough to check.

/Jorgen

[1] Since there's a lot of nitpicking going on here: I'd better
clarify that I'm talking about time_t times only, not entire weeks
getting misplaced in the 18th century, and other oddnesses which
more or less require a historian's help.
 
Öö Tiib

Well, it goes without saying that there's always *some* point where it
breaks down, doesn't it? If they announce tomorrow that .se will have
DST all year around, /someone/ will have to change something in
software.

(What are the alternatives, anyway?)

Instead of recording some "Stockholm time", just record raw Zulu time,
or local time with the offset given as "+01", "+0100" or "+01:00". Your
system can tell you the offset easily if it's smart enough to tell you
the same about a hut in Tasmania or whatever you had there. The
Greenwich meridian hopefully gets moved less often, so whatever parses
the result needs no tables, and no software needs to be modified
because of some foam-headed politicians. Possibly you have a mature
society there anyway, with sane politicians, but not everybody is so
lucky.

It's feasible for the system to keep track of past rules[1]. I don't know
if they typically do, and frankly I don't care enough to check.

They typically don't. Laws may demand that you keep records of, for
example, financial transactions long enough that it starts to matter.
Business keeps globalizing, and the more it spreads the worse it gets.
Places with "developing markets", and all the nonsense accompanying
them, are always especially hot. At first they want to see the same
thing that the wall clock shows. But if 5 years later the records do
not calculate to the same result as they did then, whose fault is it?
The sorry-ass developer's, of course, who can't parse.
 
Rui Maciel

Öö Tiib said:
Note that there are no time zones in ISO 8601 formats, there may be
given time offset from Zulu time (or Zulu time suffix "Z") by these
formats.

That isn't particularly relevant. If someone intends to parse representations of date/time then the
starting point should be a standardized definition of those representations. From that point that
person is free to do anything he feels like doing, such as extending that format as they see fit.
For example, it is quite possible to rely on ISO 8601 to represent the UTC and then use a suffix to
represent the time zones.


Rui Maciel
 
James Kanze

Well, it goes without saying that there's always *some* point where it
breaks down, doesn't it? If they announce tomorrow that .se will have
DST all year around, /someone/ will have to change something in
software.

Not in the software, but in the databases the software uses.
Databases which are, on most systems, automatically kept up to
date via the internet.

(What are the alternatives, anyway?)

It's feasible for the system to keep track of past rules[1]. I
don't know if they typically do, and frankly I don't care
enough to check.

Unix, and I'm pretty sure Windows, do, although I don't know how
far back in the past they go. (Since a Unix time_t can't
represent dates before 1970, there's no point in going further
back under Unix.)
 
James Kanze

On 10 Oct, 03:11, Jorgen Grahn wrote:

[...]
Just instead of recording some "Stockholm time" record raw
Zulu time or local with offset mentioned "+01" "+0100"
"+01:00".

The problem isn't what you send, it's what you receive. Have
you actually seen any systems behaving this way?

You're actually proposing yet another standard. Which has just
about 0% chance of being accepted, because the goal is to
present the time in a format that local people can read the
local time easily. The best we can hope for is local time, with
an indication along the lines UTC-0200.

[...]
It's feasible for the system to keep track of past rules[1].
I don't know if they typically do, and frankly I don't care
enough to check.
They typically don't.

Then Unix isn't typical. Do you have some concrete information
concerning Windows, or are you just speculating?

Laws may demand you keep records about for example financial
transactions long enough that it starts to matter.

The shortest term I've heard about such laws is 50 years. A lot
say "forever" (but of course, only for transactions after the
law passed). On the other hand, most such transactions only use
dates, not times, so it doesn't matter.
 
Öö Tiib

On 10 Oct, 03:11, Jorgen Grahn wrote:

    [...]
Just instead of recording some "Stockholm time" record raw
Zulu time or local with offset mentioned "+01" "+0100"
"+01:00".

The problem isn't what you send, it's what you receive.  Have
you actually seen any systems behaving this way?

I have no copy at hand, so I just humbly believe that it is as ISO
8601 specifies.

You're actually proposing yet another standard.  Which has just
about 0% chance of being accepted, because the goal is to
present the time in a format that local people can read the
local time easily.  The best we can hope for is local time, with
an indication along the lines UTC-0200.

Hmm? In ISO 8601 the suffix Z means UTC; otherwise you add
"±hh:mm", "±hhmm" or "±hh" at the end to indicate the offset. You may of course
omit the end but then who parses it has to know the time zone info.
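
A minimal sketch of parsing just that trailing designator, assuming it has already been split off from the rest of the timestamp. The function name parse_utc_offset is hypothetical, and only the ASCII '+' and '-' signs are accepted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Parses "Z", "+hh", "+hhmm" or "+hh:mm" (or the '-' variants) into
// minutes east of UTC. Returns false for anything else.
bool parse_utc_offset(const std::string& s, int& minutes_out) {
    if (s == "Z") {
        minutes_out = 0;
        return true;
    }
    if (s.size() < 3 || (s[0] != '+' && s[0] != '-')) return false;

    std::string digits;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const char c = s[i];
        if (c == ':' && i == 3 && s.size() == 6) continue;  // only in +hh:mm
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        digits += c;
    }
    if (digits.size() != 2 && digits.size() != 4) return false;

    const int hours = (digits[0] - '0') * 10 + (digits[1] - '0');
    const int mins =
        digits.size() == 4 ? (digits[2] - '0') * 10 + (digits[3] - '0') : 0;
    if (hours > 23 || mins > 59) return false;

    const int total = hours * 60 + mins;
    minutes_out = (s[0] == '-') ? -total : total;
    return true;
}
```

No table and no DST rules are involved: subtracting the parsed offset from the local time gives UTC directly.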

You may put " UTC-0200" there to indicate the offset, but then an ISO
8601 compatible parser does not parse it and it has to be parsed
separately. If the separate parser is expected to have a tz database,
then you may add something like " America/Noronha" instead of
" UTC-0200". In that case the parser has to look up the offset for
that date in the database. Plus the rest of the usual everyday
nuisances and bugs.

    [...]
It's feasible for the system to keep track of past rules[1].
I don't know if they typically do, and frankly I don't care
enough to check.
They typically don't.

Then Unix isn't typical.  Do you have some concrete information
concerning Windows, or are you just speculating?

Is there an Olson database present in Windows? No. HP-UX was also
bundled without one. I doubt that all the mobile devices currently
skyrocketing have it. The last time I dealt with time zones in
Windows, Windows had some home-grown time zone ID that was outright
ambiguous. Since zoneinfo is a public domain database, one may of
course pick it up and bundle it with one's software (and update it
with service packs). That is not convenient for every application,
but fine for a bigger special-purpose package. A raw offset is the
winner anyway, nothing to be done about it.
 
