PEP 321: Date/Time Parsing and Formatting

Discussion in 'Python' started by Gerrit Holl, Nov 17, 2003.

  1. Gerrit Holl

    Gerrit Holl Guest

    Posted with permission from the author.
    I have some comments on this PEP, see the (coming) followup to this message.

    PEP: 321
    Title: Date/Time Parsing and Formatting
    Version: $Revision: 1.3 $
    Last-Modified: $Date: 2003/10/28 19:48:44 $
    Author: A.M. Kuchling <>
    Status: Draft
    Type: Standards Track
    Content-Type: text/x-rst
    Python-Version: 2.4
    Created: 16-Sep-2003
    Post-History:


    Abstract
    ========

    Python 2.3 added a number of simple date and time types in the
    ``datetime`` module. There's no support for parsing strings in various
    formats and returning a corresponding instance of one of the types.
    This PEP proposes adding a family of predefined parsing functions for
    several commonly used date and time formats, and a facility for generic
    parsing.

    The types provided by the ``datetime`` module all have
    ``.isoformat()`` and ``.ctime()`` methods that return string
    representations of a time, and the ``.strftime()`` method can be used
    to construct new formats. There are a number of additional
    commonly-used formats that would be useful to have as part of the
    standard library; this PEP also suggests how to add them.


    Input Formats
    =======================

    Useful formats to support include:

    * `ISO8601`_
    * ARPA/`RFC2822`_
    * `ctime`_
    * Formats commonly written by humans such as the American
      "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
      "DD-Month-YYYY".
    * CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

    XXX The Perl `ParseDate.pm`_ module supports many different input formats,
    both absolute and relative. Should we try to support them all?

    Options:

    1) Add functions to the ``datetime`` module::

           import datetime
           d = datetime.parse_iso8601("2003-09-15T10:34:54")

    2) Add class methods to the various types. There are already various
       class methods such as ``.now()``, so this would be pretty natural.::

           import datetime
           d = datetime.date.parse_iso8601("2003-09-15T10:34:54")

    3) Add a separate module (possible names: date, date_parse, parse_date)
       or subpackage (possible names: datetime.parser) containing parsing
       functions::

           import datetime
           d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")


    Unresolved questions:

    * Naming convention to use.
    * What exception to raise on errors? ValueError, or a specialized exception?
    * Should you know what type you're expecting, or should the parsing figure
      it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
      but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.) Should
      there be an option to signal an error if a time is provided where
      none is expected, or if no time is provided?
    * Anything special required for I18N? For time zones?


    Generic Input Parsing
    =======================

    Is a strptime() implementation that returns ``datetime`` types sufficient?

    XXX if yes, describe strptime here. Can the existing pure-Python
    implementation be easily retargeted?


    Output Formats
    =======================

    Not all input formats need to be supported as output formats, because it's
    pretty trivial to get the ``strftime()`` argument right for simple things
    such as YYYY/MM/DD. Only complicated formats need to be supported; RFC2822
    is currently the only one I can think of.

    Options:

    1) Provide predefined format strings, so you could write this::

           import datetime
           d = datetime.datetime(...)
           print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?

    2) Provide new methods on all the objects::

           d = datetime.datetime(...)
           print d.rfc822_time()


    Relevant functionality in other languages includes the `PHP date`_
    function (Python implementation by Simon Willison at
    http://simon.incutio.com/archive/2003/10/07/dateInPython)


    References
    ==========

    .. _RFC2822: http://rfc2822.x42.com

    .. _ISO8601: http://www.cl.cam.ac.uk/~mgk25/iso-time.html

    .. _ParseDate.pm: http://search.cpan.org/author/MUIR/Time-modules-2003.0211/lib/Time/ParseDate.pm

    .. _ctime: http://www.opengroup.org/onlinepubs/007908799/xsh/asctime.html

    .. _PHP date: http://www.php.net/date

    Other useful links:

    http://www.egenix.com/files/python/mxDateTime.html
    http://ringmaster.arc.nasa.gov/tools/time_formats.html
    http://www.thinkage.ca/english/gcos/expl/b/lib/0tosec.html


    Copyright
    =========

    This document has been placed in the public domain.

    yours,
    Gerrit.

    --
    157. If any one be guilty of incest with his mother after his father,
    both shall be burned.
    -- 1780 BC, Hammurabi, Code of Law
    --
    Asperger Syndrome - a personal approach:
    http://people.nl.linux.org/~gerrit/
    Resist this cabinet:
    http://www.sp.nl/
     
    Gerrit Holl, Nov 17, 2003
    #1

  2. Gerrit Holl

    Paul Moore Guest

    Gerrit Holl <> writes:

    > Python 2.3 added a number of simple date and time types in the
    > ``datetime`` module. There's no support for parsing strings in various
    > formats and returning a corresponding instance of one of the types.
    > This PEP proposes adding a family of predefined parsing functions for
    > several commonly used date and time formats, and a facility for generic
    > parsing.


    I assume you're aware of Gustavo Niemeyer's DateUtil module
    (https://moin.conectiva.com.br/DateUtil)?

    I'm not 100% sure how the parser functionality fits in with this PEP.
    It seems to me that this PEP is more focused on parsing specifically
    formatted data (not something I need often) whereas Gustavo's function
    is about parsing highly general "human input" formats.

    As most of my date-parsing needs are for user input parameters and the
    like, I prefer Gustavo's module :)

    [After reading through this PEP and commenting, I'd say that my
    preference (which may not be Gustavo's!) would be to add dateutil to
    the standard library, with the following changes/additions:

    1. Add a dateutil.RFC822_FORMAT for output of RFC822-compliant dates.
    2. Extend dateutil.parser.parse to handle additional (CVS-style)
    possibilities - today, tomorrow, yesterday, things like that.
    3. Add dateutil.parser.strptime as a wrapper round time.strptime.

    I think that's all.]

    > * Formats commonly written by humans such as the American
    > "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
    > "DD-Month-YYYY".


    UK format DD/MM/YYYY is worth adding (in my UK-based opinion :)) But
    you can get all of these via strptime (wrapped to return a datetime
    value).

    > * CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)


    That would be nice. I assume it should be combined with a highly
    flexible parser, so that the same function that handles "tomorrow"
    will also handle "12-dec-2003". This would basically be like Gustavo's
    parser, but with extended functionality (Gustavo's doesn't handle
    things like "tomorrow").

    > 3) Add a separate module (possible names: date, date_parse, parse_date)
    > or subpackage (possible names: datetime.parser) containing parsing
    > functions::
    >
    > import datetime
    > d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")


    I'd go for this option. Actually, I'd support including Gustavo's
    dateutil module in the standard library. This PEP then involves adding
    a number of additional (specialised) parsers to the dateutil.parser
    subpackage.

    > * What exception to raise on errors? ValueError, or a specialized exception?


    ValueError seems perfectly adequate.

    > * Should you know what type you're expecting, or should the parsing figure
    > it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
    > but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.)


    I don't think that the functions should return a type which depends on
    the input (I'd push that as a general rule, but I've probably missed
    an obvious counterexample - never mind, I think it applies here
    regardless).

    > Should there be an option to signal an error if a time is provided
    > where none is expected, or if no time is provided?


    I think that returning a datetime always, with a zero time component
    when no time is specified, should be enough. You can use the date()
    method of datetime instances to get just the date part if you want it.
    But this is something that should be prototyped - real-world use is
    far more important here than theoretical considerations.
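
    As a rough illustration of the "always return a datetime" behaviour, here is a minimal sketch. It assumes a current Python 3 (3.7 or later) rather than the 2.3/2.4 discussed in this thread, and ``parse_iso8601`` is only the hypothetical name used in the PEP, wrapped here around the standard ``datetime.fromisoformat``. Date-only input comes back with a midnight time, and ``.date()`` recovers just the date part.

```python
from datetime import datetime


def parse_iso8601(text):
    """Hypothetical helper: always return a datetime.

    Date-only input such as "2003-09-15" gets a zero (midnight) time
    component; malformed input raises ValueError.
    """
    return datetime.fromisoformat(text)


stamp = parse_iso8601("2003-09-15T10:34:54")  # full date and time
day = parse_iso8601("2003-09-15")             # time defaults to 00:00:00
date_only = day.date()                        # drop the time when not wanted
```

    The caller always gets one type back and decides afterwards whether the time component matters.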

    > * Anything special required for I18N? For time zones?


    Scary. Do we need to parse "21-janvier-2001"? Only if in a
    French-speaking locale?

    > Generic Input Parsing
    > =======================
    >
    > Is a strptime() implementation that returns ``datetime`` types sufficient?
    >
    > XXX if yes, describe strptime here. Can the existing pure-Python
    > implementation be easily retargeted?


    Not sufficient, but very useful. It effectively covers all of the
    fixed-format cases (with a suitable format string). And it does I18N,
    I believe (hard to tell in a UK locale...)

    Options:

    * class methods on the 3 datetime classes. This might be hard,
      because datetime is a C extension, and strptime is Python.
    * Modify strptime to return a datetime value rather than a
      struct_time. But this isn't backward compatible, and so is
      probably not on. Shame, as it feels like the right answer.
    * Have a new function in the time module. Either just a wrapper
      round strptime, or a modified strptime, with strptime changed
      into a wrapper round the new function. But a good name is going
      to be hard to come up with.
    * Add a new parameter to strptime (datetime=True or something).
      Ugly, and violates my "functions shouldn't return different
      types depending on their arguments" comment above.
    * A function in a new module - something like
      dateutil.parser.strptime, as a wrapper round time.strptime.
      (Excuse the subliminal advertising for Gustavo's module - change
      the name if you prefer :))

    > Output Formats
    > =======================
    >
    > Not all input formats need to be supported as output formats, because it's
    > pretty trivial to get the ``strftime()`` argument right for simple things
    > such as YYYY/MM/DD. Only complicated formats need to be supported; RFC2822
    > is currently the only one I can think of.


    An *output* format for RFC2822 compliant dates shouldn't be too hard,
    surely? Ah, I see what you mean. It's possible, but hard to
    *remember*, so it's best to define it somewhere. Good point.

    > Options:
    >
    > 1) Provide predefined format strings, so you could write this::
    >
    > import datetime
    > d = datetime.datetime(...)
    > print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?


    This is what I'd prefer. A module-level constant in a dateutil module
    would be fine for me, too.
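
    A module-level constant along these lines might look like the sketch below. ``RFC2822_FORMAT`` is a hypothetical name borrowed from the PEP, not an attribute that ``datetime`` actually provides, and the code assumes a current Python 3 rather than the 2.3 of this thread. One caveat: ``%a`` and ``%b`` follow the process locale, so the ``strftime`` route only yields the English names RFC 2822 requires under an English or "C" locale. The standard library's ``email.utils.format_datetime`` avoids that problem and is shown for comparison.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical constant: the name comes from the PEP, not from the stdlib.
RFC2822_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

d = datetime(2003, 11, 17, 22, 0, 38, tzinfo=timezone.utc)

# Route 1: strftime with the predefined format string.
# Day and month names depend on the current locale.
via_strftime = d.strftime(RFC2822_FORMAT)

# Route 2: the email package's formatter, which always uses English names.
via_email = format_datetime(d)
```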

    > 2) Provide new methods on all the objects::
    >
    > d = datetime.datetime(...)
    > print d.rfc822_time()


    Seems overkill. And I'd rather just have strftime for all date output
    formatting - one way of doing things, and all that.

    Paul.
    --
    This signature intentionally left blank
     
    Paul Moore, Nov 17, 2003
    #2

  3. Gerrit Holl wrote:
    > Posted with permission from the author.
    > I have some comments on this PEP, see the (coming) followup to this message.
    >
    > PEP: 321
    > Title: Date/Time Parsing and Formatting

    <<SNIP>>
    >
    > Abstract
    > ========
    >
    > Python 2.3 added a number of simple date and time types in the
    > ``datetime`` module. There's no support for parsing strings in various
    > formats and returning a corresponding instance of one of the types.
    > This PEP proposes adding a family of predefined parsing functions for
    > several commonly used date and time formats, and a facility for generic
    > parsing.
    >
    > The types provided by the ``datetime`` module all have
    > ``.isoformat()`` and ``.ctime()`` methods that return string
    > representations of a time, and the ``.strftime()`` method can be used
    > to construct new formats. There are a number of additional
    > commonly-used formats that would be useful to have as part of the
    > standard library; this PEP also suggests how to add them.
    >

    <<SNIP>>
    >
    > Unresolved questions:
    >
    > * Naming convention to use.
    > * What exception to raise on errors? ValueError, or a specialized exception?
    > * Should you know what type you're expecting, or should the parsing figure
    > it out? (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
    > but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.) Should
    > there be an option to signal an error if a time is provided where
    > none is expected, or if no time is provided?
    > * Anything special required for I18N? For time zones?
    >


    I am in favour of there being an intelligent 'guess the format'
    routine that would be easy to use, but maybe computationally
    inefficient, backed up by a computationally efficient routine where
    you specify the format. This latter case being split into two sub
    items: first where the parsing routine is passed a constant
    representing one of the standard formats and another where the parsing
    routine is passed a string representing the format.

    datetime.datetime.parse("1985-08-13 15:03")
    gives: datetime(1985, 8, 13, 15, 3)
    datetime.date.parse("1985-08-13 15:03")
    gives: date(1985, 8, 13) # You asked for the date; the date was found
    # first in the string and converted
    datetime.date.parse("13/08/1985", "%d/%m/%Y")
    gives: date(1985, 8, 13)
    datetime.datetime.parse("1985-08-13 15:03", datetime.ISO8601)
    gives: datetime(1985, 8, 13, 15, 3)

    The idea being for the parser to be able to automatically extract a
    date from one of the standard formats it knows, or to accept a
    strptime type string for unknown formats.
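
    One way such a routine could be put together is sketched below, assuming a current Python 3. The ``parse`` function and the ``KNOWN_FORMATS`` tuple are hypothetical names, not part of ``datetime``. With no format given it tries each known format in turn (the easy but slower path); with an explicit strptime-style format it makes a single attempt.

```python
from datetime import datetime

# Hypothetical list of "standard" formats the parser knows about.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
)


def parse(text, fmt=None):
    """Parse text with an explicit format, or try each known format."""
    candidates = (fmt,) if fmt is not None else KNOWN_FORMATS
    for candidate in candidates:
        try:
            return datetime.strptime(text, candidate)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("no matching date format for %r" % (text,))


guessed = parse("1985-08-13 15:03")            # format worked out by trial
explicit = parse("13/08/1985", "%d/%m/%Y")     # caller supplies the format
```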

    (Apologies to Gerrit for using values from his reply)

    Cheers, Paddy.
     
    Paddy McCarthy, Nov 18, 2003
    #3
  4. Gerrit Holl

    John Roth Guest

    "Paddy McCarthy" <> wrote in message
    news:...

    > I am in favour of there being an intelligent 'guess the format'
    > routine that would be easy to use, but maybe computationally
    > inefficient, backed up by a computationally efficient routine where
    > you specify the format.


    The trouble with "guess the format" is that it's not possible
    to do it correctly in the general case from one sample.
    Given enough samples of one consistent format, it's
    certainly possible. However, that's a two pass process.

    John Roth

    >
    > Cheers, Paddy.
     
    John Roth, Nov 18, 2003
    #4
  5. Gerrit Holl

    Gerrit Holl Guest

    Paul Moore wrote:
    > Gerrit Holl <> writes:
    > > Python 2.3 added a number of simple date and time types in the
    > > ``datetime`` module. There's no support for parsing strings in various
    > > formats and returning a corresponding instance of one of the types.
    > > This PEP proposes adding a family of predefined parsing functions for
    > > several commonly used date and time formats, and a facility for generic
    > > parsing.

    >
    > I assume you're aware of Gustavo Niemeyer's DateUtil module
    > (https://moin.conectiva.com.br/DateUtil)?


    I was not, actually. Thanks for the link.
    It looks like a very comprehensive library!
    The example actually calculates the next time my birthday falls on
    Friday the 13th :)

    > [After reading through this PEP and commenting, I'd say that my
    > preference (which may not be Gustavo's!) would be to add dateutil to
    > the standard library, with the following changes/additions:


    Sounds like a good idea.

    >
    > > * Formats commonly written by humans such as the American
    > > "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
    > > "DD-Month-YYYY".

    >
    > UK format DD/MM/YYYY is worth adding (in my UK-based opinion :)) But
    > you can get all of these via strptime (wrapped to return a datetime
    > value).


    I don't think so. One could just as well add the German "D.M.YY", and many
    others.

    > > 3) Add a separate module (possible names: date, date_parse, parse_date)
    > > or subpackage (possible names: datetime.parser) containing parsing
    > > functions::
    > >
    > > import datetime
    > > d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")

    >
    > I'd go for this option.


    It depends on how comprehensive it would be. Gustavo's DateUtil module
    does a lot more than this PEP suggests. For an implementation of this
    PEP, I think a separate module is not necessary. For DateUtil, I think
    it is.

    yours,
    Gerrit.

    --
    185. If a man adopt a child and to his name as son, and rear him, this
    grown son can not be demanded back again.
    -- 1780 BC, Hammurabi, Code of Law
    --
    Asperger Syndrome - a personal approach:
    http://people.nl.linux.org/~gerrit/
    Resist this cabinet:
    http://www.sp.nl/
     
    Gerrit Holl, Nov 18, 2003
    #5
  6. On Mon, 17 Nov 2003 22:00:38 +0000,
    Paul Moore <> wrote:
    > I'd go for this option. Actually, I'd support including Gustavo's
    > dateutil module in the standard library. This PEP then involves adding
    > a number of additional (specialised) parsers to the dateutil.parser
    > subpackage.


    Actually I think the PEP mostly evaporates, especially if verbal dates
    aren't covered. The common cases are then trivial with DateUtil, leaving
    only a few cases such as RFC-2822 times.

    --amk
     
    A.M. Kuchling, Nov 18, 2003
    #6
  7. Gerrit Holl

    Paul Moore Guest

    "John Roth" <> writes:

    > The trouble with "guess the format" is that it's not possible
    > to do it correctly in the general case from one sample.
    > Given enough samples of one consistent format, it's
    > certainly possible. However, that's a two pass process.


    I think you can do it with a hint or two. The key one is whether in
    ambiguous cases, you choose DD/MM or MM/DD. You need a second hint
    with 2-digit years, as 01-02-03 is *very* ambiguous (given that
    putting the year in the middle is insane, you only need a flag saying
    whether the year is at the start or the end).
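
    To make the two hints concrete, here is a small sketch, assuming a current Python 3. ``parse_numeric_date`` and its ``dayfirst`` / ``yearfirst`` flags are hypothetical names chosen for illustration. It only handles dash-separated, two-digit-year input such as 01-02-03, and it assumes that a leading year is followed by month and then day.

```python
from datetime import datetime


def parse_numeric_date(text, dayfirst=False, yearfirst=False):
    """Resolve an ambiguous NN-NN-NN date using two caller-supplied hints."""
    if yearfirst:
        fmt = "%y-%m-%d"      # year at the start: YY-MM-DD
    elif dayfirst:
        fmt = "%d-%m-%y"      # year at the end, day before month
    else:
        fmt = "%m-%d-%y"      # year at the end, month before day
    return datetime.strptime(text, fmt).date()


us_style = parse_numeric_date("01-02-03")
uk_style = parse_numeric_date("01-02-03", dayfirst=True)
iso_style = parse_numeric_date("01-02-03", yearfirst=True)
```

    The same string yields three different dates, which is exactly why the hints are needed.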

    I'm not sure what other ambiguities you'd need to cater for?

    Paul.
    --
    This signature intentionally left blank
     
    Paul Moore, Nov 18, 2003
    #7
  8. Gerrit Holl

    Paul Moore Guest

    "A.M. Kuchling" <> writes:

    > On Mon, 17 Nov 2003 22:00:38 +0000,
    > Paul Moore <> wrote:
    >> I'd go for this option. Actually, I'd support including Gustavo's
    >> dateutil module in the standard library. This PEP then involves adding
    >> a number of additional (specialised) parsers to the dateutil.parser
    >> subpackage.

    >
    > Actually I think the PEP mostly evaporates, especially if verbal dates
    > aren't covered. The common cases are then trivial with DateUtil, leaving
    > only a few cases such as RFC-2822 times.


    The PEP is pretty borderline, in any case. Not because the
    functionality isn't useful, but because most of it exists somewhere
    already. So the PEP is more of the form "now that we have datetime,
    consolidating the parsing stuff would be good".

    Specifically:

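    import time
    import email.Utils
    from datetime import datetime

    def dt_strptime(s, fmt):
        # Keep year..second from the struct_time and pass them positionally.
        tm = time.strptime(s, fmt)[:6]
        return datetime(*tm)

    def dt_rfc2822(s):
        # parsedate() returns a 9-tuple; the first six fields are year..second.
        tm = email.Utils.parsedate(s)[:6]
        return datetime(*tm)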

    This isn't to say that these are immediately obvious (it took me a
    while to realise that using the * form of call saved a horribly long
    and ugly constructor call)

    If this is worth doing, I'd have to say that time.strptime, and
    email.Utils.parsedate should get deprecated in favour of the "new
    forms". And I'm not sure I can see that being acceptable.

    I guess I'm -0 on the PEP as it stands. Incorporate it into a more
    general "date/time utilities" module, and I'm +1.

    Also, I'm -1 on adding anything to the datetime module itself (this
    includes adding more classmethods to the types). The module is clean,
    and lean as it stands. Bloating it (particularly under the banner of
    "it's more OO to keep the functions as part of the classes") doesn't
    appeal to me at all.

    Paul.
    --
    This signature intentionally left blank
     
    Paul Moore, Nov 18, 2003
    #8
  9. Gerrit Holl

    John Roth Guest

    "Paul Moore" <> wrote in message
    news:...
    > "John Roth" <> writes:
    >
    > > The trouble with "guess the format" is that it's not possible
    > > to do it correctly in the general case from one sample.
    > > Given enough samples of one consistent format, it's
    > > certainly possible. However, that's a two pass process.

    >
    > I think you can do it with a hint or two. The key one is whether in
    > ambiguous cases, you choose DD/MM or MM/DD. You need a second hint
    > with 2-digit years, as 01-02-03 is *very* ambiguous (given that
    > putting the year in the middle is insane, you only need a flag saying
    > whether the year is at the start or the end).
    >
    > I'm not sure what other ambiguities you'd need to cater for?


    Those are basically it. I've played around with doing "intelligent"
    parsing, and I'm absolutely against providing hints. If you're
    processing a file with dates all in one format, it will frequently
    give the wrong answers for a substantial number of them.
    In other words, hints don't give your program the capacity
    to learn from experience. Scanning a number of cases and
    noting which fields contained numbers > 31 (or 0), or numbers
    greater than 12, does.
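
    The scanning idea can be sketched roughly as follows, assuming a current Python 3. ``infer_day_first`` is a hypothetical helper. It makes a first pass over a batch of samples that share one format, looks only at the first two fields, and notes which position ever holds a number greater than 12. A second pass would then parse every sample with the order found.

```python
def infer_day_first(samples, sep="/"):
    """First pass: decide the day/month order from a batch of samples.

    Returns True if the first field must be the day, False if the
    second field must be the day, and None if the batch is undecided.
    """
    first_over_12 = False
    second_over_12 = False
    for sample in samples:
        first, second = sample.split(sep)[:2]
        if int(first) > 12:
            first_over_12 = True    # cannot be a month, so it is a day
        if int(second) > 12:
            second_over_12 = True
    if first_over_12 and not second_over_12:
        return True
    if second_over_12 and not first_over_12:
        return False
    return None  # no evidence, or contradictory evidence


uk_batch = infer_day_first(["01/02/2003", "13/08/1985"])
us_batch = infer_day_first(["01/02/2003", "08/13/1985"])
unknown = infer_day_first(["01/02/2003"])
```

    A single ambiguous sample stays undecided, which matches the point that one sample is not enough in the general case.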

    In any case, there are three formats, and you can't
    always depend on a delimiter to tell you where the year
    is for 8 digit inputs. Lots of the inputs I've seen have not
    had delimiters. On the other hand, a lot of them have
    been guaranteed to be in the late 19th century or later.
    That's a hint worth having.

    John Roth
    >
    > Paul.
    > --
    > This signature intentionally left blank
     
    John Roth, Nov 18, 2003
    #9