SimpleDateFormat Slow, Looking to Build or Find Faster One

Discussion in 'Java' started by Niko, Sep 16, 2003.

  1. Niko

    Niko Guest

    Hi,

    We are finding that SimpleDateFormat is pretty slow if you're trying to
    use it to parse millions of records. We improved on it by adding some
    caches in the code, for cases where things like the month stay the same,
    but all in all we find it to be a hog.

    For example, we can parse 20,000 records a second if they don't contain
    dates, but when you add dates this can drop to 4,000.

    So does anyone know of a good class out there before we go and
    build a faster one?

    TIA
     
    Niko, Sep 16, 2003
    #1

  2. Niko

    Roedy Green Guest

    On 15 Sep 2003 16:17:47 -0700, (Niko) wrote
    or quoted :

    >So does anyone know of a good class out there before we go and
    >build a faster one?


    You might want to look into BigDate if you are dealing only with Dates
    not date/timestamps. It has a couple of toString methods. You could
    roll your own on those models which should be much faster than
    SimpleDateFormat.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Sep 16, 2003
    #2

  4. "Niko" <> wrote in message
    news:...
    > Hi,
    >
    > We are finding that SimpleDateFormat is pretty slow if you're trying to
    > use it to parse millions of records. We improved on it by adding some
    > caches in the code, for cases where things like the month stay the same,
    > but all in all we find it to be a hog.
    >
    > For example, we can parse 20,000 records a second if they don't contain
    > dates, but when you add dates this can drop to 4,000.
    >
    > So does anyone know of a good class out there before we go and
    > build a faster one?


    If speed is the issue, you might want to consider turning the problem
    around. There are only about 365 days in a year, so over a 100-year period
    there are only about 36,500 distinct dates. Pre-format those that are most
    likely to be in your range of dates and put them in a hash table, or use a
    simple indexing method. This completely sidesteps expensive string
    formatting and is especially good if there are many repeated dates.
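A minimal sketch of the pre-formatting idea above, assuming dates arrive as epoch milliseconds and are rendered in UTC with the pattern yyyy-MM-dd. The class name PreformattedDates, the pattern, and the UTC assumption are illustrative choices, not something taken from the thread.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: formats every day in a range once, then answers by array index.
class PreformattedDates {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    private final long firstDayMillis; // midnight UTC of the first day in the table
    private final String[] table;      // one pre-formatted string per day

    PreformattedDates(int firstYear, int years) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setTimeZone(utc);

        Calendar cal = new GregorianCalendar(utc, Locale.ROOT);
        cal.clear();
        cal.set(firstYear, Calendar.JANUARY, 1);
        firstDayMillis = cal.getTimeInMillis();

        Calendar end = (Calendar) cal.clone();
        end.add(Calendar.YEAR, years);
        // UTC has no daylight saving, so every day is exactly MILLIS_PER_DAY long.
        int days = (int) ((end.getTimeInMillis() - firstDayMillis) / MILLIS_PER_DAY);

        table = new String[days];
        for (int i = 0; i < days; i++) {
            table[i] = fmt.format(cal.getTime()); // the slow call happens once per day
            cal.add(Calendar.DAY_OF_MONTH, 1);
        }
    }

    // Fast path: one subtraction, one division, one array read.
    String format(long epochMillis) {
        long index = Math.floorDiv(epochMillis - firstDayMillis, MILLIS_PER_DAY);
        if (index < 0 || index >= table.length) {
            throw new IllegalArgumentException("date outside pre-formatted range");
        }
        return table[(int) index];
    }

    public static void main(String[] args) {
        PreformattedDates dates = new PreformattedDates(1970, 100);
        System.out.println(dates.format(System.currentTimeMillis()));
    }
}
```

A plain array indexed by day number is used here instead of a hash table because the keys are dense; a HashMap would work as well but adds hashing and boxing on every lookup.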

    Cheers,
    Matt Humphrey http://www.iviz.com/
     
    Matt Humphrey, Sep 16, 2003
    #4
  5. Niko

    Niko Guest

    Thanks for the BigDate and the index lookup ideas. Unfortunately I'm
    working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
    thinking I could build two hash tables, one for the time and one for
    the date (ignoring the year), split the string, look up each part in
    its table, and adjust for the year.
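A rough sketch of the split-and-look-up idea, with two stated simplifications: the date part is cached whole (year included) rather than adjusted for year afterwards, and the time part is computed arithmetically instead of being looked up. The class name CachedDateTimeParser, the UTC assumption, and the exact input layout ("3rd Jun 1993 05:01:43") are assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical parser: SimpleDateFormat runs only the first time a given date part is seen.
class CachedDateTimeParser {
    private final Map<String, Long> dayCache = new HashMap<>();
    private final SimpleDateFormat dayFormat;

    CachedDateTimeParser() {
        dayFormat = new SimpleDateFormat("d MMM yyyy", Locale.ENGLISH);
        dayFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        dayFormat.setLenient(false);
    }

    // Returns epoch milliseconds (UTC) for input such as "3rd Jun 1993 05:01:43".
    long parse(String text) {
        int split = text.lastIndexOf(' ');
        String datePart = text.substring(0, split);

        Long dayMillis = dayCache.get(datePart);
        if (dayMillis == null) {
            // Slow path: drop the ordinal suffix, then let SimpleDateFormat do the work.
            String plain = datePart.replaceFirst("^(\\d+)(st|nd|rd|th)", "$1");
            try {
                dayMillis = dayFormat.parse(plain).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("bad date: " + datePart, e);
            }
            dayCache.put(datePart, dayMillis);
        }

        // Fast path for the time part: fixed layout HH:mm:ss, read digit by digit.
        int t = split + 1;
        long seconds = (twoDigits(text, t) * 60L + twoDigits(text, t + 3)) * 60L
                + twoDigits(text, t + 6);
        return dayMillis + seconds * 1000L;
    }

    int cacheSize() {
        return dayCache.size();
    }

    private static int twoDigits(String s, int i) {
        return (s.charAt(i) - '0') * 10 + (s.charAt(i + 1) - '0');
    }

    public static void main(String[] args) {
        CachedDateTimeParser parser = new CachedDateTimeParser();
        System.out.println(parser.parse("3rd Jun 1993 05:01:43"));
    }
}
```

This only pays off when many records share the same calendar day; with mostly unique days the cache adds overhead instead of removing it.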

     
    Niko, Sep 16, 2003
    #5
  6. Niko

    Roedy Green Guest

    On 16 Sep 2003 10:42:28 -0700, (Niko) wrote
    or quoted :

    >However, I was
    >thinking I could build two hash tables, one for the time and one for
    >the date (ignoring the year), split the string, look up each part in
    >its table, and adjust for the year.


    "adjust for year" means recreating the logic in BigDate.

    You could precompute the strings for the days over a period of five
    years, index into them to get the YYYYMMDD part, and then plop in the
    time part, but that is a rather big chunk of RAM.

    You could use BigDate to get the date part. You have to do the
    time part yourself, and you need a time zone adjustment.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Sep 16, 2003
    #6
  7. "Niko" <> wrote in message
    news:...
    > Thanks for the BigDate and the index lookup ideas. Unfortunately I'm
    > working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
    > thinking I could build two hash tables, one for the time and one for
    > the date (ignoring the year), split the string, look up each part in
    > its table, and adjust for the year.


    That's workable and equivalent to forming the string via indexed lookup, but
    with more lookup elements. Your lookup tables would be the day/month, the
    year, and the hour, minute and second (all three can share one table
    covering 0..59). Assemble them via a StringBuffer. There are 86,400 distinct
    one-second time stamps in a day, so that's a bit much for direct lookup.
    Part of what you're trying to avoid is the number-to-string conversion and
    the string assembly. This technique does not avoid the string assembly, but
    the number-to-string conversion is reduced to a table index.
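As an illustration of the shared 0..59 table, here is a small sketch for the time part only. The class and method names are made up for the example, and StringBuilder is used where the post says StringBuffer (it is the unsynchronized equivalent added in Java 5, after this thread).

```java
// Hypothetical helper: number-to-string conversion becomes an array lookup.
class TwoDigitTable {
    // "00".."59", shared by hours, minutes and seconds.
    private static final String[] TWO_DIGITS = new String[60];

    static {
        for (int i = 0; i < TWO_DIGITS.length; i++) {
            TWO_DIGITS[i] = (i < 10 ? "0" : "") + i;
        }
    }

    // Assembles HH:mm:ss from table entries; no Integer.toString at call time.
    static String formatTime(int hour, int minute, int second) {
        return new StringBuilder(8)
                .append(TWO_DIGITS[hour]).append(':')
                .append(TWO_DIGITS[minute]).append(':')
                .append(TWO_DIGITS[second])
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(formatTime(5, 1, 43));
    }
}
```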

    Another way to avoid string assembly is to arrange for the string to always
    have the same layout, e.g. 2 digits, th|rd, space, 3 letters, space,
    dd:dd:dd. This way you only allocate the buffer once and copy the elements
    to fixed places. But I really only suggest this after a run with a serious
    profiler.
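The fixed-layout idea might look like the sketch below, shown for the HH:mm:ss part only. Java strings are immutable, so the reusable piece here is a char[] template whose digit positions are overwritten for each record; the class name and the choice to append into a caller-supplied StringBuilder are assumptions for the example.

```java
// Hypothetical helper: one buffer allocated up front, digits written at fixed offsets.
class FixedLayoutTime {
    // Template with separators already in place; only the six digits ever change.
    private final char[] buf = "00:00:00".toCharArray();

    // Overwrites the digits and appends the 8 characters to the caller's output.
    StringBuilder appendTo(StringBuilder out, int hour, int minute, int second) {
        put(0, hour);
        put(3, minute);
        put(6, second);
        return out.append(buf);
    }

    private void put(int pos, int value) {
        buf[pos] = (char) ('0' + value / 10);
        buf[pos + 1] = (char) ('0' + value % 10);
    }

    public static void main(String[] args) {
        FixedLayoutTime time = new FixedLayoutTime();
        StringBuilder line = new StringBuilder();
        time.appendTo(line, 5, 1, 43);
        System.out.println(line);
    }
}
```

Because the buffer is mutable shared state, an instance like this is not safe to share between threads.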

    Cheers,
     
    Matt Humphrey, Sep 16, 2003
    #7
  8. Niko

    Jim Sculley Guest

    Niko wrote:
    > Hi,
    >
    > We are finding that SimpleDateFormat is pretty slow if you're trying to
    > use it to parse millions of records. We improved on it by adding some
    > caches in the code, for cases where things like the month stay the same,
    > but all in all we find it to be a hog.
    >
    > For example, we can parse 20,000 records a second if they don't contain
    > dates, but when you add dates this can drop to 4,000.


    On what hardware?

    >
    > So does anyone know of a good class out there before we go and
    > build a faster one?
    >


    Are you using SimpleDateFormat correctly? You should not create a new
    instance for each record. I get a throughput of about 150,000 calls to
    format() per second using an array of one million random dates.
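For reference, reusing one instance looks roughly like the sketch below. The pattern string, locale, and UTC time zone are assumptions for the example, not the poster's actual benchmark. Note that SimpleDateFormat is not thread-safe, so a shared instance should stay confined to one thread.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical example: the formatter is built once, outside the per-record loop.
class ReusedFormatter {
    static String[] formatAll(Date[] dates) {
        SimpleDateFormat fmt = new SimpleDateFormat("d MMM yyyy HH:mm:ss", Locale.ENGLISH);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        String[] out = new String[dates.length];
        for (int i = 0; i < dates.length; i++) {
            out[i] = fmt.format(dates[i]); // same instance for every record
        }
        return out;
    }

    public static void main(String[] args) {
        Date[] dates = { new Date(0L), new Date() };
        for (String s : formatAll(dates)) {
            System.out.println(s);
        }
    }
}
```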


    > TIA
     
    Jim Sculley, Sep 16, 2003
    #8
  9. Niko

    Chris Uppal Guest

    Niko wrote:

    > Thanks for the BigDate and the index lookup ideas. Unfortunately I'm
    > working with date-times, e.g. 3rd Jun 1993 05:01:43.


    I realise this probably won't help, but do you actually *have* to format all
    the dates? If you can arrange to keep them in their initial (non-String) form
    throughout, and only change them into strings when/if displayed to a user, then
    you can avoid the overhead that way. That might well be difficult, but not
    necessarily worse than messing around with faster parsing or complex caching
    schemes.

    -- chris
     
    Chris Uppal, Sep 17, 2003
    #9
  10. Niko

    Niko Guest

    I create one single instance, but when we look at the profiler we see a
    chunk of time spent in SimpleDateFormat. It may only be a few percent,
    but when you are loading a file with 50 fields and maybe 8 dates then
    you really start to see the chunk grow. We spent a long time
    optimizing other parts of the code, and even NIO showed no improvement
    over our enhanced buffered IO (though we prefer NIO as it reduces the
    amount of custom code), so it seems an awful shame to let
    SimpleDateFormat get away without being optimized.

    As for the source supplying pure dates, it can sometimes come like
    that, but the code is part of a data loading tool which is configurable
    for any data source that can come via Streams or Channels, and we only
    format the date for display at the very end. It's the parsing that
    takes the time, and creating a table with all known value sections
    doesn't scare us too much, as memory is cheap and this type of software
    runs on big boxes overnight.

     
    Niko, Sep 17, 2003
    #10
  11. Niko

    Wojtek Guest

    On 16 Sep 2003 10:42:28 -0700, (Niko)
    wrote:

    >Thanks for the BigDate and the index lookup ideas. Unfortunately I'm
    >working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
    >thinking I could build two hash tables, one for the time and one for
    >the date (ignoring the year), split the string, look up each part in
    >its table, and adjust for the year.


    Why not break the date string into parts using StringTokenizer, then
    evaluate each part, build the input for a Calendar object, and
    read the result from the Calendar object?

    3rd -> value of the number, ignore the text
    Jun -> lookup on 12 possibilities, less if you use progressive
    lookup (i.e. check the first letter, if no match check the second, if
    no match check the third)
    1993 -> value
    substring on the time
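The steps above could be sketched as follows, assuming English three-letter month names, UTC, and input shaped exactly like "3rd Jun 1993 05:01:43". The class name is made up, the month lookup is a plain linear scan rather than the progressive letter-by-letter check, and the time is split by tokenizing on ':' rather than by substring.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.StringTokenizer;
import java.util.TimeZone;

// Hypothetical hand-rolled parser: tokenize, convert each part, feed a reused Calendar.
class TokenizingDateParser {
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // One Calendar reused across calls; not safe to share between threads.
    private final Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));

    // Returns epoch milliseconds (UTC).
    long parse(String text) {
        StringTokenizer st = new StringTokenizer(text, " :");
        int day = leadingNumber(st.nextToken());      // "3rd" -> 3
        int month = monthIndex(st.nextToken());       // "Jun" -> 5 (Calendar months are 0-based)
        int year = Integer.parseInt(st.nextToken());
        int hour = Integer.parseInt(st.nextToken());
        int minute = Integer.parseInt(st.nextToken());
        int second = Integer.parseInt(st.nextToken());

        cal.clear(); // also resets the milliseconds field
        cal.set(year, month, day, hour, minute, second);
        return cal.getTimeInMillis();
    }

    // Reads the leading digits and ignores any trailing text such as "rd".
    private static int leadingNumber(String token) {
        int value = 0;
        for (int i = 0; i < token.length() && Character.isDigit(token.charAt(i)); i++) {
            value = value * 10 + (token.charAt(i) - '0');
        }
        return value;
    }

    private static int monthIndex(String name) {
        for (int i = 0; i < MONTHS.length; i++) {
            if (MONTHS[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown month: " + name);
    }

    public static void main(String[] args) {
        System.out.println(new TokenizingDateParser().parse("3rd Jun 1993 05:01:43"));
    }
}
```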

    ------------------------
    Wojtek Bok
    Solution Developer
     
    Wojtek, Sep 18, 2003
    #11
