SimpleDateFormat Slow, Looking to Build or Find Faster One

Niko · Sep 16, 2003

Hi,

We are finding that SimpleDateFormat is pretty slow if you're trying to
use it to parse millions of records. We improved on it by adding some
caches in the code (for cases where things like the month are the same,
and so on), but overall we find it to be a hog.

For example, we can parse 20,000 records a second if they don't contain
dates, but when you add dates this can drop to 4,000.

So does anyone know of a good class out there, before we go and
build a faster one?

TIA

Roedy Green · Sep 16, 2003

So does anyone know of a good class out there, before we go and
build a faster one?

You might want to look into BigDate if you are dealing only with Dates
not date/timestamps. It has a couple of toString methods. You could
roll your own on those models which should be much faster than
SimpleDateFormat.

Roedy Green · Sep 16, 2003

You might want to look into BigDate

see http://mindprod.com/jgloss/bigdate.html

Matt Humphrey · Sep 16, 2003

Niko said:
Hi,

We are finding that SimpleDateFormat is pretty slow if you're trying to
use it to parse millions of records. We improved on it by adding some
caches in the code (for cases where things like the month are the same,
and so on), but overall we find it to be a hog.

For example, we can parse 20,000 records a second if they don't contain
dates, but when you add dates this can drop to 4,000.

So does anyone know of a good class out there, before we go and
build a faster one?

If speed is the issue, you might want to consider turning the problem
around. There are only 365 or 366 days in a year, so over a 100-year
period there are only about 36,500 distinct dates. Pre-format those that
are most likely to be in your range of dates and put them in a hash
table, or use a simple indexing method. This completely sidesteps the
expensive string formatting and is especially good if there are many
repeated dates.
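
A minimal sketch of the pre-formatting idea, using the "simple indexing
method" rather than a hash table. The class name PreformattedDates, the
UTC time zone, and the pattern passed in are assumptions made for
illustration; nothing here comes from the original poster's code.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper: formats every day in a fixed range once, then answers by array index. */
final class PreformattedDates {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    private final long baseMillis;  // midnight UTC of the first cached day
    private final String[] table;   // table[i] = text for (first day + i days)

    PreformattedDates(int firstYear, int years, String pattern) {
        TimeZone utc = TimeZone.getTimeZone("UTC");  // assumption: UTC, so every day is 24 hours
        GregorianCalendar cal = new GregorianCalendar(utc, Locale.ENGLISH);
        cal.clear();
        cal.set(firstYear, Calendar.JANUARY, 1);
        baseMillis = cal.getTimeInMillis();
        cal.set(firstYear + years, Calendar.JANUARY, 1);
        int days = (int) ((cal.getTimeInMillis() - baseMillis) / MILLIS_PER_DAY);

        // The slow formatter runs once per day in the range, not once per record.
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ENGLISH);
        fmt.setTimeZone(utc);
        table = new String[days];
        for (int i = 0; i < days; i++) {
            table[i] = fmt.format(new Date(baseMillis + i * MILLIS_PER_DAY));
        }
    }

    /** Returns the cached text for the UTC day containing epochMillis, or null if out of range. */
    String format(long epochMillis) {
        long index = Math.floorDiv(epochMillis - baseMillis, MILLIS_PER_DAY);
        return (index >= 0 && index < table.length) ? table[(int) index] : null;
    }
}
```

A caller would build the table once, for example
new PreformattedDates(1990, 10, "dd MMM yyyy"), and fall back to a
regular formatter whenever format() returns null.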

Cheers,
Matt Humphrey http://www.iviz.com/

Niko · Sep 16, 2003

Thanks for the BigDate and index lookup ideas. Unfortunately, I'm
working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
thinking I could build two hash tables, one for the time and one for
the date (ignoring the year), then split the string, look up each part
in its table, and adjust for the year.
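
One way the split-and-lookup idea might look for parsing, as a rough
sketch only. It departs from the description above in two ways, both
assumptions: the date table is keyed on the whole date part including
the year (so no year adjustment is needed), and the time part is
computed arithmetically instead of being looked up. It also assumes
input like "03 Jun 1993 05:01:43" without the ordinal suffix, since
SimpleDateFormat has no pattern letter for "rd", and it assumes UTC.
The class name SplitCacheParser is hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

/** Hypothetical parser: caches the date part, computes the time part by arithmetic. */
final class SplitCacheParser {
    private final SimpleDateFormat dateOnly = new SimpleDateFormat("dd MMM yyyy", Locale.ENGLISH);
    private final Map<String, Long> midnightByDate = new HashMap<>();

    SplitCacheParser() {
        // Assumption: UTC. In a zone with daylight saving, "midnight + offset" can be wrong.
        dateOnly.setTimeZone(TimeZone.getTimeZone("UTC"));
        dateOnly.setLenient(false);
    }

    // Reads two decimal digits at pos; no validation, to keep the sketch short.
    private static int twoDigits(String s, int pos) {
        return (s.charAt(pos) - '0') * 10 + (s.charAt(pos + 1) - '0');
    }

    /** Parses "dd MMM yyyy HH:mm:ss" and returns milliseconds since the epoch. */
    long parse(String text) {
        int split = text.lastIndexOf(' ');
        if (split < 0) {
            throw new IllegalArgumentException("no time part in: " + text);
        }
        String datePart = text.substring(0, split);
        Long midnight = midnightByDate.get(datePart);
        if (midnight == null) {
            // Cache miss: pay for SimpleDateFormat once per distinct date.
            try {
                midnight = dateOnly.parse(datePart).getTime();
            } catch (ParseException e) {
                throw new IllegalArgumentException("bad date part: " + datePart, e);
            }
            midnightByDate.put(datePart, midnight);
        }
        int t = split + 1;  // start of HH:mm:ss
        int h = twoDigits(text, t);
        int m = twoDigits(text, t + 3);
        int s = twoDigits(text, t + 6);
        return midnight + ((h * 60L + m) * 60 + s) * 1000;
    }
}
```

Whether this beats a tuned SimpleDateFormat depends on how often dates
repeat in the input, so it is worth measuring on real data first.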

Roedy Green · Sep 16, 2003

However, I was
thinking I could build two hash tables, one for the time and one for
the date (ignoring the year), then split the string, look up each part
in its table, and adjust for the year.

"adjust for year" means recreating the logic in BigDate.

You could precompute the strings for the days for a period of five
years, and index to get the YYYYMMDD part and then plop in the time
part, but that is a rather big chunk of RAM.

You could get BigDate to get you the date part. You have to do the
time part yourself. You need a time zone adjustment.

Matt Humphrey · Sep 16, 2003

Niko said:
Thanks for the BigDate and index lookup ideas. Unfortunately, I'm
working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
thinking I could build two hash tables, one for the time and one for
the date (ignoring the year), then split the string, look up each part
in its table, and adjust for the year.

That's workable and equivalent to forming the string via indexed lookup, but
with more lookup elements. Your lookup tables would be the day/month, the
year, and the hour, minute and second (these three can share one table
covering 0..59). Assemble them via a StringBuffer. There are 86,400 distinct
one-second time-stamps in a day, so that's a bit much for direct lookup.
Part of what you're trying to avoid is the number-to-string conversion and
the string assembly. This technique does not avoid the string assembly
problem, but the number-to-string conversion is reduced to a table index.

Another way to avoid string assembly is to arrange the string to always have
the same layout: e.g. 2 digits, th|rd, space, 3 letters, space, dd:dd:dd.
This way you only allocate the buffer once and copy the elements to fixed
places. But I really only suggest this after a run with a serious profiler.
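
A small sketch of the fixed-layout approach, under some assumptions: the
layout here is "dd MMM yyyy HH:mm:ss" (the ordinal suffix is dropped and
a year field is added), the fields arrive already split into ints, and
the class name FixedLayoutStamp is made up for the example. Because Java
strings are immutable, the thing allocated once is a char[] buffer.

```java
/** Hypothetical formatter that overwrites fixed positions in one reusable buffer. */
final class FixedLayoutStamp {
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // Template for "dd MMM yyyy HH:mm:ss"; spaces and colons are written once here.
    private final char[] buf = "00 Jan 0000 00:00:00".toCharArray();

    // Writes a value in 0..99 as two digits starting at pos.
    private void twoDigits(int pos, int value) {
        buf[pos] = (char) ('0' + value / 10);
        buf[pos + 1] = (char) ('0' + value % 10);
    }

    /**
     * Overwrites the fields in place and returns the shared buffer.
     * The contents are only valid until the next call. Assumes year is 0..9999
     * and the other fields are already in range.
     */
    char[] fill(int day, int month1to12, int year, int hour, int minute, int second) {
        twoDigits(0, day);
        MONTHS[month1to12 - 1].getChars(0, 3, buf, 3);
        twoDigits(7, year / 100);
        twoDigits(9, year % 100);
        twoDigits(12, hour);
        twoDigits(15, minute);
        twoDigits(18, second);
        return buf;
    }
}
```

The returned buffer can be written straight to a Writer; wrapping it in
new String(...) allocates again and gives back part of the saving.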

Cheers,

Jim Sculley · Sep 16, 2003

Niko said:
Hi,

We are finding that SimpleDateFormat is pretty slow if you're trying to
use it to parse millions of records. We improved on it by adding some
caches in the code (for cases where things like the month are the same,
and so on), but overall we find it to be a hog.

For example, we can parse 20,000 records a second if they don't contain
dates, but when you add dates this can drop to 4,000.

On what hardware?

So does anyone know of a good class out there, before we go and
build a faster one?

Are you using SimpleDateFormat correctly? You should not create a new
instance for each record. I get a throughput of about 150,000 calls to
format() per second using an array of one million random dates.
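
For reference, a sketch of what reusing one instance can look like,
together with a crude way to measure calls per second. The class name,
the pattern, the UTC zone, and the timing loop are all assumptions for
illustration; this is not the test that produced the figure above, and
the number it reports will vary by machine and JVM.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Random;
import java.util.TimeZone;

/** Hypothetical wrapper that creates the formatter once and reuses it for every record. */
final class ReusedFormatter {
    // SimpleDateFormat is not thread-safe, so keep this instance on one thread.
    private final SimpleDateFormat fmt =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);
    private final Date scratch = new Date(0L);  // reused to avoid one allocation per call

    ReusedFormatter() {
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));  // assumption, for repeatable output
    }

    String format(long epochMillis) {
        scratch.setTime(epochMillis);
        return fmt.format(scratch);
    }

    /** Rough throughput estimate over random dates; not a rigorous benchmark (no warm-up). */
    static double callsPerSecond(int calls) {
        Random rnd = new Random(42);
        long[] dates = new long[calls];
        for (int i = 0; i < calls; i++) {
            dates[i] = (long) (rnd.nextDouble() * 1_000_000_000_000L);
        }
        ReusedFormatter f = new ReusedFormatter();
        long chars = 0;  // accumulated so the formatted results are actually consumed
        long start = System.nanoTime();
        for (long d : dates) {
            chars += f.format(d).length();
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return calls * 1e9 / elapsed;
    }
}
```

Comparing this against a loop that calls new SimpleDateFormat(...) for
every record should show how much of the cost is construction rather
than formatting.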

Chris Uppal · Sep 17, 2003

Niko said:
Thanks for the BigDate and index lookup ideas. Unfortunately, I'm
working with date-times, e.g. 3rd Jun 1993 05:01:43.

I realise this probably won't help, but do you actually *have* to format all
the dates? If you can arrange to keep them in their initial (not String) form
throughout, and only change them into strings when/if displayed to a user, then
you can avoid the overhead that way. That might well be difficult, but not
necessarily worse than messing around with faster parsing or complex caching
schemes.

-- chris

Niko · Sep 17, 2003

I create one single instance, but when we look at the profiler we see a
chunk of time spent in SimpleDateFormat. It may only be a few percent,
but when you are loading a file with 50 fields and maybe 8 dates, then
you really start to see the chunk grow. We spent a long time
optimizing other parts of the code, and even NIO showed no improvement
over our enhanced buffered IO (though we prefer NIO as it reduces the
amount of custom code), so it seems an awful shame to let
SimpleDateFormat get away without being optimized.

As for the source supplying pure dates, it sometimes can come like
that, but the code is part of a data loading tool which is configurable
for any data source that can come via Streams or Channels, and we only
format the date for display at the very end. It's the parsing that
takes the time, and creating a table with all known value sections
doesn't scare us too much, as memory is cheap and this type of software
is running on big boxes overnight.

Wojtek · Sep 18, 2003

Thanks for the BigDate and index lookup ideas. Unfortunately, I'm
working with date-times, e.g. 3rd Jun 1993 05:01:43. However, I was
thinking I could build two hash tables, one for the time and one for
the date (ignoring the year), then split the string, look up each part
in its table, and adjust for the year.

Why not break the date string into parts using StringTokenizer, then
evaluate each part to build the input for a Calendar object, and then
read the resulting time from the Calendar object?

3rd -> value of the number, ignore the text
Jun -> lookup on 12 possibilities, fewer if you use progressive
lookup (i.e. check the first letter, if no match check the second, if
no match check the third)
1993 -> value
substring on the time
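
The steps above can be sketched roughly as follows. The class name
HandRolledParser, the UTC time zone, and the single-string month lookup
(used here in place of the letter-by-letter progressive lookup) are
assumptions for illustration, not a tested replacement for
SimpleDateFormat.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.StringTokenizer;
import java.util.TimeZone;

/** Hypothetical hand-rolled parser for text such as "3rd Jun 1993 05:01:43". */
final class HandRolledParser {
    private static final String MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec";

    // One Calendar reused for every record; assumption: timestamps are in UTC.
    private final Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));

    /** Reads the leading decimal digits and ignores trailing text such as "rd". */
    private static int leadingInt(String s) {
        int value = 0;
        int i = 0;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            value = value * 10 + (s.charAt(i) - '0');
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("no digits in: " + s);
        }
        return value;
    }

    /** Returns milliseconds since the epoch for "d[suffix] MMM yyyy HH:mm:ss". */
    long parse(String text) {
        StringTokenizer st = new StringTokenizer(text, " :");
        int day = leadingInt(st.nextToken());
        String monthName = st.nextToken();
        int pos = MONTHS.indexOf(monthName);
        if (monthName.length() != 3 || pos < 0 || pos % 3 != 0) {
            throw new IllegalArgumentException("unknown month: " + monthName);
        }
        int year = leadingInt(st.nextToken());
        int hour = leadingInt(st.nextToken());
        int minute = leadingInt(st.nextToken());
        int second = leadingInt(st.nextToken());

        cal.clear();  // also resets the millisecond field
        cal.set(year, pos / 3, day, hour, minute, second);  // Calendar months are 0-based
        // The Calendar is lenient by default, so out-of-range values roll over silently.
        return cal.getTimeInMillis();
    }
}
```

With UTC assumed, parse("3rd Jun 1993 05:01:43") gives the epoch
milliseconds for that instant; a real loader would pass the source
data's time zone to the Calendar instead.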
