Parse string to double really fast

  • Thread starter Thomas Kowalski
  • Start date
T

Thomas Kowalski

Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski
 
M

Mike Wahler

Thomas Kowalski said:
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski

#include <iostream>
#include <sstream>
#include <string>

int main()
{
std::string line("123.456 987.654 3.1416");
std::istringstream iss(line);
double x(0);
double y(0);
double z(0);
if(!(iss >> x >> y >> z))
std::cerr << "Conversion error\n";
else
std::cout << x << '\n' << y << '\n' << z << '\n';
return 0;
}

-Mike
 
G

Gianni Mariani

Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

Thanks in advance,
Thomas Kowalski

If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:

double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?
}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

G
 
T

Thomas Kowalski

Hi Mike,
thank you for your quick answer. I used the stream-approach already.
My question is rather a hint to some really custom parser.
More about what I tried:
1. approach) Using stringstreams to parse. For my file (about 400.000
lines with 3 doubles) it took about 30s to parse.
2. approach) Using sscanf which took about 13s.
3. approach) Using atof and strchr to parse took about 8s.

The improvements in my opinion show that the IO is not yet the
limiting factor. The CPU is still busy at 100% during the parsing.

Since atof is using local information (we always use the "." as
separator) and also should be able to parse different representations
of float numbers, I guess there is plenty of room for improvement. In
case of a custom parser the search is also not necessary since the we
know that the next double will follow directly one char after the end
of the last.

Does anyone have experience with such optimizations?

Regards,
Thomas Kowalski
 
T

Thomas Kowalski

Hi Gianni,
thank you for your answer.
If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:

I have to admit that my example was not clear enough. The format is
not that fixed.
Its rather something like "123.456 987.654 3.1416" or "43123.987
654.1234556 3".
Means the representation is not scientific or hex and we use the "."
as a separator agnostic of the local.
double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?

}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

I doubt that this might be the fastest way to do things..

Thanks again,
Thomas Kowalski
 
R

Roland Pibinger

I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.

try strtod()
 
G

Gianni Mariani

Thomas said:
Hi Gianni,
thank you for your answer.
If they are exactly of the form xxxx.yyyyyyyyy then you can do
somthing like:

I have to admit that my example was not clear enough. The format is
not that fixed.
Its rather something like "123.456 987.654 3.1416" or "43123.987
654.1234556 3".
Means the representation is not scientific or hex and we use the "."
as a separator agnostic of the local.
double convert( const char * str )
{

return convert_char( str[0], 1000 ) +
convert_char( str[1], 100 ) +
convert_char( str[2], 10 ) +
convert_char( str[3], 1 ) +
convert_char( str[5], 0.1 ) +
.... get the rest ?

}

Where convert_char checks for ' ' or isdigit and does the appropriate
thing.

I doubt that this might be the fastest way to do things..

But is it fast enough? You don't normally ever need the "fastest" way
to do things since that might be very hard to write. It might also be
very fast on one platform and very slow on another.

Can you describe accurately what the strings representing these numbers
looks like ? Signed/unsigned ? Max N digits before and M digits after
the decimal point. Do you need error checking ? (i.e. can you assume it
will be a valid non scientific notation floating point number ?).

The only potentially costly thing is the floating point multiply. You
can eliminate all the multiples if you know a few things about your numbers.

If you really need extreme fast, you can probably use SIMD instructions
to convert the chars to digits (0..9) and probably have a couple more to
perform all the adds. You might even create a table like so:


const double digit_array[10][30] =
{
{ 0e+15, 1e+15, 2e+15, .... },
{ 0e+14, 1e+14, ... }
....
{ 0e+0, 1e+0, 2e+0 ...},
....
{ 0e-14, 1e-14 .... )
};

This was no multiplication is needed.
 
T

Thomas Kowalski

But is it fast enough? You don't normally ever need the "fastest" way
to do things since that might be very hard to write. It might also be
very fast on one platform and very slow on another.

True. But it should be a reference implementation on what is possible.
Therefore I like to take any solution into account that is not too
hard to program and especially stable.
Can you describe accurately what the strings representing these numbers
looks like ? Signed/unsigned ?
Yes, the numbers can be signed or unsigned. There is never an explicit
"+" in front of the numbers.
Max N digits before and M digits after the decimal point.
Arbitrary. There is no limitation on characters before or after.
Do you need error checking ?
No, error checking is not need in the release build (if there is some
in the debug-build its fine).

(i.e. can you assume it will be a valid non scientific notation floating point number ?).
Yes, I can assume that. In general I should assume that it might be
any string generated by
a sprintf(buffer, "%lf %lf %lf", x,y,z);


If you really need extreme fast, you can probably use SIMD instructions
to convert the chars to digits (0..9) and probably have a couple more to
perform all the adds. You might even create a table like so:

const double digit_array[10][30] =
{
{ 0e+15, 1e+15, 2e+15, .... },
{ 0e+14, 1e+14, ... }
....
{ 0e+0, 1e+0, 2e+0 ...},
....
{ 0e-14, 1e-14 .... )

};

This was no multiplication is needed.

I like the idea with table, although I worry about the portability
issues you mentioned. The future target platforms are x86 (32 and 64-
Bit) and Linux, Windows.

Thanks for your time and help,
Thomas Kowalski
 
J

Jacek Dziedzic

Thomas said:
Hi,
I would like to know whether someone knows a library or function that
parses a string containing 3 double numbers in the form like
xxxx.yyyyyyyyy xxxx.yyyyyyyyy xxxx.yyyyyyyyy really fast.
Currently I am using "sscanf(line.c_str(), "%lf %lf %lf", &x, &y,
&z);" which is kind of slow.

I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf(). This parses a
single double value from a const char*, and advances the
pointer -- by using it three times you will be able to parse
a line akin to what you have.

It supports scientific notation, but does not support radii
different than 10 (0x notation for example). Lookup tables are
used to speed up the process.

The code would linewrap if I pasted it here, so here's a link

http://tiny.pl/frtq

AFAIK, the only modifications would be to define
EException, EParseError and signaling_NaN() or you could remove
the error-handling altogether.

HTH,
- J.
 
D

David Harmon

On Wed, 06 Jun 2007 05:59:47 GMT in comp.lang.c++, "Mike Wahler"
std::string line("123.456 987.654 3.1416");
std::istringstream iss(line);
double x(0);
double y(0);
double z(0);
if(!(iss >> x >> y >> z))
std::cerr << "Conversion error\n";

What is your reason for thinking that constructing a string,
constructing a stringstream, and extracting the three doubles, would
be faster than the sscanf call that the OP wishes to improve upon?
 
B

Boris Kolpackov

Hi Thomas,

Thomas Kowalski said:
Since atof is using local information (we always use the "." as
separator) and also should be able to parse different representations
of float numbers, I guess there is plenty of room for improvement. In
case of a custom parser the search is also not necessary since the we
know that the next double will follow directly one char after the end
of the last.

I think aside from a custom parser, strtod is your best choice. It has
the added benefit of returning a pointer to the end of value being
parsed so you can just increment it and pass to the next call to strtod.

hth,
-boris
 
T

Thomas Kowalski

I have a string to double parser that performs about 2x times
better on my machine than strtod() or sscanf().

Thanks a lot. I will give it a try. Dzi kuje :)
BTW: Which license do you use for the parser? GPL?

Thomas Kowalski
 
I

Ian Collins

Thomas said:
Thanks a lot. I will give it a try.

I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.
 
J

Jacek Dziedzic

Thomas said:
Thanks a lot. I will give it a try. Dzi kuje :)
BTW: Which license do you use for the parser? GPL?

You're welcome. There's no licence, I rolled this one
during one evening, feel free to have it for free!

If you intend to use it, perhaps you should test it
extensively, I just wrote several simple tests to make
sure it works.

HTH,
- J.
 
J

Jacek Dziedzic

Ian said:
I found is about 60% slower than strtod() on my system (Solaris), so
your money will vary depending on how good your native strtod() family
perform.

Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.

cheers,
- J.
 
I

Ian Collins

Jacek said:
Thanks for the input. That goes to show how fragile such
hand-rolled optimizations really are. I measured on an Itanium-2
and on an AMD64 and my version was about twice as fast.
Or how variable implementation's standard libraries can be!
 
J

James Kanze

Or how variable implementation's standard libraries can be!

Note that you can often convert to floating point a lot faster
if 1) you don't mind small errors in the least significant bits,
or 2) you can limit the number of digits you have to deal with
in the input. A quality convertion routine in the standard
library can't do either of these. And needs integer arithmetic
of more than 32 bits to get correct results. If you're running
on a 32 bit machine, and limit the input to, say, 9 digits, you
should be able to do a lot better than the standard library.
This won't necessarily hold for a 64 bit machine, however.
 
T

Thomas Kowalski

Note that you can often convert to floating point a lot faster
if 1) you don't mind small errors in the least significant bits,

What exactly are you referring too? Errors caused by addition of
doubles?

Regards,
Thomas Kowalski
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top