Converting strings to numbers - a question of speed

Mickey Mathieson

This is the fastest way to convert String to Double.


double d;
char s[10];

// s must already hold the raw bytes of a valid double (its binary
// representation, not text such as "1.5") or the conversion will not work!

memcpy(&d, s, 8); // needs <cstring>; assumes that doubles are 8 bytes long.

Mickey
 
Old Wolf

zexpe said:
Just tried the boost::lexical_cast<double>() method too,
and it's even slower:

boost::lexical_cast<double>() 1.79

I consider the boost one to be the least simple: it can
throw an exception.
 
zexpe

Roland said:
Which interface would you prefer for a C++ 'shell' str2long which is:
- convenient
- does appropriate error handling?
[ ... ]

The question is one of implementation rather than interface. Sure, you
could write a variety of nice C++ interfaces that wrap the underlying C
library calls. What we are discussing is how to implement a
string-to-number conversion function without relying on any of the
traditional C libraries, such as <cstdlib>. The only standard
alternative is stringstreams, which are not as efficient as the
<cstdlib> functions for this specific task. Therefore, if speed
is required, one must ultimately use the <cstdlib> functions
directly (or inside a home-brewed C++ interface
wrapper, such as the ones you have described).
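
For illustration, a minimal sketch of such a home-brewed wrapper around
strtod() with error handling. The name str_to_double and its bool-return
interface are assumptions made for this example, not something posted in
the thread.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical wrapper (name and signature are illustrative only).
// Returns true and stores the result in `out` only if the whole string
// is a valid, in-range double; otherwise leaves `out` untouched.
bool str_to_double(const std::string& text, double& out)
{
    const char* begin = text.c_str();
    char* end = nullptr;

    errno = 0;
    const double value = std::strtod(begin, &end);

    if (end == begin) return false;                // nothing was converted
    if (end != begin + text.size()) return false;  // trailing characters
    if (errno == ERANGE) return false;             // out of range for double

    out = value;
    return true;
}
```

Reporting failure through the return value keeps the hot path free of
exceptions; a throwing variant could be layered on top if preferred.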

Ross
 
zexpe

Roland said:
The stringstream times are very fast. Do you reuse the same
stringstream object all the time? Or do you use a Standard library
which implements SSO for std::string?

Perhaps I should have stated more clearly here that these are not
true clear-cut scientific benchmark comparisons between the
methods. They simply show by how much my program slowed down when I
changed the underlying implementation of its convertToDouble()
function. I'm just illustrating the overall impact of such a change in
my Real World program, which is performing other calculations as well;
these will of course have a systematic effect upon these "benchmarks".

I realise these benchmarks are not terribly useful to anyone else; they
just illustrate a point. My implementation, btw, is as stated earlier
in this thread.

Ross
 
zexpe

Greg said:
I'm not convinced of that, but then I'm not unconvinced either :)

See my last post for comments about the "benchmarks". Regarding the
boost method, its slowness may be due to something specific about my
installation - I've never used boost before.

Greg said:
BTW, since this one operation is so important to you (why again?),
I'm surprised you are not pursuing tweaks of the different ways
and/or even trying to get strtod faster (if that is possible).
I mean, assuming 1.0 truly is the current best, since it matters
so much, why settle for it?

You are, of course, right. When it comes to optimisation it's best to
look at the bigger picture. I'm working on a code base that I've
inherited, and if I ever have the time, I believe it would be made much
faster by minimising the dependence on string to number operations as
much as possible. However, the design of the code at present is
strongly based on these conversions, and being a rather large program,
would take a long time to completely redesign. So, I'd rather find a
"quick fix" for the time being, and would like to be convinced that I'm
using both the most efficient technique *and* if possible the most
approved modern C++ technique. The point of this thread is a discussion
of the best, standard technique. If the code is to be completely
redesigned in the future then there's not much point trying to do
anything as ambitious as create my own string to number conversion
function.

If you're interested in the background... the code is manipulating
approximately 10 terabytes of data in an astronomical database.
However, this test is just working on 50,000 records.

Thanks for all your comments everyone!

Ross
 
zexpe

Old Wolf said:
I consider the boost one to be the least simple: it can
throw an exception.

I meant simple in terms of writing C++-style code and its readability.
So, in that sense, it's just being compared to the stringstream
technique.

Ross
 
Walter Bright

zexpe said:
If you're interested in the background... the code is manipulating
approximately 10 terabytes of data in an astronomical database.
However, this test is just working on 50,000 records.

If you have control over the generation of the data, consider having it
generated in %a format rather than %g. It's convertible back to floating
point much faster, and avoids creeping roundoff errors.
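
A small sketch of the %a round trip, assuming a C99/C++11 library in
which strtod() accepts hexadecimal floating-point input. The helper names
to_hex_text and from_hex_text are made up for this example. It only
demonstrates that the value comes back bit-for-bit; it does not measure
the speed difference.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: write a double as hexadecimal floating-point text.
// With no precision given, %a emits enough hex digits to be exact.
std::string to_hex_text(double value)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%a", value);
    return buffer;
}

// Hypothetical helper: read the text back. No decimal-to-binary rounding
// is involved, because the digits are already in base 16.
double from_hex_text(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

For any finite double x, from_hex_text(to_hex_text(x)) == x should hold,
which is the "no creeping roundoff" property described above.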

Also, if profiling shows that strtod() really is the bottleneck, it can be
productive to take the source code for it and convert it to hand-tuned
assembler.

Walter Bright
www.digitalmars.com C, C++, D programming language compilers
 
Earl Purple

You would not want to make the istringstream static because that would
not be thread-safe. But you might want an object that has its own
istringstream, and re-use that object within the same thread for
multiple conversions.

You could make your class a template, and also have a variety of ways
of handling a failed conversion. (Note, for floating point types you can
always return a NaN.)

Beware that if you reuse a stringstream you must reset it whenever an
error has occurred before you can ask it to convert another string.
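
A rough sketch of such a class, under the assumptions that one instance
is used per thread and that failure is signalled by returning a
caller-supplied fallback value. StreamConverter and its convert() member
are names invented for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: owns one istringstream and reuses it for every
// conversion. Not thread-safe; keep one instance per thread.
template <typename T>
class StreamConverter
{
public:
    // Returns the converted value, or `fallback` if `text` is not a
    // complete, valid representation of T.
    T convert(const std::string& text, T fallback)
    {
        stream_.clear();    // reset eofbit/failbit left by the last call
        stream_.str(text);  // replace the buffer contents

        T value;
        if (!(stream_ >> value)) return fallback;  // extraction failed

        stream_ >> std::ws;                        // allow trailing blanks
        return stream_.eof() ? value : fallback;   // reject trailing junk
    }

private:
    std::istringstream stream_;
};
```

For floating point types the fallback can be
std::numeric_limits<double>::quiet_NaN(), as suggested above. The call to
clear() at the top is the reset step: without it, a single failed
conversion would make every later one fail too.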
 
Vladimir

zexpe said:
would take a long time to completely redesign. So, I'd rather find a
"quick fix" for the time being, and would like to be convinced that I'm

Are you aware that atof/strtod take thousands of CPU cycles
(more than 3000 on my config)? They are general enough to handle
many possible cases which you may not need. You can
examine how your data is formatted and take great advantage of it.

For example, if strings are like "434.312827", simple code would
suffice:

double convert(const char * s)
{
    double result = 0.0;

    // convert integer part
    while(*s >= '0' && *s <= '9') result = result * 10 + (*s++ - '0');

    // skip dot, if there is one (never step past the terminating '\0')
    if(*s == '.') s++;

    // convert fractional part
    double f = 0.1;
    while(*s >= '0' && *s <= '9') result += (*s++ - '0') * f, f *= 0.1;

    return result;
}

This runs at around 100 cycles here which gives a whopping factor of
30!
You can make it safe and even faster after examining your data.
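
One possible "safe" variant, as a sketch only. It swaps the repeated
multiplication by 0.1 for a different technique: all digits are
accumulated in an integer and divided once by an exact power of ten, so
only a single rounding occurs. The name convert_checked, the bool return
and the 15-digit limit are assumptions for this example, and no cycle
count is claimed for it.

```cpp
#include <cstdint>

// Hypothetical checked variant for strings of the form "digits.digits".
// Rejects empty input, trailing characters, and more than 15 digits
// (15 decimal digits always fit exactly in a double's 53-bit mantissa).
bool convert_checked(const char* s, double& out)
{
    // Powers of ten up to 1e15 are exactly representable as doubles.
    static const double pow10[] = {
        1e0, 1e1, 1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };

    std::uint64_t digits = 0;  // all digits, with the dot removed
    int total = 0;             // number of digits seen
    int frac = 0;              // number of digits after the dot

    while (*s >= '0' && *s <= '9') {
        digits = digits * 10 + static_cast<unsigned>(*s++ - '0');
        ++total;
    }
    if (*s == '.') {
        ++s;
        while (*s >= '0' && *s <= '9') {
            digits = digits * 10 + static_cast<unsigned>(*s++ - '0');
            ++total;
            ++frac;
        }
    }

    if (total == 0 || total > 15 || *s != '\0') return false;

    out = static_cast<double>(digits) / pow10[frac];  // one rounding step
    return true;
}
```

Signs, exponents and leading whitespace are deliberately not handled;
whether that is acceptable depends on how the data is formatted.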
 
Greg Comeau

Vladimir said:
Are you aware that atof/strtod take thousands of CPU cycles
(more than 3000 on my config)? They are general enough to handle
many possible cases which you may not need. You can
examine how your data is formatted and take great advantage of it.

For example, if strings are like "434.312827", simple code would
suffice:

double convert(const char * s)
{
    double result = 0.0;

    // convert integer part
    while(*s >= '0' && *s <= '9') result = result * 10 + (*s++ - '0');

    // skip dot, if there is one (never step past the terminating '\0')
    if(*s == '.') s++;

    // convert fractional part
    double f = 0.1;
    while(*s >= '0' && *s <= '9') result += (*s++ - '0') * f, f *= 0.1;

    return result;
}

This runs at around 100 cycles here which gives a whopping factor of
30!
You can make it safe and even faster after examining your data.

zexpe, this, and Walter's point, is my point: Your "complaint" is
that stringstreams were too burdensome on an X hour program run,
but frankly, although the difference between 1.5 seconds and 1 second
is 50%, either (a) the time of this one operation does not really
matter to your program or (b) it does matter, in which case, if it were
me, I would look at how to make it better -- imagine you could
get it to 0.50 or less, etc. As Walter pointed out, find out
first and foremost if this is really your bottleneck, then look
at the strategies Walter and Vladimir mentioned, and any others;
otherwise, we don't understand your premise.
 
zexpe

Greg said:
zexpe, this, and Walter's point, is my point: Your "complaint" is
that stringstreams were too burdensome on an X hour program run,
but frankly, although the difference between 1.5 seconds and 1 second
is 50%, either (a) the time of this one operation does not really
matter to your program or (b) it does matter, in which case, if it were
me, I would look at how to make it better -- imagine you could
get it to 0.50 or less, etc. As Walter pointed out, find out
first and foremost if this is really your bottleneck, then look
at the strategies Walter and Vladimir mentioned, and any others;
otherwise, we don't understand your premise.

Sorry for confusing you Greg. I think I ended up confusing myself!

The thread started following my attempt to upgrade a piece of C++ code
I'd inherited, as an experiment in "modernising" old code design. These
days we are recommended to use STL containers and algorithms, and to
abandon old C-style practices wherever possible. So I ditched the
<cstdlib> atof() in favour of the stringstream technique, but
discovered it made my program run 50% more slowly than before. Clearly
this is too high a price to pay for just "modernisation". Being
aggrieved at having been given bad advice (well, incomplete advice)
about the benefits of the stringstream technique, I posted a message to
find out if there was a modern C++ equivalent of my simple
stringstream approach that did not have a performance penalty. Clearly
there is not.

I had not originally intended to optimise the code (I *was* happy
with its original execution time and there are more important things to
be done). I was merely modernising the code as a learning exercise for
myself. However, this exercise has highlighted the fact that my program
spends a significant proportion of its time converting strings to
numbers. I think a lot of these conversions are not strictly necessary
and are simply a product of poor overall code design. So, when I have the
time I'll redesign the code, trying to remove as many of these
conversions as possible (I believe the original author wasn't very
mindful of the 1000s of CPU cycles a string to number conversion
consumes). If these conversions remain the most time-consuming part of
the program, then I shall investigate the strategies of Walter and
Vladimir.

Thanks for your time everyone.

Ross
 
