Working with Large Values (double)

Ian Collins · Mar 6, 2014

Öö Tiib said:
Is iostream slow? Here is a test. I have observed that on some platforms C++
I/O performs 20% faster than C I/O and on others 50% slower. In general
neither should be used for truly performance-critical I/O.

It really does come down to the C++ library used. On Solaris with the
native compiler's default library, the C version takes half the time.
With stlport or using g++, the C++ version is about 25-30% faster.

Öö Tiib · Mar 6, 2014

Thanks for that program. I obtain:

C : 7.66352e+06
C++: 9.3946e+06

Exactly. On older Apples the difference was even bigger (I believe it was on Apple
that, with some earlier standard library, C++ was 50% slower than C). I
have no Solaris machine, but I trust Ian when he says that C++ can take 200% of the time
with some libraries. Note that 'fscanf' or '>>' can be good enough in a lot of
cases regardless of that difference. When neither is good enough,
you go full mmap on Apple to read the contents of a file (just an example
common to C and C++ code, no warranties):

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps the file at inPathName read-only. Returns 0 on success, else an errno value.
int MapFile( char * inPathName, void ** outDataPtr, size_t * outDataLength )
{
    int outError;
    int fileDescriptor;
    struct stat statInfo;
    // Return safe values on error.
    outError = 0;
    *outDataPtr = NULL;
    *outDataLength = 0;
    // Open the file.
    fileDescriptor = open( inPathName, O_RDONLY, 0 );
    if( fileDescriptor < 0 )
    {
        outError = errno;
    }
    else
    {
        // We now know the file exists. Retrieve the file size.
        if( fstat( fileDescriptor, &statInfo ) != 0 )
        {
            outError = errno;
        }
        else
        {
            // Map the file into a read-only memory region.
            // POSIX requires MAP_SHARED or MAP_PRIVATE in the flags argument.
            *outDataPtr = mmap(NULL,
                               statInfo.st_size,
                               PROT_READ,
                               MAP_PRIVATE,
                               fileDescriptor,
                               0);
            if( *outDataPtr == MAP_FAILED )
            {
                outError = errno;
            }
            else
            {
                // On success, return the size of the mapped file.
                *outDataLength = statInfo.st_size;
            }
        }
        // Now close the file. The kernel doesn't use our file descriptor.
        close( fileDescriptor );
    }
    return outError;
}
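
Once a file is mapped, "reading" it is just walking memory. The following is a minimal sketch of that idea, assuming a POSIX system; the names count_words_in_range and count_words_in_file are made up for illustration and are not from the thread. It counts whitespace-separated words, which is what the fscanf/'>>' test effectively visits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Count runs of non-whitespace bytes in [data, data + length).
std::size_t count_words_in_range(const char* data, std::size_t length)
{
    assert(data != nullptr || length == 0);
    std::size_t words = 0;
    bool in_word = false;
    for (std::size_t i = 0; i < length; ++i) {
        const bool is_space =
            std::isspace(static_cast<unsigned char>(data[i])) != 0;
        if (!is_space && !in_word) {
            ++words;  // first byte of a new word
        }
        in_word = !is_space;
    }
    return words;
}

// Map a whole file read-only and count its words; returns -1 on any error.
long count_words_in_file(const char* path)
{
    const int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    struct stat info;
    if (fstat(fd, &info) != 0) {
        close(fd);
        return -1;
    }
    if (info.st_size == 0) {  // zero-length mappings are rejected by mmap
        close(fd);
        return 0;
    }
    const std::size_t length = static_cast<std::size_t>(info.st_size);
    void* mapping = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapping == MAP_FAILED) {
        return -1;
    }
    const std::size_t words =
        count_words_in_range(static_cast<const char*>(mapping), length);
    munmap(mapping, length);
    return static_cast<long>(words);
}
```

Calling count_words_in_file("largefile.txt") would map the file, count, and unmap again. For files too large to map in one piece, the range function could be applied window by window, with the in-word state carried across windows.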

Öö Tiib · Mar 6, 2014

Victor Bazarov said:

On Thursday, 6 March 2014 15:45:34 UTC+2, Daniel wrote:

struct CRead
{
    CRead(char const* filename): _filename(filename) {}

    void operator()() {
        FILE* file = fopen(_filename, "r");

        int count = 0;
        while ( fscanf(file,"%s", _buffer) == 1 ) { ++count; }

        fclose(file);
    }

    char const* _filename;
    char _buffer[1024];
};

You've chosen the _least efficient_ method of counting strings
in C that you could find. At least dump the fscanf and use fgets.

Better would be simply to memory map the file and walk memory
(mmap, CreateFileMapping) counting newlines, windowing as necessary
for larger files.


You may be a bit confused. He's not counting lines. He's reading
strings (sequences of non-whitespace characters). For counting lines in
C++ one could use 'getline'.


The code posted, above, does nothing with the string it reads. It
simply counts them (which means it counts newlines).

while ( fscanf(file,"%s", _buffer) == 1 ) { ++count; }

At best, code that calls the function operator on it will be able to
get the final string in the file from _buffer.

The test still measures the time of reading from a file.
If it did anything with the information it read, it would
measure not only the time that reading takes but also the time that
processing takes.
IOW, I am puzzled about what exactly the complaint is, if any.

Victor Bazarov · Mar 6, 2014

Victor Bazarov said:

On Thursday, 6 March 2014 15:45:34 UTC+2, Daniel wrote:

struct CRead
{
    CRead(char const* filename): _filename(filename) {}

    void operator()() {
        FILE* file = fopen(_filename, "r");

        int count = 0;
        while ( fscanf(file,"%s", _buffer) == 1 ) { ++count; }

        fclose(file);
    }

    char const* _filename;
    char _buffer[1024];
};

You've chosen the _least efficient_ method of counting strings
in C that you could find. At least dump the fscanf and use fgets.

Better would be simply to memory map the file and walk memory
(mmap, CreateFileMapping) counting newlines, windowing as necessary
for larger files.


You may be a bit confused. He's not counting lines. He's reading
strings (sequences of non-whitespace characters). For counting lines in
C++ one could use 'getline'.


The code posted, above, does nothing with the string it reads. It
simply counts them (which means it counts newlines).

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do you not know what %s does in fscanf? RTFM. Hint: it's not "read
until newline". Another hint: he's counting *words*. Need more hints?
Just ask.
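
To make the distinction concrete, here is a small sketch (the names Counts and count_words_and_lines are invented for this illustration, not taken from the thread). It counts whitespace-separated tokens with '>>', which behaves like "%s" in fscanf, and counts lines with 'getline'. The two numbers differ as soon as a line holds more than one word.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct Counts {
    std::size_t words;
    std::size_t lines;
};

Counts count_words_and_lines(const std::string& text)
{
    Counts counts{0, 0};

    // '>>' skips leading whitespace and stops at the next whitespace,
    // so each successful extraction is one word.
    std::istringstream by_word(text);
    std::string token;
    while (by_word >> token) {
        ++counts.words;
    }

    // getline reads up to (and discards) the next '\n'.
    std::istringstream by_line(text);
    std::string line;
    while (std::getline(by_line, line)) {
        ++counts.lines;
    }
    return counts;
}
```

For the text "the quick brown\nfox\n" this reports 4 words and 2 lines.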

while ( fscanf(file,"%s", _buffer) == 1 ) { ++count; }

At best, code that calls the function operator on it will be able to
get the final string in the file from _buffer.

scott

V

Melzzzzz · Mar 6, 2014

And how did they manage to make << >> so slow! << >> seems to be
the counter example to the advice often heard here, don't optimize
prematurely. If the original architecture is inherently slow, there
are limits to what even a super optimizing compiler team can do
with it.


Is iostream slow? Here is a test. I have observed that on some
platforms C++ I/O performs 20% faster than C I/O and on others 50%
slower. In general neither should be used for truly
performance-critical I/O.

/// THE TEST //////////////////////

#include <fstream>
#include <iostream>
#include <iomanip>

#include <cmath>
#include <cstdio>

#include <sys/time.h>

template <typename Func>
double benchmark(Func f, size_t iterations)
{
    f(); // one call outside the timed region

    timeval a, b;
    gettimeofday(&a, 0);
    for (; iterations --> 0;)
    {
        f();
    }
    gettimeofday(&b, 0);
    // Elapsed wall-clock time in microseconds.
    return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) -
           (a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}

struct CRead
{
    CRead(char const* filename): _filename(filename) {}

    void operator()() {
        FILE* file = fopen(_filename, "r");

        int count = 0;
        while ( fscanf(file,"%s", _buffer) == 1 ) { ++count; }

        fclose(file);
    }

    char const* _filename;
    char _buffer[1024];
};
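
The C++ half of the test is cut off in this quote. For readers who want something to compare against, a counterpart could look roughly like the sketch below. This is an assumption about its shape, not the code that was actually posted; the name CppRead and the int return value are invented here.

```cpp
#include <fstream>
#include <string>

// Hypothetical C++ counterpart of CRead (not the original poster's code).
struct CppRead
{
    explicit CppRead(char const* filename) : _filename(filename) {}

    // Returns the number of whitespace-separated tokens read,
    // or -1 if the file could not be opened.
    int operator()() {
        std::ifstream file(_filename);
        if (!file) {
            return -1;
        }
        int count = 0;
        while (file >> _buffer) { ++count; }
        return count;
    }

    char const* _filename;
    std::string _buffer;  // grows as needed, so there is no fixed-size limit
};
```

It could be passed to benchmark() the same way as CRead, for example benchmark(CppRead("largefile.txt"), 5000).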

The C version is not safe. What happens if the input is longer than 1023 chars?

Öö Tiib · Mar 6, 2014

The C version is not safe. What happens if the input is longer than 1023 chars?

The test compares the speed of the standard input facilities of C with those
of C++. It is *not* meant as example code to copy-paste
into a real product. I was afraid that saying so explicitly might
insult most readers' intelligence.

The exact answer to "what happens" is "you take another text file
for running the test". How many *.txt files do you have on your hard drive
that contain words longer than 1023 chars?

Melzzzzz · Mar 6, 2014

The test compares the speed of the standard input facilities of C with those
of C++. It is *not* meant as example code to copy-paste
into a real product. I was afraid that saying so explicitly might
insult most readers' intelligence.

The exact answer to "what happens" is "you take another text
file for running the test". How many *.txt files do you have on your
hard drive that contain words longer than 1023 chars?

Ones that are made to exploit that bug.
No one uses fscanf for anything useful in C code;
perhaps you should measure the speed of fgets vs getline?
A correct C version for counting words in a file is not
as trivial as your example.

Öö Tiib · Mar 6, 2014

Ones that are made to exploit that bug.
No one uses fscanf for anything useful in C code;
perhaps you should measure the speed of fgets vs getline?
A correct C version for counting words in a file is not
as trivial as your example.

Isn't it really? I believe that replacing that

fscanf(file,"%s", _buffer)

with that

fscanf(file,"%1023s", _buffer)

would make it safer. Did it become a lot less trivial?
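
A minimal sketch of what the width-limited call does in practice (the function names are invented for this illustration). The width 1023 leaves room for the terminating '\0' in a 1024-byte buffer, so an over-long token can no longer overflow it. The price is that such a token is read in pieces and counted more than once.

```cpp
#include <cstdio>
#include <string>

// Count successful "%1023s" conversions. A token longer than 1023
// characters is split across several conversions.
int count_tokens_bounded(std::FILE* file)
{
    char buffer[1024];
    int count = 0;
    while (std::fscanf(file, "%1023s", buffer) == 1) { ++count; }
    return count;
}

// Helper for trying it out: writes text to a temporary file, then counts.
// Returns -1 if no temporary file could be created.
int count_tokens_in_text(const std::string& text)
{
    std::FILE* file = std::tmpfile();
    if (file == nullptr) {
        return -1;
    }
    std::fputs(text.c_str(), file);
    std::rewind(file);
    const int count = count_tokens_bounded(file);
    std::fclose(file);
    return count;
}
```

With this, "alpha beta gamma" gives 3, while a single run of 2000 non-whitespace characters gives 2 (1023 characters, then the remaining 977).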

Again, it is not meant to do anything even remotely useful
(like counting words). It only measures the performance of more
or less equivalent reading-from-file functions of the standard
libraries of two languages.

Since Daniel asked
why '<<' and '>>' perform so badly, I just provided a
test showing that the equivalent of '>>' in C (which is 'fscanf')
performs roughly as badly (sometimes better,
sometimes worse).

That makes the C and C++ religionists the Pot and the Kettle.

Daniel · Mar 6, 2014

The idea with C++ streams is that they are extensible. Care to give
an alternative solution that retains the extensibility but is clearer
and easier to use? Because I can't think of a way off the top of my
head.

Overloading something that converts a thing into text is okay. But fixed, scientific, dec, noshowbase, showpoint, etc. are a poor man's formatting tools. Formatting should be done with masks; masks do away with the need for any of the above. That was prior art even at C++'s inception.

Daniel

Stuart · Mar 7, 2014

On 3/6/14, Melzzzzz wrote:
[snip]

Isn't it really? I believe that replacing that

fscanf(file,"%s", _buffer)

with that

fscanf(file,"%1023s", _buffer)

would make it safer. Did it become a lot less trivial?

Again, it is not meant to do anything even remotely useful
(like counting words). It only measures the performance of more
or less equivalent reading-from-file functions of the standard
libraries of two languages. Since Daniel asked
why '<<' and '>>' perform so badly, I just provided a
test showing that the equivalent of '>>' in C (which is 'fscanf')
performs roughly as badly (sometimes better,
sometimes worse).

I think that it may be just this feature (growing the buffer for the
string that is being read) that could make C++ streams much slower
than C. I suppose that the C++ methods that you have used in your
example are probably just a bit of syntactic sugar on top of the C
methods, so I would expect just a very minor decrease in speed.

But I could very well imagine that the C++ version may need
significantly more time because it has to re-allocate the buffer that
holds the string. So for the sake of accuracy, you should compare
the C++ snippet against a C program that can handle arbitrarily large
words (as little sense as this would make in real life). Although, on
second thought, I'm probably going to write an essay on
Donaudampfschifffahrtskapitänspatentvergabemechanismenüberwachungsbeauftragtengehaltsstatistiken,
or something even more convoluted ;-)
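
Whether re-allocation matters can be checked rather than guessed. The sketch below (names invented for this illustration) reads every word into one reused std::string and counts how often its capacity grows. On the common standard library implementations the capacity is kept between reads, so growth should happen only when a word is longer than anything seen before; that is typical behaviour, not something the standard spells out for this exact loop.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct ReadStats {
    std::size_t words;             // successful extractions
    std::size_t capacity_growths;  // reads after which capacity() was larger
    std::size_t final_capacity;
};

ReadStats read_words_reusing_buffer(const std::string& text)
{
    std::istringstream in(text);
    std::string word;  // one buffer reused for every read
    ReadStats stats{0, 0, 0};
    std::size_t previous = word.capacity();
    while (in >> word) {
        ++stats.words;
        if (word.capacity() > previous) {
            ++stats.capacity_growths;
        }
        previous = word.capacity();
    }
    stats.final_capacity = word.capacity();
    return stats;
}
```

Running this over a real text file (via a file stream instead of the string stream) shows how rare the growth events are compared with the number of words read.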

Regards,
Stuart

88888 Dihedral · Mar 7, 2014

To be fair, GLIBC does permit you to define extensions to printf, so
it is possible, if not all that pretty.

http://www.gnu.org/software/libc/manual/html_node/Printf-Extension-Example.html

The << and >> operators are
overloadable for console and file I/O
operations. But that means the object in the library has to bundle a lot of functions.

Also, the buffered and unbuffered settings
are not clear.

Öö Tiib · Mar 7, 2014

On 3/6/14, Melzzzzz wrote:
[snip]

Isn't it really? I believe that replacing that

fscanf(file,"%s", _buffer)

with that

fscanf(file,"%1023s", _buffer)

would make it safer. Did it become a lot less trivial?

Again, it is not meant to do anything even remotely useful
(like counting words). It only measures the performance of more
or less equivalent reading-from-file functions of the standard
libraries of two languages. Since Daniel asked
why '<<' and '>>' perform so badly, I just provided a
test showing that the equivalent of '>>' in C (which is 'fscanf')
performs roughly as badly (sometimes better,
sometimes worse).


I think that it may be just this feature (growing the buffer for the
string that is being read) that could make C++ streams much slower
than C. I suppose that the C++ methods that you have used in your
example are probably just a bit of syntactic sugar on top of the C
methods, so I would expect just a very minor decrease in speed.

Evidence shows that with some compilers and libraries
C++ beats C in the posted test. Not by much, only by about 20%. That wasn't
the point, of course.

But I could very well imagine that the C++ version may need
significantly more time because it has to re-allocate the buffer that
holds the string. So for the sake of accuracy, you should compare
the C++ snippet against a C program that can handle arbitrarily large
words (as little sense as this would make in real life).

Reading, thinking, discussing and experimenting should be kept in balance; YMMV.

Although, on
second thought, I'm probably going to write an essay on
Donaudampfschifffahrtskapitänspatentvergabemechanismenüberwachungsbeauftragtengehaltsstatistiken,
or something even more convoluted ;-)

That is nowhere near 1023 characters. Count it.

I also posted 'mmap' (POSIX) code elsethread. It usually beats
everything in the standard library by a formidable margin. My impression
has been that 'fgets' or 'getline' gets the worst of both worlds
(bad-looking buffer-digging and byte-crunching code that does
not perform).

Ian Collins · Mar 7, 2014

Öö Tiib said:
Evidence shows that with some compilers and libraries
C++ beats C in the posted test. Not by much, only by about 20%. That wasn't
the point, of course.

Some real numbers (times changed via ugly format operators to seconds!):

g++ -O3 -m64 ../x.cc; ./a.out 5000
C : 11.10
C++: 9.05

g++ -O3 ../x.cc; ./a.out 5000
C : 15.27
C++: 10.57

CC -fast -library=stlport4 -m64 ../x.cc; ./a.out 5000
C : 11.02
C++: 8.81

CC -fast -library=stlport4 ../x.cc; ./a.out 5000
C : 15.48
C++: 10.02

and for completeness, with a crappy C++ library:

CC -fast ../x.cc; ./a.out 5000
C : 15.56
C++: 40.65

wc largefile.txt
8991 14721 375121 largefile.txt

As expected, the C versions are nearly identical because they use the same
system library, and the C++ versions are close.
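
The change to the test program that prints seconds is not shown. A guess at the kind of "format operators" involved, assuming benchmark() still returns microseconds as in the posted code (format_seconds is a made-up helper name):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Convert an elapsed time in microseconds to text in seconds,
// with two digits after the decimal point.
std::string format_seconds(double microseconds)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << microseconds / 1e6;
    return out.str();
}
```

Applied to the earlier results in this thread, 7.66352e+06 microseconds would print as 7.66 and 9.3946e+06 as 9.39.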

James Kanze · Apr 1, 2014

On 23/02/2014 14:59, James Kanze wrote:

Of course. For instance in the evil C language you use:

printf("0x%08x\n", x);

In the much more advanced C++ language you use:

std::cout << std::hex << std::setfill('0') << std::setw(8) << x <<
std::dec << std::endl;

Of course C has ONE of the worst designs. But C++ has THE WORST!

What is it Bob Dylan said: don't criticize what you don't
understand. In C++, I'd write:

std::cout << HexFmt(8) << x << '\n';

Except, of course, most of the time, it would be something like:

std::cout << degrees << x << '\n';

, or whatever x logically was. You don't want to go around
specifying the physical details of your formatting each time you
output some particular type of data.
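
HexFmt is not defined anywhere in this thread. One possible sketch of such a manipulator follows; it is an illustration under assumptions, not necessarily how the poster implements it. It sets hexadecimal output with zero fill and the given width, and restores the stream's previous flags and fill character when the temporary is destroyed at the end of the full expression. The helper hex_then_decimal exists only to demonstrate that.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

class HexFmt {
public:
    explicit HexFmt(int width) : width_(width) {}
    HexFmt(const HexFmt&) = delete;
    HexFmt& operator=(const HexFmt&) = delete;

    // Restore whatever the stream looked like before we touched it.
    ~HexFmt() {
        if (stream_ != nullptr) {
            stream_->flags(saved_flags_);
            stream_->fill(saved_fill_);
        }
    }

    friend std::ostream& operator<<(std::ostream& os, const HexFmt& fmt) {
        if (fmt.stream_ == nullptr) {  // remember the state only once
            fmt.stream_ = &os;
            fmt.saved_flags_ = os.flags();
            fmt.saved_fill_ = os.fill();
        }
        return os << std::hex << std::setfill('0') << std::setw(fmt.width_);
    }

private:
    int width_;
    mutable std::ostream* stream_ = nullptr;
    mutable std::ios_base::fmtflags saved_flags_{};
    mutable char saved_fill_ = ' ';
};

// Demonstration helper: the first statement prints in hex, the second
// statement is back to the stream's default decimal formatting.
std::string hex_then_decimal(unsigned value)
{
    std::ostringstream os;
    os << HexFmt(8) << value;
    os << ' ' << value;
    return os.str();
}
```

Here hex_then_decimal(255) yields "000000ff 255", so the formatting does not leak into later output the way the bare std::hex version does.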

James Kanze · Apr 1, 2014

The C version is not type safe, cannot be expanded with new types, and
has poor abstraction, while the C++ version has all of those.

Just as an example of that, assume you have in one file

typedef int Integral;

and in another something along the lines of

void foo(Integral i)
{
printf("???", i); // What to put here?
}

If you put "%i" there, it will break if 'Integral' is ever changed to
a bigger or incompatible type.

With luck, the compiler will give you warnings for the format string
not matching the parameter types, so you can manually go and fix all
the instances. Excellent programming technique.
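
For contrast, a sketch of the stream version of the same function, with Integral already changed to a wider type (the choice of long long and the output text are assumptions made for this illustration). The body of foo needs no edit because the operator<< overload is selected from the type of i.

```cpp
#include <sstream>
#include <string>

typedef long long Integral;  // assume this used to be int

std::string foo(Integral i)
{
    std::ostringstream out;
    out << "value: " << i;  // overload chosen from the type of i
    return out.str();
}
```

With printf, the same typedef change would require every "%i" applied to an Integral to become "%lld".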

A more frequent example: you're outputting degrees using
a format "%.1f". Your client requests two digits after the
decimal. In C, you have to go through the entire code base,
determining each time whether the "%.1f" is for degrees, or
something else, and only change those for degrees. In C++, you
modify the degree manipulator, and that's it. A simple change,
in one location, and clear what is being modified.
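
A minimal sketch of such a manipulator, assuming the simplest possible form (the poster's real 'degrees' is not shown, and kDegreesPrecision and show_heading are invented names). The number of digits lives in exactly one constant. Unlike a state-saving manipulator, this simple version leaves the fixed flag and the precision set on the stream afterwards.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// The single place to edit when the client wants a different number
// of digits after the decimal point.
const int kDegreesPrecision = 2;

std::ostream& degrees(std::ostream& os)
{
    return os << std::fixed << std::setprecision(kDegreesPrecision);
}

// Demonstration helper that formats one angle through the manipulator.
std::string show_heading(double x)
{
    std::ostringstream out;
    out << degrees << x;
    return out.str();
}
```

With the constant at 2, show_heading(45.267) gives "45.27"; changing the constant to 1 changes every use of 'degrees' and nothing else.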

James Kanze · Apr 1, 2014

On 24/02/2014 12:32, Juha Nieminen wrote:

Most modern C compilers will warn if you pass an incorrect type to
printf. please, no straw men here.

I don't see how that's possible. The formatting information is
embedded in the text, and the text comes from a separate,
language dependent file, which didn't even exist when the code
was compiled. (That's when the real fun starts. The program
only crashes when it's run outputting in Finnish.)
