nicely printf a matrix

vectorizor

Hi guys,

My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

It works ok, but as I said, the matrices are rather large, so it's
slooow. Has anybody some good ideas to significantly speed up the
process?

Thanks

victor
 
Chris Dollin

vectorizor said:
My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

It works ok, but as I said, the matrices are rather large, so it's
slooow.

How large, how slow, where are you printing them to, what's the
/actual code/ you're using, are you sure it's the printing that's
taking the time, have you any reason to think it can go faster,
are the arrays sparse?
 
Richard Heathfield

vectorizor said:

the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

Well, I hope it looks nothing like that. :) But I get the idea.
It works ok, but as I said, the matrices are rather large, so it's
slooow. Has anybody some good ideas to significantly speed up the
process?

Your system will be line-buffering, so there isn't much - if anything -
to be gained from caching the writes. If you are prepared to give up
text format (and I do not minimise the pain - text is so much more
useful than binary), writing the arrays out using fwrite is likely to
be much, much quicker because you're skipping the conversion from
internal format to text (and back, on the load). But the cost is high -
suddenly, your data format is at the mercy of your implementation.
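
For illustration, a minimal sketch of the fwrite route, assuming the matrix is one contiguous row-major array of double. The names dump_binary and load_binary and the two-value size header are assumptions made for this example, not something from the original program, and the resulting file can only be relied on when it is read back on the same kind of system that wrote it.

```c
#include <assert.h>
#include <stdio.h>

/* Write rows, cols, then the raw doubles. Returns 0 on success, -1 on error.
   The byte layout depends on the implementation that wrote the file. */
int dump_binary(const char *path, const double *m, size_t rows, size_t cols)
{
    assert(path != NULL && m != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    int ok = fwrite(&rows, sizeof rows, 1, fp) == 1
          && fwrite(&cols, sizeof cols, 1, fp) == 1
          && fwrite(m, sizeof *m, rows * cols, fp) == rows * cols;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read a file written by dump_binary into m, which can hold cap doubles.
   Returns 0 on success, -1 on error or if the matrix does not fit. */
int load_binary(const char *path, double *m, size_t cap,
                size_t *rows, size_t *cols)
{
    assert(path != NULL && m != NULL && rows != NULL && cols != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int ok = fread(rows, sizeof *rows, 1, fp) == 1
          && fread(cols, sizeof *cols, 1, fp) == 1
          && (*cols == 0 || *rows <= cap / *cols)
          && fread(m, sizeof *m, *rows * *cols, fp) == *rows * *cols;
    fclose(fp);
    return ok ? 0 : -1;
}
```

No number-to-text conversion happens in either direction, which is where the expected saving comes from. Whether matlab or excel can read such a file directly is a separate question that this sketch does not address.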

Whether you can do anything seriously effective without losing text
format very much depends on the data.

This happened to me once, kinda. I had to save the state of a mainframe
program to a file so that it could be transferred down to the PC for
easier debugging (and if you've ever debugged on a mainframe, you'll
know why!).

There were brazillions of values to transfer, and it was taking way too
long - the mists of time are kind to me, but it was certainly a great
many minutes. Then I took a look at the data, and realised that almost
every value was 0! I mean, yes, there were many hundreds of non-zero
values, but there were literally(!?!) squillions of zero values too.

So I wrote the code in such a way that it only dumped non-zero values.
Suddenly, the whole process - save state, transfer, load state - only
took a few seconds. Job done.

You may find that the same trick, or a similar one, works for you, too.
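
A rough sketch of that trick in C, assuming a contiguous row-major array of double. The function name dump_nonzero, the "rows cols" header line and the "row col value" record layout are illustrative assumptions; whatever imports the file has to rebuild the full matrix from these records.

```c
#include <assert.h>
#include <stdio.h>

/* Write a "rows cols" header, then one "row col value" line per non-zero
   entry (indices are zero-based). Returns the number of entries written,
   or -1 if a write fails. */
long dump_nonzero(FILE *fp, const double *m, size_t rows, size_t cols)
{
    assert(fp != NULL && m != NULL);
    long written = 0;
    if (fprintf(fp, "%zu %zu\n", rows, cols) < 0)
        return -1;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            double v = m[i * cols + j];
            if (v != 0.0) {
                if (fprintf(fp, "%zu %zu %.17g\n", i, j, v) < 0)
                    return -1;
                written++;
            }
        }
    }
    return written;
}
```

This only pays off when most entries really are zero; for a dense matrix each value costs more bytes than in the plain layout, because the indices are written too.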
 
Eric Sosman

vectorizor wrote On 05/03/07 10:39,:
Hi guys,

My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

It works ok, but as I said, the matrices are rather large, so it's
slooow. Has anybody some good ideas to significantly speed up the
process?

Buy faster disks? I/O operations take about six or
seven decimal orders of magnitude more time than CPU
operations, so rearranging the CPU's computations will
improve the overall time very little. Your biggest
improvement might come from eliminating either the space
or the tab character (assuming you don't actually need
both): if the matrix is a thousand rows by a thousand
columns, that'll save you a megabyte of I/O.

One *possible* exception to all the above involves
buffering. Are you, by any chance, sending this output
to the stderr stream? By default that stream is not
buffered, so you may be generating a huge number of
small writes instead of a smaller number of
large ones. There are at least two ways to
improve things if that's the source of the problem: use
setvbuf() or setbuf() to change the buffering of stderr
(you must do this *before* using the stream in any other
way), or send the output to a different stream opened
with fopen(). I'd recommend the second as more robust
and more easily managed.
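
As a sketch of the second option, assuming a contiguous row-major array of double: open a dedicated stream with fopen() and request full buffering with setvbuf() before doing anything else with it. The function name dump_buffered and the bufsize parameter are made up for this example; the standard only says the size may be used as a hint when the buffer pointer is NULL.

```c
#include <assert.h>
#include <stdio.h>

/* Dump a rows x cols row-major matrix as text through a fully buffered
   stream. Returns 0 on success, -1 on error. */
int dump_buffered(const char *path, const double *m,
                  size_t rows, size_t cols, size_t bufsize)
{
    assert(path != NULL && m != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    /* setvbuf must come before any other operation on the stream.
       A NULL buffer asks the library to allocate one itself. */
    if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return -1;
    }
    int ok = 1;
    for (size_t i = 0; i < rows && ok; i++) {
        for (size_t j = 0; j < cols && ok; j++)
            ok = fprintf(fp, "%f ", m[i * cols + j]) >= 0;
        ok = ok && putc('\n', fp) != EOF;
    }
    /* fclose flushes the buffer, so its result matters too. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A stream opened on a disk file is normally fully buffered already, so the setvbuf() call mostly matters when the default buffer is small; measuring with and without it is the only way to know.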
 
John Bode

vectorizor said:
Hi guys,

My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

It works ok, but as I said, the matrices are rather large, so it's
slooow. Has anybody some good ideas to significantly speed up the
process?

Thanks

victor

First of all, profile your code so that you *know* it's the calls to
printf() that are sucking up the most cycles before trying to speed it
up.

If it turns out that the calls to printf() are sucking up all the
time, one thing you can try is to build each line of text in memory
using sprintf(), then call printf() on that line; writing to a buffer
in memory *should* be faster than writing to stdout.

#include <stdio.h>
#include <stdlib.h>

int buildLine(float *vals, size_t numVals, int fieldWidth,
              int precision, char **lineBuf)
{
    size_t bufLen = numVals * (fieldWidth + 1) + 1;
    size_t i, pos;
    char *tmp;

    if (*lineBuf != NULL)
    {
        free(*lineBuf);
    }

    tmp = realloc(*lineBuf, bufLen);
    if (tmp)
        return 0;
    else
        *lineBuf = tmp;

    for (i = 0, pos = 0; i < numVals; i++, pos += (fieldWidth + 1))
        sprintf(*lineBuf+pos, "%*.*f ", fieldWidth, precision, vals[i]);

    return 1;
}

Then in the code that dumps the matrix, just do the following:

double matrix[ROWS][COLS] = {...};

for (i = 0; i < rows; i++)
{
    char *theLine;
    int width = 7;
    int prec = 3;
    if (buildLine(matrix[i], width, prec, &theLine))
        printf("%s\n", theLine);
}

I haven't exhaustively tested or profiled this code, so it may not buy
you anything at all, but it should give you some ideas.
 
¬a\\/b

vectorizor said:
Hi guys,
My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

You can write the whole array into memory first and then print it to
the file. This should be faster, because writing to an array is, in
general, 1000 times faster than writing to a file.

If you have the right function, you can do something like the following:

char buffer[2048];
int siz=2045;

for each row i
    for each col j
        siz-=sprintf_R(buffer, siz, "%3.3f ", a[i][j]);
    end
    siz-=sprintf_R(buffer, siz, "\n");
end

if(siz>0) printf("%s \n", buffer);
else printf("Error\n");

where int sprintf_R(char* buffer, int sizeofbuffer, char* fmt, ...)
is like sprintf but returns the number of chars written to buffer.

If some error occurs,
sprintf_R(buffer, sizeofbuffer, fmt, ...)
should return sizeofbuffer.

But the above is only a function that I thought up just now; I do not
know whether it would work.
 
Eric Sosman

John Bode wrote On 05/03/07 12:50,:
First of all, profile your code so that you *know* it's the calls to
printf() that are sucking up the most cycles before trying to speed it
up.

This part is good advice.
If it turns out that the calls to printf() are sucking up all the
time, one thing you can try is to build each line of text in memory
using sprintf(), then call printf() on that line; writing to a buffer
in memory *should* be faster than writing to stdout.

This part is suspect. The conversion from numbers to
characters should take approximately the same amount of time
regardless of the ultimate destination, so the only difference
between line-at-a-time and number-at-a-time lies in buffer
behavior. Buffering can certainly have a strong influence
on the total time, but there are more direct ways to affect
the buffering.
#include <stdio.h>
#include <stdlib.h>

int buildLine(float *vals, size_t numVals, int fieldWidth, int
precision, char **lineBuf)
{
size_t bufLen = numVals * (fieldWidth + 1) + 1;

The correctness of this calculation depends crucially
on a correct value for fieldWidth, which may not be easy
to come by. Quickly: How many characters will be generated
if fieldWidth is seven? Would you like to know the value
of the number being converted before you answer?
size_t i, pos;
char *tmp;

if (*lineBuf != NULL)
{
free(*lineBuf);
}

Unnecessary test, because free(NULL) is a no-op.
tmp = realloc(*lineBuf, bufLen);

Undefined behavior if *lineBuf was not NULL originally,
because if it wasn't NULL it got freed a few lines ago.
You probably want malloc() instead.
if (tmp)
return 0;
else
*lineBuf = tmp;

Looks like the test is sdrawkcab.
for (i = 0, pos = 0; i < numVals; i++, pos += (fieldWidth + 1))
sprintf(*lineBuf+pos, "%*.*f ", fieldWidth, precision, vals[i]);

return 1;
}

Then in the code that dumps the matrix, just do the following:

double matrix[ROWS][COLS] = {...};

for (i = 0; i < rows; i++)
{
char *theLine;
int width = 7;
int prec = 3;


As an illustration of my misgivings about getting
a reliable value for fieldWidth, here's a snap question.
Please answer as rapidly as you can:

What is the largest value that can be safely
handled by these width and prec values?

You needn't disclose your answer, but be honest with
yourself. Did you leave a spot for the decimal point?
Did you leave a spot for the minus sign? Did you take
rounding into account? If you did all these things and
still answered correctly, you're a better man than I am,
Gunga Din.
if (buildLine(matrix[i], width, prec, &theLine))


Oog. Uninitialized theLine, even though the called
function will almost immediately inspect its non-existent
value ...
printf("%s\n", theLine);
}

I haven't exhaustively tested or profiled this code, so it may not buy
you anything at all, but it should give you some ideas.

IMHO your first idea was by far the better. The second
idea (once repaired) remains shaky because of its reliance
on fieldWidth. Also, the fixed fieldWidth means that small
numbers are preceded by space characters which do little
except increase the total volume of output -- not usually a
recipe for speeding things up.
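
To make the fieldWidth concern concrete, here is a small sketch that asks the library how many characters a conversion really needs, using the C99 rule that snprintf() with a null buffer and size zero only measures. The helper name formatted_len is an assumption for this example; the format string is the one from the quoted buildLine().

```c
#include <assert.h>
#include <stdio.h>

/* Number of characters "%*.*f " produces for v, not counting the '\0'.
   With a NULL buffer and size 0, snprintf writes nothing and only
   reports the length it would have written. */
int formatted_len(int width, int prec, double v)
{
    assert(width >= 0 && prec >= 0);
    return snprintf(NULL, 0, "%*.*f ", width, prec, v);
}
```

With width 7 and precision 3, formatted_len(7, 3, 999.999) is 8, which matches fieldWidth + 1. But 999.9996 rounds up to 1000.000 and needs 9 characters, and so does -123.456, so a buffer sized from fieldWidth alone would be overrun by either value.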
 
Keith Thompson

vectorizor said:
My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

It works ok, but as I said, the matrices are rather large, so it's
slooow. Has anybody some good ideas to significantly speed up the
process?

Quantifying "rather large" and "slooow" might be helpful.

I don't have a whole lot to offer in the way of performance
improvements. If you're writing to a file (presumably a disk file),
fprintf() itself will probably do all the buffering you need; there
are ways to adjust that, but they probably won't help much.

Since this is obviously pseudo-code, I won't bother to mention that
you omitted the FILE* argument to fprintf().

Do you really print a space *and* a tab after each number? One or the
other would probably suffice.

The simplest algorithm will give you an extra space at the end of each
line. That probably doesn't matter, but if it does, you can treat the
first or last element of each row as a special case.

putc('\n', foo) may be marginally faster than fprintf(foo, "\n").
I doubt that that's your bottleneck, but it's a simple enough change
that it's probably worth doing.
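
The separator and newline points above can be sketched as one row-writing routine: a single tab between values, nothing after the last value, and putc() for the newline. The function name write_row is an assumption for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Write n values on one line: a single tab between values, no trailing
   separator, and the newline emitted with putc. Returns 0 on success,
   -1 if any write fails. */
int write_row(FILE *fp, const double *row, size_t n)
{
    assert(fp != NULL && (row != NULL || n == 0));
    for (size_t j = 0; j < n; j++) {
        /* Every element except the first is preceded by the separator. */
        if (j > 0 && putc('\t', fp) == EOF)
            return -1;
        if (fprintf(fp, "%f", row[j]) < 0)
            return -1;
    }
    return putc('\n', fp) == EOF ? -1 : 0;
}
```

Calling it once per row of the matrix gives the layout the original question describes, one byte per cell smaller than the space-plus-tab version.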

By using fprintf(), you're invoking code that can parse arbitrary
format strings. A specialized function that just converts a
floating-point value to a string *might* be more efficient. There's
no such function in standard C (is there?), so you'd have to find one
or write your own. But the overhead of the conversion isn't likely to
be significant relative to the overhead of performing the output.
Don't even think about doing this until you've profiled your code and
determined where it's actually spending its time.

Consider writing and timing a similar program that prints fixed
strings rather than floating-point values. Comparing its speed to the
speed of your program should give you some idea of how much time is
spent doing the floating-point conversions, and how much is spent
performing output.
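
A rough sketch of such a comparison, with the function name time_writes and its parameters invented for this example. It uses clock(), which reports processor time, so time spent waiting on the device is not counted; a wall-clock timer would be needed to see that part.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Write n items to fp: formatted doubles if convert is non-zero,
   otherwise a fixed string of the same length as a typical "%f" result.
   Returns the processor time used in seconds, or -1.0 on a write error. */
double time_writes(FILE *fp, long n, int convert)
{
    assert(fp != NULL && n >= 0);
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        int rc = convert ? fprintf(fp, "%f\t", i * 1.0)
                         : fputs("0.000000\t", fp);
        if (rc < 0)
            return -1.0;
    }
    if (fflush(fp) != 0)
        return -1.0;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running it twice on the same kind of stream, once with convert set and once without, gives two figures whose difference is roughly the cost of the floating-point conversion.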

Think about loss of precision. The "%f" format is going to round the
printed value to just a few decimal digits, especially values very
close to 0.0. You can use "%e" to force scientific notation, which
makes the loss of precision more uniform. If loss of precision is an
issue, specify more digits, e.g., "%.16f" or "%.16g" (pick an
appropriate value for "16"). Alas, this will likely slow down your
code even more. There's a tradeoff between speed and precision; do
you want wrong answers really fast?
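
To check whether a given format loses information, one can print a value, read it back with strtod() and compare. This is a sketch only: the helper name round_trips is an assumption, and the remark about 17 digits assumes double is an IEEE 754 64-bit type.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format v with fmt (which must consume exactly one double), parse the
   text back with strtod, and report whether the same value came back.
   Returns 1 if it did, 0 otherwise. */
int round_trips(const char *fmt, double v)
{
    assert(fmt != NULL);
    char buf[512];  /* large enough for "%f" of any finite double */
    int n = snprintf(buf, sizeof buf, fmt, v);
    if (n < 0 || (size_t)n >= sizeof buf)
        return 0;
    return strtod(buf, NULL) == v;
}
```

For example, round_trips("%f", 1.0e-7) is 0 because the value is printed as 0.000000, while round_trips("%.17g", 0.1) is 1 under the IEEE 754 assumption, at the price of more characters per cell.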
 
Chris Dollin

Chris said:
How large, how slow, where are you printing them to, what's the
/actual code/ you're using, are you sure it's the printing that's
taking the time, have you any reason to think it can go faster,
are the arrays sparse?

I wrote a proglet that did a 10000 x 10000 iteration (no arrays, so
that's a missing overhead), and another that just wrote
out a fixed string for each value instead, and another that wrote
the fixed string using `puts` not `printf` with `%s`.

The numbers are

puts fixed string -- 16s
printf %s fixed string -- 25s
printf %f i * j * 1.0 -- (waiting ... 185s)
printf no-percent-string i * j * 1.0 -- 15s or 20s depending on size

from which I'd guess that there was some chunk of time just outputting
the characters, a noticeable but significantly smaller part interpreting
the format string, and another noticeable ditto which is the %f
conversion.

The moral of the story (I see Richard H has a similar observation) is
that the best thing to do is to write fewer things out; are there
values in the array which are repeated a lot?
 
¬a\\/b

vectorizor said:
Hi guys,
My program deals with lots of large matrices. To debug it easily,
I would like to be able to dump them to files, so that I can easily
import them into other software (think matlab, excel) and analyse
the intermediate results. To make sure the data can be understood by
other programs, a space needs to be included between each cell, and a
return at the end of each row (row-major order). Hence at the moment,
the dump code looks something like

for each row
for each col
fprintf("%f \t"); // a float followed by a space
end
fprintf("\n"); // to start a new line
end

¬a\\/b said:
You can write the whole array into memory first and then print it to
the file. This should be faster, because writing to an array is, in
general, 1000 times faster than writing to a file.

If you have the right function, you can do something like the following:

char buffer[2048];
int siz=2045;

for each row i
    for each col j
        siz-=sprintf_R(buffer, siz, "%3.3f ", a[i][j]);
    end
    siz-=sprintf_R(buffer, siz, "\n");
end

if(siz>0) printf("%s \n", buffer);
else printf("Error\n");


char buffer[20048];
int siz=20048;
for each row i
    for each col j
        buffer+=sprintf_R(buffer, &siz, "%3.3f ", a[i][j]);
    end
    buffer+=sprintf_R(buffer, &siz, "\n");
end

if(siz>0) printf("%s \n", buffer);
else printf("Error\n");


where int sprintf_R(char* buffer, int* sizeRemain, char* fmt, ...)
is like sprintf but returns 0 on error or, on success, the length of
the string written to buffer; it then stores in sizeRemain the space
left in the buffer, i.e. the old value minus the length just written.
 
¬a\\/b

char buffer[20048];
int siz=20048;
for each row i
    for each col j
        buffer+=sprintf_R(buffer, &siz, "%3.3f ", a[i][j]);
    end
    buffer+=sprintf_R(buffer, &siz, "\n");
end

if(siz>0) printf("%s \n", buffer);
else printf("Error\n");


char buffer[20048], *b=buffer;
int siz=20048, k=1;
for each row i
    for each col j
        k=sprintf_R(b, siz, "%3.3f ", a[i][j]);
        b+=k; siz-=k;
        if(k<=0||siz<=0) goto label;
    end
    b+=sprintf_R(b, &siz, "\n");
end
label:
if(k<=0||siz<=0) printf("Error\n");
else printf("%s \n", buffer);

where int sprintf_R(char* buffer, int sizebuffer, char* fmt, ...)
is like sprintf but returns 0 on error or, on success, the length of
the string written to buffer.

int sprintf_R(0, 0, char* fmt, ...)
returns the length the output would have if all the parameters in fmt
were expanded (it should be like snprintf?)

or something like

char *buffer, *b;
int siz, k;
for(i=0, k=0; i<nrows; ++i)
{
    for(j=0; j<ncols; ++j) /* this finds the length of the array */
        k+=sprintf_R(0, 0, "%3.3f ", a[i][j]);
    k+=sprintf_R(0, 0, "\n");
}
if( (buffer=malloc(k+4))==0 )
{printf("Memory error\n"); return 0;}
siz=k+4; /* space available in buffer */

for(i=0, b=buffer; i<nrows; ++i)
{
    for(j=0; j<ncols; ++j)
    {
        k=sprintf_R(b, siz, "%3.3f ", a[i][j]);
        b+=k; siz-=k;
        if(k==0||siz<=0) goto label;
    }
    b+=sprintf_R(b, &siz, "\n");
}
label:
if(k==0||siz<=0) printf("Error\n");
else printf("%s \n", buffer);

free(buffer);

If you have to print a matrix of doubles to stdout,
what do you use for that operation? sprintf?
 
¬a\\/b

char buffer[20048], *b=buffer;
int siz=20048, k=1;
for each row i
    for each col j
        k=sprintf_R(b, siz, "%3.3f ", a[i][j]);
        b+=k; siz-=k;
        if(k<=0||siz<=0) goto label;
    end
    /* was: b+=sprintf_R(b, &siz, "\n"); */

    k =sprintf_R(b, siz, "\n");
    b+=k; siz-=k;
    if(k<=0||siz<=0) goto label;
end
label:
if(k<=0||siz<=0) printf("Error\n");
else printf("%s \n", buffer);

where int sprintf_R(char* buffer, int sizebuffer, char* fmt, ...)
is like sprintf but returns 0 on error or, on success, the length of
the string written to buffer.

int sprintf_R(0, 0, char* fmt, ...)
returns the length the output would have if all the parameters in fmt
were expanded (it should be like snprintf?)

or something like

char *buffer, *b;
int siz, k;
for(i=0, k=0; i<nrows; ++i)
{
    for(j=0; j<ncols; ++j) /* this finds the length of the array */
        k+=sprintf_R(0, 0, "%3.3f ", a[i][j]);
    k+=sprintf_R(0, 0, "\n");
}
if( (buffer=malloc(k+4))==0 )
{printf("Memory error\n"); return 0;}
siz=k+4; /* space available in buffer */

for(i=0, b=buffer; i<nrows; ++i)
{
    for(j=0; j<ncols; ++j)
    {
        k=sprintf_R(b, siz, "%3.3f ", a[i][j]);
        b+=k; siz-=k;
        if(k==0||siz<=0) goto label;
    }
    /* was: b+=sprintf_R(b, &siz, "\n"); */

    k =sprintf_R(b, siz, "\n");
    b+=k; siz-=k;
    if(k<=0||siz<=0) goto label;
}
label:
if(k==0||siz<=0) printf("Error\n");
else printf("%s \n", buffer);

free(buffer);

If you have to print a matrix of doubles to stdout,
what do you use for that operation? sprintf?

So how do all of you print it?
There should be a "standard" way of building arrays.
Don't you agree?
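
For what it is worth, standard C99 snprintf() already behaves much like the sprintf_R described in this thread: it never writes past the size it is given and it returns the length the full output would have had. A minimal sketch of bounded row formatting built on it follows; the function name format_row and the "%.3f" format are choices made for this example only.

```c
#include <assert.h>
#include <stdio.h>

/* Format n values into buf (capacity cap bytes), separated by spaces and
   terminated by a newline. Returns the number of characters written, not
   counting the '\0', or -1 if the row does not fit or formatting fails. */
int format_row(char *buf, size_t cap, const double *row, size_t n)
{
    assert(buf != NULL && (row != NULL || n == 0));
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    size_t used = 0;
    for (size_t j = 0; j < n; j++) {
        /* snprintf reports the length it needed, even when it truncated. */
        int k = snprintf(buf + used, cap - used, "%.3f%c",
                         row[j], j + 1 < n ? ' ' : '\n');
        if (k < 0 || (size_t)k >= cap - used)
            return -1;
        used += (size_t)k;
    }
    return (int)used;
}
```

The caller can then pass the finished line to fputs() in one call. As noted earlier in the thread, this changes how often the stream is called rather than how much conversion work is done, so it is worth timing before adopting it.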
 
