Best way to write the vector content to a file.

sravanreddy001

Hi,

Can anyone suggest the best (fastest) way to write the contents of a vector to a file?

Is there a better way to do it than the method below?

for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%d %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
}

Is string addition a very expensive operation? (Does a new memory allocation take place each time?)

There are around 500,000 strings in the vector, and these have to be stored in
<id term> format.

Thanks,
Sravan.
 
Ian Collins

sravanreddy001 said:
for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%d %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
}

Where do you do the write?

sravanreddy001 said:
There are around 500,000 strings in the vector, and these have to be stored in
<id term> format.

You will probably be I/O limited, so optimising your code won't make any
significant difference.
 
sravanreddy001

I'm actually writing this to a file, but the file operation is just one simple write.
(The string constructed in the loop is written in one go, and the file size is around 5 MB.)

But with string = string + some_thing;

if this allocates a new memory location each time, would writing to the file multiple times instead help?

On my 2nd-gen i3 computer it took around 10 minutes (using a single core,
not multithreaded).
 
sravanreddy001

Is there a way to automatically push the vector contents to file in an efficient manner?
 
Ian Collins

On 09/20/11 03:28 PM, sravanreddy001 wrote:

Context is nice!
I'm actually writing this to a file, but the file operation is just one simple write.
(The string constructed in the loop is written in one go, and the file size is around 5 MB.)

But with string = string + some_thing;

if this allocates a new memory location each time, would writing to the file multiple times instead help?

On my 2nd-gen i3 computer it took around 10 minutes (using a single core,
not multithreaded).

You would probably be better off just using an ofstream and writing a
line at a time. I have one application that reads and writes in excess
of 1GB with fstreams. Reading takes a couple of minutes, writing about 5.

Your huge string will do a lot of progressively bigger memory moves,
which are a bigger issue than the allocations. Kind of a
macro-deoptimisation!
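
A rough sketch of the memory-move problem and a cheaper way to build the string, in case one big string really is needed. `ids_n_terms = ids_n_terms + temp` creates a temporary copy of the whole accumulated string on every iteration, whereas `+=` appends in place and `reserve` sets the capacity once up front. The function name `build_lines` and the 12-byte per-line allowance are assumptions made for illustration; they are not from the original code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds "<id> <term>\n" lines into a single string, appending in place.
std::string build_lines(const std::vector<std::string>& terms) {
    // Rough size estimate: each term plus room for the id, a space and a newline.
    std::size_t total = 0;
    for (const std::string& t : terms) total += t.size() + 12;

    std::string out;
    out.reserve(total);  // one allocation up front instead of repeated regrowth

    for (std::size_t i = 0; i < terms.size(); ++i) {
        out += std::to_string(i);  // += appends in place, no whole-string temporary
        out += ' ';
        out += terms[i];
        out += '\n';
    }
    return out;
}
```

Even with this change, writing straight to a stream avoids holding the whole file in memory at all.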
 
sravanreddy001

Do you think writing so many times, 5 lakh (500,000) lines, is more expensive because of the extra disk writes? That's why I came to that approach.

So will writing to the file every 20-30 lines at once help me?
It would at least reduce the disk writes by a factor of 20-30.
 
sravanreddy001

Also, I'm reading all the file content in one go:

seeking to the end of the file, then reading from start to end.
This is more efficient than reading line by line, right?
 
Ian Collins

sravanreddy001 said:
Also, I'm reading all the file content in one go.

Please keep the context from the post you are replying to, otherwise your
posts don't make sense.

sravanreddy001 said:
seeking to the end of the file, then reading from start to end.
This is more efficient than reading line by line, right?

You are worrying about the wrong thing, just create an ifstream and read
the entries in one at a time.
 
Ian Collins

sravanreddy001 said:
Do you think writing so many times, 5 lakh (500,000) lines, is more expensive because of the extra disk writes? That's why I came to that approach.

So will writing to the file every 20-30 lines at once help me?
It would at least reduce the disk writes by a factor of 20-30.

Leave the buffering to the stream and the OS, it's their job. Unless
you flush the stream each line and force synchronous writes on the file,
you will not see millions of writes. On my system, that much data would
be coalesced into one disk write!
 
Juha Nieminen

sravanreddy001 said:
for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%d %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
}


I don't understand why you can't write to the file directly. Building
the contents of the file in a dynamic string first like that is
significantly less efficient.
 
Goran

sravanreddy001 said:
for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%d %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
}

Is string addition a very expensive operation? (Does a new memory allocation take place each time?)

Well, yes. The above loop is bound to get very expensive, because you
want to append. That means, from time to time (possibly at every
append!), "allocate more space, copy old string there, append new
content, free old space".

You should instead just use ofstream, and play with its
rdbuf()->pubsetbuf to find a sweet spot for speed (depends on the system, but
reasonable buffer sizes like 1K-8K (1024 to 8192) are a good guess). I
wouldn't even be surprised if you find that the default works just as
well. E.g.

ofstream f(whatever);
char buffer[try sizes here]; // speed to be gained here.
f.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

your loop here, but instead of sprintf and +, do

f << i << " " << terms[i] << endl;

That does the conversion to text for you.

Goran.
 
Ian Collins

Goran said:
f << i << " " << terms[i] << endl;


Flushing after each line rather defeats the effort of fiddling with the
ofstream's buffer!
 
Goran

Ian Collins said:
Flushing after each line rather defeats the effort of fiddling with the
ofstream's buffer!


Whoops! (blushes)

Goran.
 
Rui Maciel

sravanreddy001 said:
for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%d %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
}

Is string addition a very expensive operation? (Does a new memory
allocation take place each time?)


Before this loop, have you allocated any memory to your temp c-string?



Rui Maciel
 
Fulvio Esposito

sravanreddy001 said:
Can anyone suggest the best (fastest) way to write the contents of a vector to a file?



If you use the standard file stream classes, you always get buffered I/O. In any non-trivial implementation, the buffer size is chosen to suit the OS you are running on. Moreover, if the file is on a network filesystem, a single call can be broken into pieces to be sent over the network.

It always depends on your context. There's no general solution aside from letting the standard library and the OS do what they know how to do best!

Cheers,
Fulvio Esposito
 
sravanreddy001

Wow...

<Special>Fulvio Esposito, Ian Collins, Goran, Juha Nieminen:</Special>
Thanks a lot for the valuable input you have provided.

I have another question. I'm reading each whole file in a single go, instead of line by line.

This is more efficient than the alternative, right?
Each file is around 5 KB to 10 KB.

Has anyone worked on creating forward indexes (information retrieval)?
 
Goran

sravanreddy001 said:
I have another question. I'm reading each whole file in a single go, instead of line by line.

This is more efficient than the alternative, right?
Each file is around 5 KB to 10 KB.

You absolutely must measure to know.

That said, depending on what you're doing with the resulting string, even
this is probably not as efficient as simply reading off the stream and
"converting" into a vector. It's not efficient because you need to read
all the data, then copy it into a string, then parse that
string. If you simply read into your structure, you avoid the copy.
Reading itself, due to all sorts of buffering happening
behind your back, won't be faster either way.

Goran.
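
One possible way to do that measuring is a small wall-clock timer wrapped around each approach. This is only a sketch: the helper name `seconds_taken` is hypothetical, and a single run can be skewed by OS file caching, so repeated runs are advisable.

```cpp
#include <chrono>

// Runs the given callable once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Wrapping the whole-file read in one lambda and the entry-by-entry read in another, then comparing the two results on real input, gives numbers specific to the system in question.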
 
sravanreddy001

Hi all,
thanks a lot for the valuable inputs.

I followed all of your guidelines.

Now it's taking just 2 seconds, against the initial 10 minutes.

Thanks a lot. :)
 
