Best way to write vector contents to a file.

Discussion in 'C++' started by sravanreddy001, Sep 20, 2011.

  1. Hi,

    Can anyone suggest the best (fastest) way to write the contents of a vector to a file?

    Is there a better way to do it than the method below?

    for(unsigned i=0;i<terms.size();i++){
    sprintf(temp,"%u %s\n",i,terms[i].c_str());
    ids_n_terms = ids_n_terms + temp;
    }

    Is the string addition a very expensive operation? (Will a new memory allocation take place?)

    There are around 5,00,000 (500,000) strings in the vector, and these have to be stored in
    <id term> format.

    Thanks,
    Sravan.
    sravanreddy001, Sep 20, 2011
    #1

  2. Ian Collins Guest

    On 09/20/11 03:05 PM, sravanreddy001 wrote:
    > Hi,
    >
    > Can anyone suggest the best (fastest) way to write the contents of a vector to a file?
    >
    > Is there a better way to do it than the method below?
    >
    > for(unsigned i=0;i<terms.size();i++){
    > sprintf(temp,"%u %s\n",i,terms[i].c_str());
    > ids_n_terms = ids_n_terms + temp;
    > }


    Where do you do the write?

    > Is the string addition a very expensive operation? (Will a new memory allocation take place?)
    >
    > There are around 5,00,000 (500,000) strings in the vector, and these have to be stored in
    > <id term> format.


    You will probably be I/O limited, so optimising your code won't make any
    significant difference.

    --
    Ian Collins
    Ian Collins, Sep 20, 2011
    #2

  3. I'm actually writing this to a file.
    But the file operation is just one simple write
    (the string constructed in the loop is written once, and the file size is around 5 MB).

    But the loop does string = string + some_thing;

    If this allocates new memory every time, would writing to the file several times instead help at all?

    On my 2nd-gen i3 computer it took around 10 minutes (using a single core,
    not multithreaded).
    sravanreddy001, Sep 20, 2011
    #3
  4. Is there a way to automatically push the vector contents to a file in an efficient manner?
    sravanreddy001, Sep 20, 2011
    #4
  5. Ian Collins Guest

    On 09/20/11 03:28 PM, sravanreddy001 wrote:

    Context is nice!

    > I'm actually writing this to a file.
    > But the file operation is just one simple write
    > (the string constructed in the loop is written once, and the file size is around 5 MB).
    >
    > But the loop does string = string + some_thing;
    >
    > If this allocates new memory every time, would writing to the file several times instead help at all?
    >
    > On my 2nd-gen i3 computer it took around 10 minutes (using a single core,
    > not multithreaded).


    You would probably be better off just using an ofstream and writing a
    line at a time. I have one application that reads and writes in excess
    of 1GB with fstreams. Reading takes a couple of minutes, writing about 5.

    Your huge string will do a lot of progressively bigger memory moves,
    which are a bigger issue than the allocations. Kind of a
    macro-deoptimisation!
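    If you really do need the whole output in memory as one string, a minimal sketch is to append in place with += instead of writing s = s + temp. It assumes C++11 (for std::to_string); the helper name build_lines and the 16-bytes-per-line reservation are hypothetical. Common standard library implementations grow a string's capacity geometrically, so the total copying stays roughly proportional to the final size rather than copying the entire accumulated string on every iteration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds all "<id> <term>\n" lines into one string.
// operator+= appends in place; common implementations grow the capacity
// geometrically, so existing contents are copied only occasionally,
// not on every iteration as with s = s + temp.
std::string build_lines(const std::vector<std::string>& terms) {
    std::string out;
    // Assumption: roughly 16 bytes per line on average. Reserving an
    // estimate up front avoids most reallocations; it is only a hint.
    out.reserve(terms.size() * 16);
    for (std::size_t i = 0; i < terms.size(); ++i) {
        out += std::to_string(i);  // C++11
        out += ' ';
        out += terms[i];
        out += '\n';
    }
    return out;
}
```

    As Ian says, though, writing straight to an ofstream usually makes this intermediate string unnecessary.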

    --
    Ian Collins
    Ian Collins, Sep 20, 2011
    #5
  6. Don't you think writing so many times (5 lakh, i.e. 500,000 lines) is more expensive, with more disk writes? That's why I came up with that approach.

    So, will writing to the file every 20-30 lines at once help me?
    It will at least reduce the number of disk writes by a factor of 20-30.
    sravanreddy001, Sep 20, 2011
    #6
  7. Also, I'm reading the whole file content in one go.

    I seek to the end of the file, then read from start to end.
    This is more efficient than reading line by line, right?
    sravanreddy001, Sep 20, 2011
    #7
  8. Ian Collins Guest

    On 09/20/11 03:50 PM, sravanreddy001 wrote:
    > Also, I'm reading the whole file content in one go.


    Please keep the context from the post you are replying to, otherwise your
    posts don't make sense.

    > I seek to the end of the file, then read from start to end.
    > This is more efficient than reading line by line, right?


    You are worrying about the wrong thing, just create an ifstream and read
    the entries in one at a time.
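    A minimal sketch of reading the entries one at a time, assuming the file holds one "<id> <term>" pair per line (as written by the loop in the first post) and that the term is everything after the first space. The names Entry and read_entries are hypothetical, and skipping malformed lines is an assumed policy.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying read_entries
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: (id, term).
typedef std::pair<unsigned long, std::string> Entry;

// Reads "<id> <term>" entries one line at a time. Everything after the
// first space is taken as the term, so terms may contain spaces. Lines
// without a space or without a numeric id are skipped (an assumption).
std::vector<Entry> read_entries(std::istream& in) {
    std::vector<Entry> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type sp = line.find(' ');
        if (sp == std::string::npos || sp == 0) continue;
        const char* begin = line.c_str();
        char* end = 0;
        unsigned long id = std::strtoul(begin, &end, 10);
        // The id must be exactly the text before the first space.
        if (end != begin + sp) continue;
        entries.push_back(Entry(id, line.substr(sp + 1)));
    }
    return entries;
}
```

    With a file, pass an std::ifstream (from <fstream>) as the istream argument; the stream's own buffering handles the disk reads.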

    --
    Ian Collins
    Ian Collins, Sep 20, 2011
    #8
  9. Ian Collins Guest

    On 09/20/11 03:47 PM, sravanreddy001 wrote:
    > Don't you think writing so many times (5 lakh, i.e. 500,000 lines) is more expensive, with more disk writes? That's why I came up with that approach.
    >
    > So, will writing to the file every 20-30 lines at once help me?
    > It will at least reduce the number of disk writes by a factor of 20-30.


    Leave the buffering to the stream and the OS, it's their job. Unless
    you flush the stream each line and force synchronous writes on the file,
    you will not see millions of writes. On my system, that much data would
    be coalesced into one disk write!
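    A minimal sketch of leaving the buffering to the stream: each entry goes straight to the ostream with '\n' rather than std::endl, so nothing forces a flush per line. The function names write_terms and write_terms_to_file are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for trying write_terms
#include <string>
#include <vector>

// Writes each entry as "<id> <term>\n" straight to the stream. '\n'
// (unlike std::endl) does not flush, so the stream buffer and the OS
// decide when the data actually goes to disk.
void write_terms(std::ostream& out, const std::vector<std::string>& terms) {
    for (std::size_t i = 0; i < terms.size(); ++i) {
        out << i << ' ' << terms[i] << '\n';
    }
}

// Hypothetical file wrapper: close() flushes whatever is still buffered,
// and checking the state afterwards catches write errors.
bool write_terms_to_file(const std::string& path,
                         const std::vector<std::string>& terms) {
    std::ofstream out(path.c_str());
    if (!out) return false;
    write_terms(out, terms);
    out.close();
    return !out.fail();
}
```

    Taking a std::ostream& keeps the formatting logic independent of where the output goes.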

    --
    Ian Collins
    Ian Collins, Sep 20, 2011
    #9
  10. sravanreddy001 <> wrote:
    > for(unsigned i=0;i<terms.size();i++){
    > sprintf(temp,"%u %s\n",i,terms[i].c_str());
    > ids_n_terms = ids_n_terms + temp;
    > }


    I don't understand why you can't write to the file directly. Building
    the contents of the file in a dynamic string first, like that, is
    significantly less efficient.
    Juha Nieminen, Sep 20, 2011
    #10
  11. Goran Guest

    On Sep 20, 5:05 am, sravanreddy001 <> wrote:
    > Hi,
    >
    > Can anyone suggest the best (fastest) way to write the contents of a vector to a file?
    >
    > Is there a better way to do it than the method below?
    >
    >         for(unsigned i=0;i<terms.size();i++){
    >                 sprintf(temp,"%u %s\n",i,terms[i].c_str());
    >                 ids_n_terms = ids_n_terms + temp;
    >         }
    >
    > Is the string addition a very expensive operation? (Will a new memory allocation take place?)


    Well, yes. The above loop is bound to get very expensive, because you
    want to append. That means, from time to time (possibly at every
    append!), "allocate more space, copy old string there, append new
    content, free old space".
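    A rough estimate shows why the original loop is so slow. Using the figures from this thread (n of about 500,000 lines and a file of about 5 MB, so b of about 10 bytes per line), and assuming each ids_n_terms = ids_n_terms + temp copies the whole accumulated string at least once into the temporary returned by operator+:

```latex
\text{bytes copied} \;\approx\; \sum_{k=1}^{n} k\,b \;=\; b\,\frac{n(n+1)}{2} \;\approx\; b\,\frac{n^{2}}{2}
\;\approx\; 10 \times \frac{(5 \times 10^{5})^{2}}{2} \;=\; 1.25 \times 10^{12}\ \text{bytes}
```

    That is on the order of a terabyte of memory copying, which plausibly accounts for minutes of run time, whereas appending in place or streaming straight to the file touches each of the roughly 5 MB only a small constant number of times.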

    You should instead just use ofstream, and play with its
    rdbuf()->pubsetbuf to find a sweet spot for speed (it depends on the system, but
    reasonable buffer sizes like 1K-8K (1024 to 8192 bytes) are a good guess). I
    wouldn't even be surprised if you find that the default works just as
    well. E.g.

    ofstream f(whatever);
    char buffer[try sizes here]; // speed to be gained here.
    f.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

    your loop here, but instead of sprintf and +, do

    f << i << " " << terms[i] << endl;

    That does conversion to text for you.
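    A runnable sketch of the pubsetbuf idea, with the call written as rdbuf()->pubsetbuf (rdbuf is a member function). What basic_filebuf does with a user-supplied buffer is implementation-defined, so this sketch sets it before open() and before any I/O, which is the most portable order; the 8192-byte size and the name write_terms_buffered are assumptions to tune or rename.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical writer using a caller-chosen stream buffer size.
// The buffer is declared before the stream so it outlives it.
bool write_terms_buffered(const std::string& path,
                          const std::vector<std::string>& terms) {
    static const std::size_t kBufSize = 8192;  // try sizes here
    char buffer[kBufSize];
    std::ofstream f;
    // Effect is implementation-defined; call before open() and any I/O.
    f.rdbuf()->pubsetbuf(buffer, kBufSize);
    f.open(path.c_str());
    if (!f) return false;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        f << i << ' ' << terms[i] << '\n';  // '\n', not std::endl
    }
    f.close();  // flushes the remaining buffered data
    return !f.fail();
}
```

    Measure against a version without pubsetbuf; as Goran notes, the default buffer may be just as fast.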

    Goran.
    Goran, Sep 20, 2011
    #11
  12. Ian Collins Guest

    On 09/20/11 07:37 PM, Goran wrote:
    > On Sep 20, 5:05 am, sravanreddy001<> wrote:
    >> Hi,
    >>
    >> Can anyone suggest the best (fastest) way to write the contents of a vector to a file?
    >>
    >> Is there a better way to do it than the method below?
    >>
    >> for(unsigned i=0;i<terms.size();i++){
    >> sprintf(temp,"%u %s\n",i,terms[i].c_str());
    >> ids_n_terms = ids_n_terms + temp;
    >> }
    >>
    >> Is the string addition a very expensive operation? (Will a new memory allocation take place?)
    >
    > Well, yes. The above loop is bound to get very expensive, because you
    > want to append. That means, from time to time (possibly at every
    > append!), "allocate more space, copy old string there, append new
    > content, free old space".
    >
    > You should instead just use ofstream, and play with its
    > rdbuf()->pubsetbuf to find a sweet spot for speed (it depends on the system, but
    > reasonable buffer sizes like 1K-8K (1024 to 8192 bytes) are a good guess). I
    > wouldn't even be surprised if you find that the default works just as
    > well. E.g.
    >
    > ofstream f(whatever);
    > char buffer[try sizes here]; // speed to be gained here.
    > f.rdbuf()->pubsetbuf(buffer, sizeof(buffer));
    >
    > your loop here, but instead of sprintf and +, do
    >
    > f << i << " " << terms[i] << endl;


    Flushing after each line rather defeats the effort of fiddling with the
    ofstream's buffer!
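    One way to see Ian's point for yourself is to count flushes. This illustrative sketch (CountingBuf and count_flushes are hypothetical names) uses a stream buffer that discards output but counts sync() calls: std::endl writes '\n' and then flushes the stream, which calls sync() on its buffer, while a plain '\n' does not.

```cpp
#include <ostream>
#include <streambuf>

// Illustrative stream buffer: discards characters, counting them and
// counting how many times a flush reaches sync().
class CountingBuf : public std::streambuf {
public:
    CountingBuf() : chars(0), syncs(0) {}
    int chars;
    int syncs;
protected:
    // No put area is set up, so every character arrives here.
    int_type overflow(int_type ch) {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) ++chars;
        return traits_type::not_eof(ch);
    }
    // Called by ostream::flush() via pubsync().
    int sync() { ++syncs; return 0; }
};

// Writes n lines with std::endl or with '\n' and returns the number
// of flushes the stream buffer saw.
int count_flushes(int n, bool use_endl) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i) {
        if (use_endl) out << i << std::endl;
        else          out << i << '\n';
    }
    return buf.syncs;
}
```

    With a real ofstream each of those flushes hands the buffered data to the OS, which is exactly the per-line cost the buffer was meant to avoid.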

    --
    Ian Collins
    Ian Collins, Sep 20, 2011
    #12
  13. Goran Guest

    On Sep 20, 9:45 am, Ian Collins <> wrote:
    > On 09/20/11 07:37 PM, Goran wrote:
    > > On Sep 20, 5:05 am, sravanreddy001<>  wrote:
    > >> Hi,
    > >>
    > >> Can anyone suggest the best (fastest) way to write the contents of a vector to a file?
    > >>
    > >> Is there a better way to do it than the method below?
    > >>
    > >>          for(unsigned i=0;i<terms.size();i++){
    > >>                  sprintf(temp,"%u %s\n",i,terms[i].c_str());
    > >>                  ids_n_terms = ids_n_terms + temp;
    > >>          }
    > >>
    > >> Is the string addition a very expensive operation? (Will a new memory allocation take place?)
    > >
    > > Well, yes. The above loop is bound to get very expensive, because you
    > > want to append. That means, from time to time (possibly at every
    > > append!), "allocate more space, copy old string there, append new
    > > content, free old space".
    > >
    > > You should instead just use ofstream, and play with its
    > > rdbuf()->pubsetbuf to find a sweet spot for speed (it depends on the system, but
    > > reasonable buffer sizes like 1K-8K (1024 to 8192 bytes) are a good guess). I
    > > wouldn't even be surprised if you find that the default works just as
    > > well. E.g.
    > >
    > > ofstream f(whatever);
    > > char buffer[try sizes here]; // speed to be gained here.
    > > f.rdbuf()->pubsetbuf(buffer, sizeof(buffer));
    > >
    > > your loop here, but instead of sprintf and +, do
    > >
    > >     f << i << " " << terms[i] << endl;
    >
    > Flushing after each line rather defeats the effort of fiddling with the
    > ofstream's buffer!


    Whoops! (blushes)

    Goran.
    Goran, Sep 20, 2011
    #13
  14. Rui Maciel Guest

    sravanreddy001 wrote:

    > Hi,
    >
    > Can anyone suggest the best (fastest) way to write the contents of a
    > vector to a file?
    >
    > Is there a better way to do it than the method below?
    >
    > for(unsigned i=0;i<terms.size();i++){
    > sprintf(temp,"%u %s\n",i,terms[i].c_str());
    > ids_n_terms = ids_n_terms + temp;
    > }
    >
    > Is the string addition a very expensive operation? (Will a new memory
    > allocation take place?)


    Before this loop, have you allocated any memory to your temp c-string?
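    Rui's question matters because sprintf into an unallocated or too-small buffer is undefined behaviour. A minimal sketch of formatting one line safely, assuming C++11 (for std::snprintf and nullptr); the name format_line is hypothetical. It also uses %u, the matching conversion for an unsigned argument.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats one "<id> <term>\n" line into a buffer sized to fit.
std::string format_line(unsigned id, const std::string& term) {
    // With a null buffer and size 0, snprintf only reports the length
    // (excluding the terminating '\0') that the output would need.
    int len = std::snprintf(nullptr, 0, "%u %s\n", id, term.c_str());
    if (len < 0) return std::string();  // formatting error
    std::vector<char> temp(static_cast<std::size_t>(len) + 1);  // +1 for '\0'
    std::snprintf(&temp[0], temp.size(), "%u %s\n", id, term.c_str());
    return std::string(&temp[0], static_cast<std::size_t>(len));
}
```

    In practice, streaming with operator<< as suggested elsewhere in the thread avoids managing this buffer at all.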



    Rui Maciel
    Rui Maciel, Sep 20, 2011
    #14
  15. On Tuesday, September 20, 2011 5:05:44 AM UTC+2, sravanreddy001 wrote:
    > Hi,
    >
    > Can anyone suggest the best (fastest) way to write the contents of a vector to a file?
    >
    > Is there a better way to do it than the method below?
    >
    > for(unsigned i=0;i<terms.size();i++){
    > sprintf(temp,"%u %s\n",i,terms[i].c_str());
    > ids_n_terms = ids_n_terms + temp;
    > }
    >
    > Is the string addition a very expensive operation? (Will a new memory allocation take place?)
    >
    > There are around 5,00,000 (500,000) strings in the vector, and these have to be stored in
    > <id term> format.
    >
    > Thanks,
    > Sravan.


    If you use the standard file stream classes, you get buffered I/O by default. In any non-trivial implementation, the buffer size is chosen to suit the OS you are running on! Moreover, if the file is on a network filesystem, even a single write call may be broken into pieces to be sent over the network.

    It always depends on your context. There's no general solution aside from letting the standard library and the OS do what they know how to do best!

    Cheers,
    Fulvio Esposito
    Fulvio Esposito, Sep 20, 2011
    #15
  16. Wow...

    <Special>Fulvio Esposito, Ian Collins, Goran, Juha Nieminen:</Special>
    Thanks a lot for the valuable inputs. This is really helpful content you have provided.

    I have another question. I'm reading each whole file in a single go, instead of line by line.

    This is more efficient than the alternatives, right?
    Each file is around 5 KB - 10 KB.

    Has anyone here worked on creating forward indexes (information retrieval)?
    sravanreddy001, Sep 20, 2011
    #16
  17. Goran Guest

    On Sep 20, 2:00 pm, sravanreddy001 <> wrote:
    > Wow...
    >
    > <Special>Fulvio Esposito, Ian Collins, Goran, Juha Nieminen:</Special>
    > Thanks a lot for the valuable inputs. This is really helpful content you have provided.
    >
    > I have another question. I'm reading each whole file in a single go, instead of line by line.
    >
    > This is more efficient than the alternatives, right?
    > Each file is around 5 KB - 10 KB.


    You absolutely must measure to know.

    That said, depending on what you're doing with resulting string, even
    this is probably not as efficient as simply reading off the stream and
    "converting" into vector. It's not efficient because you need to read
    all data, then you copy^^^ it into a string, then you parse that
    string. If you simply read into your structure, you avoid the copy^^^.
    Reading itself, due to all sorts of buffering that are happening
    behind your back, won't be faster either way.
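    To make Goran's comparison concrete, here is a minimal sketch of both approaches, assuming each line is "<id> <term>" with a whitespace-free term; the function names are hypothetical. Both return the same vector, but the first holds the whole file in a string and then copies it again into the istringstream before parsing.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// (a) Slurp everything into a string, then parse that string.
std::vector<std::string> load_via_string(std::istream& in) {
    // Extra parentheses avoid the "most vexing parse".
    std::string all((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());  // whole file in memory
    std::istringstream parser(all);                     // copied again here
    std::vector<std::string> terms;
    std::size_t id;
    std::string term;
    while (parser >> id >> term) terms.push_back(term);
    return terms;
}

// (b) Parse straight off the stream into the destination vector.
std::vector<std::string> load_direct(std::istream& in) {
    std::vector<std::string> terms;
    std::size_t id;
    std::string term;
    while (in >> id >> term) terms.push_back(term);
    return terms;
}
```

    For 5 KB - 10 KB files the difference may well be negligible; as Goran says, measure.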

    Goran.
    Goran, Sep 20, 2011
    #17
  18. Hi all,
    thanks a lot for the valuable inputs.

    I followed all of your guidelines.

    Now it's taking just 2 seconds, compared with the initial 10 minutes.

    Thanks a lot. :)
    sravanreddy001, Sep 21, 2011
    #18

Similar Threads
  1. Bruce Lee
    Replies:
    1
    Views:
    418
    Jon Martin Solaas
    Oct 9, 2005
  2. pmatos
    Replies:
    6
    Views:
    23,732
  3. Replies:
    8
    Views:
    1,894
    Csaba
    Feb 18, 2006
  4. Javier
    Replies:
    2
    Views:
    544
    James Kanze
    Sep 4, 2007
  5. Rushikesh Joshi
    Replies:
    0
    Views:
    349
    Rushikesh Joshi
    Jul 10, 2004