file streams

Discussion in 'C++' started by David Breton, Oct 13, 2009.

  1. David Breton

    David Breton Guest

    Hi,

    I have a few questions about ifstream and file i/o in general.

    Say I open a file stream with this command:
    ifstream in;
    in.open("file.dat", ios::in|ios::binary);

    1) Is any part of file.dat loaded in memory at this point?

    If I now do:
    in.seekg(sizeof(long)*1000000);
    in.read((char*)l, sizeof(long));

    2) Roughly how much of the file is in, or went through, memory?

    3) What is the time complexity of in.seekg(n)?
    Is it O(1), O(n), something else, or does it depend on the hardware?

    Also if anyone could point me to a url where I can find answers to such
    questions, it would be appreciated.

    If you want to know, I'm asking because I have a large file, about
    200GB, which contains nothing but longs. Given a number 'k' I want to
    efficiently retrieve the k_th long stored in the file.

    Thanks,

    David
    --
    When in doubt, use brute force.
    -- Ken Thompson
     
    David Breton, Oct 13, 2009
    #1

  2. Victor Bazarov

    David Breton wrote:
    > I have a few questions about ifstream and file i/o in general.
    >
    > Say I open a file stream with this command:
    > ifstream in;
    > in.open("file.dat", ios::in|ios::binary);
    >
    > 1) Is any part of file.dat loaded in memory at this point?


    Unknown. It is probably specific to your OS, and to what kind of "file" it is.

    > If I now do:
    > in.seekg(sizeof(long)*1000000);
    > in.read((char*)l, sizeof(long));
    >
    > 2) Roughly how much of the file is in, or went through, memory?


    Again, who can tell? Unless your OS manual specifies it, it's unknown.

    > 3) What is the time complexity of in.seekg(n)?
    > Is it O(1), O(n), something else, or does it depend on the hardware?


    What's the meaning of your file? If it's a tape, it can't be O(1), can
    it? When you're at the beginning, seeking to the beginning is
    instantaneous, right? Seeking anywhere is proportional to the distance
    between that place and wherever your stream "cursor" is.

    > Also if anyone could point me to a url where I can find answers to such
    > questions, it would be appreciated.


    Try asking in the newsgroup for your OS.

    > If you want to know, I'm asking because I have a large file, about
    > 200GB, which contains nothing but longs. Given a number 'k' I want to
    > efficiently retrieve the k_th long stored in the file.


    It doesn't make sense to do it inefficiently, with that I can agree. It
    is most likely more efficient to do 'seek' than to read all data up to
    the one you need into a dummy buffer. But for some devices that's
    probably the only way to get to the data... C++ does not specify how
    seeking in a stream is done.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Oct 13, 2009
    #2

  3. James Kanze

    James Kanze Guest

    On Oct 13, 9:03 pm, David Breton wrote:

    > I have a few questions about ifstream and file i/o in general.


    > Say I open a file stream with this command:
    > ifstream in;
    > in.open("file.dat", ios::in|ios::binary);


    > 1) Is any part of file.dat loaded in memory at this point?


    Totally unspecified. It depends on the implementation of the
    library, and on the OS. None of the widespread library
    implementations I know read data from the system before you try
    to extract it, but I think a lot of systems do start reading the
    file immediately after the open, so that future reads will go to
    a system buffer, rather than having to wait for disk.

    Of course, a library implementation is free to handle this any
    way it sees fit, and I think some experimental library
    implementations will immediately mmap the file, so all of it is
    conceptually in memory, but won't physically be in memory until
    you try to access it. (I would expect such implementations to
    become more frequent on 64-bit processors, where the library can
    afford to use up large blocks of address space.)

    > If I now do:
    > in.seekg(sizeof(long)*1000000);
    > in.read((char*)l, sizeof(long));


    > 2) Roughly how much of the file is in, or went through, memory?


    Technically, it's unspecified, and totally system dependent.
    Practically, you'll get the sector (or sectors) you're reading
    from, plus any pre-reads the OS did on open, and nothing else.
    Unless, of course, the library implementation is using mmap or
    its equivalent under Windows, in which case: define what you
    mean by "in, or went through, memory".

    > 3) What is the time complexity of in.seekg(n)?
    > Is it O(1), O(n), something else, or does it depend on the hardware?


    The standard doesn't say, but for an ordinary disk file it's
    typically O(1).
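
    A minimal sketch of the seek-then-read approach from the question,
    assuming the file is a raw image of native longs and that k is
    0-based. The helper name read_kth_long and its signature are
    hypothetical, not something from the thread. The offset is computed
    in std::streamoff so that it does not overflow a narrower integer
    type on a very large file.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: fetches the k-th long (0-based) from a file that
// holds nothing but raw, native-format longs.
// Returns false if the file cannot be opened or holds fewer than k+1 longs.
bool read_kth_long(const std::string& path, std::streamoff k, long& value)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        return false;

    // Byte offset of element k, computed in the stream's own offset type.
    const std::streamoff offset = k * static_cast<std::streamoff>(sizeof(long));
    in.seekg(offset, std::ios::beg);

    long tmp = 0;
    in.read(reinterpret_cast<char*>(&tmp), sizeof tmp);
    if (!in)            // seek failed, or fewer than sizeof(long) bytes were left
        return false;

    value = tmp;
    return true;
}
```

    Only the seek and one small read are requested here; how much the
    library and the OS actually buffer behind that call remains
    implementation-specific, as said above.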

    > Also if anyone could point me to a url where I can find
    > answers to such questions, it would be appreciated.


    > If you want to know, I'm asking because I have a large file,
    > about 200GB, which contains nothing but longs. Given a number
    > 'k' I want to efficiently retrieve the k_th long stored in the
    > file.


    If you're on a 64-bit machine, just mmap it; if the file
    contains a memory image (of long or whatever), then there's no
    point in going through streams. (Be aware, too, that you may
    not be able to reread the file if you recompile your code, since
    the size and byte order of long depend on the compiler and
    platform.)
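
    The warning about rereading the file comes down to the layout of
    long being implementation-defined. One way to sidestep it is to fix
    the on-disk format and decode it explicitly. The sketch below
    assumes, purely for illustration, that each value is stored as 8
    bytes, little-endian, two's complement; the thread does not say
    what the real file contains. The name decode_le64 is hypothetical.

```cpp
#include <cstdint>

// Assumed on-disk format (not stated in the thread): each value occupies
// 8 bytes, least significant byte first, two's complement.
// Decodes one such value regardless of the host's sizeof(long) or byte order.
inline std::int64_t decode_le64(const unsigned char* p)
{
    std::uint64_t u = 0;
    for (int i = 7; i >= 0; --i)
        u = (u << 8) | p[i];   // start at the most significant byte, p[7]

    // Conversion of out-of-range values is implementation-defined before
    // C++20; on two's complement targets it yields the expected negative.
    return static_cast<std::int64_t>(u);
}
```

    The same function works on bytes obtained from a stream read or
    from a mapped region, so the decoding choice is independent of how
    the bytes reach memory.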

    --
    James Kanze
     
    James Kanze, Oct 14, 2009
    #3
  4. James Kanze

    James Kanze Guest

    On Oct 13, 9:18 pm, Victor Bazarov wrote:

    [...]
    > > 3) What is the time complexity of in.seekg(n)?
    > > Is it O(1), O(n), something else, or does it depend on the hardware?


    > What's the meaning of your file? If it's a tape, it can't be
    > O(1), can it? When you're at the beginning, seeking to the
    > beginning is instantaneous, right? Seeking anywhere is
    > proportional to the distance between that place and wherever
    > your stream "cursor" is.


    Just a nit (because I basically agree with what you're saying),
    but on the tape drives I remember (back in the 1970s), seeking
    was O(n), but it was an order of magnitude or more faster than
    reading.

    And of course, if you're asking what he means by file:) (and
    from a C++ point of view, it's a very valid question), not all
    "files" support seeking. How long does it take to seek ahead
    100000 characters on a keyboard?

    > > Also if anyone could point me to a url where I can find
    > > answers to such questions, it would be appreciated.


    > Try asking in the newsgroup for your OS.


    > > If you want to know, I'm asking because I have a large file,
    > > about 200GB, which contains nothing but longs. Given a
    > > number 'k' I want to efficiently retrieve the k_th long
    > > stored in the file.


    > It doesn't make sense to do it inefficiently, with that I can
    > agree. It is most likely more efficient to do 'seek' than to
    > read all data up to the one you need into a dummy buffer. But
    > for some devices that's probably the only way to get to the
    > data... C++ does not specify how seeking in a stream is done.


    If he's on a 32-bit machine, and using a lot of memory
    otherwise, he may not even be able to fit the file entirely in
    memory.

    But for this sort of thing, iostream is the wrong tool. It can
    be made to work, and probably isn't that inefficient, but mmap
    or its Windows equivalent seems to be more appropriate, provided
    he can map the entire file: it's not only more efficient, but
    the abstraction is a lot closer to what he's doing, and it will
    be easier to use and understand.
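
    A rough sketch of what the mmap approach could look like, assuming
    a POSIX system, a 64-bit address space, and a file that is a raw
    image of native longs. The class name MappedLongs and its interface
    are hypothetical; the Windows equivalent would use different calls
    and is not shown.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a file of raw native longs, backed by
// POSIX mmap. Assumes the whole file fits in the address space.
class MappedLongs {
public:
    explicit MappedLongs(const char* path) : base_(MAP_FAILED), bytes_(0)
    {
        const int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            bytes_ = static_cast<std::size_t>(st.st_size);
            // Maps the file into the address space; pages are only brought
            // into memory when they are first touched.
            base_ = mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedLongs()
    {
        if (base_ != MAP_FAILED)
            munmap(base_, bytes_);
    }

    MappedLongs(const MappedLongs&) = delete;
    MappedLongs& operator=(const MappedLongs&) = delete;

    bool ok() const { return base_ != MAP_FAILED; }

    // Number of complete longs in the file (0 if the mapping failed).
    std::size_t size() const { return ok() ? bytes_ / sizeof(long) : 0; }

    // Unchecked access: the caller must ensure ok() and k < size().
    long operator[](std::size_t k) const
    {
        return static_cast<const long*>(base_)[k];
    }

private:
    void* base_;
    std::size_t bytes_;
};
```

    With such a view the lookup becomes plain indexing, for example
    MappedLongs data("file.dat"); followed by data[k] once ok() and
    k < data.size() have been checked.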

    --
    James Kanze
     
    James Kanze, Oct 14, 2009
    #4