file streams

Discussion in 'C++' started by David Breton, Oct 13, 2009.

  1. David Breton

    David Breton Guest


    I have a few questions about ifstream and file i/o in general.

    Say I open a file stream with this command:
    ifstream in; in.open("file.dat", ios::in|ios::binary);

    1) Is any part of file.dat loaded in memory at this point?

    If I now do:
    long l;
    in.seekg(sizeof(long)*1000000);
    in.read((char*)&l, sizeof(long));

    2) Roughly how much of the file is in, or went through, memory?

    3) What is the time complexity of in.seekg(n)?
    Is it O(1), O(n), something else, or does it depend on the hardware?

    Also, if anyone could point me to a URL where I can find answers to such
    questions, it would be appreciated.

    If you want to know, I'm asking because I have a large file, about
    200GB, which contains nothing but longs. Given a number 'k', I want to
    efficiently retrieve the k-th long stored in the file.


    David Breton, Oct 13, 2009

  2. Unknown. Probably specific to your OS. And to the meaning of the file.

    Again, who can tell? Unless your OS manual specifies it, it's unknown.

    What's the meaning of your file? If it's a tape, it can't be O(1), can
    it? When you're at the beginning, seeking to the beginning is
    instantaneous, right? On such a device, the time to seek anywhere is
    proportional to the distance between that place and wherever your
    stream "cursor" is.

    Try asking in the newsgroup for your OS.

    It doesn't make sense to do it inefficiently, with that I can agree. It
    is most likely more efficient to do 'seek' than to read all data up to
    the one you need into a dummy buffer. But for some devices that's
    probably the only way to get to the data... C++ does not specify how
    seeking in a stream is done.
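
    For illustration only, here is a minimal sketch of the seek-then-read
    approach for fetching the k-th long. The helper name read_kth_long and
    its bool/out-parameter interface are assumptions made for this example,
    not anything from the thread. It also assumes the file holds raw longs
    written by a program with the same sizeof(long) and byte order, and that
    std::streamoff is wide enough for the file (typically 64 bits on current
    platforms, but that is implementation-defined).

```cpp
#include <cassert>
#include <fstream>
#include <limits>
#include <string>

// Hypothetical helper (not from the thread): reads the k-th (0-based) long
// from a file of raw longs. Returns false if the open, seek, or read fails.
bool read_kth_long(const std::string& path, unsigned long long k, long& out)
{
    // Precondition: the byte offset must be representable as std::streamoff.
    assert(k <= static_cast<unsigned long long>(
                    std::numeric_limits<std::streamoff>::max()) / sizeof(long));

    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;

    // Compute the offset in std::streamoff so it is not truncated to int.
    const std::streamoff offset =
        static_cast<std::streamoff>(k) * static_cast<std::streamoff>(sizeof(long));
    in.seekg(offset, std::ios::beg);

    long value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    if (!in) return false;  // seek failed, or fewer than sizeof(long) bytes were left

    out = value;
    return true;
}
```

    A caller would write something like `long v; if (read_kth_long("file.dat", k, v)) ...`.
    For many lookups it would probably be better to keep one stream open
    rather than reopening the file on every call.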

    Victor Bazarov, Oct 13, 2009

  3. James Kanze

    James Kanze Guest

    Totally unspecified. It depends on the implementation of the
    library, and on the OS. None of the widespread library
    implementations I know read data from the system before you try
    to extract it, but I think a lot of systems do start reading the
    file immediately after the open, so that future reads will go to
    a system buffer, rather than having to wait for disk.

    Of course, a library implementation is free to handle this any
    way it sees fit, and I think some experimental library
    implementations will immediately mmap the file, so all of it is
    conceptually in memory, but won't physically be in memory until
    you try to access it. (I would expect such implementations to
    become more frequent on 64 bit processors, where the library can
    afford to use up large blocks of address space.)

    Technically, it's unspecified, and totally system dependent.
    Practically, you'll get the sector (or sectors) you're reading
    from, plus any pre-reads the OS did on open, and nothing else.
    Unless, of course, the library implementation is using mmap or
    its equivalent under Windows, in which case: define what you
    mean by "in, or went through, memory".

    The standard doesn't say, but typically, it's O(1).

    If you're on a 64 bit machine, just mmap it; if the file
    contains a memory image (of long or whatever), then there's no
    point in going through streams. (Be aware, too, that you may
    not be able to reread the file if you recompile your code, since
    the size and representation of long can change with the compiler,
    its options, or the platform.)
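
    As a rough sketch of the mmap suggestion, assuming a POSIX system (the
    Windows equivalent is not shown): the class name MappedLongs and its
    interface are made up for this example. It maps the whole file
    read-only and indexes it as an array of long, which only works if the
    process has enough address space for the file (hence the 64 bit
    remark) and if the file really is a memory image of longs in this
    program's representation.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical wrapper (not from the thread): read-only view of a file of raw longs.
class MappedLongs {
public:
    explicit MappedLongs(const char* path) : data_(nullptr), count_(0), bytes_(0)
    {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;

        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t bytes = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const long*>(p);  // mappings are page-aligned
                bytes_ = bytes;
                count_ = bytes / sizeof(long);
            }
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }

    ~MappedLongs()
    {
        if (data_) ::munmap(const_cast<long*>(data_), bytes_);
    }

    MappedLongs(const MappedLongs&) = delete;
    MappedLongs& operator=(const MappedLongs&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t size() const { return count_; }

    // Pages are brought in by the OS on first access, not at map time.
    long operator[](std::size_t k) const
    {
        assert(k < count_);
        return data_[k];
    }

private:
    const long* data_;
    std::size_t count_;
    std::size_t bytes_;
};
```

    With this, retrieving the k-th long is just `MappedLongs m("file.dat"); long v = m[k];`
    after checking m.ok() and k < m.size().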
    James Kanze, Oct 14, 2009
  4. James Kanze

    James Kanze Guest

    Just a nit (because I basically agree with what you're saying),
    but on the tape drives I remember (back in the 1970's), seeking
    was O(n), but it was an order of magnitude or more faster than

    And of course, if you're asking what he means by file :) (and
    from a C++ point of view, it's a very valid question), not all
    "files" support seeking. How long does it take to seek ahead
    100000 characters on a keyboard?
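
    To make the point about non-seekable "files" concrete, here is a small
    sketch that seeks when the stream supports it and otherwise falls back
    to reading and discarding characters. Both helper names (is_seekable,
    skip_forward) are invented for this example, and using tellg() as the
    test is only a heuristic: a stream that reports a position will usually
    accept a seek, but nothing in the standard promises that for every
    device.

```cpp
#include <cassert>
#include <istream>

// Hypothetical helper: true if the stream can report its current position,
// which is a reasonable (not infallible) hint that seekg() will work.
bool is_seekable(std::istream& in)
{
    return in.good() && in.tellg() != std::istream::pos_type(-1);
}

// Hypothetical helper: advance the stream by n characters. Seeks when the
// stream looks seekable, otherwise reads and discards n characters.
// Returns false if the stream could not be advanced by the full amount.
bool skip_forward(std::istream& in, std::streamsize n)
{
    assert(n >= 0);  // forward skips only
    if (is_seekable(in)) {
        in.seekg(n, std::ios::cur);
        return !in.fail();
    }
    in.ignore(n);             // the slow path: proportional to n
    return in.gcount() == n;  // fewer characters means end of input was hit
}
```

    On a regular disk file the first branch is normally taken; on a pipe
    or a terminal the second one is, and the cost grows with n.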
    If he's on a 32 bit machine, and using a lot of memory
    otherwise, he may not even be able to fit the file entirely in

    But for this sort of thing, iostream is the wrong tool. It can
    be made to work, and probably isn't that inefficient, but mmap
    or its Windows equivalent seems to be more appropriate, provided
    he can map the entire file: it's not only more efficient, but
    the abstraction is a lot closer to what he's doing, and it will
    be easier to use and understand.
    James Kanze, Oct 14, 2009
