file streams


David Breton

Hi,

I have a few questions about ifstream and file i/o in general.

Say I open a file stream with this command:
ifstream in;
in.open("file.dat", ios::in|ios::binary);

1) Is any part of file.dat loaded in memory at this point?

If I now do:
in.seekg(sizeof(long)*1000000);
in.read((char*)l, sizeof(long));

2) Roughly how much of the file is in, or went through, memory?

3) What is the time complexity of in.seekg(n)?
Is it O(1), O(n), something else, or does it depend on the hardware?

Also if anyone could point me to a url where I can find answers to such
questions, it would be appreciated.

If you want to know, I'm asking because I have a large file, about
200GB, which contains nothing but longs. Given a number 'k' I want to
efficiently retrieve the k_th long stored in the file.

Thanks,

David
 

Victor Bazarov

David said:
I have a few questions about ifstream and file i/o in general.

Say I open a file stream with this command:
ifstream in;
in.open("file.dat", ios::in|ios::binary);

1) Is any part of file.dat loaded in memory at this point?

Unknown. Probably specific to your OS, and to what kind of "file" it is.

David said:
If I now do:
in.seekg(sizeof(long)*1000000);
in.read((char*)l, sizeof(long));

2) Roughly how much of the file is in, or went through, memory?

Again, who can tell? Unless your OS manual specifies it, it's unknown.

David said:
3) What is the time complexity of in.seekg(n)?
Is it O(1), O(n), something else, or does it depend on the hardware?

What's the meaning of your file? If it's a tape, it can't be O(1), can
it? When you're at the beginning, seeking to the beginning is
instantaneous, right? Seeking anywhere is proportional to the distance
between that place and wherever your stream "cursor" is.

David said:
Also if anyone could point me to a url where I can find answers to such
questions, it would be appreciated.

Try asking in the newsgroup for your OS.

David said:
If you want to know, I'm asking because I have a large file, about
200GB, which contains nothing but longs. Given a number 'k' I want to
efficiently retrieve the k_th long stored in the file.

It doesn't make sense to do it inefficiently, with that I can agree. It
is most likely more efficient to do 'seek' than to read all data up to
the one you need into a dummy buffer. But for some devices that's
probably the only way to get to the data... C++ does not specify how
seeking in a stream is done.

V
 

James Kanze

David said:
I have a few questions about ifstream and file i/o in general.
Say I open a file stream with this command:
ifstream in;
in.open("file.dat", ios::in|ios::binary);
1) Is any part of file.dat loaded in memory at this point?

Totally unspecified. It depends on the implementation of the
library, and on the OS. None of the widespread library
implementations I know read data from the system before you try
to extract it, but I think a lot of systems do start reading the
file immediately after the open, so that future reads will go to
a system buffer, rather than having to wait for disk.

Of course, a library implementation is free to handle this any
way it sees fit, and I think some experimental library
implementations will immediately mmap the file, so all of it is
conceptually in memory, but won't physically be in memory until
you try to access it. (I would expect such implementations to
become more frequent on 64 bit processors, where the library can
afford to use up large blocks of address space.)

David said:
If I now do:
in.seekg(sizeof(long)*1000000);
in.read((char*)l, sizeof(long));
2) Roughly how much of the file is in, or went through, memory?

Technically, it's unspecified, and totally system dependent.
Practically, you'll get the sector (or sectors) you're reading
from, plus any pre-reads the OS did on open, and nothing else.
Unless, of course, the library implementation is using mmap or
its equivalent under Windows, in which case: define what you
mean by "in, or went through, memory".

David said:
3) What is the time complexity of in.seekg(n)?
Is it O(1), O(n), something else, or does it depend on the hardware?

The standard doesn't say, but typically, it's O(1).
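As a rough illustration of the seek-then-read approach on an ordinary disk file, something like the following could be used. This is only a sketch: the function name read_kth_long is invented here, k is taken to be 0-based, and the file is assumed to hold raw longs written on the same platform. The offset is computed as a std::streamoff so that a position far into a 200GB file does not overflow a narrower integer type (whether std::streamoff is wide enough is itself implementation-dependent, though it is 64 bits on the common platforms).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not from the thread): read the k-th long (0-based)
// from a binary file of raw longs. Returns false if the file cannot be
// opened, the seek fails, or the file is too short.
bool read_kth_long(const std::string& path, std::size_t k, long& out)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return false;
    }

    // Byte offset of element k, computed in the stream's own offset type.
    const std::streamoff offset =
        static_cast<std::streamoff>(k) *
        static_cast<std::streamoff>(sizeof(long));
    in.seekg(offset, std::ios::beg);

    long value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    if (!in) {
        return false;  // failed seek or short read
    }
    out = value;
    return true;
}
```

Note the address-of in the read call: the destination is the long itself, not a pointer value reinterpreted as a buffer. For repeated lookups it would make more sense to keep one stream open rather than reopening the file on every call.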
David said:
Also if anyone could point me to a url where I can find
answers to such questions, it would be appreciated.
If you want to know, I'm asking because I have a large file,
about 200GB, which contains nothing but longs. Given a number
'k' I want to efficiently retrieve the k_th long stored in the
file.

If you're on a 64 bit machine, just mmap it; if the file
contains a memory image (of long or whatever), then there's no
point in going through streams. (Be aware, too, that you may
not be able to reread the file if you recompile your code with
a different compiler or for a different platform, since the size
and byte order of long can change.)
 

James Kanze

Victor Bazarov said:
[...]
What's the meaning of your file? If it's a tape, it can't be
O(1), can it? When you're at the beginning, seeking to the
beginning is instantaneous, right? Seeking anywhere is
proportional to the distance between that place and wherever
your stream "cursor" is.

Just a nit (because I basically agree with what you're saying),
but on the tape drives I remember (back in the 1970s), seeking
was O(n), but it was an order of magnitude or more faster than
reading.

And of course, if you're asking what he means by file :) (and
from a C++ point of view, it's a very valid question), not all
"files" support seeking. How long does it take to seek ahead
100000 characters on a keyboard?

Victor Bazarov said:
Try asking in the newsgroup for your OS.

It doesn't make sense to do it inefficiently, with that I can
agree. It is most likely more efficient to do 'seek' than to
read all data up to the one you need into a dummy buffer. But
for some devices that's probably the only way to get to the
data... C++ does not specify how seeking in a stream is done.

If he's on a 32 bit machine, and using a lot of memory
otherwise, he may not even be able to fit the file entirely in
memory.

But for this sort of thing, iostream is the wrong tool. It can
be made to work, and probably isn't that inefficient, but mmap
or its Windows equivalent seems to be more appropriate, provided
he can map the entire file: it's not only more efficient, but
the abstraction is a lot closer to what he's doing, and it will
be easier to use and understand.
 
