Designing Data Interface for Very Large Files [more than GB size]

shailesh kumar

Hi,

I need to design a data interface for accessing very large files
efficiently. The data will be accessed in fixed-size chunks
[perhaps blocks of 16 KB]. My data interface should be able to do a
random seek in the file, as well as sequential access block by
block.

One aspect of the usage of this interface is that there is a good
chance the application will access the same blocks again and again.
Hence, some caching might be needed for an efficient implementation.

I was wondering how such a data interface should be implemented. I
could not find much literature on the issues involved in handling very
large files of GB size. I am also wondering whether the C++ fstream
classes are suitable for this problem or not.

Can somebody help me with some information about how to tackle this
problem? Or some pointers to where relevant information can be found?

Thanks and regards
Shailesh Kumar
 

Peter van Merkerk

shailesh kumar wrote:
I need to design a data interface for accessing very large files
efficiently. The data will be accessed in fixed-size chunks
[perhaps blocks of 16 KB]. My data interface should be able to do a
random seek in the file, as well as sequential access block by
block.

One aspect of the usage of this interface is that there is a good
chance the application will access the same blocks again and again.
Hence, some caching might be needed for an efficient implementation.

Chances are that your OS and/or the implementation of the standard
library already does some caching. Personally I would not implement
caching right away, but design the interface in such a way that caching
can be added transparently later if the need arises.
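To illustrate that approach, here is a minimal sketch (the names BlockSource
and CachingBlockSource are made up for illustration) of an interface that
hands out fixed-size blocks and lets a caching layer be slipped in later
without changing any calling code:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// "BlockSource" and "CachingBlockSource" are made-up names for this sketch.
class BlockSource {
public:
    static constexpr std::size_t kBlockSize = 16 * 1024;  // fixed 16 KB blocks
    virtual ~BlockSource() {}
    // Read block 'index' into 'buffer' (at least kBlockSize bytes);
    // returns the number of bytes actually read (short only at end of file).
    virtual std::size_t readBlock(std::uint64_t index, char* buffer) = 0;
};

// Decorator that adds caching without changing the interface callers use.
// A real version would bound the cache (e.g. LRU eviction); omitted here.
class CachingBlockSource : public BlockSource {
public:
    explicit CachingBlockSource(BlockSource& underlying) : source_(underlying) {}

    std::size_t readBlock(std::uint64_t index, char* buffer) override {
        auto it = cache_.find(index);
        if (it == cache_.end()) {
            std::vector<char> block(kBlockSize);
            block.resize(source_.readBlock(index, block.data()));
            it = cache_.emplace(index, std::move(block)).first;
        }
        std::copy(it->second.begin(), it->second.end(), buffer);
        return it->second.size();
    }

private:
    BlockSource& source_;
    std::map<std::uint64_t, std::vector<char>> cache_;
};

The application only ever sees a BlockSource; whether a CachingBlockSource, a
smarter LRU cache, or nothing but the OS page cache sits behind it is an
implementation detail that can change later.
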
shailesh kumar wrote:
I was wondering how such a data interface should be implemented. I
could not find much literature on the issues involved in handling very
large files of GB size. I am also wondering whether the C++ fstream
classes are suitable for this problem or not.

One potential problem you may run into is that the data type used for
file positioning isn't large enough. When dealing with files larger
than 2 or 4 GB this may very well be a problem. AFAIK there is no
portable solution that is guaranteed to be able to randomly access all
data in files larger than the sizes mentioned above.
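
One quick diagnostic (not a guarantee that the fstream implementation actually
handles large files correctly, just a necessary condition) is to check how
wide std::streamoff is in your standard library:

#include <ios>       // std::streamoff
#include <iostream>

int main()
{
    // If std::streamoff is only 32 bits wide, an fstream seek beyond 2 GB
    // cannot even be expressed, let alone performed.
    std::cout << "std::streamoff is " << sizeof(std::streamoff) * 8
              << " bits wide\n";
}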
 

amit gulati

This is just a suggestion: if you are using Windows, then use
memory-mapped files to open large files rather than iostreams.
Memory-mapped file access is fast, and the load time will also be less
than with iostreams.

And as someone already said, on 32-bit systems it is not possible to
load a file of size more than 4 GB.
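
For what it is worth, a bare-bones sketch of mapping one 16 KB block of an
already opened file with the Win32 API might look like this (MapOneBlock is a
made-up helper name, and error handling is trimmed); note that MapViewOfFile()
requires the offset to be a multiple of the system allocation granularity
(typically 64 KB), so a real implementation would round the offset down and
adjust:

#include <windows.h>

// Sketch only: map the 16 KB block starting at 'offset' (assumed to be a
// multiple of the allocation granularity) of an already opened file.
const void* MapOneBlock(HANDLE file, unsigned long long offset)
{
    HANDLE mapping = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL)
        return NULL;

    // The 64-bit offset is passed as two 32-bit halves.
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                     (DWORD)(offset >> 32),
                                     (DWORD)(offset & 0xFFFFFFFFu),
                                     16 * 1024);
    CloseHandle(mapping);   // the view keeps the mapping object alive
    return view;            // release later with UnmapViewOfFile(view)
}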
 

shailesh kumar

amit gulati said:
This is just a suggestion: if you are using Windows, then use
memory-mapped files to open large files rather than iostreams.
Memory-mapped file access is fast, and the load time will also be less
than with iostreams.

I am developing on Windows only, but the code is going to be portable;
hence I am not sure whether using these memory-mapped files would be a
good idea.

amit gulati said:
And as someone already said, on 32-bit systems it is not possible to
load a file of size more than 4 GB.

This is exactly one of my concerns... Does the Visual C++ compiler have
some support for 64-bit file access? Or, in general, which file systems
really support such a thing?

regards
shailesh
 

amit gulati

shailesh said:
I am developing on Windows only, but the code is going to be portable;
hence I am not sure whether using these memory-mapped files would be a
good idea.

I think Linux has some sort of file memory-mapping mechanism; you can
use #ifdef and #define to separate the machine-dependent code.

shailesh said:
This is exactly one of my concerns... Does the Visual C++ compiler have
some support for 64-bit file access? Or, in general, which file systems
really support such a thing?

You need a 64-bit processor and operating system. Windows has not come
out with a 64-bit operating system for x86; AMD recently came out with
a 64-bit x86-based processor.
 

Sandeep Pulla

shailesh kumar wrote:
I am developing on Windows only, but the code is going to be portable;
hence I am not sure whether using these memory-mapped files would be a
good idea.

AFAIK memory-mapped files are available on Linux (mmap()) and perhaps
on other Unix variants (?). The usage semantics are quite similar to
Windows, and therefore you can create a thin abstraction layer between
the two.
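
A sketch of the POSIX side (map_one_block is a made-up helper, mirroring the
Win32 example earlier in the thread) shows how similar the two really are;
the offset again has to be suitably aligned (page-aligned here), and on
32-bit Linux you would compile with -D_FILE_OFFSET_BITS=64 so that off_t is
64 bits:

#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <unistd.h>     // close

// Sketch only: map the 16 KB block starting at 'offset' (assumed to be
// page-aligned) of the file at 'path' for reading.
const void* map_one_block(const char* path, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;

    void* view = mmap(0, 16 * 1024, PROT_READ, MAP_PRIVATE, fd, offset);
    close(fd);              // the mapping stays valid after close()
    return view == MAP_FAILED ? 0 : view;   // munmap(view, 16 * 1024) later
}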

Sandeep
 

Peter van Merkerk

shailesh kumar wrote:
This is exactly one of my concerns... Does the Visual C++ compiler have
some support for 64-bit file access?

File access is not a compiler issue, but a library and OS API issue.
The Win32 API does have support for 64-bit file access.

amit gulati wrote:
You need a 64-bit processor and operating system. Windows has not come
out with a 64-bit operating system for x86; AMD recently came out with
a 64-bit x86-based processor.

That is not true. The maximum file size an OS can handle is not related
to whether it runs on a 32-bit or 64-bit processor. Just as the good
old 16-bit OSes could handle files larger than 64 KB, 32-bit Windows
(and many other 32-bit OSes for that matter) can handle files larger
than 4 GB. If you look, for example, at the Win32 API function
SetFilePointer(), you will see that it takes the position as two 32-bit
signed values, so it can potentially address 2^63 bytes. More than
seven years ago I wrote software for the Windows NT platform that
handled files larger than 4 GB. Only if you intend to load/map the
complete file into memory would a 64-bit OS come in handy.
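
To illustrate, seeking past the 4 GB mark and reading one block with
SetFilePointer() on 32-bit Windows looks roughly like this (ReadBlockAt is a
made-up helper); the 64-bit position is simply split into its two 32-bit
halves, and newer SDKs also offer SetFilePointerEx(), which takes a
LARGE_INTEGER directly:

#include <windows.h>

// Sketch: position the file pointer at an absolute 64-bit offset and read
// one 16 KB block. Works on 32-bit Windows; no 64-bit CPU required.
BOOL ReadBlockAt(HANDLE file, LONGLONG offset, void* buffer, DWORD* bytesRead)
{
    LARGE_INTEGER pos;
    pos.QuadPart = offset;              // e.g. 6 GB = 6LL * 1024 * 1024 * 1024

    LONG high = pos.HighPart;           // high 32 bits, passed by pointer
    DWORD low = SetFilePointer(file, (LONG)pos.LowPart, &high, FILE_BEGIN);
    if (low == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR)
        return FALSE;

    return ReadFile(file, buffer, 16 * 1024, bytesRead, NULL);
}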

The real problem is that there is no standard way that is guaranteed to
work with large (>2 GB) files. You will have to use platform-specific
functions for that. If you intend to port your software to another
platform, it is best to write one or more wrappers around those
platform-specific function calls; then you will only have to rewrite
those wrappers when you port. As long as there are no platform- or
compiler-specific things in the interface of the wrappers, porting to
another platform should be relatively straightforward. When designing
the wrappers' interface it might be wise to look at various OS APIs to
see if there is a common denominator. I also recommend looking for
cross-platform libraries that wrap the OS API; that can save you a lot
of work. However, if reading 16 KB chunks is all you are ever going to
need, I think the interface can be very straightforward and simple.
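
As a sketch of such a wrapper (seek64 and readBlock are made-up names,
assuming MSVC's _fseeki64() on Windows and POSIX fseeko() elsewhere, compiled
with -D_FILE_OFFSET_BITS=64), the platform-specific part can be confined to a
single small function:

#include <cstddef>
#include <cstdio>

// The only platform-specific code: a 64-bit absolute seek on a FILE*.
// On POSIX systems, compile with -D_FILE_OFFSET_BITS=64 so off_t is 64 bits.
inline int seek64(std::FILE* f, long long offset)
{
#if defined(_MSC_VER)
    return _fseeki64(f, offset, SEEK_SET);
#else
    return fseeko(f, (off_t)offset, SEEK_SET);
#endif
}

// Reading the 16 KB block with index 'blockIndex' then needs no
// platform-specific code at all.
std::size_t readBlock(std::FILE* f, unsigned long long blockIndex, char* buffer)
{
    const long long kBlockSize = 16 * 1024;
    if (seek64(f, (long long)blockIndex * kBlockSize) != 0)
        return 0;
    return std::fread(buffer, 1, (std::size_t)kBlockSize, f);
}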
 
