iostream and files larger than 4GB

Robert Kochem

Hi,

I am relatively new to C++ and its functions and libraries. I need to
access files larger than 4GB, which is AFAIK not possible with the standard
iostream classes - at least not with a 32-bit compiler. iostream was my
favourite because my code has to work on files as well as on memory buffers...

Could somebody please tell me which functions/classes are best suited in
this case?

BTW: I am currently using Visual C++ 2008 on Win32, but if possible I want
to keep my code as portable as possible.

Robert
 
Robert Kochem

Victor said:
Have you actually tried and failed, or is that only your speculation?

If you get a "possible loss of data" warning when feeding seekg() a
64-bit integer - what would you expect?
Victor said:
AFAIK, even standard C Library functions like fread and fseek should
work with large files. And since C++ I/O streams are relatively thin
wrappers around C streams, those are expected to work just as well.
Write a program, see if you get it to work; if not, post your code and
explain the situation.

It may work for files, but can I also use it on memory streams?

Robert
 
Ron AF Greve

Hi,

Most 32-bit OS'es have a limitation of 2GB (maybe 4GB, I am not sure about
that) per file. If the OS can handle both, like for instance a Sun V440 or
64-bit Ubuntu or lots of others, you can compile with -m64 with gcc (and link
against 64-bit versions of all libraries) and usually get the full 64-bit
range. Sometimes you have to add something like LARGE_FILE_SUPPORT - that's
from the top of my memory.

I haven't tried it, but maybe the same applies to 64-bit MS-Windows.


Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
Robert Kochem

Victor said:
I expect not to use seekg then. Or switch to a better implementation of
the library.

That is easy to say - but what else should I use?
I don't know what those are, sorry.

Maybe that was not the correct term in the C++ realm: I need an
abstraction of the underlying data source. My code has to work on files
as well as on memory buffers, and I call a stream that uses a memory
buffer as its source a memory stream.

Robert
 
Robert Kochem

Ron said:
Most 32 bit OS'es have a limitation of 2GB (maybe 4GB I am not sure about
that) per file.

Sorry, but I can't believe that. Do you really mean that e.g. a 32-bit
Linux filesystem cannot handle files larger than 4GB?

Robert
 
Ron AF Greve

Hi,

In the past there certainly was a time when it couldn't. Currently I don't
have a pure 32-bit Linux installation, although I could test with a 64-bit
one compiling for 32-bit (maybe tomorrow). And I am sure a lot of OS'es
indeed don't. Look up your flavour and search for large file support.

Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
Marcel Müller

Robert said:
Sorry, but I can't believe that. Do you really mean that e.g. a 32bit Linux
filesystem can not handle files larger than 4GB?

I don't think so either. The 64-bit file API is in no way related to the
64-bit extension of the CPU; even 8-bit CPUs could deal with 64-bit numbers.

It is a compile-time feature of the runtime library. At the operating
system level there are either two sets of API functions, with and without
large file support, or optional 64-bit extension parameters to the 32-bit
API functions (like on Win32). Unfortunately the C++ runtimes are not the
first ones to support this.

For tasks like that I do not recommend using the iostream libraries at
all. Usually they are not trimmed for maximum performance. Sometimes the
implementations are more like case studies.
And writing files that large /is/ a question of performance. You might
want to control the caching of the content. Or you might do the I/O
asynchronously.


Marcel
 
ian-news

Please stop top-posting.

Ron said:
Most 32 bit OS'es have a limitation of 2GB (maybe 4GB I am not sure about
that) per file.

That's nonsense.

Ron said:
If the OS can handle both like for instance sun V440 or

A V440 is a machine, not an OS.

Ron said:
ubuntu 64 or lots of others you can compile with m64 with gcc (and link to
64 versions of all libraries) and usually have the full 64 bit range.

Also nonsense.

Ian.
 
James Kanze

Victor said:
Have you actually tried and failed, or is that only your
speculation?

It's really implementation-defined. I know that some
implementations do have this restriction.

Victor said:
AFAIK, even standard C Library functions like fread and fseek
should work with large files.

According to what or who? The standards (both C and C++) are
really very, very vague about this (intentionally). I think
about all you can portably count on is that you can read
anything you can write. If the library doesn't allow writing
files with more than some upper limit of characters, then
there's no reason to assume that it can read them.

From a quality of implementation point of view, of course, one
would expect that the library not introduce additional
restrictions not present in the OS. But backwards compatibility
issues sometimes pose problems: changing the size of off_t on a
Posix implementation breaks binary compatibility, for example.
So libc.so (the dynamic object which contains the system API and
the basic C library under Solaris) must stick with 32 bit file
offsets, or existing binaries will cease to work. And if
libc.so uses a 32 bit file offset, then any new code which links
against it must, too. So by default, fopen uses a 32 bit file
offset, and only allows access to the first 4 GB of a file, at
least in programs compiled in 32 bit mode. I don't know how
Windows handles this, but I'd be surprised if they didn't
encounter the same problems, at least to some degree.

The obvious solution would be to have three models, instead of
two: a pure 32 bit mode for legacy code, a 32 bit mode with 64
bit file offsets for new 32 bit code, and a 64 bit mode. On the
other hand, even coping with two different models on the same
machine can be confusing enough.
 
James Kanze

Marcel said:
I don't think so either. The 64 bit file API is in no way
related to 64 bit extension of the CPU. even 8 bit CPUs could
deal with 64 bit numbers.

Posix requires off_t to be a typedef to a signed integral type.
It also requires that the file size, in bytes, be held in an
off_t. In the days before long long, the largest signed
integral type was long, normally 32 bits on a 32 bit machine.
Which meant that file sizes were limited to 2GB. (Of course,
back then, a file of more than 2GB wouldn't fit on most disks.)

The integration of large file support has been extremely
complex, since breaking existing binaries (which dynamically
link to the system API) was not considered an acceptable option.
The result is that by default, both 32 bit Solaris and 32 bit
Linux do not support files greater than 2GB. (I think that both
have means to do so; it's highly unlikely, however, that the
C++, or even the C standard library use these.)
 
Matthias Buelow

James said:
But backwards compatibility
issues sometimes pose problems: changing the size of off_t on a
Posix implementation breaks binary compatibility, for example.
So libc.so (the dynamic object which contains the system API and
the basic C library under Solaris) must stick with 32 bit file
offsets, or existing binaries will cease to work. And if
libc.so uses a 32 bit file offset, then any new code which links
against it must, too.

I don't know how Solaris implements this in particular, but it could be
solved by providing legacy compatibility libs for older binaries. (I
think that, for example, FreeBSD does it that way; it has had a
64-bit off_t since at least 1996, iirc.)
 
Matthias Buelow

James said:
The result is that by default, both 32 bit Solaris and 32 bit
Linux do not support files greater than 2GB. (I think that both

What do you mean by that? At least Linux (i386) has had support for
files >2GB for many years now, "out of the box" (that is, by default).
 
James Kanze

Matthias said:
What do you mean by that? At least Linux (i386) has had
support for files >2gb for many years now, "out of box" (that
is, by default).

The OS, yes, but at least on the 32 bit implementations I have
access to, off_t is an int32_t, which means (indirectly) that
the standard FILE* and fstream will have problems with them.
 
