Programming in standard C


user923005

user923005 said:
I work for a database company, and most of our customers are large
customers.  (E.g. huge US company, large university, government of
country x, etc.)
It is not at all unusual for a single file to be 20-100 GB.  Needless
to say, you would not want to put this file into memory even if you
could do it.

Sometimes you do.  For some time, I worked for a company that had
a gigantic Perforce repository[*].  Every developer made heavy
use of this repository.  Unfortunately, Perforce doesn't scale to
anything that big or that busy.  The solution turned out to be to
put 128 GB of RAM in the Perforce server.  Then the whole
database was cached.  Performance was then tolerable, if still
not all that great.

[*] Perforce is a version control system.

Tell me, was the file read into a fixed memory buffer or memory
mapped?
Actually, since Perforce uses a database, the answer is obvious.
 

Eric Sosman

Bart said:
Same here.

Now, however, you've got 1 GB of RAM or whatever sitting there and you need
to do *something* with it. I'm talking about desktop PCs at least, of course.

To coin a phrase, one gig ought to be enough for anyone.

... except it isn't. Just a couple days ago, my backup
unexpectedly wanted an additional DVD and I started looking
around for whatever was filling up my disk. After a while
I found a log file that had grown from a few dozen meg to
more than two gig since the previous backup, and I wanted to
look inside to see if I could get a hint about what kind of
hot air it was inflating itself with. I've got 1.5 gig of
RAM and I'm not running Vaster so there's still a smidgen of
it left -- but all the readily available editors and file
viewers were of the "read the whole shebang" variety ...

(Wound up using DOS ports of "head" and "tail" to snip
out a few samples from the file and discover that they were
full of uninterpretable binary garbage. Deleted the monster --
thank goodness "DEL" doesn't read the whole file! -- and am
keeping my fingers crossed.)

How common are "big" files? At a guess, relatively rare.
Still, various computer vendors and standards bodies have found
them important enough to warrant special attention to "large
file" support, with types like off_t and system calls like
open64() and endless verbiage in the documentation. Memories
have grown since the days when I learned to be parsimonious with
it, but files have grown, too, and with adoption of not-very-
dense encodings like XML they seem destined to grow larger yet.
The lesson I take from this is that it's nearly always
worthwhile to consider processing a file in a continuous "pass" as
opposed to counting on being able to fit its image in memory.
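
As a rough sketch of that single-pass approach (a hypothetical scanner,
written against nothing but the standard library, that reports line counts
and "binary garbage" without ever holding more than one buffer of the file):

--------
#include <ctype.h>
#include <stdio.h>

/* Scan a file in one continuous pass, never holding more than one
   buffer's worth of it in memory.  Counts lines and bytes that are
   neither printable nor ordinary whitespace. */
static int scan_file(const char *name)
{
    FILE *fp = fopen(name, "rb");
    unsigned char buf[BUFSIZ];
    size_t n, i;
    unsigned long lines = 0, junk = 0;

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] == '\n')
                lines++;
            else if (!isprint(buf[i]) && !isspace(buf[i]))
                junk++;
        }
    }
    fclose(fp);
    printf("%s: %lu lines, %lu suspicious bytes\n", name, lines, junk);
    return 0;
}
--------

The file's size never enters into it, which is the point: the same code
behaves the same on a 2 GB log as on a 2 KB one.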
 

dj3vande

Programmers should be able to understand that calling filesize("LPT1:") or
something similar would be silly, and that an incorrect value, or rather an
error code, would be returned.

--------
int write_log_entry(const char *filename, struct log_entry *msg)
{
    FILE *logfile;
    int ret;
    int sz;

    sz = filesize(filename);
    if (sz < 0)
        return sz;
    if (sz >= globals.max_logfile_size)
        rotate_logfiles();

    logfile = fopen(filename, "a");
    if (logfile == NULL)
        return -1;
    ret = log_to_stdio_stream(logfile, msg);
    fclose(logfile);
    return ret;
}
--------
Now what happens when somebody decides to log to the printer? Does it
still sound silly to be passing "LPT1" to filesize?


dave
(maintains code that does something remarkably similar to this)
 

Eric Sosman

jacob said:
Chris Torek wrote:

[snip]

This is possible in standard C.

You are forced to read the data character by character until you reach
the end of the file. This is maybe OK (disk I/O could be the limiting
factor here), but it ignores the problem I tried to solve concerning
abstracting the text/binary difference.

... a difference the Standard C library already handles,
and handles better.
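
As a minimal sketch of what Eric means: in standard C the text/binary
difference is expressed once, in the mode string handed to fopen(), and the
code that reads the stream does not change.

--------
#include <stdio.h>

/* Count the bytes a stream delivers.  Whether newline translation
   happens is decided by the mode string alone ("r" for text, "rb"
   for binary); the loop below is identical either way. */
static long count_bytes(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
--------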
 

Serve La

--------
int write_log_entry(const char *filename, struct log_entry *msg)
{
    FILE *logfile;
    int ret;
    int sz;

    sz = filesize(filename);
    if (sz < 0)
        return sz;
    if (sz >= globals.max_logfile_size)
        rotate_logfiles();

    logfile = fopen(filename, "a");
    if (logfile == NULL)
        return -1;
    ret = log_to_stdio_stream(logfile, msg);
    fclose(logfile);
    return ret;
}
--------

Yes.
If one decides to do that, I'm sure there will be other unportable means
to do it.
 

CBFalconer

user923005 said:
.... snip ...

My opinion is that Jacob chose what is probably one of the most
difficult possible projects to ridicule what is possible in
standard C, and also that he probably knew it beforehand. He
acts like a dummy, but he isn't one.

I agree. However, he does have a very limited view of the software
and computer industry. That means that he systematically ignores
many real problems.
 

CBFalconer

jacob said:
.... snip ...

You are forced to read the data character by character until you reach
the end of the file. This is maybe OK (disk I/O could be the limiting
factor here), but it ignores the problem I tried to solve concerning
abstracting the text/binary difference.

If we had a function called filesize() in the standard (one of my
main complaints), your program could be written in a few lines and
read the whole file into memory.

I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).

If you can think of cases where this function (full read-in) is
really useful, do specify a few in detail.
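
For reference, here is roughly what jacob's "few lines" would look like,
assuming a hypothetical filesize() built on fseek()/ftell(); note that the
standard does not promise fseek(fp, 0, SEEK_END) is meaningful for every
binary stream, which is part of why no such function made it into the
library.

--------
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, not standard C: fseek() to the end is not
   guaranteed to be supported on every stream. */
static long filesize(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    return ftell(fp);
}

/* Read a whole file into one malloc'd, '\0'-terminated buffer. */
static char *slurp(const char *name, long *sizep)
{
    FILE *fp = fopen(name, "rb");
    char *buf = NULL;
    long sz;

    if (fp == NULL)
        return NULL;
    sz = filesize(fp);
    if (sz >= 0 && (buf = malloc((size_t)sz + 1)) != NULL) {
        rewind(fp);
        if (fread(buf, 1, (size_t)sz, fp) == (size_t)sz) {
            buf[sz] = '\0';
            if (sizep != NULL)
                *sizep = sz;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(fp);
    return buf;
}
--------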
 

Chris Torek

Chris Torek wrote:
[snippage occurred here]
#ifdef OPTIONAL
    if (n < space) {
        new = realloc(buf, n + 1);
        if (new != NULL)
            buf = new;
    }
#endif
    buf[n] = '\0';

Minor error. If the realloc fails, buf[n] = '\0' will write beyond
the buffer.

No -- the buffer's size is space+1 bytes, not n+1. This is the
whole point of the realloc(): to shrink the buffer from space+1
bytes long to n+1 bytes long, so that buf[n] is the last byte,
rather than somewhere before the last byte.

(As someone else noted, though, there was a malloc() that should
have been a realloc().)
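
For readers without the upthread article, a reconstruction of the pattern
under discussion (a sketch in the same spirit, not Chris's exact code). The
buffer always holds space + 1 bytes with n <= space, so the final
buf[n] = '\0' stays in bounds whether or not the optional shrinking
realloc() succeeds:

--------
#include <stdio.h>
#include <stdlib.h>

/* Read all of fp into a growing buffer.  Invariant: the buffer
   holds space + 1 bytes and n <= space, so buf[n] is always a
   valid byte for the terminating '\0'. */
static char *read_all(FILE *fp, size_t *lenp)
{
    size_t space = 1024, n = 0;
    char *buf = malloc(space + 1), *new;
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (n == space) {
            space *= 2;
            new = realloc(buf, space + 1);
            if (new == NULL) {
                free(buf);
                return NULL;
            }
            buf = new;
        }
        buf[n++] = (char)c;
    }
    /* Optional: give back the unused tail.  If this realloc()
       fails, buf still holds space + 1 bytes, so the store below
       is still in bounds. */
    if (n < space) {
        new = realloc(buf, n + 1);
        if (new != NULL)
            buf = new;
    }
    buf[n] = '\0';
    if (lenp != NULL)
        *lenp = n;
    return buf;
}
--------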
 

Mark McIntyre

Look at that:

              AlphaServer DS10L    Noname PC clone (my machine)
    CPU       466 MHz EV6          2 GHz dual-core AMD

Remember, CISC vs RISC!

    Price     $699                 600 euros

Thanks for pointing out that one can pick up a ludicrously over-specced
machine which struggles to run a bloated OS, is a virus and spam magnet,
and goes tits-up regularly, for only marginally less than a rock-solid,
extremely reliable bit of kit that's still rocking nearly two decades
after it was first shipped.... :)
 

jacob navia

Mark said:
Remember, CISC vs RISC !

The RISC idea was to reduce the instruction set and speed up the clock.
Here we have a reduced instruction set with a clock that is slower by a
factor of 4 ...
Thanks for pointing out that one can pick up a ludicrously over-specced
machine which struggles to run a bloated OS, is a virus and spam magnet,
and goes tits-up regularly, for only marginally less than a rock-solid,
extremely reliable bit of kit that's still rocking nearly two decades
after it was first shipped.... :)

yeah sure...

keep your DS10L.
 

Bart C

I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).

I can't count on more than 0 KB when out of memory. If writing an actual
user *application*, put the minimum RAM in the box.
If you can think of cases where this function (full read-in) is
really useful, do specify a few in detail.

Sometimes it is not necessary but might be faster: reading a block into
memory is fast, and scanning bytes in that memory block is also fast.

But scanning bytes while needing to negotiate with the file system *per
byte* could be slower; in fact the file device might well be slow.

And sometimes random access is required to the whole file: data of various
kinds (e.g. an uncompressed bitmap), executable data (e.g. native code,
byte code). Or some small application likes to store all its persistent
variables as disk files; when it starts again, it naturally wants to
re-load those files.

Sometimes you want to just grab the whole file in case something happens to
it (someone unplugs the flash drive, for example).

These are typical small-medium sized files. Huge files (large database,
video etc) are not suitable for this but would anyway use a different
approach or are designed for serial access.

I wouldn't call these examples rare cases, not on typical desktop computers
anyway. Common enough that a reading-entire-file-into-memory function would
be useful.

Bart
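
A small sketch of the block-versus-per-byte point (hypothetical helpers;
actual timings depend on the C library's buffering and on the device): both
loops below do the same job, but the first negotiates with the stream in
large chunks and scans in memory, while the second asks the library for
every single byte.

--------
#include <stdio.h>

/* Count occurrences of one byte value, reading in large blocks
   and scanning each block in memory. */
static unsigned long count_by_block(FILE *fp, int target)
{
    unsigned char buf[8192];
    size_t n, i;
    unsigned long count = 0;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        for (i = 0; i < n; i++)
            if (buf[i] == (unsigned char)target)
                count++;
    return count;
}

/* Same job, one library call per byte. */
static unsigned long count_by_char(FILE *fp, int target)
{
    unsigned long count = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        if (c == target)
            count++;
    return count;
}
--------

stdio buffers both, so the gap is mostly per-call overhead rather than extra
disk traffic, but it is often still measurable.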
 

Malcolm McLean

Bart C said:
Of course in practice one would write:

char *loadfile(FILE *fp, size_t *sizep)
{
    if (thisiswindows)      /* or other capable OS */
        /* do it in a dozen lines */
    else
        /* do it the hard way */
}
No, we want the code to work anywhere. In practice you need too many #ifdefs
for each compiler, and then you can't test the code easily.

My function would slurp in an ASCII file, reasonably portably. As Chris
Torek pointed out, it could be improved both to make it more reusable and to
support weirder systems. It wasn't written with very much thought.

In practice I don't expect the MiniBasic script executor to break anytime
soon, on any system anyone will actually want to run it on. In fact the
people who have used it seriously have given it a total rewrite -- to not use
the standard IO streams, to use integer-only arithmetic, to take out math
library calls, and so on -- because it seems to have found favour with small
embedded systems. There was no way I could have anticipated those
requirements.
 

Joachim Schmitz

jacob navia said:
The RISC idea was to reduce the instruction set and speed up the clock.
Here we have a reduced instruction set with a clock that is slower by a
factor of 4 ...

No. The idea is to reduce the instruction set and make that reduced set
execute in fewer CPU cycles. That way a RISC CPU doesn't need to speed up
the clock; it just gets more work done in fewer cycles despite its slower
clock.
Comparing the clocks of different types of CPU isn't helpful at all;
benchmarks are needed for a meaningful comparison.

Bye, Jojo
 

Walter Roberson

The RISC idea was to reduce the instruction set and speed up the clock.
Here we have a reduced instruction set with a clock that is slower by a
factor of 4 ...

And?

The main desktop machines I use are 200 MHz / 128 MB (home) and
250 MHz / 256 MB (work) -- and their desktop is still more responsive
than my 2.0 GHz / 1.5 GB Windows PC. I won't deny that the browser is
noticeably faster on the 2.0 GHz Windows PC, but start one task on
the Windows PC and everything else crawls even when there
is plenty of memory, whereas my RISC boxes stay peppy until you
run something big enough to swap to disk. (And when they do swap
to disk, they still record the input events, unlike the high-speed
Opteron Linux cluster at work, which loses most keypresses if it
is busy swapping to disk!)

A small clarification, by the way: the 200 MHz and 250 MHz are
the internal clock speeds, which are double the external clock
speeds on the system. The CPUs are being externally clocked at
only 100 MHz and 125 MHz respectively, and the memory is only 100 ns
on the boxes. In theory my Windows PC should be able to run rings
around those decade old boxes, but that's not what the user
experiences.
 

CJ

Remember that you can't count on the availability of more than 64
Kbytes of memory (although most systems provide more).

Many C programs will need to get by on a lot less memory than this! I
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.
 

jacob navia

CJ said:
Many C programs will need to get by on a lot less memory than this! I
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.

Look "CJ" whoever you are:

You know NOTHING of where I have programmed, or what I am doing.
Versions of lcc-win run in DSPs with 80k of memory, and only
20 usable.

You (like all the "regulars") repeat the same lies about me again
and again but that doesn't makes them true.

You feel like insulting someone?

Pick up another target or (much better) try to stop kissing ass ok?
 

Richard

CBFalconer said:
I concede that there may be rare cases when you want the whole file
in memory. However, I can't really think of any for now. Remember
that you can't count on the availability of more than 64 Kbytes of
memory (although most systems provide more).

You appear to have zero experience of using C in any REAL applications.

Files are used to store data all the time.

This data is often required in matrix operations, for example, where ALL
the data is required at one time.

This is one of a million similar scenarios.

Why do you have to be so contrary all the time? You never seem happy
unless you are prancing around trying to get one over on someone. You
are like RH but without the C skills.

As for your comments about the memory ... get real. From that ridiculous
statement are you saying that no "standard C" program can assume more
than 64k will be available? That would mean a hell of a lot of ISO C
programs bugging out with "malloc didn't work" errors.

Why do you do this?
 

Richard

CJ said:
Many C programs will need to get by on a lot less memory than this! I

And many do. What is your point? The point here is that many don't.
know that for Jacob every computer is a 32-bit Windows box with
a couple of gigabytes of RAM, but out there in the real world C programs
often control things like toasters or kettles, where memory is severely
limited.

Your comments have nothing to do with the subject.
 

Richard

Bart C said:
I can't count on more than 0 KB when out of memory. If writing an actual
user *application*, put the minimum RAM in the box.


Sometimes it is not necessary but might be faster: reading a block into
memory is fast, and scanning bytes in that memory block is also fast.

But scanning bytes while needing to negotiate with the file system *per
byte* could be slower; in fact the file device might well be slow.

And sometimes random access is required to the whole file: data of various
kinds (e.g. an uncompressed bitmap), executable data (e.g. native code,
byte code). Or some small application likes to store all its persistent
variables as disk files; when it starts again, it naturally wants to
re-load those files.

Sometimes you want to just grab the whole file in case something happens to
it (someone unplugs the flash drive, for example).

These are typical small-medium sized files. Huge files (large database,
video etc) are not suitable for this but would anyway use a different
approach or are designed for serial access.

I wouldn't call these examples rare cases, not on typical desktop computers
anyway. Common enough that a reading-entire-file-into-memory function would
be useful.

Good reply. The clique won't like it, because here programming on *smaller*
boards immediately signifies that you are a "real C user"
... garbage, I know.
 

Serve La

Bart C said:
These are all very interesting examples which should be kept in mind when
writing mission-critical code or code for life-support systems.

But there is a distinct class of well-behaved files (input files of a
compiler for example), which are unlikely to be huge and unlikely to
change. For a lot of applications and their files, this will be the case.

With shared files on multi-user/multi-process systems there can be
pitfalls, but size of the file suddenly changing would be the least of the
problems.

I don't know how to deal with /var/adm/messages or similar. Suppose I read
byte-by-byte as recommended, and then someone updates the beginning of the
file?

How about making filesize("/var/adm/messages") and the like UB? On some
systems it could return correct values, on others an error.

fflush(stdin) is UB, so I don't see a reason why filesize should have every
single file type perfectly defined.
And here in clc people will have one more reason to tell others that demons
will fly out of their nose when they try filesize() on /dev/random or
something!
 
