file of exact size


Peter J. Holzer

Wade Ward said:
Yeah, well, I let Heathfield's program run for about fifteen minutes. I'm
missing something and my IQ isn't getting bigger tonight, so I'll let it
rest.

The benchmark to beat is eight minutes. Sleep on it.
[...]

given today's average hardware performance, 1 minute seems a good goal for
this benchmark.
using fwrite with a decent buffer size should do it.

% ./heathfield > foo
please wait
./heathfield > foo 45.09s user 6.26s system 79% cpu 1:04.24 total


Looks like Richard's program reaches the goal even without fwrite.

hp
 

Richard Heathfield

Peter J. Holzer said:
"Wade Ward" <[email protected]> a ?rit dans le message de (e-mail address removed)...
The benchmark to beat is eight minutes. Sleep on it.
[...]

given today's average hardware performance, 1 minute seems a good goal
for this benchmark.
using fwrite with a decent buffer size should do it.

% ./heathfield > foo
please wait
./heathfield > foo 45.09s user 6.26s system 79% cpu 1:04.24 total

Looks like Richard's program reaches the goal even without fwrite.

Gosh! :) That makes me wonder how fast I could make it if I were
actually trying to make it fast.

But of course it doesn't quite make me wonder *enough*...
 

Peter J. Holzer

Charlie Gordon wrote, On 15/09/07 13:32:

[creating a file of ca. 2.5 GB ]
The benchmark to beat is eight minutes. Sleep on it.
[...]

given today's average hardware performance, 1 minute seems a good goal for
this benchmark.
using fwrite with a decent buffer size should do it.

Pick the right system and the right method and I would expect closer to
1s. Actually, I would expect a lot *less* than 1s. E.g.

markg@brenda:~$ rm /tmp/big
markg@brenda:~$ time ./a.out

real 0m0.002s
user 0m0.000s
sys 0m0.000s
markg@brenda:~$ ls -l /tmp/big
-rw-r--r-- 1 markg markg 2282899 2007-09-15 19:02 /tmp/big
markg@brenda:~$

For the earlier mentioned size of 2563695577 my method did not work,

That's a bit of a cop-out. If your file is 1123 times smaller one would
expect your program to be 1123 times faster. In fact, the difference
can be expected to be even larger since 2.2 MB fit comfortably into the
buffer cache of just about any contemporary PC, but 2.5 GB do not - so
you are hitting only main memory while Richard's program is hitting the
(much slower) disk.
and for various reasons (including portability) my method might not be
suitable to the OP. I did not use standard C to do this so the code is
not topical here.

One method which uses only standard C and which is much faster on some
systems has already been mentioned: fseeking to the desired position and
writing one byte. However, fseeking beyond LONG_MAX may not be
supported.

hp
 

Flash Gordon

Peter J. Holzer wrote, On 15/09/07 22:05:
Charlie Gordon wrote, On 15/09/07 13:32:

[creating a file of ca. 2.5 GB ]
The benchmark to beat is eight minutes. Sleep on it. [...]
given today's average hardware performance, 1 minute seems a good goal for
this benchmark.
using fwrite with a decent buffer size should do it.
Pick the right system and the right method and I would expect closer to
1s. Actually, I would expect a lot *less* than 1s. E.g.

markg@brenda:~$ rm /tmp/big
markg@brenda:~$ time ./a.out

real 0m0.002s
user 0m0.000s
sys 0m0.000s
markg@brenda:~$ ls -l /tmp/big
-rw-r--r-- 1 markg markg 2282899 2007-09-15 19:02 /tmp/big
markg@brenda:~$

For the earlier mentioned size of 2563695577 my method did not work,

That's a bit of a cop-out. If your file is 1123 times smaller one would
expect your program to be 1123 times faster. In fact, the difference
can be expected to be even larger since 2.2 MB fit comfortably into the
buffer cache of just about any contemporary PC, but 2.5 GB do not - so
you are hitting only main memory while Richard's program is hitting the
(much slower) disk.

The trick I used meant it did not write much at all to the disk. If I
bothered to sort things out so that it produced a 1TB file it would
still take the same time.
One method which uses only standard C and which is much faster on some
systems has already been mentioned: fseeking to the desired position and
writing one byte. However, fseeking beyond LONG_MAX may not be
supported.

The method I was using allowed specifying a 64 bit offset, but at the
size specified it failed so it is possible that there is another limit
being imposed on the file size.
 

Wade Ward

If I had wanted to restrict myself to C99 you could use a single count
variable of type 'long long' and just a single loop instead of two, but I
did it this way to be portable to implementations not supporting 'long
long' as well.
Doing it the way you did, you made the program readable.
It might also be worth noting that writing a single character at a time is
not the most efficient way of doing it - but improving that is left as an
exercise for the reader.

Modifying the program so that it can be used to create a file with an
arbitrary size is also left as an exercise.

Oh, and the fclose() calls in the program are not actually necessary - see
if you can figure out why.
Because windows doesn't care.
<I am the major model of a modern major genital.>
Erik Trulsson
(e-mail address removed)
http://www.zaxfuuq.net/bigfile1.htm
Thanks to all contributors.

Below the image is what's left of Heathfield's code. It didn't port, which
is funny. The joke is either on him or me, depending on whether he intended
to fail to port. If he did, then it must have been funny to see me last
night looking at a flickering DOS window that was ghosting tja and -1. If
he didn't, then that's just rich.

I didn't have a C compiler when I made the original post. I tried to
install lcc-win32, and it was a pain in the ass. Sorry, Jacob. (Any press
is good press, but when the French feed me dogshit, I'm likely to mention
it.) Bloodshed was cranking on Richard's problem in less than a half hour.
No diagnostics issued for either prog. Richard's ended up as nasal demons;
Erik's an unqualified success. Took about five minutes.

Tip your waitresses. Today is our ½ St. Pats day, and I'm not sauced. If
anyone wants to help me with tensor multiplication on a computer Walter,
please comment in c.l.f.
 

Walter Roberson

What do you mean? When you're erasing information you get it right to the
byte.

In your original posting, you asked to *create* a file, not to
*erase* an existing file. If you had an existing file that
needed to be written over, the task of getting it the right length
becomes easier -- you can, for example, drop all the speculation
about whether the OS allows files that long, because the file of that
length actually exists when it is handed to the program.


With erasure, you still run into possibilities about trailing binary
0's (especially if you specify that the erasure has to be with binary 0);
that's a platform-specific question which can be readily answered
by platform specialists (and -likely- for Win32, someone would even
answer it here.)
 

Wade Ward

Keith Thompson said:
"Wade Ward" <[email protected]> writes:
Why do you want to do this? Creating a file of an exact specified
size without saying anything about its contents seems like a very odd
requirement.
What do you mean? When you're erasing information you get it right to the
byte. I'm impressed with Bloodshed's freely-available C compiler. Even
though I don't do a whole lot with C, I know that when I
#define UP_CLOSE_AND_PERSONAL
#define WIN_32_LEAN_AND_MEAN
, my best path is through the C Programming Language.
 

Wade Ward

Walter Roberson said:
In your original posting, you asked to *create* a file, not to
*erase* an existing file. If you had an existing file that
needed to be written over, the task of getting it the right length
becomes easier -- you can, for example, drop all the speculation
about whether the OS allows files that long, because the file of that
length actually exists when it is handed to the program.
This is beyond the purview of C.
With erasure, you still run into possibilities about trailing binary
0's (especially if you specify that the erasure has to be with binary 0);
that's a platform-specific question which can be readily answered
by platform specialists (and -likely- for Win32, someone would even
answer it here.)
No! I've got my OS fooled.
 

Charlie Gordon

Richard Heathfield said:
Peter J. Holzer said:
"Wade Ward" <[email protected]> a ?rit dans le message de (e-mail address removed)...
The benchmark to beat is eight minutes. Sleep on it.
[...]

given today's average hardware performance, 1 minute seems a good goal
for this benchmark.
using fwrite with a decent buffer size should do it.

% ./heathfield > foo
please wait
./heathfield > foo 45.09s user 6.26s system 79% cpu 1:04.24 total

Looks like Richard's program reaches the goal even without fwrite.

Gosh! :) That makes me wonder how fast I could make it if I were
actually trying to make it fast.

But of course it doesn't quite make me wonder *enough*...

Don't get carried away: if you are writing actual bytes to an actual hard
drive without tricks such as compression or delayed commit, the speed limit
is the drive throughput, which averages 40-50 MB per second these days.

The timings show that your system is pretty busy running that loop popping
one byte at a time and checking its counter (79%) while we can assume the
hard drive does its flushing of memory buffers asynchronously.

Improving the code will mostly result in less user cpu time and more time
waiting for the drive.

If you have enough memory and set up a ram drive for this test, you should
get a 20% improvement and 100% cpu. Then you should see noticeable
improvements if you switch to more efficient buffering:

using the program below, and limiting to 2g because of the file system, I
get the following benchmark:

$ time ./bigfile /tmp/toto 2147483647 0

real 0m48.271s
user 0m38.896s
sys 0m8.674s


$ time ./bigfile /tmp/toto 2147483647 8k

real 0m38.379s
user 0m0.463s
sys 0m9.286s

$ time ./bigfile /tmp/toto 2147483647 64k

real 0m34.771s
user 0m0.067s
sys 0m8.897s

$ time ./bigfile /tmp/toto 2147483647 1m

real 0m35.619s
user 0m0.023s
sys 0m9.109s

I can't test to a ram drive right now, but I would expect the overall time
to drop well below 10 seconds.

/* bigfile.c (C) Charlie Gordon 2007 */
/* Large file benchmark, placed in the public domain */

#include <stdio.h>
#include <stdlib.h>

long long getnumber(const char *str)
{
    long long num = strtoll(str, (char **)&str, 0);
    switch (*str) {
    case 'g': return num << 30;
    case 'm': return num << 20;
    case 'k': return num << 10;
    default:  return num;
    }
}

int main(int argc, char **argv)
{
    FILE *fp;
    long long file_size;
    int buffer_size;

    if (argc < 4) {
        printf("usage: bigfile filename filesize buffersize\n");
        return 0;
    }
    fp = fopen(argv[1], "wb");
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    file_size = getnumber(argv[2]);
    buffer_size = (int)getnumber(argv[3]);

    if (buffer_size <= 0) {
        /* unbuffered variant: one putc() per byte */
        while (file_size > 0) {
            putc(0, fp);
            file_size--;
        }
    } else {
        char *buffer = calloc(buffer_size, 1);

        if (buffer == NULL) {
            fclose(fp);
            fprintf(stderr, "out of memory\n");
            exit(EXIT_FAILURE);
        }
        while (file_size > 0) {
            int written, towrite = buffer_size;
            if (towrite > file_size)
                towrite = file_size;
            written = fwrite(buffer, 1, towrite, fp);
            if (written <= 0) {
                fclose(fp);
                fprintf(stderr, "write error\n");
                exit(EXIT_FAILURE);
            }
            file_size -= written;
        }
        free(buffer);
    }
    fclose(fp);
    return 0;
}
 

Michal Nazarewicz

Peter J. Holzer wrote, On 15/09/07 22:05:

Flash Gordon said:
The trick I used meant it did not write much at all to the disk. If
I bothered to sort things out so that it produced a 1TB file it would
still take the same time.

Sparse files? Using a (modified) version of the method shown on Wikipedia
(<http://en.wikipedia.org/wiki/Sparse_file>):

#v+
$ rm -f -- bigsparse && time dd if=/dev/zero of=bigsparse bs=1 count=1 \
seek=1T && ls -l bigsparse
1+0 records in
1+0 records out
1 byte (1 B) copied, 8,1393e-05 s, 12,3 kB/s

real 0m0.036s
user 0m0.007s
sys 0m0.000s
-rw------- 1 mina86 users 1099511627777 Sep 16 13:01 bigsparse
#v-
 

Erik Trulsson

Wade Ward said:
Doing it the way you did, you made the program readable.

If I had used 'long long' and a single loop, then the program would have been
even more readable - but less portable.
Because windows doesn't care.

No, nothing to do with Windows or any implementation-specific behaviour.
The answer is simply that when a program exits normally (by returning from main() or
by calling exit() ) then any open files will be flushed and closed automatically.
(If a program exits abnormally - e.g. by calling abort(), then the C standard does
not guarantee that files will be closed correctly. There is also no guarantee that
malloc()ed memory will be released upon exit of a program - although it is on
most implementations.)
 

JimS

Charlie Gordon said:



<shrug> A broken OS is indeed one possible obstacle to producing the
file as specified. There are other obstacles too, some of them rather
more difficult to overcome.

Some implementations allow you to change the file mode of the standard
file handles. eg. _setmode(handle, mode).

Jim
 

Peter J. Holzer

Peter J. Holzer wrote, On 15/09/07 22:05:
Charlie Gordon wrote, On 15/09/07 13:32:
"Wade Ward" <[email protected]> a écrit dans le message de
(e-mail address removed)...

[creating a file of ca. 2.5 GB ]
The benchmark to beat is eight minutes. Sleep on it. [...]
given today's average hardware performance, 1 minute seems a good goal for
this benchmark.
using fwrite with a decent buffer size should do it.
Pick the right system and the right method and I would expect closer to
1s. Actually, I would expect a lot *less* than 1s. E.g.

markg@brenda:~$ rm /tmp/big
markg@brenda:~$ time ./a.out

real 0m0.002s
user 0m0.000s
sys 0m0.000s
markg@brenda:~$ ls -l /tmp/big
-rw-r--r-- 1 markg markg 2282899 2007-09-15 19:02 /tmp/big
markg@brenda:~$

For the earlier mentioned size of 2563695577 my method did not work,

That's a bit of a cop-out. If your file is 1123 times smaller one would
expect your program to be 1123 times faster. In fact, the difference
can be expected to be even larger since 2.2 MB fit comfortably into the
buffer cache of just about any contemporary PC, but 2.5 GB do not - so
you are hitting only main memory while Richard's program is hitting the
(much slower) disk.

The trick I used meant it did not write much at all to the disk. If I
bothered to sort things out so that it produced a 1TB file it would
still take the same time.

I guessed that. But since you didn't disclose your "trick" even though
it probably has already been mentioned in this thread, your quoted
timing doesn't provide much evidence.

The method I was using allowed specifying a 64 bit offset, but at the
size specified it failed so it is possible that there is another limit
being imposed on the file size.

Accessing files larger than 2GB is a bit tricky on most 32 bit systems.
Typically you have to use a special function (open64 or fopen64) to open
them or at least pass a special flag or compile the program in a special
way. For example, on Linux (or any other system using the glibc), the
following program which uses only standard C functions doesn't work if
compiled with "gcc foo.c -o foo", but does work if compiled with "gcc
-D_FILE_OFFSET_BITS=64 foo.c -o foo":


#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int errupt = 0;
    unsigned long size = 2563695577;
    long o1 = size / 2;
    long o2 = size - o1 - 1;
    FILE *fp;

    if ((fp = fopen("bigfile", "wb")) == NULL) {
        fprintf(stderr, "%s: cannot open %s: %s\n",
                __FILE__, "bigfile", strerror(errno));
        exit(1);
    }
    if (fseek(fp, o1, SEEK_SET) != 0) {
        fprintf(stderr, "%s: cannot fseek to position %ld, SEEK_SET: %s\n",
                __FILE__, o1, strerror(errno));
        exit(1);
    }
    if (fseek(fp, o2, SEEK_CUR) != 0) {
        fprintf(stderr, "%s: cannot fseek to position %ld, SEEK_CUR: %s\n",
                __FILE__, o2, strerror(errno));
        exit(1);
    }
    if (putc('x', fp) == EOF) {
        fprintf(stderr, "%s: cannot write 1 char: %s\n",
                __FILE__, strerror(errno));
        exit(1);
    }
    if (fclose(fp) == EOF) {
        fprintf(stderr, "%s: cannot close file: %s\n",
                __FILE__, strerror(errno));
        exit(1);
    }
    return 0;
}

Alternatively I could have used fopen64 instead of fopen. Or I could have
ditched stdio altogether and used POSIX I/O functions after opening the
file with open64 or open(... O_LARGEFILE ...).

hp
 

Tor Rustad

Charlie said:
Richard Heathfield said:
Peter J. Holzer said:
"Wade Ward" <[email protected]> a ?rit dans le message de (e-mail address removed)...
The benchmark to beat is eight minutes. Sleep on it.

[...]
given today's average hardware performance, 1 minute seems a good goal
for this benchmark.
using fwrite with a decent buffer size should do it.
% ./heathfield > foo
please wait
./heathfield > foo 45.09s user 6.26s system 79% cpu 1:04.24 total

Looks like Richard's program reaches the goal even without fwrite.
Gosh! :) That makes me wonder how fast I could make it if I were
actually trying to make it fast.

But of course it doesn't quite make me wonder *enough*...

Don't get carried away: if you are writing actual bytes to an actual hard
drive without tricks such as compression or delayed commit, the speed limit
is the drive throughput, which averages 40-50 MB per second these days.

Writing ~2 GB to a file is not exactly a fast method:

# cat mr_big.c
#include <unistd.h>

int main(void)
{
    int rc;

    rc = truncate("/big", 2282899UL);

    if (rc)
        perror("/big");

    return rc;
}

# touch /big
# time ./a.out

real 0m0.002s
user 0m0.000s
sys 0m0.004s


so ISO C is clearly not the best tool around for this job. :)
 

Richard Heathfield

Wade Ward said:

I like having peoplearound who are by orders of magnitude drunker than I.

If that is your hope...

As I embark on my third paragraph, I realize that I have to say something
about numbers, and the thing you need to know when you're erasing files
is that I'm coming for is that it is best done to Scheryl Crow's
wonderful is
every I fell lame I'm lup. Yikes. End of either 3 or 4 without definit
status. You must discharge your weapon.

....then I suspect that you are doomed to be disappointed.

Welcome to my bozo bin.
 

Chris Thomasson

Wade Ward said:
What do you mean? When you're erasing information you get it right to the
byte. I'm impressed with Bloodshed's freely-available C compiler.
[...]

Bloodshed Dev-C++ is an IDE, not a compiler.
 

Keith Thompson

Wade Ward said:
Keith Thompson said:
Wade Ward said:
Why do you want to do this? Creating a file of an exact specified
size without saying anything about its contents seems like a very odd
requirement.
What do you mean? When you're erasing information you get it right to the
byte.
[...]

How does creating a file erase information?
A lot of it has to do with rules of evidence. I think I can type for five
paragraphs, so I'm going for it.

Kenny is playing. I like having peoplearound who are by orders of magnitude
drunker than I. Kenny actually should be the poster child for liver health.
It is when people imbibe under dessert(sic) conditions where the danger
enters. The answer to 99% of your questions is "water."
[snip more of the same]

Creating a file does not erase information, unless it does so
incidentally by clobbering some existing file. If you wanted to ask
about how to erase information, you should have done so rather than
wasting our time with irrelevancies.

You recently claimed that I had called you a troll, when in fact I'm
reasonably sure that I had never done so. Ok, fine. You're a troll.
Don't expect any help from me (or, quite likely, from anyone else
here) until and unless you stop trolling. I suspect that this may
require you to post only while sober, but I don't care.

Bye.
 

Wade Ward

Keith Thompson said:
Wade Ward said:
What do you mean? When you're erasing information you get it right to the
byte.
[...]

How does creating a file erase information?
A lot of it has to do with rules of evidence. I think I can type for five
paragraphs, so I'm going for it.

Kenny is playing. I like having peoplearound who are by orders of magnitude
drunker than I. Kenny actually should be the poster child for liver health.
It is when people imbibe under dessert(sic) conditions where the danger
enters. The answer to 99% of your questions is "water."

As I embark on my third paragraph, I realize that I have to say something
about numbers, and the thing you need to know when you're erasing files is
that I'm coming for is that it is best done to Scheryl Crow's wonderful is
every I fell lame I'm lup. Yikes. End of either 3 or 4 without definit
status. You must discharge your weapon.

If there's anybody out there who are cohort to my sister Jennifer who is
erasing files so that DOJ can't find, just remember, I'm coming for you. I
will bring you in front of Bill Richardson, and if gives you two enthusiatic
thumbs up, then what the hell was I typing. Not looking at the monitor
isn't fun. Oh, now I remember where I am.

Keith has sloppy drums.
--
Wade Ward
(e-mail address removed)
"Ich denk an Dich
und laß 'nen fliegen."
--Nena, announcing that she just farted
 

Wade Ward

JimS said:
Some implementations allow you to change the file mode of the standard
file handles. eg. _setmode(handle, mode).
How do you let the C programming language know that it's dealing with
Windows 2^m, m near 6?
--
Wade Ward
(e-mail address removed)
"Ich denk an Dich
und laß 'nen fliegen."
--Nena, announcing that she just farted
 
