Slow file copying in C program?

mike3

Hi.

I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

/* Copy len digits of b starting at b->StartOffs to
 * a starting at a->StartOffs.
 */
int DiskInt_Copy(DiskInt *a, DiskInt *b, long len)
{
    long DigitsRemaining, BufferLen, MaxBuffer;

    /* Set up lengths */
    DigitsRemaining = len;
    MaxBuffer = DISK_BUF_SIZE/sizeof(DIGIT);
    BufferLen = MaxBuffer;
    if(DigitsRemaining < 0) return(ERR_SUCCESS);
    if(DigitsRemaining < MaxBuffer)
        BufferLen = DigitsRemaining;

    /* Copy digits */
    DiskInt_SeekDigit(a, 0);
    DiskInt_SeekDigit(b, 0);
    while(DigitsRemaining)
    {
        fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
        fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
        DigitsRemaining -= BufferLen;
        if(DigitsRemaining < MaxBuffer)
            BufferLen = DigitsRemaining;
    }

    /* Done! */
    DiskInt_SeekDigit(a, 0);
    return(ERR_SUCCESS);
}

Both files, a->fd and b->fd are opened with "wb+" (read/write, binary).
Here, "DIGIT" is defined as "unsigned long", and "DISK_BUF_SIZE"
equals 1,048,576.
 
Malcolm McLean

mike3 said:
Hi.

I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

/* Copy len digits of b starting at b->StartOffs to
* a starting at a->StartOffs.
*/
int DiskInt_Copy(DiskInt *a, DiskInt *b, long len)
{
long DigitsRemaining, BufferLen, MaxBuffer;

/* Set up lengths */
DigitsRemaining = len;
MaxBuffer = DISK_BUF_SIZE/sizeof(DIGIT);
BufferLen = MaxBuffer;
if(DigitsRemaining < 0) return(ERR_SUCCESS);
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;

/* Copy digits */
DiskInt_SeekDigit(a, 0);
DiskInt_SeekDigit(b, 0);
while(DigitsRemaining)
{
fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
DigitsRemaining -= BufferLen;
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;
}

/* Done! */
DiskInt_SeekDigit(a, 0);
return(ERR_SUCCESS);
}

Both files, a->fd and b->fd are opened with "wb+" (read/write,
binary).
Here, "DIGIT" is defined as "unsigned long", and "DISK_BUF_SIZE"
equals 1,048,576.
There could be several reasons.

Try this

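#include <assert.h>
#include <stdio.h>

void copy(char *in, char *out)
{
    FILE *fpin = fopen(in, "rb");
    FILE *fpout = fopen(out, "wb");
    int ch;

    assert(fpin && fpout);

    /* copy one byte at a time; stdio's own buffering does the batching */
    while( (ch = getc(fpin)) != EOF)
        putc(ch, fpout);
    fclose(fpin);
    fclose(fpout);
}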

That will tell you how fast the machine can copy 128 MB, using default
buffering as provided by the standard library. Running your own buffering
scheme over the top will tend to slow things down, due to cache effects and
the like.
 
mike3

There could be several reasons.

Try this

void copy(char *in, char *out)
{
FILE *fpin = fopen(in, "rb");
FILE *fpout = fopen(out, "wb");
int ch;

assert(fpin && fpout);

while( (ch = getc(fpin)) != EOF)
putc(ch, fpout);
fclose(fpin);
fclose(fpout);

}

That will tell you how fast the machine can copy 128 MB, using default
buffering as provided by the standard library. Running your own buffering
scheme over the top will tend to slow things down, due to cache effects and
the like.

It still takes the same ~13 sec. So then why does Windows's
"copy" command go faster?
 
pete

mike3 said:
It still takes the same ~13 sec. So then why does Windows's
"copy" command go faster?

There's a chance that it might use some efficient opcodes
that don't directly correlate to any C code constructs.
 
Lew Pitcher

mike3 said:
[snip]
It still takes the same ~13 sec. So then why does Windows's
"copy" command go faster?

It could be because of any number of reasons.

Perhaps the Windows COPY is not written in C.

Or, perhaps it has been tuned to an optimal I/O pattern for Windows.

Or, perhaps it does not use C stdio, but instead uses Windows-specific I/O
facilities

Or, perhaps it does not actually copy files, but instead manipulates file
directory entries in a Windows-specific way to achieve the same results.

The answer to your question can only be obtained from Microsoft.

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
CBFalconer

mike3 said:
.... snip ...

It still takes the same ~13 sec. So then why does Windows's
"copy" command go faster?

Look at the generated code and see if putc and getc are implemented
with a macro. Also, check the results on an immediate second run -
the input file may be cached.
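
The caching point can be checked by timing the same copy twice in a row. What follows is only a sketch under stated assumptions: it assumes a C11 library (for timespec_get), and the names copy_bytewise and time_copy are made up for illustration, not taken from anyone's code in this thread. The copy itself is the getc/putc loop suggested earlier, with error checks added.

```c
#include <stdio.h>
#include <time.h>

/* Byte-at-a-time copy in the style of the getc/putc loop posted earlier.
 * Returns 0 on success, -1 on any open, read, write or close failure. */
int copy_bytewise(const char *src_path, const char *dst_path)
{
    FILE *in, *out;
    int ch, status = 0;

    in = fopen(src_path, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;
    fclose(in);
    if (fclose(out) != 0)      /* fclose flushes, so it can fail too */
        status = -1;
    return status;
}

/* Stores the wall-clock seconds taken by one copy in *seconds.
 * Returns 0 on success, -1 if the clock or the copy failed.
 * Assumes C11 timespec_get(); TIME_UTC is not a monotonic clock. */
int time_copy(const char *src_path, const char *dst_path, double *seconds)
{
    struct timespec t0, t1;

    if (timespec_get(&t0, TIME_UTC) == 0)
        return -1;
    if (copy_bytewise(src_path, dst_path) != 0)
        return -1;
    if (timespec_get(&t1, TIME_UTC) == 0)
        return -1;
    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return 0;
}
```

Calling time_copy twice on the same source file and comparing the two results gives a rough idea of how much of a "fast" run is really the operating system serving the input from its cache.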
 
MQ

Hi.

I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

If you need fast copying of files, and are unhappy with the
performance of the C standard library, you could use file copy
routines of the operating system, which may be more efficient.
However, this will of course make the code non-portable and outside
the scope of comp.lang.c
 
mike3

If you need fast copying of files, and are unhappy with the
performance of the C standard library, you could use file copy
routines of the operating system, which may be more efficient.
However, this will of course make the code non-portable and outside
the scope of comp.lang.c

But basically with exclusively the C language, and not resorting
to anything outside that this is as fast as it gets, then. Alright.
 
Gene

Hi.

I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

/* Copy len digits of b starting at b->StartOffs to
* a starting at a->StartOffs.
*/
int DiskInt_Copy(DiskInt *a, DiskInt *b, long len)
{
long DigitsRemaining, BufferLen, MaxBuffer;

/* Set up lengths */
DigitsRemaining = len;
MaxBuffer = DISK_BUF_SIZE/sizeof(DIGIT);
BufferLen = MaxBuffer;
if(DigitsRemaining < 0) return(ERR_SUCCESS);
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;

/* Copy digits */
DiskInt_SeekDigit(a, 0);
DiskInt_SeekDigit(b, 0);
while(DigitsRemaining)
{
fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
DigitsRemaining -= BufferLen;
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;
}

/* Done! */
DiskInt_SeekDigit(a, 0);
return(ERR_SUCCESS);

}

Both files, a->fd and b->fd are opened with "wb+" (read/write,
binary).
Here, "DIGIT" is defined as "unsigned long", and "DISK_BUF_SIZE"
equals 1,048,576.

The Windows API has "overlapped i/o" calls that admit simultaneous DMA
for disk input and output if your hardware allows it. I'd be
surprised if the system copy did not use this in a very highly tuned
way. A simple copy loop isn't going to compete.
 
Stephen Sprunk

mike3 said:
I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

/* Copy len digits of b starting at b->StartOffs to
* a starting at a->StartOffs.
*/
int DiskInt_Copy(DiskInt *a, DiskInt *b, long len)
{
long DigitsRemaining, BufferLen, MaxBuffer;

/* Set up lengths */
DigitsRemaining = len;
MaxBuffer = DISK_BUF_SIZE/sizeof(DIGIT);
BufferLen = MaxBuffer;
if(DigitsRemaining < 0) return(ERR_SUCCESS);
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;

/* Copy digits */
DiskInt_SeekDigit(a, 0);
DiskInt_SeekDigit(b, 0);
while(DigitsRemaining)
{
fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
DigitsRemaining -= BufferLen;
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;
}

/* Done! */
DiskInt_SeekDigit(a, 0);
return(ERR_SUCCESS);
}

That's rather ugly and seems suboptimal. I'd write it like this:

/* note: untested, just to show a general idea */
#include <stdio.h>
int copy_file(FILE *src, FILE *dst) {
    size_t amt_read, amt_written;
    char buf[DISK_BUF_SIZE];

    rewind(src);
    rewind(dst);
    do {
        amt_read = fread(buf, 1, sizeof *buf, src);
        amt_written = fwrite(buf, 1, amt_read, dst);
        if ((amt_read!=amt_written)||ferror(dst))
            return ERR_WRITEFAIL;
        if (ferror(src))
            return ERR_READFAIL;
    } while (!feof(src));

    return ERR_SUCCESS;
}

Of course, if you don't want to copy the entire thing, you'll need to add a
bit more logic, but then it wouldn't be a fair test against the "COPY"
command.

<OT>
Still, I bet a non-portable version would be faster since most OSes provide
an API call to do it in one operation, like CopyFile() on Windows and
sendfile() frequently found on POSIX systems. Simply eliminating hundreds
of transitions back and forth between kernel and user modes (and copying the
buffer between address spaces) will speed things up quite a bit.
Both files, a->fd and b->fd are opened with "wb+" (read/write,
binary).

Why open the source file with write permission? You're not writing to it.
Here, "DIGIT" is defined as "unsigned long", and "DISK_BUF_SIZE"
equals 1,048,576.

What's with "DIGIT"? If you're just copying files, it doesn't matter.

S
 
Chris Dollin

mike3 said:
I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?

Noting that performance isn't something the Standard says
anything about, and that you're asking an implementation-
dependent question, and I know almost nothing about DOS,
I note:
fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
Here, "DIGIT" is defined as "unsigned long", and "DISK_BUF_SIZE"
equals 1,048,576.

that you're reading and writing in chunks of `sizeof(unsigned long)`,
which is probably 4 or 8. If you were looking for efficiency in
copying, I would have thought that something a little bigger might
help.

And why `DIGIT`?!
 
Peter J. Holzer

that you're reading and writing in chunks of `sizeof(unsigned long)`,
which is probably 4 or 8.

No, he's reading and writing sizeof(unsigned long) elements of BufferLen
size. I suspect that's not what he wanted, though.

hp
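
For reference, the standard signature is fread(ptr, size, nmemb, stream), and the return value counts complete elements of `size` bytes. The total number of bytes requested is size * nmemb either way, so swapping the two arguments mostly changes what the return value means. The small sketch below illustrates this; make_file and items_read are hypothetical helper names invented for this example.

```c
#include <stdio.h>

/* Create a file of n filler bytes. Returns 0 on success, -1 on failure. */
int make_file(const char *path, size_t n)
{
    FILE *fp = fopen(path, "wb");
    size_t i;

    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        if (putc('x', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Ask fread for nmemb elements of size bytes each from the start of path
 * and return what fread reports: the number of COMPLETE elements read.
 * Returns (size_t)-1 if the request does not fit the local buffer or the
 * file cannot be opened. */
size_t items_read(const char *path, size_t size, size_t nmemb)
{
    char buf[64];
    size_t got;
    FILE *fp;

    if (size != 0 && nmemb > sizeof buf / size)
        return (size_t)-1;
    fp = fopen(path, "rb");
    if (fp == NULL)
        return (size_t)-1;
    got = fread(buf, size, nmemb, fp);
    fclose(fp);
    return got;
}
```

With a 10-byte file, items_read(path, 4, 3) reports 2 (only two whole 4-byte elements fit), items_read(path, 1, 12) reports 10, and items_read(path, 12, 1) reports 0 because not even one 12-byte element is complete. Passing 1 as the size is what makes the return value a byte count.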
 
Roland Pibinger

I'd write it like this:

/* note: untested, just to show a general idea */
#include <stdio.h>
int copy_file(FILE *src, FILE *dst) {
size_t amt_read, amt_written;
char buf[DISK_BUF_SIZE];

char buf[BUFSIZ];
rewind(src);
rewind(dst);
do {
amt_read = fread(buf, 1, sizeof *buf, src);

amt_read = fread(buf, 1, sizeof buf, src); // ;-)
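
To spell out why that one character matters: for an array `char buf[N]`, `sizeof buf` is N, while `sizeof *buf` is the size of a single char, which is always 1. A tiny sketch, with DEMO_BUF_SIZE as a made-up stand-in for whichever buffer constant is used:

```c
#include <stddef.h>

enum { DEMO_BUF_SIZE = 4096 };  /* hypothetical; stands in for BUFSIZ or DISK_BUF_SIZE */

/* What the version with the typo passes to fread as the element count. */
size_t count_with_typo(void)
{
    char buf[DEMO_BUF_SIZE];
    return sizeof *buf;         /* size of one char */
}

/* What the corrected version passes. */
size_t count_corrected(void)
{
    char buf[DEMO_BUF_SIZE];
    return sizeof buf;          /* size of the whole array */
}
```

So with the typo, each fread call asks the library for a single byte, and the loop runs once per byte instead of once per buffer.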
 
Joe Wright

mike3 said:
But basically with exclusively the C language, and not resorting
to anything outside that this is as fast as it gets, then. Alright.
For what it's worth, the following C program..

/* Simple file copy */
#include <stdio.h>
#include <stdlib.h>

void usage(void) {
    puts("Usage: cpy <file1> <file2>");
}

int main(int argc, char *argv[]) {
    FILE *in, *out;
    size_t size;
    char *buff;
    if (argc != 3) usage();
    if ((in = fopen(argv[1], "rb")) == NULL)
        printf("Can't open %s\n", argv[1]), exit(1);
    if ((out = fopen(argv[2], "wb")) == NULL)
        printf("Can't make %s\n", argv[2]), exit(1);
    if ((buff = malloc(BUFSIZ)) == NULL)
        puts("Can't allocate memory"), exit(1);
    while ((size = fread(buff, 1, BUFSIZ, in)))
        fwrite(buff, 1, size, out);
    fclose(in);
    fclose(out);
    return 0;
}

...takes 3 to 5 seconds on my kit, exactly the same as COPY.
 
Joe Wright

Joe said:
mike3 said:
But basically with exclusively the C language, and not resorting
to anything outside that this is as fast as it gets, then. Alright.
For what it's worth, the following C program..

/* Simple file copy */
#include <stdio.h>
#include <stdlib.h>

void usage(void) {
puts("Usage: cpy <file1> <file2>");
}

int main(int argc, char *argv[]) {
FILE *in, *out;
size_t size;
char *buff;
if (argc != 3) usage();
if ((in = fopen(argv[1], "rb")) == NULL)
printf("Can't open %s\n", argv[1]), exit(1);
if ((out = fopen(argv[2], "wb")) == NULL)
printf("Can't make %s\n", argv[2]), exit(1);
if ((buff = malloc(BUFSIZ)) == NULL)
puts("Can't allocate memory"), exit(1);
while ((size = fread(buff, 1, BUFSIZ, in)))
fwrite(buff, 1, size, out);
fclose(in);
fclose(out);
return 0;
}

..takes 3 to 5 seconds on my kit, exactly the same as COPY.
I see at least one error, I don't exit() from usage(). Sorry.
 
Roland Pibinger

I see at least one error, I don't exit() from usage(). Sorry.

You don't free the malloc-ed memory (why malloc at all?). More
important, you don't check for fread, fwrite and fclose errors (but
return 0 (EXIT_SUCCESS)).
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Absolutely!
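
One way the checks listed above might look, as a sketch only: the enum and the name copy_checked are invented for illustration, and a real program would probably also want to report errno or remove a half-written destination file. The buffer is a plain array, so there is nothing to free.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum copy_status { COPY_OK = 0, COPY_OPEN_FAIL, COPY_READ_FAIL, COPY_WRITE_FAIL };

/* Copy src_path to dst_path, checking fopen, fread, fwrite and fclose. */
enum copy_status copy_checked(const char *src_path, const char *dst_path)
{
    char buf[BUFSIZ];
    size_t n;
    enum copy_status status = COPY_OK;
    FILE *in, *out;

    in = fopen(src_path, "rb");
    if (in == NULL)
        return COPY_OPEN_FAIL;
    out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return COPY_OPEN_FAIL;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = COPY_WRITE_FAIL;
            break;
        }
    }
    /* fread returning 0 means either end of file or an error */
    if (status == COPY_OK && ferror(in))
        status = COPY_READ_FAIL;

    fclose(in);
    /* fclose flushes buffered output, so a failure here is a write failure */
    if (fclose(out) != 0 && status == COPY_OK)
        status = COPY_WRITE_FAIL;
    return status;
}
```

A main() built on this could return EXIT_SUCCESS only when copy_checked returns COPY_OK, and EXIT_FAILURE otherwise.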
 
mike3

I have an interesting problem. The C program presented below
takes around 12 seconds to copy 128 MB of data on my
machine. Yet I know the machine can go faster since a copying
done at a DOS prompt takes only 4 seconds to do the same
128 MB copy (and a copy is both a read and write.). So why is
this thing like 3x worse, anyway?
/* Copy len digits of b starting at b->StartOffs to
* a starting at a->StartOffs.
*/
int DiskInt_Copy(DiskInt *a, DiskInt *b, long len)
{
long DigitsRemaining, BufferLen, MaxBuffer;
/* Set up lengths */
DigitsRemaining = len;
MaxBuffer = DISK_BUF_SIZE/sizeof(DIGIT);
BufferLen = MaxBuffer;
if(DigitsRemaining < 0) return(ERR_SUCCESS);
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;
/* Copy digits */
DiskInt_SeekDigit(a, 0);
DiskInt_SeekDigit(b, 0);
while(DigitsRemaining)
{
fread(diskbuf1, BufferLen, sizeof(DIGIT), b->fd);
fwrite(diskbuf1, BufferLen, sizeof(DIGIT), a->fd);
DigitsRemaining -= BufferLen;
if(DigitsRemaining < MaxBuffer)
BufferLen = DigitsRemaining;
}
/* Done! */
DiskInt_SeekDigit(a, 0);
return(ERR_SUCCESS);
}

That's rather ugly and seems suboptimal. I'd write it like this:

/* note: untested, just to show a general idea */
#include <stdio.h>
int copy_file(FILE *src, FILE *dst) {
size_t amt_read, amt_written;
char buf[DISK_BUF_SIZE];

rewind(src);
rewind(dst);
do {
amt_read = fread(buf, 1, sizeof *buf, src);
amt_written = fwrite(buf, 1, amt_read, dst);
if ((amt_read!=amt_written)||ferror(dst))
return ERR_WRITEFAIL;
if (ferror(src))
return ERR_READFAIL;
} while (!feof(src));

return ERR_SUCCESS;

}

Of course, if you don't want to copy the entire thing, you'll need to add a
bit more logic, but then it wouldn't be a fair test against the "COPY"
command.

Anyway, this doesn't seem to go any faster, and "COPY"
now seems to take 13 seconds, so I guess I was shooting
for "phantom" speed improvements, when really my HD
caching was probably creating the _appearance_ of
added speed.
<OT>
Still, I bet a non-portable version would be faster since most OSes provide
an API call to do it in one operation, like CopyFile() on Windows and
sendfile() frequently found on POSIX systems. Simply eliminating hundreds
of transitions back and forth between kernel and user modes (and copying the
buffer between address spaces) will speed things up quite a bit.


Why open the source file with write permission? You're not writing to it.

Because in the full program, this thing is supposed
to be used as part of a disk-based multiprecision
integer/fixed point arithmetic package, designed for
calculating pi to tens of millions of base 26 digits
(A...Z, instead of 0...9).

So hence you need to be able to both read and write
to the integers, right?

What's with "DIGIT"? If you're just copying files, it doesn't matter.

DIGIT is what the data on disk represents. Namely,
the digits of the multiprecision integer, stored
in base 456,976 (26 to the 4th power -- 4 digits
per DIGIT).
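
To make the arithmetic concrete: 26^4 = 456,976, so four letters A...Z fit in one DIGIT with values 0 to 456,975. The sketch below shows one possible packing. The function names, the most-significant-letter-first order, and the reliance on 'A'..'Z' being contiguous (true in ASCII) are all assumptions for illustration; the actual package may lay the digits out differently.

```c
#include <stddef.h>

typedef unsigned long DIGIT;        /* as described in the thread */

#define DIGIT_BASE 456976UL         /* 26 * 26 * 26 * 26 */

/* Pack four letters 'A'..'Z' into one DIGIT, first letter most significant
 * (an assumed ordering). Returns 0 on success, -1 on a non-letter input. */
int pack_digit(const char letters[4], DIGIT *out)
{
    DIGIT value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (letters[i] < 'A' || letters[i] > 'Z')
            return -1;
        value = value * 26 + (DIGIT)(letters[i] - 'A');
    }
    *out = value;
    return 0;
}

/* Inverse of pack_digit; value must be less than DIGIT_BASE.
 * Writes exactly four letters and no terminating '\0'. */
void unpack_digit(DIGIT value, char letters[4])
{
    int i;

    for (i = 3; i >= 0; i--) {
        letters[i] = (char)('A' + (int)(value % 26));
        value /= 26;
    }
}
```

Under these assumptions "ABCD" packs to 0*17576 + 1*676 + 2*26 + 3 = 731, and "ZZZZ" packs to 456,975, the largest value a DIGIT holds in this base.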
 
Stephen Sprunk

Roland Pibinger said:
I'd write it like this:

/* note: untested, just to show a general idea */
#include <stdio.h>
int copy_file(FILE *src, FILE *dst) {
size_t amt_read, amt_written;
char buf[DISK_BUF_SIZE];

char buf[BUFSIZ];

*shrug* Call the constant what you want.
amt_read = fread(buf, 1, sizeof buf, src); // ;-)

Oops, that typo is important. That's why the disclaimer at the top :)

S
 
