copying files

Hans Vlems

I'm maintaining large numbers of Adobe Reader files (.pdf). One of my
programs, written in C (gcc 4.4.4), must make copies of these pdf files
between different filesystems.
There is AFAIK no library function that does this, which leaves me two
options:
1- use the console interface, i.e. build a command string and pass
this to system().
2- open the file, copy the contents and close the target.
I'd rather avoid option 1 because system() runs out of control of my
program.
My question is: what read and write functions are best suited to copy
the (binary) pdf files?
Performance is not the main objective, but I want to be sure that the
copy finished successfully and accurately.
Hans
 
Malcolm McLean

/*
  Untested code

  return -2 can't open input, -3 can't open output, -1 read/write
  error (probably hardware problems).
*/
#include <stdio.h>

int copy(const char *dest, const char *source)
{
  FILE *fpin;
  FILE *fpout;
  int err;
  int ch;

  fpin = fopen(source, "rb2");
  if(!fpin)
    return -2;
  fpout = fopen(dest, "wb");
  if(!fpout)
  {
    fclose(fpin);
    return -3;
  }
  while( (ch = fgetc(fpin)) != EOF)
  {
    err = fputc(ch, fpout);
    if(err == EOF)
      goto error_exit;
  }
  /* if EOF was generated by a read error instead of end of file,
     feof is false */
  if(!feof(fpin))
    goto error_exit;
  fclose(fpin);
  /* we need to check that fclose flushes data to destination
     correctly; both streams are closed at this point, so return
     directly instead of jumping to error_exit (which would
     fclose them a second time) */
  err = fclose(fpout);
  if(err == EOF)
    return -1;
  return 0;

  /* read/write error */
error_exit:
  fclose(fpin);
  fclose(fpout);
  return -1;
}
 
Keith Thompson

I'd just invoke the OS's command to copy the files ("cp" on
Unix-like systems, "copy" on Windows, probably something else on
other systems). It's likely to be at least as fast as anything
you write yourself, and it may preserve metadata (permissions,
etc.) that you're not going to be able to handle in your own code
without considerable difficulty.

I'm not sure why system running "out of control" of your program
should be an issue; can you elaborate?
 
JohnF

I'm guessing you're already aware of what Malcolm suggested
in the preceding followup, and it's not adequate. And since I've
read your posts in comp.os.vms, I'm also guessing we're
talking about an ods-2/5 filesystem at one end, and maybe a
linux ext3, or whatever, at the other. Could you elaborate on
that a little? And does linux have some ods-2/5 support so you
can mount a vms disk? I wasn't aware of that. If you can indeed
just mount it, then by all means try Malcolm's suggestion and
see what happens. Should just work if the ods filesystem support
is any good. Otherwise, how are you intending to even access
the disk? DECnet for Linux (note that it's no longer being
very actively supported)? I think that would be the driving
question that dictates an appropriate answer to your question.
So you need to supply all that additional info first.
By the way, I usually just ftp zipped files back and forth
between vms and linux boxes on my soho lan. Despite your
"out of control" issue, I'd just write a script (using C's
system() if you want the script in C) to do the job, unless
security's some really, really significant issue for your
situation.
 
Nobody

I'm maintaining large numbers of Adobe Reader files (.pdf). One of my
programs, written in C (gcc 4.4.4), must make copies of these pdf files
between different filesystems.
There is AFAIK no library function that does this, which leaves me two
options:
1- use the console interface, i.e. build a command string and pass
this to system().

Avoid system() unless executing a "canned" command supplied by the user.
If you need to spawn a child process with specific arguments, use fork()
and exec*() rather than attempting to construct a shell command.
2- open the file, copy the contents and close the target

First, you need to decide what you mean by "copy". Part of the reason that
there isn't a library function is that there isn't a single obvious
definition of what it means to copy a file. Two plausible choices are:

1. open() the destination, write the contents of the source to it, close()
it.

2. open() a temporary file in the same directory as the destination,
write the contents of the source to it, close() it, rename() it over the
original.

The two alternatives have many differences, including (but not limited to):

1. If there are multiple hard links to the destination, #1 will leave all
links intact, and all will refer to the modified file. #2 will break one
specific hard link, causing the filename to point to a new file; the
others will still refer to the original file.

2. If the destination file is open in some other process, #1 will cause
the process to immediately see the new contents, while #2 will only affect
processes which open() the file after the rename() has occurred.

3. #1 requires that you have write permission on the destination file if
it exists, or write permission on the directory if it doesn't. #2 requires
that you have write permission on the directory (the file's permissions
don't matter); if the destination exists and the directory has the sticky
bit set, you must own the file (or be root).

4. #1 won't affect the owner, group or permissions of an existing file. #2
will create a new file with your uid, primary gid and umask.

5. If the destination exists and is a device (block or character) or FIFO,
#1 will open() it and write to it. #2 will replace it with a file.

6. If the destination exists and is a symlink, #1 will open() it (i.e.
open its target) and write to it. #2 will replace it with a file.

Note that the Unix "cp" command is similar to option #1, except that if
the file exists but open()ing it fails and the "-f" flag is used, it
attempts to unlink() it then, if that succeeds, proceeds as if the file
didn't exist.
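A rough POSIX sketch of the `cp -f` fallback just described, for comparison with option #1. The name `open_dest_force` is hypothetical, and the real `cp` handles more cases than this; the sketch only shows the open / unlink / retry order.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper mirroring "cp -f": open dest for writing; if that
   fails, unlink() it and, if the unlink succeeds, try again as if the
   file had never existed. Returns a file descriptor, or -1 on failure. */
int open_dest_force(const char *dest)
{
    int fd = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd >= 0)
        return fd;                 /* normal case: behaves like option #1 */
    if (unlink(dest) != 0)
        return -1;                 /* can't remove it either: give up */
    /* The old file is gone, so this creates a new one owned by us
       and with permissions filtered through the current umask. */
    return open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0666);
}
```

For a non-root user, a read-only destination in a writable directory makes the first open() fail; the unlink() then succeeds and the retry creates a fresh file, which is why the result no longer keeps the old file's owner or permissions.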
I'd rather avoid option 1 because system() runs out of control of my
program.
My question is: what read and write functions are best suited to copy
the (binary) pdf files?
Performance is not the main objective, but I want to be sure that the
copy finished successfully and accurately.

For robustness, choose option #2 above. If writing the file fails,
remove() the temporary file rather than rename()ing it. The original file
will be left intact.
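A minimal sketch of option #2 in standard C, assuming a hypothetical `dest` + ".tmp" naming scheme for the temporary file (a real program should pick a name that cannot collide with an existing file). The `rename()` here moves the temporary copy, not the source, so the source file is never touched. Standard C leaves it implementation-defined whether `rename()` may replace an existing file: POSIX systems replace it, while Microsoft's C runtime fails if the new name already exists, so on Windows the old destination would have to be removed first, losing the atomic replacement.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into a temporary file next to dest, then rename() it over
   dest. The ".tmp" suffix is a hypothetical naming scheme.
   Returns 0 on success, -1 on any failure (dest is left unchanged). */
int copy_via_temp(const char *dest, const char *src)
{
    char tmpname[FILENAME_MAX];
    unsigned char buf[8192];
    FILE *in, *out;
    size_t n;
    int ok = 1;

    if (strlen(dest) + sizeof ".tmp" > sizeof tmpname)
        return -1;                      /* name would not fit */
    strcpy(tmpname, dest);
    strcat(tmpname, ".tmp");

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(tmpname, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n) { ok = 0; break; }
    if (ferror(in))
        ok = 0;                         /* stopped on a read error, not EOF */
    fclose(in);
    if (fclose(out) == EOF)
        ok = 0;                         /* final flush failed */

    /* Only replace dest if every step succeeded; otherwise discard
       the partial copy so the original dest stays intact. */
    if (!ok || rename(tmpname, dest) != 0) {
        remove(tmpname);
        return -1;
    }
    return 0;
}
```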

Performance-wise, mmap()ing the source and write()ing directly from the
mmap()d region eliminates a copy. mmap()ing both source and destination
and memcpy()ing may or may not provide any additional benefit.
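A minimal POSIX-only sketch of the mmap() approach: map the source read-only and write() straight from the mapping, so the data is not first copied into a user-space buffer. The name `copy_mmap` is hypothetical. It relies on <sys/mman.h>, so it doesn't directly apply to the Windows setup described later in the thread. An empty source is handled separately because mmap() rejects a zero length.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure. */
int copy_mmap(const char *dest, const char *src)
{
    struct stat st;
    int in, out, rc = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    out = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (st.st_size > 0) {                    /* mmap() rejects length 0 */
        size_t len = (size_t)st.st_size;
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (map == MAP_FAILED) {
            rc = -1;
        } else {
            const char *p = map;
            size_t done = 0;
            while (done < len) {             /* write() may be partial */
                ssize_t w = write(out, p + done, len - done);
                if (w < 0) { rc = -1; break; }
                done += (size_t)w;
            }
            munmap(map, len);
        }
    }
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```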
 
Hans Vlems

John,
your investigating powers are impressive! Unfortunately they've led
you into a dead end street...
On a VMS system I wouldn't have had the need to ask a question. VMS
has an IO subsystem (RMS) and a neatly documented API.
And I doubt I'd have used C to solve this problem ;-) since I have a
choice of at least 4 other languages that I'm more
comfortable with...
The project I'm involved in runs on a Windows platform, on Citrix
servers more precisely, and I have _no_ privileges on these
systems. The reason I use the (old) DJGPP compiler is that it doesn't
need a Windows install process that uses the registry.
The command line interface on Windows doesn't even come close to what
DCL has to offer. But I digress.

I want to copy pdf files from one Windows disk to another, so the
rename() function is useless. Next, I must retain the original file,
which is another reason why rename() won't do.
C has a choice of functions to read from and write to disk files. I
want to be sure that all content gets copied, unaltered and without
inflating the file too much. One option is to read the input file one
byte at a time and write it until EOF is signalled.
Or read blocks, say 1 kB, and write them. Probably faster, but that may
have other drawbacks I'm not aware of.
The original post was written with this in mind, and that was perhaps
not too smart.

Hans
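One way to check the "unaltered" part is to re-read both files after the copy and compare them block by block. A minimal sketch with a hypothetical `same_contents` helper; note that the re-read may be served from the operating system's cache rather than from the disk, so a match shows the data made it through the C library and the OS, not necessarily onto the physical disk.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the two files have identical contents, 0 if they differ,
   -1 on an open or read error. */
int same_contents(const char *a_name, const char *b_name)
{
    unsigned char a_buf[4096], b_buf[4096];
    FILE *a = fopen(a_name, "rb");
    FILE *b = fopen(b_name, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -1;
    }
    for (;;) {
        /* For regular files fread() only returns a short count at EOF
           or on error, so equal files give equal counts every time. */
        size_t na = fread(a_buf, 1, sizeof a_buf, a);
        size_t nb = fread(b_buf, 1, sizeof b_buf, b);
        if (na != nb || memcmp(a_buf, b_buf, na) != 0) {
            result = 0;              /* different length or different bytes */
            break;
        }
        if (na < sizeof a_buf)       /* short read: end of file or error */
            break;
    }
    if (ferror(a) || ferror(b))
        result = -1;
    fclose(a);
    fclose(b);
    return result;
}
```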
 
Hans Vlems

Malcolm,
thanks for the example. Copying the input file one byte per read
(fgetc) operation may not be the fastest way, but it does not inflate
the destination file size (we pay for disk storage here).
Hans
 
Hans Vlems

[...]>   fpin = fopen(source, "rb2");

[...]

Is "rb2" a typo?

--
Keith Thompson (The_Other_Keith) (e-mail address removed)  <http://www.ghoti.net/~kst>
    Will write code for food.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"

Possibly, but setting the proper file mode is not my main concern.
Hans
 
Kleuske

I'm maintaining large numbers of Adobe Reader files (.pdf). One of my
programs, written in C (gcc 4.4.4), must make copies of these pdf files
between different filesystems. There is AFAIK no library function that
does this, which leaves me two options:
1- use the console interface, i.e. build a command string and pass this
to system().

Don't. It's ineffective and may open up your system to abuse.

2- open the file, copy the contents and close the target.
I'd rather avoid option 1 because system() runs out of control of my program.
My question is: what read and write functions are best suited to copy the
(binary) pdf files?

Try fopen, fread, fwrite and fclose. Use a big buffer, since PDFs (especially
with graphics) tend to be big.
Performance is not the main objective, but I want to be sure that the
copy finished successfully and accurately. Hans

Check for error codes.
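A minimal sketch of that advice: fread()/fwrite() with a large buffer, checking every call's result, including fclose() on the output, which flushes the last buffered block. The name `copy_file`, the 64 KiB buffer and the return codes are illustrative choices, not from the thread. The buffer is `static` to keep it off the stack, which makes the function non-reentrant.

```c
#include <stdio.h>

/* Copy source to dest in binary mode.
   Returns 0 on success, -1 if either file can't be opened,
   -2 on a read error, -3 on a write or close error. */
int copy_file(const char *dest, const char *source)
{
    static unsigned char buf[64 * 1024];
    FILE *in, *out;
    size_t n;
    int rc = 0;

    in = fopen(source, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dest, "wb");
    if (out == NULL) {
        fclose(in);                /* don't leak the stream we did open */
        return -1;
    }

    /* Write exactly the number of bytes fread() returned, so the copy
       is never padded up to a multiple of the buffer size. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -3;               /* short write: disk full, I/O error, ... */
            break;
        }
    }
    if (rc == 0 && ferror(in))
        rc = -2;                   /* fread() stopped on an error, not EOF */

    fclose(in);
    /* fclose() flushes the remaining buffered data; if it fails, the
       tail of the file may not have been written. */
    if (fclose(out) == EOF && rc == 0)
        rc = -3;
    return rc;
}
```

Compared with a byte-at-a-time fgetc()/fputc() loop, the output is byte-for-byte the same; the buffer size only changes how many library calls are made.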
 
Hans Vlems

Keith, the system(s) we have to work with are connected to disks in a
way that seriously affects performance.
I've seen a file copy take a little over one minute for a file of
approx. 2 MB. A copy may also just fail.
The function described in the OP may be used for several files, say
about 20, all at least 1 MB in size.
That is certainly not impressive by today's standards and shouldn't be
a challenge for the underlying hardware.
Unfortunately, these things do fail occasionally, and when a copy fails,
system() does not signal that failure.
I'd rather know about it and hence the desire to perform the copy in
my own code. Malcolm's example demonstrates
the various possibilities to signal an error situation.
Hans
 
JohnF

John,
your investigating powers are impressive! Indeed they are!
Unfortunately they've led you into a dead end street...
Indeed they do!
On a VMS system I wouldn't have had the need to ask a question. VMS
has an IO subsystem (RMS) and a neatly documented API.
And I doubt I'd have used C to solve this problem ;-) since I have a
choice of at least 4 other languages that I'm more
comfortable with...
The project I'm involved in runs on a Windows platform, on Citrix
servers more precisely, and I have _no_ privileges on these
systems. The reason I use the (old) DJGPP compiler is that it doesn't
need a Windows install process that uses the registry.
The command line interface on Windows doesn't even come close to what
DCL has to offer. But I digress.

I want to copy pdf files from one Windows disk to another, so the
rename() function is useless. Next, I must retain the original file,
which is another reason why rename() won't do.
C has a choice of functions to read from and write to disk files. I
want to be sure that all content gets copied, unaltered and without
inflating the file too much. One option is to read the input file one
byte at a time and write it until EOF is signalled.
Or read blocks, say 1 kB, and write them. Probably faster, but that may
have other drawbacks I'm not aware of.
The original post was written with this in mind and that was perhaps
not too smart.
Hans

Yeah, I've used djgpp and mingw to compile C programs on Windows.
I think I'd recommend mingw if that works for you (I say "think"
because I can't recall >>why<< I prefer it). Anyway, when you say
>>no<< privileges, I assume your program can read and write files,
i.e., whoever's running it has whatever's necessary to do that.
In that case, block reads and writes are fine. I've done that
and it works okay for me. Of course, you should try it yourself,
since only God knows what'll happen in your particular situation.
But I think you can safely start with something of the form...

#include <stdio.h>

int     fcopy( char *infile, char *outfile ) {
FILE    *inptr = fopen(infile,"rb"),  /*open file for binary read*/
        *outptr = fopen(outfile,"wb");        /*and write*/
unsigned char buff[256];                /*block of bytes from infile*/
int     buflen=255, nread=0,nwrite=0,   /*#bytes we try to read/write*/
        nrw = 0;                        /*total bytes read/written*/
if ( inptr!=NULL && outptr!=NULL ) {    /*have opened files*/
  while ( 1 ) {                         /*read & write them till eof*/
    /* --- read bytes from infile --- */
    nread = fread(buff,sizeof(unsigned char),buflen,inptr); /*read*/
    if ( nread < 1 ) break;          /* no bytes left in file */
    /* --- write bytes to outfile --- */
    nwrite = fwrite(buff,sizeof(unsigned char),nread,outptr); /*write*/
    if ( nwrite != nread ) { nrw=(-1); break; } /*problem writing*/
    nrw += nwrite;                      /*total #bytes*/
    if ( nread < buflen ) break;     /* no bytes left in file */
    } /* --- end-of-while(1) --- */
  fclose(inptr); fclose(outptr);        /* close files */
  } /* --- end-of-if(fileptrs!=NULL) --- */
return ( nrw );                         /*back to caller with file size*/
} /* --- end-of-function fcopy() --- */

...which I've snipped from some code that works (with a few
essentially cosmetic changes so it reads okay as a code fragment).
 
Stefan Ram

Keith Thompson said:
I'd just invoke the OS's command to copy the files ("cp" on
Unix-like systems, "copy" on Windows, probably something else on
other systems). It's likely to be at least as fast as anything

I'd invoke the OS call to copy the file, on Windows it's
»CopyFile«. For example, from one of my programs:


#include <windows.h>
#include <tchar.h>

....

int filecopy( LPTSTR const target, LPTSTR const source )
{ BOOL const success = CopyFile( source, target, TRUE );
int const terminate = success ? 1 : error5();
return terminate; }

....
 
Hans Vlems

OK, I tried both John's and Malcolm's solutions and they both work
fine, as expected.
There's very little difference in performance between the two. Both
copy 2 MB files instantly: the user hits return and immediately gets
a result printed.
Thanks for your time!

Hans
 
Hans Vlems

Don't. It's ineffective and may open up your system to abuse.


Try fopen, fread, fwrite and fclose. Use a big buffer, since PDFs (especially
with graphics) tend to be big.


Check for error codes.

Given the quality and performance of the disks on the systems we've
got to work with, error checking is my main objective here.
Performance is not my main concern right now.
However, once the hardware problems get solved, performance may just
become user issue #1 again.
Hans
 
Hans Vlems

The reason I use djgpp is that it is *very* simple to set up: unpack a
zip file and that's it.
Much later I came across mingw and that proved not as easy to set up.
Since I don't write code that is so subtle that it takes a very
refined compiler, I think that gcc 4.4.4 is quite all right.

Hans
 
Ike Naar

int     fcopy( char *infile, char *outfile ) {
FILE    *inptr = fopen(infile,"rb"),  /*open file for binary read*/
        *outptr = fopen(outfile,"wb");        /*and write*/
unsigned char buff[256];                /*block of bytes from infile*/
int     buflen=255, nread=0,nwrite=0,   /*#bytes we try to read/write*/
        nrw = 0;                        /*total bytes read/written*/
if ( inptr!=NULL && outptr!=NULL ) {    /*have opened files*/
  while ( 1 ) {                         /*read & write them till eof*/
    /* --- read bytes from infile --- */
    nread = fread(buff,sizeof(unsigned char),buflen,inptr); /*read*/

sizeof(unsigned char) is 1 by definition.
Is there a particular reason why you are reading one byte less than
the size of the buffer? It seems more logical to use the entire buffer:

    nread = fread(buff, 1, sizeof buff, inptr);

    if ( nread < 1 ) break;          /* no bytes left in file */
    /* --- write bytes to outfile --- */
    nwrite = fwrite(buff,sizeof(unsigned char),nread,outptr); /*write*/
    if ( nwrite != nread ) { nrw=(-1); break; } /*problem writing*/
    nrw += nwrite;                      /*total #bytes*/
    if ( nread < buflen ) break;     /* no bytes left in file */

    if (nread < sizeof buff) break;

    } /* --- end-of-while(1) --- */
  fclose(inptr); fclose(outptr);        /* close files */
  } /* --- end-of-if(fileptrs!=NULL) --- */
return ( nrw );                         /*back to caller with file size*/

Potential file descriptor leak: if one of the files could be opened
but the other couldn't, the opened file is not closed.
Also, in this case 0 is returned, so from the return value you
cannot distinguish between having fopen errors, and having completed
a successful copy of an empty file.
 
Ben Bacarisse

Hans Vlems said:
Keith, the system(s) we have to work with are connected to disks in a
way that seriously affects performance.
I've seen a file copy take a little over one minute for a file of
approx. 2 MB. A copy may also just fail.
The function described in the OP may be used for several files, say
about 20, all at least 1 MB in size.
That is certainly not impressive by today's standards and shouldn't be
a challenge for the underlying hardware.
Unfortunately, these things do fail occasionally, and when a copy fails,
system() does not signal that failure.
I'd rather know about it and hence the desire to perform the copy in
my own code. Malcolm's example demonstrates
the various possibilities to signal an error situation.

Given your concerns about safety, you may well be working at the wrong
level. It's perfectly possible for a C program (using standard C IO) to
signal success without a single byte of data hitting the disk.

If your concern for a safe copy is indeed paramount, you will have to ask
about OS-level facilities in a suitable group. Both the OS and the
file-system types that are involved in the copy may have a bearing on
how to get maximum safety.
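On a POSIX system, one such OS-level facility is fsync(), which asks the kernel to write a file's data to the storage device; fflush() alone only hands the data from the C library's buffer to the OS. A minimal sketch with a hypothetical `close_durably` helper, assuming a POSIX system (the feature-test macro makes fileno() and fsync() visible). Even fsync() is only as good as the OS, the file system and the disk's own write cache allow; on Windows the rough counterparts are `_commit()` in the Microsoft C runtime or `FlushFileBuffers()` in the Win32 API.

```c
#define _XOPEN_SOURCE 700   /* expose fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Flush a stdio stream through the OS to the device, then close it.
   Returns 0 on success, EOF if any step failed. */
int close_durably(FILE *fp)
{
    int rc = 0;
    if (fflush(fp) == EOF)                 /* stdio buffer -> kernel */
        rc = EOF;
    else if (fsync(fileno(fp)) != 0)       /* kernel -> storage device */
        rc = EOF;
    if (fclose(fp) == EOF)                 /* always release the stream */
        rc = EOF;
    return rc;
}
```

In a copy routine this would replace the final fclose() on the destination stream, so that a "success" return means the OS has at least reported the data as written to the device.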
 
Keith Thompson

Nobody said:
Avoid system() unless executing a "canned" command supplied by the user.
If you need to spawn a child process with specific arguments, use fork()
and exec*() rather than attempting to construct a shell command.
[...]

This, as well as the rest of your response, relies heavily on the
assumption that the OP is on a Unix-like system.
 
Keith Thompson

Hans Vlems said:
[...]>   fpin = fopen(source, "rb2");

[...]

Is "rb2" a typo?

Possibly, but setting the proper file mode is not my main concern.

If you mean that it's a trivial thing to correct and you're not going to
get it wrong, that's fine. If you mean that it doesn't matter, it most
certainly does.
 
