Create a Copy of Files

Hans

I'm creating a utility program to create empty text files using the
filenames of PDFs, but with the case--upper and lower--preserved. I need
a duplicate file with zero footprint on the HD for indexing purposes.
Can anybody help me with this? I managed to write something in another
language, but it came out all in lowercase so I'm looking into C.

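Example
----------------------------
Input: 0763704814 - C++ Plus Data Structures, 3ed (Jones and Bartlett-2003).pdf
Output: 0763704814 - C++ Plus Data Structures, 3ed (Jones and Bartlett-2003).pdf.txt
----------------------------

And I also want to move the ISBN to the end.
----------------------------
Input: 0763704814 - C++ Plus Data Structures, 3ed (Jones and Bartlett-2003).pdf
Output: C++ Plus Data Structures, 3ed (Jones and Bartlett-2003) - 0763704814.pdf.txt
----------------------------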
 
CBFalconer

Hans said:
I'm creating a utility program to create empty text files using the
filenames of PDFs, but with the case--upper and lower--preserved. I need
a duplicate file with zero footprint on the HD for indexing purposes.
Can anybody help me with this? I managed to write something in another
language, but it came out all in lowercase so I'm looking into C.

A .pdf file is NOT a text file. It is a binary file. Text files
are easily created with any text file editor.
 
Hans

CBFalconer said:
A .pdf file is NOT a text file. It is a binary file. Text files
are easily created with any text file editor.

Yes, I know that.

The scenario is this. I download a lot of ebooks--PDF's, CHM's,
DJVU's, etc. I want to create a separate index using the operating
system's folders as a kind of catalog and store the ebooks elsewhere.
My solution is to create proxy files for the ebooks (i.e., empty text
files with the same names as the original). The text files now become
like catalog cards in the library. One "catalog card" (text file) will
have the ISBN in front and another has the book name in front.

I need to do the creation of the files in bulk because I download like
hundreds of ebooks per day. So, the cataloging part is really slowing
me down.
 
Keith Thompson

CBFalconer said:
A .pdf file is NOT a text file. It is a binary file. Text files
are easily created with any text file editor.

He knows that. If you had read the rest of the article, or even the
part you quoted, you would understand that. He wants to create an
empty text file whose name is based on the name of an existing PDF
file.

If you're going to snip quoted text, can you please do us all the
simple courtesy of marking it somehow? "[snip]" and "[...]" are
commonly used markers.
 
Spiros Bousbouras

Hans said:
I'm creating a utility program to create empty text files using the
filenames of PDFs, but with the case--upper and lower--preserved. I need
a duplicate file with zero footprint on the HD for indexing purposes.
Can anybody help me with this? I managed to write something in another
language, but it came out all in lowercase so I'm looking into C.

Example
----------------------------
Input: 0763704814 - C++ Plus Data Structures, 3ed (Jones and
Bartlett-2003).pdf
Output: 0763704814 - C++ Plus Data Structures, 3ed (Jones and
Bartlett-2003).pdf.txt
---------------------------

And I also want to move the ISBN to the end.
----------------------------
Input: 0763704814 - C++ Plus Data Structures, 3ed (Jones and
Bartlett-2003).pdf
Output: C++ Plus Data Structures, 3ed (Jones and Bartlett-2003) -
0763704814.pdf.txt

Here's an approximation to what you want. You would
have to add more error checks based on the exact
format of your input.

#include <ctype.h>
#include <string.h>
#include <stdio.h>

void transform(const char *s , char *d) {
    size_t i , isbn_end , suffix_beg ;

#define check_for_faulty_input { \
    if ( s[i] == 0 ) { \
        *d = 0 ; \
        return ; \
    } \
}

    /* the name must start with a digit (the ISBN) */
    if ( !isdigit((unsigned char)s[0]) ) {
        *d = 0 ;
        return ;
    }
    /* skip over the ISBN digits */
    i = 0 ;
    while ( isdigit((unsigned char)s[i]) ) {
        check_for_faulty_input
        i++ ;
    }
    isbn_end = i-1 ;
    /* move to the terminating NUL, then back to the last '.' */
    while ( s[i] ) i++ ;
    while ( s[i] != '.' ) {
        if ( i == 0 ) {
            *d = 0 ;
            return ;
        }
        i-- ;
    }
    suffix_beg = i ;
    /* no room for a separator between the ISBN and the suffix */
    if ( suffix_beg < isbn_end+2 ) {
        *d = 0 ;
        return ;
    }
    strncpy(d , s+isbn_end+2 , suffix_beg-isbn_end-2) ;
    d[suffix_beg-isbn_end-2] = ' ' ;
    strncpy(d+(suffix_beg-isbn_end-1) , s , isbn_end+1 ) ;
    strcpy(d+suffix_beg , s+suffix_beg) ;
}

int main(void) {
    char d[1000] ;
    const char *input = "0763704814 - C++"
                        " Plus Data Structures,"
                        " 3ed (Jones and Bartlett-2003).pdf" ;
    transform(input,d) ;
    printf("%s\n%s\n\n\n",input,d) ;
    return 0 ;
}
 
Beej Jorgensen

Hans said:
Input: 0763704814 - C++ Plus Data Structures, 3ed (Jones and
Bartlett-2003).pdf
Output: C++ Plus Data Structures, 3ed (Jones and Bartlett-2003) -
0763704814.pdf.txt

Probably a job for sscanf(), but there are a variety of options. Let's
assume filenames are in the form "ISBN - Title.ext":

#include <stdio.h>
#include <string.h>

int main(void)
{
    char filename[] = "0763704814 - C++ Plus Data Structures, 3ed (Jones and Bartlett-2003).pdf";
    char ext[16], isbn[16], title[512];
    int num;

    num = sscanf(filename, "%15s - %511[^.].%15s", isbn, title, ext);
    if (num == 3) {
        printf("%s - %s.%s.txt\n", title, isbn, ext);
    } else {
        printf("Parse failure\n");
    }

    return 0;
}

To create an empty file, you can just open it and close it, I'm pretty
sure:

FILE *fp;
char newfilename[768];

...

snprintf(newfilename, sizeof newfilename,
         "%s - %s.%s.txt", title, isbn, ext); // no "\n" here: it would end up in the file name

fp = fopen(newfilename, "w"); // don't forget to error-check this
fclose(fp);

There I've used snprintf() to store the new file name in another string
so we could pass it to fopen(). (Use sprintf() if you don't have
snprintf().)
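
The open-and-close idea above, with the error check filled in, might look like the following. This is only a minimal sketch: the name create_empty_file is made up for illustration, and mode "w" truncates a file that already exists, so it assumes overwriting an old card is acceptable.

```c
#include <stdio.h>

/* Hypothetical helper: create (or truncate) a zero-length file.
   Returns 0 on success, -1 on failure. */
int create_empty_file(const char *path)
{
    FILE *fp = fopen(path, "w");   /* "w" creates the file or truncates it */

    if (fp == NULL) {
        perror(path);              /* report why the file could not be created */
        return -1;
    }
    if (fclose(fp) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}
```

A caller would pass the name built with snprintf(), for example create_empty_file(newfilename), and skip to the next file when it returns -1.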

You might also consider a language like Python with better support for
regexes and walking directory trees.

-Beej
 
Hans

Spiros Bousbouras said:
void transform(const char *s , char *d) {
    size_t i , isbn_end , suffix_beg ;

I'm guessing this is the routine to move the ISBN to the end of the
filename. Anyway, I'll check it out and add it to what I've been
working on since this morning.

Thanks a million.
 
Hans

Beej Jorgensen said:
    int main(void)
    {
        char filename[] = "0763704814 - C++ Plus Data Structures, 3ed (Jones and Bartlett-2003).pdf";
        char ext[16], isbn[16], title[512];
        int num;

        num = sscanf(filename, "%15s - %511[^.].%15s", isbn, title, ext);
        if (num == 3) {
            printf("%s - %s.%s.txt\n", title, isbn, ext);
        } else {
            printf("Parse failure\n");
        }

        return 0;
    }

Hmm. This is an elegant solution. I'll try this out as well and get
back to you guys.

Thanks another million.
 
Spiros Bousbouras

Hans said:
I'm guessing this is the routine to move the ISBN to the end of the
filename. Anyway, I'll check it out and add it to what I've been
working on since this morning.

Yes, the aim is to produce the filename you wanted. But
your examples are not enough to show what the possible
input formats are or exactly what the output should look
like. So it's just a guess. It assumes that d points to an
array of char at least as long as the string s points to,
including the terminating NUL.
 
Hans

This is what I have so far. I got the code snippets from another forum
and combined them with Beej Jorgensen's code to move the ISBN to the
last part of the name. Now, I need a way to skip over files other than
PDF, CHM, DJVU, ZIP and RAR.

P.S. Please ignore the way I format the code. I'm not used to very
short variable names.

//----------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

FILE *thisFile;

//---------------------------------------------------------
int main( int argc , char *argv[] ) {
//-------------------------------------------------------
struct dirent *themFiles;
char FileName[512];
char FileType[16];
char Title[512];
char ISBN[16];

int Matched;

DIR *thisFolder;
thisFolder = opendir( "." );

if ( thisFolder ) {

while ( ( themFiles = readdir( thisFolder ) ) != NULL ) {

printf( "%s\n", themFiles->d_name );

//------------------------------------------------------------------------
Matched = sscanf( themFiles->d_name , "%15s - %511[^.].%15s" ,
ISBN , Title , FileType );

if ( Matched == 3 ) {
printf( "%s - %s.%s.txt\n" , Title , ISBN , FileType );
} else {
printf( "Parse failure\n" );
}
//----------------------------------------------------
/* create the file

strcpy( FileName , themFiles->d_name );
strcat( FileName , ".txt" );
if ( ( thisFile = fopen( FileName , "w+" ) ) == NULL )
;
else
printf( "%s\n", FileName );

fclose( thisFile );
//*/

}

closedir( thisFolder );
}

system( "PAUSE" );
return 0;
}
 
Hans

Spiros Bousbouras said:
Yes, the aim is to produce the filename you wanted. But
your examples are not enough to show what the possible
input formats are or exactly what the output should look
like. So it's just a guess. It assumes that d points to an
array of char at least as long as the string s points to,
including the terminating NUL.

I checked out Beej Jorgensen's solution below and it worked well
moving the ISBN to the end part of the name. Anyway, as for the
possible inputs, I'm only concerned with those files with the 10-digit
ISBN at the beginning and I'll be skipping those without.
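
The "skip files without a 10-digit ISBN at the beginning" rule could be checked before any parsing, roughly as sketched here. The function name has_isbn10_prefix is made up, the " - " separator is assumed from the example filenames in this thread, and accepting 'X' as the last character is an extra assumption (an ISBN-10 check digit can be X).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name starts with a 10-character ISBN
   (nine digits, then a digit or 'X') followed by " - ", otherwise 0. */
int has_isbn10_prefix(const char *name)
{
    size_t i;

    for (i = 0; i < 9; i++) {
        if (!isdigit((unsigned char)name[i]))
            return 0;       /* also stops safely at the terminating NUL */
    }
    if (!isdigit((unsigned char)name[9]) && name[9] != 'X')
        return 0;
    /* An eleventh digit, or a missing separator, fails this comparison. */
    return strncmp(name + 10, " - ", 3) == 0;
}
```

Inside the directory loop this would be used as a guard, skipping any d_name for which the function returns 0.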
 
Spiros Bousbouras

Hans said:
//----------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>

If you use something outside of standard C, you need to explain
what it does.
#include <string.h>

FILE *thisFile;

//---------------------------------------------------------
int main( int argc , char *argv[] ) {
//-------------------------------------------------------
struct dirent *themFiles;

Same here. What are the fields of the dirent structure?
char FileName[512];
char FileType[16];
char Title[512];
char ISBN[16];

int Matched;

DIR *thisFolder;
thisFolder = opendir( "." );

if ( thisFolder ) {

while ( ( themFiles = readdir( thisFolder ) ) != NULL ) {

printf( "%s\n", themFiles->d_name );

//------------------------------------------------------------------------
Matched = sscanf( themFiles->d_name , "%15s - %511[^.].%15s" ,
ISBN , Title , FileType );

if ( Matched == 3 ) {
printf( "%s - %s.%s.txt\n" , Title , ISBN , FileType );
} else {
printf( "Parse failure\n" );

If the parse has failed I don't think it makes sense to go
on and create the new file so you probably need a continue
here.
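
A rough sketch of that continue, with a plain array of names standing in for the readdir() loop so the example stays within standard C. The function process_names and its return value are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in for the readdir() loop: names[] plays the role
   of successive d_name values. Returns how many names were parsed. */
size_t process_names(const char *const names[], size_t count)
{
    char isbn[16], title[512], ext[16];
    size_t i, handled = 0;

    for (i = 0; i < count; i++) {
        int matched = sscanf(names[i], "%15s - %511[^.].%15s",
                             isbn, title, ext);
        if (matched != 3) {
            fprintf(stderr, "Parse failure: %s\n", names[i]);
            continue;   /* skip the file-creation step for this name */
        }
        /* Only reached for names that parsed; the card would be created here. */
        printf("%s - %s.%s.txt\n", title, isbn, ext);
        handled++;
    }
    return handled;
}
```

With the continue in place, entries such as "." and ".." fall through to the next iteration instead of reaching the code that creates a file.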
 
Hans

I did it. It's finished. Thanks a million guys.

/*---------------------------------------------------------
Program : eBook Catalog Maker
Author : Hans alias Kanshu / Hanabi
Description : Creates proxy / empty files for ebooks of type
PDF, CHM, DJVU, RAR, ZIP
Date : 05/09/2009 2102H
Notes : Solutions provided by programmers from comp.lang.c
and cprogramming.com
*/

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

FILE *thisCard; // catalog card; proxy file

//---------------------------------------------------------
int main( int argc , char *argv[] ) {
//-------------------------------------------------------
struct dirent *thisFile;

char CatalogCard[768];
char FileType[16];
char Title[512];
char ISBN[16];

char TypePDF[] = "pdf";
char TypeCHM[] = "chm";
char TypeDJVU[] = "djvu";
char TypeZIP[] = "zip";
char TypeRAR[] = "rar";

int Matched;
int i;

DIR *thisFolder;
thisFolder = opendir( "." );

if ( thisFolder ) {

while ( ( thisFile = readdir( thisFolder ) ) != NULL ) {

// typical filename + extension
Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title ,
FileType );

if ( Matched == 2 ) {

// recognize only PDF, CHM, DJVU, ZIP, RAR
for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

if ( ( strstr( FileType , TypePDF ) != NULL ) ||
( strstr( FileType , TypeCHM ) != NULL ) ||
( strstr( FileType , TypeDJVU ) != NULL ) ||
( strstr( FileType , TypeZIP ) != NULL ) ||
( strstr( FileType , TypeRAR ) != NULL ) ) {

// create 1st catalog card; ISBN card
//-----------------------------------------------------------------
if ( strstr( FileType , TypePDF ) != NULL )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s.%s.txt" , Title , FileType );

// NOTE: MQH filetype is the MetaQuotes Language file
// that has a yellow icon in the file explorer
if ( ( strstr( FileType , TypeCHM ) != NULL ) ||
( strstr( FileType , TypeDJVU ) != NULL ) )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s.%s.mqh" , Title , FileType );

if ( ( strstr( FileType , TypeRAR ) != NULL ) ||
( strstr( FileType , TypeZIP ) != NULL ) )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s.%s.zip" , Title , FileType );

if ( ( thisCard = fopen( CatalogCard , "w+" ) ) == NULL )
printf( "Can't create file.\n" );
else
fclose( thisCard );

// create 2nd catalog card; Title + ISBN card
//-----------------------------------------------------------------
Matched = sscanf( thisFile->d_name , "%15s - %511[^.].%15s" ,
ISBN , Title , FileType );

if ( Matched == 3 ) {

for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

if ( strstr( FileType , TypePDF ) != NULL )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s - %s.%s.txt" , Title , ISBN , FileType );

if ( ( strstr( FileType , TypeCHM ) != NULL ) ||
( strstr( FileType , TypeDJVU ) != NULL ) )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s - %s.%s.mqh" , Title , ISBN , FileType );

if ( ( strstr( FileType , TypeRAR ) != NULL ) ||
( strstr( FileType , TypeZIP ) != NULL ) )
snprintf( CatalogCard , sizeof CatalogCard ,
"%s - %s.%s.zip" , Title , ISBN , FileType );

if ( ( thisCard = fopen( CatalogCard , "w+" ) ) == NULL )
printf( "Can't create file.\n" );
else
fclose( thisCard );

}
}
}
}

closedir( thisFolder );
}

system( "PAUSE" );
return 0;
}
 
Ben Bacarisse

Hans said:
This is what I have so far.

Obviously since this is a C group and you asked about C you have had C
answers, but someone should say that it need not be this hard. If
you are using Linux (or other Unix-like system) it can be done in one
command. If not, there must be similar tools that do this sort of
maintenance job. System administrators have to do this sort of thing
all the time.

If you are doing this to learn C then fine, but if not it will pay to
learn how to do this sort of thing on your system.
 
Hans

Hans said:
I did it. It's finished. Thanks a million guys.

/*---------------------------------------------------------
Program     : eBook Catalog Maker
Author      : Hans alias Kanshu / Hanabi
Description : Creates proxy / empty files for ebooks of type
                PDF, CHM, DJVU, RAR, ZIP
Date        : 05/09/2009 2102H
Notes       : Solutions provided by programmers from comp.lang.c
                and cprogramming.com
*/

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

FILE *thisCard;  // catalog card; proxy file

//---------------------------------------------------------
  int main( int argc , char *argv[] ) {
//-------------------------------------------------------
  struct dirent *thisFile;

  char CatalogCard[768];
  char FileType[16];
  char Title[512];
  char ISBN[16];

  char TypePDF[]  = "pdf";
  char TypeCHM[]  = "chm";
  char TypeDJVU[] = "djvu";
  char TypeZIP[]  = "zip";
  char TypeRAR[]  = "rar";

  int Matched;
  int i;

  DIR *thisFolder;
  thisFolder = opendir( "." );

  if ( thisFolder ) {

    while ( ( thisFile = readdir( thisFolder ) ) != NULL ) {

      // typical filename + extension
      Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title ,
FileType );

      if ( Matched == 2 ) {

        // recognize only PDF, CHM, DJVU, ZIP, RAR
        for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

        if ( ( strstr( FileType , TypePDF )  != NULL ) ||
             ( strstr( FileType , TypeCHM )  != NULL ) ||
             ( strstr( FileType , TypeDJVU ) != NULL ) ||
             ( strstr( FileType , TypeZIP )  != NULL ) ||
             ( strstr( FileType , TypeRAR )  != NULL ) ) {

          // create 1st catalog card; ISBN card
          //-----------------------------------------------------------------
          if ( strstr( FileType , TypePDF ) != NULL )
            snprintf( CatalogCard , sizeof CatalogCard ,
                      "%s.%s.txt" , Title , FileType );

          // NOTE: MQH filetype is the MetaQuotes Language file
          //   that has a yellow icon in the file explorer
          if ( ( strstr( FileType , TypeCHM )  != NULL ) ||
               ( strstr( FileType , TypeDJVU ) != NULL ) )
            snprintf( CatalogCard , sizeof CatalogCard ,
                      "%s.%s.mqh" , Title , FileType );

          if ( ( strstr( FileType , TypeRAR ) != NULL ) ||
               ( strstr( FileType , TypeZIP ) != NULL ) )
            snprintf( CatalogCard , sizeof CatalogCard ,
                      "%s.%s.zip" , Title , FileType );

          if ( ( thisCard = fopen( CatalogCard , "w+" ) ) == NULL )
            printf( "Can't create file.\n" );
          else
            fclose( thisCard );

          // create 2nd catalog card; Title + ISBN card
          //-----------------------------------------------------------------
          Matched = sscanf( thisFile->d_name , "%15s - %511[^.].%15s" ,
                            ISBN , Title , FileType );

          if ( Matched == 3 ) {

            for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

            if ( strstr( FileType , TypePDF ) != NULL )
              snprintf( CatalogCard , sizeof CatalogCard ,
                        "%s - %s.%s.txt" , Title , ISBN , FileType );

            if ( ( strstr( FileType , TypeCHM )  != NULL ) ||
                 ( strstr( FileType , TypeDJVU ) != NULL ) )
              snprintf( CatalogCard , sizeof CatalogCard ,
                        "%s - %s.%s.mqh" , Title , ISBN , FileType );

            if ( ( strstr( FileType , TypeRAR ) != NULL ) ||
                 ( strstr( FileType , TypeZIP ) != NULL ) )
              snprintf( CatalogCard , sizeof CatalogCard ,
                        "%s - %s.%s.zip" , Title , ISBN , FileType );

            if ( ( thisCard = fopen( CatalogCard , "w+" ) ) == NULL )
              printf( "Can't create file.\n" );
            else
              fclose( thisCard );

          }
        }
      }
    }

    closedir( thisFolder );
  }

  system( "PAUSE" );
  return 0;

}


Some refinement needed. It's not skipping some text files.
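
A possible cause of the text files not being skipped, offered as a guess rather than a diagnosis: the format "%511[^.].%15s" puts everything after the first dot into FileType, so an existing card such as "Book.pdf.txt" yields "pdf.txt", and strstr() still finds "pdf" inside it. A minimal sketch of an exact, case-insensitive test on the text after the last dot follows; has_extension and equal_ignoring_case are names made up for illustration.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: case-insensitive comparison of two strings,
   returning 1 when they are equal. */
static int equal_ignoring_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same point */
}

/* Returns 1 if the text after the LAST dot in name equals ext exactly. */
int has_extension(const char *name, const char *ext)
{
    const char *dot = strrchr(name, '.');

    if (dot == NULL || dot == name)
        return 0;           /* no dot, or a name like ".hidden" */
    return equal_ignoring_case(dot + 1, ext);
}
```

With this test, "Book.pdf" and "Book.PDF" count as PDFs while "Book.pdf.txt" does not, so cards created by an earlier run would be left alone.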
 
Keith Thompson

Great. I see from your more recent followup that it's still got some
problems. It really wasn't necessary to quote the entire article to
add a one-line response -- but since I'll be commenting on your code
myself, I'll take advantage of it.

You should be aware that <dirent.h> is non-standard. More precisely,
it's not defined by the C standard. I believe it is defined by POSIX.
This shouldn't be a problem for your purposes, but it means that (a)
your program isn't entirely portable (which is probably unavoidable
unless you use some other means to get the list of files into your
program), and (b) questions about those aspects of your program should
be directed elsewhere, probably comp.unix.programmer or some forum
that deals with your implementation.

Standard C has no support for directories.
#include <string.h>

FILE *thisCard;  // catalog card; proxy file

//---------------------------------------------------------
  int main( int argc , char *argv[] ) {
//-------------------------------------------------------
  struct dirent *thisFile;

  char CatalogCard[768];
  char FileType[16];
  char Title[512];
  char ISBN[16];

  char TypePDF[]  = "pdf";
  char TypeCHM[]  = "chm";
  char TypeDJVU[] = "djvu";
  char TypeZIP[]  = "zip";
  char TypeRAR[]  = "rar";

I'd declare all these as const, since you're not going to be changing
them.

Presumably the if ( thisFolder ) test checks whether the opendir() call
succeeded -- but you don't do anything if it fails. An error message
would be good.
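
One way the opendir() failure could be reported, as a sketch only. The wrapper name open_folder is made up, and like the program itself it relies on the POSIX <dirent.h> interface rather than standard C.

```c
#include <dirent.h>   /* POSIX, not standard C */
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: opens a directory, or prints the reason to
   stderr and returns NULL. */
DIR *open_folder(const char *path)
{
    DIR *folder = opendir(path);

    if (folder == NULL)
        fprintf(stderr, "Can't open folder \"%s\": %s\n",
                path, strerror(errno));
    return folder;
}
```

The caller still has to test the result for NULL, but the user now sees why nothing happened.
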
    while ( ( thisFile = readdir( thisFolder ) ) != NULL ) {

      // typical filename + extension
      Matched = sscanf( thisFile->d_name , "%511[^.].%15s" , Title ,
FileType );

511 and 15 are "magic numbers". At the very least, they should be
documented. Better yet, declare them as named constants.
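
A sketch of how the field widths could be named once and reused in the sscanf() format. The macro and function names (TITLE_MAX, EXT_MAX, STR, split_name) are made up for illustration; the two-level macro is the usual preprocessor stringification idiom.

```c
#include <stdio.h>

#define TITLE_MAX 511   /* longest title stored, excluding the NUL */
#define EXT_MAX    15   /* longest extension stored, excluding the NUL */

/* Two-step stringification so the macro's value, not its name, is pasted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: splits "title.ext" at the first dot.
   title must hold TITLE_MAX + 1 chars, ext EXT_MAX + 1 chars.
   Returns 1 on success, 0 otherwise. */
int split_name(const char *name, char *title, char *ext)
{
    /* The format expands to "%511[^.].%15s". */
    return sscanf(name, "%" STR(TITLE_MAX) "[^.].%" STR(EXT_MAX) "s",
                  title, ext) == 2;
}
```

The buffers would then be declared as char Title[TITLE_MAX + 1] and char FileType[EXT_MAX + 1], so the sizes and the format cannot drift apart.
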
      if ( Matched == 2 ) {

        // recognize only PDF, CHM, DJVU, ZIP, RAR
        for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

        if ( ( strstr( FileType , TypePDF )  != NULL ) ||
             ( strstr( FileType , TypeCHM )  != NULL ) ||
             ( strstr( FileType , TypeDJVU ) != NULL ) ||
             ( strstr( FileType , TypeZIP )  != NULL ) ||
             ( strstr( FileType , TypeRAR )  != NULL ) ) {


This screams out for an array and a loop.

Here you're taking slightly different actions depending on the file
extension, but it could still be table-driven. That would make
maintenance easier; if you need to add support for another file
extension later, just update the table.
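
A table-driven version might look roughly like this. The struct and function names are made up; the extension-to-suffix pairs are the ones used in the posted program. Note that this sketch compares with strcmp() for an exact match, which is a deliberate change from the strstr() tests in the original.

```c
#include <stddef.h>
#include <string.h>

/* One row per supported ebook type: the (lowercase) extension and the
   suffix given to its catalog card. */
struct card_rule {
    const char *ext;
    const char *card_suffix;
};

static const struct card_rule rules[] = {
    { "pdf",  "txt" },
    { "chm",  "mqh" },
    { "djvu", "mqh" },
    { "zip",  "zip" },
    { "rar",  "zip" },
};

/* Returns the card suffix for ext, or NULL if the type is not supported.
   ext is assumed to be lowercase already, as in the original program. */
const char *card_suffix_for(const char *ext)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(ext, rules[i].ext) == 0)
            return rules[i].card_suffix;
    }
    return NULL;
}
```

Supporting another file type then means adding one row to rules[]; a NULL result replaces the long chain of strstr() tests as the "skip this file" signal.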

Error messages, such as "Can't create file.", should be written to
stderr, not stdout.
          else
            fclose( thisCard );

          // create 2nd catalog card; Title + ISBN card
          //-----------------------------------------------------------------
          Matched = sscanf( thisFile->d_name , "%15s - %511[^.].%15s" ,
                            ISBN , Title , FileType );

          if ( Matched == 3 ) {

            for( i = 0 ; ( FileType[i] = tolower( FileType[i] ) ) ; i++ );

            if ( strstr( FileType , TypePDF ) != NULL )
              snprintf( CatalogCard , sizeof CatalogCard ,
                        "%s - %s.%s.txt" , Title , ISBN , FileType );


snprintf() is a good way to avoid writing past the end of a buffer,
but if the buffer isn't big enough you just ignore the error. I'm not
sure how you should handle it, but ignoring the issue is rarely the
best solution. (Sometimes it is, but then you should document the
reasons.)
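
One way the truncation case could be detected, sketched under the assumption of a C99-conforming snprintf() (which returns the length it would have written). The helper name build_card_name is made up, and treating truncation as "skip this file" is only one possible policy.

```c
#include <stdio.h>

/* Hypothetical helper: builds "Title - ISBN.ext.txt" in out.
   Returns 0 on success, -1 if the name did not fit or formatting failed. */
int build_card_name(char *out, size_t out_size,
                    const char *title, const char *isbn, const char *ext)
{
    int needed = snprintf(out, out_size, "%s - %s.%s.txt", title, isbn, ext);

    /* snprintf returns the length it wanted to write, excluding the NUL;
       a value >= out_size means the result was truncated. */
    if (needed < 0 || (size_t)needed >= out_size)
        return -1;
    return 0;
}
```

A caller could print a warning to stderr and move on to the next file when this returns -1, rather than creating a card with a cut-off name.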

The system( "PAUSE" ) call is unnecessarily system-specific.
Some refinement needed. It's not skipping some text files.

Personally, I would have implemented this in Perl.
 
Spiros Bousbouras

Yeah. I seem to be making more than my share of dumb mistakes today.
Thanks for catching this one.

The "Float comparison" thread is taking its toll on you :-D
 
