case insensitive file search

  • Thread starter henrik.sorensen

henrik.sorensen

Hi List,

I am looking for a way to do a case insensitive search for file names.

Anybody have some hints ?

thanks
Henrik

pl1gcc.sourceforge.net
 
Spiros Bousbouras

Hi List,

I am looking for a way to do a case insensitive search for file names.

Anybody have some hints ?

I don't know who List is but I'll attempt a reply.
For every file name you read, turn it into all lower case
using tolower() and then do an ordinary search.
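
A minimal sketch of that suggestion, assuming plain single-byte file names in the C locale. The helper names lower_copy() and names_match() are made up for illustration and are not part of any library.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lower-case copy of name, or NULL if name is
   NULL or memory runs out. The caller frees the result. */
char *lower_copy(const char *name)
{
    size_t len, i;
    char *copy;

    if (name == NULL)
        return NULL;
    len = strlen(name);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    /* tolower() must be given a value representable as unsigned char */
    for (i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)name[i]);
    copy[len] = '\0';
    return copy;
}

/* Compare two names ignoring case.
   Returns 1 if equal, 0 if different, -1 if a copy could not be made. */
int names_match(const char *a, const char *b)
{
    char *la = lower_copy(a);
    char *lb = lower_copy(b);
    int result = -1;

    if (la != NULL && lb != NULL)
        result = (strcmp(la, lb) == 0);
    free(la);
    free(lb);
    return result;
}
```

With these, names_match("SYSLIB.INC", "SysLib.inc") reports a match, while the ordinary strcmp() on the untouched names would not.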
 
Ian Collins

Hi List,

I am looking for a way to do a case insensitive search for file names.

Anybody have some hints ?
Not really a C question, but why don't you just read the filename and
convert it to lower or upper case?
 
henrik.sorensen

Ian said:
Not really a C question, but why don't you just read the filename and
convert it to lower or upper case?
ok I should have explained a bit more.

I am writing a PL/I frontend for gcc. (pl1gcc.sourceforge.net)
When processing %INCLUDE statements, which just read a file from the
filesystem and place the text in the source program, I am faced with the
following problem.
Syntax
%INCLUDE filename ;

Traditionally PL/I is case insensitive.
So when my scanner/parser searches for the filename to include, it can
happen that the filename is in uppercase, but the actual file on the
filesystem is in mixed case.

Also I am trying to avoid scanning the whole directory for each file I have
to include.

The scanner is written using flex, and the parser is using bison, and all
the necessary help functions are written in C.

thanks
Henrik
 
Spiros Bousbouras

ok I should have explained a bit more.

I am writing a PL/I frontend for gcc. (pl1gcc.sourceforge.net)
When processing %INCLUDE statements, which just read a file from the
filesystem and place the text in the source program, I am faced with the
following problem.
Syntax
%INCLUDE filename ;

Is %INCLUDE something related to PL/I ?
Traditionally PL/I is case insensitive.
So when my scanner/parser searches for the filename to include, it can
happen that the filename is in uppercase, but the actual file on the
filesystem is in mixed case.

Also I am trying to avoid scanning the whole directory for each file I have
to include.

Scan the whole directory once, turn each file name into
all lower case as you read it and put it into memory.
Then for every file you want to include check if its name
appears in what you have stored in memory.

There may be some way which avoids reading the whole
directory but this would depend on the way your platform
allows you to read directories. In any case it would make
things more complicated so unless the directories involved
are really huge I don't think it would be worth the trouble. And
of course there's always the possibility that one of the included
files appears at the end of the directory so you would still
need to search the whole thing no matter which method you
use.
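
A rough sketch of the scan-once idea, assuming a POSIX system with opendir()/readdir(). The struct and function names (name_cache, cache_add, cache_load_dir, cache_find, cache_free) are hypothetical, and the lookup is a plain linear search; a sorted array or hash table could replace it if the directories turn out to be large.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* One cached entry: the name as stored on disk plus a lower-case key. */
struct name_entry { char *real; char *key; };

struct name_cache {
    struct name_entry *items;
    size_t count;
    size_t capacity;
};

static char *lower_dup(const char *s)
{
    size_t i, len = strlen(s);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (i = 0; i <= len; i++)          /* also copies the final '\0' */
        out[i] = (char)tolower((unsigned char)s[i]);
    return out;
}

/* Add one name to the cache. Returns 0 on success, -1 on failure. */
int cache_add(struct name_cache *c, const char *name)
{
    struct name_entry e;

    if (c->count == c->capacity) {
        size_t cap = c->capacity ? c->capacity * 2 : 16;
        struct name_entry *p = realloc(c->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        c->items = p;
        c->capacity = cap;
    }
    e.real = malloc(strlen(name) + 1);
    e.key = lower_dup(name);
    if (e.real == NULL || e.key == NULL) {
        free(e.real);
        free(e.key);
        return -1;
    }
    strcpy(e.real, name);
    c->items[c->count++] = e;
    return 0;
}

/* Read the directory once and cache every entry. Returns 0 on success. */
int cache_load_dir(struct name_cache *c, const char *dir_path)
{
    struct dirent *ent;
    int status = 0;
    DIR *dir = opendir(dir_path);

    if (dir == NULL)
        return -1;
    while (status == 0 && (ent = readdir(dir)) != NULL)
        status = cache_add(c, ent->d_name);
    closedir(dir);
    return status;
}

/* Return the on-disk name matching wanted in any case, or NULL. */
const char *cache_find(const struct name_cache *c, const char *wanted)
{
    const char *found = NULL;
    size_t i;
    char *key = lower_dup(wanted);

    if (key == NULL)
        return NULL;
    for (i = 0; i < c->count && found == NULL; i++)
        if (strcmp(c->items[i].key, key) == 0)
            found = c->items[i].real;
    free(key);
    return found;
}

void cache_free(struct name_cache *c)
{
    size_t i;
    for (i = 0; i < c->count; i++) {
        free(c->items[i].real);
        free(c->items[i].key);
    }
    free(c->items);
    c->items = NULL;
    c->count = c->capacity = 0;
}
```

The cache starts out zero-initialized (struct name_cache c = {0};), is filled once per include directory with cache_load_dir(), and each %INCLUDE then costs only a cache_find() call.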
 
Walter Roberson

I am looking for a way to do a case insensitive search for file names.
Anybody have some hints ?

You can't do that in standard C, as standard C gives no mechanisms
to search for any file. The best you can do in standard C is to
attempt to open a file and see if you succeed or not -- and
if you do succeed, there is no way to tell if you are looking at
the same file as another or a different file. Standard C places
no interpretation upon filenames (other than that they are null
terminated), so even if you know one filename, you cannot guess
from it which other filenames might be valid.

Thus, in order to do a search for filenames, you need to use
implementation-specific system calls or libraries or build in knowledge
about what filenames look like for your purposes.
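
To illustrate the "build in knowledge" route using nothing beyond standard C: one can only guess a few spellings and try to open each. The sketch below tries the name as written, then all lower case, then all upper case. The names convert_name() and open_any_case() and the 256-byte limit are arbitrary choices for the example. It cannot find a mixed-case file such as "SysLib.inc" when asked for "SYSLIB.INC", which is exactly the limitation described above, and it also changes the case of any directory part of the path.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256   /* arbitrary limit chosen for this sketch */

/* Write name into out with every character passed through conv
   (toupper or tolower). Returns 0 on success, -1 if it does not fit. */
int convert_name(const char *name, char *out, size_t out_size,
                 int (*conv)(int))
{
    size_t i, len = strlen(name);

    if (len >= out_size)
        return -1;
    for (i = 0; i < len; i++)
        out[i] = (char)conv((unsigned char)name[i]);
    out[len] = '\0';
    return 0;
}

/* Try name as written, then all lower case, then all upper case.
   Returns the first stream that opens, or NULL if none does. */
FILE *open_any_case(const char *name, const char *mode)
{
    char variant[MAX_NAME];
    FILE *fp = fopen(name, mode);

    if (fp == NULL &&
        convert_name(name, variant, sizeof variant, tolower) == 0)
        fp = fopen(variant, mode);
    if (fp == NULL &&
        convert_name(name, variant, sizeof variant, toupper) == 0)
        fp = fopen(variant, mode);
    return fp;
}
```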

Generally speaking, you will need to find a system extension that
allows you to examine a directory for filenames. Those extensions
vary between operating systems and system versions. For example,
once upon a time in Unix the standard mechanism was to open the
directory as if it were a file, and then to read the binary contents
using the built-in knowledge that the first 14 characters out of every
16 were the null-padded filename (and the last 2 characters were
a binary encoding of an inode number.) This mechanism isn't
much used in newer systems -- but really, the method for Windows
looks a lot different than the method for Unix. Some systems
provide mechanisms to pass in a prefix or pattern and to get back
the next matching filename (or all matching filenames); many do not.

If you are willing to restrict your portability to POSIX and
some other random systems, you can use opendir() and readdir(),
but don't count on a filename pattern match routine.
 
Spiros Bousbouras

out_of_topic {
If you are willing to restrict your portability to POSIX and
some other random systems, you can use opendir() and readdir(),
but don't count on a filename pattern match routine.

What about glob() ?

}
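
For what it's worth, one way glob() could be pressed into service on a POSIX system is to expand every letter of the wanted name into a bracket expression, so "ab.c" becomes "[aA][bB].[cC]". This is only a sketch: case_insensitive_pattern() and glob_find_any_case() are invented names, only the first match is returned, and any directory part of the name is matched case-insensitively as well.

```c
#include <ctype.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Build a glob pattern matching name in any letter case.
   Glob metacharacters already in name are escaped with a backslash.
   The caller frees the result. */
char *case_insensitive_pattern(const char *name)
{
    size_t i, len = strlen(name);
    char *pat = malloc(len * 4 + 1);   /* each char grows to at most 4 */
    char *p = pat;

    if (pat == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)name[i];
        if (isalpha(ch)) {
            *p++ = '[';
            *p++ = (char)tolower(ch);
            *p++ = (char)toupper(ch);
            *p++ = ']';
        } else {
            if (strchr("*?[]\\", ch) != NULL)
                *p++ = '\\';
            *p++ = (char)ch;
        }
    }
    *p = '\0';
    return pat;
}

/* Return a malloc'd copy of the first path matching name in any case,
   or NULL when nothing matches or an error occurs. */
char *glob_find_any_case(const char *name)
{
    glob_t results;
    char *found = NULL;
    char *pat = case_insensitive_pattern(name);

    if (pat == NULL)
        return NULL;
    if (glob(pat, 0, NULL, &results) == 0) {
        if (results.gl_pathc > 0) {
            found = malloc(strlen(results.gl_pathv[0]) + 1);
            if (found != NULL)
                strcpy(found, results.gl_pathv[0]);
        }
        globfree(&results);
    }
    free(pat);
    return found;
}
```

Note that glob() still has to read the directory to do the matching, so this saves code rather than work.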
 
Keith Thompson

ok I should have explained a bit more.

I am writing a PL/I frontend for gcc. (pl1gcc.sourceforge.net)
When processing %INCLUDE statements, which just read a file from the
filesystem and place the text in the source program, I am faced with the
following problem.
Syntax
%INCLUDE filename ;

Traditionally PL/I is case insensitive.
So when my scanner/parser searches for the filename to include, it can
happen that the filename is in uppercase, but the actual file on the
filesystem is in mixed case.

Also I am trying to avoid scanning the whole directory for each file I have
to include.

Assuming a file system that allows mixed-case file names, and in which
case distinctions are significant, I don't see how you can avoid
scanning the whole directory. You can probably cache the result
rather than scanning it for each "%INCLUDE". But scanning a directory
shouldn't be terribly expensive on most systems.

You'll also need to decide what to do if "filename", "FileName", and
"FILENAME" all exist. I'm guessing the PL/I standard provides some
guidance on this.

But since standard C has no concept of directories, you're probably
better off asking in comp.unix.programmer.
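
The ambiguity question above can at least be detected in code. The sketch below counts how many directory entries match a wanted name when case is ignored, and prefers an exact-case match if one exists. That preference is purely an assumption made for the example, not something taken from the PL/I standard. count_case_matches() is a made-up name, and strcasecmp() is POSIX rather than standard C.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp: POSIX, not standard C */

/* Count the entries of names[] equal to wanted when case is ignored.
   If match is non-NULL it receives the exact-case entry when present,
   otherwise the first case-insensitive match, or NULL if none. */
size_t count_case_matches(const char *const *names, size_t n,
                          const char *wanted, const char **match)
{
    size_t i, hits = 0;
    const char *best = NULL;

    for (i = 0; i < n; i++) {
        if (strcasecmp(names[i], wanted) != 0)
            continue;
        hits++;
        /* keep the first hit unless an exact-case hit turns up */
        if (best == NULL || strcmp(names[i], wanted) == 0)
            best = names[i];
    }
    if (match != NULL)
        *match = best;
    return hits;
}
```

A return value greater than 1 tells the caller that the include is ambiguous, at which point issuing a warning is probably the least surprising behaviour.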
 
Spiros Bousbouras

Keith said:
Assuming a file system that allows mixed-case file names, and in which
case distinctions are significant, I don't see how you can avoid
scanning the whole directory.

You start reading file names from the directory and
you check if they match against what you want to
include. If you find a match you can stop there; you
don't need to read the rest of the directory. This of
course only works if you know that there aren't file names
which are repeated with different capitalization, or you
don't care.
 
henrik.sorensen

Spiros said:
Is %INCLUDE something related to PL/I ?
yes.
It is similar to C's #include.
Scan the whole directory once, turn each file name into
all lower case as you read it and put it into memory.
Then for every file you want to include check if its name
appears in what you have stored in memory.

good idea.
this would work, and even bring a nice improvement as well
thanks
 
henrik.sorensen

Walter said:
You can't do that in standard C, as standard C gives no mechanisms
to search for any file. The best you can do in standard C is to
attempt to open a file and see if you succeed or not -- and
if you do succeed, there is no way to tell if you are looking at
the same file as another or a different file. Standard C places
no interpretation upon filenames (other than that they are null
terminated), so even if you know one filename, you cannot guess
from it which other filenames might be valid.

thanks for explaining this...
Thus, in order to do a search for filenames, you need to use
implementation-specific system calls or libraries or build in knowledge
about what filenames look like for your purposes.
....
If you are willing to restrict your portability to POSIX and
some other random systems, you can use opendir() and readdir(),
but don't count on a filename pattern match routine.
ok I will look into opendir()/readdir()

thanks
 
cloverman

Hi List,

I am looking for a way to do a case insensitive search for file names.

Anybody have some hints ?

thanks
Henrik

pl1gcc.sourceforge.net

/* This is an OS-specific question, but I'd like my UNIX-specific code
   criticized here - any errors I'd like to know. */

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>    /* malloc, free */

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#include <unistd.h>
#include <dirent.h>


static char *
CopyStringPrefix( const char *String, size_t PrefixLength )
{
    char *CopyOfString = NULL;

    /* PrefixLength is unsigned, so only the pointer needs checking */
    if ( NULL == String )
    {
        return NULL;
    }

    CopyOfString = (char *)malloc( PrefixLength + 1 );
    if ( NULL == CopyOfString )
    {
        return NULL;
    }

    strncpy( CopyOfString, String, PrefixLength );
    CopyOfString[ PrefixLength ] = '\0';
    return CopyOfString;
}

static char *
CopyString( const char *String )
{
    if ( NULL == String )
    {
        return NULL;
    }

    return CopyStringPrefix( String, strlen( String ) );
}

static char *
CopyDirectory( const char *InFullPath )
{
    char *DirectoryName = NULL;
    char *LastSlash = NULL;

    if ( NULL == InFullPath )
    {
        return NULL;
    }

    LastSlash = strrchr( InFullPath, '/' );
    if ( NULL == LastSlash )
    {
        DirectoryName = CopyString( "./" );
    }
    else
    {
        DirectoryName = CopyStringPrefix( InFullPath,
                                          LastSlash - InFullPath + 1 );
    }
    return DirectoryName;
}

static char *
CopyFileName( const char *InFullPath )
{
    char *FileName = NULL;
    char *LastSlash = NULL;

    if ( NULL == InFullPath )
    {
        return NULL;
    }

    LastSlash = strrchr( InFullPath, '/' );
    if ( NULL == LastSlash )
    {
        FileName = CopyString( InFullPath );
    }
    else
    {
        FileName = CopyString( LastSlash + 1 );
    }
    return FileName;
}

static int
IsRegularFile( const char *FullPath )
{
    struct stat FileStatus;

    if ( 0 != stat( FullPath, &FileStatus ) )
    {
        return 0;
    }
    return S_ISREG( FileStatus.st_mode );
}

static int
OpenDirFile( const char *DirPath, const char *FileName, int ReadWriteMode )
{
    int fd = -1;
    char *NewFullPath = NULL;

    if ( NULL == DirPath || NULL == FileName )
    {
        return -1;
    }

    NewFullPath = malloc( strlen( DirPath ) + strlen( FileName ) + 1 );

    if ( NULL == NewFullPath )
        return -1;

    strcpy( NewFullPath, DirPath );
    strcat( NewFullPath, FileName );

    if ( IsRegularFile( NewFullPath ) )
    {
        ReadWriteMode = O_RDWR;
        fd = open( NewFullPath, ReadWriteMode );
    }
    free( NewFullPath );
    return fd;
}

int
CaseInsensitiveIsEqual( const char *String1, const char *String2 )
{
    size_t LengthOfString1 = 0, LengthOfString2 = 0, EqualCharCount = 0;

    if ( NULL == String1 || NULL == String2 )
    {
        return String1 == String2;
    }

    LengthOfString1 = strlen( String1 );
    LengthOfString2 = strlen( String2 );
    if ( LengthOfString1 != LengthOfString2 )
    {
        return 0;
    }

    /* tolower() needs a value representable as unsigned char */
    EqualCharCount = 0;
    while ( EqualCharCount < LengthOfString1 &&
            tolower( (unsigned char)String1[EqualCharCount] ) ==
            tolower( (unsigned char)String2[EqualCharCount] ) )
    {
        EqualCharCount++;
    }

    return EqualCharCount == LengthOfString1;
}

static char *
CaseInsensitiveFindFileName( const char *DirPath, const char *FileName )
{
    char *FoundFileName = NULL;
    struct dirent *DirectoryEntry = NULL;
    DIR *Directory = NULL;

    if ( NULL == DirPath )
    {
        return NULL;
    }

    Directory = opendir( DirPath );
    if ( NULL == Directory )
    {
        return NULL;
    }

    DirectoryEntry = readdir( Directory );
    while ( NULL != DirectoryEntry )
    {
        if ( CaseInsensitiveIsEqual( FileName, DirectoryEntry->d_name ) )
        {
            FoundFileName = CopyString( DirectoryEntry->d_name );
            break;
        }
        DirectoryEntry = readdir( Directory );
    }
    closedir( Directory );
    return FoundFileName;
}

int
CaseInsensitiveFileOpen( /* const */ char *FullPath, int ReadWriteMode )
{
    char *DirectoryPath = NULL, *FileName = NULL;
    int fd = open( FullPath, O_RDWR );

    if ( fd >= 0 )
    {
        return fd;
    }

    DirectoryPath = CopyDirectory( FullPath );
    if ( NULL == DirectoryPath )
    {
        return -1;
    }

    FileName = CopyFileName( FullPath );
    if ( NULL == FileName )
    {
        free( DirectoryPath );
        return -1;
    }
    else
    {
        char *FoundFileName = CaseInsensitiveFindFileName( DirectoryPath,
                                                           FileName );

        fd = -1;
        if ( NULL != FoundFileName )
        {
            fd = OpenDirFile( DirectoryPath, FoundFileName, ReadWriteMode );
            free( FoundFileName );
        }
    }

    free( DirectoryPath );
    free( FileName );

    return fd;
}
 
cloverman

[snip]

funny how posting my code here has made me examine the code more
closely
CaseInsensitiveFileOpen( /* const */ char *FullPath, int ReadWriteMode
/* const */ is an unneeded bodge

the use of O_RDWR as opposed to ReadWriteMode is a horrible bodge -
O_RDWR should be replaced by ReadWriteMode
 
