Getting a list of files in a directory using a wildcard

  • Thread starter Michael McGarry

Michael McGarry

Hi,

Does anyone know how to get a list of files in a directory using a
wildcard, like "*.txt"?

Thanks,

Michael
 
karthicksmail

Hi,

Can you specify the environment in which you want to do this?

In case of Unix, you can use the globbing functions:
------------------------------------------
SYNOPSIS
#include <glob.h>

int glob(const char *pattern, int flags,
         int errfunc(const char *epath, int eerrno),
         glob_t *pglob);
void globfree(glob_t *pglob);
------------------------------------------

See glob(3), the Section 3 manual page for glob ("man 3 glob"), for the
flags and return values.
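As a minimal sketch of how the glob() interface above might be used on a POSIX system: the helper below expands a pattern and prints each match. The name list_matches is made up for this example, and passing no flags and no error callback is an assumption made to keep it short.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>

/* Print every pathname matching `pattern`.
   Returns the number of matches, or -1 if glob() reports an error.
   list_matches is a hypothetical helper name, not a library function. */
int list_matches(const char *pattern)
{
    glob_t results;
    int count = -1;
    int rc = glob(pattern, 0, NULL, &results);

    if (rc == 0) {
        for (size_t i = 0; i < results.gl_pathc; i++)
            puts(results.gl_pathv[i]);
        count = (int)results.gl_pathc;
    } else if (rc == GLOB_NOMATCH) {
        count = 0;              /* nothing matched; not treated as an error */
    }

    globfree(&results);         /* release whatever glob() allocated */
    return count;
}
```

Calling list_matches("*.txt") would print the matching names in the current directory. Note that with no flags, glob() does not match names beginning with a dot unless the pattern starts with one.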


Rgds,
Karthick S.
 
Walter Roberson

Does anyone know how to get a list of files in a directory using a
wildcard, like "*.txt"?

There is no way to do that in standard C. Ask in a newsgroup that
discusses programming for your operating system.


[OT]
The POSIX mechanism would be to open the directory, read one entry
at a time, perform your own wild-card match to see whether the name
matches your criteria, then double-check to be sure the entry represents
a file instead of some other kind of object, save the name if you
have a match, and loop back to the next directory entry; at the end,
close the directory.
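The loop described above could look roughly like the following sketch. It assumes a POSIX system and uses fnmatch() as a stand-in for "your own wild-card match"; the function name print_matching_files and the fixed 4096-byte path buffer are choices made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>
#include <sys/stat.h>

/* Print the regular files in `dirpath` whose names match `pattern`.
   Returns the number of matches, or -1 if the directory cannot be opened.
   print_matching_files is a hypothetical name chosen for this sketch. */
int print_matching_files(const char *dirpath, const char *pattern)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* FNM_PERIOD: a leading '.' must be matched explicitly, as in a shell. */
        if (fnmatch(pattern, entry->d_name, FNM_PERIOD) != 0)
            continue;

        /* Build the full path so stat() can tell files from other objects. */
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dirpath, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                       /* path too long; skip it */

        struct stat st;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode)) {
            puts(entry->d_name);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

For example, print_matching_files(".", "*.txt") would list the regular files ending in .txt in the current directory, skipping directories and other non-file entries.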

You can find pre-written code that does all this for you.

But it looks like you might be using Windows, so you might or might
not have POSIX routines available, so an operating-system specific
group is really your second best bet (after searching the OS
documentation.)
 
Default User

Michael said:
Hi,

Does anyone know how to get a list of files in a directory using a
wildcard, like "*.txt"?

Standard C has no facilities for getting directory lists at all. You
need to find a newsgroup dedicated to your platform.



Brian
 
Jordan Abel

Standard C has no facilities for getting directory lists at all.
You need to find a newsgroup dedicated to your platform.

Strictly, the C standard doesn't even speak of directories, and
implementations are not required to provide them.
 
Michael McGarry

Hi,

Thanks, I had a feeling this wasn't possible with standard C. It seems
such a common and general concept that it should be part of a standard
library.

Specifically, I am looking to do this on a POSIX compliant system.

Regards,

Michael
 
Malcolm

Michael McGarry said:
Does anyone know how to get a list of files in a directory using a
wildcard, like "*.txt"?

Write a function

int matchwild(char *pattern, char *str)

which matches a wildcard. This isn't too difficult to do.
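One possible way to write such a function is sketched below. It assumes only two metacharacters, '*' (any run of characters, including none) and '?' (exactly one character), and it const-qualifies the parameters, which differs slightly from the prototype above. It uses the common "remember the last star and backtrack" technique rather than recursion.

```c
#include <stddef.h>

/* Return 1 if str matches pattern, 0 otherwise.
   '*' matches any run of characters (including none); '?' matches exactly one.
   No escaping and no character classes: an assumption of this sketch. */
int matchwild(const char *pattern, const char *str)
{
    const char *star = NULL;    /* position of the most recent '*' in pattern */
    const char *retry = NULL;   /* where in str that '*' started matching */

    while (*str != '\0') {
        if (*pattern == '*') {
            star = pattern++;   /* remember the star, try matching zero chars */
            retry = str;
        } else if (*pattern == '?' || *pattern == *str) {
            pattern++;          /* single character matches; advance both */
            str++;
        } else if (star != NULL) {
            pattern = star + 1; /* mismatch: let the last star absorb one more */
            str = ++retry;
        } else {
            return 0;           /* mismatch and no star to fall back on */
        }
    }

    while (*pattern == '*')     /* trailing stars match the empty string */
        pattern++;
    return *pattern == '\0';
}
```

With this definition, matchwild("*.txt", "notes.txt") returns 1 and matchwild("*.txt", "notes.txt.bak") returns 0.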

Then you need to find a platform-specific directory reading function. Windows
and Unix have different approaches to this. The Windows function is called
something like "findfirst" (FindFirstFile() in the Win32 API), whilst the Unix
one is called something like "dirent" (opendir() and readdir(), declared in
<dirent.h>). I can't remember the details.

If you want the program to be easy to port, use a wrapper like

char **listdirectory(char *path, int *N)

This then calls the OS-specific stuff.
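A POSIX flavour of such a wrapper might look like the sketch below. The ownership rules (the caller frees the result with a companion freelist() function), the const-qualified path parameter, and the decision to skip "." and ".." are all assumptions of this example, not something the prototype above specifies.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Free a list returned by listdirectory(). Hypothetical companion helper. */
void freelist(char **names, int n)
{
    for (int i = 0; i < n; i++)
        free(names[i]);
    free(names);
}

/* Return a malloc'd array of the entry names in `path` (without "." and "..")
   and store the count in *N. Returns NULL with *N == 0 on failure. */
char **listdirectory(const char *path, int *N)
{
    *N = 0;
    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    int count = 0, capacity = 16;
    char **names = malloc((size_t)capacity * sizeof *names);
    struct dirent *entry;

    while (names != NULL && (entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        if (count == capacity) {            /* grow the array geometrically */
            char **grown = realloc(names, (size_t)capacity * 2 * sizeof *grown);
            if (grown == NULL)
                break;
            names = grown;
            capacity *= 2;
        }
        names[count] = strdup(entry->d_name);
        if (names[count] == NULL)
            break;
        count++;
    }

    /* entry is NULL only if readdir() reached the end without a failure. */
    int ok = (names != NULL && entry == NULL);
    closedir(dir);
    if (!ok) {
        if (names != NULL)
            freelist(names, count);
        return NULL;
    }
    *N = count;
    return names;
}
```

A Windows build would supply a different body for the same prototype, so the calling code (including any wildcard filter) stays unchanged.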

Then you put the two together, and you have your directory lister.
 
Mark B

Michael McGarry said:
Hi,

Thanks, I had a feeling this wasn't possible with standard C. It seems
such a common and general concept that it should be part of a standard
library.

It probably is... just not 'the' standard library ;-)

Specifically, I am looking to do this on a POSIX compliant system.

I use opendir() and readdir() in conjunction with fnmatch() - but as far
as I know only fnmatch() is POSIX compliant... which is why you'd be
better served by another group such as: comp.unix.programmer

HTH,
Mark
 
Simon Biber

Mark said:
It probably is... just not 'the' standard library ;-)

I use opendir() and readdir() in conjunction with fnmatch() - but as far
as I know only fnmatch() is POSIX compliant... which is why you'd be
better served by another group such as: comp.unix.programmer

<OT> According to my manual, opendir(3) and closedir(3) are POSIX
compliant, and readdir(3) is POSIX 1003.1-2001 compliant.

"According to POSIX, the dirent structure contains a field char
d_name[] of unspecified size, with at most NAME_MAX characters
preceding the terminating null character. Use of other fields will harm the
portability of your programs. POSIX 1003.1-2001 also documents the
field ino_t d_ino as an XSI extension."

</OT>
 
Mark B

Simon Biber said:
Mark said:
It probably is... just not 'the' standard library ;-)

I use opendir() and readdir() in conjunction with fnmatch() - but as far
as I know only fnmatch() is POSIX compliant... which is why you'd be
better served by another group such as: comp.unix.programmer

<OT> According to my manual, opendir(3) and closedir(3) are POSIX
compliant, and readdir(3) is POSIX 1003.1-2001 compliant.

"According to POSIX, the dirent structure contains a field char
d_name[] of unspecified size, with at most NAME_MAX characters
preceding the terminating null character. Use of other fields will harm the
portability of your programs. POSIX 1003.1-2001 also documents the
field ino_t d_ino as an XSI extension."

Thanks for the information... I don't have a copy of the POSIX standard
and typically go by the man pages to determine compliance... unfortunately
those (opendir/readdir) functions don't have a 'Standards' section on any
of my servers.
 
Michael Wojcik

Thanks, I had a feeling this wasn't possible with standard C. It seems
such a common and general concept that it should be part of a standard
library.

It's not a "general concept", which is why it's not part of the
standard library. There are a number of environments suitable for
hosted C implementations (most of which in fact *have* hosted C
implementations) where "directory" and "wildcard" have no precise
equivalent, or look very different from the filesystems used on
Unix-like OSes or Windows.

If it were included in the standard library, it would have to be
defined so vaguely as to be useless for portable code, much like
system().

[OT] For that matter, it's not even really well-defined in POSIX,
because different shells have different globbing syntax; so what a
user considers a "wildcard" may not correspond to what a given
application thinks is a wildcard. SUS's fnmatch() standardizes a
wildcard syntax for programs, but it's a subset of what various
shells permit, and so does not necessarily match user expectations.
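To make the gap between fnmatch() and shell expectations concrete, here is a small sketch, assuming a POSIX fnmatch(). The wrapper matches() and the function print_examples() are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stdio.h>

/* Return 1 if `name` matches `pattern` under fnmatch() with `flags`, else 0.
   matches() is a hypothetical wrapper used only in this sketch. */
int matches(const char *pattern, const char *name, int flags)
{
    return fnmatch(pattern, name, flags) == 0;
}

/* Print a few cases where fnmatch() may differ from what a shell user expects. */
void print_examples(void)
{
    /* Braces are ordinary characters to fnmatch(); some shells expand them. */
    printf("%d\n", matches("*.{txt,md}", "notes.txt", 0));

    /* Without FNM_PERIOD, '*' matches a leading dot; with it, it does not. */
    printf("%d %d\n", matches("*", ".profile", 0),
                      matches("*", ".profile", FNM_PERIOD));

    /* Without FNM_PATHNAME, '*' matches across '/'; with it, it does not. */
    printf("%d %d\n", matches("*", "src/main.c", 0),
                      matches("*", "src/main.c", FNM_PATHNAME));
}
```

Which flags correspond to "what the user meant" depends on the shell the user is used to, which is the point being made above.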

--
Michael Wojcik (e-mail address removed)

"Well, we're not getting a girl," said Marilla, as if poisoning wells were
a purely feminine accomplishment and not to be dreaded in the case of a boy.
-- L. M. Montgomery, _Anne of Green Gables_
 
Keith Thompson

It's not a "general concept", which is why it's not part of the
standard library. There are a number of environments suitable for
hosted C implementations (most of which in fact *have* hosted C
implementations) where "directory" and "wildcard" have no precise
equivalent, or look very different from the filesystems used on
Unix-like OSes or Windows.

If it were included in the standard library, it would have to be
defined so vaguely as to be useless for portable code, much like
system().

The same could be said for file names and environment variables. And
yes, in both cases the standard lets you use them, but makes very few
guarantees about what they look like (both file names and environment
variable names are basically treated as uninterpreted strings).

There have been attempts to define directory support in a portable
manner. It wouldn't have been entirely unreasonable to include such
support in the C standard. The authors of the standard just decided
not to do so. IMHO it was the right decision, but it wasn't the only
possible one.

My point, I guess, is that there isn't a clearly logical division
between what's in the standard library (because it makes sense
everywhere) and what isn't (because it can't be defined portably).
And there are plenty of secondary standards to fill in the gaps.
 
Joe Wright

Michael said:
Hi,

Does anyone know how to get a list of files in a directory using a
wildcard, like "*.txt"?

Thanks,

Michael

Have a look at what I did some years ago.

#include <stdio.h>
#include <string.h>   /* strlen() */

#define max(a,b) ((a) > (b) ? (a) : (b))

/* Print n spaces. */
void space(int n) {
    while (n-- > 0) putchar(' ');
}

char **arr;

/* "idx" rather than "index", which can clash with the BSD index() function. */
int idx, len, number, major, minor = 4;

int main(int argc, char *argv[]) {
    int j, k;
    if (argc == 1) {
        printf("Argument Lister. Type some arguments..\n");
    }
    else {
        number = argc-1;
        printf("There %s %d argument%s...",
               (number==1) ? "is" : "are", number,
               (number==1) ? "" : "s");
        arr = argv + 1;
        /* Column width: the longest argument plus one separating space. */
        for (len = j = 0; j < number; ++j)
            len = max(len, (int)strlen(arr[j]));
        len += 1;
        minor = max(1, 79 / len);            /* columns per row, at least 1 */
        major = (number-1+minor) / minor;    /* number of rows */
        /* Print column-major: entry (row k, column j) is arr[j*major + k]. */
        for (k = 0; k < major; ++k) {
            for (j = 0; j < minor; ++j) {
                if (j == 0) printf("\n");
                if ((idx = j * major + k) < number)
                    space(len - printf("%s", *(arr + idx)));
            }
        }
        printf("\n");
    }
    return 0;
}

It takes advantage of the fact that my implementation (maybe yours too)
will expand the wildcards * and ? and present the result to the C program
in terms of argc and argv. This has nothing to do with C itself but with
your environment. On Unix it may be the shell which presents the file list.
If you copy and save the above program to x.c and compile it, you
might then invoke it as 'x *.txt'. Your command processor might expand
*.txt and my x will pretty-print the result on your screen. Enjoy.
 
Michael Wojcik

If [globbing] were included in the standard library, it would have to be
defined so vaguely as to be useless for portable code, much like
system().

The same could be said for file names and environment variables. And
yes, in both cases the standard lets you use them, but makes very few
guarantees about what they look like (both file names and environment
variable names are basically treated as uninterpreted strings).

While I agree that there is clearly a continuum in the standard
library from facilities which are highly independent of the
implementation's environmental idiosyncrasies (eg strlen) to those
that are rather more dependent on them (eg getenv, with system again
being one of the more extreme examples), it's my feeling that there's
rather a significant gap between the treatment of "file names" and
"environment variables", on the one hand, and "wildcard expansion" on
the other.

Suppose the standard defined a function "glob" which took a const
char * argument representing a "file name pattern" and a pointer to
function of some sort - a visitor-pattern design for a wildcard
expander. Should it define the pattern characters with special
meaning? Presumably, otherwise the function is entirely
implementation-defined and there's no point in including it in the standard.

But defining pattern characters in the standard in effect imposes
restrictions on file names in the implementation which may not be
appropriate for the environment.

Currently, the standard imposes very few such restrictions. It does
not allow the nul character in filenames. It does not allow the
character ">" in header filenames, in effect. And so on.

That's my opinion, at any rate: any reasonable definition of a
globbing function would be too vague to be of much use in portable
code.

The authors of the standard just decided
not to do so. IMHO it was the right decision, but it wasn't the only
possible one.

Of course not, and I didn't mean to imply that it was. Clearly
they could include anything that they wanted to include. They
included system.

--
Michael Wojcik (e-mail address removed)

"Well, we're not getting a girl," said Marilla, as if poisoning wells were
a purely feminine accomplishment and not to be dreaded in the case of a boy.
-- L. M. Montgomery, _Anne of Green Gables_
 
Keith Thompson

Keith Thompson said:
If [globbing] were included in the standard library, it would have to be
defined so vaguely as to be useless for portable code, much like
system().

The same could be said for file names and environment variables. And
yes, in both cases the standard lets you use them, but makes very few
guarantees about what they look like (both file names and environment
variable names are basically treated as uninterpreted strings).

While I agree that there is clearly a continuum in the standard
library from facilities which are highly independent of the
implementation's environmental idiosyncrasies (eg strlen) to those
that are rather more dependent on them (eg getenv, with system again
being one of the more extreme examples), it's my feeling that there's
rather a significant gap between the treatment of "file names" and
"environment variables", on the one hand, and "wildcard expansion" on
the other.

Actually, I was thinking more of directory support than globbing.
 
Neil Cerutti

Actually, I was thinking more of directory support than
globbing.

A library for directory support would be tied up with support for
filenames, I think, since a filename might be a directory and
vice-versa.

The only such library I've seen in a standard (though there must
be more) is the one defined in Common Lisp. It's cumbersome, and
in practice not portable between implementations. :-(
 
Keith Thompson

Neil Cerutti said:
A library for directory support would be tied up with support for
filenames, I think, since a filename might be a directory and
vice-versa.

Absolutely; defining portable directory support is a hard problem.

David Tribble has put together several proposals for C200X, including
one for directory access functions
<http://david.tribble.com/text/c0xdir.html>. A search of the archives
of this newsgroup (or was it comp.std.c?) should turn up some
discussion of the proposals.

The only such library I've seen in a standard (though there must
be more) is the one defined in Common Lisp. It's cumbersome, and
in practice not portable between implementations. :-(

Well, you could consider POSIX to be a standard that provides
directory access functions. POSIX is a standard for an operating
system interface, not necessarily for an operating system, and the
interface has been implemented on systems that aren't particularly
Unix-like. On the other hand, I think POSIX implementations on
"exotic" systems tend to be emulation layers rather than interfaces to
the underlying OS.

(Yes, we're talking about things outside the C standard, but I claim
it's topical as a discussion of why the C standard isn't the
appropriate place to define this kind of thing.)
 
Michael Wojcik

Ah. I agree that directory support appears to be less
system-specific (more general) than globbing, particularly since the
latter more or less requires the former.

David Tribble has put together several proposals for C200X, including
one for directory access functions
<http://david.tribble.com/text/c0xdir.html>. A search of the archives
of this newsgroup (or was it comp.std.c?) should turn up some
discussion of the proposals.

Thanks for the pointer; I missed the earlier discussions. It's
an interesting proposal. I'm glad to see that he includes test
macros for some individual facilities (eg setting the "current
directory").

Well, you could consider POSIX to be a standard that provides
directory access functions. POSIX is a standard for an operating
system interface, not necessarily for an operating system, and the
interface has been implemented on systems that aren't particularly
Unix-like. On the other hand, I think POSIX implementations on
"exotic" systems tend to be emulation layers rather than interfaces to
the underlying OS.

The glorious AS/400 is an interesting case. Early versions of
OS/400 supported only a single filesystem (probably derived from
either the IBM System/38 or IBM's cancelled Future Systems project,
or both). It has a flat (non-hierarchical) set of "libraries",
which contain any number of a variety of "objects" of various
"types". There are "*PGM", "*SRVPGM", and "*MODULE" objects, for
example, which are kinds of objects that contain executable code.
There are *FILE objects, which are containers for arbitrary data.
There are various kinds of IPC objects and journal objects and
queue objects and so on - in the older version of OS/400 I have on
one machine here, about 75 types of objects in all. These are
strong types - you can't open a *PGM object as if it were a data
file.

*FILE objects are composed of "members", which contain "records".
The members of a *FILE object are often what someone coming from,
say, a Unix background would think of as a file. For example, C
source code is kept in one or more *FILE objects of the "PF"
("program file") subtype, and each C "source file" is actually a
member of a *FILE object. In my application, all the header files
are (named) members of a single *FILE object named "H". (For source
code, the records of a member are lines of source.) The equivalent
to an "include file search path" for the OS/400 C compiler is a
list of *FILE objects to search for the header member named by the
#include directive.

So directory support for this filesystem should consider libraries
as "directories", but it also probably needs to handle *FILE objects
as both "files" (for the purpose of operating on the contents of a
library) and as "directories" (for the purpose of examining the
members of a file).

Later versions of OS/400 support multiple filesystems. The original
one became IFS, the Integrated File System. Tacked on alongside it,
for POSIX compatibility, is HFS, the Hierarchical File System.
OS/400 programs that use the appropriate (POSIX) APIs can operate
on files contained in an HFS, but HFS can't be used for everything,
at least in the version of OS/400 I'm running. You need IFS for
standard OS/400 applications.

The standard library functions - fopen, etc - work on both
filesystems. More complex system-specific tasks on IFS objects
require using OS/400 APIs. The POSIX ones mostly don't work with
IFS.

--
Michael Wojcik (e-mail address removed)

The lark is exclusively a Soviet bird. The lark does not like the
other countries, and lets its harmonious song be heard only over the
fields made fertile by the collective labor of the citizens of the
happy land of the Soviets. -- D. Bleiman
 
