Inconsistent results from (dos)glob

  • Thread starter Theo van den Heuvel

Theo van den Heuvel

Hi,

The following script behaves differently on two Windows machines, one
XP with v5.10.0, the other Vista with v5.10.1. Here is the script:

<script>

#!/usr/bin/perl
use strict;
use warnings;

use FindBin qw($Bin);
# use File::DosGlob 'glob';

my $subdir = "$Bin/FIResources";
my @file = <"$subdir/*">;
my @file_again = glob("\"$subdir\"/*");
print "angular: ", join(', ', @file), "\n";
print "glob: ", join(', ', @file_again), "\n";

</script>

On XP the script lists the files in both print statements. On that
system the path contains spaces (I know, not my idea).
The Vista directory does not contain spaces, but both arrays remain
empty.

Any suggestions?

Thanks

Theo van den Heuvel
 

Theo van den Heuvel

Quoth Theo van den Heuvel:

Have you verified that FindBin is working properly (it doesn't always)
and that the FIResources directory actually exists?

Do you have read permission (or whatever permission is required on Win32
to call readdir) on the directory? What do you get if you call
opendir/readdir directly?

Ben

Hi Ben,

FindBin was working properly and there is no difference in permissions
that I am aware of.

I added:

opendir(my $dir, $subdir) or die "can't opendir $subdir: $!";
my @file_yetagain = readdir($dir);
print "readdir: ", join(', ', @file_yetagain), "\n";
closedir $dir;

and that works okay on both platforms. That proves that the files are
really there.

I am still stumped on how the script can fail on the second machine.

Thanks,

Theo
 

Theo van den Heuvel

Quoth Theo van den Heuvel:
Then I'm afraid you need to go grubbing around in File::DosGlob adding
debug statements until you find where the problem is (or use the
debugger, if that's your cup of tea).

Ben

Something like that. However, it is both DosGlob and the ordinary glob
that misbehave.
I need to get some sleep first, and will apply the debugger first
thing in the morning.

Theo
 

Ilya Zakharevich

my $subdir = "$Bin/FIResources";
my @file = <"$subdir/*">;
my @file_again = glob("\"$subdir\"/*");

If this works, then ONLY due to bugs in glob() (this is IMO; prove me wrong if
you can). Use
bsd_glob( "$subdir/*" )
instead.

Hope this helps,
Ilya
 

Theo van den Heuvel

If this works, then ONLY due to bugs in glob() (this is IMO; prove me wrong if
you can).  Use
  bsd_glob( "$subdir/*" )
instead.

Hope this helps,
Ilya

Dear Ilya,

bsd_glob() does work consistently on both systems. This means I have a
solution and that makes me a happy man.

My confusion, however, has increased, because the documentation
suggests that glob is implemented in terms of bsd_glob. I added the
double quotes to keep glob from splitting the path on the spaces.
(Spaces in names are, IMO, one of the most unfortunate design
mistakes in Windows).

Anyway, thanks a million, Ilya,

Theo
 

Ilya Zakharevich

bsd_glob() does work consistently on both systems. This means I have a
solution and that makes me a happy man.

My confusion, however, has increased, because the documentation
suggests that glob is implemented in terms of bsd_glob.

... but interprets spaces differently...

I added the double quotes to avoid that glob splits the path on the
spaces.

And what made you think that this would "avoid this"? (Except, maybe,
experiments with a buggy implementation?)

Yours,
Ilya
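
The difference in how the two functions treat spaces can be checked with a small experiment. The following is only a sketch under stated assumptions: the scratch directory name "FI Resources" and the file names are made up for illustration, and the script needs a perl that ships File::Glob and File::Temp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical layout: a scratch directory whose name contains a space.
my $base   = tempdir(CLEANUP => 1);
my $subdir = "$base/FI Resources";
mkdir $subdir or die "can't mkdir $subdir: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$subdir/$name" or die "can't create $subdir/$name: $!";
    close $fh;
}

# Core glob splits its argument on whitespace, so the string below is
# treated as several patterns rather than one path. Keep only results
# that really lie inside $subdir.
my @core = grep { index($_, "$subdir/") == 0 } glob("$subdir/*");

# bsd_glob treats the whole string as a single pattern.
my @bsd = bsd_glob("$subdir/*");

print "glob:     ", scalar(@core), " match(es) inside the directory\n";
print "bsd_glob: ", scalar(@bsd),  " match(es) inside the directory\n";
```

With an unquoted path containing a space, bsd_glob should find both files while the plain glob call should find none of them. How glob treats embedded double quotes is a separate question that this sketch does not try to settle.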
 

Theo van den Heuvel

... but interprets spaces differently...

Ok. Something I am missing in the documentation.

And what made you think that this would "avoid this"?  (Except, maybe,
experiments with a buggy implementation?)

Yes. Naively, surely, I guessed from the fact that you can use double
quotes in a Windows command box in the same way. Prior to your
comments I had no indication that glob was buggy. I am still in shock
about that.

Yours,
Ilya

Thanks,

Theo
 

Martijn Lievaart

(Spaces in names IMO is one of the most unfortunate design mistakes in
Windows).

Spaces are fine. Newlines in filenames on Unix, now there is trouble!

M4
 

sreservoir

Spaces are fine. Newlines in filenames on Unix, now there is trouble!

not really. you can have newlines without trouble. however, if you use
one of those filesystems that let you have nulls in filenames, some of
the standard utilities might segfault or overflow.

funny how segfault is in my dictionary and not filename.
 

John Bokma

Theo van den Heuvel said:
the path on the spaces. (Spaces in names IMO is one of the most
unfortunate design mistakes in Windows).

Heh, I would say it the other way around: not supporting spaces in
filenames/directory names is a design mistake. (One that is even to some
extent visible on the www...)
 

Jürgen Exner

John Bokma said:
Heh, I would say it the other way around: not supporting spaces in
filenames/directory names is a design mistake.

Are there any widely used file systems that don't support spaces in file
names?

jue
 

Peter J. Holzer

not really. you can have newlines without trouble.

Right, the kernel API doesn't care about spaces or newlines. The only
characters (bytes) in filenames with special meaning are "/" and "\0".

But many standard utilities treat whitespace as delimiters. I can't
think of any which is explicitly intended for processing filenames
where newline is more special than other whitespace, but this is
certainly true for the general-purpose text processing tools (sort,
grep, ...).

Many GNU tools have an option to use "\0" instead of "\n"
as the record delimiter, so you can do something like
    find -print0 | grep -z | sort -z | xargs -0
but this isn't portable.

Writing shell scripts which correctly handle all filenames is possible
(at least on Linux) but you really have to know about and remember all
the corner cases. It is usually simpler to write a Perl script (although
Perl has its share of annoying DWIMmery, too).
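
As a rough illustration of the Perl route, the sketch below reads NUL-delimited names the way a consumer of `find -print0` output might. The in-memory stream and the three file names are invented for the example; a real script would read from a pipe or STDIN instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulated output of a NUL-delimited producer such as `find -print0`.
# The names are hypothetical; one contains a space, one a newline.
my $stream = join '', map { "$_\0" }
    "plain.txt", "with space.txt", "with\nnewline.txt";

open my $in, '<', \$stream or die "can't open in-memory stream: $!";

my @names;
{
    local $/ = "\0";              # records end at NUL, not at newline
    while (my $name = <$in>) {
        chomp $name;              # chomp strips the current $/ (the NUL)
        push @names, $name;
    }
}
close $in;

printf "%d names read\n", scalar @names;
```

Because the record separator is "\0", the embedded space and newline stay part of the name they belong to.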
however, if you use one of those filesystems that let you have nulls
in filenames, some of the standard utilities might segfault or
overflow.

That's impossible. All the syscalls dealing with filenames treat "\0" as
a terminator. There is no way to create or access a file with a null in
its name[1]. If a filesystem allows such names and there is a possibility
that they actually exist (e.g., the filesystem is on an external disk
previously mounted under another OS) then the filesystem code must
provide a translation.

hp

[1] Yes, I do remember the MacOS/SunOS/NFS disaster. But in this case
the SunOS NFS server code (residing in the kernel) in effect created
a second API for accessing files.
 

Peter J. Holzer

Peter J. Holzer said:
however, if you use one of those filesystems that let you have nulls
in filenames, some of the standard utilities might segfault or
overflow.

That's impossible. All the syscalls dealing with filenames treat "\0" as
a terminator. There is no way to create or access a file with a null in
its name[1].

All modern Win32 filesystems (FAT32, NTFS) represent filenames
internally as UCS-2 or UTF-16, which often contain nulls.

The context (at least of the last two postings before I replied) was
Unix, not Windows. On a POSIX compatible OS, the filesystem may use
UTF-16 to actually store filenames on disk, but it needs to translate
them in the API, because the string representation in the API
(zero-terminated byte strings) doesn't allow UTF-16.
UTF-8 is the most logical choice here.

The current official API (CreateFileW &c.) and the MS-specific
stdc-like wrappers (_wopen, _wfopen, &c.) all take
16-bit-null-delimited 16-bit strings.

Yes. But note that the API here is not byte-oriented but operates on
16-bit quantities. So the strings are still zero-terminated, and you
don't have a null *character* in the file name.
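
The distinction between zero bytes and zero 16-bit units can be made concrete with a few lines of Perl. This is only a sketch: it uses the directory name from the start of the thread as sample data and the Encode module, and it does not call any Win32 API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(encode);

my $name  = "FIResources";               # sample name taken from the thread
my $utf16 = encode('UTF-16LE', $name);   # byte string, two bytes per character here

my $nul_bytes = () = $utf16 =~ /\0/g;    # count zero bytes
my @units     = unpack 'v*', $utf16;     # little-endian 16-bit code units
my $nul_units = grep { $_ == 0 } @units; # count zero code units

printf "bytes: %d, NUL bytes: %d\n", length($utf16), $nul_bytes;
printf "16-bit units: %d, NUL units: %d\n", scalar(@units), $nul_units;
```

Seen as bytes, half of the encoded name is zero; seen as 16-bit units, none of it is, so a 16-bit zero can still serve as the terminator.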

The 8-bit 'ANSI' API translates filenames to and from some 8-bit or
multibyte encoding, specified as the current process 'code page'. Since
processes are not normally using a UTF-8 code page, this means some
names are untranslatable.

This is all a serious, and seriously annoying, issue for perl on Win32.

On Win32, the Right Thing(TM) would probably be to always use the UTF-16
API and translate from/to Perl character strings. That would be an
incompatibility with Unix perl where filenames are byte strings, but
every alternative seems worse to me.

hp
 

Peter J. Holzer

Are there any widely used file systems that don't support spaces in file
names?

I don't know any filesystem which doesn't support spaces (even FAT-16
back in MS-DOS 3.x days did). The problem isn't the filesystem but the
tools and applications. If the file system didn't support spaces that
wouldn't be a big deal: The user would simply use a different character
(maybe "_" or "-"). But if the filesystem does support spaces but some
tools don't, then you have a problem: The user will create files with
spaces (because he can) and then some tools will fail. (What Microsoft
really fucked up in Win95 was that although there were some important
directories with spaces in the default installation ("Program Files",
...) some core OS tools couldn't handle them. Hilarity ensued ...)

hp
 

Ilya Zakharevich

That's impossible. All the syscalls dealing with filenames treat "\0" as
a terminator. There is no way to create or access a file with a null in
its name[1].

It is not a problem to create or access a file with a null in its name
on Unix. (Remember read()/write() syscalls?)

Hope this helps,
Ilya
 

Ilya Zakharevich

All modern Win32 filesystems (FAT32, NTFS) represent filenames
internally as UCS-2 or UTF-16, which often contain nulls.

The internal representation of a directory entry on a raw file system
should not matter when accessing files through the OS's API.

The current official API (CreateFileW &c.) and the MS-specific stdc-like
wrappers (_wopen, _wfopen, &c.) all take 16-bit-null-delimited 16-bit
strings.

So there is no problem: 0 terminates the name.

The 8-bit 'ANSI' API translates filenames to and from some 8-bit or
multibyte encoding, specified as the current process 'code page'.

Likewise.

Since processes are not normally using a UTF-8 code page, this means some
names are untranslatable.

AFAIK, any file name on Win32 is translatable to 8.3. But I might be wrong...

This is all a serious, and seriously annoying, issue for perl on Win32.

Only due to bugs in the porting layer.

Yours,
Ilya
 

Jürgen Exner

Peter J. Holzer said:
I don't know any filesystem which doesn't support spaces (even FAT-16
back in MS-DOS 3.x days did). The problem isn't the filesystem but the
tools and applications. If the file system didn't support spaces that
wouldn't be a big deal: The user would simply use a different character
(maybe "_" or "-"). But if the filesystem does support spaces but some
tools don't, then you have a problem: The user will create files with
spaces (because he can) and then some tools will fail.

Using the same logic we should not use any characters but ASCII. After
all the user would simply use a different character and the programs
would not fail any longer on those nasty non-ASCII characters. Would
make life a lot easier for programmers, wouldn't it?

Jürgen
J Rgen
Jrgen
Jürgen

jue
 

Peter J. Holzer

Using the same logic we should not use any characters but ASCII.

Huh? Where did I say that?

What we should do of course is to write tools and applications which do
work well with arbitrary file names. Spaces in file names have been
around since at least the 1970s and common since at least the
mid-1990s. It's time that programmers (and sysadmins) stop pretending
they don't exist.

hp
 

Jürgen Exner

Peter J. Holzer said:
Huh? Where did I say that?

What we should do of course is to write tools and applications which do
work well with arbitrary file names. Spaces in file names have been
around since at least the 1970s and common since at least the
mid-1990s. It's time that programmers (and sysadmins) stop pretending
they don't exist.

My apologies, obviously I totally misunderstood the drift of your
earlier posting. I honestly thought you were blaming the file system,
not the tools. Again, my apologies.

jue
 

Peter J. Holzer

That's impossible. All the syscalls dealing with filenames treat "\0" as
a terminator. There is no way to create or access a file with a null in
its name[1].

It is not a problem to create or access a file with a null in its name
on Unix. (Remember read()/write() syscalls?)

read and write syscalls do not create or access files. Unless you are
talking about opening the block device and reading from/writing to that.
But in that case you aren't "accessing files with a null in its name",
you are just accessing one huge file the size of your (logical) disk.

hp
 
