Inconsistent results from (dos)glob

Discussion in 'Perl Misc' started by Theo van den Heuvel, Jan 27, 2010.

  1. Hi,

    The following script behaves differently on two Windows machines, one
    XP with v5.10.0, the other Vista with v5.10.1. Here is the script:

    <script>

    #!/usr/bin/perl
    use strict;
    use warnings;

    use FindBin qw($Bin);
    # use File::DosGlob 'glob';

    my $subdir = "$Bin/FIResources";
    my @file = <"$subdir/*">;
    my @file_again = glob("\"$subdir\"/*");
    print "angular: ", join(', ', @file), "\n";
    print "glob: ", join(', ', @file_again), "\n";

    </script>

    On XP the script lists the files in both print statements. On that
    system the path
    contains spaces (I know, not my idea).
    The Vista directory does not contain spaces, but both arrays remain
    empty.

    Any suggestions?

    Thanks

    Theo van den Heuvel
    Theo van den Heuvel, Jan 27, 2010
    #1

  2. On 28 jan, 00:53, Ben Morrow <> wrote:
    > Quoth Theo van den Heuvel <>:
    >


    >
    > > The following script behaves differently on two Windows machines, one
    > > XP with v5.10.0, the other Vista with v5.10.1. Here is the script:

    >
    > > <script>



    >
    > > Any suggestions?

    >
    > Have you verified that FindBin is working properly (it doesn't always)
    > and that the FIResources directory actually exists?
    >
    > Do you have read permission (or whatever permission is required on Win32
    > to call readdir) on the directory? What do you get if you call
    > opendir/readdir directly?
    >
    > Ben


    Hi Ben,

    FindBin was working properly and there is no difference in permissions
    that I am aware of.

    I added:

    opendir(my $dir, $subdir) or die "can't opendir $subdir: $!";
    my @file_yetagain = readdir($dir);
    print "readdir: ", join(', ', @file_yetagain), "\n";
    closedir $dir;

    and that works okay on both platforms. That proves that the files are
    really there.

    I am still stumped on how the script can fail on the second machine.

    Thanks,

    Theo
    Theo van den Heuvel, Jan 28, 2010
    #2
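The opendir/readdir check above returns bare names and includes "." and "..", so its output is not directly comparable with what a "$subdir/*" glob would return. A minimal sketch of a readdir-based listing that mimics the glob result follows. The helper name list_dir and the temporary "FI Resources" directory are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Temp qw(tempdir);

# Return the entries of $dir as "$dir/name" paths, sorted, skipping
# names that start with a dot (which also drops "." and "..").
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @names = grep { !/^\./ } readdir($dh);
    closedir $dh;
    return map { "$dir/$_" } sort @names;
}

# Hypothetical test directory; its name contains a space on purpose.
my $base   = tempdir(CLEANUP => 1);
my $subdir = "$base/FI Resources";
mkdir $subdir or die "can't mkdir $subdir: $!";
for my $name ('a.txt', 'b c.txt', '.hidden') {
    open my $fh, '>', "$subdir/$name" or die "can't create $subdir/$name: $!";
    close $fh;
}

my @listed = list_dir($subdir);
print "readdir: ", join(', ', @listed), "\n";
```

Because no pattern parsing is involved, whitespace in the directory name plays no role here, which is probably why this check succeeded on both machines.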

  3. On 28 jan, 02:25, Ben Morrow <> wrote:
    > Quoth Theo van den Heuvel <>:
    >


    >
    > Then I'm afraid you need to go grubbing around in File::DosGlob adding
    > debug statements until you find where the problem is (or use the
    > debugger, if that's your cup of tea).
    >
    > Ben


    Something like that. However, both DosGlob and the ordinary glob
    misbehave.
    I need to get some sleep first, and will apply the debugger first
    thing in the morning.

    Theo
    Theo van den Heuvel, Jan 28, 2010
    #3
  4. On 2010-01-27, Theo van den Heuvel <> wrote:
    > my $subdir = "$Bin/FIResources";
    > my @file = <"$subdir/*">;
    > my @file_again = glob("\"$subdir\"/*");


    If this works, then ONLY due to bugs in glob() (this is IMO; prove me wrong if
    you can). Use
    bsd_glob( "$subdir/*" )
    instead.

    Hope this helps,
    Ilya
    Ilya Zakharevich, Jan 28, 2010
    #4
  5. On 28 jan, 04:42, Ilya Zakharevich <> wrote:
    > On 2010-01-27, Theo van den Heuvel <> wrote:
    >
    > > my $subdir = "$Bin/FIResources";
    > > my @file = <"$subdir/*">;
    > > my @file_again = glob("\"$subdir\"/*");

    >
    > If this works, then ONLY due to bugs in glob() (this is IMO; prove me wrong if
    > you can).  Use
    >   bsd_glob( "$subdir/*" )
    > instead.
    >
    > Hope this helps,
    > Ilya


    Dear Ilya,

    bsd_glob() does work consistently on both systems. This means I have a
    solution and that makes me a happy man.

    My confusion, however, has increased, because the documentation
    suggests that glob is implemented in terms of bsd_glob. I added the
    double quotes to avoid that glob splits
    the path on the spaces. (Spaces in names IMO is one of the most
    unfortunate design mistakes in Windows).

    Anyway, thanks a million, Ilya,

    Theo
    Theo van den Heuvel, Jan 28, 2010
    #5
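For reference, the bsd_glob() fix mentioned above can be sketched as follows. bsd_glob is exported on request by File::Glob; the temporary directory and file names are made up so that the example is self-contained, and the "FI Resources" name (with a space) is an assumption standing in for the real path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical directory whose name contains a space.
my $base   = tempdir(CLEANUP => 1);
my $subdir = "$base/FI Resources";
mkdir $subdir or die "can't mkdir $subdir: $!";

for my $name ('a.txt', 'b c.txt') {
    open my $fh, '>', "$subdir/$name" or die "can't create $subdir/$name: $!";
    close $fh;
}

# bsd_glob takes its argument as one pattern, so the space in the
# directory name needs no extra quoting.
my @files = bsd_glob("$subdir/*");
print "bsd_glob: ", join(', ', @files), "\n";
```

Forward slashes are used throughout; on Windows a backslash in the pattern can be taken as a quoting character, so converting separators first is the safer choice.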
  6. On 2010-01-28, Theo van den Heuvel <> wrote:
    >> > my @file_again = glob("\"$subdir\"/*");

    >>
    >> If this works, then ONLY due to bugs in glob() (this is IMO; prove me wrong if
    >> you can).  Use
    >>   bsd_glob( "$subdir/*" )
    >> instead.


    > bsd_glob() does work consistently on both systems. This means I have a
    > solution and that makes me a happy man.
    >
    > My confusion, however, has increased, because the documentation
    > suggests that glob is implemented in terms of bsd_glob.


    ... but interprets spaces differently...

    > I added the double quotes to avoid that glob splits the path on the
    > spaces.


    And what made you think that this would "avoid this"? (Except, maybe,
    experiments with a buggy implementation?)

    Yours,
    Ilya
    Ilya Zakharevich, Jan 28, 2010
    #6
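The difference in how spaces are interpreted can be made visible with a small experiment. This is only a sketch: it assumes the temporary directory path itself contains no whitespace (usually true on Unix, not guaranteed on Windows), and the file names are invented for the test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Assumption: the path returned by tempdir contains no whitespace.
my $dir = tempdir(CLEANUP => 1);
for my $name ('one.txt', 'two.txt') {
    open my $fh, '>', "$dir/$name" or die "can't create $dir/$name: $!";
    close $fh;
}

# One string that contains a space between two wildcard patterns.
my $pattern = "$dir/o* $dir/t*";

# The builtin glob splits the string on whitespace and expands
# each piece as a separate pattern.
my @split = glob($pattern);

# bsd_glob treats the whole string, space included, as one pattern.
my @single = bsd_glob($pattern);

printf "glob:     %d match(es)\n", scalar @split;
printf "bsd_glob: %d match(es)\n", scalar @single;
```

With glob the two pieces match one file each, while bsd_glob looks for a single path with a space in the middle and finds nothing.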
  7. On 28 jan, 12:39, Ilya Zakharevich <> wrote:
    > On 2010-01-28, Theo van den Heuvel <> wrote:
    >
    > >> > my @file_again = glob("\"$subdir\"/*");

    >


    > > My confusion, however, has increased, because the documentation
    > > suggests that glob is implemented in terms of bsd_glob.

    >
    > ... but interprets spaces differently...


    Ok. Something I am missing in the documentation.

    >
    > > I added the double quotes to avoid that glob splits the path on the
    > > spaces.

    >
    > And what made you think that this would "avoid this"?  (Except, maybe,
    > experiments with a buggy implementation?)


    Yes. Naively, surely, I guessed from the fact that you can use double
    quotes in a Windows command box in the same way. Prior to your
    comments I had no indication that glob was buggy. I am still in shock
    about that.

    >
    > Yours,
    > Ilya


    Thanks,

    Theo
    Theo van den Heuvel, Jan 28, 2010
    #7
  8. [OT] Re: Inconsistent results from (dos)glob

    On Thu, 28 Jan 2010 00:34:37 -0800, Theo van den Heuvel wrote:

    > (Spaces in names IMO is one of the most unfortunate design mistakes in
    > Windows).


    Spaces are fine. Newlines in filenames on Unix, now there is trouble!

    M4
    Martijn Lievaart, Jan 28, 2010
    #8
  9. Re: [OT] Re: Inconsistent results from (dos)glob

    On 1/28/2010 2:40 PM, Martijn Lievaart wrote:
    > On Thu, 28 Jan 2010 00:34:37 -0800, Theo van den Heuvel wrote:
    >
    >> (Spaces in names IMO is one of the most unfortunate design mistakes in
    >> Windows).

    >
    > Spaces are fine. Newlines in filenames on Unix, now there is trouble!


    not really. you can have newlines without trouble. however, if you use
    one of those filesystems that let you have nulls in filenames, some of
    the standard utilities might segfault or overflow.

    funny how segfault is in my dictionary and not filename.

    --

    "Six by nine. Forty two."
    "That's it. That's all there is."
    "I always thought something was fundamentally wrong with the universe"
    sreservoir, Jan 29, 2010
    #9
  10. Theo van den Heuvel <> writes:

    > the path on the spaces. (Spaces in names IMO is one of the most
    > unfortunate design mistakes in Windows).


    Heh, I would say it the other way around: not supporting spaces in
    filenames/directory names is a design mistake. (One that is even to some
    extent visible on the www...)

    --
    John Bokma j3b

    Hacking & Hiking in Mexico - http://johnbokma.com/
    http://castleamber.com/ - Perl & Python Development
    John Bokma, Jan 29, 2010
    #10
  11. John Bokma <> wrote:
    >Theo van den Heuvel <> writes:
    >
    >> the path on the spaces. (Spaces in names IMO is one of the most
    >> unfortunate design mistakes in Windows).

    >
    >Heh, I would say it the other way around: not supporting spaces in
    >filenames/directory names is a design mistake.


    Are there any widely used file systems that don't support spaces in file
    names?

    jue
    Jürgen Exner, Jan 29, 2010
    #11
  12. Re: [OT] Re: Inconsistent results from (dos)glob

    On 2010-01-29 00:53, sreservoir <> wrote:
    > On 1/28/2010 2:40 PM, Martijn Lievaart wrote:
    >> On Thu, 28 Jan 2010 00:34:37 -0800, Theo van den Heuvel wrote:
    >>
    >>> (Spaces in names IMO is one of the most unfortunate design mistakes in
    >>> Windows).

    >>
    >> Spaces are fine. Newlines in filenames on Unix, now there is trouble!

    >
    > not really. you can have newlines without trouble.


    Right, the kernel API doesn't care about spaces or newlines. The only
    characters (bytes) in filenames with special meaning are "/" and "\0".

    But many standard utilities treat whitespace as delimiters. I can't
    think of any which is explicitly intended for processing filenames
    where newline is more special than other whitespace, but this is
    certainly true for the general-purpose text processing tools (sort,
    grep, ...).

    Many GNU tools have an option to use "\0" instead of "\n"
    as the record delimiter, so you can do something like
    find . -print0 | grep -z PATTERN | sort -z | xargs -0 COMMAND
    but this isn't portable.

    Writing shell scripts which correctly handle all filenames is possible
    (at least on Linux) but you really have to know about and remember all
    the corner cases. It is usually simpler to write a Perl script (although
    Perl has its share of annoying DWIMmery, too).

    > however, if you use one of those filesystems that let you have nulls
    > in filenames, some of the standard utilities might segfault or
    > overflow.


    That's impossible. All the syscalls dealing with filenames treat "\0" as
    a terminator. There is no way to create or access a file with a null in
    its name[1]. If a filesystem allows such names and there is a possibility
    that they actually exist (e.g., the filesystem is on an external disk
    previously mounted under another OS) then the filesystem code must
    provide a translation.

    hp

    [1] Yes, I do remember the MacOS/SunOS/NFS disaster. But in this case
    the SunOS NFS server code (residing in the kernel) in effect created
    a second API for accessing files.
    Peter J. Holzer, Jan 29, 2010
    #12
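The NUL-delimited approach mentioned above carries over to Perl by setting the input record separator. A minimal sketch, reading from an in-memory handle so that it runs anywhere; the sample names are invented and merely stand in for what "find -print0" might produce:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: three names, each terminated by NUL.
my @sample = ("plain.txt", "with space.txt", "with\nnewline.txt");
my $data   = join '', map { "$_\0" } @sample;

open my $in, '<', \$data or die "can't open in-memory handle: $!";

my @names;
{
    local $/ = "\0";          # records end at NUL, not at newline
    while (my $name = <$in>) {
        chomp $name;          # chomp strips the current $/, i.e. the NUL
        push @names, $name;
    }
}
close $in;

printf "%d names read\n", scalar @names;
```

In a real pipeline the in-memory handle would be replaced by STDIN or a pipe; the names, spaces and newlines included, arrive intact because only NUL ends a record.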
  13. Re: [OT] Re: Inconsistent results from (dos)glob

    On 2010-01-29 13:03, Ben Morrow <> wrote:
    > Quoth "Peter J. Holzer" <>:
    >> On 2010-01-29 00:53, sreservoir <> wrote:
    >> > however, if you use one of those filesystems that let you have nulls
    >> > in filenames, some of the standard utilities might segfault or
    >> > overflow.

    >>
    >> That's impossible. All the syscalls dealing with filenames treat "\0" as
    >> a terminator. There is no way to create or access a file with a null in
    >> its name[1].

    >
    > All modern Win32 filesystems (FAT32, NTFS) represent filenames
    > internally as UCS-2 or UTF-16, which often contain nulls.


    The context (at least of the last two postings before I replied) was
    Unix, not Windows. On a POSIX compatible OS, the filesystem may use
    UTF-16 to actually store filenames on disk, but it needs to translate
    them in the API, because the string representation in the API
    (zero-terminated byte strings) doesn't allow UTF-16.
    UTF-8 is the most logical choice here.


    > The current official API (CreateFileW &c.) and the MS-specific
    > stdc-like wrappers (_wopen, _wfopen, &c.) all take
    > 16-bit-null-delimited 16-bit strings.


    Yes. But note that the API here is not byte-oriented but operates on
    16-bit quantities. So the strings are still zero-terminated, and you
    don't have a null *character* in the file name.


    > The 8-bit 'ANSI' API translates filenames to and from some 8-bit or
    > multibyte encoding, specified as the current process 'code page'. Since
    > processes are not normally using a UTF-8 code page, this means some
    > names are untranslatable.
    >
    > This is all a serious, and seriously annoying, issue for perl on Win32.


    On Win32, the Right Thing(TM) would probably be to always use the UTF-16
    API and translate from/to Perl character strings. That would be an
    incompatibility with Unix perl where filenames are byte strings, but
    every alternative seems worse to me.

    hp
    Peter J. Holzer, Jan 29, 2010
    #13
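The distinction between NUL bytes and a NUL character in a 16-bit string can be checked with the core Encode module. This is only an illustration of the encoding; it does not call any Win32 API, and the file name is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw(encode);

# Hypothetical file name, ASCII only.
my $name  = "a b.txt";
my $bytes = encode('UTF-16LE', $name);

# Viewed as bytes, every second byte of this string is NUL.
my $nul_bytes = () = $bytes =~ /\0/g;

# Viewed as 16-bit little-endian units, none of them is zero.
my @units     = unpack('v*', $bytes);
my $nul_units = grep { $_ == 0 } @units;

printf "%d bytes, %d NUL bytes, %d NUL 16-bit units\n",
    length($bytes), $nul_bytes, $nul_units;
```

A byte-oriented API would stop at the first NUL byte, while an API that works on 16-bit units sees no terminator until one is appended explicitly.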
  14. On 2010-01-29 03:21, Jürgen Exner <> wrote:
    > John Bokma <> wrote:
    >>Theo van den Heuvel <> writes:
    >>
    >>> the path on the spaces. (Spaces in names IMO is one of the most
    >>> unfortunate design mistakes in Windows).

    >>
    >>Heh, I would say it the other way around: not supporting spaces in
    >>filenames/directory names is a design mistake.

    >
    > Are there any widely used file systems that don't support spaces in file
    > names?


    I don't know any filesystem which doesn't support spaces (even FAT-16
    back in MS-DOS 3.x days did). The problem isn't the filesystem but the
    tools and applications. If the file system didn't support spaces that
    wouldn't be a big deal: The user would simply use a different character
    (maybe "_" or "-"). But if the filesystem does support spaces but some
    tools don't, then you have a problem: The user will create files with
    spaces (because he can) and then some tools will fail. (What Microsoft
    really fucked up in Win95 was that although there were some important
    directories with spaces in the default installation ("Program Files",
    ...) some core OS tools couldn't handle them. Hilarity ensued ...)

    hp
    Peter J. Holzer, Jan 29, 2010
    #14
  15. Re: [OT] Re: Inconsistent results from (dos)glob

    On 2010-01-29, Peter J. Holzer <> wrote:
    > That's impossible. All the syscalls dealing with filenames treat "\0" as
    > a terminator. There is no way to create or access a file with a null in
    > its name[1].


    It is not a problem to create or access a file with a null in its name
    on Unix. (Remember read()/write() syscalls?)

    Hope this helps,
    Ilya
    Ilya Zakharevich, Jan 29, 2010
    #15
  16. Re: [OT] Re: Inconsistent results from (dos)glob

    On 2010-01-29, Ben Morrow <> wrote:
    > All modern Win32 filesystems (FAT32, NTFS) represent filenames
    > internally as UCS-2 or UTF-16, which often contain nulls.


    The internal representation of a directory entry on a raw file system
    should not matter when accessing files through the OS's API.

    > The current
    > official API (CreateFileW &c.) and the MS-specific stdc-like wrappers
    > (_wopen, _wfopen, &c.) all take 16-bit-null-delimited 16-bit strings.


    So there is no problem: 0 terminates the name.

    > The 8-bit 'ANSI' API translates filenames to and from some 8-bit or
    > multibyte encoding, specified as the current process 'code page'.


    Likewise.

    > Since
    > processes are not normally using a UTF-8 code page, this means some
    > names are untranslatable.


    AFAIK, any file name on Win32 is translatable to 8.3. But I might be wrong...

    > This is all a serious, and seriously annoying, issue for perl on Win32.


    Only due to bugs in the porting layer.

    Yours,
    Ilya
    Ilya Zakharevich, Jan 29, 2010
    #16
  17. "Peter J. Holzer" <> wrote:
    >On 2010-01-29 03:21, Jürgen Exner <> wrote:
    >> John Bokma <> wrote:
    >>>Theo van den Heuvel <> writes:
    >>>
    >>>> the path on the spaces. (Spaces in names IMO is one of the most
    >>>> unfortunate design mistakes in Windows).
    >>>
    >>>Heh, I would say it the other way around: not supporting spaces in
    >>>filenames/directory names is a design mistake.

    >>
    >> Are there any widely used file systems that don't support spaces in file
    >> names?

    >
    >I don't know any filesystem which doesn't support spaces (even FAT-16
    >back in MS-DOS 3.x days did). The problem isn't the filesystem but the
    >tools and applications. If the file system didn't support spaces that
    >wouldn't be a big deal: The user would simply use a different character
    >(maybe "_" or "-"). But if the filesystem does support spaces but some
    >tools don't, then you have a problem: The user will create files with
    >spaces (because he can) and then some tools will fail.


    Using the same logic we should not use any characters but ASCII. After
    all the user would simply use a different character and the programs
    would not fail any longer on those nasty non-ASCII characters. Would
    make life a lot easier for programmers, wouldn't it?

    Jürgen
    J Rgen
    Jrgen
    Jürgen

    jue
    Jürgen Exner, Jan 29, 2010
    #17
  18. On 2010-01-29 17:27, Jürgen Exner <> wrote:
    > "Peter J. Holzer" <> wrote:
    >>On 2010-01-29 03:21, Jürgen Exner <> wrote:
    >>> John Bokma <> wrote:
    >>>>Theo van den Heuvel <> writes:
    >>>>> the path on the spaces. (Spaces in names IMO is one of the most
    >>>>> unfortunate design mistakes in Windows).
    >>>>
    >>>>Heh, I would say it the other way around: not supporting spaces in
    >>>>filenames/directory names is a design mistake.
    >>>
    >>> Are there any widely used file systems that don't support spaces in file
    >>> names?

    >>
    >>I don't know any filesystem which doesn't support spaces (even FAT-16
    >>back in MS-DOS 3.x days did). The problem isn't the filesystem but the
    >>tools and applications. If the file system didn't support spaces that
    >>wouldn't be a big deal: The user would simply use a different character
    >>(maybe "_" or "-"). But if the filesystem does support spaces but some
    >>tools don't, then you have a problem: The user will create files with
    >>spaces (because he can) and then some tools will fail.

    >
    > Using the same logic we should not use any characters but ASCII.


    Huh? Where did I say that?

    What we should do of course is to write tools and applications which do
    work well with arbitrary file names. Spaces in file names have been
    around since at least the 1970's and common since at least the
    mid-1990's. It's time that programmers (and sysadmins) stop pretending
    they don't exist.

    hp
    Peter J. Holzer, Jan 29, 2010
    #18
  19. "Peter J. Holzer" <> wrote:
    >On 2010-01-29 17:27, Jürgen Exner <> wrote:
    >> "Peter J. Holzer" <> wrote:
    >>>On 2010-01-29 03:21, Jürgen Exner <> wrote:
    >>>> John Bokma <> wrote:
    >>>>>Theo van den Heuvel <> writes:
    >>>>>> the path on the spaces. (Spaces in names IMO is one of the most
    >>>>>> unfortunate design mistakes in Windows).
    >>>>>
    >>>>>Heh, I would say it the other way around: not supporting spaces in
    >>>>>filenames/directory names is a design mistake.
    >>>>
    >>>> Are there any widely used file systems that don't support spaces in file
    >>>> names?
    >>>
    >>>I don't know any filesystem which doesn't support spaces (even FAT-16
    >>>back in MS-DOS 3.x days did). The problem isn't the filesystem but the
    >>>tools and applications. If the file system didn't support spaces that
    >>>wouldn't be a big deal: The user would simply use a different character
    >>>(maybe "_" or "-"). But if the filesystem does support spaces but some
    >>>tools don't, then you have a problem: The user will create files with
    >>>spaces (because he can) and then some tools will fail.

    >>
    >> Using the same logic we should not use any characters but ASCII.

    >
    >Huh? Where did I say that?
    >
    >What we should do of course is to write tools and applications which do
    >work well with arbitrary file names. Spaces in file names have been
    >around since at least the 1970's and common since at least the
    >mid-1990's. It's time that programmers (and sysadmins) stop pretending
    >they don't exist.


    My apologies, obviously I totally misunderstood the drift of your
    earlier posting. I honestly thought you were blaming the file system,
    not the tools. Again, my apologies.

    jue
    Jürgen Exner, Jan 29, 2010
    #19
  20. Re: [OT] Re: Inconsistent results from (dos)glob

    On 2010-01-29 17:17, Ilya Zakharevich <> wrote:
    > On 2010-01-29, Peter J. Holzer <> wrote:
    >> That's impossible. All the syscalls dealing with filenames treat "\0" as
    >> a terminator. There is no way to create or access a file with a null in
    >> its name[1].

    >
    > It is not a problem to create or access a file with a null in its name
    > on Unix. (Remember read()/write() syscalls?)


    read and write syscalls do not create or access files. Unless you are
    talking about opening the block device and reading from/writing to that.
    But in that case you aren't "accessing files with a null in its name",
    you are just accessing one huge file the size of your (logical) disk.

    hp
    Peter J. Holzer, Jan 29, 2010
    #20