Oddity with File::Find and timestamps

Dave Saville

Playing with File::Find and I seem to have found an oddity.

In a "wanted" routine -M _ differs from -M $_ depending on whether one
processes files or not: Test directory has two sub directories in it.

[T:\tmp\Test]ls -lR
total 0
drwxrwx--- 0 Dec 24 14:42 dir1
drwxrwx--- 0 Dec 24 14:42 dir2

dir1:
total 0
-rw-rw---a 0 Dec 24 14:37 stuff1

dir2:
total 0
-rw-rw---a 0 Dec 24 14:42 stuff2

use strict;
use warnings;
use File::Find;
my $d = shift || '';
#print time(), "\n";
$^T=1293201891; # fix basetime to keep numbers the same between runs
finddepth(\&notwanted, '.');
exit;
sub notwanted
{
return if m/^\.$/; # Don't need '.'
return if $d && ! -d; # only directories
print $_, ' ';
print -M _, " ";
print -M $_, " ";
print "\n";
}

[T:\tmp\Test]../try.pl
stuff1 0.00533564814814815 0.00533564814814815
dir1 0.00533564814814815 0.00167824074074074
stuff2 0.00140046296296296 0.00140046296296296
dir2 0.00140046296296296 0.00149305555555556

[T:\tmp\Test]../try.pl d
dir1 0.00167824074074074 0.00167824074074074
dir2 0.00149305555555556 0.00149305555555556

find() and finddepth() behave the same way, which is not surprising, as
find() always does a finddepth and inverts the results. It's not the OS
either: the same thing happens on two different ones, OS/2 and Ubuntu.

Hmm.......... Knowing me I am missing something :)
 
Skye Shaw!@#$

Playing with File::Find and I seem to have found an oddity.

In a "wanted" routine -M _ differs from -M $_ depending on whether one
processes files or not:

<snip directory listing and some code>

finddepth(\&notwanted, '.');
exit;
sub notwanted
{
  return if m/^\.$/; # Don't need '.'
  return if $d && ! -d; # only directories
  print $_, ' ';
  print -M _, " ";
  print -M $_, " ";
  print "\n";    

}


Try passing this function to finddepth()

sub notwanted
{
return if m/^\.$/; # Don't need '.'
return if $d && ! -d; # only directories
print $_, ' ';
print -M _, " ";
print " (using a file's stat entry) " if -f _;
print -M $_, " (using ${_}'s stat entry)";
print "\n";
}


using _ will use the last stat() call's info.

See perldoc -f -X
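
A minimal sketch of that point, outside File::Find. The scratch file and directory names and the helper cached_is_dir() are made up for illustration; the only thing shown is that _ keeps describing whatever was stat'ed last until another stat or filetest on a name replaces it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create an empty file and a directory in a scratch location.
my $tmp  = tempdir(CLEANUP => 1);
my $file = "$tmp/stuff1";
my $dir  = "$tmp/dir1";
open my $fh, '>', $file or die "open: $!";
close $fh;
mkdir $dir or die "mkdir: $!";

# Hypothetical helper: true if the cached stat buffer (_) describes a directory.
sub cached_is_dir { return (-d _) ? 1 : 0 }

stat($file) or die "stat: $!";      # fills _ with the file's data
my $after_file = cached_is_dir();   # _ still describes the plain file

-e $dir or die "missing $dir";      # a filetest on a name refreshes _
my $after_dir = cached_is_dir();    # _ now describes the directory

print "after stat(file): $after_file, after -e dir: $after_dir\n";
# prints "after stat(file): 0, after -e dir: 1"
```

So -M _ reports on whichever entry was stat'ed most recently, which need not be the one currently in $_.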


-Skye
 
Dave Saville

Try passing this function to finddepth()

sub notwanted
{
return if m/^\.$/; # Don't need '.'
return if $d && ! -d; # only directories
print $_, ' ';
print -M _, " ";
print " (using a file's stat entry) " if -f _;
print -M $_, " (using ${_}'s stat entry)";
print "\n";
}

[T:\tmp\test]../try1.pl
stuff1 0.00533564814814815 (using a file's stat entry) 0.00533564814814815 (using stuff1's stat entry)
dir1 0.00533564814814815 (using a file's stat entry) 0.00167824074074074 (using dir1's stat entry)
stuff2 0.00140046296296296 (using a file's stat entry) 0.00140046296296296 (using stuff2's stat entry)
dir2 0.00140046296296296 (using a file's stat entry) 0.00149305555555556 (using dir2's stat entry)

[T:\tmp\test]../try1.pl d
dir1 0.00167824074074074 0.00167824074074074 (using dir1's stat entry)
dir2 0.00149305555555556 0.00149305555555556 (using dir2's stat entry)

I don't see what that proved.
using _ will use the last stat() call's info.

Yes, I know - but the point is the docs imply that $_ has been stat'ed
before wanted() gets control for each file/dir.
 
Skye Shaw!@#$

I don't see what that proved.

It shows why you're not getting the output you expected.

Yes, I know - but the point is the docs imply that $_ has been stat'ed
before wanted() gets control for each file/dir.

Your original question did not imply that certain functionality
described in the docs was not being obtained in your program.

See File::Find's follow and follow_fast options.
 
Skye Shaw!@#$

No it doesn't.

The output differs because _ was never updated, which is what I
showed.
It shows a different way of getting the expected output.
(that is, a way that does not use the _ filehandle)

It shows that too.

-Skye
 
Peter J. Holzer

Huh?

What docs are you referring to?

perldoc File::Find, presumably:

| * It is guaranteed that an lstat has been called before the
| user’s "wanted()" function is called. This enables fast file
| checks involving _. Note that this guarantee no longer holds
| if follow or follow_fast are not set.

Maybe Dave read only the first but not the second sentence of this
paragraph.

hp
 
Dave Saville

When $d is false, there is no stat call.




When $d is false, there is still no stat call.

I know - I put that test in to save editing the script for doing all
or just doing directories every time.
Huh?

What docs are you referring to?

$_ will have been stat()ed only when $d is true.

Since you want it to be stat()ed every time, you must do the file
test before you test the $d flag:

return if ! -d && $d;
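
Why the order matters can be sketched as follows. This is only an illustration under stated assumptions: the scratch file and the helper refreshed_by() are hypothetical, and the flag is passed in as an argument instead of coming from the command line. With the flag first and false, && short-circuits and the filetest never runs, so _ is left untouched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $tmp  = tempdir(CLEANUP => 1);   # $tmp itself is the directory we test
my $file = "$tmp/stuff1";
open my $fh, '>', $file or die "open: $!";
close $fh;

# Hypothetical helper: prime _ with the plain file, evaluate one of the
# two test orders against the directory $tmp, then report whether _ was
# refreshed (1 means _ now describes the directory).
sub refreshed_by {
    my ($flag_first, $d) = @_;
    stat($file) or die "stat: $!";
    my $skip = $flag_first ? ($d && !(-d $tmp))    # -d skipped when $d is false
                           : (!(-d $tmp) && $d);   # -d always evaluated
    return (-d _) ? 1 : 0;
}

my $flag_first = refreshed_by(1, 0);
my $test_first = refreshed_by(0, 0);
print "flag first: $flag_first, test first: $test_first\n";
# prints "flag first: 0, test first: 1"
```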

Let's start again :)

use strict;
use warnings;
use File::Find;

$^T=1293201891; # fix basetime to keep numbers the same between runs
finddepth(\&wanted1, '.');
print "\n";
finddepth(\&wanted2, '.');
exit;

sub wanted1
{
return if m/^\.$/; # Don't need '.'
return if ! -d; # only directories
print $_, ' ', -M _, " ", "\n";
}

sub wanted2
{
return if m/^\.$/; # Don't need '.'
print $_, ' ', -M _, " ", "\n";
}


[T:\tmp\test]../try.pl
dir1 0.00167824074074074
dir2 0.00149305555555556

stuff1 0.00533564814814815
dir1 0.00533564814814815
stuff2 0.00140046296296296
dir2 0.00140046296296296

It looks to me that files always get a stat call performed for you,
but directories don't unless a test is done that implicitly calls
stat(). Inserting a "return if ! -e;" into wanted2 then gets the
correct time for the directories, whilst file timestamps remain the
same, indicating to me that files *always* get stat'ed before wanted
gets a look in.

I can't use follow* settings to check what Peter pointed out because
the port for OS/2, which does not have symlinks by default, fails with
"The stat preceding -l _ wasn't an lstat at
D:/usr/lib/perl/lib/5.8.2/File/Find.pm line 515". Find.pm obviously
needs whatever the code is there for Windows.



I gave this solution to your problem yesterday in the duplicate
version of this thread.

Why did you start a duplicate version of this thread?

Huh? I didn't. I see only one thread and never saw your answer
yesterday.
 
Dr.Ruud

sub wanted1
{
return if m/^\.$/; # Don't need '.'

ITYM:

return if $_ eq '.';

(and be aware that you are skipping a file called ".\n")

return if ! -d; # only directories
print $_, ' ', -M _, " ", "\n";

Are you afraid of printf?

printf qq{%s %s\n}, $_, -M _;

(and why did you have the space before the newline?)
 
Dave Saville

ITYM:

return if $_ eq '.';

(and be aware that you are skipping a file called ".\n")



Are you afraid of printf?

printf qq{%s %s\n}, $_, -M _;

(and why did you have the space before the newline?)

Because the whole sorry mess is me hacking stuff about inserting and
deleting bits of code and joining lines together to see WTF is going
on with the timestamps and whether or not stat() is automatically
involved. Which it appears not to be for directories and the default
mode of operation. I am sorry if my temporary diagnostics are not to
your purest taste. :)
 
Peter J. Holzer

Let's start again :)

use strict;
use warnings;
use File::Find;

$^T=1293201891; # fix basetime to keep numbers the same between runs
finddepth(\&wanted1, '.');
print "\n";
finddepth(\&wanted2, '.');
exit;

sub wanted1
{
return if m/^\.$/; # Don't need '.'
return if ! -d; # only directories
print $_, ' ', -M _, " ", "\n";
}

sub wanted2
{
return if m/^\.$/; # Don't need '.'
print $_, ' ', -M _, " ", "\n";
}


[T:\tmp\test]../try.pl
dir1 0.00167824074074074
dir2 0.00149305555555556

stuff1 0.00533564814814815
dir1 0.00533564814814815
stuff2 0.00140046296296296
dir2 0.00140046296296296

It looks to me that files always get a stat call performed for you,
but directories don't unless a test is done that implicitly calls
stat(). Inserting a "return if ! -e;" into wanted2 then gets the
correct time for the directories, whilst file timestamps remain the
same, indicating to me that files *always* get stat'ed before wanted
gets a look in.

I don't know the OS/2 file system. It is possible that on OS/2 every
directory entry has to be stat'ed to determine whether it is a
subdirectory or a file (on Unix this can sometimes be optimized).
Since you are using finddepth the sequence would be something like:

read directory . (returns ".", "..", "dir1", "dir2").
stat "dir1" (aha, it's a directory, so enter it)
read "." (returns ".", "..", "stuff1")
call wanted(".")
skip ".." (special case)
stat "stuff1" (it's a file)
call wanted("stuff1")
(now we are done with the contents of "dir1", so we return to the parent dir and)
call wanted("dir1")
stat "dir2" (aha, it's a directory, so enter it)
read "." (returns ".", "..", "stuff2")
call wanted(".")
skip ".." (special case)
stat "stuff2" (it's a file)
call wanted("stuff2")
(now we are done with the contents of "dir2", so we return to the parent dir and)
call wanted("dir2")

You see that dir1 and dir2 are stat'ed, but wanted() isn't called
immediately after the stat; it is called only after all the files in
the directory are stat'ed, too. So _ contains the data from the last
file.
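
The sequence above can be modelled with a toy walk. This is not File::Find's real implementation: the %tree hash, the walk() routine and the $last_stat variable (standing in for the _ buffer) are all hypothetical, and "." and ".." are left out for brevity. It only shows why a depth-first order leaves _ pointing at the last file when wanted() finally runs for the directory.

```perl
use strict;
use warnings;

# Toy directory tree: a hash ref is a directory, undef is a plain file.
my %tree = (
    dir1 => { stuff1 => undef },
    dir2 => { stuff2 => undef },
);

my $last_stat;   # stands in for Perl's cached stat buffer (_)
my @report;      # what _ described when the wanted() stand-in ran

# Depth-first walk: "stat" an entry, descend if it is a directory, and
# only then record the "wanted" call for that entry.
sub walk {
    my ($dir) = @_;
    for my $name (sort keys %$dir) {
        $last_stat = $name;                          # "stat $name"
        walk($dir->{$name}) if ref $dir->{$name};    # enter directory
        push @report, "$name: _ holds $last_stat";   # "wanted($name)"
    }
    return;
}

walk(\%tree);
print "$_\n" for @report;
# prints:
# stuff1: _ holds stuff1
# dir1: _ holds stuff1
# stuff2: _ holds stuff2
# dir2: _ holds stuff2
```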


The lesson you should learn here is not that File::Find behaves as
in this example (it behaves differently on a Unix system),
but that if the docs say you cannot rely on something you really
shouldn't rely on it, even if it seems to work most of the time.

hp
 
Dave Saville

Then there is no guarantee that lstat has been called.

So you need to arrange to have it called yourself.

I had worked that out by now :) But....

If, as it appears, all but directories *are* stat'ed then by doing a
test that invokes stat one is doing it twice for 99% of the cases. Now
I agree that most of the time this matters not a jot, but it just so
happens that the structure I am really processing, rather than the
Mickey Mouse test case, has upwards of 60 directories with around
100,000 files. That *must* be a performance hit surely?
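
One way to keep the extra cost to a single call per entry, sketched under assumptions: do one explicit lstat at the top of the callback and run every later test against _. The scratch tree and the %is_dir / %age hashes are made up for illustration, and whether this is measurably slower than relying on File::Find's own stat has not been benchmarked here.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny scratch tree: one directory containing one empty file.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/dir1" or die "mkdir: $!";
open my $fh, '>', "$root/dir1/stuff1" or die "open: $!";
close $fh;

my (%is_dir, %age);

sub wanted {
    return if $_ eq '.';
    lstat $_ or return;               # the one explicit call for this entry
    $is_dir{$_} = (-d _) ? 1 : 0;     # answered from the cached buffer
    $age{$_}    = -M _;               # likewise, no further system call
    return;
}

finddepth(\&wanted, $root);

printf "%s dir=%d age=%s\n", $_, $is_dir{$_}, $age{$_} for sort keys %is_dir;
```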
The one with:

Subject: Oddity with Find::File and -M

As opposed to this thread with:

Subject: Oddity with File::Find and timestamps

Hmm, then there is something very wrong with my newsreader. I posted
the first - but I never saw it nor a reply *and* upon checking found
it was not in my "sent" folder so I assumed I had closed the compose
window rather than hitting send and so did it again. Sorry if it
caused confusion. I will take it up with the maintainer of the
newsreader.

Thanks for helping get things straight in my mind.
 
Dave Saville

On Sun, 26 Dec 2010 23:20:29 UTC, "Peter J. Holzer"

You see that dir1 and dir2 are stat'ed, but wanted() isn't called
immediately after the stat; it is called only after all the files in
the directory are stat'ed, too. So _ contains the data from the last
file.

Makes complete sense.
The lesson you should learn here is not that File::Find behaves as
in this example (it behaves differently on a Unix system),

Actually it fails the same way on Ubuntu - First thing I tried.
but that if the docs say you cannot rely on something you really
shouldn't rely on it, even if it seems to work most of the time.

Actually, I had not read that before you, or someone else, pointed it
out. Perldoc does not play well on OS/2 and I usually don't bother and
use Google. Unfortunately, I found what must have been an early
version that did not carry that particular nugget of information.
Previous uses and sample code I had seen suggested it always worked -
hence the confusion. Law of expected behaviour :)

Thanks.
 
Ilya Zakharevich

Actually, I had not read that before you, or someone else, pointed it
out. Perldoc does not play well on OS/2

Works absolutely fine here (except that it does not allow
customization of less switches, but neither does it on other
platforms...).

Ilya
 
