Perl and du output difference.


george.e.sullivan

In a closed thread Mr. John W. Kahn posted a script to add up directory
usage per user that produces simple output such as:

userA 112345
userB 57389293
userC 323

and so forth
Here is Mr. Kahn's script:

perl -MFile::Find -le '($m) = stat( $d = shift ); find( sub{ @s =
lstat; $m == $s[0] and $u{ getpwuid $s[4] } += $s[7]}, $d ); printf
"%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .

The above is a cut and paste from there.

Output on one of my larger directories shows almost a one-gigabyte
difference between this script and the du -ks command.

du -ks = 37,928,180,000
script = 38,641,548,183

Is there some subtle error in the script that would cause this, or is the
script actually reading deeper into the file/directory structure and
accounting for unused blocks on the hard drive or other overhead?

Thanks to all.
 

george.e.sullivan


Also, if anyone knows how to modify the above script so that it sums
the individual totals that would be helpful also. Thanks to all.
 

John W. Krahn

Is there any minute error in the script that would cause this or is the
script actually reading deeper into the file/directory structure and
accounting for unused blocks on the hard drive or other overhead types?

du checks for hard-linked files, but that script doesn't, so some
hard-linked files may be counted multiple times by the script. Try du with
the -l switch and see if the two produce the same results.

If you want the script to count each hard-linked file only once:

perl -MFile::Find -le'($m) = stat( $d = shift ); find( sub{ @s = lstat; $m ==
$s[0] and !$seen{$s[1]}++ and $u{ getpwuid $s[4] } += $s[7] }, $d ); printf
"%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .
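Mr. Krahn's fix is the standard deduplication trick: remember each inode as you see it and skip any you have already counted. As an illustration only, here is a rough Python sketch of the same per-user accounting, assuming a POSIX system (the function name usage_by_user is made up for this example, and like the one-liner it sums logical file sizes, not allocated blocks):

```python
import os
import pwd
from collections import defaultdict

def usage_by_user(root):
    """Sum apparent file sizes per owner under root, staying on one
    device and counting each hard-linked inode only once."""
    dev = os.stat(root).st_dev          # device of the starting directory
    seen = set()                        # (st_dev, st_ino) pairs already counted
    usage = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_dev != dev or (st.st_dev, st.st_ino) in seen:
                continue                # other filesystem, or a hard link we saw
            seen.add((st.st_dev, st.st_ino))
            usage[pwd.getpwuid(st.st_uid).pw_name] += st.st_size
    return dict(usage)
```

As in the Perl version, the (device, inode) pair is the key: two directory entries with the same pair are the same file on disk, so only the first one encountered is added to its owner's total.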



John
 

John W. Krahn


Also, if anyone knows how to modify the above script so that it sums
the individual totals that would be helpful also. Thanks to all.

perl -MFile::Find -le'($m) = stat( $d = shift ); find( sub{ @s = lstat; $m ==
$s[0] and $u{ getpwuid $s[4] } += $s[7] }, $d ); printf "%-5s %d\n", $_,
$u{$_}, $t += $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u; print "Total: $t"' .
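The $t in that version is simply a running total accumulated while printing the sorted lines. The same idea in Python, using made-up sample numbers taken from the thread's example output:

```python
# Hypothetical per-user byte counts (the sample figures from the thread)
u = {"userA": 112345, "userB": 57389293, "userC": 323}

# Print users largest-first, then the grand total
for name in sorted(u, key=u.get, reverse=True):
    print(f"{name:<5} {u[name]}")
print("Total:", sum(u.values()))
```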


John
 

Randal L. Schwartz

george> perl -MFile::Find -le '($m) = stat( $d = shift ); find( sub{ @s =
george> lstat; $m == $s[0] and $u{ getpwuid $s[4] } += $s[7]}, $d ); printf
george> "%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .

This is adding $s[7] (the file's logical size, i.e. the offset just past the
highest byte in use), when it really should be adding $s[12] * 512 ($s[12] is
the number of 512-byte blocks actually allocated), which correctly deals with
indirect blocks and file holes. See the source for "du".

For example:

perl -e 'open X, ">somefile"; seek X, 2**30, 0 or die "$!"; print X "x"'

creates a file whose logical size is one gigabyte, but which du reports as
only a few dozen blocks (the indirect blocks).
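The same experiment can be reproduced in Python if that is handier. This sketch assumes a POSIX system where st_blocks is reported in 512-byte units; whether the hole actually stays unallocated depends on the filesystem:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.lseek(fd, 2**20, os.SEEK_SET)    # jump 1 MiB past the start, leaving a hole
os.write(fd, b"x")                  # a single real byte at offset 2**20

st = os.stat(path)
logical = st.st_size                # what the script's $s[7] adds up
allocated = st.st_blocks * 512      # what du adds up (512-byte units)
print("logical:", logical, "allocated:", allocated)

os.close(fd)
os.unlink(path)
```

On a filesystem that supports sparse files, the allocated figure is a few blocks while the logical size is just over a megabyte, which is exactly the kind of gap the original poster saw between the script and du.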
 

george.e.sullivan

Michele said:
In a closed thread Mr. John W. Kahn posted a script to add up directory
usage per user that produces simple output of such as: [snip]
Output on one of my larger directories produces almost a 1 gigabyte
difference between this script and the du -ks command syntax.

I just gave a very quick peek into the thread and I see that stuff
like hard links is being mentioned. All this may well be relevant. But
also take into account that the *disk usage* of a file is generally
different from its exact size, and the former depends on the block
size of the device. This circumstance affects most commonly used OSes.
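Michele's point can be made concrete with a little arithmetic: a non-sparse file's on-disk footprint is its logical size rounded up to a whole number of filesystem blocks. This helper and its 4096-byte default block size are illustrative assumptions, not something from the thread:

```python
def allocated_size(size_bytes, block_size=4096):
    """Minimum on-disk footprint of a non-sparse file: the logical size
    rounded up to a whole number of filesystem blocks."""
    if size_bytes == 0:
        return 0
    return -(-size_bytes // block_size) * block_size  # ceiling division

print(allocated_size(1))     # 4096: even one byte occupies a full block
print(allocated_size(4097))  # 8192: one byte over spills into a second block
```

Summing that rounding error over hundreds of thousands of files easily accounts for a large share of the gap between a byte-exact total and du's block-based total.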


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


Oh heck... a typo. I am so sorry about the name misspelling. Thanks
to Christian Winter for pointing that out. And my apology Mr. Krahn.
Your script, however, is going to be very useful and I thank you for
taking the time to post it along with the updates.

The du I am using is on an SGI Altix 350 running their version of Red Hat
Linux. I believe it is based on Advanced Server 3.0. SGI boxed it and
added their own bells and whistles. Also, thanks for the advice on
adding the individual numbers.

The -l and -s switches both produce more comparable numbers. Thanks for that
tip. I know... man pages... they are there for a reason. Especially for
those of us who find old habits hard to break. :) :) :)

Randal... I will also try the block counts from stat, so thanks for that tip also.


And Michele...I was worried about the actual space being used, blocking
factors, and such also. Thanks.

I am grateful for all the information each of you has provided. This
is truly a great community of users.
 
