Perl and du output difference.


george.e.sullivan

In a closed thread Mr. John W. Kahn posted a script to add up directory
usage per user that produces simple output such as:

userA 112345
userB 57389293
userC 323

and so forth
Here is Mr. Kahn's script:

perl -MFile::Find -le '($m) = stat( $d = shift ); find( sub{ @s =
lstat; $m == $s[0] and $u{ getpwuid $s[4] } += $s[7]}, $d ); printf
"%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .

The above is a cut and paste from there.

Output on one of my larger directories shows almost a one-gigabyte
difference between this script and the du -ks command.

du -ks = 37,928,180,000
script = 38,641,548,183

Is there some subtle error in the script that would cause this, or is the
script actually reading deeper into the file/directory structure and
accounting for unused blocks on the hard drive or other overhead?

Thanks to all.
 

george.e.sullivan


Also, if anyone knows how to modify the above script so that it sums
the individual totals that would be helpful also. Thanks to all.
 

John W. Krahn

Is there any minute error in the script that would cause this or is the
script actually reading deeper into the file/directory structure and
accounting for unused blocks on the hard drive or other overhead types?

du checks for hard-linked files, but that script doesn't, so some
hard-linked files may be counted multiple times by the script. Try du with
the -l switch and see if the two produce the same results.

If you want the script to count each hard-linked file only once:

perl -MFile::Find -le'($m) = stat( $d = shift ); find( sub{ @s = lstat; $m ==
$s[0] and !$seen{$s[1]}++ and $u{ getpwuid $s[4] } += $s[7] }, $d ); printf
"%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .
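Mr. Krahn's fix is the standard deduplication trick: remember each inode as you see it and skip any you have already counted. As an illustration only, here is a rough Python sketch of the same per-user accounting, assuming a POSIX system (the function name usage_by_user is made up for this example, and like the one-liner it sums logical file sizes, not allocated blocks):

```python
import os
import pwd
from collections import defaultdict

def usage_by_user(root):
    """Sum apparent file sizes per owner under root, staying on one
    device and counting each hard-linked inode only once."""
    dev = os.stat(root).st_dev          # device of the starting directory
    seen = set()                        # (st_dev, st_ino) pairs already counted
    usage = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_dev != dev or (st.st_dev, st.st_ino) in seen:
                continue                # other filesystem, or a hard link we saw
            seen.add((st.st_dev, st.st_ino))
            usage[pwd.getpwuid(st.st_uid).pw_name] += st.st_size
    return dict(usage)
```

As in the Perl version, the (device, inode) pair is the key: two directory entries with the same pair are the same file on disk, so only the first one encountered is added to its owner's total.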



John
 

John W. Krahn


Also, if anyone knows how to modify the above script so that it sums
the individual totals that would be helpful also. Thanks to all.

perl -MFile::Find -le'($m) = stat( $d = shift ); find( sub{ @s = lstat; $m ==
$s[0] and $u{ getpwuid $s[4] } += $s[7] }, $d ); printf "%-5s %d\n", $_,
$u{$_}, $t += $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u; print "Total: $t"' .
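The $t in that version is simply a running total accumulated while printing the sorted lines. The same idea in Python, using made-up sample numbers taken from the thread's example output:

```python
# Hypothetical per-user byte counts (the sample figures from the thread)
u = {"userA": 112345, "userB": 57389293, "userC": 323}

# Print users largest-first, then the grand total
for name in sorted(u, key=u.get, reverse=True):
    print(f"{name:<5} {u[name]}")
print("Total:", sum(u.values()))
```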


John
 

Randal L. Schwartz

george> perl -MFile::Find -le '($m) = stat( $d = shift ); find( sub{ @s =
george> lstat; $m == $s[0] and $u{ getpwuid $s[4] } += $s[7]}, $d ); printf
george> "%-5s %d\n", $_, $u{$_} for sort { $u{$b} <=> $u{$a} } keys %u' .

This is adding $s[7] (the file's logical size, i.e. the offset just past the
highest byte in use), when it really should be adding $s[12] * 512 ($s[12] is
the number of 512-byte blocks actually allocated), which correctly deals with
indirect blocks and file holes. See the source for "du".

For example:

perl -e 'open X, ">somefile"; seek X, 2**30, 0 or die "$!"; print X "x"'

creates a file whose logical size is one gigabyte, but which du reports as
only a few dozen blocks (the indirect blocks).
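The same experiment can be reproduced in Python if that is handier. This sketch assumes a POSIX system where st_blocks is reported in 512-byte units; whether the hole actually stays unallocated depends on the filesystem:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.lseek(fd, 2**20, os.SEEK_SET)    # jump 1 MiB past the start, leaving a hole
os.write(fd, b"x")                  # a single real byte at offset 2**20

st = os.stat(path)
logical = st.st_size                # what the script's $s[7] adds up
allocated = st.st_blocks * 512      # what du adds up (512-byte units)
print("logical:", logical, "allocated:", allocated)

os.close(fd)
os.unlink(path)
```

On a filesystem that supports sparse files, the allocated figure is a few blocks while the logical size is just over a megabyte, which is exactly the kind of gap the original poster saw between the script and du.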
 

george.e.sullivan

Michele said:
In a closed thread Mr. John W. Kahn posted a script to add up directory
usage per user that produces simple output of such as: [snip]
Output on one of my larger directories produces almost a 1 gigabyte
difference between this script and the du -ks command syntax.

I just gave a very quick peek into the thread and I see that stuff
like hard links is being mentioned. All this may well be relevant. But
also take into account that the *disk usage* of a file is generally
different from its exact size, and the former depends on the block
size of the device. This circumstance affects most commonly used OSes.
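Michele's point can be made concrete with a little arithmetic: a non-sparse file's on-disk footprint is its logical size rounded up to a whole number of filesystem blocks. This helper and its 4096-byte default block size are illustrative assumptions, not something from the thread:

```python
def allocated_size(size_bytes, block_size=4096):
    """Minimum on-disk footprint of a non-sparse file: the logical size
    rounded up to a whole number of filesystem blocks."""
    if size_bytes == 0:
        return 0
    return -(-size_bytes // block_size) * block_size  # ceiling division

print(allocated_size(1))     # 4096: even one byte occupies a full block
print(allocated_size(4097))  # 8192: one byte over spills into a second block
```

Summing that rounding error over hundreds of thousands of files easily accounts for a large share of the gap between a byte-exact total and du's block-based total.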


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,


Oh heck... a typo. I am so sorry about the name misspelling. Thanks
to Christian Winter for pointing that out. And my apology Mr. Krahn.
Your script, however, is going to be very useful and I thank you for
taking the time to post it along with the updates.

The du I am using is on an SGI Altix 350 running their version of Red Hat
Linux. I believe it is based on Advanced Server 3.0. SGI boxed it and
added their own bells and whistles. Also, thanks for the advice on
adding the individual numbers.

The -l and -s switches both produce more comparable numbers. Thanks for that
tip. I know... man pages... they are there for a reason. Especially for
those of us who find old habits hard to break. :) :) :)

Randal... I will also try the block counts from stat, so thanks for that tip also.


And Michele...I was worried about the actual space being used, blocking
factors, and such also. Thanks.

I am grateful for all the information each of you has provided. This
is truly a great community of users.
 
