Getting the actual size of a sparse file

Discussion in 'Ruby' started by Daniel Berger, Jan 5, 2011.

  1. Hi,

    How do you get the true size of a sparse file? Using /var/log/lastlog
    on Ubuntu as an example I see this with "ls -lh"

    287K lastlog

    With "ls -sh" I see this:

    40K lastlog

    A File.stat call reveals this:

    #<File::Stat
    dev=0x801,
    ino=5249695,
    mode=0100664 (file rw-rw-r--),
    nlink=1,
    uid=0 (root),
    gid=43 (utmp),
    rdev=0x0 (0, 0),
    size=292876,
    blksize=4096,
    blocks=80,
    atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
    mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
    ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>

    Multiplying blocks * blksize doesn't seem to match up, either.

    How do I arrive at 40k?

    Also, how would one go about detecting a sparse file?

    Regards,

    Dan
     
    Daniel Berger, Jan 5, 2011
    #1

  2. Perry Smith

    I would go find the source for Ubuntu's ls and see what it does for
    the -s option.

    Note that -s is output in blocks.

    --
    Posted via http://www.ruby-forum.com/.
     
    Perry Smith, Jan 5, 2011
    #2

  3. Ryan Davis

    On Jan 5, 2011, at 08:37 , Daniel Berger wrote:

    > Hi,
    >
    > How do you get the true size of a sparse file? Using /var/log/lastlog
    > on Ubuntu as an example I see this with "ls -lh"


    If you're on ubuntu you should be able to provide extra options to du:

    http://en.wikipedia.org/wiki/Sparse_file
     
    Ryan Davis, Jan 5, 2011
    #3
  4. On Jan 5, 10:04 am, Ryan Davis <> wrote:
    > On Jan 5, 2011, at 08:37 , Daniel Berger wrote:
    >
    > > Hi,

    >
    > > How do you get the true size of a sparse file? Using /var/log/lastlog
    > > on Ubuntu as an example I see this with "ls -lh"

    >
    > If you're on ubuntu you should be able to provide extra options to du:
    >
    > http://en.wikipedia.org/wiki/Sparse_file


    True, but I'd like to use pure Ruby (not system calls) if possible, at
    least for *nix systems. Or will this require an extension?

    Regards,

    Dan
     
    Daniel Berger, Jan 5, 2011
    #4
  5. On 05/01/11 17:27, Daniel Berger wrote:
    >
    >
    > On Jan 5, 10:04 am, Ryan Davis<> wrote:
    >> On Jan 5, 2011, at 08:37 , Daniel Berger wrote:
    >>
    >>> Hi,

    >>
    >>> How do you get the true size of a sparse file? Using /var/log/lastlog
    >>> on Ubuntu as an example I see this with "ls -lh"

    >>
    >> If you're on ubuntu you should be able to provide extra options to du:
    >>
    >> http://en.wikipedia.org/wiki/Sparse_file

    >
    > True, but I'd like to use pure Ruby (not system calls) if possible, at
    > least for *nix systems. Or will this require an extension?


    dd if=/dev/zero bs=1 seek=100M count=0 of=out2

    irb(main):011:0> stat=File.stat("out2")
    => #<File::Stat dev=0xfc0c, ino=160, mode=0100644, nlink=1, uid=1006,
    gid=1006, rdev=0x0, size=104857600, blksize=4096, blocks=0, atime=Wed
    Jan 05 18:02:08 +0000 2011, mtime=Wed Jan 05 18:02:08 +0000 2011,
    ctime=Wed Jan 05 18:02:08 +0000 2011>

    irb(main):013:0> [stat.blocks*stat.blksize, stat.size]
    => [0, 104857600]

    Gives you allocated size & filesystem size.
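    The same hole-only file can be made without dd. This is only a
    minimal sketch, assuming a filesystem that supports sparse files;
    make_hole_file is a hypothetical helper name, not part of Ruby:

```ruby
# Create a file of +length+ bytes that consists only of a hole.
# File#truncate extends the logical length without writing data, so on
# filesystems with sparse-file support no data blocks are allocated.
# (make_hole_file is a hypothetical helper name.)
def make_hole_file(path, length)
  File.open(path, "w") { |f| f.truncate(length) }
  File.stat(path)
end
```

    Calling make_hole_file("out3", 100 * 1024 * 1024) should give a
    stat whose size is 104857600; how many blocks it reports depends
    on the filesystem.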

    --
    Matthew Bloch Bytemark Hosting
    http://www.bytemark.co.uk/
    tel: +44 (0) 1904 890890
     
    Matthew Bloch, Jan 5, 2011
    #5
  6. On Jan 5, 9:58 am, Perry Smith <> wrote:
    > I would go find the source for Ubuntu's ls and see what does it do for
    > the -s option.
    >
    > Note that -s is output in blocks.


    Yeah, looks like ls -s defaults to a block size of 1K.

    Hm, how does this look?

    class File
      def self.sparse?(file)
        stats = File.stat(file)
        stats.size > stats.blocks * stats.blksize
      end
    end
     
    Daniel Berger, Jan 5, 2011
    #6
  7. On Wed, Jan 5, 2011 at 5:37 PM, Daniel Berger <> wrote:
    > Hi,
    >
    > How do you get the true size of a sparse file? Using /var/log/lastlog
    > on Ubuntu as an example I see this with "ls -lh"
    >
    > 287K lastlog
    >
    > With "ls -sh" I see this:
    >
    > 40K lastlog
    >
    > A File.stat call reveals this:
    >
    > #<File::Stat
    >  dev=0x801,
    >  ino=5249695,
    >  mode=0100664 (file rw-rw-r--),
    >  nlink=1,
    >  uid=0 (root),
    >  gid=43 (utmp),
    >  rdev=0x0 (0, 0),
    >  size=292876,
    >  blksize=4096,
    >  blocks=80,
    >  atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
    >  mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
    >  ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>
    >
    > Multiplying blocks * blksize doesn't seem to match up, either.
    >


    See stat(2):

    The st_blocks field indicates the number of blocks allocated to the
    file, 512-byte units. (This may be smaller than st_size/512 when the
    file has holes.)

    The st_blksize field gives the "preferred" blocksize for efficient file
    system I/O. (Writing to a file in smaller chunks may cause an
    inefficient read-modify-rewrite.)

    So "blksize" has nothing to do with the size of the "blocks". They are
    always counted in 512-byte units.
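
    For the lastlog numbers above that works out to 80 * 512 = 40960
    bytes, i.e. the 40K that "ls -sh" shows. A rough sketch of the
    calculation in Ruby, assuming the 512-byte unit from stat(2) holds
    on your platform (STAT_BLOCK_BYTES, allocated_bytes and
    sparse_file? are names made up for this example):

```ruby
# Unit of File::Stat#blocks, assumed to be 512 bytes as documented in
# Linux stat(2). Other platforms may differ; verify before relying on it.
STAT_BLOCK_BYTES = 512

# Bytes actually allocated on disk, or nil where the platform does not
# report a block count (File::Stat#blocks returns nil there).
def allocated_bytes(stat)
  return nil if stat.blocks.nil?
  stat.blocks * STAT_BLOCK_BYTES
end

# Heuristic: a file looks sparse when its logical size exceeds the
# space allocated for it.
def sparse_file?(path)
  stat = File.stat(path)
  allocated = allocated_bytes(stat)
  !allocated.nil? && stat.size > allocated
end
```

    This is only a heuristic: a filesystem that compresses data or
    stores tiny files inline may also report fewer allocated bytes
    than the file size.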

    /Johan Holmberg
     
    Johan Holmberg, Jan 5, 2011
    #7
  8. On Jan 5, 12:39 pm, Johan Holmberg <> wrote:
    > On Wed, Jan 5, 2011 at 5:37 PM, Daniel Berger <> wrote:
    > > Hi,
    > >
    > > How do you get the true size of a sparse file? Using /var/log/lastlog
    > > on Ubuntu as an example I see this with "ls -lh"
    > >
    > > 287K lastlog
    > >
    > > With "ls -sh" I see this:
    > >
    > > 40K lastlog
    > >
    > > A File.stat call reveals this:
    > >
    > > #<File::Stat
    > >  dev=0x801,
    > >  ino=5249695,
    > >  mode=0100664 (file rw-rw-r--),
    > >  nlink=1,
    > >  uid=0 (root),
    > >  gid=43 (utmp),
    > >  rdev=0x0 (0, 0),
    > >  size=292876,
    > >  blksize=4096,
    > >  blocks=80,
    > >  atime=Mon Jan 03 16:03:24 -0700 2011 (1294095804),
    > >  mtime=Thu Oct 21 11:34:51 -0600 2010 (1287682491),
    > >  ctime=Thu Oct 21 11:34:51 -0600 2010 (1287682491)>
    > >
    > > Multiplying blocks * blksize doesn't seem to match up, either.
    >
    > See stat(2):
    >
    >        The st_blocks field indicates the number of blocks allocated to the
    >        file, 512-byte units. (This may be smaller than st_size/512 when the
    >        file has holes.)
    >
    >        The st_blksize field gives the "preferred" blocksize for efficient file
    >        system I/O. (Writing to a file in smaller chunks may cause an
    >        inefficient read-modify-rewrite.)
    >
    > So "blksize" has nothing to do with the size of the "blocks". They are
    > always counted in 512-byte units.


    Oh, wow, I don't think I knew that. It strikes me as particularly
    bizarre that they would return some notion of a "preferred block size"
    instead of the actual block size. Seriously, what's the use of that?

    Now I need to check other platforms (Solaris, HP-UX) to see if they
    use 512 byte convention.

    Is this something that's universal? Or is it something I can get via a
    C call somewhere?

    Regards,

    Dan
     
    Daniel Berger, Jan 6, 2011
    #8
  9. Gary Wright

    On Jan 6, 2011, at 3:44 PM, Daniel Berger wrote:
    >
    > Oh, wow, I don't think I knew that. It strikes me as particularly
    > bizarre that they would return some notion of a "preferred block size"
    > instead of the actual block size. Seriously, what's the use of that?
    >
    > Now I need to check other platforms (Solaris, HP-UX) to see if they
    > use 512 byte convention.
    >
    > Is this something that's universal? Or is it something I can get via a
    > C call somewhere?


    I think you'll want to read up on the stat() system call. The POSIX
    standard leaves a bit of wiggle room, though: while it does specify
    that st_blocks must be returned, it doesn't specify the size of the blocks.

    I'm not sure I understand your concern about 'actual' vs. 'preferred'.
    I'm guessing they would be the same in almost any rational implementation
    but the main reason for having the information is to perform I/O in
    efficiently sized chunks. In that case, the 'preferred' block size would
    seem to be what you want even if the 'actual' block size was different.
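
    As an illustration of that use, here is a small sketch that reads
    a file in blksize-sized chunks. each_preferred_chunk and its
    fallback parameter are hypothetical, and the 4096 default is just
    an assumption for platforms where blksize is unavailable:

```ruby
# Yield the contents of +path+ in chunks of the preferred I/O size
# reported by stat. File::Stat#blksize can be nil on some platforms,
# so an assumed fallback is used in that case.
# (each_preferred_chunk is a hypothetical helper name.)
def each_preferred_chunk(path, fallback: 4096)
  chunk_size = File.stat(path).blksize
  chunk_size = fallback if chunk_size.nil? || chunk_size <= 0

  File.open(path, "rb") do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      yield chunk
    end
  end
end
```

    Usage would be something like
    each_preferred_chunk("/var/log/lastlog") { |c| process(c) },
    where process stands in for whatever you do with each chunk.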

    Gary Wright
     
    Gary Wright, Jan 6, 2011
    #9
  10. On Thu, Jan 6, 2011 at 9:44 PM, Daniel Berger <> wrote:
    >
    >>
    >> See stat(2):
    >>
    >>        The st_blocks field indicates the number of blocks allocated to the
    >>        file, 512-byte units. (This may be smaller than st_size/512 when the
    >>        file has holes.)
    >>
    >>        The st_blksize field gives the "preferred" blocksize for efficient file
    >>        system I/O. (Writing to a file in smaller chunks may cause an
    >>        inefficient read-modify-rewrite.)
    >>
    >> So "blksize" has nothing to do with the size of the "blocks". They are
    >> always counted in 512-byte units.

    >
    > Oh, wow, I don't think I knew that. It strikes me as particularly
    > bizarre that they would return some notion of a "preferred block size"
    > instead of the actual block size. Seriously, what's the use of that?
    >


    I think the two fields "st_blocks" and "st_blksize" just happen to
    use the same word ("block") in two slightly different meanings.
    Counting "st_blocks" in 512-byte units seems to be an arbitrary
    convention, unrelated to the "physical block size" used for files.

    > Now I need to check other platforms (Solaris, HP-UX) to see if they
    > use 512 byte convention.
    >
    > Is this something that's universal? Or is it something I can get via a
    > C call somewhere?
    >


    I looked in "Advanced UNIX Programming, 2nd ed" by Rochkind, and there
    the "st_blocks" field is described as the number of 512-byte blocks
    allocated for a file. So I guess this is a universal thing for UN*X
    (Linux, Mac OS X, Solaris, etc.).

    The Rochkind book also mentions that "st_blksize is in the stat
    structure so that an implementation can vary it by file if it chooses
    to do so".

    Regards,
    /Johan Holmberg
     
    Johan Holmberg, Jan 6, 2011
    #10
