Getting file size of binary file

Discussion in 'C Programming' started by Arnold, Jan 8, 2004.

  1. Arnold Guest

    Is using fseek and ftell a reliable method of getting the file size on a
    binary file? I thought I remember reading somewhere it wasn't... If not what
    would be the "right" and portable method to obtain it? Thanks.
    Arnold, Jan 8, 2004
    #1

  2. Richard Bos Guest

    "Arnold" <> wrote:

    > Is using fseek and ftell a reliable method of getting the file size on a
    > binary file?


    No. From 7.19.9.2#3: "A binary stream need not meaningfully support
    fseek calls with a whence value of SEEK_END".

    To say that this irks me would be a bit of an understatement.

    > I thought I remember reading somewhere it wasn't... If not what
    > would be the "right" and portable method to obtain it?


    There is none, in ISO C.

    To say that _this_ irks me would be a bit of an understatement, as well.
    It should at least be possible to get the value of "what the OS thinks
    the file size is", but apparently there are reasons why it isn't; I've
    never heard one that is convincing, though.

    Richard
    Richard Bos, Jan 8, 2004
    #2

  3. Richard Head Guest

    On Thu, 08 Jan 2004 08:46:35 +0000, Arnold wrote:

    > Is using fseek and ftell a reliable method of getting the file size on a
    > binary file? I thought I remember reading somewhere it wasn't... If not what
    > would be the "right" and portable method to obtain it? Thanks.


    try fstat()
    Richard Head, Jan 8, 2004
    #3
  4. Richard Head <> scribbled the following:
    > On Thu, 08 Jan 2004 08:46:35 +0000, Arnold wrote:
    >> Is using fseek and ftell a reliable method of getting the file size on a
    >> binary file? I thought I remember reading somewhere it wasn't... If not what
    >> would be the "right" and portable method to obtain it? Thanks.


    > try fstat()


    Which part of the ISO C standard defines fstat()?

    --
    /-- Joona Palaste () ------------- Finland --------\
    \-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
    "My absolute aspect is probably..."
    - Mato Valtonen
    Joona I Palaste, Jan 8, 2004
    #4
  5. CBFalconer Guest

    Richard Head wrote:
    > On Thu, 08 Jan 2004 08:46:35 +0000, Arnold wrote:
    >
    > > Is using fseek and ftell a reliable method of getting the file
    > > size on a binary file? I thought I remember reading somewhere it
    > > wasn't... If not what would be the "right" and portable method
    > > to obtain it? Thanks.
    >
    > try fstat()


    No, don't. There is no fstat() in standard C. Please do not give
    off-topic answers in this newsgroup, where there may be nobody to
    make corrections.

    --
    Chuck F () ()
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Jan 8, 2004
    #5
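For readers on a POSIX system, the fstat() suggestion above can be sketched as follows. As the replies point out, this is not ISO C: fileno(), fstat() and struct stat come from POSIX, and the helper name posix_file_size is made up for this illustration.

```c
/* POSIX only: fileno(), fstat() and <sys/stat.h> are not part of ISO C. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the size in bytes that the OS reports for
   the regular file behind fp, or -1 if it cannot be determined. */
long long posix_file_size(FILE *fp)
{
    struct stat st;
    int fd = fileno(fp);            /* descriptor behind the stream */

    if (fd < 0 || fstat(fd, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))       /* pipes, terminals etc. have no useful size */
        return -1;
    return (long long)st.st_size;
}
```

Data still sitting in the stream's buffer is not visible to the OS, so call fflush(fp) first if the stream has been written to.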
  6. Richard Bos wrote:

    > "Arnold" <> wrote:
    >
    >
    >>Is using fseek and ftell a reliable method of getting the file size on a
    >>binary file?

    >
    >
    > No. From 7.19.9.2#3: "A binary stream need not meaningfully support
    > fseek calls with a whence value of SEEK_END".


    From the FAQ for this group:

    http://www.eskimo.com/~scs/C-faq/q19.12.html

    ---
    How can I find out the size of a file, prior to reading it in?

    If the ``size of a file'' is the number of characters you'll be able to
    read from it in C, it is difficult or impossible to determine this
    number exactly).

    Under Unix, the stat call will give you an exact answer. Several other
    systems supply a Unix-like stat which will give an approximate answer.
    You can fseek to the end and then use ftell, but these tend to have the
    same problems: fstat is not portable, and generally tells you the same
    thing stat tells you; ftell is not guaranteed to return a byte count
    except for binary files. Some systems provide routines called filesize
    or filelength, but these are not portable, either.

    Are you sure you have to determine the file's size in advance? Since the
    most accurate way of determining the size of a file as a C program will
    see it is to open the file and read it, perhaps you can rearrange the
    code to learn the size as it reads.
    ---

    Does this look strange to anyone else? There's that lone closing paren
    in the first paragraph, but the part that really bothers me is "ftell is
    not guaranteed to return a byte count except for binary files." It seems
    to be suggesting that the fseek/ftell method would be OK for a binary
    file, but the line from the standard that Richard quoted suggests the opposite.

    >
    > To say that this irks me would be a bit of an understatement.
    >
    >
    >>I thought I remember reading somewhere it wasn't... If not what
    >>would be the "right" and portable method to obtain it?

    >
    >
    > There is none, in ISO C.
    >
    > To say that _this_ irks me would be a bit of an understatement, as well.
    > It should at least be possible to get the value of "what the OS thinks
    > the file size is", but apparently there are reasons why it isn't; I've
    > never heard one that is convincing, though.


    I suppose that it's partly because C deals with streams, not files
    directly (for the most part). Many things may not make sense for a
    stream, size included. How could the size of stdin be meaningful, for
    example? At the same time, there are at least a few standard functions
    that only make sense for certain types of streams. Seems like it
    wouldn't be such a bad idea to have a few more.

    -Kevin
    --
    My email address is valid, but changes periodically.
    To contact me please use the address from a recent posting.
    Kevin Goodsell, Jan 8, 2004
    #6
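The FAQ's closing suggestion quoted above, learning the size by reading the file, needs nothing beyond ISO C. A minimal sketch, assuming the stream was opened in binary mode and is positioned at the start; count_bytes is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes that can be read from fp, starting
   at the current position, until end of file. Returns -1 on a read error.
   The count is a long, so it can overflow on very large files. */
long count_bytes(FILE *fp)
{
    unsigned char buf[4096];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;

    return ferror(fp) ? -1L : total;
}
```

This costs a full pass over the data, which is why the FAQ suggests folding the count into the code that has to read the file anyway.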
  7. Richard Bos wrote:

    (snip)

    > No. From 7.19.9.2#3: "A binary stream need not meaningfully support
    > fseek calls with a whence value of SEEK_END".
    >
    > To say that this irks me would be a bit of an understatement.


    (snip)

    > To say that _this_ irks me would be a bit of an understatement, as well.
    > It should at least be possible to get the value of "what the OS thinks
    > the file size is", but apparently there are reasons why it isn't; I've
    > never heard one that is convincing, though.


    I was reading not so long ago what one of IBM's C compilers for
    VM/CMS or MVS does for fseek/ftell. For files with variable length
    records, text or binary, ftell returns the block number in the
    upper 17 bits, and position in the block in the lower 15 bits.
    (OS restrictions tend to keep blocks less than 32K.) I think
    it wraps at 128K blocks.

    MVS keeps track of files in tracks, which can't reliably be
    converted to bytes. CMS maps variable length blocks onto
    a fixed block file system, but also doesn't accurately
    keep track of bytes of file data.

    On traditional IBM mainframe OS's, tracks are formatted when
    written. The block size is determined by the program, and can
    be either fixed or variable length. As an added complication,
    files with fixed length blocks will usually have a short block
    at the end. If opened for append, this short block stays in
    place, so even for fixed length blocks a block count can't
    reliably indicate file size.

    -- glen
    glen herrmannsfeldt, Jan 8, 2004
    #7
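The position layout glen describes (block number in the upper 17 bits, offset within the block in the lower 15) can be sketched like this. It only illustrates the arithmetic in the post; it is not IBM's code, and the function names are invented.

```c
#include <stdint.h>

#define OFFSET_BITS 15u
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1u)   /* 0x7FFF */

/* Pack a block number and an offset within the block into one 32-bit value.
   Only the low 17 bits of the block number survive the shift, so block
   numbers wrap at 128K (131072) blocks. */
uint32_t pack_position(uint32_t block, uint32_t offset)
{
    return (block << OFFSET_BITS) | (offset & OFFSET_MASK);
}

/* Recover the two fields from a packed position. */
uint32_t position_block(uint32_t pos)  { return pos >> OFFSET_BITS; }
uint32_t position_offset(uint32_t pos) { return pos & OFFSET_MASK; }
```

Note that such a value is not a byte count: with variable length blocks, the packed position at the end of the file says nothing definite about how many bytes precede it.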
  8. Richard Bos Guest

    Kevin Goodsell <> wrote:

    > Richard Bos wrote:
    >
    > > It should at least be possible to get the value of "what the OS thinks
    > > the file size is", but apparently there are reasons why it isn't; I've
    > > never heard one that is convincing, though.

    >
    > I suppose that it's partly because C deals with streams, not files
    > directly (for the most part). Many things may not make sense for a
    > stream, size included. How could the size of stdin be meaningful, for
    > example? At the same time, there are at least a few standard functions
    > that only make sense for certain types of streams. Seems like it
    > wouldn't be such a bad idea to have a few more.


    Exactly; the function could always return -1 for "not available".

    Richard
    Richard Bos, Jan 9, 2004
    #8
  9. Richard Bos Guest

    glen herrmannsfeldt <> wrote:

    > Richard Bos wrote:
    >
    > > To say that _this_ irks me would be a bit of an understatement, as well.
    > > It should at least be possible to get the value of "what the OS thinks
    > > the file size is", but apparently there are reasons why it isn't; I've
    > > never heard one that is convincing, though.

    >
    > I was reading not so long ago what one of IBM's C compilers for
    > VM/CMS or MVS does for fseek/ftell. For files with variable length
    > records, text or binary, ftell returns the block number in the
    > upper 17 bits, and position in the block in the lower 15 bits.
    > (OS restrictions tend to keep blocks less than 32K.) I think
    > it wraps at 128K blocks.
    >
    > MVS keeps track of files in tracks, which can't reliably be
    > converted to bytes. CMS maps variable length blocks onto
    > a fixed block file system, but also doesn't accurately
    > keep track of bytes of file data.
    >
    > On traditional IBM mainframe OS's, tracks are formatted when
    > written. The block size is determined by the program, and can
    > be either fixed or variable length. As an added complication,
    > files with fixed length blocks will usually have a short block
    > at the end. If opened for append, this short block stays in
    > place, so even for fixed length blocks a block count can't
    > reliably indicate file size.


    That doesn't convince me, either.

    The OS has _some_ idea of how large the file is, if only to prevent the
    user from writing past the end of it. It should be possible to pass this
    knowledge on to the C implementation. If the result is approximate, that
    is inherent in the OS, and the user will be expecting it.

    Richard
    Richard Bos, Jan 9, 2004
    #9
  10. Richard Bos wrote:

    > glen herrmannsfeldt <> wrote:


    (snip)

    >>I was reading not so long ago what one of IBM's C compilers for
    >>VM/CMS or MVS does for fseek/ftell. For files with variable length
    >>records, text or binary, ftell returns the block number in the
    >>upper 17 bits, and position in the block in the lower 15 bits.
    >>(OS restrictions tend to keep blocks less than 32K.) I think
    >>it wraps at 128K blocks.


    >>MVS keeps track of files in tracks, which can't reliably be
    >>converted to bytes.


    (snip)

    > That doesn't convince me, either.


    > The OS has _some_ idea of how large the file is, if only to prevent the
    > user from writing past the end of it. It should be possible to pass this
    > knowledge on to the C implementation. If the result is approximate, that
    > is inherent in the OS, and the user will be expecting it.


    The OS keeps track of how many tracks are allocated, but not how many
    bytes are written to each one. The number of bytes you can fit on a
    track with a BLKSIZE of 1 is about 1% of the maximum. There also
    could be empty tracks allocated but not yet used, after the data.

    There is no standard (or non-standard) way to say approximately how
    much space a data set takes.

    Assuming that every file system is like unix is not a good idea.

    -- glen
    glen herrmannsfeldt, Jan 9, 2004
    #10
  11. In article <CuELb.7737$5V2.11724@attbi_s53> glen herrmannsfeldt <> writes:
    > Richard Bos wrote:
    > > glen herrmannsfeldt <> wrote:
    > >>I was reading not so long ago what one of IBM's C compilers for
    > >>VM/CMS or MVS does for fseek/ftell. For files with variable length
    > >>records, text or binary, ftell returns the block number in the
    > >>upper 17 bits, and position in the block in the lower 15 bits.
    > >>(OS restrictions tend to keep blocks less than 32K.) I think
    > >>it wraps at 128K blocks.


    Note the "variable length records". I think that records can not span
    track boundaries, and so each track contains unused data.

    > > That doesn't convince me, either.

    >
    > > The OS has _some_ idea of how large the file is, if only to prevent the
    > > user from writing past the end of it.


    No. The OS only has to have some idea where the end of a file is.

    > The OS keeps track of how many tracks are allocated, but not how many
    > bytes are written to each one. The number of bytes you can fit on a
    > track with a BLKSIZE of 1 is about 1% of the maximum. There also
    > could be empty tracks allocated but not yet used, after the data.


    The empty tracks are no problem I think, it is the partly filled tracks
    that will give problems.

    > There is no standard (or non-standard) way to say approximately how
    > much space a data set takes.


    There is a non-standard way. Take each allocated track in succession
    and find the number of allocated bytes for each track (that number is
    available). Add them and you are done. However, this does not tell
    you where the next byte should be written. You could of course write
    an ftell and fseek that would use byte-numbers, but implementation
    would be slow as for each execution of such a routine you have to
    consult a table containing the size of each track.

    > Assuming that every file system is like unix is not a good idea.


    Indeed.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Jan 10, 2004
    #11
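Dik's non-standard approach, summing the used bytes of each track and consulting that table on every seek, can be sketched in plain C. The per-track byte counts are assumed to be available in an array; how a real system would obtain them is outside this sketch, and both function names are hypothetical.

```c
#include <stddef.h>

/* Total data bytes: the sum of the used bytes on every allocated track. */
long total_bytes(const long *track_bytes, size_t ntracks)
{
    long total = 0;
    for (size_t i = 0; i < ntracks; i++)
        total += track_bytes[i];
    return total;
}

/* Map a byte offset to (track, offset within track) by walking the table.
   Returns 0 on success, -1 if the offset lies outside the data.
   The linear walk is the cost Dik mentions for byte-numbered fseek/ftell. */
int locate_byte(const long *track_bytes, size_t ntracks, long offset,
                size_t *track, long *within)
{
    if (offset < 0)
        return -1;
    for (size_t i = 0; i < ntracks; i++) {
        if (offset < track_bytes[i]) {
            *track = i;
            *within = offset;
            return 0;
        }
        offset -= track_bytes[i];   /* skip this track, empty ones included */
    }
    return -1;
}
```

A table of running totals with a binary search would speed up lookups, but the table still has to be built by visiting every track once.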
  12. "Arnold" <> wrote in message news:<LN8Lb.1332$>...
    > Is using fseek and ftell a reliable method of getting the file size on a
    > binary file? I thought I remember reading somewhere it wasn't... If not what
    > would be the "right" and portable method to obtain it? Thanks.


    no. because a binary stream may have padding.
    though i'm not so sure why a binary stream would
    have padding. This is just what the standard says.



    --
    nethlek
    Mantorok Redgormor, Jan 10, 2004
    #12
  13. Dik T. Winter wrote:

    (snip regarding a file system used on some IBM machines that have
    a C compiler)

    > There is a non-standard way. Take each allocated track in succession
    > and find the number of allocated bytes for each track (that number is
    > available). Add them and you are done.


    I believe the only way to do that is to read all the tracks up
    to the EOF. But why would you want to do that? You can't fseek()
    with it, but you can with the block/offset form. Though I
    am not sure that it doesn't need to read them even for that form.

    It might be that the C library reads it once and keeps track of the
    length of each block, and the track it is on for later use.

    > However, this does not tell
    > you where the next byte should be written. You could of course write
    > an ftell and fseek that would use byte-numbers, but implementation
    > would be slow as for each execution of such a routine you have to
    > consult a table containing the size of each track.


    It might be that it does keep track of where the last track with
    data on it is.

    > > Assuming that every file system is like unix is not a good idea.


    > Indeed.


    -- glen
    glen herrmannsfeldt, Jan 10, 2004
    #13
  14. Mantorok Redgormor wrote:

    > "Arnold" <> wrote in message news:<LN8Lb.1332$>...
    >
    >>Is using fseek and ftell a reliable method of getting the file size on a
    >>binary file? I thought I remember reading somewhere it wasn't... If not what
    >>would be the "right" and portable method to obtain it? Thanks.


    > no. because a binary stream may have padding.
    > though i'm not so sure why a binary stream would
    > have padding. This is just what the standard says.


    I believe that there are some file systems that use fixed blocks,
    such as 512 bytes, and keep track of the number of blocks but not
    the number of bytes in the last block.

    Rumors are that CP/M did this, and used X'26' on text files to mark
    the real end.

    Some tape systems also can only write 512 byte blocks.

    -- glen
    glen herrmannsfeldt, Jan 10, 2004
    #14
  15. On Thu, 08 Jan 2004 19:51:46 GMT, Kevin Goodsell
    <> wrote:
    (regarding FAQ 19.12, and 7.19.9.2p3)<snip>
    > Under Unix, the stat call will give you an exact answer. Several other
    > systems supply a Unix-like stat which will give an approximate answer.
    > You can fseek to the end and then use ftell, but these tend to have the
    > same problems: fstat is not portable, and generally tells you the same
    > thing stat tells you; ftell is not guaranteed to return a byte count
    > except for binary files. Some systems provide routines called filesize
    > or filelength, but these are not portable, either. <snip>
    > ---
    >
    > Does this look strange to anyone else? There's that lone closing paren
    > in the first paragraph, but the part that really bothers me is "ftell is
    > not guaranteed to return a byte count except for binary files." It seems
    > to be suggesting that the fseek/ftell method would be OK for a binary
    > file, but the line from the standard that Richard quoted suggests the opposite.
    >

    What it's trying to say, but doesn't spell out well, is that ftell()
    of a binary stream, if it works at all, must return a byte count --
    and similarly fseek() of a binary stream if it works must accept a
    byte count, however much extra work the C runtime must do to deal with
    radically non-Unix-like files -- but ftell() of a text stream may
    return, and fseek() accept, a "cookie" on which arithmetic does not
    work, and (in this context) does not even resemble a file size
    measure; 7.19.9.4p2.

    As an extreme example, I think someone reliable posted a few months
    back (or maybe in c.s.c) that VMS C couldn't fit the necessary info in
    a long so it allocated memory space where it stored the RMS record
    info and returned the address of that space (on VAX all addresses were
    flat 32 bit, with a break at 2 up 31, and so fit in 32-bit long).

    In other words, it is saying: if you want to try the fseek(END),ftell
    method, only try it on a binary stream; and it should but doesn't note
    that even that may fail (at runtime, but at least noisily).

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Jan 19, 2004
    #15
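The fseek(END)/ftell method discussed above, with the failure checks Dave mentions, might look like the sketch below. It assumes the stream was opened in binary mode; binary_stream_size is a made-up name. Even when it succeeds, ISO C does not promise the result is the exact size: SEEK_END need not be meaningfully supported on a binary stream, and an implementation may pad the file with trailing null characters.

```c
#include <stdio.h>

/* Hypothetical helper: tries to obtain the size of a binary stream with
   fseek/ftell and puts the file position back afterwards.
   Returns the byte count, or -1L if any step fails. */
long binary_stream_size(FILE *fp)
{
    long start, size;

    start = ftell(fp);                    /* remember where we were */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)     /* may fail on this stream */
        return -1L;
    size = ftell(fp);                     /* -1L if ftell fails */
    if (fseek(fp, start, SEEK_SET) != 0)
        return -1L;
    return size;
}
```

On a text stream the value returned by ftell() is only usable as an argument to fseek(), so the same function would return a number that means nothing as a size.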
  16. filesystem granularity, was Re: Getting file size of binary file

    On Sat, 10 Jan 2004 07:23:33 GMT, glen herrmannsfeldt
    <> wrote, in comp.lang.c:

    > Mantorok Redgormor wrote:
    >
    > > "Arnold" <> wrote in message news:<LN8Lb.1332$>...
    > >
    > >>Is using fseek and ftell a reliable method of getting the file size on a
    > >>binary file? I thought I remember reading somewhere it wasn't... If not what
    > >>would be the "right" and portable method to obtain it? Thanks.

    >
    > > no. because a binary stream may have padding.
    > > though i'm not so sure why a binary stream would
    > > have padding. This is just what the standard says.

    >

    Of course even in this case it could and probably would give you the
    size allocated, it's just that that's increased from the size written.

    > I believe that there are some file systems that use fixed blocks,
    > such as 512 bytes, and keep track of the number of blocks but not
    > the number of bytes in the last block.
    >
    > Rumors are that CP/M did this, and used X'26' on text files to mark
    > the real end.
    >

    CP/M used 128-byte block = 1 sector on floppy; Dan Pop has said it
    used at least one larger size (maybe several?) on harddisks and I
    believe him, but the CP/M system I used had no harddisk.

    And 0x1A = (dec) 26 for EOF. From whence MS-DOS seems to have picked
    it up, even though MS-DOS has and IIRC always had exact byte counts.

    RT-11 used 512-byte blocks (on everything), and I *think* the same
    character but I don't remember for sure as TECO took care of that for
    me (and PIP, but if I did DK:FOO=TT:/A it was so rare I've forgotten);
    crosspost added for check.

    > Some tape systems also can only write 512 byte blocks.
    >

    Including DECtape <G!>. Although you can still have labels or other
    metadata that tells you how much padding to ignore.

    - David.Thompson1 at worldnet.att.net
    Dave Thompson, Jan 19, 2004
    #16
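The block-granularity problem from the last few posts can be made concrete. The sketch below assumes the 128-byte record size and the 0x1A end marker that the posts attribute to CP/M; the function names are invented, and the code runs on an in-memory buffer rather than a real CP/M file.

```c
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 128u    /* record size, as stated in the post above */
#define CPM_EOF     0x1A    /* decimal 26, end-of-text marker */

/* Size the file system would account for: length rounded up to whole records. */
size_t allocated_size(size_t length)
{
    return (length + RECORD_SIZE - 1u) / RECORD_SIZE * RECORD_SIZE;
}

/* Logical length of text data in buf: the position of the first 0x1A,
   or n if no marker is present. */
size_t text_length(const unsigned char *buf, size_t n)
{
    const unsigned char *p = memchr(buf, CPM_EOF, n);
    return p ? (size_t)(p - buf) : n;
}
```

The marker trick only works for text. A binary file may legitimately contain 0x1A bytes, so its exact length is lost unless it is recorded somewhere else, which is the padding the standard allows for.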
  17. Brian Inglis Guest

    Re: filesystem granularity, was Re: Getting file size of binary file

    On Mon, 19 Jan 2004 07:32:46 GMT in alt.sys.pdp11, Dave Thompson
    <> wrote:

    >On Sat, 10 Jan 2004 07:23:33 GMT, glen herrmannsfeldt
    ><> wrote, in comp.lang.c:
    >
    >> Mantorok Redgormor wrote:
    >>
    >> > "Arnold" <> wrote in message news:<LN8Lb.1332$>...
    >> >
    >> >>Is using fseek and ftell a reliable method of getting the file size on a
    >> >>binary file? I thought I remember reading somewhere it wasn't... If not what
    >> >>would be the "right" and portable method to obtain it? Thanks.


    only on disk files -- skip to EOF is not good on other devices

    >> > no. because a binary stream may have padding.
    >> > though i'm not so sure why a binary stream would
    >> > have padding. This is just what the standard says.


    *text* streams may have padding (CRs) or no carriage control (IBM VB
    or DEC implied CR) from the POV of ftell()/fseek(), which I believe
    are deprecated in favour of the more opaque fgetpos()/fsetpos();
    fixed record length binary files should have no padding on most
    systems; variable record length binary files may have padding on some
    systems where the record metadata is stored with the file data

    >Of course even in this case it could and probably would give you the
    >size allocated, it's just that that's increased from the size written.


    allocated => blocks / clusters
    bytes stored on disk >= (| <=) bytes written to disk

    --
    Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

    (Brian dot Inglis at SystematicSw dot ab dot ca)
    fake address use address above to reply
    Brian Inglis, Jan 19, 2004
    #17
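The fgetpos()/fsetpos() pair mentioned above treats a file position as an opaque fpos_t that can only be saved and restored, never used in arithmetic. A small sketch of that usage, with a hypothetical peek_char helper:

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character of fp without consuming
   it, by saving the position, reading, and restoring the position.
   Returns EOF at end of file or if the position cannot be saved/restored. */
int peek_char(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &pos) != 0)
        return EOF;
    return c;
}
```

Because fpos_t carries no numeric meaning, this interface works for both text and binary streams but offers no route to a file size.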
  18. Dave Thompson wrote:

    (snip)

    > What it's trying to say, but doesn't spell out well, is that ftell()
    > of a binary stream, if it works at all, must return a byte count --
    > and similarly fseek() of a binary stream if it works must accept a
    > byte count, however much extra work the C runtime must do to deal with
    > radically non-Unix-like files -- but ftell() of a text stream may
    > return, and fseek() accept, a "cookie" on which arithmetic does not
    > work, and (in this context) does not even resemble a file size
    > measure; 7.19.9.4p2.


    > As an extreme example, I think someone reliable posted a few months
    > back (or maybe in c.s.c) that VMS C couldn't fit the necessary info in
    > a long so it allocated memory space where it stored the RMS record
    > info and returned the address of that space (on VAX all addresses were
    > flat 32 bit, with a break at 2 up 31, and so fit in 32-bit long).


    Previously in this thread, I had indicated that MVS and VM/CMS on
    variable length block files, even opened in binary mode, return
    32768*(block number)+(offset into block). Standard access methods
    limit blocksize to less than 32768, but files can have more than
    131071 blocks, especially if they are small.

    I don't know how much work it is to come up with that. I don't
    believe that the number of blocks is stored, though I am not sure
    about that. (MVS keeps track of the number of tracks allocated, but
    not the number of blocks on each track.)

    -- glen
    glen herrmannsfeldt, Jan 31, 2004
    #18