How to get file size?


Eric Sosman

jacob said:
[...]
If your "text" mode includes translating tabs into blanks for instance,
that WILL go wrong. But again, we are speaking about BINARY mode
Eric. JUST BINARY MODE ok?

The complete text of the question that started this
thread is
Hi all,
Is there a function in the standard library that can get the size of a
file?
Thank you very much.
Sam.

Please point out where Sam specifies BINARY or JUST
BINARY MODE.
 

Walter Roberson

File size is the number of bytes you get when reading the stuff
in binary mode.

Is it? What about files on filesystems such as SGI's XFS, which
permit "holes" in files? Those holes take up no disk space
(beyond the descriptor of the size of the holes). It is not unreasonable
to say that the "file size" of such a file would be the amount of
disk space it occupies, rather than the number of bytes you can read
from it.
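Walter's distinction between the bytes a reader sees and the space a file occupies can be observed directly on a POSIX system. The sketch below assumes POSIX `stat()` and that seeking past end-of-file and then writing leaves a gap that reads back as zeros (POSIX behaviour, not something ISO C promises); `make_sparse_file` and `report_sizes` are hypothetical names chosen for illustration. Whether the gap actually becomes an unallocated hole depends on the filesystem.

```c
#define _XOPEN_SOURCE 700   /* POSIX.1-2008 plus XSI, for st_blocks */
#include <stdio.h>
#include <sys/stat.h>

/* Create a file whose logical size is offset + 1 bytes by seeking past
   the end and writing a single byte. On filesystems that support sparse
   files, the skipped range may become a hole that occupies no disk
   blocks. Returns 0 on success, -1 on failure. */
static int make_sparse_file(const char *path, long offset)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fseek(fp, offset, SEEK_SET) != 0 || fputc('x', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Report two different "sizes" for the same file:
   st_size   - the number of bytes a reader would see;
   st_blocks - blocks actually allocated (512-byte units on Linux and
               most Unix systems; POSIX leaves the unit unspecified). */
static int report_sizes(const char *path, long long *logical, long long *blocks)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;
    *logical = (long long)sb.st_size;
    *blocks = (long long)sb.st_blocks;
    return 0;
}
```

On a filesystem with sparse-file support, an offset of 1000000 will typically give a logical size of 1000001 while `blocks` stays small; without such support, `blocks` covers the whole range.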
As far as I understand this complicated subject,
when you run "DIR" or "ls -l" or whatever your system command
is, the size reported is the actual number of bytes in the "file"
entity. This is (or should be) the same as reported by the above
method.

I can't help but feel that your experience has been relatively narrow.

There are several operating systems which can transparently compress
and decompress files. The disk space reported may be the compressed
file size, or it may be the uncompressed file size -- it varies with
the OS and the way of asking the question.
Ahhh no wonder Digital went under... what a system!
If your "text" mode includes translating tabs into blanks for instance,
that WILL go wrong. But again, we are speaking about BINARY mode
Eric. JUST BINARY MODE ok?

VMS has a number of different filesystem formats, each aimed at a
different purpose, and with noticeable internal filesystem optimizations
to suit those different purposes. What you get when you read such files
in binary mode is not generally the byte-stream stored on disk. For
example, if you are reading a variable-length record file, when you
read in binary mode you get the -contents- of the record, not the
infrastructure bytes that describe the record. This is not a
"contradiction" in the meaning of "binary" mode, because the other
standard mode, "text mode" would imply that you are reading text
whereas the variable-length record might be (say) a struct written as a
complete object. Binary mode is not necessarily the same as what might
be termed "raw mode".

Another example: executables in VMS may include one or more "patch"
sections, which are additional records intended to overlay a stored
code section -- rather than change the original binary itself
in a non-reversible way, the various patches could be added or
removed as virtual addendums. Considering that executable code
is certainly not what most people would consider as "text", then
when one reads the executable in "binary" mode, one expects to
read out the code stream with all the active modifications made to it.
If one needed to get at the underlying structure (e.g., to add another
patch) then there were RMS (Record Management Services) calls that
could be made for that purpose. Now, what is the "size" of such
an executable? The size of the patched stream, or the amount of disk
space it takes to represent all the records and headers including
patches?
 

Mark McIntyre

Dear copx
I proposed several months ago the same solution as you and
received the same pompous answers as you have received.

I have never seen any system where that would fail, sorry.

Then your experience is /even more/ limited than I thought.
And the guys throwing nonsense didn't ever show me an example
where this would fail:

Seriously, if you can't see the problem, there's no hope for you. But
if you want an example, try this on VMS. I suspect it also fails on
most IBM OSen too, but I'm without an S/360 to try it on... :)
I have never seen a system where setting the file pointer at
the end would fail.

And I've never seen a planet whose gravity was other than one gee, or
a Native Australian. That clearly means /they/ don't exist either.
 

Mark McIntyre

File size is the number of bytes you get when reading the stuff
in binary mode.

Interesting, if bollocks, definition.

Under VMS, you could write 112 bytes to a file. However, its 'file size'
could, according to this method, be 512 bytes, because all VMS file
allocations are in blocks of 512 bytes. This is also what DIR shows you:
one block allocated to the file.

Reading all 512 bytes back would give you 400 bytes which didn't
belong to your file. If this happened to be some sort of data file,
you'd end up processing 400 bytes of garbage. That's why the OS has
system specific functions to determine the amount of actual data. You
should use them, and avoid daft hacks.
 

Mark McIntyre

On 27 Apr 2005 16:06:57 GMT, in comp.lang.c ,
Another example: executables in VMS may include one or more "patch"
sections, which are additional records intended to overlay a stored
code section -- rather than change the original binary itself
in a non-reversible way, the various patches could be added or
removed as virtual addendums. Considering that executable code
is certainly not what most people would consider as "text", then
when one reads the executable in "binary" mode, one expects to
read out the code stream with all the active modifications made to it.
If one needed to get at the underlying structure (e.g., to add another
patch) then there were RMS (Record Management Services) calls that
could be made for that purpose.

Golly, I'd forgotten all about RMS calls. What a great OS VMS was,
definitely the King of Kings.
 

jacob navia

Walter said:
There are several operating systems which can transparently compress
and decompress files. The disk space reported may be the compressed
file size, or it may be the uncompressed file size -- it varies with
the OS and the way of asking the question.

If it's transparent (as you say above), it is not relevant to
the discussion.

I repeat. The C runtime should be able to give me back 10 000 'a'
characters after I have written them. Whether the file is represented
as '10000*a' (7 chars) or as 10 thousand actual letters
is not relevant. fseek(file,0,SEEK_END) should put me at the
10 000th position, and ftell should return 10 000.
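For reference, the seek-to-end method being argued over looks roughly like this in standard C. This is a minimal sketch; `size_by_seek` is a hypothetical name. Note that the C standard itself says a binary stream "need not meaningfully support fseek calls with a whence value of SEEK_END", which is exactly the part the other posters object to, and that `ftell()` returns `-1L` (with `errno` set) when it fails.

```c
#include <stdio.h>

/* Seek-to-end size query: open in binary mode, seek to the end, and ask
   ftell() for the position. Returns -1L on any failure.
   Caveats: a binary stream need not meaningfully support SEEK_END,
   ftell() returns a long (so sizes above LONG_MAX cannot be reported),
   and the result is only as meaningful as the implementation's notion
   of a binary stream (see the VMS examples in this thread). */
long size_by_seek(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1L;
    }
    size = ftell(fp);            /* -1L on failure, errno set */
    fclose(fp);
    return size;
}
```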
VMS has a number of different filesystem formats, each aimed at a
different purpose, and with noticeable internal filesystem optimizations
to suit those different purposes. What you get when you read such files
in binary mode is not generally the byte-stream stored on disk. For
example, if you are reading a variable-length record file, when you
read in binary mode you get the -contents- of the record, not the
infrastructure bytes that describe the record. This is not a
"contradiction" in the meaning of "binary" mode, because the other
standard mode, "text mode" would imply that you are reading text
whereas the variable-length record might be (say) a struct written as a
complete object. Binary mode is not necessarily the same as what might
be termed "raw mode".
Of course, but who cares?
If after writing some data I can read it back as a stream of
characters that's all I need to make my algorithm work.
 

Walter Roberson

Walter Roberson wrote:
If it's transparent (as you say above), it is not relevant to
the discussion.

It *is* relevant to the discussion, since you stated that the
size returned by your proposed algorithm should match the size
shown by DIR or ls -l . VMS's relevant command is named DIR .
Also when one is ftp'ing the internal ftp command is named DIR
(ftp clients that provide 'ls' do so by issuing DIR commands.)
Thus in order to know whether the size returned by your proposed
algorithm is "right" or not, one must know what the size returned
by DIR -means-.
I repeat. The C runtime should be able to give me back 10 000 'a'
characters after I have written them.

That's not a "repeat", that's a new phrasing, which
completely divorces the notion of "file size" from "size shown by"
the directory commands or "the size on disk".

You are now in the circular argument that file size is -defined-
by the value returned by your algorithm. Well of course your
algorithm is "correct" under that definition... the problem is
that your algorithm is -incorrect- under several other valid
interpretations of what "file size" means.

Next time you are on a Unix system, you should try applying
your algorithm to the files /dev/random and /dev/zero --
and you should try applying your definition to the file /dev/null .
What value is going to be returned by ftell() after you write
(LONG_MAX + 1) bytes to /dev/null ? For that matter, what size
is going to be returned by ftell() after you write (LONG_MAX + 1)
bytes to a regular file on a filesystem that supports larger
files? (And yes, such filesystems really truly do exist.
I can write files larger than LONG_MAX on the system I am
using now.)
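Walter's LONG_MAX point follows from `ftell()` returning a `long`; POSIX specifies that `ftell()` fails with `EOVERFLOW` when the offset cannot be represented in a `long`. POSIX systems also offer `fseeko()` and `ftello()`, which use `off_t` instead. The sketch below assumes a POSIX environment, and the `_FILE_OFFSET_BITS` macro is a convention honoured by glibc for getting a 64-bit `off_t` on 32-bit hosts; `size_by_seeko` is a hypothetical name. It still inherits every other caveat of the seek-to-end approach.

```c
#define _FILE_OFFSET_BITS 64     /* glibc: 64-bit off_t even on 32-bit hosts */
#define _POSIX_C_SOURCE 200809L  /* for fseeko() and ftello() */
#include <stdio.h>
#include <sys/types.h>

/* POSIX-only variant of the seek-to-end approach. ftello() returns an
   off_t instead of a long, so it can report sizes above LONG_MAX where
   ftell() would fail. Returns (off_t)-1 on failure. */
off_t size_by_seeko(const char *path)
{
    FILE *fp = fopen(path, "rb");
    off_t size;

    if (fp == NULL)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return (off_t)-1;
    }
    size = ftello(fp);           /* (off_t)-1 on failure, errno set */
    fclose(fp);
    return size;
}
```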
 

Kenneth Brody

Christopher said:
The OP may also appreciate knowing that your proposed solution is dubious
at best.


If something doesn't work the way the ANSI standard says it should,
the implementation in question is not a C implementation. End of story.

Does the ANSI standard say that fseek/ftell _only_ work as you described,
or that they are only _guaranteed_ to work as you described? (Relating to
text-vs-binary modes.) In that case, a system which allows ftell() on a
text stream, and non-EOF fseek()s on text streams can still be compatible
with the standard.

On the other hand, I can tell you that MSVC6.0 does _not_ work properly
with ftell() and fseek() in text mode. Specifically, if the file which
is open in text mode uses LF instead of CRLF for line endings, these two
functions return horribly wrong values.
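What ISO C does guarantee for a text stream is narrower: the value from `ftell()` may be passed back to `fseek(..., SEEK_SET)` to return to the same position, and nothing more is promised about it. The sketch below (a hypothetical `reread_second_line`, assuming the first two lines fit in 256 bytes) uses `ftell()` only in that sanctioned way and never treats the value as a byte count.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a text file, remember where the second line
   starts with ftell(), read the second line, then fseek() back to the
   remembered position and read it again. For a text stream the ftell()
   value is opaque: it is only guaranteed to work as an fseek() target
   with SEEK_SET, not as a byte offset or size.
   Returns 1 if both reads of line two match, 0 if not, -1 on error. */
int reread_second_line(const char *path)
{
    char first[256], second[256], again[256];
    FILE *fp = fopen(path, "r");                    /* text mode */
    long mark = -1L;
    int ok = -1;

    if (fp == NULL)
        return -1;
    if (fgets(first, sizeof first, fp) != NULL
        && (mark = ftell(fp)) != -1L                /* opaque position */
        && fgets(second, sizeof second, fp) != NULL
        && fseek(fp, mark, SEEK_SET) == 0           /* back to line two */
        && fgets(again, sizeof again, fp) != NULL)
        ok = strcmp(second, again) == 0;
    fclose(fp);
    return ok;
}
```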

[...]

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
 

John Smith

John said:
But then again, you're a pompous asshole.
Very disturbing to see somebody with my name behave this way on c.l.c.
Please don't confuse me with this guy.

John Smith (the other one)
 

Walter Roberson

Walter Roberson said:
Also when one is ftp'ing the internal ftp command is named DIR
(ftp clients that provide 'ls' do so by issuing DIR commands.)

Christopher Nehren politely pointed out to me in email that
the internal ftp command is LIST, not DIR. [A related command is NLST --
which isn't DIR either.]
 

Christopher Benson-Manica

Kenneth Brody said:
Does the ANSI standard say that fseek/ftell _only_ work as you described,
or that they are only _guaranteed_ to work as you described?

The latter, but I figured concepts such as implementation-defined,
unspecified, and undefined behavior weren't likely to interest the OP.

I do hope my statement wasn't totally out of line...
 

Richard Bos

John Smith said:
John Smith wrote:
[ Something. ]
John Smith (the other one)

_The_ other one? Isn't John Smith about the most common name in English?
There must be oodles of men legally using that name, and I wouldn't be
surprised if dozens of them post to Usenet.

Richard
 

Richard Bos

Mark McIntyre said:
And I've never seen a planet whose gravity was other than one gee, or
a Native Australian. That clearly means /they/ don't exist either.

I've never seen copx or jacob navia in real life. Obviously, they only
exist on Usenet. Then again, I've never met Mark McIntyre, either...

Richard
 

jacob navia

Walter said:
That's not a "repeat", that's a new phrasing, which
completely divorces the notion of "file size" from "size shown by"
the directory commands or "the size on disk".


I quote from the C standard:
A binary stream is an ordered sequence of characters that can
transparently record internal data. Data read in from a binary stream
shall compare equal to the data that were earlier written out to that
stream, under the same implementation.

Page 263 Streams.

This is not a "new" phrasing. It is the definition of a stream.
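The quoted guarantee can be exercised directly: write 10 000 'a' characters to a binary stream and read them back. The same paragraph of the standard continues: "Such a stream may, however, have an implementation-defined number of null characters appended to the end of the stream", which is one reason the number of bytes you can read back need not equal the number written. A minimal sketch; `roundtrip_binary` is a hypothetical name.

```c
#include <stdio.h>

/* Write n copies of 'a' to path in binary mode, then read the file back
   in binary mode. Returns the number of 'a' characters read before the
   first other character (or EOF), or -1 on an I/O error. Because the
   standard permits implementation-defined null padding at the end of a
   binary stream, *trailing_nulls reports how many '\0' bytes followed
   the data. */
long roundtrip_binary(const char *path, long n, long *trailing_nulls)
{
    FILE *fp = fopen(path, "wb");
    long i, count = 0;
    int c;

    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++)
        if (fputc('a', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    if (fclose(fp) != 0)
        return -1;

    if ((fp = fopen(path, "rb")) == NULL)
        return -1;
    *trailing_nulls = 0;
    while ((c = fgetc(fp)) == 'a')
        count++;
    while (c == '\0') {          /* padding the standard permits */
        (*trailing_nulls)++;
        c = fgetc(fp);
    }
    fclose(fp);
    return count;
}
```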
Next time you are on a Unix system, you should try applying
your algorithm to the files /dev/random and /dev/zero --
and you should try applying your definition to the file /dev/null .
What value is going to be returned by ftell() after you write
(LONG_MAX + 1) bytes to /dev/null ? For that matter, what size
is going to be returned by ftell() after you write (LONG_MAX + 1)
bytes to a regular file on a filesystem that supports larger
files? (And yes, such filesystems really truly do exist.
I can write files larger than LONG_MAX on the system I am
using now.)


Those devices are NOT streams as understood by the C language, and
are NOT relevant to this argument.
 

John Smith

Richard said:
John Smith wrote:

[ Something. ]

John Smith (the other one)


_The_ other one? Isn't John Smith about the most common name in English?
There must be oodles of men legally using that name, and I wouldn't be
surprised if dozens of them post to Usenet.

Richard

Johnson is the most common name in English. But sometimes I wish my
mother had named me Boris. I post occasionally to this group, always
asking for advice, never giving it. And I don't call anybody names.

JS
 

Chris Croughton

Johnson is the most common name in English. But sometimes I wish my
mother had named me Boris. I post occasionally to this group, always
asking for advice, never giving it. And I don't call anybody names.

Google Groups turns up several:

One in Italy, one in the Netherlands, one in hr.*, one in aus.*...
A radio amateur
A Doctor Who fan
A pilot
A motorsport fan
...

Of course, several of them might be the same person...

Although both John and Smith are common, the combination John Smith
seems to be quite rare. I've only ever known one personally, and he had
never met another one. Most parents, if they have a common surname like
Smith, don't want to put another common name to it.

The same way that I wouldn't name a real variable foo even if that was
the most descriptive name for it (to be vaguely near some sort of
topic)...

Chris C
 

Michael Wojcik

Very disturbing to see somebody with my name behave this way on c.l.c.
Please don't confuse me with this guy.

'twas an obvious sock puppet; the email address was
<[email protected]>. Experienced Usenet readers should have
noted that and ignored the message. Inexperienced Usenet readers
might not, but there's nothing to be done about that, except hope
they gain experience.

It's generally best to just ignore this sort of trolling.
 

Default User

John Smith wrote:

Johnson is the most common name in English. But sometimes I wish my
mother had named me Boris. I post occasionally to this group, always
asking for advice, never giving it. And I don't call anybody names.


I don't know about English as a whole, but Smith is the most common
surname in the USA, according to census data.




Brian


Thomas Matthews

Sam said:
Hi all,
Is there a function in the standard library that can get the size of a file?
Thank you very much.
Sam.

No, there is not a standard function.
The most portable method is to write a function that
counts all the characters or octets (binary) in
the file.
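A minimal sketch of the counting approach in standard C (C99, for `unsigned long long`); `count_bytes` is a hypothetical name. It reads in binary mode, so it reports what the implementation's binary stream delivers, which, as discussed earlier in the thread, may differ from what a directory listing shows.

```c
#include <stdio.h>

/* Portable, if slow: open the file in binary mode and count every byte
   until end-of-file. Stores the count in *size and returns 0, or returns
   -1 if the file cannot be opened or a read error occurs. On some
   systems the count may include implementation-defined padding (for
   example trailing null characters), not only the bytes written. */
int count_bytes(const char *path, unsigned long long *size)
{
    unsigned char buf[4096];
    unsigned long long total = 0;
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        total += got;
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *size = total;
    return 0;
}
```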

A more precise method is to use a platform or operating
system function, which is not discussed in this
newsgroup.

Or design your algorithms so that they don't need
to know the size of a file; the functions operate
on the data until the end of the file is reached.
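A common reason for wanting the size is to allocate a buffer for the whole file. A sketch of doing that without knowing the size first, growing the buffer until `fread()` reports end-of-file; `slurp` is a hypothetical name, and overflow of the capacity on enormous inputs is not handled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream into memory without knowing its size in advance:
   double the buffer whenever it fills, until fread() returns 0.
   Returns a malloc'd buffer (caller frees) and stores the byte count in
   *len, or returns NULL on allocation or read failure. */
unsigned char *slurp(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    unsigned char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += got;
        if (used == cap) {                  /* full: double the space */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

The caller opens the stream (in binary mode if the exact bytes matter), passes it in, and frees the returned buffer.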

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
 

Mark McIntyre

Johnson is the most common name in English.

In American English. Smith is the most common name in English English.
Many of the American Johnsons were originally Johannsens and other
names of Scandinavian or Germanic origin. Not to mention probably a few
Eastern Europeans who got randomly named by customs officials too lazy
to spell the real word.
 
