getting file size

Bob Smith

Are these the same:

1. f_size = os.path.getsize(file_name)

2. fp1 = file(file_name, 'r')
data = fp1.readlines()
last_byte = fp1.tell()

I always get the same value when doing 1. or 2. Is there a reason I
should do both? When reading to the end of a file, won't tell() be just
as accurate as os.path.getsize()?

Thanks guys,

Bob
 
John Machin

Bob said:
Are these the same:

1. f_size = os.path.getsize(file_name)

2. fp1 = file(file_name, 'r')
data = fp1.readlines()
last_byte = fp1.tell()

I always get the same value when doing 1. or 2. Is there a reason I
should do both? When reading to the end of a file, won't tell() be just
as accurate as os.path.getsize()?

Read the docs. Note the hint that you get what the stdio serves up.
ftell() can only be _guaranteed_ to give you a magic cookie that you
may later use with fseek(magic_cookie) to return to the same place in a
more reliable manner than with Hansel & Gretel's non-magic
bread-crumbs. On 99.99% of modern filesystems, the cookie obtained by
ftell() when positioned at EOF is in fact the size in bytes. But why
chance it? os.path.getsize does as its name suggests; why not use it,
instead of a method with a side-effect? As for doing _both_, why would
you??
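
A minimal sketch of the two approaches side by side, written for current Python 3 rather than the Python 2 of the original posts. The helper names and the demo file are made up for illustration. The second helper seeks to the end instead of reading the whole file, and uses binary mode so that the position is a plain byte offset.

```python
import os
import tempfile


def size_without_reading(path):
    """Return the size in bytes as reported by the filesystem."""
    return os.path.getsize(path)


def size_by_seeking(path):
    """Return the size by seeking to the end of a binary-mode file object."""
    with open(path, "rb") as f:
        # In binary mode, seek() returns the new absolute position in bytes.
        return f.seek(0, os.SEEK_END)


# Hypothetical demo file; the name and contents are made up for illustration.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.bin")
with open(demo_path, "wb") as f:
    f.write(b"x" * 1000)

print(size_without_reading(demo_path), size_by_seeking(demo_path))
```

The first helper never opens the file, so it has no side effect on any file position and does not need read permission on the file itself.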
 
Marc 'BlackJack' Rintsch

Are these the same:

1. f_size = os.path.getsize(file_name)

2. fp1 = file(file_name, 'r')
data = fp1.readlines()
last_byte = fp1.tell()

I always get the same value when doing 1. or 2. Is there a reason I
should do both? When reading to the end of a file, won't tell() be just
as accurate as os.path.getsize()?

You don't always get the same value, even on systems where `tell()`
returns a byte position. You need the rights to read the file in case 2.
For example, opening '/etc/shadow' as an unprivileged user fails before
anything can be read:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: [Errno 13] Permission denied: '/etc/shadow'

Ciao,
Marc 'BlackJack' Rintsch
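
The permission point can be reproduced without touching system files. This is a rough sketch in Python 3, assuming POSIX permission semantics; the `probe` helper and the temporary file are hypothetical. When run as root, or on Windows, the open may still succeed, so the second value printed depends on the environment.

```python
import os
import stat
import tempfile


def probe(path):
    """Return (size, readable): the stat-based size and whether open() succeeds."""
    # stat() needs search permission on the directory, not read permission
    # on the file itself.
    size = os.path.getsize(path)
    try:
        with open(path, "rb"):
            readable = True
    except PermissionError:
        readable = False
    return size, readable


# Hypothetical file with all permission bits cleared (POSIX semantics assumed).
fd, locked_path = tempfile.mkstemp()
os.write(fd, b"secret")
os.close(fd)
os.chmod(locked_path, 0)

locked_size, locked_readable = probe(locked_path)
print(locked_size, locked_readable)

# Restore permissions so the file can be removed on any platform.
os.chmod(locked_path, stat.S_IRUSR | stat.S_IWUSR)
os.remove(locked_path)
```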
 
Tim Roberts

Bob Smith said:
Are these the same:

1. f_size = os.path.getsize(file_name)

2. fp1 = file(file_name, 'r')
data = fp1.readlines()
last_byte = fp1.tell()

I always get the same value when doing 1. or 2. Is there a reason I
should do both? When reading to the end of a file, won't tell() be just
as accurate as os.path.getsize()?

On Windows, those two are not equivalent. Besides the newline conversion
done by reading text files, the solution in 2. will stop as soon as it sees
a ctrl-Z.

If you used 'rb', you'd be much closer.
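
A small sketch of the text-versus-binary difference, assuming Python 3, where text mode translates "\r\n" to "\n" on every platform by default. The sample bytes and the temporary file are made up for illustration. It compares what each mode hands back, not the value of tell().

```python
import os
import tempfile

# Hypothetical sample with Windows-style line endings.
raw = b"qwertyuiop\r\nasdfghjkl\r\nzxcvbnm\r\n"
fd, crlf_path = tempfile.mkstemp()
os.write(fd, raw)
os.close(fd)

# Text mode with default newline handling: "\r\n" is translated to "\n".
with open(crlf_path, "r", encoding="ascii") as f:
    text_chars = len(f.read())

# Binary mode: bytes come back untouched.
with open(crlf_path, "rb") as f:
    binary_bytes = len(f.read())

disk_size = os.path.getsize(crlf_path)
os.remove(crlf_path)
print(text_chars, binary_bytes, disk_size)  # 29 32 32
```

Only the binary-mode count agrees with os.path.getsize(), which is the sense in which 'rb' gets "much closer".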
 
John Machin

Tim said:
On Windows, those two are not equivalent. Besides the newline conversion
done by reading text files,

Doesn't appear to me to go wrong due to newline conversion:

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
>>> import os.path
>>> txt = 'qwertyuiop\nasdfghjkl\nzxcvbnm\n'
>>> file('bob', 'w').write(txt)
>>> len(txt)
29
>>> os.path.getsize('bob')
32L ##### as expected
>>> f = file('bob', 'r')
>>> lines = f.readlines()
>>> lines
['qwertyuiop\n', 'asdfghjkl\n', 'zxcvbnm\n']
>>> f.tell()
32L ##### as expected

Tim said:
the solution in 2. will stop as soon as it sees
a ctrl-Z.

.... and the value returned by f.tell() is not the position of the
ctrl-Z but more likely the position of the end of the current block --
which could be thousands/millions of bytes before the physical end of
the file.

Good ol' CP/M.

Tim said:
If you used 'rb', you'd be much closer.

And be much less hassled when that ctrl-Z wasn't meant to mean EOF, it
just happened to appear in an unvalidated data field part way down a
critical file :-(
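
To illustrate the 'rb' advice, here is a sketch in Python 3 with a ctrl-Z byte (0x1A) embedded part way through the data. The payload and the temporary file are hypothetical. In binary mode the byte is treated as ordinary data, so the read runs to the physical end of the file and tell() is a plain byte offset.

```python
import os
import tempfile

CTRL_Z = b"\x1a"  # the byte CP/M and DOS text files used as an end-of-file marker

# Hypothetical record with a ctrl-Z embedded in the middle of the data.
payload = b"header" + CTRL_Z + b"rest of the record\n"
fd, ctrlz_path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)

with open(ctrlz_path, "rb") as f:
    data = f.read()
    end_pos = f.tell()  # in binary mode this is a plain byte offset

size_on_disk = os.path.getsize(ctrlz_path)
os.remove(ctrlz_path)
print(len(data), end_pos, size_on_disk)
```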
 
