file.read() doesn't read the whole file

Sreejith K · Mar 20, 2009

Hi,
'./mango.txt_snaps/snap1/0'

The above code works fine and reads the whole file till EOF. But
when this method is used in a different scenario, the file is not read
completely. I'll post the code that reads only some part of the file...

## opens /mnt/gfs_local/mango.txt_snaps/snap1/0
self.snap = open(self.snapdir + '/snap%d/%d' % (self.snap_cnt, block), 'r')
self.snap.seek(off%4096)   ## seeks to 0 in this case
bend = 4096-(off%4096)     ## 4096 in this case
if length-bend <= 0:       ## true in this case as length is 4096
    tf.writelines("returned \n")
    data = self.snap.read(length)
    self.snap.close()
    break

The output data is supposed to be the whole file, but it only reads a
part of it. Why is it encountering an early EOF?

R. David Murray · Mar 20, 2009

Sreejith K said:
The output data is supposed to be the whole file, but it only reads a
part of it. Why is it encountering an early EOF?

It's not. In the second case you told it to read only 4096 bytes. You
might want to read the docs for the 'read' method, paying particular
attention to the optional argument and its meaning.
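
To illustrate the point about the optional argument, here is a minimal sketch (written for Python 3; the temporary file and its 56 bytes of content are made up for the demonstration, and read_both_ways is a hypothetical helper). read(size) returns at most size bytes, and fewer only when EOF comes first, so on a file shorter than 4096 bytes read(4096) and read() should return the same data.

```python
import os
import tempfile

def read_both_ways(path, size):
    """Return (data from read(size), data from read()) for the same file."""
    with open(path, 'rb') as f:
        limited = f.read(size)   # at most `size` bytes
    with open(path, 'rb') as f:
        everything = f.read()    # no argument: read until EOF
    return limited, everything

# Hypothetical 56-byte file standing in for the snapshot block.
fd, path = tempfile.mkstemp()
os.write(fd, b'x' * 56)
os.close(fd)

small_limit, whole = read_both_ways(path, 10)
big_limit, _ = read_both_ways(path, 4096)
print(len(small_limit), len(big_limit), len(whole))   # 10 56 56
os.remove(path)
```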

Sreejith K · Mar 20, 2009

I'm using the above code in the read function of a python-fuse file
class. The offset and length are 0 and 4096 respectively for my
test inputs. When I open a file and read 4096 bytes from that offset,
only a few lines are printed, not the whole file. Actually the file is
only a few bytes long. But when I tried reading it from Python's
interactive mode, it gave the whole file.

Is there any problem with using the read() method in fuse-python?

Also, statements like break and continue behave weirdly in fuse
functions. Any help is appreciated....

R. David Murray · Mar 20, 2009

Sreejith K said:
Thanks for the reply,
Actually the file is only a few bytes and file.read() and file.read(4096)
will give the same result, i.e. the whole file. But in my case
it's not happening (using it in a python-fuse file class). Any other
ideas?

Not offhand. If it were me I'd start playing with parameters and moving
things around, trying to find additional clues. See if calling read
without the argument in the second case works, start stripping the second
case down until it starts working (even if wrongly for the ultimate goal),
and things like that.

Terry Reedy · Mar 20, 2009

I had never heard of fuse until today. I get the impression that
working with fuse is not the same as working with the stock os
filesystem, so you might have mentioned this from the beginning;-). I
suggest you direct these questions to python-fuse and fuse-python folk.

I V · Mar 20, 2009

Your example doesn't show what you are doing with "data" after you've
read it. Presumably you're outputting it somehow, which is where you see
that it doesn't contain the whole file. But are you sure the problem is
in the reading, and not in the outputting? Could you show the section of
the code where you output "data"?

Sreejith K · Mar 21, 2009

Thanks,

I think there is no use in showing the whole code (which is a
filesystem implementation using python-fuse), but if you want it I'll
post it; see the next post.

Forget about the 'data' variable (I'm just returning it at the end of
the function). Even if I return self.snap.read(length) directly, it
doesn't read length bytes from the file (or up to EOF, if length is
larger than the size of the file).

But this only happens with the second code sample (which is used in
a fuse-python class) and not in the first example. In the normal
interactive python shell the same procedure returns the whole file
(using read() or read(4096), which is a particular case)...

I'm sure that, in my case, self.snap is opening the file
./mango.txt_snaps/snap1/0 and that the off and length variables are 0
and 4096 respectively. The file ./mango.txt_snaps/snap1/0 is only 56
bytes with 9 lines of text. But both read(4096) and read() in the
second example return only 7 lines. This happens only when using it in
a fuse-python file class's read function. In normal interactive mode
it reads the whole file. I hope you understand the problem now. I'm
very sorry that I couldn't explain the problem very well.

Sreejith K · Mar 21, 2009

class MedusaFile(object):
    def __init__(self, path, flags, *mode):
        global METHOD
        global NORMAL
        global SNAP
        global FRESH_SNAP
        self.path = path
        tf.writelines("File initiating..\n")
        self.file = os.fdopen(os.open("." + path, flags, *mode),
                              flag2mode(flags))
        self.fd = self.file.fileno()
        curdir = GetDirPath('.' + path)
        self.snapdir = '.' + path + '_snaps'
        self.snap_cnt = 0
        if os.path.exists(self.snapdir):
            self.snap_cnt = len(os.listdir(self.snapdir))
            METHOD = SNAP
        elif FRESH_SNAP == 1:
            self.snap_cnt += 1
            os.mkdir(self.snapdir)
            os.mkdir(self.snapdir+'/snap%s' % repr(self.snap_cnt))
            METHOD = SNAP
            FRESH_SNAP = 0
        tf.writelines("File initiated..\n")

    def read(self, length, offset):
        global METHOD
        global NORMAL
        global SNAP
        tf.writelines("Read length: %d offset: %d\n" % (length,offset))
        blk_num = offset/4096
        no_blks = length/4096 + 1
        blocks = []
        block_read = False
        ## Read from the Base File (Snap 1)
        if METHOD == NORMAL:
            self.file.seek(offset)
            tf.writelines("Normal read\n")
            return self.file.read(length)
        ## Read blocks from the snapshots
        else:
            snap_list = range(self.snap_cnt)
            rev_snap_list = snap_list
            rev_snap_list.reverse()
            for i in snap_list:
                ## list of blocks in the current snapshot
                blocks.append(os.listdir(self.snapdir+'/snap%s' % repr(i+1)))
            off = offset
            bend = 0
            data = ''
            ## loop through the blocks which are to be read
            for block in range(blk_num,blk_num+no_blks):
                ## loop through snapshots (starting from latest snap)
                for i in rev_snap_list:
                    tf.writelines('Snapshot %d opened..\n' % i)
                    if repr(block) in blocks[i]:  ## Check if it is in the snapshot
                        tf.writelines("Snap read\n")
                        self.snap = open(self.snapdir + '/snap%d/%d' % (i+1, block),'r')
                        self.snap.seek(off%4096)
                        bend = 4096-(off%4096)
                        block_read = True
                        ## if only a part of block file is to be read
                        ## (i.e. not till the end of block file)
                        if length-bend <= 0:
                            tf.writelines("Partial read from snap \n")
                            data = self.snap.read(length)
                            self.snap.close()
                            break
                        tf.writelines("Full block read from snap\n")
                        data += self.snap.read(bend)
                        length -= bend
                        off = 4096
                        self.snap.close()
                        break
                    tf.writelines("Block not in snap\n")
                if block_read:
                    block_read = False
                    continue
                ## otherwise the block should be in the base file itself.
                ## So, read from there
                tf.writelines("Reading from Base File\n")
                self.file.seek(block*4096 + off%4096)
                bend = 4096-(off%4096)
                ## if only a part of a block is to be read
                ## (not till the end of the block)
                if length-bend <= 0:
                    data = self.file.read(length)
                    break
                data += self.file.read(bend)
                length -= bend
                off = 4096
            return data

This is the filesystem class for files. Whenever a read occurs, an
instance is created and its read function is called. In my example, when
accessing a file named 'mango.txt', it checks for the mango.txt_snaps/snap1
directory and opens the file '0' as self.snap. But read() returns (i.e.
data is) only a small part....

Almost all the code behaves weirdly in this example. Apart from read(),
break and continue also behave weirdly. When opening the file
'mango.txt', the following output is written by tf (an output file).
Here METHOD is not NORMAL, self.snap_cnt is 1, and blocks is [['0']].

File initiating..
File initiated..
Read length: 4096 offset: 0
Snapshot 0 opened..
Snap read
Partial read from snap
Snapshot 0 opened..
Block not in snap
Reading from Base File

See the weirdness of continue and break here? (The loop was supposed to
run only once, as rev_snap_list contains only 0.)

R. David Murray · Mar 21, 2009

I'm not going to look at all this code now... it looks way too complicated
and in need of some serious refactoring.

But a couple of on-point comments:

How do you know rev_snap_list contains only 0? You didn't log it.

Same for the read. How do you know the read didn't read the whole
file? You didn't log it.

Both your statements might be true, but until you show the logging
output proving it, you don't _know_ that your assumptions are true.

Sreejith K · Mar 21, 2009

Thanks for your comments.

As I said, self.snap_cnt is 1, so:

>>> snap_list = range(1)
>>> rev_snap_list = snap_list
>>> rev_snap_list
[0]
>>> rev_snap_list.reverse()
>>> rev_snap_list
[0]

So I thought there was no need to log it. In the above code, as break
and continue don't work as I expected, read() is called twice
("Partial read from snap" and "Reading from Base File") and the output is
a combination of the 2 reads (so it can't be read using 'less'). But
the expected result was "Partial read from snap", i.e. read(4096) and
a break out of the loop. Instead, after this break, the loop is started
again ("Snapshot 0 opened.."). I have no idea what went wrong here.
But when I comment out block_read = False before the continue
statement, the log displays the following and the file is read
partially....

File initiating..
File initiated..
Read length: 4096 offset: 0
Snapshot 0 opened..
Snap read
Partial read from snap
Snapshot 0 opened..
Block not in snap
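
A side note on the rev_snap_list lines in this post, as a minimal sketch in Python 3 (where range() has to be wrapped in list() to get a list): plain assignment does not copy a list, so reversing rev_snap_list also reverses snap_list. With a single snapshot that makes no visible difference, but with several it would. The count of three snapshots is a made-up example.

```python
snap_list = list(range(3))      # hypothetical: three snapshots
rev_snap_list = snap_list       # same list object, not a copy
rev_snap_list.reverse()
print(snap_list)                # [2, 1, 0]: the "forward" list was reversed too

snap_list = list(range(3))
rev_copy = snap_list[::-1]      # slicing builds a new, reversed list
print(snap_list, rev_copy)      # [0, 1, 2] [2, 1, 0]
```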

Sreejith K · Mar 21, 2009

The break and continue problem was actually my own mistake. I wrote
no_blks = length/4096 + 1, so the loop actually executes twice. Sorry
for my idiotic mistake....

But the read() problem still persists.....

Thanks..
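
The off-by-one described in this post can be checked with a little arithmetic. This is a minimal sketch, assuming 4096-byte blocks as in the thread; blocks_spanned is a hypothetical helper name, not part of the original code. It uses // so that integer division behaves the same on Python 2 and Python 3.

```python
BLOCK_SIZE = 4096  # block size used throughout the thread

def naive_block_count(length):
    """The count from the original code: one too many when length is a multiple of 4096."""
    return length // BLOCK_SIZE + 1

def blocks_spanned(offset, length):
    """Number of blocks touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return 0
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return last - first + 1

print(naive_block_count(4096))      # 2: the loop runs twice
print(blocks_spanned(0, 4096))      # 1: the request fits in block 0
print(blocks_spanned(4000, 200))    # 2: the request crosses a block boundary
```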

Steve Holden · Mar 21, 2009

Sreejith said:
The break and continue problem was actually my own mistake. I wrote
no_blks = length/4096 + 1, so the loop actually executes twice. Sorry
for my idiotic mistake....

That's good!

But the read() problem still persists.....

Try and write an example that shows the problem in fifteen lines or
less. Much easier for us to focus on the issue that way.

regards
Steve

Sreejith K · Mar 23, 2009

Steve Holden said:
Try and write an example that shows the problem in fifteen lines or
less. Much easier for us to focus on the issue that way.

import os
def read(length, offset):
    os.chdir('/mnt/gfs_local/')
    snap = open('mango.txt_snaps/snap1/0','r')
    snap.seek(offset)
    data = snap.read(length)
    print data

read(4096,0)

This code shows what actually happens inside the code I've written.
It requests 4096 bytes from the file '0', which is only 654 bytes long.
When we run the code we get the whole file. That's right; I get it
too. But when this read() function becomes the file class's read()
function in fuse, the data printed is not the whole file but only a few
lines from the beginning. I usually use less to read a file. When
'less'ing a file (whose size is less than 4096 bytes), a call to read
with offset 0 and length 4096 is made and data is returned. 'less' uses
the data returned by my fuse read() function to display the contents.
It was supposed to show all the lines in the file, like the example, but
it doesn't.... This is the problem I'm facing. Did I do something wrong
here?

R. David Murray · Mar 23, 2009

If I'm understanding you correctly, you are saying that when you use
this function as the fuse read function you don't get the whole file,
and you are verifying this by using 'less' to read the 'file' exposed
by fuse. Correct?

So you still have not decoupled the python read from the fuse read in
your debugging. You are focused on the idea that the python read "must
be failing", yet you still have not (as far as you have told us) _proven_
that by logging the value returned from the read. Until you do that,
you can't even be sure where your problem is.

If you have done it, show us the logging output, please.

Steve Holden · Mar 23, 2009

Sreejith said:
This code shows what actually happens inside the code I've written.
This prints the 4096 bytes from the file '0' which is only 654 bytes.
When we run the code we get the whole file. That's right. I also get
it. But when this read() function becomes the file class read()
function in fuse, the data printed is not the whole but only a few
lines from the beginning.

This is confusing. I presume you to mean that when you make this
function a method of some class it stops operating correctly?

But I am not sure.

I am still struggling to understand your problem. Sorry,it's just a
language thing. If we take our time we will understand each other in the
end.

regards
Steve

R. David Murray · Mar 24, 2009

You may be asking this question for pedagogical reasons, Steve, but
in case not...the OP is apparently doing a 'less xxxx' where xxxx is
the name of a file in a fuse filesystem (that is, a mounted filesystem
whose back end is some application code written by the OP). So when
the OP runs less, several calls get made to fuse, which passes them to
fuse-python, which calls methods on the OP's python class. He is looking
in particular at the 'read' call, which happens after 'less' has opened
the file and wants to read a block (apparently either less or fuse is
asking for the first 4096 bytes of the file). At that point his 'read'
method above is called. But based on what he's told us it appears his
conclusion that the 'snap.read(length)' call is not returning the whole
file is based on the fact that less is only showing him part of the file.
There are several steps between that 'snap.read' and less displaying on
the terminal whatever bytes it got back from its read call, in whatever
way less chooses to display them....

Gabriel Genellina · Mar 24, 2009

On Mon, 23 Mar 2009 21:37:14 -0300, R. David Murray wrote:

You may be asking this question for pedagogical reasons, Steve, but
in case not...the OP is apparently doing a 'less xxxx' where xxxx is
the name of a file in a fuse filesystem (that is, a mounted filesystem
whose back end is some application code written by the OP). [...]
There are several steps between that 'snap.read' and less displaying on
the terminal whatever bytes it got back from its read call, in whatever
way less chooses to display them....

And that's why everyone is asking for a *real* log. Assumptions like "foo
must be 0 here" aren't enough. One needs *evidence*: a log file showing
the value of "foo" right when it is used. Then, one can begin to infer
what happens -- first step would be to determine *which* layer is (or is
not) responsible for the misbehavior.

In this case, I'd like to see file.tell(), the requested size and the
returned data length, *right*at*the*read()*call*.
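
A logging step along these lines could look like the sketch below. It is written for Python 3 with the standard logging module; logged_read, the logger name and the in-memory file are hypothetical stand-ins, not code from the filesystem discussed in this thread.

```python
import io
import logging

logging.basicConfig(level=logging.DEBUG, format='%(message)s')
log = logging.getLogger('snapread')

def logged_read(f, length, offset):
    """Seek and read, logging position, requested size and returned length."""
    f.seek(offset)
    before = f.tell()
    data = f.read(length)
    log.debug('tell before=%d requested=%d returned=%d tell after=%d',
              before, length, len(data), f.tell())
    return data

# Stand-in for the 654-byte snapshot block mentioned in the thread.
fake_snap = io.BytesIO(b'a' * 654)
data = logged_read(fake_snap, 4096, 0)
```

Logging the numbers at the read() call itself, rather than inferring them from what a pager displays, shows whether the Python layer returned everything that was asked for.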

Sreejith K · Mar 24, 2009

R. David Murray understood the problem very well. As you suggested, I
logged the data returned and saw that it is actually the complete file.
But the problem is that when 'less'ing, only two lines are displayed. So
it's not about Python's read(). Usually in fuse filesystems (as in
some examples), when a file read occurs, fuse catches it and calls the
python-fuse read(length, offset) function. For small files (when we
read using 'less'), this will usually be the first block, i.e. length
4096 with offset 0. We return what we read from the read() method of
fuse-python. But when I return the data I read, a 'less' operation in
my fuse filesystem shows only some lines, even though the returned data
is the whole file.

In my implementation of the fuse filesystem, when a read is called,
instead of returning the data read from the original file, I return the
data read from another file ('0') which resides in the
<original-file>_snaps/snap1 directory (I use this directory to store the
blocks of original files when a write occurs, so each file here would be
4096 bytes). I'm doing this because I want to make snapshots of files so
that I can restore the older file easily. The problem occurs when
reading this file and returning the read data.

Fuse has some flush/release functions to properly close the
opened file. When reading (less) the original file without snapshots,
there is no issue. But when reading the snapshot instead, the problem
occurs. I open the snapshot file with the same modes as the original
file. Is there anything I should do after read(), like the flush() for
the original file? I tried it, but with no success...

Log when reading from snapshot
=======
Read length: 4096 offset: 0
Snapshot 0 opened..
Snap read
===Data Begin===
Getting started -- pdb.set_trace()

To start, I'll show you the very simplest way to use the Python
debugger.

1. Let's start with a simple program, epdb1.py.

# epdb1.py -- experiment with the Python debugger, pdb
a = "aaa"
b = "bbb"
c = "ccc"
final = a + b + c
print final

2. Insert the following statement at the beginning of your Python
program. This statement imports the Python debugger module, pdb.

import pdb

3. Now find a spot where you would like tracing to begin, and
insert the following code:

pdb.set_trace()
===Data End====
snap.tell(): 654
data size : 654
Original file flushed
Original file closed

data is the whole file, but 'less' gives only the two lines...
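
One thing that might be worth ruling out here, purely as an assumption and not something confirmed in this thread: a reader such as less generally sees no more bytes than the size the filesystem reports for the file, so if the size reported for mango.txt is smaller than the 654 bytes returned from the snapshot, the displayed output could be cut short. The sketch below (Python 3) only compares the two numbers; size_mismatch and the temporary file are hypothetical stand-ins.

```python
import os
import tempfile

def size_mismatch(reported_path, data):
    """Return (reported size, len(data)) so the two can be compared in a log."""
    reported = os.stat(reported_path).st_size
    return reported, len(data)

# Hypothetical stand-ins: a short "original" file and longer snapshot data.
fd, original = tempfile.mkstemp()
os.write(fd, b'two short lines\n')
os.close(fd)
snapshot_data = b'a' * 654

reported, returned = size_mismatch(original, snapshot_data)
if reported < returned:
    print('reported size %d < returned data %d' % (reported, returned))
os.remove(original)
```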
