Problem reading/writing files

smeenehan

This is a bit of a peculiar problem. First off, this relates to Python
Challenge #12, so if you are attempting those and have yet to finish
#12, be warned that there are potential spoilers here.

I have five different image files shuffled up in one big binary file.
In order to view them I have to "unshuffle" the data, which means
moving bytes around. Currently my approach is to read the data from the
original, unshuffle as necessary, and then write to 5 different files
(2 .jpgs, 2 .pngs and 1 .gif).

The problem is with the read() method. If I read a byte valued as 0x00
(in hexadecimal), the read method returns a character with the value
0x20. When printed as strings, these two values look the same (null and
space, respectively), but obviously this screws with the data and makes
the resulting image file unreadable. I can add a simple if statement to
correct this, which seems to make the .jpgs readable, but the .pngs
still have errors and the .gif is corrupted, which makes me wonder if
the read method is not doing this to other bytes as well.

Now, the *really* peculiar thing is that I made a simple little file
and used my hex editor to manually change the first byte to 0x00. When
I read that byte with the read() method, it returned the correct value,
which boggles me.

Anyone have any idea what could be going on? Alternatively, is there a
better way to shift about bytes in a non-text file without using the
read() method (since returning the byte as a string seems to be what's
causing the issue)? Thanks in advance!
 
Simon Forman

> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, be warned that there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

No. It doesn't.

Ok, maybe it does, but I doubt this so severely that, without even
checking, I'll bet you a [virtual] beer it doesn't. :)

Are you opening the file in binary mode?
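
To show why the binary-mode question matters, here is a minimal sketch. It is written for Python 3, not the Python 2.4 used in this thread, and the file name is made up for the demo. Text mode may translate line endings, while 'rb' hands back the bytes untouched. In neither case does this demo turn a null byte into a space.

```python
import os
import tempfile

# Hypothetical demo file; the name is not from the thread.
path = os.path.join(tempfile.mkdtemp(), "mode_demo.bin")

payload = b"a\r\nb\x00c"
with open(path, "wb") as f:
    f.write(payload)

# Binary mode: the bytes come back exactly as written.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with Python 3's default newline handling:
# "\r\n" is translated to "\n" while reading.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```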


Ok, I did check, it doesn't.

|>> s = '\0'
|>> len(s)
1
|>> print s
\x00
|>> f = open('noway', 'wb')
|>> f.write(s)
|>> f.close()

Checking that the file is a length 1 null byte:

$ hexdump noway
0000000 0000
0000001
$ ls -l noway
-rw-r--r-- 1 sforman sforman 1 2006-08-03 23:40 noway

Now let's read it and see...

|>> f = open('noway', 'rb')
|>> s = f.read()
|>> f.close()
|>> len(s)
1
|>> print s
\x00

The problem is not with the read() method. Or, if it is, something
very very weird is going on.

If you can do the above and not get the same results, I'd be interested
to know what file data you have and what OS you're using.
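
The same round trip can be extended to every possible byte value, which also covers the worry that other bytes might be altered. This is only a sketch in Python 3 syntax (the session above is Python 2), and it uses a temporary file with an invented name:

```python
import os
import tempfile

# Hypothetical scratch file for the round trip.
path = os.path.join(tempfile.mkdtemp(), "all_bytes.bin")

# One of each byte value, 0x00 through 0xff.
original = bytes(range(256))

with open(path, "wb") as f:
    f.write(original)

with open(path, "rb") as f:
    read_back = f.read()

# List any byte values that changed on the way through the file.
changed = [i for i in range(256) if read_back[i] != original[i]]
print(len(read_back), changed)
```

If the list of changed values is not empty on some machine, that would be worth reporting along with the OS and Python version.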

Peace,
~Simon

(Think about this: more people than you have tried the challenge. If
this had happened to them they'd have mentioned it too, and it would
have been fixed or at least addressed by now. Maybe.)

(Hmm, or maybe this is *part* of the challenge?)
 
John Machin

> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, be warned that there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

I doubt it. What platform? What version of Python? Have you opened the
file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
parts of your code, plus what caused you to conclude that read()
changed data on the fly in an undocumented fashion.

> When printed as strings, these two values look the same (null and
> space, respectively),

Use the repr() function when you want to see what's *really* in an
object:

#>>> hasnul = 'a\x00b'
#>>> hasspace = 'a\x20b'
#>>> print hasnul, hasspace
a b a b
#>>> print repr(hasnul), repr(hasspace)
'a\x00b' 'a b'
#>>>
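
For readers on Python 3, where file data read in binary mode is a bytes object, a rough equivalent of the session above might look like this. It is a sketch only; the variable names are illustrative:

```python
has_nul = b"a\x00b"
has_space = b"a\x20b"

# repr() spells out the null byte as an escape and shows the space as-is.
print(repr(has_nul), repr(has_space))

# Indexing a bytes object gives the numeric value of the byte,
# which removes any doubt about what is stored.
print(has_nul[1], has_space[1])

# The two values compare as different even if they print alike.
print(has_nul == has_space)
```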

> but obviously this screws with the data and makes
> the resulting image file unreadable. I can add a simple if statement to
> correct this, which seems to make the .jpgs readable, but the .pngs
> still have errors and the .gif is corrupted, which makes me wonder if
> the read method is not doing this to other bytes as well.
>
> Now, the *really* peculiar thing is that I made a simple little file
> and used my hex editor to manually change the first byte to 0x00. When
> I read that byte with the read() method, it returned the correct value,
> which boggles me.
>
> Anyone have any idea what could be going on? Alternatively, is there a
> better way to shift about bytes in a non-text file without using the
> read() method (since returning the byte as a string seems to be what's
> causing the issue)?

"seems to be" != "is" :)

There is no simple better way. We need to establish what you are
actually doing to cause this problem to seem to happen. Kindly answer
the questions above ;-)

Cheers,
John
 
smeenehan

> What platform? What version of Python? Have you opened the
> file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
> parts of your code, plus what caused you to conclude that read()
> changed data on the fly in an undocumented fashion.

Yes, I've been reading and writing everything in binary mode. I'm using
version 2.4 on a Windows XP machine.

Here is the code that I have been using to split up the original file:

f = open('evil2.gfx','rb')
i1 = open('img1.jpg','wb')
i2 = open('img2.png','wb')
i3 = open('img3.gif','wb')
i4 = open('img4.png','wb')
i5 = open('img5.jpg','wb')


for i in range(0,67575,5):
    i1.write(f.read(1))
    i2.write(f.read(1))
    i3.write(f.read(1))
    i4.write(f.read(1))
    i5.write(f.read(1))

f.close()
i1.close()
i2.close()
i3.close()
i4.close()
i5.close()
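
As an aside, the same unshuffling can be sketched with slicing instead of one read(1) call per byte. This is not the code used in the thread: it is Python 3, the helper names deinterleave and split_file are invented for illustration, and it splits the whole file rather than exactly 67575 bytes.

```python
def deinterleave(data, n):
    """Return n byte strings; part i holds bytes i, i+n, i+2n, ... of data."""
    return [data[i::n] for i in range(n)]


def split_file(src_path, dest_paths):
    """Read src_path once in binary mode and write one part per destination."""
    with open(src_path, "rb") as src:
        data = src.read()
    parts = deinterleave(data, len(dest_paths))
    for part, dest_path in zip(parts, dest_paths):
        with open(dest_path, "wb") as dest:
            dest.write(part)


# Tiny in-memory demo with made-up data instead of evil2.gfx.
sample = bytes(range(10))
parts = deinterleave(sample, 5)
print(parts)
```

With the file names from the script above, the call would be split_file('evil2.gfx', ['img1.jpg', 'img2.png', 'img3.gif', 'img4.png', 'img5.jpg']).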

I first noticed the problem by looking at the original file and
img1.jpg side by side with a hex editor. Since img1 contains every 5th
byte from the original file, I was able to find many places where \x00
should have been copied to img1.jpg, but instead a \x20 was copied.
What caused me to suspect the read method was the following:
print repr(s[19:22])
'\xe0 \r'

Now, I have checked many times with a hex editor that the 21st byte of
the file is \x00, yet above you can see that it is reading it as a
space. I've repeated this with several different nulls in the original
file and the result is always the same.

As I said in my original post, when I try simply writing a null to my
own file and reading it (as someone mentioned earlier) everything is
fine. It seems to be only this file which is causing the issue.
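
The side-by-side comparison in the hex editor could also be done in code. The following is a sketch in Python 3 with an invented helper name, find_mismatches, and made-up sample data. It lists every position where an extracted file differs from the corresponding every-5th-byte slice of the original.

```python
def find_mismatches(original, extracted, offset=0, step=5):
    """Compare extracted bytes against original[offset::step].

    Returns a list of (index, expected_byte, actual_byte) tuples.
    """
    expected = original[offset::step]
    limit = min(len(expected), len(extracted))
    return [
        (k, expected[k], extracted[k])
        for k in range(limit)
        if expected[k] != extracted[k]
    ]


# Made-up data: the second extracted byte should be 0x00 but is 0x20.
original = bytes([0xE0, 1, 2, 3, 4, 0x00, 6, 7, 8, 9, 0x0D, 11, 12, 13, 14])
extracted = bytes([0xE0, 0x20, 0x0D])

for index, want, got in find_mismatches(original, extracted):
    print("byte %d: expected 0x%02x, got 0x%02x" % (index, want, got))
```

For the real files, original would be the contents of evil2.gfx and extracted the contents of img1.jpg, both read with 'rb'.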
 
smeenehan

Ok, now I'm very confused, even though I just solved my problem. I
copied the entire contents of the original file (evil2.gfx) from my hex
editor and pasted it into a text file. When I read from *this* file
using my original code, everything worked fine. When I read the 21st
byte, it came up as the correct \x00. Why this didn't work in trying to
read from the original file, I don't know, since the hex values should
be the same, but oh well...
 
Roel Schroeven

(e-mail address removed) wrote:
> f = open('evil2.gfx','rb')
> i1 = open('img1.jpg','wb')
> i2 = open('img2.png','wb')
> i3 = open('img3.gif','wb')
> i4 = open('img4.png','wb')
> i5 = open('img5.jpg','wb')
>
>
> for i in range(0,67575,5):
>     i1.write(f.read(1))
>     i2.write(f.read(1))
>     i3.write(f.read(1))
>     i4.write(f.read(1))
>     i5.write(f.read(1))
>
> f.close()
> i1.close()
> i2.close()
> i3.close()
> i4.close()
> i5.close()
>
> I first noticed the problem by looking at the original file and
> img1.jpg side by side with a hex editor. Since img1 contains every 5th
> byte from the original file, I was able to find many places where \x00
> should have been copied to img1.jpg, but instead a \x20 was copied.
> What caused me to suspect the read method was the following:
> print repr(s[19:22])
> '\xe0 \r'
>
> Now, I have checked many times with a hex editor that the 21st byte of
> the file is \x00, yet above you can see that it is reading it as a
> space. I've repeated this with several different nulls in the original
> file and the result is always the same.
>
> As I said in my original post, when I try simply writing a null to my
> own file and reading it (as someone mentioned earlier) everything is
> fine. It seems to be only this file which is causing the issue.

Very weird. I tried your code on my system (Python 2.4, Windows XP) (but
using a copy of evil2.gfx I still had on my system), with no problems.

Are you sure that you don't have 2 copies of that file around, and that
your program is using the wrong one? Or is it possible that some module
imported with 'from blabla import *' clashes with the builtin open()?
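
Both suspicions can be checked from inside the interpreter. Here is a rough sketch in Python 3 (the thread uses 2.4, where the builtins module is named __builtin__). The helper names describe_file and open_is_builtin are invented for illustration.

```python
import builtins
import hashlib
import os


def describe_file(path):
    """Return (absolute path, size in bytes, SHA-256 hex digest) for path."""
    full = os.path.abspath(path)
    # Use builtins.open explicitly in case the name open has been shadowed.
    with builtins.open(full, "rb") as f:
        data = f.read()
    return full, len(data), hashlib.sha256(data).hexdigest()


def open_is_builtin(namespace):
    """True if the name open in namespace still refers to the builtin."""
    return namespace.get("open", builtins.open) is builtins.open


# Relative file names are resolved against this directory.
print(os.getcwd())
print(open_is_builtin(globals()))
```

Running describe_file('evil2.gfx') both in IDLE and from the command line and comparing the results would show whether the two environments are really reading the same file.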
 
S

smeenehan

Well, now I tried running the script and it worked fine with the .gfx
file. Originally I was working using the IDLE, which I wouldn't have
thought would make a difference, but when I ran the script on its own
it worked fine and when I ran it in the IDLE it didn't work unless the
data was in a text file. Weird.
 
Roel Schroeven

(e-mail address removed) wrote:
> Well, now I tried running the script and it worked fine with the .gfx
> file. Originally I was working using the IDLE, which I wouldn't have
> thought would make a difference, but when I ran the script on its own
> it worked fine and when I ran it in the IDLE it didn't work unless the
> data was in a text file. Weird.

Weird indeed: I ran the script under IDLE too...
 
