Problem reading/writing files

smeenehan · Aug 4, 2006

This is a bit of a peculiar problem. First off, this relates to Python
Challenge #12, so if you are attempting those and have yet to finish
#12, be warned that there are potential spoilers here.

I have five different image files shuffled up in one big binary file.
In order to view them I have to "unshuffle" the data, which means
moving bytes around. Currently my approach is to read the data from the
original, unshuffle as necessary, and then write to 5 different files
(2 .jpgs, 2 .pngs and 1 .gif).

The problem is with the read() method. If I read a byte valued as 0x00
(in hexadecimal), the read method returns a character with the value
0x20. When printed as strings, these two values look the same (null and
space, respectively), but obviously this screws with the data and makes
the resulting image file unreadable. I can add a simple if statement to
correct this, which seems to make the .jpgs readable, but the .pngs
still have errors and the .gif is corrupted, which makes me wonder if
the read method is not doing this to other bytes as well.

Now, the *really* peculiar thing is that I made a simple little file
and used my hex editor to manually change the first byte to 0x00. When
I read that byte with the read() method, it returned the correct value,
which boggles me.

Anyone have any idea what could be going on? Alternatively, is there a
better way to shift about bytes in a non-text file without using the
read() method (since returning the byte as a string seems to be what's
causing the issue)? Thanks in advance!

faulkner · Aug 4, 2006

have you been using text mode?
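
For anyone unsure what the mode changes, here is a minimal sketch in Python 3 (the thread itself uses Python 2.4, so details differ). The sample bytes and the temporary file are assumptions made up for illustration. In this sketch neither mode turns 0x00 into 0x20; text mode only decodes the bytes and translates line endings.

```python
import os
import tempfile

# Hypothetical sample: a null byte, a Windows line ending, and a Ctrl-Z byte.
data = b"\x00\r\n\x1a"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(data)

    # Binary mode returns the bytes exactly as stored.
    with open(path, "rb") as f:
        binary_result = f.read()

    # Text mode decodes to str and, by default, translates "\r\n" to "\n".
    with open(path, "r", encoding="latin-1") as f:
        text_result = f.read()
finally:
    os.remove(path)

print(repr(binary_result))
print(repr(text_result))
```

Comparing the two printed values shows which bytes text mode alters, which is a quick way to rule the mode in or out as the cause.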

Simon Forman · Aug 4, 2006

> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, as there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

No. It doesn't.

Ok, maybe it does, but I doubt this so severely that, without even
checking, I'll bet you a [virtual] beer it doesn't.

Are you opening the file in binary mode?

Ok, I did check, it doesn't.

|>> s = '\0'
|>> len(s)
1
|>> print s
\x00
|>> f = open('noway', 'wb')
|>> f.write(s)
|>> f.close()

Checking that the file is a length 1 null byte:

$ hexdump noway
0000000 0000
0000001
$ ls -l noway
-rw-r--r-- 1 sforman sforman 1 2006-08-03 23:40 noway

Now let's read it and see...

|>> f = open('noway', 'rb')
|>> s = f.read()
|>> f.close()
|>> len(s)
1
|>> print s
\x00

The problem is not with the read() method. Or, if it is, something
very very weird is going on.

If you can do the above and not get the same results I'd be interested
to know what file data you have, what OS you're using.

Peace,
~Simon

(Think about this: More people than you have tried the challenge. If
this had happened to them they'd have mentioned it too, and it would
have been fixed or at least addressed by now. Maybe.)

(Hmm, or maybe this is *part* of the challenge?)
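
The round-trip check above tests only the null byte. A sketch that extends it to every byte value, written for Python 3 rather than the 2.4 used in the thread, might look like the following. The helper name `roundtrip` and the temporary file are assumptions for illustration.

```python
import os
import tempfile

def roundtrip(data):
    """Write data in binary mode, read it back, and return what was read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# One byte of every possible value, 0x00 through 0xff.
all_bytes = bytes(range(256))
result = roundtrip(all_bytes)

# Collect the positions where the byte read back differs from the one written.
changed = [i for i in range(256) if result[i:i + 1] != all_bytes[i:i + 1]]
print(changed)
```

An empty list means binary-mode read() and write() preserved every value, which points the investigation away from read() itself.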

John Machin · Aug 4, 2006

> This is a bit of a peculiar problem. First off, this relates to Python
> Challenge #12, so if you are attempting those and have yet to finish
> #12, as there are potential spoilers here.
>
> I have five different image files shuffled up in one big binary file.
> In order to view them I have to "unshuffle" the data, which means
> moving bytes around. Currently my approach is to read the data from the
> original, unshuffle as necessary, and then write to 5 different files
> (2 .jpgs, 2 .pngs and 1 .gif).
>
> The problem is with the read() method. If I read a byte valued as 0x00
> (in hexadecimal), the read method returns a character with the value
> 0x20.

I doubt it. What platform? What version of Python? Have you opened the
file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
parts of your code, plus what caused you to conclude that read()
changed data on the fly in an undocumented fashion.

> When printed as strings, these two values look the same (null and
> space, respectively),

Use the repr() function when you want to see what's *really* in an
object:

#>>> hasnul = 'a\x00b'
#>>> hasspace = 'a\x20b'
#>>> print hasnul, hasspace
a b a b
#>>> print repr(hasnul), repr(hasspace)
'a\x00b' 'a b'
#>>>

> but obviously this screws with the data and makes
> the resulting image file unreadable. I can add a simple if statement to
> correct this, which seems to make the .jpgs readable, but the .pngs
> still have errors and the .gif is corrupted, which makes me wonder if
> the read method is not doing this to other bytes as well.
>
> Now, the *really* peculiar thing is that I made a simple little file
> and used my hex editor to manually change the first byte to 0x00. When
> I read that byte with the read() method, it returned the correct value,
> which boggles me.
>
> Anyone have any idea what could be going on? Alternatively, is there a
> better way to shift about bytes in a non-text file without using the
> read() method (since returning the byte as a string seems to be what's
> causing the issue)?

"seems to be" != "is"

There is no simple better way. We need to establish what you are
actually doing to cause this problem to seem to happen. Kindly answer
the questions above ;-)

Cheers,
John
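
The repr() advice carries over to Python 3, where binary reads return bytes objects. The sketch below mirrors the example above and adds a small helper for locating the first mismatch between two byte strings; the helper name `first_difference` is an assumption, not something from the thread.

```python
has_nul = b"a\x00b"
has_space = b"a\x20b"

# repr() shows an escape for the null byte, so the two values are distinguishable.
print(repr(has_nul))
print(repr(has_space))

# bytes.hex() gives a dump that can be compared directly with a hex editor.
print(has_nul.hex(), has_space.hex())

def first_difference(a, b):
    """Return the index of the first differing byte, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One input is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None

print(first_difference(has_nul, has_space))
```

Running `first_difference` on the data read by the program and on the data the hex editor shows would pin down exactly where the two start to disagree.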

smeenehan · Aug 4, 2006

> What platform? What version of Python? Have you opened the
> file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
> parts of your code, plus what caused you to conclude that read()
> changed data on the fly in an undocumented fashion.

Yes, I've been reading and writing everything in binary mode. I'm using
version 2.4 on a Windows XP machine.

Here is the code that I have been using to split up the original file:

f = open('evil2.gfx','rb')
i1 = open('img1.jpg','wb')
i2 = open('img2.png','wb')
i3 = open('img3.gif','wb')
i4 = open('img4.png','wb')
i5 = open('img5.jpg','wb')

for i in range(0,67575,5):
    i1.write(f.read(1))
    i2.write(f.read(1))
    i3.write(f.read(1))
    i4.write(f.read(1))
    i5.write(f.read(1))

f.close()
i1.close()
i2.close()
i3.close()
i4.close()
i5.close()

I first noticed the problem by looking at the original file and
img1.jpg side by side with a hex editor. Since img1 contains every 5th
byte from the original file, I was able to find many places where \x00
should have been copied to img1.jpg, but instead a \x20 was copied.
What caused me to suspect the read method was the following:
print repr(s[19:22])
'\xe0 \r'

Now, I have checked many times with a hex editor that the 21st byte of
the file is \x00, yet above you can see that it is reading it as a
space. I've repeated this with several different nulls in the original
file and the result is always the same.

As I said in my original post, when I try simply writing a null to my
own file and reading it (as someone mentioned earlier) everything is
fine. It seems to be only this file that is causing the issue.
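
The byte-at-a-time loop above can also be expressed with slicing, which avoids interleaving five write() calls and makes the result easy to check in memory. This is a sketch for Python 3 under the assumption that the whole file fits in memory; `deinterleave` and `split_file` are hypothetical helper names, and the sample data is made up.

```python
def deinterleave(data, n):
    """Split data into n streams; stream k gets bytes k, k+n, k+2n, ..."""
    return [data[k::n] for k in range(n)]

def split_file(src_path, out_paths):
    """Read src_path in binary mode and write one stream per output path."""
    with open(src_path, "rb") as f:
        data = f.read()
    for path, part in zip(out_paths, deinterleave(data, len(out_paths))):
        with open(path, "wb") as out:
            out.write(part)

# Hypothetical sample data standing in for the real file contents.
sample = b"0123456789"
parts = deinterleave(sample, 5)
print(parts)
```

Unlike the loop with its hard-coded length of 67575, slicing uses however many bytes the file actually contains. With the file names from the post, the call would be `split_file('evil2.gfx', ['img1.jpg', 'img2.png', 'img3.gif', 'img4.png', 'img5.jpg'])`.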

smeenehan · Aug 4, 2006

Ok, now I'm very confused, even though I just solved my problem. I
copied the entire contents of the original file (evil2.gfx) from my hex
editor and pasted it into a text file. When I read from *this* file
using my original code, everything worked fine. When I read the 21st
byte, it came up as the correct \x00. Why this didn't work in trying to
read from the original file, I don't know, since the hex values should
be the same, but oh well...

Roel Schroeven · Aug 4, 2006

(e-mail address removed) wrote:

> f = open('evil2.gfx','rb')
> i1 = open('img1.jpg','wb')
> i2 = open('img2.png','wb')
> i3 = open('img3.gif','wb')
> i4 = open('img4.png','wb')
> i5 = open('img5.jpg','wb')
>
> for i in range(0,67575,5):
>     i1.write(f.read(1))
>     i2.write(f.read(1))
>     i3.write(f.read(1))
>     i4.write(f.read(1))
>     i5.write(f.read(1))
>
> f.close()
> i1.close()
> i2.close()
> i3.close()
> i4.close()
> i5.close()
>
> I first noticed the problem by looking at the original file and
> img1.jpg side by side with a hex editor. Since img1 contains every 5th
> byte from the original file, I was able to find many places where \x00
> should have been copied to img1.jpg, but instead a \x20 was copied.
> What caused me to suspect the read method was the following:
> print repr(s[19:22])
> '\xe0 \r'
>
> Now, I have checked many times with a hex editor that the 21st byte of
> the file is \x00, yet above you can see that it is reading it as a
> space. I've repeated this with several different nulls in the original
> file and the result is always the same.
>
> As I said in my original post, when I try simply writing a null to my
> own file and reading it (as someone mentioned earlier) everything is
> fine. It seems to be only this file which is causing issue.

Very weird. I tried your code on my system (Python 2.4, Windows XP) (but
using a copy of evil2.gfx I still had on my system), with no problems.

Are you sure that you don't have 2 copies of that file around, and that
your program is using the wrong one? Or is it possible that some module
imported with 'from blabla import *' clashes with the builtin open()?
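
Both suspicions can be checked from inside the session that misbehaves. The following is a rough sketch for Python 3 (where the module is named `builtins`; the thread's Python 2.4 differs); `describe_open` is a hypothetical helper written for illustration.

```python
import builtins
import os

def describe_open(open_func, path):
    """Report whether open_func is the builtin and which file path resolves to."""
    return {
        "is_builtin_open": open_func is builtins.open,
        "absolute_path": os.path.abspath(path),
        "exists": os.path.exists(path),
    }

# Pass whatever the name `open` currently refers to in this session.
info = describe_open(open, "evil2.gfx")
print(info)
```

If `is_builtin_open` is False, some import has replaced open(). If `absolute_path` is not the file inspected in the hex editor, the program is reading a different copy.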

smeenehan · Aug 4, 2006

Well, now I tried running the script and it worked fine with the .gfx
file. Originally I was working using the IDLE, which I wouldn't have
thought would make a difference, but when I ran the script on its own
it worked fine and when I ran it in the IDLE it didn't work unless the
data was in a text file. Weird.

Roel Schroeven · Aug 4, 2006

(e-mail address removed) wrote:

> Well, now I tried running the script and it worked fine with the .gfx
> file. Originally I was working using the IDLE, which I wouldn't have
> thought would make a difference, but when I ran the script on its own
> it worked fine and when I ran it in the IDLE it didn't work unless the
> data was in a text file. Weird.

Weird indeed: I ran the script under IDLE too...
