Problem reading/writing files

Discussion in 'Python' started by smeenehan@hmc.edu, Aug 4, 2006.

  1. Guest

    This is a bit of a peculiar problem. First off, this relates to Python
    Challenge #12, so if you are attempting those and have yet to finish
    #12, be warned that there are potential spoilers here.

    I have five different image files shuffled up in one big binary file.
    In order to view them I have to "unshuffle" the data, which means
    moving bytes around. Currently my approach is to read the data from the
    original, unshuffle as necessary, and then write to 5 different files
    (2 .jpgs, 2 .pngs and 1 .gif).

    The problem is with the read() method. If I read a byte valued as 0x00
    (in hexadecimal), the read method returns a character with the value
    0x20. When printed as strings, these two values look the same (null and
    space, respectively), but obviously this screws with the data and makes
    the resulting image file unreadable. I can add a simple if statement to
    correct this, which seems to make the .jpgs readable, but the .pngs
    still have errors and the .gif is corrupted, which makes me wonder if
    the read method is not doing this to other bytes as well.

    Now, the *really* peculiar thing is that I made a simple little file
    and used my hex editor to manually change the first byte to 0x00. When
    I read that byte with the read() method, it returned the correct value,
    which boggles me.

    Anyone have any idea what could be going on? Alternatively, is there a
    better way to shift about bytes in a non-text file without using the
    read() method (since returning the byte as a string seems to be what's
    causing the issue)? Thanks in advance!
     
    , Aug 4, 2006
    #1

  2. faulkner Guest

    have you been using text mode?
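
    To illustrate why the mode matters, here is a minimal sketch. It is
    written for Python 3 rather than the Python 2.4 used in this thread,
    and the sample bytes and temporary file are made up for the
    demonstration. Binary mode returns the bytes exactly as stored, while
    text mode applies newline translation.

```python
import os
import tempfile

# Hypothetical sample data: a CRLF pair and a null byte, as might occur in an image.
payload = b"a\r\nb\x00"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode: the bytes come back exactly as they were written.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode with default newline handling: "\r\n" is translated to "\n".
    with open(path, "r", encoding="latin-1") as f:
        text_data = f.read()
finally:
    os.remove(path)

print(repr(binary_data))
print(repr(text_data))
```

    Note that in this sketch the null byte survives in both modes; only
    the line ending changes. Text mode alone would therefore not explain
    a null turning into a space.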

     
    faulkner, Aug 4, 2006
    #2

  3. Simon Forman Guest

    wrote:
    > This is a bit of a peculiar problem. First off, this relates to Python
    > Challenge #12, so if you are attempting those and have yet to finish
    > #12, be warned that there are potential spoilers here.
    >
    > I have five different image files shuffled up in one big binary file.
    > In order to view them I have to "unshuffle" the data, which means
    > moving bytes around. Currently my approach is to read the data from the
    > original, unshuffle as necessary, and then write to 5 different files
    > (2 .jpgs, 2 .pngs and 1 .gif).
    >
    > The problem is with the read() method. If I read a byte valued as 0x00
    > (in hexadecimal), the read method returns a character with the value
    > 0x20.


    No. It doesn't.

    Ok, maybe it does, but I doubt this so severely that, without even
    checking, I'll bet you a [virtual] beer it doesn't. :)

    Are you opening the file in binary mode?


    Ok, I did check, it doesn't.

    >>> s = '\0'
    >>> len(s)
    1
    >>> print s
    \x00
    >>> f = open('noway', 'wb')
    >>> f.write(s)
    >>> f.close()

    Checking that the file is a length 1 null byte:

    $ hexdump noway
    0000000 0000
    0000001
    $ ls -l noway
    -rw-r--r-- 1 sforman sforman 1 2006-08-03 23:40 noway

    Now let's read it and see...

    >>> f = open('noway', 'rb')
    >>> s = f.read()
    >>> f.close()
    >>> len(s)
    1
    >>> print s
    \x00

    The problem is not with the read() method. Or, if it is, something
    very very weird is going on.

    If you can do the above and not get the same results, I'd be interested
    to know what file data you have and what OS you're using.
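
    The same check can be widened to every possible byte value, which
    also covers the worry that read() might be altering bytes other than
    null. A minimal sketch, assuming Python 3 (the transcript above is
    Python 2) and a throwaway temporary file:

```python
import os
import tempfile

# One of each possible byte value, 0x00 through 0xff.
all_bytes = bytes(range(256))

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(all_bytes)
    with open(path, "rb") as f:
        round_tripped = f.read()
finally:
    os.remove(path)

# List any byte values that came back different from what was written.
changed = [i for i, (a, b) in enumerate(zip(all_bytes, round_tripped)) if a != b]
print("length:", len(round_tripped), "changed values:", changed)
```

    If the list of changed values is not empty on a given system, that
    would point at the environment rather than at the challenge file.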

    Peace,
    ~Simon

    (Think about this: More people than you have tried the challenge. If
    this had happened to them, they'd have mentioned it too, and it would
    have been fixed or at least addressed by now. Maybe.)

    (Hmm, or maybe this is *part* of the challenge?)
     
    Simon Forman, Aug 4, 2006
    #3
  4. John Machin Guest

    wrote:
    > This is a bit of a peculiar problem. First off, this relates to Python
    > Challenge #12, so if you are attempting those and have yet to finish
    > #12, be warned that there are potential spoilers here.
    >
    > I have five different image files shuffled up in one big binary file.
    > In order to view them I have to "unshuffle" the data, which means
    > moving bytes around. Currently my approach is to read the data from the
    > original, unshuffle as necessary, and then write to 5 different files
    > (2 .jpgs, 2 .pngs and 1 .gif).
    >
    > The problem is with the read() method. If I read a byte valued as 0x00
    > (in hexadecimal), the read method returns a character with the value
    > 0x20.


    I doubt it. What platform? What version of Python? Have you opened the
    file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
    parts of your code, plus what caused you to conclude that read()
    changed data on the fly in an undocumented fashion.

    > When printed as strings, these two values look the same (null and
    > space, respectively),


    Use the repr() function when you want to see what's *really* in an
    object:

    >>> hasnul = 'a\x00b'
    >>> hasspace = 'a\x20b'
    >>> print hasnul, hasspace
    a b a b
    >>> print repr(hasnul), repr(hasspace)
    'a\x00b' 'a b'
    >>>


    > but obviously this screws with the data and makes
    > the resulting image file unreadable. I can add a simple if statement to
    > correct this, which seems to make the .jpgs readable, but the .pngs
    > still have errors and the .gif is corrupted, which makes me wonder if
    > the read method is not doing this to other bytes as well.
    >
    > Now, the *really* peculiar thing is that I made a simple little file
    > and used my hex editor to manually change the first byte to 0x00. When
    > I read that byte with the read() method, it returned the correct value,
    > which boggles me.
    >
    > Anyone have any idea what could be going on? Alternatively, is there a
    > better way to shift about bytes in a non-text file without using the
    > read() method (since returning the byte as a string seems to be what's
    > causing the issue)?


    "seems to be" != "is" :)

    There is no simple better way. We need to establish what you are
    actually doing to cause this problem to seem to happen. Kindly answer
    the questions above ;-)

    Cheers,
    John
     
    John Machin, Aug 4, 2006
    #4
  5. Guest

    > What platform? What version of Python? Have you opened the
    > file in binary mode i.e. open('thefile', 'rb') ?? Show us the relevant
    > parts of your code, plus what caused you to conclude that read()
    > changed data on the fly in an undocumented fashion.


    Yes, I've been reading and writing everything in binary mode. I'm using
    version 2.4 on a Windows XP machine.

    Here is the code that I have been using to split up the original file:

    f = open('evil2.gfx','rb')
    i1 = open('img1.jpg','wb')
    i2 = open('img2.png','wb')
    i3 = open('img3.gif','wb')
    i4 = open('img4.png','wb')
    i5 = open('img5.jpg','wb')


    for i in range(0,67575,5):
        i1.write(f.read(1))
        i2.write(f.read(1))
        i3.write(f.read(1))
        i4.write(f.read(1))
        i5.write(f.read(1))

    f.close()
    i1.close()
    i2.close()
    i3.close()
    i4.close()
    i5.close()
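
    The interleaving loop above can also be expressed with slicing, which
    avoids hard-coding the file size. This is only a sketch, written for
    Python 3 rather than the 2.4 used here; unshuffle is a hypothetical
    helper name, and the sample data is made up.

```python
def unshuffle(data, parts=5):
    """Split interleaved bytes into `parts` streams.

    Stream k receives bytes k, k + parts, k + 2 * parts, and so on.
    """
    return [data[k::parts] for k in range(parts)]


# Made-up sample: 20 bytes with values 0..19.
sample = bytes(range(20))
streams = unshuffle(sample)
print([list(s) for s in streams])

# Applied to the file from this thread it might look like this
# (file names taken from the code above, left commented out here):
# with open('evil2.gfx', 'rb') as f:
#     data = f.read()
# names = ['img1.jpg', 'img2.png', 'img3.gif', 'img4.png', 'img5.jpg']
# for name, stream in zip(names, unshuffle(data)):
#     with open(name, 'wb') as out:
#         out.write(stream)
```

    Reading the whole file once and slicing it keeps the read and the
    five writes separate, which makes it easier to inspect the data
    before anything is written.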

    I first noticed the problem by looking at the original file and
    img1.jpg side by side with a hex editor. Since img1 contains every 5th
    byte from the original file, I was able to find many places where \x00
    should have been copied to img1.jpg, but instead a \x20 was copied.
    What caused me to suspect the read method was the following:

    >>> f = open('evil2.gfx','rb')
    >>> s = f.read()
    >>> print repr(s[19:22])
    '\xe0 \r'

    Now, I have checked many times with a hex editor that the 21st byte of
    the file is \x00, yet above you can see that it is reading it as a
    space. I've repeated this with several different nulls in the original
    file and the result is always the same.

    As I said in my original post, when I try simply writing a null to my
    own file and reading it (as someone mentioned earlier) everything is
    fine. It seems to be only this file which is causing the issue.
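
    For comparing against a hex editor, it can help to print the bytes
    as hex with their offsets instead of relying on repr(). A small
    sketch, assuming Python 3; hex_window is a hypothetical helper, and
    the sample bytes are a stand-in for the real file, built so that
    offset 20 (the 21st byte) is a null.

```python
def hex_window(data, start, stop):
    """Return (offset, two-digit hex) pairs for data[start:stop]."""
    return [(i, "%02x" % data[i]) for i in range(start, min(stop, len(data)))]


# Stand-in data: 19 filler bytes, then 0xe0, 0x00, 0x0d at offsets 19, 20, 21.
sample = b"\xff" * 19 + b"\xe0\x00\r"

for offset, value in hex_window(sample, 19, 22):
    print(offset, value)
```

    With the real file, a 20 instead of a 00 at offset 20 would confirm
    that the data Python sees differs from what the hex editor shows.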
     
    , Aug 4, 2006
    #5
  6. Guest

    Ok, now I'm very confused, even though I just solved my problem. I
    copied the entire contents of the original file (evil2.gfx) from my hex
    editor and pasted it into a text file. When I read from *this* file
    using my original code, everything worked fine. When I read the 21st
    byte, it came up as the correct \x00. Why this didn't work in trying to
    read from the original file, I don't know, since the hex values should
    be the same, but oh well...
     
    , Aug 4, 2006
    #6
  7. Roel Schroeven

    wrote:
    > f = open('evil2.gfx','rb')
    > i1 = open('img1.jpg','wb')
    > i2 = open('img2.png','wb')
    > i3 = open('img3.gif','wb')
    > i4 = open('img4.png','wb')
    > i5 = open('img5.jpg','wb')
    >
    >
    > for i in range(0,67575,5):
    >     i1.write(f.read(1))
    >     i2.write(f.read(1))
    >     i3.write(f.read(1))
    >     i4.write(f.read(1))
    >     i5.write(f.read(1))
    >
    > f.close()
    > i1.close()
    > i2.close()
    > i3.close()
    > i4.close()
    > i5.close()
    >
    > I first noticed the problem by looking at the original file and
    > img1.jpg side by side with a hex editor. Since img1 contains every 5th
    > byte from the original file, I was able to find many places where \x00
    > should have been copied to img1.jpg, but instead a \x20 was copied.
    > What caused me to suspect the read method was the following:
    >
    > >>> f = open('evil2.gfx','rb')
    > >>> s = f.read()
    > >>> print repr(s[19:22])
    > '\xe0 \r'
    >
    > Now, I have checked many times with a hex editor that the 21st byte of
    > the file is \x00, yet above you can see that it is reading it as a
    > space. I've repeated this with several different nulls in the original
    > file and the result is always the same.
    >
    > As I said in my original post, when I try simply writing a null to my
    > own file and reading it (as someone mentioned earlier) everything is
    > fine. It seems to be only this file which is causing the issue.


    Very weird. I tried your code on my system (Python 2.4, Windows XP),
    using a copy of evil2.gfx I still had on my system, and had no problems.

    Are you sure that you don't have 2 copies of that file around, and that
    your program is using the wrong one? Or is it possible that some module
    imported with 'from blabla import *' clashes with the builtin open()?
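
    Both possibilities can be checked from inside the session. A rough
    sketch, assuming Python 3 (where the builtins module is named
    builtins rather than __builtin__); describe_file and open_is_builtin
    are hypothetical helper names, and SHA-256 is just one convenient
    way to fingerprint the contents.

```python
import builtins
import hashlib
import io
import os


def describe_file(path):
    """Return the absolute path, size, and SHA-256 digest of the file at `path`."""
    # io.open is used so the check still works if `open` has been shadowed.
    with io.open(path, "rb") as f:
        data = f.read()
    return os.path.abspath(path), len(data), hashlib.sha256(data).hexdigest()


def open_is_builtin(namespace):
    """True if `open` in `namespace` is absent or is still the builtin open()."""
    return namespace.get("open", builtins.open) is builtins.open


print(open_is_builtin(globals()))
# With the file from this thread (left commented out here):
# print(describe_file('evil2.gfx'))
```

    Running describe_file in both environments and comparing the
    absolute path and digest would show whether the same file is being
    read in each case.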

    --
    If I have been able to see further, it was only because I stood
    on the shoulders of giants. -- Isaac Newton

    Roel Schroeven
     
    Roel Schroeven, Aug 4, 2006
    #7
  8. Guest

    Well, now I tried running the script and it worked fine with the .gfx
    file. Originally I was working using the IDLE, which I wouldn't have
    thought would make a difference, but when I ran the script on its own
    it worked fine and when I ran it in the IDLE it didn't work unless the
    data was in a text file. Weird.
     
    , Aug 4, 2006
    #8
  9. Roel Schroeven

    wrote:
    > Well, now I tried running the script and it worked fine with the .gfx
    > file. Originally I was working using the IDLE, which I wouldn't have
    > thought would make a difference, but when I ran the script on its own
    > it worked fine and when I ran it in the IDLE it didn't work unless the
    > data was in a text file. Weird.


    Weird indeed: I ran the script under IDLE too...


     
    Roel Schroeven, Aug 4, 2006
    #9