Reading the first MB of a binary file

Discussion in 'Python' started by Max Leason, Jan 25, 2009.

  1. Max Leason

    Max Leason Guest

    Hi,

    I'm attempting to read the first MB of a binary file and then compute an
    MD5 hash of it so that I can find the file later, even if it has been
    moved or renamed. These files are large (350-1400 MB) video files and
    are often located on a different computer, and I figure that there is a
    low risk of two files generating the same hash. The problem is that the
    read call returns all \x00s. Any ideas why this is happening?

    Code:
    >>> open("Chuck.S01E01.HDTV.XViD-YesTV.avi", "rb").read(1024)
    b'\x00\x00\x00\x00\x00\x00....\x00'
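For reference, here is a minimal sketch of hashing the first megabyte as described above. It assumes "MB" means one MiB (1024 ** 2 bytes) and uses a hypothetical helper name, `first_mb_md5`; it is not code from the original post.

```python
import hashlib

# Assumption: "the first MB" means one mebibyte (1024 ** 2 bytes).
CHUNK_SIZE = 1024 ** 2


def first_mb_md5(path, size=CHUNK_SIZE):
    """Return the hex MD5 digest of the first `size` bytes of the file at `path`.

    Hypothetical helper: a file shorter than `size` is hashed in full.
    """
    with open(path, "rb") as f:  # binary mode; the file is closed automatically
        data = f.read(size)  # returns fewer bytes if the file is shorter
    return hashlib.md5(data).hexdigest()
```

Usage would be `first_mb_md5("Chuck.S01E01.HDTV.XViD-YesTV.avi")`. Note that `read(1024)` returns at most 1024 bytes (1 KiB), so the call in the post only ever looks at the first kilobyte.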
     
    Max Leason, Jan 25, 2009

  2. MRAB

    MRAB Guest

    Max Leason wrote:
    > Code:
    > >>> open("Chuck.S01E01.HDTV.XViD-YesTV.avi", "rb").read(1024)
    > b'\x00\x00\x00\x00\x00\x00....\x00'

    You're reading only the first 1024 bytes (1 KB), not the first MB.
    Perhaps the first 1024 bytes of the file _are_ all zero!

    Try reading more and checking those, e.g.:

    >>> SIZE = 1024 ** 2
    >>> open("Chuck.S01E01.HDTV.XViD-YesTV.avi", "rb").read(SIZE) == b'\x00' * SIZE
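To see how far the zero bytes actually extend, a chunked scan like the following sketch could be used. `first_nonzero_offset` is a hypothetical helper and the 64 KiB chunk size is an arbitrary assumption; reading in chunks avoids loading a large video file into memory at once.

```python
def first_nonzero_offset(path, chunk_size=64 * 1024):
    """Return the offset of the first non-zero byte, or None if there is none.

    Hypothetical diagnostic helper; an empty file also returns None.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # EOF reached without finding a non-zero byte
                return None
            stripped = chunk.lstrip(b"\x00")  # drop this chunk's leading zeros
            if stripped:
                # the bytes removed by lstrip were all zeros
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)
```

A result of `None`, or an offset beyond 1024 ** 2, would confirm that the whole first MB really is zero-filled.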
     
    MRAB, Jan 25, 2009

  3. On Sun, 25 Jan 2009 08:37:07 -0800, Max Leason wrote:

    > Code:
    > >>> open("Chuck.S01E01.HDTV.XViD-YesTV.avi", "rb").read(1024)
    > b'\x00\x00\x00\x00\x00\x00....\x00'


    As MRAB says, maybe the first 1024 bytes actually *are* all zero. Wild
    guess: the file was created by a BitTorrent client that preallocates
    files, and it hasn't finished downloading yet?
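One way to test that guess is to sample chunks spread across the whole file and see which ones are entirely zero. This is only a heuristic sketch: `zero_filled_samples`, the sample count and the chunk size are assumptions, and genuine video files can contain runs of zeros too.

```python
import os


def zero_filled_samples(path, samples=8, chunk_size=64 * 1024):
    """Return the start offsets of sampled chunks that contain only zero bytes.

    Hypothetical heuristic: `samples` chunks are read at evenly spaced
    offsets, the last one ending at EOF. Zero-filled regions are consistent
    with (but do not prove) a preallocated, partially downloaded file.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []
    last_start = max(size - chunk_size, 0)
    steps = max(samples - 1, 1)
    # evenly spaced offsets from 0 to last_start, duplicates removed
    offsets = sorted({last_start * i // steps for i in range(samples)})
    zero_offsets = []
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            chunk = f.read(chunk_size)
            if chunk and not chunk.strip(b"\x00"):  # nothing left => all zeros
                zero_offsets.append(off)
    return zero_offsets
```

If many samples come back zero-filled, the first-MB hash is also unreliable as an identifier: every file whose first MiB is all zero bytes produces the same MD5 digest.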

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Jan 25, 2009

Similar Threads
  1. matt (Replies: 9, Views: 467; Andrew Thompson, Oct 27, 2004)
  2. rvr (Replies: 11, Views: 809; Alex Popescu, Jul 11, 2007)
  3. Ron Eggler: writing binary file (ios::binary), Apr 25, 2008, in forum: C++ (Replies: 9, Views: 966; James Kanze, Apr 28, 2008)
  4. Guest (Replies: 6, Views: 1,775; Guest, Apr 25, 2010)
  5. Richard Schneeman (Replies: 16, Views: 552; Daniel Bush, Aug 27, 2008)