find and replace string in binary file

Discussion in 'Python' started by loial, Mar 4, 2014.

  1. loial

    loial Guest

    How do I read a binary file, find/identify a character string and replace it with another character string and write out to another file?

    It's the finding of the string in a binary file that I am not clear on.

    Any help appreciated
     
    loial, Mar 4, 2014
    #1

  2. loial

    MRAB Guest

    Read it in chunks and search each chunk (the chunks should be at least
    as long as the search string).

    You should note that the string you're looking for could be split
    across 2 chunks, so when writing the code make sure that you include
    some overlap between adjacent chunks (it's best if the overlap is at
    least N-1 characters, where N is the length of the search string).
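
A minimal sketch of the chunk-plus-overlap idea, in case it helps. The function name `replace_in_stream`, its parameters and the sample bytes are made up for illustration. It holds back the last N-1 bytes of each buffer so that a match straddling two chunks is still found, and it should give the same result as a plain `replace` on the whole file. It uses only bytes literals and ordinary file methods, so it is expected to run on Python 2.6 as well as Python 3, though it has not been tuned for speed.

```python
import io


def replace_in_stream(infile, outfile, search, replace, chunk_size=64 * 1024):
    """Copy infile to outfile, replacing search with replace; return the match count."""
    if not search:
        raise ValueError("search must not be empty")
    keep = len(search) - 1  # overlap carried into the next chunk (N-1 bytes)
    buf = b""
    count = 0
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pieces = []
        pos = 0
        while True:
            hit = buf.find(search, pos)
            if hit == -1:
                break
            pieces.append(buf[pos:hit])
            pieces.append(replace)
            pos = hit + len(search)
            count += 1
        # No complete match is left after pos. Hold back the last N-1 bytes,
        # since they might be the start of a match that ends in the next chunk.
        tail_start = max(pos, len(buf) - keep)
        pieces.append(buf[pos:tail_start])
        outfile.write(b"".join(pieces))
        buf = buf[tail_start:]
    outfile.write(buf)  # whatever was still held back at end of file
    return count


# Usage with in-memory files and a deliberately tiny chunk size,
# so that every match is split across chunks:
source = io.BytesIO(b"\x00abc\x01abcab\xffcabc")
dest = io.BytesIO()
hits = replace_in_stream(source, dest, b"abc", b"XY", chunk_size=2)
```

With real files, open the input with 'rb' and the output with 'wb' and pass the two file objects in place of the `BytesIO` ones.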
     
    MRAB, Mar 4, 2014
    #2

  3. loial

    Peter Otten Guest

    That's not possible. You have to convert either binary to string or string
    to binary before you can replace. Whatever you choose, you have to know the
    encoding of the file. Consider

    # Python 3
    ENCODING = "iso-8859-1"
    with open(source, encoding=ENCODING) as infile:
        data = infile.read()
    with open(dest, "w", encoding=ENCODING) as outfile:
        outfile.write(data.replace("nötig", "möglich"))

    If the file is indeed iso-8859-1 this will replace occurrences of the bytes

    b'n\xf6tig' with b'm\xf6glich'

    But if you guessed wrong and the file is actually utf-8, it may contain the
    bytes b'n\xc3\xb6tig' instead, which your script incorrectly interprets as
    'nÃ¶tig' and thus leaves as is.
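
The mismatch can be reproduced in memory without touching any file. This is only an illustrative sketch; the variable names are invented, and the non-ASCII characters are written as escapes so the snippet does not depend on the encoding of the source file itself.

```python
word = u"n\xf6tig"          # the search string, with o-umlaut as U+00F6
new_word = u"m\xf6glich"    # the replacement string

latin1_bytes = word.encode("iso-8859-1")  # b'n\xf6tig'
utf8_bytes = word.encode("utf-8")         # b'n\xc3\xb6tig'

# Right guess: the bytes decode back to the word, so replace() finds it.
good = latin1_bytes.decode("iso-8859-1").replace(word, new_word)

# Wrong guess: every byte is a valid iso-8859-1 character, so no error is
# raised, but the two utf-8 bytes turn into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")  # u'n\xc3\xb6tig'
bad = misread.replace(word, new_word)      # nothing matches, nothing changes
```

Note that the wrong guess fails silently: no exception is raised, and the output file is simply a copy of the input.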
     
    Peter Otten, Mar 4, 2014
    #3
  4. If it's actually a binary file (as in, an executable, or an image, or
    something), then the *file* won't have an encoding, so you'll need to
    know the encoding of the particular string you want and encode your
    string to bytes.

    ChrisA
     
    Chris Angelico, Mar 4, 2014
    #4
  5. loial

    emile Guest


    On 2.7 it's as easy as it sounds without having to think much about
    encodings and such. I find it mostly just works.

    emile@paj39:~$ which python
    /usr/bin/python
    emile@paj39:~$ python
    Python 2.7.3 (default, Sep 26 2013, 16:38:10)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    emile@paj39:~$ chmod a+x /home/emile/pyecho
    emile@paj39:~$ /home/emile/pyecho
    Python 2.7.3 (default, Sep 26 2013, 16:38:10)
    [GCC 4.7.2] on linux2
    Echo "help", "copyright", "credits" or "license" for more information.

    YMMV,

    Emile
     
    emile, Mar 5, 2014
    #5
  6. loial

    loial Guest

    Thanks Emile.

    Unfortunately I have to use Python 2.6 for this.


     
    loial, Mar 5, 2014
    #6
  7. loial

    Dave Angel Guest

    I see from another message that you're using Python 2.6. That
    makes a huge difference and should have been in your query, along
    with a minimal code sample.

    Is the binary file under 100 MB or so? Then open it (in binary
    mode 'rb'), and read it. You'll now have a (large) byte string
    containing the entire file.

    The next question is whether you're sure that your search and
    replace strings are ASCII. Assuming that is probably a mistake,
    but it will get you started.

    Now the substitution is trivial:
        new_bytes = old_bytes.replace(search, replace)
    It's also possible to emulate that with find and slice, mainly if
    you need to report progress to the user.
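
One way the find-and-slice variant might look, as a rough sketch. The function name `replace_with_progress` and the `report` callback are hypothetical; the callback is simply called with the offset of each match and the total length, and what it does with them (print, update a progress bar) is up to the caller.

```python
def replace_with_progress(old_bytes, search, replace, report=None):
    """Emulate old_bytes.replace(search, replace) using find() and slicing.

    Returns (new_bytes, number_of_replacements).
    """
    if not search:
        raise ValueError("search must not be empty")
    pieces = []
    pos = 0
    count = 0
    while True:
        hit = old_bytes.find(search, pos)
        if hit == -1:
            break
        pieces.append(old_bytes[pos:hit])  # untouched bytes before the match
        pieces.append(replace)
        pos = hit + len(search)            # continue after the match
        count += 1
        if report is not None:
            report(hit, len(old_bytes))
    pieces.append(old_bytes[pos:])         # everything after the last match
    return b"".join(pieces), count


# Usage: collect the offsets instead of printing them.
offsets = []
new_bytes, count = replace_with_progress(
    b"spam eggs spam ham spam", b"spam", b"tofu",
    report=lambda offset, total: offsets.append(offset))
```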

    If the search and/or replace strings are not ASCII, you have to
    know what encoding the file may have used for them. You need to
    build a Unicode string, encode it the same way as the file uses,
    and then call the replace method.
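
Put together, the whole-file approach could look roughly like this. It is a sketch under the assumptions above: the file fits in memory, and you know which encoding was used for the strings stored in it. The function name `replace_in_file`, the file names and the sample bytes are invented for the example, and iso-8859-1 is just a stand-in for whatever encoding applies.

```python
import os
import shutil
import tempfile


def replace_in_file(source, dest, search_text, replace_text, encoding):
    """Read source as bytes, replace the encoded text, write dest; return match count."""
    search = search_text.encode(encoding)
    replace = replace_text.encode(encoding)
    with open(source, "rb") as infile:    # binary mode: no decoding, no newline translation
        old_bytes = infile.read()
    new_bytes = old_bytes.replace(search, replace)
    with open(dest, "wb") as outfile:
        outfile.write(new_bytes)
    return old_bytes.count(search)


# Usage on a throwaway file in a temporary directory:
workdir = tempfile.mkdtemp()
try:
    source = os.path.join(workdir, "in.bin")
    dest = os.path.join(workdir, "out.bin")
    with open(source, "wb") as f:
        f.write(b"\x00\x01n\xf6tig\x00\xff")
    hits = replace_in_file(source, dest, u"n\xf6tig", u"m\xf6glich", "iso-8859-1")
    with open(dest, "rb") as f:
        result = f.read()
finally:
    shutil.rmtree(workdir)
```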

    Now for a huge caveat. If you don't know the binary format,
    you're risking the creation of pure junk. Here are just two
    examples of what might go wrong, assuming the file is an
    executable. The same risks exist for other files, but I'm just
    supposing.

    If the two byte strings are not the same length, then all the
    remaining code and data in the file will be moved to a new spot.
    If you're lucky, the code will crash quickly, since all
    pointers referencing that code and data are incorrect.
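
A common way around the length problem is to refuse any replacement that is longer than the original and to pad a shorter one, so that nothing after it moves. The sketch below is only that, a sketch: the name `replace_same_length` is made up, and padding with NUL bytes is an assumption that suits NUL-terminated C strings. Whether it is safe for a given file depends entirely on that file's format.

```python
def replace_same_length(data, search, replace, pad=b"\x00"):
    """Replace search with replace, padded so that the length of data is unchanged."""
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    if len(replace) > len(search):
        raise ValueError("replacement is longer than the search string")
    padded = replace + pad * (len(search) - len(replace))
    return data.replace(search, padded)


# Usage: the bytes after the string stay at the same offsets.
original = b"\x01hello world\x00\x02"
patched = replace_same_length(original, b"hello", b"hi")
```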

    If some non-textual part of the file happens to match your search
    string, you're likely to trash that portion of the code. If
    the search string is long enough, maybe this is unlikely. But
    I recall taking up the challenge of writing assembly programs that
    could be generated entirely from one or more type commands
    (MS-DOS).
     
    Dave Angel, Mar 5, 2014
    #7
  8. Mark Lawrence, Mar 5, 2014
    #8
  9. Did you try it?

    Emile
     
    Emile van Sebille, Mar 5, 2014
    #9