finding/replacing a long binary pattern in a .bin file

Discussion in 'Python' started by yaipa, Jan 13, 2005.

  1. yaipa

    yaipa Guest

    What would be the common sense way of finding a binary pattern in a
    .bin file, say some 200 bytes, and replacing it with an updated pattern
    of the same length at the same offset?

    Also, the pattern can occur on any byte boundary in the file, so
    chunking through the code at 16 bytes a frame may be a problem. The
    file itself isn't so large, maybe 32 kbytes is all and the need for
    speed is not so great, but the need for accuracy in the
    search/replacement is very important.

    Thanks,

    --Alan
    yaipa, Jan 13, 2005
    #1

  2. On 12 Jan 2005 22:36:54 -0800, yaipa <> wrote:
    > What would be the common sense way of finding a binary pattern in a
    > .bin file, say some 200 bytes, and replacing it with an updated pattern
    > of the same length at the same offset?
    >
    > Also, the pattern can occur on any byte boundary in the file, so
    > chunking through the code at 16 bytes a frame maybe a problem. The
    > file itself isn't so large, maybe 32 kbytes is all and the need for
    > speed is not so great, but the need for accuracy in the
    > search/replacement is very important.


    Okay, given the requirements.

    f = file('mybinfile')
    contents = f.read().replace(oldbinstring, newbinstring)
    f.close()
    f = file('mybinfile','w')
    f.write(contents)
    f.close()

    Will do it, and do it accurately. But it will also read the entire
    file into memory.
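For readers on Python 3, here is a minimal sketch of the same read, replace, write idea. It assumes the patterns are `bytes` objects and opens the file in binary mode both times. The function name `replace_in_file`, the same-length check, and the returned count are illustrative additions, not part of the post above.

```python
def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`.

    Returns the number of (non-overlapping) occurrences that were replaced.
    """
    if not old:
        raise ValueError("old pattern must not be empty")
    if len(old) != len(new):
        # The question asks for a same-length patch at the same offset.
        raise ValueError("old and new patterns must have the same length")
    with open(path, "rb") as f:          # binary mode: no newline translation
        contents = f.read()
    count = contents.count(old)
    if count:
        with open(path, "wb") as f:
            f.write(contents.replace(old, new))
    return count
```

Checking the returned count against the number of occurrences you expect is a cheap way to catch a pattern that is missing or not unique.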

    Stephen.
    Stephen Thorne, Jan 13, 2005
    #2

  3. On Thu, 13 Jan 2005 16:51:46 +1000, Stephen Thorne <> wrote:

    >On 12 Jan 2005 22:36:54 -0800, yaipa <> wrote:
    >> What would be the common sense way of finding a binary pattern in a
    >> .bin file, say some 200 bytes, and replacing it with an updated pattern
    >> of the same length at the same offset?
    >>
    >> Also, the pattern can occur on any byte boundary in the file, so
    >> chunking through the code at 16 bytes a frame maybe a problem. The
    >> file itself isn't so large, maybe 32 kbytes is all and the need for
    >> speed is not so great, but the need for accuracy in the
    >> search/replacement is very important.

    >
    >Okay, given the requirements.
    >
    >f = file('mybinfile')
    >contents = f.read().replace(oldbinstring, newbinstring)
    >f.close()
    >f = file('mybinfile','w')
    >f.write(contents)
    >f.close()
    >
    >Will do it, and do it accurately. But it will also read the entire
    >file into memory.
    >

    You must be on linux or such, otherwise you would have shown opening the
    _binary_ files (I assume that's what a .bin file is) with 'rb' and 'wb', IWT.

    Not sure what system the OP was/is on.

    BTW, I'm sure you could write a generator that would take a file name
    and oldbinstring and newbinstring as arguments, and read and yield nice
    os-file-system-friendly disk-sector-multiple chunks, so you could write

    fout = open('mynewbinfile', 'wb')
    for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
        fout.write(buf)
    fout.close()

    (left as an exercise ;-)
    (modifying a file "in place" is another exercise)
    (doing the latter with defined maximum memory buffer usage
    even when mods increase the length of the file is another ;-)

    Regards,
    Bengt Richter
    Bengt Richter, Jan 13, 2005
    #3
  4. [Stephen Thorne]

    > On 12 Jan 2005 22:36:54 -0800, yaipa <> wrote:
    >
    > > What would be the common sense way of finding a binary pattern in
    > > a .bin file, say some 200 bytes, and replacing it with an updated
    > > pattern of the same length at the same offset? The file itself
    > > isn't so large, maybe 32 kbytes is all and the need for speed is not
    > > so great, but the need for accuracy in the search/replacement is
    > > very important.


    > Okay, given the requirements.


    > f = file('mybinfile')
    > contents = f.read().replace(oldbinstring, newbinstring)
    > f.close()
    > f = file('mybinfile','w')
    > f.write(contents)
    > f.close()


    > Will do it, and do it accurately. But it will also read the entire
    > file into memory.


    32Kb is a small file indeed, reading it in memory is not a problem!

    People sometimes like writing long Python programs. Here is about the
    same, a bit shorter: :)

    buffer = file('mybinfile', 'rb').read().replace(oldbinstring, newbinstring)
    file('mybinfile', 'wb').write(buffer)

    --
    François Pinard http://pinard.progiciels-bpi.ca
    François Pinard, Jan 13, 2005
    #4
  5. Jeff Shannon

    Jeff Shannon Guest

    Bengt Richter wrote:

    > BTW, I'm sure you could write a generator that would take a file name
    > and oldbinstring and newbinstring as arguments, and read and yield nice
    > os-file-system-friendly disk-sector-multiple chunks, so you could write
    >
    > fout = open('mynewbinfile', 'wb')
    > for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    >     fout.write(buf)
    > fout.close()


    What happens when the bytes to be replaced are broken across a block
    boundary? ISTM that neither half would be recognized....

    I believe that this requires either reading the entire file into
    memory, to scan all at once, or else conditionally matching an
    arbitrary fragment of the end of a block against the beginning of the
    oldbinstring... Given that the file in question is only a few tens of
    kbytes, I'd think that doing it in one gulp is simpler. (For a large
    file, chunking it might be necessary, though...)

    Jeff Shannon
    Technician/Programmer
    Credit International
    Jeff Shannon, Jan 13, 2005
    #5
  6. On Thu, 13 Jan 2005 11:40:52 -0800, Jeff Shannon <> wrote:

    >Bengt Richter wrote:
    >
    >> BTW, I'm sure you could write a generator that would take a file name
    >> and oldbinstring and newbinstring as arguments, and read and yield nice
    >> os-file-system-friendly disk-sector-multiple chunks, so you could write
    >>
    >> fout = open('mynewbinfile', 'wb')
    >> for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    >>     fout.write(buf)
    >> fout.close()

    >
    >What happens when the bytes to be replaced are broken across a block
    >boundary? ISTM that neither half would be recognized....

    That was part of the exercise ;-)

    (Hint: use str.find to find unbroken oldbinstrings in current inputbuffer and buffer out
    safe changes, then when find fails, delete the safely used front of the input buffer,
    and append another chunk from the input file. Repeat until last chunk has been appended
    and find finds no more. Then buffer out the tail of the input buffer (if any) that then
    won't have an oldbinstring to change).
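The hint above can be sketched in Python 3 roughly as follows. This is one possible reading of it, not the poster's code: the name `replace_stream` and the `keep` variable are made up for illustration, and the patterns are assumed to be `bytes`. The key step is holding back the last `len(old) - 1` bytes of the buffer, since those are the only bytes that could still begin a match completed by the next chunk.

```python
def replace_stream(chunks, old, new):
    """Yield the byte stream formed by `chunks` with `old` replaced by `new`.

    Matches that straddle a chunk boundary are handled by keeping a short
    tail of the buffer until the next chunk arrives.
    """
    if not old:
        raise ValueError("old pattern must not be empty")
    keep = len(old) - 1              # longest possible partial match at the end
    buf = b""
    for chunk in chunks:
        buf += chunk
        pos = 0
        out = []
        while True:
            hit = buf.find(old, pos)
            if hit < 0:
                break
            out.append(buf[pos:hit])  # unchanged bytes before the match
            out.append(new)
            pos = hit + len(old)
        # Bytes before the final `keep` bytes can no longer start a match.
        safe_end = max(pos, len(buf) - keep)
        out.append(buf[pos:safe_end])
        buf = buf[safe_end:]          # retained tail, rescanned next time
        data = b"".join(out)
        if data:
            yield data
    if buf:
        yield buf                     # leftover tail contains no full match
```

Unlike a buffer that only grows, this version keeps at most one chunk plus `len(old) - 1` bytes in memory at a time.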

    >
    >I believe that this requires either reading the entire file into
    >memory, to scan all at once, or else conditionally matching an
    >arbitrary fragment of the end of a block against the beginning of the
    >oldbinstring... Given that the file in question is only a few tens of
    >kbytes, I'd think that doing it in one gulp is simpler. (For a large
    >file, chunking it might be necessary, though...)


    It's certainly simpler to do it in one gulp, but it's not really hard to
    do it in chunks. You just have to make sure your input buffer/chunksize is/are
    larger than oldbinstring ;-)

    Regards,
    Bengt Richter
    Bengt Richter, Jan 14, 2005
    #6
  7. On Thu, 13 Jan 2005 11:40:52 -0800, Jeff Shannon <> wrote:

    >Bengt Richter wrote:
    >
    >> BTW, I'm sure you could write a generator that would take a file name
    >> and oldbinstring and newbinstring as arguments, and read and yield nice
    >> os-file-system-friendly disk-sector-multiple chunks, so you could write
    >>
    >> fout = open('mynewbinfile', 'wb')
    >> for buf in updated_file_stream('myoldbinfile','rb', oldbinstring, newbinstring):
    >>     fout.write(buf)
    >> fout.close()

    >
    >What happens when the bytes to be replaced are broken across a block
    >boundary? ISTM that neither half would be recognized....
    >
    >I believe that this requires either reading the entire file into
    >memory, to scan all at once, or else conditionally matching an
    >arbitrary fragment of the end of a block against the beginning of the
    >oldbinstring... Given that the file in question is only a few tens of
    >kbytes, I'd think that doing it in one gulp is simpler. (For a large
    >file, chunking it might be necessary, though...)
    >

    Might as well post this, in case you're interested... warning, not very tested.
    You want to write a proper test? ;-)

    ----< sreplace.py >-------------------------------------------------
    def sreplace(sseq, old, new, retsize=4096):
        """
        iterate through sseq input string chunk sequence treating it
        as a continuous stream, replacing each substring old with new,
        and generating a sequence of retsize returned strings, except
        that the last may be shorter depending on available input.
        """
        inbuf = ''
        endsseq = False
        out = []
        start = 0
        lenold = len(old)
        lennew = len(new)
        while not endsseq:
            start, endprev = (old and [inbuf.find(old, start)] or [-1])[0], start # list wrapper keeps a hit at offset 0 truthy
            if start<0:
                start = endprev # restore find start pos
                for chunk in sseq: inbuf+= chunk; break
                else:
                    out.append(inbuf[start:])
                    endsseq = True
            else:
                out.append(inbuf[endprev:start])
                start += lenold
                out.append(new)
            if endsseq or sum(map(len, out))>=retsize:
                s = ''.join(out)
                while len(s)>= retsize:
                    yield s[:retsize]
                    s = s[retsize:]
                if endsseq:
                    if s: yield s
                else:
                    out = [s]

    if __name__ == '__main__':
        import sys
        args = sys.argv[:]
        usage = """
        Test usage: [python] sreplace.py old new retsize [rest of args is string chunks for test]
            where old is old string to find in chunked stream and new is replacement
            and retsize is returned buffer size, except that last may be shorter"""
        if not args[1:]: raise SystemExit, usage
        try:
            args[3] = int(args[3])
            args[0] = iter(sys.argv[4:])
            print '%r\n-----------\n%s\n------------' %(sys.argv[1:], '\n'.join(sreplace(*args[:4])))
        except Exception, e:
            print '%s: %s' %(e.__class__.__name__, e)
            raise SystemExit, usage
    --------------------------------------------------------------------

    As mentioned, not tested very much beyond what you see:

    [ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 20 This is x and abcxdef 012x345 zzxx zzz x
    ['x', '_XX_', '20', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
    -----------
    Thisis_XX_andabc_XX_
    def012_XX_345zz_XX__
    XX_zzz_XX_
    ------------

    [ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 80 This is x and abcxdef 012x345 zzxx zzz x
    ['x', '_XX_', '80', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
    -----------
    Thisis_XX_andabc_XX_def012_XX_345zz_XX__XX_zzz_XX_
    ------------

    [ 2:43] C:\pywk\ut>py24 sreplace.py x _XX_ 4 This is x and abcxdef 012x345 zzxx zzz x
    ['x', '_XX_', '4', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
    -----------
    This
    is_X
    X_an
    dabc
    _XX_
    def0
    12_X
    X_34
    5zz_
    XX__
    XX_z
    zz_X
    X_
    ------------

    [ 2:44] C:\pywk\ut>py24 sreplace.py def DEF 80 This is x and abcxdef 012x345 zzxx zzz x
    ['def', 'DEF', '80', 'This', 'is', 'x', 'and', 'abcxdef', '012x345', 'zzxx', 'zzz', 'x']
    -----------
    ThisisxandabcxDEF012x345zzxxzzzx
    ------------

    If you wanted to change a binary file, you'd use it something like (although probably let
    the default buffer size be at 4096, not 20, which is pretty silly other than demoing.
    At least the input chunks are 512 ;-)

    >>> from sreplace import sreplace
    >>> fw = open('sreplace.py.txt','wb')
    >>> for buf in sreplace(iter(lambda f=open('sreplace.py','rb'):f.read(512), ''),'out','OUT',20):
    ...     fw.write(buf)
    ...
    >>> fw.close()
    >>> ^Z



    [ 3:00] C:\pywk\ut>diff -u sreplace.py sreplace.py.txt
    --- sreplace.py Fri Jan 14 02:39:52 2005
    +++ sreplace.py.txt Fri Jan 14 03:00:01 2005
    @@ -7,7 +7,7 @@
         """
         inbuf = ''
         endsseq = False
    -    out = []
    +    OUT = []
         start = 0
         lenold = len(old)
         lennew = len(new)
    @@ -17,21 +17,21 @@
                 start = endprev # restore find start pos
                 for chunk in sseq: inbuf+= chunk; break
                 else:
    -                out.append(inbuf[start:])
    +                OUT.append(inbuf[start:])
                     endsseq = True
             else:
    -            out.append(inbuf[endprev:start])
    +            OUT.append(inbuf[endprev:start])
                 start += lenold
    -            out.append(new)
    -        if endsseq or sum(map(len, out))>=retsize:
    -            s = ''.join(out)
    +            OUT.append(new)
    +        if endsseq or sum(map(len, OUT))>=retsize:
    +            s = ''.join(OUT)
                 while len(s)>= retsize:
                     yield s[:retsize]
                     s = s[retsize:]
                 if endsseq:
                     if s: yield s
                 else:
    -                out = [s]
    +                OUT = [s]
     
     if __name__ == '__main__':
         import sys


    Regards,
    Bengt Richter
    Bengt Richter, Jan 14, 2005
    #7
  8. yaipa

    yaipa Guest

    Bengt, and all,

    Thanks for all the good input. The problem seems to be that .find()
    is good for text files on Windows, but is not much use when it is
    binary data. The script is for an assembly language build tool, so I know
    the exact seek address of the binary data that I need to replace, so
    maybe I'll just go that way. It just seemed a little more general to
    do a search and replace rather than having to type in a seek address.

    Of course I could use a library function to convert the binary data to
    ASCII and back, but that seems a little over the top in this case.

    Cheers,

    --Alan


    yaipa, Jan 14, 2005
    #8
  9. On 14 Jan 2005 15:40:27 -0800, "yaipa" <> wrote:

    >Bengt, and all,
    >
    >Thanks for all the good input. The problems seems to be that .find()
    >is good for text files on Windows, but is not much use when it is
    >binary data. The script is for a Assy Language build tool, so I know

    Did you try it? Why shouldn't find work for binary data?? At the end of
    this, I showed an example of opening and modding a text file _in binary_.

    >>> s= ''.join(chr(i) for i in xrange(256))
    >>> s

    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\
    x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ab
    cdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f
    \x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7
    \xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
    \xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7
    \xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
    \xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    >>> for i in xrange(256):
    ...     assert i == s.find(chr(i))
    ...
    >>>


    I.e., all the finds succeeded for all 256 possible bytes. Why wouldn't you think that would work fine
    for data from a binary file? Of course, find is case sensitive and fixed, not a regex, so it's
    not very flexible. It wouldn't be that hard to expand to a list of old,new pairs as a change spec
    though. Of course that would slow it down some.


    >the exact seek address of the binary data that I need to replace, so
    >maybe I'll just go that way. It just seemed a little more general to
    >do a search and replace rather than having to type in a seek address.

    Except you run the risk of not having a unique search result, unless you
    have a really guaranteed unique pattern.
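One way to guard against a non-unique pattern is to list every offset at which it occurs before patching anything. A small Python 3 sketch, assuming the data is already in memory as `bytes`; the helper name `find_all` is made up for illustration:

```python
def find_all(data, pattern):
    """Return the offsets of every occurrence of `pattern` in `data`.

    Overlapping occurrences are reported too, so the result shows every
    place a search could possibly land.
    """
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)   # step one byte to allow overlaps
    return offsets


offsets = find_all(b"\x00AB\x00\x00AB\xff", b"AB")
if len(offsets) != 1:
    print("pattern is not unique, found at offsets:", offsets)
```

If the list has exactly one entry, that offset can also be compared against the known seek address as a sanity check.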
    >
    >Of course I could use a Lib function to convert the binary data to
    >ascii and back, but seems a little over the top in this case.

    I think you misunderstand Python strings. There is no need to "convert" the result
    of open(filename, 'rb').read(chunksize). Re-read the example below ;-)
    [...]
    >>
    >> If you wanted to change a binary file, you'd use it something like

    ^^^^^^^^^^^
    >(although probably let
    >> the default buffer size be at 4096, not 20, which is pretty silly

    >other than demoing.
    >> At least the input chunks are 512 ;-)
    >>
    >> >>> from sreplace import sreplace
    >> >>> fw = open('sreplace.py.txt','wb')

    opens a binary output file

    >> >>> for buf in sreplace(iter(lambda

    >f=open('sreplace.py','rb'):f.read(512), ''),'out','OUT',20):

    iter(f, sentinel) is the format above. It creates an iterator that
    keeps calling f() until f()==sentinel, which it doesn't return, and that ends the sequence.
    f in this case is lambda f=open(inputfilename):f.read(inputchunksize)
    and the sentinel is '' -- which is what is returned at EOF.
    The old thing to find was 'out', to be changed to 'OUT', and the 20 was a silly small
    return chunk size for the sreplace(...) iterator. All these chunks were simply passed
    to
    >> ... fw.write(buf)
    >> ...
    >> >>> fw.close()

    and closing the file explicitly wrapped it up.
    >> >>> ^Z
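The two-argument `iter(callable, sentinel)` form described above works the same way in Python 3, with one difference worth noting: a file opened in binary mode returns `b''` at EOF, so the sentinel has to be `b''` rather than `''`. A minimal sketch, using an in-memory `io.BytesIO` as a stand-in for a real file; the helper name `read_chunks` is made up for illustration:

```python
import io


def read_chunks(fileobj, size=512):
    """Return an iterator over successive reads of up to `size` bytes.

    Iteration stops when read() returns b"", which is what a binary
    file object returns at end of file.
    """
    return iter(lambda: fileobj.read(size), b"")


stream = io.BytesIO(bytes(range(256)) * 5)   # 1280 bytes standing in for a file
sizes = [len(chunk) for chunk in read_chunks(stream)]
print(sizes)  # [512, 512, 256]
```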


    I just typed that in interactively to demo the file change process with the source itself, so the diff
    could show the changes. I guess I should have made sreplace.py runnable as a binary file updater, rather
    than a cute demo using command line text. The files are no worry, but what is the source of your old
    and new binary patterns that you want to use for find and replace? You can't enter them in unescaped format
    on a command line, so you may want to specify them in separate binary files, or you could specify them
    as Python strings in a module that could be imported. E.g.,

    ---< old2new.py >------
    # example of various ways to specify binary bytes in strings
    from binascii import unhexlify as hex2chr
    old = (
    'This is plain text.'
    + ''.join(map(chr,[33,44,55, 0xaa])) + '<<-- arbitrary list of binary bytes specified in numerically if desired'
    + chr(33)+chr(44)+chr(55)+ '<<-- though this is plainer for a short sequence'
    + hex2chr('4142433031320001ff') + r'<<-- should be ABC012\x00\x01\xff'
    )

    new = '\x00'*len(old) # replace with zero bytes
    -----------------------
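In Python 3 the same kinds of pattern would be written as `bytes` rather than `str`. A rough equivalent of the module above, kept deliberately short; the variable names follow the original, and the exact contents are only an example:

```python
# Python 3 ways to spell arbitrary binary bytes
old = (
    b"This is plain text."                    # bytes literal
    + bytes([33, 44, 55, 0xAA])               # from a list of integers 0..255
    + bytes.fromhex("4142433031320001ff")     # from hex digits: ABC012\x00\x01\xff
    + b"\x00\x01\xff"                         # escapes inside a bytes literal
)

new = bytes(len(old))   # bytes(n) gives n zero bytes, same length as old
```

Note that `chr(i)` produces a one-character `str` in Python 3, so it is no longer the right tool for building binary patterns.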

    BTW: Note: changing binaries can be dangerous! Do so at your own risk!!
    And this has not been tested worth a darn, so caveat**n.

    ---< binfupd.py >------
    from sreplace import sreplace
    def main(infnam, outfnam, old, new):
        infile = open(infnam, 'rb')
        inseq = iter(lambda: infile.read(4096), '')
        outfile = open(outfnam, 'wb')
        try:
            try:
                for buf in sreplace(inseq, old, new):
                    outfile.write(buf)
            finally:
                infile.close()
                outfile.close()
        except Exception, e:
            print '%s:%s' %(e.__class__.__name__, e)

    if __name__ == '__main__':
        import sys
        try:
            oldnew = __import__(sys.argv[3])
            main(sys.argv[1], sys.argv[2], oldnew.old, oldnew.new)
        except Exception, e:
            print '%s:%s' %(e.__class__.__name__, e)
            raise SystemExit, """
    Usage: [python] binfupd.py infname outfname oldnewmodulename
        where infname is read in binary, and outfname is written
        in binary, replacing instances of old binary data with new
        specified as python strings named old and new respectively
        in a module named oldnewmodulename (without .py extension).
    """
    -----------------------

    REMEMBER: NO WARRANTY FOR ANY PURPOSE! USE AT YOUR OWN RISK!

    And, if you know where to seek to, that seems like the best way ;-)
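When the offset is known, the patch reduces to a seek and a write. A Python 3 sketch under that assumption; the function name `patch_at_offset` and the optional `expected_old` check are illustrative additions rather than anything from the thread:

```python
def patch_at_offset(path, offset, new, expected_old=None):
    """Overwrite len(new) bytes of the file at `path`, starting at `offset`.

    If `expected_old` is given, the bytes currently at that position must
    match it, which guards against patching the wrong build of the file.
    """
    with open(path, "r+b") as f:          # read/write, binary, no truncation
        f.seek(offset)
        current = f.read(len(new))
        if len(current) != len(new):
            raise ValueError("patch would run past the end of the file")
        if expected_old is not None and current != expected_old:
            raise ValueError("bytes at offset do not match expected_old")
        f.seek(offset)                    # reading moved the position; go back
        f.write(new)
```

Passing the old 200-byte pattern as `expected_old` combines the precision of a fixed offset with the safety of a search.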

    Regards,
    Bengt Richter
    Bengt Richter, Jan 15, 2005
    #9
  10. John Lenton

    John Lenton Guest

    On Wed, Jan 12, 2005 at 10:36:54PM -0800, yaipa wrote:
    > What would be the common sense way of finding a binary pattern in a
    > .bin file, say some 200 bytes, and replacing it with an updated pattern
    > of the same length at the same offset?
    >
    > Also, the pattern can occur on any byte boundary in the file, so
    > chunking through the code at 16 bytes a frame maybe a problem. The
    > file itself isn't so large, maybe 32 kbytes is all and the need for
    > speed is not so great, but the need for accuracy in the
    > search/replacement is very important.


    ok, after having read the answers, I feel I must, once again, bring
    mmap into the discussion. It's not that I'm any kind of mmap expert,
    that I twirl mmaps for a living; in fact I barely have cause to use it
    in my work, but give me a break! this is the kind of thing mmap
    *shines* at!

    Let's say m is your mmap handle, a is the pattern you want to find,
    b is the pattern you want to replace, and n is the size of both a and
    b.

    You do this:

    p = m.find(a)
    m[p:p+n] = b

    and that is *it*. Ok, so getting m to be a mmap handle takes more work
    than open() (*) A *lot* more work, in fact, so maybe you're justified
    in not using it; some people can't afford the extra

    s = os.stat(fn).st_size
    m = mmap.mmap(f.fileno(), s)

    and now I'm all out of single-letter variables.

    *) why isn't mmap easier to use? I've never used it with something
    other than the file size as its second argument, and with its access
    argument in sync with open()'s second arg.

    --
    John Lenton () -- Random fortune:
    If the aborigine drafted an IQ test, all of Western civilization would
    presumably flunk it.
    -- Stanley Garn

    John Lenton, Jan 15, 2005
    #10
  11. yaipa

    yaipa Guest

    John,

    Thanks for reminding me of the mmap module. The following worked as
    expected.
    #--------------------------------------------------------
    import mmap

    source_data = open("source_file.bin", 'rb').read()
    search_data = open("search_data.bin", 'rb').read()
    replace_data = open("replace_data.bin", 'rb').read()

    # copy source_file.bin to modified.bin
    open("modified.bin", 'wb').write(source_data)

    fp = open("modified.bin", 'r+b')  # binary mode, as for the other files
    mm = mmap.mmap(fp.fileno(), 0)

    start_addr = mm.find(search_data)  # note: find() returns -1 if the pattern is absent
    end_addr = start_addr + len(replace_data)
    mm[start_addr:end_addr] = replace_data

    mm.close()
    fp.close()
    #--------------------------------------------------------

    Although I chose to implement the string method approach in the build tool,
    because there are two occurrences of *Pattern* in the .bin file to be
    updated and the string method did both in one shot.
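The mmap approach can also handle more than one occurrence by calling `find` in a loop. A Python 3 sketch, assuming same-length patterns as in the original question; the function name `mmap_replace_all` is made up for illustration:

```python
import mmap


def mmap_replace_all(path, old, new):
    """Replace every occurrence of `old` with `new` in place via mmap.

    Both patterns must be non-empty bytes of equal length, because a
    mapped file cannot change size through slice assignment.
    Returns the number of replacements made.
    """
    if not old or len(old) != len(new):
        raise ValueError("old and new must be non-empty and the same length")
    count = 0
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0) as mm:
            pos = mm.find(old)
            while pos != -1:
                mm[pos:pos + len(new)] = new
                count += 1
                pos = mm.find(old, pos + len(old))  # continue after this match
            mm.flush()
    return count
```

Comparing the return value with the expected number of occurrences (two, in the build tool described above) gives a quick check that nothing was missed or patched unexpectedly.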

    Cheers,

    --Alan


    yaipa, Jan 19, 2005
    #11
  12. yaipa

    yaipa Guest

    Thanks Francois,

    It worked as expected.
    -------------------------------------------------------------------------------
    source_data = open("source_data.bin", 'rb').read()
    search_data = open("search_data.bin", 'rb').read()
    replace_data = open("replace_data.bin", 'rb').read()
    outFile = open("mod.bin", 'wb')

    file_offset = source_data.find(search_data)
    print "file_offset:", file_offset

    outData = source_data.replace(search_data, replace_data)
    outFile.write(outData)
    outFile.close()
    print ""
    yaipa, Jan 19, 2005
    #12
  14. yaipa

    yaipa Guest

    Bengt,

    Thanks for the input, sorry, your diff threw me the first time I looked
    at it, but then I went back and tried it later. Yes it works fine and
    I've tucked it away for later use. For this particular use case
    String.replace seems to get the job done in short order, and the tool
    needs to be maintained by folks not familiar w/ Python, so I went ahead
    and used that. But I imagine I will use this bit of code when I need a
    finer grained tool.

    Thanks again.
    Cheers,

    --Alan
    yaipa, Jan 19, 2005
    #14