search and replace in a binary file

Discussion in 'Perl Misc' started by Rafal Konopka, Jul 22, 2006.

  1. I need to search and replace some strings in a binary file. When I try
    something like the code below, it works fine. The thing is that
    I'll need to use replacements that have more or fewer characters (like
    150 replaced with 20, etc.). I know it requires some hairy bitwise
    shifts but I have no idea how to do it.

    TIA,

    Rafal

    &edit_up('myfile');

    sub edit_up {

        my ($infile) = @_;

        undef $/;
        open(F,$infile) || die "$infile: $!";
        binmode(F);
        my $OUT = "test\\";
        if (!-d $OUT) {mkdir($OUT,07770);}

        open(OF,">$OUT" . $infile);
        binmode(OF);

        while (read(F, $buf, 1024)) {
            $buf =~ s/\[150\]/[100]/g;
            print OF $buf;
        }
        close(F);
        close(OF);
    }
    Rafal Konopka, Jul 22, 2006
    #1

  2. Rafal Konopka <> wrote:
    > I need to search and replace some strings in a binary file. When I try
    > something like the code below, it works fine. The thing is that
    > I'll need to use replacements that have more or fewer characters (like
    > 150 replaced with 20, etc.). I know it requires some hairy bitwise
    > shifts but I have no idea how to do it.



    If you read from one file, write to another file, and then rename
    the 2nd file, then it requires no trickery at all.

    Perl can do this for you, see "-i" in perlrun.pod and $^I in perlvar.pod,
    though you might have to figure out how to binmode() the ARGV and ARGVOUT
    filehandles.
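
    A minimal sketch of the read/write/rename approach, done by hand rather
    than with -i. It assumes the file is small enough to slurp into memory
    and that the search text is a literal byte string. The name
    replace_in_file and the ".tmp" suffix are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $from with $to
# in a binary file by writing a temporary copy and renaming it over the
# original. Returns the number of replacements made.
sub replace_in_file {
    my ($path, $from, $to) = @_;
    my $tmp = "$path.tmp";    # assumed temporary name in the same directory

    open(my $in,  '<', $path) or die "Cannot read $path: $!";
    open(my $out, '>', $tmp)  or die "Cannot write $tmp: $!";
    binmode($in);             # no newline translation on either handle
    binmode($out);

    # Slurp the whole file, so a match can never straddle a buffer boundary.
    my $data = do { local $/; <$in> };
    $data = '' unless defined $data;

    # \Q...\E makes the search text literal; the lengths may differ freely.
    my $count = ($data =~ s/\Q$from\E/$to/g) || 0;

    print {$out} $data or die "Cannot write $tmp: $!";
    close($in);
    close($out) or die "Cannot close $tmp: $!";
    rename($tmp, $path) or die "Cannot rename $tmp to $path: $!";
    return $count;
}
```

    Called as replace_in_file('myfile', '[150]', '[20]'), this changes the
    length of the file, which Perl handles without any special effort.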


    > &edit_up('myfile');



    edit_up('myfile');


    You should not use ampersands on subroutine calls unless you know what
    using ampersands on subroutine calls does, and what it does is what
    you want to do. See perlsub.pod.


    > sub edit_up {
    >
    > my ($infile) = @_;
    >
    > undef $/;



    local $/;

    would be better...

    ... but $/ is not used for input via read() anyway, so there is no
    need to set it to anything in particular.


    > open(F,$infile) || die "$infile: $!";



    You check the return value from open(). Good. Very Good.


    > binmode(F);
    > my $OUT = "test\\";



    It is good style to use single quotes on strings unless you want one
    of the two extra things that double quotes give you (variable
    interpolation and/or backslash escapes).

    my $OUT = 'test\\';

    If the pathname is not destined for a Windows "shell", as in this case,
    then using forward slashes in paths is a Good Idea too:

    my $OUT = 'test/';


    > if (!-d $OUT) {mkdir($OUT,07770);}

                                ^^^^^
                                ^^^^^ those are some mighty
                                ^^^^^ funny-looking permissions...


    That will fail if $OUT is a file or pipe or link or ...

    You probably want to test for existence rather than for directory-ness:

    if ( !-e $OUT ) { mkdir($OUT,0777) }

    or written more clearly:

    mkdir $OUT, 0777 unless -e $OUT;


    > open(OF,">$OUT" . $infile);



    Now you are no longer checking the return value from open(). Not So Good.

    It is now apparent that you _are_ reading from one file and writing to another,
    so different lengths should not be a problem.

    Did you try it with different lengths and experience a problem?


    > binmode(OF);
    >
    > while (read(F, $buf, 1024)) {



    You probably need to handle the case where your to-be-replaced value
    is broken across buffer boundaries...
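
    One way to handle that, sketched under the assumption that the search
    text is a literal byte string: hold back the last length-minus-one bytes
    of each buffer, since a match could start there, and prepend them to the
    next chunk. The name replace_stream and its parameters are invented for
    this example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in chunks, replacing each literal
# occurrence of $from with $to, including matches that straddle two chunks.
# Both handles are assumed to be open and in binmode already.
sub replace_stream {
    my ($in, $out, $from, $to, $chunk_size) = @_;
    die "search string must not be empty" unless length $from;
    $chunk_size ||= 1024;
    my $keep  = length($from) - 1;   # longest partial match a buffer can end with
    my $buf   = '';
    my $count = 0;

    while (1) {
        my $n = read($in, my $chunk, $chunk_size);
        die "read failed: $!" unless defined $n;
        $buf .= $chunk;

        # Replace every complete match found in the current buffer.
        my ($pos, $done) = (0, '');
        while ((my $hit = index($buf, $from, $pos)) >= 0) {
            $done .= substr($buf, $pos, $hit - $pos) . $to;
            $pos = $hit + length($from);
            $count++;
        }
        my $rest = substr($buf, $pos);   # unmatched tail of the buffer

        if ($n == 0) {                   # end of file: flush everything
            print {$out} $done, $rest;
            last;
        }

        # Hold back the bytes that might begin a match completed by the next chunk.
        my $hold = length($rest) < $keep ? length($rest) : $keep;
        print {$out} $done, substr($rest, 0, length($rest) - $hold);
        $buf = substr($rest, length($rest) - $hold);
    }
    return $count;
}
```

    Only unreplaced input bytes are carried over, so replacement text is
    never rescanned, and the result should match what a single s///g over
    the whole file would produce.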


    > $buf =~ s/\[150\]/[100]/g;

    ^^^^^^

    That search string looks suspiciously non-binary to me.


    > print OF $buf;
    > }
    > close(F);
    > close(OF);
    > }



    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jul 22, 2006
    #2

  3. On Sat, 22 Jul 2006 09:56:44 -0500, Tad McClellan
    <> wrote:

    >Rafal Konopka <> wrote:
    >>[...]

    >If you read from one file, write to another file, and then rename
    >the 2nd file, then it requires no trickery at all.
    >
    >Perl can do this for you, see "-i" in perlrun.pod and $^I in perlvar.pod,
    >though you might have to figure out how to binmode() the ARGV and ARGVOUT
    >filehandles.
    >
    >
    >> &edit_up('myfile');

    >
    >
    > edit_up('myfile');
    >
    >
    >You should not use ampersands on subroutine calls unless you know what
    >using ampersands on subroutine calls does, and what it does is what
    >you want to do. See perlsub.pod.
    >


    I've been using them (ampersands) all my life :), but I'll check out
    perlsub.pod.


    >... but $/ is not used for input via read() anyway, so there is no
    >need to set it to anything in particular.


    OK

    >> if (!-d $OUT) {mkdir($OUT,07770);}

    >                            ^^^^^
    >                            ^^^^^ those are some mighty
    >                            ^^^^^ funny-looking permissions...
    >
    >

    typo

    >Did you try it with different lengths and experience a problem?


    Yes, that's the issue. The moment I replaced 150 with 20, I couldn't
    open the file in the application.

    >You probably need to handle the case where your to-be-replaced value
    >is broken across buffer boundaries...


    Exactly!

    >
    >> $buf =~ s/\[150\]/[100]/g;

    > ^^^^^^
    >
    >That search string looks suspiciously non-binary to me.


    It's just an example. Some of the replacements will be ASCII strings
    (like the one above) and some will be binary characters (e.g.
    chr(176)). The file itself is a binary file.

    So how do I go about replacing one character with, say, two, or two
    characters with one?

    Thanks for your suggestions.

    Rafal
    Rafal Konopka, Jul 22, 2006
    #3
  4. Rafal Konopka <> wrote:
    > On Sat, 22 Jul 2006 09:56:44 -0500, Tad McClellan
    ><> wrote:
    >
    >>Rafal Konopka <> wrote:
    >>>[...]

    >>If you read from one file, write to another file, and then rename
    >>the 2nd file, then it requires no trickery at all.



    >>Did you try it with different lengths and experience a problem?

    >
    > Yes, that's the issue.



    Did perl make the different length changes or not?


    > The moment I replaced 150 with 20, I couldn't
    > open the file in the application.

                    ^^^^^^^^^^^^^^^^^^

    That does not answer the question above.

    Can you see the changes with a file dump or binary editor?


    >>You probably need to handle the case where your to-be-replaced value
    >>is broken across buffer boundaries...

    >
    > Exactly!



    You say that as if it was mentioned in your original post; it wasn't.
    I was just pointing out that you may have more than one problem to
    work on.



    > So how do I go about replacing one character with, say, two, or two
    > characters with one?



    I think you already know how, by outputting 2 characters instead
    of 1.

    My guess is that perl is making the changes that you need, but that
    those changes are incompatible with your unnamed "application".



    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jul 22, 2006
    #4
  5. On Sat, 22 Jul 2006 13:38:31 -0500, Tad McClellan
    <> wrote:

    >Rafal Konopka <> wrote:
    >> On Sat, 22 Jul 2006 09:56:44 -0500, Tad McClellan
    >><> wrote:
    >>


    >>>Did you try it with different lengths and experience a problem?

    >>
    >> Yes, that's the issue.

    >
    >
    >Did perl make the different length changes or not?
    >
    >
    >> The moment I replaced 150 with 20, I couldn't
    >> open the file in the application.

    >                ^^^^^^^^^^^^^^^^^^
    >That does not answer the question above.
    >
    >Can you see the changes with a file dump or binary editor?
    >


    Yes, I can see the changes in the dump file.

    I really know nothing about binary files. Having tried the
    character-for-character replacement successfully, I tried asymmetric
    replacements. While I could see them in the dump file, I could no
    longer open the file in the application.

    >I think you already know how, by outputting 2 characters instead
    >of 1.


    >My guess is that perl is making the changes that you need, but that
    >those changes are incompatible with your unnamed "application".


    Essentially, it all boils down to this: imagine I have to replace
    "Jon" with "Jonathan" and, conversely, "William" with "Billy" in a Word
    document. The straightforward search and replace is not going to
    work, so how do I do it?

    Rafal
    Rafal Konopka, Jul 22, 2006
    #5
  6. On Sat, 22 Jul 2006 12:44:55 -0400, Rafal Konopka wrote:
    > It's just an example. Some of the replacements will be ASCII strings
    > (like the one above) and some will be binary characters (e.g.
    > chr(176)). The file itself is a binary file.
    >
    > So how do I go about replacing one character with, say, two, or two
    > characters with one?


    There is no general solution. You need to know the format of the file
    and take care to preserve the format when making changes. For example,
    many binary formats use length fields. If you change the length of a
    record, you have to update the length field, too. Some file formats also
    use checksums to detect corruption - then you need to recompute the
    checksum, too.
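
    To make that concrete, here is a sketch against an invented format, not
    the format of any real application: a sequence of records, each a 2-byte
    big-endian length followed by that many payload bytes, then a trailing
    4-byte big-endian sum of all preceding bytes. The name
    replace_in_records is hypothetical as well.

```perl
use strict;
use warnings;

# Hypothetical helper for the made-up format described above. Takes the
# whole file as a byte string and returns the rewritten byte string, with
# every length field and the trailing checksum recomputed.
sub replace_in_records {
    my ($file_bytes, $from, $to) = @_;

    # Split off and verify the trailing 32-bit byte-sum checksum.
    my $body   = substr($file_bytes, 0, -4);
    my $stored = unpack('N', substr($file_bytes, -4));
    die "checksum mismatch\n" unless $stored == unpack('%32C*', $body);

    my ($pos, $new_body) = (0, '');
    while ($pos < length $body) {
        # Read one record: 16-bit length, then the payload itself.
        my $len     = unpack('n', substr($body, $pos, 2));
        my $payload = substr($body, $pos + 2, $len);
        $pos += 2 + $len;

        # The replacement may change the payload length...
        $payload =~ s/\Q$from\E/$to/g;
        die "payload too long for a 16-bit length\n" if length($payload) > 0xFFFF;

        # ...so the length field is rewritten to match the new payload.
        $new_body .= pack('n', length $payload) . $payload;
    }

    # Recompute the checksum over the rewritten body.
    return $new_body . pack('N', unpack('%32C*', $new_body));
}
```

    A plain s///g over the raw bytes would change the payload but leave the
    old length and checksum in place, which is the kind of damage that makes
    an application refuse to open the file.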

    Many file formats are documented at http://www.wotsit.org/default.asp
    If the file in question is in a proprietary format you may need to ask
    the vendor for information or reverse engineer it.

    hp


    --
    Peter J. Holzer | Sysadmin WSR | http://www.hjp.at/
    > Why should one invent something that does not exist?
    What else would be the point of inventing?
    -- P. Einstein and V. Gringmuth in desd
    Peter J. Holzer, Jul 22, 2006
    #6
  7. On Sat, 22 Jul 2006 15:15:35 -0400, Rafal Konopka wrote:

    > Essentially, it all boils down to this: imagine I have to replace
    > "Jon" with "Jonathan" and, conversely, "William" with "Billy" in a Word
    > document. The straightforward search and replace is not going to
    > work, so how do I do it?


    Either:

    1) Open the document as a COM/.NET object, do a search and replace.

    2) Reverse engineer the binary format of the file and figure out what else
    has to change.

    M4
    --
    Redundancy is a great way to introduce more single points of failure.
    Martijn Lievaart, Jul 23, 2006
    #7
