search and replace in a binary file


Rafal Konopka

I need to search and replace some strings in a binary file. When I try
something like the code below, it works fine. The thing is that
I'll need to use replacements that have more or fewer characters (like
150 replaced with 20, etc.). I know it requires some hairy bitwise
shifts but I have no idea how to do it.

TIA,

Rafal

&edit_up('myfile');

sub edit_up {

my ($infile) = @_;

undef $/;
open(F,$infile) || die "$infile: $!";
binmode(F);
my $OUT = "test\\";
if (!-d $OUT) {mkdir($OUT,07770);}

open(OF,">$OUT" . $infile);
binmode(OF);

while (read(F, $buf, 1024)) {
$buf =~ s/\[150\]/[100]/g;
print OF $buf;
}
close(F);
close(OF);
}
 

Tad McClellan

Rafal Konopka said:
> I need to search and replace some strings in a binary file. When I try
> something like the code below, it works fine. The thing is that
> I'll need to use replacements that have more or fewer characters (like
> 150 replaced with 20, etc.). I know it requires some hairy bitwise
> shifts but I have no idea how to do it.


If you read from one file, write to another file, and then rename
the 2nd file, then it requires no trickery at all.

Perl can do this for you, see "-i" in perlrun.pod and $^I in perlvar.pod,
though you might have to figure out how to binmode() the ARGV and ARGVOUT
filehandles.
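A minimal sketch of the "read one file, write another, then rename" approach, done by hand rather than through -i and $^I. It assumes the whole file fits in memory and that a sibling file named "$path.tmp" can be created; replace_in_file() and that temp name are hypothetical, not from the thread. The ':raw' layer on open() plays the role of binmode().

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Replace every literal occurrence of $from with $to in a binary file.
# Assumptions: the file fits in memory, and "$path.tmp" (a hypothetical
# name) can be created next to it.
sub replace_in_file {
    my ($path, $from, $to) = @_;

    open(my $in, '<:raw', $path) or die "$path: $!";
    my $data = do { local $/; <$in> };
    $data = '' unless defined $data;
    close($in);

    # $to may be longer or shorter than $from; the output is simply a new byte stream.
    my $count = ($data =~ s/\Q$from\E/$to/g) || 0;

    my $tmp = "$path.tmp";
    open(my $out, '>:raw', $tmp) or die "$tmp: $!";
    print {$out} $data or die "write $tmp: $!";
    close($out) or die "close $tmp: $!";

    # Only replace the original once the new copy is completely written.
    rename($tmp, $path) or die "rename $tmp -> $path: $!";
    return $count;
}

# Demo on a scratch file holding ASCII text plus one raw byte (0xB0).
my $dir  = tempdir(CLEANUP => 1);
my $demo = File::Spec->catfile($dir, 'myfile');
open(my $fh, '>:raw', $demo) or die "$demo: $!";
print {$fh} "xx[150]yy\xB0[150]";
close($fh) or die "close $demo: $!";

my $n = replace_in_file($demo, '[150]', '[20]');
```

Because \Q...\E quotes the search string, '[150]' is matched literally and needs no hand-written backslashes.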

> &edit_up('myfile');


edit_up('myfile');


You should not use ampersands on subroutine calls unless you know what
using ampersands on subroutine calls does, and what it does is what
you want to do. See perlsub.pod.
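A small illustration, with hypothetical subroutine names, of one thing the ampersand changes: called as &name; with no parentheses, a subroutine receives the caller's current @_ instead of an empty list. Ampersand calls also bypass prototypes; perlsub.pod covers the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

sub wrapper {
    my $plain = count_args();   # normal call: explicit, empty argument list
    my $amp   = &count_args;    # &name; with no parens: reuses wrapper's own @_
    return ($plain, $amp);
}

my ($plain, $amp) = wrapper('a', 'b', 'c');
print "plain call saw $plain args, ampersand call saw $amp\n";
```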

> sub edit_up {
>
> my ($infile) = @_;
>
> undef $/;


local $/;

would be better...

.... but $/ is not used for input via read() anyway, so there is no
need to set it to anything in particular.
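A short sketch of the point about $/: read() always asks for a byte count, and $/ only affects readline (<FH>). It uses in-memory filehandles so it is self-contained; local restores the old $/ when the block ends, which a bare undef $/ would not.

```perl
use strict;
use warnings;

my $data = "line1\nline2\n";

# read() fetches a byte count; $/ plays no part in it.
open(my $fh, '<', \$data) or die "open: $!";
my $n = read($fh, my $buf, 4);    # $buf is now "line"
close($fh);

# $/ only matters for readline (<$fh>). "local $/" leaves it undef inside
# the block and restores the previous value afterwards; a bare "undef $/"
# would change it for the rest of the program.
my $slurped;
{
    local $/;
    open(my $fh2, '<', \$data) or die "open: $!";
    $slurped = <$fh2>;            # whole "file" in one readline
    close($fh2);
}
my $sep_restored = (defined $/ && $/ eq "\n");
```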

> open(F,$infile) || die "$infile: $!";


You check the return value from open(). Good. Very Good.

> binmode(F);
> my $OUT = "test\\";


It is good style to use single quotes on strings unless you want one
of the two extra things that double quotes give you (variable
interpolation and/or backslash escapes).

my $OUT = 'test\\';

If the pathname is not destined for a Windows "shell", as in this case,
then using forward slashes in paths is a Good Idea too:

my $OUT = 'test/';

> if (!-d $OUT) {mkdir($OUT,07770);}
                            ^^^^^
                            ^^^^^ those are some mighty
                            ^^^^^ funny-looking permissions...


That will fail if $OUT is a file or pipe or link or ...

You probably want to test for existence rather than for directory-ness:

if ( !-e $OUT ) { mkdir($OUT,0777) }

or written more clearly:

mkdir $OUT, 0777 unless -e $OUT;

> open(OF,">$OUT" . $infile);


Now you are no longer checking the return value from open(). Not So Good.

It is now apparent that you _are_ reading from one file and writing to another,
so different lengths should not be a problem.

Did you try it with different lengths and experience a problem?

> binmode(OF);
>
> while (read(F, $buf, 1024)) {


You probably need to handle the case where your to-be-replaced value
is broken across buffer boundaries...
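One way to handle matches that straddle buffer boundaries, sketched for a literal (non-regex) search string: hold back the last length($from)-1 bytes of each buffer and prepend them to the next read, since only those bytes can begin a match that ends in the next chunk. replace_stream() is a hypothetical name, and the demo uses in-memory filehandles so it can try several chunk sizes, including ones shorter than the search string.

```perl
use strict;
use warnings;

# Copy $in to $out, replacing every occurrence of the literal string $from
# with $to, reading only $chunk_size bytes at a time. The last
# length($from)-1 bytes of each buffer are held back, because a match may
# start there and finish in the next chunk.
sub replace_stream {
    my ($in, $out, $from, $to, $chunk_size) = @_;
    die "search string must not be empty\n" unless length $from;
    my $keep    = length($from) - 1;
    my $pending = '';

    while (1) {
        my $n = read($in, my $chunk, $chunk_size);
        die "read failed: $!" unless defined $n;
        $pending .= $chunk if $n;
        my $eof = ($n == 0);

        # Replace every complete match currently in the buffer.
        my ($done, $pos) = ('', 0);
        while ((my $i = index($pending, $from, $pos)) >= 0) {
            $done .= substr($pending, $pos, $i - $pos) . $to;
            $pos = $i + length($from);
        }

        # Nothing before $safe can be the start of a match split across reads.
        my $safe = $eof ? length($pending) : length($pending) - $keep;
        $safe = $pos if $safe < $pos;
        $done .= substr($pending, $pos, $safe - $pos);

        print {$out} $done or die "write failed: $!";
        $pending = substr($pending, $safe);
        last if $eof;
    }
}

# Demo: try several chunk sizes, including ones shorter than '[150]'.
my $input = "xx[150]yy\xB0[150]zz";
my %results;
for my $size (1 .. 8) {
    open(my $in, '<', \$input) or die "in-memory open: $!";
    binmode($in);
    my $output = '';
    open(my $out, '>', \$output) or die "in-memory open: $!";
    binmode($out);
    replace_stream($in, $out, '[150]', '[20]', $size);
    close($out);
    $results{$size} = $output;
}
```

With real files, the handles would be opened with '<:raw' and '>:raw' (or binmode() called on them), as in the original code.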

> $buf =~ s/\[150\]/[100]/g;
            ^^^^^^^

That search string looks suspiciously non-binary to me.
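If the search does need to match raw bytes, a pattern can contain them directly. A small sketch with made-up sample data: \Q...\E for ASCII strings that contain regex metacharacters, a \x escape for a byte such as 0xB0 (chr(176)), and chr() for building a byte at run time.

```perl
use strict;
use warnings;

# Made-up sample data mixing ASCII text and a raw byte (chr(176) eq "\xB0").
my $buf = "abc[150]def\xB0ghi";

# Literal ASCII: \Q...\E quotes regex metacharacters such as [ and ].
my ($from, $to) = ('[150]', '[20]');
$buf =~ s/\Q$from\E/$to/g;

# A single binary byte can be written as a \x escape, here replaced by two bytes.
$buf =~ s/\xB0/\xB1\xB1/g;

# A byte built with chr() can be interpolated the same way as a string.
my $byte  = chr(0xB1);
my $count = () = $buf =~ /\Q$byte\E/g;
```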
 

Rafal Konopka

> Rafal Konopka said:
> If you read from one file, write to another file, and then rename
> the 2nd file, then it requires no trickery at all.
>
> Perl can do this for you, see "-i" in perlrun.pod and $^I in perlvar.pod,
> though you might have to figure out how to binmode() the ARGV and ARGVOUT
> filehandles.

>> &edit_up('myfile');


> edit_up('myfile');


> You should not use ampersands on subroutine calls unless you know what
> using ampersands on subroutine calls does, and what it does is what
> you want to do. See perlsub.pod.

I've been using them (ampersands) all my life :), but I'll check out
perlsub.pod

> ... but $/ is not used for input via read() anyway, so there is no
> need to set it to anything in particular.
OK

>                            ^^^^^
>                            ^^^^^ those are some mighty
>                            ^^^^^ funny-looking permissions...

typo

> Did you try it with different lengths and experience a problem?

Yes, that's the issue. The moment I replaced 150 with 20, I couldn't
open the file in the application.

> You probably need to handle the case where your to-be-replaced value
> is broken across buffer boundaries...
Exactly!
>> $buf =~ s/\[150\]/[100]/g;
>            ^^^^^^^

> That search string looks suspiciously non-binary to me.

it's just an example. Some of the replacements will be ascii strings
(like the one above) and some will be binary characters (e.g.
chr(176)). The file itself is a binary file.

So how do I go about replacing 1 character with, say, two or two
characters with 1?

Thanks for your suggestions.

Rafal
 

Tad McClellan

Rafal Konopka said:
>> Rafal Konopka said:
>> If you read from one file, write to another file, and then rename
>> the 2nd file, then it requires no trickery at all.
>> Did you try it with different lengths and experience a problem?

> Yes, that's the issue.


Did perl make the different length changes or not?

> The moment I replaced 150 with 20, I couldn't
> open the file in the application.
^^^^^^^^^^^^^^^^^^

That does not answer the question above.

Can you see the changes with a file dump or binary editor?



You say that as if it was mentioned in your original, it wasn't, I
was just pointing out that you may have more than one problem to
work on.


> So how do I go about replacing 1 character with, say, two or two
> characters with 1?


I think you already know how, by outputting 2 characters instead
of 1.

My guess is that perl is making the changes that you need, but that
those changes are incompatible with your unnamed "application".
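A sketch of how the edited file can be correct from Perl's point of view yet unreadable by the application: a shorter replacement moves every later byte. The data and the "TARGET" field are made up, and the assumption that the application locates data by absolute offset is only an illustration; the thread does not establish it.

```perl
use strict;
use warnings;

# Made-up data: some text, then a field an application might
# locate by its absolute byte offset.
my $data   = "A[150]B" . "TARGET";
my $before = index($data, 'TARGET');

# Perl happily makes a shorter replacement...
(my $edited = $data) =~ s/\Q[150]\E/[20]/g;
my $after  = index($edited, 'TARGET');

# ...but every byte after the edit has moved, so any offset stored
# elsewhere in the file would now point one byte too far.
printf "length %d -> %d, TARGET offset %d -> %d\n",
    length($data), length($edited), $before, $after;
```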
 

Rafal Konopka

> Did perl make the different length changes or not?


> ^^^^^^^^^^^^^^^^^^
> That does not answer the question above.

> Can you see the changes with a file dump or binary editor?

Yes, I can see the changes in the dump file.

I really know nothing about binary files. Having tried character-for-character
replacement successfully, I tried asymmetric replacements. While I could
see them in the dump file, I could no longer open the file in the
application.

> I think you already know how, by outputting 2 characters instead
> of 1.
>
> My guess is that perl is making the changes that you need, but that
> those changes are incompatible with your unnamed "application".

Essentially, it all boils down to this: imagine I have to replace
"Jon" with "Jonathan" and, conversely, "William" with "Billy" in a Word
document. A straightforward search and replace is not going to
work, so how do I do it?

Rafal
 

Peter J. Holzer

> it's just an example. Some of the replacements will be ascii strings
> (like the one above) and some will be binary characters (e.g.
> chr(176)). The file itself is a binary file.
>
> So how do I go about replacing 1 character with, say, two or two
> characters with 1?

There is no general solution. You need to know the format of the file
and take care to preserve the format when making changes. For example,
many binary formats use length fields. If you change the length of a
record, you have to update the length field, too. Some file formats also
use checksums to detect corruption; then you need to recompute the
checksum, too.
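To make the length-field point concrete, here is a sketch with an invented record layout (2-byte big-endian length, payload, 1-byte additive checksum). It is not the layout of Word or of any other real format. It contrasts a format-aware edit with a blind s/// on the raw bytes, using pack/unpack's "n/a*" length-prefix template and the "%8C*" checksum template; make_record(), parse_record() and replace_in_record() are hypothetical names.

```perl
use strict;
use warnings;

# Invented record layout, for illustration only (not any real format):
#   2-byte big-endian length | payload | 1-byte checksum (sum of payload bytes mod 256)
sub make_record {
    my ($payload) = @_;
    return pack('n/a*', $payload) . pack('C', unpack('%8C*', $payload));
}

sub parse_record {
    my ($rec) = @_;
    my ($payload) = unpack('n/a*', $rec);
    my $stored = unpack('C', substr($rec, 2 + length($payload), 1));
    die "checksum mismatch\n" unless $stored == unpack('%8C*', $payload);
    return $payload;
}

# Format-aware edit: change the payload, then rebuild the record so the
# length field and the checksum describe the new contents.
sub replace_in_record {
    my ($rec, $from, $to) = @_;
    my $payload = parse_record($rec);
    $payload =~ s/\Q$from\E/$to/g;
    return make_record($payload);
}

my $old = make_record('Jon Doe');
my $new = replace_in_record($old, 'Jon', 'Jonathan');

# A blind s/// on the raw bytes keeps the old length (7) and checksum,
# so the reader takes the wrong slice and rejects the record.
(my $naive = $old) =~ s/Jon/Jonathan/;
my $naive_ok = eval { parse_record($naive); 1 } ? 1 : 0;
```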

Many file formats are documented at http://www.wotsit.org/default.asp
If the file in question is in a proprietary format you may need to ask
the vendor for information or reverse engineer it.

hp
 

Martijn Lievaart

> Essentially, it all boils down to this: imagine I have to replace
> "Jon" with "Jonathan" and, conversely, "William" with "Billy" in a Word
> document. A straightforward search and replace is not going to
> work, so how do I do it?

Either:

1) Open the document as a COM/.NET object, do a search and replace.

2) Reverse engineer the binary format of the file and figure out what else
has to change.

M4
 
