Replacing characters in file

Anony-mouse

Hi,

I'm trying to find a way to replace characters in a file, but have run
into problems. I'm trying to go through a .DAT file replacing the
characters 015F (in hex) with 015C.

I tried playing around with versions of SED, but the file also contains
EOF control characters which cause that to abort part-way through the
file (although it wasn't doing the replacement anyway).

It also needs to be via the DOS command line or a similar way that can
be performed by a BAT file and be as small as possible since it needs
to run from a keyring Flash drive and still leave enough room for the
data files.

Is Perl going to be able to do this??
Or does anyone know a better way?

_
_/ \___
Anony-mouse says o_/O _/ \
"Eek-eek-eek!" \__/_|_/_|\____/
 
Klaus

C:\>perl -Mbytes -Mopen=IO,:raw -pi.bak -e "s/\x01\x5f/\x01\x5c/g" test.dat

See also "perldoc perlopentut" and "perldoc perlrun".

"-Mbytes" makes Perl treat the data as plain bytes rather than as
encoded characters (such as UTF-8, etc.)
"-Mopen=IO,:raw" forces I/O into "binmode" (i.e. no translation of
CR/LF)
"-pi.bak" performs in-place editing (that's the "-i" part, which keeps
the original as test.dat.bak), whereas the "-p" part runs an automatic
loop that processes the file line by line

I am running ActiveState Perl on Windows XP

C:\>perl -v

This is perl, v5.8.8 built for MSWin32-x86-multi-thread
(with 50 registered patches, see perl -V for more detail)

Copyright 1987-2006, Larry Wall

Binary build 820 [274739] provided by ActiveState http://www.ActiveState.com
Built Jan 23 2007 15:57:46
 
Anony-mouse

Many thanks. I'll give that a try. :eek:)

Dr.Ruud

Anony-mouse wrote:
I'm trying to find a way to replace [..]
characters 015F (in hex) with 015C.

Is this about double-byte characters?

...or maybe even about converting BCD (binary-coded decimal, packed)
from the unsigned value "15" to the signed value "+15"?
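
To illustrate the packed-BCD reading of those two byte pairs, here is a minimal sketch. It assumes the common convention in which the last nibble is a sign code (C for positive, D for negative, F for unsigned). The name decode_packed_bcd is made up for this example, and nothing in the thread confirms that the .DAT file really holds BCD fields.

```perl
use strict;
use warnings;

# Assumed sign-nibble convention: C = positive, D = negative, F = unsigned.
my %sign_for = (C => '+', D => '-', F => '');

# Decode one packed-BCD field: every nibble is a decimal digit except
# the last one, which carries the sign. (Hypothetical helper.)
sub decode_packed_bcd {
    my ($field) = @_;
    my $hex  = uc unpack('H*', $field);   # "\x01\x5F" becomes "015F"
    my $sign = chop $hex;                 # last nibble is the sign code
    die "unexpected digits '$hex'\n"    unless $hex =~ /^[0-9]+$/;
    die "unexpected sign nibble '$sign'\n" unless exists $sign_for{$sign};
    return $sign_for{$sign} . ($hex + 0); # "015" becomes 15
}

print decode_packed_bcd("\x01\x5F"), "\n";   # prints "15"
print decode_packed_bcd("\x01\x5C"), "\n";   # prints "+15"
```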
 
Anony-mouse

"Dr.Ruud" said:
Anony-mouse wrote:
I'm trying to find a way to replace [..] characters 015F (in hex) with 015C.

Is this about double-byte characters?

Two characters one after the other in the file. I can't simply replace
5F by 5C since there are other 5F characters that must remain
unchanged. It's only those preceded by a 01 that need to be changed.


The original answer posted doesn't work with my version of Perl. I'm
attempting to download the Active version, but it will take me a while
via dial-up. With an installer size of 15MB I'm not sure it's going to
be the right solution for me since I need it to be small to run from a
keyring Flash drive on any computer it gets plugged into. :eek:(

Dr.Ruud

Anony-mouse wrote:
Dr.Ruud:
Anony-mouse:
I'm trying to find a way to replace [..] characters 015F (in hex)
with 015C.

Is this about double-byte characters?

Two characters one after the other in the file.

Please say "bytes" (or "octets") if you mean those. A single character
can occupy many bytes in a file, think UTF-8.
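
The difference between characters and bytes can be seen in a small sketch. It assumes Perl 5.8 or later, where utf8::encode is available without loading anything, and uses the euro sign purely as an example character.

```perl
use strict;
use warnings;

my $char  = "\x{20AC}";   # one character: U+20AC EURO SIGN
my $bytes = $char;
utf8::encode($bytes);     # now holds the UTF-8 octets E2 82 AC

printf "%d character, %d bytes (%s)\n",
    length($char), length($bytes), unpack('H*', $bytes);
# prints "1 character, 3 bytes (e282ac)"
```

A byte-wise search such as 01 followed by 5F only makes sense if the file is read as octets, which is what binmode is for.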
I can't simply replace
5F by 5C since there are other 5F characters that must remain
unchanged. It's only those preceded by a 01 that need to be changed.

That answer was already given by Klaus. See also sed or awk.

The original answer posted doesn't work with my version of Perl

It was a generic answer. What error messages did you get?
 
Anony-mouse

"Dr.Ruud" said:
Anony-mouse wrote:
Dr.Ruud:
Anony-mouse:
I'm trying to find a way to replace [..] characters 015F (in hex)
with 015C.

Is this about double-byte characters?

Two characters one after the other in the file.

Please say "bytes" (or "octets") if you mean those. A single character
can occupy many bytes in a file, think UTF-8.

They are single byte characters as displayed by my hex editor. The
character I want to replace is a "_" (hex 5F) but ONLY when preceded
by the 01 control character. It needs to be replaced by a "\" character
(hex 5C).

I may also need to swap them back again later, but that's easy to do
once I get it working this way around.


That answer was already given by Klaus. See also sed or awk.

The SED I tried didn't work because the file contains the DOS EOF
control characters (hex 1A) that cause it to abort before reaching the
actual end of the file.


It was a generic answer. What error messages did you get?

Both Mbytes and MOpen are unknown commands, because it's an older
version since I was trying to get as small a filesize as possible to
put on a keyring Flash drive.
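
For a Perl that predates the bytes and open pragmas, the same replacement can be sketched as a short script that calls binmode() itself instead of relying on -Mopen=IO,:raw. This is only a sketch under assumptions: it assumes a Perl 5 interpreter and a file small enough to read into memory, and the names fix_dat.pl, fix_bytes and fix_file are made up for illustration. It reads the whole file at once rather than line by line and keeps the original as a .bak copy.

```perl
# fix_dat.pl (hypothetical name): replace byte 5F with 5C, but only
# where it directly follows byte 01. Sketch for a Perl 5 without the
# bytes/open pragmas; fix_bytes and fix_file are illustrative names.
use strict;

# Do the replacement on an in-memory string of bytes.
sub fix_bytes {
    my ($data) = @_;
    $data =~ s/\x01\x5f/\x01\x5c/g;
    return $data;
}

# Rewrite one file in place, keeping the original as "<name>.bak".
sub fix_file {
    my ($file) = @_;
    local (*IN, *OUT);
    local $/;                         # undef: read the whole file at once

    open(IN, "< $file") or die "Cannot read $file: $!\n";
    binmode(IN);                      # no CR/LF translation, Ctrl-Z is plain data
    my $data = <IN>;
    close(IN);
    $data = '' unless defined $data;  # empty file

    unlink("$file.bak") if -e "$file.bak";
    rename($file, "$file.bak") or die "Cannot back up $file: $!\n";

    open(OUT, "> $file") or die "Cannot write $file: $!\n";
    binmode(OUT);
    print OUT fix_bytes($data);
    close(OUT) or die "Cannot close $file: $!\n";
}

foreach my $file (@ARGV) {
    fix_file($file);
}
```

From a BAT file it could be run as "perl fix_dat.pl test.dat". Swapping the bytes back later would only need the two hex values in the substitution exchanged.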


John W. Krahn

Anony-mouse said:
"Dr.Ruud" said:
Anony-mouse wrote:
Dr.Ruud:
Anony-mouse:
I'm trying to find a way to replace [..] characters 015F (in hex)
with 015C.
Is this about double-byte characters?
Two characters one after the other in the file.
Please say "bytes" (or "octets") if you mean those. A single character
can occupy many bytes in a file, think UTF-8.

They are single byte characters as displayed by my hex editor. The
character I want to replace is a "_" (hex 5F) but ONLY when preceded
by the 01 control character. It needs to be replaced by a "\" character
(hex 5C).

s/(?<=\x01)\x5F/\x5C/g
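
A small worked example of that substitution on made-up sample bytes, printed as hex so the control byte is visible. The sample data is invented for illustration, and the lookbehind assumes a Perl recent enough to support (?<=...).

```perl
use strict;
use warnings;

# Sample bytes: 01 5F (should change), "AB" plus a lone 5F (should
# stay as it is), then 01 5F again (should change).
my $data = "\x01\x5F" . "AB\x5F" . "\x01\x5F";

# Copy the data, then replace 5F with 5C only where 01 comes right before it.
(my $fixed = $data) =~ s/(?<=\x01)\x5F/\x5C/g;

print unpack('H*', $data),  "\n";   # prints "015f41425f015f"
print unpack('H*', $fixed), "\n";   # prints "015c41425f015c"
```

The lookbehind only checks that 01 is there without consuming it, so the result is the same as matching \x01\x5F and writing \x01\x5C back.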



John
 
Dr.Ruud

Anony-mouse wrote:
They are single byte characters as displayed by my hex editor. The
character I want to replace is a "_" (hex 5F) but ONLY when preceded
by the 01 control character. It needs to be replaced by a "\"
character (hex 5C).

You keep talking about characters, so can we assume you have a "text"
file with (because you mentioned DOS) CRLF line endings?

But if your .DAT is a "binary" file, read perlopentut.
 
Anony-mouse

Michele Dondi said:
There are no "Mbytes and MOpen commands". There are the bytes and open
pragma(tic module)s. And the -M cmd line switch that provides a very
convenient shortcut in place of a C<use> statement. If your perl
doesn't support these, then it must be *very* old and you'd better
replace it anyway. If its size is excessive for your needs, then you
may package your own distro with a reduced number of accompanying
modules and possibly without documentation. But that's not much
reasonable: more reasonably there are several ways to package a
script, a perl interpreter and all the needed modules into a single
executable file. That may come close to a bare minimum that could be
suitable for your needs.

The original reply was:

perl -Mbytes -Mopen=IO,:raw -pi.bak -e "s/\x01\x5f/\x01\x5c/g" test.dat

My version of Perl throws up an error unless I remove both of the first
two parameters / switches / commands / whatever you want to call them.
Anony-mouse

"John W. Krahn" said:
Anony-mouse said:
"Dr.Ruud" said:
Anony-mouse wrote:
Dr.Ruud:
Anony-mouse:
I'm trying to find a way to replace [..] characters 015F (in hex)
with 015C.
Is this about double-byte characters?
Two characters one after the other in the file.
Please say "bytes" (or "octets") if you mean those. A single character
can occupy many bytes in a file, think UTF-8.

They are single byte characters as displayed by my hex editor. The
character I want to replace is a "_" (hex 5F) but ONLY when preceded
by the 01 control character. It needs to be replaced by a "\" character
(hex 5C).

s/(?<=\x01)\x5F/\x5C/g

I tried things like that in SED, but it didn't work because it stops
when it hits an EOF character before it reaches the real end of the
file.

I think I also tried a similar thing in Perl by taking out the -Mbytes
and -Mopen commands from the original reply and got the same aborted
result.
Anony-mouse

"Dr.Ruud" said:
Anony-mouse wrote:


You keep talking about characters, so can we assume you have a "text"
file with (because you mentioned DOS) CRLF line endings?

But if your .DAT is a "binary" file, read perlopentut.

Call them whatever you want, but I have said all along that the
"characters" are 01 and 5F in hex ... 01 is not an ASCII character, but
is a control character for "start of record" (or something like that
from memory).

I have also said that it is a .DAT file containing various control
characters, so it can't be a plain text file, which is why I never said
"text file" anywhere.

If it was a simple text file, then SED would have worked.
Tad McClellan

Anony-mouse said:
01 is not an ASCII character,


Yes it is.

but
is a control character


Many ASCII characters are control characters.

for "start of record" (or something like that
from memory).


Its name, defined by ASCII, is Start Of Heading (SOH).

I have also said that it is a .DAT file containing various control
characters, so it can't be a plain text file, which is why I never said
"text file" anywhere.


Text files can, and nearly always do, contain control
characters (LF, line feed, for example).
 
Mumia W.

"Dr.Ruud" said:
[...]
But if your .DAT is a "binary" file, read perlopentut.

Call them whatever you want, but I have said all along that the
"characters" are 01 and 5F in hex ... 01 is not an ASCII character, but
is a control character for "start of record" (or something like that
from memory).

I have also said that it is a .DAT file containing various control
characters, so it can't be a plain text file, which is why I never said
"text file" anywhere.

If it was a simple text file, then SED would have worked.

So I take it that you want us to help you write a program that can make
modifications to a binary file, and you want that program to fit on a
USB key.

Why do you want to do this?
 
Jürgen Exner

Anony-mouse said:
it stops
when it hits an EOF character before it reaches the real end of the
file.

Your text file contains an EOF before the end of the file? That is hmmm,
well, unusual, don't you think?
Maybe you should look into fixing the software that created this file?

If on the other hand you don't have a text file, then maybe it is time to
stop treating it as a text file but start handling it as a binary file
instead.

jue
 
Anony-mouse

"Jürgen Exner" said:
Your text file contains an EOF before the end of the file? That is hmmm,
well, unusual, don't you think?
Maybe you should look into fixing the software that created this file?

If on the other hand you don't have a text file, then maybe it is time to
stop treating it as a text file but start handling it as a binary file
instead.

It's not a text file. As I said it's a .DAT file that contains control
characters, including EOF ones. :eek:\
Anony-mouse

Michele Dondi said:
I understood that. Now, did you understand I suggested you to upgrade?

As I said in a previous message, I was using an old version that was
nice and small (around 300K) to fit on a Flash drive. Obviously this
old version doesn't like the Mbytes and Mopen options.

I'm trying to get the Active Perl that the original person was using,
but it's 15MB and I'm on a limited time dial-up account, so it will
take a while before I can even try it. Although that's the installer
filesize, it may well turn out that the version of Perl is simply too
big for my needs anyway.
