search/replace

molsted

Hi all
I'm trying to replace some strings in a text file like this:
&00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M

I've tried the following:
s/&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n/
<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/
g;

with no luck.
 
Tad J McClellan

molsted said:
I'm trying to replace some strings in a text file like this:
&00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M



Does your data really have caret-M in it or does it instead have
carriage return-linefeed in it?

You should write the data in Real Perl Code so that there is no ambiguity.

Have you seen the Posting Guidelines that are posted here frequently?


--------------------------
#!/usr/bin/perl
use warnings;
use strict;

my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES> <Style:DK> /;

$_ = "&00Antiques\r\n&00Antiquit<0x00E4>ten\r\n&00Antiquit<0x00E9>s\r\n"
. "&00Antig<0x00FC>edades\r\n&00Antikviteter\r\n";
print;
print "\n";

my $capture_num = 0;
s/&00([^\r]+)\r\n/$lang[$capture_num++]$1\n/g;
print;
 
Alex

molsted said:
Hi all
I'm trying to replace some strings in a text file like this:
&00Antiques^M&00Antiquit<0x00E4>ten^M&00Antiquit<0x00E9>s^M&00Antig<0x00FC>edades^M&00Antikviteter^M

I've tried the following:
s/&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n&00(.+?)\r\n/
<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/
g;

What does the '^M' mean to you? My editor shows carriage returns as
'^M', but I see you're searching for a carriage return followed by a
newline. Since not all line breaks are the same (it depends on your
system), you want to match "at the end of the line". You want to use $
to match "just before the end of a line" and .*? to chomp off the
line-breaking characters. This requires using the flags s and m. The
x flag permits whitespace in your expression, which improves readability.

Here's a version that works on my system:

s/


^ & 00 ([^\r\n]+?)$ .*?



^ & 00 ([^\r\n]+?)$ .*?



^ & 00 ([^\r\n]+?)$ .*?



^ & 00 ([^\r\n]+?)$ .*?



^ & 00 ([^\r\n]+?)$ .*?



/<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/msx;


Note: The white space in "^ & 00 ([^\r\n]+?)$ .*?" is ignored, so it
really means "^&00([^\r\n]+?)$.*?", which means "At the start of a line,
match an ampersand, followed by two zeros, followed by any number of
characters which are not carriage returns or line feeds, just before the
end of the line".
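
A minimal sketch of how the m flag anchors behave on different line endings. The two sample strings are hypothetical, not taken from the original file. Under /m, $ matches just before a "\n", so a pattern that excludes "\r" from the capture finds nothing on CRLF data unless an optional "\r" is allowed before the $:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: the same two records with LF and with CRLF endings.
my $lf   = "&00Antiques\n&00Antikviteter\n";
my $crlf = "&00Antiques\r\n&00Antikviteter\r\n";

# With /m, ^ and $ anchor at every line; $ matches just before a "\n".
my @from_lf = $lf =~ /^&00([^\r\n]+)$/mg;

# On CRLF data the character after the capture is "\r", not "\n",
# so the same pattern finds nothing.
my @from_crlf = $crlf =~ /^&00([^\r\n]+)$/mg;

# Allowing an optional "\r" before $ handles both conventions.
my @from_either = $crlf =~ /^&00([^\r\n]+)\r?$/mg;

print scalar(@from_lf), " ", scalar(@from_crlf), " ", scalar(@from_either), "\n";
```

This is probably why a pattern that works on one system can fail on a file that came from another.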

HTH!
 
Alex

Alex meant to write:
s/

^ & 00 ([^\r\n]+?)$ .*?
^ & 00 ([^\r\n]+?)$ .*?
^ & 00 ([^\r\n]+?)$ .*?
^ & 00 ([^\r\n]+?)$ .*?
^ & 00 ([^\r\n]+?)$ .*?
/<Style:GB>$1\n<Style:DE>$2\n<Style:FR>$3\n<Style:ES>$4\n<Style:DK>$5/msx;

And sorry for all the extra lines, which my ng-reader added for me.
 
molsted

Hi Tad,
I haven't seen the Posting Guidelines; this is my first post to the group.
Can I read them somewhere?
I'm going with your suggestion, but it only matches the first line.
However, if I put more sequences in the @lang array it will work.
How would I overcome that?

----------------------------

#!/usr/bin/perl

use strict;

my $fileName=$ARGV[0];

open(FILE,"$fileName") || die("Cannot Open File");

my(@fcont) = <FILE>;

close FILE;

open(FOUT,">$fileName.txt") || die("Cannot Open File");

foreach my $line (@fcont) {

$line =~ s/\r/\r\n/g;

#### METHOD #1 BEGIN ####

my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES>
<Style:DK> /;
my $capture_num = 0;
$line =~ s/&00([^\r]+)\r\n/$lang[$capture_num++]$1\n/g;

#### METHOD #1 END ####

print FOUT $line;
}
close FOUT;

exit 0
 
Tad J McClellan

molsted said:
I haven't seen the Posting Guidelines; this is my first post to the group.
Can I read them somewhere?

http://tinyurl.com/dg27de


I'm going with your suggestion but it only matches the first line.


To analyse the behavior of a pattern match, we need two things:

1) the pattern that is to be matched
2) the string that the pattern is to be matched against

Since we only have access to one of them, we cannot analyse why it
fails to match.

#!/usr/bin/perl

use strict;


use warnings;

my $fileName=$ARGV[0];

open(FILE,"$fileName") || die("Cannot Open File");


You should not quote lone variables:

perldoc -q vars

What's wrong with always quoting "$vars"?

You should use the 3-argument form of open() and a lexical filehandle.

You should include the name of the file in the diag message.

You should put delimiters around the filename in your diag message.

You should include the $! variable in the diag messages.


open my $FILE, '<', $fileName or die "could not open '$fileName': $!";

my(@fcont) = <FILE>;


my @fcont = <$FILE>;

(but see below)

foreach my $line (@fcont) {


You should not read the entire file into memory if you only need
one line of the file at a time.

$line =~ s/\r/\r\n/g;


Why are you doing this?

Is the file a classic Mac OS (not OS X) text file?

It is too late to fix line endings after you have used <> to read "lines".

You need to fix them *before* applying the <> operator.

Perhaps by setting the $/ variable to an appropriate value.
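
A minimal sketch of that idea, assuming the file really does use a bare "\r" as its line ending. The in-memory filehandle and the sample data are stand-ins so the example runs without a real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a classic Mac OS file: records end in a bare "\r".
my $data = "&00Antiques\r&00Antikviteter\r";
open my $in, '<', \$data or die "could not open in-memory data: $!";

my @records;
{
    local $/ = "\r";                 # input record separator, only inside this block
    while ( my $line = <$in> ) {
        chomp $line;                 # chomp removes the current value of $/
        push @records, $line;
    }
}
close $in;

print scalar(@records), " records: @records\n";
```

Using local keeps the change to $/ from leaking into the rest of the program.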
 
molsted

To analyse the behavior of a pattern match, we need two things:

1) the pattern that is to be matched

Sample pattern:
&00Antiques^M
&00Antiquit<0x00E4>ten^M
&00Antiquit<0x00E9>s^M
&00Antig<0x00FC>edades^M
&00Antikviteter^M

Sample output:
<Style:GB>Antiques
<Style:DE>Antiquit<0x00E4>ten
<Style:FR>Antiquit<0x00E9>s
<Style:ES>Antig<0x00FC>edades
<Style:DK>Antikviteter

All on separate lines. The file is generated on a Windows PC (\r\n);
my file needs to end up as a UNIX file on Mac OS X.

The first file had accidentally been opened on a Mac, hence the \r end
of line.

I hope this clears things up a bit.

The file is being converted from Windows-1252 to MacRoman prior to being run
through the script (/usr/bin/iconv -f WINDOWS-1252 -t MACROMAN). However, I
am considering using 'Text::Iconv' instead.
 
Tad J McClellan

molsted said:
Sample pattern:
&00Antiques^M
&00Antiquit<0x00E4>ten^M
&00Antiquit<0x00E9>s^M
&00Antig<0x00FC>edades^M
&00Antikviteter^M


That is NOT the pattern to be matched!

The pattern to be matched is:

&00([^\r]+)\r\n

Those are (meant to be) the strings that the pattern is to be matched against.

The reason that none of those strings match the pattern is because
none of those strings contain a carriage return, and the pattern requires
a carriage return.

A hex dump, such as from xxd, shows that there are no carriage returns
in that data. Each line ends with a caret (ASCII 0x5e), an upper
case "M" (ASCII 0x4d) and a linefeed (ASCII 0x0a):

0000000: 2630 3041 6e74 6971 7565 735e 4d0a 2630  &00Antiques^M.&0
                                    ^^ ^^^^
0000010: 3041 6e74 6971 7569 743c 3078 3030 4534  0Antiquit<0x00E4
0000020: 3e74 656e 5e4d 0a26 3030 416e 7469 7175  >ten^M.&00Antiqu
                   ^^^^ ^^
0000030: 6974 3c30 7830 3045 393e 735e 4d0a 2630  it<0x00E9>s^M.&0
                                    ^^ ^^^^

If you cannot figure out how to post data with the line endings that
are actually in your data, then write the data in Real Perl Code.

(that sounds familiar...)

instead of

while ( <FILE> ) {

put the data into an array and loop over the array:

my @lines = ( "&00Antiques\r\n", "&00Antiquit<0x00E4>ten\r\n", ...
foreach ( @lines ) {

The file is generated on a Windows PC (\r\n);
my file needs to end up as a UNIX file on Mac OS X.


Then all you need to do is delete all of the carriage returns before
matching:

tr/\r//d;

and change the pattern to not require carriage returns.
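
A minimal sketch of that advice, run against the sample data written out as Perl strings. The modulo on the counter is an assumption added here, not something from the thread: it restarts the tag list in case the file holds more than one group of five entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lang = qw/ <Style:GB> <Style:DE> <Style:FR> <Style:ES> <Style:DK> /;

# Sample data written out with explicit CRLF line endings.
my $text = "&00Antiques\r\n&00Antiquit<0x00E4>ten\r\n&00Antiquit<0x00E9>s\r\n"
         . "&00Antig<0x00FC>edades\r\n&00Antikviteter\r\n";

$text =~ tr/\r//d;    # delete every carriage return, leaving "\n" line endings

# The pattern now only needs "\n". The modulo is an assumption, not from the
# thread: it restarts the tag list for each further group of five entries.
my $n = 0;
$text =~ s/&00([^\n]+)\n/$lang[ $n++ % @lang ] . "$1\n"/ge;

print $text;
```

The /e flag evaluates the replacement as Perl code, which keeps the index arithmetic readable.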

The first file had accidentally been opened on a Mac, hence the \r end
of line.


That explains it then.

On Linux/OS X the input operator, <>, reads until it finds a newline.

Since there were no newlines, a single read gets the entire file in one go.
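
The effect is easy to reproduce. This is a small sketch with hypothetical data and an in-memory filehandle standing in for the real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the file that was re-saved on a Mac:
# every record ends in a bare "\r" and there is no "\n" anywhere.
my $data = "&00Antiques\r&00Antikviteter\r";
open my $in, '<', \$data or die "could not open in-memory data: $!";

my @lines = <$in>;    # $/ still has its default value, "\n"
close $in;

print "lines read: ", scalar(@lines), "\n";
print "the single line holds the whole file\n" if $lines[0] eq $data;
```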
 
