molsted said:
> Sample pattern:
> &00Antiques^M
> &00Antiquit<0x00E4>ten^M
> &00Antiquit<0x00E9>s^M
> &00Antig<0x00FC>edades^M
> &00Antikviteter^M
That is NOT the pattern to be matched!
The pattern to be matched is:
&00([^\r]+)\r\n
Those are (meant to be) the strings that the pattern is to be matched against.
The reason that none of those strings match the pattern is that
none of those strings contain a carriage return, and the pattern requires
a carriage return.
A hex dump, such as from xxd, shows that there are no carriage returns
in that data. Each line ends with a caret (ASCII 0x5e), an upper
case "M" (ASCII 0x4d) and a linefeed (ASCII 0x0a):
0000000: 2630 3041 6e74 6971 7565 735e 4d0a 2630 &00Antiques^M.&0
                                    ^^ ^^^^
0000010: 3041 6e74 6971 7569 743c 3078 3030 4534 0Antiquit<0x00E4
0000020: 3e74 656e 5e4d 0a26 3030 416e 7469 7175 >ten^M.&00Antiqu
                   ^^^^ ^^
0000030: 6974 3c30 7830 3045 393e 735e 4d0a 2630 it<0x00E9>s^M.&0
                                    ^^ ^^^^
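
To make the difference concrete, here is a minimal sketch that runs the pattern against one string as it was posted (literal caret and "M") and against the same string with a real carriage return. The variable names are mine, and the second string is only my guess at what the real data looks like.

```perl
use strict;
use warnings;

# The pattern from the original post.
my $pattern = qr/&00([^\r]+)\r\n/;

# As posted: a literal caret and "M", then a linefeed. Single quotes keep
# "^M" as two ordinary characters.
my $as_posted = '&00Antiques^M' . "\n";

# As presumably intended (an assumption): a real carriage return + linefeed.
my $intended = "&00Antiques\r\n";

for my $string ( $as_posted, $intended ) {
    # Make the line ending visible in the report.
    ( my $visible = $string ) =~ s/\r/\\r/g;
    $visible =~ s/\n/\\n/g;

    if ( $string =~ $pattern ) {
        print "match:    $visible (captured '$1')\n";
    }
    else {
        print "no match: $visible\n";
    }
}
```

Only the second string should be reported as a match, with "Antiques" captured.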
If you cannot figure out how to post data with the line endings that
are actually in your data, then write the data in Real Perl Code.
(that sounds familiar...)
Instead of
while ( <FILE> ) {
put the data into an array and loop over the array:
my @lines = ( "&00Antiques\r\n", "&00Antiquit<0x00E4>ten\r\n", ...
foreach ( @lines ) {
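
Filled out into something that actually runs, that could look like the sketch below. The three strings are taken from the sample above with "\r\n" endings written out explicitly, the "<0x00E4>" placeholder is kept as literal text, and @captured is just a name I picked for the results.

```perl
use strict;
use warnings;

# Test data written as Perl strings, so the line endings are unambiguous.
my @lines = (
    "&00Antiques\r\n",
    "&00Antiquit<0x00E4>ten\r\n",
    "&00Antikviteter\r\n",
);

my @captured;
foreach ( @lines ) {
    # Same pattern as in the original post, applied to $_.
    push @captured, $1 if /&00([^\r]+)\r\n/;
}

print scalar(@captured), " of ", scalar(@lines), " lines matched\n";
print "  $_\n" for @captured;
```

Anybody can copy, paste and run that, and nobody has to guess what the line endings are.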
> The file is generated on a Windows PC (\r\n),
> my file needs to end up as a UNIX-file on Mac OS X
Then all you need to do is delete all of the carriage returns before
matching:
tr/\r//d;
and change the pattern to not require carriage returns.
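
A minimal sketch of both steps together, using array data so the line endings are known. The replacement pattern, /&00([^\n]+)\n/, is only one possible way of dropping the carriage return requirement, and @names is a made-up variable name.

```perl
use strict;
use warnings;

my @lines = ( "&00Antiques\r\n", "&00Antikviteter\r\n" );

my @names;
foreach my $line ( @lines ) {
    # Delete every carriage return, leaving a UNIX-style "\n" ending.
    # $line is an alias, so the elements of @lines are changed in place.
    $line =~ tr/\r//d;

    # Pattern no longer mentions \r at all (one possible rewrite).
    push @names, $1 if $line =~ /&00([^\n]+)\n/;
}

print "$_\n" for @names;
```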
> The first file had accidentally been opened on a Mac, hence the \r end
> of line.
That explains it then.
On Linux/OS X the input operator, <>, reads until it finds a newline
(the default value of the input record separator, $/).
Since the file contained no newlines, only carriage returns, a single
read gets the entire file in one go.
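
You can see that behaviour without a real file by reading from an in-memory filehandle. This is only a sketch: count_reads() is a helper I made up for the demonstration, the data is two of the sample strings joined with bare carriage returns, and setting $/ to "\r" is shown for comparison, not as a recommendation.

```perl
use strict;
use warnings;

# Two records terminated by bare carriage returns, no newlines anywhere.
my $data = "&00Antiques\r&00Antikviteter\r";

# Hypothetical helper: count how many reads <> needs to consume $text
# when the input record separator is $separator.
sub count_reads {
    my ( $text, $separator ) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while ( my $record = <$fh> ) {
        $count++;
    }
    close $fh;
    return $count;
}

print 'separator \n: ', count_reads( $data, "\n" ), " read(s)\n";
print 'separator \r: ', count_reads( $data, "\r" ), " read(s)\n";
```

With the default separator the whole string comes back in one read. With "\r" as the separator each record is read on its own.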