utf8 in regexp (perl 5.8.1)

W

Wes Groleau

I have a file containing thousands of Spanish words, encoded AFAIK)
in UTF-8. I also have a perl script in UTF-8, which says (hope
pasting works):

#!/usr/bin/perl -w -CSD
#
# NOTE: The extra space in the bang line is mandatory (bug in perl 5.8)
use warnings;
use strict;
use utf8;

while (<>)
{
print if ( /ñ/ )
}

What is in the regexp is supposed to be "small n with tilde"
and I verified with od -xc that it is hex C3 B1 as is every
place in the file where that letter appears.

The script is intended to find all words containing that
letter. But it finds nothing. After wading through gallons
of text (man encoding, man utf8, man perlunicode, etc.),
I still had no reason to think it was wrong. But I added

use encoding "utf8";

and ran it again, getting only:

Ze-Admins-Computer:~/Desktop wgroleau$ char-find palabras.utf8
Malformed UTF-8 character (unexpected non-continuation byte 0x00,
immediately after start byte 0xde) at
/Volumes/Parents/wgroleau/bin/char-find line 12.

?!? According to 'od -xc' the script does NOT contain any
byte that is 0xde In fact, the ONLY bytes in the script that
are not ASCII are the bytes for the "enye" which are on line
twelve, but neither of them is a DE and NO bytes are 00.

Have I found a bug in perl or is my ignorance just getting
the best of me?

Oh, yeah, I also tried a few things with 'binmode' that didn't
work either.

WWG
 
W

Wes Groleau

Wes said:
[problems with]
use utf8;

while (<>)
{
print if ( /ñ/ )
}

I removed "use utf8" and it worked. So I think it's
a bug, especially since
use encoding "utf8";
caused

Malformed UTF-8 character (unexpected non-continuation byte 0x00,
immediately after start byte 0xde) at
/Volumes/Parents/wgroleau/bin/char-find line 12.

?!? According to 'od -xc' the script does NOT contain any
byte that is 0xde In fact, the ONLY bytes in the script that
are not ASCII are the bytes for the "enye" which are on line
twelve, but neither of them is a DE and NO bytes are 00.

--
Wes Groleau

A pessimist says the glass is half empty.

An optimist says the glass is half full.

An engineer says somebody made the glass
twice as big as it needed to be.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,576
Members
45,054
Latest member
LucyCarper

Latest Threads

Top