problem with regex

Paul Johnston

Hi
I have a file encoded in Unicode (UTF-8) on a Red Hat 9 system, and I am
using Perl 5.8.0.
It contains mixed Estonian and English, like below:

<ee> Kaks vana sõpra </ee>
<en> Two old friends </en>
<ee> Tere Piret ! </ee>
<en> Hello Piret ! </en>
<ee> Tere Tõnu ! </ee>
<en> Hello Tõnu ! </en>

I need to do some processing, but the expression
/õ/ will not match the õ in any line.
The Perl script and the file I wish to process were both created using
the same editor (kedit), so I assume they are encoded using the same
scheme.
Any ideas why I cannot, for example, extract all lines which contain
the symbol "õ"?
TIA
Paul
 
Paul Lalli


Without having seen your code, my guess would be that your locale is not
correctly set up. See perldoc perllocale and perldoc locale.

Paul Lalli
 
Ben Morrow

Paul Lalli said:
Without having seen your code, my guess would be that your locale is not
correctly set up. See perldoc perllocale and perldoc locale

NO! Don't mix locales and Unicode with 5.8. It doesn't work.

If you wish to use UTF-8 literals in your source, you have to put
'use utf8;' at the top.

Ben
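Ben's point can be illustrated with a minimal sketch of the likely cause. It assumes the data file was being read as decoded UTF-8 characters (Perl 5.8.0 did this implicitly for filehandles when the locale named UTF-8, as on a default Red Hat 9 install; later releases dropped that behaviour), while the õ in the script, lacking 'use utf8', was compiled as its two raw UTF-8 bytes. Escapes stand in for the literal õ so the example does not depend on how the script itself is saved; the variable names are illustrative only.

```perl
use strict;
use warnings;

# "sopra" with o-tilde as a UTF-8 editor saves it. Without 'use utf8', Perl
# reads that character in a source literal as these two bytes, i.e. two chars.
my $literal = "s\xc3\xb5pra";

# The same text after UTF-8 decoding (what a decoding filehandle returns):
# o-tilde is now the single character U+00F5.
my $decoded = $literal;
utf8::decode($decoded);

# prints: literal: 6 chars, decoded: 5 chars
printf "literal: %d chars, decoded: %d chars\n", length $literal, length $decoded;

# The byte-style pattern finds nothing in the decoded text (prints "no") ...
print "byte-style pattern matches decoded text? ",
    ($decoded =~ /\xc3\xb5/ ? "yes" : "no"), "\n";

# ... while the one-character pattern does (prints "yes").
print "character pattern matches decoded text? ",
    ($decoded =~ /\x{f5}/ ? "yes" : "no"), "\n";
```

With 'use utf8;' at the top of the script, the literal õ in /õ/ is compiled as the single character U+00F5, which corresponds to the second, matching pattern above.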
 
Paul Johnston

Just as a follow-up: I have discovered that the script works (i.e. matches õ)
on Solaris 5.8 with Perl version 5.005.
However, adding use utf8; to the script on the Red Hat machine also works, so my
problems have been solved (for now at least :) )
Many thanks
Paul
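The Solaris result probably comes from Perl 5.005 having no character strings at all, so the file and the pattern were both compared as plain bytes. For a current Perl, here is a minimal sketch of a fix that does not depend on locale settings: 'use utf8' makes the õ in the pattern one character, and the file is decoded explicitly with an :encoding(UTF-8) layer so both sides of the match are characters. The subroutine name lines_with_otilde is hypothetical, and the script assumes the file name is passed as its first argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # treat the literal 'õ' below as one character (U+00F5)

# Return every line read from $fh that contains 'õ'.
# $fh must already decode UTF-8 (see the open() call below).
sub lines_with_otilde {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ /õ/;
    }
    return @hits;
}

if (@ARGV) {
    my $file = $ARGV[0];
    # Decode explicitly rather than relying on locale-dependent defaults.
    open my $in, '<:encoding(UTF-8)', $file
        or die "Cannot open $file: $!\n";
    binmode STDOUT, ':encoding(UTF-8)';    # re-encode characters on output
    print lines_with_otilde($in);
    close $in;
}
```

Saved as, say, grep_otilde.pl (a hypothetical name), it would be run as: perl grep_otilde.pl yourfile.txt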
 

Similar Threads

Problem with code 2
help with regex 7
Regex help 2
Clickable link conversion regex? 0
Regex problem 2
help with regex matching multiple %e 0
Big problem I need to solve with some unix utils 1
regex problem 6
