Trouble processing non-English text

DavidK

Hello,

I am trying to process some Greek text using Perl. Strangely, I can
print out the text properly but when I try to assign the text to a
variable or do some processing, it fails.

The data file is:

1 και
2 να

My program is:

#!/usr/bin/perl -w
use strict;
use encoding "greek";

my %symbols = ();

open(FILE, "$file");

while (my $line = <FILE>) {
    chomp($line);

    my @fields = split(/\s+/, $line);

    my $num_fields = @fields;

    if ($num_fields == 2) {

        my $freq = shift(@fields);
        my $word = shift(@fields);

        print "$word\n";

        my @letters = split(//, $word);

        foreach my $letter (@letters) {
            $symbols{$letter} = 1;

            print "$letter -> $letter_test\n";
        }

        print "\n";
    }
}

The output is:

και
� ->
� ->
� ->
� ->
� ->
� ->

να
� ->
� ->
� ->
� ->

I've done some reading on the web and I still can't figure out what's
happening.

I'd appreciate any help. Thanks!
 
Dr.Ruud

DavidK said:
I am trying to process some Greek text using Perl. Strangely, I can
print out the text properly but when I try to assign the text to a
variable or do some processing, it fails.
[...]
use encoding "greek";
[...]
The output is:

και
� ->
[...]

In what sense does it fail?

What does `echo $LANG` show you?
 
DavidK

Thanks for the responses!

My $LANG variable is set to en_US.UTF-8.

The file I thought was in ISO-8859-7 is actually UTF-8. I should have
been opening the file with

open(my $FILE, "<:encoding(UTF-8)", $file)
    or die "can't open '$file': $!";

I also had to format the output with

binmode STDOUT, ":utf8";

to view it properly.
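
Putting the two changes together, a minimal sketch of the corrected loop might look like the one below. It assumes the input really is UTF-8. To stay self-contained it reads from an in-memory copy of the two sample lines instead of $file, and the helper name letters_in is made up for illustration, not taken from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # this sketch's own source contains Greek literals
use Encode qw(encode);

# Hypothetical helper: collect the distinct characters of every word read
# from a handle that already decodes its input into characters.
sub letters_in {
    my ($fh) = @_;
    my %symbols;
    while (my $line = <$fh>) {
        chomp $line;
        my ($freq, $word, @rest) = split ' ', $line;
        next unless defined $word && !@rest;    # expect exactly two fields
        $symbols{$_} = 1 for split //, $word;   # one entry per character
    }
    return sort keys %symbols;
}

# Stand-in for the real data file: the sample lines as raw UTF-8 bytes.
my $bytes = encode('UTF-8', "1 και\n2 να\n");

# The :encoding(UTF-8) layer turns bytes into characters as they are read.
open(my $fh, '<:encoding(UTF-8)', \$bytes)
    or die "can't open in-memory data: $!";
my @letters = letters_in($fh);
close $fh;

# Encode characters back to UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @letters;
```

With the sample data this collects four distinct letters. The sketch uses ':encoding(UTF-8)' on output as the stricter sibling of ':utf8'; either one encodes the output for a UTF-8 terminal.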

Thanks again. It seems to be working now. Thank you, Ben, for the Perl
style tips.

I'm sorry about the confusing source code. I tried to simplify it and
I removed some lines by mistake.
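
For anyone who finds this later: the six garbled lines printed for the three-letter word in my original output are consistent with split // cutting undecoded UTF-8 bytes apart, since each of these Greek letters takes two bytes. Here is a small sketch of the difference, as I understand it. The variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # the Greek literal below is source text
use Encode qw(encode decode);

my $bytes = encode('UTF-8', 'και');     # what an undecoded read hands you
my $chars = decode('UTF-8', $bytes);    # what a decoding layer hands you

my @byte_pieces = split //, $bytes;     # splits between individual bytes
my @char_pieces = split //, $chars;     # splits between whole characters

printf "undecoded: %d pieces\n", scalar @byte_pieces;   # prints 6
printf "decoded:   %d pieces\n", scalar @char_pieces;   # prints 3
```

Printing the whole undecoded word still looked right because the bytes reached my UTF-8 terminal unchanged; only the individual halves of each letter were invalid on their own.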
 
