russian chars

Bart van den Burg

Hi

I'm getting interested in Eastern European languages (by now I know
a few words of Latvian), and I'm trying to convert Russian texts to phonetic
English. However, I think Perl is tripping over the charsets here, because the
only output I get from running the script below is

Use of uninitialized value in print at russisch.pl line 20.

25 times

Now I'm wondering: am I missing something extremely stupid here, or is this
really a problem? And if the latter is true: what can I do about it?
I save the text as UTF-8 in Windows over Samba to a Linux computer. When I
do "cat russisch.pl" in an SSH shell with the charset set to UTF-8, everything
displays correctly.

Thx
Bart

------------------------------------
#!/usr/bin/perl -w
use strict;

my $line = "????????????!";

my %write = (
"?" => "a",
"?" => "d",
"?" => "Z",
"?" => "r",
"?" => "v",
"?" => "s",
"?" => "t",
"?" => "oo",
"?" => "i",
"e" => "ye",
);

foreach (split(//, $line)) {
print $write{$_};
}

print "\n";
 
James Henson

Hi Bart,
Use of uninitialized value in print at russisch.pl line 20.

This probably means that you are trying to print a value that
isn't in your hash. Your example is somewhat distorted; I
assume the question marks should be UTF-8 encoded
Cyrillic characters.
my $line = "????????????!";

This would be your input line, the line you are trying
to translate?
my %write = (
"?" => "a",
"?" => "d",
"?" => "Z",
"?" => "r",
"?" => "v",
"?" => "s",
"?" => "t",
"?" => "oo",
"?" => "i",
"e" => "ye",
);

Here you establish some mappings from the Cyrillic
characters to a phonetic spelling, I presume.
foreach (split(//, $line)) {
print $write{$_};
}

I think this should work, but I guess your input line
contains something that is not in your map.

Perhaps you can check whether $write{$_} is defined
and, if it isn't, just print $_ instead of the mapping?
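
A minimal sketch of that fallback, assuming a made-up ASCII-only map so the example does not depend on any encoding. The sub name transliterate and the map entries are hypothetical, not taken from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical ASCII-only map, used purely for illustration.
my %write = (
    "a" => "ah",
    "e" => "ye",
);

# Return the mapped spelling of each character, or the character
# itself when the map has no entry for it.
sub transliterate {
    my ($text) = @_;
    return join "", map { defined $write{$_} ? $write{$_} : $_ } split //, $text;
}

print transliterate("tea!"), "\n";   # prints "tyeah!"
```

With this check in place, unmapped characters such as "t" and "!" pass through unchanged instead of triggering the "uninitialized value" warning.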

Hope this helps,
James
 
Lāʻie Techie

Hi

I'm getting interested in Eastern European languages (by now I know
a few words of Latvian), and I'm trying to convert Russian texts to phonetic
English. However, I think Perl is tripping over the charsets here, because the
only output I get from running the script below is

Use of uninitialized value in print at russisch.pl line 20.

25 times

OK, that would indicate that $write{$_} is not defined. Are you positive
that $write{$_} is defined for all the characters of your input string?
Perhaps ($write{$_} || $_) would be better.

Now I'm wondering: am I missing something extremely stupid here, or is this
really a problem? And if the latter is true: what can I do about it?
I save the text as UTF-8 in Windows over Samba to a Linux computer. When I
do "cat russisch.pl" in an SSH shell with the charset set to UTF-8, everything
displays correctly.

Let's start by consulting a few pages from the documentation:
perldoc perluniintro
perldoc perlunicode

If you are using Perl older than 5.8, use the utf8 pragma!
Also, as a compatibility measure, the "use utf8" pragma must be explicitly
included to enable recognition of UTF-8 in the Perl script itself (i.e. in
UTF-8 string literals).
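
To illustrate why this matters, here is a small sketch of the difference between bytes and characters. It assumes the core Encode module and writes the UTF-8 bytes of one Cyrillic letter as escapes, so it does not depend on how the file is saved. Undecoded, the literal is two bytes, and each byte is looked up in the hash separately:

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes of CYRILLIC SMALL LETTER DE (U+0434), written as byte
# escapes so this sketch does not depend on the file's own encoding.
my $bytes = "\xD0\xB4";

# Without decoding, Perl sees two separate bytes.
my @as_bytes = split //, $bytes;

# After decoding, Perl sees one character.
my $chars    = decode("UTF-8", $bytes);
my @as_chars = split //, $chars;

printf "bytes: %d, characters: %d\n", scalar @as_bytes, scalar @as_chars;
# prints "bytes: 2, characters: 1"
```

If each of the 12 question marks in the posted line stood for one two-byte Cyrillic letter, the line plus the "!" would be 25 bytes, which would match the 25 warnings. That is only a guess, since the original characters did not survive posting.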
my $line = "????????????!";

Escape non-Latin-1 characters.

For example, my nick is spelled "L\x{101}\x{2BB}ie Techie"
\x{101} is an 'a' with a macron (called kahako in Hawaiian)
\x{2BB} is the glottal stop (okina)

Perl replaces these escapes with the actual characters when it compiles the
script, so the performance hit is next to nil.
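
Applied to the script, the same idea might look like the sketch below. The two map entries and the input word are assumptions chosen for illustration, since the original characters were lost in the post:

```perl
use strict;
use warnings;

# Hypothetical two-letter map, with the keys written as escapes.
my %write = (
    "\x{430}" => "a",   # CYRILLIC SMALL LETTER A
    "\x{434}" => "d",   # CYRILLIC SMALL LETTER DE
);

# Hypothetical input: the two letters above followed by "!".
my $line = "\x{434}\x{430}!";

# Map each character, keeping any character that has no entry.
my $out = join "", map { exists $write{$_} ? $write{$_} : $_ } split //, $line;

print "$out\n";   # prints "da!"
```

Because the escapes are above \x{FF}, Perl treats the strings as characters rather than bytes, so split(//, ...) yields one element per letter regardless of how the file was saved.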
my %write = (
"?" => "a",
"?" => "d",
"?" => "Z",
"?" => "r",
"?" => "v",
"?" => "s",
"?" => "t",
"?" => "oo",
"?" => "i",
"e" => "ye",
);

You may have to specify the output encoding using binmode so that Perl won't
complain about wide characters.
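
A minimal sketch of that, assuming the output should be UTF-8. It only matters when you print the Cyrillic characters themselves, and the word used here is a made-up example:

```perl
use strict;
use warnings;

# Encode everything printed to STDOUT as UTF-8. Without this, printing
# a character above \x{FF} produces a "Wide character in print" warning.
binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical two-letter Cyrillic word, written as escapes.
my $word = "\x{434}\x{430}";

print "$word\n";
```

If the script only ever prints the Latin transliteration, the layer is harmless but not strictly needed.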
foreach (split(//, $line)) {
print $write{$_};
}
print "\n";

Aloha,
La'ie Techie
 
Ted Zlatanov

I'm getting interested in Eastern European languages (by now
I know a few words of Latvian), and I'm trying to convert Russian texts
to phonetic English.

Without commenting on your particular script, have you looked at CPAN
modules such as Convert::Translit and Lingua::RU::Translit?

http://search.cpan.org/search?query=translit&mode=all

What you're trying to do is called "transliteration" and it's a
well-known algorithm for Russian writing. You can just do a
web/newsgroup search for "cyrillic transliteration perl" and you'll
probably find lots of useful information. I did.

Note the text may be Russian, but the characters (and alphabet) are
Cyrillic. Being Bulgarian, I notice those things :)

Ted
 
