russian chars

Bart van den Burg · Nov 14, 2003

Hi

I'm starting to take an interest in East European languages (by now I know
a few words of Latvian), and I'm trying to convert Russian texts to phonetic
English. However, I think Perl is tripping over the charsets here, because the
only output I get from running the script below is

Use of uninitialized value in print at russisch.pl line 20.

25 times

Now I'm wondering: am I missing something extremely stupid here, or is this
really a problem? And if the latter is true, what can I do about it?
I save the text as UTF-8 in Windows over Samba to a Linux computer. When I
do "cat russisch.pl" in an SSH shell with the charset set to UTF-8, everything
displays correctly.

Thx
Bart

------------------------------------
#!/usr/bin/perl -w
use strict;

my $line = "Здравствуйте!";

my %write = (
"а" => "a",
"д" => "d",
"З" => "Z",
"р" => "r",
"в" => "v",
"с" => "s",
"т" => "t",
"у" => "oo",
"й" => "i",
"е" => "ye",
);

foreach (split(//, $line)) {
print $write{$_};
}

print "\n";

James Henson · Nov 14, 2003

Hi Bart,

Use of uninitialized value in print at russisch.pl line 20.

This probably means that you try to print a value that
isn't in your hash. Your example is somewhat distorted, I
assume the question marks should be utf-8 encoded
cyrillic chars.

my $line = "????????????!";

This would be your input line, the line you are trying
to translate?

my %write = (
"?" => "a",
"?" => "d",
"?" => "Z",
"?" => "r",
"?" => "v",
"?" => "s",
"?" => "t",
"?" => "oo",
"?" => "i",
"e" => "ye",
);

Here you establish some mappings from the cyrillic
characters to phonetic spelling, I presume.

foreach (split(//, $line)) {
print $write{$_};
}

I think this should work, but I guess your input line
contains something that is not in your map set.

Perhaps you can try to see if $write{$_} is defined,
and if it isn't just print $_ instead of the mapping?
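
A minimal sketch of that check. The helper name transliterate and the tiny ASCII-only table are placeholders for illustration, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: look every character up in the table and
# fall back to the character itself when there is no mapping.
sub transliterate {
    my ($text, $map) = @_;
    return join '',
        map { exists $map->{$_} ? $map->{$_} : $_ }
        split //, $text;
}

# Placeholder table (ASCII only) just to show the fallback.
my %write = (a => 'A', b => 'B');

print transliterate('abc!', \%write), "\n";   # prints "ABc!"
```

With the fallback in place, unmapped input shows up in the output unchanged instead of raising "Use of uninitialized value" warnings, which makes it easier to see which characters the table is missing.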

Hope this helps,
James

Lāʻie Techie · Nov 18, 2003

Hi

I'm starting to take an interest in East European languages (by now I know
a few words of Latvian), and I'm trying to convert Russian texts to phonetic
English. However, I think Perl is tripping over the charsets here, because the
only output I get from running the script below is

Use of uninitialized value in print at russisch.pl line 20.

25 times

OK, that would indicate that $write{$_} is not defined. Are you positive
that $write{$_} is defined for all the characters of your input string?
Perhaps ($write{$_} || $_) would be better.

Now I'm wondering: am I missing something extremely stupid here, or is this
really a problem? And if the latter is true, what can I do about it?
I save the text as UTF-8 in Windows over Samba to a Linux computer. When I
do "cat russisch.pl" in an SSH shell with the charset set to UTF-8, everything
displays correctly.

Let's start by consulting a few pages from the documentation:
perldoc perluniintro
perldoc perlunicode

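Use the utf8 pragma! Perl 5.6 needed it for Unicode support in general; from
5.8 on it is still needed when the script itself contains UTF-8. As perlunicode
puts it: as a compatibility measure, the "use utf8" pragma must be explicitly
included to enable recognition of UTF-8 in the Perl scripts themselves (i.e.
UTF-8 string literals).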
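
A small sketch of what the pragma changes. The input string is an assumption (a 12-letter Cyrillic greeting plus "!"), chosen because it would account for the 25 warnings reported above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # string literals in this file are UTF-8
use Encode qw(encode);

my $line = "Здравствуйте!";    # assumed input, not confirmed by the post

# With "use utf8" the literal is a string of characters.
my $chars = length $line;

# The same text as raw UTF-8 bytes, which is roughly what split(//)
# walks over when the pragma is missing.
my $bytes = length encode('UTF-8', $line);

binmode STDOUT, ':encoding(UTF-8)';
print "characters: $chars, bytes: $bytes\n";   # characters: 13, bytes: 25
```

Each Cyrillic letter takes two bytes in UTF-8, so without the pragma the loop sees 25 single bytes, none of which matches a hash key.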

my $line = "????????????!";

Or escape the non-Latin-1 characters.

For example, my nick is spelled "L\x{101}\x{2BB}ie Techie":
\x{101} is an 'a' with a macron (called kahakō in Hawaiian)
\x{2BB} is the glottal stop (ʻokina)

Perl resolves these escapes when it compiles the string literal, so the
performance hit is next to nil.
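
Applied to the script in question, that could look like the sketch below. The code points are assumptions inferred from the ten mappings (for example U+0417 for the capital letter mapped to "Z"), and the source stays pure ASCII, so no pragma is needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: a 12-letter Cyrillic greeting followed by "!".
my $line = "\x{417}\x{434}\x{440}\x{430}\x{432}\x{441}"
         . "\x{442}\x{432}\x{443}\x{439}\x{442}\x{435}!";

my %write = (
    "\x{430}" => "a",    # CYRILLIC SMALL LETTER A
    "\x{434}" => "d",    # CYRILLIC SMALL LETTER DE
    "\x{417}" => "Z",    # CYRILLIC CAPITAL LETTER ZE
    "\x{440}" => "r",    # CYRILLIC SMALL LETTER ER
    "\x{432}" => "v",    # CYRILLIC SMALL LETTER VE
    "\x{441}" => "s",    # CYRILLIC SMALL LETTER ES
    "\x{442}" => "t",    # CYRILLIC SMALL LETTER TE
    "\x{443}" => "oo",   # CYRILLIC SMALL LETTER U
    "\x{439}" => "i",    # CYRILLIC SMALL LETTER SHORT I
    "\x{435}" => "ye",   # CYRILLIC SMALL LETTER IE
);

# Map character by character; anything unmapped (here "!") passes through.
my $out = join '', map { exists $write{$_} ? $write{$_} : $_ } split //, $line;
print "$out\n";   # prints "Zdravstvooitye!"
```

The output is plain ASCII here, so this version prints without any encoding layer on STDOUT.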

my %write = (
"?" => "a",
"?" => "d",
"?" => "Z",
"?" => "r",
"?" => "v",
"?" => "s",
"?" => "t",
"?" => "oo",
"?" => "i",
"e" => "ye",
);

You may have to specify the output encoding using binmode, e.g.
binmode(STDOUT, ":utf8"), so that Perl won't complain about wide characters.

foreach (split(//, $line)) {
print $write{$_};
}

print "\n";
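
A hedged sketch of the binmode step, using three arbitrary Cyrillic letters as sample data. It matters as soon as the script prints a character above 0xFF, for instance an unmapped Cyrillic letter that is passed through unchanged:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "\x{416}\x{443}\x{43A}";   # three Cyrillic letters, as escapes

# Declare the encoding of STDOUT once. Without this, printing $word
# triggers a "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';
print $word, "\n";

# The same layer on an in-memory filehandle makes the byte count visible.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer
    or die "cannot open in-memory buffer: $!";
print {$fh} $word;
close $fh;

my $byte_count = length $buffer;      # UTF-8 bytes written for 3 characters
```

The ":encoding(UTF-8)" layer is the stricter relative of ":utf8"; either one tells Perl how to turn characters into bytes on output.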

Aloha,
La'ie Techie

Ted Zlatanov · Nov 18, 2003

I'm starting to get some interest in east european languages (by now
i know a few words Latvian), and I'm trying to convert russian texts
to fonetical english.

Without commenting on your particular script, have you looked at the
CPAN Convert::Translit, Lingua::RU::Translit, etc. modules?

http://search.cpan.org/search?query=translit&mode=all

What you're trying to do is called "transliteration", and it's a
well-known technique for Russian text. You can just do a
web/newsgroup search for "cyrillic transliteration perl" and you'll
probably find lots of useful information. I did.

Note that the text may be Russian, but the characters (and alphabet) are
Cyrillic. Being Bulgarian, I notice those things.

Ted
