i18n?

  • Thread starter Martin Strömberg

Martin Strömberg

I've been looking for i18n for perl so that I can get error messages
in Swedish. I haven't found anything, not for _any_ other language.

Is there no such thing? Or have I just searched the wrong places or
for the wrong things?
 
Oliver 'ojo' Bedford

On Wed, 02 Dec 2009 10:52:08 +0000, Martin Strömberg wrote:
I've been looking for i18n for perl so that I can get error messages
in Swedish. I haven't found anything, not for _any_ other language.

Is there no such thing? Or have I just searched the wrong places or for
the wrong things?

I haven't really thought about it, but do you think that this would
be a good thing?

The application should be localised, so the user can easily understand
the error message. But for the programming language itself, I find it
more important to be able to search the web
(or any other source) for the literal message to get further help.
Besides, I would consider English the lingua franca of
science and technology nowadays.

Sorry for not being really helpful.

Oliver
 
Martin Strömberg

Which messages are you talking about? Those generated by perl itself
cannot be localised. There has been some discussion of the possibility
in the past, but (among other things) there's just too much code out
there that does regex matching on $@ for this to be very practical
(start with diagnostics.pm, for instance). Most modules don't provide
localized error messages either.
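A minimal sketch of the kind of code that paragraph refers to: it decides what went wrong by pattern-matching the English text in $@. The helper name classify_error is made up for illustration; the two message texts are ones perl itself produces. If these messages were translated, both patterns would silently stop matching.

```perl
use strict;
use warnings;

# Classify a failure by matching the English text that perl puts in $@.
# classify_error is a hypothetical helper, not part of any module.
sub classify_error {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return 'ok' if $ok;
    return 'division'       if $@ =~ /^Illegal division by zero/;
    return 'missing module' if $@ =~ /^Can't locate \S+ in \@INC/;
    return 'unknown';
}

my $zero = 0;
print classify_error(sub { my $x = 1 / $zero }), "\n";                 # division
print classify_error(sub { require No::Such::Module::Here }), "\n";   # missing module
print classify_error(sub { 1 }), "\n";                                 # ok
```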

Yes, I mean those generated by perl itself. It would help my son, whom
I'm trying to teach perl, understand what's wrong.
 
Ilya Zakharevich

That is for module authors to use to internationalise their modules. It
doesn't help with the messages perl generates itself.

Thinking about it more:

A) I find the argument that Perl is a "lingua franca of science and
technology" very misplaced. *A Perl application* may be designed
for use by children and/or music lovers.

B) We need something which works as English during parsing, but is
printed out in Bengali. This looks like overloading.

Unfortunately, when I designed overloading, I did not think *at
all* about overloading objects with string semantics. There was
no provision to treat REx matching specially during overloading,
treat print() specially, treat substr() specially, treat index()
specially etc... AFAIK, the current overloading behaves the same
way.

Is there code which analyses $@ any other way than doing =~? If
not, then making overloaded-stringify inspect whether it is
called during REx-matching might be possible (at least with some
modification of perl C code to make the latter condition easier
to recognize...).

Basically, what I think of is making $@ into a 2-headed beast, with 2
different STRING values. What do people think?
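A rough sketch of what such a two-headed value could look like with today's overloading, under a big assumption: it only works for errors thrown as objects by Perl code, not for the plain strings the perl executable puts in $@. The class name TwoHeadedError, the method for_display and the Swedish text are all illustrative. Stringification always yields English, so existing $@ =~ /.../ checks keep working, and the second head is reached through an explicit method rather than by detecting REx context.

```perl
use strict;
use warnings;

package TwoHeadedError;    # hypothetical class, for illustration only

# Stringification (and therefore =~ on the object) always sees the English text.
use overload '""' => sub { $_[0]{english} }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless {
        english   => $args{english},
        localized => $args{localized} || {},
    }, $class;
}

# Text meant for human readers; falls back to English if no translation exists.
sub for_display {
    my ($self, $lang) = @_;
    return $self->{localized}{$lang} // $self->{english};
}

package main;

eval {
    die TwoHeadedError->new(
        english   => "File not found\n",
        localized => { sv => "Filen hittades inte\n" },   # illustrative translation
    );
};
my $err = $@;
print "legacy check still works\n" if $err =~ /^File not found/;
print $err->for_display('sv');
```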

Ilya
 
Ilya Zakharevich

Um, yes? Who are you arguing with? :)

Whoever wrote this (earlier in the thread).

There is now. I submitted a patch to add 'qr' overloading to 5.12, which
is called when an overloaded object is used on the RHS of =~ or is
interpolated into a regex. Since 5.12 has true REGEX SVs, it seemed
silly not to have a corresponding 'type-cast' overload.
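As a small illustration of the 'qr' overload described here (available from perl 5.12), the sketch below defines a hypothetical class ErrorPattern whose objects can stand on the right-hand side of =~. The class name and the sample message are assumptions made for the example.

```perl
use strict;
use warnings;
use 5.012;    # the 'qr' overload key needs perl 5.12 or later

package ErrorPattern;    # hypothetical class, for illustration only

# Called whenever the object is used as a regex, e.g. on the RHS of =~.
# It must return a compiled regex.
use overload 'qr' => sub {
    my ($self) = @_;
    my $quoted = quotemeta $self->{text};
    return qr/$quoted/i;
};

sub new {
    my ($class, $text) = @_;
    return bless { text => $text }, $class;
}

package main;

my $pattern = ErrorPattern->new('division by zero');
my $message = "Illegal division by zero at demo.pl line 3.\n";
print "matched\n" if $message =~ $pattern;    # prints "matched"
```

Note that this covers only the pattern side; the string on the left of =~ still goes through ordinary stringification.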

Good. But what I was "hinting at" was the LHS of "=~" ...; which is a
string. And "a string" != "a REx" (after appropriate overloading of != ;-)

There was some discussion of this issue on p5p a while ago, though that
was before qr-overload was different from string-overload. IIRC the
general feeling was that trying to get people to move away from
string-matching $@ by defining a set of numeric error codes for the core
perl errors was probably the best way forward.

I would hate numeric errors.

(When my system boots, there is a chance
of getting an error message which essentially says:

!!! SYS2025
!!! SYS2027

Very enjoyable. (Explanation: the system knows many different ways to
expand these to much more human-readable form. But
the system did not even start booting yet, so has no
idea in which language to bash you...))

I would very much prefer the "short descriptive English strings" approach.
ERROR_DISK_READ, ERROR_DISK_NONBOOT would have similar (un)convenience
for paper manual lookup, and would have some chance for the meaning to
be guessed without the manual. [*]
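A sketch of how the "short descriptive strings" idea might look for errors raised from Perl code, assuming a made-up exception class CodedError: the handler compares a stable code instead of pattern-matching the message, so the human-readable text could be changed or translated freely. Nothing like this exists for the messages generated by perl itself.

```perl
use strict;
use warnings;

package CodedError;    # hypothetical exception class

# Stringifies to "CODE: text" so that printing $@ stays readable.
use overload '""' => sub { $_[0]{code} . ': ' . $_[0]{text} }, fallback => 1;

sub throw {
    my ($class, $code, $text) = @_;
    die bless { code => $code, text => $text }, $class;
}

sub code { return $_[0]{code} }

package main;

use Scalar::Util 'blessed';

# Decide what to do by comparing the code, never the message text.
sub handle {
    my ($action) = @_;
    return 'ok' if eval { $action->(); 1 };
    my $err = $@;
    if (blessed($err) && $err->isa('CodedError')) {
        return 'retry'   if $err->code eq 'ERROR_DISK_READ';
        return 'give up' if $err->code eq 'ERROR_DISK_NONBOOT';
    }
    return 'unknown';
}

print handle(sub { CodedError->throw(ERROR_DISK_READ => 'cannot read sector') }), "\n";   # retry
```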

Anyway, this is a pipe dream. The code DOES do $@ =~ /foo/.

Ilya

[*] P.S. On the other hand, one of my friends worked as
"non-customer support" in a certain establishment of
more or less technical nature. Non-customers were kinda
engineers. So a call about the error above might sound like:

The 1st and 3rd symbols look like snakes; the second forks;
then comes number 2025. What to do?

(I did not believe this at first, but this did not sound
like a joke.) In such a situation, reducing the number of
distinct non-digits to two DOES help...
 
Ilya Zakharevich

It sounds entirely believable to me, assuming the caller spoke a
language that doesn't use the Latin alphabet but does use Arabic
numerals.

Of course. What made it hard to believe is that the country in
question is (at least to some extent) a part of "extended Europe"
nowadays (IIRC, it takes part in some European sport competitions etc).

And AFAIU the story does not make sense applied to, e.g., Russian-speakers...

Ilya
 
Martin Strömberg

That is the problem, yes.

I'm not a perl implementation hacker, but I have problems seeing the
problem.

We take that i18n module and make it a pragma (if it isn't
already). Or invent a new one. In my example below I've called this
pragma "i18n".

Then if "no i18n" is in effect, which is the default, "$@" will be what
it always has been, i.e. in English.

However, if a program/module uses the pragma "use i18n Swedish" or "use
i18n svenska" or whatever the syntax should be, then every "$@" in
this block/context (or whatever the term is) will suddenly be in
Swedish. And as the programmer added the "use i18n Swedish" he will be
aware of this so he will match "$@" on Swedish strings.


Perhaps this is already what that i18n module on CPAN does.

And the problem is in the perl implementation code there is a lot
matching "$@" on English strings, which are the strings I want to be
Swedish.


Hmm. Perhaps we should just make a perl wrapper (perlint?) that
translates the output from a perl program if it matches a perl error
message, like "perlint Swedish my_perl_script.pl"? Or perhaps only if
the exit status indicates failure?
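The core of such a wrapper could be sketched as a table of patterns for known English messages plus a function that rewrites matching lines. Everything here is hypothetical: the name translate_line, the table layout and the rough Swedish wording are assumptions, and a real "perlint" would still have to run the script and capture its standard error.

```perl
use strict;
use warnings;

# Hypothetical table: pattern for the English message => sprintf template
# for the translated one. The Swedish wording is only a rough illustration.
my @SWEDISH = (
    [ qr/^Illegal division by zero at (.+) line (\d+)\.$/
        => 'Division med noll i %s, rad %d.' ],
    [ qr/^Can't locate (\S+) in \@INC/
        => 'Kan inte hitta %s i @INC' ],
);

# Translate one line of output if it matches a known message,
# otherwise pass it through untouched.
sub translate_line {
    my ($line, $table) = @_;
    for my $entry (@$table) {
        my ($re, $template) = @$entry;
        if (my @parts = $line =~ $re) {
            return sprintf $template, @parts;
        }
    }
    return $line;
}

print translate_line('Illegal division by zero at t.pl line 7.', \@SWEDISH), "\n";
# Division med noll i t.pl, rad 7.
print translate_line('Hello, world', \@SWEDISH), "\n";
# Hello, world
```

Because the translation happens outside the program, code inside the script that matches on the English $@ is not affected.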



MartinS
 
Ilya Zakharevich

I'm not a perl implementation hacker, but I have problems seeing the
problem.

We take that i18n module and make it a pragma (if it isn't
already). Or invent a new one. In my example below I've called this
pragma "i18n".

First question: do you understand the difference between lexical and
dynamic scope? If you do, which one is your pragma implementing, and
why would this help?

However, if a program/module uses the pragma "use i18n Swedish" or "use
i18n svenska" or whatever the syntax should be, then every "$@" in
this block/context (or whatever the term is) will suddenly be in
Swedish.

What do you mean by "in this block/context"? Created in this context?
Read in this context?

Do you understand that some $@ are created by the Perl executable, and
some by scripts? Which do you mean?

And the problem is in the perl implementation code there is a lot
matching "$@" on English strings, which are the strings I want to be
Swedish.

Can't parse what you wanted to say...

Yours,
Ilya
 
Ilya Zakharevich

There is now. I submitted a patch to add 'qr' overloading to 5.12, which
is called when an overloaded object is used on the RHS of =~ or is
interpolated into a regex. Since 5.12 has true REGEX SVs, it seemed
silly not to have a corresponding 'type-cast' overload.

Ah... This is probably the reason why FreezeThaw is failing its tests
on 5.11... Whoever added "true REx" SV did not fix the modules broken
by this change...

Ilya
 
Ilya Zakharevich

I think the most immediate response from p5p would be 'that's what the
new ~~ operator is for, which is already overloadable', and in general I
would agree that having two 'stringifications' was seriously confusing.
However, since this is (just) for back-compat hacks, it's possible a
case could be made. Maybe I should just do up a patch...

Myself, I would not be so quick. As I see it, the problem with
designing a reasonable string-overloading framework is that I do not
have many SIGNIFICANTLY different models to serve as example
applications of this framework.

I agree with Larry's estimate that "to implement a feature, I must
want it 3 times first". Usually, three "orthogonal" applications
provide enough insight to design pilot semantics. However, I have
only 2.3 examples in mind:

a) potentially infinite streams: consider

     $Pi = infinite_precision_Pi;
     print OK if $Pi =~ /123456789/;

   or consider

     $/ = qr/[1-9][0-9]{50}/; # or something more vicious
     $in = <STDIN>; # assume a pipe
     system $external_program;

(in second example, we may want to gobble as few characters as
possible while achieving the match for $/).
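Perl's $/ does not accept a regex, so the second example is hypothetical. The "gobble as few characters as possible" behaviour can be approximated by hand, as in this sketch: read one character at a time and stop as soon as the pattern matches. The function name read_until_match is made up, and re-running the match after every character is slow, so this shows the semantics only.

```perl
use strict;
use warnings;

# Read from $fh one character at a time and return as soon as the text read
# so far contains a match for $re, leaving the rest of the stream unread.
sub read_until_match {
    my ($fh, $re) = @_;
    my $buffer = '';
    while (defined(my $char = getc($fh))) {
        $buffer .= $char;
        return $buffer if $buffer =~ $re;
    }
    return $buffer;    # end of input without a match
}

# An in-memory filehandle stands in for the pipe.
my $data = 'ab12cd';
open my $in, '<', \$data or die "cannot open in-memory handle: $!";

my $record = read_until_match($in, qr/[0-9]{2}/);
my $rest   = do { local $/; <$in> };
print "record=$record rest=$rest\n";    # record=ab12 rest=cd
```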

b1) Strings with out-of-bound markup. E.g., colored output to TTY;
or *parsed* HTML streams (the string value is what you get by
cut&paste, but all the formatting info is there, just out-of-bound).

You want to look for certain "features" (e.g., match RExes on
in-bound content + some restrictions on out-of-bound - as in:
find "foo|bar" at start of a "subdivision" [table cell, or div,
or whatsit]).
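Without string overloading, the nearest approximation is to keep the markup in a separate structure and combine it with the match offsets by hand. In this sketch the out-of-band information is reduced to a list of offsets where a subdivision starts; the function name and the data layout are assumptions made for the example.

```perl
use strict;
use warnings;

# Return the offsets of the matches of $re in $text that begin exactly at one
# of the out-of-band "subdivision start" offsets listed in @$starts.
sub matches_at_starts {
    my ($text, $starts, $re) = @_;
    my %is_start = map { $_ => 1 } @$starts;
    my @hits;
    while ($text =~ /$re/g) {
        push @hits, $-[0] if $is_start{ $-[0] };    # $-[0] is the match offset
    }
    return @hits;
}

my $text        = 'foo bar baz foo';
my @cell_starts = (0, 12);    # hypothetical markup: cells begin at these offsets
my @hits = matches_at_starts($text, \@cell_starts, qr/foo|bar/);
print "@hits\n";    # 0 12
```

The "bar" at offset 4 matches the pattern but is skipped because no subdivision starts there.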

b2) Same, only not for read-only access, but for modification (as in
my interview to Perl Journal). E.g., suppose you want to
translate a chunk of data from HTML to LaTeX *inplace* (i.e., as
s/// is doing); the translation rules are very non-local; one
must either

re-gather all the non-local information again and again at
each point, or

gather it once, put it in markup, and use "local structure of
markup at every point" instead of re-gathering.

After this, to do actual translation, one wants to do needed
s/// without ruining the gathered non-local information.

(This is essentially what I do in cperl-mode to facify RExes;
the only difference is that in CPerl, I only touch out-of-bound
part of content. Consider the case when I need to use these
markups to convert RExes from Perl syntax to Emacs syntax...)

(As I said in the interview, Emacs has much better facilities
for string processing than Perl. One of the purposes of string
overloading should be to narrow the gap.)

It would be wonderful if one could use "2-headed 2-language strings"
as the third application of string overloading. The problem is that I
have no idea how one would like to EDIT such strings.

For example: on English strings, one could do something like
s/each/every/g; would one want to do s/$each/$every/ (with suitably
constructed $each and $every) on 2-language strings? So far this
looks too silly to be a help in semantical design...

Hope this helps,
Ilya
 
Ilya Zakharevich

As for fixing the modules: with the number of modules on CPAN, this is
impossible. People writing XS modules that grovel around in perl's guts
are expected to keep up with p5p. I believe some effort is made to smoke
CPAN and fix breakages (certainly there was a big push to get the CPAN
smokes clean before 5.10) but in the end p5p can't fix everything.

IMO, serialization is not "everything". But I'm, of course, biased...

Ilya
 
Ilya Zakharevich

(Regardless of the relative merits) FreezeThaw is not the 'standard'
serialization module. That would be Storable, which is core and thus
*will* have been fixed by p5p. AIUI, though, part of the reasoning
behind the current push to move modules out of core is to reduce the
amount of code the pumpking has to keep up to date.

Note that there is a kinda contradiction in what you wrote.
"FreezeThaw is not the 'standard'" ONLY because Storable was pushed TO
the core.

Yours,
Ilya
 
Martin Strömberg

First question: do you understand the difference between lexical and
dynamic scope? If you do, which one is your pragma implementing, and

Sort of. I know lexical scope. That's what makes closures possible, e.g.
Not sure exactly what dynamic scope is. My memory is fuzzy, but was
that what local did?

why would this help?

I was thinking lexically. The code is written to match English or
e. g. Swedish.

However I suppose if whatever is put in $@ is done in an "English"
module and then will be printed on screen for the user I surely would
like that to be in Swedish, which (I think) implies some dynamic
stuff.
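The difference between the two kinds of scope can be seen in a small sketch (the variable and function names are made up): "local" changes a package variable for everything called until the enclosing block exits, while "my" creates a new variable that called code cannot see.

```perl
use strict;
use warnings;

our $language = 'English';    # package variable, can be localized

sub current_language { return $language }

sub with_local {
    local $language = 'Swedish';    # dynamic scope: callees see the new value
    return current_language();
}

sub with_my {
    my $language = 'Swedish';       # lexical scope: a new variable, invisible to callees
    return current_language();
}

print with_local(), "\n";          # Swedish
print with_my(), "\n";             # English
print current_language(), "\n";    # English (the local value is gone again)
```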

What do you mean by "in this block/context"? Created in this context?

Just that you can do this with pragmas:

    use strict;
    # Strictly coded code here.
    {
        no strict;
        # Nasty code here.
    }
    # Strictly coded code here.

Read in this context?

I thought read in this context.
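A pragma that is scoped like strict can be sketched with the lexical hint hash %^H, following the approach in the perlpragma documentation. The pragma name i18n, the hint key and the language() function are all hypothetical, and the sketch only records which language was requested in the scope where code was compiled; it translates nothing.

```perl
use strict;
use warnings;
use 5.010;

BEGIN {
    package i18n;    # hypothetical pragma, defined inline for the demo

    # "use i18n 'Swedish'" records the language in the lexical hint hash.
    sub import {
        my ($class, $lang) = @_;
        $^H{'i18n/language'} = $lang;
        return;
    }

    # "no i18n" clears it for the rest of the enclosing block.
    sub unimport {
        $^H{'i18n/language'} = undef;
        return;
    }

    # Report the language in effect where the calling code was compiled.
    sub language {
        my $hints = (caller(0))[10];
        return ($hints && $hints->{'i18n/language'}) // 'English';
    }

    $INC{'i18n.pm'} = __FILE__;    # stop "use" from searching @INC
}

my @seen = (i18n::language());
{
    use i18n 'Swedish';
    push @seen, i18n::language();
}
push @seen, i18n::language();
print "@seen\n";    # English Swedish English
```

The setting follows the source text, not the call chain: a function compiled outside the block reports English even when called from inside it.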

Do you understand that some $@ are created by Perl executable, and
some by scripts? Which do you mean?

Doesn't matter. When $@ is read while "use i18n Swedish" is in effect
it should be in Swedish, which for me implies some dynamic translation.

Can't parse what you wanted to say...

Sorry. I was trying to ask if the perl implementation has code
something like this

    if ( $@ =~ m/Magic string/ ) {
        # Do this.
    } elsif ( $@ =~ m/Another magic string/ ) {
        # Do that.
    }


Thanks!
 
Martin Strömberg

*What* should be in Swedish? Messages from the perl C code? Messages
from modules?

Perhaps I misunderstand something? This is how it should work:

$@ = "the water";
use i18n Swedish;
print "$@\n"; # Prints "vattnet".
use i18n German;
print "$@\n"; # Prints "der Wasser". (Sorry for any genus maltreatment.)
use i18n French;
print "$@\n"; # Prints "l'oeau". (Sorry for any misspelling.)

Except (by necessity) limited to whatever strings are in perl (and
modules when completed).
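Something close to this read-time behaviour can be sketched for values created by Perl code, using an object whose stringification looks up a translation in whatever language is currently selected. The class LocalizedMessage, the $LANGUAGE variable and the translation table are hypothetical, and the language is switched with "local" (dynamic scope) rather than with a pragma.

```perl
use strict;
use warnings;

package LocalizedMessage;    # hypothetical class, for illustration only

our $LANGUAGE = 'English';   # consulted every time a message is stringified

my %TRANSLATIONS = (
    'the water' => { Swedish => 'vattnet', German => 'das Wasser', French => "l'eau" },
);

# Stringify to the translation for the current language, else to English.
use overload '""' => sub {
    my ($self) = @_;
    return $TRANSLATIONS{ $self->{english} }{$LANGUAGE} // $self->{english};
}, fallback => 1;

sub new {
    my ($class, $english) = @_;
    return bless { english => $english }, $class;
}

package main;

my $message = LocalizedMessage->new('the water');
print "$message\n";    # the water
{
    local $LocalizedMessage::LANGUAGE = 'Swedish';
    print "$message\n";    # vattnet
    print "English pattern no longer matches\n" if $message !~ /water/;
}
```

The last line shows the cost discussed in this thread: once the printed form changes, code that matches on the English text stops working.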
 
 
