i18n - how best to provide multilingual content

CptDondo

I have a small, embedded app that uses a webserver to serve up pages
showing status, etc.

Right now all the pages are hard-coded in English. We need to provide
multi-lingual support.

All of the pages are PHP generated. Ideally, I'd like the PHP
backend to serve up the language based on a) the user's locale, and if that
is not set, b) its own locale.

The PHP backend creates the pages on the fly from XML templates, so it
wouldn't be that hard for us to change the language.

But... I don't know the best way to do that. What is the current 'state
of the art' for language on demand in web content?

Thanks....
 
Toby Inkster

CptDondo said:
But... I don't know the best way to do that. What is the current 'state
of the art' for language on demand in web content?

Do you mean automatic translation; or do you mean serving up the best
choice of human-written translations?

Automatic translations are rubbish -- they are laughably bad, and will
present an entirely unprofessional image. Do not even consider using them,
except on a site that's intended to be ridiculed.

That's not to say that they're completely useless -- tools like Babelfish
are useful for the *visitor* if they find a foreign site that they would
like to read -- you can usually get the gist of it. But for the author,
they are rubbish.

For human-written translations, assuming you have got good translators,
the situation is much better. Catering for a visitor in their own language
shows that you're willing to make the extra effort to do business with
them.

Many companies will offer entirely different sites for each language. If
you have the resources to manage such a layout, it is often the best
choice because:

1. It allows URLs to be tailored to the language. e.g.
/en/information/garden/lawn-mower
/fr/renseignement/jardin/fauchage
which should help with multi-lingual search engine optimisation.

2. It allows for a different information focus in each language.
For example, I was once told by a translator that translating
technical manuals between cultures involves so much more than
word-for-word translation. People of different cultures expect
to find different things in their documentation. Americans expect
the manual to be a tour-de-force of the product's unique features,
virtually an advertisement for the product; Western Europeans
expect a fairly dry step-by-step explanation of how to use the
product to accomplish different aims; Eastern Europeans expect
information on how to repair the product when it breaks, as in
their experience, these things inevitably do.

3. It allows you to take baby-steps. Say you've decided you want
to expand into the German market, but you're not sure how much
business you'll do there, so you don't want to invest a lot of money
having your entire site translated into German. You may want to
just create a single-page site in German, with basic information
about your company, explain that the site's German translation is
still pending, that there is more information on the English
version of the site, and provide the telephone extension for Gunther,
who works in your New York office, but was born and raised in Munich.
As your German sales take off, you then plough back some of the money
into improving the German site. Perhaps one day, the German market
will be so important to you that you open an office in Berlin, and
allow them to maintain the German site directly.

The other approach with human-written translations is to have a single
site available in multiple languages. For example, you ask/detect a user's
preferred language, and then when they go to:

/information/garden/lawn-mower

a PHP script serves up the information in the correct language. If a
translation is not available for that particular page (say, it's a new
product, so the translators haven't finished with it yet), then you just
serve up the English page. This is a reasonably good method, but it
doesn't have advantages #1 and #2 above. It kind of has #3, but your
baby-steps look a little silly because they end up as a mixture of, in the
above example, German and English. This method can ease maintenance though.
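For the single-site approach, the fallback logic might look something like the following minimal PHP sketch. It assumes a hypothetical layout in which each page is stored as pages/<lang>/<slug>.html, with English as the fallback; the directory layout and the resolvePage() name are illustrative, not something from the post.

```php
<?php
// Hypothetical layout: pages/en/lawn-mower.html, pages/de/lawn-mower.html, ...
// Returns the path of the best available version of a page: the requested
// language if a translation exists, otherwise the fallback (English) page.
function resolvePage(string $baseDir, string $lang, string $slug, string $fallback = 'en'): ?string
{
    // Only accept plain slugs, so a request cannot walk outside $baseDir
    // (e.g. "../../etc/passwd").
    if (!preg_match('/^[a-z0-9-]+$/', $slug)) {
        return null;
    }
    foreach ([$lang, $fallback] as $candidate) {
        // Ignore anything that is not a two-letter language code.
        if (!preg_match('/^[a-z]{2}$/', $candidate)) {
            continue;
        }
        $path = "$baseDir/$candidate/$slug.html";
        if (is_file($path)) {
            return $path;
        }
    }
    return null; // neither a translation nor the English original exists
}
```

A front controller could then readfile() the returned path, or send a 404 when it is null.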

Always be careful not to let the translated versions of the site fall too
far behind the English version.
 
Andy Dingley

CptDondo said:
Right now all the pages are hard-coded in English. We need to provide
multi-lingual support.

For serious non-XSLT work, consider JSP instead of PHP. The i18n tools
are vastly better. Read the O'Reilly Java internationalization book,
if only as a guide to web i18n.

All of the pages are PHP generated. Ideally, I'd like the PHP
backend to serve up the language based on a) the user's locale, and if that
is not set, b) its own locale.

Make the selection completely user-selectable, with cookie persistence,
and with the methods you describe setting the default. It works just the
same by default, but it's more flexible for casual users who find
themselves using other people's computers. It's a real nuisance
otherwise!

The PHP backend creates the pages on the fly from XML templates, so it
wouldn't be that hard for us to change the language.

XML or XSLT? If you structure the data model reasonably well, it's
not hard to extract text strings stored in groups for each function,
one for each language. It's easier to manage the translation and
deployment, though, if the texts are grouped by language into separate
files and identified by a short identifier. The XSLT document()
function is especially handy.
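Andy's suggestion uses the XSLT document() function. Since the backend here is PHP, a comparable approach on the PHP side is to load one strings file per language with SimpleXML. The sketch below assumes a hypothetical file format such as strings-de.xml containing <strings><s id="myname">Mein Name</s></strings>; the element names and the loadStrings() function are assumptions for illustration.

```php
<?php
// Parse a hypothetical per-language strings file, e.g. strings-de.xml:
//   <strings><s id="myname">Mein Name</s><s id="save">Speichern</s></strings>
// and return an id => text map. Malformed XML yields an empty map.
function loadStrings(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc === false) {
        return [];
    }
    $map = [];
    foreach ($doc->s as $entry) {
        $map[(string) $entry['id']] = (string) $entry;
    }
    return $map;
}
```

In a page you would load it with something like loadStrings(file_get_contents("strings-$lang.xml")) and look labels up by their short id.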
 
Rik

Andy said:
Make the selection completely user-selectable, with cookie
persistence, and with the methods you describe setting the default.
It works just the same by default, but it's more flexible for casual
users who find themselves using other people's computers. It's a real
nuisance otherwise!

Agreed. This is the order in which I determine the language:
- Explicitly set (by a GET variable, or a pseudo-directory like /en/ or /de/ etc.
that a rewrite rule turns into one)
- Cookie
- HTTP-Accept-Language in the header
- Geo-IP info (there are free databases available, which are accurate
enough to determine the country most of the time)
- System default

After determining the language the cookie will be sent/overwritten with the
current choice.
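Rik's order of precedence translates fairly directly into code. Below is a minimal PHP sketch, assuming the caller passes in $_GET['lang'], $_COOKIE['lang'] and $_SERVER['HTTP_ACCEPT_LANGUAGE'] (the 'lang' parameter and cookie names are hypothetical). The Geo-IP step is left out because it needs an external database, and the q-value parsing is deliberately simplified.

```php
<?php
// Parse an Accept-Language header such as "de-DE,de;q=0.9,en;q=0.5"
// into lower-case language tags, most preferred first.
function parseAcceptLanguage(string $header): array
{
    $entries = [];
    foreach (explode(',', $header) as $index => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q > 0) { // q=0 means "not acceptable"
            $entries[] = [$tag, $q, $index];
        }
    }
    // Highest q first; equal q keeps the header's original order.
    usort($entries, fn($a, $b) => [$b[1], $a[2]] <=> [$a[1], $b[2]]);
    return array_column($entries, 0);
}

// Pick a language in Rik's order: explicit choice, cookie,
// Accept-Language, (Geo-IP omitted), system default.
function chooseLanguage(array $supported, string $default, ?string $explicit,
                        ?string $cookie, string $acceptLanguage): string
{
    foreach ([$explicit, $cookie] as $candidate) {
        if ($candidate !== null && in_array($candidate, $supported, true)) {
            return $candidate;
        }
    }
    foreach (parseAcceptLanguage($acceptLanguage) as $tag) {
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
        // "de-at" is close enough to "de" if only "de" is available.
        $primary = explode('-', $tag)[0];
        if (in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;
}
```

In a page you might call chooseLanguage(['en', 'de', 'fr'], 'en', $_GET['lang'] ?? null, $_COOKIE['lang'] ?? null, $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '') and then store the result with setcookie('lang', $lang, time() + 365 * 24 * 3600, '/') before any output, so the choice persists as Rik describes.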

Regards,
 
J.O. Aho

Andy said:
XML or XSLT? If you structure the data model reasonably well, it's
not hard to extract text strings stored in groups for each function,
one for each language. It's easier to manage the translation and
deployment, though, if the texts are grouped by language into separate
files and identified by a short identifier. The XSLT document()
function is especially handy.

I've seen such files in the desktop icon files of Gnome2 applications. They only
have one short line in each language, the application description, but those
files are big. Think how large the files will become if you have 20-30 languages
and have to replace the big file each time a language is updated or added. IMHO
it's easier to handle files that contain only one language; in the server-side
script it's easy to select the right language file and use a fallback if a
translation is missing.
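The one-language-per-file layout with a fallback could be sketched like this, assuming hypothetical files lang/en.php, lang/de.php and so on, each of which returns an id => text array. The file layout and the function names are assumptions; strings missing from a translation (or left empty) fall back to the English text.

```php
<?php
// Overlay a translation on the default-language strings. Ids missing from
// $translation, or present but still empty, keep the default text.
function withFallback(array $default, array $translation): array
{
    $translated = array_filter($translation, fn($text) => $text !== '');
    // array_replace() starts from $default and overwrites the keys that
    // also appear in $translated (extra keys would be appended).
    return array_replace($default, $translated);
}

// Hypothetical layout: "$dir/en.php", "$dir/de.php", ... each doing `return [...];`.
function loadLanguage(string $dir, string $lang, string $defaultLang = 'en'): array
{
    $default = require "$dir/$defaultLang.php";
    $translation = [];
    $file = "$dir/$lang.php";
    // Only plain two-letter codes, so $lang cannot point outside $dir.
    if (preg_match('/^[a-z]{2}$/', $lang) && is_file($file)) {
        $translation = require $file;
    }
    return withFallback($default, $translation);
}
```

Updating or adding a language then means replacing one small file, which is the advantage J.O. Aho points out.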
 
Captain Dondo

On Sat, 20 Jan 2007 14:27:37 +0100, Rik wrote:
Agreed. This is the order in which I determine the language:
- Explicitly set (by a GET variable, or a pseudo-directory like /en/ or /de/ etc.
that a rewrite rule turns into one)
- Cookie
- HTTP-Accept-Language in the header
- Geo-IP info (there are free databases available, which are accurate
enough to determine the country most of the time)
- System default

After determining the language the cookie will be sent/overwritten with the
current choice.

Thanks. I'll probably do something like that - I've thought about
using the 'HTTP-Accept-Language' var from the header. I just don't know
how many people actually set those correctly.

I guess I didn't phrase my question accurately enough; it has been a long
week.

I have XML templates that define item labels in a form. The XML has
various tags that provide nav info and so on. This is on an embedded
system, with only a small number of phrases that would need translation; I
probably have less than 200 phrases, mostly one and two words.

I have XML templates of the following form:

<item id="myname" value="" index="5" type="text">My Name</item>

The PHP backend reads that line and creates a form entry for myname, with
the label "My Name". What I want to do is to replace the English "My
Name" with the appropriate words in the user's language.

I'm thinking of a mechanism similar to the .po files, where the PHP
backend looks up the text in a translation file. Or even something like
this:

<item id="myname" value="" index="5" type="text" text="My Name"/>

and the PHP backend would look up the text value for "My Name" in a lookup
table for the user's language.
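The lookup described here might look roughly like this in PHP: read the text attribute with SimpleXML, look it up in a per-language table keyed by the English phrase, and fall back to the English text when no translation exists. The renderItem() name and the exact HTML it produces are hypothetical; the index attribute (field order) is ignored in this sketch.

```php
<?php
// Turn one template line such as
//   <item id="myname" value="" index="5" type="text" text="My Name"/>
// into a labelled form field, translating the label via $table
// (English phrase => translated phrase). Untranslated labels stay English.
function renderItem(string $itemXml, array $table): string
{
    $item = simplexml_load_string($itemXml);
    if ($item === false) {
        return '';
    }
    $id      = (string) $item['id'];
    $value   = (string) $item['value'];
    $type    = (string) $item['type'];
    $english = (string) $item['text'];
    $label   = $table[$english] ?? $english;

    // Escape everything that goes into the HTML.
    $h = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    return sprintf(
        '<label for="%s">%s</label> <input type="%s" id="%s" name="%s" value="%s">',
        $h($id), $h($label), $h($type), $h($id), $h($id), $h($value)
    );
}
```

Keying the table by the English phrase mirrors how .po files use the original string as the msgid, so the same table could later be generated from a gettext catalogue.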

(Aside: I guess I failed to use Google correctly yesterday.... PHP has
support for gettext! <http://us3.php.net/gettext> So that's how I think I
will go...)
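For reference, a typical gettext setup in PHP looks something like the sketch below. It assumes a hypothetical text domain called 'myapp' whose compiled catalogues live in locale/<locale>/LC_MESSAGES/myapp.mo (built from the .po files with msgfmt), and that the requested locale is installed on the system, which is not always the case on a small embedded box. The initGettext() and t() helpers are hypothetical; t() simply returns the English text when the gettext extension is missing.

```php
<?php
// Switch gettext to $locale for the given text domain.
// Expects catalogues at "$dir/<locale>/LC_MESSAGES/$domain.mo".
function initGettext(string $locale, string $domain, string $dir): bool
{
    if (!function_exists('textdomain')) {
        return false; // gettext extension not loaded
    }
    putenv("LC_ALL=$locale");
    // setlocale() returns false if none of the given locales is installed.
    $ok = setlocale(LC_ALL, "$locale.UTF-8", $locale) !== false;
    bindtextdomain($domain, $dir);
    bind_textdomain_codeset($domain, 'UTF-8');
    textdomain($domain);
    return $ok;
}

// Translate a message; gettext() itself returns $msgid unchanged
// when no translation is found, so untranslated labels stay English.
function t(string $msgid): string
{
    return function_exists('gettext') ? gettext($msgid) : $msgid;
}
```

Typical use would be initGettext('de_DE', 'myapp', __DIR__ . '/locale') once per request, then t('My Name') wherever a label is printed.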

--Yan
 
Rik

Captain said:
On Sat, 20 Jan 2007 14:27:37 +0100, Rik wrote:

Thanks. I'll probably do something like that - I've thought about
using the 'HTTP-Accept-Language' var from the header. I just don't
know how many people actually set those correctly.

Not that many people set it themselves; however, most browsers set it
during installation to the most probable language (based on the chosen
installation language or, for instance, the OS locale).

This is on an embedded system, with only a small number of phrases that
would need translation; I probably have less than 200 phrases, mostly
one and two words.

I'm thinking of a mechanism similar to the .po files, where the PHP
backend looks up the text in a translation file. Or even something
like this:

<item id="myname" value="" index="5" type="text" text="My Name"/>

and the PHP backend would look up the text value for "My Name" in a
lookup table for the user's language.

(Aside: I guess I failed to use Google correctly yesterday.... PHP
has support for gettext! <http://us3.php.net/gettext> So that's how
I think I will go...)

Agreed; with a limited number of phrases, that would be my choice too. It's
a lot harder to maintain for translating entire pages/documents, though.
 
aa

If by XML templates you mean structured content in different languages,
then all you need is a presentational template in Unicode.
The problem usually arises with non-European languages, which probably would
not fit into a European page layout.
As to language selection, you might want to consider offering an explicit choice
of language in the menu, because the language detected automatically is not
always what a visitor wants.
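A sketch of such an explicit language menu in PHP: each entry links to the same page with a hypothetical ?lang= parameter (which the detection logic would treat as an explicit choice), and each language name is written in that language so visitors can recognise their own. The languageMenu() function and its markup are illustrative assumptions.

```php
<?php
// Build a language menu. $languages maps codes to names written in
// that language, e.g. ['en' => 'English', 'de' => 'Deutsch'].
// The current language is shown but not linked.
function languageMenu(array $languages, string $current, string $baseUrl = '?lang='): string
{
    $h = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $items = [];
    foreach ($languages as $code => $name) {
        if ($code === $current) {
            $items[] = sprintf('<li><strong lang="%s">%s</strong></li>', $h($code), $h($name));
        } else {
            // hreflang/lang tell browsers and screen readers which language each entry is in.
            $items[] = sprintf('<li><a href="%s%s" hreflang="%s" lang="%s">%s</a></li>',
                $h($baseUrl), $h(rawurlencode($code)), $h($code), $h($code), $h($name));
        }
    }
    return "<ul class=\"languages\">\n" . implode("\n", $items) . "\n</ul>";
}
```

Serving the page as UTF-8 lets names like "Français" appear correctly alongside the rest of the Unicode template.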
 
John Murtari

Like you said, go with gettext(). We just finished a
fairly large app that was to be multilingual. We used gettext
for the small stuff like "Login here". In cases where there
were larger blocks of text we would set a variable $defLang
based on the language the user was using, and in the code,

include(TEXT_DIR."/$defLang/this-usage.txt");

whenever we needed it. There was a root TEXT_DIR, with sub
dirs for each locale. File names were the same and it made
it easy for the end users to update each file for each language.

For graphic buttons it was a similar approach:

<img src="<?= BUTTON_DIR . "/$defLang/login.gif" ?>" ......>
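Because $defLang ends up inside file paths and URLs, it is worth restricting it to a known list before building paths like the ones above. A minimal sketch, assuming a hypothetical list of supported languages; safeLang() and buttonTag() are illustrative names, and the alt text would itself come from gettext.

```php
<?php
// Hypothetical list of locales that actually have TEXT_DIR/BUTTON_DIR subdirectories.
const SUPPORTED_LANGS = ['en', 'de', 'fr'];

// Return $requested only if it is a known language, otherwise the default,
// so values like "../../etc" can never reach include() or a file path.
function safeLang(?string $requested, string $default = 'en'): string
{
    return ($requested !== null && in_array($requested, SUPPORTED_LANGS, true))
        ? $requested
        : $default;
}

// Build a language-specific button with an escaped, quoted src and alt.
function buttonTag(string $buttonDir, string $lang, string $name, string $alt): string
{
    $h = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    return sprintf('<img src="%s" alt="%s">', $h("$buttonDir/$lang/$name.gif"), $h($alt));
}
```

With $defLang = safeLang($_COOKIE['lang'] ?? null), the include() line works as before. For plain .txt blocks, readfile() would also avoid executing the file as PHP if it ever contains "<?".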

It worked well for us. Probably the biggest thing we
did for the end users was create a simple PHP page that would
scan all the source files for gettext calls and put up a tabular
display of each phrase and its translation, if any, in each
of the target languages, and they could enter the translations
right there. They did not have to deal with the raw message
files.
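The scanning step behind such a translation page could start out like this: a rough PHP sketch that pulls the literal strings passed to _() or gettext() out of a source file with a regular expression. The extractMsgids() name is hypothetical, and the regex only understands simple quoted literals; for real catalogues, GNU xgettext (which can parse PHP source) does this extraction properly and writes a .pot template.

```php
<?php
// Return the distinct msgids passed as simple string literals to
// _() or gettext() in a chunk of PHP source, in order of appearance.
function extractMsgids(string $source): array
{
    // _("double") or gettext('single'); group 1 = double-quoted body,
    // group 2 = single-quoted body. \b stops ngettext() etc. from matching.
    $pattern = '/\b(?:_|gettext)\(\s*(?:"((?:[^"\\\\]|\\\\.)*)"|\'((?:[^\'\\\\]|\\\\.)*)\')\s*\)/';
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $ids = [];
    foreach ($matches as $match) {
        if (isset($match[2]) && $match[2] !== '') {
            // Single quotes: only \' and \\ are escape sequences.
            $id = preg_replace('/\\\\([\\\\\'])/', '$1', $match[2]);
        } else {
            // Double quotes: \n, \t, \" etc., roughly as PHP treats them.
            $id = stripcslashes($match[1]);
        }
        if ($id !== '') {
            $ids[] = $id;
        }
    }
    return array_values(array_unique($ids));
}
```

Running it over every file from glob('*.php') (or a recursive directory iterator) and comparing the result with each language's catalogue gives the list of phrases still awaiting translation.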

Hope this helps.
--
John
___________________________________________________________________
John Murtari Software Workshop Inc.
jmurtari@following domain 315.635-1968(x-211) "TheBook.Com" (TM)
http://thebook.com/
 
