HTML and right-to-left writing...

Daniel Bleisteiner

I have several problems understanding HTML and the dir="rtl"
attribute. Maybe you can clear things up for me...

I have to evaluate the current possibilities for using HTML forms for the
Arabic language and found two different things related to that topic. The
first is the dir="rtl" attribute, which can be used on many HTML tags like
TEXTAREA and others. From my understanding the attribute should cause the
text to be written from right to left... possibly right-aligned at the
same time. It only does the alignment - NOT the text order. When I type
something in the textarea the characters get appended to the right end of
the text.

Have a look at the following example: http://www.da3x.de/RTL.html

As far as my understanding goes, the characters should be added to the
left end of the text. But all browsers behave the same way, with small
differences concerning the exclamation mark of the last sentence (IE and
Mozilla put this and ONLY this last mark at the left end - Opera doesn't!).

Do all browsers make the same error or is my understanding wrong? I'd like
to know what Arabic speakers think about that! Shouldn't all newly typed
characters be appended to the left end of the string?

Another element in HTML is BDO, which can be used as <bdo
dir="rtl">test</bdo>. This TURNS the text around, as I'd also expect from
the "dir" attribute - but it doesn't affect the textarea, no matter how my
HTML is constructed. I'd really like to clear this up because I need to
implement some routines in my server system and I need clarity on these
GUI topics.

Thanks for all your help!
 
rf

Daniel Bleisteiner said:
I have several problems understanding HTML and the dir="rtl"
attribute. Maybe you can clear things up for me...

I have to evaluate the current possibilities for using HTML forms for the
Arabic language and found two different things related to that topic. The
first is the dir="rtl" attribute, which can be used on many HTML tags like
TEXTAREA and others. From my understanding the attribute should cause the
text to be written from right to left... possibly right-aligned at the
same time. It only does the alignment - NOT the text order. When I type
something in the textarea the characters get appended to the right end of
the text.

Have a look at the following example: http://www.da3x.de/RTL.html

You are missing something.

The order of characters typed into an element is at a level below dir="rtl";
indeed, it is at a level below the browser.

Basically, all modern multilingual applications use the standard inbuilt
multilingual capabilities of the operating system, as do all of the common
Windows controls (e.g. an Edit control for a textarea). There is an entire
multilingual subsystem in there, with its own API and its own quirks.

You will never get English characters to type in right to left because the
OS knows that English characters are left to right.

If you install, for example, Arabic language support (*), then when you
switch to an Arabic keyboard layout you will find that your input is right
to left.

(*) XP supports all languages. 2000 supports them if you have installed the
relevant language packs, specifically the far east asian one. 98 requires
you to specifically install a "far east asian" version of the OS. Once
again, this is not a browser function, it is part of the OS. The browser
uses the OS's functionality.
 
Daniel Bleisteiner

You are missing something.

Okay, this makes sense... and it means that I have no reliable way to test
my implementation without an Arabic-configured computer system.
I've tried to change to Arabic using my WinXP settings but the language
was not available... it seems I need some special update to get this done.

One elementary question for me is how a string entered into an Arabic
text field is sent to the server.

Example:
An Arabic speaker types "test" into a text field, which displays the text
as "tset" to him.
When retrieving the form's fields with a CGI script... will the text be
sent as "test" or "tset"? I suppose it's "test"... but I want to be sure...
 
rf

Daniel Bleisteiner said:
Okay, this makes sense... and it means that I have no reliable way to test
my implementation without an Arabic-configured computer system.
I've tried to change to Arabic using my WinXP settings but the language
was not available... it seems I need some special update to get this done.

No, you just need to enable support for the language. Look up the help
files/install instructions; it's in there. I can't tell you exactly, it's
been a couple of months since I last installed an XP system :)

One elementary question for me is how a string entered into an Arabic
text field is sent to the server.

It would be sent in the order it was typed in. That is all you need to know
and that is the way you store it.
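
To make the "sent in the order typed" point concrete, here is a rough Python sketch of what a form submission looks like on the wire. It assumes the page is submitted as UTF-8 (the thread does not say which encoding the form uses), and the field name "txt" and the helper name roundtrip are made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

def roundtrip(typed, field="txt"):
    """Encode a form field the way a browser would for a UTF-8 form,
    then decode it again the way a CGI script would."""
    body = urlencode({field: typed}, encoding="utf-8")
    received = parse_qs(body, encoding="utf-8")[field][0]
    return body, received

# Five Arabic letters, listed in the order the user pressed the keys.
typed = "\u0645\u0631\u062d\u0628\u0627"
body, received = roundtrip(typed)

print(body)               # percent-encoded UTF-8 bytes, first typed letter first
print(received == typed)  # the server sees the same logical order
```

Nothing in the request body records how the text looked on screen; the server only ever sees the code points in typing (logical) order.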

This is the way all the common controls (edit for Textarea) store it.

The presentational ordering of the characters happens at display time, that
is, when the characters are actually drawn to the display surface. You, as
somebody who stores what the user has typed in, do not need to know how the
characters are displayed. You store them as they come. The operating system
(yes, Windows, not the browser) determines the order they are displayed on
the canvas.

Go over to microsoft.com and search for "unicode". It's the underlying
subsystem I mentioned earlier.
 
Daniel Bleisteiner

The presentational ordering of the characters happens at display time, that
is, when the characters are actually drawn to the display surface. You, as
somebody who stores what the user has typed in, do not need to know how
the characters are displayed. You store them as they come. The operating
system (yes, Windows, not the browser) determines the order they are
displayed on the canvas.

Thanks for the clarification! But I have to know how they are stored because
I need to render them using the software I'm developing. That's why I'm
asking about those details. I have to generate the proper PostScript which
displays the text.
 
rf

Daniel Bleisteiner said:
Thanks for the clarification! But I have to know how they are stored

They are stored in the order they were typed in by the user.
User types in [a][c], that is what is stored.
TextOut, given the entire string, renders [c][a].

More to the point, given [A][C][a][c], where upper case is English and
lower case is Arabic, TextOut would render
[A][C][c][a].
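
The [A][C][a][c] example can be sketched in a few lines of Python. This is a deliberately naive illustration, not the full Unicode Bidirectional Algorithm and not what TextOut actually does internally: it assumes a left-to-right paragraph and simply reverses each run of strongly right-to-left characters, ignoring neutrals, digits, embedding levels and mirroring. The function name naive_visual_order is invented for this sketch.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes (Hebrew, Arabic)

def naive_visual_order(logical):
    """Reverse each run of strong RTL characters; leave everything else alone.

    Simplification: left-to-right paragraph only, no handling of neutrals,
    numbers, explicit embeddings or mirrored characters.
    """
    out = []
    run = []  # current run of RTL characters, in logical order
    for ch in logical:
        if unicodedata.bidirectional(ch) in RTL_CLASSES:
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))  # flush a trailing RTL run
    return "".join(out)

# Latin A, C followed by Arabic BEH, TEH, in the order they were typed.
stored = "AC\u0628\u062a"
visual = naive_visual_order(stored)
print([f"U+{ord(c):04X}" for c in visual])
# -> ['U+0041', 'U+0043', 'U+062A', 'U+0628']
```

The stored string never changes; only the left-to-right drawing order is derived from it at display time. A PostScript generator that cannot call a system API would need a real bidi implementation in place of this toy function.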

because I
need to render them using the software I'm developing.

Ah, I see. In that case you still don't care. You simply pass the string of
characters, in the sequence they were entered, to TextOut, like above.
TextOut is a Uniscribe-enabled API. Uniscribe takes care of all the layout
things and the glyph replacement (*). You don't have to worry about it.

That's why I'm
asking about those details. I have to generate the proper PostScript which
displays the text.

Don't know about PostScript, never used it. However, if you cannot use the
TextOut API or some equivalent to render the string then your software must
be Uniscribe-enabled. While this is not a trivial exercise (mainly because
of the lack of documentation) it is not too hard. 20 or 30 lines of C++ code
will do it.

(*) You are aware that certain character glyphs are replaced by others,
depending on the character's position in relation to other characters. It's
even worse in Thai. Certain characters are actually split into two separate
glyphs which surround the following character. For example, type in an [a]
and then a [b] and you end up with
[a1][b][a2]. Handling all of this is *not* a trivial exercise. The bloke at
Microsoft that wrote Uniscribe took two or three years to get it right.
 
rf

Mark Parnell said:
Microsoft got something right?

Yep. They got the Unicode bit right. However, they did the usual thing with
their only example of how to use Unicode: The example is so brain-dead that
it will wordwrap a fullstop onto the next line
..

Cheers
Richard.
 
Jukka K. Korpela

rf said:
You will never get English characters to type in right to left
because the OS knows that English characters are left to right.

Not quite. <bdo dir="rtl">abc</bdo> produces, on conforming browsers, a
right to left presentation.

In Unicode, characters have inherent directionality. This means that some
characters are left-to-right, some are right-to-left, and others are
"neutral" in different ways. By HTML specifications, the dir attribute
normally affects "neutral" text only, though in practice browsers are
known to violate this, so it is advisable to specify <html dir="rtl">
when your document is in Arabic or Hebrew. On the other hand, the <bdo>
element, by definition, means bidirectionality _override_, so the dir
attribute in it affects all text, including Latin letters.
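
The inherent directionality described above can be inspected directly. This small Python sketch uses the standard unicodedata module to print the bidi class of a few sample characters; the sample string and the helper name bidi_classes are chosen only for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Return (character name, bidi class) for each character in text."""
    return [(unicodedata.name(ch, "?"), unicodedata.bidirectional(ch)) for ch in text]

# Latin a, Hebrew alef, Arabic beh, space, exclamation mark, digit one.
sample = "a\u05d0\u0628 !1"
for name, cls in bidi_classes(sample):
    print(f"{cls:>3}  {name}")
```

Here "L" is strong left-to-right, "R" and "AL" are strong right-to-left, and "WS" and "ON" are neutral classes. That the exclamation mark is neutral is consistent with the observation earlier in the thread that only the final "!" moved to the left end under dir="rtl".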

If you install, for example, Arabic language support (*), then when
you switch to an Arabic keyboard layout you will find that your input
is right to left.

Perhaps, since we know that browser vendors confuse characters, character
encodings, languages, countries, and other things into a horrendous mess.
But HTML specifications say very clearly that the declared (via lang or
xml:lang) language shall not affect the directionality; browser
configuration is outside the scope of HTML, but it is clearly inadequate
for a browser to change its behavior in directionality according to the
language support installed.

On the other hand, if I enter Arabic characters in a textarea (with no
<bdo> elements, no rtl, no lang, no xml:lang attributes) on a plain
vanilla (Finnish) Windows 98 or Windows XP, using vanilla IE 6, I get the
characters displayed right-to-left (and sent in the order they had been
entered). This is the correct behavior. The (debatably) incorrect part is
that this is not affected by enclosing <bdo> markup.

(Why "(debatably)"? Because it might be argued that textarea content is
not textual content of a document, just data in a form field embedded
into an HTML document. This is a poor excuse though, especially since a
textarea element may contain text that is used as the initial value for
the field, e.g. <textarea ...>abc</textarea>. It's clearly document
content, but not affected by an enclosing <bdo dir="rtl"> element in
practice _except_ as regards alignment and placement of the scrollbar.)

The surprises don't end here. If I use CSS instead of HTML, I can make
IE 6 treat all input so that it is rendered right to left (though still
stored and sent in the order typed):

<textarea name="txt" rows="3" cols="30"
style="unicode-bidi:bidi-override; direction:rtl"></textarea>

But other browsers may fail to honor this, too. Apparently,
implementation of form fields is still largely based on built-in routines
that operate their own way, ignoring much of what you might have said in
HTML (or CSS).

Confused? You _will_ be after the next episode: on IE 6,
<input style="unicode-bidi:bidi-override; direction:rtl">
does _not_ make the text appear right to left. So there's even a
difference between textarea and single-line input.
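
Since the behavior differs per browser and per element, it may help to test all the variants from this thread side by side. The following Python sketch just writes such a test page; the field names t1..t5, the labels and the function name build_test_page are invented for illustration, and the script makes no claim about how any particular browser will render the result.

```python
from html import escape

# One template per variant discussed in the thread; {text} is the initial value.
VARIANTS = {
    "plain textarea": '<textarea name="t1" rows="3" cols="30">{text}</textarea>',
    "textarea with dir=rtl": '<textarea name="t2" rows="3" cols="30" dir="rtl">{text}</textarea>',
    "textarea inside bdo": '<bdo dir="rtl"><textarea name="t3" rows="3" cols="30">{text}</textarea></bdo>',
    "textarea with CSS override": '<textarea name="t4" rows="3" cols="30" '
        'style="unicode-bidi:bidi-override; direction:rtl">{text}</textarea>',
    "input with CSS override": '<input name="t5" value="{text}" '
        'style="unicode-bidi:bidi-override; direction:rtl">',
}

def build_test_page(text="abc"):
    """Return an HTML page containing every variant with the same initial text."""
    safe = escape(text)  # also escapes quotes, so it is safe inside value="..."
    rows = [
        f"<p>{escape(label)}:<br>{template.format(text=safe)}</p>"
        for label, template in VARIANTS.items()
    ]
    head = '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>RTL form test</title></head>\n<body>\n'
    return head + "\n".join(rows) + "\n</body></html>\n"

if __name__ == "__main__":
    with open("rtl_test.html", "w", encoding="utf-8") as f:
        f.write(build_test_page("abc"))
```

Opening the generated rtl_test.html in each browser and typing both Latin and Arabic text into every field shows which combinations affect alignment only and which affect character order.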
 
