HTML and right-to-left writing...

Discussion in 'HTML' started by Daniel Bleisteiner, May 26, 2004.

  1. I have several problems understanding HTML and the dir="rtl"
    attribute. Maybe you can clear things up for me...

    I have to evaluate the current possibilities for using HTML forms for the
    Arabic language and found two different things related to that topic. The
    first is the dir="rtl" attribute, which can be used on many HTML elements
    such as TEXTAREA. From my understanding the attribute should cause the
    text to be written from right to left... possibly right-aligned at the
    same time. It only does the alignment - NOT the text order. When I type
    something in the textarea the characters get appended to the right end of
    the text.

    Have a look at the following example: http://www.da3x.de/RTL.html

    As far as my understanding goes the characters should be added to the
    left end of the text. But all browsers behave the same way, with small
    differences concerning the exclamation mark of the last sentence (IE and
    Mozilla put this and ONLY this last mark at the left end - Opera doesn't!).

    Do all browsers make the same error or is my understanding wrong? I'd like
    to know what Arabic speakers think about that! Shouldn't all newly typed
    characters be appended to the left end of the string?

    Another element in HTML is BDO, which can be used as <bdo
    dir="rtl">test</bdo>. This REVERSES the text as I'd also expect from the
    "dir" attribute - but it doesn't affect the textarea, no matter how my HTML
    is constructed. I'd really like to clear this up because I need to
    implement some routines in my server system and I need clarity on these
    GUI topics.

    Thanks for all your help!

    --
    Daniel Bleisteiner
    Daniel Bleisteiner, May 26, 2004
    #1

  2. Daniel Bleisteiner

    rf Guest

    "Daniel Bleisteiner" <> wrote in message
    news:eek:-online.de...
    > I have several understanding problems with HTML and the dir="rtl"
    > attribute. Maybe you can clear things up for me...
    >
    > I have to evaluate the current possibilities for using HTML forms for the
    > Arabic language and found two different things related to that topic. The
    > first is the dir="rtl" attribute, which can be used on many HTML elements
    > such as TEXTAREA. From my understanding the attribute should cause the
    > text to be written from right to left... possibly right-aligned at the
    > same time. It only does the alignment - NOT the text order. When I type
    > something in the textarea the characters get appended to the right end of
    > the text.
    >
    > Have a look at the following example: http://www.da3x.de/RTL.html


    You are missing something.

    The order of characters typed into an element is handled at a level below
    dir="rtl"; indeed, it is handled at a level below the browser.

    Basically, all modern multilingual applications use the standard inbuilt
    multilingual capabilities of the operating system, as do all of the common
    Windows controls (e.g. an Edit control for a textarea). There is an entire
    multilingual subsystem in there, with its own API and its own quirks.

    You will never get English characters to type in right to left because the
    OS knows that English characters are left to right.

    If you install, for example, Arabic language support (*), then when you
    switch to an Arabic input language you will find that your input is right
    to left.

    (*) XP supports all languages. 2000 supports them if you have installed the
    relevant language packs, specifically the far east asian one. 98 requires
    you to specifically install a "far east asian" version of the OS. Once
    again, this is not a browser function, it is part of the OS. The browser
    uses the OS's functionality.
    rf, May 26, 2004
    #2

  3. On Wed, 26 May 2004 11:03:16 GMT, rf <> wrote:

    > You are missing something.


    Okay, this makes sense... and it means that I have no reliable way to test
    my implementation without an Arabic-configured computer system.
    I've tried to change to Arabic using my WinXP settings but the language
    was not available... it seems I need some special update to get this done.

    One elementary question for me is how a string entered into an Arabic
    text field is sent to the server.

    Example:
    An Arabic user types "test" into a text field, which displays the text as
    "tset" to him.
    When retrieving the form fields with a CGI script... will the text be
    sent as "test" or "tset"? I suppose it's "test"... but I want to be sure...

    --
    Daniel Bleisteiner
    Daniel Bleisteiner, May 26, 2004
    #3
  4. Daniel Bleisteiner

    rf Guest

    "Daniel Bleisteiner" <> wrote in message
    news:eek:-online.de...
    > On Wed, 26 May 2004 11:03:16 GMT, rf <> wrote:
    >
    > > You are missing something.

    >
    > Okay, this makes sense... and it means that I have no reliable way to test
    > my implementation without an Arabic-configured computer system.
    > I've tried to change to Arabic using my WinXP settings but the language
    > was not available... it seems I need some special update to get this done.


    No, you just need to enable support for the language. Look up the help
    files/install instructions, it's in there. I can't tell you exactly, it's
    been a couple of months since I last installed an XP system :)

    > One elementary question for me is how a string entered into an Arabic
    > text field is sent to the server.


    It would be sent in the order it was typed in. That is all you need to know
    and that is the way you store it.

    This is the way all the common controls (edit for Textarea) store it.

    The presentational ordering of the characters happens at display time, that
    is, when the characters are actually drawn to the display surface. You, as
    somebody who stores what the user has typed in, do not need to know how the
    characters are displayed. You store them as they come. The operating system
    (yes, Windows, not the browser) determines the order they are displayed on
    the canvas.

    Go over to microsoft.com and search for "unicode". It's the underlying
    subsystem I mentioned earlier.
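
To make the "stored and sent in the order typed" point concrete, here is a minimal Python sketch. It assumes a UTF-8 encoded form and a hypothetical field name "txt"; the sample value is the Arabic word "salam" given as code points in typing order. It only illustrates the round trip, it is not a description of what any particular browser or CGI library does internally.

```python
from urllib.parse import urlencode, parse_qs

# Arabic "salam" in logical (typing) order: SEEN, LAM, ALEF, MEEM.
# On screen the first letter typed appears at the right-hand end.
typed = "\u0633\u0644\u0627\u0645"

# Roughly what a UTF-8 form submission puts on the wire
# ("txt" is a hypothetical field name).
body = urlencode({"txt": typed}, encoding="utf-8")
print(body)

# What a CGI-style script recovers on the server side.
received = parse_qs(body, encoding="utf-8")["txt"][0]

# The first code point received is the first one typed,
# regardless of how the field looked on screen.
print([f"U+{ord(ch):04X}" for ch in received])
```

The server never sees a reversed string; reordering is purely a display-time concern.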
    rf, May 26, 2004
    #4
  5. On Wed, 26 May 2004 11:51:46 GMT, rf <> wrote:

    > The presentational ordering of the characters happens at display time,
    > that is, when the characters are actually drawn to the display surface.
    > You, as somebody who stores what the user has typed in, do not need to
    > know how the characters are displayed. You store them as they come. The
    > operating system (yes, Windows, not the browser) determines the order
    > they are displayed on the canvas.


    Thanks for the clarification! But I have to know how they are stored
    because I need to render them using the software I'm developing. That's
    why I'm asking about those details. I have to generate the proper
    PostScript which displays the text.

    --
    Daniel Bleisteiner
    Daniel Bleisteiner, May 26, 2004
    #5
  6. Daniel Bleisteiner

    Mark Parnell Guest

    On Wed, 26 May 2004 11:03:16 GMT, "rf" <> declared in
    alt.html:

    > You are missing something.


    Richard! Welcome back!

    --
    Mark Parnell
    http://www.clarkecomputers.com.au
    Mark Parnell, May 26, 2004
    #6
  7. Daniel Bleisteiner

    rf Guest

    "Daniel Bleisteiner" <> wrote in message
    news:eek:-online.de...
    > On Wed, 26 May 2004 11:51:46 GMT, rf <> wrote:
    >
    > > The presentational order of the characters happens at display time, that
    > > is when the characters are actually drawn to the display surface. You, as
    > > somebody who stores what the user has typed in do not need to know how
    > > the characters are displayed. You store them as they come. The operating
    > > system (yes, windows, not the browser) determines the order they are
    > > displayed on the canvas.

    >
    > Thanks for the clearance! But I have to know how they are stored


    They are stored in the order they were typed in by the user.
    User types in [a][c], that is what is stored.
    TextOut, given the entire string, renders [c][a].

    More to the point, given [A][C][a][c] where upper case is english and
    lower case is arabic TextOut would render
    [A][C][c][a].
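
The [A][C][a][c] example can be imitated with a deliberately simplified Python sketch. The function name toy_visual_order is made up for illustration, and the logic is NOT the full Unicode bidirectional algorithm: it ignores neutrals, digits, mirroring and embedding levels, and only reverses runs of strong right-to-left letters inside a left-to-right line.

```python
import unicodedata

# Bidi classes of strong right-to-left letters (Hebrew-style and Arabic-style).
RTL_CLASSES = {"R", "AL"}

def toy_visual_order(logical: str) -> str:
    """Reverse each run of strong RTL characters in an LTR line.

    A toy model only: real renderers implement many more rules.
    """
    out = []
    run = []
    for ch in logical:
        if unicodedata.bidirectional(ch) in RTL_CLASSES:
            run.append(ch)               # collect the RTL run in typing order
        else:
            out.extend(reversed(run))    # flush the run, reversed for display
            run.clear()
            out.append(ch)
    out.extend(reversed(run))            # flush a run that ends the line
    return "".join(out)

# Stored order: Latin A, C, then Arabic ALEF, BEH.
stored = "AC\u0627\u0628"
shown = toy_visual_order(stored)
print([f"U+{ord(ch):04X}" for ch in shown])
```

The stored string is untouched; only the sequence handed to the drawing step changes.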

    > because I
    > need to render them using the software I'm developing.


    Ah, I see. In that case you still don't care. You simply pass the string of
    characters, in the sequence they were entered, to TextOut, like above.
    TextOut is a Uniscribe-enabled API. Uniscribe takes care of all the layout
    things and the glyph replacement (*). You don't have to worry about it.

    > That's why I'm
    > asking those details. I have to generate the proper PostScript which
    > displays the text.


    Don't know about PostScript, never used it. However, if you cannot use the
    TextOut API or some equivalent to render the string then your software must
    be Uniscribe-enabled. While this is not a trivial exercise (mainly because
    of the lack of documentation) it is not too hard. 20 or 30 lines of C++ code
    will do it.

    (*) You are aware that certain character glyphs are replaced by others,
    depending on the character's position in relation to other characters? It's
    even worse in Thai. Certain characters are actually split into two separate
    glyphs which surround the following character. For example, type in an [a]
    and then a [b] and you end up with
    [a1][b][a2]. Handling all of this is *not* a trivial exercise. The bloke at
    Microsoft that wrote Uniscribe took two or three years to get it right.
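
The glyph replacement mentioned in the footnote can be glimpsed from Python's Unicode database. This sketch assumes the Arabic letter BEH as the example and lists its compatibility "presentation form" code points; a real shaping engine selects glyphs from the font rather than from these code points, so this only shows that one stored character has several positional shapes.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH: the character that is stored and sent

# Arabic Presentation Forms-B entries for BEH: isolated, final, initial, medial.
shapes = {}
for cp in range(0xFE8F, 0xFE93):
    form = chr(cp)
    # The decomposition looks like "<initial> 0628": a shape tag plus base letter.
    tag, base = unicodedata.decomposition(form).split()
    shapes[tag] = form
    print(f"U+{cp:04X} {unicodedata.name(form)} -> {tag} U+{base}")

# Compatibility normalization folds every shape back to the stored letter.
folded = {unicodedata.normalize("NFKC", form) for form in shapes.values()}
print(folded == {BEH})
```

Whatever renders the text, including a PostScript generator, has to make this positional choice itself or delegate it to a shaping library.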
    rf, May 27, 2004
    #7
  8. Daniel Bleisteiner

    Mark Parnell Guest

    On Wed, 26 May 2004 23:04:50 GMT, "rf" <> declared in
    alt.html:

    > The bloke at
    > Microsoft that wrote uniscribe took two or three years to get it right.


    Microsoft got something right?

    --
    Mark Parnell
    http://www.clarkecomputers.com.au
    Mark Parnell, May 27, 2004
    #8
  9. Daniel Bleisteiner

    rf Guest

    "Mark Parnell" <> wrote in message
    news:1r5kt7e3f4n8l.1059n1yui618e$...
    > On Wed, 26 May 2004 11:03:16 GMT, "rf" <> declared in
    > alt.html:
    >
    > > You are missing something.

    >
    > Richard! Welcome back!


    Cheers mate :)
    rf, May 27, 2004
    #9
  10. Daniel Bleisteiner

    rf Guest

    "Mark Parnell" <> wrote in message
    news:12qm2k030ph0x$...
    > On Wed, 26 May 2004 23:04:50 GMT, "rf" <> declared in
    > alt.html:
    >
    > > The bloke at
    > > Microsoft that wrote uniscribe took two or three years to get it right.

    >
    > Microsoft got something right?


    Yep. They got the unicode bit right. However, they did the usual thing with
    their only example of how to use unicode: The example is so brain dead that
    it will wordwrap a fullstop onto the next line
    ..

    Cheers
    Richard.
    rf, May 27, 2004
    #10
  11. "rf" <> wrote:

    > You will never get English characters to type in right to left
    > because the OS knows that English characters are left to right.


    Not quite. <bdo dir="rtl">abc</bdo> produces, on conforming browsers, a
    right to left presentation.

    In Unicode, characters have inherent directionality. This means that some
    characters are left-to-right, some are right-to-left, and others are
    "neutral" in different ways. By HTML specifications, the dir attribute
    normally affects "neutral" text only, though in practice browsers are
    known to violate this, so it is advisable to specify <html dir="rtl">
    when your document is in Arabic or Hebrew. On the other hand, the <bdo>
    element, by definition, means bidirectionality _override_, so the dir
    attribute in it affects all text, including Latin letters.
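
The inherent directionality described above can be inspected with Python's standard unicodedata module. This is a small sketch with arbitrarily chosen sample characters. The last part uses the Unicode override control characters that the HTML 4 specification describes as the plain-text counterpart of <bdo dir="rtl">.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, Hebrew ALEF, Arabic ALEF, digit, exclamation mark, space.
SAMPLES = ["a", "\u05D0", "\u0627", "1", "!", " "]
for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")

# RIGHT-TO-LEFT OVERRIDE ... POP DIRECTIONAL FORMATTING forces even
# Latin letters to be displayed right to left by a conforming renderer.
RLO = "\u202E"
PDF = "\u202C"
overridden = RLO + "abc" + PDF
print(bidi_classes(overridden))
```

Note that "!" and the space have neutral classes (ON and WS), which is why punctuation at the end of a sentence is the part most visibly affected by dir="rtl".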

    > If you install, for example, Arabic language support (*), then when
    > you switch to an Arabic input language you will find that your input
    > is right to left.


    Perhaps, since we know that browser vendors confuse characters, character
    encodings, languages, countries, and other things into a horrendous mess.
    But HTML specifications say very clearly that the declared (via lang or
    xml:lang) language shall not affect the directionality; browser
    configuration is outside the scope of HTML, but it is clearly inadequate
    for a browser to change its behavior in directionality according to the
    language support installed.

    On the other hand, if I enter Arabic characters in a textarea (with no
    <bdo> elements, no rtl, no lang, no xml:lang attributes) on a plain
    vanilla (Finnish) Windows 98 or Windows XP, using vanilla IE 6, I get the
    characters displayed right-to-left (and sent in the order they had been
    entered). This is the correct behavior. The (debatably) incorrect part is
    that this is not affected by enclosing <bdo> markup.

    (Why "(debatably)"? Because it might be argued that textarea content is
    not textual content of a document, just data in a form field embedded
    into an HTML document. This is a poor excuse though, especially since a
    textarea element may contain text that is used as the initial value for
    the field, e.g. <textarea ...>abc</textarea>. It's clearly document
    content, but not affected by an enclosing <bdo dir="rtl"> element in
    practice _except_ as regards alignment and placement of the scrollbar.)

    The surprises don't end here. If I use CSS instead of HTML, I can make
    IE 6 treat all input so that it is rendered right to left (though still
    stored and sent in the order typed):

    <textarea name="txt" rows="3" cols="30"
    style="unicode-bidi: bidi-override; direction: rtl"></textarea>

    But other browsers may not honor this either. Apparently,
    implementation of form fields is still largely based on built-in routines
    that operate their own way, ignoring much of what you might have said in
    HTML (or CSS).

    Confused? You _will_ be after the next episode: on IE 6,
    <input style="unicode-bidi:bidi-override; direction:rtl">
    does _not_ make the text appear right to left. So there's even a
    difference between textarea and single-line input.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html
    Jukka K. Korpela, May 30, 2004
    #11
