Can HTML be translated to XHTML perfectly?!

Discussion in 'Java' started by mike, Sep 23, 2004.

  1. mike

    mike Guest

    regards:

    Can HTML be translated to XHTML perfectly?!

    or there is another similar tool in java ( like api )

    can do the work?!...thank you
    -----------------------------------------------------------
    I use the Jtidy opensources but I find that I cannot translate some
    HTML pages.




    thank you
    mike, Sep 23, 2004
    #1

  2. mike

    JScoobyCed Guest

    mike wrote:

    > regards:
    >
    > Can HTML be translated to XHTML perfectly?!


    Can you define your definition of "perfectly", as it relates to your question?
    Maybe a different newsgroup oriented toward XML/HTML would be more appropriate
    (I am not sure if there are any, though ... :) )

    --
    JScoobyCed
    What about a JScooby snack Shaggy ? ... Shaggy ?!
    JScoobyCed, Sep 23, 2004
    #2

  3. (mike) writes:

    > Can HTML be translated to XHTML perfectly?!


    Not necessarily. HTML is more lenient than XHTML, and browsers even
    more so. For instance, the EMBED element is an abomination unto W3C,
    since it doesn't have a restricted set of possible attributes.

    The easiest is probably to lowercase all element and attribute names,
    and change <empty> elements into <empty/> and hope for the best.
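
A minimal sketch of the two steps just described (lowercasing names and self-closing the empty elements), using only regular expressions from the JDK. The class name `NaiveXhtmlRewriter`, its `rewrite` method and the list of empty elements are illustrative assumptions, not part of any library. It is deliberately naive: it does not add missing end tags, quote bare attribute values, or skip comments and scripts, so it really is a "hope for the best" approach.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NaiveXhtmlRewriter {
    // HTML 4 elements that never have content (assumed list for this sketch).
    private static final Set<String> EMPTY_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"));

    // One tag: optional slash, element name, then everything up to '>'.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    // One attribute: a name, optionally followed by =value (quoted or bare).
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?");

    static String rewrite(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String attrs = lowercaseAttributeNames(m.group(3));
            String tag;
            if (!m.group(1).isEmpty()) {
                tag = "</" + name + ">";                        // end tag
            } else if (EMPTY_ELEMENTS.contains(name)) {
                // Drop any existing trailing slash, then close the tag XHTML-style.
                tag = "<" + name + attrs.replaceAll("\\s*/?\\s*$", "") + " />";
            } else {
                tag = "<" + name + attrs + ">";                 // ordinary start tag
            }
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Lowercases attribute names only; values are consumed whole and left untouched.
    private static String lowercaseAttributeNames(String attrs) {
        Matcher m = ATTR.matcher(attrs);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2) == null ? "" : m.group(2);
            String name = m.group(1).toLowerCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(name + value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p>Hi<br /><img src="a.png" alt="A B=c" /></p>
        System.out.println(rewrite("<P>Hi<BR><IMG SRC=\"a.png\" ALT=\"A B=c\"></P>"));
    }
}
```

Anything beyond these two fixes (unclosed <p>, unquoted values, misnested tags) needs a real parser rather than pattern matching.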
    Tor Iver Wilhelmsen, Sep 23, 2004
    #3
  4. mike

    mike Guest

    > Can you define your definition of "perfectly",


    "perfectly", that is:

    "XHTML file"(translated from HTML)can be identified by Nokia Mobile Browser.

    Is there any constructive idea?


    thank you
    mike, Oct 1, 2004
    #4
  5. mike wrote:

    >>Can you define your definition of "perfectly",

    >
    >
    >
    > "perfectly", that is:
    >
    > "XHTML file"(translated from HTML)can be identified by Nokia Mobile Browser.


    That is no problem at all: translate every input document to this:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Former HTML document</title>
    </head>
    <body>
    <p>This used to be a HTML document, now it's valid XHTML!</p>
    </body>
    </html>


    > Is there any constructive idea?


    Think some more about your definition of "perfect translation".
    Michael Borgwardt, Oct 1, 2004
    #5
  6. mike

    Ann Guest

    "Michael Borgwardt" <> wrote in message
    news:...
    > mike wrote:
    >
    > >>Can you define your definition of "perfectly",

    > >
    > >


    He doesn't really mean perfect, he means it needs to work
    on Nokia Mobile Browser, that's all.

    > >
    > > "perfectly", that is:
    > >
    > > "XHTML file"(translated from HTML)can be identified by Nokia Mobile Browser.
    >
    > That is no problem at all: translate every input document to this:
    >
    Ann, Oct 1, 2004
    #6
  7. mike wrote:

    > Can HTML be translated to XHTML perfectly?!


    _Valid_ HTML can be translated to XHTML without excessive difficulty.
    The tool you already have probably does that job quite well. Most of
    the HTML you come across on the web is not valid, however, even if you
    excuse the absence (in most cases) of a DOCTYPE declaration.

    >
    > or there is another similar tool in java ( like api )
    >
    > can do the work?!...thank you
    > -----------------------------------------------------------
    > I use the Jtidy opensources but I find that I cannot translate some
    > HTML pages.


    That's not surprising at all. There are some pages that some major
    browsers can't translate -- which pages they are depends on which browser
    you're talking about. When a browser runs into such a page it generally
    makes its best guess, which is the most significant cause of
    cross-browser compatibility issues. (Different browsers guess
    differently, but page authors have a tendency to depend on the browser
    always guessing the same way that their favorite browser does.)


    John Bollinger
    John C. Bollinger, Oct 1, 2004
    #7
  8. mike

    Tim Jowers Guest

    Michael Borgwardt <> wrote in message news:<>...
    > mike wrote:
    >
    > >>Can you define your definition of "perfectly",

    > >
    > >
    > >
    > > "perfectly", that is:
    > >
    > > "XHTML file"(translated from HTML)can be identified by Nokia Mobile Browser.

    >
    > That is no problem at all: translate every input document to this:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <!DOCTYPE html
    > PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    > <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    > <head>
    > <title>Former HTML document</title>
    > </head>
    > <body>
    > <p>This used to be a HTML document, now it's valid XHTML!</p>
    > </body>
    > </html>
    >
    >
    > > Is there any constructive idea?

    >
    > Think some more about your definition of "perfect translation".


    I remember using the Java HTML document parser class and then writing
    "handlers" for many special cases. I seem to remember having to
    subprocess each element block for things like <p> with no </p>, <br>,
    et cetera, as these are non-conforming. Fortunately, I was only concerned
    with the textarea/input fields/buttons and could just pass
    applets/objects through as they were. It was a tool to suck in data fields
    from another app and allow the clients to build a site very quickly
    based on their business model in the tool. Pretty cool in many ways:
    http://www.speedbuildersystems.com ... look for ScreenGen. They make
    tools for analysis for the insurance industry, so the concept was to roll
    out basic enrollment validation rules as well as model the complex
    rules on the mainframe and the web farm.

    It'll be cool once many tools can work on the same HTML files. At that
    time, last year, even FrontPage could not convert HTML to XHTML.
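
For readers who want to try the parser-plus-handlers approach: a rough sketch, assuming the "Java html document parser class" is the lenient HTML parser that ships with the JDK in javax.swing.text.html.parser (the post does not say which class it was). The class name `SwingHtmlToXhtml` and its `convert` method are made up for illustration. The parser reports start tags, end tags (including ones it had to imply, such as a missing </p>), empty tags and text to a callback, and the callback writes each one back out in XML syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class SwingHtmlToXhtml {
    static String convert(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback handler = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                out.append('<').append(tag).append(attributes(attrs)).append('>');
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                out.append("</").append(tag).append('>');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> arrive here; write them self-closed.
                out.append('<').append(tag).append(attributes(attrs)).append(" />");
            }

            @Override
            public void handleText(char[] data, int pos) {
                // The parser hands over decoded text, so re-escape it for XML.
                out.append(escape(new String(data)));
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), handler, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    private static String attributes(MutableAttributeSet attrs) {
        StringBuilder sb = new StringBuilder();
        for (Enumeration<?> names = attrs.getAttributeNames(); names.hasMoreElements();) {
            Object name = names.nextElement();
            // Skip the marker the parser adds to tags it inserted itself.
            if (HTMLEditorKit.ParserCallback.IMPLIED.equals(name)) {
                continue;
            }
            sb.append(' ').append(name).append("=\"")
              .append(escape(String.valueOf(attrs.getAttribute(name)))).append('"');
        }
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(convert("<html><body><p>one<p>two<br></body></html>"));
    }
}
```

This sketch leaves out the XML declaration, the DOCTYPE and the xmlns attribute, passes comments and scripts over silently, and does nothing sensible with tags the parser does not know, which is where the "many special cases" come in.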
    Tim Jowers, Oct 1, 2004
    #8
  9. mike

    mike Guest

    regards:

    thank you for your valuable suggestions:

    (1) In my Nokia 6600 test, your code is OK, as follows.
    The Nokia 6600 can identify the following XHTML code.
    ------------------------------------------------------------------
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Former HTML document</title>
    </head>
    <body>
    <p>This used to be a HTML document, now it's valid XHTML!</p>
    </body>
    </html>
    ---------------------------------------------------------------------
    (2) I don't know precisely how a HTML document is translated into
    a XHTML file by your words.

    Now I have an HTML document. Let the body part of the HTML document
    be written as (Body part of a HTML document).

    By your description, the XHTML file is as follows:
    --------------------------------------------------------------------
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Former HTML document</title>
    </head>
    <body>

    (Body part of a HTML document).

    </body>
    </html>
    --------------------------------------------------------------------
    But my Nokia 6600 cannot identify the above XHTML file.

    Do I mistake your sayings?

    Or is there something important I am missing?

    Any constructive suggestions are welcome.

    best wishes
    mike, Oct 4, 2004
    #9
  10. mike wrote:
    > (2) I don't know precisely how a HTML document is translated into
    > a XHTML file by your words.


    *groan* That's what I get for trying to be subtle.

    > Do I mistake your sayings?


    Yes. I wrote:

    That is no problem at all: translate every input document to this:

    ...........................................
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Former HTML document</title>
    </head>
    <body>
    <p>This used to be a HTML document, now it's valid XHTML!</p>
    </body>
    </html>
    ............................................

    And I really meant "translate every input document to EXACTLY THE
    TEXT BETWEEN THE DOTTED LINES". That fits your requirements because
    you have only specified that the output has to be valid XHTML, not
    that it must have anything whatsoever to do with the input
    document. That's exactly why I (and others) told you you need to
    rethink your requirements.

    And that's not just snide hairsplitting. Presumably you want the
    output document to be rendered exactly as the input document would
    be. But that is practically impossible when the input is invalid
    HTML (which many, if not most, HTML pages found on the WWW are),
    because rendering it involves guesswork by the browser. That
    guesswork differs a lot between browsers, and exactly how it is done
    is not known, at least in the case of the most popular browser,
    Internet Explorer.

    This is exactly why jtidy will not translate some HTML pages, as you
    have noticed.
    Michael Borgwardt, Oct 5, 2004
    #10
  11. mike

    mike Guest

    Now I get what you mean,thank you anyway.
    What I want is as follows:

    a HTML document---> after tidy's translation --->"a XHTML document"

    I want the "a XHTML document" to be identified by nokia 6600.


    (1)I use the Jtidy api,and find that nokia 6600 mobile browser cannot identify
    the "a XHTML document".

    (2)but nokia 6600 can identify the normal XHTML file like you post

    BETWEEN DOTTED LINES.

    ...........................................
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <title>Former HTML document</title>
    </head>
    <body>
    <p>This used to be a HTML document, now it's valid XHTML!</p>
    </body>
    </html>
    ............................................

    (3) I am curious:
    Besides JTidy, can I produce an XHTML file that the Nokia 6600 can identify
    by using another good HTML parser?

    Some parser like:

    http://htmlparser.sourceforge.net/


    Could someone give me a constructive suggestion?

    best wishes
    mike, Oct 6, 2004
    #11
  12. mike wrote:

    > Now I get what you mean,thank you anyway.
    > What I want is as follows:
    >
    > a HTML document---> after tidy's translation --->"a XHTML document"
    >
    > I want the "a XHTML document" to be identified by nokia 6600.
    >
    >
    > (1)I use the Jtidy api,and find that nokia 6600 mobile browser cannot identify
    > the "a XHTML document".
    >
    > (2)but nokia 6600 can identify the normal XHTML file like you post


    Ah, that is a far clearer problem.

    Well, thanks to the tight specification of XHTML, I think there are really
    only three things that could be the reason for this behaviour:

    - Jtidy has a bug
    - The Nokia browser has a bug
    - Jtidy's output conforms to or produces a different version of XHTML than the
    Nokia browser expects

    After a bit of googling, it seems that the Nokia browser actually supports
    only XHTML MP (mobile profile), which is a *subset* of XHTML described here:
    http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf

    Unfortunately, that makes it a lot more difficult to fulfill your requirements.
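
One way to start narrowing down those three possibilities is to check the converter's output yourself before it goes anywhere near the phone. The sketch below is only an illustration under stated assumptions: the class `XhtmlInspector` and its `elementNames` method are invented names, and it uses nothing but the XML parser bundled with the JDK. It throws if the text is not well-formed XML (a stray quote in an attribute is enough) and otherwise returns the distinct element names, which can then be compared by hand against the XHTML MP document linked above.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlInspector {
    /** Returns the distinct element names used, or throws if the text is not well-formed XML. */
    static Set<String> elementNames(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Do not download the DTD: resolve every external entity to empty input.
            // Side effect: named entities such as &nbsp; are then reported as errors.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            NodeList all = doc.getElementsByTagName("*");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }
}
```

If the output is not even well-formed, the problem is on the converter side. If it is well-formed but uses elements the mobile profile leaves out, the third explanation is the likely one. Note that this checks well-formedness only, not validity against any DTD.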
    Michael Borgwardt, Oct 6, 2004
    #12
  13. I'm afraid not... Sorry... but it can't...


    "Michael Borgwardt" <> wrote in message
    news:...
    > mike wrote:
    >
    >> Now I get what you mean,thank you anyway.
    >> What I want is as follows:
    >>
    >> a HTML document---> after tidy's translation --->"a XHTML document"
    >>
    >> I want the "a XHTML document" to be identified by nokia 6600.
    >>
    >>
    >> (1)I use the Jtidy api,and find that nokia 6600 mobile browser cannot
    >> identify the "a XHTML document".
    >>
    >> (2)but nokia 6600 can identify the normal XHTML file like you post

    >
    > Ah, that is a far clearer problem.
    >
    > Well, thanks to the tight specification of XHTML, I think there are really
    > only three things that could be the reason for this behaviour:
    >
    > - Jtidy has a bug
    > - The Nokia browser has a bug
    > - Jtidy's output conforms to or produces a different version of XHTML than the
    > Nokia browser expects
    >
    > After a bit of googling, it seems that the Nokia browser actually supports
    > only XHTML MP (mobile profile), which is a *subset* of XHTML described
    > here:
    > http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf
    >
    > Unfortunately, that makes it a lot more difficult to fulfill your
    > requirements.
    Ashburton Industries, Oct 8, 2004
    #13
  14. mike

    mike Guest

    regards:

    Is it reasonable?

    _NOTValid_ HTML
    --->_Valid_HTML (check syntax program)
    --->_Valid_XHTML(after translation)

    thank you

    best wishes
    mike, Oct 9, 2004
    #14
