Can HTML be translated to XHTML perfectly?!

mike · Sep 23, 2004

Regards,

Can HTML be translated to XHTML perfectly?!

Or is there another similar tool in Java (such as an API) that
can do the work? Thank you.
-----------------------------------------------------------
I use the open-source JTidy, but I find that I cannot translate some
HTML pages with it.

Thank you

JScoobyCed · Sep 23, 2004

mike said:
regards:

Can HTML be translated to XHTML perfectly?!

Can you define what you mean by "perfectly" in relation to your question?
Maybe a different newsgroup oriented toward XML/HTML would be more appropriate
(I am not sure if there are any, though...).

Tor Iver Wilhelmsen · Sep 23, 2004

Can HTML be translated to XHTML perfectly?!

Not necessarily. HTML is more lenient than XHTML, and browsers even
more so. For instance, the EMBED element is an abomination unto W3C,
since it doesn't have a restricted set of possible attributes.

The easiest is probably to lowercase all element and attribute names,
and change <empty> elements into <empty/> and hope for the best.
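
A minimal sketch of that "lowercase and hope for the best" approach, using only the Java standard library. The class name NaiveXhtml, its method names, and the list of empty elements (taken from HTML 4) are illustrative assumptions, not part of any tool mentioned in this thread. It is regex-based, so it will be fooled by ">" inside attribute values, by tags inside comments or scripts, and it does not quote unquoted attribute values or close unclosed elements.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lowercases element and attribute names and
// turns <empty> elements into <empty />. Nothing else is repaired.
final class NaiveXhtml {
    // Elements that HTML 4 declares as EMPTY (assumed list for this sketch).
    private static final Set<String> EMPTY = Set.of(
            "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param");

    // group 1: "/" for a closing tag, group 2: element name, group 3: everything up to '>'.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    // group 1: attribute name, group 2: optional "=value" part (quoted or unquoted).
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9_:.]*)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?");

    static String convert(String html) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String slash = m.group(1);
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String rest = m.group(3);
            // Self-close only opening tags of empty elements that are not already "<x/>".
            boolean selfClose = slash.isEmpty()
                    && EMPTY.contains(name)
                    && !rest.trim().endsWith("/");
            String tag = "<" + slash + name + lowercaseAttributeNames(rest)
                    + (selfClose ? " /" : "") + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Lowercases attribute names while leaving attribute values untouched.
    private static String lowercaseAttributeNames(String attrs) {
        Matcher m = ATTR.matcher(attrs);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = m.group(2) == null ? "" : m.group(2);
            String attr = m.group(1).toLowerCase(Locale.ROOT) + value;
            m.appendReplacement(out, Matcher.quoteReplacement(attr));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, NaiveXhtml.convert("<P>Hello<BR>world</P>") yields <p>Hello<br />world</p>. Anything beyond this, such as missing end tags or badly nested elements, still needs a real parser.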

mike · Oct 1, 2004

Can you define your definition of "perfectly",

"perfectly", that is:

the "XHTML file" (translated from HTML) can be identified by the Nokia Mobile Browser.

Is there any constructive idea?

thank you

Michael Borgwardt · Oct 1, 2004

mike said:
"perfectly", that is:

the "XHTML file" (translated from HTML) can be identified by the Nokia Mobile Browser.

That is no problem at all: translate every input document to this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
This used to be a HTML document, now it's valid XHTML!
</body>
</html>

Is there any constructive idea?

Think some more about your definition of "perfect translation".

Ann · Oct 1, 2004

He doesn't really mean perfect, he means it needs to work
on Nokia Mobile Browser, that's all.

John C. Bollinger · Oct 1, 2004

mike said:
Can HTML be translated to XHTML perfectly?!

_Valid_ HTML can be translated to XHTML without excessive difficulty.
The tool you already have probably does that job quite well. Most of
the HTML you come across on the web is not valid, however, even if you
excuse the absence (in most cases) of a DOCTYPE declaration.

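Or is there another similar tool in Java (such as an API) that
can do the work? Thank you.
-----------------------------------------------------------
I use the open-source JTidy, but I find that I cannot translate some
HTML pages with it.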

That's not surprising at all. There are some pages that some major
browsers can't translate -- which pages they are depends on which browser
you're talking about. When a browser runs into such a page it generally
makes its best guess, which is the most significant cause of
cross-browser compatibility issues. (Different browsers guess
differently, but page authors have a tendency to depend on the browser
always guessing the same way that their favorite browser does.)

John Bollinger

Tim Jowers · Oct 1, 2004

Michael Borgwardt said:
That is no problem at all: translate every input document to this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
This used to be a HTML document, now it's valid XHTML!
</body>
</html>

Think some more about your definition of "perfect translation".

I remember using the Java HTML document parser class and then writing
"handlers" for many special cases. I seem to remember having to
subprocess each element block for things like elements with no closing tags,
et cetera, as these are non-conforming. Fortunately, I was only concerned
with the textarea/input fields/buttons and could just pass
applets/objects through as they were. A tool to suck in data fields
from another app and allow the clients to build a site very quickly
based on their business model in the tool. Pretty cool in many ways:
http://www.speedbuildersystems.com ... look for ScreenGen. They make
tools for analysis for insurance industry so the concept was to roll
out basic enrollment validation rules as well as model the complex
rules on the mainframe and the web farm.

It'll be cool once many tools can work on the same HTML files. At that
time, last year, even FrontPage could not convert HTML to XHTML.

mike · Oct 4, 2004

regards:

Thank you for your valuable suggestions.

(1) In my Nokia 6600 test, your code works, as follows.
The Nokia 6600 can identify the following XHTML code.
------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
This used to be a HTML document, now it's valid XHTML!
</body>
</html>
---------------------------------------------------------------------
(2) I don't understand precisely how, from what you wrote, an HTML document
is translated into an XHTML file.

Now I have an HTML document. Let the body of that HTML document
be denoted by (Body part of a HTML document).

As I understand what you said, the XHTML file is as follows:
--------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>

(Body part of a HTML document).

</body>
</html>
--------------------------------------------------------------------
but my Nokia 6600 cannot identify the above XHTML file.

Did I misunderstand what you said?

Or am I missing something important?

Any constructive suggestions are welcome.

best wishes

Michael Borgwardt · Oct 5, 2004

mike said:
(2) I don't understand precisely how, from what you wrote, an HTML document
is translated into an XHTML file.

*groan* That's what I get for trying to be subtle.

Did I misunderstand what you said?

Yes. I wrote:

That is no problem at all: translate every input document to this:

...........................................
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
This used to be a HTML document, now it's valid XHTML!
</body>
</html>
............................................

And I really meant "translate every input document to EXACTLY THE
TEXT BETWEEN THE DOTTED LINES". That fits your requirements because
you have only specified that the output has to be valid XHTML, not
that it must have anything whatsoever to do with the input
document. That's exactly why I (and others) told you that you need to
rethink your requirements.

And that's not just snide hairsplitting. Presumably you want the
output document to be rendered exactly as the input document would
be. But that is practically impossible when the input is invalid
HTML (which many, if not most HTML pages found on the WWW are),
because rendering that involves guesswork by the browser, and that
guesswork differs a lot between browsers and how it is done exactly
is not known, at least in the case of the most popular browser,
Internet Explorer.

This is exactly why JTidy will not translate some HTML pages, as you
have noticed.

mike · Oct 6, 2004

Now I get what you mean, thank you anyway.
What I want is as follows:

an HTML document ---> after Tidy's translation ---> "an XHTML document"

I want the "XHTML document" to be identified by the Nokia 6600.

(1) I use the JTidy API, and find that the Nokia 6600 mobile browser cannot identify
the "XHTML document".

(2) But the Nokia 6600 can identify a normal XHTML file like the one you posted
BETWEEN THE DOTTED LINES.

...........................................
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Former HTML document</title>
</head>
<body>
This used to be a HTML document, now it's valid XHTML!
</body>
</html>
............................................

(3) I am curious:
Besides JTidy, can I produce an XHTML file that the Nokia 6600 can identify
using some other good HTML parser,

such as:

http://htmlparser.sourceforge.net/

Could someone give me a constructive suggestion?

best wishes

Michael Borgwardt · Oct 6, 2004

mike said:
Now I get what you mean, thank you anyway.
What I want is as follows:

an HTML document ---> after Tidy's translation ---> "an XHTML document"

I want the "XHTML document" to be identified by the Nokia 6600.

(1) I use the JTidy API, and find that the Nokia 6600 mobile browser cannot identify
the "XHTML document".

(2) But the Nokia 6600 can identify a normal XHTML file like the one you posted

Ah, that is a far clearer problem.

Well, thanks to the tight specification of XHTML, I think there are really
only three things that could be the reason for this behaviour:

- Jtidy has a bug
- The Nokia browser has a bug
- Jtidy's output conforms to or produces a different version of XHTML than the
Nokia browser expects

After a bit of googling, it seems that the Nokia browser actually supports
only XHTML MP (mobile profile), which is a *subset* of XHTML described here:
http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf

Unfortunately, that makes it a lot more difficult to fulfill your requirements.
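
One way to narrow down which of the three causes applies is to check that the converter's output is at least well-formed XML and to see which XHTML version its DOCTYPE declares. The following is only a sketch using the JDK's built-in XML parser. The class name DoctypeCheck and its methods are made up for illustration, and the public identifier "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" is an assumption taken from the XHTML MP 1.0 specification rather than from anything posted in this thread. Because the DTD is replaced by an empty stream, documents that use named entities such as &nbsp; will be rejected by this check.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper: reports which DOCTYPE a document declares.
final class DoctypeCheck {
    // Assumed public identifier of XHTML Mobile Profile 1.0.
    static final String XHTML_MP_PUBLIC_ID = "-//WAPFORUM//DTD XHTML Mobile 1.0//EN";

    /**
     * Returns the DOCTYPE public identifier, or null if the document has no DOCTYPE.
     * Throws IllegalArgumentException if the text is not well-formed XML.
     */
    static String publicIdOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Resolve every external entity (including the DTD) to an empty
            // stream so that nothing is fetched over the network.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            DocumentType doctype = doc.getDoctype();
            return doctype == null ? null : doctype.getPublicId();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** True if the document is well-formed and declares the XHTML MP 1.0 DOCTYPE. */
    static boolean declaresXhtmlMp(String xml) {
        return XHTML_MP_PUBLIC_ID.equals(publicIdOf(xml));
    }
}
```

If publicIdOf throws, the converter produced something that is not XML at all. If it returns an XHTML 1.0 Strict or Transitional identifier, the version mismatch is the more likely explanation. A matching DOCTYPE alone does not prove that the markup stays within the mobile profile's subset.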

Ashburton Industries · Oct 8, 2004

I'm afraid not... Sorry... but it can't...

Michael Borgwardt said:
Ah, that is a far clearer problem.

Well, thanks to the tight specification of XHTML, I think there are really
only three things that could be the reason for this behaviour:

- Jtidy has a bug
- The Nokia browser has a bug
- Jtidy's output conforms to or produces a different version of XHTML than
the Nokia browser expects

After a bit of googling, it seems that the Nokia browser actually supports
only XHTML MP (mobile profile), which is a *subset* of XHTML described here:
http://www.wapforum.org/tech/documents/WAP-277-XHTMLMP-20011029-a.pdf

Unfortunately, that makes it a lot more difficult to fulfill your
requirements.

mike · Oct 9, 2004

regards:

Is the following approach reasonable?

_NOTValid_ HTML
---> _Valid_ HTML (syntax-checking program)
---> _Valid_ XHTML (after translation)

thank you

best wishes
