Newbie needs help on MS WORD-generated HTML

Tomba

I own one web page, hosted here:
http://pubs.sdrt.org/
The HTML was created in WORD 97 by an HTML Save As of the .DOC source. The
page is three years old and I've just updated it.
The webmaster now insists that all pages on his site pass the test at
http://validator.w3.org/
My updated .HTM fails miserably, with thousands of errors.
I tried creating the HTML with WORD 2003: the .HTM file was five times
bigger and still had thousands of errors.
The source, and the WORD 97- and 2003-generated HTML, are here:
http://tomba.free.fr/S&DJRPUBS/
Is there a program that will convert my HTML to code that will pass the
test? Or perhaps a (free?) page generator I can download that will take a
WORD file as input? Any help will be gratefully received! Thanks, Tom
 
Arjen

Check out HTML Tidy:

http://tidy.sourceforge.net/
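A rough sketch of driving Tidy from a script, assuming the tidy command-line program from that site is installed and on the PATH. The function names and file names are hypothetical; the options shown (-q, -clean, -bare, --doctype, -o) should appear in the output of tidy -help, but builds differ, so check yours before relying on them.

```python
from __future__ import annotations

import shutil
import subprocess


def tidy_command(src: str, dest: str) -> list[str]:
    """Build a Tidy command line for cleaning up a Word-generated page."""
    return [
        "tidy",
        "-q",                   # quiet: suppress non-essential output
        "-clean",               # replace FONT, NOBR and CENTER tags with CSS
        "-bare",                # strip smart quotes, em dashes, etc.
        "--doctype", "strict",  # ask for a Strict doctype in the output
        "-o", dest,             # write the result to a new file, keep the original
        src,
    ]


def run_tidy(src: str, dest: str) -> int:
    """Run Tidy and return its exit status: 0 clean, 1 warnings, 2 errors."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy is not installed or not on the PATH")
    result = subprocess.run(tidy_command(src, dest), capture_output=True, text=True)
    # Tidy writes its warnings and error messages to stderr.
    print(result.stderr)
    return result.returncode
```

For example, run_tidy("pubs.htm", "pubs-tidy.htm") with made-up file names. Writing to a separate output file keeps the Word original untouched, and the tidied copy should still go through the validator, since Tidy may leave a few problems to fix by hand.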
 
Travis Newbury

I own one web page, hosted here: http://pubs.sdrt.org/
The HTML was created in WORD 97 by an HTML Save As of the .DOC source. The
page is three years old and I've just updated it.
The webmaster now insists that all pages on his site pass the test at http://validator.w3.org/
My updated .HTM fails miserably; thousands of errors....

Search for TIDY. It will (may) clean up the code for you.
 
Andy Dingley

I own one web page, hosted here: http://pubs.sdrt.org/
The HTML was created in WORD 97 by an HTML Save As of the .DOC source.

Word? You're doomed! :cool:
The webmaster now insists that all pages on his site pass the test

Tell him to bugger off. Validity is useful and a good thing to aim
at, but it's not something that should be mandated without also
offering a practical way to achieve it. Hey, it's steam railways --
one of the very worst fields of web design for having to work with the
Loud Confident and Wrong; semi-clued anoraks who know enough to be a
nuisance, not enough to be helpful. Cheer up, it could be IT lecturers
instead.

First of all, you ought to find yourself a text editor (try TextPad for
starters) and learn some _minimal_ HTML coding. You can't achieve
useful validity with a purely WYSIWYG tool. A good book is "Head First
HTML with XHTML & CSS", and possibly one of Elizabeth Castro's, but anything
else is likely to do more harm than good. You will need to work with the
code directly, though (it's not as hard as people make out).

Secondly, try opening it with a recent version of Word and saving it
as "HTML Filtered" rather than "HTML". This isn't valid, but it's
closer to it than Word ever managed previously.

You're lucky that your output came out of Word 97 and not a more
recent Word. Word 97 makes bad HTML; the more recent versions (with
the mso: namespacing) make stuff that deliberately bears little
relationship to HTML at all. Those are almost unworkable! Tidy will be
useful here -- usually it isn't.

The recently updated HTML validator extension for Firefox
http://users.skynet.be/mgueury/mozilla/
incorporates both validators and Tidy. Using Tidy's simple "Just fix
the damn thing" option produces a page that's little changed but is
valid (I've just tried it). That's enough for you.

If you want to do more, then learn some trivial CSS, delete all the
<font> tags, all the <p> tags inside the table and then apply a few
simple classes and CSS style rules to the colour-highlighted blocks.
Add the HTML 4.01 Strict doctype declaration to the top and you're
well sorted.
 
Tina Peters

Tomba said:
I own one web page, hosted here:
http://pubs.sdrt.org/
The HTML was created in WORD 97 by an HTML Save As of the .DOC source. The
page is three years old and I've just updated it.
The webmaster now insists that all pages on his site pass the test at
http://validator.w3.org/


That is such a simple webpage, it shouldn't be that hard to recreate it with
valid HTML. I'm sure someone here would even be willing to give you a valid
template to work with. The worst part is going to be entering all of the
data back into the tables. Hire some kid to do the data entry for $50. ;-)

--Tina
 
Toby A Inkster

Tina said:
The worst part is going to be entering all of the data back into the
tables.

If he's got it in Word, then he can get the data into Excel. From there,
into another spreadsheet package with better HTML output (e.g. Gnumeric)
and then a few search-and-replace passes to get a nice table. I've done
something similar many a time.
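The search-and-replace step can also be done with a short script instead. A minimal sketch in Python, assuming the table has been saved from the spreadsheet as CSV with a header row; the function name and the "biblio" class are hypothetical, and this is an alternative to a spreadsheet's own HTML export, not a description of what Gnumeric produces.

```python
import csv
import io
from html import escape


def csv_to_table(csv_text: str, css_class: str = "biblio") -> str:
    """Turn spreadsheet rows saved as CSV into a plain HTML table.

    The first row becomes the header row; every cell is escaped so that
    characters such as the & in "S&DJR" come out as &amp;.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [f'<table class="{escape(css_class)}">']
    lines.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Using a CSS class on the table, rather than per-cell <font> tags, keeps the styling in one stylesheet rule.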

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/CSS/Javascript/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
 
Tina Peters

Toby A Inkster said:
If he's got it in Word, then he can get the data into Excel. From there,
into another spreadsheet package with better HTML output (e.g. Gnumeric)
and then a few search-and-replace passes to get a nice table. I've done
something similar many a time.


Ahh, yes. He might even go one better and get some sort of web-based
script with a database backend, for "on the fly" editing in the future. A
simple PHP/MySQL solution would work nicely.
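The suggestion is PHP with MySQL; as a swapped-in illustration of the same idea, here is a minimal sketch in Python using the standard library's sqlite3 module instead. The table name pubs, its columns and the helper functions are all hypothetical, and the web form for editing is left out: the point is only that the entries live in a database and the HTML table is generated from it, so the markup stays valid however often the data changes.

```python
import sqlite3
from html import escape


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical bibliography database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pubs (title TEXT NOT NULL, author TEXT, year INTEGER)"
    )
    return conn


def add_pub(conn: sqlite3.Connection, title, author, year) -> None:
    # Parameterised query: values are never pasted into the SQL text.
    conn.execute(
        "INSERT INTO pubs (title, author, year) VALUES (?, ?, ?)", (title, author, year)
    )
    conn.commit()


def render_table(conn: sqlite3.Connection) -> str:
    """Generate the HTML table from whatever is currently in the database."""
    rows = conn.execute("SELECT title, author, year FROM pubs ORDER BY year, title")
    out = ["<table>", "<tr><th>Title</th><th>Author</th><th>Year</th></tr>"]
    for title, author, year in rows:
        cells = (title, author or "", "" if year is None else str(year))
        out.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)
```

A PHP/MySQL version would follow the same pattern: one table of entries, parameterised inserts, and a page that escapes each value as it builds the rows.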

--Tina
 
Andy Dingley

That is such a simple webpage, it shouldn't be that hard to recreate it with
valid HTML.

This is such an almost-valid webpage that it's only a few minutes' work
to hand-edit it to be valid.

With one hand.
 
Tina Peters

Andy Dingley said:
This is such an almost-valid webpage that it's only a few minutes' work
to hand-edit it to be valid.

With one hand.

I didn't look at the code, since he said there were 1000s of errors and
(ack) it was made with Word. Maybe he could pay someone to make it valid. If, as
you say, it's just a couple of minutes of work...it shouldn't cost more than the
price of a decent pizza. :)


--Tina
 
Toby A Inkster

Tomba

I ran the file through Tidy and the validation errors went from thousands
to 1, which was easily fixed by removing what I guess was an old,
now-redundant tag. I have a clean file!
My thanks to all who responded.
 
Nikita the Spider

Toby A Inkster said:
I would imagine so -- though I guess you probably need Apple's X server
installed. (ISTR that the X server is somewhere on the Apple website as a
free download.)

You are correct; X is a free download. dorayme, you'll probably want to
use Fink to install Gnumeric. I tried on my own not using Fink and it
wasn't pretty. It's a shame because I've heard very good things about
Gnumeric.
 
J.O. Aho

Toby said:
I would imagine so -- though I guess you probably need Apple's X server
installed. (ISTR that the X server is somewhere on the Apple website as a
free download.)

On OS X for Intel, you can choose to install it from the install CD. I'm not
sure whether that's the case on the PowerPC version; I never played with it,
since Apple don't allow/support other PowerPC hardware to run OS X.
 
dorayme

Toby A Inkster said:
I would imagine so -- though I guess you probably need Apple's X server
installed. (ISTR that the X server is somewhere on the Apple website as a
free download.)

I installed a server a while back, with PHP running too. It was
fun. It came packaged with the OS but needed to be dug out and
configured. OK, I will take another look at Gnumeric soon.
 
dorayme

Nikita the Spider said:
You are correct; X is a free download. dorayme, you'll probably want to
use Fink to install Gnumeric. I tried on my own not using Fink and it
wasn't pretty. It's a shame because I've heard very good things about
Gnumeric.

Well, this is something I will put on the list too. Thanks for
the tip, Spider. Do you then use it now?

Fink! In the Wizard of Id, does not a peasant every now and then
scoot past the palace and shout out real loud or throw a brick in
with a message that reads, "The King is a fink!"?
 
dorayme

"J.O. Aho said:
On OS X for Intel, you can choose to install it from the install CD. I'm not
sure whether that's the case on the PowerPC version; I never played with it,
since Apple don't allow/support other PowerPC hardware to run OS X.

Pre-Intel is me; Intel may be a long way off, because I have my eye
on a G5 next (they will have to get irresistibly cheap).
That's a thought, to look on my install DVD... X has been
so rock solid that the DVD is now in a huge pile of CDs and DVDs under
my desk...
 
John Hosking

I ran the file through Tidy and the validation errors went from thousands
to 1, which was easily fixed by removing what I guess was an old,
now-redundant tag. I have a clean file!
My thanks to all who responded.

Well, if you're happy, I'm happy, but the page you mentioned could still
take some going over. For example, it starts with

<B><FONT SIZE=5><P>Somerset &amp; Dorset Joint Railway (S&amp;DJR)
Bibliography</P>
</B></FONT>

which features misnested tags (and some questionable markup). Better:

<h1>Somerset &amp; Dorset Joint Railway (S&amp;DJR) Bibliography</h1>

styled per default or as you like.
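To see what "misnested" means mechanically, here is a small sketch, assuming Python 3's standard html.parser: it keeps a stack of open elements and complains whenever an end tag closes something other than the innermost open element. The class and function names are hypothetical, and it is no substitute for the validator: it ignores content-model rules such as a <p> sitting inside <font>.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML 4.01, so they are not stacked.
VOID = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
        "input", "isindex", "link", "meta", "param"}


class NestingChecker(HTMLParser):
    """Report end tags that close elements in the wrong order.

    Elements with optional end tags (p, li, td...) left unclosed will
    also be reported, so treat the output as hints, not a verdict.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> opens and closes itself; nothing to track

    def handle_endtag(self, tag):
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after `tag` and still open is misnested.
        while self.stack[-1] != tag:
            inner = self.stack.pop()
            self.problems.append(f"</{tag}> closes while <{inner}> is still open")
        self.stack.pop()


def check_nesting(html: str) -> list:
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Run on the snippet above, it reports two problems: </b> closing while <font> is still open, then a stray </font>. The <h1> version comes back clean.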

Not as much of a size saving per minute of effort as running HTML Tidy,
but something to know about.
 
John Hosking

Sorry, I guess you didn't mean that you had uploaded the tidied file. I
see that that page (http://pubs.sdrt.org/) still shows
"Failed validation, 1484 errors" on the validator.
 
Toby A Inkster

J.O. Aho said:
On OS X for Intel, you can choose to install it from the install CD. I'm not
sure whether that's the case on the PowerPC version; I never played with it,
since Apple don't allow/support other PowerPC hardware to run OS X.

Yep -- it's on the 10.4 install discs (PPC and Intel), but I don't think
it's on the discs for earlier versions of OS X, hence my recommendation
of the Apple website.
