ms doc to web page

dorayme

Too often, I am obliged (don't ask why) to preserve the format of an MS Word
doc when it goes on the web. I get the doc and I make an HTML file to suit. I
used to do it by hand; the result was a good translation from the word
processor doc on screen to the web browser, and it validated and so on. Maybe
not as nice as you guys would like, but it passed. But it is far too much work
and comes up quite regularly, so I have taken to simply saving from within
Word as an HTML file. It takes a few seconds instead of hours and is too great
a temptation. It works and produces an almost exact match to the Word doc when
shown in a browser. I was amazed. (I tried it in Mozilla, IE and iCab, and I am
assured it is OK on PCs with the regular browsers.) But the code looks so
crappy to me (Mac, Office 2001). It would never validate in a million years,
and does not. I am amazed no one has told me it does not work right on their
machines around the world...

My question is: is there any better WYSIWYG for this that would produce the
goods but with better code? I have Dreamweaver on a PC; maybe I should use it.
I have always hated these things, i.e. the WYSIWYGs ... and PCs, but I am
mellowing on these :)

dorayme
 
Spartanicus

[Converting Word documents to html]
My question is: is there any better WYSIWYG for this that would produce the
goods but with better code?

A Google search will probably throw up some specialised software.

One approach is to use the MS Word cleanup routine found in HTMLTidy. It
probably won't result in valid html, but a lot of the junk will be
removed.

HTMLTidy is a command line utility; a GUI also exists, and several
editors have incorporated it (HTML Kit, Topstyle etc.).
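
For reference, the command line route might look roughly like the sketch below. It assumes a `tidy` executable is on the PATH; the function names and the file names in the usage note are hypothetical. `word-2000`, `bare`, `clean`, `drop-proprietary-attributes` and `force-output` are Tidy configuration options, but how much each one removes varies between Tidy versions, so treat the output as a starting point rather than as valid HTML.

```python
import shutil
import subprocess


def tidy_word_command(src, dst):
    """Build a Tidy command line with the Word cleanup options switched on."""
    return [
        "tidy",
        "--word-2000", "yes",                    # strip Word 2000 specific markup
        "--bare", "yes",                         # strip more Microsoft markup, nbsp -> space
        "--clean", "yes",                        # turn presentational tags into style rules
        "--drop-proprietary-attributes", "yes",  # remove non-standard attributes
        "--force-output", "yes",                 # write output even if errors are reported
        "-o", dst,                               # output file
        src,                                     # input file saved from Word
    ]


def run_tidy(src, dst):
    """Run Tidy and return (exit status, report text)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("no tidy executable found on PATH")
    # Tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # status is expected here and is not treated as a failure.
    result = subprocess.run(
        tidy_word_command(src, dst), capture_output=True, text=True
    )
    return result.returncode, result.stderr
```

Calling `run_tidy("word-export.html", "cleaned.html")` would return Tidy's exit status together with its warning and error report, which is worth reading before trusting the cleaned file.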
 
dorayme

From: Spartanicus said:
Newsgroups: alt.html
Date: Sat, 15 Jan 2005 11:39:31 +0000
Subject: Re: ms doc to web page


[Converting Word documents to html]
My question is: is there any better WYSIWYG for this that would produce the
goods but with better code?

A Google search will probably throw up some specialised software.

One approach is to use the MS Word cleanup routine found in HTMLTidy. It
probably won't result in valid html, but a lot of the junk will be
removed.

HTMLTidy is a command line utility; a GUI also exists, and several
editors have incorporated it (HTML Kit, Topstyle etc.).

With the "Tidy HTML" plug-in I have for my BBLite editor, I am afraid so many
errors are registered that it would be easier to hand code from scratch.

But you seem to be pointing to a part of it I do not have. Interesting and
thank you, I should perhaps investigate in case of future need.

But I have discovered a better method for the present task: I am now just
making the original doc available as a doc downloadable from any browser and
openable in Word. This will do.

dorayme
 
Spartanicus

dorayme said:
With the "Tidy HTML" plug-in I have for my BBLite editor, I am afraid so many
errors are registered that it would be easier to hand code from scratch.

But you seem to be pointing to a part of it I do not have. Interesting and
thank you, I should perhaps investigate in case of future need.

Tidy doesn't just report errors; it can also rewrite and remove code.
The Word cleanup routine should be available in most versions of Tidy.
An example of Topstyle's Tidy GUI config:
http://www.spartanicus.utvinternet.ie/test/tidy.png

But I have discovered a better method for the present task: I am now just
making the original doc available as a doc downloadable from any browser and
openable in Word. This will do.

This requires users to have a copy of MS Word installed. Proprietary
document formats such as Word and Acrobat are not Internet friendly. Open,
non-proprietary formats should be used. HTML is the format to use on the
Internet unless pagination is essential.
 
dorayme

From: Spartanicus said:
Newsgroups: alt.html
Date: Tue, 18 Jan 2005 10:12:43 +0000
Subject: Re: ms doc to web page



Tidy doesn't just report errors; it can also rewrite and remove code.
The Word cleanup routine should be available in most versions of Tidy.
An example of Topstyle's Tidy GUI config:
http://www.spartanicus.utvinternet.ie/test/tidy.png


This requires users to have a copy of MS Word installed. Proprietary
document formats such as Word and Acrobat are not Internet friendly. Open,
non-proprietary formats should be used. HTML is the format to use on the
Internet unless pagination is essential.

Yes, I understand. It is not a general audience but a restricted academic
one, so I took a small risk. I think they all have MS Word or some facility to
open the text.

I know Tidy does not just report errors. But mine, at least, asks for various
things to be fixed before it can handle the rest. These are numerous and, I
have been thinking, difficult: things it does not understand or "recognise",
unknown attributes and so on. Maybe I had better look into a more modern Tidy?

dorayme

[here is a bit of the report:

"line 1 column 1 - Warning: unknown attribute "xmlns:w"
line 1 column 1 - Warning: unknown attribute "xmlns:o"
line 44 column 1 - Warning: <style> lacks "type" attribute
line 144 column 27 - Error: <o:p> is not recognized!
line 144 column 27 - Warning: discarding unexpected <o:p>
line 144 column 31 - Warning: discarding unexpected </o:p>
line 146 column 79 - Error: <o:p> is not recognized!
line 146 column 79 - Warning: discarding unexpected <o:p>
line 146 column 84 - Warning: discarding unexpected </o:p>
line 148 column 79 - Error: <o:p> is not recognized!
line 148 column 79 - Warning: discarding unexpected <o:p>
line 148 column 84 - Warning: discarding unexpected </o:p>
line 153 column 17 - Error: <o:p> is not recognized!
line 153 column 17 - Warning: discarding unexpected <o:p>
line 153 column 21 - Warning: discarding unexpected </o:p>"

etc and finally


"Document content looks like XHTML 1.0 Transitional
286 warnings/errors were found!

This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version."]
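
The constructs named in a report like this (the `xmlns:` declarations, `<o:p>` tags and a `<style>` without `type`) can also be stripped before Tidy ever sees the file. The following is only a rough sketch of that idea using regular expressions, not part of Tidy itself; the function name `strip_word_markup` is hypothetical, and regular expressions are a blunt tool for HTML, so the result should be checked by eye or with a validator.

```python
import re


def strip_word_markup(html):
    """Remove a few Word-specific constructs that an older Tidy may reject."""
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(
        r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.DOTALL
    )
    # Drop namespaced tags such as <o:p> and </o:p>, keeping any text inside
    html = re.sub(r"</?[A-Za-z][A-Za-z0-9]*:[^>]*>", "", html)
    # Drop prefixed namespace declarations such as xmlns:o="..." or xmlns:w="..."
    html = re.sub(r'\s+xmlns:\w+="[^"]*"', "", html)
    # Give bare <style> elements the "type" attribute the report asks for
    html = re.sub(r"<style>", '<style type="text/css">', html, flags=re.IGNORECASE)
    return html
```

Running the saved Word file through a function like this first should leave far fewer of the "not recognized" errors, after which Tidy has a better chance of producing a tidied version.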
 
Mark Parnell

Previously in alt.html said:
This requires users to have a copy of MS Word installed. Proprietary
document formats such as Word and Acrobat are not Internet friendly. Open,
non-proprietary formats should be used.

Not entirely true. OpenOffice.org (and probably others) can read Word
documents. Not saying I don't agree with the sentiment, but you don't
actually *need* MS Word to open a Word document.
 
dorayme

From: Mark Parnell said:
Newsgroups: alt.html
Date: Wed, 19 Jan 2005 10:50:08 +1100
Subject: Re: ms doc to web page



Not entirely true. OpenOffice.org (and probably others) can read Word
documents. Not saying I don't agree with the sentiment, but you don't
actually *need* MS Word to open a Word document.


Certainly, before I had Office, I was able to open Office docs in a variety
of ways.

Anyway, I should have said previously that my putting it up as a doc for
download rather than display was a replacement for a previous practice of
emailing hundreds of people the doc (they were able to open an attachment
they had subscribed to, so they would be able to open the same file via a
website). It was my solution to the scandalous HTML code generated by MS Word.

I was hoping to prepare for when I really have to make such docs as good web
pages... My main point or request was about how you guys and gals approach
turning an Office doc into a reasonably kosher web page without doing it by
hand like I have done...

dorayme
 
