Word HTML

Saber

No, I don't use M$ Word to make web sites, so don't just pounce on me
for that right away.
That being said, one of my professors does. She asked the class
whether anyone could fix up her site, make it look better and such.
I figured, "Sure, why not? I can do HTML and a good amount of CSS
with little problem." She doesn't want anything extremely fancy; just
better than a Word document turned into a web page.
Then I looked at the code...
<rant>
WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
It is worse than if I took a can of alphabet soup, dumped it on a table
and added in some <'s >'s and "'s randomly.
</rant>
Is there any easy(ish) way to make it nice code-wise so I can work with
it, or is it a lost cause and I just have to redo everything? I wouldn't
normally mind redoing it, but it's Marine Bio class and each page is
fairly long and well, I'm kinda OCD with formatting a bit and I know it
will take me more time than I can afford. Justifying a bad grade on a
test by showing a new web site isn't the best strategy in the world. :)

Oh, and I tried using Dreamweaver to reformat the code into something
better that I could work with and fix up from there. It crashed my
computer. My $7000 quad-core 3.3GHz, 8GB RAM computer. That's... not
supposed to happen. I've reformatted stuff before (albeit, stuff I had
made myself a few years back and it needed an update) and it didn't crash.
 
Els

Saber said:
Then I looked at the code..
<rant>
WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
It is worse than if I took a can of alphabet soup, dumped it on a table
and added in some <'s >'s and "'s randomly.
</rant>
Is there any easy(ish) way to make it nice code wise so I can work with
it or is it a lost cause and I just have to redo everything. I wouldn't
normally mind redoing it, but it's Marine Bio class and each page is
fairly long and well, I'm kinda OCD with formatting a bit and I know it
will take me more time than I can afford. Justifying a bad grade on a
test by showing a new web site isn't the best strategy in the world. :)

You could try HTML Tidy:
<http://www.w3.org/People/Raggett/tidy/>

I've also found it useful to copy and paste from the browser into the text
editor, and then add the HTML elements myself. This may not be as
quick if there are lots of links or visual formatting, but works
wonders for long pages with simple headings, paragraphs and lists.
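
For pages that are mostly plain paragraphs, the "paste the text, then add the elements" step can be partly automated. What follows is only a minimal sketch in Python, assuming the pasted text separates paragraphs with blank lines; the function name `text_to_paragraphs` is made up for illustration, and headings and lists would still need to be marked up by hand.

```python
import re
from html import escape


def text_to_paragraphs(text):
    """Wrap each blank-line-separated block of pasted text in a <p> element."""
    # Normalise Windows line endings, then split on one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = []
    for block in blocks:
        # Collapse internal line breaks and runs of spaces into single spaces.
        collapsed = " ".join(block.split())
        if collapsed:
            # Escape &, < and > so the pasted text is safe inside HTML.
            paragraphs.append(f"<p>{escape(collapsed, quote=False)}</p>")
    return "\n".join(paragraphs)


pasted = "Kelp & algae\n\n\nTide  pools\nare fun\n"
print(text_to_paragraphs(pasted))
```

The output is one `<p>` per block, which is a reasonable starting point before adding headings and lists manually.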
 
Jukka K. Korpela

Saber said:
No, I don't use M$ Word to make web sites, so don't just pounce on me
for that right away.

Actually, Microsoft Word can be very useful for making web sites. For
example, after creating a draft page, open it in Word and use its nice
spelling and grammar checks. (Yes, I know, that's the kind of thing that
people _don't_ use Word for in web authoring.)

Saber said:
That being said, one of my professors does. She made a general
announcement to the class if anyone can fix up her site, make it look
better and the such. I figured, "Sure, why not? I can do HTML and a
good amount of CSS with little problem."

Cleaning it up could be an interesting exercise but it's really hard work.
It would usually be more efficient to redesign the site from scratch.

Saber said:
Is there any easy(ish) way to make it nice code wise so I can work
with it or is it a lost cause and I just have to redo everything.

Save it as filtered HTML (via File/Save As; old versions of Word may need a
plugin for that), then delete the style sheet and write a simple nice style
sheet instead. After this, the main headache is usually extra markup that
makes the source less legible and some nasty markup that sets table cell
sizes in pixels. But in a simple case, the latter problem is fixed
by zapping with the following magic CSS wand:
td { width: auto !important; }
 
Saber

Els said:
You could try HTML Tidy:
<http://www.w3.org/People/Raggett/tidy/>

D'oh! I have that bookmarked too. Silly me.

Els said:
I've also found it useful to copy paste from the browser into the text
editor, and then adding the HTML elements myself. This may not be as
quick if there are lots of links or visual formatting, but works
wonders for long pages with simple headings, paragraphs and lists.

Didn't think of that either. I can do that with some of the pages, but
some have a lotta links. Still saves me some time though.

Thanks! :)
 
C A Upsdell

Saber said:
Didn't think of that also. I can do that with some of the pages, but
some have a lotta links. Still saves me some time though.

Some (all?) browsers let you copy the source which produced a selected
piece of a web page, so if you use this for text, you can paste the text
complete with the HTML for the links. Probably a few &nbsp; characters
to deal with, but this should be okay so long as you confine yourself to
copying text.
 
Saber

Jukka said:
Actually, Microsoft Word can be very useful for making web sites. For
example, after creating a draft page, open it in Word and use its nice
spelling and grammar checks. (Yes, I know, that's the kind of thing that
people _don't_ use Word for in web authoring.)

That is a valid point. I will copy and paste paragraphs and sentences
sometimes into Word to double-check stuff. But as the only tool, it,
well, is lacking.

Jukka said:
Cleaning it up could be an interesting exercise but it's really hard
work. It would usually be more efficient to redesign the site from scratch.

The main page is actually kinda funny, in a sad sort of way. It is
called mb20.htm. Not index, but that's forgivable, it's user error, not
Word error. She has a Ph.D. in unicellular protists in marine
environments, but that doesn't mean she knows about web structure. The
funny part is, even with all of the inline styling, there is a
sub-folder called mb20_files and a stylesheet in that with more of what
looks like the same styles. Even though they are written out in full
for basically every line in the web page.

Jukka said:
Save it as filtered HTML (via File/Save As; old versions of Word may
need a plugin for that), then delete the style sheet and write a simple
nice style sheet instead. After this, the main headache is usually extra
markup that makes the source less legible and some nasty markup that
sets table cells sizes with pixel settings. But in a simple case, the
latter problem is fixed by zapping with the following magic CSS wand:
td { width: auto !important; }

Is saving as filtered HTML possible if I only have the HTML files? I'm
just using what's already on the site; I don't have the original .doc files.
 
Saber

C said:
Some (all?) browsers let you copy the source which produced a selected
piece of a web page, so if you use this for text, you can paste the text
complete with the HTML for the links. Probably a few &nbsp; characters
to deal with, but this should be okay so long as you confine yourself to
copying text.

Normally I would do that. But the links are in a UL, in a table, with 4
style elements in each <a> tag. The time it would take to trim it down
would end up being the same as typing them up again, or right-clicking
each link on the page, copying the link location and pasting that into
the HTML.
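
One way to avoid both the trimming and the retyping is to pull just the href and the link text out of the exported markup and write a fresh list from them. The following is a minimal sketch using only Python's standard library; the sample markup, file names, and the names `LinkCollector`, `extract_links` and `as_clean_list` are made up for illustration and are not taken from the actual pages.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> elements, ignoring all styling."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace (including non-breaking spaces) in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_links(html_source):
    """Return a list of (href, text) pairs found in html_source."""
    parser = LinkCollector()
    parser.feed(html_source)
    parser.close()
    return parser.links


def as_clean_list(links):
    """Render (href, text) pairs as a plain <ul> with no inline styling."""
    items = [
        f'  <li><a href="{escape(href)}">{escape(text)}</a></li>'
        for href, text in links
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical sample in the style of Word's export.
sample = """<table><tr><td><ul>
<li><a href="tides.htm" style='color:blue'><span style='font-size:12.0pt'>Tides</span></a></li>
<li><a href="plankton.htm" style='color:blue'>Plankton   notes</a></li>
</ul></td></tr></table>"""

print(as_clean_list(extract_links(sample)))
```

The styling is then left entirely to the style sheet, so the per-link attributes never need to be edited by hand.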
 
Adrienne Boswell

Gazing into my crystal ball I observed Saber
Normally I would do that. But the links are in a UL, in a table with
4 style elements in each <a> tag. The time it would take to trim it
down would end up being the same as typing them up again or
right-click the link on the page and copy link location and paste that
into the HTML.

Four style attributes in each a element?! Time for CSS, and maybe a
little server side script.
 
Jukka K. Korpela

Saber said:
The main page is actually kinda funny, in a sad sort of way. It is
called mb20.htm.

I think the name comes simply and automatically from mb20.doc (or
mb20.docx), which was the user-selected name for the Word file.

Saber said:
The funny part is, even with all of the inline styling, there is a
sub-folder called mb20_files and a stylesheet in that with more of
what looks like the same styles.

Word is a bit odd, and differently odd in different versions.

Saber said:
Is saving as filtered HTML possible if I only have the HTML files?

A simple test reveals that it is, at least in Word 2007 (and most probably
in earlier versions as well).

I picked up a Word file I had created on Word 2007 in "compatibility mode"
(i.e., supposed to be readable on old versions from Word 97 to Word 2003). I
first saved it "as Web page", resulting in a 40 kB file with loads of Word
junk, plus a folder with several files, including two image files for each
image in the Word document, two obscure XML files, and a theme file.

Then I opened the HTML document in Word 2007, via File/Open, and saved it as
filtered Web page. This resulted in a 14 kB file with less junk and a folder
with just one image file per image in the original Word document.
There's still junk like messy Microsoft CSS but mostly concentrated in one
<style> element which can easily be removed or shrunk. And there's still
strange stuff like
<body lang=EN-US>
for a document in Finnish but
<p class=MsoNormal><span lang=FI>paragraph text</span></p>
for each paragraph, etc. Foolish but harmless. And there's e.g.
<h1><span lang=FI style='font-size:14.0pt'>heading text</span></h1>
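which is not so harmless, since overriding this in CSS just with
h1 { font-size: 130% }
or something like that won't suffice: the inline style is set on the span
itself, so the span's own font-size wins over the value inherited from h1.
Because an inline style also beats any ordinary style sheet rule, you
would have to add
h1 span { font-size: 100% !important }
(or clean up the markup).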
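
For the "clean up the markup" route, a small script can do the repetitive part. This is only a sketch using Python's standard library, under loud assumptions: it treats every `style`, `lang` and `class` attribute and every `span` wrapper as Word noise, which may be too aggressive for pages that use them deliberately, and it silently drops comments and the doctype. The names `WordMarkupCleaner` and `clean_word_html` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

DROP_ATTRS = {"style", "lang", "class"}  # assumption: attributes treated as Word noise
UNWRAP_TAGS = {"span"}                   # tags removed while keeping their content


class WordMarkupCleaner(HTMLParser):
    """Re-emit HTML without the attributes and wrapper tags listed above."""

    def __init__(self):
        # Keep entity and character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in UNWRAP_TAGS:
            return
        kept = [(k, v) for k, v in attrs if k not in DROP_ATTRS]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br />: emit the start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def clean_word_html(source):
    """Return source with the assumed Word noise stripped out."""
    cleaner = WordMarkupCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_word_html("<h1><span lang=FI style='font-size:14.0pt'>heading text</span></h1>"))
print(clean_word_html("<p class=MsoNormal><span lang=FI>paragraph text</span></p>"))
```

With the inline font sizes gone, a plain `h1 { font-size: 130% }` rule in the style sheet takes effect without needing any override for the span.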
 
cwdjrxyz

Saber said:
<rant>
WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
It is worse than if I took a can of alphabet soup, dumped it on a table
and added in some <'s >'s and "'s randomly.
</rant>

It gets much, much worse. Just view Microsoft's main page
at http://www.microsoft.com/en/us/default.aspx and run it through the W3C
HTML and CSS validators. This page changes a bit fairly often.

The page is written as XHTML Transitional and uses the correct W3C
Doctype for this. However, it is served as text/html. That is just as
well, because if the page were served correctly as true XHTML with the
required MIME type of application/xhtml+xml, it could not be viewed on
any IE browser, at least through IE7. All you would get would be an
error message. However, if all errors were corrected, the page could be
viewed on most other recent browsers when served as true XHTML.

The page is not exceptionally long by major company standards. However,
it has 176 HTML errors and 36 warnings. It also has 78 CSS errors. Most
of these errors would have to be corrected for the page to be even
viewable on XHTML-enabled browsers if it were served as true XHTML.
Otherwise you would only get error messages from the very strict XML parser
used for true XHTML pages served properly.
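
The strictness of an XML parser is easy to see with any XML library. A minimal sketch with Python's standard library, using two made-up fragments; note that it checks well-formedness only, which is a narrower test than what the W3C validators report, so it illustrates the parser behaviour rather than reproducing the error counts above.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if markup is well-formed XML, False if the strict parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments: the second leaves <br> and <p> unclosed, as tag soup often does.
well_formed = "<html><body><p>One<br/>two</p></body></html>"
tag_soup = "<html><body><p>One<br>two</body></html>"

print(parses_as_xml(well_formed))  # True
print(parses_as_xml(tag_soup))     # False
```

A text/html parser would quietly recover from the second fragment, which is why the same page can display fine when served as text/html.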

I don't know who wrote the code for this page. Hopefully it was
contracted out and does not reflect the HTML and CSS knowledge of
the Microsoft staff who write code for their browsers. I think most who
read this group would have to work very hard to come up with a page of
their own with so many errors. In my opinion, which likely does not
matter to Microsoft, this page is a disgrace and reflects very poorly
on Microsoft. If such a page were written in an HTML class at school,
it would deserve the lowest possible grade and a failing mark for the course.
 
Bergamot

Saber said:
The main page is actually kinda funny, in a sad sort of way. It is
called mb20.htm.
funny part is, even with all of the inline styling, there is a
sub-folder called mb20_files and a stylesheet in that

Sounds like it was created via the "save as web page, complete" feature
of a browser. If that was done using IE, the resulting code will be
ugly because IE "optimizes" it for its own purposes. Other browsers
mangle the code less.
 
Matt-the-Hoople

Saber said:
Is there any easy(ish) way to make it nice code wise so I can work
with it or is it a lost cause and I just have to redo everything.

Try opening the document in Open Office (Writer) and then saving it as HTML
from there. Open Office is an open source alternative to MS Office, and is
available from http://openoffice.org/

I've never actually tried it, and I don't have any MSWord docs to try it
with, but I'd be willing to bet that most of the MSGarbage is gone. :)

cheers

- M


--
# http://www.nofccainway.com
# nofccainway@_your_clothes_nofccainway.com
# remove _your_clothes_ when emailing me

# d o n ' t b e l i e v e e v e r y t h i n g y o u t h i n k
 
