Word HTML

Discussion in 'HTML' started by Saber, Sep 30, 2008.

  1. Saber

    No, I don't use M$ Word to make web sites, so don't just pounce on me
    for that right away.
    That being said, one of my professors does. She made a general
    announcement to the class asking if anyone could fix up her site and make
    it look better. I figured, "Sure, why not? I can do HTML and a
    good amount of CSS with little problem." She doesn't want anything
    extremely fancy; just better than a Word document turned into a web page.
    Then I looked at the code...
    <rant>
    WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
    It is worse than if I took a can of alphabet soup, dumped it on a table
    and added some <'s, >'s and "'s at random.
    </rant>
    Is there any easy(ish) way to clean up the code so I can work with it,
    or is it a lost cause and I just have to redo everything? I wouldn't
    normally mind redoing it, but it's for my Marine Bio class, each page is
    fairly long, and, well, I'm kinda OCD about formatting, so I know it
    will take me more time than I can afford. Justifying a bad grade on a
    test by showing off a new web site isn't the best strategy in the world. :)

    Oh, and I tried using Dreamweaver to reformat the code into something
    better that I could work with and fix up from there. It crashed my
    computer. My $7000 quad-core 3.3GHz, 8GB RAM computer. That's... not
    supposed to happen. I've reformatted stuff before (albeit stuff I had
    made myself a few years back that needed an update) and it didn't crash.

    --
    Saber
     
    Saber, Sep 30, 2008
    #1

  2. Els

    Saber wrote:

    > Then I looked at the code..
    > <rant>
    > WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
    > It is worse than if i took a can of alphabet soup, dumped it on a table
    > and added in some <'s >'s and "'s randomly.
    > </rant>
    > Is there any easy(ish) way to make it nice code wise so I can work with
    > it or is it a lost cause and I just have to redo everything. I wouldn't
    > normally mind redoing it, but it's Marine Bio class and each page is
    > fairly long and well, I'm kinda OCD with formatting a bit and I know it
    > will take me more time than I can afford. Justifying a bad grade on a
    > test by showing a new web site isn't the best strategy in the world. :)


    You could try HTML Tidy:
    <http://www.w3.org/People/Raggett/tidy/>

    I've also found it useful to copy and paste from the browser into the
    text editor and then add the HTML elements myself. This may not be as
    quick if there are lots of links or visual formatting, but it works
    wonders for long pages with simple headings, paragraphs and lists.
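
    A minimal Python sketch of the copy-and-paste approach. It assumes the
    pasted text uses blank lines between blocks, "- " at the start of list
    items, and short one-line headings without a final period. These
    conventions are assumptions, not something the browser guarantees. The
    hypothetical text_to_html() function only escapes the text and wraps it
    in elements, so links still have to be added by hand.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Wrap plain text copied from a browser in simple HTML elements.

    Assumed (hypothetical) conventions for the pasted text:
      * blocks are separated by blank lines;
      * a block whose lines all start with "- " is a bulleted list;
      * a single-line block that does not end with "." is a heading;
      * anything else is a paragraph.
    """
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        if all(line.startswith("- ") for line in lines):
            items = "".join(f"<li>{html.escape(line[2:])}</li>" for line in lines)
            out.append(f"<ul>{items}</ul>")
        elif len(lines) == 1 and not lines[0].endswith("."):
            out.append(f"<h2>{html.escape(lines[0])}</h2>")
        else:
            # Join the wrapped lines back into one paragraph.
            out.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = ("Plankton\n\nDiatoms are unicellular.\nThey photosynthesize.\n\n"
              "- diatoms\n- dinoflagellates")
    print(text_to_html(sample))
```

    The rules are deliberately crude, so check the output and fix the
    occasional misclassified block by hand.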

    --
    Els http://locusmeus.com/
     
    Els, Sep 30, 2008
    #2

  3. Saber wrote:

    > No, I don't use M$ Word to make web sites, so don't just pounce on me
    > for that right away.


    Actually, Microsoft Word can be very useful for making web sites. For
    example, after creating a draft page, open it in Word and use its nice
    spelling and grammar checks. (Yes, I know, that's the kind of thing that
    people _don't_ use Word for in web authoring. )

    > That being said, one of my professors does. She made a general
    > announcement to the class if anyone can fix up her site, make it look
    > better and the such. I figured, "Sure, why not? I can do HTML and a
    > good amount of CSS with little problem."


    Cleaning it up could be an interesting exercise but it's really hard work.
    It would usually be more efficient to redesign the site from scratch.

    > Is there any easy(ish) way to make it nice code wise so I can work
    > with it or is it a lost cause and I just have to redo everything.


    Save it as filtered HTML (via File/Save As; old versions of Word may need a
    plugin for that), then delete the style sheet and write a simple, nice style
    sheet instead. After this, the main headache is usually extra markup that
    makes the source less legible, plus some nasty markup that sets table cell
    sizes in pixels. But in a simple case, the latter problem is fixed by
    zapping it with the following magic CSS wand:
    td { width: auto !important; }
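
    For the "extra markup" part, here is a rough Python sketch that uses only
    the standard library's html.parser. It assumes everything in the
    following list can simply be dropped, with a hand-written style sheet
    taking over:
      * every style and lang attribute;
      * every class beginning with "Mso";
      * Word's namespaced tags such as <o:p>;
      * the <style> element;
      * all comments (this is where Word puts its <!--[if gte mso 9]> blocks).
    The class and function names are made up for illustration. Check the
    result in a browser before relying on it.

```python
from html import escape
from html.parser import HTMLParser


class WordCleaner(HTMLParser):
    """Re-emit HTML without Word's presentational leftovers (a sketch)."""

    def __init__(self):
        # Keep entities such as &nbsp; exactly as they were written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_style = False

    @staticmethod
    def _keep(name, value):
        if name in ("style", "lang"):
            return False
        if name == "class" and value and value.startswith("Mso"):
            return False
        return True

    def _render(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            if self._keep(name, value):
                parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True   # drop Word's embedded style sheet
        elif ":" not in tag:       # drop namespaced tags like <o:p>
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.out.append(self._render(tag, attrs, close=" /"))

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        elif ":" not in tag:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_style:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")   # keep the DOCTYPE

    def handle_comment(self, data):
        pass  # conditional comments and other notes are discarded


def clean_word_html(source):
    cleaner = WordCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)
```

    Width attributes such as width=200 survive this pass. The td rule
    above is what neutralizes them.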

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Sep 30, 2008
    #3
  4. Saber

    Els wrote:
    > Saber wrote:
    >
    >> Then I looked at the code..
    >> <rant>
    >> WTF was M$ thinking?! Seriously? You call that HTML? I call it junk.
    >> It is worse than if i took a can of alphabet soup, dumped it on a table
    >> and added in some <'s >'s and "'s randomly.
    >> </rant>
    >> Is there any easy(ish) way to make it nice code wise so I can work with
    >> it or is it a lost cause and I just have to redo everything. I wouldn't
    >> normally mind redoing it, but it's Marine Bio class and each page is
    >> fairly long and well, I'm kinda OCD with formatting a bit and I know it
    >> will take me more time than I can afford. Justifying a bad grade on a
    >> test by showing a new web site isn't the best strategy in the world. :)

    >
    > You could try HTML Tidy:
    > <http://www.w3.org/People/Raggett/tidy/>


    D'oh! I have that bookmarked too. Silly me.
    >
    > I've also found it useful to copy paste from the browser into the text
    > editor, and then adding the HTML elements myself. This may not be as
    > quick if there are lots of links or visual formatting, but works
    > wonders for long pages with simple headings, paragraphs and lists.


    Didn't think of that either. I can do that with some of the pages, but
    some have a lotta links. Still saves me some time, though.

    Thanks! :)

    --
    Saber
     
    Saber, Sep 30, 2008
    #4
  5. C A Upsdell

    Saber wrote:
    >> I've also found it useful to copy paste from the browser into the text
    >> editor, and then adding the HTML elements myself. This may not be as
    >> quick if there are lots of links or visual formatting, but works
    >> wonders for long pages with simple headings, paragraphs and lists.

    >
    > Didn't think of that also. I can do that with some of the pages, but
    > some have a lotta links. Still saves me some time though.


    Some (all?) browsers let you copy the source which produced a selected
    piece of a web page, so if you use this for text, you can paste the text
    complete with the HTML for the links. Probably a few &nbsp; characters
    to deal with, but this should be okay so long as you confine yourself to
    copying text.
     
    C A Upsdell, Sep 30, 2008
    #5
  6. Saber

    Jukka K. Korpela wrote:
    > Saber wrote:
    >
    >> No, I don't use M$ Word to make web sites, so don't just pounce on me
    >> for that right away.

    >
    > Actually, Microsoft Word can be very useful for making web sites. For
    > example, after creating a draft page, open it in Word and use its nice
    > spelling and grammar checks. (Yes, I know, that's the kind of thing that
    > people _don't_ use Word for in web authoring. )


    That is a valid point. I sometimes copy and paste paragraphs and
    sentences into Word to double-check stuff. But as the only tool, it,
    well, is lacking.
    >
    >> That being said, one of my professors does. She made a general
    >> announcement to the class if anyone can fix up her site, make it look
    >> better and the such. I figured, "Sure, why not? I can do HTML and a
    >> good amount of CSS with little problem."

    >
    > Cleaning it up could be an interesting exercise but it's really hard
    > work. It would usually be more efficient to redesign the site from scratch.
    >

    The main page is actually kinda funny, in a sad sort of way. It is
    called mb20.htm, not index, but that's forgivable; it's user error, not
    Word error. She has a Ph.D. in unicellular protists in marine
    environments, but that doesn't mean she knows about web structure. The
    funny part is, even with all of the inline styling, there is a
    sub-folder called mb20_files with a stylesheet in it containing what
    looks like more of the same styles, even though they are written out in
    full for basically every line in the web page.
    >> Is there any easy(ish) way to make it nice code wise so I can work
    >> with it or is it a lost cause and I just have to redo everything.

    >
    > Save it as filtered HTML (via File/Save As; old versions of Word may
    > need a plugin for that), then delete the style sheet and write a simple
    > nice style sheet instead. After this, the main headache is usually extra
    > markup that makes the source less legible and some nasty markup that
    > sets table cells sizes with pixel settings. But in a simple case, the
    > latter problem is fixed by zapping with the following magic CSS wand:
    > td { width: auto !important; }
    >

    Is saving as filtered HTML possible if I only have the HTML files? I'm
    just using what's already on the site; I don't have the original .docs.

    --
    Saber
     
    Saber, Sep 30, 2008
    #6
  7. Saber

    C A Upsdell wrote:
    > Saber wrote:
    >>> I've also found it useful to copy paste from the browser into the text
    >>> editor, and then adding the HTML elements myself. This may not be as
    >>> quick if there are lots of links or visual formatting, but works
    >>> wonders for long pages with simple headings, paragraphs and lists.

    >>
    >> Didn't think of that also. I can do that with some of the pages, but
    >> some have a lotta links. Still saves me some time though.

    >
    > Some (all?) browsers let you copy the source which produced a selected
    > piece of a web page, so if you use this for text, you can paste the text
    > complete with the HTML for the links. Probably a few &nbsp; characters
    > to deal with, but this should be okay so long as you confine yourself to
    > copying text.


    Normally I would do that. But the links are in a UL, in a table, with 4
    style elements in each <a> tag. Trimming them down would take as long
    as typing them up again, or as right-clicking each link on the page,
    copying the link location and pasting that into the HTML.
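
    If the page is saved locally, a short script can copy the link locations
    for you. The sketch below uses the Python standard library only.
    LinkCollector and links_to_list are hypothetical names. It collects the
    href and visible text of every <a href> element and ignores whatever
    style attributes and nested <span>s surround them. It then writes out a
    bare <ul> to be styled with CSS. Non-breaking spaces in the link text
    become ordinary spaces along the way.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from every <a href> in a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # str.split() also splits on U+00A0, so &nbsp; becomes a space.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def links_to_list(source):
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    items = "\n".join(
        f'  <li><a href="{escape(href)}">{escape(text)}</a></li>'
        for href, text in collector.links
    )
    return f"<ul>\n{items}\n</ul>"
```

    Anchors without an href (for example <a name="top">) are skipped.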

    --
    Saber
     
    Saber, Sep 30, 2008
    #7
  8. Gazing into my crystal ball I observed Saber
    <> writing in
    news:48e28091$0$5672$:

    > C A Upsdell wrote:
    >> Saber wrote:
    >>>> I've also found it useful to copy paste from the browser into the
    >>>> text editor, and then adding the HTML elements myself. This may not
    >>>> be as quick if there are lots of links or visual formatting, but
    >>>> works wonders for long pages with simple headings, paragraphs and
    >>>> lists.
    >>>
    >>> Didn't think of that also. I can do that with some of the pages,
    >>> but some have a lotta links. Still saves me some time though.

    >>
    >> Some (all?) browsers let you copy the source which produced a
    >> selected piece of a web page, so if you use this for text, you can
    >> paste the text complete with the HTML for the links. Probably a few
    >> &nbsp; characters to deal with, but this should be okay so long as
    >> you confine yourself to copying text.

    >
    > Normally I would do that. But the links are in a UL, in a table with
    > 4 style elements in each <a> tag. The time it would take to trim it
    > down would end up being the same as typing them up again or
    > right-click the link on the page and copy link location and paste that
    > into the HTML.
    >
    > --
    > Saber
    >


    Four style attributes in each a element?! Time for CSS, and maybe a
    little server side script.

    --
    Adrienne Boswell at Home
    Arbpen Web Site Design Services
    http://www.cavalcade-of-coding.info
    Please respond to the group so others can share
     
    Adrienne Boswell, Oct 1, 2008
    #8
  9. Saber wrote:

    > The main page is actually kinda funny, in a sad sort of way. It is
    > called mb20.htm.


    I think the name comes simply and automatically from mb20.doc (or
    mb20.docx), which was the user-selected name for the Word file.

    > The funny part is, even with all of the inline styling, there is a
    > sub-folder called mb20_files and a stylesheet in that with more of
    > what looks like the same styles.


    Word is a bit odd, and differently odd in different versions.

    > Is saving as filtered HTML possible if I only have the HTML files?


    A simple test reveals that it is, at least in Word 2007 (and most probably
    in earlier versions as well).

    I picked up a Word file I had created with Word 2007 in "compatibility mode"
    (i.e., supposed to be readable in old versions from Word 97 to Word 2003). I
    first saved it "as Web page", resulting in a 40 kB file with loads of Word
    junk, plus a folder with several files, including two image files for each
    image in the Word document, two obscure XML files, and a theme file.

    Then I opened the HTML document in Word 2007, via File/Open, and saved it as
    a filtered Web page. This resulted in a 14 kB file with less junk and a
    folder with just one image file per image in the original Word document.
    There's still junk like messy Microsoft CSS, but it is mostly concentrated
    in one <style> element, which can easily be removed or shrunk. And there's
    still strange stuff like
    <body lang=EN-US>
    for a document in Finnish, but
    <p class=MsoNormal><span lang=FI>paragraph text</span></p>
    for each paragraph, etc. Foolish but harmless. And there's e.g.
    <h1><span lang=FI style='font-size:14.0pt'>heading text</span></h1>
    which is not so harmless. Overriding this in CSS just with
    h1 { font-size: 130% }
    or something like that won't suffice. The inline style sets the span's
    font size directly, so the heading's size never reaches the text. And
    since a style attribute beats ordinary style sheet rules, you would have
    to add
    h1 span { font-size: 100% !important }
    (or clean up the markup).
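
    The "clean up the markup" route can also be scripted. This Python sketch
    (standard library only; the class and function names are made up)
    removes <span> tags found inside h1 to h6 but keeps their text, so a
    plain h1 rule applies again. Spans elsewhere are left alone. It assumes
    the headings are otherwise well-formed.

```python
from html import escape
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSpanUnwrapper(HTMLParser):
    """Copy HTML through, dropping <span> tags that sit inside headings."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # leave entities untouched
        self.out = []
        self.heading_depth = 0
        self.span_dropped = []  # one flag per currently open <span>

    def _render(self, tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.heading_depth += 1
        if tag == "span":
            inside_heading = self.heading_depth > 0
            self.span_dropped.append(inside_heading)
            if inside_heading:
                return  # unwrap: keep the span's content, not the tag
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, close=" /"))

    def handle_endtag(self, tag):
        if tag == "span" and self.span_dropped and self.span_dropped.pop():
            return  # closing tag of an unwrapped span
        if tag in HEADINGS and self.heading_depth > 0:
            self.heading_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def unwrap_heading_spans(source):
    parser = HeadingSpanUnwrapper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```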

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Oct 1, 2008
    #9
  10. cwdjrxyz


    > WTF was M$ thinking?! Seriously?  You call that HTML?  I call it junk..
    > It is worse than if i took a can of alphabet soup, dumped it on a table
    > and added in some <'s >'s and "'s randomly.
    > </rant>


    It gets much, much worse. Just view the main page of all of Microsoft
    at http://www.microsoft.com/en/us/default.aspx and validate it with the
    W3C HTML and CSS validators. This page changes a bit fairly often.

    The page is written as XHTML 1.0 Transitional and uses the correct W3C
    doctype for it. However, it is served as text/html. That is just as
    well, because if the page were served as true XHTML, with the XHTML
    MIME type application/xhtml+xml, IE would not display it at all, at
    least up to IE7. However, if all errors were corrected, the page could
    be viewed in most other recent browsers when served as true XHTML.
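
    One common workaround for the IE problem is content negotiation. The
    server sends application/xhtml+xml only to browsers whose Accept request
    header lists it, and sends text/html to everyone else. A simplified
    Python sketch of that decision follows. It assumes a single Accept
    header string and honours only an explicit q value for
    application/xhtml+xml. It deliberately ignores wildcards such as */*.
    The function name is hypothetical and not tied to any server framework.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from a request's Accept header.

    Simplification: only an explicit application/xhtml+xml entry counts;
    wildcard entries like */* are ignored and lead to text/html.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        return "application/xhtml+xml" if q > 0 else "text/html"
    return "text/html"
```

    A server using this should also send "Vary: Accept", so that caches
    keep the two variants apart.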

    The page is not exceptionally long by major company standards. However,
    it has 176 HTML errors and 36 warnings. It also has 78 CSS errors. Most
    of these errors would have to be corrected for the page to be viewable
    at all in XHTML-capable browsers if it were served as true XHTML.
    Otherwise you would only get error messages from the very strict XML
    parser used for properly served XHTML pages.

    I don't know who wrote the code for this page. Hopefully it was
    contracted out and does not reflect the HTML and CSS knowledge of the
    Microsoft staff who write code for their browsers. I think most who
    read this group would have to work very hard to come up with a page of
    their own with so many errors. In my opinion, which likely does not
    matter to Microsoft, this page is a disgrace and reflects very poorly
    on Microsoft. If such a page were written in an HTML class at school,
    it would deserve the lowest possible grade and a failing mark for the
    course.
     
    cwdjrxyz, Oct 1, 2008
    #10
  11. Bergamot

    Saber wrote:
    >
    > The main page is actually kinda funny, in a sad sort of way. It is
    > called mb20.htm.
    > funny part is, even with all of the inline styling, there is a
    > sub-folder called mb20_files and a stylesheet in that


    Sounds like it was created via the "save as web page, complete" feature
    of a browser. If that were done using IE, the resulting code would be
    ugly because IE "optimizes" it for its own purposes. Other browsers
    mangle the code less.

    --
    Berg
     
    Bergamot, Oct 2, 2008
    #11
  12. On 30 Sep 2008, Saber barged into alt.html and uttered:

    > Is there any easy(ish) way to make it nice code wise so I can work
    > with it or is it a lost cause and I just have to redo everything.


    Try opening the document in OpenOffice.org (Writer) and then saving it as
    HTML from there. OpenOffice.org is a free, open source office suite
    comparable to MS Office, and is available from http://openoffice.org/

    I've never actually tried it, and I don't have any MSWord docs to try it
    with, but I'd be willing to bet that most of the MSGarbage is gone. :)

    cheers

    - M


    --
    # http://www.nofccainway.com
    # nofccainway@_your_clothes_nofccainway.com
    # remove _your_clothes_ when emailing me

    # d o n ' t b e l i e v e e v e r y t h i n g y o u t h i n k
     
    Matt-the-Hoople, Oct 3, 2008
    #12
  13. Jan Karman

    Why not try Notepad2? It's available from
    http://www.flos-freeware.ch/notepad2.html
    Good luck!

    "Bergamot" <> wrote in message
    news:...
    >
    > Saber wrote:
    >>
    >> The main page is actually kinda funny, in a sad sort of way. It is
    >> called mb20.htm.
    >> funny part is, even with all of the inline styling, there is a
    >> sub-folder called mb20_files and a stylesheet in that

    >
    > Sounds like it was created via the "save as web page, complete"
    > feature
    > of a browser. If that were done using IE, the resulting code will be
    > ugly because IE "optimizes" it for its own purposes. Other browsers
    > mangle the code less.
    >
    > --
    > Berg
     
    Jan Karman, Oct 4, 2008
    #13

Similar Threads
  1. Laura
    Replies:
    1
    Views:
    533
    Gunnar Hjalmarsson
    Jun 5, 2004
  2. Stephen Witter

    opening a word doc in word not browser

    Stephen Witter, May 18, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    493
    Stephen Witter
    May 18, 2004
  3. Al Moritz
    Replies:
    7
    Views:
    641
    Richard Laing
    Jul 22, 2003
  4. fitwell
    Replies:
    2
    Views:
    618
    fitwell
    Nov 13, 2003
  5. Sujith Gangaraju

    word document in html html format

    Sujith Gangaraju, Jul 24, 2008, in forum: Ruby
    Replies:
    3
    Views:
    160
    Jeremy Heiler
    Jul 24, 2008