HTML normalization

Aaron Gray

Hi,

I am looking for a tool that will normalize HTML code, so that badly created
HTML can be cleaned up, removing things like unnecessary spans and font tags.

Preferably free and open source or shareware.

Aaron
 
dorayme

Mark said:
Deciding to do something for the good of humanity, Aaron Gray
<[email protected]> declared in alt.html:

There was a line or two by BdeZ in this latter. I am away from base a
long way, can't fiddle with my hosts' computer (which is set at 800 x
600 btw and a big salutary lesson for me, yes, some people really do
have this!) and seem resigned to use Google, what else? Anyway, back to
BdeZ. As you know, I have been quite interested in her killfile for
quite a long time. The fascination does not go away because I am away.
You see, wherever I am, I am always in her kf. Got that? OK, now if she
killfiles all google group posts, do I go into a lower chamber of her
kf when I post from GG? Will I see Boji down there (You might know I
heard him screaming once from a lower chamber, he is very bad and
probably went straight there)

Last time I popped back to Sydney the local sweaty Blinky Bill defamed
me and I had to leave immediately for another holiday to recover my
composure.

Oh, an afterthought, to the OP: nothing will clean up your html for you.
You have to do all the dirty work yourself unless you hire a cleaner
and this will be in either of two forms, human or Martian. The Martian
ones are all on holiday (yes, we all gather together for riotous fun
and surf and so on on the North Coast of Australia...)
 
Mark Parnell

Deciding to do something for the good of humanity,
There was a line or two by BdeZ in this latter. I am away from base a
long way, can't fiddle with my hosts' computer (which is set at 800 x
600 btw and a big salutary lesson for me, yes, some people really do
have this!) and seem resigned to use Google, what else?

Unfortunately, there aren't any other realistic alternatives. If I could
work out how to only killfile posts from GG that are a) a reply and b)
don't contain any quoted text, I'd do it. But since my newsreader has no
way of knowing whether a post has quoted text or not until I download
the entire post, and I only download the headers until I actually want
to read a post, I can't see that happening.
Anyway, back to
BdeZ. As you know, I have been quite interested in her killfile for
quite a long time. The fascination does not go away because I am away.
You see, wherever I am, I am always in her kf. Got that? OK, now if she
killfiles all google group posts, do I go into a lower chamber of her
kf when I post from GG?

Well if you're posting from GG, you're in my killfile too, but because
you replied to one of my posts, you ascended back out again (only for
this post though). :)
The Martian
ones are all on holiday (yes, we all gather together for riotous fun
and surf and so on on the North Coast of Australia...)

Bit late for schoolies week isn't it? ;-)
 
Mark Parnell

Deciding to do something for the good of humanity, Aaron Gray
This does not seem to normalize styles and font setting at all.

Then perhaps you should explain what you mean by "normali[z|s]ing" them.
I understood it the same way Frank did. If we misunderstood, you'll have
to be more specific.
 
Frank Olieu

_Aaron Gray_ skrev | wrote | écrivit (03-02-2006 02:09):
This does not seem to normalize styles and font setting at all.

Have a look at the /question/ again ;-)
 
dorayme

Mark said:
Aaron Gray
<[email protected]> said:
This does not seem to normalize styles and font setting at all.

Then perhaps you should explain what you mean by "normali[z|s]ing" them.
I understood it the same way Frank did. If we misunderstood, you'll have
to be more specific.

OP wants to be able to get a god awful thing like an MS Office "saved
as html" doc auto made nice and standard and easily read and free of
all the crap. It can't be done. Try a Tidy on one of those! You might
as well try to raise the sea by pissing in it...
 
Mark Parnell

Deciding to do something for the good of humanity,
OP wants to be able to get a god awful thing like an MS Office "saved
as html" doc auto made nice and standard and easily read and free of
all the crap.

That's what Tidy does - it may not do a perfect job, but the OP doesn't
seem to think Tidy does at all what he wants.

For Office-generated HTML files specifically, MS have a utility which I
must admit works quite well:
http://www.microsoft.com/downloads/...EE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN
 
Blinky the Shark

Well if you're posting from GG, you're in my killfile too, but because you
replied to one of my posts, you ascended back out again (only for this
post though). :)

Suggestion: My GG rule is the first one in my score file, so it kicks in
before all others, and it includes the instruction to stop scoring the
post once it has been triggered and has awarded the post a -9999.
 
Aaron Gray

This does not seem to normalize styles and font setting at all.
Then perhaps you should explain what you mean by "normali[z|s]ing" them.
I understood it the same way Frank did. If we misunderstood, you'll have
to be more specific.

In the "mathematical" sense, e.g. if you have two nested tags that are the
same, the redundant inner one is removed.

<span style="font-family:courier">Some text. <span style="font-family:courier">some more text. </span>yet more text.</span>

Normalizes to:

<span style="font-family:courier">Some text. some more text. yet more text.</span>
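A minimal sketch of the kind of normalizer described above, using only Python's standard-library html.parser module. It assumes "normalizing" means just what the example shows: an inline formatting element is dropped when its tag and attributes exactly match those of its immediate parent element. The COLLAPSIBLE tag set is an assumption, and the names RedundantTagRemover and normalize are hypothetical. It does not merge partially overlapping styles, convert font tags to CSS, or repair invalid markup.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these inline formatting tags are collapsed; nesting a
# block element or a class-styled element inside an identical one can
# still change layout, so those are left alone.
COLLAPSIBLE = {"span", "font", "b", "i", "u", "strong"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class RedundantTagRemover(HTMLParser):
    """Re-serializes HTML, dropping an inline element whose tag and
    attributes are identical to those of its immediate parent element."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &nbsp; etc. as references
        self.out = []
        self.stack = []  # (key, kept) for each open non-void element

    @staticmethod
    def _start(tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None
                         else f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(self._start(tag, attrs))
            return
        # Attribute order does not matter when comparing with the parent.
        key = (tag, frozenset(attrs))
        redundant = (tag in COLLAPSIBLE and bool(self.stack)
                     and self.stack[-1][0] == key)
        self.stack.append((key, not redundant))
        if not redundant:
            self.out.append(self._start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0][0] == tag:
            _, kept = self.stack.pop()
            if kept:  # a dropped start tag also loses its end tag
                self.out.append(f"</{tag}>")
        elif tag not in VOID:
            # Stray or mis-nested end tag: pass it through.
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def normalize(html_text):
    parser = RedundantTagRemover()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.out)


if __name__ == "__main__":
    print(normalize('<span style="font-family:courier">Some text. '
                    '<span style="font-family:courier">some more text. '
                    '</span>yet more text.</span>'))
```

Run on Aaron's example, this should print the normalized single-span form. Comparing only with the immediate parent, rather than any ancestor, is deliberate: in something like a courier span inside an arial span inside a courier span, removing the innermost tag would change the text's font. Note that html.parser lowercases tag and attribute names, so the output may differ in case from the input.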

Several other people and I are using the NVU HTML editor, which does not
normalize the HTML when editing, leaving in lots of unneeded tags after
sporadic editing :(

Aaron
 
dorayme

Mark said:
Deciding to do something for the good of humanity,


That's what Tidy does - it may not do a perfect job, but the OP doesn't
seem to think Tidy does at all what he wants.

For Office-generated HTML files specifically, MS have a utility which I
must admit works quite well:
http://www.microsoft.com/downloads/...EE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

Well, maybe I have used a very old Tidy or something for years on my
Macs... I found it useful for alerting me to a few probs, even for
fixing some persistent bad bits, but not good at all when the original is
really awful... so if the OP has god awful raw material, he will end up
with god awful end product, the improvement being worthless on the
whole, probably not that visible even.
 
Aaron Gray

really awful... so if the OP has god awful raw material, he will end up
with god awful end product, the improvement being worthless on the
whole, probably not that visible even.

No, it's reasonably good HTML, just not normalized.

It's generated by the NVU HTML editor, which is a reasonably good free,
open-source, cross-platform HTML editor:

http://www.nvu.com/

If it comes to it and I cannot find an existing HTML normalizer program,
I'll knock one up myself.

Aaron
 
Mark Parnell

Deciding to do something for the good of humanity, Blinky the Shark
Suggestion: My GG rule is the first one in my score file, so it kicks in
before all others, and it includes the instruction to stop scoring the
post once it has been triggered and has awarded the post a -9999.

Actually, I did that on purpose. I'd like to see all replies to my
posts, even if they come from GG.
 
Andy Dingley

This does not seem to normalize styles and font setting at all.

If you have CSS to normalize, try using the W3C CSS validator, then
using its final report as a tidied-up version of your stylesheet.
 
