Convert HTML Tags to Lower-case for XHTML Compliance

schmoozes

http://www.ng2000.com/news.php?tp=html

The XHTML specification requires all element and attribute names to be
lowercase. A page that uses uppercase tags will not validate and is
therefore not valid XHTML. If you write all your XHTML yourself, this
isn't an issue: you simply write every tag in lowercase. Now imagine
situations where you're not in control of the code being written. One
such situation is when you let visitors/users of the website
 
freemont

It helps when you finish sentences so that
 
mbstevens

The C# code after going through a couple of pages:
____________________________________________________
private static string LowerCaseHtml(string html)
{
    string[] tags = new string[] {
        "p", "a", "br", "span", "div", "i", "u", "b", "h1", "h2",
        "h3", "h4", "h5", "h6", "h7", "ul", "ol", "li", "img",
        "tr", "table", "th", "td", "tbody", "thead", "tfoot",
        "input", "select", "option", "textarea", "em", "strong"
    };

    foreach (string s in tags)
    {
        html = html.Replace("<" + s.ToUpper(), "<" + s)
                   .Replace("/" + s.ToUpper() + ">", "/" + s + ">");
    }

    return html;
}
_________________________________________________


It's a nice try, but would you mind running it over the following
sentence, and letting us know what the results are:

<P>Colonel Altman said "Target the Border, boys!"</P>

Looking at the code without actually running it,
my guess is that you'll get:

<P>colonel altman said "target the border, boys!"</P>

The problem is that you have to
separate out strings that are parts of tags from those that
are just part of text that gets displayed on a web page.
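
One way to check the guess without a C# compiler is to port the replacement loop and run it. The sketch below is a line-for-line Python port (the names `TAGS` and `lower_case_html` are mine, not from the linked article), so treat it as an approximation of the original rather than the original itself. On the test sentence the displayed text appears to survive, because only strings that start with "<" or end with ">" are touched. The plain substring matching does cause a different problem, though: short names such as "i" and "b" also match the start of longer tags.

```python
# Python port of the C# snippet above (an approximation for experimenting).
TAGS = [
    "p", "a", "br", "span", "div", "i", "u", "b", "h1", "h2",
    "h3", "h4", "h5", "h6", "h7", "ul", "ol", "li", "img",
    "tr", "table", "th", "td", "tbody", "thead", "tfoot",
    "input", "select", "option", "textarea", "em", "strong",
]


def lower_case_html(html: str) -> str:
    """Replace '<TAG' and '/TAG>' with lowercase versions, tag by tag."""
    for tag in TAGS:
        upper = tag.upper()
        html = html.replace("<" + upper, "<" + tag)
        html = html.replace("/" + upper + ">", "/" + tag + ">")
    return html


# The test sentence: tags are lowered, the text is left alone.
print(lower_case_html('<P>Colonel Altman said "Target the Border, boys!"</P>'))
# -> <p>Colonel Altman said "Target the Border, boys!"</p>

# Prefix collisions: "<I" matches "<IMG" and "<B" matches "<BODY".
print(lower_case_html('<IMG SRC="x.png"><BODY>'))
# -> <iMG SRC="x.png"><bODY>
```

So the sentence itself is safe, but tags outside the list, tags that share a prefix with a shorter name, and all attribute names are either mangled or left in upper case.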

You would normally want an (X)HTML parser to do this.

Languages like Perl and Python have libraries and modules
that provide (X)HTML parsing capabilities. You link them
in with a single line of code. I haven't checked
C# lately, but I bet it does, too.

Tidy, I think, can also accomplish this. You can find it
through the W3C website.
 
mbstevens

If it passes the test sentence, you might also try it on:

<img src="Alt/Target/Span.jpg" alt="Colonel Altman said 'Target the
Border, boys!'" HEIGHT=20 WIDTH=36 />

Begin to see why a fairly elaborate parser is needed?
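
For comparison, here is a minimal sketch of the parser route using Python's standard `html.parser` module, which hands tag and attribute names to the callbacks already lowercased. The class name `TagLowercaser` and the helper `lowercase_tags` are made up for this example. It makes some simplifying assumptions: attribute values are always re-quoted with double quotes, valueless attributes are expanded to `name="name"`, and processing instructions are dropped. It is meant to illustrate the idea, not to replace Tidy.

```python
from html.parser import HTMLParser


def _escape_attr(value: str) -> str:
    """Escape the characters that would break a double-quoted attribute."""
    return value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


class TagLowercaser(HTMLParser):
    """Re-emit markup with lowercase tag and attribute names."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, attrs):
        parts = []
        for name, value in attrs:      # names arrive already lowercased
            if value is None:          # e.g. <input checked>
                value = name
            parts.append(f' {name}="{_escape_attr(value)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)          # displayed text passes through untouched

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def lowercase_tags(html: str) -> str:
    parser = TagLowercaser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(lowercase_tags(
    '<IMG SRC="Alt/Target/Span.jpg" '
    "ALT=\"Colonel Altman said 'Target the Border, boys!'\" "
    "HEIGHT=20 WIDTH=36 />"
))
```

Because the parser knows which characters belong to a tag, the path `Alt/Target/Span.jpg` and the `alt` text keep their capitals while `HEIGHT` and `WIDTH` are lowered and quoted.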
 
mbstevens

The other thing that worries me is that you are converting the
string with ToUpper() instead of ToLower(). That has to have some
bizarre consequences if you're trying to convert to lower case.
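
On the ToUpper() point, reading the snippet again suggests the upper-cased string is only the search pattern; the replacement is the lowercase name from the list. A small Python sketch of a single step (the function name `lowercase_one_tag` is hypothetical) shows the direction of the conversion, and also that attribute names are never touched:

```python
def lowercase_one_tag(html: str, tag: str) -> str:
    """One iteration of the loop: find the uppercase form, write the lowercase one."""
    pattern = "<" + tag.upper()   # what is searched for, e.g. "<DIV"
    replacement = "<" + tag       # what it is replaced with, e.g. "<div"
    return html.replace(pattern, replacement)


print(lowercase_one_tag('<DIV CLASS="x">', "div"))
# -> <div CLASS="x">
```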
 
Jim Moe

Use HTML-Tidy <http://sourceforge.net/projects/tidy/> to convert the case of
 
