Tim Arnold
Hi, I've got lots of XHTML pages that need to be fed to MS HTML Help Workshop to
create CHM files. That application really hates XHTML, so I need to convert
self-closing tags (e.g. <br />) to plain HTML (e.g. <br>).
Seems simple enough, but I'm having some trouble with it. Regexps trip up
because I also have to take into account 'img', 'meta', and 'link' tags, not
just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
that with regexps, but my simple-minded <img[^(/>)]+/> doesn't work. I'm not
enough of a regexp pro to figure out that lookahead stuff.
I'm not sure where to start now; I looked at BeautifulSoup and
BeautifulStoneSoup, but I can't see how to modify the actual tag.
Thanks,
--Tim Arnold
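
For reference, here is a minimal sketch of the kind of conversion being asked about, using only Python's standard re module. The pattern in the post fails because [^(/>)] is a character class: it excludes the individual characters "(", "/", ">" and ")", so it stops at the first "/" inside an attribute value such as a src path. The sketch below instead skips over quoted attribute values as whole units. The names VOID_TAGS, SELF_CLOSING and xhtml_to_html are made up for illustration, the tag list is an assumption to extend as needed, and the pattern makes no attempt to handle comments, CDATA sections or script blocks.

```python
import re

# Tags to rewrite (assumed list for illustration; extend as needed).
VOID_TAGS = ("br", "hr", "img", "meta", "link")

# Match <tag ...attributes... /> for the tags above. Quoted attribute
# values are consumed as whole units, so a "/" or ">" inside a value
# (e.g. a src path) does not end the match early.
SELF_CLOSING = re.compile(
    r"<(%s)\b((?:[^<>\"']|\"[^\"]*\"|'[^']*')*?)\s*/>" % "|".join(VOID_TAGS),
    re.IGNORECASE,
)

def xhtml_to_html(text):
    """Rewrite <br /> style tags as <br>, keeping attributes intact."""
    return SELF_CLOSING.sub(r"<\1\2>", text)

if __name__ == "__main__":
    sample = '<p>one<br />two <img src="pics/a.png" alt="a" /></p>'
    print(xhtml_to_html(sample))
```

A regex like this is only a stopgap for well-behaved input; a real parser is the safer route when the pages are less uniform.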