Regex question: match <br> after opening tag

Peter J. Holzer

Yes it is.

I wasn't actually sure whether an unescaped "<" within an attribute
value is allowed (it's forbidden in XML), but James Clark's SGML parser
accepts it and http://www.isgmlug.org/sgmlhelp/g-sg16.htm suggests that
it is. In any case I'm sure that an unescaped ">" is allowed, and that's
the one which breaks the proposed solution.
> That is XHTML.

Even in XML, an unescaped ">" within an attribute is allowed, so

<br title="a &lt;br> element" />

is valid XHTML and breaks the proposed solution.
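To make the failure concrete, here is a minimal Perl sketch. The proposed regex itself isn't quoted here, so the pattern below, which assumes that a tag runs from "<" to the next ">", is only a hypothetical stand-in for that style of solution:

```perl
use strict;
use warnings;

# The valid XHTML element from above: the ">" inside the attribute
# value is legal without escaping.
my $html = '<br title="a &lt;br> element" />';

# Hypothetical naive tag matcher: assumes a tag ends at the first ">".
my @tags = $html =~ /(<[^>]*>)/g;

# The only match stops inside the attribute value, so the rest of the
# element (' element" />') is treated as plain text.
print "$_\n" for @tags;
```

Any pattern that treats the first ">" as the end of a tag has the same problem; handling quoted attribute values correctly is exactly the work a real parser does for you.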
> XHTML is not the same as HTML.

ACK, although I tend to use HTML-compatible XHTML[1] instead of HTML.

hp


[1] http://www.w3.org/TR/xhtml1/#guidelines
 
Jason

> I don't have it installed, so found it on <http://search.cpan.org> and
> scanned its docs - there's a handy list of all its methods at the top.
> Based on its name, toString() looked like it might be relevant to what
> you were trying to do, so I checked the full description of it to make
> sure.
>
> I've invested quite a few points in the Looking Stuff Up skill over the
> years, and found that it's a pretty good investment. :)
>
> sherm--

So...

I'm using HTML::HTML5::parser now, and while it works fine on catching
opening <span> or whatever without a closing </span>, I'm still not
sure how I should use this to remove the opening or trailing <br> in a
string like:

$text = "<span class=whatever><font class=small><br><br>Test</font></span><br><br>This is fine.";

Which should be converted to:

$text = "<span class=whatever><font class=small>Test</font></span><br><br>This is fine.";

Or, like:

$text = "This is fine.<br><br><span class=whatever><font class=small>Test<br><br></font></span>";

Which should be converted to:

$text = "This is fine.<br><br><span class=whatever><font class=small>Test</font></span>";


Just to reiterate, this is coming from a message board post, so these
strings are just basic samples. I'm trying to remove any <br> that
comes at the beginning (or end) of the string, even if it follows (or
precedes) another tag that is acceptable.
 
Peter J. Holzer

[...]

> I'm using HTML::HTML5::parser now, and while it works fine on catching
> opening <span> or whatever without a closing </span>, I'm still not
> sure how I should use this to remove the opening or trailing <br> in a
> string like:

As Sherm said, HTML::HTML5::parser returns an XML::LibXML::Document
object, so you can use all the methods of XML::LibXML::Document (and
XML::LibXML::Node, which is a superclass of XML::LibXML::Document) to
manipulate the tree.

For example:

* findnodes to find your br elements
* nextNonBlankSibling and previousNonBlankSibling to check whether they
  are the last or first non-blank node of their parent
* unbindNode or removeChild to delete them
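A minimal Perl sketch of those three steps, under a few assumptions: it uses XML::LibXML's own libxml2 HTML parser (load_html) as a stand-in for HTML::HTML5::Parser, since both give you an XML::LibXML::Document; strip_edge_brs is a hypothetical helper name; and it selects br via local-name() so the XPath matches whether or not the parser puts elements into the XHTML namespace. Because removing one <br> can expose another (as in "Test<br><br></font>"), it repeats the pass until nothing changes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: remove every <br> that is the first or last
# non-blank child of its parent (whitespace-only text is skipped).
# Repeats until a pass removes nothing, so runs like <br><br> at the
# end of an element disappear completely. Returns the number removed.
sub strip_edge_brs {
    my ($doc) = @_;
    my $removed = 0;
    while (1) {
        my $changed = 0;
        # local-name() works with and without the XHTML namespace.
        for my $br ($doc->findnodes('//*[local-name()="br"]')) {
            my $is_first = !defined $br->previousNonBlankSibling;
            my $is_last  = !defined $br->nextNonBlankSibling;
            if ($is_first || $is_last) {
                $br->unbindNode;    # detach it from the tree
                $changed++;
            }
        }
        last unless $changed;
        $removed += $changed;
    }
    return $removed;
}

my $text = '<span class=whatever><font class=small><br><br>Test</font>'
         . '</span><br><br>This is fine.';

# Assumption: libxml2's HTML parser stands in for HTML::HTML5::Parser.
# recover => 2 recovers from markup errors without printing them.
my $doc = XML::LibXML->load_html(string => $text, recover => 2);
strip_edge_brs($doc);
print $doc->toStringHTML;
```

Note that this applies the rule per parent element, as in the list above, so a <br> at the start or end of any element is removed, not only one at the very start or end of the string. The serialized output also includes the <html> and <body> wrappers the parser adds around a fragment.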

hp
 
ccc31807

> XHTML is not the same as HTML.

I was thinking of running the HTML code through the W3C validator. I
almost always do so, and try my best to achieve the green light.

http://validator.w3.org/

Sometimes I settle for less, but to my thinking (not following the
precise definitions but just my habits) anything that passes the
validator is valid HTML and anything that doesn't isn't.

I understand that this is a subject that people can have very
different opinions on. My opinion is that, whenever possible, HTML
should pass the validator, but I don't insist that others have the
same opinion.

CC.
 
