jwcarlton
I've just started changing my processing over to HTML::HTML5::Parser,
so please bear with me on this.
I've been using a regex to remove empty tags, but I see one that's not
working so I assume there's either a typo, or an error in the logic.
I'm trying to convert this:
<span class="Apple-style-span" style="font-family: Arial, Verdana,
Helvetica, sans-serif; "><br></span>
To:
<br>
It should also catch <span...></span> (with nothing inside), or
<span...> </span> (with a whitespace inside).
"class" and "style" can be anything (or non-existent), so I'm just
trying to remove <span, followed by anything (or nothing) up to the first
>, then the following </span>.
Here's what I'm using:
$text =~ s/<span[^>]*>\s*<\/span>/ /gi;
$text =~ s/<span[^>]*>(<br>)*<\/span>/$1/gi;
This doesn't appear to work, though. The string I posted above
actually came through verbatim, so the pattern must not have matched.
Of course, I know that this would fail on nested <span></span> tags,
which is why I'm switching over to HTML::HTML5::Parser. But in the
meanwhile, why did this one not match?
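
To rule out a typo, here is a minimal sketch that runs the two substitutions above against the sample string in isolation. The helper name strip_empty_spans is only for illustration, and the sample assumes the input is exactly what I pasted, including the line break inside the style attribute; the real input may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the two substitutions from the post, unchanged.
sub strip_empty_spans {
    my ($text) = @_;

    # Empty or whitespace-only span becomes a single space.
    $text =~ s/<span[^>]*>\s*<\/span>/ /gi;

    # Span containing only <br> tags becomes the last <br> captured.
    $text =~ s/<span[^>]*>(<br>)*<\/span>/$1/gi;

    return $text;
}

# Assumption: the sample is exactly the string as posted, with the
# line break inside the style attribute.
my $sample = qq{<span class="Apple-style-span" style="font-family: Arial, Verdana,\n}
           . qq{Helvetica, sans-serif; "><br></span>};

my $result = strip_empty_spans($sample);

# Compare before and after to see whether either pattern matched.
print $result eq $sample ? "unchanged\n" : "changed to: $result\n";
```

Comparing the output with the original shows whether the patterns match the string exactly as posted. If they do, the text actually reaching the regex is probably not identical to what I pasted here.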