How to write a matching pattern for a multi-line HTML tag?

Saurabh

Hi,

I want to create a pattern to remove the tags from an HTML source file.
But the pattern I have created does not handle multi-line tags. Is
there any way to handle tags spanning more than one line in the
following pattern?

s!\<(\!\|\/)?.*?(\/)?\>!!gix


Thanks in advance.

Regards
Saurabh.
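To make the problem concrete, here is a minimal Perl sketch that runs the pattern above, unchanged, on a made-up two-line <a> tag (the input string is an assumption, not from the post). Without /s, the . in .*? cannot cross the newline, so the multi-line opening tag survives and only the single-line </a> is removed.

```perl
use strict;
use warnings;

# Hypothetical input: an <a> tag whose attributes continue on a second line.
my $html = qq{<a\nhref="x">text</a>};

# Saurabh's pattern, unchanged. Without /s, "." does not match "\n",
# so ".*?" cannot reach the ">" that closes the two-line <a ...> tag.
(my $stripped = $html) =~ s!\<(\!\|\/)?.*?(\/)?\>!!gix;

# Only the single-line </a> was removed; the opening tag survives.
print "$stripped\n";
```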
 
Nick of course

Saurabh said:
> s!\<(\!\|\/)?.*?(\/)?\>!!gix
I find it hard to believe that this regex could ever do anything
useful.
The /x modifier does nothing, because there's no whitespace to ignore.
The /i modifier is redundant, because there are no letters in the regex.
You need the /s modifier so that . also matches newlines in multiline strings.
The | meant for alternation is escaped, so you're looking for a literal |.

Try something like

s/<.*?>//
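The points about the escaped | and the /s modifier can be checked with a few one-line matches. This is a small sketch with made-up test strings; each variable is 1 if the match succeeds and 0 otherwise.

```perl
use strict;
use warnings;

# "\|" is an escaped pipe, i.e. a literal "|" character, so this group
# only accepts the three-character sequence "!|/" after the "<".
my $escaped = ('<!' =~ /^<(!\|\/)/) ? 1 : 0;

# An unescaped "|" is alternation: the group accepts "!" or "/".
my $alternation = ('<!' =~ /^<(!|\/)/) ? 1 : 0;

# Without /s, "." refuses to match "\n"; with /s it matches any character.
my $no_s   = ("<a\nb>" =~ /<.*?>/)  ? 1 : 0;
my $with_s = ("<a\nb>" =~ /<.*?>/s) ? 1 : 0;

print "escaped=$escaped alternation=$alternation no_s=$no_s with_s=$with_s\n";
# prints: escaped=0 alternation=1 no_s=0 with_s=1
```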
 
Nick of course

Nick said:
> I find it hard to believe that this regex could ever do anything
> useful.
> The /x modifier does nothing, because there's no whitespace to ignore.
> The /i modifier is redundant, because there are no letters in the regex.
> You need the /s modifier so that . also matches newlines in multiline strings.
> The | meant for alternation is escaped, so you're looking for a literal |.
>
> Try something like
>
> s/<.*?>//
Whoops!

s/<.*?>//sg
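Why the correction matters can be seen by comparing both substitutions on a hypothetical snippet whose first tag spans two lines (the input is made up for illustration). Without /g only one match is replaced, and without /s that match cannot be the multi-line <p ...> tag, so the first unbroken tag, <b>, is removed instead.

```perl
use strict;
use warnings;

my $html = qq{<p class="x"\n   id="y">Hello <b>world</b></p>};

# First attempt: no /s and no /g. Only one match is replaced, and "."
# cannot cross the newline inside <p ...>, so <b> is the first match.
(my $first = $html) =~ s/<.*?>//;

# Corrected version: /s lets ".*?" span lines, /g removes every match.
(my $all = $html) =~ s/<.*?>//sg;

print "$first\n";
print "$all\n";    # prints: Hello world
```

This handles simple, well-formed markup like the snippet above; it still treats any < as the start of a tag.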
 
Ben Morrow

Quoth Saurabh:
> I want to create a pattern to remove the tags from an HTML source file.
> But the pattern I have created does not handle multi-line tags. Is
> there any way to handle tags spanning more than one line in the
> following pattern?
>
> s!\<(\!\|\/)?.*?(\/)?\>!!gix

See perldoc -q html. Preferably before bothering people all over the
world with a question you could answer yourself.

Ben
 
Jürgen Exner

Saurabh said:
> I want to create a pattern to remove the tags from an HTML source file.
> But the pattern I have created does not handle multi-line tags.

That is the least of your problems. As has been mentioned a gazillion times,
REs are the wrong tool for parsing HTML.
See "perldoc -q HTML" for details on why, and for much better approaches.
And don't forget to check CPAN for ready-made solutions (yes, there is a
module to strip tags from HTML code).

jue
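As one example of the parser-based approach, here is a minimal sketch using HTML::Parser from CPAN. This is an assumption on the editor's part: the post does not say which module it means, and HTML::Parser must be installed separately. The helper strip_html is hypothetical; it keeps only text events, so the parser, not a regex, decides what counts as a tag or a comment. Dropping script and style bodies is also an assumption.

```perl
use strict;
use warnings;
use HTML::Parser;    # from CPAN; not part of core Perl

# Hypothetical helper: returns only the document's text content.
# Tags, comments and declarations produce non-text events, which have
# no handler here and are therefore discarded by the parser.
sub strip_html {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the text with entities such as &lt; decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Assumption: script and style bodies are not wanted in the output.
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;    # flush any text still buffered inside the parser
    return $text;
}

my $plain_lt        = strip_html('<p>A < B</p>');
my $plain_comment   = strip_html('<!-- <no><tags><here> -->');
my $plain_multiline = strip_html(qq{<a\nhref="x">text</a>});

print "[$plain_lt] [$plain_comment] [$plain_multiline]\n";
```

Because the parser tokenizes the whole document, a tag split across lines needs no special handling, and a bare < followed by a space is kept as text.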
 
krakle

Saurabh said:
> I want to create a pattern to remove the tags from an HTML source file.

There are modules for this... Why reinvent the wheel? The purpose of a
module is so you can accomplish a task without slaving over the code to
come up with a decent untested algorithm...

If it weren't for CPAN, Perl would be the most useless language for writing
anything efficient. I don't think there is ONE script I've written (even a
10-line one) that doesn't use a module.
 
Tad McClellan

Nick of course said:
> Whoops!
>
> s/<.*?>//sg


Whoops some more! (that is what usually happens when you attempt
to re-answer a Frequently Asked Question).

Try it on these HTML snippets:

<p>A < B</p>

<!-- <no><tags><here> -->


See the answer to this FAQ for yet more examples that make
your attempt to use regexes on HTML futile.
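Running Tad's two snippets through s/<.*?>//sg shows both failures. This short sketch only uses the two inputs from the post.

```perl
use strict;
use warnings;

# Tad's two snippets, run through Nick's corrected s/<.*?>//sg.
my @cases = ('<p>A < B</p>', '<!-- <no><tags><here> -->');

my %stripped;
for my $html (@cases) {
    (my $out = $html) =~ s/<.*?>//sg;
    $stripped{$html} = $out;
    print "[$html] -> [$out]\n";
}
# prints:
# [<p>A < B</p>] -> [A ]
# [<!-- <no><tags><here> -->] -> [ -->]
```

In the first case the lazy match starts at the bare < and runs to the > of </p>, eating "< B" along with the tag. In the second, the first > inside the comment ends the match early, so the "tags" inside the comment are stripped one by one and a stray " -->" is left behind.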
 
