Asking for an RE pattern to match TABLE in HTML

oyster

That is, a TABLE element with no other TABLE tag nested inside it, for example
<table >something without table tag</table>
What is the RE pattern? Thanks.

The following is not right:
<table.*?>[^table]*?</table>
 
Stefan Behnel

oyster said:
That is, a TABLE element with no other TABLE tag nested inside it, for example
<table >something without table tag</table>
What is the RE pattern? Thanks.

The following is not right:
<table.*?>[^table]*?</table>

Why not use an HTML parser instead? Try lxml.html.

http://codespeak.net/lxml/
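
As a rough illustration of the parser approach, here is a minimal sketch that collects the text of every table containing no nested table. It uses the standard library's html.parser rather than lxml.html so that it runs without extra installs; the class name InnermostTables and the helper innermost_tables are made up for this example.

```python
from html.parser import HTMLParser


class InnermostTables(HTMLParser):
    """Collect the text of each <table> that has no <table> nested inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per currently open table
        self.results = []  # text content of tables without nested tables

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> arrives as "table".
        if tag == "table":
            if self.stack:
                # The enclosing table now has a nested table.
                self.stack[-1]["nested"] = True
            self.stack.append({"nested": False, "text": []})

    def handle_data(self, data):
        # Text belongs to the innermost open table, if there is one.
        if self.stack:
            self.stack[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            table = self.stack.pop()
            if not table["nested"]:
                self.results.append("".join(table["text"]).strip())


def innermost_tables(html):
    parser = InnermostTables()
    parser.feed(html)
    parser.close()
    return parser.results


doc = (
    "<table><tr><td>outer<table ><tr><td>inner</td></tr></table>"
    "</td></tr></table><TABLE>plain</TABLE>"
)
print(innermost_tables(doc))  # ['inner', 'plain']
```

The stack is what a single regular expression lacks: it tracks how deeply tables are nested at each point in the document.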

Stefan
 
Jonathan Gardner

Stating it differently: in order to correctly recognize HTML
tags, you must use an HTML parser.  Trying to write an HTML
parser in a single RE is probably not practical.

s/practical/possible

It isn't *possible* to grok HTML with regular expressions. Individual
tags--yes. But not a full element where nesting is possible. At least
not properly.
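
To make the limits concrete, here is a small sketch of what the pattern from the original post actually does, next to a negative-lookahead variant. The names broken, innermost, sample, nested and tricky are only for this illustration, and the lookahead pattern is an assumption about what the poster wanted (tables with no table inside), not a general HTML matcher.

```python
import re

# [^table] is a character class: "any single character except t, a, b, l, e".
# It does not mean "anything except the word table".
broken = re.compile(r"<table.*?>[^table]*?</table>")

# Variant: consume one character at a time, but only where a new "<table"
# does not start. This covers the non-nested case at the tag level only.
innermost = re.compile(
    r"<table\b[^>]*>(?:(?!<table\b).)*?</table>",
    re.IGNORECASE | re.DOTALL,
)

sample = "<table >something without table tag</table>"
nested = "<table><tr><td><table>inner</table></td></tr></table>"
tricky = "<table><!-- </table> -->x</table>"

print(broken.search(sample))             # None: "something" contains an "e"
print(innermost.search(sample).group())  # the whole sample string
print(innermost.findall(nested))         # ['<table>inner</table>']
print(innermost.search(tricky).group())  # '<table><!-- </table>' (cut short)
```

The last line shows where it falls over: the pattern knows nothing about comments, scripts or attribute values, so a stray "</table>" inside a comment ends the match early. A parser handles those cases.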

Maybe we need some notes on the limits of regular expressions in the
re documentation for people who haven't taken the computer science
courses on parsing and grammars. Then we could explain the necessity
of real parsers and grammars, at least in layman's terms.
 
