pyparsing with nested table


astarocean

I'm using pyparsing to parse nested tables, and I want to keep each table's
structure and properties.
But the parse is cut short at the </td> tag of the inner table.

Any ideas?

Here's the program:


from pyparsing import *

mytable = """
<table id="leftpage_table" width="156" border="0" cellspacing="0"
cellpadding="0">
<tr id="trtd" height="24">
<td width="153" background="images/bt_kind.gif" align="center"
class="left_menu">system</td>
</tr>
<tr id="trtd_down" height="20">
<td id="trtd_down"><table id="inner_lefgpage_table" width="100%"
height="100%" border="0" cellspacing="0" cellpadding="0">
<tr id="inner_trtd" height="20">
<td background="images/bt_class.gif" align="center">art</td>
</tr>
<tr>
<td background="images/bt_class.gif" align="center">art</td>
</tr>
</table></td>
</tr>
</table>
"""

startTag = Literal("<")
endTag = Literal(">")
idPattern = (CaselessLiteral("id").suppress() + Literal("=").suppress()
             + (quotedString.copy().setParseAction(removeQuotes)
                | Word(srange("[a-zA-Z0-9_~]"))))
attrPattern = Combine(Word(alphanums + "_") + Literal("=")
                      + (quotedString | Word(srange(r"[a-zA-Z0-9_~:&@#;?/\.]"))))

tablePattern = Forward()

def getItemCloseTag(x):
    itemCloseTag = Combine(startTag + Literal("/") + CaselessLiteral(x)
                           + endTag).suppress()
    return itemCloseTag

def getItemStartTag(x):
    itemStartTag = (startTag.suppress()
                    + Keyword(x, caseless=True).suppress()
                    + Group(ZeroOrMore(idPattern))
                    + Group(ZeroOrMore(attrPattern))
                    + endTag.suppress())
    return itemStartTag

def getItemPattern(x):
    tCloseTag = getItemCloseTag(x)
    itemPattern = (getItemStartTag(x) + Group(ZeroOrMore(tablePattern))
                   + Group(SkipTo(tCloseTag)) + tCloseTag)
    return itemPattern

def getMultiLevelPattern(x, y):
    tCloseTag = getItemCloseTag(x)
    itemPattern = getItemStartTag(x) + Group(OneOrMore(y)) + tCloseTag
    return itemPattern

tdPattern = getItemPattern(x='td')
trPattern = getMultiLevelPattern('tr', tdPattern)
tablePattern = getMultiLevelPattern('table', trPattern)
t = tablePattern
for toks, strt, end in t.scanString(mytable):
    print(toks.asList())


Output:
[['leftpage_table'], ['width="156"', 'border="0"', 'cellspacing="0"',
'cellpadding="0"'], [['trtd'], ['height="24"'], [[], ['width="153"',
'background="images/bt_kind.gif"', 'align="center"',
'class="left_menu"'], [], ['system']], ['trtd_down'], ['height="20"'],
[['trtd_down'], [], [], ['<table id="inner_lefgpage_table" width="100%"
height="100%" border="0" cellspacing="0" cellpadding="0">\n <tr
id="inner_trtd" height="20">\n <td
background="images/bt_class.gif" align="center">art']], [], [], [[],
['background="images/bt_class.gif"', 'align="center"'], [], ['art']]]]
 

Paul McGuire

astarocean said:
I'm using pyparsing to parse nested tables, and I want to keep each table's
structure and properties.
But the parse is cut short at the </td> tag of the inner table.

Any ideas?

Here's the program:


from pyparsing import *
[...]
tablePattern = Forward()
[...]
tablePattern = getMultiLevelPattern('table',trPattern)
t = tablePattern
for toks,strt,end in t.scanString(mytable):
    print(toks.asList())

Load Forwards with '<<' instead of '='. Change:
    tablePattern = getMultiLevelPattern('table',trPattern)
to:
    tablePattern << getMultiLevelPattern('table',trPattern)

I think that is all you needed.
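
To see why the operator matters, here is a minimal sketch in plain Python. It does not use pyparsing at all: Placeholder is a hypothetical stand-in for Forward, and the lists are hypothetical stand-ins for grammar expressions. It only models the name-binding behaviour: '=' points the name at a new object and leaves the original placeholder (which the td pattern already captured) empty, while '<<' fills in that same object.

```python
class Placeholder(object):
    """Hypothetical stand-in for pyparsing's Forward (not the real class)."""

    def __init__(self):
        self.expr = None  # the definition, once it is supplied

    def __lshift__(self, other):
        # Fill in the definition on the existing object.
        self.expr = other
        return self


def build(use_lshift):
    table = Placeholder()
    placeholder = table           # keep a handle on the original object
    cell = ["td", table]          # the cell pattern refers to the placeholder
    definition = ["table", cell]
    if use_lshift:
        table << definition       # same object, now defined
    else:
        table = definition        # name rebound; placeholder stays empty
    return placeholder, cell


empty, cell_a = build(use_lshift=False)
filled, cell_b = build(use_lshift=True)

print(empty.expr is None)         # the cell still points at an undefined placeholder
print(filled.expr is not None)    # the cell now sees the table definition
```

In the posted program the same thing happens: tdPattern was built around the original Forward, so rebinding tablePattern with '=' leaves the nested-table slot undefined. Newer pyparsing releases also accept '<<=' for the same purpose.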

Awesome job! (Also check out the pyparsing built-ins for making HTML
and XML tags.)
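
As a rough illustration of those built-ins, the sketch below uses makeHTMLTags to match an opening td tag and read its attributes by name. The sample HTML string is taken from the table in the question; the variable names are only for illustration, and this is not a complete nested-table parser.

```python
from pyparsing import makeHTMLTags

# makeHTMLTags returns a pair of expressions: the opening and closing tag.
td_start, td_end = makeHTMLTags("td")

html = '<td width="153" align="center" class="left_menu">system</td>'

# Attributes of the opening tag become named results on the parsed tokens.
tokens = td_start.parseString(html)
print(tokens.width)
print(tokens.align)
print(tokens["class"])  # "class" is a Python keyword, so use item access
```

The opening-tag expression handles attribute order, quoting, and case for you, which would replace the hand-written idPattern and attrPattern.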

-- Paul
 

astarocean

Paul said:
Load Forwards with '<<' instead of '='. Change:
    tablePattern = getMultiLevelPattern('table',trPattern)
to:
    tablePattern << getMultiLevelPattern('table',trPattern)

I think that is all you needed.

Awesome job! (Also check out the pyparsing built-ins for making HTML
and XML tags.)

-- Paul

Thank you. I was wondering why my iteration wasn't working, so it was my
mistake.

Later I looked at other parsers such as ClientTable and BeautifulSoup, and
I think BeautifulSoup is a better fit for this job.
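
For comparison, here is a small sketch of the tree-building approach using only the standard library's html.parser (Python 3) rather than BeautifulSoup, so it runs without extra packages. The TableTree class and the shortened sample string are hypothetical, and the code assumes well-formed markup with every table, tr, and td closed.

```python
from html.parser import HTMLParser  # Python 3 standard library


class TableTree(HTMLParser):
    """Collect table/tr/td elements into nested dicts, keeping attributes."""

    TAGS = ("table", "tr", "td")

    def __init__(self):
        HTMLParser.__init__(self)
        self.root = {"tag": "root", "attrs": {}, "children": [], "text": ""}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": ""}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)   # descend into the new element

    def handle_endtag(self, tag):
        if tag in self.TAGS and len(self.stack) > 1:
            self.stack.pop()          # assumes tags are properly nested

    def handle_data(self, data):
        self.stack[-1]["text"] += data.strip()


sample = (
    '<table id="outer"><tr><td>system</td></tr>'
    '<tr><td><table id="inner"><tr><td align="center">art</td></tr></table>'
    '</td></tr></table>'
)

parser = TableTree()
parser.feed(sample)
parser.close()

outer = parser.root["children"][0]
inner = outer["children"][1]["children"][0]["children"][0]
print(outer["attrs"]["id"], inner["attrs"]["id"])
```

Each node keeps its attributes and its children, so the inner table stays attached to the td that contains it.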
 
