Beautiful parse joy - Oh what fun

rh0dium

Hi all,

I am trying to parse a table into a dictionary and I am having all
kinds of fun. Can someone please help me out?

What I want is this:

dic={'Division Code':'SALS','Employee':'LOO ABLE'}

Here is what I have..

html="""<table> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
/></td><td width="129"><b><font size="2" face="Arial">Division Code:
</font></b></td><td width="693"><font size="2"
face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
width="129"><b><font size="2" face="Arial">Employee:
</font></b></td> <td width="693"><font size="2"
face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
size="2" face="Arial">ABLE</font></td></tr></table> """


from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup()
soup.feed(html)

dic={}
for row in soup('table')[0]('tr'):
    column = row('td')
    print column[1].findNext('font').string.strip(), \
          column[2].findNext('font').string.strip()
    dic[column[1].findNext('font').string.strip()] = \
        column[2].findNext('font').string.strip()

for key in dic.keys():
    print key, dic[key]

The problem is that I am missing the last name, ABLE. How can I get "ALL"
of the text? Clearly I have something wrong with my font string, but
I am not sure what it is.

Please and thanks!!
 
Larry Bates

In the last cell of the last row you have three <font> tags. The first
one contains LOO, the second one contains only a space, and the third
one contains ABLE.

<td width="693"><font size="2" face="Arial">LOO</font><b>
<font size="2" face="Arial"> </font></b>
<font size="2" face="Arial">ABLE</font></td>

Your code reads only the first <font> tag in that cell, so it never
reaches the second (blank) tag or the third one that holds ABLE.
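
To see the three tags for yourself, here is a minimal sketch that uses only the standard library's html.parser (Python 3) rather than BeautifulSoup. The class name FontTextCollector and the variable names are made up for this illustration, and the cell markup is copied from the post with the line breaks removed.

```python
from html.parser import HTMLParser

# The last cell of the Employee row, written on one logical line.
cell = ('<td width="693"><font size="2" face="Arial">LOO</font><b>'
        '<font size="2" face="Arial"> </font></b>'
        '<font size="2" face="Arial">ABLE</font></td>')


class FontTextCollector(HTMLParser):
    """Hypothetical helper: record the text inside each <font> element."""

    def __init__(self):
        super().__init__()
        self.in_font = False
        self.texts = []  # one string per <font> element, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.in_font = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "font":
            self.in_font = False

    def handle_data(self, data):
        if self.in_font:
            self.texts[-1] += data


collector = FontTextCollector()
collector.feed(cell)
collector.close()

font_texts = collector.texts
# Join the non-blank pieces with a single space to rebuild the full name.
full_name = " ".join(t.strip() for t in font_texts if t.strip())

print(font_texts)
print(full_name)
```

Reading only the first entry of font_texts is what the original loop effectively does, which is why the last name goes missing.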

-Larry Bates
 
KvS

Maybe a more robust approach is just to walk through the string,
counting the angle brackets: increment a counter on each "<" and
decrement it on each ">". All the relevant text occurs right after a
">" that sets your counter back to 0 (meaning you're at the "highest
level"). There's no relevant text if the next character is again a "<".
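
One way this counting idea might look in Python 3 is sketched below. The function name text_outside_tags is invented for the example, and the sketch assumes that no ">" or "<" appears inside attribute values, comments, or scripts, which holds for the table in this thread but not for HTML in general. It also returns a flat list, so it does not say which cell each piece of text came from.

```python
def text_outside_tags(markup):
    """Return the non-blank runs of text found outside <...> tags."""
    depth = 0      # greater than 0 while we are inside a tag
    current = []   # characters of the text run being collected
    chunks = []    # finished text runs
    for ch in markup:
        if ch == "<":
            # A tag begins: close off any text run collected so far.
            if depth == 0 and current:
                chunks.append("".join(current))
                current = []
            depth += 1
        elif ch == ">":
            depth = max(depth - 1, 0)
        elif depth == 0:
            current.append(ch)
    if current:
        chunks.append("".join(current))
    # Normalize whitespace and drop runs that are only whitespace.
    return [" ".join(c.split()) for c in chunks if c.strip()]


# The Employee row from the original post, on one logical line.
row = ('<tr valign="top"><td width="24"><img src="/icons/ecblank.gif" '
       'border="0" height="1" width="1" alt="" /> <td width="129"><b>'
       '<font size="2" face="Arial">Employee: </font></b></td> '
       '<td width="693"><font size="2" face="Arial">LOO</font><b>'
       '<font size="2" face="Arial"> </font></b>'
       '<font size="2" face="Arial">ABLE</font></td></tr>')

pieces = text_outside_tags(row)
print(pieces)
```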
 
George Sakkis

Here's one way to do it:

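import re
from BeautifulSoup import BeautifulSoup

_any_re = re.compile('.+')

d = {}
for row in BeautifulSoup(html).fetch('tr'):
    columns = row.fetch('td')
    # Field name: first text in the second cell, without the trailing colon
    field = columns[1].firstText(_any_re).rstrip(' \t\n:')
    # Value: all texts in the third cell; skip whitespace-only strings so
    # the blank <font> does not add a second space between the names
    value = ' '.join(text.rstrip()
                     for text in columns[2].fetchText(_any_re)
                     if text.strip())
    d[field] = value
print d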
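
For anyone without the old BeautifulSoup module at hand, the same idea (join every piece of text in the value cell) can be sketched with the standard library's html.parser in Python 3. This is only an illustration under stated assumptions: the names TableCells and table_to_dict are made up here, and the sketch assumes each row has the label in its second cell and the value in its third, as in the posted table.

```python
from html.parser import HTMLParser

# The table from the original post.
html = '''<table> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt=""
/></td><td width="129"><b><font size="2" face="Arial">Division Code:
</font></b></td><td width="693"><font size="2"
face="Arial">SALS</font></td></tr> <tr valign="top"><td width="24"><img
src="/icons/ecblank.gif" border="0" height="1" width="1" alt="" /> <td
width="129"><b><font size="2" face="Arial">Employee:
</font></b></td> <td width="693"><font size="2"
face="Arial">LOO</font><b><font size="2" face="Arial"> </font></b><font
size="2" face="Arial">ABLE</font></td></tr></table> '''


class TableCells(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cells per row; a cell is a list of strings
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # A new <td> starts a new cell even if the previous one was never closed.
            self.rows[-1].append([])
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1].append(data)


def table_to_dict(markup):
    """Map the second cell of each row (minus its colon) to the third cell."""
    parser = TableCells()
    parser.feed(markup)
    parser.close()
    result = {}
    for cells in parser.rows:
        # Join all text pieces of a cell, then collapse runs of whitespace.
        texts = [" ".join("".join(parts).split()) for parts in cells]
        if len(texts) >= 3:
            result[texts[1].rstrip(":").strip()] = texts[2]
    return result


d = table_to_dict(html)
print(d)
```

Starting a new cell on every <td> is a deliberate choice here, because the first cell of the Employee row is never closed in the posted markup.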

George
 
