I added extra td tags to your example, but for some reason I am
getting None when I do the following:
print all_tds[0].string
print all_tds[8].string
from BeautifulSoup import BeautifulSoup
doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>
<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
<tr>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 48.884 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 49.950 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 69.322 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 99.740 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
</tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(doc)
tables = soup.findAll('table')
target_table = tables[1]
all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[8].string
tds_str = all_tds[8].string
print tds_str
The output I am getting is the following:
None
None
I am not sure why I am getting None for these lines:
print all_tds[0].string
print all_tds[8].string
I need to traverse an HTML page with a big table that has many rows and
columns. For example, how do I go to the 35th td tag and use a regex to
retrieve its content? After that is done, how do I move 15 td tags further
down from the 35th tag (35+15) and use a regex to retrieve that content?
1) You can find your table using one of these methods:
a)
target_table = soup.find('table', id='car_parts')
b)
tables = soup.findAll('table')
target_table = tables[2]
The tables are put in a list in the order that they appear on the
page.
2) You can get all the td's in the table using this statement:
all_tds = target_table.findAll('td')
3) You can get the contents of the tags using these statements:
print all_tds[34].string
print all_tds[49].string
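The indices in step 3 come from Python's zero-based lists: the 35th td is at index 34, and the td 15 places further on (the 50th) is at index 49. A minimal sketch of that arithmetic, using a plain list as a stand-in for all_tds (the helper name nth_item and the placeholder cell names are made up for illustration):

```python
def nth_item(items, n):
    """Return the n-th item counting from 1, as in 'the 35th td'."""
    return items[n - 1]

# Stand-in for all_tds: 60 placeholder cells named td1 .. td60.
cells = ["td%d" % i for i in range(1, 61)]

first = nth_item(cells, 35)        # same as cells[34]
second = nth_item(cells, 35 + 15)  # same as cells[49]
print(first)
print(second)
```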
Here is an example:
from BeautifulSoup import BeautifulSoup
doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>
<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(doc)
tables = soup.findAll('table')
target_table = tables[1]
all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[2].string
--output:--
hello
goodbye
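In the longer document posted above, the numeric td's hold a font tag rather than bare text, and the spacer td's hold only an img tag, which is a likely reason .string comes back as None for them. As a rough sketch of one way to get at the nested text and then apply a regex, here is a version that assumes Python 3 and uses the standard library's html.parser in place of BeautifulSoup. The class name TdTextCollector and the shortened sample document are invented for this illustration, and nested tables are not handled:

```python
import re
from html.parser import HTMLParser  # Python 3 standard library


class TdTextCollector(HTMLParser):
    """Collect the text of every <td>, including text inside nested tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.cells = []      # one string per <td>, in document order
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside <font> etc. still arrives here while we are in a <td>.
        if self._in_td:
            self.cells[-1] += data


# Shortened sample in the style of the posted table (an assumption).
doc = """
<table>
<tr><td>hello</td></tr>
<tr><td><img src="/img/spacer.gif" alt="|"/></td>
<td align=right width=80><font size=2> 48.884 </font></td></tr>
</table>
"""

collector = TdTextCollector()
collector.feed(doc)
collector.close()

# Pull the decimal number out of the third cell with a regex.
number = re.compile(r"\d+\.\d+")
match = number.search(collector.cells[2])
print(collector.cells)
print(match.group(0))
```

The same indexing as in step 3 applies to collector.cells, so the 35th and 50th cells of a large table would be collector.cells[34] and collector.cells[49].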