newb: BeautifulSoup

crybaby · Sep 20, 2007

I need to traverse an HTML page with a big table that has many rows and
columns. For example, how do I go to the 35th td tag and use a regex to
retrieve its content? After that is done, how do I move down 15 td tags
from the 35th (35+15) and use a regex to retrieve that content?

TheFlyingDutchman · Sep 21, 2007

Make the file an XHTML file (valid XML) if it isn't already, and then
you can use software written to process XML files:

http://pyxml.sourceforge.net/topics/
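A minimal sketch of that route, written for Python 3 and using the standard library's `xml.etree.ElementTree` rather than PyXML. It assumes the page is already well-formed XHTML; the 60-cell sample table and the `cell_number` helper are hypothetical. `iter('td')` yields cells in document order, so the 35th cell is index 34 and the cell 15 further on is index 49, after which a regex is applied to the cell text. Real XHTML usually declares `xmlns="http://www.w3.org/1999/xhtml"`, in which case the tag has to be written `{http://www.w3.org/1999/xhtml}td`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML: one table, 10 rows x 6 cells, numbered 1..60.
rows = []
for r in range(10):
    cells = "".join(f"<td>cell {r * 6 + c + 1}</td>" for c in range(6))
    rows.append(f"<tr>{cells}</tr>")
xhtml = f"<html><body><table>{''.join(rows)}</table></body></html>"

root = ET.fromstring(xhtml)

# iter() walks the tree in document order, so list positions match page order.
all_tds = list(root.iter("td"))

start = 35   # the 35th td (1-based)
offset = 15  # then move 15 cells further down
first = all_tds[start - 1]
second = all_tds[start - 1 + offset]

def cell_number(td):
    """Hypothetical helper: regex out the first integer in a cell's text."""
    m = re.search(r"\d+", td.text or "")
    return int(m.group()) if m else None

print(cell_number(first), cell_number(second))  # 35 50
```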

Stefan Behnel · Sep 21, 2007

TheFlyingDutchman said:
Make the file an XHTML file (valid XML) if it isn't already, and then
you can use software written to process XML files:

http://pyxml.sourceforge.net/topics/

.... or just use software that can process XML and HTML the same way *and* that
supports XPath and tree iteration so that you can easily select the content
you want.

http://codespeak.net/lxml/

Stefan
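lxml's `xpath()` accepts full XPath 1.0; as a stdlib-only stand-in, Python 3's `xml.etree.ElementTree` supports a limited XPath subset that includes 1-based positional predicates and `last()`. The sketch below (hypothetical, well-formed sample data) uses that subset to pull one column out of the second table, which is the kind of selection XPath makes easy. Note that `[2]` counts same-named siblings under one parent, not matches across the whole document.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample; the first table is empty on purpose.
xhtml = """<html><body>
<table></table>
<table>
  <tr><td>part</td><td>price</td></tr>
  <tr><td>brake pad</td><td>48.88</td></tr>
  <tr><td>rotor</td><td>69.32</td></tr>
</table>
</body></html>"""

root = ET.fromstring(xhtml)

# The second <table> in <body>, then the second <td> of every <tr> in it.
column = root.findall("./body/table[2]/tr/td[2]")
print([td.text for td in column])  # ['price', '48.88', '69.32']

# last() is also accepted: the final cell of the final row.
last_cell = root.find("./body/table[2]/tr[last()]/td[last()]")
print(last_cell.text)  # 69.32
```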

7stud · Sep 21, 2007

1) You can find your table using one of these methods:

a)
target_table = soup.find('table', id='car_parts')

b)
tables = soup.findAll('table')
target_table = tables[2]

The tables are put in a list in the order that they appear on the
page.

2) You can get all the td tags in the table using this statement:

all_tds = target_table.findAll('td')

3) You can get the contents of the tags using these statements:

print all_tds[34].string
print all_tds[49].string
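If BeautifulSoup isn't available and the page is not valid XML, the same three steps can be sketched with Python 3's standard `html.parser` module. This swaps in a different technique and is not BeautifulSoup's API; the `TableCollector` class and the sample document are hypothetical. It records each table's `id` and the text of each td in document order, including text inside nested tags such as `<font>`. Nested tables are not handled.

```python
from html.parser import HTMLParser

class TableCollector(HTMLParser):
    """Hypothetical helper: each <table>'s id plus its <td> texts, in page order.
    Nested tables are not handled."""

    def __init__(self):
        super().__init__()
        self.tables = []  # list of {"id": str or None, "tds": [str, ...]}
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append({"id": dict(attrs).get("id"), "tds": []})
        elif tag == "td" and self.tables:
            self.tables[-1]["tds"].append("")
            self._in_td = True

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <font>) still goes to the current cell.
        if self._in_td:
            self.tables[-1]["tds"][-1] += data

doc = """<html><body>
<table></table>
<table id="car_parts">
<tr><td>hello</td></tr>
<tr><td>world</td><td><font size=2>goodbye</font></td></tr>
</table>
</body></html>"""

parser = TableCollector()
parser.feed(doc)
parser.close()

by_id = next(t for t in parser.tables if t["id"] == "car_parts")  # step 1a
by_index = parser.tables[1]                                       # step 1b
all_tds = by_index["tds"]                                         # step 2
print(all_tds[0], all_tds[2])                                     # step 3: hello goodbye
```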

Here is an example:

from BeautifulSoup import BeautifulSoup

doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>

<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
</table>
</body>
</html>
"""

soup = BeautifulSoup(doc)

tables = soup.findAll('table')
target_table = tables[1]

all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[2].string

--output:--
hello
goodbye

Gabriel Genellina · Sep 21, 2007

See the examples at the BeautifulSoup page
http://www.crummy.com/software/BeautifulSoup/

crybaby · Sep 21, 2007

I added extra td tags to your example, and for some reason I am
getting None when I do the following:

print all_tds[0].string
print all_tds[8].string

from BeautifulSoup import BeautifulSoup

doc = """
<html>
<head>
<title></title>
</head>
<body>
<table>
</table>

<table>
<tr><td>hello</td></tr>
<tr><td>world</td><td>goodbye</td></tr>
<tr>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 48.884 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 49.950 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 69.322 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
<td align=right width=80><font size=2 face="New Times Roman,Times,Serif"> 99.740 </font></td>
<td width=1 height=0 bgcolor="#800000"><img src="/img/spacer.gif" width=1 height=0 alt="|"/></td>
</tr>
</table>
</body>
</html>
"""

soup = BeautifulSoup(doc)

tables = soup.findAll('table')
target_table = tables[1]

all_tds = target_table.findAll('td')
print all_tds[0].string
print all_tds[8].string
tds_str = all_tds[8].string
print tds_str

The output I am getting is the following:
None
None

I am not sure why I am getting None for these lines:

print all_tds[0].string
print all_tds[8].string
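One possible explanation for `all_tds[8]`: the BeautifulSoup 3 documentation describes `.string` as being available when a tag has exactly one child and that child is a string, and that cell's only child is a `<font>` element, not bare text. (This does not explain the None for `all_tds[0]`.) Since a regex was the plan anyway, one workaround is to run it on the cell's markup, `str(all_tds[8])`, rather than on `.string`; `all_tds[8].findAll(text=True)` should also return the nested text nodes. The sketch below uses a literal copy of that cell so it runs on Python 3 without BeautifulSoup; the pattern (a decimal number between two tags) is an assumption about this data.

```python
import re

# Literal copy of the 9th cell (all_tds[8]) from the example above.
td_html = ('<td align=right width=80><font size=2 face="New Times '
           'Roman,Times,Serif"> 69.322 </font></td>')

# Assumed pattern: a decimal number between '>' and '<', optional whitespace.
NUMBER_BETWEEN_TAGS = re.compile(r">\s*(-?\d+(?:\.\d+)?)\s*<")

m = NUMBER_BETWEEN_TAGS.search(td_html)
value = float(m.group(1)) if m else None
print(value)  # 69.322
```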

Bootstrap contact form not working	2	Feb 15, 2025
Sort by number of characters	1	Nov 2, 2023
Need help with <rowspan> in an HTML table	1	Nov 6, 2024
Javascript DOM	1	Mar 29, 2023
Why Do We Need Angular, React, or Other Frontend Frameworks?	0	Apr 19, 2025
Removing tags with BeautifulSoup	0	Aug 8, 2007
BeautifulSoup and Problem Tables	2	Sep 20, 2008
Hello I am learning how to code and I tried making a calculator with HTML and js with some CSS I am stuck at thing, Like the screen value is	0	Mar 13, 2025