python-parser running Beautiful Soup needs to be reviewed

Martin Kaspar · Dec 11, 2010

Hello community,

I am new to Python and to Beautiful Soup as well!
It is said to be a great tool for parsing and extracting content. So here I
am:

I want to take the content of a <td> tag of a table in an HTML
document. For example, I have this table:

<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>

How can I use Beautiful Soup to get the text "This is a sample text"?

Should I use
soup.findAll('table', attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table?

See the target http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

Well, what do we have to do first?

The first thing is to find the table.

I do this with find rather than findAll, because find returns the first
match (rather than a list of all matches, in which case we'd have
to add an extra [0] to take the first element of the list):

table = soup.find('table', attrs={'class':'bp_ergebnis_tab_info'})

Then use find again, this time on the table, to find the first td:

first_td = table.find('td')

Then we have to use renderContents() to extract the textual contents:

text = first_td.renderContents()

... and the job is done (though we may also want to use strip() to
remove leading and trailing whitespace):

trimmed_text = text.strip()

This should give us:

print trimmed_text
This is a sample text

as desired.
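
The steps above can be put together in one short script. This is only a minimal sketch, and it assumes Python 3 with the newer bs4 package (Beautiful Soup 4) instead of the Beautiful Soup 3 API used in this thread, so find_all() and get_text() stand in for findAll() and renderContents(). The helper name first_cell_text is made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4, not the BS3 of the thread

HTML = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>
"""


def first_cell_text(html):
    """Return the stripped text of the first <td> in the target table.

    Hypothetical helper: returns None when the table or the cell is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match, or None when nothing matches.
    table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})
    if table is None:
        return None
    # Search inside the table rather than the whole document.
    first_td = table.find("td")
    if first_td is None:
        return None
    # get_text() returns a str; strip() trims the surrounding whitespace.
    return first_td.get_text().strip()


print(first_cell_text(HTML))
```

Checking for None after each find() is a design choice here: on a real page the table may be absent, and calling a method on None would raise an AttributeError.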

What do you think about the code? I would love to hear from you!

greetings
matze

Stef Mientki · Dec 11, 2010

What do you think about the code? I love to hear from you!?

I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest libs I've seen ;-)

So the simplest solution I came up with:

from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import

Text = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>
"""
Content = BeautifulSoup(Text)
# contents[0] is the first child of the first <td>, i.e. its text
print Content.find('td').contents[0].strip()

And now I wonder how to get the next contents!

cheers,
Stef

Peter Pearson · Dec 11, 2010

On Sat, 11 Dec 2010 22:38:43 +0100, Stef Mientki wrote:
[snip]

And now I wonder how to get the next contents !!

Here's a suggestion:

peter@eleodes:~$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
... <table class="bp_ergebnis_tab_info">
... <tr>
... <td>
... This is a sample text
... </td>
...
... <td>
... This is the second sample text
... </td>
... </tr>
... print xx.contents[0].strip()
...
This is a sample text
This is the second sample text
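
A minimal sketch of the loop this session appears to run, written as a script. It assumes Python 3 and the bs4 package rather than the Beautiful Soup 3 of the session, and it assumes every cell starts with a text node, as in the sample table. Only the loop variable xx comes from the session; the function name cell_texts is hypothetical.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4

HTML = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>
"""


def cell_texts(html):
    """Return the stripped leading text of every <td> in the document."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # find_all() is the bs4 name for findAll(); it returns every match.
    for xx in soup.find_all("td"):
        # contents[0] is the first child node; here that is the cell's text.
        texts.append(xx.contents[0].strip())
    return texts


for text in cell_texts(HTML):
    print(text)
```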

Alexander Kapps · Dec 11, 2010

What do you think about the code? I love to hear from you!?

I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest libs I've seen ;-)

Really? While I'm by no means an expert, I find it very easy to work
with. It's very well structured IMHO.

And now I wonder how to get the next contents !!

Content = BeautifulSoup(Text)
for td in Content.findAll('td'):
    print td.string.strip()  # or td.renderContents().strip()
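
One caveat with td.string, shown as a small sketch: it is None as soon as a cell holds more than one child node, for example text mixed with markup, and then .strip() fails. The sketch assumes Python 3 with the bs4 package, where get_text() collects all nested text; the sample HTML is made up for illustration and is not taken from the target page.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4

# Hypothetical sample: the second cell mixes text and markup.
HTML = """
<table>
<tr>
<td>plain text</td>
<td>text with <b>markup</b> inside</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
cells = soup.find_all("td")

# .string works when the cell contains a single string and nothing else.
print(cells[0].string)              # plain text

# With several child nodes, .string is None, so .string.strip() would raise.
print(cells[1].string)              # None

# get_text() joins every nested string, whatever the cell's structure.
print(cells[1].get_text().strip())  # text with markup inside
```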

Stef Mientki · Dec 12, 2010

I've no opinion.

Really? While I'm by no means an expert, I find it very easy to work with. It's very well
structured IMHO.

I think the cause lies in the documentation.
The PySide documentation is much easier to understand (at least for me)

http://www.pyside.org/docs/pyside/PySide/QtWebKit/QWebElement.html

cheers,
Stef
