BeautifulSoup

luca72 · Oct 29, 2008

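Hello,
I am trying to use BeautifulSoup.
I have this:
import urllib
from BeautifulSoup import BeautifulSoup

# Fetch the page and collect every centered table row
sito = urllib.urlopen('http://www.prova.com/')
esamino = BeautifulSoup(sito)
luca = esamino.findAll('tr', align='center')

print luca[0]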

I need to get the following information:
1)Only|G|BoT|05
2)#1
3)44.4MB
4)Pc-prova.rar
With print luca[0].a.string I get #1.
With print luca[0].td.string I get 44.4MB.
Can you explain how to get the other two values?
Thanks
Luca

Peter Pearson · Oct 29, 2008

luca72 said:
print luca[0]

[The following long string has been wrapped.] href="#">#1</a></th><td width="10%">44.4MB</td>

Like you, I struggle with BeautifulSoup; but perhaps this will help
while waiting for somebody smarter to join the thread:
>>> soup = BeautifulSoup(
... """<tr align="center"><th width="5%">"""
... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
... [The rest of the HTML string is missing from the archived post.]
>>> tr = soup.findAll( 'tr' )
>>> tr[0].findAll( text = True )
[u'#1', u'44.4MB', u' Pc-prova.rar ']
>>> c = tr[0].findChild( attrs={"onclick": True} )
>>> print c[ "onclick" ]
t('Only|G|BoT|05','#1');
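
For readers on a current setup, here is a minimal sketch of the same two lookups. It assumes the newer bs4 package on Python 3 rather than the BeautifulSoup 3 import used in this thread, and the sample row is an assumption: the real page is not shown in full, so the <font> cell around the file name is a guess based on the replies here.

```python
from bs4 import BeautifulSoup

# Assumed sample row; only the <th> and first <td> appear in the thread.
html = (
    '<table><tr align="center"><th width="5%">'
    '<a onclick="t(\'Only|G|BoT|05\',\'#1\');" href="#">#1</a></th>'
    '<td width="10%">44.4MB</td>'
    '<td><font> Pc-prova.rar </font></td></tr></table>'
)

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr", align="center")

# Every text node inside the row, with surrounding whitespace stripped.
texts = [s.strip() for s in row.find_all(string=True)]

# The first descendant that carries an onclick attribute.
onclick = row.find(attrs={"onclick": True})["onclick"]

print(texts)
print(onclick)
```

The text nodes give the pack number, size and file name; the bot name only exists inside the onclick attribute, so it has to be read from there.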

Stefan Behnel · Oct 30, 2008

Peter said:
Like you, I struggle with BeautifulSoup

Well, there's always lxml.html if you need it.

http://codespeak.net/lxml/

Stefan

Kay Schluehr · Oct 30, 2008

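The same way you got `luca`, applied to the first row, luca[0] (findAll returns a list):

1,2) luca[0].find("a")["onclick"].split("'") and pick the two values out of the
result list (indexes 1 and 3)
3) luca[0].find("td").string
4) luca[0].find("font").string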
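
The split in step 1,2 is plain string handling, so it can be sketched without BeautifulSoup. This assumes the onclick value is exactly the string printed in Peter's session; the variable names are only illustrative.

```python
# Assumed input: the onclick value printed earlier in the thread.
onclick = "t('Only|G|BoT|05','#1');"

# Splitting on the single quote leaves the quoted arguments at the odd indexes:
# ["t(", "Only|G|BoT|05", ",", "#1", ");"]
parts = onclick.split("'")
bot_name, pack_id = parts[1], parts[3]

print(bot_name)
print(pack_id)
```

This breaks if an argument ever contains a single quote itself, so a regular expression may be the safer choice for less regular pages.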

luca72 · Oct 30, 2008

Hello,
Another stupid question. Instead of using
sito = urllib.urlopen('http://www.prova.com/')
esamino = BeautifulSoup(sito)

I do
sito = urllib.urlopen('http://onlygame.helloweb.eu/')
file_sito = open('sito.html', 'wb')
for line in sito:
    file_sito.write(line)
file_sito.close()

How can I pass the file sito.html to BeautifulSoup?

Regards

Luca

Kay Schluehr · Oct 30, 2008

download = urllib.urlopen("http://www.fiber-space.de/downloads/downloads.html")
BeautifulSoup(download.read())

Ciao
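
For the saved file itself, a minimal sketch follows. It assumes the newer bs4 package on Python 3 (the thread uses BeautifulSoup 3 on Python 2, where passing an open file should follow the same pattern), and it writes a small stand-in sito.html to a temporary directory so the example runs on its own.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in for the saved download; the real sito.html would come from urlopen.
path = os.path.join(tempfile.mkdtemp(), "sito.html")
with open(path, "wb") as file_sito:
    file_sito.write(b'<table><tr align="center"><td>44.4MB</td></tr></table>')

# Pass the open file object straight to the constructor.
with open(path, "rb") as file_sito:
    esamino = BeautifulSoup(file_sito, "html.parser")

size = esamino.find("tr", align="center").td.string
print(size)
```

The constructor accepts either a string or a file-like object, which is also why the original code could hand it the urlopen result directly.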
