BeautifulSoup

Discussion in 'Python' started by luca72, Oct 29, 2008.

  1. luca72

    luca72 Guest

    Hello
    I am trying to use BeautifulSoup.
    I have this:
    sito = urllib.urlopen('http://www.prova.com/')
    esamino = BeautifulSoup(sito)
    luca = esamino.findAll('tr', align='center')

    print luca[0]

    >><tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>


    I need to get the following information:
    1)Only|G|BoT|05
    2)#1
    3)44.4MB
    4)Pc-prova.rar
    With print luca[0].a.string I get #1.
    With print luca[0].td.string I get 44.4MB.
    Can you explain how to get the other two values?
    Thanks
    Luca
    luca72, Oct 29, 2008
    #1

  2. On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <> wrote:
    > Hello
    > I am trying to use BeautifulSoup.
    > I have this:
    > sito = urllib.urlopen('http://www.prova.com/')
    > esamino = BeautifulSoup(sito)
    > luca = esamino.findAll('tr', align='center')
    >
    > print luca[0]
    >

    [The following long string has been wrapped.]
    >>><tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');"
    href="#">#1</a></th><td width="10%">44.4MB</td>
    <td width="90%" align="left">
    <font color="orange"> Pc-prova.rar </font></td></tr>
    >
    > I need to get the following information:
    > 1)Only|G|BoT|05
    > 2)#1
    > 3)44.4MB
    > 4)Pc-prova.rar
    > With print luca[0].a.string I get #1.
    > With print luca[0].td.string I get 44.4MB.
    > Can you explain how to get the other two values?


    Like you, I struggle with BeautifulSoup; but perhaps this will help
    while waiting for somebody smarter to join the thread:

    >>> import BeautifulSoup
    >>> soup = BeautifulSoup.BeautifulSoup(
    ... """<tr align="center"><th width="5%">"""
    ... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
    ... """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
    ... """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
    >>> tr = soup.findAll( 'tr' )
    >>> tr[0].findAll( text = True )
    [u'#1', u'44.4MB', u' Pc-prova.rar ']
    >>> c = tr[0].findChild( attrs={"onclick": True} )
    >>> print c[ "onclick" ]
    t('Only|G|BoT|05','#1');
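
    For anyone reading this on Python 3, the session above might translate roughly as follows. This is only a sketch: it assumes the `bs4` package (BeautifulSoup 4) is installed and uses the standard-library "html.parser" backend, neither of which appears in the thread. The variable names are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (package "bs4")

# The same row as in the thread, split over several literals for readability.
html = (
    '<tr align="center"><th width="5%">'
    '''<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>'''
    '</th><td width="10%">44.4MB</td><td width="90%" align="left">'
    '<font color="orange"> Pc-prova.rar </font></td></tr>'
)

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# All text nodes inside the row, in document order.
texts = [str(t) for t in row.find_all(string=True)]

# The first descendant that has an onclick attribute, then that attribute's value.
onclick = row.find(attrs={"onclick": True})["onclick"]

print(texts)
print(onclick)
```

    In BeautifulSoup 4 the method is spelled `find_all`, and text nodes are selected with `string=True`.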


    --
    To email me, substitute nowhere->spamcop, invalid->net.
    Peter Pearson, Oct 29, 2008
    #2

  3. Peter Pearson wrote:
    > Like you, I struggle with BeautifulSoup


    Well, there's always lxml.html if you need it.

    http://codespeak.net/lxml/

    Stefan
    Stefan Behnel, Oct 30, 2008
    #3
  4. Kay Schluehr

    Kay Schluehr Guest

    On 29 Oct., 17:45, luca72 <> wrote:
    > Hello
    > I am trying to use BeautifulSoup.
    > I have this:
    > sito = urllib.urlopen('http://www.prova.com/')
    > esamino = BeautifulSoup(sito)
    > luca = esamino.findAll('tr', align='center')
    >
    > print luca[0]
    >
    > >><tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a></th><td width="10%">44.4MB</td><td width="90%" align="left"><font color="orange"> Pc-prova.rar </font></td></tr>

    >
    > I need to get the following information:
    > 1)Only|G|BoT|05
    > 2)#1
    > 3)44.4MB
    > 4)Pc-prova.rar
    > With print luca[0].a.string I get #1.
    > With print luca[0].td.string I get 44.4MB.
    > Can you explain how to get the other two values?
    > Thanks
    > Luca


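    The same way you got `luca`, but applied to its first element, since
    findAll returns a list:

    1,2) luca[0].find("a")["onclick"].split("'") and take items 1 and 3 of
    the result list
    3) luca[0].find("td").string
    4) luca[0].find("font").string (call .strip() on it to drop the
    surrounding spaces)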
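
    To make the split step concrete, here is a small sketch that uses only the standard library and works on the onclick string shown earlier in the thread. The names `bot_name` and `pack_id` are made up for illustration, and the regular expression is an alternative I am suggesting, not something from the original posts.

```python
import re

# The attribute value as it appears in the thread.
onclick = "t('Only|G|BoT|05','#1');"

# Splitting on the single quote puts the quoted arguments at the odd indexes.
parts = onclick.split("'")
bot_name, pack_id = parts[1], parts[3]

# Alternative: capture everything between pairs of single quotes.
quoted = re.findall(r"'([^']*)'", onclick)

# The file name comes back padded with spaces, so strip it.
file_name = " Pc-prova.rar ".strip()

print(bot_name, pack_id, file_name)
```

    The split approach relies on the arguments never containing a single quote themselves; if they might, a stricter parser would be needed.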
    Kay Schluehr, Oct 30, 2008
    #4
  5. luca72

    luca72 Guest

    Hello
    Another stupid question: instead of using
    sito = urllib.urlopen('http://www.prova.com/')
    esamino = BeautifulSoup(sito)

    I do
    sito = urllib.urlopen('http://onlygame.helloweb.eu/')
    file_sito = open('sito.html', 'wb')
    for line in sito:
        file_sito.write(line)
    file_sito.close()

    How can I pass the file sito.html to BeautifulSoup?

    Regards

    Luca
    luca72, Oct 30, 2008
    #5
  6. Kay Schluehr

    Kay Schluehr Guest

    On 30 Oct., 18:28, luca72 <> wrote:
    > Hello
    > Another stupid question: instead of using
    > sito = urllib.urlopen('http://www.prova.com/')
    > esamino = BeautifulSoup(sito)
    >
    > I do
    >  sito = urllib.urlopen('http://onlygame.helloweb.eu/')
    >  file_sito = open('sito.html', 'wb')
    >  for line in sito:
    >      file_sito.write(line)
    >  file_sito.close()
    >
    > How can I pass the file sito.html to BeautifulSoup?
    >
    > Regards
    >
    > Luca


    download = urllib.urlopen("http://www.fiber-space.de/downloads/downloads.html")
    BeautifulSoup(download.read())
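
    The lines above parse the response directly. For a page that has already been saved to disk, as in the question, a file object can be handed to the constructor in the same way. The following is only a sketch: it assumes Python 3 and the `bs4` package, and it writes a tiny stand-in page to a temporary directory instead of downloading anything. The file name sito.html is from the question; everything else is illustrative.

```python
import os
import tempfile

from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (package "bs4")

# Stand-in for the downloaded page, so the example needs no network access.
page = (
    b'<html><body><table>'
    b'<tr align="center"><td width="10%">44.4MB</td></tr>'
    b'</table></body></html>'
)

path = os.path.join(tempfile.mkdtemp(), "sito.html")
with open(path, "wb") as file_sito:
    file_sito.write(page)

# Re-open the saved file and pass the open file object to BeautifulSoup.
with open(path, "rb") as file_sito:
    esamino = BeautifulSoup(file_sito, "html.parser")

rows = esamino.find_all("tr", align="center")
size = str(rows[0].td.string)
print(size)
```

    Opening the file in binary mode leaves the encoding detection to BeautifulSoup.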

    Ciao
    Kay Schluehr, Oct 30, 2008
    #6
