beautiful soup get class info

teddybubu

I am using BeautifulSoup to get the title and date from a web page.
The title works fine, but I am not able to pull the date. Here is the relevant HTML from the page:

<span class="date">October 22, 2011</span>

In Python, I am using the following code:
date1 = soup.span.text
data=soup.find_all(date="value")

Results in:

[]
March 5, 2014

What is the proper way to get this info?
Thanks.
 
teddybubu

John Gordon said:
Try this:



soup.find_all(name="span", class="date")



--

John Gordon Imagine what it must be like for a real medical doctor to

watch 'House', or a real serial killer to watch 'Dexter'.

I have Python 2.7.2 and it does not like class in the code you provided. Now when I take out [ class="date"], this is returned:
[<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]

This is the code I am using:
data = soup.find_all(name="span")
print(data)

1. It returns today's date instead of the actual date.
2. It returns it twice.
 
John Gordon

teddybubu said:
I have python 2.7.2 and it does not like class in the code you provided.

Oh right, 'class' is a reserved word. I imagine beautifulsoup has
a workaround for that.

teddybubu said:
Now when I take out [ class="date"], this is returned:
[<span class="date">March 5, 2014</span>, <span class="date">March 5, 2014</span>]

This is the code I am using: "data = soup.find_all(name="span")
print (data)"
1. it returns today's date instead of the actual date
2. returns it twice

Are there two occurrences of '<span class="date">March 5, 2014</span>'
in the HTML? If so, then beautifulsoup is doing its job correctly.

It might help if you posted the sample HTML data you're working with.
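
Both points in this reply can be checked with a minimal sketch. The HTML string below is a made-up stand-in, since the real page was not posted, and the code assumes Beautiful Soup 4 is installed and importable as bs4.

```python
import keyword

from bs4 import BeautifulSoup

# 'class' is a Python keyword, which is why class="date" cannot be
# written as a keyword argument in a function call.
is_reserved = keyword.iskeyword("class")
print(is_reserved)

# Hypothetical sample page: the same date span appears twice.
html = """
<div><span class="date">March 5, 2014</span></div>
<div><span class="date">March 5, 2014</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag, so two spans in the HTML
# produce a two-element result.
spans = soup.find_all(name="span")
print(len(spans))

# find() returns only the first match.
first = soup.find(name="span")
print(first.get_text())
```

If the count printed here matches the number of spans in the page source, the duplicate in the result comes from the HTML itself rather than from the search.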
 
teddybubu

ok I got this working. now to the next problem.... thanks.
 
Mark Lawrence

ok I got this working. now to the next problem.... thanks.

I'm pleased to see that you have a solution. Now, should you wish to
ask further questions, would you please read and action this first
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the
double line spacing above, thanks.
 
Christopher Welborn

I believe it's the 'attrs' argument.
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

# Workaround the 'class' problem:
data = soup.find_all(attrs={'class': 'date'})

I haven't tested it, but it's worth looking into.
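
For anyone who wants to try the attrs suggestion, here is a minimal sketch. The page content is invented around the span from the original question (the title text is a placeholder), and it assumes Beautiful Soup 4 is installed as bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical page built around the span quoted in the question.
html = (
    "<html><head><title>Example page</title></head>"
    '<body><span class="date">October 22, 2011</span></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# The title is reachable directly as an attribute of the soup.
title = soup.title.get_text()

# Passing a dict through attrs avoids writing the reserved word
# 'class' as a keyword argument.
date_spans = soup.find_all("span", attrs={"class": "date"})
dates = [span.get_text(strip=True) for span in date_spans]

print(title)
print(dates)
```

Restricting the search to span tags as well as the class keeps other elements that happen to share the class out of the result.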
 
Peter Otten

Christopher said:
I believe it's the 'attrs' argument.
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

# Workaround the 'class' problem:
data = soup.find_all(attrs={'class': 'date'})

I haven't tested it, but it's worth looking into.

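Yes, there are two ways to filter by class:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""<span class="one">alpha</span>
... <span class="two">beta</span>""")

Use attrs:
>>> soup.find_all(attrs={"class": "one"})
[<span class="one">alpha</span>]

Append an underscore:
>>> soup.find_all(class_="two")
[<span class="two">beta</span>]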
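
The two approaches can also be put side by side in a standalone script. This is a sketch that reuses the two-span sample from this post, assumes Beautiful Soup 4 is installed as bs4, and passes "html.parser" explicitly (a choice made here, not part of the post).

```python
from bs4 import BeautifulSoup

html = """<span class="one">alpha</span>
<span class="two">beta</span>"""
soup = BeautifulSoup(html, "html.parser")

# Way 1: pass the class through the attrs dictionary.
by_attrs = soup.find_all(attrs={"class": "two"})

# Way 2: use the class_ keyword (trailing underscore), which sidesteps
# the reserved word 'class'.
by_underscore = soup.find_all(class_="two")

# Both searches find the same tag; get_text() pulls out its contents.
texts = [tag.get_text() for tag in by_underscore]
print(by_attrs == by_underscore)
print(texts)
```

Either form works for the date span in the original question by substituting "date" for the class name.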
 
