Best Way to extract Numbers from String

Jimbo

Hello

I am trying to grab some numbers from a string containing HTML text.
Can you suggest any good functions that I could use to do this? What
would be the easiest way to extract the following numbers from this
string...

My String has this layout & I have commented what I want to grab:
Code:
 """</th>
				<td class="last">43.200 </td>
				<td class="change indicator" nowrap>0.040 </td>

				<td>43.150 </td> # I need to grab this number only
				<td>43.200 </td>
				<td>43.130 </td> # I need to grab this number only
				<td>43.290 </td>
				<td>43.100 </td> # I need to grab this number only
				<td>7,450,447 </td>
				<td class="middle"><a
					href="/asx/markets/optionPrices.do?
by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</
a></td>
				<td class="middle"><a
					href="/asx/markets/warrantPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">Warrants &amp; Structured
Products</a></td>
				<td class="middle"><a
					href="/asx/markets/cfdPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
				<td class="middle"><a href="http://hfgapps.hubb.com/asxtools/
Charts.aspx?
TimeFrame=D6&amp;compare=comp_index&amp;indicies=XJO&amp;pma1=20&amp;pma2=20&amp;asxCode=BHP"><img
src="/images/chart.gif" border="0" height="15" width="15"></a>
</td>
				<td><a href="/research/announcements/status_notes.htm#XD">XD</a>
				</td>
				<td><a href="/asx/statistics/announcements.do?
by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
</td>
			</tr>"""
 
Gabriel Genellina

I'd use BeautifulSoup [1] to handle badly formed HTML like that.

[1] http://www.crummy.com/software/BeautifulSoup/
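
As a rough illustration of the parser-based route: BeautifulSoup is a third-party package, so the sketch below uses the standard library's html.parser instead. That is a different tool from the one suggested above, but it follows the same idea of letting a parser find the cells rather than slicing the string by hand. The class name PlainCellCollector, the helper plain_cells and the shortened sample string are made up for this example.

```python
from html.parser import HTMLParser


class PlainCellCollector(HTMLParser):
    """Collect the text of <td> cells that carry no attributes."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_plain_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Track only bare <td> cells; cells with class="..." are skipped.
            self._in_plain_td = not attrs
            if self._in_plain_td:
                self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_plain_td = False

    def handle_data(self, data):
        if self._in_plain_td:
            self.cells[-1] += data


def plain_cells(html):
    """Return the stripped text of every attribute-free <td> in html."""
    parser = PlainCellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Shortened version of the row from the question (assumption: the link
# cells are left out here to keep the example small).
sample = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td> <td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

cells = plain_cells(sample)
wanted = cells[0:5:2]  # first, third and fifth plain cell
print(wanted)
```

On the full row from the question, the attribute-free cells that hold links ("XD", "Recent") would be collected as well, so some filtering on the cell text would still be needed.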
 
Luis M. González

You should use BeautifulSoup or perhaps regular expressions.
Or if you are not very smart, like me, just try a brute force approach:

    # i is the HTML string from the question
    for e in i.split():
        if '.' in e and e[0].isdigit():
            print(e)


43.200
0.040
43.150
43.200
43.130
43.290
43.100
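
One caveat, going only by the string as it is quoted in this thread: there the numbers sit directly against the opening tag (`<td>43.150`), so a plain split() would produce tokens that start with `<` and the isdigit() test would not fire. The sketch below is a small variation, not the code posted above: it pads the angle brackets with spaces before splitting. The function name numbers_by_splitting and the shortened sample are assumptions for illustration.

```python
def numbers_by_splitting(html):
    """Return tokens that contain a '.' and start with a digit."""
    # Put whitespace around every tag boundary so that "<td>43.150"
    # splits into "<td>" and "43.150".
    padded = html.replace("<", " <").replace(">", "> ")
    return [e for e in padded.split() if "." in e and e[0].isdigit()]


# Shortened version of the row from the question.
i = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td> <td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

for e in numbers_by_splitting(i):
    print(e)
```

Like the original loop, this still returns every decimal number in the row, including the two cells that have a class attribute, and it skips 7,450,447 because that value has no '.' in it.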
 
Jimbo

Thanks very much. I'm going to look at regular expressions, but thanks
for your code; it shows me how I can do it with standard Python :)
 
Novocastrian_Nomad

Regular expressions are very powerful, and I use them a lot in my
paying job (unfortunately not with Python). You are, however,
basically using a second programming language, which can be difficult
to master.

Does this give you the desired result?

    import re

    # code is the HTML string from the question
    matches = re.findall(r'<td>([\d.,]+)\s*</td>', code)
    for match in matches:
        print(match)

resulting in this output:
43.150
43.200
43.130
43.290
43.100
7,450,447
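
If only the three marked values are wanted, one possible follow-up is to pick them out of the match list by position and convert them to floats. This is just a sketch: it assumes the row always has the layout shown in the question, so the wanted cells are the first, third and fifth attribute-free `<td>`. The names PLAIN_CELL and wanted_prices and the shortened sample are made up for the example.

```python
import re

# Matches a bare <td> whose content is only digits, dots and commas.
PLAIN_CELL = re.compile(r"<td>\s*([\d.,]+)\s*</td>")


def wanted_prices(html):
    """Return the 1st, 3rd and 5th plain numeric cells as floats."""
    values = PLAIN_CELL.findall(html)
    # Assumption: the cell order is fixed, as in the question's sample.
    picked = values[0:5:2]
    # Drop thousands separators before converting.
    return [float(v.replace(",", "")) for v in picked]


# Shortened version of the row from the question.
code = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td> <td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

print(wanted_prices(code))
```

Selecting by position is fragile if the page layout changes, so checking the number of matches before slicing may be worth adding.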
 
