Best Way to extract Numbers from String

Jimbo · Mar 19, 2010

Hello

I am trying to grab some numbers from a string containing HTML text.
Can you suggest any good functions that I could use to do this? What
would be the easiest way to extract the following numbers from this
string...

My String has this layout & I have commented what I want to grab:

Code:

 """</th>
				<td class="last">43.200 </td>
				<td class="change indicator" nowrap>0.040 </td>

				<td>43.150 </td> # I need to grab this number only
				<td>43.200 </td>
				<td>43.130 </td> # I need to grab this number only
				<td>43.290 </td>
				<td>43.100 </td> # I need to grab this number only
				<td>7,450,447 </td>
				<td class="middle"><a
					href="/asx/markets/optionPrices.do?
by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</
a></td>
				<td class="middle"><a
					href="/asx/markets/warrantPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">Warrants &amp; Structured
Products</a></td>
				<td class="middle"><a
					href="/asx/markets/cfdPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
				<td class="middle"><a href="http://hfgapps.hubb.com/asxtools/
Charts.aspx?
TimeFrame=D6&amp;compare=comp_index&amp;indicies=XJO&amp;pma1=20&amp;pma2=20&amp;asxCode=BHP"><img
src="/images/chart.gif" border="0" height="15" width="15"></a>
</td>
				<td><a href="/research/announcements/status_notes.htm#XD">XD</a>
				</td>
				<td><a href="/asx/statistics/announcements.do?
by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
</td>
			</tr>"""

Gabriel Genellina · Mar 19, 2010

I'd use BeautifulSoup [1] to handle badly formed HTML like that.

[1] http://www.crummy.com/software/BeautifulSoup/

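BeautifulSoup is a third-party package, so as a rough illustration of the same "parse the HTML instead of splitting the text" idea, here is a minimal sketch that uses only the standard library's html.parser module (not BeautifulSoup itself). The class name PlainCellParser and the shortened sample string are made up for this example. It also assumes, going by the comments in the original post, that the wanted numbers are the 1st, 3rd and 5th <td> cells that carry no attributes.

```python
from html.parser import HTMLParser


class PlainCellParser(HTMLParser):
    """Collect the text of <td> cells that have no attributes."""

    def __init__(self):
        super().__init__()
        self.cells = []        # text of each attribute-free <td>
        self._in_cell = False  # True while inside such a cell

    def handle_starttag(self, tag, attrs):
        # attrs is an empty list for a bare <td>
        if tag == "td" and not attrs:
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


# Shortened, hypothetical sample modelled on the string in the post
sample = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td> <td>43.100 </td>
<td>7,450,447 </td>
<td class="middle"><a href="/asx/markets/cfdPrices.do">CFDs</a></td>
<td><a href="/research/announcements/status_notes.htm#XD">XD</a></td>
</tr>"""

parser = PlainCellParser()
parser.feed(sample)
parser.close()

cells = [c.strip() for c in parser.cells]
wanted = cells[0:5:2]  # assumption: 1st, 3rd and 5th plain cells
print(wanted)
```

Selecting by position is fragile if the page layout changes, so it may be worth checking the number of cells before slicing.
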
Luis M. González · Mar 20, 2010

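You should use BeautifulSoup or perhaps regular expressions.
Or if you are not very smart, like me, just try a brute-force approach
(here i is the string holding the HTML):

    # Pad the angle brackets with spaces so each number is its own token
    for e in i.replace('<', ' ').replace('>', ' ').split():
        if '.' in e and e[0].isdigit():
            print(e)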

43.200
0.040
43.150
43.200
43.130
43.290
43.100
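
A possible refinement of the brute-force idea, sketched under a few assumptions: instead of checking for a dot and a leading digit, let float() decide whether a token is a number, after removing thousands separators. The helper name is_number and the shortened sample string are invented for this example. The angle brackets are padded with spaces before splitting so that each cell value becomes its own token.

```python
def is_number(token):
    """Return True if float() accepts the token once commas are removed."""
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False


# Shortened, hypothetical sample modelled on the string in the post
html = ('<td class="last">43.200 </td> <td>43.150 </td> '
        '<td>7,450,447 </td> <td><a href="/x.do?a=1">XD</a></td>')

# Turn "<" and ">" into spaces so tags and values split apart
tokens = html.replace("<", " ").replace(">", " ").split()
numbers = [t for t in tokens if is_number(t)]
print(numbers)
```

Note that float() also accepts strings such as "nan", "inf" or "1e5", so this test is looser than it looks. Unlike the dot-and-digit check, it keeps whole numbers such as the volume figure.
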

Jimbo · Mar 20, 2010

Thanks very much. I'm going to look at regular expressions, but thanks
for your code; it shows me how I can do it with standard Python.

Novocastrian_Nomad · Mar 20, 2010

Regular expressions are very powerful, and I use them a lot in my
paying job (unfortunately not with Python). You are, however,
basically using a second programming language, which can be difficult
to master.

Does this give you the desired result?

import re

# code is the string holding the HTML
matches = re.findall(r'<td>([\d.,]+)\s*</td>', code)
for match in matches:
    print(match)

resulting in this output:
43.150
43.200
43.130
43.290
43.100
7,450,447
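
To go from the matched strings to the three values the original post asked for, something like the following might work. It is only a sketch: the sample string is a shortened, made-up version of the one in the post, and it assumes the wanted numbers are always the 1st, 3rd and 5th matches, as the comments in the post suggest.

```python
import re

# Shortened, hypothetical sample modelled on the string in the post
code = """<td class="last">43.200 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td> <td>43.100 </td>
<td>7,450,447 </td>"""

# Same pattern as above: bare <td> cells holding digits, dots and commas
matches = re.findall(r'<td>([\d.,]+)\s*</td>', code)

# Drop thousands separators, then convert to float
values = [float(m.replace(',', '')) for m in matches]

# Assumption: the wanted numbers are the 1st, 3rd and 5th matches
wanted = values[0:5:2]
print(wanted)
```

Because the pattern requires a bare <td>, cells such as <td class="last"> are skipped without any extra filtering.
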
