HTML parser with regex, how to solve?

  • Thread starter Luiz Vitor Martinez Cardoso
  • Start date Jan 5, 2008

Luiz Vitor Martinez Cardoso

[Note: parts of this message were removed to make it a legal post.]

Hi,

I'm trying to develop a simple application in Ruby (once this works I
will move it to Rails). I need to fetch the source of a URL and search it
for this string:

<h3 class="zmp">$299.99</h3>

But I need to match not only 149.00 but every possible price. My
friend suggested this:

<h3 class="zmp">*$\d+\.\d{2}.*</h3>

I think this works, but I need one more thing. Here is my code:

#!/usr/bin/ruby

require 'hpricot'
require 'open-uri'

@content = Hpricot(open("http://www.newegg.com/Product/Product.aspx?Item=N82E16855101066"))

Now, how can I search for <h3 class="zmp">*$\d+\.\d{2}.*</h3>?

@content.search("<h3 class="zmp">*$\d+\.\d{2}.*</h3>") is broken ;(

How can I solve this?


Thanks for your attention,
Luiz Vitor Martinez Cardoso.



--
Regards,
Luiz Vitor Martinez Cardoso [Grabber].
(11) 8187-8662

rubz.org - engineer student at maua.br
 
s.ross

Don't use the regex. Let hpricot do what it's good at:

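$ irb
require 'rubygems'
require 'hpricot'
html = '<h3 class="zmp">149.00</h3>'
doc = Hpricot.parse(html)
ele = doc.search('h3.zmp')
puts ele.text
=> 149.00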

In your code, your @content will be searchable the same way. Hpricot
will give you a collection of all h3's with class 'zmp'.

http://code.whytheluckystiff.net/doc/hpricot/

Hope this helps.
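
For comparison, here is a rough sketch of why the pattern from the question could not have matched even as a plain regex. The search call in the reply above is given a selector ('h3.zmp'), not a regular expression, so the pattern was the wrong tool for that method in any case. The constant names below are made up for illustration, and the corrected pattern is only one possible fix, assuming prices always look like $299.99.

```ruby
# Sample markup taken from the question.
SAMPLE_HTML = '<h3 class="zmp">$299.99</h3>'

# The pattern as posted: ">*" means "zero or more '>' characters", and the
# unescaped "$" is an end-of-line anchor, not a literal dollar sign.
POSTED_PATTERN = %r{<h3 class="zmp">*$\d+\.\d{2}.*</h3>}

# One possible correction (an assumption, not from the thread): escape the
# dollar sign and capture the digits in a group.
FIXED_PATTERN = %r{<h3 class="zmp">\s*\$(\d+\.\d{2})\s*</h3>}

puts POSTED_PATTERN.match(SAMPLE_HTML).inspect  # no match
puts FIXED_PATTERN.match(SAMPLE_HTML)[1]        # captured price text
```

Even with the pattern fixed, matching raw HTML this way breaks as soon as the attributes or whitespace change, which is the reason for the advice to let the parser find the element.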


Luiz Vitor Martinez Cardoso

Thanks a lot! This really works ;)

Now I have a new (very simple) problem: the output is $1999,00. How
can I remove the $? I will need to convert this to a float ;)

Regards,
Luiz Vitor Martinez Cardoso.

Joe

try this:

ele.text.sub('$', '')

Joe
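
To cover the second half of the question (getting a float), the following is a minimal sketch built on the sub call above. The helper name price_to_float is hypothetical, and the code assumes the comma in "$1999,00" is a decimal separator; a price with a thousands separator such as "1,999.00" would be rejected by this version and needs different handling.

```ruby
# Hypothetical helper, not from the thread: turn a price string into a Float.
def price_to_float(text)
  # String#sub with a String pattern matches it literally, so '$' is not
  # treated as a regex anchor here.
  cleaned = text.strip.sub('$', '')
  # Assumption: a comma is a decimal separator, as in "$1999,00".
  cleaned = cleaned.tr(',', '.')
  # Kernel#Float raises ArgumentError on malformed input, whereas
  # String#to_f would quietly return a partial or zero result.
  Float(cleaned)
end

puts price_to_float('$1999,00')
puts price_to_float('$299.99')
```

With the element from the earlier reply, the call would be price_to_float(ele.text).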

Luiz Vitor Martinez Cardoso

Thanks!

I got it working!

Regards,
Luiz Vitor Martinez Cardoso.
