How to extract the text between two patterns

Cheyne Li

Hi experts,

I'm not very familiar with Ruby's library. Is there a method that
can extract the text between two patterns? For example,

I have a string: a = "aabbcc<p>ccddee</p>"

I want to get everything between <p> and </p>, which is ccddee


Thanks in advance.
 
Li Chen

C:\Users\Alex>irb
irb(main):001:0> require 'hpricot'
=> true
irb(main):002:0> a="aabbcc<p>ccddee</p>"
=> "aabbcc<p>ccddee</p>"
irb(main):003:0>
irb(main):004:0* doc=Hpricot(a)
=> #<Hpricot::Doc "aabbcc" {elem <p> "ccddee" </p>}>
irb(main):005:0> p doc.at('p').inner_text
"ccddee"
=> nil
irb(main):006:0>



Li
 
Tod Beardsley

Yes, in this case (HTML/XML), Hpricot is your best bet.

Otherwise, standard regular expressions would apply, IMO.
 
suroot57

If you want to use a regexp, a quick and dirty way would be:

(a.split %r{</?p>})[1]
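
To spell out why this works: splitting on `</?p>` cuts the string at both the opening and the closing tag, so the text between them lands at index 1. Below is a minimal sketch; the helper name `between_p` is made up for illustration and is not part of any library.

```ruby
# Hypothetical helper (illustration only): text after the first <p> or </p>
# tag, up to the next one.
def between_p(str)
  parts = str.split(%r{</?p>})  # cut at every <p> or </p>
  parts[1]                      # second piece; nil if no tag was found
end

a = "aabbcc<p>ccddee</p>"
p a.split(%r{</?p>})  # ["aabbcc", "ccddee"]
p between_p(a)        # "ccddee"
p between_p("abc")    # nil
```

Note that this does not check that the tags are paired: it simply returns whatever follows the first tag it finds, so it is only suitable when the input shape is known in advance.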
 
Ittay Dror

Or, using a capture group:
irb(main):001:0> a = 'aabbcc<p>ccddee</p>'
=> "aabbcc<p>ccddee</p>"
irb(main):002:0> a[%r{<p>(.*)</p>}, 1]
=> "ccddee"
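
One thing to watch with `.*` is that it is greedy: if the string contains more than one `<p>...</p>` pair, the capture runs to the last `</p>`. A small sketch of the difference, using a non-greedy `.*?` as one possible alternative (the variable names are just for illustration):

```ruby
a = "aabbcc<p>ccddee</p>ccc<p>eee</p>"

greedy = a[%r{<p>(.*)</p>}, 1]   # .* extends to the LAST </p>
lazy   = a[%r{<p>(.*?)</p>}, 1]  # .*? stops at the FIRST </p>

p greedy  # "ccddee</p>ccc<p>eee"
p lazy    # "ccddee"
```

Also note that `.` does not match a newline unless the regexp is given the `m` flag, so text spanning several lines would need `%r{<p>(.*?)</p>}m`.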
 
Patrick He

Or:

irb(main):004:0> a = 'aabbcc<p>ccddee</p>ccc<p>eee</p>'
=> "aabbcc<p>ccddee</p>ccc<p>eee</p>"
irb(main):005:0> a.scan(%r{<p>([^<]*)</p>})
=> [["ccddee"], ["eee"]]

I prefer to explicitly specify which character(s) not to match.
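
As a follow-up sketch: `scan` with a capture group returns an array of one-element arrays, so `flatten` gives a plain list of strings. The trade-off of `[^<]*` is that a paragraph containing any nested tag is skipped entirely, since the pattern refuses to cross a `<`. The variable names below are only for illustration.

```ruby
a = "aabbcc<p>ccddee</p>ccc<p>eee</p>"

# One capture group => array of arrays; flatten to get plain strings.
texts = a.scan(%r{<p>([^<]*)</p>}).flatten
p texts  # ["ccddee", "eee"]

# A nested tag inside the paragraph defeats the negated character class.
nested = "<p>cc<b>dd</b>ee</p>"
nested_result = nested.scan(%r{<p>([^<]*)</p>})
p nested_result  # []
```

For input like the nested case, an HTML parser such as the Hpricot approach shown earlier in the thread is the safer route.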
 
