Regular expressions - Again

J. mp · Mar 6, 2007

I'm really bad with these things called regular expressions, so I'm
looking for help again.

Now, if I have a String like
"some string some content <title>this I want</title>"

I want to use the scan function to extract what is between <title>
and </title>. How can I build my regular expression? The final result
should be:
this I want

Thanks

Phrogz · Mar 6, 2007

Now, if I have a String like
"some string some content <title>this I want</title>"

And I want to use the scan function to extract what is between <title>
and </title> how can I build my regular expression. The final result
should be:
this I want

irb(main):001:0> str = "This is <title>what I
irb(main):002:0" want</title> and no more"
=> "This is <title>what I\nwant</title> and no more"
irb(main):003:0> str[ %r{<title>(.+?)</title>}, 1 ]
=> nil
irb(main):004:0> str[ %r{<title>(.+?)</title>}m, 1 ]
=> "what I\nwant"

Note the use of 'm' to match across multiple lines, in case your
title tag spans them.

Note that this will fail if you have
"<title>This is <title>nested</title> content</title>", and will
result in "This is <title>nested"
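
Since the question asked about scan specifically, here is a minimal sketch of the same pattern with String#scan, which returns every match instead of only the first. The sample strings and variable names are made up for illustration.

```ruby
# Same pattern as above; /m lets . match newlines.
title_re = %r{<title>(.+?)</title>}m

str = "This is <title>what I\nwant</title> and no more"

# With one capture group, scan returns an array of one-element arrays,
# so flatten gives a plain array of the captured strings.
titles = str.scan(title_re).flatten
p titles          # => ["what I\nwant"]

# The nested case still goes wrong: the lazy .+? stops at the first </title>.
nested = "<title>This is <title>nested</title> content</title>"
nested_titles = nested.scan(title_re).flatten
p nested_titles   # => ["This is <title>nested"]
```

If only one title is expected, titles.first does the job; it is nil when nothing matches.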

Jenda Krynicky · Mar 7, 2007

J. mp said:
I'm really bad with these things called regular expressions, so I'm
looking for help again.

Now, if I have a String like
"some string some content <title>this I want</title>"

And I want to use the scan function to extract what is between <title>
and </title> how can I build my regular expression. The final result
should be:
this I want

Thanks

You generally want to use an HTML parser ... provided that Wuby has one.

You may be lucky with <title> since it's likely to not include any
attributes, but still there might be some whitespace INSIDE the tags,
there may be a comment inside the <title>...</title> that you may or may
not want, there may be a <title> or </title> inside a comment etc. etc.
etc.

In (censored) I'd use HTML::Parser from CPAN, but shhhh ... this is a
Wuby site, we don't speak of such things here.

Jenda

J. mp · Mar 7, 2007

You generally want to use an HTML parser ... provided that Wuby has one.

You may be lucky with <title> since it's likely to not include any
attributes, but still there might be some whitespace INSIDE the tags,
there may be a comment inside the <title>...</title> that you may or may
not want, there may be a <title> or </title> inside a comment etc. etc.
etc.

In (censored) I'd use HTML::Parser from CPAN, but shhhh ... this is a
Wuby site, we don't speak of such things here.

Jenda

I ended up with Hpricot; it's working fine in the few tests I've run so
far.

Harry · Mar 7, 2007

Jenda said:
You may be lucky with <title> since it's likely to not include any
attributes, but still there might be some whitespace INSIDE the tags,

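str = "This is <title> what I\n\n\n\n \n want </title> and no more"
p str

# /m lets . match newlines, so the title may span several lines
str =~ /<title>(.*?)<\/title>/m
# \s already covers \n, so /\s+/ collapses every whitespace run
p $1.gsub(/\s+/, " ").strip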

Alex Young · Mar 7, 2007

Harry said:
str = "This is <title> what I\n\n\n\n \n want </title> and no more"
p str

str =~ /<title>(.*?)<\/title>/m
p $1.gsub(/(\n|\s)+/, " ").strip

I think he meant:

str = "This is <title >what I want</title> and no more"

but we don't know if the problem requires handling anything more complex
than simple tags.
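
If slightly messier tags do need handling, one possible sketch follows. It assumes nothing worse than whitespace or attributes in the opening tag, optional whitespace in the closing tag, and mixed case. The pattern and the extract_title helper are made up for illustration; comments and nested tags would still trip it up.

```ruby
# Hypothetical helper, not part of any library.
def extract_title(html)
  # <title\b[^>]*>  opening tag with optional whitespace or attributes
  # (.*?)           lazy capture of the contents (/m: . matches newlines)
  # </title\s*>     closing tag, allowing whitespace before the >
  # /i              tag names are case-insensitive in HTML
  raw = html[%r{<title\b[^>]*>(.*?)</title\s*>}mi, 1]
  # Collapse whitespace runs and trim; returns nil when nothing matched.
  raw && raw.gsub(/\s+/, " ").strip
end

p extract_title("This is <title >what I want</title> and no more")
p extract_title("<TITLE lang='en'>\n what I\n want </TITLE >")
p extract_title("no title here")
```

The first two calls should both give "what I want", and the last one nil.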

Alex Young · Mar 7, 2007

Jenda said:
You generally want to use an HTML parser ... provided that Wuby has one.

I wonder what the first hit from googling "ruby html parser" is? Ah
yes, hpricot. A perfectly valid approach.

Personally, in the past I've libtidy'd html to xml and used REXML's
stream parser. This has the rather wonderful benefit of actually being
able to fix some fairly broken html, and failing early if it can't.
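
Roughly, the REXML half of that looks like the sketch below. It assumes the input is already well-formed XML (the tidy step is left out), that the rexml library is available, and the TitleListener class and sample markup are invented for the example.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener that collects the text inside <title> elements.
class TitleListener
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Called for every opening tag.
  def tag_start(name, _attrs)
    return unless name == "title"
    @in_title = true
    @titles << +""
  end

  # Called for text nodes; comments arrive via a separate callback,
  # so a <title> inside a comment is never seen here.
  def text(data)
    @titles.last << data if @in_title
  end

  # Called for every closing tag.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

xhtml = "<html><head><title> what I\n want </title></head>" \
        "<body><!-- <title>not this</title> --></body></html>"

listener = TitleListener.new
REXML::Document.parse_stream(xhtml, listener)
cleaned = listener.titles.map { |t| t.gsub(/\s+/, " ").strip }
p cleaned
```

On markup that is not well-formed, the parser raises REXML::ParseException, which is the "failing early" part.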

You may be lucky with <title> since it's likely to not include any
attributes, but still there might be some whitespace INSIDE the tags,
there may be a comment inside the <title>...</title> that you may or may
not want, there may be a <title> or </title> inside a comment etc. etc.
etc.

In (censored) I'd use HTML::Parser from CPAN, but shhhh ... this is a
Wuby site, we don't speak of such things here.

It's a mailing list, not a site... Easy to confuse, possibly, but the
mailing list is the primary interface.

Harry · Mar 7, 2007

I think he meant:

str = "This is <title >what I want</title> and no more"

Oh, that's quite different.
Never mind.

Emily Litella

Jenda Krynicky · Mar 7, 2007

Alex said:
I wonder what the first hit from googling "ruby html parser" is? Ah
yes, hpricot. A perfectly valid approach.

Hpricot? How come the name does not surprise me? It's a perfectly clear
name specifying exactly what and how it does.

Jenda
module Enumerable
alias foldl inject # inventing names in a foreign language huh?
end

J. mp · Mar 7, 2007

Hpricot? How come the name does not surprise me? It's a perfectly clear
name specifying exactly what and how it does.

Jenda
module Enumerable
alias foldl inject # inventing names in a foreign language huh?
end

Why isn't Hpricot a good approach? Any other suggestions?

Jenda Krynicky · Mar 7, 2007

J. mp said:
Why isn't Hpricot a good approach? Any other suggestions?

No, it most likely is a good approach. It's just that the name is a bit
... wuby. Which is to be expected.

Jenda

J. mp · Mar 8, 2007

Albert said:
one does not expect less from the creator of chunky bacon...

Can you tell me the whole story? What is the chunky bacon?

Alex Young · Mar 8, 2007

J. mp said:
Can you tell me the whole story? What is the chunky bacon?

http://poignantguide.net/ruby/

Your brain will never be the same again...
