Marc said:
I have a slight problem. I have strings with some tags such as
'<b><lightblue>name:</></b>'
I need to match "name:" and "lightblue"
In other words:
- What is between <> </>
and
- What is inside the first <> right next to "name:"
The following regex does not work:
'<b><lightblue>name:</></b>' =~ /<([a-zA-Z]+)>(.+?)<\/>/
$1 # => "b"
This is your string:
'<b><lightblue>name:</></b>'
and the first part of your regex says to look for a '<', followed by one
or more letters, followed by a '>'. That certainly describes the
substring '<b>'.
$2 # => "<lightblue>name:"
This is your string again:
'<b> <--already matched this
<lightblue>name:</></b>'
The second part of your regex says to look for any character one or more
times, followed by '</>'. That certainly describes the string
'<lightblue>name:</>'.
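If you want to see both captures side by side, here is a small check of the regex from the question. The variable names are mine, picked only for illustration:

```ruby
str = '<b><lightblue>name:</></b>'
original = /<([a-zA-Z]+)>(.+?)<\/>/

m = original.match(str)
first_capture  = m[1]  # the tag name matched by ([a-zA-Z]+)
second_capture = m[2]  # what (.+?) had to consume before reaching '</>'
whole_match    = m[0]  # the entire matched substring

puts first_capture   # b
puts second_capture  # <lightblue>name:
puts whole_match     # <b><lightblue>name:</>
```

The match starts at the first '<' in the string, which is why the first group ends up holding 'b' rather than 'lightblue'.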
Note that since the characters '</>' only appear once in your string,
the non-greedy quantifier has no effect. By default, regexes are greedy,
so if your string looked like this:
'<b><lightblue>name:</></b>xxxxxxxxxxxxxxx</>'
then the greedy version of your regex:
/>(.+)<\/>/ <----(no '?')
would match:
><lightblue>name:</></b>xxxxxxxxxxxxxxx</>
That's because the portion:
<lightblue>name:</></b>xxxxxxxxxxxxxxx
is interpreted as "any character(.) one or more times(+)".
On the other hand, the non-greedy version (i.e. with the '?') would match:
><lightblue>name:</>
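Here is a quick way to compare the two yourself. This is only a sketch; the names greedy and lazy are mine. Note that the capture group holds just the text between the leading '>' and the closing '</>':

```ruby
str = '<b><lightblue>name:</></b>xxxxxxxxxxxxxxx</>'

greedy = />(.+)<\/>/   # .+ grabs as much as it can, then backs up to the LAST '</>'
lazy   = />(.+?)<\/>/  # .+? grabs as little as it can, stopping at the FIRST '</>'

greedy_capture = greedy.match(str)[1]
lazy_capture   = lazy.match(str)[1]

puts greedy_capture  # everything from '<lightblue>' up to the last '</>'
puts lazy_capture    # <lightblue>name:
```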
If you examine your string again:
'<b><lightblue>name:</></b>'
the 'lightblue' substring is preceded by the characters '><', and that
is different from what precedes 'b'. You can use that fact to get
'lightblue' instead of 'b'. This regex will get 'lightblue':
/><([^>]+)/
That says to look for '><' followed by one or more characters that are
not a '>'. That will match:
'><lightblue'
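You can try that piece on its own. In this sketch I'm assuming the pattern is /><([^>]+)/, i.e. the first half of the combined pattern, and tag_pattern is just a name I picked:

```ruby
str = '<b><lightblue>name:</></b>'
tag_pattern = /><([^>]+)/

m = tag_pattern.match(str)
matched_text = m[0]  # the whole match, including the leading '><'
tag_name     = m[1]  # only the captured group

puts matched_text  # ><lightblue
puts tag_name      # lightblue
```

The parentheses are what let you pull out 'lightblue' without the '><' in front of it.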
To get 'name:', you can do something similar. This is the rest of the
string after 'lightblue':
'>name:</></b>'
Here is a regex to get 'name:':
/>([^<]+)/
That says to look for a '>', followed by one or more characters that are
not a '<'. Here it is all together:
pattern = /><([^>]+)>([^<]+)/
str = "<b><lightblue>name:</></b>"
match_obj = pattern.match(str)
puts match_obj[1]
puts match_obj[2]
--output:--
lightblue
name:
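One thing to watch out for: match returns nil when the pattern is not found, so match_obj[1] would raise a NoMethodError on a string without the '><' sequence. A minimal sketch of a guard, using a hypothetical helper extract_tag_and_text that is not part of anything above:

```ruby
PATTERN = /><([^>]+)>([^<]+)/

# Hypothetical helper: returns [tag, text], or nil if the pattern is not found.
def extract_tag_and_text(str)
  m = PATTERN.match(str)
  return nil unless m  # match returns nil when nothing matches
  [m[1], m[2]]
end

tag, text = extract_tag_and_text("<b><lightblue>name:</></b>")
puts tag   # lightblue
puts text  # name:

no_match = extract_tag_and_text("plain text")
p no_match  # nil
```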