Questions about * + and ? in Regex

Carlos Ortega

Hi,

I have some questions about the correct meaning of *, + and ? in Regex
that I would appreciate some clarification on.
I have an example (derived from Programming Ruby, 2nd Edition), and
I don't understand why it gives these results. Here is the code:

def show_regexp(a, re)
  if a =~ re
    puts "#{$`}<<#{$&}>>#{$'}"
  else
    puts "no match"
  end
end

show_regexp('Example1', /\s*/)
show_regexp('Example2', /\s.*/)
show_regexp('Example3 ', /\s.?/) # Space at the end of string
show_regexp('Example4 ', /\s.+/) # Space at the end of string
show_regexp('Example5 ', /\s.*/) # Space at the end of string

The output is:

<<>>Example1
no match
Example3<< >>
no match
Example5<< >>

If I understand correctly:
* means: match zero or more occurrences of the preceding expression.
+ means: match one or more occurrences of the preceding expression.
? means: match zero or one occurrence of the preceding expression.

Why does Example2 give "no match"? I read it as: find "0 or more
occurrences" of (a space followed by any character).
Why does Example4 give "no match"? I read it as: find "1 or more
occurrences" of (a space followed by any character).
I am assuming that the null character can be matched by a dot (.).
Am I correct?

Best Regards
 
Chris Shea

A dot (.) can only match an actual character. Example 2 fails because
it's looking not for "0 or more occurrences of (a space followed by
any character)", but for "a whitespace character followed by 0 or more
characters", and 'Example2' contains no whitespace at all. The *
only applies to whatever immediately precedes it, not the whole
expression, unless the expression is enclosed in parentheses. A regex
for "0 or more occurrences of (a space followed by any character)"
would be /(\s.)*/. In that case, the * applies to the parenthesized
group of whitespace and dot.

Example 4 fails because the only space isn't followed by anything at
all, and .+ needs at least one character after it.
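
To make the difference concrete, here is a minimal sketch. The helper name mark_match is made up for this illustration (it is not from the book); it returns the marked-up string instead of printing it, so the results can be compared side by side. The string 'Example4 x' is also an assumed extra input, added only to show what .+ needs.

```ruby
# Hypothetical helper: wraps the matched part in << >>, or returns "no match".
def mark_match(str, re)
  m = re.match(str)
  m ? "#{m.pre_match}<<#{m[0]}>>#{m.post_match}" : "no match"
end

# \s must match one whitespace character; 'Example2' has none.
r1 = mark_match('Example2', /\s.*/)
# The whole group may repeat zero times, so an empty match at the start succeeds.
r2 = mark_match('Example2', /(\s.)*/)
# The space is found, but .+ needs at least one more character after it.
r3 = mark_match('Example4 ', /\s.+/)
# With a character after the space, the same pattern matches.
r4 = mark_match('Example4 x', /\s.+/)

puts r1, r2, r3, r4
```

Note that the second call "succeeds" only by matching nothing at all, which is the same thing that happens with /\s*/ in Example1.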

HTH,
Chris

P.S. I strongly recommend Jeffrey Friedl's Mastering Regular Expressions.
 
Carlos Ortega

Thanks a lot, Chris. Now I think I've got it. However, I still have
doubts about interpreting this:

show_regexp('hi hi hihihi hi hi', /\s.*?\s/)

Overall, my confusion arises when two special characters appear together.

Because this last one would be:
- Match a space
- Followed by 0 or more characters
- Followed by ..... <= Here is my doubt
- Ending with a space.

Again, I would appreciate your help on this.

Regards
Carlos
 
MonkeeSage

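Normally "*" is "greedy": it matches as many characters as it can
while still letting the rest of the pattern match. When it is followed
by "?" it becomes "lazy" and matches as few characters as it can. So in
/\s.*?\s/ the "?" is not a separate step; it modifies the ".*". The
pattern reads: a whitespace character, then as few characters as
possible, then a whitespace character.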

"Hello world, from ruby".match(/.*?\s+/)[0]
# => "Hello "

"Hello world, from ruby".match(/.*\s+/)[0]
# => "Hello world, from "
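
Applied to the string from the question, a small sketch of the same comparison might look like this. The variable names are just for illustration, and String#[] with a regex is used here to pull out the matched text.

```ruby
text = 'hi hi hihihi hi hi'

# Lazy: after the first whitespace, stop at the nearest following whitespace.
lazy = text[/\s.*?\s/]
# Greedy: after the first whitespace, run on to the last whitespace in the string.
greedy = text[/\s.*\s/]

puts lazy.inspect    # " hi "
puts greedy.inspect  # " hi hihihi hi "
```

Both patterns start matching at the same place (the first space); only how far they extend differs.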

Regards,
Jordan
 
