Problem with two code snippets

Amir Ebrahimifard

Hi,
I don't understand the results of these two code snippets.
Could you please explain what each one does?

1 -
x = "This is a test".match( /(\w+)(\w+)/ )
puts x[0]
puts x[1]
puts x[2]

2 -
x = "This is a test".match( /(\w+) (\w+)/ )
puts x[0]
puts x[1]
puts x[2]
 
Kc Co

/(\w+)(\w+)/ is a Regexp. If you're not familiar with regular
expressions, it's probably worth looking them up to help remove the
confusion. \w+ means one or more word characters, so /(\w+)(\w+)/
matches a run of two or more word characters. The match method returns
the first place in the string where the pattern matches, which is
"This". As for what x[1] and x[2] put on the screen, they are the parts
of the string matched by the first and second parenthesized groups of
the Regexp.

In the second code, it's the same thing except there's a space between
the two \w+. This means at least one word character followed by a space
followed by at least one word character. That is why it returns the
match "This is" instead of just "This".
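A quick way to see both results side by side is the sketch below. The helper name match_parts is made up for illustration; it only wraps the same match call as in the question.

```ruby
# Return [whole match, group 1, group 2] for the sentence from the question.
# match_parts is a hypothetical helper, not part of Ruby.
def match_parts(pattern)
  m = "This is a test".match(pattern)
  [m[0], m[1], m[2]]
end

p match_parts(/(\w+)(\w+)/)   # => ["This", "Thi", "s"]
p match_parts(/(\w+) (\w+)/)  # => ["This is", "This", "is"]
```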

I hope this was helpful.
 
Amir Ebrahimifard

Thanks for the answer, but I still have a problem:
what does the first code do?
Why, when I write "puts x[0]", does Ruby print "This", while "puts x[1]"
prints "Thi" and "puts x[2]" prints "s"?
 
Markus Fischer

Hello Amir,

Why, when I write "puts x[0]", does Ruby print "This", while "puts x[1]"
prints "Thi" and "puts x[2]" prints "s"?

The first element (x[0]) is always the complete match, i.e. everything
the whole regular expression matched. The rest are the individual
sub-matches (the parenthesized groups), if there are any.

One also has to know that, by default, quantifiers such as + are
"greedy" in most regex implementations, which means they try to match
as many characters as possible.
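To poke at the match object yourself, something like the following might help. It is only a sketch: match_summary is a hypothetical helper name, while captures and post_match are regular MatchData methods.

```ruby
# Collect the pieces of a MatchData object into a hash.
# match_summary is a made-up helper name for illustration.
# String#match returns nil when nothing matches, so that case is handled first.
def match_summary(text, pattern)
  m = text.match(pattern)
  return nil if m.nil?

  {
    whole: m[0],         # the complete match
    groups: m.captures,  # only the parenthesized groups: m[1], m[2], ...
    rest: m.post_match   # the text after the match
  }
end

summary = match_summary("This is a test", /(\w+) (\w+)/)
puts summary[:whole]        # This is
p summary[:groups]          # ["This", "is"]
p summary[:rest]            # " a test"
p match_summary("!!!", /(\w+) (\w+)/)  # nil, because nothing matched
```

Checking for nil before indexing matters: calling x[0] on a failed match raises NoMethodError.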

So, given your first example:

"This is a test".match( /(\w+)(\w+)/ )

\w - matches a single "word" character (a letter, digit, or underscore)

\w+ - matches one or more "word" characters

Now, since everything is greedy by default, the first \w+ tries to match
as much as possible and initially takes all of "This". But the second
\w+ needs at least one character to fulfill its task too, so the regex
engine backtracks: the first \w+ gives back the last character and ends
up with "Thi", leaving "s" for the second \w+ .
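You can watch the greedy behaviour on single words with a small sketch like this (greedy_split is a made-up helper name, used only for illustration):

```ruby
# Split a word with two greedy groups and return [group 1, group 2],
# or nil if the pattern cannot match at all.
# greedy_split is a hypothetical helper, not part of Ruby.
def greedy_split(word)
  m = word.match(/(\w+)(\w+)/)
  m && m.captures
end

p greedy_split("hello")  # => ["hell", "o"]
p greedy_split("ab")     # => ["a", "b"]
p greedy_split("a")      # => nil, each group needs at least one character
```

In every case the first group keeps everything except the single character the second group needs.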

Appending a ? to a quantifier makes it non-greedy (lazy). Try this
example:

"This is a test".match( /(\w+?)(\w+)/ )

Here is the same idea with digits:

irb(main):006:0> "1234".match(/(\d+?)(\d+)/)
=> #<MatchData "1234" 1:"1" 2:"234">

The \d+? means "match as few as possible", so it matches only the
first "1" and leaves all the rest to the second \d+ .
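Applied to the original sentence, the greedy and lazy variants can be compared with a sketch like the one below (parts is a hypothetical helper name):

```ruby
# Return [whole match, group 1, group 2] for a pattern applied to the
# sentence from the question. parts is a made-up helper for illustration.
def parts(pattern)
  m = "This is a test".match(pattern)
  [m[0], m[1], m[2]]
end

p parts(/(\w+)(\w+)/)    # greedy first group => ["This", "Thi", "s"]
p parts(/(\w+?)(\w+)/)   # lazy first group   => ["This", "T", "his"]
p parts(/(\w+?)(\w+?)/)  # both groups lazy   => ["Th", "T", "h"]
```

Note that when both groups are lazy, the overall match itself shrinks to "Th", because nothing after the second group forces it to take more.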

In your case it's debatable whether this regex really makes sense,
though; at first glance it doesn't look like a generally useful
pattern and seems very specific.

HTH
 
