regex backreference weirdness...


Kyle Schmitt

It all started with trying to convert some strings with underscores in
them to camel case... and me thinking of regexes as sed regexes.

It looks like gsub saves back references, but only after the whole
method exits, so it can't use what it found. Is this right?

Below is what I tried, which leads me up to the question of... what
would the _right_ way be of doing this?

irb>"camel_case".gsub(/_(.)/,$1.upcase)
NoMethodError: undefined method `upcase' for nil:NilClass
from (irb):1
#Which made me say: Huh? It looks like a valid backreference to me...
#So I tried
"camel_case"=~/_(.)/
irb>puts $1
=>"c"
#OK, really weird...
irb>"camel_case".gsub(/_(.)/,$1.upcase)
=>"camelCase"
#Right, I'll buy that since $1 is still hanging around
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelCase"
irb>"camel_face".gsub(/_(.)/,$1.upcase)
=>"camelFase"
#Ha ha! gsub DOES save a backreference... so why isn't this working?! :(
 

Morton Goldberg

> Below is what I tried, which leads me up to the question of... what
> would the _right_ way be of doing this?
>
> irb>"camel_case".gsub(/_(.)/,$1.upcase)
> NoMethodError: undefined method `upcase' for nil:NilClass
> from (irb):1

I think, in this case, you will have to use a block. For example:

"camel_case".gsub(/_(.)/) { $1.upcase } # => "camelCase"

or

"camel_case".gsub(/_./) { |m| m[1, 1].upcase } # => "camelCase"
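Morton's block form can be wrapped in a small reusable helper. This is a minimal sketch; the method name `snake_to_camel` and its `upper_first` flag are hypothetical, not part of Ruby or of this thread.

```ruby
# Hypothetical helper wrapping the block form of String#gsub.
# The block runs once per match, after gsub has set $1 for that match.
def snake_to_camel(str, upper_first = false)
  camel = str.gsub(/_(.)/) { $1.upcase }
  # Optionally upcase the very first character too ("CamelCase").
  upper_first ? camel.sub(/\A./) { |first| first.upcase } : camel
end

p snake_to_camel("camel_case")        # => "camelCase"
p snake_to_camel("camel_case", true)  # => "CamelCase"
p snake_to_camel("plain")             # => "plain"
```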

Regards, Morton
 

Robert Klemme

2007/7/20 said:
> It all started with trying to convert some strings with underscores in
> them, to camel case...and me thinking of regexes as sed regexes
>
> It looks like gsub saves back references, but only after the whole
> method exits, so it can't use what it found. Is this right?
>
> Below is what I tried, which leads me up to the question of... what
> would the _right_ way be of doing this?
>
> irb>"camel_case".gsub(/_(.)/,$1.upcase)

You need to be aware that $1.upcase is evaluated *before* the method
call. So it can *never* do calculations based on match state. You'd
rather use the block form, where the block is invoked once per match.
For example, you can do

irb(main):005:0> "camel_case".gsub(/(?:\A|_)(.)/) {|m| $1.capitalize }
=> "CamelCase"
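Robert's point about evaluation order can be checked in a short script. This sketch relies on two facts: Ruby evaluates method arguments before the call, and $~ (and therefore $1) is local to each method invocation, so it starts out nil inside a freshly called method. The method names `eager_attempt` and `block_form` are hypothetical.

```ruby
# $1.upcase is evaluated before gsub even starts; inside a fresh method
# $1 is nil, so this raises NoMethodError before any matching happens.
def eager_attempt(str)
  str.gsub(/_(.)/, $1.upcase)
rescue NoMethodError => e
  "raised #{e.class}"
end

# The block is deferred: gsub calls it once per match, after setting $1.
def block_form(str)
  calls = 0
  result = str.gsub(/_(.)/) do
    calls += 1
    $1.upcase
  end
  [result, calls]
end

puts eager_attempt("camel_case")  # prints: raised NoMethodError
p block_form("snake_case_name")   # => ["snakeCaseName", 2]
```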
> NoMethodError: undefined method `upcase' for nil:NilClass
> from (irb):1
> #Which made me say: Huh? It looks like a valid backreference to me...

No, with the non-block form you need to use \1, \2, etc., as has been
mentioned already.
> #So I tried
> "camel_case"=~/_(.)/
> irb>puts $1
> =>"c"
> #OK, really weird...
> irb>"camel_case".gsub(/_(.)/,$1.upcase)
> =>"camelCase"
> #Right, I'll buy that since $1 is still hanging around
> irb>"camel_face".gsub(/_(.)/,$1.upcase)
> =>"camelCase"
> irb>"camel_face".gsub(/_(.)/,$1.upcase)
> =>"camelFase"
> #Ha ha! gsub DOES save a backreference... so why isn't this working?! :(

You're still working on the value of $1 from the last invocation.
Proper backreferencing in the non block form looks like this:

irb(main):010:0> "camel_case".gsub /[cde]/, '<\\&>'
=> "<c>am<e>l_<c>as<e>"
irb(main):011:0> "camel_case".gsub /c(.)/, '<\\1>'
=> "<a>mel_<a>se"
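One more pitfall with the non-block form: the replacement string has to reach gsub with its backslash intact. The sketch below compares the spellings; '<\1>' and "<\\1>" both pass backslash-1 to gsub, while "<\1>" in double quotes is an octal escape for the byte 0x01 and is inserted literally. The last line shows that \1 is copied verbatim, so no method such as upcase can be applied to it here.

```ruby
# Single quotes keep the backslash, so gsub sees the backreference \1.
p "camel_case".gsub(/_(.)/, '<\1>')    # => "camel<c>ase"
# Doubling the backslash inside double quotes gives the same two characters.
p "camel_case".gsub(/_(.)/, "<\\1>")   # => "camel<c>ase"
# "\1" in double quotes is the octal escape "\x01", not a backreference,
# so each match is replaced by that control byte.
p "camel_case".gsub(/_(.)/, "<\1>") == "camel<\x01>ase"  # => true
# The captured text is inserted as-is; it cannot be upcased in this form.
p "camel_case".gsub(/_(.)/, '\1')      # => "camelcase"
```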

Regards

robert
 

Peña, Botp

From: Kyle Schmitt [mailto:[email protected]]
# #Ha ha! gsub DOES save a backreference... so why isn't this
# working?! :(

I think the behaviour is documented:

root@pc4all:~# qri string#gsub
------------------------------------------------------------ String#gsub
str.gsub(pattern, replacement) => new_str
str.gsub(pattern) {|match| block } => new_str
------------------------------------------------------------------------
Returns a copy of str with all occurrences of pattern replaced
with either replacement or the value of the block. The pattern
will typically be a Regexp; if it is a String then no regular
expression metacharacters will be interpreted (that is /\d/ will
match a digit, but '\d' will match a backslash followed by a 'd').

If a string is used as the replacement, special variables from the
match (such as $& and $1) cannot be substituted into it, as
substitution into the string occurs before the pattern match
starts. However, the sequences \1, \2, and so on may be used to
interpolate successive groups in the match.

In the block form, the current match string is passed in as a
parameter, and variables such as $1, $2, $`, $&, and $' will be
set appropriately. The value returned by the block will be
substituted for the match on each call.

The result inherits any tainting in the original string or any
supplied replacement string.

"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
"hello".gsub(/([aeiou])/, '<\1>') #=> "h<e>ll<o>"
"hello".gsub(/./) {|s| s[0].to_s + ' '} #=> "104 101 108 108 111 "

root@pc4all:~#
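The block-form paragraph of the documentation can be seen in action with a sketch like the following, which records what the block sees for each match. The `seen` array exists only for illustration; note that $` and $' are the parts of the *original* string before and after the current match.

```ruby
seen = []
result = "camel_case_name".gsub(/_(.)/) do |match|
  # match is the whole matched text; $1, $` and $' describe the current match.
  seen << [match, $1, $`, $']
  $1.upcase
end

p result  # => "camelCaseName"
p seen    # => [["_c", "c", "camel", "ase_name"], ["_n", "n", "camel_case", "ame"]]
```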

kind regards -botp
 

Kyle Schmitt

I completely and utterly forgot about the block form of gsub.
Perfect. Thanks, everyone!

But it does make me wonder: in the non-block form, when you use the
\1 variable, I can see how to use it inside of other strings, but how
would you go about running other methods on it (in this case, upcase)?
Or is there no way?


As far as reinventing the wheel goes, it's important to know the hows
and whys, even if you don't end up implementing it yourself :)

--Kyle
 

Robert Klemme

> But it does make me wonder, for the non block form, when you use the
> \1 variable, I can see how to use it inside of other strings, but how
> would you go about running other methods on it? In this case upcase.
> Or is there no way?

Precisely.
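In the replacement-string form, \1 is copied in verbatim, so no method can run on it. As a hedged aside that postdates this 2007 thread: Ruby 1.9 and later also accept a Hash as the second argument to gsub, where each whole match is looked up as a key and the value is used as the replacement. The table must be precomputed, so this is a lookup rather than a method call on \1. The constant name `UNDERSCORE_TO_UPPER` is hypothetical, and the sketch only covers lowercase ASCII letters.

```ruby
# Precomputed table: "_a" => "A", "_b" => "B", ..., "_z" => "Z".
UNDERSCORE_TO_UPPER = Hash[("a".."z").map { |c| ["_#{c}", c.upcase] }]

# Each whole match (e.g. "_c") is looked up in the hash.
p "camel_case".gsub(/_[a-z]/, UNDERSCORE_TO_UPPER)       # => "camelCase"
p "snake_case_name".gsub(/_[a-z]/, UNDERSCORE_TO_UPPER)  # => "snakeCaseName"
```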

robert
 
