bug in gsub(?)

Tiziano Merzi

I have found this bug(?) in gsub

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
=> \ \\ :\ {\ }\ =\ #\ ~ OK

but

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
=> \1\1\1\1\1\1\1

Any idea?
 
Brian Candler

Tiziano said:
I have found this bug(?) in gsub
http://www.catb.org/~esr/faqs/smart-questions.html#id382249

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
=> \ \\ :\ {\ }\ =\ #\ ~ OK

but

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
=> \1\1\1\1\1\1\1

Any idea?

puts "a".gsub(/a/, '\\\\') # i.e. two backslashes
=> \

That is, in a replacement string, if you backslash-escape a backslash
you get a single backslash. That allows you to have literally \1 if
that's what you need.

So a literal backslash is \\, and the first capture is \1.

So what you want is \\\1, to get a backslash followed by the first
capture. However, that is written in a string literal as '\\\\\\1'
(which produces a 4-character string), because a string literal also has
its own backslash escaping.
'\\\\\\1'.size => 4
puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\\\\1')
\\\:\{\}\=\#\~
=> nil

Take a suggestion from me: save your sanity and use the block form
instead :)
puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/) { "\\#{$1}" }
\\\:\{\}\=\#\~
=> nil
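
The two layers of escaping can be checked side by side. This is a minimal sketch: the variable names (`subject`, `pattern`, `three_chars`, `four_chars` and so on) are illustrative and not from the thread, while the pattern and the strings are copied from the posts above.

```ruby
subject = "\\:{}=#~"               # 7 characters: \ : { } = # ~
pattern = /([\\\:\~\=\#\{\}])/     # pattern copied from the thread

# Layer 1: the single-quoted literal collapses each \\ into one backslash.
three_chars = '\\\1'       # characters: \ \ 1
four_chars  = '\\\\\\1'    # characters: \ \ \ 1

# Layer 2: gsub reads \\ as a literal backslash and \1 as the first capture.
wrong = subject.gsub(pattern, three_chars)   # backslash + the digit 1
right = subject.gsub(pattern, four_chars)    # backslash + captured character

# The block form has no layer 2: the block's return value is inserted as-is.
block = subject.gsub(pattern) { "\\#{$1}" }

puts three_chars.size, four_chars.size
puts wrong, right, block
```

Only the string-argument form goes through both layers, which is why it needs twice as many backslashes as the block form.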
 
Brian Candler

Chad said:
I've wondered for quite a while what was the rationale for having \1 in
the first place.

Ruby inherits a lot from Perl, and Perl from sed.

Some of the Perlisms are IMO superfluous - in particular the Kernel
methods which operate on $_, and the flip-flop conditional operators.

Objects would be much tidier if they didn't inherit Kernel#gets,
Kernel#gsub etc; and you'd avoid some confusing error messages like

irb(main):001:0> 3.gsub(/a/,'b')
NoMethodError: private method `gsub' called for 3:Fixnum
 
Tiziano Merzi

Thanks Brian!
I know the block form.
So the problem is the backslash escaping in the string literal:
'\\\1' == '\\\\1' => true
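
That equality can be inspected character by character. A small sketch, with `a` and `b` as illustrative names: in a single-quoted literal only \\ and \' are escapes, so a backslash in front of anything else is kept as written.

```ruby
a = '\\\1'    # \\ becomes \, then \1 is kept as \ followed by 1
b = '\\\\1'   # \\ becomes \, \\ becomes \, then 1

p a.chars     # the three characters: backslash, backslash, 1
p a == b      # both literals denote the same string

# Passed to gsub, those three characters mean "literal backslash, then 1".
p "x".gsub(/(x)/, a)
```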
 
Mike Stok

Chad said:
Okay . . . I guess that sorta makes sense. Of course, I've never used \1
in Perl, nor seen anyone else do so either, so until you mentioned it I
had entirely forgotten that was an option there either.

Both languages would be better off without that syntax, and just stick
with $1 instead, I think.

I wouldn't really call \1 a "Perlism", given that the way I've always
seen it done is with $1 instead. If it's a Perlism despite its lack of
general usage, I'd say it's every bit as much a Rubyism.

There are times in Perl when you need to use \1 in the matching part of
a regular expression because you don't want $1 to interpolate into the
match.

Consider trying to match a simple quoted string (i.e. no \ escaping):

my $s1 = "Hello there";
my $s2 = q{The cat said "Hello there, how's it going?"};

if ($s1 =~ m/(ell)/) {
    print "print s1 matched - \$1 is '$1'\n";
}

if ($s2 =~ m/(["'])(.*?)\1/) {
    print "print s2 matched - \$2 is '$2'\n";
}

This outputs:

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'Hello there, how's it going?'


If you try using $1 in place of \1 in the second regex, Perl interpolates
the 'ell' left in $1 by the first match into the pattern, and it will
output

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'H'
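
The same backreference works inside a Ruby pattern. A rough Ruby counterpart of the second match above, with `s2`, `strict`, and `loose` as illustrative names: \1 forces the closing quote to be the same character that group 1 captured.

```ruby
s2 = %q{The cat said "Hello there, how's it going?"}

# \1 must match whatever group 1 captured, so a " can only be closed by a ".
strict = s2.match(/(["'])(.*?)\1/)
puts strict[2]   # Hello there, how's it going?

# Without the backreference, the apostrophe in "how's" ends the match early.
loose = s2.match(/(["'])(.*?)["']/)
puts loose[2]    # Hello there, how
```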


Mike

--

Mike Stok <[email protected]>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
 
Brian Candler

Chad said:
I wouldn't really call \1 a "Perlism", given that the way I've always
seen it done is with $1 instead.

I called \1 a perlism mainly because it's a sedism that perl inherited.
You're right that in Perl you could instead write:

$str =~ s/(.)/$1$1/;

Of course, that doesn't work in Ruby without using the block form:

str.sub(/(.)/, "$1$1") # no!
str.sub(/(.)/, "#{$1}#{$1}") # no!!
str.sub(/(.)/) {"#{$1}#{$1}"} # ok

in which case you could either argue that ruby needs sed's \1 more than
perl does, or you could argue that ruby doesn't need it at all.
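
A short sketch of why the first two forms fail, using illustrative variable names: the replacement argument is built before sub runs, so any $1 in it comes from an earlier match, whereas a block is called after each match.

```ruby
str = "abc"

# "$1$1" contains no interpolation at all: it is the literal text $1$1.
literal = str.sub(/(.)/, "$1$1")

# "#{$1}#{$1}" is evaluated before sub is called, so it sees a stale $1.
"zzz" =~ /(z)/                        # an earlier match leaves $1 == "z"
stale = str.sub(/(.)/, "#{$1}#{$1}")

# '\1\1' reaches sub unchanged, and sub expands \1 itself.
backref = str.sub(/(.)/, '\1\1')

# The block runs after the match, so $1 refers to the current capture.
block = str.sub(/(.)/) { "#{$1}#{$1}" }

puts literal, stale, backref, block
```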

It's odd that ruby strives to be so perl-compatible in areas like this,
but is different in far more important areas (e.g. ^ matching at the
start of every line within a string, not just at the start of the string)
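
The difference is easy to see with a two-line string. A small sketch with made-up sample strings: in Ruby ^ and $ always anchor at line boundaries, while \A and \z anchor at the ends of the whole string.

```ruby
text = "first\nsecond"

line_starts  = text.scan(/^\w+/)    # => ["first", "second"]
string_start = text.scan(/\A\w+/)   # => ["first"]

# This matters for validation: ^...$ accepts a string with extra lines.
input = "ok\nanything else"
loose_check  = input =~ /^ok$/      # => 0, the first line alone matches
strict_check = input =~ /\Aok\z/    # => nil, the whole string must be "ok"

p line_starts, string_start, loose_check, strict_check
```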

Regards,

Brian.
 
