bug in gsub(?)

Tiziano Merzi · Sep 25, 2010

I have found this bug(?) in gsub

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
=> \ \\ :\ {\ }\ =\ #\ ~ OK

but

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
=> \1\1\1\1\1\1\1

Any idea?

Brian Candler · Sep 25, 2010

Tiziano said:
I have found this bug(?) in gsub
http://www.catb.org/~esr/faqs/smart-questions.html#id382249

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\ \1')
=> \ \\ :\ {\ }\ =\ #\ ~ OK

but

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\1')
=> \1\1\1\1\1\1\1

Any idea?

puts "a".gsub(/a/, '\\\\') # i.e. two backslashes
=> \

That is, in a replacement string, if you backslash-escape a backslash
you get a single backslash. That allows you to have literally \1 if
that's what you need.

So a literal backslash is \\, and the first capture is \1

So what you want is \\\1, to get a backslash followed by the first
capture. However, that is represented in a string literal as '\\\\\\1'
(which generates a 4 character string) because a string literal also has
backslash escaping.

'\\\\\\1'.size => 4
puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\\\\1')

\\\:\{\}\=\#\~
=> nil
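
A minimal sketch of the two escaping layers described above, using the same input and pattern as the thread. The variable names (input, pattern, three, four) are only illustrative.

```ruby
input   = "\\:{}=#~"                 # 7 characters: \ : { } = # ~
pattern = /([\\\:\~\=\#\{\}])/       # same character class as in the thread

# Layer 1: the Ruby string literal collapses each "\\" into one backslash.
three = '\\\1'       # 3 characters: \ \ 1
four  = '\\\\\\1'    # 4 characters: \ \ \ 1
puts three.size      # 3
puts four.size       # 4

# Layer 2: gsub reads "\\" as a literal backslash and "\1" as the first capture.
# In `three`, the pair "\\" is consumed first, leaving a plain "1" behind.
puts input.gsub(pattern, three)   # \1\1\1\1\1\1\1
# In `four`, gsub sees "\\" (literal backslash) followed by "\1" (capture).
puts input.gsub(pattern, four)    # \\\:\{\}\=\#\~
```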

Take a suggestion from me: save your sanity and use the block form
instead

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/) { "\\#{$1}" }

\\\:\{\}\=\#\~
=> nil
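
One way to package the block form is a small helper. This is only a sketch: the method name escape_specials is hypothetical and not from the thread, and it uses the block argument rather than $1.

```ruby
# Hypothetical helper: prefix each special character with a backslash.
def escape_specials(str)
  # The block receives the matched text. Its return value is inserted as-is,
  # so only the string-literal layer of backslash escaping applies here.
  str.gsub(/[\\\:\~\=\#\{\}]/) { |m| "\\#{m}" }
end

puts escape_specials("\\:{}=#~")   # \\\:\{\}\=\#\~
puts escape_specials("a=b")        # a\=b
```

Because the block's result is not scanned for \1 or \\, the replacement needs just one escaped backslash.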

Brian Candler · Sep 26, 2010

Chad said:
I've wondered for quite a while what was the rationale for having \1 in
the first place.

Ruby inherits a lot from Perl, and Perl from sed.

Some of the Perlisms are IMO superfluous - in particular the Kernel
methods which operate on $_, and the flip-flop conditional operators.

Objects would be much tidier if they didn't inherit Kernel#gets,
Kernel#gsub etc; and you'd avoid some confusing error messages like

irb(main):001:0> 3.gsub(/a/,'b')
NoMethodError: private method `gsub' called for 3:Fixnum

Tiziano Merzi · Sep 26, 2010

Brian said:
That is, in a replacement string, if you backslash-escape a backslash
you get a single backslash. That allows you to have literally \1 if
that's what you need.

So a literal backslash is \\, and the first capture is \1

So what you want is \\\1, to get a backslash followed by the first
capture. However, that is represented in a string literal as '\\\\\\1'
(which generates a 4 character string) because a string literal also has
backslash escaping.

'\\\\\\1'.size => 4
puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/, '\\\\\\1')

\\\:\{\}\=\#\~
=> nil

Take a suggestion from me: save your sanity and use the block form
instead

puts "\\:{}=#~".gsub(/([\\\:\~\=\#\{\}])/) { "\\#{$1}" }

\\\:\{\}\=\#\~
=> nil

Thanks, Brian!
I know the block form.
So the problem is the backslash escape in string:
'\\\1' == '\\\\1' => true
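
A quick check of that equality, as a sketch. In a single-quoted literal only \\ and \' are escapes, so a lone backslash before 1 is kept as it is and both spellings produce the same three characters.

```ruby
a = '\\\1'     # "\\" collapses to one backslash, then "\1" is kept verbatim
b = '\\\\1'    # two "\\" pairs collapse to two backslashes, then "1"

p a.chars      # ["\\", "\\", "1"]
p b.chars      # ["\\", "\\", "1"]
p a == b       # true
```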

Mike Stok · Sep 26, 2010

Chad said:
Okay . . . I guess that sorta makes sense. Of course, I've never used \1
in Perl, nor seen anyone else do so either, so until you mentioned it I
had entirely forgotten that was an option there either.

Both languages would be better off without that syntax, and just stick
with $1 instead, I think.

I wouldn't really call \1 a "Perlism", given that the way I've always
seen it done is with $1 instead. If it's a Perlism despite its lack of
general usage, I'd say it's every bit as much a Rubyism.

There are times in Perl when you need to use \1 in the matching part of
a regular expression because you don't want $1 to interpolate into the
match.

Consider trying to match a simple quoted string (i.e. no \ escaping):

my $s1 = "Hello there";
my $s2 = q{The cat said "Hello there, how's it going?"};

if ($s1 =~ m/(ell)/) {
    print "print s1 matched - \$1 is '$1'\n";
}

if ($s2 =~ m/(["'])(.*?)\1/) {
    print "print s2 matched - \$2 is '$2'\n";
}

This outputs:

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'Hello there, how's it going?'

If you try using $1 in place of \1 in the second regex, $1 is
interpolated as 'ell' (left over from the first match), so it will
output

print s1 matched - $1 is 'ell'
print s2 matched - $2 is 'H'
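
The same effect can be reproduced in Ruby, as a rough sketch. It assumes the Perl example above translates directly: \1 inside the pattern is a backreference, while #{$1} is interpolated from the previous match before the regex is built.

```ruby
s1 = "Hello there"
s2 = %q{The cat said "Hello there, how's it going?"}

# \1 refers to whatever the first group matched in this same match.
right = s2.match(/(["'])(.*?)\1/)
puts right[2]   # Hello there, how's it going?

# After this match, $1 is "ell".
s1 =~ /(ell)/

# #{$1} is expanded first, so the pattern below is really /(["'])(.*?)ell/.
wrong = s2.match(/(["'])(.*?)#{$1}/)
puts wrong[2]   # H
```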

Mike

--

Mike Stok <[email protected]>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

Brian Candler · Sep 27, 2010

Chad said:
I wouldn't really call \1 a "Perlism", given that the way I've always
seen it done is with $1 instead.

I called \1 a perlism mainly because it's a sedism that perl inherited.
You're right that in Perl you could instead write:

$str =~ s/(.)/$1$1/;

Of course, that doesn't work in Ruby without using the block form:

str.sub(/(.)/, "$1$1") # no!
str.sub(/(.)/, "#{$1}#{$1}") # no!!
str.sub(/(.)/) {"#{$1}#{$1}"} # ok

in which case you could either argue that ruby needs sed's \1 more than
perl does, or you could argue that ruby doesn't need it at all.
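
A short sketch of why the first two forms fail. The sample string "ab" and the unrelated earlier match are assumptions chosen to make the difference visible.

```ruby
str = "ab"

# "$1" has no special meaning in a Ruby string or in a replacement string.
literal = str.sub(/(.)/, "$1$1")
puts literal        # $1$1b

# Interpolation happens before sub is called, so $1 still refers to
# whatever matched last; here that is an unrelated earlier match.
"zz" =~ /(z)/
stale = str.sub(/(.)/, "#{$1}#{$1}")
puts stale          # zzb

# Both of these see the capture from the current match.
backref = str.sub(/(.)/, '\1\1')
block   = str.sub(/(.)/) { "#{$1}#{$1}" }
puts backref        # aab
puts block          # aab
```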

It's odd that ruby strives to be so perl-compatible in areas like this,
but is different in far more important areas (e.g. ^ matching newlines
within a string, not just the start of string)
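
A minimal illustration of that difference, with a made-up two-line string: in Ruby ^ matches at the start of every line, and \A is the anchor for the start of the string.

```ruby
text = "first\nsecond"

line_start   = text =~ /^second/    # ^ also matches right after the newline
string_start = text =~ /\Asecond/   # \A matches only at the start of the string
first_word   = text =~ /\Afirst/

p line_start     # 6
p string_start   # nil
p first_word     # 0
```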

Regards,

Brian.

Xavier Noria · Sep 27, 2010

It's odd that ruby strives to be so perl-compatible in areas like this,
but is different in far more important areas (e.g. ^ matching newlines
within a string, not just the start of string)

Absolutely, there are a few gotchas:

http://www.advogato.org/person/fxn/diary/498.html

I don't know why it is that way, but I find them surprising.
