No regex backreference with four backslashes

gabriel.birke

Consider the following test case:

    require 'test/unit'

    class RegexTest < Test::Unit::TestCase
      def test_escaping
        numbers = "12345"
        assert_equal "12345", numbers.gsub(/(2|4)/, '\1')
        assert_equal "12345", numbers.gsub(/(2|4)/, "\\1")
        assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, '\\ \1')
        assert_equal "1\\ 23\\ 45", numbers.gsub(/(2|4)/, "\\ \\1")
        assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, '\\\1')
        assert_equal "1\\23\\45", numbers.gsub(/(2|4)/, "\\\\1")
      end
    end

    require 'test/unit/ui/console/testrunner'
    Test::Unit::UI::Console::TestRunner.run(RegexTest)

The last two assertions fail with the message
<"1\\23\\45"> expected but was <"1\\13\\15">. Why?

Is this a bug in the regex implementation or is there something wrong
with my regular expression or substitution string?
 
gabriel.birke

Paul said:
To find out how your strings are being parsed, print them out. Then print
out the result of the regexes directly, rather than relying on an
assertion.

numbers.gsub(/(2|4)/,"\\\1")

"1\\\0013\\\0015"

numbers.gsub(/(2|4)/,"\\\\1")

"1\\13\\15"

The best "test suite" is your eyes.

I've done that already. The test was only meant to show the problem: I
could not escape characters in the numbers string with a backslash.

Anyway, I found the solution: it's five backslashes instead of four.
That's a bit counter-intuitive; maybe someone can explain it.
Especially when these two are compared:

numbers.gsub(/(2|4)/,'\\ \\1')
numbers.gsub(/(2|4)/,'\\\\\1')

I expected that when I removed the space from the first expression, my
characters would get quoted. Instead, the four backslashes get
interpreted as two escaped backslashes and the 1 as a literal
character. Can somebody shed some light on the how and why of this
case? In particular, why doesn't the solution with the five backslashes
yield double backslashes in the result string?
 
MonkeeSage

I expected that when I removed the space from the first expression, my
characters would get quoted. Instead, the four backslashes get
interpreted as two escaped backslashes and the 1 as a literal
character. Can somebody shed some light on the how and why of this
case? In particular, why doesn't the solution with the five backslashes
yield double backslashes in the result string?

In the replacement string that gsub actually receives, a backreference
is a backslash followed by a number, reference(\1), but a double
backslash is treated as a single literal backslash. So \\1 is
literal(\) followed by a plain 1, and three backslashes and a number,
\\\1, is literal(\) reference(\1). A single-quoted Ruby literal adds its
own layer on top: each \\ in the source collapses to one backslash
before gsub sees the string. Four backslashes in the source therefore
reach gsub as \\1, which is literal(\) plus 1, and five reach it as
\\\1, which is literal(\) reference(\1). The result is the string that
inspect shows as "1\\23\\45", meaning 1\23\45. Hope that makes sense.
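
The two parsing levels can be made visible with a short sketch. It uses only core Ruby; the `literals` table and the `results` hash are illustrative names, not part of the original test. Each entry is a single-quoted literal with one to five backslashes before the 1. The script prints what the string contains once Ruby has parsed the literal, and then what gsub produces from it.

```ruby
numbers = "12345"

# Single-quoted literals with 1..5 backslashes before the 1.
# In single quotes, \\ collapses to one backslash; \1 is kept as-is.
literals = {
  1 => '\1',
  2 => '\\1',
  3 => '\\\1',
  4 => '\\\\1',
  5 => '\\\\\1',
}

# For each literal, show the string gsub receives and the result.
results = literals.map do |count, replacement|
  result = numbers.gsub(/(2|4)/, replacement)
  puts "#{count} backslash(es): gsub sees #{replacement}  ->  #{result}"
  [count, result]
end.to_h
```

Run as written, counts 1 and 2 leave the string unchanged, 3 and 4 both give 1\13\15, and 5 gives 1\23\45.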

Regards,
Jordan
 
gabriel.birke

Paul said:
puts '\\\\\1'

\\\1 # meaning two backslashes and an escaped '1'

Oh, by the way. You haven't said what you are trying to accomplish.

I was trying to escape some characters in a string with a backslash.

When printing out '\\\\\1' (resulting in two backslashes and an
escaped '1', like you said), I would expect the result string s
(s = numbers.gsub(/(2|4)/, '\\\\\1')) to contain *two* backslashes and
then the original character. But apparently the replacement string is
interpreted as "one backslash and a backreference (escaped with two
backslashes)."
 
gabriel.birke

But apparently the replacement string is
interpreted as "one backslash and a backreference (escaped with two
backslashes)."

After thinking a while about it I realized this is not correct.

Backslashes in a replacement string *must* be double backslashes (four
backslashes in the literal string) because otherwise they would be
interpreted as escaped characters by the regex engine. Right?
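
That reading matches what the examples in this thread show. A minimal sketch follows, assuming only core Ruby, that checks it and also shows one way to sidestep the double escaping. When gsub is given a block instead of a replacement string, the block's return value is inserted as-is and is not scanned for backslash sequences. The variable names `with_string` and `with_block` are illustrative.

```ruby
numbers = "12345"

# String form: the replacement must contain two backslashes (written as
# four in the literal) to produce one, followed by \1 for the group.
with_string = numbers.gsub(/(2|4)/, '\\\\\1')

# Block form: the return value is used verbatim, so "\\" (one backslash)
# followed by the interpolated match is all that is needed.
with_block = numbers.gsub(/(2|4)/) { "\\#{$1}" }

puts with_string   # prints 1\23\45
puts with_block    # prints 1\23\45
```

The block form costs a method-block call per match, but it leaves only one level of escaping to reason about.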
 
