Weird behaviour escaping special characters in a string

Discussion in 'Ruby' started by Greg Hurrell, Feb 21, 2007.

  1. Greg Hurrell

    Greg Hurrell Guest

    This instance method added to the String class returns a copy of the
    receiver with occurrences of \ replaced with \\, and occurrences of '
    replaced with \':

    class String
      def to_source_string
        gsub(/(\\|')/, '\\\\\1')
      end
    end

    The idea is that it will give you a string that you can write out to a
    Ruby file that will later print the string. For example, let's take
    the string foo (3 characters):

    "puts '" + "foo".to_source_string + "'" # puts 'foo'

    Or a string with special characters in it like 'foo' (5 characters,
    including enclosing single quotes):

    "puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''

    My RSpec specs and experimentation in irb confirm that the method
    works but I am at a loss to explain one thing:

    Why do I need so many backslashes in my replacement expression?

    There are five slashes in the replacement expression:

    gsub(/(\\|')/, '\\\\\1')

    But I would have thought that three would work:

    gsub(/(\\|')/, '\\\1')

    I basically want to replace "whatever is found in the pattern" with a
    backslash (\\) followed by "whatever was found" (\1); so that's three
    slashes. But with only three slashes Ruby gives me \1foo\1 instead of
    \'foo\'. Four slashes produces the same result. Five slashes and
    suddenly everything works (funnily enough, six slashes also works).
    Two slashes and one slash have no effect (no escaping is performed).

    I've got working code so it's not a huge problem, but my curiosity is
    piqued. What's going on here that I don't understand?

    Cheers,
    Greg
    Greg Hurrell, Feb 21, 2007
    #1
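One way to check the stated goal (the escaped text can be written into Ruby source and yields the original string again) is a round trip through `eval`. This is only a sketch: the sample strings are made up for illustration, and `eval` stands in for "write to a file and run it later".

```ruby
class String
  # Escape backslashes and single quotes so the result can sit inside '...'.
  def to_source_string
    gsub(/(\\|')/, '\\\\\1')
  end
end

# Hypothetical sample inputs: plain, quoted, with a backslash, with both.
samples = ["foo", "'foo'", 'a\b', "it's a \\ test"]

round_trip_results = samples.map do |original|
  source = "'" + original.to_source_string + "'"  # source text of a single-quoted literal
  eval(source) == original                         # evaluating it should return the original
end

p round_trip_results
```

If any entry comes back false, the escaping and the single-quote parsing rules disagree for that input.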

  2. On 2/21/07, Greg Hurrell wrote:
    > This instance method added to the String class returns a copy of the
    > receiver with occurrences of \ replaced with \\, and occurrences of '
    > replaced with \':
    >
    > class String
    > def to_source_string
    > gsub(/(\\|')/, '\\\\\1')
    > end
    > end


    class String
      def to_source_string
        gsub(/(\\|')/) { "\\#$1" }
      end
    end

    -austin
    --
    Austin Ziegler * http://www.halostatue.ca/
    * http://www.halostatue.ca/feed/
    Austin Ziegler, Feb 21, 2007
    #2
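The block form needs fewer backslashes because the block's return value is inserted as-is: `gsub` does not scan it for `\1`-style backreferences, so only the string-literal layer of escaping remains. A small comparison, using the `'foo'` input from the first post:

```ruby
input = "'foo'"

# String replacement: gsub itself interprets \\ and \1 inside the replacement.
with_string = input.gsub(/(\\|')/, '\\\\\1')

# Block replacement: the return value is used literally, so "\\" is simply
# one backslash and $1 is the group captured by the current match.
with_block = input.gsub(/(\\|')/) { "\\#{$1}" }

puts with_string
puts with_block
puts with_string == with_block
```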

  3. On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:

    > On 2/21/07, Greg Hurrell wrote:
    >> This instance method added to the String class returns a copy of the
    >> receiver with occurrences of \ replaced with \\, and occurrences of '
    >> replaced with \':
    >>
    >> class String
    >> def to_source_string
    >> gsub(/(\\|')/, '\\\\\1')
    >> end
    >> end

    >
    > class String
    > def to_source_string
    > gsub(/(\\|')/) { "\\#$1" }
    > end
    > end


    It's probably better to use a character class [\\'] instead of
    alternation (\\|').

    James Edward Gray II
    James Edward Gray II, Feb 21, 2007
    #3
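A sketch of the character-class suggestion: with `[\\']` there is no capture group, so the replacement refers to the whole match with `\0` (or `\&`) instead of `\1`. The sample input is invented for illustration.

```ruby
input = %q{it's a \ test}   # contains one single quote and one backslash

alternation = input.gsub(/(\\|')/, '\\\\\1')  # capture group, referenced as \1
char_class  = input.gsub(/[\\']/,  '\\\\\0')  # no group; \0 is the whole match

puts alternation
puts char_class
puts alternation == char_class
```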
  4. On Thu, Feb 22, 2007 at 02:55:09AM +0900, Greg Hurrell wrote:
    > Why do I need so many backslashes in my replacement expression?
    >
    > There are five slashes in the replacement expression:
    >
    > gsub(/(\\|')/, '\\\\\1')
    >
    > But I would have thought that three would work:
    >
    > gsub(/(\\|')/, '\\\1')


    Because even in single quotes, backslashes must be doubled; this in turn is
    because \' is the way that you insert a single quote within a single-quoted
    string.

    irb(main):001:0> a='\\'
    => "\\"
    irb(main):002:0> a.size
    => 1
    irb(main):003:0> b='\''
    => "'"
    irb(main):004:0> b.size
    => 1
    irb(main):005:0> c='\x'
    => "\\x"
    irb(main):006:0> c.size
    => 2

    > I basically want to replace "whatever is found in the pattern" with a
    > backslash (\\) followed by "whatever was found" (\1); so that's three
    > slashes. But with only three slashes Ruby gives me \1foo\1 instead of
    > \'foo\'. Four slashes produces the same result. Five slashes and
    > suddenly everything works (funnily enough, six slashes also works).
    > Two slashes and one slash have no effect (no escaping is performed).
    >
    > I've got working code so it's not a huge problem, but my curiosity is
    > piqued. What's going on here that I don't understand?


    irb(main):009:0> a='\\\\1'
    => "\\\\1"
    irb(main):010:0> a.size
    => 3
    irb(main):011:0> a='\\\\\1'
    => "\\\\\\1"
    irb(main):012:0> a.size
    => 4
    irb(main):013:0> a='\\\\\\1'
    => "\\\\\\1"
    irb(main):014:0> a.size
    => 4

    In a single-quoted string:
    \' => '
    \\ => \
    \x => \x for all other x

    So '...\1' and '...\\1' are identical.

    HTH,

    Brian.
    Brian Candler, Feb 21, 2007
    #4
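The single-quote rules above explain why 3 and 4 typed backslashes behave alike, and likewise 5 and 6. There is a second layer as well: `gsub` parses the replacement string itself, reading `\\` as a literal backslash and `\1` as the first capture group. The sketch below counts the backslashes that survive literal parsing and then shows what `gsub` does with each replacement.

```ruby
# Layer 1: the single-quoted literal. Keys are the number of backslashes typed.
literals = {
  1 => '\1', 2 => '\\1', 3 => '\\\1',
  4 => '\\\\1', 5 => '\\\\\1', 6 => '\\\\\\1'
}

results = literals.map do |typed, replacement|
  stored = replacement.chars.count('\\')  # backslashes left after literal parsing
  # Layer 2: gsub reads \\ as a literal backslash and \1 as the first group.
  [typed, stored, "'foo'".gsub(/(\\|')/, replacement)]
end

results.each do |typed, stored, output|
  puts "#{typed} typed -> #{stored} stored -> #{output}"
end
```

With one stored backslash the replacement is just `\1` (the match itself, so nothing changes); with two it is a literal backslash followed by the digit 1; only with three does it become a literal backslash followed by the captured character.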
  5. Greg Hurrell

    Greg Hurrell Guest

    On 21 feb, 20:50, Brian Candler wrote:

    > In a single-quoted string:
    > \' => '
    > \\ => \
    > \x => \x for all other x
    >
    > So '...\1' and '...\\1' are identical.


    Excellent, that explains why I was getting the same results for 3 and
    4 slashes, and the same for 5 and 6 slashes.

    Cheers,
    Greg
    Greg Hurrell, Feb 22, 2007
    #5
  6. Greg Hurrell

    Greg Hurrell Guest

    On 21 feb, 19:45, James Edward Gray II wrote:
    > On Feb 21, 2007, at 12:36 PM, Austin Ziegler wrote:
    >
    > It's probably better to use a character class [\\'] instead of
    > alternation (\\|').
    >
    > James Edward Gray II


    I did some quick and dirty benchmarks and using a character class is a
    little bit quicker. Interpolation ("\\#$1") is slower but more
    readable. I guess I'll stick with the character class and no
    interpolation though.

    require 'benchmark'
    include Benchmark

    bm(6) do |x|
      x.report('alternation') { 100_000.times { "'foo'".gsub(/(\\|')/, '\\\\\1') } }
      x.report('char class') { 100_000.times { "'foo'".gsub(/[\\']/, '\\\\\&') } }
      x.report('interpolation') { 100_000.times { "'foo'".gsub(/(\\|')/, "\\#$1") } }
      x.report('interpolation with char class') { 100_000.times { "'foo'".gsub(/[\\']/, "\\#$&") } }
    end

                                       user     system      total        real
    alternation                    0.450000   0.000000   0.450000 (  0.452661)
    char class                     0.390000   0.000000   0.390000 (  0.396193)
    interpolation                  0.540000   0.010000   0.550000 (  0.532106)
    interpolation with char class  0.480000   0.000000   0.480000 (  0.485922)
    Greg Hurrell, Feb 22, 2007
    #6
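One caveat about the two "interpolation" rows: a double-quoted string passed as an argument is interpolated before `gsub` is called, so `$1` and `$&` there come from whatever match happened earlier, not from the current one. Austin's version used a block, which is evaluated once per match. The sketch below shows the difference and how the block form might be timed; the labels and iteration count are arbitrary choices, and no timings are claimed.

```ruby
require 'benchmark'

"z" =~ /(z)/                     # an unrelated earlier match leaves $1 == "z"
eager_argument = "\\#{$1}"       # built right here, before any gsub call
p eager_argument                 # prints "\\z": a backslash plus the stale "z"

# Block form: runs once per match, so $& is the text just matched.
block_result = "'foo'".gsub(/[\\']/) { "\\#{$&}" }
puts block_result

n = 100_000
Benchmark.bm(20) do |x|
  x.report('string replacement') { n.times { "'foo'".gsub(/[\\']/, '\\\\\&') } }
  x.report('block replacement')  { n.times { "'foo'".gsub(/[\\']/) { "\\#{$&}" } } }
end
```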
  7. On Thu, 22 Feb 2007 13:55:06 +0100, Greg Hurrell wrote:

    > On 21 feb, 20:50, Brian Candler wrote:
    >
    >> In a single-quoted string:
    >> \' => '
    >> \\ => \
    >> \x => \x for all other x
    >>
    >> So '...\1' and '...\\1' are identical.

    >
    > Excellent, that explains why I was getting the same results for 3 and
    > 4 slashes, and the same for 5 and 6 slashes.
    >


    %q{...} is your friend.

    David Vallner
    David Vallner, Feb 22, 2007
    #7
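The post does not say which part of the problem `%q{...}` is meant to help with. One reading is that it removes the need to escape single quotes inside the literal; the backslash rules stay the same as for '...', so the replacement expression still needs its five backslashes. A quick check of both points:

```ruby
# %q{...} follows single-quote rules, but ' needs no escaping inside it.
quoted  = 'puts \'it\'s\''
percent = %q{puts 'it's'}
same_text = (quoted == percent)

# Backslash handling is unchanged: both literals hold backslash x3 then "1".
same_replacement = ('\\\\\1' == %q{\\\\\1})

puts same_text
puts same_replacement
puts "'foo'".gsub(/(\\|')/, %q{\\\\\1})
```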
  8. 2007/2/21, Greg Hurrell:
    > This instance method added to the String class returns a copy of the
    > receiver with occurrences of \ replaced with \\, and occurrences of '
    > replaced with \':
    >
    > class String
    > def to_source_string
    > gsub(/(\\|')/, '\\\\\1')
    > end
    > end
    >
    > The idea is that it will give you a string that you can write out to a
    > Ruby file that will later print the string. For example, let's take
    > the string foo (3 characters):
    >
    > "puts '" + "foo".to_source_string + "'" # puts 'foo'
    >
    > Or a string with special characters in it like 'foo' (5 characters,
    > including enclosing single quotes):
    >
    > "puts '" + "'foo'".to_source_string + "'" # puts '\'foo\''


    Why don't you just use #inspect?

    Kind regards

    robert
    Robert Klemme, Feb 22, 2007
    #8
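A sketch of the `#inspect` approach: `String#inspect` returns the string as a double-quoted literal with the necessary escapes already applied, so no hand-written `gsub` is needed. The samples are made-up ASCII strings; how non-ASCII text is rendered depends on encodings and is not covered here.

```ruby
# Hypothetical sample inputs, ASCII only.
samples = ["foo", "'foo'", 'back\slash', "tab\there"]

samples.each do |original|
  source = "puts " + original.inspect  # inspect yields a double-quoted literal
  puts source
end

# Evaluating the inspected form should give each original string back.
round_trips = samples.all? { |s| eval(s.inspect) == s }
puts round_trips
```

Note that the generated source uses double quotes rather than the single quotes of the original method.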
