gsub and backslashes

Discussion in 'Ruby' started by Ralph Shnelvar, Nov 20, 2010.

  1. [Note: parts of this message were removed to make it a legal post.]

    Consider the string
    \1\2\3
    that is
    "\\1\\2\\3"

    I feel really stupid ... but this simple substitution pattern does not do what I expect.

    "\\1\\2\\3".gsub(/\\/,"\\\\")

    What I want is to change single backslashes to double backslashes. The result of the above substitution is "no change"

    On the other hand
    "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    does do what I want ... but I am clueless as to why.
    Ralph Shnelvar, Nov 20, 2010
    #1

  2. Ralph Shnelvar

    Ammar Ali Guest

    On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <> wrote:
    > Consider the string
    >  \1\2\3
    > that is
    >  "\\1\\2\\3"
    >
    > I feel really stupid ... but this simple substitution pattern does not do
    > what I expect.
    >
    >  "\\1\\2\\3".gsub(/\\/,"\\\\")
    >
    > What I want is to change single backslashes to double backslashes.
    > The result of the above substitution is "no change"
    >
    > On the other hand
    >  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    > does do what I want ... but I am clueless as to why.


    Backslashes are tricky. What's happening here is each escaped
    backslash "\\" yields one backslash, which affects (escapes) what
    comes after it, in this case another escaped backslash that in turn
    yields one backslash. In other words, four backslashes yield two
    backslashes, which is an escaped backslash (i.e. one backslash).

    HTH,
    Ammar
    Ammar Ali, Nov 20, 2010
    #2

  3. Ralph Shnelvar

    Ammar Ali Guest

    On Sun, Nov 21, 2010 at 12:34 AM, Ammar Ali <> wrote:
    > On Sun, Nov 21, 2010 at 12:13 AM, Ralph Shnelvar <> wrote:
    >> Consider the string
    >>  \1\2\3
    >> that is
    >>  "\\1\\2\\3"
    >>
    >> I feel really stupid ... but this simple substitution pattern does not do
    >> what I expect.
    >>
    >>  "\\1\\2\\3".gsub(/\\/,"\\\\")
    >>
    >> What I want is to change single backslashes to double backslashes.
    >> The result of the above substitution is "no change"
    >>
    >> On the other hand
    >>  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    >> does do what I want ... but I am clueless as to why.
    >
    > Backslashes are tricky. What's happening here is each escaped
    > backslash "\\" yields one backslash, which affects (escapes) what
    > comes after it, in this case another escaped backslash that in turn
    > yields one back slash. In other words, four backslashes yield two
    > backslashes, which is an escaped backslash (i.e one backslash).


    I should have added that you can get the same result with 3
    backslashes. So 6 of them will give you two.

    >> "\\1\\2\\3".gsub(/\\/,"\\\\\\").scan /./
    => ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

    Regards,
    Ammar
    Ammar Ali, Nov 20, 2010
    #3
  4. Ralph Shnelvar

    botp Guest

    On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <> wrote:
    > What I want is to change single backslashes to double backslashes. The
    > result of the above substitution is "no change"
    >
    > On the other hand
    > "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    > does do what I want ... but I am clueless as to why.


    there are many ways,

    #1
    "\\1\\2\\3".gsub(/(\\)/,"\\1\\1").scan /./
    #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

    #2
    "\\1\\2\\3".gsub(/(\\)/,'\1\1').scan /./
    #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

    #3
    "\\1\\2\\3".gsub(/\\/){"\\\\"}.scan /./
    #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]

    #4
    "\\1\\2\\3".gsub(/(\\)/){$1+$1}.scan /./
    #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]


    Samples #1 and #2 use group backreferences; Ruby may need a second parsing
    pass for this feature to work...

    Samples #3 and #4 use code blocks, which may not need a second pass.
    Backreferences can be had using $n notation.
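
A small sketch (not from the original posts) that runs the four variants side by side and checks that they agree. The variable names are made up for illustration.

```ruby
# The sample string from the thread: the six characters \1\2\3
s = "\\1\\2\\3"

# Replacement-string forms: gsub scans these for backslash sequences.
with_group_dq = s.gsub(/(\\)/, "\\1\\1")   # "\\1" is backslash + 1 after parsing
with_group_sq = s.gsub(/(\\)/, '\1\1')     # same characters, single-quoted

# Block forms: the block's return value is inserted as-is.
with_block    = s.gsub(/\\/) { "\\\\" }    # two literal backslashes
with_capture  = s.gsub(/(\\)/) { $1 + $1 } # the capture doubled by hand

results = [with_group_dq, with_group_sq, with_block, with_capture]
puts results.uniq.length   # 1, i.e. all four strings are identical
puts with_block            # \\1\\2\\3
```

puts is used instead of inspecting the value so the output shows the characters actually stored in the string.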

    best regards -botp
    botp, Nov 21, 2010
    #4
  5. Ralph Shnelvar

    Ammar Ali Guest

    On Sun, Nov 21, 2010 at 11:57 AM, botp <> wrote:
    > On Sun, Nov 21, 2010 at 6:13 AM, Ralph Shnelvar <> wrote:
    >> What I want is to change single backslashes to double backslashes. The
    >> result of the above substitution is "no change"
    >>
    >> On the other hand
    >>  "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    >> does do what I want ... but I am clueless as to why.
    >
    > there are many ways,
    >
    > #1
    > "\\1\\2\\3".gsub(/(\\)/,"\\1\\1").scan /./
    > #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
    >
    > #2
    > "\\1\\2\\3".gsub(/(\\)/,'\1\1').scan /./
    > #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
    >
    > #3
    > "\\1\\2\\3".gsub(/\\/){"\\\\"}.scan /./
    > #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
    >
    > #4
    > "\\1\\2\\3".gsub(/(\\)/){$1+$1}.scan /./
    > #=> ["\\", "\\", "1", "\\", "\\", "2", "\\", "\\", "3"]
    >
    >
    > #1 & #2 samples uses group backreferences, ruby may need second parsing pass
    > for this feature to work...
    >
    > #3 & #4 uses code blocks. may not need second pass. backreferences can be
    > had using $n notation.


    botp's excellent suggestions reminded me of another one:

    >> "\\1\\2\\3".gsub(/\\/, '\&\&')
    => "\\\\1\\\\2\\\\3"

    Regards,
    Ammar
    Ammar Ali, Nov 21, 2010
    #5
  6. Ralph Shnelvar wrote in post #962847:
    > Consider the string
    > \1\2\3
    > that is
    > "\\1\\2\\3"
    >
    > I feel really stupid ... but this simple substitution pattern does not
    > do what I expect.
    >
    > "\\1\\2\\3".gsub(/\\/,"\\\\")


    Here you are replacing one backslash with one backslash.

    The trouble is, in the *replacement* string, '\1' has a special meaning
    (insert the value of the first capture). Because of this, a literal
    backslash is backslash-backslash.

    So to replace with *two* backslashes you need
    backslash-backslash-backslash-backslash. And inside a double or single
    quoted string, a single backslash is represented as "\\" or '\\'

    irb(main):001:0> "\\1\\2\\3".gsub(/\\/,"\\\\\\\\")
    => "\\\\1\\\\2\\\\3"

    The second level of backslashing isn't used with the block form, since
    if you want to use captured subexpressions you can use #{$1} instead of
    \1. Hence as an alternative:

    irb(main):002:0> "\\1\\2\\3".gsub(/\\/) { "\\\\" }
    => "\\\\1\\\\2\\\\3"
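
The two levels described above can be made visible by looking at string lengths. This is only an illustrative sketch; the variable names are invented.

```ruby
# Level 1: the Ruby parser turns the literal into actual characters.
two  = "\\\\"         # 4 backslashes typed -> 2 characters
four = "\\\\\\\\"     # 8 backslashes typed -> 4 characters
puts two.length       # 2
puts four.length      # 4

# Level 2: gsub scans the replacement string, where a pair of
# backslashes stands for one literal backslash.
s = "\\1\\2\\3"                    # the characters \1\2\3
unchanged = s.gsub(/\\/, two)      # each \ replaced by one \
doubled   = s.gsub(/\\/, four)     # each \ replaced by two \
blocked   = s.gsub(/\\/) { two }   # a block result is not rescanned

puts unchanged == s       # true
puts doubled              # \\1\\2\\3
puts blocked == doubled   # true
```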

    Brian Candler, Nov 21, 2010
    #6
  7. Ralph Shnelvar

    Ammar Ali Guest

    On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <> wrote:
    > Ralph Shnelvar wrote in post #962847:
    >>   "\\1\\2\\3".gsub(/\\/,"\\\\")
    >
    > Here you are replacing one backslash with one backslash.
    >
    > The trouble is, in the *replacement* string, '\1' has a special meaning
    > (insert the value of the first capture). Because of this, a literal
    > backslash is backslash-backslash.


    That's a keen observation, but the fact that they happen to be
    back-references doesn't seem to play a part in this situation.

    >> "\\a\\b\\c".gsub(/\\/,"\\\\")
    => "\\a\\b\\c"
    >> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
    => "\\\\a\\\\b\\\\c"

    Regards,
    Ammar
    Ammar Ali, Nov 21, 2010
    #7
  8. On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <> wrote:
    > On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <> wrote:
    >> Ralph Shnelvar wrote in post #962847:
    >>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
    >>
    >> Here you are replacing one backslash with one backslash.
    >>
    >> The trouble is, in the *replacement* string, '\1' has a special meaning
    >> (insert the value of the first capture). Because of this, a literal
    >> backslash is backslash-backslash.
    >
    > That's a keen observation, but the fact that they happen to be
    > back-references doesn't seem to play a part in this situation.
    >
    > >> "\\a\\b\\c".gsub(/\\/,"\\\\")
    > => "\\a\\b\\c"
    > >> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
    > => "\\\\a\\\\b\\\\c"


    The key point to understand IMHO is that a backslash is special in
    replacement strings. So, whenever one wants to have a literal
    backslash in a replacement string one needs to escape it and hence
    have two backslashes. Coincidentally a backslash is also special in a
    string (even in a single quoted string). So you need two levels of
    escaping, which makes 2 * 2 = 4 backslashes on the screen for one
    literal replacement backslash.

    Additionally people are often confused by the fact that IRB by default
    uses #inspect for showing expression values, which will display twice
    as many backslashes as are present in the string. :)
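
To see the #inspect effect in isolation, one can compare what puts prints with what p (which uses #inspect, like IRB's default output) prints. A minimal sketch:

```ruby
s = "\\1\\2\\3"             # the characters \1\2\3

puts s.length               # 6
puts s.chars.count("\\")    # 3 backslashes are actually stored
puts s                      # \1\2\3
p s                         # "\\1\\2\\3"  (same form IRB shows)
puts s.inspect.length       # 11: 6 chars + 3 extra backslashes + 2 quotes
```

When in doubt about how many backslashes a string really holds, checking length or printing with puts avoids the doubled display.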

    <grumpy>Can we please make a big red sticker and put it on every Ruby
    installer and source tar to inform people of this and the local
    variable method ambiguity. These two seem to be the issues that pop
    up most of the time.</grumpy>

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Nov 22, 2010
    #8
  9. Ralph Shnelvar

    Ammar Ali Guest

    On Mon, Nov 22, 2010 at 10:38 AM, Robert Klemme
    <> wrote:
    > On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <> wrote:
    >> On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <> wrote:
    >>> Ralph Shnelvar wrote in post #962847:
    >>>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
    >>>
    >>> Here you are replacing one backslash with one backslash.
    >>>
    >>> The trouble is, in the *replacement* string, '\1' has a special meaning
    >>> (insert the value of the first capture). Because of this, a literal
    >>> backslash is backslash-backslash.
    >>
    >> That's a keen observation, but the fact that they happen to be
    >> back-references doesn't seem to play a part in this situation.
    >>
    >> >> "\\a\\b\\c".gsub(/\\/,"\\\\")
    >> => "\\a\\b\\c"
    >> >> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
    >> => "\\\\a\\\\b\\\\c"
    >
    > The key point to understand IMHO is that a backslash is special in
    > replacement strings. So, whenever one wants to have a literal
    > backslash in a replacement string one needs to escape it and hence
    > have to backslashes. Coincidentally a backslash is also special in a
    > string (even in a single quoted string). So you need two levels of
    > escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
    > replacement backslash.


    Actually, 3 backslashes will yield one backslash. The first two result
    in one (escaped), and the third one, escaped by the previous escaped
    backslash ends up being one. My second example showed this, using 6
    backslashes instead of 8. Using 4 backslashes works because the second
    pair yields an escaped backslash, but it is not necessary.
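
A hedged sketch of the odd-versus-even behaviour: with nothing following, a leftover lone backslash in the replacement is copied through, but once another character follows, it can combine with that character. The names three, four, odd and even are made up for this example.

```ruby
s = "\\1\\2\\3"            # the characters \1\2\3

three = "\\\\\\"           # 6 typed -> 3 characters
four  = "\\\\\\\\"         # 8 typed -> 4 characters

# Bare case: the leading pair collapses to one backslash and the
# leftover lone backslash has nothing to combine with.
puts s.gsub(/\\/, three) == s.gsub(/\\/, four)   # true

# Followed by a digit, the two diverge: a lone backslash plus a digit
# is read as a backreference.
odd  = "x".sub(/(x)/, three + "1")   # pair, then \1 -> backslash + capture
even = "x".sub(/(x)/, four + "1")    # pair, pair, then a plain 1
puts odd     # \x
puts even    # \\1
```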

    Regards,
    Ammar
    Ammar Ali, Nov 22, 2010
    #9
  10. On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali <> wrote:
    > On Mon, Nov 22, 2010 at 10:38 AM, Robert Klemme
    > <> wrote:
    >> On Mon, Nov 22, 2010 at 12:27 AM, Ammar Ali <> wrote:
    >>> On Sun, Nov 21, 2010 at 11:02 PM, Brian Candler <> wrote:
    >>>> Ralph Shnelvar wrote in post #962847:
    >>>>>   "\\1\\2\\3".gsub(/\\/,"\\\\")
    >>>>
    >>>> Here you are replacing one backslash with one backslash.
    >>>>
    >>>> The trouble is, in the *replacement* string, '\1' has a special meaning
    >>>> (insert the value of the first capture). Because of this, a literal
    >>>> backslash is backslash-backslash.
    >>>
    >>> That's a keen observation, but the fact that they happen to be
    >>> back-references doesn't seem to play a part in this situation.
    >>>
    >>> >> "\\a\\b\\c".gsub(/\\/,"\\\\")
    >>> => "\\a\\b\\c"
    >>> >> "\\a\\b\\c".gsub(/\\/,"\\\\\\")
    >>> => "\\\\a\\\\b\\\\c"
    >>
    >> The key point to understand IMHO is that a backslash is special in
    >> replacement strings. So, whenever one wants to have a literal
    >> backslash in a replacement string one needs to escape it and hence
    >> have to backslashes. Coincidentally a backslash is also special in a
    >> string (even in a single quoted string). So you need two levels of
    >> escaping, makes 2 * 2 = 4 backslashes on the screen for one literal
    >> replacement backslash.
    >
    > Actually, 3 backslashes will yield one backslash. The first two result
    > in one (escaped), and the third one, escaped by the previous escaped
    > backslash ends up being one. My second example showed this, using 6
    > backslashes instead of 8. Using 4 backslashes works because the second
    > pair yields and escaped backslash, but it is not necessary.


    That does not work reliably under all circumstances though:

    irb(main):006:0> "abc".gsub /./, "\\\n"
    => "\\\n\\\n\\\n"
    irb(main):007:0> puts("abc".gsub /./, "\\\n")
    \
    \
    \
    => nil
    irb(main):008:0> "abc".gsub /./, "\\\\n"
    => "\\n\\n\\n"
    irb(main):009:0> puts("abc".gsub /./, "\\\\n")
    \n\n\n
    => nil

    It is safer to use 4 backslashes. This is the only robust way to do
    this even though sometimes you can simply use a single backslash (e.g.
    \1 instead of \\1) because string parsing is a bit tolerant under some
    circumstances:

    irb(main):014:0> '\1'
    => "\\1"
    irb(main):015:0> '\\1'
    => "\\1"

    but

    irb(main):019:0> "\n"
    => "\n"
    irb(main):020:0> "\\n"
    => "\\n"
    irb(main):021:0> "\1"
    => "\x01"
    irb(main):022:0> "\\1"
    => "\\1"
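
Looking at the raw byte values sidesteps the #inspect display entirely. The following is just an illustrative sketch of the literals used above (92 is the backslash, 10 the newline):

```ruby
# Single-quoted: only \\ and \' are escapes, so '\1' keeps its backslash.
p '\1'.bytes.to_a      # [92, 49]
p '\\1'.bytes.to_a     # [92, 49]  the same two characters

# Double-quoted: \1 is an octal escape and \n is a newline.
p "\1".bytes.to_a      # [1]
p "\\1".bytes.to_a     # [92, 49]
p "\n".bytes.to_a      # [10]
p "\\n".bytes.to_a     # [92, 110]

# The two replacement strings from the gsub example:
p "\\\n".bytes.to_a    # [92, 10]       backslash + newline
p "\\\\n".bytes.to_a   # [92, 92, 110]  two backslashes + n
```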


    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Nov 22, 2010
    #10
  11. Ralph Shnelvar

    Ammar Ali Guest

    On Mon, Nov 22, 2010 at 3:53 PM, Robert Klemme
    <> wrote:
    > On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali <> wrote:
    >>
    >> Actually, 3 backslashes will yield one backslash. The first two result
    >> in one (escaped), and the third one, escaped by the previous escaped
    >> backslash ends up being one. My second example showed this, using 6
    >> backslashes instead of 8. Using 4 backslashes works because the second
    >> pair yields and escaped backslash, but it is not necessary.

    >
    > That does not work reliably under all circumstances though:
    >
    > irb(main):006:0> "abc".gsub /./, "\\\n"
    > => "\\\n\\\n\\\n"
    > irb(main):007:0> puts("abc".gsub /./, "\\\n")
    > \
    > \
    > \
    > => nil
    > irb(main):008:0> "abc".gsub /./, "\\\\n"
    > => "\\n\\n\\n"
    > irb(main):009:0> puts("abc".gsub /./, "\\\\n")
    > \n\n\n
    > => nil


    I think these examples are somewhat misleading, because the escaped
    newline (\n) normally includes a backslash. Taking that into account,
    i.e. not counting the one that is part of newline character, the first
    example is only using 2 backslashes, and the second example is using
    3. The same goes for its friends, \a, \r, \f, etc.

    > It is safer to use 4 backslashes. This is the only robust way to do
    > this even though sometimes you can simply use a single backslash (e.g.
    > \1 instead of \\1) because string parsing is a bit tolerant under some
    > circumstances:


    I don't think this is tolerance from the string parser, it is
    recognition of the \1 as a valid octal value.

    > irb(main):014:0> '\1'
    > => "\\1"
    > irb(main):015:0> '\\1'
    > => "\\1"


    Here the single quotes are coming into play. Octal escapes are not
    recognized within them. But it outputs the string in double quotes,
    "forcing" the backslash to be escaped in the output. Backslashes need
    to be escaped in single quoted string, just like they do in double
    quoted ones, so in the second example ('\\1'), it's just one
    backslash, again.

    > but
    >
    > irb(main):019:0> "\n"
    > => "\n"
    > irb(main):020:0> "\\n"
    > => "\\n"
    > irb(main):021:0> "\1"
    > => "\x01"
    > irb(main):022:0> "\\1"
    > => "\\1"


    Here the double quotes are taking effect. The first correctly prints a
    newline, the second an escaped one, the third gets recognized as an
    octal escape, and the last escapes the meaning of the backslash that
    would otherwise cause the 1 to be interpreted as an octal value.

    Maybe using 4 backslashes is safer, overall, but I wouldn't make it a
    rule. At least not without explaining these special cases that include
    a leading backslash in their normal representation.

    Regards,
    Ammar
    Ammar Ali, Nov 22, 2010
    #11
  12. On 22.11.2010 18:21, Ammar Ali wrote:
    > On Mon, Nov 22, 2010 at 3:53 PM, Robert Klemme
    > <> wrote:
    >> On Mon, Nov 22, 2010 at 1:28 PM, Ammar Ali<> wrote:
    >>>
    >>> Actually, 3 backslashes will yield one backslash. The first two result
    >>> in one (escaped), and the third one, escaped by the previous escaped
    >>> backslash ends up being one. My second example showed this, using 6
    >>> backslashes instead of 8. Using 4 backslashes works because the second
    >>> pair yields and escaped backslash, but it is not necessary.

    >>
    >> That does not work reliably under all circumstances though:
    >>
    >> irb(main):006:0> "abc".gsub /./, "\\\n"
    >> => "\\\n\\\n\\\n"
    >> irb(main):007:0> puts("abc".gsub /./, "\\\n")
    >> \
    >> \
    >> \
    >> => nil
    >> irb(main):008:0> "abc".gsub /./, "\\\\n"
    >> => "\\n\\n\\n"
    >> irb(main):009:0> puts("abc".gsub /./, "\\\\n")
    >> \n\n\n
    >> => nil

    >
    > I think these examples are somewhat misleading, because the escaped
    > newline (\n) normally includes a backslash. Taking that into account,
    > i.e. not counting the one that is part of newline character, the first
    > example is only using 2 backslashes, and the second example is using
    > 3. The same goes for its friends, \a, \r, \f, etc.


    That is the very point of my posting: you cannot always use three
    backslashes reliably because - ooops - all of a sudden the last one may
    be part of something else. In other cases, it happens to work:

    irb(main):002:0> "abc".gsub /./, "\\\y"
    => "\\y\\y\\y"
    irb(main):003:0> "abc".gsub /./, "\\\\y"
    => "\\y\\y\\y"

    Now if someone changes "y" to "n" in the first case the (probably
    unintended) effect is dramatic. Or consider a replacement string 'foo
    \1 bar' which at some point in time is unsuspectingly changed to
    "foo \1 bar \n" and which suddenly yields not only a newline but also
    some weird octal character. This would have been avoided if the original
    string had contained two backslashes already.
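
The quote-change scenario can be sketched like this. The strings single, double and safe are hypothetical stand-ins for the replacement string before and after such an edit.

```ruby
single = 'foo \1 bar'      # backslash + "1": a backreference for sub/gsub
double = "foo \1 bar"      # here \1 is the single character with code 1
safe   = "foo \\1 bar"     # two backslashes: survives either quoting style

puts single.length         # 10
puts double.length         # 9
puts safe == single        # true

with_single = "x".sub(/(x)/, single)
with_double = "x".sub(/(x)/, double)
puts with_single                     # foo x bar
puts with_double.include?("\x01")    # true
puts with_double.include?("x")       # false: the capture was never inserted
```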

    >> It is safer to use 4 backslashes. This is the only robust way to do
    >> this even though sometimes you can simply use a single backslash (e.g.
    >> \1 instead of \\1) because string parsing is a bit tolerant under some
    >> circumstances:

    >
    > I don't think this is tolerance from the string parser, it is
    > recognition of the \1 as a valid octal value.
    >
    >> irb(main):014:0> '\1'
    >> => "\\1"
    >> irb(main):015:0> '\\1'
    >> => "\\1"

    >
    > Here the single quotes are coming into play. Octal escapes are not
    > recognized within them. But it outputs the string in double quotes,
    > "forcing" the backslash to be escaped in the output. Backslashes need
    > to be escaped in single quoted string, just like they do in double
    > quoted ones, so in the second example ('\\1'), it's just one
    > backslash, again.


    Apparently I was not clear enough. The point is, that there is some
    tolerance. Both sequences (line 14 and 15) produce the *same* output
    although they differ in backslash usage. This does not work if you try
    to write '\' to get a single backslash. For that you need '\\'. If you
    use two backslashes in both cases it's clear what happens and there is
    no room for errors.

    >> but
    >>
    >> irb(main):019:0> "\n"
    >> => "\n"
    >> irb(main):020:0> "\\n"
    >> => "\\n"
    >> irb(main):021:0> "\1"
    >> => "\x01"
    >> irb(main):022:0> "\\1"
    >> => "\\1"

    >
    > Here the double quotes are taking effect. The first correctly prints a
    > newline, the second an escaped one,


    This is not an "escaped newline" but merely a backslash followed by
    character "n". Whether that is considered "escaped" in some way depends
    on the code that processes this string. If at all this is an escaped
    "n". :)

    > the third gets recognized as an
    > octal escape, and the last escapes the meaning of the backslash that
    > would otherwise cause the 1 to be interpreted as an octal value.


    Correct.

    > Maybe using 4 backslashes is safer, overall, but I wouldn't make it a
    > rule. At least not without explaining these special cases that include
    > a leading backslash in their normal representation.


    My precise reason to make it a rule is that it is simple and beginners
    do not have to remember all these special cases that you find so worth
    mentioning.

    Actually I do not like those special cases and would rather suggest
    removing them since they make things unnecessarily complicated. The
    repeated occurrence of newbie confusion and the very discussion we are
    having here prove that the logic creates more confusion than clarity.
    The only reason I do not suggest changing this is the fact that this
    might break a lot of code.

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Nov 22, 2010
    #12
  13. Ralph Shnelvar

    Ammar Ali Guest

    On Mon, Nov 22, 2010 at 9:25 PM, Robert Klemme
    <> wrote:
    > On 22.11.2010 18:21, Ammar Ali wrote:
    >> I don't think this is tolerance from the string parser, it is
    >> recognition of the \1 as a valid octal value.
    >>
    >>> irb(main):014:0> '\1'
    >>> => "\\1"
    >>> irb(main):015:0> '\\1'
    >>> => "\\1"

    ----8<----
    >
    > Apparently I was not clear enough. The point is, that there is some
    > tolerance. Both sequences (line 14 and 15) produce the *same* output
    > although they differ in backslash usage. This does not work if you try to
    > write '\' to get a single backslash. For that you need '\\'. If you use
    > two backslashes in both cases it's clear what happens and there is no room
    > for errors.


    I guess I took issue with the word tolerance. I don't think of lexers
    and parsers as tolerant. They are quite ruthless and dictatorial. It's
    either their way, or their way in a way one did not expect. :)


    > This is not an "escaped newline" but merely a backslash followed by
    > character "n". Whether that is considered "escaped" in some way depends on
    > the code that processes this string. If at all this is an escaped "n". :)

    You are correct sir. For someone who was nitpicking, I misspoke. :)


    > My precise reason to make it a rule is that it is simple and beginners do
    > not have to remember all these special cases that you find so worthy
    > mentioning.


    This might be six of one, half a dozen of the other kind of situation.
    People would start to ask if the backslash in the \n case would count
    in the "just add 4" rule, or not? 4 backslashes in total or 5? It
    seems to only shift the issue slightly, and temporarily, until one has
    to actually understand what is really going on.

    > Actually I do not like those special cases and would rather suggest to
    > remove them since they make things unnecessary complicated. The repeated
    > occurrence of newbie confusion and the very discussion we are having here
    > proves that the logic creates more confusion than clarity. The only reason I
    > do not suggest to change this is the fact that this might break a lot of
    > code.


    I agree, but this long "heritage" that goes back to the 60s is
    probably very hard to shake. Maybe a new language can break away from
    it.

    Out of curiosity, what could these beasts be replaced with? Constants?

    Cheers,
    Ammar
    Ammar Ali, Nov 22, 2010
    #13
  14. On Mon, Nov 22, 2010 at 10:06 PM, Ammar Ali <> wrote:
    > On Mon, Nov 22, 2010 at 9:25 PM, Robert Klemme
    > <> wrote:
    >> On 22.11.2010 18:21, Ammar Ali wrote:
    >>> I don't think this is tolerance from the string parser, it is
    >>> recognition of the \1 as a valid octal value.
    >>>
    >>>> irb(main):014:0> '\1'
    >>>> => "\\1"
    >>>> irb(main):015:0> '\\1'
    >>>> => "\\1"

    > ----8<----
    >>
    >> Apparently I was not clear enough. The point is, that there is some
    >> tolerance. Both sequences (line 14 and 15) produce the *same* output
    >> although they differ in backslash usage. This does not work if you try to
    >> write '\' to get a single backslash. For that you need '\\'. If you use
    >> two backslashes in both cases it's clear what happens and there is no room
    >> for errors.

    >
    > I guess I took issue with the word tolerance. I don't think of lexers
    > and parsers as tolerant. They are quite ruthless and dictatorial. It's
    > either their way, or their way in a way one did not expect. :)


    :) But rules can be made to allow for some flexibility (just think
    of method calls with or without brackets in Ruby).

    >> This is not an "escaped newline" but merely a backslash followed by
    >> character "n". Whether that is considered "escaped" in some way depends on
    >> the code that processes this string. If at all this is an escaped "n". :)
    >
    > You are correct sir. For someone who was nitpicking, I misspoke. :)


    No problem. Apparently we both enjoy nitpicking. :))

    >> My precise reason to make it a rule is that it is simple and beginners do
    >> not have to remember all these special cases that you find so worthy
    >> mentioning.

    >
    > This might be six of one, half a dozen of the other kind of situation.
    > People would start to ask if the backslash in the \n case would count
    > in the "just add 4" rule, or not? 4 backslashes in total or 5? It
    > seems to only shift the issue slightly, and temporarily, until one has
    > to actually understand what is really going on.


    Hmm... Maybe.

    >> Actually I do not like those special cases and would rather suggest to
    >> remove them since they make things unnecessary complicated. The repeated
    >> occurrence of newbie confusion and the very discussion we are having here
    >> proves that the logic creates more confusion than clarity. The only reason I
    >> do not suggest to change this is the fact that this might break a lot of
    >> code.

    >
    > I agree, but this long "heritage" that goes back to the 60s is
    > probably very hard to shake. Maybe a new language can break away from
    > it.


    In Ruby's case the heritage does not go back to the sixties but rather
    to the nineties (1997) if I am not mistaken.

    > Out of curiosity, what could these beasts be replaced with? Constants?


    I'd leave everything as is except drop special cases like '\1' (this
    would either be an octal escape as in a double quoted string or rather
    just "1"). In single quoted strings only ' would be special if
    preceded by a backslash. In double quoted strings I would have those
    characters which are special currently (", n, r, a, t and probably
    others I'm not thinking of right now). I am undecided whether I would
    make all others errors or tolerant (e.g. "\z" would either be a syntax
    error or just "z"). I have a slight tendency to the more strict
    variant though because otherwise people might be left wondering what
    \z means when it is just "z"; also, this would help detect typing
    errors (maybe someone wanted to type "\t" which is just a key away on
    my German keyboard).

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Nov 23, 2010
    #14
  15. Ralph Shnelvar

    Ammar Ali Guest

    On Tue, Nov 23, 2010 at 11:17 AM, Robert Klemme
    <>wrote:

    > On Mon, Nov 22, 2010 at 10:06 PM, Ammar Ali <> wrote:
    > >
    > > I guess I took issue with the word tolerance. I don't think of lexers
    > > and parsers as tolerant. They are quite ruthless and dictatorial. It's
    > > either their way, or their way in a way one did not expect. :)

    >
    > :) But rules can be made to allow for some flexibility (just think
    > of method calls with or without brackets in Ruby).



    That's a good example, and I now understand what you meant by tolerance.



    > >> This is not an "escaped newline" but merely a backslash followed by
    > >> character "n". Whether that is considered "escaped" in some way depends on
    > >> the code that processes this string. If at all this is an escaped "n". :)
    > >
    > > You are correct sir. For someone who was nitpicking, I misspoke. :)
    >
    > No problem. Apparently we both enjoy nitpicking. :))



    :)


    > > I agree, but this long "heritage" that goes back to the 60s is
    > > probably very hard to shake. Maybe a new language can break away from
    > > it.

    >
    > In Ruby's case the heritage does not go back to the sixties but rather
    > to the nineties (1997) if I am not mistaken.



    I was thinking of C, which I believe introduced these escapes, but I'm not
    sure.



    > > Out of curiosity, what could these beasts be replaced with? Constants?

    >
    > I'd leave everything as is except drop special cases like '\1' (this
    > would either be an octal escape as in a double quoted string or rather
    > just "1"). In single quoted strings only ' would be special if
    > preceded by a backslash. In double quoted strings I would have those
    > characters which are special currently (", n, r, a, t and probably
    > others I'm not thinking of right now). I am undecided whether I would
    > make all others errors or tolerant (e.g. "\z" would either by a syntax
    > error or just "z"). I have a slight tendency to the more strict
    > variant though because otherwise people might be left wondering what
    > \z means when it is just "z"; also, this would help detect typing
    > errors (maybe someone wanted to type "\t" which is just a key away in
    > my German keyboard).




    I like the idea of treating unnecessary escapes as syntax errors, or at
    least warnings. I see this a lot in regular expressions, especially in
    character sets. Characters that don't need to be escaped (like ? and *) are
    preceded with a backslash, just to be safe I guess, making for harder to
    read code, as you noted.

    Regards,
    Ammar
    Ammar Ali, Nov 23, 2010
    #15
  16. On Tue, Nov 23, 2010 at 12:39 PM, Ammar Ali <> wrote:
    > On Tue, Nov 23, 2010 at 11:17 AM, Robert Klemme
    > <>wrote:
    >
    >> On Mon, Nov 22, 2010 at 10:06 PM, Ammar Ali <> wrote:
    >>> I agree, but this long "heritage" that goes back to the 60s is
    >>> probably very hard to shake. Maybe a new language can break away from
    >>> it.
    >>
    >> In Ruby's case the heritage does not go back to the sixties but rather
    >> to the nineties (1997) if I am not mistaken.
    >
    > I was thinking of C, which I believe introduced these escapes, but I'm not
    > sure.


    Yeah, but I don't want to change \n, \t etc. in double quoted strings.
    I mostly want to get rid of '\1' which is something completely
    specific to Ruby.

    >>> Out of curiosity, what could these beasts be replaced with? Constants?
    >>
    >> I'd leave everything as is except drop special cases like '\1' (this
    >> would either be an octal escape as in a double quoted string or rather
    >> just "1"). In single quoted strings only ' would be special if
    >> preceded by a backslash. In double quoted strings I would have those
    >> characters which are special currently (", n, r, a, t and probably
    >> others I'm not thinking of right now). I am undecided whether I would
    >> make all others errors or tolerant (e.g. "\z" would either by a syntax
    >> error or just "z"). I have a slight tendency to the more strict
    >> variant though because otherwise people might be left wondering what
    >> \z means when it is just "z"; also, this would help detect typing
    >> errors (maybe someone wanted to type "\t" which is just a key away in
    >> my German keyboard).
    >
    > I like the idea of treating unnecessary escapes as syntax errors, or at
    > least warnings. I see this a lot in regular expressions, especially in
    > character sets. Characters that don't need to be escaped (like ? and *) are
    > preceded with a backslash, just to be safe I guess, making for a harder to
    > code, as you noted.


    Exactly. I would not want to get rid of optional brackets for example
    because lack of brackets can make code much more readable (apart from
    foo.bar=(123) looking weird). It's always a question of balance. I
    have to say that Matz did a remarkable job at this in Ruby in general.
    This is just one of very few things that could be better (class
    variables are another one I can think of right now).

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Nov 23, 2010
    #16