Regexp riddle; escaping escapes

Discussion in 'Ruby' started by Phlip, Aug 17, 2007.

  1. Phlip

    Phlip Guest

    Rubies:

    Someone didn't escape their & in their HTML correctly. Let's fix it.

    This regexp correctly does not escape &dude, because we only want to escape
    raw & markers:

    p "yo &dude".gsub(/&([^a-z])/i, '&amp;\1')

    That passed "yo &dude" thru unchanged. (I am aware "dude" has no ; on the
    end; we are leaving that optional, for whatever reason...)

    Now escape & followed by a non-alphabetic character:

    p "yo & dude".gsub(/&([^a-z])/i, '&amp;\1')

    That correctly provides: "yo &amp; dude"

    Now how to escape "yo && dude"? Note that the ([^a-z]) consumes the second
    &, leading to this incorrect output:

    "yo &amp;& dude"

    The only workaround I can think of is to run the Regexp twice:

    x = "yo && dude"
    2.times{ x.gsub!(/&([^a-z])/i, '&amp;\1') }
    p x
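
    A quick way to see why one pass falls short is to list what the pattern
    actually matches. This is only a sketch: the helper name escape_consuming
    is hypothetical and not from the thread, and the scan call drops the
    capture group so that it returns whole matches.

```ruby
# Hypothetical helper wrapping the single-pass substitution from the post.
def escape_consuming(text)
  text.gsub(/&([^a-z])/i, '&amp;\1')
end

# Same pattern without the capture group, so scan returns each full match.
# Both ampersands land in ONE match: the second & is consumed as the
# "non-letter" and is never tested as the start of a match itself.
p "yo && dude".scan(/&[^a-z]/i)

# One pass therefore escapes only the first ampersand.
p escape_consuming("yo && dude")

# A second pass picks up the ampersand that was skipped the first time.
p escape_consuming(escape_consuming("yo && dude"))
```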

    Can someone help my feeb Regexp skills and get a "yo &amp;&amp; dude" in one
    line?

    --
    Phlip
    http://www.oreilly.com/catalog/9780596510657/
    ^ assert_xpath
    http://tinyurl.com/23tlu5 <-- assert_raise_message
     
    Phlip, Aug 17, 2007
    #1

  2. Tim Pease

    Tim Pease Guest

    On 8/17/07, Phlip wrote:
    > Rubies:
    >
    > Someone didn't escape their & in their HTML correctly. Let's fix it.
    >
    > This regexp correctly does not escape &dude, because we only want to escape
    > raw & markers:
    >
    > p "yo &dude".gsub(/&([^a-z])/i, '&amp;\1')
    >
    > That passed "yo &dude" thru unchanged. (I am aware "dude" has no ; on the
    > end; we are leaving that optional, for whatever reason...)
    >
    > Now escape & followed by a non-alphabetic character:
    >
    > p "yo & dude".gsub(/&([^a-z])/i, '&amp;\1')
    >
    > That correctly provides: "yo &amp; dude"
    >
    > Now how to escape "yo && dude"? Note that the ([^a-z]) consumes the second
    > &, leading to this incorrect output:
    >
    > "yo &amp;& dude"
    >
    > The only workaround I can think of is to run the Regexp twice:
    >
    > x = "yo && dude"
    > 2.times{ x.gsub!(/&([^a-z])/i, '&amp;\1') }
    > p x
    >
    > Can someone help my feeb Regexp skills and get a "yo &amp;&amp; dude" in one
    > line?
    >


    str = "yo && dude"
    str.gsub!( %r/&(?=[^a-z])/i, '&amp;')
    p str
    => "yo &amp;&amp; dude"


    The regular expression trick here is the (?=re) construct, called a
    "zero-width positive lookahead". It must match for the overall pattern
    to match, but it does not consume any of the string, so the gsub! only
    replaces the characters that are NOT inside (?=re).
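
    As a rough illustration of the difference, the sketch below wraps the
    lookahead pattern in a method and compares it with a negative-lookahead
    variant, (?![a-z]). The method names are hypothetical, and the negative
    lookahead is not something proposed in this thread; it is shown only
    because (?=[^a-z]) needs a following character and so leaves a trailing
    & alone.

```ruby
# Hypothetical helper names, for illustration only.

# Pattern from the post above: & must be followed by a non-letter character.
# The lookahead checks that character without consuming it.
def escape_with_lookahead(text)
  text.gsub(/&(?=[^a-z])/i, '&amp;')
end

# Variant (an assumption, not from the thread): & must NOT be followed by a
# letter. This also matches a & at the very end of the string.
def escape_with_negative_lookahead(text)
  text.gsub(/&(?![a-z])/i, '&amp;')
end

p escape_with_lookahead("yo && dude")          # both ampersands escaped
p escape_with_lookahead("yo &dude")            # left unchanged
p escape_with_lookahead("dude &")              # trailing & is NOT escaped
p escape_with_negative_lookahead("dude &")     # trailing & is escaped
```

    Which behaviour is wanted for a trailing & depends on the input being
    cleaned up, so treat the second method as an option rather than a fix.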

    Blessings,
    TwP
     
    Tim Pease, Aug 17, 2007
    #2

  3. Phlip

    Phlip Guest

    Phlip, Aug 17, 2007
    #3
  4. Tim Pease

    Tim Pease Guest

    On 8/17/07, Phlip wrote:
    > Tim Pease wrote:
    >
    > > str.gsub!( %r/&(?=[^a-z])/i, '&amp;')
    >
    > Thanks!
    >
    > > "zero-width positive lookahead"
    >
    > Man, that was right there, but I was blocking on it. (-;
    >


    I had to pull my pickaxe off the shelf and look it up, too. It's on page
    327 of the second edition, if you're interested in reading about it. It's
    also in the first edition, which is available online.

    Blessings,
    TwP
     
    Tim Pease, Aug 17, 2007
    #4
