look-behind in oniguruma

Discussion in 'Ruby' started by Phil Tomson, Sep 12, 2004.

  1. Phil Tomson

    Phil Tomson Guest

    Apparently oniguruma supports look-behind. Is there any documentation on how
    to use this feature?

    For example, if I had the string "~ABC~DE" and wanted to return a list of
    letters in the string which are preceded by '~' (['A','D'] in this case), how
    might I use the look-behind feature in oniguruma to achieve this? Or, how
    would I get a list of letters in the string which are not preceded by '~'
    (['B','C','E'] in this example)?

    (I know there are other ways of doing this, I'm just posing this as an example
    of using look-behind).

    Here's one that's a bit trickier: what if I had "~(ABC)DE" and wanted the
    tilde (a negation operator) to apply to each letter within the parens that it
    precedes, so that I would get ['A','B','C'], but in the case where the input
    string is "(ABC)DE" I would get an empty list? And then of course I would
    want them to be nestable: "~(~ABC)" ('A' should not appear in the list in this
    case since it's doubly negated). OK, that's probably going too far, and
    maybe it's getting to the point where I should break out RACC ;-)
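
    A minimal sketch of the non-nested case, not taken from the thread: the
    parenthesized group has no fixed width, so this uses an ordinary capture
    rather than look-behind. The helper name negated_letters is made up for
    illustration, and nesting such as "~(~ABC)" is deliberately not handled.

```ruby
# Hypothetical helper: letters inside a parenthesized group that is
# directly preceded by '~'. Nested groups are not handled.
def negated_letters(str)
  # Capture the letters of each "~( ... )" group, then split them into characters.
  str.scan(/~\(([A-Z]+)\)/).flat_map { |(group)| group.chars }
end

p negated_letters("~(ABC)DE")
p negated_letters("(ABC)DE")
```

    Anything with real nesting is probably better served by a small parser,
    as the post itself suspects.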

    Phil
    Phil Tomson, Sep 12, 2004
    #1

  2. On 12 Sep 2004 03:44:24 GMT, Phil Tomson <> wrote:
    > Apparently oniguruma supports look-behind. Is there any documentation on how
    > to use this feature?


    Essentially, they are the same as look-aheads (zero-width assertions),
    except that the look-behind expression must be a fixed-width pattern (no
    indeterminate quantifiers), and no captures are allowed in a negative
    look-behind.
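
    The fixed-width restriction can be checked by compiling patterns at run
    time. This is only a sketch, assuming a Ruby whose regexp engine is
    Oniguruma or a direct descendant of it; an engine with variable-width
    look-behind would accept both patterns. The helper name compiles? is
    made up for this example.

```ruby
# Hypothetical helper: true if the pattern compiles, false if the engine rejects it.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p compiles?('(?<=~)[A-Z]')    # look-behind of exactly one character
p compiles?('(?<=~+)[A-Z]')   # '+' gives the look-behind an indeterminate width
```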

    > For example, if I had the string "~ABC~DE" and wanted to return a list of
    > letters in the string which are preceded by '~' (['A','D'] in this case), how
    > might I use the look-behind feature in oniguruma to achieve this? Or, how
    > would I get a list of letters in the string which are not preceded by '~'
    > (['B','C','E'] in this example)?



    str = "~ABC~DE"
    p str.scan(/(?<=~)[A-Z]/)
    p str.scan(/(?<!~)[A-Z]/)

    gives:

    ["A", "D"]
    ["B", "C", "E"]

    regards,
    andrew

    --
    Andrew L. Johnson http://www.siaris.net/
    There are two types of programming languages; the ones that people bitch
    about and the ones that no one uses.
    -- Bjarne Stroustrup
    Andrew Johnson, Sep 12, 2004
    #2

  3. Phil Tomson

    Phil Tomson Guest

    In article <ABQ0d.397334$gE.56953@pd7tw3no>,
    Andrew Johnson <> wrote:
    ^^^^^^^^
    hmmm...

    >On 12 Sep 2004 03:44:24 GMT, Phil Tomson <> wrote:
    >> Apparently oniguruma supports look-behind. Is there any documentation on how
    >> to use this feature?

    >
    >Essentially, they are the same as look-aheads ... zero-width assertions,
    >except that the look-behind expression must be a fixed width pattern (no
    >indeterminate quantifiers), and no captures are allowed in a negative
    >look-behind
    >
    >> For example, if I had the string "~ABC~DE" and wanted to return a list of
    >> letters in the string which are preceded by '~' (['A','D'] in this case), how
    >> might I use the look-behind feature in oniguruma to achieve this? Or, how
    >> would I get a list of letters in the string which are not preceded by '~'
    >> (['B','C','E'] in this example)?

    >
    >
    > str = "~ABC~DE"
    > p str.scan(/(?<=~)[A-Z]/)
    > p str.scan(/(?<!~)[A-Z]/)
    >
    >gives:
    >
    > ["A", "D"]
    > ["B", "C", "E"]
    >


    Thanks. That's what I was looking for. Is this essentially the same way that
    it works in Perl?

    Phil
    Phil Tomson, Sep 12, 2004
    #3
  4. Andrew Johnson wrote:

    > Essentially, they are the same as look-aheads ... zero-width assertions,
    > except that the look-behind expression must be a fixed width pattern (no
    > indeterminate quantifiers), and no captures are allowed in a negative
    > look-behind


    So is it implemented as a zero-width look-ahead plus eating as many
    characters as the content matches?

    (I've thought about implementing /foo/.preceded_by('bar') as
    /(?=bar).{3}foo/.)
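
    For comparison, a sketch of how such an emulation differs from a real
    look-behind. To mean "preceded by", the look-ahead has to be positive,
    and because the three characters are consumed they become part of the
    match, which a look-behind avoids. This only shows observable behaviour;
    it makes no claim about how Oniguruma implements look-behind internally.
    The variable names are made up for this example.

```ruby
str = "barfoo xyzfoo"

emulated    = /(?=bar).{3}foo/   # positive look-ahead, then consume three characters
look_behind = /(?<=bar)foo/      # zero-width: "bar" is checked but not consumed

p str.scan(emulated)      # matched text includes the "bar" prefix
p str.scan(look_behind)   # matched text is only "foo"
p str =~ emulated         # offset where "bar" starts
p str =~ look_behind      # offset where "foo" starts
```

    The difference matters for gsub and for match offsets, since the
    emulated version would replace or report the prefix as well.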

    > regards,
    > andrew


    More regards,
    Florian Gross
    Florian Gross, Sep 12, 2004
    #4
  5. On Sunday 12 September 2004 06:54, Andrew Johnson wrote:
    > On 12 Sep 2004 03:44:24 GMT, Phil Tomson <> wrote:
    > > Apparently oniguruma supports look-behind. Is there any documentation on
    > > how to use this feature?

    >
    > Essentially, they are the same as look-aheads ... zero-width assertions,
    > except that the look-behind expression must be a fixed width pattern (no
    > indeterminate quantifiers), and no captures are allowed in a negative
    > look-behind


    Oniguruma supports alternation inside lookbehind, so you can get behavior
    similar to bounded quantifiers.

    AEditor's regexp engine supports variable-width lookbehind, where you
    can use quantifiers inside lookbehind (with an inverted leftmost-longest
    rule).

    It would be good if Oniguruma had support for quantifiers inside lookbehind.

    irb(main):007:0> re = NewRegexp.new('(?=.z).(?<=(?:ab){2,3}x.)')
    => +-Sequence
       +-Lookahead positive
       | +-Sequence
       |   +-Outside set=U-000A
       |   +-Inside set="z"
       +-Outside set=U-000A
       +-Lookbehind positive
         +-Sequence
           +-Repeat greedy{2,3}   # quantifier inside lookbehind!!
           | +-Group non-capturing
           |   +-Sequence
           |     +-Inside set="a"
           |     +-Inside set="b"
           +-Inside set="x"
           +-Outside set=U-000A
    irb(main):008:0> 'xyz'.gsub5(re, 'Y')
    => "xyz"
    irb(main):009:0> 'abxyz'.gsub5(re, 'Y')
    => "abxyz"
    irb(main):010:0> 'ababxyz'.gsub5(re, 'Y')
    => "ababxYz"
    irb(main):011:0> 'abababxyz'.gsub5(re, 'Y')
    => "abababxYz"
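
    The alternation approach mentioned above can be sketched in stock Ruby
    by spelling out each repetition count of (?:ab){2,3} as its own
    top-level alternative. This assumes an Oniguruma-based Ruby, where
    top-level alternatives inside a lookbehind may differ in length; the
    variable name re_alt is made up for this example.

```ruby
# Each alternative is fixed-width; together they cover (?:ab){2,3}x.
re_alt = /(?=.z).(?<=ababx.|abababx.)/

results = %w[xyz abxyz ababxyz abababxyz].map { |s| s.gsub(re_alt, 'Y') }
results.each { |r| p r }
```

    This only works for bounded repetition, since every count needs its own
    alternative.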


    --
    Simon Strandgaard
    Simon Strandgaard, Sep 12, 2004
    #5
  6. On Sunday 12 September 2004 13:07, Simon Strandgaard wrote:
    > On Sunday 12 September 2004 06:54, Andrew Johnson wrote:
    > > On 12 Sep 2004 03:44:24 GMT, Phil Tomson <> wrote:
    > > > Apparently oniguruma supports look-behind. Is there any documentation
    > > > on how to use this feature?

    > >
    > > Essentially, they are the same as look-aheads ... zero-width assertions,
    > > except that the look-behind expression must be a fixed width pattern (no
    > > indeterminate quantifiers), and no captures are allowed in a negative
    > > look-behind

    >
    > Oniguruma supports alternation inside lookbehind, so you can get a similar
    > behavior as quantifiers.
    >
    > AEditor's regexp engine supports variable width lookbehind, where you
    > can use quantifiers inside lookbehind.. (with inversed left-most-longest
    > rule).
    >
    > It would be good if Oniguruma had support for quantifiers inside
    > lookbehind.



    (Here is an example with unbounded quantifiers.)

    irb(main):016:0> re = NewRegexp.new('(?<!(ab)+|(cd){2,}).')
    => +-Sequence
       +-Lookbehind negative
       | +-Alternation
       |   +-Repeat greedy{1,-1}
       |   | +-Group capture=1
       |   |   +-Sequence
       |   |     +-Inside set="a"
       |   |     +-Inside set="b"
       |   +-Repeat greedy{2,-1}
       |     +-Group capture=2
       |       +-Sequence
       |         +-Inside set="c"
       |         +-Inside set="d"
       +-Outside set=U-000A
    irb(main):017:0> 'qwerty'.gsub5(re, 'Z')
    => "ZZZZZZ"
    irb(main):018:0> 'qweabrty'.gsub5(re, 'Z')
    => "ZZZZZrZZ"
    irb(main):019:0> 'cdcdqwerty'.gsub5(re, 'Z')
    => "ZZZZqZZZZZ"
    irb(main):020:0> 'cdqwerty'.gsub5(re, 'Z')
    => "ZZZZZZZZ"
    irb(main):021:0>
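
    This particular pattern happens to have a fixed-width equivalent: any
    run matching (ab)+ ends in "ab", and any run matching (cd){2,} ends in
    "cdcd", so checking only those suffixes gives the same answers. A sketch
    in stock Ruby, assuming an Oniguruma-based build that accepts top-level
    alternatives of different lengths in a negative lookbehind; re_fixed is
    a name made up here, and the capture groups are dropped.

```ruby
# Fixed-width stand-in for (?<!(ab)+|(cd){2,}). without captures
re_fixed = /(?<!ab|cdcd)./

results = %w[qwerty qweabrty cdcdqwerty cdqwerty].map { |s| s.gsub(re_fixed, 'Z') }
results.each { |r| p r }
```

    The rewrite depends on the suffix argument above; a lookbehind whose
    unbounded part is not at the very start of the pattern would not reduce
    this way.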

    --
    Simon Strandgaard
    Simon Strandgaard, Sep 12, 2004
    #6