Regex negative look-behind bug?

Ruby Nuby

irb, Ruby 1.9.1

What am I missing here?

"b T T W b".match(/(?<!t t|a b) w/i)
=> nil

#The second look-behind is now just a
"b T T W b".match(/(?<!t t|a) w/i)
=> #<MatchData " W">

#Regex stays the same, the T T are now in lower case
"b t t W b".match(/(?<!t t|a) w/i)
=> nil

#Look-behind only contains the t t condition now, and the T T are back to upper case
"b T T W b".match(/(?<!t t) w/i)
=> nil
 
Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

> irb, Ruby 1.9.1
>
> What am I missing here?
>
> "b T T W b".match(/(?<!t t|a b) w/i)
> => nil
>
> #The second look-behind is now just a
> "b T T W b".match(/(?<!t t|a) w/i)
> => #<MatchData " W">
>
> #Regex stays the same, the T T are now in lower case
> "b t t W b".match(/(?<!t t|a) w/i)
> => nil
>
> #Look-behind only contains the t t condition now, and the T T are back to upper case
> "b T T W b".match(/(?<!t t) w/i)
> => nil

No bug here. It is doing exactly what you asked: only match a w if it is not
preceded by 't t'. In all cases the w is preceded by 't t', and in the case
that did match (?<!t t|a), the w was preceded by a 't t' but not an 'a', as
you asked, so it did match.
=> #<MatchData " W">

Regards,
Ammar
 
Robert Klemme

> No bug here. It is doing exactly what you asked: only match a w if it is not
> preceded by 't t'. In all cases the w is preceded by 't t', and in the case
> that did match (?<!t t|a), the w was preceded by a 't t' but not an 'a', as
> you asked, so it did match.

That was an alternative! If the RX in the lookbehind can match, the
negative lookbehind must fail IMHO.
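
To illustrate the point about alternatives, here is a minimal sketch. It is deliberately case-sensitive (no /i), so the case-folding question stays out of the picture. The constant names are made up for this example and are not from the thread.

```ruby
# A negative lookbehind with alternatives rejects the position
# as soon as ANY one alternative matches behind it.
TT_OR_A_RX = /(?<!t t|a) w/

BLOCKED_BY_TT = "b t t w b".match(TT_OR_A_RX) # "t t" sits right before " w"
BLOCKED_BY_A  = "b a w b".match(TT_OR_A_RX)   # "a" sits right before " w"
ALLOWED       = "b x x w b".match(TT_OR_A_RX) # neither alternative matches

p BLOCKED_BY_TT
p BLOCKED_BY_A
p ALLOWED
```

The first two calls should print nil and only the third should produce a match, which is the behaviour one would also expect from the /i version.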

There is a problem with the match though. I suspect there is an issue
with case sensitivity propagation

irb(main):009:0> "b T T W b".match(/(?<!t t|a) w/i)
=> #<MatchData " W">
irb(main):010:0> "b T T W b".match(/(?i:<!t t|a) w/i)
=> nil

irb(main):013:0> RUBY_VERSION
=> "1.9.1"
irb(main):014:0> RUBY_PATCHLEVEL
=> 430

Kind regards

robert
 
Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

> That was an alternative! If the RX in the lookbehind can match, the
> negative lookbehind must fail IMHO.

The thing is, what's in the lookbehind (and in all assertions, for that
matter) is not really a regular expression. It is a fixed length literal. The
only exception, AFAIK, is character sets, because they are also fixed length.
The engine needs to know how many characters to step back and examine.

Also the first alternative that matches wins. Here it is in lower case and
without ignoring case:
=> nil


> There is a problem with the match though. I suspect there is an issue
> with case sensitivity propagation
>
> irb(main):009:0> "b T T W b".match(/(?<!t t|a) w/i)
> => #<MatchData " W">
> irb(main):010:0> "b T T W b".match(/(?i:<!t t|a) w/i)
> => nil

That's not a valid assertion any more, it is now an options specification.
=> #<MatchData "<!t t w">
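
A small sketch of what that means in practice: once the "?" is followed by "i:", the group is an inline-option group, and "<!t t" is just literal text inside it. The constant names here are invented for illustration.

```ruby
# (?i: ... ) is a non-capturing group with the i option switched on.
# Inside it, "<!t t" and "a" are plain alternatives, not a lookbehind.
OPTION_GROUP_RX = /(?i:<!t t|a) w/i

LITERAL_HIT = "<!t t w".match(OPTION_GROUP_RX)   # literal "<!t t" followed by " w"
NO_HIT      = "b T T W b".match(OPTION_GROUP_RX) # contains neither "<!t t w" nor "a w"

p LITERAL_HIT
p NO_HIT
```

So the nil for "b T T W b" comes from the pattern no longer being an assertion at all, not from the case handling.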


> irb(main):013:0> RUBY_VERSION
> => "1.9.1"
> irb(main):014:0> RUBY_PATCHLEVEL
> => 430

I initially tried the cases with 1.9.2, but I tried the above with the
latest 1.9.1 on my system (a bit older).
=> 378

Regards,
Ammar
 
Robert Klemme

> The thing is what's in the lookbehind, and all assertions for that matter,
> is not really a regular expression. It is a fixed length literal. The only
> exception, AFAIK, is character sets because they are also fixed length. The
> engine needs to know how many characters to step back and examine.

Docs say that the regexp cannot be unlimited. But it is by far not
only a fixed length literal. "|" is certainly meta in an assertion -
the second line would not match if the lookbehind assertion was a
literal.

10:45:31 ~$ ruby19 x.rb
bc         /(?<=ab)c/      []
bc         /(?<=a|b)c/     ["c"]
bc         /(?<=a\|b)c/    []
abc        /(?<=ab)c/      ["c"]
abc        /(?<=a|b)c/     ["c"]
abc        /(?<=a\|b)c/    []
a|bc       /(?<=ab)c/      []
a|bc       /(?<=a|b)c/     ["c"]
a|bc       /(?<=a\|b)c/    ["c"]
a\|bc      /(?<=ab)c/      []
a\|bc      /(?<=a|b)c/     ["c"]
a\|bc      /(?<=a\|b)c/    []
10:45:32 ~$ cat x.rb

str = ["bc", "abc", "a|bc", "a\\|bc"]
rxs = [/(?<=ab)c/, /(?<=a|b)c/, /(?<=a\|b)c/]

str.each do |s|
  rxs.each do |r|
    printf "%-10s %-15p %p\n", s, r, s.scan(r)
  end
end

10:45:45 ~$


Docs even say "In negative-look-behind, captured group isn't allowed,
but shy group(?:) is allowed." So it's a regexp albeit a limited one.

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
 
Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

> Docs say that the regexp cannot be unlimited. But it is by far not
> only a fixed length literal. "|" is certainly meta in an assertion -
> the second line would not match if the lookbehind assertion was a
> literal.

Yes, please excuse the terseness of my last response. I wrote it as I was
rushing out the door.

What I meant, but did not properly clarify, is that the contents of assertions
are not *full* expressions. They cannot contain quantifiers, captures,
backreferences, or anything else that can complicate determining the length of
the contents. Obviously alternation is allowed, since that's what we were
discussing, but only as long as the alternatives abide by these limitations.

Ruby's regular expression engine is quite flexible in this regard, as it
allows the alternatives to be of different lengths, unlike some other
engines that require them to be of the same length.
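
A rough sketch of that difference, assuming a reasonably current Ruby: alternatives of different lengths compile inside a lookbehind, while an open-ended quantifier is rejected when the pattern is compiled. `lookbehind_ok?` is a hypothetical helper written for this example, not part of Ruby.

```ruby
# Hypothetical helper: true if the pattern source compiles, false if the
# engine rejects it (Regexp.new raises RegexpError for invalid patterns).
def lookbehind_ok?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p lookbehind_ok?("(?<=a|bc)d") # alternatives of length 1 and 2
p lookbehind_ok?("(?<=a+)d")   # unbounded repetition inside the lookbehind

# Both alternatives are usable when matching.
p "ad bcd xd".scan(/(?<=a|bc)d/)
```

Other engines, and possibly other Ruby versions, may draw the line in a different place, so treat the results as something to verify locally.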


----8<----
The root issue still exists

irb(main):014:0> "a ac".scan /(?<!a a|b)c/i
=> []
irb(main):015:0> "A Ac".scan /(?<!a a|b)c/i
=> ["c"]
irb(main):016:0> "ac".scan /(?<!a|b)c/i
=> []
irb(main):017:0> "Ac".scan /(?<!a|b)c/i
=> []

----8<----


IMHO this is a bug.
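
For anyone who wants to check their own interpreter, here is a minimal sketch that replays the four cases above and prints the results next to the Ruby version. `LOOKBEHIND_CASES` and `run_cases` are names invented for this example. No expected output is given on purpose, since the result is exactly what differs between versions.

```ruby
# The four string/pattern pairs from the irb session above.
LOOKBEHIND_CASES = [
  ["a ac", "(?<!a a|b)c"],
  ["A Ac", "(?<!a a|b)c"],
  ["ac",   "(?<!a|b)c"],
  ["Ac",   "(?<!a|b)c"],
]

# Compile each pattern with the i option at runtime, so that a pattern the
# engine rejects is reported as a message instead of aborting the script.
def run_cases(cases)
  cases.map do |str, src|
    begin
      [str, src, str.scan(Regexp.new(src, Regexp::IGNORECASE))]
    rescue RegexpError => e
      [str, src, "error: #{e.message}"]
    end
  end
end

CASE_RESULTS = run_cases(LOOKBEHIND_CASES)

puts "Ruby #{RUBY_VERSION}"
CASE_RESULTS.each do |str, src, out|
  printf "%-6s %-14s %p\n", str, src, out
end
```

If the i option is applied consistently inside the lookbehind, the first two lines should agree with each other, as the last two already do in the session above.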



OK, now that we've eliminated the syntax and the double-negative confusion,
I see the issue clearly. Thank you for your patience :)

It might be a bug, but since the contents of assertions do not go through
the full eval/exec cycle of "regular" regular expressions, this could be
just another limitation of assertions. It might be difficult to figure out
the last options in effect because they can be inserted multiple times in an
expression, on their own (from here on) and they can be nested. Which of
these would be used? Maybe just use the top level options? That can
potentially introduce more confusion.

Anyway, it's definitely worth reporting. Worst case, we'll find out it's a
limitation, and best case, it will end up being a feature request, if not a
bug.

Is the OP able/willing to report this?

http://redmine.ruby-lang.org/

Regards,
Ammar
 
Ruby Nuby

Ammar, Robert,

Thank you both for your healthy discussions. I'm glad that I'm not crazy
and you guys agree that it's probably a bug or a very very special
feature :)

You guys understand the underlying issue and implications much better
than I do. I think it'd be better if one of you reported this instead of
me. Please don't fight over it :)

Thanks again.
 
Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

> Ammar, Robert,
>
> Thank you both for your healthy discussions. I'm glad that I'm not crazy
> and you guys agree that it's probably a bug or a very very special
> feature :)

You're welcome. I'm fascinated by Oniguruma (Ruby's regex engine) so this is
much fun for me. :)

Regards,
Ammar
 
Ammar Ali

[Note: parts of this message were removed to make it a legal post.]


Thanks. I was too busy to follow up yesterday.

I'm glad you reported it because I'm still on the fence about it being a
bug. I think the negative match (when it matches it doesn't) and alternation
are confusing in this case.

IMHO, the following examples prove that ignoring case works as expected, but
it's difficult to verify this when alternation is added to the mix.

Expected, not ignoring case: nil

Expected, case differs, and it's not being ignored: 3

Expected, case differs, but it's being ignored: nil


Adding alternation is playing a part in either making it hard to tell which
part is matching, or not respecting the i option.

Thanks again,
Ammar
 
