Strange regexp behaviour in gsub

  • Thread starter Kristof Bastiaensen

Kristof Bastiaensen

Hi,

I am wondering if this is the correct behaviour in gsub:

"bab".gsub(/(?!a)ab/, "cd")
=> "bab"

shouldn't that be "bcd"?

Also the following:
"\nab".gsub(/(?!\w)ab/, "cd")
=> "\nab"

This seems to work
"bab".gsub(/(?!c)ab/, "cd")
=> "bcd"

Kristof
 
Joel VanderWerf

Kristof said:
Hi,

I am wondering if this is the correct behaviour in gsub:

"bab".gsub(/(?!a)ab/, "cd")
=> "bab"

shouldn't that be "bcd"?

I think /(?!a)ab/ can't match anything. It's saying that the first
character after the beginning of the match must not be "a", and the
first character of the match must be "a". This is contradictory.
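
To see the contradiction in practice, here is a small sketch (the constant names are made up for illustration, they are not from the thread). The look-ahead is tested at the very position where the following "a" has to match, so both conditions apply to the same character:

```ruby
# (?!a) is zero-width: it inspects the next character without consuming it,
# so the "a" that follows in the pattern is tested against that same character.
NEVER_MATCHES = /(?!a)ab/   # "next char is not a" AND "next char is a"
REDUNDANT     = /(?!c)ab/   # "next char is not c" always holds when the next char is "a"

RESULT_NEVER     = "bab".gsub(NEVER_MATCHES, "cd")
RESULT_REDUNDANT = "bab".gsub(REDUNDANT, "cd")

p RESULT_NEVER      # the input comes back unchanged
p RESULT_REDUNDANT  # same result as using plain /ab/
```

This would also explain why the /(?!c)ab/ variant "seems to work": its look-ahead never rules anything out.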
 
Kristof Bastiaensen

I think /(?!a)ab/ can't match anything. It's saying that the first
character after the beginning of the match must not be "a", and the
first character of the match must be "a". This is contradictory.

Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn't mean the same
character, but consecutive ones. Because it doesn't consume
the character, it effectively is the character 'before' the
match (if any). The other behaviour wouldn't make sense,
because (?!a)b is then exactly the same as b.

Kristof
 
Florian Gross

Kristof said:
Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn't mean the same
character, but consecutive ones. Because it doesn't consume
the character, it effectively is the character 'before' the
match (if any). The other behaviour wouldn't make sense,
because (?!a)b is then exactly the same as b.

I think that it's the intended behavior. Just use /(?!a).b/ if you want
to consume the character.

Thinking about this, it is indeed possible to implement fixed-width
look-behind -- interesting.

Regards,
Florian Gross
 
David Alan Black

Hi --

Kristof Bastiaensen said:
Yes, that would clarify the situation, but is it the correct
behaviour? I would think that (?!a)a doesn't mean the same
character, but consecutive ones. Because it doesn't consume the
character, it effectively is the character 'before' the match (if
any).

/(?!a)/ doesn't match or consume any character; it refers to the state
of things between characters. The previous character (or start of
string) has come and gone; the assertion, now, is "what lies just
ahead is not 'a'".

Kristof Bastiaensen said:
The other behaviour wouldn't make sense, because (?!a)b is then
exactly the same as b.

Assertions like this always have the possibility of being redundant --
for example:

/(?=a)abc/ # same as /abc/

but there are a lot of cases where they aren't, and that's where they
become useful:

/David (?!Black)(\S+)/ # grab another David's last name


David
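
A quick way to try both cases, as a rough sketch: the helper name last_name and the sample name "David Thomas" are invented for illustration only.

```ruby
# A redundant look-ahead: /(?=a)abc/ matches at the same place as /abc/.
SAME_POSITION = ("xabc" =~ /(?=a)abc/) == ("xabc" =~ /abc/)

# A useful one: capture the last name unless it starts with "Black".
OTHER_DAVID = /David (?!Black)(\S+)/

# Returns capture group 1, or nil when the pattern does not match.
def last_name(text)
  text[OTHER_DAVID, 1]
end

p SAME_POSITION
p last_name("David Black")
p last_name("David Thomas")
```

Note that the look-ahead only checks a prefix, so a last name that merely starts with "Black" would be rejected as well.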
 
Kristof Bastiaensen

Florian Gross said:
I think that it's the intended behavior. Just use /(?!a).b/ if you want
to consume the character.

Hi,
You are right, I looked it up in the manual, and there it was. The
term zero-width-look-ahead pretty much says it all. I must have
gotten the definition all wrong.

Florian Gross said:
Thinking about this, it is indeed possible to implement fixed-width
look-behind -- interesting.

I was thinking more about something like variable-width look-between :)
Meaning for example a(?^\w+)b would match any a(.*)b if (.*) is
not equal to (\w+)

Thanks,
Kristof
 
David Alan Black

Hi --

Kristof Bastiaensen said:
I was thinking more about something like variable-width look-between :)
Meaning for example a(?^\w+)b would match any a(.*)b if (.*) is
not equal to (\w+)

For that particular case you can use \W (opposite of \w):

/a\W*b/ # a + zero or more non-\w + b

For more specific cases, you can use a negated character class:

/a[^123]*b/ # a + [zero or more of NOT 1,2,3] + b


David
 
Joel VanderWerf

David said:
Hi --

For that particular case you can use \W (opposite of \w):

/a\W*b/ # a + zero or more non-\w + b

Not quite the same:

/a\W*b/ =~ "a%xb"
# => nil

It sounded like the OP wanted an re that matched "a%xb", because "%x" is
"not equal to (\w+)". Sort of like:

/a(?!\w+b).*b/ =~ "a%xb"
# => 0
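
For comparison, a small sketch that runs both patterns over a few sample strings (the constant names and the extra samples "axyzb" and "a%%b" are illustrative assumptions, not from the thread):

```ruby
NON_WORD_ONLY = /a\W*b/          # middle may contain only non-word characters
NOT_ALL_WORD  = /a(?!\w+b).*b/   # rejects an "a" followed by a pure \w+ run and then "b"

SAMPLES = ["a%xb", "axyzb", "a%%b"]

# Print the match offset (or nil) of each pattern for each sample.
SAMPLES.each do |s|
  puts "#{s.inspect}: #{(NON_WORD_ONLY =~ s).inspect} / #{(NOT_ALL_WORD =~ s).inspect}"
end
```

The look-ahead version still has to repeat the closing "b" inside the assertion, otherwise the \w+ has nothing to stop at.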
 
Robert Klemme

Florian Gross said:
I think that it's the intended behavior. Just use /(?!a).b/ if you want
to consume the character.

I'd use /[^a]b/ if I wanted to consume the character. No need for
negative lookahead here.

robert
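
The two forms are close but not quite interchangeable. A short sketch (constant names are made up here) showing where they agree and the one difference I am aware of: without the /m option, "." does not match a newline, while [^a] does.

```ruby
LOOKAHEAD_DOT = /(?!a).b/   # any char except newline that is not "a", then "b"
NEGATED_CLASS = /[^a]b/     # any char that is not "a" (newline included), then "b"

["xb", "ab", "\nb"].each do |s|
  puts "#{s.inspect}: #{(LOOKAHEAD_DOT =~ s).inspect} / #{(NEGATED_CLASS =~ s).inspect}"
end
```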
 
Robert Klemme

Kristof Bastiaensen said:
Hi,
You are right, I looked it up in the manual, and there it was. The
term zero-width-look-ahead pretty much says it all. I must have
gotten the definition all wrong.


I was thinking more about something like variable-width look-between :)
Meaning for example a(?^\w+)b would match any a(.*)b if (.*) is
not equal to (\w+)

IMHO that's not generally possible with regular expressions. You'll
always have to define positively the things that should match. Negated
character classes are just a convenience, and that convenience does not
extend to complete (sub)expressions.

For example: to match a.*a where the part in the middle does not consist
solely of b's (i.e. the middle on its own must not be matched by /b+/),
you can do:

/a(.*[^b].*)?a/

irb(main):004:0> rx=/a(.*[^b].*)?a/
=> /a(.*[^b].*)?a/
irb(main):005:0> rx === "aa"
=> true
irb(main):006:0> rx === "aba"
=> false
irb(main):007:0> rx === "acba"
=> true

Regards

robert
 
David Alan Black

Hi --

Joel VanderWerf said:
Not quite the same:

/a\W*b/ =~ "a%xb"
# => nil

Whoops :)

Joel VanderWerf said:
It sounded like the OP wanted an re that matched "a%xb", because "%x" is
"not equal to (\w+)". Sort of like:

/a(?!\w+b).*b/ =~ "a%xb"
# => 0

Yes, you're right, though I'm driven to find something that doesn't
involve repeating the 'b'. Current iteration:

/a(.*\W.*)?b/.match("a%xb")
# => #<MatchData:0x4019d298>

(possibly with *?'s instead of *'s, depending on the OP's needs)


David
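
A small sketch of how that pattern behaves on a few inputs (the helper name accepts? and the samples other than "a%xb" are illustrative assumptions). Because the group is optional, an empty middle is accepted too:

```ruby
NEEDS_NON_WORD = /a(.*\W.*)?b/   # middle is empty, or contains at least one non-word char

# True when the pattern matches anywhere in the text.
def accepts?(text)
  !(NEEDS_NON_WORD =~ text).nil?
end

p accepts?("a%xb")    # middle "%x" contains a non-word character
p accepts?("axyzb")   # middle "xyz" is all word characters
p accepts?("ab")      # empty middle, the optional group is skipped
```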
 
nobu.nokada

Hi,

At Wed, 12 May 2004 09:13:51 +0900,
Florian Gross wrote in [ruby-talk:99884]:
I think that it's the intended behavior. Just use /(?!a).b/ if you want
to consume the character.

Thinking about this, it is indeed possible to implement fixed-width
look-behind -- interesting.

Ruby 1.9 (Oniguruma) has a look-behind feature.

$ ruby -v -e 'p "bab".gsub(/(?<!a)ab/, "cd")'
ruby 1.9.0 (2004-05-12) [i686-linux]
"bcd"
 
Kristof Bastiaensen

nobu.nokada said:
Hi,

At Wed, 12 May 2004 09:13:51 +0900,
Florian Gross wrote in [ruby-talk:99884]:
I think that it's the intended behavior. Just use /(?!a).b/ if you want
to consume the character.

Thinking about this, it is indeed possible to implement fixed-width
look-behind -- interesting.

Ruby 1.9 (Oniguruma) has a look-behind feature.

$ ruby -v -e 'p "bab".gsub(/(?<!a)ab/, "cd")'
ruby 1.9.0 (2004-05-12) [i686-linux]
"bcd"

Great!

That is exactly what I needed, and I see it has negative
look-behind too: (?<!subexp)
I think this is especially useful in String#gsub, since you don't
have to capture the context in a group and repeat it in the
substitution.

Kristof
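
To illustrate the point about not having to capture the context, here is a sketch comparing the two approaches (the method names are made up, and it assumes a Ruby whose regexp engine supports look-behind). With a capturing group the context character becomes part of the match and has to be put back by hand:

```ruby
# Look-behind: the context is only inspected, never part of the match.
def with_lookbehind(text)
  text.gsub(/(?<!a)ab/, "cd")
end

# Capturing group: the context is consumed and re-inserted via \1.
def with_captured_context(text)
  text.gsub(/(^|[^a])ab/, '\1cd')
end

%w[bab aab abab].each do |s|
  puts "#{s}: #{with_lookbehind(s)} / #{with_captured_context(s)}"
end
```

The "abab" sample shows one place where the two can differ: once a match has consumed characters, they are no longer available as context for the next match, whereas the look-behind can still see them.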
 
Florian Gross

Moin!

nobu.nokada said:
Ruby 1.9 (Oniguruma) has a look-behind feature.

$ ruby -v -e 'p "bab".gsub(/(?<!a)ab/, "cd")'
ruby 1.9.0 (2004-05-12) [i686-linux]
"bcd"

Very nice. Does this work too?

ruby -v -e 'p "bab\nbcab".gsub(/^(?<!bc?)ab$/, "cd")'

(I would expect it to produce "bcd\nbccd".)

Regards,
Florian Gross
 
Simon Strandgaard

Florian said:
"bab\nbcab".gsub(/^(?<!bc?)ab$/, "cd")

Oniguruma doesn't like your question mark (the "?" in "bc?").

Oniguruma only supports fixed-width look-behind, so quantifiers are not
possible. However, you can use alternation instead:

(?<!b(?:c|))
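
As far as I understand the Oniguruma documentation, alternatives of different lengths are allowed at the top level of a look-behind, so a nested group like the one above may or may not be accepted depending on the engine version. A hedged sketch of the top-level form (the constant and method names are invented, and the "^" and "$" anchors from the original pattern are left out to keep the example small):

```ruby
# "ab" not preceded by "b" or by "bc": the two alternatives have different
# lengths, but they sit at the top level of the look-behind.
NOT_AFTER_B_OR_BC = /(?<!b|bc)ab/

def replace_ab(text)
  text.gsub(NOT_AFTER_B_OR_BC, "cd")
end

p replace_ab("xab")    # "ab" follows "x", so it is replaced
p replace_ab("bab")    # "ab" follows "b", left alone
p replace_ab("bcab")   # "ab" follows "bc", left alone
```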
 
