[Regex] Alternative for look behind

Janus Bor

Hi all,

I need to replace all '-' characters at the beginning of a sequence
(string) with 'X's (until a character appears that is not '-').

E.g.

-------abcde-gh----

should be changed into

XXXXXXXabcde-gh----



If Ruby supported look behind in regular expressions, I could
probably do something like this:

sequence.gsub(/(?=^-*)-/, "X")

But unfortunately Ruby does not support look behind.

Of course, I could count the number of '-' at the beginning of the
sequence:

x = sequence[/-*/].length

And then I could replace the first x characters with "X". But I don't
like that solution; it feels clumsy. Is there an elegant way of doing
this?

Thanks in advance & best regards!
Janus
 
Peña, Botp

From: Janus Bor
# I need to replace all '-' characters at the beginning of a sequence
# (string) with 'X's (until a character appears that is not '-').
# E.g.
# -------abcde-gh----
# should be changed into
# XXXXXXXabcde-gh----

my initial reaction was not a look behind,

irb(main):062:0> s.gsub(/(^-*)/){$1.tr("-","X")}
=> "XXXXXXXabcde-gh----"

the second was to use oniguruma,

re=Oniguruma::ORegexp.new( '(?<dashes>^-*)(?<after>.*)' )
#=> /(?<dashes>^-*)(?<after>.*)/
s
#=> "-------abcde-gh----"
m=re.match s
#=> #<MatchData:0x2c51b7c>
m[:dashes]
#=> "-------"
m[:after]
#=> "abcde-gh----"
...

nice naming feature, but overkill for this case. ymmv.

kind regards -botp
 
Robert Klemme

Janus said:
I need to replace all '-' characters at the beginning of a sequence
(string) with 'X's (until a character appears that is not '-').

E.g.

-------abcde-gh----

should be changed into

XXXXXXXabcde-gh----

If Ruby supported look behind in regular expressions, I could
probably do something like this:

sequence.gsub(/(?=^-*)-/, "X")

But unfortunately Ruby does not support look behind.

It does - in 1.9. But:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/(?=^-*)-/, "X")
=> "X------abcde-gh----"
irb(main):003:0>

I am not sure lookbehind is the proper means here. After all you want
to replace a sequence of dashes *before* a particular sequence
(according to your description above). So you would rather use
lookforward, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

Here's another solution

irb(main):008:0> s = sequence.dup
=> "-------abcde-gh----"
irb(main):009:0> while s.sub! /^(X*)-/, '\\1X'; end
=> nil
irb(main):010:0> s
=> "XXXXXXXabcde-gh----"

But I'd rather use Peña's solution or another block form:

irb(main):011:0> sequence.sub(/^-*/) {|m| "X" * m.length}
=> "XXXXXXXabcde-gh----"
irb(main):012:0> sequence.sub(/^-*/) {|m| m.tr '-','X'}
=> "XXXXXXXabcde-gh----"
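As a minimal sketch, the block form above can be wrapped in a small method. The method name `mask_leading_dashes` is hypothetical and used only for illustration. The sketch also swaps `^` for `\A`, on the assumption that only the very start of the string should count; in Ruby `^` matches at the start of every line, which matters once `gsub` or multi-line input is involved.

```ruby
# Hypothetical helper name, not part of the thread.
def mask_leading_dashes(sequence)
  # \A anchors at the start of the string only. The pattern always matches
  # at position 0 (possibly with an empty match), so strings without a
  # leading dash come back unchanged.
  sequence.sub(/\A-*/) { |m| "X" * m.length }
end

puts mask_leading_dashes("-------abcde-gh----")  # prints XXXXXXXabcde-gh----
puts mask_leading_dashes("abcde-gh----")         # prints abcde-gh----
puts mask_leading_dashes("----")                 # prints XXXX
```

Because the replacement is computed from the match length, this should work for any run length without looping.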

Kind regards

robert
 
Janus Bor

Thanks to both of you! I like your solutions. I'm a fan of one liners...

I ended up using this one, as it has the best readability for a novice
programmer like me imho:

Robert said:
irb(main):011:0> sequence.sub(/^-*/) {|m| "X" * m.length}
=> "XXXXXXXabcde-gh----"


Robert said:
It does - in 1.9. But:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/(?=^-*)-/, "X")
=> "X------abcde-gh----"

Thanks for the info. Just downloaded 1.9 to check it out. But I still
don't comprehend why look behind is not working like I expected. If I
write the equivalent replacing characters at the end of the string using
look ahead it works just fine:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/-(?=-*$)/, "X")
=> "-------abcde-ghXXXX"

Robert said:
I am not sure lookbehind is the proper means here. After all you want
to replace a sequence of dashes *before* a particular sequence
(according to your description above). So you would rather use
lookforward, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

The problem with look ahead/forward is, that I have to make sure the
sequence starts with "-". Otherwise something like this could happen:

irb(main):003:0> sequence = 'abcde-gh----'
=> "abcde-gh----"
irb(main):004:0> sequence.gsub(/-(?=-*[abcdefgh])/, "X")
=> "abcdeXgh----"


Kind regards,
Janus
 
Robert Klemme

Janus said:
Thanks to both of you! I like your solutions. I'm a fan of one liners...

You're welcome!

Janus said:
Thanks for the info. Just downloaded 1.9 to check it out. But I still
don't comprehend why look behind is not working like I expected. If I
write the equivalent replacing characters at the end of the string using
look ahead it works just fine:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/-(?=-*$)/, "X")
=> "-------abcde-ghXXXX"

I believe the reason is that with lookbehind the regular expression
needs to start matching at the *same* location (the beginning of the
sequence) multiple times. Even though lookbehind does not consume
characters, I believe RX implementations prohibit matching at the same
location over and over again, partly for efficiency reasons but also to
avoid endless loops. The lookforward solution with replacement at the
end of the sequence moves the start position of the match one character
forward for every match. That's how I explain to myself why the
straightforward lookbehind does not work.
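One way to probe this is to look at where the pattern from the question can match at all. Note that `(?=...)` is Ruby's lookahead syntax (lookbehind is written `(?<=...)`), so the pattern is tested looking to the right from the current position, and the `^` inside it can only succeed at the start of a line. The following sketch, which uses only the example string from the thread, records the match offsets:

```ruby
sequence = "-------abcde-gh----"

# (?=^-*) is a lookahead containing ^, so it can only succeed where a
# line starts. For this single-line string that is index 0 alone.
first_only = sequence.gsub(/(?=^-*)-/, "X")
puts first_only  # prints X------abcde-gh----

# Collect the start offset of every match of the same pattern.
offsets = []
sequence.scan(/(?=^-*)-/) { offsets << Regexp.last_match.begin(0) }
p offsets  # prints [0]
```

This suggests the single replacement comes from the anchor inside the lookahead rather than from any rule against matching at the same position twice.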

There's another reason why lookbehind won't work: the docs state

"Subexp of look-behind must be fixed character length."

See http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
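A small sketch of what that restriction means in practice, assuming Ruby 1.9 or later where `(?<=...)` is available. The pattern is built with `Regexp.new` from a string so that a rejection shows up at run time as a `RegexpError` that can be rescued; whether and how a given Ruby version rejects it is printed rather than assumed.

```ruby
# A fixed-length lookbehind is accepted: replace a dash preceded by "a".
fixed = "a-b-".gsub(/(?<=a)-/, "X")
puts fixed  # prints aXb-

# (?<=^-*) has a variable-length subexpression, which the quoted rule
# forbids. Report what this interpreter does with it.
begin
  Regexp.new('(?<=^-*)-')
  puts "variable-length lookbehind accepted"
rescue RegexpError => e
  puts "variable-length lookbehind rejected: #{e.message}"
end
```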

At the moment I cannot think of a solution with lookbehind that would
avoid these issues because all lookbehinds must start matching at the
beginning of the sequence in order to fulfill your requirement that the
initial sequence must be replaced.
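An alternative that avoids lookbehind altogether, and that is not mentioned elsewhere in this thread, is the `\G` anchor, which matches where the previous match ended (or at the start of the string for the first match). A minimal sketch, with a hypothetical method name:

```ruby
# Hypothetical helper name, not part of the thread.
def mask_leading_dashes_g(sequence)
  # Each match must begin exactly where the previous one ended, so the
  # chain of replacements stops at the first character that is not '-'.
  sequence.gsub(/\G-/, "X")
end

puts mask_leading_dashes_g("-------abcde-gh----")  # prints XXXXXXXabcde-gh----
puts mask_leading_dashes_g("abcde-gh----")         # prints abcde-gh----
```

This keeps the one-dash-at-a-time shape of the original `gsub` attempt while still anchoring to the beginning of the sequence.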
Robert said:
I am not sure lookbehind is the proper means here. After all you want
to replace a sequence of dashes *before* a particular sequence
(according to your description above). So you would rather use
lookforward, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

Janus said:
The problem with look ahead/forward is, that I have to make sure the
sequence starts with "-". Otherwise something like this could happen:

irb(main):003:0> sequence = 'abcde-gh----'
=> "abcde-gh----"
irb(main):004:0> sequence.gsub(/-(?=-*[abcdefgh])/, "X")
=> "abcdeXgh----"

Well, this won't happen if the sequence "abcde" is known to appear only
after an initial sequence of dashes. (Note the difference between your
lookforward solution and mine: you created a character class while I
just matched the plain sequence). But of course it's not the same as
"replace initial portion of dashes".

I can recommend "Mastering Regular Expressions" if you want to dive
deeper into this. It is well written, contains only as much theory of
regular languages as necessary, and explains the differences between
regular expression implementations very well. It also considers
efficiency aspects.

http://oreilly.com/catalog/9781565922570/

Kind regards

robert
 
