[SUMMARY] Reverse the Polarity (#143)

R

Ruby Quiz

I was glad to see this problem submitted. I worked on it a ways back when I was
translating examples from Higher-Order Perl. It shows up in the book during the
discussion of infinite streams.

Watching others solve it was great fun because I realized that Ruby Quiz fans
are crazy. A couple of you implemented an almost scary amount of the regular
expression syntax.

Probably the craziest solution comes from Jesús who even shoehorned in
backreferences. Let's take a look at parts of that lengthy solution below.

First, some limitations:

# Number of times to repeat for Star and Plus repeaters
TIMES = 2

# Set of chars for Dot and negated [^] char groups
#CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
CHARS = %w{a b c d e}

# ...

Many regular expressions can match an infinite number of Strings. Consider the
trivial expression /a+/. It matches "a", "aa", "aaa", etc. There are two ways
to handle this practically: limit the repetition or provide an infinite series
of matches user code can explore and stop as needed. Jesús did the former.

These variables control how much repetition is done and what character set is
used for constructs that match any character or any character not included in a
listed subset.

If your are curious to see the infinite streams approach, I wrote about it as
part of this blog post:

http://blog.grayproductions.net/articles/infinite_streams

That solution doesn't include the regular expression parsing code though.

Getting back to Jesús's code, the next section defines several classes in the
categories of repeaters and groups. Groups are pieces of a matched String that
can be generated and repeaters control how many times those groups appear.
Let's begin with a trivial repeater that doesn't really repeat:

# ...

class OneTimeRepeater
def initialize(group)
@group = group
end

def result
@group.result
end
end

# ...

As, you can see, this repeater just wraps some group. The result() of this
repeater is the result of that group, one time. Obviously this is used for
things that don't repeat.

The next repeater should be plenty familiar to regular expression fans though:

# ...

class StarRepeater
def initialize(group)
@group = group
end

def result
r = []
group_res = @group.result
group_res.unshift("")
TIMES.times do
r << group_res
end
combine(r).uniq
end
end

# ...

This would be the repeater used to handle expressions like /a*/. The result of
this repeater is zero or more occurrences of the group it was passed.
Practically speaking here that means between one and TIMES copies of the group
or the empty String to represent zero. We see this collected into an Array here
and passed on to combine() for generation.

The PlusRepeater, QuestionMarkRepeater and RangeRepeater classes are constructed
similarly. In the interests of space and time, I'm not going to show those
here.

Now we are ready for the groups. Again we begin with the trivial cases:

# ...

class SingleChar
def initialize(c)
@c = c
end
def result
[@c]
end
end

# ...

class Dot
def result
CHARS
end
end

# ...

Here we see groups that echo a character and represent a special range of
characters as their results. Obviously the first is used for literal chunks of
an expression, like /a/, and the second is used for the special regular
expression character /./.

Let's examine the group used to represent alternations like /a|b/:

# ...

class OrGroup
def initialize(first_groupset, second_groupset)
@first = first_groupset
@second = second_groupset
end

def result
strings = @first.map {|x| x.result}
s = combine(strings)
strings = @second.map {|x| x.result}
s.concat(combine(strings))
end
end

# ...

Since the result()s of a group are just the possible generations of that group,
alternation can be represented as the left choices plus the right choices.
That's what we see here.

Jesús's code includes other groups: CharGroup for character classes,
MultiGroup for nesting groups and managing reference counts, and BackReference
for supporting the regular expression feature of the same name. I'm going to
skip over these too, to keep the size of this summary reasonable.

Now let's have a look at the combine() method I've glossed over twice now:

# ...

# Combines arrays, concatenating each string
# merging the possible groups they have
# Starts combining the first two arrays, then goes on
# combining each other array to the result of the
# previous combination
def combine(arrays)
string = arrays.inject do |r, rep|
temp = []
r.each {|aa| rep.each {|bb| temp << (aa.concat_and_merge_groups(bb))}}
temp
end
string
end

class String
attr_accessor :filled_groups

def add_filled_group(num, group)
@filled_groups ||= {}
@filled_groups[num] = group
end

def concat_and_merge_groups(other)
temp = self + other
groups = {}
groups.merge!(self.filled_groups) if self.filled_groups
groups.merge!(other.filled_groups) if other.filled_groups
temp.filled_groups = groups
temp
end

end

# ...

The combine() method just turns an Array of group result()s into the actual
combinations of Strings. So given [%w[a b], %w[c d]] it outputs %w[ac ad bc
bd].

The String additions here that are tracking groups were added to support
backreferences. This was a challenging feature to support and it required
hooking into the code at many levels. I don't want to focus on it too much
though since it wasn't a critical part of the quiz so much as an impressive
extra Jesús managed to include.

The next method we need to look at is the parser. I'm going to trim some of it
because it's quite long, but you should still get to see how it's put together:

# ...

class Regexp
attr_reader :num_groups
def parse(s, i = 0)
repeaters = []
group = nil
while i < s.length
char = s.chr
case char
# ...
when '.'
group = Dot.new
when '|'
groups, i = parse(s, i + 1)
group = OrGroup.new(repeaters, groups)
return [group], i
# ...
else
group = SingleChar.new(char)
end

repeater = nil
i += 1
if i < s.length
case s.chr
when '*'
repeater = StarRepeater.new(group)
# ...
else
repeater = OneTimeRepeater.new(group)
i -= 1
end
i += 1
else
repeater = OneTimeRepeater.new(group)
end
repeaters << repeater
end
return repeaters, i
end

# ...

As you can see, this is a recursive character by character parser. It reads
through the expression finding groups and wrapping those in repeaters, with
whatever level of nesting is required. This builds up an Abstract Syntax Tree
for the expression.

Let's see how this gets put to use to create the quiz solution method:

# ...

def generate
@num_groups = 0
r = self.inspect[1..-2]
repeaters, _ = self.parse(r)
strings = repeaters.map {|x| x.result}
s = combine(strings)
# Makes a pass for the backreferences
s.each do |string|
string.gsub!(/__(\d+)__/) do |match|
string.filled_groups[$1.to_i]
end
end
s
end
end

The process here is very simple: pull the source of the expression, parse it
into the AST, generate the result() of the AST, and combine() those Strings into
the needed Array of matches. Again, the rest of the code here is used to
substitute backreferences back into the end results.

Jesús also included another method that pretty-printed and verified results,
but I won't go into that here.

As usual I don't want you to miss out on the other solutions. James Koppel used
currying to build up an AST of functions. Vasil Vangelovski sent in a slow but
unique approach that works on all regular expressions without even parsing them.
Do take the time to inspect the solutions. It's worth it.

My thanks to all who poured so much effort into this quiz. You are all
fearless.

Tomorrow we will take a stab at slicing up time...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top