[SUMMARY] Reverse the Polarity (#143)

Discussion in 'Ruby' started by Ruby Quiz, Oct 18, 2007.

  1. Ruby Quiz

    Ruby Quiz Guest

    I was glad to see this problem submitted. I worked on it a ways back when I was
    translating examples from Higher-Order Perl. It shows up in the book during the
    discussion of infinite streams.

    Watching others solve it was great fun because I realized that Ruby Quiz fans
    are crazy. A couple of you implemented an almost scary amount of the regular
    expression syntax.

    Probably the craziest solution comes from Jesús who even shoehorned in
    backreferences. Let's take a look at parts of that lengthy solution below.

    First, some limitations:

    # Number of times to repeat for Star and Plus repeaters
    TIMES = 2

    # Set of chars for Dot and negated [^] char groups
    #CHARS = [("a".."z").to_a, ("A".."Z").to_a, ".", ",", ";"].flatten
    CHARS = %w{a b c d e}

    # ...

    Many regular expressions can match an infinite number of Strings. Consider the
    trivial expression /a+/. It matches "a", "aa", "aaa", etc. There are two ways
    to handle this practically: limit the repetition or provide an infinite series
    of matches user code can explore and stop as needed. Jesús did the former.

    These variables control how much repetition is done and what character set is
    used for constructs that match any character or any character not included in a
    listed subset.

    If your are curious to see the infinite streams approach, I wrote about it as
    part of this blog post:

    http://blog.grayproductions.net/articles/infinite_streams

    That solution doesn't include the regular expression parsing code though.

    Getting back to Jesús's code, the next section defines several classes in the
    categories of repeaters and groups. Groups are pieces of a matched String that
    can be generated and repeaters control how many times those groups appear.
    Let's begin with a trivial repeater that doesn't really repeat:

    # ...

    class OneTimeRepeater
    def initialize(group)
    @group = group
    end

    def result
    @group.result
    end
    end

    # ...

    As, you can see, this repeater just wraps some group. The result() of this
    repeater is the result of that group, one time. Obviously this is used for
    things that don't repeat.

    The next repeater should be plenty familiar to regular expression fans though:

    # ...

    class StarRepeater
    def initialize(group)
    @group = group
    end

    def result
    r = []
    group_res = @group.result
    group_res.unshift("")
    TIMES.times do
    r << group_res
    end
    combine(r).uniq
    end
    end

    # ...

    This would be the repeater used to handle expressions like /a*/. The result of
    this repeater is zero or more occurrences of the group it was passed.
    Practically speaking here that means between one and TIMES copies of the group
    or the empty String to represent zero. We see this collected into an Array here
    and passed on to combine() for generation.

    The PlusRepeater, QuestionMarkRepeater and RangeRepeater classes are constructed
    similarly. In the interests of space and time, I'm not going to show those
    here.

    Now we are ready for the groups. Again we begin with the trivial cases:

    # ...

    class SingleChar
    def initialize(c)
    @c = c
    end
    def result
    [@c]
    end
    end

    # ...

    class Dot
    def result
    CHARS
    end
    end

    # ...

    Here we see groups that echo a character and represent a special range of
    characters as their results. Obviously the first is used for literal chunks of
    an expression, like /a/, and the second is used for the special regular
    expression character /./.

    Let's examine the group used to represent alternations like /a|b/:

    # ...

    class OrGroup
    def initialize(first_groupset, second_groupset)
    @first = first_groupset
    @second = second_groupset
    end

    def result
    strings = @first.map {|x| x.result}
    s = combine(strings)
    strings = @second.map {|x| x.result}
    s.concat(combine(strings))
    end
    end

    # ...

    Since the result()s of a group are just the possible generations of that group,
    alternation can be represented as the left choices plus the right choices.
    That's what we see here.

    Jesús's code includes other groups: CharGroup for character classes,
    MultiGroup for nesting groups and managing reference counts, and BackReference
    for supporting the regular expression feature of the same name. I'm going to
    skip over these too, to keep the size of this summary reasonable.

    Now let's have a look at the combine() method I've glossed over twice now:

    # ...

    # Combines arrays, concatenating each string
    # merging the possible groups they have
    # Starts combining the first two arrays, then goes on
    # combining each other array to the result of the
    # previous combination
    def combine(arrays)
    string = arrays.inject do |r, rep|
    temp = []
    r.each {|aa| rep.each {|bb| temp << (aa.concat_and_merge_groups(bb))}}
    temp
    end
    string
    end

    class String
    attr_accessor :filled_groups

    def add_filled_group(num, group)
    @filled_groups ||= {}
    @filled_groups[num] = group
    end

    def concat_and_merge_groups(other)
    temp = self + other
    groups = {}
    groups.merge!(self.filled_groups) if self.filled_groups
    groups.merge!(other.filled_groups) if other.filled_groups
    temp.filled_groups = groups
    temp
    end

    end

    # ...

    The combine() method just turns an Array of group result()s into the actual
    combinations of Strings. So given [%w[a b], %w[c d]] it outputs %w[ac ad bc
    bd].

    The String additions here that are tracking groups were added to support
    backreferences. This was a challenging feature to support and it required
    hooking into the code at many levels. I don't want to focus on it too much
    though since it wasn't a critical part of the quiz so much as an impressive
    extra Jesús managed to include.

    The next method we need to look at is the parser. I'm going to trim some of it
    because it's quite long, but you should still get to see how it's put together:

    # ...

    class Regexp
    attr_reader :num_groups
    def parse(s, i = 0)
    repeaters = []
    group = nil
    while i < s.length
    char = s.chr
    case char
    # ...
    when '.'
    group = Dot.new
    when '|'
    groups, i = parse(s, i + 1)
    group = OrGroup.new(repeaters, groups)
    return [group], i
    # ...
    else
    group = SingleChar.new(char)
    end

    repeater = nil
    i += 1
    if i < s.length
    case s.chr
    when '*'
    repeater = StarRepeater.new(group)
    # ...
    else
    repeater = OneTimeRepeater.new(group)
    i -= 1
    end
    i += 1
    else
    repeater = OneTimeRepeater.new(group)
    end
    repeaters << repeater
    end
    return repeaters, i
    end

    # ...

    As you can see, this is a recursive character by character parser. It reads
    through the expression finding groups and wrapping those in repeaters, with
    whatever level of nesting is required. This builds up an Abstract Syntax Tree
    for the expression.

    Let's see how this gets put to use to create the quiz solution method:

    # ...

    def generate
    @num_groups = 0
    r = self.inspect[1..-2]
    repeaters, _ = self.parse(r)
    strings = repeaters.map {|x| x.result}
    s = combine(strings)
    # Makes a pass for the backreferences
    s.each do |string|
    string.gsub!(/__(\d+)__/) do |match|
    string.filled_groups[$1.to_i]
    end
    end
    s
    end
    end

    The process here is very simple: pull the source of the expression, parse it
    into the AST, generate the result() of the AST, and combine() those Strings into
    the needed Array of matches. Again, the rest of the code here is used to
    substitute backreferences back into the end results.

    Jesús also included another method that pretty-printed and verified results,
    but I won't go into that here.

    As usual I don't want you to miss out on the other solutions. James Koppel used
    currying to build up an AST of functions. Vasil Vangelovski sent in a slow but
    unique approach that works on all regular expressions without even parsing them.
    Do take the time to inspect the solutions. It's worth it.

    My thanks to all who poured so much effort into this quiz. You are all
    fearless.

    Tomorrow we will take a stab at slicing up time...
    Ruby Quiz, Oct 18, 2007
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Rastislav Struharik

    Reverse engineering an EDIF file?

    Rastislav Struharik, Nov 10, 2003, in forum: VHDL
    Replies:
    8
    Views:
    7,967
    Joonas Timo Taavetti Kekoni
    Jan 2, 2004
  2. Danny
    Replies:
    1
    Views:
    4,673
    Andrew Thompson
    May 26, 2004
  3. dogbite
    Replies:
    4
    Views:
    693
    osmium
    Oct 10, 2003
  4. Ruby Quiz
    Replies:
    3
    Views:
    134
    Jesús Gabriel y Galán
    Oct 16, 2007
  5. Matthew Moss
    Replies:
    1
    Views:
    153
    ThoML
    May 9, 2008
Loading...

Share This Page