[QUIZ] Genetic Programming (#212)

D

Daniel Moore

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2. Support Ruby Quiz by submitting ideas and responses
as often as you can!
Visit: http://rubyquiz.strd6.com/suggestions

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

The modifications made to the programs are based on techniques
observed in biological evolution: mutation and crossover. In mutation
a piece of one program is altered randomly, for example `x + 3` could
become `x + (y - 1)`. The mutation occurs at one node in the parse
tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly
altering the node it is taken from another node of another program.
For example: `x + (y * x)` and `3 - (y + x*x)` could yield something
like `(y*x) - (y + x*x)`.

In order to mutate our programs it would be nice to work with the
structure of the code directly. The ParseTree[2] gem provides a way of
getting an Abstract Syntax Tree[3] to manipulate. The AST represents
the code as S-expressions[4], arrays containing operators as the first
argument and the operands as the remaining arguments. If you are
familiar with LISP then you know that LISP programmers program
directly in S-expressions. Using S-expressions makes evolution and
manipulation of the programs much easier, as S-expressions are the
essence of programs.

When manipulating your ASTs you may need a deep copy to prevent
unwanted side effects Here is a deep copy idiom that may be useful:

def deep_copy(tree)
Marshal.load(Marshal.dump(tree)
end

In order to get the AST back into executable ruby you'll need
ruby2ruby[5] or something like it. As always, feel encouraged to use
and share other tools if you feel they are good for the task. Also, if
you have any questions don't be afraid to ask, there are many people
on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would
like to evolve an algorithm for:
["x", "y", "result"]
[8, 16, 20808]
[22, 31, 150847]
[5, 16, 20685]
[12, 19, 34895]
[18, 25, 79349]
[20, 33, 181525]
[31, 1, -119]
[19, 33, 181433]
[0, 12, 8640]
[13, 12, 9017]

In order to determine how well suited a program is we can use the
following metric function:

# Return how far off from the data the given method is
# A lower score is better, a score of 0 matches the data perfectly
def fitness(program, data)
delta = 0
data.each do |row|
value = program.calculate(row[0], row[1])
delta += (value - row[2]).abs
end
return delta
end

Have Fun!

[1]: http://en.wikipedia.org/wiki/Genetic_programming
[2]: http://rubyforge.org/projects/parsetree/
[3]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[4]: http://en.wikipedia.org/wiki/S-Expression
[5]: http://seattlerb.rubyforge.org/ruby2ruby/
 
S

Sander Land

%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}], [:-@], [:+,:*,:-])
DATA =3D [[8, 16, 20808] , [22, 31, 150847], [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
[20, 33, 181525], [31, 1, -119] , [19, 33, 181433], [0,
12, 8640], [13, 12, 9017]]
SIZE_PENALTY =3D 100
def fitness
-DATA.inject(0){|f,(x,y,z)| f + ( z - eval_genes:)x=3D>x,:y=3D>y)
).abs } - size * SIZE_PENALTY
end
end

Population.new(Quiz212).evolve_on_console(200)

The best result I got using size penalty 100 has fitness -2296 and tree:
[:*, [:term, :y], [:+, [:+, [:term, :y], [:term, :x]], [:*, [:*,
[:term, -5], [:term, :y]], [:-@, [:term, :y]]]]]
Which simplifies to 5y^3 + y^2 + xy.

Not including a penalty for size in the fitness function causes bloat.
However, including such a penalty causes it to never find the lower
order terms as the smaller size is considered more important than
slightly better accuracy. This is a well-known problem in GP.


-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-= =3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-

The three rules of Ruby Quiz:

1. =A0Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2. =A0Support Ruby Quiz by submitting ideas and responses
as often as you can!
Visit: http://rubyquiz.strd6.com/suggestions

3. =A0Enjoy!

Suggestion: =A0A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. =A0Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-= =3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-

## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

The modifications made to the programs are based on techniques
observed in biological evolution: mutation and crossover. In mutation
a piece of one program is altered randomly, for example `x + 3` could
become `x + (y - 1)`. The mutation occurs at one node in the parse
tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly
altering the node it is taken from another node of another program.
For example: `x + (y * x)` and `3 - (y + x*x)` could yield something
like `(y*x) - (y + x*x)`.

In order to mutate our programs it would be nice to work with the
structure of the code directly. The ParseTree[2] gem provides a way of
getting an Abstract Syntax Tree[3] to manipulate. The AST represents
the code as S-expressions[4], arrays containing operators as the first
argument and the operands as the remaining arguments. If you are
familiar with LISP then you know that LISP programmers program
directly in S-expressions. Using S-expressions makes evolution and
manipulation of the programs much easier, as S-expressions are the
essence of programs.

When manipulating your ASTs you may need a deep copy to prevent
unwanted side effects Here is a deep copy idiom that may be useful:

=A0 =A0def deep_copy(tree)
=A0 =A0 =A0Marshal.load(Marshal.dump(tree)
=A0 =A0end

In order to get the AST back into executable ruby you'll need
ruby2ruby[5] or something like it. As always, feel encouraged to use
and share other tools if you feel they are good for the task. Also, if
you have any questions don't be afraid to ask, there are many people
on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would
like to evolve an algorithm for:
=A0 =A0["x", "y", "result"]
=A0 =A0[8, 16, 20808]
=A0 =A0[22, 31, 150847]
=A0 =A0[5, 16, 20685]
=A0 =A0[12, 19, 34895]
=A0 =A0[18, 25, 79349]
=A0 =A0[20, 33, 181525]
=A0 =A0[31, 1, -119]
=A0 =A0[19, 33, 181433]
=A0 =A0[0, 12, 8640]
=A0 =A0[13, 12, 9017]

In order to determine how well suited a program is we can use the
following metric function:

=A0 =A0# Return how far off from the data the given method is
=A0 =A0# A lower score is better, a score of 0 matches the data perfectly
=A0 =A0def fitness(program, data)
=A0 =A0 =A0delta =3D 0
=A0 =A0 =A0data.each do |row|
=A0 =A0 =A0 =A0value =3D program.calculate(row[0], row[1])
=A0 =A0 =A0 =A0delta +=3D (value - row[2]).abs
=A0 =A0 =A0end
=A0 =A0 =A0return delta
=A0 =A0end

Have Fun!

[1]: http://en.wikipedia.org/wiki/Genetic_programming
[2]: http://rubyforge.org/projects/parsetree/
[3]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[4]: http://en.wikipedia.org/wiki/S-Expression
[5]: http://seattlerb.rubyforge.org/ruby2ruby/
 
B

brabuhr

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

Some time ago, I ran across DRP, Directed Ruby Programming[1], and
thought that might be something to look at later; I thought this quiz
might be a bit of a chance. (Though I only barely scratched the surface
of the library's documentation.) First, here are some sample functions
my programs generated (fitness: function):

89416: 4312 + y * 7 * y * x
82779: ((x * 4 * (93 + y * 6 + x + 9 - x)) - 3) * y / 8 - 62 + 1 * x - 3
- 80 + 1395 + 0 / 870 + y * 6 + y - y - ((y)) - x - y + y - 16
- 657 + 880 - 7 * (6 / 6) * (y / 1) - 870 + 19 + y * y * (y) - y
- (62935) / (((x - x) - y / 1 * y / y) - x) - ((y + 4 - y + y * y
+ x)) + 6 - x * 6 * (((8 + 6 / 2563 * (y) + x / 9871671)) - y - 7
- x - x - 8) - 3361 + 9 - (43) / 675670 - y - x - 45 + 8 + (y / 1)
- 870 + 19 + y * y * (y) - y + y * x * (6 / 6) + 870 + 19 + y + x
* x - ((((10)))) - 8 + y - (x)
17387: x + (y) * 5 * y * y - ((x + y)) / 65 / y / y + y + 2 / (y * x +
95381 - x + x) - y * 0 / 50 * x * ((x)) * 7 - x * 4 + 4 * 9 * 6 *
y - 0 * (((y + x + x))) * (x + y) * (y + 78 * ((x / y * (y + 22 -
9) - ((x + y) - 684) - x - y - x + ((7 + y * ((785 / 4 - x * 8)) +
(8 - 4) + 8 / 7 + x + x))) / 4) - (8 - 9 * x - 0) * x + 2) * 91 *
76 * x + 220 + x * 4 - 2 * ((x)) * (y) * 4 - y * ((x / (y * 34 + y
+ 4 - x * 8)) + 24) + y * 1 - (y) * (((y + x + x)))
8570: 5 * y * + + y * y - 29
8424: y * + 5 * + y * y + - x
8360: 5 * y * y * y / y * y - x / - - - y
8346: 5 * y * y * y - - - 1 + x - x
8338: y * ((5 * y)) * ((y))
8102: (8) - y + 60 + y * y * 5 * y
8056: 47 + 5 * y * y * y
7288: 988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - - 7
3948: y * 5 * (y + (y) * y - x)

Following the first example in drp's INTRO file, I created a simple class
to generate random initial functions:

require 'drp'

class ProgramGenerator
extend DRP::RuleEngine

begin_rules
def program
"Class.new{ #{function} }"
end

def function
"def calculate(x, y); #{expression}; end"
end

def expression; "#{expression} #{binaryop} #{expression}"; end
def expression; "#{atom} #{binaryop} #{expression}"; end
def expression; "#{expression} #{binaryop} #{atom}"; end
def expression; "(#{expression})"; end
def expression; "#{atom}"; end

def binaryop; "+"; end
def binaryop; "-"; end
def binaryop; "*"; end
def binaryop; "/"; end

def unaryop; "-"; end

def atom; "#{variable}"; end
def atom; "#{literal}"; end

def variable; "x"; end
def variable; "y"; end

def literal; "#{rand(10)}"; end
def literal; "#{rand(9) + 1}#{literal}"; end
end_rules
end

irb(main):001:0> require 'program_generator'
=> true
irb(main):002:0> pg = ProgramGenerator.new
...
irb(main):003:0> pg.expression
=> "x"
irb(main):004:0> pg.expression
=> "(y * 4 + 924)"
irb(main):006:0> pg.function
=> "def calculate(x, y); 26 / x + (60 - x) + y; end"
irb(main):007:0> pg.program
=> "Class.new{ def calculate(x, y); 136 * x + 14; end }"

I setup a "magic" hash to store my population of programs and a few
helpful methods:

Programs = Hash.new{|hash, program|
hash[program] = fitness(program)
}

class <<Programs
def generate_some_more(n = 100, pg = ProgramGenerator.new)
n.times{ self[pg.expression] }
self
end

def reject_the_unfit
self.delete_if{|program, fit| !fit}
end

def select_the_best(n = 5)
self.sort_by_fitness[0, n]
end

def sort_by_fitness(dont_randomize = true)
self.reject_the_unfit
self.sort_by{|program, fitness| dont_randomize ? fitness : rand(fitness)}
end
end

irb(main):001:0> Programs
=> {}
irb(main):002:0> Programs.generate_some_more(2)
=> {"y"=>687122, "(x - 4 * x / 31 + (0 * 964 / 65 / y - 8 + 4 + (y * ((y / x *
54 - (x / ((986 - y + x - 24 + ((6 / 197 / 29 - 287)) / 9 + y * 2 / 2 * 7)) *
23 - 7) + x - 44 * 659 + (y * 1 - 8 / y / ((5 * x)) - 5) * 7632 / ((y)) / x /
8856))) / x + x) - ((x))) - 7 - 0 - x"=>nil}
irb(main):003:0> Programs.select_the_best(1)
=> [["y", 687122]]

And wrapped that in a loop to generate/mutate a bunch of functions:

pm = ProgramMutator.new

1000.times do
Programs.generate_some_more.
sort_by_fitness(true).
each_cons(2).
each do |(p1, f1), (p2, f2)|
new_program = case rand(10)
when 3..9 then pm.cross_breed(p1, p2)
when 2 then pm.mutate(p2)
when 1 then pm.mutate(p1)
else ProgramGenerator.new.expression
end
new_fitness = Programs[new_program]
#puts "#{f1} X #{f2}\t#{new_fitness}"
end

program, fitness = Programs.select_the_best(1).flatten
puts "Best: #{fitness}\t#{program[0..60]}"
end
ruby1.9 -rubygems 212.rb
Best: 569018 28298
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 187980 0 / y - - 8 * 25 * y * x
Best: 173870 248 * y * (x)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 92312 8526 - x + y * + 8 * y * x
^C

That leaves the ProgramMutator class:

class ProgramMutator
def cross_breed(p1, p2)
#...
end

def mutate(program)
#...
end
end

The machine I'm using today doesn't have ParseTree ("ParseTree doesn't work
with ruby 1.9.0 (LoadError)"). So, I was just doing string manipulations.
I don't know which functions produced which of the results from above, but
a few samples:

def cross_breed(p1, p2)
p1 + %{w + - * /}[rand(4)] + p2
end

def cross_breed(p1, p2)
s1 = p1[0..rand(p1.size)]
s2 = p2[rand(p2.size)..-1]

s1 + %w{ + - * / }[rand(4)] + s2
end

def mutate(program)
program.gsub(
"#{%w{ + - * / }[rand(4)]}",
"#{%w{ + - * / }[rand(4)]}"
)
end

[1] http://drp.rubyforge.org/
 
B

brabuhr

some sample functions my programs generated (fitness: function):

89416: ...
8338: =A0y * ((5 * y)) * ((y))
8102: =A0(8) - y + 60 + y * y * 5 * y
8056: =A047 + 5 * y * y * y
7288: =A0988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - = - 7
3948: =A0y * 5 * (y + (y) * y - x)

I let it run over night:

3804: y * y + y * + y * + + - + - y * 5
 
B

brabuhr

And now what you've all been waiting for... the mystery function:

=A0 =A0def mystery(x, y)
=A0 =A0 =A03 * x * y + 5 * (y**3) - 7 * x
=A0 =A0end

Cool. I was not thinking and I rebooted while my script and output
were in /tmp; the machine is set to clear tmp on boot: so, I lost my
lowest-delta functions. But, I recovered these:
DATA =3D [[8, 16, 20808], [22, 31, 150847], [5, 16, 20685], [12, 19, 34895=
],
[18, 25, 79349], [20, 33, 181525], [31, 1, -119] , [19, 33, 181433=
],
[0, 12, 8640], [13, 12, 9017]]

# An example of one of the closer ones I found:

f =3D "(y) * + x - - - - + x - x - 3 * 82 / y + - 5 + + + x - y * y + y
- 2 * (x + y) / 459 + + - + - - 5 + (y) - 2 + - + 5 + (y) / + 5 - - -
+ + - + x - 3 * 8 + x - + (y) * 5 - x - - x + (y) * 5 + y * (y) / (y)
* y * (y) * 5 + y * (y) * 5 / 3 / (y) * - - + x - 3 + (y) * y / (y) *
y"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =3D
eval(f)}\t#{delta =3D (ev - av).abs}"; delta}.inject{|s,v| v +=3D s}
puts

# I think this was the second best I found; unfortunately, I did not save
# the original output. The version here was after some hand editing while
# pasting it into a spreadsheet, e.g.:
# "x + + + - + y" =3D> "x - y"; "x + - + - + y" =3D> "x + y"; "(x)" =3D=

f =3D "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =3D
eval(f)}\t#{delta =3D (ev - av).abs}"; delta}.inject{|s,v| v +=3D s}
puts

# I added "- 5" to the above:

f =3D "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x - 5"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =3D
eval(f)}\t#{delta =3D (ev - av).abs}"; delta}.inject{|s,v| v +=3D s}
puts
8 16 20808 20816 8
22 31 150847 150842 5
5 16 20685 20681 4
12 19 34895 34916 21
18 25 79349 79369 20
20 33 181525 181526 1
31 1 -119 -128 9
19 33 181433 181435 2
0 12 8640 8602 38
13 12 9017 9057 40
148

8 16 20808 20810 2
22 31 150847 150859 12
5 16 20685 20681 4
12 19 34895 34902 7
18 25 79349 79361 12
20 33 181525 181530 5
31 1 -119 -78 41
19 33 181433 181436 3
0 12 8640 8625 15
13 12 9017 9028 11
112

8 16 20808 20805 3
22 31 150847 150854 7
5 16 20685 20676 9
12 19 34895 34897 2
18 25 79349 79356 7
20 33 181525 181525 0
31 1 -119 -83 36
19 33 181433 181431 2
0 12 8640 8620 20
13 12 9017 9023 6
92
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top