very simple Genetic Algorithm completed

Discussion in 'Python' started by Matthew_WARREN@bnpparibas.com, Jan 31, 2008.

  1. Guest

    Hi,

    I got some help with this from here, and there's been a little bit of
    discussion around GA's recently, so thought I'd post up my likey slow and
    clunky version of a GA that in essence just 'evolves' a solution to 'make a
    sum that evaluates to n using */+-0123456789' it's a really simple GA that
    would be useful for someone who doesn't quite get GA's to look at.

    I think it's simple enough to be fairly self explanatory.

    to see it come up with evolved solutions to n=1000

    >>>from quickga import *
    >>>evolve()


    I like playing with stuff like this. I'm going to use this little toy to
    investigate how mutation rates/crossover gene length, pop size etc.. etc..
    interact with each other. All completely unscientifically and for my own
    bemusement.

    One point, it's a good idea to keep mutationrate around 1000 - 10000 with
    genome and population sizes of say 50 - 100. Too low and you get no
    solution as the mutations mix up the genome too much for selection
    pressure to work.


    ....as this actually does need to go as quick as it can, and if anyone feels
    like it, I'd really appreciate picking it over on the list for
    optimization. I'm not too familiar with Pthon internals, nor programming
    for speed in general.

    <pre>
    from random import randint
    from operator import itemgetter


    genes=['+','-','*','/','0','1','2','3','4','5','6','7','8','9']
    signs=['+','-','*','/']
    digits=['1','2','3','4','5','6','7','8','9']

    table = {"++": "+", "+-": "-", "+*": "+", "+/": "+",
    "-+": "-", "--": "+", "-*": "-", "-/": "-",
    "*+": "*", "**": "*", "*/": "*",
    "/+": "/", "/*": "/", "//": "/",
    "+0":"+","*0":"*","-0":"-","/0":"/"} # keep out octal literals

    def rationalise_signs(s):
    """Takes the genome string and twiddles it so eval() will work as
    expected
    """
    prev = ''
    while s != prev:
    prev=s
    for z in ['+','-','*','/']:
    s=s.replace(z+'0',z)
    for key, value in table.items():
    s = s.replace(key, value)
    s=s.lstrip('0')
    s=s.strip('+-*/')
    return s



    def generate(number,length):
    """Generate the initial population of genome strings
    """
    population=[]
    for i in range(number):
    s=rationalise_signs(''.join([
    genes[randint(0,len(genes))-1] for n in range(length) ]))
    population.append(s)
    return population


    def floatify(intstring):#So eval() be floating point.
    """kludge to ensure eval() does floating point math
    """
    prev=''
    while intstring != prev:
    prev=intstring
    for sign in signs:
    for digit in digits:

    intstring=intstring.replace(digit+sign,digit+'.0'+sign)
    return intstring

    def express(population):
    """Get the 'expression' of the genome.
    """
    expressed_population=[]
    for individual in population:
    s=floatify(individual)
    expressed_population.append((individual,eval(s)))
    return expressed_population

    def fitness(expressed_population,fitvalue,tolerance):
    """Test the expressed genome for fitness
    """
    population_fitness=[]
    sumfitness=0
    for expressed_individual in expressed_population:
    individual,expression=expressed_individual
    fitness=abs(fitvalue-expression)
    sumfitness=sumfitness+fitness
    population_fitness.append((individual,expression,fitness))
    avgfitness=sumfitness/len(expressed_population)
    return (population_fitness,avgfitness)



    def get_fittest(population_fitness,pct,full=False):
    """Quick n dirty way of selecting - top n% fittest individuals
    """
    population_fitness.sort(key=itemgetter(2))#sort on fitness
    npct=(len(population_fitness)/100.0)*pct
    if not full:
    return [ n[0] for n in population_fitness[0:int(npct)] ]
    else:
    return population_fitness[0:int(npct)]


    def mutate(individual,rate):
    """Does what it says on the tin. Mutates per gene
    if rate is 10000 mutatuion rate is 1 in 10000 on avg
    """
    newindividual=''
    for gene in individual:
    if randint(0,rate)==1:
    newgene=genes[randint(0,14)-1]
    newindividual=newindividual+newgene
    else:
    newindividual=newindividual+gene
    return newindividual

    def breed_new(individuals,number,mutationrate):#crossover with mutation
    """simple crossover of the two genomes around a point, then mutate
    """
    newpopulation=[]
    num_individuals=len(individuals)
    while len(newpopulation)<=number:
    lady=individuals[randint(0,num_individuals-1)]
    man=individuals[randint(0,num_individuals-1)]
    xpoint=randint(0,100)
    xlady=(len(lady)/100.0)*xpoint
    xman=(len(man)/100.0)*xpoint
    leftxlady=lady[:int(xlady)]
    rightxlady=lady[int(xlady):]
    leftxman=man[:int(xman)]
    rightxman=man[int(xman):]

    new1=rationalise_signs(mutate(leftxlady+rightxman,mutationrate))

    new2=rationalise_signs(mutate(leftxman+rightxlady,mutationrate))
    newpopulation.append(new1)
    newpopulation.append(new2)
    return newpopulation


    def
    evolve(popsize=50,genomelength=100,mutationrate=10000,fitcullpct=10,numsolutions=5,target=1000,tolerance=1):
    """Controls the whole process.
    """
    pop=generate(popsize,genomelength)
    fitgens=[]
    generation=1
    while len(fitgens)<numsolutions:
    epop=express(pop)
    fpop,avg=fitness(epop,target,tolerance)
    print "Average fitness",avg
    if avg>tolerance:

    pop=breed_new(get_fittest(fpop,fitcullpct),popsize,mutationrate)
    generation=generation+1
    else:
    print "Pop avg fitness within tolerance"
    print "********************************"
    fitgens.append((fpop[0:],generation))
    pop=generate(popsize,genomelength)
    generation=1
    outlist=[]
    for fitpop,generation in fitgens:
    bestfitpop=get_fittest(fitpop,20,full=True)
    for fitgeneinfo in bestfitpop:
    genome,number,avgfit=fitgeneinfo
    prev=''
    s=floatify(genome)
    outlist.append(genome+" in "+str(generation)+"
    generations got "+str(number)+" avg fit ="+str(avgfit))
    for line in set(outlist):
    print line

    </pre>


    Matt. (Apologies for any disclaimers)


    This message and any attachments (the "message") is
    intended solely for the addressees and is confidential.
    If you receive this message in error, please delete it and
    immediately notify the sender. Any use not in accord with
    its purpose, any dissemination or disclosure, either whole
    or partial, is prohibited except formal approval. The internet
    can not guarantee the integrity of this message.
    BNP PARIBAS (and its subsidiaries) shall (will) not
    therefore be liable for the message if modified.
    Do not print this message unless it is necessary,
    consider the environment.

    ---------------------------------------------

    Ce message et toutes les pieces jointes (ci-apres le
    "message") sont etablis a l'intention exclusive de ses
    destinataires et sont confidentiels. Si vous recevez ce
    message par erreur, merci de le detruire et d'en avertir
    immediatement l'expediteur. Toute utilisation de ce
    message non conforme a sa destination, toute diffusion
    ou toute publication, totale ou partielle, est interdite, sauf
    autorisation expresse. L'internet ne permettant pas
    d'assurer l'integrite de ce message, BNP PARIBAS (et ses
    filiales) decline(nt) toute responsabilite au titre de ce
    message, dans l'hypothese ou il aurait ete modifie.
    N'imprimez ce message que si necessaire,
    pensez a l'environnement.
     
    , Jan 31, 2008
    #1
    1. Advertising

  2. Paul McGuire Guest

    On Jan 31, 10:43 am, wrote:
    > Hi,
    >
    > I got some help with this from here, and there's been a little bit of
    > discussion around GA's recently, so thought I'd post up my likey slow and
    > clunky version of a GA that in essence just 'evolves' a solution to 'make a
    > sum that evaluates to n using */+-0123456789'  it's a really simple GA that
    > would be useful for someone who doesn't quite get GA's to look at.
    >
    > I think it's simple enough to be fairly self explanatory.
    >
    > to see it come up with evolved solutions to n=1000
    >
    > >>>from quickga import *
    > >>>evolve()

    >


    Some improvement tips:

    0. Tack this bit onto the end of quickga.py, and you wont have to
    write a separate routine to import quickga and invoke evolve():

    if __name__ == "__main__":
    evolve()


    1. Remove all calls to floatify, and instead start your program with:
    from __future__ import division
    On my system this gained about 16% improvement.


    2. Bugfix: In 2 places, change:
    newgene=genes[randint(0,14)-1]
    to
    newgene=genes[randint(0,14-1)]

    randint(a,b) returns values from a to b inclusive, and genes is a list
    containing 14 elements (so you want subscripts from 0 to 13). (Using
    subscripts from -1 to 13 will bias the selection of genes to use twice
    as many of the last item in the list, since both -1 and 13 will give
    that value.)

    Similarly, change:

    s=rationalise_signs(''.join([ genes[randint(0,len(genes))-1] ...
    to:
    s=rationalise_signs(''.join([ genes[randint(0,len(genes)-1)] ...


    3. Change mutate to build a list instead of a string. Then use
    ''.join() at the end to convert the list into a single string return
    value.

    def mutate(individual,rate):
    """Does what it says on the tin. Mutates per gene
    if rate is 10000 mutatuion rate is 1 in 10000 on avg
    """
    newindividual=[]
    for gene in individual:
    if randint(0,rate)==1:
    newgene=genes[randint(0,14-1)]
    newindividual.append(newgene)
    else:
    newindividual.append(gene)
    return ''.join(newindividual)


    (This was less of a speed improvement than I thought it would be, but
    IIRC, the optimization of successive string concatentions is only
    available when running Python on Windows. If you are running on Linux,
    this should have more benefit.)


    4. Include psyco to cut execution time in half. Put this at the top
    of your program, right after "from __future__ ...":

    try:
    import psyco
    except ImportError:
    print ("Running without psyco optimization")
    else:
    psyco.full()

    If psyco is available, this will optimize all called functions.


    5. On a hunch that many of your individuals are the same string, I
    lifted the call to eval out of express() into a separate routine
    called evaluate(), and added a memoizing cache to skip the call to
    eval if the string had been eval'ed before - if so, the value is
    reported from the cache. This chopped another 40% off the runtime.

    evalcache = {}
    def evaluate(s):
    if s in evalcache:
    ret = evalcache
    else:
    ret = eval(s)
    evalcache = ret
    return ret

    (A note on timing this code: since there are many calls to
    randomization methods, consistent timing requires an explicit call to
    random.seed() to ensure that a consistent set of random numbers is
    used. Otherwise, the timing gets thrown off by the randomization,
    which sends the program down different paths.)


    6. This is another change that had little effect, but I'll at least
    put it out there (a leftover from an algorithmic optimization course I
    took ages ago). Instead of:

    fitness=abs(fitvalue-expression)

    try using:

    fitness=(fitvalue-expression)*(fitvalue-expression)

    For some optimization methods (GA is not one of them), the
    discontinuity of abs at the origin delays convergence of the
    algorithm, whereas computing the square of the difference *is*
    continuous at 0 *and* still ensures a positive value. Just gave it a
    try out of academic curiosity, but just a dead end after all.

    Cheers,
    -- Paul
     
    Paul McGuire, Feb 1, 2008
    #2
    1. Advertising

  3. Paul McGuire Guest

    On Jan 31, 10:43 am, wrote:
    > Hi,
    >
    > I got some help with this from here, and there's been a little bit of
    > discussion around GA's recently, so thought I'd post up my likey slow and
    > clunky version of a GA that in essence just 'evolves' a solution to 'make a
    > sum that evaluates to n using */+-0123456789'  it's a really simple GA that
    > would be useful for someone who doesn't quite get GA's to look at.
    >
    > I think it's simple enough to be fairly self explanatory.
    >
    > to see it come up with evolved solutions to n=1000
    >
    > >>>from quickga import *
    > >>>evolve()

    >
    > I like playing with stuff like this. I'm going to use this little toy to
    > investigate how mutation rates/crossover gene length, pop size etc.. etc..
    > interact with each other. All completely unscientifically and for my own
    > bemusement.


    Another interesting technique, similar to GA, is SA or Simulated
    Annealing. You should be able to adapt your quickga.py program to an
    SA approach without too much trouble, and comparing the two should
    tickle your academic bemusement.

    -- Paul
     
    Paul McGuire, Feb 1, 2008
    #3
  4. Carl Banks Guest

    On Feb 1, 9:09 am, Paul McGuire <> wrote:
    > 2. Bugfix: In 2 places, change:
    > newgene=genes[randint(0,14)-1]
    > to
    > newgene=genes[randint(0,14-1)]
    >
    > randint(a,b) returns values from a to b inclusive, and genes is a list
    > containing 14 elements (so you want subscripts from 0 to 13). (Using
    > subscripts from -1 to 13 will bias the selection of genes to use twice
    > as many of the last item in the list, since both -1 and 13 will give
    > that value.)



    Better yet, use random.randrange: it was added for this very reason,
    to get a random index in a range.

    Perhaps even better still, use random.choice. It gets a random
    element in a sequence.


    Carl Banks
     
    Carl Banks, Feb 1, 2008
    #4
  5. On Fri, 01 Feb 2008 06:09:49 -0800, Paul McGuire wrote:

    > IIRC, the optimization of successive string concatentions is only
    > available when running Python on Windows. If you are running on Linux,
    > this should have more benefit.)


    There's no reason to believe it is platform-dependent, although it is
    implementation-dependent (only works in CPython). It was introduced in
    Python 2.4:

    http://www.python.org/doc/2.4/whatsnew/node12.html



    --
    Steven
     
    Steven D'Aprano, Feb 1, 2008
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. SNAKE

    genetic algorithm

    SNAKE, Nov 4, 2003, in forum: C++
    Replies:
    2
    Views:
    358
    Allan Bruce
    Nov 4, 2003
  2. Sean Ross

    OT: Genetic Algorithm Recipe Bug Fix

    Sean Ross, Jul 3, 2003, in forum: Python
    Replies:
    6
    Views:
    374
    Andrew Dalke
    Jul 18, 2003
  3. Max

    Python Genetic Algorithm

    Max, Jan 27, 2008, in forum: Python
    Replies:
    14
    Views:
    1,123
    Wildemar Wildenburger
    Jan 28, 2008
  4. olivier.melcher

    Help running a very very very simple code

    olivier.melcher, May 12, 2008, in forum: Java
    Replies:
    8
    Views:
    2,381
  5. Robo
    Replies:
    4
    Views:
    195
    Peter Hickman
    Dec 15, 2004
Loading...

Share This Page