Strings and Lists

T

Tom Longridge

My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.
 
A

ajikoe

Hello Tom,

I think it is more efficient if we can use list (with True,False)
member to do genetics algorithms. Of course a lot of works to do to
change from string binary into boolean list.

I do programming genetics algorithms in C# I guess I have to modify my
program also because my old program use binary string manipulation.

Sincerely Yours,
Pujo
 
S

Sidharth Kuruvila

Hi,
I not sure what sorts of operations you plan to do. But if you
intend to use fixed length arrays or even carrying out repetetive
operations. You should probably look at numeric
http://numeric.scipy.org/
 
B

Bill Mill

My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.

Tom,

it seems to me that, if efficiency is your main goal, you should store
your data as a list of integers, and use the bit-twiddling operators
to get at your data. These should be *very* fast, as well as memory
efficient.

Peace
Bill Mill
bill.mill at gmail.com
 
P

Peter Hansen

Tom said:
My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

Any advice would be gladly recieved.

It depends more on the operations you are performing, and
more importantly *which of those you have measured and
found to be slow*, than on anything else.

If, for example, you've got a particular complex set of
slicing, dicing, and mutating operations going on, then
that might say use one type of data structure.

If, on the other hand, the evaluation of the fitness
function is what is taking most of the time, then you
should focus on what that algorithm does and needs
and, after profiling (to get real data rather than
shots-in-the-dark), you can pick an appropriate data
structure for optimizing those operations.

Strings seem to be what people pick, by default, without
much thought, but I doubt they're the right thing
for the majority of GA work...

-Peter
 
D

Donn Cave

My current Python project involves lots repeatating code blocks,
mainly centred around a binary string of data. It's a genetic
algorithm in which there are lots of strings (the chromosomes) which
get mixed, mutated and compared a lot.

Given Python's great list processing abilities and the relative
inefficiencies some string operations, I was considering using a list
of True and False values rather than a binary string.

I somehow doubt there would be a clear-cut answer to this, but from
this description, does anyone have any reason to think that one way
would be much more efficient than the other? (I realise the best way
would be to do both and `timeit` to see which is faster, but it's a
sizeable program and if anybody considers it a no-brainer I'd much
rather know now!)

I make no representations about how much mileage you would
get out of it, but if character data suits your purposes and
you just need a mutable array - in principle, I think it would
be more efficient to use an array. Like,
import array
a = array.array('c', strdata)

As I understand it, this object simply contains character data,
not a list of 1 character string objects, so it's much more
economical to store and examine.

Donn Cave, (e-mail address removed)
 
T

Tom Longridge

Thank you all very much for your responses. It's especially reassuring
to hear about other Python GA's as I have had some scepticism about
Python's speed (or lack of it) being too big a problem for such an
application.

With regard to using numeric, arrays or integer lists -- I didn't
mention that these strings can also contain wild cards (so I suppose
it's not really binary -- sorry). This is traditionally done using a
'#' symbol, but I was imagining using a value of None in a boolean list
to represent this. Also there is currently a fair bit of research going
into other representations (floating-point values, paired values etc)
so I was hoping to be able to keep my framework extensible for the
future.

Many thanks again for your help. I will ``take the plunge'' and give
the boolean list a go I think!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,540
Members
45,025
Latest member
KetoRushACVFitness

Latest Threads

Top