Efficient way of generating original alphabetic strings like unix file "split"

py_genetic

Hi,

I'm looking to generate a list of x alphabetic strings. This is
exactly the output that the unix command "split" generates as its
default file names when splitting large files.

Example:

Produce x unique, but not random, strings from the English alphabet,
all lowercase. The length of each string and the number of possible
combinations depend on x. There should be no repeats.

[aaa, aab, aac, aad, .... aax, ...... bbc, bbd, .... bcd]

I'm assuming there is a slick, pythonic way of doing this, besides
writing out a beast of a looping function. I've looked around on the
ActiveState cookbook, but have come up empty-handed. Any suggestions?

Thanks,
Conor
 
mensanator

If you allow digits as well, you can use base 36:
print gmpy.digits(n,36),

aaa aab aac aad aae aaf aag aah aai aaj aak aal
aam aan aao aap aaq aar aas aat aau aav aaw aax
aay aaz ab0 ab1 ab2 ab3 ab4 ab5 ab6 ab7 ab8 ab9
aba abb abc abd abe abf abg abh abi abj abk abl
abm abn abo abp abq abr abs abt abu abv abw abx
aby abz ac0 ac1 ac2 ac3 ac4 ac5 ac6 ac7 ac8 ac9
aca acb acc acd ace acf acg ach aci acj ack acl
acm acn aco acp acq acr acs act acu acv acw acx
acy acz ad0 ad1 ad2 ad3 ad4 ad5 ad6 ad7 ad8 ad9
ada adb adc add ade adf adg adh adi adj adk adl
adm adn ado adp adq adr ads adt adu adv adw adx
ady adz ae0 ae1 ae2 ae3 ae4 ae5 ae6 ae7 ae8 ae9
aea aeb aec aed aee aef aeg aeh aei aej aek ael
aem aen aeo aep ...
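
For a letters-only variant of the same counting idea, here is a minimal sketch using itertools.product from the standard library. The helper name split_style_names and its count/width parameters are made up for illustration; this is one possible approach, not what split itself does internally.

```python
from itertools import islice, product
from string import ascii_lowercase


def split_style_names(count, width):
    """Return the first `count` lowercase strings of length `width`.

    Hypothetical helper: strings come out in counting order
    (aaa, aab, ..., aaz, aba, ...), so no string is repeated.
    """
    if count > len(ascii_lowercase) ** width:
        raise ValueError("width too small for the requested count")
    # product() yields tuples of letters with the rightmost position
    # changing fastest, like an odometer.
    letters = product(ascii_lowercase, repeat=width)
    return ["".join(t) for t in islice(letters, count)]


print(split_style_names(5, 3))
# ['aaa', 'aab', 'aac', 'aad', 'aae']
```

Because islice stops after count items, only the strings actually needed are built, even when the width allows far more.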
 
Rob Wolfe

You didn't try hard enough. :)

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/190465
 
mensanator

Unfortunately, that's a very poor example. The terminology is
all wrong.

"xpermutations takes all elements from the sequence, order matters."
This ought to be the Cartesian Product, but it's not (no replacement).

"xcombinations takes n distinct elements from the sequence, order
matters."
If order matters, it's a PERMUTATION, period.

"xuniqueCombinations takes n distinct elements from the sequence,
order is irrelevant."
No such thing, a Combination is unique by definition.

"xselections takes n elements (not necessarily distinct) from the
sequence, order matters."
Ah, this allows a size operator, so if size = length, we get full
Cartesian Product.

The proper terminology for the Cartesian Product and
its subsets is:

Permutations with replacement
Combinations with replacement
Permutations without replacement
Combinations without replacement

And if the functions were properly labeled, you would get:

permutation without replacement - size 4

Permutations of 'love'
love loev lvoe lveo leov levo olve olev ovle ovel oelv oevl vloe vleo
vole voel velo veol elov elvo eolv eovl evlo evol


permutation without replacement - size 2

Combinations of 2 letters from 'love'
lo lv le ol ov oe vl vo ve el eo ev


combination without replacement - size 2

Unique Combinations of 2 letters from 'love'
lo lv le ov oe ve


permutation with replacement - size 2

Selections of 2 letters from 'love'
ll lo lv le ol oo ov oe vl vo vv ve el eo ev ee


full Cartesian Product, permutations with replacement - size 4

Selections of 4 letters from 'love'
llll lllo lllv llle llol lloo llov lloe llvl llvo llvv llve llel lleo
llev llee loll lolo lolv lole lool looo loov looe lovl lovo lovv love
loel loeo loev loee lvll lvlo lvlv lvle lvol lvoo lvov lvoe lvvl lvvo
lvvv lvve lvel lveo lvev lvee lell lelo lelv lele leol leoo leov leoe
levl levo levv leve leel leeo leev leee olll ollo ollv olle olol oloo
olov oloe olvl olvo olvv olve olel oleo olev olee ooll oolo oolv oole
oool oooo ooov oooe oovl oovo oovv oove ooel ooeo ooev ooee ovll ovlo
ovlv ovle ovol ovoo ovov ovoe ovvl ovvo ovvv ovve ovel oveo ovev ovee
oell oelo oelv oele oeol oeoo oeov oeoe oevl oevo oevv oeve oeel oeeo
oeev oeee vlll vllo vllv vlle vlol vloo vlov vloe vlvl vlvo vlvv vlve
vlel vleo vlev vlee voll volo volv vole vool vooo voov vooe vovl vovo
vovv vove voel voeo voev voee vvll vvlo vvlv vvle vvol vvoo vvov vvoe
vvvl vvvo vvvv vvve vvel vveo vvev vvee vell velo velv vele veol veoo
veov veoe vevl vevo vevv veve veel veeo veev veee elll ello ellv elle
elol eloo elov eloe elvl elvo elvv elve elel eleo elev elee eoll eolo
eolv eole eool eooo eoov eooe eovl eovo eovv eove eoel eoeo eoev eoee
evll evlo evlv evle evol evoo evov evoe evvl evvo evvv evve evel eveo
evev evee eell eelo eelv eele eeol eeoo eeov eeoe eevl eevo eevv eeve
eeel eeeo eeev eeee


And Combinations with replacement seems to be missing.
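
As a rough illustration of the four categories, the sketch below maps each one onto a function from the standard itertools module in current Python versions (the cookbook recipe discussed above predates these functions). The helper join_all and the variable names are made up for this example.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)


def join_all(tuples):
    """Join each tuple of letters into a string (illustrative helper)."""
    return ["".join(t) for t in tuples]


word = "love"

# Permutations without replacement, size 2: order matters, no repeats.
perms_no_repl = join_all(permutations(word, 2))
# Combinations without replacement, size 2: order is irrelevant.
combs_no_repl = join_all(combinations(word, 2))
# Permutations with replacement, size 2: the Cartesian product.
perms_repl = join_all(product(word, repeat=2))
# Combinations with replacement, size 2: the case missing above.
combs_repl = join_all(combinations_with_replacement(word, 2))

print(" ".join(combs_repl))
# ll lo lv le oo ov oe vv ve ee
```

The first three lists reproduce the size-2 outputs shown above, and the last one fills in the combinations-with-replacement case.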
 
py_genetic

See my other post to see if that is indeed what you mean.

Thanks, mensanator, I see what you are saying, and I appreciate your
clarification. I modified the unique version to fit my needs:
sometimes you just want the first x unique combinations, with a string
of the right "width" (A or AA or AAA...), so I reworked it a bit to be
more efficient. Isn't this a case of base^n - 1 for the number of
unique combinations, using the alphabet: 26^strlen - 1? Or, to figure
out strlen from the number of combinations needed: ln(26 * number of
combinations needed) / ln(26), obviously a float, but a pretty good
idea of the strlen needed when rounded?
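
One way to check the width arithmetic: a width of n gives 26^n distinct strings (so the largest zero-based index is 26^n - 1), and the smallest usable width is the ceiling of log base 26 of the count. The sketch below finds that width with integer arithmetic only, which sidesteps floating-point rounding at exact powers of 26. The name min_width and its parameters are made up for illustration.

```python
def min_width(count, base=26):
    """Smallest width w (at least 1) such that base**w >= count."""
    width, capacity = 1, base
    # Grow the width until the number of available strings
    # (base**width) covers the requested count.
    while capacity < count:
        width += 1
        capacity *= base
    return width


for count in (1, 26, 27, 676, 677):
    print(count, min_width(count))
# 1 1
# 26 1
# 27 2
# 676 2
# 677 3
```

Note that ln(26 * count) / ln(26) equals 1 + log base 26 of count, so it overshoots by one at exact powers of 26 (for example, it gives 2 for count = 26, where a single letter is enough).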
 
