permuting letters and fairy tales

J

Johannes Nix

Hello,

yesterday I met a cute person (after my dance class) who told me about an
interesting experiment regarding cognition. People were told to read a
typed text; However, in every word in the text, only the first and the
last letter were in the right place. The other letters had their
positions changed in an arbitrary manner. The surprising result, I was
told, was that people can read this mixed-up text fairly well.

Because I am a somewhat sceptical guy, at times, and because I thought
that I deserved some play, I decided to code the rule above in a
scriptlet. The resulting 23 lines are below, and the outcome is quite
interesting, not only from the point of view of librarians ;-) :

--------------------------------------------------
#!/usr/bin/python
import sys
import locale
import string
import re
import Numeric
import RandomArray

locale.setlocale(locale.LC_CTYPE, '')
wordsep = re.compile('([^%s])' % string.letters)

for line in sys.stdin.xreadlines():
for word in wordsep.split(line):
if word and word[0] in string.letters:
word = string.lower(word)
wlen = len(word)
if wlen > 3:
wa = Numeric.array(word)
perm = RandomArray.permutation(wlen-2)
wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
word = wa.tostring()
sys.stdout.write('%s' % word)

--------------------------------------------------


For the Uninitiated, Numeric is a package which deals with array data;
arrays are mutable sequences and Numeric.take() can reorder items in
them; RandomArray.permutation() delivers the randomized reordering we
need.

Now I have two innocent questions:

- Is it possible to make it a bit more concise ;-))) ?

- Can it coerced to run a little bit faster ?
(on my oldish, 300 MHz-AMD K6 , run time looks like this
for a famous, 2663-word-long fairy tale from the Grimm's brothers:

nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v

real 0m6.970s
user 0m3.634s
sys 0m0.120s


And two remarks what is interesting about it:

- It's a good example how powerful libraries, like
Numeric, make one's life easier. (BTW, why is Numeric
and stuff like take() still not included in the standard
Library ? Batteries included, but calculator not ?)

- Perhaps it's useful to protect messages in some
regions with not-so-democratic forms of government
against automatic scanning by making the message
machine-unreadable, causing some Orwellian Confusion ;-) ?
Of course, texts from Pythonistas would remain suspicious,
due to the large number of "y" occurring in them....

have a nice evening....

Johannes
 
S

Steven Bethard

Johannes Nix said:
Now I have two innocent questions:

- Is it possible to make it a bit more concise )) ?

- Can it coerced to run a little bit faster ?
(on my oldish, 300 MHz-AMD K6 , run time looks like this
for a famous, 2663-word-long fairy tale from the Grimm's brothers:

So, I don't have Numeric, but I believe I've translated your code to the
equivalent numarray code. I've also written a version of it using only the
standard Python libraries:

------------------------------------------------------------
import sys
import locale
import string
import re
import numarray
import numarray.strings
import numarray.random_array
import random

def f1(file_in, file_out):
wordsep = re.compile('([^%s])' % string.letters)
for line in file_in.xreadlines():
for word in wordsep.split(line):
if word and word[0] in string.letters:
word = string.lower(word)
wlen = len(word)
if wlen > 3:
wa = numarray.strings.array(list(word))
perm = numarray.random_array.permutation(wlen-2)
wa[1:wlen-1] = numarray.take(wa[1:wlen-1],perm)
word = wa.tostring()
file_out.write('%s' % word)
file_out.write('\n')

def f2(file_in, file_out):
word_sep_matcher = re.compile(r'(\W+)')
for line in file_in:
for word in word_sep_matcher.split(line):
if word and word.isalpha():
word = word.lower()
if len(word) > 3:
inner = list(word[1:-1])
random.shuffle(inner)
word = '%s%s%s' % (word[0], ''.join(inner), word[-1])
file_out.write('%s' % word)
file_out.write('\n')
------------------------------------------------------------

As far as conciceness goes, a couple things of note:
(1) I believe r'([^A-z])' does the same thing as your re
(2) xreadlines is deprecated in Python 2.3
(3) str.isalpha() is probably a good substitute for s[0] in string.letters. (If
you want to be strict about the translation, you can do s[0].isalpha())
(4) string.lower(word) is deprecated in favor of word.lower()
(5) perhaps Numeric doesn't support this, but I believe you can replace
wa[1:wlen-1] with wa[1:-1]
(6) you can do something much like what your numarray code does by using
random.shuffle


As far as speed goes, here's the timings I got using your email (minus the code)
as input (your code is saved in 'temp.txt'):
python -m timeit -n 10000 -s "import temp; file_in = file('temp.txt'); file_out
= file('out.txt', 'w')" "temp.f1(file_in, file_out)"
10000 loops, best of 3: 31.6 usec per loop
python -m timeit -n 10000 -s "import temp; file_in = file('temp.txt'); file_out
= file('out.txt', 'w')" "temp.f2(file_in, file_out)"
10000 loops, best of 3: 8.02 usec per loop

Looks to me like the one without numarray is much quicker, but I can't be sure
that Numeric would behave in the same manner.
- It's a good example how powerful libraries, like
Numeric, make one's life easier. (BTW, why is Numeric
and stuff like take() still not included in the standard
Library ? Batteries included, but calculator not ?)

Well, at least in numarray, take is just the functional form of indexing by an
array:
arr = na.arange(20)*2
na.take(arr, na.array([3, 5, 13])) array([ 6, 10, 26])
arr[na.array([3, 5, 13])]
array([ 6, 10, 26])

While it's true that Python's builtin lists don't support this directly, list
comprehensions make this pretty easy to reproduce:
lst = [2*x for x in range(20)]
[lst for i in [3, 5, 13]]

[6, 10, 26]
- Perhaps it's useful to protect messages in some
regions with not-so-democratic forms of government
against automatic scanning by making the message
machine-unreadable, causing some Orwellian Confusion ?

Unfortunately, this 'encoding scheme' ;) doesn't work in all languages -- what's
the first 'letter' of a two-character word in, say, Mandarin? ;)

Steve
 
A

Andrew Dalke

Johannes said:
yesterday I met a cute person (after my dance class)

Another dancer, eh? I mostly do Latin dancing (salsa, cha-cha,
nerengue, though haven't got bachata down) some tango and a bit
of swing.
who told me about an
interesting experiment regarding cognition. People were told to read a
typed text; However, in every word in the text, only the first and the
last letter were in the right place.

That was going around a couple months ago. Here's some links:
http://jwz.livejournal.com/256229.html
http://slashdot.org/article.pl?sid=03/09/15/2227256
- Is it possible to make it a bit more concise ;-))) ?

Yes. Try this

#!/usr/bin/pthoyn
irpomt re
imropt rondam
ipormt snitrg

# This is lcloae aware, so long as '][-' aren't ltreets.
# (Orhtiewse use re.epscae)

wrod_pat = re.colmipe('[' + string.letters + ']{4,}')

def _jmulbe(m):
s = m.gruop(0)
leretts = lsit(s[1:-1])
rondam.sulfhfe(leetrts)
return s[0] + "".jion(ltrtees) + s[-1]

def jmblue_file(ilnife, ouflite):
for lnie in iinlfe:
lnie = wrod_pat.sub(_julbme, lnie)
olftuie.wirte(line)

if __name__ == "__mian__":
ioprmt sys
julbme_file(sys.sdtin, sys.sdtout)

- Can it coerced to run a little bit faster ?

You'll have to test it for yourself. I don't have a copy
of your data set and can't find it on-line.
- It's a good example how powerful libraries, like
Numeric, make one's life easier. (BTW, why is Numeric
and stuff like take() still not included in the standard
Library ? Batteries included, but calculator not ?)

Well, random.shuffle works nicely as does using re.sub along
with a callable. I rarely need Numeric.
- Perhaps it's useful to protect messages in some
regions with not-so-democratic forms of government
against automatic scanning by making the message
machine-unreadable, causing some Orwellian Confusion ;-) ?
Of course, texts from Pythonistas would remain suspicious,
due to the large number of "y" occurring in them....

There are many more ways to do that. Eg, see what the
spammers do to get through the repressive forms of email
filters I use.

Oh, and to make life easier for you,

#!/usr/bin/python
import re
import random
import string

# This is locale aware, so long as '][-' aren't letters.
# (Otherwise use re.escape)

word_pat = re.compile('[' + string.letters + ']{4,}')

def _jumble(m):
s = m.group(0)
letters = list(s[1:-1])
random.shuffle(letters)
return s[0] + "".join(letters) + s[-1]

def jumble_file(infile, outfile):
for line in infile:
line = word_pat.sub(_jumble, line)
outfile.write(line)

if __name__ == "__main__":
import sys
jumble_file(sys.stdin, sys.stdout)

Andrew
(e-mail address removed)
 
S

Scott David Daniels

Andrew said:
Not to prolong this, but if you are dealing with camelCase:
word_pat = re.compile('[' + string.letters + ']{4,}')

you might want to use:
word_pat = re.compile('[' + string.letters + ']['
+ string.lowercase + ']{3,}')


-Scott David Daniels
(e-mail address removed)
 
S

Steven Bethard

Scott David Daniels said:
Not to prolong this, but if you are dealing with camelCase:
word_pat = re.compile('[' + string.letters + ']{4,}')

you might want to use:
word_pat = re.compile('[' + string.letters + ']['
+ string.lowercase + ']{3,}')

Is there any reason not to use [A-z] type regexps?
p = re.compile('[' + string.letters + ']{4,}')
p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded') ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdfAded']
p = re.compile('[A-z]{4,}')
p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdfAded']
p = re.compile('[' + string.letters + ']['+ string.lowercase + ']{3,}')
p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded') ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdf', 'Aded']
p = re.compile('[A-z][a-z]{3,}')
p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdf', 'Aded']

Seems to do the same thing to me, and doesn't require importing string (which
will hopefully be deprecated one of these days...) =)

Steve
 
A

Andrew Dalke

Steven said:
Is there any reason not to use [A-z] type regexps?

Better support for internationalization, so it will
work in España and Göteborg.



Andrew
(e-mail address removed)
 
S

Steven Bethard

Andrew Dalke said:
Steven said:
Is there any reason not to use [A-z] type regexps?

Better support for internationalization, so it will
work in España and Göteborg.

Ahh. That makes sense of course. Thanks!

I looked again at the re module, and it seems that \w and \W do have
internationalization support... Is there any way to match \w but not \d? Maybe
something like:
r'[^\d\W]{4,}'

This seems to work (maybe?):
p = re.compile(r'[^\d\W]{4,}', re.UNICODE)
p.findall(u'él me compró un globo. 1234 a342')
[u'compr\xf3', u'globo']

I don't know how to check how this works in different locales though...

Steve
 
A

Andrew Dalke

Steven said:
I looked again at the re module, and it seems that \w and \W do have
internationalization support... Is there any way to match \w but not \d? Maybe
something like:
r'[^\d\W]{4,}'

This seems to work (maybe?):

Yeah, I tried that originally but noticed that digits and '_'
are included, which ruined the idea of the scramble. So I
opted with the OP's choice and hard-coded just the letters.
p = re.compile(r'[^\d\W]{4,}', re.UNICODE)

Nice way to do it. I would also put '_' in that exclude list.
> I don't know how to check how this works in different
> locales though...

I think I don't understand locales well enough either. It
looks like I need to use re.UNICODE more often.
>>> import string
>>> string.letters 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> import re
>>> pat = re.compile(r"[^\d^\W]{4,}", re.UNICODE)
>>> GOT = u"G\N{LATIN SMALL LETTER O WITH DIAERESIS}teborg"
>>> print GOT.encode("utf8") Göteborg
>>> pat.search(GOT).group(0) u'G\xf6teborg'
>>> pat = re.compile(r"[^\d^\W]{4,}")
>>> pat.search(GOT).group(0) u'teborg'
>>>
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "") 'C'
>>>

So you can see that re.UNICODE uses the Unicode definition
of what is a letter despite the locale being C.

Therefore, I think your approach is better.

Andrew
(e-mail address removed)
 
C

Carl Banks

Johannes Nix said:
yesterday I met a cute person

Obviously this wouldn't have happened if you were using Perl.

(after my dance class) who told me about an
interesting experiment regarding cognition. People were told to read a
typed text; However, in every word in the text, only the first and the
last letter were in the right place. The other letters had their
positions changed in an arbitrary manner. The surprising result, I was
told, was that people can read this mixed-up text fairly well.

I never entirely bought this, although I've never gone to the
empirical extreme you did.

If you look at the oft-cited example (see
http://www.snopes.com/language/apocryph/cambridge.asp ), you will see
that the intetior letters are not even close to being in random order.
For the most part, letters near the front of the word remain near the
front, and letters towards the back remain near the back.

My thought is that, the brain doesn't read the word as a whole as they
claim, but neither does it pay too much attention to the _exact_
ordering. So you could scramble the letters a little, and it will
still be recognizable; but scramble them a lot, and you will have
trouble.

I suspect that a fairy tale wouldn't be too hard even with random
ordering, though, since fairy tales don't use a lot of big words.

Because I am a somewhat sceptical guy, at times, and because I thought
that I deserved some play, I decided to code the rule above in a
scriptlet. The resulting 23 lines are below, and the outcome is quite
interesting, not only from the point of view of librarians ;-) :

--------------------------------------------------
#!/usr/bin/python
import sys
import locale
import string
import re
import Numeric
import RandomArray

locale.setlocale(locale.LC_CTYPE, '')
wordsep = re.compile('([^%s])' % string.letters)

for line in sys.stdin.xreadlines():
for word in wordsep.split(line):
if word and word[0] in string.letters:
word = string.lower(word)
wlen = len(word)
if wlen > 3:
wa = Numeric.array(word)
perm = RandomArray.permutation(wlen-2)
wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
word = wa.tostring()
sys.stdout.write('%s' % word)

--------------------------------------------------


For the Uninitiated, Numeric is a package which deals with array data;
arrays are mutable sequences and Numeric.take() can reorder items in
them; RandomArray.permutation() delivers the randomized reordering we
need.

Now I have two innocent questions:

- Is it possible to make it a bit more concise ;-))) ?

Yes, but I'd say you're already at the point where more conciseness
causes an unacceptable hit in the readability. I suspect you also
thought this.

- Can it coerced to run a little bit faster ?
(on my oldish, 300 MHz-AMD K6 , run time looks like this
for a famous, 2663-word-long fairy tale from the Grimm's brothers:

nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v

real 0m6.970s
user 0m3.634s
sys 0m0.120s

Well, the thing about Numeric and numarray is that it's not designed
to scale down well. If performance is your concern, you are probably
better off using regular Python lists if your operations act on small
arrays, as they do in this case. It doesn't surprise me that Steven
Bethard found that ordinary Python ran faster.

However, maybe we can use numarray to a greater advantage. The best
way to use numarray is to operate on as much data as possible, which
in this case is the whole text file. The question is: can we use
operations on the whole text file to do this job? Well, we probably
can't scramble the letters of individual words by operating on the
whole text file at once. But could we find the word boundaries that
way? The answer is yes. Have a look at my snippet to see how:

-----------------------
import numarray as na
from numarray import random_array as ra

def interior_scramble(text):

# First, build a length 256 table indicating if this is a real
# letter. It's a boolean-valued array.

ltable = na.array([ chr(c).isalpha() for c in range(256) ])

# Turn the above text into a numarray array We add a space
# before and after the text, to make it possible to see the
# boundaries on the first and last words

atext = na.fromstring(" %s " % text, na.UInt8)

# Now, calculate a mask. This uses the binary values of the
# text string to index into the ltable, so that whereever a
# letter appears in the text, a one appears in the mask.

amask = ltable[atext]

# Here's the magic: to find the word boundaries, we xor two
# slices that are offset by one. Learning to use slicing in
# this way is one of the greatest ways to channel the power of
# numarray.

# In the mask array, a start boundary will be a zero followed
# by a one. An end will be a one followed by a zero.
# Therefore, if you xor adjacent entries in the mask array,
# the result will be nonzero whereever there's a boundary.
# So, let's xor two one-off slices (which has the effect of
# xoring adjacent entries) of the mask to get the word
# boundaries.

pat = amask[:-1] ^ amask[1:]

# pat will be nonzero wherever there's a word boundary. Let's
# use numarray.nonzero to get the indices of the word
# boundaries. The word boundaries will be
# start,end,start,end,... so we can reshape the array so that
# each row is a start,end pair.

words = na.reshape(na.nonzero(pat)[0],(-1,2))

# Because of how the resulting array aligns, we have to add 2
# to each start value to get the index where the scrambling
# begins.

words[:] += [2,0]

# We got the word boundaries. Now, loop through them and
# scramble. Because the whole file is a numarray, we don't
# need to convert each word to a numarray and back
# individually. That also saves some time.

for start,end in words:
n = end-start
if n < 2:
continue
perm = ra.permutation(end-start)
atext[start:end] = atext[start:end][perm]

# It's scrambled. Convert the applicable slice back to a
# string.

return atext[1:-1].tostring()
-----------------

I'm not sure if this is faster than your method or the pure-Python
method; I didn't check (I worked too hard writing it :). But,
generally speaking, whenever you can operate on your whole data set
using numarray at once, doing so will be a big time savings.
 
B

Bengt Richter

Hello,

yesterday I met a cute person (after my dance class) who told me about an
interesting experiment regarding cognition. People were told to read a
typed text; However, in every word in the text, only the first and the
last letter were in the right place. The other letters had their
positions changed in an arbitrary manner. The surprising result, I was
told, was that people can read this mixed-up text fairly well.

Because I am a somewhat sceptical guy, at times, and because I thought
that I deserved some play, I decided to code the rule above in a
scriptlet. The resulting 23 lines are below, and the outcome is quite
interesting, not only from the point of view of librarians ;-) :

--------------------------------------------------
#!/usr/bin/python
import sys
import locale
import string
import re
import Numeric
import RandomArray

locale.setlocale(locale.LC_CTYPE, '')
wordsep = re.compile('([^%s])' % string.letters)

for line in sys.stdin.xreadlines():
for word in wordsep.split(line):
if word and word[0] in string.letters:
word = string.lower(word)
wlen = len(word)
if wlen > 3:
wa = Numeric.array(word)
perm = RandomArray.permutation(wlen-2)
wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
word = wa.tostring()
sys.stdout.write('%s' % word)

--------------------------------------------------


For the Uninitiated, Numeric is a package which deals with array data;
arrays are mutable sequences and Numeric.take() can reorder items in
them; RandomArray.permutation() delivers the randomized reordering we
need.

Now I have two innocent questions:

- Is it possible to make it a bit more concise ;-))) ?

- Can it coerced to run a little bit faster ?
(on my oldish, 300 MHz-AMD K6 , run time looks like this
for a famous, 2663-word-long fairy tale from the Grimm's brothers:

nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v

real 0m6.970s
user 0m3.634s
sys 0m0.120s


And two remarks what is interesting about it:

- It's a good example how powerful libraries, like
Numeric, make one's life easier. (BTW, why is Numeric
and stuff like take() still not included in the standard
Library ? Batteries included, but calculator not ?)

- Perhaps it's useful to protect messages in some
regions with not-so-democratic forms of government
against automatic scanning by making the message
machine-unreadable, causing some Orwellian Confusion ;-) ?
Of course, texts from Pythonistas would remain suspicious,
due to the large number of "y" occurring in them....

have a nice evening....
Don't know the speed, but this seems fairly self-documenting to me
(with a little thought ;-):
... seqisalpha = False; seq = []
... for c in s:
... if c.isalpha() == seqisalpha: seq.append(c); continue
... elif seqisalpha and len(seq)>3:
... mid = seq[1:-1]
... random.shuffle(mid)
... seq[1:-1] = mid
... yield ''.join(seq)
... seq = [c]
... seqisalpha = c.isalpha()
... if seq: yield ''.join(seq)
... 'This is an elmxape. It has 7 wrods ;-)'

Regards,
Bengt Richter
 
M

Mike Meyer

My thought is that, the brain doesn't read the word as a whole as they
claim, but neither does it pay too much attention to the _exact_
ordering. So you could scramble the letters a little, and it will
still be recognizable; but scramble them a lot, and you will have
trouble.

"Word shape" used to be a popular theory on word recognition. That
would explain why mixing up the interior letters sometimes works and
sometimes doesn't. But that theory has fallen out of favor in recent
years.

<mike
 
B

Bengt Richter

]
Don't know the speed, but this seems fairly self-documenting to me
(with a little thought ;-):
... seqisalpha = False; seq = []
... for c in s:
... if c.isalpha() == seqisalpha: seq.append(c); continue
... elif seqisalpha and len(seq)>3:
... mid = seq[1:-1]
... random.shuffle(mid)
... seq[1:-1] = mid
... yield ''.join(seq)
... seq = [c]
... seqisalpha = c.isalpha()
... if seq: yield ''.join(seq)
...'This is an elmxape. It has 7 wrods ;-)'
Hum, didn't think enough ;-/
'lsat is antidisestablishmentarianism'

Bad logic. Needs a repeat of the elif test and suite before the last yield line ;-/
I kind of smelled that as I was posting, but was too lazy to identify the offensive
material ;-)

But it does bring up a general problem of final logic in an iter loop. ISTM it could
be useful to have an option not to raise StopIteration, but instead keep returning
a specified sentinel at the end, something like file.read returns '' repeatedly at EOF.
E.g., I could have used it like (untested, and impossible since no such iter option ;-):

def messwith(s):
seqisalpha = False; seq = []
next = iter(s, EOSEQ='').next
while True:
c = next()
if c and c.isalpha() == seqisalpha: seq.append(c); continue
elif seqisalpha and len(seq)>3:
mid = seq[1:-1]
random.shuffle(mid)
seq[1:-1] = mid
yield ''.join(seq)
if not c: break
seq = [c]
seqisalpha = c.isalpha()

In the meanwhile, I don't like any of the workarounds, but maybe
I just need some real sleep ;-)

Regards,
Bengt Richter
 
K

Kent Johnson

Bengt said:
But it does bring up a general problem of final logic in an iter loop. ISTM it could
be useful to have an option not to raise StopIteration, but instead keep returning
a specified sentinel at the end, something like file.read returns '' repeatedly at EOF.

From the itertools examples:.... "Returns the sequence elements and then returns None indefinitely"
.... return chain(seq, repeat(None))

Kent
 
B

Bengt Richter

From the itertools examples:
... "Returns the sequence elements and then returns None indefinitely"
... return chain(seq, repeat(None))
Thanks for shaming me into refreshing my acquaintance with the itertools module ;-)
It really deserves to be more than an acquaintance. It's surprising that itertools
doesn't show up in posted solutions more (Alex's notwithstanding ;-)

Regards,
Bengt Richter
 
D

David H Wild

My thought is that, the brain doesn't read the word as a whole as they
claim, but neither does it pay too much attention to the _exact_
ordering. So you could scramble the letters a little, and it will
still be recognizable; but scramble them a lot, and you will have
trouble.

In many cases, of course, we pick up what the word is from the context. If
I were to write "He was very tired, so he went to bad before nine o clock"
it is quite likely that the spelling mistake wouldn't be noticed - or that
people would read the sense and then think "there's something wrong there".
If the text had a dirty mark covering the letter between 'b' and 'd' most
people would see it as "bed" without missing a beat.
 
P

Peter Otten

Bengt said:
It really deserves to be more than an acquaintance.  It's surprising that
itertools doesn't show up in posted solutions more (Alex's notwithstanding
;-)

I can help fix that :)

While I think that a good old for-loop is often more readable than the
"cool" itertools solution, itertools has just grown a new function that
fits into your messwith() generator quite nicely:

def messwith(s):
for isalpha, chars in itertools.groupby(s, type(s).isalpha):
chars = list(chars)
if isalpha and len(chars) > 3:
mid = chars[1:-1]
random.shuffle(mid)
chars[1:-1] = mid
yield "".join(chars)

Peter
 
A

Andrew Dalke

Steven Bethard said:
Is there any way to match \w but not \d?

Dieter said:
It is: r'(?!\d)\w'

While implementation are free to optimize this case, the current
Python implementation is slower than the other solution of r"[^\d\W]"
>>> text = "Blah an123d blah901234 9spam and eggs\n" * 1000
>>> import re
>>> pat1 = re.compile(r"((?!\d)\w)+")
>>> pat2 = re.compile(r"[^\d\W]+")
>>> len(pat2.findall(text)) 7000
>>> len(pat1.findall(text)) 7000
>>> import timeit
>>> x = timeit.Timer(setup = "import __main__ as M", stmt = "M.pat1.findall(M.text)")
>>> x.timeit(100) 4.0506279468536377
>>> x = timeit.Timer(setup = "import __main__ as M", stmt = "M.pat2.findall(M.text)")
>>> x.timeit(100) 1.8287069797515869
>>>

Andrew
(e-mail address removed)
 
S

Steven Bethard

Dieter said:
It is: r'(?!\d)\w'

Yeah, I guess you could use negative lookahead assertions too. My
proposed solution to the problem discussed in this thread:
>>> re.findall(r'[^\W\d_]{4,}', 'asdg1dfs _asfd s adfsa')
['asdg', 'asfd', 'adfsa']

A solution using a negative lookahead assertion:
>>> re.findall(r'(?:(?![\d_])\w){4,}', 'asdg1dfs _asfd s adfsa')
['asdg', 'asfd', 'adfsa']

This seems a fair bit more verbose (and IMHO harder to read) than the
solution I proposed, but perhaps you had a clearer version in mind?

I tend to shy away from lookahead assertions because IMHO there's
usually an easier way. They are occasionally useful though...

Steve
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,901
Latest member
Noble71S45

Latest Threads

Top