permuting letters and fairy tales

Discussion in 'Python' started by Johannes Nix, Nov 11, 2004.

  1. Johannes Nix

    Johannes Nix Guest

    Hello,

    yesterday I met a cute person (after my dance class) who told me about an
    interesting experiment regarding cognition. People were told to read a
    typed text; however, in every word in the text, only the first and the
    last letter were in the right place. The other letters had their
    positions changed in an arbitrary manner. The surprising result, I was
    told, was that people can read this mixed-up text fairly well.

    Because I am a somewhat sceptical guy, at times, and because I thought
    that I deserved some play, I decided to code the rule above in a
    scriptlet. The resulting 23 lines are below, and the outcome is quite
    interesting, not only from the point of view of librarians ;-) :

    --------------------------------------------------
    #!/usr/bin/python
    import sys
    import locale
    import string
    import re
    import Numeric
    import RandomArray

    locale.setlocale(locale.LC_CTYPE, '')
    wordsep = re.compile('([^%s])' % string.letters)

    for line in sys.stdin.xreadlines():
        for word in wordsep.split(line):
            if word and word[0] in string.letters:
                word = string.lower(word)
                wlen = len(word)
                if wlen > 3:
                    wa = Numeric.array(word)
                    perm = RandomArray.permutation(wlen-2)
                    wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
                    word = wa.tostring()
            sys.stdout.write('%s' % word)

    --------------------------------------------------


    For the Uninitiated, Numeric is a package which deals with array data;
    arrays are mutable sequences and Numeric.take() can reorder items in
    them; RandomArray.permutation() delivers the randomized reordering we
    need.
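For readers without Numeric, the take()/permutation() pair can be sketched with the standard library alone. This is a Python 3 sketch; `take` and `permutation` are hypothetical helper names chosen to mirror the Numeric calls, not the Numeric functions themselves.

```python
import random

def take(seq, indices):
    # Pure-Python analogue of Numeric.take: return the items of seq
    # in the order given by indices.
    return [seq[i] for i in indices]

def permutation(n, rng=random):
    # Analogue of RandomArray.permutation(n): a random ordering of range(n).
    return rng.sample(range(n), n)

word = "example"
inner = list(word[1:-1])
perm = permutation(len(inner), random.Random(0))
print(word[0] + "".join(take(inner, perm)) + word[-1])
```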

    Now I have two innocent questions:

    - Is it possible to make it a bit more concise ;-))) ?

    - Can it be coerced to run a little bit faster?
    (On my oldish 300 MHz AMD K6, run time looks like this
    for a famous, 2663-word-long fairy tale by the Brothers Grimm):

    nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v

    real 0m6.970s
    user 0m3.634s
    sys 0m0.120s


    And two remarks on what is interesting about it:

    - It's a good example of how powerful libraries, like
    Numeric, make one's life easier. (BTW, why is Numeric
    and stuff like take() still not included in the standard
    Library ? Batteries included, but calculator not ?)

    - Perhaps it's useful to protect messages in some
    regions with not-so-democratic forms of government
    against automatic scanning by making the message
    machine-unreadable, causing some Orwellian Confusion ;-) ?
    Of course, texts from Pythonistas would remain suspicious,
    due to the large number of "y" occurring in them....

    have a nice evening....

    Johannes
     
    Johannes Nix, Nov 11, 2004
    #1

  2. Johannes Nix <Johannes.Nix <at> gmx.net> writes:
    > Now I have two innocent questions:
    >
    > - Is it possible to make it a bit more concise ;-))) ?
    >
    > - Can it coerced to run a little bit faster ?
    > (on my oldish, 300 MHz-AMD K6 , run time looks like this
    > for a famous, 2663-word-long fairy tale from the Grimm's brothers:


    So, I don't have Numeric, but I believe I've translated your code to the
    equivalent numarray code. I've also written a version of it using only the
    standard Python libraries:

    ------------------------------------------------------------
    import sys
    import locale
    import string
    import re
    import numarray
    import numarray.strings
    import numarray.random_array
    import random

    def f1(file_in, file_out):
        wordsep = re.compile('([^%s])' % string.letters)
        for line in file_in.xreadlines():
            for word in wordsep.split(line):
                if word and word[0] in string.letters:
                    word = string.lower(word)
                    wlen = len(word)
                    if wlen > 3:
                        wa = numarray.strings.array(list(word))
                        perm = numarray.random_array.permutation(wlen-2)
                        wa[1:wlen-1] = numarray.take(wa[1:wlen-1],perm)
                        word = wa.tostring()
                file_out.write('%s' % word)
        file_out.write('\n')

    def f2(file_in, file_out):
        word_sep_matcher = re.compile(r'(\W+)')
        for line in file_in:
            for word in word_sep_matcher.split(line):
                if word and word.isalpha():
                    word = word.lower()
                    if len(word) > 3:
                        inner = list(word[1:-1])
                        random.shuffle(inner)
                        word = '%s%s%s' % (word[0], ''.join(inner), word[-1])
                file_out.write('%s' % word)
        file_out.write('\n')
    ------------------------------------------------------------

    As far as conciseness goes, a couple of things of note:
    (1) I believe r'([^A-z])' does the same thing as your re
    (2) xreadlines is deprecated in Python 2.3
    (3) str.isalpha() is probably a good substitute for s[0] in string.letters. (If
    you want to be strict about the translation, you can do s[0].isalpha())
    (4) string.lower(word) is deprecated in favor of word.lower()
    (5) perhaps Numeric doesn't support this, but I believe you can replace
    wa[1:wlen-1] with wa[1:-1]
    (6) you can do something much like what your numarray code does by using
    random.shuffle
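Points (3) to (6) combine into a very small per-word helper. This is a Python 3 sketch; the name `scramble_word` and the optional `rng` argument (used only to make runs reproducible) are assumptions, not code from the thread.

```python
import random

def scramble_word(word, rng=random):
    # Words of three letters or fewer have no interior worth shuffling.
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])   # negative slice end, no explicit length needed
    rng.shuffle(inner)         # random.shuffle replaces take() + permutation()
    return word[0] + "".join(inner) + word[-1]

print(scramble_word("scramble", random.Random(42)))
```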


    As far as speed goes, here are the timings I got using your email (minus the code)
    as input (the input text is saved in 'temp.txt'; f1 and f2 live in temp.py):

    >python -m timeit -n 10000 -s "import temp; file_in = file('temp.txt'); file_out = file('out.txt', 'w')" "temp.f1(file_in, file_out)"
    10000 loops, best of 3: 31.6 usec per loop

    >python -m timeit -n 10000 -s "import temp; file_in = file('temp.txt'); file_out = file('out.txt', 'w')" "temp.f2(file_in, file_out)"
    10000 loops, best of 3: 8.02 usec per loop

    Looks to me like the one without numarray is much quicker, but I can't be sure
    that Numeric would behave in the same manner.
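One caveat about the commands above (an observation added here, not made in the thread): the setup opens temp.txt only once per repeat, so after the first call file_in sits at end-of-file and the remaining calls process no input at all. A Python 3 sketch that hands every call fresh input follows; the sample text and the `scramble_stream` function are stand-ins in the spirit of f2, not the original code.

```python
import io
import random
import re
import timeit

# Stand-in input; the thread timed the poster's email, which is not reproduced here.
TEXT = "Hello, this is some sample text for timing the scrambler.\n" * 200

def scramble_stream(file_in, file_out, rng=random):
    # Stdlib-only scrambler modelled on f2 (without the lower-casing).
    word_sep = re.compile(r'(\W+)')
    for line in file_in:
        for word in word_sep.split(line):
            if word.isalpha() and len(word) > 3:
                inner = list(word[1:-1])
                rng.shuffle(inner)
                word = word[0] + "".join(inner) + word[-1]
            file_out.write(word)

def run_once():
    # Fresh in-memory files on every call, so each run sees the whole input.
    out = io.StringIO()
    scramble_stream(io.StringIO(TEXT), out)
    return out.getvalue()

if __name__ == "__main__":
    best = min(timeit.repeat(run_once, number=20, repeat=3))
    print("best of 3: %.2f ms per call" % (best / 20 * 1000))
```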

    > - It's a good example how powerful libraries, like
    > Numeric, make one's life easier. (BTW, why is Numeric
    > and stuff like take() still not included in the standard
    > Library ? Batteries included, but calculator not ?)


    Well, at least in numarray, take is just the functional form of indexing by an
    array:

    >>> import numarray as na
    >>> arr = na.arange(20)*2
    >>> na.take(arr, na.array([3, 5, 13]))
    array([ 6, 10, 26])
    >>> arr[na.array([3, 5, 13])]
    array([ 6, 10, 26])

    While it's true that Python's builtin lists don't support this directly, list
    comprehensions make this pretty easy to reproduce:

    >>> lst = [2*x for x in range(20)]
    >>> [lst[i] for i in [3, 5, 13]]
    [6, 10, 26]
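As an aside, the standard library has another spelling of take() for plain lists: operator.itemgetter with several indices returns the selected items as a tuple. A minimal Python 3 sketch:

```python
from operator import itemgetter

lst = [2 * x for x in range(20)]
pick = itemgetter(3, 5, 13)   # several indices -> a tuple of items
print(pick(lst))              # (6, 10, 26)
print(list(pick(lst)))        # [6, 10, 26]
```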

    > - Perhaps it's useful to protect messages in some
    > regions with not-so-democratic forms of government
    > against automatic scanning by making the message
    > machine-unreadable, causing some Orwellian Confusion ?


    Unfortunately, this 'encoding scheme' ;) doesn't work in all languages -- what's
    the first 'letter' of a two-character word in, say, Mandarin? ;)

    Steve
     
    Steven Bethard, Nov 11, 2004
    #2

  3. Johannes Nix

    Andrew Dalke Guest

    Johannes Nix wrote:
    > yesterday I met a cute person (after my dance class)


    Another dancer, eh? I mostly do Latin dancing (salsa, cha-cha,
    merengue, though I haven't got bachata down), some tango and a bit
    of swing.

    > who told me about an
    > interesting experiment regarding cognition. People were told to read a
    > typed text; However, in every word in the text, only the first and the
    > last letter were in the right place.


    That was going around a couple months ago. Here's some links:
    http://jwz.livejournal.com/256229.html
    http://slashdot.org/article.pl?sid=03/09/15/2227256

    > - Is it possible to make it a bit more concise ;-))) ?


    Yes. Try this

    #!/usr/bin/pthoyn
    irpomt re
    imropt rondam
    ipormt snitrg

    # This is lcloae aware, so long as '][-' aren't ltreets.
    # (Orhtiewse use re.epscae)

    wrod_pat = re.colmipe('[' + string.letters + ']{4,}')

    def _jmulbe(m):
        s = m.gruop(0)
        leretts = lsit(s[1:-1])
        rondam.sulfhfe(leetrts)
        return s[0] + "".jion(ltrtees) + s[-1]

    def jmblue_file(ilnife, ouflite):
        for lnie in iinlfe:
            lnie = wrod_pat.sub(_julbme, lnie)
            olftuie.wirte(line)

    if __name__ == "__mian__":
        ioprmt sys
        julbme_file(sys.sdtin, sys.sdtout)


    > - Can it coerced to run a little bit faster ?


    You'll have to test it for yourself. I don't have a copy
    of your data set and can't find it on-line.

    > - It's a good example how powerful libraries, like
    > Numeric, make one's life easier. (BTW, why is Numeric
    > and stuff like take() still not included in the standard
    > Library ? Batteries included, but calculator not ?)


    Well, random.shuffle works nicely as does using re.sub along
    with a callable. I rarely need Numeric.

    > - Perhaps it's useful to protect messages in some
    > regions with not-so-democratic forms of government
    > against automatic scanning by making the message
    > machine-unreadable, causing some Orwellian Confusion ;-) ?
    > Of course, texts from Pythonistas would remain suspicious,
    > due to the large number of "y" occurring in them....


    There are many more ways to do that. Eg, see what the
    spammers do to get through the repressive forms of email
    filters I use.

    Oh, and to make life easier for you,

    #!/usr/bin/python
    import re
    import random
    import string

    # This is locale aware, so long as '][-' aren't letters.
    # (Otherwise use re.escape)

    word_pat = re.compile('[' + string.letters + ']{4,}')

    def _jumble(m):
        s = m.group(0)
        letters = list(s[1:-1])
        random.shuffle(letters)
        return s[0] + "".join(letters) + s[-1]

    def jumble_file(infile, outfile):
        for line in infile:
            line = word_pat.sub(_jumble, line)
            outfile.write(line)

    if __name__ == "__main__":
        import sys
        jumble_file(sys.stdin, sys.stdout)
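In Python 3, string.letters no longer exists, so the script above needs a small adjustment. The sketch below assumes ASCII letters only (string.ascii_letters); the names `make_jumbler` and `jumble_text` and the injectable `rng` are hypothetical additions for reproducible runs. The re.sub-with-a-callable structure is the same as Andrew's.

```python
import random
import re
import string

# Assumes ASCII letters only; string.ascii_letters has no ']', '[' or '-',
# so the character class needs no escaping.
word_pat = re.compile("[" + string.ascii_letters + "]{4,}")

def make_jumbler(rng=random):
    # Build the replacement callable for re.sub.
    def _jumble(m):
        s = m.group(0)
        letters = list(s[1:-1])
        rng.shuffle(letters)
        return s[0] + "".join(letters) + s[-1]
    return _jumble

def jumble_text(text, rng=random):
    return word_pat.sub(make_jumbler(rng), text)

print(jumble_text("permuting letters and fairy tales", random.Random(7)))
```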

    Andrew
     
    Andrew Dalke, Nov 11, 2004
    #3
  4. Andrew Dalke wrote:
    >

    Not to prolong this, but if you are dealing with camelCase:

    > word_pat = re.compile('[' + string.letters + ']{4,}')


    you might want to use:
    word_pat = re.compile('[' + string.letters + ']['
    + string.lowercase + ']{3,}')


    -Scott David Daniels
     
    Scott David Daniels, Nov 12, 2004
    #4
  5. Scott David Daniels <Scott.Daniels <at> Acm.Org> writes:
    >
    > Not to prolong this, but if you are dealing with camelCase:
    >
    > > word_pat = re.compile('[' + string.letters + ']{4,}')

    >
    > you might want to use:
    > word_pat = re.compile('[' + string.letters + ']['
    > + string.lowercase + ']{3,}')


    Is there any reason not to use [A-z] type regexps?

    >>> p = re.compile('[' + string.letters + ']{4,}')
    >>> p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
    ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdfAded']
    >>> p = re.compile('[A-z]{4,}')
    >>> p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
    ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdfAded']

    >>> p = re.compile('[' + string.letters + ']['+ string.lowercase + ']{3,}')
    >>> p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
    ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdf', 'Aded']
    >>> p = re.compile('[A-z][a-z]{3,}')
    >>> p.findall('fd 234 asdf454 asdfsa4 sadf Qsdha asdfAded')
    ['asdf', 'asdfsa', 'sadf', 'Qsdha', 'asdf', 'Aded']

    Seems to do the same thing to me, and doesn't require importing string (which
    will hopefully be deprecated one of these days...) =)
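Beyond internationalization, there is a plain ASCII catch with [A-z]: 'Z' is 90 and 'a' is 97, so the range also covers the six punctuation characters in between, including '_' and '^'. The test strings in the session above happen to contain none of them. A small Python 3 check:

```python
import re

# Characters strictly between 'Z' and 'a' in ASCII; [A-z] matches all of them.
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

print(re.findall(r"[A-z]{4,}", "snake_case ^^^^ plain"))     # ['snake_case', '^^^^', 'plain']
print(re.findall(r"[A-Za-z]{4,}", "snake_case ^^^^ plain"))  # ['snake', 'case', 'plain']
```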

    Steve
     
    Steven Bethard, Nov 12, 2004
    #5
  6. Johannes Nix

    Andrew Dalke Guest

    Steven Bethard wrote:
    > Is there any reason not to use [A-z] type regexps?


    Better support for internationalization, so it will
    work in España and Göteborg.



    Andrew
     
    Andrew Dalke, Nov 12, 2004
    #6
  7. regular expressions and internationalization (WAS: permutingletters...)

    Andrew Dalke <adalke <at> mindspring.com> writes:
    >
    > Steven Bethard wrote:
    > > Is there any reason not to use [A-z] type regexps?

    >
    > Better support for internationalization, so it will
    > work in España and Göteborg.


    Ahh. That makes sense of course. Thanks!

    I looked again at the re module, and it seems that \w and \W do have
    internationalization support... Is there any way to match \w but not \d? Maybe
    something like:
    r'[^\d\W]{4,}'

    This seems to work (maybe?):

    >>> p = re.compile(r'[^\d\W]{4,}', re.UNICODE)
    >>> p.findall(u'él me compró un globo. 1234 a342')
    [u'compr\xf3', u'globo']

    I don't know how to check how this works in different locales though...

    Steve
     
    Steven Bethard, Nov 12, 2004
    #7
  8. Johannes Nix

    Andrew Dalke Guest

    Re: regular expressions and internationalization (WAS: permutingletters...)

    Steven Bethard wrote:
    > I looked again at the re module, and it seems that \w and \W do have
    > internationalization support... Is there any way to match \w but not \d? Maybe
    > something like:
    > r'[^\d\W]{4,}'
    >
    > This seems to work (maybe?):


    Yeah, I tried that originally but noticed that digits and '_'
    are included, which ruined the idea of the scramble. So I
    went with the OP's choice and hard-coded just the letters.

    >>>>p = re.compile(r'[^\d\W]{4,}', re.UNICODE)


    Nice way to do it. I would also put '_' in that exclude list.
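The difference that adding '_' to the exclude list makes can be seen directly. A Python 3 sketch (str patterns are Unicode-aware by default there, so re.UNICODE is not needed):

```python
import re

sample = "snake_case words 1234 abc4567"
print(re.findall(r"[^\d\W]{4,}", sample))   # \w minus digits, '_' still allowed
print(re.findall(r"[^\W\d_]{4,}", sample))  # letters only
```

The first pattern returns ['snake_case', 'words'] and the second ['snake', 'case', 'words'].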

    > I don't know how to check how this works in different
    > locales though...


    I think I don't understand locales well enough either. It
    looks like I need to use re.UNICODE more often.

    >>> import string
    >>> string.letters
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    >>> import re
    >>> pat = re.compile(r"[^\d^\W]{4,}", re.UNICODE)
    >>> GOT = u"G\N{LATIN SMALL LETTER O WITH DIAERESIS}teborg"
    >>> print GOT.encode("utf8")
    Göteborg
    >>> pat.search(GOT).group(0)
    u'G\xf6teborg'
    >>> pat = re.compile(r"[^\d^\W]{4,}")
    >>> pat.search(GOT).group(0)
    u'teborg'
    >>>
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'C'
    >>>


    So you can see that re.UNICODE uses the Unicode definition
    of what is a letter despite the locale being C.

    Therefore, I think your approach is better.
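For comparison, in Python 3 the defaults are reversed: patterns on str are Unicode-aware unless the re.ASCII flag is given. A small sketch of the same Göteborg check:

```python
import re

got = "G\N{LATIN SMALL LETTER O WITH DIAERESIS}teborg"   # 'Göteborg'
letters4 = r"[^\W\d_]{4,}"

print(re.search(letters4, got).group(0))            # Göteborg (Unicode classes)
print(re.search(letters4, got, re.ASCII).group(0))  # teborg   (ASCII-only classes)
```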

    Andrew
     
    Andrew Dalke, Nov 12, 2004
    #8
  9. Johannes Nix

    Carl Banks Guest

    Johannes Nix <> wrote in message news:<>...
    > yesterday I met a cute person


    Obviously this wouldn't have happened if you were using Perl.


    > (after my dance class) who told me about an
    > interesting experiment regarding cognition. People were told to read a
    > typed text; However, in every word in the text, only the first and the
    > last letter were in the right place. The other letters had their
    > positions changed in an arbitrary manner. The surprising result, I was
    > told, was that people can read this mixed-up text fairly well.


    I never entirely bought this, although I've never gone to the
    empirical extreme you did.

    If you look at the oft-cited example (see
    http://www.snopes.com/language/apocryph/cambridge.asp ), you will see
    that the interior letters are not even close to being in random order.
    For the most part, letters near the front of the word remain near the
    front, and letters towards the back remain near the back.

    My thought is that the brain doesn't read the word as a whole as they
    claim, but neither does it pay too much attention to the _exact_
    ordering. So you could scramble the letters a little, and it will
    still be recognizable; but scramble them a lot, and you will have
    trouble.

    I suspect that a fairy tale wouldn't be too hard even with random
    ordering, though, since fairy tales don't use a lot of big words.


    > Because I am a somewhat sceptical guy, at times, and because I thought
    > that I deserved some play, I decided to code the rule above in a
    > scriptlet. The resulting 23 lines are below, and the outcome is quite
    > interesting, not only from the point of view of librarians ;-) :
    >
    > --------------------------------------------------
    > #!/usr/bin/python
    > import sys
    > import locale
    > import string
    > import re
    > import Numeric
    > import RandomArray
    >
    > locale.setlocale(locale.LC_CTYPE, '')
    > wordsep = re.compile('([^%s])' % string.letters)
    >
    > for line in sys.stdin.xreadlines():
    > for word in wordsep.split(line):
    > if word and word[0] in string.letters:
    > word = string.lower(word)
    > wlen = len(word)
    > if wlen > 3:
    > wa = Numeric.array(word)
    > perm = RandomArray.permutation(wlen-2)
    > wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
    > word = wa.tostring()
    > sys.stdout.write('%s' % word)
    >
    > --------------------------------------------------
    >
    >
    > For the Uninitiated, Numeric is a package which deals with array data;
    > arrays are mutable sequences and Numeric.take() can reorder items in
    > them; RandomArray.permutation() delivers the randomized reordering we
    > need.
    >
    > Now I have two innocent questions:
    >
    > - Is it possible to make it a bit more concise ;-))) ?


    Yes, but I'd say you're already at the point where more conciseness
    causes an unacceptable hit in the readability. I suspect you also
    thought this.


    > - Can it coerced to run a little bit faster ?
    > (on my oldish, 300 MHz-AMD K6 , run time looks like this
    > for a famous, 2663-word-long fairy tale from the Grimm's brothers:
    >
    > nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v
    >
    > real 0m6.970s
    > user 0m3.634s
    > sys 0m0.120s


    Well, the thing about Numeric and numarray is that they're not designed
    to scale down well. If performance is your concern, you are probably
    better off using regular Python lists if your operations act on small
    arrays, as they do in this case. It doesn't surprise me that Steven
    Bethard found that ordinary Python ran faster.

    However, maybe we can use numarray to a greater advantage. The best
    way to use numarray is to operate on as much data as possible, which
    in this case is the whole text file. The question is: can we use
    operations on the whole text file to do this job? Well, we probably
    can't scramble the letters of individual words by operating on the
    whole text file at once. But could we find the word boundaries that
    way? The answer is yes. Have a look at my snippet to see how:

    -----------------------
    import numarray as na
    from numarray import random_array as ra

    def interior_scramble(text):

        # First, build a length 256 table indicating if this is a real
        # letter. It's a boolean-valued array.

        ltable = na.array([ chr(c).isalpha() for c in range(256) ])

        # Turn the above text into a numarray array. We add a space
        # before and after the text, to make it possible to see the
        # boundaries on the first and last words.

        atext = na.fromstring(" %s " % text, na.UInt8)

        # Now, calculate a mask. This uses the binary values of the
        # text string to index into the ltable, so that wherever a
        # letter appears in the text, a one appears in the mask.

        amask = ltable[atext]

        # Here's the magic: to find the word boundaries, we xor two
        # slices that are offset by one. Learning to use slicing in
        # this way is one of the best ways to channel the power of
        # numarray.

        # In the mask array, a start boundary will be a zero followed
        # by a one. An end will be a one followed by a zero.
        # Therefore, if you xor adjacent entries in the mask array,
        # the result will be nonzero wherever there's a boundary.
        # So, let's xor two one-off slices (which has the effect of
        # xoring adjacent entries) of the mask to get the word
        # boundaries.

        pat = amask[:-1] ^ amask[1:]

        # pat will be nonzero wherever there's a word boundary. Let's
        # use numarray.nonzero to get the indices of the word
        # boundaries. The word boundaries will be
        # start,end,start,end,... so we can reshape the array so that
        # each row is a start,end pair.

        words = na.reshape(na.nonzero(pat)[0],(-1,2))

        # Because of how the resulting array aligns, we have to add 2
        # to each start value to get the index where the scrambling
        # begins (the second letter). Each end is already the index of
        # the last letter, so the slice [start:end] is the interior.

        words[:] += [2,0]

        # We got the word boundaries. Now, loop through them and
        # scramble. Because the whole file is a numarray, we don't
        # need to convert each word to a numarray and back
        # individually. That also saves some time.

        for start,end in words:
            n = end-start
            if n < 2:
                continue
            perm = ra.permutation(n)
            atext[start:end] = atext[start:end][perm]

        # It's scrambled. Convert the applicable slice back to a
        # string.

        return atext[1:-1].tostring()
    -----------------

    I'm not sure if this is faster than your method or the pure-Python
    method; I didn't check (I worked too hard writing it :). But,
    generally speaking, whenever you can operate on your whole data set
    using numarray at once, doing so will be a big time savings.
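The boundary-finding idea itself does not depend on numarray. Here is a pure-Python 3 sketch of the same "compare adjacent mask entries" trick, useful for seeing what the index arithmetic does; `word_spans` and this `interior_scramble` are hypothetical names, and it uses str.isalpha, so non-ASCII letters count as letters (unlike the byte table above).

```python
import random

def word_spans(text):
    # Pad with spaces so the first and last words have a boundary on each side.
    mask = [c.isalpha() for c in " " + text + " "]
    # Compare each entry with its right neighbour (the xor of the two
    # one-off slices): True wherever letter-ness changes.
    edges = [i for i, (a, b) in enumerate(zip(mask, mask[1:])) if a != b]
    # Edges alternate start, end. Because of the one-character padding, a
    # start edge i is the text index of the first letter, and an end edge j
    # is one past the last letter, so each pair is a half-open span.
    return list(zip(edges[0::2], edges[1::2]))

def interior_scramble(text, rng=random):
    chars = list(text)
    for start, end in word_spans(text):
        if end - start > 3:              # only words of four or more letters
            mid = chars[start + 1:end - 1]
            rng.shuffle(mid)
            chars[start + 1:end - 1] = mid
    return "".join(chars)

print(word_spans("Hi there, world"))     # [(0, 2), (3, 8), (10, 15)]
```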


    --
    CARL BANKS
     
    Carl Banks, Nov 13, 2004
    #9
  10. On Thu, 11 Nov 2004 22:31:57 +0100, Johannes Nix <> wrote:

    >
    >Hello,
    >
    >yesterday I met a cute person (after my dance class) who told me about an
    >interesting experiment regarding cognition. People were told to read a
    >typed text; However, in every word in the text, only the first and the
    >last letter were in the right place. The other letters had their
    >positions changed in an arbitrary manner. The surprising result, I was
    >told, was that people can read this mixed-up text fairly well.
    >
    >Because I am a somewhat sceptical guy, at times, and because I thought
    >that I deserved some play, I decided to code the rule above in a
    >scriptlet. The resulting 23 lines are below, and the outcome is quite
    >interesting, not only from the point of view of librarians ;-) :
    >
    >--------------------------------------------------
    >#!/usr/bin/python
    >import sys
    >import locale
    >import string
    >import re
    >import Numeric
    >import RandomArray
    >
    >locale.setlocale(locale.LC_CTYPE, '')
    >wordsep = re.compile('([^%s])' % string.letters)
    >
    >for line in sys.stdin.xreadlines():
    > for word in wordsep.split(line):
    > if word and word[0] in string.letters:
    > word = string.lower(word)
    > wlen = len(word)
    > if wlen > 3:
    > wa = Numeric.array(word)
    > perm = RandomArray.permutation(wlen-2)
    > wa[1:wlen-1] = Numeric.take(wa[1:wlen-1],perm)
    > word = wa.tostring()
    > sys.stdout.write('%s' % word)
    >
    >--------------------------------------------------
    >
    >
    >For the Uninitiated, Numeric is a package which deals with array data;
    >arrays are mutable sequences and Numeric.take() can reorder items in
    >them; RandomArray.permutation() delivers the randomized reordering we
    >need.
    >
    >Now I have two innocent questions:
    >
    >- Is it possible to make it a bit more concise ;-))) ?
    >
    >- Can it coerced to run a little bit faster ?
    > (on my oldish, 300 MHz-AMD K6 , run time looks like this
    > for a famous, 2663-word-long fairy tale from the Grimm's brothers:
    >
    > nix@aster:~> time <HaenselundGretel.txt ./python/perlmutt.py >v
    >
    > real 0m6.970s
    > user 0m3.634s
    > sys 0m0.120s
    >
    >
    >And two remarks what is interesting about it:
    >
    >- It's a good example how powerful libraries, like
    > Numeric, make one's life easier. (BTW, why is Numeric
    > and stuff like take() still not included in the standard
    > Library ? Batteries included, but calculator not ?)
    >
    >- Perhaps it's useful to protect messages in some
    > regions with not-so-democratic forms of government
    > against automatic scanning by making the message
    > machine-unreadable, causing some Orwellian Confusion ;-) ?
    > Of course, texts from Pythonistas would remain suspicious,
    > due to the large number of "y" occurring in them....
    >
    >have a nice evening....
    >

    Don't know the speed, but this seems fairly self-documenting to me
    (with a little thought ;-):

    >>> import random
    >>> def messwith(s):
    ...     seqisalpha = False; seq = []
    ...     for c in s:
    ...         if c.isalpha() == seqisalpha: seq.append(c); continue
    ...         elif seqisalpha and len(seq)>3:
    ...             mid = seq[1:-1]
    ...             random.shuffle(mid)
    ...             seq[1:-1] = mid
    ...         yield ''.join(seq)
    ...         seq = [c]
    ...         seqisalpha = c.isalpha()
    ...     if seq: yield ''.join(seq)
    ...
    >>> def jumble(s): return ''.join(messwith(s))
    ...
    >>> jumble('This is an example. It has 7 words ;-)')
    'This is an elmxape. It has 7 wrods ;-)'

    Regards,
    Bengt Richter
     
    Bengt Richter, Nov 13, 2004
    #10
  11. Johannes Nix

    Benji York Guest

    Benji York, Nov 13, 2004
    #11
  12. Johannes Nix

    Mike Meyer Guest

    (Carl Banks) writes:

    > Johannes Nix <> wrote in message news:<>...
    > My thought is that, the brain doesn't read the word as a whole as they
    > claim, but neither does it pay too much attention to the _exact_
    > ordering. So you could scramble the letters a little, and it will
    > still be recognizable; but scramble them a lot, and you will have
    > trouble.


    "Word shape" used to be a popular theory on word recognition. That
    would explain why mixing up the interior letters sometimes works and
    sometimes doesn't. But that theory has fallen out of favor in recent
    years.

    <mike
    --
    Mike Meyer <> http://www.mired.org/home/mwm/
    Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
     
    Mike Meyer, Nov 13, 2004
    #12
  13. On Sat, 13 Nov 2004 03:30:44 GMT, (Bengt Richter) wrote:
    [...]
    >>

    >Don't know the speed, but this seems fairly self-documenting to me
    >(with a little thought ;-):
    >
    > >>> import random
    > >>> def messwith(s):
    > ...     seqisalpha = False; seq = []
    > ...     for c in s:
    > ...         if c.isalpha() == seqisalpha: seq.append(c); continue
    > ...         elif seqisalpha and len(seq)>3:
    > ...             mid = seq[1:-1]
    > ...             random.shuffle(mid)
    > ...             seq[1:-1] = mid
    > ...         yield ''.join(seq)
    > ...         seq = [c]
    > ...         seqisalpha = c.isalpha()
    > ...     if seq: yield ''.join(seq)
    > ...
    > >>> def jumble(s): return ''.join(messwith(s))
    > ...
    > >>> jumble('This is an example. It has 7 words ;-)')
    > 'This is an elmxape. It has 7 wrods ;-)'
    >

    Hum, didn't think enough ;-/

    >>> jumble('last is antidisestablishmentarianism')
    'last is antidisestablishmentarianism'
    >>> jumble('last is antidisestablishmentarianism')
    'lsat is antidisestablishmentarianism'

    Bad logic. Needs a repeat of the elif test and suite before the last yield line ;-/
    I kind of smelled that as I was posting, but was too lazy to identify the offensive
    material ;-)

    But it does bring up a general problem of final logic in an iter loop. ISTM it could
    be useful to have an option not to raise StopIteration, but instead keep returning
    a specified sentinel at the end, something like file.read returns '' repeatedly at EOF.
    E.g., I could have used it like (untested, and impossible since no such iter option ;-):

    def messwith(s):
        seqisalpha = False; seq = []
        next = iter(s, EOSEQ='').next
        while True:
            c = next()
            if c and c.isalpha() == seqisalpha: seq.append(c); continue
            elif seqisalpha and len(seq)>3:
                mid = seq[1:-1]
                random.shuffle(mid)
                seq[1:-1] = mid
            yield ''.join(seq)
            if not c: break
            seq = [c]
            seqisalpha = c.isalpha()

    In the meanwhile, I don't like any of the workarounds, but maybe
    I just need some real sleep ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Nov 13, 2004
    #13
  14. Johannes Nix

    Kent Johnson Guest

    Bengt Richter wrote:
    > But it does bring up a general problem of final logic in an iter loop. ISTM it could
    > be useful to have an option not to raise StopIteration, but instead keep returning
    > a specified sentinel at the end, something like file.read returns '' repeatedly at EOF.


    From the itertools examples:
    >>> def padnone(seq):
    ...     "Returns the sequence elements and then returns None indefinitely"
    ...     return chain(seq, repeat(None))
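Following Kent's hint, chaining a single end-of-input sentinel onto the characters gives Bengt's loop the extra pass it needs, which also fixes the unshuffled final word. A Python 3 sketch; the '' sentinel and the `rng` parameter are assumptions, not code posted in the thread.

```python
import random
from itertools import chain

def messwith(s, rng=random):
    seqisalpha = False
    seq = []
    # chain() appends one '' sentinel, so the loop body runs once more after
    # the last real character and flushes (and shuffles) the final run.
    for c in chain(s, [""]):
        if c and c.isalpha() == seqisalpha:
            seq.append(c)
            continue
        if seqisalpha and len(seq) > 3:
            mid = seq[1:-1]
            rng.shuffle(mid)
            seq[1:-1] = mid
        if seq:
            yield "".join(seq)
        seq = [c]
        seqisalpha = c.isalpha()

def jumble(s, rng=random):
    return "".join(messwith(s, rng))

print(jumble("last is antidisestablishmentarianism", random.Random(1)))
```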

    Kent
     
    Kent Johnson, Nov 13, 2004
    #14
  15. On Sat, 13 Nov 2004 08:40:24 -0500, Kent Johnson <> wrote:

    >Bengt Richter wrote:
    >> But it does bring up a general problem of final logic in an iter loop. ISTM it could
    >> be useful to have an option not to raise StopIteration, but instead keep returning
    >> a specified sentinel at the end, something like file.read returns '' repeatedly at EOF.

    >
    > From the itertools examples:
    > >>> def padnone(seq):
    > ...     "Returns the sequence elements and then returns None indefinitely"
    > ...     return chain(seq, repeat(None))
    >

    Thanks for shaming me into refreshing my acquaintance with the itertools module ;-)
    It really deserves to be more than an acquaintance. It's surprising that itertools
    doesn't show up in posted solutions more (Alex's notwithstanding ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Nov 13, 2004
    #15
  16. Johannes Nix

    David H Wild Guest

    In article <>,
    Carl Banks <> wrote:
    > My thought is that, the brain doesn't read the word as a whole as they
    > claim, but neither does it pay too much attention to the _exact_
    > ordering. So you could scramble the letters a little, and it will
    > still be recognizable; but scramble them a lot, and you will have
    > trouble.


    In many cases, of course, we pick up what the word is from the context. If
    I were to write "He was very tired, so he went to bad before nine o'clock"
    it is quite likely that the spelling mistake wouldn't be noticed - or that
    people would read the sense and then think "there's something wrong there".
    If the text had a dirty mark covering the letter between 'b' and 'd' most
    people would see it as "bed" without missing a beat.

     
    David H Wild, Nov 13, 2004
    #16
  17. Johannes Nix

    Peter Otten Guest

    Bengt Richter wrote:

    > It really deserves to be more than an acquaintance.  It's surprising that
    > itertools doesn't show up in posted solutions more (Alex's notwithstanding
    > ;-)


    I can help fix that :)

    While I think that a good old for-loop is often more readable than the
    "cool" itertools solution, itertools has just grown a new function that
    fits into your messwith() generator quite nicely:

    import itertools
    import random

    def messwith(s):
        for isalpha, chars in itertools.groupby(s, type(s).isalpha):
            chars = list(chars)
            if isalpha and len(chars) > 3:
                mid = chars[1:-1]
                random.shuffle(mid)
                chars[1:-1] = mid
            yield "".join(chars)
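The new function here is itertools.groupby, which splits the input into runs of consecutive characters sharing the same key. With str.isalpha as the key, the runs alternate between letters and non-letters, so the generator above never needs a separate end-of-input step. A small Python 3 illustration:

```python
from itertools import groupby

# Each group is (key, iterator over the run); join the run back into a string.
runs = [(isalpha, "".join(chars))
        for isalpha, chars in groupby("Hi, there!", str.isalpha)]
print(runs)  # [(True, 'Hi'), (False, ', '), (True, 'there'), (False, '!')]
```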

    Peter
     
    Peter Otten, Nov 14, 2004
    #17
  18. Re: regular expressions and internationalization (WAS: permuting letters...)

    Steven Bethard <> writes on Fri, 12 Nov 2004 20:15:28 +0000 (UTC):
    > ...
    > Is there any way to match \w but not \d?


    There is: r'(?!\d)\w'

    The '(?!...)' is called a negative lookahead.
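To illustrate: (?!\d) checks, without consuming anything, that the next character is not a digit, and \w then consumes one word character. Wrapping both in a repeated group matches whole runs. Note that '_' still matches, since it is a \w character. A Python 3 sketch:

```python
import re

print(re.findall(r"(?!\d)\w", "a1_b"))        # ['a', '_', 'b']
print(re.findall(r"(?:(?!\d)\w)+", "ab12cd"))  # ['ab', 'cd']
```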
     
    Dieter Maurer, Nov 17, 2004
    #18
  19. Johannes Nix

    Andrew Dalke Guest

    Re: regular expressions and internationalization (WAS: permutingletters...)

    Steven Bethard <> writes on Fri, 12 Nov 2004
    20:15:28 +0000 (UTC):
    >Is there any way to match \w but not \d?


    Dieter Maurer wrote:
    > It is: r'(?!\d)\w'


    While implementations are free to optimize this case, the current
    Python implementation is slower than the other solution, r"[^\d\W]":

    >>> text = "Blah an123d blah901234 9spam and eggs\n" * 1000
    >>> import re
    >>> pat1 = re.compile(r"((?!\d)\w)+")
    >>> pat2 = re.compile(r"[^\d\W]+")
    >>> len(pat2.findall(text))
    7000
    >>> len(pat1.findall(text))
    7000
    >>> import timeit
    >>> x = timeit.Timer(setup = "import __main__ as M",
    ...                  stmt = "M.pat1.findall(M.text)")
    >>> x.timeit(100)
    4.0506279468536377
    >>> x = timeit.Timer(setup = "import __main__ as M",
    ...                  stmt = "M.pat2.findall(M.text)")
    >>> x.timeit(100)
    1.8287069797515869
    >>>
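A Python 3 sketch of the same comparison follows; absolute timings will vary by machine. It uses a non-capturing group (?:...) for the lookahead pattern: with the capturing group in the session above, findall returns only the last captured character of each match, although the count of 7000 comes out the same.

```python
import re
import timeit

text = "Blah an123d blah901234 9spam and eggs\n" * 1000
pat1 = re.compile(r"(?:(?!\d)\w)+")   # negative lookahead per character
pat2 = re.compile(r"[^\d\W]+")        # negated character class

# Both patterns find the same runs; only their speed differs.
assert pat1.findall(text) == pat2.findall(text)

for name, pat in [("lookahead", pat1), ("char class", pat2)]:
    seconds = timeit.timeit(lambda: pat.findall(text), number=100)
    print("%-10s %.3f s" % (name, seconds))
```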


    Andrew
     
    Andrew Dalke, Nov 17, 2004
    #19
  20. Re: regular expressions and internationalization (WAS: permutingletters...)

    Dieter Maurer wrote:
    > Steven Bethard <> writes on Fri, 12 Nov 2004 20:15:28 +0000 (UTC):
    > >
    > > Is there any way to match \w but not \d?

    >
    > It is: r'(?!\d)\w'


    Yeah, I guess you could use negative lookahead assertions too. My
    proposed solution to the problem discussed in this thread:

    >>> re.findall(r'[^\W\d_]{4,}', 'asdg1dfs _asfd s adfsa')
    ['asdg', 'asfd', 'adfsa']

    A solution using a negative lookahead assertion:

    >>> re.findall(r'(?:(?![\d_])\w){4,}', 'asdg1dfs _asfd s adfsa')
    ['asdg', 'asfd', 'adfsa']

    This seems a fair bit more verbose (and IMHO harder to read) than the
    solution I proposed, but perhaps you had a clearer version in mind?

    I tend to shy away from lookahead assertions because IMHO there's
    usually an easier way. They are occasionally useful though...

    Steve
     
    Steven Bethard, Nov 17, 2004
    #20