letter frequency counter / your thoughts..

umpsumps

Hello,

Here is my code for a letter frequency counter. It seems bloated to
me, and any suggestions for a better way to write it (keep in mind
I'm a beginner) would be greatly appreciated.

def valsort(x):
    res = []
    for key, value in x.items():
        res.append((value, key))
    return res

def mostfreq(strng):
    dic = {}
    for letter in strng:
        if letter not in dic:
            dic.setdefault(letter, 1)
        else:
            dic[letter] += 1
    newd = dic.items()
    getvals = valsort(newd)
    getvals.sort()
    length = len(getvals)
    return getvals[length - 3 : length]

thanks much!!
 
Paul Melis

Here is my code for a letter frequency counter. It seems bloated to
me and any suggestions of what would be a better way (keep in my mind
I'm a beginner) would be greatly appreciated..

Not bad for a beginner I think :)

Slightly shorter:

def mostfreq(strng):
    dic = {}
    for letter in strng:
        if letter not in dic:
            dic[letter] = 0
        dic[letter] += 1
    # Swap letter, count here as we want to sort on count first
    getvals = [(pair[1],pair[0]) for pair in dic.iteritems()]
    getvals.sort()
    return getvals[-3:]

I'm not sure if you wanted the function mostfreq to return the 3 most
frequent letters or the first 3 letters? It seems to do the latter. The
code above does the former, i.e. returns the letters with the highest frequency.

Paul
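
A small illustration of why the pairs get swapped before sorting: Python compares tuples element by element, so whichever value comes first decides the order. This is only a sketch with made-up sample data, written for Python 3; the variable names are not from the thread.

```python
# Hypothetical sample data: (letter, count) pairs in no particular order.
letter_count_pairs = [("a", 1), ("b", 3), ("c", 2)]

# Sorting as-is orders by letter, because the letter is the first element.
by_letter = sorted(letter_count_pairs)

# Swapping to (count, letter) makes the count decide the order instead.
by_count = sorted((count, letter) for letter, count in letter_count_pairs)

print(by_letter)     # [('a', 1), ('b', 3), ('c', 2)]
print(by_count[-1])  # (3, 'b'), the most frequent letter
```

Letters with equal counts fall back to comparing the letters themselves, so ties come out in alphabetical order.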
 
castironpi

I think I'd try to get a deque on disk. Constant indexing. Store
disk addresses in b-trees. How long does 'less than' take? Is a
sector small, and what's inside?
 
Arnaud Delobelle

Hello,

Here is my code for a letter frequency counter. It seems bloated to
me and any suggestions of what would be a better way (keep in my mind
I'm a beginner) would be greatly appreciated..

def valsort(x):
    res = []
    for key, value in x.items():
        res.append((value, key))
    return res

def mostfreq(strng):
    dic = {}
    for letter in strng:
        if letter not in dic:
            dic.setdefault(letter, 1)
        else:
            dic[letter] += 1
    newd = dic.items()
    getvals = valsort(newd)
    getvals.sort()
    length = len(getvals)
    return getvals[length - 3 : length]

thanks much!!

I won't comment on the algorithm, but I think you should try to find
better names for your variables. In the snippet above you have x,
res, dic, newd, length, getvals which don't give much of a clue as to
what they are used for.

e.g.

* dic = {}
We know it's a dict, but a dict of what?

* newd = dic.items()
Sounds like 'new dictionary', but obviously isn't, as it is a list
of (key, value) pairs.

* length = len(getvals)
Again, we know it's a length, but the length of what?

HTH
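
To make the naming advice concrete, here is one way the original function might read with more descriptive names. The names (most_frequent_letters, letter_counts, count_letter_pairs) are only suggestions of mine, not anything proposed in the thread, and the sketch assumes Python 3. The logic is kept as close to the original as possible.

```python
def most_frequent_letters(text):
    # Map each letter to the number of times it appears in text.
    letter_counts = {}
    for letter in text:
        if letter not in letter_counts:
            letter_counts[letter] = 1
        else:
            letter_counts[letter] += 1
    # (count, letter) pairs, so that sorting orders by count first.
    count_letter_pairs = [(count, letter) for letter, count in letter_counts.items()]
    count_letter_pairs.sort()
    # The three pairs with the highest counts sit at the end of the list.
    return count_letter_pairs[-3:]

print(most_frequent_letters("abbcccddddeeeeeffffff"))
# [(4, 'd'), (5, 'e'), (6, 'f')]
```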
 
Ian Kelly

dic = {}
for letter in strng:
    if letter not in dic:
        dic[letter] = 0
    dic[letter] += 1

As a further refinement, you could use the defaultdict class from the
collections module:

from collections import defaultdict

dic = defaultdict(int)
for letter in strng:
    dic[letter] += 1
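
For anyone wondering why no membership test is needed: defaultdict calls the factory it was given (here int, which returns 0) the first time a missing key is looked up. A minimal runnable sketch, assuming Python 3; the wrapper function count_letters is a name I made up for the example.

```python
from collections import defaultdict

def count_letters(strng):
    # int() returns 0, so a missing letter starts at 0 on first access.
    dic = defaultdict(int)
    for letter in strng:
        dic[letter] += 1
    return dic

counts = count_letters("hello")
print(counts["l"])  # 2
print(counts["z"])  # 0 (note: this lookup also inserts "z" into the dict)
```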
 
castironpi

    dic = {}
    for letter in strng:
        if letter not in dic:
            dic[letter] = 0
        dic[letter] += 1

As a further refinement, you could use the defaultdict class from the
collections module:

   dic = defaultdict(int)
       for letter in strng:
           dic[letter] += 1

Sounds like novel flow of control.
 
umpsumps

That's a great suggestion Arnaud. I'll keep that in mind next time I
post code. Thanks ;)


 
Paul Hankin

Here is my code for a letter frequency counter.  It seems bloated to
me and any suggestions of what would be a better way (keep in my mind
I'm a beginner) would be greatly appreciated..

Yours is a little more efficient than this, but here's a compact way
to write what you want.

import heapq

def mostfreq(message):
    return heapq.nlargest(3, set(message), key=message.count)
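
On the efficiency remark: message.count scans the whole message once for every distinct letter, whereas the dictionary approach walks the message only once. A possible middle ground, sketched here under the assumption of Python 3, is to count in one pass and still let heapq.nlargest pick the winners. This is just one way to combine the two ideas.

```python
import heapq

def mostfreq(message):
    # Count every letter in a single pass over the message.
    counts = {}
    for letter in message:
        counts[letter] = counts.get(letter, 0) + 1
    # Iterating a dict yields its keys; counts.get ranks them by count.
    return heapq.nlargest(3, counts, key=counts.get)

print(mostfreq("abbcccddddeeeeeffffff"))  # ['f', 'e', 'd']
```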
 
Paul Rubin

def valsort(x):
    res = []
    for key, value in x.items():
        res.append((value, key))
    return res

Note: all code below is untested and may have errors ;-)

I think the above is misnamed because it doesn't actually sort.
Anyway, you could write it as a list comprehension:

def valsort(d):
    return [(value, key) for (key, value) in d]

def mostfreq(strng):
    dic = {}
    for letter in strng:
        if letter not in dic:
            dic.setdefault(letter, 1)
        else:
            dic[letter] += 1

I would write that with defaultdict from the collections module:

from collections import defaultdict
def mostfreq(strng):
    dic = defaultdict(int)
    for letter in strng:
        dic[letter] += 1

Alternatively with regular dicts, you could say:

def mostfreq(strng):
    dic = {}
    for letter in strng:
        dic[letter] = dic.get(letter, 0) + 1
    newd = dic.items()
    getvals = valsort(newd)
    getvals.sort()
    length = len(getvals)
    return getvals[length - 3 : length]

Someone else suggested the heapq module, which is a good approach
though it might be considered a little bit high-tech. If you
want to use sorting (conceptually simpler), you could use the
sorted function instead of the in-place sorting function:

# return the second element of a 2-tuple. Note how we
# use tuple unpacking: this is really a function of one argument
# (the tuple) but we specify the arg as (a,b) so the tuple
# is automatically unpacked on entry to the function.
# this is a limited form of the "pattern matching" found in
# languages like ML.
def snd((a,b)): return b

return sorted(dic.iteritems, key=snd, reverse=True)[-3:]
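
A note for anyone trying the sorted approach on Python 3: tuple parameters such as def snd((a,b)) are no longer allowed there, and iteritems() has been replaced by items(). The sketch below assumes Python 3 and wraps the idea in a function I have called top_three (a made-up name). It slices from the front, because reverse=True already puts the highest counts first.

```python
def snd(pair):
    # Return the second element of a 2-tuple.
    return pair[1]

def top_three(strng):
    dic = {}
    for letter in strng:
        dic[letter] = dic.get(letter, 0) + 1
    # reverse=True sorts highest count first, so the top three are [:3].
    return sorted(dic.items(), key=snd, reverse=True)[:3]

print(top_three("abbcccddddeeeeeffffff"))  # [('f', 6), ('e', 5), ('d', 4)]
```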
 
bruno.desthuilliers

Hello,

Here is my code for a letter frequency counter. It seems bloated to
me and any suggestions of what would be a better way (keep in my mind
I'm a beginner) would be greatly appreciated..

def valsort(x):
    res = []
    for key, value in x.items():
        res.append((value, key))
    return res

def mostfreq(strng):
    dic = {}
    for letter in strng:
        if letter not in dic:
            dic.setdefault(letter, 1)

You don't need dict.setdefault here - you could more simply use:
            dic[letter] = 1

        else:
            dic[letter] += 1
    newd = dic.items()
    getvals = valsort(newd)

There's an error here, see below...

    getvals.sort()
    length = len(getvals)
    return getvals[length - 3 : length]

This is a very convoluted way to get the last (most used) 3 pairs. The
shortest way is:
    return getvals[-3:]


Now... Did you actually test your code ?-)

mostfreq("abbcccddddeeeeeffffff")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/tmp/python-1048583f.py", line 15, in mostfreq
  File "/usr/tmp/python-1048583f.py", line 3, in valsort
AttributeError: 'list' object has no attribute 'items'

Hint : you're passing valsort a list of pairs when it expects a
dict...

I don't see the point of the valsort function, which - besides not
sorting anything (I can only second Arnaud about naming) - seems
like an arbitrary extraction of a piece of code for no obvious reason,
and could be expressed in a simple single line:
    getvals = [(v, k) for k, v in dic.items()]

While we're at it, returning only the 3 most frequent items seems a
bit arbitrary too - you could either return the whole thing, or at
least let the user decide how many items he wants.

And finally, I'd personally expect such a function to return a list of
(letter, frequency) and not a list of (frequency, letter).

Here's a (Python 2.5.x only) possible implementation:

from collections import defaultdict

def get_letters_frequency(source):
    letters_count = defaultdict(int)
    for letter in source:
        letters_count[letter] += 1
    sorted_count = sorted(
        ((freq, letter) for letter, freq in letters_count.iteritems()),
        reverse=True
    )
    return [(letter, freq) for freq, letter in sorted_count]

get_letters_frequency("abbcccddddeeeeeffffff")
=> [('f', 6), ('e', 5), ('d', 4), ('c', 3), ('b', 2), ('a', 1)]

# and if you only want the top 3:
get_letters_frequency("abbcccddddeeeeeffffff")[0:3]
=> [('f', 6), ('e', 5), ('d', 4)]

HTH
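
The suggestion to let the caller decide how many items come back could be sketched as below. This is an illustration only: the limit parameter is my own addition, not part of the code above, and the sketch assumes Python 3 (so items() instead of iteritems()).

```python
from collections import defaultdict

def get_letters_frequency(source, limit=None):
    letters_count = defaultdict(int)
    for letter in source:
        letters_count[letter] += 1
    # Sort on (freq, letter) so the highest frequency comes first.
    sorted_count = sorted(
        ((freq, letter) for letter, freq in letters_count.items()),
        reverse=True,
    )
    pairs = [(letter, freq) for freq, letter in sorted_count]
    # Slicing with None as the end keeps the whole list.
    return pairs[:limit]

print(get_letters_frequency("abbcccddddeeeeeffffff", 3))
# [('f', 6), ('e', 5), ('d', 4)]
```

Leaving limit at its default returns every letter, which matches the behaviour of the version without the parameter.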
 
John Machin

That's a great suggestion Arnaud. I'll keep that in mind next time I
post code. Thanks ;)

It's a suggestion for YOUR benefit, not ours. Consider keeping it in
mind next time you WRITE code, whether you intend publishing it or
not.
 
bruno.desthuilliers

Someone else suggested the heapq module, which is a good approach
though it might be considered a little bit high-tech. If you
want to use sorting (conceptually simpler), you could use the
sorted function instead of the in-place sorting function:

# return the second element of a 2-tuple. Note how we
# use tuple unpacking: this is really a function of one argument
# (the tuple) but we specify the arg as (a,b) so the tuple
# is automatically unpacked on entry to the function.
# this is a limited form of the "pattern matching" found in
# languages like ML.
def snd((a,b)): return b

operator.itemgetter does this already

return sorted(dic.iteritems, key=snd, reverse=True)[-3:]

you want to actually call iteritems here !-)

return sorted(dic.iteritems(), key=operator.itemgetter(1),
              reverse=True)

Thanks for reminding me of the 'key' argument to sorted anyway - I too
often tend to forget it.
 
bruno.desthuilliers

On 7 May, 23:51, "(e-mail address removed)"
(snip)

Small improvement thanks to Paul Rubin:

from collections import defaultdict
from operator import itemgetter

def get_letters_frequency(source):
    letters_count = defaultdict(int)
    for letter in source:
        letters_count[letter] += 1
    return sorted(
        letters_count.iteritems(),
        key=itemgetter(1),
        reverse=True
    )
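
For readers on newer Pythons: the standard library later gained collections.Counter (from Python 2.7 and 3.1), which nobody in this thread is using. It is a different tool from the defaultdict version above, but it covers the same job, so here is a hedged sketch assuming Python 3.

```python
from collections import Counter

def get_letters_frequency(source):
    # most_common() returns (letter, count) pairs, highest count first.
    return Counter(source).most_common()

print(get_letters_frequency("abbcccddddeeeeeffffff")[:3])
# [('f', 6), ('e', 5), ('d', 4)]
```

Passing a number to most_common, e.g. Counter(source).most_common(3), gives the top three directly.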
 
castironpi

I have a bounce on mostfreq("abbcccddddeeeeeffffff"). It's more
useful when it runs in real time, i.e. persists. Can I write database
code on time in a pickle?

serial= depickle( uniqueA )
serial.append( "a" )
repickle( uniqueA, serial )

I have to keep control during three ops.
 
