Hello,
Here is my code for a letter frequency counter. It seems bloated to
me and any suggestions of what would be a better way (keep in my mind
I'm a beginner) would be greatly appreciated..
def valsort(x):
res = []
for key, value in x.items():
res.append((value, key))
return res
def mostfreq(strng):
dic = {}
for letter in strng:
if letter not in dic:
dic.setdefault(letter, 1)
You don't need dict.setdefault here - you could more simply use:
dic[letter] = 0
else:
dic[letter] += 1
newd = dic.items()
getvals = valsort(newd)
There's an error here, see below...
getvals.sort()
length = len(getvals)
return getvals[length - 3 : length]
This is a very convoluted way to get the last (most used) 3 pairs. The
shortest way is:
return getvals[-3:]
Now... Did you actually test your code ?-)
mostfreq("abbcccddddeeeeeffffff")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/tmp/python-1048583f.py", line 15, in mostfreq
File "/usr/tmp/python-1048583f.py", line 3, in valsort
AttributeError: 'list' object has no attribute 'items'
Hint : you're passing valsort a list of pairs when it expects a
dict...
I don't see the point of the valsort function that - besides it
doesn't sort anything (I can only second Arnaud about naming) - seems
like an arbitrary extraction of a piece of code for no obvious reason,
and could be expressed in a simple single line:
getvals = [(v, k) for k, v in dic.items()]
While we're at it, returning only the 3 most frequent items seems a
bit arbitrary too - you could either return the whole thing, or at
least let the user decide how many items he wants.
And finally, I'd personnally hope such a function to return a list of
(letter, frequency) and not a list of (frequency, letter).
Here's a (Python 2.5.x only) possible implementation:
from collections import defaultdict
def get_letters_frequency(source):
letters_count = defaultdict(int)
for letter in source:
letters_count[letter] += 1
sorted_count = sorted(
((freq, letter) for letter, freq in letters_count.iteritems()),
reverse=True
)
return [(letter, freq) for freq, letter in sorted_count]
get_letters_frequency("abbcccddddeeeeeffffff")
=> [('f', 6), ('e', 5), ('d', 4), ('c', 3), ('b', 2), ('a', 1)]
# and if you only want the top 3:
get_letters_frequency("abbcccddddeeeeeffffff")[0:3]
=> [('f', 6), ('e', 5), ('d', 4)]
HTH