Getting word frequencies from files in a folder

krisbee1983

Hello to all,

I'm a beginner in Python and I hope somebody can help me solve
this problem. I would like to read all the text files which are in some
folder. For these text files I need to compute word frequencies for
defined words like "buy", "red", "good". If a file doesn't contain one of
those words, it should get "0" for that frequency. The frequencies should
be stored in an array. Once I have the frequencies for every file in the
folder, the array should be written to a text file.

I will be very grateful for any help.
 
Irmen de Jong

This sounds suspiciously like a homework assignment.
I don't think you'll get much help for this one, unless
you show some code you wrote yourself already with a specific
question about problems you're having....

--Irmen
 
krisbee1983

Well, you are partly right. I will be more specific.
I have got something like this:

import os

def wyswietlanie_drzewa(dir_path):
    # Recursively walk folders and subfolders, printing every path found.
    for name in os.listdir(dir_path):
        full_path = os.path.join(dir_path, name)
        print(full_path)
        if os.path.isdir(full_path):
            wyswietlanie_drzewa(full_path)

My question is: how do I get word frequencies from these files?
I will be glad to get any help.

Krisbee
 
7stud

--files have read(), readline(), and readlines() methods
--strings have a split() method, which splits the string on
whitespace (e.g. spaces)
--lists have a count() method
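
Put together, those three hints might look like the sketch below. The function name count_words_in_file and the sample file are made up for illustration. Note that splitting on whitespace leaves punctuation attached, so "good," would not be counted as "good".

```python
import os
import tempfile


def count_words_in_file(path, wanted):
    """Return a list of counts, one per word in `wanted` (hypothetical helper)."""
    with open(path) as f:
        text = f.read()          # whole file as one string
    words = text.split()         # split on any whitespace
    return [words.count(w) for w in wanted]


# Small demonstration on a temporary file (the sample text is invented).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as out:
    out.write("buy red buy good stuff\nred red\n")

counts = count_words_in_file(demo_path, ["buy", "red", "good", "blue"])
print(counts)
```

A word that never appears simply gets a count of 0, which matches what the original question asks for.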
 
Alex Martelli

You may want to consider os.walk as an alternative way to get all files;
it's easy to wrap it into a generator yielding all files in the subtree.

This, I would think, is the proper factoring in Python: have a generator
yielding each file, and a function taking a file and returning the word
frequencies for that one file. This neatly separates the two halves of
the task -- and you can easily factor things down further...
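
One possible shape for the first half, the generator over files, is sketched here. The name all_files and the throwaway directory tree are assumptions for the sake of the example, not something given in the thread.

```python
import os
import tempfile


def all_files(root):
    """Yield the full path of every file under `root` (hypothetical name)."""
    for dir_path, dir_names, file_names in os.walk(root):
        for name in file_names:
            yield os.path.join(dir_path, name)


# Demonstration on a throwaway tree: one file at the top, one in a subfolder.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("red\n")

found = sorted(os.path.relpath(p, root) for p in all_files(root))
print(found)
```

Because os.walk already descends into subfolders, no explicit recursion is needed.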

Given a text file, you can iterate on it: the items are the lines. Given
a line, you can extract all words in it and iterate on those: look at
the re module, and the \w feature of regular-expression pattern strings.
So, a generator that turns a file into a stream of words is also an easy
sub-task to accomplish.
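
A rough sketch of such a word generator follows. The name words_in is hypothetical, and lowercasing each word is an added assumption so that "RED" and "red" count as the same word. It accepts any iterable of lines, so an open file works as well as the list used here.

```python
import re

WORD = re.compile(r"\w+")   # one or more letters, digits or underscores


def words_in(lines):
    """Yield each word, lowercased, from an iterable of lines (e.g. an open file)."""
    for line in lines:
        for word in WORD.findall(line):
            yield word.lower()


# Invented sample lines standing in for the contents of a file.
sample = ["Buy the red one,", "it's good. RED is good!"]
result = list(words_in(sample))
print(result)
```

One thing to be aware of: \w does not match an apostrophe, so "it's" comes out as the two words "it" and "s".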

Given a stream of words, and a set of "interesting words", it's easy to
count the occurrences of interesting words. There, I'll supply that
part, to entice you to write the others, and thereby perhaps learn some
Python...:

def count_interesting_words(all_words, interesting_words):
    # Start every interesting word at 0 so absent words still get a count.
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d:
            d[word] += 1
    return d
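
For instance, the function above could be exercised as follows, with the per-file results written out one line per file. The file names, the word lists and the output layout are all invented for illustration. The function is repeated only so that the sketch runs on its own.

```python
import os
import tempfile


def count_interesting_words(all_words, interesting_words):
    # Same function as above, repeated so this sketch is self-contained.
    d = dict.fromkeys(interesting_words, 0)
    for word in all_words:
        if word in d:
            d[word] += 1
    return d


interesting = ["buy", "red", "good"]

# Stand-ins for the word streams of two files (invented data).
word_streams = {
    "a.txt": ["buy", "red", "buy", "cheap"],
    "b.txt": ["nothing", "relevant"],
}

out_path = os.path.join(tempfile.mkdtemp(), "frequencies.txt")
with open(out_path, "w") as out:
    for name in sorted(word_streams):
        counts = count_interesting_words(word_streams[name], interesting)
        # One line per file: the name followed by the counts in a fixed order.
        row = [name] + [str(counts[w]) for w in interesting]
        out.write(" ".join(row) + "\n")

with open(out_path) as f:
    written = f.read()
print(written)
```

Listing the counts in the order of the interesting list, rather than in dictionary order, keeps the columns the same for every file.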


Alex
 
