Getting word frequencies from files in a folder

Discussion in 'Python' started by krisbee1983@gmail.com, Apr 4, 2007.

  1. Guest

    Hello to all,

    I'm a beginner learning Python, and I hope somebody can help me solve
    this problem. I would like to read all the text files in some
    folder. For these text files I need to compute word frequencies for a
    set of defined words such as "buy", "red", "good". If a file doesn't
    contain a word, that word gets a frequency of "0". The frequencies
    should be stored in an array. Once I have the frequencies for every
    file in the folder, the array should be written to a text file.

    I would be very grateful for any help.
     
    , Apr 4, 2007
    #1

  2. wrote:
    > Hello to all,
    >
    > I'm a beginner learning Python, and I hope somebody can help me solve
    > this problem. I would like to read all the text files in some
    > folder. For these text files I need to compute word frequencies for a
    > set of defined words such as "buy", "red", "good". If a file doesn't
    > contain a word, that word gets a frequency of "0". The frequencies
    > should be stored in an array. Once I have the frequencies for every
    > file in the folder, the array should be written to a text file.


    This sounds suspiciously like a homework assignment.
    I don't think you'll get much help for this one, unless
    you show some code you wrote yourself already with a specific
    question about problems you're having....

    --Irmen
     
    Irmen de Jong, Apr 4, 2007
    #2

  3. Guest

    > This sounds suspiciously like a homework assignment.
    > I don't think you'll get much help for this one, unless
    > you show some code you wrote yourself already with a specific
    > question about problems you're having....


    Well, you have a point. I will be more specific.
    So far I have something like this:

    import os, os.path

    def wyswietlanie_drzewa(dir_path):
        # Walk folders and subfolders recursively, printing every path found.
        # (The name is Polish for "displaying the tree".)
        for name in os.listdir(dir_path):
            full_path = os.path.join(dir_path, name)
            print(full_path)
            if os.path.isdir(full_path):
                wyswietlanie_drzewa(full_path)

    My question is: how do I get word frequencies from these files?
    I would be glad of any help.

    Krisbee
     
    , Apr 4, 2007
    #3
  4. Terry Reedy Guest

    <> wrote in message
    news:...
    |
    | My question is: how do I get word frequencies from these files?
    | I would be glad of any help.

    Go to
    http://groups.google.com/group/comp.lang.python/topics
    and search on "count word frequency" and you will find several previous
    posts on this topic.

    tjr
     
    Terry Reedy, Apr 4, 2007
    #4
  5. 7stud Guest

    On Apr 4, 2:07 pm, wrote:
    > My question is: how do I get word frequencies from these files?
    > I would be glad of any help.
    >


    --files have read(), readline(), and readlines() methods
    --strings have a split() method, which splits the string on
    whitespace (e.g. spaces)
    --lists have a count() method
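
Put together, those three hints might look like the minimal sketch below. The function name `count_word_in_file` and its parameters are made up for illustration. Note that it only matches whole whitespace-separated tokens, so "red," with a trailing comma is not counted as "red".

```python
def count_word_in_file(path, word):
    """Count how often `word` appears as a whitespace-separated token.

    Hypothetical helper for illustration; not code from the thread.
    """
    with open(path) as f:
        text = f.read()        # file method: read the whole file as one string
    tokens = text.split()      # string method: split on any run of whitespace
    return tokens.count(word)  # list method: number of exact matches
```

Calling it once per defined word gives a 0 for any word the file does not contain, which is the behaviour the original question asks for.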
     
    7stud, Apr 5, 2007
    #5
  6. <> wrote:

    > > This sounds suspiciously like a homework assignment.
    > > I don't think you'll get much help for this one, unless
    > > you show some code you wrote yourself already with a specific
    > > question about problems you're having....

    >
    > Well, you have a point. I will be more specific.
    > So far I have something like this:
    >
    > import os, os.path
    >
    > def wyswietlanie_drzewa(dir_path):
    >     # Walk folders and subfolders recursively, printing every path found.
    >     for name in os.listdir(dir_path):
    >         full_path = os.path.join(dir_path, name)
    >         print(full_path)
    >         if os.path.isdir(full_path):
    >             wyswietlanie_drzewa(full_path)
    >
    > My question is: how do I get word frequencies from these files?
    > I would be glad of any help.


    You may want to consider os.walk as an alternative way to get all files;
    it's easy to wrap it into a generator yielding all files in the subtree.

    This, I would think, is the proper factoring in Python: have a generator
    yielding each file, and a function taking a file and returning the word
    frequencies for that one file. This neatly separates the two halves of
    the task -- and you can easily factor things down further...
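
A minimal sketch of the first half, wrapping os.walk in a generator, could look like this. The name `iter_text_files` and the ".txt" default are assumptions for illustration, not something specified in the thread.

```python
import os

def iter_text_files(root, extension=".txt"):
    """Yield the full path of every file under `root` whose name ends
    with `extension` (hypothetical helper; the default is an assumption)."""
    # os.walk visits root and every subfolder, yielding a
    # (directory path, subdirectory names, file names) tuple for each one.
    for dir_path, dir_names, file_names in os.walk(root):
        for name in file_names:
            if name.endswith(extension):
                yield os.path.join(dir_path, name)
```

Because it is a generator, the caller can loop over the paths one at a time without building the whole list first.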

    Given a text file, you can iterate on it: the items are the lines. Given
    a line, you can extract all words in it and iterate on those: look at
    the re module, and the \w feature of regular-expression pattern strings.
    So, a generator that turns a file into a stream of words is also an easy
    sub-task to accomplish.
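
One possible shape for that word-stream generator is sketched below. The name `iter_words` is hypothetical, and lowercasing every word is an added assumption so that "Buy" and "buy" count as the same word. Since \w matches letters, digits, and underscores only, a word like "don't" comes out as two pieces.

```python
import re

# \w+ matches one or more letters, digits, or underscores in a row.
WORD_RE = re.compile(r"\w+")

def iter_words(lines):
    """Yield each word, lowercased, from an iterable of lines.

    An open text file works as `lines`, since iterating a file yields
    its lines. Lowercasing is an assumption, not part of the thread.
    """
    for line in lines:
        for word in WORD_RE.findall(line):
            yield word.lower()
```

Any iterable of strings works as input, so the generator can be tried on a plain list of lines before pointing it at real files.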

    Given a stream of words, and a set of "interesting words", it's easy to
    count the occurrences of interesting words. There, I'll supply that
    part, to entice you to write the others, and thereby perhaps learn some
    Python...:

    def count_interesting_words(all_words, interesting_words):
        d = dict.fromkeys(interesting_words, 0)
        for word in all_words:
            if word in d:
                d[word] += 1
        return d
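
The last step from the original question, writing the per-file counts to a text file, might be sketched as follows. It assumes each file's result is a dict of the kind count_interesting_words returns; the function name `write_frequency_table` and the tab-separated layout are illustrative choices, not something the thread specifies.

```python
def write_frequency_table(results, interesting_words, out_path):
    """Write one tab-separated row per file: the file name, then one
    count per interesting word (hypothetical helper and layout).

    `results` is a list of (file_name, counts) pairs, where `counts`
    maps word -> count, like the dict count_interesting_words returns.
    """
    with open(out_path, "w") as out:
        # Header row: "file" followed by the words, in a fixed order.
        out.write("\t".join(["file"] + list(interesting_words)) + "\n")
        for file_name, counts in results:
            # A word missing from the dict is reported as 0.
            row = [file_name] + [str(counts.get(w, 0)) for w in interesting_words]
            out.write("\t".join(row) + "\n")
```

Keeping the words in one fixed order for every row means each column of the output file always refers to the same word.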


    Alex
     
    Alex Martelli, Apr 5, 2007
    #6