Program that makes a list of words.

Discussion in 'C Programming' started by lundslaktare@yahoo.com, May 13, 2008.

  1. Guest

    Maybe this is the wrong group; if so, I would like to be pointed
    to a better group.
    Anyway, here's the problem:

    I need a program that makes a list of the words in a text file.

    For example, take the text:

    ------------------------------------
    To be, or not to be; thats: the ques-
    tion.
    -----------------------------------------

    would return:

    be 2
    not 1
    or 1
    question 1
    thats 1
    the 1
    To 1
    to 1
    . 1
    , 1
    : 1
    ; 1
    - 1


    As can be seen from this example, I want the program to count both
    words and punctuation, and I want it to distinguish between "to" and
    "To". I also want it to count:

    ques
    -tion

    as

    question 1
    - 1
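    A minimal C sketch of the tokenizing rules described above, under
    stated assumptions: a word is a run of ASCII letters compared
    case-sensitively, every other non-space character is a token of its
    own, and a hyphen at the end of a line joins the word with the start
    of the next line while still being counted once. The names tokenize,
    count_token and MAX_WORD are hypothetical, not taken from the thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64 /* hypothetical limit: longer words are truncated */

/* Called once per token: a word, or a single punctuation character. */
typedef void (*token_fn)(const char *token, void *ctx);

/* p points at a '-'. If only blanks follow it up to the end of the line and
   the next line (after leading blanks) starts with a letter, return a
   pointer to that letter; otherwise return NULL. */
static const char *hyphen_break(const char *p)
{
    const char *q = p + 1;
    while (*q == ' ' || *q == '\t' || *q == '\r')
        q++;
    if (*q != '\n')
        return NULL;
    q++;
    while (*q == ' ' || *q == '\t')
        q++;
    return isalpha((unsigned char)*q) ? q : NULL;
}

/* Splits text into words (runs of letters, case-sensitive) and
   single-character punctuation tokens; whitespace is skipped. A word
   hyphenated across a line break is joined, and the '-' is still
   reported once as its own token. */
void tokenize(const char *text, token_fn emit, void *ctx)
{
    char word[MAX_WORD + 1];
    size_t len = 0;
    const char *p = text;

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;
        const char *next;

        if (isalpha(c)) {
            if (len < MAX_WORD)
                word[len++] = (char)c;
            p++;
        } else if (c == '-' && len > 0 && (next = hyphen_break(p)) != NULL) {
            emit("-", ctx);   /* count the hyphen, keep building the word */
            p = next;
        } else {
            if (len > 0) {    /* the current word has ended */
                word[len] = '\0';
                emit(word, ctx);
                len = 0;
            }
            if (!isspace(c)) {
                char punct[2];
                punct[0] = (char)c;
                punct[1] = '\0';
                emit(punct, ctx);
            }
            p++;
        }
    }
    if (len > 0) {
        word[len] = '\0';
        emit(word, ctx);
    }
}

/* Example consumer: counts how often one particular token occurs. */
struct counter {
    const char *target;
    size_t hits;
};

static void count_cb(const char *token, void *ctx)
{
    struct counter *c = ctx;
    if (strcmp(token, c->target) == 0)
        c->hits++;
}

size_t count_token(const char *text, const char *target)
{
    struct counter c;
    c.target = target;
    c.hits = 0;
    tokenize(text, count_cb, &c);
    return c.hits;
}
```

    With the example text, count_token reports "be" twice, "To" and "to"
    once each, and "question" and "-" once each. Apostrophes are treated
    as ordinary punctuation here, so "didn't" would yield didn, ' and t;
    that is a policy choice, not the only possible one.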


    Any help would be appreciated!
    , May 13, 2008
    #1

  2. writes:

    >Any help would be appreciated!


    The result you seek (homework?) is typically termed a concordance.

    --
    Chris.
    Chris McDonald, May 13, 2008
    #2

  3. Guest

    Chris McDonald wrote:
    > writes:
    >
    > >Any help would be appreciated!

    >
    > The result you seek (homework?) is typically termed a concordance.
    >
    > --
    > Chris.


    Thank you for helping me.
    Now I know what the name is: concordance.
    I guess that's some kind of dance?

    It's not homework; I'm not a computer scientist.

    I would really appreciate it if someone could give me the URL
    where I can find such a program.
    , May 13, 2008
    #3
  4. Chris McDonald, May 13, 2008
    #4
  5. Guest

    Richard Heathfield wrote:
    > said:
    >
    > > Maybe this is the wrong group, if so I would like to be pointed
    > > to a better group.

    >
    > If you were writing such a program yourself in the C language, and there
    > were some aspect of C that was puzzling you, this would be the right
    > group. It seems, however, that you want to find an existing program that
    > already meets your requirements, rather than write it yourself.
    >
    > There's nothing wrong with that, but - as you suspected - this group isn't
    > intended to meet that need. There are many groups in the comp.sources
    > hierarchy, however, and it may well be that one of those can supply your
    > need.
    >
    > If you /were/ planning to write such a program yourself, you would want to
    > start off by tackling the most difficult problems first. These are:
    >
    > 1) sorting out the logic behind punctuation. You want to treat
    >
    > ques
    > -tion
    >
    > as one hit for question and one hit for the hyphen, which is clear enough
    > and not too difficult. But what about "didn't"? Does the ' character count
    > as a separate hit? And what about "fo'c'sle"? (That's a single word,
    > according to the dictionary.) These decisions aren't difficult to make,
    > but they do have to be made. Once you've made the decisions, you would
    > need to write a parser that can enact them.
    >
    > 2) storage. You appear to want your output to be sorted, so you'd need to
    > think about a container that can regurgitate data in sorted order after
    > processing. A binary search tree is the obvious choice, but a hash table
    > might be faster if you didn't actually need the sorting. You also need to
    > think about whether you want to be able to handle arbitrarily long words.
    > If so, you'll need to handle the memory requirements yourself, using
    > malloc.
    >
    > Although it sounds like a simple enough program, you have introduced enough
    > complications to make it quite an interesting programming exercise for
    > someone with a year or two of C experience.
    >
    > If you change your mind and decide to write it yourself in C, we can
    > certainly help you with it. Otherwise, good luck in comp.sources.* (and
    > you're going to need it, partly because I believe those groups aren't very
    > popular with source providers, and partly because your requirements are
    > sufficiently different from a normal concordance program to make it quite
    > unlikely that someone can meet your needs without actually writing a
    > program just for you).
    >
    > HTH. HAND.
    >
    > --
    > Richard Heathfield <http://www.cpax.org.uk>
    > Email: -http://www. +rjh@
    > Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    > "Usenet is a strange place" - dmr 29 July 1999

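    A sketch of the binary-search-tree storage suggested in the quoted
    reply, assuming tokens arrive one at a time (for example from a
    tokenizer callback). Each distinct token gets a node holding a
    malloc'd copy of the word, so there is no fixed word length, plus a
    count; an in-order walk prints the list in sorted order. The struct
    and function names are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per distinct token. */
struct node {
    char *word;              /* malloc'd copy, any length */
    unsigned long count;
    struct node *left, *right;
};

/* Adds one occurrence of word. Returns 0 on success, -1 if malloc fails. */
int tree_add(struct node **root, const char *word)
{
    while (*root != NULL) {
        int cmp = strcmp(word, (*root)->word); /* case-sensitive */
        if (cmp == 0) {
            (*root)->count++;
            return 0;
        }
        root = (cmp < 0) ? &(*root)->left : &(*root)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->left = n->right = NULL;
    *root = n;
    return 0;
}

/* Returns the count recorded for word, or 0 if it never occurred. */
unsigned long tree_count(const struct node *t, const char *word)
{
    while (t != NULL) {
        int cmp = strcmp(word, t->word);
        if (cmp == 0)
            return t->count;
        t = (cmp < 0) ? t->left : t->right;
    }
    return 0;
}

/* In-order traversal: prints "word count" lines in strcmp order. */
void tree_print(const struct node *t, FILE *out)
{
    if (t == NULL)
        return;
    tree_print(t->left, out);
    fprintf(out, "%s %lu\n", t->word, t->count);
    tree_print(t->right, out);
}

void tree_free(struct node *t)
{
    if (t == NULL)
        return;
    tree_free(t->left);
    tree_free(t->right);
    free(t->word);
    free(t);
}
```

    strcmp compares bytes, so with ASCII every capitalized word sorts
    before every lowercase one ("To" before "be"), which differs from the
    order in the original example. The tree is also unbalanced: input
    that arrives already sorted degrades it into a linked list.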


    Thank you for your reply.
    The thing is that I don't know enough C (or any other language, for
    that matter) to write such a program myself.
    I asked my father, and he suggested that I use Microsoft Word
    to put the whole text in one column,
    then paste that column into Excel and use the sorting
    function.

    But that won't work for long texts (more than 65536 words).

    I think it's just natural to count
    mor-
    ning

    as "morning" and "-".

    At least, you don't want to miss a word just because it is split
    across two lines.

    Do you know of a simple program that doesn't use megabytes of data?
    , May 13, 2008
    #5
  6. viza Guest

    Hi

    On May 13, 9:52 am, Richard Heathfield <> wrote:
    > said:


    > > I asked my Father, and he suggested that I should use microsoft-word
    > > to make the whole text one column,
    > > and then insert that column in excel and use the sorting
    > > function.

    >
    > Blech! :)
    >
    > > But that wont work for long text(more than 65536 words).

    >
    > And it's so inelegant, too.
    >

    [snip]
    >
    > qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);


    pot? kettle? :)

    Even the most rushed semi-efficient implementation would use an insertion
    sort on a linked list. I bet even Excel doesn't use qsort. Perhaps a
    good one would use a tree?
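    A sketch of the "insertion sort into a linked list as it is read"
    idea, assuming the same one-token-at-a-time input. The list is kept
    in strcmp order after every insertion, so no separate sorting pass is
    needed at the end. The names entry, list_add and list_free are
    hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Singly linked list kept in strcmp order at all times. */
struct entry {
    char *word;
    unsigned long count;
    struct entry *next;
};

/* One insertion-sort step: walk to the first entry not less than word,
   then either bump its count or splice a new entry in front of it.
   Returns 0 on success, -1 if malloc fails. */
int list_add(struct entry **head, const char *word)
{
    struct entry **link = head;
    struct entry *n;
    int cmp = 1;

    while (*link != NULL && (cmp = strcmp((*link)->word, word)) < 0)
        link = &(*link)->next;

    if (*link != NULL && cmp == 0) {   /* word already present */
        (*link)->count++;
        return 0;
    }

    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->next = *link;                   /* splice in before *link */
    *link = n;
    return 0;
}

void list_free(struct entry *e)
{
    while (e != NULL) {
        struct entry *next = e->next;
        free(e->word);
        free(e);
        e = next;
    }
}
```

    Each insertion walks the list from the head, so with N distinct words
    the total number of comparisons grows roughly quadratically in N; the
    memory used, however, is only one node per distinct word.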
    viza, May 13, 2008
    #6
  7. viza wrote:
    > Hi
    >
    > On May 13, 9:52 am, Richard Heathfield <> wrote:
    >> said:

    >
    >>> I asked my Father, and he suggested that I should use microsoft-word
    >>> to make the whole text one column,
    >>> and then insert that column in excel and use the sorting
    >>> function.

    >>
    >> Blech! :)
    >>
    >>> But that wont work for long text(more than 65536 words).

    >>
    >> And it's so inelegant, too.
    >>

    > [snip]
    >>
    >> qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

    >
    > pot? kettle? :)
    >
    > Even the most rushed semi-efficient implementation would use an insert
    > sort on a linked list. I bet even excel doesn't use qsort. Perhaps a
    > good one would use a tree?

    Who says that qsort does not do an insertion sort? qsort != quicksort, at
    least not necessarily, regardless of what the name implies.
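    For comparison, a sketch of the collect-then-sort approach behind the
    qsort call quoted above. The thread does not show what WordArray
    holds, so struct word_count, its fields and sort_words are
    assumptions; only qsort and its (base, count, element size,
    comparator) signature come from the C standard library. The standard
    says what qsort must do, not which algorithm it uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type: the thread does not show WordArray's type. */
struct word_count {
    const char *word;
    unsigned long count;
};

/* qsort comparator: receives pointers to two array elements. */
int CompareWords(const void *a, const void *b)
{
    const struct word_count *x = a;
    const struct word_count *y = b;
    return strcmp(x->word, y->word);
}

/* Sorts the first Treecount entries of WordArray by word, in place. */
void sort_words(struct word_count *WordArray, size_t Treecount)
{
    qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);
}
```

    This needs every entry in the array before sorting starts, which is
    the trade-off discussed in the following replies.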

    Bye, Jojo
    Joachim Schmitz, May 13, 2008
    #7
  8. viza <> writes:
    > On May 13, 9:52 am, Richard Heathfield <> wrote:
    >> said:
    >> > I asked my Father, and he suggested that I should use microsoft-word
    >> > to make the whole text one column,
    >> > and then insert that column in excel and use the sorting
    >> > function.

    >>
    >> Blech! :)
    >>
    >> > But that wont work for long text(more than 65536 words).

    >>
    >> And it's so inelegant, too.
    >>

    > [snip]
    >>
    >> qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

    >
    > pot? kettle? :)
    >
    > Even the most rushed semi-efficient implementation would use an insert
    > sort on a linked list. I bet even excel doesn't use qsort. Perhaps a
    > good one would use a tree?


    Why would you expect insertion sort (O(N**2)) to be better than qsort
    (unspecified, but likely to be O(N log N))?
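    To put rough numbers on this, take N = 65536 (the Excel limit
    mentioned earlier), assume distinct keys in random order, and ignore
    constant factors. Insertion sort then makes on average about N^2/4
    comparisons, while an O(N log N) sort is on the order of N log2 N:

```latex
\begin{aligned}
N &= 65536 = 2^{16} \\
\tfrac{N^2}{4} &= \tfrac{2^{32}}{4} = 2^{30} \approx 1.07 \times 10^{9} && \text{(insertion sort, average comparisons)} \\
N \log_2 N &= 2^{16} \cdot 16 = 2^{20} \approx 1.05 \times 10^{6} && \text{(an } O(N \log N) \text{ sort, up to a constant)} \\
2^{30} / 2^{20} &= 2^{10} = 1024
\end{aligned}
```

    So at this size the insertion sort does on the order of a thousand
    times more comparisons.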

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, May 13, 2008
    #8
  9. viza Guest

    Hi

    On May 13, 12:55 pm, "Joachim Schmitz" <nospam.j...@schmitz-digital.de> wrote:
    > viza wrote:
    > > On May 13, 9:52 am, Richard Heathfield <> wrote:
    > >> said:

    >
    > >> And it's so inelegant, too.

    > > [snip]
    > >> qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

    >
    > > pot? kettle? :)

    >
    > > Even the most rushed semi-efficient implementation would use an insert
    > > sort on a linked list. I bet even excel doesn't use qsort. Perhaps a
    > > good one would use a tree?

    >
    > Who says that qsort does not do an insert sort? qsort != quick sort, at
    > least not neccarrily, regardless what the name implies


    It could, perhaps, but qsort always requires all of the items to be in
    place before it starts, so solving this problem with a qsort that was
    implemented as an insertion sort would double peak memory usage and
    have to copy the whole thing back as well. In this case, IMHO, it makes
    sense to sort the words as they are read, and the easiest way I could
    think of to do that was an insertion sort into a list.

    Regards,
    viza
    viza, May 13, 2008
    #9
  10. viza Guest

    Hi

    On May 13, 1:22 pm, Richard Heathfield <> wrote:
    > viza said:
    >
    > >> qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

    >
    > > pot? kettle? :)

    >
    > I don't follow. I grabbed something off the Net and hacked it a bit to give
    > the OP a rough idea. I neither wrote the original code nor intended to
    > provide a perfect solution.


    I realized that; it was good of you to help the (OT) OP. I just
    thought that the method this code used wasn't very elegant either.

    Regards,
    viza
    viza, May 13, 2008
    #10