Program that makes a list of words.

lundslaktare

Maybe this is the wrong group, if so I would like to be pointed
to a better group.
Anyway here's the problem:

I need a program that makes a list of the words in a text file.

For example take the text:

------------------------------------
To be, or not to be; thats: the ques-
tion.
-----------------------------------------

would return:

be 2
not 1
or 1
question 1
thats 1
the 1
To 1
to 1
. 1
, 1
: 1
; 1
- 1


As can be seen from this example, I want the program to count both
words and punctuation, and I want it to distinguish between "to" and
"To". I also want it to count:

ques
-tion

as

question 1
- 1


Any help would be appreciated!
 
lundslaktare

Chris McDonald wrote:
The result you seek (homework?) is typically termed a concordance.

Thank you for helping me.
Now I know what the name is: concordance.
I guess that's some kind of dance?

It's not homework; I'm not a computer scientist.

I would really appreciate it if someone could give me the URL
where I can find such a program.
 
lundslaktare

Richard Heathfield wrote:
(e-mail address removed) said:


If you were writing such a program yourself in the C language, and there
were some aspect of C that was puzzling you, this would be the right
group. It seems, however, that you want to find an existing program that
already meets your requirements, rather than write it yourself.

There's nothing wrong with that, but - as you suspected - this group isn't
intended to meet that need. There are many groups in the comp.sources
hierarchy, however, and it may well be that one of those can supply your
need.

If you /were/ planning to write such a program yourself, you would want to
start off by tackling the most difficult problems first. These are:

1) sorting out the logic behind punctuation. You want to treat

ques
-tion

as one hit for question and one hit for the hyphen, which is clear enough
and not too difficult. But what about "didn't"? Does the ' character count
as a separate hit? And what about "fo'c'sle"? (That's a single word,
according to the dictionary.) These decisions aren't difficult to make,
but they do have to be made. Once you've made the decisions, you would
need to write a parser that can enact them.

2) storage. You appear to want your output to be sorted, so you'd need to
think about a container that can regurgitate data in sorted order after
processing. A binary search tree is the obvious choice, but a hash table
might be faster if you didn't actually need the sorting. You also need to
think about whether you want to be able to handle arbitrarily long words.
If so, you'll need to handle the memory requirements yourself, using
malloc.
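
As a rough illustration of the storage point, here is a minimal sketch of an unbalanced binary search tree keyed on the token text. The names struct node, tree_add, tree_count, tree_print and tree_free are made up for this example, and plain strcmp ordering is an assumption: it is case-sensitive, as requested, but it will not reproduce the exact order shown in the sample output. Each key is copied into malloc'd storage, so word length is limited only by available memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per distinct token; the key lives on the heap, so its
   length is not fixed at compile time. */
struct node {
    char *word;
    unsigned long count;
    struct node *left, *right;
};

/* Add one occurrence of word. Returns 0 on success, -1 if malloc fails.
   strcmp is case-sensitive, so "To" and "to" are separate keys. */
int tree_add(struct node **root, const char *word)
{
    struct node *n;

    assert(root != NULL && word != NULL);
    while (*root != NULL) {
        int cmp = strcmp(word, (*root)->word);
        if (cmp == 0) {
            (*root)->count++;
            return 0;
        }
        root = cmp < 0 ? &(*root)->left : &(*root)->right;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->left = n->right = NULL;
    *root = n;
    return 0;
}

/* Number of occurrences recorded for word, or 0 if it was never added. */
unsigned long tree_count(const struct node *t, const char *word)
{
    while (t != NULL) {
        int cmp = strcmp(word, t->word);
        if (cmp == 0)
            return t->count;
        t = cmp < 0 ? t->left : t->right;
    }
    return 0;
}

/* In-order traversal: prints "word count" lines in strcmp order. */
void tree_print(const struct node *t, FILE *out)
{
    if (t == NULL)
        return;
    tree_print(t->left, out);
    fprintf(out, "%s %lu\n", t->word, t->count);
    tree_print(t->right, out);
}

/* Release every node and its key. */
void tree_free(struct node *t)
{
    if (t == NULL)
        return;
    tree_free(t->left);
    tree_free(t->right);
    free(t->word);
    free(t);
}
```

Call tree_add once per token and tree_print(root, stdout) at the end. Because the tree is not balanced, input that is already sorted degrades it to a linked list, with recursion depth to match.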

Although it sounds like a simple enough program, you have introduced enough
complications to make it quite an interesting programming exercise for
someone with a year or two of C experience.

If you change your mind and decide to write it yourself in C, we can
certainly help you with it. Otherwise, good luck in comp.sources.* (and
you're going to need it, partly because I believe those groups aren't very
popular with source providers, and partly because your requirements are
sufficiently different from a normal concordance program to make it quite
unlikely that someone can meet your needs without actually writing a
program just for you).

HTH. HAND.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999


Thank you for your reply.
The thing is that I don't know enough C (or any other language, for
that matter) to write such a program myself.
I asked my father, and he suggested that I use Microsoft Word
to put the whole text into one column,
and then paste that column into Excel and use the sorting
function.

But that won't work for long texts (more than 65536 words).

I think it's just natural to count
mor-
ning

as morning and -

At least you don't want to miss a word just because it's written on
two lines.
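
For anyone who does want to try this in C, the rule described above can be sketched as a small tokenizer. Everything here is an assumption made for illustration: the names struct tokenizer and next_token are invented, a "word" is taken to be a run of letters as judged by isalpha, every other non-blank character is a one-character token, and a word is only rejoined when the hyphen is the last character on its line and a letter follows directly on the next line (as in "mor-" / "ning"). A compound that genuinely contains a hyphen and happens to be broken at that hyphen would be joined too, and "\r\n" line endings are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Tokenizer state over a NUL-terminated text held in memory. */
struct tokenizer {
    const char *p;        /* next unread character */
    int pending_hyphens;  /* line-break hyphens swallowed, not yet reported */
};

/* Copy the next token into buf (capacity cap, at least 2).
   Returns 1 if a token was produced, 0 at end of text.
   Words longer than cap - 1 characters are truncated. */
int next_token(struct tokenizer *t, char *buf, size_t cap)
{
    size_t n = 0;

    assert(t != NULL && buf != NULL && cap >= 2);

    /* A hyphen removed while rejoining a word is reported afterwards. */
    if (t->pending_hyphens > 0) {
        t->pending_hyphens--;
        strcpy(buf, "-");
        return 1;
    }

    while (isspace((unsigned char)*t->p))
        t->p++;
    if (*t->p == '\0')
        return 0;

    /* Anything that is not a letter is a one-character token. */
    if (!isalpha((unsigned char)*t->p)) {
        buf[0] = *t->p++;
        buf[1] = '\0';
        return 1;
    }

    for (;;) {
        while (isalpha((unsigned char)*t->p)) {
            if (n + 1 < cap)
                buf[n++] = *t->p;
            t->p++;
        }
        /* Hyphen, newline, letter: the word continues on the next line. */
        if (t->p[0] == '-' && t->p[1] == '\n'
            && isalpha((unsigned char)t->p[2])) {
            t->pending_hyphens++;
            t->p += 2;
        } else {
            break;
        }
    }
    buf[n] = '\0';
    return 1;
}
```

Initialise the state with the text and a zero counter, for example struct tokenizer t = { text, 0 };, then call next_token in a loop until it returns 0. On the sample text from the first post this gives the tokens To, be, ",", or, not, to, be, ";", thats, ":", the, question, "-" and ".", which matches the counts asked for.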

Do you know of a simple program that doesn't use megs of data?
 
viza

Hi

(e-mail address removed) said:
I asked my father, and he suggested that I use Microsoft Word
to put the whole text into one column,
and then paste that column into Excel and use the sorting
function.

Blech! :)

But that won't work for long texts (more than 65536 words).

And it's so inelegant, too.
[snip]

qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

pot? kettle? :)

Even the most rushed semi-efficient implementation would use an insert
sort on a linked list. I bet even Excel doesn't use qsort. Perhaps a
good one would use a tree?
 
Joachim Schmitz

viza said:
Hi

(e-mail address removed) said:
I asked my father, and he suggested that I use Microsoft Word
to put the whole text into one column,
and then paste that column into Excel and use the sorting
function.

Blech! :)

But that won't work for long texts (more than 65536 words).

And it's so inelegant, too.
[snip]

qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

pot? kettle? :)

Even the most rushed semi-efficient implementation would use an insert
sort on a linked list. I bet even Excel doesn't use qsort. Perhaps a
good one would use a tree?

Who says that qsort does not do an insert sort? qsort != quick sort, at
least not necessarily, regardless of what the name implies.

Bye, Jojo
 
Keith Thompson

viza said:
(e-mail address removed) said:
I asked my father, and he suggested that I use Microsoft Word
to put the whole text into one column,
and then paste that column into Excel and use the sorting
function.

Blech! :)

But that won't work for long texts (more than 65536 words).

And it's so inelegant, too.
[snip]

qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);

pot? kettle? :)

Even the most rushed semi-efficient implementation would use an insert
sort on a linked list. I bet even Excel doesn't use qsort. Perhaps a
good one would use a tree?

Why would you expect insertion sort (O(N**2)) to be better than qsort
(unspecified, but likely to be O(N log N))?
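
For reference, the collect-then-sort approach that the quoted qsort call points at can be sketched as below. The code that call came from is not shown in this thread, so struct word_count, compare_words and sort_words are stand-in names chosen for illustration, and ordering by strcmp is an assumption. The standard does not say which algorithm qsort uses; the only thing the caller controls is the comparison function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One array element per distinct token (hypothetical layout). */
struct word_count {
    const char *word;
    unsigned long count;
};

/* qsort comparator: case-sensitive order by token text. */
static int compare_words(const void *a, const void *b)
{
    const struct word_count *x = a;
    const struct word_count *y = b;
    return strcmp(x->word, y->word);
}

/* Sort n entries in place once every token has been counted. */
void sort_words(struct word_count *v, size_t n)
{
    assert(v != NULL || n == 0);
    if (n > 1)
        qsort(v, n, sizeof *v, compare_words);
}
```

The sort runs once, after the whole text has been read, so its cost depends on the number of distinct tokens rather than on the length of the text.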
 
viza

Hi

viza said:
(e-mail address removed) said:
And it's so inelegant, too. [snip]
qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);
pot? kettle? :)
Even the most rushed semi-efficient implementation would use an insert
sort on a linked list. I bet even Excel doesn't use qsort. Perhaps a
good one would use a tree?

Who says that qsort does not do an insert sort? qsort != quick sort, at
least not necessarily, regardless of what the name implies.

It could perhaps, but qsort always requires all of the items to be in
place before it starts, so solving this problem with a qsort that was
implemented as an insert sort would double peak memory usage, and have
to copy the whole thing back as well. In this case IMHO it makes
sense to sort it as it is read, and the easiest way that I could think
to do this was an insert sort into a list.
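
The sort-as-you-read idea can be sketched like this. It is only an illustration: struct entry, list_add and list_free are invented names, and strcmp ordering is assumed. Each new token walks the list until it finds its place, so the list is always in order and nothing has to be copied or re-sorted at the end; the price is a linear walk per token, which grows with the number of distinct words already stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Singly linked list kept in strcmp order at all times. */
struct entry {
    char *word;
    unsigned long count;
    struct entry *next;
};

/* Count one occurrence of word, inserting a new entry at its sorted
   position if needed. Returns 0 on success, -1 if malloc fails. */
int list_add(struct entry **head, const char *word)
{
    int cmp = 1;
    struct entry *e;

    assert(head != NULL && word != NULL);

    /* Stop at the first entry that does not sort before word. */
    while (*head != NULL && (cmp = strcmp((*head)->word, word)) < 0)
        head = &(*head)->next;

    if (*head != NULL && cmp == 0) {
        (*head)->count++;          /* already present */
        return 0;
    }

    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->word = malloc(strlen(word) + 1);
    if (e->word == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->word, word);
    e->count = 1;
    e->next = *head;               /* splice in before the stopping point */
    *head = e;
    return 0;
}

/* Release every entry and its key. */
void list_free(struct entry *head)
{
    while (head != NULL) {
        struct entry *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

Printing the result is then a single pass from the head of the list to the end.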

Regards,
viza
 
viza

Hi

viza said:



I don't follow. I grabbed something off the Net and hacked it a bit to give
the OP a rough idea. I neither wrote the original code nor intended to
provide a perfect solution.

I realized that; it was good of you to help the (OT) OP. I just
thought that the method this code used wasn't very elegant either.

Regards,
viza
 
