Generate English words from a dictionary?

A

Alexander

What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.
 
Ö

Öö Tiib

What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.

Spell checkers are part of such well known open source software like
Mozilla Firefox and OpenOffice.org . Probably in C or C++. Just
download source code, take and use, what holds you back?
 
J

Juha Nieminen

Alexander said:
What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.

I find this question odd. What is the problem you are seeing? Because
to me it seems kind of *trivial*. Take a big list of words (you can find
extensive lists for free with a little bit of googling), put then sorted
in an array, and that's about it. You can search the array or index it
randomly to get a random word.

(Of course if what you want is for the dictionary to take as little
memory as possible, then that's a completely different optimization
problem, one which has entire books written about.)
 
A

Alexander

I find this question odd. What is the problem you are seeing? Because
to me it seems kind of *trivial*. Take a big list of words (you can find
extensive lists for free with a little bit of googling), put then sorted
in an array, and that's about it. You can search the array or index it
randomly to get a random word.

(Of course if what you want is for the dictionary to take as little
memory as possible, then that's a completely different optimization
problem, one which has entire books written about.)

Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.
 
V

Victor Bazarov

[..]
I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

There are two kinds of people when it comes to dictionaries: those that
use the wheels available today on the market, and those that re-invent
their own. If asked, I would guess that the former kind is more
numerous. If you want to improve the existing wheels, the best way is
to get a job at one of the existing wheel suppliers.

V
 
J

Juha Nieminen

Alexander said:
Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

I don't get it. Are you possibly talking about some embedded or
hand-held system with a few tens of kilobytes of RAM, or something?

I have made programs for the iPhone which use full-sized English (and
other language) dictionaries. Loading the dictionary into memory takes
a fraction of a second (even though the dictionary data is actually
*compressed* in the flash drive, so there's a decompression to memory
involved; it still takes just a fraction of a second).

On a desktop computer you could probably load such a dictionary into
memory an order of magnitude faster (not to talk that its size in memory
is inconsequential because desktop computers have typically at least an
order of magnitude more RAM avilable for apps than the iPhone).

Loading and using a full dictionary isn't such a heavy operation in
modern systems (even hand-held ones).
I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

Binary search is the simplest (especially in C++ because the standard
library offers it) and certainly fast enough.
 
J

Jorgen Grahn

The generation part somehow got lost in the discussion. Did that mean
forming new words according to rules, e.g. black, blacker, blackest?
If so I recommend a free library like aspell, ispell, or whatever they
are called.
Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

If allocating memory takes noticeable time, your system is probably
swapping heavily, and *everything* takes noticeable time.
I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

The first step up from "search a text file" is something like
BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
on disk. Feed the words into one of those, and you can look them up
quickly later.

Or check what the Unix look(1) command does.

/Jorgen
 
J

Juha Nieminen

Jorgen Grahn said:
The first step up from "search a text file" is something like
BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
on disk. Feed the words into one of those, and you can look them up
quickly later.

What's wrong with having the words sorted in an array and using
binary search? It's not like an English dictionary is going to change
during the execution of a typical program...
 
J

Jorgen Grahn

What's wrong with having the words sorted in an array and using
binary search?

Nothing. It's more I/O than using a BerkeleyDB though, if you are
looking up just a few words. Depends on what he's going to do.

The Unix look(1) command I mentioned is interesting. I traced the
Linux version. It just mmap()s "/usr/share/dict/words" (a sorted text
file), does some magic, and finds the prefix you look for. I bet it
does some kind of modified binary search directly in the mapped file.
It's not like an English dictionary is going to change
during the execution of a typical program...

That's one hint that DB may be overkill, yes. You buy a feature you
don't use.

/Jorgen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,008
Latest member
Rahul737

Latest Threads

Top