Generate English words from a dictionary?

Alexander · Aug 4, 2010

What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.

Öö Tiib · Aug 4, 2010

What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.

Spell checkers are part of such well known open source software like
Mozilla Firefox and OpenOffice.org . Probably in C or C++. Just
download source code, take and use, what holds you back?

Juha Nieminen · Aug 4, 2010

Alexander said:
What choices do I have if I need to be able to generate English words,
or check whether a word exists (like in spell checking)? Language
preferred - C or C++.

I find this question odd. What is the problem you are seeing? Because
to me it seems kind of *trivial*. Take a big list of words (you can find
extensive lists for free with a little bit of googling), put then sorted
in an array, and that's about it. You can search the array or index it
randomly to get a random word.

(Of course if what you want is for the dictionary to take as little
memory as possible, then that's a completely different optimization
problem, one which has entire books written about.)

Alexander · Aug 4, 2010

I find this question odd. What is the problem you are seeing? Because
to me it seems kind of *trivial*. Take a big list of words (you can find
extensive lists for free with a little bit of googling), put then sorted
in an array, and that's about it. You can search the array or index it
randomly to get a random word.

(Of course if what you want is for the dictionary to take as little
memory as possible, then that's a completely different optimization
problem, one which has entire books written about.)

Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

Victor Bazarov · Aug 4, 2010

[..]
I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

There are two kinds of people when it comes to dictionaries: those that
use the wheels available today on the market, and those that re-invent
their own. If asked, I would guess that the former kind is more
numerous. If you want to improve the existing wheels, the best way is
to get a job at one of the existing wheel suppliers.

V

Juha Nieminen · Aug 5, 2010

Alexander said:
Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

I don't get it. Are you possibly talking about some embedded or
hand-held system with a few tens of kilobytes of RAM, or something?

I have made programs for the iPhone which use full-sized English (and
other language) dictionaries. Loading the dictionary into memory takes
a fraction of a second (even though the dictionary data is actually
*compressed* in the flash drive, so there's a decompression to memory
involved; it still takes just a fraction of a second).

On a desktop computer you could probably load such a dictionary into
memory an order of magnitude faster (not to talk that its size in memory
is inconsequential because desktop computers have typically at least an
order of magnitude more RAM avilable for apps than the iPhone).

Loading and using a full dictionary isn't such a heavy operation in
modern systems (even hand-held ones).

I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

Binary search is the simplest (especially in C++ because the standard
library offers it) and certainly fast enough.

Jorgen Grahn · Aug 5, 2010

The generation part somehow got lost in the discussion. Did that mean
forming new words according to rules, e.g. black, blacker, blackest?
If so I recommend a free library like aspell, ispell, or whatever they
are called.

Yes, first thing I want is to allocate dynamically a few mb's of
memory, and find a good way to convince the user to wait for the
allocation in case the program evem succeeds.

If allocating memory takes noticeable time, your system is probably
swapping heavily, and *everything* takes noticeable time.

I wanted to know what the practice is - it is quite common, but not
neccesairily trivial - even without storing the file in memory
(stupid) I have to find some quick enough algorithm to search for a
given word in it.

The first step up from "search a text file" is something like
BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
on disk. Feed the words into one of those, and you can look them up
quickly later.

Or check what the Unix look(1) command does.

/Jorgen

Juha Nieminen · Aug 5, 2010

Jorgen Grahn said:
The first step up from "search a text file" is something like
BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
on disk. Feed the words into one of those, and you can look them up
quickly later.

What's wrong with having the words sorted in an array and using
binary search? It's not like an English dictionary is going to change
during the execution of a typical program...

Jorgen Grahn · Aug 6, 2010

What's wrong with having the words sorted in an array and using
binary search?

Nothing. It's more I/O than using a BerkeleyDB though, if you are
looking up just a few words. Depends on what he's going to do.

The Unix look(1) command I mentioned is interesting. I traced the
Linux version. It just mmap()s "/usr/share/dict/words" (a sorted text
file), does some magic, and finds the prefix you look for. I bet it
does some kind of modified binary search directly in the mapped file.

It's not like an English dictionary is going to change
during the execution of a typical program...

That's one hint that DB may be overkill, yes. You buy a feature you
don't use.

/Jorgen

Single put routine overlapping words during iteration	4	Jan 2, 2023
Generate one HTML from API based on the object key language and their value	2	Aug 19, 2022
Translater + module + tkinter	1	Feb 16, 2023
SW to scan pdfs and generate report	1	Mar 22, 2023
Dictionary and List	1	Apr 26, 2021
How do i convert a Chinese DAT file from a game I play	2	Feb 4, 2022
How can I execute a function ONLY if fetch request returns 404 status?	0	Sep 17, 2022
How to create language translation program from dsl or bgl (by Open Source)	0	Nov 12, 2022

Generate English words from a dictionary?

Alexander

Öö Tiib

Juha Nieminen

Alexander

Victor Bazarov

Juha Nieminen

Jorgen Grahn

Juha Nieminen

Jorgen Grahn

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads