Generate English words from a dictionary?

Discussion in 'C++' started by Alexander, Aug 4, 2010.

  1. Alexander

    Alexander Guest

    What choices do I have if I need to be able to generate English words,
    or check whether a word exists (like in spell checking)? Language
    preferred - C or C++.
    Alexander, Aug 4, 2010
    #1
    1. Advertising

  2. Alexander

    Öö Tiib Guest

    On 4 aug, 20:46, Alexander <> wrote:
    > What choices do I have if I need to be able to generate English words,
    > or check whether a word exists (like in spell checking)? Language
    > preferred - C or C++.


    Spell checkers are part of such well known open source software like
    Mozilla Firefox and OpenOffice.org . Probably in C or C++. Just
    download source code, take and use, what holds you back?
    Öö Tiib, Aug 4, 2010
    #2
    1. Advertising

  3. Alexander <> wrote:
    > What choices do I have if I need to be able to generate English words,
    > or check whether a word exists (like in spell checking)? Language
    > preferred - C or C++.


    I find this question odd. What is the problem you are seeing? Because
    to me it seems kind of *trivial*. Take a big list of words (you can find
    extensive lists for free with a little bit of googling), put then sorted
    in an array, and that's about it. You can search the array or index it
    randomly to get a random word.

    (Of course if what you want is for the dictionary to take as little
    memory as possible, then that's a completely different optimization
    problem, one which has entire books written about.)
    Juha Nieminen, Aug 4, 2010
    #3
  4. Alexander

    Alexander Guest

    On Aug 4, 9:11 pm, Juha Nieminen <> wrote:
    > Alexander <> wrote:
    > > What choices do I have if I need to be able to generate English words,
    > > or check whether a word exists (like in spell checking)? Language
    > > preferred - C or C++.

    >
    > I find this question odd. What is the problem you are seeing? Because
    > to me it seems kind of *trivial*. Take a big list of words (you can find
    > extensive lists for free with a little bit of googling), put then sorted
    > in an array, and that's about it. You can search the array or index it
    > randomly to get a random word.
    >
    > (Of course if what you want is for the dictionary to take as little
    > memory as possible, then that's a completely different optimization
    > problem, one which has entire books written about.)


    Yes, first thing I want is to allocate dynamically a few mb's of
    memory, and find a good way to convince the user to wait for the
    allocation in case the program evem succeeds.

    I wanted to know what the practice is - it is quite common, but not
    neccesairily trivial - even without storing the file in memory
    (stupid) I have to find some quick enough algorithm to search for a
    given word in it.
    Alexander, Aug 4, 2010
    #4
  5. On 8/4/2010 2:23 PM, Alexander wrote:
    >[..]
    > I wanted to know what the practice is - it is quite common, but not
    > neccesairily trivial - even without storing the file in memory
    > (stupid) I have to find some quick enough algorithm to search for a
    > given word in it.


    There are two kinds of people when it comes to dictionaries: those that
    use the wheels available today on the market, and those that re-invent
    their own. If asked, I would guess that the former kind is more
    numerous. If you want to improve the existing wheels, the best way is
    to get a job at one of the existing wheel suppliers.

    V
    --
    I do not respond to top-posted replies, please don't ask
    Victor Bazarov, Aug 4, 2010
    #5
  6. Alexander <> wrote:
    > Yes, first thing I want is to allocate dynamically a few mb's of
    > memory, and find a good way to convince the user to wait for the
    > allocation in case the program evem succeeds.


    I don't get it. Are you possibly talking about some embedded or
    hand-held system with a few tens of kilobytes of RAM, or something?

    I have made programs for the iPhone which use full-sized English (and
    other language) dictionaries. Loading the dictionary into memory takes
    a fraction of a second (even though the dictionary data is actually
    *compressed* in the flash drive, so there's a decompression to memory
    involved; it still takes just a fraction of a second).

    On a desktop computer you could probably load such a dictionary into
    memory an order of magnitude faster (not to talk that its size in memory
    is inconsequential because desktop computers have typically at least an
    order of magnitude more RAM avilable for apps than the iPhone).

    Loading and using a full dictionary isn't such a heavy operation in
    modern systems (even hand-held ones).

    > I wanted to know what the practice is - it is quite common, but not
    > neccesairily trivial - even without storing the file in memory
    > (stupid) I have to find some quick enough algorithm to search for a
    > given word in it.


    Binary search is the simplest (especially in C++ because the standard
    library offers it) and certainly fast enough.
    Juha Nieminen, Aug 5, 2010
    #6
  7. Alexander

    Jorgen Grahn Guest

    On Wed, 2010-08-04, Alexander wrote:
    > On Aug 4, 9:11 pm, Juha Nieminen <> wrote:
    >> Alexander <> wrote:
    >> > What choices do I have if I need to be able to generate English words,


    The generation part somehow got lost in the discussion. Did that mean
    forming new words according to rules, e.g. black, blacker, blackest?
    If so I recommend a free library like aspell, ispell, or whatever they
    are called.

    >> > or check whether a word exists (like in spell checking)? Language
    >> > preferred - C or C++.

    >>
    >> I find this question odd. What is the problem you are seeing? Because
    >> to me it seems kind of *trivial*. Take a big list of words (you can find
    >> extensive lists for free with a little bit of googling), put then sorted
    >> in an array, and that's about it. You can search the array or index it
    >> randomly to get a random word.
    >>
    >> (Of course if what you want is for the dictionary to take as little
    >> memory as possible, then that's a completely different optimization
    >> problem, one which has entire books written about.)

    >
    > Yes, first thing I want is to allocate dynamically a few mb's of
    > memory, and find a good way to convince the user to wait for the
    > allocation in case the program evem succeeds.


    If allocating memory takes noticeable time, your system is probably
    swapping heavily, and *everything* takes noticeable time.

    > I wanted to know what the practice is - it is quite common, but not
    > neccesairily trivial - even without storing the file in memory
    > (stupid) I have to find some quick enough algorithm to search for a
    > given word in it.


    The first step up from "search a text file" is something like
    BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
    on disk. Feed the words into one of those, and you can look them up
    quickly later.

    Or check what the Unix look(1) command does.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 5, 2010
    #7
  8. Jorgen Grahn <> wrote:
    > The first step up from "search a text file" is something like
    > BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
    > on disk. Feed the words into one of those, and you can look them up
    > quickly later.


    What's wrong with having the words sorted in an array and using
    binary search? It's not like an English dictionary is going to change
    during the execution of a typical program...
    Juha Nieminen, Aug 5, 2010
    #8
  9. Alexander

    Jorgen Grahn Guest

    On Thu, 2010-08-05, Juha Nieminen wrote:
    > Jorgen Grahn <> wrote:
    >> The first step up from "search a text file" is something like
    >> BerkeleyDB, which is like a std::map or std::tr1::unordered_map stored
    >> on disk. Feed the words into one of those, and you can look them up
    >> quickly later.

    >
    > What's wrong with having the words sorted in an array and using
    > binary search?


    Nothing. It's more I/O than using a BerkeleyDB though, if you are
    looking up just a few words. Depends on what he's going to do.

    The Unix look(1) command I mentioned is interesting. I traced the
    Linux version. It just mmap()s "/usr/share/dict/words" (a sorted text
    file), does some magic, and finds the prefix you look for. I bet it
    does some kind of modified binary search directly in the mapped file.

    > It's not like an English dictionary is going to change
    > during the execution of a typical program...


    That's one hint that DB may be overkill, yes. You buy a feature you
    don't use.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Aug 6, 2010
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?UmFlZCBTYXdhbGhh?=

    English/English DLL

    =?Utf-8?B?UmFlZCBTYXdhbGhh?=, Oct 15, 2005, in forum: ASP .Net
    Replies:
    2
    Views:
    1,658
    =?Utf-8?B?UmFlZCBTYXdhbGhh?=
    Oct 16, 2005
  2. IchBin
    Replies:
    1
    Views:
    756
  3. Replies:
    1
    Views:
    521
    Peter Flynn
    Jul 6, 2005
  4. Jaewoong kim Crossbreeze
    Replies:
    1
    Views:
    116
    Jaewoong kim Crossbreeze
    Apr 17, 2009
  5. rahul
    Replies:
    12
    Views:
    216
    Gunnar Hjalmarsson
    May 12, 2005
Loading...

Share This Page