two interesting data structure/algorithm questions

Discussion in 'C++' started by de, Nov 16, 2007.

  1. de

    de Guest

    First of all, sorry for cross posting. Couldn't find a dedicated
    newgroup for this.

    My friend asked me two questions about data structure and algorithms.
    I think she got them from some job interview books. I felt they are
    interesting but haven't found good solutions yet. Could someone give
    some suggestions?

    Q1: Suppose you are looking up a contact name on your cell phone's
    address book. Suppose your contact's name is "abc123". Then you type
    "a" in the search box. The cell phone shows all names start with
    letter "a" in sorted order. Then you type "b" after the first "a". The
    cell phone shows all names start with letters "ab" in sorted order.
    Then you type "c" after the "ab" and the cell phone shows all names
    starts with letters "abc" in sorted order. You keep typing the contact
    name and the list shown by your cell phone keeps updating and
    shrinking untill at last either you find your contact name or you
    don't.
    Question: what's the best data structure and algorithm to implement
    this real-time lookup effeciently and what's the time complexity of
    the implementation?

    Q2: Suppose you have a book written in English. Your goal is to search
    for an arbitrary string and return the page numbers where this string
    is found. There could be more than one page number returned if the
    string is cross page or if it's found on multiple pages. For example,
    you have a string "this is an " on page 20 and then " interesting
    book" on page 21. The string you are going to search for could be
    "this" or "this is" or "is an" or " an interesting" or "an interesting
    book". Then you'll need to return page numbers [20], [20], [20],
    [20,21], [20,21] respectively. Of course, if you look for "book", it
    could return hundreds of page numbers since "book" is really common.
    Question: what's the best data structure and algorithm to implement
    this real-time lookup effeciently and what's the time complexity of
    the implementation?
    I can think out using hashes but since the string to look up can be
    arbitrary, the hash table will be huge...
     
    de, Nov 16, 2007
    #1
    1. Advertising

  2. de said:

    > First of all, sorry for cross posting. Couldn't find a dedicated
    > newgroup for this.
    >
    > My friend asked me two questions about data structure and algorithms.


    comp.programming might have been a better place.

    > I think she got them from some job interview books. I felt they are
    > interesting but haven't found good solutions yet. Could someone give
    > some suggestions?
    >
    > Q1: Suppose you are looking up a contact name on your cell phone's
    > address book. Suppose your contact's name is "abc123". Then you type
    > "a" in the search box. The cell phone shows all names start with
    > letter "a" in sorted order. Then you type "b" after the first "a". The
    > cell phone shows all names start with letters "ab" in sorted order.
    > Then you type "c" [etc]
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?


    You want a trie for that. A lookup costs O(log n).

    > Q2: Suppose you have a book written in English. Your goal is to search
    > for an arbitrary string and return the page numbers where this string
    > is found. There could be more than one page number returned if the
    > string is cross page or if it's found on multiple pages. For example,
    > you have a string "this is an " on page 20 and then " interesting
    > book" on page 21. The string you are going to search for could be
    > "this" or "this is" or "is an" or " an interesting" or "an interesting
    > book". Then you'll need to return page numbers [20], [20], [20],
    > [20,21], [20,21] respectively. Of course, if you look for "book", it
    > could return hundreds of page numbers since "book" is really common.
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?


    You might want to ask Google about this. It's their area of expertise.

    --
    Richard Heathfield <http://www.cpax.org.uk>
    Email: -http://www. +rjh@
    Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
    "Usenet is a strange place" - dmr 29 July 1999
     
    Richard Heathfield, Nov 16, 2007
    #2
    1. Advertising

  3. de

    James Kanze Guest

    On Nov 16, 7:29 pm, de <> wrote:
    > First of all, sorry for cross posting. Couldn't find a dedicated
    > newgroup for this.


    > My friend asked me two questions about data structure and algorithms.
    > I think she got them from some job interview books. I felt they are
    > interesting but haven't found good solutions yet. Could someone give
    > some suggestions?


    > Q1: Suppose you are looking up a contact name on your cell phone's
    > address book. Suppose your contact's name is "abc123". Then you type
    > "a" in the search box. The cell phone shows all names start with
    > letter "a" in sorted order. Then you type "b" after the first "a". The
    > cell phone shows all names start with letters "ab" in sorted order.
    > Then you type "c" after the "ab" and the cell phone shows all names
    > starts with letters "abc" in sorted order. You keep typing the contact
    > name and the list shown by your cell phone keeps updating and
    > shrinking untill at last either you find your contact name or you
    > don't.
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?


    If the question is about algorithms, the answer they are looking
    for is probably a trie, but if I were writing the program in
    C++, I'd actually probably use a sorted std::vector, with
    std::upper_bound and std::lower_bound (appending a '\0' to the
    end of the string when I call lower_bound, and a '\xFF' for
    upper_bound), switching to a straight linear search once the
    difference between top and bottom fell beneath a certain point.
    The results will use a lot less memory, and probably be faster
    as well. And of course, most of the code is already written:).

    > Q2: Suppose you have a book written in English. Your goal is to search
    > for an arbitrary string and return the page numbers where this string
    > is found. There could be more than one page number returned if the
    > string is cross page or if it's found on multiple pages. For example,
    > you have a string "this is an " on page 20 and then " interesting
    > book" on page 21. The string you are going to search for could be
    > "this" or "this is" or "is an" or " an interesting" or "an interesting
    > book". Then you'll need to return page numbers [20], [20], [20],
    > [20,21], [20,21] respectively. Of course, if you look for "book", it
    > could return hundreds of page numbers since "book" is really common.
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?
    > I can think out using hashes but since the string to look up can be
    > arbitrary, the hash table will be huge...


    Too large to fit into memory, or even onto a large disk.

    The question is how the data is organized. If the text is in a
    single array (std::vector<char>, or whatever), and the page
    boundaries are maintained as a separate table of positions in
    the array (and everything fits in memory), the fastest solution
    is probably a BM search to find the string, then a binary search
    (std::lower_bound and std::upper_bound again) to find the page
    numbers. (Unless you're getting a hit every page, execution
    time will be dominated by the string search, so organizing the
    data as above, to optimize this, is probably the best way to go.
    And if the strings being searched are reasonably long, a BM
    search will beat anything in the standard library, perhaps even
    by an order of magnitude or more. So it's worth it, even though
    you have to write it yourself.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Nov 16, 2007
    #3
  4. de

    user923005 Guest

    On Nov 16, 10:29 am, de <> wrote:
    > First of all, sorry for cross posting. Couldn't find a dedicated
    > newgroup for this.
    >
    > My friend asked me two questions about data structure and algorithms.
    > I think she got them from some job interview books. I felt they are
    > interesting but haven't found good solutions yet. Could someone give
    > some suggestions?
    >
    > Q1: Suppose you are looking up a contact name on your cell phone's
    > address book. Suppose your contact's name is "abc123". Then you type
    > "a" in the search box. The cell phone shows all names start with
    > letter "a" in sorted order. Then you type "b" after the first "a". The
    > cell phone shows all names start with letters "ab" in sorted order.
    > Then you type "c" after the "ab" and the cell phone shows all names
    > starts with letters "abc" in sorted order. You keep typing the contact
    > name and the list shown by your cell phone keeps updating and
    > shrinking untill at last either you find your contact name or you
    > don't.
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?


    Fast algorithms for sorting and searching strings

    Full text Pdf (1.10 MB)
    Source Symposium on Discrete Algorithms archive
    Proceedings of the eighth annual ACM-SIAM symposium on Discrete
    algorithms table of contents

    New Orleans, Louisiana, United States
    Pages: 360 - 369
    Year of Publication: 1997
    ISBN:0-89871-390-0
    Authors Jon L. Bentley Bell Labs, Lucent Technologies, 700 Mountain
    Avenue, Murray Hill., NJ
    Robert Sedgewick Princeton University, Princeton, NJ

    Sponsors SIGACT: ACM Special Interest Group on Algorithms and
    Computation Theory
    SIAM : Society for Industrial and Applied Mathematics

    Publisher Society for Industrial and Applied Mathematics
    Philadelphia, PA, USA


    > Q2: Suppose you have a book written in English. Your goal is to search
    > for an arbitrary string and return the page numbers where this string
    > is found. There could be more than one page number returned if the
    > string is cross page or if it's found on multiple pages. For example,
    > you have a string "this is an " on page 20 and then " interesting
    > book" on page 21. The string you are going to search for could be
    > "this" or "this is" or "is an" or " an interesting" or "an interesting
    > book". Then you'll need to return page numbers [20], [20], [20],
    > [20,21], [20,21] respectively. Of course, if you look for "book", it
    > could return hundreds of page numbers since "book" is really common.
    > Question: what's the best data structure and algorithm to implement
    > this real-time lookup effeciently and what's the time complexity of
    > the implementation?
    > I can think out using hashes but since the string to look up can be
    > arbitrary, the hash table will be huge...


    http://clucene.sourceforge.net/index.php/Main_Page

    HTH

    Follow-up set to news:comp.programming
     
    user923005, Nov 16, 2007
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. bdong
    Replies:
    0
    Views:
    316
    bdong
    Oct 6, 2004
  2. yee young han
    Replies:
    2
    Views:
    503
    Jerry Coffin
    May 22, 2005
  3. Jane Austine
    Replies:
    14
    Views:
    793
    Dennis Lee Bieber
    Oct 9, 2004
  4. Jane Austine
    Replies:
    2
    Views:
    458
    Changjune Kim
    Oct 5, 2004
  5. de
    Replies:
    3
    Views:
    458
    user923005
    Nov 16, 2007
Loading...

Share This Page