two interesting data structure/algorithm questions

Discussion in 'C++' started by de, Nov 16, 2007.

1. deGuest

First of all, sorry for cross posting. Couldn't find a dedicated
newgroup for this.

I think she got them from some job interview books. I felt they are
interesting but haven't found good solutions yet. Could someone give
some suggestions?

Q1: Suppose you are looking up a contact name on your cell phone's
"a" in the search box. The cell phone shows all names start with
letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names start with letters "ab" in sorted order.
Then you type "c" after the "ab" and the cell phone shows all names
starts with letters "abc" in sorted order. You keep typing the contact
name and the list shown by your cell phone keeps updating and
shrinking untill at last either you find your contact name or you
don't.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. There could be more than one page number returned if the
string is cross page or if it's found on multiple pages. For example,
you have a string "this is an " on page 20 and then " interesting
book" on page 21. The string you are going to search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?
I can think out using hashes but since the string to look up can be
arbitrary, the hash table will be huge...

de, Nov 16, 2007

2. Richard HeathfieldGuest

de said:

> First of all, sorry for cross posting. Couldn't find a dedicated
> newgroup for this.
>
> My friend asked me two questions about data structure and algorithms.

comp.programming might have been a better place.

> I think she got them from some job interview books. I felt they are
> interesting but haven't found good solutions yet. Could someone give
> some suggestions?
>
> Q1: Suppose you are looking up a contact name on your cell phone's
> address book. Suppose your contact's name is "abc123". Then you type
> "a" in the search box. The cell phone shows all names start with
> letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names start with letters "ab" in sorted order.
> Then you type "c" [etc]
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?

You want a trie for that. A lookup costs O(log n).

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. There could be more than one page number returned if the
> string is cross page or if it's found on multiple pages. For example,
> you have a string "this is an " on page 20 and then " interesting
> book" on page 21. The string you are going to search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999

Richard Heathfield, Nov 16, 2007

3. James KanzeGuest

On Nov 16, 7:29 pm, de <> wrote:
> First of all, sorry for cross posting. Couldn't find a dedicated
> newgroup for this.

> My friend asked me two questions about data structure and algorithms.
> I think she got them from some job interview books. I felt they are
> interesting but haven't found good solutions yet. Could someone give
> some suggestions?

> Q1: Suppose you are looking up a contact name on your cell phone's
> address book. Suppose your contact's name is "abc123". Then you type
> "a" in the search box. The cell phone shows all names start with
> letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names start with letters "ab" in sorted order.
> Then you type "c" after the "ab" and the cell phone shows all names
> starts with letters "abc" in sorted order. You keep typing the contact
> name and the list shown by your cell phone keeps updating and
> shrinking untill at last either you find your contact name or you
> don't.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?

for is probably a trie, but if I were writing the program in
C++, I'd actually probably use a sorted std::vector, with
std::upper_bound and std::lower_bound (appending a '\0' to the
end of the string when I call lower_bound, and a '\xFF' for
upper_bound), switching to a straight linear search once the
difference between top and bottom fell beneath a certain point.
The results will use a lot less memory, and probably be faster
as well. And of course, most of the code is already written.

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. There could be more than one page number returned if the
> string is cross page or if it's found on multiple pages. For example,
> you have a string "this is an " on page 20 and then " interesting
> book" on page 21. The string you are going to search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?
> I can think out using hashes but since the string to look up can be
> arbitrary, the hash table will be huge...

Too large to fit into memory, or even onto a large disk.

The question is how the data is organized. If the text is in a
single array (std::vector<char>, or whatever), and the page
boundaries are maintained as a separate table of positions in
the array (and everything fits in memory), the fastest solution
is probably a BM search to find the string, then a binary search
(std::lower_bound and std::upper_bound again) to find the page
numbers. (Unless you're getting a hit every page, execution
time will be dominated by the string search, so organizing the
data as above, to optimize this, is probably the best way to go.
And if the strings being searched are reasonably long, a BM
search will beat anything in the standard library, perhaps even
by an order of magnitude or more. So it's worth it, even though
you have to write it yourself.)

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze, Nov 16, 2007
4. user923005Guest

On Nov 16, 10:29 am, de <> wrote:
> First of all, sorry for cross posting. Couldn't find a dedicated
> newgroup for this.
>
> My friend asked me two questions about data structure and algorithms.
> I think she got them from some job interview books. I felt they are
> interesting but haven't found good solutions yet. Could someone give
> some suggestions?
>
> Q1: Suppose you are looking up a contact name on your cell phone's
> address book. Suppose your contact's name is "abc123". Then you type
> "a" in the search box. The cell phone shows all names start with
> letter "a" in sorted order. Then you type "b" after the first "a". The
> cell phone shows all names start with letters "ab" in sorted order.
> Then you type "c" after the "ab" and the cell phone shows all names
> starts with letters "abc" in sorted order. You keep typing the contact
> name and the list shown by your cell phone keeps updating and
> shrinking untill at last either you find your contact name or you
> don't.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?

Fast algorithms for sorting and searching strings

Full text Pdf (1.10 MB)
Source Symposium on Discrete Algorithms archive
Proceedings of the eighth annual ACM-SIAM symposium on Discrete

New Orleans, Louisiana, United States
Pages: 360 - 369
Year of Publication: 1997
ISBN:0-89871-390-0
Authors Jon L. Bentley Bell Labs, Lucent Technologies, 700 Mountain
Avenue, Murray Hill., NJ
Robert Sedgewick Princeton University, Princeton, NJ

Sponsors SIGACT: ACM Special Interest Group on Algorithms and
Computation Theory
SIAM : Society for Industrial and Applied Mathematics

Publisher Society for Industrial and Applied Mathematics

> Q2: Suppose you have a book written in English. Your goal is to search
> for an arbitrary string and return the page numbers where this string
> is found. There could be more than one page number returned if the
> string is cross page or if it's found on multiple pages. For example,
> you have a string "this is an " on page 20 and then " interesting
> book" on page 21. The string you are going to search for could be
> "this" or "this is" or "is an" or " an interesting" or "an interesting
> book". Then you'll need to return page numbers [20], [20], [20],
> [20,21], [20,21] respectively. Of course, if you look for "book", it
> could return hundreds of page numbers since "book" is really common.
> Question: what's the best data structure and algorithm to implement
> this real-time lookup effeciently and what's the time complexity of
> the implementation?
> I can think out using hashes but since the string to look up can be
> arbitrary, the hash table will be huge...

http://clucene.sourceforge.net/index.php/Main_Page

HTH

Follow-up set to news:comp.programming

user923005, Nov 16, 2007