two interesting data structure/algorithm questions

D

de

First of all, sorry for cross posting. Couldn't find a dedicated
newgroup for this.

My friend asked me two questions about data structure and algorithms.
I think she got them from some job interview books. I felt they are
interesting but haven't found good solutions yet. Could someone give
some suggestions?

Q1: Suppose you are looking up a contact name on your cell phone's
address book. Suppose your contact's name is "abc123". Then you type
"a" in the search box. The cell phone shows all names start with
letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names start with letters "ab" in sorted order.
Then you type "c" after the "ab" and the cell phone shows all names
starts with letters "abc" in sorted order. You keep typing the contact
name and the list shown by your cell phone keeps updating and
shrinking untill at last either you find your contact name or you
don't.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. There could be more than one page number returned if the
string is cross page or if it's found on multiple pages. For example,
you have a string "this is an " on page 20 and then " interesting
book" on page 21. The string you are going to search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?
I can think out using hashes but since the string to look up can be
arbitrary, the hash table will be huge...
 
R

Richard Heathfield

de said:
First of all, sorry for cross posting. Couldn't find a dedicated
newgroup for this.

My friend asked me two questions about data structure and algorithms.

comp.programming might have been a better place.
I think she got them from some job interview books. I felt they are
interesting but haven't found good solutions yet. Could someone give
some suggestions?

Q1: Suppose you are looking up a contact name on your cell phone's
address book. Suppose your contact's name is "abc123". Then you type
"a" in the search box. The cell phone shows all names start with
letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names start with letters "ab" in sorted order.
Then you type "c" [etc]
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

You want a trie for that. A lookup costs O(log n).
Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. There could be more than one page number returned if the
string is cross page or if it's found on multiple pages. For example,
you have a string "this is an " on page 20 and then " interesting
book" on page 21. The string you are going to search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

You might want to ask Google about this. It's their area of expertise.
 
J

James Kanze

First of all, sorry for cross posting. Couldn't find a dedicated
newgroup for this.
My friend asked me two questions about data structure and algorithms.
I think she got them from some job interview books. I felt they are
interesting but haven't found good solutions yet. Could someone give
some suggestions?
Q1: Suppose you are looking up a contact name on your cell phone's
address book. Suppose your contact's name is "abc123". Then you type
"a" in the search box. The cell phone shows all names start with
letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names start with letters "ab" in sorted order.
Then you type "c" after the "ab" and the cell phone shows all names
starts with letters "abc" in sorted order. You keep typing the contact
name and the list shown by your cell phone keeps updating and
shrinking untill at last either you find your contact name or you
don't.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

If the question is about algorithms, the answer they are looking
for is probably a trie, but if I were writing the program in
C++, I'd actually probably use a sorted std::vector, with
std::upper_bound and std::lower_bound (appending a '\0' to the
end of the string when I call lower_bound, and a '\xFF' for
upper_bound), switching to a straight linear search once the
difference between top and bottom fell beneath a certain point.
The results will use a lot less memory, and probably be faster
as well. And of course, most of the code is already written:).
Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. There could be more than one page number returned if the
string is cross page or if it's found on multiple pages. For example,
you have a string "this is an " on page 20 and then " interesting
book" on page 21. The string you are going to search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?
I can think out using hashes but since the string to look up can be
arbitrary, the hash table will be huge...

Too large to fit into memory, or even onto a large disk.

The question is how the data is organized. If the text is in a
single array (std::vector<char>, or whatever), and the page
boundaries are maintained as a separate table of positions in
the array (and everything fits in memory), the fastest solution
is probably a BM search to find the string, then a binary search
(std::lower_bound and std::upper_bound again) to find the page
numbers. (Unless you're getting a hit every page, execution
time will be dominated by the string search, so organizing the
data as above, to optimize this, is probably the best way to go.
And if the strings being searched are reasonably long, a BM
search will beat anything in the standard library, perhaps even
by an order of magnitude or more. So it's worth it, even though
you have to write it yourself.)
 
U

user923005

First of all, sorry for cross posting. Couldn't find a dedicated
newgroup for this.

My friend asked me two questions about data structure and algorithms.
I think she got them from some job interview books. I felt they are
interesting but haven't found good solutions yet. Could someone give
some suggestions?

Q1: Suppose you are looking up a contact name on your cell phone's
address book. Suppose your contact's name is "abc123". Then you type
"a" in the search box. The cell phone shows all names start with
letter "a" in sorted order. Then you type "b" after the first "a". The
cell phone shows all names start with letters "ab" in sorted order.
Then you type "c" after the "ab" and the cell phone shows all names
starts with letters "abc" in sorted order. You keep typing the contact
name and the list shown by your cell phone keeps updating and
shrinking untill at last either you find your contact name or you
don't.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?

Fast algorithms for sorting and searching strings

Full text Pdf (1.10 MB)
Source Symposium on Discrete Algorithms archive
Proceedings of the eighth annual ACM-SIAM symposium on Discrete
algorithms table of contents

New Orleans, Louisiana, United States
Pages: 360 - 369
Year of Publication: 1997
ISBN:0-89871-390-0
Authors Jon L. Bentley Bell Labs, Lucent Technologies, 700 Mountain
Avenue, Murray Hill., NJ
Robert Sedgewick Princeton University, Princeton, NJ

Sponsors SIGACT: ACM Special Interest Group on Algorithms and
Computation Theory
SIAM : Society for Industrial and Applied Mathematics

Publisher Society for Industrial and Applied Mathematics
Philadelphia, PA, USA

Q2: Suppose you have a book written in English. Your goal is to search
for an arbitrary string and return the page numbers where this string
is found. There could be more than one page number returned if the
string is cross page or if it's found on multiple pages. For example,
you have a string "this is an " on page 20 and then " interesting
book" on page 21. The string you are going to search for could be
"this" or "this is" or "is an" or " an interesting" or "an interesting
book". Then you'll need to return page numbers [20], [20], [20],
[20,21], [20,21] respectively. Of course, if you look for "book", it
could return hundreds of page numbers since "book" is really common.
Question: what's the best data structure and algorithm to implement
this real-time lookup effeciently and what's the time complexity of
the implementation?
I can think out using hashes but since the string to look up can be
arbitrary, the hash table will be huge...

http://clucene.sourceforge.net/index.php/Main_Page

HTH

Follow-up set to news:comp.programming
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top