Hashtable or Btree?

Eloff · Dec 23, 2004

I've got 100MB of URLs organized by domain and then by document. I
thought that a hashtable of hashtables or a btree of btrees would be a
good way to look up a specific URL quickly by first finding the domain
and then finding the matching document. What do you think would be
better? And do you have any implementation you recommend?

Thanks, and merry Christmas folks.

-Dan

Ivan Vecerina · Dec 23, 2004

Eloff said:
I've got 100MB of URLs organized by domain and then by document. I
thought that a hashtable of hashtables or a btree of btrees would be a
good way to look up a specific URL quickly by first finding the domain
and then finding the matching document. What do you think would be
better?

Probably a hash table.
Note that in this case it may be unnecessary to make a
hash table of hash tables - hashing the full URL once might do as well.

And do you have any implementation you recommend?

The pre-standard hash_map or unordered_map that your standard library
implementation is very likely to include should work well.
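
As a rough sketch of the single-table idea, assuming a library that
provides unordered_map under namespace std (older ones may offer it as
hash_map or under std::tr1 instead). DocInfo, UrlIndex and find_url are
hypothetical names, and the stored offset is only a placeholder, since
the thread does not say what is kept per URL:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical record type; what is stored per URL is an assumption.
struct DocInfo {
    long offset;  // e.g. position of the document in some data file
};

// One flat table keyed by the complete URL string.
using UrlIndex = std::unordered_map<std::string, DocInfo>;

// Returns a pointer to the record for `url`, or nullptr if it is not indexed.
// The full URL is hashed once; there is no separate domain step.
const DocInfo* find_url(const UrlIndex& index, const std::string& url) {
    auto it = index.find(url);
    if (it == index.end()) return nullptr;
    return &it->second;
}
```

A caller would fill the table once, e.g. index[url] = DocInfo{pos}, and
then call find_url(index, url) for each lookup.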

If anything were to be optimized, it would probably be the string
storage... eventually.

hth -Ivan

Dave O'Hearn · Dec 23, 2004

Eloff said:
I've got 100MB of URLs organized by domain and then by document.
I thought that a hashtable of hashtables or a btree of btrees
would be a good way to look up a specific URL quickly by first
finding the domain and then finding the matching document. What
do you think would be better? And do you have any implementation
you recommend?

I don't see why you would need containers of containers. Your key could
be the (domain, document) tuple or just the URL, for either hashtables
or btrees.
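
To illustrate the (domain, document) key, here is a minimal sketch.
It uses std::map only because it is at hand: that is an in-memory
balanced tree, not a btree, and it does not work on a file. UrlKey,
UrlTree and count_documents are hypothetical names, and the long value
is a placeholder:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Composite key: std::pair compares lexicographically, so all
// documents of one domain end up adjacent in the ordered container.
using UrlKey = std::pair<std::string, std::string>;
using UrlTree = std::map<UrlKey, long>;  // value type is an assumption

// Counts the documents stored for one domain by walking its key range.
std::size_t count_documents(const UrlTree& tree, const std::string& domain) {
    // (domain, "") is the smallest possible key for this domain.
    auto it = tree.lower_bound(UrlKey(domain, std::string()));
    std::size_t n = 0;
    for (; it != tree.end() && it->first.first == domain; ++it) ++n;
    return n;
}
```

An exact lookup is tree.find(UrlKey(domain, document)). The range walk
shows what an ordered key gives you over a hash: per-domain queries
without a second container.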

None of the C++ standard containers sounds suitable for this. hash_map
is not standard, and while std::map is, and probably does use a tree,
it is not going to be a btree. Also, those sorts of containers would
build data structures in memory, rather than operating on the file
directly. I don't know of any popular libraries that would do what
you need either. Most people would just use a database; you could try
Google for "embedded database". I found some in C, but not C++.

Cy Edmunds · Dec 23, 2004

Eloff said:
I've got 100MB of URLs organized by domain and then by document. I
thought that a hashtable of hashtables or a btree of btrees would be a
good way to look up a specific URL quickly by first finding the domain
and then finding the matching document. What do you think would be
better? And do you have any implementation you recommend?

Thanks, and merry Christmas folks.

-Dan

I recommend you put your data in a database first. Then you can use any
programming language to search as you please using SQL queries. A good free
database is MySQL:

http://www.mysql.com/

Martin Stettner · Dec 26, 2004

Dave said:
...
I don't see why you would need containers of containers. Your key could
be the (domain, document) tuple or just the URL, for either hashtables
or btrees.

If you have many documents per domain, it could make sense to organize
the data this way in order to reduce memory consumption.
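
A minimal sketch of that layout, assuming std::unordered_map is
available. DocTable, DomainTable and find_document are hypothetical
names, and the long value is a placeholder. Each domain string is
stored once as an outer key instead of being repeated in every URL;
whether that saves memory overall depends on how many documents share
a domain, since every inner table has its own overhead:

```cpp
#include <string>
#include <unordered_map>

// Inner table: document path -> value (value type is an assumption).
using DocTable = std::unordered_map<std::string, long>;
// Outer table: domain -> its documents. The domain is stored once.
using DomainTable = std::unordered_map<std::string, DocTable>;

// Two-step lookup: domain first, then document. Returns nullptr on a miss.
const long* find_document(const DomainTable& domains,
                          const std::string& domain,
                          const std::string& document) {
    auto d = domains.find(domain);
    if (d == domains.end()) return nullptr;
    auto doc = d->second.find(document);
    if (doc == d->second.end()) return nullptr;
    return &doc->second;
}
```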

...
you need either. Most people would just use a database; you could try
Google for "embedded database". I found some in C, but not C++.

sqlite (www.sqlite.org) may be a good choice ...

greetings
Martin
