Hash structure or Binary Search Tree...

E

exonic

I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.

I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.

The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...

Thank You,
Sig
 
C

CBFalconer

exonic said:
....snip ...

I need an efficient structure to store a counter and a series of
IP addresses. I would like to use something that's not going to
slow down once given large amounts of data. I'd like it to run
as close to O(n) as possible.

The main goal of this project is to create a daemon, which
receives data delimited by a new line, hashes the data, looks it
up and increments the associated counter. If the counter reaches a
configured threshold, send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...

You could use the hashlib.zip library. This reduces your problem
to optimizing the hash functions, and the status feedback from the
library will help you do this. It will run O(1), which is
considerably better than O(n). See:

<http://cbfalconer.home.att.net/download/>

A slight modification of the wdfreq demo program should do it.
 
A

Artie Gold

exonic said:
I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.

Well, you really have a data structure/algorithm question as opposed to
a question about the C language. The newsgroup
would be the right place to ask. (And no, the fact that you're writing
this in C is not relevant.)
I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.

O(n)? You're joking, right? ;-) You'd achieve O(n) by using a linked
list, the most naive possible approach. A balanced binary tree would
give you O(log n) -- and a hash table O(1) (though, the devil is inthe
constants).
The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.

Daemons are also not topical here. Going to
would be appropriate.
Any comments/questions would be much appreciated...
<OT>
In a situation like this, the likely access pattern of the IP addresses
can be quite significant. A web search on some of the following might be
fruitful:

Red-black trees
AVL trees
Splay trees
Skip lists
</OT>

In any event, *please* read the FAQ at:

http://www.eskimo.com/~scs/C-faq/top.html

HTH,
--ag
 
J

Jack Klein

I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.

I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.

The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...

Thank You,
Sig

Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.
 
C

CBFalconer

Jack said:
.... snip ...

Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.

Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.

The hash table solution also automatically adapts to IPV6
addresses, and can easily handle input in either binary (32 or 48
bit words) or text strings.
 
J

Jeremy Yallop

CBFalconer said:
Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.

A trie offers O(1) access, plus it keeps the data ordered.

Jeremy.
 
E

Eric Sosman

CBFalconer said:
Does it? I would have expected O(logN) offhand.

<off-topic>

It's a different "N." A trie takes O(log N) where
N is the length of a single key, not the number of keys
stored. If all keys have the same N, log N is a constant
and the retrieval time is thus O(1).

A few months back somebody on a different newsgroup
was proposing to store about 100000 IPv4 addresses in
four-level trie with 256-way branching at each node.
A few back-of-the-envelope computations convinced him
this would be an extravagantly wasteful data structure:
the top two levels would be full or nearly so, but the
bottom two levels (the bulk of the trie) would be almost
empty, averaging less than two keys apiece.

See also Knuth "The Art of Computer Programming,
Volume III: Sorting and Searching," Section 6.3.

</off-topic>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top