Binary search tree

M

maxim.novak

Hi,

I have to get list of URLs one by one and to find the URLs that I have
more than one time(can't be more than twice).

I thought to put them into binary search tree, this way they'll be
sorted and I'll be able to check if the URL already exist.

Couldn't find any python library that implements trees.
Is there some library of this kind in python? Or can I find it
somewhere else?
 
L

Larry Bates

Hi,

I have to get list of URLs one by one and to find the URLs that I have
more than one time(can't be more than twice).

I thought to put them into binary search tree, this way they'll be
sorted and I'll be able to check if the URL already exist.

Couldn't find any python library that implements trees.
Is there some library of this kind in python? Or can I find it
somewhere else?
Put them into a set. You can check for existence (very fast) and at the end it
is easy to sort.

-Larry
 
N

Neil Cerutti

Put them into a set. You can check for existence (very fast)
and at the end it is easy to sort.

The set is likely the way to go.

Python's library support for binary search trees consists of the
bisect module.
 
D

D.Hering

Hi,

I have to get list of URLs one by one and to find the URLs that I have
more than one time(can't be more than twice).

I thought to put them into binary search tree, this way they'll be
sorted and I'll be able to check if the URL already exist.

Couldn't find any python library that implements trees.
Is there some library of this kind in python? Or can I find it
somewhere else?

Can you use set() or set.difference()?
 
B

Bruno Desthuilliers

(e-mail address removed) a écrit :
Hi,

I have to get list of URLs one by one and to find the URLs that I have
more than one time(can't be more than twice).

I thought to put them into binary search tree, this way they'll be
sorted and I'll be able to check if the URL already exist.

What about a set ?

s = set()
for url in urls:
if url in s:
print "already have ", url
else:
set.add(url)
 
M

Michel Albert

(e-mail address removed) a écrit :




What about a set ?

s = set()
for url in urls:
if url in s:
print "already have ", url
else:
set.add(url)

Interesting. For this case I usually used dicts. As in:

d = {}
for url in urls:
if url in d:
print "already have ", url
else:
d = 1

Now, I can see that this method has some superfluous data (the `1`
that is assigned to the dict). So I suppose this is less memory
efficient. But is this slower then? As both implementations use hashes
of the URL to access the data. Just asking out of curiosity ;)
 
?

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Now, I can see that this method has some superfluous data (the `1`
that is assigned to the dict). So I suppose this is less memory
efficient. But is this slower then? As both implementations use hashes
of the URL to access the data. Just asking out of curiosity ;)

Performance-wise, there is no difference in the implementations.
What matters when comparing programs one-by-one is how many method
calls you need. In this example, the dictionary is slightly faster
in my measurements, since for the set, you need to perform a lookup
of .add, whereas the access to __setitem__ for the dict need no
additional dictionary lookup.

Regards,
Martin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,049
Latest member
Allen00Reed

Latest Threads

Top