result sorting in socket.getaddrinfo?

  • Thread starter Bernhard Schmidt

Bernhard Schmidt

Hello,

sorry to bother you. I'm not a programmer and I don't do much Python;
I'm more of a networking guy trying to get his favourite Linux distribution
to update over the shiny new protocol IPv6 again (for those who are
interested, I'm talking about Gentoo Linux).

Gentoo's portage system is implemented in Python and calls rsync to sync
with a mirror server. There are rotations (metahostnames with many
address records) where portage has to decide to which IP it wants to
sync.

Basically the program needs a list of all available IP addresses and
will cycle through those until the sync is finished successfully.

The old code looked like this:

| ips = socket.gethostbyname_ex(hostname)[2]

If you test this, for example with rsync.de.gentoo.org as the hostname, you
will get a list of addresses whose order changes with every call.
This behaviour is used for load balancing and failover across the
servers.

Now, to support IPv6 addresses, one has to use socket.getaddrinfo. This is
my current attempt (don't laugh :) ):

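| import socket
| ips = []
| # Each entry is (family, socktype, proto, canonname, sockaddr).
| ipsockets = socket.getaddrinfo(hostname, None, 0, socket.SOCK_STREAM)
| for entry in ipsockets:
|     if entry[0] == socket.AF_INET6:
|         # put IPv6 literals in brackets
|         ips.append('[' + entry[4][0] + ']')
|     else:
|         ips.append(entry[4][0])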

Big problem: The result of getaddrinfo(), and therefore ips, is sorted
in some way. If you again do

| >>> socket.getaddrinfo("rsync.de.gentoo.org",None,0,socket.SOCK_STREAM);

and have a closer look at the resulting list you will observe two
things (at least I do on my box)

a) The two IPv6 addresses ([2001:andsoon]) are always in front of the
IPv4 addresses. This is expected behaviour and is consistent with
most applications/stacks supporting IPv6

b) The records within one address family (IPv4 or IPv6) are not really
in randomized order. I called it several hundred times now and the
order of the IPv6 records is always

'2001:638:500:101::21', '2001:7b0:11ff:1::1:1'

Even worse, there are 15 IPv4 records in that list, and so far I have
seen only two of them at the beginning of the list.
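
The observation in (b) can be checked with a short counting loop. What follows is only a sketch, assuming a current Python 3 (the 2.3.3 interpreter mentioned in this post has no collections.Counter). The function name count_first_addresses and its resolve parameter are made up for illustration; resolve exists only so the loop can be tried without a network.

```python
import socket
from collections import Counter


def count_first_addresses(hostname, calls=200, resolve=socket.getaddrinfo):
    """Count how often each address comes first within its address family.

    `resolve` is a hypothetical hook; by default it is socket.getaddrinfo.
    """
    firsts = Counter()
    for _ in range(calls):
        seen_families = set()
        results = resolve(hostname, None, 0, socket.SOCK_STREAM)
        for family, _type, _proto, _canon, sockaddr in results:
            # Only the first record of each family is counted per call.
            if family not in seen_families:
                seen_families.add(family)
                firsts[(family, sockaddr[0])] += 1
    return firsts
```

Calling count_first_addresses("rsync.de.gentoo.org") and printing the result shows, per family, which addresses ever lead the list. With a truly rotating order the counts should be spread over many addresses.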

When I look at the on-wire format of the DNS queries I can see that the
resolver does indeed answer in randomized order, so the sorting
seems to happen either somewhere in Python or somewhere in
glibc.

The consequence would be that the two servers at the front of the
list get hammered with traffic while the others sit idle.

1.) Is it possible to change this behaviour?

2.) If not, does someone have a code snippet available for randomizing
the resulting list or another idea how to solve this?

Python 2.3.3 (#1, Jun 4 2004, 00:57:34)
[GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2

[ some time later ]
Gnah, I just found a way that even I, being a non-programmer (especially
regarding C), could test this: using OpenSSH I verified that C
programs also suffer from this problem, so it's not Python's fault.
Disregard question one above :)
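
For question two, one possible approach is to shuffle the records within each address family, so that IPv6 still comes before IPv4 but no single server is always first. This is a rough sketch only, assuming a current Python 3. The name shuffled_addresses and the rng parameter are invented here and are not part of portage or the socket module.

```python
import random
import socket


def shuffled_addresses(addrinfo, rng=random):
    """Return address strings with each family's records shuffled.

    `addrinfo` is a list as returned by socket.getaddrinfo(). Families
    keep the order in which they first appear; IPv6 literals are
    wrapped in brackets, as in the loop earlier in this post.
    """
    by_family = {}
    family_order = []
    for family, _type, _proto, _canon, sockaddr in addrinfo:
        if family not in by_family:
            by_family[family] = []
            family_order.append(family)
        address = sockaddr[0]
        if family == socket.AF_INET6:
            address = '[' + address + ']'
        # Skip duplicates in case the same address is listed twice.
        if address not in by_family[family]:
            by_family[family].append(address)
    ips = []
    for family in family_order:
        rng.shuffle(by_family[family])  # randomize within one family only
        ips.extend(by_family[family])
    return ips
```

It would be used as ips = shuffled_addresses(socket.getaddrinfo(hostname, None, 0, socket.SOCK_STREAM)). Shuffling the whole list instead would be simpler, but it would lose the IPv6-first preference described in (a).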

Thanks a lot
Bernhard
 
