ulimit on open sockets?


Maxim Veksler

Hi,

I've written this code; the general idea is to listen for connections on all
65535 TCP ports.
"""
#!/usr/bin/env python

import socket, select

def get_non_blocking_socket(port_number):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    s.bind(('0.0.0.0', port_number))
    s.listen(1)
    return s

all_sockets = map(get_non_blocking_socket, xrange(10000, 15000))

while 1:
    ready_to_read, ready_to_write, in_error = select.select(all_sockets, [], [], 0)
    for nb_active_socket in all_sockets:
        if nb_active_socket in ready_to_read:
            conn, addr = nb_active_socket.accept()
            while 1:
                data = conn.recv(1024)
                if not data: break
                conn.send(data)
            conn.close()
"""

The thing is that when I tried to run this at first I got
"""
python non_blocking_range.py
Traceback (most recent call last):
File "non_blocking_range.py", line 12, in ?
all_sockets = map(get_non_blocking_socket, xrange(10000, 15000))
File "non_blocking_range.py", line 6, in get_non_blocking_socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
File "/usr/lib/python2.4/socket.py", line 148, in __init__
_sock = _realsocket(family, type, proto)
socket.error: (24, 'Too many open files')
"""

So I set ulimit -n 500000, and now I'm getting
"""
python non_blocking_range.py
Traceback (most recent call last):
File "non_blocking_range.py", line 15, in ?
ready_to_read, ready_to_write, in_error =
select.select(all_sockets, [], [], 0)
ValueError: filedescriptor out of range in select()
"""

Should I be using a different version of select or something? Or
should I implement this the other way around? If so, please suggest
how.

Thank you very much,
(enthusiastically learning python) Maxim.
 

Bjoern Schliessmann

Maxim said:
I've written this code; the general idea is to listen for connections
on all 65535 TCP ports.

Please excuse the question: Why would anyone want to do such a manic
thing (instead of, e. g., using raw sockets)?

Regards,


Björn
 

Alex Martelli

Maxim Veksler said:
ValueError: filedescriptor out of range in select()
"""

Should I be using a different version of select or something? Or

select typically supports 1024 FDs at most (a design limit of the
underlying operating system). You may want to try poll instead (epoll
might be better but I doubt Python supports it yet).


Alex
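
To illustrate the select.poll interface Alex mentions, here is a minimal sketch (assuming Python 2.4's select module on a Unix-like system; the port range and the trivial connection handling are placeholders, not code from the thread):

"""
#!/usr/bin/env python
# Sketch: accept connections on many ports using select.poll, which is
# not subject to select()'s FD_SETSIZE (usually 1024) limit.
import socket, select

def get_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    s.bind(('0.0.0.0', port))
    s.listen(1)
    return s

listeners = {}                      # fileno -> listening socket
poller = select.poll()
for port in xrange(10000, 15000):   # illustrative range
    s = get_listener(port)
    listeners[s.fileno()] = s
    poller.register(s, select.POLLIN)

while 1:
    # Block for up to one second waiting for any listener to become readable.
    for fd, event in poller.poll(1000):
        conn, addr = listeners[fd].accept()
        conn.close()                # a real server would echo/serve here
"""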
 

Maxim Veksler

select typically supports 1024 FDs at most (a design limit of the
underlying operating system). You may want to try poll instead (epoll
might be better but I doubt Python supports it yet).

I read a post the other day from someone who faced a similar problem,
and it turns out {e,}poll is limited as well; besides, I don't know how
to use it, so an example would be great.

Now, someone I work with suggested a simple workaround: "Pass the list
objects in groups of 1024 each time to the select.select structure". I
think it's acceptable and good advice; the thing is, I don't know how
to implement this "the Python way" (that is, without it being ugly).

Can I use modulo ( % 1024 ) on the main iterator loop?
Something like:

for nb_active_socket in (all_sockets % 1024):
    if nb_active_socket in ready_to_read:
        conn, addr = nb_active_socket.accept()
        while 1:
            data = conn.recv(1024)
            if not data: break
            conn.send(data)
        conn.close()

?

Thanks for helping,
Maxim.
 

Alex Martelli

On Apr 12, 2007, at 1:17 PM, Maxim Veksler wrote:
...
Now, someone I work with suggested a simple workaround: "Pass the list
objects in groups of 1024 each time to the select.select structure". I
think it's acceptable and good advice; the thing is, I don't know how
to implement this "the Python way" (that is, without it being ugly).

I don't understand how you're going to make it work (I see no select
calls in your code and I don't understand how you'd get one in there
by polling), but I'm going to just explain how to get slices of 1024
items at a time from a long list.

Simplest way:

for i in xrange(0, len(longlist), 1024):
    shortlist = longlist[i:i+1024]
    # rest of the body goes here

More elegant/reusable:

def sliceby(longlist, N=1024):
    for i in xrange(0, len(longlist), N):
        yield longlist[i:i+N]

for shortlist in sliceby(longlist):
    # body goes here

If you want to show off, itertools.groupby may be suitable for that:

import itertools

for _, g in itertools.groupby(enumerate(longlist), lambda (i, j): i // 1024):
    shortlist = list(a for b, a in g)
    # rest of the body goes here

but other than for showing off, I wouldn't recommend it in this case.


Alex
 

Maxim Veksler

On Apr 12, 2007, at 1:17 PM, Maxim Veksler wrote:
...

I don't understand how you're going to make it work (I see no select
calls in your code and I don't understand how you'd get one in there
by polling), but I'm going to just explain how to get slices of 1024
items at a time from a long list.

Thank you. I'm attaching the full code so far for reference; sadly, it
still doesn't work. It seems that select.select gets its count of
fds not from the amount passed to it in the sub-list but from the
kernel's (or whatever) count for the process. The main issue here is
that it seems I won't be able to use select for this simple
non-blocking process and am forced to look at poll to see if that helps.

The error I'm getting is still the same:

# ulimit -n
500000
# python listener_sockets_range.py
Traceback (most recent call last):
File "listener_sockets_range.py", line 22, in ?
ready_to_read, ready_to_write, in_error =
select.select(select_cap_sockets, [], [], 0)
ValueError: filedescriptor out of range in select()


"""
#!/usr/bin/env python

import socket, select

def get_non_blocking_socket(port_number):
    print port_number

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    s.bind(('0.0.0.0', port_number))
    s.listen(1)
    return s

def slice_by_fd_limit(longlist, N=1024):
    for i in xrange(0, len(longlist), N):
        yield longlist[i:i+N]

all_sockets = map(get_non_blocking_socket, xrange(10000, 20000))

while 1:
    for select_cap_sockets in slice_by_fd_limit(all_sockets):
        ready_to_read, ready_to_write, in_error = select.select(select_cap_sockets, [], [], 0)
        for nb_active_socket in all_sockets:
            if nb_active_socket in ready_to_read:
                conn, addr = nb_active_socket.accept()
                while 1:
                    data = conn.recv(1024)
                    if not data: break
                    conn.send(data)
                conn.close()
"""
 

Alex Martelli

Maxim Veksler said:
Thank you. I'm attaching the full code so far for reference; sadly, it
still doesn't work. It seems that select.select gets its count of
fds not from the amount passed to it in the sub-list but from the
kernel's (or whatever) count for the process. The main issue here is

It's not a problem of COUNT of FD's, i.e., how many you're passing to
select; the problem is the value of the _highest_ number you can pass.
It's an API-level limitation, not an issue with Python per se: the
select API takes a "bit vector" of N bits, representing a set of FDs in
that way, and N is fixed at kernel-compilation time (normally to 1024).

The poll system call does not have this particular limitation, which is
why select.poll may be better for you.

Moreover, your code has other performance problems:

while 1:
    for select_cap_sockets in slice_by_fd_limit(all_sockets):
        ready_to_read, ready_to_write, in_error = select.select(select_cap_sockets, [], [], 0)
        for nb_active_socket in all_sockets:
            if nb_active_socket in ready_to_read:

A small issue is with the last two lines -- instead of looping directly
on the small "ready-to-read" list, you're looping on the large
all_sockets one and looking each up in the small list -- that's just
throwing performance out of the window, and adding complexity, for no
benefit whatsoever.

The big issue is that you are "ceaselessly polling". If no socket is
ready to read, you force select to return immediately anyway, and then
basically call select again at once. You churn on the CPU without
surcease, using 100% of it, hogging it for this "busy wait", possibly to
the point of crowding out the kernel from some of the CPU time it needs
to do useful work in the TCP/IP stack. Busy-wait is a bad thing...
never call select with a timeout of 0 in a tight loop. This
recommendation also applies to the polling object that you can build
with select.poll, and any other situation where you're waiting for
another thread or process to deliver some data -- ideally you should
wait in a blocking way; if that's unfeasible, at least make sure you're
letting some time pass between such calls, by using a small but non-zero
timeout (or even by inserting calls to time.sleep if that's what it
takes).
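
To make those two fixes concrete, here is a rough sketch of how the inner loop might look with a non-zero timeout and with the iteration done over the ready list only (the 50 ms figure is an arbitrary illustration, not a value from the thread):

"""
while 1:
    for select_cap_sockets in slice_by_fd_limit(all_sockets):
        # Wait up to 50 ms per slice instead of polling with timeout 0,
        # so the loop no longer busy-waits at 100% CPU.
        ready_to_read, _, _ = select.select(select_cap_sockets, [], [], 0.05)
        # Iterate over the (short) ready list, not over all_sockets.
        for nb_active_socket in ready_to_read:
            conn, addr = nb_active_socket.accept()
            while 1:
                data = conn.recv(1024)
                if not data: break
                conn.send(data)
            conn.close()
"""

A poll- or epoll-based loop (or a framework, as below) is still the better fix; slicing plus small timeouts only papers over select's limits.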

The risk of such "antipatterns" is a good reason why it would be better
to use a well-designed, well-coded, well-debugged existing framework,
such as Twisted, rather than roll your own, btw. With Twisted, you can
choose among many appropriate implementations of "reactor" (the key
design pattern for async programming) and activate the one that is most
suitable for your needs (including, e.g., one based on epoll, which
gives better performance than poll on suitable operating systems).
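
As a rough illustration of that suggestion (not code from the thread): a minimal Twisted echo server listening on a range of ports might look like this, assuming Twisted is installed and that an epoll-based reactor is available on the platform.

"""
#!/usr/bin/env python
# Sketch of a Twisted-based echo server on many ports. The epoll reactor
# and the port range are assumptions for illustration only.
from twisted.internet import epollreactor
epollreactor.install()              # must happen before importing reactor

from twisted.internet import protocol, reactor

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        # Echo whatever the client sends straight back.
        self.transport.write(data)

factory = protocol.ServerFactory()
factory.protocol = Echo

for port in xrange(10000, 15000):   # illustrative port range
    reactor.listenTCP(port, factory)

reactor.run()
"""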

If you're adamant on "rolling your own", though, you can find a Python
epoll module at <http://cheeseshop.python.org/pypi/pyepoll/0.2> (it's
said to be in alpha status, though; I believe there are other such
modules around, but pyepoll seems to be the only one on Cheese Shop).


Alex
 
