select() call and filedescriptor out of range in select error

Discussion in 'Python' started by k3xji, Sep 16, 2010.

  1. k3xji

    k3xji Guest

    Hi all,

    We have a select-based server written in Python. Occasionally, maybe
    twice a month there occurs a weird problem, select() returns with
    filedescriptor out of range in select() error. This is of course a
    normal error and handled gracefully. Our policy is to take down a few
    users so that select() can handle the next cycle. However, once this
    error occurs, this call also fails:

    self.__Sockets.remove(socket)

    self.__Sockets is a plain list of the sockets we use in our IO
    loop. The call fails with:
    remove(x): x not in list

    First of all, there is no line of code like remove(x) anywhere in our
    application; there is no variable named x at all. Second, the
    exception points at the line containing the code above. So
    self.__Sockets.remove(socket) is what fails with "remove(x): x not in
    list"....

    I cannot understand the problem. It happens sporadically, and it
    feels as if the ValueError from the select() call somehow corrupts the
    list structure itself. Is something like that even possible in Python?

    Thanks in advance,
    k3xji, Sep 16, 2010
    #1

  2. Ned Deily

    Ned Deily Guest

    In article
    <>,
    k3xji <> wrote:
    > We have a select-based server written in Python. Occasionally, maybe
    > twice a month there occurs a weird problem, select() returns with
    > filedescriptor out of range in select() error. This is of course a
    > normal error and handled gracefully. Our policy is to take down few
    > users for select() to handle the next cycle. However, once this error
    > occurs, this also fails too:
    >
    > self.__Sockets.remove(socket)
    >
    > self.__Socket's is the very basic list of sockets we use in our IO
    > loop. The call fails with:
    > remove(x): x not in list
    >
    > First of all, in our entire application there is no line of code like
    > remove(x), meaning there is no x variable. Second, the Exception shows
    > the line number containing above code. So
    > self.__Sockets.remove(socket) this fails with remove(x): x not in
    > list....
    >
    > I cannot understand the problem. It happens in sporadic manner and it
    > feels that the ValueError of select() call somehow corrupts the List
    > structure itself in Python? Not sure if something like that is
    > possible.


    That error message is a generic exception message. It just means the
    object to be removed is not in the list. For example:

    >>> a, b = 1, 2
    >>> l = [a, b]
    >>> l.remove(a)
    >>> l.remove(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: list.remove(x): x not in list

    If the problem is that the socket object in question no longer exists,
    you can protect your code there by enclosing the remove operation in a
    try block, like:

    try:
        self.__Sockets.remove(socket)
    except ValueError:
        pass
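A variant of the try block above that logs the missing object instead of discarding the error silently, so each occurrence still leaves a trace that can be investigated later. This is a minimal sketch in Python 3 syntax; remove_socket and the "server" logger name are hypothetical, not from the poster's code:

```python
import logging

log = logging.getLogger("server")  # hypothetical logger name


def remove_socket(sockets, sock):
    """Remove sock from sockets, logging (not hiding) a miss.

    Returns True if the socket was present, False otherwise.
    """
    try:
        sockets.remove(sock)
        return True
    except ValueError:
        # list.remove() only says "x not in list"; record *which* object
        # was missing so the root cause can still be traced.
        log.warning("socket %r was not in the list (already removed?)", sock)
        return False


sockets = ["listener", "client-1"]
remove_socket(sockets, "client-1")  # removed normally
remove_socket(sockets, "client-1")  # returns False and logs a warning
```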

    --
    Ned Deily,
    Ned Deily, Sep 16, 2010
    #2

  3. On Wed, 15 Sep 2010 21:05:49 -0700, k3xji wrote:

    > Hi all,
    >
    > We have a select-based server written in Python. Occasionally, maybe
    > twice a month there occurs a weird problem, select() returns with
    > filedescriptor out of range in select() error. This is of course a
    > normal error and handled gracefully. Our policy is to take down few
    > users for select() to handle the next cycle. However, once this error
    > occurs, this also fails too:
    >
    > self.__Sockets.remove(socket)
    >
    > self.__Socket's is the very basic list of sockets we use in our IO loop.
    > The call fails with:
    > remove(x): x not in list



    Please show the *exact* error message, including the traceback, by
    copying and pasting it. Do not retype it by hand, or summarize it, or put
    it into your own words.



    > First of all, in our entire application there is no line of code like
    > remove(x), meaning there is no x variable.


    Look at this example:

    >>> sockets = []
    >>> sockets.remove("Hello world")

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: list.remove(x): x not in list


    "x" is just a placeholder. It doesn't refer to an actual variable x.


    > Second, the Exception shows
    > the line number containing above code. So self.__Sockets.remove(socket)
    > this fails with remove(x): x not in list....


    Exactly.



    > I cannot understand the problem. It happens in sporadic manner and it
    > feels that the ValueError of select() call somehow corrupts the List
    > structure itself in Python? Not sure if something like that is possible.


    Anything is possible, but it's not likely. What's far more likely is that
    you have a bug in your code, and that somehow, under rare circumstances,
    it tries to remove something from a list that was never inserted into the
    list. Or it tries to remove it twice.

    My guess is something like this:

    try:
        socket = get_socket()
        self._sockets.append(socket)
    except SomeError:
        pass
    # later on
    self._sockets.remove(socket)
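Steven's guess can be reproduced in a few lines: when get_socket() fails, the exception is swallowed but the variable socket still names the object from the previous cycle, which has already been removed. A self-contained sketch in Python 3 syntax; get_socket and SomeError are hypothetical stand-ins, as in his example:

```python
class SomeError(Exception):
    """Hypothetical error raised when accepting a connection fails."""


_attempts = 0


def get_socket():
    """Hypothetical stand-in for accept(): succeeds once, then fails."""
    global _attempts
    _attempts += 1
    if _attempts > 1:
        raise SomeError("accept failed")
    return "socket-1"


sockets = []
results = []

for cycle in range(2):
    try:
        socket = get_socket()
        sockets.append(socket)
    except SomeError:
        pass  # error swallowed, but `socket` still names the old object
    # later on
    try:
        sockets.remove(socket)
        results.append("removed " + socket)
    except ValueError as exc:
        # second cycle: the stale object is removed a second time
        results.append(str(exc))

print(results)
```

The second cycle fails with the same "x not in list" message the poster sees, even though nothing corrupted the list.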




    --
    Steven
    Steven D'Aprano, Sep 16, 2010
    #3
  4. James Mills

    James Mills Guest

    On Thu, Sep 16, 2010 at 2:49 PM, Ned Deily <> wrote:
    > If the problem is that the socket object in question no longer exists,
    > you can protect your code there by enclosing the remove operation in a
    > try block, like:



    The question that remains, however, is:

    Why does your list contain dirty data? Your code has likely removed
    the socket object from the list before, so why is it attempting to
    remove it again?

    I would suggest you re-examine your code's logic rather than patch
    it up with a "band-aid solution".
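In the spirit of finding the root cause rather than patching it, one debugging aid is to wrap the list so that a second removal reports where the first removal happened, instead of only "x not in list". This is a sketch in Python 3 syntax; SocketRegistry is a hypothetical name, not part of the poster's code:

```python
import traceback


class SocketRegistry:
    """Hypothetical wrapper around a server's socket list (debugging aid).

    Remembers the call stack of the first removal of each object, so a
    repeated removal says where the earlier one happened.
    """

    def __init__(self):
        self._sockets = []
        self._removed_at = {}  # id(obj) -> formatted stack of its removal

    def add(self, sock):
        self._sockets.append(sock)
        self._removed_at.pop(id(sock), None)  # re-added: forget old removal

    def remove(self, sock):
        try:
            self._sockets.remove(sock)
        except ValueError:
            where = self._removed_at.get(id(sock))
            if where is None:
                raise ValueError("%r was never added" % (sock,)) from None
            raise ValueError(
                "%r already removed at:\n%s" % (sock, where)) from None
        self._removed_at[id(sock)] = "".join(traceback.format_stack())

    def __iter__(self):
        # iterate over a copy so callers may remove while looping
        return iter(list(self._sockets))
```

Because entries are keyed by id() and keep a stack string each, this is meant for tracking down the bug, not for permanent use: ids can be reused after an object is freed, and the dictionary grows with every removal.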

    cheers
    James


    --
    -- James Mills
    --
    -- "Problems are solved by method"
    James Mills, Sep 16, 2010
    #4
  5. k3xji

    k3xji Guest

    > Please show the *exact* error message, including the traceback, by
    > copying and pasting it. Do not retype it by hand, or summarize it, or put
    > it into your own words.


    Unfortunately, this is not possible. The logging system I designed
    records only the following information: since we log millions of
    custom exceptions per day, I did not include the full traceback. Here
    is all I have:

    1448) 15/09/10 20:02:08 - [*] ERROR: Physical max client limit
    reached. Please contact maintenance.filedescriptor out of range in
    select()[scSocketServer.py:215:][Port:515]

    The code generating the error is:

    try:
        self.__ReadersInCycle, self.__WritersInCycle, e = \
            select(self.__Sockets, self.__WritersInCycle, [],
                   base.scOptions.scOPT_SELECT_TIMEOUT)
    except ValueError, e:
        LogError('Physical max client limit reached.'
                 ' Please contact maintenance.' + str(e))
        self.scSvr_OnClientPhysicalLimitReached()
        #define a policy here
        continue
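On Unix, CPython's select.select() checks every descriptor number against FD_SETSIZE before making the system call and raises exactly this ValueError when one is too large. The check is per descriptor number, not per socket count, and select() never modifies the lists passed to it, so the ValueError by itself cannot take a socket out of self.__Sockets. A sketch in Python 3 syntax; FakeFD and out_of_range are hypothetical helpers, and FD_SETSIZE = 1024 is an assumption (the usual Linux value, which Python does not export):

```python
import select

FD_SETSIZE = 1024  # assumption: typical Linux/glibc value


class FakeFD:
    """Hypothetical object whose fileno() returns a chosen number."""

    def __init__(self, fd):
        self.fd = fd

    def fileno(self):
        return self.fd


def out_of_range(socks, limit=FD_SETSIZE):
    """Return the entries select() would reject (fileno() >= limit)."""
    return [s for s in socks if s.fileno() >= limit]


readers = [FakeFD(5000)]
try:
    select.select(readers, [], [], 0)
except ValueError as exc:
    print(exc)  # filedescriptor out of range in select()

print(len(readers))  # 1: the input list is left untouched
```

Under this check, dropping arbitrary users does not make the next cycle succeed while any socket numbered 1024 or higher is still in the lists; out_of_range() shows which entries those are.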

    > > First of all, in our entire application there is no line of code like
    > > remove(x), meaning there is no x variable.

    >
    > Look at this example:
    >
    > >>> sockets = []
    > >>> sockets.remove("Hello world")

    >
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    > ValueError: list.remove(x): x not in list
    >


    Ok. Thanks.

    > Anything is possible, but it's not likely. What's far more likely is that
    > you have a bug in your code, and that somehow, under rare circumstances,
    > it tries to remove something from a list that was never inserted into the
    > list. Or it tries to remove it twice.
    >
    > My guess is something like this:
    >
    > try:
    >     socket = get_socket()
    >     self._sockets.append(socket)
    > except SomeError:
    >     pass
    > # later on
    > self._sockets.remove(socket)
    >


    Hmm.. Might be, but the self.__Sockets list also contains the
    ListenSocket(), which is the real listening socket. Naturally, I use
    it in the read list of select() on every server cycle. The weird
    thing is that the ListenSocket itself triggers the "not in list"
    exception, too! And one thing I am sure of is that I have not written
    any code that removes the listen socket from the list; that is just
    impossible. Additionally, to keep things fast, there are very few
    places where I traverse the __Sockets list. The only places where I
    delete something from the __Sockets list are:

    1) a user disconnects (normal disconnect, authentication or ping
    timeout)
    2) the server is being stopped or restarted
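One way to make a lost listening socket impossible by construction is to keep it out of the client list altogether and rebuild the read list on every cycle, so neither deletion path listed above can ever touch it. A minimal sketch in Python 3 syntax; wait_readable is a hypothetical helper and the sockets in the usage part are stand-ins:

```python
import select
import socket


def wait_readable(listener, clients, timeout):
    """One select() cycle in which the listening socket cannot be lost.

    `listener` is never stored in `clients`, so no client-removal code can
    remove it; the read list is rebuilt from both on every call.
    """
    read_list = [listener] + list(clients)
    readable, _, _ = select.select(read_list, [], [], timeout)
    accept_ready = listener in readable
    ready_clients = [s for s in readable if s is not listener]
    return accept_ready, ready_clients


# Usage: a real listening socket plus a connected pair standing in for a client.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
client_side, server_side = socket.socketpair()
client_side.send(b"ping")

accept_ready, ready = wait_readable(listener, [server_side], 1.0)
print(accept_ready, ready == [server_side])  # False True
```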

    Other than that, there is no access to that variable from outside
    objects; as you can see, it is also private. And please keep in mind
    that this bug has been there for about a year, and many code reviews
    have passed without anyone noticing the type of error you are
    suggesting.

    Some more information on the system: I am running Python 2.4 on CentOS.

    By the way, after digging through the logs and the system, it turns
    out select(..) is hitting the per-process FD limit. Although the
    system-wide ulimit is unlimited, I think Python's "selectmodule.c"
    enforces a limit of 1024. I get the error after hitting that limit,
    and somehow, as I just explained, the __ListenSocket is removed from
    the read list, so it is lost and the server instance is lost for
    good. Putting a try..except around that code and re-initializing the
    server port is a solution, but I guess a bad one, because I will not
    have found the root cause.
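The 1024 ceiling comes from the fixed-size fd_set bitmap that select() uses (FD_SETSIZE), not from the ulimit, which is why raising the ulimit does not help. poll() takes an array of descriptor numbers instead, so it has no such ceiling; Python exposes it as select.poll() on Unix (not on Windows). A sketch in Python 3 syntax; poll_readable is a hypothetical helper, not a drop-in replacement for the poster's loop:

```python
import select
import socket


def poll_readable(socks, timeout_ms):
    """Return the members of `socks` that poll() reports as readable.

    poll() passes descriptor numbers in an array rather than an fd_set
    bitmap, so descriptors >= FD_SETSIZE are accepted.
    """
    poller = select.poll()
    by_fd = {}
    for s in socks:
        fd = s.fileno()
        by_fd[fd] = s
        poller.register(fd, select.POLLIN)
    ready = []
    for fd, events in poller.poll(timeout_ms):  # timeout in milliseconds
        # POLLHUP/POLLERR are reported even if not requested; treat them
        # as readable so the caller's recv() sees the disconnect or error.
        if events & (select.POLLIN | select.POLLHUP | select.POLLERR):
            ready.append(by_fd[fd])
    return ready


a, b = socket.socketpair()
b.send(b"ping")
print(poll_readable([a, b], 1000) == [a])  # True
```

In a long-running server it would be more economical to keep one poll object and register or unregister descriptors as clients connect and disconnect, rather than rebuilding it every cycle as this sketch does.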

    Thanks in advance,
    k3xji, Sep 16, 2010
    #5
  6. On Thu, 16 Sep 2010 15:51:38 +1000, James Mills wrote:

    > On Thu, Sep 16, 2010 at 2:49 PM, Ned Deily <> wrote:
    >> If the problem is that the socket object in question no longer exists,
    >> you can protect your code there by enclosing the remove operation in a
    >> try block, like:

    >
    >
    > The question that remains to be seen however is:
    >
    > Why does your list contain dirty data ? Your code has likely removed the
    > socket object from the list before, why is it attempting to remove it
    > again ?
    >
    > I would consider you re-look at your code's logic rather than patch up
    > the code with a "band-aid-solution".


    Well said.


    --
    Steven
    Steven D'Aprano, Sep 16, 2010
    #6
