Intermittent Failure on Serial Port

H J van Rooyen

Hi All,

I am writing a polling controller for an RS-485 line that has
several addressable devices connected.
It is a small access control system.
All is well - the code runs for anything from three hours to three days, and
then, sometimes, when I get a comms error and have to send out a nak character,
it fails hard...

The traceback below pops up. The first lines are just some debug prints,
and the four records show reader number, id number, name...

_______________start of Konsole Messages __________________

/dev/ttyS0 set to 9600 sane cread raw -echo
.../logs/composite/rawlog
Pipe exists already
we get here - thread identity is: 1079298992
New Thread identity printed by new thread is: 1079298992
we get here too
5 0123456789012345 Sefie Sewenstein is in
2 DE0000085ABF8A01 Error record - catch her, catch him
2 8A00000870BEDE01 Bertus Bierdrinker is in
5 0123456789012345 Sefie Sewenstein is out
Traceback (most recent call last):
  File "portofile.py", line 232, in ?
    ret_val = main_routine(port, pollstruct, pfifo)
  File "portofile.py", line 108, in main_routine
    send_nak(port, timeout) # so bad luck - comms error
  File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
    port.flush()
IOError: [Errno 29] Illegal seek
close failed: [Errno 29] Illegal seek

_______________end of Konsole Messages ___________________


The additional thread referred to is used to maintain a file of
people on the premises - it is not relevant to this,
as it just writes out a global dictionary when
something has changed, and it works quite merrily...

Some of what I think are the relevant code snippets:

This one works all the time, as I send out an ack on the
successful receipt of a message from one of the readers, and
I have not seen it fail yet:

def send_ack(port, timeout):
    """Routine to send out an ack on port"""

    ack_string = ack # Ascii ack = '\x06'
    s = ''

    port.write(ack_string)
    port.flush()
    flush_out(port, timeout) # eat the echoed ack

This one is called after a comms error, and sometimes falls over in the above
manner...

def send_nak(port, timeout):
    """Routine to send out a nak on port"""

    nak_string = nak # Ascii nak = '\x15'
    s = ''

    port.write(nak_string)
    port.flush()
    flush_out(port, timeout) # eat the echoed nak


# here we read to end, to flush a port buffer

def flush_out(file, time_out):
    """Reads the port till no more chars come in for a while"""

    start_time = time.time()
    s = ''
    while (time.time() - start_time < time_out):
        s, ret_val = read_char(file, s)
        if ret_val == 0: # if there is input...
            start_time = time.time()


# We make a function to read a char of file data

def read_char(file, s):
    """Reads file data returns string, ret_val is 0 on success, 1 for KbdInt, 2
    no input."""

    ret_val = 0 # No error yet
    k = '' # Nothing read yet
    try:
        k = file.read(1) # read one char in as a string
    except KeyboardInterrupt:
        print "\n^C - Returning to Shell - Error is:", KeyboardInterrupt
        ret_val = 1 # ^C while waiting for input
        return k, ret_val # go out on error
    except IOError:
        ret_val = 2 # IOError on input - no record available
        return s, ret_val # so no extra chars
    if k == '': # empty string is nfg
        ret_val = 2
        return s, ret_val # so no extra chars
    s = s + k # something to add in to passed buffer
    return s, ret_val # ok exit no error


Note that the file is set to non-blocking with some magic from the fcntl module...
The file is the serial port /dev/ttyS0.
There is hardware connected to the port that has the effect of a loopback - you
hear what you say.
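
The fcntl call itself is not shown in the post. As a minimal sketch of how a descriptor is commonly switched to non-blocking mode (an assumption about what the "magic" looks like, not the poster's actual code), something along these lines would do it. The helper name set_nonblocking is hypothetical, and the sketch is written for a current Python 3 rather than the Python 2.4 used above.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Add O_NONBLOCK to the descriptor's status flags, keeping the others."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)      # read the current status flags
    new_flags = flags | os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, new_flags)   # write them back with O_NONBLOCK set
    return new_flags


# Hypothetical usage with an already opened port object:
#     set_nonblocking(port.fileno())
```

With the flag set, a read on a descriptor that has no data pending returns immediately with an error instead of waiting, which matches the way read_char treats an IOError as "no input".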

Out of the box distribution - SuSe 10:

Python 2.4.1 (#1, Sep 13 2005, 00:39:20)
[GCC 4.0.2 20050901 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

Where can I find out what the Errno 29 really means?
Is this Python, the OS or maybe hardware?

Any Ideas or suggestions will be appreciated

(I am doing this via email - so I am not on line all the time- so my responses
may be slow...)

- Hendrik van Rooyen
 
Serge Orlov

H said:
Traceback (most recent call last):
File "portofile.py", line 232, in ?
ret_val = main_routine(port, pollstruct, pfifo)
File "portofile.py", line 108, in main_routine
send_nak(port, timeout) # so bad luck - comms error
File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
port.flush()
IOError: [Errno 29] Illegal seek
close failed: [Errno 29] Illegal seek


Where can I find out what the Errno 29 really means?
Is this Python, the OS or maybe hardware?

It is from kernel: grep -w 29 `locate errno`
/usr/include/asm-generic/errno-base.h: #define ESPIPE 29
/* Illegal seek */

man lseek:

ERRORS:
ESPIPE fildes is associated with a pipe, socket, or FIFO.

RESTRICTIONS:
Linux specific restrictions: using lseek on a tty device
returns ESPIPE.
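
The same lookup can be done from Python without grepping the headers. The sketch below, written for a current Python 3, prints the symbolic name and message for error number 29 and then provokes the error deliberately by calling lseek on a pipe. The function name seek_error_on_pipe is hypothetical, and the numeric value of ESPIPE is platform-specific (29 is the Linux value quoted above).

```python
import errno
import os

# Symbolic name and message the platform associates with error number 29.
print(errno.errorcode.get(29), "-", os.strerror(29))


def seek_error_on_pipe():
    """Return the errno raised when lseek() is called on a pipe."""
    read_fd, write_fd = os.pipe()
    try:
        os.lseek(read_fd, 0, os.SEEK_CUR)   # pipes have no file position
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None                              # only reached if lseek succeeded
```

Comparing the return value with errno.ESPIPE confirms that the error comes from the seek system call itself rather than from anything specific to the serial hardware.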
 
H J van Rooyen

Serge Orlov wrote:

| H J van Rooyen wrote:
| > Traceback (most recent call last):
| > File "portofile.py", line 232, in ?
| > ret_val = main_routine(port, pollstruct, pfifo)
| > File "portofile.py", line 108, in main_routine
| > send_nak(port, timeout) # so bad luck - comms error
| > File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
| > port.flush()
| > IOError: [Errno 29] Illegal seek
| > close failed: [Errno 29] Illegal seek
| >
|
|
| > Where can I find out what the Errno 29 really means?
| > Is this Python, the OS or maybe hardware?
|
| It is from kernel: grep -w 29 `locate errno`
| /usr/include/asm-generic/errno-base.h: #define ESPIPE 29
| /* Illegal seek */
|
| man lseek:
|
| ERRORS:
| ESPIPE fildes is associated with a pipe, socket, or FIFO.
|
| RESTRICTIONS:
| Linux specific restrictions: using lseek on a tty device
| returns ESPIPE.


Thanks for the info - so the Kernel sometimes bombs me out - does anybody know
why the python flush sometimes calls lseek?

- Hendrik
 
Serge Orlov

H said:
Serge Orlov wrote:

| H J van Rooyen wrote:
| > Traceback (most recent call last):
| > File "portofile.py", line 232, in ?
| > ret_val = main_routine(port, pollstruct, pfifo)
| > File "portofile.py", line 108, in main_routine
| > send_nak(port, timeout) # so bad luck - comms error
| > File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
| > port.flush()
| > IOError: [Errno 29] Illegal seek
| > close failed: [Errno 29] Illegal seek
| >
|
|
| > Where can I find out what the Errno 29 really means?
| > Is this Python, the OS or maybe hardware?
|
| It is from kernel: grep -w 29 `locate errno`
| /usr/include/asm-generic/errno-base.h: #define ESPIPE 29
| /* Illegal seek */
|
| man lseek:
|
| ERRORS:
| ESPIPE fildes is associated with a pipe, socket, or FIFO.
|
| RESTRICTIONS:
| Linux specific restrictions: using lseek on a tty device
| returns ESPIPE.


Thanks for the info - so the Kernel sometimes bombs me out - does anybody know
why the python flush sometimes calls lseek?

I thought it was your own flush method. If it is the file.flush method, that
makes the issue more complicated, since the stdlib file.flush doesn't call
lseek. I suggest you run your program under strace to log system
calls; without such a log it's pretty hard to say what's going on. The
most interesting part is the end, but make sure you have enough space
for the whole log, it's going to be big.
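
A minimal sketch of that workflow, written for a current Python 3: one helper builds an strace command line that writes the trace to a file, and another reads only the tail of a large log. The helper names strace_command and last_lines, the log file name and the count of 200 lines are all hypothetical; -f, -tt and -o are standard strace options.

```python
import collections


def strace_command(script, log_path):
    """Build an strace command line that logs system calls to log_path."""
    return [
        "strace",
        "-f",             # also trace child processes and threads
        "-tt",            # timestamp each line, including microseconds
        "-o", log_path,   # write the trace to a file instead of stderr
        "python", script,
    ]


def last_lines(log_path, count=200):
    """Return the final `count` lines of a log without holding it all in memory."""
    with open(log_path, "r", errors="replace") as log:
        return list(collections.deque(log, maxlen=count))


# Hypothetical usage:
#     import subprocess
#     subprocess.call(strace_command("portofile.py", "trace.log"))
#     print("".join(last_lines("trace.log")))
```

Since the failure takes hours or days to appear, reading only the last few hundred lines avoids loading a multi-gigabyte trace just to see the system calls that preceded the ESPIPE.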
 
H J van Rooyen

Serge Orlov wrote:

| H J van Rooyen wrote:
| > Serge Orlov wrote:
| >
| > | H J van Rooyen wrote:
| > | > Traceback (most recent call last):
| > | > File "portofile.py", line 232, in ?
| > | > ret_val = main_routine(port, pollstruct, pfifo)
| > | > File "portofile.py", line 108, in main_routine
| > | > send_nak(port, timeout) # so bad luck - comms error
| > | > File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
| > | > port.flush()
| > | > IOError: [Errno 29] Illegal seek
| > | > close failed: [Errno 29] Illegal seek
| > | >
| > |
| > |
| > | > Where can I find out what the Errno 29 really means?
| > | > Is this Python, the OS or maybe hardware?
| > |
| > | It is from kernel: grep -w 29 `locate errno`
| > | /usr/include/asm-generic/errno-base.h: #define ESPIPE 29
| > | /* Illegal seek */
| > |
| > | man lseek:
| > |
| > | ERRORS:
| > | ESPIPE fildes is associated with a pipe, socket, or FIFO.
| > |
| > | RESTRICTIONS:
| > | Linux specific restrictions: using lseek on a tty device
| > | returns ESPIPE.
| >
| >
| > Thanks for the info - so the Kernel sometimes bombs me out - does anybody know
| > why the python flush sometimes calls lseek?
|
| I thought it was your own flush method. If it is file.flush method that
| makes the issue more complicated, since stdlib file.flush doesn't call
| lseek method. I suggest you run your program using strace to log system
| calls, without such log it's pretty hard to say what's going on. The
| most interesting part is the end, but make sure you have enough space
| for the whole log, it's going to be big.

Thanks - I will research and use strace - I haven't used it before - I have
about 30 gig of disk space left...
Trouble is that the silly thing works for anything from some hours to some days
before it falls over - ugly...
Will bleat again when I have some more results...

In the meantime I have a datascope attached to the line, and it appears as if it
was on the point of sending a nak in response to a perfectly well formed
message - almost as if either an interrupt was missed - unlikely at 9600 baud
and a pentium 3 at some 2 GHz - or there is a weird hardware error - also
unlikely - hardware normally just breaks, it does not work for millions of chars
and then miss one... - I don't like the implication...

I also don't really understand the second reference - to a close that failed -
anyway we have to wait for the trace...


- Hendrik
 
