[Spread-users] 1 problem with Spread.pm and 1 problem with spread daemon, was

Tom Mornini tmornini at infomania.com
Thu Jan 17 21:47:21 EST 2002


On Thursday, January 17, 2002, at 03:00 PM, John David Duncan wrote:

> My spread error condition recurred again today -- under relatively heavy
> traffic, spread seemed to stop.  I could not connect with spuser to
> spread on any server, but I could get output from spmonitor.  I had to
> kill and restart spread on all three servers in the segment.

We are also continuing to have problems. Changes to our system have 
prevented (for the moment) Spread from actually hanging up, so I cannot 
provide further spmonitor output.

However, I have done some careful harness testing and have discovered 
two interesting situations. Both situations were discovered while using 
the Spread.pm module.

1) The Spread.pm Perl module returns a Perl undef from multicast() in 
some circumstances. I've seen it when I call multicast() repeatedly 
without calling receive() while there are incoming error messages. It 
*might* be related to issue #2 below, however.

2) Somewhat more mysteriously, receive()-only processes get 
spontaneously disconnected, and receive() correctly returns 
CONNECTION_CLOSED, whenever a receiver is hammered continuously from 
more than one process on the same box. The sending processes DO NOT get 
disconnected, however.

What would cause this? Is this a matter of not emptying the queue fast 
enough and buffers overflowing? If so, it would seem better to me to 
block on multicast() in the senders under this circumstance, subject to 
the connect() timeout value, of course.
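
To make the question above concrete, here is a toy model in Python of the 
two policies being compared: dropping a receiver whose backlog is full, 
versus refusing new messages so that senders have to wait. This is only a 
sketch. It is not how the Spread daemon is implemented, and every name in 
it (Session, deliver_kill_policy, deliver_block_policy, run, the queue 
limit) is hypothetical.

```python
from collections import deque


class Session:
    """Hypothetical receiver session with a bounded backlog of messages."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit      # assumed maximum number of queued messages
        self.queue = deque()
        self.alive = True


def deliver_kill_policy(session, message):
    """Disconnect the receiver once its backlog is full (slow reader is dropped)."""
    if not session.alive:
        return False
    if len(session.queue) >= session.limit:
        session.alive = False   # the receiver loses its connection
        session.queue.clear()
        return False
    session.queue.append(message)
    return True


def deliver_block_policy(session, message):
    """Refuse the message while the backlog is full; the receiver stays connected."""
    if len(session.queue) >= session.limit:
        return False            # the sender would have to wait and retry
    session.queue.append(message)
    return True


def run(policy, sends_per_tick, reads_per_tick, ticks, limit=5):
    """Simulate senders outpacing one receiver; return (alive, delivered, refused)."""
    session = Session("r0", limit)
    delivered = refused = 0
    for _ in range(ticks):
        for i in range(sends_per_tick):
            if policy(session, i):
                delivered += 1
            else:
                refused += 1
        for _ in range(reads_per_tick):
            if session.queue:
                session.queue.popleft()
    return session.alive, delivered, refused
```

Calling run(deliver_kill_policy, 3, 1, 10) ends with the receiver 
disconnected, while run(deliver_block_policy, 3, 1, 10) keeps it 
connected and throttles the senders to the receiver's read rate instead.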

A debug ALL output from the Spread daemon itself can be found here:

http://www.mornini.com/spread.log.gz

The offending event seems to be summarized by these lines (prefixed by 
line numbers):

344821:[Fri 18 Jan 2002 00:52:53] Sess_write: killing mbox 9 for not reading
347827:[Fri 18 Jan 2002 00:52:53] Sess_kill: killing session r0-9 ( mailbox 9 )
347895:[Fri 18 Jan 2002 00:52:53] G_handle_kill: #r0-9#localhost is killed
347896:[Fri 18 Jan 2002 00:52:53] G_handle_kill in GOP
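
For anyone who wants to look for the same event in their own daemon log, 
a small Python sketch along these lines could pull out the "killing mbox" 
entries. It assumes each event occupies a single line in the log file and 
follows the format of the Sess_write entry quoted here; the function 
names and the optional numeric prefix handling are my own illustration, 
not part of Spread.

```python
import gzip
import re

# Matches entries of the form
#   [Fri 18 Jan 2002 00:52:53] Sess_write: killing mbox 9 for not reading
# An optional leading "NNN:" line-number prefix is tolerated.
KILL_RE = re.compile(
    r"^(?:\d+:)?"
    r"\[(?P<timestamp>[^\]]+)\]\s+"
    r"Sess_write: killing mbox (?P<mbox>\d+) for not reading"
)


def find_slow_reader_kills(lines):
    """Return (timestamp, mailbox number) for each 'not reading' kill entry."""
    kills = []
    for line in lines:
        match = KILL_RE.match(line.strip())
        if match:
            kills.append((match.group("timestamp"), int(match.group("mbox"))))
    return kills


def scan_gzip_log(path):
    """Scan a gzip-compressed log file (path is supplied by the caller)."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return find_slow_reader_kills(handle)
```

For example, find_slow_reader_kills() can be given any iterable of log 
lines, and scan_gzip_log() a local copy of a compressed log.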

--
-- Tom Mornini
-- eWingz Systems, Inc.





