[Spread-users] spread killed session, why?

Wed May 27 13:38:31 EDT 2009

The most common reason a client is disconnected from a daemon is that
the client is not reading its messages fast enough or, from the
opposite perspective, that too much traffic is being sent to its
groups.  In a multicast environment it is quite easy for senders to
overwhelm receivers.  Spread will buffer up to about 1000 messages
for a client by default.  If more than 1000 messages pile up on the
daemon for a client because it is reading too slowly, then the daemon
will kick that client.
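
To make the arithmetic concrete, here is a toy model of the behaviour
described above.  It is only a sketch: it is not Spread's actual code,
the names ToySession and ticks_until_killed are made up for
illustration, and the limit of 1000 is just the approximate default
quoted above.

```python
from collections import deque

MAX_SESSION_MESSAGES = 1000  # approximate default quoted above (assumption)


class ToySession:
    """Toy model of one client's daemon-side queue (not Spread's real code)."""

    def __init__(self, limit=MAX_SESSION_MESSAGES):
        self.limit = limit
        self.pending = deque()
        self.killed = False

    def deliver(self, msg):
        """Queue a message for the client; kick the client past the limit."""
        if self.killed:
            return
        self.pending.append(msg)
        if len(self.pending) > self.limit:
            self.pending.clear()
            self.killed = True

    def read(self, n=1):
        """Client reads up to n messages; returns how many it actually got."""
        got = min(n, len(self.pending))
        for _ in range(got):
            self.pending.popleft()
        return got


def ticks_until_killed(send_rate, read_rate,
                       limit=MAX_SESSION_MESSAGES, max_ticks=10_000):
    """Return the tick at which the client is kicked, or None if it keeps up."""
    session = ToySession(limit)
    for tick in range(1, max_ticks + 1):
        for i in range(send_rate):
            session.deliver((tick, i))
        if session.killed:
            return tick
        session.read(read_rate)
    return None
```

In this model a reader that keeps pace is never kicked
(ticks_until_killed(10, 10) returns None), while a reader that falls
behind by 10 messages per tick is kicked at tick 100
(ticks_until_killed(20, 10) returns 100).  Even a small, sustained
mismatch is enough to fill the buffer.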

You can raise this limit (e.g., to 10000) by changing the #define
MAX_SESSION_MESSAGES in spread_params.h and recompiling your daemon.
However, if your senders persistently or occasionally send faster
than your receivers can receive, then the problem can occur again.
The complete answer to this problem is to coordinate your senders and
receivers through application level flow control.
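
One common way to do application level flow control is a credit (or
window) scheme: a sender stops once it has a fixed number of
unacknowledged messages outstanding, and the receiver acknowledges in
batches as it processes them.  The sketch below is an in-memory
simulation of that idea, not something Spread provides; CreditSender,
AckingReceiver and simulate are hypothetical names, and in a real
deployment the acknowledgements would themselves travel as Spread
messages.

```python
class CreditSender:
    """Sender that never has more than `window` unacknowledged messages."""

    def __init__(self, window):
        self.window = window
        self.sent = 0
        self.acked = 0

    def can_send(self):
        return self.sent - self.acked < self.window

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("window full; wait for an ack")
        self.sent += 1

    def on_ack(self, count):
        # `count` is the total number of messages the receiver has processed.
        self.acked = max(self.acked, count)


class AckingReceiver:
    """Receiver that acknowledges once per `ack_every` processed messages."""

    def __init__(self, ack_every):
        self.ack_every = ack_every
        self.processed = 0

    def on_message(self):
        """Process one message; return the cumulative count to ack, or None."""
        self.processed += 1
        if self.processed % self.ack_every == 0:
            return self.processed
        return None


def simulate(total, window, ack_every, read_per_tick):
    """Return the largest backlog seen; it never exceeds `window`."""
    if ack_every > window:
        raise ValueError("ack_every must not exceed window, or the sender stalls")
    sender = CreditSender(window)
    receiver = AckingReceiver(ack_every)
    backlog = 0        # messages sent but not yet read by the receiver
    max_backlog = 0
    while receiver.processed < total:
        # The sender transmits as fast as its credit allows.
        while sender.sent < total and sender.can_send():
            sender.on_send()
            backlog += 1
        max_backlog = max(max_backlog, backlog)
        # The receiver drains a few messages per tick and acks in batches.
        for _ in range(min(read_per_tick, backlog)):
            backlog -= 1
            ack = receiver.on_message()
            if ack is not None:
                sender.on_ack(ack)
    return max_backlog
```

For example, simulate(1000, 100, 25, 5) returns 100: however slow the
receiver is, the backlog is capped at the window size.  With several
senders per group, the sum of their windows is what has to stay
comfortably below MAX_SESSION_MESSAGES.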

Cheers!
John

---
John Lane Schultz
Spread Concepts LLC
Phn: 443 838 2200 
Fax: 301 560 8875

Monday, May 25, 2009, 1:08:46 PM, you wrote:

> We observed the following scenario:

> The spread.log reports that a connection to one of the group members
> has been killed but we don't understand the reason.

> This spread daemon is running on the same host as the affected group
> member application process. The application process has been running
> continuously for many days and is still running without problems. At
> the same time several other application processes on the same machine
> had problem-free spread connections.

> One second after the kill message in spread.log our application
> process failed to send a SAFE_MESS message using spread because of an
> sperrNo="Illegal session" error. One second later our application
> performed a reconnect and the spread communication is working fine.

> We do not understand why the spread daemon decided to terminate the
> connection to the application process. Could it be caused by an
> overload situation?

> In which cases will spread terminate a connection?
> How does spread monitor the reachability of a registered process
> running on the same machine?

> Any hint is greatly appreciated.
> Kind regards,
> Martin
> --
> spread-Version: 3.17.4 with perl, Platform: Solaris 2.10
> spread.log:
> 2009-05-13 02:20:17 GMT G_handle_kill: #RD02#host327 is killed
> 2009-05-13 02:20:17 GMT G_handle_kill in GOP
> 2009-05-13 02:20:17 GMT G_handle_kill: Mask for group E:RD02 set to 0 0 0 11
> 2009-05-13 02:20:17 GMT G_handle_kill: Mask for group EC set to 0 0 0 ff
> 2009-05-13 02:20:19 GMT G_handle_join: #RD02#host327 joins group E:RD02
> 2009-05-13 02:20:19 GMT G_handle_join in GOP
> 2009-05-13 02:20:19 GMT G_handle_join: Mask for group E:RDM2 set to 0 0 0 11
> 2009-05-13 02:20:19 GMT G_handle_join: #RD02#host327 joins group EC
> 2009-05-13 02:20:19 GMT G_handle_join in GOP

> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users