[Spread-users] Communicating between 130 hosts with Spread

Jim Hague jim.hague at laicatc.com
Thu Aug 8 11:27:54 EDT 2013

On Thursday 08 Aug 2013 14:23:41 John Schultz wrote:
> > I see that Spread always sets SO_KEEPALIVE on a TCP daemon connection.
> > Rather than fiddle with system-wide keep alive parameters, since both
> > our target systems support TCP_KEEP* I'd be tempted to add those for a
> > custom SP_connect build in our distribution.
> In 4.2, we turned on SO_KEEPALIVE on both sides of the connection, the
> daemon and the client, which activates TCP's keep alive semantics.

OK. I only looked back as far as 4.2 sources.

> The problem is that OS defaults for these timeouts are typically useless, as
> has been pointed out, and need to be adjusted to make this actually useful
> for fault detection.  Unfortunately, the OS parameters that control the
> keep alive behavior are typically system wide and usually can't be set on
> a per connection basis for some reason.
> I'm not sure I understood what you are suggesting as an alternative?

Only that it appears from the man page (i.e. I've not actually tried this) 
that since 2.4 kernels on Linux it has been possible to set the keep alive 
parameters on a per-socket basis using TCP_KEEPCNT, TCP_KEEPIDLE and 
TCP_KEEPINTVL. It also looks like these are available on AIX.

Since that covers both our deployment platforms, I was proposing to modify 
SP_connect() in a deployed version of the library to set the effective timeout 
to something a bit quicker than the defaults. I don't want to have to 
modify these system-wide if I can avoid it, in case I break something else 
installed on the machines.
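
Roughly what I have in mind is the sketch below. It is untested and makes
some assumptions: set_keepalive() is a name made up for illustration and is
not anything in the Spread sources, the platform is assumed to define
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT in <netinet/tcp.h>, and the
values in the usage note are placeholders rather than recommendations.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of Spread): enable keep alive on one TCP
 * socket and override the system-wide timing for that socket only.
 *   idle_secs  - idle time before the first probe is sent
 *   intvl_secs - gap between unanswered probes
 *   probes     - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 if any setsockopt() call fails (errno is set).
 */
static int set_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

The idea would be to call something like set_keepalive(fd, 30, 5, 3) on the
TCP socket inside SP_connect(). With those placeholder values a dead peer
should be noticed after roughly 30 + 3 * 5 = 45 seconds, and nothing
system-wide is touched.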
-- 
Jim Hague
jim.hague at laicatc.com              Never trust a computer you can't lift.
LAIC Aktiengesellschaft            +44 1865 980647    Mob +44 7941 697732
