[Spread-users] cost of failures

John Schultz jschultz at spreadconcepts.com
Wed Oct 5 15:20:16 EDT 2005


Paul Rubel wrote:

>What mechanism do the daemons use to find membership? Is it a
>heartbeat that times out or something else?
>  
>
Spread uses a token ring to maintain its current daemon membership and 
to recover lost messages. Daemons that come back (recoveries) are detected 
either by overhearing LAN traffic or, in WAN mode, through probes. When 
recovered daemons are discovered, an attempt is made to build a new daemon 
membership / ring. Failures are detected by token losses.
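To make "failures are detected by token losses" concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not Spread's code: the names Daemon, circulate, detect_token_loss and rebuild_ring, and the values HOP_DELAY and TOKEN_LOSS_TIMEOUT, are all hypothetical. The idea it shows is that the circulating token itself serves as the liveness signal. When the token is handed to a crashed daemon it is lost, every surviving daemon's "time since I last held the token" grows past a timeout, and the survivors form a new ring without the dead member. Spread's real membership algorithm is considerably more involved than this.

```python
"""Toy model of token-loss failure detection in a logical ring.

All names and timings are hypothetical; this is not Spread's implementation.
"""
from dataclasses import dataclass

HOP_DELAY = 0.01          # hypothetical: seconds for one token hop
TOKEN_LOSS_TIMEOUT = 1.0  # hypothetical: token-loss timeout in seconds


@dataclass
class Daemon:
    name: str
    alive: bool = True
    last_token_time: float = 0.0  # when this daemon last held the token


def circulate(ring: list[Daemon], until: float) -> None:
    """Pass the token around the ring until `until`; stop if it reaches a dead daemon."""
    now, i = 0.0, 0
    while now < until:
        holder = ring[i % len(ring)]
        if not holder.alive:
            return  # the token was handed to a crashed daemon and is lost
        holder.last_token_time = now
        now += HOP_DELAY
        i += 1


def detect_token_loss(ring: list[Daemon], now: float) -> list[str]:
    """Names of live daemons whose token-loss timer has expired at time `now`."""
    return [d.name for d in ring
            if d.alive and now - d.last_token_time > TOKEN_LOSS_TIMEOUT]


def rebuild_ring(ring: list[Daemon]) -> list[Daemon]:
    """New membership: only daemons still alive (the ones that would answer a join round)."""
    return [d for d in ring if d.alive]


if __name__ == "__main__":
    ring = [Daemon("a"), Daemon("b"), Daemon("c", alive=False), Daemon("d")]
    circulate(ring, until=5.0)
    print(detect_token_loss(ring, now=2.0))          # daemons that noticed the loss
    print([d.name for d in rebuild_ring(ring)])      # membership without "c"
```

In this model there is no separate per-member heartbeat: a single timer per daemon, reset whenever the token arrives, is enough to notice that the ring has broken. Which daemon failed is only learned when the survivors gather to form the new ring.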

>The case I'm curious about would be essentially a fail-stop failure of
>a host, which takes down a group member and its daemon but where there
>is no malicious activity taking place. 
>  
>
In that scenario the token ring would be broken, and the other Spread 
daemons in the ring would discover this very quickly. They would then 
rebuild a ring without the failed daemon. With default timeout settings 
this could take between 5 and 30 seconds, depending on whether you are 
running in LAN or WAN mode; the mode is determined by the IP addresses of 
your daemons. Spread runs in LAN mode if all the IPs are within a 
single class B address space; otherwise it runs in WAN mode, which uses 
longer timeouts.
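The LAN/WAN rule above can be checked with a short Python sketch. It assumes that "within a single class B address space" means "all daemon IPv4 addresses share the same first two octets", i.e. fall in one /16 block. That reading, and the helper names same_class_b and spread_mode, are assumptions for illustration; the sketch does not reproduce how Spread itself performs the test.

```python
import ipaddress


def same_class_b(addresses: list[str]) -> bool:
    """True if every IPv4 address falls in the same /16 (class-B-sized) block.

    Assumption: "single class B address space" is read as "same first two octets".
    """
    if not addresses:
        raise ValueError("need at least one daemon address")
    # strict=False lets us pass a host address and get its enclosing /16 network
    blocks = {ipaddress.ip_network(f"{a}/16", strict=False) for a in addresses}
    return len(blocks) == 1


def spread_mode(addresses: list[str]) -> str:
    """Hypothetical helper: 'LAN' if all daemons share one /16, else 'WAN'."""
    return "LAN" if same_class_b(addresses) else "WAN"


if __name__ == "__main__":
    print(spread_mode(["192.168.1.10", "192.168.7.20"]))
    print(spread_mode(["192.168.1.10", "10.0.0.5"]))
```

The first call lists two daemons that share 192.168.0.0/16, so under this reading they would run in LAN mode with its shorter timeouts; the second spans two blocks and would fall back to WAN mode.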

---
John Schultz
Spread Concepts LLC
Phn:  301 498 3233
Cell: 443 838 2200
