[Spread-users] cost of failures

Theo Schlossnagle jesus at omniti.com
Wed Oct 5 15:47:07 EDT 2005


John Schultz wrote:

> Paul Rubel wrote:
>
>> The case I'm curious about would be essentially a fail-stop failure of
>> a host, which takes down a group member and its daemon but where there
>> is no malicious activity taking place.  
>>
> In that scenario the token ring would be broken and the other Spread 
> daemons in the ring would discover this very quickly.  They would then 
> rebuild a ring without the failed daemon.  With default timeout 
> settings this could take between 5 and 30 seconds depending on whether 
> you are running in LAN or WAN mode, which is determined by the IP 
> structure of your daemons.  Spread runs in LAN mode if all the IPs are 
> within a single class B address space, otherwise it runs in WAN mode, 
> which has higher timeouts.
>
As an encouraging note, we see much better behaviour during full node 
failures.  With moderate traffic (10-500 messages/second) we see full 
reconvergence within 5 seconds.
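
For anyone checking which mode their own configuration is likely to land 
in, here is a minimal Python sketch of the rule John describes above.  It 
assumes that "a single class B address space" means all daemon IPv4 
addresses share the same first two octets (a /16).  The function names 
are made up for illustration and are not part of Spread, and the check 
Spread actually performs may differ.

```python
import ipaddress


def same_slash16(addresses):
    """Return True if every IPv4 address falls in the same /16 network.

    Assumption: "single class B address space" is read as "same /16".
    """
    if not addresses:
        raise ValueError("need at least one daemon address")
    # strict=False lets us pass a host address and get its enclosing /16.
    networks = {ipaddress.ip_network(f"{a}/16", strict=False) for a in addresses}
    return len(networks) == 1


def guess_mode(addresses):
    """Hypothetical helper: guess LAN or WAN mode from the daemon IPs."""
    return "LAN" if same_slash16(addresses) else "WAN"
```

Calling guess_mode() with the addresses listed in your spread.conf gives 
a rough idea of whether the lower (LAN) or higher (WAN) timeouts apply.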

-- 
// Theo Schlossnagle
// Principal Engineer -- http://www.omniti.com/~jesus/
// Ecelerity: Run with it. -- http://www.omniti.com/




