[Spread-users] cost of failures
Theo Schlossnagle
jesus at omniti.com
Wed Oct 5 15:47:07 EDT 2005
John Schultz wrote:
> Paul Rubel wrote:
>
>> The case I'm curious about would be essentially a fail-stop failure of
>> a host, which takes down a group member and its daemon but where there
>> is no malicious activity taking place.
>>
> In that scenario the token ring would be broken and the other Spread
> daemons in the ring would discover this very quickly. They would then
> rebuild a ring without the failed daemon. With default timeout
> settings this could take between 5 and 30 seconds depending on whether
> you are running in LAN or WAN mode, which is determined by the IP
> structure of your daemons. Spread runs in LAN mode if all the IPs are
> within a single class B address space, otherwise it runs in WAN mode,
> which has higher timeouts.
>
As an encouraging note, we see much better behaviour during full node
failures. With moderate traffic (10-500 messages/second) we see full
reconvergence within 5 seconds.
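
For anyone wanting to check which mode a given set of daemons would fall
into, the LAN/WAN rule John describes can be roughly sketched as below.
This is only an illustration, not Spread's actual code: the function name
spread_mode is made up, and it assumes "within a single class B address
space" means that all daemon IPs share the same first 16 bits.

```python
import ipaddress

def spread_mode(daemon_ips):
    """Guess LAN vs WAN mode from a list of IPv4 daemon address strings.

    Assumption: "single class B address space" is read as "every address
    falls inside the same /16 network". Hypothetical helper, not part of
    Spread.
    """
    if not daemon_ips:
        raise ValueError("need at least one daemon address")
    # strict=False lets us pass a host address and get its enclosing /16.
    networks = {
        ipaddress.ip_network(f"{ip}/16", strict=False) for ip in daemon_ips
    }
    return "LAN" if len(networks) == 1 else "WAN"
```

Under that assumption, daemons at 10.1.2.3 and 10.1.200.9 would be
classed as LAN, while 10.1.2.3 and 10.2.2.3 would be classed as WAN and
get the higher timeouts.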
--
// Theo Schlossnagle
// Principal Engineer -- http://www.omniti.com/~jesus/
// Ecelerity: Run with it. -- http://www.omniti.com/