[Spread-users] cost of failures
John Schultz
jschultz at spreadconcepts.com
Wed Oct 5 15:20:16 EDT 2005
Paul Rubel wrote:
>What mechanism do the daemons use to find membership? Is it a
>heartbeat that times out or something else?
>
>
Spread uses a token ring to maintain its current daemon membership and
to recover messages. Recovered daemons are detected either by overhearing
their traffic on a LAN or, across a WAN, through probes. When recovered
daemons are discovered, an attempt is made to build a new daemon
membership / ring. Failures are detected by token loss.
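As a rough illustration of detecting a failure by token loss, here is a minimal Python sketch. It is not Spread's code: the `TokenMonitor` class, its method names, and the 5-second timeout in the usage lines are all hypothetical, chosen only to show the idea of declaring the token lost when it has not been seen within a timeout.

```python
class TokenMonitor:
    """Hypothetical helper: remembers when the token was last seen."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s  # illustrative value, not a Spread default
        self.last_seen = now

    def token_received(self, now):
        # Call this each time the token arrives at the local daemon.
        self.last_seen = now

    def token_lost(self, now):
        # The token counts as lost once more than timeout_s seconds have
        # passed since it was last seen.
        return now - self.last_seen > self.timeout_s


monitor = TokenMonitor(timeout_s=5.0, now=0.0)
monitor.token_received(now=4.0)
still_ok = monitor.token_lost(now=8.0)     # 4.0 s since the last token
timed_out = monitor.token_lost(now=9.5)    # 5.5 s since the last token
```

Passing the clock value in explicitly keeps the sketch deterministic; a real daemon would read a monotonic clock and would start a membership change when the timeout fires.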
>The case I'm curious about would be essentially a fail-stop failure of
>a host, which takes down a group member and its daemon but where there
>is no malicious activity taking place.
>
>
In that scenario the token ring would be broken, and the other Spread
daemons in the ring would discover this very quickly. They would then
rebuild a ring without the failed daemon. With default timeout settings,
this could take between 5 and 30 seconds, depending on whether you are
running in LAN or WAN mode, which is determined by the IP structure of
your daemons. Spread runs in LAN mode if all the IPs are within a
single class B address space; otherwise it runs in WAN mode, which has
higher timeouts.
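To check which mode a given set of daemon addresses would likely fall under, the rule above can be sketched in Python. This is an approximation and not Spread's own logic: it assumes "a single class B address space" means that all IPv4 addresses share the same first two octets (a /16 prefix), and the function names are made up for the example.

```python
import ipaddress


def same_class_b(addresses):
    """True if all IPv4 addresses share the same first two octets (/16)."""
    if not addresses:
        raise ValueError("need at least one daemon address")
    # strict=False lets a host address such as 10.1.2.3/16 be mapped to
    # its enclosing network, 10.1.0.0/16.
    networks = {
        ipaddress.ip_network(f"{addr}/16", strict=False) for addr in addresses
    }
    return len(networks) == 1


def spread_mode(addresses):
    """Assumed rule: LAN if every daemon is in one /16, otherwise WAN."""
    return "LAN" if same_class_b(addresses) else "WAN"


lan_example = spread_mode(["192.168.1.10", "192.168.200.7"])
wan_example = spread_mode(["192.168.1.10", "10.0.0.5"])
```

With this reading, a single daemon outside the shared /16 is enough to put the whole configuration into WAN mode, with its higher timeouts.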
---
John Schultz
Spread Concepts LLC
Phn: 301 498 3233
Cell: 443 838 2200