[Spread-users] cost of failures

Wed Oct 5 13:39:24 EDT 2005

Hello,

We have been working with Spread and are trying to understand the
worst-case behavior when a group member fails. In particular, I'm
curious how long it should take for the group to recognize that a
member has failed and to agree upon a new group membership without the
failed member. That is, how long might a process need to wait between
a failure and receiving a new membership message?

I'm guessing that some of the factors in play here will be the type of
network, the number of members and their locations, the Spread
timeout_* values, and the timing of the failure with respect to the
protocol steps. Are there some aspects that are so dominant that we
can practically ignore the others?

I can run some tests but I suspect that there is a higher-level
insight lurking here.

         thanks for your help,
          Paul Rubel