[Spread-users] Membership algorithm--continued

Ryan Caudy caudy at jhu.edu
Fri Apr 16 00:11:09 EDT 2004


My responses are inlined below.  Hope this helps.


kevin Tian wrote:

> 1.
> In function Form_or_fail, which case can lead to
> Token_alive =1 and F_reps.num_reps= 1? I guess there
> is only one daemon?

I believe that you mean this code:
	if( Smallest_rep( &F_reps, &dummy ) ==  My.id )
		if( Token_alive && F_reps.num_reps == 1 )
			/* clear everything and go back to op */

This corresponds to the case where a ring representative called 
Lookup_new_members(), found that there were members not in its ring, 
shifted to GATHER to try to find them, and then timed out without 
finding any other representatives to try to merge with.  Thus, this case 
is simply a failed attempt to increase the size of the ring from a 
currently healthy membership.

> 2.
> If a daemon is not the smallest rep and the token is not
> alive, why do we conclude that the gather process failed and
> then restart gather?

Ideally, the smallest rep will time out first and send a form1 token 
(because everyone else pushes their timeout back when they notice a new 
smallest rep).  If no form1 token arrives before another rep's own 
timeout expires, that rep can only conclude that gather failed, and try 
again.  Of course, if the form1 token is merely delayed, this doesn't 
necessarily hurt the membership: the reps will handle the form1 token 
if it arrives.
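
The outcomes described in the two answers above can be put side by side 
in a small sketch.  The function and enum names are hypothetical and are 
not taken from the Spread source; only the conditions come from this 
thread.  Treating every other smallest-rep case as "send form1" is an 
assumption, and the case of a non-smallest rep with a live token is not 
discussed here, so the sketch just flags it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the conditions come from the discussion. */
enum gather_outcome {
    BACK_TO_OP,      /* healthy ring found nobody to merge with      */
    SEND_FORM1,      /* smallest rep starts the first form round     */
    RESTART_GATHER,  /* non-smallest rep saw no form1 token in time  */
    NOT_COVERED      /* case not discussed in this thread            */
};

/* Decide what a representative does when its gather timeout expires. */
enum gather_outcome gather_timeout_outcome(int my_id, int smallest_rep_id,
                                           bool token_alive, int num_reps)
{
    /* Assumption: a rep always counts at least itself. */
    assert(num_reps >= 1);

    if (my_id == smallest_rep_id) {
        /* Ring is still healthy and no other rep answered:
         * clear everything and go back to OP. */
        if (token_alive && num_reps == 1)
            return BACK_TO_OP;
        /* Assumed: otherwise the smallest rep sends the form1 token. */
        return SEND_FORM1;
    }

    /* Not the smallest rep and no live token: gather failed, retry. */
    if (!token_alive)
        return RESTART_GATHER;

    return NOT_COVERED;
}
```

For example, a rep with id 3 that is also the smallest rep, still has a 
live token, and knows of only one rep would get BACK_TO_OP, which is the 
failed attempt to grow a healthy ring described under question 1.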

> 3.
> In addition, I think the implementation does not check
> whether agreement is reached before creating and
> circulating the form token, thus violating the
> algorithm description.

I'm not sure exactly what you mean here.  Deterministic distributed 
agreement is impossible in an asynchronous environment where daemons may 
crash.  Could you explain what you 
mean by violating the algorithm description?  Basically, whenever a 
daemon gets the form token, it makes sure it can go forward with the 
membership (doing basic validity checking), and appends information to 
it (if it was a representative).  The end result of the two form rounds 
is simply that the daemons have a circulating token, and know what the 
membership will be if all goes well, as well as the information needed 
to start the EVS stage (exchanging messages to be able to meet the needs 
of the EVS specification).  If something fails, a new attempt will be 
made.  At the end of EVS, daemons know that all of the other daemons 
have the appropriate messages, and will deliver them or crash.
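
As a rough illustration of the per-daemon step described above (check 
that the membership can go forward, append information if the daemon was 
a representative, pass the token on), here is a minimal sketch.  The 
struct layout, the function name, and the specific validity check (the 
daemon must appear in the proposed membership) are all assumptions made 
for illustration; they are not the actual Spread token format or checks.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MEMBERS 8   /* illustrative bound, not Spread's */

/* Hypothetical form token: a proposed membership plus per-rep data. */
struct form_token {
    int members[MAX_MEMBERS];   /* proposed membership               */
    int num_members;
    int rep_info[MAX_MEMBERS];  /* ids of reps that appended data    */
    int num_rep_info;
};

/* Returns true if the token may be forwarded to the next daemon,
 * false if this attempt should be abandoned (a new one would follow). */
bool handle_form_token(struct form_token *tok, int my_id, bool i_am_rep)
{
    bool listed = false;

    assert(tok->num_members >= 0 && tok->num_members <= MAX_MEMBERS);

    /* Assumed validity check: am I part of the proposed membership? */
    for (int i = 0; i < tok->num_members; i++) {
        if (tok->members[i] == my_id)
            listed = true;
    }
    if (!listed)
        return false;

    /* Only daemons that were representatives append information. */
    if (i_am_rep) {
        if (tok->num_rep_info >= MAX_MEMBERS)
            return false;
        tok->rep_info[tok->num_rep_info++] = my_id;
    }
    return true;
}
```

Note that nothing in this step waits for agreement: each daemon only 
checks what it can see locally, and a failure anywhere simply leads to a 
new attempt.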

> 4.
> And it seems that there is no business of regular
> group members---the actual clients do NOT participate
> in the membership algorithm. How do they get
> membership information then? E.g., Who are the clients
> of my group? Is it handled in group.c?
> I look through the code, but it is too much for me.

Correct, clients don't participate in the membership algorithm at all. 
The group membership algorithm, which includes state 
exchange (when necessary), building and maintaining the list of groups, 
and client notifications (i.e. transitional and regular membership 
messages), is implemented entirely in groups.c.  If you have more questions about 
this code, I can certainly answer them.

Ryan W. Caudy
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University
