[Spread-users] Three members, one group, different group IDs

Alec H. Peterson ahp at omniti.com
Thu Aug 24 11:25:08 EDT 2006


Hi John,

On Aug 24, 2006, at 8:17, John Lane Schultz wrote:

>
> That sure sounds like a bug in the way GIDs are calculated and/or  
> reported to users.  What I find interesting though is that your  
> events seem only related to a client failing and then restarting.   
> Usually, this should only affect the gid[2] field, which is a  
> counter that reflects light weight (client) group changes.  The  
> gid[1] field is the # of seconds since the epoch at the ring  
> representative when the last heavy weight (daemon) membership was  
> formed.  We did add a fix to sometimes artificially advance that  
> counter by one when there were cascading, heavy weight membership  
> attempts.
>
> My hunch is that there is a bug in the code that once a heavy  
> weight membership is actually installed it doesn't forget about all  
> of the previous cascading heavy weight changes.  Then, somehow, the  
> light weight membership triggers the above mechanism and one of the  
> daemons unilaterally raises its membership ID's time field by one.

That's the interesting part: the 'restarted' node ends up with a  
gid[1] field that is one _lower_ than that of the other nodes (whose  
gid[1] stays stable):

Before the restart, all nodes have this GID: 173044937:1156396902:2

After the restart, the restarted node has this GID: 173044937:1156396901:2
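
For anyone wanting to check their own logs for the same symptom, here is a minimal sketch that splits the two GID strings above into their three fields and reports which one differs. It assumes the colon-separated fields map to gid[0], gid[1] and gid[2] in order, and that gid[1] is seconds since the epoch as John describes; the helper names (parse_gid, differing_fields) are made up for illustration and are not part of Spread.

```python
from datetime import datetime, timezone


def parse_gid(text):
    """Split a colon-separated GID string into a tuple of three integers."""
    fields = tuple(int(part) for part in text.strip().split(":"))
    if len(fields) != 3:
        raise ValueError(f"expected 3 GID fields, got {len(fields)}: {text!r}")
    return fields


def differing_fields(a, b):
    """Return the indices of the fields where two parsed GIDs disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# GIDs quoted in this message.
before = parse_gid("173044937:1156396902:2")   # all nodes, before the restart
after = parse_gid("173044937:1156396901:2")    # restarted node, after the restart

changed = differing_fields(before, after)
delta = after[1] - before[1]

# Assumption: gid[1] is a Unix timestamp, so it can be shown as a UTC time.
formed_at = datetime.fromtimestamp(before[1], tz=timezone.utc)

print("fields that differ:", changed)
print("gid[1] delta on restarted node:", delta)
print("gid[1] before restart as UTC time:", formed_at.isoformat())
```

Only the time field should show up as changed, with a delta of -1, while gid[0] and gid[2] match on every node.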

> Does this always occur or is it an intermittent problem?  Does this  
> occur if you do your scenario from "scratch" (restart all the  
> daemons and try)?

This is always reproducible, and always happens from scratch.  I  
reproduced it a couple dozen times yesterday trying to track down the  
root cause of the problem.

> The bug is surely in groups.c, most likely related to unilaterally  
> raising the time field in relation to cascading heavy weight  
> membership attempts, and we will have to track it down.

Please let me know if I can provide any additional information.

Alec




