[Spread-users] Three members, one group, different group IDs

John Lane Schultz jschultz at spreadconcepts.com
Thu Aug 24 11:24:29 EDT 2006


John Lane Schultz wrote:
> Alec,
> 
> That sure sounds like a bug in the way GIDs are calculated and/or 
> reported to users.  What I find interesting though is that your events 
> seem only related to a client failing and then restarting.  Usually, 
> this should only affect the gid[2] field, which is a counter that 
> reflects light weight (client) group changes.  The gid[1] field is the # 
> of seconds since the epoch at the ring representative when the last 
> heavy weight (daemon) membership was formed.  We did add a fix to 
> sometimes artificially advance that counter by one when there were 
> cascading, heavy weight membership attempts.
> 
> My hunch is that there is a bug in the code: once a heavy weight 
> membership is actually installed, the code doesn't forget about all of 
> the previous cascading heavy weight changes.  Then, somehow, the light 
> weight membership triggers the above mechanism and one of the daemons 
> unilaterally raises its membership ID's time field by one.
> 
> Does this always occur or is it an intermittent problem?  Does this 
> occur if you do your scenario from "scratch" (restart all the daemons 
> and try)?
> 
> The bug is surely in groups.c, most likely related to unilaterally 
> raising the time field in relation to cascading heavy weight membership 
> attempts, and we will have to track it down.
> 

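To make the field roles above concrete, here is a minimal sketch in C that 
classifies the difference between two GIDs by which field changed.  The type 
and function names (sketch_gid, classify_gid_diff) are made up for 
illustration and are not taken from the Spread sources; gid[0] is not 
described in this thread, so the sketch only checks it for equality.

```c
#include <stdint.h>

/* Hypothetical stand-in for a group ID, NOT the real Spread type.
 * id[1]: seconds since the epoch at the ring representative when the last
 *        heavy weight (daemon) membership was formed.
 * id[2]: counter of light weight (client) group changes.
 * id[0]: meaning not covered in this thread; treated as opaque. */
typedef struct {
    int32_t id[3];
} sketch_gid;

typedef enum {
    GID_SAME,          /* all three fields match */
    GID_LIGHT_CHANGE,  /* only the client-change counter differs */
    GID_HEAVY_CHANGE,  /* the time field differs */
    GID_OTHER          /* the opaque first field differs */
} gid_diff;

/* Report the most significant field in which two GIDs differ. */
static gid_diff classify_gid_diff(const sketch_gid *a, const sketch_gid *b)
{
    if (a->id[0] != b->id[0]) return GID_OTHER;
    if (a->id[1] != b->id[1]) return GID_HEAVY_CHANGE;
    if (a->id[2] != b->id[2]) return GID_LIGHT_CHANGE;
    return GID_SAME;
}
```

Under these assumptions, a client failing and restarting should only ever 
yield GID_LIGHT_CHANGE between consecutive GIDs; seeing GID_HEAVY_CHANGE 
without a daemon membership change is the symptom being discussed.
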
Whoops! I just re-read your post (I had confused it with Theo's) and saw that 
this occurs when a daemon dies and restarts quickly.  This could definitely 
trigger the cascading heavy weight membership handling, which sometimes raises 
the second field of the GID (gid[1], the time field).  So I'm even more sure 
now that the bug is in that portion of the code.
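
For anyone following along, here is a toy model in C of the suspected failure 
mode.  It is not the actual groups.c logic: the struct, the cascade_pending 
flag, and both functions are invented for this sketch, and the only point is 
to show how stale "cascading" state on one daemon could make it bump gid[1] 
on its own during a later light weight change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-daemon view of one group's ID. */
typedef struct {
    int32_t gid[3];
    bool    cascade_pending;  /* assumed flag: a cascading heavy weight
                                 membership attempt was seen earlier */
} sketch_daemon;

/* Install a heavy weight membership with the representative's timestamp.
 * clear_flag = false models the suspected bug: the daemon does not forget
 * the earlier cascading attempts once the membership is installed. */
static void install_heavy(sketch_daemon *d, int32_t rep_time, bool clear_flag)
{
    d->gid[1] = rep_time;
    if (clear_flag) {
        d->cascade_pending = false;
    }
}

/* Apply a light weight (client) change.  If cascade state is still set,
 * the time field is advanced by one, which no other daemon will mirror. */
static void light_change(sketch_daemon *d)
{
    if (d->cascade_pending) {
        d->gid[1] += 1;
        d->cascade_pending = false;
    }
    d->gid[2] += 1;
}
```

If two daemons install the same heavy weight membership but only one clears 
its flag, the next client join or leave leaves them with gid[1] values that 
differ by one, which would match members of one group reporting different 
group IDs.
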

Cheers!

-- 
John Schultz
Spread Concepts LLC
Phn: 443 838 2200
Fax: 301 560 8875



