[Spread-users] Two guys... one group... different group_ids

Yair Amir yairamir at cnds.jhu.edu
Wed Apr 6 20:39:24 EDT 2005


Hi,

While running Theo's described scenario will not create the problem
by itself, there is indeed a bug that was introduced in 3.17.2 when
the field "changed" was added to the group structure but was not
initialized correctly in the G_compute_and_notify routine.

Raluca Musaloui-E discovered the behavior in the distributed systems
course at Hopkins. Together, we managed to create a relatively simple
scenario that demonstrates it (it involves a few partitions and merges),
and solved the problem. The fix will be included in the next Spread
version, which is coming very soon.

This behavior should be fairly rare, but if you want to solve it right away,
you need to add the following line:

"new_group->changed = 0;"

immediately before the line

"new_group->num_members =0;"

in the G_compute_and_notify routine in groups.c.
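
For readers without the source at hand, here is a minimal sketch of what
the change amounts to. The struct and function names below are
hypothetical stand-ins, not the actual Spread definitions; only the two
field names come from the lines quoted above. The sketch assumes the
structure comes from an allocator that does not zero memory (malloc
here), which is why a field without an explicit initializer can hold
leftover data.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the group structure in groups.c.
 * Only the two fields named in this post are modeled. */
typedef struct {
    int changed;      /* field added in 3.17.2 */
    int num_members;
} group_sketch;

/* Allocate a group and initialize it the way the patched routine would.
 * malloc() does not zero memory, so every field needs an explicit value. */
group_sketch *new_group_sketch(void)
{
    group_sketch *new_group = malloc(sizeof *new_group);
    if (new_group == NULL)
        return NULL;

    new_group->changed = 0;      /* the added line */
    new_group->num_members =0;   /* the existing line */
    return new_group;
}
```

Without the first assignment, "changed" would keep whatever value was
left in that memory.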

Cheers,

	:) Yair.



Theo Schlossnagle wrote:
> Hello all --
> 
> Weird spread membership issue.
> 
> So, I have an app that's been working for a while... It instructs a 
> group of people to "do something" and it does that by sp_multicasting in 
> its view of the group.  It prepends the group_id it gets from the last 
> membership message it received on that group.
> 
> The message is sent SAFE.  So, I expect people to get the message and the 
> group id to match (unless the group has altered before delivery).
> 
> I have two boxes running Spread 3.17.2.  A client on each box... today 
> my app started behaving as follows:
> 
> app1/box1 and app2/box2 are in a group together...
> 
> app1 is killed.
> 
> app2 sees a membership change and a new GID [173044807/1112795673/17] 
> with only app2 in it.
> 
> app1 is restarted.
> 
> app2 sees a membership change and a new GID [173044807/1112795673/18] 
> with app1 and app2.
> 
> app1 sees a membership change and a new GID [173044807/1112795674/18] 
> with app1 and app2.
> 
> I have disabled all use of the Spread ring except for this app... and 
> there are only about 5 messages going around (not including group 
> membership messages).  The spread ring is stable.
> 
> It was my understanding that in this new group, they should see the same 
> group ID... they do not.  This worked for a long time and suddenly today 
> it stopped and I consistently get an "off by one" on group_id.id[1].  
> I'm gonna dig a bit more, but if I restart Spread and this "goes away," 
> I'm gonna be pretty disturbed.
> 
> Thoughts?
> 
> Best regards,
> 
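
The check described in the quoted message (prepend the sender's group_id
to the payload, compare it on receipt) can be sketched roughly as below.
This is an illustration only: gid_sketch is a hypothetical type modeled
on the group_id.id[1] indexing mentioned above, the helper names are
invented, and no Spread calls are made. It assumes sender and receiver
share the same byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical three-integer group id, modeled on group_id.id[] */
typedef struct {
    int32_t id[3];
} gid_sketch;

/* Write the gid followed by the payload into out.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t pack_with_gid(const gid_sketch *gid, const char *payload,
                     size_t len, char *out, size_t cap)
{
    if (cap < sizeof *gid + len)
        return 0;
    memcpy(out, gid, sizeof *gid);
    memcpy(out + sizeof *gid, payload, len);
    return sizeof *gid + len;
}

/* Compare the gid prefix of a received message with the receiver's
 * current gid. Returns -1 if they match, -2 if the message is too short
 * to carry a gid, otherwise the index of the first differing element. */
int first_gid_mismatch(const char *msg, size_t msg_len,
                       const gid_sketch *current)
{
    gid_sketch sent;
    int i;

    if (msg_len < sizeof sent)
        return -2;
    memcpy(&sent, msg, sizeof sent);
    for (i = 0; i < 3; i++) {
        if (sent.id[i] != current->id[i])
            return i;
    }
    return -1;
}
```

With the two GIDs reported above, [173044807/1112795673/18] and
[173044807/1112795674/18], a message packed with the first and checked
against the second differs at index 1.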




