[Spread-users] Two guys... one group... different group_ids

Ryan Caudy rcaudy at gmail.com
Wed Apr 6 19:51:49 EDT 2005


The group IDs were changed in 3.17.2.  We had observed that VS Sets
were being calculated incorrectly, and the group IDs had to change to
support them properly.

The short version is that we added membership IDs and group IDs based
on them for transitional configurations.  This correctly allows us to
handle the case when there are multiple underlying daemon membership
changes, but only one group membership change is completed.

There may be bugs in this code, although I have not observed any
myself.

There was one protocol layer bug fixed in 3.17.3 that could cause
something like this, by causing AGREED messages (like lightweight
membership messages) to be delivered in different orders with respect
to transitional signals.

A few questions that may help diagnose the source of the issue:

Were there any daemon membership changes, either shortly before app1
was killed, or while it was down?  If so, what was the sequence of
events, and what were the Membership IDs printed along with the
configurations?  If not, what were the membership IDs printed with the
last configuration installed?
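
To make the comparison concrete, here is a minimal sketch (plain
Python, not the Spread API) that parses the GIDs in the bracketed
form quoted below and reports which fields differ.  The helper names
parse_gid, differing_fields and message_is_current are made up for
illustration, and the field indices simply follow the group_id.id[]
numbering used in the report; what each field means inside the daemon
is not assumed here.

```python
from typing import List, Tuple

# A GID as printed in the report: three integers, "[a/b/c]".
GroupId = Tuple[int, int, int]


def parse_gid(text: str) -> GroupId:
    """Parse a GID printed as '[a/b/c]' into a 3-tuple of ints."""
    fields = text.strip().strip("[]").split("/")
    if len(fields) != 3:
        raise ValueError(f"expected three fields in {text!r}")
    return (int(fields[0]), int(fields[1]), int(fields[2]))


def differing_fields(a: GroupId, b: GroupId) -> List[int]:
    """Return the indices (0..2) at which the two GIDs disagree."""
    return [i for i in range(3) if a[i] != b[i]]


def message_is_current(tagged: GroupId, current: GroupId) -> bool:
    """Hypothetical receiver-side check: accept a message only if the
    GID prepended by the sender equals the receiver's current GID."""
    return tagged == current


# The two views of the same group after app1 was restarted.
app2_view = parse_gid("[173044807/1112795673/18]")
app1_view = parse_gid("[173044807/1112795674/18]")

print(differing_fields(app2_view, app1_view))
```

With the two GIDs from the report, only index 1 differs, and a
receiver doing the equality check would reject every message tagged
by the other member, which matches the behavior described.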

Cheers,
Ryan

On Apr 6, 2005 4:54 PM, Theo Schlossnagle <jesus at omniti.com> wrote:
> Hello all --
> 
> Weird spread membership issue.
> 
> So, I have an app that's been working for a while... It instructs a
> group of people to "do something" and it does that by sp_multicasting in
> its view of the group.  It prepends the group_id it gets from the last
> membership message it received on that group.
> 
> The message is sent SAFE.  So, I expect people to get the message and the
> group id to match (unless the group has altered before delivery).
> 
> I have two boxes running Spread 3.17.2.  A client on each box... today
> my app started behaving as follows:
> 
> app1/box1 and app2/box2 are in a group together...
> 
> app1 is killed.
> 
> app2 sees a membership change and a new GID [173044807/1112795673/17]
> with only app2 in it.
> 
> app1 is restarted.
> 
> app2 sees a membership change and a new GID [173044807/1112795673/18]
> with app1 and app2.
> 
> app1 sees a membership change and a new GID [173044807/1112795674/18]
> with app1 and app2.
> 
> I have disabled all use of the Spread ring except for this app... and
> there are only about 5 messages going around (not including group
> membership messages).  The spread ring is stable.
> 
> It was my understanding that in this new group, they should see the same
> group ID... they do not.  This worked for a long time and suddenly today
> it stopped and I consistently get an "off by one" on
> group_id.id[1].  I'm gonna dig a bit more, but if I restart Spread and
> this "goes away," I'm gonna be pretty disturbed.
> 
> Thoughts?
> 
> Best regards,
> 
> --
> // Theo Schlossnagle
> // Principal Engineer -- http://www.omniti.com/~jesus/
> // Postal Engine -- http://www.postalengine.com/
> // Ecelerity: fastest MTA on Earth



