[Spread-users] Spread 4.0 daemon "goes to sleep" on XP network disconnect

John Robinson jr at vertica.com
Wed Aug 8 09:34:47 EDT 2007


We have noticed something similar to this.  In our case, part of the 
problem was mishandling a transitional and/or caused-by-network 
membership-change message.

We have learned a lot by attaching a spuser process to the group to observe
all the message traffic when going through network changes (or at any other
time). It shows you which messages need to be paid attention to.
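A minimal sketch of the kind of bookkeeping that helped us. It is written in Python for brevity, not against the real C client API. The flag names below are hypothetical stand-ins for the service-type bits that sp.h defines, roughly what the Is_*_mess() macros test. The point is only that transitional and caused-by-network memberships need their own branches instead of being lumped in with ordinary joins and leaves:

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List


class Service(Flag):
    # Hypothetical flags; the real constants and Is_*_mess() macros are in sp.h.
    REGULAR = auto()
    REG_MEMB = auto()
    TRANSITION = auto()
    CAUSED_BY_JOIN = auto()
    CAUSED_BY_LEAVE = auto()
    CAUSED_BY_DISCONNECT = auto()
    CAUSED_BY_NETWORK = auto()


@dataclass
class Message:
    service: Service
    group: str
    members: List[str] = field(default_factory=list)


def describe(msg: Message) -> str:
    """Return a one-line, spuser-style summary of a received message."""
    s = msg.service
    if Service.REGULAR in s:
        return f"regular message in {msg.group}"
    if Service.TRANSITION in s:
        # Only signals that a configuration change is under way; the new
        # member list arrives with the following regular membership.
        return f"transitional membership for {msg.group}"
    if Service.REG_MEMB in s:
        causes = [
            (Service.CAUSED_BY_NETWORK, "network"),
            (Service.CAUSED_BY_JOIN, "join"),
            (Service.CAUSED_BY_LEAVE, "leave"),
            (Service.CAUSED_BY_DISCONNECT, "disconnect"),
        ]
        cause = next((name for flag, name in causes if flag in s), "unknown")
        return f"regular membership for {msg.group} caused by {cause}: {msg.members}"
    return "unrecognized message"


print(describe(Message(Service.REG_MEMB | Service.CAUSED_BY_NETWORK,
                       "grp", ["A1", "A2"])))
# → regular membership for grp caused by network: ['A1', 'A2']
```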

If you think you observe an actual problem using spuser with your setup, 
you ought to forward the spuser output to this group.
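Ryan's check further down in this thread can be applied mechanically to what spuser prints. He expects everyone's new membership list to contain A1 ... C2, with A1 and A2 in one VS set and B1 ... C2 in the other. A minimal sketch follows. It assumes you have already copied the member list and the VS sets out of the membership message into plain Python lists. `check_membership` is a hypothetical helper, not part of Spread:

```python
from typing import Iterable, List, Sequence


def check_membership(members: Sequence[str],
                     vs_sets: Sequence[Iterable[str]],
                     expected_vs_sets: Sequence[Iterable[str]]) -> List[str]:
    """Hypothetical helper: compare a received membership with the partition
    it should reflect. Returns a list of problems (empty means it looks right)."""
    problems = []
    # The full member list should be the union of the expected VS sets.
    expected_members = set().union(*(set(s) for s in expected_vs_sets))
    if set(members) != expected_members:
        problems.append(f"members {sorted(members)} != {sorted(expected_members)}")
    # Compare VS sets as an unordered collection of unordered name sets.
    got = sorted(sorted(s) for s in vs_sets)
    want = sorted(sorted(s) for s in expected_vs_sets)
    if got != want:
        problems.append(f"VS sets {got} != {want}")
    return problems


everyone = ["A1", "A2", "B1", "B2", "C1", "C2"]
split = [["A1", "A2"], ["B1", "B2", "C1", "C2"]]
print(check_membership(everyone, split, split))       # → []
print(check_membership(everyone, [everyone], split))  # one "VS sets" problem
```

A non-empty result on any client is the kind of spuser output worth forwarding to the list.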

/jr
---
Ryan Caudy wrote:
> In order to better assess the issue, you might want to create logs of
> what's going on with membership information printed.  See the sample
> spread.conf distributed with Spread, or the documentation at
> spread.org for more information on how to do this.
> 
> It sounds to me like A is starting the membership change, but failing
> to complete it, even though B and C are able to start the change and
> install a new membership in the same period of time.  During the
> execution of the membership algorithm, new client messages are blocked
> until completion.
> 
> Also, please check the regular membership messages received at clients
> on each side of the partition you're creating.  Everyone should be
> receiving A1 ... C2 in their new membership lists, but A1 and A2
> should be in one of the VS sets, and B1 ... C2 should be in the other.
> If this is not the case then either my assumptions/understanding are
> wrong or there's incorrect behavior at the daemon.
> 
> I think the most likely scenario at this point is that there's a
> networking issue preventing A from completing the membership algorithm
> and installing its new (solo) configuration.
> 
> Cheers,
> Ryan
> 
> On 8/8/07, Steve Duff <Steve.Duff at vivista.sungard.com> wrote:
>> I'm evaluating Spread 4.0 as a possible candidate for a group
>> communication task, and I've noticed an unexpected behaviour (well,
>> unexpected to me, anyway).
>>
>> I have three machines A, B, and C each running a daemon, and with two
>> local clients (i.e. on the same machine), A1, A2, B1 etc. all subscribed
>> to the same group.
>>
>> If I pull the network cable out of machine A, daemons B and C notice
>> this and tell B1, B2, C1 and C2 of the membership change. A1 and A2 get
>> no membership change.
>>
>> As expected, messages can be exchanged between clients on B and C, but
>> when messages are sent through A1 they are not delivered anywhere (even
>> to A2). If I then reconnect A, then clients on A, B, and C all get
>> membership change messages, and any messages sent from A1 while
>> disconnected are now delivered to all clients.
>>
>> My expectation was that daemon A would eventually tell A1 and A2 that
>> they had been separated from the rest of the group, but this doesn't
>> happen. This makes the membership change message that they DO receive on
>> being reconnected seem spurious. Also the messages that are delivered on
>> reconnect are out of context and would cause problems when they are
>> delivered.
>>
>> I tried this initially on three virtual machines, but have confirmed the
>> behaviour is the same with real machines. I've tried using a single
>> Spread segment, and three separate Spread segments, but this also makes
>> no difference. I've also tried using two switches and separating A from
>> B and C by disconnecting the switches from each other - in that case A
>> works entirely as expected.
>>
>> I think the possibilities are:
>>
>> A) My expectation that A should separate and notify its clients in this
>> circumstance is simply wrong, in which case could somebody please
>> explain why.
>> B) This is a "feature" caused by Windows XP networking, i.e. it doesn't
>> happen on other platforms.
>> C) It is a feature of the Spread implementation, and I need to work
>> around it at the application level.
>>
>> I would appreciate any help anyone can offer with this.
>>
>>
>> Thanks
>> Steve
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
>>

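Ryan's remark above that new client messages are blocked until the membership algorithm completes would also account for what Steve saw on A. Nothing from A1 was delivered while A was stuck in the change, and the held messages only appeared once a membership was installed on reconnect. What follows is a toy model of just that blocking behaviour, in Python. `ToyDaemon` and its methods are hypothetical and do not reflect how the Spread daemon is actually implemented:

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyDaemon:
    """Toy model (not Spread's implementation): client messages submitted
    while a membership change is running are held until it completes."""

    def __init__(self, members: List[str]):
        self.members = list(members)
        self.changing = False
        self.pending: Deque[str] = deque()
        # (membership the message was delivered in, message) pairs
        self.delivered: List[Tuple[Tuple[str, ...], str]] = []

    def start_membership_change(self) -> None:
        self.changing = True

    def send(self, msg: str) -> None:
        if self.changing:
            self.pending.append(msg)  # blocked: nothing is delivered yet
        else:
            self.delivered.append((tuple(self.members), msg))

    def install_membership(self, members: List[str]) -> None:
        self.members = list(members)
        self.changing = False
        # Release the held messages, in order, in the newly installed membership.
        while self.pending:
            self.delivered.append((tuple(self.members), self.pending.popleft()))


everyone = ["A1", "A2", "B1", "B2", "C1", "C2"]
a = ToyDaemon(everyone)
a.start_membership_change()      # cable pulled; the change never completes
a.send("from A1")
print(a.delivered)               # → []
a.install_membership(everyone)   # cable reconnected; merged membership installed
print(a.delivered)               # → [(('A1', 'A2', 'B1', 'B2', 'C1', 'C2'), 'from A1')]
```

In this model, A1 and A2 get neither a membership notification nor each other's messages until A completes the algorithm and installs a configuration. That matches Ryan's suspicion that something is preventing A from installing its new (solo) configuration.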


