[Spread-users] Spread 4.0 daemon "goes to sleep" on XP network disconnect

Steve Duff Steve.Duff at vivista.sungard.com
Wed Aug 8 09:17:39 EDT 2007


Thanks for the quick reply. I've turned on Membership logging and get
the output below on A (called XP-One). In retesting this, I've noticed
that when reconnected, clients on A receive a membership change saying
that A1 and A2 are the only group members, immediately followed by
another saying the group has reformed. I hadn't noticed this in my last
test, but it seems to support your theory about a stalled membership
change. Log follows:


Handle_form2 in FORM
Memb_handle_token: handling form2 token
Handle_form2 in EVS
Memb_transitional
Memb_regular
Membership id is ( -1062731765, 1186576335)
--------------------
Configuration at XP-One is:
Num Segments 3
	1	127.0.0.1         4803
		XP-One              	192.168.0.11    
	1	127.0.0.1         4803
		XP-Two              	192.168.0.12    
	1	127.0.0.1         4803
		XP-Three            	192.168.0.13    
====================

...(XP-One is disconnected here)...

Memb_token_loss: I lost my token, state is 1
Scast_alive: State is 2
Scast_alive: State is 2
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Scast_alive: State is 2
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Scast_alive: State is 2
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4
Send_join: State is 4

...(repeats continually until XP-One is reconnected, then)...

Memb_handle_token: handling form2 token
Handle_form2 in FORM
Memb_transitional
Memb_regular
Membership id is ( -1062731765, 1186576433)
--------------------
Configuration at XP-One is:
Num Segments 3
	1	127.0.0.1         4803
		XP-One              	192.168.0.11    
	0	127.0.0.1         4803
	0	127.0.0.1         4803
====================
Memb_handle_message: handling join message from 192.168.0.12, State is 1
Handle_join in OP
Send_join: State is 4
Memb_handle_message: handling join message from 192.168.0.12, State is 4
Send_join: State is 4
Memb_handle_message: handling join message from 192.168.0.12, State is 4
Send_join: State is 4
Memb_handle_message: handling join message from 192.168.0.12, State is 4
Send_join: State is 4
Memb_handle_message: handling join message from 192.168.0.12, State is 4
Send_join: State is 4
Memb_handle_message: handling join message from 192.168.0.12, State is 4
Memb_handle_token: handling form2 token
Handle_form2 in FORM
Memb_handle_token: handling form2 token
Handle_form2 in EVS
Memb_transitional
Memb_regular
Membership id is ( -1062731765, 1186576462)
--------------------
Configuration at XP-One is:
Num Segments 3
	1	127.0.0.1         4803
		XP-One              	192.168.0.11    
	1	127.0.0.1         4803
		XP-Two              	192.168.0.12    
	1	127.0.0.1         4803
		XP-Three            	192.168.0.13    
====================
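
For anyone who wants to compare logs like this across daemons, here is a rough Python sketch that condenses the output above into a count of token losses plus the list of installed configurations. It is not part of Spread. The function name `summarize_memb_log` and the regular expressions are my own assumptions, based only on the line formats visible in this log, and may need adjusting for other versions or debug settings.

```python
import re

# Patterns inferred from the log lines in this post only (an assumption,
# not a documented Spread log format).
MEMB_ID = re.compile(r"^Membership id is \(\s*(-?\d+),\s*(\d+)\)")
CONFIG_START = re.compile(r"^Configuration at (\S+) is:")
# Daemon lines look like "<indent>XP-One <whitespace> 192.168.0.11".
# Segment lines ("1  127.0.0.1  4803") do not match because of the port.
DAEMON = re.compile(r"^\s+(\S+)\s+(\d+\.\d+\.\d+\.\d+)\s*$")


def summarize_memb_log(log_text):
    """Return (token_loss_count, views) for a membership debug log.

    Each view is a dict with the membership id pair and the daemon
    names listed in the configuration block that follows it.
    """
    token_losses = 0
    views = []
    pending_id = None   # last "Membership id" seen, not yet attached
    current = None      # configuration block currently being read
    for line in log_text.splitlines():
        if line.startswith("Memb_token_loss:"):
            token_losses += 1
            continue
        m = MEMB_ID.match(line)
        if m:
            pending_id = (int(m.group(1)), int(m.group(2)))
            continue
        if CONFIG_START.match(line):
            current = {"id": pending_id, "members": []}
            continue
        if current is not None:
            if line.startswith("===="):
                # End of the configuration block.
                views.append(current)
                current = None
                continue
            m = DAEMON.match(line)
            if m:
                current["members"].append(m.group(1))
    return token_losses, views


# Shortened excerpt of the log above, used only as sample input.
SAMPLE = (
    "Membership id is ( -1062731765, 1186576335)\n"
    "--------------------\n"
    "Configuration at XP-One is:\n"
    "Num Segments 3\n"
    "\t1\t127.0.0.1         4803\n"
    "\t\tXP-One              \t192.168.0.11    \n"
    "\t1\t127.0.0.1         4803\n"
    "\t\tXP-Two              \t192.168.0.12    \n"
    "\t1\t127.0.0.1         4803\n"
    "\t\tXP-Three            \t192.168.0.13    \n"
    "====================\n"
    "Memb_token_loss: I lost my token, state is 1\n"
    "Send_join: State is 4\n"
    "Membership id is ( -1062731765, 1186576433)\n"
    "--------------------\n"
    "Configuration at XP-One is:\n"
    "Num Segments 3\n"
    "\t1\t127.0.0.1         4803\n"
    "\t\tXP-One              \t192.168.0.11    \n"
    "\t0\t127.0.0.1         4803\n"
    "\t0\t127.0.0.1         4803\n"
    "====================\n"
)

if __name__ == "__main__":
    losses, views = summarize_memb_log(SAMPLE)
    print("token losses:", losses)
    for view in views:
        print(view["id"], view["members"])
```

Run against the full log, this should make it easy to see that the solo configuration (XP-One only) is installed only after reconnection, shortly before the full three-daemon configuration is reinstalled.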



Is it a bug?

Thanks
Steve





-----Original Message-----
From: Ryan Caudy [mailto:rcaudy at gmail.com] 
Sent: 08 August 2007 12:13
To: Steve Duff
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] Spread 4.0 daemon "goes to sleep" on XP
network disconnect

In order to better assess the issue, you might want to create logs of
what's going on with membership information printed.  See the sample
spread.conf distributed with Spread, or the documentation at
spread.org for more information on how to do this.

It sounds to me like A is starting the membership change, but failing
to complete it, even though B and C are able to start the change and
install a new membership in the same period of time.  During the
execution of the membership algorithm, new client messages are blocked
until completion.

Also, please check the regular membership messages received at clients
on each side of the partition you're creating.  Everyone should be
receiving A1 ... C2 in their new membership lists, but A1 and A2
should be in one of the VS sets, and B1 ... C2 should be in the other.
If this is not the case then either my assumptions/understanding are
wrong or there's incorrect behavior at the daemon.

I think the most likely scenario at this point is that there's a
networking issue preventing A from completing the membership algorithm
and installing its new (solo) configuration.

Cheers,
Ryan




