[Spread-users] Cluster locked up again

Yair Amir yairamir at cnds.jhu.edu
Sun Jan 13 13:40:25 EST 2002


Hi,

I actually do see a few problems, but I need two consecutive rounds
of spmonitor dumps to see the progress (you sent only one round, and I need
at least two). I also need your spread.conf.
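A single snapshot shows counter values but not whether they are moving, which is why two consecutive rounds help. As a rough illustration, the Python sketch below pulls the numeric "name : value" counters out of an spmonitor status block in the format quoted below, then lists which of a chosen set of counters did not advance between two snapshots. The helper names `parse_status` and `stalled_counters`, and the choice of counters to watch, are hypothetical and not part of Spread or spmonitor.

```python
import re

# One "name : number" pair, e.g. "My_aru   :  768294" or "Delta sec  :     -32".
PAIR = re.compile(r"([A-Za-z][A-Za-z_ ]*?)\s*:\s*(-?\d+)")

# Counters this sketch expects to keep growing on a healthy ring.
# This selection is an assumption for illustration, not taken from Spread docs.
PROGRESS_COUNTERS = ("rounds", "My_aru", "Aru", "Deliver M")


def parse_status(dump: str) -> dict[str, int]:
    """Return every numeric counter found in one spmonitor status block."""
    counters: dict[str, int] = {}
    for line in dump.splitlines():
        for name, value in PAIR.findall(line):
            counters[name.strip()] = int(value)
    return counters


def stalled_counters(first: str, second: str) -> list[str]:
    """List the watched counters that did not increase from `first` to `second`."""
    before, after = parse_status(first), parse_status(second)
    return [
        name
        for name in PROGRESS_COUNTERS
        if name in before and name in after and after[name] <= before[name]
    ]
```

For example, given the boba block quoted below and a later hypothetical block in which only `rounds` has grown, `stalled_counters` would return `["My_aru", "Aru", "Deliver M"]`.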

That said, I can immediately tell you that the last guy in your spread.conf
that runs a daemon loses about 25% of the messages. I think that you have
a problematic network connection on that guy (the last one listed in the
membership). This issue is not the main problem here, though.

There is no problem running a daemon on a system that nobody connects to.

Whether or not you drain the mailboxes has no effect on Spread itself.
What happens is that if a program does not read its mailbox and the mailbox
gets full, then after 1000 more messages for that program, Spread simply disconnects
it (the rest of the group members get a membership event). The group_membership
change you refer to is good, but it will not solve the current problem.
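The overflow rule above can be pictured with a small toy model. This is a sketch of the behavior as described in this message, not Spread's implementation: the class `ClientMailbox`, its `capacity`, the separate backlog queue, and the exact point at which the cutoff fires (when the backlog reaches 1000) are all assumptions made for illustration.

```python
from collections import deque


class ClientMailbox:
    """Toy model of the overflow rule (hypothetical, not Spread's source).

    `capacity` is a made-up mailbox size. Once the mailbox is full, further
    messages pile up in a backlog; when the backlog reaches `max_backlog`
    (1000, per the description above), the client is disconnected.
    """

    def __init__(self, capacity: int, max_backlog: int = 1000):
        self.capacity = capacity
        self.max_backlog = max_backlog
        self.mailbox: deque = deque()
        self.backlog: deque = deque()
        self.connected = True

    def deliver(self, msg) -> bool:
        """Queue one message for the client; return False once it is disconnected."""
        if not self.connected:
            return False
        if len(self.mailbox) < self.capacity:
            self.mailbox.append(msg)
            return True
        self.backlog.append(msg)
        if len(self.backlog) >= self.max_backlog:
            # Cut the slow reader off and drop its queued messages; the other
            # group members would then see a membership change for it.
            self.connected = False
            self.mailbox.clear()
            self.backlog.clear()
            return False
        return True

    def receive(self):
        """Client reads one message (None if empty); the backlog refills the mailbox."""
        if not self.mailbox:
            return None
        msg = self.mailbox.popleft()
        if self.backlog:
            self.mailbox.append(self.backlog.popleft())
        return msg
```

With `capacity=10`, the first 1009 deliveries to a client that never reads succeed and the next one disconnects it, whereas a client that reads after every delivery never builds a backlog at all.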

	Cheers,

	:) Yair.

Tom Mornini wrote:
> 
> Well, making certain to drain the mailboxes didn't prevent Spread from
> hanging again.
> 
> I've already implemented the connect level group_membership change, but
> that code hasn't been pushed yet.
> 
> Here's an spmonitor dump. Does anything seem out of order? The huge
> retrans number on obi (the group leader) looks rather suspicious...
> 
> Question: Is there any problem with running a spread daemon on a system
> that nobody is connecting to? We run jabba as a hot spare, but nothing
> is actually running there. I notice that its recv pack number is twice
> as high as the next highest...and that doesn't seem right!
> 
> ============================
> Status at boba V 3.16. 1 (state 1, gstate 1) after 248709 seconds :
> Membership  :  5  procs in 1 segments, leader is obi
> rounds   : 76806053     tok_hurry :  353432     memb change:       1
> sent pack:       2      recv pack :  752333     retrans    :     226
> u retrans:       9      s retrans :     217     b retrans  :       0
> My_aru   :  768294      Aru       :  766693     Highest seq:  768294
> Sessions :       3      Groups    :       3     Window     :      60
> Deliver M:  767663      Deliver Pk:  768294     Pers Window:      15
> Delta Mes:       0      Delta Pack:       0     Delta sec  :      11
> ==================================
> 
> Monitor>
> ============================
> Status at lando V 3.16. 1 (state 1, gstate 1) after 248741 seconds :
> Membership  :  5  procs in 1 segments, leader is obi
> rounds   : 76810910     tok_hurry :  353473     memb change:       2
> sent pack:  364288      recv pack :  387839     retrans    :     510
> u retrans:      12      s retrans :     498     b retrans  :       0
> My_aru   :  768294      Aru       :  766693     Highest seq:  768294
> Sessions :      84      Groups    :       3     Window     :      60
> Deliver M:  767737      Deliver Pk:  768384     Pers Window:      15
> Delta Mes:      74      Delta Pack:       0     Delta sec  :      32
> ==================================
> 
> Monitor>
> ============================
> Status at greedo V 3.16. 1 (state 1, gstate 1) after 248736 seconds :
> Membership  :  5  procs in 1 segments, leader is obi
> rounds   : 76810909     tok_hurry :  353473     memb change:       2
> sent pack:  387600      recv pack :  365029     retrans    :       0
> u retrans:       0      s retrans :       0     b retrans  :       0
> My_aru   :  768294      Aru       :  766693     Highest seq:  768294
> Sessions :      84      Groups    :       3     Window     :      60
> Deliver M:  767737      Deliver Pk:  768384     Pers Window:      15
> Delta Mes:       0      Delta Pack:       0     Delta sec  :      -5
> ==================================
> 
> Monitor>
> ============================
> Status at jabba V 3.16. 1 (state 1, gstate 1) after 248742 seconds :
> Membership  :  5  procs in 1 segments, leader is obi
> rounds   : 76810909     tok_hurry :  353473     memb change:       2
> sent pack:      18      recv pack : 1711493     retrans    :       4
> u retrans:       4      s retrans :       0     b retrans  :       0
> My_aru   :  768294      Aru       :  766693     Highest seq:  768294
> Sessions :       0      Groups    :       3     Window     :      60
> Deliver M:  767737      Deliver Pk:  768384     Pers Window:      15
> Delta Mes:       0      Delta Pack:       0     Delta sec  :       6
> ==================================
> 
> Monitor>
> ============================
> Status at obi V 3.16. 1 (state 1, gstate 1) after 248710 seconds :
> Membership  :  5  procs in 1 segments, leader is obi
> rounds   : 76806052     tok_hurry :  354419     memb change:       1
> sent pack:       2      recv pack :  752558     retrans    :  206268
> u retrans:  206268      s retrans :       0     b retrans  :       0
> My_aru   :  768294      Aru       :  766693     Highest seq:  768294
> Sessions :       3      Groups    :       3     Window     :      60
> Deliver M:  767663      Deliver Pk:  768294     Pers Window:      15
> Delta Mes:     -74      Delta Pack:       0     Delta sec  :     -32
> ==================================
> 
> --
> -- Tom Mornini
> -- eWingz Systems, Inc.
> --
> -- ICQ: 113526784, AOL: tmornini, Yahoo: tmornini, MSN: tmornini
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
