[Spread-users] Issue with Spread going silent
Luke Marsden
luke-lists at hybrid-logic.co.uk
Sat Nov 6 17:49:10 EDT 2010
Hi all,
I've got a very strange issue with Spread going "silent" (not even a
self-join message with spuser "j foo") after adding a third node to a
network of two.
The problem does not occur if all three Spread daemons are launched
simultaneously. It only happens if I launch two nodes, wait a few
seconds (until they've announced the group memberships) and then add the
third node.
Here is the spread config (everything else is stock 4.1.0):
Spread_Segment 178.22.65.249:4803 {
f497c15415a34ba8 178.22.65.249
}
Spread_Segment 178.22.65.74:4803 {
2f8919e6ea14416a 178.22.65.74
}
Spread_Segment 178.22.67.120:4803 {
a816c9ebce424d8b 178.22.67.120
}
For some background, these nodes are running on cloud infrastructure in
the same data centre but without a local broadcast address, hence the
three distinct Spread segments.
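To make the one-daemon-per-segment layout above easier to reproduce, here is a minimal sketch in Python that renders the same stanzas from a list of nodes. The names `NODES` and `render_segments` are hypothetical helpers invented for this illustration; they are not part of Spread, and the output only mirrors the stanza shape shown in the config above.

```python
# Hypothetical helper (not part of Spread): render one Spread_Segment
# stanza per daemon, as used when no shared broadcast address exists.

# (daemon name, IP address) pairs taken from the config above.
NODES = [
    ("f497c15415a34ba8", "178.22.65.249"),
    ("2f8919e6ea14416a", "178.22.65.74"),
    ("a816c9ebce424d8b", "178.22.67.120"),
]


def render_segments(nodes, port=4803):
    """Return config text with a separate single-daemon segment per node."""
    stanzas = []
    for name, ip in nodes:
        # Each segment is addressed by the node's own IP, so every
        # daemon sits alone in its segment.
        stanzas.append(f"Spread_Segment {ip}:{port} {{\n{name} {ip}\n}}")
    return "\n".join(stanzas)


if __name__ == "__main__":
    print(render_segments(NODES))
```

Generating the file from a single node list keeps the segment definitions identical on every machine, which rules out a mismatched config as a cause.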
And here's the output of spmonitor with only the first two nodes
connected (working):
============================
Status at 2f8919e6ea14416a V 4.01. 0 (state 1, gstate 1) after 88 seconds :
Membership : 2 procs in 2 segments, leader is f497c15415a34ba8
rounds : 988 tok_hurry : 225 memb change: 1
sent pack: 136 recv pack : 136 retrans : 0
u retrans: 0 s retrans : 0 b retrans : 0
My_aru : 299 Aru : 299 Highest seq: 299
Sessions : 1 Groups : 1 Window : 60
Deliver M: 295 Deliver Pk: 299 Pers Window: 15
Delta Mes: 32 Delta Pack: 32 Delta sec : 5
==================================
Monitor>
============================
Status at f497c15415a34ba8 V 4.01. 0 (state 1, gstate 1) after 93 seconds :
Membership : 2 procs in 2 segments, leader is f497c15415a34ba8
rounds : 988 tok_hurry : 238 memb change: 1
sent pack: 136 recv pack : 136 retrans : 0
u retrans: 0 s retrans : 0 b retrans : 0
My_aru : 299 Aru : 299 Highest seq: 299
Sessions : 1 Groups : 1 Window : 60
Deliver M: 295 Deliver Pk: 299 Pers Window: 15
Delta Mes: 0 Delta Pack: 0 Delta sec : 5
==================================
Then when I start spread on the third node, Bad Things Happen:
Monitor> Monitor: send status query
============================
Status at 2f8919e6ea14416a V 4.01. 0 (state 1, gstate 1) after 128 seconds :
Membership : 2 procs in 2 segments, leader is f497c15415a34ba8
rounds : 1465 tok_hurry : 335 memb change: 1
sent pack: 199 recv pack : 199 retrans : 0
u retrans: 0 s retrans : 0 b retrans : 0
My_aru : 426 Aru : 426 Highest seq: 426
Sessions : 1 Groups : 1 Window : 60
Deliver M: 422 Deliver Pk: 426 Pers Window: 15
Delta Mes: 32 Delta Pack: 32 Delta sec : 5
==================================
Monitor>
============================
Status at f497c15415a34ba8 V 4.01. 0 (state 4, gstate 1) after 133 seconds :
Membership : 2 procs in 2 segments, leader is f497c15415a34ba8
rounds : 1465 tok_hurry : 357 memb change: 1
sent pack: 199 recv pack : 199 retrans : 0
u retrans: 0 s retrans : 0 b retrans : 0
My_aru : 426 Aru : 426 Highest seq: 426
Sessions : 1 Groups : 1 Window : 60
Deliver M: 422 Deliver Pk: 426 Pers Window: 15
Delta Mes: 0 Delta Pack: 0 Delta sec : 5
==================================
Monitor>
============================
Status at a816c9ebce424d8b V 4.01. 0 (state 4, gstate 1) after 2 seconds :
Membership : 0 procs in 0 segments, leader is 0
rounds : 0 tok_hurry : 0 memb change: 0
sent pack: 0 recv pack : 0 retrans : 0
u retrans: 0 s retrans : 0 b retrans : 0
My_aru : 0 Aru : 0 Highest seq: 0
Sessions : 1 Groups : 0 Window : 60
Deliver M: 0 Deliver Pk: 0 Pers Window: 15
Delta Mes: -422 Delta Pack: -426 Delta sec : -131
==================================
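To compare reports like the ones above side by side, a rough Python sketch could pull the daemon name, state numbers and membership view out of each status block. The function names `parse_status` and `views_disagree` are hypothetical, the regular expressions are only fitted to the output pasted in this message, and the script does not try to interpret what the state numbers mean.

```python
import re

# Patterns fitted to the spmonitor output shown in this message only.
STATUS_RE = re.compile(
    r"Status at (\S+) V [^(]*\(state (\d+), gstate (\d+)\) after\s+(\d+)\s+seconds"
)
MEMB_RE = re.compile(
    r"Membership\s*:\s*(\d+) procs in (\d+) segments, leader is (\S+)"
)


def parse_status(text):
    """Return one dict per status block found in the pasted monitor output."""
    reports = []
    for m in STATUS_RE.finditer(text):
        report = {
            "daemon": m.group(1),
            "state": int(m.group(2)),
            "gstate": int(m.group(3)),
            "uptime": int(m.group(4)),
        }
        # The membership line follows the status header of the same block.
        mm = MEMB_RE.search(text, m.end())
        if mm:
            report["procs"] = int(mm.group(1))
            report["segments"] = int(mm.group(2))
            report["leader"] = mm.group(3)
        reports.append(report)
    return reports


def views_disagree(reports):
    """True if the daemons report different membership sizes or leaders."""
    views = {(r.get("procs"), r.get("leader")) for r in reports}
    return len(views) > 1
```

Fed the three reports taken after the third node starts, `views_disagree` would return True, because a816c9ebce424d8b reports 0 procs while the other two still report 2.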
After the issue occurs, spuser will no longer connect to Spread on any
node:
hybrid@f497c15415a34ba8:~$ spuser
Spread library version is 4.1.0
recv_nointr_timeout: Timed out
SP_error: (-8) Connection closed by spread
Any insight would be very much appreciated, as we're about to launch a
major product which relies on this!
The environment is FreeBSD 8.1 with Spread 4.1.0 on CloudSigma (Linux
KVM) infrastructure. I can provide detailed log output; please tell me
which flags you would like.
--
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.
Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting
Mobile: +447791750420