[Spread-users] Issue with Spread going silent

Goran Hasse gorhas at gmail.com
Sun Nov 7 08:15:11 EST 2010


Hi

Have you checked all the kernel parameters on FreeBSD to see if
any of them need tuning?


sysctl -a | grep ...
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 42080
...
net.inet.ip.intr_queue_drops: 0
net.inet.icmp.drop_redirect: 0
net.inet.tcp.drop_synfin: 0
net.isr.drop: 0

There are a lot of them that could influence the network traffic.
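
One way to follow up on this is to save the sysctl -a output from each node and look for drop-related entries that are not zero. Below is a minimal Python sketch that parses the "name: value" lines shown above and returns the net.* entries whose names contain "drop" and whose value is a nonzero integer. The name filter is an assumption of this sketch, not a documented FreeBSD list. Some matches (net.inet.tcp.drop_synfin, net.inet.icmp.drop_redirect) are on/off policy switches rather than counters, so a 1 there does not by itself mean packets were lost. The function names are hypothetical.

```python
import re

# Hypothetical heuristic: any net.* entry whose name contains "drop" is
# worth a look. This is not an official list of FreeBSD counters.
DROP_NAME = re.compile(r"drop", re.IGNORECASE)


def parse_sysctl(text):
    """Parse FreeBSD-style 'name: value' lines into a {name: value} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.startswith("net."):
            values[name.strip()] = value.strip()
    return values


def nonzero_drop_entries(values):
    """Return {name: int} for drop-related entries with a nonzero integer value."""
    flagged = {}
    for name, value in values.items():
        if not DROP_NAME.search(name):
            continue
        try:
            number = int(value)
        except ValueError:
            continue  # skip values that are not plain integers
        if number != 0:
            flagged[name] = number
    return flagged


# Values copied from the sysctl output quoted above.
SAMPLE = """\
net.inet.udp.checksum: 1
net.inet.udp.maxdgram: 9216
net.inet.udp.recvspace: 42080
net.inet.ip.intr_queue_drops: 0
net.inet.icmp.drop_redirect: 0
net.inet.tcp.drop_synfin: 0
net.isr.drop: 0
"""

print(nonzero_drop_entries(parse_sysctl(SAMPLE)))  # prints {}
```

With the values quoted above every drop entry is 0, so nothing is flagged. On a live FreeBSD host the text could come from subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout. Linux's sysctl prints "name = value" instead, which this parser would skip.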

GH


2010/11/7 Luke Marsden <luke-lists at hybrid-logic.co.uk>:
> Hi all,
>
> To pin this down as a potential FreeBSD 8.1 issue, I have now
> demonstrated that Spread 4.1.0 works fine on Debian 5.0 in the same
> network infrastructure (with multiple Spread segments, one for each
> public IP).
>
> I will now test it on FreeBSD 8.0 to see if it was some change in
> FreeBSD 8.1 which is subtly interacting with Spread to cause this
> "self-destruct-on-new-join" behaviour.
>
> If so, would anyone be able to help me create a patch to Spread which
> fixes it? I know precious little about Spread's internal protocol.
>
> Full verbosity failure logs coming later, which we can hopefully compare
> to the successful Debian run to figure out where it's going wrong!
>
> --
> Best Regards,
> Luke Marsden
> CTO, Hybrid Logic Ltd.
>
> Web: http://www.hybrid-cluster.com/
> Hybrid Web Cluster - cloud web hosting
>
> Mobile: +447791750420
>
>
>
> On Sat, 2010-11-06 at 18:12 -0400, Yair Amir wrote:
>> Hi,
>>
>> It is possibly a connectivity issue between the different computers. This
>> means that it may not be possible to send and receive a packet from each
>> computer to each other computer. You can check this by building the spsend
>> and sprecv programs and running them to verify whether this hypothesis is
>> correct.
>>
>> If you let the monitor run for another 40-50 seconds beyond what you
>> sent (for a few more reports), that would help.
>>
>> Cheers,
>>
>>       :) Yair.
>>
>> On 11/6/10 5:49 PM, Luke Marsden wrote:
>> > Hi all,
>> >
>> > I've got a very strange issue with Spread going "silent" (not even a
>> > self-join message with spuser "j foo") after adding a third node to a
>> > network of two.
>> >
>> > The problem does not occur if all three Spread daemons are launched
>> > simultaneously. It only happens if I launch two nodes, wait a few
>> > seconds (until they've announced the group memberships) and then add the
>> > third node.
>> >
>> > Here is the spread config (everything else is stock 4.1.0):
>> >
>> >         Spread_Segment 178.22.65.249:4803 {
>> >             f497c15415a34ba8 178.22.65.249
>> >         }
>> >         Spread_Segment 178.22.65.74:4803 {
>> >             2f8919e6ea14416a 178.22.65.74
>> >         }
>> >         Spread_Segment 178.22.67.120:4803 {
>> >             a816c9ebce424d8b 178.22.67.120
>> >         }
>> >
>> > For some background, these nodes are running on cloud infrastructure in
>> > the same data centre but without a local broadcast address, hence the
>> > three distinct Spread segments.
>> >
>> > And here's the output of spmonitor with the first two nodes
>> > connected (working):
>> >
>> > ============================
>> > Status at 2f8919e6ea14416a V 4.01. 0 (state 1, gstate 1) after 88 seconds :
>> > Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
>> > rounds   :     988      tok_hurry :     225     memb change:       1
>> > sent pack:     136      recv pack :     136     retrans    :       0
>> > u retrans:       0      s retrans :       0     b retrans  :       0
>> > My_aru   :     299      Aru       :     299     Highest seq:     299
>> > Sessions :       1      Groups    :       1     Window     :      60
>> > Deliver M:     295      Deliver Pk:     299     Pers Window:      15
>> > Delta Mes:      32      Delta Pack:      32     Delta sec  :       5
>> > ==================================
>> >
>> > Monitor>
>> > ============================
>> > Status at f497c15415a34ba8 V 4.01. 0 (state 1, gstate 1) after 93 seconds :
>> > Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
>> > rounds   :     988      tok_hurry :     238     memb change:       1
>> > sent pack:     136      recv pack :     136     retrans    :       0
>> > u retrans:       0      s retrans :       0     b retrans  :       0
>> > My_aru   :     299      Aru       :     299     Highest seq:     299
>> > Sessions :       1      Groups    :       1     Window     :      60
>> > Deliver M:     295      Deliver Pk:     299     Pers Window:      15
>> > Delta Mes:       0      Delta Pack:       0     Delta sec  :       5
>> > ==================================
>> >
>> > Then when I start spread on the third node, Bad Things Happen:
>> >
>> > Monitor> Monitor: send status query
>> >
>> > ============================
>> > Status at 2f8919e6ea14416a V 4.01. 0 (state 1, gstate 1) after 128 seconds :
>> > Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
>> > rounds   :    1465      tok_hurry :     335     memb change:       1
>> > sent pack:     199      recv pack :     199     retrans    :       0
>> > u retrans:       0      s retrans :       0     b retrans  :       0
>> > My_aru   :     426      Aru       :     426     Highest seq:     426
>> > Sessions :       1      Groups    :       1     Window     :      60
>> > Deliver M:     422      Deliver Pk:     426     Pers Window:      15
>> > Delta Mes:      32      Delta Pack:      32     Delta sec  :       5
>> > ==================================
>> >
>> > Monitor>
>> > ============================
>> > Status at f497c15415a34ba8 V 4.01. 0 (state 4, gstate 1) after 133 seconds :
>> > Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
>> > rounds   :    1465      tok_hurry :     357     memb change:       1
>> > sent pack:     199      recv pack :     199     retrans    :       0
>> > u retrans:       0      s retrans :       0     b retrans  :       0
>> > My_aru   :     426      Aru       :     426     Highest seq:     426
>> > Sessions :       1      Groups    :       1     Window     :      60
>> > Deliver M:     422      Deliver Pk:     426     Pers Window:      15
>> > Delta Mes:       0      Delta Pack:       0     Delta sec  :       5
>> > ==================================
>> >
>> > Monitor>
>> > ============================
>> > Status at a816c9ebce424d8b V 4.01. 0 (state 4, gstate 1) after 2 seconds :
>> > Membership  :  0  procs in 0 segments, leader is 0
>> > rounds   :       0      tok_hurry :       0     memb change:       0
>> > sent pack:       0      recv pack :       0     retrans    :       0
>> > u retrans:       0      s retrans :       0     b retrans  :       0
>> > My_aru   :       0      Aru       :       0     Highest seq:       0
>> > Sessions :       1      Groups    :       0     Window     :      60
>> > Deliver M:       0      Deliver Pk:       0     Pers Window:      15
>> > Delta Mes:    -422      Delta Pack:    -426     Delta sec  :    -131
>> > ==================================
>> >
>> > After the issue occurs, spuser will no longer connect to Spread on any
>> > node:
>> >
>> > hybrid at f497c15415a34ba8:~$ spuser
>> > Spread library version is 4.1.0
>> > recv_nointr_timeout: Timed out
>> > SP_error: (-8) Connection closed by spread
>> >
>> > Any insight would be very much appreciated, as we're about to launch a
>> > major product which relies on this!
>> >
>> > The environment is FreeBSD 8.1 with Spread 4.1.0 on CloudSigma (Linux
>> > KVM) infrastructure. I can provide detailed log output; please tell me
>> > which flags you would like.
>> >
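
To compare snapshots like the ones above, it may help to pull the header fields out of each spmonitor status report. The following Python sketch assumes raw spmonitor output with the mail quote prefixes removed. The regular expressions are derived only from the report layout quoted above, and the function names are hypothetical. It does not interpret the numeric state codes; it only checks whether all daemons report the same state, member count and leader.

```python
import re

# One spmonitor status header; the \s+ gaps also accept the line break a
# mail client may insert between "after N" and "seconds".
STATUS_RE = re.compile(
    r"Status at (?P<node>\S+) V [\d.\s]+?"
    r"\(state (?P<state>\d+), gstate (?P<gstate>\d+)\)"
    r"\s+after\s+(?P<secs>\d+)\s+seconds"
)
MEMBERSHIP_RE = re.compile(
    r"Membership\s*:\s*(?P<procs>\d+)\s+procs in (?P<segs>\d+) segments,"
    r"\s*leader is (?P<leader>\S+)"
)


def parse_reports(text):
    """Return one dict per status report found in raw spmonitor output."""
    reports = []
    for header in STATUS_RE.finditer(text):
        # Assumes the Membership line follows its header, as in the output above.
        memb = MEMBERSHIP_RE.search(text, header.end())
        if memb is None:
            continue
        reports.append({
            "node": header["node"],
            "state": int(header["state"]),
            "gstate": int(header["gstate"]),
            "seconds": int(header["secs"]),
            "procs": int(memb["procs"]),
            "segments": int(memb["segs"]),
            "leader": memb["leader"],
        })
    return reports


def views_agree(reports):
    """True if every daemon reports the same state, member count and leader."""
    return len({(r["state"], r["procs"], r["leader"]) for r in reports}) <= 1


# The two reports from the working two-node run quoted above (trimmed).
WORKING = """\
Status at 2f8919e6ea14416a V 4.01. 0 (state 1, gstate 1) after 88
seconds :
Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
Status at f497c15415a34ba8 V 4.01. 0 (state 1, gstate 1) after 93
seconds :
Membership  :  2  procs in 2 segments, leader is f497c15415a34ba8
"""

print(views_agree(parse_reports(WORKING)))  # prints True
```

Applied to the three reports taken after the third daemon started, views_agree would return False: f497c15415a34ba8 and a816c9ebce424d8b report state 4 while 2f8919e6ea14416a still reports state 1, and the new node reports 0 procs.
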
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
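
Yair's connectivity hypothesis can also be checked independently of Spread. The following is a minimal Python sketch of an all-pairs UDP probe, not spsend or sprecv: every node opens a listener, sends one datagram to each configured IP, and then lists the peers it never heard from. The port (PROBE_PORT = 4810) and the function names are hypothetical; the port was picked only to avoid clashing with a daemon already bound to 4803, so any firewall rules would need to allow it the same way they allow Spread's traffic.

```python
import socket

PROBE_PORT = 4810  # hypothetical test port, chosen to avoid Spread's 4803


def open_listener(port=PROBE_PORT, host="0.0.0.0"):
    """Bind a UDP socket that will receive probes from the other nodes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_probe(peer_ip, port=PROBE_PORT, sender="unknown"):
    """Send one small datagram identifying the sender to peer_ip:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"probe from {sender}".encode(), (peer_ip, port))


def collect_probes(sock, timeout=5.0):
    """Return {source_ip: payload}; stop once nothing arrives for `timeout` seconds."""
    sock.settimeout(timeout)
    heard = {}
    try:
        while True:
            data, (src_ip, _src_port) = sock.recvfrom(2048)
            heard[src_ip] = data.decode(errors="replace")
    except socket.timeout:
        pass
    return heard


def missing_peers(heard, peers):
    """Configured peers whose probe never reached this node."""
    return sorted(set(peers) - set(heard))


# Usage on each node (IPs taken from the Spread_Segment lines in the thread):
#   peers = ["178.22.65.249", "178.22.65.74", "178.22.67.120"]
#   listener = open_listener()
#   for ip in peers:
#       send_probe(ip, sender=socket.gethostname())
#   heard = collect_probes(listener, timeout=10.0)
#   print(missing_peers(heard, peers))
```

Start the listeners on all three nodes before anyone sends, since probes that arrive before a listener is bound are simply lost, and repeat the run a few times because UDP gives no delivery guarantee. If the provider rewrites source addresses, the IP seen by the receiver may not match the one in the Spread configuration, which would itself be worth knowing.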
>



-- 
gorhas at gmail.com
Göran Hasse
Boo 229
715 91  ODENSBACKEN
Mob: 070-5530148



