[Spread-users] Strange crashing of spread server Gstate 3 (now readable)

Pilling, Michael Michael.Pilling at dsto.defence.gov.au
Wed Jun 25 22:43:45 EDT 2008


I apologise for the weird email some of you might have received; it
seems the list server interpreted the spread logs as binary.
 
Here is my original message:
 
Hello, I'm wondering if anyone has suggestions about why one of my 10
servers crashes out with a Gstate 3 alarm.
 
The config file is
::::::
Spread_Segment  192.168.5.255:4803 {
    act-server 192.168.5.121
    nsw-server      192.168.5.239 #como-mac2 @ Pilling's desk
    nt-server 192.168.5.221
    qld-server      192.168.5.64
    tas-server      192.168.5.222
    vic-server 192.168.5.190
    wa-server 192.168.5.220
    c2d-phlogiston 192.168.5.96
    c2d-turner1 192.168.5.14
    will-server 192.168.5.36
 
    nz-server 192.168.5.54
}
 
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
    sa-server     192.168.10.3 #thunderbird @ pilling's Desk
}
 
#Spread_Segment 192.168.41.255:4803 {
#Matthew's Machine goes up and down causes problems at the moment
#        nz-server     192.168.41.71
#}
DangerousMonitor = true
EventLogFile = spreadlog_%h.out
EventTimeStamp
::::::
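 
To make the segment layout in this config easier to check, here is a
minimal Python sketch that lists the servers declared in each
Spread_Segment block. It is a simplified reader written only for this
illustration, not Spread's own config parser; the name parse_segments
is hypothetical, and it assumes that "#" starts a comment and that each
member line is "name address".

```python
import re

CONFIG = """\
Spread_Segment  192.168.5.255:4803 {
    act-server 192.168.5.121
    nsw-server      192.168.5.239 #como-mac2 @ Pilling's desk
    nt-server 192.168.5.221
    qld-server      192.168.5.64
    tas-server      192.168.5.222
    vic-server 192.168.5.190
    wa-server 192.168.5.220
    c2d-phlogiston 192.168.5.96
    c2d-turner1 192.168.5.14
    will-server 192.168.5.36

    nz-server 192.168.5.54
}

Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
    sa-server     192.168.10.3 #thunderbird @ pilling's Desk
}

#Spread_Segment 192.168.41.255:4803 {
#        nz-server     192.168.41.71
#}
DangerousMonitor = true
"""


def parse_segments(text):
    """Return {segment address: {server name: IP}} (hypothetical helper)."""
    segments = {}
    current = None
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        header = re.match(r"Spread_Segment\s+(\S+)\s*\{", line)
        if header:
            current = header.group(1)
            segments[current] = {}
        elif line == "}":
            current = None
        elif current is not None:
            name, addr = line.split()[:2]
            segments[current][name] = addr
        # Lines outside any segment (global options) are ignored here.
    return segments


if __name__ == "__main__":
    for address, members in parse_segments(CONFIG).items():
        print(address, len(members), sorted(members))
```

The first segment declares 11 names, two of which are the inactive c2d
machines; the commented-out third segment is skipped.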
 
Only nsw-server and sa-server are physical machines; all the others are
VMs. Neither of the c2d machines is still active, so they never join
the network.
 
The sequence I go through to test our application is:
1. All servers start spread.
2. All servers start a logging application that attaches to the local
   spread server.
3. The nsw and sa servers each start a chat room client.
4. I send a few messages.
5. Using sptmonitor.exe on nsw-server, I partition the network into
   {ACT,NSW,QLD,NT},{NZ},{WILL,WA,SA,VIC,TAS}
6. I send a few chat messages from NSW and SA.
7. Using sptmonitor on nsw-server, I repartition to
   {ACT,NSW,QLD,NT},{NZ,WILL,WA,SA,VIC,TAS}
8. I send a few more chat messages from both machines.
9. Then I cancel the monitor-induced partition.
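 
The two partitions in this sequence can be written down as sets. The
Python sketch below only models them (it does not talk to Spread or to
the monitor), and the names SERVERS, STEP1, STEP2, is_partition and
component_of are hypothetical. It checks that each step covers all ten
active servers exactly once and shows which component each chat client
sits in.

```python
# The ten active servers, using the short names from the partition lists.
SERVERS = {"ACT", "NSW", "NT", "QLD", "TAS", "VIC", "WA", "WILL", "NZ", "SA"}

# First partition applied with the monitor.
STEP1 = [{"ACT", "NSW", "QLD", "NT"}, {"NZ"}, {"WILL", "WA", "SA", "VIC", "TAS"}]

# Second partition: NZ is merged into the component that contains SA.
STEP2 = [{"ACT", "NSW", "QLD", "NT"}, {"NZ", "WILL", "WA", "SA", "VIC", "TAS"}]


def is_partition(components, universe):
    """True if the components are disjoint and together cover the universe."""
    seen = set()
    for component in components:
        if seen & component:
            return False  # a server appears in two components
        seen |= component
    return seen == universe


def component_of(server, components):
    """Index of the component containing the server, or None if absent."""
    for index, component in enumerate(components):
        if server in component:
            return index
    return None


if __name__ == "__main__":
    for label, step in (("step 1", STEP1), ("step 2", STEP2)):
        print(label, "valid:", is_partition(step, SERVERS),
              "NSW in", component_of("NSW", step),
              "SA in", component_of("SA", step))
```

In both steps the two chat clients (NSW and SA) are on opposite sides of
the partition, so every message sent during the partitioned phases has to
be reconciled when the partition is cancelled.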
 
At this point, the chat application on nsw-server shows that the network
has fully healed, and then within a few seconds it detects that the SA
server is unreachable. The relevant segment of the spread daemon log on
nsw-server is:
 
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] --------------------
[Wed 25 Jun 2008 14:39:27] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27]       9          192.168.5.255     4803
[Wed 25 Jun 2008 14:39:27]                   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:27]                   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:27]                   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:27]                   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:27]                   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:27]                   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:27]                   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:27]                   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:27]                   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:27]       1          192.168.10.255    4803
[Wed 25 Jun 2008 14:39:27]                   sa-server            192.168.10.3
[Wed 25 Jun 2008 14:39:27] ====================
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] --------------------
[Wed 25 Jun 2008 14:39:39] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39]       9          192.168.5.255     4803
[Wed 25 Jun 2008 14:39:39]                   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:39]                   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:39]                   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:39]                   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:39]                   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:39]                   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:39]                   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:39]                   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:39]                   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:39]       0          192.168.10.255    4803
 
The end of the spreadlog for SA server is
 
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:26] --------------------
[Wed 25 Jun 2008 14:39:26] Configuration at sa-server is:
[Wed 25 Jun 2008 14:39:26] Num Segments 2
[Wed 25 Jun 2008 14:39:26]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:26]   act-server           192.168.5.121   
[Wed 25 Jun 2008 14:39:26]   nsw-server           192.168.5.239   
[Wed 25 Jun 2008 14:39:26]   nt-server            192.168.5.221   
[Wed 25 Jun 2008 14:39:26]   qld-server           192.168.5.64    
[Wed 25 Jun 2008 14:39:26]   tas-server           192.168.5.222   
[Wed 25 Jun 2008 14:39:26]   vic-server           192.168.5.190   
[Wed 25 Jun 2008 14:39:26]   wa-server            192.168.5.220   
[Wed 25 Jun 2008 14:39:26]   will-server          192.168.5.36    
[Wed 25 Jun 2008 14:39:26]   nz-server            192.168.5.54    
[Wed 25 Jun 2008 14:39:26]  1 192.168.10.255    4803
[Wed 25 Jun 2008 14:39:26]   sa-server            192.168.10.3    
[Wed 25 Jun 2008 14:39:26] ====================
[Wed 25 Jun 2008 14:39:26] G_analize_groups: Gstate is 3
Exit caused by Alarm(EXIT)
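 
To compare membership dumps like these mechanically, here is a minimal
Python sketch that extracts, for every "Configuration at ..." block, the
number printed in front of each segment and the servers listed under it.
It assumes each log entry sits on one line, and it reads the leading
number as the count of daemons currently present in that segment, which
is an interpretation of the log rather than something taken from the
Spread documentation. The name parse_dumps is hypothetical; the sample
data is the NSW excerpt with whitespace normalised.

```python
import re

NSW_LOG = """\
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] --------------------
[Wed 25 Jun 2008 14:39:27] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:27]   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:27]   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:27]   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:27]   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:27]   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:27]   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:27]   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:27]   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:27]   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:27]  1 192.168.10.255    4803
[Wed 25 Jun 2008 14:39:27]   sa-server            192.168.10.3
[Wed 25 Jun 2008 14:39:27] ====================
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] --------------------
[Wed 25 Jun 2008 14:39:39] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:39]   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:39]   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:39]   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:39]   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:39]   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:39]   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:39]   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:39]   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:39]   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:39]  0 192.168.10.255    4803
"""

TIMESTAMP = re.compile(r"^\[[^\]]*\]\s*")
IPV4 = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def parse_dumps(log_text):
    """Return one {segment: {"count": n, "members": [...]}} dict per dump."""
    dumps = []
    current = None   # dump being filled in
    segment = None   # segment address the following member lines belong to
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if line.startswith("Configuration at"):
            current = {}
            dumps.append(current)
            segment = None
            continue
        if current is None:
            continue
        parts = line.split()
        if len(parts) == 3 and parts[0].isdigit() and IPV4.match(parts[1]):
            # Segment header: count, broadcast address, port.
            segment = parts[1]
            current[segment] = {"count": int(parts[0]), "members": []}
        elif len(parts) == 2 and IPV4.match(parts[1]) and segment is not None:
            # Member line: server name, IP address.
            current[segment]["members"].append(parts[0])
    return dumps


if __name__ == "__main__":
    for number, dump in enumerate(parse_dumps(NSW_LOG), start=1):
        for address, info in dump.items():
            print(number, address, info["count"], info["members"])
```

On the NSW excerpt this yields two dumps: sa-server is listed under
192.168.10.255 in the first, and that segment is empty in the second,
twelve seconds later.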
 
I've included the full SA log as an attachment to my previous post
with almost the same subject. The attachment worked but the email 
body didn't.
 
This error has occurred on multiple occasions, but did not occur when I
tried the same exercise with only {NSW,NZ,SA} as the active services.
 
I'd be grateful for any suggestions. Also, what does Gstate 3 mean? I
take it that it is a point in the spread membership state machine, but
where is it up to logically?
 
Regards,
 
Michael



