[Spread-users] Strange crashing of spread server Gstate 3

Pilling, Michael Michael.Pilling at dsto.defence.gov.au
Wed Jun 25 03:41:59 EDT 2008


Hello, I'm wondering if anyone has suggestions about why one of my 10
servers crashes out with a Gstate 3 alarm.
 
The config file is
<<<<<<<
Spread_Segment  192.168.5.255:4803 {
    act-server 192.168.5.121
    nsw-server      192.168.5.239 #como-mac2 @ Pilling's desk
    nt-server 192.168.5.221
    qld-server      192.168.5.64
    tas-server      192.168.5.222
    vic-server 192.168.5.190
    wa-server 192.168.5.220
    c2d-phlogiston 192.168.5.96
    c2d-turner1 192.168.5.14
    will-server 192.168.5.36
 
    nz-server 192.168.5.54
}
 
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
    sa-server     192.168.10.3 #thunderbird @ pilling's Desk
}
 
#Spread_Segment 192.168.41.255:4803 {
#Matthew's machine goes up and down, which causes problems at the moment
#        nz-server     192.168.41.71
#}
DangerousMonitor = true
EventLogFile = spreadlog_%h.out
EventTimeStamp
>>>>>>
 
Only nsw-server and sa-server are physical machines; all the others are VMs.
Neither of the c2d machines is still active, so they never join the
network.
 
The sequence I go through to test our application is:
1. All servers start Spread.
2. All servers start a logging application that attaches to the local Spread
   server.
3. The nsw and sa servers start a chat room client.
4. I send a few messages.
5. Using sptmonitor.exe on nsw-server, I partition the network into
   {ACT,NSW,QLD,NT},{NZ},{WILL,WA,SA,VIC,TAS}.
6. I send a few chat messages from NSW and SA.
7. Using sptmonitor on nsw-server, I repartition to
   {ACT,NSW,QLD,NT},{NZ,WILL,WA,SA,VIC,TAS}.
8. I send a few more chat messages from both machines.
9. Then I cancel the monitor-induced partition.
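As a sanity check on the partition steps in the test sequence, here is a small Python sketch. It is not part of Spread, and every name in it (ACTIVE, STEP_5, STEP_7, is_partition, merged_between) is hypothetical. It assumes the active daemons are the ten named in the config minus the two c2d machines, confirms that each monitor-induced partition puts every active daemon in exactly one component, and reports which components merge at the repartition step.

```python
# Hypothetical check of the monitor-induced partitions in the test sequence.
# Daemon names are abbreviated from the config file; the two c2d machines
# are left out because they never join.
ACTIVE = {"act", "nsw", "nt", "qld", "tas", "vic", "wa", "will", "nz", "sa"}

# Step 5: first partition.  Step 7: repartition.
STEP_5 = [{"act", "nsw", "qld", "nt"}, {"nz"}, {"will", "wa", "sa", "vic", "tas"}]
STEP_7 = [{"act", "nsw", "qld", "nt"}, {"nz", "will", "wa", "sa", "vic", "tas"}]


def is_partition(components, universe):
    """True if the components are pairwise disjoint and their union is universe."""
    seen = set()
    for comp in components:
        if seen & comp:  # some daemon appears in two components
            return False
        seen |= comp
    return seen == universe


def component_of(daemon, components):
    """Index of the component that contains daemon."""
    for i, comp in enumerate(components):
        if daemon in comp:
            return i
    raise KeyError(daemon)


def merged_between(before, after):
    """Components of `after` formed by joining two or more `before` components."""
    merged = []
    for comp in after:
        sources = {component_of(d, before) for d in comp}
        if len(sources) > 1:
            merged.append(comp)
    return merged


if __name__ == "__main__":
    print(is_partition(STEP_5, ACTIVE), is_partition(STEP_7, ACTIVE))
    # prints: True True
    print(sorted(sorted(c) for c in merged_between(STEP_5, STEP_7)))
    # prints: [['nz', 'sa', 'tas', 'vic', 'wa', 'will']]
```

Read this way, step 7 only merges {NZ} with the component containing SA; the {ACT,NSW,QLD,NT} component stays separate until the partition is cancelled.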
 
At this point, the chat application on NSW server shows that the network has
fully healed, and then within a few seconds it detects that SA server is
unreachable. The corresponding segment of the Spread daemon log on NSW server is:
 
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] --------------------
[Wed 25 Jun 2008 14:39:27] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:27]   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:27]   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:27]   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:27]   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:27]   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:27]   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:27]   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:27]   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:27]   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:27]  1 192.168.10.255    4803
[Wed 25 Jun 2008 14:39:27]   sa-server            192.168.10.3
[Wed 25 Jun 2008 14:39:27] ====================
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] --------------------
[Wed 25 Jun 2008 14:39:39] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:39]   act-server           192.168.5.121
[Wed 25 Jun 2008 14:39:39]   nsw-server           192.168.5.239
[Wed 25 Jun 2008 14:39:39]   nt-server            192.168.5.221
[Wed 25 Jun 2008 14:39:39]   qld-server           192.168.5.64
[Wed 25 Jun 2008 14:39:39]   tas-server           192.168.5.222
[Wed 25 Jun 2008 14:39:39]   vic-server           192.168.5.190
[Wed 25 Jun 2008 14:39:39]   wa-server            192.168.5.220
[Wed 25 Jun 2008 14:39:39]   will-server          192.168.5.36
[Wed 25 Jun 2008 14:39:39]   nz-server            192.168.5.54
[Wed 25 Jun 2008 14:39:39]  0 192.168.10.255    4803
 

The end of the spreadlog for SA server is
 
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:26] --------------------
[Wed 25 Jun 2008 14:39:26] Configuration at sa-server is:
[Wed 25 Jun 2008 14:39:26] Num Segments 2
[Wed 25 Jun 2008 14:39:26]  9 192.168.5.255     4803
[Wed 25 Jun 2008 14:39:26]   act-server           192.168.5.121   
[Wed 25 Jun 2008 14:39:26]   nsw-server           192.168.5.239   
[Wed 25 Jun 2008 14:39:26]   nt-server            192.168.5.221   
[Wed 25 Jun 2008 14:39:26]   qld-server           192.168.5.64    
[Wed 25 Jun 2008 14:39:26]   tas-server           192.168.5.222   
[Wed 25 Jun 2008 14:39:26]   vic-server           192.168.5.190   
[Wed 25 Jun 2008 14:39:26]   wa-server            192.168.5.220   
[Wed 25 Jun 2008 14:39:26]   will-server          192.168.5.36    
[Wed 25 Jun 2008 14:39:26]   nz-server            192.168.5.54    
[Wed 25 Jun 2008 14:39:26]  1 192.168.10.255    4803
[Wed 25 Jun 2008 14:39:26]   sa-server            192.168.10.3    
[Wed 25 Jun 2008 14:39:26] ====================
[Wed 25 Jun 2008 14:39:26] G_analize_groups: Gstate is 3
Exit caused by Alarm(EXIT)

I've included the full SA log as an attachment.

This error has occurred on multiple occasions, but it did not occur when I
tried the same exercise with only {NSW,NZ,SA} as the active servers.

I'd be grateful for any suggestions. Also, what does Gstate 3 mean? I take
it that it is a point in the Spread membership state machine.
 
Regards,
 
Michael
 
---- 
Dr Michael Pilling 
C3I Division 
DSTO 
PO Box 1500 
Edinburgh SA 5111 
Phone: 08 8259 7017 Fax: 08 8259 5589 
email: Michael.Pilling at dsto.defence.gov.au 


IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the
CRIMES ACT 1914. If you have received this email in error, you are
requested to contact the sender and delete the email.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: anon_spreadlog_thunderbird
Type: application/octet-stream
Size: 31635 bytes
Desc: anon_spreadlog_thunderbird
Url : http://lists.spread.org/pipermail/spread-users/attachments/20080625/7e5a9f9d/attachment.obj 

