[Spread-users] Strange crashing of spread server Gstate 3
Pilling, Michael
Michael.Pilling at dsto.defence.gov.au
Wed Jun 25 03:41:59 EDT 2008
Hello, I'm wondering if anyone has suggestions about why one of my 10
servers crashes out with a Gstate 3 alarm.
The config file is
<<<<<<<
Spread_Segment 192.168.5.255:4803 {
act-server 192.168.5.121
nsw-server 192.168.5.239 #como-mac2 @ Pilling's desk
nt-server 192.168.5.221
qld-server 192.168.5.64
tas-server 192.168.5.222
vic-server 192.168.5.190
wa-server 192.168.5.220
c2d-phlogiston 192.168.5.96
c2d-turner1 192.168.5.14
will-server 192.168.5.36
nz-server 192.168.5.54
}
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
sa-server 192.168.10.3 #thunderbird @ pilling's Desk
}
#Spread_Segment 192.168.41.255:4803 {
#Matthew's Machine goes up and down causes problems at the moment
# nz-server 192.168.41.71
#}
DangerousMonitor = true
EventLogFile = spreadlog_%h.out
EventTimeStamp
>>>>>>
Only nsw-server and sa-server are physical machines; all others are VMs.
Neither of the c2d machines is still active, so they never join the
network.
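As a cross-check on the segment layout, here is a minimal Python sketch
that parses the Spread_Segment blocks quoted above and counts the daemons
configured in each active segment. The parser (parse_segments and its
regular expression) is my own illustration, not part of Spread, and it
assumes the simple "name address" line format used in this config. It
reports 11 configured daemons in the 192.168.5.255 segment; the daemon
logs list only 9 there, which is consistent with the two c2d machines
never joining.

```python
import re

# Segment blocks copied from the config quoted in the message.
CONFIG = """
Spread_Segment 192.168.5.255:4803 {
act-server 192.168.5.121
nsw-server 192.168.5.239 #como-mac2 @ Pilling's desk
nt-server 192.168.5.221
qld-server 192.168.5.64
tas-server 192.168.5.222
vic-server 192.168.5.190
wa-server 192.168.5.220
c2d-phlogiston 192.168.5.96
c2d-turner1 192.168.5.14
will-server 192.168.5.36
nz-server 192.168.5.54
}
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
sa-server 192.168.10.3 #thunderbird @ pilling's Desk
}
#Spread_Segment 192.168.41.255:4803 {
# nz-server 192.168.41.71
#}
"""

# Hypothetical helper pattern: "Spread_Segment <broadcast>:<port> {"
SEGMENT_RE = re.compile(r"^Spread_Segment\s+(\S+):(\d+)\s*\{$")


def parse_segments(text):
    """Return {(broadcast_addr, port): {daemon_name: ip}} for active segments."""
    segments = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = SEGMENT_RE.match(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            current = segments.setdefault(key, {})
        elif line == "}":
            current = None
        elif current is not None:
            name, ip = line.split()
            current[name] = ip
    return segments


segments = parse_segments(CONFIG)
for (addr, port), daemons in segments.items():
    print(f"{addr}:{port} -> {len(daemons)} daemons configured")
```

The commented-out 192.168.41.255 segment is skipped because every one of
its lines starts with "#".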
The sequence I go through to test our application is:
1. All servers start spread.
2. All servers start a logging application that attaches to the local
   spread server.
3. The nsw and sa servers start a chat room client.
4. I send a few messages.
5. Using sptmonitor.exe on nsw-server, I partition the network into
   {ACT,NSW,QLD,NT},{NZ},{WILL,WA,SA,VIC,TAS}
6. I send a few chat messages from NSW and SA.
7. Using sptmonitor on nsw-server, I repartition to
   {ACT,NSW,QLD,NT},{NZ,WILL,WA,SA,VIC,TAS}
8. I send a few more chat messages from both machines.
9. Then I cancel the monitor-induced partition.
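To make the partition sequence above easier to follow, here is a small
Python sketch that models each phase as a list of components and reports
whether NSW and SA should be able to see each other. It only models the
intended connectivity; it does not talk to Spread or to the monitor. The
names SERVERS, PHASES, check_partition and component_of are illustrative
only, and the c2d machines are left out because they never join.

```python
# Short names as used in the partition descriptions in the message.
SERVERS = {"ACT", "NSW", "NT", "QLD", "TAS", "VIC", "WA", "WILL", "NZ", "SA"}

# Each phase is a list of components (sets of servers that can talk).
PHASES = [
    ("initial", [set(SERVERS)]),
    ("first partition", [
        {"ACT", "NSW", "QLD", "NT"},
        {"NZ"},
        {"WILL", "WA", "SA", "VIC", "TAS"},
    ]),
    ("second partition", [
        {"ACT", "NSW", "QLD", "NT"},
        {"NZ", "WILL", "WA", "SA", "VIC", "TAS"},
    ]),
    ("healed", [set(SERVERS)]),
]


def check_partition(components):
    """Raise ValueError unless every server appears in exactly one component."""
    seen = set()
    for comp in components:
        overlap = seen & comp
        if overlap:
            raise ValueError(f"servers in more than one component: {overlap}")
        seen |= comp
    if seen != SERVERS:
        raise ValueError(f"servers missing from partition: {SERVERS - seen}")


def component_of(server, components):
    """Return the component that contains the given server."""
    for comp in components:
        if server in comp:
            return comp
    raise KeyError(server)


connectivity = {}
for name, components in PHASES:
    check_partition(components)
    connectivity[name] = "SA" in component_of("NSW", components)
    print(f"{name}: NSW and SA connected = {connectivity[name]}")
```

In this model NSW and SA are separated during both partitions and should
be reconnected once the partition is cancelled, which is the point at
which the crash is observed.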
At this point, the chat application on the NSW server shows that the
network has fully healed, and then within a few seconds it detects that
the SA server is unreachable. The relevant segment of the spread daemon
log at the NSW server is
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] --------------------
[Wed 25 Jun 2008 14:39:27] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:27] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:27] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:27] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:27] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:27] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:27] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:27] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:27] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:27] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:27] 1 192.168.10.255 4803
[Wed 25 Jun 2008 14:39:27] sa-server
192.168.10.3
[Wed 25 Jun 2008 14:39:27] ====================
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] --------------------
[Wed 25 Jun 2008 14:39:39] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:39] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:39] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:39] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:39] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:39] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:39] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:39] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:39] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:39] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:39] 0 192.168.10.255 4803
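The change between the two membership dumps in the NSW log above is easy
to miss by eye, so here is a rough Python sketch that pulls the
per-segment header lines out of the log text. It assumes that the number
before each broadcast address is the count of daemons currently in that
segment's membership; that reading is inferred from the listings (9
daemons listed under "9 192.168.5.255 4803"), not taken from the Spread
documentation. The regular expression and the segment_counts helper are
mine.

```python
import re

# Segment header lines copied from the two dumps in the nsw-server log.
LOG = """\
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:27] 1 192.168.10.255 4803
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:39] 0 192.168.10.255 4803
"""

# Matches "[timestamp] <count> <broadcast address> <port>" lines only.
HEADER_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<count>\d+)\s+"
    r"(?P<addr>\d+\.\d+\.\d+\.\d+)\s+(?P<port>\d+)\s*$"
)


def segment_counts(log_text):
    """Return {timestamp: {segment_addr: member_count}} in log order."""
    views = {}
    for line in log_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            per_segment = views.setdefault(match.group("ts"), {})
            per_segment[match.group("addr")] = int(match.group("count"))
    return views


views = segment_counts(LOG)
for timestamp, counts in views.items():
    print(timestamp, counts)
```

On these lines the 192.168.10.255 segment goes from 1 member at 14:39:27
to 0 members at 14:39:39, i.e. NSW drops sa-server about 12 seconds after
the merged view was installed.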
The end of the spreadlog for SA server is
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:26] --------------------
[Wed 25 Jun 2008 14:39:26] Configuration at sa-server is:
[Wed 25 Jun 2008 14:39:26] Num Segments 2
[Wed 25 Jun 2008 14:39:26] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:26] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:26] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:26] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:26] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:26] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:26] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:26] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:26] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:26] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:26] 1 192.168.10.255 4803
[Wed 25 Jun 2008 14:39:26] sa-server 192.168.10.3
[Wed 25 Jun 2008 14:39:26] ====================
[Wed 25 Jun 2008 14:39:26] G_analize_groups: Gstate is 3
Exit caused by Alarm(EXIT)
I've included the full SA log as an attachment.
This error has occurred on multiple occasions, but did not occur when I
tried the same exercise with only {NSW,NZ,SA} as the active services.
I'd be grateful for any suggestions. Also, what does Gstate 3 mean? I
take it that it is a point in the spread membership state machine.
Regards,
Michael
----
Dr Michael Pilling
C3I Division
DSTO
PO Box 1500
Edinburgh SA 5111
Phone: 08 8259 7017 Fax: 08 8259 5589
email: Michael.Pilling at dsto.defence.gov.au
IMPORTANT: This email remains the property of the Australian Defence
Organisation and is subject to the jurisdiction of section 70 of the
CRIMES ACT 1914. If you have received this email in error, you are
requested to contact the sender and delete the email.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: anon_spreadlog_thunderbird
Type: application/octet-stream
Size: 31635 bytes
Desc: anon_spreadlog_thunderbird
Url : http://lists.spread.org/pipermail/spread-users/attachments/20080625/7e5a9f9d/attachment.obj