[Spread-users] Strange crashing of spread server Gstate 3 (now readable)
Pilling, Michael
Michael.Pilling at dsto.defence.gov.au
Wed Jun 25 22:43:45 EDT 2008
I apologise for the weird email some of you might have received; it
seems the list server interpreted the spread logs as binary.
Here is my original message:
Hello, I'm wondering whether anyone has suggestions about one of my 10
servers crashing out with a Gstate 3 alarm.
The config file is
::::::
Spread_Segment 192.168.5.255:4803 {
act-server 192.168.5.121
nsw-server 192.168.5.239 #como-mac2 @ Pilling's desk
nt-server 192.168.5.221
qld-server 192.168.5.64
tas-server 192.168.5.222
vic-server 192.168.5.190
wa-server 192.168.5.220
c2d-phlogiston 192.168.5.96
c2d-turner1 192.168.5.14
will-server 192.168.5.36
nz-server 192.168.5.54
}
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
sa-server 192.168.10.3 #thunderbird @ pilling's Desk
}
#Spread_Segment 192.168.41.255:4803 {
#Matthew's Machine goes up and down causes problems at the moment
# nz-server 192.168.41.71
#}
DangerousMonitor = true
EventLogFile = spreadlog_%h.out
EventTimeStamp
::::::
Only nsw-server and sa-server are physical machines; all the others are VMs.
Neither of the c2d machines is active any longer, so they never join the
network.
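A quick sanity check on the configuration above, written as a minimal Python sketch. The helper name parse_segments is made up for illustration and is not part of Spread, and the sketch assumes that everything after a "#" on a line is a comment. It counts the daemons listed per segment; leaving out the two inactive c2d machines gives 9 + 1 = 10 active daemons.

```python
import re

# The segment definitions from the config file above, copied verbatim.
CONFIG = """\
Spread_Segment 192.168.5.255:4803 {
act-server 192.168.5.121
nsw-server 192.168.5.239 #como-mac2 @ Pilling's desk
nt-server 192.168.5.221
qld-server 192.168.5.64
tas-server 192.168.5.222
vic-server 192.168.5.190
wa-server 192.168.5.220
c2d-phlogiston 192.168.5.96
c2d-turner1 192.168.5.14
will-server 192.168.5.36
nz-server 192.168.5.54
}
Spread_Segment 192.168.10.255:4803 {
#thunderbird acts as sa-server
sa-server 192.168.10.3 #thunderbird @ pilling's Desk
}
#Spread_Segment 192.168.41.255:4803 {
#Matthew's Machine goes up and down causes problems at the moment
# nz-server 192.168.41.71
#}
DangerousMonitor = true
EventLogFile = spreadlog_%h.out
EventTimeStamp
"""


def parse_segments(text):
    """Return {segment_address: [(name, ip), ...]}, ignoring '#' comments.

    Hypothetical helper for illustration only; it is not Spread's own parser.
    """
    segments = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        header = re.match(r"Spread_Segment\s+(\S+)\s*\{", line)
        if header:
            current = header.group(1)
            segments[current] = []
        elif line == "}":
            current = None
        elif current is not None:
            name, ip = line.split()[:2]
            segments[current].append((name, ip))
    return segments


segments = parse_segments(CONFIG)
for address, daemons in segments.items():
    # Count daemons that are expected to join (the c2d machines never do).
    active = [name for name, _ in daemons if not name.startswith("c2d-")]
    print(address, "configured:", len(daemons), "active:", len(active))
```

The commented-out 192.168.41.255 segment is skipped entirely, so nz-server is counted once, in the 192.168.5.255 segment.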
The sequence I go through to test our application is:
1. All servers start spread.
2. All servers start a logging application that attaches to the local
   spread server.
3. The nsw and sa servers each start a chat room client.
4. I send a few messages.
5. Using sptmonitor.exe on nsw-server, I partition the network into
   {ACT,NSW,QLD,NT},{NZ},{WILL,WA,SA,VIC,TAS}
6. I send a few chat messages from NSW and SA.
7. Using sptmonitor on nsw-server, I repartition to
   {ACT,NSW,QLD,NT},{NZ,WILL,WA,SA,VIC,TAS}
8. I send a few more chat messages from both machines.
9. Then I cancel the monitor-induced partition.
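To make the expected reachability at each stage explicit, here is a minimal Python sketch that models the three partition stages as sets of server names. It is only a model of the test sequence; it does not talk to Spread or to the monitor, and the names STAGES, is_valid_partition and can_reach are made up for illustration.

```python
# The ten active servers, using the short names from the partition lists.
ACTIVE = {"ACT", "NSW", "NT", "QLD", "TAS", "VIC", "WA", "WILL", "NZ", "SA"}

# The partitions applied during the test, in order.
STAGES = [
    [{"ACT", "NSW", "QLD", "NT"}, {"NZ"}, {"WILL", "WA", "SA", "VIC", "TAS"}],
    [{"ACT", "NSW", "QLD", "NT"}, {"NZ", "WILL", "WA", "SA", "VIC", "TAS"}],
    [set(ACTIVE)],  # monitor partition cancelled: one component again
]


def is_valid_partition(components, universe):
    """True if the components are disjoint and together cover the universe."""
    seen = set()
    for component in components:
        if seen & component:
            return False
        seen |= component
    return seen == universe


def can_reach(components, a, b):
    """True if servers a and b sit in the same component."""
    return any(a in component and b in component for component in components)


for number, components in enumerate(STAGES, start=1):
    print(
        "stage", number,
        "valid:", is_valid_partition(components, ACTIVE),
        "NSW<->SA:", can_reach(components, "NSW", "SA"),
    )
```

In this model NSW and SA are in different components for the first two stages and should only see each other again once the partition is cancelled.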
At this point, the chat application on NSW server shows the network has
fully healed, and then within a few seconds detects that SA server is
unreachable. The relevant segment of the spread daemon log at NSW server is:
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:45] Net_recv: Got monitor message, component 1
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:56] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] --------------------
[Wed 25 Jun 2008 14:39:27] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:27] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:27] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:27] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:27] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:27] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:27] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:27] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:27] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:27] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:27] 1 192.168.10.255 4803
[Wed 25 Jun 2008 14:39:27] sa-server 192.168.10.3
[Wed 25 Jun 2008 14:39:27] ====================
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] --------------------
[Wed 25 Jun 2008 14:39:39] Configuration at nsw-server is:
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:39] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:39] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:39] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:39] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:39] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:39] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:39] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:39] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:39] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:39] 0 192.168.10.255 4803
The end of the spreadlog for SA server is
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:44] Net_recv: Got monitor message, component 2
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
[Wed 25 Jun 2008 14:38:55] Net_recv: Got monitor message, component 0
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:26] --------------------
[Wed 25 Jun 2008 14:39:26] Configuration at sa-server is:
[Wed 25 Jun 2008 14:39:26] Num Segments 2
[Wed 25 Jun 2008 14:39:26] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:26] act-server 192.168.5.121
[Wed 25 Jun 2008 14:39:26] nsw-server 192.168.5.239
[Wed 25 Jun 2008 14:39:26] nt-server 192.168.5.221
[Wed 25 Jun 2008 14:39:26] qld-server 192.168.5.64
[Wed 25 Jun 2008 14:39:26] tas-server 192.168.5.222
[Wed 25 Jun 2008 14:39:26] vic-server 192.168.5.190
[Wed 25 Jun 2008 14:39:26] wa-server 192.168.5.220
[Wed 25 Jun 2008 14:39:26] will-server 192.168.5.36
[Wed 25 Jun 2008 14:39:26] nz-server 192.168.5.54
[Wed 25 Jun 2008 14:39:26] 1 192.168.10.255 4803
[Wed 25 Jun 2008 14:39:26] sa-server 192.168.10.3
[Wed 25 Jun 2008 14:39:26] ====================
[Wed 25 Jun 2008 14:39:26] G_analize_groups: Gstate is 3
Exit caused by Alarm(EXIT)
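To compare the membership views in the two logs without reading them line by line, a minimal Python sketch along the following lines can pull out each membership id and the daemon count per segment. The function name summarize is made up for illustration, the regular expressions assume the log layout shown in the excerpts above, and the sample text contains only the membership and segment-count lines from the NSW excerpt.

```python
import re

# Patterns inferred from the excerpts above; other Spread versions may differ.
STAMP = re.compile(r"^\[[^\]]*\]\s*")
SEGMENT = re.compile(r"^(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+)$")
MEMBERSHIP = re.compile(r"^Membership id is \(\s*(-?\d+),\s*(-?\d+)\)")

# Membership and segment-count lines copied from the NSW log excerpt.
NSW_LOG = """\
Membership id is ( -2085026439, 1214370570)
[Wed 25 Jun 2008 14:39:27] Num Segments 2
[Wed 25 Jun 2008 14:39:27] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:27] 1 192.168.10.255 4803
Membership id is ( -2085026439, 1214370582)
[Wed 25 Jun 2008 14:39:39] Num Segments 2
[Wed 25 Jun 2008 14:39:39] 9 192.168.5.255 4803
[Wed 25 Jun 2008 14:39:39] 0 192.168.10.255 4803
"""


def summarize(log_text):
    """Return one dict per membership view: its id and daemons per segment."""
    views = []
    for raw in log_text.splitlines():
        line = STAMP.sub("", raw.strip())  # drop the leading [timestamp]
        membership = MEMBERSHIP.match(line)
        if membership:
            ids = (int(membership.group(1)), int(membership.group(2)))
            views.append({"id": ids, "segments": {}})
            continue
        segment = SEGMENT.match(line)
        if segment and views:
            # "<count> <broadcast address> <port>" line for the current view
            views[-1]["segments"][segment.group(2)] = int(segment.group(1))
    return views


for view in summarize(NSW_LOG):
    print(view["id"], view["segments"])
```

On the NSW excerpt this reports two views: the first with one daemon in the 192.168.10.255 segment and the second with none, which is consistent with sa-server having exited in between.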
I've included the full SA log as an attachment to my previous post
with almost the same subject. The attachment worked but the email
body didn't.
This error has occurred on multiple occasions, but it did not occur when I
tried the same exercise with only {NSW,NZ,SA} as the active servers.
I'd be grateful for any suggestions. Also, what does Gstate 3 mean? I take
it to be a point in the spread membership state machine, but where is it
up to logically?
Regards,
Michael