[Spread-users] Spread Crash
Adrian Revill
adrian.revill at shazamteam.com
Sun Oct 5 12:04:07 EDT 2008
Hi Rodrick,
Here are the ends of two of the logs (from app31 and purple10).
Would changing the logging values help at all?
Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct 4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct 4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct 4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct 4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 app31 spread: it is a Form Token.
Oct 4 14:20:23 app31 spread: Memb_handle_token: handling form1 token
Oct 4 14:20:23 app31 spread: Handle_f

Oct 4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct 4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct 4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct 4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct 4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct 4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct 4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct 4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(116), packed message length(116)
Oct 4 14:20:22 purple10 spread: Memb_handle_message: handling join message from 172.20.0.134, State is 3
Oct 4 14:20:22 purple10 spread: Scast_alive: State is 3
Oct 4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct 4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct 4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct 4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct 4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct 4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct 4 14:20:23 purple10 spread: Net_recv: Received Packet - packet length(84), packed message
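
For anyone who wants to pull the same kind of excerpt from their own hosts, here is a minimal sketch in Python that keeps the last few spread entries per host from syslog-style lines and rejoins entries that were wrapped onto a second line. The function name tail_per_host and the regular expression are illustrative assumptions based only on the line format shown above; they are not part of Spread.

```python
import re
from collections import defaultdict, deque

# Matches the syslog prefix seen in the excerpts above, e.g.
# "Oct 4 14:20:23 app31 spread: Handle_alive in REPRESENTED"
# (assumed format; adjust if your syslog layout differs).
SYSLOG_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+spread:\s*(?P<msg>.*)$"
)


def tail_per_host(lines, count=10):
    """Return the last `count` spread entries per host as a dict of lists.

    Lines that do not start with a syslog prefix are treated as the
    wrapped continuation of the previous entry and appended to it.
    """
    tails = defaultdict(lambda: deque(maxlen=count))
    last_host = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = SYSLOG_RE.match(line)
        if match:
            last_host = match.group("host")
            entry = f"{match.group('stamp')} {match.group('msg')}"
            tails[last_host].append(entry)
        elif last_host is not None and line.strip():
            # Continuation of a wrapped entry: glue it onto the previous one.
            tails[last_host][-1] += " " + line.strip()
    return {host: list(entries) for host, entries in tails.items()}
```

It could be fed an open log file, for example tail_per_host(open(path), count=10), where path is wherever syslog writes the spread messages on your system.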
Rodrick Brown wrote:
> On Sat, Oct 4, 2008 at 12:00 PM, Adrian Revill
> <adrian.revill at shazamteam.com> wrote:
>
> Hi,
>
> We are running spread 4.00.00 on a single segment with Red Hat EL5,
> and have been experiencing a problem where, when one server crashes
> (at the OS level), the spread daemons on all the other servers die.
>
> We have enabled logging on spread but do not see any reason
> logged, probably because spread uses a buffered logger and the
> buffer is not being flushed.
> We are also not getting any core dumps.
>
> Has anyone else seen this problem, or have any idea how to find the fault?
>
> Also, is there a newer version of spread?
>
> Our configuration is:
>
> Spread_Segment 172.20.255.255
> {
> purple9 172.20.0.134
> purple10 172.20.0.135
> purple11 172.20.0.136
> purple12 172.20.0.137
> purple13 172.20.0.140
> purple14 172.20.0.144
> purple15 172.20.0.141
> purple16 172.20.0.143
> app11 172.20.0.130
> app12 172.20.0.131
> app13 172.20.0.142
> wombat14 172.20.0.145
> app15 172.20.0.132
> app16 172.20.0.133
> webportal11 172.20.0.138
> webportal12 172.20.0.139
> webportal13 172.20.0.157
> app20 172.20.0.170
> app21 172.20.0.171
> purple20 172.20.0.172
> purple21 172.20.0.173
> app30 172.20.0.160
> app31 172.20.0.161
> purple30 172.20.0.162
> purple31 172.20.0.163
> pws30 172.20.0.164
> pws31 172.20.0.165
> }
>
> DaemonUser = nobody
> DaemonGroup = nobody
> RuntimeDir = /usr/spread
>
> # EXIT PRINT DEBUG DATA_LINK NETWORK PROTOCOL SESSION
> # CONFIGURATION MEMBERSHIP FLOW_CONTROL STATUS EVENTS
> # GROUPS MEMORY SKIPLIST ALL NONE
>
> #DebugFlags = { ALL !DATA_LINK !MEMORY !DEBUG !EVENTS }
> DebugFlags = { ALL !DATA_LINK !MEMORY !DEBUG !EVENTS }
>
>
> Can you paste the last 10 or so lines in the log before the crash?
> I've seen odd memory behavior with spread when members are in too
> many groups and send too many messages all at once: spread consumes
> all memory on the host and crashes with a malloc error.
>
>
>
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org <mailto:Spread-users at lists.spread.org>
> http://lists.spread.org/mailman/listinfo/spread-users
>
>
>
>
> --
> [ Rodrick R. Brown ]
> http://www.rodrickbrown.com http://www.linkedin.com/in/rodrickbrown
>
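
On the missing core dumps mentioned in the quoted message: one thing that may be worth ruling out is the core-file size limit of the running daemon. The sketch below is only an illustration, assuming a Linux kernel that exposes /proc/<pid>/limits (not every older kernel does); the function names core_limits and core_dumps_disabled are made up for this example. A daemon that switches user (here DaemonUser = nobody) may also be treated as non-dumpable by the kernel, and the working directory has to be writable by that user, so a non-zero limit alone does not guarantee a core file.

```python
def core_limits(limits_text):
    """Return (soft, hard) core-file limits parsed from the text of
    /proc/<pid>/limits, or None if the line is not present."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            fields = line.split()
            # Layout: Max core file size <soft> <hard> <units>
            return fields[4], fields[5]
    return None


def core_dumps_disabled(limits_text):
    """True when the soft core-file limit is 0, i.e. no core is written."""
    limits = core_limits(limits_text)
    return limits is not None and limits[0] == "0"
```

Usage would be along the lines of core_dumps_disabled(open(f"/proc/{pid}/limits").read()), where pid is the process ID of the spread daemon on the host being checked.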