[Spread-users] Spread Crash

Adrian Revill adrian.revill at shazamteam.com
Sun Oct 5 12:04:07 EDT 2008


Hi Rodrick,

Here are the ends of 2 of the logs.
Would changing the logging values help at all?

Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct  4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct  4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct  4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 app31 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct  4 14:20:23 app31 spread: Memb_handle_message: handling alive message
Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 app31 spread: it is a Form Token.
Oct  4 14:20:23 app31 spread: Memb_handle_token: handling form1 token
Oct  4 14:20:23 app31 spread: Handle_f

Oct  4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct  4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct  4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct  4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct  4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct  4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct  4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct  4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(116), packed message length(116)
Oct  4 14:20:22 purple10 spread: Memb_handle_message: handling join message from 172.20.0.134, State is 3
Oct  4 14:20:22 purple10 spread: Scast_alive: State is 3
Oct  4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(84), packed message length(84)
Oct  4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct  4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct  4 14:20:22 purple10 spread: Net_recv: Received Packet - packet length(104), packed message length(104)
Oct  4 14:20:22 purple10 spread: Memb_handle_message: handling alive message
Oct  4 14:20:22 purple10 spread: Handle_alive in REPRESENTED
Oct  4 14:20:23 purple10 spread: Net_recv: Received Packet - packet length(84), packed message
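Since the daemons on several hosts die at about the same time, it can help to line up the final Spread entries from each host, which is what the "last 10 or so lines" request below is after. A minimal Python sketch, assuming the daemons log through syslog in the `Oct  4 14:20:23 host spread: ...` format shown above and that the log lives in a file such as `/var/log/messages` (an assumption about this setup); `spread_tail.py` and `last_spread_lines` are hypothetical names:

```python
import re
import sys
from collections import defaultdict, deque

# Matches BSD-style syslog lines like the ones quoted above, e.g.
#   Oct  4 14:20:23 app31 spread: Handle_alive in REPRESENTED
# Group 1 is the host name; lines from other programs are ignored.
SPREAD_LINE = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+(\S+)\s+spread:")


def last_spread_lines(lines, n=10):
    """Return {host: [last n Spread log lines]} from an iterable of syslog lines."""
    tails = defaultdict(lambda: deque(maxlen=n))  # deque drops the oldest line
    for raw in lines:
        line = raw.rstrip("\n")
        match = SPREAD_LINE.match(line)
        if match:
            tails[match.group(1)].append(line)
    return {host: list(tail) for host, tail in tails.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python spread_tail.py /var/log/messages [N]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    with open(sys.argv[1], errors="replace") as log:
        for host, tail in sorted(last_spread_lines(log, count).items()):
            print(f"== {host} ==")
            print("\n".join(tail))
```

Run against each host's log (or one combined file), it prints the last N Spread lines per host. It cannot recover anything that was still sitting in an unflushed buffer when a daemon died.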


Rodrick Brown wrote:
> On Sat, Oct 4, 2008 at 12:00 PM, Adrian Revill
> <adrian.revill at shazamteam.com> wrote:
>
>     Hi,
>
>     We are running Spread 4.00.00 on a single segment with Red Hat EL5,
>     and have been experiencing a problem where the OS on one server
>     crashes and the Spread daemons on all the other servers then die.
>
>     We have enabled logging in Spread but see no cause logged,
>     probably because Spread uses a buffered logger and the buffer is
>     not flushed before the daemons die.
>     We are also not getting any core dumps.
>
>     Has anyone else seen this problem, or does anyone have an idea how
>     to troubleshoot it?
>
>     Also, is there a newer version of Spread?
>
>     Our configuration is:
>
>     Spread_Segment  172.20.255.255
>     {
>           purple9         172.20.0.134
>           purple10        172.20.0.135
>           purple11        172.20.0.136
>           purple12        172.20.0.137
>           purple13        172.20.0.140
>           purple14        172.20.0.144
>           purple15        172.20.0.141
>           purple16        172.20.0.143
>           app11           172.20.0.130
>           app12           172.20.0.131
>           app13           172.20.0.142
>           wombat14        172.20.0.145
>           app15           172.20.0.132
>           app16           172.20.0.133
>           webportal11     172.20.0.138
>           webportal12     172.20.0.139
>           webportal13     172.20.0.157
>           app20           172.20.0.170
>           app21           172.20.0.171
>           purple20        172.20.0.172
>           purple21        172.20.0.173
>           app30           172.20.0.160
>           app31           172.20.0.161
>           purple30        172.20.0.162
>           purple31        172.20.0.163
>           pws30           172.20.0.164
>           pws31           172.20.0.165
>     }
>
>     DaemonUser = nobody
>     DaemonGroup = nobody
>     RuntimeDir = /usr/spread
>
>     #       EXIT PRINT DEBUG DATA_LINK NETWORK PROTOCOL SESSION
>     #       CONFIGURATION MEMBERSHIP FLOW_CONTROL STATUS EVENTS
>     #       GROUPS MEMORY SKIPLIST ALL NONE
>
>     #DebugFlags = { ALL !DATA_LINK !MEMORY !DEBUG !EVENTS }
>     DebugFlags = { ALL !DATA_LINK !MEMORY !DEBUG !EVENTS }
>
>
> Can you paste the last 10 or so lines in the log before the crash?
> I've seen weird memory behavior with Spread when members are in too
> many groups and send too many messages all at once: Spread consumes
> all memory on the host and crashes with a malloc error.
>
>
>
>
>     ______________________________________________________________________
>     This email has been scanned by the MessageLabs Email Security System.
>     For more information please visit http://www.messagelabs.com/email
>     ______________________________________________________________________
>
>     _______________________________________________
>     Spread-users mailing list
>     Spread-users at lists.spread.org
>     http://lists.spread.org/mailman/listinfo/spread-users
>
>
>
>
> -- 
> [ Rodrick R. Brown ]  
> http://www.rodrickbrown.com http://www.linkedin.com/in/rodrickbrown
>

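Separately from the crash itself, the quoted segment definition can be checked mechanically: every daemon should sit inside the network whose broadcast address is the segment address, and names and addresses should be unique. A minimal Python sketch; `check_segment` and `HOSTS` are hypothetical names, and the /16 netmask is an assumption inferred from the 172.20.255.255 segment address rather than something the configuration states:

```python
import ipaddress
from collections import Counter

# Daemon list copied from the Spread_Segment block quoted above.
HOSTS = [
    ("purple9", "172.20.0.134"), ("purple10", "172.20.0.135"),
    ("purple11", "172.20.0.136"), ("purple12", "172.20.0.137"),
    ("purple13", "172.20.0.140"), ("purple14", "172.20.0.144"),
    ("purple15", "172.20.0.141"), ("purple16", "172.20.0.143"),
    ("app11", "172.20.0.130"), ("app12", "172.20.0.131"),
    ("app13", "172.20.0.142"), ("wombat14", "172.20.0.145"),
    ("app15", "172.20.0.132"), ("app16", "172.20.0.133"),
    ("webportal11", "172.20.0.138"), ("webportal12", "172.20.0.139"),
    ("webportal13", "172.20.0.157"), ("app20", "172.20.0.170"),
    ("app21", "172.20.0.171"), ("purple20", "172.20.0.172"),
    ("purple21", "172.20.0.173"), ("app30", "172.20.0.160"),
    ("app31", "172.20.0.161"), ("purple30", "172.20.0.162"),
    ("purple31", "172.20.0.163"), ("pws30", "172.20.0.164"),
    ("pws31", "172.20.0.165"),
]


def check_segment(broadcast, hosts, prefixlen=16):
    """Return a list of problems in a Spread_Segment host list (empty if none).

    prefixlen=16 is an assumption: the config only gives the segment
    address, which is the broadcast address of a /16.
    """
    net = ipaddress.ip_network(f"{broadcast}/{prefixlen}", strict=False)
    problems = []
    if net.broadcast_address != ipaddress.ip_address(broadcast):
        problems.append(f"{broadcast} is not the broadcast address of {net}")
    for name, addr in hosts:
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{name} ({addr}) is outside {net}")
    # Report any daemon name or IP address listed more than once.
    for label, counts in (("name", Counter(n for n, _ in hosts)),
                          ("address", Counter(a for _, a in hosts))):
        problems += [f"duplicate {label}: {k}" for k, c in counts.items() if c > 1]
    return problems


if __name__ == "__main__":
    for problem in check_segment("172.20.255.255", HOSTS) or ["no problems found"]:
        print(problem)
```

With the host list as posted, it reports no problems.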