[Spread-users] Spread daemon crashes when network fails

Balaji Rajappa balajirajappa at gmail.com
Mon Sep 18 11:47:08 EDT 2006


In my setup there are two Spread daemons running on two different nodes,
with one client connected to each daemon. When I repeatedly bring the
network interface connecting the two daemons down and up (ifconfig
down/up), the Spread daemon on one end crashes after a couple of
iterations. The network stays down for about 5 seconds and up for about
30 seconds each time.

This is the backtrace I got:

(gdb) bt

#0  0x0805757c in Create_form1 () at membership.c:1361
#1  0x080564c1 in Form_or_fail () at membership.c:920
#2  0x08052ef3 in E_handle_events () at events.c:605
#3  0x0805e69d in main (argc=3, argv=0xbfffcae4) at spread.c:255

(gdb) l
1356            for( index = My_aru+1; index <= Highest_seq; index++ )
1357            {
1358                pack_entry = index & PACKET_MASK;
1359                if( ! Packets[pack_entry].exist )
1360                {
1361                    *holes_procs_ptr = index;
1362                    Alarm( MEMB ,
1363                        "INSERT HOLE 1 IS %d My_aru is %d, Highest_seq is %d\n",
1364                        index,My_aru, Highest_seq);
1365                    holes_procs_ptr++;

(gdb) info local
form_token = {type = 4096, transmiter_id = 16, seq = 15702,
  proc_id = -1062680573, aru = 134713880, aru_last_id = 134715232,
  flow_control = -14264, rtr_len = -16385}
rg_info = (ring_info *) 0xbfffa224
num_rings = (int32_t *) 0xbfffa220
holes_procs_ptr = (int32_t *) 0x972
index = 2415
pack_entry = 2415
num_bytes = 2427
send_scat = {num_elements = 357, elements = {{
      buf = 0x166 <Address 0x166 out of bounds>, len = 359}, {....
rg_info_buf = "\001\000\000\000\003...
temp_rep = {proc_id = -1062680574, type = 2, seg_index = 0}
i = 2
j = 2
cur_num_members = 1
valid_members = {num_members = 0, num_pending = 0, members = {-1062680573,
    0 <repeats 127 times>}}

I'm using Spread 3.17.3 on RHEL3. Has anyone encountered this problem
before, or should I upgrade to the latest version?

Thanks and Regards
Balaji.