<br>In my setup there are two Spread daemons running on two different nodes, with one client connected to each daemon. When I repeatedly bring the network interface connecting the two daemons down and up (ifconfig down/up), the Spread daemon on one end crashes after a couple of iterations. In each iteration the network stays down for about 5 seconds and then up for about 30 seconds.
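<br><br>The test loop described above can be sketched roughly as follows. This is only an illustration: the interface name eth1, the iteration count, and the DRY_RUN switch are assumptions that do not come from the actual test, and only the 5 second and 30 second timings are taken from the description.

```shell
#!/bin/sh
# Sketch of the interface down/up test.
# IFACE, ITERATIONS and DRY_RUN are illustrative assumptions, not the original setup.
IFACE="${IFACE:-eth1}"            # hypothetical interface linking the two daemons
ITERATIONS="${ITERATIONS:-10}"    # hypothetical number of down/up cycles
DOWN_SECS="${DOWN_SECS:-5}"       # network stays down for about 5 seconds
UP_SECS="${UP_SECS:-30}"          # network stays up for about 30 seconds
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring the interface down and up ITERATIONS times.
flap_interface() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        run ifconfig "$IFACE" down
        run sleep "$DOWN_SECS"
        run ifconfig "$IFACE" up
        run sleep "$UP_SECS"
        i=$((i + 1))
    done
}
```

Calling flap_interface with the defaults only prints the commands. Setting DRY_RUN=0 and running as root on one of the nodes would actually toggle the interface.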
<br><br>This is the backtrace I got.<br><br>(gdb) bt<br><br>#0 0x0805757c in Create_form1 () at membership.c:1361<br>#1 0x080564c1 in Form_or_fail () at membership.c:920<br>#2 0x08052ef3 in E_handle_events () at events.c:605
<br>#3 0x0805e69d in main (argc=3, argv=0xbfffcae4) at spread.c:255<br><br>(gdb) l<br>1356 for( index = My_aru+1; index <= Highest_seq; index++ )<br>1357 {<br>1358 pack_entry = index & PACKET_MASK;
<br>1359 if( ! Packets[pack_entry].exist )<br>1360 {<br>1361 *holes_procs_ptr = index;<br>1362 Alarm( MEMB ,<br>1363 "INSERT HOLE 1 IS %d My_aru is %d, Highest_seq is %d\n",
<br>1364 index,My_aru, Highest_seq);<br>1365 holes_procs_ptr++;<br><br>(gdb) info local<br>form_token = {type = 4096, transmiter_id = 16, seq = 15702,<br> proc_id = -1062680573, aru = 134713880, aru_last_id = 134715232,
<br> flow_control = -14264, rtr_len = -16385}<br>rg_info = (ring_info *) 0xbfffa224<br>num_rings = (int32_t *) 0xbfffa220<br>holes_procs_ptr = (int32_t *) 0x972<br>index = 2415<br>pack_entry = 2415<br>num_bytes = 2427<br>
send_scat = {num_elements = 357, elements = {{<br> buf = 0x166 <Address 0x166 out of bounds>, len = 359}, {....<br>rg_info_buf = "\001\000\000\000\003...<br>temp_rep = {proc_id = -1062680574, type = 2, seg_index = 0}
<br>i = 2<br>j = 2<br>cur_num_members = 1<br>valid_members = {num_members = 0, num_pending = 0, members = {-1062680573,<br> 0 <repeats 127 times>}}<br><br>I'm using Spread version 3.17.3, running on RHEL3. Has anyone encountered this problem before? Or should I upgrade to the latest version?
<br>
<br>Thanks and Regards<br>Balaji.<br>