[Spread-users] RE: Spread 4.0 memory corruption bug in Fill_form1() and Create_form1()

Yair Amir yairamir at cs.jhu.edu
Wed Aug 8 10:41:48 EDT 2007


Hi Juan,

My guess is that you are correct. I am not sure that the case where there are more
holes than a Form Token can hold is handled.

A possible solution would be to install a singleton and then move on, consistently
with the EVS semantics that Spread guarantees. But this is not an easy fix and will
need to be looked at carefully over time (maybe Jonathan can comment?)

It is common, especially on busy web sites that use Spread, to give the Spread
process a high scheduling priority. That will likely eliminate the (probably fairly
high) losses your network is experiencing.
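One way to act on the priority suggestion, sketched in C under some assumptions: a POSIX system, and a daemon (or a wrapper that starts it) with enough privilege to lower its nice value, typically root or CAP_SYS_NICE on Linux. The helper raise_own_priority() is hypothetical and not part of Spread; from a shell, nice(1) or renice(1) achieve the same thing without any code changes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of Spread): set the nice value of the
 * calling process. Lower values mean higher scheduling priority; going
 * below the current value normally requires root or CAP_SYS_NICE.
 * Returns 0 on success, -1 on failure (the reason is printed to stderr).
 */
int raise_own_priority(int nice_value)
{
    /* PRIO_PROCESS with who == 0 targets the calling process itself. */
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        fprintf(stderr, "setpriority(%d) failed: %s\n",
                nice_value, strerror(errno));
        return -1;
    }
    return 0;
}
```

A daemon could call, for example, raise_own_priority(-10) early in main(), before entering its event loop. The value -10 is illustrative only, not a tuned recommendation for Spread.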

Cheers,

	:) Yair.

Juan Gomez wrote:
> In pursuing this bug further, I have managed to collect more information in the hope that the Spread developers can provide a fix, or at least shed some light on what may be going on here. Here is the most recent data I have obtained, along with some speculation about what may be broken:
> 
> During a heavy load test on Spread (many Spread messages being sent while the CPUs running the Spread daemons were highly utilized), I observed a crash in the Alarm() call in the following code in membership.c:
> 
> static  void    Create_form1()
> {
> ....
>         /* update holes */
>         rg_info->num_holes     = 0;
>         for( index = My_aru+1; index <= Highest_seq; index++ )
>         {
>             pack_entry = index & PACKET_MASK;
>             if( ! Packets[pack_entry].exist )
>             {
>                *holes_procs_ptr = index;
>                Alarm( MEMB ,                    /* <<<< CRASH HERE */
>                    "INSERT HOLE 1 IS %d My_aru is %d, Highest_seq is %d\n",
>                    index,My_aru, Highest_seq);
>                holes_procs_ptr++;
>                num_bytes += sizeof(int32);
>                rg_info->num_holes++;
>             }
>         }
> ....
> }
> 
> From examining the core file and the code itself, it seems that the function wrote past the end of the rg_info_buf[] array while writing holes into the token, corrupting the stack and eventually causing the program to crash with SIGSEGV.
> 
> From looking at the constants and related data types, it seems to me that if there are too many holes to include in a token, the code breaks, because it does not check that the holes will fit in the rg_info_buf array.
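A minimal sketch of the kind of check that appears to be missing, written against a simplified, standalone model of the hole-collecting loop rather than Spread's real data structures. The function collect_holes() and its parameters are hypothetical: exist[] stands in for Packets[].exist, mask for PACKET_MASK, and buf/buf_bytes for the free space left in rg_info_buf. Instead of writing unconditionally, it stops before exceeding the buffer and reports failure.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the hole loop in Create_form1() and
 * Fill_form1(); none of these names are Spread's.
 *   exist[]   - one flag per slot of a ring indexed by (seq & mask)
 *   buf       - destination area (e.g. the free tail of rg_info_buf)
 *   buf_bytes - number of bytes still free in that area
 * Returns the number of holes written, or -1 if they would not all fit.
 * Nothing beyond buf_bytes is ever written.
 */
int collect_holes(const unsigned char *exist, int32_t mask,
                  int32_t first_seq, int32_t last_seq,
                  int32_t *buf, size_t buf_bytes)
{
    size_t capacity = buf_bytes / sizeof(int32_t);
    size_t n = 0;

    for (int32_t seq = first_seq; seq <= last_seq; seq++) {
        if (exist[seq & mask])
            continue;            /* packet present: not a hole */
        if (n == capacity)
            return -1;           /* caller must fall back, not overflow */
        buf[n++] = seq;
    }
    return (int)n;
}
```

Counting against the remaining capacity keeps the check in one place, much like the assert() in the instrumented Fill_form1() further down, but fails gracefully instead of aborting. What the caller should do when the holes do not fit (for example, the singleton approach mentioned above) is the hard part and is not shown.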
> 
> At the time of the crash, some of the relevant global values were:
> 
> (gdb) p Highest_seq
> $9 = 47491
> (gdb) p My_aru
> $10 = 1225
> (gdb)
> 
> The partial dump of Packets[] is attached at the end, if needed.
> 
> From these numbers, it appears that the Spread daemon that crashed fell far behind, and that the code does not handle that situation properly.
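To put these numbers in perspective: Create_form1() scans every sequence number from My_aru+1 through Highest_seq, so in the worst case it records 47491 - 1225 = 46266 holes at 4 bytes each (assuming int32 is 4 bytes), i.e. 185064 bytes that would have to fit in rg_info_buf, which is only sizeof(token_body) bytes. The hypothetical helper below just computes that bound; the exact value of sizeof(token_body) depends on the build and is not assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case number of bytes needed to record every
 * sequence number in (my_aru, highest_seq] as a hole, one int32 each.
 * Mirrors the loop bounds in Create_form1(). It is an upper bound, since
 * packets that did arrive are not recorded as holes.
 */
size_t worst_case_hole_bytes(int32_t my_aru, int32_t highest_seq)
{
    if (highest_seq <= my_aru)
        return 0;
    return (size_t)(highest_seq - my_aru) * sizeof(int32_t);
}
```

For the values above, worst_case_hole_bytes(1225, 47491) is 185064. The entries shown in the Packets[] dump below all have exist = 0, which suggests the real hole count was close to this bound.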
> 
> Also, while reading the code that sets these globals, I ran across the following function:
> 
> static  void    Backoff_membership()
> {
>         int     pack_entry;
>         int     i;
> 
>         pack_entry=-1;
>         for( i=Last_discarded+1; i <= Highest_seq; i++ )
>         {
>                /* clear dummy messages */
>                pack_entry = i & PACKET_MASK;
>                if( Packets[pack_entry].exist == 3 )
>                        Packets[pack_entry].exist = 0;
>         }
> 
>         /* return Aru and My_aru */
>         Aru = Last_discarded;
> 
>         My_aru = Last_discarded;
>         for( i=Last_discarded+1; i <= Highest_seq; i++ )
>         {
>                if( !Packets[pack_entry].exist ) break;
>                My_aru++;
>         }
> }
> 
> My question about this function is whether the second loop is coded correctly. pack_entry is never updated inside the second loop, so if( !Packets[pack_entry].exist ) break; tests the same entry (the one left over from the last iteration of the first loop) on every pass. The loop therefore either stops immediately, leaving My_aru at Last_discarded, or never stops, advancing My_aru all the way to Highest_seq. Was that intended, or is it a typo, and should pack_entry in the second loop track the loop index as it does in the first loop?
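If pack_entry was meant to track i, the second loop would look like the sketch below. This illustrates that reading of the code only; it is not a confirmed or official Spread patch. It is written as a standalone function over a simplified model so it compiles on its own: packet_slot, PACKET_MASK_DEMO and compute_my_aru() are hypothetical stand-ins for Packets[], PACKET_MASK and the tail of Backoff_membership().

```c
#include <stdint.h>

#define PACKET_MASK_DEMO 0xFF   /* hypothetical ring of 256 slots */

/* Simplified stand-in for an entry of Spread's Packets[] array. */
typedef struct {
    int exist;                  /* 0 = missing, non-zero = present */
} packet_slot;

/*
 * Recompute My_aru after a backoff: start from last_discarded and advance
 * while the next sequence number is present. Unlike the quoted code, the
 * slot index is recomputed from i on every iteration.
 */
int32_t compute_my_aru(const packet_slot *packets, int32_t last_discarded,
                       int32_t highest_seq)
{
    int32_t my_aru = last_discarded;

    for (int32_t i = last_discarded + 1; i <= highest_seq; i++) {
        int pack_entry = i & PACKET_MASK_DEMO;  /* track the loop index */
        if (!packets[pack_entry].exist)
            break;                              /* first gap ends the aru */
        my_aru++;
    }
    return my_aru;
}
```

For example, with last_discarded = 10, highest_seq = 14 and only sequence numbers 11, 12 and 14 present, this returns 12. The quoted version would instead test the slot for 14 on every pass; since that packet is present, it would report 14 even though 13 is missing.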
> 
> Your help is greatly appreciated in getting to the root of this issue.
> 
> Regards, Juan
> 
> (gdb) p Packets[1255]@6397
> 
> $7 = {{head = 0x82d50f0, body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8, body = 0x8af1168, 
> 
>     exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x89078d8, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x89078d8, body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 0}, {head = 0x89078d8, body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d50f0, body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d50f0, body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8, body = 0x95b7d70, exist = 0, proc_index = 0}, {
> 
>     head = 0x8411710, body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8411710, body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x9180cd8, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d50f0, body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x8412130, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8412130, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82d50f0, body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1be0, body = 0x92f3420, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x9180cd8, exist = 0, proc_index = 0}, {head = 0x8412130, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d50f0, body = 0x93b9ad0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d1be0, body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x8411e60, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d5e20, body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82d1b20, 
> 
>     body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x8411710, body = 0x9180cd8, exist = 0, proc_index = 0}, {
> 
>     head = 0x89078d8, body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x81f3380, body = 0x89e8b10, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82e5938, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82fc850, body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x8412130, body = 0x93b9ad0, exist = 0, proc_index = 0}, {head = 0x82d5e20, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8411e60, body = 0x87463e0, exist = 0, proc_index = 3}, {head = 0x82d1be0, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x82d50f0, body = 0x8b24538, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x92f3420, exist = 0, proc_index = 0}, {head = 0x82e5938, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x8b30550, exist = 0, proc_index = 3}, {head = 0x82f6030, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 3}, {head = 0x82fc850, body = 0x8b30550, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x8b30550, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82f6030, body = 0x8b30550, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0}, {head = 0x82f6030, body = 0x8b30550, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0}, {head = 0x82f6030, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x87463e0, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d50f0, body = 0x93b9ad0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d1be0, body = 0x89e8b10, exist = 0, proc_index = 3}, {head = 0x82e5938, 
> 
>     body = 0x93b9ad0, exist = 0, proc_index = 0}, {head = 0x82d50f0, body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82fc850, body = 0x89e8b10, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x87463e0, exist = 0, proc_index = 3}, {head = 0x82d1be0, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82d50f0, body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x82d1b20, body = 0x87463e0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82e5938, body = 0x89e8b10, exist = 0, proc_index = 3}, {head = 0x8411e60, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d5e20, body = 0x93b9ad0, exist = 0, proc_index = 3}, {
> 
>     head = 0x8412130, body = 0x9180cd8, exist = 0, proc_index = 3}, {head = 0x81f3380, body = 0x868a840, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x82f2560, exist = 0, proc_index = 3}, {head = 0x82f6030, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x82f2560, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d6760, body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x82d1be0, body = 0x9422a78, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0, proc_index = 0}, {head = 0x8411710, 
> 
>     body = 0x82f2560, exist = 0, proc_index = 0}, {head = 0x82d6760, body = 0x9422a78, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d1be0, body = 0x9422a78, exist = 0, proc_index = 0}, {head = 0x82d6760, body = 0x9422a78, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0, proc_index = 3}, {head = 0x82d6760, 
> 
>     body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x82d1be0, body = 0x9422a78, exist = 0, proc_index = 1}, {
> 
>     head = 0x82d6760, body = 0x82f2560, exist = 0, proc_index = 1}, {head = 0x82d1be0, body = 0x93ac2d0, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8411710, body = 0x8b24538, exist = 0, proc_index = 1}, {head = 0x82f6030, 
> 
>     body = 0x93ac2d0, exist = 0, proc_index = 1}, {head = 0x82d1be0, body = 0x8b24538, exist = 0, proc_index = 1}, {
> 
>     head = 0x8411710, body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82d1be0, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8411710, body = 0x82f2560, exist = 0, proc_index = 1}, {head = 0x82f6030, 
> 
>     body = 0x868a840, exist = 0, proc_index = 1}, {head = 0x82d6760, body = 0x93b9ad0, exist = 0, proc_index = 1}, {
> 
>     head = 0x89078d8, body = 0x89e8b10, exist = 0, proc_index = 1}, {head = 0x81f3380, body = 0x8528b18, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8412130, body = 0x92f3420, exist = 0, proc_index = 1}, {head = 0x82d5e20, 
> 
>     body = 0x84c7a98, exist = 0, proc_index = 1}, {head = 0x8411e60, body = 0x87c7930, exist = 0, proc_index = 1}, {
> 
>     head = 0x82e5938, body = 0x9018c60, exist = 0, proc_index = 1}, {head = 0x82d1b20, body = 0x8472a88, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d50f0, body = 0x9018c60, exist = 0, proc_index = 3}, {head = 0x868c1c8, 
> 
>     body = 0x87c7930, exist = 0, proc_index = 3}, {head = 0x82d2ea8, body = 0x84c7a98, exist = 0, proc_index = 3}, {
> 
>     head = 0x82f1408, body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82e5968, body = 0x8528b18, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8309ff8, body = 0x89e8b10, exist = 0, proc_index = 3}, {head = 0x8907968, 
> 
>     body = 0x93b9ad0, exist = 0, proc_index = 3}, {head = 0x8503db8, body = 0x868a840, exist = 0, proc_index = 3}, {
> 
>     head = 0x8689350, body = 0x82f2560, exist = 0, proc_index = 3}, {head = 0x8309930, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x86894f8, body = 0x96739d0, exist = 0, proc_index = 3}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 3}, {head = 0x8309840, body = 0x8c42888, exist = 0, proc_index = 3}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x868bf88, body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88, body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88, body = 0x96739d0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x94f4be8, exist = 0, proc_index = 0}, {head = 0x86894f8, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840, body = 0x96739d0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x868bf88, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x86894f8, body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x8309840, body = 0x8c42888, exist = 0, proc_index = 3}, {
> 
>     head = 0x8309930, body = 0x96739d0, exist = 0, proc_index = 3}, {head = 0x8689350, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8503db8, body = 0x82f2560, exist = 0, proc_index = 3}, {head = 0x8907968, 
> 
>     body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8309ff8, body = 0x93b9ad0, exist = 0, proc_index = 3}, {
> 
>     head = 0x868bf88, body = 0x8472a88, exist = 0, proc_index = 0}, {head = 0x82f1408, body = 0x93b9ad0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d2ea8, body = 0x89e8b10, exist = 0, proc_index = 3}, {head = 0x868c1c8, 
> 
>     body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82d50f0, body = 0x84c7a98, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x87c7930, exist = 0, proc_index = 3}, {head = 0x8309ff8, body = 0x87c7930, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8907968, body = 0x9018c60, exist = 0, proc_index = 3}, {head = 0x8503db8, 
> 
>     body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8689350, body = 0x82f2560, exist = 0, proc_index = 3}, {
> 
>     head = 0x8309930, body = 0x8b24538, exist = 0, proc_index = 3}, {head = 0x82e5968, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309930, body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82e5968, 
> 
>     body = 0x868a840, exist = 0, proc_index = 0}, {head = 0x8689350, body = 0x94f4be8, exist = 0, proc_index = 0}, {
> 
>     head = 0x8503db8, body = 0x9018c60, exist = 0, proc_index = 0}, {head = 0x8907968, body = 0x868a840, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8503db8, body = 0x82f2560, exist = 0, proc_index = 0}, {head = 0x8689350, 
> 
>     body = 0x9018c60, exist = 0, proc_index = 0}, {head = 0x82e5968, body = 0x8b24538, exist = 0, proc_index = 2}, {
> 
>     head = 0x8309930, body = 0x84c7a98, exist = 0, proc_index = 0}, {head = 0x8309ff8, body = 0x94f4be8, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d1b20, body = 0x87c7930, exist = 0, proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x89e8b10, exist = 0, proc_index = 0}, {head = 0x868c1c8, body = 0x96739d0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5968, body = 0x96739d0, exist = 0, proc_index = 3}, {head = 0x82d2ea8, body = 0x87c7930, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868c1c8, body = 0x92f3420, exist = 0, proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82d1b20, body = 0x84c7a98, exist = 0, proc_index = 0}...}
> 
> ------------------------------------------------------------------------
> 
> *From:* Juan Gomez
> *Sent:* Tuesday, July 10, 2007 5:50 PM
> *To:* 'spread-users at lists.spread.org'
> *Cc:* Juan Gomez
> *Subject:* Spread 4.0 memory corruption bug in Fill_form1()
> 
> Hi all:
> 
> I have noticed that after a long-running test, Spread 4.0 crashes with a corrupted stack at the following spot:
> 
> membership.c:
> 
> static    void      Fill_form1( sys_scatter *scat )
> {
> ....
>                 /* New ring_info will fit, so create it */
>                 for( index = Last_discarded+1; index <= Highest_seq; index++ )
>                 {
>                     pack_entry = index & PACKET_MASK;
>                     if( ! Packets[pack_entry].exist )
>                     {
>                         *new_holes_procs_ptr = index;
>                         Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index); /* <<<<<<<< CRASH HERE */
>                         new_holes_procs_ptr++;
>                         num_bytes     += sizeof(int32);
>                         new_rg_info->num_holes++;
>                     }
>                 }
> }
> 
> The first stack trace I got was this:
> 
> (gdb) where
> #0  0xb7ebe9da in getenv () from /lib/libc.so.6
> #1  0xb7f10d19 in tzset_internal () from /lib/libc.so.6
> #2  0xb7f11a28 in __tz_convert () from /lib/libc.so.6
> #3  0xb7f0fca0 in localtime () from /lib/libc.so.6
> #4  0x08053498 in Alarm (mask=512, message=0x806c171 "INSERT HOLE 2 IS %d\n")
>     at alarm.c:146
> #5  0x0805864e in Fill_form1 (scat=Variable "scat" is not available.
> ) at membership.c:1837
> #6  0x00000f79 in ?? ()
> #7  0x00000f7a in ?? ()
> #8  0x00000f7b in ?? ()
> #9  0x00000f7c in ?? ()
> 
> This clearly shows that the stack was corrupted. Frames #6 to #9 hold consecutive small integers (0xf79 = 3961 through 0xf7c = 3964) instead of code addresses, which looks like a run of hole sequence numbers written over the stack. The only part of the code I could suspect of causing this corruption was the following buffer:
> 
>         char           rg_info_buf[sizeof(token_body)];
> 
> I therefore added asserts to pinpoint exactly where we overrun the buffer, and found the following:
> 
> ....
>        rg_info_buf_end = rg_info_buf + sizeof(token_body);
> ....
> 
>                 /* New ring_info will fit, so create it */
>                 for( index = Last_discarded+1; index <= Highest_seq; index++ )
>                 {
>                     pack_entry = index & PACKET_MASK;
>                     if( ! Packets[pack_entry].exist )
>                     {
>                         assert((rg_info_buf + num_bytes + sizeof(int32)) <= rg_info_buf_end); /* <<<<< TRIGGERED ASSERT */
>                         *new_holes_procs_ptr = index;
>                         Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index);
>                         new_holes_procs_ptr++;
>                         num_bytes     += sizeof(int32);
>                         new_rg_info->num_holes++;
>                     }
>                 }
> 
> So it looks like we overrun rg_info_buf under some conditions. Has anybody seen this problem, or have any patches been issued for it? This was observed on an 8-node cluster.
> 
> Regards, Juan
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users




