[Spread-users] RE: Spread 4.0 memory corruption bug in Fill_form1() and Create_form1()

Juan Gomez jgomez at juniper.net
Wed Aug 8 11:00:47 EDT 2007


Yair:

Thanks for the response. I mainly wanted to bring the issue to the
table and motivate a patch. Increasing the priority is a quick fix that
may help, as you stated, and we will experiment with that. In the meantime,
please keep me posted on any progress towards handling the situation in
the code itself.

Regards, Juan



-----Original Message-----
From: Yair Amir [mailto:yairamir at cs.jhu.edu] 
Sent: Wednesday, August 08, 2007 7:42 AM
To: Juan Gomez
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] RE: Spread 4.0 memory corruption bug in
Fill_form1() and Create_form1()

Hi Juan,

My guess is that you are correct. I am not sure that the case where there
are more holes than a Form Token can hold is handled.

A possible solution would be to install a singleton and then move on
consistently with Spread's guaranteed EVS semantics. But this is not an
easy fix and will need to be looked at carefully over time (maybe Jonathan
can comment?)

It is common, usually on busy web sites using Spread, to give the Spread
process high priority, which will likely eliminate the probably pretty high
losses your network experiences.
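
For illustration only, here is a minimal sketch of raising a process's
scheduling priority from C with the POSIX setpriority() call. The helper
name set_own_priority() is hypothetical and is not part of Spread. The
sketch assumes the daemon is started with enough privilege to lower its
nice value.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not Spread code): set the nice value of the
 * calling process. A lower nice value means a higher priority; the
 * usual range is -20..19.
 * Returns 0 on success, or the errno value reported by setpriority()
 * on failure (EACCES or EPERM is typical without privileges).
 */
int set_own_priority(int nice_value)
{
    assert(nice_value >= -20 && nice_value <= 19);

    /* who == 0 with PRIO_PROCESS means "the calling process" */
    if (setpriority(PRIO_PROCESS, 0, nice_value) == -1)
        return errno;
    return 0;
}
```

The same effect is commonly obtained from the shell with nice or renice,
without touching the daemon's source.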

Cheers,

	:) Yair.

Juan Gomez wrote:
> In pursuing this bug further I have managed to collect more information
> in the hope that the Spread developers can provide a fix or at least shed
> some light on what may be going on here. Here is the most recent data I
> have obtained and some speculation about what may be broken:
> 
>  
> 
> During a heavy load test on Spread (many Spread messages being sent while
> the CPUs running the Spread daemons were highly utilized) I observed a
> crash in the Alarm() call in the following code in membership.c:
> 
>  
> 
> static  void    Create_form1()
> 
> {
> 
> ....
> 
>         /* update holes */
> 
>         rg_info->num_holes     = 0;
> 
>         for( index = My_aru+1; index <= Highest_seq; index++ )
> 
>         {
> 
>             pack_entry = index & PACKET_MASK;
> 
>             if( ! Packets[pack_entry].exist )
> 
>             {
> 
>                *holes_procs_ptr = index;
> 
>                Alarm( MEMB ,    /* <<<< CRASH HERE */
> 
>                    "INSERT HOLE 1 IS %d My_aru is %d, Highest_seq is %d\n",
> 
>                    index,My_aru, Highest_seq);
> 
>                holes_procs_ptr++;
> 
>                num_bytes += sizeof(int32);
> 
>                rg_info->num_holes++;
> 
>             }
> 
>         }
> 
> ....
> 
>  
> 
> }
> 
>  
> 
> From examining the core file and the code itself, it seems that the
> function wrote past the end of the rg_info_buf[] array while writing holes
> to the token and corrupted the stack, eventually causing the program to
> crash with SIGSEGV.
> 
>  
> 
> From looking at the constants and related data types, it seems to me that
> if there are too many holes to include in a token the code breaks, as it
> does not check that the holes will fit in the rg_info_buf array.
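
A minimal sketch of a bounded version of the hole-writing loop, for
illustration. The names packet_slot and collect_holes() and the truncated
flag are hypothetical stand-ins, not Spread's actual types or API. The
sketch only shows how the write can be kept inside the buffer. What the
membership algorithm should do when the holes do not fit is the hard part
and is not addressed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry of Spread's Packets[] table. */
typedef struct { int exist; } packet_slot;

/*
 * Scan sequence numbers (aru, highest] and record each missing one in
 * holes[], never writing more than max_holes entries.
 * Returns the number of holes stored; *truncated is set to 1 when at
 * least one hole did not fit, and to 0 otherwise.
 */
size_t collect_holes(const packet_slot *packets, int32_t mask,
                     int32_t aru, int32_t highest,
                     int32_t *holes, size_t max_holes, int *truncated)
{
    size_t  n = 0;
    int32_t index;

    assert(packets != NULL && holes != NULL && truncated != NULL);
    *truncated = 0;

    for (index = aru + 1; index <= highest; index++) {
        if (packets[index & mask].exist)
            continue;               /* packet received, not a hole */
        if (n == max_holes) {       /* buffer full: stop, do not overrun */
            *truncated = 1;
            break;
        }
        holes[n++] = index;
    }
    return n;
}
```

A caller would derive max_holes from the space left in the token buffer
and treat a set truncated flag as a condition that needs its own handling.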
> 
>  
> 
> At the time of the crash some of the relevant global values were:
> 
>  
> 
> (gdb) p Highest_seq
> 
> $9 = 47491
> 
> (gdb) p My_aru
> 
> $10 = 1225
> 
> (gdb)
> 
>  
> 
> The partial dump of Packets[] is attached at the end if needed.
> 
>  
> 
> From these numbers it appears to me that the Spread daemon that crashed
> had fallen far behind and the code does not handle that situation
> properly...
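
To put the gdb values in perspective, here is a small worked calculation.
The helper names are hypothetical. The loop in Create_form1() visits every
sequence number from My_aru+1 to Highest_seq, so Highest_seq - My_aru is an
upper bound on the number of holes it can write; the actual count depends
on the contents of Packets[]. The size of token_body is not stated in this
thread, so no comparison against the buffer size is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequence numbers in (aru, highest]: the most holes one scan can report. */
size_t max_possible_holes(int32_t aru, int32_t highest)
{
    assert(highest >= aru);
    return (size_t)(highest - aru);
}

/* Bytes needed to store that many holes as 32-bit integers. */
size_t hole_bytes(size_t holes)
{
    return holes * sizeof(int32_t);
}
```

With My_aru = 1225 and Highest_seq = 47491 this gives at most 46266 holes,
or 185064 bytes if every one of them were written.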
> 
>  
> 
> Also, as I read the code that sets these globals, I ran across the
> following function:
> 
>  
> 
>  
> 
>  
> 
> static  void    Backoff_membership()
> 
> {
> 
>         int     pack_entry;
> 
>         int     i;
> 
>  
> 
>         pack_entry=-1;
> 
>         for( i=Last_discarded+1; i <= Highest_seq; i++ )
> 
>         {
> 
>                /* clear dummy messages */
> 
>                pack_entry = i & PACKET_MASK;
> 
>                if( Packets[pack_entry].exist == 3 )
> 
>                        Packets[pack_entry].exist = 0;
> 
>         }
> 
>  
> 
>         /* return Aru and My_aru */
> 
>         Aru = Last_discarded;
> 
>  
> 
>         My_aru = Last_discarded;
> 
>         for( i=Last_discarded+1; i <= Highest_seq; i++ )
> 
>         {
> 
>                if( !Packets[pack_entry].exist ) break;
> 
>                My_aru++;
> 
>         }
> 
> }
> 
>  
> 
> My question about this function is whether the second loop is properly
> coded. In other words, was the if( !Packets[pack_entry].exist ) break;
> intended to check a pack_entry value that stays constant, or is there a
> typo and pack_entry in this second loop was supposed to track the loop
> index as in the first loop?
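
For comparison, a minimal sketch of the second loop under the assumption
that pack_entry was meant to track the loop index. The thread does not
confirm that this was the intent. packet_slot and recompute_my_aru() are
hypothetical stand-ins, not Spread code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an entry of Spread's Packets[] table. */
typedef struct { int exist; } packet_slot;

/*
 * Advance from last_discarded while consecutive packets exist,
 * recomputing the table slot for every sequence number.
 * Returns the highest sequence number received without a gap.
 */
int32_t recompute_my_aru(const packet_slot *packets, int32_t mask,
                         int32_t last_discarded, int32_t highest)
{
    int32_t aru = last_discarded;
    int32_t i;

    assert(packets != NULL);
    for (i = last_discarded + 1; i <= highest; i++) {
        if (!packets[i & mask].exist)
            break;                  /* first gap ends the contiguous run */
        aru++;
    }
    return aru;
}
```

In the function as posted, pack_entry still holds the slot of Highest_seq
left over from the first loop, so the second loop tests the same slot on
every iteration: My_aru either stays at Last_discarded or runs all the way
up to Highest_seq.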
> 
>  
> 
> Your help is greatly appreciated in getting to the root of this issue.
> 
>  
> 
> Regards, Juan
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> (gdb) p Packets[1255]@6397
> 
> $7 = {{head = 0x82d50f0, body = 0x95b7d70, exist = 0, proc_index = 0},
{head = 0x89078d8, body = 0x8af1168, 
> 
>     exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168,
exist = 0, proc_index = 0}, {head = 0x89078d8, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x89078d8, body = 0x8af1168, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 0}, {head = 0x89078d8, body = 0x95b7d70, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d50f0, body = 0x8af1168, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d50f0, body = 0x8af1168, exist = 0,
proc_index = 0}, {head = 0x8411710, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8,
body = 0x95b7d70, exist = 0, proc_index = 0}, {
> 
>     head = 0x8411710, body = 0x95b7d70, exist = 0, proc_index = 0},
{head = 0x89078d8, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8411710, body = 0x95b7d70, exist = 0,
proc_index = 0}, {head = 0x89078d8, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x9180cd8, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d50f0, body = 0x92f3420, exist = 0, proc_index = 3},
{head = 0x8412130, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x8af1168, exist = 0,
proc_index = 0}, {head = 0x8412130, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1be0, body = 0x92f3420, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x9180cd8, exist = 0,
proc_index = 0}, {head = 0x8412130, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x93b9ad0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d1be0, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x8411e60, body = 0x8af1168, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d5e20, body = 0x92f3420, exist = 0,
proc_index = 3}, {head = 0x82d1b20, 
> 
>     body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x8411710,
body = 0x9180cd8, exist = 0, proc_index = 0}, {
> 
>     head = 0x89078d8, body = 0x9422a78, exist = 0, proc_index = 3},
{head = 0x81f3380, body = 0x89e8b10, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x92f3420, exist = 0,
proc_index = 3}, {head = 0x82e5938, 
> 
>     body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82fc850,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x8412130, body = 0x93b9ad0, exist = 0, proc_index = 0},
{head = 0x82d5e20, body = 0x95b7d70, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8411e60, body = 0x87463e0, exist = 0,
proc_index = 3}, {head = 0x82d1be0, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x8b24538, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x92f3420, exist = 0, proc_index = 0},
{head = 0x82e5938, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x8b30550, exist = 0,
proc_index = 3}, {head = 0x82f6030, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 3}, {head = 0x82fc850,
body = 0x8b30550, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0},
{head = 0x82fc850, body = 0x8b30550, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82f6030,
body = 0x8b30550, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0},
{head = 0x82f6030, body = 0x8b30550, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0,
proc_index = 0}, {head = 0x82f6030, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x87463e0, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x82d50f0, body = 0x93b9ad0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d1be0, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x82e5938, 
> 
>     body = 0x93b9ad0, exist = 0, proc_index = 0}, {head = 0x82d50f0,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x82fc850, body = 0x89e8b10, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82f6030, body = 0x87463e0, exist = 0,
proc_index = 3}, {head = 0x82d1be0, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82d50f0,
body = 0x8af1168, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8528b18, exist = 0, proc_index = 3},
{head = 0x82d1b20, body = 0x87463e0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82e5938, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x8411e60, 
> 
>     body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d5e20,
body = 0x93b9ad0, exist = 0, proc_index = 3}, {
> 
>     head = 0x8412130, body = 0x9180cd8, exist = 0, proc_index = 3},
{head = 0x81f3380, body = 0x868a840, exist = 0, 
> 
>     proc_index = 3}, {head = 0x89078d8, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x82f6030, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x82f2560, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d6760, body = 0x9422a78, exist = 0, proc_index = 3},
{head = 0x82d1be0, body = 0x9422a78, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0,
proc_index = 0}, {head = 0x8411710, 
> 
>     body = 0x82f2560, exist = 0, proc_index = 0}, {head = 0x82d6760,
body = 0x9422a78, exist = 0, proc_index = 0}, {
> 
>     head = 0x82d1be0, body = 0x9422a78, exist = 0, proc_index = 0},
{head = 0x82d6760, body = 0x9422a78, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x82d6760, 
> 
>     body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x82d1be0,
body = 0x9422a78, exist = 0, proc_index = 1}, {
> 
>     head = 0x82d6760, body = 0x82f2560, exist = 0, proc_index = 1},
{head = 0x82d1be0, body = 0x93ac2d0, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8411710, body = 0x8b24538, exist = 0,
proc_index = 1}, {head = 0x82f6030, 
> 
>     body = 0x93ac2d0, exist = 0, proc_index = 1}, {head = 0x82d1be0,
body = 0x8b24538, exist = 0, proc_index = 1}, {
> 
>     head = 0x8411710, body = 0x8b24538, exist = 0, proc_index = 0},
{head = 0x82d1be0, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8411710, body = 0x82f2560, exist = 0,
proc_index = 1}, {head = 0x82f6030, 
> 
>     body = 0x868a840, exist = 0, proc_index = 1}, {head = 0x82d6760,
body = 0x93b9ad0, exist = 0, proc_index = 1}, {
> 
>     head = 0x89078d8, body = 0x89e8b10, exist = 0, proc_index = 1},
{head = 0x81f3380, body = 0x8528b18, exist = 0, 
> 
>     proc_index = 1}, {head = 0x8412130, body = 0x92f3420, exist = 0,
proc_index = 1}, {head = 0x82d5e20, 
> 
>     body = 0x84c7a98, exist = 0, proc_index = 1}, {head = 0x8411e60,
body = 0x87c7930, exist = 0, proc_index = 1}, {
> 
>     head = 0x82e5938, body = 0x9018c60, exist = 0, proc_index = 1},
{head = 0x82d1b20, body = 0x8472a88, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d50f0, body = 0x9018c60, exist = 0,
proc_index = 3}, {head = 0x868c1c8, 
> 
>     body = 0x87c7930, exist = 0, proc_index = 3}, {head = 0x82d2ea8,
body = 0x84c7a98, exist = 0, proc_index = 3}, {
> 
>     head = 0x82f1408, body = 0x92f3420, exist = 0, proc_index = 3},
{head = 0x82e5968, body = 0x8528b18, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8309ff8, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x8907968, 
> 
>     body = 0x93b9ad0, exist = 0, proc_index = 3}, {head = 0x8503db8,
body = 0x868a840, exist = 0, proc_index = 3}, {
> 
>     head = 0x8689350, body = 0x82f2560, exist = 0, proc_index = 3},
{head = 0x8309930, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x86894f8, body = 0x96739d0, exist = 0,
proc_index = 3}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 3}, {head = 0x8309840,
body = 0x8c42888, exist = 0, proc_index = 3}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x868bf88,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x8309840, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x82fc850, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x868bf88, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
> 
>     head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x96739d0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82fc850, body = 0x94f4be8, exist = 0,
proc_index = 0}, {head = 0x86894f8, 
> 
>     body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840,
body = 0x96739d0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x9583ea0, exist = 0, 
> 
>     proc_index = 0}, {head = 0x86894f8, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
> 
>     body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x8309840,
body = 0x8c42888, exist = 0, proc_index = 3}, {
> 
>     head = 0x8309930, body = 0x96739d0, exist = 0, proc_index = 3},
{head = 0x8689350, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8503db8, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x8907968, 
> 
>     body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8309ff8,
body = 0x93b9ad0, exist = 0, proc_index = 3}, {
> 
>     head = 0x868bf88, body = 0x8472a88, exist = 0, proc_index = 0},
{head = 0x82f1408, body = 0x93b9ad0, exist = 0, 
> 
>     proc_index = 3}, {head = 0x82d2ea8, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x868c1c8, 
> 
>     body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x84c7a98, exist = 0, proc_index = 3}, {
> 
>     head = 0x82d1b20, body = 0x87c7930, exist = 0, proc_index = 3},
{head = 0x8309ff8, body = 0x87c7930, exist = 0, 
> 
>     proc_index = 3}, {head = 0x8907968, body = 0x9018c60, exist = 0,
proc_index = 3}, {head = 0x8503db8, 
> 
>     body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8689350,
body = 0x82f2560, exist = 0, proc_index = 3}, {
> 
>     head = 0x8309930, body = 0x8b24538, exist = 0, proc_index = 3},
{head = 0x82e5968, body = 0x8b24538, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8309930, body = 0x8b24538, exist = 0,
proc_index = 0}, {head = 0x82e5968, 
> 
>     body = 0x868a840, exist = 0, proc_index = 0}, {head = 0x8689350,
body = 0x94f4be8, exist = 0, proc_index = 0}, {
> 
>     head = 0x8503db8, body = 0x9018c60, exist = 0, proc_index = 0},
{head = 0x8907968, body = 0x868a840, exist = 0, 
> 
>     proc_index = 0}, {head = 0x8503db8, body = 0x82f2560, exist = 0,
proc_index = 0}, {head = 0x8689350, 
> 
>     body = 0x9018c60, exist = 0, proc_index = 0}, {head = 0x82e5968,
body = 0x8b24538, exist = 0, proc_index = 2}, {
> 
>     head = 0x8309930, body = 0x84c7a98, exist = 0, proc_index = 0},
{head = 0x8309ff8, body = 0x94f4be8, exist = 0, 
> 
>     proc_index = 0}, {head = 0x82d1b20, body = 0x87c7930, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x89e8b10, exist = 0, proc_index = 0}, {head = 0x868c1c8,
body = 0x96739d0, exist = 0, proc_index = 0}, {
> 
>     head = 0x82e5968, body = 0x96739d0, exist = 0, proc_index = 3},
{head = 0x82d2ea8, body = 0x87c7930, exist = 0, 
> 
>     proc_index = 0}, {head = 0x868c1c8, body = 0x92f3420, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
> 
>     body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82d1b20,
body = 0x84c7a98, exist = 0, proc_index = 0}...}
> 
>  
> 
>  
> 
>  
> 
>
------------------------------------------------------------------------
> 
> *From:* Juan Gomez
> *Sent:* Tuesday, July 10, 2007 5:50 PM
> *To:* 'spread-users at lists.spread.org'
> *Cc:* Juan Gomez
> *Subject:* Spread 4.0 memory corruption bug in Fill_form1()
> 
>  
> 
> Hi all:
> 
>  
> 
> I have noticed that after a long-running test Spread 4.0 crashes with a
> corrupted stack at the following spot:
> 
>  
> 
>  
> 
> membership.c:
> 
>  
> 
> static    void      Fill_form1( sys_scatter *scat )
> 
> {
> 
> ....
> 
>                 /* New ring_info will fit, so create it */
> 
>                 for( index = Last_discarded+1; index <= Highest_seq; index++ )
> 
>                 {
> 
>                     pack_entry = index & PACKET_MASK;
> 
>                     if( ! Packets[pack_entry].exist )
> 
>                     {
> 
>                                     *new_holes_procs_ptr = index;
> 
>                                     Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index);    /* <<<< CRASH HERE */
> 
>                                     new_holes_procs_ptr++;
> 
>                                     num_bytes     += sizeof(int32);
> 
>                                     new_rg_info->num_holes++;
> 
>                     }
> 
>                 }
> 
>  
> 
>  
> 
> }
> 
>  
> 
> The first stack trace I got was this:
> 
>  
> 
> (gdb) where
> 
> #0  0xb7ebe9da in getenv () from /lib/libc.so.6
> 
> #1  0xb7f10d19 in tzset_internal () from /lib/libc.so.6
> 
> #2  0xb7f11a28 in __tz_convert () from /lib/libc.so.6
> 
> #3  0xb7f0fca0 in localtime () from /lib/libc.so.6
> 
> #4  0x08053498 in Alarm (mask=512, message=0x806c171 "INSERT HOLE 2 IS %d\n")
> 
>     at alarm.c:146
> 
> #5  0x0805864e in Fill_form1 (scat=Variable "scat" is not available.
> 
> ) at membership.c:1837
> 
> #6  0x00000f79 in ?? ()
> 
> #7  0x00000f7a in ?? ()
> 
> #8  0x00000f7b in ?? ()
> 
> #9  0x00000f7c in ?? ()
> 
>  
> 
>  
> 
> This shows that the stack was corrupted. The only part of the code I
> could suspect of causing this corruption was the following buffer:
> 
>  
> 
>         char           rg_info_buf[sizeof(token_body)];
> 
>  
> 
> I added asserts to track exactly at what point we overrun the buffer
> and found the following:
> 
> ....
> 
>        rg_info_buf_end = rg_info_buf + sizeof(token_body);
> 
> ....
> 
>  
> 
>                 /* New ring_info will fit, so create it */
> 
>                 for( index = Last_discarded+1; index <= Highest_seq; index++ )
> 
>                 {
> 
>                     pack_entry = index & PACKET_MASK;
> 
>                     if( ! Packets[pack_entry].exist )
> 
>                     {
> 
>                         assert((rg_info_buf + num_bytes + sizeof(int32)) <= rg_info_buf_end);    /* <<<<< TRIGGERED ASSERT */
> 
>                        *new_holes_procs_ptr = index;
> 
>                        Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index);
> 
>                        new_holes_procs_ptr++;
> 
>                        num_bytes     += sizeof(int32);
> 
>                        new_rg_info->num_holes++;
> 
>                     }
> 
>                 }
> 
>  
> 
>  
> 
> So it looks like we overrun rg_info_buf under some conditions, and I am
> wondering whether anybody has seen this problem or whether patches have
> been issued for it. This was observed with an 8-node cluster....
> 
>  
> 
>  
> 
> Regards, Juan
> 
>  
> 
>  
> 
>  
> 
>  
> 
>  
> 
> 
>
------------------------------------------------------------------------
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users



