[Spread-users] RE: Spread 4.0 memory corruption bug in Fill_form1() and Create_form1()

Juan Gomez jgomez at juniper.net
Tue Aug 7 05:15:19 EDT 2007


In pursuing this bug further I have managed to collect more information
in the hope that the Spread developers can provide a fix or at least
shed some light on what may be going on here. Here is the most recent
data I have obtained and some speculation about what may be broken:
 
During a heavy load test on Spread (many Spread messages being sent
while the CPUs running the Spread daemons were highly utilized) I
observed a crash in the Alarm() call in the following code in
membership.c:
 
static  void    Create_form1()
{
....
        /* update holes */
        rg_info->num_holes     = 0;
        for( index = My_aru+1; index <= Highest_seq; index++ )
        {
            pack_entry = index & PACKET_MASK;
            if( ! Packets[pack_entry].exist )
            {
               *holes_procs_ptr = index;
CRASH >>>>             Alarm( MEMB ,
                   "INSERT HOLE 1 IS %d My_aru is %d, Highest_seq is %d\n",
                   index,My_aru, Highest_seq);
               holes_procs_ptr++;
               num_bytes += sizeof(int32);
               rg_info->num_holes++;
            }
        }
....
 
}
 
From examining the core file and the code itself, it appears that the
function wrote past the end of the rg_info_buf[] array while writing
holes to the token and corrupted the stack, eventually causing the
program to crash with SIGSEGV.
 
From looking at the constants and related data types, it seems to me
that if there are too many holes to include in a token the code breaks,
as it does not check that the holes will fit in the rg_info_buf array.
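
A minimal sketch of the kind of check that appears to be missing is
given below. It is an illustration only, not Spread code: append_hole()
and collect_holes() are hypothetical helpers, exist[] stands in for
Packets[].exist, mask for PACKET_MASK, and buf for rg_info_buf. The loop
stops adding holes once the next int32 would no longer fit, instead of
writing past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of Spread): append one hole sequence
 * number to a fixed-size buffer only if it fits.
 * Returns 1 on success, 0 if the buffer has no room left. */
int append_hole(char *buf, size_t buf_size, size_t *num_bytes, int32_t seq)
{
    if (*num_bytes > buf_size || buf_size - *num_bytes < sizeof(int32_t))
        return 0;
    memcpy(buf + *num_bytes, &seq, sizeof seq);
    *num_bytes += sizeof seq;
    return 1;
}

/* Hypothetical stand-in for the hole loop: scan (aru, highest] and record
 * every missing sequence number until the buffer is full.
 * Returns the number of holes actually written. */
int collect_holes(const int *exist, int mask, int32_t aru, int32_t highest,
                  char *buf, size_t buf_size, size_t *num_bytes)
{
    int     num_holes = 0;
    int32_t index;

    for (index = aru + 1; index <= highest; index++) {
        if (!exist[index & mask]) {
            if (!append_hole(buf, buf_size, num_bytes, index))
                break;          /* no room: stop instead of overrunning */
            num_holes++;
        }
    }
    return num_holes;
}
```

What the daemon should do with the holes that do not fit (for example,
request them on a later token) is a protocol question this sketch does
not try to answer.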
 
At the time of the crash some of the relevant global values were:
 
(gdb) p Highest_seq
$9 = 47491
(gdb) p My_aru
$10 = 1225
(gdb)
 
The partial dump of Packets[] is attached at the end if needed.
 
From these numbers it appears to me that the Spread daemon that crashed
had fallen far behind and the code does not handle that situation
properly...
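
To put those numbers in perspective: the loop in Create_form1() walks
every sequence number from My_aru+1 to Highest_seq and writes
sizeof(int32) bytes for each hole it finds. A small sketch of that
arithmetic follows, assuming a 4-byte int32 and the worst case in which
every packet in the range is missing (worst_case_hole_bytes() is a
hypothetical helper, not a Spread function):

```c
#include <stdint.h>

/* Hypothetical helper: upper bound on the bytes of hole entries the loop
 * could write if every sequence number in (my_aru, highest_seq] is a hole. */
long worst_case_hole_bytes(int32_t my_aru, int32_t highest_seq)
{
    if (highest_seq <= my_aru)
        return 0;
    return (long)(highest_seq - my_aru) * (long)sizeof(int32_t);
}
```

With My_aru = 1225 and Highest_seq = 47491 this gives 46266 candidate
sequence numbers and up to 185064 bytes of hole entries, which is the
figure to compare against the size of rg_info_buf.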
 
Also as I read the code that sets these globals I ran across the
following function:
 
 
 
static  void    Backoff_membership()
{
        int     pack_entry;
        int     i;
 
        pack_entry=-1;
        for( i=Last_discarded+1; i <= Highest_seq; i++ )
        {
               /* clear dummy messages */
               pack_entry = i & PACKET_MASK;
               if( Packets[pack_entry].exist == 3 )
                       Packets[pack_entry].exist = 0;
        }
 
        /* return Aru and My_aru */
        Aru = Last_discarded;
 
        My_aru = Last_discarded;
        for( i=Last_discarded+1; i <= Highest_seq; i++ )
        {
               if( !Packets[pack_entry].exist ) break;
               My_aru++;
        }
}
 
My question about this function is whether the second loop is coded
correctly. In other words, was the if( !Packets[pack_entry].exist ) break;
intended to check a pack_entry value that stays constant, or was there a
typo and pack_entry in this second loop was supposed to track the loop
index as in the first loop?
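
To make the difference concrete, here is a small sketch of the two
readings, using hypothetical stand-ins (exist[] for Packets[].exist,
mask for PACKET_MASK) rather than the Spread globals. As posted,
pack_entry still holds the value left by the last iteration of the first
loop (Highest_seq & PACKET_MASK), so the second loop tests the same slot
on every pass and My_aru ends up at either Last_discarded or
Highest_seq. If pack_entry is recomputed from i, My_aru stops just
before the first missing packet. Which behaviour was intended is exactly
the open question.

```c
#include <stdint.h>

/* Second loop as posted: pack_entry is never updated inside the loop, so
 * it keeps the value from the final iteration of the first loop. */
int32_t advance_aru_as_posted(const int *exist, int mask,
                              int32_t last_discarded, int32_t highest_seq)
{
    int     pack_entry = highest_seq & mask;  /* left over from first loop */
    int32_t my_aru = last_discarded;
    int32_t i;

    for (i = last_discarded + 1; i <= highest_seq; i++) {
        if (!exist[pack_entry]) break;        /* same slot every time */
        my_aru++;
    }
    return my_aru;
}

/* Alternative reading (an assumption, not confirmed by the developers):
 * pack_entry tracks the loop index, as it does in the first loop. */
int32_t advance_aru_tracking(const int *exist, int mask,
                             int32_t last_discarded, int32_t highest_seq)
{
    int32_t my_aru = last_discarded;
    int32_t i;

    for (i = last_discarded + 1; i <= highest_seq; i++) {
        int pack_entry = i & mask;
        if (!exist[pack_entry]) break;        /* stop at the first hole */
        my_aru++;
    }
    return my_aru;
}
```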
 
Your help is greatly appreciated in getting to the root of this issue.
 
Regards, Juan
 
 
 
 
 
 
(gdb) p Packets[1255]@6397
$7 = {{head = 0x82d50f0, body = 0x95b7d70, exist = 0, proc_index = 0},
{head = 0x89078d8, body = 0x8af1168, 
    exist = 0, proc_index = 0}, {head = 0x8411710, body = 0x8af1168,
exist = 0, proc_index = 0}, {head = 0x89078d8, 
    body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x8af1168, exist = 0, proc_index = 0}, {
    head = 0x89078d8, body = 0x8af1168, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x8af1168, exist = 0, 
    proc_index = 0}, {head = 0x89078d8, body = 0x95b7d70, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
    body = 0x8af1168, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x8af1168, exist = 0, proc_index = 0}, {
    head = 0x82d50f0, body = 0x8af1168, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x8af1168, exist = 0, 
    proc_index = 0}, {head = 0x82d50f0, body = 0x8af1168, exist = 0,
proc_index = 0}, {head = 0x8411710, 
    body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x89078d8,
body = 0x95b7d70, exist = 0, proc_index = 0}, {
    head = 0x8411710, body = 0x95b7d70, exist = 0, proc_index = 0},
{head = 0x89078d8, body = 0x95b7d70, exist = 0, 
    proc_index = 0}, {head = 0x8411710, body = 0x95b7d70, exist = 0,
proc_index = 0}, {head = 0x89078d8, 
    body = 0x95b7d70, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x9180cd8, exist = 0, proc_index = 0}, {
    head = 0x82d50f0, body = 0x92f3420, exist = 0, proc_index = 3},
{head = 0x8412130, body = 0x95b7d70, exist = 0, 
    proc_index = 3}, {head = 0x89078d8, body = 0x8af1168, exist = 0,
proc_index = 0}, {head = 0x8412130, 
    body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
    head = 0x82d1be0, body = 0x92f3420, exist = 0, proc_index = 0},
{head = 0x8411710, body = 0x95b7d70, exist = 0, 
    proc_index = 3}, {head = 0x89078d8, body = 0x9180cd8, exist = 0,
proc_index = 0}, {head = 0x8412130, 
    body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x93b9ad0, exist = 0, proc_index = 0}, {
    head = 0x82d1be0, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x8411e60, body = 0x8af1168, exist = 0, 
    proc_index = 3}, {head = 0x82d5e20, body = 0x92f3420, exist = 0,
proc_index = 3}, {head = 0x82d1b20, 
    body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x8411710,
body = 0x9180cd8, exist = 0, proc_index = 0}, {
    head = 0x89078d8, body = 0x9422a78, exist = 0, proc_index = 3},
{head = 0x81f3380, body = 0x89e8b10, exist = 0, 
    proc_index = 3}, {head = 0x82f6030, body = 0x92f3420, exist = 0,
proc_index = 3}, {head = 0x82e5938, 
    body = 0x8af1168, exist = 0, proc_index = 3}, {head = 0x82fc850,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
    head = 0x8412130, body = 0x93b9ad0, exist = 0, proc_index = 0},
{head = 0x82d5e20, body = 0x95b7d70, exist = 0, 
    proc_index = 3}, {head = 0x8411e60, body = 0x87463e0, exist = 0,
proc_index = 3}, {head = 0x82d1be0, 
    body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x8b24538, exist = 0, proc_index = 3}, {
    head = 0x82d1b20, body = 0x92f3420, exist = 0, proc_index = 0},
{head = 0x82e5938, body = 0x8b24538, exist = 0, 
    proc_index = 3}, {head = 0x82f6030, body = 0x8b30550, exist = 0,
proc_index = 3}, {head = 0x82f6030, 
    body = 0x8b24538, exist = 0, proc_index = 3}, {head = 0x82fc850,
body = 0x8b30550, exist = 0, proc_index = 0}, {
    head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0},
{head = 0x82fc850, body = 0x8b30550, exist = 0, 
    proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
    body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82f6030,
body = 0x8b30550, exist = 0, proc_index = 0}, {
    head = 0x82e5938, body = 0x8b30550, exist = 0, proc_index = 0},
{head = 0x82f6030, body = 0x8b30550, exist = 0, 
    proc_index = 0}, {head = 0x82e5938, body = 0x8b30550, exist = 0,
proc_index = 0}, {head = 0x82f6030, 
    body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x87463e0, exist = 0, proc_index = 3}, {
    head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x82d50f0, body = 0x93b9ad0, exist = 0, 
    proc_index = 3}, {head = 0x82d1be0, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x82e5938, 
    body = 0x93b9ad0, exist = 0, proc_index = 0}, {head = 0x82d50f0,
body = 0x95b7d70, exist = 0, proc_index = 3}, {
    head = 0x82d1b20, body = 0x95b7d70, exist = 0, proc_index = 3},
{head = 0x82fc850, body = 0x89e8b10, exist = 0, 
    proc_index = 3}, {head = 0x82f6030, body = 0x87463e0, exist = 0,
proc_index = 3}, {head = 0x82d1be0, 
    body = 0x8528b18, exist = 0, proc_index = 0}, {head = 0x82d50f0,
body = 0x8af1168, exist = 0, proc_index = 0}, {
    head = 0x82fc850, body = 0x8528b18, exist = 0, proc_index = 3},
{head = 0x82d1b20, body = 0x87463e0, exist = 0, 
    proc_index = 3}, {head = 0x82e5938, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x8411e60, 
    body = 0x95b7d70, exist = 0, proc_index = 3}, {head = 0x82d5e20,
body = 0x93b9ad0, exist = 0, proc_index = 3}, {
    head = 0x8412130, body = 0x9180cd8, exist = 0, proc_index = 3},
{head = 0x81f3380, body = 0x868a840, exist = 0, 
    proc_index = 3}, {head = 0x89078d8, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x82f6030, 
    body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x8411710,
body = 0x82f2560, exist = 0, proc_index = 3}, {
    head = 0x82d6760, body = 0x9422a78, exist = 0, proc_index = 3},
{head = 0x82d1be0, body = 0x9422a78, exist = 0, 
    proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0,
proc_index = 0}, {head = 0x8411710, 
    body = 0x82f2560, exist = 0, proc_index = 0}, {head = 0x82d6760,
body = 0x9422a78, exist = 0, proc_index = 0}, {
    head = 0x82d1be0, body = 0x9422a78, exist = 0, proc_index = 0},
{head = 0x82d6760, body = 0x9422a78, exist = 0, 
    proc_index = 0}, {head = 0x82d6760, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x82d6760, 
    body = 0x9422a78, exist = 0, proc_index = 3}, {head = 0x82d1be0,
body = 0x9422a78, exist = 0, proc_index = 1}, {
    head = 0x82d6760, body = 0x82f2560, exist = 0, proc_index = 1},
{head = 0x82d1be0, body = 0x93ac2d0, exist = 0, 
    proc_index = 1}, {head = 0x8411710, body = 0x8b24538, exist = 0,
proc_index = 1}, {head = 0x82f6030, 
    body = 0x93ac2d0, exist = 0, proc_index = 1}, {head = 0x82d1be0,
body = 0x8b24538, exist = 0, proc_index = 1}, {
    head = 0x8411710, body = 0x8b24538, exist = 0, proc_index = 0},
{head = 0x82d1be0, body = 0x8b24538, exist = 0, 
    proc_index = 1}, {head = 0x8411710, body = 0x82f2560, exist = 0,
proc_index = 1}, {head = 0x82f6030, 
    body = 0x868a840, exist = 0, proc_index = 1}, {head = 0x82d6760,
body = 0x93b9ad0, exist = 0, proc_index = 1}, {
    head = 0x89078d8, body = 0x89e8b10, exist = 0, proc_index = 1},
{head = 0x81f3380, body = 0x8528b18, exist = 0, 
    proc_index = 1}, {head = 0x8412130, body = 0x92f3420, exist = 0,
proc_index = 1}, {head = 0x82d5e20, 
    body = 0x84c7a98, exist = 0, proc_index = 1}, {head = 0x8411e60,
body = 0x87c7930, exist = 0, proc_index = 1}, {
    head = 0x82e5938, body = 0x9018c60, exist = 0, proc_index = 1},
{head = 0x82d1b20, body = 0x8472a88, exist = 0, 
    proc_index = 0}, {head = 0x82d50f0, body = 0x9018c60, exist = 0,
proc_index = 3}, {head = 0x868c1c8, 
    body = 0x87c7930, exist = 0, proc_index = 3}, {head = 0x82d2ea8,
body = 0x84c7a98, exist = 0, proc_index = 3}, {
    head = 0x82f1408, body = 0x92f3420, exist = 0, proc_index = 3},
{head = 0x82e5968, body = 0x8528b18, exist = 0, 
    proc_index = 3}, {head = 0x8309ff8, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x8907968, 
    body = 0x93b9ad0, exist = 0, proc_index = 3}, {head = 0x8503db8,
body = 0x868a840, exist = 0, proc_index = 3}, {
    head = 0x8689350, body = 0x82f2560, exist = 0, proc_index = 3},
{head = 0x8309930, body = 0x8b24538, exist = 0, 
    proc_index = 3}, {head = 0x86894f8, body = 0x96739d0, exist = 0,
proc_index = 3}, {head = 0x868bf88, 
    body = 0x9583ea0, exist = 0, proc_index = 3}, {head = 0x8309840,
body = 0x8c42888, exist = 0, proc_index = 3}, {
    head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
    proc_index = 0}, {head = 0x82fc850, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x8309840, 
    body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
    head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
    proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x8309840, 
    body = 0x8c42888, exist = 0, proc_index = 0}, {head = 0x868bf88,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
    head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x9583ea0, exist = 0, 
    proc_index = 0}, {head = 0x868bf88, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
    head = 0x82fc850, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x9583ea0, exist = 0, 
    proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x8309840, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
    head = 0x868bf88, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x82fc850, body = 0x9583ea0, exist = 0, 
    proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x868bf88, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
    head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x9583ea0, exist = 0, 
    proc_index = 0}, {head = 0x8309840, body = 0x8c42888, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x868bf88,
body = 0x9583ea0, exist = 0, proc_index = 0}, {
    head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x8309840, body = 0x8c42888, exist = 0, 
    proc_index = 0}, {head = 0x82fc850, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x868bf88, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x82fc850,
body = 0x8c42888, exist = 0, proc_index = 0}, {
    head = 0x8309840, body = 0x9583ea0, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x96739d0, exist = 0, 
    proc_index = 0}, {head = 0x82fc850, body = 0x94f4be8, exist = 0,
proc_index = 0}, {head = 0x86894f8, 
    body = 0x9583ea0, exist = 0, proc_index = 0}, {head = 0x8309840,
body = 0x96739d0, exist = 0, proc_index = 0}, {
    head = 0x82fc850, body = 0x8c42888, exist = 0, proc_index = 0},
{head = 0x868bf88, body = 0x9583ea0, exist = 0, 
    proc_index = 0}, {head = 0x86894f8, body = 0x9583ea0, exist = 0,
proc_index = 0}, {head = 0x82fc850, 
    body = 0x8528b18, exist = 0, proc_index = 3}, {head = 0x8309840,
body = 0x8c42888, exist = 0, proc_index = 3}, {
    head = 0x8309930, body = 0x96739d0, exist = 0, proc_index = 3},
{head = 0x8689350, body = 0x8b24538, exist = 0, 
    proc_index = 3}, {head = 0x8503db8, body = 0x82f2560, exist = 0,
proc_index = 3}, {head = 0x8907968, 
    body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8309ff8,
body = 0x93b9ad0, exist = 0, proc_index = 3}, {
    head = 0x868bf88, body = 0x8472a88, exist = 0, proc_index = 0},
{head = 0x82f1408, body = 0x93b9ad0, exist = 0, 
    proc_index = 3}, {head = 0x82d2ea8, body = 0x89e8b10, exist = 0,
proc_index = 3}, {head = 0x868c1c8, 
    body = 0x92f3420, exist = 0, proc_index = 3}, {head = 0x82d50f0,
body = 0x84c7a98, exist = 0, proc_index = 3}, {
    head = 0x82d1b20, body = 0x87c7930, exist = 0, proc_index = 3},
{head = 0x8309ff8, body = 0x87c7930, exist = 0, 
    proc_index = 3}, {head = 0x8907968, body = 0x9018c60, exist = 0,
proc_index = 3}, {head = 0x8503db8, 
    body = 0x868a840, exist = 0, proc_index = 3}, {head = 0x8689350,
body = 0x82f2560, exist = 0, proc_index = 3}, {
    head = 0x8309930, body = 0x8b24538, exist = 0, proc_index = 3},
{head = 0x82e5968, body = 0x8b24538, exist = 0, 
    proc_index = 0}, {head = 0x8309930, body = 0x8b24538, exist = 0,
proc_index = 0}, {head = 0x82e5968, 
    body = 0x868a840, exist = 0, proc_index = 0}, {head = 0x8689350,
body = 0x94f4be8, exist = 0, proc_index = 0}, {
    head = 0x8503db8, body = 0x9018c60, exist = 0, proc_index = 0},
{head = 0x8907968, body = 0x868a840, exist = 0, 
    proc_index = 0}, {head = 0x8503db8, body = 0x82f2560, exist = 0,
proc_index = 0}, {head = 0x8689350, 
    body = 0x9018c60, exist = 0, proc_index = 0}, {head = 0x82e5968,
body = 0x8b24538, exist = 0, proc_index = 2}, {
    head = 0x8309930, body = 0x84c7a98, exist = 0, proc_index = 0},
{head = 0x8309ff8, body = 0x94f4be8, exist = 0, 
    proc_index = 0}, {head = 0x82d1b20, body = 0x87c7930, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
    body = 0x89e8b10, exist = 0, proc_index = 0}, {head = 0x868c1c8,
body = 0x96739d0, exist = 0, proc_index = 0}, {
    head = 0x82e5968, body = 0x96739d0, exist = 0, proc_index = 3},
{head = 0x82d2ea8, body = 0x87c7930, exist = 0, 
    proc_index = 0}, {head = 0x868c1c8, body = 0x92f3420, exist = 0,
proc_index = 0}, {head = 0x82d50f0, 
    body = 0x8b24538, exist = 0, proc_index = 0}, {head = 0x82d1b20,
body = 0x84c7a98, exist = 0, proc_index = 0}...}
 

 

 

________________________________

From: Juan Gomez 
Sent: Tuesday, July 10, 2007 5:50 PM
To: 'spread-users at lists.spread.org'
Cc: Juan Gomez
Subject: Spread 4.0 memory corruption bug in Fill_form1()

 

Hi all:

 

I have noticed that after a long-running test Spread 4.0 crashes with a
corrupted stack at the following spot:

 

 

membership.c:

 

static    void      Fill_form1( sys_scatter *scat )
{
....
                /* New ring_info will fit, so create it */
                for( index = Last_discarded+1; index <= Highest_seq; index++ )
                {
                    pack_entry = index & PACKET_MASK;
                    if( ! Packets[pack_entry].exist )
                    {
                        *new_holes_procs_ptr = index;
                        Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index); <<<<< CRASH HERE
                        new_holes_procs_ptr++;
                        num_bytes     += sizeof(int32);
                        new_rg_info->num_holes++;
                    }
                }

}

 

The first stack trace I got was this:

 

(gdb) where
#0  0xb7ebe9da in getenv () from /lib/libc.so.6
#1  0xb7f10d19 in tzset_internal () from /lib/libc.so.6
#2  0xb7f11a28 in __tz_convert () from /lib/libc.so.6
#3  0xb7f0fca0 in localtime () from /lib/libc.so.6
#4  0x08053498 in Alarm (mask=512, message=0x806c171 "INSERT HOLE 2 IS %d\n") at alarm.c:146
#5  0x0805864e in Fill_form1 (scat=Variable "scat" is not available.
) at membership.c:1837
#6  0x00000f79 in ?? ()
#7  0x00000f7a in ?? ()
#8  0x00000f7b in ?? ()
#9  0x00000f7c in ?? ()
 
 
This obviously shows the stack was corrupted and since the only part of
the code I could suspect of causing this corruption was the following
buffer:
 
        char           rg_info_buf[sizeof(token_body)];
 
I added asserts to track exactly at what point we overrun the buffer and
found the following:
....
       rg_info_buf_end = rg_info_buf + sizeof(token_body);
....
 
                /* New ring_info will fit, so create it */
                for( index = Last_discarded+1; index <= Highest_seq; index++ )
                {
                    pack_entry = index & PACKET_MASK;
                    if( ! Packets[pack_entry].exist )
                    {
                        assert((rg_info_buf + num_bytes + sizeof(int32)) <= rg_info_buf_end); <<<<< TRIGGERED ASSERT
                       *new_holes_procs_ptr = index;
                       Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index);
                       new_holes_procs_ptr++;
                       num_bytes     += sizeof(int32);
                       new_rg_info->num_holes++;
                    }
                }
 
 
So it looks like we overrun rg_info_buf under some conditions, and I am
wondering whether anybody has seen this problem or whether any patches
have been issued for it. This was observed with an 8-node cluster.
 
 
Regards, Juan
 
 

 

 

 
