[Spread-users] Spread 4.0 memory corruption bug in Fill_form1()

Tue Jul 10 20:49:49 EDT 2007

Hi all:

I have noticed that after a long-running test, Spread 4.0 crashes with a
corrupted stack at the following spot:

membership.c:

static    void      Fill_form1( sys_scatter *scat )
{
....
                /* New ring_info will fit, so create it */
                for( index = Last_discarded+1; index <= Highest_seq; index++ )
                {
                    pack_entry = index & PACKET_MASK;
                    if( ! Packets[pack_entry].exist )
                    {
                        *new_holes_procs_ptr = index;
                        Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index); /* <<<<< CRASH HERE */
                        new_holes_procs_ptr++;
                        num_bytes     += sizeof(int32);
                        new_rg_info->num_holes++;
                    }
                }
}

The first stack trace I got was this:

(gdb) where
#0  0xb7ebe9da in getenv () from /lib/libc.so.6
#1  0xb7f10d19 in tzset_internal () from /lib/libc.so.6
#2  0xb7f11a28 in __tz_convert () from /lib/libc.so.6
#3  0xb7f0fca0 in localtime () from /lib/libc.so.6
#4  0x08053498 in Alarm (mask=512, message=0x806c171 "INSERT HOLE 2 IS %d\n")
    at alarm.c:146
#5  0x0805864e in Fill_form1 (scat=Variable "scat" is not available.
) at membership.c:1837
#6  0x00000f79 in ?? ()
#7  0x00000f7a in ?? ()
#8  0x00000f7b in ?? ()
#9  0x00000f7c in ?? ()

This clearly shows that the stack was corrupted: frames #6 to #9 hold
small consecutive values rather than code addresses. The only part of
the code I could suspect of causing this corruption was the following
buffer:

        char           rg_info_buf[sizeof(token_body)];

I added asserts to pinpoint exactly where we overrun the buffer and
found the following:
....
       rg_info_buf_end = rg_info_buf + sizeof(token_body);
....
                /* New ring_info will fit, so create it */
                for( index = Last_discarded+1; index <= Highest_seq; index++ )
                {
                    pack_entry = index & PACKET_MASK;
                    if( ! Packets[pack_entry].exist )
                    {
                        assert((rg_info_buf + num_bytes + sizeof(int32)) <= rg_info_buf_end); /* <<<<< TRIGGERED ASSERT */
                        *new_holes_procs_ptr = index;
                        Alarm( MEMB , "INSERT HOLE 2 IS %d\n",index);
                        new_holes_procs_ptr++;
                        num_bytes     += sizeof(int32);
                        new_rg_info->num_holes++;
                    }
                }

So it looks like we overrun rg_info_buf under some conditions. Has
anybody seen this problem, or have any patches been issued for it? This
was observed with an 8-node cluster.

Regards, Juan
