[Spread-users] Spread 4.0 memory corruption bug in Fill_form1()
Juan Gomez
jgomez at juniper.net
Tue Jul 10 20:49:49 EDT 2007
Hi all:
I have noticed that after a long-running test, Spread 4.0 crashes with a
corrupted stack at the following spot:
membership.c:
static void Fill_form1( sys_scatter *scat )
{
    ....
    /* New ring_info will fit, so create it */
    for( index = Last_discarded+1; index <= Highest_seq; index++ )
    {
        pack_entry = index & PACKET_MASK;
        if( ! Packets[pack_entry].exist )
        {
            *new_holes_procs_ptr = index;
            Alarm( MEMB , "INSERT HOLE 2 IS %d\n", index ); /* <<<<< CRASH HERE */
            new_holes_procs_ptr++;
            num_bytes += sizeof(int32);
            new_rg_info->num_holes++;
        }
    }
}
The first stack trace I got was this:
(gdb) where
#0 0xb7ebe9da in getenv () from /lib/libc.so.6
#1 0xb7f10d19 in tzset_internal () from /lib/libc.so.6
#2 0xb7f11a28 in __tz_convert () from /lib/libc.so.6
#3 0xb7f0fca0 in localtime () from /lib/libc.so.6
#4 0x08053498 in Alarm (mask=512, message=0x806c171 "INSERT HOLE 2 IS %d\n")
at alarm.c:146
#5 0x0805864e in Fill_form1 (scat=Variable "scat" is not available.
) at membership.c:1837
#6 0x00000f79 in ?? ()
#7 0x00000f7a in ?? ()
#8 0x00000f7b in ?? ()
#9 0x00000f7c in ?? ()
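This clearly shows that the stack was corrupted: frames #6 to #9 have
nonsense return addresses (0xf79 to 0xf7c, consecutive small values that
look like data rather than code addresses). Since the only part of the
code I could suspect of causing this corruption was the following buffer: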
char rg_info_buf[sizeof(token_body)];
I added asserts to pinpoint exactly where we overrun the buffer, and
found the following:
....
rg_info_buf_end = rg_info_buf + sizeof(token_body);
....
/* New ring_info will fit, so create it */
for( index = Last_discarded+1; index <= Highest_seq; index++ )
{
    pack_entry = index & PACKET_MASK;
    if( ! Packets[pack_entry].exist )
    {
        assert( (rg_info_buf + num_bytes + sizeof(int32)) <= rg_info_buf_end ); /* <<<<< TRIGGERED ASSERT */
        *new_holes_procs_ptr = index;
        Alarm( MEMB , "INSERT HOLE 2 IS %d\n", index );
        new_holes_procs_ptr++;
        num_bytes += sizeof(int32);
        new_rg_info->num_holes++;
    }
}
So it looks like we overrun rg_info_buf under some conditions. Has
anybody else seen this problem, or have any patches been issued for it?
This was observed with an 8-node cluster.
Regards, Juan