[Spread-users] Spread on Linux question

Crystal, Mayer mayer.crystal at gs.com
Mon Jul 4 13:52:15 EDT 2005


I've just started working with Spread and have set up a network which
consists of 4 spread daemons talking to each other.  Every so often a
spread daemon crashes or hangs: it still accepts connections but doesn't
do anything further.  We are running spread 3.17.3 on Linux
(2.4.21-20.0.2.ELsmp #1 SMP Thu Apr 21 15:17:44 EDT 2005 i686 athlon i386
GNU/Linux).  In one instance we captured a core dump, and this is what we
see:

    (gdb) bt
    #0  0xb750852a in _int_realloc () from /lib/tls/libc.so.6
    #1  0xb7507ad9 in _int_malloc () from /lib/tls/libc.so.6
    #2  0xb7507580 in calloc () from /lib/tls/libc.so.6
    #3  0x08053c66 in new (obj_type=50) at memory.c:433
    #4  0x0804b11a in Deliver_packet (pack_entry=112, to_copy=1) at protocol.c:948
    #5  0x0804b2a2 in Deliver_agreed_packets () at protocol.c:1010
    #6  0x0804a220 in Prot_handle_bcast (fd=3, dummy=0, dummy_p=0x0) at protocol.c:396
    #7  0x080537ad in E_handle_events () at events.c:673
    #8  0x080497b1 in main (argc=5, argv=0xbfffedd4) at spread.c:193

    (gdb) frame 4
    #4  0x0804b11a in Deliver_packet (pack_entry=112, to_copy=1) at protocol.c:948
    948     Packets[pack_entry].body = new(PACKET_BODY);

    (gdb) info locals
    proc_index = 1
    up_ptr = (up_queue *) 0x8eb8718
    pack_ptr = (packet_header *) 0x9d74310
    mess_link = (message_link *) 0x1
    index = 2
    (gdb) print *pack_ptr
    $6 = {type = -2147483512, transmiter_id = -1710970210,
      proc_id = -1710970210, memb_id = {proc_id = -1806277334,
      time = 1118954489}, seq = 7, fifo_seq = 2, packet_index = -2,
      data_len = 216}
				
We think that this is related to the Linux glibc issue raised in
http://groups-beta.google.com/group/linux.debian.bugs.dist/browse_thread/thread/ebd50c447c23eba9/199fcd078c5e100b?q=malloc_consolidate&rnum=5&hl=en#199fcd078c5e100b

In a related post we found:
http://download.fedora.redhat.com/pub/fedora/linux/core/3/i386/os/RELEASE-NOTES-en.html#id850584

Based on the last post we added MALLOC_CHECK_=1 to the start script, and
during execution we found "free(): invalid pointer 0x9d876a8!" in the log,
which seems to imply that an invalid free, possibly a double free, is
indeed occurring.  We can't figure out exactly what's happening, but we
believe that it is triggered when this spread daemon responds to another
spread daemon being shut down (this seemed to occur when 1 or 2 of the
machines running the spread daemons were rebooted).  I was wondering if
anyone has seen anything similar, whether this is a known issue, and
whether there is a patch or workaround for it.
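
To illustrate the kind of check that could help narrow down a suspected
double free of a packet body, here is a minimal sketch in C.  It is not
taken from the Spread sources: packet_slot, slot_alloc, and slot_release
are hypothetical names, and the only assumption is that each packet entry
owns a single heap-allocated body pointer.  The idea is simply to clear
the pointer on release so that a second release is reported instead of
reaching free() again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one packet entry that owns a body buffer. */
typedef struct {
    void *body;   /* NULL whenever no body is currently allocated */
} packet_slot;

/* Allocate a zeroed body for the slot.
 * Returns 0 on success, -1 if the slot already holds a body (which would
 * leak the old one) or if the allocation fails. */
int slot_alloc(packet_slot *slot, size_t body_size)
{
    assert(slot != NULL);
    if (slot->body != NULL) {
        fprintf(stderr, "slot_alloc: slot %p already holds a body\n",
                (void *)slot);
        return -1;
    }
    slot->body = calloc(1, body_size);
    return slot->body != NULL ? 0 : -1;
}

/* Release the slot's body and clear the pointer.
 * Returns 0 on success, -1 if there was nothing to release, which is the
 * case a double release would hit. */
int slot_release(packet_slot *slot)
{
    assert(slot != NULL);
    if (slot->body == NULL) {
        fprintf(stderr, "slot_release: slot %p released twice or never "
                "allocated\n", (void *)slot);
        return -1;
    }
    free(slot->body);
    slot->body = NULL;
    return 0;
}
```

With a guard like this, a second slot_release() on the same slot returns
-1 and logs a message rather than corrupting the heap.  It only catches
double releases through the same slot; a stale copy of the pointer held
elsewhere would still need a tool such as MALLOC_CHECK_ to detect.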


TIA,
Mayer
 

