Re: [Spread-users] Spread daemon seems to "forget" its name

Schroeder, Heiko, ADBM62 heiko.schroeder at eads.com
Tue Nov 9 06:09:28 EST 2004


Hi, 

I think I found the problem:
Temp_buf (in sess_body.h) is too small in our case;
it overflows in G_mess_to_groups. The linker happened
to place My right after this buffer, so the overflow
overwrites My (and with it My.name).

I don't fully understand the code yet, but shouldn't 
this buffer be able to hold at least MAX_MESSAGE_BODY_LEN 
bytes, which would be about 144k?

Anyway, increasing the buffer to this size solved
our problem here.
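
A minimal sketch of this failure mode, not the actual Spread code: one flat byte array stands in for the daemon's data segment, with a small buffer followed directly by a name field. The sizes and all function names (copy_unchecked, copy_checked, demo_overflow) are hypothetical, and the adjacent layout is an assumption taken from the description above. It only illustrates why an unchecked copy can blank the name while a length check would reject the oversized message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TEMP_BUF_LEN 16  /* hypothetical size, far smaller than the real buffer */
#define NAME_LEN      8  /* hypothetical size of the name field */

/* Bytes [0, TEMP_BUF_LEN) play the role of Temp_buf; the following
 * NAME_LEN bytes play the role of My.name placed right behind it. */
static unsigned char region[TEMP_BUF_LEN + NAME_LEN];

static const char *my_name(void)
{
    return (const char *)(region + TEMP_BUF_LEN);
}

static void set_name(const char *name)
{
    memset(region + TEMP_BUF_LEN, 0, NAME_LEN);
    strncpy((char *)(region + TEMP_BUF_LEN), name, NAME_LEN - 1);
}

/* Unchecked copy: trusts the caller's length, as an overflowing copy would. */
static void copy_unchecked(const void *src, size_t len)
{
    memcpy(region, src, len);
}

/* Checked copy: rejects data that does not fit. Returns 0 on success, -1 otherwise. */
static int copy_checked(const void *src, size_t len)
{
    if (len > TEMP_BUF_LEN)
        return -1;
    memcpy(region, src, len);
    return 0;
}

/* Returns 0 if the demonstration behaves as described, nonzero otherwise. */
static int demo_overflow(void)
{
    /* An oversized "message" of zero bytes. */
    unsigned char payload[TEMP_BUF_LEN + NAME_LEN] = {0};

    set_name("mfc2");

    /* The checked copy refuses the payload and the name survives. */
    if (copy_checked(payload, sizeof payload) != -1)
        return 1;
    if (strcmp(my_name(), "mfc2") != 0)
        return 2;

    /* The unchecked copy runs past the buffer part and blanks the name.
     * It stays inside region, so the sketch itself is well defined. */
    copy_unchecked(payload, sizeof payload);
    if (my_name()[0] != '\0')
        return 3;

    return 0;
}
```

Enlarging the buffer, as done here, removes the symptom; a length check like the one in copy_checked would additionally turn any remaining oversize case into a detectable error instead of silent corruption.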

CU

   Heiko

> -----Original Message-----
> From: Ryan Caudy [mailto:rcaudy at gmail.com]
> Sent: Tuesday, 9 November 2004 03:44
> To: Schroeder, Heiko, ADBM62
> Cc: spread-users at lists.spread.org
> Subject: Re: [Spread-users] Spread daemon seems to "forget" its name
> 
> Hi,
> 
> What OS are you using?  What kinds of things are your clients doing? 
> This isn't something that has turned up in ordinary testing, although
> I haven't put 3.17.3 through its paces the way I have with a slightly
> hacked 3.17.2, or a precursor to the current CVS head based on 3.17.2.
>  In order to reproduce the problem, it would help to have any
> descriptive information you can think of.
> 
> Cheers,
> Ryan
> 
> 
> On Mon, 8 Nov 2004 16:51:56 +0100, Schroeder, Heiko, ADBM62
> <heiko.schroeder at eads.com> wrote:
> > Hi,
> > 
> > we just came across a problem which (I think) hints at a
> > memory-management bug in Spread. This is with version 3.17.3.
> > 
> > We have a system of 12 hosts, each running several processes
> > that communicate using Spread.  When switching one of the
> > hosts off and on again, sometimes (in about 30-50% of all
> > cases!), the whole system breaks down. At first, the crash
> > was caused by an "illegal private name to kill" message.
> > I changed this message into a warning to see how the
> > system would react and switched SESSION debugging
> > on.
> > 
> > The following output comes from one of the hosts that
> > were not switched off (the others produce output that is
> > very similar):
> > 
> > [Mon 08 Nov 2004 13:57:18] Sess_read: queueing message of type 4 with len 0 to the protocol
> > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > [Mon 08 Nov 2004 13:57:19] Sess_read: queueing message of type 4 with len 0 to the protocol
> > Membership id is ( 176161537, 1099915040)
> > [Mon 08 Nov 2004 13:57:19] --------------------
> > [Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
> > [Mon 08 Nov 2004 13:57:19] Num Segments 1
> > [Mon 08 Nov 2004 13:57:19]      12      10.128.255.255    4803
> > [Mon 08 Nov 2004 13:57:19]              mfc1                    10.128.3.1
> > [Mon 08 Nov 2004 13:57:19]              mfc2                    10.128.3.2
> > [Mon 08 Nov 2004 13:57:19]              mfc3                    10.128.3.3
> > [Mon 08 Nov 2004 13:57:19]              mfc5                    10.128.3.5
> > [Mon 08 Nov 2004 13:57:19]              mfc6                    10.128.3.6
> > [Mon 08 Nov 2004 13:57:19]              siu1                    10.128.2.1
> > [Mon 08 Nov 2004 13:57:19]              siu2                    10.128.2.2
> > [Mon 08 Nov 2004 13:57:19]              siu3                    10.128.2.3
> > [Mon 08 Nov 2004 13:57:19]              siu5                    10.128.2.5
> > [Mon 08 Nov 2004 13:57:19]              gpcu1                   10.128.1.1
> > [Mon 08 Nov 2004 13:57:19]              gpcu2                   10.128.1.2
> > [Mon 08 Nov 2004 13:57:19]              gpcu3                   10.128.1.3
> > [Mon 08 Nov 2004 13:57:19] ====================
> > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
> > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3669 ( mailbox 27 )
> > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3637 ( mailbox 22 )
> > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3636#
> > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3669#
> > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P1274#
> > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P2135#
> > 
> > Just before the new configuration message is output, everything seems
> > to be fine. But after this, "My.name" is suddenly empty. All of the 11
> > "remaining" hosts showed the same problem; the one that "came back"
> > did not (might be by chance, though).
> > 
> > I'll try to investigate this further, but I'd be very happy if someone
> > who really understands the code could help here... ;-)
> > 
> > CU
> > 
> >    Heiko
> > 
> > --
> > Heiko Schröder
> > EADS Deutschland GmbH
> > Defence and Communication Systems
> > Naval Combat Systems (ADBM62)
> > Bontekai 55
> > 26382 Wilhelmshaven - Germany
> > Tel: +49 44 21.15 43-230
> > Fax: +49 44 21.15 43-111
> > e-Fax: +49 731.392-20 91 11
> > heiko.schroeder at eads.com
> > 
> > www.eads.com
> > 
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
> > 
> 
> 
> -- 
> ---------------------------------------------------------------------
> Ryan W. Caudy
> <rcaudy at gmail.com>
> ---------------------------------------------------------------------
> Bloomberg L.P.
> <rcaudy1 at bloomberg.net>
> ---------------------------------------------------------------------
> [Alumnus]
> <caudy at cnds.jhu.edu>         
> Center for Networking and Distributed Systems
> Department of Computer Science
> Johns Hopkins University          
> ---------------------------------------------------------------------
> 
> 



