Fwd: Re: Re: [Spread-users] Spread daemon seems to "forget" its name

Schroeder, Heiko, ADBM62 heiko.schroeder at eads.com
Mon Nov 15 03:34:10 EST 2004


Hi,

yes, the patch definitely fixed the problem.
Thanks a lot!

CU

   Heiko

> -----Original Message-----
> From: Ryan Caudy [mailto:rcaudy at gmail.com]
> Sent: Thursday, 11 November 2004 00:40
> To: Schroeder, Heiko, ADBM62
> Cc: spread-users at lists.spread.org
> Subject: Re: Re: Re: [Spread-users] Spread daemon seems to "forget" its name
> 
> Great!  I'm glad to hear that this was indeed correct.  Did you try
> using the fix I sent, or are you still using the increased Temp_buf
> that you mentioned yesterday?
> 
> Thanks,
> Ryan
> 
> 
> On Wed, 10 Nov 2004 09:19:53 +0100, Schroeder, Heiko, ADBM62
> <heiko.schroeder at eads.com> wrote:
> > Hi,
> > 
> > yes, I checked with objdump that My (and thus name, which is
> > its first element) does indeed immediately follow
> > Temp_buf. To confirm this, I added a check in
> > G_mess_to_groups that showed that there was an overflow.
> > So this was definitely the cause of the problem.
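
A check of the kind described above could look like the following minimal sketch. The names checked_copy, temp_buf and TEMP_BUF_SIZE are hypothetical and not taken from the Spread source; the 100000-byte size is only the figure quoted elsewhere in this thread.

```c
#include <stddef.h>
#include <string.h>

#define TEMP_BUF_SIZE 100000  /* assumed size, from the figure quoted in this thread */

static char temp_buf[TEMP_BUF_SIZE];  /* hypothetical stand-in for Temp_buf */

/* Copy len bytes into temp_buf only if they fit.
 * Returns 0 on success, -1 if the copy would overflow the buffer. */
static int checked_copy(const char *src, size_t len)
{
    if (len > sizeof(temp_buf)) {
        return -1;  /* would write past the end of temp_buf */
    }
    memcpy(temp_buf, src, len);
    return 0;
}
```

A caller would log or abort when checked_copy returns -1, which makes an overflow visible instead of letting it silently overwrite whatever follows the buffer.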
> > 
> > CU
> > 
> >   Heiko
> > 
> > > -----Original Message-----
> > > From: Ryan Caudy [mailto:rcaudy at gmail.com]
> > > Sent: Wednesday, 10 November 2004 02:05
> > > To: Schroeder, Heiko, ADBM62
> > > Cc: spread-users at lists.spread.org
> > > Subject: Re: Re: [Spread-users] Spread daemon seems to "forget" its name
> > >
> > > Hi,
> > >
> > > Temp_buf is misused, somewhat, in G_compute_and_notify, when the
> > > vs_sets are built there -- the size isn't checked properly.  This is a
> > > problem that we know about, and need to fix.
> > >
> > > I see how it's being misused where you found it, too.
> > > G_build_groups_bufs, the routine that builds the messages unpacked by
> > > G_mess_to_groups, takes great care to keep the size under 100000.
> > > Unfortunately, it doesn't take into account the message_header
> > > structure (which should be 48 bytes).  In an application like yours,
> > > there certainly is the potential to trigger this bug.
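
The accounting problem described above can be illustrated with a small sketch. TEMP_BUF_SIZE, HEADER_SIZE, max_body_len and fits_with_header are hypothetical names; the 100000-byte and 48-byte figures are the ones quoted in the message. This is not the attached patch, only an illustration of counting the header against the same budget as the body.

```c
#include <stddef.h>

#define TEMP_BUF_SIZE 100000  /* limit mentioned in the message */
#define HEADER_SIZE   48      /* message_header size as stated in the message */

/* Largest body that still fits once the header is counted. */
static size_t max_body_len(void)
{
    return TEMP_BUF_SIZE - HEADER_SIZE;
}

/* Returns 1 if header plus body fit in the buffer, 0 otherwise. */
static int fits_with_header(size_t body_len)
{
    return body_len <= max_body_len();
}
```

With these numbers, a body of exactly 100000 bytes passes a body-only limit but no longer fits once the 48 header bytes are added.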
> > >
> > > Did you confirm that this bug was the cause of your problem using a
> > > debugger?  Was My.name immediately following Temp_buf in memory?
> > >
> > > I've attached a patch against the 3.17 branch of CVS which should fix
> > > this bug, although I haven't tested it.  Please let me know if it
> > > solves your problem.
> > >
> > > Cheers,
> > > Ryan
> > >
> > > On Tue, 9 Nov 2004 12:09:28 +0100, Schroeder, Heiko, ADBM62
> > > <heiko.schroeder at eads.com> wrote:
> > > > Hi,
> > > >
> > > > > I think I found the problem:
> > > > > Temp_buf (in sess_body.h) seems to be too small in our
> > > > > case; it overflows in G_mess_to_groups. And the linker
> > > > > chose to place My after this buffer.
> > > >
> > > > I don't fully understand the code yet, but shouldn't
> > > > this buffer be able to hold at least MAX_MESSAGE_BODY_LEN
> > > > bytes, which would be about 144k?
> > > >
> > > > Anyway, increasing the buffer to this size solved
> > > > our problem here.
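
Why an overflow of one buffer can empty a neighbouring name can be shown with a small simulation. Overflowing a real global array is undefined behaviour in C, so this sketch puts both fields into one struct and writes through the struct as a whole, which keeps the behaviour defined. All names and sizes here (struct layout, BUF_SIZE, NAME_LEN, fill_from_start) are hypothetical and much smaller than anything in Spread.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16  /* deliberately tiny stand-in for Temp_buf */
#define NAME_LEN 8   /* hypothetical name length */

/* Two fields placed back to back, mimicking the layout the linker chose. */
struct layout {
    char temp_buf[BUF_SIZE];
    char name[NAME_LEN];
};

/* Write len zero bytes starting at the beginning of the struct.
 * When len exceeds BUF_SIZE, the extra bytes land in name.
 * The caller must keep len <= sizeof(struct layout). */
static void fill_from_start(struct layout *l, size_t len)
{
    memset(l, 0, len);
}
```

Writing exactly BUF_SIZE bytes leaves name untouched, while writing a single byte more zeroes name[0], so the name then reads as an empty string.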
> > > >
> > > >
> > > >
> > > > CU
> > > >
> > > >    Heiko
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Ryan Caudy [mailto:rcaudy at gmail.com]
> > > > > > Sent: Tuesday, 9 November 2004 03:44
> > > > > > To: Schroeder, Heiko, ADBM62
> > > > > > Cc: spread-users at lists.spread.org
> > > > > > Subject: Re: [Spread-users] Spread daemon seems to "forget" its name
> > > > >
> > > > > Hi,
> > > > >
> > > > > > What OS are you using?  What kinds of things are your clients doing?
> > > > > > This isn't something that has turned up in ordinary testing, although
> > > > > > I haven't put 3.17.3 through its paces the way I have with a slightly
> > > > > > hacked 3.17.2, or a precursor to the current CVS head based on 3.17.2.
> > > > > > In order to reproduce the problem, it would help to know any
> > > > > > descriptive information you can think of.
> > > > >
> > > > > Cheers,
> > > > > Ryan
> > > > >
> > > > >
> > > > > On Mon, 8 Nov 2004 16:51:56 +0100, Schroeder, Heiko, ADBM62
> > > > > <heiko.schroeder at eads.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > we just came across a problem which (I think) hints at a
> > > > > > > memory-management-related bug in Spread. This is
> > > > > > > with version 3.17.3.
> > > > > > >
> > > > > > > We have a system of 12 hosts, each hosting several processes
> > > > > > > that communicate using Spread.  When switching one of the
> > > > > > > hosts off and on again, sometimes (in about 30-50% of all
> > > > > > > cases!), the whole system breaks down. At first, the crash
> > > > > > > was because of an "illegal private name to kill" message.
> > > > > > > I changed this message into a warning to see how the
> > > > > > > system would react and switched SESSION debugging
> > > > > > > on.
> > > > > >
> > > > > > The following output comes from one of the hosts that
> > > > > > were not switched off (the others produce output that is
> > > > > > very similar):
> > > > > >
> > > > > > > [Mon 08 Nov 2004 13:57:18] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > > > > > Membership id is ( 176161537, 1099915040)
> > > > > > > [Mon 08 Nov 2004 13:57:19] --------------------
> > > > > > > [Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
> > > > > > > [Mon 08 Nov 2004 13:57:19] Num Segments 1
> > > > > > > [Mon 08 Nov 2004 13:57:19]      12      10.128.255.255    4803
> > > > > > > [Mon 08 Nov 2004 13:57:19]              mfc1     10.128.3.1
> > > > > > > [Mon 08 Nov 2004 13:57:19]              mfc2     10.128.3.2
> > > > > > > [Mon 08 Nov 2004 13:57:19]              mfc3     10.128.3.3
> > > > > > > [Mon 08 Nov 2004 13:57:19]              mfc5     10.128.3.5
> > > > > > > [Mon 08 Nov 2004 13:57:19]              mfc6     10.128.3.6
> > > > > > > [Mon 08 Nov 2004 13:57:19]              siu1     10.128.2.1
> > > > > > > [Mon 08 Nov 2004 13:57:19]              siu2     10.128.2.2
> > > > > > > [Mon 08 Nov 2004 13:57:19]              siu3     10.128.2.3
> > > > > > > [Mon 08 Nov 2004 13:57:19]              siu5     10.128.2.5
> > > > > > > [Mon 08 Nov 2004 13:57:19]              gpcu1    10.128.1.1
> > > > > > > [Mon 08 Nov 2004 13:57:19]              gpcu2    10.128.1.2
> > > > > > > [Mon 08 Nov 2004 13:57:19]              gpcu3    10.128.1.3
> > > > > > > [Mon 08 Nov 2004 13:57:19] ====================
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3669 ( mailbox 27 )
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3637 ( mailbox 22 )
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3636#
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3669#
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P1274#
> > > > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P2135#
> > > > > >
> > > > > > > Just before the new configuration message is output, everything seems
> > > > > > > to be fine. But after this, "My.name" is suddenly empty. All of the 11
> > > > > > > "remaining" hosts showed the same problem; the one that "came back"
> > > > > > > did not (might be by chance, though).
> > > > > > >
> > > > > > > I'll try to investigate this further, but I'd be very happy if someone who
> > > > > > > really understands the code could help here... ;-)
> > > > > >
> > > > > > CU
> > > > > >
> > > > > >    Heiko
> > > > > >
> > > > > > --
> > > > > > Heiko Schröder
> > > > > > EADS Deutschland GmbH
> > > > > > Defence and Communication Systems
> > > > > > Naval Combat Systems (ADBM62)
> > > > > > Bontekai 55
> > > > > > 26382 Wilhelmshaven - Germany
> > > > > > Tel: +49 44 21.15 43-230
> > > > > > Fax: +49 44 21.15 43-111
> > > > > > e-Fax: +49 731.392-20 91 11
> > > > > > heiko.schroeder at eads.com
> > > > > >
> > > > > > www.eads.com
> > > > > >
> > > > > > _______________________________________________
> > > > > > Spread-users mailing list
> > > > > > Spread-users at lists.spread.org
> > > > > > http://lists.spread.org/mailman/listinfo/spread-users
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > 
> ---------------------------------------------------------------------
> > > > > Ryan W. Caudy
> > > > > <rcaudy at gmail.com>
> > > > >
> > > 
> ---------------------------------------------------------------------
> > > > > Bloomberg L.P.
> > > > > <rcaudy1 at bloomberg.net>
> > > > >
> > > 
> ---------------------------------------------------------------------
> > > > > [Alumnus]
> > > > > <caudy at cnds.jhu.edu>
> > > > > Center for Networking and Distributed Systems
> > > > > Department of Computer Science
> > > > > Johns Hopkins University
> > > > >
> > > 
> ---------------------------------------------------------------------
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > 
> > 
> 
> 
> 



