AW: AW: [Spread-users] Spread daemon seems to "forget" its name

Ryan Caudy rcaudy at gmail.com
Wed Nov 10 18:39:45 EST 2004


Great!  I'm glad to hear that this was indeed correct.  Did you try
using the fix I sent, or are you still using the increased Temp_buf
that you mentioned yesterday?

Thanks,
Ryan


On Wed, 10 Nov 2004 09:19:53 +0100, Schroeder, Heiko, ADBM62
<heiko.schroeder at eads.com> wrote:
> Hi,
> 
> Yes, I checked with objdump that My (and thus name, its
> first member) does indeed immediately follow Temp_buf.
> To confirm this, I added a check in G_mess_to_groups,
> which showed that there was an overflow.
> So this was definitely the cause of the problem.
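
The mechanism described here can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct layout` stands in for two globals that the linker happened to place back to back, the sizes are tiny placeholders, and `copy_unchecked`, `copy_checked` and `name_survives` are hypothetical names, not Spread functions. Copying into the enclosing struct keeps the demonstration well defined while still showing the bytes landing in the neighbouring name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_LEN = 16, NAME_LEN = 8 };   /* placeholder sizes, not Spread's */

/* Stand-in for Temp_buf followed directly by My.name in memory. */
struct layout {
    char temp_buf[BUF_LEN];
    char name[NAME_LEN];
};

static_assert(offsetof(struct layout, name) == BUF_LEN,
              "name must start right after temp_buf for this model");

/* Unchecked copy: trusts len, so bytes past BUF_LEN land in name. */
static void copy_unchecked(struct layout *l, const char *src, size_t len)
{
    assert(len <= sizeof *l);          /* keep the model itself in bounds */
    memcpy((char *)l, src, len);
}

/* Checked copy: rejects anything larger than the buffer.
 * Returns 0 on success, -1 if the message would overflow. */
static int copy_checked(struct layout *l, const char *src, size_t len)
{
    if (len > sizeof l->temp_buf)
        return -1;
    memcpy(l->temp_buf, src, len);
    return 0;
}

/* Copies one oversized, zero-filled message and reports whether the
 * neighbouring name is still "mfc2" afterwards (1 = intact, 0 = lost). */
static int name_survives(int use_check)
{
    struct layout l;
    char msg[BUF_LEN + 4] = {0};       /* 4 bytes longer than the buffer */

    memset(&l, 0, sizeof l);
    strcpy(l.name, "mfc2");

    if (use_check)
        (void)copy_checked(&l, msg, sizeof msg);
    else
        copy_unchecked(&l, msg, sizeof msg);

    return strcmp(l.name, "mfc2") == 0;
}
```

In this model the unchecked copy leaves the name empty, which matches the symptom reported at the start of the thread, while the checked copy rejects the message and leaves the name alone.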
> 
> CU
> 
>   Heiko
> 
> > -----Original Message-----
> > From: Ryan Caudy [mailto:rcaudy at gmail.com]
> > Sent: Wednesday, 10 November 2004 02:05
> > To: Schroeder, Heiko, ADBM62
> > Cc: spread-users at lists.spread.org
> > Subject: Re: AW: [Spread-users] Spread daemon seems to
> > "forget" its name
> >
> > Hi,
> >
> > Temp_buf is misused, somewhat, in G_compute_and_notify, when the
> > vs_sets are built there -- the size isn't checked properly.  This is a
> > problem that we know about, and need to fix.
> >
> > I see how it's being misused where you found it, too.
> > G_build_groups_bufs, the routine that builds the messages unpacked by
> > G_mess_to_groups, takes great care to keep the size under 100000.
> > Unfortunately, it doesn't take into account the message_header
> > structure (which should be 48 bytes).  In an application like yours,
> > there certainly is the potential to trigger this bug.
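
The arithmetic behind this can be sketched as follows. It is only an illustration, assuming that Temp_buf holds 100000 bytes and that message_header is 48 bytes as stated above; `TEMP_BUF_SIZE`, `HEADER_SIZE`, `max_groups_payload` and `message_fits` are hypothetical names rather than Spread identifiers, and this is not the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values taken from the discussion, not read from the Spread source. */
#define TEMP_BUF_SIZE 100000u   /* limit the groups message is kept under */
#define HEADER_SIZE   48u       /* quoted size of message_header */

static_assert(HEADER_SIZE < TEMP_BUF_SIZE, "header must fit in the buffer");

/* Largest groups payload that still leaves room for the header. */
static size_t max_groups_payload(void)
{
    return TEMP_BUF_SIZE - HEADER_SIZE;
}

/* Returns 1 if header plus payload fit in the buffer, 0 otherwise. */
static int message_fits(size_t payload_len)
{
    return payload_len <= max_groups_payload();
}
```

With these numbers, a 99990-byte groups payload passes an "under 100000" test yet does not fit once the header is counted.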
> >
> > Did you confirm that this bug was the cause of your problem using a
> > debugger?  Was My.name immediately following Temp_buf in memory?
> >
> > I've attached a patch against the 3.17 branch of CVS which should fix
> > this bug, although I haven't tested it.  Please let me know if it
> > solves your problem.
> >
> > Cheers,
> > Ryan
> >
> > On Tue, 9 Nov 2004 12:09:28 +0100, Schroeder, Heiko, ADBM62
> > <heiko.schroeder at eads.com> wrote:
> > > Hi,
> > >
> > > I think I found the problem:
> > > Temp_buf (in sess_body.h) seems to be too small in our
> > > case; it overflows in G_mess_to_groups. And the linker
> > > chose to place My after this buffer.
> > >
> > > I don't fully understand the code yet, but shouldn't
> > > this buffer be able to hold at least MAX_MESSAGE_BODY_LEN
> > > bytes, which would be about 144k?
> > >
> > > Anyway, increasing the buffer to this size solved
> > > our problem here.
> > >
> > >
> > >
> > > CU
> > >
> > >    Heiko
> > >
> > > > -----Original Message-----
> > > > From: Ryan Caudy [mailto:rcaudy at gmail.com]
> > > > Sent: Tuesday, 9 November 2004 03:44
> > > > To: Schroeder, Heiko, ADBM62
> > > > Cc: spread-users at lists.spread.org
> > > > Subject: Re: [Spread-users] Spread daemon seems to "forget" its name
> > > >
> > > > Hi,
> > > >
> > > > What OS are you using?  What kinds of things are your clients doing?
> > > > This isn't something that has turned up in ordinary testing, although
> > > > I haven't put 3.17.3 through its paces the way I have with a slightly
> > > > hacked 3.17.2, or a precursor to the current CVS head based on 3.17.2.
> > > > In order to reproduce the problem, it would help to know any
> > > > descriptive information you can think of.
> > > >
> > > > Cheers,
> > > > Ryan
> > > >
> > > >
> > > > On Mon, 8 Nov 2004 16:51:56 +0100, Schroeder, Heiko, ADBM62
> > > > <heiko.schroeder at eads.com> wrote:
> > > > > Hi,
> > > > >
> > > > > we just came across a problem which (I think) hints at a
> > > > > memory-management-related bug in Spread. This is
> > > > > with version 3.17.3.
> > > > >
> > > > > We have a system of 12 hosts, each hosting several processes
> > > > > that communicate using Spread.  When switching one of the
> > > > > hosts off and on again, sometimes (in about 30-50% of all
> > > > > cases!), the whole system breaks down. At first, the crash
> > > > > was because of an "illegal private name to kill" message.
> > > > > I changed this message into a warning to see how the
> > > > > system would react and switched SESSION debugging
> > > > > on.
> > > > >
> > > > > The following output comes from one of the hosts that
> > > > > were not switched off (the others produce output that is
> > > > > very similar):
> > > > >
> > > > > [Mon 08 Nov 2004 13:57:18] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > > > Membership id is ( 176161537, 1099915040)
> > > > > [Mon 08 Nov 2004 13:57:19] --------------------
> > > > > [Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
> > > > > [Mon 08 Nov 2004 13:57:19] Num Segments 1
> > > > > [Mon 08 Nov 2004 13:57:19]      12      10.128.255.255    4803
> > > > > [Mon 08 Nov 2004 13:57:19]              mfc1     10.128.3.1
> > > > > [Mon 08 Nov 2004 13:57:19]              mfc2     10.128.3.2
> > > > > [Mon 08 Nov 2004 13:57:19]              mfc3     10.128.3.3
> > > > > [Mon 08 Nov 2004 13:57:19]              mfc5     10.128.3.5
> > > > > [Mon 08 Nov 2004 13:57:19]              mfc6     10.128.3.6
> > > > > [Mon 08 Nov 2004 13:57:19]              siu1     10.128.2.1
> > > > > [Mon 08 Nov 2004 13:57:19]              siu2     10.128.2.2
> > > > > [Mon 08 Nov 2004 13:57:19]              siu3     10.128.2.3
> > > > > [Mon 08 Nov 2004 13:57:19]              siu5     10.128.2.5
> > > > > [Mon 08 Nov 2004 13:57:19]              gpcu1    10.128.1.1
> > > > > [Mon 08 Nov 2004 13:57:19]              gpcu2    10.128.1.2
> > > > > [Mon 08 Nov 2004 13:57:19]              gpcu3    10.128.1.3
> > > > > [Mon 08 Nov 2004 13:57:19] ====================
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3669 ( mailbox 27 )
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3637 ( mailbox 22 )
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3636#
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3669#
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P1274#
> > > > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P2135#
> > > > >
> > > > > Just before the new configuration message is output, everything
> > > > > seems to be fine. But after this, the "My.name" is suddenly empty.
> > > > > All of the 11 "remaining" hosts showed the same problem; the one
> > > > > that "came back" did not (might be by chance, though).
> > > > >
> > > > > I'll try to investigate this further, but I'd be very happy if
> > > > > someone who really understands the code could help here... ;-)
> > > > >
> > > > > CU
> > > > >
> > > > >    Heiko
> > > > >
> > > > > --
> > > > > Heiko Schröder
> > > > > EADS Deutschland GmbH
> > > > > Defence and Communication Systems
> > > > > Naval Combat Systems (ADBM62)
> > > > > Bontekai 55
> > > > > 26382 Wilhelmshaven - Germany
> > > > > Tel: +49 44 21.15 43-230
> > > > > Fax: +49 44 21.15 43-111
> > > > > e-Fax: +49 731.392-20 91 11
> > > > > heiko.schroeder at eads.com
> > > > >
> > > > > www.eads.com
> > > > >
> > > > > _______________________________________________
> > > > > Spread-users mailing list
> > > > > Spread-users at lists.spread.org
> > > > > http://lists.spread.org/mailman/listinfo/spread-users
> > > > >
> > > >
> > > >
> > > > --
> > > > ---------------------------------------------------------------------
> > > > Ryan W. Caudy
> > > > <rcaudy at gmail.com>
> > > > ---------------------------------------------------------------------
> > > > Bloomberg L.P.
> > > > <rcaudy1 at bloomberg.net>
> > > > ---------------------------------------------------------------------
> > > > [Alumnus]
> > > > <caudy at cnds.jhu.edu>
> > > > Center for Networking and Distributed Systems
> > > > Department of Computer Science
> > > > Johns Hopkins University
> > > > ---------------------------------------------------------------------
> > > >
> > > >
> > >
> >
> >
> >
> 
> 


-- 
---------------------------------------------------------------------
Ryan W. Caudy
<rcaudy at gmail.com>
---------------------------------------------------------------------
Bloomberg L.P.
<rcaudy1 at bloomberg.net>
---------------------------------------------------------------------
[Alumnus]
<caudy at cnds.jhu.edu>         
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University          
---------------------------------------------------------------------



