Re: [Spread-users] Spread daemon seems to "forget" its name

Ryan Caudy rcaudy at gmail.com
Tue Nov 9 20:07:23 EST 2004


Hi,

Your application sounds like one that might benefit from the changes
that are in the current CVS head branch.  In case you're interested in
trying it out, I've attached a patch against that version to fix the
bug you found in G_mess_to_groups.  Like the other patch, I haven't
tested this yet.

Cheers,
Ryan


On Tue, 9 Nov 2004 08:27:01 +0100, Schroeder, Heiko, ADBM62
<heiko.schroeder at eads.com> wrote:
> Hi,
> 
> we're using RedHat Linux 8.0.
> 
> What is different from most other Spread applications
> (I guess) is that we're using lots of groups (I think
> about 2500 should have been active at the time, but I'll
> check this) and quite a few clients, perhaps 10-20 per
> node. Each node runs its own spread daemon, i.e. clients
> connect locally using UNIX domain sockets.
> 
> The whole system is a publish/subscribe system, i.e.
> each client normally joins several groups and produces
> data based on the information it receives.
> 
> I'm trying to reproduce this with a simpler setup, too,
> but unfortunately haven't been successful yet.
> 
> CU
> 
>    Heiko
> 
> > -----Original Message-----
> > From: Ryan Caudy [mailto:rcaudy at gmail.com]
> > Sent: Tuesday, 9 November 2004 03:44
> > To: Schroeder, Heiko, ADBM62
> > Cc: spread-users at lists.spread.org
> > Subject: Re: [Spread-users] Spread daemon seems to "forget" its name
> >
> > Hi,
> >
> > What OS are you using?  What kinds of things are your clients doing?
> > This isn't something that has turned up in ordinary testing, although
> > I haven't put 3.17.3 through its paces the way I have with a slightly
> > hacked 3.17.2, or a precursor to the current CVS head based on 3.17.2.
> > In order to reproduce the problem, it would help to know any
> > descriptive information you can think of.
> >
> > Cheers,
> > Ryan
> >
> >
> > On Mon, 8 Nov 2004 16:51:56 +0100, Schroeder, Heiko, ADBM62
> > <heiko.schroeder at eads.com> wrote:
> > > Hi,
> > >
> > > we just came across a problem which (I think) hints at a
> > > memory-management-related bug in Spread. This is
> > > with version 3.17.3.
> > >
> > > We have a system of 12 hosts, each hosting several processes
> > > that communicate using Spread.  When switching one of the
> > > hosts off and on again, sometimes (in about 30-50% of all
> > > cases!), the whole system breaks down. At first, the crash
> > > was because of an "illegal private name to kill" message.
> > > I changed this message into a warning to see how the
> > > system would react and switched SESSION debugging
> > > on.
> > >
> > > The following output comes from one of the hosts that
> > > were not switched off (the others produce output that is
> > > very similar):
> > >
> > > [Mon 08 Nov 2004 13:57:18] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > [Mon 08 Nov 2004 13:57:19] Sess_read: queueing message of type 4 with len 0 to the protocol
> > > Membership id is ( 176161537, 1099915040)
> > > [Mon 08 Nov 2004 13:57:19] --------------------
> > > [Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
> > > [Mon 08 Nov 2004 13:57:19] Num Segments 1
> > > [Mon 08 Nov 2004 13:57:19]      12      10.128.255.255    4803
> > > [Mon 08 Nov 2004 13:57:19]              mfc1    10.128.3.1
> > > [Mon 08 Nov 2004 13:57:19]              mfc2    10.128.3.2
> > > [Mon 08 Nov 2004 13:57:19]              mfc3    10.128.3.3
> > > [Mon 08 Nov 2004 13:57:19]              mfc5    10.128.3.5
> > > [Mon 08 Nov 2004 13:57:19]              mfc6    10.128.3.6
> > > [Mon 08 Nov 2004 13:57:19]              siu1    10.128.2.1
> > > [Mon 08 Nov 2004 13:57:19]              siu2    10.128.2.2
> > > [Mon 08 Nov 2004 13:57:19]              siu3    10.128.2.3
> > > [Mon 08 Nov 2004 13:57:19]              siu5    10.128.2.5
> > > [Mon 08 Nov 2004 13:57:19]              gpcu1    10.128.1.1
> > > [Mon 08 Nov 2004 13:57:19]              gpcu2    10.128.1.2
> > > [Mon 08 Nov 2004 13:57:19]              gpcu3    10.128.1.3
> > > [Mon 08 Nov 2004 13:57:19] ====================
> > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
> > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3669 ( mailbox 27 )
> > > [Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
> > > [Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
> > > [Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3637 ( mailbox 22 )
> > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3636#
> > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3669#
> > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P1274#
> > > [Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P2135#
> > >
> > > Just before the new configuration message is output, everything seems
> > > to be fine. But after this, the "My.name" is suddenly empty. All of the 11
> > > "remaining" hosts showed the same problem; the one that "came back"
> > > did not (might be by chance, though).
> > >
> > > I'll try to investigate this further but I'd be very happy if someone who
> > > really understands the code could help here... ;-)
> > >
> > > CU
> > >
> > >    Heiko
> > >
> > > --
> > > Heiko Schröder
> > > EADS Deutschland GmbH
> > > Defence and Communication Systems
> > > Naval Combat Systems (ADBM62)
> > > Bontekai 55
> > > 26382 Wilhelmshaven - Germany
> > > Tel: +49 44 21.15 43-230
> > > Fax: +49 44 21.15 43-111
> > > e-Fax: +49 731.392-20 91 11
> > > heiko.schroeder at eads.com
> > >
> > > www.eads.com
> > >
> > > _______________________________________________
> > > Spread-users mailing list
> > > Spread-users at lists.spread.org
> > > http://lists.spread.org/mailman/listinfo/spread-users
> > >
> >
> >
> > --
> > ---------------------------------------------------------------------
> > Ryan W. Caudy
> > <rcaudy at gmail.com>
> > ---------------------------------------------------------------------
> > Bloomberg L.P.
> > <rcaudy1 at bloomberg.net>
> > ---------------------------------------------------------------------
> > [Alumnus]
> > <caudy at cnds.jhu.edu>
> > Center for Networking and Distributed Systems
> > Department of Computer Science
> > Johns Hopkins University
> > ---------------------------------------------------------------------
> >
> >
> 


-- 
---------------------------------------------------------------------
Ryan W. Caudy
<rcaudy at gmail.com>
---------------------------------------------------------------------
Bloomberg L.P.
<rcaudy1 at bloomberg.net>
---------------------------------------------------------------------
[Alumnus]
<caudy at cnds.jhu.edu>         
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University          
---------------------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: groups.c.patch
Type: application/octet-stream
Size: 1532 bytes
Desc: not available
Url : http://lists.spread.org/pipermail/spread-users/attachments/20041109/e9287bbc/attachment.obj 
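
For readers following the log in the quoted report: the repeated
"proc name mfc2 is not my name" lines are what one would expect if the
daemon's stored name became empty while its local clients kept sending
headers carrying the old name. The following is a minimal,
self-contained sketch of that failure mode. It is not Spread's code:
the struct names, the function names and the MAX_PROC_NAME value are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROC_NAME 20 /* assumed buffer size for this sketch */

/* Hypothetical stand-in for the daemon's record of its own name. */
struct daemon_identity {
    char name[MAX_PROC_NAME];
};

/* Hypothetical stand-in for the header a local client sends. */
struct client_header {
    char proc_name[MAX_PROC_NAME];
};

/* Returns true when the header names this daemon. */
bool header_is_for_me(const struct daemon_identity *me,
                      const struct client_header *hdr)
{
    assert(me != NULL && hdr != NULL);
    return strncmp(me->name, hdr->proc_name, MAX_PROC_NAME) == 0;
}

/* Counts the headers that fail validation; in the log above, each such
 * failure is followed by the client's session being killed. */
size_t count_rejected(const struct daemon_identity *me,
                      const struct client_header *hdrs, size_t n)
{
    size_t rejected = 0;
    for (size_t i = 0; i < n; i++) {
        if (!header_is_for_me(me, &hdrs[i])) {
            rejected++;
        }
    }
    return rejected;
}
```

With the daemon name set to "mfc2", headers naming "mfc2" all pass.
Once the name buffer is zeroed, every one of them is rejected, which
would explain why all local sessions are dropped at the same moment.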

