[Spread-users] Spread daemon seems to "forget" its name

Schroeder, Heiko, ADBM62 heiko.schroeder at eads.com
Mon Nov 8 10:51:56 EST 2004


Hi,

we just came across a problem which (I think) hints at a
memory-management bug in Spread. This is
with version 3.17.3.

We have a system of 12 hosts, each of which runs several processes
that communicate using Spread. When switching one of the
hosts off and on again, sometimes (in about 30-50% of all
cases!) the whole system breaks down. At first, the crash
was caused by an "illegal private name to kill" message.
I changed this message into a warning to see how the
system would react and switched SESSION debugging
on.

The following output comes from one of the hosts that
were not switched off (the others produce output that is
very similar):

[Mon 08 Nov 2004 13:57:18] Sess_read: queueing message of type 4 with len 0 to the protocol
[Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
[Mon 08 Nov 2004 13:57:19] Sess_read: queueing message of type 4 with len 0 to the protocol
Membership id is ( 176161537, 1099915040)
[Mon 08 Nov 2004 13:57:19] --------------------
[Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
[Mon 08 Nov 2004 13:57:19] Num Segments 1
[Mon 08 Nov 2004 13:57:19]      12      10.128.255.255    4803
[Mon 08 Nov 2004 13:57:19]              mfc1                    10.128.3.1

[Mon 08 Nov 2004 13:57:19]              mfc2                    10.128.3.2

[Mon 08 Nov 2004 13:57:19]              mfc3                    10.128.3.3

[Mon 08 Nov 2004 13:57:19]              mfc5                    10.128.3.5

[Mon 08 Nov 2004 13:57:19]              mfc6                    10.128.3.6

[Mon 08 Nov 2004 13:57:19]              siu1                    10.128.2.1

[Mon 08 Nov 2004 13:57:19]              siu2                    10.128.2.2

[Mon 08 Nov 2004 13:57:19]              siu3                    10.128.2.3

[Mon 08 Nov 2004 13:57:19]              siu5                    10.128.2.5

[Mon 08 Nov 2004 13:57:19]              gpcu1                   10.128.1.1

[Mon 08 Nov 2004 13:57:19]              gpcu2                   10.128.1.2

[Mon 08 Nov 2004 13:57:19]              gpcu3                   10.128.1.3

[Mon 08 Nov 2004 13:57:19] ====================
[Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
[Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
[Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
[Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
[Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
[Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3669 ( mailbox 27 )
[Mon 08 Nov 2004 13:57:19] Sess_read: Message has type field 0x80000084
[Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not my name
[Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3637 ( mailbox 22 )
[Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3636#
[Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P3669#
[Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P1274#
[Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill #P2135#


Just before the new configuration message is output, everything seems
to be fine. But after this, "My.name" is suddenly empty. All of the 11
"remaining" hosts showed the same problem; the one that "came back"
did not (might be by chance, though).
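
One way to check the logs of the other hosts for the same pattern is a
small script along the following lines. It is only a sketch: the regular
expressions are copied from the log excerpt above, the function name
summarize and the dictionary keys are made up for illustration, and other
Spread versions may word these messages differently.

```python
import re

# Message patterns copied from the log excerpt above (Spread 3.17.3 with
# SESSION debugging on). They are assumptions about the log wording, not
# part of any Spread API.
CONFIG_RE = re.compile(r"Configuration at (\S+) is:")
MISMATCH_RE = re.compile(r"Sess_validate_read_header: proc name (\S+) is not")
KILL_RE = re.compile(r"Sess_kill: killing session (\S+)")
ILLEGAL_RE = re.compile(r"Sess_handle_kill: Illegal private name to kill")


def summarize(log_text):
    """Count name-mismatch events before and after a configuration dump."""
    summary = {
        "config_host": None,            # host named in the latest config dump
        "mismatches_before_config": 0,  # "is not my name" before any dump
        "mismatches_after_config": 0,   # "is not my name" after a dump
        "own_name_rejected": False,     # rejected name equals config_host
        "killed_sessions": [],          # private names from Sess_kill lines
        "illegal_kills": 0,             # "Illegal private name to kill" lines
    }
    for line in log_text.splitlines():
        m = CONFIG_RE.search(line)
        if m:
            summary["config_host"] = m.group(1)
            continue
        m = MISMATCH_RE.search(line)
        if m:
            if summary["config_host"] is None:
                summary["mismatches_before_config"] += 1
            else:
                summary["mismatches_after_config"] += 1
                # The daemon refusing the very name it just printed for
                # itself is the symptom described above.
                if m.group(1) == summary["config_host"]:
                    summary["own_name_rejected"] = True
            continue
        m = KILL_RE.search(line)
        if m:
            summary["killed_sessions"].append(m.group(1))
            continue
        if ILLEGAL_RE.search(line):
            summary["illegal_kills"] += 1
    return summary


# Shortened sample taken from the excerpt above, including the mail
# client's line wrapping.
SAMPLE = """\
[Mon 08 Nov 2004 13:57:19] Configuration at mfc2 is:
[Mon 08 Nov 2004 13:57:19] Sess_validate_read_header: proc name mfc2 is not
my name
[Mon 08 Nov 2004 13:57:19] Sess_kill: killing session P3636 ( mailbox 24 )
[Mon 08 Nov 2004 13:57:19] Sess_handle_kill: Illegal private name to kill
#P3636#
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

If "own_name_rejected" comes out true and "mismatches_before_config" is
zero on every surviving host, that would match what we saw here: the name
is fine up to the membership change and wrong right after it.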

I'll try to investigate this further but I'd be very happy if someone who
really understands the code could help here... ;-)

CU

   Heiko

--
Heiko Schröder
EADS Deutschland GmbH
Defence and Communication Systems
Naval Combat Systems (ADBM62)
Bontekai 55
26382 Wilhelmshaven - Germany
Tel: +49 44 21.15 43-230
Fax: +49 44 21.15 43-111
e-Fax: +49 731.392-20 91 11
heiko.schroeder at eads.com

www.eads.com