[Spread-users] Delayed membership establishment issue

Scott Barvick sbarvick at revasystems.com
Fri May 19 10:33:06 EDT 2006


I thought I would follow up on what I found with this issue so others
might benefit.  It turns out we had shortened some of the timers in
membership.c to get faster response to a failed node, and we were
apparently too aggressive with the state timers such as Seg_timeout and
Gather_timeout.  The shortened values worked fine for 2 nodes, but they
must have been right on the edge for more than 2.  This became apparent
from debug messages like:

Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4
Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4
Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4
Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4
Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4
Memb_handle_message: handling join message from -1408236736, State is 4
Memb_token_loss: I lost my token, state is 5
Scast_alive: State is 2
Send_join: State is 4

Adjusting Gather_timeout back to a less aggressive value fixed this issue.
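
For reference, these timers are declared near the top of membership.c as
sp_time values (seconds, microseconds).  Below is a minimal sketch of the
kind of setting involved; the numbers are purely illustrative, not the
stock defaults and not our exact values:

    /* membership.c (excerpt, illustrative values only).
     * Each timer is an sp_time of { seconds, microseconds }.  We had
     * shortened several of these to detect a failed node faster;
     * Gather_timeout in particular was too short once a third daemon
     * was involved, so the state machine kept looping as shown above. */
    static  sp_time     Seg_timeout     = { 10, 0 };
    static  sp_time     Gather_timeout  = {  5, 0 };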

One question.  While experimenting with the timer values, I believe I got
into a configuration where clients on other nodes would have their
connections to their own (local) daemons closed while another node was
coming up.  I don't understand how that could happen.  Shouldn't the
state transitions be purely a daemon-to-daemon matter, invisible to
clients?  Any thoughts?
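
For context, here is a minimal sketch of what our clients do, using the
standard Spread C client API (the daemon name, private name, and group
name below are placeholders, not our real ones).  Each client talks only
to its local daemon and only learns about membership through SP_receive,
which is why I expected the daemon-daemon state machine to be invisible
to it:

    #include <stdio.h>
    #include "sp.h"

    int main(void)
    {
        mailbox  mbox;
        char     private_group[MAX_GROUP_NAME];
        char     sender[MAX_GROUP_NAME];
        char     groups[10][MAX_GROUP_NAME];
        char     buf[1024];
        int      ret, len, num_groups, endian;
        int16    mess_type;
        service  srvc = 0;

        /* Connect to the daemon on this host only; the '1' asks for
         * membership messages to be delivered. */
        ret = SP_connect("4803@localhost", "client1", 0, 1,
                         &mbox, private_group);
        if (ret != ACCEPT_SESSION) {
            SP_error(ret);
            return 1;
        }
        SP_join(mbox, "example_group");

        /* Membership changes arrive here as membership messages; the
         * client never takes part in the daemon-to-daemon token protocol. */
        len = SP_receive(mbox, &srvc, sender, 10, &num_groups, groups,
                         &mess_type, &endian, sizeof(buf), buf);
        if (len >= 0 && Is_membership_mess(srvc))
            printf("membership change in group %s\n", sender);

        SP_disconnect(mbox);
        return 0;
    }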

Thanks,
Scott






On Tue, 2006-05-16 at 18:09 -0400, Scott Barvick wrote:
> I've been looking into an issue I'm seeing, and I think it is time to
> ask for help.  
> 
> I have Spread 3.17.3 running on 3 or 4 hosts with a small number of
> groups and a fair bit of traffic among them.   The hosts are multihomed
> and have configurations that look like:
> 
> Spread_Segment 239.255.3.62:4803 {
>      tap2     172.16.3.65 {
>      172.17.188.2
>      172.16.3.65
>      C 127.0.0.1 }
>      tap3     172.16.3.66 {
>      172.17.188.3
>      172.16.3.66
>      C 127.0.0.1 }
>      tap4     172.16.3.62 {
>      172.17.188.4
>      172.16.3.62
>      C 127.0.0.1 }
> }
> 
> Steady-state communication among the three of them works very well, but
> we have noticed that with 3 nodes, a restart of the third node can take
> a long time for the clients to see membership messages and even (I
> think) to get connected to the local daemon.  I've been comparing debug
> output and found the following difference between a 'good' run (quick
> connect) and a 'delayed' one, where the connection can sometimes be
> delayed for minutes before it kicks in and all is fine:
> 
> *Good*
> 
> ================
> ... banner ...
> | Version 3.17.03 Built 15/October/2004
> |
> \===========================================================================/
> Conf_init: using file: /var/reva/etc/spread.conf
> Successfully configured Segment 0 [239.255.3.62:4803] with 4 procs:
>                         tap1: 172.16.3.64
>                         tap2: 172.16.3.65
>                         tap3: 172.16.3.66
>                         tap4: 172.16.3.62
> ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
> Setting SO_REUSEADDR to always on -- make sure Spread daemon host is
> secured!
> E_init: went ok
> DL_init_channel: bind for recv_channel for port 4803 with chan 6 ok
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> DL_init_channel: went ok on channel 6
> DL_init_channel: bind for recv_channel for port 4803 with chan 8 ok
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> Moving process spread (pid:20378) to Fifo scheduling with priority:4
> DL_init_channel: went ok on channel 8
> DL_init_channel: bind for recv_channel for port 4804 with chan 9 ok
> DL_init_channel: setsockopt for recv and broadcast went ok
> DL_init_channel: went ok on channel 9
> DL_init_channel: bind for recv_channel for port 4803 with chan 10 ok
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> DL_init_channel: went ok on channel 10
> DL_init_channel: bind for recv_channel for port 4804 with chan 11 ok
> DL_init_channel: setsockopt for recv and broadcast went ok
> DL_init_channel: went ok on channel 11
> DL_init_channel: setsockopt for send and broadcast went ok
> DL_init_channel: setting Mcast TTL to 1
> DL_init_channel: went ok on channel 12
> E_attach_fd: fd 6, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 8, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 10, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 9, fd_type 0, code 0, data 0x0, priority 1
> Active_priority 0
> E_attach_fd: fd 11, fd_type 0, code 0, data 0x0, priority 1
> Active_priority 0
> E_queue: (only) event queued func 0x80566a0 code 0 data 0x0 in future
> (0:0)
> E_queue: (last) event queued func 0x8056410 code 0 data 0x0 in future
> (1:0)
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> Sess_init: INET bind for port 4803 interface 172.17.188.3 ok
> Sess_init: INET went ok on mailbox 13
> Sess_init: INET bind for port 4803 interface 172.16.3.66 ok
> Sess_init: INET went ok on mailbox 14
> Sess_init: INET bind for port 4803 interface 127.0.0.1 ok
> Sess_init: INET went ok on mailbox 15
> Sess_init: UNIX bind for name /tmp/4803 ok
> Sess_init: UNIX went ok on mailbox 16
> E_attach_fd: fd 13, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 13, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 14, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 14, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 15, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 15, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 16, fd_type 0, code 1, data 0x0, priority 0
> Active_priority 0
> Sess_init: ended ok
> DL_init_channel: setsockopt for send and broadcast went ok
> DL_init_channel: setting Mcast TTL to 1
> DL_init_channel: went ok on channel 17
> E_handle_events: next event 
> E_sub_time: negative time result.
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 0)
> E_handle_events: next event 
> E_handle_events: exec time event 
> DL_send: sent a message of 40 bytes to (239.255.3.62,4803) on channel 12
> E_queue: (last) event queued func 0x80566a0 code 0 data 0x0 in future
> (1:0)
> E_handle_events: poll select
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 40 bytes on channel 6
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 999247)
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 60 bytes on channel 6
> E_dequeue: event dequeued func 0x80566a0 code 0 data 0x0
> E_dequeue: first event dequeued func 0x8056410 code 0 data 0x0
> E_queue: (only) event queued func 0x8056390 code 0 data 0x0 in future
> (1:500000)
> DL_send: sent a message of 40 bytes to (239.255.3.62,4803) on channel 12
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 44 bytes on channel 6
> E_queue: dequeued a (first) simillar event
> E_queue: (only) event queued func 0x8056390 code 0 data 0x0 in future
> (1:500000)
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 40 bytes on channel 6
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (1, 499966)
> Moving process tmServer (pid:20397) to Fifo scheduling with priority:2
> E_handle_events: exec handler for fd 11, fd_type 0, priority 1
> DL_recv: received 788 bytes on channel 11
> 
> 
> *delayed*
> 
> | Version 3.17.03 Built 15/October/2004
> |
> \===========================================================================/
> Conf_init: using file: /var/reva/etc/spread.conf
> Successfully configured Segment 0 [239.255.3.62:4803] with 4 procs:
>                         tap1: 172.16.3.64
>                         tap2: 172.16.3.65
>                         tap3: 172.16.3.66
>                         tap4: 172.16.3.62
> ENABLING Dangerous Monitor Commands! Make sure Spread network is secured
> Setting SO_REUSEADDR to always on -- make sure Spread daemon host is
> secured!
> E_init: went ok
> DL_init_channel: bind for recv_channel for port 4803 with chan 6 ok
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> DL_init_channel: went ok on channel 6
> DL_init_channel: bind for recv_channel for port 4803 with chan 8 ok
> Moving process spread (pid:15202) to Fifo scheduling with priority:4
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> DL_init_channel: went ok on channel 8
> DL_init_channel: bind for recv_channel for port 4804 with chan 9 ok
> DL_init_channel: setsockopt for recv and broadcast went ok
> DL_init_channel: went ok on channel 9
> DL_init_channel: bind for recv_channel for port 4803 with chan 10 ok
> DL_init_channel: Joining multicast address 239.255.3.62 went ok
> DL_init_channel: went ok on channel 10
> DL_init_channel: bind for recv_channel for port 4804 with chan 11 ok
> DL_init_channel: setsockopt for recv and broadcast went ok
> DL_init_channel: went ok on channel 11
> DL_init_channel: setsockopt for send and broadcast went ok
> DL_init_channel: setting Mcast TTL to 1
> DL_init_channel: went ok on channel 12
> E_attach_fd: fd 6, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 8, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 10, fd_type 0, code 0, data 0x0, priority 2
> Active_priority 0
> E_attach_fd: fd 9, fd_type 0, code 0, data 0x0, priority 1
> Active_priority 0
> E_attach_fd: fd 11, fd_type 0, code 0, data 0x0, priority 1
> Active_priority 0
> E_queue: (only) event queued func 0x80566a0 code 0 data 0x0 in future
> (0:0)
> E_queue: (last) event queued func 0x8056410 code 0 data 0x0 in future
> (1:0)
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> E_dequeue: no such event
> Sess_init: INET bind for port 4803 interface 172.17.188.3 ok
> Sess_init: INET went ok on mailbox 13
> Sess_init: INET bind for port 4803 interface 172.16.3.66 ok
> Sess_init: INET went ok on mailbox 14
> Sess_init: INET bind for port 4803 interface 127.0.0.1 ok
> Sess_init: INET went ok on mailbox 15
> Sess_init: UNIX bind for name /tmp/4803 ok
> Sess_init: UNIX went ok on mailbox 16
> E_attach_fd: fd 13, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 13, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 14, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 14, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 15, fd_type 0, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 15, fd_type 2, code 2, data 0x0, priority 0
> Active_priority 0
> E_attach_fd: fd 16, fd_type 0, code 1, data 0x0, priority 0
> Active_priority 0
> Sess_init: ended ok
> DL_init_channel: setsockopt for send and broadcast went ok
> DL_init_channel: setting Mcast TTL to 1
> DL_init_channel: went ok on channel 17
> E_handle_events: next event 
> E_sub_time: negative time result.
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 0)
> E_handle_events: next event 
> E_handle_events: exec time event 
> DL_send: sent a message of 40 bytes to (239.255.3.62,4803) on channel 12
> E_queue: (last) event queued func 0x80566a0 code 0 data 0x0 in future
> (1:0)
> E_handle_events: poll select
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 40 bytes on channel 6      <=========== **************
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 999211)
> E_handle_events: exec handler for fd 6, fd_type 0, priority 2
> DL_recv: received 216 bytes on channel 6
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 950546)
> 
> The difference starts at the line marked ********* above; I don't see
> any traffic on channel 11 until things 'kick in' and everything starts
> working, as shown below:
> 
> ....
> DL_send: sent a message of 784 bytes to (172.16.3.64,4804) on channel 12
> DL_send: sent a message of 784 bytes to (172.16.3.64,4804) on channel 12
> E_dequeue: first event dequeued func 0x8056740 code 0 data 0x0
> E_queue: (only) event queued func 0x8055fe0 code 0 data 0x0 in future
> (5:0)
> E_dequeue: no such event
> E_handle_events: poll select
> E_handle_events: select with timeout (4, 999991)
> E_handle_events: exec handler for fd 11, fd_type 0, priority 1
> DL_recv: received 624 bytes on channel 11
> DL_send: sent a message of 28 bytes to (172.16.3.64,4804) on channel 12
> E_queue: dequeued a (first) simillar event
> E_queue: (only) event queued func 0x8055fe0 code 0 data 0x0 in future
> (1:0)
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: exec handler for fd 11, fd_type 0, priority 1
> DL_recv: received 624 bytes on channel 11
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 999958)
> E_handle_events: exec handler for fd 11, fd_type 0, priority 1
> DL_recv: received 28 bytes on channel 11
> DL_send: sent a message of 28 bytes to (172.16.3.64,4804) on channel 12
> E_queue: dequeued a (first) simillar event
> E_queue: (only) event queued func 0x8055fe0 code 0 data 0x0 in future
> (1:0)
> E_handle_events: next event 
> E_handle_events: poll select
> E_handle_events: select with timeout (0, 999987)
> E_handle_events: exec handler for fd 11, fd_type 0, priority 1
> DL_recv: received 28 bytes on channel 11
> DL_send: sent a message of 28 bytes to (172.16.3.64,4804) on channel 12
> E_queue: dequeued a (first) simillar event
> E_queue: (only) event queued func 0x8055fe0 code 0 data 0x0 in future
> (1:0)
> Membership id is ( -1408236736, 1147787491)
> 
> 
> I've watched the network during these times, and the interesting thing
> is that, from the network's point of view, when things become unblocked
> a unicast packet comes out of the blocked node to start things off:
> 21:06:55.826651 tap2.33264 > tap3.4803: udp 48 (DF)
> 21:06:56.826620 tap3.33022 > tap2.4804: udp 784 (DF)
> 21:06:56.826638 tap3.33022 > tap2.4804: udp 784 (DF)
> 21:07:01.826917 arp who-has tap2 tell tap3
> 21:07:01.827074 arp reply tap2 is-at 0:90:fb:5:32:50
> 21:07:02.827648 tap2.33264 > tap3.4803: udp 48 (DF)
> 21:07:03.827587 tap3.33022 > tap2.4804: udp 784 (DF)
> 21:07:03.827604 tap3.33022 > tap2.4804: udp 784 (DF)
> 
> ... 2 minutes 40 seconds of not establishing group membership ....
> 
> 21:09:43.852675 tap2.33264 > tap3.4803: udp 48 (DF)
> 21:09:44.852625 tap3.33022 > tap2.4804: udp 784 (DF)
> 21:09:44.852640 tap3.33022 > tap2.4804: udp 784 (DF)
> 21:09:44.853624 tap2.33264 > tap3.4804: udp 624 (DF)
> 21:09:44.853681 tap2.33264 > tap3.4804: udp 624 (DF)
> 21:09:44.853744 arp who-has tap4 tell tap3
> 21:09:44.853885 arp reply tap4 is-at 0:90:fb:2:86:12
> 21:09:44.853892 tap3.33022 > tap4.4804: udp 624 (DF)   <======== ******
> 21:09:44.853896 tap3.33022 > tap4.4804: udp 624 (DF)
> 
> In this run, tap2 and tap4 were up and tap3 took a long time to connect.
> There were no packets from tap3 to tap4 or vice versa prior to this, and
> afterward, all was well.
> 
> I have seen no problems like this at all with just two nodes.
> 
> Any similar experiences or pointers where to look would be appreciated.
> 
> Thanks,
> Scott
> 
> 
> 
> 