[Spread-users] spread.log saying "BUG WORKAROUND...swallowing token"

John Lane Schultz jschultz at spreadconcepts.com
Wed Feb 1 10:09:56 EST 2017


Hi Martin,

First, thanks for testing and letting me know that RC2 works in your mixed endian environment.

Second, I think the most likely thing happening here is that your multicast isn’t working in one or more of your 3 daemon segments and Spread’s recovery mechanisms aren’t able to overcome that problem in this situation.

Here’s an example of what could be going wrong.  First, let’s assume that your multicast just doesn’t work at all in one of your 3 daemon segments.  The first daemon, say easy25-ref-back sends a data message that gets locally multicasted to the segment, but because the multicast isn’t working, neither easy26-ref-back nor easy-27-ref-back receive it.  They do receive the unicast token and they both see that they missed a message from easy25-ref-back and request retransmission.  Since both of them request retransmission, then the request is changed from a unicast retransmission request to a multicast retransmission request.  When the token gets back around to easy25-ref-back he sees the multicast retransmission request and he again multicasts the message and it again goes nowhere.  This state is likely permanent and would eventually block any forward progress in the ring and would cause the membership algorithm to act as you reported below because the EVS recovery is unable to fill the hole for that message for the same reason as above.

You can use spmonitor to see if any of the daemons are reporting a significant number of retransmissions.  If multicast is working and all the daemons are sending traffic, then you should see very few (if any) u_retrans, s_retrans or b_retrans.  If any of the daemons report a lot of s_retrans or b_retrans, then that is a strong hint that multicast isn’t working somewhere in your setup.

You can also use the spsend and sprecv programs that can be built in the daemon directory to test basic network functionality, including whether multicast works as expected or not.  If you do this, then you want to test that you can send and receive from each of your machines in both segments.

You said when you didn’t use multicast everything worked as normal.  Were you using broadcast instead?  If so, then it is likely a multicast configuration issue with either your machines or your switches or possibly a firewall issue.

If you want, then you can send me more details either on or off the mailing list with reports of what spmonitor says, questions or reports of what spsend and sprecv say, etc.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Cell: 443 838 2200

On Feb 1, 2017, at 4:14 AM, Martin Schu <martin.sc11111 at gmail.com> wrote:

Hi John,

we have a spread-4.4.0 installation and observe some mysterious problems.

We have configured multicast and we have multiple interfaces per machine. Perhaps something is wrong with the configuration for multicast in our network.

In spread.log we see a strange error message saying "Prot_handle_token: BUG WORKAROUND...swallowing token". This message is occurring 4 times per minute. See below.

Digging in the source code of spread-5.0.0-rc2 we see that strange message is still implemented there.

Can you give us some hints which circumstances could induce that strange "BUG WORKAROUND" message? Is this a "workaround" for a bug in spread or for a bug in network setup?

In our situation there is a total malfunction. No spread communication possible at all.

If we remove the multicast setup from our configuration everything is working fine on the same machines/network.

Best regards,
Martin
--
Spread_Segment  239.192.2.1:5555 {
        easydb20-ref-back       1.2.3.131  { 1.2.3.131 }
}
Spread_Segment  239.192.2.2:5555 {
        easy22-ref-back 1.2.3.4    { 1.2.3.4 }
        easy23-ref-back 1.2.3.5    { 1.2.3.5 }
        easy24-ref-back 1.2.3.6    { 1.2.3.6 }
}
Spread_Segment  239.192.2.3:5555 {
        easydb21-ref-back       1.2.3.195  { 1.2.3.195 }
}
Spread_Segment  239.192.2.4:5555 {
        easy25-ref-back 1.2.3.68   { 1.2.3.68 }
        easy26-ref-back 1.2.3.69   { 1.2.3.69 }
        easy27-ref-back 1.2.3.70   { 1.2.3.70 }
}
...
2017-01-31 10:25:10 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:25:25 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:25:43 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:25:58 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:26:16 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:26:31 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:26:34 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:26:37 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
...
2017-01-31 10:39:42 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:39:42 GMT     Aru:              0
2017-01-31 10:39:42 GMT     My_aru:           3366
2017-01-31 10:39:42 GMT     Highest_seq:      0
2017-01-31 10:39:42 GMT     Highest_fifo_seq: 0
2017-01-31 10:39:42 GMT     Last_discarded:   0
2017-01-31 10:39:42 GMT     Last_delivered:   0
2017-01-31 10:39:42 GMT     Last_seq:         3366
2017-01-31 10:39:42 GMT     Token_rounds:     501
2017-01-31 10:39:42 GMT Last Token:
2017-01-31 10:39:42 GMT     type:             0x80050080
2017-01-31 10:39:42 GMT     transmiter_id:    -1726901244
2017-01-31 10:39:42 GMT     seq:              0
2017-01-31 10:39:42 GMT     proc_id:          -1726901244
2017-01-31 10:39:42 GMT     aru:              0
2017-01-31 10:39:42 GMT     aru_last_id:      -1726901179
2017-01-31 10:39:42 GMT     flow_control:     33
2017-01-31 10:39:42 GMT     rtr_len:          144
2017-01-31 10:39:42 GMT     conf_hash:        -2016107656
2017-01-31 10:39:57 GMT Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
2017-01-31 10:39:57 GMT     Aru:              0
2017-01-31 10:39:57 GMT     My_aru:           3366
2017-01-31 10:39:57 GMT     Highest_seq:      0
2017-01-31 10:39:57 GMT     Highest_fifo_seq: 0
2017-01-31 10:39:57 GMT     Last_discarded:   0
2017-01-31 10:39:57 GMT     Last_delivered:   0
2017-01-31 10:39:57 GMT     Last_seq:         3366
2017-01-31 10:39:57 GMT     Token_rounds:     501
2017-01-31 10:39:57 GMT Last Token:
2017-01-31 10:39:57 GMT     type:             0x80050080
2017-01-31 10:39:57 GMT     transmiter_id:    -1726901244
2017-01-31 10:39:57 GMT     seq:              0
2017-01-31 10:39:57 GMT     proc_id:          -1726901244
2017-01-31 10:39:57 GMT     aru:              0
2017-01-31 10:39:57 GMT     aru_last_id:      -1726901179
2017-01-31 10:39:57 GMT     flow_control:     33
2017-01-31 10:39:57 GMT     rtr_len:          144
2017-01-31 10:39:57 GMT     conf_hash:        -2016107656






More information about the Spread-users mailing list