[Spread-users] Token too long for packet

Melissa Jenkins melissa-spread at temeletry.co.uk
Fri Dec 3 09:22:46 EST 2010


I've been having some troubles with a specific host trying to join my spread network.  Here is the log for my latest attempt... The Spread 'cluster' has been up and running for several weeks no problems.

[Fri 03 Dec 2010 14:09:44] G_handle_join: #dbupbogons#manch1 joins group radius
[Fri 03 Dec 2010 14:09:44] G_handle_join in GOP
[Fri 03 Dec 2010 14:09:44] G_handle_join: #dbupbogons#manch1 joins group resync
[Fri 03 Dec 2010 14:09:44] G_handle_join in GOP
[Fri 03 Dec 2010 14:09:44] G_handle_join: #dbupbogons#manch1 joins group alive
[Fri 03 Dec 2010 14:09:44] G_handle_join in GOP
[Fri 03 Dec 2010 14:09:46] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:09:56] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:06] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:16] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:26] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:36] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:37] Send_join: State is 4
[Fri 03 Dec 2010 14:10:38] Send_join: State is 4
[Fri 03 Dec 2010 14:10:39] Send_join: State is 4
[Fri 03 Dec 2010 14:10:40] Send_join: State is 4
[Fri 03 Dec 2010 14:10:41] Send_join: State is 4
[Fri 03 Dec 2010 14:10:46] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:10:56] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:11:06] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:11:16] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:11:26] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:11:36] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:11:41] G_handle_leave: #dbupbogons#manch1 leaves group radius
[Fri 03 Dec 2010 14:11:41] G_handle_leave in GOP
[Fri 03 Dec 2010 14:11:41] G_handle_leave: #dbupbogons#manch1 leaves group alive
[Fri 03 Dec 2010 14:11:41] G_handle_leave in GOP
[Fri 03 Dec 2010 14:11:41] G_handle_kill: #dbupbogons#manch1 is killed
[Fri 03 Dec 2010 14:11:41] G_handle_kill in GOP

It seems to join ok - but in the process totally disrupts the whole group :(  Shortly after it leaves the following showed up in the spread log of the daemon it had connected to:

[Fri 03 Dec 2010 14:14:05] Handle_alive in OP
[Fri 03 Dec 2010 14:14:05] Memb_token_loss: I lost my token, state is 1
[Fri 03 Dec 2010 14:14:05] Scast_alive: State is 2
[Fri 03 Dec 2010 14:14:05] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:05] Handle_alive in SEG
[Fri 03 Dec 2010 14:14:06] Scast_alive: State is 2
[Fri 03 Dec 2010 14:14:06] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:06] Handle_alive in SEG
[Fri 03 Dec 2010 14:14:06] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:06] Handle_alive in SEG
[Fri 03 Dec 2010 14:14:06] Stat_handle_message: sent status to Monitor at manch2
[Fri 03 Dec 2010 14:14:07] Send_join: State is 4
[Fri 03 Dec 2010 14:14:07] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:07] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:07] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:07] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:08] Send_join: State is 4
[Fri 03 Dec 2010 14:14:08] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:08] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:08] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:08] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:09] Send_join: State is 4
[Fri 03 Dec 2010 14:14:09] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:09] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:09] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:09] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:09] Memb_handle_message: handling join message from 192.168.124.1, State is 4
[Fri 03 Dec 2010 14:14:10] Send_join: State is 4
[Fri 03 Dec 2010 14:14:10] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:10] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:10] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:10] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:10] Memb_handle_message: handling join message from 192.168.124.1, State is 4
[Fri 03 Dec 2010 14:14:11] Send_join: State is 4
[Fri 03 Dec 2010 14:14:11] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:11] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:11] Memb_handle_message: handling alive message
[Fri 03 Dec 2010 14:14:11] Handle_alive in GATHER
[Fri 03 Dec 2010 14:14:11] Memb_handle_message: handling join message from 192.168.124.1, State is 4
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1092099 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1092100 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1092101 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093635 My_aru is 1092098, Highest_seq is 1093702
<snip - loads of these removed>
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093663 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093694 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093695 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093696 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] INSERT HOLE 1 IS 1093697 My_aru is 1092098, Highest_seq is 1093702
[Fri 03 Dec 2010 14:14:12] =========== Form Token ==========
[Fri 03 Dec 2010 14:14:12] FORM 1 Token, sent by 192.168.124.203. Seq: 1097035
[Fri 03 Dec 2010 14:14:12] Configuration hash: 1656243324 (local hash 1656243324)
[Fri 03 Dec 2010 14:14:12] ProcID: 192.168.124.203       ARU: 1291385651, ARU LastID: 0.0.0.0
[Fri 03 Dec 2010 14:14:12] FlowControl: -1215   RTR Len: 1572
[Fri 03 Dec 2010 14:14:12] Form Token members list -- Active (1) Pending (2)
        0: 192.168.124.203
Pending Members:
        1: 192.168.124.213      2: 192.168.124.129

[Fri 03 Dec 2010 14:14:12] Form Token reps list -- Count (2) index (1)
        0: 192.168.124.203 (T 1 SegInd 0)       1: 192.168.124.1 (T 1 SegInd 1)
[Fri 03 Dec 2010 14:14:12] Form Token RING list -- Count (1)
[Fri 03 Dec 2010 14:14:12] Ring 0: MembID 192.168.124.203 - 1291260114  TransTime 0
[Fri 03 Dec 2010 14:14:12]      ARU: 1092098    HighSeq: 1093702        NumHoles: 204
[Fri 03 Dec 2010 14:14:12]      NumCommit: 1    NumTrans: 1
[Fri 03 Dec 2010 14:14:12]      Message Holes:  1092099         1092100         1092101         1092102         1092103         1092104         1092105
1092133         1092134         1092135         1092136         1092137         1092138         1092139         1092140         1092141         1092142
1092143         1092144         1092145         1092146         1092147         1092148         1092186         1092187         1092188         1092229
1092230         1092235         1092236         1092237         1092238         1092239         1092240         1092241         1092242         1092243
1092284         1092285         1092286         1092291         1092292         1092293         1092294         1092295         1092335         1092336
1092337         1092372         1092373         1092374         1092375         1092376         1092381         1092382         1092383         1092384
1092385         1092386         1092387         1092388         1092389         1092390         1092391         1092392         1092393         1092394
1092395         1092396         1092397         1092398         1092455         1092456         1092457         1092458         1092466         1092467
1092468         1092469         1092470         1092471         1092472         1092473         1092474         1092475         1092476         1092477
1092478         1092519         1092520         1092544         1092545         1092546         1092547         1092548         1092564         1092565
1092566         1092567         1092568         1092584         1092585         1092586         1092587         1092588         1092604         1092605
1092606         1092607         1092608         1092624         1092625         1092626         1092929         1092942         1092943         1092944
1092945         1092946         1092963         1092964         1092965         1092966         1092967         1092983         1092984         1092985
1092986         1092987         1092988         1092989         1093080         1093081         1093082         1093083         1093084         1093085
1093101         1093102         1093103         1093104         1093105         1093121         1093122         1093123         1093124         1093125
1093141         1093142         1093143         1093144         1093145         1093161         1093162         1093163         1093164         1093165
1093166         1093167         1093168         1093184         1093185         1093186         1093187         1093188         1093204         1093205
1093206         1093207         1093208         1093224         1093225         1093226         1093227         1093228         1093244         1093245
1093246         1093292         1093307         1093322         1093367         1093412         1093413         1093429         1093430         1093446
1093461         1093476         1093521         1093522         1093538         1093539         1093555         1093585         1093600         1093601
1093632         1093635         1093663         1093694         1093695         1093696         1093697
[Fri 03 Dec 2010 14:14:12]      Trans List:     0: 192.168.124.203
[Fri 03 Dec 2010 14:14:12]      Commit List:
[Fri 03 Dec 2010 14:14:12] ====================================================
[Fri 03 Dec 2010 14:14:12] Net_ucast_token: Token too long for packet!
Exit caused by Alarm(EXIT)

At which point the Spread daemon was no longer running :(  Other daemons in the group/segment seemed to start having messages out of order and everything went a bit weird.

I had originally tried to run a spread daemon on the server but that totally breaks all other daemons, with a massively increasing "s retrans" count.  The whole network grinds to a halt as that number goes up (and we hit > 90Mbit/sec!)  I don't believe there is a networking problem, but the "s retrans" issue would suggest there is.  I've confirmed and it's not a firewall issue, another host on the same LAN (next switch port along) is absolutely line.  All are running FreeBSD and all have been patched for the size_t 64 bit problem.

Anybody have any idea where to look as I really need to get this machine in the loop!

Thanks,
Mel



More information about the Spread-users mailing list