[Spread-users] More on message freezes

Doug Palmer Doug.Palmer at csiro.au
Tue Jun 19 02:27:50 EDT 2007


I've been experimenting more, with the help of some lower-level
networking experts, trying to find the cause of the odd freezes that
we've been seeing.

If I have four Spread daemons running, with only two systems actually
producing Spread traffic, then I get smooth communication in one
direction and freezes in the other. (Every freeze lasts 2 seconds;
when we reduce the hurry timeout, the freezes shorten accordingly.)
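
A minimal sketch of how freeze lengths like these could be measured on
the receiving side, assuming the receiver records an arrival timestamp
(in seconds) for every message. The function name find_freezes, the
1-second threshold, and the sample timestamps are hypothetical and are
not part of Spread or of the setup described here.

```python
from typing import List, Tuple


def find_freezes(arrival_times: List[float],
                 threshold: float = 1.0) -> List[Tuple[float, float]]:
    """Return (time_of_last_message, gap_length) for every gap between
    consecutive arrivals that is longer than `threshold` seconds."""
    freezes = []
    for prev, curr in zip(arrival_times, arrival_times[1:]):
        gap = curr - prev
        if gap > threshold:
            freezes.append((prev, gap))
    return freezes


# Hypothetical arrival times: steady traffic every 0.5 s with one stall.
sample = [0.0, 0.5, 1.0, 3.0, 3.5]
for start, length in find_freezes(sample):
    print(f"freeze after t={start}s lasting {length}s")
```

Running the same check on both receivers would show whether the gaps
appear in one direction only and whether they track the hurry timeout.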

If I pull the network cable out of the non-traffic systems, then
communication works fine both ways. If I then reconnect either of
them, the freezes start to reappear.

If I configure the systems so that my Spread segment contains only the
two machines, with the other systems still plugged in, then everything
works correctly in both directions.
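
For reference, a two-machine segment of this kind might look roughly
like the following in spread.conf. The broadcast address, port, host
names, and IP addresses are placeholders for illustration, not the
values from this setup.

```
# Hypothetical addresses and host names, for illustration only.
Spread_Segment 192.168.1.255:4803 {
    sender1     192.168.1.10
    receiver    192.168.1.11
}
```

Every daemon in the segment needs the same configuration file, so the
idle machines drop out of the membership simply by not being listed.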

If I introduce a third Spread daemon into the mix, then the
sender1->receiver pair works correctly, but the sender2->receiver
pair freezes.

If I try to split the two systems up into two different segments on
different port numbers, then things work OK for a while, but I
eventually see freezes start to appear and things become unstable.

I don't think a network problem can be blamed for this, particularly
as everything else seems to work correctly. I've had some network
experts look at the packet flow and they seem satisfied with the
results. To my untutored eye, it looks like introducing a third Spread
daemon into the mix, even when it's not being directly used, causes
that daemon to hold onto things for a while.

Doug



