[Spread-users] More on message freezes

Doug Palmer Doug.Palmer at csiro.au
Wed Jun 20 21:44:55 EDT 2007


On Tue, 2007-06-19 at 12:21 -0400, John Schultz wrote:

> For starters, keep the global window at 60 and lower the daemons' burst 
> window sizes down to 5.  This will cause higher overhead as the ratio of 
> data to control traffic will decrease, but it may eliminate the freezes 
> you are seeing.

I've tried this (after figuring out that I should set DangerousMonitor =
true in the configuration file). I've confirmed from the monitor reports
that the windows are, in fact, 60/5. Lowering the burst window sizes to 5
doesn't seem to affect the freezes.

One reason for thinking that it is not a network problem is that the
problem sender changes as the Spread configuration changes. For example,
if I have the machines skoda, beetle and bluebird running daemons, then
sending messages skoda->beetle causes freezes. If I have skoda, trabant
and beetle running, then skoda->beetle is OK, but trabant->beetle
freezes.

In the system that we are building, there is a tendency for traffic to
go from skoda<->bluebird and trabant<->beetle with the traffic dying in
one pair as attention shifts to the other pair. We have discovered that
by having someone "play" on the supposedly idle pair, we can ensure that
the freezes do not happen. What I suspect is that the idle daemons are
getting bored and holding onto tokens. However, I'm not sure how to test
this.
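
One possible way to check this would be to log message arrival times on
the receiver and compare a run with the other pair idle against a run
with someone "playing" on it. The following is only a minimal sketch,
assuming the arrival timestamps (in seconds) have already been collected
by the receiving application; find_freezes and the 1-second threshold are
made-up names and values for illustration, not part of Spread.

```python
from typing import List, Tuple


def find_freezes(arrivals: List[float],
                 threshold: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start_time, gap_length) for every gap between consecutive
    arrivals that is longer than `threshold` seconds.

    `arrivals` is assumed to be sorted receive timestamps in seconds.
    The threshold is an arbitrary illustrative value, not a Spread setting.
    """
    freezes = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap = cur - prev
        if gap > threshold:
            freezes.append((prev, gap))
    return freezes


if __name__ == "__main__":
    # Hypothetical timestamps: steady traffic with one long stall.
    idle_pair_run = [0.0, 0.5, 1.0, 4.0, 4.5]
    for start, length in find_freezes(idle_pair_run):
        print(f"freeze starting at {start:.1f}s lasting {length:.1f}s")
```

If the long gaps disappear whenever the other pair carries traffic, that
would be consistent with the idle-daemon suspicion, though it would not
by itself show where the token is being held.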

Doug




