[Spread-users] Answer_retrans: retrans of 1 requested while Aru is 14
matthew.garman at gmail.com
Mon Feb 11 16:03:22 EST 2008
On Mon, Feb 11, 2008 at 02:58:35PM -0500, John Lane Schultz wrote:
> > I can't tell you why that occurred but I can say what the error
> > means. The daemon that crashed believed that all the other
> > daemons had acknowledged receiving up through message #14. But
> > then it looks like one of the daemons requested a resend of
> > message #1. This shouldn't happen because the requesting daemon
> > had (allegedly) already acknowledged receiving up through
> > message #14.
> > Could this be some kind of wraparound problem with the message
> > counter? The fact that the message numbers were so small (1,
> > 14) around the time of this failure strikes me as suspicious ...
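To make the invariant described above concrete, here is a minimal sketch of the check, reading "Aru" as "all received up to". This is not Spread's actual code: the function names, the daemon names, and the per-daemon acknowledgement table are all hypothetical and only illustrate why a retransmission request at or below the Aru is contradictory.

```python
# Hypothetical illustration only; none of these names come from Spread.

def all_received_upto(acked_by_daemon):
    """Highest sequence number that every daemon has acknowledged."""
    return min(acked_by_daemon.values())

def check_retrans_request(requested_seq, aru):
    """Return an error string if the request contradicts the Aru, else None."""
    if requested_seq <= aru:
        # Everyone supposedly already holds this message, so nobody
        # should be asking for it again.
        return f"retrans of {requested_seq} requested while Aru is {aru}"
    return None

# Assumed example state: every daemon has acknowledged at least message 14.
acks = {"daemon_a": 14, "daemon_b": 20, "daemon_c": 17}
aru = all_received_upto(acks)

print(check_retrans_request(1, aru))   # contradictory request
print(check_retrans_request(15, aru))  # plausible request
```

With these assumed values the first call reports the same condition as the subject line, while a request for message 15 passes the check.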
> Following up on my last point, did this occur in a long running
> system or had you just started up the daemon(s)?
I'd say long-running: non-stop for a whole month. And this daemon
gets a lot of use, too.
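
For what it's worth, here is a rough sketch of how a wraparound could make small message numbers reappear on a long-running system. It assumes a 32-bit counter purely for illustration; I have not checked the width of Spread's counter or how it compares sequence numbers, and the helper names are made up. The second function is the usual serial-number style comparison (as in RFC 1982) that tolerates a wrap.

```python
# Hypothetical illustration; the 32-bit width is an assumption, not a
# statement about Spread's implementation.

MASK32 = 0xFFFFFFFF

def wrap32(n):
    """Keep n within an unsigned 32-bit range, as a fixed-width counter would."""
    return n & MASK32

def serial_less_than(a, b):
    """True if a precedes b in 32-bit serial-number order (RFC 1982 style)."""
    return a != b and ((b - a) & MASK32) < 0x80000000

before_wrap = 0xFFFFFFF0                 # counter value late in a long run
after_wrap = wrap32(before_wrap + 0x11)  # 17 messages later the counter wraps

# A plain comparison treats the post-wrap number as older than the pre-wrap one.
naive_says_old = after_wrap < before_wrap

# A wrap-aware comparison still sees it as newer.
serial_says_newer = serial_less_than(before_wrap, after_wrap)

print(after_wrap, naive_says_old, serial_says_newer)
```

If the counter did wrap and some comparison were done the naive way, a perfectly new message numbered 1 would look like something from far in the past, which would fit the small numbers in the error.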