[Spread-users] Different OS versions of segment members in 4.0.0?
matthew.garman at gmail.com
Mon Oct 27 14:42:26 EDT 2014
We have several important Spread segments that are still running
Spread version 4.0.0. That we're still on an eight-year-old version is
simply an oversight on our part. We do plan to upgrade as soon as
the current issue is resolved.
The issue is: the segments are all composed of the same four machines.
Until Friday, all machines were running CentOS 5.7. On Friday, just
one of those machines was upgraded to CentOS 6.5.
Now, what we've run into is that after some trigger (it seems to be
correlated with traffic and/or load), one or more daemons get into a
bad state such that the entire segment becomes unusable. In
particular, from the Spread log files, everything looks OK (no
crashes, no weird errors, just a typical logfile). But using our
applications, we either (1) cannot even connect to any daemon in the
segment, or (2) we can connect, but cannot send or receive any
messages. In the latter case, it might look like a message is being
sent, but it is certainly not delivered to any listeners.
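For a lab re-creation, symptom (1) could be watched for with a small probe along these lines. This is only a sketch under stated assumptions: it assumes the daemons listen on Spread's default port 4803, the host names in the usage note are hypothetical, and it checks nothing more than whether a TCP connection opens. It does not perform the Spread client handshake, so it cannot detect symptom (2).

```python
import socket


def can_connect(host, port=4803, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout.

    4803 is assumed here as the daemon port; adjust it to match spread.conf.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_segment(hosts, port=4803, timeout=2.0):
    """Map each host in the segment to whether its daemon port accepts connections."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling probe_segment(["node1", "node2", "node3", "node4"]) (hypothetical names) from a cron job or a loop, and logging the result with a timestamp, would give a rough timeline to compare against traffic and load.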
The workaround for now is to simply kill the Spread daemon
processes on the server with the upgraded OS. When we do this, the
other daemons seem to auto-correct and everything works as expected.
I know the 4.0.0 version is ancient history, but does this problem
description trigger anyone's memory of OS-version-related issues? In
particular, we're trying to re-create the issue in a lab environment
first, so we can have confidence that a Spread version upgrade (or
whatever) will actually fix it (as opposed to simply deploying a newer
version and hoping it works).
Does any of this ring any bells? :)