[Spread-users] Cluster Vulnerability Question
self-spread at gyepi.com
Mon Feb 25 18:15:15 EST 2013
On Mon, Feb 25, 2013 at 03:04:17PM -0500, Lyric Doshi wrote:
> In our environment, we run a collection of spread daemons, where
> multiple clients each connect to a spread daemon over TCP. Failure of a
> spread daemon host machine disconnects all connected clients, making
> many clients appear down, despite only the daemon failing.
> We'd appreciate any thoughts or help you can provide on how we can
> mitigate the problem of a spread-host dying and inducing all its
> children spread client nodes to fall behind the rest of the cluster as
> well. An inefficient way to do this may be to connect every child to
> multiple spread-daemon hosts so it may buffer and cross-check every
> message it receives from each parent using some form of global unique
> ordered ID.
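The buffer-and-cross-check idea in the question can be sketched as a small
merger that accepts copies of the same stream from several parent daemons,
drops duplicates, and delivers messages strictly in order. This is a minimal
sketch, assuming (hypothetically) that every message carries a globally
unique, gap-free sequence number starting at 0. Spread does not hand clients
such a number by itself, and OrderedDeduplicator is an illustrative name, not
part of any Spread API.

```python
from typing import Dict, List, Tuple


class OrderedDeduplicator:
    """Merge copies of one message stream arriving from several daemons.

    Assumption: each message has a unique, gap-free sequence number
    starting at 0 (hypothetical; not provided by Spread itself).
    """

    def __init__(self) -> None:
        self.next_seq = 0                  # next sequence number to deliver
        self.pending: Dict[int, str] = {}  # out-of-order messages awaiting delivery

    def receive(self, seq: int, payload: str) -> List[Tuple[int, str]]:
        """Accept a message from any parent; return newly deliverable messages in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate copy from another daemon: already delivered or buffered
        self.pending[seq] = payload
        delivered = []
        # Flush the contiguous run starting at next_seq; stop at the first gap.
        while self.next_seq in self.pending:
            delivered.append((self.next_seq, self.pending.pop(self.next_seq)))
            self.next_seq += 1
        return delivered
```

For example, if daemon A delivers message 0 and daemon B delivers the same
message 0 a moment later, the second copy is silently dropped; a message 2
that arrives before message 1 is held until 1 shows up. The cost the question
anticipates is visible here: every client receives and buffers each message
once per parent daemon.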
The most robust solution is to run a spread daemon on each client machine.
Nearly as robust is to interpose a TCP proxy between the clients and the
spread daemons. haproxy, for example, can be configured so that, upon
detecting a failed spread node, it sends new connections to a backup.
Of course, any existing connections will be lost, but assuming that the
client nodes reconnect, they'll be connected to the next available daemon.
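The same "connect to the next available daemon" behaviour can also live in
the client's reconnect logic instead of a proxy. The sketch below walks an
ordered list of daemon addresses (primary first, then backups) and returns
the first TCP connection that succeeds. It is a hypothetical illustration:
connect_first_available and the example host names are made up, and it only
opens the TCP socket. A real Spread client would still have to go through the
Spread client library's connect call and re-join its groups afterwards.

```python
import socket
from typing import Optional, Sequence, Tuple


def connect_first_available(
    daemons: Sequence[Tuple[str, int]], timeout: float = 2.0
) -> socket.socket:
    """Return a TCP connection to the first daemon in `daemons` that accepts one.

    List order expresses preference: primary first, then backups.
    """
    last_error: Optional[OSError] = None
    for host, port in daemons:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # daemon down or unreachable; try the next one
    raise ConnectionError(f"no spread daemon reachable: {last_error}")


# Hypothetical usage; host names and port are placeholders:
# sock = connect_first_available([("daemon-a.example", 4803),
#                                 ("daemon-b.example", 4803)])
```

As with the proxy, in-flight messages on the dead connection are lost; the
client only regains a working daemon, not the messages it missed.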