[Spread-users] Cluster Vulnerability Question

Gyepi SAM self-spread at gyepi.com
Mon Feb 25 18:15:15 EST 2013

On Mon, Feb 25, 2013 at 03:04:17PM -0500, Lyric Doshi wrote:
> In our environment, we run a collection of spread daemons, where
> multiple clients connect to each spread daemon over TCP. Failure of a
> spread daemon host machine disconnects all connected clients, making 
> many clients appear down, despite only the daemon failing.
>
> We'd appreciate any thoughts or help you can provide on how we can
> mitigate the problem of a spread-host dying and inducing all its
> children spread client nodes to fall behind the rest of the cluster as
> well. An inefficient way to do this may be to connect every child to
> multiple spread-daemon hosts so it may buffer and cross-check every
> message it receives from each parent using some form of global unique
> ordered ID.

Hi Lyric,

The most robust solution is to run a spread daemon on each client machine,
so that a daemon failure only affects the clients on that same machine.

Nearly as robust would be to interpose a proxy service, such as haproxy,
between the clients and the spread daemons. Upon detecting a failed
spread node, the proxy would route new connections to a backup.

Of course, any existing connections will be lost, but assuming that the
client nodes reconnect, they'll be connected to the next available daemon.
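
If you'd rather handle the failover in the client itself, the reconnect
step might look roughly like the sketch below. It only shows the
"try the next daemon" logic using plain TCP connects; the function name
and the endpoint list are made up for illustration, and a real client
would call the Spread client library's connect (SP_connect in the C API)
where this sketch opens a raw socket.

```python
import socket


def connect_first_available(endpoints, timeout=2.0):
    """Return (sock, endpoint) for the first endpoint that accepts a TCP connection.

    `endpoints` is an ordered list of (host, port) pairs, preferred daemon first.
    Hypothetical helper: it checks TCP reachability only, not the Spread protocol.
    """
    last_error = None
    for host, port in endpoints:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, (host, port)
        except OSError as exc:
            # Daemon host is down or refusing connections; try the next one.
            last_error = exc
    raise ConnectionError(f"no daemon reachable among {endpoints!r}") from last_error
```

A client would call this again, with the same list, whenever its current
connection drops. Messages sent while it was disconnected are still
missed, so this addresses reconnection only, not catching up.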


