[Spread-users] Cluster Vulnerability Question

Lyric Doshi ldoshi at vertica.com
Mon Feb 25 15:04:17 EST 2013


Hey all,

In our environment, we run a collection of Spread daemons, with
multiple clients connecting to each Spread daemon over TCP. Failure of a
Spread daemon's host machine disconnects all of its connected clients,
so many clients appear to be down even though only the daemon has
failed. We cannot afford to have any clients miss messages while they
reconnect to a different Spread daemon.

We'd appreciate any thoughts or help you can provide on how we can
mitigate the problem of a Spread host dying and causing all of its
child Spread client nodes to fall behind the rest of the cluster as
well. An inefficient way to do this may be to connect every child to
multiple Spread daemon hosts, so that it can buffer and cross-check every
message it receives from each parent using some form of globally unique
ordered ID. However, this doubles our traffic and can be painful if
messages are large, so we're hoping for something better that Spread can
provide or that other customers may have engineered to eliminate this
problem.
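
To make the multiple-connection idea concrete, here is a minimal sketch
in Python of the client-side cross-check. It assumes every message
carries a dense integer ID assigned by the sender, starting at 1 and
increasing by one. The names DuplicateFilter and receive are
hypothetical and are not part of the Spread API; the code that reads
from each daemon connection is left out.

```python
from typing import Dict, List


class DuplicateFilter:
    """Merge redundant feeds into one in-order, exactly-once stream.

    Assumption: message IDs are dense integers (1, 2, 3, ...) assigned
    by the sender, so the same message has the same ID on every feed.
    """

    def __init__(self, first_id: int = 1) -> None:
        self.next_id = first_id               # lowest ID not yet delivered
        self.pending: Dict[int, bytes] = {}   # messages that arrived early

    def receive(self, msg_id: int, payload: bytes) -> List[bytes]:
        """Accept one message from any feed; return payloads now deliverable."""
        if msg_id < self.next_id or msg_id in self.pending:
            return []                         # duplicate seen on another feed
        self.pending[msg_id] = payload
        ready: List[bytes] = []
        # Release the longest run of consecutive IDs starting at next_id.
        while self.next_id in self.pending:
            ready.append(self.pending.pop(self.next_id))
            self.next_id += 1
        return ready


if __name__ == "__main__":
    f = DuplicateFilter()
    print(f.receive(1, b"a"))   # first copy is delivered
    print(f.receive(1, b"a"))   # second copy from the other daemon is dropped
    print(f.receive(3, b"c"))   # held back until ID 2 arrives
    print(f.receive(2, b"b"))   # releases IDs 2 and 3 in order
```

If one daemon dies, the client keeps receiving from the other, and the
filter simply stops seeing duplicates. The cost is the one already
noted: every payload crosses the network once per connection.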

Thanks!
-- Lyric


