[Spread-users] Cluster Vulnerability Question

Lyric Doshi ldoshi at vertica.com
Mon Feb 25 19:20:05 EST 2013


Hi Gyepi,

Thanks for your answer! Interestingly, we actually do run a spread 
daemon on each client machine in most cases. However, spread only 
supports up to 128 nodes and so we've run into trouble here with larger 
deployments. That forced us to switch to a smaller ring with clients 
connecting to a shared spread daemon.

We also would like to run in environments with higher latency where 
having a smaller ring of spread daemon hosts with clients is preferable 
to a full ring of all the nodes.

Hmm. I'm not sure I follow how the proxy service would guarantee that no 
client ever misses a message in the window between a node failure being 
detected (by the client or by the proxy service) and the client 
connecting to a backup. Any thoughts on how we can address this?

Thanks!
-- Lyric


On 02/25/2013 06:15 PM, Gyepi SAM wrote:
> On Mon, Feb 25, 2013 at 03:04:17PM -0500, Lyric Doshi wrote:
>> In our environment, we run a collection of spread daemons, where
>> multiple clients connect to each spread daemon over TCP. Failure of a
>> spread daemon host machine disconnects all connected clients, making
>> many clients appear down, despite only the daemon failing.
>> We'd appreciate any thoughts or help you can provide on how we can
>> mitigate the problem of a spread-host dying and inducing all its
>> children spread client nodes to fall behind the rest of the cluster as
>> well. An inefficient way to do this may be to connect every child to
>> multiple spread-daemon hosts so it may buffer and cross-check every
>> message it receives from each parent using some form of globally unique
>> ordered ID.
> Hi Lyric,
>
> The most robust solution is to run a spread daemon on each client machine.
>
> Nearly as robust would be to interpose a proxy service between the
> client and the spread daemons. Something like haproxy, for example,
> which would, upon detecting a failed spread node, connect to a backup.
>
> Of course, any existing connections will be lost, but assuming that the
> client nodes reconnect, they'll be connected to the next available daemon.
>
> -Gyepi
