[Spread-users] Cluster Vulnerability Question
ldoshi at vertica.com
Mon Feb 25 19:20:05 EST 2013
Thanks for your answer! Interestingly, in most cases we actually do run a
spread daemon on each client machine. However, spread only supports up to
128 nodes, so we've run into trouble with larger deployments. That forced
us to switch to a smaller ring, with clients connecting to a shared spread
daemon.
We would also like to run in higher-latency environments, where a smaller
ring of spread daemon hosts serving the clients is preferable to a full
ring spanning all the nodes.
Hmm. I'm not sure I follow how the proxy service would guarantee that no
client ever misses a message during the window in which it (or the proxy
service) detects a failed node and connects to a backup. Any thoughts on
how we can address this?
On 02/25/2013 06:15 PM, Gyepi SAM wrote:
> On Mon, Feb 25, 2013 at 03:04:17PM -0500, Lyric Doshi wrote:
>> In our environment, we run a collection of spread daemons, and
>> multiple clients connect to each spread daemon over TCP. Failure of a
>> spread daemon host machine disconnects all of its connected clients,
>> making many clients appear down even though only the daemon failed.
>> We'd appreciate any thoughts or help you can provide on how we can
>> mitigate the problem of a spread host dying and causing all of its
>> child spread client nodes to fall behind the rest of the cluster as
>> well. An inefficient way to do this may be to connect every child to
>> multiple spread daemon hosts so it can buffer and cross-check every
>> message it receives from each parent using some form of globally
>> unique, ordered ID.
> Hi Lyric,
> The most robust solution is to run a spread daemon on each client machine.
> Nearly as robust would be to interpose a proxy service between the
> client and the spread daemons. Something like haproxy, for example,
> would connect to a backup upon detecting a failed spread node.
> Of course, any existing connections will be lost, but assuming that the
> client nodes reconnect, they'll be connected to the next available daemon.
> Spread-users mailing list
> Spread-users at lists.spread.org