[Spread-users] Cluster Vulnerability Question

Mon Mar 4 18:16:35 EST 2013

Can you provide a bit more detail about your system: how many messages /
second, how often there are failures, what you'd like to happen if
everything went down at once, etc.?

I ask only because we often find ourselves using a particular system
because it is what we have, not because it fits our needs.  Spread (at least
the open source version I used a few years back) is not a durable messaging
system. If you try to build durability on top of it, you might spend a lot
of effort when something may already exist that meets your needs.
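
If you do stay on Spread, the cross-checking idea from your first message
(each client connected to more than one daemon, filtering on an ordered ID)
could look roughly like the sketch below. It is only an illustration under
assumptions: it presumes every sender stamps each message with a gap-free,
globally ordered integer ID, which Spread does not hand you in this form,
and the names OrderedDeduplicator and offer are made up for the example. It
uses no Spread API at all.

```python
from typing import Dict, List, Tuple

# (sequence_id, payload). The ID scheme is an assumption for this sketch,
# not something Spread provides in this form.
Message = Tuple[int, str]


class OrderedDeduplicator:
    """Merge redundant feeds into one ordered, duplicate-free stream."""

    def __init__(self, first_id: int = 1) -> None:
        self.next_id = first_id            # next ID we are allowed to deliver
        self.pending: Dict[int, str] = {}  # messages that arrived ahead of a gap

    def offer(self, seq_id: int, payload: str) -> List[Message]:
        """Accept a message from any feed and return what is now deliverable."""
        if seq_id < self.next_id or seq_id in self.pending:
            return []  # already delivered or already buffered: a duplicate
        self.pending[seq_id] = payload
        ready: List[Message] = []
        # Drain the buffer for as long as the sequence is contiguous.
        while self.next_id in self.pending:
            ready.append((self.next_id, self.pending.pop(self.next_id)))
            self.next_id += 1
        return ready


if __name__ == "__main__":
    dedup = OrderedDeduplicator()
    # Two feeds deliver overlapping, slightly reordered traffic.
    for seq_id, payload in [(1, "a"), (1, "a"), (3, "c"), (2, "b"), (3, "c")]:
        for delivered in dedup.offer(seq_id, payload):
            print(delivered)
```

Note that this only hides a daemon failure while at least one other feed
still delivers every message. A gap that never fills means the message was
lost on all feeds, and that is exactly the durability problem I mentioned.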

On Mon, Feb 25, 2013 at 7:20 PM, Lyric Doshi <ldoshi at vertica.com> wrote:

> Hi Gyepi,
>
> Thanks for your answer! Interestingly, we actually do run a spread
> daemon on each client machine in most cases. However, spread only
> supports up to 128 nodes and so we've run into trouble here with larger
> deployments. That forced us to switch to a smaller ring with clients
> connecting to a shared spread daemon.
>
> We also would like to run in environments with higher latency where
> having a smaller ring of spread daemon hosts with clients is preferable
> to a full ring of all the nodes.
>
> Hmm. I'm not sure I follow how the proxy service would guarantee that no
> client ever missed a message in the period where they (or the proxy
> service) detect a failed node and then connect to a backup. Any
> thoughts on how we can address this?
>
> Thanks!
> -- Lyric
>
>
> On 02/25/2013 06:15 PM, Gyepi SAM wrote:
> > On Mon, Feb 25, 2013 at 03:04:17PM -0500, Lyric Doshi wrote:
> >> In our environment, we run a collection of spread daemons, where
> >> multiple clients connect to each spread daemon over TCP. Failure of a
> >> spread daemon host machine disconnects all connected clients, making
> >> many clients appear down, despite only the daemon failing.
> >> We'd appreciate any thoughts or help you can provide on how we can
> >> mitigate the problem of a spread-host dying and inducing all its
> >> child spread client nodes to fall behind the rest of the cluster as
> >> well. An inefficient way to do this may be to connect every child to
> >> multiple spread-daemon hosts so it may buffer and cross-check every
> >> message it receives from each parent using some form of globally
> >> unique ordered ID.
> > Hi Lyric,
> >
> > The most robust solution is to run a spread daemon on each client machine.
> >
> > Nearly as robust would be to interpose a proxy service between the
> > client and the spread daemons. Something like haproxy, for example,
> > which would, upon detecting a failed spread node, connect to a backup.
> >
> > Of course, any existing connections will be lost, but assuming that the
> > client nodes reconnect, they'll be connected to the next available daemon.
> >
> > -Gyepi
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users