[Spread-users] Cluster Vulnerability Question
jschultz at spreadconcepts.com
Sun Mar 3 12:36:45 EST 2013
I've thought a bit about your problem and I think I have a potential solution.
My basic idea is to build a user-level message recovery process that would normally allow clients that fail over to recover any messages they missed while they were disconnected. The recovery process would temporarily cache the most recent messages for the groups in which it participates.
When a client fails over to a new daemon, it would send a message to the recovery process specifying which messages it did receive; the recovery process would then retransmit any missing messages. This could be done either through Spread communication or through a more conventional client-server TCP architecture.
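To make the idea concrete, here is a minimal sketch of the cache such a recovery process might keep. It does not use the Spread API at all: the class name RecoveryCache, its methods, and the max_per_group limit are all hypothetical, and it assumes every message already carries some ID that the client can report back.

```python
from collections import OrderedDict


class RecoveryCache:
    """Hypothetical bounded cache of recent messages, kept per group."""

    def __init__(self, max_per_group=1000):
        # Assumed tuning knob: how many recent messages to retain per group.
        self.max_per_group = max_per_group
        self._groups = {}  # group name -> OrderedDict of msg_id -> payload

    def store(self, group, msg_id, payload):
        """Remember a message the recovery process received for a group."""
        cache = self._groups.setdefault(group, OrderedDict())
        cache[msg_id] = payload
        cache.move_to_end(msg_id)
        while len(cache) > self.max_per_group:
            cache.popitem(last=False)  # evict the oldest cached message

    def missing_for(self, group, received_ids):
        """Return cached (msg_id, payload) pairs the client did not report."""
        received = set(received_ids)
        cache = self._groups.get(group, {})
        return [(mid, p) for mid, p in cache.items() if mid not in received]


cache = RecoveryCache(max_per_group=3)
for i in range(1, 6):
    cache.store("orders", i, f"msg-{i}")
# The client reconnects and reports that it received message 3.
to_resend = cache.missing_for("orders", [3])
```

Because the cache is bounded, a client that stays disconnected for longer than the cache covers cannot be fully recovered this way; the limit would have to be sized against the longest failover you expect.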
The main issue with this approach is being able to specify which message IDs a client did and did not receive. You could either institute a user-level ID scheme or possibly change the daemon to reveal the agreed ordering of messages within daemon memberships.
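One possible user-level ID scheme, sketched below, is for each sender to stamp its messages with its own name plus a per-sender counter. The Sender and ReceiverLog classes are hypothetical and independent of the Spread API. Note the assumptions: this gives only a per-sender order, not the agreed order across senders that the daemon maintains, and a sender that restarts would need to persist or re-seed its counter.

```python
import itertools


class Sender:
    """Hypothetical sender that stamps each payload with (name, seq)."""

    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)  # per-sender sequence, from 1

    def stamp(self, payload):
        return (self.name, next(self._counter), payload)


class ReceiverLog:
    """Tracks the highest contiguous sequence number seen per sender."""

    def __init__(self):
        self.last_seen = {}  # sender name -> last in-order seq delivered

    def deliver(self, msg):
        sender, seq, _payload = msg
        expected = self.last_seen.get(sender, 0) + 1
        if seq < expected:
            return "duplicate"  # already delivered, safe to drop
        if seq > expected:
            return "gap"  # something was missed; do not advance
        self.last_seen[sender] = seq
        return "ok"

    def recovery_request(self):
        """Compact summary to send to a recovery process after failover."""
        return dict(self.last_seen)


sender = Sender("node-a")
log = ReceiverLog()
m1, m2, m3 = sender.stamp("x"), sender.stamp("y"), sender.stamp("z")
first = log.deliver(m1)
skipped = log.deliver(m3)  # m2 was lost during failover
request = log.recovery_request()
```

With this scheme the client does not have to list every message it received; one number per sender is enough for the recovery process to work out what to retransmit.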
Alternatively, along the same lines, you could augment the Spread daemon and API to build this service into the daemon itself.
I hope that helps! Please feel free to contact me if you have any questions or want help in building such a solution.
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200
On Feb 25, 2013, at 3:04 PM, Lyric Doshi wrote:
In our environment, we run a collection of spread daemons, where
multiple clients connect to each spread daemon over TCP. Failure of a
spread daemon host machine disconnects all connected clients, making
many clients appear down, despite only the daemon failing. We cannot
afford to have any clients miss messages as they reconnect to a
different spread daemon.
We'd appreciate any thoughts or help you can provide on how we can
mitigate the problem of a spread-host dying and inducing all its
children spread client nodes to fall behind the rest of the cluster as
well. An inefficient way to do this may be to connect every child to
multiple spread-daemon hosts so it may buffer and cross-check every
message it receives from each parent using some form of globally unique
ordered ID. However, this doubles our traffic and can be painful if
messages are large so we're hoping for something better that spread can
provide or other customers may have engineered to eliminate this problem.
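For reference, the cross-checking in the multiple-connection approach described above could be as small as the following sketch. The DuplicateFilter class is hypothetical; it assumes message IDs are globally ordered integers and that each daemon connection delivers them in increasing order.

```python
class DuplicateFilter:
    """Hypothetical filter that delivers each ordered ID only once."""

    def __init__(self):
        self.highest_delivered = 0

    def accept(self, msg_id):
        """Return True the first time an ID is seen, False for repeats."""
        if msg_id <= self.highest_delivered:
            return False  # already delivered via the other connection
        self.highest_delivered = msg_id
        return True


dedup = DuplicateFilter()
# IDs arriving interleaved from two daemon connections.
arrivals = [1, 1, 2, 3, 2, 3, 4]
delivered = [i for i in arrivals if dedup.accept(i)]
```

This removes the duplicates but not the duplicated traffic, which is the cost the message above is trying to avoid.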
Spread-users mailing list
Spread-users at lists.spread.org