[Spread-users] Cluster Vulnerability Question (John Schultz)

Marcelo San-Martin Marcelo.San-Martin at harmonicinc.com
Mon Mar 4 16:36:11 EST 2013

The only problem I see with this approach is that the stored messages lose their "causality" property, so the receiver can't do much with them except use them to change its internal state or process them as a simple input stream.

If these messages are intended to update some sort of state, I would suggest creating a state-transfer protocol (like a checkpoint) so that a new client can request and receive the latest state when it rejoins the group. The state can be provided by any active member of the cluster.
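A minimal sketch of that checkpoint idea in Python, assuming every state-changing message carries a per-group sequence number assigned by the application (a hypothetical scheme; nothing here is part of the Spread API) and that updates are applied in sequence order. An active member answers a rejoin request with a copy of its state plus the number of the last update it applied; the rejoining client installs that snapshot and then replays only the buffered updates that are newer. How the request and the snapshot travel (Spread message or plain TCP) is left out.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Group member whose state is a simple key/value map (illustrative only)."""
    state: dict = field(default_factory=dict)
    last_seq: int = 0  # hypothetical sequence number of the last update applied

    def apply(self, seq, key, value):
        # Skip updates already reflected in the state: duplicates, or ones
        # older than an installed checkpoint. Assumes in-order delivery.
        if seq <= self.last_seq:
            return
        self.state[key] = value
        self.last_seq = seq

    def checkpoint(self):
        # What an active member would send to a rejoining client.
        return copy.deepcopy(self.state), self.last_seq

    def install(self, snapshot, buffered=()):
        # Adopt the checkpoint, then replay updates buffered while waiting for it.
        state, seq = snapshot
        self.state, self.last_seq = copy.deepcopy(state), seq
        for update in sorted(buffered):
            self.apply(*update)


active = Replica()
for update in [(1, "a", 1), (2, "b", 2), (3, "a", 3)]:
    active.apply(*update)

newcomer = Replica()
# Updates 3 and 4 reached the newcomer while it waited for the checkpoint.
newcomer.install(active.checkpoint(), buffered=[(3, "a", 3), (4, "c", 4)])
print(newcomer.state, newcomer.last_seq)  # {'a': 3, 'b': 2, 'c': 4} 4
```

Buffering updates that arrive while the snapshot is in flight, then discarding those the snapshot already covers, is what keeps the rejoining client from either missing an update or applying it twice.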

I hope this helps,

-----Original Message-----
From: spread-users-request at lists.spread.org [mailto:spread-users-request at lists.spread.org] 
Sent: Monday, March 04, 2013 9:00 AM
To: spread-users at lists.spread.org
Subject: Spread-users Digest, Vol 95, Issue 2

Date: Sun, 3 Mar 2013 12:36:45 -0500
From: John Schultz <jschultz at spreadconcepts.com>
Subject: Re: [Spread-users] Cluster Vulnerability Question
To: Lyric Doshi <ldoshi at vertica.com>
Cc: spread-users at lists.spread.org

Hi Lyric,

I've thought a bit about your problem and I think I have a potential solution.  

My basic idea is to build a user-level message recovery process that would allow failing-over clients to recover any messages they missed whilst they were disconnected. The recovery process would ephemerally cache the most recent messages for the groups in which it is participating.
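One way to keep that cache ephemeral is to bound it per group, as in this Python sketch. The class name, the per-group capacity, and the assumption that every message carries an application-assigned ID are all hypothetical; feeding the cache from a Spread connection is not shown.

```python
from collections import OrderedDict


class RecentMessageCache:
    """Keeps at most `capacity` recent messages per group, oldest evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.groups = {}  # group name -> OrderedDict(msg_id -> payload)

    def record(self, group, msg_id, payload):
        msgs = self.groups.setdefault(group, OrderedDict())
        msgs[msg_id] = payload  # re-recording an ID keeps its original position
        while len(msgs) > self.capacity:
            msgs.popitem(last=False)  # drop the oldest entry

    def get(self, group, msg_id):
        # Returns None if the message was never seen or has been evicted.
        return self.groups.get(group, {}).get(msg_id)


cache = RecentMessageCache(capacity=3)
for i in range(1, 6):
    cache.record("updates", i, f"msg-{i}")
print(list(cache.groups["updates"]))  # [3, 4, 5]
print(cache.get("updates", 1))        # None (evicted)
```

The capacity bounds how far behind a client can fall and still recover from the cache alone; a client that falls further behind needs another path, such as a full state transfer.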

When a client fails over to a new daemon, it would send a message to the recovery process specifying the messages it did receive, and the recovery process would then retransmit any missing messages. This could be done either through Spread communication or through a more conventional client-server TCP architecture.
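The recovery side of that exchange reduces to a gap computation. In this Python sketch the client's report is assumed (hypothetically) to be the highest ID up to which it received everything, plus the set of later IDs it also got. The recovery process resends whatever it still has cached that the client lacks, and flags when part of the gap has already been evicted. It assumes IDs within a group are consecutive integers, which, as noted below, Spread does not hand to clients by itself.

```python
from typing import Dict, List, Set, Tuple


def plan_retransmission(
    cached: Dict[int, bytes], last_contiguous: int, received_after: Set[int]
) -> Tuple[List[Tuple[int, bytes]], bool]:
    """Decide what to resend to a client that just failed over.

    cached          -- msg_id -> payload still held by the recovery process
    last_contiguous -- the client received every ID <= this value
    received_after  -- IDs above last_contiguous that the client also received
    Returns (messages to resend in ID order, whether the cache covered the gap).
    """
    resend = [
        (mid, cached[mid])
        for mid in sorted(cached)
        if mid > last_contiguous and mid not in received_after
    ]
    if not cached:
        return resend, True  # empty cache: this process knows of no newer messages
    # IDs between the client's position and the oldest cached ID were evicted;
    # unless the client already has them, the cache alone cannot fill the gap.
    evicted = set(range(last_contiguous + 1, min(cached)))
    return resend, evicted <= received_after


cached = {4: b"d", 5: b"e", 6: b"f", 7: b"g"}
print(plan_retransmission(cached, 3, {5}))  # ([(4, b'd'), (6, b'f'), (7, b'g')], True)
print(plan_retransmission(cached, 1, {5}))  # ([(4, b'd'), (6, b'f'), (7, b'g')], False)
```

When the flag comes back False, the client was disconnected longer than the cache reaches and must recover some other way.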

The main issue with this approach is being able to specify the message IDs that a client did and did not receive. You could either institute a user-level ID scheme or, possibly, change the daemon to reveal the agreed ordering of messages within daemon memberships.
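A user-level ID scheme can be as simple as having every sender stamp its messages with its own name and a per-sender counter, so a receiver can say exactly which messages it missed. The Python sketch below assumes a hypothetical 24-byte header (16-byte sender name, 64-bit sequence number) prepended to each payload. Per-sender numbers identify gaps but do not reproduce the agreed total order across senders; exposing that would need the daemon change mentioned above.

```python
import struct
from collections import defaultdict

# Hypothetical wire header: 16-byte sender name (struct pads shorter names with
# NUL bytes and truncates longer ones) + 64-bit per-sender sequence number,
# in network byte order.
HEADER = struct.Struct("!16sQ")


class Sender:
    def __init__(self, sender_id):
        self.sender_id = sender_id.encode()
        self.next_seq = 1

    def stamp(self, payload: bytes) -> bytes:
        msg = HEADER.pack(self.sender_id, self.next_seq) + payload
        self.next_seq += 1
        return msg


class Receiver:
    def __init__(self):
        self.highest = defaultdict(int)  # sender name -> highest seq seen
        self.missing = defaultdict(set)  # sender name -> seqs skipped over

    def accept(self, msg: bytes):
        raw_id, seq = HEADER.unpack_from(msg)
        sender = raw_id.rstrip(b"\0").decode()
        # Every seq between the previous highest and this one was not delivered here.
        self.missing[sender].update(range(self.highest[sender] + 1, seq))
        self.missing[sender].discard(seq)  # a late retransmission fills its hole
        self.highest[sender] = max(self.highest[sender], seq)
        return sender, seq, msg[HEADER.size:]


tx = Sender("node-a")
msgs = [tx.stamp(b"x%d" % i) for i in range(1, 6)]
rx = Receiver()
for m in msgs[:2] + msgs[4:]:  # messages 3 and 4 lost during failover
    rx.accept(m)
print(sorted(rx.missing["node-a"]))  # [3, 4]
```

The `missing` sets are exactly what a client would report to the recovery process after failing over.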

Alternatively, along the same lines, you could augment the Spread daemon and API to build this service into the daemon itself.

I hope that helps!  Please feel free to contact me if you have any questions or want help in building such a solution.


John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Feb 25, 2013, at 3:04 PM, Lyric Doshi wrote:

hey all,

In our environment, we run a collection of Spread daemons, with multiple clients connected to each Spread daemon over TCP. Failure of a Spread daemon host machine disconnects all connected clients, making many clients appear down even though only the daemon failed. We cannot afford to have any clients miss messages as they reconnect to a different Spread daemon.

We'd appreciate any thoughts or help you can provide on how we can mitigate the problem of a Spread host dying and causing all its child Spread client nodes to fall behind the rest of the cluster as well. An inefficient way to do this may be to connect every child to multiple Spread daemon hosts so it can buffer and cross-check every message it receives from each parent using some form of globally unique, ordered ID. However, this doubles our traffic and can be painful if messages are large, so we're hoping for something better that Spread provides or that other customers have engineered to eliminate this problem.
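For comparison, the deduplication half of that dual-connection approach is simple; the real cost is the doubled traffic. This Python sketch assumes the hypothetical global ID is a consecutive integer per group. It delivers each ID once and in order, whichever daemon connection it arrives on first, and holds back IDs that arrive early.

```python
import heapq


class DedupMerger:
    """Delivers each globally ordered message ID exactly once, in ID order."""

    def __init__(self, first_id=1):
        self.next_id = first_id
        self.pending = []  # min-heap of (msg_id, payload) received ahead of next_id
        self.seen = set()  # IDs currently held in `pending`

    def receive(self, msg_id, payload):
        delivered = []
        if msg_id < self.next_id or msg_id in self.seen:
            return delivered  # duplicate copy from the other daemon
        heapq.heappush(self.pending, (msg_id, payload))
        self.seen.add(msg_id)
        # Release every consecutive ID now available.
        while self.pending and self.pending[0][0] == self.next_id:
            mid, data = heapq.heappop(self.pending)
            self.seen.discard(mid)
            delivered.append((mid, data))
            self.next_id += 1
        return delivered


m = DedupMerger()
out = []
# Interleaved arrivals from daemon A and daemon B.
for mid, data in [(1, "a"), (1, "a"), (3, "c"), (2, "b"), (2, "b"), (3, "c")]:
    out += m.receive(mid, data)
print(out)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```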

-- Lyric

Spread-users mailing list
Spread-users at lists.spread.org
