[Spread-users] Confirmation for Reliable Delivery
John Lane Schultz
jschultz at spreadconcepts.com
Fri Aug 8 18:53:22 EDT 2008
Your conception of causality is, I think, a bit different from what Spread actually provides. In particular, there is no strong linkage between a client dying and its daemon eventually detecting that fact. It is entirely possible that a client sends a message to its daemon, and then one of the intended recipients crashes while Spread remains oblivious to that fact for some period of time and keeps trying to deliver messages to the recipient, believing it to be alive. There is no cheap way around this problem, as there is no common knowledge in distributed systems: no process can know for certain, at any given instant, which other processes are still alive.
This points to one issue with your design: one of the "servers" can crash while the others still assume it is handling its modulo slot in the group, when in fact it is not. Eventually, the "server's" failure will be detected by Spread and a membership change will be issued, but in the meantime the requests assigned to it by your load balancing algorithm would be lost.
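To make that loss window concrete, here is a minimal Python sketch of the modulo assignment described in the question. It does not touch the Spread API; the names `owner` and `lost_requests`, one request per tick, and request ids equal to the tick are all illustrative assumptions. It lists the requests that fall to a crashed server between the moment it dies and the moment the survivors receive the membership change.

```python
def owner(request_id: int, n_members: int) -> int:
    """Hypothetical rule from the question: the server whose rank in the
    current membership equals request_id % n_members handles the request."""
    return request_id % n_members


def lost_requests(n_members: int, crashed_rank: int,
                  crash_tick: int, detect_tick: int) -> list[int]:
    """Request ids lost while a crash goes undetected.

    One request is sent per tick, with request id == tick. The server at
    crashed_rank dies at crash_tick, but the membership change only reaches
    the survivors at detect_tick. Until then every survivor still divides
    by n_members, so nobody picks up the dead server's share.
    """
    return [rid for rid in range(crash_tick, detect_tick)
            if owner(rid, n_members) == crashed_rank]


if __name__ == "__main__":
    # 3 servers; rank 1 crashes at tick 10, membership change at tick 15.
    print(lost_requests(3, 1, 10, 15))  # [10, 13]
```

With three servers, rank 1 crashing at tick 10 and the membership change arriving at tick 15, requests 10 and 13 are never processed. A shorter detection delay shrinks the window but does not eliminate it.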
Another way requests could be lost in your system is through network partitions. If your client's daemon were partitioned away from all of the alive "servers'" daemons, its requests would fall on deaf ears (i.e. - an empty group).
The RELIABLE message service essentially means that your daemon will ensure that any other daemons (and their alive clients) to which it remains connected will get your message (i.e. - intermittent network losses will be overcome). If your daemon disconnects from other daemons (i.e. - CAUSED_BY_NETWORK membership change), then your message may or may not get to those daemons (and their clients) depending on the exact chain of events.
The only 100% certain way for your client to know its message was received and processed is for it to receive an explicit client level ACK from a recipient indicating exactly that.
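One common way to get that explicit ACK is for the client to resend a request until it is acknowledged, and for servers to discard duplicates by request id. The Python sketch below simulates this without any Spread calls; `Server`, `send_until_acked` and the scripted `network` list of (request lost, ACK lost) pairs are hypothetical. In a real deployment the request would go out via SP_multicast, the ACK might be sent back to the client's private group, and a timeout would drive each resend.

```python
class Server:
    """Hypothetical server that does the work for each request id at most
    once, but ACKs every copy it sees (an earlier ACK may have been lost)."""

    def __init__(self):
        self.processed = {}   # request id -> result
        self.work_done = 0    # how many times real work actually ran

    def handle(self, request_id, payload):
        if request_id not in self.processed:
            self.processed[request_id] = payload.upper()  # stand-in for real work
            self.work_done += 1
        return ("ACK", request_id)


def send_until_acked(server, request_id, payload, network, max_tries=5):
    """Resend request_id until an explicit ACK for it comes back.

    `network` scripts each attempt as a (request_lost, ack_lost) pair.
    Returns the attempt number that got the ACK, or None if the client
    gave up without ever learning whether the request was processed.
    """
    for attempt, (request_lost, ack_lost) in enumerate(network[:max_tries], start=1):
        if request_lost:
            continue  # the server never saw this copy
        reply = server.handle(request_id, payload)
        if ack_lost:
            continue  # the server has processed it, but the client cannot know that
        if reply == ("ACK", request_id):
            return attempt
    return None


if __name__ == "__main__":
    srv = Server()
    net = [(True, False), (False, True), (False, False)]
    print(send_until_acked(srv, 42, "hello", net))  # 3
    print(srv.work_done)  # 1: the copy in attempt 3 was not reprocessed
```

Resending alone gives at-least-once delivery; deduplicating by request id is what keeps a server from doing the same work twice when only the ACK was lost.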
PS - When SP_multicast returns success, that only means that your message was successfully queued (and possibly sent) to your Spread daemon, and nothing more.
John Lane Schultz
Spread Concepts LLC
Phn: 443 838 2200
Fax: 301 560 8875
Friday, August 8, 2008, 6:01:57 PM, you wrote:
> I am new to Spread, and I am trying to understand how the reliable
> delivery works. Here is what I want to do: A client sends a message
> to a group of servers (the client is not a member of the Servers
> group) and one member in the Servers group will process the request
> based on an internal load balancing function (mod # servers in the
> group). If the Spread client multicasts an AGREED message to the
> Servers group, the multicast will return with a success as soon as
> the message is received by one Spread daemon. Then the Spread
> daemon will multicast the message to all the members of the group.
> Since AGREED service type follows causality, if any member in the
> Servers group dies after client sends the message, and before Spread
> delivers the message to all group members, the alive servers will
> receive the membership change before receiving the client's message,
> and can rearrange the load balancing function to handle all the
> requests. The only time that a client's request might be lost is
> when all members of the Servers group die, in which case the client's
> multicast will still return with success, while no server has
> received it. So can I assume that as long as at least one server is
> alive, the client can be sure 100% that all alive servers will
> always receive a client's request? Or am I missing some edge cases
> where the message might still get lost somewhere in the queues?
> The same question for multicasting to private groups. If a client
> wants to talk to another client, using a central Spread daemon, as
> soon as the daemon receives the message, multicast will return
> success for the sender client. The daemon then will send the message
> to the receiver client reliably. Therefore the only case that the
> receiver might not receive the message is when it dies before the
> message is delivered. So can I assume that as long as the receiver
> client is alive, reliable delivery is 100%? Or am I missing some edge cases here?
> I also saw some threads about Spread recognizing the membership
> change (leave) with delay. does this mean that the alive members of
> a group might receive the membership message after a regular message
> that was sent after the actual death of a certain member? Doesn't this contradict causality?
> I would be grateful if someone can explain these cases.
> Thank you.