[Spread-users] Question about partitions

Fri Oct 10 21:48:26 EDT 2003

Responses inlined.

--Ryan

Gayathri Dongre wrote:

> Let's say a network partition occurs between two
> daemons in a site. The application at one end tried to
> multicast messages M1-M10 to the group after the
> partition actually occurred and before the application
> received a Membership change message. After the nodes
> get connected on the network again, do i expect to see
> these messages M1-M10? From my tests using your tools,
> it seems like sometimes i do and sometimes i don't.

That sounds reasonable, if I understand you correctly.  To make sure I 
understand, "after the nodes get connected on the network again" means 
after you physically correct the partition?

> Which factors or parameters does it depend on? My
> guess is that it would depend on the queue size, but
> sometimes i don't see any messages.

This depends on the timeouts that dictate the behavior of the membership 
algorithm, in membership.c.  If you repair the partition before a 
certain point in the algorithm is reached, and this is noticed by the 
daemons quickly enough, there's a chance that the daemons will be able 
to avoid having to notify the application layer of a membership change 
at all, in which case all the messages should be delivered.  There's 
theoretically no upper bound on how long it takes to reach that point, 
because the network is asynchronous, but the lower bound is 
approximately 10 seconds with the default timeouts that Spread ships 
with.

> I need to have reliable delivery in my system even in
> view of partitions etc. If i choose to send causally
> ordered messages, the daemon doesn't really tell me if
> my messages, sent just before group membership
> changed, were not delivered to ALL the members of the
> old membership. Is that right, or am i wrong about
> that? Do i have to use SAFE messages to get reliable
> delivery in view of partitions, even if i don't need
> that level of ordering/stability when the message is
> delivered?

There's a good description of what SAFE really gives you in 
http://www.cnds.jhu.edu/pub/papers/AT02_icdcs.pdf .  See section 4.1. 
In general, the paper addresses how to create a globally-consistent, 
persistent total-order of messages.

So, if what you need is reliable delivery, even across partitions, one 
tool for doing this (better than application-level acknowledgements, in 
a lot of systems) is SAFE message semantics.  Basically, what you can 
say about a SAFE message is that if it's delivered in a normal membership 
(i.e. before the Transitional Membership Message signal is delivered), 
then you know that everyone else will deliver the message or crash.  If 
you receive it after the Transitional, you only *know* that those who 
are in the vs_set of the Regular Membership Message that follows will 
deliver the message or crash.
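
As a rough sketch of that rule (a toy model in Python, not the Spread 
API; the class and method names are made up for illustration), a 
receiver could track which members each SAFE message is guaranteed to 
reach, "guaranteed" meaning "will deliver or crash":

```python
from dataclasses import dataclass, field


@dataclass
class SafeDeliveryTracker:
    """Toy model of one receiver's event stream (hypothetical, not Spread)."""
    in_transition: bool = False
    # SAFE messages seen after the transitional signal, awaiting the
    # next regular membership message.
    pending: list = field(default_factory=list)
    # message id -> members known to deliver the message or crash
    guarantees: dict = field(default_factory=dict)

    def on_transitional(self):
        # Transitional Membership Message signal was delivered.
        self.in_transition = True

    def on_safe_message(self, msg_id, current_members):
        if not self.in_transition:
            # Delivered in a normal membership: everyone in that
            # membership will deliver the message or crash.
            self.guarantees[msg_id] = frozenset(current_members)
        else:
            # Delivered after the transitional: the guarantee is only
            # known once the next regular membership arrives.
            self.pending.append(msg_id)

    def on_regular_membership(self, vs_set):
        # Messages delivered after the transitional are only guaranteed
        # for the vs_set of the regular membership that follows.
        for msg_id in self.pending:
            self.guarantees[msg_id] = frozenset(vs_set)
        self.pending.clear()
        self.in_transition = False
```

Feeding it a SAFE message, then a transitional, then a second SAFE 
message and a regular membership with a smaller vs_set leaves the first 
message guaranteed for the whole old membership and the second only for 
the vs_set.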

Of course, the only way to know if someone crashed or delivered is to 
get a message from them that was sent after the message you're 
interested in was delivered to them.

So, in short, even using SAFE messages doesn't give you "reliable 
delivery in view of partitions" as I understand it.  There are cases 
where you'll need to buffer messages at the application level for 
retransmission in order to get what you need.  On the bright side, 
messages that weren't received on the other side of a partition can't 
cause causal dependency problems ;-).  Also, the ring protocol in Spread 
doesn't pay extra latency costs for ordering of AGREED messages in 
addition to the costs for CAUSAL, but there are additional costs for 
SAFE (but less than application-level acknowledgements).
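
To make the application-level buffering idea concrete, here is a 
minimal sketch in Python.  It is only one possible design, and nothing 
in it is part of Spread: RetransmitBuffer and its methods are 
hypothetical names, and how your application learns that a member 
delivered a message (for example, by receiving a later message from 
that member) is left to you.

```python
class RetransmitBuffer:
    """Hypothetical sender-side buffer of messages not yet confirmed."""

    def __init__(self):
        # seq -> (payload, set of members that have not confirmed yet)
        self._unconfirmed = {}
        self._next_seq = 0

    def record_send(self, payload, members):
        # Remember the message and who was in the group when it was sent.
        seq = self._next_seq
        self._next_seq += 1
        self._unconfirmed[seq] = (payload, set(members))
        return seq

    def confirm(self, seq, member):
        # Call this once the application knows `member` delivered `seq`.
        entry = self._unconfirmed.get(seq)
        if entry is None:
            return
        entry[1].discard(member)
        if not entry[1]:
            # Everyone confirmed; the message can be dropped.
            del self._unconfirmed[seq]

    def to_resend(self, rejoined):
        # Messages that at least one rejoined member never confirmed,
        # in the order they were originally sent.
        rejoined = set(rejoined)
        return [
            (seq, payload)
            for seq, (payload, waiting) in sorted(self._unconfirmed.items())
            if waiting & rejoined
        ]
```

When a membership change shows that previously partitioned members are 
back, the application would resend whatever to_resend() returns for 
those members.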

I hope this answers your questions.  If you'd like to talk more about 
what you're doing, it's possible the list can suggest additional approaches.

--Ryan

-- 
Ryan W. Caudy
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University