[Spread-users] Question about partitions
caudy at jhu.edu
Fri Oct 10 21:48:26 EDT 2003
Gayathri Dongre wrote:
> Let's say a network partition occurs between two
> daemons in a site. The application at one end tried to
> multicast messages M1-M10 to the group after the
> partition actually occurred and before the application
> received a Membership change message. After the nodes
> get connected on the network again, do i expect to see
> these messages M1-M10? From my tests using your tools,
> it seems like sometimes i do and sometimes i don't.
That sounds reasonable, if I understand you correctly. To make sure I
understand, "after the nodes get connected on the network again" means
after you physically correct the partition?
> Which factors or parameters does it depend on? My
> guess is that it would depend on the queue size, but
> sometimes i don't see any messages.
This depends on the timeouts that dictate the behavior of the membership
algorithm, in membership.c. If you repair the partition before a
certain point in the algorithm is reached, and this is noticed by the
daemons quickly enough, there's a chance that the daemons will be able
to avoid having to notify the application layer of a membership change
at all, in which case all the messages should be delivered. There's
theoretically no upper bound on how long it takes to reach that point,
because the network is asynchronous, but the lower bound is
approximately 10 seconds, if the timeouts are those that Spread is
normally distributed with.
> I need to have reliable delivery in my system even in
> view of partitions etc. If i choose to send causally
> ordered messages, the daemon doesn't really tell me if
> my messages, sent just before group membership
> changed, were not delivered to ALL the members of the
> old membership. Is that right, or am i wrong about
> that? Do i have to use SAFE messages to get reliable
> delivery in view of partitions, even if i don't need
> that level of ordering/stability when the message is
There's a good description of what SAFE really gives you in
http://www.cnds.jhu.edu/pub/papers/AT02_icdcs.pdf . See section 4.1.
In general, the paper addresses how to create a globally-consistent,
persistant total-order of messages.
So, if what you need is reliable delivery, even across partitions, one
tool for doing this (better than application-level acknowledgements, in
a lot of systems) is SAFE message semantics. Basically, what you can
say about a SAFE message is that if its delivered in a normal membership
(i.e. before the Transitional Membership Message signal is delivered),
then you know that everyone else will deliver the message or crash. If
you receive it after the Transitional, you only *know* that those who
are in the vs_set of the Regular Membership Message that follows will
deliver the message or crash.
Of course, the only way to know if someone crashed or delivered is to
get a message from them that was sent after the message you're
interested in was delivered to them.
So, in short, even using SAFE messages doesn't give you "reliable
delivery in view of partitions" as I understand it. There are cases
where you'll need to buffer messages at the application level for
retransmission in order to get what you need. On the bright side,
messages that weren't received on the other side of a partition can't
cause causal dependency problems ;-). Also, the ring protocol in Spread
doesn't pay extra latency costs for ordering of AGREED messages in
addition to the costs for CAUSAL, but there are additional costs for
SAFE (but less than application-level acknowledgements).
I hope this answers your questions. If you'd like to talk more about
what you're doing, its possible the list can suggest additional approaches.
Ryan W. Caudy
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University
More information about the Spread-users