[Spread-users] Re: Some high level qns reg spread

John Schultz jschultz at commedia.cnds.jhu.edu
Tue Feb 18 23:41:21 EST 2003


I had to send this email to the whole list as it just had too much juicy
stuff about group communication to keep private. I also wanted to get this
email into the Spread list archive for future reference.

John Schultz
Co-Founder, Lead Engineer
D-Fusion, Inc. (http://www.d-fusion.net)
Phn: 443 838 2200

On Tue, 18 Feb 2003, Gautham L wrote:
>
> On Sat, 15 Feb 2003, John Schultz wrote:
>
>> If you install a Flush view, then you are guaranteed that all of the
>> members of the VS set of that view that _continued with you_ will
>> deliver the messages you sent in the previous view in that previous
>> view.
>>
>> A process being in your VS set does not guarantee that that process
>> continued with you. The only way to really tell if a member of your VS
>> set continued with you is to receive a msg from that process in the new
>> view.
>
> I did not understand this. How can i tell by receiving a msg from the
> process in the new view, that the process has continued with me ? I am
> thinking that if a process was in the immediate previous view and is
> also in the current view, then it has continued with me; and that any
> messages that i have sent in the previous view have been received by
> this _surviving companion_ process(i decide this on receiving the new
> view).
>
> perhaps you meant the same thing, but the statement of receiving a msg
> disturbed me a little :)
>
I did not mean the same thing you said. We are talking about very delicate
and difficult distributed system issues that many users can ignore. You
only really need to worry about these things when you are trying to
absolutely maintain hard guarantees.

In the following: deliver a msg m at process P means that m is ready for
immediate receipt by a user application process P; install a view V at
process Q means that V is ready for immediate receipt by a user
application process Q.

In my previous email I said that virtual synchrony was the idea that a
process can know that certain messages had been delivered at other
processes without further communication. The real "spirit" of virtual
synchrony is a bit stronger. It is more along the lines of "Virtually
synchronous processes are proceeding through the same set of events and
therefore based on their shared state and those events can make consistent
decisions without further communication."

The View Synchrony Property says that certain processes were virtually
synchronous throughout the entire previous view all the way into the
current view. Effectively, these processes saw the same set of events
starting with the previous view event up to and including the current view
event. If you couple this property with a strong ordering of events (by
using CAUSAL/AGREED/SAFE msgs), state synchronization when necessary, and
deterministic behavior, you can make very strong assertions about your
application's state and behavior at every point throughout any chain of
events. Your individual client applications can be "virtually
synchronous," meaning synchronous with others without a clock, just based
on an (ordered) set of events. This is the true strength of group
communication and why this paradigm was originally developed. With all of
that being said,

*************************************************************************
* A process being in the membership set of your previous view and also  *
* in the membership set of your current view does NOT imply that it was *
* virtually synchronous with you in the previous view. This is true for *
* two main reasons:                                                     *
*                                                                       *
* (1) A process being in the membership set of a view does NOT          *
* guarantee that the process installed that view. The process in        *
* question might have crashed before installing the view, or it could   *
* have been unable to finish the membership algorithm and gone on to    *
* eventually install a different view.                                  *
*                                                                       *
* (2) A process in question could have installed other views "in        *
* between" YOUR previous and current views. It might have missed msgs   *
* that you delivered and/or it might have delivered messages in those   *
* other views that might have affected its state. Therefore, you and    *
* this process were not virtually synchronous.                          *
*************************************************************************

The View Synchrony Property is NOT based solely on a local perception of
view events. Usually, the View Synchrony Property states that if processes
P and Q both install a view N' in the same previous view N, then they saw
the same set of events in N (i.e. were virtually synchronous in N). This
property requires more information than is available at a process at the
time of view installation -- P needs to know that Q installed N' in N like
P did before P can know if they were virtually synchronous or not.
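
To make the point concrete, here is a minimal Python sketch of the property
as a check over view histories. Everything in it (the Transition tuple, the
function name, the view labels) is hypothetical and for illustration only;
it is not part of Spread or Flush. Note that the check needs BOTH histories,
which is exactly the information P does not have at installation time.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One view installation: the view a process left and the view it entered."""
    prev_view: str
    new_view: str


def virtually_synchronous_in(view, next_view, p_history, q_history):
    """True only if BOTH processes installed next_view directly from view."""
    step = Transition(view, next_view)
    return step in p_history and step in q_history


# P moved directly from N to N'.
p_hist = [Transition("N", "N'")]
# Q did the same, so P and Q were virtually synchronous in N.
q_direct = [Transition("N", "N'")]
# Here Q installed an intervening view X, so the property says nothing.
q_detour = [Transition("N", "X"), Transition("X", "N'")]

same = virtually_synchronous_in("N", "N'", p_hist, q_direct)
different = virtually_synchronous_in("N", "N'", p_hist, q_detour)
```

In the second case Q appears in both N and N' from P's point of view, yet
the two processes did not share the same set of events in N.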

Now you might be wondering, "What is the VS set then?" The purpose of the
VS set is to let a process locally know which processes were ALMOST SURELY
virtually synchronous with it in the previous view. In particular, it
tells the local process that the listed processes installed the previous
view, are in the membership of the new view, did not install any
intervening views and with very high likelihood (although not 100%)
installed the new view (i.e. they continued with you from the previous
view directly into the new view).

This brings us to your question: "How can i tell by receiving a msg from
the process in the new view, that the process has continued with me?"
In Flush, if you receive a msg from a member of your VS set in the new
view, then this explicitly indicates that they installed the new view.
Couple this with them being in your VS set and the GCS guarantees that they
moved directly from the previous to the new view with you. Therefore, they
were virtually synchronous with you in the previous view.
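
An application could track this confirmation with something like the
following Python sketch. The class and method names are made up for this
example and are not the Flush API; the sketch assumes on_view() is called
when a view is installed locally and on_message() is called only for
messages delivered in the current view.

```python
class ViewTracker:
    """Tracks which VS-set members have confirmed the current view."""

    def __init__(self, me):
        self.me = me
        self.vs_set = set()
        self.confirmed = set()

    def on_view(self, vs_set):
        """Call when a new view is installed locally."""
        self.vs_set = set(vs_set)
        # We know we installed the view ourselves; everyone else in the
        # VS set is only "almost surely" with us until we hear from them.
        self.confirmed = {self.me} & self.vs_set

    def on_message(self, sender):
        """Call for every message delivered in the current view."""
        if sender in self.vs_set:
            self.confirmed.add(sender)

    def unconfirmed(self):
        """VS-set members we have not yet heard from in this view."""
        return self.vs_set - self.confirmed


tracker = ViewTracker(me="P")
tracker.on_view({"P", "Q", "R"})
tracker.on_message("Q")  # Q is now known to have continued with P
```

After this sequence Q is confirmed, while R is still only very likely to
have continued with P.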

In non-Virtual Synchrony GCSs such as Spread (which provides Extended
Virtual Synchrony), I believe you need to get an explicit message at _some
point_ stating that the sender moved directly from membership M to
membership M' to KNOW that you and the sender were virtually synchronous
in M.
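
As a rough sketch of that explicit-message approach, assume every process
announces a (previous membership, new membership) pair after each
membership change. The announcement format and the function below are
assumptions made for this example, not something Spread provides; the
application would have to send and collect these messages itself.

```python
def confirmed_members(my_prev, my_new, announcements):
    """Return senders whose announced transition matches ours exactly.

    announcements maps sender -> (prev_membership_id, new_membership_id).
    """
    return {
        sender
        for sender, (prev, new) in announcements.items()
        if prev == my_prev and new == my_new
    }


# A moved directly from M to M'; B arrived in M' from some other membership X.
announcements = {"A": ("M", "M'"), "B": ("X", "M'")}
confirmed = confirmed_members("M", "M'", announcements)
```

Only A is confirmed here. B is in M' as well, but it did not come directly
from M, so nothing can be concluded about what B saw in M.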

**************************************************************************
* All of this boils down to the following: when a view event M' occurs   *
* at a process P in a view M, the members in P's VS set were VERY likely *
* virtually synchronous with P in M. To be 100% sure of this however,    *
* P must either receive a message from those processes in the new view   *
* (works only for Flush), or receive a message from those processes      *
* stating that they moved directly from view M to view M' (works for     *
* Flush and Spread).                                                     *
*                                                                        *
* Because the VS set is usually correct most applications simply act as  *
* if the VS set directly indicates virtual synchrony. If it is incorrect *
* another view event will occur in the future removing those members     *
* from the next VS set, and the app's state synchronization algorithm    *
* will resolve any introduced inconsistencies.                           *
**************************************************************************

John

PS - One final wrinkle: the Flush layer weakens the usual virtual
synchrony safety property slightly by saying: if processes P and Q both
install a view N' in the same previous view N AND Q is in P's VS set, then
they were virtually synchronous in N and P and Q have the same VS sets.
This weakening allows the Flush layer to state that two processes aren't
virtually synchronous when it could make them so if it did a lot more work
-- the Flush layer is lazy :)

This weakening does allow the following trivial soln which maintains all
of the system safety and liveness properties: the VS set of your views
always only contains yourself. The Flush layer does considerably better
than this, however. In addition, regular GCS's could achieve the exact
same effect by installing singleton views at every process in between each
real view and maintain all the safety and liveness properties. My
weakening of the property makes this "flaw" more obvious but no more
serious than in regular GCSs.

PPS - Another final note: just because messages were delivered (i.e.,
ready for immediate receipt) to user processes does NOT imply that those
messages were "handled/processed" by the recipients. This is application
level logic and can only be implied by application level behavior. For
example, if after a view everyone sent a "Finished Processing" msg after
they finished processing all of the messages from their previous view,
then you could leverage the View Synchrony Property to say that those
messages were handled by the application and not just delivered.
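
A minimal Python sketch of that bookkeeping follows. The function name and
the idea of passing in the list of senders are assumptions for this example;
the "Finished Processing" message itself is application-level and would be
sent and collected by your own code.

```python
def still_processing(vs_set, finished_senders):
    """Return VS-set members that have not yet sent 'Finished Processing'."""
    return set(vs_set) - set(finished_senders)


# P and R have announced that they finished handling the previous view's
# messages; Q has not, so its messages are only known to be delivered.
pending = still_processing({"P", "Q", "R"}, ["P", "R"])
```

Once the returned set is empty, every member of the VS set has reported
that it handled (not merely was delivered) the previous view's messages.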




