[Spread-users] Adding info to Transitional EVS messages

Nilo Rivera nrivera at cs.jhu.edu
Fri Sep 2 01:15:39 EDT 2005


Thanks!  I have one more question which I am not sure if Spread already 
provides as it has enough information:

Let M be a safe message delivered in a transitional membership.  Can we 
know if M did meet total order?  Then, we know that M was delivered in a 
transitional only because "I don't know if everybody knows".

What knowledge can be made from the above? (Please correct me if Im wrong)
If a message does not meet total-order, then for sure the message was 
not received by anyone in a regular membership (since I had a hole and 
no member could have ever establish safe delivery).  If a transitional 
message is in total order, then we know that other members could have 
safely received the message in a regular membership (which tells us 
almost nothing, but enough for certain scenarios).

Here is a scenario: we have the same state machine in a primary and 
secondary site.  Only the secondary is active and applies updates when a 
safe message is received in a regular membership (where the primary is 
also receiving those messages).  After a transitional membership, if the 
primary knows the set of messages that could have been applied by the 
secondary, and applying those updates would result in an idempotent 
operation from the last checkpointed state, then the primary knows for 
sure the state of the secondary after the partition.  (there could be 
other messages that are not in-order in the transitional which would 
result in a non-idempotent operation, but now we know the secondary 
would have never applied those).  If totally-ordered messages result in 
a non-idempotent operation, we can know the complete set of states that 
could have been reached from the last checkpoint by ignoring non-ordered 
messages.

Thanks again,
Nilo




Yair Amir wrote:

> Hi,
>
> Good discussion.
>
> To add a bit - even algorithm we developed outside of the open source
> version will not achieve this property. The reason is we always
> maintain the FIFO property of not acking a message higher than a message
> sent by the same guy that we did not receive. The advanced protocol
> described in the DSN 2000 paper is such a protocol.
>
> Theoretically you could write a protocol that achieves this, which can
> be done in two ways:
> 1. Separating the control of different groups - this is similar to just
>    instantiating a new Spread network.
> 2. Acknowledging messages explicitly, which is generally very wasteful
>    as it does not use aggregation and implicit acks.
>
> I am of the opinion that if you really care about it, separate the
> networks. Otherwise, perhaps you need a protocol tailored to a specific
> need that can allow an inefficiency for the sake of a more accurate
> signaling in edge cases.
>
> Cheers,
>
>     :) Yair.
>
> John Lane Schultz wrote:
>
>> Nilo Rivera wrote:
>>
>>> Consider the following scenario:
>>>
>>> 1. Clients X and Y (Cx and Cy) are members of groups A and B, each 
>>> running on its own daemon.
>>> 2. Cx sends a safe message Ma to group A, followed by another safe 
>>> message Mb to group B.
>>> 3. Cy receives Mb, followed by a partition which leaves him alone. 
>>> 4. Cy established "I know that everybody knows" for Mb before the 
>>> partition. (may not be the case in Spread)
>>> 5. Cy delivers Mb in a transitional membership because it does not 
>>> have Ma..
>>>
>>
>> Within a heavy-weight daemon membership/view, a control token is 
>> circulated to assign sequence numbers to messages sent within that 
>> view.  This token/sequence assignment establishes a total order on 
>> the messages sent within that view.  Similiarly, safe messages are 
>> also noted on this token through a single ARU (Acknowledge Receipt & 
>> Understanding) counter.  This ARU basically notes the minimum message 
>> sequence number up through which all daemons in the view have 
>> received ...
>>
>>> First, can (4) above happen in Spread?....If not, is it 
>>> implementation dependent (a vector or fined-grained-token could do 
>>> it)?.
>>
>>
>>
>> As such, I don't believe that (4) can be achieved in Spread as the 
>> ARU will still be stuck on trying to make message Ma safe and Mb is 
>> later in the total order (needs a higher ARU than Ma).  This 
>> mechanism is implementation dependent as an all-ACK (inefficient) or 
>> similiar algorithm could locally achieve such knowledge for Mb but 
>> not Ma, which I don't believe can occur in Spread.
>>
>>> Here is the reason: If group A and B are independent state machines, 
>>> then group B gets penalized for a message unrelated to his state 
>>> (cannot apply Mb). However, Mb could have been safely applied if we 
>>> know that is in order and everyone in the group received it.  We 
>>> have such a case where an application is a member of hundreds of 
>>> groups with a few independent state machines.   I guess each state 
>>> machine could use a different spread network to solve the above, but 
>>> is not possible as some events are sent to multiple state machines.
>>
>>
>>
>> The different Spread process groups are "simulated" through the use 
>> of a single daemon "group."  Because we do not run independent state 
>> machines per group but rather 1 per daemon membership, Spread can 
>> efficiently run with thousands of process groups running inside it.  
>> It also allows Spread to easily make guarantees across groups and to 
>> give well-defined meaning to open group semantics.
>>
>> It is true that in edge cases like the one you describe a different 
>> algorithm could be slightly more optimal in some senses, but that 
>> optimization would come at a too heavy performance and capability cost.
>>
>> Cheers!
>> John
>>
>> -- 
>> John Schultz
>> Spread Concepts LLC
>>
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
>>
>>
>
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users





More information about the Spread-users mailing list