[Spread-users] Adding info to Transitional EVS messages
Nilo Rivera
nrivera at cs.jhu.edu
Fri Sep 2 01:15:39 EDT 2005
Thanks! I have one more question which I am not sure if Spread already
provides as it has enough information:
Let M be a safe message delivered in a transitional membership. Can we
know if M did meet total order? Then, we know that M was delivered in a
transitional only because "I don't know if everybody knows".
What knowledge can be made from the above? (Please correct me if Im wrong)
If a message does not meet total-order, then for sure the message was
not received by anyone in a regular membership (since I had a hole and
no member could have ever establish safe delivery). If a transitional
message is in total order, then we know that other members could have
safely received the message in a regular membership (which tells us
almost nothing, but enough for certain scenarios).
Here is a scenario: we have the same state machine in a primary and
secondary site. Only the secondary is active and applies updates when a
safe message is received in a regular membership (where the primary is
also receiving those messages). After a transitional membership, if the
primary knows the set of messages that could have been applied by the
secondary, and applying those updates would result in an idempotent
operation from the last checkpointed state, then the primary knows for
sure the state of the secondary after the partition. (there could be
other messages that are not in-order in the transitional which would
result in a non-idempotent operation, but now we know the secondary
would have never applied those). If totally-ordered messages result in
a non-idempotent operation, we can know the complete set of states that
could have been reached from the last checkpoint by ignoring non-ordered
messages.
Thanks again,
Nilo
Yair Amir wrote:
> Hi,
>
> Good discussion.
>
> To add a bit - even algorithm we developed outside of the open source
> version will not achieve this property. The reason is we always
> maintain the FIFO property of not acking a message higher than a message
> sent by the same guy that we did not receive. The advanced protocol
> described in the DSN 2000 paper is such a protocol.
>
> Theoretically you could write a protocol that achieves this, which can
> be done in two ways:
> 1. Separating the control of different groups - this is similar to just
> instantiating a new Spread network.
> 2. Acknowledging messages explicitly, which is generally very wasteful
> as it does not use aggregation and implicit acks.
>
> I am of the opinion that if you really care about it, separate the
> networks. Otherwise, perhaps you need a protocol tailored to a specific
> need that can allow an inefficiency for the sake of a more accurate
> signaling in edge cases.
>
> Cheers,
>
> :) Yair.
>
> John Lane Schultz wrote:
>
>> Nilo Rivera wrote:
>>
>>> Consider the following scenario:
>>>
>>> 1. Clients X and Y (Cx and Cy) are members of groups A and B, each
>>> running on its own daemon.
>>> 2. Cx sends a safe message Ma to group A, followed by another safe
>>> message Mb to group B.
>>> 3. Cy receives Mb, followed by a partition which leaves him alone.
>>> 4. Cy established "I know that everybody knows" for Mb before the
>>> partition. (may not be the case in Spread)
>>> 5. Cy delivers Mb in a transitional membership because it does not
>>> have Ma..
>>>
>>
>> Within a heavy-weight daemon membership/view, a control token is
>> circulated to assign sequence numbers to messages sent within that
>> view. This token/sequence assignment establishes a total order on
>> the messages sent within that view. Similiarly, safe messages are
>> also noted on this token through a single ARU (Acknowledge Receipt &
>> Understanding) counter. This ARU basically notes the minimum message
>> sequence number up through which all daemons in the view have
>> received ...
>>
>>> First, can (4) above happen in Spread?....If not, is it
>>> implementation dependent (a vector or fined-grained-token could do
>>> it)?.
>>
>>
>>
>> As such, I don't believe that (4) can be achieved in Spread as the
>> ARU will still be stuck on trying to make message Ma safe and Mb is
>> later in the total order (needs a higher ARU than Ma). This
>> mechanism is implementation dependent as an all-ACK (inefficient) or
>> similiar algorithm could locally achieve such knowledge for Mb but
>> not Ma, which I don't believe can occur in Spread.
>>
>>> Here is the reason: If group A and B are independent state machines,
>>> then group B gets penalized for a message unrelated to his state
>>> (cannot apply Mb). However, Mb could have been safely applied if we
>>> know that is in order and everyone in the group received it. We
>>> have such a case where an application is a member of hundreds of
>>> groups with a few independent state machines. I guess each state
>>> machine could use a different spread network to solve the above, but
>>> is not possible as some events are sent to multiple state machines.
>>
>>
>>
>> The different Spread process groups are "simulated" through the use
>> of a single daemon "group." Because we do not run independent state
>> machines per group but rather 1 per daemon membership, Spread can
>> efficiently run with thousands of process groups running inside it.
>> It also allows Spread to easily make guarantees across groups and to
>> give well-defined meaning to open group semantics.
>>
>> It is true that in edge cases like the one you describe a different
>> algorithm could be slightly more optimal in some senses, but that
>> optimization would come at a too heavy performance and capability cost.
>>
>> Cheers!
>> John
>>
>> --
>> John Schultz
>> Spread Concepts LLC
>>
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
>>
>>
>
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
More information about the Spread-users
mailing list