[Spread-users] partition mask

Thu Mar 14 02:34:34 EST 2002

On Wed, Mar 13, 2002 at 07:09:00PM +0200, Weinsberg Yaron wrote:
> Hi,
> 
> There are 4 daemons running on the LAN.
> Actually, I didn't get any Transitional messages so far -
> I just wanted to know where it is configured.

In the init function of membership.c
> 
> Incrementing the timeout avoids transitional messages, but i had some
> trouble solving the general
> case (where there are partitions) :
> 1. Suppose there are 5 members in a group which need to ACK/NACK a message.
By ACK/NACK, do you mean that the 5-member group needs to guarantee that they
all deliver the message or no one delivers it? (atomic broadcast?)

I think a key question is what guarantees you are trying to provide in a
partitionable environment. Do you want the non-primary components to do
anything? Or do they just quit? (From below I think they just quit, so really
only a primary component ever exists.)

> 2. If a message is received after a transitional message, and a process is
> not in a quorum it should leave the
> group (which indicates an ACK response for the others).
> BUT from what i have read (Idit Keidar,Yair Amir,Danny Dolev) a process
> can't just look at the members list
> (of a view change event) and decide if it is in a quorum because of possible
> further partitions (and the danger in creating several primary components)

This is true, especially if you want to support partitionable operation. If
you only want one primary, then you can pick from several different quorum
functions: absolute majority, single leader, or dynamic majority (i.e. a
majority of the previous membership). All of these can be calculated from the
membership message you receive from Spread.

There are two pieces of info the membership message gives you. First, the
current membership of the group, which can determine absolute majority (if the
size of the current membership > (max/2)). Second, the transitional membership
(stored in the body of the membership message), which tells you which other
processes "came with" you from the previous membership to the current one.
They are guaranteed to all have the same state as you at the beginning of this
membership, as they received the same messages as you did.
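
As a rough illustration of the quorum functions mentioned above, here is a
minimal Python sketch. It does not call the Spread API; it assumes the
application has already extracted the member names from the membership
message into plain lists. The function names, the max_members parameter
(taken here to be the total number of processes that could ever be in the
group) and the previous_primary list are all hypothetical names chosen for
this example.

```python
def absolute_majority(current_members, max_members):
    """True if the current membership is more than half of max_members.

    max_members is an assumption: the fixed total number of processes
    that could ever be part of the group.
    """
    return len(set(current_members)) > max_members / 2


def single_leader(current_members, leader):
    """True if the designated leader is in the current membership."""
    return leader in set(current_members)


def dynamic_majority(current_members, previous_primary):
    """True if more than half of the previous primary is still present."""
    previous = set(previous_primary)
    survivors = previous & set(current_members)
    return len(survivors) > len(previous) / 2


if __name__ == "__main__":
    current = ["a", "b", "x"]
    print(absolute_majority(current, 5))
    print(single_leader(current, "a"))
    print(dynamic_majority(current, ["a", "b", "c"]))
```

Each function only looks at data the process already holds locally, which is
the point made above: no extra agreement round is needed just to evaluate the
quorum rule on a given membership.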

Now, if the current primary component membership has changed from the last
primary, and you allow partitioned members to continue running, some of the
members of the new primary may not be up to date, because they were not
members of the previous primary (they were running in a different partition).
In this case you will need to synchronize them prior to continuing. But if all
daemons crash as soon as they detect they are not in the primary, then the
only state synchronization should occur when 'new' members join. Only allowing
a primary to continue running simplifies the possible scenarios that you have
to handle.
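
A small sketch of that check, again in plain Python with no Spread calls. It
assumes the application keeps its own record of who was in the previous
primary; the function name and both parameters are hypothetical.

```python
def members_needing_sync(new_primary, previous_primary):
    """Return members of the new primary that were not in the previous one.

    These are the processes that may have missed updates and would need
    a state transfer before the new primary continues.
    """
    return sorted(set(new_primary) - set(previous_primary))


if __name__ == "__main__":
    previous = ["a", "b", "c"]
    new = ["a", "b", "d"]
    # "d" was not part of the previous primary, so it would need to be
    # brought up to date before normal operation resumes.
    print(members_needing_sync(new, previous))
```

If non-primary members always quit, the only names this returns are those of
freshly joined processes, which matches the simpler case described above.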

 > 
> Does spread make it easier for a programmer or should i need to implement a
> 3PC in order to decide on a quorum ?

No, you can decide on a quorum locally. Some info about this can be found in
two new papers about database replication that are available on our website
(publications page). The first paper is probably more relevant, as it goes
over one way to correctly maintain replicas when partitions are allowed. It
sounds like you don't want partitions to be allowed (you quit when you are
outside the primary), so it is easier.

> I really start to feel that using EVS is harder than VS (which supports
> flush).... Am I right ?

EVS is a 'lighter weight' guarantee than VS because it does not force all
messages to be delivered in the view in which they are sent. Because of this
it is more efficient (it blocks for less time and avoids end-to-end
application acks).

It is possible to directly implement VS on top of EVS. Check out John
Schultz's thesis on our website and the "Flush" library (on the cnds and
spread.org websites), which gives the application VS semantics on top of
Spread. Sometimes the VS semantics are easier to work with and you can
afford the cost of flushing each group.

 > 
>     Thanks for your help!
>         yaron.
> 
> btw, does the postgres-spread manager (which support database replication)
> is open source?
> where can i get it ?

Sorry, it is not currently publicly available as research with it is ongoing.

I hope this helps some. If you explain exactly what semantics you are trying
to achieve, I can probably give you an overview of how you can get that
service with Spread. 

Jonathan

-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------