[Spread-users] Re: partition detection
wyaron at sangate.com
Thu Apr 25 08:09:10 EDT 2002
Regarding your reply, I was wondering:
1. Suppose I receive a Transitional message and a corresponding membership
message (with the VS set; if I understand correctly, they are consecutive
messages). What happens if a new partition occurs before I receive the
next Regular Membership message? (I.e., can I get a sequence of
Transitional+Membership messages until a regular configuration is received?)
2. Regarding the open group semantics and Spread's token ring protocol: what
is the difference between FIFO/CAUSAL/TOTAL if a Spread daemon only
broadcasts while possessing the token (and assigns a unique sequence number
to each message)? If there is a difference, do you have any performance
comparisons between them?
(I know there is a difference when talking about a WAN using the HOP protocol.)
3. If a group outsider sends a TOTAL message to a group in which all
messages are sent CAUSAL, does the TOTAL message arrive in the same order
(in the group's message flow) at all members? (The answer to this depends
on your answer to 2... :-)
4. Does the current Spread include support for WAN? (I didn't see any
file that contains code to handle the Lamport timestamp when sending a
message out from a site.)
----- Original Message -----
From: "Jonathan Stanton" <jonathan at cnds.jhu.edu>
To: "Yaron Weinsberg" <wyaron at sangate.com>
Cc: <spread-users at lists.spread.org>
Sent: April 22, 2002 9:11 PM
Subject: Re: partition detection
> From the point of view of a client (some program linked with libsp or
> equivalent) you can receive 4 types of 'membership' events.
> The type can be determined using the Is_caused_join_mess(),
> Is_caused_leave_mess(), ... functions.
> 1) JOIN: This means a single member joined the group. No one failed, and
> the new member received no messages sent to the group prior to the join
> and receives all messages after the join. Everyone is told which member
> joined.
> 2) LEAVE: A single member left the group (someone called SP_leave). The
> leaving member received all messages prior to this leave message
> (Spread guarantees nothing about what the program DID with those messages
> received) and will receive no more messages from the group. Everyone is
> told which member left.
> 3) DISCONNECT: A single member 'disconnected' from the daemon it had been
> connected to. This could be because the client called SP_disconnect() or
> could be because the TCP or Unix Domain Socket returned a closed connection
> to the daemon (something between the client and daemon failed and caused a
> network reset, or the client process crashed, or something else). In this
> case everyone else (other than the disconnected member) will get a message
> indicating who was disconnected. It is not clear what messages the client
> received prior to the disconnect event, as it is not known where the
> failure occurred.
> 4) NETWORK: A network event indicates that more than a single member
> change occurred and an actual failure, recovery, network partition, or
> network merge occurred between the DAEMONS themselves. As a result of the
> daemons changing who they were connected with, the clients who were
> connected to those daemons also change membership.
> This event can encompass a partition and merge at the same time (some
> daemons are now disconnected from this subset, and others have become
> reachable at approximately the same time). In this case each client will
> receive a NETWORK event with the new membership set and the VS set, which
> indicates which members of the old view (the last one delivered to the
> client) are still in the new view AND were together at all times between
> views. This VS set is the only set of members guaranteed to have seen the
> same set of messages during the membership change. Other members of the
> group who are not in the VS set may have seen a different series of
> membership views and different data messages and will need to be
> reconciled by the application.
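
A minimal sketch of the VS set rule described above, assuming the
application has already copied the new membership list and the VS set out
of the NETWORK event into plain string arrays. The function name
members_to_reconcile() and the member names in the usage note are
hypothetical and are not part of the Spread API. It lists the members of
the new view that fall outside the VS set, i.e. the ones whose message
history may differ and that the application may need to reconcile with.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in set[0..n), else 0. */
static int in_set(const char *name, const char *const set[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, set[i]) == 0)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper (not part of libsp): writes into out[] every member
 * of the new view that is NOT in the VS set. Those members may have seen
 * different views and data messages. out[] must have room for n_members
 * entries. Returns the number of entries written.
 */
size_t members_to_reconcile(const char *const members[], size_t n_members,
                            const char *const vs_set[], size_t n_vs,
                            const char *out[])
{
    size_t count = 0;
    for (size_t i = 0; i < n_members; i++) {
        if (!in_set(members[i], vs_set, n_vs))
            out[count++] = members[i];
    }
    return count;
}
```

For example, with a new view of {a, b, c, d} and a VS set of {a, b}, the
helper reports c and d as the members to reconcile with.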
> A network event can be triggered in several ways: some control messages
> between the daemons keep being dropped (so the link appears completely
> unreliable), a timeout occurs in communication among the daemons, or some
> daemon hears from new daemons that are not part of the current
> configuration. None of these triggers involves the clients directly, only
> the daemon processes. So if a client cannot talk to a daemon for some
> reason, that will cause a 'DISCONNECT' event, not a 'NETWORK' event. If the
> daemon the client is connected to cannot talk to the other daemons it used
> to be able to, that will be a NETWORK event, not a DISCONNECT event.
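
The distinction above can be sketched as a small decision helper. This is
only an illustration under stated assumptions: the CAUSE_* flags and both
functions are hypothetical stand-ins, not the values or macros from sp.h.
A real client would test the received service type with
Is_caused_join_mess(), Is_caused_leave_mess() and their counterparts
instead.

```c
/* Hypothetical cause flags for this sketch; NOT the values from sp.h. */
enum cause {
    CAUSE_JOIN       = 1 << 0,
    CAUSE_LEAVE      = 1 << 1,
    CAUSE_DISCONNECT = 1 << 2,
    CAUSE_NETWORK    = 1 << 3
};

/* Where the membership change originated. */
enum origin { ORIGIN_UNKNOWN, ORIGIN_CLIENT, ORIGIN_DAEMONS };

/*
 * JOIN, LEAVE and DISCONNECT concern a single client and its link to its
 * own daemon. NETWORK means connectivity changed among the daemons.
 */
enum origin event_origin(int cause)
{
    if (cause & CAUSE_NETWORK)
        return ORIGIN_DAEMONS;
    if (cause & (CAUSE_JOIN | CAUSE_LEAVE | CAUSE_DISCONNECT))
        return ORIGIN_CLIENT;
    return ORIGIN_UNKNOWN;
}

/*
 * Only a NETWORK event carries a VS set, so only then might the
 * application have to reconcile state with members outside that set.
 */
int may_need_reconciliation(int cause)
{
    return (cause & CAUSE_NETWORK) != 0;
}
```

Keeping the origin separate from the reconciliation check mirrors the
explanation: a client losing its daemon is a DISCONNECT, while daemons
losing each other is a NETWORK event.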
> Hope this helps,
> On Sun, Apr 21, 2002 at 03:10:23PM +0300, Yaron Weinsberg wrote:
> > Hi again,
> > Can you please explain the exact semantics of a network
> > partition? Is it the inability to communicate with a remote Spread
> > daemon (due to a possible crash), or is it maybe an ICMP destination
> > unreachable message that triggers the membership change?
> > best,
> > yaron.
> > p.s. thanks a lot for previous help regarding partitions and quorums.
> Jonathan R. Stanton jonathan at cs.jhu.edu
> Dept. of Computer Science
> Johns Hopkins University