[Spread-users] Questions about CAUSED_BY_NETWORK
jschultz at spreadconcepts.com
Mon Mar 11 11:14:45 EDT 2013
> 1. I understand a CAUSED_BY_NETWORK can mean both that members have been added to and/or removed from a group, right?
Yes. A CAUSED_BY_NETWORK membership message is generated when a change occurs in the daemon membership that affects the client membership of the group in question. As you said in your email, this can occur due to a daemon crashing and/or the network between daemons partitioning / healing. Typically, a daemon starting up won't cause such an event because although the daemon membership will change to include the new daemon, that daemon shouldn't have any clients in any groups until after it forms its first membership and, therefore, its joining shouldn't affect the membership of any groups.
> Can the same membership message mix sets of private group names for both kinds of events, or does each message indicate either/or?
Yes. A CAUSED_BY_NETWORK membership can indicate solely new additional members, solely removed old members or a mixture of the two from the POV of each client. Indeed, a given member in the group can be *BOTH* removed and added from the POV of certain clients. This can happen if a member of the client's previous membership partitioned away, installed another membership while it was away, and then merged back into this newly established membership. While such an event is unlikely to occur, it is definitely possible. The service type does not differentiate exactly what kind of change occurred from the POV of each client (i.e. - additive, subtractive, both). If you need to know, the client must calculate it itself, as I detail in my response to your question #2.
The membership list of the message contains all the clients that are now in the group after the membership event. From the POV of any given client, Spread can tell you the other members of your *previous* membership that stayed with you all the way through your previous membership and into this new membership. This set of members is called the Virtual Synchrony (VS) set. It is called the VS set because in their *previous* membership all those members (so long as they didn't crash) should have seen the same set of AGREED / SAFE messages in the same order as each other, including even where any transitional signal may have appeared. Since these members all saw the same AGREED / SAFE messages in the same order in their previous membership (so long as they didn't crash), then there may not be much of any need for them to communicate about what they saw (or didn't) in their previous membership. This feature can be used to reduce the amount of state reconciliation that may be necessary between members after a membership change for some algorithms (e.g. - a CAUSED_BY_NETWORK membership change that only removed members might not require any additional synchronization work).
> 2. I don't understand how to programmatically determine which private groups are indicated to join or to leave when I receive a CAUSED_BY_NETWORK membership message. I see what spuser does with it, but still I don't fully get it. Could you give me some hints?
For a given client, the membership list of their *previous* membership, the new membership list and their new VS set together allow the client to calculate exactly what kind of change it just saw. Anyone who was in the previous membership list but is not in the new VS set (even if they are in the new membership list) partitioned away and may not have seen the same messages in this client's previous membership. Anyone who is in the new membership list but not in the VS set is a "new" member that merged in. Thus, a member that is in both this client's previous membership list and the new membership list but isn't in this client's VS set was both removed and added. Again, that peculiar case means that the member partitioned away from this client (and, therefore, might not have seen the same messages in this client's previous membership), installed an intervening membership, and may have delivered messages in that intervening membership too. Thus, additional reconciliation likely must be done with such a member, much like any other "new" member that merged in from "out of the blue."
A client can access its VS set by calling SP_get_memb_info() on its membership message, which fills out a memb_info structure. You can then pass that structure's my_vs_set field into SP_get_vs_set_members() to get that client's VS set. Additionally, a client can see all the different VS sets that merged together by calling SP_get_vs_sets_info() and then iterating over the returned array of vs_set_info structures, calling SP_get_vs_set_members() on each one. Documentation for these functions can be found here: http://www.spread.org/docs/spread_docs_4/docs/c_api.html
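The calculation described above comes down to a few set differences. Below is a minimal sketch of that logic in Python. It makes no Spread calls at all: the function name classify_network_change, the dictionary keys and the private group names in the example are made up for illustration. In a real client, the new membership list and the VS set would come from the membership message via the C API calls mentioned above, and the previous membership list is something the client has to remember itself.

```python
def classify_network_change(prev_members, new_members, my_vs_set):
    """Classify group members after a CAUSED_BY_NETWORK membership change.

    All three arguments are iterables of private group names (strings):
      prev_members: the membership list this client saw previously
      new_members:  the membership list in the new membership message
      my_vs_set:    this client's VS set from the new membership message

    This is a hypothetical helper, not part of any Spread binding.
    """
    prev = set(prev_members)
    new = set(new_members)
    vs = set(my_vs_set)

    # In the previous membership but not in my VS set: partitioned away.
    removed = prev - vs
    # In the new membership but not in my VS set: merged in.
    added = new - vs
    # Left and came back: needs reconciliation like any other new member.
    removed_and_added = removed & added

    return {
        "stayed": sorted(vs),
        "removed": sorted(removed),
        "added": sorted(added),
        "removed_and_added": sorted(removed_and_added),
    }


# Example with made-up private group names.
change = classify_network_change(
    prev_members=["#a#h1", "#b#h2", "#c#h3"],
    new_members=["#a#h1", "#c#h3", "#d#h4"],
    my_vs_set=["#a#h1"],
)
print(change)
```

In this example "#c#h3" ends up in both the removed and the added lists, which is the peculiar case described above. If the added list is empty, the change only removed members, and the remaining members may need little or no extra synchronization.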
> 3. I got a single Spread segment on a single LAN with a daemon on each individual device (less than 10). I understand that CAUSED_BY_NETWORK can also happen in this configuration (daemon crash, device crash, cable breakage, what else?), right?
Yes. Like you said, you could get a CAUSED_BY_NETWORK if a daemon or device crashed or if something funky happened with the networking such that your LAN partitioned (or healed) either at a low level (e.g. - cable is pulled / breaks, issue with your switch, etc.) or at a higher level (e.g. - a new firewall rule starts / stops blocking some traffic).
I hope that helps. Please feel free to ask more questions if anything is still not clear.
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200
On Mar 11, 2013, at 4:33 AM, andreas.ames at de.transport.bombardier.com wrote:
I'm a Spread newbie. I've looked around on the net, in the docs and in the examples, but I still don't grasp the CAUSED_BY_NETWORK membership messages, either conceptually or API-wise. I've got the following questions about it:
1. I understand a CAUSED_BY_NETWORK can mean both that members have been added to and/or removed from a group, right? Can the same membership message mix sets of private group names for both kinds of events, or does each message indicate either/or?
2. I don't understand how to programmatically determine which private groups are indicated to join or to leave when I receive a CAUSED_BY_NETWORK membership message. I see what spuser does with it, but still I don't fully get it. Could you give me some hints?
3. I got a single Spread segment on a single LAN with a daemon on each individual device (less than 10). I understand that CAUSED_BY_NETWORK can also happen in this configuration (daemon crash, device crash, cable breakage, what else?), right?
I'd appreciate explanations, links to other sample code, documents concerning this.
Thanks in advance,
Bombardier Transportation GmbH