[Spread-users] Fault Resilience?

Yair Amir yairamir at cnds.jhu.edu
Mon Nov 11 20:26:13 EST 2002


Are your clients getting a membership message once you disconnect?
(Are you having your clients join a group?)

    :) Yair.
    
ex1)> I have a question about Spread and fault resilience. I have two separate
ex1)> Windows boxes on my LAN each running a Spread Daemon and a client written in
ex1)> Python. I plan on deploying this setup across an unreliable link once I am
ex1)> finished developing. With the idea of "resilient to faults across external
ex1)> or internal networks" I decided to see how Spread handles the loss of the
ex1)> network between the two client/daemons. I am testing this by, quite simply,
ex1)> momentarily pulling the ethernet cable out of the back of my system and
ex1)> plugging it back in. If I pull the plug for a short time, on the order of
ex1)> 1-3 seconds, all is well. My clients pause and resume once the two daemons
ex1)> seem to find each other again and resume sending message traffic. But for
ex1)> faults of >~3 secs, the daemons both seem to go through some sort of reset
ex1)> and never send any more data. Here is the sequence:

ex1)> Daemon on xxx:

ex1)> << pull the plug >>
ex1)> Memb_token_loss: I lost my token, state is 1
ex1)> Scast_alive: State is 2
ex1)> Scast_alive: State is 2 << replug here is OK >>
ex1)> Send_join: State is 4
ex1)> Send_join: State is 4
ex1)> Send_join: State is 4
ex1)> Send_join: State is 4
ex1)> Send_join: State is 4
ex1)> Memb_handle_token: handling form2 token
ex1)> Handle_form2 in FORM
ex1)> Memb_transitional
ex1)> G_handle_trans_memb:
ex1)> G_handle_trans_memb in GOP
ex1)> Memb_regular
ex1)> Membership id is ( 268183097, 1037061340)
ex1)> --------------------
ex1)> Configuration at xxx is:
ex1)> Num Segments 2
ex1)>         1       15.252.38.255     4803
ex1)>                 yyy              15.252.38.57
ex1)>         0       15.252.39.255     4803
ex1)> ====================
ex1)> G_handle_reg_memb:  with (15.252.38.57, 1037061340) id
ex1)> G_handle_reg_memb in GTRANS

ex1)> So, am I incorrect in understanding what "fault resilience" means, do I have
ex1)> something configured incorrectly or is there a problem in the way I am using
ex1)> the daemon/client interface (I can supply more info if needed)? I am using
ex1)> the Python bindings v1.3, Spread v3.17 and both machines are running 2KSP3.

ex1)> Thanks,

ex1)> don
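
One way to answer the two questions at the top of this reply (do the clients join a group, and do they see a membership message once the cable is pulled) is to log every membership change the client receives. The following is only a minimal sketch, not the poster's code: `collect_memberships` is a hypothetical helper, and it assumes a mailbox object with `join()` and `receive()` methods whose membership messages carry `group` and `members` attributes, in the style of the Python Spread bindings. It also assumes the connection was opened with membership notifications enabled; check the bindings' own documentation for the exact connect call.

```python
def collect_memberships(mbox, group, max_messages):
    """Join `group`, read up to `max_messages` messages, and return the
    member list carried by each membership message for that group."""
    mbox.join(group)  # without a join, no group membership messages arrive
    views = []
    for _ in range(max_messages):
        msg = mbox.receive()  # assumed to block until a message is delivered
        # Assumption: only membership messages have a `members` attribute;
        # regular data messages are skipped.
        if hasattr(msg, "members") and getattr(msg, "group", None) == group:
            views.append(tuple(msg.members))
            print("membership change in", group, "->", views[-1])
    return views
```

If the client is joined and notifications are on, pulling the cable for longer than the token-loss timeout would be expected to show up here as a new, smaller member list, followed by a larger one once the daemons merge again. If nothing is printed at all, the client is probably not joined to the group.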





