[Spread-users] Issue with Spread going silent

Yair Amir yairamir at cs.jhu.edu
Sun Nov 7 10:08:35 EST 2010


Hi Luke,

It sheds light in the sense that I see what is happening:

- 147 and 102 are together, with 147 as the representative, and they
   work well.
- 48 comes along. It finds the others, and I think they correctly
   discover 147 and 48 as the representatives. As 48 does not have a
   ring, it creates a form1 token and sends it to 147.
- 147 then sends the form1 token to 102, which is good.
- 102 then creates a form2 token and sends it to 147, which will be
   the representative of the new ring. The form2 token now contains all
   the information needed to form the new ring.
- 147 gets that form2 token and processes it, which is good.
- 147 is supposed to send the form2 token to 102, which will be the daemon after
   147 in the new ring.
- I do not see 102 getting that form2 token from 147, which is strange
   (as it did get the form1 token from it). This is what causes the
   ring to dissolve.
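
To make it easier to tick each hop off against the MEMBERSHIP debug
output, the sequence above can be written down as a small table. This is
only a sketch of the observation described in this message, not Spread
code: the struct, the table and both function names are made up for
illustration, and daemons are identified by the last octet of their
address.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one token hop; not a Spread data structure. */
struct hop {
    const char *token; /* "form1" or "form2" */
    int from;          /* last octet of the sender's address */
    int to;            /* last octet of the receiver's address */
    int seen;          /* 1 if the receiver's debug output shows it */
};

/* The hops as described in this message, in order. */
static const struct hop observed_hops[] = {
    { "form1",  48, 147, 1 },
    { "form1", 147, 102, 1 },
    { "form2", 102, 147, 1 },
    { "form2", 147, 102, 0 }, /* never shows up at 102 */
};

#define NUM_HOPS (sizeof observed_hops / sizeof observed_hops[0])

/* Return the index of the first hop that was not seen, or -1 if none. */
int first_missing_hop(const struct hop *h, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!h[i].seen)
            return (int)i;
    return -1;
}

/* Print every hop and point out where the sequence breaks. */
void report_hops(const struct hop *h, size_t n)
{
    size_t i;
    int missing = first_missing_hop(h, n);

    for (i = 0; i < n; i++)
        printf("%-5s %3d -> %3d  %s\n", h[i].token, h[i].from, h[i].to,
               h[i].seen ? "seen" : "NOT SEEN");
    if (missing >= 0)
        printf("first missing hop: %s %d -> %d\n", h[missing].token,
               h[missing].from, h[missing].to);
}
```

Calling report_hops( observed_hops, NUM_HOPS ) from a small main() lists
the four hops and flags the form2 token from 147 to 102 as the first one
missing.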

I do not understand why this happens, though: why that particular
message is lost. It does not look random, as it probably happens over
and over again.

If you like, you can make two small code changes in membership.c,
rebuild Spread, and re-run EXACTLY THE SAME SCENARIO.

Each change adds the unnumbered lines shown to membership.c, at the
position indicated by the surrounding numbered lines (which are the
existing code).

Code change 1:

1935         Net_set_membership( Future_membership );
             printf("Yair: Installing new network membership ----------->\n");
             Conf_print( Future_membership);
             printf("Yair: <-------------------------------------------->\n");
1936         FC_new_configuration( );


Code change 2:

2013         if( Conf_last( &Future_membership ) != My.id )
2014         {
2015                 Net_send_token( &send_scat );
2016                 Net_send_token( &send_scat );
                     printf("Yair: Sent form2 from Read_form2 ------------------>\n");
2017                 Token_rounds = 0;
2018
2019         }else{
2020                 /* build first regular token */
2021                 send_scat.num_elements = 1;
2022
2023                 form_token->type = 0;
2024                 form_token->seq = 0;
2025                 form_token->aru = Last_seq;
2026                 form_token->flow_control = 0;
2027                 form_token->rtr_len = 0;
2028
2029                 Net_send_token( &send_scat );
                     printf("Yair: Sent regular token from Read_form2 ---------->\n");
2030                 Token_rounds = 1;
2031         }
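
One caveat with plain printf: if the daemon's stdout is redirected to a
file or a pipe, the C library normally buffers it fully, so these lines
can show up late, or not at all if the daemon is killed. A minimal
sketch of a wrapper that flushes after every line follows; debug_trace
is a made-up name, not a Spread function, and whether you need it
depends on how you run the daemon.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of Spread: print one formatted debug
 * line to `out` and flush it immediately so it is not lost in a
 * buffer. Returns the number of characters written, as vfprintf does.
 */
int debug_trace(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);
    fflush(out);
    return n;
}
```

With this in place, each added printf( "..." ) could be written as
debug_trace( stdout, "..." ) instead. Calling fflush( stdout ) after
each printf has the same effect.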


Cheers,

	:) Yair.

On 11/7/10 9:13 AM, Luke Marsden wrote:
> Hi Yair,
> 
> Thank you. I agree 4% packet loss is high. I get quite a bit of packet
> loss when saturating the network interfaces (spsend/recv or ping -f),
> but none at all when transmitting just a small amount of traffic. Since
> in normal operation Spread shouldn't go near saturating the network
> interfaces, I agree that this is unlikely to be the cause of the
> problem. An interesting artefact of the virtualisation though.
> 
> I have rearranged the machines in the spread.conf. They are using their
> public IPs for this test, not the 10.0.0.* addresses (although they
> exhibit the same behaviour either way):
> 
> Spread_Segment 178.22.66.147:4803 {
>     2f20196c853548e7 178.22.66.147
> }
> Spread_Segment 178.22.67.102:4803 {
>     27edda570dce48bb 178.22.67.102
> }
> Spread_Segment 178.22.67.48:4803 {
>     fff0bbd5e0da4103 178.22.67.48
> }
> 
> I've added the MEMBERSHIP debug flag, and this is the output. I started
> the spread daemons from left-to-right, which now corresponds to
> top-to-bottom :-)
> 
> http://lukemarsden.net/debugging.png
> 
> Does this shed any light?
> 



