[Spread-users] Issue with Spread going silent
Yair Amir
yairamir at cs.jhu.edu
Sun Nov 7 10:08:35 EST 2010
Hi Luke,
It sheds light in the sense that I see what is happening:
- 147 and 102 are together with 147 the representative and they work well.
- 48 comes along. It finds the others and I think they correctly
discover 147 and 48 as the representatives. As 48 does not have a ring
it creates a form1 token and sends it to 147.
- 147 then sends the form1 token to 102, which is good.
- 102 then creates a form2 token and sends it to 147, which will be
the representative of the new ring. The form2 token now contains all
the information needed to form the new ring.
- 147 gets that form2 token and processes it, which is good.
- 147 is supposed to send the form2 token to 102, which will be the
daemon after 147 in the new ring.
- I do not see 102 getting that form2 token from 147, which is strange
(as it did get the form1 token from it). This is what causes the
ring to dissolve.
I do not understand why this happens, though: why that particular
message is lost. It does not look random, since it probably happens
over and over again.
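To make the sequence above concrete, here is a minimal sketch in C that writes the expected form-token hops down as data and finds the first one that is absent from the observed trace. It is an illustration only and not Spread code: the `hop` struct, the two arrays, and `first_missing_hop` are hypothetical names, and daemons are identified by the last octet of their address, as in the list above.

```c
#include <stddef.h>
#include <string.h>

/* Illustration only: none of these names exist in Spread. */
typedef struct {
    const char *type; /* "form1" or "form2" */
    int from;         /* sender, last octet of its address */
    int to;           /* receiver, last octet of its address */
} hop;

/* Hops the membership change should produce, in order. */
static const hop expected_hops[] = {
    { "form1",  48, 147 },
    { "form1", 147, 102 },
    { "form2", 102, 147 },
    { "form2", 147, 102 },
};

/* Hops that the description above reports as actually seen. */
static const hop observed_hops[] = {
    { "form1",  48, 147 },
    { "form1", 147, 102 },
    { "form2", 102, 147 },
};

/* Two hops match when token type, sender and receiver all agree. */
static int same_hop(const hop *a, const hop *b)
{
    return strcmp(a->type, b->type) == 0 &&
           a->from == b->from && a->to == b->to;
}

/* Return the index of the first expected hop that does not appear
 * anywhere in the observed trace, or -1 if every hop was seen. */
int first_missing_hop(const hop *expected, size_t n_expected,
                      const hop *observed, size_t n_observed)
{
    for (size_t i = 0; i < n_expected; i++) {
        int seen = 0;
        for (size_t j = 0; j < n_observed; j++) {
            if (same_hop(&expected[i], &observed[j])) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            return (int)i;
    }
    return -1;
}
```

With these two traces, `first_missing_hop(expected_hops, 4, observed_hops, 3)` returns 3, which is the form2 hop from 147 to 102.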
If you like, you can make two small code changes in Spread, in the file
membership.c, then rebuild Spread and re-run EXACTLY THE SAME SCENARIO.
The changes are the unnumbered lines below. The numbered lines are the
existing code in membership.c and show where each new line goes.
Code change 1:

1935    Net_set_membership( Future_membership );
        printf("Yair: Installing new network membership ----------->\n");
        Conf_print( Future_membership);
        printf("Yair: <-------------------------------------------->\n");
1936    FC_new_configuration( );

Code change 2:

2013    if( Conf_last( &Future_membership ) != My.id )
2014    {
2015        Net_send_token( &send_scat );
2016        Net_send_token( &send_scat );
            printf("Yair: Sent form2 from Read_form2 ------------------>\n");
2017        Token_rounds = 0;
2018
2019    }else{
2020        /* build first regular token */
2021        send_scat.num_elements = 1;
2022
2023        form_token->type = 0;
2024        form_token->seq = 0;
2025        form_token->aru = Last_seq;
2026        form_token->flow_control = 0;
2027        form_token->rtr_len = 0;
2028
2029        Net_send_token( &send_scat );
            printf("Yair: Sent regular token from Read_form2 ---------->\n");
2030        Token_rounds = 1;
2031    }

Cheers,
:) Yair.
On 11/7/10 9:13 AM, Luke Marsden wrote:
> Hi Yair,
>
> Thank you. I agree 4% packet loss is high. I get quite a bit of packet
> loss when saturating the network interfaces (spsend/recv or ping -f),
> but none at all when transmitting just a small amount of traffic. Since
> in normal operation Spread shouldn't go near saturating the network
> interfaces, I agree that this is unlikely to be the cause of the
> problem. An interesting artefact of the virtualisation though.
>
> I have rearranged the machines in the spread.conf. They are using their
> public IPs for this test, not the 10.0.0.* addresses (although they
> exhibit the same behaviour either way):
>
> Spread_Segment 178.22.66.147:4803 {
> 2f20196c853548e7 178.22.66.147
> }
> Spread_Segment 178.22.67.102:4803 {
> 27edda570dce48bb 178.22.67.102
> }
> Spread_Segment 178.22.67.48:4803 {
> fff0bbd5e0da4103 178.22.67.48
> }
>
> I've added the MEMBERSHIP debug flag, and this is the output. I started
> the spread daemons from left-to-right, which now corresponds to
> top-to-bottom :-)
>
> http://lukemarsden.net/debugging.png
>
> Does this shed any light?
>