[From nobody Wed Jul 13 02:52:57 2011
Message-ID: <45D0AD4E.1090907@cs.jhu.edu>
Date: Mon, 12 Feb 2007 13:09:18 -0500
From: Yair Amir <yairamir@cs.jhu.edu>
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
MIME-Version: 1.0
To: Mark Eliot <mark.eliot@sri.com>
Subject: Re: [Spread-users] Five minute timer?
References: <70BCB9E1-7C8D-4005-B00F-31242F49EB30@sri.com>
In-Reply-To: <70BCB9E1-7C8D-4005-B00F-31242F49EB30@sri.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Mark,

My guess is that the multicast address does not work well on the
isolated computer, so in effect this computer does not communicate
using the 225.0.1.1 address (assuming they all have the same config
file).

You can check this by using the spsend and sprecv programs provided
with Spread. Or you can run spmonitor and see whether there are a lot
of retransmissions by the daemon listed immediately after the isolated
machine in the config file (in cyclical order).

	:) Yair.

Mark Eliot wrote:
> I've got a Spread network of about a dozen computers. Occasionally
> (more often than I'd like), when one is rebooted or its software is
> restarted, a particular computer (not the one restarted) will lose its
> connection with the rest of the network. According to the Spread log,
> this computer thinks that it is part of its own network. The rest of
> the computers appear to stay together in the common network. The
> curious thing is that *exactly* 5 minutes after the particular computer
> partitions itself, it rejoins the common network. I've seen this
> behavior repeatedly.
>
> So, questions for the group:
>
> 1. Is there something magic about 5 minutes?
> 2. Any ideas on how I can prevent this one computer from being
> isolated, or at least minimize the time it is?
>
> Other info: The isolated computer is a Mac running OS X 10.4.6 and
> Spread 3.17.3. It has two IP nets. Public net has the Spread
> network; private doesn't.
>
> Here's the partitioning event:
>
> [Fri 09 Feb 2007 16:57:03] G_handle_trans_memb: Received trans memb id of: {proc_id: -2146303162 time: 1171069023}
> [Fri 09 Feb 2007 16:57:03] Memb_regular
> Membership id is ( -2146303162, 1171069024)
> [Fri 09 Feb 2007 16:57:03] --------------------
> [Fri 09 Feb 2007 16:57:03] Configuration at ams-server is:
> [Fri 09 Feb 2007 16:57:03] Num Segments 1
> [Fri 09 Feb 2007 16:57:03] 1 225.0.1.1 3333
> [Fri 09 Feb 2007 16:57:03] ams-server 128.18.3.70
> [Fri 09 Feb 2007 16:57:03] ====================
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: with (128.18.3.70, 1171069024) id
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb in GTRANS
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: skipping state transfer for group AlarmMonitorIf.
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: skipping state transfer for group InventoryIf.
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: skipping state transfer for group LogMonitorIf.
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: skipping state transfer for group NodeIf.
> [Fri 09 Feb 2007 16:57:03] G_handle_reg_memb: skipping state transfer for group NodeMonitorIf.
>
> Here's when the "ams-server" computer rejoins:
>
> [Fri 09 Feb 2007 17:02:03] Send_join: State is 4
> [Fri 09 Feb 2007 17:02:03] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:04] Send_join: State is 4
> [Fri 09 Feb 2007 17:02:04] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:05] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:05] Send_join: State is 4
> [Fri 09 Feb 2007 17:02:06] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:06] Send_join: State is 4
> [Fri 09 Feb 2007 17:02:07] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:07] Send_join: State is 4
> [Fri 09 Feb 2007 17:02:08] Memb_handle_message: handling join message from -2146303161, State is 4
> [Fri 09 Feb 2007 17:02:08] Memb_handle_token: handling form2 token
> [Fri 09 Feb 2007 17:02:08] Handle_form2 in FORM
> [Fri 09 Feb 2007 17:02:08] Memb_transitional
> [Fri 09 Feb 2007 17:02:08] G_handle_trans_memb:
> [Fri 09 Feb 2007 17:02:08] G_handle_trans_memb in GOP
> [Fri 09 Feb 2007 17:02:08] G_handle_trans_memb: Received trans memb id of: {proc_id: -2146303162 time: 1171069328}
> [Fri 09 Feb 2007 17:02:08] Memb_regular
> Membership id is ( -2146303162, 1171069329)
> [Fri 09 Feb 2007 17:02:08] --------------------
> [Fri 09 Feb 2007 17:02:08] Configuration at ams-server is:
> [Fri 09 Feb 2007 17:02:08] Num Segments 1
> [Fri 09 Feb 2007 17:02:08] 10 225.0.1.1 3333
> [Fri 09 Feb 2007 17:02:08] ams-server 128.18.3.70
> [Fri 09 Feb 2007 17:02:08] scs-server 128.18.3.71
> [Fri 09 Feb 2007 17:02:08] sns-server 128.18.3.72
> [Fri 09 Feb 2007 17:02:08] trs-server 128.18.3.74
> [Fri 09 Feb 2007 17:02:08] sds-server 128.18.3.75
> [Fri 09 Feb 2007 17:02:08] srs-server 128.18.3.77
> [Fri 09 Feb 2007 17:02:08] mvs-server 128.18.3.78
> [Fri 09 Feb 2007 17:02:08] rms-server 128.18.3.81
> [Fri 09 Feb 2007 17:02:08] sim-server 128.18.3.82
> [Fri 09 Feb 2007 17:02:08] sim2-server 128.18.3.84
> [Fri 09 Feb 2007 17:02:08] ====================
>
> Thanks,
> -M
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Spread-users mailing list
> Spread-users@lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
]
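
A note on reading the log excerpts quoted above. The partition log prints the same membership id once as ( -2146303162, 1171069024) and once as (128.18.3.70, 1171069024), which suggests that proc_id is the daemon's IPv4 address printed as a signed 32-bit integer. The Python sketch below works under that assumption, which is inferred from this log and not taken from Spread's documentation. The helper names proc_id_to_ip and log_time are made up for illustration. The second helper parses the bracketed timestamp so the gap between two log lines can be measured.

```python
import ipaddress
from datetime import datetime


def proc_id_to_ip(proc_id: int) -> str:
    """Reinterpret a signed 32-bit proc_id as a dotted-quad IPv4 address.

    Assumption: the id is the daemon's IPv4 address stored in a signed int,
    as the paired log lines above suggest.
    """
    return str(ipaddress.IPv4Address(proc_id & 0xFFFFFFFF))


def log_time(line: str) -> datetime:
    """Parse the '[Fri 09 Feb 2007 16:57:03]' prefix of a log line.

    Relies on English day and month names, which is the default time
    locale for a Python process that has not called locale.setlocale().
    """
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%a %d %b %Y %H:%M:%S")


if __name__ == "__main__":
    # Ids copied from the quoted log lines.
    print(proc_id_to_ip(-2146303162))
    print(proc_id_to_ip(-2146303161))

    # One line from the partition event, one from the rejoin.
    partitioned = log_time("[Fri 09 Feb 2007 16:57:03] Memb_regular")
    first_join = log_time("[Fri 09 Feb 2007 17:02:03] Send_join: State is 4")
    print((first_join - partitioned).total_seconds())
```

Under that reading, -2146303162 is 128.18.3.70 (ams-server), the join messages from -2146303161 come from 128.18.3.71 (scs-server in the config), and the first Send_join is logged 300 seconds after the partition event.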