[Spread-users] Tracking network down event [SEC=UNCLASSIFIED]

Guo, Yanchao Yanchao.Guo at sac.com
Fri Feb 19 02:27:07 EST 2010


Thanks Michael.

So, from your description, is it correct that my application will be notified only when it tries to send data? What about the receiving side?

> _____________________________________________ 
> From: 	Pilling, Michael [mailto:Michael.Pilling at dsto.defence.gov.au] 
> Sent:	Friday, February 19, 2010 9:50 AM
> To:	Guo, Yanchao; spread-users at lists.spread.org
> Subject:	RE: Tracking network down event [SEC=UNCLASSIFIED]
> 
> UNCLASSIFIED 
> 
> Yanchao, 
> 
> My understanding is that you will be notified of this event, but only when you attempt to use the network: spread reduces the network load of its failure detection algorithm by detecting failure only on use. If necessary, you could implement some kind of ping process group that sends periodic messages through spread to force early detection. This, of course, is a balancing act. If your application uses spread frequently, this may not be necessary.
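
A minimal sketch of the periodic-ping idea above, in Python. It is not tied to any particular spread binding: `send` is a hypothetical callable standing in for whatever multicast call your binding provides, and the assumption that a broken connection surfaces as an `OSError` is mine. In practice the partition may instead be reported as a membership message on the receive path; the point here is only to generate regular traffic.

```python
import time
from typing import Callable, Optional


def run_heartbeats(send: Callable[[bytes], None],
                   interval_s: float,
                   max_beats: int,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Send up to max_beats pings, pausing interval_s after each one.

    Returns the index of the first ping whose send raised OSError,
    or None if every ping was handed off without error.
    `send` is a hypothetical wrapper around the real multicast call.
    """
    for beat in range(max_beats):
        try:
            send(b"ping %d" % beat)
        except OSError:
            # Assumption: the binding reports a lost connection this way.
            return beat
        sleep(interval_s)
    return None
```

The interval is the balancing act mentioned above: a shorter interval gives earlier detection at the cost of more traffic.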
> 
> Also, it will not in any way diagnose the cause of the problem but simply tell each party that they can now only communicate with a subset of the parties they could previously communicate with, and it will say that the partitioning has been caused by a network failure, not by application instances crashing or application parties voluntarily resigning from the group.
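
A rough sketch of how an application might tell those causes apart from the service type of a membership message. The flag names follow spread's `sp.h`; the numeric values below are copied from my recollection of that header and should be treated as assumptions, so check them against your installed `sp.h` (or use your binding's constants) rather than relying on these. `membership_cause` is a hypothetical helper, not part of any spread API.

```python
from typing import Optional

# Assumed values, mirroring sp.h as I recall it; verify before use.
REG_MEMB_MESS = 0x00001000
CAUSED_BY_JOIN = 0x00000100
CAUSED_BY_LEAVE = 0x00000200
CAUSED_BY_DISCONNECT = 0x00000400
CAUSED_BY_NETWORK = 0x00000800

_CAUSES = (
    (CAUSED_BY_JOIN, "join"),
    (CAUSED_BY_LEAVE, "leave"),            # member resigned voluntarily
    (CAUSED_BY_DISCONNECT, "disconnect"),  # member's connection dropped or crashed
    (CAUSED_BY_NETWORK, "network"),        # partition or merge
)


def membership_cause(service_type: int) -> Optional[str]:
    """Return the cause of a regular membership message, or None for other messages."""
    if not service_type & REG_MEMB_MESS:
        return None
    for flag, name in _CAUSES:
        if service_type & flag:
            return name
    return "unknown"
```

Only the "network" case would trigger the reconnect or escalation logic; "leave" and "disconnect" are ordinary group changes.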
> 
> Each party would therefore get a partial view of the failure. Since the network is down by definition, it cannot be used to integrate these separate views to do strong failure location or diagnostics, although once the network is reestablished you could program your application to swap information and do fault localisation for incidents in the past. This can be useful for fault characterisation and obtaining a system reliability history.
> 
> Therefore, when one of your application processes notices a network partition, and if fast fault repair is important to you, your spread application should notify a human or some other system that can communicate by some other means to gather a more global view to locate, diagnose and correct the fault. This more global view may include information gathered locally by each spread application node. 
> 
> Once the network fault is corrected, spread will automatically notice that the network is reconnected and issue further group membership change messages, allowing the application to restart and recover as appropriate. The beauty of extended virtual synchrony is that it gives all parties a consistent view of the network failure so that the path to recovery is clear (although it depends on application semantics).
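
To make the recovery step a little more concrete, here is a small sketch that compares the member list from the previous membership message with the list from the new one, so the application can see who dropped out at the partition and who came back at the merge. `view_change` is a hypothetical helper, and the member names in the usage note are made up for illustration.

```python
from typing import Iterable, Set, Tuple


def view_change(old_members: Iterable[str],
                new_members: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (joined, left) between two successive membership views."""
    old_set, new_set = set(old_members), set(new_members)
    joined = new_set - old_set  # present now, absent before
    left = old_set - new_set    # present before, absent now
    return joined, left
```

For example, going from `["#server#hostA", "#client#hostB"]` to `["#server#hostA"]` reports the client as having left; the reverse transition after the network is repaired reports it as joined, which is where the application would resynchronise state.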
> 
> Regards,
> Michael 
> 
> 
> DSTO
> PO BOX 1500
> Att: Dr Michael Pilling
> C3ID
> Building 205
> Edinburgh SA 5111
> Ph +61 8 8259 7017
> Fx +61 8 8259 5589 
> 
> 
> Important: This document remains the property of the Australian Defence
> Organisation and is subject to the jurisdiction of the Crimes Act Section
> 70. If you have received this document in error, you are requested to
> contact the sender and delete the document.
> 
> 
> _____________________________________________
> From:   spread-users-bounces at lists.spread.org [mailto:spread-users-bounces at lists.spread.org]  On Behalf Of Guo, Yanchao
> Sent:   Friday, 19 February 2010 11:35
> To:     spread-users at lists.spread.org
> Subject:        Tracking network down event 
> 
> Hi all, 
> 
> I have a server application and a client application that communicate via the spread framework, and they are meant to run for long periods of time (>7 days). As they are connected via VPN, I am wondering whether spread is able to detect a network outage. That is, if one of the routers along the path goes down, will the members be notified of this event, so that I can start some auto-reconnect process? 
> 
> Thanks.
> Yanchao
> 

