[Spread-users] Questions about network disconnections

JL TRESSET bobsky.lists at orange.fr
Thu Mar 1 11:48:40 EST 2007

Thanks a lot for such a fast answer!

I think we are going to build a new test configuration with a simple 
router and 3 or 4 workstations. The behaviors I described were observed 
on our company "intranet" network, and we do wonder whether that network 
is the problem. Thanks again for your help: now we know what we should 
observe ;)

best regards,


John Lane Schultz wrote:
> Scenarios (1) or (3) are the proper behavior of Spread.
> In Spread, previously partitioned daemons will detect each other 
> either through traffic being sent on their segment (i.e., 
> broadcast/multicast) address or through periodic unicast probes of 
> remote daemons.
> Within a LAN, the periodic probing is very slow (e.g., once every 5 
> minutes) and the daemons rely on (re)discovering one another primarily 
> through hearing traffic on their segment address.  Spread control 
> traffic, however, doesn't usually go on the segment address.  Most 
> commonly, only user data traffic goes on the segment address.  This is 
> why traffic from a spmonitor or, more commonly, a user application can 
> "wake up" daemons to the fact that they have been reconnected at the 
> lower level.
> Scenarios (2) and (4) are obviously improper behavior.  Most likely 
> though, such behavior points to a problem in your network as Spread 
> has been heavily tested in exactly the scenario you are describing.
> Blocking does occur while the daemons are reconfiguring and 
> synchronizing.  If you have a "flaky" daemon that seems to be 
> constantly dis/connecting, which can cause the membership algorithm to 
> "churn," then this can freeze the configuration for periods of time.  
> However, this shouldn't happen in properly configured and functioning 
> LAN environments.
> Answers to your questions:
> (1) Spread was built to allow distributed applications to cleanly 
> handle network partitions and merges.  It provides strong semantic 
> guarantees and a simple interface for such events.
> (2) Yes, it is.
> (3) Not that I can see.  However, if you don't have any client traffic 
> flowing, then the daemons may remain partitioned from their point of 
> view.  The aberrant behavior you are observing (cases 2 and 4) is most 
> likely due to a flaky switch/router or NIC(s) in your network.  Also, 
> you might want to try a broadcast address to see if you get better 
> behavior as not all switches/routers do multicast properly.
> If you would like Spread to reform the daemons even when no client 
> traffic is flowing, then you could alter the daemon to send some 
> control traffic on the segment address periodically, which would 
> trigger the membership algorithm to reform (much like a user 
> application's traffic does).  If you are interested, Spread Concepts 
> offers consulting services for such projects and you can contact us at 
> info at spreadconcepts.com
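
The reply above notes that a user application's traffic can prompt the daemons to re-form. A minimal sketch of a client-side heartbeat along those lines follows. It is an assumption-laden illustration, not part of Spread: `send` is a hypothetical callable wrapping whatever Spread client binding is in use (for example, a thin wrapper around the C API's SP_multicast), and the 5-second default interval is arbitrary rather than a Spread recommendation. Whether such traffic is enough to trigger a merge on a given network still has to be verified by observation.

```python
import time


def run_heartbeat(send, interval_s=5.0, beats=None, sleep=time.sleep):
    """Call send(payload) every interval_s seconds.

    `send` is a hypothetical callable supplied by the caller; it is expected
    to multicast the payload to a Spread group. Stops after `beats` attempts
    if given, otherwise runs until interrupted. Returns the number of
    heartbeats that were sent without error.
    """
    sent = 0
    attempt = 0
    while beats is None or attempt < beats:
        attempt += 1
        try:
            send(b"heartbeat %d" % attempt)
            sent += 1
        except OSError:
            # A failed send is tolerated so that traffic resumes as soon as
            # the caller's send() starts working again.
            pass
        # Sleep between attempts, but not after the final one.
        if beats is None or attempt < beats:
            sleep(interval_s)
    return sent
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; in production the default `time.sleep` is used.
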
> Cheers!
> JL TRESSET wrote:
>> Hi,
>> we are currently trying to use Spread to build some redundancy features
>> and we are seeing some strange behavior from the Spread daemons (it seems
>> strange from our point of view, but perhaps there is nothing
>> strange, and it is only a misunderstanding on our part).
>> We use the following simple configuration file with Spread 4.0 (the
>> precompiled GNU/Linux version):
>> Spread_Segment {
>>        metaxa
>>        kebab 
>>        wasabi
>>        muffin
>> }
>> SocketPortReuse = ON
>> and we launch the spread daemon on each workstation (using ./spread -n
>> <my host name>). After a few seconds, everything seems to work on each
>> workstation and something like:
>> Configuration at kebab is:
>> Num Segments 1
>>        4        4848
>>                metaxa        
>>                kebab         
>>                wasabi        
>>                muffin        
>> ====================
>> appears on each console.
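
When repeating this test, it can help to extract the membership from each daemon's console output automatically rather than reading it by eye. The following is a minimal sketch that assumes the output looks exactly like the block quoted above (a "Configuration at <host> is:" line, a "Num Segments" line, a segment header starting with the member count, one member per line, then a row of "=" characters). The function names are made up for this illustration, and other Spread versions may print additional columns.

```python
def parse_membership(console_text):
    """Return (local_daemon, members) from the last configuration block."""
    local = None
    members = []
    remaining = 0      # member lines still expected in the current segment
    in_block = False
    for raw in console_text.splitlines():
        line = raw.strip()
        if line.startswith("Configuration at ") and line.endswith(" is:"):
            # A new block replaces anything parsed from an earlier one.
            local = line[len("Configuration at "):-len(" is:")]
            members = []
            remaining = 0
            in_block = True
        elif not in_block or not line or line.startswith("Num Segments"):
            continue
        elif line.startswith("="):
            in_block = False
        elif remaining > 0:
            # Member line: the first token is taken as the daemon name.
            members.append(line.split()[0])
            remaining -= 1
        elif line.split()[0].isdigit():
            # Segment header: the first token is the member count.
            remaining = int(line.split()[0])
    return local, members


def missing_daemons(expected, members):
    """Return the expected daemons that are absent from the membership."""
    return sorted(set(expected) - set(members))
```

For example, feeding it the console text of the unplugged workstation and calling `missing_daemons(["metaxa", "kebab", "wasabi", "muffin"], members)` would list the three daemons that workstation can no longer see.
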
>> Then we unplug the network cable from one of the stations. After a few
>> seconds, the unplugged one detects that its daemon is alone in its
>> segment (only the local workstation is listed on the console) and the
>> three others do the same (displaying a list with only three
>> workstations). After a few seconds or a few minutes, we plug the unplugged
>> station back in. We then see one of the following behaviors, occurring
>> randomly (from our point of view):
>> 1) the four workstations are in the same segment again after a very few
>> seconds
>> 2) the four workstations are in the same segment again after a very few
>> seconds, but after another short period the originally unplugged
>> workstation drops out of the segment and seems to create its "own" 
>> segment.
>> 3) the four workstations are not automatically in the same segment
>> again, but using the spmonitor tool or the sptuser sample seems to "excite"
>> them (?!?!) and the original four-station segment is recreated.
>> 4) the four workstations are never in the same segment again, even when using
>> one of the Spread tools
>> Note: sometimes the 3 remaining workstations seem to block, and
>> the return of the fourth one seems to unblock them....
>> So, a few questions about these behaviors:
>> 1) should we expect to have to handle "hard" network disconnections
>> in our own software when using Spread? (We are currently building a
>> daemon client using the Spread library API.)
>> 2) Is this case a "nominal" usage of Spread?
>> 3) Is there something we don't understand, or something we are doing wrong with
>> Spread?
>> Thanks in advance for your answers,
>> best regards,
>> JLT
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
