[Spread-users] Questions about network disconnections
JL TRESSET
bobsky.lists at orange.fr
Thu Mar 1 11:48:40 EST 2007
Thanks a lot for your very fast answer!
I think we are going to build a new test configuration with a simple
router and 3 or 4 workstations. The behaviors I described were observed
on our company "intranet" network, and we do wonder whether the problem
lies there. Thanks again for your help: now we know what we should
expect to observe ;)
best regards,
JL TRESSET - Anevia
John Lane Schultz wrote:
> Scenarios (1) and (3) are the proper behavior of Spread.
>
> In Spread, previously partitioned daemons detect each other
> either through traffic sent to their segment address (i.e., the
> broadcast/multicast address) or through periodic unicast probes of
> remote daemons.
>
> Within a LAN, the periodic probing is very slow (e.g., once every 5
> minutes), and the daemons rely on (re)discovering one another primarily
> by hearing traffic on their segment address. Spread control traffic,
> however, doesn't usually go to the segment address; most commonly,
> only user data traffic does. This is why traffic from spmonitor or,
> more commonly, a user application can "wake up" daemons to the fact
> that they have been reconnected at the lower level.
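To make the timing argument above concrete, here is a toy model in Python. It is not taken from the Spread sources: it simply assumes that a reconnected daemon re-merges at the first segment-address packet it hears or at its next unicast probe, whichever comes first, with the "once every 5 minutes" probe period quoted above as the default. The function name and parameters are hypothetical.

```python
# Toy model, NOT Spread's implementation. Assumption: a reconnected daemon
# re-merges at the first packet heard on the segment address after the link
# is restored, or at its next periodic unicast probe, whichever comes first.
from typing import Iterable, Optional

PROBE_INTERVAL_S = 300.0  # "e.g., once every 5 minutes", from the text above


def rediscovery_time(
    reconnect_at: float,
    segment_traffic_at: Iterable[float],
    probe_interval: float = PROBE_INTERVAL_S,
    probe_phase: float = 0.0,
) -> float:
    """Return the earliest time (seconds) at which the partition could heal.

    reconnect_at       -- time the cable is plugged back in
    segment_traffic_at -- times at which user data is sent on the segment
                          address (e.g. by spmonitor or an application)
    probe_interval     -- period of the unicast probes
    probe_phase        -- probes fire at probe_phase + k * probe_interval
    """
    # Index of the first probe strictly after reconnection.
    k = (reconnect_at - probe_phase) // probe_interval + 1
    next_probe = probe_phase + k * probe_interval

    # First segment traffic heard at or after reconnection, if any.
    later = [t for t in segment_traffic_at if t >= reconnect_at]
    next_traffic: Optional[float] = min(later) if later else None

    if next_traffic is None:
        return next_probe
    return min(next_probe, next_traffic)
```

For example, rediscovery_time(100.0, []) gives 300.0 (no client traffic, so the daemon waits for the probe), while rediscovery_time(100.0, [130.0]) gives 130.0, which mirrors how running a client can appear to "wake up" the daemons.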
>
> Scenarios (2) and (4) are clearly improper behavior. Most likely,
> though, such behavior points to a problem in your network, as Spread
> has been heavily tested in exactly the scenario you are describing.
>
> Blocking does occur while the daemons are reconfiguring and
> synchronizing. If you have a "flaky" daemon that keeps disconnecting
> and reconnecting, it can cause the membership algorithm to "churn",
> which can freeze the configuration for periods of time.
> However, this shouldn't happen in a properly configured and functioning
> LAN environment.
>
> Answers to your questions:
>
> (1) Spread was built to allow distributed applications to cleanly
> handle network partitions and merges. It provides strong semantic
> guarantees and a simple interface for such events.
>
> (2) Yes, it is.
>
> (3) Not that I can see. However, if no client traffic is flowing,
> the daemons may remain partitioned from their point of view. The
> aberrant behavior you are observing (cases 2 and 4) is most likely
> due to a flaky switch/router or NIC(s) in your network. Also, you
> might want to try a broadcast address to see if you get better
> behavior, as not all switches/routers handle multicast properly.
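Trying a broadcast address only requires changing the segment line of the configuration. The Python sketch below rebuilds the configuration posted further down in this thread with the subnet broadcast address in place of 239.16.0.1. It assumes, hypothetically, that all four hosts are on a /24 network (10.0.1.0/24); check the real netmask before relying on the computed address. The helper name is made up for illustration.

```python
# Sketch: render the posted spread.conf with the subnet broadcast address
# instead of the 239.16.0.1 multicast group.
# ASSUMPTION (hypothetical): all hosts sit on one /24; verify your netmask.
import ipaddress
from typing import Dict

HOSTS = {
    "metaxa": "10.0.1.48",
    "kebab": "10.0.1.46",
    "wasabi": "10.0.1.72",
    "muffin": "10.0.1.47",
}


def broadcast_segment_conf(hosts: Dict[str, str], prefix_len: int = 24,
                           port: int = 4848) -> str:
    """Build a Spread_Segment block that uses the subnet broadcast address."""
    # Derive the network from the first host; strict=False allows host bits.
    first_ip = next(iter(hosts.values()))
    net = ipaddress.ip_network(f"{first_ip}/{prefix_len}", strict=False)

    # A broadcast address only reaches hosts on the same broadcast domain.
    for name, ip in hosts.items():
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{name} ({ip}) is outside {net}")

    lines = [f"Spread_Segment {net.broadcast_address}:{port} {{"]
    lines += [f"\t{name} {ip}" for name, ip in hosts.items()]
    lines.append("}")
    lines.append("SocketPortReuse = ON")
    return "\n".join(lines)


print(broadcast_segment_conf(HOSTS))
```

Under the /24 assumption, the first line of output is "Spread_Segment 10.0.1.255:4848 {"; the helper raises ValueError if a listed host falls outside the assumed subnet.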
>
> If you would like Spread to re-form the daemon membership even when no
> client traffic is flowing, you could alter the daemon to periodically
> send some control traffic on the segment address, which would trigger
> the membership algorithm to re-form (much as a user application's
> traffic does). If you are interested, Spread Concepts offers
> consulting services for such projects, and you can contact us at
> info at spreadconcepts.com
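A lighter-weight alternative, which is a different technique from the daemon modification suggested here, is to have a client application send a small message periodically, relying on the point made above that user data traffic usually travels on the segment address. The sketch below covers only the scheduling logic; `send` is a hypothetical stand-in for the client's real multicast call (for example SP_multicast() in the Spread C library), and the interval is an assumption to tune.

```python
# Sketch of an application-level keepalive (NOT a daemon change).
# `send` is a hypothetical stand-in for the client's real multicast call;
# the payload and interval are illustrative assumptions.
import time
from typing import Callable


class Keepalive:
    def __init__(self, send: Callable[[bytes], None], interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send
        self.interval = interval
        self.clock = clock
        self.next_due = clock()  # send on the very first tick

    def tick(self) -> bool:
        """Call from the client's main loop; returns True if a message was sent."""
        now = self.clock()
        if now < self.next_due:
            return False
        self.send(b"keepalive")
        # Schedule from `now`, not from next_due, so a stalled loop does not
        # produce a burst of catch-up messages.
        self.next_due = now + self.interval
        return True
```

Calling tick() from the client's event loop keeps the send rate bounded; whether such traffic reliably triggers a merge in a given setup should be verified on that network.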
>
> Cheers!
>
> JL TRESSET wrote:
>> Hi,
>>
>> We are currently trying to use Spread to build some redundancy features,
>> and we are encountering some strange behavior from the Spread daemons (it
>> seems strange from our point of view, but perhaps there is nothing
>> strange about it, and it is due to some misunderstanding on our part).
>>
>> We use the following simple configuration file with Spread 4.0 (the
>> precompiled GNU/Linux version):
>>
>> Spread_Segment 239.16.0.1:4848 {
>> metaxa 10.0.1.48
>> kebab 10.0.1.46
>> wasabi 10.0.1.72
>> muffin 10.0.1.47
>> }
>> SocketPortReuse = ON
>>
>>
>> and we launch the Spread daemon on each workstation (using ./spread -n
>> <my host name>). After a few seconds, everything seems to work on each
>> workstation, and something like:
>> Configuration at kebab is:
>> Num Segments 1
>> 4 239.16.0.1 4848
>> metaxa 10.0.1.48
>> kebab 10.0.1.46
>> wasabi 10.0.1.72
>> muffin 10.0.1.47
>> ====================
>>
>>
>> appears on each console.
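When reproducing these scenarios on several consoles, it can help to compare the membership each daemon reports. The Python sketch below parses the banner shown above, assuming the layout is exactly as printed here (in particular, that the first number on the segment line is the count of daemons listed under it), and checks whether all consoles report one common view. The function names are hypothetical.

```python
# Sketch: parse the "Configuration at <host> is:" banner, ASSUMING the exact
# layout shown above, and check whether all consoles agree on one view.
from typing import Dict, Iterable, List, Tuple


def parse_configuration(text: str) -> Tuple[str, List[str]]:
    """Return (reporting host, sorted member names) for one banner."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    prefix, suffix = "Configuration at ", " is:"
    if not (header.startswith(prefix) and header.endswith(suffix)):
        raise ValueError(f"unexpected header: {header!r}")
    host = header[len(prefix):-len(suffix)]

    num_segments = int(lines[1].split()[-1])  # "Num Segments 1"
    i = 2
    members: List[str] = []
    for _ in range(num_segments):
        count = int(lines[i].split()[0])  # "4 239.16.0.1 4848"
        i += 1
        # Each member line looks like "metaxa 10.0.1.48".
        members += [ln.split()[0] for ln in lines[i:i + count]]
        i += count
    return host, sorted(members)


def views_by_host(banners: Iterable[str]) -> Dict[str, Tuple[str, ...]]:
    """Map each reporting host to the view in its latest banner."""
    views: Dict[str, Tuple[str, ...]] = {}
    for text in banners:
        host, members = parse_configuration(text)
        views[host] = tuple(members)  # later banners overwrite earlier ones
    return views


def fully_merged(views: Dict[str, Tuple[str, ...]]) -> bool:
    """True if every reporting daemon lists exactly the set of reporters."""
    expected = tuple(sorted(views))
    return all(view == expected for view in views.values())
```

Fed the latest banner from each of the four consoles, fully_merged() returns True when all of them show the four-station view and False while any daemon still reports a smaller view.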
>>
>>
>> Then we unplug the network cable from one of the stations. After a few
>> seconds, the unplugged daemon detects that it is alone in its
>> segment (only the local workstation is listed on the console), and the
>> three others do the same (displaying a list of only three
>> workstations). After a few seconds or a few minutes, we plug the
>> station back in. We then observe one of the following behaviors,
>> occurring randomly (from our point of view):
>>
>> 1) the four workstations are in the same segment again within a few
>> seconds
>> 2) the four workstations are in the same segment again within a few
>> seconds, but after another short period the originally unplugged
>> workstation drops out of the segment again and seems to create its "own"
>> segment.
>> 3) the four workstations are not automatically in the same segment
>> again, but running the spmonitor tool or the sptuser sample seems to
>> "excite" them (?!?!), and the original four-station segment is recreated.
>> 4) the four workstations are never in the same segment again, even when
>> using one of the Spread tools
>>
>> Note: sometimes the 3 remaining workstations seem to block, and
>> the return of the fourth one seems to unblock them...
>>
>> So, a few questions about these behaviors:
>> 1) Should our software expect to handle "hard" network disconnections
>> when using Spread? (We are currently building a daemon client using the
>> Spread library API.)
>> 2) Is this case a "nominal" use of Spread?
>> 3) Is there something we don't understand, or something we are doing
>> wrong, when using Spread?
>>
>> Thanks in advance for your answers,
>>
>> best regards,
>> JLT
>>
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users
>>
>
>