[Spread-users] Questions about network disconnections
JL TRESSET
bobsky.lists at orange.fr
Thu Mar 1 06:07:23 EST 2007
Hi,
We are currently trying to use Spread to build some redundancy features,
and we are encountering some strange behavior from the Spread daemons
(strange from our point of view, at least; perhaps nothing is actually
wrong and it is due to a misunderstanding on our part).
We use the following simple configuration file with Spread 4.0 (the
precompiled GNU/Linux version):
Spread_Segment 239.16.0.1:4848 {
metaxa 10.0.1.48
kebab 10.0.1.46
wasabi 10.0.1.72
muffin 10.0.1.47
}
SocketPortReuse = ON
We launch the Spread daemon on each workstation (using ./spread -n
<my host name>). After a few seconds, everything seems to work on each
workstation, and something like this:
Configuration at kebab is:
Num Segments 1
4 239.16.0.1 4848
metaxa 10.0.1.48
kebab 10.0.1.46
wasabi 10.0.1.72
muffin 10.0.1.47
====================
appears on each console.
Then we unplug the network cable from one of the stations. After a few
seconds, the unplugged daemon detects that it is alone on its segment
(only the local workstation is listed on its console), and the three
others do the same (each displaying a list of only three workstations).
After a few seconds or a few minutes, we plug the station back in. We
then observe one of the following behaviors, apparently at random:
1) The four workstations are in the same segment again after just a few
seconds.
2) The four workstations are in the same segment again after just a few
seconds, but after another short period the originally unplugged
workstation leaves the segment and seems to create its "own" segment.
3) The four workstations do not automatically rejoin the same segment,
but running the spmonitor tool or the sptuser sample seems to "excite"
them (?!?!), and the original four-station segment is recreated.
4) The four workstations never rejoin the same segment, even when we
use one of the Spread tools.
Note: sometimes the three remaining workstations seem to block, and the
return of the fourth one seems to unblock them.
So, a few questions about these behaviors:
1) Should we expect to handle "hard" network disconnections ourselves,
in our own software, when using Spread? (We are currently building a
daemon client using the Spread library API.)
2) Is this a "nominal" use case for Spread?
3) Is there something we misunderstand, or something we are doing wrong,
when using Spread?
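Regarding question 1: as we understand the Spread C API, a client learns about partitions and merges through membership messages delivered by `SP_receive`, so the client itself can compare each new membership with the previous one. The following is a minimal sketch of that comparison only, assuming the client keeps each membership as an array of member-name strings. `diff_views` and `view_contains` are hypothetical helpers, not part of the Spread library, and the code does not call Spread at all; the daemon names from the configuration above are reused purely for illustration (real member names are whatever the membership message reports).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the Spread API): returns 1 if `name`
   appears among the first `n` entries of `list`, 0 otherwise. */
static int view_contains(const char *name, const char *const list[], int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: compares an old and a new membership view and
   counts members that left (present only in old_view) and members that
   joined (present only in new_view). */
void diff_views(const char *const old_view[], int n_old,
                const char *const new_view[], int n_new,
                int *left, int *joined)
{
    assert(left != NULL && joined != NULL);
    *left = 0;
    *joined = 0;
    for (int i = 0; i < n_old; i++)
        if (!view_contains(old_view[i], new_view, n_new))
            (*left)++;
    for (int i = 0; i < n_new; i++)
        if (!view_contains(new_view[i], old_view, n_old))
            (*joined)++;
}
```

For the unplugging scenario described above, comparing the four-member view with a three-member view reports one member left and none joined; after the cable is plugged back in and the views merge, the reverse comparison reports one member joined.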
Thanks in advance for your answers,
Best regards,
JLT