[Spread-users] Questions about network disconnections

Thu Mar 1 06:07:23 EST 2007

Hi,

we are currently trying to use Spread to build some redundancy features,
and we are seeing some strange behavior from the Spread daemons (it seems
strange from our point of view, but perhaps it is expected and we have
simply misunderstood something).

We use the following simple configuration file with Spread 4.0 (the
precompiled GNU/Linux version):

Spread_Segment  239.16.0.1:4848 {
        metaxa          10.0.1.48
        kebab           10.0.1.46
        wasabi          10.0.1.72
        muffin          10.0.1.47
}
SocketPortReuse = ON

and we launch the spread daemon on each workstation (using ./spread -n
<my host name>). After a few seconds, everything seems to work on each
workstation, and something like:
Configuration at kebab is:
Num Segments 1
        4       239.16.0.1        4848
                metaxa                  10.0.1.48
                kebab                   10.0.1.46
                wasabi                  10.0.1.72
                muffin                  10.0.1.47
====================

appears on each console.
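
To make it easy to compare the configuration file with what each console
prints, here is a minimal Python sketch that reads the segment block shown
above. It is not part of Spread: parse_segment is a hypothetical helper
name, and the parsing only assumes the simple single-segment layout used in
this message.

```python
import re

# The configuration file quoted in this message.
CONFIG = """\
Spread_Segment  239.16.0.1:4848 {
        metaxa          10.0.1.48
        kebab           10.0.1.46
        wasabi          10.0.1.72
        muffin          10.0.1.47
}
SocketPortReuse = ON
"""


def parse_segment(text):
    """Return (address, port, {host: ip}) for the first Spread_Segment block.

    Hypothetical helper: assumes one "name ip" pair per line inside the braces.
    """
    match = re.search(r"Spread_Segment\s+([\d.]+):(\d+)\s*\{([^}]*)\}", text)
    if match is None:
        raise ValueError("no Spread_Segment block found")
    address, port, body = match.group(1), int(match.group(2)), match.group(3)
    daemons = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank lines inside the block
            daemons[fields[0]] = fields[1]
    return address, port, daemons


address, port, daemons = parse_segment(CONFIG)
print(address, port, list(daemons))
```

The host list returned here is the one every daemon should display once the
four workstations have found each other.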

Then we unplug the network from one of the workstations. After a few
seconds, the unplugged one detects that its daemon is alone in its
segment (only the local workstation is listed on the console) and the
three others do the same (displaying a list with only three
workstations). After a few seconds or a few minutes, we plug the unplugged
workstation back in. We then see one of the following behaviors, occurring
randomly (from our point of view):

1) the four workstations are in the same segment again after only a few
seconds
2) the four workstations are in the same segment again after only a few
seconds, but after another short period the originally unplugged
workstation drops out of the segment and seems to create its "own" segment.
3) the four workstations do not automatically rejoin the same segment,
but using the spmonitor tool or the sptuser sample seems to "excite"
them (?!?!) and the original four-station segment is recreated.
4) the four workstations never rejoin the same segment, even when we use
one of the Spread tools

Note: sometimes the 3 remaining workstations seem to block, and
the return of the fourth one seems to unblock them...
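
To describe these outcomes more precisely, the lists printed on the four
consoles can be compared mechanically. The following is only a sketch in
Python under our own assumptions: the views are typed in by hand from each
console, and classify and its three labels are hypothetical names, not
something Spread provides.

```python
# Daemons named in the configuration file.
CONFIGURED = {"metaxa", "kebab", "wasabi", "muffin"}


def classify(views):
    """Classify the membership lists observed on the consoles.

    views maps each daemon name to the set of daemons listed on its console.
    Returns "merged", "partitioned" or "inconsistent" (hypothetical labels).
    """
    # Every daemon lists all configured daemons: behavior 1.
    if all(view == CONFIGURED for view in views.values()):
        return "merged"
    # Clean partition: each daemon lists itself, and every daemon it lists
    # reports exactly the same list.
    consistent = all(
        name in view and all(views.get(peer) == view for peer in view)
        for name, view in views.items()
    )
    return "partitioned" if consistent else "inconsistent"


# State right after unplugging muffin, as described above.
three = {"metaxa", "kebab", "wasabi"}
unplugged = {
    "metaxa": three,
    "kebab": three,
    "wasabi": three,
    "muffin": {"muffin"},
}
print(classify(unplugged))
```

With this vocabulary, behaviors 2 and 4 both end in the "partitioned" state
even though the network is connected again.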

So, a few questions about these behaviors:
1) Should our software be expected to handle "hard" network
disconnections itself when using Spread? (We are currently building a
daemon client using the Spread library API.)
2) Is this a "nominal" use case for Spread?
3) Is there something we don't understand, or something we are doing wrong
with Spread?

Thanks in advance for your answers,

Best regards,
JLT