sdake at mvista.com
Fri Jan 6 15:15:50 EST 2006
On Fri, 2006-01-06 at 10:56 -0500, Paul Rubel wrote:
> I've been moving along with Spread. Before the new year I had some
> questions about tweaking the timeouts to achieve sub-second detection
> of failures. Lowering the parameters by a factor of 100 across the
> board seems to have done the trick. I'm able to detect failure in ~0.2
> seconds using 12 daemons in 3 LANs.
> Moving forward I'd like to use the multigroup_multicast calls. We were
> looking for something like that and lo and behold it was already there,
> thanks! I'm curious about the semantics when a message is sent to
> multiple groups. Are the messages delivered as if all the members of
> the groups were in one large group? In the multigroup case does the
> notion of the individual groups mean anything? For example, could a
> message be delivered to the members of one group while still trying to
> reach agreement for members of another or does Spread wait until all
> the members are in agreement as it would with a multicast to a single
My guess is that the group membership algorithm uses Spread's
configuration change messages, which are determined atomically at the
time a new ring is formed. But I haven't looked at the code and am
not sure how multiple segments may affect this situation. If my
assumptions are correct, all the members would be in agreement before
multicasting.
> On a related topic, when we have been measuring the detection time for
> failures it seems like the first members of a segment, as listed in
> the spread.conf, get the message before members further down in the
> segment list. We're guessing this is caused by the first daemon listed
> in each segment receiving the message/token first and then passing it
> to the others, who receive (and therefore process) it later. Is it
> correct that the ordering of daemons in the file affects the order in
> which daemons get messages?
The "relative" time at which a message is delivered to an application,
including configuration changes, may correlate strongly with the order
of processors in the ring, but it really depends on processor speed and
a lot of other external factors.
But the real answer to your question is that with agreed (or safe)
ordering, all messages (including configuration changes) are delivered
in the same order to all nodes. There is no guarantee on "when", just
on "what order". (OK, with safe ordering "when" is constrained a bit:
delivery only occurs once the message has been received by all
processors.)
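To make the order-versus-time point concrete, here is a toy Python
model. It is not Spread code and does not touch the Spread API: the
Node class, the per-daemon delays, and the simulate() function are all
made up for illustration, and the rule used for safe delivery (wait
until every node has the message, plus this node's own delay to learn
that) is a simplifying assumption, not how Spread actually does it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    delay: float  # hypothetical receive/processing delay in seconds
    delivered: list = field(default_factory=list)  # (time, seq, payload)


def simulate(nodes, messages, safe=False):
    """Deliver messages, already in their agreed order, to every node.

    Agreed: a node delivers a message once it has received it and has
    delivered everything ordered before it.
    Safe (toy assumption): a node also waits until every node has
    received the message, plus its own delay to find that out.
    """
    for node in nodes:
        node.delivered.clear()
    for seq, (sent_at, payload) in enumerate(messages):
        received = {n.name: sent_at + n.delay for n in nodes}
        all_received = max(received.values())
        for n in nodes:
            ready = all_received + n.delay if safe else received[n.name]
            # Never deliver ahead of an earlier message in the order.
            previous = n.delivered[-1][0] if n.delivered else 0.0
            n.delivered.append((max(ready, previous), seq, payload))
    return {n.name: list(n.delivered) for n in nodes}


# Three daemons with made-up delays; d1 stands in for the first daemon
# listed in a segment.
nodes = [Node("d1", 0.001), Node("d2", 0.004), Node("d3", 0.009)]
messages = [(0.0, "m0"), (0.002, "m1"), (0.003, "membership change")]

for mode in (False, True):
    label = "safe" if mode else "agreed"
    for name, log in simulate(nodes, messages, safe=mode).items():
        print(label, name, [(round(t, 4), p) for t, _, p in log])
```

In both modes every node prints the same sequence of payloads, but the
timestamps differ from node to node, and the safe timestamps are never
earlier than the agreed ones.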
It is not really possible to ensure that delivery happens at the same
time at each processor, because of relativity. Maybe someone more
clever than me could think something up, but I would guess it would
require some kind of hardware assistance.
> thank you,
> Spread-users mailing list
> Spread-users at lists.spread.org