[Spread-users] Multipath Spread #2

Jonathan Stanton jonathan at cnds.jhu.edu
Wed Oct 17 09:19:49 EDT 2001


	I hoped to get back to this sooner, but better late than never.

First, I have included a few pieces of this patch that fix bugs in the
spread cvs which will become 3.16.1. I will release one more
testing release to get platform checks on the fix for interface binding
and broadcasts.

The rest of my comments are inline.

On Tue, Oct 09, 2001 at 12:16:39PM +0200, Marc Zyngier wrote:
> Hello all,
> This is the second release of the multipath patch for Spread
> 3.16.1beta1, and the first actually doing something useful. The
> purpose of this patch is to send Spread traffic on multiple networks, to
> support network failures in a high availability environment.
> As of this version, Spread traffic (both broadcasts and token
> circulation) is replicated on all networks. There is no code to
> handle primary/backup links yet, although we already introduced some
> state information in this patch.   
Have you tried this with multicast addresses and not broadcast? Multicast
has some 'not always useful' behavior with multi-homed machines, and I was
curious if this approach worked there also. I think it should be fine once
the routes are set right. (Some OSs only send multicasts out one
interface on a multi-homed machine -- a way around this is to bind the
source socket to the interface to force a send out of that interface as
well.)

> What's new since last week :
> * Broadcast and bound socket don't mix...
> Among the new things introduced in 3.16.1beta1, the way broadcast
> reception sockets are bound have changed. If a specific interface is
> used for daemon communication, both broadcast and token sockets are
> bound to this interface, which is quite nice. Unfortunately, some OSs
> don't deliver broadcast messages to a UDP socket bound to anything
> but INADDR_ANY (W2K is ok, but neither Linux 2.4.x nor Solaris 8 are
> delivering broadcasts). To work around this, we now have both an
> INADDR_ANY bound socket to receive broadcasts, and a per-interface
> socket.

Thanks for discovering this. This is obviously a general problem and one
of the things I'm including a fix for. It turns out after a bit of poking
at it that the cleanest way to fix this is to bind sockets that should
receive broadcasts to the 'broadcast' address of the interface as well as
the unicast address. So for a network interface with a unicast address
and a broadcast address, we bind one socket to the unicast address and a
second socket to the broadcast address. The second socket will receive
broadcasts and the first will receive unicasts. It appears that the same
approach works for multicasts
based on my local tests here.

> * Avoiding late token...
> One of the problems that bite us was Spread crashing under high load 
> with a message like :
> Answer_retrans: retrans of 8253 requested while Aru is 8411
> This happens when two networks are *very* out of sync, and an old
> token carrying retransmission requests for messages we don't have
> anymore comes in. Too bad. To reduce the window of opportunity for
> this to happen, we did two things:
>       * Expand the ARQ field from 4 to 8 bits,
>       * Tighten the way tokens are accepted when proc is not leader.
> With these modifications, our test machines, which would usually crash
> before processing 10000 messages, survived a 32 million packets test
> tonight.
> Note that this race condition still exists. It is *very* unlikely to
> happen now, but could happen under very specific conditions.

By "tightening up", do you mean this line from the protocol.c patch?

@@ -466,7 +469,7 @@
-		if( Get_arq(Token->type) == Get_arq(Last_token->type) )
+		if( Get_arq(Token->type) != ((Get_arq(Last_token->type)+1)%0x100) )
 		    if( Get_retrans(Token->type) > Get_retrans(Last_token->type) )
So you drop into the resend or swallow case if the new token is not the
'expected' one, instead of if it is the same as the one we already
received? Was the problem that tokens 'older' than the previous one were
being accepted?

> * Configuration simplification
> As per Yair Amir suggestion, configuration is a little bit simpler
> (saves some braces...). This is the same configuration that was in the
> first patch :
> Spread_Segment mp:4803
> {
>       {
>               foo   D
>               bar
>               fubar
>       }
>       {
>               foo
>               fubar
>       }
>       foo C
> }

Without the {} is nicer, I agree. One thing that is somewhat confusing, I
think, is that the foo client interface specification is not near the rest
of the foo specifications. I haven't come up with a better format, but I'm
thinking about it. Somehow it should capture that each host is on multiple
networks, and allow the multiple broadcast addresses to be recorded.

> * No more DL_send messing...
> Most of data_link.[ch] hacks have been removed, and multipath.[ch] now
> acts as a layer between network.[ch] and data_link.
This is nicer.

Some other feedback.

I applied your fixes to the memset bugs in groups.c. Thanks.

You added or changed a number of uses of IP macros to inet_ntoa. I think
this is not the right direction for a few reasons. The reason we do not
use inet_ntoa() in Alarms or other places is because it uses a static
buffer to store the returned string, so if you use it multiple times in
one alarm, then all of the printed values are the same and equal whoever
was last. So even though your uses of it are only a single use per Alarm,
it is easy to forget the problem and someone later adds another IP address
to be printed and then you do not get correct values. So it always seemed
too dangerous to use this way. I definitely think there probably is a
better way than the IP1 macros, but I don't think inet_ntoa is the way.

Why did you change the interface to DL_send to use in_addr instead of
directly passing the address? This also requires more includes
(data_link.h) and a more complex interface for apps using data_link (more
than Spread uses it). I know that using in_addr is more direct sockets
programming, but part of the idea of the DL layer was to expose a slightly
simpler and abstracted interface that did not require using native socket
structures.

Why did monitor need to use the mp send? Shouldn't sending the monitor
commands on one network be sufficient, maybe with a choice of which one to
use? Then if one fails you can pick the other, but you do not duplicate
all of the monitor processing (generating status messages twice, doing
commands twice, ...).

I hope this helps, and if you have any comments on the changes I applied
to fix stuff in CVS, just tell me.


Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
