[Spread-users] Multipath Spread #2

Mon Oct 22 07:46:52 EDT 2001

>>>>> "JS" == Jonathan Stanton <jonathan at cnds.jhu.edu> writes:

Hi Jonathan,

JS> Have you tried this with multicast addresses and not broadcast?
JS> Multicast has some 'not always useful' behavior with multi-homed
JS> machines and I was curious if this approach worked there also. I
JS> think it should be fine once the routes were set right. (Some OSes
JS> only send multicasts out one interface in a multi-homed machine --
JS> a way around this is to bind the source socket to the interface to
JS> force a send out of that interface as well)

We haven't tested multicast yet. This will at least need one send
socket per network, bound to the right interface, instead of a global
send socket. This is the next (planned) step...
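
To make the "one send socket per network" idea concrete, here is a
minimal sketch in C. It is not code from our patch: the helper name
open_send_socket() is hypothetical, and using IP_MULTICAST_IF in
addition to bind() is my assumption about what would be needed on
systems that pick a single outgoing interface for multicast.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP send socket tied to one local
 * interface, identified by its dotted-quad address.
 * Returns the socket descriptor, or -1 on error. */
int open_send_socket(const char *local_ip)
{
    struct sockaddr_in local;
    struct in_addr ifaddr;
    int s;

    if (inet_pton(AF_INET, local_ip, &ifaddr) != 1)
        return -1;                      /* not a valid IPv4 address */

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr = ifaddr;
    local.sin_port = htons(0);          /* any source port */

    /* Bind the source address, as suggested in the quoted text. */
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(s);
        return -1;
    }

    /* Assumption: also ask for multicasts to leave via this interface. */
    if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF,
                   &ifaddr, sizeof(ifaddr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

The caller would keep one such descriptor per configured network and
send each packet once on every descriptor.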

JS> It turns out after a bit of poking at it that the cleanest way to
JS> fix this is to bind sockets that should receive broadcasts to the
JS> 'Broadcast' address of the interface as well as the unicast
JS> address. So for network with broadcast 10.0.1.255 and interface
JS> 10.0.1.5, we bind a socket to 10.0.1.5 and a second socket to
JS> 10.0.1.255. The second socket will receive broadcasts and the
JS> first will receive unicasts. It appears like the same approach
JS> works for multicasts based on my local tests here.

Nice idea. It doesn't work on Windows, though (on W2K, bind fails on
a non-local address), but we can work around this.
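
For reference, the two-socket scheme described above could look
roughly like the sketch below. The helper names broadcast_of() and
open_recv_socket() are hypothetical and not taken from Spread; the
code only shows the bind pattern. On a platform that refuses to bind
to the broadcast address (W2K, as noted), open_recv_socket() simply
returns -1 and the caller has to fall back to something else.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Directed broadcast address for an interface address and netmask.
 * Both arguments and the result are in network byte order. */
in_addr_t broadcast_of(in_addr_t addr, in_addr_t mask)
{
    return addr | ~mask;
}

/* Hypothetical helper: open a UDP socket bound to exactly one
 * address and port. Returns the descriptor, or -1 on error. */
int open_recv_socket(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    int on = 1;
    int s;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    /* Lets the unicast and broadcast sockets share the same port. */
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    if (bind(s, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

With the example from the quoted text, one socket would be opened on
10.0.1.5 for unicasts and a second one on 10.0.1.255 for broadcasts,
both on the same port.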

JS> By tightening up you mean this line from the protocol.c patch?

JS> @@ -466,7 +469,7 @@
JS>  		    return; 
JS>  		}
JS>  	}else{
JS> -		if( Get_arq(Token->type) == Get_arq(Last_token->type) )
JS> +		if( Get_arq(Token->type) != ((Get_arq(Last_token->type)+1)%0x100) )
JS>  		{
JS>  		    if( Get_retrans(Token->type) > Get_retrans(Last_token->type) )
JS>  		    {

yes.

JS> So you drop into the resend or swallow case if the new token is
JS> not the 'expected' one instead of if it is the same as the one we
JS> already received? Was the problem that tokens 'older' than the
JS> previous one were appearing?

Yes, exactly. If you have multiple networks, they can be very out of
sync (congestion, for example). In such a situation, you can easily
receive a token with ARQ=8 on an interface, and an old token with
ARQ=1 on another. This Is Bad(tm).

This stricter test, together with a wider ARQ field, gave us a system
that stays up under very high load instead of crashing after a few
thousand messages.
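
The stricter test from the quoted hunk can be written as a small
standalone predicate. This is only an illustration: the function name
is_expected_arq() and the ARQ_MODULUS constant are hypothetical, and
the modulus simply mirrors the % 0x100 in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARQ_MODULUS 0x100u   /* mirrors the "% 0x100" in the patch */

/* True only when new_arq is the direct successor of last_arq,
 * with wrap-around at ARQ_MODULUS. Anything else (the same value,
 * an older value, a jump ahead) goes to the resend-or-swallow path. */
bool is_expected_arq(uint32_t last_arq, uint32_t new_arq)
{
    return new_arq == (last_arq + 1u) % ARQ_MODULUS;
}
```

With the numbers used above: after a token with ARQ=8 has been seen,
a late token with ARQ=1 arriving on the other interface is not the
expected successor, so it is no longer processed as a fresh token.
The old equality test let it through because 1 != 8.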

JS> You added or changed a number of uses of IP macros to inet_ntoa. I
JS> think this is not the right direction for a few reasons. The
JS> reason we do not use inet_ntoa() in Alarms or other places is
JS> because it uses a static buffer to store the returned string, so
JS> if you use it multiple times in one alarm, then all of the printed
JS> values are the same and equal whoever was last. So even though
JS> your uses of it are only a single use per Alarm, it is easy to
JS> forget the problem and someone later adds another IP address to be
JS> printed and then you do not get correct values. So it always
JS> seemed too dangerous to use this way. I definitely think there
JS> probably is a better way than the IP1 macros, but I don't think
JS> inet_ntoa is the way.

Fair enough. Part of this was because of the int/in_addr switch (see
below).
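
One way to avoid the static-buffer problem while still getting
readable addresses is to format each address into a buffer owned by
the caller, for example with inet_ntop(). The sketch below is only an
illustration; addr_to_str() and format_pair() are hypothetical names
and are not part of Spread or of the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: format addr into a caller-supplied buffer.
 * Returns buf, which holds an empty string if formatting failed. */
const char *addr_to_str(struct in_addr addr, char *buf, size_t len)
{
    if (inet_ntop(AF_INET, &addr, buf, (socklen_t)len) == NULL) {
        if (len > 0)
            buf[0] = '\0';
    }
    return buf;
}

/* Two addresses in one format call. Each one has its own buffer,
 * so the second conversion cannot overwrite the first. */
int format_pair(struct in_addr a, struct in_addr b,
                char *out, size_t outlen)
{
    char abuf[INET_ADDRSTRLEN];
    char bbuf[INET_ADDRSTRLEN];

    return snprintf(out, outlen, "from %s to %s",
                    addr_to_str(a, abuf, sizeof(abuf)),
                    addr_to_str(b, bbuf, sizeof(bbuf)));
}
```

The same pattern would apply to an Alarm call that prints several
addresses: one local buffer per address, passed explicitly.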

JS> Why did you change the interface to DL_send to use in_addr instead
JS> of directly passing the address? This also requires more includes
JS> (data_link.h) and a more complex interface for apps using
JS> data_link (more than just Spread uses it). I know that using in_addr is
JS> more direct sockets programming, but part of the idea of the DL
JS> layer was to expose a slightly simpler and abstracted interface
JS> that did not require using native sockets types.

If data_link was supposed to be transport agnostic, why use IP
addresses at this level? Or am I misunderstanding something?

At least having a specific type for an address would be nice
(net_addr_t, or anything like that). This would help a lot if someone
wants to switch to another transport (say IPv6 for example).
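
To illustrate what I mean by a specific address type, here is a rough
sketch. Only the name net_addr_t comes from the paragraph above; the
layout and the net_addr_parse() helper are assumptions made up for
this example, not a proposal for the actual data_link interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical opaque-ish address type: callers pass net_addr_t
 * around and never touch in_addr or in6_addr directly. */
typedef struct {
    int family;                 /* AF_INET or AF_INET6 */
    union {
        struct in_addr  v4;
        struct in6_addr v6;
    } u;
} net_addr_t;

/* Parse a textual address. Returns 0 on success, -1 on failure. */
int net_addr_parse(const char *text, net_addr_t *out)
{
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, text, &out->u.v4) == 1) {
        out->family = AF_INET;
        return 0;
    }
    if (inet_pton(AF_INET6, text, &out->u.v6) == 1) {
        out->family = AF_INET6;
        return 0;
    }
    return -1;
}
```

Code above the DL layer would then only see net_addr_t, and adding
another transport would mean extending the union rather than changing
every caller.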

JS> Why did monitor need to use the mp send? Shouldn't sending the
JS> monitor commands on one network be sufficient, maybe with a choice
JS> of which one to use? Then if one fails you can pick the other, but
JS> you do not duplicate all of the monitor processing
JS> twice. (generating status messages twice, doing commands twice,
JS> ...)

Yep, that would be nice. But the question still remains: how do you
detect that sending has failed?

JS> I hope this helps, and if you have any comments on the changes I
JS> applied to fix stuff in CVS, just tell me.

I'll try to make an updated patch tomorrow, and will keep you posted
about it.

Thanks a lot for your comments.

Marc Zyngier
Evidian - SafeKit Project
http://www.evidian.com
-- 
And don't forget you'll never get a dog to walk upright
Just 'cause you've got the power, that don't mean you've got the right.