[Spread-users] Multipathing in spread (how stable is the patch)

Kashif Shaikh kshaikh at consensys.com
Thu Dec 30 16:43:23 EST 2004


Hi everyone,

I plan to use Spread to implement high-availability services, but I need a
feature that lets Spread use multiple NICs on a host and switch over to a
backup NIC if the primary NIC fails.

I see the patch, multipath_3.16.1-5.tar.gz, by Marc Zyngier. Is this patch
stable, and will it meet my HA requirement? It is a two-year-old patch
(2002), so why hasn't it been committed to CVS yet?

The other way I tried to support multipathing was to start one spread daemon
per interface using the following configuration (note the unique port for
each daemon). member1 is one machine, and member2 is another.

Spread_Segment 192.168.1.255:4803 {
       member1_primary_net 192.168.1.105
       member2_primary_net 192.168.1.106
}

Spread_Segment 192.168.1.255:4813 {
       member1_backup_net 192.168.2.105
       member2_backup_net 192.168.2.106
}

Then on member1, do this:

spread -n member1_primary_net &
spread -n member1_backup_net &

And likewise on member2:

spread -n member2_primary_net &
spread -n member2_backup_net &

Finally, I start spuser on both members, connect to port 4803 (or 4813),
and join a group. Everything runs fine (I can send messages). Then I pull
the network cable on member2's primary network, and it doesn't work as
expected:

On member1 I get:
============================
Received REGULAR membership for group kashif with 1 members, where I am
member 0:
        #user#member1_primary_net
grp id is -1062731415 1104443100 1
Due to NETWORK change. VS set has 1 members:
        #user#member1_primary_net

User>


On member2 I get:
============================
Received REGULAR membership for group kashif with 1 members, where I am
member 0:
        #user#member2_primary_net
grp id is -1062731414 1104443118 1
Due to NETWORK change. VS set has 1 members:
        #user#member2_primary_net

User>
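
An aside on reading the two outputs above: the first number on each "grp id"
line looks like the daemon's IPv4 address printed as a signed 32-bit integer,
and the second looks like a Unix timestamp. That reading is an assumption
drawn from the values themselves, not something the spuser output states. A
small Python sketch to decode them (the function names are made up for
illustration):

```python
import ipaddress
from datetime import datetime, timezone


def decode_daemon_ip(signed_id: int) -> str:
    """Interpret a signed 32-bit integer as a dotted-quad IPv4 address."""
    # Masking maps the negative value onto its unsigned 32-bit equivalent.
    return str(ipaddress.IPv4Address(signed_id & 0xFFFFFFFF))


def decode_timestamp(seconds: int) -> str:
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# The (first, second) fields of the two "grp id" lines shown above.
for ip_field, time_field in [(-1062731415, 1104443100),
                             (-1062731414, 1104443118)]:
    print(decode_daemon_ip(ip_field), decode_timestamp(time_field))
```

Under that reading, the two ids decode to 192.168.1.105 and 192.168.1.106,
which would mean each primary-network daemon now heads its own one-member
view.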

So now both sides see a partition (and I can't send SAFE messages). I think
this is happening because the member2_primary_net spread process can't
forward packets via the member2_backup_net spread process. The only way I
can think of to solve this is at the application level: have spuser connect
to both 4803 (the primary spread daemon) and 4813 (the backup spread daemon).

If there is a NETWORK change on the primary connection, I will send all
future messages over the backup spread connection. Does anyone think this is
workable, or will the idea break down?
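
To make the idea concrete, here is a minimal Python sketch of the switching
logic only. It does not use a real Spread client library:
FakeDaemonConnection, FailoverSender, on_membership, and the
expected_members parameter are all hypothetical names invented for this
illustration, and the "NETWORK" cause string simply mirrors the wording in
the spuser output.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDaemonConnection:
    """Hypothetical stand-in for one client connection to a spread daemon."""
    name: str                                  # e.g. "4803" or "4813"
    sent: list = field(default_factory=list)   # messages "sent" so far

    def multicast(self, group: str, payload: str) -> None:
        # A real client would hand the message to the daemon here.
        self.sent.append((group, payload))


class FailoverSender:
    """Sends on the first connection until it reports a NETWORK change."""

    def __init__(self, connections):
        if not connections:
            raise ValueError("at least one connection is required")
        self._connections = list(connections)
        self._active = 0

    @property
    def active(self) -> FakeDaemonConnection:
        return self._connections[self._active]

    def on_membership(self, connection, cause: str,
                      member_count: int, expected_members: int) -> None:
        """Call this for every membership message received on a connection."""
        lost_peers = cause == "NETWORK" and member_count < expected_members
        has_backup = self._active + 1 < len(self._connections)
        if connection is self.active and lost_peers and has_backup:
            self._active += 1   # all future sends go to the next connection

    def send(self, group: str, payload: str) -> None:
        self.active.multicast(group, payload)


primary = FakeDaemonConnection("4803")
backup = FakeDaemonConnection("4813")
sender = FailoverSender([primary, backup])

sender.send("kashif", "before the cable pull")
# The primary connection reports a one-member view caused by NETWORK change.
sender.on_membership(primary, "NETWORK", member_count=1, expected_members=2)
sender.send("kashif", "after the cable pull")
```

Things this sketch leaves open: every member has to receive on both
connections, the two members may switch at different moments, and any
ordering or SAFE delivery guarantee applies per daemon, not across the
primary and backup connections.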

Kashif Shaikh