[Spread-users] MAX_PROCS_[RING/SEGMENT]

Steven Dake scd at broked.org
Wed Feb 1 03:47:24 EST 2006


On Fri, 2006-01-20 at 12:48, Matthew Gillen wrote:
> Hi,
> I was looking at the MAX_PROCS_RING and MAX_PROCS_SEGMENT #define's in
> spread_params.h, and it looks like they are only used to initialize the
> size of a couple arrays.
> 
> Is there any reason to believe that things would blow up if I bumped
> that number up past 128?  Or is that number more of a memory-usage
> optimization?
> 

Most ring protocols really don't scale past 16 nodes without the use of
intelligent forwarding gateways (which Spread implements through
segments).  The reason is that the network has a fixed amount of
bandwidth, say 100 Mbit/s.  With 16 processors, each processor gets
100/16 Mbit/s of that bandwidth; with 128 processors, each processor
gets only 100/128 Mbit/s.  The flow control protocol has some impact on
how this works in a practical sense, but longer delays in message
delivery will occur with larger rings (equal to 1/2 the token rotation
time for agreed messages).  Fortunately the flow control rotates around
the ring, providing some fairness.
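
As a rough illustration of the arithmetic above, here is a minimal
Python sketch.  It assumes an even split of the link among ring members
and models one token rotation as (number of processors) x (time per
hop).  The hop_ms value is a made-up illustrative number, not a Spread
measurement, and both function names are hypothetical.

```python
def per_processor_mbit(link_mbit: float, processors: int) -> float:
    """Even share of the link bandwidth for each ring member."""
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return link_mbit / processors


def agreed_delay_ms(processors: int, hop_ms: float) -> float:
    """Half of one token rotation.

    Assumption: one rotation takes processors * hop_ms, where hop_ms
    is an illustrative per-hop token time.
    """
    return processors * hop_ms / 2


if __name__ == "__main__":
    for n in (16, 128):
        # Link speed of 100 Mbit/s as in the text; 0.5 ms per hop is assumed.
        print(n, per_processor_mbit(100, n), agreed_delay_ms(n, 0.5))
```

Under these assumptions, going from 16 to 128 processors cuts each
processor's share by a factor of 8 and lengthens the delay by the same
factor.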

I have found that with some broken switches, multicast UDP messages
larger than the MTU may never be delivered to some processors.  The
membership protocol still behaves correctly, but it goes through many
reconfigurations while forming a ring and rejects some processors from
the ring.  Then when these processors see a normal multicast from an
external ring, they try to form a new configuration, which triggers
another round of the membership protocol.  It is best to keep all
membership messages, including the commit token, under the MTU size
unless you are sure your switches work properly.  Most of these broken
switches are new switchcore designs on ATCA switch blades rather than
common 100mbit/gige switches.  If you have a private hub'ed network, you
shouldn't have this problem, but non-private hubbed networks can fall
prey to collisions causing the above problems.  Of course, the Spread
(or Totem, or other ring protocols based on the work of Dr. Amir et
al.) membership protocol handles this hardware problem in a
proven-correct way, but the resulting behavior may be undesirable for
your application.

The number of processors in a ring directly impacts the size of the
membership multicasts, especially the commit token.
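
To see how ring size could push a commit token past the MTU, here is a
hedged Python sketch.  The token layout (a fixed header plus a fixed
number of bytes per member) and the 64-byte and 16-byte figures are
assumptions for illustration only, not Spread's actual wire format.
The 1500-byte Ethernet MTU and the 28 bytes of IPv4 + UDP headers are
standard values.

```python
ETHERNET_MTU = 1500      # typical Ethernet MTU, in bytes
IPV4_UDP_OVERHEAD = 28   # 20-byte IPv4 header (no options) + 8-byte UDP header


def commit_token_bytes(processors, header_bytes=64, per_member_bytes=16):
    """Hypothetical token size: fixed header plus one entry per member."""
    return header_bytes + processors * per_member_bytes


def fits_in_one_datagram(size, mtu=ETHERNET_MTU):
    """True if a UDP payload of `size` bytes avoids IP fragmentation."""
    return size <= mtu - IPV4_UDP_OVERHEAD


def max_processors(mtu=ETHERNET_MTU, header_bytes=64, per_member_bytes=16):
    """Largest ring whose hypothetical token still fits under the MTU."""
    return (mtu - IPV4_UDP_OVERHEAD - header_bytes) // per_member_bytes


if __name__ == "__main__":
    for n in (16, 128):
        size = commit_token_bytes(n)
        print(n, size, fits_in_one_datagram(size))
    print("largest ring that fits:", max_processors())
```

With these made-up sizes a 16-processor token fits comfortably while a
128-processor token does not; the real limit depends on the actual
message format in your Spread version.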

Regards
-steve

> Thanks,
> Matt
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users




