[Spread-users] Spread scaling

Alex Bligh alex at alex.org.uk
Tue Nov 4 04:28:41 EST 2008


This is the first of five (possibly newbie - sorry) questions about Spread.
I've split them up into separate emails for ease of reply and to
help others search.

1. The environment I am evaluating Spread for will have a number of
   machines that may reach the low single-digit thousands. Life would be
   greatly simplified if these were all within a single domain. There
   will be a maximum of 10 clients on each machine. Connectivity between
   machines is gigabit and uncontended, and I am not that fussed about
   total latency (under a second would be fine), though obviously the
   faster the better.

   Spread's documentation (possibly outdated) and a mailing list
   entry here:
    <http://www.spread.org/pipermail/spread-users/2002-June/000823.html>
   say there is a hard limit of 127/128 machines at the protocol
   level (i.e. I can't just adjust a compile-time constant). Is that
   still correct in 4.0 given the dynamic membership stuff?

   My environment is homogeneous (i.e. I cannot use several unconnected
   instances of Spread). I cannot run Spread daemons on only a subset of
   machines, with each node connecting to just one of them, because
   the whole point is to detect machine/process failure, and the death
   of the machine running the Spread daemon would be too catastrophic.
   I could, in theory, somehow run Spread on say 5% of the machines, and
   make each machine try to connect to 3 of them, failing over. However,
   I am not convinced this would still result in reliable delivery and
   no message disordering. I could also, in theory, run (say) 20 separate
   instances of Spread and construct an "interdomain router" (to borrow
   some old terminology) to route messages between them. Doing this
   reliably (e.g. coping with the failure of a router means having
   more than one of them) sounds horribly complicated, and like it would
   be harder than reengineering Spread.

   Any advice?

2. Assuming I can fix (1), there also seems to be a limit of 127 machines
   per segment. All machines will be on the same switched L2 gigabit
   Ethernet fabric. Of course I can arbitrarily divide this into segments
   of maximum size 127 for Spread - is this sufficient? Or has this limit
   been raised too / can I lift it?
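
As a rough illustration of the sizing in (1) and (2), here is a small
sketch in Python. The fleet size of 3000, the machine names and the helper
names are assumptions made up for the example; the 127 limit and the 5%
figure are the ones quoted above, and whether that limit still applies in
4.0 is the open question.

```python
MAX_PER_SEGMENT = 127   # per-segment limit quoted in (2); may be out of date
DAEMON_PERCENT = 5      # "say 5% of the machines" from (1)


def split_into_segments(machines, limit=MAX_PER_SEGMENT):
    """Cut a list of machines into consecutive chunks of at most `limit`."""
    return [machines[i:i + limit] for i in range(0, len(machines), limit)]


def daemons_needed(total, percent=DAEMON_PERCENT):
    """Number of daemon machines at `percent` of the total, rounded up."""
    return (total * percent + 99) // 100


# Assumed fleet: 3000 machines with made-up names.
machines = [f"node{i:04d}" for i in range(3000)]
segments = split_into_segments(machines)

print(len(segments))                   # number of segments
print(len(segments[-1]))               # size of the last, partial segment
print(daemons_needed(len(machines)))   # daemon machines at 5%
```

With these assumed numbers the fabric would be cut into 24 segments (the
last holding 79 machines), and the 5% option in (1) would need about 150
daemon machines, which is itself above the 127/128 figure if that limit
still holds.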

Alex







