[Spread-users] Spread performance with lots of nodes, in a particularly interesting topology

Fri Nov 9 09:27:43 EST 2007

OK. As I've mentioned lately, I'm involved in developing a snazzy
distributed database, using Spread for reliable multicast.

The next step in the development of this thing is going to be data
partitioning: storing different bits of data on different groups of
servers, so that we don't have the scaling bottleneck on writes of
every server having to do every write.

This means that our Spread setup will start to look like this: every
node joins a few low-traffic global groups (metadata updates, a global
once-per-second heartbeat, and that sort of thing), and then there
will be lots of Spread groups that are only joined by processes on a
small number of nodes. But all the nodes will be *sending* update
messages to these groups.
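
To make the layout concrete, here is a minimal sketch of how a sender
might pick the group for a piece of data. Everything in it is an
assumption for illustration: the partition count, the "part-NNNN"
naming scheme, the global group names and the hash choice are all
hypothetical, and it does not call the Spread API at all.

```python
import hashlib

# Hypothetical figures and names, chosen only for illustration.
NUM_PARTITIONS = 64
GLOBAL_GROUPS = ["meta", "heartbeat"]  # low-traffic groups joined by every node


def group_for_key(key: str) -> str:
    """Map a data key to the name of the partition group that stores it.

    Uses a stable hash so that every sender, on any node, computes the
    same group name for the same key.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    partition = int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
    return f"part-{partition:04d}"


if __name__ == "__main__":
    # Any node can work out where to send an update, member or not.
    print(group_for_key("user:42"))
```

The point is only that the sender needs the key-to-group mapping, not
group membership, which is why every node ends up sending to groups it
has not joined.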

E.g., we might move towards having fifty servers. Relatively little
traffic is broadcast across all of them; most of it will be
targeting Spread groups only held on four or five servers at a time,
but originating all over the place.
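
As a back-of-envelope check on why this helps with writes, here is a
rough sketch. The fifty servers and the five-member groups come from
the example above; the cluster-wide write rate is a made-up figure,
and the model assumes writes are spread evenly over the partitions,
which real traffic may well not be.

```python
NODES = 50                 # "fifty servers"
REPLICAS_PER_GROUP = 5     # "four or five servers" holding each group
WRITES_PER_SEC = 10_000    # hypothetical cluster-wide write rate


def writes_applied_per_node(nodes: int, replicas: int, total_writes: int) -> float:
    """Average writes per second each node must apply.

    Each write is applied by `replicas` nodes, and the load is assumed
    to be spread evenly over all `nodes`.
    """
    return total_writes * replicas / nodes


# Every server does every write (the current bottleneck).
full_replication = writes_applied_per_node(NODES, NODES, WRITES_PER_SEC)

# Each write only reaches the handful of servers in its group.
partitioned = writes_applied_per_node(NODES, REPLICAS_PER_GROUP, WRITES_PER_SEC)

if __name__ == "__main__":
    print(full_replication, partitioned)
```

Under those assumptions the per-node write load drops by a factor of
NODES / REPLICAS_PER_GROUP. It says nothing about the cost to Spread
itself of routing and ordering those messages, which is the real
question here.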

Oh, and this is spread over many Spread segments. We currently have
one segment each in London and Dublin, joined by a dedicated link
with ~10ms round trip time, but the number of segments may rise as we
gain more traffic: initially with more segments in the same
datacentres, but before long, more geographically dispersed ones, too.

So, does anyone have any idea how well Spread will handle this? :-)

Thanks,

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4