[Spread-users] Large Objects and TCP vs. Spread (was Public CVS?)

Wed Sep 19 00:54:21 EDT 2001

On Tue, Sep 18, 2001 at 09:14:08PM -0700, Sean Chittenden wrote:
> In any case, I've got an account, it took 15min to have it setup and am
> working away happily with it.  My thoughts on anon CVS are known, but
> now that I'm far enough along that I need spread working, I caved and am now
> chunking away getting a distributed persistence layer/module/interface
> working for ruby.  That said, here's a technical question re: spread.

I assume you have seen the Ruby interface that George Schlossnagle wrote
a while ago. I don't know if it is current.
> 
> How well does spread work for large objects?  I'm wondering whether or
> not I should have the spread session managers open up a TCP port for
> large blobs of data.  Has anyone done any benchmarking to figure out at
> what size a piece of data should be sent over the wire via TCP vs
> spread/UDP?  I'd think that latency, throughput of the network, and the
> size of the messages will make a difference, but am wondering if there's
> a decent formulaic way of having this dynamically determined based off of
> the parameters in the spread daemon.  Any thoughts?

Well, probably the first main issue is whether the data needs to be
multicast (multiple receivers) or only transferred to one other node. If
you are doing some sort of state transfer and it only goes to one other
node, then TCP might be a better choice, though that depends on other
parameters. However, if the large object needs to be multicast, TCP will
rapidly lose out because sending 'n' separate copies uses 'n' times the
bandwidth.
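
As a rough illustration of that bandwidth argument, here is a
back-of-the-envelope sketch in Python. The function names are made up for
this example, and it assumes the network delivers one multicast copy to
every receiver; headers, acknowledgements and retransmissions are ignored.

```python
def sender_bytes_unicast(object_bytes: int, receivers: int) -> int:
    """Bytes the sender transmits with one TCP stream per receiver."""
    return object_bytes * receivers


def sender_bytes_multicast(object_bytes: int) -> int:
    """Bytes the sender transmits when a single copy reaches all receivers.

    Assumption: the network (or the messaging layer) fans the copy out.
    """
    return object_bytes


if __name__ == "__main__":
    size = 1_000_000  # a 1 MB object, chosen only for illustration
    for n in (1, 2, 5, 10):
        ratio = sender_bytes_unicast(size, n) / sender_bytes_multicast(size)
        print(f"{n:2d} receivers: unicast sends {ratio:.0f}x the multicast bytes")
```

With a single receiver the two costs are the same, which is why the
unicast case has to be decided on other parameters.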

Spread breaks all messages into 'ethernet' sized packets sent as UDP
datagrams. If your network has a smaller MTU, you might want to tune
Spread to know about it (likewise if it supports larger frames, like
Gigabit Ethernet).
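
To get a feel for how many datagrams one message turns into, here is a
small Python estimate. It is only a sketch: it assumes IPv4 (20-byte
header) and UDP (8-byte header) and ignores whatever header Spread itself
adds to each packet, so the real count will be somewhat higher. The
function name and the MTU values are illustrative, not Spread settings.

```python
import math

IPV4_HEADER_BYTES = 20  # IPv4 header without options
UDP_HEADER_BYTES = 8


def datagrams_needed(message_bytes: int, mtu: int = 1500) -> int:
    """Estimate the number of UDP datagrams needed to carry one message.

    Assumption: every datagram is filled up to the MTU and no per-packet
    protocol header beyond IPv4 and UDP is counted.
    """
    payload = mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    return max(1, math.ceil(message_bytes / payload))


if __name__ == "__main__":
    for mtu in (576, 1500, 9000):
        print(f"MTU {mtu}: {datagrams_needed(100_000, mtu)} datagrams")
```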

Since Spread has a maximum message size (about 100 KB), supporting really
large objects requires some application-level handling, which can make a
TCP stream sound appealing. (For Java, a subclassed version of
SpreadConnection that supports arbitrarily large objects is available.)
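
The application-level handling could look roughly like the Python sketch
below. This is not the Java subclass mentioned above and it does not call
any Spread API. The 100,000-byte limit stands in for "about 100 KB", and
the header layout (object id, chunk index, chunk count) is invented for
the example. Each returned message would be sent as one ordinary message.

```python
import struct
from typing import Dict, Iterable, List

MAX_MESSAGE_BYTES = 100_000          # assumption: stands in for "about 100 KB"
HEADER = struct.Struct("!III")       # hypothetical: object id, index, count
CHUNK_PAYLOAD = MAX_MESSAGE_BYTES - HEADER.size


def split_object(object_id: int, data: bytes) -> List[bytes]:
    """Split data into messages that each fit under MAX_MESSAGE_BYTES."""
    count = max(1, -(-len(data) // CHUNK_PAYLOAD))  # ceiling division
    return [
        HEADER.pack(object_id, i, count)
        + data[i * CHUNK_PAYLOAD:(i + 1) * CHUNK_PAYLOAD]
        for i in range(count)
    ]


def reassemble(messages: Iterable[bytes]) -> bytes:
    """Rebuild one object from its chunks, in whatever order they arrive."""
    parts: Dict[int, bytes] = {}
    expected = None
    for msg in messages:
        _object_id, index, count = HEADER.unpack_from(msg)
        parts[index] = msg[HEADER.size:]
        expected = count
    if expected is None or len(parts) != expected:
        raise ValueError("missing chunks")
    return b"".join(parts[i] for i in range(expected))


if __name__ == "__main__":
    blob = bytes(range(256)) * 1000   # 256,000 bytes of sample data
    msgs = split_object(7, blob)
    print(len(msgs), "messages, largest is", max(len(m) for m in msgs), "bytes")
    print("round trip ok:", reassemble(reversed(msgs)) == blob)
```

A receiver handling several objects at once would also need to group
incoming chunks by object id before reassembling; that is left out here
to keep the sketch short.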

Spread's network performance will be affected by the flow control
parameters (which can be tweaked through the monitor) and the number of
servers in the ring (not the number of clients of those servers).

The only benchmarks we have up right now are in the paper published in the
International Conference on Dependable Systems and Networks 2001. But they
cover wide-area networks with multiple hops and do not use the current
version of Spread. I am generating some numbers now for Spread
performance, but that won't provide a direct comparison with TCP.

Jonathan

> PS Feel free to change the Subject:, this current topic is depressing
> me.
Hope you like the new one better.

-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------