[Spread-users] FIFO versus AGREED performance and max latencies and Spread parameters.

Thu Jun 17 19:26:47 EDT 2004

Hi,

This is just a quick answer without digging too deeply, but your 2-second
maximum MAY be a problem with your setup, especially since I think you
write that you have considerable losses (if I understood your message
correctly). Do you have losses and retransmissions? Can you report how
many of them you get?

The fact that there is little difference in performance with one or two
servers (or more, for that matter) is not that surprising, as this is
what Spread is designed to achieve. I am puzzled by your 2-second
numbers. We recently benchmarked a similar version for an upcoming
book by Ken Birman and the results are here:
http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf

When the system is set up correctly, I would expect latencies of a
few milliseconds...

Cheers,

       :) Yair.

On Thursday, June 17, 2004 6:53 PM
Gautam Thaker gthaker at atl.lmco.com wrote:

Gautam> Hi,

Gautam> I am new to Spread and am doing some simple performance testing. In our
Gautam> project we care about the maximum latencies suffered in message passing
Gautam> between applications. My eventual goal is to tune parameters per section
Gautam> 2.4 of the Spread manual to drive down maximum latencies suffered when
Gautam> using Spread (both in presence of failures and in absence of any
Gautam> failures.) I started by doing the following test. I have 1 client and
Gautam> either 1 or 2 servers. The server(s) join a group called "forward" while
Gautam> the client joins a group called "reverse". The client sends a message of
Gautam> "n" bytes to the "forward" group and the lowest ranked server sends back
Gautam> a 1 byte message to the "reverse" group. (Thus, invocations from client
Gautam> to server(s) are synchronous.) The size "n" of the forward message from
Gautam> client to server varies over 4, 8, 16, 32, ... up to 64k bytes. I send
Gautam> 100,000 messages and keep a histogram of results.
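
The measurement loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original test code: `round_trip` is a hypothetical callable standing in for "send n bytes to the forward group and block until the 1-byte reply arrives on the reverse group", and the 100-microsecond bucket width is an arbitrary assumption. No Spread API calls are shown.

```python
import time
from collections import Counter


def message_sizes(first=4, last=64 * 1024):
    """Powers of two from first to last inclusive: 4, 8, 16, ..., 65536."""
    sizes = []
    n = first
    while n <= last:
        sizes.append(n)
        n *= 2
    return sizes


def measure_round_trips(round_trip, sizes, iterations, bucket_us=100):
    """Time round_trip(payload) repeatedly for each message size.

    round_trip is a hypothetical blocking call supplied by the caller.
    Returns {size: {"min_us", "max_us", "histogram"}} where the histogram
    maps the lower edge of each latency bucket (in microseconds) to a count.
    """
    results = {}
    for n in sizes:
        payload = b"x" * n
        histogram = Counter()
        lo, hi = float("inf"), 0.0
        for _ in range(iterations):
            start = time.perf_counter()
            round_trip(payload)  # synchronous: returns once the reply is in
            elapsed_us = (time.perf_counter() - start) * 1e6
            histogram[int(elapsed_us // bucket_us) * bucket_us] += 1
            lo, hi = min(lo, elapsed_us), max(hi, elapsed_us)
        results[n] = {"min_us": lo, "max_us": hi, "histogram": histogram}
    return results


# Stand-in transport so the sketch runs without any messaging system.
def fake_round_trip(payload):
    return b"\0"


stats = measure_round_trips(fake_round_trip, message_sizes(), iterations=1000)
```

Tracking min and max separately from the histogram matters here because the quantity of interest is the worst-case latency, which a coarse bucket would otherwise hide.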

Gautam> I did the tests with both FIFO_MESS and AGREED_MESS for server group
Gautam> sizes of 1 and 2. Each process (client and each server) runs on its own
Gautam> Linux PC, with an isolated 100 Mbps network between them. A Spread daemon
Gautam> runs on every machine that hosts a client or server, and the client and
Gautam> servers connect only to their local daemons. I show the results in the
Gautam> attached graphic gplot_3234.png. Also shown are results of TCP/IP and
Gautam> TAO ORB based measurements for the same experiment. (Of course, in these two
Gautam> cases the number of "servers" is just 1.) The TAO and TCP results are for
Gautam> illustrative purposes. These and many other results are at our QoS website:

Gautam> http://www.atl.external.lmco.com/projects/QoS/compare/dist_oo_compare_ipc.html

Gautam> I was not so surprised that Spread is slower than the TAO ORB, since I
Gautam> assume Spread communication goes via daemons. Thus, a round trip consists of:

Gautam> client->local_daemon_machine_1->local_daemon_machine_2->server and return path of
Gautam> server->local_daemon_machine_2->local_daemon_machine_1->client.

Gautam> This is certainly many more hops than
Gautam> tao_client->tao_server with return
Gautam> tao_server->tao_client.

Gautam> So extra latency is to be expected. However, I was surprised that I got
Gautam> no measurable difference between messages to groups of sizes 1 or 2 and
Gautam> I had no difference between FIFO_MESS and AGREED_MESS. (Since I only
Gautam> have 1 sender the fact that FIFO and AGREED are similar is perhaps to be
Gautam> expected also.) If anyone could comment if this is about right I would
Gautam> appreciate it. Also, in gplot_3438.png I show the whole range of
Gautam> measurements for each message size. (Each vertical line goes from min to
Gautam> max latency observed.) Even though there are no network or host failures
Gautam> in these initial tests, I note that the max latency for Spread is 2 seconds
Gautam> and some packet loss must occur in the system. This is probably due to many
Gautam> Spread default parameters being 2 seconds. But both Solaris 9 and Linux
Gautam> 2.6.x kernels have a 1000 Hz internal clock, and I would like to push down
Gautam> the smallest Spread parameter to around 4-10 msec and scale the rest in
Gautam> current proportions. Can anyone report how aggressively they might have
Gautam> tried to set these parameters in a LAN setting? And did doing so drive
Gautam> down the maximum latency suffered?
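
The idea of shrinking the smallest parameter and scaling the rest "in current proportions" amounts to multiplying every timeout by one common factor. A minimal sketch in Python follows; the parameter names and millisecond values are hypothetical placeholders, NOT Spread's actual timeout names or defaults, and whether such small values are safe in practice is exactly the open question asked above.

```python
def scale_timeouts(timeouts_ms, target_smallest_ms):
    """Scale all timeouts by one factor so the smallest equals the target.

    Ratios between timeouts are preserved. Values are in milliseconds.
    """
    smallest = min(timeouts_ms.values())
    if smallest <= 0:
        raise ValueError("timeouts must be positive")
    return {
        name: value * target_smallest_ms / smallest
        for name, value in timeouts_ms.items()
    }


# Hypothetical names and values, for illustration only.
example_ms = {"timeout_a": 1000, "timeout_b": 2000, "timeout_c": 5000}
scaled_ms = scale_timeouts(example_ms, target_smallest_ms=10)
# scaled_ms == {"timeout_a": 10.0, "timeout_b": 20.0, "timeout_c": 50.0}
```

With a 1000 Hz kernel clock the timer granularity is about 1 msec, so a target near the low end of the 4-10 msec range leaves only a few ticks of margin.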

Gautam> Gautam H. Thaker
Gautam> Distributed Processing Lab; Lockheed Martin Adv. Tech. Labs
Gautam> 3 Executive Campus; Cherry Hill, NJ 08002
Gautam> 856-792-9754, fax 856-792-9925  email: gthaker at atl.lmco.com