[Spread-users] FIFO versus AGREED performance and max latencies and Spread parameters.

Thu Jun 17 18:53:37 EDT 2004

Hi,

I am new to Spread and am doing some simple performance testing. In our
project we care about the maximum latency of message passing between
applications. My eventual goal is to tune parameters per section 2.4 of
the Spread manual to drive down the maximum latencies seen when using
Spread, both in the presence and in the absence of failures. I started
with the following test. I have 1 client and either 1 or 2 servers. The
server(s) join a group called "forward" while the client joins a group
called "reverse". The client sends a message of "n" bytes to the
"forward" group and the lowest-ranked server sends back a 1-byte message
to the "reverse" group. (Thus, invocations from client to server(s) are
synchronous.) The forward message size "n" doubles from 4, 8, 16, 32, ...
up to 64 KB. I send 100,000 messages and keep a histogram of the
round-trip latencies.

I ran the tests with both FIFO_MESS and AGREED_MESS for server group
sizes of 1 and 2. Each process (client and server(s)) runs on its own
Linux PC, with an isolated 100 Mbps network between them. A Spread daemon
runs on every machine that hosts a client or a server, and clients and
servers connect only to their local daemons. The results are shown in the
attached graphic gplot_3234.png, along with TCP/IP and TAO ORB
measurements for the same experiment. (Of course, in these two cases the
number of "servers" is just 1.) The TAO and TCP results are for
illustrative purposes only. These and many other results are on our QoS
website:

http://www.atl.external.lmco.com/projects/QoS/compare/dist_oo_compare_ipc.html

I was not so surprised that Spread is slower than the TAO ORB, since I
assume Spread communication goes through the daemons. Thus, a round trip consists of:

client->local_daemon_machine_1->local_daemon_machine_2->server and return path of
server->local_daemon_machine_2->local_daemon_machine_1->client.

This is certainly many more hops than the TAO path of
tao_client->tao_server with return path
tao_server->tao_client.

So extra latency is to be expected. However, I was surprised that I saw
no measurable difference between groups of size 1 and 2, and no
difference between FIFO_MESS and AGREED_MESS. (Since I only have 1
sender, the similarity of FIFO and AGREED is perhaps to be expected.) If
anyone could comment on whether this is about right, I would appreciate
it. Also, gplot_3238.png shows the whole range of measurements for each
message size. (Each vertical line goes from the minimum to the maximum
latency observed.) Even though there are no network or host failures in
these initial tests, the maximum latency for Spread is 2 seconds, so
some packet loss must be occurring in the system. This is probably
because many Spread default parameters are 2 seconds. But both Solaris 9
and Linux 2.6.x kernels have a 1000 Hz internal clock (a 1 ms tick), and
I would like to push the smallest Spread parameter down to around 4-10
msec and scale the rest in their current proportions. Can anyone report
how aggressively they have set these parameters in a LAN setting? And
did doing so drive down the maximum latency?

Gautam H. Thaker
Distributed Processing Lab; Lockheed Martin Adv. Tech. Labs
3 Executive Campus; Cherry Hill, NJ 08002
856-792-9754, fax 856-792-9925  email: gthaker at atl.lmco.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gplot_3234.png
Type: image/png
Size: 8294 bytes
Desc: not available
Url : http://lists.spread.org/pipermail/spread-users/attachments/20040617/a7c2b044/attachment.png 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gplot_3238.png
Type: image/png
Size: 6975 bytes
Desc: not available
Url : http://lists.spread.org/pipermail/spread-users/attachments/20040617/a7c2b044/attachment-0001.png