[Spread-users] Spread Architecture

Sun Nov 12 11:33:59 EST 2006

To your first question, you are correct.  Spread uses TCP/IP or Unix domain
sockets for communication between the client libraries and the daemons, and
UDP for daemon-to-daemon communication.  For daemon-to-daemon data traffic
within a shared network segment, Spread uses either multicast or broadcast.
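
As a rough illustration of the client side, the sketch below classifies a
connection name by transport.  It assumes the naming convention described in
the SP_connect documentation as I recall it: a bare port such as "4803"
means a Unix domain socket to a daemon on the same machine, and "port@host"
means TCP/IP.  The function client_transport is a hypothetical helper, not
part of the Spread API.

```python
def client_transport(spread_name):
    """Classify a Spread client connection name by transport.

    Assumed convention (not verified against every Spread release):
      "4803"       -> Unix domain socket to a daemon on this machine
      "4803@host"  -> TCP/IP to the daemon on that host
    """
    port_text, sep, host = spread_name.partition("@")
    port = int(port_text)  # raises ValueError if the port is malformed
    if sep and not host:
        raise ValueError("missing host after '@'")
    if sep:
        return ("tcp", host, port)
    return ("unix", None, port)


if __name__ == "__main__":
    for name in ("4803", "4803@localhost"):
        print(name, "->", client_transport(name))
```

Daemon-to-daemon traffic is configured separately in the daemon's
configuration file and is not affected by this choice.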

Spread can detect process and network faults out of the box.  It is
currently tuned to work on WANs and LANs and on commodity hardware.  As
such, its timeouts have to be set so that normal network latency is not
perceived as a fault.  Furthermore, since Spread is itself software, it
competes for CPU time with other processes and the OS and can get swapped
out, which delays the processing of packets.  Finally, Spread uses a ring
protocol for control traffic and heartbeat, so as the number of servers
grows, the time needed for the token to circulate among the daemons grows.
All of these add latency that Spread is tuned to treat as normal and NOT as
process or network faults.  Therefore, if you want to detect process and
network faults faster, you can set the timeouts lower, but you must also
remove the sources of normal latency so that they are not perceived as
faults.
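
To make the trade-off concrete, here is a back-of-the-envelope model in
Python.  It is only a sketch: the function names, the per-hop and
processing figures, and the safety factor are all assumptions chosen for
illustration, not values taken from Spread.

```python
def token_rotation_ms(n_daemons, hop_latency_ms, processing_ms):
    """Rough time for the token to visit every daemon once.

    Simplified model: each daemon adds one network hop plus its own
    processing time.  Real rings also see jitter and retransmissions.
    """
    return n_daemons * (hop_latency_ms + processing_ms)


def min_safe_timeout_ms(n_daemons, hop_latency_ms, processing_ms,
                        worst_stall_ms, safety_factor=2.0):
    """Smallest timeout that should not mistake normal latency for a fault.

    worst_stall_ms is the longest time a daemon may be descheduled or
    swapped out; safety_factor is an arbitrary margin (an assumption).
    """
    normal_worst_case = (
        token_rotation_ms(n_daemons, hop_latency_ms, processing_ms)
        + worst_stall_ms
    )
    return safety_factor * normal_worst_case


if __name__ == "__main__":
    # Illustrative numbers only: 0.25 ms per hop, 0.25 ms processing,
    # and a 10 ms worst-case scheduling stall.
    for n in (3, 20):
        print(n, "daemons:",
              token_rotation_ms(n, 0.25, 0.25), "ms per rotation,",
              min_safe_timeout_ms(n, 0.25, 0.25, 10.0), "ms timeout floor")
```

With these made-up figures, 20 daemons already need 10 ms for a single
token rotation, before any scheduling stall is counted.  Lowering the
timeout below the floor this model gives would turn ordinary latency into
false fault reports.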

Since you are running on a LAN and not a WAN, much of the inherent network
latency is already gone.  I would further recommend running a real-time
kernel and raising the priority of the Spread daemon process so as to
minimize the amount of time the Spread process is swapped out.  Finally, I
would keep the number of Spread daemons low enough that the time for the
token to circulate among the daemons, including the processing each daemon
does when it gets the token, is minimized.  Doing all this and setting the
token_timeout lower can speed up your notification of network and process
faults.  Some clients have gotten this down to the order of ~1s.  However,
getting it down to 10ms may not be possible.  The granularity of scheduling
on commodity hardware and OSs is usually not fine-grained enough, so I'm not
sure ANY software solution will work.
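
One way to check the scheduling granularity on a candidate machine is to
measure how far a short sleep overshoots its requested duration.  The Python
sketch below does that; sleep_overshoot_ms is a hypothetical helper, and the
1 ms request and sample count are arbitrary choices.  It says nothing about
Spread itself, only about how precisely the OS wakes a user process.

```python
import time


def sleep_overshoot_ms(requested_ms=1.0, samples=200):
    """Return (min, mean, max) overshoot in ms for a short sleep.

    Overshoot is the measured elapsed time minus the requested sleep.
    A large or erratic overshoot suggests that timeouts of only a few
    milliseconds would be dominated by scheduler jitter.
    """
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_ms / 1000.0)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        overshoots.append(elapsed_ms - requested_ms)
    mean = sum(overshoots) / len(overshoots)
    return min(overshoots), mean, max(overshoots)


if __name__ == "__main__":
    lo, mean, hi = sleep_overshoot_ms()
    print("overshoot ms: min=%.3f mean=%.3f max=%.3f" % (lo, mean, hi))
```

Running this under load, and again with a real-time kernel and raised
priority, gives a rough idea of how much of a 10ms budget the scheduler
alone can consume.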

Feel free to contact me directly at Spread Concepts.

Jacob

  _____  

From: spread-users-bounces at lists.spread.org
[mailto:spread-users-bounces at lists.spread.org] On Behalf Of Shumate, James G
Sent: Tuesday, October 31, 2006 8:58 AM
To: spread-users at lists.spread.org
Subject: [Spread-users] Spread Architecture

Spread-users,

I am looking for any information about how spread works under the hood.

Specifically, does Spread use UDP between daemons and TCP between
application and daemon?

If it uses UDP between daemons, can the timeout and number of retries be set?

I am currently investigating software fault tolerance for real-time embedded
systems and was wondering if SPREAD could be used in our domain.

Our system is on a LAN, and we must process data within 10ms of any type of
fault (network cable unplugged, software crash, etc).

So I was wondering if SPREAD can be tuned to retry 3 times, with each retry
2ms apart.  We will probably have about 20 nodes on the LAN.

Each node will be running a spread daemon and only one application per node
communicating with the daemon on that node.

Also I noticed that older versions of SPREAD had a multi-path capability.  I
assume this was for a node that had multiple network interface cards.

I assume it was used so that if one network interface card went down, SPREAD
would reroute the data over the other network interface card?

Are there any plans to put multi-path capability into Spread 4?

Any information would be greatly appreciated.

Jim
