[Spread-users] Fault Tolerant Server

David Avraamides David.Avraamides at SevernRiverCapital.com
Thu Dec 9 14:46:01 EST 2004

The basic approach I use is what I call "discovery". In our world each
type of service would publish on a different Spread group, and whenever a
service comes up it broadcasts a discovery request message on the group.
Any other instances of the service respond with a discovery reply, and
the requesting service adds each responder's private group name to a
list, along with its own. The list is sorted alphabetically
and this determines the ranking of the peers. Additionally, each service
sends out heartbeats and a listening service updates the last heartbeat
time of that peer. A background timer removes stale peers from the peer
list. The net effect is that each instance of a service should maintain
an identical list of the active peers in the peer group.
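The bookkeeping described above can be sketched as follows (a Python sketch, not the author's actual C# code; the class and method names are mine):

```python
import time

# Keys are the peers' private Spread group names; a peer's alphabetical
# position in the pruned list gives it its rank.
class PeerList:
    def __init__(self, stale_after=5.0):
        self.stale_after = stale_after  # seconds without a heartbeat before stale
        self.last_seen = {}             # private group name -> last heartbeat time

    def heard_from(self, peer, now=None):
        """Record a discovery reply or heartbeat from a peer (including ourselves)."""
        self.last_seen[peer] = time.time() if now is None else now

    def prune(self, now=None):
        """Background-timer step: drop peers whose heartbeats have gone stale."""
        now = time.time() if now is None else now
        for peer, seen in list(self.last_seen.items()):
            if now - seen > self.stale_after:
                del self.last_seen[peer]

    def rank_of(self, peer):
        """Rank is the peer's position in the alphabetically sorted list."""
        return sorted(self.last_seen).index(peer)
```

Since every instance runs the same discovery, heartbeat, and pruning steps, they all converge on the same sorted list and hence the same ranking.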

All services go through this process and the decision of whether a
service should implement fault-tolerance, horizontal scaling, or both is
up to the derived class. In the fault-tolerant only case, the peer with
rank 0 is the only one that publishes messages. All of the peers will
"hear" requests but only the 0-th peer in the ranking is active (i.e.
the master). If it goes down, planned or unplanned, the remaining
peers will adjust their lists and the new rank-0 peer will become the
master and start publishing. This works whether the service is a request-reply
model or a pub-sub model. In the pub-sub model, after discovery, the
publisher sends out a topic list request so all clients will let it know
what topics they are listening to. This way it should maintain the same
subscription list as the master publisher.
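Failover falls out of the sorted ranking with no separate election protocol: whoever is alphabetically first among the live peers is the master. A minimal sketch (the peer names are invented; in the real system they would be Spread private group names):

```python
def master(live_peers):
    """The rank-0 (alphabetically first) live peer is the active publisher."""
    return min(live_peers)

peers = {"#able#host1", "#baker#host2", "#charlie#host3"}
active = master(peers)          # #able#host1 publishes; the others stay silent

peers.discard("#able#host1")    # the master goes down, planned or unplanned
active = master(peers)          # #baker#host2 now has rank 0 and takes over
```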

For the horizontal scaling case, I simply "distribute" the requests
among the peers by modding the request ID (set uniquely by the client)
with the number of peers in the peer list. If the remainder matches this
instance's rank, it processes the message; otherwise it discards it. For
subscriptions, there is no request ID, so I mod a hash of the topic ID
instead. Either way, each client request is handled by exactly one
service.
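That partitioning rule can be sketched like this (one assumption beyond the text: the topic hash here is a CRC32, since Python's built-in `hash()` is randomized per process and the peers would disagree):

```python
import zlib

def handles_request(my_rank, n_peers, request_id):
    """A peer handles a request only if request_id mod peer-count equals its rank."""
    return request_id % n_peers == my_rank

def handles_topic(my_rank, n_peers, topic):
    """Same rule for subscriptions, modding a stable hash of the topic ID."""
    return zlib.crc32(topic.encode()) % n_peers == my_rank
```

With, say, three peers, exactly one of ranks 0-2 accepts any given request ID or topic, so every request is processed exactly once as long as the peers agree on the peer list.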

- I don't support true load balancing; it's really just dynamic request
partitioning. But that's fine for our needs (so far).
- There are possible race conditions (server dies while processing a
request and no other service will pick up the request). I just let the
client handle it by retrying the request later. A typical client will
blast out a large number of requests, monitor the replies, and after
some time period, time out and examine/retry any missing replies. If two
replies are received by a client, the last one wins. In our world (hedge
fund) the latest one is the best one to use.
- I'm still playing with good timeouts for how long to wait before
marking a service as stale or waiting for discovery replies (right now 5
seconds).
- Some services partition better along natural, less random lines; for
example, our calculation service is best partitioned by the type of model
(convertible bond, option, CDS, etc.), so it can look at the instrument
type of the request rather than modding the request ID. The point is
that it's up to the specific implementation of the service to make this
decision.
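As a hedged illustration of that last point, a service subclass might swap the request-ID rule for one keyed on instrument type (the model names come from the text; the hashing choice is mine):

```python
import zlib

def rank_for_type(instrument_type, n_peers):
    """Map an instrument type to a peer rank, so every request for that
    model lands on the same instance."""
    return zlib.crc32(instrument_type.encode()) % n_peers

def should_handle(my_rank, n_peers, instrument_type):
    """This instance handles the request only for types mapped to its rank."""
    return rank_for_type(instrument_type, n_peers) == my_rank
```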

I've also written a launcher service that can start/stop other services
proactively or reactively, thus our risk scenario process could ask the
launcher service to start up the calculation service on every machine on
the LAN, run our risk scenarios, then ask them to be stopped (that
hasn't really been tested yet, but it's designed in). My plan is to
deploy the launcher and "service" assembly (this is all in C#) on every
machine in the firm (client or server) and make them available as worker
machines if/when needed.


-----Original Message-----
From: spread-users-admin at lists.spread.org
[mailto:spread-users-admin at lists.spread.org] On Behalf Of Mike Perik
Sent: Thursday, December 09, 2004 11:54 AM
To: spread-users at lists.spread.org
Subject: RE: [Spread-users] Fault Tolerant Server

I would be interested.  

Scaling is another thing that I'm interested in or load balancing with
failover.  Ultimately, I'd like to split the load across servers, and if
one of them goes down the other server(s) would pick up its load.  

I've implemented a very simple algorithm in which the servers join a
group and send a simple message requesting who is the current publisher.
It then backs off a random amount of time (< 1 sec) and if after a few
requests it hasn't gotten a response, it claims to be the publisher and
starts to publish.
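That back-off election might look like the following sketch, with `send` and `got_reply` standing in for the real Spread send/receive calls (the function names and round count are mine, not from the post):

```python
import random
import time

def claim_publisher(send, got_reply, rounds=3, max_backoff=1.0):
    """Ask who the current publisher is; after a few unanswered rounds,
    claim the role ourselves."""
    for _ in range(rounds):
        send("who-is-publisher?")
        time.sleep(random.uniform(0.0, max_backoff))  # random backoff < 1 s
        if got_reply():
            return False   # someone is already publishing; stay passive
    return True            # no answer after several tries: become publisher
```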

In order to do load balancing I would have to move this type of logic
from the group level to the subject level.

Any other ideas/methods for handling this?


--- David Avraamides
<David.Avraamides at SevernRiverCapital.com> wrote:

> We have implemented a redundant server model here to handle both 
> server failures and to provide server scale-out where appropriate. It's
> built on top of Spread, thus at this time it doesn't address failures 
> of the Spread daemon itself, rather our notion of a messaging service 
> that sits on top of the Spread network. I'm not sure if that's what 
> you meant. I can give more details if you are interested.
> -Dave
> ________________________________
> From: spread-users-admin at lists.spread.org on behalf of Mike Perik
> Sent: Thu 12/9/2004 9:37 AM
> To: spread-users at lists.spread.org
> Subject: [Spread-users] Fault Tolerant Server
> Has anyone implemented fault tolerance in their system?  I'm 
> planning on implementing something that
> would allow multiple servers to back each other up.  If
> primary goes down then one of the backups picks up the broadcasting
> of data.
> I'd be interested in any designs you've use and how they have worked.
> Thanks,
> Mike
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org

