[Spread-users] spread as a shared library?

Wed Jun 29 23:53:34 EDT 2005

Neil Conway wrote:

> I'm working on an application that will use Spread. However, I don't 
> really need a separate Spread daemon process AFAICS: each node in the 
> group will only have a single connection to Spread. I'd also rather 
> not burden my users with the need to install and configure the Spread 
> daemon.

Going through the effort of building a library that restricts users to a 
single "connection to Spread" would be disappointing. There are many 
legitimate situations in which it can be advantageous to have multiple 
sessions within a single app. You may find yourself in such a situation, 
so I'd be careful about ruling that out.

> I'm wondering if it would be possible to refactor Spread to provide a 
> shared library that implements the Spread protocol. An application 
> linking against the Spread library would need to call into Spread's 
> event loop regularly (or perhaps just dedicate a thread to this task 
> and have it block inside the Spread event loop). The spread daemon 
> could then be implemented using this library.

One of the more acute pains of Spread is its lack of dynamic 
configurability combined with its general configuration fragility. 
There are many simple configuration file "mistakes" (like leaving one 
server out of the list on a single machine) that will cause Spread to 
fail with a non-obvious error message. Also, many common network 
configuration errors that don't cause problems for other protocols can 
wreak havoc on a Spread cluster.

The fact that libraries, in general, allow multiple simultaneous uses of 
their functions poses a serious problem with respect to Spread's coding 
style (with static variables and the like). To be a "proper" library, 
the whole shebang would have to be contextualized, i.e., all of that 
global state moved into a per-instance context passed to every function. 
That way you could start two Spread rings within a single process and 
enforce fine-grained quality of service in very busy systems (where 
token loss is chronic due to network saturation). You could tune one 
ring that handles important messages with very aggressive retry settings 
and leave the other "bulk" ring at normal settings. Thus, accidentally 
saturating your network fabric on one ring will not (or is less likely 
to) collapse the "vital" ring.
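To make "contextualized" concrete, here is a rough C sketch of the idea, not Spread's actual code: state that would otherwise live in file-scope statics moves into a per-ring struct, and every call takes that struct explicitly. The names `ring_ctx`, `ring_create`, `ring_on_token`, and `ring_destroy`, and the two tuning fields, are hypothetical, chosen only to show a "vital" ring and a "bulk" ring coexisting in one process.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-ring context: fields that a daemon-style codebase
 * would typically keep as file-scope statics. Names are illustrative only. */
typedef struct ring_ctx {
    char          name[32];
    unsigned      token_timeout_ms; /* assumed knob: wait before declaring token loss */
    unsigned      max_retransmits;  /* assumed knob: retry budget for lost messages */
    unsigned long tokens_seen;      /* per-ring counter instead of a global */
} ring_ctx;

/* Allocate an independent ring; returns NULL on allocation failure. */
ring_ctx *ring_create(const char *name, unsigned token_timeout_ms,
                      unsigned max_retransmits)
{
    ring_ctx *r = calloc(1, sizeof *r);   /* zeroed, so name stays terminated */
    if (r == NULL)
        return NULL;
    strncpy(r->name, name, sizeof r->name - 1);
    r->token_timeout_ms = token_timeout_ms;
    r->max_retransmits = max_retransmits;
    return r;
}

/* Every operation takes its context explicitly, so rings never share state. */
void ring_on_token(ring_ctx *r)
{
    r->tokens_seen++;
}

void ring_destroy(ring_ctx *r)
{
    free(r);
}
```

With this shape, `ring_create("vital", 50, 20)` and `ring_create("bulk", 1000, 3)` can live side by side, and activity on one ring never touches the other's counters or settings. The numbers are placeholders, not recommended Spread settings.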

The Spread guys can correct me if I'm wrong here, but as far as I know 
no effort has gone into making any of those functions thread-safe, 
reentrant, or async-signal-safe. So you couldn't guarantee that anything 
in them would actually complete correctly if the host program were doing 
its own threading and signal management.

You'd need to completely rewrite the SESSION layer so that it doesn't 
"connect to itself". You'd want a session context that pushes directly 
into the messaging stack (thread-safely). Go digging in the Spread 
source and you'll see the overhaul would be pretty tremendous.
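As a hedged illustration of a session "pushing directly into the messaging stack", the sketch below assumes a hypothetical per-ring outbound queue guarded by a POSIX mutex: application threads append messages in-process, and a protocol thread drains them, with no loopback socket in between. `outq`, `outq_init`, `session_send`, and `outq_pop` are invented names. Ordering, flow control, and delivery back to sessions are left out, and this version copies the payload rather than attempting zero-copy.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message node; the payload lives in a flexible array member. */
typedef struct msg {
    struct msg *next;
    size_t      len;
    char        data[];
} msg;

/* Hypothetical outbound queue owned by one ring. */
typedef struct outq {
    pthread_mutex_t lock;
    msg            *head, *tail;
    size_t          count;
} outq;

void outq_init(outq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->count = 0;
}

/* Called from any application thread holding a session on this ring.
 * Copies the payload; returns 0 on success, -1 on allocation failure. */
int session_send(outq *q, const void *buf, size_t len)
{
    msg *m = malloc(sizeof *m + len);
    if (m == NULL)
        return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, buf, len);

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called by the protocol thread (assumed: when it may send). Returns the
 * oldest message or NULL if empty; the caller frees the returned message. */
msg *outq_pop(outq *q)
{
    pthread_mutex_lock(&q->lock);
    msg *m = q->head;
    if (m) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return m;
}
```

The mutex is the simplest way to make the hand-off thread-safe; a real design might prefer a lock-free queue, but that is a separate decision.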

> Does this sound feasible? It would obviously require some fairly major 
> surgery, I'm just wondering if there's some reason it's not possible.

If you do surgery that dramatic, be prepared for a Frankenstein.  I'd 
recommend a rewrite based on the protocols and concepts.  It would allow 
a different messaging API to be used and allow much higher performance 
as you could effectively do zero-copy messaging in many cases.

I figure it would take a few days to hack up what you described. 
However, I think it would be prone to problems. On the other hand, a 
senior professional C software engineer could do a complete rewrite in 
one month, easily. Spread's only 26k lines of code, after all :-)  If 
you wanted to just read the academic papers and build a new 
implementation from scratch, I think you're looking at 2-3 months for 
one person or 1 month for 4 people.

This is on our todo list here at OmniTI (along with an epic laundry list 
of enterprise enhancements).

-- 
// Theo Schlossnagle
// Principal Engineer -- http://www.omniti.com/~jesus/
// Ecelerity: Run with it. -- http://www.omniti.com/