[Spread-users] Sending messages to a fixed set of servers

Fri May 7 07:25:28 EDT 2004

Hello Ciprian,

  thanks for your very helpful reply. I have looked through 
"Impossibility of Distributed Consensus with One Faulty Process" 
(Fischer-Lynch-Patterson) which without doubt is another important brick 
of knowledge in all this (never thought about it on that general kind of 
a level).

Well, I agree that my first quick-shot idea doesn't even live up to my 
own requirements, just forget about it (forgive me, I'm still novice 
with these matters and will remain so for unknown lengths). Still, 
thanks for your suggestions.

I've read through "From Total Order to Database Replication" and it is 
really inspiring for me in my current position. You and Yair propose a 
lot of properties that I think also comply for the service that I'm 
thinking about. I'm afraid I haven't yet understood all details (I'll 
read it again), but I think I can now identify my consistency 
requirements a lot better than before :)

Now, a general database replication service (as you and Yair propose so 
nicely) clearly has stronger global consistency requirements than a 
persistent message service. Most importantly: unlike a DBMS, a PMS does 
not by itself demand the separation in primary and secondary components, 
in a persistent message service all components are primary (well at 
least that's what I have in mind, or more general: it should be up to 
the application to decide whether it requires that level of 
consistency). Do you agree?

The persistent messages of any node N can be delivered anywhere as long 
as all previous persistent messages from all past components that 
included N have been delivered there, that's my basic consistency 
requirement, plus correct order (happened-before relations) of 
persistent messages and components. Any objections?

I think I could introduce a "persistent session"-level above the 
component level. Nodes must explicitly enter a new persistent session 
whenever they detect the start of a new regular component. Each 
persistent message is contained and ordered in exactly one persistent 
session. Nodes store information about
a. persistent messages and their order within their session,
b. persistent sessions (a global unique session id and membership info) and
c. the "happened-before" relations between adjacent persistent sessions

I think it's natural to attempt to capture messages in the context of 
their component. At the end, a component and its sequence of messages is 
known to a (growing) set of servers and needs to be replicated to a 
disjunct (shrinking) set of servers. This should leave the replication 
layer with a lot less information to interchange between nodes in order 
to determine what to replicate.

Since I get no information like a global unique component identifier or 
about the directly preceeding components, I have to establish that level 
of knowledge by hand, that's why I introduce sessions above components.

At my current level of knowledge, this makes sense to me and should be 
possible to develop without the need for component-wide consensus. The 
algorithm that I have in mind would
1) guarantee that a node becomes member of a current persistent session 
as long as it can receive its own messages
2) block a node from sending out new messages until it is member of a 
persistent session and
3) queue incoming persistent messages from another node N until all 
preceeding persistent sessions that N participated in have been 
delivered to the application

I'll of course post all my thoughts here to the list, but I feel here is 
a good point to stop for now and to listen to your opinions about that.

Thanks,
Christian.