[Spread-users] Daemon deleting client's un-forwarded messages on client disconnect?

Fri Jun 13 08:47:09 EDT 2008

Daniel Marques wrote:
> Observing the behavior of my application, it appears that when a
> client disconnects from the daemon, the messages it has sent but that
> have not yet been forwarded off the daemon are deleted and then never
> forwarded.  A quick (but by no means informed) study of the daemon's
> source (specifically, daemon/session.c:1809) seem to confirm that this
> is by design.
> 
> I'm asking if this is indeed the case, and if so, why (for
> informational purposes) this is the desired behavior, and how one
> could ensure that the messages are indeed forwarded (for practical
> purposes)?

One assumes that, for SAFE_MESS messages, the client will block until
all clients listening to that channel have confirmed that they received
the message correctly and in a given order. Once the call returns, the
client can therefore be sure that all servers have received its message
atomically, and can remove it from a local queue (e.g. a local
transaction log).

Since a message should be rejected if the sender dies before it reaches
all hosts (as per two-phase commit semantics), it seems sensible to save
bandwidth by removing anything left in the outgoing buffer, since those
messages can never commit.

Of course, this behaviour might not be necessary for low-reliability
messages, such as those marked UNRELIABLE_MESS, but removing only some
of the messages from the buffer in this situation would make the
behaviour inconsistent.

The standard way to handle this would be to keep a local transaction log
to which each message is written before making the spread call, and from
which it is removed once the call returns. A client detects a previous
failure by finding items still in the transaction log, and sends those
on before any new messages.
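
A minimal sketch of such a transaction log, in Python. Everything here
is illustrative: `TransactionLog`, `reliable_send` and the `send`
callable are hypothetical names, and `send` merely stands in for
whatever spread call the client actually makes. It assumes that `send`
returning normally means the message is safe to forget, and it gives
at-least-once delivery, so receivers may see a duplicate after a crash.

```python
import json
import os
import uuid


class TransactionLog:
    """Append-only log of messages not yet confirmed as sent."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the record to disk before sending

    def begin(self, payload):
        """Record a message before it is handed to the send call."""
        msg_id = uuid.uuid4().hex
        self._append({"op": "begin", "id": msg_id, "payload": payload})
        return msg_id

    def commit(self, msg_id):
        """Record that the send call returned for this message."""
        self._append({"op": "commit", "id": msg_id})

    def pending(self):
        """Return (id, payload) pairs logged but never committed, oldest first."""
        if not os.path.exists(self.path):
            return []
        open_msgs = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # torn record from a crash mid-write
                if rec["op"] == "begin":
                    open_msgs[rec["id"]] = rec["payload"]
                else:
                    open_msgs.pop(rec["id"], None)
        return list(open_msgs.items())


def reliable_send(log, send, payload):
    """Resend anything left over from a previous failure, then send payload."""
    for msg_id, old_payload in log.pending():
        send(old_payload)
        log.commit(msg_id)
    msg_id = log.begin(payload)
    send(payload)  # hypothetical stand-in for the spread call
    log.commit(msg_id)
```

If `send` raises (or the process dies) between `begin` and `commit`, the
message stays in the log and goes out first on the next call to
`reliable_send`. The log is never compacted here; a real client would
want to truncate it once nothing is pending.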

If this is too much overhead for most clients (which want to send a
message quickly, then return without caring about the result), then you
should consider having a local daemon act as an intermediary,
maintaining a machine-specific transaction log and writing items out to
spread as it can. Sending a message to a local socket should be fast
enough for even the most demanding situations.
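
A rough sketch of what such an intermediary might look like, in Python.
The names `spool_from_client`, `drain` and `forward` are hypothetical,
the newline-delimited wire format is an assumption made for brevity, and
`forward` stands in for the call that hands a message to spread. It is
single-threaded: it assumes spooling and draining never run at the same
time on the same file.

```python
import os
import socket


def spool_from_client(conn, spool_path):
    """Read newline-delimited messages from a local socket and persist them."""
    count = 0
    with conn.makefile("rb") as reader, open(spool_path, "ab") as spool:
        for line in reader:
            if not line.endswith(b"\n"):
                break  # client went away mid-message; drop the partial line
            spool.write(line)
            spool.flush()
            os.fsync(spool.fileno())  # durable before we read the next one
            count += 1
    return count


def drain(spool_path, forward):
    """Forward spooled messages in order; keep whatever could not be sent."""
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, "rb") as f:
        lines = f.readlines()
    sent = 0
    try:
        for line in lines:
            forward(line.rstrip(b"\n"))  # hypothetical hand-off to spread
            sent += 1
    finally:
        # Rewrite the spool with only the unsent tail, then swap it in.
        tmp_path = spool_path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.writelines(lines[sent:])
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, spool_path)
    return sent
```

The client's only job is to write a line to the local socket; the relay
owns the log and retries `drain` whenever spread is reachable. If
`forward` fails partway through, the unsent messages stay in the spool
in their original order.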

Hope this is all correct - I'm attempting to defend spread's behaviour
based on common ideas about distributed systems.

Jeremy

P.S. Would love to use spread, but cannot because of the current
licence. I see Juergen's message of 5th May regarding LGPL
difficulties was never answered - are there any plans to sort this out?
In the meantime, we'll unfortunately have to keep rolling our own
solution, and won't be able to contribute any development to this
project.