[Spread-users] Lamport timestamps?

Fri Aug 24 12:20:31 EDT 2007

On 24 Aug 2007, at 4:49 pm, John Schultz wrote:

> Spread currently implements causal ordering as a total ordering
> that is causal.  Rather than a lamport timestamp, each Spread
> daemon updates a shared sequence number that defines the IDs and
> total order of messages. Only one daemon can update this sequence
> number at a time (within a given membership) when it holds the
> token and sends messages.

Interesting; I think I'm going to have to put some time aside to read
those papers on spread's implementation...

> Exposing this counter would buy users very little, as they can
> easily reconstruct the counter upon delivery.  The first delivered
> message is #1, the second is #2 and so on.  Simply reset the
> counter to zero on any membership change (and associate it with the
> membership ID).  The only tricky thing would be after transitional
> messages where there can be holes in the total order.  Anyway, let
> us know if this helps with your problem or not.

It sounds like my current approach (implementing lamport timestamps
on top of RELIABLE_MESS) is probably the way to go for now, then!

> Because total order is a stronger service than causal order, this
> implementation can cause higher latencies, both in message
> origination and in delivery dependencies, than if causal ordering
> was directly implemented.

So does choosing CAUSAL_MESS over AGREED_MESS actually gain one?
Nothing for now, but future implementations may implement CAUSAL_MESS
more efficiently?

> If one were to implement causal ordering directly in Spread, then
> we most likely would use a Directed Acyclic Graph (DAG) of causal
> dependency rather than vector timestamps.  Essentially, the DAG is
> an optimized version of vector timestamps where only the minimum
> necessary causality information is attached to each message.
>
> For example, if a daemon were to send two messages with no
> intervening receives, then the first one might have several
> dependencies listed but the second message would simply say it
> depends on the first message it just sent.  When you have losses
> you may need to recover some of the messages in the dependency
> chain, but this is true for either implementation.

Is that worth the tracking overhead, though, compared to a lamport
timestamp, which after all encodes that the state of the system
sending the message with timestamp K depeneds on all the message it's
received with timestamps less than K?

I know that suggests dependencies on messages that the recipient has
never itself received, which might perhaps cause more to be replayed
than needs be, but does that outweigh the simplicity of the
algorithm? ;-)

After all, two groups [of nodes] that mainly communicate internally
and occasionally communicate between themselves will both merrily
advance their lamport timestamps in parallel within the two groups,
until communication occurs between them, at which point it's ensured
that the receiving group ends up with a timestamp that's higher than
that of the sending group, to reflect that its state now indirectly
potentially depends on all the machinations the sending group has
been up to preceding the message send.

(on an related note, it's just occurred to me that one might, in my
case of a replicated data store, consider each record in the database
as a node, and each 'computation' (which reads some records from the
local DB, then issues one or more updates messages via Spread) as a
transient node. Each DB records has an update timestamp anyway, so
perhaps each computation could take the maximum of the timestamps of
the records it reads, add one, and use the result as the timestamp
for all the update messages it then sends. That would capture the
actual causal flow of state dependency in quite a fine-grained way,
at little cost. I wonder how much of a benefit that would be, in
practice?)

>
> Cheers!
>

Thanks for the detailed and thought-provoking reply!

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4