[Spread-users] Thrudb - Document Oriented Database Services

Alaric Snell-Pym alaric at snell-pym.org.uk
Tue Nov 6 19:56:26 EST 2007

On 6 Nov 2007, at 10:11 pm, Jake Luciani wrote:
> Instead, a transaction container is created, the document is
> compressed and stored under that transaction id in memcached and
> the transaction id is sent. Memcached becomes kind of a network IPC.

Look out! Memcached, by default, rejects values over 1MB in size. That
sounds large for a compressed document, until people start storing
image files ;-) It's easily worked around, though: just store the
document in chunks named "transaction id:chunk number". And, of
course, memcached may choose to forget things - it is a cache, after
all. How do you handle that, out of interest? My system is designed
for records of only a few KB, so we just send them over Spread and
don't have that problem.
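
A minimal sketch of the chunking workaround, in Python. The cache is a plain dict standing in for a memcached client, and the function names, the chunk size, and the idea of sending the chunk count along with the transaction id are all assumptions made for illustration, not anything Thrudb is known to do.

```python
# Assumed chunk size: kept below memcached's default 1MB item limit,
# since the key and per-item overhead also count against it.
CHUNK_SIZE = 1_000_000


def store_chunked(cache, txn_id, payload, chunk_size=CHUNK_SIZE):
    """Store payload under "<txn_id>:<n>" keys and return the chunk count."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    for n, chunk in enumerate(chunks):
        cache[f"{txn_id}:{n}"] = chunk
    return len(chunks)


def load_chunked(cache, txn_id, count):
    """Reassemble the payload, or return None if any chunk is missing."""
    parts = []
    for n in range(count):
        chunk = cache.get(f"{txn_id}:{n}")
        if chunk is None:
            # The cache forgot a chunk; the caller has to get the
            # document from somewhere else (e.g. ask the originator).
            return None
        parts.append(chunk)
    return b"".join(parts)
```

Returning None on a missing chunk makes the "cache may forget" case explicit, so the receiver can report failure instead of committing a truncated document.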

> Each node receives the message and pulls the document from
> memcached, once all nodes are ready to process they message the
> transaction originator and say it's ready to commit or it's
> failed.

I'm doing something similar for my replicated database project (which
is, alas, closed source... for now... but if the powers that be
consent to changing that, I'll announce it here too); a pre-write is
broadcast and the client then listens for replies from the data
storage nodes. Each storage node attempts to provisionally perform
the write, and returns success or failure (where failure is generally
a unique key collision etc); the client waits until it has a quorate
number of yesses, without a single no, and then broadcasts a commit
message. If the pre-write fails, the provisional writes simply time
out, though there could easily be an explicit fail message.
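
The client-side decision described above can be sketched roughly as follows, in Python. The function name, the boolean reply encoding, and the "wait"/"commit"/"abort" labels are hypothetical; the real system's message formats are not described here.

```python
def decide(replies, quorum):
    """Return "commit", "abort", or "wait" given the replies seen so far.

    replies: iterable of booleans, True for a yes and False for a no,
             in the order the storage nodes answered the pre-write.
    quorum:  number of yesses required before committing.
    """
    yes = 0
    for ok in replies:
        if not ok:
            return "abort"  # a single no vetoes the write
        yes += 1
    # Enough yesses and no refusals: safe to broadcast the commit.
    return "commit" if yes >= quorum else "wait"
```

The client would call this after each reply arrives. On "abort" it can either do nothing and let the provisional writes time out, or broadcast an explicit fail message.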

> also libevent is used to monitor the spread socket.

I need to look into that - I wanted things to be able to time out, so
I wrote a little subprocess that sends a message-per-second heartbeat
to a special spread group, so that things that wish to time out can
join that group and then be sure that SP_receive will not block for
more than a second. I have my nodes perform certain administrative
tasks once per second, so it's never an unnecessary wakeup call.

Exactly one process sends the heartbeat at any time, and it fails
over automatically: the highest-lexicographically-ordered member of
the heartbeat-senders group is the one that sends, and the other
heartbeat processes just watch heartbeat-senders for a membership
change that makes THEM the highest-lexicographically-ordered member.
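
The sender election boils down to one comparison against the current membership list. A small Python sketch, where the function name and the member names are made up for illustration, and the membership list is assumed to come from whatever membership message the group delivers:

```python
def should_send_heartbeat(me, members):
    """True if `me` is the highest-lexicographically-ordered member."""
    return bool(members) and me == max(members)


# Hypothetical member names for the heartbeat-senders group.
members = ["#hb#alpha", "#hb#bravo", "#hb#charlie"]
charlie_sends = should_send_heartbeat("#hb#charlie", members)
bravo_sends = should_send_heartbeat("#hb#bravo", members)

# After "#hb#charlie" leaves or crashes, every remaining process
# re-evaluates against the new membership and "#hb#bravo" takes over.
members_after = [m for m in members if m != "#hb#charlie"]
bravo_sends_after = should_send_heartbeat("#hb#bravo", members_after)
```

Since every process sees the same membership list, they all agree on who the sender is without exchanging any extra messages.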

> -Jake


Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4
