Hey Alaric, thanks for the heads up! I&#39;ll have to add the chunking. I&#39;m using memcached knowing that it forgets things, but since writes happen right away and the system can handle errors gracefully, I think it&#39;s a decent solution.
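<br><br>A minimal sketch of such chunking, assuming keys of the form &quot;transaction id:chunk number&quot; as suggested in the quoted message. A plain dict stands in for the memcached client, and the helper names, the &quot;count&quot; header key and the size margin are all hypothetical. A missing chunk is reported as a miss so the caller can treat it like any other error:
<pre>
```python
import zlib

# Stay a little below memcached's default 1 MB item limit to leave room
# for the key and per-item overhead (the exact margin is an assumption).
CHUNK_SIZE = 1024 * 1024 - 1024


def store_document(cache, txid, document, chunk_size=CHUNK_SIZE):
    """Compress document (bytes) and store it as numbered chunks.

    cache is anything with dict-style item assignment; a real memcached
    client would use its own set() call instead.
    """
    payload = zlib.compress(document)
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)]
    for number, chunk in enumerate(chunks):
        cache["%s:%d" % (txid, number)] = chunk
    # Hypothetical header entry so readers know how many chunks to expect.
    cache["%s:count" % txid] = len(chunks)
    return len(chunks)


def load_document(cache, txid):
    """Reassemble a document, or return None if any piece was evicted."""
    count = cache.get("%s:count" % txid)
    if count is None:
        return None
    parts = []
    for number in range(count):
        chunk = cache.get("%s:%d" % (txid, number))
        if chunk is None:
            return None  # the cache forgot a chunk: treat as a miss
        parts.append(chunk)
    return zlib.decompress(b"".join(parts))
```
</pre>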

<br><br>Empirically, I&#39;ve bombarded the system with gigs of data with no hiccups.<br><br><br>-Jake<br><br><div class="gmail_quote">On Nov 6, 2007 7:56 PM, Alaric Snell-Pym &lt;<a href="mailto:alaric@snell-pym.org.uk">

alaric@snell-pym.org.uk</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>On 6 Nov 2007, at 10:11 pm, Jake Luciani wrote:

<br><br>&gt; Instead, a transaction container is created, the document is<br>&gt; compressed and stored under that transaction id in memcached and<br>&gt; the transaction id is sent. Memcached becomes kind of a network IPC.

<br>&gt;<br><br></div>Look out! Memcached, by default, rejects things over 1MB in size. A<br>large size for a compressed document, you&#39;d think, until people start<br>storing image files ;-) But easily worked around - just store the

<br>document in chunks named &quot;transaction id:chunk number&quot;. And, of<br>course, memcached may choose to forget things - it is a cache, after<br>all. How do you handle that, out of interest? My system is designed<br>

for records of only a few KB, so we just send them over spread, so we<br>don&#39;t have that problem.<br><div class="Ih2E3d"><br>&gt; Each node receives the message and pulls the document from<br>&gt; memcached, once all nodes are ready to process they message the

<br>&gt; transaction originator and say it&#39;s ready to commit or it&#39;s<br>&gt; failed.<br><br></div>I&#39;m doing something similar for my replicated database project (which<br>is, alas, closed source... for now... but if the powers that be

<br>consent to changing that, I&#39;ll announce it here too); a pre-write is<br>broadcast and the client then listens for replies from the data<br>storage nodes. Each storage node attempts to provisionally perform<br>the write, and returns success or failure (where failure is generally

<br>a unique key collision etc); the client waits until it has a quorate<br>number of yesses, without a single no, and then broadcasts a commit<br>message if it succeeds (if it fails, the provisional writes time out,<br>but an explicit fail message could easily be added).
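<br><br>A minimal sketch of that decision rule, assuming replies arrive as simple yes/no values and that &quot;quorate&quot; means a strict majority of the storage nodes. The function name and the reply encoding are hypothetical and not taken from the closed-source project:
<pre>
```python
def decide(replies, total_nodes):
    """Return "commit", "abort" or "wait" for the replies seen so far.

    replies is an iterable of booleans: True for a successful provisional
    write, False for a failure such as a unique key collision.
    """
    replies = list(replies)
    if not all(replies):
        return "abort"      # a single no vetoes the write
    quorum = total_nodes // 2 + 1   # assumption: strict majority
    if len(replies) >= quorum:
        return "commit"     # enough yesses and no dissent
    return "wait"           # keep listening, or let the writes time out
```
</pre>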

<br><div class="Ih2E3d"><br>&gt; also libevent is used to monitor the spread socket.<br><br></div>I need to look into that - I wanted things to be able to time out, so<br>I wrote a little subprocess that sends a message-per-second heartbeat

to a special spread group, so that things that wish to time out can join that group and then be sure that SP_receive will not block for more than a second. I have my nodes perform certain administrative tasks once per second, so it&#39;s never an unnecessary wakeup call.

<br><br>The heartbeat is generated uniquely in a failsafe manner by the<br>highest-lexicographically-ordered member of the heartbeat-senders<br>group being the one that sends them, and the other heartbeat<br>processes just watching heartbeat-senders for a membership change

<br>that makes THEM the highest-lexicographically-ordered member.<br><br>&gt;<br>&gt; -Jake<br>&gt;<br><br>ABS<br><font color="#888888"><br>--<br>Alaric Snell-Pym<br>Work: <a href="http://www.snell-systems.co.uk/" target="_blank">

http://www.snell-systems.co.uk/</a><br>Play: <a href="http://www.snell-pym.org.uk/alaric/" target="_blank">http://www.snell-pym.org.uk/alaric/</a><br>Blog: <a href="http://www.snell-pym.org.uk/?author=4" target="_blank">http://www.snell-pym.org.uk/?author=4

</a><br><br><br></font></blockquote></div><br>
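<br>The heartbeat-sender election described in the quoted message can be sketched as below, assuming the current membership of the heartbeat-senders group is available as a list of member names. The function name is hypothetical, and Python&#39;s string ordering stands in for whatever lexicographic ordering the real processes use:
<pre>
```python
def should_send_heartbeat(my_name, members):
    """True if this process is the highest-ordered member of the group.

    members is the current membership of the heartbeat-senders group.
    Each process re-evaluates this on every membership change; if all of
    them see the same membership, only one decides to send.
    """
    return bool(members) and my_name == max(members)
```
</pre>
When the current sender leaves the group, the next-highest member sees the membership change and takes over.<br>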