Hey Alaric,<br><br>Thanks for the heads-up! I'll have to add the chunking.<br><br>I'm using memcached knowing it can forget things, BUT since writes happen right away and the system can handle errors gracefully, I think it's a decent solution.
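Something like this is what I have in mind for the chunking — a sketch only, with a plain dict standing in for the memcached client (whose set/get calls have the same shape); the key naming and the ":count" bookkeeping key are my assumptions:

```python
CHUNK_SIZE = 1024 * 1024  # memcached's default 1MB value limit


def store_chunked(cache, txn_id, data):
    """Split data into <=1MB chunks keyed "txn_id:chunk_number"."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    cache.set("%s:count" % txn_id, len(chunks))
    for n, chunk in enumerate(chunks):
        cache.set("%s:%d" % (txn_id, n), chunk)


def fetch_chunked(cache, txn_id):
    """Reassemble the document; return None if the cache evicted any piece."""
    count = cache.get("%s:count" % txn_id)
    if count is None:
        return None
    parts = [cache.get("%s:%d" % (txn_id, n)) for n in range(count)]
    if any(p is None for p in parts):
        return None  # the cache forgot something; caller must recover
    return b"".join(parts)


class DictCache:
    """Stand-in for a memcached client (same set/get shape)."""
    def __init__(self):
        self.d = {}

    def set(self, k, v):
        self.d[k] = v

    def get(self, k):
        return self.d.get(k)
```

A None from fetch_chunked is exactly the "memcached forgot" case — the caller falls back to the graceful error handling.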
<br><br>Empirically, I've bombarded the system with gigs of data with no hiccups.<br><br><br>-Jake<br><br><div class="gmail_quote">On Nov 6, 2007 7:56 PM, Alaric Snell-Pym <<a href="mailto:alaric@snell-pym.org.uk">
alaric@snell-pym.org.uk</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>On 6 Nov 2007, at 10:11 pm, Jake Luciani wrote:
<br>><br>> Instead, a transaction container is created, the document is<br>> compressed and stored under that transaction id in memcached and<br>> the transaction id is sent. Memcached becomes kind of a network IPC.
<br>><br><br></div>Look out! Memcached, by default, rejects things over 1MB in size. A<br>large size for a compressed document, you'd think, until people start<br>storing image files ;-) But easily worked around - just store the
<br>document in chunks named "transaction id:chunk number". And, of<br>course, memcached may choose to forget things - it is a cache, after<br>all. How do you handle that, out of interest? My system is designed<br>
for records of only a few KB, so we just send them over Spread, so we<br>don't have that problem.<br><div class="Ih2E3d"><br>> Each node receives the message and pulls the document from<br>> memcached; once all nodes are ready to process, they message the
<br>> transaction originator and say they're ready to commit or have<br>> failed.<br><br></div>I'm doing something similar for my replicated database project (which<br>is, alas, closed source... for now... but if the powers that be
<br>consent to changing that, I'll announce it here too); a pre-write is<br>broadcast and the client then listens for replies from the data<br>storage nodes. Each storage node attempts to provisionally perform<br>the write, and returns success or failure (where failure is generally
<br>a unique key collision, etc.); the client waits until it has a quorate<br>number of yesses, without a single no, and then broadcasts a commit<br>message if it succeeds (if it fails, the provisional writes time out,<br>but there could easily be an explicit fail message).
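That decision rule boils down to a small pure function — a sketch only; the simple-majority quorum and the (node, ok) reply shape are my assumptions, not the actual wire format:

```python
def quorum_decision(replies, cluster_size):
    """Decide the fate of a pre-write from storage-node replies.

    replies      -- list of (node_name, ok) pairs received so far
    cluster_size -- total number of storage nodes

    A single no aborts immediately (the provisional writes will
    time out, or an explicit fail message could be broadcast);
    a quorate number of yesses with no noes commits; otherwise
    the client keeps waiting for more replies.
    """
    quorum = cluster_size // 2 + 1  # simple majority (an assumption)
    noes = sum(1 for _, ok in replies if not ok)
    if noes:
        return "abort"
    yesses = sum(1 for _, ok in replies if ok)
    if yesses >= quorum:
        return "commit"
    return "wait"
```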
<br><div class="Ih2E3d"><br>> also libevent is used to monitor the spread socket.<br><br></div>I need to look into that - I wanted things to be able to time out, so<br>I wrote a little subprocess that sends a message-per-second heartbeat
<br>to a special spread group, so that things that wish to time out can<br>join that group and then be sure that SP_receive will not block for<br>more than a second. I have my nodes perform certain administrative<br>tasks once per second, so it's never an unnecessary wakeup call.
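In sketch form, the trick looks like this — a queue stands in for the spread group and q.get for the blocking SP_receive; the "stop" sentinel is purely for illustration:

```python
import queue
import threading


def heartbeat_sender(group, stop, period=1.0):
    """Stand-in for the heartbeat subprocess: one message per
    period to the heartbeat group (a queue here, in place of a
    real spread group)."""
    while not stop.is_set():
        group.put("heartbeat")
        stop.wait(period)


def receive_loop(group, on_tick, on_message):
    """A node that has joined the heartbeat group: the blocking
    receive (SP_receive in the real thing) now wakes at least
    once per period, so per-period admin tasks and timeouts can
    piggyback on the heartbeat messages."""
    while True:
        msg = group.get()      # bounded by the heartbeat period
        if msg == "stop":      # shutdown sentinel (an assumption)
            break
        if msg == "heartbeat":
            on_tick()          # once-per-second administrative tasks
        else:
            on_message(msg)
```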
<br><br>The heartbeat is generated uniquely in a failsafe manner by the<br>highest-lexicographically-ordered member of the heartbeat-senders<br>group being the one that sends them, and the other heartbeat<br>processes just watching heartbeat-senders for a membership change
<br>that makes THEM the highest-lexicographically-ordered member.<br><br>><br>> -Jake<br>><br><br>ABS<br><font color="#888888"><br>--<br>Alaric Snell-Pym<br>Work: <a href="http://www.snell-systems.co.uk/" target="_blank">
http://www.snell-systems.co.uk/</a><br>Play: <a href="http://www.snell-pym.org.uk/alaric/" target="_blank">http://www.snell-pym.org.uk/alaric/</a><br>Blog: <a href="http://www.snell-pym.org.uk/?author=4" target="_blank">http://www.snell-pym.org.uk/?author=4
</a><br><br><br></font></blockquote></div><br>
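The failsafe election Alaric describes for the heartbeat senders reduces to one comparison — a sketch only; representing group membership as a plain set of member names is my assumption (in the real system Spread delivers membership-change messages):

```python
def elected_sender(members):
    """The highest-lexicographically-ordered member of the
    heartbeat-senders group is the one that sends heartbeats;
    every member recomputes this on each membership change, so
    the role fails over automatically when the sender leaves."""
    return max(members) if members else None
```

Because every member applies the same rule to the same membership view, no extra coordination messages are needed for failover.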