[Spread-users] Spread and reliable message communication

Tim Peters tim at zope.com
Fri Aug 8 15:06:18 EDT 2003


[Tuvi, Selim]
> Hi, we just started experimenting with the Python extension
> module for Spread, and we are having problems reliably sending
> and receiving messages. We tested this only under a Windows XP
> environment and only under Python, so bear with us. I e-mailed the
> same question to Guido van Rossum, maintainer of the Python
> extension, and he seems to think the problem has nothing to do with
> the Python wrappers.

Neither do I, but then I'm one of the authors of the Python wrapper too
<wink>.

> The attached files demonstrate the problem. We read a file that
> contains chunks of binary data about 800 bytes in size. In a loop
> we send the contents (the file contains 5 chunks) 1000 times to
> the receiver. If the CPU is loaded at 100%, then after a while the
> receiving side stops receiving the data. It doesn't seem to drop
> packets, since I am also sending the message number as a value in
> the msg_type member, and the last msg_type value matches the
> number of messages received.
>
> We observed that the sending process runs to completion and
> reports that 4072000 bytes were sent. The receiving process, on
> the other hand, stops after about 600-800 packets depending on the
> load. I deliberately slowed down the receiving side by dumping the
> contents of the received messages.
>
> As far as I could read from the documentation, if the sender is
> sending faster than the receiving side can process, then the
> sender should block,

That's where you're off:  Spread leaves flow control to the application.  Do
a Google search on, e.g.,

    spread-users flow control

for some good prior discussions about this.  In the app we wrote the Python
wrapper to support, we do a number of things to protect against servers
outrunning clients.  One was to build our own "chunking" layer on top of
Spread messages, to get more data sent per msg (Spread receivers balk based
on number of unread messages pending, independent of their aggregate size);
our app had a very large number of very small messages otherwise.  Another
was to implement a recovery protocol, so that when a client disconnects (for
whatever reason), it can ask the server to resend from the point of the last
message received.  Another was to dedicate a thread, in part of the app, to
doing nothing but calling SP_receive in a loop, queueing the messages it
reads for later action.
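
For concreteness, here's roughly what that last technique looks like with the
Python wrapper.  The daemon address, the group name, the connect() arguments
and the process() handler are all made up for the sake of the example, and
error handling is omitted entirely -- it's a sketch of the idea, not what our
app actually does:

    import threading
    import Queue                  # the stdlib queue module (Python 2 spelling)
    import spread

    DAEMON = "4803@localhost"     # assumed: default Spread port on this box
    GROUP = "datagroup"           # assumed: whatever group the sender multicasts to

    incoming = Queue.Queue()      # now the Python process buffers, not Spread

    def receiver_loop(mbox):
        # This thread's only job is to drain the Spread connection as fast
        # as possible, so the daemon's per-session backlog never fills up.
        # Real work happens in the main thread.
        while 1:
            incoming.put(mbox.receive())

    # daemon, private name, priority, receive membership msgs
    mbox = spread.connect(DAEMON, "listener", 0, 1)
    mbox.join(GROUP)

    t = threading.Thread(target=receiver_loop, args=(mbox,))
    t.setDaemon(True)
    t.start()

    # The main thread can now process at its own pace (e.g. dumping message
    # contents to disk) without stalling SP_receive.
    while 1:
        msg = incoming.get()
        if hasattr(msg, "msg_type"):      # regular msg, not a membership msg
            process(msg)                  # process() is your own handler

The point is just that receive() gets called again as fast as possible no
matter how slowly the real processing goes; the backlog then piles up in the
Python Queue, which has no hard cap, instead of inside the daemon, which does.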

All that was enough for our app, but won't be enough for all apps.  Other
techniques are discussed in the archives.  The most popular approach is one
we didn't take, namely recompiling spread after boosting its
MAX_SESSION_MESSAGES #define (which controls how far behind a Spread
receiver is willing to get).

Note that if there's a fundamental mismatch between sending rate and
receiving capacity, all of the approaches I sketched here (except for the
recovery protocol) are really just hacks -- they may delay the onset of a
problem, but can't guarantee to prevent one.  In our app they've been
delaying the onset for a couple of years, though <wink>.
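
If you do need real end-to-end flow control, the application has to build it
itself.  None of the following is in our app, but the general shape is simple
enough:  have a receiver multicast a tiny ack to a control group after every
N data messages it has fully processed, and have the sender stop and wait for
that ack before opening the window again.  Everything below -- the group
names, the window size, the msg_type reserved for acks, and the existence of
a receiver that actually sends acks -- is invented for the illustration:

    import spread

    DAEMON = "4803@localhost"     # assumed
    DATA_GROUP = "datagroup"      # assumed: the group receivers join for data
    ACK_GROUP = "ackgroup"        # assumed: receivers multicast acks here
    WINDOW = 100                  # assumed: tune to what your receivers can absorb
    ACK_TYPE = 9999               # assumed: msg_type reserved for ack messages

    def send_chunks(chunks):
        mbox = spread.connect(DAEMON, "sender", 0, 0)
        mbox.join(ACK_GROUP)      # we listen for acks, not for our own data
        unacked = 0
        msgno = 0
        for chunk in chunks:
            # FIFO_MESS = reliable, per-sender ordered delivery
            mbox.multicast(spread.FIFO_MESS, DATA_GROUP, chunk, msgno)
            msgno = msgno + 1
            unacked = unacked + 1
            if unacked >= WINDOW:
                # Block until a receiver says it has caught up.  A receiver
                # would do, after every WINDOW-th message it finishes with:
                #     mbox.multicast(spread.FIFO_MESS, ACK_GROUP, "", ACK_TYPE)
                while 1:
                    msg = mbox.receive()
                    if getattr(msg, "msg_type", None) == ACK_TYPE:
                        break
                unacked = 0
        mbox.disconnect()

With more than one receiver the sender should really wait for an ack from
each of them before sending more, which is exactly the kind of bookkeeping
the archived discussions go into.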




