[Spread-users] 1225

John Schultz jschultz at d-fusion.net
Fri Mar 8 14:19:43 EST 2002


***WARNING***: the following is one of my infamously long emails. For 
those of you who don't know what this means, you might want to set aside 
half an hour or so to read this if the topic of flow control interests you.

Spread-Users,

I don't know how much everyone on this list knows about flow control, 
but this email is an attempt to foster discussion about how best to do 
flow control in Spread. I have heard complaints about application-level 
flow control before, and the general response is that "you have to do 
it." But I have yet to see any learning resources provided by 
Spread/CNDS that actually help an application builder truly handle this 
problem. So, this email is an attempt at providing such a resource -- 
feel free to add suggestions, better algorithms, questions, or 
critiques, or to correct any errors I make.

I think, if there is enough interest, someone (possibly I) should 
implement a flow control library that works on top of the normal Spread 
library. This library should implement the best flow control algorithm 
this discussion comes up with. Then users of Spread who need flow 
control can simply use this library, and group-level application flow 
control will no longer be a problem. Those who don't need it won't be 
bothered by it.

I'm sure that Yair, Jonathan, Claudiu, and most of the people in the 
CNDS can propose methods at least as good as what follows...

Recommendations: First, to reduce the chances of having your application 
kicked by Spread in general, do the following:

(1) Highly Recommended: Edit spread_params.h, raise the parameter 
MAX_SESSION_MESSAGES, and recompile Spread. For my applications I have 
the parameter set to 10,000 instead of the usual 1,000. This will give 
the Spread daemon a larger memory footprint, but that generally isn't a 
big deal. It will make Spread and your application more resilient to 
bursts of messages, but it won't solve any real underlying flow control 
problem.
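
For concreteness, the edit amounts to something like the following (the 
exact default value and the surrounding lines may differ between Spread 
versions):

    /* spread_params.h -- raise the per-session message buffering limit
       (1,000 in the versions I have used) and recompile Spread: */
    #define MAX_SESSION_MESSAGES    10000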

(2) Highly Recommended: In your application, create a high-priority 
thread (or an equivalent mechanism) that does nothing but call 
SP_receive and _ONLY_ queues the messages it receives for other threads 
to process. Again, this will make your program more resilient to 
burstiness, but it will not solve any real underlying flow control 
problem.
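
As a rough illustration, here is a minimal sketch of such a receive 
thread. It assumes an application-supplied thread-safe queue 
(queue_push below is hypothetical and must copy the bytes it is given) 
and an already-connected mailbox; error handling is pared down to 
almost nothing:

    #include <pthread.h>
    #include "sp.h"

    #define MAX_MSG_LEN    100000
    #define MAX_RCV_GROUPS 32

    /* Hypothetical application queue; it must copy the bytes handed to it. */
    extern void queue_push(const char *msg, int len);

    static void *recv_loop(void *arg)
    {
        mailbox mbox = *(mailbox *)arg;
        static char msg[MAX_MSG_LEN];
        char        sender[MAX_GROUP_NAME];
        char        groups[MAX_RCV_GROUPS][MAX_GROUP_NAME];
        int         num_groups, endian_mismatch, len;
        int16       mess_type;
        service     svc_type;

        for (;;) {
            svc_type = 0;
            len = SP_receive(mbox, &svc_type, sender, MAX_RCV_GROUPS,
                             &num_groups, groups, &mess_type,
                             &endian_mismatch, (int)sizeof(msg), msg);
            if (len < 0)
                break;              /* error (e.g. connection lost) --
                                       handle/reconnect elsewhere        */
            queue_push(msg, len);   /* hand off; do NOT process here     */
        }
        return NULL;
    }

You would start this with pthread_create (at whatever elevated priority 
your platform allows) and have your worker threads drain the queue.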

(3) Lowly Recommended: If possible, you may want to collapse 
application-level "messages" into a smaller number of Spread messages, 
which are then unpacked at the receiver's end. Spread 3 does some 
packing already, but it only collapses messages within the time it 
takes the token to make one circuit around your Spread ring, which is 
only a few milliseconds. Your application may be able to do more 
compacting over a longer period of time. Again, this doesn't really 
solve your underlying flow control problem; it just makes you more 
resilient to burstiness.
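
A sketch of the kind of packing this has in mind: application messages 
are concatenated with 4-byte length prefixes into one buffer, which is 
sent as a single Spread message and split apart again on the receiving 
side. The buffer size and all names here are illustrative:

    #include <string.h>
    #include <stdint.h>
    #include <arpa/inet.h>          /* htonl / ntohl */

    #define PACK_BUF_LEN 60000

    static char pack_buf[PACK_BUF_LEN];
    static int  pack_len = 0;

    /* Append one application message; returns 0 if the buffer is full
     * and should be flushed (sent with SP_multicast, then reset) first. */
    int pack_add(const char *msg, int len)
    {
        uint32_t n = htonl((uint32_t)len);

        if (pack_len + 4 + len > PACK_BUF_LEN)
            return 0;
        memcpy(pack_buf + pack_len, &n, 4);
        memcpy(pack_buf + pack_len + 4, msg, len);
        pack_len += 4 + len;
        return 1;
    }

    /* The receiver walks the buffer the other way: read a 4-byte length,
     * ntohl() it, consume that many bytes, repeat until the buffer ends. */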

After all of that, you still may need to address real underlying flow 
control problems. I would say that most Spread applications currently 
use one of a few different forms of flow control:

FC Method 1 (Hope and Pray): I know that my particular application 
doesn't generate more messages than my receiver can handle under normal 
operating conditions. Furthermore, bursts of messages won't exceed my 
application's buffers (a combination of any internal application 
buffering and Spread's buffering).

This is the method I think you are currently using, and as you can see 
it doesn't work when extremely large numbers of messages are being 
generated.

FC Method 2 (Pipelining Senders): My sending application is a member of 
the group to which it is sending. When it sends, it keeps track of how 
many messages it has sent out that it has not yet received back from 
Spread. I allow each sender some floating maximum number of outstanding 
messages. When it hits that maximum, it will not send any more messages 
to the group until it receives back one of its own outstanding 
messages. At that time, it allows itself to send the next message that 
needs to go out. If the sender (optionally) processes all messages like 
a receiver, all the members are of approximately equal strength and 
load, and your window of outstanding messages isn't too big, then this 
(along with buffering) should usually keep Spread from kicking your 
applications. This method doesn't guarantee that a strong sender won't 
run merrily along while driving a slower receiver into the ground. 
(Note: what is a "good" window size? In small groups I often use 
MAX_SESSION_MESSAGES / (2 * current number of members in the group).)

This is the method used by the flooder application that comes with Spread.
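
A minimal sketch of this window, assuming the dedicated receive thread 
from Recommendation 2 calls on_delivery() for every delivered message 
and that private_group holds the name returned by SP_connect; the 
function names and the window value are illustrative:

    #include <string.h>
    #include <pthread.h>
    #include "sp.h"

    static pthread_mutex_t fc_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  fc_cond = PTHREAD_COND_INITIALIZER;
    static int outstanding = 0;
    static int window      = 100;  /* e.g. MAX_SESSION_MESSAGES/(2*members) */

    /* Blocks until we are under the window, then sends. */
    void fc_send(mailbox mbox, const char *group, const char *buf, int len)
    {
        pthread_mutex_lock(&fc_lock);
        while (outstanding >= window)      /* wait for our own msgs back */
            pthread_cond_wait(&fc_cond, &fc_lock);
        outstanding++;
        pthread_mutex_unlock(&fc_lock);

        SP_multicast(mbox, AGREED_MESS, group, 0, len, buf);
    }

    /* Called by the receive thread for every delivered data message. */
    void on_delivery(const char *sender, const char *private_group)
    {
        if (strcmp(sender, private_group) == 0) {  /* one of ours came back */
            pthread_mutex_lock(&fc_lock);
            outstanding--;
            pthread_cond_signal(&fc_cond);
            pthread_mutex_unlock(&fc_lock);
        }
    }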

FC Method 3 (Yelling Fire!): My receiving applications implement 
Recommendation 2 from above. They notice when their internal buffers 
are growing too fast and/or are getting too large. When they detect 
this condition, they send a message to the group warning all of the 
other members. This warning should make your senders slow down or stop 
their sending. You could have the senders lower their maximum 
outstanding messages parameter if you are using the "Pipelining 
Senders" method, or stop sending altogether. After the condition has 
cleared, the member who yelled fire can indicate an "All Clear," which 
could raise the maximum outstanding messages or allow senders to begin 
sending again. This method depends on the receiver receiving fast 
enough, and yelling "FIRE!" early enough, that Spread doesn't kick it 
anyway.
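
A sketch of the receiver side of this, assuming the receive thread from 
Recommendation 2 can report its queue depth (queue_depth() is 
hypothetical) and that the control message types and water marks are 
whatever your application picks:

    #include "sp.h"

    #define FIRE_TYPE   100     /* illustrative control message types    */
    #define CLEAR_TYPE  101
    #define HIGH_WATER 5000     /* queue depths; tune for your app       */
    #define LOW_WATER  1000

    extern int queue_depth(void);   /* hypothetical: internal queue size */

    /* Call periodically from the processing thread(s). */
    void check_pressure(mailbox mbox, const char *group)
    {
        static int alarmed = 0;
        int depth = queue_depth();

        if (!alarmed && depth > HIGH_WATER) {
            alarmed = 1;
            SP_multicast(mbox, RELIABLE_MESS, group, FIRE_TYPE, 1, "F");
        } else if (alarmed && depth < LOW_WATER) {
            alarmed = 0;
            SP_multicast(mbox, RELIABLE_MESS, group, CLEAR_TYPE, 1, "C");
        }
    }

    /* Senders, on seeing FIRE_TYPE, shrink their window or stop sending;
     * on CLEAR_TYPE they restore it. */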

FC Method 4 ("Real" Application Flow Control): At this level we begin 
implementing flow control that is similar to what exists between the 
Spread daemons.

One moderately simple (not great) method is for each sender to mark 
each message it sends with a local counter that it increments per 
message. Each receiver keeps an n x n ACK matrix where rows represent 
receivers, columns represent senders, and the entry x at (row i, col j) 
indicates that receiver i has _PROCESSED_ (not just received) up to 
message x from sender j. Each receiver immediately updates its own row 
after each message it finishes processing. Receivers periodically, and 
possibly piggybacked on other messages, announce their own ACK vector 
(their row) to the group. Senders also use the "Pipelining Senders" 
maximum outstanding messages method.

When an application wants to send a message, it first looks at its ACK 
matrix and determines whether any receiver is in danger of falling too 
far behind from its point of view. It does this by computing the 
difference between each receiver's ACK vector and its own ACK vector 
(these results should be intelligently cached between sends and updated 
only upon receiving ACK vectors from other receivers, or after updating 
its own ACK vector). If the difference is greater than some set 
threshold (which should be some percentage of MAX_SESSION_MESSAGES, and 
probably equal to our maximum outstanding window), then the send is 
blocked. Otherwise, if we still have room in our outstanding message 
window, the sender is allowed to send.

The maximum outstanding window ensures that, even if all of the 
applications are sending at full speed, we shouldn't be overrunning 
receivers. The n x n matrix lets us pessimistically check whether we 
believe we are actually overrunning receivers.
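
Under the assumptions above, the send-side check might look roughly 
like this (ack, my_row, n_members, and threshold are all illustrative 
names; maintaining the matrix from delivered ACK vectors and handling 
membership changes are left out):

    #define MAX_MEMBERS 64

    /* ack[i][j]: this process's view of how many of sender j's messages
     * receiver i has fully processed. Rows are receivers, columns senders. */
    static long ack[MAX_MEMBERS][MAX_MEMBERS];
    static int  n_members;
    static int  my_row;             /* this process's own row index          */
    static long threshold = 100;    /* a fraction of MAX_SESSION_MESSAGES    */

    /* Returns 1 if no receiver looks too far behind us; combine this with
     * the pipelining window check before actually sending. */
    int ok_to_send(void)
    {
        int i, j;

        for (i = 0; i < n_members; i++) {
            if (i == my_row)
                continue;
            for (j = 0; j < n_members; j++)
                if (ack[my_row][j] - ack[i][j] > threshold)
                    return 0;       /* receiver i lags on sender j: block */
        }
        return 1;
    }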

If you use only AGREED messages, this method can be improved. Rather 
than keeping ACK vectors, each member tracks only the most recent 
message it has handled (a single ACK counter). Since all members 
receive those messages in the same order, we can see how far behind 
each one is from this information alone.
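
With only AGREED messages the same check collapses to a single counter 
per member; reusing the illustrative declarations from the previous 
sketch:

    static long processed[MAX_MEMBERS]; /* last count announced per member */
    static long my_processed;           /* agreed messages we have handled  */

    int ok_to_send_agreed(void)
    {
        int i;

        for (i = 0; i < n_members; i++)
            if (my_processed - processed[i] > threshold)
                return 0;               /* someone is too far behind: block */
        return 1;
    }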

I just thought this last method up while writing this email (although 
I'm sure it's not original), so there are some details to be filled in, 
such as how to handle membership changes and the like. Also, I need to 
think more to determine whether this _really_ solves the flow control 
problem in all cases, but my gut tells me that it does, because in my 
understanding it is similar to the flow control in Spread 3 (although 
that flow control is token based).

Other General Comments:

All flow control methods depend on your application being able to block 
itself from generating "application-level" messages before feeding them 
into your flow control system. If it can't block the generation of 
those messages and the average generation rate exceeds your processing 
capability, you are hosed in the long run in any case.

Finally, some of the above methods can still fail due to flow control 
failures, and your application can be kicked from Spread for myriad 
other reasons (your TCP/IP link breaks for some reason, or the OS kills 
Spread due to memory consumption on the system). So regardless of the 
discussion above, your application or Spread can die at any point due 
to external influences. Therefore, recovery from crashes is an 
important issue orthogonal to flow control. Either your application can 
tolerate missing messages, or it must recover important state from 
surviving members or some other external means.

Please comment back with suggestions, questions, interest, etc.

-- 
John Schultz
Co-Founder, Lead Engineer
D-Fusion, Inc. (http://www.d-fusion.net)
Phn: 443-838-2200 Fax: 707-885-1055

Jonathan Stanton wrote:

 > On Thu, Mar 07, 2002 at 10:47:06AM +0100,
 > koorosh.alahiari at ids.allianz.com wrote:
 >
 >>Jonathan,
 >>
 >>I understand your point on flow control. I am
 >>trying to work out what is happening inside Spread
 >>to account for the considerable difference in
 >>processing speed between sending & receiving.
 >>
 >
 > It isn't Spread's processing speed that matters. Spread is processing
 > everything fast enough; that is the reason you are being cut off. The
 > difference is that your receiving application runs slower than your
 > sending application. Part of that is that at the OS level, receiving is
 > slightly more costly than sending because of the work the kernel has to
 > do. There might also be some other subtle performance differences that
 > matter when you are processing messages at high speed.
 >
 > But the whole point is that Spread is delivering the messages to your
 > receiving app faster than your receiving app can process them. So the
 > TCP or Unix domain socket queue between Spread and your app fills up,
 > and then after buffering some more messages Spread finally gives up on
 > the client and cuts off its socket. So at the time you were cut off,
 > Spread had delivered about 1200 messages to your connection that the
 > receiving app had not yet SP_received.
 >
 > The actual Spread library code does a small amount of work on receive,
 > and so it also is more costly than sending by a little bit. But it will
 > generally always be true that receiving is more costly than sending,
 > which is why you need flow control as you said.
 >
 >>When a sender is sending, Spread is receiving
 >>(from the sender) and that is very fast. BUT when Spread
 >>is distributing the message (even to just one client)
 >>it seems to be much slower (even though the whole
 >>process is still v. fast).
 >>
 >
 > When all you are doing is sending, Spread receives the message from you
 > and then realizes it can discard it because NO ONE needs it. So that is
 > only doing less than 1/2 of the work. If anyone needs it, then the
 > message has to be processed more and then actually delivered to the
 > receiver, which will take at least as long as it took to originally
 > send it. So when both sending and receiving apps are running it has to
 > be slower, as you are doing twice as much work.
 >
 >>Is the overhead arising from multicasting (assuming
 >>my clients don't cause it & there's no net. latency)?
 >>I would appreciate being educated on this topic.
 >>
 >
 > No, if you are only running one daemon, nothing is actually being
 > multicast.
 >
 > Jonathan
 >
 > P.S. You do not need to include spread-admin at lists.spread.org on the
 > list of recipients; the admin address is for mailing list problems, not
 > actual questions. That is what the list is for.






