[Spread-users] lost flooder messages

Yair Amir yairamir at cnds.jhu.edu
Fri May 9 16:31:42 EDT 2003


Hi Kelvin,

I just looked at the flooder program and see that it is
not designed to wait until it receives all of the messages it sent
in the case where the flooder sends AND receives at the same time.
It could be modified to do that though.
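A minimal sketch of that modification follows. It is written against a hypothetical receive callback rather than the real Spread API so that it stands alone: `receive_fn`, `drain_own_messages` and `fake_rx` are illustrative names, not part of Spread or flooder. After the send loop finishes, the process keeps receiving until its count of its own messages matches the number it sent. Messages from other senders are read but not counted.

```c
#include <assert.h>

/* Hypothetical receive callback standing in for SP_receive(): returns 0 on
 * success and sets *is_own to 1 if the delivered message was multicast by
 * this same process (a real client might compare the sender field against
 * its own private group name), or to 0 otherwise. */
typedef int (*receive_fn)(void *ctx, int *is_own);

/* Drain step run after the send loop: keep receiving until every message
 * this process sent has been delivered back to it. Returns the total number
 * of own messages received, or -1 if a receive call fails. */
int drain_own_messages(receive_fn receive, void *ctx, int sent, int own_received)
{
    int is_own = 0;

    assert(own_received <= sent);
    while (own_received < sent) {
        if (receive(ctx, &is_own) != 0)
            return -1;
        own_received += is_own;   /* messages from other senders do not count */
    }
    return own_received;
}

/* Hypothetical fake receiver, for illustration only: every second call
 * delivers a message from another sender; the other calls deliver one of
 * our own messages until `own_left` runs out. */
typedef struct {
    int calls;
    int own_left;
} fake_rx;

static int fake_receive(void *ctx, int *is_own)
{
    fake_rx *rx = ctx;

    rx->calls++;
    if (rx->calls % 2 == 0) {     /* message from some other sender */
        *is_own = 0;
        return 0;
    }
    if (rx->own_left == 0)
        return -1;                /* nothing left; a real receive would block */
    rx->own_left--;
    *is_own = 1;
    return 0;
}
```

For example, with 500 messages sent and 300 of them already seen, `drain_own_messages(fake_receive, &rx, 500, 300)` on a `fake_rx` holding 200 pending own messages returns 500. In a real client the callback would wrap SP_receive() and the drain would run before SP_disconnect() and exit. How flooder actually recognises its own messages is an assumption here.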

From a quick scan of the current code, I see that in such a case the first 200
messages are sent without receiving any messages, and after that every
message sent is followed by at least one message received, exactly one of
which was sent by the same flooder.
So it is very possible that the last few messages will not be received
before flooder exits. I have not looked at this code for a while.
I vaguely recall that when it was written, flooder was designed to receive
all the messages it sent (but that portion of the code seems to be missing in
the current version, 3.17.0).
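The pacing described above can be turned into a rough count. The following toy model takes the 200-message window from this description, not from the flooder source. It also assumes that each later send is followed by exactly one of the flooder's own messages coming back. Under those assumptions the send loop ends with min(total, 200) of the flooder's own messages not yet received. `own_unreceived_at_exit` is a hypothetical name. The model covers only what the flooder itself has seen, not how many messages another receiver such as spuser will miss.

```c
#include <assert.h>

/* Model of the pacing described above, with no drain step: the first
 * `window` messages are sent without receiving anything, and each later
 * send is followed by receiving until exactly one own message arrives.
 * Returns how many of the flooder's own messages it has not yet received
 * when the send loop ends, i.e. what it leaves behind by exiting now. */
int own_unreceived_at_exit(int total, int window)
{
    int own_received = 0;

    assert(total >= 0 && window > 0);
    for (int sent = 1; sent <= total; sent++) {
        /* (message number `sent` would be multicast here) */
        if (sent > window)
            own_received++;   /* one own message comes back per later send */
    }
    return total - own_received;
}
```

With `total = 500` and `window = 200`, 300 own messages come back during the loop and 200 are still outstanding when it ends.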

In general, flooder is not meant to demonstrate that Spread does not lose
messages, only to benchmark its performance.
Let us know what you are trying to do so that we can understand
how best to help.

     Cheers,

     :) Yair.
     
On Friday, May 09, 2003 3:36 PM
Kelvin Fedrick Kelvin.Fedrick at noaa.gov wrote:



Kelvin> John Schultz wrote:

>> Could it be that Spread sometimes detects that you have left (broken
>> pipe) before it actually sends your msgs out to the other daemons and
>> therefore discards your msgs? This would explain all of the symptoms. On
>> what kind of system are you running? On what kind of system is the
>> daemon running? What kind of communication are you using to the daemon
>> (TCP/IP remote or UNIX domain socket to local)?

Kelvin> We've experienced this in several different configurations, including:

Kelvin>    - single daemon on Linux w/ clients presumably using domain sockets
Kelvin>      (whatever the standard test clients use)
Kelvin>    - single daemon on Linux w/ remote client from either the same Linux box,
Kelvin>      a different Linux box, or Windows XP
Kelvin>    - two daemons on different Linux boxes, each with a local client

>>
>>
>> BTW, Spread discarding your msgs like this, though not desirable, is
>> allowed by the safety and liveness properties of the system. To guarantee
>> that all of your messages are actually sent in the system, the sender
>> must stay in the group until it receives back its own messages. You can
>> do this by making your flooder both a sender and a receiver.

Kelvin> I'm not sure I understand what you're saying. In Spread, a sender need not even
Kelvin> be a group member to multicast to it, so why would the sender be required to
Kelvin> stay in the group for a delivery guarantee just because it happened to be a
Kelvin> member? Also, the default flooder is both a sender and a receiver, and that is
Kelvin> exactly when the problem occurs. It has not occurred when I use the -wo
Kelvin> write-only flag. I would assume (naively, perhaps) that if SP_multicast returns
Kelvin> with no error, the daemon should have the message and deliver it no matter what
Kelvin> then happens to the sender.

>>
>> John
>>
>> Kelvin Fedrick wrote:
>>
>> > I just tried it and it doesn't seem to help.
>> >
>> > ./spflooder -m 500 sent 500 messages with a sequence number, and spuser
>> > only received messages 1-498. The number of messages received is variable;
>> > occasionally it gets them all, but usually a few are missing (e.g. 4994 of
>> > 5000 were received on a run I just made).
>> >
>> > Kelvin
>> >
>> >
>> > Joshua Goodall wrote:
>> >
>> >     On Thu, May 08, 2003 at 02:44:37PM -0600, Kelvin Fedrick wrote:
>> >      > I saw a few previous posts on this from July 2002, but I never saw a
>> >      > definitive resolution. We've experienced the same problem. We
>> >      > modified spflooder to send a message sequence number and find
>> >      > that often a small percentage of messages at the end are never
>> >      > delivered (e.g. 400 sent but only 398 delivered).
>> >      > Placing a 1-second sleep at the end of the flooder program's main(),
>> >      > just before exit, seems to fix this. Also, I haven't been able to
>> >      > reproduce it so far running spflooder as write-only. Any ideas?
>> >
>> >     Instead of a sleep, does adding a SP_disconnect(Mbox) also fix it
>> >     for you?
>> >
>> >     J
>> >
>> >     --
>> >     Joshua Goodall                                      "tea makes itself"
>> >     joshua at roughtrade.net                                       - Ana Susanj
>> >
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>> http://lists.spread.org/mailman/listinfo/spread-users




