[Spread-users] More Performance Issues/Questions

Mike Perik mikep at foxriver.com
Wed Jan 19 11:48:13 EST 2005


Has any light been shed on what is happening with this situation? 
Just a note, this problem is reproducible on two machines that have shown no
packet loss between them when using spsend/sprecv.  That is why I suspect a 
problem/bug in Spread.
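
To make that reasoning concrete, here is a rough sketch (Python, not Spread code) of how much delay token loss alone would predict. It assumes a deliberately simplified model that is not taken from the Spread source: each token pass is lost independently with probability loss_prob, and every loss costs one full hurry timeout before the token is resent. The 2-second default is the value mentioned earlier in this thread; the function name is made up for illustration.

```python
def expected_hurry_delay(loss_prob, hurry_timeout_s=2.0):
    """Expected extra delay per token pass under the simplified model.

    Assumption: losses are independent, so the number of lost passes
    before a successful one is geometric with mean p / (1 - p), and
    each lost pass costs one hurry timeout.
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    return hurry_timeout_s * loss_prob / (1.0 - loss_prob)


# Compare a loss-free link with a few lossy ones.
for p in (0.0, 0.001, 0.01, 0.1):
    print(f"loss={p:<6} expected extra delay={expected_hurry_delay(p):.4f} s")
```

Under this model a link with no measurable loss should add essentially no hurry-timeout delay, which is why the latency on these two machines looks to me like something other than packet loss.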

I've run spsend and sprecv between the machines I was testing on. What loss
should be expected? Zero? Some percentage?


On most machines I see only 1 packet lost, but I believe that is a counting
bug: the sender starts at count = 1 and the receiver starts at count = 0. One
machine does have a large amount of packet loss, but I'm avoiding that machine
and talking to my admins about it; it may be a bad card.
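
A minimal sketch of the suspected off-by-one (Python, not the actual spsend/sprecv code; count_lost and first_expected are made-up names, and the counting logic is only my assumption about how a receiver would tally gaps):

```python
def count_lost(received_seqs, first_expected):
    """Count missing sequence numbers in a stream of received packets.

    Assumption: the receiver tracks the next sequence number it expects
    and treats any jump past it as lost packets. Late or duplicate
    packets are ignored.
    """
    lost = 0
    expected = first_expected
    for seq in received_seqs:
        if seq > expected:
            lost += seq - expected      # gap between expected and actual
        expected = max(expected, seq + 1)
    return lost


# Sender numbers its packets starting at 1; nothing is dropped on the wire.
received = list(range(1, 1001))

print(count_lost(received, first_expected=0))  # receiver starts at 0
print(count_lost(received, first_expected=1))  # receiver starts at 1
```

With the receiver starting at 0 the first call reports one "lost" packet even though every packet arrived, which matches the single loss I see on the healthy machines.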

We are running on a Linux infrastructure (Redhat EL2-4).  Has anyone else seen 
this behavior on Linux?

Thanks,
Mike


On Wednesday 22 December 2004 09:25 pm, Ryan Caudy wrote:
> The issue of whether the hurry messages work correctly has come up
> before, as cited in Yair's e-mail linked below, and a quote from one
> of my own e-mails.  I'll try to see if I can find a problem, when I
> have time next week.
>
> Cheers,
> Ryan
>
> On Wed, 22 Dec 2004 12:23:08 -0500, Yair Amir <yairamir at cnds.jhu.edu> wrote:
> > Mike,
> >
> > In my opinion, there are no side effects of tokens, leaders or anything
> > like what you are describing, in the Spread protocols. Your previous
> > e-mails describing how the protocol works do not reflect the algorithms
> > or their implementation.
> >
> > In my opinion the only issue with your latency is your network losing
> > packets, especially on one machine. The token is lost with some
> > probability for this machine (as any other UDP message), and thanks to
> > the Spread protocol, you do not feel this beyond a token_hurry latency
> > for some of the messages as Spread is recovering from that as part of its
> > basic protocol. When you reduce the latency for hurry_timeout, you just
> > make Spread more aggressive and this compensates for your network
> > problem.
> >
> > You could check your network UDP losses directly using spsend and sprecv
> > that are provided in the Spread package.
> >
> > If you want to use Spread and are not happy with the latency then either
> > fix your network, or make Spread more aggressive by lowering the
> > hurry_timeout. I really don't know how to help you beyond this.
> >
> > Cheers,
> >
> >         :) Yair.
> >
> > Mike Perik wrote:
> > > Shouldn't the leader give up the token before going into the select?
> > >
> > > What's the purpose of the leader?
> > >
> > > Is this a bug or just a side effect of the implementation?
> > >
> > > Seems to me that this should be documented especially for situations
> > > > where you have 1 talker and many listeners.  The leader needs to be the
> > > talker. Couldn't there be some kind of agreement made in the ring that
> > > whoever is talking a lot becomes the leader?  Or the leader could
> > > determine that someone else out there is doing the talking and I'm not
> > > so I'll give up the token a little quicker.
> > >
> > > Thanks,
> > > Mike
> > >
> > > On Tuesday 21 December 2004 01:27 pm, Mike Perik wrote:
> > >>Ok,  I think I've found the problem.
> > >>
> > >>In the spread.conf I had two machines.  The leader is always the first
> > >>machine.  The leader is the one who holds onto the token and he'll hold
> > >>onto the token for the Hurry_timeout.   Since the first machine in the
> > >>configuration file is the client machine he holds onto it for
> > >> Hurry_timeout seconds.  It goes into a select with the Hurry_timeout
> > >> and since the server/publisher is waiting for the token to publish the
> > >> client waits the whole time (Hurry_timeout or 2 seconds by default)
> > >> since there is nothing to read.  I'm assuming the server queues  up
> > >> all the messages that are being sent and when it gets the token it
> > >> sends them all.
> > >>
> > >>I switched the order of the two machines in the configuration file
> > >> around and the problem essentially went away.
> > >>
> > > >>If I'm correct on how this is working, I have a couple of questions.
> > >>
> > >>What if I have two servers that are publishing data at a high rate and
> > > >>neither of them is the "leader"?
> > >>What kind of delay is this going to cause?
> > >>If I have 20-30 spread daemons in my segment how much additional
> > >> latency is there going to be?
> > >>
> > >>I believe this is why the spmonitor shows the "last" which was the
> > >> server having a high number of retransmits.
> > >>
> > >>Is this a known issue?
> > >>
> > >>How would I best design my system around this problem?
> > >>
> > >>Thanks,
> > >>Mike
> > >>
> > >>_______________________________________________
> > >>Spread-users mailing list
> > >>Spread-users at lists.spread.org
> > >>http://lists.spread.org/mailman/listinfo/spread-users



