[Spread-users] Slow receiver

Anurag Gupta agupta at yahoo-inc.com
Wed Jan 28 20:10:14 EST 2004


It turned out the problem was not that localhost was within the segment. The
problem was that the Spread daemon was not able to send messages to
127.0.0.255. Enabling logging showed this, which helped me fix it quickly: I
just put in the IP instead of 127.0.0.255. Will that hurt?

thanks
-anurag

-----Original Message-----
From: Jonathan Stanton [mailto:jonathan at cnds.jhu.edu]
Sent: Tuesday, January 27, 2004 7:42 AM
To: Anurag Gupta
Cc: spread-users
Subject: Re: [Spread-users] Slow receiver


My first comment is I don't know if the spread.conf file you gave is
exactly what you are using -- but if it is (and the xxx.xxx addresses are
real IP addresses) then one problem is that you should not mix the
internal localhost addresses and external ones. It doesn't make sense from
Spread's point of view.

I think your config may sort of work if you only connect from a client
running on the same machine as the daemon, but the 'client' and
'spread-daemon' machines will certainly not work.

If you are just running on one machine, then I'd try this config:

Spread_Segment 127.0.0.255:4803 {
	localhost 127.0.0.1
}

If you are connecting clients from other machines or using multiple
daemons, I'd use a config like:

Spread_Segment xxx.yyy.zzz.255:4803 {
	client   xxx.yyy.zzz.1
	spread-daemon xxx.yyy.zzz.2
}

(Obviously with real addresses. The key change is you should not include
localhost in the set of machines if you are not ONLY using localhost).
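A quick way to catch this kind of mixed configuration before starting the daemon is to check each segment's host list. The sketch below is a hypothetical helper, not part of Spread: it assumes a /24 netmask for the segment (matching the x.y.z.255 broadcast addresses above) and uses the reserved documentation range 192.0.2.0/24 in place of real addresses.

```python
import ipaddress


def check_segment(segment_addr, hosts, prefixlen=24):
    """Return a list of warnings for one Spread_Segment block (hypothetical helper).

    segment_addr: the address before ':4803', e.g. "127.0.0.255".
    hosts: {daemon_name: ip_string} for the lines inside the block.
    prefixlen: ASSUMED netmask of the segment; 24 matches the x.y.z.255 examples.
    """
    warnings = []
    net = ipaddress.ip_network(f"{segment_addr}/{prefixlen}", strict=False)

    # Rule from this thread: do not mix loopback and external daemons.
    loopback = sorted(name for name, ip in hosts.items()
                      if ipaddress.ip_address(ip).is_loopback)
    if loopback and len(loopback) < len(hosts):
        warnings.append(f"mixes loopback {loopback} with external daemons")

    # Every daemon should sit inside the network the segment broadcasts to.
    for name, ip in hosts.items():
        if ipaddress.ip_address(ip) not in net:
            warnings.append(f"{name} ({ip}) is outside {net}")
    return warnings


# The mixed layout from the original question, with placeholder addresses.
for w in check_segment("127.0.0.255", {"localhost": "127.0.0.1",
                                       "client": "192.0.2.1",
                                       "spread-daemon": "192.0.2.2"}):
    print(w)
```

Both suggested layouts (loopback only, or external hosts only) produce no warnings.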

You are correct about what data_link is doing. It retries the sends on
error to overcome transient errors. The question is why the send is
failing in the first place. From the error I'm guessing it might be because
of the config file having both localhost and remote addresses.
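The retry behaviour described here can be sketched as a bounded loop. This is a Python illustration, not Spread's data_link.c: the function name is hypothetical, max_tries=10 follows the num_try value mentioned later in this thread, and the 20 ms pause between attempts is an assumption.

```python
import errno
import time


def send_with_retry(send, max_tries=10, delay=0.02):
    """Call send() until it succeeds or max_tries attempts have failed.

    send: zero-argument callable that raises OSError on failure.
    Returns the attempt number that succeeded; re-raises the last error
    if every attempt fails. max_tries and delay are ASSUMED values.
    """
    for attempt in range(1, max_tries + 1):
        try:
            send()
            return attempt
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            print(f"attempt {attempt}: send failed ({name})")
            if attempt == max_tries:
                raise
            time.sleep(delay)  # pause before the next attempt
```

Retrying absorbs transient failures, but an error caused by the configuration fails on every attempt, so each such send pays the whole retry budget before giving up.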

The best thing to do is try with one of the config files listed above and
if that doesn't fix it, then turn on the DATA_LINK debug flag by adding
these lines to spread.conf:

DebugFlags = { PRINT EXIT DATA_LINK }
EventLogFile = bad_sendmsg.log

Then run spread and your test program. This will generate a log file
"bad_sendmsg.log" that has every send call logged and should report more
information that might show me what the error is.

Cheers,

Jonathan

On Mon, Jan 26, 2004 at 09:33:35PM -0800, Anurag Gupta wrote:
> Thanks for your quick response.
>
> Latency of each message is ~200ms. ktracing the spread daemon (we are on
> FreeBSD) showed multiple sendmsg calls failing for each one that succeeds.
> I think num_try is limited to 10 in data_link.c; that's why it stops after
> 200ms. When I do a dump from ktrace, I see 10 of these for each successful
> sendmsg:
>
> ==============
>  61327 spread   0.000004 RET   select 0
>  61327 spread   0.000003 CALL  sendmsg(0x5,0x80b1ae4,0)
>  61327 spread   0.000005 RET   sendmsg -1 errno 49 Can't assign requested address
>  61327 spread   0.000004 CALL  select(0,0,0,0,0x806a824)
> ===============
>
>
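The errno in this trace can be decoded portably. On FreeBSD, errno 49 is EADDRNOTAVAIL ("Can't assign requested address"); the numeric value differs on other systems (Linux uses 99), so code should compare against the symbolic name rather than the number. A minimal sketch using Python's standard errno and os modules:

```python
import errno
import os


def describe(err_no):
    """Format an errno value as 'number NAME: message' for the current platform."""
    name = errno.errorcode.get(err_no, "unknown")
    return f"{err_no} {name}: {os.strerror(err_no)}"


# Prints 49 on FreeBSD, 99 on Linux; the name is the same everywhere.
print(describe(errno.EADDRNOTAVAIL))
```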
> Only uncommented lines in spread.conf are:
>
> ==========
> Spread_Segment  127.0.0.255:4803 {
>   localhost   127.0.0.1
>   client  xxx.xxx.xxx.xxx
>   spread-daemon xxx.xxx.xxx.xxx
> }
> ==========
>
> thanks
> -anurag
>
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Jonathan
> Stanton
> Sent: Monday, January 26, 2004 8:54 PM
> To: Anurag Gupta
> Cc: spread-users
> Subject: Re: [Spread-users] Slow receiver
>
>
> Hi,
>
> I don't have quite enough information to know what is going on. Generally
> the latency of a single message send in Spread should be quite low. On
> modern machines, maybe in the neighborhood of a few hundred microseconds
> of work plus scheduling delays (switching between client, daemon, back to
> client if run on one machine) of anywhere from a few milliseconds to 30
> ms.
>
> Can you provide us with your spread.conf configuration and some more
> information about how much the receiver is lagging (what the delay is) and
> the rough computer power? How are you timing the send time vs receive
> time? Are they on different machines?
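One way to answer the timing question is to have the sender embed a timestamp in each message and the receiver subtract it on arrival. The sketch below uses a plain UDP socket pair over loopback as a stand-in for Spread (it does not use the Spread client API), with a 100-byte payload to match the original test. Because both ends run in one process, time.perf_counter() is a consistent clock here; for a sender and receiver in separate processes, use a clock they share (such as time.time() on the same machine), and across machines either synchronize the clocks or measure round trips instead.

```python
import socket
import struct
import time

MSG_SIZE = 100  # matches the 100-byte messages in the original question


def one_way_latency():
    """Send one timestamped 100-byte datagram over loopback and time its arrival.

    Plain UDP stands in for Spread; sender and receiver live in this one
    process, so perf_counter() readings on both ends are comparable.
    """
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))  # let the OS pick a free port
        rx.settimeout(1.0)
        stamp = struct.pack("!d", time.perf_counter())  # 8-byte send time
        tx.sendto(stamp.ljust(MSG_SIZE, b"\0"), rx.getsockname())
        data, _ = rx.recvfrom(MSG_SIZE)
        received = time.perf_counter()
        (sent,) = struct.unpack("!d", data[:8])
        return received - sent
    finally:
        rx.close()
        tx.close()


print(f"one-way latency: {one_way_latency() * 1e6:.1f} us")
```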
>
> One note, although it isn't necessarily affecting your results,
> on most OS's sleeping for 1ms actually sleeps for 10+ms since 10 ms is the
> minimum scheduling delay.
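The effect of timer granularity on a flooder like the one in the original question can be checked directly by timing many short sleeps. A small Python sketch:

```python
import time


def measured_sleep(requested=0.001, samples=50):
    """Return the mean time actually spent in time.sleep(requested), in seconds."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested)
        total += time.perf_counter() - start
    return total / samples


mean = measured_sleep()
print(f"requested 1.000 ms, got {mean * 1000:.3f} ms on average")
```

The result depends on the kernel's timer resolution. If a requested 1 ms sleep actually takes 10 ms, a flooder that sleeps once per message can send at most about 100 messages per second rather than 1000, so its real sending rate is set by the scheduler, not by the requested delay.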
>
> Cheers,
>
> Jonathan
>
> On Mon, Jan 26, 2004 at 08:43:27PM -0800, Anurag Gupta wrote:
> > Hi,
> >
> > I am seeing some unusual delays in getting Spread to transfer messages. I
> > have a simple flooder that sleeps 1 ms after publishing each message
> > (message size 100 bytes), and a receiver receiving those messages (no
> > processing done). The receiver is lagging behind in receiving the messages
> > quite a bit.
> >
> > How do I see where the delay is? Is this expected? Or a result of some
> > misconfiguration?
> >
> > regards
> > -anurag
> >
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
>
> --
> -------------------------------------------------------
> Jonathan R. Stanton         jonathan at cs.jhu.edu
> Dept. of Computer Science
> Johns Hopkins University
> -------------------------------------------------------
>

--
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------





