[Spread-users] Circular token over spread: 2 seconds lap time?
Andreu Moreno i Vendrell
amvendrell at yahoo.es
Wed Jul 28 13:01:33 EDT 2004
Hello,
We changed only Hurry_timeout, to 40 ms, and the lap time dropped to the
millisecond range.
Thanks for your help.
So the decision is to select a suitable value for Hurry_timeout!
Thanks,
Andreu Moreno
>The protocol layer code isn't the area of Spread that I'm most
>familiar with, but as far as I understand it, if the network is doing
>great and there aren't a lot of packets being sent/lost, the network
>leader will hold the token for Hurry_timeout. To see the code I just
>scanned through to try to figure this out, look at
>Prot_handle_token(), To_hold_token(), and Prot_token_hurry() in
>protocol.c.
>
>This is (I think) a performance decision, to avoid wasting too many
>resources rotating the token when it isn't necessary... the goal of
>the system is throughput more than latency.
>
>The other timeouts in membership.c are definitely unrelated...
>collectively, they represent the time at which Spread assumes the
>token is lost, and the times of several phases of the daemon
>membership algorithm. You may want to update them (carefully) in
>order to improve the performance of the daemon membership algorithm on
>a low-latency network. In general, only do so proportionally.
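
A minimal sketch of what "proportionally" could look like in practice: each timeout is converted to microseconds, divided by one common factor, and split back into the .sec/.usec pair used in membership.c. The helper name scale_timeout and the divisor of 50 are illustrative assumptions, not part of Spread; the two default values are the ones quoted further down in this thread.

```python
# Hypothetical helper, not part of Spread: scale a (sec, usec) timeout by a
# common divisor so that all timeouts keep their relative proportions.

USEC_PER_SEC = 1_000_000


def scale_timeout(sec, usec, divisor):
    """Return (sec, usec) for the original timeout divided by `divisor`."""
    total_usec = sec * USEC_PER_SEC + usec
    scaled_usec = total_usec // divisor  # integer microseconds, rounded down
    return divmod(scaled_usec, USEC_PER_SEC)


# Default values as quoted further down in this thread.
defaults = {
    "Token_timeout": (5, 0),
    "Hurry_timeout": (2, 0),
}

# The divisor of 50 is an assumption chosen for illustration only.
scaled = {name: scale_timeout(s, us, 50) for name, (s, us) in defaults.items()}

for name, (s, us) in scaled.items():
    print(f"{name}.sec = {s}; {name}.usec = {us};")
```

Applying the same divisor to every timeout keeps their ordering intact, for example Hurry_timeout stays below Token_timeout.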
>
>I suspect that decreasing Hurry_timeout should make your problem go
>away, although there are reasons not to do so if you're using Spread
>for real. Let the list know if this works for you.
>
>Cheers,
>Ryan
>
>
>On Tue, 27 Jul 2004 14:15:47 -0700, Steven Dake <sdake at mvista.com> wrote:
>
>
>>Gautam
>>
>>I have tried a similar protocol (http://developer.osdl.org/dev/openais)
>>with 12 processors and find the token rotation time under full network
>>load to be about 10msec. Thus, I'd not suggest setting your token
>>timeout lower than this value and you may even want a much larger
>>value. I settled on 100msec, although I do not support WAN
>>configurations which would warrant much larger timeouts.
>>
>>I find each node takes about 0.5msec to handle the token under full
>>network load (10MB/sec throughput on a 100mbit network, 1472-byte
>>packets), and it takes about 6msec for one node to send the messages and
>>other nodes to process them per token rotation.
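
A back-of-the-envelope check of these figures, as a sketch only. It assumes that 1 MB means 1,000,000 bytes and that every packet carries exactly 1472 bytes, neither of which the message states. The variable names are made up for this example.

```python
# Rough arithmetic on the figures quoted above. Assumptions (not stated in the
# message): 1 MB = 1,000,000 bytes and every packet carries exactly 1472 bytes.

NODES = 12
TOKEN_HANDLING_MSEC = 0.5        # per node, as quoted
ROTATION_MSEC = 10.0             # token rotation time under load, as quoted
THROUGHPUT_BYTES_PER_SEC = 10 * 1_000_000
PACKET_BYTES = 1472

# Time spent handling the token once on every node in the ring.
token_handling_total_msec = NODES * TOKEN_HANDLING_MSEC

# Packets carried per second and per token rotation at the quoted throughput.
packets_per_sec = THROUGHPUT_BYTES_PER_SEC / PACKET_BYTES
packets_per_rotation = packets_per_sec * ROTATION_MSEC / 1000.0

print(f"token handling per rotation: {token_handling_total_msec:.1f} msec")
print(f"packets per second:          {packets_per_sec:.0f}")
print(f"packets per rotation:        {packets_per_rotation:.1f}")
```

Twelve nodes at 0.5 msec each gives 6 msec, which matches the "about 6msec" figure, although the message does not say whether the two are the same quantity.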
>>
>>If Spread uses the same algorithms as Yair Amir's PhD thesis suggests,
>>none of those timer values should have any effect on performance of the
>>token rotation. These timeouts are only for determining a configuration
>>and determining a faulty processor.
>>btw, I am not familiar with some of the timeouts below so I could be
>>wrong :).
>>
>>Thanks
>>-steve
>>
>>
>>
>>On Tue, 2004-07-27 at 08:19, Gautam H. Thaker wrote:
>>
>>
>>>The "2 second" value is a result of the default Spread timing
>>>parameters which are:
>>>
>>>Default Spread parameters:
>>>
>>>Token_timeout.sec = 5; Token_timeout.usec = 0;
>>>Hurry_timeout.sec = 2; Hurry_timeout.usec = 0;
>>>Alive_timeout.sec = 1; Alive_timeout.usec = 0;
>>>Join_timeout.sec = 1; Join_timeout.usec = 0;
>>>Rep_timeout.sec = 2; Rep_timeout.usec = 500000;
>>>Seg_timeout.sec = 2; Seg_timeout.usec = 0;
>>>Gather_timeout.sec = 5; Gather_timeout.usec = 0;
>>>Form_timeout.sec = 5; Form_timeout.usec = 0;
>>>Lookup_timeout.sec = 60; Lookup_timeout.usec = 0;
>>>
>>>In my tests I have noted that these values result in Spread
>>>communications suffering a maximum latency of 2 seconds. When I change
>>>these parameters to the values below, the maximum latencies I observe
>>>are much lower.
>>>
>>>"Very Fast" Spread parameters:
>>>
>>>Token_timeout.sec = 0; Token_timeout.usec = 100000;
>>>Hurry_timeout.sec = 0; Hurry_timeout.usec = 40000;
>>>Alive_timeout.sec = 0; Alive_timeout.usec = 20000;
>>>Join_timeout.sec = 0; Join_timeout.usec = 20000;
>>>Rep_timeout.sec = 0; Rep_timeout.usec = 60000;
>>>Seg_timeout.sec = 0; Seg_timeout.usec = 40000;
>>>Gather_timeout.sec = 0; Gather_timeout.usec = 100000;
>>>Form_timeout.sec = 0; Form_timeout.usec = 100000;
>>>Lookup_timeout.sec = 1; Lookup_timeout.usec = 200000;
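
To see how the two sets relate, the sketch below converts every (sec, usec) pair to microseconds and computes the default-to-"very fast" ratio per timeout. The values are copied from the two lists above; the helper to_usec and the dictionary names are assumptions made for this example.

```python
# Compare the two parameter sets quoted above. Values are (sec, usec) pairs
# copied from the message; the ratio calculation itself is only illustrative.

DEFAULT = {
    "Token_timeout": (5, 0),
    "Hurry_timeout": (2, 0),
    "Alive_timeout": (1, 0),
    "Join_timeout": (1, 0),
    "Rep_timeout": (2, 500000),
    "Seg_timeout": (2, 0),
    "Gather_timeout": (5, 0),
    "Form_timeout": (5, 0),
    "Lookup_timeout": (60, 0),
}

VERY_FAST = {
    "Token_timeout": (0, 100000),
    "Hurry_timeout": (0, 40000),
    "Alive_timeout": (0, 20000),
    "Join_timeout": (0, 20000),
    "Rep_timeout": (0, 60000),
    "Seg_timeout": (0, 40000),
    "Gather_timeout": (0, 100000),
    "Form_timeout": (0, 100000),
    "Lookup_timeout": (1, 200000),
}


def to_usec(pair):
    """Convert a (sec, usec) pair to a single microsecond count."""
    sec, usec = pair
    return sec * 1_000_000 + usec


# How many times smaller each "very fast" timeout is than its default.
ratios = {name: to_usec(DEFAULT[name]) / to_usec(VERY_FAST[name])
          for name in DEFAULT}

for name, ratio in ratios.items():
    print(f"{name:15s} {to_usec(DEFAULT[name]) / 1000:8.0f} msec -> "
          f"{to_usec(VERY_FAST[name]) / 1000:6.0f} msec  (divided by {ratio:.1f})")
```

By this calculation every timeout is divided by 50 except Rep_timeout, which is divided by roughly 41.7.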
>>>
>>>
>>>The latency ranges observed for a variety of message sizes with these
>>>two parameter sets are shown in the attached graphic. All our test
>>>results are also available online at:
>>>
>>>http://www.atl.external.lmco.com/projects/QoS/compare/cgi-bin/left2_part1.cgi?filter=emulab.*%28spread%7Ctcp%29
>>>
>>>I was wondering if anyone has pushed the Spread parameters even lower
>>>than the "very fast" values. Certainly on a Linux 2.6 kernel or on Solaris,
>>>both of which have 1000 Hz clocks, the lowest parameter value should
>>>be settable at about 2 msec (rather than 20 msec in "very fast" above).
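
The clock-rate reasoning can be sketched as follows. A 1000 Hz clock ticks every 1 msec. The "two ticks" floor and the 100 Hz comparison clock are assumptions, chosen only because they reproduce the 2 msec and 20 msec figures in the message; neither is defined by Spread or stated in the message.

```python
# Illustrative only: relate a kernel clock rate (HZ) to timer granularity.
# The "two ticks" floor is an assumption chosen because it reproduces the
# 2 msec and 20 msec figures in the message; Spread does not define it.

ASSUMED_MIN_TICKS = 2


def tick_msec(hz):
    """Length of one clock tick in milliseconds."""
    return 1000.0 / hz


def min_timeout_msec(hz, min_ticks=ASSUMED_MIN_TICKS):
    """Smallest timeout, in msec, that spans `min_ticks` whole ticks."""
    return min_ticks * tick_msec(hz)


# 100 Hz is an assumed comparison value; 1000 Hz is the rate quoted above.
for hz in (100, 1000):
    print(f"HZ={hz}: tick={tick_msec(hz):g} msec, "
          f"assumed floor={min_timeout_msec(hz):g} msec")
```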
>>>
>>>Gautam
>>>
>>>Andreu Moreno i Vendrell wrote:
>>>
>>>
>>>>Hello,
>>>>
>>>>We have 2 seconds lap time in a circular token over spread. Do you know what's
>>>>wrong?
>>>>
>>>>Test description:
>>>>
>>>>a) 3 computers in an isolated LAN: Machine 1, Machine 2 and Machine 3.
>>>>b) Spread 3.17.2 version installed in every machine.
>>>>c) RedHat 8.0 Linux installed in every machine.
>>>>d) Machine 1: runs a program that joins group "1" and on reception of a
>>>>message it sends a message to group "2".
>>>>e) Machine 2: runs a program that joins group "2" and on reception of a
>>>>message it sends a message to group "3".
>>>>f) Machine 3: runs a program that joins group "3" and on reception of a
>>>>message it sends a message to group "1". This program is the last to be
>>>>executed and also sends a message to group "1" to start the token
>>>>circulating.
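
The topology in steps d) to f) can be sketched in-process, without Spread, to make the ring explicit. Machine 3 is taken to forward to group "1", which is what closes the ring. The names FORWARD_TO, circulate and lap_time_msec are hypothetical, and the lap-time helper assumes every hop takes the same time.

```python
# In-process sketch of the test topology; it does not use Spread at all.
# Each machine listens on one group and forwards to the next. Machine 3 is
# assumed to forward to group "1" so that the ring closes.

FORWARD_TO = {
    "1": "2",  # Machine 1: joined to group "1", sends to group "2"
    "2": "3",  # Machine 2: joined to group "2", sends to group "3"
    "3": "1",  # Machine 3: joined to group "3", sends to group "1" (assumed)
}


def circulate(start_group, hops):
    """Return the groups the token message visits over `hops` deliveries."""
    visited = []
    group = start_group
    for _ in range(hops):
        visited.append(group)
        group = FORWARD_TO[group]
    return visited


def lap_time_msec(per_hop_msec, ring_size=len(FORWARD_TO)):
    """Lap time if every hop takes `per_hop_msec` (an assumed uniform delay)."""
    return ring_size * per_hop_msec


# Machine 3 starts the token by sending a message to group "1".
path = circulate("1", 6)
print(" -> ".join(path))
```

With three machines, one lap is three deliveries, so any fixed delay added to each delivery is multiplied by three in the measured lap time.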
>>>>
>>>>Results:
>>>>
>>>>The lap time is about 2 seconds?????
>>>>
>>>>Thanks,
>>>>
>>>>Andreu
>>>>
>>>>
>>>>