[Spread-users] Gaps between message send

Heimo Zeilinger zeilinger at ict.tuwien.ac.at
Wed Nov 17 16:55:20 EST 2010


Hello!

Problem solved!
Thanks a lot for your support - one of our nodes ran on an older version of Spread. After the update the issues have been resolved.

Cheers!

-----Original Message-----
From: spread-users-bounces at lists.spread.org [mailto:spread-users-bounces at lists.spread.org] On Behalf Of Heimo Zeilinger
Sent: Wednesday, 17 November 2010 11:24
To: jschultz at spreadconcepts.com
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] Gaps between message send

Thanks a lot for your answer, although I doubt that the cause is packet loss, since all four retransmit counters in spmonitor (retrans, uretrans, sretrans, bretrans) are 0. In addition, my test setup runs in a closed sub-network consisting of two daemons, two client processes, and a switch in between.
I increased the size of the personal window, with the result that the packet is filled up to the defined number of messages and sent to the sink. This takes around 13 ms (measured with Wireshark), and afterwards the daemon waits 12 seconds until the next packet is formed and sent, even though numerous messages are waiting. I have to admit that I do not really know how these 12 seconds are calculated, but reducing token_timeout and hurry_timeout to a quarter of their values decreases the gap to 8 seconds.

PRINT and EXIT debug flags are set in spread.conf

This results in the following output. 

Membership id is ( -1408171772, 1289983086)
[Wed 17 Nov 2010 09:38:05] --------------------
[Wed 17 Nov 2010 09:38:05] Configuration at user-1 is:
[Wed 17 Nov 2010 09:38:05] Num Segments 1
[Wed 17 Nov 2010 09:38:05]      2       172.17.1.255      4803
[Wed 17 Nov 2010 09:38:05]              user-1               172.17.1.4      
[Wed 17 Nov 2010 09:38:05]              user-0               172.17.1.3      
[Wed 17 Nov 2010 09:38:05] ====================
[Wed 17 Nov 2010 09:38:05] Prot_handle_token: Token Sequence number (1073742181) approaching 2^31 so trigger membership to reset it.
[Wed 17 Nov 2010 09:38:05] Prot_handle_token: Token Sequence number (1073742181) approaching 2^31 so trigger membership to reset it.
Membership id is ( -1408171772, 1289983094)
[Wed 17 Nov 2010 09:38:13] --------------------
[Wed 17 Nov 2010 09:38:13] Configuration at user-1 is:
[Wed 17 Nov 2010 09:38:13] Num Segments 1
[Wed 17 Nov 2010 09:38:13]      2       172.17.1.255      4803
[Wed 17 Nov 2010 09:38:13]              user-1               172.17.1.4      
[Wed 17 Nov 2010 09:38:13]              user-0               172.17.1.3      
[Wed 17 Nov 2010 09:38:13] ====================
[Wed 17 Nov 2010 09:38:13] Prot_handle_token: Token Sequence number (1073742096) approaching 2^31 so trigger membership to reset it.
[Wed 17 Nov 2010 09:38:13] Prot_handle_token: Token Sequence number (1073742096) approaching 2^31 so trigger membership to reset it.

As the token sequence number changes, I suspect that you are right and the token timer expired, even though no packet loss can be identified.
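The numbers in the output above can be checked with a little arithmetic. The following is only a sketch with illustrative helper names (they are not part of Spread). It shows that both logged token sequence numbers sit just above 2^30 = 1073741824, and that the two membership changes are 8 seconds apart, which is consistent with the gap observed after reducing the timeouts.

```python
from datetime import datetime

# Values copied by hand from the debug output above.
token_seqs = [1073742181, 1073742096]
stamps = ["Wed 17 Nov 2010 09:38:05", "Wed 17 Nov 2010 09:38:13"]


def offset_above_2_30(seq):
    """Distance of a token sequence number from 2**30."""
    return seq - 2**30


def gap_seconds(first, second):
    """Seconds between two log stamps taken on the same day.

    Only the trailing HH:MM:SS field is parsed, so the result does not
    depend on locale-specific day and month names.
    """
    a = datetime.strptime(first.split()[-1], "%H:%M:%S")
    b = datetime.strptime(second.split()[-1], "%H:%M:%S")
    return (b - a).total_seconds()


offsets = [offset_above_2_30(s) for s in token_seqs]
gap = gap_seconds(stamps[0], stamps[1])
print(offsets, gap)
```

Whether the daemon uses exactly 2^30 as its reset threshold is an assumption drawn from these two values only; the log message itself just says "approaching 2^31".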


Kind regards!
-----Original Message-----
From: John Schultz [mailto:jschultz at spreadconcepts.com]
Sent: Tuesday, 16 November 2010 21:04
To: Heimo Zeilinger
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] Gaps between message send

The Token_timeout defines how long a daemon will wait without receiving the token before it declares its ring dead.
The Hurry_timeout defines how quickly the token will be regenerated by the ring leader if the token doesn't seem to be circulating.

The other membership.c timeouts define how the various portions of the membership algorithm work.

The token is sent by a daemon immediately after it sends any messages it has to send.

You are probably experiencing significant loss which is causing the token to be lost.  In such a case, the system will wait until the Hurry_timeout fires and then the token will be sent from the leader again.  You can try lowering that timeout and see what happens.  However, you want the Hurry_timeout to be significantly longer than the actual time it takes for your token to circulate around the ring you have defined or you will generate pointless token overhead.
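As a rough illustration of that rule of thumb, the sketch below checks a candidate Hurry_timeout against a measured token rotation time. The function name and the default margin of 10x are assumptions made for this example; Spread does not define such a factor, and the right margin depends on your network.

```python
def hurry_timeout_is_reasonable(hurry_timeout_s, token_rotation_s, margin=10.0):
    """Return True if the hurry timeout is at least `margin` times the
    measured token rotation time (both in seconds).

    `margin` is a hypothetical safety factor standing in for
    "significantly longer"; it is not a Spread parameter.
    """
    if hurry_timeout_s <= 0 or token_rotation_s <= 0 or margin <= 0:
        raise ValueError("all arguments must be positive")
    return hurry_timeout_s >= margin * token_rotation_s


# Example with made-up measurements: a 5 ms rotation time.
print(hurry_timeout_is_reasonable(2.0, 0.005))   # generous timeout
print(hurry_timeout_is_reasonable(0.01, 0.005))  # too close to rotation time
```

A timeout that fails this check would regenerate the token while the original is probably still circulating, which is the pointless overhead mentioned above.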

The best solution is to figure out why you are experiencing such loss (assuming my hunch is correct).  You can run spmonitor and look at the status of all your daemons; if you see the "retrans" field numbers increasing much at all, then loss is very likely your problem.
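One way to make "increasing" concrete is to note the counters at two points in time and compare them. The sketch below assumes the values were copied by hand from two spmonitor status reports into plain dictionaries; it does not call spmonitor or any Spread API, and the daemon name and numbers are made up.

```python
def increased_counters(before, after):
    """Return the counters that grew between two snapshots, with their deltas.

    `before` and `after` map counter names to integer values for one daemon.
    A counter missing from `after` is treated as unchanged.
    """
    deltas = {}
    for name, old in before.items():
        new = after.get(name, old)
        if new > old:
            deltas[name] = new - old
    return deltas


# Hypothetical snapshots for one daemon, taken a minute apart.
first = {"retrans": 0, "packets_sent": 1200}
second = {"retrans": 37, "packets_sent": 1850}

print(increased_counters(first, second))
```

If "retrans" shows up in the result for any daemon, loss is a likely suspect; if it stays at 0 everywhere, the cause is probably elsewhere.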

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Nov 16, 2010, at 2:41 PM, Heimo Zeilinger wrote:

Hey folks,
Thanks for your help regarding my last issues! However, time goes by and new issues rise - that's life.
Currently I am working on periodically sending transactions to a database using 2 daemon processes. When I tested the setup for the first time, I noticed that these messages are sent in blocks with a gap of around 12 seconds in between. As this gap occurs rather periodically, I am sure that the token timing parameters are the reason for it, even though I have not been able to find the correct settings yet. I reduced the parameters in membership.c to a third and decreased the gap to 8 seconds. However, this is still quite unsatisfactory for my application. By the way, the CPU load is around 4-6%. Two questions arose for me:
 
1. I understand the token timers as defining the MAXIMUM time a daemon process will wait for the token. Is this correct, or do they define the time that a daemon process holds the token, no matter whether it has already finished its tasks or not? Currently the daemon process seems to send a block of messages and afterwards does more or less nothing for around 11 seconds.
2. Is it possible to configure the daemon so that it passes the token on to the next daemon process as soon as it has finished its task?
 
Currently I use total ordering, although a different configuration does not have any effect.
Maybe you know an answer to the problem!
 
Kind regards
 
 
_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users

