[Spread-users] main concern on High CPU utilization

Thu Mar 5 08:12:32 EST 2009

Hi

Server 1 (IP 192.168.4.70)
Gcs_Segment 192.168.4.72:8800{
        Node1 192.168.4.70
        Node2 192.168.4.72
}
Server 2 (IP 192.168.4.72)
Gcs_Segment 192.168.4.70:8800{
        Node1 192.168.4.70
        Node2 192.168.4.72
}
Above are the configuration files for the two individual Spread daemons.

I am running on a PowerPC platform (Motorola processor).
I have tried everything suggested in the Spread documentation available on the site, but I have not been able to reduce the CPU utilization; it stays at around 30%.
I am also attaching the monitor snapshots for analysis.
Could you please send me some documentation (a low-level design document, or anything explaining the code flow) to help me better understand the Spread process?
Alternatively, could you please suggest a way to reduce the CPU utilization?
I look forward to your early response.

Sandeep jeevan
Member Technical(Stack)
Mob:9717892153
VNL  | 246, Phase IV, Udyog Vihar, Gurgaon, Haryana 122 015, INDIA | +91-124-4311600-609 | F +91-124-4104766 | www.vnl.in

________________________________
From: John Lane Schultz [mailto:jschultz at spreadconcepts.com]
Sent: Wednesday, March 04, 2009 9:31 PM
To: Sandeep Jeevan
Cc: spread-users at lists.spread.org
Subject: RE: [Spread-users] Problem with token sending module (main concern on High CPU utilization)

As you can see, the only difference is in the test on Token_counter. To_hold_token() determines whether or not the ring leader should stop circulating the token. When it returns true, the system effectively goes into a "dormant" mode, as opposed to an "active" mode that keeps the token circulating as fast as possible.

In version 4.x.x, the leader stops circulating the token once all daemons have acknowledged receiving all traffic and the token has made 1 additional circulation. In version 3.x.x, it does the same, but only after the token has made 100 additional circulations.

This change was made because people complained that the token kept circulating for some time even when no user traffic was flowing, so the new setting is likely to produce less token traffic. The drawback is that if a daemon wants to send after the token has stopped circulating but before the token_hurry timeout, it must send a request for the token to the ring leader, who then starts circulating it again, instead of simply receiving the token as it comes around, as it would if the token had kept circulating. In other words, if your system has little activity, the optimization that reduces token traffic will likely increase the latency of sending and delivering messages.

You can change the number however you like without harming the overall functioning of the protocol. It only raises or lowers how aggressively Spread tries to optimize for low latency: the higher the number, the longer the token keeps circulating after no new traffic is injected.

Cheers!

---
John Lane Schultz
Spread Concepts LLC
Phn: 443 838 2200

________________________________
From: spread-users-bounces at lists.spread.org [mailto:spread-users-bounces at lists.spread.org] On Behalf Of Sandeep Jeevan
Sent: Wednesday, March 04, 2009 12:26 AM
To: John Lane Schultz
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] Problem with token sending module (main concern on High CPU utilization)

Dear John

Could you please guide me on the following difference? In version 4.x.x, protocol.c has:

static  int To_hold_token()
{
  if( ( Memb_state() == OP ||
        ( Memb_state() == GATHER && Memb_token_alive() ) )&&
      Get_retrans(Last_token->type) <= 1      &&
      Aru == Highest_seq && Token_counter > 1 ) return ( 1 );
  else return( 0 );
}

whereas in version 3.x.x, protocol.c has:

static  int To_hold_token()
{
  if( ( Memb_state() == OP ||
        ( Memb_state() == GATHER && Memb_token_alive() ) )&&
      Get_retrans(Last_token->type) <= 1      &&
      Aru == Highest_seq && Token_counter > 100 ) return ( 1 );
  else return( 0 );
}

If I change 100 to 1 in 3.x.x, will it affect my system in any way?

Sandeep jeevan

________________________________
From: John Lane Schultz [mailto:jschultz at spreadconcepts.com]
Sent: Tuesday, March 03, 2009 8:36 PM
To: Sandeep Jeevan; spread-users at lists.spread.org
Subject: RE: [Spread-users] Problem with token sending module (main concern on High CPU utilization)

It looks like the first send (line 2) is a retransmission, due either to a suspected token loss or to another daemon's request. On lines 7 and 27 (the second "Received Token" entry), however, this process had just received the token before he sent it on again.

The current token ring algorithm forwards the token as fast as possible as long as any of the daemons has user traffic to send, in order to minimize message latency. Only when all the daemons have no more user data to send does the ring leader hold the token and stop passing it around. In that case, he holds it for the token_hurry timeout and then passes it on anyway, for failure detection.

Cheers!

---
John Lane Schultz
Spread Concepts LLC
Phn: 443 838 2200

________________________________
From: spread-users-bounces at lists.spread.org [mailto:spread-users-bounces at lists.spread.org] On Behalf Of Sandeep Jeevan
Sent: Tuesday, March 03, 2009 1:42 AM
To: spread-users at lists.spread.org
Subject: [Spread-users] Problem with token sending module (main concern on High CPU utilization)

Below are the logs I got after enabling logging.

I am facing a peculiar problem: after line 11, the daemon keeps sending the token roughly every 1 millisecond, even though the token timer value in membership.c is 200000. Could anybody tell me why this happens?

1 [Tue 03 Mar 2009 10:42:03] DL_send: sent a message of 24 bytes to (192.168.4.70,8801) on channel 5

2 [Tue 03 Mar 2009 10:42:03] Prot_token_hurry: retransmiting token 13 1

3 [Tue 03 Mar 2009 10:42:03] E_handle_events: next event

4 [Tue 03 Mar 2009 10:42:03] E_handle_events: poll select

5 [Tue 03 Mar 2009 10:42:03] E_handle_events: exec handler for fd 5, fd_type 0, priority 1

6 [Tue 03 Mar 2009 10:42:03] DL_recv: received 24 bytes on channel 5

7 [Tue 03 Mar 2009 10:42:03] Received Token

8 [Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x1013a0d8 to object type 20 named scatter

9 [Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x101389a0 to object type 27 named down_link

10 [Tue 03 Mar 2009 10:42:03] Send_new_packets: packet 292 sent and inserted

11 [Tue 03 Mar 2009 10:42:03] Net_flush_bcast: Flushing with Queued_bytes = 896; num_elements in scat = 2; size of scat0,1 = 32 864

12 [Tue 03 Mar 2009 10:42:03] Net_flush_bcast Num_send_needed =0

13 [Tue 03 Mar 2009 10:42:03] Net_send_token:before milli 400:Tue Mar  3 10:42:03 2009

[Tue 03 Mar 2009 10:42:03] ifndef ARCH_SCATTER_NONE ::::$$$ DL_send:sendmsg called ret =24 num_try 0

[Tue 03 Mar 2009 10:42:03] DL_send: sent a message of 24 bytes to (192.168.4.70,8801) on channel 5

[Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x1010f3e0 to object type 35 named time_event

[Tue 03 Mar 2009 10:42:03] E_queue: dequeued a (first) simillar event

[Tue 03 Mar 2009 10:42:03] E_queue: (first) event queued func 0x1001e574 code 0 data 0x0 in future (0:200000)

[Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x10138098 to object type 35 named time_event

[Tue 03 Mar 2009 10:42:03] E_queue: dequeued a simillar event

[Tue 03 Mar 2009 10:42:03] E_queue: event queued for func 0x10010958 code 0 data 0x0 in future (0:500000)

[Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x1010f360 to object type 8 named token_head_obj

[Tue 03 Mar 2009 10:42:03] E_handle_events: next event

[Tue 03 Mar 2009 10:42:03] E_handle_events: poll select

[Tue 03 Mar 2009 10:42:03] E_handle_events: exec handler for fd 5, fd_type 0, priority 1

[Tue 03 Mar 2009 10:42:03] DL_recv: received 24 bytes on channel 5

[Tue 03 Mar 2009 10:42:03] Received Token

[Tue 03 Mar 2009 10:42:03] Net_send_token:before milli 400:Tue Mar  3 10:42:03 2009

[Tue 03 Mar 2009 10:42:03] ifndef ARCH_SCATTER_NONE ::::$$$ DL_send:sendmsg called ret =24 num_try 0

[Tue 03 Mar 2009 10:42:03] DL_send: sent a message of 24 bytes to (192.168.4.70,8801) on channel 5

[Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x1010f460 to object type 35 named time_event

[Tue 03 Mar 2009 10:42:03] E_queue: dequeued a (first) simillar event

[Tue 03 Mar 2009 10:42:03] E_queue: (first) event queued func 0x1001e574 code 0 data 0x0 in future (0:200000)

[Tue 03 Mar 2009 10:42:03] dispose: disposing pointer 0x1010f3e0 to object type 35 named time_event

[Tue 03 Mar 2009 10:42:03] E_queue: dequeued a simillar event

Sandeep jeevan

The information contained in this e-mail is private & confidential and may also be legally privileged. If you are not the intended recipient, please notify us, preferably by e-mail, and do not read, copy or disclose the contents of this message to anyone.