[Spread-users] high cpu load

Dirk Vleugels dvl at 2scale.net
Fri Aug 31 11:00:49 EDT 2001


Hi,

On Fri, Aug 31, 2001 at 10:29:13AM -0400, Yair Amir wrote:
> On a one-segment network, unicast retrans ALWAYS go only to the
> daemon immediately before (in a circular fashion) the one that reports
> sending the u retrans. That is why I asked to see all of the reports
> and not only from one machine.

Ok, here are status messages from all cluster members: two samples with
a delta of 10 seconds. cluster2 has a very low 'u retrans' count; I have
no clue why.

cluster1:

Status at cluster1 V 3.16. 0 (state 1, gstate 1) after 1109946 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1037983201   tok_hurry : 2980781     memb change:       6
sent pack: 10469332     recv pack : 32696435    retrans    : 10040412
u retrans: 9774822      s retrans :  265590     b retrans  :       0
My_aru   :  440354      Aru       :  440354     Highest seq:  440354
Sessions :     183      Groups    :       3     Window     :      60
Deliver M: 44489748     Deliver Pk: 44640025    Pers Window:      15
Delta Mes: 44489748     Delta Pack:  440354     Delta sec  : 1109946
==================================

Monitor> Monitor: send status query

============================
Status at cluster1 V 3.16. 0 (state 1, gstate 1) after 1109956 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1037995305   tok_hurry : 2980794     memb change:       6
sent pack: 10469537     recv pack : 32696830    retrans    : 10040622
u retrans: 9775032      s retrans :  265590     b retrans  :       0
My_aru   :  440951      Aru       :  440950     Highest seq:  440951
Sessions :     183      Groups    :       3     Window     :      60
Deliver M: 44490343     Deliver Pk: 44640622    Pers Window:      15
Delta Mes:     595      Delta Pack:     596     Delta sec  :      10


cluster2:

Status at cluster2 V 3.16. 0 (state 1, gstate 1) after 1110277 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038077390   tok_hurry : 2890797     memb change:       8
sent pack: 10008960     recv pack : 42422848    retrans    : 1203859
u retrans:     416      s retrans : 1203443     b retrans  :       0
My_aru   :  447204      Aru       :  447204     Highest seq:  447204
Sessions :     144      Groups    :       3     Window     :      60
Deliver M: 44496594     Deliver Pk: 44646887    Pers Window:      15
Delta Mes: 44496594     Delta Pack:  447204     Delta sec  : 1110277
==================================

Monitor> Monitor: send status query

============================
Status at cluster2 V 3.16. 0 (state 1, gstate 1) after 1110287 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038090613   tok_hurry : 2890806     memb change:       8
sent pack: 10009046     recv pack : 42423589    retrans    : 1203949
u retrans:     416      s retrans : 1203533     b retrans  :       0
My_aru   :  447919      Aru       :  447919     Highest seq:  447919
Sessions :     144      Groups    :       3     Window     :      60
Deliver M: 44497306     Deliver Pk: 44647602    Pers Window:      15
Delta Mes:     712      Delta Pack:     715     Delta sec  :      10

cluster3:

Status at cluster3 V 3.16. 0 (state 1, gstate 1) after 1110543 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038118263   tok_hurry : 2890932     memb change:      10
sent pack: 10166703     recv pack : 42931735    retrans    : 10321326
u retrans: 10197701     s retrans :  123625     b retrans  :       0
My_aru   :  449597      Aru       :  449597     Highest seq:  449597
Sessions :     128      Groups    :       3     Window     :      60
Deliver M: 44498985     Deliver Pk: 44649290    Pers Window:      15
Delta Mes: 44498985     Delta Pack:  449597     Delta sec  : 1110543
==================================

Monitor> Monitor: send status query

============================
Status at cluster3 V 3.16. 0 (state 1, gstate 1) after 1110553 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038130650   tok_hurry : 2890935     memb change:      10
sent pack: 10166712     recv pack : 42932830    retrans    : 10321499
u retrans: 10197868     s retrans :  123631     b retrans  :       0
My_aru   :  450564      Aru       :  450563     Highest seq:  450564
Sessions :     128      Groups    :       3     Window     :      60
Deliver M: 44499950     Deliver Pk: 44650257    Pers Window:      15
Delta Mes:     965      Delta Pack:     966     Delta sec  :      10

cluster4:

Status at cluster4 V 3.16. 0 (state 1, gstate 1) after 1110701 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038160146   tok_hurry : 2890996     memb change:      12
sent pack: 10426772     recv pack : 41442543    retrans    : 10214651
u retrans: 9781646      s retrans :  433005     b retrans  :       0
My_aru   :  452960      Aru       :  452960     Highest seq:  452960
Sessions :     237      Groups    :       3     Window     :      60
Deliver M: 44502348     Deliver Pk: 44652679    Pers Window:      15
Delta Mes: 44502348     Delta Pack:  452960     Delta sec  : 1110701
==================================

Monitor> Monitor: send status query

============================
Status at cluster4 V 3.16. 0 (state 1, gstate 1) after 1110711 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1038171603   tok_hurry : 2890997     memb change:      12
sent pack: 10427453     recv pack : 41442982    retrans    : 10214751
u retrans: 9781746      s retrans :  433005     b retrans  :       0
My_aru   :  454007      Aru       :  454007     Highest seq:  454007
Sessions :     237      Groups    :       3     Window     :      60
Deliver M: 44503387     Deliver Pk: 44653726    Pers Window:      15
Delta Mes:    1039      Delta Pack:    1047     Delta sec  :      10

loghost:

Status at loghost V 3.16. 0 (state 1, gstate 1) after 980920 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1016056425   tok_hurry : 2670049     memb change:       4
sent pack:       2      recv pack : 52600503    retrans    : 9147503
u retrans: 8805205      s retrans :  342298     b retrans  :       0
My_aru   :  455683      Aru       :  455683     Highest seq:  455683
Sessions :       1      Groups    :       3     Window     :      60
Deliver M: 44280203     Deliver Pk: 44429302    Pers Window:      15
Delta Mes: 44280203     Delta Pack:  455683     Delta sec  :  980920
==================================

Monitor> Monitor: send status query

============================
Status at loghost V 3.16. 0 (state 1, gstate 1) after 980930 seconds :
Membership  :  5  procs in 1 segments, leader is cluster1
rounds   : 1016068733   tok_hurry : 2670050     memb change:       4
sent pack:       2      recv pack : 52601860    retrans    : 9147508
u retrans: 8805206      s retrans :  342302     b retrans  :       0
My_aru   :  456573      Aru       :  456573     Highest seq:  456573
Sessions :       1      Groups    :       3     Window     :      60
Deliver M: 44281092     Deliver Pk: 44430192    Pers Window:      15
Delta Mes:     889      Delta Pack:     890     Delta sec  :      10
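As a sanity check on the numbers above, the per-second unicast retransmission rate of each daemon can be computed from the two status samples (10 s apart). This is just a sketch over the counters quoted in this mail; the dict layout and variable names are mine, not part of Spread.

```python
# Per-node "u retrans" counters copied from the two monitor samples above.
# Dividing the delta by the 10 s sample interval gives u retrans per second.
samples = {
    # node: (u retrans, first sample, second sample)
    "cluster1": (9774822, 9775032),
    "cluster2": (416, 416),
    "cluster3": (10197701, 10197868),
    "cluster4": (9781646, 9781746),
    "loghost":  (8805205, 8805206),
}

INTERVAL = 10  # seconds between the two status queries

for node, (first, second) in samples.items():
    rate = (second - first) / INTERVAL
    print(f"{node}: {rate:.1f} u retrans/s")
```

Every node except cluster2 is retransmitting steadily (10-21 packets/s), which matches cluster2 being the only member whose counter barely moves.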

Flow control:

Flow Control Parameters:
------------------------

Window size:  0

        cluster1        0
        cluster2        0
        cluster3        0
        cluster4        0
        loghost         0

The system is in production, so we can't debug Spread to our liking ...

> The way I see it: either one network card there is bad / lacking buffers,
> or, because of the very high CPU usage (which I don't know why it happens),
> the daemon just misses messages.

I tried to check s - r communication from every host to the loghost; the
loss ratio is very low (seldom 0.1 - 0.2 %, mostly 0 %).

> An option is to run flooder on a clean system there and see what happens.
> Now, flow control can be tuned to not lose messages even in a busy system.
> The flow control parameters in the vanilla version should usually be ok
> (e.g. conservative). I assume Spread was not changed. Do you run
> our own build or did you build it yourself?

Which flow-control settings should be tried?

Cheers,
Dirk

