[Spread-users] high cpu load

Dirk Vleugels dvl at 2scale.net
Thu Aug 30 17:07:31 EDT 2001


Hello,

On Mon, Aug 27, 2001 at 10:53:50AM -0400, Yair Amir wrote:
> Something seems wrong with your network. I think Spread should use less than 10% cpu. 
> You have way too many retransmissions. You have only about 30 packets/sec which is really
> low. Usually in speeds below thousands of packets/sec, there should almost be no retransmissions.
> I don't know why this happens on your network. 

I still have not found an explanation. I don't think there is a
network problem at a lower layer.

strace shows:

[.....]
recvmsg(4, {msg_name(16)={sin_family=AF_INET, sin_port=htons(4804),
sin_addr=inet_addr("192.168.100.127")}},
msg_iov(2)=[{"\200\0\17\200\177d\250\300\203E\353\1\177d\250\300\203"...,
24}, {"\1d\250\300\233\t\206;\177d\250\300\0\0\2\0\200E\353\1"...,
1448}], msg_controllen=0, msg_flags=0}, 0) = 24
sendmsg(4, {msg_name(16)={sin_family=AF_INET, sin_port=htons(4804),
sin_addr=inet_addr("192.168.100.2")}},
msg_iov(2)=[{"\200\0\0\200\1d\250\300\203E\353\1\1d\250\300\203E\353"...,
24}, {"", 0}], msg_controllen=0, msg_flags=0}, 0) = 24
gettimeofday({999204195, 343903}, {4294967176, 0}) = 0
gettimeofday({999204195, 343940}, {4294967176, 0}) = 0
gettimeofday({999204195, 343977}, {4294967176, 0}) = 0
gettimeofday({999204195, 344012}, {4294967176, 0}) = 0
select(1024, [3 4 6 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 33 34 35 36 37 38 39 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
77 78 79 80 81 82 83 84 85 86 88 89 90 91 92 93 94 95 96 97 98 99 100
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 118 119
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153], [], [6
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 33
34 35 36 37 38 39 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
107 108 109 110 111 112 113 114 115 116 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
144 145 146 147 148 149 150 151 152 153], {0, 0}) = 0 (Timeout)
select(1024, [3 4 6 7 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 33 34 35 36 37 38 39 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
77 78 79 80 81 82 83 84 85 86 88 89 90 91 92 93 94 95 96 97 98 99 100
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 118 119
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153], [], [6
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 33
34 35 36 37 38 39 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
84 85 86 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
107 108 109 110 111 112 113 114 115 116 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
144 145 146 147 148 149 150 151 152 153], {1, 999891} <unfinished ...>
[.....]

Could select() on a large number of fds slow the daemon down (it
shouldn't, afaik)? During peak hours even more httpds would be launched
(SoftLimit 1024 right now). Is the number of retransmits a RELIABLE
message issue? I am assuming this is _not_ an Ethernet problem
(100 Mbit, btw).
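
One way to get a rough feel for the select() question is to time it
in isolation. The following is only a sketch and not Spread code: it
opens idle pipes, polls their read ends with a zero timeout (as in the
first select() of the trace above) and reports the average cost per
call. The helper name time_select and the fd counts are made up for
illustration, and absolute numbers will differ between machines.

```python
import os
import select
import time


def time_select(n_fds, rounds=1000):
    """Return the average wall-clock seconds per zero-timeout select() call."""
    # Each pipe contributes one idle read end to the monitored set.
    pipes = [os.pipe() for _ in range(n_fds)]
    read_fds = [r for r, _w in pipes]
    try:
        start = time.perf_counter()
        for _i in range(rounds):
            # Nothing is ever written, so no fd should be reported ready.
            select.select(read_fds, [], read_fds, 0)
        elapsed = time.perf_counter() - start
    finally:
        for r, w in pipes:
            os.close(r)
            os.close(w)
    return elapsed / rounds


if __name__ == "__main__":
    # Hypothetical sizes: a handful of fds vs. roughly the count in the trace.
    for n in (4, 150):
        print(f"{n:4d} fds: {time_select(n) * 1e6:.1f} us per select()")
```

Comparing the two figures against the call rate seen in strace would
show whether the size of the fd set alone could plausibly account for
the load.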

> > sent pack: 5706975      recv pack : 17722855    retrans    : 5540933
> > u retrans: 5386778      s retrans :  154155     b retrans  :       0

What do 'u retrans', 's retrans' and 'b retrans' mean? I can't find
an explanation in the user manual. It seems nearly every packet needs
a retransmit?
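
As a quick arithmetic check on the counters quoted above (a sketch
only; the variable names are mine, and comparing 'retrans' against
'sent pack' is an assumption about which ratio is meaningful):

```python
# Counter values copied from the quoted status output.
sent_pack = 5706975
retrans = 5540933
u_retrans, s_retrans, b_retrans = 5386778, 154155, 0


def retrans_ratio(retransmitted, sent):
    """Fraction of retransmissions relative to packets sent."""
    return retransmitted / sent


# The three sub-counters add up exactly to the 'retrans' total.
subtotal = u_retrans + s_retrans + b_retrans
print(subtotal == retrans)                          # True
print(f"{retrans_ratio(retrans, sent_pack):.1%}")   # 97.1%
```

So 'u retrans', 's retrans' and 'b retrans' look like a breakdown of
the 'retrans' total, and retransmissions amount to about 97% of the
packets sent.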

Puzzled,
Dirk




