[Spread-users] Performance Issues

Mike Perik mikep at foxriver.com
Tue Dec 21 12:56:27 EST 2004


I've confirmed that all machines have 1 Gbit nic and they are full duplex:
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes
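
To repeat this check across several machines, something like the following could be used. It is only a minimal sketch: the function names are made up for illustration, and it assumes the "Key: value" layout of the ethtool output shown above.

```python
def parse_ethtool(text):
    """Collect the 'Key: value' lines of ethtool output into a dict.

    Continuation lines without a colon (extra link modes) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def link_is_gigabit_full_duplex(fields):
    """True if the parsed fields report a live 1000Mb/s full-duplex link."""
    return (
        fields.get("Speed") == "1000Mb/s"
        and fields.get("Duplex") == "Full"
        and fields.get("Link detected") == "yes"
    )


# Trimmed copy of the output above, used as sample input.
SAMPLE = """Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Link detected: yes
"""

if __name__ == "__main__":
    print(link_is_gigabit_full_duplex(parse_ethtool(SAMPLE)))
```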

I turned on all of the logging options in the spread configuration.  I see the 
same "hang" in the spread daemon as I do in my client.  Whenever the client 
stops getting data, the spread daemon goes into a select with a timeout.  At 
the same time I see the retransmit count go up for the server's spread daemon.

When I run the server and client on the same machine, I don't see the 2 sec hangs.

If I adjust Hurry_timeout.sec, the length of the hang changes accordingly.  I 
looked through the mailing list archives and found this comment by Ryan:

 The protocol layer code isn't the area of Spread that I'm most
 familiar with, but as far as I understand it, if the network is doing
 great and there aren't a lot of packets being sent/lost, the network
 leader will hold the token for Hurry_timeout.  To see the code I just
 scanned through to try to figure this out, look at
 Prot_handle_token(), To_hold_token(), and Prot_token_hurry() in
 protocol.c.

Is it possible that the client is the leader and is holding onto the token 
while it is in the select?

Here is the output of the spread daemon on the client before and after the 
hang. Notice the 4 second hang.  I _know_ the server is sending out data.

Is this a bug in the Linux select?
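
One way to test the select question in isolation is to time a select call while data is sent to the descriptor partway through the timeout. The sketch below is only an illustration, not Spread code: a local socket pair stands in for the daemon's socket, and the function names and delays are made up. If select wakes up promptly here, a full-length wait in the daemon would more likely mean that nothing arrived on its descriptors during that time.

```python
import select
import socket
import threading
import time


def timed_select(sock, timeout):
    """Wait for sock to become readable; return (ready, seconds_waited)."""
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable), time.monotonic() - start


def run_check(send_delay=0.2, timeout=4.0):
    """Send one byte after send_delay seconds and time the select wake-up."""
    # Hypothetical stand-in for the daemon's socket: a connected local pair.
    receiver, sender = socket.socketpair()
    try:
        timer = threading.Timer(send_delay, sender.send, args=(b"x",))
        timer.start()
        ready, elapsed = timed_select(receiver, timeout)
        timer.join()
        return ready, elapsed
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    ready, elapsed = run_check()
    print("readable:", ready, "waited: %.3f s" % elapsed)
```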

Thanks,
Mike Perik

[Tue 21 Dec 2004 11:50:08] Events: High & Med took 0 16 time to handle
[Tue 21 Dec 2004 11:50:08] Events: Low priority took 0 2 to handle
[Tue 21 Dec 2004 11:50:08] E_handle_events: next event
[Tue 21 Dec 2004 11:50:08] Events: TimeEv's took 0 1 to handle
[Tue 21 Dec 2004 11:50:08] E_handle_events: poll select
[Tue 21 Dec 2004 11:50:08] E_handle_events: select with timeout (3, 999810) 2d8
[Tue 21 Dec 2004 11:50:12] E_handle_events: exited select with timeout (0, 0) 0
[Tue 21 Dec 2004 11:50:12] Events: Waiting for fd or timout took 3 998058 asked for 3 999810
[Tue 21 Dec 2004 11:50:12] Events: High & Med took 0 39 time to handle
[Tue 21 Dec 2004 11:50:12] Events: Low priority took 0 4 to handle
[Tue 21 Dec 2004 11:50:12] E_handle_events: next event
[Tue 21 Dec 2004 11:50:12] Events: TimeEv's took 0 1 to handle
[Tue 21 Dec 2004 11:50:12] E_handle_events: poll select
[Tue 21 Dec 2004 11:50:12] E_handle_events: select with timeout (0, 1701) 2d8
[Tue 21 Dec 2004 11:50:12] E_handle_events: exited select with timeout (0, 0) 0
[Tue 21 Dec 2004 11:50:12] Events: Waiting for fd or timout took 0 9926 asked for 0 1701
[Tue 21 Dec 2004 11:50:12] Events: High & Med took 0 3 time to handle
[Tue 21 Dec 2004 11:50:12] Events: Low priority took 0 2 to handle
[Tue 21 Dec 2004 11:50:12] E_handle_events: next event
[Tue 21 Dec 2004 11:50:12] E_handle_events: exec time event
[Tue 21 Dec 2004 11:50:12] Events: TimeEv is 0 8236 late
[Tue 21 Dec 2004 11:50:12] new: reusing pointer 0x93f54f0 to object type 35 named time_event
[Tue 21 Dec 2004 11:50:12] E_queue: event queued for func 0x804b728 code 0 data 0x0 in future (4:0)
[Tue 21 Dec 2004 11:50:12] DL_send: sent a message of 28 bytes to (10.0.103.141,5004) on channel 5
[Tue 21 Dec 2004 11:50:12] Prot_token_hurry: retransmiting token 12 1
[Tue 21 Dec 2004 11:50:12] dispose: disposing pointer 0x94314c0 to object type 35 named time_event
[Tue 21 Dec 2004 11:50:12] Events: TimeEv's took 0 42 to handle
[Tue 21 Dec 2004 11:50:12] E_handle_events: poll select
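
To find every long wait in a larger log, the "Waiting for fd or timout" entries can be pulled out with a short script. This is a rough sketch with illustrative names. It assumes the two numbers in each pair are seconds and microseconds, which matches the (3, 999810) timeout printed just before, and it tolerates entries that have been wrapped across two lines.

```python
import re

# "timout" is spelled this way in the daemon's own log output.
WAIT_RE = re.compile(
    r"Waiting for fd or timout took (\d+) (\d+)\s+asked\s+for (\d+) (\d+)"
)


def unwrap(lines):
    """Join continuation lines (not starting with '[') onto the previous entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def long_waits(entries, threshold=1.0):
    """Return (took, asked) in seconds for waits of at least threshold seconds."""
    result = []
    for entry in entries:
        match = WAIT_RE.search(entry)
        if match:
            t_sec, t_usec, a_sec, a_usec = (int(g) for g in match.groups())
            took = t_sec + t_usec / 1e6   # assumed: seconds, microseconds
            asked = a_sec + a_usec / 1e6
            if took >= threshold:
                result.append((took, asked))
    return result


if __name__ == "__main__":
    import sys
    for took, asked in long_waits(unwrap(sys.stdin)):
        print("waited %.6f s (asked for %.6f s)" % (took, asked))
```

Feeding the daemon log to the script on standard input prints one line per select call that waited a second or longer.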
   




