[Spread-users] Need debugging advice

David Avraamides David.Avraamides at SevernRiverCapital.com
Fri Oct 29 11:47:56 EDT 2004


Is there a win32 build of spmonitor? I don't see it in the 3.17.1
distribution. 

-----Original Message-----
From: Jonathan Stanton [mailto:jonathan at cnds.jhu.edu] 
Sent: Friday, October 29, 2004 11:10 AM
To: David Avraamides
Cc: spread-users at lists.spread.org
Subject: Re: [Spread-users] Need debugging advice

Hi,

I would first run the spmonitor program. Select the option to display
status information and select a few of your machines. That will report a
table of information about how the spread daemons are functioning. 

You should see the number of messages and packets that are being
sent. You can also check the "state" and "gstate" of the daemons. If
they are not 1 and 1 then the daemons are in a membership change. If
that lasts for longer than 10-20 seconds (under normal load) then that's
a problem. It will also report how many daemons are in the current
membership, and that should equal the number that you think
are running. If not, then they might have partitioned because of a network
problem.

The other useful information is in the logs that Spread generates. If
you have chosen to log to a file in spread.conf, you can turn on
more DebugFlags in spread.conf and see more detailed information.
For example, adding DATA_LINK will print a record for every packet sent or
received (which will be a lot of log records if the load is at all high,
but might show you an error that is occurring if you just run it for a
brief time).
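
As a rough sketch, and assuming the stock spread.conf syntax, the logging
settings described above might look like the fragment below. The log file
name is only a placeholder, and PRINT and EXIT are what I believe the
default flags to be; DATA_LINK is the one extra flag being added.

```
# spread.conf logging settings (sketch; the file name is a placeholder)
# Keep PRINT and EXIT enabled, and add DATA_LINK for per-packet records.
DebugFlags = { PRINT EXIT DATA_LINK }

# Send log output to a file instead of stdout.
EventLogFile = spread_debug.log
```

Remember to take DATA_LINK back out once you have captured a short sample,
since the log will grow quickly under load.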

Jonathan

On Fri, Oct 29, 2004 at 10:08:51AM -0400, David Avraamides wrote:
> I'm trying to diagnose a problem that just came up on our spread-based
> messaging layer. For months we have had applications running fine in 
> production and yesterday I noticed some problems. It seems I can only 
> see messaging traffic when the client and server are both running on 
> the same box. This was never a problem before and I can't think of 
> anything that changed (no new software, no new config file, etc.). 
> I've written my own application-level "sniff" tool, but it's not 
> helpful since it's not seeing any cross-machine traffic. I was 
> wondering if there are any spread-level sniffing/debugging tools that 
> could help me understand what might be wrong.
> 
> Thanks,
> -Dave
> 
> --
> 
> The relevant part of the config file I use is:
> 
> Spread_Segment  10.10.1.255:4803 {
>         ct-srvwebin-01
>         ct-srvmon-01
>         ct-srvapp-06
>         ct-devbuild-01
> }
> 
> Spread_Segment  10.10.2.255:4803 {
>         ct-dev-01
>         ct-dev-02
>         ct-dev-04
> }
> 
> And here is the log when I start up a daemon:
> 
> ip_init: using file: spread.access_ip
> Conf_init: using file: spread.conf
> Successfully configured Segment 0 [10.10.1.255:4803] with 4 procs:
>               ct-srvwebin-01: 10.10.1.28
>                 ct-srvmon-01: 10.10.1.37
>                 ct-srvapp-06: 10.10.1.117
>               ct-devbuild-01: 10.10.1.110
> Successfully configured Segment 1 [10.10.2.255:4803] with 3 procs:
>                    ct-dev-01: 10.10.2.20
>                    ct-dev-02: 10.10.2.41
>                    ct-dev-04: 10.10.2.50
> Finished configuration file.
> Conf_init: My name: ct-dev-01, id: 10.10.2.20, port: 4803
> Membership id is ( 168427804, 1099058653)
> --------------------
> Configuration at ct-dev-01 is:
> Num Segments 2
>         4       10.10.1.255       4803
>                 ct-srvwebin-01          10.10.1.28
>                 ct-srvmon-01            10.10.1.37
>                 ct-srvapp-06            10.10.1.117
>                 ct-devbuild-01          10.10.1.110
>         1       10.10.2.255       4803
>                 ct-dev-01               10.10.2.20
> ====================
> ++++++++++++++++++++++
> Num of groups: 3
> [1] group data with 4 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r7467-132#ct-srvwebin-01
>         [3] #r8958-1920#ct-srvmon-01
>         [4] #r9140-144#ct-srvwebin-01
> ----------------------
> [2] group mail with 3 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r8958-1920#ct-srvmon-01
>         [3] #r9140-144#ct-srvwebin-01
> ----------------------
> [3] group xbtest with 3 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r8958-1920#ct-srvmon-01
>         [3] #r9140-144#ct-srvwebin-01
> ----------------------
> 
> 

--
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------




