[Spread-users] Need debugging advice

Jonathan Stanton jonathan at cnds.jhu.edu
Fri Oct 29 11:09:57 EDT 2004


Hi,

I would first run the spmonitor program. Select the option to display 
status information and select a few of your machines. That will report a 
table of information about how the spread daemons are functioning. 

You should see the number of messages and packets that are being 
sent. You can also check the "state" and "gstate" of the daemons. If 
they are not 1 and 1 then the daemons are in a membership change. If 
that lasts for longer than 10-20 seconds (under normal load) then that's 
a problem. It will also report how many daemons are in the current 
membership, and that should equal the number that you think are 
running. If not, then they might have partitioned because of a 
network problem.
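
The checks above can be written down as a small sketch. This is only an 
illustration: the DaemonStatus record and the diagnose function are 
hypothetical names, the values are assumed to be copied by hand from the 
spmonitor status table (nothing here parses spmonitor output), and the 
20-second limit is just the upper end of the 10-20 second rule of thumb.

```python
from dataclasses import dataclass


@dataclass
class DaemonStatus:
    """Hypothetical record of values read by hand from spmonitor."""
    name: str
    state: int
    gstate: int
    members: int                     # daemons in the current membership
    seconds_unsettled: float = 0.0   # time state/gstate have not been 1 and 1


def diagnose(status, expected_members, max_unsettled=20.0):
    """Return a list of human-readable problems for one daemon."""
    problems = []
    settled = status.state == 1 and status.gstate == 1
    # A membership change is normal for a short time, a problem if it drags on.
    if not settled and status.seconds_unsettled > max_unsettled:
        problems.append(
            f"{status.name}: membership change has lasted "
            f"{status.seconds_unsettled:.0f}s"
        )
    # Fewer (or more) daemons than expected suggests a partition.
    if status.members != expected_members:
        problems.append(
            f"{status.name}: possible partition, sees {status.members} "
            f"of {expected_members} daemons"
        )
    return problems


if __name__ == "__main__":
    # Made-up readings, not real spmonitor output.
    reading = DaemonStatus("ct-dev-01", state=1, gstate=1, members=5)
    for line in diagnose(reading, expected_members=7):
        print(line)
```

Running the same check on a daemon from each segment should make it 
clear whether the two sides disagree about the membership.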

The other useful information is in the logs that Spread generates. If 
you have chosen to log to a file in spread.conf, you can turn on 
more DebugFlags in spread.conf and see more detailed information. 
For example, adding DATA_LINK will print a record for every packet sent 
or received (which will be a lot of log records if the load is at all 
high, but might show you an error that is occurring if you just run it 
for a brief time).
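
As a rough sketch, the relevant spread.conf lines might look like the 
following. This assumes the syntax used in the sample spread.conf that 
ships with Spread; the log file name is a placeholder, and PRINT and EXIT 
are assumed to be the flags already enabled.

```
# Send log output to a file (placeholder name)
EventLogFile = spread.log

# Existing flags plus DATA_LINK for per-packet records
DebugFlags = { PRINT EXIT DATA_LINK }
```

Remember to remove DATA_LINK again after the short test run, since the 
log grows quickly under load.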

Jonathan

On Fri, Oct 29, 2004 at 10:08:51AM -0400, David Avraamides wrote:
> I'm trying to diagnose a problem that just came up on our spread-based
> messaging layer. For months we have had applications running fine in
> production and yesterday I noticed some problems. It seems I can only
> see messaging traffic when the client and server are both running on the
> same box. This was never a problem before and I can't think of anything
> that changed (no new software, no new config file, etc.). I've written
> my own application-level "sniff" tool, but it's not helpful since it's not
> seeing any cross-machine traffic. I was wondering if there are any
> spread-level sniffing/debugging tools that could help me understand what
> might be wrong.
> 
> Thanks,
> -Dave
> 
> --
> 
> The relevant part of the config file I use is:
> 
> Spread_Segment  10.10.1.255:4803 {
>         ct-srvwebin-01
>         ct-srvmon-01
>         ct-srvapp-06
>         ct-devbuild-01
> }
> 
> Spread_Segment  10.10.2.255:4803 {
>         ct-dev-01
>         ct-dev-02
>         ct-dev-04
> }
> 
> And here is the log when I start up a daemon:
> 
> ip_init: using file: spread.access_ip
> Conf_init: using file: spread.conf
> Successfully configured Segment 0 [10.10.1.255:4803] with 4 procs:
>               ct-srvwebin-01: 10.10.1.28
>                 ct-srvmon-01: 10.10.1.37
>                 ct-srvapp-06: 10.10.1.117
>               ct-devbuild-01: 10.10.1.110
> Successfully configured Segment 1 [10.10.2.255:4803] with 3 procs:
>                    ct-dev-01: 10.10.2.20
>                    ct-dev-02: 10.10.2.41
>                    ct-dev-04: 10.10.2.50
> Finished configuration file.
> Conf_init: My name: ct-dev-01, id: 10.10.2.20, port: 4803
> Membership id is ( 168427804, 1099058653)
> --------------------
> Configuration at ct-dev-01 is:
> Num Segments 2
>         4       10.10.1.255       4803
>                 ct-srvwebin-01          10.10.1.28
>                 ct-srvmon-01            10.10.1.37
>                 ct-srvapp-06            10.10.1.117
>                 ct-devbuild-01          10.10.1.110
>         1       10.10.2.255       4803
>                 ct-dev-01               10.10.2.20
> ====================
> ++++++++++++++++++++++
> Num of groups: 3
> [1] group data with 4 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r7467-132#ct-srvwebin-01
>         [3] #r8958-1920#ct-srvmon-01
>         [4] #r9140-144#ct-srvwebin-01
> ----------------------
> [2] group mail with 3 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r8958-1920#ct-srvmon-01
>         [3] #r9140-144#ct-srvwebin-01
> ----------------------
> [3] group xbtest with 3 members:
>         [1] #r5694-216#ct-devbuild-01
>         [2] #r8958-1920#ct-srvmon-01
>         [3] #r9140-144#ct-srvwebin-01
> ----------------------
> 
> 

-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------



