[Spread-users] Retransmission and Connection Problems
Yair Amir
yairamir at cnds.jhu.edu
Thu Oct 2 18:46:51 EDT 2003
Hi Jeremy,
It seems that your broadcast address does not work correctly on all of
the machines. That is why I am sure it works with any two machines but
not with three or more.
You could use a multicast address, such as 225.10.1.4, instead of the
broadcast address 10.1.255.255 (which does not work for you). That
would probably work. Otherwise, make sure all of the machines are
configured with the correct broadcast address.
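For illustration, the segment from your spread.conf might look like this
with the multicast address swapped in. This is only a sketch: it assumes
the port and the daemon list stay exactly as you posted them, and the same
change would have to go into spread.conf on every machine.

```
# Sketch only: multicast address in place of the 10.1.255.255 broadcast.
# Port and daemon entries are copied unchanged from the posted config.
Spread_Segment 225.10.1.4:3333 {
    a.monitor.peak.org      10.1.255.253
    a.www.peak.org          10.1.4.1
    b.www.peak.org          10.1.4.2
    sysadmin01.peak.org     10.1.255.254
}
```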
There were several similar cases that were discussed on the mailing
list in the past.
:) Yair.
On Thursday, October 02, 2003 6:29 PM
Jeremy McDermond mcdermj at peak.org wrote:
Jeremy> I feel kinda like an idiot, but I've been fighting these problems for a
Jeremy> few days now, and feel like I'm at the end of my rope. I am attempting
Jeremy> to use spread for apache logging using mod_log_spread and spreadlogd.
Jeremy> I have 3 machines in the spread ring, two webservers, and a log server.
Jeremy> My spread.conf on all 3 machines looks like:
Jeremy> DebugFlags = { PRINT EXIT }
Jeremy> EventLogFile = /var/log/spread.log
Jeremy> EventTimeStamp
Jeremy> DangerousMonitor = false
Jeremy> SocketPortReuse = AUTO
Jeremy> RuntimeDir = /var/run/spread
Jeremy> DaemonUser = spread
Jeremy> DaemonGroup = spread
Jeremy> RequiredAuthMethods = "NULL"
Jeremy> AllowedAuthMethods = "NULL"
Jeremy> #Set the current access control policy.
Jeremy> # This is only needed if you want to establish a customized policy.
Jeremy> # The default policy is to allow any actions by authenticated clients.
Jeremy> AccessControlPolicy = "PERMIT"
Jeremy> Spread_Segment 10.1.255.255:3333 {
Jeremy> a.monitor.peak.org 10.1.255.253
Jeremy> a.www.peak.org 10.1.4.1
Jeremy> b.www.peak.org 10.1.4.2
Jeremy> sysadmin01.peak.org 10.1.255.254
Jeremy> }
Jeremy> There's an extra machine in there that we use for test log catching
Jeremy> every once in a while, but I'm not putting it in the ring right now.
Jeremy> The first issue I have is that the apache servers seem to not be able
Jeremy> to contact their local spread daemon on a UNIX socket. I get
Jeremy> "connection refused" errors in the error log:
Jeremy> [Thu Oct 2 22:16:53 2003] [error] (61)Connection refused: Could not
Jeremy> connect to spread 3333 with private_name ap15031. Error -2
Jeremy> The socket (/tmp/3333) seems to be there and available, and I can
Jeremy> connect with 'spuser -r -s 3333' to the ring.
Jeremy> The second issue is that, on the off chance I can get apache to
Jeremy> connect, I get tons of retransmissions on the ring as measured using
Jeremy> spmonitor. It also seems to impact the timely delivery of the log
Jeremy> records to the log writers. Even with a completely idle ring, I'm
Jeremy> seeing retransmissions of 601, when there's 258 sent packets:
Jeremy> ============================
Jeremy> Status at b.www.peak.org V 3.17. 1 (state 1, gstate 1) after 830
Jeremy> seconds :
Jeremy> Membership : 3 procs in 1 segments, leader is 167903229
Jeremy> rounds : 327621 tok_hurry : 714 memb change: 3
Jeremy> sent pack: 260 recv pack : 346 retrans : 609
Jeremy> u retrans: 268 s retrans : 341 b retrans : 0
Jeremy> My_aru : 612 Aru : 612 Highest seq: 612
Jeremy> Sessions : 68 Groups : 0 Window : 60
Jeremy> Deliver M: 616 Deliver Pk: 621 Pers Window: 15
Jeremy> Delta Mes: 7 Delta Pack: 0 Delta sec : 114
Jeremy> ==================================
Jeremy> I'm kinda at the end of my rope, I've checked the archives of this
Jeremy> list, and haven't found anything really useful. I'm running FreeBSD
Jeremy> 5.1R on Intel Xeon procs. Anyone have any ideas?
Jeremy> --
Jeremy> Jeremy C. McDermond
Jeremy> mcdermj at peak.org
Jeremy> Lead Engineer
Jeremy> Peak Internet, LLC
Jeremy> (541) 738-4921
Jeremy> _______________________________________________
Jeremy> Spread-users mailing list
Jeremy> Spread-users at lists.spread.org
Jeremy> http://lists.spread.org/mailman/listinfo/spread-users