[Spread-users] Spread contractors in SF Bay area?

Thu Jan 10 13:40:37 EST 2002

On Wednesday, January 9, 2002, at 06:17 PM, John David Duncan wrote:

>  On Wednesday, January 9, 2002, at 05:15 PM, Tom Mornini wrote:
>
>> We've been quite careful in our implementation, and I believe we're 
>> using Spread
>> as intended. We have written a Perl OO wrapper for the supplied Perl 
>> module and
>> it could very well be doing something that is causing the problems.
>
> I am also in San Francisco having a very similar problem.  We have a 
> small custom-written perl module, and run spread for logging, and have 
> an occasional problem with the system hanging (more like every 4 or 5 
> days than every 10, unfortunately).  When this happens it becomes 
> impossible to connect using spuser, and though spmonitor will connect, 
> it will not report any message activity.  After one server is 
> restarted, it appears that all of the "held" spread messages from the 
> other servers do get delivered to spreadlogd.
>
> I've attached the perl module for comparison, but I'm pretty convinced 
> it's not a part of the problem.
>
> All of this is on Spread 3.16.1 and FreeBSD 4.4

Hey, I think I've solved our problems! Must be a West Coast thing indeed!

The funny thing in our case is that I had solved this once before!

We have two ways that we log with Spread:

1) STDIN to Spread for logging Apache access and error logs via a 
customlog pipe
2) Our own application logging system

When this problem first started, I scratched my head and realized that 
you can't just open a connection and write to it forever! Spread sends 
special membership and perhaps some other messages (been a while since I 
worked on the Spread details) to each and every mailbox each time 
someone joins or leaves a shared mailbox.

If those messages aren't read on a regular basis then surely a buffer is 
eventually going to fill up and cause some grief.
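
As a rough illustration of that failure mode, here is a toy Python model. It is not Spread's API or its real internals: the class name, the per-connection capacity of 4, and the message strings are all invented for the sketch. It only shows how a connection that writes but never reads ends up with a full inbox once enough membership notices have been queued for it.

```python
from collections import deque


class ToyDaemon:
    """Toy stand-in for a messaging daemon; not Spread's real behavior."""

    def __init__(self, inbox_capacity=4):
        self.inbox_capacity = inbox_capacity  # hypothetical per-connection limit
        self.inboxes = {}

    def connect(self, name):
        self.inboxes[name] = deque()

    def membership_change(self, who):
        """Queue a notice for every connection; return names whose inbox is full."""
        stalled = []
        for name, inbox in self.inboxes.items():
            if len(inbox) >= self.inbox_capacity:
                stalled.append(name)
            else:
                inbox.append(f"membership: {who}")
        return stalled


daemon = ToyDaemon(inbox_capacity=4)
daemon.connect("logger")  # a client that only ever writes, never reads

stalled_at = None
for i in range(10):
    if daemon.membership_change(f"client{i}"):
        stalled_at = i  # number of notices queued before one could not be
        break

print(f"write-only inbox was full after {stalled_at} membership changes")
```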

So, last night it occurred to me that I had realized this and corrected 
it in #1 above, but somehow had completely missed the fact that #2 does 
exactly the same thing and has the same problem!

I looked at your code and you do the same thing.

So, here's what I do: I set the timeout value to zero, and do a receive 
for each message I send. I don't DO anything with the received messages, 
as I don't care about them, but just receiving them should tidy things 
up for the Spread daemon.
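
The send-then-receive pattern described above can be sketched as follows. This is a toy Python model under stated assumptions, not the Spread client library: `ToyMailbox`, its `send` and `receive` methods, and the group name `weblog` are hypothetical, and only the zero-timeout (non-blocking) receive is modeled.

```python
from collections import deque


class ToyMailbox:
    """Toy stand-in for a client connection; the real Spread API differs."""

    def __init__(self):
        self.pending = deque()  # messages waiting for this client to read
        self.sent = []          # messages this client has sent

    def send(self, group, text):
        self.sent.append((group, text))

    def receive(self, timeout=0):
        # Only the zero-timeout case is modeled: return a waiting message,
        # or None immediately when nothing is queued.
        return self.pending.popleft() if self.pending else None


def log_line(mbox, group, text):
    """Send one message, then do one non-blocking receive and discard it."""
    mbox.send(group, text)
    mbox.receive(timeout=0)  # result ignored on purpose


mbox = ToyMailbox()
mbox.pending.extend(["membership: a joined", "membership: b left"])

for line in ["GET /", "GET /about", "GET /contact"]:
    log_line(mbox, "weblog", line)

print(len(mbox.sent), "sent,", len(mbox.pending), "still pending")
```

One receive per send keeps the inbox empty only while notices arrive no faster than messages are sent; looping on `receive` until it returns nothing would be a more aggressive variant of the same idea.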

I've applied this to our system, and we'll know in a couple of days if 
this truly solves the problem 100%. Here's hoping!

--
-- Tom Mornini
-- eWingz Systems, Inc.
--
-- ICQ: 113526784, AOL: tmornini, Yahoo: tmornini, MSN: tmornini