[Spread-users] problem with spread/mod_log_spread/spreadlogd

Tue Sep 6 17:30:26 EDT 2005

Theo Schlossnagle wrote:

> John Schultz wrote:
>
>> Well -11 is CONNECTION_CLOSED, which just means the connection 
>> between the client and daemon has been shut down.  The most common 
>> reason for this is a flow control problem where msgs are being 
>> injected into the system faster than readers can read them out.  At 
>> some point Spread will kick the connection so that it doesn't run out 
>> of memory and kill the daemon, thus losing all of its connections.
>>
>> I'm not familiar with mod_log_spread and I don't know if it performs 
>> any kind of flow control.  If it doesn't and you are logging too fast 
>> this could cause your clients to be repeatedly disconnected (assuming 
>> they reconnect).
>
>
> mod_log_spread does no flow control whatsoever.  spreadlogd will 
> read messages from Spread as fast as it can write to disk.  So, the 
> typical reason for this sort of behaviour is that you try to journal 
> the logs from your entire cluster on an IDE system or some other slow 
> storage facility.
>
> The lack of flow control was a design decision in mod_log_spread.  In 
> order to have time-ordered, real-time logs, you either must have no 
> flow control or you must allow the publishers to block.  In m_l_s, it 
> was decided that under no circumstances should publishers block (as 
> that would mean a slowdown in serving web pages).  If that approach 
> doesn't "jive" with your idea of logging in a web cluster, then m_l_s 
> isn't for you.
>
> (The *you* above, is of course not John, but whomever is running m_l_s)
>
Thank you for your answer!

This is exactly what I thought too, but the problem is that the system 
that is writing the logs to disk is a SCSI-320 system with hardware 
RAID-1 :)

It's only writing about 100 MB of logs per hour, which doesn't seem like 
much to me.
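
To put that figure in perspective, here is a rough back-of-the-envelope
conversion from hourly volume to an average write rate. It assumes
1 MB = 1024 KB, and the helper name kb_per_second is just something I made
up for this mail. Note that it only gives an average and says nothing
about bursts.

```python
# Convert an hourly log volume into an average sustained write rate.
# Assumption: 1 MB = 1024 KB. This is an average only; bursts are not modelled.

def kb_per_second(mb_per_hour: float) -> float:
    """Average write rate in KB/s for a volume given in MB per hour."""
    return mb_per_hour * 1024 / 3600


if __name__ == "__main__":
    # ~100 MB/hour, the volume on the log server that gets disconnected
    print(f"{kb_per_second(100):.1f} KB/s")
```

That works out to roughly 28 KB/s on average, which is far below what the
disks should be able to sustain.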

Load is low on that server since it is a dedicated log server, but I/O 
might be a problem (since that doesn't always show up in the load average).
I'll hook up a RAID-10 system with 15k rpm disks tomorrow and test further.
Another reason I find this conclusion hard to believe is that we have 
another cluster that performs great logging to a SATA RAID array, writing 
almost 2 GB of logs every hour, but that is a different setup.
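
One thing I can imagine (just a guess, not a diagnosis) is that the average
rate matters less than bursts. Below is a toy model of the behaviour John
describes, where a reader is dropped once its backlog grows past some
limit. It is not Spread's actual code: the function simulate, the tick
granularity and the max_backlog limit are all made up for illustration.

```python
# Toy model: a reader is disconnected once its unread backlog exceeds a limit.
# All names and numbers here are hypothetical, not taken from Spread itself.

def simulate(arrivals, drain_per_tick, max_backlog):
    """Return the tick at which the reader is dropped, or None if it survives.

    arrivals       -- messages injected during each tick (list of ints)
    drain_per_tick -- messages the reader can consume per tick
    max_backlog    -- assumed per-connection buffer limit
    """
    backlog = 0
    for tick, injected in enumerate(arrivals):
        backlog += injected
        backlog = max(0, backlog - drain_per_tick)  # reader drains what it can
        if backlog > max_backlog:
            return tick  # buffer limit exceeded: the connection is closed
    return None


steady = [5] * 20          # 100 messages spread evenly over 20 ticks
bursty = [0] * 19 + [100]  # the same 100 messages arriving in one burst

if __name__ == "__main__":
    print(simulate(steady, drain_per_tick=10, max_backlog=50))
    print(simulate(bursty, drain_per_tick=10, max_backlog=50))
```

In this sketch the steady stream never gets the reader dropped, while the
bursty stream with the same total volume does, even though the reader is
on average twice as fast as the writers.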

Kind regards and please give your comments since this isn't solved yet,

Jer