[Spread-users] problem with spread/mod_log_spread/spreadlogd

John Schultz jschultz at commedia.cnds.jhu.edu
Tue Sep 6 17:43:16 EDT 2005


FYI, Spread by default will buffer up to 1,000 msgs for a connection 
before it will kick it.  The thing is that you can push Spread to pass 
over 10,000 (small) msgs per second in a LAN.  So if you have some kind of 
extreme msg spike and a reader can't keep up in that time frame, then I 
believe they can be disconnected in less than a second!
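
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes constant arrival and drain rates, which a real spike will not have, and the function name seconds_until_kick is made up for illustration; it is not part of Spread.

```c
#include <assert.h>

/* Default per-connection buffer limit mentioned in this message. */
#define MAX_SESSION_MESSAGES 1000

/*
 * Seconds until a reader's backlog reaches `limit` buffered messages,
 * given a constant arrival rate and a constant drain rate (both in
 * messages per second). Returns -1.0 if the reader keeps up, i.e. the
 * backlog never grows. Hypothetical helper, not a Spread API.
 */
double seconds_until_kick(long limit, double arrive_per_sec, double drain_per_sec)
{
    assert(limit > 0 && arrive_per_sec >= 0.0 && drain_per_sec >= 0.0);

    double growth = arrive_per_sec - drain_per_sec; /* net backlog growth */
    if (growth <= 0.0)
        return -1.0;

    return (double)limit / growth;
}
```

With the default limit, a stalled reader facing 10,000 msgs/sec gets seconds_until_kick(MAX_SESSION_MESSAGES, 10000.0, 0.0), which is about 0.1 seconds. A reader that drains 9,000 msgs/sec against the same spike lasts about one second.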

I believe you can raise this limit by modifying spread_params.h and rebuilding the daemon:

#define MAX_SESSION_MESSAGES 1000

Try raising it by an order of magnitude:

#define MAX_SESSION_MESSAGES 10000

Now the potentially negative side effect of this is that your Spread 
daemon will probably eat more memory on your system.  So you need to 
ensure there is enough memory for it to grow, because you don't want the 
daemon to die due to low memory.
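
For a rough feel of the memory cost, here is a back-of-the-envelope sketch. It counts only message payload and ignores whatever per-message overhead the daemon adds, so treat the result as a floor rather than a real figure. The connection count and average message size below are made-up inputs, and worst_case_buffer_bytes is a hypothetical helper, not something from Spread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case payload bytes buffered if every session fills its queue:
 *   sessions * max_session_messages * avg_msg_bytes
 * Hypothetical helper for estimation only; daemon overhead is not counted.
 */
uint64_t worst_case_buffer_bytes(uint64_t sessions,
                                 uint64_t max_session_messages,
                                 uint64_t avg_msg_bytes)
{
    uint64_t msgs = sessions * max_session_messages;
    /* Guard against 64-bit overflow for absurd inputs. */
    assert(max_session_messages == 0 || msgs / max_session_messages == sessions);

    uint64_t bytes = msgs * avg_msg_bytes;
    assert(avg_msg_bytes == 0 || bytes / avg_msg_bytes == msgs);

    return bytes;
}
```

Assuming 10 connections and 500-byte log lines, the default limit of 1000 gives 5,000,000 bytes in the worst case, and a limit of 10000 gives 50,000,000 bytes.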

Good luck!

---
John Schultz
Spread Concepts
Phn: 443 838 2200

On Tue, 6 Sep 2005, Jeroen wrote:

> Theo Schlossnagle wrote:
>
>> John Schultz wrote:
>> 
>>> Well -11 is CONNECTION_CLOSED, which just means the connection between the 
>>> client and daemon has been shut down.  The most common reason for this is 
>>> a flow control problem where msgs are being injected into the system 
>>> faster than readers can read them out.  At some point Spread will kick the 
>>> connection so that it doesn't run out of memory and kill the daemon, thus 
>>> losing all of its connections.
>>> 
>>> I'm not familiar with mod_log_spread and I don't know if it performs any 
>>> kind of flow control.  If it doesn't and you are logging too fast this 
>>> could cause your clients to be repeatedly disconnected (assuming they 
>>> reconnect).
>> 
>> 
>> mod_log_spread does no flow control whatsoever.  spreadlogd will read 
>> messages from Spread as fast as it can write to disk.  So, the typical 
>> reason for this sort of behaviour is that you try to journal the logs from 
>> your entire cluster on an IDE system or some other slow storage facility.
>> 
>> The lack of flow control was a design decision in mod_log_spread.  In order 
>> to have time-ordered, real-time logs, you either must have no flow control 
>> or you must allow the publishers to block.  In m_l_s, it was decided that 
>> under no circumstances should publishers block (as that would mean a 
>> slowdown in serving web pages).  If that approach doesn't "jive" with your 
>> idea of logging in a web cluster, then m_l_s isn't for you.
>> 
>> (The *you* above is, of course, not John, but whoever is running m_l_s)
>> 
> Thank you for your answer!
>
> This is exactly what I thought, but the problem is that the system that 
> is writing the logs to disk is a SCSI-320 system with hardware RAID-1 :)
>
> It's just writing about 100 MB of logs per hour, which doesn't look like 
> that much to me.
>
> Load is low on that server since it is a dedicated logserver, but I/O might 
> be a problem (since that isn't always shown in pure load).
> I'll hook up a RAID-10 system with 15k rpm disks tomorrow and test further.
> Another reason I doubt this conclusion is that we have another cluster 
> that performs great logging to a SATA RAID array, writing almost 2 GB 
> of logs every hour, but that is a different setup.
>
> Kind regards and please give your comments since this isn't solved yet,
>
> Jer
>
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
>
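
The trade-off described in the quoted thread (publishers never block, so a reader that falls too far behind gets dropped) can be sketched as a toy model. This is not Spread's or mod_log_spread's actual code: the struct, the function names, and the exact point at which the reader is kicked are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SESSION_LIMIT 1000 /* mirrors the default MAX_SESSION_MESSAGES */

/* Toy model of one reader's session inside the daemon. */
struct session {
    long backlog;   /* messages buffered for this reader */
    bool connected; /* false once the session has been kicked */
};

/*
 * Publisher side: never blocks. If the backlog is already at the limit,
 * the reader is dropped instead of slowing the publisher down.
 * Returns false if the session is (or has just become) closed.
 */
bool publish(struct session *s)
{
    assert(s != NULL);
    if (!s->connected)
        return false;
    if (s->backlog >= SESSION_LIMIT) {
        s->connected = false; /* reader would see CONNECTION_CLOSED (-11) */
        s->backlog = 0;       /* buffered messages are discarded */
        return false;
    }
    s->backlog++;
    return true;
}

/* Reader side: drain up to n buffered messages; returns how many were read. */
long drain(struct session *s, long n)
{
    assert(s != NULL && n >= 0);
    if (!s->connected)
        return 0;
    long taken = n < s->backlog ? n : s->backlog;
    s->backlog -= taken;
    return taken;
}
```

In this model a reader that never calls drain() is kicked on the publish() call after the 1000th buffered message, while a reader that drains one message per publish stays connected indefinitely. The alternative design, blocking inside publish() until space frees up, is what m_l_s deliberately avoids.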



