[Spread-users] write(): java.net.SocketException

Mon Nov 5 18:08:29 EST 2012

Ok, that does sound like a much more obscure case then.

AFAIK, if a user never joins any groups, then Spread doesn't send messages to such a client, so you shouldn't get disconnected for not reading.  It might be good to verify this though by making such clients read whenever there is read activity, if they don't already.

AFAIK, Spread doesn't disconnect senders for being "too bursty" either.

It is possible that there is a bug in the protocol code, in particular the Java implementation, between the client and server.  It is possible that the server disconnects a client for what it perceives as deviation from the expected client-server protocol.

If you haven't already, then I recommend you enable the SESSION flags in your daemons' configuration's DebugFlags and try to find out exactly why the server is disconnecting these clients.  If you grep the daemons' log files for "Sess_kill", then that should show you whenever the servers disconnect a client.  From there you just need to find the kills that match up to your strange case.  You may need to augment the Alarm statements inside the daemon code to understand exactly why the server is disconnecting the client.
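
As a rough illustration of the log search described above, here is a minimal Java sketch that does the equivalent of grepping a daemon log for "Sess_kill".  The class name and the log file path argument are assumptions made for this example; the layout of the daemon's log lines is not assumed, only that relevant lines contain the string "Sess_kill".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: keeps only log lines that mention Sess_kill. */
class SessKillScan {

    /** Returns every line containing "Sess_kill", in original order. */
    static List<String> findKills(Stream<String> lines) {
        return lines.filter(line -> line.contains("Sess_kill"))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SessKillScan.java <daemon-log-file>");
            return;
        }
        // Stream the file so that large daemon logs are not loaded at once.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            findKills(lines).forEach(System.out::println);
        }
    }
}
```

Matching the printed lines against the times at which the client saw the write() error should narrow down which kill corresponds to the strange case.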

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Nov 5, 2012, at 5:29 PM, Shawn Bradford wrote:

John,

We have encountered the errors where Spread disconnects consumers that are not fast enough.  We have written a fair amount of code to handle this error (including flow control) and recover gracefully.  In this particular instance the client is only producing content to be sent on Spread; it does not join a group and does not consume any messages.  Does Spread ever disconnect a producer for being too bursty?

Your discussion on flow control is quite good and has several concepts that we have implemented.

We are surprised to see a producer being disconnected from Spread.  The Spread daemon and clients are all running locally (localhost), so the socket connection should only be terminated by an internal process.

Thanks in advance,
##Shawn

On Mon, Nov 5, 2012 at 1:16 PM, John Schultz <jschultz at spreadconcepts.com> wrote:
A TCP connection reset error typically indicates that the other end of the peer connection has already been closed.  In your case, the Spread daemon has likely already closed the client connection on which you are trying to send.
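
Because a reset means the old connection is gone, a sender has to reconnect before it can send again.  The following is a minimal sketch of that recovery pattern, not the Spread API: the `Link` interface and the `ResilientSender` class are hypothetical stand-ins for whatever connection wrapper the application uses.

```java
import java.io.IOException;

/** Hypothetical stand-in for a messaging connection; not the Spread API. */
interface Link {
    void send(String payload) throws IOException;
    void reconnect() throws IOException;
}

/** Retries a failed send after re-establishing the connection. */
class ResilientSender {
    private final Link link;
    private final int maxAttempts;

    ResilientSender(Link link, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.link = link;
        this.maxAttempts = maxAttempts;
    }

    /** Sends the payload and returns the number of attempts it took. */
    int send(String payload) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                link.send(payload);
                return attempt;
            } catch (IOException e) {
                // The peer has closed the connection; the old socket is unusable.
                last = e;
                if (attempt < maxAttempts) {
                    link.reconnect();
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

Retrying only hides the symptom; if the daemon keeps closing the connection, the cause still has to be found on the daemon side.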

In Spread, this happens most often when a client isn't reading fast enough from the Server.  After a while, the message delivery queue on the daemon for that client gets too large and it kills the client connection for using too much of its memory resources.

This typically indicates that (a) your sending applications are too bursty, (b) they persistently send to the group too fast, or (c) your receivers are occasionally too slow / unresponsive.  The fact that this problem is intermittent hints that the problem is more likely (a) and/or (c).

To avoid this issue you need to have application level flow control such that your senders can't overwhelm your receivers (if that is compatible with your application).

Here's a primer I wrote on application level flow control a long time ago:

http://lists.spread.org/pipermail/spread-users/2002-March/000655.html
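
One common way to implement application-level flow control is a credit window: a sender may have only a fixed number of unacknowledged messages outstanding, and receivers return credits as they consume.  The sketch below illustrates that general idea only; it is not necessarily the scheme described in the primer, and the `SendWindow` class and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Credit window: at most `window` messages may be unacknowledged at once. */
class SendWindow {
    private final Semaphore credits;

    SendWindow(int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be >= 1");
        }
        this.credits = new Semaphore(window);
    }

    /**
     * Takes one credit before a send. Returns false if no credit became
     * available within the timeout, i.e. the receivers are lagging.
     */
    boolean acquire(long timeout, TimeUnit unit) throws InterruptedException {
        return credits.tryAcquire(timeout, unit);
    }

    /** Returns n credits when a receiver acknowledges n messages. */
    void acknowledge(int n) {
        // Callers must not acknowledge more than was sent, or the window grows.
        credits.release(n);
    }

    /** Number of messages that can currently be sent without blocking. */
    int available() {
        return credits.availablePermits();
    }
}
```

A sender would call `acquire` before each multicast and back off when it returns false; receivers would periodically send a small acknowledgement message that the sender translates into `acknowledge(n)`.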

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Nov 5, 2012, at 3:03 PM, Shawn Bradford wrote:

Hello,

We are currently using spread and have found this error occurring quite frequently. Unfortunately there is little information on write() errors to be found on the net (many more read() errors).

spread.SpreadException: write(): java.net.SocketException: Connection reset

Would someone be able to describe what would be a potential issue causing this?  I am looking for some guidance as to the source of the error (maybe from a developer) to assist in debugging the error.

We have tried to write several test apps to replicate the bug but have been unsuccessful.  Our system is quite large with many moving parts and it is unclear what sequence of events is causing the errors.

We are using Spread 4.1 on 64-bit CentOS 5.5.

Thanks in advance,
##Shawn

--
*         Director Software | Mojix Inc.
  phone : +1.562.221.3474
  email : shawn.bradford at mojix.com
  web : www.mojix.com

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users
