[Spread-users] write(): java.net.SocketException

Yair Amir yairamir at cs.jhu.edu
Mon Nov 12 07:12:56 EST 2012


Dear Mel,

It seems to me that this is an artifact of sockets.

If the connection is broken (for example, because the daemon has restarted), a
write-only process should find out about it only when it next tries to send.
In that case, the return code of the next SP_multicast should indicate the
appropriate error. If it does not, we need to check the library code. I am
pretty sure the error is reported correctly, but there may be issues on some
operating systems.
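
A minimal sketch of that pattern: treat each send as the point where a broken
connection is discovered, and reconnect when the send fails. The Link interface,
the ReconnectingSender class and the fake link in main() are hypothetical
stand-ins, not part of the Spread API. In a real client the send would be the
library's multicast call, and the failure would surface as an exception (Java)
or an error return code (C).

```java
import java.io.IOException;
import java.util.function.Supplier;

public class ReconnectingSender {
    /** Hypothetical stand-in for a daemon connection; not the Spread API. */
    public interface Link {
        void send(String payload) throws IOException;
    }

    private final Supplier<Link> connector; // opens a fresh connection
    private Link link;
    private int reconnects = 0;

    public ReconnectingSender(Supplier<Link> connector) {
        this.connector = connector;
        this.link = connector.get();
    }

    /** Sends once; on failure, reconnects and retries a single time. */
    public void send(String payload) throws IOException {
        try {
            link.send(payload);
        } catch (IOException broken) {
            // The failed send is the first sign that the connection is gone.
            link = connector.get();
            reconnects++;
            link.send(payload); // a second failure propagates to the caller
        }
    }

    public int reconnects() {
        return reconnects;
    }

    public static void main(String[] args) throws IOException {
        // Fake link: the first connection fails on its second send,
        // imitating a daemon restart between two messages.
        int[] opened = {0};
        Supplier<Link> connector = () -> {
            final int id = ++opened[0];
            final int[] sent = {0};
            return payload -> {
                if (id == 1 && ++sent[0] == 2) {
                    throw new IOException("Connection reset");
                }
            };
        };
        ReconnectingSender sender = new ReconnectingSender(connector);
        sender.send("a");
        sender.send("b"); // fails once, reconnects, retries
        sender.send("c");
        System.out.println("reconnects=" + sender.reconnects());
    }
}
```

Retrying only once keeps the sketch short; a real client would probably want a
backoff and a limit on reconnect attempts.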

Cheers,

	:) Yair.

On 11/12/12 4:19 AM, Melissa Jenkins wrote:
> This is completely true - we used to have write-only clients (ones that didn't join a group).
>
> However, in my experience, they would run into all sorts of problems the moment there was any kind of network issue.  In particular they would never report a drop of the network connection and your messages would simply vanish.  Doing an occasional read resolved this completely and made the write-only clients much more stable.
>
> My guess is there is a difference between the read and write paths and the write path doesn't manage the connection in the same fashion as the read path.
>
> Due to time pressure I never looked at the Spread library code to confirm this as it was easy enough to just read occasionally.  I also didn't spend any time identifying what types of failure caused these problems, though restarting the daemon was a known culprit.  With the reads you can detect a loss of connection reliably and reconnect.
>
> Mel
>
>
> On 12 Nov 2012, at 01:40, Yair Amir <yairamir at cs.jhu.edu> wrote:
>
>> Shawn,
>>
>> I am not sure I understand your e-mail.
>>
>> Messages are not born out of thin air. If your application sends a process messages
>> then yes - that process will receive the messages and will need to read them.
>>
>> However, if in a design of an application, a certain process is never supposed to receive
>> messages, then it will not, and there will not be a need to read (assuming it does not join
>> groups for example).
>>
>> It is correct that each connection to Spread also has a private group that can be used
>> to multicast a message directly to that connection (process). But some process has to send
>> such messages specifically to that private group. Right?
>>
>> You should also know that a process does not need to be a member of a group in order to send
>> to it. Any process can send to any group, regardless of whether it is a member of that group.
>>
>> Cheers,
>>
>> 	:) Yair.
>>
>> On 11/11/12 7:47 PM, Shawn Bradford wrote:
>>>
>>> We have finally got to the bottom of the issue.  Fundamentally, a write-only client does not exist.  Once you are connected to Spread, any client can send a message to any other client directly (non-multicast).  This also explains the proffered remedies:
>>>    - Adding a process to read messages (Melissa)
>>>    - Increasing the message buffer size (Marcelo) would only make the disconnect less frequent.
>>>
>>> If there is a way to configure a client to ignore all incoming messages, I would be interested.  Otherwise our solution is in line with Melissa's: we have added a process to read any messages.
>>>
>>> Many thanks to all for the assistance, it has provided greater insight as well as fixed a nasty bug in our SW.
>>> ##Shawn
>>>
>>>
>>>
>>> On Wed, Nov 7, 2012 at 3:58 PM, Marcelo San-Martin <Marcelo.San-Martin at harmonicinc.com> wrote:
>>>
>>>     Hi,
>>>     I used to have a similar problem. In my case I fixed it by increasing MaxSessionMessages in the configuration file. The default value was 1000; I increased it to 10000 and the problem went away.
>>>
>>>     Cheers,
>>>     Marcelo
>>>
>>>
>>>     -----Original Message-----
>>>     From: spread-users-request at lists.spread.org [mailto:spread-users-request at lists.spread.org]
>>>     Sent: Wednesday, November 07, 2012 2:02 PM
>>>     To: spread-users at lists.spread.org
>>>     Subject: Spread-users Digest, Vol 91, Issue 4
>>>
>>>     Today's Topics:
>>>
>>>         1. Re: write(): java.net.SocketException (Shawn Bradford)
>>>         2. Re: write(): java.net.SocketException (Ed Holyat)
>>>
>>>
>>>     ----------------------------------------------------------------------
>>>
>>>     Message: 1
>>>     Date: Wed, 7 Nov 2012 11:20:23 -0800
>>>     From: Shawn Bradford <shawnb at mojix.com>
>>>     Subject: Re: [Spread-users] write(): java.net.SocketException
>>>     To: Jonathan Stanton <jonathan at spreadconcepts.com>
>>>     Cc: spread-users at lists.spread.org
>>>     Message-ID:
>>>              <CADTONkdQ4GbQc_nD5oTB4jcpJK5uAbWr7WjRaOrShtdp5W4JVw at mail.gmail.com>
>>>     Content-Type: text/plain; charset="iso-8859-1"
>>>
>>>     Here is an update on the status of this issue :
>>>        - We tried adding 1ms delay between transmissions (still fails)
>>>        - We tried upgrading to spread 4.2.0 (still fails)
>>>
>>>     We will try Melissa's suggestion and do some reading.
>>>
>>>     Thanks,
>>>     ##Shawn
>>>
>>>
>>>     On Mon, Nov 5, 2012 at 3:33 PM, Jonathan Stanton <jonathan at spreadconcepts.com> wrote:
>>>
>>>> Hello Shawn,
>>>>
>>>> Since you are using Spread 4.1, this may be a fixed problem. The Spread
>>>> 4.2 release that came out in June has a number of fixes (especially to
>>>> the Java API) which solved a number of deadlock, disconnection and crash
>>>> bugs. If you can try the 4.2 release and see if that resolves the problem,
>>>> or look at the changes to the Java API between 4.1 and 4.2 and merge
>>>> them into the version of the Java library that you use that could help.
>>>>
>>>> I've included the summary release notes below.
>>>>
>>>> Cheers,
>>>>
>>>> Jonathan
>>>>
>>>> The main new features of this release are:
>>>>
>>>> 1) Added Keepalive support to client-server TCP connections. Requires
>>>>    correct operating system values set for keepalives in order to be
>>>>    useful.
>>>> 2) Switch internal code to use MONOTONIC clocks when available and
>>>>    appropriate to remove chance of system clock changes (from the clock
>>>>    being set) from affecting message processing
>>>> 3) Break out events, memory, data_link and alarm code into separate
>>>>    libspread-util package. This package also has a number of
>>>>    improvements in the functionality of those code files which are
>>>>    listed in the internal package release notes.
>>>>
>>>> It also includes a number of important bug fixes. The most significant
>>>> include:
>>>>
>>>> 1) Fix bug with structure size on 64 bit platforms causing crash.
>>>> 2) Fix several deadlock, crashes and race conditions in java Listener code.
>>>> 3) Fix 100 ms timeout in java socket handling code so it does not corrupt
>>>>    messages that take a long time to arrive.
>>>> 4) Fix java disconnect bug that prevented client from reconnecting
>>>>    until restarted.
>>>> 5) Remove cause of slow message delivery when a client is receiving a
>>>>    lot of messages and gets into a badger state.
>>>> 6) Improve help output and error messages in utility programs.
>>>> 7) Fix token hurry bug that caused messages to have a 2 second latency in
>>>>    specific circumstances.
>>>> 8) Fix crash bug when new daemon configuration files are loaded while the
>>>>    system is running.
>>>>
>>>>
>>>>
>>>> -------------------------------------------------------------------------------
>>>> Jonathan Stanton jonathan at spreadconcepts.com
>>>> Spread Group Messaging www.spread.org
>>>> Spread Concepts LLC www.spreadconcepts.com
>>>> -------------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> On Nov 5, 2012, at 3:03 PM, Shawn Bradford wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> We are currently using spread and have found this error occurring
>>>>> quite frequently. Unfortunately there is little information on
>>>>> write() errors to be found on the net (many more read() errors).
>>>>>
>>>>> *spread.SpreadException: write(): java.net.SocketException:
>>>>> Connection reset*
>>>>>
>>>>> Would someone be able to describe what would be a potential issue
>>>>> causing this?  I am looking for some guidance as to the source of
>>>>> the error (maybe from a developer) to assist in debugging the error.
>>>>>
>>>>> We have tried to write several test apps to replicate the bug but
>>>>> have been unsuccessful.  Our system is quite large with many moving
>>>>> parts and it is unclear as to what sequence of events are causing
>>>>> the errors.
>>>>>
>>>>> We are using spread 4.1 on 64 bit centos 5.5.
>>>>>
>>>>> Thanks in advance,
>>>>> ##Shawn
>>>>>
>>>>> *--
>>>>> ------------------------------
>>>>> *  Director Software | Mojix Inc.
>>>>> phone : +1.562.221.3474
>>>>> email : shawn.bradford at mojix.com
>>>>> web : www.mojix.com
>>>>>
>>>>> _______________________________________________
>>>>> Spread-users mailing list
>>>>> Spread-users at lists.spread.org <mailto:Spread-users at lists.spread.org>
>>>>> http://lists.spread.org/mailman/listinfo/spread-users
>>>>
>>>>
>>>
>>>     ------------------------------
>>>
>>>     Message: 2
>>>     Date: Wed, 7 Nov 2012 16:44:35 -0500
>>>     From: Ed Holyat <Ed.Holyat at openlink.com>
>>>     Subject: Re: [Spread-users] write(): java.net.SocketException
>>>     To: "Shawn.Bradford at mojix.com" <Shawn.Bradford at mojix.com>, Jonathan
>>>              Stanton <jonathan at spreadconcepts.com>
>>>     Cc: "spread-users at lists.spread.org" <spread-users at lists.spread.org>
>>>     Message-ID:
>>>              <648AFB5742D6394FB956DC60697556054491EB0883 at OLFANDEXCH01.andover.olf.com>
>>>
>>>     Content-Type: text/plain; charset="us-ascii"
>>>
>>>     I have not used the Java version of Spread, but usually a connection reset means that the connection terminated hard and the other side did not close it. Have you verified that the Spread daemon closed the connection on purpose? You can turn on debugging in the Spread daemon to determine whether Spread closed the connection because of a slow consumer.
>>>     Here are some other scenarios I have seen.
>>>     Anti-virus software delaying the packets, with one side getting a socket timeout that wasn't handled correctly; this produced a connection reset on the other side.  Try disabling any anti-virus software.
>>>     This can also occur if a client terminates before a socket is flushed of all its packets.  This can happen on a system with high memory or CPU usage, or when just sending large packets.  Monitor resources and check that the MTU is the same on both sides of the connection.
>>>     And there is always the possibility of hardware issues.  You can try duplicating the problem outside of Spread by running ping with a large buffer size (ping -t -l 1350) and looking for packet loss.  This should be performed from the client host to the daemon host and vice versa.
>>>
>>>
>>>     ------------------------------
>>>
>>>
>>>     End of Spread-users Digest, Vol 91, Issue 4
>>>     *******************************************
>>>


