[Spread-users] 'Connection closed by spread' ...

Ryan Caudy rcaudy at gmail.com
Sun Sep 5 00:03:09 EDT 2004


To clarify what I said earlier, the -8 (CONNECTION_CLOSED) return code
isn't specific to receiving.  Any of the library functions will return
it if they try to send or recv on the mbox socket and get an error
other than EAGAIN, EINTR, or EWOULDBLOCK.  You may not be using the C
API, but the behavior of the other APIs should be similar.  Although
other errors could cause this, the most likely one in this situation
(no real network problem, etc.) is that Spread closed the socket
because the client was failing to receive.
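
As a rough illustration of the rule above (not the library's actual
code), the check amounts to treating three errno values as "try again"
and everything else as a dead connection.  The function name below is
made up for this sketch; only the -8 value and the three errno names
come from the discussion.

```python
import errno

CONNECTION_CLOSED = -8  # the return code discussed above

# Errors that mean "retry later" rather than "the connection is gone".
TRANSIENT_ERRORS = {errno.EAGAIN, errno.EINTR, errno.EWOULDBLOCK}


def classify_socket_error(err):
    """Hypothetical helper: None means retry, otherwise CONNECTION_CLOSED."""
    if err in TRANSIENT_ERRORS:
        return None
    return CONNECTION_CLOSED
```

So an error such as ECONNRESET or EPIPE on a send would surface as -8
just as it would on a receive, which is why a publisher can see it.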

Part of the reason I suspect this cause is that you said "about once
every 1000 publishing attempts."  It probably isn't a coincidence that
1000 is the defined value of MAX_SESSION_MESSAGES (see
spread_params.h), which sets the number of messages Spread will allow
to pile up for a session before disconnecting it for failing to
receive.
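
To make the limit concrete, here is a toy model in Python of a session
backlog with a cap.  It is only a sketch of the behavior described
above, not Spread's implementation: the class and function names are
invented, and the exact boundary (dropping the session when the
backlog goes past the limit rather than when it reaches it) is an
assumption.

```python
from collections import deque

MAX_SESSION_MESSAGES = 1000  # value cited above from spread_params.h


class ToySession:
    """Toy model of one client session as the daemon sees it."""

    def __init__(self, limit=MAX_SESSION_MESSAGES):
        self.limit = limit
        self.pending = deque()  # messages queued but not yet received
        self.connected = True

    def deliver(self, msg):
        """Queue a message; drop the session if the backlog passes the limit."""
        if not self.connected:
            return False
        self.pending.append(msg)
        if len(self.pending) > self.limit:
            self.pending.clear()
            self.connected = False
        return self.connected

    def receive(self):
        """Client side: take the oldest queued message, or None if empty."""
        return self.pending.popleft() if self.pending else None


def deliveries_before_disconnect(session):
    """Deliver to a session that never receives; count accepted messages."""
    accepted = 0
    while session.deliver(accepted):
        accepted += 1
    return accepted
```

In this model a session that calls receive() after every delivery
never builds a backlog, while one that never receives is dropped after
accepting `limit` messages.  A joined mbox that nobody reads would
behave like the second case.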

Could you tell me a little bit more about your application?  What you
described should be absolutely fine, since you don't rely on Spread's
internal queuing any more than absolutely necessary.

When do the applications that are having trouble connect to Spread?
The normal paradigm for something like what you've described is to
have them connect before spinning off the receiving thread, and share
the mbox (with some sort of synchronization).  If I had to guess what
was going wrong from what you said before, I would guess that for the
applications that are both publishers and subscribers, you have opened
two mboxes, joined the relevant groups on both, and are only receiving
on one of them.

If this is the case, I would recommend that you do one of the
following: (a) Have only one mbox.  Depending on the library
implementation you're working with, you may or may not need additional
synchronization.  OR (b) Have two mboxes, but for the
sending/publishing thread, do NOT join the groups.  Spread supports
open-group semantics, which means that you can send to a group without
being a member of it.
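
Option (b) can be sketched as follows.  This uses a small in-memory
stand-in rather than a real Spread connection, so FakeDaemon, FakeMbox,
and open_mailboxes are all hypothetical names for illustration; the
join/multicast/receive methods are simplified and are not meant to
match any particular binding's signatures.

```python
class FakeDaemon:
    """In-memory stand-in for a daemon: tracks which mboxes joined a group."""

    def __init__(self):
        self.members = {}  # group name -> list of joined mboxes

    def connect(self):
        return FakeMbox(self)


class FakeMbox:
    """Stand-in for one connection (mbox) to the daemon."""

    def __init__(self, daemon):
        self.daemon = daemon
        self.backlog = []  # messages delivered but not yet received

    def join(self, group):
        self.daemon.members.setdefault(group, []).append(self)

    def multicast(self, group, message):
        # Open-group semantics: the sender need not be a member.
        for member in self.daemon.members.get(group, []):
            member.backlog.append(message)

    def receive(self):
        return self.backlog.pop(0)


def open_mailboxes(connect, group):
    """Open a send-only mbox and a receive mbox; only the latter joins."""
    receiver = connect()
    receiver.join(group)  # this is the mbox the receiving thread reads
    sender = connect()    # never joins, so nothing piles up on it
    return sender, receiver
```

For example, after `sender, receiver = open_mailboxes(daemon.connect,
"events")`, every `sender.multicast("events", msg)` lands only in
`receiver.backlog`.  The sender's backlog stays empty, so it can never
hit the unread-message limit.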

I hope this helps.  If it doesn't, please give the list whatever other
information you can provide.

Cheers,
Ryan

On Sat, 4 Sep 2004 07:55:15 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> Thanks for your response, Ryan.
> 
> I did not make it clear in my original posting, that these are publishing
> errors -- not subscriber errors.  The errors are being trapped by try-catch
> blocks wrapping publishing requests.
> 
> Most of the publishers are also subscribers to the same message group (they
> must be), but each subscriber operates in its own dedicated thread that does
> nothing but receive and queue messages for subsequent processing.  I doubt
> the receiving/queuing thread is not keeping up with the publishers
> especially since the burst rate is only on the order of 10 messages per
> second for one second.  The applications keep rather detailed logs of the
> messages received/published, and I see no evidence of any subscriber failing
> to keep up with the publishing rate.
> 
> It is curious, however, that the one publisher which is not also a (Spread)
> subscriber is the only component that, so far, has not experienced a
> publishing error.  This component does have a receiver thread, but it is
> monitoring a simple socket connection for message traffic.
> 
> That said, I am a novice user of Spread and certainly may have an
> implementation problem; it is just not clear what is wrong.
> 
> I will ask our administrator to upgrade to the current, stable version of
> Spread.
> 
> 
> 
> 
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
> Sent: Friday, September 03, 2004 8:54 PM
> To: Jim Vickroy
> Cc: SPREAD-USERS
> Subject: Re: [Spread-users] 'Connection closed by spread' ...
> 
> Hi,
> 
> This error is usually caused by a failure to receive by clients to
> Spread.  If your clients let more than a certain number of messages,
> 1000 with a "vanilla" Spread, pile up at the daemon without receiving
> them, then Spread will disconnect them with that error code.
> 
> You may want to look at past posts on this list about flow control.
> 
> Also, on a side note, I would encourage you to use the most recent
> stable release of Spread.
> 
> Cheers,
> Ryan
> 
> On Fri, 3 Sep 2004 12:12:54 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> > ... is the error that is happening more frequently than desirable -- about
> > once every 1000 publishing attempts.
> >
> > Could someone suggest a way to reduce this error rate (at least by a
> > factor of 10)?
> >
> > The platform:
> >         Spread: v 3.17.01 (20 June 2003)
> >         Spread Host: RedHat Workstation, Kernel: 2.4.21-4.EL
> >         Client Host: Microsoft Windows 2000 Server
> >         Client Software: Python v 2.3.3
> >
> > The use case:
> >         Messages are published in bursts at 1-minute intervals.
> >         Each burst of messages comprises 5-10 messages; each message is
> >         generated by a distinct process.
> >         Each message is about 100 bytes.
> >         Publication service type is set to spread.SAFE_MESS.
> >
> > Thanks,
> >
> > -- jv
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
> >
> 
> --
> ---------------------------------------------------------------------
> Ryan W. Caudy
> <rcaudy at gmail.com>
> ---------------------------------------------------------------------
> Bloomberg L.P.
> <rcaudy1 at bloomberg.net>
> ---------------------------------------------------------------------
> [Alumnus]
> <caudy at cnds.jhu.edu>
> Center for Networking and Distributed Systems
> Department of Computer Science
> Johns Hopkins University
> ---------------------------------------------------------------------



-- 
---------------------------------------------------------------------
Ryan W. Caudy
<rcaudy at gmail.com>
---------------------------------------------------------------------
Bloomberg L.P.
<rcaudy1 at bloomberg.net>
---------------------------------------------------------------------
[Alumnus]
<caudy at cnds.jhu.edu>         
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University          
---------------------------------------------------------------------



