[Spread-users] 'Connection closed by spread' ...

Ryan Caudy rcaudy at gmail.com
Mon Sep 6 22:07:34 EDT 2004


Could you post some sample source code for your simplified program?  I
can't think of anything in your description that would explain this
behavior from Spread.

Cheers,
Ryan


On Mon, 6 Sep 2004 19:03:50 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> OK, I let the simple publisher program (described below) run until it had
> completed ~ 4500 successful publications with zero failures.  So then, I
> introduced more realism by running 10 copies of it simultaneously in
> separate processes (separate console windows) and within 10 minutes all of
> the processes started experiencing (intermittent) publishing failures.  This
> is the behavior I'm seeing in the real application (which comprises 10
> independent processes running on the same machine).
> 
> I probably did not make it clear that these simple simulation programs are
> pure publishers (not publishers/subscribers).
> 
> Any ideas about what is causing these failures?
> 
> Thanks,
> 
> -- jv
> 
> 
> 
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Jim Vickroy
> Sent: Monday, September 06, 2004 3:08 PM
> To: Ryan Caudy
> Cc: SPREAD-USERS
> Subject: RE: [Spread-users] 'Connection closed by spread' ...
> 
> Thanks again for the feedback, Ryan -- and for your patience in providing a
> detailed explanation.
> 
> I have checked the application and the publisher does not join the group it
> publishes to -- only subscribers join groups.
> 
> I have created a highly simplified version of the application that hopefully
> will capture the (errant) behavior I reported.  One difference between the
> simplified version and the real application is that the simplified version
> is a single process that periodically publishes 1-10 messages while the real
> application is 10 separate processes that periodically publish 1 or 2
> messages.
> 
> The simulation is running now, and I will post a follow-up when it has run
> for a sufficient period of time.
> 
> Thanks,
> 
> -- jv
> 
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
> Sent: Saturday, September 04, 2004 10:03 PM
> To: Jim Vickroy
> Cc: SPREAD-USERS
> Subject: Re: [Spread-users] 'Connection closed by spread' ...
> 
> To clarify what I said earlier, the -8 (CONNECTION_CLOSED) return code
> isn't specific to receiving.  It will be returned by any of the
> library functions if they try to send or recv on the mbox socket and
> get an error besides EAGAIN, EINTR, or EWOULDBLOCK.  You may not be
> using the C API, but the behavior of any of the APIs should be
> similar.  Although there are other possible errors that could cause
> this to happen, the most likely one in this situation (no real network
> problem, etc.) is that Spread closed the socket for failing to receive.
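>
> In the Python binding this should surface as an exception raised by the
> publish call itself.  A sketch of how I would trap it (I'm going from
> memory on the module's names, i.e. spread.error, connect(), and
> multicast(), so adjust as needed):
>
> import spread
>
> def safe_publish(mbox, group, payload, daemon='4803@localhost'):
>     # If the daemon has closed the session (the -8 case), the send
>     # raises; reconnect once and retry.  The daemon string is only an
>     # example.
>     try:
>         mbox.multicast(spread.SAFE_MESS, group, payload)
>         return mbox
>     except spread.error:
>         mbox = spread.connect(daemon, '', 0, 0)
>         mbox.multicast(spread.SAFE_MESS, group, payload)
>         return mbox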
> 
> Part of the reason I think that the cause is what I described is that
> you said "about once every 1000 publishing attempts."  It probably
> isn't coincidental that this is the defined value for
> MAX_SESSION_MESSAGES (see spread_params.h), which dictates the number
> of messages Spread will allow to pile up for a session before
> disconnecting it for failing to receive.
> 
> Could you tell me a little bit more about your application?  What you
> described should be absolutely fine, since you don't rely on Spread's
> internal queuing any more than absolutely necessary.
> 
> When do the applications that are having trouble connect to Spread?
> The normal paradigm for something like what you've described is to
> have them connect before spinning off the receiving thread, and share
> the mbox (with some sort of synchronization).  If I had to guess what
> was going wrong from what you said before, I would guess that for the
> applications that are both publishers and subscribers, you have opened
> two mboxes, joined the relevant groups on both, and are only receiving
> on one of them.
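>
> Concretely, that paradigm looks something like this in Python.  (An
> illustrative sketch only; the method names are from memory, and you
> would want some locking if several threads publish on the one mbox.)
>
> import threading
> import spread
>
> mbox = spread.connect('4803@localhost', '', 0, 1)   # one connection
> mbox.join('test_group')                             # join once, here
>
> def drain():
>     # Dedicated receiver: keeps messages from piling up at the daemon.
>     while 1:
>         msg = mbox.receive()
>         # ... hand msg off to the application's own queue here ...
>
> receiver = threading.Thread(target=drain)
> receiver.setDaemon(True)
> receiver.start()
>
> # The publishing code then shares the same mbox:
> mbox.multicast(spread.SAFE_MESS, 'test_group', 'hello from the publisher')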
> 
> If this is the case, I would recommend that you do one of the
> following: (a) Have only one mbox.  Depending on the library
> implementation you're working with, you may or may not need additional
> synchronization.  OR (b) Have two mboxes, but for the
> sending/publishing thread, do NOT join the groups.  Spread supports
> open-group semantics, which means that you can send to a group without
> being a member of it.
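>
> For (b), the publishing side then never calls join() at all.  Roughly
> (again, names and the daemon string are just examples):
>
> import spread
>
> # Open-group send: this mbox never joins 'test_group', but it can still
> # multicast to it.
> pub_mbox = spread.connect('4803@localhost', '', 0, 0)
> pub_mbox.multicast(spread.SAFE_MESS, 'test_group', 'sent without joining')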
> 
> I hope this helps.  If it doesn't, please give the list whatever other
> information you can provide.
> 
> Cheers,
> Ryan
> 
> On Sat, 4 Sep 2004 07:55:15 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> > Thanks for your response, Ryan.
> >
> > I did not make it clear in my original posting that these are publishing
> > errors -- not subscriber errors.  The errors are being trapped by try-catch
> > blocks wrapping publishing requests.
> >
> > Most of the publishers are also subscribers to the same message group (they
> > must be), but each subscriber operates in its own dedicated thread that does
> > nothing but receive and queue messages for subsequent processing.  I doubt
> > the receiving/queuing thread is falling behind the publishers, especially
> > since the burst rate is only on the order of 10 messages per second for one
> > second.  The applications keep rather detailed logs of the messages
> > received/published, and I see no evidence of any subscriber failing to keep
> > up with the publishing rate.
> >
> > It is curious, however, that the one publisher which is not also a (Spread)
> > subscriber is the only component that, so far, has not experienced a
> > publishing error.  This component does have a receiver thread, but it is
> > monitoring a simple socket connection for message traffic.
> >
> > That said, I am a novice user of Spread and certainly may have an
> > implementation problem; it is just not clear what is wrong.
> >
> > I will ask our administrator to upgrade to the current, stable version of
> > Spread.
> >
> >
> >
> >
> > -----Original Message-----
> > From: spread-users-admin at lists.spread.org
> > [mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
> > Sent: Friday, September 03, 2004 8:54 PM
> > To: Jim Vickroy
> > Cc: SPREAD-USERS
> > Subject: Re: [Spread-users] 'Connection closed by spread' ...
> >
> > Hi,
> >
> > This error is usually caused by Spread clients failing to receive.  If
> > your clients let more than a certain number of messages (1000 with a
> > "vanilla" Spread) pile up at the daemon without receiving them, then
> > Spread will disconnect them with that error code.
> >
> > You may want to look at past posts on this list about flow control.
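> >
> > One cheap form of client-side flow control, if your library exposes the
> > connection's socket, is to drain anything the daemon has queued before
> > each publish.  This is only a sketch; I'm assuming the mailbox object
> > has a fileno() method that select can use:
> >
> > import select
> >
> > def drain_pending(mbox):
> >     # Receive whatever is already queued for this session so the
> >     # daemon's per-session limit is never reached.
> >     while select.select([mbox.fileno()], [], [], 0)[0]:
> >         mbox.receive()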
> >
> > Also, on a side note, I would encourage you to use the most recent
> > stable release of Spread.
> >
> > Cheers,
> > Ryan
> >
> > On Fri, 3 Sep 2004 12:12:54 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> > > ... is the error that is happening more frequently than desirable --
> > > about once every 1000 publishing attempts.
> > >
> > > Could someone suggest a way to reduce this error rate (at least by a
> > > factor of 10)?
> > >
> > > The platform:
> > >         Spread: v 3.17.01 (20 June 2003)
> > >         Spread Host: RedHat Workstation, Kernel: 2.4.21-4.EL
> > >         Client Host: Microsoft Windows 2000 Server
> > >         Client Software: Python v 2.3.3
> > >
> > > The use case:
> > >         Messages are published in bursts at 1-minute intervals.
> > >         Each burst of messages comprises 5-10 messages; each message is
> > >         generated by a distinct process.
> > >         Each message is about 100 bytes.
> > >         Publication service type is set to spread.SAFE_MESS.
> > >
> > > Thanks,
> > >
> > > -- jv
> > >
> 



-- 
---------------------------------------------------------------------
Ryan W. Caudy
<rcaudy at gmail.com>
---------------------------------------------------------------------
Bloomberg L.P.
<rcaudy1 at bloomberg.net>
---------------------------------------------------------------------
[Alumnus]
<caudy at cnds.jhu.edu>         
Center for Networking and Distributed Systems
Department of Computer Science
Johns Hopkins University          
---------------------------------------------------------------------



