[Spread-users] 'Connection closed by spread' ...
Jim Vickroy
Jim.Vickroy at noaa.gov
Tue Sep 7 07:27:46 EDT 2004
Thanks again, Ryan. Here is the source code; I hope you speak Python.
<smile>
# START of file ----------------------------------------------------
'''
A Spread publisher, intended to be run for "long" periods of time, that
counts publication successes and failures.

NOTES
  o When run for hours by itself, no publishing failures are detected
    -- 100% success rate.
  o When several copies of it run simultaneously (10 in this test),
    the success rate drops to ~99.8%.

LANGUAGE
  o Python
    http://www.python.org/

REFERENCES
  o The Python/Spread API is documented at:
    http://www.python.org/other/spread/doc.html

AUTHOR
  jim.vickroy at noaa.gov
'''
# Python 2.3 syntax (print statement, old-style "except Exception, cause").
import os, spread, time
from random import randint

print 'Spread version:', spread.version()  # prints (3, 17, 1) on my system

host = '... host server name goes here ...'
port = 4803
address = '%d@%s' % (port, host)
sender = 'Spread publishing failures checker on %s' % os.environ['COMPUTERNAME']  # not used below
group = 'SEC.publishing.failures.statistics'
template = 'timestamp: %s'
service = spread.SAFE_MESS
successes = 0
failures = 0

while True:
    now = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(time.time()))
    message = template % now
    count = randint(2, 10)
    for this in range(1, count):  # publishes 1 to 9 messages per burst
        try:
            mailbox = spread.connect(address, '', priority=0, membership=True)
            # membership=True -> receive membership messages
            # Is it possible that membership messages are what is filling the mailbox?
            bytes_transmitted = mailbox.multicast(service, group, message, 0)  # message_type is zero
            time.sleep(1)  # second -- a precaution to ensure the message makes it to the Spread server
            mailbox.disconnect()
            assert bytes_transmitted == len(message), \
                'expected %d bytes to be transmitted -- actual = %d' % (len(message), bytes_transmitted)
        except Exception, cause:
            failures += 1
            print cause
        else:
            successes += 1
    print '%s: success: %d failures: %d' % (now, successes, failures)
    time.sleep(5)  # seconds
# END of file ----------------------------------------------------
-----Original Message-----
From: spread-users-admin at lists.spread.org
[mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
Sent: Monday, September 06, 2004 8:08 PM
To: Jim Vickroy
Cc: SPREAD-USERS
Subject: Re: [Spread-users] 'Connection closed by spread' ...
Could you post some sample source code for your simplified program? I
can't think of anything from your description to explain this behavior
from Spread.
Cheers,
Ryan
On Mon, 6 Sep 2004 19:03:50 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> OK, I let the simple publisher program (described below) run until it had
> completed ~4500 successful publications with zero failures. Then I
> introduced more realism by running 10 copies of it simultaneously in
> separate processes (separate console windows), and within 10 minutes all of
> the processes started experiencing (intermittent) publishing failures. This
> is the behavior I'm seeing in the real application (which comprises 10
> independent processes running on the same machine).
>
> I probably did not make it clear that these simple simulation programs are
> pure publishers (not publishers/subscribers).
>
> Any ideas about what is causing these failures?
>
> Thanks,
>
> -- jv
>
>
>
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Jim Vickroy
> Sent: Monday, September 06, 2004 3:08 PM
> To: Ryan Caudy
> Cc: SPREAD-USERS
> Subject: RE: [Spread-users] 'Connection closed by spread' ...
>
> Thanks again for the feedback, Ryan -- and for your patience in providing a
> detailed explanation.
>
> I have checked the application, and the publisher does not join the group it
> publishes to -- only subscribers join groups.
>
> I have created a highly simplified version of the application that hopefully
> will capture the (errant) behavior I reported. One difference between the
> simplified version and the real application is that the simplified version
> is a single process that periodically publishes 1-10 messages, while the real
> application is 10 separate processes that periodically publish 1 or 2
> messages.
>
> The simulation is running now, and I will post a follow-up when it has run
> for a sufficient period of time.
>
> Thanks,
>
> -- jv
>
> -----Original Message-----
> From: spread-users-admin at lists.spread.org
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
> Sent: Saturday, September 04, 2004 10:03 PM
> To: Jim Vickroy
> Cc: SPREAD-USERS
> Subject: Re: [Spread-users] 'Connection closed by spread' ...
>
> To clarify what I said earlier, the -8, CONNECTION CLOSED, return code
> isn't specific to receiving. It will be returned by any of the
> library functions if they try to send or recv on the mbox socket, and
> get an error besides EAGAIN, EINTR, or EWOULDBLOCK. You may not be
> using the C API, but the behavior of any of the APIs should be
> similar. Although there are other possible errors that could cause
> this to happen, the most likely one in this situation (no real network
> problem, etc) is that Spread closed the socket for failing to receive.
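The rule Ryan describes for when a library call gives up on the mailbox socket can be sketched as a tiny classifier. This is a hypothetical Python 3 helper, not code from any Spread library; it only encodes the rule as stated (retry on EAGAIN, EINTR or EWOULDBLOCK, otherwise report CONNECTION_CLOSED, which he gives as -8):

```python
import errno

# Ryan's message gives -8 as the CONNECTION_CLOSED return code.
CONNECTION_CLOSED = -8

# Errors treated as "try again" rather than fatal.
# EAGAIN and EWOULDBLOCK share one value on many platforms; a set handles that.
RETRYABLE_ERRNOS = {errno.EAGAIN, errno.EWOULDBLOCK, errno.EINTR}


def socket_error_result(err):
    """Map a socket errno to a result (hypothetical helper).

    Returns None when the send/recv should simply be retried, and
    CONNECTION_CLOSED for any other error.
    """
    if err in RETRYABLE_ERRNOS:
        return None
    return CONNECTION_CLOSED


print(socket_error_result(errno.EINTR))       # None -> retry
print(socket_error_result(errno.ECONNRESET))  # -8
```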
>
> Part of the reason I think that the cause is what I described is that
> you said "about once every 1000 publishing attempts." It probably
> isn't coincidental that this is the defined value for
> MAX_SESSION_MESSAGES (see spread_params.h), which dictates the number
> of messages Spread will allow to pile up for a session before
> disconnecting it for failing to receive.
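The MAX_SESSION_MESSAGES behavior can be illustrated with a toy model of one session's queue inside the daemon. The class and its methods are hypothetical (Python 3), and whether the daemon drops a session at exactly 1000 queued messages or one message later is an assumption; the point is only that a session which never receives is eventually disconnected, while one that keeps draining its queue is not:

```python
from collections import deque

# Default value Ryan cites for MAX_SESSION_MESSAGES (spread_params.h).
MAX_SESSION_MESSAGES = 1000


class SimulatedSession:
    """Toy model of one client's queue inside the daemon (hypothetical)."""

    def __init__(self, limit=MAX_SESSION_MESSAGES):
        self.limit = limit
        self.pending = deque()
        self.connected = True

    def deliver(self, message):
        """Daemon side: queue a message; drop the session once the backlog exceeds the limit."""
        if not self.connected:
            return
        self.pending.append(message)
        if len(self.pending) > self.limit:
            # Per the thread, the client's next library call would then fail with -8.
            self.connected = False
            self.pending.clear()

    def receive(self):
        """Client side: take one queued message, or None if nothing is pending."""
        return self.pending.popleft() if self.pending else None


never_reads = SimulatedSession()
for i in range(MAX_SESSION_MESSAGES + 1):
    never_reads.deliver(i)

keeps_up = SimulatedSession()
for i in range(10 * MAX_SESSION_MESSAGES):
    keeps_up.deliver(i)
    keeps_up.receive()

print(never_reads.connected, keeps_up.connected)  # False True
```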
>
> Could you tell me a little bit more about your application? What you
> described should be absolutely fine, since you don't rely on Spread's
> internal queuing any more than absolutely necessary.
>
> When do the applications that are having trouble connect to spread?
> The normal paradigm for something like what you've described is to
> have them connect before spinning off the receiving thread, and share
> the mbox (with some sort of synchronization). If I had to guess what
> was going wrong from what you said before, I would guess that for the
> applications that are both publishers and subscribers, you have opened
> two mboxes, joined the relevant groups on both, and are only receiving
> on one of them.
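The "connect first, then spin off the receiving thread, and share the mbox" paradigm might look roughly like this Python 3 sketch. FakeMailbox is a hypothetical stand-in that loops each multicast back to itself (as if the session had joined its own group); it is not the Python Spread module, and the real libraries' thread-safety guarantees are not modeled. The sketch serializes sends with a lock and assumes exactly one thread ever calls receive():

```python
import queue
import threading


class FakeMailbox:
    """Hypothetical stand-in for a Spread mailbox; NOT the real Python Spread API.

    multicast() loops each message back to this mailbox, as if the session
    had joined the group it publishes to.
    """

    def __init__(self):
        self._incoming = queue.Queue()

    def multicast(self, group, message):
        self._incoming.put((group, message))
        return len(message)

    def receive(self, timeout=None):
        # Blocks until a message arrives (raises queue.Empty after `timeout` seconds).
        return self._incoming.get(timeout=timeout)


class SharedMailbox:
    """One connection shared by the publishing code and a single receiver thread."""

    def __init__(self, mailbox):
        self._mailbox = mailbox
        self._send_lock = threading.Lock()  # serializes sends from any thread

    def publish(self, group, message):
        with self._send_lock:
            return self._mailbox.multicast(group, message)

    def receive(self, timeout=None):
        # Assumes only the receiver thread calls this, so no lock is taken.
        return self._mailbox.receive(timeout=timeout)


def run_demo(n):
    shared = SharedMailbox(FakeMailbox())  # 1. connect first
    received = []

    def receiver():
        for _ in range(n):
            received.append(shared.receive(timeout=5))

    thread = threading.Thread(target=receiver)  # 2. then start the receiving thread
    thread.start()
    for i in range(n):
        shared.publish('demo.group', 'message %d' % i)  # 3. publish on the same mbox
    thread.join()
    return received


print(run_demo(3))
```

Because every message published on the shared mailbox is also drained by its one receiver, nothing piles up for that session.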
>
> If this is the case, I would recommend that you do one of the
> following: (a) Have only one mbox. Depending on the library
> implementation you're working with, you may or may not need additional
> synchronization. OR (b) Have two mboxes, but for the
> sending/publishing thread, do NOT join the groups. Spread supports
> open-group semantics, which means that you can send to a group without
> being a member of it.
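A toy Python 3 model contrasting option (b) with the suspected mistake (joining the group on a mailbox that is never read). ToyDaemon, backlog_after and the group name are hypothetical; the model only mimics open-group delivery, where a message goes to the group's current members and the sender need not be one of them:

```python
from collections import defaultdict, deque


class ToyDaemon:
    """Minimal model of group delivery with open-group semantics (hypothetical)."""

    def __init__(self):
        self.members = defaultdict(set)   # group name -> names of member mailboxes
        self.queues = defaultdict(deque)  # mailbox name -> pending messages

    def join(self, mbox, group):
        self.members[group].add(mbox)

    def multicast(self, sender, group, message):
        # Open-group semantics: `sender` does not have to be a member of `group`.
        for mbox in self.members[group]:
            self.queues[mbox].append(message)

    def receive(self, mbox):
        return self.queues[mbox].popleft()


def backlog_after(burst, join_publisher):
    """Messages left queued on the publishing mailbox after `burst` sends."""
    d = ToyDaemon()
    d.join('receiver', 'SEC.demo')
    if join_publisher:
        d.join('publisher', 'SEC.demo')  # the suspected mistake
    for i in range(burst):
        d.multicast('publisher', 'SEC.demo', i)
        d.receive('receiver')  # only the receiver mailbox is ever drained
    return len(d.queues['publisher'])


print(backlog_after(10, join_publisher=True))   # 10: grows with every send
print(backlog_after(10, join_publisher=False))  # 0: option (b)
```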
>
> I hope this helps. If it doesn't, please give the list whatever other
> information you can provide.
>
> Cheers,
> Ryan
>
> On Sat, 4 Sep 2004 07:55:15 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> > Thanks for your response, Ryan.
> >
> > I did not make it clear in my original posting that these are publishing
> > errors -- not subscriber errors. The errors are being trapped by try/except
> > blocks wrapping publishing requests.
> >
> > Most of the publishers are also subscribers to the same message group (they
> > must be), but each subscriber operates in its own dedicated thread that does
> > nothing but receive and queue messages for subsequent processing. I doubt
> > the receiving/queuing thread is falling behind the publishers, especially
> > since the burst rate is only on the order of 10 messages per second for one
> > second. The applications keep rather detailed logs of the messages
> > received/published, and I see no evidence of any subscriber failing to keep
> > up with the publishing rate.
> >
> > It is curious, however, that the one publisher that is not also a (Spread)
> > subscriber is the only component that, so far, has not experienced a
> > publishing error. This component does have a receiver thread, but it is
> > monitoring a simple socket connection for message traffic.
> >
> > That said, I am a novice user of Spread and certainly may have an
> > implementation problem; it is just not clear what is wrong.
> >
> > I will ask our administrator to upgrade to the current stable version of
> > Spread.
> >
> >
> >
> >
> > -----Original Message-----
> > From: spread-users-admin at lists.spread.org
> > [mailto:spread-users-admin at lists.spread.org]On Behalf Of Ryan Caudy
> > Sent: Friday, September 03, 2004 8:54 PM
> > To: Jim Vickroy
> > Cc: SPREAD-USERS
> > Subject: Re: [Spread-users] 'Connection closed by spread' ...
> >
> > Hi,
> >
> > This error is usually caused by a failure to receive by clients to
> > Spread. If your clients let more than a certain number of messages,
> > 1000 with a "vanilla" Spread, pile up at the daemon without receiving
> > them, then Spread will disconnect them with that error code.
> >
> > You may want to look at past posts on this list about flow control.
> >
> > Also, on a side note, I would encourage you to use the most recent
> > stable release of Spread.
> >
> > Cheers,
> > Ryan
> >
> > On Fri, 3 Sep 2004 12:12:54 -0600, Jim Vickroy <jim.vickroy at noaa.gov> wrote:
> > > ... is the error that is happening more frequently than desirable --
> > > about once every 1000 publishing attempts.
> > >
> > > Could someone suggest a way to reduce this error rate (at least by a
> > > factor of 10)?
> > >
> > > The platform:
> > > Spread: v 3.17.01 (20 June 2003)
> > > Spread Host: RedHat Workstation, Kernel: 2.4.21-4.EL
> > > Client Host: Microsoft Windows 2000 Server
> > > Client Software: Python v 2.3.3
> > >
> > > The use case:
> > > Messages are published in bursts at 1-minute intervals.
> > > Each burst of messages comprises 5-10 messages; each message is generated
> > > by a distinct process.
> > > Each message is about 100 bytes.
> > > Publication service type is set to spread.SAFE_MESS.
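For scale, a rough back-of-the-envelope sketch in Python 3. It assumes the default limit of 1000 queued messages mentioned in this thread, one burst per minute, and a subscriber session that stopped receiving entirely, and it estimates how long that session's backlog would take to reach the limit at this traffic rate:

```python
# Default cited for MAX_SESSION_MESSAGES in spread_params.h (per this thread).
MAX_SESSION_MESSAGES = 1000


def minutes_to_limit(messages_per_burst, bursts_per_minute=1.0,
                     limit=MAX_SESSION_MESSAGES):
    """Approximate minutes for an unread session's backlog to reach `limit`.

    Assumes every published message is queued for the stalled session and
    none of them is ever received.
    """
    messages_per_minute = messages_per_burst * bursts_per_minute
    return limit / messages_per_minute


for burst in (5, 10):
    print(burst, 'messages/burst ->', minutes_to_limit(burst), 'minutes')
```

At 5 to 10 messages per burst, that works out to roughly 100 to 200 minutes.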
> > >
> > > Thanks,
> > >
> > > -- jv
> > >
> > > _______________________________________________
> > > Spread-users mailing list
> > > Spread-users at lists.spread.org
> > > http://lists.spread.org/mailman/listinfo/spread-users
> > >
> >
> > --
> > ---------------------------------------------------------------------
> > Ryan W. Caudy
> > <rcaudy at gmail.com>
> > ---------------------------------------------------------------------
> > Bloomberg L.P.
> > <rcaudy1 at bloomberg.net>
> > ---------------------------------------------------------------------
> > [Alumnus]
> > <caudy at cnds.jhu.edu>
> > Center for Networking and Distributed Systems
> > Department of Computer Science
> > Johns Hopkins University
> > ---------------------------------------------------------------------
> >
> >
> >
>
>
>
>