[Spread-users] mbox corruption and timeouts in membership.c

Matt Garman matthew.garman at gmail.com
Fri Nov 3 17:52:45 EST 2006


On 11/3/06, John Schultz <jschultz at spreadconcepts.com> wrote:
> -8 (Connection Closed) errors for a client are almost invariably traced to
> a lack of flow control amongst sending applications.  In a multicast
> environment it is very easy for aggressive senders to overrun receivers'
> buffers.  Spread tries not to allow slow readers to exhaust its memory and
> cause the daemon to crash so it disconnects readers that aren't keeping up
> with their flow of traffic.

Thank you for the quick feedback.  I set up a test spread
configuration as follows:

    - Two segments (in different VLANs)
    - One machine on each segment
    - One spread daemon on each machine (i.e. two total spread daemons)
    - Both daemons have the SESSION logging flag
    - A client on machine A sends a very large number of messages (1000 Hz,
      each message about 1 kB)
    - A receiver on machine B sleeps 1 sec between each received message
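
To put rough numbers on this setup, here is a back-of-the-envelope sketch. It
is only a model, not Spread's real logic: it assumes each message is exactly
1000 bytes and that the only buffering is a single 204800-byte socket buffer
(the value that Sess_accept reports in the receiver-side log). The daemon also
queues messages in its own memory, so the real time before a disconnect will
differ. The function name seconds_until_full is made up for illustration.

```python
# Rough model of the test setup; assumptions are marked in the comments.
SEND_RATE_HZ = 1000     # messages per second from the sender (from the setup)
MSG_BYTES = 1000        # "about 1 kB" per message (assumed exactly 1000 bytes)
READ_RATE_HZ = 1        # the receiver reads one message per second
BUFFER_BYTES = 204800   # sndbuf/rcvbuf from the log (assumed to be the only buffer)


def seconds_until_full(send_hz, read_hz, msg_bytes, buffer_bytes):
    """Return how long a buffer lasts when messages arrive faster than they are read."""
    net_bytes_per_sec = (send_hz - read_hz) * msg_bytes
    if net_bytes_per_sec <= 0:
        return float("inf")  # the reader keeps up, so the buffer never fills
    return buffer_bytes / net_bytes_per_sec


backlog_per_sec = (SEND_RATE_HZ - READ_RATE_HZ) * MSG_BYTES
fill_time = seconds_until_full(SEND_RATE_HZ, READ_RATE_HZ, MSG_BYTES, BUFFER_BYTES)
print(f"backlog grows by {backlog_per_sec} bytes/s")
print(f"buffer full after about {fill_time:.3f} s")
```

Under these assumptions the backlog grows by roughly 1 MB every second, so a
buffer of this size fills in a fraction of a second and the receiver can never
catch up.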

> One recommendation I can give is to turn on the SESSION logging flag and
> then search the log for "sess_kill" and see why it is disconnecting your
> clients.

In the spread log on machine A (sender side) I have tons of these messages:

[Fri 03 Nov 2006 22:16:32] Sess_read: Message has type field 0x80000082
[Fri 03 Nov 2006 22:16:32] Sess_read: queueing message of type 2 with len 0 to the protocol

Why does it say that it's queuing a message of length zero?

In the spread log on machine B (receiver side) I have many of the
following messages:

[Fri 03 Nov 2006 22:27:29] Sess_badger: for mbox 9

After those, next in the log is as follows:

[Fri 03 Nov 2006 22:27:29] Sess_write: killing mbox 9 for not reading
[Fri 03 Nov 2006 22:27:29] Sess_kill: killing session 29818 ( mailbox 9 )
[Fri 03 Nov 2006 22:29:54] Sess_accept: set sndbuf/rcvbuf to 204800
[Fri 03 Nov 2006 22:29:54] Sess_recv_client_auth: Client requested NULL type authentication
[Fri 03 Nov 2006 22:29:54] Sess_session_authorized: Accepting from 0.0.0.0 with private name 29818 on mailbox 9
[Fri 03 Nov 2006 22:29:54] Sess_read: Message has type field 0x80010080
[Fri 03 Nov 2006 22:29:54] Sess_read: queueing message of type 8 with len 0 to the protocol

My cursory glance through the code suggests that "sess_badger" is a
"nag"-type function that keeps trying to send the message through to
the sender.  Is this correct?  As you suggested, the log says quite
plainly that the connection was killed for not reading.

Finally, where is the sndbuf/rcvbuf size set?
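
For context, the Sess_accept log line suggests the daemon adjusts the socket
buffers when it accepts a client. The sketch below shows how such sizes are
generally requested through the standard SO_SNDBUF and SO_RCVBUF socket
options. It is not Spread's actual code, and the 204800 value is simply copied
from the log. The operating system may round or cap the request, so the values
read back can differ from the value asked for.

```python
import socket

REQUESTED_BYTES = 204800  # value copied from the Sess_accept log line

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Ask the kernel for larger send and receive buffers on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED_BYTES)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED_BYTES)

    # Read back what the kernel actually granted.
    sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print("sndbuf:", sndbuf, "rcvbuf:", rcvbuf)
finally:
    sock.close()
```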

Thanks again,
Matt



