[Spread-users] Is there a limit on total traffic sent by Spread?

Guido van Rossum guido at python.org
Fri Jan 18 23:19:52 EST 2002


We've added batching of small messages to our app.  I think we're
experiencing a different problem now, but it may be related.  As a
stress test of our app, I'm pumping a database of 4.5 Gbyte through
it.  The sending process gets a CONNECTION_CLOSED error from Spread
towards the end of the test; at that point the receiving process has
received fairly close to 4 Gbyte of data (the point where a 32-bit
unsigned int holding a byte count would overflow).
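For reference, one common way to batch small messages is to length-prefix each payload and pack several of them into one larger message, flushing whenever the next payload would push the batch past a size limit. The sketch below is a hypothetical illustration, not the app's actual code. `Batcher`, `unbatch`, the `send` callback and the `max_batch` limit are all assumed names, and the default limit is not a Spread constant.

```python
import struct
from typing import Callable, Iterator, List

# 4-byte big-endian length prefix in front of every payload (assumed framing).
_HDR = struct.Struct("!I")


class Batcher:
    """Accumulate small payloads and send them as one larger message.

    `send` is a hypothetical callback standing in for whatever call delivers
    a single message; `max_batch` is an assumed size limit in bytes.
    """

    def __init__(self, send: Callable[[bytes], None], max_batch: int = 64 * 1024):
        self.send = send
        self.max_batch = max_batch
        self._parts: List[bytes] = []
        self._size = 0

    def add(self, payload: bytes) -> None:
        framed = _HDR.pack(len(payload)) + payload
        # Flush the current batch first if this payload would overflow it.
        if self._parts and self._size + len(framed) > self.max_batch:
            self.flush()
        self._parts.append(framed)
        self._size += len(framed)

    def flush(self) -> None:
        # Send everything accumulated so far as a single message.
        if self._parts:
            self.send(b"".join(self._parts))
            self._parts = []
            self._size = 0


def unbatch(message: bytes) -> Iterator[bytes]:
    """Split one batched message back into its original payloads, in order."""
    offset = 0
    while offset < len(message):
        (n,) = _HDR.unpack_from(message, offset)
        offset += _HDR.size
        yield message[offset:offset + n]
        offset += n
```

Because every batched message carries its own framing, the receiver can call `unbatch()` on each delivered message to recover the payloads in order. A single payload larger than `max_batch` still goes out, alone in its own message.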

Could it be that the sender gets disconnected for sending more than 4
Gbyte of data?  (There is enough overhead in the messages we use to
transfer the data that I believe that the total message size up to
that point is larger than 4 Gbyte.)
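To make the suspected 4 Gbyte boundary concrete: a byte count held in a 32-bit unsigned int silently wraps modulo 2**32 (4,294,967,296 bytes, i.e. 4 GiB). The Python sketch below simulates such a counter alongside an exact one. The function names and the 100,000-byte message size are illustrative assumptions, and the sketch does not claim that Spread keeps a counter like this.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit unsigned int can hold (2**32 - 1)


def add_u32(counter: int, nbytes: int) -> int:
    """Add nbytes to counter with C `uint32_t` semantics (wraps modulo 2**32)."""
    return (counter + nbytes) & MASK32


def simulate(total_bytes: int, msg_size: int):
    """Count total_bytes in msg_size chunks with a 32-bit and an exact counter.

    Returns (exact_total, u32_total, message_number_where_u32_first_wrapped).
    """
    u32 = 0
    exact = 0
    wrapped_at = None
    for i in range(1, total_bytes // msg_size + 1):
        new = add_u32(u32, msg_size)
        if wrapped_at is None and new < u32:
            wrapped_at = i  # the running total just passed 2**32 bytes
        u32 = new
        exact += msg_size
    return exact, u32, wrapped_at


if __name__ == "__main__":
    # 4.5 Gbyte taken as 4.5e9 bytes; the message size is arbitrary.
    print(simulate(4_500_000_000, 100_000))  # → (4500000000, 205032704, 42950)
```

With 100,000-byte messages the 32-bit total drops back near zero at message 42,950, while the exact total keeps growing. Code that relied on such a counter would see that sudden drop at roughly the point where the transfer fails.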

Lots of programs can't deal with over 4 Gbyte of data (e.g. scp
doesn't), but there is no such limit in TCP: in a different mode, our
app successfully transfers the entire 4.5 Gbyte database over a single
TCP connection (not using Spread).

This error has happened several times, at roughly (but not exactly)
the same point in the transfer; I've not seen a successful transfer of
the 4.5 Gbyte database.  I've also seen it happen at a much earlier
point; again several times around the same point in the data --
probably the "burstiness" of our data has significance.

On one run I was logging all SESSION messages from the Spread daemon to a
file; the log ended abruptly with these messages:

Sess_read: failed receiving message on session 9, ret is -1: error: Resource temporarily unavailable
Sess_read: failed recv msg more info: len read: 102192, remain: 240, to_read: 240, pkt_index: 71, b_index: 0, scat_nums: 72
Sess_kill: killing session test1 ( mailbox 9 )
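The first log line shows `recv` returning -1 with errno EAGAIN ("Resource temporarily unavailable") while 240 bytes of the message were still unread. On a non-blocking socket, EAGAIN normally means "no data available yet, try again" rather than a broken connection. Whether the daemon should treat it as fatal is exactly the open question here. The sketch below only illustrates the generic pattern of reading an exact number of bytes from a non-blocking socket and waiting with `select` on EAGAIN instead of giving up. It is not Spread's code, and `read_exact` and its timeout are illustrative assumptions.

```python
import select
import socket


def read_exact(sock: socket.socket, n: int, timeout: float = 5.0) -> bytes:
    """Read exactly n bytes from a non-blocking socket.

    EAGAIN/EWOULDBLOCK is treated as "not ready yet": wait until the socket
    is readable and retry, instead of dropping the connection.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        try:
            data = sock.recv(remaining)
        except BlockingIOError:
            # Python raises BlockingIOError for EAGAIN / EWOULDBLOCK.
            readable, _, _ = select.select([sock], [], [], timeout)
            if not readable:
                raise TimeoutError(
                    f"no data for {timeout}s, {remaining} bytes still expected")
            continue
        if not data:
            # recv() returning b"" means the peer closed the connection.
            raise ConnectionError(
                f"peer closed with {remaining} bytes still expected")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

A message that arrives in two bursts is reassembled rather than causing the session to be killed; only an actual close (or a long silence) is reported as an error.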

If it's not the 4 Gbyte limit, are there other reasons why a prolific
*sender* can be disconnected?

Also, this *is* a multithreaded app; there's one thread receiving and
one thread sending.  But the receiving thread is mostly idle: there
isn't any traffic going in that direction.  Yet, it *appears* that it
gets the error first (before the sending thread gets the same error).
The threads are sharing an mbox.  Could it be that the Linux network
drivers somehow get overloaded and make the receive fail?  (This *is*
a stress test -- before we upgraded all our kernels to the latest,
2.4.17, we had regular kernel crashes in an earlier test using the
same 4.5 Gbyte database.)

--Guido van Rossum (home page: http://www.python.org/~guido/)