[Spread-users] Corrupt packets
Jonathan Stanton
jonathan at cnds.jhu.edu
Mon Dec 4 16:11:34 EST 2006
Very interesting.
I saw in the patch that you are checksumming both the daemon-to-daemon
traffic (UDP) and the client-server traffic (message contents only), which
goes over TCP or Unix domain sockets. This is really strange, as both UDP
and TCP have checksums and should not deliver corrupted data to the
application (Spread).
Were the UDP/TCP checksums valid on the 'corrupt' data -- I'd guess they
had to be for the packets not to be dropped -- were you able to capture
an example packet that had a valid checksum but was corrupt?
This kind of checksum is something I'd like to avoid if possible as it
complicates the code and is more overhead per packet -- but if we can
have corrupt data delivery and it isn't just a particular OS bug, then
it's worth considering.
If the data is corrupted in the kernel or in memory after Spread has
finished with it but before it is sent, that would explain the
situation -- but it would indicate an OS bug.
Jonathan
On Mon, Dec 04, 2006 at 09:01:51AM -0800, Alec H. Peterson wrote:
> Hi all,
>
> So a few days ago I e-mailed about getting ring lockups. We tracked
> this problem down to corrupt packets getting delivered to Spread
> (both over the session and data link layers). I've attached a patch
> that seems to address the problems by adding a checksum to the
> appropriate data structures, and we feel this could potentially be
> useful to others. If there are reasons why this shouldn't be
> included in Spread we would love to know, because those may well be
> reasons why we shouldn't use it. Clearly it changes the network
> protocol, so it won't be compatible with other builds of Spread.
> However, this does solve our lockup and corrupt data problems.
>
> We're also curious if anybody else has seen 'odd' Spread behavior
> (like ring lockups and/or corrupt data delivered to the client
> library). The configuration we have seen this on is very
> straightforward:
>
> Sun x4100 Server
> Solaris 10
> Spread 3.17.3 (both stock and with some local patches)
>
> We have some very similar servers deployed in-house that do not
> experience these problems at all.
>
> Thanks!
>
> Alec
>
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
--
-------------------------------------------------------
Jonathan R. Stanton jonathan at cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------