[Spread-users] Issue with Spread going silent

Luke Marsden luke-lists at hybrid-logic.co.uk
Sun Nov 7 12:54:13 EST 2010


Hi Yair,

Thank you for this. I have now recompiled the Spread Python bindings
against the new version of the library. The size of the spread.so file
changed, so I was hopeful. No luck, though: I still get the same problem.
Spread works fine when the Python daemon is disconnected, but fails to
admit a new Spread node into a group of two when the Python daemon is
connected and sending messages.

Can you think of any reason why having clients connected to Spread could
cause it to behave in this way?

My next step is to try to reproduce the problem with the smallest
possible Python script, one which just sends a few bytes of heartbeat
data every second.
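A minimal reproduction along those lines might look like the sketch below. It assumes the SpreadModule Python bindings (`spread.connect`, `Mailbox.multicast`, `Mailbox.disconnect`, `spread.RELIABLE_MESS`) and their argument order; the daemon address `4803@localhost`, the private name `hbtest` and the group name `heartbeat` are hypothetical placeholders. The send loop takes the mailbox and the sleep function as parameters, so it can be exercised with a fake mailbox and no running daemon.

```python
import time
from typing import Callable, List, Protocol


class Mailbox(Protocol):
    """The one mailbox method this sketch needs (assumed SpreadModule API)."""

    def multicast(self, service_type: int, group: str, message: bytes) -> int: ...


def heartbeat(
    mbox: Mailbox,
    group: str,
    service_type: int,
    count: int,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Multicast `count` tiny heartbeat messages to `group`, `interval` seconds apart.

    Returns the payloads that were sent so they can be logged or checked.
    """
    sent: List[bytes] = []
    for seq in range(count):
        payload = b"hb %d" % seq  # a few bytes of heartbeat data
        mbox.multicast(service_type, group, payload)
        sent.append(payload)
        if seq + 1 < count:  # no need to wait after the last message
            sleep(interval)
    return sent


def run_against_daemon(
    daemon: str = "4803@localhost", group: str = "heartbeat", count: int = 60
) -> None:
    """Connect to a Spread daemon and send `count` heartbeats (hypothetical defaults)."""
    import spread  # assumed: the SpreadModule Python bindings, imported lazily

    # Assumed argument order: daemon, private name, priority, membership.
    mbox = spread.connect(daemon, "hbtest", 0, 0)
    try:
        heartbeat(mbox, group, spread.RELIABLE_MESS, count)
    finally:
        mbox.disconnect()
```

Calling `run_against_daemon()` on one node while starting the third daemon would show whether this minimal client alone is enough to stop the new node from joining.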

I do have some real hardware (some old PowerEdge 1850s) in my basement,
which I can upgrade to FreeBSD 8.1. It is possible that the problem is
triggered by the combination of having a Python client connected *and*
running on a virtualised platform.

Getting the servers up and running will take a bit of time, though. First
I'll see if I can reproduce the issue with the simplest possible Python
test case.

I'll be in touch with my findings as soon as possible.

Thank you again.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +447791750420


On Sun, 2010-11-07 at 12:29 -0500, Yair Amir wrote:
> Dear Luke,
> 
> Thanks - this is very helpful. This confirmed my analysis from before.
> 
> The network membership looks good, so the form2 token should be sent
> to the correct address, but unfortunately that specific message is never received
> (in fact, the message is sent twice by 147, but neither copy is
> received by 102).
> 
> I don't see an easy way to diagnose this without digging into the network level
> because, from Spread's perspective, it seems to be doing its job correctly and
> just one specific message never makes it through, even though it is sent several times.
> And this repeats in exactly the same way each time.
> 
> So all in all, I don't think it is a higher-level bug. The next step would be
> to turn on NETWORK-level debug messages and see what the network layer of
> Spread is doing with that specific message. You do this the same way
> we turned on MEMBERSHIP debug messages: just add the word NETWORK
> before (or after) MEMBERSHIP in the spread.conf file.
> 
> If the network level of Spread does its job as I expect, the next place to look
> is the data link level of Spread, and beyond that it becomes an operating system / network
> card issue.
> 
> Before we dive into this: is there a way to have 3 computers natively running
> the exact same operating system, without virtualization?
> I know many people use Spread with virtualization successfully, as I do in some
> testing, but not with FreeBSD (I have Mac, Linux and Windows).
> It is ironic - Spread was originally developed on NetBSD.
> 
> Cheers,
> 
> 	:) Yair.
> 
> On 11/7/10 10:46 AM, Luke Marsden wrote:
> > Hi Yair,
> > 
> > Thank you so much for your time on this.
> > 
> > Here is the diff so you can check it:
> > https://github.com/hybridlogic/Spread-Yair-fix/commit/cc456dcaa073629634ce0019673324b54af71b4f
> > Also I had to do this to get it to compile:
> > https://github.com/hybridlogic/Spread-Yair-fix/commit/15649ddc00bc728204b324f63c13fe77fb15a33a
> > 
> > And here is the output for the first few seconds after starting the
> > third daemon:
> > 
> > http://lukemarsden.net/yair-debug/Screenshot-1.png
> > http://lukemarsden.net/yair-debug/Screenshot-2.png
> > http://lukemarsden.net/yair-debug/Screenshot-3.png
> > http://lukemarsden.net/yair-debug/Screenshot-4.png
> > 
> > (Ignore the *** GOT HERE ***, that was me.)
> > 
> > If you wish to make any code changes, you can fork the repo at
> > https://github.com/hybridlogic/Spread-Yair-fix to your own GitHub
> > account, commit the changes and issue a pull request, then I can merge
> > and test very quickly.
> > 
> > Alternatively just send me line numbers and code and I'll apply the
> > changes manually, whatever's quicker for you :-)
> > 
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users




