[Spread-users] spread daemon rejects connections from a disconnected client

Shlomi Yaakobovich Shlomi at exanet.com
Tue Nov 16 09:41:15 EST 2004


Hi all,

Well, this happened again, and this time I was able to do some more debugging. 

It seems that the Unix socket created at /tmp/4803, which is used to receive connections from local clients, has disappeared! When you look at the contents of /tmp, it is not there. However, when I ran lsof on the spread daemon, it still showed a file descriptor pointing to this location. So clients were unable to connect because the socket file was missing (for them), but the daemon thought everything was OK (for it).
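
To illustrate the situation described above, here is a minimal sketch in Python (not Spread's own code). It assumes the socket file was simply unlinked while the listener stayed open. The function name and the temporary directory standing in for /tmp/4803 are hypothetical.

```python
import os
import socket
import tempfile


def demo_unlinked_socket():
    """Return (fd_still_open, client_error_name) after the socket path is removed."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "4803")  # hypothetical stand-in for /tmp/4803

    # The "daemon": bind and listen on a Unix domain socket.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)

    # Simulate some other process removing the file from the directory.
    os.unlink(path)

    # The listener still holds a valid descriptor, so nothing looks wrong to it.
    fd_still_open = server.fileno() >= 0

    # A new client looks the socket up by path, which no longer exists.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    error_name = None
    try:
        client.connect(path)
    except OSError as exc:
        error_name = type(exc).__name__
    finally:
        client.close()
        server.close()
        os.rmdir(tmpdir)
    return fd_still_open, error_name
```

The listener keeps its descriptor because unlinking only removes the directory entry, while a client's connect fails because it resolves the socket by path.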

I can add a monitoring service to check for this, but it is strange behavior.
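
A minimal sketch of such a check in Python, assuming it is enough to verify that the path still exists and is a socket. The helper name is hypothetical, and /tmp/4803 is the path from this report. A stricter check could also attempt an actual connection.

```python
import os
import stat


def socket_path_ok(path):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or inaccessible path: clients would not be able to connect.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    # Path taken from this report; adjust for the port the daemon uses.
    if not socket_path_ok("/tmp/4803"):
        print("spread socket file is missing or is not a socket")
```

A monitor could run this periodically and restart the daemon or raise an alert when it returns False.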

Any ideas?

Shlomi

> -----Original Message-----
> From: spread-users-admin at lists.spread.org 
> [mailto:spread-users-admin at lists.spread.org]On Behalf Of 
> Shlomi Yaakobovich
> Sent: Monday, November 15, 2004 8:28 AM
> To: Ryan Caudy
> Cc: spread-users at lists.spread.org
> Subject: RE: [Spread-users] spread daemon rejects connections 
> from a disconnected client
> 
> 
> Hi,
> 
> > Well, I certainly think that Spread should be checking 
> return codes a
> > bit more robustly.  I've only looked at 3.17.3, but it's doing a
> > non-blocking recv call, and not making sure that the full length it
> > expected is read out.  0 is certainly a legal return, that should be
> > checked for, but it usually means that the socket was 
> closed from the
> 
> Yes, I believe that this is worth a fix in 3.17.3 anyway, 
> there's no point checking the version buffer if no bytes have 
> been read...
> 
> > other end.  Do you know what was going on in the client library?
> 
> I do not have information about the client's actions, but the 
> normal way that it works with the client is by calling 
> SP_connect_timeout and checking its return code. There might 
> have been an error connecting, but I don't know why it 
> persisted, or why it was resolved by restarting the spread 
> daemon. It is quite a rare problem, and we've already 
> upgraded to 3.17.3, so we'll see if this happens again. If it 
> does, I'll try to look at the client as well.
>  
> Shlomi
> 
> _______________________________________________
> Spread-users mailing list
> Spread-users at lists.spread.org
> http://lists.spread.org/mailman/listinfo/spread-users
> 



