[Spread-users] spread daemon rejects connections from a disconnected client

Jonathan Stanton jonathan at cnds.jhu.edu
Tue Nov 16 10:33:24 EST 2004


It's a bit of a long shot, but I saw a problem like this a few years 
ago. Do you happen to have a cron job that cleans out /tmp once a week 
or so?

I'm not sure what Spread itself could do that would cause the unix socket 
to disappear, so I'd check for any outside program that has write access 
to the directory the spread socket is in. 
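
As a rough illustration of the kind of check a monitoring job could run, 
here is a minimal Python sketch. It is not part of Spread; the function 
name unix_socket_present is made up for this example, and /tmp/4803 is 
simply the path reported further down in this thread. It only tests 
whether the path still exists on disk as a unix socket, which is exactly 
what goes wrong when something unlinks the file while the daemon keeps 
its descriptor open.

```python
import os
import stat


def unix_socket_present(path):
    """Return True if `path` exists on disk and is a Unix domain socket.

    Hypothetical helper for illustration only; not a Spread API.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # The name is gone from the directory, even if a daemon
        # still holds an open descriptor for the old socket.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    # Path taken from the report in this thread; adjust for your setup.
    path = "/tmp/4803"
    state = "present" if unix_socket_present(path) else "MISSING"
    print(path, state)
```

Run from cron or a monitoring agent, a MISSING result while the daemon 
process is still alive would point at an outside program removing the 
file rather than at the daemon exiting.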

I remember someone changing the default location of the unix socket 
from /tmp to a subdirectory of /var/run to get better security control 
over it and to match some distributions' standard file locations. 
Currently I think that requires patching the source and rebuilding the 
daemon and libraries, as they both have to agree on the location to 
look for the socket. 

Cheers,

Jonathan

On Tue, Nov 16, 2004 at 04:41:15PM +0200, Shlomi Yaakobovich wrote:
> Hi all,
> 
> Well, this happened again, and this time I was able to do some more debugging. 
> 
> It seems that the unix socket that is opened in /tmp/4803, which is used to receive connections from the local client, has disappeared! When you look at the contents of /tmp it is not there... However, when I ran lsof on the spread daemon, it still showed a file descriptor pointing to this location. So, clients were unable to connect because the socket was missing (for them), but the daemon thought everything was OK (for it).
> 
> I can add a monitoring service to check this, but it is strange behavior. 
> 
> Any ideas?
> 
> Shlomi
> 
> > -----Original Message-----
> > From: spread-users-admin at lists.spread.org 
> > [mailto:spread-users-admin at lists.spread.org]On Behalf Of 
> > Shlomi Yaakobovich
> > Sent: Monday, November 15, 2004 8:28 AM
> > To: Ryan Caudy
> > Cc: spread-users at lists.spread.org
> > Subject: RE: [Spread-users] spread daemon rejects connections 
> > from a disconnected client
> > 
> > 
> > Hi,
> > 
> > > Well, I certainly think that Spread should be checking return codes a
> > > bit more robustly.  I've only looked at 3.17.3, but it's doing a
> > > non-blocking recv call, and not making sure that the full length it
> > > expected is read out.  0 is certainly a legal return, that should be
> > > checked for, but it usually means that the socket was closed from the
> > 
> > Yes, I believe that this is worth a fix in 3.17.3 anyway, 
> > there's no point checking the version buffer if no bytes have 
> > been read...
> > 
> > > other end.  Do you know what was going on in the client library?
> > 
> > I do not have information about the client's actions, but the 
> > normal way that it works with the client is by calling 
> > SP_connect_timeout and checking its return code. There might 
> > have been an error connecting, but I don't know why it 
> > persisted, or why restarting the spread daemon resolved it. 
> > It is quite a rare problem, and we've already 
> > upgraded to 3.17.3, so we'll see if this happens again. If it 
> > does, I'll try to look at the client as well.
> >  
> > Shlomi
> > 
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
> > 
> 

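On the recv point quoted above (a return of 0 is legal and means the 
other end closed, and a single call may return fewer bytes than 
expected), the usual pattern looks roughly like the sketch below. This 
is a generic Python illustration for a blocking socket, not the Spread 
daemon's actual code; recv_exact is a made-up name. A non-blocking 
socket, as used in the daemon, would additionally have to handle the 
"no data yet" case (BlockingIOError in Python) and retry later.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from a blocking socket.

    Raises ConnectionError if the peer closes before n bytes arrive.
    Hypothetical helper for illustration only; not a Spread API.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # recv returned zero bytes: the other end closed the socket,
            # so whatever buffer we expected must not be interpreted.
            raise ConnectionError(
                f"peer closed with {remaining} of {n} bytes unread"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The point is that a version buffer (or any fixed-length header) is only 
examined after the full length has actually been read.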
-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------