[Spread-users] spread daemon rejects connections from a disconnected client
Jonathan Stanton
jonathan at cnds.jhu.edu
Tue Nov 16 10:33:24 EST 2004
It's a bit of a long shot, but I saw a problem like this a few years
ago. Do you happen to have a cron job that cleans out /tmp once a week
or so?
I can't think of anything Spread itself does that would cause the unix
socket to disappear, so I'd check for any outside program that has write
access to the directory the Spread socket is in.
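A minimal POSIX C sketch (not Spread code; the path /tmp/spread-unlink-demo and every function name here are made up for illustration) reproduces the symptom reported below: it binds and listens on a Unix socket, unlinks the path the way a /tmp cleaner would, and then records what a fresh client gets.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for a filesystem path (always NUL-terminated). */
static void make_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Try to connect to the socket at path; return 0 on success, else errno. */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    make_addr(&addr, path);
    int err = (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) ? 0 : errno;
    close(fd);
    return err;
}

struct unlink_demo {
    int before;        /* connect() result while the path exists */
    int after;         /* connect() result after the path is unlinked */
    int listener_open; /* 1 if the listening descriptor is still valid */
};

/* Bind a listener at path, unlink the path (as a /tmp cleaner would),
 * and record what a new client sees. Returns -1 on setup failure. */
static int run_unlink_demo(const char *path, struct unlink_demo *out)
{
    struct sockaddr_un addr;
    int listener = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    unlink(path);                    /* remove any stale file from a previous run */
    make_addr(&addr, path);
    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listener, 5) < 0) {
        close(listener);
        return -1;
    }
    out->before = try_connect(path); /* queued in the backlog: succeeds */
    unlink(path);                    /* the hypothetical "cron job" */
    out->after = try_connect(path);  /* the name is gone: ENOENT */
    out->listener_open = fcntl(listener, F_GETFD) != -1;
    close(listener);
    return 0;
}
```

Unlinking removes only the filesystem name. The listening descriptor stays open, which is why lsof still lists it, but a new connect() by path fails with ENOENT before it ever reaches the daemon, so the daemon has no way to notice.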
I remember someone changing the default location of the unix socket
from /tmp to a subdirectory of /var/run, to get better security control
over it and to match some distributions' standard file locations.
Currently I think that requires patching the source and rebuilding both
the daemon and the libraries, as they have to agree on the location to
look for the socket.
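Why the daemon and the libraries must agree can be sketched as follows, assuming (hypothetically) that both build the path as <directory>/<port> from a shared compile-time constant, as the /tmp/4803 path in this thread suggests. SPREAD_SOCK_DIR and spread_sock_path are illustrative names, not Spread's actual build options or functions.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/un.h>

/* Hypothetical build-time setting. A packager would have to pass the same
 * -DSPREAD_SOCK_DIR=... to BOTH the daemon and the client library builds. */
#ifndef SPREAD_SOCK_DIR
#define SPREAD_SOCK_DIR "/tmp"
#endif

/* Build "<dir>/<port>" (e.g. "/tmp/4803") into buf.
 * Returns 0 on success, -1 if the result was truncated or would not
 * fit in a Unix socket address. */
static int spread_sock_path(char *buf, size_t len, int port)
{
    const size_t max_path = sizeof(((struct sockaddr_un *)0)->sun_path);
    int n = snprintf(buf, len, "%s/%d", SPREAD_SOCK_DIR, port);
    if (n < 0 || (size_t)n >= len || (size_t)n >= max_path)
        return -1;
    return 0;
}
```

If the daemon were rebuilt with -DSPREAD_SOCK_DIR='"/var/run/spread"' while clients still used the default, the clients would look for /tmp/4803 while the daemon listens elsewhere.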
Cheers,
Jonathan
On Tue, Nov 16, 2004 at 04:41:15PM +0200, Shlomi Yaakobovich wrote:
> Hi all,
>
> Well, this happened again, and this time I was able to do some more debugging.
>
> It seems that the unix socket opened at /tmp/4803, which is used to receive connections from local clients, has disappeared! It is not in the listing of /tmp... However, when I ran lsof on the spread daemon, it still showed a file descriptor pointing to this location. So clients were unable to connect because the socket was missing (for them), but the daemon thought everything was OK (for it).
>
> I can add a monitoring service to check for this, but it is strange behavior.
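One possible shape for such a monitor, sketched in C under the assumption that the daemon's socket lives at /tmp/4803 as in this report: stat() the path and confirm it still exists and is a socket. check_spread_socket and its result codes are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical health-check result codes for an external monitor. */
enum sock_status {
    SOCK_OK = 0,      /* path exists and is a Unix-domain socket */
    SOCK_MISSING,     /* path is gone (e.g. removed by a /tmp cleaner) */
    SOCK_NOT_SOCKET,  /* something other than a socket occupies the path */
    SOCK_STAT_ERROR   /* stat() failed for another reason; see errno */
};

/* Check whether the daemon's socket file is still present. */
static enum sock_status check_spread_socket(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return errno == ENOENT ? SOCK_MISSING : SOCK_STAT_ERROR;
    return S_ISSOCK(st.st_mode) ? SOCK_OK : SOCK_NOT_SOCKET;
}
```

A monitor seeing SOCK_MISSING could alert or restart the daemon (a restart cleared the problem in this case). A stat() check alone does not prove the daemon is accepting connections; a stricter probe would also attempt a connect().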
>
> Any ideas?
>
> Shlomi
>
> > -----Original Message-----
> > From: spread-users-admin at lists.spread.org
> > [mailto:spread-users-admin at lists.spread.org]On Behalf Of
> > Shlomi Yaakobovich
> > Sent: Monday, November 15, 2004 8:28 AM
> > To: Ryan Caudy
> > Cc: spread-users at lists.spread.org
> > Subject: RE: [Spread-users] spread daemon rejects connections
> > from a disconnected client
> >
> >
> > Hi,
> >
> > > Well, I certainly think that Spread should be checking return codes a
> > > bit more robustly. I've only looked at 3.17.3, but it's doing a
> > > non-blocking recv call, and not making sure that the full length it
> > > expected is read out. 0 is certainly a legal return, that should be
> > > checked for, but it usually means that the socket was closed from the
> > > other end.
> >
> > Yes, I believe that this is worth a fix in 3.17.3 anyway;
> > there's no point checking the version buffer if no bytes have
> > been read...
> >
> > > Do you know what was going on in the client library?
> >
> > I do not have information about the client's actions, but the
> > client normally works by calling SP_connect_timeout and checking
> > its return code. There might have been an error connecting, but I
> > don't know why it persisted, or why restarting the Spread daemon
> > resolved it. It is quite a rare problem, and we've already
> > upgraded to 3.17.3, so we'll see if this happens again. If it
> > does, I'll try to look at the client as well.
> >
> > Shlomi
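Ryan's point about the non-blocking recv call can be illustrated with a short C sketch of a read loop that tells a closed peer (recv() returning 0), a short read, and a would-block condition apart before any bytes are parsed. recv_exact and its result codes are hypothetical names; this is not the code in Spread 3.17.3 or its eventual fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes; not part of Spread's API. */
enum recv_result {
    RECV_OK,         /* all len bytes were read */
    RECV_CLOSED,     /* recv() returned 0: the peer closed the connection */
    RECV_WOULDBLOCK, /* non-blocking socket has no more data yet */
    RECV_ERROR       /* any other recv() failure; see errno */
};

/* Read up to len bytes into buf, continuing after short reads.
 * *got reports how many bytes were actually stored, so the caller can
 * tell a partial message from a complete one before parsing it. */
static enum recv_result recv_exact(int fd, void *buf, size_t len, size_t *got)
{
    char *p = buf;
    *got = 0;
    while (*got < len) {
        ssize_t n = recv(fd, p + *got, len - *got, 0);
        if (n > 0) {
            *got += (size_t)n;
        } else if (n == 0) {
            return RECV_CLOSED;       /* do not parse what was not read */
        } else if (errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return RECV_WOULDBLOCK;   /* caller must wait for more data */
        } else {
            return RECV_ERROR;
        }
    }
    return RECV_OK;
}
```

Returning RECV_WOULDBLOCK together with the partial count lets an event loop keep the bytes read so far and resume when the socket is readable again, instead of checking a version buffer against incomplete data.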
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
> >
>
--
-------------------------------------------------------
Jonathan R. Stanton jonathan at cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University
-------------------------------------------------------