[Spread-users] after fork: session 'xxx' trying to make session 'yyy' do something

Yair Amir yairamir at cnds.jhu.edu
Tue Oct 1 07:53:39 EDT 2002


I am missing some information. For example, I need to see the EXACT
SP_multicast call you make (including all of its parameters).

             :) Yair.
             
James> Hi,

James> I'm a Spread beginner.  We're using Spread 3.16.2 in a system which
James> distributes work from a shared queue to a number of distributed
James> compute servers.  The servers each join spread groups with the same
James> name as the queue, and I use the group ordering to designate one as
James> the "leader".  The leader is responsible for monitoring the queue
James> and dispatching jobs to the remaining servers.

James> When a job is dispatched, the server which is running it forks a
James> new process in which to execute it.  The FD for the spread connection
James> is (of course) inherited across the fork.  The child needs its own
James> connection to the spread daemon, so it closes the parent's mbox
James> (with close(2)), and then calls SP_connect to reconnect.
James> Schematically:

James>    parent_mbox = SP_connect("xxx");
James>    SP_join(parent_mbox, "queue");
James>    while ( ... ) {
James>      SP_receive();
James>      if ( fork() == 0 ) {
James>         // child process for a new job
James>         close(parent_mbox);
James>         child_mbox = SP_connect("yyy");
James>         SP_join(child_mbox, "queue");
James>         SP_multicast();
James>      }
James>    }

James> If parent_mbox and child_mbox happen to be assigned the same (numerical)
James> FD, then the spread daemon produces the error message quoted in the
James> subject line, where "xxx" and "yyy" are the private names of the parent
James> and child, respectively, and the child's first attempt to SP_multicast()
James> dies with error -8: "connection closed by spread".  I can work around
James> the bug by doing a dummy open() or pipe() call between the close()
James> and the SP_connect() in the child.

James> After a quick examination of sp.c, it appears that I don't want to do
James> an SP_disconnect() in the child, because that actually updates the
James> membership at the daemon and the parent is not actually disconnecting.
James> The internal function SP_kill() looks like it does what I want (close
James> the mbox and update the client-side session table), but it's static and
James> undocumented.

James> Suggestions? Any help would be appreciated.

James> Thanks, Jim




