[Spread-users] problem re-using private names with 3.17.0
jonathan at cnds.jhu.edu
Mon Oct 28 22:50:01 EST 2002
I would like to clarify one aspect of the connect/disconnect that may
illuminate the problem.
In short: A SP_connect does NOT require any SAFE/AGREED or other ordered
messages, it is a purely one-to-one client-server operation. And after the
SP_connection completes, only the daemon who handled it knows about the
client. SP_disconnect DOES require a SAFE message to be exchanged to all
of the daemons in order to clean out all knowledge of the 'disconnected'
client.
In the long:
When a client 'connects' to a daemon, as soon as the daemon gets the client
socket from accept, it does a connection initiation protocol that
exchanges some information between client and server such as version
number, client user name, authentication protocol, etc. When that
client-server data exchange is complete, the connection is "established"
and the SP_connect call returns success. The key point here is that NO
ordered message (Agreed, or SAFE) is exchanged by the daemons. Actually NO
communication at all is required among the daemons when a SP_connect is
handled. It is purely a client-server operation.
However, when a SP_disconnect occurs, it is required that an ordered
message be sent among all of the daemons as SAFE, so that they all remove
the knowledge of that client connection from any groups it is in at the
same 'virtual' time. The SP_disconnect call does not wait for that message
to be exchanged before it returns to the user application. Because the
client can always just close the socket, a SP_disconnect is essentially
the same as a closed socket.
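
As a rough illustration of the asymmetry described above, here is a toy
model in Python. It is only a sketch: the class ToyDaemon and its methods
are made up for this example and are not part of Spread or its API. It
models a single daemon where connect is a purely local check, while
disconnect only queues the cleanup until the SAFE message is delivered.

```python
class ToyDaemon:
    """Toy model of one daemon's view of private names (not Spread code)."""

    def __init__(self):
        self.names = set()        # private names this daemon considers in use
        self.pending_safe = []    # disconnects still waiting for SAFE delivery

    def connect(self, name):
        # Purely local: no message is exchanged with other daemons.
        if name in self.names:
            return False          # rejected, the name looks like a duplicate
        self.names.add(name)
        return True

    def disconnect(self, name):
        # Returns at once; the client state is NOT removed yet.
        self.pending_safe.append(name)

    def deliver_safe(self):
        # Stands in for the moment the ordered SAFE message is delivered.
        for name in self.pending_safe:
            self.names.discard(name)
        self.pending_safe.clear()


def demo():
    d = ToyDaemon()
    results = [d.connect("worker")]      # first connect succeeds
    d.disconnect("worker")               # returns immediately
    results.append(d.connect("worker"))  # refused: SAFE not delivered yet
    d.deliver_safe()
    results.append(d.connect("worker"))  # succeeds after delivery
    return results
```

In this model the second connect is refused only because the cleanup is
still pending, which is the window discussed next.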
So the basic issue is that a client does SP_disconnect, and the call
returns "immediately" while the daemon sends a SAFE message and has to
wait for the message to be ordered and 'safe' before acting on it and
removing the state of the client. While the daemon is waiting the client
does a SP_connect using the same name. This connection might be refused
because the SAFE message has not been delivered yet and so it appears to
be a duplicate client. In a very 'short' time the SAFE message will be
delivered and the client will be able to connect.
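
One way a client could cope with this short window is to retry the
connect a few times with a small pause. The Python sketch below shows
that idea only; connect_with_retry, the NameNotUnique exception, and the
connect callable passed in are all hypothetical names for this example
and do not correspond to any particular Spread binding. The attempt count
and delay are arbitrary assumptions.

```python
import time


class NameNotUnique(Exception):
    """Hypothetical error: the daemon refused a duplicate private name."""


def connect_with_retry(connect, name, attempts=10, delay=0.1, sleep=time.sleep):
    """Call connect(name), retrying while the name is still seen as in use.

    `connect` is any callable supplied by the caller (an assumption of this
    sketch) that raises NameNotUnique when the name is rejected.
    """
    for attempt in range(attempts):
        try:
            return connect(name)
        except NameNotUnique:
            if attempt == attempts - 1:
                raise             # give up after the last attempt
            sleep(delay)          # wait for the SAFE message to be delivered
```

If the rejections continue well past the time a SAFE message should take,
the final exception surfaces the problem instead of looping forever.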
This is known behavior, and as the messages from the Python people note,
they have dealt with this by retrying the connect. However, in the current
version 3.17.0, they see that the connect fails for a long time, much
longer than the time a SAFE message takes. I'm working on duplicating that
and seeing if I can figure it out. But the basic 'short' delay is not a
bug.
With regard to joining and leaving groups, as Yair said, they are sent as
AGREED messages and will correctly be ordered with regard to SAFE
messages.
On Mon, Oct 28, 2002 at 03:05:53PM -0500, Yair Amir wrote:
> Hi Sean,
> I am not sure I understand your message.
> Assuming you mean "joining the group" when you write "logging into the
> group" then we currently are not using the same service for
> connecting to Spread and for joining/leaving a group.
> It is sufficient to use an AGREED message for the join and leave
> events. Until a version or two ago we used the SAFE service for this
> but this was not necessary. AGREED is much faster (about 4 times...)
> and is absolutely correct to use in this implementation of Spread.
> In terms of the connect/disconnect, we currently are using the
> SAFE service, which is absolutely correct to do and should provide
> more information in terms of when you disconnected in terms of the
> stream of messages. Note that the safe and agreed services provide
> exactly the same ordering guarantee so there is no issue in using
> connect/disconnect as safe while join/leave as agreed.
> :) Yair.
> >> If you disconnect and immediately connect with the same name, it is
> >> very likely to get the not-unique rejection and this is perfectly
> >> normal and is the intended outcome (and not a bug).
> >> Upon disconnection, Spread passes a safe message to let everyone
> >> knows that this guy disconnected and only upon delivery of this
> >> message it actually clears this guy out. In the meantime, connecting
> >> with the same name will be rejected. Usually this should take a
> >> small amount of time, very likely less than a second in local area
> >> networks.
> Sean> Does it not use a safe message for logging into the group? I'd think
> Sean> with ordered delivery that this wouldn't be an issue. What's the
> Sean> message type used on a login attempt? If the message type is anything
> Sean> but the same as the disconnection message... well... that behavior
> Sean> doesn't make sense to me. In what cases/situations would this be
> Sean> desired? -sc
Jonathan R. Stanton jonathan at cs.jhu.edu
Dept. of Computer Science
Johns Hopkins University