[Spread-users] Spread 4
Serge Aleynikov
serge at hq.idt.net
Sun Jul 3 13:10:54 EDT 2005
Yair,
Since you are addressing the robustness issues, I'd like to make a
couple of improvement suggestions.
In one project I needed to integrate Spread communications in the form
of a shared library: the client would load the *.so library and handle
all message passing through abstracted API calls, so that the client
wasn't aware of the underlying implementation of the messaging layer.
One of the limitations I found was that the Spread API didn't allow the
client to control the handling of SIGPIPE in the socket send() call
(MSG_NOSIGNAL). As a result, when there is a problem with the network
connection to the Spread daemon and the client attempts a send() call,
it gets shut down by the OS due to an unhandled SIGPIPE signal. When the
Spread connection is managed in a client that is an executable program,
the program can install a signal handler using signal(), but in my case
I was embedding Spread communications in a shared library, where it
would be improper to alter the signal handling of the process that
loaded the library.
I believe it would be beneficial if you could revise this logic.
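To illustrate the kind of change meant here, a minimal sketch follows. It
assumes a platform whose send() supports MSG_NOSIGNAL (Linux does); the
helper name send_all_nosigpipe is hypothetical and is not part of the
Spread API. With the flag set, a write to a broken connection fails with
EPIPE instead of raising SIGPIPE, so the library never has to touch the
host process's signal handlers.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Where MSG_NOSIGNAL is missing, this fallback gives NO protection;
 * such platforms need another mechanism (see the note below). */
#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0
#endif

/* Hypothetical helper: send the whole buffer without ever raising SIGPIPE.
 * Returns len on success, or -1 with errno set (EPIPE if the peer is gone). */
ssize_t send_all_nosigpipe(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = send(fd, p, left, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            return -1;              /* caller inspects errno */
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

The caller can then treat a -1 return with errno == EPIPE as a lost
connection to the daemon and start its own recovery. On BSD-derived
systems the SO_NOSIGPIPE socket option plays a similar role, set once per
socket rather than per call.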
Another feature that might be useful: the determination of network
partitioning between Spread daemons could be done via several
connectivity paths, including a serial interface. If a daemon detects
network partitioning (where it cannot reach a group of preconfigured
IPs), it should disconnect all clients connected to it. This way, if
clients have some failover logic, they would be able to change roles
gracefully without ending up with two live masters. For example, a
Spread group FOO defined in two Spread daemons initially has two
members, BAR1 and BAR2; after network partitioning, the first daemon
states that FOO has the member BAR1 and the second daemon states that
FOO has the member BAR2. I think that in this example the daemons must
have a good strategy for letting connected clients know how to deal with
the situation. As an example, the linux-ha.org project has some good
ways of dealing with network partitioning gracefully.
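A rough sketch of the decision I have in mind, with everything in it an
assumption for illustration: the function names, the reachability table,
and the min_reachable threshold are all hypothetical and not part of
Spread. A peer counts as reachable if any one connectivity path (network
or serial) gets through, and the daemon drops its clients only when too
few of the preconfigured peers are reachable.

```c
#include <stdbool.h>
#include <stddef.h>

/* reach[p * n_peers + i] is true when preconfigured peer i answered
 * over connectivity path p (e.g. path 0 = network, path 1 = serial). */
size_t count_reachable_peers(const bool *reach, size_t n_paths, size_t n_peers)
{
    size_t count = 0;

    for (size_t i = 0; i < n_peers; i++) {
        for (size_t p = 0; p < n_paths; p++) {
            if (reach[p * n_peers + i]) {
                count++;            /* one working path is enough */
                break;
            }
        }
    }
    return count;
}

/* Hypothetical policy: disconnect all clients when fewer than
 * min_reachable preconfigured peers can be reached over any path. */
bool should_disconnect_clients(const bool *reach, size_t n_paths,
                               size_t n_peers, size_t min_reachable)
{
    return count_reachable_peers(reach, n_paths, n_peers) < min_reachable;
}
```

With two paths and three peers, a daemon that still reaches one peer over
the serial line alone would keep its clients under a threshold of 1, and
would disconnect them only once every path to every peer has failed.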
Regards,
Serge
Yair Amir wrote:
> Hi Gavin,
>
> > Neil Conway and I are busily hacking on spread to see if we can increase
> > its robustness and usability. There is some discussion in the source and
> > in the mailing list archives of spread 4. Is spread 4 being actively
> > worked upon? If so, is the source publicly available?
>
> The next version is actively worked on and most of it is almost ready
> for final testing. So, it is now an especially good time to raise
> robustness issues. There are a few issues in the new version that still
> need attention and some of us were bogged down by a lot of other work,
> vacation, and even moving.
>
> I am of the attitude that code had better be ready before it is dropped
> on people, wasting their valuable time, and Spread 3.17.3 is doing its
> job reasonably well I think, so I see no point in rushing code out
> before it's ready. In any case, we at Spread Concepts need some of the
> new features real quick so it can't be too long before it's out :)
>
> The new version has considerable changes from Spread 3, changing
> the way virtual synchrony sets are reported in membership messages.
> This is done for a good reason - to support stronger semantics, which
> comes for free, for membership VS sets (instead of who-came-with-us, we
> can now know who-came-with-whom, which happens to be very useful in the
> context of fast synchronization as well as for some security
> algorithms). Of course, the old semantics is subsumed by the new
> semantics so programs that relied on the old semantics have what they
> need.
>
> In general, though, we hope people will not have a hard time migrating
> to this version if they so choose.
>
> Other important changes:
>
> - Ability to dynamically add and remove machines from the configuration,
>   with deep optimization for the common cases everyone has been
>   screaming about for a few years now. We at Spread Concepts
>   experimented with a few different solutions over the years in custom
>   settings but were not happy with any of them, so we preferred to leave
>   it as it is instead of creating more problems by allowing
>   semi-solutions. We finally were happy with a new approach we came up
>   with a few months ago and this is part of the new version.
>
> - Inclusion of the Flush layer (virtual synchrony) as an integral part
>   of Spread. Flush was written for our security research many years ago
>   and ended up encapsulating extremely deep research in John Schultz'
>   thesis. We find this semantics useful for some uses; Flush is
>   extremely robust and is thoroughly tested as part of Secure Spread, so
>   it is time for it to be adopted.
>
> There are other changes but the message is becoming too long so this
> will do for now.
>
> Cheers,
>
> :) Yair.