On 08/12/2011 03:38 PM, Adam Grossman wrote:
> hello,
> i am using version 4.1.0 on fedora 15. i followed these steps:
> Machine A & B: Created a spread config with one segment, machines A and
> B in the segment
> Machine A & B: Ran a program which joins a group (both machines join the
> same group) and then, in an infinite loop, sends out a message saying
> "hello from <machine name>", reads in any incoming messages, and
> sleeps for 1 second.
> Machine A & B: Programs receive all messages
> Machine A: Remove Machine B from the configuration, and use spmonitor to
> reload the configuration
> Machine A: core dumps with the error: "G_compare_proc_ids_by_conf:
> Assertion `ia > -1' failed"
> this only happens under these exact conditions. is this a bug, or am i
> handling this incorrectly? i was hoping that any messages still coming
> in from the removed daemons would just be ignored. if i am handling
> this incorrectly, is there any way to resolve this? since spread is
> single-threaded, i can't see it as a race condition or anything that
> some simple semaphores/mutexes would solve.

i have researched this a bit further. the problem seems to be that when
spread tries to send a message to the daemon that has been removed, the
send fails, so it tries to remove that daemon (by calling
G_remove_daemon). that is where the assertion fails, because the daemon
is no longer in the config struct.

There is evidently some structure that is not cleared out when a config 
is reloaded with removed daemons.  i think the solution would be:
1. when the config is reloaded, compare the old and new configs and
remove any daemons that are missing from the new one from the other
settings
2. if G_remove_daemon does not find a daemon in the config, remove it
from the other settings.
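
to make the two ideas above concrete, here is a minimal sketch in C. it
is not taken from the spread source: daemon_id, find_daemon,
prune_removed and remove_daemon_tolerant are all hypothetical names, and
the plain array "state" only stands in for whatever structure is left
stale after the reload. prune_removed corresponds to idea 1, and
remove_daemon_tolerant to idea 2, where a missing entry is treated as
"nothing to do" instead of tripping an assertion.

```c
#include <stddef.h>

/* Hypothetical stand-in for a daemon identifier; not a Spread type. */
typedef int daemon_id;

/* Return the index of id in ids[0..n), or -1 if it is not present. */
static int find_daemon(const daemon_id *ids, size_t n, daemon_id id)
{
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id)
            return (int)i;
    return -1;
}

/* Idea 1: after a reload, drop from state[] every daemon that is not
 * listed in new_conf[]. Compacts state[] in place, preserving order,
 * and returns the new number of entries. */
static size_t prune_removed(daemon_id *state, size_t n_state,
                            const daemon_id *new_conf, size_t n_new)
{
    size_t kept = 0;
    for (size_t i = 0; i < n_state; i++)
        if (find_daemon(new_conf, n_new, state[i]) >= 0)
            state[kept++] = state[i];
    return kept;
}

/* Idea 2: remove id from state[] without consulting the config.
 * If id is not in state[], this is a no-op rather than an assertion
 * failure. Returns the new number of entries. */
static size_t remove_daemon_tolerant(daemon_id *state, size_t n_state,
                                     daemon_id id)
{
    int at = find_daemon(state, n_state, id);
    if (at < 0)
        return n_state;             /* already gone: nothing to do */
    for (size_t i = (size_t)at; i + 1 < n_state; i++)
        state[i] = state[i + 1];    /* shift the tail down by one */
    return n_state - 1;
}
```

in this sketch, calling prune_removed right after the reload would make
the later removal unnecessary, and remove_daemon_tolerant is only a
safety net for anything that was missed. where the real equivalents
would have to hook into spread is exactly the part that is unknown here.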

the issue is, i do not know what those other settings are, or how to
remove the daemons from them...

thank you,
-=- adam grossman
