[Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess

John Lane Schultz jschultz at spreadconcepts.com
Wed Aug 1 10:52:54 EDT 2018


Thanks for the report Martin.

I’ll think about this and look into the code.

Thanks,
John

On Jul 26, 2018, at 11:04 AM, Martin Schu <martin.sc11111 at gmail.com> wrote:

Hi John, hi all,

If an already running daemon is triggered to reload spread.conf at runtime, it can crash if the number of daemons has changed. The internal VirtualID table can get confused: apparently Spread fetches the wrong auto-generated virtual ID if a daemon is removed from or added to the middle of the table at runtime.

The following steps lead to the problem:
- Spread version 5.0.1.
- We do not configure any VirtualIDs; they are all auto-generated.
- All daemons are running.
- One segment containing one daemon is removed from the middle of spread.conf.
- The new spread.conf is distributed to all hosts.
- Spread is triggered to reload spread.conf via spmonitor ('r').
- One Spread daemon aborts with the failure shown below. It is the daemon that follows the removed one in the list.

Here, a virtual ID conflict is logged:
2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for daemon 'host-06a'
2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is already in use by 'host-52'!  You will probably need to explicitly reconfigure the daemons' virtual IDs so that they don't conflict.

One of the Spread daemons complaining after the reload:
2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id: 3524380600, addr: 1.2.3.5, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to two different old daemons: name ' host-06a' -> 0x7f57eccd8800, id '2876338096' -> 0x7f57eccd8208! Partitioning to singleton!
2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1

Another Spread daemon crashing after the reload because of the virtual ID mix-up:
2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
2018-07-25 14:43:04 GMT     Old: name 'host-06a', addr [1.2.3.7]:7654, id '1696126475', num_ifs 1
2018-07-25 14:43:04 GMT     New: name 'host-06a', addr [1.2.3.7]:7654, id '2876338096', num_ifs 1
Exit caused by Alarm!

Workaround:
- Explicitly configuring a unique VirtualID for each daemon in spread.conf solves the problem, because explicitly configured VirtualIDs are always reloaded along with the rest of the configuration.
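Because the workaround relies on every daemon carrying its own explicit, nonzero VirtualID (a zero ID is what triggers auto-generation), it may be worth checking the ID list before distributing spread.conf to all hosts. A minimal sketch, assuming you have already extracted (daemon name, VirtualID) pairs from the file; it does not parse spread.conf itself, the function name check_virtual_ids is hypothetical, and the warning text only mimics the "already in use" log line quoted above.

```python
def check_virtual_ids(assignments: list[tuple[str, int]]) -> list[str]:
    """Return one warning per daemon whose VirtualID is unset (0) or already taken.

    `assignments` is a hypothetical list of (daemon name, VirtualID) pairs,
    in the order the daemons appear in spread.conf.
    """
    owner: dict[int, str] = {}  # VirtualID -> first daemon that claimed it
    warnings: list[str] = []
    for name, vid in assignments:
        if vid == 0:
            # Zero means "auto-generate", which is exactly what the workaround avoids.
            warnings.append(f"'{name}' has no explicit VirtualID (0 means auto-generate)")
        elif vid in owner:
            warnings.append(
                f"The virtual ID '{vid}' of '{name}' is already in use by '{owner[vid]}'!"
            )
        else:
            owner[vid] = name
    return warnings


# Illustrative input with made-up IDs: host-52 reuses host-06a's ID, host-07 is unset.
example = [("host-05a", 101), ("host-06a", 102), ("host-52", 102), ("host-07", 0)]
for line in check_virtual_ids(example):
    print(line)
```

An empty result means every listed daemon has a distinct, explicitly set ID.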

Maybe the in-memory VirtualID table has to be cleared before the parser loads spread.conf a second time.
A VirtualID is auto-generated only if the stored VirtualID is zero, and on a reload that is no longer the case.
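The suspected mechanism can be reproduced with a small model. This is a hedged sketch in Python, not Spread's C code: DaemonSlot, load_conf and the CRC-32-of-name generator are hypothetical stand-ins, not Spread's real data structures or ID algorithm. It assumes, as suspected above, that VirtualIDs live in a positional table that survives a reload and that a slot only gets a fresh ID while its ID is zero. Removing a daemon from the middle of the list then leaves the following daemon with its predecessor's stale ID, while clearing the table before re-parsing gives every daemon an ID derived from its own entry.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DaemonSlot:
    """Hypothetical entry in a positional, in-memory daemon table."""
    name: str = ""
    virtual_id: int = 0  # 0 means "not set yet, auto-generate"


def auto_virtual_id(name: str) -> int:
    """Hypothetical ID generator (CRC-32 of the name), standing in for Spread's."""
    return zlib.crc32(name.encode()) or 1  # never return the "unset" marker 0


def load_conf(table: list[DaemonSlot], names: list[str], clear_first: bool) -> None:
    """Re-parse a daemon list into `table` slot by slot, as a reload would."""
    if clear_first:
        table.clear()  # the proposed fix: forget every ID from the previous parse
    while len(table) < len(names):
        table.append(DaemonSlot())
    del table[len(names):]  # drop trailing slots when daemons were removed
    for slot, name in zip(table, names):
        slot.name = name
        if slot.virtual_id == 0:  # a stale nonzero ID is silently kept
            slot.virtual_id = auto_virtual_id(name)


old_conf = ["host-05a", "host-05b", "host-06a"]
new_conf = ["host-05a", "host-06a"]  # host-05b's segment removed from the middle

table: list[DaemonSlot] = []
load_conf(table, old_conf, clear_first=False)
load_conf(table, new_conf, clear_first=False)
# host-06a now sits in slot 1 and has inherited host-05b's old ID.
print(table[1].name, table[1].virtual_id == auto_virtual_id("host-05b"))

table = []
load_conf(table, old_conf, clear_first=True)
load_conf(table, new_conf, clear_first=True)
# After clearing, host-06a gets the ID generated from its own name.
print(table[1].name, table[1].virtual_id == auto_virtual_id("host-06a"))
```

In the uncleared case host-06a keeps its name but ends up with a different ID than before the reload, which has the same shape as the "Old:" and "New:" lines in the crash log. Clearing the whole table is the bluntest form of the fix; whether it fits Spread's actual reload path is untested here.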

And no, I don't think this is a problem of the internal hash algorithm producing colliding hashes.

We currently have no fix for this bug, because the workaround is good enough for us.

Best regards,
Martin

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users
