[Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess

Martin Schu martin.sc11111 at gmail.com
Thu Jul 26 11:04:58 EDT 2018


Hi John, hi all,

If an already running daemon is triggered to reload spread.conf at runtime,
it can crash when the number of daemons has changed. The internal VirtualID
table seems to get confused: apparently spread fetches the wrong
auto-generated virtual ID if a daemon is removed from or added to the middle
of the table at runtime.

The following steps lead to the problem:
- spread version 5.0.1
- No VirtualIDs are configured by us. VirtualIDs are auto-generated.
- All daemons are running.
- One segment with one daemon is removed from the middle of spread.conf.
- The spread.conf is distributed to all hosts.
- spread is triggered to reload spread.conf via spmonitor r.
- One of the spread daemons aborts with the failure shown below. It is the
daemon that follows the removed one in the list.

Here a conflict of virtual IDs is logged:
2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for daemon
'host-06a'
2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is
already in use by 'host-52'!  You will probably need to explicitly
reconfigure the daemons' virtual IDs so that they don't conflict.
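
To spot this message across many daemon logs, a small scanner can help. The
following is only a sketch in Python: the function name `find_conflicts` is
made up here, and the pattern assumes the message wording is exactly as in
the excerpt above (whitespace, including mail line wraps, is tolerated).

```python
import re

# Pattern built from the conflict message quoted above; \s+ tolerates
# wrapped lines. The wording is assumed to match the excerpt exactly.
CONFLICT = re.compile(
    r"The\s+virtual\s+ID\s+'(\d+)'\s+of\s+'([^']+)'"
    r"\s+is\s+already\s+in\s+use\s+by\s+'([^']+)'"
)

def find_conflicts(log_text):
    """Return (virtual_id, daemon, current_holder) for each conflict line."""
    return CONFLICT.findall(log_text)

sample = """2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is
already in use by 'host-52'!  You will probably need to explicitly
reconfigure the daemons' virtual IDs so that they don't conflict."""

print(find_conflicts(sample))
# [('1696126475', 'host-06a', 'host-52')]
```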

One of the spread daemons complains after the reload:
2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id:
3524380600, addr: 1.2.3.5, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to two
different old daemons: name ' host-06a' -> 0x7f57eccd8800, id '2876338096'
-> 0x7f57eccd8208! Partitioning to singleton!
2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1

Another spread daemon crashes after the reload because of the virtual ID mix-up:
2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id:
2876338096, addr: 1.2.3.7, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have
changed! Exiting!
2018-07-25 14:43:04 GMT     Old: name 'host-06a', addr [1.2.3.7]:7654, id
'1696126475', num_ifs 1
2018-07-25 14:43:04 GMT     New: name 'host-06a', addr [1.2.3.7]:7654, id
'2876338096', num_ifs 1
Exit caused by Alarm!

Workaround:
- Explicitly configure a unique VirtualID for each daemon in spread.conf.
This works because an explicitly configured VirtualID is always read again
from the file on reload.

Maybe the in-memory VirtualID table has to be cleared before the parser
loads spread.conf a second time.
A VirtualID is auto-generated only if the VirtualID found is zero, and that
is not the case on a reload.
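
The suspicion can be illustrated with a toy model in Python. This is not
spread's code: `auto_id`, `load_conf`, the slot table and the node names are
all invented for illustration, and the model assumes that IDs are kept per
table position and survive a reload. Under those assumptions, the daemon that
follows the removed one inherits a stale ID.

```python
import zlib

def auto_id(name):
    """Hypothetical stand-in for ID auto-generation: nonzero hash of the name."""
    return zlib.crc32(name.encode("ascii")) or 1

def load_conf(names, table):
    """Assign IDs by position; generate one only if the slot is still zero."""
    for slot, name in enumerate(names):
        if table[slot] == 0:
            table[slot] = auto_id(name)
    return {name: table[slot] for slot, name in enumerate(names)}

# Initial load: every slot is zero, so every daemon gets its own ID.
table = [0, 0, 0]
before = load_conf(["node-a", "node-b", "node-c"], table)

# Reload after removing node-b, without clearing the table:
# node-c moves into node-b's slot, which is nonzero, so nothing is generated.
stale = load_conf(["node-a", "node-c"], table)

# Same reload with the table cleared first.
fresh = load_conf(["node-a", "node-c"], [0, 0, 0])

print(stale["node-c"] == before["node-b"])  # True: stale ID inherited
print(fresh["node-c"] == before["node-c"])  # True: ID unchanged after clearing
```

In this model the ID of node-c changes across the reload unless the table is
cleared, which would match the "My daemon parameters have changed" abort, but
it does not prove that spread behaves this way internally.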

I do not think the cause is the internal hash algorithm delivering
ambiguous hashes.

Currently we have no fix for that bug, because the workaround is good
enough for us.

Best regards,
Martin