[Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess

Martin Schu martin.sc11111 at gmail.com
Wed Aug 1 12:01:58 EDT 2018


Hi John,

below are our old and new spread.conf files. In the new version, host-53 is
removed. After that, a reload (spmonitor r) is triggered, and the spread
daemon on host-06a crashes with the following log. This might be because
host-06a takes over the virtual ID of the deleted host-53 on reload, instead
of auto-generating its own ID again.
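
The suspected mechanism can be illustrated with a toy model. This is only a
sketch under assumptions and not Spread's code: it assumes an in-memory table
indexed by a daemon's position in spread.conf that survives a reload, and it
uses CRC32 as a stand-in for the real ID generation. The names
auto_generate_id, load_conf and simulate are hypothetical.

```python
import zlib

OLD = ["host-05a", "host-51", "host-52", "host-53", "host-06a"]
NEW = ["host-05a", "host-51", "host-52", "host-06a"]  # host-53 removed


def auto_generate_id(name):
    """Stand-in for ID auto-generation (NOT Spread's real algorithm)."""
    return zlib.crc32(name.encode("ascii")) or 1  # 0 is reserved for "unset"


def load_conf(daemon_names, id_table, clear_first=False):
    """Assign a virtual ID to each daemon by its position in the config.

    id_table models a hypothetical table that is kept across reloads.
    A slot is auto-generated only while it still holds 0.
    """
    if clear_first:
        id_table[:] = [0] * len(id_table)
    assigned = {}
    for slot, name in enumerate(daemon_names):
        if id_table[slot] == 0:
            id_table[slot] = auto_generate_id(name)
        assigned[name] = id_table[slot]
    return assigned


def simulate(clear_first):
    """Load the old config, then reload the new one into the same table."""
    table = [0] * 8
    before = load_conf(OLD, table)
    after = load_conf(NEW, table, clear_first=clear_first)
    return before, after
```

In this model, simulate(False) leaves host-06a with the ID that host-53 held
before the reload, which would look like "My daemon parameters have changed".
simulate(True) clears the table first and every remaining daemon keeps its ID.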

Thanks,
Martin

2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
2018-07-25 14:43:04 GMT     Old: name 'host-06a', addr [1.2.3.7]:9876 , id '1696126475', num_ifs 1
2018-07-25 14:43:04 GMT     New: name 'host-06a', addr [1.2.3.7]:9876 , id '2876338096', num_ifs 1
Exit caused by Alarm!

###################################################
# Section: Spread_Segments (old version)
###################################################

Spread_Segment  1.2.3.5:9876 {
    host-05a   1.2.3.5   { 1.2.3.5 }
}
Spread_Segment  1.2.2.1:9876 {
    host-51    1.2.2.1   { 1.2.2.1 }
}
Spread_Segment  1.2.2.2:9876 {
    host-52    1.2.2.2   { 1.2.2.2 }
}
Spread_Segment  1.2.2.3:9876 {
    host-53    1.2.2.3   { 1.2.2.3 }
}
Spread_Segment  1.2.3.7:9876 {
    host-06a   1.2.3.7   { 1.2.3.7 }
}

###################################################
# Section: Spread_Segments (new version)
###################################################

Spread_Segment  1.2.3.5:9876 {
    host-05a   1.2.3.5   { 1.2.3.5 }
}
Spread_Segment  1.2.2.1:9876 {
    host-51    1.2.2.1   { 1.2.2.1 }
}
Spread_Segment  1.2.2.2:9876 {
    host-52    1.2.2.2   { 1.2.2.2 }
}
##Spread_Segment  1.2.2.3:9876 {
##   host-53    1.2.2.3   { 1.2.2.3 }
##}
Spread_Segment  1.2.3.7:9876 {
    host-06a   1.2.3.7   { 1.2.3.7 }
}
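
To see in advance which daemons change position between two configuration
files, a small helper along the following lines could be used. It is a sketch
that assumes the simple layout shown above (one Spread_Segment block per
daemon, '#' starting a comment, closing brace on its own line); it is not a
full spread.conf parser, and daemon_order and reload_risk are hypothetical
names.

```python
import re

SEGMENT_OPEN = re.compile(r"^\s*Spread_Segment\s+\S+\s*\{\s*$")
DAEMON_LINE = re.compile(r"^\s*(\S+)\s+\d+\.\d+\.\d+\.\d+")


def daemon_order(conf_text):
    """Return daemon names in the order they appear inside segments."""
    names = []
    in_segment = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if SEGMENT_OPEN.match(line):
            in_segment = True
        elif line.strip() == "}":
            in_segment = False
        elif in_segment:
            match = DAEMON_LINE.match(line)
            if match:
                names.append(match.group(1))
    return names


def reload_risk(old_text, new_text):
    """Return (removed, added, shifted) daemon names between two configs.

    'shifted' lists daemons present in both files whose position changed.
    """
    old, new = daemon_order(old_text), daemon_order(new_text)
    removed = [name for name in old if name not in new]
    added = [name for name in new if name not in old]
    shifted = [
        name for index, name in enumerate(new)
        if name in old and old.index(name) != index
    ]
    return removed, added, shifted
```

For the two versions above, the removed list contains host-53 and the shifted
list contains host-06a, which is the daemon that exited on reload.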


On Wed, Aug 1, 2018 at 5:30 PM, John Lane Schultz <jschultz at spreadconcepts.com> wrote:

> Hi Martin,
>
> Would you please send me the two configuration files (old and new) that
> caused this issue, so I can more concretely understand your incident?
>
> Thanks,
> John
>
> On Jul 26, 2018, at 11:04 AM, Martin Schu <martin.sc11111 at gmail.com> wrote:
>
> Hi John, hi all,
>
> if an already running daemon is triggered to reload spread.conf at
> runtime, it can crash if the number of daemons has changed. The internal
> virtual ID table appears to get confused: Spread apparently picks up the
> wrong auto-generated virtual ID if a daemon is removed from or added to
> the middle of the table at runtime.
>
> The following steps lead to the problem:
> - Spread version 5.0.1 is used.
> - No virtual IDs are configured by us; they are auto-generated.
> - All daemons are running.
> - One segment with one daemon is removed from the middle of spread.conf.
> - The new spread.conf is distributed to all hosts.
> - Spread is triggered to reload spread.conf with spmonitor r.
> - One spread daemon aborts with the failure shown below. It is the
> daemon that follows the removed one in the list.
>
> Here a conflict of virtual IDs is logged:
> 2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for daemon 'host-06a'
> 2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is already in use by 'host-52'!  You will probably need to explicitly reconfigure the daemons' virtual IDs so that they don't conflict.
>
> One of the spread daemons complaining after reload:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id: 3524380600, addr: 1.2.3.5, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to two different old daemons: name ' host-06a' -> 0x7f57eccd8800, id '2876338096' -> 0x7f57eccd8208! Partitioning to singleton!
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1
>
> One other spread daemon crashing after reload because of virtual ID chaos:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
> 2018-07-25 14:43:04 GMT     Old: name 'host-06a', addr [1.2.3.7]:9876 , id '1696126475', num_ifs 1
> 2018-07-25 14:43:04 GMT     New: name 'host-06a', addr [1.2.3.7]:9876 , id '2876338096', num_ifs 1
> Exit caused by Alarm!
>
> Workaround:
> - Explicitly configure a unique VirtualID for each daemon in spread.conf.
> This works because the configured virtual IDs are always read again from
> the file on reload.
>
> Maybe the in-memory virtual ID table has to be cleared before the parser
> loads spread.conf a second time. A virtual ID is auto-generated only if
> the virtual ID found is zero, and on a reload it is not zero.
>
> I do not think this is a problem of the internal hash algorithm
> producing colliding hashes.
>
> Currently we have no fix for that bug, because the workaround is good
> enough for us.
>
> Best regards,
> Martin