From jschultz at spreadconcepts.com Wed Aug 1 10:52:54 2018
From: jschultz at spreadconcepts.com (John Lane Schultz)
Date: Wed, 1 Aug 2018 10:52:54 -0400
Subject: [Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess
In-Reply-To:
References:
Message-ID: <89619059-F680-47E7-B2A0-5552A58D7A12@spreadconcepts.com>

Thanks for the report Martin. I'll think about this and look into the code.

Thanks,
John

On Jul 26, 2018, at 11:04 AM, Martin Schu wrote:

> Hi John, hi all,
>
> if an already running daemon is triggered to reload the spread.conf at
> runtime, it can crash if the number of daemons has changed. There can be a
> confusion in the internal VirtualID table. Apparently spread fetches the
> wrong auto generated virtual id, if some daemon is removed/added in the
> middle of the table at runtime.
>
> Following steps lead to the problem:
> - spread version 5.0.1
> - No VirtualIDs are configured by us. VirtualIDs are auto-generated.
> - All daemons are running.
> - One Segment with one daemon is removed in the middle of spread.conf.
> - The spread.conf is distributed to all hosts.
> - spread is triggered to reload spread.conf by spmonitor r
> - some spread daemon will abort with bad failure shown below. It is the
>   daemon behind the removed one in the list.
>
> Here a conflict of virtual IDs is logged:
> 2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for daemon 'host-06a'
> 2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is already in use by 'host-52'! You will probably need to explicitly reconfigure the daemons' virtual IDs so that they don't conflict.
>
> One of the spread daemons complaining after reload:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id: 3524380600, addr: 1.2.3.5, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to two different old daemons: name ' host-06a' -> 0x7f57eccd8800, id '2876338096' -> 0x7f57eccd8208! Partitioning to singleton!
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1
>
> One other spread daemon crashing after reload because of virtual ID chaos:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
> 2018-07-25 14:43:04 GMT Old: name 'host-06a', addr [1.2.3.7]:7654, id '1696126475', num_ifs 1
> 2018-07-25 14:43:04 GMT New: name 'host-06a', addr [1.2.3.7]:7654, id '2876338096', num_ifs 1
> Exit caused by Alarm!
>
> Workaround:
> - Configure a unique VirtualID in spread.conf explicitly for each daemon
>   is a solution, because this configuration is always reloaded including
>   configured virtualID.
>
> Maybe the virtualID table in memory has to be cleared before the parser is
> loading the spread.conf a second time.
> The auto-generation of virtualID is done only if the found virtualID is
> zero. This is not true for a reload.
>
> No, I think this is not a problem of the internal hash algorithm
> delivering ambiguous hashes.
>
> Currently we have no fix for that bug, because the workaround is good
> enough for us.
>
> Best regards,
> Martin

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users

From jschultz at spreadconcepts.com Wed Aug 1 11:30:37 2018
From: jschultz at spreadconcepts.com (John Lane Schultz)
Date: Wed, 1 Aug 2018 11:30:37 -0400
Subject: [Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess
In-Reply-To:
References:
Message-ID:

Hi Martin,

Would you please send me the two configuration files (old and new) that caused this issue, so I can more concretely understand your incident?

Thanks,
John

On Jul 26, 2018, at 11:04 AM, Martin Schu wrote:

> Hi John, hi all,
>
> if an already running daemon is triggered to reload the spread.conf at
> runtime, it can crash if the number of daemons has changed. There can be a
> confusion in the internal VirtualID table. Apparently spread fetches the
> wrong auto generated virtual id, if some daemon is removed/added in the
> middle of the table at runtime.
>
> Following steps lead to the problem:
> - spread version 5.0.1
> - No VirtualIDs are configured by us. VirtualIDs are auto-generated.
> - All daemons are running.
> - One Segment with one daemon is removed in the middle of spread.conf.
> - The spread.conf is distributed to all hosts.
> - spread is triggered to reload spread.conf by spmonitor r
> - some spread daemon will abort with bad failure shown below. It is the
>   daemon behind the removed one in the list.
>
> Here a conflict of virtual IDs is logged:
> 2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for daemon 'host-06a'
> 2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is already in use by 'host-52'! You will probably need to explicitly reconfigure the daemons' virtual IDs so that they don't conflict.
>
> One of the spread daemons complaining after reload:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id: 3524380600, addr: 1.2.3.5, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to two different old daemons: name ' host-06a' -> 0x7f57eccd8800, id '2876338096' -> 0x7f57eccd8208! Partitioning to singleton!
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1
>
> One other spread daemon crashing after reload because of virtual ID chaos:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
> 2018-07-25 14:43:04 GMT Old: name 'host-06a', addr [1.2.3.7]:7654, id '1696126475', num_ifs 1
> 2018-07-25 14:43:04 GMT New: name 'host-06a', addr [1.2.3.7]:7654, id '2876338096', num_ifs 1
> Exit caused by Alarm!
>
> Workaround:
> - Configure a unique VirtualID in spread.conf explicitly for each daemon
>   is a solution, because this configuration is always reloaded including
>   configured virtualID.
>
> Maybe the virtualID table in memory has to be cleared before the parser is
> loading the spread.conf a second time.
> The auto-generation of virtualID is done only if the found virtualID is
> zero. This is not true for a reload.
>
> No, I think this is not a problem of the internal hash algorithm
> delivering ambiguous hashes.
>
> Currently we have no fix for that bug, because the workaround is good
> enough for us.
>
> Best regards,
> Martin

From martin.sc11111 at gmail.com Wed Aug 1 12:01:58 2018
From: martin.sc11111 at gmail.com (Martin Schu)
Date: Wed, 1 Aug 2018 18:01:58 +0200
Subject: [Spread-users] bugreport: spread-5 crashing on conf reload, if number of daemons changed, because of virtual ID mess
In-Reply-To:
References:
Message-ID:

Hi John,

below is our old/new spread.conf. The host-53 is removed. Afterwards a reload (spmonitor r) is triggered. Afterwards the spread daemon at host-06a will crash with following log. This might be because host-06a takes virtualid of deleted host-53, instead of autogenerating it again on reload.

Thanks,
Martin

2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
2018-07-25 14:43:04 GMT Finished configuration file.
2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id: 2876338096, addr: 1.2.3.7, port: 9876
2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have changed! Exiting!
2018-07-25 14:43:04 GMT Old: name 'host-06a', addr [1.2.3.7]:9876 , id '1696126475', num_ifs 1
2018-07-25 14:43:04 GMT New: name 'host-06a', addr [1.2.3.7]:9876 , id '2876338096', num_ifs 1
Exit caused by Alarm!

###################################################
# Section: Spread_Segments (old version)
###################################################
Spread_Segment 1.2.3.5:9876 {
    host-05a 1.2.3.5 { 1.2.3.5 }
}
Spread_Segment 1.2.2.1:9876 {
    host-51 1.2.2.1 { 1.2.2.1 }
}
Spread_Segment 1.2.2.2:9876 {
    host-52 1.2.2.2 { 1.2.2.2 }
}
Spread_Segment 1.2.2.3:9876 {
    host-53 1.2.2.3 { 1.2.2.3 }
}
Spread_Segment 1.2.3.7:9876 {
    host-06a 1.2.3.7 { 1.2.3.7 }
}

###################################################
# Section: Spread_Segments (new version)
###################################################
Spread_Segment 1.2.3.5:9876 {
    host-05a 1.2.3.5 { 1.2.3.5 }
}
Spread_Segment 1.2.2.1:9876 {
    host-51 1.2.2.1 { 1.2.2.1 }
}
Spread_Segment 1.2.2.2:9876 {
    host-52 1.2.2.2 { 1.2.2.2 }
}
##Spread_Segment 1.2.2.3:9876 {
##    host-53 1.2.2.3 { 1.2.2.3 }
##}
Spread_Segment 1.2.3.7:9876 {
    host-06a 1.2.3.7 { 1.2.3.7 }
}

On Wed, Aug 1, 2018 at 5:30 PM, John Lane Schultz <jschultz at spreadconcepts.com> wrote:

> Hi Martin,
>
> Would you please send me the two configuration files (old and new) that
> caused this issue, so I can more concretely understand your incident?
>
> Thanks,
> John
>
> On Jul 26, 2018, at 11:04 AM, Martin Schu wrote:
>
> Hi John, hi all,
>
> if an already running daemon is triggered to reload the spread.conf at
> runtime, it can crash if the number of daemons has changed. There can be a
> confusion in the internal VirtualID table. Apparently spread fetches the
> wrong auto generated virtual id, if some daemon is removed/added in the
> middle of the table at runtime.
>
> Following steps lead to the problem:
> - spread version 5.0.1
> - No VirtualIDs are configured by us. VirtualIDs are auto-generated.
> - All daemons are running.
> - One Segment with one daemon is removed in the middle of spread.conf.
> - The spread.conf is distributed to all hosts.
> - spread is triggered to reload spread.conf by spmonitor r
> - some spread daemon will abort with bad failure shown below. It is the
> daemon behind the removed one in the list.
>
> Here a conflict of virtual IDs is logged:
> 2018-07-20 16:12:27 GMT Auto-generated virtual ID = '1696126475' for
> daemon 'host-06a'
> 2018-07-20 16:12:27 GMT The virtual ID '1696126475' of 'host-06a' is
> already in use by 'host-52'! You will probably need to explicitly
> reconfigure the daemons' virtual IDs so that they don't conflict.
>
> One of the spread daemons complaining after reload:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-05a, id:
> 3524380600, addr: 1.2.3.5, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: daemon identity mapped to
> two different old daemons: name ' host-06a' -> 0x7f57eccd8800, id
> '2876338096' -> 0x7f57eccd8208! Partitioning to singleton!
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: Return need_singleton = 1
>
> One other spread daemon crashing after reload because of virtual ID chaos:
> 2018-07-25 14:43:04 GMT Hash value for this configuration is: 4055467701
> 2018-07-25 14:43:04 GMT Finished configuration file.
> 2018-07-25 14:43:04 GMT Conf_load_conf_file: My name: host-06a, id:
> 2876338096, addr: 1.2.3.7, port: 9876
> 2018-07-25 14:43:04 GMT Conf_reload_initiate: My daemon parameters have
> changed! Exiting!
> 2018-07-25 14:43:04 GMT Old: name 'host-06a', addr [1.2.3.7]:9876 , id
> '1696126475', num_ifs 1
> 2018-07-25 14:43:04 GMT New: name 'host-06a', addr [1.2.3.7]:9876 , id
> '2876338096', num_ifs 1
> Exit caused by Alarm!
>
> Workaround:
> - Configure a unique VirtualID in spread.conf explicitly for each daemon
> is a solution, because this configuration is always reloaded including
> configured virtualID.
>
> Maybe the virtualID table in memory has to be cleared before the parser is
> loading the spread.conf a second time.
> The auto-generation of virtualID is done only if the found virtualID is
> zero. This is not true for a reload.
>
> No, I think this is not a problem of the internal hash algorithm
> delivering ambiguous hashes.
>
> Currently we have no fix for that bug, because the workaround is good
> enough for us.
>
> Best regards,
> Martin
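
The conjecture raised in this thread is that auto-generated virtual IDs survive a reload because generation only runs when the stored ID is zero, so the daemon listed after the removed one inherits a stale ID. A small model can illustrate that idea. The following is a minimal sketch in Python and is not Spread's actual code: the positional table, the `load` function, and the CRC32-based `auto_id` are hypothetical stand-ins, chosen only to show how a table that is not cleared before re-parsing could hand host-06a the ID that was generated for host-53.

```python
import zlib


def auto_id(name: str) -> int:
    """Stand-in ID generator (assumption): any deterministic, non-zero hash of the name."""
    return zlib.crc32(name.encode()) or 1


def load(table, names, clear_first):
    """Hypothetical parser pass over the daemon list.

    Slot i receives names[i]. An ID is generated only when the slot's
    stored ID is zero, mirroring the behaviour described in the report.
    """
    if clear_first:
        table = []  # the suggested fix: forget old IDs before parsing again
    new_table = []
    for i, name in enumerate(names):
        stale = table[i]["vid"] if i < len(table) else 0
        vid = stale if stale != 0 else auto_id(name)
        new_table.append({"name": name, "vid": vid})
    return new_table


def vid_of(table, name):
    """Look up the virtual ID currently assigned to a daemon name."""
    return next(entry["vid"] for entry in table if entry["name"] == name)


# Daemon order taken from the old and new spread.conf posted in the thread.
old = ["host-05a", "host-51", "host-52", "host-53", "host-06a"]
new = ["host-05a", "host-51", "host-52", "host-06a"]

boot = load([], old, clear_first=True)        # initial start
buggy = load(boot, new, clear_first=False)    # reload without clearing
fixed = load(boot, new, clear_first=True)     # reload after clearing

print("host-06a at boot:        ", vid_of(boot, "host-06a"))
print("host-06a after bad load: ", vid_of(buggy, "host-06a"))
print("host-53 at boot:         ", vid_of(boot, "host-53"))
print("host-06a after clearing: ", vid_of(fixed, "host-06a"))
```

In this model the reload without clearing gives host-06a the ID that host-53 had at boot, which would match the "My daemon parameters have changed" exit on host-06a and the "mapped to two different old daemons" message on host-05a. Whether Spread's parser really keeps IDs by table position is an assumption here, not something the logs confirm.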