[Spread-users] Clock skew and spread

Sat May 12 22:45:24 EDT 2007

Hi,

I had similar problems with another piece of software that uses the same 
event system.

In general, when you need to schedule a callback function 5 seconds in 
the future, you schedule the event for the current time + 5 seconds (look 
at E_queue in events.c).  You may have a lot of events in between, but 
when you reach that time (based on the system time), the event system 
will call that event before any other that was scheduled for a later time. 
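
To make the mechanism concrete, here is a minimal sketch of a queue 
keyed on absolute due times.  It is not the E_queue code from events.c; 
every name in it (timed_event, event_queue, eq_schedule, eq_run_due, 
count_call) is hypothetical, and the current time is passed in as a 
plain number of seconds so the behaviour is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 16

/* Hypothetical event record: a callback plus the absolute time it is due. */
typedef struct {
    long due;                  /* absolute time, in seconds */
    void (*fn)(void *arg);
    void *arg;
} timed_event;

/* Fixed-size queue kept sorted by due time, earliest first. */
typedef struct {
    timed_event ev[MAX_EVENTS];
    size_t count;
} event_queue;

/* Example callback: increments the int counter that arg points to. */
void count_call(void *arg)
{
    (*(int *)arg)++;
}

/* Schedule fn to run `delay` seconds after `now`.
 * Returns 0 on success, -1 if the queue is full. */
int eq_schedule(event_queue *q, long now, long delay,
                void (*fn)(void *), void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->count == MAX_EVENTS)
        return -1;

    long due = now + delay;    /* the "current time + 5 seconds" step */
    size_t i = q->count;

    /* Shift later events up so the queue stays sorted by due time. */
    while (i > 0 && q->ev[i - 1].due > due) {
        q->ev[i] = q->ev[i - 1];
        i--;
    }
    q->ev[i].due = due;
    q->ev[i].fn = fn;
    q->ev[i].arg = arg;
    q->count++;
    return 0;
}

/* Fire every event whose due time is <= now, earliest first.
 * Returns the number of events fired. */
int eq_run_due(event_queue *q, long now)
{
    int fired = 0;

    assert(q != NULL);
    while (q->count > 0 && q->ev[0].due <= now) {
        timed_event e = q->ev[0];
        for (size_t i = 1; i < q->count; i++)
            q->ev[i - 1] = q->ev[i];
        q->count--;
        e.fn(e.arg);
        fired++;
    }
    return fired;
}
```

Everything here depends on the `now` value handed to the two functions, 
which is why a jump in the system time changes what fires and when.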

If your clock jumps to some future time and stays there, the 
problem is that a lot of events will look expired and will start 
firing.  The protocols may not behave properly.

But the bad case happens when the clock jumps into the future and then 
comes back.  In this case, you may schedule events at the future 
time + 5 seconds.  When the clock returns to the correct time, the event 
will not fire until the clock reaches that expected time.  In that case, 
your program may be stuck for quite a while. 
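
The arithmetic of that bad case can be written out as a small sketch.  
The function name and parameters are hypothetical, and it assumes the 
clock is corrected right after the event is scheduled.

```c
#include <assert.h>

/* Seconds an event really waits if it was scheduled while the clock was
 * `skew` seconds ahead of the true time, and the clock was stepped back
 * to the true time immediately afterwards. */
long actual_wait(long true_now, long skew, long delay)
{
    assert(skew >= 0 && delay >= 0);

    long due = (true_now + skew) + delay;  /* computed from the skewed clock */
    return due - true_now;                 /* measured on the corrected clock */
}
```

With a clock that was one hour ahead, actual_wait(100, 3600, 5) gives 
3605: the 5 second timer ends up waiting for the whole jump plus the 
intended delay.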

I avoided the problem by blocking the NTP port on my network, and allowing 
NTP to set clocks only when I knew it was safe (when my program was not 
running).  Then again: (1) why do some NTP daemons make clock jumps? 
(My pure guess at the time was that it was setting the system time to 
GMT and then back to EST, but I never looked at the NTP code.)  (2) Is 
there any easy/pretty solution to avoid this problem in an event system?
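
On question (2), one common approach, where the platform supports it, 
is to compute timer deadlines from a monotonic clock instead of the 
system time.  The sketch below assumes a POSIX system that provides 
clock_gettime() with CLOCK_MONOTONIC; the helper names (mono_now_ms, 
deadline_after_ms, deadline_expired) are hypothetical and are not part 
of Spread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Milliseconds since an arbitrary fixed starting point.  CLOCK_MONOTONIC
 * is not affected by discontinuous jumps in the system time, so stepping
 * the wall clock does not move it. */
long long mono_now_ms(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;  /* keeps the build quiet when asserts are compiled out */
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Absolute deadline `delay_ms` from now, on the monotonic timeline. */
long long deadline_after_ms(long long delay_ms)
{
    return mono_now_ms() + delay_ms;
}

/* Nonzero once the monotonic clock has reached the deadline. */
int deadline_expired(long long deadline)
{
    return mono_now_ms() >= deadline;
}
```

An event loop would store deadline_after_ms(5000) with the event and 
test it with deadline_expired().  The system time would then be read 
only where a real date is needed, such as in log messages.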

Cheers,
Nilo

Vsevolod Vlaskin wrote:
> Hi,
>
> A while ago, we seemed to consistently see a similar
> problem in our configuration, which was only one
> Spread daemon with a number of clients all on the same
> LAN on Linux. We used just FIFO ordering for all our
> Spread clients.
>
> A few times, the Spread communication failed
> altogether: messages stopped being delivered (which
> was quite tragic) and we noticed that our NTP service
> did noticeable clock jumps at the time of failure.
>
> We posted the question on this list, but there was no
> reply. Maybe now there will be more response.
>
> Best regards,
>
> Vsevolod Vlaskine
>
>
>
> --- John Robinson <jr at vertica.com> wrote:
>
>   
>> We lost our T1 connection to the world for a while
>> today, and I think 
>> some of our servers' clocks may have drifted (no
>> internal NTP source...).
>>
>> Can this cause oddities among a subnet of spread
>> daemons?  Do they have 
>> to drop connections to their clients for reasons
>> related to clock drift 
>> amongst the host machines?  If so, is there some
>> logging I can enable to 
>> track this?
>>
>> I think I have seen similar things happen when we
>> try to run spread 
>> daemons on a "cluster" under VMWare, which is known
>> to introduce clock 
>> issues.
>>
>> thanks,
>> /jr
>>
>>
>> _______________________________________________
>> Spread-users mailing list
>> Spread-users at lists.spread.org
>>
>>     
> http://lists.spread.org/mailman/listinfo/spread-users
>   