[Spread-users] Re: Send_new_packets: created packet 16 already exist 2

Matt Garman matthew.garman at gmail.com
Fri May 9 11:51:49 EDT 2008


On Fri, May 09, 2008 at 10:37:57AM -0400, Rodrick Brown wrote:
> Are all systems running the same setup? I.e., the same version of glibc
> and Linux kernel? Have you compared these versions to the systems
> that don't crash? Was spread built from source or was it provided
> by your distro?

Some systems are the same, some are slightly different: the oldest
machines are running CentOS 4.3, others 4.4, and still others 4.5.
So the versions of major packages like glibc and the kernel are the
same, although the patch levels may differ.

But within the subset of machines that are all on the exact same
version of the OS, some experience the crash and some don't.  I.e.,
we've got four machines with identical installations of CentOS 4.3;
two have had the crash I described and two haven't.  And the
machines share a common spread segment.

We built spread ourselves from source.  We made some
#define changes as suggested in the spread manual, but otherwise it
was just a "configure ; make install".  The machine on which
spread was built shares its filesystem via NFS, so on all the
other boxes we just ran "make install" (i.e., we didn't rebuild for
each machine; each has an identical copy of the spread binary and
libraries).
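Because every box gets its copy by running "make install" from the same NFS build tree, one cheap way to confirm the installed files really are identical is to compare digests across machines. The following is a minimal Python sketch; the paths in SPREAD_FILES are hypothetical placeholders for whatever prefix "configure ; make install" actually used. Run it on each host and diff the output.

```python
import hashlib
import sys
from pathlib import Path

# Hypothetical install locations; substitute the paths your
# "configure ; make install" actually produced.
SPREAD_FILES = ["/usr/local/sbin/spread", "/usr/local/lib/libspread.so"]


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def install_fingerprint(paths):
    """Map each path to its digest; files that don't exist map to None."""
    return {str(p): (file_digest(p) if Path(p).is_file() else None)
            for p in paths}


if __name__ == "__main__":
    # Paths given on the command line override the hypothetical defaults.
    for path, digest in install_fingerprint(sys.argv[1:] or SPREAD_FILES).items():
        print(f"{digest or 'MISSING'}  {path}")
```

Matching digest lists on a crashing and a non-crashing host rule out a build or copy difference between them; they say nothing about runtime factors.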

Thanks!
Matt

> On Fri, May 9, 2008 at 9:01 AM, Matt Garman <matthew.garman at gmail.com>
> wrote:
> 
> >
> > Just some more info on the problem below.
> >
> > I'm seeing this problem with spread 4.0.0.  The link below, and
> > another one I found:
> >    http://marc.info/?l=spread-users&m=107427748906439&w=2
> > both talk about this problem with spread version 3.x.
> >
> > I also can't reproduce this with the spflooder -b 100000 method
> > described in the post above.
> >
> > This is on CentOS 4.3, Linux kernel version 2.6.9-34.ELsmp on
> > x86_64.
> >
> > This crash is pretty rare; it's happened three times on one box,
> > once on another, and never on any of our other ~20 machines.  The
> > three crashes on the one box have been fairly recent though: 5-Mar,
> > 1-May and 8-May.  We're worried that it's going to continue to
> > increase in frequency.
> >
> > Thanks again,
> > Matt
> >
> > On Fri, May 09, 2008 at 07:44:01AM -0500, Matt Garman wrote:
> > > We've experienced some random spread crashes recently.
> > >
> > > In the log, we have the following message:
> > >
> > > [Thu 01 May 2008 23:01:52] Send_new_packets: created packet 16 already exist 2
> > > Exit caused by Alarm(EXIT)
> > >
> > > I found another post on this in the archives, but it doesn't help
> > > much: http://marc.info/?t=111651309700001&r=1&w=2
> > >
> > > We have modified some of the #defines for spread, but not
> > > recently---we've been running with the same parameter set for well
> > > over a year.  But only recently are we seeing this issue.
> > >
> > > I don't know if it's related, but I'm also seeing the following in
> > > the logs:
> > >
> > > [Thu 01 May 2008 23:01:52] Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
> > > [Thu 01 May 2008 23:01:52]  Aru:              31
> > > [Thu 01 May 2008 23:01:52]  My_aru:           31
> > > [Thu 01 May 2008 23:01:52]  Highest_seq:      15
> > > [Thu 01 May 2008 23:01:52]  Highest_fifo_seq: 2
> > > [Thu 01 May 2008 23:01:52]  Last_discarded:   0
> > > [Thu 01 May 2008 23:01:52]  Last_delivered:   31
> > > [Thu 01 May 2008 23:01:52]  Last_seq:         3348
> > > [Thu 01 May 2008 23:01:52]  Token_rounds:     501
> > > [Thu 01 May 2008 23:01:52] Last Token:
> > > [Thu 01 May 2008 23:01:52]  type:             0x80040080
> > > [Thu 01 May 2008 23:01:52]  transmiter_id:    -1062683843
> > > [Thu 01 May 2008 23:01:52]  seq:              0
> > > [Thu 01 May 2008 23:01:52]  proc_id:          -1062683843
> > > [Thu 01 May 2008 23:01:52]  aru:              31
> > > [Thu 01 May 2008 23:01:52]  aru_last_id:      -1062683843
> > > [Thu 01 May 2008 23:01:52]  flow_control:     0
> > > [Thu 01 May 2008 23:01:52]  rtr_len:          0
> > > [Thu 01 May 2008 23:01:52]  conf_hash:        1602235222
> > >
> > > And here is a more recent example:
> > >
> > > [Thu 08 May 2008 16:22:01] Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
> > > [Thu 08 May 2008 16:22:01]  Aru:              42
> > > [Thu 08 May 2008 16:22:01]  My_aru:           42
> > > [Thu 08 May 2008 16:22:01]  Highest_seq:      33
> > > [Thu 08 May 2008 16:22:01]  Highest_fifo_seq: 0
> > > [Thu 08 May 2008 16:22:01]  Last_discarded:   0
> > > [Thu 08 May 2008 16:22:01]  Last_delivered:   42
> > > [Thu 08 May 2008 16:22:01]  Last_seq:         3366
> > > [Thu 08 May 2008 16:22:01]  Token_rounds:     501
> > > [Thu 08 May 2008 16:22:01] Last Token:
> > > [Thu 08 May 2008 16:22:01]  type:             0x80040080
> > > [Thu 08 May 2008 16:22:01]  transmiter_id:    -1062683843
> > > [Thu 08 May 2008 16:22:01]  seq:              0
> > > [Thu 08 May 2008 16:22:01]  proc_id:          -1062683843
> > > [Thu 08 May 2008 16:22:01]  aru:              42
> > > [Thu 08 May 2008 16:22:01]  aru_last_id:      -1062683843
> > > [Thu 08 May 2008 16:22:01]  flow_control:     0
> > > [Thu 08 May 2008 16:22:01]  rtr_len:          0
> > > [Thu 08 May 2008 16:22:01]  conf_hash:        -1879690443
> > >
> > > [ ... ]
> > >
> > > [Thu 08 May 2008 16:22:02] Send_new_packets: created packet 34 already exist 2
> > > Exit caused by Alarm(EXIT)
> > >
> > >
> > > The "BUG WORKAROUND" messages occur more often, maybe two to five
> > > times a month, but they are not always correlated with a crash.
> > >
> > > Any help would be much appreciated.
> > >
> > > Thank you,
> > > Matt
> > >
> >
> > _______________________________________________
> > Spread-users mailing list
> > Spread-users at lists.spread.org
> > http://lists.spread.org/mailman/listinfo/spread-users
> >
> 
> 
> 
> -- 
> [ Rodrick R. Brown ]
> http://www.rodrickbrown.com http://www.linkedin.com/in/rodrickbrown
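To check how often the "BUG WORKAROUND" messages actually precede a "Send_new_packets ... already exist" exit, the daemon log can be scanned for both. Here is a minimal Python sketch. It assumes each entry sits on a single line in the "[Thu 01 May 2008 23:01:52] message" form quoted above (the wrapping in the quotes comes from email). It also assumes that a daemon id such as -1062683843 is an IPv4 address printed as a signed 32-bit integer with the first octet in the high byte. Under that assumption, -1062683843 decodes to 192.168.187.61. The 5-second window, the function names, and the result fields are hypothetical choices for illustration, not anything Spread provides.

```python
import ipaddress
import re
import sys
from datetime import datetime, timedelta

# "[Thu 01 May 2008 23:01:52] message" -> (timestamp text, message)
TS_RE = re.compile(r"^\[(\w{3} \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\]\s*(.*)$")
# State-dump fields holding a daemon id, e.g. "transmiter_id:    -1062683843"
# ("transmiter" is spelled exactly as it appears in the log output).
ID_RE = re.compile(r"^(transmiter_id|proc_id|aru_last_id):\s+(-?\d+)$")


def proc_id_to_ip(proc_id: int) -> str:
    """Reinterpret a signed 32-bit id as a dotted-quad IPv4 address.

    Assumption: the id packs the daemon's address with the first octet
    in the most significant byte.
    """
    return str(ipaddress.IPv4Address(proc_id & 0xFFFFFFFF))


def scan(lines, window=timedelta(seconds=5)):
    """Count workaround events and crashes, and how many crashes follow
    a workaround within `window` (a hypothetical threshold)."""
    workarounds, crashes, ips = [], [], set()
    for raw in lines:
        m = TS_RE.match(raw.strip())
        if not m:
            continue  # e.g. "Exit caused by Alarm(EXIT)" has no timestamp
        ts = datetime.strptime(m.group(1), "%a %d %b %Y %H:%M:%S")
        msg = m.group(2)
        if "BUG WORKAROUND" in msg:
            workarounds.append(ts)
        elif msg.startswith("Send_new_packets: created packet"):
            crashes.append(ts)
        else:
            idm = ID_RE.match(msg)
            if idm:
                ips.add(proc_id_to_ip(int(idm.group(2))))
    after = [c for c in crashes
             if any(timedelta(0) <= c - w <= window for w in workarounds)]
    return {
        "workarounds": len(workarounds),
        "crashes": len(crashes),
        "crashes_after_workaround": len(after),
        "daemon_ips": ips,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        print(scan(f))
```

Because the script reports the total workaround count alongside the correlated crashes, it also shows how many "BUG WORKAROUND" events were not followed by an exit. The decoded addresses indicate which daemon was holding the token at the time.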



