[Spread-users] mod_log_spread errors in error_log

George Schlossnagle george at omniti.com
Thu Aug 16 12:06:18 EDT 2001


----- Original Message -----
From: "Monte Ohrt" <monte at ispi.net>
To: "George Schlossnagle" <george at omniti.com>
Cc: "Jonathan Stanton" <jonathan at cnds.jhu.edu>;
<spread-users at lists.spread.org>
Sent: Thursday, August 16, 2001 11:49 AM
Subject: Re: [Spread-users] mod_log_spread errors in error_log


> Ok great! This is my first time putting spread and mod_log_spread to the
> test with production systems, so a couple more questions I have:
>
> * How do you suggest the spread daemon be started on the apache hosts?
> Right now I'm starting with an init.d start/stop script. The problem is
> if the spread daemon dies, the logs will be lost for this host, yes?
> Should I run spread from inittab, or should I monitor the daemon
> process? What are others doing?

That's an interesting idea.  I run mine from cron every minute and just let
it fail to bind if it's already running.
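Here is a minimal sketch of the check that makes the cron approach safe. It assumes, as described above, that a second daemon simply fails to bind a port the first one already holds and then exits. `port_in_use` is a hypothetical helper, not Spread code. It only probes a UDP port with plain BSD sockets, and Spread's own startup path may differ.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Spread): returns 1 if another socket
 * already holds UDP `port` on `ip`, 0 if the port is free, -1 on other
 * errors (including an unparsable address).
 */
int port_in_use(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, rc, saved_errno;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* No SO_REUSEADDR: we want this bind to fail if the port is taken. */
    rc = bind(fd, (struct sockaddr *)&addr, sizeof addr);
    saved_errno = errno;
    close(fd); /* probe only; release the port immediately */

    if (rc == 0)
        return 0;
    return saved_errno == EADDRINUSE ? 1 : -1;
}
```

The probe binds without SO_REUSEADDR, so it reports the port as taken while another process holds it. It also closes its socket right away, so it never keeps the port for itself.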

>
> * heavy traffic configuration
> All my spread hosts are on one network segment (100 Mbit). Currently I have
> about 100 virtual hosts, and we're probably seeing 3-5 million hits a day
> collectively. I only have a few servers now, but this will grow over
> time, as will the hits. Is this a walk in the park for spread, or is
> this considered to be heavy traffic?

That should be fine.
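The following back-of-envelope sketch puts "should be fine" in numbers. It assumes one Spread message per hit, with each message roughly the 127 bytes seen in the tuser output further down this thread. It ignores Spread and UDP/IP header overhead, and peak traffic will be some multiple of the daily average. The helper names are hypothetical.

```c
#define SECONDS_PER_DAY 86400.0

/* Average messages per second for a given number of hits per day
 * (assumes one Spread message per logged hit). */
double avg_msgs_per_sec(double hits_per_day)
{
    return hits_per_day / SECONDS_PER_DAY;
}

/* Average payload bandwidth in bits per second, ignoring Spread and
 * UDP/IP header overhead. */
double avg_payload_bps(double hits_per_day, double bytes_per_msg)
{
    return avg_msgs_per_sec(hits_per_day) * bytes_per_msg * 8.0;
}

/* Fraction of a link (e.g. 100e6 for 100 Mbit/s) used by that payload. */
double link_fraction(double hits_per_day, double bytes_per_msg, double link_bps)
{
    return avg_payload_bps(hits_per_day, bytes_per_msg) / link_bps;
}
```

For 5 million hits a day at 127 bytes, this works out to about 58 messages/s and roughly 59 kbit/s of payload. On average, that is under 0.1% of a 100 Mbit/s segment.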

>   * What is the most efficient way to run spread?  Should I put all the
> machines into one segment in the config file? I am limited to 128
> systems per segment, correct?

I would recommend a single ring for up to around 10-15 hosts, or for as long
as it stays stable.

>   * Are the default WATER_MARK and MAX_SESSION_MESSAGES values adequate?
>
> Thanks!
> Monte
>
> George Schlossnagle wrote:
> >
> > Yep.  Looks like it's working.  If those name-not-unique (-6) messages only
> > occur at startup, I would just consider them a benign but annoying bug.  It
> > should still be identified and fixed, but it doesn't seem to cause problems.
> > I know you mentioned earlier that you had made changes to m_l_s to make it
> > ANSI compliant.  Can I see the patch?  Maybe something is going on there.
> >
> > Also, it might be interesting to add the following code at line 857 of
> > mod_log_spread.c:
> >
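> > else {
> >     /* SP_connect() succeeded: record which private name this child got.
> >        APLOG_NOERRNO keeps Apache 1.3 from prefixing a stale errno. */
> >     ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, s,
> >                  "Connected to spread with priv name (%s)", private_name);
> > }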
> >
> > This will (rather noisily) log every time a successful spread connect is
> > done.  It may help identify the source of the problem.
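Negative return values from SP_connect are Spread error codes, and -6 is the "name not unique" rejection that Jonathan explains later in this thread. A small lookup would make log lines like these self-explanatory.

This is a sketch. The values below are as we understand them from Spread's sp.h, where REJECT_NOT_UNIQUE is -6, so check them against the header you build with. Spread also provides SP_error() for printing a description.

Separately, the "(9)Bad file number" prefix on those log lines is most likely a stale errno. Apache 1.3's ap_log_error prints errno unless APLOG_NOERRNO is passed, so that prefix is probably not the real cause.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map a negative SP_connect() return value to the
 * symbolic name used in Spread's sp.h. Values are as we understand them
 * from sp.h; verify them against the header you actually compile with.
 */
const char *spread_connect_error_name(int code)
{
    static const struct {
        int code;
        const char *name;
    } table[] = {
        { -1, "ILLEGAL_SPREAD" },
        { -2, "COULD_NOT_CONNECT" },
        { -3, "REJECT_QUOTA" },
        { -4, "REJECT_NO_NAME" },
        { -5, "REJECT_ILLEGAL_NAME" },
        { -6, "REJECT_NOT_UNIQUE" },   /* private name already in use */
        { -7, "REJECT_VERSION" },
        { -8, "CONNECTION_CLOSED" },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].code == code)
            return table[i].name;
    return "UNKNOWN";
}
```

The table holds plain strings instead of redefining the macros, so it will not clash with sp.h if both end up in the same file.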
> >
> > george
> >
> > ----- Original Message -----
> > From: "Monte Ohrt" <monte at ispi.net>
> > To: "George Schlossnagle" <george at omniti.com>
> > Cc: "Jonathan Stanton" <jonathan at cnds.jhu.edu>;
> > <spread-users at lists.spread.org>
> > Sent: Thursday, August 16, 2001 11:01 AM
> > Subject: Re: [Spread-users] mod_log_spread errors in error_log
> >
> > > I'm pretty sure everything is working OK. Here is my test.
> > >
> > > spread.conf
> > > -----------
> > >
> > > Spread_Segment  10.131.192.255:4803 {
> > >
> > > getz-prv        10.131.192.114
> > > drew-prv        10.131.192.115
> > > }
> > >
> > >
> > > apache httpd.conf
> > > -----------------
> > >
> > > SpreadDaemon 4803
> > > CustomLog $test combined
> > >
> > >
> > > * Here is the message when I start spread:
> > >
> > >  Conf_init: using file: /usr/local/etc/spread/spread.conf
> > > Successfully configured Segment 0 [10.131.192.255:4803] with 2 procs:
> > >             getz-prv: 10.131.192.114
> > >             drew-prv: 10.131.192.115
> > > Finished configuration file.
> > > Conf_init: My name: drew-prv, id: 10.131.192.115, port: 4803
> > > Spread: not running as root, won't chroot
> > > Membership id is ( 176406643, 997973726)
> > > --------------------
> > > Configuration at drew-prv is:
> > > Num Segments 1
> > > 1 10.131.192.255    4803
> > > drew-prv            10.131.192.115
> > > ====================
> > >
> > >
> > >
> > > * Then I start apache (see error_log errors from previous e-mails)
> > >
> > > * Then I log into spread from the command line to watch the traffic:
> > >
> > > 9:59 drew[237] /usr/local/etc/spread/tuser -s 4803
> > > Spread library version is 3.15.2
> > > User: connected to 4803 with private group #user#drew-prv
> > >
> > > ==========
> > > User Menu:
> > > ----------
> > >
> > >         j <group> -- join a group
> > >         l <group> -- leave a group
> > >
> > >         s <group> -- send a message
> > >         b <group> -- send a burst of messages
> > >
> > >         r -- receive a message (stuck)
> > >         p -- poll for a message
> > >         e -- enable asynchonous read (default)
> > >         d -- disable asynchronous read
> > >
> > >         q -- quit
> > >
> > > User> j test
> > >
> > > User>
> > > ============================
> > > Received REGULAR membership for group test with 1 members, where I am
> > > member 0:
> > >         #user#drew-prv
> > > grp id is 176406643 997973726 1
> > > Due to the JOIN of #user#drew-prv
> > >
> > > User>
> > >
> > >
> > > Now, I hit the web server with my browser, and this is what comes up in
> > > the terminal:
> > >
> > > ============================
> > > received RELIABLE message from #ap28008#drew-prv, of type 1, (endian 0) to 1 groups
> > > (127 bytes): 206.131.193.10 - - [16/Aug/2001:10:00:55 -0500] "GET / HTTP/1.0" 200 1251 "-" "Mozilla/4.77 [en] (X11; U; Linux 2.4.2-2 i686)"
> > >
> > > User>
> > >
> > >
> > >
> > > So this tells me that everything is working, yes?
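The sender name in that output shows how the pieces fit. Spread names each connection's private group "#<private_name>#<daemon_name>", so tuser's connection is #user#drew-prv and the Apache child's is #ap28008#drew-prv.

Judging from the error_log further down (ap12294 onward, right after PIDs 12292 and 12293), mod_log_spread appears to use "ap" plus the child's PID as its private name. That is an inference, not confirmed from the module's source. Under that assumption, a -6 rejection means another open session on that daemon already used the same "ap<pid>" name.

The hypothetical helpers below just build the two strings. The length limits are as we recall them from Spread's sp.h and should be checked locally.

```c
#include <stdio.h>
#include <sys/types.h>

/* Limits as we understand them from Spread's sp.h (verify locally). */
#define SKETCH_MAX_PRIVATE_NAME 10
#define SKETCH_MAX_GROUP_NAME   32  /* includes the terminating NUL */

/* Hypothetical: format an "ap<pid>" private name like the ones in the
 * error_log (ap12294, ap28008). Returns 0 on success, -1 if it won't fit. */
int make_private_name(char *buf, size_t len, pid_t pid)
{
    int n = snprintf(buf, len, "ap%ld", (long)pid);
    if (n < 0 || (size_t)n >= len || n > SKETCH_MAX_PRIVATE_NAME)
        return -1;
    return 0;
}

/* Build the private group name "#<private>#<daemon>" that Spread reports,
 * e.g. "#ap28008#drew-prv". Returns 0 on success, -1 if too long. */
int make_private_group(char *buf, size_t len, const char *priv,
                       const char *daemon)
{
    int n = snprintf(buf, len, "#%s#%s", priv, daemon);
    if (n < 0 || (size_t)n >= len || n >= SKETCH_MAX_GROUP_NAME)
        return -1;
    return 0;
}
```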
> > >
> > >
> > > George Schlossnagle wrote:
> > > >
> > > > I've never actually seen this behaviour.  SP_connect is only called in
> > > > child_init, so it shouldn't be due to Apache's double-loading of
> > > > modules.  I guess it is possible for this to occur if a connection to
> > > > spread is broken; m_l_s works like this:
> > > >
> > > > if(SP_multicast() < 0) {
> > > >     error_log();
> > > >     SP_disconnect();
> > > >     SP_connect();
> > > >     if(SP_multicast()< 0) {
> > > >         error_log();
> > > >     }
> > > > }
> > > >
> > > > Still, it's weird that that would be the first error.  The lack of
> > > > multicast errors is also strange (it implies that sending is
> > > > working).  Logging is working, right?
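The one-retry logic outlined above can be made concrete and testable by passing the Spread operations in as function pointers. This is a sketch, not mod_log_spread's code. The struct, the functions and the scripted fake are all hypothetical, and real code would call SP_multicast, SP_disconnect and SP_connect with their actual arguments.

```c
#include <stdio.h>

/* Hypothetical transport hooks standing in for SP_multicast(),
 * SP_disconnect() and SP_connect(); send/connect return < 0 on failure. */
struct log_transport {
    int  (*send)(void *ctx, const char *msg);
    void (*disconnect)(void *ctx);
    int  (*connect)(void *ctx);
    void *ctx;
};

/* Send one log line; on failure, log it, reconnect once and retry once.
 * Returns 0 if either attempt succeeded, -1 otherwise. */
int send_with_retry(struct log_transport *t, const char *msg)
{
    if (t->send(t->ctx, msg) >= 0)
        return 0;

    fprintf(stderr, "send failed, reconnecting\n");
    t->disconnect(t->ctx);
    if (t->connect(t->ctx) < 0) {
        fprintf(stderr, "reconnect failed\n");
        return -1;
    }
    if (t->send(t->ctx, msg) < 0) {
        fprintf(stderr, "send failed after reconnect\n");
        return -1;
    }
    return 0;
}

/* A scripted fake for exercising send_with_retry without a daemon:
 * the first `fail_sends` sends fail; connect fails if `fail_connect`. */
struct fake_state {
    int fail_sends;
    int fail_connect;
    int sends, disconnects, connects;
};

static int fake_send(void *ctx, const char *msg)
{
    struct fake_state *s = ctx;
    (void)msg;
    s->sends++;
    return s->sends <= s->fail_sends ? -1 : 0;
}

static void fake_disconnect(void *ctx)
{
    ((struct fake_state *)ctx)->disconnects++;
}

static int fake_connect(void *ctx)
{
    struct fake_state *s = ctx;
    s->connects++;
    return s->fail_connect ? -1 : 0;
}
```

Unlike the outline, this sketch skips the second send when the reconnect itself fails, since a send on a session that never connected cannot succeed.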
> > > >
> > > > ----- Original Message -----
> > > > From: "Jonathan Stanton" <jonathan at cnds.jhu.edu>
> > > > To: <spread-users at lists.spread.org>
> > > > Sent: Thursday, August 16, 2001 10:35 AM
> > > > Subject: Re: [Spread-users] mod_log_spread errors in error_log
> > > >
> > > > > On Thu, Aug 16, 2001 at 09:16:20AM -0500, Monte Ohrt wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I got spread 3.15.2 and mod_log_spread working; however, I am seeing
> > > > > > some errors in the Apache error_log that concern me:
> > > > > >
> > > > > > Here is the output to error_log when I start the server:
> > > > > >
> > > > > > [Thu Aug 16 09:05:43 2001] [notice] Create log to group test for daemon 0
> > > > > > [Thu Aug 16 09:05:44 2001] [notice] set_spread_daemon(4803) for index 0
> > > > > > [Thu Aug 16 09:05:44 2001] [notice] Create log to group test for daemon 0
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] mod_backhand -- UnixSocketDir set to /export/apache/backhand
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] mod_backhand -- Broadcast 10.131.192.255:4445 added
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] mod_backhand -- Multicast accept 10.131.192.0/24
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] backhand_init(12292) spawning moderator (PID 12293)
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] mod_backhand moderator ready to go
> > > > > > [Thu Aug 16 09:05:45 2001] [error] (9)Bad file number: Could not connect to spread  with private_name ap12294. Error -6
> > > > > > [Thu Aug 16 09:05:45 2001] [error] (9)Bad file number: Could not connect to spread  with private_name ap12295. Error -6
> > > > > > [Thu Aug 16 09:05:45 2001] [error] (9)Bad file number: Could not connect to spread  with private_name ap12296. Error -6
> > > > > > [Thu Aug 16 09:05:45 2001] [error] (9)Bad file number: Could not connect to spread  with private_name ap12297. Error -6
> > > > > > [Thu Aug 16 09:05:45 2001] [notice] Apache/1.3.20 (Unix) mod_ssl/2.8.4 OpenSSL/0.9.6b mod_gzip/1.3.17.1a balanced_by_mod_backhand/1.2.0 configured -- resuming normal operations
> > > > > > [Thu Aug 16 09:05:45 2001] [error] (9)Bad file number: Could not connect to spread  with private_name ap12298. Error -6
> > > > > >
> > > > > >
> > > > > > Although spread seems to be working fine, the "Bad file number" errors
> > > > > > concern me.  What could be causing them?
> > > > >
> > > > > This error means that the private name used to connect to spread was
> > > > > not "unique", meaning some other connection using the same name was
> > > > > already established, so the attempt to connect failed.  If they only
> > > > > show up transiently when the system starts up, I wouldn't worry about
> > > > > it.  I'll think about why they happen; it is probably an interaction
> > > > > between mod_log_spread, the way Apache starts processes, and how spread
> > > > > accepts connections.
> > > > >
> > > > > If they continue regularly after startup, tell me.  As long as it
> > > > > connects successfully 'quickly' (i.e. it doesn't keep failing for
> > > > > seconds), you should be OK.
> > > > >
> > > > > The mod_log_spread authors are also on this list; they might have seen
> > > > > this error before and have a better answer.
> > > > >
> > > > > Jonathan
> > > > > --
> > > > > -------------------------------------------------------
> > > > > Jonathan R. Stanton         jonathan at cs.jhu.edu
> > > > > Dept. of Computer Science
> > > > > Johns Hopkins University
> > > > > -------------------------------------------------------
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > spread-users mailing list
> > > > > spread-users at lists.spread.org
> > > > > http://lists.spread.org/mailman/listinfo/spread-users
> > > > >
> > > >
> > >
> > > --
> > > Monte Ohrt <monte at ispi.net>
> > > http://www.ispi.net/
> > >
> >
>
> --
> Monte Ohrt <monte at ispi.net>
> http://www.ispi.net/
>