[Spread-users] Another way to leak (valgrind report)

David Shaw dshaw at archivas.com
Tue Aug 31 15:48:52 EDT 2004


On Tue, Aug 31, 2004 at 02:50:25PM -0400, Jonathan Stanton wrote:
> On Tue, Aug 31, 2004 at 02:19:22PM -0400, David Shaw wrote:

> > I've attached a program that does that.  Running this program causes
> > spread's memory and fd usage to shoot upwards.  The fd count stops at
> > 511, but by that time, spread has most of the physical ram on the box.
> > When the program finishes, spread's fd count eventually falls, but the
> > memory is not released.  I suppose this could be extraordinarily
> > aggressive caching, but leak or cache, spread is ending up with far too
> > much memory.
> 
> How long does it take in real-time to get to this state? (a few seconds?)

Quite a bit longer - 2 minutes or so to get all the memory, then
almost 3 minutes to release all the fds.

> Now, I must say that this usage pattern looks quite unrealistic --
> i.e.  why are you asking for membership messages if you don't ever
> read them? If you do read them (insert a SP_recv() call in the inner
> loop) then I don't think you will see excessive memory usage (even
> if you run multiple threads in parallel).

I'm using membership messages in the example because I got scolded
that a previous example that used SP_multicast was unrealistic. ;)

Of course this is an unrealistic example.  It's a completely absurd
example, as are the other abuse programs.  They only serve to show
behavior that might be problematic (the abuse program that showed the
leak in join/leave was completely unrealistic as well, but did show a
leak).
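
To be concrete, the shape of the abuse is roughly the following.  This
is only a sketch, not the program I attached earlier; the daemon
address and group name here are made up:

    /* Sketch of the abuse pattern: ask for membership messages,
     * never read them, and churn connections. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sp.h>

    int main(void)
    {
        int i;

        for (i = 0; i < 1000; i++) {
            mailbox mbox;
            char private_group[MAX_GROUP_NAME];
            int ret;

            /* Connect with membership messages enabled (4th arg = 1).
             * A NULL private name lets the daemon pick one. */
            ret = SP_connect("4803@localhost", NULL, 0, 1,
                             &mbox, private_group);
            if (ret < 0) {
                SP_error(ret);
                exit(1);
            }

            /* Each join/leave generates membership messages that the
             * daemon buffers for us, but we never call SP_receive(). */
            SP_join(mbox, "abuse-group");
            SP_leave(mbox, "abuse-group");

            /* Close our side of the connection.  The daemon appears
             * to hold its side (and the buffered messages) for quite
             * a while afterwards. */
            SP_disconnect(mbox);
        }

        return 0;
    }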

Let me try and reset this thread:

Now that we are past the join/leave problem, which does appear to have
been a bug and does appear to be fixed, I do not claim that what I am
currently seeing is a bug.  I do, however, have some minor concerns
about how spread manages memory.

I am using spread in a rather chatty environment where it is not
impossible for dozens to hundreds of connections to fall behind in
reading their messages.  In my environment, spread also competes with
many other memory-hungry processes, and there is a huge allergy to
swap here since it destroys performance.  For all of those reasons, I
am very conscious of memory usage.

> So you are basically having up to 600 connections each initiating a
> thousand events (that Spread has to deliver reliably so it can't
> discard) and noting that this causes a large amount of memory
> usage. I do not believe the memory usage is unbounded (just
> large). (given the standard C sbrk() issue)

I agree with you, and we're testing that belief with the CVS spread
right now.  What troubles me is that "large" in my case can fairly
easily be more RAM than I have, which throws things into swap.

In an ideal world, perhaps, I could cap the amount of memory given to
spread by the OS and spread would in turn "push back" on clients
trying to push more into the daemon.  That's probably a significant
change to spread though.  (I could use ulimit, of course, but the
spread daemon currently exits if it cannot malloc).
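
(To be clear about what I mean by capping via the OS: something along
the lines of the wrapper below.  It is only a sketch; the 256 MB
figure and the install path are made up.  With the current behavior,
all this does is turn memory pressure into a dead daemon, which is why
I'd rather see the daemon push back on clients instead.)

    /* Sketch: cap the daemon's address space before exec'ing it. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        struct rlimit lim;

        (void) argc;

        lim.rlim_cur = 256UL * 1024 * 1024;   /* soft limit: 256 MB */
        lim.rlim_max = 256UL * 1024 * 1024;   /* hard limit: 256 MB */

        if (setrlimit(RLIMIT_AS, &lim) < 0) {
            perror("setrlimit");
            return 1;
        }

        /* Hypothetical install path; adjust as needed. */
        execv("/usr/sbin/spread", argv);
        perror("execv");
        return 1;
    }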

You suggested that perhaps the disconnection in the daemon is not
handled quickly enough.  That sounds quite possible to me.  The test
program opens an fd, does some work, and closes it (on the client
side).  The daemon doesn't give up its end of the fd for quite some
time after that.  The abuse program is able to get over 500 fds tied
up in the daemon without holding any of them open on the client side.
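
(For what it's worth, on Linux the daemon's fd count can be watched by
just counting the entries under /proc/<pid>/fd, along these lines:)

    /* Count a process's open fds on Linux by listing /proc/<pid>/fd.
     * Pass the spread daemon's pid on the command line. */
    #include <stdio.h>
    #include <dirent.h>

    int main(int argc, char *argv[])
    {
        char path[64];
        DIR *dir;
        struct dirent *ent;
        int count = 0;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <spread-pid>\n", argv[0]);
            return 1;
        }

        snprintf(path, sizeof(path), "/proc/%s/fd", argv[1]);

        dir = opendir(path);
        if (dir == NULL) {
            perror(path);
            return 1;
        }

        while ((ent = readdir(dir)) != NULL) {
            if (ent->d_name[0] != '.')   /* skip "." and ".." */
                count++;
        }
        closedir(dir);

        printf("%d fds open\n", count);
        return 0;
    }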

David



