Re: [BUG] 2.4 VM sucks. Again

Roy Sigurd Karlsbakk (roy@karlsbakk.net)
Wed, 19 Jun 2002 13:26:47 +0200


> Roy, all we know is that "nuke-buffers stops your machine from locking up".
> But we don't know why your machine locks up in the first place. This just
> isn't sufficient grounds to apply it! We need to know exactly why your
> kernel is failing. We don't know what the bug is.

The bug, as previously described, occurs when multiple (20+) clients download
large files (3-6 GB each) at a speed of ~5Mbps each. The error does _not_ occur
when fewer clients are downloading at speeds close to disk speed.
All testing is being done on gigE crossover.

> You have two gigabytes of RAM, yes? It's very weird that stripping buffers
> prevents a lockup on a machine with such a small highmem/lowmem ratio.

No. I have 1GB, minus highmem (which is disabled), giving me ~900MB.

> I'll have yet another shot at reproducing it. So, again, could you please
> tell me *exactly*, in great detail, what I need to do to reproduce this
> problem?

> - memory size

1GB - highmem

> - number of CPUs

1 Athlon 1133MHz, 256kB cache

> - IO system

standard 33MHz/32bit single peer PCI motherboard (SiS based)
on-board SiS IDE/ATA 100 controller
Promise 20269 controller
Realtek 100Mbps NIC
e1000 gigE NIC
4 IBM 40GB 120GXP drives - one on each IDE channel
data partition on RAID-0 across all drives

> - kernel version, any applied patches, compiler version
kernel 2.4.19-pre8+tux+akpm buffer patch
I have tried _many_ different kernels, and as I needed the 20269 support, I
chose 2.4.19-pre. Tux is there as I did some testing with that. The problem
is _not_ Tux specific, as I've tried with other server software (custom or
standard) as well.
gcc 2.95.3

> - exact sequence of commands

start http server software
start 20+ downloads. each downloaded file is 3-6 GB
after some time most processes are killed by the OOM killer
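
For anyone wanting to script the client side of the steps above, here is a
rough sketch of a load generator. It is not the tool used in these tests. It
assumes the ~5Mbps limit can be imposed by each client reading slowly, and the
names throttled_drain, run_clients and MBIT are made up for illustration.

```python
import threading
import time
import urllib.request

MBIT = 1_000_000 // 8  # bytes per second in one megabit per second


def throttled_drain(stream, rate_bps, chunk=64 * 1024,
                    clock=time.monotonic, sleep=time.sleep):
    """Read stream to EOF, discarding the data, at no more than rate_bps bytes/s.

    Returns the number of bytes read.
    """
    start = clock()
    total = 0
    while True:
        data = stream.read(chunk)
        if not data:
            return total
        total += len(data)
        # Time by which `total` bytes should have arrived at the target rate.
        due = start + total / rate_bps
        delay = due - clock()
        if delay > 0:
            sleep(delay)


def run_clients(url, clients=20, mbit_per_client=5):
    """Start `clients` concurrent throttled downloads of `url` and wait for all."""
    def worker():
        with urllib.request.urlopen(url) as resp:
            throttled_drain(resp, mbit_per_client * MBIT)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Calling run_clients("http://<server>/<bigfile>", clients=20, mbit_per_client=5)
against a 3-6 GB file would approximate the workload: 20 clients at 5Mbps is
about 100Mbps (12.5 MB/s) in aggregate. The URL is a placeholder.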

> - anything else you can think of

I have not tried to give it coffee yet, although that might help. I'm usually
pretty pissed if I haven't got my morning coffee

> Have you been able to reproduce the failure on any other machine?

yes. I have set up one other machine with the exact same setup and one with
a slightly different setup, and reproduced it on both.

> No, not at all. All the pagecache is still there - the patch just
> throws away the buffer_heads which are attached to those pagecache
> pages.

oh. that's good.

> The 2.5 kernel does it tons better. Have you tried it?

I haven't. I've tried to compile it a few times, but it has failed. And I
don't want to run 2.5 on a production server.

But - If you ask me to test it, I will

thanks for all help

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners. They stop working when you open Windows.
