Re: linux-2.4.13 high SWAP

Theo Schlossnagle (jesus@omniti.com)
Tue, 30 Oct 2001 13:56:52 -0500


On Tuesday, October 30, 2001, at 10:10 AM, Andre Margis wrote:
> I test 2.4.9 , 2.4.10-ac7, 2.4.13 and all have this problem, I'm not
> using
> XFS, but reiserfs with LVM and 4 GB RAM. I detected if use tmpfs the
> kswapd
> eat my all CPU's, in 2.4.13 the system hang after a time. Now I'm
> testing
> 2.4.13-ac3 without tmpfs and he is very better than the others
> versions. But
> a nice test is disable the HIGHMEM support. I have a machine with 1GB
> RAM and
> the system is very fine and stable, running 2.4.10-ac7.

To contrast. We have had the best success with Linux 2.4.7 with HIGHMEM
support enabled.

I see several people having similar symptoms and everyone is point
fingers at tmpfs or ramfs or xfs (I pointed my finger there
originally). It seems to me that it is something more fundamental and
that particular usage patterns are triggering this. I have around 20
different dual processor machines that run 2.4.7-xfs, 2.4.2-xfs, or
2.4.12-xfs and I can only replicate it when I put heavy filesystem
activity on the machine.

Every time I replicate the problem, I become less certain of the actual
cause! I have kdb and SysReq enabled on all these boxes and it doesn't
cause a console freeze, so if some extra info would be helpful, let me
know and I will gather it during the next "glitch."

I am confident that the problem, on triggered, directly effects all
filesystem code. Once I have an "unkillable" process scheduled, it is
CPU bound and all filesystem sync operations fail. The sync command
never returns, reboot won't work without (-n) and all my mailers (exim
in this case) freeze on their fsync calls.

The _REALLY_ bad thing here (other than suspended disk I/O and the cruel
reboot) is that all file blocks that are modified after this "glitch"
are full of \000's upon reboot. A good portion of my mail spool is
corrupted as well as other files. I don't know if this is specific to
only journalled file systems as I cannot afford the downtime of
replicating this problem on a non-journalled fs.

--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E  491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA  3D 90 B9 9F BE 27 24 E7

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/