These tests are all being done with Linux 2.4.3 + the bigpatch fix for
knfsd and quotas. The rest of the OS is Debian unstable.
Before moving the storage into production I am performing tests on it
to gauge its stability. The first test I performed was a single
bonnie++ -s 16096 instance, and the timing results are in line with
what I would expect from fast SCSI disks.
However, multiple instances of bonnie++ completely kill the machine.
Once two or three bonnies are running, kswapd, kupdated, and bdflush
each jump to using 99% of a CPU and the machine becomes incredibly
unresponsive. Even in a root shell at nice -20 it can take several
minutes for "killall bonnie++" to appear after being typed and then
to run. After the bonnies are killed and kswapd, kupdated, and bdflush
are given a minute or two to finish whatever they are doing, the
machine becomes responsive again.
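The load described above can be reproduced with a small launcher. This is a minimal sketch, assuming Python 3 and bonnie++ are installed. The helper names and the target directories are hypothetical; only the -s 16096 size comes from the single-instance test. When started as root, bonnie++ may also need a -u user argument, which is left out here.

```python
import subprocess


def bonnie_command(directory, size_mb=16096):
    """Build one bonnie++ command line.

    -s is the test file size in megabytes (16096 as in the single-instance
    test); -d is the directory the test files are written to.
    """
    return ["bonnie++", "-s", str(size_mb), "-d", directory]


def start_instances(directories, size_mb=16096):
    """Start one bonnie++ per directory and return the process handles."""
    return [subprocess.Popen(bonnie_command(d, size_mb)) for d in directories]


def stop_instances(procs):
    """Ask every still-running instance to terminate, then wait for all."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        proc.wait()


# Hypothetical usage, with made-up mount points on the array under test:
#   procs = start_instances(["/mnt/array/t1", "/mnt/array/t2", "/mnt/array/t3"])
#   ... observe kswapd, kupdated, and bdflush ...
#   stop_instances(procs)
```

Keeping the handles around means the instances can be stopped without relying on killall being typed into an unresponsive shell.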
I don't think the machine should be behaving like this. I certainly
expect some slowdowns with that much IO, but the computer should still
be reasonably responsive, particularly because no system or user files
that need to be accessed are on that channel of the SCSI controller.
Any advice on approaching this problem would be appreciated. I will
try my best to provide any debugging information that would be useful,
but the machine is on a different continent from me, so without a
serial console I have a hard time getting any information that doesn't
make it into a logfile.
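One way to get more information into a logfile without a console is a sampler that is started before the test and appends system state to a file at intervals. This is a minimal sketch, assuming Python 3 is available on the machine and that /proc/loadavg and /proc/meminfo are readable. The function names, the log path, and the interval are hypothetical, and under the stall described above the sampler itself may be delayed.

```python
import os
import time


def read_text(path):
    """Return the contents of path, or a short note if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError as exc:
        return "<unreadable: %s>" % exc


def snapshot(paths=("/proc/loadavg", "/proc/meminfo")):
    """Return one timestamped text block with the contents of each path."""
    lines = ["=== %s ===" % time.strftime("%Y-%m-%d %H:%M:%S")]
    for path in paths:
        lines.append("--- %s ---" % path)
        lines.append(read_text(path))
    return "\n".join(lines) + "\n"


def log_samples(logfile, count, interval=5.0,
                paths=("/proc/loadavg", "/proc/meminfo")):
    """Append count snapshots to logfile, sleeping interval seconds between them."""
    with open(logfile, "a") as out:
        for i in range(count):
            out.write(snapshot(paths))
            out.flush()
            # Push the sample to disk so it survives a hang or forced reboot.
            os.fsync(out.fileno())
            if i + 1 < count:
                time.sleep(interval)


# Hypothetical usage: one sample every 5 seconds for 10 minutes.
#   log_samples("/var/log/bonnie-stall.log", count=120, interval=5.0)
```

The log should live on a disk that is not on the SCSI channel under test, so that writing samples does not queue behind the bonnie++ IO.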
-- Thanks, Jeff Lessem.