We're running a fairly busy mail server (50,000 mailboxes, 170,000+
messages a day) with Maildir mailboxes on an NFS volume. The server was
running Red Hat 7.1 with the i686 2.4.3-12smp kernel.
Ever since the machine came into full production, we've had big problems
with it (a Dell 2540, dual P3-733, 1 GB RAM). At least twice a day we
would see NFS server timeouts, followed by "can't get request slot"
messages. These completely hung the machine, and only a reboot could get
the system going again. We've tried every cure known to man to fix this
problem (changing NICs, mount parameters, internal buffers, etc.), with
no luck.
But when I switched to a single-processor kernel (RH 2.4.3-12) on the
same machine, the problems were instantly solved! (13 days without
problems so far.)
So my (blunt?) conclusion is that there must be some serious problem
with RPC/NFS (I guess RPC) on 2.4 SMP kernels, at least when lots of
processes are doing NFS I/O.
Does anyone have any thoughts on this? My kernel hacking knowledge is
limited, but I'm willing to test patches :)
Please CC me, as I'm not subscribed to this list.