I am getting lock ups on a production server with a high network load
once or twice a day. After a hang earlier this evening, the same message
was repeating off the console screen:
wait_on_bh, CPU 0:
irq: 0 [0 0]
bh: 1 [0 1]
<[c010af05]> <[c015c994]> <[c016a18e]> <[c014c4e6]>
(This is slightly different to what ursus@usa.net reported last month
http://www.deja.com/=dnc/[ST_rn=ps]/getdoc.xp?AN=563124486&fmt=text .)
Another hang happened about an hour later, this time with nothing written
to the console.
The System.map entries nearest the addresses in the trace above:
c010aecc T synchronize_bh
c014c4ac T sock_recvmsg
c015c800 T tcp_recvmsg
c016a100 T inet_recvmsg
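
Matching the trace addresses against these symbols is a lookup of the
nearest symbol at or below each address. A minimal sketch in Python,
assuming only the four entries listed above (a full System.map has many
more symbols, so the real enclosing function could differ); the helper
names load_symbols and resolve are hypothetical:

```python
import bisect

# Only the four System.map entries quoted in this message; a real
# System.map would contain every kernel symbol.
SYSTEM_MAP = """\
c010aecc T synchronize_bh
c014c4ac T sock_recvmsg
c015c800 T tcp_recvmsg
c016a100 T inet_recvmsg
"""


def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the nearest symbol at or below addr."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return f"{name}+0x{addr - base:x}"


symbols = load_symbols(SYSTEM_MAP)

# Addresses from the wait_on_bh trace, in the order they were printed.
trace = [0xC010AF05, 0xC015C994, 0xC016A18E, 0xC014C4E6]
resolved = [resolve(symbols, a) for a in trace]

for a, r in zip(trace, resolved):
    print(f"{a:08x} {r}")
```

With only these four entries, the trace resolves to synchronize_bh+0x39,
tcp_recvmsg+0x194, inet_recvmsg+0x8e and sock_recvmsg+0x3a.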
Another system, with identical hardware and kernel, but not quite so heavily
loaded, has been running flawlessly for a couple of weeks.
Probably-irrelevant details: Dell PowerEdge 2300 with AMI MegaRAID;
kernel built from the Debian kernel-source-2.2.13_2.2.13-2 package,
to which I added freeswan-1.1 - otherwise a vanilla Debian 2.1 system;
eth0 and eth1 are eepro100 (module), though only eth0 is up;
CONFIG_M686, CONFIG_X86_GOOD_APIC, CONFIG_1GB, CONFIG_MTRR, CONFIG_SMP.
Of course, further config details are available on request.
I'd really appreciate any assistance - this is interrupting our services,
as well as my so-called holiday.
Thanks,
Mark.