Multiple crashes with 2.0.32

Daniel Rogers (rogersd@nanaimo.island.net)
Mon, 8 Dec 1997 23:34:16 -0800 (PST)


I've had a strange day. Three machines that had been running fine
with various 2.0.x kernels all crashed today. I have what looks like two
kernel dumps: one from the syslog, and the other (partial) that I typed in
from the screen of the locked machine. These must both have happened within
moments of each other.

Here's the first one (reformatted for email):

general protection: 0000
CPU: 0
EIP: 0010:[timer_bh+723/820]
EFLAGS: 00010006
eax: 0328e9fc ebx: 345e4d0a ecx: 5d3a3e55 edx: 2229225a
esi: 5f235a50 edi: 00000001 ebp: 001a6158 esp: 001a6144
ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=001a4220)
Stack: 00000001 ffffffff 00000001 00000001 001a6174 001c3cd0 00117df3 001a6174
001a61fc 00000000 00009000 0010a53f 05da811b 00000008 001a69d4 001a61fc
00000000 00009000 00000000 00000018 00000018 0000002b 00000018 fffffffe
Call Trace: [do_bottom_half+59/96] [handle_bottom_half+11/24]
[sys_idle+92/112] [system_call+85/124] [init+0/624]
[start_kernel+429/440] [it_real_fn+0/72] [schedule+564/652]
Code: 89 51 04 85 d2 74 02 89 0a c7 40 04 00 00 00 00 c7 00 00 00
Aiee, killing interrupt handler

And the second, with the ksymoops output:

EFLAGS: 00010002
eax: 001a9aeb ebx: 001a3b00 ecx: 001a69d4 edx: 00000000
esi: 00000106 edi: 00000002 ebp: 001a3b54 esp: 001a3b20
ds: 0018 es: 0018 fs: 002b gs: 0000 ss: 0018
Corrupted stack page
Process swapper (pid: 0, process nr: 0, stackpage=001a4220)
Stack: 001a3b24 00000010 0010c84d 00000000 001c3cf0 001a3b54 001a3b54 00000014
001a4000 00000000 0010b558 00000000 001a3b54 00000000 001a69d4 014fbc0c
00000014 001a4000 00000000 001a69d4 00000018 00000018 00000000 00000000
Call Trace: [<0010c84d>] [<0010b558>] [<001168d8>] [<00116c50>] [<0010aadf>] [<05000000>] [<04800000>]
[<00190018>] [<0010aef4>] [<0010aecc>] [<0010a710>] [<001125ab>] [<00117df3>] [<0010a53f>] [<00111736>]
[<00111ae9>] [<001111bc>] [<00111b32>] [<0019094b>] [<001111e9>] [<0010a710>] [<001111bc>] [<00111634>]
[<0011699b>] [<00116c7f>] [<0010aadf>] [<05000000>] [<04800000>] [<0010aec5>] [<0010ae88>] [<0010a710>]
[<0018cb35>] [<00165c94>] [<0017d254>] [<00188153>] [<0017442d>] [<0010c84d>] [<0010c17f>] [<04800066>]
[<00113d6f>] [<04800030>] [<0010aa77>] [<04800000>] [<05000000>] [<04800000>] [<00110018>] [<0010ac95>]
[<001113c7>] [<001111bc>] [<00173100>] [<0010a710>] [<04800000>]
Code: 00 84 37 14 00 b0 be 14 00 bc b7 14 00 a8 67 14 00 1c 98 14
Aiee, killing interrupt handler

Using `/System.map' to map addresses to symbols.

Trace: 10c84d <do_IRQ+2d/50>
Trace: 10b558 <fast_IRQ0_interrupt+58/80>
Trace: 1168d8 <exit_notify+18/1d8>
Trace: 116c50 <do_exit+1b8/1ec>
Trace: 10aadf <die_if_kernel+2b7/2c0>
Trace: 5000000
Trace: 4800000
Trace: 190018 <sprint_dev_config+2a8/4f8>
Trace: 10aef4 <do_general_protection+28/54>
Trace: 10aef4 <do_general_protection+28/54>
Trace: 10a710 <error_code+40/48>
Trace: 1125ab <timer_bh+2d3/334>
Trace: 117df3 <do_bottom_half+3b/60>
Trace: 10a53f <handle_bottom_half+b/18>
Trace: 111736 <schedule+16e/28c>
Trace: 111ae9 <__do_down+85/c0>
Trace: 1111bc <do_page_fault>
Trace: 111b32 <__down+e/14>
Trace: 19094b <down_failed+7/c>
Trace: 1111e9 <do_page_fault+2d/310>
Trace: 10a710 <error_code+40/48>
Trace: 1111e9 <do_page_fault+2d/310>
Trace: 111634 <schedule+6c/28c>
Trace: 11699b <exit_notify+db/1d8>
Trace: 116c7f <do_exit+1e7/1ec>
Trace: 10aadf <die_if_kernel+2b7/2c0>
Trace: 5000000
Trace: 4800000
Trace: 10aec5 <do_reserved+3d/44>
Trace: 10aec5 <do_reserved+3d/44>
Trace: 10a710 <error_code+40/48>
Trace: 18cb35 <rw_intr+245/4bc>
Trace: 165c94 <tty_check_change>
Trace: 17d254 <scsi_done+67c/688>
Trace: 188153 <aic7xxx_isr+357/53c>
Trace: 17442d <clear_selection+d/48>
Trace: 10c84d <do_IRQ+2d/50>
Trace: 10c17f <IRQ11_interrupt+5f/90>
Trace: 4800066
Trace: 113d6f <printk+117/130>
Trace: 4800030
Trace: 10aa77 <die_if_kernel+24f/2c0>
Trace: 4800000
Trace: 5000000
Trace: 4800000
Trace: 110018 <get_cpuinfo+148/1d4>
Trace: 10ac95 <do_invalid_op+3d/44>
Trace: 1113c7 <do_page_fault+20b/310>
Trace: 1113c7 <do_page_fault+20b/310>
Trace: 173100 <SHATransform+173c/1a8c>
Trace: 10a710 <error_code+40/48>
Trace: 4800000

Code: addb %al,0xbeb00014(%edi,%esi,1)
Code: adcb $0x0,%al
Code: movl $0xa80014b7,%esp
Code: addr16 adcb $0x0,%al
Code: sbbb $0x98,%al
Code: adcb $0x0,%al
Code: nop
Code: nop
Code: nop

The config of this machine:

CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KERNELD=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_ELF=y
CONFIG_KERNEL_ELF=y
CONFIG_M586=y
CONFIG_BLK_DEV_FD=m
CONFIG_BLK_DEV_LOOP=m
CONFIG_FIREWALL=y
CONFIG_NET_ALIAS=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
CONFIG_IP_FIREWALL=y
CONFIG_IP_FIREWALL_VERBOSE=y
CONFIG_IP_ACCT=y
CONFIG_IP_ALIAS=m
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
CONFIG_AIC7XXX_PAGE_ENABLE=y
CONFIG_AIC7XXX_RESET_DELAY=15
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
CONFIG_QUOTA=y
CONFIG_MINIX_FS=m
CONFIG_EXT_FS=m
CONFIG_EXT2_FS=y
CONFIG_XIA_FS=m
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_FS=y
CONFIG_NFS_FS=y
CONFIG_SMB_FS=m
CONFIG_SMB_WIN95=y
CONFIG_ISO9660_FS=m
CONFIG_HPFS_FS=m
CONFIG_SYSV_FS=m
CONFIG_UFS_FS=m
CONFIG_SERIAL=m
CONFIG_PRINTER=m
CONFIG_RTC=y

The other two machines had identical Ethernet setups, except that the 3c59x
driver had been rolled back one version to work with the 3c590. One of the
other machines has only IDE drives; the other has a 53c810.

All three of these machines had been up for about 15 days when this occurred,
and all run with fairly heavy network loads. One is a web server, one an
FTP server and the last (the one with the info above) a news server.

Please email me directly if you would like any more info.

Thanks.

Dan.