PROBLEM: 2.2.18 oops leaves umount hung in disk sleep

Camm Maguire (camm@enhanced.com)
Thu, 22 Mar 2001 12:13:29 -0500


Please reply directly as I'm in ECN exile from this mailing list!

[1.] One line summary of the problem:

2.2.18 oops leaves umount hung in disk sleep

[2.] Full description of the problem/report:

Greetings! We have a backup server running 2.2.18,
nfs-kernel-server 0.1.9.1-1 (Debian), mount 2.10f-5.1, autofs
3.1.4-9. Two nights ago, the kernel had an oops when autofs
tried to umount an nfs-mounted filesystem, leaving the umount
process in an uninterruptible state.

15655 ? D 0:00 /bin/umount /mnt/i19

[3.] Keywords (i.e., modules, networking, kernel):

nfs, autofs, mount

[4.] Kernel version (from /proc/version):

intech9# cat /proc/version

Linux version 2.2.18-i586tsc (root@intech19) (gcc version 2.95.2
20000220 (Debian GNU/Linux)) #1 Tue Feb 13 14:31:34 EST 2001

[5.] Output of Oops.. message (if applicable) with symbolic information
resolved (see Documentation/oops-tracing.txt)

intech9# cat oo.txt

Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 02872000, %%cr3 = 02872000
*pde = 00000000
Oops: 0000
CPU: 0

intech9# ksymoops <oo.txt

ksymoops 2.3.4 on i586 2.2.18-i586tsc. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.2.18-i586tsc/ (default)
-m /boot/System.map-2.2.18-i586tsc (default)

Warning: You did not tell me where to find symbol information. I
will assume that the log matches the kernel and modules that are
running right now and I'll use the default options above for
symbol resolution. If the current kernel and/or modules do not
match the log, you can get more accurate output by telling me the
kernel version and where to find map, modules, ksyms etc.
ksymoops -h explains the options.

Warning (compare_maps): ksyms_base symbol
module_list_R__ver_module_list not found in System.map. Ignoring
ksyms_base entry

Unable to handle kernel NULL pointer dereference at virtual
address 00000000 current->tss.cr3 = 02872000, %%cr3 = 02872000
*pde = 00000000 Oops: 0000 CPU: 0

2 warnings issued. Results may not be reliable.

[6.] A small shell script or example program which triggers the
problem (if possible)

NA

[7.] Environment
[7.1.] Software (add the output of the ver_linux script here)

intech9# sh ./ver_linux
sh ./ver_linux
-- Versions installed: (if some fields are empty or looks
-- unusual then possibly you have very old versions)
Linux intech9 2.2.18-i586tsc #1 Tue Feb 13 14:31:34 EST 2001 i586 unknown
Kernel modules 2.4.1
Gnu C 2.95.2
Binutils 2.9.5.0.37
Linux C Library 2.1.3
Dynamic linker ldd: version 1.9.11
Procps 2.0.6
Mount 2.10f
Net-tools 2.05
Console-tools 0.2.3
Sh-utils 2.0
Modules Loaded nls_cp437 smbfs st ide-scsi scsi_mod nfsd parport_probe parport_pc lp parport nfs autofs lockd sunrpc ne2k-pci 8390 serial ide-disk ide-probe ide-mod

[7.2.] Processor information (from /proc/cpuinfo):

intech9# cat /proc/cpuinfo
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 8
model name : AMD-K6(tm) 3D processor
stepping : 12
cpu MHz : 451.019
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mmx 3dnow
bogomips : 901.12

[7.3.] Module information (from /proc/modules):

intech9# cat /proc/modules
cat /proc/modules
nls_cp437 3920 0 (autoclean)
smbfs 26144 0 (autoclean)
st 24160 0 (autoclean)
ide-scsi 7424 0 (autoclean)
scsi_mod 38160 2 (autoclean) [st ide-scsi]
nfsd 161952 8 (autoclean)
parport_probe 3408 0 (autoclean)
parport_pc 7408 1 (autoclean)
lp 5296 0 (autoclean) (unused)
parport 7584 1 (autoclean) [parport_probe parport_pc lp]
nfs 44480 0 (autoclean)
autofs 9136 4 (autoclean)
lockd 43312 1 (autoclean) [nfsd nfs]
sunrpc 58768 1 (autoclean) [nfsd nfs lockd]
ne2k-pci 4176 1 (autoclean)
8390 6144 0 (autoclean) [ne2k-pci]
serial 18016 0 (autoclean)
ide-disk 5736 8 (autoclean)
ide-probe 6116 0 (autoclean)
ide-mod 43384 8 (autoclean) [ide-scsi ide-disk ide-probe]

[7.4.] SCSI information (from /proc/scsi/scsi)

intech9# cat /proc/scsi/scsi
cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: CONNER Model: CTT8000-A Rev: 1.17
Type: Sequential-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: HP Model: COLORADO 14GB Rev: 4.00
Type: Sequential-Access ANSI SCSI revision: 02
intech9#

[7.5.] Other information that might be relevant to the problem
(please look in /proc and include all information that you
think to be relevant):

intech9# cat /proc/15655/status
cat /proc/15655/status
Name: umount
State: D (disk sleep)
Pid: 15655
PPid: 305
Uid: 0 0 0 0
Gid: 0 0 0 0
Groups:
VmSize: 1024 kB
VmLck: 0 kB
VmRSS: 408 kB
VmData: 32 kB
VmStk: 8 kB
VmExe: 36 kB
VmLib: 924 kB
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 8000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff

Here are the syslog excepts:

Mar 21 01:09:01 intech9 automount[15648]: running expiration on path /mnt
Mar 21 01:09:01 intech9 automount[15648]: expired /mnt/local
Mar 21 01:09:01 intech9 automount[15648]: expired /mnt/i19d
Mar 21 01:09:47 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:09:47 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:11:03 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:11:03 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:12:18 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:12:18 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:13:33 intech9 automount[305]: >> umount: /mnt/i19: device is busy
Mar 21 01:13:33 intech9 automount[305]: using kernel protocol version 3 on reawaken
Mar 21 01:14:01 intech9 automount[15653]: running expiration on path /mnt
Mar 21 01:14:01 intech9 automount[15653]: expired /mnt/local
Mar 21 01:14:01 intech9 automount[15653]: expired /mnt/i19d
Mar 21 01:14:49 intech9 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar 21 01:14:49 intech9 kernel: current->tss.cr3 = 02872000, %%cr3 = 02872000
Mar 21 01:14:49 intech9 kernel: *pde = 00000000
Mar 21 01:14:49 intech9 kernel: Oops: 0000
Mar 21 01:14:49 intech9 kernel: CPU: 0
Mar 21 01:14:49 intech9 automount[305]: using kernel protocol version 3 on reawaken

The nfs host i19 does not show the client as having an active mount at
present.

[X.] Other notes, patches, fixes, workarounds:

Andrea's vmglobal patch applied to kernel. PIII SSE enabling
patch also applied.

-- 
Camm Maguire			     			camm@enhanced.com
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/