SCSI error handling bug in 2.3.36pre3?

Simon Kirby (sim@stormix.com)
Sun, 2 Jan 2000 12:22:10 -0500


I got an NMI lockup oops on my dual-processor box after playing around
with a CDRW that had an I/O error on it. I ran "cp /dev/cdrom /dev/null"
to check for errors, got some I/O error messages on the console, and
"cp" aborted. I did some other unrelated things, then went to a console
where /cdrom was the current working directory (and mounted), and typed
"w". The system paused after printing the "w" header. I was able to
switch to another console and back to see whether the system had locked
up: the switching worked, but I could not type at the shell in the other
console. Then the NMI oops message appeared, and nothing worked (no
sysrq, no console switching).

The oops seems to be SCSI related, but I don't see how "w" would have
touched the current directory, unless I'm missing something from the
strace output. The only thing active on my SCSI bus at this time was my
CDRW drive (my main disks are IDE).

Here is the oops:

NMI Watchdog detected LOCKUP on CPU0, registers:
CPU: 0
EIP: 0010:[<c0245bbd>]
EFLAGS: 00000002
eax: 00000000 ebx: c0329b44 ecx: ffffff1a edx: c02aac88
esi: c7dafa64 edi: c7fab000 ebp: c6d0dcf4 esp: c6d0dcb8
ds: 0018 es: 0018 ss: 0018
Process w (pid: 9013, stackpage=c6d0d000)
Stack: 00000282 00000082 c7fab000 c7dafa40 c6d0dcec 000003e8 00000005 c0329b44
00000000 00000282 c4babe00 c6d0dcec 00005401 0000001e 00000001 c4babe00
c02127c1 c7dafa40 00000000 00000000 c4babc98 c7dafa64 00000282 c7fab000
Call Trace: [<c02127c1>] [<c0211efa>] [<c020a8f1>] [<c020d6db>] [<c020d574>] [<c020dfa8>] [<c0158e56>]
[<c0212889>] [<c01d4795>] [<c013c8e2>] [<c0159b36>] [<c0159d0f>] [<c0147c23>] [<c0147e78>] [<c0147f3e>]
[<c0144089>] [<c010bf54>]
Code: 75 f7 e9 4a 41 fc ff f6 03 01 75 fb e9 60 42 fc ff f6 05 70
>>EIP; c0245bbd <sprintf+58ed/34d40> <=====
Trace; c02127c1 <scsi_io_completion+4f1/20b8>
Trace; c0211efa <scsi_mark_host_reset+5b6/838>
Trace; c020a8f1 <scsi_do_cmd+165/638>
Trace; c020d6db <scsi_register+507/6e8>
Trace; c020d574 <scsi_register+3a0/6e8>
Trace; c020dfa8 <scsi_ioctl+2c0/3c0>
Trace; c0158e56 <resetup_one_dev+3d22/d1a0>
Trace; c0212889 <scsi_io_completion+5b9/20b8>
Trace; c01d4795 <unplug_device+61/56c>
Trace; c013c8e2 <__wait_on_buffer+26e/764>
Trace; c0159b36 <resetup_one_dev+4a02/d1a0>
Trace; c0159d0f <resetup_one_dev+4bdb/d1a0>
Trace; c0147c23 <put_write_access+137/280>
Trace; c0147e78 <lookup_dentry+10c/1ac>
Trace; c0147f3e <__namei+26/d0>
Trace; c0144089 <block_fsync+3c1/10e8>
Trace; c010bf54 <__read_lock_failed+14c8/2da4>
Code; c0245bbd <sprintf+58ed/34d40>
00000000 <_EIP>:
Code; c0245bbd <sprintf+58ed/34d40> <=====
0: 75 f7 jne fffffff9 <_EIP+0xfffffff9> c0245bb6 <sprintf+58e6/34d40> <=====
Code; c0245bbf <sprintf+58ef/34d40>
2: e9 4a 41 fc ff jmp fffc4151 <_EIP+0xfffc4151> c0209d0e <scsi_allocate_device+66/584>
Code; c0245bc4 <sprintf+58f4/34d40>
7: f6 03 01 testb $0x1,(%ebx)
Code; c0245bc7 <sprintf+58f7/34d40>
a: 75 fb jne 7 <_EIP+0x7> c0245bc4 <sprintf+58f4/34d40>
Code; c0245bc9 <sprintf+58f9/34d40>
c: e9 60 42 fc ff jmp fffc4271 <_EIP+0xfffc4271> c0209e2e <scsi_allocate_device+186/584>
Code; c0245bce <sprintf+58fe/34d40>
11: f6 05 70 00 00 00 00 testb $0x0,0x70
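
For anyone reading the decoded code bytes above: the jump targets shown
there can be recomputed by hand from the raw bytes, since an x86 relative
jump lands at the address of the next instruction plus a signed
displacement. Below is a minimal sketch of that arithmetic in Python. The
helper name rel_target is made up for illustration, and the instruction
lengths (2 bytes for jne rel8, 5 bytes for jmp rel32) are taken from the
listing above. It also shows that the "jne" at offset a branches straight
back to the "testb" at offset 7, so those two instructions form a tight
loop that repeats for as long as bit 0 of the byte at (%ebx) is set.

```python
def rel_target(insn_addr, insn_len, disp, disp_bits):
    """Return the target of an x86 relative jump on a 32-bit machine.

    insn_addr: address of the jump instruction itself
    insn_len:  length of the jump instruction in bytes
    disp:      raw (unsigned) displacement as encoded in the instruction
    disp_bits: width of the displacement field (8 or 32)
    """
    sign_bit = 1 << (disp_bits - 1)
    signed_disp = (disp ^ sign_bit) - sign_bit  # sign-extend the displacement
    return (insn_addr + insn_len + signed_disp) & 0xFFFFFFFF


def le32(hex_bytes):
    """Decode four little-endian bytes given as a hex string."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


EIP = 0xC0245BBD  # from the ">>EIP;" line of the oops

# offset 0x0: 75 f7           jne rel8
jne_first = rel_target(EIP + 0x0, 2, 0xF7, 8)
# offset 0x2: e9 4a 41 fc ff  jmp rel32
jmp_first = rel_target(EIP + 0x2, 5, le32("4a41fcff"), 32)
# offset 0xa: 75 fb           jne rel8 (branches back to the testb at offset 7)
jne_loop = rel_target(EIP + 0xA, 2, 0xFB, 8)
# offset 0xc: e9 60 42 fc ff  jmp rel32
jmp_second = rel_target(EIP + 0xC, 5, le32("6042fcff"), 32)

for name, addr in [("jne_first", jne_first), ("jmp_first", jmp_first),
                   ("jne_loop", jne_loop), ("jmp_second", jmp_second)]:
    print(f"{name}: {addr:08x}")
```

The four values agree with the addresses ksymoops printed next to each
jump in the listing above.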

Simon-

[ Stormix Technologies Inc. ][ NetNation Communications Inc. ]
[ sim@stormix.com ][ sim@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]
