2.5.75 - BUG in ide-iops.c, TCQ, AS, drive standby and i/o lockup

Ivan Gyurdiev (ivg2@cornell.edu)
Sun, 13 Jul 2003 12:03:33 -0400


In thread "2.5.75 does not boot - TCQ oops,"
I applied Jens Axboe's patch to boot a TCQ-enabled kernel.
That resulted in massive filesystem corruption for me, and as I further
discovered it only occurs 8-depth tcq queue (versus 32 when it works).
Elevator choice does not seem to matter for the bug.

The following occurs with a 32-depth tcq queue (tested my new xfs root)
with AS:

I noticed machine lockups on screensaver activation, and got an oops while
working with kmail, but did not write down. Machine froze.
Tried to reproduce with heavy i/o, but the kernel worked fine for a long time
and I was not successful. I decided the bug must be triggered by the 20
minute machine standby, and reduced it to 5 seconds. After attempted wake up
and disk read/write activity (may take several tries), I get either:

- Input/Output error on any shell command
- I/O freezes, but things like dmesg, ls on current directory (cached?)
reports of: hda: invalidating tag queue...ide_tcq_intr_timeout: timeout
waiting for completion interrupt

OR

The errors and oops (written by hand - disk was dead):
=========================================
hda: alt stat timeout
hda: rw queued: status=0xc0
hda: invalidating tag queue (1 commands)
hda: status_timeout: status=0xc0 { Busy }

hda: drive not ready for command
hda: status_timeout: status=0xc0

hda: DMA disabled
hda: drive not ready for command
hda: status_timeout: status=0x80

kernel BUG at drivers/ide/ide-iops.c: 1246!
CPU:0
EIP: 0060: [<c02bc2de>] Tainted: PF (my Nvidia module)
....
EIP is at do_reset1+0x2e/02x250
Process pdflush...
Call trace:
ide_do_reset + 0x17/0x20
idedisk_error+0x156/0x230
common_interrupt+0x18/0x20
ide_wait_stat+0xcc/0x130
start_request+0xc9/0x280
ide_do_request+0x26e/0x470
as_next_request+0x33/0x40
do_ide_request+0x1d/0x30
generic_unplug_device+0x78/0x80
blk_run_queues+0x7a/0xb0
xfs_iflush+0x1fd/0x510
xfs_inode_flush+0x298/0x2c0
generic_unplug_device+0x75/0x80
linvfs_writepage+0x0/0x110
linvfs_write_inode+0x32/0x40
write_inode+0x46/0x50
__sync_single_inode+0x1f9/0x230
writeback_inodes+0x4d/0xa0
wb_kupdate+0x9c/0x120
__pdflush+0xd2/0x1d0
pdflush+0x0/0x20
pdflush+0xf/0x20
wb_kupdate+0x0/0x120
kernel_thread_helper+0x0/0x10
kernel_thread_helper+0x5/0x1(?)

Code: 0f 0b de 04 73 90 3a c0 80 bf 3d 02 00 00 20 74 0c 8b 44 24

<3> ide_tcq_intr_timeout: timeout waiting for completion interrupt

hda: invalidating tag queue (0 commands)
hda: status_timeout: status=0x80 { Busy }

hda: invalidating tag queue (0 commands)
hda: status_timeout: status=0x80 { Busy }

end_request: I/O error, dev hda, sector 26387519
hda: drive not ready for command
hda: status_timeout: status=0x80 { Busy }

hda: drive not ready for command
note: pdflush[7] exited with preempt_count 1

ide_tcq_intr_timeout: timeout waiting for completion interrupt
hda: invalidating tag queue (0 commands)
hda: status_timeout: status=0x80 { Busy }

hda: drive not ready
hda: status_timeout: status=0x80 { Busy }

etc.......

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/