Re: [problem captured] Re: cerberus on 2.4.17-rc2 UP

Andrew Morton (akpm@zip.com.au)
Wed, 09 Jan 2002 01:04:01 -0800


Manfred Spraul wrote:
>
> > Yes, I can generate it at will on two quite different IDE machines
> > with the run-bash-shared-mapping script from
> > http://www.zip.com.au/~akpm/ext3-tools.tar.gz
>
> Could you apply the attached patch and try to reproduce it?

Nice patch.

> Enable CONFIG_DEBUG_SLAB.
>
> The patch poisons all objects I could find that might have something
> to do with the bug. (all slab caches, struct request, struct page,
> struct filp, partially struct buffer_head).
>
> My test box survives the run-bash-shared-mapping script (~30 min, 128
> MB memory).
>

Mine survives only a few minutes. Once it only lasted a second.
That's with mem=64m. It lasts much, much longer with more memory.

The patch, alas, sheds no light. I'll delve into it fairly soon,
I expect.
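Manfred's patch is not reproduced here. For readers unfamiliar with poisoning, the idea is to fill an object with a known byte pattern when it is freed and to verify that the pattern is still intact when the object is next handed out, so a stale writer shows up as a corrupted pattern. A minimal sketch, assuming a toy fixed-size pool in Python; the class name PoisonedPool and the POISON value are hypothetical and are not taken from the kernel's slab code:

```python
# Toy model of allocator poisoning. POISON and OBJ_SIZE are illustrative
# values chosen for this sketch, not the kernel's slab debugging constants.
POISON = 0x5A
OBJ_SIZE = 32


class PoisonedPool:
    """Tiny fixed-size object pool that poisons objects on free."""

    def __init__(self, nobjs: int, size: int = OBJ_SIZE):
        self.size = size
        # Every object starts out free, so it starts out poisoned.
        self.mem = [bytearray([POISON] * size) for _ in range(nobjs)]
        self.free = list(range(nobjs))

    def alloc(self) -> int:
        idx = self.free.pop()
        obj = self.mem[idx]
        # A free object must still hold the poison pattern; any other byte
        # means something wrote to it after it was freed.
        if any(b != POISON for b in obj):
            raise RuntimeError(f"object {idx} modified after free")
        return idx

    def release(self, idx: int) -> None:
        # Overwrite the whole object so stale readers see obvious garbage
        # and stale writers are caught by the check in alloc().
        self.mem[idx][:] = bytes([POISON]) * self.size
        self.free.append(idx)
```

Checking on allocation rather than on every access keeps the cost low, at the price of reporting corruption some time after the stray write actually happened.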

EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 212k freed
end_request: buffer-list destroyed
hda6: bad access: block=86256, count=-8
end_request: I/O error, dev 03:06 (hda), sector 86256
hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: lost interrupt

and:

end_request: buffer-list destroyed
hda6: bad access: block=93608, count=-8
end_request: I/O error, dev 03:06 (hda), sector 93608
hda6: bad access: block=93616, count=-16
end_request: I/O error, dev 03:06 (hda), sector 93616
hda6: bad access: block=93624, count=-24
end_request: I/O error, dev 03:06 (hda), sector 93624
hda6: bad access: block=93632, count=-32
end_request: I/O error, dev 03:06 (hda), sector 93632
hda6: bad access: block=93640, count=-40
end_request: I/O error, dev 03:06 (hda), sector 93640
hda6: bad access: block=93648, count=-48
end_request: I/O error, dev 03:06 (hda), sector 93648
hda6: bad access: block=93656, count=-56
end_request: I/O error, dev 03:06 (hda), sector 93656
hda6: bad access: block=93664, count=-64
end_request: I/O error, dev 03:06 (hda), sector 93664
hda6: bad access: block=93672, count=-72
end_request: I/O error, dev 03:06 (hda), sector 93672
hda6: bad access: block=93680, count=-80
end_request: I/O error, dev 03:06 (hda), sector 93680
hda6: bad access: block=93688, count=-88
end_request: I/O error, dev 03:06 (hda), sector 93688
hda6: bad access: block=93696, count=-96
end_request: I/O error, dev 03:06 (hda), sector 93696
hda6: bad access: block=93704, count=-104
end_request: I/O error, dev 03:06 (hda), sector 93704
hda: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hda: drive not ready for command
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt
hda: lost interrupt

Hmm. hda6 is the root filesystem. The test was hitting hda8
and hda5 (swap). The only activity on hda6 would be
a bit of pagein, maybe syslog. Hmm.

Always hda6:

end_request: buffer-list destroyed
hda6: bad access: block=90704, count=-8
end_request: I/O error, dev 03:06 (hda), sector 90704
hda6: bad access: block=90712, count=-16
end_request: I/O error, dev 03:06 (hda), sector 90712
hda6: bad access: block=90720, count=-24
end_request: I/O error, dev 03:06 (hda), sector 90720
hda6: bad access: block=90728, count=-32
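The "bad access" bursts above follow a regular pattern: within a burst, each message's block is 8 higher than the previous one and its count is 8 lower, starting from -8. The block values match the sector numbers in the end_request lines that follow them, so both appear to be in 512-byte sectors, which would make each step 4 KiB. A small Python sketch that extracts these lines and checks the pattern; parse_bad_access and check_burst are hypothetical helpers for reading the log, not kernel code:

```python
import re

# Matches lines such as "hda6: bad access: block=93608, count=-8".
BAD_ACCESS = re.compile(r"(\w+): bad access: block=(\d+), count=(-?\d+)")


def parse_bad_access(log: str) -> list[tuple[str, int, int]]:
    """Return (device, block, count) for every 'bad access' line in log."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in BAD_ACCESS.finditer(log)]


def check_burst(entries: list[tuple[str, int, int]], step: int = 8) -> bool:
    """True if blocks rise by `step` and counts fall by `step` per message,
    with the first count equal to -step."""
    if not entries:
        return False
    first_block = entries[0][1]
    for i, (_, block, count) in enumerate(entries):
        if block != first_block + i * step:
            return False
        if count != -(i + 1) * step:
            return False
    return True


# Excerpt copied from the third log above.
sample = """hda6: bad access: block=90704, count=-8
end_request: I/O error, dev 03:06 (hda), sector 90704
hda6: bad access: block=90712, count=-16
end_request: I/O error, dev 03:06 (hda), sector 90712
hda6: bad access: block=90720, count=-24
end_request: I/O error, dev 03:06 (hda), sector 90720
hda6: bad access: block=90728, count=-32
"""
entries = parse_bad_access(sample)
print(check_burst(entries))  # True
```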

Interestingly, 2.4.13-ac8 doesn't fail. Well, it does eventually
oops in do_IRQ()'s get_current(); %cr2 holds 0x4017a000.

That kernel has the new IDE drivers, but I've also seen the problem with
Andre's latest patches on PIIX and on VIA, and there are reports of it
on SCSI. And "buffer-list destroyed" is always the first message.
It doesn't feel like a driver problem. I'll go do a binary search
through some kernel revs.
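A binary search through kernel revisions is a bisection over an ordered list of releases. A minimal sketch, assuming a single regression (every revision after the first bad one also fails) and a caller-supplied test; first_bad is a hypothetical helper and the revision names are placeholders, not real kernel versions:

```python
from typing import Callable, Sequence


def first_bad(revs: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest failing revision.

    Assumes revs[0] is known good, revs[-1] is known bad, and that once a
    revision fails every later one fails too.
    """
    lo, hi = 0, len(revs) - 1  # invariant: revs[lo] good, revs[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):
            hi = mid  # failure already present at mid
        else:
            lo = mid  # failure introduced after mid
    return revs[hi]


revs = [f"rev{i}" for i in range(8)]
# Pretend, purely for illustration, that rev5 introduced the failure.
print(first_bad(revs, lambda r: int(r[3:]) >= 5))  # rev5
```

Because this failure is intermittent and depends on memory size, a "good" verdict should come from a run long enough, and with a small enough mem= setting, to make a false pass unlikely; a single misjudged revision sends the search down the wrong half.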
