Re: 2.4.1-pre7 raid5syncd oops

junio@siamese.dhis.twinsun.com
16 Jan 2001 23:06:32 -0800


>>>>> "NB" == Neil Brown <neilb@cse.unsw.edu.au> writes:

NB> On January 16, junio@siamese.dhis.twinsun.com wrote:

NB> Or in short "this cannot happen" :-)

NB> Is there any chance of a memory error?

NB> Has this happened more than once?

I have to confess that I was not running with the stock
2.4.1-pre7 drivers/md/raid5.c; instead I compiled it with the
change in -ac9 tree, which checks the return value of
alloc_page() early (around line 160). Also I had your ``Desk
check'' patch (responding to mtew@cds.duke.edu's post) from
linux-raid list (around line 1075), and mingo's
hot-add/hot-remove fixes.

The symptom was very reproducible with that particular kernel
(essentially, early in the every reboot sequence I got the same
error). In the end, I had to futz with partition type to
disable autodetection of those offending raid-5 component
partitions to recover from the failure. Since then I reverted
back to the stock 2.4.1-pre7 driver with only the ``Desk check''
and mingo's hot-add/hot-remove patch, and have not seen the
problem again. It appears that using those early-null-check
code from -ac9 without understanding its implications was purely
my stupidity, but I still do not offhand see why that would
hurt...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/