Re: 2.2.21pre2 oops

Ville Herva (vherva@niksula.hut.fi)
Wed, 9 Jan 2002 14:45:49 +0200


On Tue, Jan 08, 2002 at 10:13:15PM +0200, you [Ville Herva] claimed:
> On Tue, Jan 08, 2002 at 08:16:03PM +0000, you [Alan Cox] claimed:
> > > essentially cat /dev/md0 > /dev/null kind of test to stress the Via KT133
> > > pci transfers.
> > >
> > > Rootfs is on ide cdrom, the harddrives had no fs on them.
> > >
> > > ksymoops 0.7c on i686 2.2.21pre2-ide+e2compr+raid. Options
> > > used
> >
> > Can you repeat the test to make sure its replicable, then repeat it again
> > after disabling the new VIA fixups in pci/quirks.c
>
> The test has been repeated several times even with 2.2.21pre2 (although
> we've run a lot more 2.2.20 tests). This was the first time we saw an oops.
> The difference between this and the former 2.2.21pre2 runs is certain bios
> settings. (We are still trying to isolate the one setting that triggers the
> Via pci transfer corruption on HPT reads.) We'll repeat the test with these
> settings and try to see if it is via bios settings / pci/quirks.c related.
>
> There seems to be _something_ fishy in the pre2 quirks, since there is at
> least one bios setting combination with which 2.2.20 does not show the pci
> corruption, but 2.2.21pre2 does. It just that it is really tedious to
> isolate. But we are working on it.

Bleah.

It turned out that the hpt370 read/write test alone hadn't caused it. My
colleague had launched "ping -f" in the background, which had immediately
triggered the oops. (When I found the oops on the screen, I initially thought
he had just left the hpt370 read/write test running and gone.)

We rebooted and tried to reproduce it. ping -f didn't trigger it immediately,
but after a while it happened. We got a number of oopses, one of which was
similar to the first one and one of which showed process table corruption
(the name of the process in the oops was a random ascii pattern).

We also got the oops with 2.2.20+patches, so this is not a pre2 thing.
Rather, the difference is that we now ran ping -f in the background.
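
For anyone who wants to try the same combination of load, here is a rough
sketch in Python. The device path is the one from the original test; the ping
target address and the helper names are made up for illustration and are not
what we actually ran. ping -f needs root, and the run function is only
defined here, not called.

```python
import subprocess

def build_stress_commands(device="/dev/md0", ping_target="192.168.0.1"):
    """Return the two command lines: a sequential read of the raw
    device and a flood ping. ping_target is a placeholder address."""
    read_cmd = ["cat", device]
    ping_cmd = ["ping", "-f", ping_target]
    return read_cmd, ping_cmd

def run_stress(device="/dev/md0", ping_target="192.168.0.1"):
    """Start the flood ping in the background, then read the whole
    device and discard the data (cat device > /dev/null)."""
    read_cmd, ping_cmd = build_stress_commands(device, ping_target)
    ping = subprocess.Popen(ping_cmd)
    try:
        result = subprocess.run(read_cmd, stdout=subprocess.DEVNULL)
        return result.returncode
    finally:
        # Stop the flood ping once the read has finished or failed.
        ping.terminate()
        ping.wait()
```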

The bad news is that all the bios setting configurations we thought stable
(that had run the hpt370 read/write test without a hitch for days) now give
oopses and corruption pretty quickly when we run ping -f in the background :(.

Also, ping -f shows "...EEE.EE.EEE.." which I gather means the packets get
corrupted somewhere.

I'm not too hopeful regarding finding a set of bios settings that would fix
this. It seems the "stable" configuration we found just hid the problem, but
when we push the board further, it appears again.

The two disks on the HPT370, read in parallel, give about 60MB/s. Add the
10MB/s from the 3c905 to that, and we are pretty close to the 75MB/s figure
that I've seen cited somewhere(1) as the maximum the Via KT133 can do.
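
Spelled out, the arithmetic looks like this. It is only the sum of the figures
quoted above; it assumes both streams go over the same PCI path and ignores
any bus overhead, and the names are just for illustration.

```python
# Figures quoted above, in MB/s.
HPT370_READ = 60     # two disks read in parallel
NIC_3C905 = 10       # traffic from the 3c905
KT133_CEILING = 75   # maximum cited in (1)

def pci_load(*streams, ceiling=KT133_CEILING):
    """Return the combined rate and its fraction of the ceiling."""
    total = sum(streams)
    return total, total / ceiling

total, fraction = pci_load(HPT370_READ, NIC_3C905)
print(f"{total} MB/s of {KT133_CEILING} MB/s = {fraction:.0%}")
```

That puts the combined load at 70MB/s, a bit over nine tenths of the cited
maximum.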

My conclusion at this point is that Via KT133 / Abit KT7-RAID pci transfer
is positively FUBAR, and no sane person should touch the bugger with a ten
foot pole. I'd be happy to be proven wrong, though.

-- v --

v@iki.fi

(1) http://www.tecchannel.de/hardware/817/1.html