Re: Undo aic7xxx changes (now rc7+aic20030603)

Stephan von Krawczynski (skraw@ithnet.com)
Wed, 11 Jun 2003 01:55:34 +0200


On Tue, 10 Jun 2003 14:15:58 -0400 (EDT)
Zwane Mwaikambo <zwane@linuxpower.ca> wrote:

> On Tue, 10 Jun 2003, Stephan von Krawczynski wrote:
>
> > The controller used is the second aic7xxx. The 31 interrupts on CPU0 have
> > occurred before the test. This setup fails during verify (data corruption).
> >
> > I would say that the interrupt code of the aic in itself is therefore ok
> > with SMP. If it were an SMP race condition inside the interrupt routine this
> > test should have been ok (as only one CPU is used).
>
> Thanks for verifying this, at least I know the problem isn't with
> interrupt routing in your specific case.
>
> Zwane

I guess your comment is a bit ahead of my tests. I just completed the test with
rc7+aic20030603 SMP, apic and maxcpus=1. It fails.
This means that the data corruption occurs even though the whole kernel uses
only one CPU.
I would therefore conclude that the corruption is only possible if the
standard code path itself is flaky in terms of data completeness per request.
Something like a broken synchronous action, a read request coming back
completed although it is in fact still running, or the like.
It may also be a misinterpretation of some kind of "action completed"
interrupt, or a single interrupt for multiple running actions with a mix-up of
the various causes.
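
To make the last idea concrete, here is a toy model of what such a mix-up
could look like. It is only a sketch and not aic7xxx code: the names
(ToyRequest, irq_complete_reported, irq_complete_all) and the four-tag setup
are made up for illustration. One handler completes only the tags the device
reported; the other treats a single interrupt as "everything outstanding is
done", so a request that is still running gets reported as complete.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    """Hypothetical per-tag request state (not a real driver structure)."""
    issued: bool = False      # command was sent to the device
    data_ready: bool = False  # device has really finished the transfer
    completed: bool = False   # driver reported completion upwards


def make_requests():
    """Four tags; tags 0 and 1 are in flight, only tag 0 has finished."""
    reqs = [ToyRequest() for _ in range(4)]
    reqs[0].issued = True
    reqs[1].issued = True
    reqs[0].data_ready = True
    return reqs


def irq_complete_reported(reqs, done_tags):
    """Complete only the tags the device listed for this interrupt."""
    for tag in done_tags:
        reqs[tag].completed = True


def irq_complete_all(reqs):
    """Assumed bug: one interrupt completes every outstanding request."""
    for req in reqs:
        if req.issued:
            req.completed = True


def premature(reqs):
    """Tags reported complete although their data has not arrived yet."""
    return [tag for tag, req in enumerate(reqs)
            if req.completed and not req.data_ready]


if __name__ == "__main__":
    reqs = make_requests()
    irq_complete_reported(reqs, [0])
    print("per-tag handler, premature tags:", premature(reqs))

    reqs = make_requests()
    irq_complete_all(reqs)
    print("complete-all handler, premature tags:", premature(reqs))
```

In this model the premature completion needs no second CPU at all, which is
why a failure with maxcpus=1 would fit such a cause.
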
To make sure it is not a problem in the SMP code path through the driver, I
have to check a UP kernel with apic support enabled. I will do this tomorrow.
If this is ok then things are simple, because it's nailed down to the SMP code
path without a concurrency cause.
Let's see ...

Regards,
Stephan
