Re: [PATCH][2.5] provisional 32-way x445 patches

Zwane Mwaikambo (zwane@linuxpower.ca)
Mon, 2 Jun 2003 22:44:08 -0400 (EDT)


On Mon, 2 Jun 2003, James Cleverdon wrote:

> Back from holiday.
>
> This kludge doesn't change any IDT behavior, so it is vulnerable to vector
> exhaustion too. It just deals with large systems that have large I/O APICs.
> Since we are indexing irq_vectors by the sum of all available I/O APIC RTEs
> and not checking for overflow, we can get into trouble.

Yeah i get the same with 2.5.70 on a 32way NUMAQ, basically we just keep
going regardless of what NR_IRQS is and then try and write garbage all
over the IDT. There appears to be something wrong with my patch as it gets
ignored each time i send it.

> Some numbers:
>
> * A 32-way x445 is made up of four 8-way chassis hooked together by
> scalability cables.
>
> * Each Summit chassis has 2 I/O APICs with 50 RTEs per. The BIOS guys are
> trying to help out by using some hardware to only use one I/O APIC for all
> but the boot chassis.
>
> * Each RXE100 PCI expansion box contains one or two I/O APICs with 50 RTEs
> each. Every chassis can have one RXE100.
>
> Even without PCI expansion boxes, 5 * 50 == 250 which is > 224. The kernel
> overflows irq_vectors and dies.
>
> Since the value stuffed into irq_vectors is 0x31 to 0xF8, it easily fits into
> a byte. As a quick kludge, I changed the type of irq_vectors and quadrupled
> the number. With 896 elements in the array, the system survived and ran.

Are you implying that the large array stopped your box from booting?

> For a real fix, irq_vectors should be dynamically allocated. But then, I
> should port the dynamic MAX_MP_BUSSES patch from 2.4 to 2.5 anyway....

Hmm dynamic irq_vectors sounds good, we could start off with a static
amount and then dynamically allocate once we start reaching the silly
numbers. Dynamic allocation for all might get interesting at early boot.

This is the patch i use right now to boot 2.5.70 on 32way NUMAQ without
disabling IOAPICs, however i do drop the overflow interrupts.

http://function.linuxpower.ca/patches/misc/patch-fix-8quad-mainline3

I also have another patch to just increase NR_IRQS, this was used on the
same 8quad and allowed full functionality of all the devices upto and
including node7

http://www.osdl.org/projects/numaqhwspprt/results/patch-numaq-highirq7

http://www.osdl.org/projects/numaqhwspprt/results/data/dmesg-32way-8quad-2.5.70-640irqs

But the static arrays aren't all that nice, do you plan on starting on the
dynamic allocation of NR_IRQS sized arrays sometime soon?

Thanks,
Zwane

-- 
function.linuxpower.ca
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/