Re: [PATCH][2.5] provisional 32-way x445 patches

James Cleverdon (jamesclv@us.ibm.com)
Mon, 2 Jun 2003 18:36:14 -0700


On Saturday 24 May 2003 02:03 pm, Zwane Mwaikambo wrote:
> On Sat, 24 May 2003, Zwane Mwaikambo wrote:
> > Can't you just skip that one (-ENOSPC)? It would oops on 32way NUMAQ. Why
>
> crackpipe alert!! i didn't notice your s/int/u8/ change, regardless how
> are you handling irqs > 256? The code in 2.5.68 simply overwrites previous
> vector entries in your IDT when it runs out.
>
> Zwane

Back from holiday.

This kludge doesn't change any IDT behavior, so it is vulnerable to vector
exhaustion too. It just deals with large systems that have large I/O APICs.
Since we are indexing irq_vectors by the sum of all available I/O APIC RTEs
and not checking for overflow, we can get into trouble.

Some numbers:

* A 32-way x445 is made up of four 8-way chassis hooked together by
scalability cables.

* Each Summit chassis has 2 I/O APICs with 50 RTEs per. The BIOS guys are
trying to help out by using some hardware to only use one I/O APIC for all
but the boot chassis.

* Each RXE100 PCI expansion box contains one or two I/O APICs with 50 RTEs
each. Every chassis can have one RXE100.

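Even without PCI expansion boxes, that is 5 I/O APICs (2 in the boot chassis
plus 1 in each of the other three), and 5 * 50 == 250, which is > 224, the
current number of elements in irq_vectors. The kernel overflows irq_vectors
and dies.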
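
The arithmetic above can be sketched as a small standalone C snippet. This is
userspace illustration only, not kernel code: the helper names total_rtes()
and overflows_irq_vectors() are hypothetical, and the constants are taken
from the numbers in this message.

```c
/* Numbers quoted in the message: 50 RTEs per I/O APIC, and an
 * irq_vectors array of 224 elements before the kludge. */
enum {
    RTES_PER_IOAPIC  = 50,
    IRQ_VECTORS_SIZE = 224
};

/* Hypothetical helper: total RTEs in a multi-chassis system.
 * Assumes the boot chassis exposes 2 I/O APICs and every other
 * chassis exposes 1, as described above. rxe100_ioapics is the
 * number of I/O APICs contributed by RXE100 expansion boxes. */
static int total_rtes(int chassis, int rxe100_ioapics)
{
    int ioapics = (chassis > 0) ? 2 + (chassis - 1) : 0;

    return (ioapics + rxe100_ioapics) * RTES_PER_IOAPIC;
}

/* Nonzero when indexing by RTE count would run past the array. */
static int overflows_irq_vectors(int rtes)
{
    return rtes > IRQ_VECTORS_SIZE;
}
```

For example, total_rtes(4, 0) gives 250 for a bare 32-way, and
total_rtes(4, 8) gives 650 if every chassis had an RXE100 with two I/O APICs.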

Since the values stuffed into irq_vectors range from 0x31 to 0xF8, they easily
fit into a byte. As a quick kludge, I changed the type of irq_vectors from int
to u8 and quadrupled the number of elements. With 896 elements in the array,
the system survived and ran.
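
A minimal sketch of the byte-sized table, again as standalone C rather than
the actual patch. The table name, record_vector(), and the bounds check are
assumptions added for illustration; the kludge described above does not check
for overflow. Assuming a 4-byte int, 896 one-byte elements take the same
space as the original 224 ints.

```c
#include <errno.h>
#include <stdint.h>

#define OLD_ENTRIES 224
#define NEW_ENTRIES (4 * OLD_ENTRIES)   /* 896 elements after the kludge */

/* Vector values named in the message all fit in one byte. */
#define FIRST_VECTOR 0x31
#define LAST_VECTOR  0xF8

/* Hypothetical stand-in for irq_vectors, one byte per element. */
static uint8_t irq_vector_table[NEW_ENTRIES];

/* Hypothetical helper: store a vector, refusing out-of-range
 * indexes instead of silently writing past the array. */
static int record_vector(unsigned int irq, unsigned int vector)
{
    if (irq >= NEW_ENTRIES)
        return -ENOSPC;
    if (vector < FIRST_VECTOR || vector > LAST_VECTOR)
        return -EINVAL;

    irq_vector_table[irq] = (uint8_t)vector;
    return 0;
}
```

With this check, index 249 (the last RTE of a bare 32-way) is accepted, while
index 896 is rejected with -ENOSPC.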

For a real fix, irq_vectors should be dynamically allocated. But then, I
should port the dynamic MAX_MP_BUSSES patch from 2.4 to 2.5 anyway....
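
One way the dynamic allocation could look, sketched in userspace C with
calloc(). Everything here is hypothetical: struct vector_table and its
helpers are invented for illustration, and a kernel version would need an
allocator that is usable at the point where the I/O APICs are enumerated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table sized from the RTE count found at boot. */
struct vector_table {
    uint8_t *vec;   /* one vector byte per RTE */
    size_t   nr;    /* number of elements in vec */
};

/* Allocate a zeroed table with one element per RTE.
 * Returns 0 on success, -1 if the allocation fails. */
static int vector_table_init(struct vector_table *t, size_t nr_rtes)
{
    t->vec = calloc(nr_rtes, sizeof(*t->vec));
    if (t->vec == NULL && nr_rtes != 0) {
        t->nr = 0;
        return -1;
    }
    t->nr = nr_rtes;
    return 0;
}

/* Release the table and leave it in an empty state. */
static void vector_table_free(struct vector_table *t)
{
    free(t->vec);
    t->vec = NULL;
    t->nr = 0;
}
```

Sizing the table from the sum of RTEs actually present means no fixed
constant has to be guessed in advance.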

-- 
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com
