Re: [patch] 2.4.19-pre10-ac2: O(1) scheduler merge, -A3.

Matthew Dobson (colpatch@us.ibm.com)
Tue, 18 Jun 2002 18:05:22 -0700


I'm looking at this right now, as it is definitely broken on our NUMA-Q
hardware when running in multiquad mode. It needs to respect clustered APIC
mode, so I'm working on it.

Cheers!

-Matt

William Lee Irwin III wrote:
> On Mon, Jun 17, 2002 at 11:00:26AM +0200, Ingo Molnar wrote:
>
>>irqbalance uses the set_ioapic_affinity() method to set affinity. The
>>clustered APIC code is broken if it doesnt handle this properly. (i dont
>>have such hardware so i cant tell, but it indeed doesnt appear to handle
>>this case properly.) By wrapping around at node boundary the irqbalance
>>code will work just fine.
>
>
> Perhaps a brief look at the code will help. Please forgive my
> non-preservation of whitespace as I cut and pasted it.
>
>
> static inline void balance_irq(int irq)
> {
> #if CONFIG_SMP
> irq_balance_t *entry = irq_balance + irq;
> unsigned long now = jiffies;
>
> if (unlikely(entry->timestamp != now)) {
> unsigned long allowed_mask;
> int random_number;
>
> rdtscl(random_number);
> random_number &= 1;
>
> allowed_mask = cpu_online_map & irq_affinity[irq];
> entry->timestamp = now;
> entry->cpu = move(entry->cpu, allowed_mask, now, random_number);
> set_ioapic_affinity(irq, 1 << entry->cpu);
> }
> #endif
> }
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 1 << entry->cpu
>
>
>
> This could be problematic ...
>
>
> static void set_ioapic_affinity (unsigned int irq, unsigned long mask)
> {
> unsigned long flags;
>
> /*
> * Only the first 8 bits are valid.
> */
> mask = mask << 24;
> spin_lock_irqsave(&ioapic_lock, flags);
> __DO_ACTION(1, = mask, )
> spin_unlock_irqrestore(&ioapic_lock, flags);
> }
>
>
> According to this, nothing over 8 cpu's can work as the cpu id is used
> as a shift into an 8-bit bitfield. Also,
>
>
> #define __DO_ACTION(R, ACTION, FINAL) \
> \
> { \
> int pin; \
> struct irq_pin_list *entry = irq_2_pin + irq; \
> \
> for (;;) { \
> unsigned int reg; \
> pin = entry->pin; \
> if (pin == -1) \
> break; \
> reg = io_apic_read(entry->apic, 0x10 + R + pin*2); \
> reg ACTION; \
> io_apic_modify(entry->apic, reg); \
> if (!entry->next) \
> break; \
> entry = irq_2_pin + entry->next; \
> } \
> FINAL; \
> }
>
> ACTION is supposed to be an assignment to reg; in clustered hierarchical
> destination format this is not a bitmask as assumed by 1 << entry->cpu.
>
>
> Matt, Mike, please comment.
>
>
> Cheers,
> Bill
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/