Re: 2.5.74-mm1 fails to boot due to APIC trouble, 2.5.73mm3 works.

Martin J. Bligh (mbligh@aracnet.com)
Fri, 04 Jul 2003 13:37:56 -0700


>> Maybe not, but it looks like one. Maybe if you actually explain
>> what you're trying to fix, and why?
>> I think this kind of change deserves a better explanation that
>> "I'm right" ... that's my main objection.
>
> I'll try to be more verbose, then.

Thanks ... will help a lot ;-)

> On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
>> Explain. Not obvious to the casual observer.
>
> The function assigns physical APIC ID's to IO-APIC's. The loop is
> intended to iterate over the physical APIC ID space. 0xf is an
> inaccurate description of the upper bound on the physical APIC ID space.
> APIC_BROADCAST_ID is a more accurate upper bound.

OK, you're right. Is just confusing because it works as it is right
now ... even on a 32x system - however, that's only because Summit
doesn't actually run that region of code, and NUMA-Q ignores it.

> On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>>> APIC_BROADCAST_ID is an upper bound on valid physical APIC ID's as it
>>> is used in the code. That actually was commented in the patch.
>
> On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
>> I find it odd that this worked before then. Also seems to be a separate
>> issue from the rest of the patch. Is quite probably correct, is just
>> non-obvious in the context of the rest of the patch.
>
> I audited not only for usage of limited-width bitmaps for APIC ID
> spaces, but also improper bounds on iterations over APIC ID spaces.
> Things ran out of APIC ID's when phys_cpu_present_map was NR_CPUS
> wide. This patch makes the limits accurate to the hardware with
> the brute-force application of bitmaps. The semantic impact of
> dropping in a bitmap is very low. The issue that arose was that it
> wasn't wide enough, which was obvious enough to spot as a thinko
> without even testing.
>
>> Why is Summit 0xF, and bigsmp 0xFF then?
>
> Summit (and all other xAPIC-based subarches) should be 0xFF; I missed
> it in the sweep.

OK. Makes more sense now if both are that way.

> On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>>> Look at where it's used.
>
> On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
>> I did. Still unclear why you think this is correct, or what physical
>> apicids have to do with a function that maps from apicids to the
>> phys_cpu_present_map, which is a compact mapping of logical apicids
>> for NUMA-Q.
>> Sorry, but this needs more explanation.
>
> The bitmap width is sufficient. NUMA-Q abuses what everything else
> uses for physical APIC ID's (partly because of the BIOS). It so happens
> that the array is MAX_APICS wide, which suffices for NUMA-Q (and
> anything else that cares to use it).
>
> No. This was not written for or around NUMA-Q; it's meant for the
> io_apic.c loops and sparse physid wakeup on non-NUMA-Q machines.

OK, maybe this is just an extension of my earlier abuse - in which
case, let's just remove it. Was bad enough before, but now even I
can't understand it, and I wrote the damned thing.

So yes, it looks correct. I'll see if I can see a neat way to bury this
under the existing abstractions like Summit does ... I'm not sure it's
a good idea to have two different methods for this; that whole area of
code is getting horribly complicated ...

Thanks very much for the explanations,

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/