Re: 2.5.74-mm1 fails to boot due to APIC trouble, 2.5.73mm3 works.

William Lee Irwin III (wli@holomorphy.com)
Fri, 4 Jul 2003 13:07:46 -0700


On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>> It's not a cleanup, and it doesn't touch trailing whitespace etc.

On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
> Maybe not, but it looks like one. Maybe if you actually explain
> what you're trying to fix, and why?
> I think this kind of change deserves a better explanation that
> "I'm right" ... that's my main objection.

I'll try to be more verbose, then.

On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>> It is.

On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
> Explain. Not obvious to the casual observer.

The function assigns physical APIC ID's to IO-APIC's. The loop is
intended to iterate over the physical APIC ID space. 0xf is an
inaccurate description of the upper bound on the physical APIC ID space.
APIC_BROADCAST_ID is a more accurate upper bound.

On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>> APIC_BROADCAST_ID is an upper bound on valid physical APIC ID's as it
>> is used in the code. That actually was commented in the patch.

On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
> I find it odd that this worked before then. Also seems to be a separate
> issue from the rest of the patch. Is quite probably correct, is just
> non-obvious in the context of the rest of the patch.

I audited not only for usage of limited-width bitmaps for APIC ID
spaces, but also improper bounds on iterations over APIC ID spaces.
Things ran out of APIC ID's when phys_cpu_present_map was NR_CPUS
wide. This patch makes the limits accurate to the hardware with
the brute-force application of bitmaps. The semantic impact of
dropping in a bitmap is very low. The issue that arose was that it
wasn't wide enough, which was obvious enough to spot as a thinko
without even testing.

On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>> The change is correct, and I am not thinking of any such thing.
>> APIC_BROADCAST_ID's sole usage is for terminating loops over physical
>> APIC ID's while setting the physical APIC ID's of IO-APIC's.

On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
> Why is Summit 0xF, and bigsmp 0xFF then?

Summit (and all other xAPIC-based subarches) should be 0xFF; I missed
it in the sweep.

On Fri, Jul 04, 2003 at 02:35:31AM -0700, William Lee Irwin III wrote:
>> Look at where it's used.

On Fri, Jul 04, 2003 at 12:38:19PM -0700, Martin J. Bligh wrote:
> I did. Still unclear why you think this is correct, or what physical
> apicids have to do with a function that maps from apicids to the
> phys_cpu_present_map, which is a compact mapping of logical apicids
> for NUMA-Q.
> Sorry, but this needs more explanation.

The bitmap width is sufficient. NUMA-Q abuses what everything else
uses for physical APIC ID's (partly because of the BIOS). It so happens
that the array is MAX_APICS wide, which suffices for NUMA-Q (and
anything else that cares to use it).

No. This was not written for or around NUMA-Q; it's meant for the
io_apic.c loops and sparse physid wakeup on non-NUMA-Q machines.

-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/