Re: 2.5.74-mm1 fails to boot due to APIC trouble, 2.5.73mm3 works.

Martin J. Bligh (mbligh@aracnet.com)
Fri, 04 Jul 2003 09:18:12 -0700


> On Fri, 4 Jul 2003, Martin J. Bligh wrote:
>
>> Is it really necessary to turn half the apic code upside down in order
>> to fix this? What's the actual bugfix that's buried in this cleanup?
>
> The way i see it is that you can't use NR_CPUS to determine the upper
> bound on APIC IDs. e.g. my 3way is normally configured with NR_CPUS = 3
> but has APIC IDs of 0, 3 and 4. We need to make a distinction.

Fair enough. But that would seem to be a simpler operation than this patch.

>> > - if (i >= 0xf)
>> > + if (i >= APIC_BROADCAST_ID)
>>
>> Is that always correct? it's not equivalent.
>
> Well we really want APIC_MAX_ID (or whatever it's called)

Indeed. maybe MAX_PHYS_APIC_ID or something (it's different for logical).
We break it out in subarch, but it's the same everywhere, which seems
utterly useless - is probably historical cruft that needs to die.
But that sounds like a separate issue, and a separate patch to me.

>> > - for (bit = 0; kicked < NR_CPUS && bit < 8*sizeof(cpumask_t); bit++) {
>> > + for (bit = 0; kicked < NR_CPUS && bit < MAX_APICS; bit++) {
>>
>> Is that the actual one-line bugfix this is all about?
>
> No, the problem is no space for physical ids in cpumask bitmaps, this
> could manifest itself later on unless we fix it now.

Ugh, are you saying the cpumask stuff shrinks masks to < 32 bits if
NR_CPUS is low enough? If so, I can see more point to the patch, but
it still seems like violent overkill. Stopping it doing that would
probably fix it ... I can't imagine it buys you much.

phys_cpu_present_map started off as an unsigned long, and I reused it
in a fairly twisted way for NUMA-Q. As it's an array that's bounded
by apic space, using the bios_cpu_apicid method that summit uses
would be a much cleaner fix, and just leave the old one as a long
bitmask like it used to be - which is fine for non- clustered apic
systems, and saves inventing a whole new data type. See the
cpu_present_to_apicid abstraction.

>> Hmmmm. What are you using physical apicids here for? They seem
>> irrelevant to this function.
>
> Urgh, it's really hard to determine what these functions really want half
> the time. But that change does look wrong.

Yeah, things taking logical apicids, and turning them into cpu numbers
presumably shouldn't have to touch that.

M.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/