Re: Athlon/AGP issue update

Albert D. Cahalan (acahalan@cs.uml.edu)
Wed, 23 Jan 2002 11:31:38 -0500 (EST)


David S. Miller writes:
> From: benh@kernel.crashing.org

>> The workaround here would be for AGP to also _unmap_ the AGP pages from
>> the main kernel mapping, which isn't always possible, for example on PPC
>> we use the BATs to map the kernel lowmem, we can't easily make "holes" in
>> a BAT mapping. That's one reason why I did some experiments to make the
>> PPC kernel able to disable its BAT mapping.
>
> This would be impossible on sparc64 too, since we implement these
> mappings statically with an add instruction in the TLB handler.
>
> But we also lack AGP on sparc64 so...
>
> I don't think your PPC case needs the kernel mappings messed with.
> I really doubt the PPC will speculatively fetch/store to a TLB
> missing address.... unless you guys have large TLB mappings on
> PPC too?

Yup, we do.

The PPC has a regular TLB for 4 kB pages, typically loaded
by a hardware hash-table lookup. It also has the BAT registers,
which act as a 4-entry software-reloaded TLB for large mappings.

Early-stage MMU operations go like this:

1. simultaneous lookup in BAT registers and regular TLB
2. use BAT mapping if found
3. use TLB entry if found
4. proceed to page table lookup

So, if a speculative load/store operation happens in kernel memory,
it will definitely not be impeded by any TLB or page restrictions.
The regular TLB is simply ignored when there is a BAT hit.
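
To make the priority order concrete, here is a rough C model of the
lookup steps listed above. It is only a sketch: the struct layouts, the
names (bat_entry, tlb_entry, translate) and the linear TLB search are
made up for illustration and do not match the real register formats.
The real hardware checks the BATs and the TLB at the same time; the
model checks them in sequence, which gives the same result because a
BAT hit always wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BATS   4    /* the 4 BAT entries mentioned above */
#define PAGE_SHIFT 12   /* 4 kB pages */

/* Hypothetical, simplified BAT entry: one naturally aligned block. */
struct bat_entry {
    bool     valid;
    uint32_t ea_base;   /* effective address of the block start */
    uint32_t size;      /* block length in bytes, a power of two */
    uint32_t pa_base;   /* physical address of the block start */
};

/* Hypothetical TLB entry covering one 4 kB page. */
struct tlb_entry {
    bool     valid;
    uint32_t epn;       /* effective page number */
    uint32_t rpn;       /* real (physical) page number */
};

enum xlate_source { XLATE_BAT, XLATE_TLB, XLATE_TABLE_WALK };

/*
 * Translate effective address `ea`. On a BAT or TLB hit, store the
 * physical address in *pa. On a miss in both, leave *pa untouched and
 * report that a page table lookup is needed.
 */
enum xlate_source translate(const struct bat_entry bats[NUM_BATS],
                            const struct tlb_entry *tlb, size_t tlb_len,
                            uint32_t ea, uint32_t *pa)
{
    assert(pa != NULL);

    /* Steps 1-2: a BAT hit wins; the TLB result is ignored. */
    for (size_t i = 0; i < NUM_BATS; i++) {
        uint32_t offset_mask = bats[i].size - 1;
        if (bats[i].valid && (ea & ~offset_mask) == bats[i].ea_base) {
            *pa = bats[i].pa_base | (ea & offset_mask);
            return XLATE_BAT;
        }
    }

    /* Step 3: no BAT hit, so use a matching TLB entry if there is one. */
    for (size_t i = 0; i < tlb_len; i++) {
        if (tlb[i].valid && tlb[i].epn == (ea >> PAGE_SHIFT)) {
            *pa = (tlb[i].rpn << PAGE_SHIFT)
                | (ea & ((1u << PAGE_SHIFT) - 1));
            return XLATE_TLB;
        }
    }

    /* Step 4: miss in both; proceed to the page table lookup. */
    return XLATE_TABLE_WALK;
}
```

Note that an address covered by both a BAT and a TLB entry comes back
as XLATE_BAT, so whatever protection the TLB entry carries never gets
a chance to apply.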

That leaves two things required for the problem:

1. Do speculative stores load cache lines with the dirty bit set?
2. Is AGP non-coherent?

In the MPC7400 (first "G4") user's manual, I find no indication
that speculative stores occur at all. Motorola's manuals are
horrible though, so who knows...

AGP might be non-coherent. If so, then the CPU should use a
non-coherent mapping to avoid useless memory bus traffic.
User code has access to some cache control instructions,
so one may mark the memory cacheable for better performance
even when it is non-coherent. ("flush when you're done")

BTW, I'd say the Athlon is pretty broken to set the dirty bit
before a store is certain. The CPU has to be able to set this
bit on a clean cache line anyway, so I don't see how this
brokenness could help performance. Indeed, it hurts performance
by causing erroneous memory bus traffic. (It's a bug.)
