Re: [PATCH] x86 page fault handler not interrupt safe

Alan Cox (alan@lxorguk.ukuu.org.uk)
Mon, 7 May 2001 11:45:34 +0100 (BST)


> Yes, we'll get a clobbered value, but we'll get a _valid_ clobbered value,
> and we'll just end up doing the fixups twice (and returning to the user
> process that didn't get the page it wanted, which will end up re-doing the
> page fault).

I dont see that we will get a valid value in both cases.

get_user
fault - set %cr2
IRQ
vmalloc
fault
set %cr2
fixup runs
end IRQ
cr2 is corrupt

At this point the Linus tree suffers from one problem because it gets parallel
fixups wrong, and the -ac tree suffers from a different problem because
it gets parallel fixups right and to handle that case wont allow a vmalloc
fixup on a fault from userspace (or you get infinite loops)

> [ Looks closer.. ]
>
> Actually, the second time we'd do the fixup we'd be unhappy, because it
> has already been done. That test should probably be removed. Hmm.

There are a whole set of races with the vmalloc fixups. Most are I think fixed
in the -ac bits if you want a look (I've not submitted them as they are not
too pretty). What I don't currently see is how you handle this without
looping forever or getting the SMP race fixup wrong.

(The current -ac fix for the double vmalloc races is below. WP test makes it
more complex than is nice)

--- /usr/src/LINUS/LINUX2.4/linux.245p1/arch/i386/mm/fault.c Wed May 2 13:52:04 2001
+++ /usr/src/LINUS/LINUX2.4/linux.ac/arch/i386/mm/fault.c Fri May 4 15:03:45 2001
@@ -127,8 +183,11 @@
* be in an interrupt or a critical region, and should
* only copy the information from the master page table,
* nothing more.
+ *
+ * Handle kernel vmalloc fill in faults. User space doesn't take
+ * this path. It isnt permitted to go poking up there.
*/
- if (address >= TASK_SIZE)
+ if (address >= TASK_SIZE && !(error_code & 4))
goto vmalloc_fault;

mm = tsk->mm;
@@ -325,7 +378,11 @@
*
* Do _not_ use "tsk" here. We might be inside
* an interrupt in the middle of a task switch..
+ *
+ * Note. There is 1 gotcha here. We may be doing the WP
+ * test. If so then fixing the pgd/pmd won't help.
*/
+
int offset = __pgd_offset(address);
pgd_t *pgd, *pgd_k;
pmd_t *pmd, *pmd_k;
@@ -344,7 +401,29 @@
pmd = pmd_offset(pgd, address);
pmd_k = pmd_offset(pgd_k, address);

- if (pmd_present(*pmd) || !pmd_present(*pmd_k))
+ /* If the pmd is present then we either have two cpus trying
+ * to fill in the vmalloc entries at once, or we have an
+ * exception. We can treat the collision as a slow path without
+ * worry. Its incredibly incredibly rare
+ *
+ * If the pte is read only then we know its a fault and we must
+ * exception or Oops as it would loop forever otherwise
+ */
+
+ if (pmd_present(*pmd))
+ {
+ pte_t *ptep = pte_offset(pmd, address);
+ if ((error_code & 2) && !pte_write(*ptep))
+ {
+ if ((fixup = search_exception_table(regs->eip)) != 0) {
+ regs->eip = fixup;
+ return;
+ }
+ goto bad_area_nosemaphore;
+ }
+ /* SMP race.. continue */
+ }
+ if (!pmd_present(*pmd_k))
goto bad_area_nosemaphore;
set_pmd(pmd, *pmd_k);
return;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/