cow-ahead N pages for fault clustering

Antonio Vargas (wind@cocodriloo.com)
Mon, 14 Apr 2003 20:32:51 +0200


This is a MIME-formatted message. If you see this text it means that your
E-mail software does not support MIME-formatted messages.

--=_courier-20827-1050345008-0001-2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Mon, Apr 14, 2003 at 10:22:46AM -0700, Martin J. Bligh wrote:
> >> Ah, you probably don't want to do that ... it's very expensive. Moreover,
> >> if you exec 2ns later, all the effort will be wasted ... and it's very
> >> hard to deterministically predict whether you'll exec or not (stupid
> >> UNIX semantics). Doing it lazily is probably best, and as to "nodes
> >> would not have to reference the memory from others" - you're still
> >> doing that, you're just batching it on the front end.
> >
> > True... What about a vma-level COW-ahead just like we have a file-level
> > read-ahead, then? I mean batching the COW at unCOW-because-of-write time.
>
> That'd be interesting ... and you can test that on a UP box, is not just
> NUMA. Depends on the workload quite heavily, I suspect.
>
> > btw, COW-ahead sound really silly :)
>
> Yeah. So be sure to call it that if it works out ... we need more things
> like that ;-) Moooooo.

What about the attached one? I'm compiling it right now to test in UML :)

[ snip fake-NUMA-on-SMP discussion ]

Greets, Antonio.

--=_courier-20827-1050345008-0001-2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cow-ahead.patch"

mm/memory.c | 32 ++++++++++++++++++++++++++++----
1 files changed, 28 insertions(+), 4 deletions(-)

diff -puN mm/memory.c~cow-ahead mm/memory.c
--- 25/mm/memory.c~cow-ahead Mon Apr 14 20:08:44 2003
+++ 25-wind/mm/memory.c Mon Apr 14 20:26:17 2003
@@ -1452,7 +1452,7 @@ static int do_file_page(struct mm_struct
*/
static inline int handle_pte_fault(struct mm_struct *mm,
struct vm_area_struct * vma, unsigned long address,
- int write_access, pte_t *pte, pmd_t *pmd)
+ int write_access, pte_t *pte, pmd_t *pmd, int *cowahead)
{
pte_t entry;

@@ -1471,8 +1471,11 @@ static inline int handle_pte_fault(struc
}

if (write_access) {
- if (!pte_write(entry))
+ if (!pte_write(entry)) {
+ if(!cowahead)
+ *cowahead = 1;
return do_wp_page(mm, vma, address, pte, pmd, entry);
+ }

entry = pte_mkdirty(entry);
}
@@ -1492,6 +1495,17 @@ int handle_mm_fault(struct mm_struct *mm
pgd_t *pgd;
pmd_t *pmd;

+ int cowahead, i;
+ int retval, x;
+
+ /*
+ * Implement cow-ahead: copy-on-write several
+ * pages when we fault one of them
+ */
+
+ i = cowahead = 0;
+
+do_cowahead:
__set_current_state(TASK_RUNNING);
pgd = pgd_offset(mm, address);

@@ -1509,8 +1523,18 @@ int handle_mm_fault(struct mm_struct *mm

if (pmd) {
pte_t * pte = pte_alloc_map(mm, pmd, address);
- if (pte)
- return handle_pte_fault(mm, vma, address, write_access, pte, pmd);
+ if (!pte) break;
+
+ x = handle_pte_fault(mm, vma, address, write_access, pte, pmd, cowahead);
+ if(!i) retval = x;
+
+ i++;
+ address += PAGE_SIZE;
+
+ if(!cowahead || i >= 0 || address >= vma->vm_end)
+ return retval;
+
+ goto do_cowahead;
}
spin_unlock(&mm->page_table_lock);
return VM_FAULT_OOM;

_

--=_courier-20827-1050345008-0001-2--