<!-- received="Wed Oct 13 04:29:11 1999 EET DST" -->
<!-- sent="Tue, 12 Oct 1999 18:25:42 -0700 (PDT)" -->
<!-- name="Kanoj Sarcar" -->
<!-- email="kanoj@google.engr.sgi.com" -->
<!-- subject="Re: locking question: do_mmap(), do_munmap()" -->
<!-- id="199910130125.SAA66579@google.engr.sgi.com" -->
<!-- inreplyto="14338.25466.233239.59715@dukat.scot.redhat.com" -->
<title>Linux-kernel mailing list archive 1999-41,: Re: locking question: do_mmap(), do_munmap()</title>
<body bgcolor="#FFFFFF"><font face="Arial,Helvetica">
<h1>Re: locking question: do_mmap(), do_munmap()</h1>
<b>Kanoj Sarcar</b> (<a href="mailto:kanoj@google.engr.sgi.com"><i>kanoj@google.engr.sgi.com</i></a>)<br>
<i>Tue, 12 Oct 1999 18:25:42 -0700 (PDT)</i>
<p>
<ul>
<li> <b>Messages sorted by:</b> <a href="date.html#549">[ date ]</a><a href="index.html#549">[ thread ]</a><a href="subject.html#549">[ subject ]</a><a href="author.html#549">[ author ]</a>
<!-- next="start" -->
<li> <b>Next message:</b> <a href="0550.html">Artur Skawina: "[PATCH] scheduler cleanup"</a>
<li> <b>Previous message:</b> <a href="0548.html">Eric Dittman: "2.3.21 APIC error interrupts"</a>
<!-- nextthread="start" -->
<li> <b>Next in thread:</b> <a href="0606.html">Stephen C. Tweedie: "Re: locking question: do_mmap(), do_munmap()"</a>
<li> <b>Reply:</b> <a href="0606.html">Stephen C. Tweedie: "Re: locking question: do_mmap(), do_munmap()"</a>
<!-- reply="end" -->
</ul>
<hr>
<!-- body="start" -->
<i>&gt; </i><br>
<i>&gt; Hi,</i><br>
<i>&gt; </i><br>
<i>&gt; On Mon, 11 Oct 1999 12:07:08 -0700 (PDT), <a href="mailto:kanoj@google.engr.sgi.com">kanoj@google.engr.sgi.com</a></i><br>
<i>&gt; (Kanoj Sarcar) said:</i><br>
<i>&gt; </i><br>
<i>&gt; &gt;&gt; What about something like a rw-semaphore which protects the vma list:</i><br>
<i>&gt; &gt;&gt; vma-list modifiers [ie merge_segments(), insert_vm_struct() and</i><br>
<i>&gt; &gt;&gt; do_munmap()] grab it exclusive, swapper grabs it "shared, starve</i><br>
<i>&gt; &gt;&gt; exclusive".</i><br>
<i>&gt; &gt;&gt; All other vma-list readers are protected by mm-&gt;mmap_sem.</i><br>
<i>&gt; </i><br>
<i>&gt; It will deadlock.</i><br>
<i>&gt; </i><br>
<p>
Not sure why you say that. Yes, I did see the nesting semdaphore grabbing<br>
scenario that you posted, but I think you can prevent that from <br>
happening if you follow some rules. <br>
<p>
Here's a primitive patch showing the direction I am thinking of. I do not<br>
have any problem with a spinning lock, but I coded this against 2.2.10,<br>
where insert_vm_struct could go to sleep, hence I had to use sleeping<br>
locks to protect the vma chain. Slight change is needed to vmscan.c<br>
to use spinning locks.<br>
<p>
Kanoj<br>
<p>
This is a skeleton of the solution that prevents kswapd from walking<br>
down a vma chain without protections. I am trying to get comments on<br>
this approach before I try a full blown implementation.<br>
<p>
The rules:<br>
1. To modify the vmlist (add/delete), you must hold mmap_sem to <br>
guard against clones doing mmap/munmap/faults, (ie all vm system <br>
calls and faults), and from ptrace, swapin due to swap deletion<br>
etc.<br>
2. To modify the vmlist (add/delete), you must also hold<br>
vmlist_modify_lock, to guard against page stealers scanning the<br>
list.<br>
3. To scan the vmlist, you must either <br>
	a. grab mmap_sem, which should be all cases except page stealer.<br>
or<br>
	b. grab vmlist_access_lock, only done by page stealer.<br>
4. While holding the vmlist_modify_lock, you must be able to guarantee<br>
that no code path will lead to page stealing.<br>
5. You must be able to guarantee that while holding vmlist_modify_lock<br>
or vmlist_access_lock of mm A, you will not try to get either lock<br>
for mm B.<br>
<p>
The assumptions:<br>
1. No code path reachable thru insert_vm_struct and merge_segments<br>
sleep for memory.<br>
<p>
<p>
--- /usr/tmp/p_rdiff_a00368/vmscan.c	Tue Oct 12 17:58:50 1999<br>
+++ mm/vmscan.c	Tue Oct 12 16:55:13 1999<br>
@@ -295,6 +295,7 @@<br>
 	/*<br>
 	 * Find the proper vm-area<br>
 	 */<br>
+	vmlist_access_lock(mm);<br>
 	vma = find_vma(mm, address);<br>
 	if (vma) {<br>
 		if (address &lt; vma-&gt;vm_start)<br>
@@ -302,8 +303,10 @@<br>
 <br>
 		for (;;) {<br>
 			int result = swap_out_vma(vma, address, gfp_mask);<br>
-			if (result)<br>
+			if (result) {<br>
+				vmlist_access_unlock(mm);<br>
 				return result;<br>
+			}<br>
 			vma = vma-&gt;vm_next;<br>
 			if (!vma)<br>
 				break;<br>
@@ -310,6 +313,7 @@<br>
 			address = vma-&gt;vm_start;<br>
 		}<br>
 	}<br>
+	vmlist_access_unlock(mm);<br>
 <br>
 	/* We didn't find anything for the process */<br>
 	mm-&gt;swap_cnt = 0;<br>
--- /usr/tmp/p_rdiff_a0036H/mlock.c	Tue Oct 12 17:59:06 1999<br>
+++ mm/mlock.c	Tue Oct 12 16:35:25 1999<br>
@@ -13,7 +13,9 @@<br>
 <br>
 static inline int mlock_fixup_all(struct vm_area_struct * vma, int newflags)<br>
 {<br>
+	vmlist_modify_lock(vma-&gt;vm_mm);<br>
 	vma-&gt;vm_flags = newflags;<br>
+	vmlist_modify_unlock(vma-&gt;vm_mm);<br>
 	return 0;<br>
 }<br>
 <br>
@@ -26,15 +28,17 @@<br>
 	if (!n)<br>
 		return -EAGAIN;<br>
 	*n = *vma;<br>
-	vma-&gt;vm_start = end;<br>
 	n-&gt;vm_end = end;<br>
-	vma-&gt;vm_offset += vma-&gt;vm_start - n-&gt;vm_start;<br>
 	n-&gt;vm_flags = newflags;<br>
 	if (n-&gt;vm_file)<br>
 		get_file(n-&gt;vm_file);<br>
 	if (n-&gt;vm_ops &amp;&amp; n-&gt;vm_ops-&gt;open)<br>
 		n-&gt;vm_ops-&gt;open(n);<br>
+	vmlist_modify_lock(vma-&gt;vm_mm);<br>
+	vma-&gt;vm_offset += end - vma-&gt;vm_start;<br>
+	vma-&gt;vm_start = end;<br>
 	insert_vm_struct(current-&gt;mm, n);<br>
+	vmlist_modify_unlock(vma-&gt;vm_mm);<br>
 	return 0;<br>
 }<br>
 <br>
@@ -47,7 +51,6 @@<br>
 	if (!n)<br>
 		return -EAGAIN;<br>
 	*n = *vma;<br>
-	vma-&gt;vm_end = start;<br>
 	n-&gt;vm_start = start;<br>
 	n-&gt;vm_offset += n-&gt;vm_start - vma-&gt;vm_start;<br>
 	n-&gt;vm_flags = newflags;<br>
@@ -55,7 +58,10 @@<br>
 		get_file(n-&gt;vm_file);<br>
 	if (n-&gt;vm_ops &amp;&amp; n-&gt;vm_ops-&gt;open)<br>
 		n-&gt;vm_ops-&gt;open(n);<br>
+	vmlist_modify_lock(vma-&gt;vm_mm);<br>
+	vma-&gt;vm_end = start;<br>
 	insert_vm_struct(current-&gt;mm, n);<br>
+	vmlist_modify_unlock(vma-&gt;vm_mm);<br>
 	return 0;<br>
 }<br>
 <br>
@@ -75,10 +81,7 @@<br>
 	*left = *vma;<br>
 	*right = *vma;<br>
 	left-&gt;vm_end = start;<br>
-	vma-&gt;vm_start = start;<br>
-	vma-&gt;vm_end = end;<br>
 	right-&gt;vm_start = end;<br>
-	vma-&gt;vm_offset += vma-&gt;vm_start - left-&gt;vm_start;<br>
 	right-&gt;vm_offset += right-&gt;vm_start - left-&gt;vm_start;<br>
 	vma-&gt;vm_flags = newflags;<br>
 	if (vma-&gt;vm_file)<br>
@@ -88,8 +91,14 @@<br>
 		vma-&gt;vm_ops-&gt;open(left);<br>
 		vma-&gt;vm_ops-&gt;open(right);<br>
 	}<br>
+	vmlist_modify_lock(vma-&gt;vm_mm);<br>
+	vma-&gt;vm_offset += start - vma-&gt;vm_start;<br>
+	vma-&gt;vm_start = start;<br>
+	vma-&gt;vm_end = end;<br>
+	vma-&gt;vm_flags = newflags;<br>
 	insert_vm_struct(current-&gt;mm, left);<br>
 	insert_vm_struct(current-&gt;mm, right);<br>
+	vmlist_modify_unlock(vma-&gt;vm_mm);<br>
 	return 0;<br>
 }<br>
 <br>
@@ -168,7 +177,9 @@<br>
 			break;<br>
 		}<br>
 	}<br>
+	vmlist_modify_lock(current-&gt;mm);<br>
 	merge_segments(current-&gt;mm, start, end);<br>
+	vmlist_modify_unlock(current-&gt;mm);<br>
 	return error;<br>
 }<br>
 <br>
@@ -240,7 +251,9 @@<br>
 		if (error)<br>
 			break;<br>
 	}<br>
+	vmlist_modify_lock(current-&gt;mm);<br>
 	merge_segments(current-&gt;mm, 0, TASK_SIZE);<br>
+	vmlist_modify_unlock(current-&gt;mm);<br>
 	return error;<br>
 }<br>
 <br>
<p>
-<br>
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in<br>
the body of a message to majordomo@vger.rutgers.edu<br>
Please read the FAQ at <a href="http://www.tux.org/lkml/">http://www.tux.org/lkml/</a><br>
<!-- body="end" -->
<hr>
<p>
<ul>
<!-- next="start" -->
<li> <b>Next message:</b> <a href="0550.html">Artur Skawina: "[PATCH] scheduler cleanup"</a>
<li> <b>Previous message:</b> <a href="0548.html">Eric Dittman: "2.3.21 APIC error interrupts"</a>
<!-- nextthread="start" -->
<li> <b>Next in thread:</b> <a href="0606.html">Stephen C. Tweedie: "Re: locking question: do_mmap(), do_munmap()"</a>
<li> <b>Reply:</b> <a href="0606.html">Stephen C. Tweedie: "Re: locking question: do_mmap(), do_munmap()"</a>
<!-- reply="end" -->
</ul>
</font></body>
