It is up.
> 
> > If the fix is to avoid page_launder in these cases then the number of
> > occurrences when an alloc_pages fails will go up. 
> 
> > I was attempting to come up with a way of making try_to_free_buffers
> > fail on buffers which are being processed in the generic_make_request
> > path by marking them, the problem is there is no single place to reset
> > the state of a buffer so that try_to_free_buffers will wait for it.
> > Doing it after the end of the loop in generic_make_request is race
> > prone to say the least.
> 
> I really want to fix things like this in 2.5. (ie not avoid the deadlock
> by completely avoiding physical IO, but avoid the deadlock by avoiding
> physical IO on the "device" which is doing the allocation)
> 
> Could you send me your code ? No problem if it does not work at all :)
> 
Well, the basic idea is simple, but I suspect the implementation might
rapidly become historical in 2.5. Basically I added a new buffer state bit,
BH_Clamped, although BH_Req looks like it could be cannibalized instead: no
one appears to check for it (is it really dead code?).

Using a flag to skip buffers in try_to_free_buffers is easy:

===========================================================================
Index: linux/fs/buffer.c
===========================================================================
--- /usr/tmp/TmpDir.3237-0/linux/fs/buffer.c_1.68	Sat Jun 30 12:56:29 2001
+++ linux/fs/buffer.c	Sat Jun 30 12:57:52 2001
@@ -2365,7 +2365,7 @@
 /*
  * Can the buffer be thrown out?
  */
-#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected))
+#define BUFFER_BUSY_BITS	((1<<BH_Dirty) | (1<<BH_Lock) | (1<<BH_Protected) | (1<<BH_Clamped))
 #define buffer_busy(bh)		(atomic_read(&(bh)->b_count) | ((bh)->b_state & BUFFER_BUSY_BITS))
 
 /*
@@ -2430,7 +2430,11 @@
 	spin_unlock(&free_list[index].lock);
 	write_unlock(&hash_table_lock);
 	spin_unlock(&lru_list_lock);
-	if (wait) {
+	/* Buffers in the middle of generic_make_request processing cannot
+	 * be waited for, they may be allocating memory right now and be
+	 * locked by this thread.
+	 */
+	if (wait && !buffer_clamped(tmp)) {
 		sync_page_buffers(bh, wait);
 		/* We waited synchronously, so we can free the buffers. */
 		if (wait > 1 && !loop) {
===========================================================================
Index: linux/include/linux/fs.h
===========================================================================
--- /usr/tmp/TmpDir.3237-0/linux/include/linux/fs.h_1.99	Sat Jun 30 12:56:29 2001
+++ linux/include/linux/fs.h	Sat Jun 30 07:05:37 2001
@@ -224,6 +224,8 @@
 	BH_Mapped,	/* 1 if the buffer has a disk mapping */
 	BH_New,		/* 1 if the buffer is new and not yet written out */
 	BH_Protected,	/* 1 if the buffer is protected */
+	BH_Clamped,	/* 1 if the buffer cannot be reclaimed
+			 * in its current state */
 	BH_Delay,	/* 1 if the buffer is delayed allocate */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
@@ -286,6 +288,7 @@
 #define buffer_mapped(bh)	__buffer_state(bh,Mapped)
 #define buffer_new(bh)		__buffer_state(bh,New)
 #define buffer_protected(bh)	__buffer_state(bh,Protected)
+#define buffer_clamped(bh)	__buffer_state(bh,Clamped)
 #define buffer_delay(bh)	__buffer_state(bh,Delay)
 
 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
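
To show what the extra bit in BUFFER_BUSY_BITS does, here is a small
userspace model of the check. It is only a sketch: struct model_bh and the
model_* functions are made up for illustration, a plain int stands in for
the kernel's atomic b_count, and the bit numbers are not the kernel's.

```c
/* Simplified userspace model; these are not the kernel's definitions. */
enum model_bh_state_bits {
	BH_Dirty,
	BH_Lock,
	BH_Protected,
	BH_Clamped,	/* the proposed "do not reclaim right now" bit */
};

struct model_bh {
	int b_count;		/* stands in for the kernel's atomic_t */
	unsigned long b_state;	/* bitmask of the state bits above */
};

#define BUFFER_BUSY_BITS \
	((1UL << BH_Dirty) | (1UL << BH_Lock) | \
	 (1UL << BH_Protected) | (1UL << BH_Clamped))

/* Nonzero if the buffer is referenced or has any busy bit set. */
int model_buffer_busy(const struct model_bh *bh)
{
	return bh->b_count != 0 || (bh->b_state & BUFFER_BUSY_BITS) != 0;
}

/* Nonzero if the proposed clamped bit is set. */
int model_buffer_clamped(const struct model_bh *bh)
{
	return (bh->b_state & (1UL << BH_Clamped)) != 0;
}
```

With the bit set, the buffer counts as busy even when it is clean, unlocked
and unreferenced, which is all the first hunk relies on.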

The tricky part, which I have not worked out how to do yet, is managing the
clearing of the state bit in all the correct places. You would have to set it
when the buffer is locked and I/O is about to start; it becomes clearable
after the last memory allocation in the I/O submission process. I do
not like the approach because there are so many paths a buffer can take
once you get into generic_make_request. At first I thought I could just
explicitly set and clear a flag around memory allocations such as the bounce
buffer path. However, that can lead to AB-BA deadlocks between multiple
threads submitting I/O requests. At this point I started to think I was
going to build an unmaintainable rat's nest and decided I had not found
the correct answer.
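
For the simplest straight-line case, the intended life cycle of the bit
would look roughly like the userspace sketch below. All the sketch_* names
and the SK_* bits are hypothetical; this is not code from the patch, and it
does nothing about the branching paths or the AB-BA problem described above.

```c
/* Userspace sketch of the flag's intended life cycle; names are made up. */
enum { SK_Lock, SK_Clamped };

struct sketch_bh {
	unsigned long b_state;
};

/* Buffer gets locked for I/O: the submitter may allocate memory from here. */
void sketch_begin_io(struct sketch_bh *bh)
{
	bh->b_state |= (1UL << SK_Lock) | (1UL << SK_Clamped);
}

/* The last allocation in the submission path has finished. */
void sketch_allocations_done(struct sketch_bh *bh)
{
	bh->b_state &= ~(1UL << SK_Clamped);
}

/* I/O completion: unlock, and drop the clamp if it is somehow still set. */
void sketch_end_io(struct sketch_bh *bh)
{
	bh->b_state &= ~((1UL << SK_Lock) | (1UL << SK_Clamped));
}

/* Reclaim side: is it safe to sleep waiting for this buffer? */
int sketch_may_wait_for(const struct sketch_bh *bh)
{
	return (bh->b_state & (1UL << SK_Clamped)) == 0;
}
```

The hard part is that the real submission path has no single point that
corresponds to sketch_allocations_done().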

I am not sure that an approach which avoids a specific device will fly either:
all the I/O can be on one device, and what does "device" mean when it comes
to md/lvm and request remapping?

Steve