<!-- received="Fri Jun 25 20:59:26 1999 EET DST" -->
<!-- sent="Fri, 25 Jun 1999 19:16:56 +0200 (CEST)" -->
<!-- name="Andrea Arcangeli" -->
<!-- email="andrea@suse.de" -->
<!-- subject="Re: [patch] pagecache-2.3.9-E8, fixes against pre3-2.3.9" -->
<!-- id="" -->
<!-- inreplyto="Pine.LNX.3.96.990625175948.29126A-200000@chiara.csoma.elte.hu" -->
<title>Linux-kernel mailing list archive 1999-25,: Re: [patch] pagecache-2.3.9-E8, fixes against pre3-2.3.9</title>
<body bgcolor="#FFFFFF"><font face="Arial,Helvetica">
<h1>Re: [patch] pagecache-2.3.9-E8, fixes against pre3-2.3.9</h1>
<b>Andrea Arcangeli</b> (<a href="mailto:andrea@suse.de"><i>andrea@suse.de</i></a>)<br>
<i>Fri, 25 Jun 1999 19:16:56 +0200 (CEST)</i>
<p>
<hr>
<!-- body="start" -->
On Fri, 25 Jun 1999, Ingo Molnar wrote:<br>
<p>
<i>&gt;- i reworked end_buffer_io_async() and mark_buffer_uptodate(), they were</i><br>
<i>&gt;  rather redundant. mark_buffer_uptodate() does no more set the page</i><br>
<i>&gt;  uptodate - this also speeds up lots of other places. Eg. what</i><br>
<p>
Already done here; it was included in 2.3.7_andrea1.bz2.<br>
<p>
<i>&gt;  [David also removed the reuse_list (noticed by V Ganesh), and i removed</i><br>
<i>&gt;  BH_protected logic, these two were obsolete concepts.]</i><br>
<p>
Done that here too.<br>
<p>
I also have further fs-corruption fixes and cleanups.<br>
<p>
This patch does:<br>
<p>
o	fix for an fs corruption bug in all 2.3.[89]: even if we write to a<br>
	partial buffer, this doesn't mean that all the buffers in the<br>
	page are uptodate. The check below was bogus:<br>
<p>
	if (!partial)<br>
                SetPageUptodate(page);<br>
        return bytes;<br>
<p>
o	mark_buffer_uptodate no longer checks whether the page is uptodate too<br>
o	general cleanup (some list-helper functions) and panics moved<br>
	out of line into a slow path<br>
o	removed bogus flushtime initialization (there is still some mess in<br>
	sync_old_buffers(), but I left that rewrite out of<br>
	this patch)<br>
o	pages read and written with brw_page are never supposed to be<br>
	used with the fs-write-read helper functions (they are supposed<br>
	to always have the right buffers).<br>
o	better hashtable initialization (after that you can go back to<br>
	setting the max memlist order to 6)<br>
o	removed reuse_list<br>
o	better buffer initialization (removed unneeded initializations)<br>
o	set bh_shared to trap possible bugs (I am mostly using it to<br>
	trap possible races in shrink_mmap; once it is stable<br>
	we'll remove it)<br>
o	rewrote invalidate_buffers and set_blocksize to avoid races<br>
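<p>
The corrected check from the first item above can be modelled in user space roughly as follows. This is only a sketch: struct model_page, MAX_BUFS and model_write() are hypothetical names, not kernel code. It shows the page being marked uptodate only once the count of uptodate buffers equals the number of buffers in the page.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page that carries nr_bufs buffers. */
#define MAX_BUFS 8

struct model_page {
	size_t nr_bufs;			/* buffers per page */
	bool buf_uptodate[MAX_BUFS];	/* per-buffer uptodate flag */
	bool page_uptodate;		/* whole-page uptodate flag */
};

/*
 * Model of a write covering buffers first..last (inclusive).
 * Written buffers become uptodate; the page is marked uptodate
 * only if every buffer in it is uptodate afterwards.
 */
static void model_write(struct model_page *pg, size_t first, size_t last)
{
	size_t i, nr_uptodate = 0;

	assert(pg->nr_bufs <= MAX_BUFS && first <= last && last < pg->nr_bufs);

	for (i = 0; i < pg->nr_bufs; i++) {
		if (i >= first && i <= last)
			pg->buf_uptodate[i] = true;
		if (pg->buf_uptodate[i])
			nr_uptodate++;
	}

	/* Counting check, in the spirit of (PAGE_SIZE >> bbits) == nr_uptodate. */
	if (nr_uptodate == pg->nr_bufs)
		pg->page_uptodate = true;
}
```

With four buffers per page, writing buffers 1..2 leaves the page not uptodate; it only becomes uptodate after buffers 0 and 3 have been filled as well.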
<p>
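The list helpers mentioned in the list above operate on a circular doubly linked list addressed through a head pointer. A minimal user-space sketch of the same idea, assuming a hypothetical struct node in place of struct buffer_head and hypothetical names ring_insert_tail() and ring_remove():<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; the patch uses b_next_free/b_prev_free instead. */
struct node {
	struct node *next, *prev;
};

/* Append n at the tail of the ring whose first element is *head. */
static void ring_insert_tail(struct node *n, struct node **head)
{
	assert(n && head);

	if (*head) {
		n->prev = (*head)->prev;
	} else {
		/* Empty ring: n becomes the head and its own neighbour. */
		n->prev = n;
		*head = n;
	}
	n->next = *head;
	(*head)->prev->next = n;
	(*head)->prev = n;
}

/* Unlink n from the ring; *head becomes NULL when the ring empties. */
static void ring_remove(struct node *n, struct node **head)
{
	assert(n && head && *head);

	if (n->next != n) {
		n->prev->next = n->next;
		n->next->prev = n->prev;
		if (*head == n)
			*head = n->next;
	} else {
		*head = NULL;
	}
	n->next = n->prev = NULL;
}
```

Clearing both pointers on removal lets a caller test a single pointer to tell whether a node is currently on a list.
<p>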
The parts of my current buffer.c still missing from this patch are the<br>
dirty-management and the bdflush rewrite.<br>
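<p>
The hashtable initialization in the patch below rounds nr_hash up to a power of two by repeatedly clearing the lowest set bit. As a user-space sketch (the function names here are hypothetical), the first function reproduces that loop and the second uses the well-known bit-smearing trick, which should give the same result in a fixed number of steps:<br>

```c
#include <assert.h>
#include <limits.h>

/* Same idea as the loop in the patch: double, then strip low bits. */
static unsigned long roundup_pow2_loop(unsigned long n)
{
	/* Doubling below must not overflow. */
	assert(n <= ULONG_MAX / 2 + 1);

	if (n & (n - 1)) {
		n <<= 1;
		do
			n &= n - 1;	/* clear the lowest set bit */
		while (n & (n - 1));
	}
	return n;
}

/* Alternative: smear the top bit of n-1 downwards, then add one. */
static unsigned long roundup_pow2_smear(unsigned long n)
{
	unsigned int shift;

	assert(n <= ULONG_MAX / 2 + 1);

	if (n == 0)
		return 0;	/* match the loop version for zero */
	n--;
	for (shift = 1; shift < sizeof(n) * CHAR_BIT; shift <<= 1)
		n |= n >> shift;
	return n + 1;
}
```

Both versions leave an exact power of two unchanged and otherwise return the next larger one.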
<p>
Index: linux/fs/buffer.c<br>
===================================================================<br>
RCS file: /var/cvs/linux/fs/buffer.c,v<br>
retrieving revision 1.1.1.24<br>
diff -u -r1.1.1.24 buffer.c<br>
--- linux/fs/buffer.c	1999/06/25 13:31:35	1.1.1.24<br>
+++ linux/fs/buffer.c	1999/06/25 17:07:08<br>
@@ -71,7 +71,6 @@<br>
 static kmem_cache_t *bh_cachep;<br>
 <br>
 static struct buffer_head * unused_list = NULL;<br>
-static struct buffer_head * reuse_list = NULL;<br>
 static DECLARE_WAIT_QUEUE_HEAD(buffer_wait);<br>
 <br>
 static int nr_buffers = 0;<br>
@@ -218,7 +217,6 @@<br>
 				continue;<br>
 			bh-&gt;b_count++;<br>
 			next-&gt;b_count++;<br>
-			bh-&gt;b_flushtime = 0;<br>
 			ll_rw_block(WRITE, 1, &amp;bh);<br>
 			bh-&gt;b_count--;<br>
 			next-&gt;b_count--;<br>
@@ -394,50 +392,58 @@<br>
 	return err;<br>
 }<br>
 <br>
-void invalidate_buffers(kdev_t dev)<br>
+static inline void __insert_into_list(struct buffer_head * bh,<br>
+				      struct buffer_head ** list_p)<br>
 {<br>
-	int i;<br>
-	int nlist;<br>
-	struct buffer_head * bh;<br>
-<br>
-	for(nlist = 0; nlist &lt; NR_LIST; nlist++) {<br>
-		bh = lru_list[nlist];<br>
-		for (i = nr_buffers_type[nlist]*2 ; --i &gt; 0 ; bh = bh-&gt;b_next_free) {<br>
-			if (bh-&gt;b_dev != dev)<br>
-				continue;<br>
-			wait_on_buffer(bh);<br>
-			if (bh-&gt;b_dev != dev)<br>
-				continue;<br>
-			if (bh-&gt;b_count)<br>
-				continue;<br>
-			bh-&gt;b_flushtime = 0;<br>
-			clear_bit(BH_Protected, &amp;bh-&gt;b_state);<br>
-			clear_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
-			clear_bit(BH_Dirty, &amp;bh-&gt;b_state);<br>
-			clear_bit(BH_Req, &amp;bh-&gt;b_state);<br>
-		}<br>
+	if (*list_p)<br>
+		bh-&gt;b_prev_free = (*list_p)-&gt;b_prev_free;<br>
+	else<br>
+	{<br>
+		bh-&gt;b_prev_free = bh;<br>
+		*list_p = bh;<br>
 	}<br>
+	bh-&gt;b_next_free = *list_p;<br>
+	(*list_p)-&gt;b_prev_free-&gt;b_next_free = bh;<br>
+	(*list_p)-&gt;b_prev_free = bh;<br>
 }<br>
 <br>
+static inline void __remove_from_list(struct buffer_head * bh,<br>
+				      struct buffer_head ** list_p)<br>
+{<br>
+	if (bh-&gt;b_next_free != bh)<br>
+	{<br>
+		bh-&gt;b_prev_free-&gt;b_next_free = bh-&gt;b_next_free;<br>
+		bh-&gt;b_next_free-&gt;b_prev_free = bh-&gt;b_prev_free;<br>
+<br>
+		if (*list_p == bh)<br>
+			*list_p = bh-&gt;b_next_free;<br>
+	} else<br>
+		*list_p = NULL;<br>
+<br>
+	bh-&gt;b_next_free = bh-&gt;b_prev_free = NULL;<br>
+}<br>
+<br>
 #define _hashfn(dev,block) (((unsigned)(HASHDEV(dev)^block)) &amp; bh_hash_mask)<br>
 #define hash(dev,block) hash_table[_hashfn(dev,block)]<br>
 <br>
 static void insert_into_hash_list(struct buffer_head * bh)<br>
 {<br>
+	struct buffer_head **bhp = &amp;hash(bh-&gt;b_dev, bh-&gt;b_blocknr);<br>
+	struct buffer_head *next = *bhp;<br>
+<br>
 	bh-&gt;b_next = NULL;<br>
 	bh-&gt;b_pprev = NULL;<br>
-	if (bh-&gt;b_dev) {<br>
-		struct buffer_head **bhp = &amp;hash(bh-&gt;b_dev, bh-&gt;b_blocknr);<br>
-		struct buffer_head *next = *bhp;<br>
 <br>
-		if (next) {<br>
-			bh-&gt;b_next = next;<br>
-			next-&gt;b_pprev = &amp;bh-&gt;b_next;<br>
-		}<br>
-		*bhp = bh;<br>
-		bh-&gt;b_pprev = bhp;<br>
-		nr_hashed_buffers++;<br>
-	}<br>
+	if (bh-&gt;b_dev == B_FREE)<br>
+		BUG();<br>
+<br>
+	if (next) {<br>
+		bh-&gt;b_next = next;<br>
+		next-&gt;b_pprev = &amp;bh-&gt;b_next;<br>
+	}<br>
+	*bhp = bh;<br>
+	bh-&gt;b_pprev = bhp;<br>
+	nr_hashed_buffers++;<br>
 }<br>
 <br>
 static void remove_from_hash_queue(struct buffer_head * bh)<br>
@@ -460,65 +466,67 @@<br>
 	struct buffer_head **bhp = &amp;lru_list[bh-&gt;b_list];<br>
 <br>
 	if (bh-&gt;b_dev == B_FREE)<br>
-		BUG();<br>
+		goto panic_bfree;<br>
 <br>
-	if(!*bhp) {<br>
-		*bhp = bh;<br>
-		bh-&gt;b_prev_free = bh;<br>
-	}<br>
-<br>
 	if (bh-&gt;b_next_free)<br>
-		panic("VFS: buffer LRU pointers corrupted");<br>
+		goto panic_corrupted;<br>
 <br>
-	bh-&gt;b_next_free = *bhp;<br>
-	bh-&gt;b_prev_free = (*bhp)-&gt;b_prev_free;<br>
-	(*bhp)-&gt;b_prev_free-&gt;b_next_free = bh;<br>
-	(*bhp)-&gt;b_prev_free = bh;<br>
+	__insert_into_list(bh, bhp);<br>
 <br>
 	nr_buffers++;<br>
 	nr_buffers_type[bh-&gt;b_list]++;<br>
+	return;<br>
+<br>
+ panic_bfree:<br>
+	BUG();<br>
+	panic("VFS: inserting a B_FREE buffer in the LRU list");<br>
+ panic_corrupted:<br>
+	BUG();<br>
+	panic("VFS: buffer LRU pointers corrupted");<br>
 }<br>
 <br>
 static void remove_from_lru_list(struct buffer_head * bh)<br>
 {<br>
-	if (!(bh-&gt;b_prev_free) || !(bh-&gt;b_next_free))<br>
+	if (!(bh-&gt;b_next_free))<br>
 		return;<br>
 <br>
-	if (bh-&gt;b_dev == B_FREE) {<br>
-		printk("LRU list corrupted");<br>
-		*(int*)0 = 0;<br>
-	}<br>
-	bh-&gt;b_prev_free-&gt;b_next_free = bh-&gt;b_next_free;<br>
-	bh-&gt;b_next_free-&gt;b_prev_free = bh-&gt;b_prev_free;<br>
-<br>
-	if (lru_list[bh-&gt;b_list] == bh)<br>
-		 lru_list[bh-&gt;b_list] = bh-&gt;b_next_free;<br>
-	if (lru_list[bh-&gt;b_list] == bh)<br>
-		 lru_list[bh-&gt;b_list] = NULL;<br>
-	bh-&gt;b_next_free = bh-&gt;b_prev_free = NULL;<br>
+	if (bh-&gt;b_dev == B_FREE)<br>
+		goto panic_bfree;<br>
+<br>
+	__remove_from_list(bh, &amp;lru_list[bh-&gt;b_list]);<br>
 <br>
 	nr_buffers--;<br>
 	nr_buffers_type[bh-&gt;b_list]--;<br>
+	return;<br>
+<br>
+ panic_bfree:<br>
+	BUG();<br>
+	panic("VFS: removing a B_FREE buffer from the lru list");<br>
 }<br>
 <br>
 static void remove_from_free_list(struct buffer_head * bh)<br>
 {<br>
 	int isize = BUFSIZE_INDEX(bh-&gt;b_size);<br>
+<br>
 	if (!(bh-&gt;b_prev_free) || !(bh-&gt;b_next_free))<br>
-		panic("VFS: Free block list corrupted");<br>
+		goto panic_corrupted;<br>
 	if(bh-&gt;b_dev != B_FREE)<br>
-		panic("Free list corrupted");<br>
+		goto panic_bfree;<br>
 	if(!free_list[isize])<br>
-		panic("Free list empty");<br>
-	if(bh-&gt;b_next_free == bh)<br>
-		 free_list[isize] = NULL;<br>
-	else {<br>
-		bh-&gt;b_prev_free-&gt;b_next_free = bh-&gt;b_next_free;<br>
-		bh-&gt;b_next_free-&gt;b_prev_free = bh-&gt;b_prev_free;<br>
-		if (free_list[isize] == bh)<br>
-			 free_list[isize] = bh-&gt;b_next_free;<br>
-	}<br>
-	bh-&gt;b_next_free = bh-&gt;b_prev_free = NULL;<br>
+		goto panic_empty;<br>
+<br>
+	__remove_from_list(bh, &amp;free_list[isize]);<br>
+	return;<br>
+<br>
+ panic_corrupted:<br>
+	BUG();<br>
+	panic("VFS: Free block list corrupted");<br>
+ panic_bfree:<br>
+	BUG();<br>
+	panic("Free list corrupted");<br>
+ panic_empty:<br>
+	BUG();<br>
+	panic("Free list empty");<br>
 }<br>
 <br>
 static void remove_from_queues(struct buffer_head * bh)<br>
@@ -531,43 +539,27 @@<br>
 <br>
 static void put_last_free(struct buffer_head * bh)<br>
 {<br>
-	if (bh) {<br>
-		struct buffer_head **bhp = &amp;free_list[BUFSIZE_INDEX(bh-&gt;b_size)];<br>
-<br>
-		if (bh-&gt;b_count)<br>
-			BUG();<br>
-<br>
-		bh-&gt;b_dev = B_FREE;  /* So it is obvious we are on the free list. */<br>
+	struct buffer_head **bhp = &amp;free_list[BUFSIZE_INDEX(bh-&gt;b_size)];<br>
 <br>
-		/* Add to back of free list. */<br>
-		if(!*bhp) {<br>
-			*bhp = bh;<br>
-			bh-&gt;b_prev_free = bh;<br>
-		}<br>
-<br>
-		bh-&gt;b_next_free = *bhp;<br>
-		bh-&gt;b_prev_free = (*bhp)-&gt;b_prev_free;<br>
-		(*bhp)-&gt;b_prev_free-&gt;b_next_free = bh;<br>
-		(*bhp)-&gt;b_prev_free = bh;<br>
-	}<br>
+	remove_from_queues(bh);<br>
+	bh-&gt;b_count = 0;<br>
+	bh-&gt;b_state = 0;<br>
+	bh-&gt;b_dev = B_FREE;  /* So it is obvious we are on the free list. */<br>
+	__insert_into_list(bh, bhp);<br>
 }<br>
 <br>
 struct buffer_head * find_buffer(kdev_t dev, int block, int size)<br>
 {		<br>
-	struct buffer_head * next;<br>
-<br>
-	next = hash(dev,block);<br>
-	for (;;) {<br>
-		struct buffer_head *tmp = next;<br>
-		if (!next)<br>
+	struct buffer_head * bh;<br>
+	<br>
+	for (bh = hash(dev,block); bh; bh = bh-&gt;b_next)<br>
+	{<br>
+		if (bh-&gt;b_blocknr == block &amp;&amp; bh-&gt;b_dev == dev &amp;&amp;<br>
+		    bh-&gt;b_size == size)<br>
 			break;<br>
-		next = tmp-&gt;b_next;<br>
-		if (tmp-&gt;b_blocknr != block || tmp-&gt;b_size != size || tmp-&gt;b_dev != dev)<br>
-			continue;<br>
-		next = tmp;<br>
-		break;<br>
 	}<br>
-	return next;<br>
+<br>
+	return bh;<br>
 }<br>
 <br>
 /*<br>
@@ -605,11 +597,51 @@<br>
 	return 0;<br>
 }<br>
 <br>
+void invalidate_buffers(kdev_t dev)<br>
+{<br>
+	int i, nlist, slept;<br>
+	struct buffer_head * bh, * bhnext;<br>
+<br>
+ again:<br>
+	slept = 0;<br>
+	for(nlist = 0; nlist &lt; NR_LIST; nlist++) {<br>
+		bh = lru_list[nlist];<br>
+		if (!bh)<br>
+			continue;<br>
+		for (i = nr_buffers_type[nlist] ; i &gt; 0 ;<br>
+		     bh = bhnext, i--)<br>
+		{<br>
+			bhnext = bh-&gt;b_next_free;<br>
+			if (bh-&gt;b_dev != dev)<br>
+				continue;<br>
+			if (buffer_locked(bh))<br>
+			{<br>
+				slept = 1;<br>
+				wait_on_buffer(bh);<br>
+			}<br>
+			if (bh-&gt;b_dev != dev)<br>
+				goto panic_changed;<br>
+			if (buffer_shared(bh))<br>
+				goto panic_shared;<br>
+			if (!bh-&gt;b_count)<br>
+				put_last_free(bh);<br>
+			if (slept)<br>
+				goto again;<br>
+		}<br>
+	}<br>
+	return;<br>
+<br>
+ panic_changed:<br>
+	panic("invalidate_buffers: buffer changed under us");<br>
+ panic_shared:<br>
+	panic("invalidate_buffers: invalidating a shared buffer");<br>
+}<br>
+<br>
 void set_blocksize(kdev_t dev, int size)<br>
 {<br>
 	extern int *blksize_size[];<br>
-	int i, nlist;<br>
-	struct buffer_head * bh, *bhnext;<br>
+	int i, nlist, slept;<br>
+	struct buffer_head * bh, * bhnext;<br>
 <br>
 	if (!blksize_size[MAJOR(dev)])<br>
 		return;<br>
@@ -630,33 +662,44 @@<br>
 	/* We need to be quite careful how we do this - we are moving entries<br>
 	 * around on the free list, and we can get in a loop if we are not careful.<br>
 	 */<br>
-	for(nlist = 0; nlist &lt; NR_LIST; nlist++) {<br>
+ again:<br>
+	slept = 0;<br>
+ 	for(nlist = 0; nlist &lt; NR_LIST; nlist++) {<br>
 		bh = lru_list[nlist];<br>
-		for (i = nr_buffers_type[nlist]*2 ; --i &gt; 0 ; bh = bhnext) {<br>
-			if(!bh)<br>
-				break;<br>
-<br>
-			bhnext = bh-&gt;b_next_free; <br>
-			if (bh-&gt;b_dev != dev)<br>
-				 continue;<br>
-			if (bh-&gt;b_size == size)<br>
+		if (!bh)<br>
+			continue;<br>
+		for (i = nr_buffers_type[nlist] ; i &gt; 0 ;<br>
+		     bh = bhnext, i--)<br>
+		{<br>
+			bhnext = bh-&gt;b_next_free;<br>
+			if (bh-&gt;b_dev != dev || bh-&gt;b_size == size)<br>
 				 continue;<br>
-			bhnext-&gt;b_count++;<br>
-			bh-&gt;b_count++;<br>
-			wait_on_buffer(bh);<br>
-			bhnext-&gt;b_count--;<br>
-			if (bh-&gt;b_dev == dev &amp;&amp; bh-&gt;b_size != size) {<br>
-				clear_bit(BH_Dirty, &amp;bh-&gt;b_state);<br>
-				clear_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
-				clear_bit(BH_Req, &amp;bh-&gt;b_state);<br>
-				bh-&gt;b_flushtime = 0;<br>
+			if (buffer_locked(bh))<br>
+			{<br>
+				slept = 1;<br>
+				wait_on_buffer(bh);<br>
 			}<br>
-			if (--bh-&gt;b_count)<br>
-				continue;<br>
-			remove_from_queues(bh);<br>
-			put_last_free(bh);<br>
+			if (bh-&gt;b_dev != dev || bh-&gt;b_size == size)<br>
+				goto panic_changed;<br>
+			if (buffer_shared(bh))<br>
+				goto panic_shared;<br>
+			if (!bh-&gt;b_count)<br>
+				put_last_free(bh);<br>
+			else<br>
+				printk(KERN_ERR<br>
+				       "set_blocksize: "<br>
+				       "b_count %d, block %lu!\n",<br>
+				       bh-&gt;b_count, bh-&gt;b_blocknr);<br>
+			if (slept)<br>
+				goto again;<br>
 		}<br>
 	}<br>
+	return;<br>
+<br>
+ panic_changed:<br>
+	panic("set_blocksize: buffer changed under us");<br>
+ panic_shared:<br>
+	panic("set_blocksize: buffer shared");<br>
 }<br>
 <br>
 /*<br>
@@ -675,7 +718,6 @@<br>
 		 bh_end_io_t *handler, void *dev_id)<br>
 {<br>
 	bh-&gt;b_list = BUF_CLEAN;<br>
-	bh-&gt;b_flushtime = 0;<br>
 	bh-&gt;b_dev = dev;<br>
 	bh-&gt;b_blocknr = block;<br>
 	bh-&gt;b_end_io = handler;<br>
@@ -788,11 +830,11 @@<br>
 	int isize;<br>
 <br>
 repeat:<br>
-	bh = get_hash_table(dev, block, size);<br>
-	if (bh) {<br>
-		if (!buffer_dirty(bh)) {<br>
-			bh-&gt;b_flushtime = 0;<br>
-		}<br>
+	bh = find_buffer(dev, block, size);<br>
+	if (bh)<br>
+	{<br>
+		bh-&gt;b_count++;<br>
+		touch_buffer(bh);<br>
 		goto out;<br>
 	}<br>
 <br>
@@ -813,6 +855,7 @@<br>
 	/* Insert the buffer into the regular lists */<br>
 	insert_into_lru_list(bh);<br>
 	insert_into_hash_list(bh);<br>
+	touch_buffer(bh);<br>
 	goto out;<br>
 <br>
 	/*<br>
@@ -868,6 +911,7 @@<br>
 {<br>
 	bh-&gt;b_flushtime = jiffies + (flag ? bdf_prm.b_un.age_super : bdf_prm.b_un.age_buffer);<br>
 	refile_buffer(bh);<br>
+	balance_dirty(bh-&gt;b_dev);<br>
 }<br>
 <br>
 void __mark_buffer_dirty(struct buffer_head *bh, int flag)<br>
@@ -890,10 +934,8 @@<br>
 {<br>
 	int dispose;<br>
 <br>
-	if (buf-&gt;b_dev == B_FREE) {<br>
-		printk("Attempt to refile free buffer\n");<br>
-		return;<br>
-	}<br>
+	if(buf-&gt;b_dev == B_FREE)<br>
+		goto bug_bfree;<br>
 <br>
 	dispose = BUF_CLEAN;<br>
 	if (buffer_locked(buf))<br>
@@ -903,6 +945,10 @@<br>
 <br>
 	if (dispose != buf-&gt;b_list)<br>
 		file_buffer(buf, dispose);<br>
+	return;<br>
+<br>
+ bug_bfree:<br>
+	printk(KERN_ERR "Attempt to refile free buffer\n");<br>
 }<br>
 <br>
 /*<br>
@@ -910,8 +956,6 @@<br>
  */<br>
 void __brelse(struct buffer_head * buf)<br>
 {<br>
-	touch_buffer(buf);<br>
-<br>
 	if (buf-&gt;b_count) {<br>
 		buf-&gt;b_count--;<br>
 		wake_up(&amp;buffer_wait);<br>
@@ -928,14 +972,18 @@<br>
  */<br>
 void __bforget(struct buffer_head * buf)<br>
 {<br>
-	if (buf-&gt;b_count != 1 || buffer_locked(buf)) {<br>
-		__brelse(buf);<br>
+	if (buffer_shared(buf))<br>
+		goto panic_shared;<br>
+	if (buf-&gt;b_count == 1 &amp;&amp; !buffer_locked(buf))<br>
+	{<br>
+		put_last_free(buf);<br>
 		return;<br>
 	}<br>
-	buf-&gt;b_count = 0;<br>
-	buf-&gt;b_state = 0;<br>
-	remove_from_queues(buf);<br>
-	put_last_free(buf);<br>
+	__brelse(buf);<br>
+	return;<br>
+<br>
+ panic_shared:<br>
+	panic("bforget: buffer shared");<br>
 }<br>
 <br>
 /*<br>
@@ -1033,42 +1081,16 @@<br>
 		return;<br>
 	}<br>
 <br>
-//	memset(bh, 0, sizeof(*bh));<br>
-	bh-&gt;b_blocknr = -1;<br>
-	init_waitqueue_head(&amp;bh-&gt;b_wait);<br>
 	nr_unused_buffer_heads++;<br>
 	bh-&gt;b_next_free = unused_list;<br>
-	bh-&gt;b_this_page = NULL;<br>
 	unused_list = bh;<br>
 }<br>
 <br>
-/* <br>
- * We can't put completed temporary IO buffer_heads directly onto the<br>
- * unused_list when they become unlocked, since the device driver<br>
- * end_request routines still expect access to the buffer_head's<br>
- * fields after the final unlock.  So, the device driver puts them on<br>
- * the reuse_list instead once IO completes, and we recover these to<br>
- * the unused_list here.<br>
- *<br>
- * Note that we don't do a wakeup here, but return a flag indicating<br>
- * whether we got any buffer heads. A task ready to sleep can check<br>
- * the returned value, and any tasks already sleeping will have been<br>
- * awakened when the buffer heads were added to the reuse list.<br>
- */<br>
-static inline int recover_reusable_buffer_heads(void)<br>
+static inline void first_bh_init(struct buffer_head * bh)<br>
 {<br>
-	struct buffer_head *head = xchg(&amp;reuse_list, NULL);<br>
-	int found = 0;<br>
-	<br>
-	if (head) {<br>
-		do {<br>
-			struct buffer_head *bh = head;<br>
-			head = head-&gt;b_next_free;<br>
-			put_unused_buffer_head(bh);<br>
-		} while (head);<br>
-		found = 1;<br>
-	}<br>
-	return found;<br>
+	bh-&gt;b_pprev = NULL;<br>
+	init_waitqueue_head(&amp;bh-&gt;b_wait);<br>
+	bh-&gt;b_reqnext = NULL;<br>
 }<br>
 <br>
 /*<br>
@@ -1076,11 +1098,10 @@<br>
  * no-buffer-head deadlock.  Return NULL on failure; waiting for<br>
  * buffer heads is now handled in create_buffers().<br>
  */ <br>
-static struct buffer_head * get_unused_buffer_head(int async)<br>
+static struct buffer_head * get_unused_buffer_head(int async, int slab_mask)<br>
 {<br>
 	struct buffer_head * bh;<br>
 <br>
-	recover_reusable_buffer_heads();<br>
 	if (nr_unused_buffer_heads &gt; NR_RESERVED) {<br>
 		bh = unused_list;<br>
 		unused_list = bh-&gt;b_next_free;<br>
@@ -1092,9 +1113,8 @@<br>
 	 * more buffer heads, because the swap-out may need<br>
 	 * more buffer-heads itself.  Thus SLAB_BUFFER.<br>
 	 */<br>
-	if((bh = kmem_cache_alloc(bh_cachep, SLAB_BUFFER)) != NULL) {<br>
-		memset(bh, 0, sizeof(*bh));<br>
-		init_waitqueue_head(&amp;bh-&gt;b_wait);<br>
+	if((bh = kmem_cache_alloc(bh_cachep, slab_mask)) != NULL) {<br>
+		first_bh_init(bh);<br>
 		nr_buffer_heads++;<br>
 		return bh;<br>
 	}<br>
@@ -1118,8 +1138,6 @@<br>
 	 */<br>
 	if(!async &amp;&amp;<br>
 	   (bh = kmem_cache_alloc(bh_cachep, SLAB_KERNEL)) != NULL) {<br>
-		memset(bh, 0, sizeof(*bh));<br>
-		init_waitqueue_head(&amp;bh-&gt;b_wait);<br>
 		nr_buffer_heads++;<br>
 		return bh;<br>
 	}<br>
@@ -1137,9 +1155,8 @@<br>
  * from ordinary buffer allocations, and only async requests are allowed<br>
  * to sleep waiting for buffer heads. <br>
  */<br>
-static struct buffer_head * create_buffers(unsigned long page, unsigned long size, int async)<br>
+static struct buffer_head * create_buffers(unsigned long page, unsigned long size, int async, int slab_mask)<br>
 {<br>
-	DECLARE_WAITQUEUE(wait, current);<br>
 	struct buffer_head *bh, *head;<br>
 	long offset;<br>
 <br>
@@ -1147,7 +1164,7 @@<br>
 	head = NULL;<br>
 	offset = PAGE_SIZE;<br>
 	while ((offset -= size) &gt;= 0) {<br>
-		bh = get_unused_buffer_head(async);<br>
+		bh = get_unused_buffer_head(async, slab_mask);<br>
 		if (!bh)<br>
 			goto no_grow;<br>
 <br>
@@ -1162,7 +1179,6 @@<br>
 <br>
 		bh-&gt;b_data = (char *) (page+offset);<br>
 		bh-&gt;b_list = BUF_CLEAN;<br>
-		bh-&gt;b_flushtime = 0;<br>
 		bh-&gt;b_end_io = end_buffer_io_bad;<br>
 	}<br>
 	return head;<br>
@@ -1202,12 +1218,7 @@<br>
 	 * Set our state for sleeping, then check again for buffer heads.<br>
 	 * This ensures we won't miss a wake_up from an interrupt.<br>
 	 */<br>
-	add_wait_queue(&amp;buffer_wait, &amp;wait);<br>
-	current-&gt;state = TASK_UNINTERRUPTIBLE;<br>
-	if (!recover_reusable_buffer_heads())<br>
-		schedule();<br>
-	remove_wait_queue(&amp;buffer_wait, &amp;wait);<br>
-	current-&gt;state = TASK_RUNNING;<br>
+	__wait_event(buffer_wait, unused_list);<br>
 	goto try_again;<br>
 }<br>
 <br>
@@ -1226,7 +1237,7 @@<br>
 	 * page-&gt;buffers.<br>
 	 */<br>
 	lock_kernel();<br>
-	head = create_buffers(page_address(page), size, 1);<br>
+	head = create_buffers(page_address(page), size, 1, SLAB_KERNEL);<br>
 	unlock_kernel();<br>
 	if (page-&gt;buffers)<br>
 		BUG();<br>
@@ -1238,6 +1249,7 @@<br>
 <br>
 		tail = bh;<br>
 		init_buffer(bh, dev, block, end_buffer_io_async, NULL);<br>
+		bh-&gt;b_state = 1&lt;&lt;BH_Shared;<br>
 <br>
 		/*<br>
 		 * When we use bmap, we define block zero to represent<br>
@@ -1322,7 +1334,7 @@<br>
 	struct buffer_head *bh, *head, *tail;<br>
 <br>
 	lock_kernel();<br>
-	head = create_buffers(page_address(page), blocksize, 1);<br>
+	head = create_buffers(page_address(page), blocksize, 1, SLAB_KERNEL);<br>
 	unlock_kernel();<br>
 	if (page-&gt;buffers)<br>
 		BUG();<br>
@@ -1331,7 +1343,7 @@<br>
 	do {<br>
 		bh-&gt;b_dev = inode-&gt;i_dev;<br>
 		bh-&gt;b_blocknr = 0;<br>
-		bh-&gt;b_end_io = end_buffer_io_bad;<br>
+		bh-&gt;b_state = 1&lt;&lt;BH_Shared;<br>
 		tail = bh;<br>
 		bh = bh-&gt;b_this_page;<br>
 	} while (bh);<br>
@@ -1386,7 +1398,7 @@<br>
 			if (err)<br>
 				goto out;<br>
 		}<br>
-		set_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
+		mark_buffer_uptodate(bh, 1);<br>
 		atomic_mark_buffer_dirty(bh,0);<br>
 <br>
 		bh = bh-&gt;b_this_page;<br>
@@ -1405,7 +1417,7 @@<br>
 	struct dentry *dentry = file-&gt;f_dentry;<br>
 	struct inode *inode = dentry-&gt;d_inode;<br>
 	unsigned long block;<br>
-	int err, partial;<br>
+	int err, nr_uptodate = 0, uptodate;<br>
 	unsigned long blocksize, start_block, end_block;<br>
 	unsigned long start_offset, start_bytes, end_bytes;<br>
 	unsigned long bbits, blocks, i, len;<br>
@@ -1449,16 +1461,15 @@<br>
 <br>
 	i = 0;<br>
 	bh = head;<br>
-	partial = 0;<br>
 	do {<br>
 		if (!bh)<br>
 			BUG();<br>
 <br>
-		if ((i &lt; start_block) || (i &gt; end_block)) {<br>
-			if (!buffer_uptodate(bh))<br>
-				partial = 1;<br>
+		uptodate = buffer_uptodate(bh);<br>
+		nr_uptodate += uptodate;<br>
+<br>
+		if ((i &lt; start_block) || (i &gt; end_block))<br>
 			goto skip;<br>
-		}<br>
 <br>
 		/*<br>
 		 * If the buffer is not up-to-date, we need to ask the low-level<br>
@@ -1470,7 +1481,7 @@<br>
 		 * not going to fill it completely.<br>
 		 */<br>
 		bh-&gt;b_end_io = end_buffer_io_sync;<br>
-		if (!buffer_uptodate(bh)) {<br>
+		if (!uptodate) {<br>
 			int update = start_offset || (end_bytes &amp;&amp; (i == end_block));<br>
 <br>
 			err = fs_get_block(inode, block, bh, update);<br>
@@ -1483,10 +1494,8 @@<br>
 		if (start_offset) {<br>
 			len = start_bytes;<br>
 			start_offset = 0;<br>
-		} else if (end_bytes &amp;&amp; (i == end_block)) {<br>
+		} else if (end_bytes &amp;&amp; (i == end_block))<br>
 			len = end_bytes;<br>
-			end_bytes = 0;<br>
-		}<br>
 		if (copy_from_user(target_buf, buf, len))<br>
 			goto out;<br>
 		target_buf += len;<br>
@@ -1508,14 +1517,10 @@<br>
 		 * should not penalize them for somebody else writing<br>
 		 * lots of dirty pages.<br>
 		 */<br>
-		set_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
-		if (!test_and_set_bit(BH_Dirty, &amp;bh-&gt;b_state)) {<br>
-			lock_kernel();<br>
-			__mark_dirty(bh, 0);<br>
-			if (too_many_dirty_buffers)<br>
-				balance_dirty(bh-&gt;b_dev);<br>
-			unlock_kernel();<br>
-		}<br>
+		mark_buffer_uptodate(bh, 1);<br>
+		if (!uptodate)<br>
+			nr_uptodate++;<br>
+		atomic_mark_buffer_dirty(bh, 0);<br>
 <br>
 skip:<br>
 		i++;<br>
@@ -1529,11 +1534,11 @@<br>
 	 * the next read(). Here we 'discover' wether the page went<br>
 	 * uptodate as a result of this (potentially partial) write.<br>
 	 */<br>
-	if (!partial)<br>
+	if ((PAGE_SIZE &gt;&gt; bbits) == nr_uptodate)<br>
 		SetPageUptodate(page);<br>
 	return bytes;<br>
 out:<br>
-	ClearPageUptodate(page);<br>
+	mark_buffer_uptodate(bh, 0);<br>
 	return err;<br>
 }<br>
 <br>
@@ -1549,7 +1554,7 @@<br>
 int brw_page(int rw, struct page *page, kdev_t dev, int b[], int size, int bmap)<br>
 {<br>
 	struct buffer_head *head, *bh, *arr[MAX_BUF_PER_PAGE];<br>
-	int nr, fresh /* temporary debugging flag */, block;<br>
+	int nr, block;<br>
 <br>
 	if (!PageLocked(page))<br>
 		panic("brw_page: page not locked for I/O");<br>
@@ -1558,11 +1563,8 @@<br>
 	 * We pretty much rely on the page lock for this, because<br>
 	 * create_page_buffers() might sleep.<br>
 	 */<br>
-	fresh = 0;<br>
-	if (!page-&gt;buffers) {<br>
+	if (!page-&gt;buffers)<br>
 		create_page_buffers(rw, page, dev, b, size, bmap);<br>
-		fresh = 1;<br>
-	}<br>
 	if (!page-&gt;buffers)<br>
 		BUG();<br>
 	page-&gt;owner = -1;<br>
@@ -1573,30 +1575,14 @@<br>
 	do {<br>
 		block = *(b++);<br>
 <br>
-		if (fresh &amp;&amp; (bh-&gt;b_count != 0))<br>
+		if (bh-&gt;b_blocknr != block)<br>
+			BUG();<br>
+		if (bh-&gt;b_end_io != end_buffer_io_async)<br>
 			BUG();<br>
 		if (rw == READ) {<br>
-			if (!fresh)<br>
-				BUG();<br>
-			if (bmap &amp;&amp; !block) {<br>
-				if (block)<br>
-					BUG();<br>
-			} else {<br>
-				if (bmap &amp;&amp; !block)<br>
-					BUG();<br>
-				if (!buffer_uptodate(bh)) {<br>
-					arr[nr++] = bh;<br>
-				}<br>
-			}<br>
+			if (!buffer_uptodate(bh))<br>
+				arr[nr++] = bh;<br>
 		} else { /* WRITE */<br>
-			if (!bh-&gt;b_blocknr) {<br>
-				if (!block)<br>
-					BUG();<br>
-				bh-&gt;b_blocknr = block;<br>
-			} else {<br>
-				if (!block)<br>
-					BUG();<br>
-			}<br>
 			set_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
 			set_bit(BH_Dirty, &amp;bh-&gt;b_state);<br>
 			arr[nr++] = bh;<br>
@@ -1605,46 +1591,21 @@<br>
 	} while (bh != head);<br>
 	if (rw == READ)<br>
 		++current-&gt;maj_flt;<br>
-	if ((rw == READ) &amp;&amp; nr) {<br>
-		if (Page_Uptodate(page))<br>
+	if (nr) {<br>
+		if (rw == READ &amp;&amp; Page_Uptodate(page))<br>
 			BUG();<br>
 		ll_rw_block(rw, nr, arr);<br>
 	} else {<br>
-		if (!nr &amp;&amp; rw == READ) {<br>
+		if (rw == READ) {<br>
 			SetPageUptodate(page);<br>
 			page-&gt;owner = (int)current;<br>
 			UnlockPage(page);<br>
 		}<br>
-		if (nr &amp;&amp; (rw == WRITE))<br>
-			ll_rw_block(rw, nr, arr);<br>
 	}<br>
 	return 0;<br>
 }<br>
 <br>
 /*<br>
- * This is called by end_request() when I/O has completed.<br>
- */<br>
-void mark_buffer_uptodate(struct buffer_head * bh, int on)<br>
-{<br>
-	if (on) {<br>
-		struct buffer_head *tmp = bh;<br>
-		struct page *page;<br>
-		set_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
-		/* If a page has buffers and all these buffers are uptodate,<br>
-		 * then the page is uptodate. */<br>
-		do {<br>
-			if (!test_bit(BH_Uptodate, &amp;tmp-&gt;b_state))<br>
-				return;<br>
-			tmp=tmp-&gt;b_this_page;<br>
-		} while (tmp &amp;&amp; tmp != bh);<br>
-		page = mem_map + MAP_NR(bh-&gt;b_data);<br>
-		SetPageUptodate(page);<br>
-		return;<br>
-	}<br>
-	clear_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
-}<br>
-<br>
-/*<br>
  * Generic "readpage" function for block devices that have the normal<br>
  * bmap functionality. This is most of the block device filesystems.<br>
  * Reads the page asynchronously --- the unlock_buffer() and<br>
@@ -1674,7 +1635,6 @@<br>
 	bh = head;<br>
 	nr = 0;<br>
 	do {<br>
-		phys_block = bh-&gt;b_blocknr;<br>
 		/*<br>
 		 * important, we have to retry buffers that already have<br>
 		 * their bnr cached but had an IO error!<br>
@@ -1686,8 +1646,7 @@<br>
 			 */<br>
 			if (phys_block) {<br>
 				init_buffer(bh, inode-&gt;i_dev, phys_block, end_buffer_io_async, NULL);<br>
-				arr[nr] = bh;<br>
-				nr++;<br>
+				arr[nr++] = bh;<br>
 			} else {<br>
 				/*<br>
 				 * filesystem 'hole' represents zero-contents.<br>
@@ -1738,7 +1697,7 @@<br>
 <br>
 	if (!(page = __get_free_page(GFP_BUFFER)))<br>
 		return 0;<br>
-	bh = create_buffers(page, size, 0);<br>
+	bh = create_buffers(page, size, 0, SLAB_BUFFER);<br>
 	if (!bh) {<br>
 		free_page(page);<br>
 		return 0;<br>
@@ -1841,6 +1800,15 @@<br>
 	printk("Buffer blocks:   %6d\n",nr_buffers);<br>
 	printk("Buffer hashed:   %6d\n",nr_hashed_buffers);<br>
 <br>
+	/*<br>
+	 * This code runs in parallel with the lru list management.<br>
+	 * To avoid to SMP race we _must_ grab the global spinlock here too.<br>
+	 * It's really really ugly to grab the global kernel lock from an irq<br>
+	 * this way... but it's needed... -Andrea<br>
+	 */<br>
+	if (!spin_trylock(&amp;kernel_flag))<br>
+		goto no_lock;<br>
+<br>
 	for(nlist = 0; nlist &lt; NR_LIST; nlist++) {<br>
 	  found = locked = dirty = used = lastused = protected = 0;<br>
 	  bh = lru_list[nlist];<br>
@@ -1863,44 +1831,94 @@<br>
 		 buf_types[nlist], found, used, lastused,<br>
 		 locked, protected, dirty);<br>
 	};<br>
+	spin_unlock(&amp;kernel_flag);<br>
+	return;<br>
+<br>
+ no_lock:<br>
+	printk("Can't grab the kernel lock this time, try again...\n");<br>
+	return;<br>
 }<br>
 <br>
 <br>
 /* ===================== Init ======================= */<br>
 <br>
 /*<br>
- * allocate the hash table and init the free list<br>
- * Use gfp() for the hash table to decrease TLB misses, use<br>
- * SLAB cache for buffer heads.<br>
+ * Alloc the ram for the hashtable without having to play with the<br>
+ * free area list of the VM. -Andrea<br>
  */<br>
-void __init buffer_init(unsigned long memory_size)<br>
+unsigned long __init buffer_hash_init(unsigned long start, unsigned long end)<br>
 {<br>
-	int order;<br>
-	unsigned int nr_hash;<br>
+	unsigned long mem_size, max_nr_buffers, nr_hash, hash_size;<br>
 <br>
-	/* we need to guess at the right sort of size for a buffer cache.<br>
-	   the heuristic from working with large databases and getting<br>
-	   fsync times (ext2) manageable, is the following */<br>
+#define	BUF_MEAN_BUFFERS_PER_BUCKET	1<br>
 <br>
-	memory_size &gt;&gt;= 22;<br>
-	for (order = 5; (1UL &lt;&lt; order) &lt; memory_size; order++);<br>
+	/*<br>
+	 * My heuristic is to have a mean distribution of 8 buffer chained<br>
+	 * in every hash bucket (supposing all buffers are BLOCK_SIZE wide).<br>
+	 * You can change the distribution simply changing the define<br>
+	 * above. Consider that not all the mem_size RAM can be used for<br>
+	 * buffers (here we are in the early stage of the kernrel boot),<br>
+	 * so using a mean distribution of 1 (supposing to have a perfect<br>
+	 * hash-function) would be waste of ram. Also consider that<br>
+	 * the hash_size will be power-of-two enlarged. -Andrea<br>
+	 */<br>
+	mem_size = end - start;<br>
+	max_nr_buffers = mem_size &gt;&gt; BLOCK_SIZE_BITS;<br>
 <br>
-	/* try to allocate something until we get it or we're asking<br>
-	   for something that is really too small */<br>
+	nr_hash = max_nr_buffers/BUF_MEAN_BUFFERS_PER_BUCKET;<br>
 <br>
-	do {<br>
-		nr_hash = (1UL &lt;&lt; order) * PAGE_SIZE /<br>
-		    sizeof(struct buffer_head *);<br>
-		hash_table = (struct buffer_head **)<br>
-		    __get_free_pages(GFP_ATOMIC, order);<br>
-	} while (hash_table == NULL &amp;&amp; --order &gt; 4);<br>
-	printk("buffer-cache hash table entries: %d (order: %d, %ld bytes)\n", nr_hash, order, (1UL&lt;&lt;order) * PAGE_SIZE);<br>
-	<br>
-	if (!hash_table)<br>
-		panic("Failed to allocate buffer hash table\n");<br>
-	memset(hash_table, 0, nr_hash * sizeof(struct buffer_head *));<br>
-	bh_hash_mask = nr_hash-1;<br>
+	/*<br>
+	 * Now we want nr_hash to be a power of 2 so we'll be allowed<br>
+	 * to do a faster logical AND in the hash function. If it's not a<br>
+	 * power of 2 I enlarge it to the next power of 2.<br>
+	 * To do that I invented a funny algorithm. Seems also to work ;),<br>
+	 * but if you know of something better let me know ;). -Andrea<br>
+	 */<br>
+	if (nr_hash &amp; (nr_hash-1))<br>
+	{<br>
+		nr_hash &lt;&lt;= 1;<br>
+		do<br>
+			nr_hash &amp;= nr_hash-1;<br>
+		while (nr_hash &amp; (nr_hash-1));<br>
+	}<br>
+<br>
+ try_again:<br>
+	hash_size = nr_hash * sizeof(struct buffer_head *);<br>
+	if (!hash_size)<br>
+		panic("hash table zero-sized");<br>
 <br>
+	if (start+hash_size &gt;= end)<br>
+	{<br>
+		/*<br>
+		 * Strange, something has gone wrong, so try to decrease the<br>
+		 * power order of the hash table. I think this can never<br>
+		 * happen but better to be paranoid and verbose enough.<br>
+		 * -Andrea<br>
+		 */<br>
+		printk("buffer hashtable too big %lu decreasing to %lu\n",<br>
+		       hash_size, hash_size &gt;&gt; 1);<br>
+		nr_hash &gt;&gt;= 1;<br>
+		goto try_again;<br>
+	}<br>
+<br>
+	hash_table = (struct buffer_head **) start;<br>
+	memset(hash_table, 0, hash_size);<br>
+	bh_hash_mask = nr_hash - 1;<br>
+	printk("buffer hashtable: buckets = %lu, size = %lu bytes, "<br>
+	       "mask = %lx\n",<br>
+	       nr_hash, hash_size, bh_hash_mask);<br>
+	/*<br>
+	 * Supposing there is no buggy code around us, we can safely skip<br>
+	 * page alignment. -Andrea<br>
+	 */<br>
+	return start + hash_size;<br>
+}<br>
+<br>
+/*<br>
+ * Allocate the buffer-slab head and init the free list.<br>
+ */<br>
+void __init buffer_init(void)<br>
+{<br>
 	bh_cachep = kmem_cache_create("buffer_head",<br>
 				      sizeof(struct buffer_head),<br>
 				      0,<br>
Index: linux/include/linux/fs.h<br>
===================================================================<br>
RCS file: /var/cvs/linux/include/linux/fs.h,v<br>
retrieving revision 1.1.1.22<br>
diff -u -r1.1.1.22 fs.h<br>
--- linux/include/linux/fs.h	1999/06/25 13:33:53	1.1.1.22<br>
+++ linux/include/linux/fs.h	1999/06/25 16:47:31<br>
@@ -176,7 +176,8 @@<br>
 extern void update_atime (struct inode *);<br>
 #define UPDATE_ATIME(inode) update_atime (inode)<br>
 <br>
-extern void buffer_init(unsigned long);<br>
+extern unsigned long buffer_hash_init(unsigned long, unsigned long);<br>
+extern void buffer_init(void);<br>
 extern void inode_init(void);<br>
 extern void file_table_init(void);<br>
 extern void dcache_init(void);<br>
@@ -189,6 +190,7 @@<br>
 #define BH_Lock		2	/* 1 if the buffer is locked */<br>
 #define BH_Req		3	/* 0 if the buffer has been invalidated */<br>
 #define BH_Protected	6	/* 1 if the buffer is protected */<br>
+#define BH_Shared	7	/* 1 if the buffer is shared */<br>
 /*<br>
  * Try to keep the most commonly used fields in single cache lines (16<br>
  * bytes) to improve performance.  This ordering should be<br>
@@ -204,24 +206,25 @@<br>
 struct buffer_head {<br>
 	/* First cache line: */<br>
 	struct buffer_head * b_next;	/* Hash queue list */<br>
+	struct buffer_head ** b_pprev;	/* doubly linked list of hash-queue */<br>
 	unsigned long b_blocknr;	/* block number */<br>
-	unsigned long b_size;		/* block size */<br>
 	kdev_t b_dev;			/* device (B_FREE = free) */<br>
+	unsigned long b_size;		/* block size */<br>
+<br>
 	kdev_t b_rdev;			/* Real device */<br>
 	unsigned long b_rsector;	/* Real buffer location on disk */<br>
+	char * b_data;			/* pointer to data block (1024 bytes) */<br>
 	struct buffer_head * b_this_page;	/* circular list of buffers in one page */<br>
-	unsigned long b_state;		/* buffer state bitmap (see above) */<br>
 	struct buffer_head * b_next_free;<br>
+	struct buffer_head * b_prev_free;/* doubly linked list of buffers */<br>
 	unsigned int b_count;		/* users using this block */<br>
-<br>
-	/* Non-performance-critical data follows. */<br>
-	char * b_data;			/* pointer to data block (1024 bytes) */<br>
+	unsigned long b_state;		/* buffer state bitmap (see above) */<br>
 	unsigned int b_list;		/* List that this buffer appears */<br>
-	unsigned long b_flushtime;	/* Time when this (dirty) buffer<br>
+	unsigned long b_flushtime;      /* Time when this (dirty) buffer<br>
 					 * should be written */<br>
+<br>
+	/* Non-performance-critical data follows. */<br>
 	wait_queue_head_t b_wait;<br>
-	struct buffer_head ** b_pprev;		/* doubly linked list of hash-queue */<br>
-	struct buffer_head * b_prev_free;	/* doubly linked list of buffers */<br>
 	struct buffer_head * b_reqnext;		/* request queue */<br>
 <br>
 	/*<br>
@@ -234,13 +237,14 @@<br>
 typedef void (bh_end_io_t)(struct buffer_head *bh, int uptodate);<br>
 void init_buffer(struct buffer_head *, kdev_t, int, bh_end_io_t *, void *);<br>
 <br>
-#define __buffer_state(bh, state)	(((bh)-&gt;b_state &amp; (1UL &lt;&lt; BH_##state)) != 0)<br>
+#define __buffer_state(bh, state)	test_bit(BH_##state, &amp;(bh)-&gt;b_state)<br>
 <br>
 #define buffer_uptodate(bh)	__buffer_state(bh,Uptodate)<br>
 #define buffer_dirty(bh)	__buffer_state(bh,Dirty)<br>
 #define buffer_locked(bh)	__buffer_state(bh,Lock)<br>
 #define buffer_req(bh)		__buffer_state(bh,Req)<br>
 #define buffer_protected(bh)	__buffer_state(bh,Protected)<br>
+#define buffer_shared(bh)	__buffer_state(bh,Shared)<br>
 <br>
 #define buffer_page(bh)		(mem_map + MAP_NR((bh)-&gt;b_data))<br>
 #define touch_buffer(bh)	set_bit(PG_referenced, &amp;buffer_page(bh)-&gt;flags)<br>
@@ -748,7 +752,14 @@<br>
 #define BUF_DIRTY	2	/* Dirty buffers, not yet scheduled for write */<br>
 #define NR_LIST		3<br>
 <br>
-void mark_buffer_uptodate(struct buffer_head *, int);<br>
+extern inline void mark_buffer_uptodate(struct buffer_head * bh, int on)<br>
+{<br>
+	if (on) {<br>
+		set_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
+		return;<br>
+	}<br>
+	clear_bit(BH_Uptodate, &amp;bh-&gt;b_state);<br>
+}<br>
 <br>
 extern inline void mark_buffer_clean(struct buffer_head * bh)<br>
 {<br>
@@ -781,7 +792,6 @@<br>
 }<br>
 <br>
 <br>
-extern void balance_dirty(kdev_t);<br>
 extern int check_disk_change(kdev_t);<br>
 extern int invalidate_inodes(struct super_block *);<br>
 extern void invalidate_inode_pages(struct inode *);<br>
@@ -845,9 +855,9 @@<br>
 extern void insert_inode_hash(struct inode *);<br>
 extern void remove_inode_hash(struct inode *);<br>
 extern struct file * get_empty_filp(void);<br>
-extern struct buffer_head * get_hash_table(kdev_t, int, int);<br>
-extern struct buffer_head * getblk(kdev_t, int, int);<br>
-extern struct buffer_head * find_buffer(kdev_t, int, int);<br>
+extern struct buffer_head * FASTCALL(get_hash_table(kdev_t, int, int));<br>
+extern struct buffer_head * FASTCALL(getblk(kdev_t, int, int));<br>
+extern struct buffer_head * FASTCALL(find_buffer(kdev_t, int, int));<br>
 extern void ll_rw_block(int, int, struct buffer_head * bh[]);<br>
 extern int is_read_only(kdev_t);<br>
 extern void __brelse(struct buffer_head *);<br>
Index: linux/init/main.c<br>
===================================================================<br>
RCS file: /var/cvs/linux/init/main.c,v<br>
retrieving revision 1.1.1.17<br>
diff -u -r1.1.1.17 main.c<br>
--- linux/init/main.c	1999/06/22 14:23:43	1.1.1.17<br>
+++ linux/init/main.c	1999/06/25 16:46:13<br>
@@ -1166,6 +1166,7 @@<br>
 		memset(prof_buffer, 0, prof_len * sizeof(unsigned int));<br>
 	}<br>
 <br>
+	memory_start = buffer_hash_init(memory_start, memory_end);<br>
 	memory_start = kmem_cache_init(memory_start, memory_end);<br>
 	sti();<br>
 	calibrate_delay();<br>
@@ -1185,7 +1186,7 @@<br>
 	filescache_init();<br>
 	dcache_init();<br>
 	vma_init();<br>
-	buffer_init(memory_end-memory_start);<br>
+	buffer_init();<br>
 	signals_init();<br>
 	inode_init();<br>
 	file_table_init();<br>
Index: linux/kernel/ksyms.c<br>
===================================================================<br>
RCS file: /var/cvs/linux/kernel/ksyms.c,v<br>
retrieving revision 1.1.1.18<br>
diff -u -r1.1.1.18 ksyms.c<br>
--- linux/kernel/ksyms.c	1999/06/22 22:36:34	1.1.1.18<br>
+++ linux/kernel/ksyms.c	1999/06/25 17:00:08<br>
@@ -162,7 +162,6 @@<br>
 EXPORT_SYMBOL(__bforget);<br>
 EXPORT_SYMBOL(ll_rw_block);<br>
 EXPORT_SYMBOL(__wait_on_buffer);<br>
-EXPORT_SYMBOL(mark_buffer_uptodate);<br>
 EXPORT_SYMBOL(add_blkdev_randomness);<br>
 EXPORT_SYMBOL(generic_file_read);<br>
 EXPORT_SYMBOL(generic_file_write);<br>
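<p>
The power-of-2 rounding used in buffer_hash_init() can be tried on its own in user space. What follows is only a minimal sketch, not part of the patch: the name round_up_pow2 is made up for illustration, and it assumes the top bit of the input is clear so that the left shift cannot overflow.<br>
<p>

```c
/*
 * User-space sketch of the rounding loop from buffer_hash_init().
 * round_up_pow2 is a hypothetical name, not a kernel function.
 * Assumption: the top bit of n is clear, so "n <<= 1" cannot overflow.
 */
static unsigned long round_up_pow2(unsigned long n)
{
	/* n & (n - 1) is zero only when n is 0 or already a power of 2 */
	if (n & (n - 1)) {
		/* doubling guarantees the highest set bit is >= the original n */
		n <<= 1;
		/* clear the lowest set bit until only the highest one is left */
		do
			n &= n - 1;
		while (n & (n - 1));
	}
	return n;
}
```

<p>
With this sketch an input of 5 becomes 8 and an input of 1000 becomes 1024, while values that are already powers of 2 (and 0) are returned unchanged.<br>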
<p>
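The way buffer_hash_init() takes the table directly from the [start, end) boot memory range, halving the bucket count while it does not fit, can also be sketched in user space. This is a rough illustration under stated assumptions rather than the kernel code: struct bucket_table and carve_table are hypothetical names, the range is assumed to be pointer-aligned, and nr is assumed to be a power of 2 already.<br>
<p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for hash_table and bh_hash_mask in the patch. */
struct bucket_table {
	void **buckets;
	unsigned long mask;
};

/*
 * Carve a zeroed table of nr bucket pointers out of [start, end).
 * Assumptions: start is pointer-aligned and nr is a power of 2.
 * Returns the first free byte after the table, or NULL if not even
 * a single bucket fits (the patch panics in that case).
 */
static char *carve_table(char *start, char *end, unsigned long nr,
			 struct bucket_table *out)
{
	size_t size;

	for (;;) {
		size = nr * sizeof(void *);
		if (size == 0)
			return NULL;
		/* same test as the patch: retry while start + size >= end */
		if (size < (size_t)(end - start))
			break;
		nr >>= 1;
	}

	out->buckets = (void **)start;
	memset(out->buckets, 0, size);
	out->mask = nr - 1;	/* valid mask because nr is a power of 2 */
	return start + size;
}
```

<p>
The returned pointer plays the role of the new memory_start that the patch hands on to kmem_cache_init().<br>
<p>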
BTW, this patch is a good idea too:<br>
<p>
Index: linux/mm/vmscan.c<br>
===================================================================<br>
RCS file: /var/cvs/linux/mm/vmscan.c,v<br>
retrieving revision 1.1.1.11<br>
diff -u -r1.1.1.11 vmscan.c<br>
--- linux/mm/vmscan.c	1999/06/22 22:36:36	1.1.1.11<br>
+++ linux/mm/vmscan.c	1999/06/25 17:13:26<br>
@@ -51,7 +51,7 @@<br>
 	 * Dont be too eager to get aging right if<br>
 	 * memory is dangerously low.<br>
 	 */<br>
-	if (!low_on_memory &amp;&amp; pte_young(pte)) {<br>
+	if (pte_young(pte)) {<br>
 		/*<br>
 		 * Transfer the "accessed" bit from the page<br>
 		 * tables to the global page map.<br>
<p>
<p>
Andrea Arcangeli<br>
<!-- body="end" -->
</font></body>
