Alan Cox wrote:
>On Sat, 2002-10-26 at 20:22, Manfred Spraul wrote:
>
>>kmalloc spends a large part of the total execution time trying to find 
>>the cache for the passed-in size.
>>
>>What about the attached patch (against 2.5.44-mm5)?
>>It uses fls to jump over the caches that are definitely too small.
>
>Out of curiosity, how does fls compare with finding the right cache by
>using a binary tree walk? A lot of platforms seem to use generic_fls,
>which has a lot of conditions in it and also a lot of references to
>just-computed values that look likely to stall.
>
A binary tree walk means 4 unpredictable branches, and at least i386 can 
use bsrl for a fast fls().
Patch is attached.
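
To illustrate the idea, here is a rough user-space sketch. It is not the
code from the patch: the size table, fls_portable(), find_cache_linear()
and find_cache_fls() are hypothetical names made up for this example, and
the table is assumed to hold only power-of-two sizes starting at 32.
fls(size - 1) gives a lower bound on the index, so the scan can start
there instead of at the first cache.

```c
#include <stddef.h>

/* Hypothetical size classes, smallest first; not the kernel's real table. */
static const size_t cache_sizes[] = {
	32, 64, 128, 256, 512, 1024, 2048, 4096,
	8192, 16384, 32768, 65536, 131072
};
#define NUM_CACHES (sizeof(cache_sizes) / sizeof(cache_sizes[0]))
#define MIN_SHIFT 5	/* cache_sizes[0] == 1 << MIN_SHIFT */

/* Portable fls(): 1-based index of the highest set bit, 0 if x == 0. */
static int fls_portable(unsigned int x)
{
	int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Baseline: check every cache from the smallest one upwards. */
static int find_cache_linear(size_t size)
{
	size_t i;

	for (i = 0; i < NUM_CACHES; i++)
		if (size <= cache_sizes[i])
			return (int)i;
	return -1;
}

/* Use fls to skip the caches that cannot fit, then scan from there. */
static int find_cache_fls(size_t size)
{
	size_t i;
	int start = 0;

	/* Reject oversized requests before narrowing to unsigned int. */
	if (size > cache_sizes[NUM_CACHES - 1])
		return -1;

	/*
	 * For size >= 2, fls(size - 1) is the smallest n with
	 * size <= (1 << n), so every cache below index n - MIN_SHIFT
	 * is too small.
	 */
	if (size > cache_sizes[0])
		start = fls_portable((unsigned int)(size - 1)) - MIN_SHIFT;

	for (i = (size_t)start; i < NUM_CACHES; i++)
		if (size <= cache_sizes[i])
			return (int)i;
	return -1;
}
```

With a pure power-of-two table the first comparison after the jump
already matches, e.g. find_cache_fls(33) starts at index 1 (the 64 byte
entry). If the table had intermediate sizes, the index computed from fls
would have to be mapped to the first entry of each power-of-two range,
and the loop would then walk the few remaining entries.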
--
    Manfred
--- 2.5/include/asm-i386/bitops.h	Sun Sep 22 06:25:12 2002
+++ build-2.5/include/asm-i386/bitops.h	Sun Oct 27 11:04:57 2002
@@ -414,11 +414,22 @@
 	return word;
 }
 
-/*
+/**
  * fls: find last bit set.
+ * @x: The word to search
+ *
  */
 
-#define fls(x) generic_fls(x)
+static inline int fls(int x)
+{
+	int r;
+
+	__asm__("bsrl %1,%0\n\t"
+		"jnz 1f\n\t"
+		"movl $-1,%0\n"
+		"1:" : "=r" (r) : "rm" (x));
+	return r+1;
+}
 
 #ifdef __KERNEL__
 