Re: [patch 13/13] lseek speedup

Anton Altaparmakov (aia21@cantab.net)
Wed, 17 Jul 2002 11:54:35 +0100


Hi Andrew,

At 06:31 17/07/02, Andrew Morton wrote:
> static inline loff_t llseek(struct file *file, loff_t offset, int origin)
> {
>- loff_t (*fn)(struct file *, loff_t, int);
>-
>- fn = default_llseek;
> if (file->f_op && file->f_op->llseek)
>- fn = file->f_op->llseek;
>- return fn(file, offset, origin);
>+ return (*file->f_op->llseek)(file, offset, origin);
>+ return default_llseek(file, offset, origin);
> }

This one is interesting. I have been wondering for quite a while whether
constructs like the original one actually produce better machine code or
whether they should all be cleaned up just as you do here. I was never
bothered enough about it to try but I have half an hour spare at the moment
so I tried both - using gcc-2.96, RedHat 7.3 latest version, ia32,
compiling for Athlon.

While I cannot speak for other compilers, gcc-2.96 at least generates
better code for the old llseek() than for the new llseek() proposed by the
above patch snippet.

The old code has following advantages:
- Has only one conditional jump (comparing to two in new code.
- Has no unconditional jump (comparing to one in new code).
- Uses 16 bytes less stack space.
- Machine code is shorter.

Here are the relevant different sections in sys_llseek() taken from the gcc
generated assembly files both with and without just the above patch snippet
applied:

----old code----
movl $default_llseek, %esi
movl 16(%ebx), %eax
testl %eax, %eax
je .L1427
movl 4(%eax), %eax
testl %eax, %eax
cmovne %eax, %esi
.L1427:
pushl 48(%esp)
pushl %ecx
pushl %edx
pushl 12(%esp)
call *%esi

----new code----
movl 16(%ebx), %eax
testl %eax, %eax
je .L1428
movl 4(%eax), %eax
testl %eax, %eax
je .L1428
pushl 64(%esp)
pushl %ecx
pushl %edx
pushl %ebx
call *%eax
jmp .L1442
.p2align 4,,7
.L1428:
pushl 64(%esp)
pushl %ecx
pushl %edx
pushl 12(%esp)
call default_llseek
.L1442:

Obviously we are counting a few cycles difference only (although the fewer
jmps could make a bigger difference if the branch prediction of the CPU
doesn't match to the common usage patch and the pipe line is stalled) and
sys_llseek() is not exactly the most performance critical code, however I
thought it is interesting to note that the old code is a "coding style"
which produces better machine code in general with (gcc-2.96)...

And also it provided me with something fun to do in the half hour I am
waiting for my gel to run. (-8 Back to lab work...

Best regards,

Anton

-- 
   "I've not lost my mind. It's backed up on tape somewhere." - Unknown
-- 
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/